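Abstract
Sina Weibo now deeply influences people's daily lives. With the rapid spread of smartphones, people can post status updates anytime and anywhere, so the platform is characterized by real-time, fragmented information. As its features have matured, Sina Weibo has begun to form its own ecosystem: users can follow one another, comment on, repost, and like posts that interest them, and publish long-form posts, which makes the platform highly interactive and flexible. As the leading social media platform today, Sina Weibo has a huge user base, and the massive volume of data it generates deserves careful study. This thesis studies the extraction of Weibo data, topic detection, and sentiment analysis of Weibo content.
Traditional extraction of web text data generally relies on web crawlers that gather information by graph traversal, whereas this thesis uses the API provided by Sina Weibo to retrieve the desired Weibo content.
This thesis outlines the general workflow of Weibo topic detection and the corresponding algorithms, and mainly calls the keyword extraction algorithm built into the Chinese Academy of Sciences' ICTCLAS 2014 word segmentation system to obtain Weibo topics, which are then used to filter the relevant posts. On this basis, the posts are given a model representation through sentiment classification and converted into a data format that Weka can process, so that sentiment analysis can be carried out with machine learning.
Keywords: microblog; data extraction; topic detection; machine learning; sentiment analysis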
Title: Microblog Hot Topic Extraction and Analysis Techniques
Abstract
Sina Weibo has become deeply embedded in people's daily lives. With the rapidly growing popularity of smartphones, people can post status updates from anywhere with a tap of a finger, so the platform is characterized by real-time, fragmented information. As its features have continued to improve, Sina Weibo has begun to form its own ecosystem: users can follow one another and comment on, repost, and like the posts of the people they follow. This interactivity and flexibility give the platform a strongly social character. Sina Weibo is now the leading social media platform, and its huge user base and the massive volume of data it generates are well worth studying. This thesis studies Weibo data extraction, topic detection, and sentiment analysis of Weibo content.
Traditional extraction of web text data generally gathers information through a web crawler based on graph traversal. This thesis instead uses the API provided by Sina Weibo to retrieve the desired Weibo content, which makes extraction both convenient and efficient.
After introducing the Weibo topic detection process and the corresponding algorithms, the thesis calls the keyword extraction algorithm built into the CAS ICTCLAS 2014 word segmentation system to obtain Weibo topics, which are then used to filter the relevant posts. On this basis, the posts are processed with sentiment dictionaries and expressed in a data format that Weka can process, and sentiment analysis is then carried out with machine learning.
Keywords: microblog; data acquisition; topic detection; machine learning; sentiment analysis
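The last step described in the abstract, turning filtered posts into a data format that Weka can process, can be illustrated with a minimal sketch. It assumes the posts have already been segmented into tokens (for example by ICTCLAS) and that Weka's ARFF text format is the target. The lexicon entries, the three features, and the names `featurize` and `to_arff` are hypothetical choices made for illustration; the abstract does not specify the dictionary or the feature set actually used.

```python
# Hypothetical sentiment lexicon; the thesis does not list its dictionary entries.
POSITIVE = {"好", "喜欢", "开心"}
NEGATIVE = {"差", "讨厌", "难过"}


def featurize(tokens):
    """Count lexicon hits in an already-segmented post.

    Returns (positive hits, negative hits, token count).
    """
    pos = sum(1 for t in tokens if t in POSITIVE)
    neg = sum(1 for t in tokens if t in NEGATIVE)
    return pos, neg, len(tokens)


def to_arff(rows, relation="weibo_sentiment"):
    """Render (tokens, label) pairs as ARFF text that Weka can load."""
    lines = [
        f"@RELATION {relation}",
        "",
        "@ATTRIBUTE pos_count NUMERIC",
        "@ATTRIBUTE neg_count NUMERIC",
        "@ATTRIBUTE length NUMERIC",
        "@ATTRIBUTE sentiment {positive,negative}",
        "",
        "@DATA",
    ]
    for tokens, label in rows:
        pos, neg, n = featurize(tokens)
        lines.append(f"{pos},{neg},{n},{label}")
    return "\n".join(lines) + "\n"


# Two hand-labelled, pre-segmented example posts (made up for this sketch).
sample = [
    (["今天", "很", "开心"], "positive"),
    (["服务", "太", "差"], "negative"),
]
arff_text = to_arff(sample)
```

Writing `arff_text` to a file with the `.arff` extension should give a dataset that Weka's classifiers can be trained on, with `sentiment` as the class attribute.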
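Contents
Abstract (Chinese)
Abstract (English)
1  Introduction
  1.1  Research Background
  1.2  Current State of Research
  1.3  Research Content and Significance
      1.3.1  Research Content
      1.3.2  Research Significance
  1.4  Thesis Organization
2  Background Knowledge
  2.1  Microblogs
      2.1.1  History of Microblogging, Sina Weibo, and Its Characteristics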