Performance analysis of new word weighting procedures for opinion mining

Source: Frontiers of Information Technology & Electronic Engineering
The proliferation of forums and blogs creates both challenges and opportunities for processing large amounts of information. Information shared on various topics often contains opinionated words, which are qualitative in nature. These qualitative words require statistical computation to convert them into useful quantitative data, and this data must be processed carefully because it expresses opinions. Opinion-bearing words also differ in the significance of the meaning they convey. To translate the linguistic meaning of words into data and to improve opinion mining analysis, we propose a novel weighting scheme, referred to as inferred word weighting (IWW). IWW is computed from the significance of the word in the document (SWD) and the significance of the word in the expression (SWE), with the aim of improving performance. Compared with existing methods, the proposed weighting methods give an analytic view and assign more appropriate weights to words. In addition to the new weighting methods, we examine text classification performance when stop-words are included. Stop-words are generally removed in text processing. When stop-words are included with both the proposed and the existing weighting methods, two facts are observed: (1) classification performance is enhanced; (2) the difference in outcome between including and excluding stop-words is smaller for the proposed methods and larger for the existing methods. The inferences drawn from these observations are discussed. Experimental results on benchmark data sets show the potential improvement in classification accuracy.
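The abstract does not give the formulas for IWW, SWD or SWE, so they are not reproduced here. As a rough illustration of the stop-word experiment only, the sketch below computes a conventional TF-IDF weighting (one of the "existing methods" kind of baseline, assumed here as tf × log(N/df)) with stop-words kept and removed. The toy stop-word list, the documents and the function names `tokenize` and `tfidf` are hypothetical and not taken from the paper.

```python
import math
from collections import Counter

# Hypothetical toy stop-word list; real experiments use a standard, larger list.
STOP_WORDS = {"the", "is", "a", "this", "not", "very", "was"}


def tokenize(text, keep_stop_words=True):
    """Lower-case whitespace tokenization, optionally dropping stop-words."""
    tokens = text.lower().split()
    if keep_stop_words:
        return tokens
    return [t for t in tokens if t not in STOP_WORDS]


def tfidf(docs, keep_stop_words=True):
    """Return one {term: weight} dict per document, using tf * log(N / df)."""
    tokenized = [tokenize(d, keep_stop_words) for d in docs]
    n_docs = len(tokenized)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for toks in tokenized for term in set(toks))
    weights = []
    for toks in tokenized:
        tf = Counter(toks)  # raw term counts within this document
        weights.append({term: count * math.log(n_docs / df[term])
                        for term, count in tf.items()})
    return weights


docs = [
    "the movie was not good",
    "the movie was very good",
    "this plot is a mess",
]

with_stop = tfidf(docs, keep_stop_words=True)
without_stop = tfidf(docs, keep_stop_words=False)
print(sorted(with_stop[0]))               # ['good', 'movie', 'not', 'the', 'was']
print(sorted(without_stop[0]))            # ['good', 'movie']
print(without_stop[0] == without_stop[1]) # True
```

In this toy case, removing stop-words makes the first two reviews produce identical weight vectors, which is one way stop-word handling can affect an opinion classifier. It is an illustration of the mechanics only, not a reproduction of the paper's experiments or its explanation of the results.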