Filtering Chinese Image Spam Using Pseudo-OCR

Source: Chinese Journal of Electronics
For image spam filtering, Optical Character Recognition (OCR) based methods often achieve better performance because of their more elaborate structure for recognizing the embedded text. However, applying traditional OCR techniques usually introduces shortcomings such as expensive computational cost and vulnerability to image noise and artificial interference, especially for Chinese image spam filtering. We therefore optimize the recognition procedure of traditional OCR and propose pseudo-OCR, which is better suited to Chinese image spam filtering: it is sufficient to discriminate potential spam character features from ham ones, rather than to recognize the characters. In addition, a novel key-point based Chinese character feature specific to pseudo-OCR is devised and extracted using a carefully designed algorithm, which outperforms classic corner detection methods in finding such key-points. Experimental results show that the proposed system usually performs better than the traditional OCR based method while maintaining a low false positive rate.
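The core idea of pseudo-OCR, discriminating character features rather than recognizing characters, can be illustrated with a minimal sketch. This is not the paper's algorithm: the abstract does not specify the key-point feature or the classifier, so the sketch assumes each character has already been reduced to a small numeric feature vector and swaps in a plain nearest-centroid rule. All names (`centroid`, `is_spam_like`, `classify_image`), the toy vectors, and the 0.5 threshold are hypothetical.

```python
from math import dist  # Euclidean distance, Python 3.8+


def centroid(vectors):
    """Mean of a list of equal-length feature vectors."""
    n = len(vectors)
    return tuple(sum(v[i] for v in vectors) / n for i in range(len(vectors[0])))


def is_spam_like(feature, spam_centroid, ham_centroid):
    """True if the feature vector lies closer to the spam centroid.

    Note that no character is ever recognized: we only ask which
    class the feature resembles.
    """
    return dist(feature, spam_centroid) < dist(feature, ham_centroid)


def classify_image(char_features, spam_centroid, ham_centroid, threshold=0.5):
    """Flag an image as spam if enough of its characters look spam-like.

    `threshold` is an assumed tuning knob, not a value from the paper.
    """
    if not char_features:
        return False
    hits = sum(is_spam_like(f, spam_centroid, ham_centroid) for f in char_features)
    return hits / len(char_features) >= threshold


# Toy 2-D feature vectors standing in for real key-point features.
spam_train = [(0.9, 0.8), (0.8, 0.9), (1.0, 0.7)]
ham_train = [(0.1, 0.2), (0.2, 0.1), (0.0, 0.3)]
spam_c = centroid(spam_train)
ham_c = centroid(ham_train)

image_a = [(0.85, 0.75), (0.9, 0.9), (0.2, 0.2)]  # mostly spam-like characters
image_b = [(0.1, 0.1), (0.15, 0.3), (0.8, 0.8)]   # mostly ham-like characters

print(classify_image(image_a, spam_c, ham_c))
print(classify_image(image_b, spam_c, ham_c))
```

Because the decision only compares distances, its cost per character is constant in the size of the character set, which is one plausible reading of why skipping full recognition reduces computation for a large alphabet such as Chinese.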