Non-co-occurrence data are data that do not follow a joint probability distribution but instead follow an unknown function. Once non-co-occurrence data have been transformed into co-occurrence form, entropy can be used to quantify information and to perform clustering. However, existing algorithms assume that every attribute of non-co-occurrence data contributes equally to clustering, and they do not account for the different effects that representative attributes and irrelevant (redundant) attributes have on the clustering result. This paper therefore proposes a two-stage weighted IB algorithm for non-co-occurrence data (TSAW-sIB). Across the two stages of the co-occurrence transformation, the algorithm views the data from three perspectives, "non-co-occurrence / co-occurrence / joint", highlighting representative attributes and suppressing redundant ones, so as to obtain a data representation that reflects the characteristics of non-co-occurrence data more accurately before clustering. Experiments show that TSAW-sIB outperforms the ROCK, COOLCAT, and LIMBO algorithms.
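To make the co-occurrence transformation more concrete, here is a minimal Python sketch. It is not the TSAW-sIB weighting scheme, which this abstract does not specify. It assumes categorical records, treats each (attribute, value) pair as a feature, and builds a joint distribution p(x, y) over records and features. The function names `to_cooccurrence` and `mutual_information`, the `weights` parameter, and the toy records are all hypothetical and chosen only for illustration.

```python
import math
from collections import defaultdict


def to_cooccurrence(records, weights=None):
    """Turn categorical records into a joint distribution p(x, y).

    x is a record index and y is an (attribute index, value) pair.
    `weights` is a hypothetical per-attribute weight vector; when it is
    omitted every attribute contributes equally (the uniform assumption).
    """
    n_attrs = len(records[0])
    if weights is None:
        weights = [1.0] * n_attrs
    # Normalising constant so that all probabilities sum to 1.
    total = len(records) * sum(weights)
    joint = {}
    for x, record in enumerate(records):
        for a, value in enumerate(record):
            joint[(x, (a, value))] = weights[a] / total
    return joint


def mutual_information(joint):
    """Compute I(X; Y) in bits from a dict {(x, y): p(x, y)}."""
    px = defaultdict(float)
    py = defaultdict(float)
    for (x, y), p in joint.items():
        px[x] += p
        py[y] += p
    return sum(
        p * math.log2(p / (px[x] * py[y]))
        for (x, y), p in joint.items()
        if p > 0
    )


# Toy data: the third attribute is unique per record (an ID-like column).
records = [
    ("red", "small", "a"),
    ("red", "small", "b"),
    ("blue", "large", "c"),
    ("blue", "large", "d"),
]

uniform = to_cooccurrence(records)
# Assumed weights that down-weight the ID-like third attribute.
weighted = to_cooccurrence(records, weights=[1.0, 1.0, 0.1])

print(mutual_information(uniform))
print(mutual_information(weighted))
```

In this toy case the ID-like attribute accounts for a large share of the mutual information under uniform weights, and down-weighting it reduces that share. This is the kind of effect attribute weighting is meant to control, although how TSAW-sIB actually derives its weights is not stated here.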