Instance reduction for supervised learning using input-output clustering method

Source: Journal of Central South University
A method that applies input-output clustering to reduce the number of samples in large data sets is proposed. The proposed method clusters the output data into groups and then clusters the input data in accordance with the groups of output data. A set of prototypes is then selected from the clustered input data, and the inessential data can ultimately be discarded from the data set. The proposed method can reduce the effect of outliers because only the prototypes are used. The method is applied to reduce the data set in regression problems. Two standard synthetic data sets and three standard real-world data sets are used for evaluation. The root-mean-square errors of support vector regression models trained with the original data sets are compared with those of models trained with the corresponding instance-reduced data sets. In the experiments, the proposed method provides good results on the reduction and the reconstruction of the standard synthetic and real-world data sets. The numbers of instances of the synthetic data sets are decreased by 25% to 69%. The reduction rates for the real-world data sets of the automobile miles per gallon and the 1990 census in CA are 46% and 57%, respectively. The reduction rate for the electrocardiogram (ECG) data set is higher, at 96%, because of the redundant and periodic nature of ECG signals. For all of the data sets, the regression results are similar to those from the corresponding original data sets. Therefore, the regression performance of the proposed method is good while only a fraction of the data is needed in the training process.
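The two-stage idea described in the abstract (cluster the outputs, then cluster the inputs within each output group, then keep one prototype per input cluster) can be illustrated with a minimal sketch. The abstract does not say which clustering algorithm is used or how prototypes are chosen, so everything specific here is an assumption made for illustration: plain k-means for both stages, the instance nearest each centroid as the prototype, and the hypothetical names `kmeans`, `reduce_instances`, `k_out`, and `k_in`. It is not the authors' implementation.

```python
import math
import random


def kmeans(points, k, iters=20, seed=0):
    """Plain Lloyd's k-means on a list of equal-length tuples (assumed choice)."""
    k = min(k, len(points))
    rng = random.Random(seed)
    centroids = [tuple(p) for p in rng.sample(points, k)]
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: nearest centroid by Euclidean distance.
        labels = [min(range(k), key=lambda c: math.dist(p, centroids[c]))
                  for p in points]
        # Update step: mean of each non-empty cluster; empty ones keep
        # their previous centroid.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = tuple(sum(col) / len(members)
                                     for col in zip(*members))
    return labels, centroids


def reduce_instances(X, y, k_out=3, k_in=5):
    """Return sorted indices of the prototype instances to keep.

    k_out and k_in are hypothetical parameters: the number of output
    groups and the number of input clusters per output group.
    """
    # Stage 1: cluster the (scalar) outputs into groups.
    out_labels, _ = kmeans([(v,) for v in y], k_out)
    keep = []
    for g in set(out_labels):
        idx = [i for i, lab in enumerate(out_labels) if lab == g]
        group = [tuple(X[i]) for i in idx]
        # Stage 2: cluster the inputs that belong to this output group.
        in_labels, centroids = kmeans(group, k_in)
        for c in set(in_labels):
            members = [j for j, lab in enumerate(in_labels) if lab == c]
            # Prototype (assumed rule): the member closest to its centroid.
            best = min(members,
                       key=lambda j: math.dist(group[j], centroids[c]))
            keep.append(idx[best])
    return sorted(keep)


# Toy regression data: one input feature, sinusoidal target.
X = [(i / 50.0,) for i in range(200)]
y = [math.sin(x[0]) for x in X]
kept = reduce_instances(X, y)
print(len(kept), "of", len(X), "instances kept")
```

With these settings at most `k_out * k_in` instances survive, so the reduction rate is controlled directly by the two cluster counts. A regression model such as support vector regression would then be trained on `[X[i] for i in kept]` and `[y[i] for i in kept]` and its root-mean-square error compared with that of a model trained on the full set.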