An optimal selection method of samples of calibration set and of validation set in spectral multivariate analysis

YUAN Hong-fu*, LIU Wei, ZHAO Zhong, SONG Chun-Feng, LI Xiao-yu
Beijing University of Chemical Technology, Beijing 100029, China

Keywords: Sample subset partitioning; Kennard-Stone algorithm; NIR spectroscopy

1. Introduction

The selection of samples for the calibration set and the validation set is very important in multivariate analysis. The selected samples should cover all components that may be present in the samples to be analyzed, and should span the same range of property values, so that predictions are made by interpolation. Within the calibration set or the validation set, the samples should be evenly distributed over the analyzed range. In practice, however, the collected samples are usually not evenly distributed over the property range, because the availability of samples is limited. The most common case is that there are more samples in the middle of the range and fewer at the ends, with some sub-ranges containing no samples at all. Sample population acts as a weight in regression analysis. If a model is built from a calibration set with more samples in the middle of the range, its predictions tend toward "averaging": for large values the predicted value is too low, and vice versa.

The commonly used sample selection methods include the random selection (RS) method and the Kennard-Stone (KS) method. The RS method is simple but cannot ensure that the selected samples are representative. The KS method selects the calibration set according to the Euclidean distance or the Mahalanobis distance in the matrix X. Samples that are clearly different in spectral space are selected into the calibration set, and the remaining samples go into the validation set. The KS method has been shown to perform much better than the RS method. However, in the low concentration range the spectral differences between samples are tiny, so the selected samples are not typical. This paper proposes a new optimal sample selection method, named Rank-KS, which considers both the spectral space and the property space and aims to improve the uniformity of the calibration set and the validation set.
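
The classical KS selection described above can be illustrated with a short sketch. This is not the Rank-KS method proposed in the paper, whose details are not given in this section; it is only a minimal version of the standard Kennard-Stone rule using Euclidean distance. The function name `kennard_stone`, the parameter `n_select`, and the toy data are assumptions made for illustration. The sketch starts with the two most distant samples and then repeatedly adds the sample whose nearest already-selected neighbour is farthest away.

```python
from math import dist  # Euclidean distance between two points (Python 3.8+)


def kennard_stone(X, n_select):
    """Split row indices of X into (calibration, validation) lists.

    X is a list of spectra (each a list of floats); n_select is the
    hypothetical number of calibration samples to pick.
    """
    n = len(X)
    if not 2 <= n_select <= n:
        raise ValueError("n_select must be between 2 and the number of samples")

    # Pairwise Euclidean distances between all spectra (rows of X).
    d = [[dist(X[i], X[j]) for j in range(n)] for i in range(n)]

    # Start with the two samples that are farthest apart.
    i0, j0 = max(
        ((i, j) for i in range(n) for j in range(i + 1, n)),
        key=lambda pair: d[pair[0]][pair[1]],
    )
    selected = [i0, j0]
    remaining = [k for k in range(n) if k not in (i0, j0)]

    # Repeatedly add the sample whose nearest selected neighbour is farthest away.
    while len(selected) < n_select:
        best = max(remaining, key=lambda k: min(d[k][s] for s in selected))
        selected.append(best)
        remaining.remove(best)

    # Selected samples form the calibration set; the rest form the validation set.
    return selected, remaining


# Toy one-variable "spectra" (made-up values): three samples crowd the low end.
X = [[0.0], [0.1], [0.2], [5.0], [10.0]]
calibration, validation = kennard_stone(X, 3)
print(calibration, validation)
```

In this toy data the three low-valued samples differ very little from one another, so only one of them enters the calibration set. This mirrors the limitation noted above for the low concentration range, where spectral differences are tiny.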