Complex products are characterized by many attributes and high dimensionality, so the LASSO method is introduced to identify their key quality characteristics. First, the feature selection capability of LASSO is used to reduce the dimensionality of the original data set and to rank the quality attributes by their relevance to the quality class. The required number of top-ranked attributes is then selected to form the key quality characteristic subset. The classification accuracy of this subset is tested with a support vector machine (SVM) and compared with results reported in the existing literature. The SECOM data set from the UCI repository serves as a case study; before testing, the classes are balanced by combining SMOTE oversampling with random undersampling. The results show that the method not only eliminates irrelevant and redundant attributes from the high-dimensional original data set but also maintains good classification quality. Compared with methods such as information gain (IG) and ReliefF, the key quality characteristics obtained by the proposed method achieve significantly higher classification accuracy and a clearly lower Type II error rate.
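The first step of the method, using LASSO to rank attributes and keep a required number of them, can be sketched with the standard library alone. This is a minimal illustration under stated assumptions, not the authors' implementation: it fits a linear LASSO by cyclic coordinate descent on z-scored attributes, treats the class label as a numeric response (for example pass/fail coded as -1/+1), and ranks attributes by the absolute value of their coefficients at a single, hypothetical penalty `lam`. The names `standardize`, `lasso_cd` and `select_top_k` are illustrative.

```python
import math


def standardize(X):
    """Z-score each column; constant columns become all zeros."""
    n, p = len(X), len(X[0])
    Z = [[0.0] * p for _ in range(n)]
    for j in range(p):
        col = [row[j] for row in X]
        mu = sum(col) / n
        sd = math.sqrt(sum((v - mu) ** 2 for v in col) / n)
        if sd > 0:
            for i in range(n):
                Z[i][j] = (col[i] - mu) / sd
    return Z


def soft_threshold(z, gamma):
    """Proximal operator of gamma * |b|."""
    if z > gamma:
        return z - gamma
    if z < -gamma:
        return z + gamma
    return 0.0


def lasso_cd(Z, y, lam, n_iter=500, tol=1e-10):
    """Minimise (1/2n)||y - ybar - Zb||^2 + lam * ||b||_1 by cyclic coordinate descent."""
    n, p = len(Z), len(Z[0])
    ybar = sum(y) / n
    r = [yi - ybar for yi in y]            # residual for b = 0
    b = [0.0] * p
    sq = [sum(Z[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    for _ in range(n_iter):
        max_step = 0.0
        for j in range(p):
            if sq[j] == 0.0:               # constant attribute: coefficient stays 0
                continue
            rho = sum(Z[i][j] * r[i] for i in range(n)) / n + sq[j] * b[j]
            new_bj = soft_threshold(rho, lam) / sq[j]
            step = new_bj - b[j]
            if step != 0.0:
                for i in range(n):
                    r[i] -= Z[i][j] * step  # keep residual consistent with b
                b[j] = new_bj
                max_step = max(max_step, abs(step))
        if max_step < tol:
            break
    return b


def select_top_k(X, y, lam, k):
    """Indices of up to k attributes with nonzero coefficients, largest |b_j| first."""
    b = lasso_cd(standardize(X), y, lam)
    ranked = sorted((j for j in range(len(b)) if b[j] != 0.0), key=lambda j: -abs(b[j]))
    return ranked[:k]
```

On real data one would typically scan several `lam` values (or the whole regularization path) so that at least the required number of attributes survives; how the paper tunes this is not stated here.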
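The class-balancing step, SMOTE oversampling combined with random undersampling, can be sketched as follows. The details are assumptions rather than the paper's settings: the minority class is oversampled by a hypothetical `oversample_ratio`, the majority class is then randomly undersampled to the same size, and this simplified SMOTE variant draws its base sample at random instead of cycling through every minority sample. Function names are illustrative.

```python
import math
import random


def smote(minority, n_synthetic, k=5, rng=None):
    """Simplified SMOTE: interpolate between a random minority sample and one of
    its k nearest minority neighbours (Euclidean). Needs >= 2 minority samples."""
    rng = rng or random.Random(0)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_synthetic):
        i = rng.randrange(len(minority))
        base = minority[i]
        others = [m for j, m in enumerate(minority) if j != i]
        neighbours = sorted(others, key=lambda m: math.dist(base, m))[:k]
        nb = rng.choice(neighbours)
        gap = rng.random()                        # uniform in [0, 1)
        synthetic.append([a + gap * (c - a) for a, c in zip(base, nb)])
    return synthetic


def balance(X, y, minority_label, oversample_ratio=1.0, seed=0):
    """SMOTE the minority class, then randomly undersample the majority to match."""
    rng = random.Random(seed)
    minority = [x for x, lbl in zip(X, y) if lbl == minority_label]
    majority = [(x, lbl) for x, lbl in zip(X, y) if lbl != minority_label]
    n_new = int(len(minority) * oversample_ratio)
    minority = minority + smote(minority, n_new, rng=rng)
    kept = rng.sample(majority, min(len(majority), len(minority)))
    X_bal = minority + [x for x, _ in kept]
    y_bal = [minority_label] * len(minority) + [lbl for _, lbl in kept]
    return X_bal, y_bal
```

Oversampling alone would add many synthetic points when the imbalance is severe, and undersampling alone would discard most of the majority class; combining the two moderates both effects.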
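To evaluate a selected attribute subset, the abstract trains an SVM and reports classification accuracy and the Type II error rate. The sketch below substitutes a plain linear SVM trained with the Pegasos stochastic sub-gradient method, because the paper's kernel and parameters are not given here; a constant 1 is appended to each sample as a (regularized) bias term. It also assumes labels coded +1 for a failing unit and -1 for a passing one, and takes a Type II error to mean a failing unit predicted as passing. The paper's exact definition may differ, and all names are hypothetical.

```python
import random


def train_linear_svm(X, y, lam=0.1, n_steps=2000, seed=0):
    """Pegasos: stochastic sub-gradient descent on the L2-regularised hinge loss."""
    rng = random.Random(seed)
    data = [(list(x) + [1.0], yi) for x, yi in zip(X, y)]   # last weight = bias
    w = [0.0] * len(data[0][0])
    for t in range(1, n_steps + 1):
        x, yi = rng.choice(data)
        eta = 1.0 / (lam * t)
        margin = yi * sum(wj * xj for wj, xj in zip(w, x))
        w = [(1.0 - eta * lam) * wj for wj in w]            # shrink (regulariser)
        if margin < 1.0:                                     # hinge loss is active
            w = [wj + eta * yi * xj for wj, xj in zip(w, x)]
    return w


def predict(w, X):
    """Sign of the linear score, with the appended bias feature."""
    return [1 if sum(wj * xj for wj, xj in zip(w, list(x) + [1.0])) >= 0 else -1
            for x in X]


def error_rates(y_true, y_pred, fail=1):
    """Accuracy, Type I rate (pass predicted as fail), Type II rate (fail predicted as pass)."""
    pairs = list(zip(y_true, y_pred))
    n_fail = sum(1 for t, _ in pairs if t == fail)
    n_pass = len(pairs) - n_fail
    missed = sum(1 for t, p in pairs if t == fail and p != fail)
    false_alarm = sum(1 for t, p in pairs if t != fail and p == fail)
    accuracy = sum(1 for t, p in pairs if t == p) / len(pairs)
    type_i = false_alarm / n_pass if n_pass else 0.0
    type_ii = missed / n_fail if n_fail else 0.0
    return accuracy, type_i, type_ii
```

In practice these rates would be computed on held-out samples restricted to the selected columns; the train/test protocol used in the paper is not specified here.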