This paper studies 1,887 small and medium-sized enterprises (SMEs) that are customers of a bank and whose loan intention is known. Using web crawler technology in a big-data environment, it collects industrial and commercial registration, dishonesty (defaulter), court judgment, Baidu, and recruitment information about these SMEs from the Internet, builds an indicator system of factors that affect SME loan intention, and predicts loan intention with a decision tree and with Logistic regression. A comparison on evaluation metrics including accuracy, F-measure, and area under the ROC curve shows that the decision tree model predicts better than the Logistic regression model. It also shows that three indicators have a significant effect on SME loan intention: whether the enterprise has Baidu information, whether its business registration has changed, and its first-level industry category. The results offer a useful reference for banks seeking target loan customers and, to some extent, help ease the difficulty and high cost of SME financing.
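The three evaluation metrics named in the abstract (accuracy, F-measure, and area under the ROC curve) can be illustrated with a minimal sketch in plain Python. The labels and scores below are hypothetical toy values, not data from the paper, and the function names, the 0.5 decision threshold, and the choice of beta = 1 for the F-measure are assumptions made only for illustration.

```python
# Toy illustration of accuracy, F-measure, and ROC area.
# All data here are hypothetical; nothing is taken from the paper.

def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def f_measure(y_true, y_pred, beta=1.0):
    """F-measure for the positive class (label 1); beta=1 gives F1."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)

def roc_auc(y_true, scores):
    """ROC area as the share of (positive, negative) pairs in which the
    positive example gets the higher score; ties count as half."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Hypothetical labels: 1 = enterprise is willing to borrow, 0 = not willing.
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
# Hypothetical predicted probabilities from some classifier.
scores = [0.9, 0.8, 0.4, 0.3, 0.2, 0.6, 0.7, 0.1]
# Assumed decision threshold of 0.5.
y_pred = [1 if s >= 0.5 else 0 for s in scores]

print("accuracy :", accuracy(y_true, y_pred))
print("F-measure:", f_measure(y_true, y_pred))
print("ROC area :", roc_auc(y_true, scores))
```

Comparing two models on the same held-out enterprises would amount to computing these three numbers once per model and comparing them side by side. Accuracy and F-measure depend on the chosen threshold, while the ROC area depends only on how the scores rank the examples.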