Research on hidden danger classification method of construction accident based on improved Bert model
LI Hua; CHEN Yuyuan; GAO Hong; HE Simin; QIAO Zhengyuan
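Abstract:
To automatically classify and identify hidden-danger inspection records in the safety management of smart construction site projects, an improved Bert model for classifying construction accident hidden dangers is proposed. The model first combines multi-class term weighting with word embeddings. Second, it replaces the cross-entropy loss with a focal loss whose class weights α_t are optimized by a genetic algorithm. Third, three improved classification algorithms are built on the Bert model to classify the hidden-danger corpus effectively. Finally, three groups of algorithms are compared on the corpus for verification. The results show that the overall F_1 of the ga_Bert+tfidf+focal model across the hidden-danger categories reaches 92.86%, which is 5.9%, 1.6% and 0.66% higher than that of the other three models, respectively, indicating good applicability to text classification of construction accident hidden dangers. The improved Bert model addresses the problem that a term has different importance in documents with different class labels, and it mitigates the effect of imbalanced class distributions on classification performance in multi-class tasks, providing theoretical support for intelligent project safety management in construction enterprises.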
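As a rough illustration of the loss described in the abstract, the following minimal sketch computes a class-weighted multi-class focal loss, FL(p_t) = -α_t (1 - p_t)^γ log(p_t), for a single sample. It is not the authors' implementation: the function names, the example logits, the α values and γ = 2 are all assumptions chosen for illustration, and the genetic-algorithm search that would tune α_t is not shown.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of raw class scores."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(logits, target):
    """Standard cross-entropy loss for one sample."""
    return -math.log(softmax(logits)[target])


def focal_loss(logits, target, alpha, gamma=2.0):
    """Class-weighted focal loss for one sample.

    alpha[target] is the per-class weight alpha_t; gamma controls how
    strongly well-classified samples are down-weighted.
    """
    p_t = softmax(logits)[target]
    return -alpha[target] * (1.0 - p_t) ** gamma * math.log(p_t)


# Hypothetical per-class weights (in the paper these would come from a
# genetic-algorithm search, which is not reproduced here).
alpha = [0.25, 0.75, 1.0]

# A confidently correct sample contributes far less than an uncertain one.
easy = focal_loss([4.0, 0.5, 0.1], target=0, alpha=alpha)
hard = focal_loss([0.2, 0.1, 0.3], target=1, alpha=alpha)
print(easy, hard)
```

With γ = 0 and all α_t equal to 1, the expression reduces to ordinary cross-entropy, which is a convenient sanity check when swapping the loss in a classifier.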
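Keywords: safety social engineering; Focal loss; Bert; term weighting; imbalanced dataset; accident hidden danger classification
Foundation: Xi'an University of Architecture and Technology University Fund, Natural Science Special Project (X20180011)
Authors: LI Hua; CHEN Yuyuan; GAO Hong; HE Simin; QIAO Zhengyuan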
DOI: 10.13637/j.issn.1009-6094.2020.1817