Collected Essays on Finance and Economics ›› 2023, Vol. 39 ›› Issue (9): 80-90.

Negative Media Coverage and Corporate Financial Distress Prediction: Based on Text Analysis and Machine Learning

SUN Jie, LI Nengfei, ZHAO Mengru   

  1. School of Accounting, Tianjin University of Finance and Economics, Tianjin 300222, China
  • Received: 2022-05-19   Online: 2023-09-10   Published: 2023-09-12

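  • Corresponding author: LI Nengfei (born 1995), female, from Huining, Gansu; doctoral student at the School of Accounting, Tianjin University of Finance and Economics
  • About the authors: SUN Jie (born 1979), female, from Rui'an, Zhejiang; professor at the School of Accounting, Tianjin University of Finance and Economics. ZHAO Mengru (born 1995), female, from Shijiazhuang, Hebei; doctoral student at the School of Accounting, Tianjin University of Finance and Economics.
  • Funding:
    General Program of the National Natural Science Foundation of China (72271174; 71771162); 2022 Tianjin Postgraduate Research and Innovation Project (2022BKY227)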

Abstract: This paper first applies text mining and natural language processing techniques to negative media coverage of A-share companies listed on the Shanghai and Shenzhen Stock Exchanges from 2011 to 2021, and constructs variables along three dimensions: the volume of negative media coverage, the textual tone of the coverage headlines/content, and the emotional tendency of the coverage. From a concept drift perspective, the ensemble classifier XGBoost is then used to build a dynamic financial distress prediction model. The empirical results show that adding negative media coverage variables to firm characteristic variables improves the model's overall classification performance and markedly reduces its Type I error rate. After accounting for rare-event bias and endogeneity, the volume of negative media coverage and the textual tone and emotional tendency of headlines/content remain significantly correlated with whether a company falls into financial distress. These findings indicate that negative media coverage carries incremental information for predicting corporate financial distress and plays an important external corporate governance role in that process.

Key words: Negative Media Coverage, Financial Distress Prediction, Concept Drift, Text Analysis, Machine Learning
