Presentation Title

Application of Text Mining to Determine the Overall Sentimental Value and Word Frequencies of Abstracts Related to Traffic Safety

Faculty Mentor

Wen Cheng

Start Date

23-11-2019 1:30 PM

End Date

23-11-2019 1:45 PM

Location

Markstein 105

Session

oral 3

Type of Presentation

Oral Talk

Subject Area

Engineering / Computer Science

Abstract

Text mining is a relatively new method that analyzes the word frequencies, sentimental values, and main topics of texts. Although it is used extensively in the social sciences and similar fields, its applications in traffic safety have been very limited. This research uses a large number of abstracts collected online from the Traffic Research Record (TRR) between 1996 and 2018 as a testbed to explore the capabilities of text mining in performing a safety-related literature review. To yield more reliable results, the abstract data were first cleaned by singularizing words and removing commonly used stop words. Word frequencies were then determined for the follow-up analyses: sentiment analysis, pairwise correlation calculations, and topic analysis. Word frequency was found to be inversely related to word rank, as described by Zipf’s law. Both the “bing” and “afinn” lexicon-based methods were employed to evaluate the sentimental value of the data. Pearson’s correlation coefficient was calculated for the bigram analysis, while Latent Dirichlet Allocation (LDA) modeling was used to determine topics and assign associated words. The results indicate that the overall sentimental value of the data is negative, owing to the nature of traffic safety. Word frequency and rank are inversely correlated, which is consistent with Zipf’s law. Both the bigram correlations and the topic analysis revealed several “hot topics” that were explored within the last two decades. This information could be used to direct future research toward areas that have been inadequately explored.
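
The first steps described above (stop-word removal, word-frequency ranking, a Zipf's-law check, and lexicon-based sentiment scoring) can be illustrated with a minimal Python sketch. This is not the authors' code: the stop-word list, the AFINN-style toy lexicon, the function names, and the sample sentences are all hypothetical stand-ins, and singularization is omitted for brevity.

```python
import math
import re
from collections import Counter

# Hypothetical stand-ins for a real stop-word list and an AFINN-style
# lexicon (integer scores, negative = negative sentiment).
STOP_WORDS = {"the", "a", "an", "of", "and", "in", "to", "is", "on", "at", "was", "were"}
TOY_LEXICON = {"crash": -2, "fatal": -3, "injury": -2, "risk": -2, "safe": 1, "improve": 2}


def tokenize(text):
    """Lowercase the text, keep alphabetic words, and drop stop words."""
    words = re.findall(r"[a-z]+", text.lower())
    return [w for w in words if w not in STOP_WORDS]


def rank_frequency(tokens):
    """Return (rank, word, count) tuples, most frequent word first."""
    counts = Counter(tokens).most_common()
    return [(rank, word, n) for rank, (word, n) in enumerate(counts, start=1)]


def zipf_slope(ranked):
    """Least-squares slope of log(count) against log(rank).

    Under Zipf's law the slope is negative (close to -1 for large corpora).
    Requires at least two distinct words.
    """
    xs = [math.log(rank) for rank, _, _ in ranked]
    ys = [math.log(n) for _, _, n in ranked]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx


def sentiment_score(tokens, lexicon=TOY_LEXICON):
    """Sum the lexicon scores of all tokens; unknown words score 0."""
    return sum(lexicon.get(w, 0) for w in tokens)


if __name__ == "__main__":
    # Made-up sentences, not taken from the TRR abstracts.
    sample = [
        "Fatal crash risk increased at rural intersections.",
        "The crash injury risk was reduced to improve safe driving.",
    ]
    tokens = [t for text in sample for t in tokenize(text)]
    ranked = rank_frequency(tokens)
    print(ranked[:3])
    print("log-log slope:", zipf_slope(ranked))
    print("sentiment:", sentiment_score(tokens))
```

On a corpus this small the fitted slope is only indicative; the rank-frequency relationship becomes meaningful with thousands of tokens.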

