Presentation Title

Using part-of-speech tags to identify presence of location information in social media messages

Start Date

November 2016

End Date

November 2016

Location

MSE 113

Type of Presentation

Oral Talk

Abstract

Social media has become a quick way to access and produce information, most often on a local and personal level. Much of this information, however, is expressed with limited context and informal language, where many standard natural language processing techniques fall short. By analyzing comments made on social media platforms using part-of-speech (POS) tags, it is possible to automatically create new rules for indicating points of interest, such as locations, contained in a message. We took word segments from approximately 10,000 tweets relevant to transportation issues in California and tagged them using the Ark-Twitter API to obtain POS tags. We then verified whether these segments contain locations by validating them against Wikidata and Google’s Knowledge Graph. We transformed this data into feature vectors containing the POS tags of the word segment, the tags of its adjacent words within a defined radius before and after, and the segment's relative position in the tweet. This data was then used to train eight different machine learning classifiers; the highest precision and recall were approximately 86% and 90%, respectively. Precision and recall increased from the baseline case of no radius to a radius of 3 words by 6.7% and 15%, respectively. This suggests that social media messages still follow a recognizable pattern, with enough context to indicate when locations are being discussed, despite incorrect grammar and noisy text.
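
As a rough illustration of the feature vectors described in the abstract, the sketch below builds features for one word segment from the POS tags of a tweet: the segment's own tags, the tags within a given radius on each side, and the segment's relative position. The function name `context_features`, the `<PAD>` placeholder, the feature names, and the example tags are all assumptions made for illustration; the abstract does not specify how the authors encoded their features.

```python
from typing import Dict, List

# Hypothetical placeholder for context positions that fall outside the tweet.
PAD = "<PAD>"


def context_features(tags: List[str], start: int, end: int, radius: int) -> Dict[str, object]:
    """Build features for the segment tags[start:end].

    `radius` is the number of context tags taken on each side of the
    segment; radius 0 corresponds to the no-context baseline.
    """
    if not 0 <= start < end <= len(tags):
        raise ValueError("segment must lie inside the tweet")

    # Tags immediately before and after the segment, padded at the tweet edges.
    before = [tags[i] if i >= 0 else PAD for i in range(start - radius, start)]
    after = [tags[i] if i < len(tags) else PAD for i in range(end, end + radius)]

    features: Dict[str, object] = {
        "segment": "_".join(tags[start:end]),  # POS tags of the segment itself
        "position": start / len(tags),         # relative position in the tweet
    }
    # before_1 is the tag directly left of the segment, before_2 the next one out.
    for offset, tag in enumerate(before):
        features[f"before_{radius - offset}"] = tag
    # after_1 is the tag directly right of the segment.
    for offset, tag in enumerate(after, start=1):
        features[f"after_{offset}"] = tag
    return features


# Illustrative tag sequence only; not taken from the study's data.
example_tags = ["N", "P", "^", "^", "V"]
print(context_features(example_tags, start=2, end=4, radius=2))
```

Dictionaries of this shape could then be one-hot encoded and passed to a classifier; which encoding and which eight classifiers the authors used is not stated in the abstract.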

Nov 12th, 10:00 AM to Nov 12th, 10:15 AM
