Presentation Title

Fighting Fake News One Article at a Time

Presenter Information

Cole Elpel

Faculty Mentor

Dr. Chang-Shyh Peng

Start Date

18-11-2017 12:30 PM

End Date

18-11-2017 1:30 PM

Location

BSC-Ursa Minor 92

Session

Poster 2

Type of Presentation

Poster

Subject Area

Engineering / Computer Science

Abstract

Fighting Fake News One Article at a Time

Cole Elpel

Dr. Peng

Introduction: With the rise of independent news sources, “fake news” has become an active threat. The goal of this project is to understand how fake news works since, as far as its appearance online is concerned, it is a relatively new issue. The hypothesis is that if multiple news stories on a central topic are accurate, they should share similar information. The program designed over the course of the summer tests for these similarities, following turnitin.com’s model of comparing papers against a database.

Current Project Status: The program can currently extract any given news story from the HTML of a web page using the “Jsoup” library. With that story, it can compute a number representing the similarity between one story and another using the “java-string-similarity” library. The specific comparison method is a normalized Levenshtein comparison, which gives the fraction of edit operations required to turn one string into another, i.e., a percentage of “difference” between the two. Subtracting that percentage from 1 yields the similarity between the two stories. By running this comparison with one story against each story in the “database” and averaging the resulting percentages, the program determines whether or not a story could be “fake.”
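The following is a minimal sketch of this pipeline, assuming Jsoup is used to pull the article text from a page’s paragraph elements and java-string-similarity’s NormalizedLevenshtein class is used for the comparison; the class name, URLs, and CSS selector are illustrative placeholders rather than the project’s actual code.

import info.debatty.java.stringsimilarity.NormalizedLevenshtein;
import org.jsoup.Jsoup;

import java.io.IOException;
import java.util.List;

public class FakeNewsChecker {

    // Extract the readable text of an article from its web page.
    // Selecting every <p> element is an assumption; real pages may
    // need a site-specific selector for the article body.
    static String extractStory(String url) throws IOException {
        return Jsoup.connect(url).get().select("p").text();
    }

    // Average the normalized Levenshtein similarity (1 - distance)
    // of one story against every story in the reference "database."
    static double averageSimilarity(String story, List<String> database) {
        NormalizedLevenshtein levenshtein = new NormalizedLevenshtein();
        double total = 0.0;
        for (String reference : database) {
            total += levenshtein.similarity(story, reference);
        }
        return total / database.size();
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical URLs standing in for the story under test
        // and the reference set of related stories.
        String candidate = extractStory("https://example.com/story-under-test");
        List<String> database = List.of(
                extractStory("https://example.com/related-story-1"),
                extractStory("https://example.com/related-story-2"));

        double average = averageSimilarity(candidate, database);
        // 50% threshold described in the conclusion: at or above it, the
        // story is treated as consistent with the reference articles.
        System.out.printf("Average similarity: %.0f%% -> %s%n",
                average * 100, average >= 0.5 ? "likely related/real" : "possibly fake");
    }
}

As a design note, averaging across the whole reference set means a single low pairwise score does not by itself mark a story as fake; the decision depends on the story’s overall agreement with the set.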

Conclusion: The threshold used for these tests was 50%: if the average similarity was 50% or above, the story was most likely real, or at least related to the articles in the database. These results, with proper confirmation, could provide a new understanding of how fake news can be identified.
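As a hypothetical illustration of this rule, a candidate story scoring 0.62, 0.48, and 0.55 against three database articles would have an average similarity of (0.62 + 0.48 + 0.55) / 3 = 0.55, which clears the 50% threshold and would be classified as related or likely real.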

Summary of research results to be presented

In its current state, the program can produce a list of similarity percentages comparing one story against a set of five other stories of varying degrees of similarity. When testing a story from within that pool of five against the others, we obtained an average similarity score of roughly 40% or more across several tests.

These results are what we would expect to see from articles related to each other on a given topic. Although they satisfy the goal of the project, further testing on a larger set of news stories will be conducted to confirm the method’s accuracy.
