Presentation Title

Multi-Core Joins

Faculty Mentor

Dr. Graham Matthews

Start Date

17-11-2018 8:30 AM

End Date

17-11-2018 10:30 AM

Location

HARBESON 34

Session

POSTER 1

Type of Presentation

Poster

Subject Area

Engineering / Computer Science

Abstract

Multi-Core Joins

Danny Suarez

Graham Matthews

Data integration is the process of combining multiple tables of data from different sources to create one uniform view of the data. It is an important topic because organizations rely heavily on data integration to pull information from multiple sources in order to make data-driven decisions. For example, a hospital may need to pull information about patients from doctors’ offices, testing labs, and other hospitals to decide how to treat those patients. The primary operation of data integration is the join, which combines two tables of data according to some criterion. Integrating data typically requires multiple join operations across multiple sources, so it is important to perform these operations as quickly as possible. Our research consists of implementing and analyzing methods to efficiently execute chains of joins. The methods we investigate are geared towards modern computers with multiple cores, which can perform multiple tasks simultaneously. Ideally, the more cores a computer has, the more efficient our methods will be. The first method we implemented is the phased approach, in which we first read in the data and then process it. The second is the multi-phased approach, which has multiple read-process phases. Our testing showed that in both approaches more cores did increase speed, but not to the extent that we expected. We also compared the performance of the phased and multi-phased approaches. Our hypothesis was that the multi-phased approach would be more efficient than the phased approach; however, our results showed no significant difference between the two. We were able to make significant improvements to each approach, but our results show that there is still room for improvement.
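
As a rough illustration of the terms used in the abstract, the sketch below shows a join on a shared column and two ways of ordering the reading and processing steps for a chain of joins. It is a single-threaded sketch under stated assumptions, not the authors' implementation: the hash-join technique, the function names (`hash_join`, `phased`, `multi_phased`), the `patient_id` column, and the sample rows are all hypothetical, and the reading of "phased" as "read everything, then join" versus "multi-phased" as "read one table, join, repeat" is only one plausible interpretation of the abstract.

```python
from collections import defaultdict


def hash_join(left, right, key):
    """Equi-join two lists of dict rows on a shared column name."""
    # Build step: index the right table by its join-key value.
    index = defaultdict(list)
    for row in right:
        index[row[key]].append(row)
    # Probe step: look up each left row and merge it with every match.
    return [{**l, **r} for l in left for r in index.get(l[key], [])]


def phased(readers, key):
    """Assumed 'phased' ordering: read every table, then run the chain."""
    tables = [read() for read in readers]       # one read phase
    result = tables[0]
    for table in tables[1:]:                    # one process phase
        result = hash_join(result, table, key)
    return result


def multi_phased(readers, key):
    """Assumed 'multi-phased' ordering: alternate reading and joining."""
    result = readers[0]()
    for read in readers[1:]:
        result = hash_join(result, read(), key)  # read one table, then join
    return result


# Hypothetical data sources echoing the hospital example in the abstract.
readers = [
    lambda: [{"patient_id": 1, "name": "A"}, {"patient_id": 2, "name": "B"}],
    lambda: [{"patient_id": 1, "lab": "x-ray"}, {"patient_id": 2, "lab": "blood"}],
    lambda: [{"patient_id": 1, "hospital": "North"}],
]
```

Both orderings produce the same joined rows; they differ only in when data is read relative to when it is joined, which is where a multi-core implementation would have room to overlap work.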

This document is currently not available here.
