Presentation Title

Homology based gene annotation of protein coding and regulatory regions in 57 kb of D. eugracilis DNA.

Presenter Information

Eduardo Cruz

Faculty Mentor

Dr. Alexa Sawa

Start Date

17-11-2018 8:30 AM

End Date

17-11-2018 10:30 AM

Location

CREVELING 64

Session

POSTER 1

Type of Presentation

Poster

Subject Area

biological_agricultural_sciences

Abstract

DNA sequencing is now fast and relatively inexpensive, making whole-genome sequencing feasible for some organisms. However, accurately identifying genes within newly sequenced DNA remains challenging, and the development of software for computational gene prediction is ongoing. De novo gene prediction software often fails to annotate genes accurately in newly sequenced DNA. The Genomics Education Partnership teaches undergraduates to use bioinformatics tools to perform homology-based gene annotation of both coding and regulatory DNA. In this project, possible protein-coding regions were identified within a 57 kilobase contig of D. eugracilis DNA using various gene prediction programs. Predicted genes were translated into amino acid sequences and compared with orthologous proteins in D. melanogaster using the FlyBase protein database. Matches were evaluated based on conservation, homology, and relative expect values (E-values). Potential orthologs in D. eugracilis were annotated using the D. melanogaster sequences provided by FlyBase and aligned using the NCBI Basic Local Alignment Search Tool (BLAST). After the coding regions were annotated, possible Transcription Start Site (TSS) locations were identified, defined, and annotated. Search regions were identified and narrowed based on BLAST alignments to the 5' untranslated regions (UTRs) of the respective orthologs in D. melanogaster. TSS regions were defined based on BLAST sequence alignments, DNase hypersensitivity, RNA-Seq coverage, and the characterization and location of core promoter motifs. Our data support a model that contains 6 genes, with a total of 13 isoforms, including a putative paralog that does not exist in D. melanogaster. TSS search regions were defined and annotated for each isoform and the putative paralog.
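The translation step mentioned in the abstract (turning a predicted coding sequence into an amino acid sequence before comparing it with a D. melanogaster protein) can be illustrated with a minimal Python sketch. This is not the software used in the project. The function names `reverse_complement` and `translate` and the example sequences are hypothetical, and the sketch assumes the standard nuclear genetic code and a sequence that is already in frame.

```python
# Illustrative sketch only: names and example sequences are hypothetical,
# not taken from the project. Assumes the standard nuclear genetic code.

BASES = "TCAG"
# Amino acids for all 64 codons in TCAG order ('*' marks a stop codon).
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G", "N": "N"}


def reverse_complement(seq: str) -> str:
    """Return the reverse complement, for genes on the minus strand."""
    return "".join(COMPLEMENT.get(base, "N") for base in reversed(seq.upper()))


def translate(seq: str) -> str:
    """Translate an in-frame coding sequence, stopping at the first stop codon.

    Codons containing unrecognized bases are reported as 'X'.
    A trailing partial codon is ignored.
    """
    seq = seq.upper()
    protein = []
    for start in range(0, len(seq) - 2, 3):
        residue = CODON_TABLE.get(seq[start:start + 3], "X")
        if residue == "*":
            break
        protein.append(residue)
    return "".join(protein)


if __name__ == "__main__":
    print(translate("ATGGCCAAGTAA"))
    print(translate(reverse_complement("TTACTTGGCCAT")))
```

The resulting protein string is the kind of query that would then be aligned against a protein database; the alignment and E-value assessment themselves are outside the scope of this sketch.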

