Citation:
Ockendon NF, O’Connell LA, Bush SJ, Monzonsandoval J, Barnes H, Szekely T, Hofmann HA, Dorus S, Urrutia AO. Optimization of next-generation sequencing transcriptome annotation for species lacking sequenced genomes. Molecular Ecology Resources [Internet]. 16:446-458.
Abstract
Next-generation sequencing methods, such as RNA-seq, have permitted the exploration of gene expression in a range of organisms which have been studied in ecological contexts but lack a sequenced genome. However, the efficacy and accuracy of RNA-seq annotation methods using reference genomes from related species have yet to be robustly characterized. Here we conduct a comprehensive power analysis employing RNA-seq data from Drosophila melanogaster in conjunction with 11 additional genomes from related Drosophila species to compare annotation methods and quantify the impact of evolutionary divergence between transcriptome and the reference genome. Our analyses demonstrate that, regardless of the level of sequence divergence, direct genome mapping (DGM), where transcript short reads are aligned directly to the reference genome, significantly outperforms the widely used de novo and guided assembly-based methods in both the quantity and accuracy of gene detection. Our analysis also reveals that DGM recovers a more representative profile of Gene Ontology functional categories, which are often used to interpret emergent patterns in genomewide expression analyses. Lastly, analysis of available primate RNA-seq data demonstrates the applicability of our observations across diverse taxa. Our quantification of annotation accuracy and reduced gene detection associated with sequence divergence thus provides empirically derived guidelines for the design of future gene expression studies in species without sequenced genomes.