GenomeThreader Gene Prediction Software
GenomeThreader is a software tool for computing gene structure predictions.
The predictions are calculated using a similarity-based approach
in which additional cDNA/EST and/or protein sequences are used to predict gene
structures via spliced alignments.
GenomeThreader was motivated by disabling limitations in
GeneSeqer, a popular gene prediction program that is widely used
for plant genome annotation.
Features
-
Intron Cutout Technique:
The intron cutout technique makes it possible to overcome the time and space
limitations of the dynamic programming (DP) algorithms used in
GeneSeqer, in particular when applied to organisms with long introns.
-
Bayesian Splice Site Models (BSSMs):
With BSSMs it is possible to assign probabilities to GT donor, GC donor,
and AG acceptor sites. This information is used in the DP to determine the
exact exon/intron boundaries.
-
Combination of cDNA/EST Based Spliced Alignments with Protein Based Spliced
Alignments:
After computing spliced alignments of the supplied cDNAs/ESTs and protein sequences against
the genomic template, GenomeThreader computes consensus spliced
alignments. Consensus spliced alignments combine several spliced alignments
to resolve the complete gene structure and to uncover alternative splicing.
-
Incremental Updates:
When the cDNA/EST or protein database in use is updated, a common approach
has been to redo the complete mapping. With GenomeThreader, you can combine
newly computed spliced alignments with precomputed spliced alignments to
quickly recompute consensus spliced alignments.
-
XML:
The additional GenomeThreader XML output conforms to our gthXML
standard (GenomeThreader.rng.txt). With
the included script XML2GFF.py, it is possible to convert gthXML output to the
GFF format.
A variety of gthXML-specific tools can be found
here.
-
gthDB:
We also provide
a schema and load script for gthDB, which permits storage
and query of GenomeThreader output in a relational format.
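To make the intron cutout idea from the list above more concrete, here is a minimal Python sketch. It is not GenomeThreader's implementation: the function names, the `flank` and `min_cut` parameters, and the example hit coordinates are all hypothetical. It only illustrates the general principle that long genomic stretches between matching regions can be removed before running DP, and that positions in the reduced sequence can be mapped back to genomic coordinates afterwards.

```python
from typing import List, Tuple

Interval = Tuple[int, int]  # half-open [start, end) on the genomic sequence


def cutout_regions(hits: List[Interval], seq_len: int,
                   flank: int, min_cut: int) -> List[Interval]:
    """Return the genomic regions kept for DP (hypothetical helper).

    Each hit is padded by `flank` bases. Neighbouring regions are merged
    unless the gap between them is at least `min_cut` bases long, in which
    case the gap is cut out.
    """
    kept: List[Interval] = []
    for start, end in sorted(hits):
        start, end = max(0, start - flank), min(seq_len, end + flank)
        if kept and start - kept[-1][1] < min_cut:
            # Gap too short to be worth cutting: extend the previous region.
            kept[-1] = (kept[-1][0], max(kept[-1][1], end))
        else:
            kept.append((start, end))
    return kept


def to_genomic(pos: int, kept: List[Interval]) -> int:
    """Map a position in the reduced sequence back to genomic coordinates."""
    offset = 0
    for start, end in kept:
        length = end - start
        if pos < offset + length:
            return start + (pos - offset)
        offset += length
    raise ValueError("position outside the reduced sequence")


# Made-up example: three matching regions, two of them separated by a
# very long intron-like gap.
hits = [(1_000, 1_200), (1_250, 1_400), (90_000, 90_300)]
kept = cutout_regions(hits, seq_len=100_000, flank=50, min_cut=1_000)
reduced_length = sum(end - start for start, end in kept)
print(kept, reduced_length)
```

In this toy case the DP would operate on a reduced sequence far shorter than the 100,000-base template, which is the kind of saving that matters for organisms with long introns.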
References have been omitted for brevity; you can find them and more details on
the implementation in the GenomeThreader
paper.
How to take advantage of these features and many more is described in depth in
the GenomeThreader manual.
Please consult the FAQ page for frequently asked
questions.
All mentioned files and scripts are also part of the GenomeThreader
distribution (see below).
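The combination of precomputed and newly computed spliced alignments described under "Incremental Updates" can be illustrated with a small Python sketch. Everything in it is an assumption made for illustration: the data model (an alignment as a tuple of inclusive exon intervals), the function names, and the coordinates. GenomeThreader's actual consensus computation is described in the paper and the manual. The sketch groups alignments into loci by genomic overlap and reports pairs of introns that overlap without being identical, a simple indicator of possible alternative splicing.

```python
from typing import Dict, List, Tuple

Exon = Tuple[int, int]        # inclusive genomic coordinates
Alignment = Tuple[Exon, ...]  # exons of one spliced alignment, in order


def introns(aln: Alignment) -> Tuple[Exon, ...]:
    """Introns implied by consecutive exons of one alignment."""
    return tuple((a[1] + 1, b[0] - 1) for a, b in zip(aln, aln[1:]))


def cluster_by_overlap(alns: List[Alignment]) -> List[List[Alignment]]:
    """Group alignments whose genomic spans overlap into one locus."""
    clusters: List[List[Alignment]] = []
    cluster_end = -1
    for aln in sorted(alns, key=lambda a: a[0][0]):
        if clusters and aln[0][0] <= cluster_end:
            clusters[-1].append(aln)
            cluster_end = max(cluster_end, aln[-1][1])
        else:
            clusters.append([aln])
            cluster_end = aln[-1][1]
    return clusters


def summarize(alns: List[Alignment]) -> List[Dict]:
    """Per locus: genomic span and pairs of overlapping, non-identical introns."""
    result = []
    for cluster in cluster_by_overlap(alns):
        span = (cluster[0][0][0], max(a[-1][1] for a in cluster))
        distinct = sorted({i for aln in cluster for i in introns(aln)})
        conflicts = [(a, b) for idx, a in enumerate(distinct)
                     for b in distinct[idx + 1:] if b[0] <= a[1]]
        result.append({"span": span, "conflicts": conflicts})
    return result


# Made-up alignments: an existing set and a later batch.
precomputed = [((100, 200), (300, 400)),
               ((150, 200), (300, 400), (500, 600))]
new = [((100, 200), (500, 600)),        # skips the middle exon
       ((5000, 5100), (5200, 5300))]    # a separate locus

before = summarize(precomputed)
after = summarize(precomputed + new)    # reuse old alignments, add new ones
print(before)
print(after)
```

The point of the sketch is that only the new sequences need to be aligned; the summary is then recomputed over the pooled alignments, and here the added batch reveals an exon-skipping candidate that the original set did not show.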
Availability
GenomeThreader is available free of charge.
You can download a copy.
Examples
-
Evaluation cases described in Gremme et
al. 2005 (see below)
-
A 16.6 kb rice gene structure that is tractable with GenomeThreader
(both with and without the intron cutout technique)
but beyond the limits of GeneSeqer.
-
A 125Kb intron-containing human
gene structure.
-
Small samples of gzip'ed
plain text and
XML
GenomeThreader output.
Users
The following sites use GenomeThreader. This list is not intended to be
comprehensive.
Citations
Here are the most important publications citing GenomeThreader (sorted by journal):
-
Wang et al.
The genome sequence of African rice (Oryza glaberrima) and
evidence for independent domestication,
Nature Genetics
46:982-988, 2014.
-
Argout et al.
The genome of Theobroma cacao,
Nature Genetics
43:101-108, 2011.
-
The Tomato Genome Consortium
The tomato genome sequence provides insights into fleshy fruit
evolution,
Nature
485:635-641, 2012.
-
The International Barley Genome Sequencing Consortium
A physical, genetic and functional sequence assembly of the barley
genome,
Nature
491:711-716, 2012.
-
J.M. Cock et al.
The Ectocarpus genome and the independent evolution of multicellularity in brown algae,
Nature
465:617-621, 2010.
-
The International Brachypodium Initiative
Genome sequencing and analysis of the model grass Brachypodium
distachyon,
Nature
463:763-768, 2010.
-
R. Wang et al.
PEP1 regulates perennial flowering in
Arabis alpina,
Nature
459:423-427, 2009.
-
A.H. Paterson et al.
The Sorghum bicolor genome and the diversification of
grasses,
Nature
457:551-556, 2009.
-
P. Abad et al.
Genome sequence of the metazoan plant-parasitic nematode
Meloidogyne incognita,
Nature Biotechnology
26:909-915, 2008.
-
Wang et al.
The Spirodela polyrhiza genome reveals insights into its
neotenous reduction fast growth and aquatic lifestyle,
Nature Communications
5:3311, 2014.
-
The International Wheat Genome Sequencing Consortium (IWGSC)
A chromosome-based draft sequence of the hexaploid bread wheat
(Triticum aestivum) genome,
Science
345(6194), 2014.
-
Pfeifer et al.
Genome interplay in the grain transcriptome of hexaploid bread
wheat,
Science
345(6194), 2014.
-
R. Bruggmann et al.
Uneven chromosome contraction and expansion in the maize
genome,
Genome Research
16:1241-1251, 2006.
-
Moreau et al.
Gene functionalities and genome structure in
Bathycoccus prasinos reflect cellular specializations at the
base of the green lineage,
Genome Biology
13(8):R74, 2012.
-
Duvick et al.
PlantGDB: a resource for comparative plant genomics,
Nucl. Acids Res.
36:D959-D965, 2008.
-
Nijkamp et al.
Exploring variation-aware contig graphs for (comparative)
metagenomics using MaryGold,
Bioinformatics
29(22):2826-2834, 2013.
-
Montalent et al.
EuGène-maize: a web site for maize gene prediction,
Bioinformatics
26(9):1254-1255, 2010.
-
Wang et al.
Identification and Dissection of Four Major QTL Affecting Milk Fat
Content in the German Holstein-Friesian Population,
PLOS ONE
7(7):e40711, 2012.
-
Petre et al.
RNA-Seq of Early-Infected Poplar Leaves by the Rust Pathogen
Melampsora larici-populina Uncovers PtSultr3;5, a
Fungal-Induced Host Sulfate Transporter,
PLOS ONE
7(8):e44408, 2012.
-
Grenville-Briggs et al.
A Molecular Insight into Algal-Oomycete Warfare: cDNA Analysis of
Ectocarpus siliculosus Infected with the Basal
Oomycete Eurychasma dicksonii,
PLOS ONE
6(9):e24500, 2011.
-
Di Filippo et al.
Euchromatic and heterochromatic compositional properties
emerging from the analysis of Solanum lycopersicum BAC
sequences,
Gene
499(1):176-181, 2012.
-
Pausch et al.
Genome-Wide Association Study Identifies Two Major Loci Affecting
Calving Ease and Growth-Related Traits in Cattle,
Genetics
187(1):289-297, 2011.
-
Martin et al.
A uniquely high number of ftsZ genes in the moss
Physcomitrella patens,
Plant Biology
11(5):744-750, 2009.
-
Richardt et al.
Microarray analysis of the moss Physcomitrella patens reveals
evolutionarily conserved transcriptional regulation of salt stress
and abscisic acid signalling,
Plant Molecular Biology
72(1):27-45, 2010.
-
De Palma et al.
Suppression Subtractive Hybridization analysis provides new insights into the tomato (Solanum lycopersicum L.) response to the plant probiotic microorganism Trichoderma longibrachiatum MK1,
Journal of Plant Physiology
190:79-94, 2016.
-
van der Burgt et al.
Pseudogenization in pathogenic fungi with different host plants and lifestyles might reflect their evolutionary past,
Molecular Plant Pathology
15(2):133-144, 2014.
-
M. Calviño, R. Bruggmann and J. Messing
Screen of genes linked to high-sugar content in stems by
comparative genomics,
Rice
1(2):166-176, 2008.
-
Lin et al.
Structural and Functional Divergence of a 1-Mb Duplicated Region in
the Soybean (Glycine max) Genome and Comparison to an
Orthologous Region from Phaseolus vulgaris,
The Plant Cell
22(8):2545-2561, 2010.
-
Lelandais-Briere et al.
Genome-Wide Medicago truncatula Small RNA Analysis Revealed
Novel MicroRNAs and Isoforms Differentially Regulated in Roots and
Nodules,
The Plant Cell
21(9):2780-2796, 2009.
-
Van de Velde et al.
Inference of Transcriptional Networks in Arabidopsis through
Conserved Noncoding Sequence Analysis,
The Plant Cell
26(7):2729-2745, 2014.
-
Schallau et al.
Identification and genetic analysis of the APOSPORY locus
in Hypericum perforatum L,
The Plant Journal
62(5):773-784, 2010.
-
Tang et al.
Unleashing the Genome of Brassica Rapa,
Front Plant Sci.
3:172, 2012.
-
Castagnone-Sereno et al.
Data-mining of the Meloidogyne incognita degradome and
comparative analysis of proteases in nematodes,
Genomics
97(1):29-36, 2011.
-
Pausch et al.
Homozygous haplotype deficiency reveals deleterious mutations
compromising reproductive and rearing success in cattle,
BMC Genomics
16:312, 2015.
-
Jung et al.
A nonsense mutation in PLD4 is associated with a zinc
deficiency-like syndrome in Fleckvieh cattle,
BMC Genomics
15:632, 2014.
-
Ercolano et al.
Patchwork sequencing of tomato San Marzano and Vesuviano
varieties highlights genome-wide variations,
BMC Genomics
15:138, 2014.
-
Venhoranta et al.
In frame exon skipping in UBE3B is associated with developmental
disorders and increased mortality in cattle,
BMC Genomics
15:1, 2014.
-
Zimmer et al.
Reannotation and extended community resources for the genome of the
non-seed plant Physcomitrella patens provide insights into the
evolution of plant gene structures and functions,
BMC Genomics
14:498, 2013.
-
Jansen et al.
Assessment of the genomic variation in a cattle population by
re-sequencing of key animals at low to medium coverage,
BMC Genomics
14:446, 2013.
-
Schiffer et al.
The genome of Romanomermis culicivorax: revealing fundamental
changes in the core developmental genetic toolkit in Nematoda,
BMC Genomics
14:923, 2013.
-
Duo et al.
Mitochondrial genome evolution in species belonging to the
Phialocephala fortinii s.l. - Acephala applanata
species complex,
BMC Genomics
13:166, 2012.
-
Steuernagel et al.
De novo 454 sequencing of barcoded BAC pools for comprehensive gene
survey and genome analysis in the complex genome of barley,
BMC Genomics
10:547, 2009.
-
Mondego et al.
A genome survey of Moniliophthora perniciosa gives new
insights into Witches' Broom Disease of cacao,
BMC Genomics
9:548, 2008.
-
A. Ballvora et al.
Comparative sequence analysis of Solanum and
Arabidopsis in
a hot spot for pathogen resistance on potato chromosome V reveals
a patchwork of conserved and rapidly evolving genome segments,
BMC Genomics
8:112, 2007.
-
Iorizzo et al.
A DArT marker-based linkage map for wild potato
Solanum bulbocastanum facilitates structural comparisons
between Solanum A and B genomes,
BMC Genetics
15:123, 2014.
-
Licciardello et al.
Characterization of the glutathione S-transferase gene family through
ESTs and expression analyses within common and pigmented cultivars
of Citrus sinensis (L.) Osbeck,
BMC Plant Biology
14:39, 2014.
-
Sinha et al.
Identification and characterization of NAGNAG alternative
splicing in the moss Physcomitrella patens,
BMC Plant Biology
10:76, 2010.
-
Bazzini et al.
miSolRNA: A tomato micro RNA relational database,
BMC Plant Biology
10:240, 2010.
-
D'Agostino et al.
SolEST database: a "one-stop shop" approach to the study
of Solanaceae transcriptomes,
BMC Plant Biology
9:142, 2009.
-
M.E. Sparks and V. Brendel
MetWAMer: eukaryotic translation initiation site prediction,
BMC Bioinformatics
9:381, 2008.
-
Chiusano et al.
ISOL@: an Italian SOLAnaceae genomics resource,
BMC Bioinformatics
9(2):57, 2008.
-
Q. Dong, M.D. Wilkerson and V. Brendel
Tracembler - software for in-silico chromosome walking in
unassembled genomes,
BMC Bioinformatics
8:151, 2007.
-
Flisikowski et al.
Variation in neighbouring genes of the dopaminergic and serotonergic
systems affects feather pecking behaviour of laying hens,
Animal Genetics
40(2):192-199, 2009.
-
Juling et al.
Characterization of a 320-kb region containing the HEXA gene on
bovine chromosome 10 and analysis of its association with BSE
susceptibility,
Animal Genetics
39(4):400-406, 2008.
-
Foissac et al.
Genome Annotation in Plants and Fungi: EuGène as a Model
Platform,
Current Bioinformatics
3(2), 2008.
-
Sen et al.
MaizeGDB becomes 'sequence-centric',
Database: The Journal of Biological Databases and Curation
2009:bap020, 2009.
-
Nijkamp et al.
De novo sequencing, assembly and analysis of the genome of the
laboratory strain Saccharomyces cerevisiae CEN.PK113-7D,
a model for modern industrial biotechnology,
Microbial Cell Factories
11:36, 2012.
-
Asp et al.
Comparative sequence analysis of VRN1 alleles of
Lolium perenne
with the co-linear regions in barley, wheat, and rice,
Molecular Genetics and Genomics
286(5):433-447, 2011.
-
Cohen et al.
RAPPORT: running scientific high-performance computing applications
on the cloud,
Philos Trans A Math Phys Eng Sci.
371:20120073, 2013.
-
Traini et al.
Genome Microscale Heterogeneity among Wild Potatoes Revealed by
Diversity Arrays Technology Marker Sequences,
International Journal of Genomics
Article ID 257218, 2013.
If I missed a publication which cites GenomeThreader, please contact
me.
Developers
GenomeThreader is being actively developed by the following individuals:
Publications
Please cite the following article in publications about research using
GenomeThreader:
For in-depth information about GenomeThreader please refer to the
following dissertation: