Transposon identification using profile HMMs.

Publication Type:

Journal Article


BMC Genomics, Volume 11 Suppl 1, p.S10 (2010)


2010, Algorithms, Animals, Center-Authored Paper, DNA Transposable Elements, Gene Expression Profiling, Humans, Markov Chains, Mice, Models, Genetic, Molecular Sequence Data, Public Health Sciences Division, Sequence Alignment, Sequence Analysis, DNA, Vaccine and Infectious Disease Division


Transposons are "jumping genes" that account for a large share of the repetitive content in genomes. They are known to affect transcriptional regulation in several different ways, and are implicated in many human diseases. Transposons are related to microRNAs and viruses, and many genes, pseudogenes, and gene promoters are derived from transposons or have origins in transposon-induced duplication. Modeling transposon-derived genomic content is difficult because transposon sequences are poorly conserved. Profile hidden Markov models (profile HMMs), widely used for protein sequence family modeling, are rarely used for modeling DNA sequence families. The algorithm commonly used to estimate the parameters of profile HMMs, Baum-Welch, is prone to converging prematurely to local optima. The DNA domain is especially problematic for the Baum-Welch algorithm, since the DNA alphabet has only four letters, as opposed to the twenty residues of the amino acid alphabet.
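The sensitivity of Baum-Welch to its starting point can be seen in a minimal sketch. The code below is not the method of this paper: it trains a small, fully connected HMM (not a profile HMM) over the four-letter DNA alphabet with plain Baum-Welch from several random initializations. The toy sequence, the number of states, and all function names are assumptions made for illustration only.

```python
import math
import random

ALPHABET = "ACGT"
# Hypothetical toy sequence, not data from the paper.
SEQ = "ACGTTTTACGATATATATGCGCGCGCAAAATTTTACGTACGTGGGGCCCCATATGCGC"
OBS = [ALPHABET.index(c) for c in SEQ]

def random_stochastic(rows, cols, rng):
    """Return a rows x cols matrix whose rows each sum to 1."""
    out = []
    for _ in range(rows):
        raw = [rng.random() + 0.1 for _ in range(cols)]
        total = sum(raw)
        out.append([x / total for x in raw])
    return out

def forward_backward(obs, start, trans, emit):
    """Scaled forward-backward pass; returns (alpha, beta, scales)."""
    n, k = len(obs), len(start)
    alpha = [[0.0] * k for _ in range(n)]
    beta = [[1.0] * k for _ in range(n)]
    scales = [0.0] * n
    for t in range(n):
        for j in range(k):
            if t == 0:
                prior = start[j]
            else:
                prior = sum(alpha[t - 1][i] * trans[i][j] for i in range(k))
            alpha[t][j] = prior * emit[j][obs[t]]
        scales[t] = sum(alpha[t])
        alpha[t] = [a / scales[t] for a in alpha[t]]
    for t in range(n - 2, -1, -1):
        for i in range(k):
            beta[t][i] = sum(
                trans[i][j] * emit[j][obs[t + 1]] * beta[t + 1][j]
                for j in range(k)
            ) / scales[t + 1]
    return alpha, beta, scales

def baum_welch(obs, k, seed, iterations=20):
    """Run plain Baum-Welch (EM) from a seed-dependent random start.

    Returns (log-likelihood before each update, transitions, emissions).
    """
    rng = random.Random(seed)
    n, m = len(obs), len(ALPHABET)
    start = random_stochastic(1, k, rng)[0]
    trans = random_stochastic(k, k, rng)
    emit = random_stochastic(k, m, rng)
    history = []
    for _ in range(iterations):
        alpha, beta, scales = forward_backward(obs, start, trans, emit)
        history.append(sum(math.log(c) for c in scales))
        # Posterior state occupancy at each position.
        gamma = [[alpha[t][i] * beta[t][i] for i in range(k)] for t in range(n)]
        # Expected transition counts.
        new_trans = [[0.0] * k for _ in range(k)]
        for t in range(n - 1):
            for i in range(k):
                for j in range(k):
                    new_trans[i][j] += (
                        alpha[t][i] * trans[i][j]
                        * emit[j][obs[t + 1]] * beta[t + 1][j] / scales[t + 1]
                    )
        # Expected emission counts.
        new_emit = [[0.0] * m for _ in range(k)]
        for t in range(n):
            for i in range(k):
                new_emit[i][obs[t]] += gamma[t][i]
        start = gamma[0][:]
        trans = [[x / sum(row) for x in row] for row in new_trans]
        emit = [[x / sum(row) for x in row] for row in new_emit]
    return history, trans, emit

# Compare where different random starts end up.
for seed in range(5):
    history, _, _ = baum_welch(OBS, k=3, seed=seed)
    print(f"seed {seed}: log-likelihood {history[0]:.3f} -> {history[-1]:.3f}")
```

Each run can only improve its own log-likelihood, so runs that begin in different regions of parameter space may settle on different final values. Restarting from many seeds and keeping the best run is one common, if crude, workaround.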