Transposable elements (TEs) help shape the structure and function of the human genome. When inserted into some locations, TEs may disrupt gene regulation and cause diseases. Here, we present xTea (x-Transposable element analyzer), a tool for identifying TE insertions in whole-genome sequencing data. Whereas existing methods are mostly designed for short-read data, xTea can be applied to both short-read and long-read data. Our analysis shows that xTea outperforms other short-read-based methods for both germline and somatic TE insertion discovery. With long-read data, we created a catalogue of polymorphic insertions with full assembly and annotation of insertional sequences for various types of retroelements, including pseudogenes and endogenous retroviruses. Notably, we find that individual genomes have an average of nine groups of full-length L1s in centromeres, suggesting that centromeres and other highly repetitive regions such as telomeres are a significant yet unexplored source of active L1s.
Transposable elements (TEs) comprise nearly half of the human genome 1, and their mobilization is a significant source of genomic variation and human diseases 2, 3. Although most TEs are genomic fossils that have lost their ability to mobilize, several types of TEs (L1s, Alus, and SVAs) can still mobilize via a copy-paste mechanism through RNA intermediates. Increasing evidence suggests that TEs contribute to human development and health, for example to placental development 4 and innate immunity 5. To date, more than a hundred TE insertions have been causally linked to Mendelian disorders and hereditary cancers, with TEs affecting gene regulation through diverse mechanisms including insertional mutagenesis, premature polyadenylation, and alteration of RNA expression and splicing 3, 6. With the availability of whole-genome sequencing (WGS) data, we have reported frequent somatic L1 insertions in some cancer types, especially in epithelial cancers, suggesting a role of TEs in tumorigenesis 7. Subsequent studies have elaborated the role of TEs, e.g., in cancer immunity 7, 8, 9, 10, 11, 12. A recent pan-cancer analysis of ~3000 cancer genomes identified not only numerous somatic L1 insertions, making L1 insertions the third most frequent type of somatic structural variation (SV), but also various types of L1-mediated SVs 10. In a landmark study, an SVA insertion causing exon-trapping was identified in a child with Batten disease, and this finding led to the development of a personalized antisense-oligonucleotide drug to correct the splicing defect 13. These studies highlight the importance of accurate TE detection for genomic medicine. Multiple tools have been developed to detect TE insertions from Illumina paired-end short reads 7, 12, 14, 15, 16, 17. These include MELT 14, which detects polymorphic inherited insertions, and TraFiC-mem 12, which detects somatic insertions from a case/control pair.
Most tools were designed to detect either germline TE insertions (inherited or de novo, and thus present in all cells in the body) or somatic TE insertions. One critical shortcoming of current TE analysis based on short-read data is its inability to detect TE insertions that accompany complex rearrangements or fall into highly repetitive regions, such as those within existing TE copies from the same TE subfamily or within centromeric/telomeric repeats 18, 19, 20. Recent advances in sequencing technologies, such as PacBio and Oxford Nanopore long reads, produce reads longer than 10–15 kbp. These reads allow us to reconstruct the entire sequences of inserted TEs and their flanking regions, enabling the discovery and characterization of these challenging types of TE insertions. To date, PALMER 21 is the only tool specifically designed for TE-insertion detection from long reads. Here, we propose a computational tool, xTea (x-Transposable element analyzer), that detects nonreference TE insertions (i.e., insertions that are not present in the reference genome) from WGS data. Rewritten from scratch for greater efficiency, xTea has five major improvements over the original (2012) version of Tea: (i) increased accuracy due to more refined filtering criteria; (ii) identification of transduction events, both canonical and orphan; (iii) detection of a wide range of retroelement insertions, including processed pseudogene and human endogenous retrovirus (HERV) insertions; (iv) detection of both germline and somatic insertions, including mosaic insertions from very high-coverage data; and (v) the ability to incorporate data from multiple sequencing technologies, including long-read platforms. We created a high-quality catalogue of haplotype-resolved nonreference TE insertions in an individual whose genome was extensively curated by multiple sequencing platforms. Using this annotated genome and manual inspection, we demonstrated the superior performance of xTea over existing methods for both germline and somatic insertions.
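A back-of-the-envelope model helps explain why detecting mosaic insertions calls for very high-coverage data. The Python sketch below illustrates the arithmetic under simplifying assumptions and does not represent xTea's detection logic. It assumes a heterozygous insertion carried by a fraction f of cells, so the inserted allele accounts for f/2 of the reads at the site. It then models the number of supporting reads as Poisson with mean C·f/2 at sequencing depth C. The helper names `expected_support` and `prob_at_least` and the `min_support` threshold are hypothetical.

```python
import math


def expected_support(coverage: float, cell_fraction: float) -> float:
    """Mean number of reads from the inserted allele at one site.

    Assumes (hypothetically) a heterozygous insertion present in
    `cell_fraction` of cells, so the allele fraction is cell_fraction / 2.
    """
    return coverage * cell_fraction / 2.0


def prob_at_least(k: int, lam: float) -> float:
    """P(X >= k) for X ~ Poisson(lam), via the complement of the CDF."""
    term = math.exp(-lam)  # P(X = 0)
    cdf = 0.0
    for i in range(k):
        cdf += term                # add P(X = i)
        term *= lam / (i + 1)      # turn P(X = i) into P(X = i + 1)
    return max(0.0, 1.0 - cdf)     # guard against tiny negative rounding error


if __name__ == "__main__":
    min_support = 2  # hypothetical minimum number of supporting reads
    for coverage in (30, 100, 300):
        lam = expected_support(coverage, cell_fraction=0.05)
        p = prob_at_least(min_support, lam)
        print(f"{coverage:>4}x  mean support {lam:5.2f}  P(>= {min_support} reads) = {p:.3f}")
```

Under this model, an insertion present in 5% of cells yields a mean of only 0.75 supporting reads at 30× but 7.5 at 300×. A caller requiring two supporting reads would therefore usually miss such an insertion at standard depth and usually find it at very high depth. Real data lose further signal: only reads near the insertion breakpoint are informative, and reads in repeats may map ambiguously. These figures should therefore be read as optimistic.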