Identifies sequences

4/10/2023

Pseudogenes are gene copies which are thought to be defective due to frame-disrupting mutations or transcriptional silencing. Most human pseudogenes (72%) are derived from retrotransposition of processed mRNAs, mediated by proteins encoded by the LINE-1 retrotransposon. Due to the loss of parental cis-regulatory elements, processed pseudogenes were initially presumed to be transcriptionally silent and were excluded from genome-wide functional screens and most transcriptome analyses.

Transcriptomic surveys of cancer and normal human tissues by high-throughput short-read sequencing suggest that pseudogene transcription may be widespread. However, studies of pseudogene transcription are hindered by the limited capacity of short-read sequencing, and microarray hybridisation, to discriminate pseudogenes from their highly similar parent genes. Most full-length pseudogene transcripts found to date were identified by relatively low-throughput capillary sequencing of full-length cDNA libraries. As a result, the extent of the human pseudogene transcriptome in most spatiotemporal contexts remains largely unresolved. Pseudogene transcripts can control the expression of their parent genes by acting as competitive endogenous RNAs (ceRNAs), antisense transcripts, precursors for small interfering RNAs (siRNAs), and piwi-interacting RNAs (piRNAs). Whilst most pseudogenes are presumed to act by noncoding mechanisms, some retain the capacity to encode full-length or truncated proteins. Long-read cDNA sequencing via Pacific Biosciences Isoform Sequencing (PacBio Iso-Seq) or Oxford Nanopore Technologies is a potentially powerful approach to identify full-length pseudogene transcripts and accurately differentiate pseudogenes and their parent mRNAs. PacBio Iso-Seq is particularly suitable for this application due to the high consensus accuracy enabled by circular consensus reads.

To comprehensively survey the human processed pseudogene transcriptome, we sequenced high-quality RNA from 20 normal mixed adult and foetal human tissues (Qiagen XpressRef Universal Total RNA) on a Sequel II platform (Fig.). To further broaden the biological scope of our analysis, we integrated our data with a deep PacBio in-house Sequel II dataset of 6,775,127 full-length reads from a mixture of 10 human cell lines. We aligned the reads to the human reference genome (hg38) at high stringency (q60) and compared the identified transcript isoforms to Gencode annotations using SQANTI2, a bioinformatics QC tool designed to annotate full-length transcript (Iso-Seq) data with respect to a reference transcriptome.

Figure: Long-read cDNA sequencing elucidates the human pseudogene transcriptome. (a) Full-length consensus PacBio cDNA reads from normal tissues and cell lines were compared to Gencode annotations to generate a pseudogene transcriptome. (b) Most transcribed pseudogenes identified here were absent from Gencode.
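The high-stringency alignment step mentioned in the text (reads aligned to hg38 and kept at q60) can be illustrated with a small filter over SAM records. This is a minimal sketch and not the authors' pipeline: it assumes that "q60" means a minimum mapping quality (MAPQ) of 60, and it additionally drops unmapped, secondary and supplementary records, which is a design choice of this sketch and is not stated in the text. The function name `filter_sam_lines` and the toy records in `TOY_SAM` are hypothetical and exist only for illustration.

```python
from typing import Iterable, Iterator

# Assumption: "q60" in the text is read as a minimum MAPQ of 60.
MIN_MAPQ = 60

# Bit flags defined by the SAM format specification.
UNMAPPED = 0x4
SECONDARY = 0x100
SUPPLEMENTARY = 0x800


def filter_sam_lines(lines: Iterable[str], min_mapq: int = MIN_MAPQ) -> Iterator[str]:
    """Yield header lines and primary alignments with MAPQ >= min_mapq."""
    for line in lines:
        if line.startswith("@"):
            # Header lines pass through unchanged.
            yield line
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11:
            # Skip malformed records (SAM has 11 mandatory columns).
            continue
        flag = int(fields[1])   # column 2: bitwise FLAG
        mapq = int(fields[4])   # column 5: mapping quality
        if flag & (UNMAPPED | SECONDARY | SUPPLEMENTARY):
            continue
        if mapq >= min_mapq:
            yield line


# Hypothetical toy records: one confident primary hit, one ambiguous hit,
# one secondary alignment and one unmapped read.
TOY_SAM = [
    "@SQ\tSN:chr1\tLN:248956422",
    "read_a\t0\tchr1\t1000\t60\t8M\t*\t0\t0\tACGTACGT\t*",
    "read_b\t0\tchr1\t2000\t12\t8M\t*\t0\t0\tACGTACGT\t*",
    "read_c\t256\tchr1\t3000\t60\t8M\t*\t0\t0\tACGTACGT\t*",
    "read_d\t4\t*\t0\t0\t*\t*\t0\t0\tACGTACGT\t*",
]

if __name__ == "__main__":
    for record in filter_sam_lines(TOY_SAM):
        print(record)
```

A strict MAPQ threshold matters here because a read that matches a pseudogene and its parent gene about equally well receives a low mapping quality. Discarding such reads keeps only those whose full-length sequence places them unambiguously at one locus.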
