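Integrative analysis of 26848 human transcriptomes reveals cancer-defining transcriptional architecture at the isoform resolution
Korean title (translated): Elucidating cancer-specific transcriptome architecture at the mRNA isoform level through integrative transcriptome analysis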
Most human genes can produce multiple transcript isoforms through the use of alternative promoters and splice sites, which greatly expands phenotypic complexity. Recent analyses of thousands of tumor RNA sequencing datasets revealed that aberrant splicing and alternative promoter activation are widespread in cancer, suggesting a pivotal role for isoform dysregulation in carcinogenesis. Here, by integrating RNA sequencing data from the TCGA, PCAWG, and GTEx projects, we reconstruct the sequences of individual transcripts expressed in more than 26,000 cancer and normal samples and provide a comprehensive landscape of alternative isoform usage across 39 tissues and 38 cancer types. Our isoform-centric analysis reveals that (i) global transcriptome architecture is highly distinct across tissues, between cancer and normal samples of the same tissue origin, and between molecular subtypes of cancer, even at sub-gene resolution; (ii) alternative promoter usage and splicing together account for ~40% of the transcriptome heterogeneity across tissues and ~50-60% within tissues and cancer types; (iii) cancer often increases the usage of aberrant tumor suppressor gene isoforms that have lost protein-coding potential; and (iv) aberrant splicing events occasionally co-occur in a single RNA molecule in a way that offsets each other's frame-shifting effects and rescues the function of important oncogenes in tumors. Our findings suggest that cancer often dysregulates the usage of alternative isoforms to perturb key oncogenic pathways, highlighting the importance of transcriptome profiling at transcript resolution for improving individualized diagnostic and therapeutic strategies for cancer patients.
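
The frame-compensation idea in the last finding comes down to simple arithmetic: a splicing event that changes the transcript length by a number of nucleotides not divisible by 3 shifts the reading frame, and a second event on the same molecule can restore the frame if the net length change is a multiple of 3. The following is a minimal illustrative sketch of that check, not the analysis pipeline used in this work. The function names and the event sizes are hypothetical, and the check deliberately ignores whether a premature stop codon arises in the shifted segment between the two events.

```python
def frame_shift(length_change: int) -> int:
    """Reading-frame offset (0, 1, or 2) caused by a net length change in nucleotides."""
    # Python's % returns a non-negative result for a positive modulus,
    # so deletions (negative changes) map to 0, 1, or 2 as well.
    return length_change % 3


def is_frame_preserved(length_changes: list[int]) -> bool:
    """True if the combined events leave the downstream reading frame unchanged."""
    return sum(length_changes) % 3 == 0


# Hypothetical events on one RNA molecule (sizes chosen for illustration only):
# an aberrant exon skip removing 61 nt, and a cryptic inclusion adding 85 nt.
exon_skip = -61
cryptic_inclusion = 85

# Each event alone shifts the frame; together the net change is +24 nt,
# which is divisible by 3, so the downstream frame is restored.
alone = [frame_shift(exon_skip), frame_shift(cryptic_inclusion)]
together = is_frame_preserved([exon_skip, cryptic_inclusion])
```

Under these assumed sizes, each event on its own is frame-shifting, while the pair is frame-preserving. Detecting such co-occurrence requires knowing that both events lie on the same molecule, which is why reconstruction of individual transcript sequences matters here.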