Ensemble Modeling Approach Targeting Heterogeneous RNA-Seq data: Application to Melanoma Pseudogenes

Enrico Capobianco, Camilo Valdes, Samanta Sarti, Zhijie Jiang, Laura Poliseno, Nicolas F. Tsinoremas

Research output: Contribution to journalArticlepeer-review

2 Scopus citations


We studied the transcriptome landscape of skin cutaneous melanoma (SKCM) using 103 primary tumor samples from TCGA, and measured the expression levels of both protein coding genes and non-coding RNAs (ncRNAs). In particular, we emphasized pseudogenes potentially relevant to this cancer. While cataloguing the profiles based on the known biotypes, all the employed RNA-Seq methods generated just a small consensus of significant biotypes. We thus designed an approach to reconcile the profiles from all methods following a simple strategy: we selected genes that were confirmed as differentially expressed by the ensemble predictions obtained in a regression model. The main advantages of this approach are: 1) Selection of a high-confidence gene set identifying relevant pathways; 2) Use of a regression model whose covariates embed all method-driven outcomes to predict an averaged profile; 3) Method-specific assessment of prediction power and significance. Furthermore, the approach can be generalized to any biological system for which noisy RNA-Seq profiles are computed. As our analyses concerned bio-annotations of both high-quality protein coding genes and ncRNAs, we considered the associations between pseudogenes and parental genes (targets). Among the candidate targets that were validated, we identified PINK1, which is studied in patients with Parkinson and cancer (especially melanoma).

Original languageEnglish (US)
Article number17344
JournalScientific reports
Issue number1
StatePublished - Dec 1 2017

ASJC Scopus subject areas

  • General


Dive into the research topics of 'Ensemble Modeling Approach Targeting Heterogeneous RNA-Seq data: Application to Melanoma Pseudogenes'. Together they form a unique fingerprint.

Cite this