Viral expression associated with gastrointestinal adenocarcinomas in TCGA high-throughput sequencing data

Daria Salyakina, Nicholas F. Tsinoremas

Research output: Contribution to journalArticlepeer-review

42 Scopus citations


Background: Up to 20% of cancers worldwide are thought to be associated with microbial pathogens, including bacteria and viruses. The widely used methods of viral infection detection are usually limited to a few a priori suspected viruses in one cancer type. To our knowledge, there have not been many broad screening approaches to address this problem more comprehensively. Methods: In this study, we performed a comprehensive screening for viruses in nine common cancers using a multistep computational approach. Tumor transcriptome and genome sequencing data were available from The Cancer Genome Atlas (TCGA). Nine hundred fifty eight primary tumors in nine common cancers with poor prognosis were screened against a non-redundant database of virus sequences. DNA sequences from normal matched tissue specimens were used as controls to test whether each virus is associated with tumors. Results: We identified human papilloma virus type 18 (HPV-18) and four human herpes viruses (HHV) types 4, 5, 6B, and 8, also known as EBV, CMV, roseola virus, and KSHV, in colon, rectal, and stomach adenocarcinomas. In total, 59% of screened gastrointestinal adenocarcinomas (GIA) were positive for at least one virus: 26% for EBV, 21% for CMV, 7% for HHV-6B, and 20% for HPV-18. Over 20% of tumors were co-infected with multiple viruses. Two viruses (EBV and CMV) were statistically significantly associated with colorectal cancers when compared to the matched healthy tissues from the same individuals (p = 0.02 and 0.03, respectively). HPV-18 was not detected in DNA, and thus, no association testing was possible. Nevertheless, HPV-18 expression patterns suggest viral integration in the host genome, consistent with the potentially oncogenic nature of HPV-18 in colorectal adenocarcinomas. The estimated counts of viral copies were below one per cell for all identified viruses and approached the detection limit. Conclusions: Our comprehensive screening for viruses in multiple cancer types using next-generation sequencing data clearly demonstrates the presence of viral sequences in GIA. EBV, CMV, and HPV-18 are potentially causal for GIA, although their oncogenic role is yet to be established.

Original languageEnglish (US)
Article number23
JournalHuman genomics
Issue number1
StatePublished - 2013


  • Cancer
  • Herpes virus
  • Papilloma virus

ASJC Scopus subject areas

  • Molecular Biology
  • Molecular Medicine
  • Drug Discovery
  • Genetics


Dive into the research topics of 'Viral expression associated with gastrointestinal adenocarcinomas in TCGA high-throughput sequencing data'. Together they form a unique fingerprint.

Cite this