Title: Integrative Annotation of Variants from 1092 Humans: Application to Cancer Genomics
Authors: Khurana, E
Fu, Y
Colonna, V
Mu, XJ
Kang, HM
Lappalainen, T
Sboner, A
Lochovsky, L
Chen, J
Harmanci, A
Das, J
Abyzov, A
Balasubramanian, S
Beal, K
Chakravarty, D
Challis, D
Chen, Y
Clarke, D
Clarke, L
Cunningham, F
Evani, US
Flicek, P
Fragoza, R
Garrison, E
Gibbs, R
Guemues, ZH
Herrero, J
Kitabayashi, N
Kong, Y
Lage, K
Liluashvili, V
Lipkin, SM
MacArthur, DG
Marth, G
Muzny, D
Pers, TH
Ritchie, GRS
Rosenfeld, JA
Sisu, C
Wei, X
Wilson, M
Xue, Y
Yu, F
Dermitzakis, ET
Yu, H
Rubin, MA
Tyler-Smith, C
Gerstein, M
Issue Date: 2013
Citation: SCIENCE, 2013, 342 (6154), pp. 84 - + (10)
Abstract: Introduction: Plummeting sequencing costs have led to a great increase in the number of personal genomes. Interpreting the large number of variants in them, particularly in noncoding regions, is a current challenge. This is especially the case for somatic variants in cancer genomes, a large proportion of which are noncoding.
Methods: We investigated patterns of selection in DNA elements from the ENCODE project using the full spectrum of variants from 1092 individuals in the 1000 Genomes Project (Phase 1), including single-nucleotide variants (SNVs), short insertions and deletions (indels), and structural variants (SVs). Although we analyzed broad functional annotations, such as all transcription-factor binding sites, we focused more on highly specific categories such as distal binding sites of factor ZNF274. The greater statistical power of the Phase 1 data set compared with earlier ones allowed us to differentiate the selective constraints on these categories. We also used connectivity information between elements from protein-protein-interaction and regulatory networks. We integrated all the information on selection to develop a workflow (FunSeq) to prioritize personal-genome variants on the basis of their deleterious impact. As a proof of principle, we experimentally validated and characterized a few candidate variants.
Results: We identified a specific subgroup of noncoding categories with almost as much selective constraint as coding genes: “ultrasensitive” regions. We also uncovered a number of clear patterns of selection. Elements more consistently active across tissues and both maternal and paternal alleles (in terms of allele-specific activity) are under stronger selection. Variants disruptive because of mechanistic effects on transcription-factor binding (i.e., “motif-breakers”) are selected against. Higher network connectivity (i.e., for hubs) is associated with higher constraint. Additionally, many hub promoters and regulatory elements show evidence of recent positive selection. Overall, indels and SVs follow the same pattern as SNVs; however, there are notable exceptions. For instance, enhancers are enriched for SVs formed by nonallelic homologous recombination. We integrated these patterns of selection into the FunSeq prioritization workflow and applied it to cancer variants, because they present a strong contrast to inherited polymorphisms. In particular, application to ~90 cancer genomes (breast, prostate and medulloblastoma) reveals nearly a hundred candidate noncoding drivers.
Discussion: Our approach can be readily used to prioritize variants in cancer and is immediately applicable in a precision-medicine context. It can be further improved by incorporation of larger-scale population sequencing, better annotations, and expression data from large cohorts.
ARTN 1235587
Appears in Collections: Dept of Clinical Sciences Research Papers

Files in This Item:
File          Size      Format
Fulltext.pdf  1.14 MB   Adobe PDF

Items in BURA are protected by copyright, with all rights reserved, unless otherwise indicated.