Integrative Annotation of Variants from 1092 Humans: Application to Cancer Genomics

Khurana, E; Fu, Y; Colonna, V; Mu, XJ; Kang, HM; Lappalainen, T; Sboner, A; Lochovsky, L; Chen, J; Harmanci, A; Das, J; Abyzov, A; Balasubramanian, S; Beal, K; Chakravarty, D; Challis, D; Chen, Y; Clarke, D; Clarke, L; Cunningham, F; Evani, US; Flicek, P; Fragoza, R; Garrison, E; Gibbs, R; Guemues, ZH; Herrero, J; Kitabayashi, N; Kong, Y; Lage, K; Liluashvili, V; Lipkin, SM; MacArthur, DG; Marth, G; Muzny, D; Pers, TH; Ritchie, GRS; Rosenfeld, JA; Sisu, C; Wei, X; Wilson, M; Xue, Y; Yu, F; Dermitzakis, ET; Yu, H; Rubin, MA; Tyler-Smith, C; Gerstein, M

Please use this identifier to cite or link to this item: http://bura.brunel.ac.uk/handle/2438/14923

Title:	Integrative Annotation of Variants from 1092 Humans: Application to Cancer Genomics
Authors:	Khurana, E; Fu, Y; Colonna, V; Mu, XJ; Kang, HM; Lappalainen, T; Sboner, A; Lochovsky, L; Chen, J; Harmanci, A; Das, J; Abyzov, A; Balasubramanian, S; Beal, K; Chakravarty, D; Challis, D; Chen, Y; Clarke, D; Clarke, L; Cunningham, F; Evani, US; Flicek, P; Fragoza, R; Garrison, E; Gibbs, R; Guemues, ZH; Herrero, J; Kitabayashi, N; Kong, Y; Lage, K; Liluashvili, V; Lipkin, SM; MacArthur, DG; Marth, G; Muzny, D; Pers, TH; Ritchie, GRS; Rosenfeld, JA; Sisu, C; Wei, X; Wilson, M; Xue, Y; Yu, F; Dermitzakis, ET; Yu, H; Rubin, MA; Tyler-Smith, C; Gerstein, M
Keywords:	Science & Technology;Multidisciplinary Sciences;Science & Technology - Other Topics;MULTIDISCIPLINARY SCIENCES;TERT PROMOTER MUTATIONS;NATURAL-SELECTION;PROSTATE-CANCER;POSITIVE SELECTION;GENETIC-VARIATION;HUMAN-EVOLUTION;COPY NUMBER;ELEMENTS;NETWORK;DISEASE
Issue Date:	2013
Publisher:	AMER ASSOC ADVANCEMENT SCIENCE
Citation:	SCIENCE, 2013, 342 (6154), pp. 84 - + (10)
Abstract:	Introduction: Plummeting sequencing costs have led to a great increase in the number of personal genomes. Interpreting the large number of variants in them, particularly in noncoding regions, is a current challenge. This is especially the case for somatic variants in cancer genomes, a large proportion of which are noncoding. Methods: We investigated patterns of selection in DNA elements from the ENCODE project using the full spectrum of variants from 1092 individuals in the 1000 Genomes Project (Phase 1), including single-nucleotide variants (SNVs), short insertions and deletions (indels), and structural variants (SVs). Although we analyzed broad functional annotations, such as all transcription-factor binding sites, we focused more on highly specific categories such as distal binding sites of factor ZNF274. The greater statistical power of the Phase 1 data set compared with earlier ones allowed us to differentiate the selective constraints on these categories. We also used connectivity information between elements from protein-protein-interaction and regulatory networks. We integrated all the information on selection to develop a workflow (FunSeq) to prioritize personal-genome variants on the basis of their deleterious impact. As a proof of principle, we experimentally validated and characterized a few candidate variants. Results: We identified a specific subgroup of noncoding categories with almost as much selective constraint as coding genes: “ultrasensitive” regions. We also uncovered a number of clear patterns of selection. Elements more consistently active across tissues and both maternal and paternal alleles (in terms of allele-specific activity) are under stronger selection. Variants disruptive because of mechanistic effects on transcription-factor binding (i.e., “motif-breakers”) are selected against. Higher network connectivity (i.e., for hubs) is associated with higher constraint. 
Additionally, many hub promoters and regulatory elements show evidence of recent positive selection. Overall, indels and SVs follow the same pattern as SNVs; however, there are notable exceptions. For instance, enhancers are enriched for SVs formed by nonallelic homologous recombination. We integrated these patterns of selection into the FunSeq prioritization workflow and applied it to cancer variants, because they present a strong contrast to inherited polymorphisms. In particular, application to ~90 cancer genomes (breast, prostate and medulloblastoma) reveals nearly a hundred candidate noncoding drivers. Discussion: Our approach can be readily used to prioritize variants in cancer and is immediately applicable in a precision-medicine context. It can be further improved by incorporation of larger-scale population sequencing, better annotations, and expression data from large cohorts.
URI:	http://bura.brunel.ac.uk/handle/2438/14923
DOI:	http://dx.doi.org/10.1126/science.1235587
ISSN:	http://gateway.webofknowledge.com/gateway/Gateway.cgi?GWVersion=2&SrcApp=PARTNER_APP&SrcAuth=LinksAMR&KeyUT=WOS:000325126100049&DestLinkType=FullRecord&DestApp=ALL_WOS&UsrCustomerID=f12c8c83318cf2733e615e54d9ed7ad5 ARTN 1235587 0036-8075 1095-9203
Appears in Collections:	Department of Health Sciences Research Papers

Files in This Item:

File	Description	Size	Format
Fulltext.pdf		1.14 MB	Adobe PDF
