A comparative study of RNA-seq analysis strategies

Jänes, J; Hu, F; Lewin, A; Turro, E

Please use this identifier to cite or link to this item: http://bura.brunel.ac.uk/handle/2438/12710

Full metadata record

DC Field	Value	Language
dc.contributor.author	Jänes, J	-
dc.contributor.author	Hu, F	-
dc.contributor.author	Lewin, A	-
dc.contributor.author	Turro, E	-
dc.date.accessioned	2016-06-02T15:46:13Z	-
dc.date.available	2015-02-06	-
dc.date.available	2016-06-02T15:46:13Z	-
dc.date.issued	2015	-
dc.identifier.citation	Briefings in Bioinformatics, 16(6): pp. 932 - 940, (2015)	en_US
dc.identifier.issn	1467-5463	-
dc.identifier.issn	1477-4054	-
dc.identifier.uri	http://bib.oxfordjournals.org/content/16/6/932	-
dc.identifier.uri	http://bura.brunel.ac.uk/handle/2438/12710	-
dc.description.abstract	Three principal approaches have been proposed for inferring the set of transcripts expressed in RNA samples using RNA-seq. The simplest approach uses curated annotations, which assumes the transcripts in a sample are a subset of the transcripts listed in a curated database. A more ambitious method involves aligning reads to a reference genome and using the alignments to infer the transcript structures, possibly with the aid of a curated transcript database. The most challenging approach is to assemble reads into putative transcripts de novo without the aid of reference data. We have systematically assessed the properties of these three approaches through a simulation study. We have found that the sensitivity of computational transcript set estimation is severely limited. Computational approaches (both genome-guided and de novo assembly) produce a large number of artefacts, which are assigned large expression estimates and absorb a substantial proportion of the signal when performing expression analysis. The approach using curated annotations shows good expression correlation even when the annotations are incomplete. Furthermore, any incorrect transcripts present in a curated set do not absorb much signal, so it is preferable to have a curation set with high sensitivity than high precision. Software to simulate transcript sets, expression values and sequence reads under a wider range of parameter values and to compare sensitivity, precision and signal-to-noise ratios of different methods is freely available online (https://github.com/boboppie/RSSS) and can be expanded by interested parties to include methods other than the exemplars presented in this article.	en_US
dc.description.sponsorship	This work was supported by the Wellcome Trust (WT097679); the Cambridge Biomedical Research Centre; Cancer Research UK (C14303/A10825) and the Medical Research Council (G1002319).	en_US
dc.format.extent	932 - 940	-
dc.language.iso	en	en_US
dc.publisher	Oxford University Press	en_US
dc.subject	RNA-seq	en_US
dc.subject	Transcriptome assembly	en_US
dc.subject	Gene expression	en_US
dc.subject	RNA splicing	en_US
dc.title	A comparative study of RNA-seq analysis strategies	en_US
dc.type	Article	en_US
dc.identifier.doi	http://dx.doi.org/10.1093/bib/bbv007	-
dc.relation.isPartOf	Briefings in Bioinformatics	-
pubs.issue	6	-
pubs.publication-status	Published	-
pubs.volume	16	-
Appears in Collections:	Dept of Mathematics Research Papers

Files in This Item:

File	Description	Size	Format
Fulltext.pdf		570.4 kB	Adobe PDF
