Scotty, We Need More Power! Power, Sample Size, and Coverage Estimation for RNA-Seq

January 28, 2013

(This article was originally published at Getting Genetics Done, and syndicated at StatsBlogs.)

Two of the most common questions at the beginning of an RNA-seq experiment are "how many reads do I need?" and "how many replicates do I need?" This paper describes a web application for designing RNA-seq experiments that calculates an appropriate sample size and read depth to satisfy user-defined criteria such as cost or the maximum number of reads or replicates attainable. The power and sample size estimates are based on a t-test, which the authors claim performs no worse than the negative binomial models implemented by popular RNA-seq methods such as DESeq when three or more replicates are present. Empirical distributions are taken from either (1) pilot data that the user uploads or (2) built-in, publicly available data. The authors find substantial heterogeneity between experiments (technical variation is larger than biological variation in many cases), and that power and sample size estimation will be more accurate when users provide their own pilot data.
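To get a feel for what a t-test-based power calculation involves, here is a minimal sketch in Python. It is not Scotty's code or the authors' method: it uses the textbook normal approximation to a two-sided, two-sample t-test with equal group sizes. The function names and the idea of expressing the effect as a standardized difference (for example, the difference in mean log2 expression divided by the within-group standard deviation) are my own assumptions for illustration.

```python
import math
from statistics import NormalDist

# Standard normal distribution, used in place of the t distribution.
# Assumption: this approximation is optimistic for very small groups
# (e.g. 3 replicates), where the t distribution has heavier tails.
_STD_NORMAL = NormalDist()


def approx_power(effect_size, n_per_group, alpha=0.05):
    """Approximate power of a two-sided, two-sample test.

    effect_size is a standardized difference (hypothetical input):
    difference in group means divided by the within-group SD.
    """
    z_crit = _STD_NORMAL.inv_cdf(1 - alpha / 2)
    # Expected value of the test statistic under the alternative.
    shift = abs(effect_size) * math.sqrt(n_per_group / 2)
    # Probability of landing in either rejection region.
    return _STD_NORMAL.cdf(shift - z_crit) + _STD_NORMAL.cdf(-shift - z_crit)


def replicates_needed(effect_size, power=0.8, alpha=0.05):
    """Smallest per-group replicate count from the closed-form
    normal-approximation formula n = 2 * ((z_crit + z_power) / d)^2."""
    z_crit = _STD_NORMAL.inv_cdf(1 - alpha / 2)
    z_power = _STD_NORMAL.inv_cdf(power)
    n = 2 * ((z_crit + z_power) / abs(effect_size)) ** 2
    return math.ceil(n)


if __name__ == "__main__":
    # A one-SD difference at 80% power and alpha = 0.05.
    print(replicates_needed(1.0))  # 16
    print(round(approx_power(1.0, 16), 3))
```

In a real RNA-seq design the effect size and variance differ gene by gene and depend on read depth, which is why empirical distributions from pilot data matter so much; this sketch only covers the single-test arithmetic.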

My only complaint, for all the reasons expressed in my previous blog post about why you shouldn't host things like this exclusively on your lab website, is that the code to run this analysis doesn't appear to be available to save, study, modify, maintain, or archive. When lead author Michele Busby leaves Gabor Marth's lab, hopefully the app doesn't fall into the graveyard of computational biology web apps.

Update 2/7/13: Michele Busby created a public GitHub repository for the Scotty code.

tl;dr? There's a new web app that does power, sample size, and coverage calculations for RNA-seq, but it only works well if the pilot or public data you give it closely matches the actual data you'll collect. 

