Blog Archives

The Next National Library of Medicine Director Can Help Define the Future of Data Science

August 24, 2015

The main motivation for starting this blog was to share our enthusiasm about the increased importance of data and data analysis in science, industry, and society in general. Based on recent initiatives, such as BD2K, it is clear that the NIH is also enthusiastic and very much interested in supporting data science. For those who don't know,

Read more »

Correlation is not a measure of reproducibility

August 12, 2015

Biologists make wide use of correlation as a measure of reproducibility. Specifically, they quantify reproducibility with the correlation between measurements obtained from replicated experiments. For example, the ENCODE data standards document states: A typical R² (Pearson) correlation of gene expression (RPKM) between two biological replicates, for RNAs that are detected in both samples using RPKM or read counts, should

Read more »
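A minimal sketch of why the title's claim can hold, using made-up numbers that are not taken from the post: two hypothetical replicates that differ by a constant factor of 2 are perfectly correlated, yet their measurements disagree substantially. The values in `rep1` and `rep2` and the helper `pearson` are assumptions introduced purely for illustration.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical expression values for one replicate (illustrative only).
rep1 = [1.0, 2.0, 4.0, 8.0, 16.0, 32.0]
# A second hypothetical replicate that is systematically twice as large.
rep2 = [2.0 * v for v in rep1]

r = pearson(rep1, rep2)
mean_abs_diff = sum(abs(a - b) for a, b in zip(rep1, rep2)) / len(rep1)

print(f"Pearson r = {r:.3f}")
print(f"Mean absolute difference = {mean_abs_diff:.1f}")
```

Correlation is unchanged by shifting or rescaling either replicate, so on its own it cannot reveal a systematic disagreement of this kind; looking at the differences between replicates directly does.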

rafalib package now on CRAN

August 10, 2015

For the last several years I have been collecting functions I routinely use during exploratory data analysis in a private R package. Mike Love and I used some of these in our HarvardX course and now, due to popular demand, I have created man pages and added the rafalib package to CRAN. Mike has made several

Read more »

How public relations and the media are distorting science

June 24, 2015

Throughout history, engineers, medical doctors and other applied scientists have helped convert basic science discoveries into products, public goods and policy that have greatly improved our quality of life. With rare exceptions, it has taken years if not decades to establish these discoveries. And even the exceptions stand on the shoulders of incremental contributions. The researchers who

Read more »

Batch effects are everywhere! Deflategate edition

June 9, 2015

In my opinion, batch effects are the biggest challenge faced by genomics research, especially in precision medicine. As we point out in this review, they are everywhere among high-throughput experiments. But batch effects are not specific to genomics technology. In fact, in this 1972 paper (paywalled), WJ Youden describes batch effects in the context of measurements made by physicists.

Read more »

Is it species or is it batch? They are confounded, so we can’t know

May 20, 2015

In a 2005 OMICS paper, an analysis of human and mouse gene expression microarray measurements from several tissues led the authors to conclude that "any tissue is more similar to any other human tissue examined than to its corresponding mouse tissue". Note that this was a rather surprising result given how similar tissues are between species.

Read more »

Genomics Case Studies Online Courses Start in Two Weeks (4/27)

April 13, 2015

The last month of the HarvardX Data Analysis for Genomics series starts on 4/27. We will cover case studies on RNA-seq, variant calling, ChIP-seq and DNA methylation. Faculty includes Shirley Liu, Mike Love, Oliver Hoffman and the HSPH Bioinformatics Core. Although taking the previous courses in the series will help, the four case study courses

Read more »

Introduction to Bioconductor HarvardX MOOC starts this Monday March 30

March 24, 2015

Bioconductor is one of the most widely used open-source toolkits for biological high-throughput data. In this four-week course, co-taught with Vince Carey and Mike Love, we will introduce you to Bioconductor's general infrastructure and then focus on two specific technologies: next-generation sequencing and microarrays. The lectures and assessments will be annotated in

Read more »

π day special: How to use Bioconductor to find empirical evidence in support of π being a normal number

March 14, 2015

Editor's note: Today, 3/14/15, at some point between 9:26:53 and 9:26:54, it was the most π day of them all. Below is a repost from last year. Happy π day everybody! I wanted to write some simple code (included below) to test the parallelization capabilities of my new cluster. So, in honor of π day, I

Read more »

Advanced Statistics for the Life Sciences MOOC Launches Today

March 2, 2015

In this four-week course we will teach statistical techniques that are commonly used in the analysis of high-throughput data and their corresponding R implementations. In Week 1 we will explain inference in the context of high-throughput data and introduce the concept of error-controlling procedures. We will describe the strengths and weaknesses of the Bonferroni correction,

Read more »

