Blog Archives

Residual expertise – or why scientists are amateurs at most of science

May 18, 2015

Editor's note: I have been unsuccessfully attempting to finish a book I started 3 years ago about how and why everyone should get pumped about reading and understanding scientific papers. I've adapted part of one of the chapters into this blog post. It is pretty raw but hopefully gets the idea across. An episode of The Daily Show with

Read more »

The tyranny of the idea in science

May 8, 2015

There are a lot of analogies between startups and academic science labs. One thing that is definitely very different is the relative value of ideas in the startup world and in the academic world. For example, Paul Graham has said: Actually, startup ideas are not million dollar ideas, and here's an experiment you can try

Read more »

Mendelian randomization inspires a randomized trial design for multiple drugs simultaneously

May 7, 2015

Joe Pickrell has an interesting new paper out about Mendelian randomization. He discusses some of the interesting issues that come up with these studies and performs a mini-review of previously published studies using the technique. The basic idea behind Mendelian randomization is the following. In a simple, randomly mating population, Mendel's laws tell us that at any

Read more »

Rafa’s citations above replacement in statistics journals is crazy high.

May 1, 2015

Editor's note: I thought it would be fun to do some bibliometrics on a Friday. This is super hacky and the CAR/Y stat should not be taken seriously. I downloaded data on the 400 most cited papers from 2000 to 2010 in some statistical journals from Web of Science. Here is a boxplot of the average number

Read more »

Data analysis subcultures

April 29, 2015

Today in Nature, Roger and I responded to the controversy around the journal that banned p-values. A piece like this requires a lot of information packed into very little space, but I thought one idea that deserved to be talked about more was the idea of data analysis subcultures. From the paper: Data analysis is taught

Read more »

A blessing of dimensionality often observed in high-dimensional data sets

April 9, 2015

Tidy data sets have one observation per row and one variable per column. Using this definition, big data sets can be either: Wide: a wide data set has a large number of measurements per observation, but fewer observations. This type of data set is typical in neuroimaging, genomics, and other biomedical applications. Tall: a

Read more »

Teaser trailer for the Genomic Data Science Specialization on Coursera

March 26, 2015

We have been hard at work in the studio putting together our next specialization to launch on Coursera. It will be called the "Genomic Data Science Specialization" and includes a spectacular lineup of instructors: Steven Salzberg, Ela Pertea, James Taylor, Liliana Florea, Kasper Hansen, and me. The specialization will cover command line tools, statistics,

Read more »

A surprisingly tricky issue when using genomic signatures for personalized medicine

March 19, 2015

My student Prasad Patil has a really nice paper that just came out in Bioinformatics (preprint in case paywalled). The paper is about a surprisingly tricky normalization issue with genomic signatures. Genomic signatures are basically statistical/machine learning functions applied to the measurements for a set of genes to predict how long patients will survive, or how they

Read more »

A simple (and fair) way all statistics journals could drive up their impact factor.

March 18, 2015

Hypothesis: If every method in every stats journal was implemented in a corresponding R package (easy), was required to have a companion document that was a tutorial on how to use the software (easy), included a reference to how to cite the paper if you used the software (easy) and the paper/tutorial was posted to

Read more »

Data science done well looks easy – and that is a big problem for data scientists

March 17, 2015

Data science has a ton of different definitions. For the purposes of this post I'm going to use the definition of data science we used when creating our Data Science program online: Data science is the process of formulating a quantitative question that can be answered with data, collecting and cleaning the

Read more »

