My student Prasad Patil has a really nice paper that just came out in Bioinformatics (preprint in case paywalled). The paper is about a surprisingly tricky normalization issue with genomic signatures. Genomic signatures are basically statistical/machine learning functions applied to the measurements for a set of genes to predict how long patients will survive, or how they

Only 6% of crashes in New Zealand involve foreign drivers, according to the latest figures provided by the Ministry of Transport. But in some remote regions of the South Island particularly popular with tourists for their scenery... foreign drivers are involved in about a quarter of all crashes. These sentences come from a CNN article about a vigilante movement in those regions popular with tourists.

Hypothesis: If every method in every stats journal was implemented in a corresponding R package (easy), was required to have a companion document that was a tutorial on how to use the software (easy), included a reference to how to cite the paper if you used the software (easy) and the paper/tutorial was posted to

I've written a short piece about the Tapestry conference for the Graphically Speaking column in Computer Graphics and Applications. The article talks about the reasoning behind Tapestry, how it's different from academic conferences, and gives a few examples of talks. It even includes anecdotal evidence to show that the conference has enabled actual knowledge transfer.

If you train a model on a set of data, it should fit that data well. The hope, however, is that it will fit a new set of data well. So in machine learning and statistics, people split their data into two parts. They train the model on one half, and see how well it […]

This week I'll start my Bayesian Statistics master's course at the Collegio Carlo Alberto. I realized that some of last year's students got PhD positions at prestigious US universities. So I thought that letting this year's students have a first grasp of some great Bayesian papers wouldn't do harm. The idea is that in addition to the course,

Data science has a ton of different definitions. For the purposes of this post I'm going to use the definition of data science we used when creating our Data Science program online. Data science is the process of formulating a quantitative question that can be answered with data, collecting and cleaning the

