## Interview with Cole Trapnell of UW Genome Sciences

December 5, 2014

Cole Trapnell is an Assistant Professor of Genome Sciences at the University of Washington. He is the developer of several widely used genomics tools, including TopHat, Cufflinks, and Monocle. His lab at UW studies cell differentiation, reprogramming, and other transitions between stable or metastable cellular states using a combination of computational and experimental techniques.

## A matrix computation on Pascal’s triangle

December 5, 2014

A colleague asked me a question regarding my recent post about the Pascal triangle matrix. While responding to his question, I discovered a program that I had written in 1999 that computed with a Pascal triangle matrix. Wow, I've been computing with Pascal's triangle for 15 years! I don't know […]

## The persistence of the “schools are failing” story line

December 5, 2014

I happened to come across a post from 2011 about some work of Roland Fryer, a prominent economist who works in education research. In an article, Fryer made the offhand remark that “test scores have been largely constant over the past thirty years,” a claim that was completely contradicted by one of the graphs in […]

## Prediction competitions

December 5, 2014

Competitions have a long history in forecasting and prediction, and have been instrumental in focusing research attention on methods that work well in practice. In the forecasting community, the M competition and M3 competition have been particularly influential. The data mining community has the annual KDD Cup, which has generated attention on a wide range […]

## “Probing with Severity: Beyond Bayesian Probabilism and Frequentist Performance” (Dec 3 Seminar slides)

December 5, 2014

Below are the slides from my Rutgers seminar for the Department of Statistics and Biostatistics yesterday, since some people have been asking me for them. The abstract is here. I don’t know how explanatory a bare outline like this can be, but I’d be glad to try and answer questions. I am impressed at how interested in foundational matters I […]

## More on Prediction From Log-Linear Regressions

December 4, 2014

My therapy sessions are actually going quite well. I'm down to just one meeting with Jane a week, now. Yes, there are still far too many log-linear regressions being bandied around, but I'm learning to cope with it! Last year, in an attempt to be helpfu...

## Initial steps towards reproducible research

December 4, 2014

In anticipation of next week’s Reproducible Science Hackathon at NESCent, I was thinking about Christie Bahlai’s post on “Baby steps for the open-curious.” Moving from Ye Olde Standard Computational Science Practice to a fully reproducible workflow seems a monumental task, but partially reproducible is better than not-at-all reproducible, and it’d be good to give people […]

## A comment on preparing data for classifiers

December 4, 2014

I have been working through (with some honest appreciation) a recent article comparing many classifiers on many data sets: “Do we Need Hundreds of Classifiers to Solve Real World Classification Problems?” Manuel Fernández-Delgado, Eva Cernadas, Senén Barro, Dinani Amorim; 15(Oct):3133−3181, 2014 (which we will call “the DWN paper” in this note). This paper applies 179 …

## Degrees of Freedom and Information Criteria

December 4, 2014

Degrees of freedom and information criteria are two fundamental concepts in statistical modeling, and both are taught in introductory statistics courses. But what are their exact abstract definitions, from which specific calculation formulas can be derived in different situations? I often use fit criteria like AIC and BIC to choose between models. […]
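As a rough illustration of the fit criteria mentioned above, here is a minimal Python sketch of the standard textbook formulas, AIC = 2k − 2 ln L and BIC = k ln n − 2 ln L. The function and parameter names are hypothetical and are not taken from the original post.

```python
import math


def aic(log_likelihood, num_params):
    """Akaike information criterion: 2k - 2 ln(L)."""
    return 2 * num_params - 2 * log_likelihood


def bic(log_likelihood, num_params, num_obs):
    """Bayesian information criterion: k ln(n) - 2 ln(L)."""
    return num_params * math.log(num_obs) - 2 * log_likelihood


# Hypothetical comparison of two fitted models on the same 100 observations.
# The log-likelihood values are made up for illustration only.
small_model = aic(log_likelihood=-120.0, num_params=2)
large_model = aic(log_likelihood=-118.5, num_params=5)
```

Under either criterion the model with the lower value is preferred. Here the number of parameters stands in for the degrees of freedom, which is the usual convention for simple parametric models.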

## Repost: A deterministic statistical machine

December 4, 2014

Editor's note: This is a repost of our previous post about deterministic statistical machines. It is inspired by the recent announcement that the Automatic Statistician received funding from Google. In 2012 we also applied to Google for a small research award to study this same problem, but didn't get it. In the interest of extreme openness

## Designing a study to see if “the 10x programmer” is a real thing

December 4, 2014

Lorin H. writes: One big question in the world of software engineering is: how much variation is there in productivity across programmers? (If you google for “10x programmer” you’ll see lots of hits). Let’s say I wanted to explore this research question with a simple study. Choose a set of participants at random from a […]

## The Rock Hyrax Problem

December 4, 2014

This is the third of a series of articles about Bayesian analysis. The previous article is here. Earlier this semester I posed this problem to my Bayesian statistics class at Olin College: Suppose I capture and tag 10 rock hyrax...

## Useful for referring—12-04-2014

December 4, 2014

- Tutorial: How to detect spurious correlations, and how to find the …
- Practical illustration of Map-Reduce (Hadoop-style), on real data
- Jackknife logistic and linear regression for clustering and predict…
- From the trenches: 360-degrees data science
- A synthetic variance designed for Hadoop and big data
- Fast Combinatorial Feature Selection with New Definition of Predict…
- A […]

December 4, 2014

I have been asked to speak at the 2014 LRA Conference on the topic of Academic Blogging. Time: 1:15-2:15 Location: Islands Ballroom Salon B – Lobby Level My Slides: http://clari.buffalo.edu/blog My Précis: http://clari.buffalo.edu/blog/materials/precis.pdf The talk is part of a larger …

## Study of a plot

December 3, 2014

I began to think about a nice way of plotting campaign expenditures in a paper I'm working on. I thought this would be something like the following: simple but meaningful even when there are outliers in both tails. Though I like the seniors, Tukey's boxplot and scatter plots, I had already used them the last time […]

## If observational studies are outlawed, then only outlaws will do observational studies

December 3, 2014

My article “Experimental reasoning in social science” begins as follows: As a statistician, I was trained to think of randomized experimentation as representing the gold standard of knowledge in the social sciences, and, despite having seen occasional arguments to the contrary, I still hold that view, expressed pithily by Box, Hunter, and Hunter (1978) that […]

## Where a scatter plot fails

December 3, 2014

Found this chart in the magazine that Charles Schwab sends to customers: When there are two variables, and their correlation is of interest, a scatter plot is usually recommended. But not here! The text labels completely dominate this chart and...

## R resources

December 3, 2014

This is the third in my weekly series of posts pointing out resources on this site. This week’s topic is R.

- R language for programmers
- Default arguments and lazy evaluation in R
- Distributions in R
- Moving data between R and Excel via the clipboard
- Sweave: First steps toward reproducible analyses
- Troubleshooting Sweave
- Regular expressions in […]

## Pascal’s triangle in SAS

December 3, 2014

Pascal's triangle is the name given to the triangular array of binomial coefficients. The nth row is the set of coefficients in the expansion of the binomial expression (1 + x)^n. Complicated stuff, right? Well, yes and no. Pascal's triangle is known to many school children who have never heard of polynomials […]
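To make the definition concrete, here is a minimal sketch in Python (not the SAS code from the original post, which is not shown in this excerpt). It builds rows in two ways: directly from binomial coefficients, and by the schoolroom rule that each entry is the sum of the two entries above it. The function names are hypothetical.

```python
from math import comb


def pascal_row(n):
    """Row n of Pascal's triangle: the coefficients of (1 + x)^n."""
    return [comb(n, k) for k in range(n + 1)]


def pascal_triangle(num_rows):
    """First num_rows rows, built by adding adjacent entries of the row above."""
    if num_rows <= 0:
        return []
    rows = [[1]]
    for _ in range(num_rows - 1):
        prev = rows[-1]
        # Interior entries are sums of neighbouring pairs; the ends are always 1.
        rows.append([1] + [a + b for a, b in zip(prev, prev[1:])] + [1])
    return rows


for row in pascal_triangle(5):
    print(row)
```

Both constructions agree: the last row printed, `[1, 4, 6, 4, 1]`, matches the expansion (1 + x)^4 = 1 + 4x + 6x^2 + 4x^3 + x^4.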

## Random probability tweets

December 3, 2014

For the next few weeks, I’ve scheduled @ProbFact tweets to come out at random times. They will follow a Poisson distribution with an average of two per day. (Times are truncated to multiples of 5 minutes because my scheduling software requires that.)
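One way such a schedule could be generated is sketched below in Python. This is an assumption about the approach, not the author's actual script: it simulates a Poisson process by drawing exponential gaps between tweets at a rate of two per day, then truncates each time down to a multiple of 5 minutes. The function name and parameters are hypothetical.

```python
import random


def schedule_tweets(days, rate_per_day=2.0, step_minutes=5, seed=None):
    """Return tweet times in minutes from the start, truncated to step_minutes."""
    rng = random.Random(seed)
    minutes_per_day = 24 * 60
    horizon = days * minutes_per_day
    times = []
    t = 0.0
    while True:
        # Gaps in a Poisson process are exponential with mean 1 / rate (in days).
        t += rng.expovariate(rate_per_day) * minutes_per_day
        if t >= horizon:
            break
        # Truncate down to the nearest multiple of step_minutes.
        times.append(int(t // step_minutes) * step_minutes)
    return times


tweet_times = schedule_tweets(days=21, seed=2014)
```

Over a long horizon the number of tweets per day averages about two, although any single day may have none or several. Truncation can occasionally place two tweets in the same 5-minute slot.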

## Study of a Plot: The Manhattan Plot

December 3, 2014

I was thinking about a nice way of plotting campaign expenditures in a paper I’m working on. I thought this would be something like the following: simple, but meaningful even in the context of lots of outliers in both tails. Although I like the sen...
