## Sunday data/statistics link roundup 12/23/12

December 23, 2012

A cool data visualization for blood glucose levels for diabetic individuals. This kind of interactive visualization can help people see where/when major health issues arise for chronic diseases. This was a class project by Jeff Heer’s Stanford CS448B students Ben Rudolph …

## Peter Bartlett on model complexity and sample size

December 23, 2012

Zach Shahn saw this and writes: I just heard a talk by Peter Bartlett about model selection in “unlimited” data situations that essentially addresses this curve. He talks about the problem of model selection given a computational budget (rather than given a sample size). You can either use your computational budget to get more data [...]

## Visualizing Principal Components

December 22, 2012

Principal Component Analysis (PCA) is a procedure that converts observations into linearly uncorrelated variables called principal components (Wikipedia). PCA is a useful descriptive tool for examining your data. Today I will show how to find and visualize principal components. Let’s look at the components of the Dow Jones Industrial Average index over 2012. First, [...]
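
To make the definition concrete, here is a minimal Python sketch of PCA for just two variables, using the closed-form eigen-decomposition of the 2×2 sample covariance matrix. It is not the post's code: the names `pca_2d` and `covariance` and the toy data are assumptions made for illustration, and the numbers are not Dow Jones prices.

```python
import math

def covariance(us, vs):
    """Sample covariance of two equal-length sequences."""
    n = len(us)
    mu, mv = sum(us) / n, sum(vs) / n
    return sum((u - mu) * (v - mv) for u, v in zip(us, vs)) / (n - 1)

def pca_2d(xs, ys):
    """Return (eigenvalues, unit eigenvectors, component scores) for two variables."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cx = [x - mx for x in xs]  # centered observations
    cy = [y - my for y in ys]
    # Covariance matrix [[a, b], [b, c]]
    a, c, b = covariance(xs, xs), covariance(ys, ys), covariance(xs, ys)
    # Closed-form eigenvalues of a symmetric 2x2 matrix, largest first
    mid = (a + c) / 2
    half = math.hypot((a - c) / 2, b)
    eigenvalues = [mid + half, mid - half]
    if b == 0:
        # Variables already uncorrelated: the axes are the components
        raw = [(1.0, 0.0), (0.0, 1.0)] if a >= c else [(0.0, 1.0), (1.0, 0.0)]
    else:
        # (b, lam - a) solves (A - lam * I) v = 0 when b != 0
        raw = [(b, lam - a) for lam in eigenvalues]
    vectors = [(vx / math.hypot(vx, vy), vy / math.hypot(vx, vy)) for vx, vy in raw]
    # Project the centered data onto each component
    scores = [[u * vx + v * vy for u, v in zip(cx, cy)] for vx, vy in vectors]
    return eigenvalues, vectors, scores

# Hypothetical toy data, chosen only so the two variables are correlated
xs = [2.5, 0.5, 2.2, 1.9, 3.1, 2.3, 2.0, 1.0, 1.5, 1.1]
ys = [2.4, 0.7, 2.9, 2.2, 3.0, 2.7, 1.6, 1.1, 1.6, 0.9]

eigenvalues, vectors, scores = pca_2d(xs, ys)
score_cov = covariance(scores[0], scores[1])
explained = eigenvalues[0] / sum(eigenvalues)
print("share of variance on first component:", round(explained, 3))
print("covariance between component scores:", score_cov)
```

The covariance between the two sets of scores comes out at zero up to rounding error, which is what "linearly uncorrelated" means here. Each eigenvalue is the variance of the data along its component.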

## Get the party started

December 22, 2012

Have you already used trees or random forests to model the relationship between a response and some covariates? Then you might like conditional trees, which are implemented in the party package. In contrast to the CART (Classification and Regression ...

## More Pinker Pinker Pinker

December 22, 2012

After I posted this recent comment on a blog of Steven Pinker (see also here), we had the following exchange. I’m reposting it here (with Pinker’s agreement) not because we achieved any deep insights but because I thought it useful to reveal to people that so-called experts such as us are not so clear on [...]

## Another reason to use JAGS instead of BUGS

December 21, 2012

BUGS is the pioneering software that made MCMC available to so many of us, but it has some problems with robustness that are not suffered by the subsequent software JAGS. Readers of DBDA tell me of some new problems running models in BUGS, which I have...

## R for inquisition

December 21, 2012

A post on high-dimensional arrays by @isomorphisms reminded me of APL and, more generally, of matrix languages, which took me back to inquisitive computing: computing not in the sense of software engineering, or databases, or formats, but of learning by poking problems through a computer. I like languages not because I can get a job [...]

## Computing an empirical pFDR in R

December 21, 2012

The positive false discovery rate (pFDR) has become a classical procedure for controlling false positives. It is one of my favourites because it relies on a re-sampling approach. I base my implementation on John Storey's PNAS paper and the technical report he pub...

## Two reviews of Nate Silver’s new book, from Kaiser Fung and Cathy O’Neil

December 21, 2012

People keep asking me what I think of Nate’s book, and I keep replying that, as a blogger, I’m spoiled. I’m so used to getting books for free that I wouldn’t go out and buy a book just for the purpose of reviewing it. (That reminds me that I should post reviews of some of [...]

## Guest Post: ROB TIBSHIRANI

December 21, 2012

Today we have a guest post by my good friend Rob Tibshirani. Rob has a list of nine great statistics papers. (He is too modest to include his own papers.) Have a look and let us know what papers you would add to the list. And what machine learning papers would [...]

## Y2K38: Our Own Mayan Calendar…Again

December 21, 2012

It’s not quite the end of the world as we know it. We made it through December 21, 2012 unscathed. It’s not going to be the last time we make it through such a pseudo-calamity. After all, we have built our own end of the world before (e.g. Y2K). Next up: January 19, 2038. [...]
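
The 2038 date follows from simple arithmetic, assuming the usual framing of the Year 2038 problem: time stored as a signed 32-bit count of seconds since 1970-01-01 00:00:00 UTC. The Python sketch below is an illustration only, not code from the post. The helper `to_int32` is a hypothetical name that models two's-complement wraparound, and real systems may behave differently when the counter overflows.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)
INT32_MAX = 2**31 - 1  # largest value a signed 32-bit counter can hold

def to_int32(value):
    """Model two's-complement wraparound of a signed 32-bit integer (assumption)."""
    return (value + 2**31) % 2**32 - 2**31

# Last second that fits in the counter
last_moment = EPOCH + timedelta(seconds=INT32_MAX)

# One second later the modelled counter wraps to the most negative value
wrapped_seconds = to_int32(INT32_MAX + 1)
wrapped = EPOCH + timedelta(seconds=wrapped_seconds)

print("last representable moment:", last_moment.isoformat())
print("counter after one more second:", wrapped_seconds)
print("date the wrapped counter encodes:", wrapped.isoformat())
```

Under these assumptions the counter runs out at 03:14:07 UTC on January 19, 2038, and a wrapped value would be read as a date in December 1901.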

## Kahan on Pinker on politics

December 21, 2012

Reacting to my recent post on Steven Pinker’s too-broad (in my opinion) speculations on red and blue states, Dan “cultural cognition” Kahan writes: Pinker is clearly right to note that mass political opinions on seemingly diverse issues cohere, and Andrew, I think, is way too quick to challenge this. I [Kahan] could cite to billions [...]

## Rejected Post: Clinical Trial Statistics Doomed by Mayan Apocalypse?

December 21, 2012

See Rejected Posts.

## Man vs Wild Data

December 21, 2012

I’m speaking on this topic at the Young Statisticians Conference, 7–8 February 2013. If you’re a young statistician and live in Australia, please book in. It promises to be a great couple of days. Early registrations close on 2 January. Abstract for my talk: For 25 years I have been an intrepid statistical consultant, tackling the wild frontiers of real data, real problems and real time constraints. I have faced…

## The NIH peer review system is still the best at identifying innovative biomedical investigators

December 20, 2012

This recent Nature paper makes the controversial claim that the most innovative (interpreted as best) scientists are not being funded by NIH. Not surprisingly, it is getting a lot of attention in the popular media. The title and introduction make it sound …

## Who exactly are those silly academics who aren’t as smart as a Vegas bookie?

December 20, 2012

I get suspicious when I hear unsourced claims that unnamed experts somewhere are making foolish statements. For example, I recently came across this, from a Super Bowl-themed article from 2006 by Stephen Dubner and Steven Levitt: As it happens, there is one betting strategy that will routinely beat a bookie, and you don’t even have [...]

## Computation

December 20, 2012

These days I have been working with computation and programming languages, and I want to share something with you here: “You cannot expect C++ to magically make your code faster. If speed is of concern, you need profiling to find the bottleneck instead of blind guessing.” (Yan Zhou) Thus we have to learn how to profile […]
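
As a rough illustration of "profile first, then optimize", here is a small Python sketch using the standard-library `cProfile` and `pstats` modules. It is not the post's code, which concerns other languages. The functions `slow_sum_of_squares`, `fast_constant`, and `workload` are hypothetical stand-ins for real work.

```python
import cProfile
import io
import pstats

def slow_sum_of_squares(n):
    """Deliberately loop-heavy function that should dominate the profile."""
    total = 0
    for i in range(n):
        total += i * i
    return total

def fast_constant():
    """Cheap function included for contrast."""
    return 42

def workload():
    slow_sum_of_squares(200_000)
    fast_constant()

# Record timing data only while the workload runs
profiler = cProfile.Profile()
profiler.enable()
workload()
profiler.disable()

# Sort by cumulative time so the most expensive call paths come first
buffer = io.StringIO()
stats = pstats.Stats(profiler, stream=buffer)
stats.sort_stats("cumulative").print_stats(5)
report = buffer.getvalue()
print(report)
```

The report lists each function with its call count and time, so the bottleneck is measured rather than guessed. The same workflow applies in other languages with their own profilers.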

## Italian elections (1)

December 19, 2012

You'd think that the last week before the holidays would be very quiet and not much would be going on. Well, if you did, you'd be wrong, I guess, as the last few days have been quite busy (for many reasons). Anyway, I managed to track down some be...

## PhilStat/Law/Stock: more on “bad statistics”: Schachtman

December 19, 2012

Nathan Schachtman has an update on the case of U.S. v. Harkonen discussed in my last 3 posts: here, here, and here. United States of America v. W. Scott Harkonen, MD — Part III Background The recent oral argument in United States v. Harkonen (see “The (Clinical) Trial by Franz Kafka” (Dec. 11, 2012)), pushed me to revisit the brief [...]

## Rafa interviewed about statistical genomics

December 19, 2012

He talks about the problems created by the speed of increase in data sizes in molecular biology, the way that genomics is hugely driven by data analysis/statistics, how Bioconductor is an example of bottom up science, Simply Statistics gets a …

## Beef Stakes: Representing US Beef Production as Meat

December 19, 2012

Beef Stakes, designed by art and technology student Sarah Hallacher, is a data representation of the amount of beef produced in the US during 2011, scaled down to only include the top 4 beef-producing states. The height of each steak is mapped to the...