## The Art of Fielding

June 16, 2012
I liked it; the reviews were well-deserved. It indeed is a cross between The Mysteries of Pittsburgh and The Universal Baseball Association, J. Henry Waugh, Prop. What struck me most, though, was the contrast with Indecision, the novel by Harbach’s associate, Benjamin Kunkel. As I noted a few years ago, Indecision was notable in that [...]

## Carnon [and Core, end]

June 15, 2012
Yet another full day working on Bayesian Core with Jean-Michel in Carnon… This morning, I ran along the canal for about an hour and at last saw some pink flamingos close enough to take pictures (if only to convince my daughter that there were flamingos in the area!). Then I worked full-time on the spatial [...]

## Statisticians, ASA, and Big Data

June 15, 2012
Today I got my copy of Amstat News and eagerly opened it before I realized it was not the issue with the salary survey…. But the President’s Corner section had the following column on big data by ASA president Robert Rodriguez. Big Data is...

## Coaching, teaching, and writing

June 15, 2012
I sent the following email to Thomas Basbøll: I read this: http://secondlanguage.blogspot.com/p/writing-coach.html and was reminded of this: http://andrewgelman.com/2011/10/could-i-use-a-statistics-coach/ He replied: Which reminds me of this http://secondlanguage.blogspot.com/2011/10/teacher-or-coach.html We seem to be approaching some sort of Platonic ideal in which we can conduct an entire conversation from links to our previous writings. Just like that joke about [...]

## Update to Data on Github Post: Solution to an RCurl problem

June 15, 2012
A reader of my most recent post tried the R code I had written to download the data set of electoral disproportionality from the GitHub repository. However, it didn’t work for them. After entering disproportionality.data <- getURL(url) they go...

## Cool-ass signal processing using Gaussian processes (birthdays again)

June 14, 2012
Aki writes: Here’s my version of the birthday frequency graph. I used Gaussian process with two slowly varying components and periodic component with decay, so that periodic form can change in time. I used Student’s t-distribution as observation model to allow exceptional dates to be outliers. I guess that periodic component due to week effect [...]

## Simple rendering of complex data

June 14, 2012
Andrew Gelman likes this line chart showing the day-by-day trend in childbirth: Andrew makes a number of good points about this chart. Make sure you read the whole post. One of his points concerns making the line smoother by removing...

## You are your shoes. An incoherent study claims.

June 14, 2012
According to this link, a study proved that "90 percent of a person's traits can be judged with their shoes." Without needing to look at the study, a reader can reason that this claim makes no sense. What does it mean by 90 percent of a person's traits? How many traits are there in a person? If the researcher defines 1000 traits, then the shoe predictor will need to predict…

## Freedman’s Neglected Theorem

June 14, 2012
—Larry Wasserman In this post I want to review an interesting result by David Freedman (Annals of Mathematical Statistics, Volume 36, Number 2 (1965), 454-456) available at projecteuclid.org. The result gets very little attention. Most researchers in statistics and machine learning seem to be unaware of the result. The result says that, “almost all” Bayesian [...]

## Body Weight in the United States – Part 2, "Non Factors"

June 13, 2012
Sometimes the story isn't what is a trend, but rather what is not a trend. In this second installment about body weight in the U.S., listing what doesn't seem to be contributing factors will help narrow down what might actually be the p...

## Economists . . .

June 13, 2012
Catherine Rampell writes: On Monday the Nobel Foundation, which bestows the world’s most prestigious academic, literary and humanitarian prizes, said it was reducing the cash awarded with Nobel Prizes by about 20 percent. . . . Peter A. Diamond, a professor emeritus at the Massachusetts Institute of Technology who also received the Nobel in economic [...]

June 13, 2012
Jacob Oaknin asks: Akaike’s selection criterion is often justified on the basis of the empirical risk of a ML estimate being a biased estimate of the true generalization error of a parametric family, say the family, S_m, of linear regressors on an m-dimensional variable x=(x_1,..,x_m) with Gaussian noise independent of x (for instance in “Unifying [...]

## Why R is Hard to Learn

June 13, 2012
The open source R software for analytics has a reputation for being hard to learn. It certainly can be, especially for people who are already familiar with similar packages such as SAS, SPSS or Stata. Training and documentation that leverages …

## Data on GitHub: The easy way to make your data available

June 13, 2012
Update (15 June 2012): See this post for instructions on how to download GitHub based data into R if you are getting the error about an SSL certificate problem. GitHub is designed for collaborating on coding projects. Nonetheless, it is also a pote...

## Convergence or divergence? A simple iteration with a random component

June 13, 2012
A colleague who works with time series sent me the following code snippet. He said that the calculation was overflowing and wanted to know if this was a bug in SAS: data A(drop=m); call streaminit(12345); m = 2; x = 0; do i = 1 to 5000; x = m*x [...]

## Teaching STT 200

June 12, 2012
This summer, I am teaching an undergraduate stats class, a first course in statistics that covers three units: descriptive statistics, probability, and statistical inference. The course webpage is here. The following paragraph is from the thesis of Michael Phillip Lesnick. It explains the relationship among the three units: Recall first that in statistics, [...]

## Statistics Versus Machine Learning

June 12, 2012
—Larry Wasserman Welcome to my blog, which will discuss topics in Statistics and Machine Learning. Some posts will be technical and others will be non-technical. Since this blog is about topics in both Statistics and Machine Learning, perhaps I should address the question: What is the difference between these two fields? The short answer is: [...]

## NBA Predictions — Finals

June 12, 2012
Now we are on to the finals! The algorithm enters the finals with a 6-4 record so far. Here is what we have for tonight: So, let's see if OKC wins this one.

## Next R meeting in Paris INSEE: ggplot2 and parallel computing

June 12, 2012
Hi, our group of R users from INSEE, aka FLR, meets monthly in Paris. Next meeting is on Wed 13 (tomorrow), 1-2 pm, room 539 (an ID is needed to come in, map to access INSEE R), about ggplot2 and parallel computing. Since the first meeting in February, presentations have included hot topics like web scraping, C in R, RStudio, SQLite […]

## Finding word use patterns in Wikileaks cables

June 12, 2012
6/18: A follow-up to this post is now available here. Recent Discoveries: When I was a diplomat, I was always interested in the Wikileaks cables and what could be done with them. Unfortunately, I never got a chance to look at the site in depth, due to s...

## Poison gas or…air pollution?

June 12, 2012
From our Beijing bureau, we have the following message from the U.S. embassy that was recently issued to U.S. citizens in China: The Embassy has received reports from U.S. citizens living and traveling in Wuhan that the air quality in the city has be...