Update to Data on Github Post: Solution to an RCurl problem

June 15, 2012
By

A reader of my most recent post tried the R code I had written to download the data set of electoral disproportionality from the GitHub repository. However, it didn’t work for them. After entering disproportionality.data <- getURL(url) they go...

Cool-ass signal processing using Gaussian processes (birthdays again)

June 14, 2012
By

Aki writes: Here’s my version of the birthday frequency graph. I used a Gaussian process with two slowly varying components and a periodic component with decay, so that the periodic form can change over time. I used Student’s t-distribution as the observation model to allow exceptional dates to be outliers. I guess that periodic component due to week effect [...]

Simple rendering of complex data

June 14, 2012
By

Andrew Gelman likes this line chart showing the day-by-day trend in childbirth: Andrew makes a number of good points about this chart. Make sure you read the whole post. One of his points concerns making the line smoother by removing...

You are your shoes. An incoherent study claims.

June 14, 2012
By

According to this link, a study proved that "90 percent of a person's traits can be judged with their shoes." Without needing to look at the study, a reader can reason that this claim makes no sense. What does it mean by 90 percent of a person's traits? How many traits are there in a person? If the researcher defines 1000 traits, then the shoe predictor will need to predict…

Freedman’s Neglected Theorem

June 14, 2012
By

—Larry Wasserman In this post I want to review an interesting result by David Freedman (Annals of Mathematical Statistics, Volume 36, Number 2 (1965), 454-456) available at projecteuclid.org. The result gets very little attention. Most researchers in statistics and machine learning seem to be unaware of the result. The result says that, “almost all” Bayesian [...]

Body Weight in the United States – Part 2, "Non Factors"

June 13, 2012
By

Sometimes the story isn't what is a trend, but rather what is not a trend. In this second installment about body weight in the U.S., listing what doesn't seem to be contributing factors will help narrow down what might actually be the problem...

Economists . . .

June 13, 2012
By

Catherine Rampell writes: On Monday the Nobel Foundation, which bestows the world’s most prestigious academic, literary and humanitarian prizes, said it was reducing the cash awarded with Nobel Prizes by about 20 percent. . . . Peter A. Diamond, a professor emeritus at the Massachusetts Institute of Technology who also received the Nobel in economic [...]

June 13, 2012
By

Jacob Oaknin asks: Akaike’s selection criterion is often justified on the basis of the empirical risk of a ML estimate being a biased estimate of the true generalization error of a parametric family, say the family, S_m, of linear regressors on an m-dimensional variable x=(x_1,..,x_m) with Gaussian noise independent of x (for instance in “Unifying [...]

Why R is Hard to Learn

June 13, 2012
By

The open source R software for analytics has a reputation for being hard to learn. It certainly can be, especially for people who are already familiar with similar packages such as SAS, SPSS or Stata. Training and documentation that leverages …

Data on GitHub: The easy way to make your data available

June 13, 2012
By

Update (15 June 2012): See this post for instructions on how to download GitHub-based data into R if you are getting the error about an SSL certificate problem. GitHub is designed for collaborating on coding projects. Nonetheless, it is also a pote...

Convergence or divergence? A simple iteration with a random component

June 13, 2012
By

A colleague who works with time series sent me the following code snippet. He said that the calculation was overflowing and wanted to know if this was a bug in SAS: data A(drop=m); call streaminit(12345); m = 2; x = 0; do i = 1 to 5000; x = m*x [...]

Teaching STT 200

June 12, 2012
By

This summer, I am teaching an undergraduate stats class, a first course in statistics that covers three units: descriptive statistics, probability, and statistical inference. The course webpage is here. The following paragraph is from the thesis of Michael Phillip Lesnick. It explains the relationship among the three units: Recall first that in statistics, [...]

Statistics Versus Machine Learning

June 12, 2012
By

—Larry Wasserman Welcome to my blog, which will discuss topics in Statistics and Machine Learning. Some posts will be technical and others will be non-technical. Since this blog is about topics in both Statistics and Machine Learning, perhaps I should address the question: What is the difference between these two fields? The short answer is: [...]

NBA Predictions — Finals

June 12, 2012
By

Now we are on to the finals! The algorithm enters the finals with a 6-4 record so far. Here is what we have for tonight: So, let’s see if OKC wins this one.

Next R meeting in Paris INSEE: ggplot2 and parallel computing

June 12, 2012
By

Hi, our group of R users from INSEE, aka FLR, meets monthly in Paris. The next meeting is on Wed 13 (tomorrow), 1-2 pm, room 539 (an ID is needed to come in, map to access INSEE R), about ggplot2 and parallel computing. Since the first meeting in February, presentations have included hot topics like web scraping, C in R, RStudio, SQLite […]

Finding word use patterns in Wikileaks cables

June 12, 2012
By

6/18: A follow-up to this post is now available here. Recent Discoveries When I was a diplomat, I was always interested in the Wikileaks cables and what could be done with them. Unfortunately, I never got a chance to look at the site in depth, due to ...
