March 30, 2013
The R statistical package is available from the CRAN website. On that site, do the following: click Download R for Windows, then go to the base subdirectory to install R for the first time, and finally download the latest version of R for Windows...

## Householder matrices

March 29, 2013
Householder matrices are square matrices of the form $$P = I - \beta v v^T$$ where $\beta$ is a scalar and $v$ is a vector. They have the useful property that, for suitably chosen $v$ and $\beta$, the product $P x$ zeros out all of the coordinat...
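The excerpt stops before saying how $v$ and $\beta$ are chosen. The sketch below assumes the common textbook choice $v = x + \operatorname{sign}(x_1)\,\|x\|_2\, e_1$ and $\beta = 2/(v^T v)$, which zeros every coordinate of $x$ except the first. The helper name `householder` is hypothetical, not taken from the post.

```python
import numpy as np

def householder(x):
    """Return (v, beta) so that (I - beta * v v^T) x is a multiple of e_1.

    Assumes the textbook choice v = x + sign(x_1) * ||x|| * e_1, beta = 2 / (v^T v).
    """
    x = np.asarray(x, dtype=float)
    norm_x = np.linalg.norm(x)
    if norm_x == 0.0:
        # Zero vector: nothing to annihilate, so P = I (beta = 0) works.
        return np.zeros_like(x), 0.0
    v = x.copy()
    # Adding sign(x_1) * ||x|| to the first entry avoids cancellation.
    sign = 1.0 if x[0] >= 0 else -1.0
    v[0] += sign * norm_x
    beta = 2.0 / (v @ v)
    return v, beta

x = np.array([3.0, 4.0, 0.0, 12.0])   # ||x|| = 13
v, beta = householder(x)
P = np.eye(len(x)) - beta * np.outer(v, v)
y = P @ x
print(y)  # approximately [-13, 0, 0, 0]
```

In practice $P$ is rarely formed explicitly; applying $Px = x - \beta v (v^T x)$ costs $O(n)$ instead of $O(n^2)$.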

## Open Data Exchange 2013, April 6. Montreal

March 29, 2013
UPDATE: The day was great! There are many people doing really amazing things with open data, and it was a pleasure to meet them. Here are my slides from the panel talk. Next Saturday, I’ll be sitting on a panel discussing future avenues for open data at ODX13. From the ODX13 site: ODX13 is a mini-conference […]

## Another Feller theory

March 29, 2013
My paper with Christian Robert, “Not Only Defended But Also Applied”: The Perceived Absurdity of Bayesian Inference, was recently published in The American Statistician, along with discussions by Steve Fienberg, Steve Stigler, Deborah Mayo, and Wesley Johnson, and our rejoinder, The Anti-Bayesian Moment and Its Passing. These articles revolved around the question of why the [...]

## The mirage of large numbers

March 29, 2013
The first thing one learns (or should learn) about statistics is "all that data is not information." That's the very first thing I tell my class each semester. This message is doubly resonant in this age of "Big Data". *** I was reading a post on Dell at Felix Salmon's blog, a post written by Ryan McCarthy or Ben Walsh. It cited BusinessWeek's Roben Farzad: "When it comes to putting a price…

## latent Gaussian model workshop in Reykjavik

March 28, 2013
An announcement for an Icelandic meeting next September, a meeting I would have loved to attend (darn!)… This meeting is sponsored by the BayesComp session, of course!!! We are pleased to announce that the University of Iceland will host the 3rd Workshop on Bayesian Inference for Latent Gaussian Models with Applications (LGM). The workshop will be [...]

## Generalized Pairs Plot: It’s about time!

March 28, 2013
JW Emerson, WA Green, B Schloerke, J Crowley, D Cook, H Hofmann, H Wickham (2013) The Generalized Pairs Plot. Journal of Computational and Graphical Statistics 22(1). Here's a free preprint version. Until this new paper and implementation by Emerson et al., there were no widely available pairs plots that accommodated both numerical and categorical fields. [...]

## Benford law and lognormal distributions

March 28, 2013

Benford’s law is nowadays extremely popular (see e.g. http://en.wikipedia.org/…). It is usually claimed that, for a given data set, changing units does not affect the distribution of the first digit. Thus, it should be related to scale-invariant distributions. Heuristically, scale (or unit) invariance means that the density of the measure (or probability function) should be proportional to $1/x$. Thus, because densities integrate to 1, the proportionality coefficient has…
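The excerpt is cut off before its argument about lognormals. As a rough numerical check (not the post's own code), the sketch below draws lognormal samples and compares their first-digit frequencies with Benford's law, $P(D = d) = \log_{10}(1 + 1/d)$ for $d = 1, \dots, 9$. A lognormal with a large spread on the log scale is expected to be close to Benford, while a narrow one is not. The values `sigma=3.0` and `sigma=0.2`, the sample size, the seed, and the helper names are arbitrary illustrative choices.

```python
import numpy as np

def first_digit(x):
    """Leading (most significant) decimal digit of each positive entry of x."""
    x = np.asarray(x, dtype=float)
    # Divide by the largest power of 10 not exceeding x, giving a value in [1, 10).
    scaled = x / 10.0 ** np.floor(np.log10(x))
    # Clip guards against rare floating-point edge cases near exact powers of 10.
    return np.clip(np.floor(scaled).astype(int), 1, 9)

def digit_frequencies(x):
    """Empirical frequencies of first digits 1..9."""
    counts = np.bincount(first_digit(x), minlength=10)[1:10]
    return counts / counts.sum()

# Benford probabilities log10(1 + 1/d), d = 1..9 (they sum to log10(10) = 1).
benford = np.log10(1.0 + 1.0 / np.arange(1, 10))

rng = np.random.default_rng(2013)
n = 200_000
wide = rng.lognormal(mean=0.0, sigma=3.0, size=n)    # spans many decades
narrow = rng.lognormal(mean=0.0, sigma=0.2, size=n)  # concentrated near 1

dev_wide = np.max(np.abs(digit_frequencies(wide) - benford))
dev_narrow = np.max(np.abs(digit_frequencies(narrow) - benford))
print(f"max deviation from Benford, sigma=3.0: {dev_wide:.4f}")
print(f"max deviation from Benford, sigma=0.2: {dev_narrow:.4f}")
```

The narrow sample puts roughly half its mass in $[1, 2)$, so digit 1 is heavily over-represented relative to Benford's $\log_{10} 2 \approx 0.301$.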

## Racism!

March 28, 2013
I was reading a book of Alfred Kazin’s letters—I don’t know if they’d be so interesting to someone who hadn’t already read a bunch of his stuff, but I found them pretty interesting—and came across this amazing bit, dated August 11, 1957: No, really, Al. Tell us what you really feel. This was in his [...]

## The state of charting software

March 28, 2013
Andrew Wheeler took the time to write code (in SPSS) to create the "Scariest Chart ever". I previously wrote about my own attempt to remake the famous chart in grayscale. I complained that this is a chart that is...

## Amanda Knox and Statistical Nullification

March 27, 2013
In today’s New York Times, Leila Schneps and Coralie Colmez correctly warn that … math can become a weapon that impedes justice and destroys innocent lives. They discuss Lucia de Berk and Sally Clark, two unfortunate people who were convicted of crimes based on bogus statistical arguments. Statistician Richard Gill helped get de Berk’s conviction […]

## Metropolitain: Exploring the Paris Metro in 3D

March 27, 2013
Metropolitain [metropolitain.io], developed by French data visualization studio Dataveyes, provides a dynamic 2D and 3D view of the hectic and feverish metro of the city of Paris. Based on data on crowd turnouts retrieved from RATP (Autonomous Opera...

## Last session of Caltech’s Learning from Data course starts April 2

March 27, 2013
I just received this email: Caltech's Machine Learning MOOC is coming to an end this spring, with the final session starting on April 2. There will be no future sessions. The course has attracted more than 200,000 participants since its launch last year...

## Higgs analysis and statistical flukes (part 2)

March 27, 2013
Everyone was excited when the Higgs boson results were reported on July 4, 2012, indicating evidence for a Higgs-like particle based on a “5 sigma observed effect”. The observed effect refers to the number of excess events of a given type that are “observed” in comparison to the number (or proportion) that would be expected from background […]
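The excerpt does not spell out the arithmetic behind "5 sigma". Under the usual convention, assuming a normally distributed test statistic and a one-sided test against the background-only hypothesis, 5 sigma corresponds to the upper-tail probability computed below. It is the probability, computed assuming background alone, of an excess at least this large; it is not the probability that the background hypothesis is true. The function name is hypothetical.

```python
import math

def one_sided_p_value(z):
    """Upper-tail probability P(Z >= z) for a standard normal Z."""
    # 1 - Phi(z) = erfc(z / sqrt(2)) / 2, computed without cancellation.
    return 0.5 * math.erfc(z / math.sqrt(2.0))

p = one_sided_p_value(5.0)
print(f"p = {p:.3e}  (about 1 in {1 / p:,.0f})")  # p is approximately 2.87e-07
```

Using `math.erfc` directly keeps full precision in the far tail, where `1 - Phi(z)` would lose digits to cancellation.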

## Evolutionary Computation and Data Mining in Biology

March 27, 2013
For over 15 years, members of the computer science, machine learning, and data mining communities have gathered in a beautiful European location each spring to share ideas about biologically inspired computation. Stemming from the work of John Ho...

## “Two Dogmas of Strong Objective Bayesianism”

March 27, 2013
Prasanta Bandyopadhyay and Gordon Brittan write: We introduce a distinction, unnoticed in the literature, between four varieties of objective Bayesianism. What we call ‘strong objective Bayesianism’ is characterized by two claims, that all scientific inference is ‘logical’ and that, given the same background information two agents will ascribe a unique probability to their priors. We [...]

## What’s Congress saying?

March 27, 2013
Thanks to Codecademy's API primer, I'm able to access a bunch of cool information related to what the U.S. Congress is up to. For example, some of the more interesting of the top 20 most popular phrases said by New York's members of Congress are (rank...

## Big Data and official statistics

March 27, 2013
From: http://www.bruegel.org/nc/blog/detail/article/1059-blogs-review-big-data-aggregates-and-individuals/#.UVL84BxhWCk

Big Data is relevant to the production, relevance and reliability of key official statistics such as GDP and inflation. Michael Horrig...

## My talk at the University of Michigan today 4pm

March 27, 2013
Causality and Statistical Learning Andrew Gelman, Statistics and Political Science, Columbia University Wed 27 Mar, 4pm, Betty Ford Auditorium, Ford School of Public Policy Causal inference is central to the social and biomedical sciences. There are unresolved debates about the meaning of causality and the methods that should be used to measure it. As a [...]

## Mix percent metaphors, add average confusion, and serve

March 27, 2013
Sometimes, a chart just strains your mind. Such is the case with the following, a tip from Augustine F. (@acfou). There are just so many percentages on the chart that it's really hard to figure out which is which. Under the...

## How to compute the distance between observations in SAS

March 27, 2013
In statistics, distances between observations are used to form clusters, to identify outliers, and to estimate distributions. Distances are used in spatial statistics and in other application areas. There are many ways to define the distance between observations. I have previously written an article that explains Mahalanobis distance, which is [...]
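The post's examples are in SAS and are not reproduced here. As a language-neutral sketch (not the author's code), the pairwise Euclidean and city-block distances between the rows (observations) of a data matrix can be computed with NumPy broadcasting. The helper name `row_distances` is hypothetical, and the approach uses $O(n^2 p)$ memory, so it suits small data sets only.

```python
import numpy as np

def row_distances(X, metric="euclidean"):
    """n x n matrix of distances between the rows (observations) of X."""
    X = np.asarray(X, dtype=float)
    # diff[i, j, :] = X[i] - X[j]; shape (n, n, p).
    diff = X[:, None, :] - X[None, :, :]
    if metric == "euclidean":
        return np.sqrt((diff ** 2).sum(axis=-1))
    if metric == "cityblock":
        return np.abs(diff).sum(axis=-1)
    raise ValueError(f"unknown metric: {metric}")

# Three observations with two variables each.
X = np.array([[0.0, 0.0], [3.0, 4.0], [6.0, 8.0]])
D = row_distances(X)                 # D[0, 1] == 5.0, D[0, 2] == 10.0
C = row_distances(X, "cityblock")    # C[0, 1] == 7.0, C[0, 2] == 14.0
print(D)
print(C)
```

The result is symmetric with zeros on the diagonal; other metrics, such as the Mahalanobis distance mentioned in the post, would additionally need a covariance matrix.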

## Got Data from People? Take Dan Ariely’s Coursera course.

March 26, 2013
A Beginner's Guide to Irrational Behavior started yesterday.  One might not immediately think that such a course would be relevant for statistical modeling.  Well, it is if your statistical modeling uses people as informants.  If the dat...

## An instructor’s thoughts on peer-review for data analysis in Coursera

March 26, 2013
I used peer review for the data analysis course I just finished. As I mentioned in the post-mortem podcast, I knew in advance that it was likely to be the most controversial component of the class. So it wasn’t surprising that based …