## Pedagogical Content Knowledge

May 19, 2013
Pedagogical content knowledge for Statistics: Pedagogical content knowledge means knowing how to teach a specific subject, discipline or context. There is a school of thought that the skill of teaching is transferable between subjects, so long as the teacher knows

## Numerical optimizers for Logistic Regression

May 19, 2013
Following a challenge proposed by Gael to my group I compared several implementations of Logistic Regression. The task was to implement a Logistic Regression model using standard optimization tools from scipy.optimize and compare them against state of ...
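As a rough illustration of the kind of comparison described, the sketch below fits a logistic regression with several `scipy.optimize.minimize` methods. The synthetic data, the small L2 penalty `alpha`, and the choice of methods are assumptions made for this example, not details taken from the post.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.special import expit


def logistic_loss(w, X, y, alpha):
    """Mean logistic loss for labels y in {-1, +1}, plus an L2 penalty."""
    z = y * (X @ w)
    # log(1 + exp(-z)), computed stably with logaddexp
    return np.mean(np.logaddexp(0.0, -z)) + 0.5 * alpha * (w @ w)


def logistic_grad(w, X, y, alpha):
    """Gradient of logistic_loss with respect to w."""
    z = y * (X @ w)
    # d/dz log(1 + exp(-z)) = -1 / (1 + exp(z)) = -expit(-z)
    s = -expit(-z)
    return X.T @ (s * y) / X.shape[0] + alpha * w


# Synthetic data (an assumption for illustration only).
rng = np.random.default_rng(0)
n, d = 500, 5
X = rng.normal(size=(n, d))
w_true = rng.normal(size=d)
y = np.where(X @ w_true + 0.5 * rng.normal(size=n) > 0, 1.0, -1.0)
alpha = 1e-2  # hypothetical regularization strength

results = {}
for method in ("L-BFGS-B", "BFGS", "CG"):
    results[method] = minimize(
        logistic_loss,
        np.zeros(d),
        args=(X, y, alpha),
        jac=logistic_grad,
        method=method,
    )
    res = results[method]
    print(f"{method:9s} loss={res.fun:.6f} iterations={res.nit}")
```

Because the penalized objective is strictly convex, all three methods should reach essentially the same loss; the interesting differences are in iteration counts and run time.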

## Exploratory Data Analysis – Computing Descriptive Statistics in R for Data on Ozone Pollution in New York City

This is the first of a series of posts on exploratory data analysis (EDA). This post will calculate the common summary statistics of a univariate continuous data set – the data on ozone pollution in New York City that is part of the built-in “airquality” data set in R. This is a particularly good data set […]

## Sunday data/statistics link roundup (5/19/2013)

May 19, 2013
This is a ridiculously good post on 20th versus 21st century problems and the rise of the importance of empirical science. I particularly like the discussion of what it means to be a "solved" problem and how that has changed.

## Prose is paragraphs, prose is sentences

May 19, 2013
This isn’t quite right—poetry, too, can be in paragraph form (see Auden, for example, or Frost, or lots of other examples)—but Basbøll is on to something here. I’m reminded of Nicholson Baker’s hilarious “From the I...

## Sharing my R notes

May 19, 2013
I started working with R 2 1/2 years ago. I remember opening R, closing it, and thinking it was the dumbest thing ever (a command line is not inviting to a non-programmer). Now it's my constant friend. From the beginning

## Stein’s Paradox

May 18, 2013
Something that is well known in the statistics world but perhaps less well known in the machine learning world is Stein’s paradox. When I was growing up, people used to say: do you remember where you were when you heard that JFK died? (I was three, so I don’t remember. My first memory […]

## What is probabilistic truth?

May 18, 2013
I am currently working on a validation metric for binary prediction models. That is, models which make predictions about outcomes that can take on either of two possible states (e.g., dead/not dead, heads/tails, cat in picture/no cat in picture, etc.). The most commonly used metric for this class of models is AUC, which assesses the […]
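For readers unfamiliar with AUC, one standard reading is the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case, with ties counted as one half. The sketch below computes it directly from that definition; the function name `auc` and the toy scores are hypothetical and chosen only for illustration.

```python
def auc(scores, labels):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Labels are 1 for positive and 0 for negative; ties count as 1/2.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy example: three of the four positive/negative pairs are ordered correctly.
print(auc([0.9, 0.4, 0.6, 0.3], [1, 1, 0, 0]))  # 0.75
```

The pairwise loop is quadratic in the number of cases, which is fine for a small illustration; rank-based formulas give the same value more efficiently.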

## uuuuuuuuuuuuugly

May 18, 2013
Hamdan Azhar writes: I came across this graphic of vaccine-attributed decreases in mortality and was curious if you found it as unattractive and unintuitive as I did. Hope all is well with you! My reply: All's well with me. And yes, that's one horrible graph. It has all the problems with a bad infographic with

## Bubble sort tuning

May 18, 2013
I was reading Paul Hiemstra's blog post "Much more efficient bubble sort in R using the Rcpp and inline packages", went back to his first post, "Bubble sort implemented in pure R", and thought, surely we can do it better in pure R. So I...

## Chutes & ladders: How long is this going to take?

May 17, 2013
I was playing Chutes & Ladders with my four-year-old daughter yesterday, and I thought, “How long is this going to take?” I saw an interesting mathematical analysis of the game a few years ago, but it seems to be offline, though you can read it via the wayback machine. But that didn’t answer my specific […]

## Where do theories come from?

May 17, 2013
Lee Sechrest sends along this article by Brian Haig and writes that it "presents what seems to me a useful perspective on much of what scientists/statisticians do and how science works, at least in the fields in which I work." Here's Haig's abstract: A broad theory of scientific method is sketched that has particular relevance

## The Future of Non Probability Sampling

May 17, 2013
At the American Association for Public Opinion Research conference in Boston, MA, non-probability samples were something of a recurring theme. I attended the task force panel review on the topic. However, there is currently no commonly accepted solution. It was about one year ago that Pew reported (Pew report) that their […]

## Words & Votes: The Changing Congressional Opinions on Gun Violence

May 17, 2013
The political visualization Words & Votes [sandyhookpromise.org], developed by digital agency R/GA for non-profit organization Sandy Hook Promise, provides a comprehensive look into the opinions of congressional representatives on the issue of gun vio...

## Spatial Statistics Seminar in Toronto – Tuesday, May 21, 2013 @ SAS Canada Headquarters

I volunteer with the Southern Ontario Regional Association (SORA) of the Statistical Society of Canada (SSC) to organize a seminar series on business analytics here in Toronto.  The final seminar of the 2012-2013 series will be held on Tuesday, May 21 at SAS Canada Headquarters.  If you’re interested in attending, please email seminar.sora@gmail.com with the […]

## When does replication reveal fraud?

May 17, 2013
Here's a little thought experiment for your weekend pleasure. Consider the following: Joe Scientist decides to conduct a study (call it Study A) to test the hypothesis that a parameter D > 0 vs. the null hypothesis that D = 0. He

## How can statisticians help psychologists do their research better?

May 17, 2013
I received two emails yesterday on related topics. First, Stephen Olivier pointed me to this post by Daniel Lakens, who wrote the following open call to statisticians: You would think that if you are passionate about statistics, then you want to help people to calculate them correctly in any way you can. . . .

## Analyzing a simple experiment with heterogeneous variances using asreml, MCMCglmm and SAS

May 17, 2013
I was working with a small experiment which includes families from two Eucalyptus species and thought it would be nice to code a first analysis using alternative approaches. The experiment is a randomized complete block design, with species as a fixed effect and family and block as random effects, while the response variable is growth […]

## Finding patterns in time series using regular expressions

May 17, 2013
Regular expressions are a fantastic tool when you’re looking for patterns in time series. I wish I’d realised that sooner. Here’s a timely example: traditionally, when you have two successive quarters of negative GDP growth, you’re in recession. We have a quarterly GDP time series for Australia, and we want to know how many recessions […]
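The idea can be sketched in a few lines: encode each quarter's growth as a single character, then let a regular expression find runs of two or more negative quarters. The growth figures below are invented for illustration (they are not Australian GDP data), and `find_recessions` is a hypothetical helper name; the original post may take a different approach.

```python
import re


def find_recessions(growth):
    """Return (start, end) index pairs for runs of 2+ negative quarters.

    Each quarter is encoded as "N" (negative growth) or "P" (zero or
    positive growth), so a recession is any match of the pattern N{2,}.
    The end index is exclusive, as in Python slices.
    """
    signs = "".join("N" if g < 0 else "P" for g in growth)
    return [(m.start(), m.end()) for m in re.finditer(r"N{2,}", signs)]


# Invented quarterly growth rates (percent), for illustration only.
growth = [0.5, -0.2, -0.1, 0.3, -0.4, 0.2, -0.1, -0.3, -0.2, 0.6]
recessions = find_recessions(growth)
print(len(recessions), recessions)  # 2 [(1, 3), (6, 9)]
```

Note that a single negative quarter (index 4 here) is not matched, and a run of three negative quarters counts as one recession rather than two overlapping ones.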

## IJF quality indicators

May 17, 2013
I often receive email asking about IJF quality indicators. Here is one I received today. Dear Professor Hyndman, I recently had a paper published in IJF entitled, “xxxxxxxxxxxx”. I am very pleased with the publication and consider IJF to be an excellent outlet for my work in time-series econometrics. I have an unusual request, but I hope you will consider responding. My research is judged by non-economists and IJF is…

## Multiple Pie Charts

May 16, 2013
I was looking at a report the other day that was comparing the number of subgroups in several sets of data. The author had decided that the best way to show the quantity of each subgroup was using a pie chart. All well and good but as there were 12 d...

## How do we choose our default methods?

May 16, 2013
I was asked to write an article for the Committee of Presidents of Statistical Societies (COPSS) 50th anniversary volume. Here it is (it's labeled as "Chapter 1," which isn't right; that's just what came out when I used the template that was supplied). The article begins as follows: The field of statistics continues to be

## I don’t like 401(k) either

May 16, 2013
Felix Salmon hates the 401(k), and he explains his reasoning here. His strongest argument is the data, which shows that the first generation of retirees who grew up with these individual retirement savings accounts find themselves with meager retirement savings (average: \$120,000, excluding those with zero). I have always disliked 401(k), and here are some reasons: I hate the myth of individual control. These accounts (just like health savings accounts…