## Time-series forecasting: Bike Accidents

August 20, 2013

About a year ago I posted this video visualization of all the reported accidents involving bicycles in Montreal between 2006 and 2010. In the process I also calculated and plotted the accident rate using a monthly moving average. The results followed a pattern that was for the most part to be expected. The rate shoots up […]

## “[” and “[[” with the apply() functions

August 20, 2013

Did you know you can use "[" and "[[" as function names for subsetting with calls to the apply-type functions? For example, suppose you have a bunch of identifier strings like "ZYY-43S-CWA3" and you want to pull off the bit before the first hyphen ("ZYY" in this case). (For code to create random IDs like […]
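The excerpt is cut off before the post's own code, so the following is only a rough illustration. In R the task is commonly written as `sapply(strsplit(ids, "-"), "[", 1)`. A loosely analogous Python sketch uses `operator.itemgetter` in the role of passing `"["` as a function; the second and third IDs below are made up for illustration.

```python
from operator import itemgetter

# Hypothetical identifier strings; only the first one appears in the post.
ids = ["ZYY-43S-CWA3", "ABC-12D-EFG4", "QRS-99T-UVW5"]

# itemgetter(0) is a function object that means "take element 0",
# standing in for R's "[" called with index 1.
first = itemgetter(0)

# Split each ID on hyphens, then apply the subsetting function to every piece list.
prefixes = list(map(first, (s.split("-") for s in ids)))

print(prefixes)
```

The point in both languages is the same: subsetting is itself a function, so it can be handed to a map-style helper instead of being wrapped in an anonymous function.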

## When did statistics jump the shark?

August 20, 2013

Statistics jumped the shark the moment they adopted the following definition (Gelman & Hill, page 13): A probability distribution corresponds to an urn with a potentially infinite number of balls inside. When a ball is drawn at random, the “...

## A couple of requests for the @Statistics2013 future of statistics workshop

August 20, 2013

Statistics 2013 is hosting a workshop on the future of statistics. Given the timing and the increasing popularity of our discipline, I think it's a great idea to showcase the future of our field. I just have two requests: Please …

## Correcting for multiple comparisons in a Bayesian regression model

August 20, 2013

Joe Northrup writes: I have a question about correcting for multiple comparisons in a Bayesian regression model. I believe I understand the argument in your 2012 paper in Journal of Research on Educational Effectiveness that when you have a hierarchical model there is shrinkage of estimates towards the group-level mean and thus there is no […]

## Light entertainment: Hidden time, and shifted label

August 20, 2013

Rick (via Twitter) tells me he is baffled by this chart that showed up in Financial Review: I'm baffled as well. What might the designer have in mind? Based on the cues such as length of the curves, one would...

## Electronic lab notebook

August 20, 2013

I was interested to read C. Titus Brown’s recent post, “Is version control an electronic lab notebook?” I think version control is really important, and I think all computational scientists should have something equivalent to a lab notebook. But I think of version control as serving needs orthogonal to those served by a lab notebook. […]

## Step by step to build my first R Hadoop System

August 20, 2013

by Yanchang Zhao, RDataMining.com. After reading documents and tutorials on MapReduce and Hadoop and playing with RHadoop for about two weeks, I have finally built my first R Hadoop system and successfully run some R examples on it. My experience …

## ChainLadder 0.1.6 released with chain-ladder factor models

August 20, 2013

Version 0.1.6 of the ChainLadder package has been released and is already available from CRAN. The new version adds the function CLFMdelta. CLFMdelta finds consistent weighting parameters delta for a vector of selected age-to-age chain-ladder factors fo...

## Exploratory Data Analysis: Useful R Functions for Exploring a Data Frame

Introduction: Data in R are often stored in data frames, because they can store multiple types of data. (In R, data frames are more general than matrices, because matrices can only store one type of data.) Today’s post highlights some common functions in R that I like to use to explore a data frame before […]

## MovieGalaxies: the Social Graph of Popular Movies

August 19, 2013

Movie Galaxies [moviegalaxies.com], developed by Jermain Kaminski and Michael Schober, provides an alternative, data-driven experience of the story lines of popular movies. Based on each movie script, all the interactions of the main characters are ...

## Statistics and Dr. Strangelove

August 19, 2013

One of the biggest embarrassments in statistics is that we don’t really have confidence bands for nonparametric function estimation. This is a fact that we tend to sweep under the rug. Consider, for example, estimating a density $p(x)$ from a sample $X_1, \ldots, X_n$. The kernel estimator with kernel $K$ and bandwidth $h$ is $\hat{p}_h(x) = \frac{1}{n} \sum_{i=1}^{n} \frac{1}{h} K\left(\frac{x - X_i}{h}\right)$. Let’s start with getting a confidence […]
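For readers who want to see the kernel estimator concretely, here is a minimal sketch of the standard kernel density estimator, which averages a rescaled kernel centred at each sample point. The Gaussian kernel, the sample values, and the bandwidth are all assumptions chosen for illustration; none of them come from the post, and the sketch does not attempt the confidence bands the post goes on to discuss.

```python
import math

def gaussian_kernel(u):
    """Standard normal density, used here as the kernel (an assumed choice)."""
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)

def kde(x, sample, h):
    """Kernel density estimate at x: average of kernels scaled by bandwidth h."""
    n = len(sample)
    return sum(gaussian_kernel((x - xi) / h) for xi in sample) / (n * h)

# Made-up sample and bandwidth, for illustration only.
sample = [-1.2, -0.4, 0.1, 0.3, 0.9, 1.5]
h = 0.5

# Sanity check: the estimate should integrate to roughly 1 over a wide grid.
step = 0.01
grid = [-6.0 + step * i for i in range(1201)]
area = sum(kde(x, sample, h) for x in grid) * step

print(round(area, 3))
```

A smaller bandwidth gives a wigglier estimate and a larger one a smoother estimate, which is part of why honest confidence bands are hard to construct.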

## Mean Values

August 19, 2013

Statistical parameters are used to describe a population and are often based on a large number of observations in public …

## The Bayesian Counterpart of Pearson’s Correlation Test

August 19, 2013

Except for maybe the t test, a contender for the title “most used and abused statistical test” is Pearson’s correlation test. Whenever someone wants to check if two variables relate somehow it is a safe bet (at least in psychology) that the fir...

## BDA3 still (I hope) at 40% off! (and a link to one of my favorite papers)

August 19, 2013

Follow the Amazon link and check to see if it’s still on sale. P.S. I don’t make any money through this link. We do get some royalties from the book, but only a very small amount. I’m pushing the Amazon link right now because (a) I think the book is great, and I want as […]

## Exponential Smoothing and Stochastic Volatility

August 19, 2013

Exponential smoothing is alive and well, and evolving. For the latest, check out Neil Shephard's important 2013 working paper, "Martingale Unobserved Component Models." (Fortunately for North America, the link to Neil's home page will soon be...
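As background, here is a minimal sketch of textbook simple exponential smoothing, in which each smoothed level is a weighted average of the newest observation and the previous level. It is not the martingale unobserved component model from Shephard's paper; the function name, the data, and the smoothing weight `alpha` are assumptions made for illustration.

```python
def exponential_smoothing(series, alpha):
    """Simple exponential smoothing: level_t = alpha * x_t + (1 - alpha) * level_{t-1}.

    The first level is initialised to the first observation (one common convention).
    """
    if not 0.0 < alpha <= 1.0:
        raise ValueError("alpha must lie in (0, 1]")
    if not series:
        return []
    smoothed = [series[0]]
    for x in series[1:]:
        smoothed.append(alpha * x + (1.0 - alpha) * smoothed[-1])
    return smoothed

# Made-up data, for illustration only.
data = [10.0, 12.0, 11.0, 15.0]
levels = exponential_smoothing(data, alpha=0.5)

print(levels)
```

The last smoothed level serves as the one-step-ahead forecast; a larger `alpha` tracks recent observations more closely, while a smaller one smooths more heavily.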

## Book review: Data Points by Nathan Yau

August 19, 2013

One of my summer projects is to develop the curriculum for a new Certificate in Analytics and Data Visualization, offered at NYU (link). (If you are interested in teaching these courses, please contact me.) The program aims to give students...

## A letter to reporters on the economy

August 19, 2013

The New York Times took over 1,000 words to tell us that Big Data won't change the economy (or is it the economists' profession?) ("Is Big Data an economic Big Dud?") I'm less pessimistic; I think the collection of vast troves of observational data is ultimately beneficial but only if (a) we set a high bar for analytics, such as requiring multiple corroborating data sources pointing to the same conclusion;…

## Errors that cause SAS to "freeze"… and what to do about them

August 19, 2013

Even the best programmers make mistakes. For most errors, SAS software displays the nature and location of the error, returns control to the programmer, and awaits further instructions. However, there are a handful of insidious errors that cause SAS to think that a statement or program is not finished. For [...]

## Fitting psychometric functions using STAN

August 19, 2013

STAN is a new system for Bayesian inference, similar to BUGS and JAGS. I’ve played with it a bit and it’s quite promising: it really has the potential to make MCMC less of a pain (on simple models). I’ve written a short introduction to fitting psychometric functions using STAN and R, in case that’s useful […]

## Forecasting a Timeseries

August 19, 2013

Suppose you have decided on a suitable model for a timeseries. In this case, we have selected an ARIMA(2,1,3) model for the DJIA, using the Akaike Information Criterion (AIC) as our sole criterion for choosing between the various models. Note: There are many criteria for choosing a model, and the AIC is only […]

## The Perfect Visualization

August 19, 2013

There are many rules about how to visualize data. We know how to encode specific types of data, what visual encodings work well, and what does not work so well. But is there such a thing as a perfect visualization for a given set of data? This is a topic that comes up every now […]