Time-series forecasting: Bike Accidents

August 20, 2013

About a year ago I posted this video visualization of all the reported accidents involving bicycles in Montreal between 2006 and 2010. In the process I also calculated and plotted the accident rate using a monthly moving average. The results followed a pattern that was for the most part to be expected. The rate shoots up […]

“[” and “[[” with the apply() functions

August 20, 2013

Did you know you can use "[" and "[[" as function names for subsetting with calls to the apply-type functions? For example, suppose you have a bunch of identifier strings like "ZYY-43S-CWA3" and you want to pull off the bit before the first hyphen ("ZYY" in this case). (For code to create random IDs like […]
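
A minimal sketch of the idea in R, assuming the IDs are split on hyphens with `strsplit()`; apart from "ZYY-43S-CWA3", the example IDs are made up for illustration:

```r
# Hypothetical identifier strings; only the first one appears in the teaser.
ids <- c("ZYY-43S-CWA3", "ABC-12D-EFG4", "QRS-99T-UVW5")

# strsplit() returns a list with one character vector of pieces per ID.
parts <- strsplit(ids, "-", fixed = TRUE)

# "[" is an ordinary function, so sapply() can call it on each list element;
# the trailing 1 is passed along as the index, i.e. pieces[1] for every ID.
prefix <- sapply(parts, "[", 1)

# "[[" works the same way when exactly one element is wanted per list entry.
prefix_alt <- sapply(parts, "[[", 1)

print(prefix)
```

Passing `"["` by name avoids writing an anonymous function such as `function(p) p[1]`.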

When did statistics jump the shark?

August 20, 2013

Statistics jumped the shark the moment they adopted the following definition (Gelman & Hill, page 13): A probability distribution corresponds to an urn with a potentially infinite number of balls inside. When a ball is drawn at random, the “...

A couple of requests for the @Statistics2013 future of statistics workshop

August 20, 2013

Statistics 2013 is hosting a workshop on the future of statistics. Given the timing and the increasing popularity of our discipline, I think it's a great idea to showcase the future of our field. I just have two requests: Please …

Correcting for multiple comparisons in a Bayesian regression model

August 20, 2013

Joe Northrup writes: I have a question about correcting for multiple comparisons in a Bayesian regression model. I believe I understand the argument in your 2012 paper in Journal of Research on Educational Effectiveness that when you have a hierarchical model there is shrinkage of estimates towards the group-level mean and thus there is no […]

Light entertainment: Hidden time, and shifted label

August 20, 2013

Rick (via Twitter) tells me he is baffled by this chart that showed up in Financial Review: I'm baffled as well. What might the designer have in mind? Based on the cues such as length of the curves, one would...

Electronic lab notebook

August 20, 2013

I was interested to read C. Titus Brown’s recent post, “Is version control an electronic lab notebook?” I think version control is really important, and I think all computational scientists should have something equivalent to a lab notebook. But I think of version control as serving needs orthogonal to those served by a lab notebook. […]

Step by step to build my first R Hadoop System

August 20, 2013

By Yanchang Zhao, RDataMining.com. After reading documents and tutorials on MapReduce and Hadoop and playing with RHadoop for about two weeks, I have finally built my first R Hadoop system and successfully run some R examples on it. My experience …

ChainLadder 0.1.6 released with chain-ladder factor models

August 20, 2013

Version 0.1.6 of the ChainLadder package has been released and is already available from CRAN. The new version adds the function CLFMdelta. CLFMdelta finds consistent weighting parameters delta for a vector of selected age-to-age chain-ladder factors fo...

Exploratory Data Analysis: Useful R Functions for Exploring a Data Frame

Introduction: Data in R are often stored in data frames, because they can store multiple types of data. (In R, data frames are more general than matrices, because matrices can only store one type of data.) Today’s post highlights some common functions in R that I like to use to explore a data frame before […]

MovieGalaxies: the Social Graph of Popular Movies

August 19, 2013

Movie Galaxies [moviegalaxies.com], developed by Jermain Kaminski and Michael Schober, provides an alternative, data-driven experience to the story lines of popular movies. Based on each movie script, all the interactions of the main characters are ...

Statistics and Dr. Strangelove

August 19, 2013

One of the biggest embarrassments in statistics is that we don’t really have confidence bands for nonparametric function estimation. This is a fact that we tend to sweep under the rug. Consider, for example, estimating a density $p(x)$ from a sample $X_1, \ldots, X_n$. The kernel estimator with kernel $K$ and bandwidth $h$ is $\hat{p}_h(x) = \frac{1}{n} \sum_{i=1}^{n} \frac{1}{h} K\left(\frac{x - X_i}{h}\right)$. Let’s start with getting a confidence […]
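
A minimal sketch of a kernel density estimator in R, for readers who want to see it concretely. It assumes the standard form (an average of kernels centred on each observation, scaled by the bandwidth), a Gaussian kernel, simulated data, and an arbitrary bandwidth of 0.3. The helper name `kernel_estimate` is hypothetical and nothing here is taken from the post itself.

```r
# Kernel density estimate at each point of x:
# the mean of K((x - X_i) / h) over the observations, divided by h.
kernel_estimate <- function(x, obs, h, K = dnorm) {
  sapply(x, function(x0) mean(K((x0 - obs) / h)) / h)
}

set.seed(1)
obs <- rnorm(200)                    # assumption: simulated sample
step <- 0.01
grid <- seq(-6, 6, by = step)        # evaluation points
p_hat <- kernel_estimate(grid, obs, h = 0.3)

# Sanity check: a density estimate should integrate to roughly 1.
total_mass <- sum(p_hat) * step
print(total_mass)
```

This gives only the point estimate; it does not address the confidence-band problem the post raises.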

Mean Values

August 19, 2013

Statistical parameters are used to describe a population and are often based on a large number of observations in public …

The Bayesian Counterpart of Pearson’s Correlation Test

August 19, 2013

Except for maybe the t test, a contender for the title “most used and abused statistical test” is Pearson’s correlation test. Whenever someone wants to check if two variables relate somehow it is a safe bet (at least in psychology) that the fir...

BDA3 still (I hope) at 40% off! (and a link to one of my favorite papers)

August 19, 2013

Follow the Amazon link and check to see if it’s still on sale. P.S. I don’t make any money through this link. We do get some royalties from the book, but only a very small amount. I’m pushing the Amazon link right now because (a) I think the book is great, and I want as […]

Exponential Smoothing and Stochastic Volatility

August 19, 2013

Exponential smoothing is alive and well, and evolving. For the latest, check out Neil Shephard's important 2013 working paper, "Martingale Unobserved Component Models." (Fortunately for North America, the link to Neil's home page will soon be...

Book review: Data Points by Nathan Yau

August 19, 2013

One of my summer projects is to develop the curriculum for a new Certificate in Analytics and Data Visualization, offered at NYU (link). (If you are interested in teaching these courses, please contact me.) The program aims to give students...

A letter to reporters on the economy

August 19, 2013

The New York Times took over 1,000 words to tell us that Big Data won't change the economy (or is it the economists' profession?) ("Is Big Data an economic Big Dud?") I'm less pessimistic; I think the collection of vast troves of observational data is ultimately beneficial but only if (a) we set a high bar for analytics, such as requiring multiple corroborating data sources pointing to the same conclusion;…

Errors that cause SAS to "freeze"… and what to do about them

August 19, 2013

Even the best programmers make mistakes. For most errors, SAS software displays the nature and location of the error, returns control to the programmer, and awaits further instructions. However, there are a handful of insidious errors that cause SAS to think that a statement or program is not finished. For [...]

Fitting psychometric functions using STAN

August 19, 2013

STAN is a new system for Bayesian inference, similar to BUGS and JAGS. I’ve played with it a bit and it’s quite promising; it really has the potential to make MCMC less of a pain (on simple models). I’ve written a short introduction to fitting psychometric functions using STAN and R, in case that’s useful […]

Forecasting a Timeseries

August 19, 2013

Suppose you have decided on a suitable model for a timeseries. In this case, we have selected an ARIMA(2,1,3) model, using the Akaike Information Criterion (AIC) as our sole criterion for choosing between various models here, where we model the DJIA. Note: There are many criteria for choosing a model, and the AIC is only […]
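
A rough sketch of forecasting from an ARIMA(2,1,3) model in base R. The DJIA data are not included here, so the series below is simulated; the coefficients, the seed, the 10-step horizon, and the choice of `method = "ML"` are all assumptions made for illustration rather than the post's own settings.

```r
set.seed(42)

# Assumption: a simulated ARIMA(2,1,3) series stands in for the DJIA.
y <- arima.sim(
  model = list(order = c(2, 1, 3),
               ar = c(0.5, -0.3),
               ma = c(0.4, 0.2, 0.1)),
  n = 500
)

# Fit the chosen model by maximum likelihood.
fit <- arima(y, order = c(2, 1, 3), method = "ML")

# The AIC of the fit, the criterion used to compare candidate models.
fit_aic <- AIC(fit)

# Forecast 10 steps ahead; predict() returns point forecasts and standard errors.
fc <- predict(fit, n.ahead = 10)

# Approximate 95% intervals under a normality assumption.
lower <- fc$pred - 1.96 * fc$se
upper <- fc$pred + 1.96 * fc$se

print(cbind(forecast = fc$pred, lower = lower, upper = upper))
```

For an integrated model like this one, the intervals typically widen as the horizon grows.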

The Perfect Visualization

August 19, 2013

There are many rules about how to visualize data. We know how to encode specific types of data, what visual encodings work well, and what does not work so well. But is there such a thing as a perfect visualization for a given set of data? This is a topic that comes up every now […]

