## Comparing two groups? Two tips that make a difference

August 21, 2013
A common visualization is to compare characteristics of two groups. This article emphasizes two tips that will help make the comparison clear. First, consider graphing the differences between the groups. Second, in any plot that has a categorical axis, sort the categories by a meaningful quantity. This article is motivated [...]
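
The two tips can be sketched in a few lines of Python before any plotting happens. The category names and values below are invented for illustration and are not taken from the article; the point is only the order of operations: compute per-category differences, then sort by them.

```python
# Hypothetical data: a percentage measured in each category for two groups.
group_a = {"North": 42.0, "South": 35.0, "East": 51.0, "West": 38.0}
group_b = {"North": 39.0, "South": 44.0, "East": 47.0, "West": 30.0}

# Tip 1: work with the difference between the groups, category by category.
differences = {cat: group_a[cat] - group_b[cat] for cat in group_a}

# Tip 2: order the categorical axis by a meaningful quantity
# (here the difference itself, largest first) rather than alphabetically.
ordered = sorted(differences.items(), key=lambda item: item[1], reverse=True)

for category, diff in ordered:
    print(f"{category:<6} {diff:+.1f}")
```

Feeding `ordered` to a bar or dot plot then puts the largest gaps at one end of the axis, which is what makes the comparison easy to read.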

## Las Vegas and financial institutions

August 21, 2013
Exactly one month ago, I entered the Bellagio casino to play roulette. It was actually a request from my daughter’s godfather (who happens to be a probabilist). In a comment on a previous post, he suggested the following deal: In the Bellagio you put 10\$ for me on the 33 and 10\$ for you as well. If 33 shows up, you bring me to a French “3…

## Job opening at an organization that promotes reproducible research!

August 21, 2013
I was told about an organization called Reproducibility Initiative. They tell me they are trying to make what was described in our “50 shades of gray” post standard across all of science, particularly areas like cancer research. I don’t know anything else about them, but that sounds like a good start! Here’s the ad: Data […]

## BKLYNR: Mapping the Age of each Building in Brooklyn

August 20, 2013
BKLYNR [bklynr.com] by web designer Thomas Rhiel is a highly detailed map that reveals the age of each of the more than 320,000 buildings currently present in Brooklyn. The interactive map reveals how the historical urban development has rippled acro...

## Time-series forecasting: Bike Accidents

August 20, 2013
About a year ago I posted this video visualization of all the reported accidents involving bicycles in Montreal between 2006 and 2010. In the process I also calculated and plotted the accident rate using a monthly moving average. The results followed a pattern that was for the most part to be expected. The rate shoots up […]
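
A trailing moving average of the kind mentioned above can be sketched as follows. The window length and the monthly counts are made up for illustration; the post's actual data and smoothing window are not given in this excerpt.

```python
from collections import deque


def moving_average(values, window):
    """Return the trailing moving average for each full window of `values`."""
    if window <= 0:
        raise ValueError("window must be positive")
    buf = deque(maxlen=window)  # keeps only the most recent `window` values
    averages = []
    for v in values:
        buf.append(v)
        if len(buf) == window:
            averages.append(sum(buf) / window)
    return averages


# Hypothetical monthly accident counts (not the Montreal data).
monthly_counts = [10, 14, 30, 62, 80, 75, 40, 12]
smoothed = moving_average(monthly_counts, window=3)
print(smoothed)
```

The output has `len(values) - window + 1` entries because the first average is only available once a full window has been seen.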

## “[” and “[[” with the apply() functions

August 20, 2013
Did you know you can use "[" and "[[" as function names for subsetting with calls to the apply-type functions? For example, suppose you have a bunch of identifier strings like "ZYY-43S-CWA3" and you want to pull off the bit before the first hyphen ("ZYY" in this case). (For code to create random IDs like […]
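
In R, the idiom usually looks like `sapply(strsplit(ids, "-"), "[", 1)`; the post's own code is truncated above, so that line is a guess at its shape. As a rough Python analogue (not from the post), `operator.itemgetter` plays the same role of turning the indexing operator into a function that can be mapped over a list. The second ID below is invented for illustration.

```python
from operator import itemgetter

ids = ["ZYY-43S-CWA3", "ABC-12D-EFG4"]  # second ID is made up

# Split each ID on hyphens, then apply "indexing" as a function:
# itemgetter(0) behaves like `lambda parts: parts[0]`.
first_parts = list(map(itemgetter(0), (s.split("-") for s in ids)))
print(first_parts)
```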

## When did statistics jump the shark?

August 20, 2013
Statistics jumped the shark the moment they adopted the following definition, (Gelman & Hill, page 13): A probability distribution corresponds to an urn with a potentially infinite number of balls inside. When a ball is drawn at random, the “...

## A couple of requests for the @Statistics2013 future of statistics workshop

August 20, 2013
Statistics 2013 is hosting a workshop on the future of statistics. Given the timing and the increasing popularity of our discipline, I think it's a great idea to showcase the future of our field. I just have two requests: Please …

## Correcting for multiple comparisons in a Bayesian regression model

August 20, 2013
Joe Northrup writes: I have a question about correcting for multiple comparisons in a Bayesian regression model. I believe I understand the argument in your 2012 paper in Journal of Research on Educational Effectiveness that when you have a hierarchical model there is shrinkage of estimates towards the group-level mean and thus there is no […]

## Light entertainment: Hidden time, and shifted label

August 20, 2013
Rick (via Twitter) tells me he is baffled by this chart that showed up in Financial Review: I'm baffled as well. What might the designer have in mind? Based on the cues such as length of the curves, one would...

## Electronic lab notebook

August 20, 2013
I was interested to read C. Titus Brown’s recent post, “Is version control an electronic lab notebook?” I think version control is really important, and I think all computational scientists should have something equivalent to a lab notebook. But I think of version control as serving needs orthogonal to those served by a lab notebook. […]

## Step by step to build my first R Hadoop System

August 20, 2013
By Yanchang Zhao, RDataMining.com. After reading documents and tutorials on MapReduce and Hadoop and playing with RHadoop for about 2 weeks, I have finally built my first R Hadoop system and successfully run some R examples on it. My experience …

August 20, 2013
Version 0.1.6 of the ChainLadder package has been released and is already available from CRAN. The new version adds the function CLFMdelta. CLFMdelta finds consistent weighting parameters delta for a vector of selected age-to-age chain-ladder factors fo...

## Exploratory Data Analysis: Useful R Functions for Exploring a Data Frame

Introduction: Data in R are often stored in data frames, because they can store multiple types of data. (In R, data frames are more general than matrices, because matrices can only store one type of data.) Today’s post highlights some common functions in R that I like to use to explore a data frame before […]

## MovieGalaxies: the Social Graph of Popular Movies

August 19, 2013
Movie Galaxies [moviegalaxies.com], developed by Jermain Kaminski and Michael Schober, provides an alternative, data-driven experience of the story lines of popular movies. Based on each movie script, all the interactions of the main characters are ...

## Statistics and Dr. Strangelove

August 19, 2013
One of the biggest embarrassments in statistics is that we don’t really have confidence bands for nonparametric function estimation. This is a fact that we tend to sweep under the rug. Consider, for example, estimating a density $p$ from a sample $X_1, \dots, X_n$. The kernel estimator with kernel $K$ and bandwidth $h$ is $\hat{p}_h(x) = \frac{1}{n} \sum_{i=1}^{n} \frac{1}{h} K\left(\frac{x - X_i}{h}\right)$. Let’s start with getting a confidence […]
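
For readers who want something concrete, here is a minimal sketch of a one-dimensional kernel density estimator: an average of rescaled kernels centred on the sample points. The Gaussian kernel, the function names, and the toy sample are all assumptions made for illustration; the post does not say which kernel or data it uses, and nothing here addresses the confidence-band question itself.

```python
import math


def gaussian_kernel(u):
    """Standard normal density, used here as the smoothing kernel (an assumption)."""
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)


def kernel_density(x, sample, bandwidth, kernel=gaussian_kernel):
    """Kernel density estimate at point x: (1 / (n h)) * sum of K((x - X_i) / h)."""
    if bandwidth <= 0:
        raise ValueError("bandwidth must be positive")
    if not sample:
        raise ValueError("sample must be non-empty")
    n = len(sample)
    return sum(kernel((x - xi) / bandwidth) for xi in sample) / (n * bandwidth)


# Toy sample, invented for illustration.
sample = [-1.0, 0.0, 2.0]
estimate_at_zero = kernel_density(0.0, sample, bandwidth=0.5)
print(estimate_at_zero)
```

Smaller bandwidths give spikier estimates and larger ones give smoother estimates, which is part of why uncertainty statements about the estimated curve are delicate.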

## Mean Values

August 19, 2013
Statistical parameters are used to describe a population and are often based on a large number of observations in public …

## The Bayesian Counterpart of Pearson’s Correlation Test

August 19, 2013
Except for maybe the t test, a contender for the title “most used and abused statistical test” is Pearson’s correlation test. Whenever someone wants to check if two variables relate somehow it is a safe bet (at least in psychology) that the fir...

## BDA3 still (I hope) at 40% off! (and a link to one of my favorite papers)

August 19, 2013
Follow the Amazon link and check to see if it’s still on sale. P.S. I don’t make any money through this link. We do get some royalties from the book, but only a very small amount. I’m pushing the Amazon link right now because (a) I think the book is great, and I want as […]

## Exponential Smoothing and Stochastic Volatility

August 19, 2013
Exponential smoothing is alive and well, and evolving. For the latest, check out Neil Shephard's important 2013 working paper, "Martingale Unobserved Component Models." (Fortunately for North America, the link to Neil's home page will soon be...

## Book review: Data Points by Nathan Yau

August 19, 2013
One of my summer projects is to develop the curriculum for a new Certificate in Analytics and Data Visualization, offered at NYU (link). (If you are interested in teaching these courses, please contact me.) The program aims to give students...

## A letter to reporters on the economy

August 19, 2013
The New York Times took over 1,000 words to tell us that Big Data won't change the economy (or is it the economists' profession?) in "Is Big Data an economic Big Dud?". I'm less pessimistic; I think the collection of vast troves of observational data is ultimately beneficial, but only if (a) we set a high bar for analytics, such as requiring multiple corroborating data sources pointing to the same conclusion;…

