## The robust beauty of improper linear models in decision making

August 14, 2013
By

Andreas Graefe writes (see here, here, and here): The usual procedure for developing linear models to predict any kind of target variable is to identify a subset of most important predictors and to estimate weights that provide the best possible solution for a given sample. The resulting “optimally” weighted linear composite is then used when predicting […] The post The robust beauty of improper linear models in decision making appeared first on…

## R tutorials

August 14, 2013
By

My course on non-life insurance (ACT2040) will start in a few weeks. I will use R to illustrate predictive modeling. A nice introduction for those who do not know R can be found online.

## Attribution in online marketing: a Big Data problem

August 14, 2013
By

Avinash Kaushik's masterful post on the "multi-channel attribution problem" in Web analytics is required reading for anyone seeking an understanding of what Big Data is really about. Kaushik's posts are marathons; I provide here a little background, plus some highlights from his post to save you some time. But you absolutely should read the whole thing! I will start from the elementary. Big Data is big because the Internet was…

## Using AIC to Test ARIMA Models

August 14, 2013
By

The Akaike Information Criterion (AIC) is a widely used measure of the quality of a statistical model. It quantifies 1) the goodness of fit and 2) the simplicity/parsimony of the model in a single statistic. When comparing two models, the one with the lower AIC is generally “better”. Now, let us apply this powerful tool in comparing […]
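As a rough illustration of the comparison described in this excerpt, the sketch below uses the standard definition AIC = 2k − 2 ln(L), where k is the number of estimated parameters and L is the maximized likelihood. The function name `aic`, the model labels, and the log-likelihood values are all hypothetical and chosen only for illustration; they are not taken from the original post.

```python
def aic(log_likelihood, n_params):
    """Akaike Information Criterion: 2k - 2*ln(L)."""
    return 2 * n_params - 2 * log_likelihood


# Made-up candidates: (label, maximized log-likelihood, number of parameters).
# These numbers are illustrative assumptions, not fitted results.
candidates = [
    ("ARIMA(1,0,0)", -100.0, 3),
    ("ARIMA(2,0,1)", -98.5, 5),
]

# Score every candidate, then keep the one with the lowest AIC.
scores = {label: aic(ll, k) for label, ll, k in candidates}
best = min(scores, key=scores.get)

for label, score in scores.items():
    print(f"{label}: AIC = {score:.1f}")
print("Preferred model:", best)
```

In this made-up case the larger model fits slightly better, but the two extra parameters cost more than the gain in log-likelihood, so the simpler model ends up with the lower AIC.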

## Dryer balls and drying time: A statistical analysis

August 14, 2013
By

Earlier this week I posted a "guest blog" in which my 8th grade son described a visualization of data for the 2013 ASA Poster Competition. The purpose of today's blog post is to present a higher-level statistical analysis of the same data. I will use a t test and a [...]

## How do I re-arrange??: Ordering a plot revisited

August 14, 2013
By

Back in October of last year I wrote a blog post about reordering/rearranging plots. This was, and continues to be, a frequent question on listservs and R help sites. In light of my recent studies/presenting on The Mechanics of … Continue reading →

## The Brand as Affordance: Item Response Modeling of Brand Perceptions

August 14, 2013
By

It is just too easy to think of a brand as a web of associations.  What comes to mind when I say "Subway Sandwich"?  Did you remember a commercial or the "eat fresh" tagline?  Without much effort, one can generate a long list of associat...

## WANTED: Neuro-quants

August 13, 2013
By

Our good colleagues Brian Caffo, Martin Lindquist, and Ciprian Crainiceanu have written a nice editorial for the HuffPo on the need for statisticians in neuroimaging.

## Baseball’s Steroids Problem Won’t Go Away

August 13, 2013
By

Steroids continue to plague professional sports. The latest name to fall is Alex Rodriguez, the superstar shortstop/third baseman who currently plays for the Yankees. It wasn't long ago that he was considered a "good guy" of the sport. Now, he's a pariah. In the rush to make Rodriguez the villain, the media continues to miss these two important aspects of the steroids story: Anti-doping tests have a huge false-negative problem.…

## Blogging E.S. Pearson’s Statistical Philosophy

August 13, 2013
By

For a bit more on the statistical philosophy of Egon Sharpe (E.S.) Pearson (11 Aug, 1895-12 June, 1980), I reblog a post from last year. It gets to the question I now call: performance or probativeness? Are frequentist methods mainly useful to supply procedures which will not err too frequently in some long run? (performance) […]

## Convincing Evidence

August 13, 2013
By

Keith O’Rourke and I wrote an article that begins: Textbooks on statistics emphasize care and precision, via concepts such as reliability and validity in measurement, random sampling and treatment assignment in data collection, and causal identification and bias in estimation. But how do researchers decide what to believe and what to trust when choosing which […] The post Convincing Evidence appeared first on Statistical Modeling, Causal Inference, and Social Science.

## Test scores and grades predict job performance (but maybe not at Google)

August 13, 2013
By

Eric Loken writes: If you’re used to Google upending conventional wisdom, then yesterday’s interview with Laszlo Bock in the New York Times did not disappoint. Google has determined that test scores and transcripts are useless because they don’t predict performance among its employees. . . . I [Loken] am going to assume they’re well aware […] The post Test scores and grades predict job performance (but maybe not at Google) appeared…

## Various ways to show variability

August 13, 2013
By

Reader Doeke W. sends me to this chart. I like many aspects of this exercise. This chart displays the results of an experiment conducted by a computer games company to show that the new build ("249") renders frames faster than...

## When Discussing Confidence Level With Others…

August 13, 2013
By

This post spawned from a discussion I had the other day. Confidence intervals are a notoriously difficult topic for those unfamiliar with statistics. I can’t really think of another statistical topic that is so widely published in newspaper articles, television, and elsewhere that so few people really understand. It’s been this way since the moment […]

## Damaged and can’t be opened

August 13, 2013
By

Perhaps you tried to open some application or mount some DMG on your Mac and encountered the following alarming message “[Application] is damaged and can’t be opened. You should move it to the trash.” Perhaps it is indeed damaged. But more likely it is just not signed by its developer or not made available from […]

## Genetic drift simulation

August 13, 2013
By

While preparing for the new teaching semester I have created an implementation of NetLogo GenDrift P local in GNU R. The model works as follows. Initially a square grid having side size is randomly populated with n types of agen...

## Installing a SSD drive into a mid-2007 iMac

August 13, 2013
By

I have a mid-2007 iMac with a 2.4 GHz Core2Duo processor and despite the fact that it is already six years old, it still does a good job. However, compared to a friend's recent MacBook Air with a solid state disk (SSD) it feels sluggish when opening pr...

## The Golden Age of Information Graphics

August 13, 2013
By

Infographics today are mostly pointless decorations around a few simple facts that add nothing meaningful. But information graphics once deserved their name with dense, meticulously-drawn, well-researched information. Here is an example from 1944. The Lawrence Livermore National Lab recently posted this Chart of Electromagnetic Radiations, which was originally published in 1944, on their Flickr stream. […]

## Krugman’s "Very Serious Person" (VSP)

August 13, 2013
By

Paul Krugman's term "VSP" is simply wonderful: so concise and apt, capturing a personage previously vaguely sensed but never fully grasped. And of course it's funny too. Hence it's even better than classics from decades past, like WASP (coined, by the ...

## Short tales of two NCAA basketball conferences (Big 12 and West Coast) using graphs

August 12, 2013
By

UPDATE: THE BLOG/SITE HAS MOVED TO GITHUB. THE NEW LINK FOR THE BLOG/SITE IS patilv.github.io and THE LINK TO THIS POST IS: http://bit.ly/1kvathJ. PLEASE UPDATE ANY BOOKMARKS YOU MAY HAVE. Having been at the University of Kansas (Kansas Jayhawks) as a s...

## Exploratory Data Analysis: The 5-Number Summary – Two Different Methods in R

Introduction Continuing my recent series on exploratory data analysis (EDA), today’s post focuses on 5-number summaries, which were previously mentioned in the post on descriptive statistics in this series.  I will define and calculate the 5-number summary in 2 different ways that are commonly used in R.  (It turns out that different methods arise from […]
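The excerpt is cut off before it names the two methods, so the sketch below is only a guess at the kind of difference involved. It assumes the two approaches are Tukey's hinges (the rule behind R's `fivenum()`) and linearly interpolated quartiles (the default behaviour of R's `quantile()`), and it mirrors them in plain Python. The function names and the sample data are hypothetical.

```python
import math
import statistics


def tukey_five_number(values):
    """Five-number summary using Tukey's hinges (the rule R's fivenum() follows)."""
    x = sorted(values)
    n = len(x)
    if n == 0:
        raise ValueError("need at least one value")
    # 1-based (possibly fractional) positions of min, lower hinge,
    # median, upper hinge, max in the sorted data.
    n4 = math.floor((n + 3) / 2) / 2
    positions = [1, n4, (n + 1) / 2, n + 1 - n4, n]
    # A fractional position averages the two neighbouring order statistics.
    return [0.5 * (x[math.floor(p) - 1] + x[math.ceil(p) - 1]) for p in positions]


def quantile_five_number(values):
    """Five-number summary using linearly interpolated quartiles."""
    x = sorted(values)
    # method="inclusive" interpolates between order statistics, which matches
    # the default (type 7) rule of R's quantile(); needs at least two values.
    q1, q2, q3 = statistics.quantiles(x, n=4, method="inclusive")
    return [float(x[0]), q1, q2, q3, float(x[-1])]


data = [1, 2, 3, 4, 5, 6, 7, 8]  # hypothetical sample
print("Tukey hinges:  ", tukey_five_number(data))
print("Interpolated:  ", quantile_five_number(data))
```

For this sample the minimum, median, and maximum agree, while the quartiles differ (2.5 and 6.5 for the hinges versus 2.75 and 6.25 with interpolation), which is the sort of discrepancy that arises when the quartile definition changes.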

## Understanding the ENSEMBL Schema

August 12, 2013
By

ENSEMBL is a frequently used resource for various genomics and transcriptomics tasks.  The ENSEMBL website and MART tools provide easy access to their rich database, but ENSEMBL also provides flat-file downloads of their entire database and a publ...

## Fixing the race, ethnicity, and national origin questions on the U.S. Census

August 12, 2013
By

In his new book, “What is Your Race? The Census and Our Flawed Efforts to Classify Americans,” former Census Bureau director Ken Prewitt recommends taking the race question off the decennial census: He recommends gradual changes, integrating the race and national origin questions while improving both. In particular, he would replace the main “race” question […] The post Fixing the race, ethnicity, and national origin questions on the U.S. Census appeared…