Multiple Pie Charts

May 16, 2013
I was looking at a report the other day that was comparing the number of sub groups in several sets of data. The author had decided that the best way to show the quantity of each sub group was using a pie chart. All well and good but as there were 12 d...

How do we choose our default methods?

May 16, 2013
I was asked to write an article for the Committee of Presidents of Statistical Societies (COPSS) 50th anniversary volume. Here it is (it’s labeled as “Chapter 1,” which isn’t right; that’s just what came out when I used the template that was supplied). The article begins as follows: The field of statistics continues to be [...] The post How do we choose our default methods? appeared first on Statistical Modeling, Causal…

I don’t like 401(k) either

May 16, 2013
Felix Salmon hates the 401(k), and he explains his reasoning here. His strongest argument is the data, which shows that the first generation of retirees who grew up with these individual retirement savings accounts find themselves with meager retirement savings (average: \$120,000, excluding those with zero). I have always disliked 401(k), and here are some reasons: I hate the myth of individual control. These accounts (just like health savings accounts…

Prediction results!

May 16, 2013
$e^{\beta_{1}}$

The moment you've all been waiting for has arrived. No, not the day Ryan Gosling actually eats his cereal, but rather the results of the highly anticipated study which attempted to predict the likelihood of a college student having smoked marijuana giv...

Does quantum uncertainty have a place in everyday applied statistics?

May 15, 2013
Several months ago, Mike Betancourt and I wrote a discussion for the article, Can quantum probability provide a new direction for cognitive modeling?, by Emmanuel Pothos and Jerome Busemeyer, in Behavioral and Brain Sciences. We didn’t say much, but it was a milestone for me because, with this article, BBS became the 100th journal I’d [...] The post Does quantum uncertainty have a place in everyday applied statistics? appeared first on…

Automated Archival and Visual Analysis of Tweets Mentioning #bog13, Bioinformatics, #rstats, and Others

May 15, 2013
Automatically Archiving Twitter Results: Ever since Twitter gamed its own API and killed off great services like IFTTT triggers, I've been looking for a way to automatically archive tweets containing certain search terms of interest to me. Twitter's buil...

Big News! “Practical Data Science with R” MEAP launched!

May 15, 2013
Nina Zumel and I (John Mount) have been working very hard on producing an exciting new book called “Practical Data Science with R.” The book has now entered the Manning Early Access Program (MEAP), which allows you to subscribe to chapters as they become available and give us feedback before the book goes into […]

The bright future of applied statistics

May 15, 2013
In 2013, the Committee of Presidents of Statistical Societies (COPSS) celebrates its 50th Anniversary. As part of its celebration, COPSS will publish a book, with contributions from past recipients of its awards, titled “Past, Present and Future of Statistical Science”. Below is …

Reputations changeable, situations tolerable

May 15, 2013
David Kessler, Peter Hoff, and David Dunson write: Marginally specified priors for nonparametric Bayesian estimation. Prior specification for nonparametric Bayesian inference involves the difficult task of quantifying prior knowledge about a parameter of high, often infinite, dimension. Realistically, a statistician is unlikely to have informed opinions about all aspects of such a parameter, but may [...] The post Reputations changeable, situations tolerable appeared first on Statistical Modeling, Causal Inference, and Social…

More power brings more responsibility

May 15, 2013
Nick C. on Twitter sent us to the following chart of salaries in Major League Soccer. (link) This chart is hosted at Tableau, which is one of the modern visualization software suites. It appears to be a user submission. Alas,...

Variance matrix differences

May 15, 2013
Torturing portfolios to give different volatilities between a factor model and Ledoit-Wolf shrinkage. Previously: there have been posts on “What the hell is a variance matrix?”, factor models, and Ledoit-Wolf shrinkage. Question: two of the several ways to produce an estimate of the variance matrix of asset returns are a statistical factor model and Ledoit-Wolf shrinkage. …

How to vectorize computations in a matrix language

May 15, 2013
Last week someone posted an interesting question to the SAS/IML Support Community. The problem involved four nested DO loops and took hours to run. By transforming several nested DO loops into an equivalent matrix operation, I was able to reduce the run time to about one second. The process of [...]

Forecasting annual totals from monthly data

May 15, 2013
This question was posed on crossvalidated.com: I have a monthly time series (for 2009–2012 non-stationary, with seasonality). I can use ARIMA (or ETS) to obtain point and interval forecasts for each month of 2013, but I am interested in forecasting the total for the whole year, including prediction intervals. Is there an easy way in R to obtain interval forecasts for the total for 2013? I’ve come across this problem…
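The excerpt stops before the author's answer, so the following is only a minimal sketch of one common approach, not the method from the post: simulate many future sample paths from a fitted model, sum each path over the 12 months, and read the interval off the empirical quantiles of those totals. The function names and the random-walk path generator are hypothetical stand-ins; in practice the paths would come from the fitted ARIMA or ETS model.

```python
import random
import statistics


def annual_total_interval(simulate_path, n_sims=2000, level=0.95, seed=0):
    """Estimate the mean and a prediction interval for a yearly total.

    simulate_path(rng) must return one simulated future path of monthly
    values; here it is a hypothetical stand-in for a fitted model.
    """
    rng = random.Random(seed)
    # Sum each simulated path, then sort the totals to read off quantiles.
    totals = sorted(sum(simulate_path(rng)) for _ in range(n_sims))
    lower_idx = int((1 - level) / 2 * (n_sims - 1))
    upper_idx = int((1 + level) / 2 * (n_sims - 1))
    return statistics.mean(totals), totals[lower_idx], totals[upper_idx]


def random_walk_path(rng, last_value=100.0, sigma=5.0, horizon=12):
    """Hypothetical toy model: a Gaussian random walk from the last value."""
    path, level = [], last_value
    for _ in range(horizon):
        level += rng.gauss(0.0, sigma)
        path.append(level)
    return path


mean_total, lower, upper = annual_total_interval(random_walk_path)
print(f"total forecast {mean_total:.1f}, 95% interval [{lower:.1f}, {upper:.1f}]")
```

Summing whole simulated paths keeps the correlation between monthly forecast errors, which adding up the twelve monthly intervals would ignore.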

From a random generator to a sample function

May 15, 2013
This weekend, I wrote a post because I had some trouble generating a random sample with R to reproduce one obtained by a co-author with SAS (generated using the Fishman and Moore (1982) generator used in the function RANUNI). I was lucky, since another contributor to that book, Christophe Dutang, got the answer to the last question I asked: is it possible to reproduce the random generator? Yes, we can. And…

Le Monde puzzle [#820]

May 14, 2013
The current puzzle is… puzzling: Given the set {1,…,N} with N<61, one iterates the following procedure: take (x,y) within the set and replace the pair with the smallest divisor of x+y (other than 1). What are the values of N such that the final value in the set is 61? I find it puzzling because the […]

SIR Model – The Flue Season – Dynamic Programming

May 14, 2013
The SIR (susceptible, infected, and recovered) model is a common and useful tool in epidemiological modelling. In this post and in future posts I hope to explore how this basic model can be enriched by including different population group...
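For readers unfamiliar with the basic model, here is a minimal discrete-time sketch of the standard SIR dynamics. It is not the author's code (the excerpt does not show it), and the function name, parameter names, and the values in the usage line are assumptions chosen for illustration.

```python
def simulate_sir(s0, i0, r0, beta, gamma, steps):
    """Simulate a basic discrete-time SIR model.

    beta is the per-step transmission rate and gamma the per-step
    recovery rate (both hypothetical parameters, assumed to lie in [0, 1]).
    Returns a list of (S, I, R) tuples, one per time step.
    """
    n = s0 + i0 + r0  # total population, constant throughout
    s, i, r = float(s0), float(i0), float(r0)
    history = [(s, i, r)]
    for _ in range(steps):
        # New infections are proportional to susceptible-infected contacts.
        new_infections = min(beta * s * i / n, s)
        # A fixed share of the infected recovers each step.
        new_recoveries = min(gamma * i, i)
        s -= new_infections
        i += new_infections - new_recoveries
        r += new_recoveries
        history.append((s, i, r))
    return history


history = simulate_sir(s0=990, i0=10, r0=0, beta=0.3, gamma=0.1, steps=100)
peak_infected = max(i for _, i, _ in history)
print(f"peak number infected: {peak_infected:.1f}")
```

Enriching the model with different population groups, as the post proposes, would amount to tracking one (S, I, R) triple per group and letting the infection term depend on contacts across groups.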

Much more efficient bubble sort in R using the Rcpp and inline packages

May 14, 2013
Recently I wrote a blogpost showing the implementation of a simple bubble sort algorithm in pure R code. The downside of that implementation was that it was awfully slow. And by slow, I mean really slow, as in “a 100…
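As a reminder of the algorithm under discussion, here is a minimal bubble sort sketch. It is written in Python purely for illustration; the post itself uses R with the Rcpp and inline packages, and this is not the author's implementation.

```python
def bubble_sort(values):
    """Return a sorted copy of values using bubble sort."""
    items = list(values)  # work on a copy so the input is left untouched
    # After each pass the largest remaining item has bubbled up to index `end`.
    for end in range(len(items) - 1, 0, -1):
        swapped = False
        for j in range(end):
            if items[j] > items[j + 1]:
                items[j], items[j + 1] = items[j + 1], items[j]
                swapped = True
        if not swapped:
            break  # no swaps in a full pass means the list is sorted
    return items


print(bubble_sort([5, 2, 9, 1, 5, 6]))
```

The two nested loops make the algorithm quadratic in the input length, which is why an interpreted, element-by-element implementation is slow and why moving the loops into compiled code helps.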

“A sense of security regarding the future of statistical science…” Anon review of Error and Inference

May 14, 2013
Aris Spanos, my colleague and co-author (Economics), recently came across this seemingly anonymous review of our Error and Inference (2010) [E & I]. It’s interesting that the reviewer remarks that “The book gives a sense of security regarding the future of statistical science and its importance in many walks of life.” I wish I knew just what […]

Forecast Update: Will 2014 be the Beginning of the End for SAS and SPSS?

May 14, 2013
I recently updated my plots of the data analysis tools used in academia in my ongoing article, The Popularity of Data Analysis Software. I repeat those here and update my previous forecast of data analysis software usage. Learning to use …

GPstuff: Bayesian Modeling with Gaussian Processes

May 14, 2013
I think it’s part of my duty as a blogger to intersperse, along with the steady flow of jokes, rants, and literary criticism, some material that will actually be useful to you. So here goes. Jarno Vanhatalo, Jaakko Riihimäki, Jouni Hartikainen, Pasi Jylänki, Ville Tolvanen, and Aki Vehtari write: The GPstuff toolbox is a versatile [...] The post GPstuff: Bayesian Modeling with Gaussian Processes appeared first on Statistical Modeling, Causal Inference,…

So much medical research is pretend-science

May 14, 2013
Medical researchers are somehow allowed to get away with statistical murder. It upsets me to read the article in Forbes titled "Pet Owners May Have Lower Risk For Heart Disease." (link) This article takes the form of many other similar articles that purport to find an association between some risk factor and a common disease. Note they always use the weasel word "may". If you see this word, and immediately…

Claims Inflation – a known unknown

May 14, 2013
Over the last year I worked with two colleagues of mine on the subject of inflation and claims inflation in particular. I didn't expect it to be such a challenging topic, but we ended up with more questions than answers. The key question and biggest ch...

Stan!

May 13, 2013
Guy Freeman writes: I thought you’d all like to know that Stan was used and referenced in a peer-reviewed Rapid Communications paper on influenza. Thank you for this excellent modelling language and sampler, which made it possible to carry out this work quickly! I haven’t actually read the paper, but I’m happy to see Stan [...] The post Stan! appeared first on Statistical Modeling, Causal Inference, and Social Science.