## Absorbing Markov chains in SAS

July 13, 2016
Last week I showed how to represent a Markov transition matrix in the SAS/IML matrix language. I also showed how to use matrix multiplication to iterate a state vector, thereby producing a discrete-time forecast of the state of the Markov chain system. This article shows that the expected behavior of
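As a rough illustration of the idea in this excerpt, iterating a state vector by repeated multiplication with a transition matrix, here is a minimal Python sketch. It is not the SAS/IML code from the post: the 3-state matrix `P`, the starting vector, and the helper names `step` and `forecast` are all assumptions made up for the example.

```python
def step(state, P):
    """Advance a row state vector one time step: new_state = state * P."""
    n = len(P)
    return [sum(state[i] * P[i][j] for i in range(n)) for j in range(n)]


def forecast(state, P, steps):
    """Apply the transition matrix `steps` times to the state vector."""
    for _ in range(steps):
        state = step(state, P)
    return state


# Hypothetical transition matrix (each row sums to 1).
# The last state is absorbing: once entered, the chain never leaves it.
P = [
    [0.5, 0.4, 0.1],
    [0.2, 0.5, 0.3],
    [0.0, 0.0, 1.0],
]

start = [1.0, 0.0, 0.0]            # the chain starts in the first state
after_10 = forecast(start, P, 10)  # state distribution after 10 steps
```

Because the last state is absorbing, the probability mass in the last entry of the forecast grows toward 1 as the number of steps increases.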

## I know you guys think I have no filter, but . . .

July 13, 2016
. . . Someone sent me a juicy bit of news related to one of our frequent blog topics, and I shot back a witty response (or, at least, it seemed witty to me), but I decided not to post it here because I was concerned that people might take it as a personal attack

## Extending R

July 12, 2016
As I was previously unaware of this book coming up, my surprise and excitement were both extreme when I received it from CRC Press a few weeks ago! John Chambers, one of the fathers of S, precursor of R, had just published a book about extending R. It covers some reflections of the author on

## vtreat version 0.5.26 released on CRAN

July 12, 2016
'vtreat' version 0.5.26 has been released on CRAN. 'vtreat' is a data.frame processor/conditioner that prepares real-world data for predictive modeling in a statistically sound manner. (from the package documentation) 'vtreat' is an R package that incorporates a number of transforms and simulated out of

## Some insider stuff on the Stan refactor

July 12, 2016
From the stan-dev list, Bob wrote [and has since added brms based on comments; the * packages are ones that aren't developed or maintained by the stan-dev team, so we only know what we hear from their authors]: The bigger picture is this, and you see the stan-dev/stan repo really spans three logical layers: stan

## It’s more important to know the source than the value of a number

July 12, 2016
Here we go again. ABC News reported that Ricky Williams, former NFL star, proclaimed himself as holding "the world record for most times drug tested". (link) He said he was tested 500 times. During his 11-year career, Williams failed the test four times. So there is one thing we know - the drug testing regime is not much of a deterrent. Since the athlete knows when he is juicing or

## Retro 1990s post

July 11, 2016
I have one more for you on the topic of jail time for fraud . . . Paul Alper points us to a news article entitled, "Michael Hubbard, Former Alabama Speaker, Sentenced to 4 Years in Prison." From the headline this doesn't seem like such a big deal, just run-of-the-mill corruption that we see all

## MCMC effective sample size for difference of parameters (in Bayesian posterior distribution)

July 11, 2016
We'd like the MCMC representation of a posterior distribution to have large effective sample size (ESS) for the relevant parameters. (I recommend ESS > 10,000 for reasonably stable estimates of the limits of the 95% highest density interval.) In man...

## “Most notably, the vast majority of Americans support criminalizing data fraud, and many also believe the offense deserves a sentence of incarceration.”

July 11, 2016
Justin Pickett sends along this paper he wrote with Sean Roche: Data fraud and selective reporting both present serious threats to the credibility of science. However, there remains considerable disagreement among scientists about how best to sanction data fraud, and about the ethicality of selective reporting. OK, let's move away from asking scientists. Let's ask

## On deck this week

July 11, 2016
Mon: "Most notably, the vast majority of Americans support criminalizing data fraud, and many also believe the offense deserves a sentence of incarceration." Tues: Some insider stuff on the Stan refactor Wed: I know you guys think I have no filter, but . . . Thurs: Bigmilk strikes again Fri: "Pointwise mutual information as test

## Break a sentence into words in SAS

July 11, 2016
Two of my favorite string-manipulation functions in the SAS DATA step are the COUNTW function and the SCAN function. The COUNTW function counts the number of words in a long string of text. Here "word" means a substring that is delimited by special characters, such as a space character, a
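For readers without SAS at hand, the word-counting and word-extraction idea can be sketched in Python. This is only a loose analogue, not the behavior of the SAS functions themselves: the function names `countw` and `scan` are borrowed for readability, and the delimiter set `DELIMS` is an assumption chosen for the example.

```python
import re

# Assumed delimiter set for this sketch; the SAS defaults may differ.
DELIMS = " ,.;!?"


def split_words(text, delims=DELIMS):
    """Split text on runs of delimiter characters, dropping empty pieces."""
    pattern = "[" + re.escape(delims) + "]+"
    return [w for w in re.split(pattern, text) if w]


def countw(text, delims=DELIMS):
    """Count the delimiter-separated words in text."""
    return len(split_words(text, delims))


def scan(text, n, delims=DELIMS):
    """Return the n-th word (1-based), or an empty string if out of range."""
    words = split_words(text, delims)
    return words[n - 1] if 1 <= n <= len(words) else ""


sentence = "Hello, world; this is SAS."
```

Looping `n` from 1 to `countw(sentence)` and calling `scan(sentence, n)` each time breaks the sentence into its words one at a time.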

## A Reanalysis of A Study About (Square) Pie Charts from 2009

July 11, 2016
After my recent posting on the results of our pie charts studies, Jorge Camoes teased me on Twitter about square pie charts. So I dug up the data from a study we ran many years ago to look at how well they compare to bars, pies, and squares. In 2009, my then-Ph.D. student Caroline Ziemkiewicz and

## Tuesday update

July 11, 2016
It Might All Be Wrong: Tom Nichols and colleagues have published a paper on the software used to analyze fMRI data: Functional MRI (fMRI) is 25 years old, yet surprisingly its most common statistical methods have not been validated using real data....

## Bayesian variable selection in multiple linear regression: Model with highest R^2 is not necessarily highest posterior probability

July 10, 2016
Chapter 18 of DBDA2E includes sections on Bayesian variable selection in multiple linear regression. The idea is that each predictor (a.k.a., "variable") has an inclusion coefficient $\delta_j$ that can be 0 or 1 (along with its regression coefficien...

## Contemporaneous, Independent, and Complementary

July 10, 2016
You've probably been in a situation where you and someone else discovered something "contemporaneously and independently". Despite the initial sinking feeling, I've come to realize that there's usually nothing to worry about. First, normal-time sc...

## Over at the sister blog, they’re overinterpreting forecasts

July 10, 2016
Matthew Atkinson and Darin DeWitt write, "Economic forecasts suggest the presidential race should be a toss-up. So why aren't Republicans doing better?" Their question arises from a juxtaposition of two apparently discordant facts: 1. "PredictWise gives the Republicans a 35 percent chance of winning the White House." 2. A particular forecasting model (one of many

## Causal and predictive inference in policy research

July 9, 2016
Todd Rogers pointed me to a paper by Jon Kleinberg, Jens Ludwig, Sendhil Mullainathan, and Ziad Obermeyer that begins: Empirical policy research often focuses on causal inference. Since policy choices seem to depend on understanding the counterfactual—what happens with and without a policy—this tight link of causality and policy seems natural. While this link holds

## Using Python decorators to be a lazy programmer: a case study

Decorators are considered one of the more advanced features of Python and are often the last topic in a Python class or introductory book. They are, unfortunately, also a topic that trips up many beginning or even intermediate
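For readers who have not met decorators yet, here is a minimal sketch of the mechanism. It is not the case study from the post: the decorator `count_calls` and the function `add` are hypothetical names invented for this illustration.

```python
import functools


def count_calls(func):
    """Decorator that records how many times the wrapped function is called."""

    @functools.wraps(func)  # keep the original function's name and docstring
    def wrapper(*args, **kwargs):
        wrapper.calls += 1
        return func(*args, **kwargs)

    wrapper.calls = 0  # the counter lives as an attribute on the wrapper
    return wrapper


@count_calls
def add(a, b):
    """Return the sum of a and b."""
    return a + b
```

The `@count_calls` line is shorthand for `add = count_calls(add)`: the name `add` now refers to `wrapper`, which updates the counter and then delegates to the original function.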

## “Participants reported being hungrier when they walked into the café (mean = 7.38, SD = 2.20) than when they walked out [mean = 1.53, SD = 2.70, F(1, 75) = 107.68, P < 0.001]."

July 8, 2016
E. J. Wagenmakers points me to a delightful bit of silliness from PPNAS, "Hunger promotes acquisition of nonfood objects," by Alison Jing Xu, Norbert Schwarz, and Robert Wyer. It has everything we're used to seeing in this literature: small-N, between-subject designs, comparisons of significant to non-significant, and enough researcher degrees of freedom to buy Uri

## Reproducible Research with Stan, R, knitr, Docker, and Git (with free GitLab hosting)

July 7, 2016
Jon Zelner recently developed a neat Docker packaging of Stan, R, and knitr for fully reproducible research. The first in his series of posts (with links to the next parts) is here: * Reproducibility, part 1 The post on making changes online and auto-updating results using GitLab's continuous integration service is here: * GitLab continuous

## Why Be Bayesian? Let Me Count the Ways

July 7, 2016
In answer to an old friend's question. Bayesians have more fun. Our conferences are in better places too. It's the model not the estimator. Life's too short to be a frequentist: In an infinite number of replications ... Software works better...

