In the previous installment of this saga, I admitted that my predictions had completely failed, and I outlined the debugging process I began. Then the semester happened, so I didn't get to work on it again until last week. It turns out that there ...

Matt Briggs’s comment on outliers in his post Tyranny of the mean: Coontz used the word “outliers”. There are no such things. There can be mismeasured data, i.e. incorrect data, say when you tried to measure air temperature but your thermometer fell into boiling water. Or there can be errors in recording the data; transposition […]

Anirban Bhattacharya, Debdeep Pati, Natesh Pillai, and David Dunson write: Penalized regression methods, such as L1 regularization, are routinely used in high-dimensional applications, and there is a rich literature on optimality properties under sparsity assumptions. In the Bayesian paradigm, sparsity is routinely induced through two-component mixture priors having a probability mass at zero, but such [...]

I will be at Book Expo this Friday signing books at the McGraw-Hill booth. If you're in NYC, drop by and say hi between 11 and 12. Yes, it's a new book! The title is Numbersense: How to Use Big Data to Your Advantage (link). If you read my blogs, you already know where I'm going with this. How can we be smart consumers of data analyses in a world…

The power of a statistical test measures the test's ability to detect a specific alternative hypothesis. For example, educational researchers might want to compare the mean scores of boys and girls on a standardized test. They plan to use the well-known two-sample t test. The null hypothesis is that the [...]
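As a rough illustration of the idea of power in the teaser above, here is a minimal Python sketch. It is not the method from the original post: it swaps in the normal (z) approximation to the two-sample t test so that only the standard library is needed, and the function name `two_sample_power` and its parameters (`delta`, `sigma`, `n_per_group`, `alpha`) are hypothetical names chosen for this example.

```python
from math import sqrt
from statistics import NormalDist


def two_sample_power(delta, sigma, n_per_group, alpha=0.05):
    """Approximate power of a two-sided, two-sample test of equal means.

    Assumptions (not from the original post): both groups have the same
    size and the same known standard deviation `sigma`, and the normal
    approximation to the t test is adequate.
    """
    std_normal = NormalDist()
    # Two-sided critical value, e.g. about 1.96 for alpha = 0.05.
    z_crit = std_normal.inv_cdf(1 - alpha / 2)
    # Standard error of the difference between the two sample means.
    se = sigma * sqrt(2 / n_per_group)
    # True mean difference expressed in standard-error units.
    shift = delta / se
    # Probability that the test statistic lands in either rejection region.
    return std_normal.cdf(shift - z_crit) + std_normal.cdf(-shift - z_crit)


if __name__ == "__main__":
    # Power to detect a 5-point difference when scores have sd = 10.
    for n in (16, 32, 64, 128):
        print(n, round(two_sample_power(delta=5, sigma=10, n_per_group=n), 3))
```

When the true difference is zero, the function returns `alpha`, which is the probability of a false rejection; as the group size grows, the power to detect a fixed nonzero difference rises toward 1.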

K. W. Staley, Associate Professor, Department of Philosophy, Saint Louis University. (Almost) All about error. Book review, Metascience (2012) 21:709–713, DOI 10.1007/s11016-011-9618-1. Deborah G. Mayo and Aris Spanos (eds): Error and inference: Recent exchanges on experimental reasoning, reliability, objectivity, and rationality. New York: Cambridge University Press, 2010, xvii+419 pp. The ERROR’06 (experimental reasoning, reliability, objectivity, […]

There has been a lot of discussion among statisticians about big data and what statistics should do to get involved. Recently Steve M. and Larry W. took up the same issue on their blog. I have been thinking about this …

Paul Alper writes: Unless I missed it, you haven't commented on the recent article of Michael Bang Peterson [with Daniel Sznycer, Aaron Sell, Leda Cosmides, and John Tooby]. It seems to have been reviewed extensively in the lay press. A typical example is here. This review begins with "If you are physically strong, social science [...]

True story (no really, this did actually happen). While in grad school one of the other teaching assistants was approached by one of the students and was asked “will mu go out with median?” The teaching assistant thought the play on words was pretty funny, laughed, and then cluelessly walked away. All of us other grad students […]

Almost once every year someone asks if R has a package for running the MaxDiff procedure sold by Sawtooth. One such inquiry recently received a reply with a link showing in some detail the R code needed to generate a balanced incomplete...

Saw Argo the other day, was impressed by the way it was filmed in such a 70s style, sorta like that movie The Limey or an episode of the Rockford Files. I also felt nostalgia for that relatively nonviolent era. All those hostages and nobody was killed. It's a good thing the Ayatollah didn't have [...]

Steve Marron is a statistician at UNC. In his younger days he was well known for his work on nonparametric theory. These days he works on a number of interesting things including analysis of structured objects (like tree-structured data) and high dimensional theory. Steve sent me a thoughtful email the other day about “Big Data” […]

I received the following email: I am trying to develop a Bayesian model to represent the process through which individual consumers make online product rating decisions. In my model each individual faces a total of J product options and for each product option (j) each individual (i) needs to make three sequential decisions: - First he decides [...]

Has anyone noticed that the REG procedure in SAS/STAT 12.1 produces heat maps instead of scatter plots for fit plots and residual plots when the regression involves more than 5,000 observations? I wasn't aware of the change until a colleague informed me, although the change is discussed in the "Details" [...]