UPDATE: THE BLOG/SITE HAS MOVED TO GITHUB. THE NEW LINK FOR THE BLOG/SITE IS patilv.github.io and THE LINK TO THIS POST IS: http://bit.ly/1pi5z8l . PLEASE UPDATE ANY BOOKMARKS YOU MAY HAVE.

I saw this great post on Crayola crayon colors at the Learning R blog, reproducing a nice graph of the Crayola crayon colors over time. (Also see this even nicer version.) The Learning R post shows how to grab the crayon colors from the Wikipedia page, “List of Crayola crayon colors,” directly in R. Here’s […]

The paradox of racism is that at any given moment, the racism of the day seems reasonable and very possibly true, but the racism of the past always seems so ridiculous. I’ve been thinking about this for a few months ever since receiving in the mail a new book, “A Troublesome Inheritance: Genes, Race, and […] The post Nicholas Wade and the paradox of racism appeared first on Statistical Modeling,…

Here is a question from my friend Shravan Vasishth about the consequences of using a stopping rule: Psycholinguists and psychologists often adopt the following type of data-gathering procedure: The experimenter gathers n data points, then checks for significance (p<0.05 or not). If it’s not significant, he gets more data (n more data points). Since time […]
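The procedure described in this question can be simulated directly. The sketch below is only an illustration, not the analysis from the post: it assumes normally distributed data with a known standard deviation of 1, a true mean of zero (so every "significant" result is a false positive), and a two-sided z-test. The function names, the batch size of 20, and the simulation count are all assumptions made for the example.

```python
import math
import random


def two_sided_p(sample):
    """Two-sided z-test p-value for H0: mean = 0, assuming known sd = 1."""
    n = len(sample)
    z = sum(sample) / math.sqrt(n)
    return math.erfc(abs(z) / math.sqrt(2))


def false_positive_rate(n, max_looks, sims=20000, alpha=0.05, seed=1):
    """Share of null experiments declared significant at any of the looks."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(sims):
        data = []
        for _ in range(max_looks):
            # Gather n more data points, then test everything collected so far.
            data.extend(rng.gauss(0.0, 1.0) for _ in range(n))
            if two_sided_p(data) < alpha:
                hits += 1
                break  # the experimenter stops at the first significant result
    return hits / sims


one_look = false_positive_rate(n=20, max_looks=1)
two_looks = false_positive_rate(n=20, max_looks=2)
print(f"one look:  {one_look:.3f}")
print(f"two looks: {two_looks:.3f}")
```

With a single look the rate should land near the nominal 0.05. Allowing a second look after a non-significant first test pushes it noticeably higher, because each additional test is another chance to cross the threshold.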

I pointed Steven Pinker to my post, How much time (if any) should we spend criticizing research that’s fraudulent, crappy, or just plain pointless?, and he responded: Clearly it *is* important to call out publicized research whose conclusions are likely to be false. The only danger is that it’s so easy and fun to criticize, […] The post Discussion with Steven Pinker on research that is attached to data that…

Today’s email question: I work within a government budget office and sometimes have to forecast fairly simple time series several quarters into the future. auto.arima() works great and I often get something along the lines of: ARIMA(0,0,1)(1,1,0)[12] with drift as the lowest AICc. However, my boss (who does not use R) takes issue with low-order AR and MA because “you’re essentially using forecasted data to make your forecast.” His models…

In a recent post on the DataColada blog, Uri Simonsohn wrote about “We cannot afford to study effect size in the lab”. The central message is: If we want accurate effect size (ES) estimates, we need large sample sizes (he suggests four-digi...

Last night I was working on a talk on creating effective graphs. Mostly, I needed to update the colors, as there’d been some gaudy ones in its previous form (e.g., slide 22). I usually pick colors using the crayons in the Mac Color Picker. But that has just 40 crayons, and I wanted more choices. […]

We have a Stan users meetup for NYC. We’ll have monthly sessions where we can discuss modeling, success stories, pain points, and really have a chance for the user base and the developers to interact in NYC. The first meetup will be on Tuesday, 5/13. I’ll be giving an overview of Stan aimed at a […] The post Stan users meetup next week appeared first on Statistical Modeling, Causal Inference,…

Ben Murrell writes: Our reply to Kinney and Atwal has come out (http://www.pnas.org/content/early/2014/04/29/1403623111.full.pdf) along with their response (http://www.pnas.org/content/early/2014/04/29/1404661111.full.pdf). I feel like they somewhat missed the point. If you’re still interested in this line of discussion, feel free to post, and maybe the Murrells and Kinney can bash it out in your comments! Background: Too many […] The post Once more on nonparametric measures of mutual information appeared first on Statistical…

This story (“Yale tells students to keep Kissinger talk secret . . . ‘Dr. Kissinger’s visit to campus will not be publicized, so we appreciate your confidentiality…’”) reminds me of two things: - In the 1980s, I once went to a public lecture at Harvard by Kissinger protégé Ted Koppel, who indeed has that deep […] The post Cause he thinks he’s so-phisticated appeared first on Statistical Modeling, Causal Inference,…

It seems like Seth Kugel's article in the New York Times about "Crunching the Numbers to find the Best Airfare" is quite popular. In this article, he said things like this: The overall take on the best day to book tickets turns out to be somewhat underwhelming, if you look at the country as a whole. Hopper’s data shows it’s actually Thursday, but don’t expect that fact to save you…

Last week Chris Hemedinger posted an article about spam that is sent to SAS blogs and discussed how anti-spam software helps to block spam. No algorithm can be 100% accurate at distinguishing spam from valid comments because of the inherent trade-off between specificity and sensitivity in any statistical test. Therefore, […]
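The trade-off mentioned here can be seen with a toy classifier. This is a minimal sketch under loud assumptions: the "spam scores" are simulated from two overlapping normal distributions, not taken from any real anti-spam software, and the names `valid_scores`, `spam_scores`, and `rates` are invented for the illustration.

```python
import random

rng = random.Random(0)
# Hypothetical scores: the two classes overlap, so no threshold separates them.
valid_scores = [rng.gauss(0.0, 1.0) for _ in range(5000)]
spam_scores = [rng.gauss(2.0, 1.0) for _ in range(5000)]


def rates(threshold):
    """Flag a comment as spam when its score exceeds the threshold."""
    # Sensitivity: share of spam that is flagged.
    sensitivity = sum(s > threshold for s in spam_scores) / len(spam_scores)
    # Specificity: share of valid comments that are let through.
    specificity = sum(s <= threshold for s in valid_scores) / len(valid_scores)
    return sensitivity, specificity


results = {t: rates(t) for t in (0.0, 1.0, 2.0)}
for t, (sens, spec) in results.items():
    print(f"threshold {t:.1f}: sensitivity {sens:.3f}, specificity {spec:.3f}")
```

Raising the threshold lets more valid comments through (higher specificity) but misses more spam (lower sensitivity). Because the score distributions overlap, no threshold makes both rates equal to 1.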

UPDATE: A reader pointed out a problem with the original post: look-ahead bias introduced by the prices > SMA(prices,100) statement. In the calendar strategy logic I did not use the usual lag of one day because important days are known beforehand. However, the prices > SMA(prices,100) statement should be lagged by one day. […]
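To make the one-day lag concrete, here is a small sketch in plain Python. It is not the code from the post (which uses R): the helper names `sma`, `trend_signal`, and `lag` are hypothetical, the prices are made up, and a 3-day window stands in for the 100-day one so the numbers are easy to follow.

```python
def sma(values, window):
    """Simple moving average; None until a full window is available."""
    out = []
    for i in range(len(values)):
        if i + 1 < window:
            out.append(None)
        else:
            out.append(sum(values[i + 1 - window:i + 1]) / window)
    return out


def trend_signal(prices, window):
    """True on days whose close is above that day's moving average."""
    avg = sma(prices, window)
    return [a is not None and p > a for p, a in zip(prices, avg)]


def lag(signal, days=1):
    """Shift a signal forward so day t only uses information up to day t - days."""
    return [False] * days + signal[:len(signal) - days]


prices = [10, 11, 12, 11, 9, 8, 10, 13]  # made-up closes for illustration
raw = trend_signal(prices, window=3)     # uses day t's close: look-ahead if traded on day t
tradable = lag(raw)                      # yesterday's signal decides today's position
print(raw)
print(tradable)
```

The unlagged signal on day t depends on day t's closing price, which is not known while day t's position is being held. Shifting it by one day removes that bias. Calendar-based conditions need no such shift because the dates are known in advance.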