## Prior for normality (df) parameter in t distribution

August 8, 2014
A routine way to describe outliers in metric data is with a heavy-tailed t distribution instead of with a normal distribution. The heaviness of the tails is governed by a normality parameter, ν, also called the df parameter. What is a reasonable prior...
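As a rough illustration of how ν controls tail heaviness, here is a minimal Python sketch (not the post's own code) that compares the standard t density far from the centre for several values of ν against the normal density. The function names and the evaluation point x = 4 are assumptions chosen for illustration; the question of which prior to put on ν is not addressed here.

```python
import math


def student_t_pdf(x: float, nu: float) -> float:
    """Density of the standard Student t distribution with nu degrees of freedom."""
    # Work on the log scale so that large nu does not overflow the gamma function.
    log_norm = (
        math.lgamma((nu + 1) / 2)
        - math.lgamma(nu / 2)
        - 0.5 * math.log(nu * math.pi)
    )
    return math.exp(log_norm - (nu + 1) / 2 * math.log1p(x * x / nu))


def normal_pdf(x: float) -> float:
    """Density of the standard normal distribution."""
    return math.exp(-0.5 * x * x) / math.sqrt(2 * math.pi)


if __name__ == "__main__":
    # Smaller nu puts more density far from zero; large nu approaches the normal.
    for nu in (1, 2, 5, 30, 1000):
        print(f"nu={nu:>4}: density at x=4 is {student_t_pdf(4.0, nu):.6f}")
    print(f"normal : density at x=4 is {normal_pdf(4.0):.6f}")
```

Small values of ν leave noticeably more density four units from the centre, which is what lets the t distribution accommodate outliers; as ν grows the density approaches the normal one.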

## Estimated effect of early childhood intervention downgraded from 42% to 25%

August 8, 2014
Last year I came across an article, “Labor Market Returns to Early Childhood Stimulation: a 20-year Followup to an Experimental Intervention in Jamaica,” by Paul Gertler, James Heckman, Rodrigo Pinto, Arianna Zanolini, Christel Vermeersch, Susan Walker, Susan M. Chang, and Sally Grantham-McGregor, that claimed that early childhood stimulation raised adult earnings by 42%. At the […] The post Estimated effect of early childhood intervention downgraded from 42% to 25% appeared…

## Machine Learning and Applied Statistics Lesson of the Day – Positive Predictive Value and Negative Predictive Value

For a binary classifier, its positive predictive value (PPV) is the proportion of positively classified cases that were truly positive. Its negative predictive value (NPV) is the proportion of negatively classified cases that were truly negative. In a later Statistics and Machine Learning Lesson of the Day, I will discuss the differences between PPV/NPV and sensitivity/specificity […]
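The two definitions can be sketched from confusion-matrix counts. This is a minimal Python illustration, not code from the original lesson; the function name `predictive_values` and the example counts are hypothetical.

```python
def predictive_values(tp: int, fp: int, tn: int, fn: int) -> tuple[float, float]:
    """Return (PPV, NPV) from confusion-matrix counts.

    tp/fp: cases classified positive that were truly positive/negative.
    tn/fn: cases classified negative that were truly negative/positive.
    """
    predicted_positive = tp + fp
    predicted_negative = tn + fn
    if predicted_positive == 0 or predicted_negative == 0:
        raise ValueError("PPV and NPV need at least one case in each predicted class")
    ppv = tp / predicted_positive  # share of positive calls that were correct
    npv = tn / predicted_negative  # share of negative calls that were correct
    return ppv, npv


if __name__ == "__main__":
    # Hypothetical counts: 50 positive calls (40 correct), 50 negative calls (45 correct).
    ppv, npv = predictive_values(tp=40, fp=10, tn=45, fn=5)
    print(f"PPV = {ppv:.2f}, NPV = {npv:.2f}")
```

Both quantities condition on the classifier's output, so their denominators are the counts of predicted positives and predicted negatives.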

## Vtreat: designing a package for variable treatment

August 8, 2014
When you apply machine learning algorithms on a regular basis, on a wide variety of data sets, you find that certain data issues come up again and again:

- Missing values (NA or blanks)
- Problematic numerical values (Inf, NaN, sentinel values like 999999999 or -1)
- Valid categorical levels that don’t appear in the training data (especially …

## A Simple Shiny App for Monitoring Trading Strategies – Part II

August 7, 2014
This is a follow-up on my previous post “A Simple Shiny App for Monitoring Trading Strategies”. I added a few improvements that make the app a bit better (at least for me!). Below is the list of new features:

- A sample .csv file (the one that contains the raw data)
- An “EndDate” drop […]

## It’s like Tinder, but for peer review.

August 7, 2014
I have an idea for an app. You input the title and authors of a preprint (maybe even the abstract). The app shows the title/authors/abstract to people who work in a similar area to you. You could estimate this based …

## Nate Silver’s website

August 7, 2014
Someone who wishes to remain anonymous writes: I believe you are aware that Nate Silver spoke at last year’s JSM and that he began a publication under ESPN (http://fivethirtyeight.com/). Do you have any opinions on the publication? Maybe some you wish to share with the public. I was hoping to hear your opinions about 538 […] The post Nate Silver’s website appeared first on Statistical Modeling, Causal Inference, and Social…

## Wrapping up at the JSM

August 7, 2014
Today was my last day at the 2014 JSM in Boston - unfortunately I'm unable to stay on for tomorrow's sessions. As always, it was a terrific meeting - extremely well organized, and with a huge number of interesting papers from a very diverse range of con...

## What did Nate Silver just say? Blogging the JSM 2013

August 7, 2014
Memory Lane: August 6, 2013. My initial post on JSM13 (8/5/13) was here. Nate Silver gave his ASA Presidential talk to a packed audience (with questions tweeted[i]). Here are some quick thoughts—based on scribbled notes (from last night). Silver gave a list of 10 points that went something like this (turns out there were 11): […]

## President of American Association of Buggy-Whip Manufacturers takes a strong stand against internal combustion engine, argues that the so-called “automobile” has “little grounding in theory” and that “results can vary widely based on the particular fuel that is used”

August 6, 2014
Some people pointed me to this official statement signed by Michael Link, president of the American Association for Public Opinion Research (AAPOR). My colleague David Rothschild and I wrote a measured response to Link’s statement which I posted on the sister blog. But then I made the mistake of actually reading what Link wrote, and […] The post President of American Association of Buggy-Whip Manufacturers takes a strong stand against…

## If you like A/B testing here are some other Biostatistics ideas you may like

August 6, 2014
Web companies are using A/B testing and experimentation regularly now to determine which features to push for advertising or improving user experience. A/B testing is a form of randomized controlled trial that was originally employed in psychology but first adopted on a massive …

## Scientific communication by press release

August 6, 2014
Hector Cordero-Guzman writes: I have a question for you about an ongoing controversy/incident related to reporting of social science research. Please see article linked below if you have a chance. I think this incident exposes real problems in the way social science research is presented and how it reaches the public… http://www.latinorebels.com/2014/05/22/new-york-times-piece-on-hispanics-and-census-based-on-study-not-yet-finalized-or-public/ Essentially, we have […] The post Scientific communication by press release appeared first on Statistical Modeling, Causal Inference,…

## A report from #JSM Joint Statistical Meetings 2014

August 6, 2014
Stephen Stigler, the preeminent historian of statistics, gave a great talk at JSM, the annual gathering of statisticians, on Monday afternoon in Boston. He outlined seven core ideas ("pillars of wisdom") in statistical research that set the field apart; these are ideas developed by statisticians that represent significant advances to science and to human knowledge. As he remarked, each of these advances overturned then-established science, but even today, many people…

## Define an objective function that evaluates an integral in SAS

August 6, 2014
The SAS/IML language is used for many kinds of computations, but three important numerical tasks are integration, optimization, and root finding. Recently a SAS customer asked for help with a problem that involved all three tasks. The customer had an objective function that was defined in terms of an integral. […]
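To make the idea of an objective function defined by an integral concrete, here is a minimal sketch in Python rather than SAS/IML. It covers only the integration and optimization steps, and everything in it is an assumption for illustration: the integrand (x − a)², the interval [0, 1], and the helper names are hypothetical and are not the customer's problem.

```python
import math


def simpson(f, a: float, b: float, n: int = 100) -> float:
    """Composite Simpson's rule for the integral of f over [a, b]."""
    if n % 2:  # Simpson's rule needs an even number of subintervals
        n += 1
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3


def objective(a: float) -> float:
    """Hypothetical objective: integral over [0, 1] of (x - a)^2 dx."""
    return simpson(lambda x: (x - a) ** 2, 0.0, 1.0)


def golden_section_minimize(g, lo: float, hi: float, tol: float = 1e-8) -> float:
    """Minimize a unimodal function g on [lo, hi] by golden-section search."""
    inv_phi = (math.sqrt(5) - 1) / 2
    while hi - lo > tol:
        c = hi - inv_phi * (hi - lo)
        d = lo + inv_phi * (hi - lo)
        if g(c) < g(d):
            hi = d  # the minimum lies in [lo, d]
        else:
            lo = c  # the minimum lies in [c, hi]
    return (lo + hi) / 2


if __name__ == "__main__":
    best = golden_section_minimize(objective, 0.0, 1.0)
    print(f"minimizer ~ {best:.6f}, objective ~ {objective(best):.6f}")
```

Each evaluation of the objective runs a full numerical integration, so the optimizer treats the integral as a black box. For this toy integrand the exact answer is a = 0.5 with objective value 1/12, which gives a simple check on the numerical result.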

## My Talk at the JSM

August 5, 2014
Tomorrow morning (Wednesday 6 August) I'll be presenting at the Joint Statistical Meetings in Boston. The title for my talk is "Modelling Asymmetries in the Market for Gasoline in Western Canada", and it's based on some research that I have underwa...

## iTunes’ Terms & Conditions

August 5, 2014
In a recent post (in French) I plotted the evolution of the number of pages of some legal document, published (and updated) over a century. I had the feeling that the same pattern should be observed in Terms and Conditions documents, which seem to be getting longer and longer. In “Small print that’s longer than George Orwell’s Animal Farm! HSBC gets wooden spoon for endless terms and conditions”, it was mentioned that…

## Stata: Generate a Spatial Moving Average

August 5, 2014
Often we may be interested in generating a spatial moving average of a characteristic X. We may use this moving average to help control for heterogeneity in the population which may be related to the spatial distribution of observations. In order...
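One possible reading of a spatial moving average, sketched in Python rather than Stata: for each observation, average X over all observations within a fixed distance. The radius-based neighbourhood, the inclusion of the observation itself, and the function name are assumptions for illustration and may differ from the approach in the original post.

```python
import math


def spatial_moving_average(points, radius: float):
    """Average the value over all observations within `radius` of each observation.

    points: list of (x_coord, y_coord, value) tuples.
    Each observation counts as its own neighbour (an assumption of this sketch),
    so every neighbourhood is non-empty.
    """
    averages = []
    for xi, yi, _ in points:
        neighbours = [
            value
            for xj, yj, value in points
            if math.hypot(xi - xj, yi - yj) <= radius
        ]
        averages.append(sum(neighbours) / len(neighbours))
    return averages


if __name__ == "__main__":
    # Two nearby observations and one isolated observation (hypothetical data).
    data = [(0.0, 0.0, 1.0), (1.0, 0.0, 3.0), (10.0, 10.0, 5.0)]
    print(spatial_moving_average(data, radius=2.0))
```

This brute-force version compares every pair of observations, which is fine for small data sets but scales quadratically with the number of observations.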

## Do we need institutional review boards for human subjects research conducted by big web companies?

August 5, 2014
Web companies have been doing human subjects research for a while now. Companies like Facebook and Google have employed statisticians for almost a decade (or more) and part of the culture they have introduced is the idea of randomized experiments …

## When doing scientific replication or criticism, collaboration with the original authors is fine but I don’t think it should be a requirement or even an expectation

August 5, 2014
Dominik Papies points me to this article, “Matched-Names Analysis Reveals No Evidence of Name-Meaning Effects,” by psychologist and data detective Uri Simonsohn, in collaboration with Raphael Silberzahn and Eric Luis Uhlmann, the two authors of an earlier study that this new report is refuting. Papies writes: This seems to me an interesting case where a […] The post When doing scientific replication or criticism, collaboration with the original authors is…

## The 7 Pillars of Statistical Wisdom

August 5, 2014
Yesterday, Stephen Stigler presented the (ASA) President's Invited Address to an overflow, and appreciative, audience at the 2014 Joint Statistical Meetings in Boston. The title of his talk was, "The Seven Pillars of Statistical Wisdom". I'd been l...

## Bad SQL habits

August 5, 2014
For those in Boston/Cambridge, I will be speaking at the Chief Data Scientist meetup on Wednesday night. See you there. *** Warning: this post may be hard to understand if you don't know SQL. SQL is one of the most fundamental tools in data science. It is used to manipulate data. Its simplicity is a big reason for its popularity. There are lots of things it can’t do but the…

## Stigler’s seven pillars of statistical wisdom

August 5, 2014
Wisdom has built her house; She has hewn out her seven pillars. (Proverbs 9:1)

At the 2014 Joint Statistical Meetings in Boston, Stephen Stigler gave the ASA President's Invited Address. In forty short minutes, Stigler laid out his response to the age-old question "What is statistics?" His answer was […]

## Thanks to R Markdown: Perhaps Word is an option after all?

August 5, 2014
In many cases Word is still the preferred file format for collaboration in the office. Yet, it is often a challenge to work with it, not so much because of the software, but how it is used and abused. Thanks to Markdown it is no longer painful to inclu...

