## Two weights and two measures?

August 11, 2014

This is an interesting story about the Meningitis B vaccine (some additional background here and here). In a nutshell, the main issue is that vaccines are subject to a slightly different regulation than other "normal" drugs. For example, patents do not...

## Example 2014.9: Rolling averages. Also: Second Edition is shipping!

August 11, 2014

As of today, the second edition of "SAS and R: Data Management, Statistical Analysis, and Graphics" is shipping from CRC Press, Amazon, and other booksellers. There are lots of additional examples from this blog, new organization, and other features ...

## Stop saying "Scientists discover…" instead say, "Prof. Doe’s team discovers…"

August 11, 2014

I was just reading an article about data science in the WSJ. They were talking about how data scientists with just 2 years' experience can earn a whole boatload of money*. I noticed a description that seemed very familiar: At … Continue reading →

## A Trio of Texts

August 11, 2014

A Trio of Texts refers to three free econometrics e-texts made available by Francis Diebold at U. Penn. Francis blogs at No Hesitations. The three books in question are Econometrics (undergraduate level), Forecasting (upper-level undergr...

## SEO: Abbreviate and Facilitate Conveying Information

August 11, 2014

Copperplate Charts: When William Playfair started using visualisations in his books, he saw it as a means to bring information … Continue reading →

## Amsterdam City Dashboard: a City as Urban Statistics

August 11, 2014

Amsterdam City Dashboard [waag.org] presents the city of Amsterdam through the lens of data, including demographic statistics, traffic reports, noise readings and political messages. The small collection of information graphics is divided into distinc...

## Discussion with Sander Greenland on posterior predictive checks

August 11, 2014

Sander Greenland is a leading epidemiologist and educator who’s strongly influenced my thinking on hierarchical models by pointing out that often the data do not supply much information for estimating the group-level variance, a problem that can be particularly severe when the number of groups is low. (And, in some sense, the number of groups […] The post Discussion with Sander Greenland on posterior predictive checks appeared first on Statistical…

## Exploiting Heterogeneity to Reveal Consumer Preference: Data Matrix Factorization

August 11, 2014

We begin with a data matrix, a set of numbers arrayed so that each row contains information from a different consumer. Marketing research focuses on the consumer, but the columns are permitted more freedom, although they ought to tell us something abou...

## On deck this week

August 11, 2014

Mon: Discussion with Sander Greenland on posterior predictive checks Tues: Understanding the hot hand, and the myth of the hot hand, and the myth of the myth of the hot hand, and the myth of the myth of the myth of the hot hand, all at the same time Wed: Updike and O’Hara Thurs: Luck […] The post On deck this week appeared first on Statistical Modeling, Causal Inference, and…

## Upcoming Talks #DataViz #ABTesting

August 11, 2014

This is a cross-post to both my blogs. Thanks to the ~200 or so people who showed up at last week's Data Scientist Meetup in Cambridge, Mass., hosted by John Baker. I gave a brief introduction to the concept of...

## Upcoming Talks #Dataviz #ABTesting

August 11, 2014

This is a cross-post to both my blogs. Thanks to the ~200 or so people who showed up at last week's Data Scientist Meetup in Cambridge, Mass., hosted by John Baker. I gave a brief introduction to the concept of "numbersense", and was part of a panel of "chief data scientists" talking about how to run data teams. Thanks to those who asked questions. This month, I am back in…

## You Can Now Browse by Topic

August 11, 2014

You can now browse No Hesitations by topic.  Check it out -- just look in the right column, scrolling down a bit. I hope it's useful.

## On Rude and Risky "Calls for Papers"

August 11, 2014

You have likely seen calls for papers that include this script, or something similar: You will not hear from the organizers unless they decide to use your paper. It started with one leading group's calls, which go even farther: You will not hear ...

## Ten tips for learning the SAS/IML language

August 11, 2014

A SAS customer wrote, "Now that I have access to PROC IML through the free SAS University Edition, what is the best way for me to learn to program in the SAS/IML language? How do I get started with PROC IML?" That is an excellent question, and I'm happy to […]

## Minimal reproducible examples

August 11, 2014

I occasionally get emails from people thinking they have found a bug in one of my R packages, and I usually have to reply asking them to provide a minimal reproducible example (MRE). This post is to provide instructions on how to create an MRE. Bug reports on GitHub, not email: First, if you think […]

## ABC model choice by random forests [guest post]

August 10, 2014

[Dennis Prangle sent me his comments on our ABC model choice by random forests paper. Here they are! And I appreciate very much contributors commenting on my paper or others, so please feel free to join.] This paper proposes a new approach to likelihood-free model choice based on random forest classifiers. These are fit to […]

## Data & Visualization Tools to Track Ebola

August 10, 2014

I’ve received the following email (slightly edited for clarity): Can anyone recommend a turnkey, full-service solution to help the Liberian government track the spread of Ebola and get this information out to the public? They want something that lets healthcare workers update info from mobile phones, and a workflow that results in data visualizations. They […] The post Data & Visualization Tools to Track Ebola appeared first on Statistical Modeling,…

## Cool new position available: Director of the Pew Research Center Labs

August 10, 2014

Peter Henne writes: I wanted to let you know about a new opportunity at Pew Research Center for a data scientist that might be relevant to some of your colleagues. I [Henne] am a researcher with the Pew Research Center, where I manage an international index on religious issues. I am also working with others […] The post Cool new position available: Director of the Pew Research Center Labs appeared…

## Guns are cool – Regions

August 10, 2014

This was supposed to be a post in which General Social Survey (GSS) data were used to understand a bit more about the causes of differences between states. Thus it was to give additional insight beyond my previous post, Guns are Cool - Differenc...

## Prior for normality (df) parameter in t distribution

August 8, 2014

A routine way to describe outliers in metric data is with a heavy-tailed t distribution instead of with a normal distribution. The heaviness of the tails is governed by a normality parameter, ν, also called the df parameter. What is a reasonable prior...

## Estimated effect of early childhood intervention downgraded from 42% to 25%

August 8, 2014

Last year I came across an article, “Labor Market Returns to Early Childhood Stimulation: a 20-year Followup to an Experimental Intervention in Jamaica,” by Paul Gertler, James Heckman, Rodrigo Pinto, Arianna Zanolini, Christel Vermeersch, Susan Walker, Susan M. Chang, and Sally Grantham-McGregor, that claimed that early childhood stimulation raised adult earnings by 42%. At the […] The post Estimated effect of early childhood intervention downgraded from 42% to 25% appeared…

## Machine Learning and Applied Statistics Lesson of the Day – Positive Predictive Value and Negative Predictive Value

For a binary classifier, its positive predictive value (PPV) is the proportion of positively classified cases that were truly positive; its negative predictive value (NPV) is the proportion of negatively classified cases that were truly negative. In a later Statistics and Machine Learning Lesson of the Day, I will discuss the differences between PPV/NPV and sensitivity/specificity […]
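
As a rough illustration of the two definitions, here is a minimal Python sketch that computes PPV and NPV from confusion-matrix counts. The function name `predictive_values`, its parameter names, and the counts are made up for this example and do not come from the original post.

```python
import math


def predictive_values(tp, fp, tn, fn):
    """Return (PPV, NPV) from confusion-matrix counts.

    tp, fp: cases classified positive that were truly positive / truly negative
    tn, fn: cases classified negative that were truly negative / truly positive
    """
    # PPV: share of positively classified cases that were truly positive
    ppv = tp / (tp + fp) if (tp + fp) > 0 else math.nan
    # NPV: share of negatively classified cases that were truly negative
    npv = tn / (tn + fn) if (tn + fn) > 0 else math.nan
    return ppv, npv


# Hypothetical counts: 50 cases classified positive, 50 classified negative
ppv, npv = predictive_values(tp=40, fp=10, tn=45, fn=5)
print(ppv, npv)  # 0.8 0.9
```

Both quantities condition on the classifier's output, not on the true class. When no cases receive a given classification, the corresponding value is undefined, which this sketch reports as NaN.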

## Vtreat: designing a package for variable treatment

August 8, 2014

When you apply machine learning algorithms on a regular basis, on a wide variety of data sets, you find that certain data issues come up again and again: missing values (NA or blanks); problematic numerical values (Inf, NaN, sentinel values like 999999999 or -1); valid categorical levels that don’t appear in the training data (especially … Continue reading Vtreat: designing a package for variable treatment → Related posts: R minitip:…