## High incidence in Measles Data in Project Tycho

April 21, 2014

In this third post on measles data, I want to look at some high-incidence occasions. As described before, the data is from Project Tycho, which contains data from all weekly notifiable disease reports for the United States dating back to 18...

## Notes on evaluating predictions, or Background to my airfare predictor article

April 20, 2014

My article on whether we can trust airfare prediction models is published today at FiveThirtyEight, the new data journalism venture Nate Silver launched after he moved to ESPN. This topic was originally conceived as a chapter of Numbersense, but I dropped it. As I have noted in my review of Nate Silver's book, he has a keen interest in evaluating predictions, and not surprisingly, he encouraged me to…

## Fooled by randomness

April 20, 2014

From 2006: Nassim Taleb’s publisher sent me a copy of “Fooled by randomness: the hidden role of chance in life and the markets” to review. It’s an important topic, and the book is written in a charming style; I’ll try to respond in kind, with some miscellaneous comments. On the cover of the book is a […] The post Fooled by randomness appeared first on Statistical Modeling, Causal Inference, and Social Science.

## Getting Credit (or blame) for Something You Didn’t Do (BP oil spill)

April 20, 2014

Four years ago, many of us were glued to the “spill cam” showing, in real time, the oil gushing after the April 20, 2010 explosion that sank the Deepwater Horizon oil rig in the Gulf of Mexico, killed 11, and left oil spewing until July 15. Remember junk shots, top kill, blowout preventers? [1] The EPA has […]

## Monotonicity of EM Algorithm Proof

April 19, 2014

Here the monotonicity of the EM algorithm is established.

$$f_{o}(Y_{o}|\theta)=f_{o,m}(Y_{o},Y_{m}|\theta)/f_{m|o}(Y_{m}|Y_{o},\theta)$$

$$\log L_{o}(\theta)=\log L_{o,m}(\theta)-\log f_{m|o}(Y_{m}|Y_{o},\theta)$$

where $L_{o}(\theta)$ is the likelihood under the observed data and $L_{o,m}(\theta)$ is the likelihood under the complete data. Taking the expectation of the second equation with respect to the conditional distribution of $Y_{m}$ given $Y_{o}$ and […] The post Monotonicity of EM Algorithm Proof appeared first on Lindons Log.
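
A minimal numerical illustration, not taken from the post: EM for a two-component Gaussian mixture with unit variances, where the missing data $Y_{m}$ are the component labels. The function names, starting values, and simulated data are hypothetical; the sketch only records $\log L_{o}(\theta)$ after each iteration and checks that it never decreases, which is the property the proof establishes.

```python
import math
import random

def normal_pdf(x, mu):
    # Unit-variance normal density (the variance is fixed at 1 in this sketch).
    return math.exp(-0.5 * (x - mu) ** 2) / math.sqrt(2 * math.pi)

def observed_loglik(ys, w, mu1, mu2):
    # log L_o(theta): the component labels (the "missing" Y_m) are summed out.
    return sum(math.log(w * normal_pdf(y, mu1) + (1 - w) * normal_pdf(y, mu2)) for y in ys)

def em_step(ys, w, mu1, mu2):
    # E-step: posterior probability that each y came from component 1,
    # i.e. the conditional distribution of Y_m given Y_o at the current theta.
    r = [w * normal_pdf(y, mu1) / (w * normal_pdf(y, mu1) + (1 - w) * normal_pdf(y, mu2))
         for y in ys]
    # M-step: maximise the expected complete-data log-likelihood (closed form here).
    s = sum(r)
    w_new = s / len(ys)
    mu1_new = sum(ri * y for ri, y in zip(r, ys)) / s
    mu2_new = sum((1 - ri) * y for ri, y in zip(r, ys)) / (len(ys) - s)
    return w_new, mu1_new, mu2_new

def run_em(ys, theta, n_iter=50):
    # Record log L_o after every iteration so monotonicity can be checked.
    trace = [observed_loglik(ys, *theta)]
    for _ in range(n_iter):
        theta = em_step(ys, *theta)
        trace.append(observed_loglik(ys, *theta))
    return theta, trace

rng = random.Random(1)
ys = [rng.gauss(-2, 1) for _ in range(150)] + [rng.gauss(3, 1) for _ in range(100)]
theta, trace = run_em(ys, (0.5, -0.5, 0.5))
# No step may decrease the observed-data log-likelihood
# (the tiny tolerance only absorbs floating-point rounding).
assert all(b >= a - 1e-9 for a, b in zip(trace, trace[1:]))
```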

## Why the “sample from infinite population” metaphor has been such a disaster for reproducible science.

April 19, 2014

The “sampling from an infinite population” metaphor beloved by statisticians of all types is a disaster for reproducible science. To explain why, I’ll show what sampling from a finite population has going for it that’s not there ...

## Index or indicator variables

April 19, 2014

Someone who doesn’t want his name shared (for the perhaps reasonable reason that he’ll “one day not be confused, and would rather my confusion not live on online forever”) writes: I’m exploring HLMs and Stan, using your book with Jennifer Hill as my field guide to this new territory. I think I have a generally […] The post Index or indicator variables appeared first on Statistical Modeling, Causal Inference, and Social…
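
The excerpt stops before the answer, so the following is only a hypothetical sketch of the two codings in the title: an index variable picks a group-specific parameter by position (like `mu[group[i]]` in a Stan model), while indicator (dummy) variables enter a linear predictor as 0/1 columns measured against a baseline group. With one mean per group, both give identical fitted values; one practical difference is where priors attach (to each group mean, or to a baseline plus offsets). The data and function names are made up for illustration.

```python
# Hypothetical data: three groups coded 0, 1, 2 and one outcome per unit.
group = [0, 0, 1, 1, 2, 2]
y = [1.0, 1.2, 2.9, 3.1, 5.0, 5.2]

def index_predict(mu, group):
    # Index coding: one parameter per group, looked up by position.
    return [mu[g] for g in group]

def indicator_matrix(group, n_groups):
    # Indicator coding: an intercept column plus 0/1 columns for
    # every group except the baseline (group 0).
    return [[1.0] + [1.0 if g == j else 0.0 for j in range(1, n_groups)] for g in group]

def indicator_predict(beta, X):
    # Linear predictor X @ beta, written out without external libraries.
    return [sum(b * x for b, x in zip(beta, row)) for row in X]

# Group means, computed directly (the index-coded parameters).
mu = [sum(v for v, g in zip(y, group) if g == k) / group.count(k) for k in range(3)]
# The same fit expressed as baseline + offsets (the indicator-coded parameters).
beta = [mu[0], mu[1] - mu[0], mu[2] - mu[0]]

X = indicator_matrix(group, 3)
pred_index = index_predict(mu, group)
pred_indicator = indicator_predict(beta, X)
```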

## Copula Density Estimation

April 19, 2014

The joint paper, written with Gery Geenens and Davy Paindaveine, entitled “Probit transformation for nonparametric kernel estimation of the copula density”, is now online at http://arxiv.org/abs/1404.4414: “Copula modelling has become ubiquitous in modern statistics. Here, the problem of nonparametrically estimating a copula density is addressed. Arguably the most popular nonparametric density estimator, the kernel estimator is not suitable for the unit-square-supported copula densities, mainly because it is heavily affected by boundary bias issues. In addition, most…
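
A rough sketch of the basic probit-transformation idea named in the paper's title, in plain Python: transform rank-based pseudo-observations with $\Phi^{-1}$, apply an ordinary kernel estimator on the unbounded plane (where there is no boundary to bias it), and divide by $\phi(\Phi^{-1}(u))\,\phi(\Phi^{-1}(v))$ to map back to the unit square. The bandwidth rule and function names are assumptions, and any refinements the paper proposes are not reproduced here.

```python
import math
import random
from statistics import NormalDist

STD = NormalDist()  # standard normal: provides inv_cdf and pdf

def pseudo_obs(xs):
    # Rank-based pseudo-observations rank / (n + 1), strictly inside (0, 1).
    # Assumes continuous data (no ties).
    n = len(xs)
    order = sorted(range(n), key=lambda i: xs[i])
    u = [0.0] * n
    for rank, i in enumerate(order, start=1):
        u[i] = rank / (n + 1)
    return u

def probit_copula_density(xs, ys, h=None):
    # Map pseudo-observations to R^2 with the normal quantile function,
    # then build a product-Gaussian kernel density estimate there.
    s = [STD.inv_cdf(u) for u in pseudo_obs(xs)]
    t = [STD.inv_cdf(v) for v in pseudo_obs(ys)]
    n = len(s)
    if h is None:
        h = n ** (-1 / 6)  # hypothetical rule-of-thumb bandwidth for 2-D data

    def c_hat(u, v):
        x, y = STD.inv_cdf(u), STD.inv_cdf(v)
        f = sum(STD.pdf((x - si) / h) * STD.pdf((y - ti) / h)
                for si, ti in zip(s, t)) / (n * h * h)
        # Dividing by phi(x) * phi(y) undoes the probit change of variables.
        return f / (STD.pdf(x) * STD.pdf(y))

    return c_hat

# Usage: for independent data the copula density should be roughly 1.
rng = random.Random(0)
a = [rng.random() for _ in range(1000)]
b = [rng.random() for _ in range(1000)]
c_indep = probit_copula_density(a, b)
print(round(c_indep(0.5, 0.5), 2))
```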

## Old tails: a crude power law fit on ebook sales

April 18, 2014

We use R to take a very brief look at the distribution of e-book sales on Amazon.com. Recently Hugh Howey shared some eBook sales data spidered from Amazon.com: The 50k Report. The data is largely a single scrape of statistics about various anonymized books. Howey’s analysis tries to break sales down by declared category and […]
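
The post's own R code is not in the excerpt. As a hypothetical Python sketch of what a "crude" power-law fit can look like: sort sales in decreasing order and regress log sales on log rank; the negated slope estimates the exponent. Least squares on a log-log plot is quick but known to be biased for tail estimation, so treat it as exploratory. The function name and the synthetic data are assumptions.

```python
import math

def fit_power_law(sales):
    # Crude fit of sales ~ C * rank^(-alpha): sort descending, then ordinary
    # least squares of log(sales) on log(rank). Assumes all sales > 0.
    ranked = sorted(sales, reverse=True)
    xs = [math.log(r) for r in range(1, len(ranked) + 1)]
    ys = [math.log(s) for s in ranked]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = my - slope * mx
    return -slope, math.exp(intercept)  # (alpha, C)

# Synthetic check: data generated exactly as 5000 * rank^-1.2.
sales = [5000 * r ** -1.2 for r in range(1, 201)]
alpha, C = fit_power_law(sales)
```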

## Welcome to Econometrics Students in China

April 18, 2014

One of my students mentioned to me yesterday that there was quite a bit of action on Weibo (the Chinese equivalent of Twitter) relating to posts on this blog, especially those relating to MCMC methods in econometrics. That's just great. Thanks ...

## My talks @ Universitat de Girona

April 18, 2014

Just after Easter, I'll make a very quick trip to lovely Girona, where Marc Saez has invited me to give two talks. The first one will be a re-run of the short course on INLA that I did at Bayes Pharma last year. It's scheduled (and prepared) as a 3-ho...

## Date formatting in R

April 18, 2014

As I often manipulate time series from different sources, I rarely come across the same date format twice. Having to reformat the dates every time is a real waste of time because I never remember the syntax of the `as.Date` function. Below are a few examples that turn strings into R's standard date format. […]
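
The post's R examples are cut off here. As a sketch in Python rather than R, this parses a few hypothetical date strings with `datetime.strptime`; the `%`-style conversion codes are the same ones R's `as.Date(x, format = ...)` accepts. Month names (`%b`, `%B`) are locale-dependent in both languages, and the example assumes an English/C locale.

```python
from datetime import datetime

# Hypothetical date strings in formats one meets in practice, each paired
# with the strptime-style pattern that parses it.
examples = [
    ("2014-04-18", "%Y-%m-%d"),
    ("18/04/2014", "%d/%m/%Y"),
    ("04/18/14", "%m/%d/%y"),
    ("18Apr2014", "%d%b%Y"),
    ("April 18, 2014", "%B %d, %Y"),
]

# Normalise everything to ISO 8601 (YYYY-MM-DD), which is also how R prints a Date.
iso = [datetime.strptime(s, fmt).date().isoformat() for s, fmt in examples]
print(iso)
```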

## One-tailed or two-tailed?

April 18, 2014

Someone writes: Suppose I have two groups of people, A and B, which differ on some characteristic of interest to me; and for each person I measure a single real-valued quantity X. I have a theory that group A has a higher mean value of X than group B. I test this theory by using […] The post One-tailed or two-tailed? appeared first on Statistical Modeling, Causal Inference, and Social Science.
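
To make the difference concrete, here is a hypothetical sketch using a large-sample z approximation (a t test would be more appropriate for groups this small). For a symmetric test statistic that falls in the predicted direction, the one-tailed p-value is half the two-tailed one; that halving is only legitimate if the direction (A > B) was fixed before seeing the data. The data and the function name are made up.

```python
import math
from statistics import NormalDist, mean, stdev

def z_test(a, b):
    # Large-sample z statistic for mean(a) - mean(b) with unequal variances
    # (normal approximation used in place of a t distribution).
    se = math.sqrt(stdev(a) ** 2 / len(a) + stdev(b) ** 2 / len(b))
    z = (mean(a) - mean(b)) / se
    p_one = 1 - NormalDist().cdf(z)             # H1: mean(A) > mean(B)
    p_two = 2 * (1 - NormalDist().cdf(abs(z)))  # H1: mean(A) != mean(B)
    return z, p_one, p_two

# Hypothetical measurements of X in groups A and B.
a = [5.1, 4.8, 5.6, 5.3, 4.9, 5.4, 5.2, 5.0]
b = [4.7, 4.9, 4.5, 5.0, 4.6, 4.8, 4.4, 4.9]
z, p_one, p_two = z_test(a, b)
print(z, p_one, p_two)
```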

## More from xkcd

April 18, 2014

Here's another from xkcd.com, on our "good graphics" theme.

## Classification Trees

April 18, 2014

On Monday the 28th, from 14:00 to 16:00, I will lead a training session in room N-6320 at UQAM: an introduction to classification trees. The session is part of the seminar series on quantitative and qualitative analysis methods that has been held regularly for a little over a month, run by the collectif pour le développement et les applications en mesure et évaluation (Cdame). The slides are available as a PDF (there are a few animations,…

## An overused chart, why it fails, and how to fix it

April 17, 2014

Reader and tipster Chris P. found this "death spiral" chart dizzying. It's one of those charts that has conceptual appeal but does not do the data justice. As the name implies, the designer has a strong message, that the...

## Correlation does not imply causation (parental involvement edition)

April 17, 2014

The New York Times recently published an article on education titled "Parental Involvement Is Overrated". Most research in this area supports the opposite view, but the authors claim that "evidence from our research suggests otherwise". Before you stop helping your children …

## If you get to the point of asking, just do it. But some difficulties do arise . . .

April 17, 2014

Nelson Villoria writes: I find the multilevel approach very useful for a problem I am dealing with, and I was wondering whether you could point me to some references about poolability tests for multilevel models. I am working with time series of cross-sectional data and I want to test whether the data supports cross […] The post If you get to the point of asking, just do it. But some…

## How Valuable is a #1 Ranking for Analytics Software? Not as Much as You Might Think!

April 17, 2014

In my never-ending quest to study the Popularity of Data Analysis Software, I recently read the 2013 Edition of the Wisdom of Crowds Business Intelligence Market Study by Dresner Advisory Services, LLC. In it, I found the table below which …

## Data Stories Episode About Data Storytelling

April 17, 2014

How is it possible that it has taken a podcast called Data Stories 35 episodes to get to the topic of data storytelling? Alberto Cairo and I helped get the topic straightened out, and I think we even convinced Moritz that stories are not the enemy of exploration. It was a fun episode to record, and it touches on many interesting topics.

## How Fast Could the Fastest Human Run 100m?

April 17, 2014

Ethan Siegel wrote a post entitled “The Math of the Fastest Human Alive” five years ago, using regressions. An alternative is to use extreme value models (a few years ago, I wrote a post on the maximum length of a tennis match using extreme value theory). In 2009, John Einmahl and Sander Smeets wrote a great article entitled “Ultimate 100m world records through extreme-value theory”. The article is…
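
The excerpt does not say which estimator Einmahl and Smeets used, so what follows is only a textbook stand-in: Pickands's estimator of the extreme-value index $\gamma$, which, when $\gamma < 0$, implies a finite endpoint $\hat x^* = X_{n-k+1:n} + (X_{n-k+1:n} - X_{n-2k+1:n})/(2^{-\hat\gamma} - 1)$. Race times are minima, so one would apply it to negated times. The function name and the choice of $k$ are assumptions; the check uses uniform data, whose true endpoint is 1.

```python
import math
import random

def pickands_endpoint(sample, k):
    # Pickands estimator of the extreme-value index gamma and, when gamma < 0,
    # the implied finite upper endpoint. The choice of k trades bias for variance.
    x = sorted(sample)
    n = len(x)
    if 4 * k > n:
        raise ValueError("need 4k <= n")
    # Upper order statistics X_{n-k+1:n}, X_{n-2k+1:n}, X_{n-4k+1:n}.
    q1, q2, q4 = x[n - k], x[n - 2 * k], x[n - 4 * k]
    gamma = math.log((q1 - q2) / (q2 - q4)) / math.log(2)
    if gamma >= 0:
        return gamma, math.inf  # no finite endpoint implied
    endpoint = q1 + (q1 - q2) / (2 ** (-gamma) - 1)
    return gamma, endpoint

# Synthetic check: Uniform(0, 1) has gamma = -1 and endpoint 1.
rng = random.Random(42)
sample = [rng.random() for _ in range(20000)]
gamma, x_star = pickands_endpoint(sample, 500)
print(gamma, x_star)
```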

## Bitsanity

April 16, 2014

The awesome folks at Quandl (an amazing data collection and distribution service) have been so kind as to allow me to write for their blog. In my first post for them, I demonstrate (with detailed R code) how a user of their free data services co...

## The horrible confusion between different entropies explained in a way that answers: Where do likelihoods and priors come from?

April 16, 2014

Here I derive a simple formula for probability distributions general enough for Statistical Mechanics and Classical Statistics in which the roles, meanings, and interpretations between the Information Entropy and Boltzmann’s Entropy are as clear ...
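
The excerpt breaks off before the author's formula, so this is not the author's derivation; it only illustrates one standard link between the two entropies. If $N$ particles occupy states with counts $n_i = N p_i$, Boltzmann's entropy is $\ln W$ with $W = N!/\prod_i n_i!$, and by Stirling's approximation $\ln W / N \to -\sum_i p_i \ln p_i$, the information entropy, as $N$ grows. The function names and the distribution `p` are hypothetical.

```python
import math

def shannon_entropy(p):
    # Information entropy H(p) = -sum p_i ln p_i (natural log, nats).
    return -sum(pi * math.log(pi) for pi in p if pi > 0)

def boltzmann_entropy_per_particle(counts):
    # ln W / N, where W = N! / prod(n_i!) counts the microstates
    # consistent with the occupation numbers n_i.
    n = sum(counts)
    log_w = math.lgamma(n + 1) - sum(math.lgamma(c + 1) for c in counts)
    return log_w / n

p = [0.5, 0.25, 0.25]
for n in (40, 400, 4000):
    counts = [round(n * pi) for pi in p]
    # The gap to H(p) shrinks roughly like log(N) / N.
    print(n, boltzmann_entropy_per_particle(counts), shannon_entropy(p))
```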