The “sampling from an infinite population” metaphor beloved by statisticians of all types is a disaster for reproducible science. To explain why I’ll show what sampling from a finite population has going for it that’s not there ...

The “sampling from an infinite population” metaphor beloved by statisticians of all types is a disaster for reproducible science. To explain why I’ll show what sampling from a finite population has going for it that’s not there ...

Someone who doesn’t want his name shared (for the perhaps reasonable reason that he’ll “one day not be confused, and would rather my confusion not live on online forever”) writes: I’m exploring HLMs and stan, using your book with Jennifer Hill as my field guide to this new territory. I think I have a generally […] The post Index or indicator variables appeared first on Statistical Modeling, Causal Inference, and…

The joint paper, written with Gery Geenens and Davy Paindaveine, entitled “Probit transformation for nonparametric kernel estimation of the copula density” is now online on http://arxiv.org/abs/1404.4414 “Copula modelling has become...

We use R to take a very brief look at the distribution of e-book sales on Amazon.com. Recently Hugh Howey shared some eBook sales data spidered from Amazon.com: The 50k Report. The data is largely a single scrape of statistics about various anonymized books. Howey’s analysis tries to break sales down by declared category and […] Related posts: Sample size and power for rare events Living in A Lognormal World…

As I often manipulate time series from different sources, I rarely come across the same date format twice. Having to reformat the dates every time is a real waste of time because I never remember the syntax of the as.Date function. I put below a few examples that turn strings into standard R date format. […]

Someone writes: Suppose I have two groups of people, A and B, which differ on some characteristic of interest to me; and for each person I measure a single real-valued quantity X. I have a theory that group A has a higher mean value of X than group B. I test this theory by using […] The post One-tailed or two-tailed? appeared first on Statistical Modeling, Causal Inference, and Social…

J’animerai une formation lundi 28 de 14:00 à 16:00 au local N-6320 de l’UQAM sur le thème introduction aux arbres de classification. Cette formation est organisée dans le cadre des séminaires en méthodes d’analyses quantitative...

The New York Times recently published an article on education titled "Parental Involvement Is Overrated". Most research in this area supports the opposite view, but the authors claim that "evidence from our research suggests otherwise". Before you stop helping your children … Continue reading →

Nelson Villoria writes: I find the multilevel approach very useful for a problem I am dealing with, and I was wondering whether you could point me to some references about poolability tests for multilevel models. I am working with time series of cross sectional data and I want to test whether the data supports cross […] The post If you get to the point of asking, just do it. But…

How is it possible that it has taken a podcast called Data Stories 35 episodes to get to the topic of data storytelling? Alberto Cairo and I helped get the topic straightened out, and I think we even convinced Moritz that stories are not the enemy of exploration. It was a fun episode to record, and it touches on many interesting topics.

Ethan Siegel wrote a post entitled The Math of the Fastest Human Alive five years ago, using regressions. An alternative is too use extreme value models (I wrote a post a long time ago on the maximum length of a tennis match using extreme value theory a few years ago). In 2009, John Einmahl and Sander Smeets wrote a great article entitled ultimate 100m world records through extreme-value theory. The article is…

Here I derive a simple formula for probability distributions general enough for Statistical Mechanics and Classical Statistics in which the roles, meanings, and interpretations between the Information Entropy and Boltzmann’s Entropy are as clear ...

Prakash Nayak writes: I work as a musculoskeletal oncologist (surgeon) in Mumbai, India and am keen on sarcoma research. Sarcomas are rare disorders, and conventional frequentist analysis falls short of providing meaningful results for clinical application. I am thus keen on applying Bayesian analysis to a lot of trials performed with small numbers in this […] The post Looking for Bayesian expertise in India, for the purpose of analysis of…

Two years ago, Wired breathlessly extolled the virtues of A/B testing (link). A lot of Web companies are in the forefront of running hundreds or thousands of tests daily. The reality is that most A/B tests fail. A/B tests fail for many reasons. Typically, business leaders consider a test to have failed when the analysis fails to support their hypothesis. "We ran all these tests varying the color of the…

The MAPE (mean absolute percentage error) is a popular measure for forecast accuracy and is defined as where denotes an observation and denotes its forecast, and the mean is taken over . Armstrong (1985, p.348) was the first (to my knowledge) to point out the asymmetry of the MAPE saying that “it has a bias favoring estimates that are below the actual values”. A few years later, Armstrong…