## Date formatting in R

April 18, 2014

As I often manipulate time series from different sources, I rarely come across the same date format twice. Having to reformat the dates every time is a real waste of time because I never remember the syntax of the as.Date function. I put below a few examples that turn strings into standard R date format. […]

## One-tailed or two-tailed?

April 18, 2014

Someone writes: Suppose I have two groups of people, A and B, which differ on some characteristic of interest to me; and for each person I measure a single real-valued quantity X. I have a theory that group A has a higher mean value of X than group B. I test this theory by using […] The post One-tailed or two-tailed? appeared first on Statistical Modeling, Causal Inference, and Social…

## More from xkcd

April 18, 2014

Here's another from xkcd.com, on our "good graphics" theme.

## Classification Trees (Les Arbres de Classification)

April 18, 2014

I will be leading a training session on Monday the 28th, from 14:00 to 16:00, in room N-6320 at UQAM, introducing classification trees. The session is organized as part of the seminars on quantitative analysis methods...

## An overused chart, why it fails, and how to fix it

April 17, 2014

Reader and tipster Chris P. found this "death spiral" chart dizzying (link). It's one of those charts that has conceptual appeal but does not do the data justice. As the name implies, the designer has a strong message, that the...

## Correlation does not imply causation (parental involvement edition)

April 17, 2014

The New York Times recently published an article on education titled "Parental Involvement Is Overrated". Most research in this area supports the opposite view, but the authors claim that "evidence from our research suggests otherwise". Before you stop helping your children …

## If you get to the point of asking, just do it. But some difficulties do arise . . .

April 17, 2014

Nelson Villoria writes: I find the multilevel approach very useful for a problem I am dealing with, and I was wondering whether you could point me to some references about poolability tests for multilevel models. I am working with time series of cross sectional data and I want to test whether the data supports cross […] The post If you get to the point of asking, just do it. But…

## How Valuable is a #1 Ranking for Analytics Software? Not as Much as You Might Think!

April 17, 2014

In my never-ending quest to study the Popularity of Data Analysis Software, I recently read the 2013 Edition of the Wisdom of Crowds Business Intelligence Market Study by Dresner Advisory Services, LLC. In it, I found the table below which …

## Data Stories Episode About Data Storytelling

April 17, 2014

How is it possible that it has taken a podcast called Data Stories 35 episodes to get to the topic of data storytelling? Alberto Cairo and I helped get the topic straightened out, and I think we even convinced Moritz that stories are not the enemy of exploration. It was a fun episode to record, and it touches on many interesting topics.

## How Fast the Fastest Human Would Run 100m?

April 17, 2014

Ethan Siegel wrote a post entitled The Math of the Fastest Human Alive five years ago, using regressions. An alternative is to use extreme value models (a few years ago I wrote a post on the maximum length of a tennis match using extreme value theory). In 2009, John Einmahl and Sander Smeets wrote a great article entitled ultimate 100m world records through extreme-value theory. The article is…

## Bitsanity

April 16, 2014

The awesome folks at Quandl (an amazing data collection and distribution service) have been so kind as to allow me to write for their blog. In my first post for them I demonstrate (with detailed R code) how a user of their free data services co...

## The horrible confusion between different entropies explained in a way that answers: Where do likelihoods and priors come from?

April 16, 2014

Here I derive a simple formula for probability distributions general enough for Statistical Mechanics and Classical Statistics in which the roles, meanings, and interpretations between the Information Entropy and Boltzmann’s Entropy are as clear ...

## An Exercise With the SURE Model

April 16, 2014

Here's an exercise that I sometimes set for students if we're studying the Seemingly Unrelated Regression equations (SURE) model. In fact, I used it as part of a question in the final examination that my grad. students sat last week. Suppose that we hav...

## Looking for Bayesian expertise in India, for the purpose of analysis of sarcoma trials

April 16, 2014

Prakash Nayak writes: I work as a musculoskeletal oncologist (surgeon) in Mumbai, India and am keen on sarcoma research. Sarcomas are rare disorders, and conventional frequentist analysis falls short of providing meaningful results for clinical application. I am thus keen on applying Bayesian analysis to a lot of trials performed with small numbers in this […] The post Looking for Bayesian expertise in India, for the purpose of analysis of…

## The reality is most A/B tests fail, and Facebook is here to help

April 16, 2014

Two years ago, Wired breathlessly extolled the virtues of A/B testing (link). A lot of Web companies are at the forefront of running hundreds or thousands of tests daily. The reality is that most A/B tests fail. A/B tests fail for many reasons. Typically, business leaders consider a test to have failed when the analysis fails to support their hypothesis. "We ran all these tests varying the color of the…

## Errors on percentage errors

April 16, 2014

The MAPE (mean absolute percentage error) is a popular measure for forecast accuracy and is defined as $\text{MAPE} = \text{mean}\left(\left|100\,(y_t - \hat{y}_t)/y_t\right|\right)$, where $y_t$ denotes an observation and $\hat{y}_t$ denotes its forecast, and the mean is taken over $t$. Armstrong (1985, p.348) was the first (to my knowledge) to point out the asymmetry of the MAPE saying that “it has a bias favoring estimates that are below the actual values”. A few years later, Armstrong…
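
As a rough illustration of the definition, here is a minimal Python sketch. The function name `mape` and the sample numbers are made up for this example and do not come from the post. It also shows one common reading of the asymmetry, which is an assumption about what Armstrong had in mind: for a positive observation, a non-negative forecast that is too low can never score worse than 100%, while a forecast that is too high has no upper limit.

```python
def mape(actuals, forecasts):
    """Mean absolute percentage error, in percent (hypothetical helper)."""
    if len(actuals) != len(forecasts):
        raise ValueError("actuals and forecasts must have the same length")
    if any(y == 0 for y in actuals):
        raise ValueError("MAPE is undefined when an observation is zero")
    # Average of |100 * (y_t - yhat_t) / y_t| over all t.
    errors = [abs(100.0 * (y - f) / y) for y, f in zip(actuals, forecasts)]
    return sum(errors) / len(errors)


# Illustrative data: percentage errors are 10%, 10% and 0%.
example_mape = mape([100.0, 200.0, 400.0], [110.0, 180.0, 400.0])

# Asymmetry: forecasting 0 for an observation of 100 costs 100%,
# while forecasting 300 for the same observation costs 200%.
under_forecast = mape([100.0], [0.0])
over_forecast = mape([100.0], [300.0])

print(round(example_mape, 2), under_forecast, over_forecast)
```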

## The Granville incident

April 16, 2014

Earlier this morning, there was some commotion on the allstat mailing list (if you don't know what it is, that's a UK-based discussion list specifically focussed on statistics; it's been active for quite some time and usually you get useful information...

## On the determinant of the Hilbert matrix

April 16, 2014

Last week I described the Hilbert matrix of size n, which is a famous square matrix in numerical linear algebra. It is famous partially because its inverse and its determinant have explicit formulas (that is, we know them exactly), but mainly because the matrix is ill-conditioned for moderate values of […]
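
To make the "explicit formula" remark concrete, here is a small Python sketch (not code from the post). It builds the Hilbert matrix with exact fractions, computes its determinant by elimination, and compares it with the standard closed form $\det H_n = c_n^4 / c_{2n}$, where $c_n = \prod_{i=1}^{n-1} i!$. The closed form is not quoted in the excerpt, and the names `hilbert`, `exact_det` and `hilbert_det_formula` are chosen here for illustration only.

```python
from fractions import Fraction
from math import factorial


def hilbert(n):
    """n x n Hilbert matrix; 0-based entry (i, j) is 1 / (i + j + 1)."""
    return [[Fraction(1, i + j + 1) for j in range(n)] for i in range(n)]


def exact_det(matrix):
    """Determinant by Gaussian elimination in exact rational arithmetic."""
    a = [row[:] for row in matrix]
    n = len(a)
    det = Fraction(1)
    for col in range(n):
        # Find a non-zero pivot at or below the diagonal.
        pivot = next((r for r in range(col, n) if a[r][col] != 0), None)
        if pivot is None:
            return Fraction(0)
        if pivot != col:
            a[col], a[pivot] = a[pivot], a[col]
            det = -det  # a row swap flips the sign
        det *= a[col][col]
        for r in range(col + 1, n):
            factor = a[r][col] / a[col][col]
            for c in range(col, n):
                a[r][c] -= factor * a[col][c]
    return det


def hilbert_det_formula(n):
    """Closed form c_n^4 / c_{2n}, with c_m = 1! * 2! * ... * (m - 1)!."""
    def c(m):
        product = 1
        for i in range(1, m):
            product *= factorial(i)
        return product
    return Fraction(c(n) ** 4, c(2 * n))


for n in range(1, 7):
    print(n, exact_det(hilbert(n)), hilbert_det_formula(n))
```

The determinant shrinks very quickly as n grows, which is consistent with the matrix being nearly singular at moderate sizes.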

## A. Spanos: Jerzy Neyman and his Enduring Legacy

April 16, 2014

A Statistical Model as a Chance Mechanism, by Aris Spanos. Jerzy Neyman (April 16, 1894 – August 5, 1981) was a Polish/American statistician who spent most of his professional career at the University of California, Berkeley. Neyman is best known in statistics for his pioneering contributions in framing the Neyman-Pearson (N-P) optimal theory of hypothesis testing […]

## Econometric Game, 2014

April 15, 2014

I've blogged about The Econometric Game previously - see here, here, and here. It's April, so the Game is on again - today and the next two days, to be specific. You can check out the details, as they become available, at this site. Good luck to all...

## Video Tutorial – Rolling 2 Dice: An Intuitive Explanation of The Central Limit Theorem

According to the central limit theorem, if random variables $X_1, X_2, \ldots, X_n$ are independent and identically distributed and $n$ is sufficiently large, then the distribution of their sample mean, $\bar{X}_n$, is approximately normal, and this approximation is better as $n$ increases. One of the most remarkable aspects of the central limit theorem (CLT) is its validity for any parent distribution of […]
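
The two-dice case in the title can be worked out exactly. The sketch below is not taken from the video: it assumes fair six-sided dice, and the function name `sum_distribution` is invented for this example. It shows that one die has a flat distribution, while the mean of two dice already peaks in the middle.

```python
from fractions import Fraction


def sum_distribution(n_dice, sides=6):
    """Exact distribution of the total of n_dice fair dice, as {total: probability}."""
    dist = {0: Fraction(1)}
    for _ in range(n_dice):
        new_dist = {}
        for total, prob in dist.items():
            for face in range(1, sides + 1):
                # Each face is equally likely, so split the probability evenly.
                new_dist[total + face] = new_dist.get(total + face, Fraction(0)) + prob / sides
        dist = new_dist
    return dist


one_die = sum_distribution(1)   # flat: every face has probability 1/6
two_dice = sum_distribution(2)  # triangular: peaks at a total of 7

# Text histogram of the sample mean of two dice (total / 2).
for total in sorted(two_dice):
    print(f"{total / 2:>4}", "#" * int(two_dice[total] * 36))
```

Calling `sum_distribution` with a larger `n_dice` gives the exact distribution for more dice, so the same histogram loop can be used to watch the shape round off toward a bell curve.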

## Timid medical research

April 15, 2014

Cancer research is sometimes criticized for being timid. Drug companies run enormous trials looking for small improvements. Critics say they should run smaller trials and more of them. Which side is correct depends on what’s out there waiting to be discovered, which of course we don’t know. We can only guess. Timid research is rational […]

## Conventions, novelty and the double edge

April 15, 2014

This chart from Reuters is making the rounds on Twitter today. Quickly, tell me whether the Gun Law in Florida did well or poorly. That of course is the entire purpose of the chart. *** If you are like me,...