## Sunday data/statistics link roundup (1/19/2014)

January 20, 2014

Tesla is hiring a data scientist. That is all. I'm not sure I buy the idea that Python is taking over from R among people who actually do regular data science. I think it is still context dependent. A huge …

## Faire parler les chiffres… n’importe comment (Making the numbers talk… any old way)

January 19, 2014

This weekend, Martin Grandjean posted an interesting piece on his blog about the use of statistics (for propaganda purposes). The exercise is not new, but Martin raises questions that are, unfortunately, both important and complex. In a paragraph titled “faire parler les chiffres… n’importe comment” (roughly, “making the numbers talk… any old way”), which I borrowed as my title (I admit I hesitated over “with great power comes great responsibility”), there is a (quick) analysis of a chart, shown below.…

## “The British amateur who debunked the mathematics of happiness”

January 19, 2014

Andrew Anthony tells the excellent story of how Nick Brown, Alan Sokal, and Harris Friedman shot down some particularly silly work in psychology. (“According to the graph, it all came down to a specific ratio of positive emotions to negative emotions. If your ratio was greater than 2.9013 positive emotions to 1 negative emotion you […]

January 19, 2014

Nassim Nicholas Taleb recently wrote an article advocating that standard deviation be abandoned in favor of mean absolute deviation. Mean absolute deviation is indeed an interesting and useful measure, but there is a reason that standard deviation is important even if you do not like it: it prefers models that […]
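As a rough illustration of the two measures being compared, here is a minimal sketch that computes both on a small made-up sample. The data values and function names are assumptions chosen for illustration and do not come from the original post; both measures use the population (divide by n) form.

```python
import math


def mean_absolute_deviation(xs):
    """Average absolute distance from the mean."""
    m = sum(xs) / len(xs)
    return sum(abs(x - m) for x in xs) / len(xs)


def standard_deviation(xs):
    """Population standard deviation (divides by n, not n - 1)."""
    m = sum(xs) / len(xs)
    return math.sqrt(sum((x - m) ** 2 for x in xs) / len(xs))


# Hypothetical sample with one large outlier.
data = [1.0, 2.0, 3.0, 4.0, 100.0]

mad = mean_absolute_deviation(data)
sd = standard_deviation(data)
print(f"mean absolute deviation: {mad:.2f}")
print(f"standard deviation:      {sd:.2f}")
```

Because squaring weights large deviations more heavily, the standard deviation is never smaller than the mean absolute deviation, and the gap widens when outliers are present.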

## The Myth of Random Sampling

January 19, 2014

I feel a slight quiver of trepidation as I begin this post – a little like the boy who pointed out that the emperor has no clothes. Random sampling is a myth. Practical researchers know this and deal with it. …

## Hopper – new in the travel space

January 19, 2014

Briefly - Hopper is something new in the travel / local space. In their own words: What if you could plan an amazing trip based on a vague idea — like “spring surfing in California” or “Mediterranean cruise”? What if...

## What is volatility?

January 19, 2014

Some facts and some speculation. Definition: Volatility is the annualized standard deviation of returns; it is often expressed in percent. A volatility of 20 means that there is about a one-third probability that an asset’s price a year from now will have fallen or risen by more than 20% from its present value. In …
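The definition above can be sketched in a few lines. This is only an illustration under stated assumptions: the daily returns are made up, 252 trading days per year is a common convention rather than something the excerpt specifies, and the "about one-third" figure is reproduced here by assuming yearly returns are roughly normal.

```python
import math
import statistics

TRADING_DAYS = 252  # assumed annualization convention, not from the post


def annualized_volatility(daily_returns, periods_per_year=TRADING_DAYS):
    """Sample standard deviation of returns, scaled to a year, in percent."""
    daily_sd = statistics.stdev(daily_returns)
    return daily_sd * math.sqrt(periods_per_year) * 100.0


def prob_outside_one_sd():
    """P(|Z| > 1) for a standard normal variable Z."""
    return 1.0 - math.erf(1.0 / math.sqrt(2.0))


# Hypothetical daily returns (as fractions, e.g. 0.01 means 1%).
returns = [0.01, -0.02, 0.015, -0.005, 0.0, 0.012, -0.011]

vol = annualized_volatility(returns)
p = prob_outside_one_sd()
print(f"annualized volatility: {vol:.1f}%")
print(f"probability of a move beyond one standard deviation: {p:.3f}")
```

Under the normal approximation, the chance of ending the year more than one standard deviation away from the starting point is a little under one third, which matches the rule of thumb in the excerpt.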

## Transformations for non-normal data

January 19, 2014

Steve Peterson writes: I recently submitted a proposal on applying a Bayesian analysis to gender comparisons on motivational constructs. I had an idea on how to improve the model I used and was hoping you could give me some feedback. The data come from a survey based on 5-point Likert scales. Different constructs are measured […]

## Sir Harold Jeffreys’ (tail area) one-liner: Sat night comedy [draft ii]

January 19, 2014

You might not have thought there could be new material for 2014, but there is, and if you look a bit more closely, you’ll see that it’s actually not Jay Leno who is standing up there at the mike …. It’s Sir Harold Jeffreys himself! And his (very famous) joke, I admit, is funny. So, since […]

## Le Monde puzzle [#849]

January 18, 2014

A straightforward Le Monde mathematical puzzle: Find a pair (a,b) of integers such that a has an odd number d of digits, with d larger than 2, and ab is written as \(10^{d+1}+10a+1\). Find the smallest possible values of a and of b. I ran the following R code which produced a=137 (and b=83) as the unique […]
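The author's R code is not included in this excerpt. As a stand-in, here is a minimal brute-force sketch in Python, assuming the condition reads a·b = 10^(d+1) + 10a + 1 (that is, the product is a with a 1 written on each side). The function name `solve` is hypothetical.

```python
def solve(d):
    """Return all (a, b) where a has d digits and a * b == 10**(d+1) + 10*a + 1."""
    solutions = []
    for a in range(10 ** (d - 1), 10 ** d):
        target = 10 ** (d + 1) + 10 * a + 1
        # b must be an integer, so a has to divide the target exactly.
        if target % a == 0:
            solutions.append((a, target // a))
    return solutions


# Smallest admissible digit count: d odd and larger than 2.
d3 = solve(3)
print(d3)
```

Since a always divides 10a, the condition reduces to a dividing 10^(d+1) + 1, so a faster version could factor that number instead of scanning every d-digit candidate.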

## Measurement and Measurement Error, Weight, Success and Failure

January 18, 2014

This blog currently weighs 200 pounds. It's inscribed in my database, so it must be true. 200 is the latest in a series of daily morning readings taken wearing the same clothing, at the same time of my day. But how is that 200 measured? And is 200 good or ...

## Converting plots to data

January 18, 2014

It is a problem that comes up every so often in applied work: you have a plot, but you want the data. There are at least two programs that can help: PlotDigitizer and Engauge Digitizer. I have both on my openSUSE machine. Both are available for...

## A course in sample surveys for political science

January 18, 2014

A colleague asked if I had any material for a course in sample surveys. And indeed I do. See here. It’s all the slides for a 14-week course, also the syllabus (“surveyscourse.pdf”), the final exam (“final2012.pdf”) and various misc files. Also more discussion of final exam questions here (keep scrolling thru the “previous entries” until […]

## Machine Learning Lesson of the Day – Cross-Validation

Validation is a good way to assess the predictive accuracy of a supervised learning algorithm, and the rule of thumb of using 70% of the data for training and 30% of the data for validation generally works well. However, what if the data set is not very large, and the small amount of data for […]
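The excerpt is cut off before it describes cross-validation itself, so the following is only a generic sketch of the two splitting schemes involved: a single 70/30 split and a k-fold split in which every observation is used for validation exactly once. The function names, the seed, and the choice of k=5 are assumptions made for illustration.

```python
import random


def train_validation_split(n, train_fraction=0.7, seed=0):
    """Shuffle the indices 0..n-1 and cut them into training and validation sets."""
    indices = list(range(n))
    random.Random(seed).shuffle(indices)
    cut = int(round(n * train_fraction))
    return indices[:cut], indices[cut:]


def k_fold_splits(n, k=5, seed=0):
    """Yield (training, validation) index lists, one pair per fold."""
    indices = list(range(n))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]  # k roughly equal folds
    for i in range(k):
        validation = folds[i]
        training = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield training, validation


train, valid = train_validation_split(10)
splits = list(k_fold_splits(10, k=5))
print(len(train), len(valid))
for training, validation in splits:
    print(sorted(validation))
```

With a small data set, averaging the validation error over the k folds uses all of the data for both fitting and assessment, which is the usual motivation for preferring it over a single split.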

## Metaphors Matter: Factor Structure vs. Correlation Network Maps

January 17, 2014

The psych R package includes a data set called "bfi" with self-report ratings on 25 personality items along a 6-point agreement scale. All the details are provided in the documentation accompanying the package. My focus is how to represent the correlat...

## Animated choropleths using animation, ggplot2, rCharts, googleVis and Shiny to visualize violent crime rates in different US States across 5 decades

January 17, 2014

UPDATE: The blog/site has moved to GitHub. The new link for the blog/site is patilv.github.io and the link to this post is: http://bit.ly/1jccIBN. Please update any bookmarks you may have. This post uses animated choropleths to visualize violent crime r...

## Causality and T-Consistency vs. Correlation and P-Consistency

January 17, 2014

Consider a standard linear regression setting with \(K\) regressors and sample size \(N\). We will say that an estimator \(\hat{\beta}\) is consistent for a treatment effect ("T-consistent") if \(\operatorname{plim} \hat{\beta}_k = \partial E(y|x) / \partial x_k\)...

## An Interesting New Book

January 17, 2014

Here's a new book that looks as if it will be interesting, and I'm looking forward to reading it myself: Panel Data Analysis Using EViews, written by I Gusti Ngurah Agung. Two other related books by this author have been published previously - see here....

## Happy New Year, It’s Too Late

January 17, 2014

A couple of people wished me happy new year yesterday, Jan 16th. But, you realize, the year is already 1/24th over? From R, rounding by W: `16/365` gives `0.0438`, `1/24` gives `0.0417`, and `15/365` gives `0.0411`. Somewhere between the 15th and the 16th we cros...
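The same arithmetic can be checked directly. This short sketch assumes a 365-day year, as the excerpt does, and also computes the point in the year at which exactly 1/24 has elapsed; the variable names are illustrative.

```python
DAYS_IN_YEAR = 365  # non-leap year, as in the excerpt

# Fraction of the year elapsed after 15 and 16 days, next to 1/24.
for day in (15, 16):
    print(day, round(day / DAYS_IN_YEAR, 4))
print("1/24 =", round(1 / 24, 4))

# Number of days after which exactly 1/24 of the year has passed.
crossover_day = DAYS_IN_YEAR / 24
print("crossover after", round(crossover_day, 2), "days")
```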

## Missing not at random data makes some Facebook users feel sad

January 17, 2014

This article, published last week, explained how "some younger users of Facebook say that using the site often leaves them feeling sad, lonely and inadequate". Being a statistician gives you an advantage here because we know that naive estimates from missing …

## How to think about the statistical evidence when the statistical evidence can’t be conclusive?

January 17, 2014

There’s a paradigm in applied statistics that goes something like this: 1. There is a scientific or policy question of some theoretical or practical importance. 2. Researchers gather data on relevant outcomes and perform a statistical analysis, ideally leading to a clear conclusion (p less than 0.05, or a strong posterior distribution, or good predictive […]