Jedi master of data: Hans Rosling

November 11, 2013
By

It has been inspiring to watch Hans Rosling give impressive talks about numbers and statistics. If you haven’t seen any of his great presentations, here is one example: Chances are that you haven’t seen him showing his wild side before. I just saw this article, “Hans Rosling: the man who makes statistics sing”, […]

Apple’s Touch ID and a worldwide lesson in sensitivity and specificity

November 11, 2013
By

I've been playing with my new iPhone 5s for the last few weeks, and first let me just say that it's an awesome phone. Don't listen to whatever Jeff says. It's probably worth it just for the camera, but I've …

A New Center to Watch for Predictive Macroeconomic and Financial Modeling

November 11, 2013
By

Check out USC's fine new Center for Applied Financial Economics, led by the indefatigable Hashem Pesaran. The first event is a fascinating conference, "Recent Developments on Forecasting Techniques for Macro and Finance."  Lots of information here...

Data Compression and the Nobel in Economics

November 11, 2013
By

Consider the following data compression problem. Suppose we have a large data set we wish to transmit. There are too many values to send directly, but luckily the precise values aren’t important. Slightly different values would work as long as the da...

Predictive Modeling

November 11, 2013
By
$\mathbb{E}(X)=\underset{c\in\mathbb{R}}{\text{argmin}}\{\mathbb{E}\left([X-c]^2\right)\}=\underset{c\in\mathbb{R}}{\text{argmin}}\{\mathbb{E}\left(||X-c||_{L_2}\right)\}$

Tomorrow, around noon, I will be giving a talk on predictive modeling for actuaries. In the introduction, I will come back briefly to the idea that a prediction is usually a best estimate, in the sense of an expected value, and that it is therefore natural to use least squares ideas. To illustrate all those concepts, we will use a simple dataset, with the sex, the height and…
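
The link between expected value and least squares mentioned in this excerpt can be checked numerically. The following is a minimal sketch, not the dataset from the talk: the `heights` values and the candidate grid are hypothetical, chosen only to show that the constant minimising the mean squared error is the sample mean.

```python
from statistics import mean


def mean_squared_error(values, c):
    """Average squared distance between the observations and a constant c."""
    return sum((x - c) ** 2 for x in values) / len(values)


def best_constant(values, candidates):
    """Return the candidate constant with the smallest mean squared error."""
    return min(candidates, key=lambda c: mean_squared_error(values, c))


# Hypothetical heights in centimetres (assumption, not data from the post).
heights = [158.0, 163.0, 170.0, 175.0, 184.0]

# Candidate predictions from 150 cm to 190 cm in steps of 0.5 cm.
grid = [150 + 0.5 * k for k in range(81)]

best = best_constant(heights, grid)
print(best, mean(heights))
```

Because the squared-error criterion is strictly convex in the constant, the grid search lands on the sample mean whenever the mean is one of the candidates.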

Out with Big Data, in with Hyperdata

November 11, 2013
By

Big data is so last year. Collecting data from all sorts of odd places and analyzing it much faster than was possible even a couple of years ago has become one of the hottest areas of the technology industry. The …

Why ask why? Forward causal inference and reverse causal questions

November 11, 2013
By

Guido Imbens and I write: The statistical and econometrics literature on causality is more focused on “effects of causes” than on “causes of effects.” That is, in the standard approach it is natural to study the effect of a treatment, but it is not in general possible to define the causes of any particular outcome. […] The post Why ask why? Forward causal inference and reverse causal questions appeared first on…

Graph redesign is hot

November 11, 2013
By

Joe D., a longtime reader, points us to a few blogs that have been actively creating redesigns of charts, similar to how we do it here. First up, here are some examples from Storytelling With Data (link). This example...

Multicollinearity tutorial

November 11, 2013
By

I just posted a brief multicollinearity tutorial on my other blog (loosely based on the material from the Serious Stats book). You can read it here. Filed under: serious stats, stats advice Tagged: correlation and covariance, general linear model, messy d...

A statistical review of ‘Thinking, Fast and Slow’ by Daniel Kahneman

November 11, 2013
By

I failed to find Kahneman’s book in the economics section of the bookshop, so I had to ask where it was.  “Oh, that’s in the psychology section.”  It should have also been in the statistics section. He states that his collaboration with Amos Tversky started with the question: Are humans good intuitive statisticians? The wrong […] The post A statistical review of ‘Thinking, Fast and Slow’ by Daniel Kahneman appeared…

cMDS: visualising changing distances

November 11, 2013
By

Gina Gruenhage has just arxived a new paper describing an algorithm we call cMDS. Here’s what it’s for: if you do any kind of data analysis you often find yourself comparing datapoints using some kind of distance metric. All’s well if you have a unique reasonable distance metric you can use, but often what you […]

A Guide to the Quality of Different Visualization Venues

November 11, 2013
By

I recently got an email from a colleague with the subject, “Academic research, is it all bad?” He had looked at a paper presented at a VIS workshop that people were pointing to on Twitter, and had found it lacking (“it’s just a blog posting”). While there are high-quality venues for visualization research, it’s not easy to be sure which ones are good, and which ones are lower quality.

Schiminovich is on The Simpsons

November 10, 2013
By

OK, fine. Maybe they could work Stan on to the show next? I thought I could retire once I’d successfully inserted the phrase “multilevel regression and poststratification” into the NYT, but now I want more more more. Maybe a cage ...

The GCD and LCM functions in SAS

November 10, 2013
By

My daughter's middle school math class recently reviewed how to compute the greatest common factor (GCF) and the least common multiple (LCM) of a set of integers. (The GCF is sometimes called the greatest common divisor, or GCD.) Both algorithms require factoring integers into a product of primes. While helping [...]
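
As an aside to the factoring-based classroom method described in this excerpt, the GCD and LCM of a set of integers can also be computed without any prime factorisation, using Euclid's algorithm. The sketch below is in Python rather than SAS and does not reproduce the SAS functions discussed in the post; the function names `gcd_of` and `lcm_of` are illustrative.

```python
from functools import reduce


def gcd(a, b):
    """Greatest common divisor of two integers via Euclid's algorithm."""
    a, b = abs(a), abs(b)
    while b:
        a, b = b, a % b
    return a


def lcm(a, b):
    """Least common multiple of two integers, using lcm(a, b) = |a*b| / gcd(a, b)."""
    if a == 0 or b == 0:
        return 0
    return abs(a * b) // gcd(a, b)


def gcd_of(values):
    """GCD of a non-empty collection of integers, folded pairwise."""
    return reduce(gcd, values)


def lcm_of(values):
    """LCM of a non-empty collection of integers, folded pairwise."""
    return reduce(lcm, values)


print(gcd_of([12, 18, 30]), lcm_of([12, 18, 30]))
```

Folding pairwise works because both operations are associative, so the set can be processed one integer at a time.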

A small comparison of bio-equivalence calculations.

November 10, 2013
By

Last week I looked at two-way cross-over studies and followed the example of Schütz (http://bebac.at/) in the analysis. Since the EU has its own opinions (Questions & Answers: Positions on specific questions addressed to the pharmacokinetics workin...

Regression models and interaction(s) between factors

November 10, 2013
By

In a regression model, we want to write $\mathbb{E}(Y\vert \boldsymbol{X}=(X_1,\ldots,X_d))=\varphi(\boldsymbol{X})=\varphi(X_1,\ldots,X_d)$. When we restrict ourselves to a linear model, we write […] But we suspect that we are missing something… in particular, we will miss all the possible interactions. We can cross the variables, and assume that […], which can be extended further, to order 3 or beyond. Suppose that our variables are qualitative here, and more precisely binary. Take a simple example, with data…

Beware of questionable front page articles warning you to beware of questionable front page articles (iii)

November 10, 2013
By

In this time of government cut-backs and sequester, scientists are under increased pressure to dream up ever new strategies to publish attention-getting articles with eye-catching, but inadequately scrutinized, conjectures. Science writers are under similar pressures, and to this end they have found a way to deliver up at least one fire-breathing, front page article a […]

Keynote speaker

November 9, 2013
By

Earlier today, I was trying to finish preparing the poster for the Clinical Trials Methodology Conference: I'll have both the poster presentation (on the Expected Value of Information under mixed strategies) and my talk on the Stepped Wedge des...

Typo in Ghitza and Gelman MRP paper

November 9, 2013
By

Devin Caughey points out a typo in the second column of page 765 of our AJPS paper. Here’s what we have: The typo is in the third line of the second paragraph above. Where it says y^*_j = y.bar^*_j n_j, it should be y^*_j = y.bar^*_j n^*_j. One frustrating aspect of the current system of […] The post Typo in Ghitza and Gelman MRP paper appeared first on Statistical Modeling, Causal…

Multicollinearity and collinearity (in multiple regression) – a tutorial

November 9, 2013
By

This blog post was written for undergraduate research methods teaching. I have therefore tried to keep everything relatively simple and equation-free. The content is based loosely on more detailed material in my book Serious stats. What are collineari...

Null Effects and Replication

November 9, 2013
By

Filed under: Comedy, Error Statistics, Statistics

Maximum Likelihood versus Goodness of Fit

November 9, 2013
By

Thursday, I got an interesting question from a colleague of mine (JP). I mean, the way I understood the question turned out to be a nice puzzle (but I have to confess I might have misunderstood). The question is the following: consider an i.i.d. sample $\{X_1,\cdots,X_n\}$ of continuous variables. We would like to choose between two (parametric) families for the distribution,  and . If we use maximum likelihood techniques, we…
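
To make the maximum likelihood side of this question concrete, here is a minimal sketch that fits two candidate families to the same sample and compares their maximised log-likelihoods. The two families in the post are not named in this excerpt, so the normal and Laplace families used here are assumptions, as is the artificial sample built from normal quantiles. Both families have closed-form maximum likelihood estimates, which keeps the example self-contained.

```python
import math
from statistics import NormalDist, mean, median


def normal_loglik(sample):
    """Maximised log-likelihood under a normal family (MLE mean and variance)."""
    n = len(sample)
    mu = mean(sample)
    var = sum((x - mu) ** 2 for x in sample) / n
    return -0.5 * n * (math.log(2 * math.pi * var) + 1)


def laplace_loglik(sample):
    """Maximised log-likelihood under a Laplace family (MLE median and scale)."""
    n = len(sample)
    m = median(sample)
    b = sum(abs(x - m) for x in sample) / n
    return -n * (math.log(2 * b) + 1)


# Artificial sample (assumption): evenly spaced quantiles of a standard normal.
n = 500
sample = [NormalDist().inv_cdf((i + 0.5) / n) for i in range(n)]

ll_normal = normal_loglik(sample)
ll_laplace = laplace_loglik(sample)
preferred = "normal" if ll_normal > ll_laplace else "Laplace"
print(ll_normal, ll_laplace, preferred)
```

Both families have two parameters here, so comparing the raw maximised log-likelihoods is a fair comparison; with families of different sizes a penalised criterion would be the usual choice.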