This Monday, I will be giving a talk at Paris 7, room 1016 of the Sophie Germain building, on causality with non-Gaussian time series. Slides are now online.

Yesterday I posted a methods-focused item at the Monkey Cage, a follow-up to a post from a couple of years ago arguing against some dramatic claims by economists Ashraf and Galor regarding the wealth of nations. No big deal, just some standard-issue skepticism. But for some reason this one caught fire: maybe somebody important linked to it, […] The post I owe it all to my Neanderthal genes appeared first on Statistical…

Mon: I owe it all to my Neanderthal genes
Tues: If Yogi Berra could see this one, he’d spin in his grave: Regression modeling using a convenience sample
Wed: 64 Shades of Gray: The subtle effect of chessboard images on foreign policy polarization
Thurs: Integrating graphs into your workflow
Fri: Gary Venter’s age-period-cohort decomposition of […]
The post On deck this week appeared first on Statistical Modeling, Causal Inference, and…

Last week I attended SAS Global Forum 2016 in Las Vegas. I and more than 5,000 other attendees discussed and shared tips about data analysis and statistics. Naturally, I attended many presentations that featured using SAS/IML software to implement advanced analytical algorithms. Several speakers showed impressive mastery of SAS/IML programming […] The post Matrix computations at SAS Global Forum 2016 appeared first on The DO Loop.

The shape of a dataset is hugely important to how well it can be handled by different software. The shape defines how it is laid out: wide as in a spreadsheet, or long as in a database table. Each has its use, but it’s important to understand their differences and when each is the right choice. Wide … Continue reading Spreadsheet Thinking vs. Database Thinking
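To make the wide/long distinction concrete, here is a minimal sketch in plain Python. The toy table, its column names, and the helper `wide_to_long` are all hypothetical and not taken from the original post. The wide layout has one row per city and one column per year; the long layout has one row per (city, year) observation.

```python
# Hypothetical toy data in wide layout: one row per city, one column per year.
wide_table = [
    {"city": "Oslo", "2014": 10, "2015": 12},
    {"city": "Lima", "2014": 7, "2015": 9},
]


def wide_to_long(rows, id_key):
    """Turn every non-id column of each wide row into its own long row."""
    long_rows = []
    for row in rows:
        for column, value in row.items():
            if column != id_key:
                long_rows.append(
                    {id_key: row[id_key], "variable": column, "value": value}
                )
    return long_rows


long_table = wide_to_long(wide_table, "city")
for record in long_table:
    print(record)
```

The long form repeats the identifier on every row, which looks redundant in a spreadsheet but is the shape that database-style filtering and grouping expect.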

“Risk aversion” comes up a lot in microeconomics, but I think that it’s too broad a concept to do much for us. In many, many cases, it seems to me that, when there is a decision option, either behavior X or behavior not-X can be thought of as risk averse, depending on the framing. Thus, when […] The post Risk aversion is a two-way street appeared first on Statistical Modeling, Causal…

The traditional answer is that the prior distribution represents your state of knowledge, that there is no “true” prior. Or, conversely, that the true prior is an expression of your beliefs, so that different statisticians can have different true priors. Or even that any prior is true by definition, in representing a subjective state of […] The post What is the “true prior distribution”? A hard-nosed answer. appeared first on…

Yee Whye Teh sends along this paper with Leonard Hasenclever, Thibaut Lienart, Sebastian Vollmer, Stefan Webb, Balaji Lakshminarayanan, and Charles Blundell. I haven’t read it in detail but they note similarities to our “expectation propaga...

This is the second part in a series of posts on the flexibility of our Excel add-in Sharp-R, which allows functions defined in R code to be run on data in any Excel worksheet. Part one looked at exploratory data analysis. This post deals with time serie...

Jennifer Hill announces “the first-ever ACIC causal inference data analysis competition”: Is your SATT where it’s at? Participate by submitting treatment effect estimates across a range of datasets OR by submitting a function (in any of a variety of programming languages) that will take input (covariate, treatment assignment, and response) and generate a treatment effect […] The post The causal inference competition you’ve all been waiting for! appeared first on…

fidlr is an RStudio addin designed to simplify the process of downloading financial data from various providers. This initial version is a wrapper around the getSymbols function in the quantmod package and only Yahoo, Google, FRED and Oanda are supported. I will probably add functionality over time. As usual with these things, just a kind reminder: “THE SOFTWARE […]

The bit of R code below illustrates the principal curves methods as described in The Elements of Statistical Learning, by Hastie, Tibshirani, and Friedman (Ch. 14; the book is freely available from the authors' website). Specifically, the code generates some bivariate data that have a nonlinear association, initializes the principal curve using the first (linear) principal … Continue reading Principal curves example (Elements of Statistical Learning) →

I happened to come across this post from 2011 that I like so much, I thought I’d say it again: Columbia College has for many years had a Core Curriculum, in which students read classics such as Plato (in translation) etc. A few years ago they created a Science core course. There was always some […] The post A new idea for a science core course based entirely on computer…

We've had a fantastic response to the workshop. In just a few days of public advertisement (I first posted about it and then advertised on allstat and HEALTHECON-ALL) we got 65 registrations, as of today. We have provisionally set out 100 "t...

A puzzle on The Riddler this week ends up as a standard integer programming problem. Removing the little story around the question, it boils down to optimising 200a+100b+50c+25d under the constraints 400a+400b+150c+50d≤1000, b≤a, a≤1, c≤8, d≤4, with (a,b,c,d) all non-negative integers. My first attempt was brute-force R code since there are only […]
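The brute-force idea can be sketched as an exhaustive search over the small feasible grid. This is a Python illustration, not the author's R code, and it assumes that "optimise" means maximise; the function name `brute_force` is hypothetical.

```python
from itertools import product


def brute_force():
    """Enumerate all integer points within the bounds and keep the best feasible one."""
    best_value, best_point = None, None
    # Bounds from the constraints: a <= 1, b <= a (so b <= 1), c <= 8, d <= 4.
    for a, b, c, d in product(range(2), range(2), range(9), range(5)):
        if b > a:
            continue  # violates b <= a
        if 400 * a + 400 * b + 150 * c + 50 * d > 1000:
            continue  # violates the linear resource constraint
        value = 200 * a + 100 * b + 50 * c + 25 * d
        # Assumption: the objective is to be maximised.
        if best_value is None or value > best_value:
            best_value, best_point = value, (a, b, c, d)
    return best_value, best_point


best_value, best_point = brute_force()
print(best_value, best_point)  # 425 (1, 0, 3, 3)
```

With at most 2 × 2 × 9 × 5 = 180 candidate points, enumeration is instant; a dedicated integer programming solver only becomes worthwhile on larger instances.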

Attention conservation notice: Self-promotion, and irrelevant unless you (1) will be a student at Carnegie Mellon in the fall, or (2) have a morbid curiosity about a field in which the realities of social life are first caricatured into an impoverishe...

4 September 1918 -- 4 April 2016

Michael Oakes pointed me to this excellent news article by Daniel Engber, subtitled, “There’s a replication crisis in biomedicine—and no one even knows how deep it runs.” Engber suggests that the replication problem in biomedical research is worse than the much-publicized replication problem in psychology. One reason, which I didn’t see Engber discussing, is financial […] The post “Cancer Research Is Broken” appeared first on Statistical Modeling, Causal Inference, and…

You can visualize missing data. It sounds like an oxymoron, but it is true. How can you draw graphs of something that is missing? In a previous article, I showed how you can use PROC MI in SAS/STAT software to create a table that shows patterns of missing data in […] The post Visualize missing data in SAS appeared first on The DO Loop.