Visually weighting regression displays

August 9, 2012

(This article was originally published at Statistical Modeling, Causal Inference, and Social Science, and syndicated at StatsBlogs.)

Solomon Hsiang writes:

One of my colleagues suggested that I send you this very short note that I wrote on a new approach for displaying regression result uncertainty (attached). It’s very simple, and I’ve found it effective in one of my papers where I actually use it, but if you have a chance to glance over it and have any ideas for how to sell the approach or make it better, I’d be very interested to hear them. (Also, if you’ve seen that someone else has already made this point, I’d appreciate knowing that too.)

Here’s an example:

Hsiang writes:

In Panel A, our eyes are drawn outward, away from the center of the display and toward the swirling confidence intervals at the edges. But in Panel B, our eyes are attracted to the middle of the regression line, where the high contrast between the line and the background is sharp and visually heavy. By using visual weighting, we focus our readers' attention on those portions of the regression that contain the most information and where the findings are strongest. Furthermore, when we attempt to look at the edges of Panel B, we actually feel a little uncertain, as if we are trying to make out the shape of something through a fog. This is a good thing, because everyone knows that feeling, even if we have no statistical training (or ignore that training when it's inconvenient). By aligning the feeling of uncertainty with actual statistical uncertainty, we can more intuitively and more effectively communicate the uncertainty in our results to a broader set of viewers.
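A minimal sketch of the visual-weighting idea in Python with NumPy, assuming a simple linear regression fit by least squares and assuming the shading weight at each point on the line is taken inversely proportional to the pointwise standard error of the fitted mean, normalized so the best-determined point gets weight 1. The function names and the simulated data are hypothetical, and Hsiang's note may define the weights differently.

```python
import numpy as np

def ols_line_with_se(x, y, grid):
    """Fit y = a + b*x by ordinary least squares (hypothetical helper) and
    return the fitted values and pointwise standard errors on `grid`."""
    X = np.column_stack([np.ones_like(x), x])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    n, p = X.shape
    sigma2 = resid @ resid / (n - p)            # residual variance estimate
    cov_beta = sigma2 * np.linalg.inv(X.T @ X)  # covariance of (a, b)
    G = np.column_stack([np.ones_like(grid), grid])
    fit = G @ beta
    # Var(a + b*x0) = g^T Cov(beta) g for each grid row g = (1, x0)
    se = np.sqrt(np.einsum("ij,jk,ik->i", G, cov_beta, G))
    return fit, se

def visual_weights(se):
    """Assumed weighting: opacity proportional to 1/se, scaled into (0, 1]."""
    w = 1.0 / se
    return w / w.max()

# Simulated data, for illustration only.
rng = np.random.default_rng(0)
x = rng.normal(size=200)
y = 1.0 + 2.0 * x + rng.normal(size=200)
grid = np.linspace(-3, 3, 61)
fit, se = ols_line_with_se(x, y, grid)
alpha = visual_weights(se)  # per-point opacity for drawing the fitted line
```

For a straight-line fit the standard error is smallest at the mean of `x`, so the line is drawn darkest there and fades toward the edges, which is the effect described for Panel B. The `alpha` values could be handed to any plotting library that supports per-segment opacity.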

I like that. But once you're making those edges blurry, couldn't you also spread them out, to get the best of both worlds: the uncertainty bounds and the visual weighting?

Think about it this way. Suppose that, instead of displaying the fitted curve and error bounds, you make a spaghetti-style plot showing, say, 1000 draws of the regression curve from the uncertainty distribution. Usually when we do this we just let the lines overwrite each other, but suppose instead that we make each of the 1000 lines really light gray and let the darkness build up wherever two or more lines overlap. Then you'll get a graph where the curve is automatically darker where the uncertainty distribution is more concentrated and lighter where the distribution is more vague.
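Here is a rough sketch of the overlapping-spaghetti idea, assuming the 1000 curves are straight lines whose intercept and slope are drawn from a normal approximation to the coefficients' sampling distribution (bootstrap draws would work the same way). Each line is rasterized onto a pixel grid with one sample per x column and composited as a very faint gray stroke, so a pixel crossed by k lines ends up with darkness 1 - (1 - line_alpha)^k. The function names, the coefficient values, and `line_alpha` are hypothetical choices for illustration.

```python
import numpy as np

def draw_lines(beta_hat, cov_beta, n_draws, rng):
    """Simulate (intercept, slope) pairs from an assumed normal approximation."""
    return rng.multivariate_normal(beta_hat, cov_beta, size=n_draws)

def spaghetti_darkness(draws, x_grid, y_edges, line_alpha=0.01):
    """Composite light gray lines a + b*x on a (y, x) pixel grid.

    Returns darkness in [0, 1): a pixel hit by k lines has 1 - (1 - line_alpha)**k.
    """
    counts = np.zeros((len(y_edges) - 1, len(x_grid)), dtype=int)
    cols = np.arange(len(x_grid))
    for a, b in draws:
        y = a + b * x_grid
        rows = np.searchsorted(y_edges, y, side="right") - 1  # y bin per column
        inside = (rows >= 0) & (rows < counts.shape[0])       # drop off-canvas points
        counts[rows[inside], cols[inside]] += 1               # one hit per column per line
    return 1.0 - (1.0 - line_alpha) ** counts

# Illustrative coefficient estimates, not taken from any real data.
rng = np.random.default_rng(1)
beta_hat = np.array([1.0, 2.0])
cov_beta = np.array([[0.01, 0.0], [0.0, 0.04]])
draws = draw_lines(beta_hat, cov_beta, 1000, rng)
x_grid = np.linspace(-3, 3, 61)
y_edges = np.linspace(-10, 12, 111)
dark = spaghetti_darkness(draws, x_grid, y_edges)
```

Because the strokes are composited rather than summed, darkness saturates toward 1 instead of growing without bound; a smaller `line_alpha` keeps the densest region from clipping to solid black.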

Now take this a step further. You don't actually need to draw the 1000 lines; instead, you can do it analytically and just plot the color intensities in proportion to the distributions. The result will look something like Hsiang's visually weighted regression, but more spread out where the curve is more uncertain.
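One way to do it analytically, sketched under the same assumptions as a normal approximation for a straight-line fit: at each x the fitted value a + b*x is normal with mean equal to the fitted line and standard deviation equal to its pointwise standard error, so the intensity of each pixel can be set proportional to that normal density. Scaling by the single global maximum (rather than column by column) keeps the visual weighting, because the peak height of each column is proportional to 1/sd. The function name and the coefficient values are hypothetical.

```python
import numpy as np

def analytic_intensity(beta_hat, cov_beta, x_grid, y_grid):
    """Pixel intensity on a (y, x) grid proportional to the normal density of
    the fitted mean a + b*x at each x, scaled so the darkest pixel is 1."""
    G = np.column_stack([np.ones_like(x_grid), x_grid])
    mu = G @ beta_hat                                        # fitted line
    sd = np.sqrt(np.einsum("ij,jk,ik->i", G, cov_beta, G))  # pointwise SE
    z = (y_grid[:, None] - mu[None, :]) / sd[None, :]
    density = np.exp(-0.5 * z**2) / (sd[None, :] * np.sqrt(2 * np.pi))
    return density / density.max()  # one global scale preserves the weighting

# Illustrative coefficient estimates, not taken from any real data.
beta_hat = np.array([1.0, 2.0])
cov_beta = np.array([[0.01, 0.0], [0.0, 0.04]])
x_grid = np.linspace(-3, 3, 61)
y_grid = np.linspace(-10, 12, 221)
intensity = analytic_intensity(beta_hat, cov_beta, x_grid, y_grid)
```

The result is the smooth limit of the spaghetti plot: a dark, narrow ridge where the estimate is precise and a wide, faint band where it is not.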
