“I feel like the really solid information therein comes from non or negative correlations”

Steve Roth writes:

I’d love to hear your thoughts on this approach (heavily inspired by Arindrajit Dube’s work, linked therein):

This relates to our discussion from 2014:

My biggest takeaway from this latest: I feel like the really solid information therein comes from non or negative correlations:

• It comes before
• But it doesn’t correlate with ensuing (or it correlates negatively)

It’s pretty darned certain it isn’t caused by.

If smoking didn’t correlate with ensuing lung cancer (or correlated negatively), we’d say with pretty strong certainty that smoking doesn’t cause cancer, right?

By contrast, positive correlation only tells us that something (out of an infinity of explanations) might be causing the apparent effect of A on B. Non or negative correlation strongly disproves a hypothesis.

I’m less confident saying: if we don’t look at multiple positive and negative time lags for time series correlations, we don’t really learn anything from them?

More generally, this is basic Popper/science/falsification. The depressing takeaway: all we can really do with correlation analysis is disprove an infinite set of hypotheses, one at a time? Hoping that eventually we’ll gain confidence in the non-disproved causal hypotheses? Slow work!

It also suggests that file-drawer bias is far more pernicious than is generally allowed. The institutional incentives actually suppress the most useful, convincing findings? Disproofs?

(This all toward my somewhat obsessive economic interests: does wealth concentration/inequality cause slower economic growth one year, five years, twenty years later? The data’s still sparse…)

Roth summarizes:

“Dispositive” findings are literally non-positive. They dispose of hypotheses.

My reply:

1. The general point reminds me of my dictum that statistical hypothesis testing works the opposite way that people think it does. The usual thinking is that if a hyp test rejects, you’ve learned something, but if the test does not reject, you can’t say anything. I’d say it’s the opposite: if the test rejects, you haven’t learned anything—after all, we know ahead of time that just about all null hypotheses of interest are false—but if the test doesn’t reject, you’ve learned the useful fact that you don’t have enough data in your analysis to distinguish from pure noise.

2. That said, what you write can’t be literally true. Zero or nonzero correlations don’t stay zero or nonzero after you control for other variables. For example, smoking didn’t correlate with lung cancer in observational data, sure, that would be a surprise, but in any case you’d have to look at other differences between the exposed and unexposed groups.

3. As a side remark, just reacting to something at the end of the your email, I continue to think that file drawer is overrated, given the huge number of researcher degrees of freedom, even in many preregistered studies (for example here). Researchers have no need to bury non-findings in the file drawer; instead they can extract findings of interest from just about any dataset.