Someone points to my paper with Gary King from 1998, “Estimating the probability of events that have never occurred: When is your vote decisive?”, and writes:
In my area of early childhood intervention, there are certain outcomes which are rare. Things like premature birth, confirmed cases of child maltreatment, SIDS, etc. They are rare enough that they occasionally won’t even show up in a given study sample. Studies looking at the efficacy of interventions to reduce the probability of these events happening have shown mixed results that, taken as a whole, probably suggest home visiting isn’t having a big impact here (though my own opinion is that we don’t really know much of anything). I wonder if part of the problem is the rarity of the events and if our methods for analysis are really inappropriate. The most typical way this is assessed is via a logistic regression (frequentist and basic). These events are so rare that many studies actually look at proxies, which themselves have little data to support their ability to predict the actual endpoint (and which I worry are noisy).
Would a “rare event” analysis technique be potentially appropriate for more accurately modeling the potential effectiveness of an intervention on reducing something like child maltreatment, premature birth, or SIDS?
My reply: yes, it makes sense to model precursor data. Looking at the problem 20 years later, it occurs to me that it would make sense to do some sort of hierarchical modeling with informative priors. With rare events, priors would be needed to regularize, but maybe then you could work with a whole bunch of different outcomes and precursors together.
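To see why a prior is needed to regularize here, consider the simplest possible case: one rare outcome, one study arm, and a conjugate Beta prior on the event rate. This is only a sketch of the regularization idea, not the hierarchical model itself, and the prior parameters and sample size below are made-up assumptions chosen for illustration.

```python
def posterior_mean_rate(events, n, prior_a, prior_b):
    """Posterior mean of a binomial rate under a Beta(prior_a, prior_b) prior.

    The Beta prior is conjugate to the binomial, so the posterior is
    Beta(prior_a + events, prior_b + n - events).
    """
    return (prior_a + events) / (prior_a + prior_b + n)


# Hypothetical study arm: 200 children, zero observed events.
events, n = 0, 200

# The raw proportion is exactly zero, which is not a believable risk estimate.
raw_rate = events / n

# Assumed informative prior, Beta(1, 199): prior mean 0.5%, worth about
# 200 prior observations. These numbers are illustrative, not from real data.
regularized_rate = posterior_mean_rate(events, n, prior_a=1, prior_b=199)

print(raw_rate, regularized_rate)
```

With zero observed events the raw estimate collapses to zero, while the posterior mean stays strictly positive and is pulled toward the prior. A hierarchical model would go further by estimating that prior from many outcomes and precursors at once.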
I’m too busy to do this right now, but I guess the right way to start here would be to set up a fake-data simulation study where there are rare events of interest and various precursors, and then go from there.
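A minimal sketch of what such a fake-data simulation might look like, using only the Python standard library. Everything in it is an assumption made for illustration: the two-arm design, the baseline rates, the idea that the intervention acts on a single binary precursor which in turn raises the risk of the rare event, and the function names `simulate_study` and `summarize`. It is a starting point for checking how much information each kind of outcome carries, not a proposed analysis.

```python
import math
import random


def inv_logit(x):
    return 1.0 / (1.0 + math.exp(-x))


def simulate_study(n_per_arm, effect, rng):
    """Simulate one two-arm study; return {arm: (precursor_count, rare_count)}.

    Assumed data-generating process (all numbers are made up):
    - the precursor has a baseline rate of about 12% (logit -2);
    - the intervention shifts the precursor log-odds by `effect`;
    - the rare event has a baseline rate of about 0.25% (logit -6),
      rising by 2 on the logit scale when the precursor is present.
    """
    counts = {}
    for arm, treated in (("control", 0), ("treated", 1)):
        n_precursor = 0
        n_rare = 0
        for _ in range(n_per_arm):
            precursor = rng.random() < inv_logit(-2.0 + effect * treated)
            rare = rng.random() < inv_logit(-6.0 + 2.0 * precursor)
            n_precursor += precursor
            n_rare += rare
        counts[arm] = (n_precursor, n_rare)
    return counts


def summarize(n_sims, n_per_arm, effect, seed):
    """Repeat the simulated study and summarize what raw counts can tell us."""
    rng = random.Random(seed)
    zero_rare = 0        # studies with no rare events in at least one arm
    precursor_sign = 0   # studies where treated precursor count < control
    rare_sign = 0        # studies where treated rare count < control
    for _ in range(n_sims):
        counts = simulate_study(n_per_arm, effect, rng)
        prec_c, rare_c = counts["control"]
        prec_t, rare_t = counts["treated"]
        zero_rare += (rare_c == 0 or rare_t == 0)
        precursor_sign += (prec_t < prec_c)
        rare_sign += (rare_t < rare_c)
    return {
        "zero_rare_in_some_arm": zero_rare / n_sims,
        "precursor_sign_correct": precursor_sign / n_sims,
        "rare_sign_correct": rare_sign / n_sims,
    }


# Assumed settings: 200 simulated studies, 500 children per arm,
# a beneficial effect of -0.5 on the precursor log-odds.
result = summarize(n_sims=200, n_per_arm=500, effect=-0.5, seed=1)
print(result)
```

Because the true effect is known in the simulation, the summary shows how often the raw rare-event comparison even points in the right direction compared with the precursor comparison. The natural next step would be to fit candidate models to these simulated datasets and check whether they recover the assumed effect.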