This workshop should be really interesting:
- Aggregating and analysing crowdsourced annotations for NLP EMNLP Workshop. November 3–4, 2019. Hong Kong.
I really like the problem of crowdsourcing. Especially for machine learning data curation. It’s a fantastic problem that admits of really nice Bayesian hierarchical models (no surprise to this blog’s audience!).
The rest of this note’s a bit more personal, but I’d very much like to see others adopting similar plans for the future for data curation and application.
Crowdsourcing is near and dear to my heart as it’s the first serious Bayesian modeling problem I worked on. Breck Baldwin and I were working on crowdsourcing for applied natural language processing in the mid 2000s. I couldn’t quite figure out a Bayesian model for it by myself, so I asked Andrew if he could help. He invited me to the “playroom” (a salon-like meeting he used to run every week at Columbia), where he and Jennifer Hill helped me formulate a crowdsourcing model.
As Andrew likes to say, every good model was invented decades ago for psychometrics, and this one’s no different. Phil Dawid had formulated exactly the same model (without the hierarchical component) back in 1979, estimating parameters with EM (itself only published in 1977). The key idea is treating the crowdsourced data like any other noisy measurement. Once you do that, it’s just down to details.
Part of my original motivation for developing Stan was to have a robust way to fit these models. Hamiltonian Monte Carlo (HMC) only handles continuous parameters, so like in Dawid’s application of EM, I had to marginalize out the discrete parameters. This marginalization’s the key to getting these models to sample effectively. Sampling discrete parameters that can be marginalized is a mug’s game.
Coming full circle, I co-authored a paper with Silviu and Dirk recently, Comparing Bayesian models of annotation, that reformulated and evaluated a bunch of these models in Stan.
Editorial Aside: Every field should move to journals like TACL. Free to publish, fully open access, and roughly one month turnarond to first decision. You have to experience journals like this in action to believe it’s possible.
I want to see these general techniques applied to creating probabilistic corpora, to online adaptative training data (aka active learning), to joint corpus inference and model training (a la Raykar et al.’s models), and to evaluation.
P.S. Cultural consensus theory
I’m not the only one who recreated Dawid and Skene’s model. It’s everywhere these days.
Recently, I just discovered an entire literature dating back decades on cultural consensus theory, which uses very similar models (I’m pretty sure either Lauren Kennedy or Duco Veen pointed out the literature). The authors go more into the philosophical underpinnings of the notion of consensus driving these models (basically the underlying truth of which you are taking noisy measurements). One neat innovation in the cultural consensus theory literature is a mixture model of truth—you can assume multiple subcultures are coding the data with different standards. I’d thought of mixture models of coders (say experts, Mechanical turkers, and undergrads), but not of the truth.
In yet another small world phenomenon, right after I discovered cultural consensus theory, I saw a cello concert organized through Groupmuse by a social scientist at NYU I’d originally met through a mutual friend of Andrew’s. He introduced the cellist, Iona Batchelder, and added as an aside she was the daughter of well known social scientists. Not just any social scientists, the developers of cultural consensus theory!