(This article was originally published at Statistical Modeling, Causal Inference, and Social Science, and syndicated at StatsBlogs.)

Statistical Challenges of Survey Sampling and Big Data

Andrew Gelman, Department of Statistics and Department of Political Science, Columbia University, New York

Big Data need Big Model. Big Data are typically convenience samples, not random samples; observational comparisons, not controlled experiments; available data, not measurements designed for a particular study. As a result, it is necessary to adjust to extrapolate from sample to population, to match treatment to control group, and to generalize from observations to underlying constructs of interest. Big Data + Big Model = expensive computation, especially given that we do not know the best model ahead of time and thus must typically fit many models to understand what can be learned from any given dataset. We discuss Bayesian methods for constructing, fitting, checking, and improving such models.

It’ll be at the 5th Italian Conference on Survey Methodology, at the Department of Statistical Sciences of the University of Bologna. A low-carbon remote talk.

The post Statistical Challenges of Survey Sampling and Big Data (my remote talk in Bologna this Thurs, 15 June, 4:15pm) appeared first on Statistical Modeling, Causal Inference, and Social Science.

**Please comment on the article here:** **Statistical Modeling, Causal Inference, and Social Science**