They want help designing a crowdsourcing data analysis project

July 16, 2017

(This article was originally published at Statistical Modeling, Causal Inference, and Social Science, and syndicated at StatsBlogs.)

Michael Feldman writes:

My collaborators and I are doing research to understand the reasons for variability in data analysis (“the garden of forking paths”). Our goal is to understand why scientists make different decisions in their analyses and, in doing so, reach different results.

In a project called “Crowdsourcing data analysis: Gender, status, and science”, we have recruited a large group of independent analysts to test the same hypotheses on the same dataset using a platform we developed.

The platform is essentially RStudio running online with a few additions:

· We record all executed commands even if they are not in the final code

· We ask analysts to explain these commands by creating semantic blocks explaining the rationale and alternatives

· We allow analysts to create a graphical workflow of their work using these blocks and by restructuring them

You can find a more complete description of the experiment here, along with a short video tutorial of the platform.

Of course, this experiment does not cover all considerations that might lead to variability (e.g., R users might differ from Python users), but we believe it is a step towards better understanding how defensible yet subjective analytic choices may shape research results. The experiment is still running, but we are likely to receive about 40-60 submissions of code, logs, comments, and explanations of decisions made. We are also collecting information about the analysts, such as their background, the methods they usually use, and the way they operationalized the hypotheses.

Our current plan is to analyze the data from this crowdsourced project using inductive coding, splitting participants into groups that reached similar results (effect size and direction). We then plan to identify factors that can explain the various decisions as well as the similarities between participants.
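The grouping step described above can be pictured with a minimal Python sketch. It is only an illustration under stated assumptions, not the project's actual procedure: it assumes each submission has been reduced to a single signed effect size, the function name `group_by_result` and the 0.2 cutoff for "near zero" are hypothetical, and the example values are invented.

```python
from collections import defaultdict


def group_by_result(submissions, near_zero=0.2):
    """Group analysts by effect direction, with a separate bin for small effects.

    submissions: iterable of (analyst_id, effect_size) pairs.
    near_zero: hypothetical cutoff below which an effect counts as "near zero".
    """
    groups = defaultdict(list)
    for analyst_id, effect in submissions:
        if abs(effect) < near_zero:
            label = "near zero"
        elif effect > 0:
            label = "positive"
        else:
            label = "negative"
        groups[label].append(analyst_id)
    return dict(groups)


# Invented values for illustration only; these are not results from the study.
example = [("a1", 0.35), ("a2", -0.05), ("a3", 0.41), ("a4", -0.30)]
groups = group_by_result(example)
print(groups)
```

Once analysts are grouped this way, the coded decisions and background information could be compared within and between groups. Where to place the cutoff is itself a subjective analytic choice.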

We would love to receive any feedback and suggestions from readers of your blog regarding our planned approach to account for variability in results across different analysts.

If anyone has suggestions, feel free to respond in the comments.
