# The landscape of data analysis

January 10, 2013

(This article was originally published at Simply Statistics, and syndicated at StatsBlogs.)

I have been getting some questions via email, LinkedIn, and Twitter about the content of the Data Analysis class I will be teaching for Coursera. Data Analysis and Data Science mean different things to different people. So I made a video describing how Data Analysis fits into the landscape of other quantitative classes here:

Here is the corresponding presentation. I also made a tentative list of topics we will cover, subject to change at the instructor’s whim. Here it is:

• The structure of a data analysis (steps in the process, knowing when to quit, etc.)
• Types of data (census, designed studies, randomized trials)
• Types of data analysis questions (exploratory, inferential, predictive, etc.)
• How to write up a data analysis (compositional style, reproducibility, etc.)
• Obtaining data from the web (through downloads mostly)
• Loading data into R from different file types
• Plotting data for exploratory purposes (boxplots, scatterplots, etc.)
• Exploratory statistical models (clustering)
• Statistical models for inference (linear models, basic confidence intervals/hypothesis testing)
• Basic model checking (primarily visually)
• The prediction process
• Study design for prediction
• Cross-validation
• A couple of simple prediction models
• Basics of simulation for evaluating models
• Ways you can fool yourself and how to avoid them (confounding, multiple testing, etc.)

Of course, that is a ton of material for 8 weeks, so we will be covering just the very basics. I think it is really important to remember that being a good Data Analyst is like being a good surgeon or writer. There is no such thing as a prodigy in surgery or writing, because both require long experience, trying lots of things out, and learning from mistakes. I hope to give people the basic information they need to get started and to point them to resources where they can learn more. I also hope to give them a chance to practice some of the basics a couple of times and to learn that in data analysis the first goal is to “do no harm”.

