Posts Tagged ‘Tutorials’

RNA-seq Data Analysis Course Materials

November 20, 2014

Last week I ran a one-day workshop on RNA-seq data analysis in the UVA Health Sciences Library. I set up an AWS public EC2 image with all the necessary software installed. Participants logged into AWS, launched the image, and we kicked off the morning ...

Can we try to make an adjustment?

November 14, 2014

In most of our data science teaching (including our book Practical Data Science with R) we emphasize the deliberately easy problem of “exchangeable prediction.” We define exchangeable prediction as: given a series of observations with two distinguished classes of variables/observations denoted “x”s (denoting control variables, independent variables, experimental variables, or predictor variables) and “y” (denoting […]

Bias/variance tradeoff as gamesmanship

October 30, 2014

Continuing our series of reading out loud from a single page of a statistics book, we look at page 224 of the 1972 Dover edition of Leonard J. Savage’s “The Foundations of Statistics.” On this page we are treated to an example attributed to Leo A. Goodman in 1953 that illustrates how for normally distributed […]

Calculating the sum or mean of a numeric (continuous) variable by a group (categorical) variable in SAS

Introduction: A common task in data analysis and statistics is to calculate the sum or mean of a continuous variable. If that variable can be categorized into 2 or more classes, you may want to get the sum or mean for each class. This sounds like a simple task, yet I took a surprisingly long time […]
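The post itself works in SAS. As a language-neutral illustration of the task described, here is a minimal Python sketch that computes the sum and mean of a numeric variable within each class. The data and the names `records` and `summarize_by_group` are made up for illustration and do not come from the post.

```python
from collections import defaultdict


def summarize_by_group(records):
    """Return {group: (sum, mean)} for an iterable of (group, value) pairs."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for group, value in records:
        totals[group] += value
        counts[group] += 1
    return {g: (totals[g], totals[g] / counts[g]) for g in totals}


# Hypothetical data: (class label, measurement)
records = [("a", 1.0), ("b", 4.0), ("a", 3.0), ("b", 6.0), ("b", 8.0)]
summary = summarize_by_group(records)

for group, (total, mean) in sorted(summary.items()):
    print(group, total, mean)
```

Accumulating a running total and a count per class needs only one pass over the data, which is the same idea a grouped summary procedure applies.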

Reading the Gauss-Markov theorem

August 26, 2014

What is the Gauss-Markov theorem? From “The Cambridge Dictionary of Statistics” B. S. Everitt, 2nd Edition: A theorem that proves that if the error terms in a multiple regression have the same variance and are uncorrelated, then the estimators of the parameters in the model produced by least squares estimation are better (in the sense […]

The Chi-Squared Test of Independence – An Example in Both R and SAS

Introduction: The chi-squared test of independence is one of the most basic and common hypothesis tests in the statistical analysis of categorical data. Given 2 categorical random variables, the chi-squared test of independence determines whether or not there exists a statistical dependence between them. Formally, it is a hypothesis test with the following null and […]
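The post works its example in R and SAS. As a rough sketch of the arithmetic behind the test, the Python below computes the Pearson chi-squared statistic and its degrees of freedom from a table of observed counts. The table `observed` and the function name `chi_squared_independence` are hypothetical, and the sketch assumes every expected count is positive.

```python
def chi_squared_independence(table):
    """Return (statistic, degrees_of_freedom) for a table of observed counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)

    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed_count in enumerate(row):
            # Expected count if the row and column variables were independent
            expected = row_totals[i] * col_totals[j] / n
            statistic += (observed_count - expected) ** 2 / expected

    dof = (len(row_totals) - 1) * (len(col_totals) - 1)
    return statistic, dof


# Hypothetical 2x2 table of observed counts
observed = [[10, 20], [20, 10]]
stat, dof = chi_squared_independence(observed)
print(stat, dof)
```

The p-value is then the upper-tail probability of a chi-squared distribution with `dof` degrees of freedom at `stat`; that step is left out here because it needs a statistics library.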

Do your "data janitor work" like a boss with dplyr

August 20, 2014

Data “janitor-work”: The New York Times recently ran a piece on wrangling and cleaning data: “For Big-Data Scientists, ‘Janitor Work’ Is Key Hurdle to Insights.” Whether you call it “janitor-work,” wrangling/munging, cleaning/cleansing/scru...

Video Tutorial – Calculating Expected Counts in a Contingency Table Using Joint Probabilities

In an earlier video, I showed how to calculate expected counts in a contingency table using marginal proportions and totals. (Recall that expected counts are needed to conduct hypothesis tests of independence between categorical random variables.) Today, I want to share a second video on calculating expected counts – this time, using joint probabilities. This method uses […]
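The excerpt is cut off before it describes the method, so the following is only one plausible reading and may not match the video. Under independence, the joint probability of a cell is the product of its row and column marginal probabilities, and the expected count is that joint probability times the grand total. The table `observed` and the function name `expected_counts` are hypothetical.

```python
from fractions import Fraction


def expected_counts(table):
    """Expected cell counts under independence, via joint probabilities."""
    n = sum(sum(row) for row in table)
    row_probs = [Fraction(sum(row), n) for row in table]
    col_probs = [Fraction(sum(col), n) for col in zip(*table)]
    # Joint probability under independence = row marginal * column marginal;
    # multiplying by the grand total n turns it into an expected count.
    return [[n * rp * cp for cp in col_probs] for rp in row_probs]


# Hypothetical 2x2 table of observed counts
observed = [[30, 10], [20, 40]]
expected = expected_counts(observed)

for row in expected:
    print([float(cell) for cell in row])
```

Exact fractions are used so the result can be compared with the familiar row total times column total divided by grand total, without rounding noise.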

Video Tutorial – Allelic Frequencies Remain Constant From Generation to Generation Under the Hardy-Weinberg Equilibrium

The Hardy-Weinberg law is a fundamental principle in statistical genetics. If its 7 assumptions are fulfilled, then it predicts that the allelic frequency of a genetic trait will remain constant from generation to generation. In this new video tutorial on my YouTube channel, I explain the math behind the Hardy-Weinberg theorem. In particular, I clarify […]
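The constancy claim can be checked numerically for the standard two-allele case. The sketch below is not taken from the video; it assumes a single locus with alleles A and a, a starting frequency `p` for A chosen arbitrarily, and the usual genotype frequencies p², 2pq and q² under random mating. The function name is hypothetical.

```python
from fractions import Fraction


def next_generation_allele_frequency(p):
    """Frequency of allele A among offspring, given frequency p in the parents."""
    q = 1 - p
    freq_AA, freq_Aa, freq_aa = p * p, 2 * p * q, q * q
    assert freq_AA + freq_Aa + freq_aa == 1  # genotype frequencies sum to one
    # AA individuals always pass on A; Aa individuals pass on A half the time.
    return freq_AA + freq_Aa / 2


p0 = Fraction(3, 10)  # arbitrary starting frequency of allele A
p1 = next_generation_allele_frequency(p0)
p2 = next_generation_allele_frequency(p1)
print(p0, p1, p2)
```

Algebraically, p² + pq = p(p + q) = p, which is why the frequency is unchanged in every generation as long as the assumptions hold.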

Automatic bias correction doesn’t fix omitted variable bias

July 8, 2014

Page 94 of Gelman, Carlin, Stern, Dunson, Vehtari, Rubin “Bayesian Data Analysis” 3rd Edition (which we will call BDA3) provides a great example of what happens when common broad frequentist bias criticisms are over-applied to predictions from ordinary linear regression: the predictions appear to fall apart. BDA3 goes on to exhibit what might be considered […]
