# Posts Tagged ‘Data Analysis’

## Exploratory Data Analysis of Ozone Pollution in New York City – Descriptive Statistics

May 19, 2013

Introduction: This is the first of a series of posts on exploratory data analysis (EDA). This post will calculate the common summary statistics of a univariate continuous data set – the data on ozone pollution in New York City that is part of the built-in “airquality” data set in R. This is a particularly good data set […]

## Use regression for a univariate analysis? Yes!

May 13, 2013

I've conducted a lot of univariate analyses in SAS, yet I'm always surprised when the best way to carry out the analysis uses a SAS regression procedure. I always think, "This is a univariate analysis! Why am I using a regression procedure? Doesn't a regression require at least two variables?" [...]

## “My” chromosome 8p inversion

May 8, 2013

There was lots of discussion on Twitter yesterday about Graham Coop’s paper with Peter Ralph (or vice versa), *The geography of recent genetic ancestry across Europe*, particularly regarding the FAQ they’d created. I was eager to take a look, and, it’s slightly embarrassing to say, I first did a search to see if they’d […]

## A three-panel visualization of a distribution

May 8, 2013

At a recent conference, I talked with a SAS customer who told me that he was using an R package to create a three-panel visualization of a distribution. Unfortunately, he couldn't remember the name of the package, and he has not returned my e-mails, so the purpose of today's article [...]

## Compute confidence intervals for percentiles in SAS

May 6, 2013

PROC UNIVARIATE has provided confidence intervals for standard percentiles (quartiles) for eons. However, in SAS 9.3M2 (featuring the 12.1 analytical procedures) you can use a new feature in PROC UNIVARIATE to compute confidence intervals for a specified list of percentiles. To be clear, percentiles and quantiles are essentially the same [...]

## Quantile regression: Better than connecting the sample quantiles of binned data

April 17, 2013

I often see variations of the following question posted on statistical discussion forums: I want to bin the X variable into a small number of values. For each bin, I want to draw the quartiles of the Y variable for that bin. Then I want to connect the corresponding quartile [...]

## Data science is statistics

April 5, 2013

When physicists do mathematics, they don’t say they’re doing “number science”. They’re doing math. If you’re analyzing data, you’re doing statistics. You can call it data science or informatics or analytics or whatever, but it’s still statistics. If you say that one kind of data analysis is statistics and another kind is not, you’re not […]

## The difference of density estimates: When does it make sense?

April 3, 2013

I was recently asked how to compute the difference between two density estimates in SAS. The person who asked the question sent me a link to a paper from The Review of Economics and Statistics that contains several examples of this technique (for example, see Figure 3 on p. 16 [...]

## How do Dew and Fog Form? Nature at Work with Temperature, Vapour Pressure, and Partial Pressure

April 1, 2013

In the early morning, especially here in Canada, I often see dew – water droplets formed by the condensation of water vapour on outside surfaces, like windows, car roofs, and leaves of trees.  I also sometimes see fog – water droplets or ice crystals that are suspended in air and often blocking visibility at great […]

## Checking for Normality with Quantile Ranges and the Standard Deviation

March 31, 2013

Introduction: I was reading Michael Trosset’s “An Introduction to Statistical Inference and Its Applications with R”, and I learned a basic but interesting fact about the normal distribution’s interquartile range and standard deviation that I had not learned before. This turns out to be a good way to check for normality in a data set. […]