Posts Tagged ‘Data Analysis’

Sample quantiles: A comparison of 9 definitions

May 24, 2017

According to Hyndman and Fan ("Sample Quantiles in Statistical Packages," TAS, 1996), there are nine definitions of sample quantiles that commonly appear in statistical software packages. Hyndman and Fan identify three definitions that are based on rounding and six methods that are based on linear interpolation. This blog post shows [...] The post Sample quantiles: A comparison of 9 definitions appeared first on The DO Loop.
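
To make the distinction between rounding-based and interpolation-based definitions concrete, here is a minimal Python sketch of one definition from each family. The function names `quantile_type1` and `quantile_type7` are hypothetical, the "type" numbers follow the Hyndman and Fan numbering, and the data values are made up for illustration; this is not the code from the post.

```python
import math

def quantile_type1(data, p):
    """Rounding-based definition (Hyndman-Fan type 1):
    the inverse of the empirical CDF, always an observed value."""
    xs = sorted(data)
    n = len(xs)
    # 1-based order-statistic index ceil(n*p), clamped to [1, n]
    j = min(max(math.ceil(n * p), 1), n)
    return xs[j - 1]

def quantile_type7(data, p):
    """Interpolation-based definition (Hyndman-Fan type 7):
    linear interpolation between adjacent order statistics."""
    xs = sorted(data)
    n = len(xs)
    h = (n - 1) * p              # fractional 0-based position
    lo = math.floor(h)
    hi = min(lo + 1, n - 1)      # guard the upper end when p == 1
    return xs[lo] + (h - lo) * (xs[hi] - xs[lo])

# Made-up sample; both functions assume non-empty data and 0 <= p <= 1.
data = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
print(quantile_type1(data, 0.9))
print(quantile_type7(data, 0.9))
```

For this sample the rounding-based function returns the observed value 9, while the interpolating function returns approximately 9.1, so two valid definitions can report different 90th percentiles for the same small data set.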

Quantile definitions in SAS

May 22, 2017

In last week's article about the Flint water crisis, I computed the 90th percentile of a small data set. Although I didn't mention it, the value that I reported is different from the 90th percentile that is reported in Significance magazine. That is not unusual. The data only had [...] The post Quantile definitions in SAS appeared first on The DO Loop.

Quantiles and the Flint water crisis

May 17, 2017

The April 2017 issue of Significance magazine features a cover story by Robert Langkjaer-Bain about the Flint (Michigan) water crisis. For those who don't know, the Flint water crisis started in 2014 when the impoverished city began using the Flint River as a source of city water. The water was [...] The post Quantiles and the Flint water crisis appeared first on The DO Loop.

Timeline of living US presidents

May 8, 2017

Quick! What is the next term in the numerical sequence 1, 2, 1, 2, 3, 4, 5, 4, 3, 4, ...? If you said '3', then you must be an American history expert, because that sequence represents the number of living US presidents beginning with Washington's inauguration on 30APR1789 and [...] The post Timeline of living US presidents appeared first on The DO Loop.

Perceptions of probability

May 3, 2017

If a financial analyst says it is "likely" that a company will be profitable next year, what probability would you ascribe to that statement? If an intelligence report claims that there is "little chance" of a terrorist attack against an embassy, should the ambassador interpret this as a one-in-a-hundred chance, [...] The post Perceptions of probability appeared first on The DO Loop.

Split data into groups that have the same mean and variance

May 1, 2017

A frequently asked question on SAS discussion forums concerns randomly assigning units (often patients in a study) to various experimental groups so that each group has approximately the same number of units. This basic problem is easily solved in SAS by using PROC SURVEYSELECT or a DATA step program. A [...] The post Split data into groups that have the same mean and variance appeared first on The DO Loop.
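
The basic problem described above (random assignment with approximately equal group sizes) can be sketched in a few lines. The post solves it in SAS with PROC SURVEYSELECT or a DATA step; the Python below is only a language-neutral illustration, and the function name `assign_groups` and the patient IDs are hypothetical. It does not attempt to balance the group means or variances.

```python
import random

def assign_groups(units, k, seed=None):
    """Randomly split units into k groups whose sizes differ by at most 1."""
    rng = random.Random(seed)     # seeded generator for reproducibility
    shuffled = list(units)
    rng.shuffle(shuffled)         # random order of all units
    groups = [[] for _ in range(k)]
    for i, unit in enumerate(shuffled):
        groups[i % k].append(unit)  # deal units out round-robin
    return groups

# 11 hypothetical patient IDs split into 3 groups
patients = [f"P{i:02d}" for i in range(1, 12)]
groups = assign_groups(patients, 3, seed=1)
for g, members in enumerate(groups, start=1):
    print(g, len(members), members)
```

Shuffling first and then dealing round-robin guarantees the size constraint regardless of the random draw; which units land in which group depends on the seed.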

Visualize a design matrix

April 26, 2017

Most SAS regression procedures support a CLASS statement, which internally generates dummy variables for categorical variables. I have previously described what dummy variables are and how they are used. I have also written about how to create design matrices that contain dummy variables in SAS, and in particular how to [...] The post Visualize a design matrix appeared first on The DO Loop.
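
As a rough illustration of what "dummy variables" means, the sketch below builds one 0/1 indicator column per level of a categorical variable. This one-column-per-level coding is an assumption chosen for simplicity (SAS procedures offer several parameterizations), and the function name `dummy_columns` and the sample values are hypothetical.

```python
def dummy_columns(values):
    """Return the sorted levels and a 0/1 indicator row for each observation."""
    levels = sorted(set(values))
    rows = [[1 if v == level else 0 for level in levels] for v in values]
    return levels, rows

# Made-up categorical variable with three levels
levels, rows = dummy_columns(["B", "A", "C", "A"])
print(levels)
for row in rows:
    print(row)
```

Each row has exactly one 1, in the column that corresponds to that observation's level.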

Visualize an ANOVA with two-way interactions

April 24, 2017

There are several ways to visualize data in a two-way ANOVA model. Most visualizations show a statistical summary of the response variable for each category. However, for small data sets, it can be useful to overlay the raw data. This article shows a simple trick that you can use to [...] The post Visualize an ANOVA with two-way interactions appeared first on The DO Loop.

Regression with restricted cubic splines in SAS

April 19, 2017

Restricted cubic splines are a powerful technique for modeling nonlinear relationships by using linear regression models. I have attended multiple SAS Global Forum presentations that show how to use restricted cubic splines in SAS regression procedures. However, the presenters have all used the %RCSPLINE macro (Frank Harrell, 1988) to generate [...] The post Regression with restricted cubic splines in SAS appeared first on The DO Loop.

Sorting correlation coefficients by their magnitudes in a SAS macro

Theoretical Background

Many statisticians and data scientists use the correlation coefficient to study the relationship between 2 variables. For 2 random variables, $X$ and $Y$, the correlation coefficient between them is defined as their covariance scaled by the product of their standard deviations. Algebraically, this can be expressed as $\rho_{X,Y} = \frac{\mathrm{Cov}(X, Y)}{\sigma_X \sigma_Y}$. In real life, you can never […]
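
The definition above (covariance divided by the product of the standard deviations) translates directly into code. The following is a minimal sketch that estimates the coefficient from paired samples, assuming the usual n - 1 denominators; the function name `sample_correlation` and the data are hypothetical and are not taken from the post's SAS macro.

```python
import math

def sample_correlation(x, y):
    """Sample covariance of x and y divided by the product of their
    sample standard deviations."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length, at least 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / (n - 1)
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x) / (n - 1))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y) / (n - 1))
    # A constant x or y gives a zero standard deviation and a ZeroDivisionError
    return cov / (sd_x * sd_y)

# Made-up paired observations
print(sample_correlation([1, 2, 3], [1, 3, 2]))
```

Because the same n - 1 factor appears in the numerator and the denominator, it cancels, so the result does not depend on whether sample or population formulas are used for the pieces.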