# Posts Tagged ‘Data Analysis’

## Sorting correlation coefficients by their magnitudes in a SAS macro

Theoretical Background
Many statisticians and data scientists use the correlation coefficient to study the relationship between 2 variables. For 2 random variables, $X$ and $Y$, the correlation coefficient between them is defined as their covariance scaled by the product of their standard deviations. Algebraically, this can be expressed as $\rho_{X,Y} = \frac{\operatorname{Cov}(X, Y)}{\sigma_X \sigma_Y}$. In real life, you can never […]
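As a rough illustration of the idea in the title, here is a minimal Python sketch (not the SAS macro from the post) that computes the sample correlation between a target variable and several candidate variables, then sorts the results by absolute value. The function names and the toy data are hypothetical and exist only for this example.

```python
import math


def pearson(x, y):
    """Sample correlation: covariance scaled by the product of the standard deviations."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    # The 1/(n-1) factors in the covariance and variances cancel.
    return sxy / math.sqrt(sxx * syy)


def sort_by_magnitude(target, predictors):
    """Return (name, r) pairs ordered from the largest |r| to the smallest."""
    pairs = [(name, pearson(values, target)) for name, values in predictors.items()]
    return sorted(pairs, key=lambda pair: abs(pair[1]), reverse=True)


# Hypothetical toy data, for illustration only.
y = [1.0, 2.0, 3.0, 4.0, 5.0]
predictors = {
    "a": [2.0, 4.0, 6.0, 8.0, 10.0],
    "b": [5.0, 3.0, 4.0, 1.0, 2.0],
    "c": [1.0, 2.0, 3.0, 5.0, 4.0],
}
ranked = sort_by_magnitude(y, predictors)
for name, r in ranked:
    print(f"{name}: r = {r:+.3f}")
```

Sorting on `abs(r)` rather than `r` keeps strong negative correlations near the top of the list, which is the point of ranking by magnitude.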

## Find your birthday in the digits of pi

March 13, 2017
It is time for Pi Day, 2017! Every year on March 14th (written 3/14 in the US), geeky mathematicians and their friends celebrate "all things pi-related" because 3.14 is the three-decimal approximation to pi. This year I use SAS software to show an amazing fact: you can find your birthday in the digits of pi.
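The search itself is easy to sketch outside of SAS. The Python snippet below is only an illustration, not the program from the post: it generates decimal digits of pi with Machin's formula in integer arithmetic and then looks for a digit string. The helper names, the `MMDD` date format, the 5,000-digit limit, and the 10 guard digits are all assumptions made for this example. A short pattern is likely, but not guaranteed, to appear in that many digits, so the search returns `None` when nothing is found.

```python
def arctan_inv(x, unity):
    """Return arctan(1/x) * unity as an integer, using the alternating Taylor series."""
    x_squared = x * x
    term = unity // x              # unity / x**(2k+1) for k = 0
    total = term
    divisor, sign = 3, -1
    while term:
        term //= x_squared
        total += sign * (term // divisor)
        divisor += 2
        sign = -sign
    return total


def pi_decimals(n_digits, guard=10):
    """Return the first n_digits decimals of pi (after '3.') as a string."""
    unity = 10 ** (n_digits + guard)   # guard digits absorb truncation error
    # Machin's formula: pi = 16*arctan(1/5) - 4*arctan(1/239)
    pi_scaled = 4 * (4 * arctan_inv(5, unity) - arctan_inv(239, unity))
    digits = str(pi_scaled // 10 ** guard)   # "3" followed by n_digits decimals
    return digits[1:]


def find_in_pi(pattern, decimals):
    """Return the 1-based position of pattern among the decimals, or None if absent."""
    index = decimals.find(pattern)
    return None if index < 0 else index + 1


decimals = pi_decimals(5000)
position = find_in_pi("0314", decimals)   # hypothetical birthday: March 14 as MMDD
print("0314 first appears at decimal position:", position)
```

Longer patterns such as `MMDDYY` need far more digits before a match becomes likely, and this pure-Python approach slows down quickly as the digit count grows.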

## Quantile estimates and the difference of medians in SAS

February 22, 2017
Sometimes SAS programmers ask about how to analyze quantiles with SAS. Common questions include: How can I compute 95% confidence intervals for a median in SAS? How can I test whether the medians of two independent samples are significantly different? How can I repeat the previous analyses with other percentiles?
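For the first question, one standard distribution-free approach uses order statistics and the binomial distribution. The Python sketch below illustrates that general technique and is not necessarily the method the post uses in SAS. The function name and the sample values are hypothetical.

```python
from math import comb


def median_ci(sample, alpha=0.05):
    """Distribution-free confidence interval for the median.

    Returns (lower, upper, coverage), where the interval is
    [x_(j), x_(n-j+1)] and j is the largest rank such that
    P(Binomial(n, 1/2) <= j - 1) <= alpha / 2.
    """
    x = sorted(sample)
    n = len(x)
    cumulative, j = 0.0, 0
    for k in range(n + 1):
        cumulative += comb(n, k) / 2 ** n
        if cumulative > alpha / 2:
            break
        j = k + 1
    if j == 0:
        raise ValueError("sample too small for the requested confidence level")
    tail = sum(comb(n, k) for k in range(j)) / 2 ** n
    coverage = 1 - 2 * tail        # actual coverage, at least 1 - alpha
    return x[j - 1], x[n - j], coverage


# Hypothetical sample, for illustration only.
lower, upper, coverage = median_ci([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])
print(lower, upper, round(coverage, 4))
```

Because the binomial distribution is discrete, the achieved coverage is usually somewhat higher than the nominal level. Replacing the probability 1/2 with another value gives the analogous interval for other percentiles, although the two tails are then no longer symmetric and need to be computed separately.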

## The distribution of colors for plain M&M candies

February 20, 2017
Many introductory courses in probability and statistics encourage students to collect and analyze real data. A popular experiment in categorical data analysis is to give students a bag of M&M® candies and ask them to estimate the proportion of colors in the population from the sample data. In some classes, students pool their data to estimate the distribution of colors.
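The pooling step can be sketched in a few lines of Python. This is only an illustration of the classroom exercise, with made-up counts; the function name and the two bags are hypothetical, and the standard error uses the usual binomial approximation.

```python
from collections import Counter
from math import sqrt


def pooled_proportions(bags):
    """Pool color counts across bags; return {color: (proportion, standard_error)}."""
    total = Counter()
    for bag in bags:
        total.update(bag)          # adds the counts color by color
    n = sum(total.values())
    result = {}
    for color, count in total.items():
        p = count / n
        result[color] = (p, sqrt(p * (1 - p) / n))
    return result


# Hypothetical counts from two students' bags, for illustration only.
bags = [
    {"blue": 12, "orange": 10, "green": 8, "red": 5, "yellow": 9, "brown": 6},
    {"blue": 13, "orange": 10, "green": 7, "red": 10, "yellow": 6, "brown": 4},
]
estimates = pooled_proportions(bags)
for color, (p, se) in sorted(estimates.items()):
    print(f"{color:7s} {p:.2f} (SE {se:.3f})")
```

Pooling increases the sample size, so the standard error of each estimated proportion shrinks as more bags are added.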

## An easy way to run thousands of regressions in SAS

February 13, 2017
A common question on SAS discussion forums is how to repeat an analysis multiple times. Most programmers know that the most efficient way to analyze one model across many subsets of the data (perhaps each country or each state) is to sort the data and use a BY statement to repeat the analysis for each subset.
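The sort-then-group pattern behind the BY statement has a close analogue in other languages. The Python sketch below is an analogy rather than SAS code: it sorts the rows by a group key and fits a simple least-squares line within each group. The function names and the data are hypothetical.

```python
from itertools import groupby
from operator import itemgetter


def fit_line(points):
    """Ordinary least squares for y = intercept + slope * x; returns (intercept, slope)."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def regress_by_group(rows):
    """rows are (group, x, y) tuples; returns {group: (intercept, slope)}."""
    # groupby only merges adjacent rows, so sort by the group key first,
    # much like sorting a data set before using a BY statement.
    ordered = sorted(rows, key=itemgetter(0))
    return {
        group: fit_line([(x, y) for _, x, y in members])
        for group, members in groupby(ordered, key=itemgetter(0))
    }


# Hypothetical data: two groups with exact linear relationships.
rows = [
    ("B", 0, 10), ("A", 0, 1), ("A", 1, 3),
    ("B", 1, 9), ("A", 2, 5), ("B", 2, 8),
]
fits = regress_by_group(rows)
for group, (intercept, slope) in fits.items():
    print(group, intercept, slope)
```

A single pass over sorted data avoids re-reading or re-filtering the full data set once per group, which is the main reason this pattern scales well.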

## Counting is hard, especially when you don’t have theories

January 19, 2017
Exploring the data about movies, uncovering data issues

## Ten posts from 2016 that deserve a second look

January 11, 2017
Last week I wrote about the 10 most popular articles from The DO Loop in 2016. The popular articles tend to be about elementary topics that appeal to a wide range of SAS programmers. Today I present an "editor's choice" list of technical articles that describe more advanced statistical methods and data analysis techniques.

## Is "La Quinta" Spanish for "Next to Denny’s"?

January 6, 2017
"La Quinta" is Spanish for "next to Denny's." -- Mitch Hedberg, comedian

Mitch Hedberg's joke resonates with travelers who drive on the US interstate system because many highway exits feature both a La Quinta Inn™ and a Denny's® restaurant within a short distance of each other. But does a statistical analysis of the locations support this observation?

## The top 10 posts from The DO Loop in 2016

January 4, 2017
I wrote 105 posts for The DO Loop blog in 2016. My most popular articles were about data analysis, SAS programming tips, and elementary statistics. Without further ado, here are the most popular articles from 2016.
Data Analysis and Visualization
Start with a juicy set of data and an interesting question.

## Data Preparation, Long Form and tl;dr Form

December 26, 2016
Data preparation and cleaning are among the most important steps in predictive analytics and data science tasks. They are laborious, they are where most errors are made, they are your last line of defense against wild data, and they hold the biggest opportunities for improving outcomes. No matter how much time you spend on them, it is never enough.