Posts Tagged ‘Data Analysis’

Calculating the sum or mean of a numeric (continuous) variable by a group (categorical) variable in SAS

Introduction: A common task in data analysis and statistics is to calculate the sum or mean of a continuous variable. If that variable can be categorized into 2 or more classes, you may want to get the sum or mean for each class. This sounds like a simple task, yet I took a surprisingly long time […]

Read more »

Does this kurtosis make my tail look fat?

October 22, 2014

What is kurtosis? What does negative or positive kurtosis mean, and why should you care? How do you compute kurtosis in SAS software? It is not clear from the definition of kurtosis what (if anything) kurtosis tells us about the shape of a distribution, or why kurtosis is relevant to […]

Read more »

The frequency of double-letters in Cryptoquotes

October 10, 2014

It usually takes more than three weeks to prepare a good impromptu speech.
--Mark Twain

In the popular Cryptoquote puzzle, you are presented with an enciphered version of a quote by a famous person. One of the appeals of the puzzle for me is reading the deciphered quote, such […]

Read more »

Which double letters appear most frequently in English text?

October 3, 2014

Double, double toil and trouble; Fire burn, and caldron bubble.
Macbeth, Act IV, Scene I

For the cryptanalyst or recreational puzzle solver, "double double" does not lead to toil or trouble. Just the opposite: The occurrence of a double-letter bigram in an enciphered word puzzle is quite fortunate. Certain double […]

Read more »

Create discrete heat maps in SAS/IML

September 29, 2014

In a previous article I introduced the HEATMAPCONT subroutine in SAS/IML 13.1, which makes it easy to visualize matrices by using heat maps with continuous color ramps. This article introduces a companion subroutine. The HEATMAPDISC subroutine, which also requires SAS/IML 13.1, is designed to visualize matrices that have a small […]

Read more »

The frequency of bigrams in an English corpus

September 26, 2014

In last week's article about the distribution of letters in an English corpus, I presented research results by Peter Norvig who used Google's digitized library and tabulated the frequency of each letter. Norvig also tabulated the frequency of bigrams, which are pairs of letters that appear consecutively within a word. […]

Read more »

Designing a quantile bin plot

September 24, 2014

While at JSM 2014 in Boston, a statistician asked me whether it was possible to create a "customized bin plot" in SAS. When I asked for more information, she told me that she has a large data set. She wants to visualize the data, but a scatter plot is not […]

Read more »

Skew this

September 22, 2014

The skewness of a distribution indicates whether a distribution is symmetric or not. A distribution that is symmetric about its mean has zero skewness. In contrast, if the right tail of a unimodal distribution has more mass than the left tail, then the distribution is said to be "right skewed" […]

Read more »

The frequency of letters in an English corpus

September 19, 2014

It's time for another blog post about ciphers. As I indicated in my previous blog post about substitution ciphers, the classical substitution cipher is no longer used to encrypt ultra-secret messages because the enciphered text is prone to a type of statistical attack known as frequency analysis. At the root […]

Read more »

An exploratory technique for visualizing the distributions of 100 variables

September 10, 2014

In a previous blog post I showed how to order a set of variables by a statistic. After reshaping data, you can create a graph that contains box plots for many variables. Ordering the variables by some statistic (mean, median, variance,...) helps to differentiate and distinguish the variables. You can […]

Read more »

