Posts Tagged ‘Data Analysis’

Initial steps towards reproducible research

December 4, 2014

In anticipation of next week’s Reproducible Science Hackathon at NESCent, I was thinking about Christie Bahlai’s post on “Baby steps for the open-curious.” Moving from Ye Olde Standard Computational Science Practice to a fully reproducible workflow seems a monumental task, but partially reproducible is better than not-at-all reproducible, and it’d be good to give people […]

Read more »

Resampling and permutation tests in SAS

November 21, 2014

My colleagues at the SAS & R blog recently posted an example of how to program a permutation test in SAS and R. Their SAS implementation used Base SAS and was "relatively cumbersome" (their words) when compared with the R code. In today's post I implement the permutation test in […]

Read more »
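The excerpt does not include the post's SAS code. As a language-neutral sketch (Python, not the SAS/IML or R code from either blog), a two-sample permutation test for a difference in means pools the observations, repeatedly shuffles the group labels, and counts how often the shuffled difference is at least as extreme as the observed one. The helper name `permutation_test`, the two-sided statistic, the 10,000 resamples, and the sample data are all assumptions made for illustration.

```python
import random
from statistics import fmean


def permutation_test(x, y, n_resamples=10_000, seed=1):
    """Two-sided permutation test for a difference in group means.

    Hypothetical helper for illustration. Returns (observed_difference, p_value).
    """
    rng = random.Random(seed)            # seeded so results are reproducible
    observed = fmean(x) - fmean(y)
    pooled = list(x) + list(y)
    n_x = len(x)
    tol = 1e-12                          # guards against floating-point round-off
    extreme = 0
    for _ in range(n_resamples):
        rng.shuffle(pooled)              # randomly reassign the group labels
        diff = fmean(pooled[:n_x]) - fmean(pooled[n_x:])
        if abs(diff) >= abs(observed) - tol:
            extreme += 1
    # Count the observed labeling too, so the p-value is never exactly zero
    p_value = (extreme + 1) / (n_resamples + 1)
    return observed, p_value


# Hypothetical data: two small groups
treated = [10, 11, 12, 13, 14]
control = [1, 2, 3, 4, 5]
obs, p = permutation_test(treated, control)
print(f"observed difference = {obs}, p = {p:.4f}")
```

Resampling (random shuffles) is used instead of enumerating every relabeling because the number of relabelings grows combinatorially; for these ten values there are only 252 splits, so exact enumeration would also be feasible.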

The distribution of blood types by country

November 7, 2014

My colleague Robert Allison has a knack for finding fascinating data. Last week he did it again by locating data about how blood types and Rh factors vary among countries. He produced a series of eight world maps, each showing the prevalence of a blood type (A+, A-, B+, B-, […]

Read more »

Binning data by quantiles? Beware of rounded data

November 5, 2014

In my article about how to create a quantile plot, I chose not to discuss a theoretical issue that occasionally occurs. The issue is that for discrete data (which includes rounded values), it might be impossible to use quantile values to split the data into k groups where each group […]

Read more »
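The problem with rounded data can be seen in a small sketch. The Python below (an illustration, not the post's SAS code) rounds ten hypothetical measurements to integers, computes quartile cut points with `statistics.quantiles` (default "exclusive" method), and counts how many values fall in each of the four groups. Because six values are tied at 1, no cut point can split them, so the groups cannot be equal in size and one group ends up empty. The helper name `quantile_bin_counts` and the data are assumptions.

```python
import bisect
from statistics import quantiles


def quantile_bin_counts(data, k):
    """Split data into k groups at the empirical quantiles and count each group.

    Hypothetical helper. Group j holds values x with cuts[j-1] < x <= cuts[j].
    """
    cuts = quantiles(data, n=k)          # k - 1 cut points ("exclusive" method)
    counts = [0] * k
    for x in data:
        counts[bisect.bisect_left(cuts, x)] += 1
    return cuts, counts


# Hypothetical measurements, then the same values rounded to integers
raw = [1.2, 0.8, 1.1, 0.9, 1.4, 1.0, 2.2, 1.8, 2.7, 3.3]
rounded = [round(v) for v in raw]        # six 1s, two 2s, two 3s

cuts, counts = quantile_bin_counts(rounded, k=4)
print(cuts, counts)
```

Here the first two cut points coincide at 1.0, so every tied value lands in the first group and the second group receives nothing.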

Calculating the sum or mean of a numeric (continuous) variable by a group (categorical) variable in SAS

A common task in data analysis and statistics is to calculate the sum or mean of a continuous variable. If the observations can be split into two or more classes by a categorical variable, you may want the sum or mean for each class. This sounds like a simple task, yet I took a surprisingly long time […]

Read more »
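In SAS this kind of summary is commonly produced by a procedure such as PROC MEANS with a CLASS statement; the excerpt does not say which approach the post uses. The underlying computation is simple to sketch: accumulate a running total and a count per group, then divide. The Python below is a minimal illustration under that assumption; the helper name `sum_and_mean_by_group` and the records are hypothetical.

```python
from collections import defaultdict


def sum_and_mean_by_group(records):
    """Return {group: (sum, mean)} for an iterable of (group, value) pairs.

    Hypothetical helper for illustration.
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    for group, value in records:
        totals[group] += value           # running sum per group
        counts[group] += 1               # number of observations per group
    return {g: (totals[g], totals[g] / counts[g]) for g in totals}


# Hypothetical data: (category, measurement)
records = [("A", 10.0), ("B", 4.0), ("A", 20.0), ("B", 6.0), ("C", 7.5)]
summary = sum_and_mean_by_group(records)
for group, (total, avg) in sorted(summary.items()):
    print(group, total, avg)
```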

Does this kurtosis make my tail look fat?

October 22, 2014

What is kurtosis? What does negative or positive kurtosis mean, and why should you care? How do you compute kurtosis in SAS software? It is not clear from the definition of kurtosis what (if anything) kurtosis tells us about the shape of a distribution, or why kurtosis is relevant to […]

Read more »
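To make the question "how do you compute kurtosis?" concrete, here is a sketch in Python (not SAS) of two common sample estimators of excess kurtosis: the simple moment estimator g2 = m4 / m2² − 3, and the bias-adjusted G2 = ((n + 1)·g2 + 6)(n − 1) / ((n − 2)(n − 3)). Statistical packages differ in which estimator they report by default, so check your software's documentation before comparing numbers. The helper name `excess_kurtosis` and both samples are assumptions.

```python
from statistics import fmean


def excess_kurtosis(data):
    """Return (g2, G2): moment-based and bias-adjusted sample excess kurtosis.

    Hypothetical helper for illustration.
    """
    n = len(data)
    if n < 4:
        raise ValueError("need at least four observations")
    xbar = fmean(data)
    m2 = sum((x - xbar) ** 2 for x in data) / n   # second central moment
    m4 = sum((x - xbar) ** 4 for x in data) / n   # fourth central moment
    g2 = m4 / m2 ** 2 - 3.0                       # 0 for a normal distribution
    G2 = ((n + 1) * g2 + 6.0) * (n - 1) / ((n - 2) * (n - 3))
    return g2, G2


flat = [1, 2, 3, 4, 5]                        # no tails: negative kurtosis
heavy = [0, 0, 0, 0, 0, 0, 0, 0, 10, -10]     # two extreme values: positive kurtosis
print(excess_kurtosis(flat))
print(excess_kurtosis(heavy))
```

The evenly spaced sample gives g2 ≈ −1.3, while the sample dominated by two extreme values gives g2 = 2, which matches the usual intuition that kurtosis responds mainly to the tails.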

The frequency of double-letters in Cryptoquotes

October 10, 2014

It usually takes more than three weeks to prepare a good impromptu speech. --Mark Twain

In the popular Cryptoquote puzzle, you are presented with an enciphered version of a quote by a famous person. One of the appeals of the puzzle for me is reading the deciphered quote, such […]

Read more »
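A Cryptoquote replaces each letter with another letter using a single one-to-one substitution, which is why double letters in the plaintext remain double letters in the enciphered text. The Python sketch below illustrates this with the Mark Twain quote: it builds a random substitution key, enciphers the quote, and checks that the doubled letters occupy the same positions before and after. The helper names `make_cryptoquote` and `double_letter_positions` and the random key are illustrative assumptions, not the post's code.

```python
import random
import string


def make_cryptoquote(text, seed=0):
    """Encipher text with a random one-to-one letter substitution (hypothetical helper)."""
    letters = list(string.ascii_uppercase)
    shuffled = letters[:]
    random.Random(seed).shuffle(shuffled)
    key = dict(zip(letters, shuffled))
    # Non-letters (spaces, punctuation) pass through unchanged
    return "".join(key.get(ch, ch) for ch in text.upper())


def double_letter_positions(text):
    """Indices i where text[i] and text[i + 1] are the same letter."""
    return [i for i in range(len(text) - 1)
            if text[i].isalpha() and text[i] == text[i + 1]]


quote = "It usually takes more than three weeks to prepare a good impromptu speech."
cipher = make_cryptoquote(quote)
print(cipher)
print(double_letter_positions(cipher) == double_letter_positions(quote.upper()))
```

The quote contains five double letters (in "usually", "three", "weeks", "good", and "speech"), and the substitution preserves all five.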

Which double letters appear most frequently in English text?

October 3, 2014

Double, double toil and trouble; Fire burn, and caldron bubble. --Macbeth, Act IV, Scene I

For the cryptanalyst or recreational puzzle solver, "double double" does not lead to toil or trouble. Just the opposite: The occurrence of a double-letter bigram in an enciphered word puzzle is quite fortunate. Certain double […]

Read more »
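Tabulating double letters amounts to scanning each word for adjacent equal letters and counting them. The Python sketch below does this on a short made-up sentence; a real frequency table would come from a large corpus. Overlapping pairs are counted, so a run like "aaa" would contribute two "aa" bigrams. The helper name `double_letter_counts` and the sample sentence are assumptions.

```python
import re
from collections import Counter


def double_letter_counts(text):
    """Count double-letter bigrams (aa, bb, ...) that occur within words.

    Hypothetical helper for illustration.
    """
    counts = Counter()
    for word in re.findall(r"[a-z]+", text.lower()):
        for i in range(len(word) - 1):
            if word[i] == word[i + 1]:
                counts[word[i:i + 2]] += 1
    return counts


sample = "The committee will assess the bookkeeper's coffee budget"
counts = double_letter_counts(sample)
print(counts.most_common(2))
```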

Create discrete heat maps in SAS/IML

September 29, 2014

In a previous article I introduced the HEATMAPCONT subroutine in SAS/IML 13.1, which makes it easy to visualize matrices by using heat maps with continuous color ramps. This article introduces a companion subroutine. The HEATMAPDISC subroutine, which also requires SAS/IML 13.1, is designed to visualize matrices that have a small […]

Read more »

The frequency of bigrams in an English corpus

September 26, 2014

In last week's article about the distribution of letters in an English corpus, I presented research results by Peter Norvig who used Google's digitized library and tabulated the frequency of each letter. Norvig also tabulated the frequency of bigrams, which are pairs of letters that appear consecutively within a word. […]

Read more »
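Using the definition in the excerpt (pairs of letters that appear consecutively within a word), bigram frequencies can be tabulated by splitting text into words and counting every adjacent pair. The Python sketch below applies this to a tiny string; Norvig's results come from the Google Books corpus, which this toy example does not attempt to reproduce. The helper names `bigram_counts` and `bigram_percentages` are hypothetical.

```python
import re
from collections import Counter


def bigram_counts(text):
    """Count letter pairs that appear consecutively within a word (hypothetical helper)."""
    counts = Counter()
    for word in re.findall(r"[a-z]+", text.lower()):
        counts.update(word[i:i + 2] for i in range(len(word) - 1))
    return counts


def bigram_percentages(counts):
    """Convert raw bigram counts to percentages of all bigrams."""
    total = sum(counts.values())
    return {bg: 100.0 * c / total for bg, c in counts.most_common()}


counts = bigram_counts("The then there")
print(counts.most_common(3))
print(bigram_percentages(counts))
```

Pairs that straddle a space (such as "et" across "the then") are not counted, which matches the within-a-word definition.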
