It's the start of a new year. Have you made a resolution to be a better data analyst? A better SAS statistical programmer? To learn more about multivariate statistics? What better way to start the new year than to read (or reread!) the top 12 articles for statistical programmers from my blog in 2012? Each contains tips and techniques to make you a better programmer. (Sorry, but I can't promise that they will help you to lose weight!)
I've organized the 12 tips into four categories: multivariate statistics, simulation, matrix computations, and data analysis. Each of these categories is an essential area of knowledge for statistical programmers. The articles that made the Top 12 list were among my most popular blog posts of 2012.
Multivariate data are often correlated. Therefore multivariate analysis, simulation, and outlier detection must account for correlation. These articles describe techniques for understanding and analyzing correlated data:
- What is Mahalanobis distance?: I describe the geometry of Mahalanobis distance, which provides a way to measure distances that takes into account correlations in the data.
- Use the Cholesky transformation to correlate and uncorrelate variables: The Cholesky matrix and other square root matrices are essential for understanding the role of correlation in multivariate analysis and simulation.
- How to compute Mahalanobis distance in SAS: After you understand Mahalanobis distance, the logical next question is "How can I compute it in SAS?" This follow-up article answers that question.
- Detecting outliers in SAS: Multivariate location and scatter: This article describes ways to use SAS software to find multivariate outliers.
- Testing data for multivariate normality: How can you test whether multivariate data are normally distributed? SAS software provides several options.
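The Cholesky and Mahalanobis articles fit together: if the covariance matrix factors as S = L L′, then solving L z = x − μ "uncorrelates" the centered vector, and the Euclidean length of z is the Mahalanobis distance of x from μ. The following minimal sketch shows that idea in plain Python rather than SAS/IML so that it stands alone; the function names `cholesky`, `forward_solve`, and `mahalanobis` are my own illustrations, not SAS functions, and the code assumes the covariance matrix is symmetric positive definite.

```python
import math

def cholesky(a):
    """Return lower-triangular L with a = L L^T (a assumed symmetric positive definite)."""
    n = len(a)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                L[i][j] = math.sqrt(a[i][i] - s)   # diagonal element
            else:
                L[i][j] = (a[i][j] - s) / L[j][j]  # below-diagonal element
    return L

def forward_solve(L, b):
    """Solve L z = b for z by forward substitution (L lower triangular)."""
    z = []
    for i in range(len(b)):
        s = sum(L[i][k] * z[k] for k in range(i))
        z.append((b[i] - s) / L[i][i])
    return z

def mahalanobis(x, center, cov):
    """Mahalanobis distance of x from center, given the covariance matrix cov."""
    L = cholesky(cov)
    diff = [xi - ci for xi, ci in zip(x, center)]
    z = forward_solve(L, diff)  # z is the uncorrelated version of diff
    return math.sqrt(sum(zi * zi for zi in z))

cov = [[4, 2], [2, 3]]
print(mahalanobis([2, 1], [0, 0], cov))   # 1.0
print(mahalanobis([2, -1], [0, 0], cov))  # sqrt(3), about 1.732
```

Both points are at Euclidean distance √5 from the origin, but (2, 1) lies along the direction in which the two variables are positively correlated, so it is "closer" in the Mahalanobis sense than (2, −1). With the identity matrix as the covariance, the function reduces to ordinary Euclidean distance.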
I've written many articles on simulation, but these two describe how to implement efficient simulation algorithms in SAS:
- Eight tips to make your simulation run faster: As the title suggests, there are some simple things you can do to improve the performance of a simulation.
- Simulation in SAS: The slow way or the BY way: The SAS macro language is good for many things, but don't use macro loops to naively implement a simulation! This article discusses the dangers of using macro loops for simulation and presents a more efficient alternative: generate all the samples at once and analyze them with a single procedure call that uses a BY statement.
The SAS/IML language makes it easy to compute with matrices and vectors, and to compute quantities such as eigenvalues:
- The power method: compute only the largest eigenvalue of a matrix: This technique is very useful when you need only the largest eigenvalue (in absolute value) of a matrix. And it is simple: it requires only the ability to multiply a matrix by a vector!
- The curious case of random eigenvalues: This article describes the distribution of eigenvalues of random matrices. The result is magnificent!
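The power method really does need nothing more than matrix-vector multiplication. Here is a minimal sketch in plain Python (not SAS/IML); the function `power_method` and its `tol` and `max_iter` parameters are illustrative names of my own. It assumes that the matrix has a single eigenvalue of largest absolute value and that the starting vector of ones is not orthogonal to the corresponding eigenvector. Each iteration multiplies the matrix by the current vector, rescales the result to unit length, and estimates the eigenvalue with the Rayleigh quotient v′Av.

```python
import math

def power_method(a, tol=1e-10, max_iter=1000):
    """Estimate the dominant eigenvalue and a unit eigenvector of the square matrix a."""
    n = len(a)
    v = [1.0] * n  # starting vector (assumed not orthogonal to the dominant eigenvector)
    lam = 0.0
    for _ in range(max_iter):
        # The only matrix operation needed: multiply the matrix by a vector
        w = [sum(a[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(wi * wi for wi in w))  # assumed nonzero
        v = [wi / norm for wi in w]
        # Rayleigh quotient v'Av; v has unit length, so no division is needed
        av = [sum(a[i][j] * v[j] for j in range(n)) for i in range(n)]
        new_lam = sum(vi * avi for vi, avi in zip(v, av))
        if abs(new_lam - lam) < tol:  # stop when the estimate stops changing
            return new_lam, v
        lam = new_lam
    return lam, v

lam, v = power_method([[2, 1], [1, 3]])
print(lam)  # close to (5 + sqrt(5)) / 2, about 3.618
```

The symmetric matrix in the example has eigenvalues (5 ± √5)/2, and the iteration converges to the larger one. Convergence is fast when the two largest eigenvalues are well separated in magnitude and slow when they are close.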
Data analysis requires knowledge of statistics, software, programming, and a lot of common sense. No wonder "data scientist" is the current hot job title!
- Fitting a Poisson distribution to data in SAS: Some people ask why the UNIVARIATE procedure doesn't support fitting a Poisson distribution. It's because the Poisson distribution is discrete, whereas the UNIVARIATE procedure fits continuous distributions. To fit Poisson data, use PROC GENMOD.
- Compute a running mean and variance: In matrix-vector languages, it is important to vectorize computations to maximize the efficiency of your program. This article describes a vectorized algorithm for computing the running mean and the running variance.
- For each observation, find the variable that contains the minimum value: In SAS software, there are usually many ways to compute a quantity. This article describes how to carry out a common task by using PROC IML, PROC SQL, and the DATA step.
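One common way to vectorize running statistics is through cumulative sums: if S_k and Q_k are the cumulative sums of x and x² through the kth value, then the running mean is S_k/k and the running sample variance is (Q_k − k·mean_k²)/(k − 1). The sketch below applies that approach in plain Python, using `itertools.accumulate` in place of a matrix language's cumulative-sum operator. It is an illustration under those assumptions and may differ from the algorithm in the article; the function name `running_mean_var` is hypothetical.

```python
from itertools import accumulate

def running_mean_var(x):
    """Running means and running sample variances of the sequence x.

    Built from cumulative sums of x and x**2 rather than a loop that
    updates accumulators one value at a time. The variance of a single
    observation is undefined, so it is reported as None.
    """
    k = range(1, len(x) + 1)
    csum = list(accumulate(x))                    # S_k = x_1 + ... + x_k
    csum2 = list(accumulate(xi * xi for xi in x))  # Q_k = x_1^2 + ... + x_k^2
    means = [s / n for s, n in zip(csum, k)]
    variances = [None if n == 1 else (q - n * m * m) / (n - 1)
                 for q, m, n in zip(csum2, means, k)]
    return means, variances

means, variances = running_mean_var([1, 2, 3, 4])
print(means)      # [1.0, 1.5, 2.0, 2.5]
print(variances)  # [None, 0.5, 1.0, 1.6666666666666667]
```

The sum-of-squares formula is fast but can lose precision when the mean is large relative to the spread of the data. Welford's one-pass update is the usual remedy, at the cost of being inherently sequential.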
What articles will be popular in 2013? I don't know, but I am committed to bringing you efficient tips and techniques for statistical programming, statistical graphics, and data analysis in SAS. Subscribe to this blog so that you don't miss a single article!
Please comment on the article here: The DO Loop