Blog Archives

Finding the closest pair in data using PROC MODECLUS

May 9, 2013

UPDATE: Rick Wicklin kindly shared code that visualizes the output and makes the results easier to interpret. Thanks. Here is the code; run it after my code below. Note that it is designed for K=2. proc iml...


Large Scale Linear Mixed Model

March 26, 2013

Update at the end. Bob at r4stats.com claimed that a linear mixed model with over 5 million observations and 2 million levels of random effects was fit using the lme4 package in R. I am always interested in large scale mi...


Poor man’s HPQLIM?

February 27, 2013

The Tobit model is a type of censored regression and is one of the most important regression models you will encounter in business. Amemiya (1984) classified Tobit models into 5 categories, and interested readers can refer to the SAS online documentation for details. In SAS...
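To make "censored regression" concrete, here is a minimal Python sketch of the log-likelihood for the standard (Type I) Tobit model, left-censored at a fixed limit. This is only an illustration, not the SAS code from the post: the excerpt does not say which of the five types the post covers, so the Type I choice, the function names, and the default censoring limit of 0 are all assumptions.

```python
import math

def norm_logpdf(z):
    """Log density of the standard normal distribution."""
    return -0.5 * z * z - 0.5 * math.log(2.0 * math.pi)

def norm_cdf(z):
    """Cumulative distribution function of the standard normal."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def tobit_loglik(y, xb, sigma, lower=0.0):
    """Log-likelihood of a Type I Tobit model left-censored at `lower`.

    y: observed responses; xb: linear predictors x'beta; sigma: error std dev.
    (Hypothetical helper for illustration only.)
    """
    total = 0.0
    for yi, mu in zip(y, xb):
        if yi <= lower:
            # Censored: we only know the latent value fell at or below the limit
            total += math.log(norm_cdf((lower - mu) / sigma))
        else:
            # Uncensored: ordinary normal density of the observed value
            total += norm_logpdf((yi - mu) / sigma) - math.log(sigma)
    return total
```

Maximizing this function over the regression coefficients and `sigma` would give the Tobit estimates. The censored branch is what distinguishes it from ordinary least squares.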


Kaggle Digit Recognizer: SAS k-Nearest Neighbor solution

December 11, 2012

Kaggle is hosting an educational data mining competition, Kaggle Digit Recognizer, using MNIST data. Handwritten digit recognition is one of the few applications where a kNN classifier performs well. Of course, the benchmark kNN classifier provided ...


KNN Classification and Regression in SAS

November 25, 2012

A PDF is available here. A related post on KNN classification using SAS is here. In data mining and predictive modeling, KNN refers to a memory-based (or instance-based) algorithm for classification and regression problems. It is a widely used algorithm wi...
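As a rough illustration of the memory-based idea (not the SAS implementation from the post), a minimal Python sketch of KNN classification might look like the following. The function name `knn_predict`, the Euclidean distance, the majority vote, and the toy data are assumptions chosen for illustration.

```python
from collections import Counter
import math

def knn_predict(train_X, train_y, query, k=3):
    """Classify `query` by majority vote among its k nearest training points."""
    # The "model" is just the stored training data; all work happens at query time.
    distances = [
        (math.dist(x, query), label)
        for x, label in zip(train_X, train_y)
    ]
    # Sort by distance only and keep the k closest neighbors
    nearest = sorted(distances, key=lambda pair: pair[0])[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

# Toy data: two small clusters labeled "a" and "b"
train_X = [(0.0, 0.0), (0.1, 0.2), (1.0, 1.0), (0.9, 1.1)]
train_y = ["a", "a", "b", "b"]
print(knn_predict(train_X, train_y, (0.2, 0.1), k=3))
```

For KNN regression, the same neighbor search applies; the vote would be replaced by an average of the neighbors' numeric responses.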


Finite Mixture Model for Loss Given Default (LGD)

October 4, 2012

Loss Given Default (LGD) is a key business risk metric in financial services. One distinctive feature of this metric is overdispersion and the other is multimodality. A finite mixture model is an effective way to accommodate both. Multimodality refers to the ca...


SAS functions for computing parameters in Erlang-C model

July 12, 2012

Call center management is both art and science. While driving morale and setting strategy are more art, configuring staffing and service levels based on call load is in the domain of science. The science part of call center management ...
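For readers unfamiliar with the model, the core Erlang-C quantities can be sketched in a few lines of Python. This is not the SAS function set from the post; the function names and argument conventions below are assumptions for illustration. The offered load is expressed in Erlangs (arrival rate times average handle time).

```python
import math

def erlang_c(agents, offered_load):
    """Probability that an arriving call has to wait (Erlang-C formula)."""
    if offered_load >= agents:
        return 1.0  # unstable queue: every call waits
    # Erlang-B blocking probability via the standard stable recursion
    b = 1.0
    for k in range(1, agents + 1):
        b = offered_load * b / (k + offered_load * b)
    # Convert Erlang-B to Erlang-C
    return agents * b / (agents - offered_load * (1.0 - b))

def service_level(agents, offered_load, target_seconds, aht_seconds):
    """Fraction of calls answered within target_seconds.

    aht_seconds is the average handle time, in the same unit as the target.
    """
    if offered_load >= agents:
        return 0.0
    p_wait = erlang_c(agents, offered_load)
    decay = math.exp(-(agents - offered_load) * target_seconds / aht_seconds)
    return 1.0 - p_wait * decay
```

A staffing calculation would then increase `agents` until `service_level` reaches the desired target, for example 80% of calls answered within 20 seconds.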


Stochastic Gradient Descent Logistic Regression in SAS

May 24, 2012

A test of stochastic gradient descent logistic regression in SAS. The logic and code follow the code posted by Ravi Varadhan, Ph.D., in this R-help discussion. The blog SAS Die Hard also has a post about SGD logistic regression in SAS. filenam...
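The general technique can be sketched in plain Python as follows. This is not the SAS code from the post or Ravi Varadhan's R code; it is a minimal illustration that assumes a fixed learning rate, a fixed number of epochs, and random reshuffling each epoch. The names `sgd_logistic` and `predict_proba` are hypothetical.

```python
import math
import random

def sgd_logistic(X, y, lr=0.1, epochs=200, seed=0):
    """Fit logistic regression weights with plain stochastic gradient descent."""
    rng = random.Random(seed)
    n_features = len(X[0])
    w = [0.0] * n_features
    b = 0.0
    order = list(range(len(X)))
    for _ in range(epochs):
        rng.shuffle(order)  # visit observations in random order each epoch
        for i in order:
            z = b + sum(wj * xj for wj, xj in zip(w, X[i]))
            p = 1.0 / (1.0 + math.exp(-z))
            err = y[i] - p  # negative gradient of the log-loss with respect to z
            # One-observation update: step against the gradient of the loss
            w = [wj + lr * err * xj for wj, xj in zip(w, X[i])]
            b += lr * err
    return w, b

def predict_proba(w, b, x):
    """Predicted probability that the response equals 1."""
    z = b + sum(wj * xj for wj, xj in zip(w, x))
    return 1.0 / (1.0 + math.exp(-z))

# Toy example: one predictor, responses perfectly separated at zero
X = [[-2.0], [-1.0], [1.0], [2.0]]
y = [0, 0, 1, 1]
w, b = sgd_logistic(X, y)
```

Each update uses a single observation, which is what makes the method attractive for data too large to process in one batch.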


Multi-Threaded Principal Component Analysis

February 1, 2012

SAS used to not support multithreading in PCA. I then figured out that its server version supports this functionality; see here. Today, I found that this multithreading capability is finally available in PC SAS v9.22. The figure above indicates that...


Random Number Seeds: NOT only the first one matters!

January 30, 2012

Today, Rick (blog @ here) wrote an article about the random number seeds used by random number functions in the SAS DATA step. Rick noticed that when multiple random number functions are called with different seeds, only the first one matters. Th...

