Data Mining

Data mining blogs

KNIME is Gartner's “Cool Vendor 2010”

August 19, 2010

KNIME has been selected by Gartner as a "Cool Vendor 2010" [...]

Related posts:
RapidMiner from Rapid-I at CeBIT 2010: RapidMiner is a Data Mining Suite of the German company...
Data Mining in the KDD Environment: Data Mining is only one step in discovering knowledge (KDD)....
CeBIT 2010: Cloud Mining soon at SAS?: SAS Institute Inc. At the CeBIT 2010 I visited...


VARIMAX rotation of PLS loadings

August 9, 2010

Partial Least Squares (PLS) is one of several supervised dimension reduction techniques and has attracted attention in recent years. On the one hand, PLS is able to generate a series of scores that maximize linear correlation between dependent variables and indepe...


Table Look Up in SAS, practical problems

August 4, 2010

One guy asked in a SAS forum about a typical table look-up problem. He has a data set with two IDs:

id1  id2
a    b
a    e
b    c
b    e
c    e
d    e

and he wants to generate a new data set with the following structure according to the above information: id a b c ...


An Economic Approach for a Class of Dimensionality Reduction Techniques

July 30, 2010

Just back from KDD2010. At the conference, there were several papers that interested me. On the computation side, Liang Sun et al.'s paper [1], "A Scalable Two-Stage Approach for a Class of Dimensionality Reduction Techniques", caught my eye. Liang pro...


Implement Randomized SVD in SAS

July 13, 2010

In the 2010 SASware Ballot®, a dedicated PROC for Randomized SVD was among the options. While an official SAS PROC will not be available in the immediate future, nor in older SAS releases, it is fairly simple to implement this algorithm using ex...


"Entrywise" Norm calculation using PROC FASTCLUS

June 26, 2010

In some data mining applications, a matrix norm has to be calculated; see [1] for an example. A detailed explanation of matrix norms is available on Wikipedia. Instead of a user-written routine in a DATA step, we can obtain the "entrywise" norm via PROC FASTCLUS effi...
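For reference, the entrywise p-norm treats the matrix as one long vector: it sums |a_ij|^p over all entries and takes the p-th root, and p = 2 gives the Frobenius norm. The snippet below is a minimal Python sketch of that definition only. It is not the PROC FASTCLUS approach the post describes, and the function name `entrywise_norm` is made up for illustration.

```python
def entrywise_norm(matrix, p=2.0):
    """Entrywise p-norm: (sum of |a_ij|**p over all entries) ** (1/p)."""
    if p < 1:
        raise ValueError("p must be at least 1")
    # Flatten the matrix and accumulate |a_ij|**p over every entry.
    total = sum(abs(value) ** p for row in matrix for value in row)
    return total ** (1.0 / p)


if __name__ == "__main__":
    a = [[3.0, 0.0], [0.0, 4.0]]   # toy matrix, not from the post
    print(entrywise_norm(a))       # p = 2, the Frobenius norm
    print(entrywise_norm(a, p=1))  # p = 1, the sum of absolute values
```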


Why is the area under the survival curve equal to the average tenure?

June 21, 2010

Last week, a student in our Applying Survival Analysis to Business Time-to-Event Problems class asked this question. He made clear that he wasn't looking for a mathematical derivation, just an intuitive understanding. Even though I make use of this pro...
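A small numeric check can make the claim concrete. With complete (uncensored) tenures measured in whole periods, the survival curve S(t) is the share of customers whose tenure exceeds t. Adding up S(0), S(1), S(2), and so on counts each customer once for every period they were active, which is why the total equals the average tenure. The Python sketch below uses made-up tenures and hypothetical function names, and it ignores censoring.

```python
from fractions import Fraction


def survival_curve(tenures):
    """S(t) = share of customers with tenure > t, for t = 0 .. max(tenures) - 1."""
    n = len(tenures)
    return [Fraction(sum(1 for x in tenures if x > t), n)
            for t in range(max(tenures))]


def area_under_curve(curve):
    """Each step of the curve has width 1, so the area is the sum of the heights."""
    return sum(curve)


if __name__ == "__main__":
    tenures = [1, 2, 2, 5, 10]  # made-up tenures in months, no censoring
    area = area_under_curve(survival_curve(tenures))
    mean = Fraction(sum(tenures), len(tenures))
    print(area, mean)  # the two values agree
```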


Boost to tackle nonlinearity

June 1, 2010

data nonlinear;
   do x=1 to 627;
      p=(sin(x/100)+1)*0.45;
      do j=1 to 100;
         x1=x+(j-1)/100;
         if ranuni(8655645)<=p then y=1;
         else y=0;
         output;
         drop p j;
      end;
   end;
run;

proc rank data=nonlinear out=nonlinearrank groups=...


Combining Empirical Hazards by the Naïve Bayesian Method

May 26, 2010

Occasionally one has an idea that seems so obvious and right that it must surely be standard practice and have a well-known name. A few months ago, I had such an idea while sitting in a client’s office in Ottawa. Last week, I wanted to include the id...


K-Nearest Neighbor in SAS

May 5, 2010

K-Nearest-Neighbor, aka KNN, is a widely used data mining tool and is often called a memory-based, case-based, or instance-based method because no model is fit. A good introduction to KNN can be found at [1] or on Wikipedia. Typically, the KNN algorithm relies on a soph...
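To illustrate the "no model is fit" point, here is a minimal brute-force sketch in Python: the training cases are simply stored, and each prediction is a majority vote among the k closest cases by Euclidean distance. This illustrates plain KNN only and is not the SAS implementation from the post. The function name `knn_predict` and the toy data are hypothetical.

```python
from collections import Counter
import math


def knn_predict(train_points, train_labels, query, k=3):
    """Classify `query` by majority vote among its k nearest training points."""
    if not 1 <= k <= len(train_points):
        raise ValueError("k must be between 1 and the number of training points")
    # Distance from the query to every stored case (brute force, no index).
    distances = [(math.dist(point, query), label)
                 for point, label in zip(train_points, train_labels)]
    distances.sort(key=lambda pair: pair[0])
    votes = Counter(label for _, label in distances[:k])
    # most_common(1) returns the top label; ties go to the label seen first.
    return votes.most_common(1)[0][0]


if __name__ == "__main__":
    # Toy data: two well-separated groups in the plane.
    points = [(0, 0), (0, 1), (1, 0), (5, 5), (5, 6), (6, 5)]
    labels = ["a", "a", "a", "b", "b", "b"]
    print(knn_predict(points, labels, (0.5, 0.5)))
    print(knn_predict(points, labels, (5.5, 5.5)))
```

Sorting all distances costs O(n log n) per query, which is fine for a sketch; larger data sets usually call for a spatial index or a partial sort.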


