Boost to tackle nonlinearity

June 1, 2010

data nonlinear;
  do x=1 to 627;
    p=(sin(x/100)+1)*0.45;
    do j=1 to 100;
      x1=x+(j-1)/100;
      if ranuni(8655645)<=p then y=1; else y=0;
      output;
      drop p j;
    end;
  end;
run;
proc rank data=nonlinear out=nonlinearrank groups=...

Read more »
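For readers without SAS, the data step above can be mirrored in Python. This is a rough sketch under stated assumptions: Python's `random.Random` stands in for `ranuni`, so the draws will not reproduce the SAS stream even with the same seed, and the hypothetical `rank_groups` helper replaces PROC RANK with simple equal-size bins of `x1`. The choice of 10 groups is ours, because the original `groups=` value is cut off.

```python
import math
import random

def simulate_nonlinear(seed=8655645):
    """Mirror the SAS data step: x = 1..627, 100 rows each, P(y=1) = 0.45*(sin(x/100)+1)."""
    rng = random.Random(seed)  # NOT the same random stream as SAS ranuni
    rows = []
    for x in range(1, 628):
        p = (math.sin(x / 100) + 1) * 0.45
        for j in range(1, 101):
            x1 = x + (j - 1) / 100
            y = 1 if rng.random() <= p else 0
            rows.append((x1, y))
    return rows

def rank_groups(rows, groups=10):
    """Hypothetical stand-in for PROC RANK GROUPS=: sort by x1, cut into
    equal-size bins, and return the observed event rate in each bin."""
    ordered = sorted(rows)
    n = len(ordered)
    rates = []
    for g in range(groups):
        chunk = ordered[g * n // groups:(g + 1) * n // groups]
        rates.append(sum(y for _, y in chunk) / len(chunk))
    return rates

rows = simulate_nonlinear()
for g, rate in enumerate(rank_groups(rows)):
    print(g, round(rate, 3))
```

The per-group event rate rises toward 0.9 and then falls toward 0 as `x1` increases, which is the non-monotone pattern the post sets out to model.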

Support Vector machines with custom kernels using scikits.learn

May 27, 2010

It is now possible (using the development version as of May 2010) to use Support Vector Machines with custom kernels in scikits.learn. Using them couldn't be simpler: you just pass a callable (the kernel) to the class constructor. For example, ...

Read more »
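In today's scikit-learn (package `sklearn`) the same idea is spelled `SVC(kernel=my_kernel)`, where the callable receives two sample matrices and returns their Gram matrix. To show the mechanics without depending on that library, here is a minimal, dependency-free sketch of a kernel SVM trained with the kernelized Pegasos stochastic sub-gradient method. Pegasos is a swapped-in solver, not the libsvm algorithm scikits.learn uses, and the class name `KernelPegasosSVM`, the `rbf` helper and the toy data are hypothetical.

```python
import math
import random

def rbf(a, b, gamma=0.5):
    """A custom kernel: Gaussian RBF between two points given as sequences of floats."""
    return math.exp(-gamma * sum((ai - bi) ** 2 for ai, bi in zip(a, b)))

class KernelPegasosSVM:
    """Hypothetical kernel SVM (no bias term) trained with kernelized Pegasos.

    As in scikits.learn, the kernel is simply a callable handed to the constructor.
    """

    def __init__(self, kernel, lam=0.01, n_iter=2000, seed=0):
        self.kernel = kernel
        self.lam = lam
        self.n_iter = n_iter
        self.seed = seed

    def fit(self, X, y):
        # y must contain labels in {-1, +1}
        rng = random.Random(self.seed)
        self.X, self.y = list(X), list(y)
        self.alpha = [0] * len(self.X)  # how often each point violated the margin
        for t in range(1, self.n_iter + 1):
            i = rng.randrange(len(self.X))
            score = sum(a * yj * self.kernel(xj, self.X[i])
                        for a, yj, xj in zip(self.alpha, self.y, self.X) if a)
            if self.y[i] * score / (self.lam * t) < 1:
                self.alpha[i] += 1
        return self

    def decision_function(self, x):
        # Positive scaling 1/(lam*T) is omitted: it does not change the sign
        return sum(a * yj * self.kernel(xj, x)
                   for a, yj, xj in zip(self.alpha, self.y, self.X) if a)

    def predict(self, X):
        return [1 if self.decision_function(x) >= 0 else -1 for x in X]

X = [(2, 2), (2.5, 1.5), (1.5, 2.5), (3, 3),
     (-2, -2), (-2.5, -1.5), (-1.5, -2.5), (-3, -3)]
y = [1, 1, 1, 1, -1, -1, -1, -1]
clf = KernelPegasosSVM(kernel=rbf).fit(X, y)
print(clf.predict([(2, 1), (-1, -2)]))  # → [1, -1]
```

Swapping in another kernel only means passing a different callable, for example `lambda a, b: sum(ai * bi for ai, bi in zip(a, b))` for a linear kernel.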

Combining Empirical Hazards by the Naïve Bayesian Method

May 26, 2010

Occasionally one has an idea that seems so obvious and right that it must surely be standard practice and have a well-known name. A few months ago, I had such an idea while sitting in a client’s office in Ottawa. Last week, I wanted to include the id...

Read more »

Introduction to using R in research

May 13, 2010

I was recently asked to give a talk to our graduate school annual conference. I offered several titles and the one they picked was Using R in research. I'm not sure if this was a good idea or not. The graduate school covers PhD students across three ar...

Read more »

K-Nearest Neighbor in SAS

May 5, 2010

K-Nearest-Neighbor, aka KNN, is a widely used data mining tool and is often called a memory-based, case-based, or instance-based method because no model is fit. A good introduction to KNN can be found at [1] or on Wikipedia. Typically, the KNN algorithm relies on a soph...

Read more »
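The memory-based idea can be sketched in a few lines of Python rather than SAS: nothing is fit, and each query is classified by a majority vote among its k closest stored cases under Euclidean distance. The function name `knn_predict`, the default `k=3` and the toy data are illustrative assumptions; this brute-force scan is only meant to show the logic, not the more sophisticated search structures real implementations use.

```python
import math
from collections import Counter

def knn_predict(train_X, train_y, query, k=3):
    """Hypothetical brute-force k-nearest-neighbor classifier (majority vote)."""
    # The stored cases ARE the model: rank them all by distance to the query.
    neighbors = sorted(zip(train_X, train_y),
                       key=lambda pair: math.dist(pair[0], query))[:k]
    votes = Counter(label for _, label in neighbors)
    # On a tie, most_common keeps first-seen order, i.e. the nearer neighbor's label wins.
    return votes.most_common(1)[0][0]

train_X = [(1, 1), (1, 2), (2, 1), (6, 6), (6, 7), (7, 6)]
train_y = ["a", "a", "a", "b", "b", "b"]
print(knn_predict(train_X, train_y, (1.5, 1.5)))  # → a
print(knn_predict(train_X, train_y, (6.5, 6.2)))  # → b
```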

Next Project: Regularized Logistic Regression

May 5, 2010

L1 Regularized Logistic Regression effectively handles a large number of predictors and performs variable selection simultaneously. [1] indicates that L1 RLR can be implemented via the IRLS-LARS algorithm. You can tweak PROC GLMSELECT in v9.2 for this. L2 R...

Read more »
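The post points to IRLS-LARS and PROC GLMSELECT. As a language-neutral illustration of what the L1 penalty does, here is a minimal sketch that swaps in a different solver, proximal gradient descent (ISTA) with soft-thresholding, for the objective (mean logistic loss) + λ·Σ|w_j|. The function names, step size, λ values and toy data are hypothetical, and the intercept is left unpenalized.

```python
import math

def soft_threshold(z, t):
    """Proximal operator of t*|z|: shrink toward zero, exactly zero inside [-t, t]."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0

def l1_logistic(X, y, lam=0.1, step=0.5, n_iter=500):
    """Hypothetical ISTA fit of intercept b and weights w minimizing
    mean log-loss + lam * sum(|w_j|). X: list of feature rows, y: 0/1 labels."""
    n, p = len(X), len(X[0])
    w, b = [0.0] * p, 0.0
    for _ in range(n_iter):
        # Gradient of the mean logistic loss
        grad_w, grad_b = [0.0] * p, 0.0
        for xi, yi in zip(X, y):
            z = b + sum(wj * xj for wj, xj in zip(w, xi))
            r = 1.0 / (1.0 + math.exp(-z)) - yi  # predicted probability minus label
            grad_b += r / n
            for j in range(p):
                grad_w[j] += r * xi[j] / n
        # Plain gradient step for b; gradient step plus soft-thresholding for w
        b -= step * grad_b
        w = [soft_threshold(wj - step * gj, step * lam) for wj, gj in zip(w, grad_w)]
    return b, w

# Feature 0 separates the classes; feature 1 is balanced noise.
X = [(-2, 1), (-1, -1), (1, 1), (2, -1), (-2, -1), (-1, 1), (1, -1), (2, 1)]
y = [0, 0, 1, 1, 0, 0, 1, 1]
b, w = l1_logistic(X, y, lam=0.1)
print(b, w)  # the noise weight w[1] is exactly 0.0
```

The exact zero on the noise feature is the variable-selection effect; raising `lam` far enough (here to 1.0) zeroes every weight.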

Put aside your fears and be wrong already!

May 2, 2010

First of all, if your research progress is slowed by fear of statistics, you are certainly not alone. Being afraid to "mess up" your stats, and thus your project, is a common lament. But I'm here to tell you that your project is not that fragile! Once your data is collected, entered, cleaned, and ready for analysis, it is time for excitement, not concern! The golden rule here is: BACK UP.…

Read more »

Trade Your Stats "Truths" for Stats Arguments…

May 2, 2010

Warning: this blog post will be short, sweet, and a bit pithy. The two most common questions that I receive about statistical analyses, no matter what kind or purpose, are: "Am I doing it right?" and "Am I allowed to... (fill in a variation of a common analysis here)?" My response to these questions is usually: "Sure, you can do whatever you want, but what will it mean if you do?" I've said…

Read more »

Conduct R analysis within SAS

April 30, 2010

R is attractive to statistical analysts for its ease of use and ready access to packages implementing modern methodologies. If you have IML, you can submit R commands within the SAS/IML environment; see Rick's post here. Unfortunately, not all analyst...

Read more »

Data Transformations: statistical voodoo or truth serum for your data?

April 26, 2010

Anyone who has taken a statistics class has probably learned about transforming data at one time or another (although you may be in denial about it). In short, you may want to transform your data if you need to perform a parametric analysis but your dataset violates its assumptions. While this seems simple enough, many researchers are hesitant to use this tactic for handling non-normally distributed data.…

Read more »
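As a small illustration of why a transformation can help, the sketch below computes the sample skewness of a right-skewed variable before and after a log transform. The data are drawn from a lognormal distribution purely for demonstration, and the `skewness` helper (the simple moment-based estimator, without small-sample correction) is our assumption, not a method from the post.

```python
import math
import random

def skewness(values):
    """Moment-based sample skewness: mean((x - m)^3) / (population sd)^3."""
    n = len(values)
    m = sum(values) / n
    m2 = sum((v - m) ** 2 for v in values) / n
    m3 = sum((v - m) ** 3 for v in values) / n
    return m3 / m2 ** 1.5

rng = random.Random(42)
# Right-skewed, strictly positive data (a log transform requires x > 0)
raw = [rng.lognormvariate(0.0, 0.8) for _ in range(5000)]
logged = [math.log(v) for v in raw]

print(round(skewness(raw), 2))     # strongly positive
print(round(skewness(logged), 2))  # near 0: the log of lognormal data is normal
```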

How to link against a system-wide BLAS library using numpy.distutils

April 22, 2010

If your numpy installation uses system-wide BLAS libraries (this will most likely be the case unless you installed it through prebuilt Windows binaries), you can retrieve this information at compile time to link Python modules against BLAS. The function get...

Read more »
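One way to query these settings is `numpy.distutils.system_info.get_info`, asked for its `'blas_opt'` section. This is a minimal sketch, not necessarily the exact call the post goes on to describe; the helper name `blas_build_info` is hypothetical. Note that `numpy.distutils` has since been deprecated and is not available on Python 3.12 or later, so the helper falls back to an empty dict.

```python
def blas_build_info():
    """Hypothetical helper: BLAS compile/link settings found by numpy.distutils, or {}."""
    try:
        # Deprecated since NumPy 1.23; not available on Python >= 3.12.
        from numpy.distutils.system_info import get_info
    except ImportError:
        return {}
    # notfound_action=0: return an empty dict silently if no optimized BLAS is found
    return get_info('blas_opt', 0)

info = blas_build_info()
# Keys such as 'libraries', 'library_dirs' or 'define_macros' can be forwarded as
# keyword arguments when declaring an extension in a numpy.distutils setup.py.
print(sorted(info))
```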

R 2.11.0 is released!

April 22, 2010

The new R 2.11.0 is out! Get it from here. Take a look at these posts for some miscellaneous advice to make the upgrade easier. Also, this thread on Stack Overflow can be of some value. Feel free to contribute suggestions about how to upgrade your R i...

Read more »

Text 2.0: the text that knows it is read

April 15, 2010

Text 2.0 is the brand of a research company that offers a completely new reading experience. By tracking eye movements, the text knows which part is being read, and the text-publishing tool can [...]

Read more »

Easy Data Mining by drag’n’drop: FastStats Modelling

April 15, 2010

The FastStats Modelling Data Mining tool from Apteco offers a quick and easy drag'n'drop way to perform Data Mining. Because the algorithms are kept simple, it works very [...]

Read more »

Data Mining Techniques now available in Korean

April 4, 2010

For any of our readers who have been wishing they could read our book Data Mining Techniques for Marketing, Sales, and Customer Relationship Management (2nd Edition) in Korean, now you can! We don't know why the cover pictures someone playing jacks, bu...

Read more »

Oracle Cloud Mining at Amazon AWS

March 24, 2010

Cloud Mining with Oracle's ODM has been available on Amazon AWS since the end of February 2010, as shown on Oracle's website. There is a pre-installed Oracle 11gR2 database with sample datasets ready to use. With the Oracle 11gR2 Data Mining Amazon Machine Image (AMI), users can now launch an Oracle Cloud [...]

Read more »

Data Applied’s Cloud Mining with new functions

March 24, 2010

Data Applied is one of the first Cloud Mining Companies on the market. It now offers new Data Transformation [...]

Read more »

scikits.learn 0.2 release

March 22, 2010

Today I released a new version of the scikits.learn library for machine learning. This new release includes the new libsvm bindings, Jake VanderPlas' BallTree algorithm for *fast* nearest neighbor queries in high dimension, etc. Here is the official an...

Read more »

Balloon plot using ggplot2

March 19, 2010

Following Tal Galili's example and using part of his code, I want to plot the balloon plot you can see here using R and the excellent ggplot2 package by Hadley Wickham.
### I retrieve the data from the google document you can find here using Tal Galili co...

Read more »

RapidMiner from Rapid-I at CeBIT 2010

March 18, 2010

RapidMiner is a Data Mining Suite from the German company Rapid-I. At CeBIT I had the opportunity to talk with the co-founder about his product and Cloud [...]

Read more »

Plot the maximum margin hyperplane with scikits.learn

March 17, 2010

Suppose some given data points each belong to one of two classes, and the goal is to decide which class a new data point will be in. In the case of support vector machines, a data point is viewed as a p-dimensional vector (2-dimensional in this example...

Read more »
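Once a linear SVM has produced a weight vector w and intercept b (in current scikit-learn, `coef_` and `intercept_` on an `SVC` fitted with `kernel='linear'`), the separating line in 2-D is w·x + b = 0, the two margin boundaries are w·x + b = ±1, and the distance between them is 2/||w||. The dependency-free sketch below converts (w, b) into slope/intercept form for plotting. The function name and the two-point example are hypothetical, chosen so the answer can be checked by hand: for the points (0, 0) labeled -1 and (2, 2) labeled +1, the maximum-margin solution is w = (0.5, 0.5), b = -1.

```python
import math

def hyperplane_lines(w, b):
    """For a 2-D linear decision function w[0]*x + w[1]*y + b, return
    (slope, intercept) for the separating line, the +1 margin line and the
    -1 margin line, plus the margin width 2/||w||."""
    w0, w1 = w
    if w1 == 0:
        raise ValueError("vertical boundary: express it as x = const instead")
    slope = -w0 / w1
    center = (slope, -b / w1)             # w.x + b = 0
    plus_line = (slope, (1 - b) / w1)     # w.x + b = +1
    minus_line = (slope, (-1 - b) / w1)   # w.x + b = -1
    margin_width = 2 / math.hypot(w0, w1)
    return center, plus_line, minus_line, margin_width

center, plus_line, minus_line, width = hyperplane_lines((0.5, 0.5), -1.0)
print(center, plus_line, minus_line)  # y = -x + 2, y = -x + 4, y = -x
print(width)                          # sqrt(8), the distance between the two points
```

The margin lines pass through the two points, which are therefore the support vectors.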

Bitten by an Unfamiliar Form of Left Truncation

March 15, 2010

Alternate title: Data Mining Consultant with Egg on Face
Last week I made a client presentation. The project was complete. I was presenting the final results to the client. The CEO was there. Also the CTO, the CFO, the VPs of Sales and Marketing, ...

Read more »

