Confusing Stats Terms Explained: Standard Deviation

August 1, 2010

Most people find statistics to be complicated, confusing, and just generally frustrating. One of the biggest causes of confusion is the complicated vocabulary that is associated with stats. Frankly, it sometimes seems that stats terms were made to be intentionally complicated. In fact, some concepts seem perfectly understandable when described in plain English, but seem incomprehensible when described in stats lingo. With this in mind, I decided to compile a list of…

Read more »

An Economic Approach for a Class of Dimensionality Reduction Techniques

July 30, 2010

Just back from KDD2010. Several papers at the conference interested me. On the computation side, Liang Sun et al.'s paper [1], "A Scalable Two-Stage Approach for a Class of Dimensionality Reduction Techniques", caught my eye. Liang pro...

Read more »

July in Paris

July 29, 2010

One of the best things about spending summer in Paris: its parcs (here, with friends @ Parc Montsouris).

Read more »

Top Ten Tips for Data Analysis to Make Your Research Life Easier!

July 27, 2010

While there is no "magic bullet" to make stats and data analysis easy to understand and helpful in our research, there are some things that you can do to avoid pitfalls and help things run smoothly. This "top ten" list offers a few of those things that I think you will find helpful! I'll be posting a video of this list later today on my Stats Videos page.

Read more »

R Cheat Sheets and more

July 21, 2010

Here you can find a collection of cheat sheets useful to R developers. Visit the devcheatsheet homepage to inspect cheat sheets and quick reference cards for other programming languages and applications.

Read more »

Implement Randomized SVD in SAS

July 13, 2010

In the 2010 SASware Ballot®, a dedicated PROC for Randomized SVD was among the options. While an official SAS PROC will not be available in the immediate future, or in older SAS releases, it is fairly simple to implement this algorithm using ex...

Read more »

Within-Subject and Between-Subject Effects: Wanting Ice Cream Today, Tomorrow, and The Next Day…

July 10, 2010

The conceptual difference between within-subject and between-subject effects is something I am asked about quite often. So often, in fact, that I thought a blog posting was warranted! As a quick disclaimer, I know this is a complex issue and the description of what each type of effect actually is varies greatly based on the kind of analysis one is conducting. However, what follows is an attempt to provide a basic…

Read more »

Bonferroni Correction In Regression: Fun To Say, Important To Do.

July 4, 2010

The Bonferroni correction is only one way to guard against the bias of repeated testing effects, but it is probably the most common method and it is definitely the most fun to say. I've come to consider it as critical to the accuracy of my analyses as selecting the correct type of analysis or entering the data accurately. Unfortunately, adjustments for repeated testing of hypotheses, as a whole, remain something…

Read more »
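
For readers who have not met the correction before, here is a minimal sketch of the usual Bonferroni rule: split the family-wise alpha evenly across the m tests and reject only the p-values at or below alpha / m. The function name and the p-values are made up for illustration and are not taken from the post.

```python
def bonferroni(p_values, alpha=0.05):
    """Return the per-test threshold and a reject flag for each p-value."""
    m = len(p_values)
    if m == 0:
        raise ValueError("need at least one p-value")
    # The family-wise alpha is split evenly across the m tests.
    threshold = alpha / m
    return threshold, [p <= threshold for p in p_values]


# Hypothetical p-values for four regression coefficients.
p_values = [0.003, 0.020, 0.040, 0.300]
threshold, rejected = bonferroni(p_values)
print(threshold)
print(rejected)
```

Without the correction, three of these four hypothetical tests would pass at 0.05; with it, only the first one does.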

R Journal 2/1

June 30, 2010

R Journal 2/1 is out! Grab it from here.

Read more »

"Entrywise" Norm calculation using PROC FASTCLUS

June 26, 2010

In some data mining applications, a matrix norm has to be calculated; see, for instance, [1]. You can find a detailed explanation of matrix norms on Wikipedia. Instead of a user-written routine in a DATA step, we can obtain the "Entrywise" norm via PROC FASTCLUS effi...

Read more »
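
As a reminder of what is being computed, the entrywise p-norm treats the matrix as one long vector: sum |a_ij|^p over all entries and take the p-th root (p = 2 gives the Frobenius norm). The sketch below is plain Python with an invented helper name and a toy matrix; it only illustrates the definition and is not the PROC FASTCLUS approach the post describes.

```python
def entrywise_norm(matrix, p=2):
    """Entrywise p-norm: (sum of |a_ij| ** p over all entries) ** (1 / p)."""
    total = sum(abs(value) ** p for row in matrix for value in row)
    return total ** (1.0 / p)


# Toy matrix, chosen so the results are easy to check by hand.
A = [[3, 0],
     [0, -4]]
frobenius = entrywise_norm(A, p=2)  # sqrt(9 + 16)
l1_entry = entrywise_norm(A, p=1)   # 3 + 4
print(frobenius, l1_entry)
```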

Why is the area under the survival curve equal to the average tenure?

June 21, 2010

Last week, a student in our Applying Survival Analysis to Business Time-to-Event Problems class asked this question. He made clear that he wasn't looking for a mathematical derivation, just an intuitive understanding. Even though I make use of this pro...

Read more »
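
One way to check the claim numerically, assuming whole-month tenures and no censoring: each customer adds one unit of area for every month they survive, so adding up the bars of the empirical survival curve gives back the mean tenure. The tenures and the helper name below are invented for illustration.

```python
def survival_curve(tenures):
    """Empirical S(t): share of customers whose tenure exceeds t, for t = 0, 1, ..."""
    n = len(tenures)
    horizon = max(tenures)
    return [sum(1 for x in tenures if x > t) / n for t in range(horizon)]


# Hypothetical tenures in whole months, all fully observed (no censoring).
tenures = [1, 2, 2, 5, 10]
curve = survival_curve(tenures)

area = sum(curve)  # each bar of the step curve has width 1
mean_tenure = sum(tenures) / len(tenures)
print(area, mean_tenure)
```

The two printed numbers agree up to floating-point rounding.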

Boost to tackle nonlinearity

June 1, 2010

data nonlinear;
  do x=1 to 627;
    p=(sin(x/100)+1)*0.45;
    do j=1 to 100;
      x1=x+(j-1)/100;
      if ranuni(8655645)<=p then y=1; else y=0;
      output;
      drop p j;
    end;
  end;
run;
proc rank data=nonlinear out=nonlinearrank groups=...

Read more »

Support Vector machines with custom kernels using scikits.learn

May 27, 2010

It is now possible (using the development version as of May 2010) to use Support Vector Machines with custom kernels in scikits.learn. Using it couldn't be simpler: you just pass a callable (the kernel) to the class constructor. For example, ...

Read more »

Combining Empirical Hazards by the Naïve Bayesian Method

May 26, 2010

Occasionally one has an idea that seems so obvious and right that it must surely be standard practice and have a well-known name. A few months ago, I had such an idea while sitting in a client’s office in Ottawa. Last week, I wanted to include the id...

Read more »

Introduction to using R in research

May 13, 2010

I was recently asked to give a talk at our graduate school's annual conference. I offered several titles and the one they picked was Using R in research. I'm not sure if this was a good idea or not. The graduate school covers PhD students across three ar...

Read more »

K-Nearest Neighbor in SAS

May 5, 2010

K-Nearest-Neighbor, aka KNN, is a widely used data mining tool and is often called a memory-based/case-based/instance-based method, as no model is fit. A good introduction to KNN can be found at [1], or on Wikipedia. Typically, the KNN algorithm relies on a soph...

Read more »
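
To make the "no model is fit" point concrete, here is a minimal brute-force sketch in Python rather than SAS: the training data is simply kept in memory, and each prediction is a majority vote among the k closest stored points. The function name, data, and choice of Euclidean distance are assumptions for illustration, not the implementation from the post.

```python
import math
from collections import Counter


def knn_predict(train, query, k=3):
    """Classify `query` by majority vote among its k nearest training points.

    `train` is a list of (features, label) pairs; nothing is fitted in
    advance, the stored cases themselves act as the "model".
    """
    nearest = sorted(train, key=lambda pair: math.dist(pair[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Hypothetical two-cluster training set.
train = [((1.0, 1.0), "a"), ((1.2, 0.8), "a"), ((0.9, 1.1), "a"),
         ((5.0, 5.0), "b"), ((5.2, 4.9), "b"), ((4.8, 5.1), "b")]

print(knn_predict(train, (1.1, 1.0)))
print(knn_predict(train, (5.0, 4.8)))
```

Sorting every training point per query is the simplest possible search and only sensible for small data sets.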

Next Project: Regularized Logistic Regression

May 5, 2010

L1 Regularized Logistic Regression effectively handles a large number of predictors and performs variable selection at the same time. [1] indicates that L1 RLR can be implemented via the IRLS-LARS algorithm. You can tweak PROC GLMSELECT in v9.2 for this. L2 R...

Read more »

Put aside your fears and be wrong already!

May 2, 2010

First of all, if your research progress is slowed by fear of statistics, you are certainly not alone. Being afraid to "mess up" your stats, and thus your project, is a common lament. But I'm here to tell you that your project is not that fragile! Once your data is collected, entered, cleaned, and ready for analysis, it is time for excitement, not concern! The golden rule here is: BACK UP.…

Read more »

Trade Your Stats "Truths" for Stats Arguments…

May 2, 2010

Warning, this blog will be short, sweet, and a bit pithy. The two most common questions that I receive about statistical analyses, no matter what kind or purpose, are: "Am I doing it right?" or "Am I allowed to...(fill in a variation of a common analysis here)?" My response to these questions is usually: "Sure, you can do whatever you want, but what will it mean if you do?" I've said…

Read more »
