Posts Tagged ‘Tutorials’

Why do Decision Trees Work?

January 6, 2017

In this article we will discuss the machine learning method called “decision trees”, moving quickly over the usual “how decision trees work” and spending time on “why decision trees work.” We will write from a computational learning theory perspective, and hope this helps make both decision trees and computational learning theory more comprehensible. The goal …


A Theory of Nested Cross Simulation

January 2, 2017

[Reader’s Note. Some of our articles are applied and some of our articles are more theoretical. The following article is more theoretical, and requires fairly formal notation to even work through. However, it should be of interest as it touches on some of the fine points of cross-validation that are quite hard to perceive or …


Comparative examples using replyr::let

December 22, 2016

Consider the problem of “parametric programming” in R. That is: simply writing correct code before knowing some details, such as the names of the columns your procedure will have to be applied to in the future. Our latest version of replyr::let makes such programming easier. Archie’s Mechanics #2 (1954) copyright Archie Publications (edit: great news! …


Be careful evaluating model predictions

December 3, 2016

One thing I teach is: when evaluating the performance of regression models you should not use correlation as your score. This is because correlation tells you if a re-scaling of your result is useful, but you want to know if the result in your hand is in fact useful. For example: the Mars Climate Orbiter …
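As a small illustration of the point about re-scaling, here is a minimal sketch in Python (not R, and not code from the article). The data, the factor of ten, and the helper names `pearson` and `rmse` are all made up for this example. It shows that correlation cannot tell a usable prediction from the same prediction reported in the wrong units, while root mean squared error can.

```python
import math

def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(vx * vy)

def rmse(pred, actual):
    """Root mean squared error of the predictions as given (no re-scaling)."""
    return math.sqrt(sum((p - a) ** 2 for p, a in zip(pred, actual)) / len(actual))

# Hypothetical outcomes and a reasonable prediction of them.
actual = [1.0, 2.0, 3.0, 4.0, 5.0]
good_pred = [1.1, 1.9, 3.2, 3.8, 5.1]

# The same prediction in the wrong units (illustrative scale and shift).
bad_pred = [10.0 * p + 3.0 for p in good_pred]

cor_good = pearson(good_pred, actual)
cor_bad = pearson(bad_pred, actual)
rmse_good = rmse(good_pred, actual)
rmse_bad = rmse(bad_pred, actual)

print("correlation:", cor_good, cor_bad)  # identical up to rounding
print("RMSE:", rmse_good, rmse_bad)       # very different
```

Correlation is unchanged by a positive linear re-scaling, so it scores both predictions the same; an error measure computed on the values "in your hand" does not.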


A quick look at RStudio’s R notebooks

October 22, 2016

A quick demo of RStudio’s R Notebooks shown by John Mount (of Win-Vector LLC, a statistics, data science, and algorithms consulting and training firm). It looks like some of the new in-line display behavior is back-ported to R Markdown and some of the difference is the delayed running and different level of interactivity in …


Data science for executives and managers

October 22, 2016

Nina Zumel recently announced upcoming speaking appearances. I want to promote the upcoming sessions at ODSC West 2016 (11:15am-1:00pm on Friday November 4th, or 3:00pm-4:30pm on Saturday November 5th) and invite executives, managers, and other data science consumers to attend. We assume most of the Win-Vector blog audience is made of practitioners (who we hope …


Upcoming Talks

October 17, 2016

I (Nina Zumel) will be speaking at the Women who Code Silicon Valley meetup on Thursday, October 27. The talk is called Improving Prediction using Nested Models and Simulated Out-of-Sample Data. In this talk I will discuss nested predictive models. These are models that predict an outcome or dependent variable (called y) using additional submodels …


The unfortunate one-sided logic of empirical hypothesis testing

October 17, 2016

I’ve been thinking a bit on statistical tests, their absence, abuse, and limits. I think much of the current “scientific replication crisis” stems from the fallacy that “failing to fail” is the same as success (in addition to the forces of bad luck, limited research budgets, statistical naiveté, sloppiness, pride, greed and other human qualities …


On calculating AUC

October 7, 2016

Recently Microsoft Data Scientist Bob Horton wrote a very nice article on ROC plots. We expand on this a bit and discuss some of the issues in computing “area under the curve” (AUC). R has a number of ROC/AUC packages; for example ROCR, pROC, and plotROC. But it is instructive to see how ROC plots …
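To make "computing AUC" concrete, here is a minimal sketch in Python (not R, and not the code from either article). It computes AUC two ways: as the fraction of positive/negative pairs that the score orders correctly, and as the trapezoidal area under an ROC curve. The function names and the toy data are hypothetical. Handling of tied scores is used here as one example of an issue that can arise; that choice is an assumption on our part, since the excerpt does not say which issues the article covers.

```python
def auc_pairwise(scores, labels):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Tied pairs count as half a win.
    """
    pos = [s for s, y in zip(scores, labels) if y]
    neg = [s for s, y in zip(scores, labels) if not y]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))

def auc_trapezoid(scores, labels):
    """AUC as trapezoidal area under the ROC curve.

    One ROC point is emitted per distinct score, so tied examples
    move together instead of in an arbitrary sort order.
    """
    n_pos = sum(1 for y in labels if y)
    n_neg = len(labels) - n_pos
    tp = fp = 0
    tpr_prev = fpr_prev = area = 0.0
    for t in sorted(set(scores), reverse=True):
        tp += sum(1 for s, y in zip(scores, labels) if s == t and y)
        fp += sum(1 for s, y in zip(scores, labels) if s == t and not y)
        tpr, fpr = tp / n_pos, fp / n_neg
        area += (fpr - fpr_prev) * (tpr + tpr_prev) / 2.0
        tpr_prev, fpr_prev = tpr, fpr
    return area

# Toy data with two sets of tied scores (0.8 and 0.4).
scores = [0.9, 0.8, 0.8, 0.6, 0.4, 0.4, 0.2]
labels = [1, 1, 0, 1, 0, 1, 0]

auc_pairs = auc_pairwise(scores, labels)
auc_trap = auc_trapezoid(scores, labels)
print(auc_pairs, auc_trap)
```

The two calculations agree only because ties are grouped in the ROC construction and scored as one half in the pairwise count; stepping through tied scores one at a time in sorted order can give a different area depending on how the ties happen to be ordered.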


Adding polished significance summaries to papers using R

October 4, 2016

When we teach “R for statistics” to groups of scientists (who tend to be quite well informed in statistics, and just need a bit of help with R) we take the time to re-work some tests of model quality with the appropriate significance tests. We organize the lesson in terms of a larger and more …


