Posts Tagged ‘ Tutorials ’

The Zero Bug

February 21, 2017

I am going to write about an insidious statistical, data analysis, and presentation fallacy I call “the zero bug” and the habits you need to cultivate to avoid it. Here is the zero bug in a nutshell: common data aggregation tools often cannot “count to zero” from examples, and this causes …
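
To make “cannot count to zero from examples” concrete, here is a minimal sketch in Python (not the post's own code). The region names and the `sales` list are hypothetical, invented only for illustration. It shows that an aggregation built purely from observed rows has no entry for a group with no rows, while counting against a known list of groups reports that group as an explicit zero.

```python
from collections import Counter

# Hypothetical data: one record per observed sale; "west" had no sales at all.
sales = ["north", "south", "north", "east", "north"]
# Hypothetical list of every group we expect to report on.
known_regions = ["north", "south", "east", "west"]

# Counting from examples only: a group with no rows never appears in the result.
counts_from_examples = dict(Counter(sales))

# Counting against the known groups: empty groups show up as explicit zeros.
counts_with_zeros = {region: sales.count(region) for region in known_regions}

print(counts_from_examples)  # {'north': 3, 'south': 1, 'east': 1}
print(counts_with_zeros)     # {'north': 3, 'south': 1, 'east': 1, 'west': 0}
```

The first summary is not wrong about any group it lists; it is silent about the group with zero examples, which is the kind of omission the excerpt describes.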

Evolving R Tools and Practices

February 6, 2017

One of the distinctive features of the R platform is how explicit and user-controllable everything is. This allows the style of use of R to evolve fairly rapidly. I will discuss this and end with some new notations, methods, and tools I am nominating for inclusion into your view of the evolving “current best …

Using the Bizarro Pipe to Debug magrittr Pipelines in R

January 30, 2017

I have just finished and released a free new R video lecture demonstrating how to use the “Bizarro pipe” to debug magrittr pipelines. I think R dplyr users will really enjoy it. Please read on for the link to the video lecture. In this video lecture I use the “Bizarro pipe” to debug the example …

Upgrading to macOS Sierra (nee OSX) for R users

January 26, 2017

A good fraction of R users use Apple computers. Apple machines historically have sat at a sweet spot of convenience, power, and utility: Convenience: Apple machines are available at retail stores, come with purchasable support, and can run a lot of common commercial software. Power: R packages such as parallel and Rcpp work better on …

Why do Decision Trees Work?

January 6, 2017

In this article we will discuss the machine learning method called “decision trees”, moving quickly over the usual “how decision trees work” and spending time on “why decision trees work.” We will write from a computational learning theory perspective, and hope this helps make both decision trees and computational learning theory more comprehensible. The goal …

A Theory of Nested Cross Simulation

January 2, 2017

[Reader’s Note. Some of our articles are applied and some of our articles are more theoretical. The following article is more theoretical, and requires fairly formal notation to even work through. However, it should be of interest as it touches on some of the fine points of cross-validation that are quite hard to perceive or …

Comparative examples using replyr::let

December 22, 2016

Consider the problem of “parametric programming” in R. That is: simply writing correct code before knowing some details, such as the names of the columns your procedure will have to be applied to in the future. Our latest version of replyr::let makes such programming easier. Archie’s Mechanics #2 (1954) copyright Archie Publications (edit: great news! …

Be careful evaluating model predictions

December 3, 2016

One thing I teach is: when evaluating the performance of regression models you should not use correlation as your score. This is because correlation tells you if a re-scaling of your result is useful, but you want to know if the result in your hand is in fact useful. For example: the Mars Climate Orbiter …
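
The point about re-scaling can be checked with a small sketch in Python (not the post's own code). The data, the factor of 10, and the helper names `pearson` and `rmse` are all hypothetical, chosen only for illustration. Predictions that are wrong by a constant scale factor still correlate perfectly with the truth, while a direct error measure such as root mean squared error shows that the predictions in hand are far off.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

def rmse(predicted, actual):
    """Root mean squared error between predictions and actual values."""
    squared = [(p - a) ** 2 for p, a in zip(predicted, actual)]
    return math.sqrt(sum(squared) / len(squared))

# Hypothetical outcomes and a model whose output is off by a factor of 10.
actual = [1.0, 2.0, 3.0, 4.0, 5.0]
predicted = [10.0 * a for a in actual]

corr = pearson(predicted, actual)  # essentially 1: a re-scaling would be perfect
err = rmse(predicted, actual)      # large: the predictions as given are not usable

print(f"correlation = {corr:.3f}, RMSE = {err:.2f}")
```

Here the correlation is 1 up to floating-point rounding even though every prediction is ten times too large, which is why an error measure on the original scale is the safer score for the predictions as delivered.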

A quick look at RStudio’s R notebooks

October 22, 2016

A quick demo of RStudio’s R Notebooks shown by John Mount (of Win-Vector LLC, a statistics, data science, and algorithms consulting and training firm). It looks like some of the new in-line display behavior is back-ported to R Markdown and some of the difference is the delayed running and different level of interactivity in …

Data science for executives and managers

October 22, 2016

Nina Zumel recently announced upcoming speaking appearances. I want to promote the upcoming sessions at ODSC West 2016 (11:15am-1:00pm on Friday November 4th, or 3:00pm-4:30pm on Saturday November 5th) and invite executives, managers, and other data science consumers to attend. We assume most of the Win-Vector blog audience is made up of practitioners (who we hope …
