Category: Practical Data Science

Free R/datascience Extract: Evaluating a Classification Model with a Spam Filter

We are excited to share a free extract of Zumel, Mount, Practical Data Science with R, 2nd Edition, Manning 2019: Evaluating a Classification Model with a Spam Filter. This section reflects an important design decision in the book: teach model evaluation first, and as a step separate from model construction. It is funny, but it …

vtreat Cross Validation

Nina Zumel finished new documentation on how vtreat’s cross validation works, which I want to share here. vtreat is a system that makes data preparation for machine learning a “one-liner” (available in R or available in Python). We have a set of starting off points here. These documents describe what vtreat does for you, you …

New vtreat Documentation (Starting with Multinomial Classification)

Nina Zumel finished some great new documentation showing how to use Python vtreat to prepare data for multinomial classification mode. And I have finally finished porting the documentation to R vtreat. So we now have good introductions on how to use vtreat to prepare data for the common tasks of: Regression: R regression example, Python …

How to Prepare Data

Real world data can present a number of challenges to data science workflows. Even properly structured data (each interesting measurement already landed in a distinct column) can present problems, such as missing values and high cardinality categorical variables. In this note we describe some great tools for working with such data. For an example: consider the …

WVPlots 1.1.2 on CRAN

I have put a new release of the WVPlots package up on CRAN. This release adds palette and/or color controls to most of the plotting functions in the package. WVPlots was originally a catch-all package of ggplot2 visualizations that we at Win-Vector tended to use repeatedly, and wanted to turn into “one-liners.” A consequence of …

Free Video Lecture: Vectors for Programmers and Data Scientists

We have just released two new free video lectures on vectors from a programmer’s point of view. I am experimenting with which ideas about vectors programmers find interesting, which concepts they consider safe starting points, and how to condense and present the material. Please check the lectures out. Vectors for Programmers and Data …