# Author: Nina Zumel

## An Ad-hoc Method for Calibrating Uncalibrated Models

In the previous article in this series, we showed that common ensemble models like random forest and gradient boosting are uncalibrated: they are not guaranteed to estimate aggregates or rollups of the data in an unbiased way. However, they can be preferable to calibrated models such as linear or generalized linear regression, when they make … Continue reading An Ad-hoc Method for Calibrating Uncalibrated Models

## Common Ensemble Models can be Biased

In our previous article, we showed that generalized linear models are unbiased, or calibrated: they preserve the conditional expectations and rollups of the training data. A calibrated model is important in many applications, particularly when financial data is involved. However, when making predictions on individuals, a biased model may be preferable; biased models may … Continue reading Common Ensemble Models can be Biased
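
The rollup-preserving property mentioned in this excerpt can be checked on a toy case. The following is a minimal sketch in Python (not code from the article), using ordinary least squares with an intercept as the simplest generalized linear model. The data values and the `fit_line` helper are made up for illustration.

```python
from statistics import mean

def fit_line(x, y):
    """Fit y ~ intercept + slope * x by ordinary least squares (hypothetical helper)."""
    mx, my = mean(x), mean(y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = my - slope * mx
    return intercept, slope

# Made-up training data, for illustration only.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.5]

intercept, slope = fit_line(x, y)
preds = [intercept + slope * xi for xi in x]

# With an intercept term, the residuals sum to zero on the training data,
# so the total of the predictions matches the total of the observed values.
print(sum(preds), sum(y))
```

The two printed totals agree up to floating-point error. This is the sense in which such a model preserves a rollup of its training data. The check applies only to the training set and to this model family.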

## Link Functions versus Data Transforms

In the linear regression section of our book Practical Data Science in R, we use the example of predicting income from a number of demographic variables (age, sex, education and employment type). In the text, we choose to regress against log10(income) rather than directly against income. One obvious reason for not regressing directly against income … Continue reading Link Functions versus Data Transforms

## Cohen’s D for Experimental Planning

In this note, we discuss the use of Cohen’s D for planning difference-of-mean experiments. Estimating sample size: Let’s imagine you are testing a new weight loss program and comparing it to some existing weight loss regimen. You want to run an experiment to determine if the new program is more effective than the old one. … Continue reading Cohen’s D for Experimental Planning
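
As a rough illustration of the idea in this excerpt, here is a minimal sketch in Python of Cohen's d and a per-group sample-size estimate. It is not the article's own procedure, which is truncated above. It assumes two equal-sized groups, a two-sided test, and the standard normal approximation; the function names and default values for `alpha` and `power` are illustrative choices.

```python
import math
from statistics import NormalDist, mean, variance

def cohens_d(a, b):
    """Difference of group means divided by the pooled standard deviation."""
    na, nb = len(a), len(b)
    pooled_var = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    return (mean(a) - mean(b)) / math.sqrt(pooled_var)

def n_per_group(d, alpha=0.05, power=0.8):
    """Approximate per-group sample size needed to detect effect size d.

    Assumes a two-sided, two-sample comparison of means and uses the
    normal approximation: n = 2 * ((z_{1-alpha/2} + z_{power}) / d)^2.
    """
    std_normal = NormalDist()
    z_alpha = std_normal.inv_cdf(1 - alpha / 2)
    z_power = std_normal.inv_cdf(power)
    return math.ceil(2 * ((z_alpha + z_power) / d) ** 2)

# Made-up example: both groups have variance 4 and the means differ by 1.
print(cohens_d([2, 4, 6], [1, 3, 5]))  # 0.5
print(n_per_group(0.5))                # 63
```

A t-distribution-based power calculation gives a slightly larger answer for small samples, so the normal-approximation figure is best read as a lower-end planning estimate.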

## PDSwR2: New Chapters!

We have two new chapters of Practical Data Science with R, Second Edition online and available for review! The newly available chapters cover: Data Engineering And Data Shaping – Explores how to use R to organize or wrangle data into a shape useful for analysis. The chapter covers applying data transforms, data manipulation packages, and … Continue reading PDSwR2: New Chapters!