Blog Archives

Bayesian and Frequentist Approaches: Ask the Right Question

May 6, 2013

It occurred to us recently that we don’t have any articles about Bayesian approaches to statistics here. I’m not going to get into the “Bayesian versus Frequentist” war; in my opinion, which approach to use is less about philosophy and more about figuring out the best way to answer a question. Once you [...]

Revisiting Cleveland’s The Elements of Graphing Data in ggplot2

February 18, 2013

I was flipping through my copy of William Cleveland’s The Elements of Graphing Data the other day; it’s a book worth revisiting. I’ve always liked Cleveland’s approach to visualization as statistical analysis. His quest to ground visualization principles in the context of human visual cognition (he called it “graphical perception”) generated useful advice for designing [...]

Error Handling in R

October 9, 2012

It’s often the case that I want to write an R script that loops over multiple datasets, or different subsets of a large dataset, running the same procedure over them: generating plots, or fitting a model, perhaps. I set the script running and turn to another task, only to come back later and find the [...]
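
The excerpt stops before the post’s solution, and the post itself is about R. As a rough analogue only, here is a minimal Python sketch of the general pattern: wrap the per-dataset step in try/except so that a failing subset is logged and skipped instead of aborting the whole loop. The `summarize` and `run_all` functions and the sample datasets are hypothetical stand-ins for “generating plots, or fitting a model”; this is not the author’s R code.

```python
import logging

logging.basicConfig(level=logging.WARNING)


def summarize(name, values):
    """Hypothetical per-dataset procedure (a stand-in for fitting a model).

    Raises ValueError on an empty dataset and TypeError on non-numeric values.
    """
    if not values:
        raise ValueError(f"dataset {name!r} is empty")
    return sum(values) / len(values)


def run_all(datasets):
    """Run summarize() on every dataset, recording failures instead of stopping."""
    results, failures = {}, {}
    for name, values in datasets.items():
        try:
            results[name] = summarize(name, values)
        except (ValueError, TypeError) as err:
            # Log the problem and move on, so one bad subset doesn't kill the run.
            logging.warning("skipping %s: %s", name, err)
            failures[name] = str(err)
    return results, failures


if __name__ == "__main__":
    data = {"a": [1, 2, 3], "b": [], "c": [1, "x"]}
    results, failures = run_all(data)
    print(results)            # {'a': 2.0}
    print(sorted(failures))   # ['b', 'c']
```

Catching only the exception types you expect (here ValueError and TypeError) keeps genuine bugs, such as a misspelled variable name, from being silently swallowed by the loop.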

On Being a Data Scientist

September 20, 2012

When people asked me what it means to be a data scientist, I used to answer, “it means you don’t have to hold my hand.” By which I meant that as a data scientist (a consulting data scientist), I can handle the data collection, the data cleaning and wrangling, the analysis, and the final presentation [...]

On Writing Technical Articles for the Nonspecialist

September 5, 2012

This was originally posted at ninazumel.com. I’m re-blogging it here. (Photo: John Mount.) I came across a post from Emily Willingham the other day: “Is a PhD required for Good Science Writing?” Willingham, a science writer with a science PhD, answers that it is not required, and that it can often be an impediment. I [...]

Modeling Trick: Impact Coding of Categorical Variables with Many Levels

July 23, 2012

One of the shortcomings of regression (both linear and logistic) is that it doesn’t handle categorical variables with a very large number of possible values (for example, postal codes). You can get around this, of course, by going to another modeling technique, such as Naive Bayes; however, you lose some of the advantages of regression [...]
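
One way to picture impact coding, sketched here under our own assumptions rather than taken from the post: replace each level of the categorical variable with the difference between the mean outcome for that level and the overall mean outcome, so that a variable with thousands of levels becomes a single numeric column a regression can use. This version assumes a numeric outcome and maps levels never seen during fitting to 0.0; the function names and data are hypothetical, and the post’s exact definition (for example, for logistic regression) may differ.

```python
from collections import defaultdict


def fit_impact_code(levels, y):
    """Map each categorical level to mean(y | level) - mean(y)."""
    grand_mean = sum(y) / len(y)
    sums = defaultdict(float)
    counts = defaultdict(int)
    for level, outcome in zip(levels, y):
        sums[level] += outcome
        counts[level] += 1
    return {level: sums[level] / counts[level] - grand_mean for level in sums}


def apply_impact_code(code, levels):
    """Replace each level by its impact; unseen levels get 0.0 (no evidence)."""
    return [code.get(level, 0.0) for level in levels]


if __name__ == "__main__":
    zips = ["94110", "94110", "10001", "10001"]  # hypothetical postal codes
    y = [3.0, 5.0, 1.0, 1.0]
    code = fit_impact_code(zips, y)
    print(code)                                         # {'94110': 1.5, '10001': -1.5}
    print(apply_impact_code(code, ["10001", "99999"]))  # [-1.5, 0.0]
```

Because the codes are computed from the outcome itself, it is generally safer to fit them on data held out from the model’s training set; otherwise rare levels can look far more predictive than they really are.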

My Favorite Graphs

December 6, 2011

The important criterion for a graph is not simply how fast we can see a result; rather it is whether through the use of the graph we can see something that would have been harder to see otherwise or that could not have been seen at all. – William Cleveland, The Elements of Graphing Data, [...]

Correlation and R-Squared

November 22, 2011

What is R²? In the context of predictive models (usually linear regression), where y is the true outcome and f is the model’s prediction, the definition that I see most often is:

$$R^2 = 1 - \frac{\sum_i (y_i - f_i)^2}{\sum_i (y_i - \bar{y})^2}$$

In words, R² is a measure of how much of the variance in y is explained by the model, f. Under “general conditions”, [...]
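
A small numeric check of the definition, written as a sketch under the assumption (ours; the excerpt does not spell it out) that “general conditions” means a least-squares line with an intercept, evaluated on the same data it was fit to. In that case R² equals the squared Pearson correlation between y and f. For predictions that are not a least-squares fit, such as the same predictions doubled, the two numbers diverge. The data values are made up for illustration.

```python
def mean(xs):
    return sum(xs) / len(xs)


def r_squared(y, f):
    """R^2 = 1 - (residual sum of squares) / (total sum of squares)."""
    y_bar = mean(y)
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, f))
    ss_tot = sum((yi - y_bar) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot


def pearson(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    ma, mb = mean(a), mean(b)
    cov = sum((ai - ma) * (bi - mb) for ai, bi in zip(a, b))
    var_a = sum((ai - ma) ** 2 for ai in a)
    var_b = sum((bi - mb) ** 2 for bi in b)
    return cov / (var_a * var_b) ** 0.5


def fit_line(x, y):
    """Ordinary least-squares slope and intercept for y ~ x."""
    mx, my = mean(x), mean(y)
    slope = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sum(
        (xi - mx) ** 2 for xi in x
    )
    return slope, my - slope * mx


if __name__ == "__main__":
    x = [1.0, 2.0, 3.0, 4.0, 5.0]
    y = [2.1, 3.9, 6.2, 7.8, 10.1]
    slope, intercept = fit_line(x, y)
    f = [intercept + slope * xi for xi in x]
    print(r_squared(y, f), pearson(y, f) ** 2)  # equal, up to rounding
    f_doubled = [2 * fi for fi in f]
    # Correlation ignores the rescaling; R^2 does not (it goes negative here).
    print(r_squared(y, f_doubled), pearson(y, f_doubled) ** 2)
```

The doubled predictions are perfectly correlated with the originals, yet they are badly miscalibrated, which is exactly what R² penalizes and correlation alone does not.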

The Simpler Derivation of Logistic Regression

September 14, 2011

Logistic regression is one of the most popular ways to fit models for categorical data, especially for binary response data. It is the most important (and probably most used) member of a class of models called generalized linear models. Unlike linear regression, logistic regression can directly predict probabilities (values that are restricted to the (0,1) [...]
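
To make the “directly predicts probabilities” point concrete, here is a minimal sketch that fits a one-feature logistic regression by plain gradient ascent on the log-likelihood. This is a generic numerical fit swapped in for illustration, not the derivation from the post; the learning rate, step count, and toy data are arbitrary assumptions. Whatever coefficients come out, the sigmoid maps the linear predictor into the interval (0, 1).

```python
import math


def sigmoid(z):
    """Logistic function, written to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def fit_logistic(xs, ys, lr=0.1, steps=5000):
    """Fit P(y=1 | x) = sigmoid(b0 + b1*x) by gradient ascent on the log-likelihood."""
    b0 = b1 = 0.0
    n = len(xs)
    for _ in range(steps):
        g0 = g1 = 0.0
        for x, y in zip(xs, ys):
            # d(log-likelihood)/d(linear predictor) is simply (y - p).
            err = y - sigmoid(b0 + b1 * x)
            g0 += err
            g1 += err * x
        b0 += lr * g0 / n
        b1 += lr * g1 / n
    return b0, b1


if __name__ == "__main__":
    xs = [0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0]
    ys = [0, 0, 1, 0, 1, 0, 1, 1]  # overlapping classes, so the fit stays finite
    b0, b1 = fit_logistic(xs, ys)
    for x in (0.0, 2.0, 4.0, 10.0):
        print(x, round(sigmoid(b0 + b1 * x), 3))
```

Because the log-likelihood is concave in (b0, b1), a small enough step size converges to the maximum. When the two classes are perfectly separable, though, the maximum lies at infinity and the coefficients keep growing, which is why the toy data above deliberately overlaps.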
