Blog Archives

Automatic bias correction doesn’t fix omitted variable bias

July 8, 2014

Page 94 of Gelman, Carlin, Stern, Dunson, Vehtari, Rubin “Bayesian Data Analysis” 3rd Edition (which we will call BDA3) provides a great example of what happens when common broad frequentist bias criticisms are over-applied to predictions from ordinary linear regression: the predictions appear to fall apart. BDA3 goes on to exhibit what might be considered […]

Frequentist inference only seems easy

July 1, 2014

Two of the most common methods of statistical inference are frequentism and Bayesianism (see Bayesian and Frequentist Approaches: Ask the Right Question for some good discussion). In both cases we are attempting to perform reliable inference of unknown quantities from related observations. And in both cases inference is made possible by introducing and reasoning over […]

R minitip: don’t use data.matrix when you mean model.matrix

June 10, 2014

A quick R mini-tip: don’t use data.matrix when you mean model.matrix. If you do so you may lose (without noticing) a lot of your model’s explanatory power (due to poor encoding). For some modeling tasks you end up having to prepare a special expanded data matrix before calling a given machine learning algorithm. For example […]

R style tip: prefer functions that return data frames

June 6, 2014

While following up on Nina Zumel’s excellent Trimming the Fat from glm() Models in R I got to thinking about code style in R. And I realized: you can make your code much prettier by designing more of your functions to return data.frames. That may seem needlessly heavy-weight, but it has a lot of down-stream […]

Skimming statistics papers for the ideas (instead of the complete procedures)

June 2, 2014

Been reading a lot of Gelman, Carlin, Stern, Dunson, Vehtari, Rubin “Bayesian Data Analysis” 3rd edition lately. Overall in the Bayesian framework some ideas (such as regularization, and imputation) are way easier to justify (though calculating some seemingly basic quantities becomes tedious). A big advantage (and weakness) of this formulation is statistics has a much […]

How does Practical Data Science with R stand out?

June 2, 2014

There are a lot of good books on statistics, machine learning, analytics, and R. So it is valid to ask: how does Practical Data Science with R stand out? Why should a data scientist or an aspiring data scientist buy it? We admit, it isn’t the only book we own. Some relevant books from the […]

Save 45% on Practical Data Science with R (expires May 21, 2014)

May 16, 2014

Please share this generous deal from Manning publications: save 45% on Practical Data Science with R through May 21, 2014. Please tweet, forward and share! Edit: we are going to try and keep the current best deals on the book at the bottom of the Practical Data Science with R page. So look there for […]

R has some sharp corners

May 16, 2014

R is definitely our first choice go-to analysis system. In our opinion you really shouldn’t use something else until you have an articulated reason (be it a need for larger data scale, different programming language, better data source integration, or something else). The advantages of R are numerous: Single integrated work environment. Powerful unified scripting/programming […]

A clear picture of power and significance in A/B tests

May 3, 2014

A/B tests are one of the simplest reliable experimental designs. Controlled experiments embody the best scientific design for establishing a causal relationship between changes and their influence on user-observable behavior. “Practical guide to controlled experiments on the web: listen to your customers not to the HIPPO” Ron Kohavi, Randal M Henne, and Dan Sommerfield, Proceedings […]

A bit of the agenda of Practical Data Science with R

May 1, 2014

The goal of Zumel/Mount: Practical Data Science with R is to teach, through guided practice, the skills of a data scientist. We define a data scientist as the person who organizes client input, data, infrastructure, statistics, mathematics and machine learning to deploy useful predictive models into production. Our plan to teach is to: Order the […]

