Posts Tagged ‘Significance’

Variables can synergize, even in a linear model

September 1, 2016

Introduction: Suppose we have the task of predicting an outcome y given a number of variables v1, ..., vk. We often want to “prune variables” or build models with fewer than all the variables. This can be to speed up modeling, decrease the cost of producing future data, improve robustness, improve explainability, even reduce over-fit, and improve …

Read more »

Variable pruning is NP hard

August 28, 2016

I am working on some practical articles on variable selection, especially in the context of step-wise linear regression and logistic regression. One thing I noticed while preparing some examples is that summaries such as model quality (especially out-of-sample quality) and variable significances are not quite as simple as one would hope (they in …

Read more »

a Simpson paradox of sorts

May 5, 2016

The riddle from The Riddler this week is about finding an undirected graph with N nodes and no isolated node such that the number of nodes with more connections than the average of their neighbours is maximal. A representation of a connected graph is through a matrix X of zeros and ones, on which one […]

Read more »
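
The quantity the riddle asks to maximise can be sketched in a few lines. This is only an illustration of the definition quoted in the excerpt, not the post's own solution: the function name `count_above_neighbour_average` and the star-graph example are assumptions, and X is taken to be a symmetric 0/1 adjacency matrix given as a list of lists.

```python
def count_above_neighbour_average(X):
    """Count nodes whose degree exceeds the average degree of their neighbours.

    X is assumed to be a symmetric adjacency matrix of zeros and ones
    (list of lists) with a zero diagonal and no isolated node.
    """
    n = len(X)
    degrees = [sum(row) for row in X]
    if any(d == 0 for d in degrees):
        raise ValueError("the riddle excludes isolated nodes")

    count = 0
    for i in range(n):
        # Sum of the degrees of the neighbours of node i.
        neighbour_total = sum(degrees[j] for j in range(n) if X[i][j])
        # degree_i > neighbour_total / degree_i, written without division.
        if degrees[i] * degrees[i] > neighbour_total:
            count += 1
    return count


# Hypothetical example: a star graph with one centre and three leaves.
star = [
    [0, 1, 1, 1],
    [1, 0, 0, 0],
    [1, 0, 0, 0],
    [1, 0, 0, 0],
]
print(count_above_neighbour_average(star))  # only the centre qualifies: 1
```

In the star graph the centre has degree 3 against a neighbour average of 1, while each leaf has degree 1 against a neighbour average of 3, so exactly one node is counted.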

The failure to replicate scientific findings

January 19, 2016

Andrew Gelman and I have published a piece in Slate, discussing the failure to replicate scientific findings, using the recent example of the so-called power pose. The idea of the "power pose" is that people develop psychological and hormonal changes by making this "power pose" before walking into business meetings, whereupon these changes make them more powerful. As you often read here and at Gelman's blog, the fact that someone…

Read more »

Statbusters: standing may or may not stand a chance

December 7, 2015

In our latest Statbusters column for the Daily Beast, we read the research behind the claim that "standing reduces odds of obesity". Especially at younger companies, it is trendy to work at standing desks because of findings like this. We find a variety of statistical issues calling for better studies. For example, the observational dataset used provides no clue as to whether sitting causes obesity or obesity leads to more…

Read more »

Statbusters: Games people play with the placebo effect

November 3, 2015

In the first two chapters of Numbersense, I discuss how people game statistics, and why gaming is inevitable. I have also written about the placebo effect before. Another article has appeared covering the same topic -- the industry doesn't like the fact that more and more drugs fail to clear the "placebo" hurdle; and the industry thinks the problem is that the placebo effect is mysteriously increasing over time. What…

Read more »

Keep eating those sausages

November 2, 2015

I am outsourcing this post to Aaron Carroll, whose Upshot column eviscerates the recent claim that eating meat will give you cancer, or that eating meat is the same as smoking cigarettes. While the media is partly culpable for spreading misinformation...

Read more »

Statbusters: What the experiments on rigging elections via Google tell us

September 21, 2015

For this week's Statbusters, we opine on that astounding report from a few weeks ago about how Google could manipulate the next elections by biasing search results. We walk you through our vetting process, starting with face validity ("the magnitude of the reported effect is too large to be believed!"). The crux of the article is about the experimental design. You start with a group of people who have…

Read more »

How Do You Know if Your Data Has Signal?

August 10, 2015

Image by Liz Sullivan, Creative Commons. Source: Wikimedia.

An all too common approach to modeling in data science is to throw all possible variables at a modeling procedure and “let the algorithm sort it out.” This is tempting when you are not sure what are the true causes or predictors of the phenomenon you are …

Read more »

Statistically significant. What does it mean?

July 22, 2015

Andrew Gelman has a great post about the concept of statistical significance, starting with a published definition by the Department of Health that is technically wrong on many levels. Statistical significance is one of the most important concepts in statistics. In recent years, a vocal group has claimed that this idea is misguided and/or useless. But what they are angry about is the use (and frequently, misuse) of…

Read more »

