C. Titus Brown C. Titus Brown is an assistant professor in the Department of Computer Science and Engineering at Michigan State University. He develops computational software for next-generation sequencing and is the author of the blog, “Livin...
A while ago I wrote this post with some R code to visualize the updating of a beta distribution as the outcomes of Bernoulli trials are observed. The code provided a single plot of this process, with all the curves overlaid on top of one another. Then John Myles White (co-author of Machine Learning for
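The updating described in this excerpt can be sketched in a few lines. This is a minimal illustration in Python rather than the post's R code, which is not reproduced here. It assumes a uniform Beta(1, 1) prior, and the function names and the simulated success probability `true_p` are hypothetical choices made for the example. The idea is that each observed success adds one to the first parameter and each failure adds one to the second.

```python
import random


def update_beta(alpha, beta, outcome):
    """Return the posterior Beta parameters after one Bernoulli outcome.

    outcome is 1 for a success and 0 for a failure.
    """
    return alpha + outcome, beta + (1 - outcome)


def beta_mean(alpha, beta):
    """Mean of a Beta(alpha, beta) distribution."""
    return alpha / (alpha + beta)


def sequential_posteriors(outcomes, alpha=1.0, beta=1.0):
    """Return the (alpha, beta) pairs after 0, 1, 2, ... observed trials.

    The default Beta(1, 1) prior is an assumption of this sketch.
    """
    history = [(alpha, beta)]
    for outcome in outcomes:
        alpha, beta = update_beta(alpha, beta, outcome)
        history.append((alpha, beta))
    return history


if __name__ == "__main__":
    rng = random.Random(0)
    true_p = 0.7  # hypothetical success probability used to simulate trials
    outcomes = [1 if rng.random() < true_p else 0 for _ in range(20)]
    for n, (a, b) in enumerate(sequential_posteriors(outcomes)):
        print(f"after {n:2d} trials: Beta({a:.0f}, {b:.0f}), mean = {beta_mean(a, b):.3f}")
```

Each pair in the returned history defines one curve, so plotting them in sequence (rather than overlaid) gives one frame per observed trial.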
Egon Sharpe (E.S.) Pearson’s birthday was August 11. This slightly belated birthday discussion is directly connected to the question of the uses to which frequentist methods may be put in inquiry. Are they limited to supplying procedures which will not err too frequently in some vast long run? Or are these long run results of [...]
I came across a very interesting paper by E. J. Masicampo and Daniel Lalande called A peculiar prevalence of p values just below .05. The link is here. (I saw it referenced at Marginal Revolution.) 1. Peculiar Prevalence I recommend reading the paper to get the full details. The quick summary is that they collected [...]
Michael McLaughlin sent me the following query with the above title. Some time ago, I [McLaughlin] was handed a dataset that needed to be modeled. It was generated as follows: 1. Random navigation errors, historically a binary mixture of normal and Laplace with a common mean, were collected by observation. 2. Sadly, these data were [...]
What does a generalized linear model do? R supplies a modeling function called glm() that fits generalized linear models (abbreviated as GLMs). A natural question is what it does and what problem it solves for you. We work some examples and place generalized linear models in context with other techniques. For predicting a categorical [...]
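To make the question in this excerpt concrete, here is a minimal sketch of what fitting one common GLM, logistic regression, involves. It is written in Python, is not R's glm() implementation, and is not taken from the post. The toy data and the name `fit_logistic` are hypothetical. The sketch uses Newton's method on the log-likelihood, which for the logistic model with its canonical link coincides with iteratively reweighted least squares.

```python
import math


def sigmoid(z):
    """Inverse of the logit link: maps a linear predictor to a probability."""
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(xs, ys, n_iter=25):
    """Fit P(y = 1 | x) = sigmoid(b0 + b1 * x) by Newton's method.

    Assumes one predictor plus an intercept and data that are not
    perfectly separable (otherwise the estimates diverge).
    """
    b0, b1 = 0.0, 0.0
    for _ in range(n_iter):
        g0 = g1 = 0.0           # gradient of the log-likelihood
        h00 = h01 = h11 = 0.0   # entries of the Fisher information matrix
        for x, y in zip(xs, ys):
            p = sigmoid(b0 + b1 * x)
            w = p * (1.0 - p)   # variance of a Bernoulli(p) outcome
            g0 += y - p
            g1 += (y - p) * x
            h00 += w
            h01 += w * x
            h11 += w * x * x
        # Solve the 2x2 system (information matrix) * step = gradient.
        det = h00 * h11 - h01 * h01
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1


# Hypothetical toy data: the outcome becomes more likely as x grows.
xs = [0, 1, 2, 3, 4, 5, 6, 7]
ys = [0, 0, 1, 0, 1, 1, 0, 1]

if __name__ == "__main__":
    b0, b1 = fit_logistic(xs, ys)
    print(f"intercept = {b0:.3f}, slope = {b1:.3f}")
    print(f"P(y = 1 | x = 5) = {sigmoid(b0 + b1 * 5):.3f}")
```

At the fitted coefficients the residuals y - p sum to zero, both unweighted and weighted by x. These are the likelihood equations the fit solves, and they are a quick way to check that the iteration has converged.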
Larry Wasserman refers to finite mixture models as “beasts” and jokes that they “should be avoided at all costs.” I’ve thought a lot about mixture models, ever since using them in an analysis of voting patterns that was published in 1990. First off, I’d like to say that our model was useful so I’d [...]
Dealing with endogeneity in a binary dependent variable model requires more care than the simpler continuous dependent variable case. For some, the best approach to this problem is to use the same methodology as in the continuous case, i.e. two-stage least squares. Thus, the equation of interest becomes a linear probability model (LPM). The […]
In an attempt to fix the problem of “unreal” results in science some have started a “reproducibility initiative”. Think of the incentive for being explicit about how the results were obtained the first time… But would researchers really pay to have their potential errors unearthed in this way? Even for a “good scientist” badge of approval? [...]
Statisticians have not always been great self-promoters. I think in part this comes from our tendency to be arbiters rather than being involved in the scientific process. In some ways, I think this is a good thing. Self-promotion can quickly become re...
A Brooks op-ed in the New York Times (circulation approximately 1.5 million): People at the extremes are happier than political moderates. . . . none, it seems, are happier than the Tea Partiers . . . Jay Livingston on his blog (circulation approximately 0 (rounding to the nearest million)), giving data from the 2009-2010 General [...]
There is a lot of confusion about storytelling and what tells a story. I have argued previously that stories do not tell themselves. Rather, we tell the stories given raw materials. Some of these materials lend themselves better to ad-hoc storytelling, so we tend to say that they actually tell the story, when it’s really us who do it. Exhibit A: Minard A particular example of the easy storytelling genre…
If you haven’t yet discovered the competitive machine learning site kaggle.com, please do so now. I’ll wait. Great – so, you checked it out, fell in love and have made it back. I recently downloaded the data for the getting started competition. It consists of 42000 labelled images (28×28) of handwritten digits 0-9. The
Today I want to highlight a whitepaper about Adaptive Asset Allocation by Butler, Philbrick and Gordillo and the discussion by David Varadi on the robustness of parameters of the Adaptive Asset Allocation algorithm. In this post I will follow the steps of the Adaptive Asset Allocation paper, and in the next post I will show [...]