## Modeling Categories with Breadth and Depth

April 11, 2015
Religion is a categorical variable with followers differentiated by their degree of devotion. Liberals and conservatives check their respective boxes when surveyed, although moderates from each group sometimes seem more alike than their more extreme co...

## Translational Bioinformatics Year In Review

April 10, 2015
Per tradition, Russ Altman gave his "Translational Bioinformatics: The Year in Review" presentation at the close of the AMIA Joint Summit on Translational Bioinformatics in San Francisco on March 26th.  This year, papers came from six key areas (a...

## A silly little error, of the sort that I make every day

April 10, 2015
Ummmm, running Stan, testing out a new method we have that applies EP-like ideas to perform inference with aggregate data—it’s really cool, I’ll post more on it once we’ve tried everything out and have a paper that’s in better shape—anyway, I’m starting with a normal example, a varying-intercept, varying-slope model where the intercepts have population […] The post A silly little error, of the sort that I make every day…

## R User Group Recap: Heatmaps and Using the caret Package

April 10, 2015
At our most recent R user group meeting we were delighted to have presentations from Mark Lawson and Steve Hoang, both bioinformaticians at Hemoshear. All of the code used in both demos is in our Meetup’s GitHub repo. Making heatmaps in R: Steve started...

## Mistaken identity

April 10, 2015
Someone I know sent me the following email: The person XX [pseudonym redacted] who posts on your blog is almost certainly YY [name redacted]. So he is referencing his own work and trying to make it sound like it is a third party endorsing it. Not sure why but it bugs me. He is an […] The post Mistaken identity appeared first on Statistical Modeling, Causal Inference, and Social Science.

## All about that "bias, bias, bias" (it’s no trouble)

April 10, 2015
At some point, everyone who fiddles around with Bayes factors with point nulls notices something that, at first blush, seems strange: small effect sizes seem “biased” toward the null hypothesis. In null hypothesis significance testing, power simply increases when you change the true effect size. With Bayes factors, there is a non-monotonicity where increasing the sample size will slightly increase the degree to which a small effect size favors the…

## Index to first 50 posts

April 10, 2015
This is the 50th post to this blog. For my 25th post I provided a catalogue of my first 25 posts, and as promised then, I now provide a similar index for posts 25 to 50. 25. Catalogue of my first 25 blog posts 26. Multivariate data analysis (using R): a course and some lecture … Continue reading Index to first 50 posts

## Feeling the FPP love

April 10, 2015
It is now exactly 12 months since the print version of my forecasting textbook with George Athanasopoulos was released on Amazon.com. Although the book is freely available online, it seems that a lot of people still like to buy print books. It’s nice to see that it has been getting some good reviews. It is rated […]

## Books to Read While the Algae Grow in Your Fur, November 2014

April 10, 2015
Attention conservation notice: I have no taste. Kathleen Tierney (= Caitlin Kiernan), Blood Oranges Mind candy: it's hard out there for a hustler on the fringes of Providence's supernatural demi-monde. Nicole Peeler, Jinn and Juice Mind candy; fr...

## "Sparse Graph Limits with Applications to Machine Learning" (Week after Next at the Statistics Seminar)

April 10, 2015
Attention conservation notice: Notice of an upcoming academic talk at Carnegie Mellon. Only of interest if you (1) care about how the mathematics of graph limits intersects with non-parametric network modeling, and (2) will be in Pittsburgh week afte...

## Some thoughts on replication

April 9, 2015
In a recent blog post, Simine Vazire discusses the problem with the logic of requiring replicators to explain when they reach different conclusions to the original authors. She frames it, correctly, as asking people to over-interpret random noi...

## A blessing of dimensionality often observed in high-dimensional data sets

April 9, 2015
Tidy data sets have one observation per row and one variable per column.  Using this definition, big data sets can be either: Wide - a wide data set has a large number of measurements per observation, but fewer observations. This type of data set is typical in neuroimaging, genomics, and other biomedical applications. Tall - a

## What can be in an R data.frame column?

April 9, 2015
As an R programmer, have you ever wondered what can be in a data.frame column? The documentation is a bit vague, help(data.frame) returns some comforting text including: Value A data frame, a matrix-like structure whose columns may be of differing type...

April 9, 2015
This video on how to make it in academia was produced over 10 years ago by Steven Goodman for the ENAR Junior Researchers Workshop. Now the whole world can benefit from its wisdom. The movie features current and former JHU Biostatistics faculty, including Francesca Dominici, Giovanni Parmigiani, Scott Zeger, and Tom Louis. You don't want

## Why not statistics

April 9, 2015
Jordan Ellenberg’s parents were both statisticians. In his interview with Strongly Connected Components Jordan explains why he went into mathematics rather than statistics. I tried. I tried to learn some statistics actually when I was younger and it’s a beautiful subject. But at the time I think I found the shakiness of the philosophical underpinnings […]

## My favorite Neyman passage: on confidence intervals

April 9, 2015
I've been doing a lot of reading on confidence interval theory. Some of the reading is more interesting than others. There is one passage from Neyman's (1952) book "Lectures and Conferences on Mathematical Statistics and Probability" (available here) t...

## New research in tuberculosis mapping and control

April 9, 2015
Mapping and control. Or, as we would say, descriptive and causal inference. Jon Zelner informs us about two ongoing research projects: 1. TB Hotspot Mapping: Over the summer, I [Zelner] put together a really simple R package to do non-parametric disease mapping using the distance-based mapping approach developed by Caroline Jeffery and Al Ozonoff at […] The post New research in tuberculosis mapping and control appeared first on Statistical Modeling,…

## Health economic combat

April 9, 2015
A couple of weeks ago we decided to create a more formal website for our research group within the department of Statistical Science at UCL. The group includes the PhD students involved in health economic-related topics (basically all under my sup...

## Scala for Machine Learning [book review]

April 9, 2015
Nicolas, Patrick R. (2014) Scala for Machine Learning, Packt Publishing: Birmingham, UK. Full disclosure: I received a free electronic version of this book from the publisher for the purposes of review. There is clearly a market for a good book about using Scala for statistical computing, machine learning and data science. So when the publisher … Continue reading Scala for Machine Learning [book review]

## Classification with Categorical Variables (the fuzzy side)

April 9, 2015
$\frac{1}{n}\sum_{i=1}^n \widehat{Y}_i=\frac{1}{n}\sum_{i=1}^n Y_i$

The Gaussian and the (log) Poisson regressions share a very interesting property: the average predicted value is the empirical mean of our sample.

    > mean(predict(lm(dist~speed,data=cars)))
    [1] 42.98
    > mean(cars$dist)
    [1] 42.98

One can prove that it is also the prediction for the average individual in our sample:

    > predict(lm(dist~speed,data=cars),
    +   newdata=data.frame(speed=mean(cars$speed)))
    42.98

The geometric interpretation is that the regression line passes through the centroid:

    > plot(cars)
    > abline(lm(dist~speed,data=cars),col="red")
    > abline(h=mean(cars$dist),col="blue")

…
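The property stated in this excerpt can be checked for both model families in a few lines. The following is only a minimal sketch on the built-in `cars` data. The object names (`fit_gauss`, `fit_pois`, and so on) are illustrative and not taken from the original post, and the Poisson fit is an assumption about what "(log) Poisson regression" refers to here, namely a `glm` with a log link and an intercept.

```r
# Built-in 'cars' data: stopping distance (dist) as a function of speed.
# Object names below are illustrative, not from the original post.
fit_gauss <- lm(dist ~ speed, data = cars)
fit_pois  <- glm(dist ~ speed, data = cars, family = poisson(link = "log"))

mean_obs   <- mean(cars$dist)
mean_gauss <- mean(fitted(fit_gauss))  # average fitted value, linear model
mean_pois  <- mean(fitted(fit_pois))   # fitted() is on the response scale

# Linear model only: the prediction at the average speed.
pred_at_mean <- unname(
  predict(fit_gauss, newdata = data.frame(speed = mean(cars$speed)))
)

print(c(observed = mean_obs, gaussian = mean_gauss,
        poisson = mean_pois, at_mean_speed = pred_at_mean))
```

The first three values should agree up to numerical tolerance, because the score equation for the intercept forces the residuals to sum to zero in both models. The fourth value matches only because the Gaussian model is linear in `speed`. For the Poisson fit, the prediction at the average speed generally differs from the sample mean, since the exponential of an average is not the average of exponentials.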