A blessing of dimensionality often observed in high-dimensional data sets

April 9, 2015

Tidy data sets have one observation per row and one variable per column. Using this definition, big data sets can be either: Wide - a wide data set has a large number of measurements per observation, but fewer observations. This type of data set is typical in neuroimaging, genomics, and other biomedical applications. Tall - a

Read more »

What can be in an R data.frame column?

April 9, 2015

As an R programmer, have you ever wondered what can be in a data.frame column? The documentation is a bit vague; help(data.frame) returns some comforting text, including, under "Value": A data frame, a matrix-like structure whose columns may be of differing type...

Read more »

How to Get Ahead in Academia

April 9, 2015

This video on how to make it in academia was produced over 10 years ago by Steven Goodman for the ENAR Junior Researchers Workshop. Now the whole world can benefit from its wisdom. The movie features current and former JHU Biostatistics faculty, including Francesca Dominici, Giovanni Parmigiani, Scott Zeger, and Tom Louis. You don't want

Read more »

Why not statistics

April 9, 2015

Jordan Ellenberg’s parents were both statisticians. In his interview with Strongly Connected Components, Jordan explains why he went into mathematics rather than statistics: I tried. I tried to learn some statistics actually when I was younger and it’s a beautiful subject. But at the time I think I found the shakiness of the philosophical underpinnings […]

Read more »

My favorite Neyman passage: on confidence intervals

April 9, 2015

I've been doing a lot of reading on confidence interval theory. Some of the reading is more interesting than others. There is one passage from Neyman's (1952) book "Lectures and Conferences on Mathematical Statistics and Probability" (available here) t...

Read more »

New research in tuberculosis mapping and control

April 9, 2015

Mapping and control. Or, as we would say, descriptive and causal inference. Jon Zelner informs us about two ongoing research projects: 1. TB Hotspot Mapping: Over the summer, I [Zelner] put together a really simple R package to do non-parametric disease mapping using the distance-based mapping approach developed by Caroline Jeffery and Al Ozonoff at […] The post New research in tuberculosis mapping and control appeared first on Statistical Modeling,…

Read more »

Health economic combat

April 9, 2015

A couple of weeks ago we decided to create a more formal website for our research group within the department of Statistical Science at UCL. The group includes the PhD students involved in health economic-related topics (basically all under my sup...

Read more »

Scala for Machine Learning [book review]

April 9, 2015

Nicolas, Patrick R. (2014) Scala for Machine Learning, Packt Publishing: Birmingham, UK. Full disclosure: I received a free electronic version of this book from the publisher for the purposes of review. There is clearly a market for a good book about using Scala for statistical computing, machine learning and data science. So when the publisher …

Read more »

Classification with Categorical Variables (the fuzzy side)

April 9, 2015

The Gaussian and the (log) Poisson regressions share a very interesting property, i.e. the average predicted value is the empirical mean of our sample.

    > mean(predict(lm(dist~speed,data=cars)))
    [1] 42.98
    > mean(cars$dist)
    [1] 42.98

One can prove that it is also the prediction for the average individual in our sample

    > predict(lm(dist~speed,data=cars),
    +   newdata=data.frame(speed=mean(cars$speed)))
    42.98

The geometric interpretation is that the regression line passes through the centroid,

    > plot(cars)
    > abline(lm(dist~speed,data=cars),col="red")
    > abline(h=mean(cars$dist),col="blue")…
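The property described in this excerpt can also be checked outside R. The following is a minimal sketch in plain Python, assuming a simple least-squares line with an intercept. The helper `fit_simple_ols` and the toy `speed` and `dist` values are hypothetical illustrations, not the R `cars` data set and not the post author's code.

```python
from statistics import mean


def fit_simple_ols(x, y):
    """Return (intercept, slope) of the least-squares line y = a + b * x."""
    x_bar, y_bar = mean(x), mean(y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sxy / sxx
    # With an intercept, the fitted line passes through (x_bar, y_bar).
    intercept = y_bar - slope * x_bar
    return intercept, slope


# Hypothetical toy data (assumption: made up for illustration only).
speed = [4.0, 7.0, 8.0, 9.0, 10.0, 11.0, 12.0, 13.0]
dist = [2.0, 4.0, 16.0, 10.0, 18.0, 17.0, 24.0, 26.0]

intercept, slope = fit_simple_ols(speed, dist)
fitted = [intercept + slope * s for s in speed]

# Average of the fitted values, and the prediction at the average speed.
mean_of_fitted = mean(fitted)
prediction_at_mean_speed = intercept + slope * mean(speed)
```

Both `mean_of_fitted` and `prediction_at_mean_speed` should agree with `mean(dist)` up to floating-point rounding, which is the centroid property the excerpt describes for the Gaussian case.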

Read more »

Heads I win, tails you lose? Meehl and many Popperians get this wrong (about severe tests)!

April 9, 2015

[T]he impressive thing about [the 1919 tests of Einstein’s theory of gravity] is the risk involved in a prediction of this kind. If observation shows that the predicted effect is definitely absent, then the theory is simply refuted. The theory is incompatible with certain possible results of observation—in fact with results which everybody before […]

Read more »

Paperpile makes me more productive

April 9, 2015

One of the first things I tell my new research students is to use a reference management system to help them keep track of the papers they read, and to assist in creating bib files for their bibliography. Most of them use Mendeley, one or two use Zotero. Both do a good job and both are […]

Read more »

New video course: Campaign Response Testing

April 8, 2015

I am proud to announce a new Win-Vector LLC statistics video course: Campaign Response Testing (John Mount, Win-Vector LLC). This course works through the very specific statistics problem of trying to estimate the unknown true response rates of one or more p...

Read more »

How can teachers of (large) online classes use text data from online learners?

April 8, 2015

Dustin Tingley sends along a recent paper (coauthored with Justin Reich, Jetson Leder-Luis, Margaret Roberts, and Brandon Stewart), which begins: Dealing with the vast quantities of text that students generate in a Massive Open Online Course (MOOC) is a daunting challenge. Computational tools are needed to help instructional teams uncover themes and patterns as MOOC […] The post How can teachers of (large) online classes use text data from online…

Read more »

Compute the rank of a matrix in SAS

April 8, 2015

A common question from statistical programmers is how to compute the rank of a matrix in SAS. Recall that the rank of a matrix is defined as the number of linearly independent columns in the matrix. (Equivalently, the number of linearly independent rows.) This article describes how to compute the […]
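The definition quoted in this excerpt can be made concrete with a short sketch. It is written in plain Python rather than SAS, and it is not the method of the linked article: it simply row-reduces the matrix with exact fractions and counts the pivots. The function name `matrix_rank` and the example matrix are hypothetical, and the input is assumed to be a list of equal-length rows of integers.

```python
from fractions import Fraction


def matrix_rank(rows):
    """Rank of a matrix given as a list of equal-length rows of integers."""
    m = [[Fraction(v) for v in row] for row in rows]
    n_rows = len(m)
    n_cols = len(m[0]) if m else 0
    rank = 0
    for col in range(n_cols):
        # Look for a row at or below position `rank` with a nonzero entry here.
        pivot = next((r for r in range(rank, n_rows) if m[r][col] != 0), None)
        if pivot is None:
            continue  # no pivot in this column
        m[rank], m[pivot] = m[pivot], m[rank]
        # Clear this column in every row below the pivot row.
        for r in range(rank + 1, n_rows):
            factor = m[r][col] / m[rank][col]
            if factor != 0:
                m[r] = [a - factor * b for a, b in zip(m[r], m[rank])]
        rank += 1
        if rank == n_rows:
            break
    return rank


# Hypothetical example: the second row is twice the first row.
example = [[1, 2, 3], [2, 4, 6], [1, 0, 1]]
example_rank = matrix_rank(example)

# The transpose has the same rank (row rank equals column rank).
transposed_rank = matrix_rank([list(col) for col in zip(*example)])
```

Exact fractions avoid the tolerance question that floating-point rank computations have to settle, at the cost of speed on large matrices.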

Read more »

an email exchange about integral representations

April 7, 2015

I had an interesting email exchange [or rather exchange of emails] with a (German) reader of Introducing Monte Carlo Methods with R in the past days, as he had difficulties with the validation of the accept-reject algorithm via the integral in that it took me several iterations [as shown in the above] to realise the […]

Read more »

Comparison of Bayesian predictive methods for model selection

April 7, 2015

This post is by Aki. We mention the problem of bias induced by model selection in A survey of Bayesian predictive methods for model assessment, selection and comparison, in Understanding predictive information criteria for Bayesian models, and in BDA3 Chapter 7, but we haven’t had a good answer how to avoid that problem (except by […] The post Comparison of Bayesian predictive methods for model selection appeared first on Statistical…

Read more »

Outside pissing in

April 7, 2015

Coral Davenport writes in the New York Times: Mr. Tribe, 73, has been retained to represent Peabody Energy, the nation’s largest coal company, in its legal quest to block an Environmental Protection Agency regulation that would cut carbon dioxide emissions from the nation’s coal-fired power plants . . . Mr. Tribe likened the climate change […] The post Outside pissing in appeared first on Statistical Modeling, Causal Inference, and Social…

Read more »

Question from a Reader

April 7, 2015

Recently, I received an email from Ozan, who wrote: "I’ve a simple but not explicitly answered question within the text books on stationary series. I’m estimating a model with separate single equations (I don’t take into account the interactions a...

Read more »

The end of the oil glut

April 7, 2015

In my last post, I talked about how America had depressed oil prices by increasing its supply. Recall this graph which shows that the supply glut is primarily caused by increased American supply (the top pink line is America): Since low prices are mainly caused by American oversupply, a decrease in American supply will have […]

Read more »

And . . . our featured 2015 seminar speaker is . . . Thomas HOBBES!!!!!

April 7, 2015

Just in case you’ve forgotten where this all came from: This came in the departmental email a while ago: CALL FOR APPLICATIONS: LATOUR SEMINAR — DUE DATE AUGUST 11 (extended) The Brown Institute for Media Innovation, Alliance (Columbia University, École Polytechnique, Sciences Po, and Panthéon-Sorbonne University), The Center for Science and Society, and The Faculty of […] The post And . . . our featured 2015 seminar speaker is . .…

Read more »

Planned redundancy

April 7, 2015

The following Wall Street Journal graphic caught my eye the other day: (Link to article) Looking closely, I realize that the four charts are identical, except for the call-outs. This is a kind of small-multiples in which the same data reside...

Read more »

Unsolicitors

April 7, 2015

This is probably just me being a bit grumpy, but I guess this happens to many people. I have just received an email (and it's not the first time) from a random scientific journal (this time it's a medical journal) inviting me to publish my research. Exc...

Read more »

