## Interview with Nick Chamandy, statistician at Google

February 15, 2013
Nick Chamandy received his M.S. in statistics from the University of Chicago and his Ph.D. in statistics from McGill University, and joined Google as a statistician. We talked to him about how he ended up at Google, what software …

## Wacky priors can work well?

February 15, 2013
Dave Judkins writes: I would love to see a blog entry on this article, Bayesian Model Selection in High-Dimensional Settings, by Valen Johnson and David Rossell. The simulation results are very encouraging although the choice of colors for some of the graphics is unfortunate. Unless I am colorblind in some way that I am unaware [...]

## New Data Scientist role at Lloyd’s

February 15, 2013
Lloyd's of London is looking for a Data Scientist as part of the Analysis team. See Lloyd's career web site for more details.

## Sloppy journalism with interactive graphics is still sloppy journalism

February 15, 2013
The Guardian recently discussed the "declining linguistic standards" in State of the Union addresses. I thought this was an interesting exercise, but something seemed wrong about the article, and it turns out this is one case where the data do no...

## FillIn: a function for filling in missing data in one data frame with info from another

February 15, 2013
Update (10 March 2013): FillIn is now part of the budding DataCombine package. Sometimes I want to use R to fill in values that are missing in one data frame with values from another. For example, I have data from the World Bank on government deficits...
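The basic idea can be sketched in base R. This is a minimal sketch, not the FillIn or DataCombine code: the helper `fill_in_sketch()`, its arguments and the toy data frames are all hypothetical. Rows are matched on one or more key columns, and only values that are `NA` in the first data frame are replaced.

```r
# Hypothetical helper, not the DataCombine::FillIn API.
# Replaces NA values of column `var1` in `d1` with values of column `var2`
# in `d2`, matching rows on the key columns named in `key`.
fill_in_sketch <- function(d1, d2, var1, var2, key) {
  # Combine the key columns into one string per row
  k1 <- do.call(paste, c(d1[key], sep = "\t"))
  k2 <- do.call(paste, c(d2[key], sep = "\t"))
  # Row of d2 matching each row of d1 (NA when there is no match)
  idx <- match(k1, k2)
  # Fill only the cells that are missing in d1
  missing <- is.na(d1[[var1]])
  d1[[var1]][missing] <- d2[[var2]][idx[missing]]
  d1
}

# Toy data: deficits missing in the first frame, available in the second
deficits <- data.frame(country = c("A", "A", "B"),
                       year    = c(2010, 2011, 2010),
                       deficit = c(-2.1, NA, NA))
other <- data.frame(country     = c("A", "B"),
                    year        = c(2011, 2010),
                    deficit_alt = c(-3.0, 1.5))

filled <- fill_in_sketch(deficits, other, "deficit", "deficit_alt",
                         key = c("country", "year"))
filled$deficit
# [1] -2.1 -3.0  1.5
```

Matching on a pasted key keeps the sketch dependency-free; a `merge()` followed by `ifelse(is.na(...), ...)` would work just as well.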

## Review: Scott Christianson, 100 Diagrams That Changed the World

February 15, 2013
I recently came across this book that claims to collect the 100 most important diagrams in the history of mankind. It’s a good collection, with many wonderful examples, though it has its flaws. To get the main issue out of the way: the title is misleading. The selection in the book is not based on the quality of the diagrams, but rather on the invention or cultural shift they are…

## Forecasting conferences

February 15, 2013
This year there are no fewer than three forecasting conferences planned for June and July 2013. As well as the annual International Symposium on Forecasting, there is WIPFOR (Workshop on Industry & Practices for FORecasting) to be held in Clamart (near Paris) in June, and a forecasting stream at the EURO2013 conference in Rome in early July. Some details follow, taken from emails sent to me recently. WIPFOR (Clamart, France,…

## January Seasonality Shiny web application

February 15, 2013
Today, I want to share the January Seasonality application (code at GitHub). This example is based on the An Example of Seasonality Analysis post. This is the third application in the series of examples (I plan to share 5 examples) that will demonstrate the amazing Shiny framework and Systematic Investor Toolbox to analyze stocks, make […]

## Statistics for firefighters: update

February 14, 2013
Following up on our earlier discussion, Daniel Rubenson from Ryerson University in Toronto writes: The course went really well (it was a couple of years ago now). The course was run through a partnership my department has with the Ontario Fire College. Basically, firefighters can do a certificate and sometimes a degree in public administration [...]

February 14, 2013
I link to and briefly discuss the paper "Rookie Mistakes," recently published in PS.

## R database interfaces

February 14, 2013
Several packages on CRAN provide (or relate to) interfaces between databases and R. Here is a summary, mostly in the words of the package descriptions. Remember that package names are case-sensitive. The packages that talk about being DBI-compliant are referring to the DBI package (see below in “Other SQL”). MySQL dbConnect: Provides a graphical user [...] The post R database interfaces appeared first on Burns Statistics.

## Statistics as a Counter to Heavyweights…who wrote this?

February 14, 2013
When any scientific conclusion is supposed to be [shown or disproved] on experimental evidence [or data], critics who still refuse to accept the conclusion are accustomed to take one of two lines of attack. They may claim that the interpretation of the [data] is faulty, that the results reported are not in fact those which [...]

## Multiple Stocks Plot Shiny web application

February 14, 2013
Today, I want to share the Multiple Stocks Plot application (code at GitHub). This is the second application in the series of examples (I plan to share 5 examples) that will demonstrate the amazing Shiny framework and Systematic Investor Toolbox to analyze stocks, make back-tests, and create summary reports. The motivation for this series of […]

## Hyndsight

February 14, 2013
Originally, I wrote this blog for my own PhD students and I covered issues to do with research. I called it “Research tips” because that is what it was meant to be. However, over time I’ve started covering other things of interest to me, and the ...

## Out-of-sample one-step forecasts

February 13, 2013
It is common to fit a model using training data, and then to evaluate its performance on a test data set. When the data are time series, it is useful to compute one-step forecasts on the test data. For some reason, this is much more commonly done by people trained in machine learning than by those trained in statistics. If you are using the forecast package in R, it is easily done with…
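As a base-R illustration (an assumption: this does not use the forecast package), the sketch below fits an AR(1) model to simulated training data and then produces one-step forecasts over the test period. The parameters stay fixed at their training estimates, but each forecast conditions on the actual previous observation, which is what distinguishes one-step forecasts from a single multi-step forecast made at the end of the training data.

```r
set.seed(1)
# Simulated AR(1) series around a mean of 10 (stands in for real data)
y <- as.numeric(arima.sim(model = list(ar = 0.7), n = 120)) + 10
y_train <- y[1:100]
y_test  <- y[101:120]

# Fit on the training data only; arima() labels the mean term "intercept"
fit <- arima(y_train, order = c(1, 0, 0))
phi <- unname(coef(fit)["ar1"])
mu  <- unname(coef(fit)["intercept"])

# One-step forecasts over the test period: parameters fixed at the training
# estimates, each forecast using the actual previous value of the series
y_prev  <- c(y_train[length(y_train)], y_test[-length(y_test)])
onestep <- mu + phi * (y_prev - mu)

# One-step forecast errors and test-set RMSE
errors <- y_test - onestep
rmse   <- sqrt(mean(errors^2))
rmse
```

The closed form `mu + phi * (y_prev - mu)` is specific to AR(1); for richer models the same idea applies, but the one-step predictions are easier to obtain from the fitting software than to write out by hand.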

## How Big Data Can Ruin True Statistics – Storagecraft

February 13, 2013
From StorageCraft (February 13, 2013, Casey Morgan): Big Data presents a lot of opportunities for information discovery. The world has begun creating billions of bytes of data, which can be analyzed and ut...

## Offended by conditional probability

February 13, 2013
It’s a simple rule of probability that if A makes B more likely, B makes A more likely. That is, if the conditional probability of A given B is larger than the probability of A alone, the conditional probability…
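The rule follows in one line from Bayes’ theorem, assuming both $P(A) > 0$ and $P(B) > 0$ so that the conditional probabilities are defined:

$$
P(B \mid A) \;=\; \frac{P(A \mid B)\,P(B)}{P(A)} \;>\; \frac{P(A)\,P(B)}{P(A)} \;=\; P(B)
\qquad \text{whenever } P(A \mid B) > P(A).
$$

Both inequalities are equivalent to $P(A \cap B) > P(A)\,P(B)$, which is symmetric in $A$ and $B$.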

## Visualizing overdispersion (with trees)

February 13, 2013
This week, we started to discuss overdispersion when modeling claims frequency. In my previous post, I discussed computations of empirical variances with different exposures. But I used only one factor to compute classes. Of course, it is possible to use many more factors. For instance, using Cartesian products of factors,

```r
> X=as.factor(paste(sinistres$carburant,sinistres$zone,
+ cut(sinistres$ageconducteur,breaks=c(17,24,40,65,101))))
> E=sinistres$exposition
> Y=sinistres$nbre
> vm=vv=ve=rep(NA,length(levels(X)))
> for(i in 1:length(levels(X))){
+ ve[i]=Ei=E[X==levels(X)[i]]
+ Yi=Y[X==levels(X)[i]]
+…
```
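The per-class computation can be sketched on simulated data. Everything here is hypothetical: the class labels, claim rates and exposures stand in for the `sinistres` data, and the exposure-weighted variance below is one reasonable estimator rather than necessarily the one used in the post. Because the counts are simulated from a Poisson distribution, the variance-to-mean ratios should sit near 1; ratios well above 1 on real data would point to overdispersion.

```r
set.seed(42)
n <- 20000
# Simulated portfolio: classes, exposures (in years) and Poisson claim counts.
# All values are hypothetical stand-ins for the `sinistres` data.
classe  <- factor(sample(c("D.A", "D.B", "E.A", "E.B"), n, replace = TRUE))
expo    <- runif(n, 0.1, 1)
rate    <- c(D.A = 0.05, D.B = 0.08, E.A = 0.06, E.B = 0.10)[as.character(classe)]
nclaims <- rpois(n, rate * expo)

# For each class: exposure-weighted mean frequency and empirical variance
by_class <- t(sapply(levels(classe), function(l) {
  Ei <- expo[classe == l]
  Yi <- nclaims[classe == l]
  m  <- sum(Yi) / sum(Ei)               # claims per unit of exposure
  v  <- sum((Yi - m * Ei)^2) / sum(Ei)  # variance per unit of exposure
  c(mean = m, variance = v)
}))

# A ratio well above 1 would suggest overdispersion; here it should be near 1
cbind(by_class, ratio = by_class[, "variance"] / by_class[, "mean"])
```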

## I’m a young scientist and sequestration will hurt me

February 13, 2013
I’m a biostatistician. That means that I help scientists and doctors analyze their medical data to try to figure out new screening tools, new therapies, and new ways to improve patients’ health. I’m also a professor. I spend a good …

## Large claims, and ratemaking

February 13, 2013
During the course, we have seen that it is natural to assume that not only the individual claims frequency but also the individual costs can be explained by some covariates. Of course, appropriate families should be considered to model the distribution of the cost $Y$, given some covariates. Here is the dataset we’ll use,

```r
> sinistre=read.table("http://freakonometrics.free.fr/sinistreACT2040.txt",
+ header=TRUE,sep=";")
> sinistres=sinistre[sinistre$garantie=="1RC",]
> sinistres=sinistres[sinistres$cout>0,]
> contrat=read.table("http://freakonometrics.free.fr/contractACT2040.txt",
+ header=TRUE,sep=";")
> couts=merge(sinistres,contrat)
> tail(couts)
  nocontrat…
```
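One common choice of family for positive claim costs is the Gamma distribution with a log link. The sketch below is an assumption, not necessarily the model used in the course: it fits such a GLM to simulated costs, with a hypothetical driver-age covariate `agecond` standing in for the covariates in `couts`.

```r
set.seed(123)
n <- 2000
# Simulated covariate and costs (hypothetical stand-ins for the `couts` data)
agecond <- runif(n, 18, 80)                          # driver age
mu_true <- exp(7 + 0.01 * agecond)                   # true mean cost, log link
shape   <- 2
cost    <- rgamma(n, shape = shape, rate = shape / mu_true)  # mean = mu_true

# Gamma GLM with log link: E[cost | age] = exp(b0 + b1 * age)
fit <- glm(cost ~ agecond, family = Gamma(link = "log"))
coef(fit)

# Predicted mean cost for a 40-year-old driver
pred40 <- predict(fit, newdata = data.frame(agecond = 40), type = "response")
pred40
```

The log link keeps predicted costs positive and makes covariate effects multiplicative; the Gamma family assumes the variance grows with the square of the mean, which is often reasonable for ordinary claims but can understate the tail when large claims are present.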

## A must-read paper on statistical analysis of experimental data

February 13, 2013
Russ Lyons points to an excellent article on statistical experimentation by Ron Kohavi, Alex Deng, Brian Frasca, Roger Longbotham, Toby Walker, Ya Xu, a group of software engineers (I presume) at Microsoft. Kohavi et al. write: Online controlled experiments are often utilized to make data-driven decisions at Amazon, Microsoft . . . deployment and mining [...]

## Light entertainment: which number is larger

February 13, 2013
A reader from Down Under sent this via Twitter: Seems like the editor fell asleep.