## I’ll say it again

October 1, 2013

Milan Valasek writes: Psychology students (and probably students in other disciplines) are often taught that in order to perform ‘parametric’ tests, e.g. independent t-test, the data for each group need to be normally distributed. However, in literature (and various university lecture notes and slides accessible online), I have come across at least 4 different interpretation […] The post I’ll say it again appeared first on Statistical Modeling, Causal Inference, and Social…

## Marginal likelihood from tempered Bayesian posteriors

October 1, 2013

Introduction: In the previous post I showed that it is possible to couple parallel tempered MCMC chains in order to improve mixing. Such methods can be used when the target of interest is a Bayesian posterior distribution that is difficult to sample. There are (at least) a couple of obvious ways that one can temper […]

## The messy world of Big Data

October 1, 2013

On page 8 of Numbersense, I wrote: Web logs are a messy, messy world. If two vendors are deployed to analyze traffic on the same website, it is guaranteed that their statistics would not reconcile, and the gap can be as high as 20 or 30 percent. Insiders will nod their heads; for those who aren’t familiar with Web data, take a look at this recent post on The…

## Creating a matrix from a long data.frame

October 1, 2013

There can never be too many examples for transforming data with R. So, here is another example of reshaping a data.frame into a matrix. Here I have a data frame that shows incremental claim payments over time for different loss occurrence (origin) years...
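The post itself works in R; as a language-neutral sketch of the same reshaping idea, the Python below pivots long records into an origin-year by development-period matrix. The records and the names `claims_long` and `long_to_matrix` are hypothetical, invented for illustration, and are not the post's data or code.

```python
def long_to_matrix(rows):
    """Pivot (origin, dev, value) records into a row-per-origin matrix.

    Cells with no record in the long data are filled with None.
    """
    origins = sorted({origin for origin, _, _ in rows})
    devs = sorted({dev for _, dev, _ in rows})
    cell = {(origin, dev): value for origin, dev, value in rows}
    matrix = [[cell.get((o, d)) for d in devs] for o in origins]
    return origins, devs, matrix


# Hypothetical incremental claim payments in long form:
# (origin year, development period, incremental payment)
claims_long = [
    (2010, 1, 100.0), (2010, 2, 50.0), (2010, 3, 20.0),
    (2011, 1, 110.0), (2011, 2, 60.0),
    (2012, 1, 120.0),
]

origins, devs, triangle = long_to_matrix(claims_long)
for origin, row in zip(origins, triangle):
    print(origin, row)
```

Later origin years have fewer observed development periods, so the result is the familiar triangle shape with missing cells in the lower right.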

## Using the aggregate of the outcome variable as a group-level predictor in a hierarchical model

September 30, 2013

When I was a kid I took a writing class, and one of the assignments was to write a 1-to-2 page story. I can’t remember what I wrote, but I do remember the following story from one of the other kids. In its entirety: I snuck into this pay toilet and I can’t get out! […] The post Using the aggregate of the outcome variable as a group-level predictor in a…

## Credibility Toryism: Causal Inference, Research Design, and Evidence

September 30, 2013

Originally posted on The Political Methodologist: In a prior post on my personal blog, I argued that it is misleading to label matching procedures as causal inference procedures (in the Neyman-Rubin sense of the term). My basic argument was that the causal quality of these inferences depends on untested (and in some cases untestable) assumptions…

## A Bayesian Twist on Tukey’s Flogs

September 30, 2013

In the last post I described flogs, a useful transform on proportions data introduced by John Tukey in his Exploratory Data Analysis. Flogging a proportion (such as, two out of three computers were Macs) consisted of two steps: first we “started”...
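The excerpt is cut off before the two steps are spelled out, so the following is only a sketch under an assumption: that the post follows the usual definition from Exploratory Data Analysis, where each count is "started" by adding 1/6 and the flog is half the natural log of the ratio of started counts. The function name `flog` and the constant `START` are illustrative choices, not taken from the post.

```python
import math

# Assumed "start" added to each count (the usual EDA convention; the
# truncated excerpt does not confirm the value).
START = 1.0 / 6.0


def flog(successes, failures, start=START):
    """Folded log of a proportion, computed from started counts."""
    return 0.5 * math.log((successes + start) / (failures + start))


# The excerpt's example: two out of three computers were Macs.
macs_flog = flog(2, 1)
print(macs_flog)

# Starting the counts keeps the transform finite even for 0 out of 3.
print(flog(0, 3))
```

Under this definition a 50/50 split maps to zero and swapping successes with failures flips the sign, which is what makes the scale symmetric around an even proportion.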

## ROC curves and classification

September 30, 2013

To get back to a question asked after the last course (still on non-life insurance), I will spend some time discussing ROC curve construction and interpretation. Consider the dataset we used last week,

    db = read.table("http://freakonometrics.free.fr/db.txt", header=TRUE, sep=";")
    attach(db)

The first step is to get a model. For instance, a logistic regression, where some factors were merged together,

    X3bis = rep(NA, length(X3))
    X3bis[X3 %in% c("A","C","D")] = "ACD"
    X3bis[X3 %in% c("B","E")] = "BE"
    db$X3bis = as.factor(X3bis)
    reg = glm(Y~X1+X2+X3bis, family=binomial, data=db)

…
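The excerpt stops before the curve itself is built. As a generic sketch of ROC construction (not the post's R code), the Python below sweeps a threshold over predicted scores and records the false and true positive rates at each step. The scores and labels are made-up toy values, and `roc_points` and `auc` are hypothetical helper names.

```python
def roc_points(scores, labels):
    """Return (FPR, TPR) pairs as the threshold moves from high to low.

    labels are 0/1; tied scores are processed together as one step.
    """
    pos = sum(labels)
    neg = len(labels) - pos
    if pos == 0 or neg == 0:
        raise ValueError("need at least one positive and one negative")
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    points = [(0.0, 0.0)]
    tp = fp = 0
    i = 0
    while i < len(ranked):
        threshold = ranked[i][0]
        # Classify every observation with this score as positive.
        while i < len(ranked) and ranked[i][0] == threshold:
            if ranked[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / neg, tp / pos))
    return points


def auc(points):
    """Area under the ROC curve by the trapezoidal rule."""
    return sum((x1 - x0) * (y0 + y1) / 2.0
               for (x0, y0), (x1, y1) in zip(points, points[1:]))


# Toy fitted probabilities and observed outcomes (illustrative only).
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3]
labels = [1, 1, 0, 1, 0, 0]

curve = roc_points(scores, labels)
area = auc(curve)
print(curve)
print(area)
```

The area equals the share of (positive, negative) pairs in which the positive case receives the higher score, which gives a quick way to sanity-check the result by hand on small data.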

## Statistical Ode to Mariano Rivera

September 30, 2013

Mariano Rivera is an outlier in many ways. The plot below shows one of them: top 10 pitchers ranked by postseason saves.

## Query from a textbook author – looking for stories to tell to undergrads about significance

September 30, 2013

Someone sent me the following email: I am an environmental journalist writing an Environmental Science 101 textbook and I’m currently working on the section on hypothesis testing and statistical significance. I am searching for a story to make the importance of thinking statistically come alive for the students, ideally one from the environmental sciences. I’m […] The post Query from a textbook author – looking for stories to tell to undergrads…

## An inspired picture of Blackberry’s dying inspiration

September 30, 2013

The New York Times has a splendid example of an infographic this weekend, showing the rise and fall of the Blackberry. Notice the inspired touch of the black circles to trace the outline of Blackberry's market share. They are a...

## Generate combinations in SAS

September 30, 2013

Last week I described how to generate permutations in SAS. A related concept is the "combination." In probability and statistics, a combination is a subset of k items chosen from a set that contains N items. Order does not matter, so although the ordered triplets (B, A, C) and (C, [...]
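The post generates combinations in SAS; purely to illustrate the definition, here is a short sketch using Python's standard library. The item list and variable names are arbitrary choices for the example.

```python
from itertools import combinations
from math import comb

items = ["A", "B", "C", "D"]  # a set with N = 4 items
k = 3                         # size of each subset

# Each combination appears once, in the order the items were listed,
# so (B, A, C) is never produced separately from (A, B, C).
subsets = list(combinations(items, k))
for subset in subsets:
    print(subset)

# The number of combinations is the binomial coefficient "N choose k".
print(len(subsets), comb(len(items), k))
```

Because order does not matter, the count is much smaller than the number of ordered triplets drawn from the same four items.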

## R Presentation

September 30, 2013

Last week a preview of version 0.98 of RStudio was released, with lots of new features, including some useful debugging tools. Also part of the release was a new option for creating presentations, which looks like it will be very useful. The presen...

## Estimating Undirected Graphs Under Weak Assumptions

September 30, 2013

Mladen Kolar, Alessandro Rinaldo and I have uploaded a paper to arXiv entitled “Estimating Undirected Graphs Under Weak Assumptions.” As the name implies, the goal is to estimate an undirected graph from random vectors . Here, each is a vector with coordinates, or features. The graph has nodes, one for each feature. We put an […]

## Testing R Packages

September 30, 2013

This guy th3james claimed Testing Code Is Simple, and I agree. In the R world, this is not anything new. As far as I can see, there are three schools of R users with different testing techniques: tests are put under package/tests/, and a foo-test.Rou...

## Highly probable vs highly probed: Bayesian/ error statistical differences

September 29, 2013

A reader asks: “Can you tell me about disagreements on numbers between a severity assessment within error statistics, and a Bayesian assessment of posterior probabilities?” Sure. There are differences between Bayesian posterior probabilities and formal error statistical measures, as well as between the latter and a severity (SEV) assessment, which differs from the standard type […]

## Those who can, teach statistics

September 29, 2013

The phrase I despise more than any in popular use (and believe me there are many contenders) is “Those who can, do, and those who can’t, teach.” I like many of the sayings of George Bernard Shaw, but this one …

## Sunday data/statistics link roundup (9/29/13)

September 29, 2013

The links are back! Read on. Susan Murphy - a statistician - wins a MacArthur Award. Great for the field of statistics (via Dan S. and Simina B., among others). Related: an interview with David Donoho about the Shaw Prize. Statisticians are …

## The difficulties of measuring just about anything

September 29, 2013

Mark Duckenfield writes: Some comments on statistics and “bad math”, that I think display a clear misunderstanding of statistics and surveys. http://www.armytimes.com/article/20130714/NEWS/307140016/Marine-officer-Scope-sex-assault-problem-exaggerated and the editorial to which it refers http://online.wsj.com/article/SB10001424127887323582904578484941173658754.html The original report is quite clear about weighting things, sample sizes, etc. The apparent “clincher” argument in the editorial, that over 50% of unwanted sexual advances […] The post The difficulties of measuring just about anything appeared first on Statistical Modeling, Causal…

## Mixed Models: Influence in Heterogeneous Variance Model

September 29, 2013

In this post I extend my knowledge of mixed models by redoing section 59.7 (page 5048) of the SAS/STAT user guide. I don't think this particular example can be run in lme4, so that leaves nlme and MCMCglmm. MCMCglmm has less ability for influence measu...