On the misinterpretation of p-values:

October 2, 2013

First, let me start off by saying I'm a classical statistics and p-value apologist--I think it's the cat's pajamas. It was my (and many others') first introduction to statistics. So, in spite of my being a card-carrying member of The Bayesian Conspiracy, there will always be a place in my heart (grinch-sized though it is) […]

Read more »

Bayes alert! Cool postdoc position here on missing data imputation and applications in health disparities research!

October 2, 2013

The Division of Biostatistics and Epidemiology at Weill Medical College of Cornell University invites applications for a post-doctoral fellow position in Biostatistics. We are seeking a highly motivated individual to develop novel statistical methods for missing data imputation and applications in health disparities research using national administrative data, with funding from Agency of Healthcare Research […]

Read more »

Scale-cramming

October 2, 2013

Reader Andrew C. was unhappy about the following stacked bar chart, published by Teach for America, touting its diversity. (link) The lightning symbol that splits apart the Caucasian bar is a harbinger of trouble. For the designer deployed seven different...

Read more »

The first MOOC in statistics

October 2, 2013

Massive open online courses (MOOCs) are all the rage today. Some people see free online courses as a convenient way to introduce statistical concepts to tens of thousands of students who would not otherwise have an opportunity to learn about data analysis. Whereas 2013 is the International Year of Statistics, [...]

Read more »

The Uncertainty of Predictions

October 2, 2013

There are many kinds of intervals in statistics. To name a few of the common ones: confidence intervals, prediction intervals, credible intervals, and tolerance intervals. Each is useful and serves its own purpose. I’ve recently been working on a couple of projects that involve making predictions from a regression model and I’ve been doing some […]

Read more »

Big Data the Big Hassle

October 2, 2013

The hype surrounding "Big Data" has escalated to borderline nauseating. Is it just a sham? Yes, I know, I have earlier gushed about the wonders of Big Data. But that was then, and now is now, and I hear my inner contrarian alarm sounding. One thing ...

Read more »

I’ll say it again

October 1, 2013

Milan Valasek writes: Psychology students (and probably students in other disciplines) are often taught that in order to perform ‘parametric’ tests, e.g. independent t-test, the data for each group need to be normally distributed. However, in the literature (and various university lecture notes and slides accessible online), I have come across at least 4 different interpretation […] The post I’ll say it again appeared first on Statistical Modeling, Causal Inference, and Social…

Read more »

Workshop on BIG DATA

October 1, 2013

From: https://docs.google.com/file/d/0B8kL1t8n_fICMlBkWlZWR3UwS0E/edit?usp=drive_web

Read more »

Marginal likelihood from tempered Bayesian posteriors

October 1, 2013

Introduction In the previous post I showed that it is possible to couple parallel tempered MCMC chains in order to improve mixing. Such methods can be used when the target of interest is a Bayesian posterior distribution that is difficult to sample. There are (at least) a couple of obvious ways that one can temper […]

Read more »

The messy world of Big Data

October 1, 2013

On page 8 of Numbersense (link), I wrote: Web logs are a messy, messy world. If two vendors are deployed to analyze traffic on the same website, it is guaranteed that their statistics would not reconcile, and the gap can be as high as 20 or 30 percent. Insiders will nod their heads; for those who aren’t familiar with Web data, take a look at this recent post on The…

Read more »

Creating a matrix from a long data.frame

October 1, 2013

There can never be too many examples for transforming data with R. So, here is another example of reshaping a data.frame into a matrix. Here I have a data frame that shows incremental claim payments over time for different loss occurrence (origin) years...

Read more »

Using the aggregate of the outcome variable as a group-level predictor in a hierarchical model

September 30, 2013

When I was a kid I took a writing class, and one of the assignments was to write a 1-to-2 page story. I can’t remember what I wrote, but I do remember the following story from one of the other kids. In its entirety: I snuck into this pay toilet and I can’t get out! […]

Read more »

Credibility Toryism: Causal Inference, Research Design, and Evidence

September 30, 2013

Originally posted on The Political Methodologist: In a prior post on my personal blog, I argued that it is misleading to label matching procedures as causal inference procedures (in the Neyman-Rubin sense of the term). My basic argument was that the causal quality of these inferences depends on untested (and in some cases untestable) assumptions…

Read more »

A Bayesian Twist on Tukey’s Flogs

September 30, 2013

In the last post I described flogs, a useful transform on proportions data introduced by John Tukey in his Exploratory Data Analysis. Flogging a proportion (such as two out of three computers being Macs) consisted of two steps: first we “started”...

Read more »

ROC curves and classification

September 30, 2013

To get back to a question asked after the last course (still on non-life insurance), I will spend some time discussing ROC curve construction and interpretation. Consider the dataset we used last week,

> db = read.table("http://freakonometrics.free.fr/db.txt",header=TRUE,sep=";")
> attach(db)

The first step is to get a model. For instance, a logistic regression, where some factors were merged together,

> X3bis=rep(NA,length(X3))
> X3bis[X3%in%c("A","C","D")]="ACD"
> X3bis[X3%in%c("B","E")]="BE"
> db$X3bis=as.factor(X3bis)
> reg=glm(Y~X1+X2+X3bis,family=binomial,data=db)…

Read more »

Statistical Ode to Mariano Rivera

September 30, 2013

Mariano Rivera is an outlier in many ways. The plot below shows one of them: top 10 pitchers ranked by postseason saves.

Read more »

Query from a textbook author – looking for stories to tell to undergrads about significance

September 30, 2013

Someone sent me the following email: I am an environmental journalist writing an Environmental Science 101 textbook and I’m currently working on the section on hypothesis testing and statistical significance. I am searching for a story to make the importance of thinking statistically come alive for the students, ideally one from the environmental sciences. I’m […]

Read more »

An inspired picture of Blackberry’s dying inspiration

September 30, 2013

The New York Times has a splendid example of an infographic this weekend, showing the rise and fall of the Blackberry. Notice the inspired touch of the black circles to trace the outline of Blackberry's market share. They are a...

Read more »

Generate combinations in SAS

September 30, 2013

Last week I described how to generate permutations in SAS. A related concept is the "combination." In probability and statistics, a combination is a subset of k items chosen from a set that contains N items. Order does not matter, so although the ordered triplets (B, A, C) and (C, [...]

Read more »

R Presentation

September 30, 2013

Last week a preview of version 0.98 of RStudio was released, with lots of new features, including some useful debugging tools. Also part of the release was a new option for creating presentations, which looks like it will be very useful. The presen...

Read more »

Estimating Undirected Graphs Under Weak Assumptions

September 30, 2013

Mladen Kolar, Alessandro Rinaldo and I have uploaded a paper to arXiv entitled “Estimating Undirected Graphs Under Weak Assumptions.” As the name implies, the goal is to estimate an undirected graph from random vectors. Here, each is a vector with coordinates, or features. The graph has nodes, one for each feature. We put an […]

Read more »

