Big Data the Big Hassle

October 2, 2013

The hype surrounding "Big Data" has escalated to borderline nauseating. Is it just a sham? Yes, I know, I have earlier gushed about the wonders of Big Data. But that was then, and now is now, and I hear my inner contrarian alarm sounding. One thing ...

Read more »

I’ll say it again

October 1, 2013

Milan Valasek writes: Psychology students (and probably students in other disciplines) are often taught that in order to perform ‘parametric’ tests, e.g. independent t-test, the data for each group need to be normally distributed. However, in literature (and various university lecture notes and slides accessible online), I have come across at least 4 different interpretation […] The post I’ll say it again appeared first on Statistical Modeling, Causal Inference, and Social…

Read more »

Workshop on BIG DATA

October 1, 2013

From: https://docs.google.com/file/d/0B8kL1t8n_fICMlBkWlZWR3UwS0E/edit?usp=drive_web

Read more »

Marginal likelihood from tempered Bayesian posteriors

October 1, 2013

Introduction In the previous post I showed that it is possible to couple parallel tempered MCMC chains in order to improve mixing. Such methods can be used when the target of interest is a Bayesian posterior distribution that is difficult to sample. There are (at least) a couple of obvious ways that one can temper […]

Read more »

The messy world of Big Data

October 1, 2013

On page 8 of Numbersense (link), I wrote: Web logs are a messy, messy world. If two vendors are deployed to analyze traffic on the same website, it is guaranteed that their statistics would not reconcile, and the gap can be as high as 20 or 30 percent. Insiders will nod their heads; for those who aren’t familiar with Web data, take a look at this recent post on The…

Read more »

Creating a matrix from a long data.frame

October 1, 2013

There can never be too many examples for transforming data with R. So, here is another example of reshaping a data.frame into a matrix. Here I have a data frame that shows incremental claim payments over time for different loss occurrence (origin) years...
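As a rough illustration of the reshaping described in this teaser, here is a minimal sketch in base R. The data frame `claims` and its columns `origin`, `dev` and `paid` are hypothetical names and made-up numbers, not the post's data, and the approach shown is only one of several possible ones, not necessarily the one the post uses.

```r
# Hypothetical long-format data: one row per origin year and development period
claims <- data.frame(
  origin = c(2010, 2010, 2010, 2011, 2011, 2012),
  dev    = c(1, 2, 3, 1, 2, 1),
  paid   = c(100, 50, 25, 110, 60, 120)
)

# xtabs() sums `paid` within each origin/dev cell; unobserved cells become 0
tri_xtabs <- xtabs(paid ~ origin + dev, data = claims)

# tapply() leaves unobserved cells as NA, which keeps "not yet observed"
# distinct from "a payment of zero"
tri <- with(claims, tapply(paid, list(origin = origin, dev = dev), sum))

print(tri)
```

Whether missing cells should be 0 or NA depends on what the matrix is used for afterwards, so both variants are shown.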

Read more »

Using the aggregate of the outcome variable as a group-level predictor in a hierarchical model

September 30, 2013

When I was a kid I took a writing class, and one of the assignments was to write a 1-to-2 page story. I can’t remember what I wrote, but I do remember the following story from one of the other kids. In its entirety: I snuck into this pay toilet and I can’t get out! […] The post Using the aggregate of the outcome variable as a group-level predictor in a…

Read more »

Credibility Toryism: Causal Inference, Research Design, and Evidence

September 30, 2013

Originally posted on The Political Methodologist: In a prior post on my personal blog, I argued that it is misleading to label matching procedures as causal inference procedures (in the Neyman-Rubin sense of the term). My basic argument was that the causal quality of these inferences depends on untested (and in some cases untestable) assumptions…

Read more »

A Bayesian Twist on Tukey’s Flogs

September 30, 2013

In the last post I described flogs, a useful transform on proportions data introduced by John Tukey in his Exploratory Data Analysis. Flogging a proportion (such as, two out of three computers were Macs) consisted of two steps: first we “started”...

Read more »

ROC curves and classification

September 30, 2013

To get back to a question asked after the last course (still on non-life insurance), I will spend some time to discuss ROC curve construction, and interpretation. Consider the dataset we’ve been using last week,

> db = read.table("http://freakonometrics.free.fr/db.txt",header=TRUE,sep=";")
> attach(db)

The first step is to get a model. For instance, a logistic regression, where some factors were merged together,

> X3bis=rep(NA,length(X3))
> X3bis[X3%in%c("A","C","D")]="ACD"
> X3bis[X3%in%c("B","E")]="BE"
> db$X3bis=as.factor(X3bis)
> reg=glm(Y~X1+X2+X3bis,family=binomial,data=db)…
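The teaser stops before the ROC curve itself is built. As a hedged sketch of the general construction (not the post's own code), the example below uses simulated data instead of the post's `db` so that it runs offline. The names `x1`, `y`, `fit`, `score` and `roc_point` are assumptions made for illustration.

```r
set.seed(1)

# Simulated stand-in for the post's data: one covariate, binary outcome
n  <- 200
x1 <- rnorm(n)
y  <- rbinom(n, size = 1, prob = plogis(-0.5 + 1.5 * x1))

# Logistic regression and in-sample predicted probabilities
fit   <- glm(y ~ x1, family = binomial)
score <- predict(fit, type = "response")

# One ROC point: classify as positive when the score exceeds the threshold
roc_point <- function(threshold, score, y) {
  pred <- score > threshold
  c(fpr = sum(pred & y == 0) / sum(y == 0),
    tpr = sum(pred & y == 1) / sum(y == 1))
}

# Sweep the threshold from 0 to 1 to trace out the curve
thresholds <- seq(0, 1, by = 0.01)
roc <- t(sapply(thresholds, roc_point, score = score, y = y))

# Area under the curve by the trapezoid rule (points ordered by increasing fpr)
fpr <- rev(roc[, "fpr"])
tpr <- rev(roc[, "tpr"])
auc <- sum(diff(fpr) * (head(tpr, -1) + tail(tpr, -1)) / 2)
```

Calling `plot(roc[, "fpr"], roc[, "tpr"], type = "s")` would draw the curve. Because the scores are evaluated on the same data used to fit the model, the resulting AUC is an optimistic in-sample figure.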

Read more »

Statistical Ode to Mariano Rivera

September 30, 2013

Mariano Rivera is an outlier in many ways. The plot below shows one of them: top 10 pitchers ranked by postseason saves.

Read more »

Query from a textbook author – looking for stories to tell to undergrads about significance

September 30, 2013

Someone sent me the following email: I am an environmental journalist writing an Environmental Science 101 textbook and I’m currently working on the section on hypothesis testing and statistical significance. I am searching for a story to make the importance of thinking statistically come alive for the students, ideally one from the environmental sciences. I’m […] The post Query from a textbook author – looking for stories to tell to undergrads…

Read more »

An inspired picture of Blackberry’s dying inspiration

September 30, 2013

The New York Times has a splendid example of an infographic this weekend, showing the rise and fall of the Blackberry. Notice the inspired touch of the black circles to trace the outline of Blackberry's market share. They are a...

Read more »

Generate combinations in SAS

September 30, 2013

Last week I described how to generate permutations in SAS. A related concept is the "combination." In probability and statistics, a combination is a subset of k items chosen from a set that contains N items. Order does not matter, so although the ordered triplets (B, A, C) and (C, [...]
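The post itself works in SAS. As a language swap for illustration only, the definition above can be sketched in R with the base functions `combn()` and `choose()`. The item set `items` and the choice `k = 3` are hypothetical values, not taken from the post.

```r
# Hypothetical set of N = 4 items, choosing k = 3 at a time
items <- c("A", "B", "C", "D")
k <- 3

# combn() returns one combination per column; transpose for one per row
combos <- t(combn(items, k))

# choose(N, k) gives the number of combinations without enumerating them
n_combos <- choose(length(items), k)

# Order does not matter: two orderings of the same items are the same
# combination once each is sorted
same <- identical(sort(c("B", "A", "C")), sort(c("C", "A", "B")))

print(combos)
```

Enumerating with `combn()` is only practical for small N and k, since the number of rows grows as `choose(N, k)`.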

Read more »

R Presentation

September 30, 2013

Last week a preview of version 0.98 of RStudio was released, with lots of new features, including some useful debugging tools. Also part of the release was a new option for creating presentations, which looks like it will be very useful. The presen...

Read more »

Estimating Undirected Graphs Under Weak Assumptions

September 30, 2013

Mladen Kolar, Alessandro Rinaldo and I have uploaded a paper to arXiv entitled “Estimating Undirected Graphs Under Weak Assumptions.” As the name implies, the goal is to estimate an undirected graph G from random vectors Y_1, …, Y_n. Here, each Y_i is a vector with D coordinates, or features. The graph G has D nodes, one for each feature. We put an […]

Read more »

Testing R Packages

September 30, 2013

This guy th3james claimed Testing Code Is Simple, and I agree. In the R world, this is not anything new. As far as I can see, there are three schools of R users with different testing techniques: tests are put under package/tests/, and a foo-test.Rou...

Read more »

Highly probable vs highly probed: Bayesian/ error statistical differences

September 29, 2013

A reader asks: “Can you tell me about disagreements on numbers between a severity assessment within error statistics, and a Bayesian assessment of posterior probabilities?” Sure. There are differences between Bayesian posterior probabilities and formal error statistical measures, as well as between the latter and a severity (SEV) assessment, which differs from the standard type […]

Read more »

Those who can, teach statistics

September 29, 2013

The phrase I despise more than any in popular use (and believe me there are many contenders) is “Those who can, do, and those who can’t, teach.” I like many of the sayings of George Bernard Shaw, but this one … Continue reading →

Read more »

Sunday data/statistics link roundup (9/29/13)

September 29, 2013

The links are back! Read on. Susan Murphy - a statistician - wins a MacArthur Award. Great for the field of statistics (via Dan S. and Simina B., among others). Related: an interview with David Donoho about the Shaw Prize. Statisticians are … Continue reading →

Read more »

The difficulties of measuring just about anything

September 29, 2013

Mark Duckenfield writes: Some comments on statistics and “bad math”, that I think display a clear misunderstanding of statistics and surveys. http://www.armytimes.com/article/20130714/NEWS/307140016/Marine-officer-Scope-sex-assault-problem-exaggerated and the editorial to which it refers http://online.wsj.com/article/SB10001424127887323582904578484941173658754.html The original report is quite clear about weighting things, sample sizes, etc. The apparent “clincher” argument in the editorial—that over 50% of unwanted sexual advances […] The post The difficulties of measuring just about anything appeared first on Statistical Modeling, Causal…

Read more »

