## The Cui-bono Approach to Open Data

July 3, 2015

What’s the problem? Which data are needed to solve it? Who benefits from it? These few questions are a valuable key to implementing an open data culture. Open data not as ‘l’art pour l’art’ but in a pragmatic approach, demonstrating that the ‘proof of the pudding is in the eating’. It seems to work …

## The Massive Future of Statistics Education

July 3, 2015

NOTE: This post was written as a chapter for the not-yet-released Handbook on Statistics Education.  Data are eating the world, but our collective ability to analyze data is going on a starvation diet. Everywhere you turn, data are being generated somehow. By the time you read this piece, you’ll probably have collected some data. (For

## “Why should anyone believe that? Why does it make sense to model a series of astronomical events as though they were spins of a roulette wheel in Vegas?”

July 3, 2015

Deborah Mayo points us to a post by Stephen Senn discussing various aspects of induction and statistics, including the famous example of estimating the probability the sun will rise tomorrow. Senn correctly slams a journalistic account of the math problem: The canonical example is to imagine that a precocious newborn observes his first sunset, and […] The post “Why should anyone believe that? Why does it make sense to model…

## Visualizing survey results excellently

July 3, 2015

Surveys generate a lot of data. And, if you have used a survey vendor, you know they generate a ton of charts. I was in Germany to attend the Data Meets Viz workshop organized by Antony Unwin. Paul and Sascha...

## Larry Laudan: “When the ‘Not-Guilty’ Falsely Pass for Innocent”, the Frequency of False Acquittals (guest post)

July 3, 2015

Professor Larry Laudan, Lecturer in Law and Philosophy, University of Texas at Austin. “When the ‘Not-Guilty’ Falsely Pass for Innocent” by Larry Laudan. While it is a belief deeply ingrained in the legal community (and among the public) that false negatives are much more common than false positives (a 10:1 ratio being the preferred guess), […]

## An Update on Boosting with Splines

July 2, 2015

In my previous post, An Attempt to Understand Boosting Algorithm(s), I was puzzled by the boosting convergence when I was using some spline functions (more specifically, piecewise-linear, continuous regression functions). I was using `library(splines)` and `fit=lm(y~bs(x,degree=1,df=3),data=df)`. The problem with that spline function is that the knots seem to be fixed. The iterative boosting algorithm is: start with some regression model, compute the residuals, including some shrinkage parameter, then…
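The excerpt above is cut off, so the following is only a minimal sketch of the generic residual-boosting loop with shrinkage, not the post's own R code. It is written in plain Python, and everything named here is an assumption made for illustration: the weak learner `fit_hinge` (a single hinge term whose knot is re-chosen at every iteration), the helper `boost`, the shrinkage value `nu=0.1`, and the toy data `y = |x - 0.5|`.

```python
def fit_hinge(x, r, knots):
    """Least-squares fit of r ~ a + b * max(x - k, 0), trying every candidate knot k."""
    n = len(x)
    r_bar = sum(r) / n
    best = None
    for k in knots:
        z = [max(xi - k, 0.0) for xi in x]
        z_bar = sum(z) / n
        s_zz = sum((zi - z_bar) ** 2 for zi in z)
        if s_zz == 0.0:  # hinge is constant on the data, nothing to fit
            continue
        b = sum((zi - z_bar) * (ri - r_bar) for zi, ri in zip(z, r)) / s_zz
        a = r_bar - b * z_bar
        sse = sum((ri - a - b * zi) ** 2 for zi, ri in zip(z, r))
        if best is None or sse < best[0]:
            best = (sse, a, b, k)
    _, a, b, k = best
    return lambda t: a + b * max(t - k, 0.0)


def boost(x, y, n_iter=100, nu=0.1):
    """Start from the mean, then repeatedly fit the residuals and add a shrunken step."""
    knots = sorted(set(x))[:-1]  # hypothetical choice: every observed x but the largest
    f0 = sum(y) / len(y)
    pred = [f0] * len(x)
    learners = []
    for _ in range(n_iter):
        resid = [yi - pi for yi, pi in zip(y, pred)]
        h = fit_hinge(x, resid, knots)
        learners.append(h)
        pred = [pi + nu * h(xi) for pi, xi in zip(pred, x)]
    return lambda t: f0 + nu * sum(h(t) for h in learners)


def mse(f, x, y):
    return sum((yi - f(xi)) ** 2 for xi, yi in zip(x, y)) / len(x)


# Toy data (an assumption for illustration): a kink at 0.5 that one fixed knot could miss.
x = [i / 50 for i in range(51)]
y = [abs(xi - 0.5) for xi in x]

y_mean = sum(y) / len(y)
mse_before = mse(lambda t: y_mean, x, y)
model = boost(x, y)
mse_after = mse(model, x, y)
print(mse_before, mse_after)
```

Letting the knot move between iterations is one possible way around the fixed-knot issue described above; whether the original post takes this route is not visible from the excerpt.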

## Looks like this R thing might be for real

July 2, 2015

Not sure how I missed this, but the Linux Foundation just announced the R Consortium for supporting the "world’s most popular language for analytics and data science and support the rapid growth of the R user community". From the Linux Foundation: The R language is used by statisticians, analysts and data scientists to unlock value

## Humility needed in decision-making

July 2, 2015

Brian MacGillivray and Nick Pidgeon write: Daniel Gilbert maintains that people generally make bad decisions on risk issues, and suggests that communication strategies and education programmes would help (Nature 474, 275–277; 2011). This version of the deficit model pervades policy-making and branches of the social sciences. In this model, conflicts between expert and public perceptions […] The post Humility needed in decision-making appeared first on Statistical Modeling, Causal Inference, and…

## R brut

July 2, 2015

Filed under: Kids, pictures, R, Statistics, University life Tagged: cex, pch, plot, R

## How To Be A Kick-A## Teacher

July 2, 2015

25 helpful pieces of advice. Comportment: Walk like you're walking away from an explosion in a Hollywood movie. Tuck your chin in, tilt your head down and look at people from out of the top of your eyes. Squint. Lecturing: Show up late, then run ove...

## Recently in the sister blog

July 1, 2015

- When is the death penalty okay?
- A court with no Protestants
- How much does advertising matter in presidential elections?
- Bartenders are Democrats, beer wholesalers are Republicans
- The ambiguity of racial categories
- No, public opinion is not driven by ‘unreasoning bias and emotion’
- Political science: Who is it for?
- Modern campaigning has big effects on voter […]

The post Recently in the sister blog appeared first on Statistical Modeling, Causal Inference,…

## Variable Selection using Cross-Validation (and Other Techniques)

July 1, 2015

A natural technique to select variables in the context of generalized linear models is to use a stepwise procedure. It is natural, but controversial, as discussed by Frank Harrell in a great post, clearly worth reading. Frank lists about 10 points against a stepwise procedure. It yields R-squared values that are badly biased to be high. The F and chi-squared tests quoted next to each variable on the printout do not have the…

## How Airbnb built a data science team

July 1, 2015

From Venturebeat: Back then we knew so little about the business that any insight was groundbreaking; data infrastructure was fast, stable, and real-time (I was querying our production MySQL database); the company was so small that everyone was in the loop about every decision; and the data team (me) was aligned around a singular set

## Merge observed outcomes into a list of all outcomes

July 1, 2015

When you count the outcomes of an experiment, you do not always observe all of the possible outcomes. For example, if you roll a six-sided die 10 times, it might be that the "1" face does not appear in those 10 rolls. Obviously, this situation occurs more frequently with small […] The post Merge observed outcomes into a list of all outcomes appeared first on The DO Loop.
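The die example can be sketched in a few lines. This is only an illustration in plain Python, not the approach of the original post (which appears on a SAS blog); the helper name `counts_for_all_outcomes` and the ten rolls are made up for the example.

```python
from collections import Counter


def counts_for_all_outcomes(observed, all_outcomes):
    """Return a count for every possible outcome, with 0 for those never observed."""
    tally = Counter(observed)  # a Counter returns 0 for keys it has not seen
    return {outcome: tally[outcome] for outcome in all_outcomes}


# Ten hypothetical rolls of a six-sided die in which the "1" face never appears.
rolls = [2, 3, 3, 5, 6, 2, 4, 6, 3, 5]
table = counts_for_all_outcomes(rolls, range(1, 7))
print(table)
```

Iterating over the full list of outcomes, rather than over the observed values, is what keeps the unobserved face in the table with a zero count.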

July 1, 2015

Now that the (Northern) summer is here, you should have plenty of time for reading. Here are some recommendations: Ahelegbey, D. F., 2015. The econometrics of networks: A review. Working Paper 13/WP/2015, Department of Economics, University of Venice. Ca...

## Useful tutorials

July 1, 2015

There are some tools that I use regularly, and I would like my research students and post-docs to learn them too. Here are some great online tutorials that might help.

- ggplot tutorial from Winston Chang
- Writing an R package from Karl Broman
- Rmarkdown from RStudio
- Shiny from RStudio
- git/github guide from Karl Broman
- minimal make tutorial from Karl […]

## Stapel’s Fix for Science? Admit the story you want to tell and how you “fixed” the statistics to support it!

July 1, 2015

Stapel’s “fix” for science is to admit it’s all “fixed!” That recent case of the guy suspected of using faked data for a study on how to promote support for gay marriage in a (retracted) paper, Michael LaCour, is directing a bit of limelight on our star fraudster Diederik Stapel (50+ retractions). The Chronicle of Higher Education just published an article by […]

## Notes from the Kölner R meeting, 26 June 2015

June 30, 2015

Last Friday the Cologne R user group came together for the 14th time. For the first time we met at Startplatz, a start-up incubator venue. The venue was excellent: not only did they provide us with a much larger room, but also with table-football and d...

## Where does Mister P draw the line?

June 30, 2015

Bill Harris writes: Mr. P is pretty impressive, but I’m not sure how far to push him in particular and MLM [multilevel modeling] in general. Mr. P and MLM certainly seem to do well with problems such as eight schools, radon, or the Xbox survey. In those cases, one can make reasonable claims that the […] The post Where does Mister P draw the line? appeared first on Statistical Modeling,…

## NBER IFM Data Session and Site

June 30, 2015

The NBER's International Finance and Macroeconomics (IFM) Program is sponsoring a 2015 Summer Institute "Data Session" and a corresponding web site ("Catalog of Data Sources") where the various datasets are archived. Great idea. Hats off to the org...

## Hey, this is what Michael Lacour should’ve done when they asked him for his data

June 30, 2015

Texas Town Is Charging Us \$79,000 for Emails About Pool Party Abuse Cop. FOIA that, pal! The post Hey, this is what Michael Lacour should’ve done when they asked him for his data appeared first on Statistical Modeling, Causal Inference, and Socia...

## A note from John Lott

June 29, 2015

The other day, I wrote: It’s been nearly 20 years since the last time there was a high-profile report of a social science survey that turned out to be undocumented. I’m referring to the case of John Lott, who said he did a survey on gun use in 1997, but, in the words of Wikipedia, […] The post A note from John Lott appeared first on Statistical Modeling, Causal Inference,…

## The Econometrics of Temporal Aggregation – VI – Tests of Linear Restrictions

June 29, 2015

This post is one of several related posts. The previous ones can be found here, here, here, here and here. These posts are based on Giles (2014). Many of the statistical tests that we perform routinely in econometrics can be affected by the level o...