## Unknown pleasures

November 11, 2014
Have I missed unknown pleasures in Python by focusing on R? A comment on my blog post of last week suggested just that. Reason enough to explore Python a little. Learning another computer language is like learning another human language - it takes time...

## So what happened during the First World War?

November 11, 2014
The short answer is that people died. A lot of them. That said, this does not tell us much. We can compare population pyramids to better understand what may have happened. Just before the war (in 1913), the population pyramid looked like this (using data from mortality.org): `EXPO <- read.table("http://freakonometrics.free.fr/Exposures-France.txt", header=TRUE, skip=2)`, `EM=EXPO$Male`, `EF=EXPO$Female`, `Y=EXPO$Year`, `A=EXPO$Age`…

## Munging fixed width formats in Python

November 11, 2014
In a previous post, I described how to munge fixed width format data in R. I also developed Python code for the same use case, which is described in this IPython Notebook. This seems the easiest way to present this given WordPress.com’s restricti...

## “LaF”-ing about fixed width formats

November 10, 2014
If you have ever worked with US government data or other large datasets, it is likely you have faced fixed-width format data. This format has no delimiters in it; the data look like strings of characters. A separate format file defines which columns of data represent which variables. It seems as if the format is […]
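The idea of a separate format file can be illustrated with a minimal Python sketch. The layout, field names and sample lines below are hypothetical, invented only for illustration; they do not come from the post or from any real dataset, and this is not the LaF approach itself.

```python
# Hypothetical layout: field name -> (start, end) character positions,
# 0-based with an exclusive end, standing in for a separate format file.
LAYOUT = {"state": (0, 2), "year": (2, 6), "count": (6, 11)}


def parse_fixed_width(line, layout):
    """Slice one fixed-width line into named fields, trimming padding."""
    return {name: line[start:end].strip() for name, (start, end) in layout.items()}


# Made-up sample lines: no delimiters, only character positions.
lines = ["NY2014  153", "CA2013 2048"]
records = [parse_fixed_width(line, LAYOUT) for line in lines]
```

Every value comes back as a string; converting `year` and `count` to integers would be a separate step driven by the same layout.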

## Reverse Regression Follow-up

November 10, 2014
At the end of my recent post on Reverse Regression, I posed three simple questions - homework for the students among you, if you will. Here they are again, with brief "solutions": First recall the context. We fitted the following simple regression ...

## 2nd Edition has shipped (Doing Bayesian Data Analysis)

November 10, 2014
I am told by some readers that they have received a physical copy of the 2nd Edition of Doing Bayesian Data Analysis, but I have yet to see it myself. I hope the paper is heavy and opaque, but the book lifts the spirits and the concepts are transparent...

## Illegal Business Controls America

November 10, 2014
The other day I wrote: After encountering the Chicago-cops example I was going to retitle this post, “The psych department’s just another crew” in homage to the line, “The police department’s just another crew” from the rap, “Who Protects Us From You.” But, just to check, I googled that KRS-One rap and it turns out […] The post Illegal Business Controls America appeared first on Statistical Modeling, Causal Inference, and…

## On deck this week

November 10, 2014
- Mon: Illegal Business Controls America
- Tues: The history of MRP highlights some differences between political science and epidemiology
- Wed: “Patchwriting” is a Wegmanesque abomination but maybe there’s something similar that could be helpful?
- Thurs: If you do an experiment with 700,000 participants, you’ll (a) have no problem with statistical significance, (b) get to call it […]

## Rasmus’ socks fit perfectly!

November 10, 2014
Following the previous post on Rasmus’ socks, I took the opportunity of a survey on ABC I am currently completing to compare the outcome of his R code with my analytical derivation. After one quick correction [by Rasmus] of a wrong representation of the Negative Binomial mean-variance parametrisation [by me], I achieved this nice fit… […]

## Financial and statistical incentives to over-diagnose and over-treat

November 10, 2014
Nice article in the New York Times about the "overdiagnosis" problem in cancer screening. The particular case is thyroid cancer in South Korea. There are a number of things about any form of screening tests that one should always bear in mind: Death rate is measured as the number of deaths divided by the number of people with the disease. The latter number increases with better diagnosis techniques. Better diagnosis…
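A small arithmetic sketch of the point about the denominator, using the excerpt's definition of death rate. The counts are invented purely for illustration and are not figures from the article.

```python
def death_rate(deaths, diagnosed):
    """Death rate as defined above: deaths divided by people with the disease."""
    return deaths / diagnosed


# Hypothetical counts: the number of deaths stays the same, while more
# sensitive screening finds five times as many cases.
rate_before = death_rate(deaths=100, diagnosed=1000)
rate_after = death_rate(deaths=100, diagnosed=5000)
```

With these made-up numbers the measured rate falls even though exactly as many people die, which is why a falling rate alone says little about whether screening helps.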

## Practical Data Science Cookbook

November 10, 2014
My friends Sean Murphy, Ben Bengfort, Tony Ojeda and I recently published a book, Practical Data Science Cookbook. All of us are heavily involved in developing the data community in the Washington DC metro area, serving on the Board of Directors of Data Community DC. Sean and Ben co-organize the meetup […]

## Penn Econometrics Reading Group Materials Online

November 10, 2014
Locals who come to the Friday research/reading group will obviously be interested in this post, but others may also be interested in following and influencing the group's path. The schedule has been online here for a while. Starting now, it will co...

## An efficient way to increment a matrix diagonal

November 10, 2014
I was recently asked about how to use the SAS/IML language to efficiently add a constant to every element of a matrix diagonal. Mathematically, the task is to form the matrix sum A + kI, where A is an n x n matrix, k is a scalar value, and I is the […]
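The sum A + kI can be sketched in plain Python (not SAS/IML, and not the post's own solution). The function name `add_to_diagonal` is hypothetical. The sketch assumes that "efficient" means touching only the n diagonal entries instead of building an n x n identity matrix first.

```python
def add_to_diagonal(a, k):
    """Return a copy of the square matrix `a` with k added to each diagonal entry.

    Equivalent to A + kI, but only the n diagonal elements are updated.
    """
    n = len(a)
    if any(len(row) != n for row in a):
        raise ValueError("matrix must be square")
    result = [row[:] for row in a]  # copy rows so the input is left unchanged
    for i in range(n):
        result[i][i] += k
    return result


shifted = add_to_diagonal([[1, 2], [3, 4]], 10)
```

The update costs n additions on top of the copy, whereas forming kI explicitly would allocate and add n x n elements.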

## Sunday data/statistics link roundup (11/9/14)

November 10, 2014
So I'm a day late, but you know, I got a new kid and stuff... The New Yorker hating on MOOCs, they mention all the usual stuff. Including the really poorly designed San Jose State experiment. I think this deserves a longer post, but this is definitely a case where people are looking at MOOCs

## Multiple Linear Regression Revisited

November 10, 2014
Last night, I had a discussion with my friend about integrative data analysis (closely related to the discussion of the AOAS 2014 paper from Dr. Xihong Lin’s group and the JASA 2014 paper from Dr. Hongzhe Li’s group). If a biologist gave you genetic variant (e.g. SNP) data and phenotype (e.g. some trait) data, […]

## “Statistical Flukes, the Higgs Discovery, and 5 Sigma” at the PSA

November 9, 2014
We had an excellent discussion at our symposium yesterday: “How Many Sigmas to Discovery? Philosophy and Statistics in the Higgs Experiments” with Robert Cousins, Allan Franklin and Kent Staley. Slides from my presentation, “Statistical Flukes, the Higgs Discovery, and 5 Sigma” are posted below (we each only had 20 minutes, so this is clipped, but much came out in the discussion). […]

## A Source of Irritation

November 9, 2014
I very much liked one of ECONJEFF's posts last week, titled "Epistemological Irritation of the Day". The bulk of it reads: " "A direct test of the hypothesis is looking for significance in the relationship between [one variable] and {another variabl...

## “Differences Between Econometrics and Statistics” (my talk this Monday at the University of Pennsylvania econ dept)

November 9, 2014
Differences Between Econometrics and Statistics: that’s the title of the talk I’ll be giving at the econometrics workshop at noon on Monday. At 4:30pm in the same place, I’ll be speaking on Stan. And here are some things for people to read: For “Differences between econometrics and statistics”: Everyone’s trading bias for variance at […]

## The completeness of online gun shooting victim counts

November 9, 2014
There are a number of online efforts to register shooting victims. Shootingtracker tries to register all mass shootings, those with four or more victims. Slate had the gun death tally (GDT), gun deaths starting at Newtown, running thro...

## Econometric Society World Congress

November 9, 2014
Every five years, the Econometric Society holds a World Congress. In those years, the usual annual European, North American, Latin American, and Australasian meetings are held over. The first World Congress was held in Rome, in 1960. I've been to a few ...

## SBS documentary “The Age of Big Data”

November 8, 2014
by Yanchang Zhao, RDataMining.com “Data is becoming a powerful and most valuable commodity in 21st century. It is leading to scientific insights and new ways of understanding human behaviour. Data can also make you rich. Very rich.” (SBS documentary) …