## Python: Venn Diagram

November 15, 2013
By

A Venn diagram is very useful for visualizing operations between events or sets. In this post, we will learn how to draw one in Python. First, we need to install the module matplotlib-venn. Open the terminal or command prompt, and run the followin...
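
As a rough illustration of what such a diagram encodes, here is a minimal sketch. It assumes the matplotlib-venn package named in the post is already installed and uses its `venn2` function; the helper names `venn2_subsets` and `draw_venn2`, the example sets, and the output file name are hypothetical and not taken from the original post.

```python
def venn2_subsets(a, b):
    """Return the three region sizes: (only in A, only in B, in both)."""
    a, b = set(a), set(b)
    return (len(a - b), len(b - a), len(a & b))


def draw_venn2(a, b, labels=("A", "B"), path="venn.png"):
    """Draw a two-set Venn diagram to a file.

    Assumption: matplotlib and matplotlib-venn are installed.
    """
    import matplotlib
    matplotlib.use("Agg")  # render to a file, no display needed
    import matplotlib.pyplot as plt
    from matplotlib_venn import venn2

    # venn2 takes the region sizes in the order (A only, B only, A and B)
    venn2(subsets=venn2_subsets(a, b), set_labels=labels)
    plt.savefig(path)
    plt.close()


if __name__ == "__main__":
    A = {1, 2, 3, 4, 5}
    B = {4, 5, 6, 7}
    print(venn2_subsets(A, B))  # (3, 2, 2)
```

Calling `draw_venn2(A, B)` would then write the picture to `venn.png`. The region sizes are computed separately so they can be checked without any plotting library.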

## BDA class 4 G+ hangout on air is on air

November 15, 2013
By

Here. And here’s the backstory. P.S. The damn mike was muted most of the time. Something always goes wrong! The post BDA class 4 G+ hangout on air is on air appeared first on Statistical Modeling, Causal Inference, and Social Science.

## A little closer to the stars (***)

November 15, 2013
By

There has been a lot of buzz recently around Valen Johnson's paper published in PNAS. The article has been picked up pretty much everywhere (http://nature.com/news/, http://blogs.scientificamerican.com/absolutely-maybe/, http://arstechnica.com/science/ or http://passeurdesciences.blog.lemonde.fr/, which relayed the news in French). Several people have forwarded me links and asked for my opinion, by email or via Twitter. I am not going to revisit the study (for now), nor the misreadings of the study, but rather the buzz…

## How Countries Fare, 2010

November 15, 2013
By

Originally posted on CoolStatsBlog: The Current Account Balance is a measure of a country’s “profitability”. It is the sum of profits (losses) made from trading with other countries, profits (losses) made from investments in other countries, and cash transfers, such as remittances from expatriates. [Infographic: World: Current Account Balance, 2010] As the infographic shows, there isn’t…

## Daily/monthly/yearly tallies for your data

November 15, 2013
By

Say you have a dataset, where each row has a date or time, and something is recorded for that date and time. If each row is a unique date – great! If not, you may have rows with the same date, and you have to combine records for the same date to get a daily tally. […]
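
A minimal sketch of the idea in Python, using only the standard library: rows that share a date are summed into one daily total, and the same grouping gives monthly and yearly tallies. The function name `tally` and the sample rows are made up for illustration and are not from the original post.

```python
from collections import defaultdict
from datetime import date


def tally(rows, key):
    """Sum the values of all rows whose date maps to the same key."""
    totals = defaultdict(float)
    for day, value in rows:
        totals[key(day)] += value
    return dict(sorted(totals.items()))


# Hypothetical data: two rows share the same date
rows = [
    (date(2013, 11, 14), 2.0),
    (date(2013, 11, 14), 3.0),
    (date(2013, 11, 15), 1.5),
    (date(2013, 12, 1), 4.0),
]

daily = tally(rows, key=lambda d: d)
monthly = tally(rows, key=lambda d: (d.year, d.month))
yearly = tally(rows, key=lambda d: d.year)
```

Only the key function changes between the three tallies, so other periods (for example ISO weeks) can be added the same way.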

## BDA class G+ hangout another try

November 14, 2013
By

Tomorrow (Thurs) 8h30 (Paris time) I will be teaching my Bayesian Data Analysis class (class4a.pdf and class4b.pdf, you can follow the slides here). We had problems earlier with the regular G+ hangout, so this time we’re trying the G+ On-Air Hangout which I think should work better. I’ll post a blog entry tomorrow with a […] The post BDA class G+ hangout another try appeared first on Statistical Modeling, Causal Inference,…

## The Leek group guide to sharing data with a data analyst to speed collaboration

November 14, 2013
By

My group collaborates with many different scientists and the number one determinant of how fast we can turn around results is the status of the data we receive from our collaborators. If the data are well organized and all the … Continue reading →

## Calibration of p-value under variable selection: an example

November 14, 2013
By

Very often people report p-values for linear regression estimates after performing a variable selection step. Here is a simple simulation showing that such a procedure can lead to miscalibrated tests. Consider a simple data generating pro...
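
The effect can be sketched with a simplified stand-in for the post's simulation (the original data generating process is cut off above, so everything here is an assumption). The response is pure noise, the predictor with the largest absolute correlation among `k` candidates is selected, and it is then tested as if it had been chosen in advance. The p-value uses a normal approximation to the t statistic, and all function names and parameter values are hypothetical.

```python
import math
import random
from statistics import NormalDist


def correlation(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def p_value(r, n):
    """Two-sided p-value for a simple-regression slope.

    Assumption: normal approximation to the t statistic (reasonable for n = 50).
    """
    t = r * math.sqrt((n - 2) / (1 - r * r))
    return 2 * (1 - NormalDist().cdf(abs(t)))


def rejection_rates(n=50, k=10, reps=400, alpha=0.05, seed=0):
    """Return (rate for a fixed predictor, rate for the selected predictor)."""
    rng = random.Random(seed)
    fixed = selected = 0
    for _ in range(reps):
        y = [rng.gauss(0, 1) for _ in range(n)]  # response unrelated to any x
        xs = [[rng.gauss(0, 1) for _ in range(n)] for _ in range(k)]
        rs = [correlation(x, y) for x in xs]
        fixed += p_value(rs[0], n) < alpha             # predictor chosen in advance
        selected += p_value(max(rs, key=abs), n) < alpha  # predictor chosen by the data
    return fixed / reps, selected / reps


if __name__ == "__main__":
    print(rejection_rates())
```

With ten independent candidates and a nominal 5% level, the chance that at least one looks significant is about 1 - 0.95^10 ≈ 0.40, so the rejection rate for the selected predictor should land far above 5% while the fixed predictor stays near it.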

## Statistics is the least important part of data science

November 14, 2013
By

This came up already but I’m afraid the point got lost in the middle of our long discussion of Rachel and Cathy’s book. So I’ll say it again: There’s so much that goes on with data that is about computing, not statistics. I do think it would be fair to consider statistics (which includes sampling, […] The post Statistics is the least important part of data science appeared first on Statistical…

## Mathématiques de l’Assurance Non-Vie (2)

November 14, 2013
By

“In this context of uncertainty, it is particularly comforting to return to the sources, to the fundamentals, that is, to mathematics, and to recall that risk is born of randomness and is understood through the most advanced developments of probability theory.” (Claude Bébéar, in the preface to volume 1) After being out of print for several weeks (or even several months?), a new printing of volume 2 of Mathématiques de l’Assurance Non-Vie, co-written with Michel…

## Loophole

November 14, 2013
By

I think I should thank Marta (again!) for this post, as she made me think about it while we were riding together to the Stan workshop, in one of our now ("A XY", that is, as opposed to "B XY" when we used to do so all the time) rare joint outings on th...

## Bayesian essentials with R available on amazon

November 14, 2013
By

Bayesian Essentials with R is now available both as an e-book and as a hardcover book on amazon.com! Filed under: Books, R, Statistics, University life. Tagged: Bayesian Core, Bayesian Essentials with R, e-book, Jean-Michel Marin, R, Springer-Verlag

## What will I do on my Caribbean vacation? Teach data mining, of course!

November 14, 2013
By

Monday, November 18th at the Radisson Hotel Barbados. Presented by Michael Berry of Tripadvisor and David Weisman of the University of Massachusetts. Sponsored by Purple Leaf Communications. Registration and information here.

## T. Kepler: “Trouble with ‘Trouble at the Lab’?” (guest post)

November 14, 2013
By

Tom Kepler’s guest post arose in connection with my November 9 post & comments. Professor Thomas B. Kepler, Department of Microbiology, Department of Mathematics & Statistics, Boston University School of Medicine. There is much to say about the article in the Economist, but the first is to note that it is far more balanced than […]

## Parallel R (and air travel)

November 13, 2013
By

My heart sinks a little when I check on my laptop in the morning and the computation I started the night before still hasn’t finished. Even when the data I’m playing with isn’t particularly... large... (I’m not going to say it), I have a knack for choosing expensive algorithms late at night. Because of my…

## What makes us happy? Let's look at data to find out.

November 13, 2013
By

I've had a lot of different jobs over the past 4 years, and I've had some incredible experiences along the way. Lately, I've been struggling with what to do next. Or perhaps more accurately, I've been struggling with how to decide what to do next. Decisions that seem obvious in hindsight are tough to come to grips with beforehand, and it's led me to think about what metric I am…

## “What are some situations in which the classical approach (or a naive implementation of it, based on cookbook recipes) gives worse results than a Bayesian approach, results that actually impeded the science?”

November 13, 2013
By

Phil Nelson writes in the context of a biostatistics textbook he is writing, “Physical models of living systems”: There are a number of classic statistical problems that arise every day in the lab, and which are discussed in any book: 1. In a control group, M untreated rats out of 20 got a form of […] The post “What are some situations in which the classical approach (or a naive implementation…

## A good paper on measuring digital marketing

November 13, 2013
By

In response to my post about the challenges of measuring digital marketing, Dean Eckles (Facebook) sent me a paper by him and his colleagues. The paper, "Social Influence in Social Advertising: Evidence from Field Experiments" by Bakshy, Eckles, et al. (ACM 2012), is an impressive piece of work. In this post, I summarize the research for those who don't want to read an academic paper; and then…

## Original source code for Apple II DOS

November 13, 2013
By

Someone needs to put this on GitHub right now. Thanks Paul Laughton for your donation of this superb collection of early to mid-1978 documents including the letters, agreements, specifications (including hand-written code and schematics), and two original source code listings for the … Continue reading →

## How to compute the incomplete beta function in SAS

November 13, 2013
By

While sorting through an old pile of papers, I discovered notes from a 2012 SAS conference that I had attended. Next to the abstract for one presentation, I had scrawled a note to myself that read "BLOG about the incomplete beta function!" Okay, Rick, whatever you say! In statistics, the [...]
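
For readers without SAS at hand, the quantity itself can be sketched in Python. This is not the SAS approach from the post, which is cut off above. It is a plain numerical integration of the standard definition I_x(a, b) = (1 / B(a, b)) ∫₀ˣ t^(a-1) (1-t)^(b-1) dt, the function names are hypothetical, and the sketch assumes a, b ≥ 1 so that the integrand stays bounded.

```python
import math


def beta_function(a, b):
    """Complete beta function B(a, b), computed via log-gamma."""
    return math.exp(math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b))


def incomplete_beta(x, a, b, steps=2000):
    """Regularized incomplete beta I_x(a, b) by composite Simpson's rule.

    Assumption: a >= 1 and b >= 1 (no endpoint singularities); steps is even.
    """
    if not 0.0 <= x <= 1.0:
        raise ValueError("x must lie in [0, 1]")
    if a < 1 or b < 1:
        raise ValueError("this sketch only handles a >= 1 and b >= 1")
    if steps % 2:
        raise ValueError("steps must be even")
    if x == 0.0:
        return 0.0

    def integrand(t):
        return t ** (a - 1) * (1 - t) ** (b - 1)

    h = x / steps
    total = integrand(0.0) + integrand(x)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * integrand(i * h)
    return total * h / 3 / beta_function(a, b)


if __name__ == "__main__":
    print(incomplete_beta(0.5, 2, 2))
```

This is the cumulative distribution function of a Beta(a, b) random variable, so `incomplete_beta(0.5, 2, 2)` should come out at one half by symmetry.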

## Survival analysis for hard drives

November 12, 2013
By

How long do hard drives last? Backblaze has kept up to 25,000 hard drives constantly online for the last four years. Every time a drive fails, they note it down, then slot in a replacement. After four years, Backblaze now … Continue reading →
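
Records of this kind (how long each drive ran, and whether it failed or was still running when observation stopped) are what a survival curve is estimated from. Below is a minimal sketch of the Kaplan-Meier estimator, a standard choice that may or may not be what the linked analysis uses. The function name and the sample observations are invented for illustration.

```python
def kaplan_meier(observations):
    """Kaplan-Meier survival curve.

    observations: iterable of (duration, failed) pairs, where failed is True
    for an observed failure and False for a drive still running (censored).
    Returns a list of (time, estimated survival probability) at failure times.
    """
    events = sorted(observations)
    n_at_risk = len(events)
    survival = 1.0
    curve = []
    i = 0
    while i < len(events):
        t = events[i][0]
        failures = removed = 0
        # Gather every drive whose record ends at time t
        while i < len(events) and events[i][0] == t:
            failures += bool(events[i][1])
            removed += 1
            i += 1
        if failures:
            survival *= 1 - failures / n_at_risk
            curve.append((t, survival))
        n_at_risk -= removed  # failed and censored drives both leave the risk set
    return curve


# Hypothetical drives: (years in service, failed?)
drives = [(1, True), (2, False), (3, True), (4, True), (4, False)]
curve = kaplan_meier(drives)
```

Censored drives never lower the curve themselves, but they shrink the number at risk, which is why replacements and still-running drives can be included without biasing the estimate downward.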

## Classical Statistics really is screwed.

November 12, 2013
By

It’s believed the crises in science will abate if we only educate everyone on the correct interpretation of p-values and confidence intervals. I explained before in this long post why this isn’t true. Below is a summary. Two technical point...