Greetings from PyCon 2014 in Montreal! I did a book signing yesterday at the O'Reilly Media booth. I had the pleasure of working side by side with David Beazley, who was signing copies of The Python Cookbook, now updated for Python 3 and, I...

“There was a vain and ambitious hospital director. A bad statistician. ..There were good medics and bad medics, good nurses and bad nurses, good cops and bad cops … Apparently, even some people in the Public Prosecution service found the witch hunt deeply disturbing.” This is how Richard Gill, statistician at Leiden University, describes a […]

This bit is perhaps worth saying again, especially given the occasional trolling on the internet by people who disparage their ideological opponents by calling them “religious” . . . So here it is: Sometimes the choice of statistical philosophy is decided by convention or convenience. . . . In many settings, however, we have freedom […] The post “Schools of statistical thoughts are sometimes jokingly likened to religions. This analogy…

A linguist sent me an email with the above title and a link to a paper, “The Effect of Language on Economic Behavior: Evidence from Savings Rates, Health Behaviors, and Retirement Assets,” by M. Keith Chen, which begins: Languages differ widely in the ways they encode time. I test the hypothesis that languages that grammatically […] The post “More research from the lunatic fringe” appeared first on Statistical Modeling, Causal…

It strikes me that the media loves to talk about probability, a subject about which journalists are ill-trained to write. The latest example of this is Forbes' attempt to draw a lesson out of Warren Buffett's gimmicky $1 billion NCAA pool. As we all learned, by the time the 25th match drew to a close, all 8.7 million entrants had gotten at least one winner wrong, thus there would…

Introduction A while ago, one of my co-workers asked me to group box plots by plotting them side-by-side within each group, and he wanted to use patterns rather than colours to distinguish between the box plots within a group; the publication that will display his plots prints in black-and-white only. I gladly investigated how to […]

Kaiser Fung shares this graph from Ritchie King: Kaiser writes: What they did right: - Did not put the data on a map - Ordered the countries by the most recent data point rather than alphabetically - Scale labels are found only on outer edge of the chart area, rather than one set per panel […] The post Small multiples of lineplots > maps (ok, not always, but yes in…

Editor's note: This is a guest post by Alyssa Frazee, a graduate student in the Biostatistics department at Johns Hopkins and a participant in the recent rOpenSci hackathon. Last week, I took a break from my normal PhD student schedule … Continue reading →

For those who weren't able to attend my recent talks, a few have surfaced online. *** JMP put up the video of the webcast from last Friday with Alberto Cairo, a data visualization expert and author of The Functional Art. You can access it from here. This event is part of their Analytically Speaking series with recent guests such as David Hand and Michael Schrage. I also appear on this…

Yesterday I blogged about the Hilbert matrix. The (i,j)th element of the Hilbert matrix has the value 1 / (i+j-1), which is the reciprocal of an integer. However, the printed Hilbert matrix did not look exactly like the formula because the elements print as finite-precision decimals. For example, the last […]
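
The gap between the formula and the printed decimals can be illustrated with a short sketch. It is written in Python using the standard-library `fractions` module, not the software used in the original post, and the helper name `hilbert` is hypothetical. It builds the matrix from the formula 1 / (i+j-1) with exact rational entries and prints them as fractions next to their decimal approximations.

```python
from fractions import Fraction


def hilbert(n):
    """Return the n x n Hilbert matrix with exact rational entries.

    Indices i and j run from 1 to n, so entry (i, j) is 1 / (i + j - 1).
    """
    return [
        [Fraction(1, i + j - 1) for j in range(1, n + 1)]
        for i in range(1, n + 1)
    ]


if __name__ == "__main__":
    H = hilbert(3)
    for row in H:
        exact = "  ".join(str(x) for x in row)          # e.g. 1/3
        approx = "  ".join(f"{float(x):.6f}" for x in row)  # finite-precision decimal
        print(f"{exact:<20} {approx}")
```

Keeping the entries as `Fraction` objects preserves the reciprocal-of-an-integer structure; converting to `float` is what introduces the finite-precision decimals.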

I recently introduced the use of linear basis function models for supervised learning problems that involve non-linear relationships between the predictors and the target. A common type of basis function for such models is the Gaussian basis function. This type of model uses the kernel of the normal (or Gaussian) probability density function (PDF) as […]
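
As a minimal sketch of the idea, and not the code from the original post, a Gaussian basis function can be written as the kernel of the normal PDF, exp(-(x - mu)^2 / (2 s^2)), with the normalising constant dropped. The names `gaussian_basis` and `design_row`, the centres, and the width `s` below are all assumptions chosen for illustration.

```python
import math


def gaussian_basis(x, mu, s):
    """Kernel of the normal PDF: exp(-(x - mu)^2 / (2 s^2)).

    mu is the centre of the basis function and s > 0 its width.
    """
    return math.exp(-((x - mu) ** 2) / (2.0 * s ** 2))


def design_row(x, centres, s):
    """Map one predictor value to a row of basis-function features.

    A leading 1.0 is included as an intercept term (an illustrative choice).
    """
    return [1.0] + [gaussian_basis(x, mu, s) for mu in centres]


if __name__ == "__main__":
    centres = [-1.0, 0.0, 1.0]  # hypothetical centres
    for x in (-1.0, 0.5, 2.0):
        print(x, [round(v, 4) for v in design_row(x, centres, s=0.5)])
```

A linear model fitted to rows like these stays linear in its coefficients, even though each feature is a non-linear function of the predictor.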

Consider again an experiment that seeks to determine the causal relationships between factors and the response. Ideally, the sample size is large enough for a full factorial design to be used. However, if the sample size is small and the number of possible treatments is large, then a fractional factorial design can be used instead. Such a […]
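
To make the contrast concrete, here is a hedged sketch that is not taken from the original post. It assumes three two-level factors, labelled A, B and C and coded as -1 and +1, and compares the full factorial design (8 runs) with a half fraction (4 runs) built from the generator C = AB. The function names and the choice of generator are illustrative assumptions.

```python
from itertools import product

LEVELS = (-1, 1)  # coded low and high levels of a two-level factor


def full_factorial(n_factors):
    """All combinations of levels: 2**n_factors runs."""
    return list(product(LEVELS, repeat=n_factors))


def half_fraction_three_factors():
    """Half fraction of the 2^3 design using the generator C = A * B.

    A and B take every combination of levels; C is set by the generator,
    so only 4 of the 8 possible treatments are run.
    """
    return [(a, b, a * b) for a, b in product(LEVELS, repeat=2)]


if __name__ == "__main__":
    print("Full factorial runs:", len(full_factorial(3)))
    for run in half_fraction_three_factors():
        print(run)
```

The price of halving the number of runs is aliasing: in this design the main effect of C cannot be separated from the A-by-B interaction.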

Sometime today, I got the idea to try to do automatic speech recognition. Speech recognition, even though it is widely used (and is on our phones), still seems kind of sci-fi-ish to me. The thought of running it on your own computer is still pretty exciting. I looked for open source libraries, and was pleasantly surprised to find Sphinx, a CMU project. It has python bindings, and even lets you…

Many view the propensity theory of probabilities as something incompatible with Bayesian probabilities. Nothing could be further from the truth; it represents an elementary special case of that definition. To see this I’ll apply those Bayesian pr...

IPython notebooks have become a de facto standard for presenting Python-based analyses and talks, as evidenced by recent PyCon and PyData events. As anyone who has used them knows, they are great for “reproducible research”, presentations, and sharing via the nbviewer. There are extensions connecting IPython to R, Octave, Matlab, Mathematica, and SQL, among others. However, the […]