In this post you will learn how to: create your own quasi-shape file; plot your homemade quasi-shape file in ggplot2; add an external svg/ps graphic to a plot; change a grid grob's color and alpha. *Note get simple .md version … Continue reading →

A linguist sent me an email with the above title and a link to a paper, “The Effect of Language on Economic Behavior: Evidence from Savings Rates, Health Behaviors, and Retirement Assets,” by M. Keith Chen, which begins: Languages differ widely in the ways they encode time. I test the hypothesis that languages that grammatically […] The post “More research from the lunatic fringe” appeared first on Statistical Modeling, Causal…

It strikes me that the media loves to talk about probability, a subject about which journalists are ill-trained to write. The latest example of this is Forbes' attempt to draw a lesson out of Warren Buffett's gimmicky $1 billion NCAA pool. As we all learned, by the time the 25th match drew to a close, all 8.7 million entrants had gotten at least one winner wrong, thus there would…

Introduction A while ago, one of my co-workers asked me to group box plots by plotting them side-by-side within each group, and he wanted to use patterns rather than colours to distinguish between the box plots within a group; the publication that will display his plots prints in black-and-white only. I gladly investigated how to […]

Kaiser Fung shares this graph from Ritchie King: Kaiser writes: What they did right: - Did not put the data on a map - Ordered the countries by the most recent data point rather than alphabetically - Scale labels are found only on outer edge of the chart area, rather than one set per panel […] The post Small multiples of lineplots > maps (ok, not always, but yes in…

Editor's note: This is a guest post by Alyssa Frazee, a graduate student in the Biostatistics department at Johns Hopkins and a participant in the recent rOpenSci hackathon. Last week, I took a break from my normal PhD student schedule … Continue reading →

For those who weren't able to attend my recent talks, a few have surfaced online. JMP put up the video of the webcast from last Friday with Alberto Cairo, a data visualization expert and author of The Functional Art. You can access it from here. This event is part of their Analytically Speaking series with recent guests such as David Hand and Michael Schrage. I also appear on this…

Yesterday I blogged about the Hilbert matrix. The (i,j)th element of the Hilbert matrix has the value 1 / (i+j-1), which is the reciprocal of an integer. However, the printed Hilbert matrix did not look exactly like the formula because the elements print as finite-precision decimals. For example, the last […]
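
The gap between the formula and the printed decimals can be illustrated with a minimal Python sketch. It is not taken from the original post; the helper name `hilbert` is hypothetical, and `fractions.Fraction` is used here only as one way to keep each entry as an exact reciprocal 1 / (i+j-1).

```python
from fractions import Fraction

def hilbert(n):
    """Return the n x n Hilbert matrix with exact rational entries.

    Indices i and j are 1-based, so entry (i, j) equals 1 / (i + j - 1).
    (Hypothetical helper, written for illustration only.)
    """
    return [[Fraction(1, i + j - 1) for j in range(1, n + 1)]
            for i in range(1, n + 1)]

H = hilbert(3)
for row in H:
    # Exact form: each entry prints as the reciprocal of an integer.
    print("  ".join(f"{str(x):>4}" for x in row))
for row in H:
    # Finite-precision form: the same entries rounded to four decimals.
    print("  ".join(f"{float(x):.4f}" for x in row))
```

Printing both forms side by side shows that the decimals are only approximations of the exact fractions.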

I recently introduced the use of linear basis function models for supervised learning problems that involve non-linear relationships between the predictors and the target. A common type of basis function for such models is the Gaussian basis function. This type of model uses the kernel of the normal (or Gaussian) probability density function (PDF) as […]
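
As a rough sketch of what such a basis function can look like, assuming the common unnormalized form exp(-(x - mu)^2 / (2 s^2)): the names `gaussian_basis`, `design_row`, `mu`, and `s` are illustrative assumptions, not taken from the excerpt, which is cut off before it gives a definition.

```python
import math

def gaussian_basis(x, mu, s):
    """Kernel of the normal PDF: exp(-(x - mu)^2 / (2 s^2)).

    The normalizing constant 1 / (s * sqrt(2 * pi)) is deliberately dropped.
    mu (center) and s (width) are assumed parameter names.
    """
    return math.exp(-((x - mu) ** 2) / (2.0 * s ** 2))

def design_row(x, centers, s):
    """One row of a design matrix: an intercept term plus one basis value per center."""
    return [1.0] + [gaussian_basis(x, mu, s) for mu in centers]

# Example: three centers on a grid, a shared width of 1.0.
print(design_row(0.5, centers=[0.0, 1.0, 2.0], s=1.0))
```

A linear model fitted on rows like these stays linear in its coefficients while being non-linear in x.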

Consider again an experiment that seeks to determine the causal relationships between factors and the response. Ideally, the sample size is large enough for a full factorial design to be used. However, if the sample size is small and the number of possible treatments is large, then a fractional factorial design can be used instead. Such a […]
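
To make the contrast concrete, here is a minimal sketch under assumptions the excerpt does not state: three two-level factors coded as -1/+1, and a half fraction built from the textbook generator C = A*B. The function names are hypothetical, and the truncated post may use a different design.

```python
from itertools import product

def full_factorial(k):
    """All 2^k runs of k two-level factors, coded as -1 and +1."""
    return [list(run) for run in product((-1, 1), repeat=k)]

def half_fraction_2_3():
    """A 2^(3-1) half fraction: A and B vary freely, C is set to A * B.

    This corresponds to the defining relation I = ABC (an assumed,
    textbook choice of generator).
    """
    return [[a, b, a * b] for a, b in product((-1, 1), repeat=2)]

print(len(full_factorial(3)))   # number of runs in the full design
for run in half_fraction_2_3():
    print(run)                  # each run satisfies A * B * C == +1
```

The half fraction needs half as many runs, at the cost of aliasing each main effect with a two-factor interaction.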

Sometime today, I got the idea to try to do automatic speech recognition. Speech recognition, even though it is widely used (and is on our phones), still seems kind of sci-fi-ish to me. The thought of running it on your own computer is still pretty exciting. I looked for open source libraries, and was pleasantly surprised to find Sphinx, a CMU project. It has Python bindings, and even lets you…

Many view the propensity theory of probabilities as something incompatible with Bayesian probabilities. Nothing could be further from the truth; it represents an elementary special case of that definition. To see this I’ll apply those Bayesian pr...

IPython notebooks have become a de facto standard for presenting Python-based analyses and talks, as evidenced by recent PyCon and PyData events. As anyone who has used them knows, they are great for “reproducible research”, presentations, and sharing via the nbviewer. There are extensions connecting IPython to R, Octave, Matlab, Mathematica, SQL, among others. However, the […]

There’s a lot of free advice out there. I offer some of it myself! As I’ve written before (see this post from 2008 reacting to this advice from Dan Goldstein for business school students, and this post from 2010 reacting to some general advice from Nassim Taleb), what we see is typically presented as advice […] The post Advice: positive-sum, zero-sum, or negative-sum appeared first on Statistical Modeling, Causal Inference,…

There is now some serious soul-searching in the mainstream media about their (previously) breath-taking coverage of the Big Data revolution. I am collecting some useful links here for those interested in learning more. Here's my Harvard Business Review article in which I discussed the Science paper disclosing that Google Flu Trends, that key exhibit of the Big Data lobby, has systematically over-estimated flu activity for 100 out of the last…

There’s a theorem in statistics that says $E[\bar{X}] = E[X]$. You could read this aloud as “the mean of the mean is the mean.” More explicitly, it says that the expected value of the average of some number of samples from some distribution is equal to the expected value of the distribution itself. The shorter reading is confusing […]
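
The claim can be checked exactly on a small case. The sketch below is an illustration, not the original post's example: it assumes a fair six-sided die as the distribution, enumerates every equally likely sample of size n, and averages the sample means. The name `expected_sample_mean` is hypothetical.

```python
from fractions import Fraction
from itertools import product

faces = range(1, 7)                                   # assumed distribution: a fair die
population_mean = Fraction(sum(faces), len(faces))    # 7/2

def expected_sample_mean(n):
    """Exact expected value of the mean of n independent die rolls.

    Every ordered sample of size n is equally likely, so the expectation
    is the plain average of the sample means over all 6^n samples.
    """
    samples = list(product(faces, repeat=n))
    return sum(Fraction(sum(s), n) for s in samples) / len(samples)

for n in (1, 2, 3):
    print(n, expected_sample_mean(n), expected_sample_mean(n) == population_mean)
```

Exact fractions are used so the equality holds without rounding error, whatever sample size is chosen.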