## Take a look, it’s in a book: distribution of kindle e-book highlights

April 10, 2014
By

If you've ever started a book and not finished it, it may comfort you to know that you are not alone. It's hard to get accurate estimates of the percentage of books that are discontinued, but the rise of e-reading (and… Continue reading →

## Small multiples of lineplots > maps (ok, not always, but yes in this case)

April 10, 2014
By

Kaiser Fung shares this graph from Ritchie King. Kaiser writes:

What they did right:

- Did not put the data on a map
- Ordered the countries by the most recent data point rather than alphabetically
- Scale labels are found only on outer edge of the chart area, rather than one set per panel

[…] The post Small multiples of lineplots > maps (ok, not always, but yes in…

## The #rOpenSci hackathon #ropenhack

April 10, 2014
By

Editor's note: This is a guest post by Alyssa Frazee, a graduate student in the Biostatistics department at Johns Hopkins and a participant in the recent rOpenSci hackathon.  Last week, I took a break from my normal PhD student schedule … Continue reading →

## Some past talks

April 10, 2014
By

For those who weren't able to attend my recent talks, a few have surfaced online. *** JMP put up the video of the webcast from last Friday with Alberto Cairo, a data visualization expert and author of The Functional Art. You can access it from here. This event is part of their Analytically Speaking series with recent guests such as David Hand and Michael Schrage. I also appear on this…

## Simple speech recognition in Python

April 10, 2014
By

Sometime today, I got the idea to try to do automatic speech recognition. Speech recognition, even though it is widely used (and is on our phones), still seems kind of sci-fi-ish to me. The thought of running it on your own computer is still pretty e...

## How to format decimals as fractions in SAS

April 10, 2014
By

Yesterday I blogged about the Hilbert matrix. The (i,j)th element of the Hilbert matrix has the value 1 / (i+j-1), which is the reciprocal of an integer. However, the printed Hilbert matrix did not look exactly like the formula because the elements print as finite-precision decimals. For example, the last […]
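The excerpt above is about SAS formats, which are not reproduced here. As a rough illustration of the same idea in Python, the sketch below builds the Hilbert matrix from the formula 1 / (i+j-1) with exact rational entries, and shows one way to recover a fraction from a finite-precision decimal. The function name `hilbert` and the denominator bound of 1000 are assumptions made for this example, not part of the original post.

```python
from fractions import Fraction


def hilbert(n):
    """Return the n x n Hilbert matrix with exact rational entries.

    Indices are 1-based, so the (i, j)th element is 1 / (i + j - 1).
    """
    return [[Fraction(1, i + j - 1) for j in range(1, n + 1)]
            for i in range(1, n + 1)]


# Print each row as fractions rather than decimals.
for row in hilbert(3):
    print("  ".join(str(x) for x in row))

# A float such as 1/3 is stored with finite precision; limiting the
# denominator recovers the nearest "simple" fraction.
approx = Fraction(1 / 3).limit_denominator(1000)
print(approx)
```

Keeping the entries as `Fraction` objects avoids the rounding that makes the printed matrix differ from the formula; the `limit_denominator` step is only needed when the values have already been converted to decimals.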

## Quick notes on file management in Python

April 10, 2014
By

This is primarily for my recollection. To expand ~ in a path name: To get the size of a directory:
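The original snippets did not survive in this excerpt. A minimal sketch of the two tasks named above, using only the standard library, might look like the following. The helper names `expand_home` and `directory_size` are hypothetical, and the choice to skip symbolic links is an assumption of this example.

```python
import os


def expand_home(path):
    """Replace a leading ~ with the current user's home directory."""
    return os.path.expanduser(path)


def directory_size(root):
    """Return the total size in bytes of all regular files under root."""
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full_path = os.path.join(dirpath, name)
            # Skip symbolic links so linked files are not counted.
            if not os.path.islink(full_path):
                total += os.path.getsize(full_path)
    return total
```

For example, `directory_size(expand_home("~/data"))` would report the bytes used by a `data` folder in the home directory, if such a folder exists.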

## Machine Learning Lesson of the Day – Linear Gaussian Basis Function Models

I recently introduced the use of linear basis function models for supervised learning problems that involve non-linear relationships between the predictors and the target.  A common type of basis function for such models is the Gaussian basis function.  This type of model uses the kernel of the normal (or Gaussian) probability density function (PDF) as […]
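To make the idea concrete, here is a small sketch of a linear model built on Gaussian basis functions, assuming a single scalar predictor and a common width for every basis function. The names `gaussian_basis`, `design_row`, and `predict`, and the inclusion of a constant term, are choices made for this illustration rather than details taken from the post.

```python
import math


def gaussian_basis(x, center, width):
    """Kernel of the normal PDF: exp(-(x - center)^2 / (2 * width^2))."""
    return math.exp(-((x - center) ** 2) / (2.0 * width ** 2))


def design_row(x, centers, width):
    """Feature vector for one input: a constant term plus one Gaussian per center."""
    return [1.0] + [gaussian_basis(x, c, width) for c in centers]


def predict(x, weights, centers, width):
    """Weighted sum of the basis functions; the model is linear in the weights."""
    features = design_row(x, centers, width)
    return sum(w * phi for w, phi in zip(weights, features))
```

The basis functions are non-linear in `x`, but the prediction is still a linear combination of them, so the weights could be estimated with ordinary least squares on the rows produced by `design_row`.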

## Applied Statistics Lesson of the Day – Fractional Factorial Design and the Sparsity-of-Effects Principle

Consider again an experiment that seeks to determine the causal relationships between  factors and the response, where .  Ideally, the sample size is large enough for a full factorial design to be used.  However, if the sample size is small and the number of possible treatments is large, then a fractional factorial design can be used instead.  Such a […]

## Simple speech recognition in python

April 10, 2014
By

Sometime today, I got the idea to try to do automatic speech recognition. Speech recognition, even though it is widely used (and is on our phones), still seems kind of sci-fi-ish to me. The thought of running it on your own computer is still pretty exciting. I looked for open source libraries, and was pleasantly surprised to find Sphinx, a CMU project. It has Python bindings, and even lets you…

## The “Propensity Theory of Probabilities” as a Simple Application of the Bayesian Definition of Probabilities

April 9, 2014
By

Many view the propensity theory of probabilities as something incompatible with Bayesian probabilities. Nothing could be further from the truth; it represents an elementary special case of that definition. To see this I’ll apply those Bayesian pr...

## IPython notebooks: the new glue?

April 9, 2014
By

IPython notebooks have become a de facto standard for presenting Python-based analyses and talks, as evidenced by recent PyCon and PyData events. As anyone who has used them knows, they are great for “reproducible research”, presentations, and sharing via the nbviewer. There are extensions connecting IPython to R, Octave, Matlab, Mathematica, and SQL, among others. However, the […]

## Advice: positive-sum, zero-sum, or negative-sum

April 9, 2014
By

There’s a lot of free advice out there. I offer some of it myself! As I’ve written before (see this post from 2008 reacting to this advice from Dan Goldstein for business school students, and this post from 2010 reacting to some general advice from Nassim Taleb), what we see is typically presented as advice […] The post Advice: positive-sum, zero-sum, or negative-sum appeared first on Statistical Modeling, Causal Inference,…

## Round-up of coverage of the Big Miss of Big Data

April 9, 2014
By

There is now some serious soul-searching in the mainstream media about their (previously) breath-taking coverage of the Big Data revolution. I am collecting some useful links here for those interested in learning more. Here's my Harvard Business Review article in which I discussed the Science paper disclosing that Google Flu Trends, that key exhibit of the Big Data lobby, has systematically over-estimated flu activity for 100 out of the last…

## The mean of the mean is the mean

April 9, 2014
By

There’s a theorem in statistics that says $E[\bar{X}] = \mu$. You could read this aloud as “the mean of the mean is the mean.” More explicitly, it says that the expected value of the average of some number of samples from some distribution is equal to the expected value of the distribution itself. The shorter reading is confusing […]
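The explicit statement follows from linearity of expectation. As a short sketch, assume $X_1, \dots, X_n$ are samples from a distribution with expected value $\mu$, and write $\bar{X}$ for their average (this notation is chosen here and may differ from the original post):

$$
E[\bar{X}] = E\left[\frac{1}{n}\sum_{i=1}^{n} X_i\right] = \frac{1}{n}\sum_{i=1}^{n} E[X_i] = \frac{1}{n}\cdot n\mu = \mu
$$

Only the common mean is used in this argument; independence of the samples is not required for this particular result.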

## The Hilbert matrix: A vectorized construction

April 9, 2014
By

The Hilbert matrix is the most famous ill-conditioned matrix in numerical linear algebra. It is often used in matrix computations to illustrate problems that arise when you compute with ill-conditioned matrices. The Hilbert matrix is symmetric and positive definite, properties that are often associated with "nice" and "tame" matrices. The […]

## “Out Damned Pseudoscience: Non-significant results are the new ‘Significant’ results!” (update)

April 9, 2014
By

We were reading “Out, Damned Spot: Can the ‘Macbeth effect’ be replicated?” (Earp, B., Everett, J., Madva, E., and Hamlin, J. 2014, in Basic and Applied Social Psychology 36: 91-8) in an informal gathering of our 6334 seminar yesterday afternoon at Thebes. Some of the graduate students are interested in so-called “experimental” philosophy, and I asked for an example that used statistics […]

## Elections Without a Two-Party System

April 9, 2014
By

Yesterday on Twitter, @JF_Godbout shared a nice chart on the Quebec elections, showing the number of votes obtained (here as a percentage of total votes) and the percentage of seats that this yields. It must be said that yesterday...

## My forecasting book now on Amazon

April 9, 2014
By

For all those people asking me how to obtain a print version of my book “Forecasting: principles and practice” with George Athanasopoulos, you now can. Order on Amazon.com Order on Amazon.co.uk Order on Amazon.fr The online book will continue to b...

## A new data-centric incubator project in DC

April 8, 2014
By

District Data Labs is a new endeavor by members of the local data community (myself included) to increase educational outreach about data-related topics through workshops and other media to the local data community. We want District Data Labs to be an efficient learning resource for people who want to enhance and expand their analytical and […]

## Why I don’t recommend MS Access

April 8, 2014
By

Recently, I was asked: Why do you not recommend Access to use? Just curious. Read on page xi of your intro in Data Analysis Using SQL and Excel. Just beginning a class in SQL and bought your text. Thanks, Mort. This is a very fair question and o...

April 8, 2014
By

I recently found this little gem of a web app that analyzes the clarity of your writing. Hemingway highlights long, complex, and hard to read sentences. It also highlights complex words where a simple one would do, and highlights adverbs, suggesting yo...