Category: Statistics

Golay codes

Suppose you want to sent pictures from Jupiter back to Earth. A lot could happen as a bit travels across the solar system, and so you need some way of correcting errors, or at least detecting errors. The simplest thing to do would be to transmit photos twice. If a bit is received the same […]

The real lesson learned from those academic hoaxes: a key part of getting a paper published in a scholarly journal is to be able to follow the conventions of the journal. And some people happen to be good at that, irrespective of the content of the papers being submitted.

I wrote this email to a colleague: Someone pointed me to this paper. It’s really bad. It was published by The Review of Environmental Economics and Policy, “the official journal of the Association of Environmental and Resource Economists and the European Association of Environmental and Resource Economists.” Is this a real organization? The whole thing […]

Chinese character frequency and entropy

Yesterday I wrote a post looking at the frequency of Koine Greek letters and the corresponding entropy. David Littleboy asked what an analogous calculation would look like for a language like Japanese. This post answers that question. First of all, information theory defines the Shannon entropy of an “alphabet” to be bits where pi is […]

three birthdays and a numeral

The riddle of the week on The Riddler was to find the size n of an audience for at least a 50% chance of observing at least one triplet of people sharing a birthday, as is the case in the present U.S. Senate. The question is much harder to solve than for a pair of […]

“Here’s an interesting story right in your sweet spot”

Jonathan Falk writes: Here’s an interesting story right in your sweet spot: Large effects from something whose possible effects couldn’t be that large? Check. Finding something in a sample of 1024 people that requires 34,000 to gain adequate power? Check. Misuse of p values? Check Science journalist hype? Check Searching for the cause of an […]

Greek letter frequency and entropy

Would the letters in an ancient Greek text carry more or less information than in modern English? To address this question, I downloaded a copy of the Greek New Testament from Project Gutenberg and ran the word frequency script from my previous post. This lead to the follow table of letters and percent frequency. α […]

stochastic magnetic bits, simulated annealing and Gibbs sampling

A paper by Borders et al. in the 19 September issue of Nature offers an interesting mix of computing and electronics and optimisation. With two preparatory tribunes! One [rather overdone] on Feynman’s quest. As a possible alternative to quantum computers for creating probabilistic bits. And making machine learning (as an optimisation program) more efficient. And […]

File character counts

Once in a while I need to know what characters are in a file and how often each appears. One reason I might do this is to look for statistical anomalies. Another reason might be to see whether a file has any characters it’s not supposed to have, which is often the case. A few […]

The status-reversal heuristic

Awhile ago we came up with the time-reversal heuristic, which was a reaction to the common situation that there’s a noisy study, followed by an unsuccessful replication, but all sorts of people want to take the original claim as the baseline and construct high walls to make it difficult to move away from that claim. […]

My talk on visualization and data science this Sunday 9am

Uncovering Principles of Statistical Visualization Visualizations are central to good statistical workflow, but it has been difficult to establish general principles governing their use. We will try to back out some principles of visualization by considering examples of effective and ineffective uses of graphics in our own applied research. We consider connections between three goals […]

Le Monde puzzle [#1114]

Another very low-key arithmetic problem as Le Monde current mathematical puzzle: 32761 is 181² and the difference of two cubes, which ones? And 181=9²+10², the sum of two consecutive integers. Is this a general rule, i.e. the root z of a perfect square that is the difference of two cubes is always the sum of […]

Free R/datascience Extract: Evaluating a Classification Model with a Spam Filter

We are excited to share a free extract of Zumel, Mount, Practical Data Science with R, 2nd Edition, Manning 2019: Evaluating a Classification Model with a Spam Filter. This section reflects an important design decision in the book: teach model evaluation first, and as a step separate from model construction. It is funny, but it … Continue reading Free R/datascience Extract: Evaluating a Classification Model with a Spam Filter

A heart full of hatred: 8 schools edition

No; I was all horns and thorns Sprung out fully formed, knock-kneed and upright — Joanna Newsom Far be it for me to be accused of liking things. Let me, instead, present a corner of my hateful heart. (That is to say that I’m supposed to be doing a really complicated thing right now and […]