Suppose you want to send pictures from Jupiter back to Earth. A lot could happen as a bit travels across the solar system, and so you need some way of correcting errors, or at least detecting errors. The simplest thing to do would be to transmit photos twice. If a bit is received the same […]
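As a rough illustration of the transmit-twice idea, here is a minimal Python sketch that compares two received copies and reports the bit positions where they disagree. The function name `mismatched_bits` and the most-significant-bit-first numbering are assumptions made for this example, not something taken from the post. Note that a disagreement only flags an error: with two copies there is no way to tell which one is right, and an error that hits the same bit in both copies goes unnoticed.

```python
def mismatched_bits(first: bytes, second: bytes) -> list[int]:
    """Return the bit positions at which two received copies disagree.

    Bits are numbered from 0, most significant bit of the first byte first
    (a convention chosen for this sketch).
    """
    if len(first) != len(second):
        raise ValueError("both copies must have the same length")

    positions = []
    for byte_index, (a, b) in enumerate(zip(first, second)):
        diff = a ^ b  # XOR leaves a 1 wherever the two copies differ
        for bit in range(8):
            if diff & (0x80 >> bit):
                positions.append(byte_index * 8 + bit)
    return positions


if __name__ == "__main__":
    sent = b"\x12\x34"
    copy_a = b"\x12\x34"   # arrived intact
    copy_b = b"\x12\x35"   # last bit flipped in transit
    print(mismatched_bits(copy_a, copy_b))
```

An empty list means the two copies agree everywhere, which is evidence (not proof) that the transmission was clean.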
Category: Statistics
The real lesson learned from those academic hoaxes: a key part of getting a paper published in a scholarly journal is to be able to follow the conventions of the journal. And some people happen to be good at that, irrespective of the content of the papers being submitted.
I wrote this email to a colleague: Someone pointed me to this paper. It’s really bad. It was published by The Review of Environmental Economics and Policy, “the official journal of the Association of Environmental and Resource Economists and the European Association of Environmental and Resource Economists.” Is this a real organization? The whole thing […]
Chinese character frequency and entropy
Yesterday I wrote a post looking at the frequency of Koine Greek letters and the corresponding entropy. David Littleboy asked what an analogous calculation would look like for a language like Japanese. This post answers that question. First of all, information theory defines the Shannon entropy of an “alphabet” to be $H = -\sum_i p_i \log_2 p_i$ bits, where $p_i$ is […]
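For readers who want to try the calculation, the following is a small Python sketch of the standard Shannon entropy, H = -Σ p_i log2(p_i), with each p_i estimated as the relative frequency of a symbol in a sample text. The function name `shannon_entropy` and the sample strings are assumptions for illustration only; the post's own data and script are not reproduced here.

```python
import math
from collections import Counter


def shannon_entropy(text: str) -> float:
    """Entropy in bits per symbol, using observed symbol frequencies."""
    counts = Counter(text)
    total = sum(counts.values())
    # Sum p * log2(p) over every symbol that actually occurs, then negate.
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


if __name__ == "__main__":
    # Four equally likely symbols carry 2 bits each.
    print(shannon_entropy("abcd"))
    # A single repeated symbol carries no information.
    print(shannon_entropy("aaaa"))
```

The same function works unchanged on Chinese or Japanese text, since Python strings are sequences of Unicode code points; whether a code point is the right unit of "alphabet" is a modelling choice.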
Forecasts are always wrong
Recently I was interviewed for the Monash Business School podcast “Thought Capital” on the topic of forecasting. You can listen to the episode here (or read the transcript).
three birthdays and a numeral
The riddle of the week on The Riddler was to find the size n of an audience for at least a 50% chance of observing at least one triplet of people sharing a birthday, as is the case in the present U.S. Senate. The question is much harder to solve than for a pair of […]
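One way to check an answer to this riddle is an exact count, sketched below in Python. It assumes 365 equally likely birthdays and independent people, which is a simplification, and the function names are made up for this example. The idea is to count the assignments in which no day is shared by three or more people: pick k days that hold exactly two people and n - 2k days that hold exactly one, then count the ways to place the people.

```python
from fractions import Fraction
from math import factorial


def prob_no_triple(n: int, days: int = 365) -> Fraction:
    """Exact probability that no birthday is shared by 3+ of n people."""
    favourable = 0
    for pairs in range(n // 2 + 1):
        singles = n - 2 * pairs
        empty = days - pairs - singles
        if empty < 0:
            continue  # not enough days for this split
        # Ways to choose which days hold a pair, a single person, or nobody.
        ways_days = factorial(days) // (
            factorial(pairs) * factorial(singles) * factorial(empty)
        )
        # Ways to distribute the n people over those slots
        # (the two members of a pair are interchangeable).
        ways_people = factorial(n) // 2**pairs
        favourable += ways_days * ways_people
    return Fraction(favourable, days**n)


def smallest_audience(threshold: Fraction = Fraction(1, 2), days: int = 365) -> int:
    """Smallest n with P(at least one triple) >= threshold."""
    n = 1
    while 1 - prob_no_triple(n, days) < threshold:
        n += 1
    return n


if __name__ == "__main__":
    print(smallest_audience())
```

Exact rational arithmetic avoids any rounding worries at the cost of some large integers; a Monte Carlo simulation would be a reasonable alternative.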
Practical Data Science with R 2nd Edition update
We are in the last stages of proofing the galleys/typesetting of Zumel, Mount, Practical Data Science with R, 2nd Edition, Manning 2019. So this edition will definitely be out soon! If you ever wanted to see what Nina Zumel and John Mount are like when we have the help of editors, this book is your …
“Here’s an interesting story right in your sweet spot”
Jonathan Falk writes: Here’s an interesting story right in your sweet spot: Large effects from something whose possible effects couldn’t be that large? Check. Finding something in a sample of 1024 people that requires 34,000 to gain adequate power? Check. Misuse of p values? Check. Science journalist hype? Check. Searching for the cause of an […]
and it only gets worse [verbatim]
The science of snow
Kenneth G. Libbrecht has posted a 523-page book on snow to arXiv.
Greek letter frequency and entropy
Would the letters in an ancient Greek text carry more or less information than in modern English? To address this question, I downloaded a copy of the Greek New Testament from Project Gutenberg and ran the word frequency script from my previous post. This led to the following table of letters and percent frequency. α […]
Non-Gaussian forecasting using fable
library(tidyverse) library(tsibble) library(lubridate) library(feasts) library(fable) In my previous post about the new fable package, we saw how fable can produce forecast distributions, not just point forecasts. All my examples used Gaussian (normal)…
stochastic magnetic bits, simulated annealing and Gibbs sampling
A paper by Borders et al. in the 19 September issue of Nature offers an interesting mix of computing and electronics and optimisation. With two preparatory tribunes! One [rather overdone] on Feynman’s quest. As a possible alternative to quantum computers for creating probabilistic bits. And making machine learning (as an optimisation program) more efficient. And […]
File character counts
Once in a while I need to know what characters are in a file and how often each appears. One reason I might do this is to look for statistical anomalies. Another reason might be to see whether a file has any characters it’s not supposed to have, which is often the case. A few […]
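A character tally of this kind takes only a few lines of Python. The sketch below is one possible approach, not the method used in the post; the function names and the UTF-8 default are assumptions for this example.

```python
from collections import Counter


def char_counts(text: str) -> Counter:
    """Count how often each character appears in a string."""
    return Counter(text)


def file_char_counts(path: str, encoding: str = "utf-8") -> Counter:
    """Count characters in a text file (the encoding is an assumption)."""
    with open(path, encoding=encoding) as handle:
        return char_counts(handle.read())


def unexpected_chars(counts: Counter, allowed: set[str]) -> dict[str, int]:
    """Return the characters, with counts, that are not in the allowed set."""
    return {ch: n for ch, n in counts.items() if ch not in allowed}


if __name__ == "__main__":
    counts = char_counts("banana\u00a0split")
    # Most frequent characters first.
    for ch, n in counts.most_common(3):
        print(repr(ch), n)
    # Flag anything outside lowercase ASCII letters, e.g. a stray
    # non-breaking space.
    print(unexpected_chars(counts, set("abcdefghijklmnopqrstuvwxyz")))
```

Printing with `repr` makes invisible characters such as tabs or non-breaking spaces show up explicitly.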
The status-reversal heuristic
A while ago we came up with the time-reversal heuristic, which was a reaction to the common situation that there’s a noisy study, followed by an unsuccessful replication, but all sorts of people want to take the original claim as the baseline and construct high walls to make it difficult to move away from that claim. […]
My talk on visualization and data science this Sunday 9am
Uncovering Principles of Statistical Visualization Visualizations are central to good statistical workflow, but it has been difficult to establish general principles governing their use. We will try to back out some principles of visualization by considering examples of effective and ineffective uses of graphics in our own applied research. We consider connections between three goals […]
The Current State of Play in Statistical Foundations: A View From a Hot-Air Balloon
Continue to the third, and last stop of Excursion 1 Tour I of Statistical Inference as Severe Testing: How to Get Beyond the Statistics Wars (2018, CUP)–Section 1.3. It would be of interest to ponder if (and how) the current state of play in the stat wars has shifted in just one year. I’ll do […]
Le Monde puzzle [#1114]
Another very low-key arithmetic problem as Le Monde’s current mathematical puzzle: 32761 is 181² and the difference of two cubes, which ones? And 181=9²+10², the sum of the squares of two consecutive integers. Is this a general rule, i.e. the root z of a perfect square that is the difference of two cubes is always the sum of […]
The virtue of fake universes: A purposeful and safe way to explain empirical inference.
I keep being drawn to thinking there is a way to explain statistical reasoning to others that will actually do more good than harm. Now, I also keep thinking I should know better – but can’t stop. My recent attempt starts with a shadow metaphor, then a review of analytical chemistry and moves to the […]
Free R/datascience Extract: Evaluating a Classification Model with a Spam Filter
We are excited to share a free extract of Zumel, Mount, Practical Data Science with R, 2nd Edition, Manning 2019: Evaluating a Classification Model with a Spam Filter. This section reflects an important design decision in the book: teach model evaluation first, and as a step separate from model construction. It is funny, but it …
A heart full of hatred: 8 schools edition
No; I was all horns and thorns Sprung out fully formed, knock-kneed and upright — Joanna Newsom Far be it for me to be accused of liking things. Let me, instead, present a corner of my hateful heart. (That is to say that I’m supposed to be doing a really complicated thing right now and […]