# Author: John

## Airline flight number parity

I read in Wikipedia this morning that there’s a pattern to the parity of flight numbers. Among airline flight numbers, even numbers typically identify eastbound or northbound flights, and odd numbers typically identify westbound or southbound flights. I never noticed this. I could see how it might be a useful convention. It would mean that […]

## Testing Rupert Miller’s suspicion

I was reading Rupert Miller’s book *Beyond ANOVA* when I ran across this line:

> I never use the Kolmogorov-Smirnov test (or one of its cousins) or the χ² test as a preliminary test of normality. … I have a feeling they are more likely to detect irregularities in the middle of the distribution than in […]
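Here is a minimal sketch of the Kolmogorov-Smirnov statistic for normality, written in plain Python so the arithmetic is visible. The helper names `normal_cdf` and `ks_statistic` are hypothetical. The null distribution is assumed to be a fully specified normal, with mean and standard deviation known in advance, which is the setting the classical KS test assumes.

```python
import math
import random

def normal_cdf(x, mu=0.0, sigma=1.0):
    """CDF of a normal distribution, computed via the error function."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))

def ks_statistic(sample, cdf=normal_cdf):
    """Largest vertical gap between the empirical CDF of `sample` and `cdf`.

    The empirical CDF jumps from i/n to (i+1)/n at the i-th order statistic,
    so the supremum is attained just before or just after one of the jumps.
    """
    xs = sorted(sample)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs):
        f = cdf(x)
        d = max(d, (i + 1) / n - f, f - i / n)
    return d

if __name__ == "__main__":
    random.seed(1)
    sample = [random.gauss(0.0, 1.0) for _ in range(200)]
    print(ks_statistic(sample))
```

The empirical CDF at a point $x$ has variance $F(x)(1 - F(x))/n$. That variance is largest at the median and shrinks toward zero in the tails, which is one standard way to see why an unweighted statistic like this one pays little attention to the tails. This is not necessarily Miller's own reasoning.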

## Why would anyone do that?

There are tools that I’ve used occasionally for many years that I’ve just started to appreciate lately. “Oh, that’s why they did that.” When you see something that looks poorly designed, don’t just exclaim “Why would anyone do that?!” but ask sincerely “Why would someone do that?” There’s probably a good reason, or at least […]

## Predicted distribution of Mersenne primes

Mersenne primes are prime numbers of the form $2^p - 1$. It turns out that if $2^p - 1$ is prime, so is $p$; the requirement that $p$ is prime is a theorem, not part of the definition. So far 51 Mersenne primes have been discovered [1]. Maybe that’s all there are, but it is […]
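The standard way to test whether $2^p - 1$ is prime is the Lucas-Lehmer test. For an odd prime $p$, start with $s = 4$ and replace $s$ by $s^2 - 2 \bmod (2^p - 1)$ a total of $p - 2$ times. Then $2^p - 1$ is prime exactly when the result is 0.

Below is a minimal sketch with hypothetical function names. It is fine for small exponents but nothing like the optimized arithmetic that large-scale searches use.

```python
def is_prime(n):
    """Trial division; adequate for the small exponents used here."""
    if n < 2:
        return False
    d = 2
    while d * d <= n:
        if n % d == 0:
            return False
        d += 1
    return True

def mersenne_is_prime(p):
    """Lucas-Lehmer test for M_p = 2**p - 1, with p assumed prime."""
    if p == 2:
        return True  # M_2 = 3; the recurrence below needs p odd
    m = (1 << p) - 1
    s = 4
    for _ in range(p - 2):
        s = (s * s - 2) % m
    return s == 0

if __name__ == "__main__":
    # Only prime exponents need testing: if p = ab with 1 < a < p,
    # then 2**a - 1 is a proper divisor of 2**p - 1.
    exponents = [p for p in range(2, 128) if is_prime(p) and mersenne_is_prime(p)]
    print(exponents)  # [2, 3, 5, 7, 13, 17, 19, 31, 61, 89, 107, 127]
```

A prime exponent is necessary but not sufficient. For example, 11 is prime, but $2^{11} - 1 = 2047 = 23 \times 89$.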

## Short video introducing differential privacy

Here is a 12-minute video from Minute Physics, in collaboration with the US Census Bureau, giving an overview of differential privacy and how the 2020 census will use it to protect privacy.

Related posts:

- Scaling up differential privacy: lessons from the US Census
- Protecting privacy while keeping detailed date information
- Comparing differential privacy to Safe […]

## Collatz conjecture skepticism

The Collatz conjecture asks whether the following procedure always terminates at 1. Take any positive integer n. If it’s odd, multiply it by 3 and add 1. Otherwise, divide it by 2. For obvious reasons the Collatz conjecture is also known as the 3n + 1 conjecture. It has been computationally verified that the Collatz […]
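Below is a minimal sketch of the procedure. It also includes the usual shortcut for checking every starting value up to a bound: process $n = 2, 3, \ldots$ in order, and stop iterating as soon as the trajectory drops below its starting value, since every smaller value has already been checked. The function names are hypothetical, and the bound here is tiny compared with published verifications.

```python
def collatz_step(n):
    """One step of the 3n + 1 map."""
    return 3 * n + 1 if n % 2 else n // 2

def steps_to_one(n):
    """Number of steps for n to reach 1 (would loop forever on a counterexample)."""
    steps = 0
    while n != 1:
        n = collatz_step(n)
        steps += 1
    return steps

def verify_up_to(limit):
    """Check that every n in 1..limit reaches 1.

    Values are checked in increasing order, so once a trajectory from n
    falls below n it joins a trajectory already known to reach 1.
    Returns True if it finishes; a counterexample would make it run forever.
    """
    for n in range(2, limit + 1):
        m = n
        while m >= n:
            m = collatz_step(m)
    return True

if __name__ == "__main__":
    print(steps_to_one(27))      # 111
    print(verify_up_to(100_000))
```

27 is the classic example. It takes 111 steps and climbs as high as 9232 before reaching 1.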

## String interpolation in Python and R

One of the things I liked about Perl was string interpolation. If you use a variable name in a string, the variable will expand to its value. For example, if you have a variable \$x which equals 42, then the string “The answer is \$x.” will expand to “The answer is 42.” Perl requires variables to […]
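In Python, the closest analog to Perl's interpolation is an f-string, which expands expressions inside braces. The standard library's `string.Template` even uses Perl-like `$` placeholders, though the values must be passed explicitly. A small sketch:

```python
from string import Template

x = 42

# f-string (Python 3.6+): the expression in braces is evaluated in place.
print(f"The answer is {x}.")  # The answer is 42.

# string.Template uses $-style placeholders, closer to Perl's look,
# but the values are supplied explicitly rather than taken from scope.
print(Template("The answer is $x.").substitute(x=x))  # The answer is 42.
```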

## Detecting typos with the four color theorem

In my previous post on VIN numbers, I commented that if a check sum has to be one of 11 characters, it cannot detect all possible changes to a string from an alphabet of 33 characters. The number of possible check sum characters must be at least as large as the number of possible characters […]
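The counting argument here is the pigeonhole principle. Any rule that assigns one of 11 check values to each of 33 characters must give some check value to at least three characters, so substituting one of those characters for another at that position goes unnoticed.

The sketch below groups the VIN alphabet (digits plus the 23 letters other than I, O, and Q) by a hypothetical check contribution. Swapping in any other function with 11 possible outputs gives the same guarantee.

```python
import random
import string
from collections import defaultdict

# Digits plus capital letters except I, O, and Q: 10 + 23 = 33 characters.
ALPHABET = string.digits + "".join(c for c in string.ascii_uppercase if c not in "IOQ")

def buckets(alphabet, contribution):
    """Group characters by the check value they contribute."""
    groups = defaultdict(list)
    for ch in alphabet:
        groups[contribution(ch)].append(ch)
    return dict(groups)

def largest_bucket(alphabet, contribution):
    """Size of the biggest group of mutually indistinguishable characters."""
    return max(len(g) for g in buckets(alphabet, contribution).values())

if __name__ == "__main__":
    # Hypothetical contribution: the character's index modulo 11.
    print(buckets(ALPHABET, lambda ch: ALPHABET.index(ch) % 11))
    # Any function into 11 values has a bucket of size >= ceil(33 / 11) = 3.
    table = {ch: random.randrange(11) for ch in ALPHABET}
    print(largest_bucket(ALPHABET, table.__getitem__))
```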

## Vehicle Identification Number (VIN) check sum

A VIN (vehicle identification number) is a string of 17 characters that uniquely identifies a car or motorcycle. These numbers are used around the world and have three standardized formats: one for North America, one for the EU, and one for the rest of the world.

### Letters that resemble digits

The characters used in a […]
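Below is a minimal sketch of the North American check digit, which sits in position 9. Each character is mapped to a number: digits to themselves, letters by a fixed transliteration table. Each value is multiplied by a position weight and the products are summed. The sum mod 11 is the check digit, with 10 written as X.

The table and weights below are the commonly published ones, and the function names are hypothetical. The sketch assumes the VIN uses only valid characters; an I, O, or Q raises a `KeyError`.

```python
# Letter values used for the check digit; I, O, and Q never appear in a VIN.
TRANSLITERATION = {
    "A": 1, "B": 2, "C": 3, "D": 4, "E": 5, "F": 6, "G": 7, "H": 8,
    "J": 1, "K": 2, "L": 3, "M": 4, "N": 5, "P": 7, "R": 9,
    "S": 2, "T": 3, "U": 4, "V": 5, "W": 6, "X": 7, "Y": 8, "Z": 9,
}
# Position weights; position 9 (the check digit itself) has weight 0.
WEIGHTS = [8, 7, 6, 5, 4, 3, 2, 10, 0, 9, 8, 7, 6, 5, 4, 3, 2]

def char_value(ch):
    """Numeric value of one VIN character."""
    return int(ch) if ch.isdigit() else TRANSLITERATION[ch]

def check_digit(vin):
    """Compute the expected check character for a 17-character VIN."""
    vin = vin.upper()
    if len(vin) != 17:
        raise ValueError("a VIN has 17 characters")
    total = sum(char_value(c) * w for c, w in zip(vin, WEIGHTS))
    r = total % 11
    return "X" if r == 10 else str(r)

def is_valid(vin):
    """True if position 9 holds the expected check character."""
    return check_digit(vin) == vin.upper()[8]

if __name__ == "__main__":
    print(is_valid("1M8GDM9AXKP042788"))  # True
```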

## Progress on the Collatz conjecture

The Collatz conjecture is for computer science what until recently Fermat’s last theorem was for mathematics: a famous unsolved problem that is very simple to state. The Collatz conjecture, also known as the 3n+1 problem, asks whether the following function terminates for all positive integer arguments n.

```python
def collatz(n):
    if n == 1:
        return 1
    # […]
```

## How UTF-8 works

UTF-8 is a clever way of encoding Unicode text. I’ve mentioned it a couple times lately, but I haven’t blogged about UTF-8 per se. Here goes.

### The problem UTF-8 solves

US keyboards can often produce 101 symbols, which suggests 101 symbols would be enough for most English text. Seven bits would be enough to encode […]
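Below is a minimal sketch of the encoding rules. Code points below 0x80 take a single byte, so ASCII text is unchanged. Larger code points are split across 2, 3, or 4 bytes: the first byte's leading 1 bits give the sequence length, and each continuation byte starts with the bits 10. The function name is hypothetical, and the result is compared with Python's built-in encoder.

```python
def utf8_encode(cp):
    """Encode one Unicode code point as UTF-8 bytes."""
    if cp < 0:
        raise ValueError("negative code point")
    if cp < 0x80:            # 0xxxxxxx
        return bytes([cp])
    if cp < 0x800:           # 110xxxxx 10xxxxxx
        return bytes([0xC0 | (cp >> 6), 0x80 | (cp & 0x3F)])
    if cp < 0x10000:         # 1110xxxx 10xxxxxx 10xxxxxx
        if 0xD800 <= cp <= 0xDFFF:
            raise ValueError("surrogate code points cannot be encoded")
        return bytes([0xE0 | (cp >> 12),
                      0x80 | ((cp >> 6) & 0x3F),
                      0x80 | (cp & 0x3F)])
    if cp <= 0x10FFFF:       # 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
        return bytes([0xF0 | (cp >> 18),
                      0x80 | ((cp >> 12) & 0x3F),
                      0x80 | ((cp >> 6) & 0x3F),
                      0x80 | (cp & 0x3F)])
    raise ValueError("code point beyond U+10FFFF")

if __name__ == "__main__":
    for ch in "Aé€😀":
        encoded = utf8_encode(ord(ch))
        print(ch, encoded.hex(" "), encoded == ch.encode("utf-8"))
```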

## Excel, R, and Unicode

I received some data as an Excel file recently. I cleaned things up a bit, exported the data to a CSV file, and read it into R. Then something strange happened.
Say the CSV file looked like this:

```
foo,bar
1,2
3,4
```

I read the file into R with…

## How fast were dead languages spoken?

A new paper in Science suggests that all human languages carry about the same amount of information per unit time. In languages with fewer possible syllables, people speak faster. In languages with more syllables, people speak slower. Researchers quantified the information content per syllable in 17 different languages by calculating Shannon entropy. When you multiply […]
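The basic quantity is Shannon entropy, $H = -\sum_i p_i \log_2 p_i$ bits, computed from how often each syllable occurs. Multiplying bits per syllable by syllables per second gives bits per second.

Below is a minimal sketch with made-up toy data and hypothetical function names. It uses a plain frequency count and does not reproduce the paper's own estimation method.

```python
import math
from collections import Counter

def entropy_bits(tokens):
    """Shannon entropy, in bits, of the empirical distribution of tokens."""
    counts = Counter(tokens)
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

def info_rate(tokens, syllables_per_second):
    """Bits per second = bits per syllable * syllables per second."""
    return entropy_bits(tokens) * syllables_per_second

if __name__ == "__main__":
    # Toy syllable stream and speaking rate, invented for illustration.
    syllables = ["ka", "ta", "ka", "mi", "ta", "ka", "no", "mi"]
    print(entropy_bits(syllables))
    print(info_rate(syllables, 7.0))
```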

## Quiet mode

When you start a programming language like Python or R from the command line, you get a lot of initial text that you probably don’t read. For example, you might see something like this when you start Python.

```
Python 2.7.6 (default, Nov 23 2017, 15:49:48)
[GCC 4.8.4] on linux2
Type "help", "copyright", "credits" or "license" […]
```

## More bc weirdness

As I mentioned in a footnote to my previous post, I just discovered that variable names in the bc programming language cannot contain capital letters. I think I understand why: Capital letters are reserved for hexadecimal constants, though in a weird sort of way. At first variable names in bc could only be one letter […]

## Asimov’s question about π

In 1977, Isaac Asimov [1] asked how many terms of the slowly converging series π = 4 – 4/3 + 4/5 – 4/7 + 4/9 – … you would have to sum before doing better than the approximation π ≈ 355/113. A couple years later Richard Johnsonbaugh [2] answered Asimov’s question in the course of […]
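Here is a quick way to get a feel for the size of the answer. After N terms of this series, the error is close to 1/N in absolute value, while 355/113 is off by less than $3 \times 10^{-7}$, so the number of terms needed is in the millions.

The sketch below prints both errors for a few values of N; the function name is hypothetical. It only estimates the crossover and does not reproduce Johnsonbaugh's exact analysis.

```python
import math

def leibniz_partial_sum(n_terms):
    """Sum of the first n_terms of 4 - 4/3 + 4/5 - 4/7 + ..."""
    total = 0.0
    for k in range(n_terms):
        total += (-1) ** k * 4.0 / (2 * k + 1)
    return total

if __name__ == "__main__":
    target = abs(355 / 113 - math.pi)
    print(f"355/113 error: {target:.3e}")
    for n in (10, 1_000, 100_000):
        err = abs(leibniz_partial_sum(n) - math.pi)
        print(f"{n:>7} terms: error {err:.3e}, 1/N = {1 / n:.3e}")
    # Since the error is roughly 1/N, roughly 1/target terms are needed.
    print(f"rough estimate: {1 / target:,.0f} terms")
```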

## National Drug Code (NDC)

The US Food and Drug Administration tracks drugs using an identifier called the NDC or National Drug Code. It is described as a 10-digit code, but it may be more helpful to think of it as a 12-character code. An NDC contains 10 digits, separated into three segments by two dashes. The three segments are […]
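Below is a sketch of checking the 12-character form: 10 digits plus two dashes, giving three nonempty segments. The allowed segment lengths used here (4-4-2, 5-3-2, and 5-4-1) are an assumption and should be checked against FDA documentation. The function name and the sample codes are hypothetical.

```python
import re

# Assumed allowed layouts: (labeler, product, package) segment lengths.
ALLOWED_LAYOUTS = {(4, 4, 2), (5, 3, 2), (5, 4, 1)}

def parse_ndc(code):
    """Split a dashed NDC into its three segments, or raise ValueError."""
    # 12 characters with exactly two dashes means exactly 10 ASCII digits.
    if len(code) != 12 or not re.fullmatch(r"[0-9]+-[0-9]+-[0-9]+", code):
        raise ValueError(f"not a 12-character dashed NDC: {code!r}")
    segments = code.split("-")
    layout = tuple(len(s) for s in segments)
    if layout not in ALLOWED_LAYOUTS:
        raise ValueError(f"unexpected segment lengths {layout} in {code!r}")
    return segments

if __name__ == "__main__":
    print(parse_ndc("12345-678-90"))  # made-up code with a 5-3-2 layout
```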

## Prefix code examples

In many offices, you can dial a three-digit number to reach someone else in the office. In such offices, you usually have to dial 9 to reach an outside number. There’s no ambiguity because no one can have an extension that begins with 9. After you’ve entered three digits, the phone system knows […]
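The property the phone system relies on is that no valid dialing sequence is a prefix of another. That lets it act as soon as the digits typed so far match a complete code. Below is a minimal sketch with a made-up set of codes (three-digit extensions plus 9 for an outside line); the function names are hypothetical.

```python
def is_prefix_code(codes):
    """True if no code is a prefix of another.

    After sorting, any code that is a prefix of another is also a
    prefix of the code immediately after it, so adjacent pairs suffice.
    """
    words = sorted(codes)
    return all(not b.startswith(a) for a, b in zip(words, words[1:]))

def split_dialed(digits, codes):
    """Cut a digit string into codes, emitting each code as soon as it is complete."""
    result, current = [], ""
    for d in digits:
        current += d
        if current in codes:
            result.append(current)
            current = ""
    if current:
        raise ValueError(f"incomplete code at end: {current!r}")
    return result

if __name__ == "__main__":
    codes = {"101", "102", "215", "9"}      # hypothetical office
    print(is_prefix_code(codes))            # True: no extension starts with 9
    print(split_dialed("1019215", codes))   # ['101', '9', '215']
    print(is_prefix_code(codes | {"911"}))  # False: "9" is a prefix of "911"
```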

## Number of possible Unicode characters

How many? The previous post showed how the number of Unicode characters has grown over time. You’ll notice there was a big jump between versions 3.0 and 3.1. That will be important later on. Unicode started out relatively small then became much more ambitious. Are they going to run out of room? How many possible […]
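Here is the arithmetic of the code point range as a sketch. Code points run from U+0000 to U+10FFFF, which is 17 planes of 65,536 code points each. Of these, 2,048 (U+D800 to U+DFFF) are reserved as UTF-16 surrogates and never assigned to characters. Python's `chr` enforces the same upper bound.

```python
PLANES = 17
PLANE_SIZE = 2 ** 16
SURROGATES = 0xDFFF - 0xD800 + 1  # reserved for UTF-16, never characters

code_points = PLANES * PLANE_SIZE
scalar_values = code_points - SURROGATES

print(code_points, hex(code_points - 1))  # 1114112 0x10ffff
print(scalar_values)                      # 1112064

# Python's chr() accepts exactly the code point range U+0000..U+10FFFF.
chr(0x10FFFF)
try:
    chr(0x110000)
except ValueError as e:
    print("chr(0x110000) fails:", e)
```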

## Growth of Unicode over time

My previous post quoted Randall Munroe saying Unicode “started out just trying to unify a couple different character sets” and grew much more ambitious. The first version of Unicode, published in 1991, had 7,191 characters. Now the latest version has 137,994 characters and so is about 19 times bigger. Here’s a plot of the number […]