Author: John

Hashing names does not protect privacy

Secure hash functions are practically impossible to reverse, but only if the input is unrestricted. If you generate 256 random bits and apply a secure 256-bit hash algorithm, an attacker wanting to recover your input can’t do much better than brute force, hashing 256-bit strings hoping to find one that matches your hash value. Even […]

May the best technology win

I’ve become skeptical of arguments of the form “X is a better technology, but people won’t quit using Y.” Comparisons of technologies are multi-faceted. When someone says “X is better than Y” I want to ask “By all criteria? There’s nothing better about Y?” When people say X is better but Y won, it’s often […]

Integral approximation trick

Here’s a simple integration approximation that works remarkably well in some contexts. Suppose you have an integrand that looks roughly like a normal density, something with a single peak that drops off fairly quickly on either side of the peak. The majority of integrals that arise in basic applications of probability and statistics fit this […]

Number of feet in a mile

Here are a couple amusing things I’ve run across recently regarding the number of feet in a mile. Both are frivolous, but also have a more serious side. Mnemonic First, you can use “five tomatoes” as a mnemonic for remembering that there are 5280 feet in a mile. “Five tomatoes” is a mnemonic for the […]

Discriminant of a cubic

The discriminant of a quadratic equation ax² + bx + c = 0 is Δ = b² – 4ac. If the discriminant Δ is zero, the equation has a double root, i.e. there is a unique x that makes the equation zero, and it counts twice as a root. If the discriminant is not zero, […]

Distribution of quadratic residues

Let p be an odd prime number. If the equation x² = n mod p has a solution then n is a square mod p, or in classical terminology, n is a quadratic residue mod p. Half of the numbers between 0 and p are quadratic residues and half are not. The residues are distributed […]

What does CCPA say about de-identified data?

The California Consumer Privacy Act, or CCPA, takes effect January 1, 2020, less than six months from now. What does the act say about using deidentified data? First of all, I am not a lawyer; I work for lawyers, advising them on matters where law touches statistics. This post is not legal advice, but my […]

Serious applications of a party trick

In a group of 30 people, it’s likely that two people have the same birthday. For a group of 23 the probability is about 1/2, and it goes up as the group gets larger. In a group of a few dozen people, it’s unlikely that anyone will have a particular birthday, but it’s likely that […]

Channel quantity and quality

Years ago, when there were a couple dozen television stations, someone [1] speculated that when we got more channels we’d also get better content. The argument was that people are more similar in the base interests than in their more refined interests. Therefore if there are only a few channels, they will all appeal to […]

Symmetry in exponential sums

Today’s exponential sum is highly symmetric: These sums are often symmetric, but not always. For example, here’s the sum from a couple days ago: It’s not obvious from looking at the parameters whether a sum will be symmetric or not. Maybe someone could find a prove criteria for a sum to have certain symmetries. For […]

Proto-calculus

David Bressoud has written a new book entitled Calculus Reordered: A History of the Big Ideas. He presents the major themes of calculus in historical order, which is very different from the order in which it is now taught. We now begin with limits, then differentiation, integration, and infinite series. Historically, integration came first and […]

Homomorphic encryption

A function that satisfies f(x*y) = f(x)*f(y) is called a homomorphism. The symbol “*” can stand for any operation, and it need not stand for the same thing on both sides of the equation. Technically * is the group operation, and if the function f maps elements of one group to another, the group operation […]

Journalistic stunt with Emacs

Emacs has been called a text editor with ambitions of being an operating system, and some people semi-seriously refer to it as their operating system. Emacs does not want to be an operating system per se, but it is certainly ambitious. It can be a shell, a web browser, an email client, a calculator, a […]

Notes on computing hash functions

A secure hash function maps a file to a string of bits in a way that is hard to reverse. Ideally such a function has three properties: pre-image resistance collision resistance second pre-image resistance Pre-image resistance means that starting from the hash value, it is very difficult to infer what led to that output; it […]

Translating poetry

You can’t preserve every aspect of a text when translating. A strict word-for-word translation attempts to be faithful to the words but may be ungrammatical in the target language. A idea-for-idea translation is more readable, but still may not convey the style of the original. Translation reminds me of making maps. There have been countless […]

Category theory for data science: cautious optimism

I’m cautiously optimistic about applications of category theory. I believe category theory has the potential to be useful, but I’m skeptical of most claims to have found useful applications. Category theory has found useful application, especially behind the scenes, but a lot of supposed applications remind me of a line from Colin McLarty: [Jean-Pierre] Serre […]

Software to factor integers

In my previous post, I showed how changing one bit of a semiprime (i.e. the product of two primes) creates an integer that can be factored much faster. I started writing that post using Python with SymPy, but moved to Mathematica because factoring took too long. SymPy vs Mathematica When I’m working in Python, SymPy […]

Making public keys factorable with Rowhammer

The security of RSA encryption depends on the fact that the product of two large primes is difficult to factor. So if p and q are large primes, say 2048 bits each, then you can publish n = pq with little fear that someone can factor n to recover p and q. But if you […]

Bounds on the nth prime

The nth prime is approximately n log n. For more precise estimates, there are numerous upper and lower bounds for the nth prime, each tighter over some intervals than others. Here I want to point out upper and lower bounds from a dissertation by Christian Axler on page viii. First, define Then for sufficiently large […]

Converting between nines and sigmas

Nines and sigmas are two ways to measure quality. You’ll hear something has four or five nines of reliability or that some failure is a five sigma event. What do these mean, and how do you convert between them? Definitions If a system has fives nines of availability, that means the probability of the system […]