Category: Computing

Journalistic stunt with Emacs

Emacs has been called a text editor with ambitions of being an operating system, and some people semi-seriously refer to it as their operating system. Emacs does not want to be an operating system per se, but it is certainly ambitious. It can be a shell, a web browser, an email client, a calculator, a […]

Notes on computing hash functions

A secure hash function maps a file to a string of bits in a way that is hard to reverse. Ideally such a function has three properties: pre-image resistance, collision resistance, and second pre-image resistance. Pre-image resistance means that starting from the hash value, it is very difficult to infer what led to that output; it […]

Software to factor integers

In my previous post, I showed how changing one bit of a semiprime (i.e. the product of two primes) creates an integer that can be factored much faster. I started writing that post using Python with SymPy, but moved to Mathematica because factoring took too long. SymPy vs Mathematica When I’m working in Python, SymPy […]

Why are regular expressions difficult?

Regular expressions are challenging, but not for the reasons commonly given. Non-reasons Here are some reasons given for the difficulty of regular expressions that I don’t agree with. Cryptic syntax I think complaints about cryptic syntax miss the mark. Some people say that Greek is hard to learn because it uses a different alphabet. If […]

Protecting privacy while keeping detailed date information

A common attempt to protect privacy is to truncate dates to just the year. For example, the Safe Harbor provision of the HIPAA Privacy Rule says to remove “all elements of dates (except year) for dates that are directly related to an individual …” This restriction exists because dates of service can be used to […]

SQRL: Secure Quick Reliable Login

Steve Gibson’s Security Now is one of the podcasts I regularly listen to, and so I’ve been hearing him talk about his SQRL for a while. This week he finally released SQRL: Secure Quick Reliable Login. You can read more about SQRL in the white paper posted on the GRC web site. Here’s a tease […]

Random sampling from a file

I recently learned about the Linux command line utility shuf from browsing The Art of Command Line. This could be useful for random sampling. Given just a file name, shuf randomly permutes the lines of the file. With the option -n you can specify how many lines to return. So it’s doing sampling without replacement. […]
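The sampling behavior described in this excerpt can be sketched in Python. This is not how shuf is implemented; it is a minimal illustration of sampling lines without replacement. The function name `sample_lines` and the `seed` parameter are hypothetical, and capping the request at the number of available lines is a choice made for this sketch.

```python
import random

def sample_lines(lines, n, seed=None):
    """Return up to n lines chosen at random without replacement."""
    rng = random.Random(seed)
    # Cap the request so asking for more lines than exist returns them all.
    k = min(n, len(lines))
    # random.sample never picks the same position twice.
    return rng.sample(lines, k)

lines = ["alpha", "beta", "gamma", "delta", "epsilon"]
picked = sample_lines(lines, 3, seed=42)
print(picked)
```

Because the draw is without replacement, no line can appear in the output more often than it appears in the input.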

Cosmic rays flipping bits

A cosmic ray striking computer memory at just the right time can flip a bit, turning a 0 into a 1 or vice versa. While I knew that cosmic ray bit flips were a theoretical possibility, I didn’t know until recently that there had been documented instances on the ground. Radiolab did an episode on […]

Improving on the sieve of Eratosthenes

Ancient algorithm Eratosthenes had a good idea for finding all primes less than an upper bound N over 22 centuries ago. Make a list of the numbers 2 to N. Circle 2, then scratch out all the larger multiples of 2 up to N. Then move on to 3. Circle it, and scratch out all […]
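The classical procedure described in this excerpt can be sketched in a few lines of Python. This covers only the ancient algorithm, not whatever improvement the full post goes on to discuss. The function name `primes_up_to` is hypothetical, and stopping once p*p exceeds N is a standard shortcut that the excerpt does not mention.

```python
def primes_up_to(n):
    """Return all primes <= n using the sieve of Eratosthenes."""
    if n < 2:
        return []
    # is_prime[i] stays True until i is scratched out as a multiple.
    is_prime = [True] * (n + 1)
    is_prime[0] = is_prime[1] = False
    p = 2
    while p * p <= n:
        if is_prime[p]:
            # "Circle" p, then scratch out all larger multiples of p up to n.
            for m in range(2 * p, n + 1, p):
                is_prime[m] = False
        p += 1
    return [i for i, flag in enumerate(is_prime) if flag]

print(primes_up_to(30))
```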

A truly horrible random number generator

I needed a bad random number generator for an illustration, and chose RANDU, possibly the worst random number generator that was ever widely deployed. Donald Knuth comments on RANDU in the second volume of his magnum opus. When this chapter was first written in the late 1960’s, a truly horrible random number generator called RANDU […]
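For reference, RANDU is the multiplicative congruential generator x(n+1) = 65539 x(n) mod 2^31, started from an odd seed. The sketch below is a minimal Python version and does not come from the post; the function name `randu` is hypothetical. It also checks a well-known weakness that the excerpt does not mention: because 65539 = 2^16 + 3, every output satisfies x(n+2) = 6 x(n+1) - 9 x(n) mod 2^31.

```python
def randu(seed, count):
    """Generate `count` RANDU outputs starting from an odd seed."""
    x = seed
    out = []
    for _ in range(count):
        # Multiplicative congruential step: multiplier 65539, modulus 2^31.
        x = (65539 * x) % 2**31
        out.append(x)
    return out

xs = randu(1, 1000)

# Each value is determined by the two values before it.
linear = all(
    (xs[k + 2] - 6 * xs[k + 1] + 9 * xs[k]) % 2**31 == 0
    for k in range(len(xs) - 2)
)
print(xs[:3], linear)
```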

Assumed technologies

I just had a client ship me a laptop. We never discussed what OS the computer would run. I haven’t opened the box yet, but I imagine it’s running Windows 10. I’ve had clients assume I run Windows, but also others who assume I run Linux or Mac. I don’t recall anyone asking me whether […]

Digital signatures with oil and vinegar

“Unbalanced oil and vinegar” is a colorful name for a cryptographic signature method. This post will give a high-level description of the method and explain where the name comes from. The RSA encryption algorithm depends on the fact that computers can easily multiply enormous numbers, but they cannot efficiently factor the product of two enormous […]

Why isn’t CPU time more valuable?

Here’s something I find puzzling: why isn’t CPU time more valuable? I first thought about this when I was working for MD Anderson Cancer Center, maybe around 2002. Our research in adaptive clinical trial methods required bursts of CPU time. We might need hundreds of hours of CPU time for a simulation, then nothing while […]

Google Adiantum and the ChaCha RNG

The ChaCha cryptographic random number generator is in the news thanks to Google’s Adiantum project. I’ll discuss what’s going on, but first a little background. The name of the project comes from a genus of fern. More on that below as well. One-time pads The one-time pad is a provably unbreakable way to encrypt things. […]

Hash function menagerie

Here’s an oversimplified survey of cryptographic hash functions: Everyone used to use MD5, now they use some variation on SHA. There’s some truth to that. MD5 was very popular, and remains popular years after it was proven insecure. And now variations on SHA like SHA1 and SHA256 are commonly used. But there are a lot […]

Reversing an MD5 hash

The MD5 hashing algorithm was once considered a secure cryptographic hash, but those days are long gone [1]. For a given hash value, it doesn’t take much computing power to create a document with the same hash. Hash functions are not reversible in general. MD5 is a 128-bit hash, and so it maps any string, no […]
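The 128-bit output size is easy to confirm with Python's standard `hashlib` module. This is only a small illustration, with arbitrary sample inputs: inputs of very different lengths produce digests of the same length.

```python
import hashlib

short_digest = hashlib.md5(b"hello world").digest()
long_digest = hashlib.md5(b"x" * 1_000_000).digest()

# digest() returns raw bytes; 16 bytes is 128 bits.
short_bits = len(short_digest) * 8
long_bits = len(long_digest) * 8
print(short_bits, long_bits)
```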

Projecting Unicode to ASCII

Sometimes you need to downgrade Unicode text to more restricted ASCII text. For example, while working on my previous post, I was surprised that there didn’t appear to be an asteroid named after Poincaré. There is one, but it was listed as Poincare in my list of asteroid names. Python module I used the Python module unidecode […]
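The post uses the third-party unidecode module. As a rough standard-library alternative, and a different technique from unidecode, accents can be stripped by decomposing each character and discarding anything outside ASCII. The function name `to_ascii` is hypothetical.

```python
import unicodedata

def to_ascii(text):
    """Strip accents by decomposing characters and dropping non-ASCII parts."""
    # NFKD splits "é" into "e" followed by a combining acute accent.
    decomposed = unicodedata.normalize("NFKD", text)
    # Encoding with errors="ignore" discards the combining marks.
    return decomposed.encode("ascii", "ignore").decode("ascii")

print(to_ascii("Poincaré"))
```

This approach is cruder than a transliteration table: a character with no ASCII decomposition, such as "ø", is dropped entirely.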

Sine of a googol

How do you evaluate the sine of a large number in floating point arithmetic? What does the result even mean? Sine of a trillion Let’s start by finding the sine of a trillion (10^12) using floating point arithmetic. There are a couple ways to think about this. The floating point number t = 1.0e12 can only […]

Sequence alignment

In my previous post I illustrated the Levenshtein edit distance by comparing the opening paragraphs of Finnegans Wake by James Joyce and a parody by Adam Roberts. In this post I’ll show how to align two sequences using the sequence alignment algorithms of Needleman-Wunsch and Hirschberg. These algorithms can be used to compare any sequences, though they […]