Efficient modular arithmetic technique for Curve25519

Daniel Bernstein’s Curve25519 is the elliptic curve

y² = x³ + 486662x² + x

over the prime field with order p = 2255 – 19. The curve is a popular choice in elliptic curve cryptography because its design choices are transparently justified [1] and because cryptography over the curve can be implemented very efficiently. This post will concentrate on one of the tricks that makes ECC over Curve25519 so efficient.

Curve25519 was designed for fast and secure cryptography. One of the things that make it fast is the clever way Bernstein carries out arithmetic mod 2255 – 19 which he describes here.

Bernstein represents numbers mod 2255 – 19 by polynomials whose value at 1 gives the number. That alone is not remarkable, but his choice of representation seems odd until you learn why it was chosen. Each number is represented as a polynomial of the form

ui xi

where each ui is an integer multiple ki of 2⌈25.5i, and each ki is an integer between -225 and 225 inclusive.

Why this limitation on the k‘s? Pentium cache optimization. In Bernstein’s words:

Why split 255-bit integers into ten 26-bit pieces, rather than nine 29-bit pieces or eight 32-bit pieces? Answer: The coefficients of a polynomial product do not fit into the Pentium M’s fp registers if pieces are too large. The cost of handling larger coefficients outweighs the savings of handling fewer coefficients.

And why unevenly spaced powers of 2: 1, 226, 251, 277, …, 2230? Some consecutive exponents differ by 25 and some by 26. This looks sorta like a base 225 or base 226 representation, but is a mixture of both. Bernstein answers this in his paper.

Bernstein answers this question as well.

Given that there are 10 pieces, why use radix 225.5 rather than, e.g., radix 225 or radix 226? Answer: My ring R contains 2255x10 − 19, which represents 0 in Z/(2255 − 19). I will reduce polynomial products modulo 2255x10 – 19 to eliminate the coefficients of x10, x11, etc. With radix 225 , the coefficient of x10 could not be eliminated. With radix 226, coefficients would have to be multiplied by 2519 rather than just 19, and the results would not fit into an fp register.

There are a few things to unpack here.

Remember that we’re turning polynomials in to numbers by evaluating them at 1. So when x = 1, 2255x10 – 19  = p = 2255 – 19, which is the zero in the integers mod  2255 – 19.

If we were using base (radix) 225 , the largest number we could represent with a 9th degree polynomial with the restrictions above would be 2250 , so we’d need a 10th degree polynomial; we couldn’t eliminate terms containing x10.

I don’t yet see why working with radix 226 would overflow an fp register. If you do see why, please leave an explanation in the comments.

Related posts

[1] When a cryptographic method has an unjustified parameter, it invites suspicion that the parameter was chosen to create an undocumented back door. This is not the case with Curve25519. For example, why does it use p = 2255 – 19? It’s efficient to use a prime close to a large power of 2, and this p is the closes prime to 2255. The coefficient 486662 is not immediately obvious, but Bernstein explains in his paper how it was the smallest integer that met his design criteria.