Starting With Data Science A rigorous hands-on introduction to data science for engineers. Win Vector LLC is now offering a 4 day on-site intensive data science course. The course targets engineers familiar with Python and introduces them to the basics of current data science practice. This is designed as an interactive in-person (not remote or … Continue reading Starting With Data Science: A Rigorous Hands-On Introduction to Data Science for Engineers

# Category: Python

## An infinite product challenge

Gil Kalai wrote a blog post yesterday entitled “Test Your Intuition (or knowledge, or programming skills) 36.” The challenge is to evaluate the infinite product I imagine there’s an elegant analytical solution, but since the title suggested that programming might suffice, I decided to try a little Python. I used primerange from SymPy to generate […]

## Implementing the ChaCha RNG in Python

My previous post talked about the ChaCha random number generator and how Google is using it in a stream cipher for encryption on low-end devices. This post talks about how to implement ChaCha in pure Python. First of all, the only reason to implement ChaCha in pure Python is to play with it. It would […]

## Gartner’s 2019 Take on Data Science Software

I’ve just updated The Popularity of Data Science Software to reflect my take on Gartner’s 2019 report, Magic Quadrant for Data Science and Machine Learning Platforms. To save you the trouble of digging through all 40+ pages of my report, here’s just the updated section: Continue reading →

## Computing Legendre and Jacobi symbols

In a earlier post I introduce the Legendre symbol where a is a positive integer and p is prime. It is defined to be 0 if a is a multiple of p, 1 if a has a square root mod p, and -1 otherwise. The Jacobi symbol is a generalization of the Legendre symbol and uses the same notation. It […]

## Exploring the sum-product conjecture

Quanta Magazine posted an article yesterday about the sum-product problem of Paul Erdős and Endre Szemerédi. This problem starts with a finite set of real numbers A then considers the size of the sets A+A and A*A. That is, if we add every element of A to every other element of A, how many distinct sums are there? If we […]

## R Tip: Use Inline Operators For Legibility

R Tip: use inline operators for legibility. A Python feature I miss when working in R is the convenience of Python‘s inline + operator. In Python, + does the right thing for some built in data types: It concatenates lists: [1,2] + [3] is [1, 2, 3]. It concatenates strings: ‘a’ + ‘b’ is ‘ab’. … Continue reading R Tip: Use Inline Operators For Legibility

## Projecting Unicode to ASCII

Sometimes you need to downgrade Unicode text to more restricted ASCII text. For example, while working on my previous post, I was surprised that there didn’t appear to be an asteroid named after Poincaré. There is one, but it was listed as Poincare in my list of asteroid names. Python module I used the Python module unidecode […]

## Timing the Same Algorithm in R, Python, and C++

While developing the RcppDynProg R package I took a little extra time to port the core algorithm from C++ to both R and Python. This means I can time the exact same algorithm implemented nearly identically in each of these three languages. So I can extract some comparative “apples to apples” timings. Please read on … Continue reading Timing the Same Algorithm in R, Python, and C++

## Groups of order 2019

How many groups have 2019 elements? What are these groups? 2019 is a semiprime, i.e. the product of two primes, 3 and 673. For every semiprime s, there are either one or two distinct groups of order s. As explained here, if s = pq with p > q, all groups of order s are isomorphic if q is not a factor of p-1. […]

## Check sums and error detection

The previous post looked at Crockford’s base 32 encoding, a minor variation on the way math conventionally represents base 32 numbers, with concessions for human use. By not using the letter O, for example, it avoids confusion with the digit 0. Crockford recommends the following check sum procedure, a simple error detection code: The check […]

## Base 32 and base 64 encoding

Math has a conventional way to represent numbers in bases larger than 10, and software development has a couple variations on this theme that are only incidentally mathematical. Math convention By convention, math books typically represent numbers in bases larger than 10 by using letters as new digit symbols following 9. For example, base 16 […]

## RSA with one shared prime

The RSA encryption setup begins by finding two large prime numbers. These numbers are kept secret, but their product is made public. We discuss below just how difficult it is to recover two large primes from knowing their product. Suppose two people share one prime. That is, one person chooses primes p and q and the other chooses p […]

## Simulating identification by zip code, sex, and birthdate

As mentioned in the previous post, Latanya Sweeney estimated that 87% of Americans can be identified by the combination of zip code, sex, and birth date. We’ll do a quick-and-dirty estimate and a simulation to show that this result is plausible. There’s no point being too realistic with a simulation because the actual data that […]

## Sine of a googol

How do you evaluate the sine of a large number in floating point arithmetic? What does the result even mean? Sine of a trillion Let’s start by finding the sine of a trillion (1012) using floating point arithmetic. There are a couple ways to think about this. The floating point number t = 1.0e12 can only […]

## Searching for Mersenne primes

The nth Mersenne number is Mn = 2n – 1. A Mersenne prime is a Mersenne number which is also prime. So far 50 51 have been found [1]. A necessary condition for Mn to be prime is that n is prime, so searches for Mersenne numbers only test prime values of n. It’s not sufficient for n to be prime […]

## Ellipsoid distance on Earth

To first approximation, Earth is a sphere. But it bulges at the equator, and to second approximation, Earth is an oblate spheroid. Earth is not exactly an oblate spheroid either, but the error in the oblate spheroid model is about 100x smaller than the error in the spherical model. Finding the distance between two points […]