Posts Tagged ‘Big Data’

Data bite back: a Harvard lesson

June 7, 2017

The New York Times reports that Harvard rescinded admission offers to 10 students (link). These kids allegedly engaged in horrible behavior in "private" online chatrooms. Not reported is how Harvard admission officers got hold of such information. Is it possible there are jealous classmates? I am not condoning the bad behavior - I put this link up to remind people: (a) no data are private, not even "deleted" data (b)…

Managing Spark data handles in R

May 26, 2017

When working with big data in R (say, using Spark and sparklyr), we have found it very convenient to keep data handles in a neat list or data_frame. Please read on for our handy hints on keeping your data handles neat. When using R to work over a big data system (such as Spark) much …

Artificial human intelligence

May 25, 2017

When we use the words artificial intelligence, we typically mean artificial machine intelligence, training machines to act like human beings. What is actually happening is the opposite - we are developing artificial human intelligence, as in, humans are being trained to think like machines. Example 1: I recently called a cab company and told them I was at Union Square. The despatcher was taking a long time to respond, and…

New series: R and big data (concentrating on Spark and sparklyr)

May 20, 2017

Win-Vector LLC has recently been teaching how to use R with big data through Spark and sparklyr. We have also been helping clients become productive on R/Spark infrastructure through direct consulting and bespoke training. I thought this would be a good time to talk about the power of working with big data using R, share some …

Book review: Everybody Lies by Seth Stephens-Davidowitz

May 15, 2017

Kaiser Fung, founder of Principal Analytics Prep, discusses Seth Stephens-Davidowitz's new book, Everybody Lies.

The Times agrees on privacy and kind of on fake news business

May 11, 2017

The New York Times Magazine has been publishing some pieces that directly relate to a couple of my blog posts. In this article, Amanda Hess noticed that "privacy became a commodity for the rich and powerful." This echoes my blog post on "Data is the next frontier of equal rights." Hess discussed the asymmetry and hypocrisy of the situation whereby the same businesses and business executives that are wantonly stripping…

That fake news business

May 4, 2017

When Hillary Clinton unexpectedly lost the election in November 2016, it was high time for pundits to espouse their pet theories for the shocking losses suffered by the Democrats at all levels of government. The usual suspects were put on parade, such as “Bernie Bros” who abstained from voting and the “deplorables” who voted against their own interests. For the first time, two unlikely entities faced scrutiny: the social-media giant,…

Ramp metering magic

May 1, 2017

Here is a recent article giving some history on the invention of ramp meters, used on highways to mitigate congestion. I discuss this subject in Numbers Rule Your World (link). There is an interesting stochastic phenomenon underlying highway congestion. Ramp meters help by regulating the inflow of vehicles onto the highway and prolonging the period during which the highway runs at full capacity. The key insight is that…

Data sleaze: Uber and beyond

April 26, 2017

There has been a barrage of negative publicity related to Uber recently. The latest salvo is a long article in the New York Times (link). This piece focuses on Uber's CEO, who was trained as a computer engineer, but my interest lies primarily in several revelations about how Uber collects and uses customer data. The key episode picked up by various outlets (e.g. TechCrunch, Wired) involves Uber "secretly identifying and…

Confused by machines, or spooked by the machine-makers

March 29, 2017

This New York Times article draws attention to real trends in the financial investments industry but gets completely lost in the smoke around those pushing "machines" and "data". The trend most concerning to the investments industry is the sustained, large-scale outflow of money from "actively-managed" funds, mutual funds being the biggest category of such. The industry makes loads of money from management fees by promoting the idea that investors are…


