Posts Tagged ‘Big Data’

My advice on dplyr::mutate()

September 22, 2017

There are substantial differences between ad-hoc analyses (be they machine learning research, data science contests, or other demonstrations) and production-worthy systems. Roughly: ad-hoc analyses have to be correct only at the moment they are run (and often once they are correct, that is the last time they are run; obviously the idea of reproducible …

Uber data collection makes news again

August 31, 2017

Kaiser Fung, founder of Junk Charts and Principal Analytics Prep, the next-gen data analytics bootcamp, discusses ethical issues concerning Uber's collection of user data from smartphone apps.

Gelman digested read

August 16, 2017

It's hard to keep up with Andrew Gelman, so let me point to some interesting recent posts from his blog. Readings on philosophy of statistics (link): Andrew has a bunch of links to (mostly his own) writings about deep statistical issues. Science is about understanding how the world works, which involves questions of cause and effect, and randomness and unexplained variability. Data that can be observed are almost never sufficient…

Did web scraping just receive a legal boost?

August 15, 2017

Kaiser Fung, founder of Principal Analytics Prep and author of Numbersense, discusses a recent legal ruling against LinkedIn's technologies that restrict web scraping.

Analyzing Terabytes of Economic Data

August 14, 2017

Serena Ng's World Congress piece is out as an NBER working paper. It's been floating around for a long time, but just in case you missed it, it's a fun and insightful read: Opportunities and Challenges: Lessons from Analyzing Terabytes of Scanner Datab...

Know your data 21: another example of data sleaze, straight from your home

August 1, 2017

Kaiser Fung, founder of Principal Analytics Prep and author of Numbersense, reacts to the news that Roomba captures detailed maps of people's homes.

If you are using Facebook Ads split testing (A/B testing), stop fooling yourself

July 26, 2017

Kaiser Fung, founder of Principal Analytics Prep and former director of Applied Analytics at Columbia University, explains why you can't run proper A/B tests on Facebook.

Data bite back: a Harvard lesson

June 7, 2017

The New York Times reports that Harvard rescinded admission offers to 10 students (link). These kids allegedly engaged in horrible behavior in "private" online chatrooms. Not reported is how Harvard admission officers got hold of such information. Is it possible there are jealous classmates? I am not condoning the bad behavior - I put this link up to remind people: (a) no data are private, not even "deleted" data (b)…

Managing Spark data handles in R

May 26, 2017

When working with big data in R (say, using Spark and sparklyr) we have found it very convenient to keep data handles in a neat list or data_frame. Please read on for our handy hints on keeping your data handles neat. When using R to work over a big data system (such as Spark) much …

Artificial human intelligence

May 25, 2017

When we use the words artificial intelligence, we typically mean artificial machine intelligence, training machines to act like human beings. What is actually happening is the opposite - we are developing artificial human intelligence, as in, humans are being trained to think like machines. Example 1: I recently called a cab company and told them I was at Union Square. The despatcher was taking a long time to respond, and…
