Posts Tagged ‘ Big Data ’

Data sleaze: Uber and beyond

April 26, 2017

There has been a barrage of negative publicity related to Uber recently. The latest salvo is a long article in the New York Times (link). This piece focuses on Uber's CEO, who was trained as a computer engineer, but my interest lies primarily in several revelations about how Uber collects and uses customer data. The key episode picked up by various outlets (e.g. TechCrunch, Wired) involves Uber "secretly identifying and…

Read more »

Confused by machines, or spooked by the machine-makers

March 29, 2017

This New York Times article draws attention to real trends in the financial investments industry but gets completely lost in the smoke around those pushing "machines" and "data". The trend most concerning to the investments industry is the sustained, large-scale outflow of money from "actively-managed" funds, of which mutual funds are the biggest category. The industry makes loads of money from management fees by promoting the idea that investors are…

Read more »

Collaboration with New York Public Library

March 27, 2017

For many years now, the field of Data Science and Business Analytics has been booming, and hiring managers are finding a severe dearth of high-quality job-seekers. Meanwhile, a good number of people are interested in entering the field but keep bumping into walls. Hiring managers like to hire experienced people for a host of reasons, including the fear of other hiring managers poaching their trained employees. For a number…

Read more »

New screencast: using R and RStudio to install and experiment with Apache Spark

March 15, 2017

I have a new short screencast up: using R and RStudio to install and experiment with Apache Spark. More material from my recent Strata workshop, Modeling big data with R, sparklyr, and Apache Spark, can be found here.

Read more »

Reading Everything is Obvious by Duncan Watts

February 15, 2017

In his book, Everything is Obvious (Once You Know the Answer): Why Common Sense Fails, Duncan Watts, a professor of sociology at Columbia, imparts urgent lessons that are as relevant to his students as to self-proclaimed data scientists. It takes only nominal effort to generate narrative structures that retrace the past, Watts contends, but developing lasting theory that produces valid predictions requires much more effort than common sense. Watts’s is…

Read more »

Data for the People

February 5, 2017

Data for the People, by Andreas Weigend, is coming out this week, or maybe it came out last week. Andreas is a leading technologist (at least that's the most accurate one-word description I can think of), and I have valued his insights ever since we we...

Read more »

Deep thinking about your data

February 3, 2017

In the ongoing series of posts about the IMDB dataset from Kaggle, I have so far looked at several of the scraped variables, including the number of faces on movie posters (1, 2), plot keywords (3), and movie rating by title year (4). In this post, I tackle the variables resulting from a data merge between IMDB and Facebook. These columns have names like "Director Facebook Likes", "Actor 1 Facebook…

Read more »

Apparently Hollywood does not recycle action-movie plots. The data said so, so it must be right

January 25, 2017

Today I continue to explore the movie dataset, found on Kaggle. To catch up with previous work, see the blog posts 1 and 2. One of the students came up with an interesting problem. Among the genre of action movies, are there particular plot elements that are correlated with box office? This problem is solvable because the dataset contains a variable called "plot keywords" lifted from IMDB. Plot keywords are…

Read more »

Counting is hard, especially when you don’t have theories

January 19, 2017

Exploring the data about movies, uncovering data issues

Read more »

Good models + Bad data = Bad analysis

January 18, 2017

Example showing how to diagnose bad data in data science models

Read more »

