Posts Tagged ‘Errors’

Analysts must reckon with the fake data menace

October 9, 2017

Kaiser Fung, founder of Principal Analytics Prep, comments on the fake data and fraud problem in digital advertising, and calls on data scientists and analysts to rise to the challenge.

Crash course in precision and uncertainty, in advance of that climate debate, free for Mr. Pruitt

July 13, 2017

Scott Pruitt, the EPA chief, continues to make innumerate comments about his personal views on climate change science. His chief accusation - chanted often - is that we need more "precision". In his view, achieving 100% precision is necessary because it removes all uncertainty, allowing lawmakers to take action. This post is inspired by his latest interview in which he is encouraging a TV debate event to air out the…

The get-rich-quick scheme of the English

March 31, 2017

The World Economic Forum published this chart: The "EF EPI Score" is a measure of English proficiency. So the evidence is clear as day: "Better English and Income Go Hand in Hand," as their headline blares. Last time I was in the New York subway, the panhandler spoke good English. What's a blogger to do? I pulled out the EPI scores from the EPI report, and downloaded the Gross National…

Reading Everything is Obvious by Duncan Watts

February 15, 2017

In his book, Everything is Obvious (Once You Know the Answer): Why Common Sense Fails, Duncan Watts, a professor of sociology at Columbia, imparts urgent lessons that are as relevant to his students as to self-proclaimed data scientists. It takes only nominal effort to generate narrative structures that retrace the past, Watts contends, but developing lasting theory that produces valid predictions requires much more effort than common sense. Watts’s is…

Apparently Hollywood does not recycle action-movie plots. The data said so, so it must be right

January 25, 2017

Today I continue to explore the movie dataset, found on Kaggle. To catch up with previous work, see blog posts 1 and 2. One of the students came up with an interesting problem. Within the action genre, are there particular plot elements that correlate with box-office revenue? This problem is solvable because the dataset contains a variable called "plot keywords" lifted from IMDB. Plot keywords are…

Good models + Bad data = Bad analysis

January 18, 2017

Example showing how to diagnose bad data in data science models

Inspired by water leaks

December 19, 2016

For me, 2016 is a year of water leaks. I was forced to move apartments during the summer. (Blame my old landlord for the lower frequency of posts this year!) That old apartment was overrun by water issues. In the past four years, there were two big leaks in addition to annual visible "seepage" in the ceiling. The first big leak ruined my first night back from Hurricane Sandy-induced evacuation.…

This election forecasting business

November 15, 2016

If you live in the States, and particularly a blue state, in the last year or two, it has been drilled into your head that Hillary Clinton was the overwhelming favorite to win the Presidential election. On the day before the election, when all the major media outlets finalized their "election forecasting models," they unanimously pronounced Clinton the clear winner, with a probability of winning of 70% to 99%. One…

Reader’s Guide to the Power Pose Controversy 3

November 2, 2016

This is the third and final post about the controversy over statistical analysis used in peer-reviewed published scholarly research. Most of the new material is covered in post #2 (link). Today's post covers statistical issues related to sample size, which is nothing new, but it was mentioned in Amy Cuddy's response to her critics and thus I also discuss it here. In post #2 (link), I offer the following mental…

The idol worship of objective data is damaging our discipline

October 28, 2016

In class last week, I discussed this New York Times article with the students. One of the claims in the article is that the U.S. News ranking of colleges is under threat from newcomers whose rankings are more relevant because they more directly measure outcomes such as earnings of graduates. This specific claim in the article makes my head hurt: "If nothing else, earnings are objective and, as the database…

