Analysts must reckon with the fake data menace

October 9, 2017

(This article was originally published at Big Data, Plainly Spoken (aka Numbers Rule Your World), and syndicated at StatsBlogs.)

Here is a problem staring many digital/Web/social media analysts in the face today: what if you are told that the majority of the data you have been dutifully reporting, analyzing and (gasp!) modeling are fake data?

By fake data, I mean useless numbers that have no bearing on reality: visits to websites that never happened, clicks on ads by hired hands, clicks on ads by bots, clicks on ads that are buried layers deep and invisible to any human, video "views" that result from automatically playing clips, video "views" that last one second, ad reach (i.e. number of people who have seen the ad) that exceeds Census counts, reviews planted by hired hands, and so on.

None of the above is fictional; each is part of the reality of the uncontrolled and unaudited, increasingly machine-driven and complex, secretive world of digital advertising. All major players - Google, Facebook, Microsoft, ad networks like AppNexus and Mediamath - are implicated.

I raised the alarm two years ago in an article at Harvard Business Review, featuring the work of leading ad fraud researcher Dr. Augustine Fou. Recently, there has been a tidal wave of news reports about all kinds of ad fraud and fake data.

Here are a few selected links to get you started:

Ad Buyers Rate Facebook’s 10 Measurement Errors

Facebook Ads Supposedly Reach More People Than Census Counts

Google and Other Ad Platforms Sold Fake Ads

Google’s Chrome and Microsoft’s Browsers Attract Fraudsters

P&G Cut $100 Million of Digital Ads, Without Impact on Results

Business Could Lose As Much As $16.4 Billion in Online Advertising in 2017

I have invited Dr. Fou to comment on this fast-developing situation in the Principal Analytics Prep Webinar on Wednesday night. Learn more about the Webinar and register for free here.


Most news items take the perspective of brand advertisers, who are belatedly waking up to the huge amount of dollars wasted. But a big story is being missed: such waste was enabled by massive amounts of data that we now know are fake.

What about the zillions of reports, analyses and models created over the last 20 years by countless data "scientists" and analysts, in which the data from Google, Facebook, and myriad digital marketing vendors are taken at face value as accurate?

In fact, the digital advertising industry was built on the promise that it is more measurable, more accountable and more cost-effective. What Dr. Fou shows is that only basic statistics is needed to uncover such fraud.
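
As a rough illustration of how basic such a check can be, the sketch below flags any audience segment whose reported ad reach exceeds the population that could possibly have seen the ad. The function name `flag_impossible_reach`, the segment labels and all the figures are hypothetical, invented for this example; this is not Dr. Fou's method, and the numbers are not real platform or Census data.

```python
from typing import Dict, List, Tuple


def flag_impossible_reach(
    reported_reach: Dict[str, int],
    population: Dict[str, int],
) -> List[Tuple[str, int, int]]:
    """Return (segment, reach, population) for segments where reach > population."""
    flagged = []
    for segment, reach in reported_reach.items():
        pop = population.get(segment)
        if pop is None:
            continue  # no benchmark for this segment, so nothing to compare
        if reach > pop:
            flagged.append((segment, reach, pop))
    return flagged


# Hypothetical figures, for illustration only.
reported_reach = {"segment_a": 1200, "segment_b": 800}
benchmark_population = {"segment_a": 1000, "segment_b": 1000}

for segment, reach, pop in flag_impossible_reach(reported_reach, benchmark_population):
    print(f"{segment}: reported reach {reach:,} exceeds population {pop:,}")
```

A check like this cannot tell you how much of the remaining data is fake, but a reported reach larger than the benchmark population is enough to show that the number should not be taken at face value.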

Data cleaning is a huge time sink already without fake data - now, we have to wrestle with mountains of fake data. But that is the reality, and we have to rise to the challenge.
