Posts Tagged ‘data’

Statbusters: To divide or not to divide by 365

August 24, 2015

In the newest column for the Daily Beast, Andrew and I look at the media's fascination with expressing large numbers as daily numbers. (link) In short, you should divide by 365 only when the metric actually scales with time, and be careful if the metric is not evenly distributed across time. We discuss the following headlines: "Air pollution in China is killing 4,000 per day" and "Periscope users view 40…
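To make the "evenly distributed across time" caveat concrete, here is a minimal sketch in Python. Every number in it is made up for illustration and none comes from the column; the function names `naive_daily_rate` and `monthly_daily_rates` are hypothetical helpers, not part of any library.

```python
# All numbers below are made up for illustration; they are not from the column.
DAYS_IN_MONTH = [31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31]  # non-leap year

# Hypothetical monthly totals of some event that peaks in winter.
monthly_totals = [9300, 8400, 3100, 1500, 1550, 1500,
                  1550, 1550, 1500, 3100, 6000, 9300]


def naive_daily_rate(annual_total, days=365):
    """The headline-style number: annual total divided by 365."""
    return annual_total / days


def monthly_daily_rates(totals, days_in_month=DAYS_IN_MONTH):
    """Per-day rate within each month, which exposes the seasonality."""
    return [total / days for total, days in zip(totals, days_in_month)]


annual_total = sum(monthly_totals)
naive_rate = naive_daily_rate(annual_total)
rates = monthly_daily_rates(monthly_totals)

print(f"annual total: {annual_total}")
print(f"naive per-day figure: {naive_rate:.1f}")
print(f"actual per-day range: {min(rates):.0f} to {max(rates):.0f}")
```

In this invented series the single "per day" figure comes out at about 132, yet no month actually runs at that rate: the monthly rates range from 50 to 300 per day. The division is arithmetically fine, but the headline number describes a typical day that does not exist.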


Data frames and tables in Scala

August 21, 2015

To statisticians and data scientists used to working in R, the concept of a data frame is one of the most natural and basic starting points for statistical computing and data analysis. It always surprises me that data frames aren’t a core concept in most programming languages’ standard libraries, since they are essentially a …


Dumbing by numbers

August 20, 2015

The New York Times has been making waves this week featuring management practices at Amazon and workplace tracking practices at various companies (link). These are essential references for how data make us dumber. I am going to ignore the shocking claim by the journalist who stated that GE is "long a standard-setter in management practices." To give him some credit, he did not say "good" management practice. It is true…


Calling Scala code from R using rscala

August 15, 2015

In a previous post I looked at how to call Scala code from R using a CRAN package called jvmr. This package now seems to have been replaced by a new package called rscala. Like the old package, it requires a pre-existing Java installation. Unlike the old package, however, it no longer depends on …


Classic Data Visualizations

August 12, 2015

My thanks to Veronica Johnson at Investech.com for drawing my attention to a recent piece of theirs relating to Classic Data Visualizations. As they say: "A single data visualization graphic can be priceless. It can save you hours of research. They’re eas...


Is something rotten behind there?

August 5, 2015

Via Twitter, Andrew B. (link) asked if I could comment on the following chart, published by PC Magazine as part of their ISP study. (link) This chart is decent, although it can certainly be improved. Here is a better version:...


Statbusters are back, taking on robots that hire people

July 27, 2015

In our newest column, we take on the recent media obsession with companies who make robots that hire people. (link) As with most articles about data science, the journalists failed to dig up any evidence that these robots work, other than glowing quotes from the people who are selling these robots. We point out a number of challenges that such algorithms must overcome in order to generate proper predictions. We…


I try hard to not hate all hover-overs. Here is one I love

July 23, 2015

One of the smart things Noah (at WNYC) showed to my class was his NFL fan map, based on Facebook data. This is the "home" of the visualization: The fun starts by clicking around. Here are the Green Bay fans...


Is data privacy a fundamental right?

July 4, 2015

This piece is part of the StatBusters column written jointly with Andrew Gelman. Hope they fix the labeling soon. In it, we talk about two recent studies on data privacy, which lead to contradictory conclusions. How should the media report such surveys? Is the brand name of the organization enough? In addition, we debunk the notion that consumers will definitely get something valuable out of sharing their data.


Mathematical Statistics Lesson of the Day – Ancillary Statistics


The set-up for today’s post mirrors my earlier Statistics Lessons of the Day on sufficient statistics and complete statistics. Suppose that you collected data $X_1, X_2, \ldots, X_n$ in order to estimate a parameter $\theta$. Let $f_\theta(x)$ be the probability density function (PDF) or probability mass function (PMF) for $X_1, X_2, \ldots, X_n$. Let $T = t(X_1, X_2, \ldots, X_n)$ be a statistic based on $X_1, X_2, \ldots, X_n$. If the distribution of $T$ does NOT […]


