Posts Tagged ‘data’

Some statistics about nutrition statistics

May 26, 2015

I read nutrition studies only in the service of this blog; otherwise, I neither trust them nor care about them. Nevertheless, the health beat of most media outlets is obsessed with printing the latest research on coffee or eggs or fats or alcohol or what have you. Now, the estimable John Ioannidis has published an editorial in BMJ titled "Implausible Results in Human Nutrition Research". John previously told us about the…

Should I tell students that the maximum score in the class is 137?

May 22, 2015

This op-ed by Richard Thaler caught my attention because I have had a similar experience. In my statistics classes, I have noticed a pattern: if the mid-term exam is hard, with a lower average score (say 75-80%), the students look crestfallen and feel that they did not learn; eventually, when it comes to evaluating the instructor, I receive lower grades, with comments indicating that I have not taught them properly to…

Story time, known unknowns and the endowment effect in an HBR article on customer data

May 6, 2015

Harvard Business Review devotes a long article to customer data privacy in the May issue (link). The article raises important issues, such as the low degree of knowledge about what data are being collected and traded, the value people place on their data privacy, and so on. In a separate post, I will discuss why I don't think the recommendations issued by the authors will resolve the issues they raised.…

Painting the full picture of the employment situation

May 5, 2015

It's very frustrating to read the mainstream articles about the recent unemployment report. For example, the New York Times said "U.S. Jobless Claims Hit 15-year Low." (link) At this point, everyone should be aware of how employment statistics, in particular,...

Wakefield: Random Data Set (Part II)

April 30, 2015

This post is part II of a series detailing the GitHub package, wakefield, for generating random data sets. The first post (part I) was a test run to gauge user interest. I received positive feedback and some ideas for improvements, …

Random Data Sets Quickly

April 25, 2015

This post will discuss a recent GitHub package I’m working on, wakefield, which generates random data sets. The post is broken into the following sections: Demo 1.1 Random Variable Functions 1.2 Random Data Frames 1.3 Missing Values 1.4 Default Data …

Gelman speed read

April 23, 2015

For those who have found it tough to keep up with Andrew Gelman's prolificacy, here are some brief summaries of several recent posts: On people obsessed with proving the statistical significance of tiny effects: "they are trying to use a bathroom scale to weigh a feather—and the feather is resting loosely in the pouch of a kangaroo that is vigorously jumping up and down." (link) [I left a comment. In…

What if the Washington Post did not display all the data

April 23, 2015

Thanks to reader Charles Chris P., I was able to get the police staffing data to play around with. Recall from the previous post that the Washington Post made the following scatter plot, comparing the proportion of whites among police...

Scala for Machine Learning [book review]

April 9, 2015

Nicolas, Patrick R. (2014) Scala for Machine Learning, Packt Publishing: Birmingham, UK. Full disclosure: I received a free electronic version of this book from the publisher for the purposes of review. There is clearly a market for a good book about using Scala for statistical computing, machine learning and data science. So when the publisher …

What popular baby names teach us about data analytics

April 6, 2015

In my latest piece for Harvard Business Review (link), I tackle this common problem in the interactions between data scientists and business managers: A typical big data analysis goes like this: First, a data scientist finds some obscure data accumulating in a server. Next, he or she spends days or weeks slicing and dicing the numbers, eventually stumbling upon some unusual insights. Then, a meeting is organized to present the…

