Watch Dogs: Mapping all Publicly Available Data of a World City

July 10, 2013
By

Watch Dogs - We Are Data [watchdogs.com], developed by the French global video game publisher Ubisoft, is the first website to gather publicly available and real-time data about Paris, London and Berlin in a single interface. More specifically, all ...

Read more »

Please send all comments to /dev/ripley

July 10, 2013
By

Trey Causey asks, “Has R-help gotten meaner over time?”: I began by using Scrapy to download all the e-mails sent to R-help between April 1997 (the earliest available archive) and December 2012. . . . We each read 500 messages and coded them in the following categories: -2 (negative and unhelpful), -1 (negative but helpful) […] The post Please send all comments to /dev/ripley appeared first on Statistical Modeling, Causal Inference,…

Read more »

What are the iconic data graphs of the past 10 years?

July 10, 2013
By

This article in the New York Times about the supposed death of photography got me thinking about statistics. Apparently, the death of photography has been around the corner for some time now: For years, photographers have been bracing for this …

Read more »

Six reasons you should stop using the RANUNI function to generate random numbers

July 10, 2013
By

Are you still using the old RANUNI, RANNOR, RANBIN, and other "RANXXX" functions to generate random numbers in SAS? If so, here are six reasons why you should switch from these older (1970s) algorithms to the newer (late 1990s) Mersenne-Twister algorithm, which is implemented in the RAND function. The newer [...]
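The excerpt does not list the six reasons, but one well-known difference between the two families of generators is period length. The Python sketch below assumes RANUNI is the prime-modulus multiplicative (Lehmer) generator with modulus 2**31 - 1 and multiplier 397204094 that SAS documentation describes; treat those constants as an assumption. It contrasts that generator with CPython's `random.Random`, which implements the Mersenne Twister (MT19937). This is not SAS code, and the function names are hypothetical.

```python
import random

# Assumed RANUNI-style parameters: multiplicative generator modulo the prime 2**31 - 1.
MODULUS = 2**31 - 1
MULTIPLIER = 397204094

# A multiplicative generator mod a prime cycles through at most the nonzero
# residues, so its period cannot exceed 2**31 - 2 (about 2.1 billion draws).
LEHMER_MAX_PERIOD = MODULUS - 1
# MT19937, used by Python's random module, has period 2**19937 - 1.
MT19937_PERIOD = 2**19937 - 1


def lehmer_stream(seed):
    """Yield uniforms in (0, 1) from the assumed RANUNI-style Lehmer generator."""
    if not 0 < seed < MODULUS:
        raise ValueError("seed must be in 1 .. 2**31 - 2")

    def gen(state):
        while True:
            state = (MULTIPLIER * state) % MODULUS  # next state in 1 .. MODULUS - 1
            yield state / MODULUS                  # scale to the open interval (0, 1)

    return gen(seed)


def mersenne_twister_stream(seed):
    """Yield uniforms in [0, 1) from Python's built-in Mersenne Twister."""
    rng = random.Random(seed)
    while True:
        yield rng.random()
```

Both streams are reproducible from a seed; the practical difference is that a simulation drawing billions of values can exhaust the short Lehmer cycle, while the Mersenne Twister's period is astronomically larger.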

Read more »

Flotsam 13: early July links

July 10, 2013
By

Man flu kept me at home today, so I decided to do something ‘useful’ and go for a linkathon: Ed Yong discusses the effect of subject expectations in psychology experiments Nice Results, But What Did You Expect? At the beginning there was another article on The placebo phenomenon, and another one on The placebo defect. […]

Read more »

“Frontiers in Massive Data Analysis”

July 10, 2013
By

Mike Jordan sends along this National Academies report on “big data.” This is not a research report, but it could be interesting in that it conveys what are believed to be important technical challenges. The post “Frontiers in Massive...

Read more »

Predicting season records for NFL teams – part 2

July 9, 2013
By

This is the second, technical, part of this series. See the first part for the overview. This post covers the technical details behind the NFL season record prediction introduced in part one. After selecting the error metric and defining an acceptable baseline, both set up in part one, the next step is to develop a plan of attack. In order to create and develop this plan,…
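The excerpt does not say which error metric or baseline part one chose. As a hypothetical illustration only, the Python sketch below scores predicted season win totals with mean absolute error (MAE) and compares them against a naive baseline that predicts every team wins half of a 16-game regular season. The choice of MAE, the baseline, and the function names are assumptions, not the series' actual method.

```python
from typing import Sequence


def mean_absolute_error(predicted: Sequence[float], actual: Sequence[float]) -> float:
    """Average absolute gap between predicted and actual win totals."""
    if len(predicted) != len(actual) or not actual:
        raise ValueError("need equal-length, non-empty sequences")
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(actual)


def naive_baseline(n_teams: int, games: int = 16) -> list[float]:
    """Hypothetical baseline: every team wins exactly half its games."""
    return [games / 2] * n_teams


def beats_baseline(predicted: Sequence[float], actual: Sequence[float], games: int = 16) -> bool:
    """True if the model's MAE is strictly lower than the naive baseline's MAE."""
    baseline_error = mean_absolute_error(naive_baseline(len(actual), games), actual)
    return mean_absolute_error(predicted, actual) < baseline_error
```

Fixing the metric and baseline first gives any later model a concrete bar to clear: a model that cannot beat "everyone goes 8-8" adds nothing.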

Read more »

Exploratory Data Analysis: Conceptual Foundations of Histograms – Illustrated with New York’s Ozone Pollution Data

Continuing my recent series on exploratory data analysis (EDA), today’s post focuses on histograms, which are very useful plots for visualizing the distribution of a data set. I will discuss how histograms are constructed and use them to assess the distribution of the “Ozone” data from the built-in “airquality” data set in R. In […]
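As a rough sketch of how a histogram is constructed, the Python below counts values into equal-width bins. It assumes R-like defaults: Sturges' rule, ceil(log2(n) + 1), for the number of bins, and right-closed intervals (a, b] with the lowest value placed in the first bin. R's `hist()` additionally rounds the breakpoints with `pretty()`, so its bins will usually differ from this sketch. Missing values (the Ozone column of airquality contains NAs) are represented here as `None` and dropped; the function names are hypothetical.

```python
import math


def sturges_bin_count(n):
    """Sturges' rule: ceil(log2(n) + 1) bins."""
    return math.ceil(math.log2(n) + 1)


def histogram(values, k=None):
    """Return (edges, counts) for k equal-width, right-closed bins."""
    xs = [v for v in values if v is not None]  # drop missing values, like NA in R
    if not xs:
        raise ValueError("no non-missing values")
    if k is None:
        k = sturges_bin_count(len(xs))
    lo, hi = min(xs), max(xs)
    if hi == lo:  # all values identical: a single degenerate bin
        return [lo, hi], [len(xs)]
    width = (hi - lo) / k
    edges = [lo + i * width for i in range(k + 1)]
    counts = [0] * k
    for x in xs:
        # Bin i covers (edges[i], edges[i + 1]]; the minimum falls in bin 0.
        idx = math.ceil((x - lo) / width) - 1
        counts[min(max(idx, 0), k - 1)] += 1
    return edges, counts
```

The bar heights of a frequency histogram are exactly these counts, which is why the choice of bin count and edges can change the apparent shape of a distribution.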

Read more »

Predicting season records for NFL teams – overview

July 9, 2013
By

This is the first, non-technical, part of this series. See the second part for more detail. I was recently looking for a good machine learning task to try out, and I thought that doing something NFL-related would be interesting, because the NFL season is about to start (finally!). Why was I looking for a good machine learning task to try out? I have mostly done my data analysis work…

Read more »

Repost: Preventing Errors Through Reproducibility

July 9, 2013
By

Checklist mania has hit clinical medicine thanks to people like Peter Pronovost and many others. The basic idea is that simple and short checklists along with changes to clinical culture can prevent major errors from occurring in medical practice. One …

Read more »

Symposium Magazine

July 9, 2013
By

Symposium is a new online magazine subtitled “Where academia meets public life.” You can think of it as a sort of Slate magazine without Mickey Kaus, or as the Atlantic without the stylish writing. Here are the articles in the first issue, which has just been posted: Why Write the History of Capitalism? Louis Hyman [...] The post Symposium Magazine appeared first on Statistical Modeling, Causal Inference, and Social Science.

Read more »

A conversation with Professor Bin Yu

July 9, 2013
By

The latest issue of the ICSA Bulletin published my interview, A conversation with Professor Bin Yu. It is quite long but informative. Here I have picked out some short paragraphs, chosen by my personal bias. [Before College] A math book from a cousin gave me my first boost into math when I was in 3rd and 4th grade. I […]

Read more »

googleVis tutorial at useR!2013

July 9, 2013
By

Today Diego and I will give our googleVis tutorial at useR!2013 in Albacete, Spain.

googleVis Tutorial at useR! 2013

We will cover:
- Introduction and motivation
- Google Chart Tools
- R package googleVis
- Concepts of googleVis
- Case studies
- googleVis on shiny

Read more »

Welcome Message

July 9, 2013
By

Welcome message for readers of Kaiser Fung's blog, Big Data Plainly Spoken, which comments on statistical thinking in everyday life and the interpretation of big data. Kaiser Fung is the author of Numbersense and Numbers Rule Your World.

Read more »

Announcing New Book: Numbersense

July 9, 2013
By

I have a new book arriving in stores this week. It’s titled Numbersense: How to Use Big Data to Your Advantage. If you read this blog, you’ll have a good idea what the book is about. I analyze claims made in the media that are supported by analyses of data, and I show you how I dissect these claims to decide whether they are credible or bogus. The ability…

Read more »

Confusing Stats Terms Explained: Internal Consistency

July 8, 2013
By

Internal consistency refers to the general agreement between multiple items (often Likert scale items) that make up a composite score of a survey measurement of a given construct. This agreement is generally measured by the correlation between items. For example, a survey measure of depression may include many questions that each measure various aspects of depression, such as...
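Two standard ways to summarize agreement between items are the average inter-item correlation and Cronbach's alpha, α = k/(k − 1) · (1 − Σ s²_item / s²_total), where k is the number of items, s²_item is each item's variance, and s²_total is the variance of respondents' total scores. The excerpt does not name either statistic, so the Python sketch below is an illustration under that assumption; the data layout (one list of scores per item, with respondents in the same order) and the function names are hypothetical.

```python
import math
from statistics import mean, variance


def cronbach_alpha(items):
    """items: k lists, each holding one item's scores for the same n respondents."""
    k = len(items)
    if k < 2:
        raise ValueError("need at least two items")
    n = len(items[0])
    if any(len(col) != n for col in items):
        raise ValueError("all items need the same number of respondents")
    totals = [sum(scores) for scores in zip(*items)]  # composite score per respondent
    item_var_sum = sum(variance(col) for col in items)  # sample variances
    return k / (k - 1) * (1 - item_var_sum / variance(totals))


def pearson(x, y):
    """Pearson correlation between two equal-length score lists."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def mean_inter_item_correlation(items):
    """Average of the correlations over all distinct pairs of items."""
    k = len(items)
    rs = [pearson(items[i], items[j]) for i in range(k) for j in range(i + 1, k)]
    return mean(rs)
```

Items that move together inflate the variance of the total relative to the sum of the item variances, which pushes alpha toward 1; unrelated items push it toward 0.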

Read more »

Use R! 2014 to be at UCLA

July 8, 2013
By

The 2014 Use R! conference will be in Los Angeles, CA and will be hosted by the UCLA Department of Statistics (an excellent department, I must say) and the newly created Foundation for Open Access Statistics. This is basically the meeting …

Read more »

Avert your eyes

July 8, 2013
By

Reader omegatron came back with another shocking instance of a pie chart. Here is the link to the AVERT organization in the U.K. that published the chart and several others. For the umpteenth time, the pie chart plots proportions. All...

Read more »

How to think about a Psychological Science paper that seems iffy but is not obviously flawed?

July 8, 2013
By

So I open the email one day and see this: hi, Andrew – FYI, here’s another paper from the Annals of Small-N Correlational Studies, also known as Psychological Science: http://www.futurity.org/society-culture/can-bigger-desks-make-us-dishonest/ and http://www.huffingtonpost.com/2013/06/26/physical-environment-dishonesty-space-corrupt-behavior-study_n_3497126.html hope all is well! The research paper he was referring to is called “The Ergonomics of Dishonesty: The Effect of Incidental Posture on [...] The post How to think about a Psychological Science paper that seems iffy but is…

Read more »

More on What Can Be Learned from Statistical Significance

July 8, 2013
By

A while back, I reacted to a post by Justin Esarey, in which he argues that not much can be learned from statistical significance. The basic question he's asking is as follows: For a fixed sample size, what does the posterior probability of the null hypothesis look like if we update based on the result of […]
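The question in the excerpt can be made concrete with Bayes' rule. The Python sketch below assumes, as a simplification that may differ from Esarey's actual setup, that we update only on whether a one-sided z-test (known unit variance, effect size in standard-deviation units) came out significant. With prior π₀ = P(H₀), significance level α, and power 1 − β at sample size n, P(H₀ | significant) = απ₀ / (απ₀ + (1 − β)(1 − π₀)). The effect size, prior, and function names are hypothetical.

```python
from statistics import NormalDist


def z_test_power(effect_size, n, alpha=0.05):
    """Power of a one-sided, one-sample z-test with unit variance."""
    z = NormalDist()
    critical = z.inv_cdf(1 - alpha)  # rejection threshold for the z statistic
    return 1 - z.cdf(critical - effect_size * n ** 0.5)


def posterior_null_given_significant(prior_null, alpha, power):
    """P(H0 | significant result) by Bayes' rule."""
    num = alpha * prior_null
    return num / (num + power * (1 - prior_null))


def posterior_null_given_not_significant(prior_null, alpha, power):
    """P(H0 | non-significant result) by Bayes' rule."""
    num = (1 - alpha) * prior_null
    return num / (num + (1 - power) * (1 - prior_null))
```

For a fixed sample size, the power term is fixed too, so how far a significant result moves the posterior depends entirely on α, the power, and the prior; with low power, a significant result is much weaker evidence against the null.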

Read more »

