Don’t trust the Turk

July 10, 2013
By
Don’t trust the Turk

Dan Kahan gives a bunch of reasons not to trust Mechanical Turk in psychology experiments, in particular when studying “hypotheses about cognition and political conflict over societal risks and other policy-relevant facts.”

Read more »

Asking good questions

July 10, 2013
By
Asking good questions

I’m currently attending my third conference in three weeks. So I’ve heard a lot of talks, and I’ve heard a lot of questions asked after the talks. In this guest post, Eran Raviv reflects on what makes a good question after a talk. A few weeks back I attended the excellent ISF conference. In one of the sessions, the presenter was talking about a state-of-the-art method to prevent model overfitting,…

Read more »

Visualizing a tiny slice of India’s demographics with information from Wikipedia

July 10, 2013
By
Visualizing a tiny slice of India’s demographics with information from Wikipedia

UPDATE: THE BLOG/SITE HAS MOVED TO GITHUB. THE NEW LINK FOR THE BLOG/SITE IS patilv.github.io AND THE LINK TO THIS POST IS: http://bit.ly/1ib8wTl . PLEASE UPDATE ANY BOOKMARKS YOU MAY HAVE. This post presents a tiny slice of a complex and...

Read more »

Startup Universe: Connecting Startup Companies, Founders and Investors

July 10, 2013
By
Startup Universe: Connecting Startup Companies, Founders and Investors

Start Up Universe [visual.ly], developed by information design agency Accurat and graphic designer Ben Willers for visualization community aggregator Visually, provides a comprehensive view of the relationships between startup companies and their foun...

Read more »

Watch Dogs: Mapping all Publicly Available Data of a World City

July 10, 2013
By
Watch Dogs: Mapping all Publicly Available Data of a World City

Watch Dogs - We Are Data [watchdogs.com], developed by the French global video game publisher Ubisoft, is the first website to gather publicly available and real-time data about Paris, London and Berlin in a single interface. More specifically, all ...

Read more »

Please send all comments to /dev/ripley

July 10, 2013
By
Please send all comments to /dev/ripley

Trey Causey asks, Has R-help gotten meaner over time?: I began by using Scrapy to download all the e-mails sent to R-help between April 1997 (the earliest available archive) and December 2012. . . . We each read 500 messages and coded them in the following categories: -2 Negative and unhelpful; -1 Negative but helpful […] The post Please send all comments to /dev/ripley appeared first on Statistical Modeling, Causal Inference,…

Read more »

What are the iconic data graphs of the past 10 years?

July 10, 2013
By

This article in the New York Times about the supposed death of photography got me thinking about statistics. Apparently, the death of photography has been around the corner for some time now: For years, photographers have been bracing for this … Continue reading →

Read more »

Six reasons you should stop using the RANUNI function to generate random numbers

July 10, 2013
By
Six reasons you should stop using the RANUNI function to generate random numbers

Are you still using the old RANUNI, RANNOR, RANBIN, and other "RANXXX" functions to generate random numbers in SAS? If so, here are six reasons why you should switch from these older (1970s) algorithms to the newer (late 1990s) Mersenne-Twister algorithm, which is implemented in the RAND function. The newer [...]

Read more »

Flotsam 13: early July links

July 10, 2013
By

Man flu kept me at home today, so I decided to do something ‘useful’ and go for a linkathon: Ed Yong discusses the effect of subject expectations in psychology experiments Nice Results, But What Did You Expect? At the beginning there was another article on The placebo phenomenon, and another one on The placebo defect. […]

Read more »

“Frontiers in Massive Data Analysis”

July 10, 2013
By

Mike Jordan sends along this National Academies report on “big data.” This is not a research report but it could be interesting in that it conveys what are believed to be important technical challenges.

Read more »

Predicting season records for NFL teams – part 2

July 9, 2013
By
Predicting season records for NFL teams – part 2

This is the second, technical, part of this series. See the first part for the overview. Introduction This post will introduce the technical details behind the NFL season record prediction that was introduced in part one. After selecting the error metric and defining an acceptable baseline, which was set up in part one, the next step is to develop a plan of attack. In order to create and develop this plan,…

Read more »

Exploratory Data Analysis: Conceptual Foundations of Histograms – Illustrated with New York’s Ozone Pollution Data

Exploratory Data Analysis: Conceptual Foundations of Histograms – Illustrated with New York’s Ozone Pollution Data

Introduction Continuing my recent series on exploratory data analysis (EDA), today’s post focuses on histograms, which are very useful plots for visualizing the distribution of a data set.  I will discuss how histograms are constructed and use histograms to assess the distribution of the “Ozone” data from the built-in “airquality” data set in R.  In […]

Read more »
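
The histogram excerpt above mentions how histograms are constructed. A minimal sketch of that construction, assuming equal-width bins spanning the minimum to the maximum of the data: the function name `histogram_counts` and the sample values are hypothetical illustrations, not taken from the post or from the “airquality” data set, and the post itself works in R rather than Python.

```python
from typing import List, Sequence, Tuple


def histogram_counts(values: Sequence[float], n_bins: int) -> Tuple[List[float], List[int]]:
    """Return (bin_edges, counts) for n_bins equal-width bins from min to max.

    Each bin is closed on the left and open on the right, except the last
    bin, which also includes the maximum value.
    """
    if n_bins < 1:
        raise ValueError("n_bins must be at least 1")
    if len(values) == 0:
        raise ValueError("values must not be empty")

    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins
    edges = [lo + i * width for i in range(n_bins + 1)]
    counts = [0] * n_bins

    if width == 0:
        # All values are identical: put everything in the first bin.
        counts[0] = len(values)
        return edges, counts

    for v in values:
        # Clamp so that the maximum value falls in the last bin.
        idx = min(int((v - lo) / width), n_bins - 1)
        counts[idx] += 1
    return edges, counts


if __name__ == "__main__":
    # Made-up illustrative values (NOT the Ozone measurements).
    sample = [1, 2, 2, 3, 3, 3, 4, 4, 5, 9]
    edges, counts = histogram_counts(sample, n_bins=4)
    for left, right, count in zip(edges, edges[1:], counts):
        print(f"[{left:.1f}, {right:.1f}): {'#' * count}")
```

The bar heights of a histogram are these counts (or the counts rescaled to densities); the choice of bin count changes the picture, so it is worth trying several values.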

Predicting season records for NFL teams – overview

July 9, 2013
By
Predicting season records for NFL teams – overview

This is the first, non-technical, part of this series. See the second part for more detail. Introduction I was recently looking for a good machine learning task to try out, and I thought that doing something NFL-related would be interesting, because the NFL season is about to start (finally!). Why was I looking for a good machine learning task to try out? I have mostly done my data analysis work…

Read more »

Repost: Preventing Errors Through Reproducibility

July 9, 2013
By

Checklist mania has hit clinical medicine thanks to people like Peter Pronovost and many others. The basic idea is that simple and short checklists along with changes to clinical culture can prevent major errors from occurring in medical practice. One … Continue reading →

Read more »

Symposium Magazine

July 9, 2013
By

Symposium is a new online magazine subtitled “Where academia meets public life.” You can think of it as a sort of Slate magazine without Mickey Kaus, or as the Atlantic without the stylish writing. Here are the articles in the first issue, which has just been posted: Why Write the History of Capitalism? Louis Hyman [...] The post Symposium Magazine appeared first on Statistical Modeling, Causal Inference, and Social Science.

Read more »

A conversation with Professor Bin Yu

July 9, 2013
By
A conversation with Professor Bin Yu

The latest issue of ISCA Bulletin published my interview: A conversation with Professor Bin Yu. It is quite long, but informative. Here I picked out some short paragraphs based on my personal bias. [Before College] A math book from a cousin gave me my first boost into math when I was in 3rd and 4th grade. I […]

Read more »

googleVis tutorial at useR!2013

July 9, 2013
By
googleVis tutorial at useR!2013

Today Diego and I will give our googleVis tutorial at useR!2013 in Albacete, Spain. googleVis Tutorial at useR! 2013. We will cover: introduction and motivation; Google Chart Tools; R package googleVis; concepts of googleVis; case studies; googleVis on shiny.

Read more »

Welcome Message

July 9, 2013
By
Welcome Message

Welcome Message for Readers of Kaiser Fung's blog, Big Data Plainly Spoken. Comments on Statistical Thinking in Everyday Life. Big Data Interpretation. Author of Numbersense and Numbers Rule Your World

Read more »

Announcing New Book: Numbersense

July 9, 2013
By
Announcing New Book: Numbersense

I have a new book arriving at stores this week. It’s titled Numbersense: How to Use Big Data to Your Advantage. If you read this blog, you’d have a good idea what the book is about. I analyze claims made in the media that are supported by analyses of data. I show you how I dissect these claims to decide whether they are credible, or they are bogus. The ability…

Read more »

Confusing Stats Terms Explained: Internal Consistency

July 8, 2013
By
Confusing Stats Terms Explained: Internal Consistency

Internal consistency refers to the general agreement between multiple items (often Likert scale items) that make up a composite score of a survey measurement of a given construct. This agreement is generally measured by the correlation between items. For example, a survey measure of depression may include many questions that each measure various aspects of depression, such as...

Read more »
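
The internal-consistency excerpt above says agreement is generally measured by the correlation between items. A minimal sketch of that idea, assuming each item is a list of numeric responses with one entry per respondent: the function names and the sample responses are hypothetical. Cronbach's alpha is included as a commonly used summary statistic; the excerpt itself does not name it.

```python
from itertools import combinations
from math import sqrt
from statistics import mean, variance
from typing import List, Sequence


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation between two equal-length response lists."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


def mean_inter_item_correlation(items: List[Sequence[float]]) -> float:
    """Average Pearson correlation over all pairs of items."""
    return mean(pearson(a, b) for a, b in combinations(items, 2))


def cronbach_alpha(items: List[Sequence[float]]) -> float:
    """Cronbach's alpha: k/(k-1) * (1 - sum of item variances / variance of totals)."""
    k = len(items)
    if k < 2:
        raise ValueError("need at least two items")
    item_var_sum = sum(variance(item) for item in items)
    # Composite score per respondent: the sum across items.
    totals = [sum(responses) for responses in zip(*items)]
    return k / (k - 1) * (1 - item_var_sum / variance(totals))


if __name__ == "__main__":
    # Made-up responses from five respondents to three 5-point items.
    responses = [
        [1, 2, 3, 4, 5],
        [2, 2, 3, 5, 5],
        [1, 3, 3, 4, 4],
    ]
    print("mean inter-item r:", round(mean_inter_item_correlation(responses), 3))
    print("Cronbach's alpha:", round(cronbach_alpha(responses), 3))
```

Higher values of either statistic indicate that the items move together, which is what a composite score assumes.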

Confusing Stats Terms Explained: Internal Consistency

July 8, 2013
By
Confusing Stats Terms Explained: Internal Consistency

Related Content: Confusing Stats Terms Explained: Heteroscedasticity (Heteroskedasticity); Confusing Stats Terms Explained: Residual; Confusing Stats Terms Explained: Multicollinearity; Confusing Stats Terms Explained: Standard Deviation. Internal consistency ...

Read more »

