## Question 25 of my final exam for Design and Analysis of Sample Surveys

June 4, 2012
25. You are using multilevel regression and poststratification (MRP) to a survey of 1500 people to estimate support for the space program, by state. The model is fit using, as a state-level predictor, the Republican presidential vote in the state, which turns out to have a low correlation with support for the space program. [...]

## Poking at the data behind a chart

June 4, 2012
Reader Jamie D. wasn't very amused by the following chart, from the Freakonomics blog (link): Jamie summarized his view as follows: First of all, a quick look at the graph makes you think you're comparing states with helmet laws vs....

## How Big Data Gets Real

June 4, 2012
How Big Data Gets Real: Big Data is moving up a classic modern curve, from discovery to science, and on to engineering and mass use. We are not as far along as a lot of people selling the boom would have you believe, but lots of good businesses are bei...

## Metropolis Hastings MCMC when the proposal and target have differing support

June 4, 2012
Introduction: Very often it is desirable to use Metropolis Hastings MCMC for a target distribution which does not have full support (for example, it may correspond to a non-negative random variable), using a proposal distribution which does (for example, a Gaussian random walk proposal). This isn’t a problem at all, but on more than one […]
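As a rough illustration of the setup described in this excerpt, here is a minimal sketch in Python. It is not the code from the original post. The Exponential(1) target, the step size, and the names `log_target` and `metropolis_hastings` are all assumptions chosen for the example. The point it shows is that a Gaussian random walk proposal that lands outside the support has target density zero, so it is simply rejected and the chain stays where it is.

```python
import math
import random


def log_target(x):
    """Log density (up to a constant) of an assumed Exponential(1) target on x >= 0."""
    return -x if x >= 0 else -math.inf


def metropolis_hastings(n_iter, start=1.0, step=1.0, seed=0):
    """Random walk Metropolis Hastings with a Gaussian proposal on the whole real line."""
    rng = random.Random(seed)
    x = start  # must lie inside the support of the target
    samples = []
    for _ in range(n_iter):
        proposal = x + rng.gauss(0.0, step)
        # The proposal is symmetric, so the acceptance ratio is just the target ratio.
        # Outside the support the log ratio is -inf and the move is never accepted.
        log_ratio = log_target(proposal) - log_target(x)
        if math.log(1.0 - rng.random()) < log_ratio:
            x = proposal
        # A rejected move repeats the current state in the output.
        samples.append(x)
    return samples
```

Calling `metropolis_hastings(20000)` should give draws that are all non-negative, with a sample mean close to the Exponential(1) mean of 1.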

## Massive confusion about a study that purports to show that exercise may increase heart risk

June 4, 2012
I read this front-page New York Times article and was immediately suspicious. Here’s the story (from reporter Gina Kolata): Could exercise actually be bad for some healthy people? A well-known group of researchers, including one who helped write the scientific paper justifying national guidelines that promote exercise for all, say the answer may be a [...]

## Slidify: Things are coming together fast

June 4, 2012
Tools for using R/RStudio as a one-stop shop for research and presentation have been coming out quickly. I think this one has a good shot of being included in future releases of RStudio: The other day I ran across a new R package called slidify by Ramn...

## Rename many variables that have numerical suffixes and a common prefix

June 4, 2012
I recently read a blog post in which a SAS user had to rename a bunch of variables named A1, A2,..., A10, such as are contained in the following data set: /* generate data with variables A1-A10 */ data A; array A[10] A1-A10 (1); do i = 1 to 10; [...]

## PDF slides and R code examples on Data Mining and Exploration

June 4, 2012
by Yanchang Zhao, RDataMining.com

There are some nice slides and R code examples on Data Mining and Exploration at http://www.inf.ed.ac.uk/teaching/courses/dme/, which are listed below.

PDF Slides:
- Overview of Data Mining http://www.inf.ed.ac.uk/teaching/courses/dme/2012/slides/datamining_intro4up.pdf
- Visualizing Data http://www.inf.ed.ac.uk/teaching/courses/dme/2012/slides/visualisation4up.pdf
- Decision trees http://www.inf.ed.ac.uk/teaching/courses/dme/2012/slides/classification4up.pdf
- …

## Obtaining a protein-protein interaction network for a gene list in R

June 4, 2012
Building a network of interactions between a bunch of genes can help a great deal in understanding the relationships between the seemingly disparate elements from your list. It can seem challenging at first to build such a network but it's less complicat...

## From Data to Trends

June 4, 2012
After my recent abstraction exercise created some interesting discussion but kind of went off in a slightly wrong direction, here is another experiment. Let’s take the data from Nigel Holmes’ famous Monster chart and turn that into a simple bar chart. I chose a similarly basic one here as the one in Bateman et al.’s study, but the filled bars are slightly less ugly. The x axis shows years in…

## Question 24 of my final exam for Design and Analysis of Sample Surveys

June 3, 2012
24. A supermarket chain has 100 equally-sized stores. It is desired to estimate the proportion of vegetables that spoil before being sold. The following sampling designs are considered: (a) Sample 10 stores, then sample half the vegetables within each of these stores; or (b) Sample 20 stores, then sample one-quarter of the vegetables within each [...]

## Better Life Index 2012

June 3, 2012
Play around and enjoy a beautiful data visualisation full of insights.

## NBA Playoff Predictions Update 2 and Results (3-1)

June 3, 2012
This is my second follow-up to my previous two posts which were about predicting NBA games with an algorithm, and my first update to the algorithm. The algorithm's record is now 3-1, as it correctly predicted Boston and Oklahoma City as winners of the...

## Question about predictive checks

June 3, 2012
Klaas Metselaar writes: I [Metselaar] am currently involved in a discussion about the use of the notion “predictive” as used in “posterior predictive check”. I would argue that the notion “predictive” should be reserved for posterior checks using information not used in the determination of the posterior. I quote from the discussion: “However, the predictive [...]

## US market portrait 2012 week 23

June 3, 2012
US large cap market returns. Fine print: the data are from Yahoo; almost all of the S&P 500 stocks are used; the initial post was “Replacing market indices”; the R code is in marketportrait_funs.R.

## Another retraction

June 2, 2012
Xian points me to this pitiful story. I hate that these people never just say they’re sorry, for wasting everyone’s time if for nothing else.

## Pasting Excel data into R on a Mac

June 2, 2012
When starting out with R, getting data in and out can be a bit of a pain. It shouldn't take long to work out a convenient method – depending on what OS you use and what other packages you work with. In my case I prefer to work with Excel spreadsheets (which are versatile and […]

## Question 23 of my final exam for Design and Analysis of Sample Surveys

June 2, 2012
23. Suppose you are conducting a survey in which people are asked about their health behaviors (how often they wash their hands, how often they go to the doctor, etc.). There is a concern that different interviewers will get different sorts of responses—that is, there may be important interviewer effects. Describe (in two sentences) how [...]

## Useful for referring–6-2-2012

June 2, 2012
Note: the following 4-7 are from Simply Statistics.

1. A Personal Perspective on Machine Learning
2. The differing perspectives of statistics and machine learning
3. Kernel Methods and Support Vector Machines de-Mystified
4. I love this article in the WSJ about the crisis at JP Morgan. The key point it highlights is that looking only at the high-level analysis and summaries can [...]

## Helpful on happiness

June 2, 2012
Following on our recent discussion of contradictory findings on happiness, David Austin writes: A pellucid discussion of happiness and happiness research is Fred Feldman, What is This Thing Called Happiness? (Oxford University Press, 2010). And here...

## Visualizing car brand choices in ggplot2

June 2, 2012
I always like to read new posts at chartsnthings as they always inspire me with new ideas for data visualization. Yesterday I read an article on choices of car brands by members of parliament in Poland in Gazeta.pl. It contains a simple ...

## Distribution of Oft-Used Bash Commands

June 1, 2012
Browsing commandlinefu.com today, I came across this little one-liner to display which commands I use most often. Here’s what I got: Yep, seems legit. I navigate and look at files a whole bunch (ls, cd, cat), and I do a butt tonne of editing (vim). I sudo like a boss, hop onto various servers (ssh),