## What is it with Americans in Olympic ski teams from tropical countries?

March 2, 2014
Every time I hear this sort of story: Morrone—listed at 48 years old, which would have made her the oldest Olympic cross-country skier of all time by seven years—didn’t even show up for the 10K women’s classic on Feb. 13, claiming injury. (She was the only one of the race’s 76 entrants who didn’t start.) […] The post What is it with Americans in Olympic ski teams from tropical countries? appeared…

## BusinessNewsDaily Reference: What is Statistical Analysis?

March 2, 2014
From: http://www.businessnewsdaily.com/6000-statistical-analysis.html By Chad Brooks, BusinessNewsDaily Contributor | February 28, 2014 12:07am ET In an effort to organize their data and predict future trends based on the info...

## Simple Pharmacokinetics with Jags

March 2, 2014
In this post I want to analyze a first-order pharmacokinetics problem: the data of study problem 9, chapter 3 of Rowland and Tozer (Clinical pharmacokinetics and pharmacodynamics, 4th edition) with Jags. It is a surprisingly simple set of data, but still ...

## The Statistics behind “Verification by Multiplicity”

March 2, 2014
There’s a new post up at the ninazumel.com blog that looks at the statistics of “verification by multiplicity” — the statistical technique that is behind NASA’s announcement of 715 new planets that have been validated in the data from the Kepler Space Telescope. We normally don’t write about science here at Win-Vector, but we do […] Related posts: “I don’t think that means what you think it means;” Statistics to…

## Short Review: the War of Art by Steven Pressfield

March 2, 2014
The War of Art: Winning the Inner Creative Battle, by Steven Pressfield. Pressfield is the author of several bestsellers. The War of Art is a 12-step self-help support group for procrastinators, a biological and psychological dissection of procrastination...

March 2, 2014
It's time for the monthly round-up of recommended reading material. Gan, L. and J. Jiang, 1999. A test for global maximum. Journal of the American Statistical Association, 94, 847-854. Nowak-Lehmann, F., D. Herzer, S. Vollmer, and I. Martinez-Zarzosa, 20...

## Oldies but Goldies: Statistical Graphics Books

March 2, 2014
I just wanted to plug for three classical books on statistical graphics that I really enjoyed reading. The books are old (that is, older than me) but still relevant and together they give a sense of the development of exploratory graphics in general ...

## Short Review: Writing Tools: 50 Essential Strategies for Every Writer

March 1, 2014
This is the first of perhaps three short book reviews.  Certain basics of writing I go over with almost every student. Organization, content, paragraphs and sentences. Roy Peter Clark's Writing Tools: 50 Essential Strategies for Every Writer covers m...

## C++11 versus R Standalone Random Number Generation Performance Comparison

If you are writing some C++ code with the intent of calling it from R or even developing it into a package you might wonder whether it is better to use the pseudo random number library native to C++11 or the R standalone library. On the one hand users of your package might have an […] The post C++11 versus R Standalone Random Number Generation Performance Comparison appeared first on…

## Lines and Circles and Logistic Regression

March 1, 2014
Euclidean geometry, formalized in Euclid's Elements about 2,300 years ago, is in many ways a study of lines and circles. One might think that after more than two millennia, we have moved beyond such basic shapes, particularly in a realm such as da...

## Cosma Shalizi gets tenure (at last!) (metastat announcement)

March 1, 2014
News Flash! Congratulations to Cosma Shalizi who announced yesterday that he’d been granted tenure (Statistics, Carnegie Mellon). Cosma is a leading error statistician, a creative polymath and long-time blogger (at Three-Toed Sloth). Shalizi wrote an early book review of EGEK (Mayo 1996)* that people still send me from time to time, in case I hadn’t […]

## “We are moving from an era of private data and public analyses to one of public data and private analyses. Just as we have learned to be cautious about data that are missing, we may have to be cautious about missing analyses also.”

March 1, 2014
Stephen Senn writes: For many years now I [Senn] have been making the point that obtaining a license to market a drug should carry with it the obligation to share the results with interested parties. . . . Amongst those misunderstanding the issues are many who work in the pharmaceutical industry. A common assumption is […] The post “We are moving from an era of private data and public analyses to…

## Fitting models to long time series

March 1, 2014
I received this email today: I recall you made this very insightful remark somewhere that, fitting a standard arima model with too much data, ie. a very long time series, is a bad idea. Can you elaborate why? I can see the issue with noise, which compounds the ML estimation as the series gets too long. But is there anything else? I’m not sure where I made a comment about…

## On Getting Tenure

March 1, 2014
Attention conservation notice: Navel-gazing by a middle-aged academic. I got tenure a few weeks ago. (Technically it takes effect in July.) The feedback from the department and university which accompanied the decision was gratifyingly positive, a...

## Machine Learning Lesson of the Day – K-Nearest Neighbours Regression

I recently introduced the K-nearest neighbours classifier. Some slight adjustments to the same algorithm can make it into a regression technique. Given a training set and a new input, we can predict the target of the new input by identifying the K data (the K “neighbours”) in the training set that are closest to the new input by Euclidean […]
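As a rough illustration of the idea in this excerpt, here is a minimal Python sketch of K-nearest neighbours regression. The excerpt is cut off before it says how the neighbours' targets are combined, so taking their plain average is an assumption here (it is the most common choice). The function name `knn_regress` and the toy data are hypothetical and do not come from the original post.

```python
import math


def knn_regress(train_x, train_y, query, k):
    """Predict the target for `query` from its k nearest training points.

    Assumption: the prediction is the unweighted mean of the neighbours'
    targets; the original excerpt is truncated before stating this step.
    """
    if not 1 <= k <= len(train_x):
        raise ValueError("k must be between 1 and the number of training points")
    # Euclidean distance from the query to every training input
    distances = [(math.dist(x, query), y) for x, y in zip(train_x, train_y)]
    # Keep the k closest points (ties are broken by original order)
    distances.sort(key=lambda pair: pair[0])
    nearest = distances[:k]
    return sum(y for _, y in nearest) / k


# Hypothetical toy data: one-dimensional inputs with targets y = x^2
train_x = [(0.0,), (1.0,), (2.0,), (3.0,)]
train_y = [0.0, 1.0, 4.0, 9.0]
prediction = knn_regress(train_x, train_y, (1.4,), k=2)
print(prediction)
```

With `k=2`, the two training inputs closest to 1.4 are 1.0 and 2.0, so the prediction is the mean of their targets. A larger `k` smooths the prediction at the cost of pulling in more distant points.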

## The Normality of Joint and Marginal Distributions

February 28, 2014
I'm often surprised how many people are confused when it comes to joint and marginal normal distributions. Most students of econometrics are taught that the marginal and conditional distributions associated with a multivariate normal random vector are t...

## Combining two of my interests

February 28, 2014
Paul Alper writes: Hi Andrew (or Andy or even Gelman [17 of them]): Go to this link and have some fun with (useless? powerful?) data mining. As the authors say, it is addictive. Paul (no other way to spell it) Alper [215 of us] I’m reminded of this discussion from 2012, “Michael’s a Republican, Susan’s […] The post Combining two of my interests appeared first on Statistical Modeling, Causal Inference, and…

## Using CART for Stock Market Forecasting

February 28, 2014
There is an enormous body of literature, both academic and empirical, about market forecasting. Most of the time it mixes two market features: Magnitude and Direction. In this article I want to focus on identifying the market direction only. The goal I set myself is to identify market conditions when the odds are significantly biased […]

## God/leaf/tree

February 28, 2014
Govind Manian writes: I wanted to pass along a fragment from Lichtenberg’s Waste Books — which I am finding to be great stone soup — that reminded me of God is in Every Leaf: To the wise man nothing is great and nothing small…I believe he could write treatises on keyholes that sounded as weighty […] The post God/leaf/tree appeared first on Statistical Modeling, Causal Inference, and Social Science.

## r4stats.com 2013 in review

February 28, 2014
The WordPress.com stats helper monkeys prepared a 2013 annual report for this blog. Here’s an excerpt: The Louvre Museum has 8.5 million visitors per year. This blog was viewed about 150,000 times in 2013. If it were an exhibit at …

## Useful Functions in R for Manipulating Text Data

Introduction: In my current job, I study HIV at the genetic and biochemical levels. Thus, I often work with data involving the sequences of nucleotides or amino acids of various patient samples of HIV, and this type of work involves a lot of manipulating text. (Strictly speaking, I analyze sequences of nucleotides from DNA that are reverse-transcribed from […]