## Row names in data frames: beware of 1:nrow

March 21, 2012
I spent some time puzzling over row names in data frames in R this morning. It seems that if you make the row names for a data frame, x, as 1:nrow(x), R will act as if you’d not assigned row names, and the names might get changed when you do rbind. Here’s an illustration: As […]

## Nordic Countries Dominate the World in Internet Penetration

March 21, 2012
Something about that cold weather... The number of internet users in the Nordic countries has greatly outpaced the rest of the world. Denmark, Iceland, Norway, Sweden, Finland: all in the elite echelon. These countries share a common ancestry -...

## The sun will probably come out tomorrow

March 21, 2012
I am always looking for good examples of Bayesian analysis, so I was interested in this paragraph from The Economist (September 2000): "The canonical example is to imagine that a precocious newborn observes his first sunset, and wonders whether the sun...
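
The excerpt is cut off before the calculation, so the following is only a minimal sketch of the textbook version of the sunrise problem, not necessarily the argument The Economist goes on to make. It assumes a uniform prior on the unknown chance of a sunrise (Laplace's rule of succession); the function name `prob_sunrise_tomorrow` is hypothetical.

```python
from fractions import Fraction


def prob_sunrise_tomorrow(sunrises_seen: int) -> Fraction:
    """Posterior predictive probability of one more sunrise.

    Assumption (not taken from the excerpt): a uniform Beta(1, 1) prior on
    the daily chance of a sunrise, and every observed day so far had one.
    Under that prior the rule of succession gives (n + 1) / (n + 2).
    """
    if sunrises_seen < 0:
        raise ValueError("sunrises_seen must be non-negative")
    return Fraction(sunrises_seen + 1, sunrises_seen + 2)


if __name__ == "__main__":
    # Belief climbs towards 1 but never reaches it.
    for n in (0, 1, 2, 10, 365):
        print(n, prob_sunrise_tomorrow(n))
```

Exact fractions are used rather than floats so that the slow approach to certainty stays visible even for large counts.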

## Using R for a salary negotiation–an extension of decision tree models

March 21, 2012
Let’s say you are in the middle of a salary negotiation, and you want to know whether you should be aggressive in your offering or conservative. One way to help with the decision is to make a decision tree. We’ll work with the following assumptions...

## Copy and paste small data sets into R

March 21, 2012
How can I embed a small data set into my R code? That was the question I came across today, when I prepared my talk about Dynamical Systems in R with simecol for the forthcoming Cologne R user group meeting. I wanted to add all the R code of the talk ...

## Geocode and reverse geocode your data using R, JSON and Google Maps’ Geocoding API

March 20, 2012
To geocode and reverse geocode my data, I use Google's Geocoding service, which returns the geocoded data as JSON. I recommend that you register with Google Maps A...

## Statistical Misconception Removal

March 20, 2012
Our central city is being “deconstructed”. That’s the modern word for demolition. We live in Christchurch, New Zealand where many of our buildings were badly damaged by a string of serious earthquakes over the last 18 months, beginning on 4 … Continue reading →

## Easiest and hardest classes to teach

March 20, 2012
I’ve taught a variety of math classes, and statistics has been the hardest to teach. The thing I find most challenging is coming up with homework problems. Most exercises are either blatantly artificial or extremely tedious. It’s hard to find moderately realistic problems that don’t take too long to work out. The course I’ve found easiest [...]

## Visualize This: The FlowingData Guide to Design, Visualization, and Statistics

March 19, 2012
From: http://book.flowingdata.com/ A book by Nathan Yau, who writes for FlowingData, Visualize This is a practical guide on visualization and how to approach real-world data. The book is published by Wiley and is available...

## Open data: Jimmy Wales and the Man from Sweden – Web Exclusive Article – Significance Magazine

March 19, 2012
Author: Julian Champkin. Jimmy Wales at the Gottlieb Duttweiler Awards Show, 2011. Image by Thomas Entzeroth (photographer) on behalf of Gottlieb Duttweiler I...

## Graphing between-subject confidence intervals for ANOVA

March 19, 2012
This is a quick follow up to my earlier post that discussed how to graph CIs for within-subjects (repeated measures) ANOVA designs. My forthcoming book Serious stats describes how to do this for between-subjects designs (a much simpler proble...

## Backtesting Asset Allocation portfolios

March 19, 2012
In the last post, Portfolio Optimization: Specify constraints with GNU MathProg language, Paolo and MC raised a question: “How would you construct an equal risk contribution portfolio?” Unfortunately, this problem cannot be expressed as a Linear or Quadratic Programming problem. The outline for this post: I will show how Equal Risk Contribution portfolio can be [...]

## Independent measures (between-subjects) ANOVA and displaying confidence intervals for differences in means

March 18, 2012
In Chapter 2 (Confidence Intervals) of Serious stats I consider the problem of displaying confidence intervals (CIs) of a set of means (which I illustrate with the simple case of two independent means). Later, in Chapter 16 (Repeated Measures ANOVA), I consider the trickier problem of displaying of two or more means from paired or […]

## Logistic map: Feigenbaum diagram in R

March 17, 2012
The other day I found some old BASIC code I had written about 15 years ago on a Mac Classic II to plot the Feigenbaum diagram for the logistic map. I remember it took the little computer the whole night to produce the bifurcation chart. With today's c...
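
For readers who have not met the Feigenbaum diagram, here is a rough sketch of how its points are usually computed. It is written in Python rather than the post's BASIC or R, and it is not the author's code: the function names, the starting value, the burn-in length and the range of the growth parameter `r` are all assumptions chosen for illustration.

```python
def logistic_orbit(r, x0=0.2, n_transient=500, n_keep=64):
    """Iterate x -> r * x * (1 - x) and return the values visited after a burn-in."""
    x = x0
    # Discard the transient so only the long-run behaviour remains.
    for _ in range(n_transient):
        x = r * x * (1.0 - x)
    orbit = []
    for _ in range(n_keep):
        x = r * x * (1.0 - x)
        orbit.append(x)
    return orbit


def feigenbaum_points(r_min=2.5, r_max=4.0, n_r=300):
    """Collect (r, x) pairs; a scatter plot of them is the bifurcation chart."""
    points = []
    for i in range(n_r):
        r = r_min + (r_max - r_min) * i / (n_r - 1)
        points.extend((r, x) for x in logistic_orbit(r))
    return points


if __name__ == "__main__":
    points = feigenbaum_points()
    print(len(points), "points ready to plot")
```

Plotting the pairs with any scatter-plot tool shows a single branch for small `r` that splits into two, then four, before chaos sets in.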

## simulated annealing for Sudokus [2]

March 16, 2012
On Tuesday, Eric Chi and Kenneth Lange arXived a paper on a comparison of numerical techniques for solving sudokus. (The very Kenneth Lange who wrote this fantastic book on numerical analysis.) One of these techniques is the simulated annealing approach I had played with a long while ago. They seem to use the same penalisation [...]

## Standards for statistical data dissemination: a wish list

March 16, 2012
Slides from Xavier Badosa. The digitization of information exchange processes has led many industries to define standards to be used in the B2B side of the value chain for the c...

## p curves revisited

March 15, 2012
I finally found some time to take a closer look at p curves. I haven't had a chance to follow up my simulations (and probably won't for a few weeks if not months), but I have had time to think through the ideas the p curve approach raises based on some...

## Seductive Causation

March 15, 2012
Causation is a seductive notion. We want to make meaning out of our world. I love playing “the beeping nose” with little children. I press their nose and it beeps. I press my nose and it whirrs. It fascinates them. … Continue reading →

## Call for chapters: Data Mining Applications with R

March 15, 2012
Data Mining Applications with R: a book to be published by Elsevier (http://www.RDataMining.com/books/book2). Proposal submission deadline: April 30, 2012. Introduction: R is one of the most widely used data mining tools in scientific and business applications, among dozens of commercial … Continue reading →

## Ideas on A Really Fast Statistics Journal

March 15, 2012
I was writing comments on the blog post A proposal for a really fast statistics journal, and I realized the comment box was too small to write down my ideas. I like the proposal a lot, and I feel really bad about the current model of submitting and rev...

March 15, 2012
At PyCon last week I taught a tutorial on Bayesian statistics. It is based on Chapters 5 and 8 of Think Stats. Here is the web page I created for the tutorial. And here, courtesy of PyCon and pyvideo.org, is the video. It's three ho...

## R code for p curves

March 14, 2012
I have finally got around to posting the R code for my p curve simulation. Those familiar with R will realize how crude it is (I've been caught up with other urgent stuff and had no time to explore further). You are welcome to play with (and improve!) t...