# Choices in graphing parallel time series

September 16, 2012

(This article was originally published at Statistical Modeling, Causal Inference, and Social Science, and syndicated at StatsBlogs.)

I saw this graph posted by Tyler Cowen:

and my first thought was that the bar plot should be replaced by a line plot: Six lines, one for each income category, with each line being a time series of these changes. With a line plot, you can more easily see each time series (these are hard to see in the barplot because you have to follow each color and jump from decade to decade) and also compare the patterns for each category. The line plot pretty much dominates the bar plot.

At least that was the theory. Now here’s what actually happened.

I downloaded the data as Excel files, saved them as csv, then read them into R. In all, it took close to an hour to get the data set up in the format that was needed to make the graphs. At this point it was pretty easy to make the line plot. But the result was disappointing:

The six lines are hard to untangle (sure, a better color scheme might help, but it wouldn’t really solve the problem) and the graph as a whole is much less clear than the original bar plot.

My next try was small multiples: six little graphs, each with its own time series. That didn’t work so well either (although, on the plus side, it only took a few minutes to make that graph).

Then I thought of plotting the incomes over time (all these income values are inflation-adjusted, of course):

I like this one a lot. In particular, it shows that the drop from 2000-2010 is really a drop since 2007. (Although I suppose Cowen would argue that the drop was really happening earlier and it was just that the economy was doing a Wile E. Coyote, standing in midair and not actually going into freefall until people realized they had gone off the edge of the cliff.)

Still, even the time-trends graph is not quite a replacement for the original bar plot, which shows so much drama. I think my recommended solution is to give the bar plot for the initial impression and then follow up immediately with the time-trends graph, which shows the big picture much more clearly.

P.S. The data are in the location indicated by the caption of the first graph above. Here’s my (ugly) R code to make the graphs:
```
n_years <- 64

# Save F02AR_2010 as csv file
income_share <- read.csv ("F02AR_2010.csv", skip=4, nrows=n_years)
year_income_share <- as.numeric (substr (income_share[,1], 1, 4))

# Remove thousands separators from F07AR_2010, then save as CSV file
income_mean <- read.csv ("F07AR_2010.csv", skip=5, nrows=n_years)
year_income_mean <- as.numeric (substr (income_mean[,1], 1, 4))

# Stop if the two files do not cover the same years
if (sum(year_income_share!=year_income_mean)==0) year <- year_income_share else stop()

# Convert each group's share of total income into its mean income:
# share * overall mean / fraction of families in the group
income <- (income_share[,2:7]/100)*income_mean[,6]
income[,1:5] <- income[,1:5]/.2
income[,6] <- income[,6]/.05

# Reverse the rows so that the years run from oldest to newest
income <- income[n_years:1,]
year <- rev(year)
```
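The income calculation in the block above turns each group's share of total income into that group's mean income: a group that holds share s of total income and makes up fraction f of families has mean income s × (overall mean) / f. Here is a minimal sketch with made-up numbers (not the Census figures), assuming, as the code appears to, that column 6 of the second file is the overall mean family income:

```r
# Hypothetical inputs, for illustration only (not taken from the Census tables)
overall_mean <- 80000                    # overall mean family income
share_pct <- c(4, 10, 16, 23, 47, 20)    # percent of total income: five quintiles, then top 5%
group_frac <- c(rep(0.2, 5), 0.05)       # fraction of families in each group

# Mean income within each group = share * overall mean / group fraction
group_mean <- (share_pct / 100) * overall_mean / group_frac
names(group_mean) <- c("Lowest fifth", "Second fifth", "Third fifth",
                       "Fourth fifth", "Highest fifth", "Top 5 percent")
print(round(group_mean))
```

The top 5 percent is a subset of the highest fifth, which is why only the first five shares add up to 100.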

```
# Average annual change within each decade, by income category
decades <- match (seq(1950,2010,10), year)
income_decades <- income[decades,]
n_decades <- length (decades)
after_decades <- income_decades[2:n_decades,]
before_decades <- income_decades[1:(n_decades-1),]
total_changes <- ((after_decades - before_decades)/before_decades)/10
after <- income[2:n_years,]
before <- income[1:(n_years-1),]
changes <- ((after - before)/before)
avg_changes <- array (NA, c(n_decades-1,ncol(income)))
dimnames (avg_changes) <- list (paste(seq(1950,2000,10),"s",sep=""), colnames(income))
for (i in 1:(n_decades-1)){
  avg_changes[i,] <- colMeans (changes[decades[i]:(decades[i+1]-1),])
}

# Line plot: six lines on one graph
pdf ("changes1.pdf", height=6, width=8)
y <- avg_changes
x_labels <- rownames (y)
line_labels <- c("Lowest fifth", "Second fifth", "Third fifth", "Fourth fifth", "Highest fifth", "Top 5 percent")
n_x <- nrow (y)
n_lines <- ncol (y)
par (mar=c(3,4,1,1), mgp=c(2,.5,0), tck=-.01)
plot (c(1,n_x), range(y), xlab="", ylab="Avg annual change", xaxt="n", yaxt="n", bty="l", type="n")
y_ticks <- seq (-2,4,2)
axis (2, y_ticks/100, paste (y_ticks, "%", sep=""))
par (mgp=c(1,.5,0))
axis (1, 1:n_x, x_labels)
abline (0, 0, col="gray")
colors <- c("black", "gray20", "gray35", "gray45", "gray65", "brown")
for (i in 1:n_lines){
  lines (1:n_x, y[,i], col=colors[i])
  text (4, y[4,i], line_labels[i], col=colors[i])
}
mtext ("Average annual change in mean family income, 1950-2010,\nby quintile and for the top 5 percent", 3, -1)
dev.off ()

# Small multiples: one little graph per income category
pdf ("changes2.pdf", height=4, width=5)
y <- avg_changes
x_labels <- rownames (y)
line_labels <- c("Lowest fifth", "Second fifth", "Third fifth", "Fourth fifth", "Highest fifth", "Top 5 percent")
n_x <- nrow (y)
n_lines <- ncol (y)
par (mar=c(3,4,1,1), mgp=c(2,.5,0), tck=-.01, mfrow=c(3,2))
for (i in 1:n_lines){
  plot (c(1,n_x), c(-.03,.05), xlab="", ylab="Avg annual change", xaxt="n", yaxt="n", bty="l", yaxs="i", type="n")
  y_ticks <- seq (-2,4,2)
  # y is a proportion, so the percent ticks are placed at y_ticks/100
  axis (2, y_ticks/100, paste (y_ticks, "%", sep=""))
  par (mgp=c(2,1.5,0))
  axis (1, 1:n_x, x_labels)
  lines (1:n_x, y[,i])
  mtext (line_labels[i])
}
mtext ("Average annual change in mean family income, 1950-2010,\nby quintile and for the top 5 percent", 3, -1, outer=TRUE)
dev.off ()
```
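The decade averages plotted by the code above are means of year-over-year proportional changes, not the total change over the decade divided by ten (the code computes the latter as `total_changes` but never plots it). A small sketch of the difference, using an invented income series rather than the real data:

```r
# Invented series for illustration: 11 yearly values spanning one decade
income_toy <- c(100, 104, 103, 108, 112, 110, 115, 121, 119, 124, 130)

# Year-over-year proportional changes (10 of them)
yearly <- diff(income_toy) / head(income_toy, -1)

# Mean of the yearly changes: the quantity the graphs show
avg_change <- mean(yearly)

# Total change over the decade, divided by 10
total_change <- (tail(income_toy, 1) - income_toy[1]) / income_toy[1] / 10

print(c(avg_change = avg_change, total_change = total_change))
```

The two numbers generally differ because the yearly changes compound, so it matters which one ends up on the y-axis.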

```
# Time trends: mean family income by category, on a log scale
pdf ("income1.pdf", height=6, width=8)
y <- income
x_labels <- year
line_labels <- c("Lowest fifth", "Second fifth", "Third fifth", "Fourth fifth", "Highest fifth", "Top 5 percent")
n_x <- nrow (y)
n_lines <- ncol (y)
par (mar=c(3,4,1,1), mgp=c(2,.5,0), tck=-.01)
plot (range(year), range(y), xlab="", ylab="Avg family income (in 2010 dollars)", xaxt="n", yaxt="n", bty="l", type="n", log="y")
axis (1, seq(1950,2010,10))
axis (2, c(1e4,2e4,5e4,1e5,2e5), c("10K","20K","50K","100K","200K"))
for (i in 1:n_lines){
  lines (year, y[,i])
  # Label each line just below it, eight years before the end of the series
  text (year[n_years-8], y[n_years-8,i]*.88, line_labels[i])
}
mtext ("Trends in mean family income, 1947-2010,\nby quintile and for the top 5 percent", 3, -1)
dev.off ()
```
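A comment in the first code block says to remove thousands separators from the spreadsheet by hand before saving it as CSV. One possible alternative, sketched here with hypothetical values rather than the actual file contents, is to read the column as text and strip the commas in R:

```r
# Hypothetical raw values as they might appear in an exported CSV column
raw <- c("51,014", "49,445", "1,234,567", "980")

# Drop the commas, then convert to numeric
clean <- as.numeric(gsub(",", "", raw, fixed = TRUE))
print(clean)
```

This assumes the column was read in as character strings; a column that R has already parsed as numeric does not need the step.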

