I noticed this article in the newspaper today:

A simple rule change in Ivy League football games has led to a significant drop in concussions, a study released this week found.

After the Ivy League changed its kickoff rules in 2016, adjusting the kickoff and touchback lines by just five yards, the rate of concussions per 1,000 kickoff plays fell to two from 11, according to the study, which was published Monday in the Journal of the American Medical Association. . . .

Under the new system, teams kicked off from the 40-yard line, instead of the 35, and touchbacks started from the 20-yard line, rather than the 25.

The result? A spike in the number of touchbacks — and “a dramatic reduction in the rate of concussions” . . .

The study looked at the rate of concussions over three seasons before the rule change (2013 to 2015) and two seasons after it (2016 to 2017). Researchers saw a larger reduction in concussions during kickoffs after the rule change than they did with other types of plays, like scrimmages and punts, which saw only a slight decline. . . .

I was curious so I followed the link to the research article, “Association Between the Experimental Kickoff Rule and Concussion Rates in Ivy League Football,” by Douglas Wiebe, Bernadette D’Alonzo, Robin Harris, et al.

From the Results section:

Kickoffs resulting in touchbacks increased from a mean of 17.9% annually before the rule change to 48.0% after. The mean annual concussion rate per 1000 plays during kickoff plays was 10.93 before the rule change and 2.04 after (difference, −8.88; 95% CI, −13.68 to −4.09). For other play types, the concussion rate was 2.56 before the rule change and 1.18 after (difference, −1.38; 95% CI, −3.68 to 0.92). The difference-in-differences analysis showed that 7.51 (95% CI, −12.88 to −2.14) fewer concussions occurred for every 1000 kickoff plays after vs before the rule change.

I took a look at the table and noticed some things.

First, the number of concussions was pretty high and the drop was obviously not statistical noise: 126 (that is, 42 per year) for the first three years, down to 33 (15.5 per year) for the last two years. With the exception of punts and FG/PATs, the number of cases was large enough that the drop was clear.

Second, I took a look at the confidence intervals. The confidence interval for “other play types combined” includes zero: see the bottom line of the table. Whassup with that?

I shared this example with my classes this morning and we also had some questions about the data:

– How is “concussion” defined? Could the classification be changing over time? I’d expect that, what with increased concussion awareness, that concussion would be diagnosed more than before, which would make the declining trend in the data even more impressive. But I don’t really know.

– Why data only since 2013? Maybe that’s only how long they’ve been recording concussions.

– We’d like to see the data for each year. To calibrate the effect of a change over time, you want to see year-to-year variation, in this case a time series of the 5 years of data. Obviously the years of the concussions are available, and they might have even been used in the analysis. In the published article, it says, “Annual concussion counts were modeled by year and play type, with play counts as exposures, using maximum likelihood Poisson regression . . .” I’m not clear on what exactly was done here.

– We’d also like to see similar data from other conferences, not just the Ivy League, to see changes in games that did not follow these rules.

– Even simpler than all that, we’d like to see the raw data on which the above analysis was based. Releasing the raw data—that would be trivial. Indeed, the dataset may

alreadybe accessible—I just don’t know where to look for it. Ideally we’d move to a norm in which it was just expected that every publication came with its data and code attached (except when not possible for reasons of privacy, trade secrets, etc.). It just wouldn’t be a question.

The above requests are not meant to represent devastating criticisms of the research under discussion. It’s just hard to make informed decisions without the data.

**Checking the calculations**

Anyway, I was concerned about the last row of the above table so I thought I’d do my best to replicate the analysis in R.

First, I put the data from the table into a file, football_concussions.txt:

y1 n1 y2 n2 kickoff 26 2379 3 1467 scrimmage 92 34521 28 22467 punt 6 2496 2 1791 field_goal_pat 2 2090 0 1268

Then I read the data into R, added a new row for “Other plays” summing all the non-kickoff data, and computed the classical summaries. For each row, I computed the raw proportions and standard errors, the difference between the proportion, and the standard errors of that difference. I also computed the difference in differences, comparing the change in the concussion rate for kickoffs to the change in concussion rate for non-kickoff plays, as this comparison was mentioned in the article’s results section. I multiplied all the estimated differences and standard errors by 1000 to get the data in rates per thousand.

Here’s the (ugly) R code:

concussions <- read.table("football_concussions.txt", header=TRUE) concussions <- rbind(concussions, apply(concussions[2:4,], 2, sum)) rownames(concussions)[5] <- "other_plays" compare <- function(data) { p1 <- data$y1/data$n1 p2 <- data$y2/data$n2 diff <- p2 - p1 se_diff <- sqrt(p1*(1-p1)/data$n1 + p2*(1-p2)/data$n2) diff_in_diff <- diff[1] - diff[5] se_diff_in_diff <- sqrt(se_diff[1]^2 + se_diff[5]^2) return(list(diffs=data.frame(data, diff=diff*1000, diff_ci_lower=(diff - 2*se_diff)*1000, diff_ci_upper=(diff + 2*se_diff)*1000), diff_in_diff=diff_in_diff*1000, se_diff_in_diff=se_diff_in_diff*1000)) } print(lapply(compare(concussions), round, 2))

And here's the result:

$diffs y1 n1 y2 n2 diff diff_ci_lower diff_ci_upper kickoff 26 2379 3 1467 -8.88 -13.76 -4.01 scrimmage 92 34521 28 22467 -1.42 -2.15 -0.69 punt 6 2496 2 1791 -1.29 -3.80 1.23 field_goal_pat 2 2090 0 1268 -0.96 -2.31 0.40 other_plays 100 39107 30 25526 -1.38 -2.05 -0.71 $diff_in_diff [1] -7.5 $se_diff_in_diff [1] 2.46

The differences and 95% intervals computed this way are similar, although not identical to, the results in the published table--but there are some differences that baffle me. Let's go through the estimates, line by line:

- Kickoffs. Estimated difference is identical 95% interval is correct to two significant digits. This unimportant discrepancy could be coming because I used a binomial model and the published analysis used a Poisson regression.

- Plays from scrimmage. Estimated difference is the same for both analyses; lower confidence bound is almost the same. But the upper bounds are different: I have -0.69; the published analysis is -0.07. I'm guessing they got -0.70 and they just made a typo when entering the result into the paper. Not as bad as the famous Excel error, but these slip-ups indicate a problem with workflow. A problem I often have in my workflow too, as in the short term it is often easier to copy numbers from one place to another than to write a full script.

- Punts. Estimated difference and confidence interval work out to within 2 significant digits.

- Field goals and extra points. Same point estimate, but confidence bounds are much different, with the published intervals much wider than mine. This makes sense: There's a zero count in these data, and I was using the simple sqrt(p_hat*(1-p_hat)) standard deviation formula which gives too low a value when p_hat=0. Their Poisson regression would be better.

- All non-kickoffs. Same point estimate but much different confidence intervals. I think they made a mistake here: for one thing, their confidence interval doesn't exclude zero, even though the raw numbers show very strong evidence of a difference (30 concussions after the rule change and 100 before; even if you proportionally scale down the 100 to 60 or so, that's still sooo much more than 30; it could not be the result of noise). I have no idea what happened here; perhaps this was some artifact of their regression model?

This last error is somewhat consequential, in that it could lead readers to believe that there's no strong evidence that the underlying rate of concussions was declining for non-kickoff plays.

- Finally, the diff-in-diff is three standard errors away from zero, implying that the relatively larger decline in concussion rate in kickoffs, compared to non-kickoffs, could not be simply explained by chance variation.

**Difference in difference or ratio of ratios?**

But is the simple difference the right thing to look at?

Consider what was going on. Before the rule change, concussion rates were much higher for kickoffs than for other plays. After the rule change, concussion rates declined for all plays.

As a student in my class pointed out, it really makes more sense to compare the *rates of decline* than the absolute declines.

Put it this way. If the probabilities of concussion are:

- p_K1: Probability of concussion for a kickoff play in 2013-2015

- p_K2: Probability of concussion for a kickoff play in 2016-2017

- p_N1: Probability of concussion for a non-kickoff play in 2013-2015

- p_N2: Probability of concussion for a non-kickoff play in 2016-2017.

Following the published paper, we estimated the difference in differences, (p_K2 - p_K1) - (p_N2 - p_N1).

But I think it makes more sense to think multiplicatively, to work with the ratio of ratios, (p_K2/p_K1) / (p_N2/p_N1).

Or, on the log scale, (log p_K2 - log p_K1) - (log p_N2 - log p_N1).

What's the estimate and standard error of this comparison?

The key step is that we can use relative errors. From the Poisson distribution, the relative sd of an estimated rate is 1/sqrt(y), so this is the approx sd on the log scale. So, the estimated difference in difference of log probabilities is (log(3/1467) - log(26/2379)) - (log(30/25526) - log(100/39107)) = -0.90. That's a big number: exp(0.90) = 2.46, which means that the concussion rate fell over twice as fast in kickoff than in non-kickoff plays.

But the standard error of this difference in difference of logs is sqrt(1/3 + 1/26 + 1/30 + 1/100) = 0.64. The observed d-in-d-of-logs is -0.90, which is less than two standard errors from zero, thus not conventionally statistically significant.

So, from these data alone, we cannot confidently conclude that the relative decline in concussion rates is more for kickoffs than for other plays. That estimate of 3/1467 is just too noisy.

We could also see this using a Poisson regression with four data points. We create the data frame using this (ugly) R code:

data <- data.frame(y = c(concussions$y1[c(1,5)], concussions$y2[c(1,5)]), n = c(concussions$n1[c(1,5)], concussions$n2[c(1,5)]), time = c(0, 0, 1, 1), kickoff = c(1, 0, 1, 0))

Which produces this:

y n time kickoff 1 26 2379 0 1 2 100 39107 0 0 3 3 1467 1 1 4 30 25526 1 0

Then we load Rstanarm:

library("rstanarm") options(mc.cores = parallel::detectCores())

And run the regression:

fit <- stan_glm(y ~ time*kickoff, offset=log(n), family=poisson, data=data) print(fit)

Which yields:

Median MAD_SD (Intercept) -6.0 0.1 time -0.8 0.2 kickoff 1.4 0.2 time:kickoff -0.9 0.6

The coefficient for time is negative: concussion rates have gone down a lot. The coefficient for kickoff is positive: kickoffs have higher concussion rates. The coefficient for the interaction of time and kickoff is negative: concussion rates have been going down faster for kickoffs. But that last coefficient is less than two standard errors from zero. This is just confirming with our regression analysis what we already saw with our simple standard error calculations.

OK, but don't I think that the concussion rate really was going down faster, even proportionally, in kickoffs than in other plays? Yes, I do, for three reasons. In no particular order:

- The data *do* show a faster relative decline, albeit with uncertainty;

- The story makes sense given that the Ivy League directly changed the rules to make kickoffs safer;

- Also, there were many more touchbacks, and this directly reduces the risks.

That's fine.

**Summary**

- There seem to be two mistakes in the published paper: (1) some error in the analysis of the plays from scrimmage which led to too wide a standard error, and (2) a comparison of absolute rather than relative probabilities.

- It would be good to see the raw data for each year.

- The paper's main finding---a sharp drop in concussions, especially in kickoff plays---is supported both by the data and our substantive understanding.

- Concussion rates fell faster for kickoff plays, but they also dropped a lot in non-kickoff plays too. I think the published article should've emphasized this finding more than they did, but they perhaps here were hindered by the error they made in their analysis leading to an inappropriately wide confidence interval for non-kickoff plays. The decline in concussions for non-kickoff plays is consistent with a general rise in concussion awareness.

**P.S.** I corrected a mistake in the above Poisson regression code, pointed out by commenter TJ (see here).

The post “Ivy League Football Saw Large Reduction in Concussions After New Kickoff Rules” appeared first on Statistical Modeling, Causal Inference, and Social Science.