Shrinkage in multi-level hierarchical models

November 28, 2012
By

(This article was originally published at Doing Bayesian Data Analysis, and syndicated at StatsBlogs.)


In a previous post and in a video, I used baseball data to illustrate some ideas about shrinkage and multiple comparisons in hierarchical models. Here I take it a step further, to multi-level hierarchical models, to illustrate shrinkage of estimates within different levels of the model.

Background: Consider the sport of baseball, in which players have opportunities at bat during a season of games, and occasionally hit the ball. In Major League Baseball, players typically hit the ball on about 24% of their opportunities at bat. That ratio, of hits (H) divided by opportunities at bat (AB), is called the "batting average" of each player. Players also must play a position in the field when the other team is at bat. Some fielding positions are specialized and those players are expected to focus on those skills, not necessarily on hitting. In particular, pitchers are not expected to be strong hitters, and catchers might also not be expected to focus so much on hitting. (But I'm no baseball aficionado, so please forgive any inadvertent insults or errors! My thanks to colleagues Ed Hirt and Jim Sherman for tutoring me in some of the basics of baseball rules.)

Our goal is to estimate the underlying probability of getting a hit by each player, based on their hits H, at bats AB, and primary position. The model will put each player's estimated hitting ability (i.e., probability of getting a hit) under a distribution for the primary position, and put those position distributions under an overarching distribution across all positions. Here is a diagram of the hierarchical model. It should be scanned from the bottom up, as described next.
The data are indicated at the bottom of the diagram, with Hi being the number of hits for player i, and ABi being the number of at-bats for player i. The parameter θi in the binomial distribution is the underlying probability of getting a hit for player i. The θi are assumed to come from a beta distribution, but there is a distinct beta distribution for each type of field position: pitcher, catcher, 1st base, 2nd base, 3rd base, shortstop, left field, center field, and right field. The subscript pp(i) indicates the primary position of player i. Thus, each of the nine position-specific beta distributions has a mean μpp and a consistency (like precision) κpp. (Now I notice a small error in the diagram: The ellipses labeled with pp(i) should instead be labeled with only pp.) Moving up the diagram, the nine position-specific means μpp come from a beta distribution, with a a mean μμpp that expresses the central tendency across all positions.At the top level, the priors are vague and noncommittal.

The data are from 948 players in the 2012 regular season of Major League Baseball. Thanks to Matthew Maenner for posting a comment explaining how to download the data. The model simultaneously estimates 948 + 9*2 + 4 = 970 parameters.

Results:


Let's first consider the estimates of hitting ability for different positions. At right are the marginal posterior distributions for the μpp parameters for the positions of catcher and pitcher. The blue distributions show the credible values of the parameters, and the red crosses on the x-axis indicate the ratio in the data of total hits divided by total at bats for all players in that position. Notice that the modes of the posterior are not centered on the red crosses. Instead, the modal estimates are shrunken toward the middle between the pitchers (who tend to have low batting averages) and the catchers (who tend to have higher batting averages). The lowest panel shows the distribution of differences between catchers and pitchers. The distribution falls far above zero, so we can say with high credibility that catchers hit better than pitchers. (Click on the picture to see it enlarged.)


At right is a comparison of two individual players with same record, 1 hit in 3 at bats, but playing different positions, namely catcher and pitcher. Notice that the red crosses are at the same place on the x-axes for the two players, but notice the radically different estimates of their underlying probability of getting a hit, because of the different positions they play. The data from all the other catchers tells the model that catchers tend to have theta values around 0.24. Because this particular catcher has so few data to inform his estimate, the estimate from the higher-level distribution dominates. The same is true for the pitcher, but the higher-level distribution says that pitchers tend to have theta values around 0.13. The resulting distribution of differences, in the lowest panel, suggests that these two players may have credibly different hitting abilities, even though their hits and at-bats are identical! In other words, because we know the players play these particular different positions, we can infer that they probably have different hitting abilities. (Click on the picture to see it enlarged.)


Here is a comparison of two indivdiual players, both of whom are pitchers, with seemingly quite different batting averages of 18/61 and 4/61, as marked by the red crosses on the x-axis. But the posterior estimates of their hitting probabilities are not so different. Notice the dramatic shrinkage of the estimates toward the average of players who are pitchers. In the lower panel, we see that a difference of zero is credible. (Click on the picture to see it enlarged.)


Here is a comparison of two indivdiual players with different records but both playing Center Field. The larger amount of individual data (593 and 120 at bats) allows somewhat less shrinkage than the previous example, so that the estimated difference excludes zero (but still overlaps the particular ROPE used here). Notice also the difference in widths of HDIs for the two players: There is a narrower HDI for the player with more data. (Click on the picture to see it enlarged.)


Here are two players from Right Field with huge numbers of at-bats and nearly the same (impressive!) batting average. The HDI of the difference falls almost entirely within the ROPE, so we accept that the players have identical probability of getting a hit (for practical purposes). (Click on the picture to see it enlarged.)







Summary: With multi-level hierarchical models, there is shrinkage within each level. In this model, there was shrinkage of the position parameters toward each other, as illustrated by the pitcher and catcher distributions. And there was shrinkage of the player parameters within each position, as shown by various examples above. The model also provided some strong inferences about player abilities based on position alone, as illustrated by the estimates for individual players with few at bats.



Please comment on the article here: Doing Bayesian Data Analysis


Subscribe

Email:

  Subscribe