# Visualize repetition in song lyrics

March 14, 2018
By

(This article was originally published at The DO Loop, and syndicated at StatsBlogs.)

One of my favorite magazines, Significance, printed an intriguing image of a symmetric matrix that shows repetition in a song's lyrics. The image was created by Colin Morris, who has created many similar images. When I saw these images, I knew that I wanted to duplicate the analysis in SAS!

### Visualize repetition in lyrics: A simple example

The analysis is easy. Suppose that a song (or any text source) contains N words. Define the repetition matrix to be the N x N matrix where the (i,j)th cell has the value 1 if the i_th word is the same as the j_th word. Otherwise, the (i,j)th cell equals 0. Now visualize the matrix by using a heat map: Black indicates cells where the matrix is 1 and white for 0. A SAS program that performs this analysis is available at the end of this article.

To illustrate this algorithm, consider the nursery rhyme, "Row, Row, Row the Boat":

```Row, row, row your boat Gently down the stream Merrily, merrily, merrily, merrily Life is but a dream.```

There are 18 words in this song. Words 1–3 are repeated, as are words 10-–13. You can use the SAS DATA steps to read the words of the song into a variable and use other SAS functions to strip out any punctuation. You can then use SAS/IML software to construct and visualize the repetition matrix. The details are shown at the end of this article.

The repetition matrix for the song "Row, Row, Row, Your Boat" is shown to the right. For this example I could have put the actual words along the side and bottom of the matrix, but that is not feasible for songs that have hundreds of words. Instead, the matrix has a numerical axis where the number indicates the position of each word in the song.

Every repetition matrix has 1s on the diagonal. In this song, the words "row" and "merrily" are repeated. Consequently, there is a 3 x 3 block of 1s at the top left and a 4 x 4 block of 1s in the middle of the matrix. (Click to enlarge.)

As mentioned, this song has very little repetition. One way to quantify the amount of repetition is to compute the proportion of 1s in the upper triangular portion of the repetition matrix. The upper triangular portion of an N x N matrix has N(N–1)/2 elements. For this song, N=18, so there are 153 cells and 9 of them are 1s. Therefore the "repetition score" is 9 / 153 = 0.059.

### Another simple example of repetition in song lyrics

I wrote a SAS/IML function that creates and visualizes the repetition matrix and returns the repetition score. In order to visualize songs that might have hundreds of words, I suppress the outlines (the grid) in the heat map. To illustrate the output of the function, the following image visualizes the words of the song "Here We Go Round the Mulberry Bush":

```Here we go round the mulberry bush, The mulberry bush, The mulberry bush. Here we go round the mulberry bush So early in the morning.```

The repetition score for this song is 0.087. You can see diagonal "stripes" that correspond to the repeating phrases "here we go round" and "the mulberry bush". In fact, if you study only the first seven rows, you can "see" almost the entire structure of the song. The first seven words contain all lyrics except for four words ("so", "early", "in", "morning").

### Visualize Classic Song Lyrics

Let's visualize the repetitions in the lyrics of several classic songs.

#### Hey Jude (The Beatles)

When I saw Morris's examples, the first song I wanted to visualize was "Hey Jude" by the Beatles. Not only does the title phrase repeat throughout the song, but the final chorus ("Nah nah nah nah nah nah, nah nah nah, hey Jude") repeats more than a dozen times. This results in a very dense block in the lower right corner of the repetition matrix and a very high repetition score of 0.183. The following image visualizes "Hey Jude":

#### Love Shack (The B-52s)

The second song that I wanted to visualize was "Love Shack" by The B-52s. In addition to a title that repeats almost 40 times, the song contains a sequence near the end in which the phrase "Bang bang bang on the door baby" is alternated with various interjections. The following visualization of the repetition matrix indicates that there is a lot of variation interspersed with regular repetition. The repetition score is 0.035.

#### Call Me (Blondie)

Lastly, I wanted to visualize the song "Call Me" by Blondie. This classic song has only 241 words, yet the title is repeated 41 times! In other words, about 1/3 of the song consists of those two words! Furthermore, there is a bridge in the middle of the song in which the phrase "oh oh oh oh oh" is alternated with other phrases (some in Italian and French) that appear only once in the song. The repetition score is 0.077. The song is visualized below:

### How to create a repetition matrix in SAS

If you think this is a fun topic, you can construct these images yourself by using SAS. If you discover a song that has an interesting repetition matrix, post a comment!

Here's the basic idea of how to construct and visualize a repetition matrix. First, use the DATA step to read each word, use the COMPRESS function to remove any punctuation, and standardize the input by transforming all words to lowercase:
```data Lyrics; length word \$20; input word @@; word = lowcase( compress(word, ,'ps') ); /* remove punctuation and spaces */ datalines; Here we go round the mulberry bush, The mulberry bush, The mulberry bush. Here we go round the mulberry bush So early in the morning. ;```

In SAS/IML software you can use the ELEMENT function to find the locations in the i_th row that have the value 1. After you construct a repetition matrix, you can use the HEATMAPDISC subroutine to display it. For example, the following SAS/IML program reads the words of the song into a vector and visualizes the repetition matrix. It also returns the repetition score, which is the proportion of 1s in the upper triangular portion of the matrix.

```ods graphics / width=500 height=500; /* for 9.4M5 you might need NXYBINSMAX=1000000 */ proc iml; /* define a function that creates and visualizes the repetition matrix */ start VizLyrics(DSName, Title); use (DSName); read all var _CHAR_ into Word; close; N = nrow(Word); M = j(N,N,0); /* allocate N x N matrix */ do i = 1 to N; M[,i] = element(Word, Word[i]); /* construct i_th row */ end; run heatmapdisc(M) title=Title colorramp={white black} displayoutlines=0 showlegend=0;   /* compute the proportion of 1s in the upper triangular portion of the matrix */ upperIdx = loc(col(M)>row(M)); return ( M[upperIdx][:] ); /* proportion of words that are repeated */ finish;   score = VizLyrics("Lyrics", "Here We Go Round the Mulberry Bush"); print score;```

If you want to reproduce the images in this post, you can download the SAS program for this article. In addition, the program creates repetition matrices for "We Didn't Start the Fire" (Billy Joel) and a portion of Martin Luther King Jr.'s "I Have a Dream" speech. You can modify the program and enter lyrics for your favorite songs.

The post Visualize repetition in song lyrics appeared first on The DO Loop.

Please comment on the article here: The DO Loop

Tags: ,

 Tweet

Email: