Ask any music fanatic about their favorite genre and you’re bound to get a lot of strong opinions. But with music constantly evolving and genres expanding, how much can you really infer from this preference? A Rock fan could favor the 1960s British Invasion, 70s classic rock, 80s prog-rock, 90s alternative rock, or early-2000s nu-metal. Each of these eras has a distinctive sound and, with it, a mostly different set of fans.
A lot of these musical differences could stem from the evolution in sound that came with inventions like the electric guitar or drum machine, but I sought to quantify them in the lyrics themselves. Music fans intuitively know a list of stereotypes for each genre. For instance, Hip-Hop is angrier than Pop, which is happier than Country, which is sadder than Rock. Many of these stereotypes stem from tempo, sound and media portrayals, but I wanted to test them by categorizing an artist’s lyrics and measuring how often words conveying a certain emotion appeared.
To do so, I needed two things: a song lyrics dataset and a mapping from English vocabulary to emotions. There are a lot of great music datasets out there, including the very popular Million Song Dataset, but for my purposes, I was only interested in the artist, song year, song lyrics, and approximate genre. A dataset I pulled from Kaggle had exactly this data for 380,000 songs, with the lyrics pulled from Metrolyrics. It also sorted the songs into twelve genres including ‘Pop’, ‘Hip-Hop’, ‘Rock’, ‘Metal’, and ‘Country’.
For the second part of my data requirements, I used the NRC Emotion Lexicon. This is a list of English words and their associations with eight basic emotions (anger, fear, anticipation, trust, surprise, sadness, joy, and disgust) and two sentiments (negative and positive). Saif Mohammad and Peter Turney compiled these associations manually through crowdsourcing. Many of the categories overlap, so a single word can show up in several of them at once, such as “negative”, “anger” and “disgust.”
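As a rough sketch of how the emotion dictionaries could be built, the snippet below assumes the lexicon is available as tab-separated rows of word, emotion, and a 0/1 association flag. The function name `load_emotion_sets` and the sample rows are hypothetical and for illustration only; they are not copied from the real lexicon or from my original script.

```python
from collections import defaultdict

def load_emotion_sets(lines):
    """Build {emotion: set of words} from 'word<TAB>emotion<TAB>flag' rows."""
    emotion_words = defaultdict(set)
    for line in lines:
        parts = line.strip().split("\t")
        if len(parts) != 3:
            continue  # skip blank or malformed rows
        word, emotion, flag = parts
        if flag == "1":  # keep only rows that mark an association
            emotion_words[emotion].add(word)
    return dict(emotion_words)

# Toy rows for illustration only (assumed format, not real lexicon data).
sample_rows = [
    "hate\tanger\t1",
    "hate\tdisgust\t1",
    "hate\tnegative\t1",
    "hate\tjoy\t0",
    "love\tjoy\t1",
    "love\tpositive\t1",
]
emotion_sets = load_emotion_sets(sample_rows)
```

Because each category is stored as its own set, a word like “hate” can belong to “anger”, “disgust” and “negative” at the same time, which matches the overlap described above.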
After creating the emotion word dictionaries, I needed to clean the lyrics in my dataset. I stripped extra whitespace and any character that wasn’t a standard alphanumeric character. This caused a problem with the contraction “I’ll”: without its apostrophe it becomes “ill”, which has a negative connotation, so I manually changed these to “I will.”
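The cleaning step could look something like the sketch below. The function name `clean_lyrics` and the exact regular expressions are my assumptions for illustration. The key detail is that “I’ll” is expanded before punctuation is removed, so it never collapses into “ill”.

```python
import re

def clean_lyrics(text):
    """Lowercase, expand "I'll", drop non-alphanumeric characters, and tokenize."""
    # Normalize curly apostrophes so the contraction check sees a plain one.
    text = text.lower().replace("\u2019", "'")
    # Expand the contraction first; otherwise stripping "'" would leave "ill".
    text = re.sub(r"\bi'll\b", "i will", text)
    # Delete everything that is not a lowercase letter, digit, or whitespace.
    text = re.sub(r"[^a-z0-9\s]", "", text)
    # split() with no arguments also collapses runs of whitespace.
    return text.split()

tokens = clean_lyrics("I\u2019ll   be there, baby!")
```

Note that deleting punctuation outright also merges hyphenated words, so a different rule might be preferable for lyrics with many of them.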
After the lyrics were cleaned, I wanted a finer time resolution, because decades felt too long. So I grouped every song into five-year windows like 1970-1974, 1975-1979, and so on. At this point, I wanted to take my analysis in two directions, so I wrote one script to categorize the lyrics at a genre level and another to categorize them at an artist level.
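The five-year grouping comes down to a little integer arithmetic. Here is a minimal sketch, with `half_decade` as a hypothetical helper name:

```python
def half_decade(year):
    """Return the five-year window label that contains the given year."""
    start = year - year % 5  # round down to the nearest multiple of 5
    return f"{start}-{start + 4}"

labels = [half_decade(y) for y in (1970, 1974, 1975, 1979)]
```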
Genre-Level Lyric Emotions
To start at the genre level, I calculated, for each genre and each half-decade, the proportion of all words that represented each emotion. Plotting them side by side in a heatmap lets us track how emotions have changed relative to time and to each other. My first observation is that there seems to be a general shift toward fewer positive lyrics over the last 50 years. This is most obvious in the Pop and R&B genres. This makes sense when you consider that the most commonly used positive lyrics are “love”, “baby” and “sweet,” which correspond pretty well to the stereotypical hippie culture of the 70s.
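To make the calculation concrete, here is a minimal sketch of the proportion for each genre and half-decade. It assumes each song is a dictionary with hypothetical `genre`, `year` and `tokens` keys, and that the emotion dictionaries are sets of words. The songs and word sets shown are toy values, not rows from the real dataset.

```python
from collections import Counter, defaultdict

def emotion_rates(songs, emotion_sets):
    """Return {(genre, window_start): {emotion: share of all words}}."""
    totals = Counter()            # total word count per (genre, window)
    hits = defaultdict(Counter)   # emotion word counts per (genre, window)
    for song in songs:
        window_start = song["year"] - song["year"] % 5
        key = (song["genre"], window_start)
        totals[key] += len(song["tokens"])
        for emotion, words in emotion_sets.items():
            hits[key][emotion] += sum(1 for t in song["tokens"] if t in words)
    return {
        key: {e: hits[key][e] / totals[key] for e in emotion_sets}
        for key in totals
        if totals[key] > 0  # avoid dividing by zero for empty lyrics
    }

# Toy inputs for illustration only.
toy_sets = {"positive": {"love", "sweet"}, "negative": {"pain", "hate"}}
toy_songs = [
    {"genre": "Pop", "year": 1971, "tokens": ["love", "me", "sweet", "baby"]},
    {"genre": "Pop", "year": 1973, "tokens": ["so", "much", "pain", "today"]},
    {"genre": "Metal", "year": 1982, "tokens": ["hate", "and", "pain", "forever"]},
]
rates = emotion_rates(toy_songs, toy_sets)
```

Pooling word counts within each window, rather than averaging the rates of individual songs, means longer songs carry more weight. Either choice is defensible, and this sketch simply picks the first.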
On the flip side, we can also take a look at how negative lyrics have been distributed over time. Unlike the positive word distribution, negative words are dominated by a single category: Metal. Since 1980, Metal has had around a 12% negative word rate, with the next closest genre being Hip-Hop at 7.5%. However, the types of negative words these two genres are using are quite different. The most common negative lyrics in Metal are “pain”, “hate”, and “death”, whereas the most common negative Hip-Hop lyrics are “shit”, “ass” and “hit.”
Now that we’ve noticed some trends on a macro-level, let’s take the level of granularity down to the artist, as we focus on Pop, Metal and Hip-Hop.
Pop, Metal and Hip Hop Emotions
We know that music genres are full of variety, which, as another school project of mine examined, can lead to a lot of grey areas between genres. In the following 3D scatter plot, I’ve tried to map the amount of positive, negative and angry words for a large sample of artists from Pop, Metal and Hip-Hop. You can click, drag and zoom the plot around to see different angles of the data, but if this feature just makes you dizzy, then just scroll past it.
We know that Pop music can cover a wide range of emotions on its own, but how does it compare to Hip-Hop? It turns out that even the most negative and angry Pop singers, like Avril Lavigne, have a modest 6.2% positive rate and 7.5% negative rate. These numbers are comparable to those of the most positive and least angry rappers, like Chamillionaire and Drake.
Similarly, Chief Keef, the single most negative and angry rapper in this dataset, would hardly crack the top half of Metal artists for negative lyrical emotion. Chief Keef has about 5.5% positive lyrics, 9% negative and 12% angry, while Cannibal Corpse, one of the most negative and angry Metal bands, has a similar share of positive lyrics alongside an 11% angry lyric rate and a whopping 22% negative lyric rate.
Although it’s interesting to compare lyrics across genres like this, let’s take a closer look at Rock, which is well known for having a wide variety of sub-genres.
Rock Music Emotions
In the following scatter plot, you’ll see the distribution of Rock bands by positive and negative emotion, with the size of each bubble representing how many lyrics were available for the artist. There are a couple of outliers at every end of this plot, and a whole bunch of artists meeting in the middle.
First, at the extreme positive end, we have Christian Rock singers like Chris Tomlin. He has a 25% positive rate vs only an 8% negative rate. His most used negative words “cross” and “sin” are also as you’d expect from a Christian singer.
On the negative side, you have GG Allin, the largest bubble at the top of the graph. GG is an infamous punk rock musician who’s mostly known for extreme antics like defecating on stage, and for his constant arrests. It’s no surprise that his negative lyric rate is three times his positive rate.
Finally, there also seems to be a cluster away from everyone else and much closer to the origin. After looking at a few band names from here like Die Toten Hosen and Echt, it’s clear that these represent non-English speaking artists. A future iteration of this project could create emotion dictionaries beyond the English language.
It can be fun to look at the extremes in these charts, but even artists right in the middle of the cluster can have ebbs and flows to their lyrical emotions over the course of their career. To illustrate, I created a heatmap for the emotions of David Bowie songs through time.
There’s a small gap in this data in the 1990s, but the first observation that jumps out to me is how positive and joyful his music was in the late 80s and early 2000s. This could be the result of a rock icon in the latter half of his career, able to star in movies, voice video game characters, and bring anyone he wants into the studio.
As many reading this already know, David Bowie passed away of liver cancer in 2016, two days after he released his final album, Blackstar. He kept his diagnosis secret for 18 months while he worked on his off-Broadway musical and recorded this farewell album. Given these circumstances, I find it notable that surprise, anger, sadness and fear all increased, while trust decreased, on this album.
Looking at all of these heatmaps, it’s very easy to over-interpret patterns like these. With an even larger dataset and a more precise emotion dictionary, the data for these artists might become less noisy and more interpretable.
Conclusion
Most of us know which music energizes us, and which slows us down or helps us focus. We might have even created Spotify playlists for these situations. But it’s not just the tempo and instruments that contribute to this energy; we can find these emotions intentionally written into the songs. Properly quantifying this helps us compare artists across genres, artists within genres, and even artists against themselves.
People should use this analysis not only to track what kind of music they like, but also to find new music that suits their style. Do you like the energy of Tupac? Maybe you’ll like French Montana. Are the Beach Boys more your speed? Try Chuck Berry. This data could easily be adapted for use in a music recommendation engine.
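One simple way to turn emotion rates into recommendations is a nearest-neighbor lookup: treat each artist’s rates as a vector and suggest the artists closest to one you already like. The sketch below is an illustration under that assumption, not something I built for this project. The function name `nearest_artists`, the use of Euclidean distance, and the placeholder artists with made-up rates are all hypothetical.

```python
import math

def nearest_artists(target, profiles, k=2):
    """Return the k artists whose emotion rates are closest to the target's."""
    emotions = sorted(profiles[target])

    def distance(other):
        # Euclidean distance between the two artists' emotion-rate vectors.
        return math.sqrt(sum(
            (profiles[target][e] - profiles[other][e]) ** 2 for e in emotions
        ))

    candidates = [artist for artist in profiles if artist != target]
    return sorted(candidates, key=distance)[:k]

# Placeholder artists with made-up rates, for illustration only.
profiles = {
    "Artist A": {"positive": 0.06, "negative": 0.09, "anger": 0.05},
    "Artist B": {"positive": 0.07, "negative": 0.08, "anger": 0.05},
    "Artist C": {"positive": 0.25, "negative": 0.08, "anger": 0.01},
}
suggestions = nearest_artists("Artist A", profiles, k=1)
```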
In the future, I’d like to add degrees of emotion to the NRC Emotion Lexicon in order to better classify song lyrics. Not every negative word carries the same negative weight. I’d also like to be able to analyze words better in the context of the lyrics. This can be tricky since song lyrics aren’t always as verbose as books or essays, but context is still very important. “Hell” means something very different in songs written by Chris Tomlin than in songs by Chief Keef. Finally, I’d like to add sound and tempo characteristics in order to create a powerful song/artist recommendation engine.
Thank you to everyone who helped me with this project, including Hilal Kasikci and Andrew Brandt. Credit is also due to Alex Burzinski, Tony Colucci, JD Cook, James Fan and Michael Fedell, my partners in the Album.ai project referred to above. Thanks also to Matt Daniels at The Pudding, whose article on emo rap lyrics inspired this project. If you’d like to learn more about my project, or give feedback, feel free to email me or follow me on LinkedIn.