battle-of-the-bands

Using NLP to analyze song lyrics

View the Project on GitHub mpfoley73/battle-of-the-bands

What the project does

This project uses natural language processing to compare song lyrics from three iconic bands with distinctive songwriting styles. At one extreme, Rush’s lyricist, Neil Peart, wrote some of the most erudite and textually challenging lyrics of any rock band. AC/DC, although not considered lyrically weak, is included as a plainer contrast to Peart. Queen is the dark horse in this analysis. All four members of Queen wrote songs, both individually and as a group. Earlier songs, especially those written by Freddie Mercury, were extravagantly imaginative. How will Mercury’s creativity show up in the analysis?

Why the project is useful

This project is a fun, analytical way to explore something that is easier to feel than to explain. AC/DC songs are easy to enjoy, but also easy to tire of. Queen and Rush are harder to acquire a taste for, but they inspire stronger feelings. This project is also a case study in the use of natural language processing (NLP) algorithms. The big two are present here: topic modeling and sentiment analysis. I also achieved interesting results with a complexity analysis.

How you can get started

Fork this project to start your own battle of the bands, or clone and submit pull requests if you have any improvements.

Where to get help

I narrated my analysis so that it functions somewhat as a tutorial, and I included links to the resources I used. Later sections occasionally build on insights from earlier ones, so if you are running the code, run the sections in order.

Table of Contents

Section 1 uses API calls and web scraping to assemble a corpus of text for nearly 500 songs.

Section 2 is a brief overview of the corpus, including summary stats and trend charts.

Section 3 is the first NLP analysis. I gauge text complexity with several measures. The quanteda package does almost all of the work, leaving the user with all of the fun.

Section 4 is a topic analysis. I use the stm package to identify topics, then perform a cluster analysis to find similar songs.
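To give a feel for what a text-complexity measure like those in Section 3 looks like, here is a minimal sketch of one common measure, the type-token ratio (unique words divided by total words). It is an illustration only: the project itself relies on R's quanteda package, this sketch is written in plain Python, the function name `type_token_ratio` is hypothetical, and the two sample lines are made up rather than taken from the corpus. It is not necessarily one of the measures used in the analysis.

```python
import re


def type_token_ratio(text: str) -> float:
    """Return unique words / total words, a simple lexical-diversity score."""
    # Lowercase the text and treat runs of letters and apostrophes as tokens.
    tokens = re.findall(r"[a-z']+", text.lower())
    if not tokens:
        return 0.0  # Avoid dividing by zero on empty input.
    return len(set(tokens)) / len(tokens)


# Made-up lines for illustration, not real lyrics from the corpus.
repetitive = "rock on rock on rock on tonight"
varied = "silver rivers carve the ancient northern stone"

print(round(type_token_ratio(repetitive), 2))  # 0.43 (3 unique words out of 7)
print(round(type_token_ratio(varied), 2))      # 1.0 (every word is unique)
```

A higher ratio means less repetition. The ratio tends to fall as texts get longer, so it is fairest when comparing songs of similar length.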