Introduction

As someone who has attended college and still does, I should have had some exposure to college fight songs. However, I cannot remember a single word of the fight song of any college I have attended. That is why I chose this dataset (https://projects.fivethirtyeight.com/college-fight-song-lyrics/). It collects various attributes of the fight songs of 65 schools from the major college sports conferences known as the “Power Five”, plus Notre Dame, an independent.

#### Loading the Data

Here we load the data from the CSV file hosted on GitHub into a data frame for later processing.

We also take a glimpse at the data to check whether anything about the headers or the values needs to be modified.

In this case, some of the headers for the categorical variables seem ambiguous to me. The values themselves, however, look fine.

library(dplyr)  # provides glimpse() and rename()
fightframe <- read.csv("https://raw.githubusercontent.com/fivethirtyeight/data/master/fight-songs/fight-songs.csv", header = TRUE)
glimpse(fightframe)
## Rows: 65
## Columns: 23
## $ school          <chr> "Notre Dame", "Baylor", "Iowa State", "Kansas", "Kansa…
## $ conference      <chr> "Independent", "Big 12", "Big 12", "Big 12", "Big 12",…
## $ song_name       <chr> "Victory March", "Old Fight", "Iowa State Fights", "I'…
## $ writers         <chr> "Michael J. Shea and John F. Shea", "Dick Baker and Fr…
## $ year            <chr> "1908", "1947", "1930", "1912", "1927", "1905", "1934"…
## $ student_writer  <chr> "No", "Yes", "Yes", "Yes", "Yes", "Yes", "No", "No", "…
## $ official_song   <chr> "Yes", "Yes", "Yes", "Yes", "Yes", "Yes", "Yes", "Yes"…
## $ contest         <chr> "No", "No", "No", "No", "No", "No", "No", "No", "No", …
## $ bpm             <int> 152, 76, 155, 137, 80, 153, 180, 81, 149, 159, 152, 16…
## $ sec_duration    <int> 64, 99, 55, 62, 67, 37, 29, 65, 47, 54, 92, 60, 63, 72…
## $ fight           <chr> "Yes", "Yes", "Yes", "No", "Yes", "No", "Yes", "Yes", …
## $ number_fights   <int> 1, 4, 5, 0, 6, 0, 5, 17, 2, 8, 0, 0, 1, 9, 8, 0, 6, 0,…
## $ victory         <chr> "Yes", "Yes", "No", "No", "Yes", "No", "Yes", "No", "Y…
## $ win_won         <chr> "Yes", "Yes", "No", "No", "No", "No", "No", "Yes", "No…
## $ victory_win_won <chr> "Yes", "Yes", "No", "No", "Yes", "No", "Yes", "Yes", "…
## $ rah             <chr> "Yes", "No", "Yes", "No", "No", "Yes", "No", "No", "Ye…
## $ nonsense        <chr> "No", "No", "No", "Yes", "No", "No", "No", "No", "No",…
## $ colors          <chr> "Yes", "Yes", "No", "No", "Yes", "No", "No", "Yes", "Y…
## $ men             <chr> "Yes", "No", "Yes", "Yes", "No", "No", "Yes", "No", "Y…
## $ opponents       <chr> "No", "No", "No", "Yes", "No", "No", "Yes", "Yes", "No…
## $ spelling        <chr> "No", "Yes", "Yes", "No", "No", "Yes", "No", "No", "Ye…
## $ trope_count     <int> 6, 5, 4, 3, 3, 2, 4, 4, 6, 3, 1, 6, 3, 3, 3, 0, 5, 3, …
## $ spotify_id      <chr> "15a3ShKX3XWKzq0lSS48yr", "2ZsaI0Cu4nz8DHfBkPt0Dl", "3…
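To back up the impression that the values look fine, one quick check is to confirm that every Yes/No column holds only those two values. The sketch below runs on a small hypothetical data frame (`toy`, with made-up rows) instead of the real file so it works offline. `unexpected_flags` is an illustrative helper, not a function from any package.

```r
# Hypothetical toy data standing in for fightframe (values are made up;
# the lowercase "no" is a deliberate inconsistency for the check to find)
toy <- data.frame(
  school = c("A", "B", "C"),
  fight  = c("Yes", "No", "Yes"),
  rah    = c("No", "no", "Yes"),
  stringsAsFactors = FALSE
)

# Illustrative helper: for each listed column, return the distinct values
# that are neither "Yes" nor "No" (an empty result means the column is clean)
unexpected_flags <- function(df, cols) {
  lapply(df[cols], function(col) setdiff(unique(col), c("Yes", "No")))
}

problems <- unexpected_flags(toy, c("fight", "rah"))
problems$fight  # character(0): only "Yes"/"No" values
problems$rah    # "no": a value that would need cleaning
```

With the real data, a call such as `unexpected_flags(fightframe, c("student_writer", "official_song", "contest", "fight"))` would run the same check on the actual Yes/No columns.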

#### Processing the Data

Here we rename some ambiguous columns to make the data easier to work with. Several columns, such as “victory”, indicate whether a word appears directly in the lyrics of the song. These get the prefix “contains_”, so “victory” becomes “contains_victory”. Other columns, such as “men”, indicate whether something is mentioned in the song without necessarily appearing as a word in the lyrics. These get the prefix “mentions_”, so “men” becomes “mentions_men”. Finally, “contest”, which records whether the song was chosen through a contest, becomes “won_contest”.

fightframe <- rename(fightframe, won_contest = contest, contains_fight = fight, contains_victory = victory,
  contains_win_won = win_won, contains_victory_win_won = victory_win_won, contains_rah = rah, contains_nonsense = nonsense,
  mentions_colors = colors, mentions_men = men, mentions_opponents = opponents, contains_spelling = spelling)
colnames(fightframe)
##  [1] "school"                   "conference"              
##  [3] "song_name"                "writers"                 
##  [5] "year"                     "student_writer"          
##  [7] "official_song"            "won_contest"             
##  [9] "bpm"                      "sec_duration"            
## [11] "contains_fight"           "number_fights"           
## [13] "contains_victory"         "contains_win_won"        
## [15] "contains_victory_win_won" "contains_rah"            
## [17] "contains_nonsense"        "mentions_colors"         
## [19] "mentions_men"             "mentions_opponents"      
## [21] "contains_spelling"        "trope_count"             
## [23] "spotify_id"
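Because the renaming follows a simple rule (words in the lyrics get “contains_”, topics get “mentions_”), the new names can also be built programmatically instead of typed out one by one. This is a minimal base-R sketch on a hypothetical data frame with only a few of the original columns. `add_prefix` is an illustrative helper name, not a library function.

```r
# Hypothetical data frame with a handful of the original column names
toy <- data.frame(school = "A", contest = "No", fight = "Yes", rah = "No",
                  men = "Yes", colors = "No", stringsAsFactors = FALSE)

# Illustrative helper: prepend `prefix` to each listed column name,
# leaving column order and all other names unchanged
add_prefix <- function(df, cols, prefix) {
  hits <- names(df) %in% cols
  names(df)[hits] <- paste0(prefix, names(df)[hits])
  df
}

toy <- add_prefix(toy, c("fight", "rah"), "contains_")   # words in the lyrics
toy <- add_prefix(toy, c("men", "colors"), "mentions_")  # topics mentioned
names(toy)[names(toy) == "contest"] <- "won_contest"     # one-off rename
names(toy)
```

With the real data, the two `cols` vectors would list every lyric-word column and every topic column, respectively.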

#### Subsetting the Data

Now that the data frame is loaded into R and its columns have clearer names, we can start working with the data by subsetting it.

In this instance, I want to subset the data to answer the question “Which college fight song is the manliest?” First, we need to define a manly song within our parameters. Here, a manly song is one that, at the very least, contains “fight”, contains “rah”, and mentions men, since these are all very manly things for a song to do. To compare manliness among those songs, we can count how many times each one says “fight”: more “fight”s means a manlier song. A top-ten list also makes the comparison easier.

# Keep only "manly" songs: contains "fight", contains "rah", and mentions men
fightsubset <- subset(fightframe,
                      contains_fight == "Yes" & contains_rah == "Yes" & mentions_men == "Yes",
                      select = c(school, song_name, number_fights))
# Sort by number of "fight"s (most first) and keep the top ten; no attach() needed
fightsubsetordered <- head(fightsubset[order(-fightsubset$number_fights), ], 10)
fightsubsetordered
##              school             song_name number_fights
## 8             Texas           Texas Fight            17
## 24          Rutgers   The Bells Must Ring            10
## 14             Iowa       Iowa Fight Song             9
## 10       Texas Tech Fight, Raiders, Fight             8
## 15         Maryland   Maryland Fight Song             8
## 29         Colorado              Fight CU             8
## 37 Washington State        The Fight Song             7
## 5      Kansas State       Wildcat Victory             6
## 17   Michigan State       Victory for MSU             6
## 45      Mississippi        Forward Rebels             6

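Texas Fight tops this list with 17 “fight”s, 7 more than the runner-up, Rutgers’ The Bells Must Ring. However, the glimpse() output above shows that Texas Fight neither contains “rah” nor mentions men, so it does not actually meet our definition of a manly song. A proper manliness ranking should include only songs that pass all three conditions.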
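A common pitfall with subset() is writing conditions as named arguments (`col = "Yes"`) instead of logical expressions (`col == "Yes"`). A named argument that matches none of subset()’s parameters is collected in `...` and never used as a filter, so every row comes back. The sketch below uses a hypothetical four-song data frame (made-up schools and counts) to contrast the two forms, then ranks the qualifying songs the same way as the top-ten list.

```r
# Hypothetical toy data (made-up songs, not from the FiveThirtyEight file)
songs <- data.frame(
  school         = c("A", "B", "C", "D"),
  contains_fight = c("Yes", "Yes", "No", "Yes"),
  contains_rah   = c("No", "Yes", "Yes", "Yes"),
  mentions_men   = c("No", "Yes", "Yes", "Yes"),
  number_fights  = c(17L, 4L, 0L, 9L),
  stringsAsFactors = FALSE
)

# Pitfall: `contains_fight = "Yes"` is a named argument, not a condition,
# so subset() does not filter on it and all four rows are returned
unfiltered <- subset(songs, contains_fight = "Yes", select = c(school, number_fights))
nrow(unfiltered)  # 4

# Correct: one logical expression combining all three conditions with &
manly <- subset(songs,
                contains_fight == "Yes" & contains_rah == "Yes" & mentions_men == "Yes",
                select = c(school, number_fights))

# Rank by number of "fight"s, most first, keeping at most the top ten
top_manly <- head(manly[order(-manly$number_fights), ], 10)
top_manly
```

Note that song “A”, which has the most “fight”s, drops out of the filtered ranking because it fails the “rah” and men conditions. With the real data, swap `songs` for `fightframe` and add `song_name` to `select`.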

Conclusions

FiveThirtyEight’s dataset is a fun way to explore college fight songs. I believe it could be improved by covering more colleges: right now it only includes schools in the Power Five sports conferences, plus Notre Dame. Adding more colleges in the United States, or even international ones, would definitely spruce up the data.