Loading Data into a Data Frame

Overview

I selected the data set behind FiveThirtyEight’s article, “Our Guide to the Exuberant Nonsense of College Fight Songs.” In high school, I was in the band and had to play our fight song (which, I have since discovered, is the same as Cal’s) repeatedly. I’m a bit overloaded on politics and not particularly into professional sports, so this data set had a certain charm!

At its core, the data set records whether each song’s lyrics contain certain clichés, such as “fight,” “victory,” and “rah.” The article itself is more of a tool for viewing each school’s song alongside summary stats on its clichés and how it rates against the others. The page also starts playing audio automatically, so make sure your speakers are at an appropriately low volume before clicking that link.

Summary Statistics

I read the data directly from 538’s raw source file on GitHub.

fightSongRawURL <- "https://raw.githubusercontent.com/fivethirtyeight/data/master/fight-songs/fight-songs.csv"
fightSongRaw <- read.csv(file = fightSongRawURL, header = TRUE, sep = ",")

To get an initial impression of the data, I used the summary function to break out summary stats by column. Because most of the columns are stored as characters, summary reports only their length and class, so there’s not much to glean about the presence of any given trope. I did zero in on the beats per minute (bpm) and song duration (sec_duration) columns and chose those as the basis for a subset data frame.

summary(fightSongRaw)

##     school           conference         song_name           writers         
##  Length:65          Length:65          Length:65          Length:65         
##  Class :character   Class :character   Class :character   Class :character  
##  Mode  :character   Mode  :character   Mode  :character   Mode  :character  
##                                                                             
##                                                                             
##                                                                             
##      year           student_writer     official_song        contest         
##  Length:65          Length:65          Length:65          Length:65         
##  Class :character   Class :character   Class :character   Class :character  
##  Mode  :character   Mode  :character   Mode  :character   Mode  :character  
##                                                                             
##                                                                             
##                                                                             
##       bpm         sec_duration       fight           number_fights   
##  Min.   : 65.0   Min.   : 27.00   Length:65          Min.   : 0.000  
##  1st Qu.: 90.0   1st Qu.: 58.00   Class :character   1st Qu.: 0.000  
##  Median :140.0   Median : 67.00   Mode  :character   Median : 2.000  
##  Mean   :128.8   Mean   : 71.91                      Mean   : 2.846  
##  3rd Qu.:151.0   3rd Qu.: 85.00                      3rd Qu.: 5.000  
##  Max.   :180.0   Max.   :172.00                      Max.   :17.000  
##    victory            win_won          victory_win_won        rah           
##  Length:65          Length:65          Length:65          Length:65         
##  Class :character   Class :character   Class :character   Class :character  
##  Mode  :character   Mode  :character   Mode  :character   Mode  :character  
##                                                                             
##                                                                             
##                                                                             
##    nonsense            colors              men             opponents        
##  Length:65          Length:65          Length:65          Length:65         
##  Class :character   Class :character   Class :character   Class :character  
##  Mode  :character   Mode  :character   Mode  :character   Mode  :character  
##                                                                             
##                                                                             
##                                                                             
##    spelling          trope_count     spotify_id       
##  Length:65          Min.   :0.000   Length:65         
##  Class :character   1st Qu.:3.000   Class :character  
##  Mode  :character   Median :4.000   Mode  :character  
##                     Mean   :3.615                     
##                     3rd Qu.:5.000                     
##                     Max.   :8.000
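
Because the trope columns are stored as characters, summary() can only report their length. One possible way around this is to convert them to logical values so they can be counted. This is a minimal sketch, assuming the trope columns hold the strings "Yes" and "No" (an assumption about the file's coding that is worth confirming with unique() first). The toy data frame and the names toySongs, tropeCols, and tropeTotals are invented for illustration and are not taken from the 538 file.

```r
# Toy stand-in for fightSongRaw; these rows are invented for illustration.
toySongs <- data.frame(
  school = c("A", "B", "C"),
  fight  = c("Yes", "No", "Yes"),
  rah    = c("No", "No", "Yes"),
  stringsAsFactors = FALSE
)

# Columns assumed to be coded as "Yes"/"No".
tropeCols <- c("fight", "rah")

# Compare each column to "Yes" to get TRUE/FALSE values.
toySongs[tropeCols] <- lapply(toySongs[tropeCols], function(x) x == "Yes")

# Logical columns can be summed: the number of songs using each trope.
tropeTotals <- colSums(toySongs[tropeCols])
print(tropeTotals)
```

If the assumption holds, the same conversion could be applied to fightSongRaw by listing the real trope column names in tropeCols.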

I assigned the subset to a new variable, selecting school, conference, bpm, sec_duration, and trope_count from the raw data. To check that I pulled the right columns, I called the “head” function.

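# Select columns by name (positions 1, 2, 9, 10, and 22 in the raw data)
fightSongSub <- fightSongRaw[c("school", "conference", "bpm", "sec_duration", "trope_count")]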

head(fightSongSub)

##         school  conference bpm sec_duration trope_count
## 1   Notre Dame Independent 152           64           6
## 2       Baylor      Big 12  76           99           5
## 3   Iowa State      Big 12 155           55           4
## 4       Kansas      Big 12 137           62           3
## 5 Kansas State      Big 12  80           67           3
## 6     Oklahoma      Big 12 153           37           2
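
With a subset like this in hand, a quick numeric comparison across conferences is one natural follow-up. The sketch below uses base R's aggregate() to take the median tempo and duration per conference. The toy data frame (toySub) and its values are invented for illustration; the idea would be to swap in fightSongSub.

```r
# Toy stand-in for fightSongSub; these rows are invented for illustration.
toySub <- data.frame(
  conference   = c("X", "X", "Y", "Y"),
  bpm          = c(80, 150, 140, 160),
  sec_duration = c(90, 60, 70, 50)
)

# Median bpm and duration within each conference.
confMedians <- aggregate(cbind(bpm, sec_duration) ~ conference,
                         data = toySub, FUN = median)
print(confMedians)
```

The median is used here rather than the mean only because it is less sensitive to a single very long or very slow song; either would work as a first look.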

Conclusions

General Comments

Turning these data into a tool that lets users explore each song in more depth, as 538 did, is a logical next step. I was also curious how the songs stack up against championship wins, bowl game appearances, and season records, and whether the songs (and perhaps the number of times they’re played per game) have any connection to wins. I rather doubt it, but it would be an interesting future project with additional data.

Scatterplot

I thought it’d be fun to throw in a scatterplot. Using what I learned from the R bridge program, I compared the bpm and sec_duration columns along conference lines. I didn’t expect conference to make much of a difference, and that seemed to bear out: no pattern by conference emerged. I was a bit surprised to see the bpm values polarize into slow and fast groups, with a gap in the middle.

library(ggplot2)

ggplot(fightSongSub, aes(x=bpm, y=sec_duration, color=conference)) +
  geom_point()
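
The apparent gap in bpm can also be checked numerically rather than by eye. This is a rough sketch that bins tempos with cut() and counts songs per bin. The toy values (toyBpm) are invented for illustration, and the 20-bpm bin width is an arbitrary choice; the idea would be to swap in fightSongSub$bpm.

```r
# Toy bpm values, invented for illustration; swap in fightSongSub$bpm.
toyBpm <- c(70, 76, 80, 88, 137, 140, 152, 153, 155)

# Bin tempos into 20-bpm-wide intervals; the break points are an arbitrary choice.
bpmBins <- cut(toyBpm, breaks = seq(60, 180, by = 20), include.lowest = TRUE)

# Count songs per bin; empty or sparse bins in the middle would suggest a gap.
bpmCounts <- table(bpmBins)
print(bpmCounts)
```

Empty or near-empty bins between the slow and fast groups would support the polarization seen in the plot, though the result depends on where the breaks fall.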

DATA 607 - Week 1

Ian Costello