Kruzlic Homework 3

HW3 First Attempt

Bryn Kruzlic
2022-03-04

Overview of Final Project

The dataset I am using for my final project comes from Kaggle and contains all of the songs and lyrics from Taylor Swift’s discography through 2017. Because the discography spans many albums, each containing 20+ songs, I will compare the lyrics of the first album, ‘Taylor Swift’, with those of the most recent album in the dataset, ‘Reputation’.

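library(readr)  # provides read_csv()
Swift_lyrics <- read_csv("C:/Users/Bryn Kruzlic/OneDrive/Desktop/DACSS601/taylor_swift_lyrics.csv")
View(Swift_lyrics)  # opens the data viewer (interactive sessions only)
head(Swift_lyrics)  # print the first six rows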
# A tibble: 6 x 7
  artist       album        track_title track_n lyric       line  year
  <chr>        <chr>        <chr>         <dbl> <chr>      <dbl> <dbl>
1 Taylor Swift Taylor Swift Tim McGraw        1 "He said ~     1  2006
2 Taylor Swift Taylor Swift Tim McGraw        1 "Put thos~     2  2006
3 Taylor Swift Taylor Swift Tim McGraw        1 "I said, ~     3  2006
4 Taylor Swift Taylor Swift Tim McGraw        1 "Just a b~     4  2006
5 Taylor Swift Taylor Swift Tim McGraw        1 "That had~     5  2006
6 Taylor Swift Taylor Swift Tim McGraw        1 "On backr~     6  2006
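
Since the plan is to compare only the first album with ‘Reputation’, one possible first step is to subset the data to those two albums. The sketch below is only an illustration in base R: `toy_lyrics` is a hypothetical stand-in with the same column names as `Swift_lyrics`, its rows and lyric text are placeholders, and the exact spelling of the album names in the real file is an assumption that should be checked with `unique(Swift_lyrics$album)`.

```r
# Hypothetical stand-in for Swift_lyrics: same column names, placeholder values.
toy_lyrics <- data.frame(
  artist      = "Taylor Swift",
  album       = c("Taylor Swift", "Fearless", "Reputation", "Reputation"),
  track_title = c("Track A", "Track B", "Track C", "Track C"),
  track_n     = c(1, 1, 1, 1),
  lyric       = c("placeholder line one", "placeholder line two",
                  "placeholder line three", "placeholder line four"),
  line        = c(1, 1, 1, 2),
  year        = c(2006, 2008, 2017, 2017),
  stringsAsFactors = FALSE
)

# Album names are assumed to match the values stored in the album column.
albums_to_compare <- c("Taylor Swift", "Reputation")

# Keep only the rows (lyric lines) that belong to the two chosen albums.
two_albums <- toy_lyrics[toy_lyrics$album %in% albums_to_compare, ]

# Count how many lyric lines each of the two albums contributes.
lines_per_album <- table(two_albums$album)
print(lines_per_album)
```

With the real data, the same subsetting line could be applied to `Swift_lyrics` in place of `toy_lyrics`.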

The variables within the data set include:

  1. artist: character data (Taylor Swift)
  2. album: character data (Taylor Swift, Fearless, etc.)
  3. track_title: character data (Tim McGraw, etc.)
  4. track_n: double (numeric) data (1, 2, etc.)
  5. lyric: character data (He said the way...)
  6. line: double (numeric) data (1, 2, etc.)
  7. year: double (numeric) data (2005-2017)

The column names can be confirmed with colnames():

colnames(Swift_lyrics)
[1] "artist"      "album"       "track_title" "track_n"    
[5] "lyric"       "line"        "year"       
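
`colnames()` returns only the names, so the storage types listed above could be checked separately. A minimal sketch, assuming a one-row hypothetical data frame (`toy_lyrics`, with placeholder values) in place of `Swift_lyrics`:

```r
# Hypothetical one-row stand-in with the same columns as Swift_lyrics.
toy_lyrics <- data.frame(
  artist = "Taylor Swift", album = "Taylor Swift", track_title = "Track A",
  track_n = 1, lyric = "placeholder line", line = 1, year = 2006,
  stringsAsFactors = FALSE
)

# typeof() reports the storage type of each column ("character" or "double").
column_types <- sapply(toy_lyrics, typeof)
print(column_types)
```

On the real data, `sapply(Swift_lyrics, typeof)` or `str(Swift_lyrics)` would give the same kind of summary.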

Potential Research Questions

  1. What word shows up the most overall?
  2. Are there any visible trends in words or topics in the chosen albums?
  3. How have the lyrics changed over time?
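
One way the word-frequency question might be approached is to split each lyric line into lowercase words and tabulate them, overall and per album. This is a rough sketch in base R rather than a final method: `toy_lyrics` and `count_words()` are hypothetical names, and the lyric lines are made-up placeholder text, not taken from the dataset.

```r
# Hypothetical stand-in data; the lyric lines are invented for illustration.
toy_lyrics <- data.frame(
  album = c("Taylor Swift", "Taylor Swift", "Reputation"),
  lyric = c("The road, the rain", "Rain on the road", "The city lights"),
  year  = c(2006, 2006, 2017),
  stringsAsFactors = FALSE
)

# Split lines into lowercase words and count them, most frequent first.
count_words <- function(lines) {
  # Replace everything except letters, apostrophes, and spaces with a space.
  cleaned <- gsub("[^a-z' ]", " ", tolower(lines))
  words <- unlist(strsplit(cleaned, " +"))
  words <- words[words != ""]  # drop empty strings left by leading spaces
  sort(table(words), decreasing = TRUE)
}

# Word counts across all lines.
overall_counts <- count_words(toy_lyrics$lyric)
print(head(overall_counts, 3))

# Word counts computed separately for each album.
counts_by_album <- lapply(split(toy_lyrics$lyric, toy_lyrics$album), count_words)
```

Common words such as "the" are likely to dominate raw counts like these, so removing stop words before counting may be worth considering.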