Kruzlic Homework 3

Overview of Final Project

The data set I am using for my final project comes from Kaggle and contains the songs and lyrics from Taylor Swift’s discography through 2017. Because the discography spans many albums containing 20+ songs each, I will compare only the lyrics of the first album, ‘Taylor Swift’, with those of the most recent album in the data set, ‘Reputation’.

# tidyverse already attaches readr, dplyr, and ggplot2
library(tidyverse)
library(tidyselect)

Swift_lyrics <- read_csv("C:/Users/Bryn Kruzlic/OneDrive/Desktop/DACSS601/taylor_swift_lyrics.csv")
View(Swift_lyrics)

head(Swift_lyrics)

# A tibble: 6 x 7
  artist       album        track_title track_n lyric       line  year
  <chr>        <chr>        <chr>         <dbl> <chr>      <dbl> <dbl>
1 Taylor Swift Taylor Swift Tim McGraw        1 "He said ~     1  2006
2 Taylor Swift Taylor Swift Tim McGraw        1 "Put thos~     2  2006
3 Taylor Swift Taylor Swift Tim McGraw        1 "I said, ~     3  2006
4 Taylor Swift Taylor Swift Tim McGraw        1 "Just a b~     4  2006
5 Taylor Swift Taylor Swift Tim McGraw        1 "That had~     5  2006
6 Taylor Swift Taylor Swift Tim McGraw        1 "On backr~     6  2006
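
The comparison described in the overview only needs the rows for two albums. The following is a minimal sketch of that subsetting step, not the final analysis code. It runs on a small made-up stand-in table (toy_lyrics, a hypothetical name) rather than on Swift_lyrics, and the album spellings in albums_to_compare are assumptions that should be checked against unique(Swift_lyrics$album).

```r
library(dplyr)

# Stand-in for Swift_lyrics: made-up rows, column names taken from head() above
toy_lyrics <- tibble(
  album       = c("Taylor Swift", "Fearless", "reputation", "Taylor Swift"),
  track_title = c("Track A", "Track B", "Track C", "Track A"),
  lyric       = c("placeholder line one", "placeholder line two",
                  "placeholder line three", "placeholder line four")
)

# Assumed spellings (lower-cased so capitalization differences do not matter)
albums_to_compare <- c("taylor swift", "reputation")

# Keep only the rows that belong to the two albums being compared
two_albums <- toy_lyrics %>%
  filter(tolower(album) %in% albums_to_compare)

# Number of lyric lines kept per album
count(two_albums, album)
```

Replacing toy_lyrics with Swift_lyrics should give the same kind of result on the real data, provided the album names match.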

The variables within the data set include:

artist: character data (Taylor Swift)
album: character data (Taylor Swift, Fearless, etc.)
track_title: character data (Tim McGraw, etc.)
lyric: character data (He said the way...)
track_n: double data (1, 2, etc.)
line: double data (1, 2, etc.)
year: double data (2005-2017)

The column names can be confirmed in code:

colnames(Swift_lyrics)

[1] "artist"      "album"       "track_title" "track_n"    
[5] "lyric"       "line"        "year"
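
The column types and the range of years listed above can be checked in the same way. This is a minimal sketch on a made-up stand-in table (toy_lyrics is a hypothetical name with placeholder rows); the same two calls should work on Swift_lyrics. Note that class() reports double columns as "numeric".

```r
library(dplyr)

# Stand-in with the same seven column names as Swift_lyrics; rows are made up
toy_lyrics <- tibble(
  artist      = "Taylor Swift",
  album       = c("Taylor Swift", "reputation"),
  track_title = c("Track A", "Track B"),
  track_n     = c(1, 1),
  lyric       = c("placeholder line one", "placeholder line two"),
  line        = c(1, 1),
  year        = c(2006, 2017)
)

# Class of every column, returned as a named character vector
col_types <- sapply(toy_lyrics, class)
col_types

# Smallest and largest year present in the table
range(toy_lyrics$year)
```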

Potential Research Questions

What word shows up the most overall?
Are there any visible trends in words or topics in the chosen albums?
How have the lyrics changed over time?
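
One possible way to approach the first two questions is to split each lyric line into words and count them. The sketch below is only an illustration under stated assumptions: toy_lyrics is a hypothetical stand-in with placeholder text, the split pattern treats anything other than letters and apostrophes as a separator, and no stop words are removed, so on the real data common words such as "the" would likely dominate until they are filtered out.

```r
library(dplyr)
library(tidyr)
library(stringr)

# Stand-in for Swift_lyrics: placeholder text only
toy_lyrics <- tibble(
  album = c("Taylor Swift", "Taylor Swift", "reputation"),
  lyric = c("Placeholder words here",
            "more placeholder words",
            "Different words, here")
)

# One row per word: lower-case each line, split on non-letter characters,
# expand the resulting list column, and drop empty strings
tokens <- toy_lyrics %>%
  mutate(word = str_split(str_to_lower(lyric), "[^a-z']+")) %>%
  unnest(word) %>%
  filter(word != "")

# Most frequent words overall
word_counts <- count(tokens, word, sort = TRUE)

# Most frequent words within each album
album_word_counts <- count(tokens, album, word, sort = TRUE)

word_counts
```

Comparing album_word_counts between the two chosen albums would be a starting point for the question about trends, while the year column could be used in place of album to look at change over time.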