This data set was downloaded from Kaggle and contains the weekly Billboard Hot 100 chart since 1958. Many artists release songs, but few become hits, and this chart shows the 100 hottest songs each week. Here we analyze which song and which artist appeared most often on the Billboard Hot 100 chart from 2010 to 2021.
Some relevant columns in the data:
- date: The date of the chart
- rank: The rank of the song
- song: The song title
- artist: The song artist
- last.week: The rank in the previous week
- peak.rank: The top rank achieved by the song
- weeks.on.board: Total number of weeks the song has appeared on the chart
library(lubridate)
##
## Attaching package: 'lubridate'
## The following objects are masked from 'package:base':
##
## date, intersect, setdiff, union
library(ggplot2)
Before analyzing the data, we first read it in.
w_chart <- read.csv("data/charts.csv", stringsAsFactors = FALSE)
head(w_chart)
##         date rank              song                                      artist
## 1 1958-08-04    1  Poor Little Fool                                Ricky Nelson
## 2 1958-08-04    2          Patricia               Perez Prado And His Orchestra
## 3 1958-08-04    3     Splish Splash                                 Bobby Darin
## 4 1958-08-04    4 Hard Headed Woman          Elvis Presley With The Jordanaires
## 5 1958-08-04    5              When                                 Kalin Twins
## 6 1958-08-04    6     Rebel-'rouser Duane Eddy His Twangy Guitar And The Rebels
##   last.week peak.rank weeks.on.board
## 1        NA         1              1
## 2        NA         2              1
## 3        NA         3              1
## 4        NA         4              1
## 5        NA         5              1
## 6        NA         6              1
tail(w_chart)
##              date rank                  song           artist last.week
## 326682 2021-03-13   95 Pick Up Your Feelings Jazmine Sullivan        89
## 326683 2021-03-13   96                Nobody      Dylan Scott        NA
## 326684 2021-03-13   97           Cover Me Up    Morgan Wallen        95
## 326685 2021-03-13   98       Like I Want You           Giveon       100
## 326686 2021-03-13   99                  Gone   Dierks Bentley        NA
## 326687 2021-03-13  100               Bichota          Karol G        NA
##        peak.rank weeks.on.board
## 326682        88              6
## 326683        96              1
## 326684        52              9
## 326685        95              3
## 326686        99              1
## 326687        72             13
Next, we check the dimensions of the data and the type of each column.
dim(w_chart)
## [1] 326687 7
str(w_chart)
## 'data.frame': 326687 obs. of 7 variables:
## $ date : chr "1958-08-04" "1958-08-04" "1958-08-04" "1958-08-04" ...
## $ rank : int 1 2 3 4 5 6 7 8 9 10 ...
## $ song : chr "Poor Little Fool" "Patricia" "Splish Splash" "Hard Headed Woman" ...
## $ artist : chr "Ricky Nelson" "Perez Prado And His Orchestra" "Bobby Darin" "Elvis Presley With The Jordanaires" ...
## $ last.week : num NA NA NA NA NA NA NA NA NA NA ...
## $ peak.rank : int 1 2 3 4 5 6 7 8 9 10 ...
## $ weeks.on.board: int 1 1 1 1 1 1 1 1 1 1 ...
We convert each column to its proper data type.
w_chart$date <- as.Date(w_chart$date)
w_chart$song <- as.factor(w_chart$song)
w_chart$artist <- as.factor(w_chart$artist)
str(w_chart)
## 'data.frame': 326687 obs. of 7 variables:
## $ date : Date, format: "1958-08-04" "1958-08-04" ...
## $ rank : int 1 2 3 4 5 6 7 8 9 10 ...
## $ song : Factor w/ 24254 levels "'03 Bonnie & Clyde",..: 15710 15414 18419 7501 22741 16195 23556 14081 23269 5957 ...
## $ artist : Factor w/ 9994 levels "'N Sync","'N Sync & Gloria Estefan",..: 7099 6580 1011 2650 4517 2500 8340 3771 8640 6567 ...
## $ last.week : num NA NA NA NA NA NA NA NA NA NA ...
## $ peak.rank : int 1 2 3 4 5 6 7 8 9 10 ...
## $ weeks.on.board: int 1 1 1 1 1 1 1 1 1 1 ...
Next, we check whether the data contains any missing values.
anyNA(w_chart)
## [1] TRUE
colSums(is.na(w_chart))
##           date           rank           song         artist      last.week
##              0              0              0              0          33392
##      peak.rank weeks.on.board
##              0              0
Conclusion: Missing values occur only in the last.week column (33,392 rows). A missing value there means the song was not on the chart in the previous week.
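As a quick sanity check of that reading, a missing last.week should coincide with either a chart debut or a re-entry after an absence. The sketch below shows how such a check could look. It runs on a small made-up data frame (`toy` and its values are hypothetical, not taken from the Kaggle file).

```r
# Toy data (hypothetical values) mimicking the relevant columns
toy <- data.frame(
  song           = c("A", "A", "B", "B", "B"),
  last.week      = c(NA, 5, NA, 40, NA),
  weeks.on.board = c(1, 2, 1, 2, 3)
)

# Rows with no rank in the previous week
no_prev <- toy[is.na(toy$last.week), ]

# Debuts have weeks.on.board == 1; anything larger is a re-entry
n_debut   <- sum(no_prev$weeks.on.board == 1)
n_reentry <- sum(no_prev$weeks.on.board > 1)
print(c(debut = n_debut, reentry = n_reentry))
```

Applied to w_chart, the same two sums would split the 33,392 missing values into debuts and re-entries.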
We subset the data to drop the rows we do not need. In this case we want to analyze the Billboard charts from 2010 to 2021, so we keep those rows and save them to w_1021 for further processing and analysis.
w_chart$year <- year(w_chart$date)
w_1021 <- w_chart[w_chart$year >= 2010, ]
summary(w_1021)
##       date                 rank                 song
##  Min.   :2010-01-02   Min.   :  1.00   Radioactive:   92
##  1st Qu.:2012-10-20   1st Qu.: 25.75   I Like It  :   89
##  Median :2015-08-08   Median : 50.50   Let It Go  :   89
##  Mean   :2015-08-08   Mean   : 50.50   Stay       :   89
##  3rd Qu.:2018-05-26   3rd Qu.: 75.25   Mercy      :   85
##  Max.   :2021-03-13   Max.   :100.00   Rockstar   :   82
##                                        (Other)    :57974
##              artist        last.week        peak.rank     weeks.on.board
##  Drake          :  711   Min.   :  1.00   Min.   :  1.0   Min.   : 1.00
##  Taylor Swift   :  706   1st Qu.: 23.00   1st Qu.: 11.0   1st Qu.: 4.00
##  Luke Bryan     :  479   Median : 46.00   Median : 35.0   Median :10.00
##  The Weeknd     :  433   Mean   : 47.06   Mean   : 38.5   Mean   :12.25
##  Jason Aldean   :  424   3rd Qu.: 71.00   3rd Qu.: 62.0   3rd Qu.:17.00
##  Imagine Dragons:  407   Max.   :100.00   Max.   :100.0   Max.   :87.00
##  (Other)        :55340   NA's   :6637
##       year
##  Min.   :2010
##  1st Qu.:2012
##  Median :2015
##  Mean   :2015
##  3rd Qu.:2018
##  Max.   :2021
##
Summary:
1. The first weekly chart in the 2010 to 2021 subset is dated 2 January 2010.
2. The last weekly chart in the subset is dated 13 March 2021.
3. The song title that appears most often on the Hot 100 from 2010 to 2021 is Radioactive (92 chart entries).
4. The artist that appears most often on the Hot 100 from 2010 to 2021 is Drake (711 chart entries).
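Because song and artist were converted to factors before subsetting, w_1021 still carries every factor level from the full data set, including songs and artists with no rows from 2010 onward. The minimal sketch below shows this effect on a made-up data frame (`toy` is hypothetical) and how base R's droplevels() removes the unused levels.

```r
# Hypothetical mini data set: one artist only charted before 2010
toy <- data.frame(
  year   = c(1958, 2010, 2015),
  artist = factor(c("Ricky Nelson", "Drake", "Drake"))
)

recent <- toy[toy$year >= 2010, ]

# The unused level is kept, so table() reports a zero count for it
counts_before <- table(recent$artist)
print(counts_before)

# droplevels() discards factor levels that no longer occur
recent <- droplevels(recent)
counts_after <- table(recent$artist)
print(counts_after)
```

Dropping unused levels is optional here, since the zero counts sort to the bottom, but it keeps later tables limited to artists and songs that actually charted in the period.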
Next, we look at how many songs credited to Ed Sheeran appeared on the Hot 100 from 2010 to 2021, and which ones. The filter matches the artist field exactly, so collaborations credited under a combined artist name are not included.
ed_sheeran <- w_1021[w_1021$artist == "Ed Sheeran", c("date", "song", "rank")]
ggplot(ed_sheeran, aes(date, rank, color = song)) +
geom_line() +
scale_y_reverse() +
scale_x_date(date_breaks = "1 year", date_labels = "%Y") +
theme_minimal() +
theme(legend.text = element_text(size = 5.5), legend.position = "bottom") +
labs(title = "Few Hits Song From Ed Sheeran", x = "Year", y = "Rank")
Conclusion:
- 23 songs by Ed Sheeran appeared on the Billboard chart from 2010 to 2021.
- Based on the plot, Ed Sheeran either released no song or had no song on the chart in 2014, 2016, 2019, and 2020.
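Counts like these can be computed directly rather than read off the plot. The sketch below is one possible way to do it, using a made-up data frame (`toy` and its songs are hypothetical) with the same date and song columns as ed_sheeran.

```r
# Hypothetical chart rows for a single artist
toy <- data.frame(
  date = as.Date(c("2013-05-04", "2013-05-11", "2015-02-07", "2017-01-28")),
  song = c("Song A", "Song A", "Song B", "Song C")
)

# Number of distinct songs
n_songs <- length(unique(toy$song))

# Years inside the observed range with no chart entry
years_seen    <- as.integer(format(toy$date, "%Y"))
years_missing <- setdiff(seq(min(years_seen), max(years_seen)), years_seen)

print(n_songs)
print(years_missing)
```

Note that this only finds gaps between the first and last observed year. Years before the first entry or after the last one would need the full 2010 to 2021 range as the reference.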
Next, we look at how many songs credited to Coldplay appeared on the Hot 100 from 2010 to 2021, and which ones. As before, the filter matches the artist field exactly.
cold <- w_1021[w_1021$artist == "Coldplay", c("date", "song", "rank")]
ggplot(cold, aes(date, rank, color = song)) +
geom_line() +
scale_y_reverse() +
scale_x_date(date_breaks = "1 year", date_labels = "%Y") +
theme_minimal() +
theme(legend.text = element_text(size = 8), legend.position = "bottom") +
labs(title = "Few Hits Song From Coldplay", x = "Year", y = "Rank")
Conclusion:
- 11 songs by Coldplay appeared on the Billboard chart from 2010 to 2021.
- Based on the plot, Coldplay either released no song or had no song on the chart in 2013, 2015, 2017, 2018, 2019, 2020, and 2021.
- The year with the most Coldplay songs on the chart is 2014.
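The last point, the year with the most songs on the chart, can also be checked numerically. The sketch below counts distinct songs per year with tapply() on a made-up data frame (`toy` is hypothetical). With the real data, the year column could be derived from the date as was done for w_chart.

```r
# Hypothetical chart rows: one row per song per charting week
toy <- data.frame(
  year = c(2012, 2012, 2014, 2014, 2014, 2016),
  song = c("Song A", "Song A", "Song B", "Song C", "Song D", "Song E")
)

# Count distinct songs within each year
songs_per_year <- tapply(toy$song, toy$year, function(s) length(unique(s)))
print(songs_per_year)

# Year with the most distinct songs (first one in case of ties)
top_year <- names(songs_per_year)[which.max(songs_per_year)]
print(top_year)
```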
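Next, we find the songs with the longest chart runs by taking the maximum weeks.on.board for each song title.
most_week <- aggregate(weeks.on.board ~ song, w_1021, max)
most_week <- most_week[order(most_week$weeks.on.board, decreasing = TRUE), ]
head(most_week)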
##                     song weeks.on.board
## 3387         Radioactive             87
## 3590                Sail             79
##  911      Counting Stars             68
## 3228   Party Rock Anthem             68
##  556     Blinding Lights             65
## 3533 Rolling In The Deep             65
ggplot(most_week[1:10,],aes(weeks.on.board, reorder(song, weeks.on.board))) +
geom_col(aes(fill = weeks.on.board), show.legend = F) +
scale_fill_gradient(high = "black",low = "fire brick") +
geom_label(aes(label = weeks.on.board), nudge_x = 1) +
labs(title = "Top 10 Most Frequent Song Appeared on Billboard From 2010 to 2021", x = "Frequency Appeared", y = NULL) +
theme_minimal()
Conclusion: The plot shows the 10 songs with the most weeks on the Billboard chart from 2010 to 2021. Radioactive has the longest run, at 87 weeks.
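Grouping by song title alone merges different songs that share a title. The summary of w_1021 lists 92 rows for the title Radioactive while the longest run is 87 weeks, which suggests that more than one chart entry carries that title. A possible refinement, sketched here on made-up rows (`toy` and its artist names are hypothetical), is to group by both song and artist.

```r
# Hypothetical rows: two different songs share the title "Stay"
toy <- data.frame(
  song           = c("Stay", "Stay", "Stay", "Radioactive"),
  artist         = c("Artist X", "Artist X", "Artist Y", "Artist Z"),
  weeks.on.board = c(10, 11, 30, 87)
)

# Grouping by title only: both "Stay" songs collapse into one row
by_title <- aggregate(weeks.on.board ~ song, toy, max)

# Grouping by title and artist keeps them apart
by_song <- aggregate(weeks.on.board ~ song + artist, toy, max)
by_song <- by_song[order(by_song$weeks.on.board, decreasing = TRUE), ]

print(by_title)
print(by_song)
```

For this ranking the top entry is unaffected, since the maximum of 87 weeks belongs to a single song, but counts based on the number of rows per title would change.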
Finally, we count how many chart entries each artist has.
most_artist <- as.data.frame(table(w_1021$artist))
most_artist <- most_artist[order(most_artist$Freq, decreasing = TRUE), ]
head(most_artist)
##                 Var1 Freq
## 2443           Drake  711
## 8059    Taylor Swift  706
## 5383      Luke Bryan  479
## 9164      The Weeknd  433
## 3892    Jason Aldean  424
## 3663 Imagine Dragons  407
ggplot(most_artist[1:10, ],aes(Freq, reorder(Var1, Freq))) +
geom_col(aes(fill = Freq), show.legend = F) +
scale_fill_gradient(high = "chocolate4",low = "burlywood1") +
geom_label(aes(label = Freq), nudge_x = 1) +
labs(title = "Top 10 Most Frequent Artist Appeared on Billboard", x = "Frequency", y = NULL)
Conclusion: The plot shows the 10 artists with the most entries on the Billboard chart from 2010 to 2021. Drake leads with 711 chart entries.
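These counts treat each distinct value of the artist field as a separate artist, so an entry credited to several artists is counted under the combined name only. The sketch below contrasts an exact match with a substring match using grepl(). It runs on made-up credits (`toy` and the collaborator names are hypothetical), and the credit formats are assumptions.

```r
# Hypothetical artist credits, including two collaborations
toy <- data.frame(
  artist = c("Drake", "Drake", "Drake Featuring Artist Q",
             "Artist R & Drake", "Artist S")
)

# Exact match: rows credited to "Drake" alone
n_exact <- sum(toy$artist == "Drake")

# Substring match: any credit that contains "Drake"
n_any <- sum(grepl("Drake", toy$artist, fixed = TRUE))

print(c(exact = n_exact, any_credit = n_any))
```

A substring match can over-count when one artist's name is contained in another's, so the result should be inspected before it is used in a ranking.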
From the plots and tables above, we can conclude:
- 23 songs by Ed Sheeran appeared on the Billboard chart from 2010 to 2021.
- 11 songs by Coldplay appeared on the Billboard chart from 2010 to 2021.
- Radioactive is the hottest song from 2010 to 2021, with the longest chart run (87 weeks).
- Drake is the hottest artist from 2010 to 2021, with the most chart entries (711).