1 Explanation

This data set, downloaded from Kaggle, contains the weekly Billboard Hot 100 chart since 1958. Many artists release songs, but only a few become hits, and this chart lists the 100 most popular songs each week. Here we want to find which song and which artist appeared most frequently on the Hot 100 from 2010 to 2021.

Some relevant columns in the data :
- date : The date of the charts
- rank : The rank of the song
- song : The song title
- artist : The song artist
- last.week : The rank in the previous week
- peak.rank : The highest rank the song has achieved
- weeks.on.board : Total number of weeks the song has been on the chart

1.1 Attaching Packages

library(lubridate)
## 
## Attaching package: 'lubridate'
## The following objects are masked from 'package:base':
## 
##     date, intersect, setdiff, union
library(ggplot2)

2 Data Exploration

2.1 Input Data

Before analyzing the data, we first read it in.

w_chart <- read.csv("data/charts.csv", stringsAsFactors = FALSE)
head(w_chart)
##         date rank              song                                      artist
## 1 1958-08-04    1  Poor Little Fool                                Ricky Nelson
## 2 1958-08-04    2          Patricia               Perez Prado And His Orchestra
## 3 1958-08-04    3     Splish Splash                                 Bobby Darin
## 4 1958-08-04    4 Hard Headed Woman          Elvis Presley With The Jordanaires
## 5 1958-08-04    5              When                                 Kalin Twins
## 6 1958-08-04    6     Rebel-'rouser Duane Eddy His Twangy Guitar And The Rebels
##   last.week peak.rank weeks.on.board
## 1        NA         1              1
## 2        NA         2              1
## 3        NA         3              1
## 4        NA         4              1
## 5        NA         5              1
## 6        NA         6              1
tail(w_chart)
##              date rank                  song           artist last.week
## 326682 2021-03-13   95 Pick Up Your Feelings Jazmine Sullivan        89
## 326683 2021-03-13   96                Nobody      Dylan Scott        NA
## 326684 2021-03-13   97           Cover Me Up    Morgan Wallen        95
## 326685 2021-03-13   98       Like I Want You           Giveon       100
## 326686 2021-03-13   99                  Gone   Dierks Bentley        NA
## 326687 2021-03-13  100               Bichota          Karol G        NA
##        peak.rank weeks.on.board
## 326682        88              6
## 326683        96              1
## 326684        52              9
## 326685        95              3
## 326686        99              1
## 326687        72             13

Next, we check the dimensions of the data and the type of each column.

dim(w_chart)
## [1] 326687      7
str(w_chart)
## 'data.frame':    326687 obs. of  7 variables:
##  $ date          : chr  "1958-08-04" "1958-08-04" "1958-08-04" "1958-08-04" ...
##  $ rank          : int  1 2 3 4 5 6 7 8 9 10 ...
##  $ song          : chr  "Poor Little Fool" "Patricia" "Splish Splash" "Hard Headed Woman" ...
##  $ artist        : chr  "Ricky Nelson" "Perez Prado And His Orchestra" "Bobby Darin" "Elvis Presley With The Jordanaires" ...
##  $ last.week     : num  NA NA NA NA NA NA NA NA NA NA ...
##  $ peak.rank     : int  1 2 3 4 5 6 7 8 9 10 ...
##  $ weeks.on.board: int  1 1 1 1 1 1 1 1 1 1 ...

We convert the columns to more appropriate types: date becomes a Date, and song and artist become factors.

w_chart$date <- as.Date(w_chart$date)
w_chart$song <- as.factor(w_chart$song)
w_chart$artist <- as.factor(w_chart$artist)
str(w_chart)
## 'data.frame':    326687 obs. of  7 variables:
##  $ date          : Date, format: "1958-08-04" "1958-08-04" ...
##  $ rank          : int  1 2 3 4 5 6 7 8 9 10 ...
##  $ song          : Factor w/ 24254 levels "'03 Bonnie & Clyde",..: 15710 15414 18419 7501 22741 16195 23556 14081 23269 5957 ...
##  $ artist        : Factor w/ 9994 levels "'N Sync","'N Sync & Gloria Estefan",..: 7099 6580 1011 2650 4517 2500 8340 3771 8640 6567 ...
##  $ last.week     : num  NA NA NA NA NA NA NA NA NA NA ...
##  $ peak.rank     : int  1 2 3 4 5 6 7 8 9 10 ...
##  $ weeks.on.board: int  1 1 1 1 1 1 1 1 1 1 ...

2.2 Checking Missing Data

Next, we check whether the data contains any missing values.

anyNA(w_chart)
## [1] TRUE
colSums(is.na(w_chart))
##           date           rank           song         artist      last.week 
##              0              0              0              0          33392 
##      peak.rank weeks.on.board 
##              0              0

Conclusion : The only column with missing values is last.week. A missing value means the song was not on the chart in the previous week.
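One way to probe this reading is to compare the missing last.week values with weeks.on.board: a song in its first week has weeks.on.board equal to 1, while a song returning to the chart has a larger value. The sketch below uses a small made-up data frame (toy is hypothetical, not the Kaggle data) that only borrows the column names.

```r
# Toy data with the same column names as w_chart (all values are made up)
toy <- data.frame(
  song           = c("A", "B", "C", "D"),
  last.week      = c(NA, 5, NA, 12),
  weeks.on.board = c(1, 7, 9, 3)
)

# Rows with no rank in the previous week
missing_last <- toy[is.na(toy$last.week), ]

# Label each of them as a debut (first week) or a re-entry (seen before)
kind <- ifelse(missing_last$weeks.on.board == 1, "debut", "re-entry")
na_kinds <- table(kind)
print(na_kinds)
```

Running the same split on w_chart would show how many of the missing values are debuts and how many are re-entries.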

3 Subsetting

We subset the data to drop the rows we don’t need. In this case we want to analyze the Billboard charts from 2010 to 2021, so we keep only those rows and save them to w_1021 for further processing and analysis.

w_chart$year <- year(w_chart$date)
w_1021 <- w_chart[w_chart$year >= 2010, ]
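A minimal sketch of the same subsetting step on made-up data, useful for checking the result. It assumes base R only (format() stands in for lubridate's year()), and the toy data frame and its values are hypothetical. It also shows that a factor column keeps the levels of removed rows until droplevels() is called, which matters when counting with table().

```r
# Toy chart data (values are made up); artist is a factor as in w_chart
toy <- data.frame(
  date   = as.Date(c("2009-12-26", "2010-01-02", "2015-08-08", "2021-03-13")),
  artist = factor(c("X", "Y", "Y", "Z"))
)

# Extract the year with base R
toy$year <- as.integer(format(toy$date, "%Y"))

# Keep rows from 2010 onward
toy_1021 <- toy[toy$year >= 2010, ]

# Sanity checks: date range and row count of the subset
print(range(toy_1021$date))
print(nrow(toy_1021))

# The factor still remembers "X" even though no row uses it
levels_before <- nlevels(toy_1021$artist)
# droplevels() removes the unused levels
levels_after <- nlevels(droplevels(toy_1021)$artist)
```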

4 Data Summary

summary(w_1021)
##       date                 rank                 song      
##  Min.   :2010-01-02   Min.   :  1.00   Radioactive:   92  
##  1st Qu.:2012-10-20   1st Qu.: 25.75   I Like It  :   89  
##  Median :2015-08-08   Median : 50.50   Let It Go  :   89  
##  Mean   :2015-08-08   Mean   : 50.50   Stay       :   89  
##  3rd Qu.:2018-05-26   3rd Qu.: 75.25   Mercy      :   85  
##  Max.   :2021-03-13   Max.   :100.00   Rockstar   :   82  
##                                        (Other)    :57974  
##              artist        last.week        peak.rank     weeks.on.board 
##  Drake          :  711   Min.   :  1.00   Min.   :  1.0   Min.   : 1.00  
##  Taylor Swift   :  706   1st Qu.: 23.00   1st Qu.: 11.0   1st Qu.: 4.00  
##  Luke Bryan     :  479   Median : 46.00   Median : 35.0   Median :10.00  
##  The Weeknd     :  433   Mean   : 47.06   Mean   : 38.5   Mean   :12.25  
##  Jason Aldean   :  424   3rd Qu.: 71.00   3rd Qu.: 62.0   3rd Qu.:17.00  
##  Imagine Dragons:  407   Max.   :100.00   Max.   :100.0   Max.   :87.00  
##  (Other)        :55340   NA's   :6637                                    
##       year     
##  Min.   :2010  
##  1st Qu.:2012  
##  Median :2015  
##  Mean   :2015  
##  3rd Qu.:2018  
##  Max.   :2021  
## 

Summary :
1. The first weekly chart in the 2010 to 2021 subset is dated 02 January 2010
2. The last weekly chart in the subset is dated 13 March 2021
3. The song title that appears most often on the Hot 100 from 2010 to 2021 is Radioactive, with 92 chart rows. This count is by title only, so different songs that share a title are counted together
4. The artist who appears most often on the Hot 100 from 2010 to 2021 is Drake, with 711 chart rows
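Because summary() counts each song title on its own, two different songs with the same title are pooled. A minimal sketch of the difference between counting by title and counting by title and artist, on made-up data (the toy data frame, its titles, and its artist names are hypothetical):

```r
# Toy chart rows: two different recordings share the title "Stay"
toy <- data.frame(
  song   = c("Stay", "Stay", "Stay", "Halo", "Halo"),
  artist = c("Artist A", "Artist A", "Artist B", "Artist C", "Artist C"),
  stringsAsFactors = FALSE
)

# Counting by title alone pools both "Stay" recordings
by_title <- sort(table(toy$song), decreasing = TRUE)
print(by_title)

# Counting by title and artist keeps them apart:
# give every row a weight of 1 and sum within each (song, artist) pair
by_pair <- aggregate(weeks ~ song + artist,
                     data = transform(toy, weeks = 1), FUN = sum)
by_pair <- by_pair[order(by_pair$weeks, decreasing = TRUE), ]
print(by_pair)
```

In the toy data, "Stay" leads by title with 3 rows, while no single (song, artist) pair has more than 2.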

5 Case Study

5.1 Observe hit songs sung by Ed Sheeran from 2010 to 2021

Here we want to observe how many songs by Ed Sheeran appeared on the Hot 100 from 2010 to 2021, and which songs they were.

# Keep only the rows credited exactly to "Ed Sheeran"
ed_sheeran <- w_1021[w_1021$artist == "Ed Sheeran", c("date", "song", "rank")]
# One line per song; the y axis is reversed so that rank 1 is at the top
ggplot(ed_sheeran, aes(date, rank, color = song)) +
  geom_line() +
  scale_y_reverse() +
  scale_x_date(date_breaks = "1 year", date_labels = "%Y") +
  theme_minimal() +
  theme(legend.text = element_text(size = 5.5), legend.position = "bottom") +
  labs(title = "Hit Songs by Ed Sheeran", x = "Year", y = "Rank")

Conclusion :
- 23 songs credited to Ed Sheeran appeared on the Billboard chart from 2010 to 2021
- No song credited solely to Ed Sheeran appeared on the chart in 2014, 2016, 2019, or 2020, either because he released no songs or because none of them charted
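The song count and the list of active years can also be computed rather than read off the plot. The sketch below does this on made-up data (the toy data frame, "Artist X", and the song names are hypothetical). It also contrasts an exact match on the artist column, as used in this section, with a substring match via grepl(), which would additionally pick up rows where the artist is credited together with someone else.

```r
# Toy chart rows (all values are made up)
toy <- data.frame(
  date   = as.Date(c("2017-01-28", "2017-02-04", "2017-01-28", "2019-05-25")),
  song   = c("Song One", "Song One", "Song Two", "Song Three"),
  artist = c("Artist X", "Artist X", "Artist X", "Artist X Featuring Artist Y"),
  stringsAsFactors = FALSE
)

# Exact match: only rows credited to "Artist X" alone
solo <- toy[toy$artist == "Artist X", ]
n_solo_songs <- length(unique(solo$song))

# Substring match: any credit that contains "Artist X"
any_credit <- toy[grepl("Artist X", toy$artist, fixed = TRUE), ]
n_all_songs <- length(unique(any_credit$song))

# Years with at least one chart row under each definition
years_solo <- sort(unique(format(solo$date, "%Y")))
years_all  <- sort(unique(format(any_credit$date, "%Y")))
```

Comparing years_solo with years_all shows which gaps in the plot come from the exact-match filter rather than from the artist being absent from the chart.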

5.2 Observe hit songs sung by Coldplay from 2010 to 2021

Here we want to observe how many songs by Coldplay appeared on the Hot 100 from 2010 to 2021, and which songs they were.

# Keep only the rows credited exactly to "Coldplay"
cold <- w_1021[w_1021$artist == "Coldplay", c("date", "song", "rank")]
# One line per song; the y axis is reversed so that rank 1 is at the top
ggplot(cold, aes(date, rank, color = song)) +
  geom_line() +
  scale_y_reverse() +
  scale_x_date(date_breaks = "1 year", date_labels = "%Y") +
  theme_minimal() +
  theme(legend.text = element_text(size = 8), legend.position = "bottom") +
  labs(title = "Hit Songs by Coldplay", x = "Year", y = "Rank")

Conclusion :
- 11 songs credited to Coldplay appeared on the Billboard chart from 2010 to 2021
- No song credited solely to Coldplay appeared on the chart in 2013, 2015, 2017, 2018, 2019, 2020, or 2021, either because the band released no songs or because none of them charted
- Coldplay had the most songs on the Billboard chart in 2014
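The year with the most charting songs can be found by counting distinct songs per year. A minimal sketch on made-up data (the toy data frame and song names are hypothetical); applied to the cold data frame, the same steps would give the per-year counts behind the last point.

```r
# Toy chart rows for one artist (all values are made up)
toy <- data.frame(
  date = as.Date(c("2014-05-24", "2014-05-31", "2014-06-07", "2016-02-27")),
  song = c("Song A", "Song A", "Song B", "Song C"),
  stringsAsFactors = FALSE
)
toy$year <- format(toy$date, "%Y")

# Number of distinct songs that charted in each year
songs_per_year <- tapply(toy$song, toy$year, function(s) length(unique(s)))
print(songs_per_year)

# Year with the most distinct songs (the first one if there is a tie)
best_year <- names(which.max(songs_per_year))
```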

5.3 Observe the top 10 most frequent songs on the Billboard chart from 2010 to 2021

# For each song title, take the largest weeks.on.board value
most_week <- aggregate(weeks.on.board ~ song, w_1021, max)
# Sort from most to fewest weeks
most_week <- most_week[order(most_week$weeks.on.board, decreasing = TRUE), ]
head(most_week)
##                     song weeks.on.board
## 3387         Radioactive             87
## 3590                Sail             79
## 911       Counting Stars             68
## 3228   Party Rock Anthem             68
## 556      Blinding Lights             65
## 3533 Rolling In The Deep             65
ggplot(most_week[1:10, ], aes(weeks.on.board, reorder(song, weeks.on.board))) +
  geom_col(aes(fill = weeks.on.board), show.legend = FALSE) +
  scale_fill_gradient(high = "black", low = "firebrick") +
  geom_label(aes(label = weeks.on.board), nudge_x = 1) +
  labs(title = "Top 10 Most Frequent Songs on Billboard From 2010 to 2021", x = "Weeks on Chart", y = NULL) +
  theme_minimal()

Conclusion : The graph shows the 10 songs with the most weeks on the Billboard chart from 2010 to 2021. Radioactive leads with 87 weeks.

5.4 Observe the top 10 most frequent artists on the Billboard chart from 2010 to 2021

# Count the chart rows for each artist
most_artist <- as.data.frame(table(w_1021$artist))
# Sort from most to fewest appearances
most_artist <- most_artist[order(most_artist$Freq, decreasing = TRUE), ]
head(most_artist)
##                 Var1 Freq
## 2443           Drake  711
## 8059    Taylor Swift  706
## 5383      Luke Bryan  479
## 9164      The Weeknd  433
## 3892    Jason Aldean  424
## 3663 Imagine Dragons  407
ggplot(most_artist[1:10, ], aes(Freq, reorder(Var1, Freq))) +
  geom_col(aes(fill = Freq), show.legend = FALSE) +
  scale_fill_gradient(high = "chocolate4", low = "burlywood1") +
  geom_label(aes(label = Freq), nudge_x = 1) +
  labs(title = "Top 10 Most Frequent Artists on Billboard From 2010 to 2021", x = "Frequency", y = NULL)

Conclusion : The graph shows the 10 artists who appeared most often on the Billboard chart from 2010 to 2021. Drake leads with 711 appearances.

6 Final Conclusion

From the graphs shown above we can conclude :
- 23 songs credited to Ed Sheeran appeared on the Billboard chart from 2010 to 2021
- 11 songs credited to Coldplay appeared on the Billboard chart from 2010 to 2021
- Radioactive is the most frequent song from 2010 to 2021, with 87 weeks on the chart
- Drake is the most frequent artist from 2010 to 2021, with 711 chart appearances