The purpose of this assignment is to use HTML scraping tools to collect data from a web source. For my analysis, I scraped an HTML table from Wikipedia. The page is a series of lists of the Billboard Hot 100 number-one singles of the 2000s (2000-2009). The chart data was compiled by Nielsen SoundScan based collectively on each single’s weekly physical (CD, vinyl and cassette) and digital sales, airplay, and streaming.
The packages required for this analysis are: XML, tidyverse, httr, rvest, and sqldf.
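As a rough illustration of the scraping step, here is a minimal sketch in Python using only the standard library, standing in for the R packages listed above. The HTML snippet is a made-up miniature of the Wikipedia table (the real page has more columns and merged cells, which this parser does not handle), and the names `TableParser` and `parse_table` are my own, not part of any package.

```python
from html.parser import HTMLParser


class TableParser(HTMLParser):
    """Collect the text of every <th>/<td> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []      # finished rows, each a list of cell strings
        self._row = None    # cells of the row currently being read
        self._cell = None   # text fragments of the cell currently being read

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        # Only keep text that sits inside a table cell.
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None


# Hypothetical miniature of the scraped table; values are taken from this report.
SAMPLE_HTML = """
<table class="wikitable">
  <tr><th>Artist</th><th>Song</th><th>Weeks</th></tr>
  <tr><td>Usher</td><td>Burn</td><td>8</td></tr>
  <tr><td>Rihanna</td><td>SOS</td><td>3</td></tr>
</table>
"""


def parse_table(html):
    """Return the table as a list of dicts keyed by the header row."""
    parser = TableParser()
    parser.feed(html)
    parser.close()
    header, *body = parser.rows
    return [dict(zip(header, row)) for row in body]


rows = parse_table(SAMPLE_HTML)
for row in rows:
    print(row)
```

Every cell comes back as text, so a numeric column such as Weeks would still need converting before any sorting or summing.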
Which artist appeared at #1 on the Billboard Hot 100 the most times in the 2000s?
This question counts the number of times each artist had a #1 single as the sole credited act. Songs that feature other artists are not added to that artist's total. This is interesting because it shows who had some of the most success in music during this time, and how they might have shaped the music of the decade.
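The count described here can be sketched with a small SQL query. This is only an approximation of the idea, run through Python's built-in `sqlite3` module rather than R; the table name `hits`, the column names, and the rule that "solo" means a credit without the word "featuring" are all my assumptions. The sample rows are copied from the results in this report.

```python
import sqlite3

# Sample rows taken from this report: (Artist, Song, Weeks).
ROWS = [
    ("Usher", "Burn", 8),
    ("Usher", "U Got It Bad", 6),
    ("Usher", "U Remind Me", 4),
    ("Usher", "Confessions Part II", 2),
    ("Usher featuring Lil Jon and Ludacris", "Yeah!", 12),
    ("Rihanna", "SOS", 3),
    ("Justin Timberlake featuring T.I.", "My Love", 3),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE hits (Artist TEXT, Song TEXT, Weeks INTEGER)")
conn.executemany("INSERT INTO hits VALUES (?, ?, ?)", ROWS)

# Count #1 singles per artist, skipping any credit that contains "featuring".
solo_counts = conn.execute(
    """
    SELECT Artist, COUNT(*) AS n
    FROM hits
    WHERE Artist NOT LIKE '%featuring%'
    GROUP BY Artist
    ORDER BY n DESC, Artist
    """
).fetchall()
conn.close()

for artist, n in solo_counts:
    print(artist, n)
```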
## # A tibble: 10 x 2
## # Groups:   Artist [10]
##    Artist                               n
##    <chr>                            <int>
##  1 <NA>                                 9
##  2 Usher                                4
##  3 Rihanna                              3
##  4 Britney Spears                       2
##  5 Christina Aguilera                   2
##  6 Destiny's Child                      2
##  7 Fergie                               2
##  8 Janet Jackson                        2
##  9 Jennifer Lopez featuring Ja Rule     2
## 10 Justin Timberlake                    2
The results tell us that Usher appeared at #1 alone more than any other artist in the 2000s. He appeared four times thanks to his classic hits, which are illustrated later in this analysis. (The first row, <NA>, collects nine entries whose artist field was missing in the scraped data; it is not an artist.) If you included features, most of these artists’ total appearances would increase; however, it is interesting to see which artists had the most solo influence.
To improve this analysis, I would run a query that pulls each artist's name from both their solo credits and their featured credits, then groups them together. With data from more recent years, we could also run a regression to see whether the artists who dominated the charts in the 2000s still have any real presence on the Billboard charts.
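One possible way to approach that improvement is to split each credit into individual names before counting. The sketch below is in Python and is only a starting point: the helper `split_credit` is my own, and its rule (split on "featuring", "and", and commas) is an assumption that would wrongly break up any group whose name contains "and" or a comma. The credits are copied from the results in this report.

```python
import re
from collections import Counter

# Artist credits as they appear in this report's results.
CREDITS = [
    "Usher featuring Lil Jon and Ludacris",
    "Usher",
    "Usher",
    "Usher and Alicia Keys",
    "Usher",
    "Usher featuring Young Jeezy",
    "Usher",
    "Justin Timberlake",
    "Justin Timberlake featuring T.I.",
    "T.I.",
]


def split_credit(credit):
    """Split one credit string into the individual names it contains."""
    parts = re.split(r"\s+featuring\s+|\s+and\s+|,\s*", credit)
    return [part.strip() for part in parts if part.strip()]


# Count every appearance of a name, whether as lead or featured artist.
appearances = Counter(
    name for credit in CREDITS for name in split_credit(credit)
)

for name, count in appearances.most_common():
    print(name, count)
```

With this grouping, Usher's solo singles and his collaborations land under one name instead of being spread over several credits.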
In 2006, which songs spent the most weeks at #1? By which artist?
I am interested to see which song could be called the “Song of 2006”, that is, the most popular song of that specific year.
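A query along these lines could answer the question. This is a sketch in Python's `sqlite3`, not the original code: the `Year` column is hypothetical (the report does not show how the year was stored), and the table and column names are my own. A song whose run at #1 crosses from one calendar year into the next would need extra handling that is not shown here.

```python
import sqlite3

# Sample rows: (Artist, Song, Weeks, Year). The Year column is an assumption.
ROWS = [
    ("Justin Timberlake", "SexyBack", 7, 2006),
    ("Nelly Furtado featuring Timbaland", "Promiscuous", 6, 2006),
    ("Rihanna", "SOS", 3, 2006),
    ("The Black Eyed Peas", "I Gotta Feeling", 14, 2009),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE hits (Artist TEXT, Song TEXT, Weeks INTEGER, Year INTEGER)"
)
conn.executemany("INSERT INTO hits VALUES (?, ?, ?, ?)", ROWS)

# Keep only one year's songs and list the longest runs at #1 first.
songs_2006 = conn.execute(
    """
    SELECT Artist, Song, Weeks
    FROM hits
    WHERE Year = ?
    ORDER BY Weeks DESC, Song
    """,
    (2006,),
).fetchall()
conn.close()

for artist, song, weeks in songs_2006:
    print(artist, song, weeks)
```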
##    Artist                                  Song                  Weeks
##  1 Justin Timberlake                       "SexyBack"                7
##  2 Nelly Furtado featuring Timbaland       "Promiscuous"             6
##  3 Beyoncé featuring Slim Thug             "Check on It"             5
##  4 Daniel Powter                           "Bad Day"♪[G]             5
##  5 Rihanna                                 "SOS"                     3
##  6 Fergie                                  "London Bridge"           3
##  7 Justin Timberlake featuring T.I.        "My Love"                 3
##  8 Nelly featuring Paul Wall, Ali and Gipp "Grillz"                  2
##  9 Ne-Yo                                   "So Sick"                 2
## 10 Chamillionaire featuring Krayzie Bone   "Ridin'"                  2
## 11 Shakira featuring Wyclef Jean           "Hips Don't Lie"          2
## 12 Ludacris featuring Pharrell             "Money Maker"             2
## 13 Akon featuring Snoop Dogg               "I Wanna Love You"        2
## 14 D4L                                     "Laffy Taffy"             1
## 15 James Blunt                             "You're Beautiful"        1
## 16 Sean Paul                               "Temperature"             1
## 17 Taylor Hicks                            "Do I Make You Proud"     1
According to these results, “SexyBack” by Justin Timberlake was the biggest song of 2006, spending 7 weeks at #1. The results also show that Justin Timberlake was the only artist in 2006 to have two #1 hits (“SexyBack” and “My Love” featuring T.I.). This was definitely a successful year for him!
With more data, I would like to run a comparison test to see the success of the albums that the singles were associated with, to determine if the success was mainly from the artists’ work overall or just the single.
Of all the #1 hits in the 2000s, which songs spent more than 8 weeks at #1?
This will show the longest-running singles of the 2000s, and which songs could be considered the “Song of the Decade”.
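The filter can be sketched as a single `WHERE` clause. The example below uses Python's `sqlite3` with a handful of rows from this report; the table name, column names, and the `threshold` variable are my own choices. Note that "more than 8" is a strict comparison, so a song with exactly 8 weeks is left out.

```python
import sqlite3

# Sample rows taken from this report: (Artist, Song, Weeks).
ROWS = [
    ("Mariah Carey", "We Belong Together", 14),
    ("The Black Eyed Peas", "I Gotta Feeling", 14),
    ("Usher featuring Lil Jon and Ludacris", "Yeah!", 12),
    ("The Black Eyed Peas", "Boom Boom Pow", 12),
    ("Usher", "Burn", 8),
    ("Justin Timberlake", "SexyBack", 7),
]

threshold = 8  # keep songs with strictly more weeks than this

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE hits (Artist TEXT, Song TEXT, Weeks INTEGER)")
conn.executemany("INSERT INTO hits VALUES (?, ?, ?)", ROWS)

long_runs = conn.execute(
    """
    SELECT Artist, Song, Weeks
    FROM hits
    WHERE Weeks > ?
    ORDER BY Weeks DESC, Song
    """,
    (threshold,),
).fetchall()
conn.close()

for artist, song, weeks in long_runs:
    print(artist, song, weeks)
```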
##   Artist                               Song                     Weeks
## 1 Mariah Carey                         "We Belong Together"†[F]    14
## 2 The Black Eyed Peas                  "I Gotta Feeling"           14
## 3 Usher featuring Lil Jon and Ludacris "Yeah!"♪[E]                 12
## 4 The Black Eyed Peas                  "Boom Boom Pow"♪[J]         12
Mariah Carey’s “We Belong Together” and The Black Eyed Peas’ “I Gotta Feeling” each spent 14 weeks at #1, a little over three months apiece. The Black Eyed Peas also spent 12 more weeks at #1 with “Boom Boom Pow”. Combined, the group spent 14 + 12 = 26 weeks at #1 with no featured artists.
Given the proper data, I would run an ANOVA test to see whether there was a difference between sales from streams and album sales. If only one of the two had been counted, would that have changed the number of weeks spent at #1?
In the 2000s, which artists spent the most weeks at #1?
This will sum the weeks at #1 across every single by each artist, to see who spent the most total weeks at #1 over the entire decade.
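The summing step might look like the following, again as a sketch in Python's `sqlite3` with rows copied from this report. The table name, column names, and the `TotalWeeks` alias are my assumptions. Grouping is done on the credit exactly as written, so a solo credit and a "featuring" credit for the same artist end up on separate rows.

```python
import sqlite3

# Sample rows taken from this report: (Artist, Song, Weeks).
ROWS = [
    ("The Black Eyed Peas", "I Gotta Feeling", 14),
    ("The Black Eyed Peas", "Boom Boom Pow", 12),
    ("Usher", "Burn", 8),
    ("Usher", "U Got It Bad", 6),
    ("Usher", "U Remind Me", 4),
    ("Usher", "Confessions Part II", 2),
    ("Usher featuring Lil Jon and Ludacris", "Yeah!", 12),
    ("Justin Timberlake", "SexyBack", 7),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE hits (Artist TEXT, Song TEXT, Weeks INTEGER)")
conn.executemany("INSERT INTO hits VALUES (?, ?, ?)", ROWS)

# Total weeks at #1 per credit, largest first, keeping at most 15 rows.
totals = conn.execute(
    """
    SELECT Artist, SUM(Weeks) AS TotalWeeks
    FROM hits
    GROUP BY Artist
    ORDER BY TotalWeeks DESC
    LIMIT 15
    """
).fetchall()
conn.close()

for artist, total in totals:
    print(artist, total)
```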
## Selecting by Weeks
##    Artist                               sum(weeks)
##  1 Mariah Carey                                 16
##  2 Usher featuring Lil Jon and Ludacris         12
##  3 The Black Eyed Peas                          26
##  4 Santana featuring The Product G&B            10
##  5 Nelly featuring Kelly Rowland                10
##  6 Kanye West featuring Jamie Foxx              10
##  7 Flo Rida featuring T-Pain                    10
##  8 Ashanti                                      10
##  9 Mario                                         9
## 10 Beyoncé featuring Sean Paul                   9
## 11 50 Cent featuring Olivia                      9
## 12 50 Cent                                       9
## 13 Beyoncé featuring Jay-Z                       8
## 14 T.I.                                          7
## 15 Soulja Boy                                    7
While the table suggests that The Black Eyed Peas spent the most weeks at #1 (26), Usher has the most weeks once his credits are combined. He is split across separate credits because of the features; adding up every single with Usher on it (listed in the next section) gives 12 + 8 + 6 + 6 + 4 + 3 + 2 = 41 weeks, well ahead of The Black Eyed Peas.
What were the names of all of Usher’s #1 hits in the 2000s ordered by number of weeks?
Since we discovered Usher spent the most weeks at #1, it will be interesting to see which songs got him there.
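A lookup like this could be written as follows. It is a sketch in Python's `sqlite3` using rows from this report, with my own table and column names. Matching on `LIKE 'Usher%'` is an assumption that works here because every relevant credit starts with his name; it would miss songs where he is only the featured artist, and it would also pick up any other act whose name begins with "Usher".

```python
import sqlite3

# Sample rows taken from this report: (Artist, Song, Weeks).
ROWS = [
    ("Usher featuring Lil Jon and Ludacris", "Yeah!", 12),
    ("Usher", "Burn", 8),
    ("Usher", "U Got It Bad", 6),
    ("Usher and Alicia Keys", "My Boo", 6),
    ("Usher", "U Remind Me", 4),
    ("Usher featuring Young Jeezy", "Love in This Club", 3),
    ("Usher", "Confessions Part II", 2),
    ("The Black Eyed Peas", "I Gotta Feeling", 14),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE hits (Artist TEXT, Song TEXT, Weeks INTEGER)")
conn.executemany("INSERT INTO hits VALUES (?, ?, ?)", ROWS)

# Every credit that starts with "Usher", longest run at #1 first.
usher_hits = conn.execute(
    """
    SELECT Artist, Song, Weeks
    FROM hits
    WHERE Artist LIKE 'Usher%'
    ORDER BY Weeks DESC, Song
    """
).fetchall()

# Combined weeks at #1 across all of those credits.
usher_total = conn.execute(
    "SELECT SUM(Weeks) FROM hits WHERE Artist LIKE 'Usher%'"
).fetchone()[0]
conn.close()

for artist, song, weeks in usher_hits:
    print(artist, song, weeks)
print("Total weeks:", usher_total)
```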
##   Artist                               Song                  Weeks
## 1 Usher featuring Lil Jon and Ludacris "Yeah!"♪[E]              12
## 2 Usher                                "Burn"                    8
## 3 Usher                                "U Got It Bad"            6
## 4 Usher and Alicia Keys                "My Boo"                  6
## 5 Usher                                "U Remind Me"             4
## 6 Usher featuring Young Jeezy          "Love in This Club"       3
## 7 Usher                                "Confessions Part II"     2
The results show that his most successful song was “Yeah!” with Lil Jon and Ludacris, which spent 12 weeks at #1. His most successful album to date, “Confessions”, gave him the most #1 singles: “Yeah!”, “Burn”, “My Boo”, and “Confessions Part II”. Usher had a very successful decade and gave us timeless hits that will last a lifetime!