This is an R Markdown document. Markdown is a simple formatting syntax for authoring HTML, PDF, and MS Word documents. For more details on using R Markdown see http://rmarkdown.rstudio.com.
When you click the Knit button, a document is generated that includes both the content and the output of any embedded R code chunks. You can embed an R code chunk like this:
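# Load the streaming-history export (path on the author's machine) and summarise every column
Spotify <- read.csv("/Users/yashuvaishu/Downloads/Spotify.csv")
summary(Spotify)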
##   trackName          artistName            msPlayed            genre
##  Length:8511        Length:8511        Min.   :        0   Length:8511
##  Class :character   Class :character   1st Qu.:   139977   Class :character
##  Mode  :character   Mode  :character   Median :   269850   Mode  :character
##                                        Mean   :  1539795
##                                        3rd Qu.:  1211910
##                                        Max.   :158367130
##   danceability        energy            key              loudness
##  Min.   :0.0000   Min.   :0.00108   Min.   : 0.000   Min.   :-42.044
##  1st Qu.:0.5070   1st Qu.:0.40700   1st Qu.: 2.000   1st Qu.:-10.016
##  Median :0.6220   Median :0.59200   Median : 5.000   Median : -7.132
##  Mean   :0.6016   Mean   :0.56681   Mean   : 5.243   Mean   : -8.580
##  3rd Qu.:0.7140   3rd Qu.:0.75400   3rd Qu.: 8.000   3rd Qu.: -5.309
##  Max.   :0.9760   Max.   :0.99900   Max.   :11.000   Max.   :  3.010
##   speechiness        valence           tempo           id
##  Min.   :0.00000   Min.   :0.0000   Min.   :  0.00   Length:8511
##  1st Qu.:0.03610   1st Qu.:0.2380   1st Qu.: 97.18   Class :character
##  Median :0.04790   Median :0.4100   Median :118.94   Mode  :character
##  Mean   :0.07833   Mean   :0.4353   Mean   :119.10
##  3rd Qu.:0.08190   3rd Qu.:0.6180   3rd Qu.:139.32
##  Max.   :0.94100   Max.   :0.9860   Max.   :236.20
##   duration_ms
##  Min.   :  10027
##  1st Qu.: 163173
##  Median : 195989
##  Mean   : 203951
##  3rd Qu.: 231378
##  Max.   :1847210
library(dplyr)
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
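# Keep the categorical (character) columns
Categorical_data <- Spotify %>%
  select(trackName, artistName, genre, id)
# One frequency table per column; table() keeps each distinct value paired with its own count
Categorical_summaries <- lapply(Categorical_data, function(x) {
  counts <- as.data.frame(table(x), stringsAsFactors = FALSE)
  names(counts) <- c("Unique_value", "Counts")
  counts
})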
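To see the most frequent values of one categorical column, the counts can be sorted in descending order. Below is a minimal base-R sketch on a hypothetical toy `genre` vector. The values are made up for illustration. With the real data, pass `Spotify$genre` or any column of `Categorical_data` instead.

```r
# Hypothetical toy values; not taken from the Spotify file
genre <- c("pop", "edm", "pop", "rock", "pop", "edm")

# table() counts each distinct value; convert to a two-column data frame
genre_df <- as.data.frame(table(genre), stringsAsFactors = FALSE)
names(genre_df) <- c("Unique_value", "Counts")

# Order rows so the most frequent value comes first
genre_df <- genre_df[order(genre_df$Counts, decreasing = TRUE), ]
print(head(genre_df, 3))
```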
filter(Spotify, artistName == "DJ Snake")
##                                         trackName artistName msPlayed genre
## 1                     A Different Way (with Lauv)   DJ Snake    66060   edm
## 2                                   Broken Summer   DJ Snake  3987042   edm
## 3                                 Let Me Love You   DJ Snake    64207   edm
## 4  Taki Taki (with Selena Gomez, Ozuna & Cardi B)   DJ Snake    44331   edm
## 5                       Try Me (with Plastic Toy)   DJ Snake    13010   edm
## 6                              Turn Down for What   DJ Snake     2222   edm
## 7                     A Different Way (with Lauv)   DJ Snake    66060   edm
## 8                                   Broken Summer   DJ Snake  3987042   edm
## 9                                 Let Me Love You   DJ Snake    64207   edm
## 10 Taki Taki (with Selena Gomez, Ozuna & Cardi B)   DJ Snake    44331   edm
## 11                      Try Me (with Plastic Toy)   DJ Snake    13010   edm
## 12                             Turn Down for What   DJ Snake     2222   edm
##    danceability energy key loudness speechiness valence   tempo
## 1         0.784  0.757   8   -3.912      0.0384  0.5870 104.996
## 2         0.683  0.415  10  -10.720      0.0841  0.5540  81.006
## 3         0.649  0.716   8   -5.371      0.0349  0.1630  99.988
## 4         0.842  0.801   8   -4.167      0.2280  0.6170  95.881
## 5         0.680  0.703   7   -3.360      0.0439  0.1930 102.289
## 6         0.818  0.799   1   -4.100      0.1560  0.0815 100.014
## 7         0.784  0.757   8   -3.912      0.0384  0.5870 104.996
## 8         0.683  0.415  10  -10.720      0.0841  0.5540  81.006
## 9         0.649  0.716   8   -5.371      0.0349  0.1630  99.988
## 10        0.842  0.801   8   -4.167      0.2280  0.6170  95.881
## 11        0.680  0.703   7   -3.360      0.0439  0.1930 102.289
## 12        0.818  0.799   1   -4.100      0.1560  0.0815 100.014
##                        id duration_ms
## 1  1YMBg7rOjxzbya0fPOYfNX      198286
## 2  60mrGpCA4OIUuRLwi4T5Nm      183779
## 3  0lYBSQXN6rCTvUZvg9S0lU      205947
## 4  4w8niZpiMy6qz1mntFA5uM      212500
## 5  5vTtANNlQK5UhwfooDek5y      198405
## 6  67awxiNHNyjMXhVgsHuIrs      213733
## 7  1YMBg7rOjxzbya0fPOYfNX      198286
## 8  60mrGpCA4OIUuRLwi4T5Nm      183779
## 9  0lYBSQXN6rCTvUZvg9S0lU      205947
## 10 4w8niZpiMy6qz1mntFA5uM      212500
## 11 5vTtANNlQK5UhwfooDek5y      198405
## 12 67awxiNHNyjMXhVgsHuIrs      213733
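The DJ Snake output above lists every track twice: rows 7 to 12 repeat rows 1 to 6. This suggests the file contains exact duplicate rows. If so, summing `msPlayed` without removing them would double-count those tracks.

Below is a sketch for counting and dropping duplicates with base R. It runs on a hypothetical toy data frame; with the real data, use `Spotify` instead.

```r
# Hypothetical toy rows; the first and third rows are exact duplicates
toy <- data.frame(
  trackName  = c("Broken Summer", "Let Me Love You", "Broken Summer"),
  artistName = c("DJ Snake", "DJ Snake", "DJ Snake"),
  msPlayed   = c(3987042, 64207, 3987042)
)

# duplicated() flags every row identical to an earlier row
n_dupes <- sum(duplicated(toy))
print(n_dupes)

# unique() keeps the first occurrence of each row (dplyr::distinct() does the same)
toy_dedup <- unique(toy)
print(nrow(toy_dedup))
```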
library(ggplot2)
plot(Spotify)
The goal of the project is to analyze the most played music by genre and to predict the most popular artist from the data in the Spotify dataset.
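One simple way to rank artists in this listening history is by total play time. The sketch below assumes that total `msPlayed` is an acceptable popularity proxy. It produces a descriptive ranking, not a predictive model.

It uses a hypothetical toy data frame with made-up artist names "A", "B" and "C". With the real data, replace `toy` with `Spotify`.

```r
# Hypothetical toy listening history
toy <- data.frame(
  artistName = c("A", "B", "A", "C", "B", "A"),
  msPlayed   = c(120000, 300000, 60000, 90000, 30000, 50000)
)

# Drop exact duplicate rows so repeated entries are not double-counted
toy <- unique(toy)

# Total play time per artist, converted from milliseconds to hours
artist_hours <- tapply(toy$msPlayed, toy$artistName, sum) / 3.6e6

# Largest total first; the first name is the most played artist
top_artists <- sort(artist_hours, decreasing = TRUE)
print(top_artists)
```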
“Spotify Song Attributes Dataset: Exploring the Musical Landscape”
The Spotify Song Attributes Dataset is a collection of music tracks spanning various genres and artists. It lets enthusiasts, researchers, and data scientists explore the characteristics and nuances of each track.
The dataset contains the author’s streaming history for 2022. Its audio features are danceability, energy, key, loudness, speechiness, valence, tempo, and duration. Together, these give a broad view of each song's musical composition. Acousticness, instrumentalness, liveness, and time signature, which Spotify also provides as audio features, are not columns in the file loaded here.
- trackName: The name of the track.
- artistName: The name of the artist or band associated with the track.
- msPlayed: The total time, in milliseconds, that the track was played over the streaming history.
- genre: The genre or genres associated with the track.
- danceability: How suitable a track is for dancing, from 0.0 to 1.0.
- energy: The energy level of the track, from 0.0 to 1.0.
- key: The key of the track, stored as an integer pitch class from 0 to 11 (0 = C, 1 = C♯/D♭, 2 = D, and so on).
- loudness: The overall loudness of the track in decibels (dB).
- speechiness: The presence of spoken words in the track, from 0.0 to 1.0.
- valence: The musical positiveness or happiness conveyed by the track, from 0.0 to 1.0.
- tempo: The tempo of the track in beats per minute (BPM).
- id: The unique identifier of the track.
- duration_ms: The duration of the track in milliseconds.
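Because `key` is stored as an integer pitch class, a lookup vector can turn it into note names. In this sketch, `pitch_names` and `key_to_name` are illustrative helpers, not part of the dataset. The summary shows only the values 0 to 11 here. Spotify's audio-features API uses -1 when no key is detected, so the lookup returns `NA` for anything outside 0 to 11.

```r
# Illustrative lookup: pitch classes 0-11 in standard notation
pitch_names <- c("C", "C#/Db", "D", "D#/Eb", "E", "F",
                 "F#/Gb", "G", "G#/Ab", "A", "A#/Bb", "B")

# match() maps 0 -> 1, ..., 11 -> 12 and gives NA for anything else (e.g. -1)
key_to_name <- function(key) pitch_names[match(key, 0:11)]

key_to_name(c(0, 8, 11, -1))
```

With the real data, `key_to_name(Spotify$key)` would give one note name per row.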
# Standard deviation of speechiness
std_spotify<- sd(Spotify$speechiness)
print(std_spotify)
## [1] 0.07851632
# Variance of speechiness
var_spotify <- var(Spotify$speechiness)
print(var_spotify)
## [1] 0.006164813
# Sum of speechiness across all tracks
speech_sum <- sum(Spotify$speechiness)
print(speech_sum)
## [1] 666.6545
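The three statistics above are linked in two ways:

- The standard deviation is the square root of the variance.
- Dividing the sum by the number of rows (8511, from the `Length:8511` entries in the summary) gives the mean.

A quick check using the printed values, which agree up to rounding:

```r
# Values printed above for Spotify$speechiness
var_speech <- 0.006164813
sd_speech  <- 0.07851632
sum_speech <- 666.6545
n_rows     <- 8511

# Standard deviation = square root of the (sample) variance
sqrt(var_speech)

# Mean = sum / number of rows; the summary reports Mean :0.07833
sum_speech / n_rows
```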
Energy vs. Loudness Graph
From this graph we can see that loudness tends to increase with energy: the two are positively correlated, although not strictly proportional.
ggplot(Spotify, aes(energy, loudness)) + geom_point(size=2, color="purple") + labs(title = "energy vs loudness") + theme(axis.title.x=element_text(colour="red"),axis.title.y = element_text(colour="red"))
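Pearson's correlation coefficient can summarise the strength of the relationship in the graph. This sketch uses hypothetical toy values that are not from the dataset. With the real data, call `cor(Spotify$energy, Spotify$loudness)`.

A coefficient near +1 would support the positive relationship in the graph. It says nothing about strict proportionality.

```r
# Hypothetical toy values for illustration only
energy   <- c(0.10, 0.35, 0.50, 0.72, 0.90)
loudness <- c(-20.5, -12.0, -9.3, -6.1, -4.0)

# Pearson correlation (the default method of cor())
r <- cor(energy, loudness)
print(r)

# Spearman's rank correlation only asks whether the relationship is monotonic
rho <- cor(energy, loudness, method = "spearman")
print(rho)
```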
numeric_data <- select(Spotify, where(is.numeric))
summary(numeric_data)
##     msPlayed          danceability        energy            key
##  Min.   :        0   Min.   :0.0000   Min.   :0.00108   Min.   : 0.000
##  1st Qu.:   139977   1st Qu.:0.5070   1st Qu.:0.40700   1st Qu.: 2.000
##  Median :   269850   Median :0.6220   Median :0.59200   Median : 5.000
##  Mean   :  1539795   Mean   :0.6016   Mean   :0.56681   Mean   : 5.243
##  3rd Qu.:  1211910   3rd Qu.:0.7140   3rd Qu.:0.75400   3rd Qu.: 8.000
##  Max.   :158367130   Max.   :0.9760   Max.   :0.99900   Max.   :11.000
##     loudness         speechiness        valence           tempo
##  Min.   :-42.044   Min.   :0.00000   Min.   :0.0000   Min.   :  0.00
##  1st Qu.:-10.016   1st Qu.:0.03610   1st Qu.:0.2380   1st Qu.: 97.18
##  Median : -7.132   Median :0.04790   Median :0.4100   Median :118.94
##  Mean   : -8.580   Mean   :0.07833   Mean   :0.4353   Mean   :119.10
##  3rd Qu.: -5.309   3rd Qu.:0.08190   3rd Qu.:0.6180   3rd Qu.:139.32
##  Max.   :  3.010   Max.   :0.94100   Max.   :0.9860   Max.   :236.20
##   duration_ms
##  Min.   :  10027
##  1st Qu.: 163173
##  Median : 195989
##  Mean   : 203951
##  3rd Qu.: 231378
##  Max.   :1847210
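The following line graph plots msPlayed against genre to show which genres are played most. Because genre is categorical, the points within each genre are joined into a vertical line spanning the range of play times of that genre's tracks. The graph therefore shows per-track spread rather than a per-genre total.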
# linewidth replaces the deprecated size aesthetic for lines (ggplot2 >= 3.4.0)
ggplot(Spotify, aes(genre, msPlayed)) + geom_line(linewidth = 2, color = "green") + geom_point(size = 3, color = "#008080") + labs(title = "genre vs msplayed") + theme(axis.text.x = element_text(angle = 90))
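Each row holds one track's play time. A per-genre total therefore answers "which genre is played most" more directly than the line graph.

Below is a base-R sketch on a hypothetical toy data frame. With the real data, use `Spotify`. A ggplot2 version would pass the summed values to `geom_col()`.

```r
# Hypothetical toy listening history
toy <- data.frame(
  genre    = c("pop", "edm", "pop", "rock", "edm"),
  msPlayed = c(400000, 250000, 350000, 100000, 50000)
)

# Total play time per genre in hours, largest first
genre_hours <- sort(tapply(toy$msPlayed, toy$genre, sum) / 3.6e6, decreasing = TRUE)
print(genre_hours)

# Bar chart of the totals; las = 2 turns the genre labels vertical
barplot(genre_hours, las = 2, ylab = "Hours played", main = "Total play time by genre")
```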
ggplot(Spotify, aes(danceability,energy)) + geom_point(alpha = 0.1,color="#008080") + labs(title = "energy vs danceability")+ theme(axis.title.x=element_text(colour="red"),axis.title.y = element_text(colour="red"))
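# Bars stack every track's energy, so each bar's height is the total energy in that key
ggplot(Spotify, aes(key, energy)) + geom_col(fill = "steelblue") + labs(title = "energy vs key")
# Likewise, each bar here is the summed speechiness of the tracks at one loudness value
ggplot(Spotify, aes(loudness, speechiness)) + geom_col(fill = "blue") + labs(title = "loudness vs speechiness")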
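In the energy-by-key bar chart, each bar stacks the energy values of every track in that key. Its height is therefore a sum, so keys with more tracks get taller bars regardless of how energetic their tracks are. Averaging per key removes that effect.

This sketch uses hypothetical toy values. With the real data, use `Spotify$energy` and `Spotify$key`.

```r
# Hypothetical toy values: key 0 has many moderate tracks, key 5 a few energetic ones
key    <- c(0, 0, 0, 0, 5, 5, 8)
energy <- c(0.4, 0.5, 0.4, 0.5, 0.9, 0.8, 0.6)

# Summed energy per key (what the stacked bar chart shows)
sum_energy <- tapply(energy, key, sum)
print(sum_energy)

# Mean energy per key, which does not grow with the number of tracks
mean_energy <- tapply(energy, key, mean)
print(mean_energy)
```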
pie(table(Spotify$key), main="Pie chart for Key")
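Pie slices are hard to compare by eye. `prop.table()` can print the same shares as percentages. This sketch uses a hypothetical toy `key` vector. With the real data, use `table(Spotify$key)`.

```r
# Hypothetical toy key values
key <- c(0, 1, 1, 5, 5, 5, 7, 7)

# Share of tracks in each key, in percent, rounded to one decimal place
key_share <- round(100 * prop.table(table(key)), 1)
print(key_share)
```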
ggplot(Spotify, aes(x = energy)) + geom_histogram(bins = 30, fill = "steelblue") + theme(axis.title.x = element_text(colour = "orange"), axis.title.y = element_text(colour = "orange")) + labs(title = "distribution of energy")
ggplot(Spotify, aes(danceability, valence)) +
  geom_point(size = 0.5) +
  # Straight-line (least-squares) fit; stating the formula silences the message
  geom_smooth(method = "lm", formula = y ~ x, linetype = "dashed",
              color = "darkred", fill = "blue") +
  theme(axis.title.x = element_text(colour = "green"), axis.title.y = element_text(colour = "red")) +
  labs(title = "danceability vs valence")
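`geom_smooth()` with the `lm` method draws an ordinary least-squares line. Fitting the same model with `lm()` gives its slope and intercept as numbers.

This sketch uses hypothetical toy values chosen to lie exactly on the line valence = 0.2 + 0.5 × danceability. With the real data, use `lm(valence ~ danceability, data = Spotify)`.

```r
# Hypothetical toy values on the line valence = 0.2 + 0.5 * danceability
toy <- data.frame(danceability = c(0.2, 0.4, 0.6, 0.8))
toy$valence <- 0.2 + 0.5 * toy$danceability

# Same model geom_smooth() fits: valence explained linearly by danceability
fit <- lm(valence ~ danceability, data = toy)
coef(fit)
```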
# valence is continuous, so bin it (width 0.1) to get one box per bin instead of a single box
ggplot(Spotify, aes(valence, loudness, group = cut_width(valence, 0.1))) +
  geom_boxplot() + labs(title = "valence vs loudness") +
  theme(axis.title.x = element_text(colour = "blue"), axis.title.y = element_text(colour = "blue"))
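A boxplot needs a discrete grouping on the x axis. Outside ggplot2, `cut()` can do the same binning, and each bin can then be summarised.

This sketch uses hypothetical toy values and four equal-width valence bins. The bin edges are an arbitrary choice. With the real data, use `Spotify$valence` and `Spotify$loudness`.

```r
# Hypothetical toy values
valence  <- c(0.05, 0.20, 0.30, 0.45, 0.60, 0.70, 0.90, 0.95)
loudness <- c(-14, -12, -10, -9, -7, -6, -5, -4)

# Four equal-width bins over [0, 1]; include.lowest keeps a valence of exactly 0
valence_bin <- cut(valence, breaks = seq(0, 1, by = 0.25), include.lowest = TRUE)

# Median loudness per bin: the centre line of each box in a binned boxplot
median_by_bin <- tapply(loudness, valence_bin, median)
print(median_by_bin)
```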