The culture of music

Music and culture go hand in hand. Using data from Spotify’s API, I attempt to break down the culture of a country through the music its people listen to. I am working with a panel data set that I constructed, covering 61 countries over 90 weeks in 2017 and 2018.

Loading & Cleaning the data:

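library(dplyr)  # provides %>%, filter(), mutate(), group_by(), summarise()

data=read.csv("/Users/gregmaghakian/Documents/Econ 392W/Code/SpotifyDataFiles/tracklist_with_audio_features.csv")
data$Country=as.character(data$Country)
data$Date=as.Date(data$Date)
# Drop the "Global" rows so that only individual countries remain
data=filter(data,Country!="Global")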
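Since the analysis treats the data as a country-by-week panel, it can help to confirm that structure right after loading. The following is a minimal sketch in base R on a made-up stand-in (`toy`), since the real CSV is not available here; the country names, start date, and values are assumptions for illustration, and only the column names `Country`, `Date`, and `valence` come from the code above.

```r
# Hypothetical stand-in for the real file: 3 countries x 4 weeks x 2 tracks.
countries <- c("Argentina", "Sweden", "Japan")
weeks <- seq(as.Date("2017-01-06"), by = "week", length.out = 4)
toy <- data.frame(
  Country = rep(countries, each = length(weeks) * 2),
  Date    = rep(rep(weeks, each = 2), times = length(countries)),
  stringsAsFactors = FALSE
)
set.seed(1)
toy$valence <- runif(nrow(toy))  # made-up valence values in [0, 1]

# Number of distinct weeks observed for each country.
weeks_per_country <- sapply(split(toy$Date, toy$Country),
                            function(d) length(unique(d)))

# The panel is balanced if every country has the same number of weeks.
n_countries <- length(weeks_per_country)
is_balanced <- length(unique(weeks_per_country)) == 1

# Valence should lie in [0, 1]; this flags anything outside that range.
valence_in_range <- all(toy$valence >= 0 & toy$valence <= 1)
```

On the real data, the same three checks would be expected to report 61 countries and 90 weeks per country if the panel is complete.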

Valence

To understand the culture and mood of a country, we will use Spotify’s measure of valence. Valence, or the emotional positivity of a song, is a continuous feature on [0, 1], where 0 is the saddest a song can be and 1 is the happiest.

By using valence as a proxy for national mood and culture, I hope to see clear patterns in valence across countries and regions, and from those patterns to draw conclusions about particular countries and cultures.

vdata=data%>%
  select(Country,valence,Date)%>%
  group_by(Country)%>%
  mutate(avgValence=mean(valence))%>%
  select(Country,avgValence)%>%
  unique()

ggplot(data=vdata,aes(x=reorder(Country,avgValence),y=avgValence,fill=avgValence))+
  geom_bar(stat = "identity")+
  scale_fill_viridis(name = "Average Valence", option = "C")+
  coord_flip()+
  labs(title = 'Average Valence',subtitle='By country',x="Country",y="Average Valence")+
  geom_text(aes(label=round(avgValence,digits=3)), color="black", size=2.5,hjust=-.3)+
  theme_minimal()

vdata2=data%>%
  select(Country,valence,Date)%>%
  group_by(Country,Date)%>%
  mutate(avgValence=mean(valence))%>%
  select(Country,avgValence)%>%
  unique()

ggplot(vdata2, aes(x = avgValence, y = Country, fill = after_stat(x))) +
  geom_density_ridges_gradient(scale = 3, rel_min_height = 0.05, gradient_lwd = 1.) +
  scale_x_continuous(expand = c(0.01, 0)) +
  scale_y_discrete(expand = c(0.01, 0)) +
  scale_fill_viridis(name = "Average Valence", option = "D") +
  labs(
    title = 'Average Valence',
    subtitle = 'Average Valence distribution by country\nOver 90 Weeks'
  ) +
  theme_ridges(font_size = 11, grid = TRUE)+xlab("Average Valence")
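The two plots above rest on two different levels of aggregation: one average per country for the bar chart, and one average per country and week for the ridge plot. A small sketch on made-up numbers may make the difference concrete. It uses base R's `aggregate()` rather than the dplyr pipeline above, and the `toy` data frame and its values are purely illustrative.

```r
# Made-up data: 2 countries, 2 weeks, 2 tracks per country-week.
toy <- data.frame(
  Country = rep(c("Brazil", "Norway"), each = 4),
  Date    = rep(as.Date(c("2017-01-06", "2017-01-06",
                          "2017-01-13", "2017-01-13")), times = 2),
  valence = c(0.8, 0.6, 0.7, 0.9, 0.3, 0.5, 0.4, 0.2)
)

# One row per country: the quantity drawn in the bar chart.
by_country <- aggregate(valence ~ Country, data = toy, FUN = mean)

# One row per country-week: the values whose spread the ridge plot shows.
by_country_week <- aggregate(valence ~ Country + Date, data = toy, FUN = mean)
```

Here `by_country` has two rows and `by_country_week` has four, so the ridge plot for each country is a density over its weekly averages rather than over individual tracks.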

Patterns Emerging…

A pattern starts to emerge. The two visualizations suggest a clear cultural grouping: Latin American countries appear to have higher average valence than the other countries.

Let’s dig deeper.

latin_america=c("Uruguay","Peru","Argentina","Bolivia","Brazil","Chile","Colombia","Costa Rica","Dominican Republic","Ecuador","El Salvador","Guatemala","Honduras","Mexico","Nicaragua","Panama","Paraguay")

data=data%>%
  mutate(region=ifelse(Country %in% latin_america,"Latin America","Not Latin America"))


vdata4=data%>%
  select(Country,valence,Date,region)%>%
  group_by(Country)%>%
  mutate(avgValence=mean(valence))%>%
  select(Country,avgValence,region)%>%
  unique()


ggplot(vdata4, aes(region, avgValence, label = Country)) +
geom_label_repel(segment.alpha = .3) +
  geom_point(color = 'blue') +
  theme_minimal()+labs(title = "Average Valence By Region",subtitle = "By Country",x="Region",y="Average Valence")

vdata3=data%>%
  select(region,Country,Date,valence)%>%
  group_by(region,Date)%>%
  summarise(avgValence=mean(valence))
  

a=ggplot(vdata3, aes(x=region,y=avgValence, fill = avgValence)) +
  geom_col(alpha = 0.7) +
  scale_size(range = c(2, 12)) + guides(fill=guide_legend(title="Average Valence"))+
  facet_wrap(~region) +
  labs(title = 'Average Valence Over Time',subtitle='Date: {frame_time}', x = 'Region', y = 'Average Valence') +
  transition_time(Date) +theme_minimal()

animate(a,fps=2)

It is clear that average valence tends to be higher across the two years in Latin American and Spanish-speaking countries. This cultural grouping makes sense to me, as Latin American music and culture are usually thought of as upbeat, happy, and inviting. It is also interesting that the high-valence Latin American countries are geographically close to one another, which leads me to believe that there is cultural diffusion and influence across borders.

Through the lens of a regression analysis

ggplot(vdata3, aes(x = Date, y = avgValence, fill = region)) +
  geom_point(colour = "blue") +
  facet_wrap(~region) +
  stat_smooth(method = "lm", col = "red") +
  labs(title = "Regression of Average Valence on Date", subtitle = "By Region", x = "Date", y = "Average Valence") +
  theme_bw()

When average valence is regressed on date, the Latin American group sits at a clearly higher level of valence than the non-Latin American group throughout the period.
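The red lines in the faceted plot correspond to one linear fit per region. As a rough sketch of what those fits contain, the code below fits the same kind of model by hand on made-up weekly averages. The `toy` data, the start date, and the noise levels are assumptions for illustration and are not the real series.

```r
# Made-up weekly averages for two regions (not the real data).
set.seed(42)
dates <- seq(as.Date("2017-01-06"), by = "week", length.out = 20)
toy <- data.frame(
  region     = rep(c("Latin America", "Not Latin America"), each = 20),
  Date       = rep(dates, times = 2),
  avgValence = c(0.60 + rnorm(20, sd = 0.01), 0.50 + rnorm(20, sd = 0.01))
)

# Fit average valence on date separately for each region.
# as.numeric(Date) counts days, so the slope is valence units per day.
fits <- lapply(split(toy, toy$region),
               function(d) lm(avgValence ~ as.numeric(Date), data = d))

# 2 x 2 matrix: intercept and slope (rows) for each region (columns).
coefs <- sapply(fits, coef)

# Mean level per region, which is what separates the two panels vertically.
levels_by_region <- tapply(toy$avgValence, toy$region, mean)
```

The slopes describe how valence drifts over time within a region, while the gap between regions shows up in the overall levels rather than in the slopes.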

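library(Zelig)

vdata4=as.data.frame(vdata4)
vdata4$region=as.factor(vdata4$region)
# Least-squares regression of country-level average valence on region
reg1=zelig(avgValence~region,data = vdata4,model = "ls",cite=FALSE)
# Baseline scenario and comparison scenario for the first difference
reg1$setx(region="Not Latin America")
reg1$setx1(region="Latin America")
# Simulate the quantities of interest and plot them
reg1$sim()
reg1$graph()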

First Difference

Similarly, the first difference between the two regions shows that the expected average valence for Latin American countries is about 0.085 higher than for non-Latin American countries. The 95% confidence interval lies entirely above 0, so the difference is statistically distinguishable from zero at the 95% level. This supports the finding that the Latin American group has happier songs on average.
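For intuition, with a single two-level regressor the least-squares first difference is simply the difference between the two group means. The sketch below shows this with base R's `lm()` and an analytic confidence interval from `confint()`. This is a simplified counterpart and not the simulation-based procedure Zelig runs, and the `toy` values are made up.

```r
# Made-up country-level averages (not the real data).
toy <- data.frame(
  region = factor(rep(c("Not Latin America", "Latin America"), times = c(6, 4)),
                  levels = c("Not Latin America", "Latin America")),
  avgValence = c(0.45, 0.50, 0.48, 0.52, 0.47, 0.49,
                 0.58, 0.60, 0.57, 0.61)
)

fit <- lm(avgValence ~ region, data = toy)

# With one binary regressor, the slope equals the difference in group means.
first_diff  <- unname(coef(fit)[2])
group_means <- tapply(toy$avgValence, toy$region, mean)
diff_in_means <- unname(group_means["Latin America"] -
                        group_means["Not Latin America"])

# Analytic 95% confidence interval for that difference.
ci <- confint(fit, level = 0.95)[2, ]
```

If the lower bound of `ci` is above zero, the difference is distinguishable from zero at the 95% level, which is the same reading applied to the simulated interval above.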

Thoughts so far

Country does matter with regard to song emotion. Culturally speaking, there seems to be something shared across Latin American countries that produces happier songs. It could be a culture of dance and celebration of life, which leads to more upbeat and joyful songs. Maybe there is a linguistic component that inherently makes a song sound happier in Spanish. However, it could also be that these countries do not perform as well economically as the Nordic or other European countries, and that happy songs and moods compensate for economic volatility. Either way, there appears to be a cultural factor and grouping. What it truly is remains to be determined.

Danceability

Let’s try to extract more cultural information from the data by looking at danceability. “Danceability describes how suitable a track is for dancing based on a combination of musical elements including tempo, rhythm stability, beat strength, and overall regularity. A value of 0.0 is least danceable and 1.0 is most danceable.” I suspect the Latin American countries will have more danceable music (based on their culture), which would support the theory that higher happiness and positivity are conveyed through the music.

vdata5=data%>%
  select(Country,valence,Date,danceability,region)%>%
  group_by(Country,Date)%>%
  mutate(avgValence=mean(valence),avgDance=mean(danceability))%>%
  select(Country,avgValence,avgDance,region,Date)%>%
  unique()


p1=ggplot(vdata5,aes(x=avgDance,y=region,fill=region))+
  geom_density_ridges(rel_min_height = 0.01)+
  stat_density_ridges(quantile_lines = TRUE, quantiles = 2)+
  guides(fill=guide_legend(title="Region"))+
  labs(title="Average Danceability Distribution",subtitle = "By region",x="Average Danceability",y="Region")

p2=ggplot(vdata5,aes(x=Date,y=avgDance,color=region))+
  geom_point(alpha=.5)+
  facet_wrap(~region)+
  stat_smooth(method = "lm", col = "black")+
  guides(color=guide_legend(title="Region"))+
  labs(title="Average Danceability Over Time",subtitle = "By region",x="Date",y="Average Danceability")+
  theme(axis.text.x = element_text(angle=70, vjust=0.5))

plot_grid(p1,NULL,NULL,p2,nrow=2,ncol=2,labels=c("Fig. A","","","Fig. B"))

p=ggplot(data=vdata5,aes(x=avgDance,y=avgValence,color=region))+geom_point()+facet_wrap(~region)+  guides(color=guide_legend(title="Region"))+
  labs(title = 'Danceability vs. Average Valence',subtitle='Date: {frame_time}', x = 'Average Danceability', y = 'Average Valence') +
  transition_time(Date)+theme_ridges()

animate(p,fps=2)

cor(vdata5$avgDance,vdata5$avgValence)

## [1] 0.5178965

Danceability & Valence

As we can see, danceability also tends to be higher in the higher-valence Latin American countries. Overall, the correlation between average valence and average danceability is positive, at about 0.52. The relationship between valence and danceability also stays steady over time.
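A single pooled correlation can be supplemented with a confidence interval and with the same statistic computed inside each region, which indicates whether the relationship also holds within groups. The sketch below uses base R's `cor.test()` and `cor()` on made-up country-week averages. The `toy` data and its parameters are assumptions for illustration only.

```r
# Made-up country-week averages (not the real data).
set.seed(7)
n <- 50
region     <- rep(c("Latin America", "Not Latin America"), each = n)
avgDance   <- c(rnorm(n, mean = 0.72, sd = 0.03),
                rnorm(n, mean = 0.65, sd = 0.03))
avgValence <- 0.1 + 0.7 * avgDance + rnorm(2 * n, sd = 0.02)
toy <- data.frame(region, avgDance, avgValence)

# Pooled Pearson correlation with a 95% confidence interval.
pooled <- cor.test(toy$avgDance, toy$avgValence)

# The same correlation computed separately within each region.
within <- sapply(split(toy, toy$region),
                 function(d) cor(d$avgDance, d$avgValence))
```

`pooled$estimate` and `pooled$conf.int` give the overall correlation and its interval, and `within` holds one correlation per region.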

Countries that have dance embedded in their culture and way of life, like the Latin American ones, express this culture through their music and, in turn, through the positivity of that music.

Music and Culture

Greg Maghakian

11/17/2018