Data Visualization Analysis with Plotly

# necessary libraries
library(tidyverse)
library(plotly)

Spotify Analysis

# Spotify top music data (songs from Billboard's top songs of each year)
spotify <- read.csv("spotify_top_music.csv")
head(spotify)

# plotly scatterplot of spotify songs and their danceability/popularity
spotify %>%
  plot_ly(x = ~dnce, y = ~pop, color = ~year,
          hoverinfo = "text",
          text = ~paste("Song Title:", title, "<br>",
                        "Artist:", artist, "<br>",
                        "Popularity:", pop, "<br>",
                        "Danceability:", dnce)) %>%
  add_markers() %>%
  layout(title = "Danceability vs Popularity for Top Spotify Songs",
         xaxis = list(title = "Danceability (0-100 scale)"),
         yaxis = list(title = "Popularity (0-100 scale)"))

# a closer look at the majority of the data (popularity scores of 30 or higher)
spotify %>%
  filter(pop >= 30) %>%
  plot_ly(x = ~dnce, y = ~pop, color = ~year,
          hoverinfo = "text",
          text = ~paste("Song Title:", title, "<br>",
                        "Artist:", artist, "<br>",
                        "Popularity:", pop, "<br>",
                        "Danceability:", dnce)) %>%
  add_markers() %>%
  layout(title = "Danceability vs Popularity for Top Spotify Songs",
         xaxis = list(title = "Danceability (0-100 scale)"),
         yaxis = list(title = "Popularity (0-100 scale)"))

The scatterplots above show songs on Spotify that appeared in the Billboard top songs of the year at some point between 2010 and 2019. Each song's popularity and danceability (how easy it is to dance to) were recorded on scales from 0 to 100. Hovering over a point reveals the song, artist, and both scores for that observation. Most of the data is clustered around the central scores (50-70), which makes individual points in that region hard to distinguish, so in the second plot I filtered out observations with popularity scores below 30 to take a closer look at the majority of the data.

The scatterplots suggest a weak, positive relationship between the two variables: on average in this dataset, songs with higher danceability tend to be more popular among listeners. Taking release year into account, more recent songs also tend to have higher danceability scores and greater popularity than songs from before 2012, although some older songs are still popular and fairly danceable.

Examining the songs individually through the interactive hover text revealed a few interesting insights. The songs with the lowest popularity scores are ones I did not recognize. Some of the most popular songs, such as “Memories” and “Senorita”, are familiar to me because they are newer and frequently played on the radio. Some highly popular songs are not easy to dance to, such as slow songs like “Lose You to Love Me” and “Someone You Loved”. Music appeals to listeners for many different reasons, and many people do not listen to songs solely to dance to them.
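To put a number on the weak, positive relationship described above, a correlation coefficient and per-year averages can be computed directly from the data. The sketch below is a hypothetical helper (`describe_dance_pop` is not part of the original analysis) written in base R; it assumes the `dnce`, `pop`, and `year` columns of spotify_top_music.csv are numeric. The `min_pop` argument mirrors the popularity filter used in the second plot.

```r
# Hypothetical helper (not from the original analysis): summarise the
# danceability/popularity relationship. Assumes numeric columns dnce, pop, year.
describe_dance_pop <- function(df, min_pop = 0) {
  # keep rows at or above the popularity cutoff (min_pop = 30 mirrors the second plot)
  df <- df[which(df$pop >= min_pop), ]

  # Pearson correlation: the sign gives the direction, the magnitude the strength
  r <- cor(df$dnce, df$pop, use = "complete.obs")

  # mean danceability and popularity for each release year
  by_year <- aggregate(cbind(dnce, pop) ~ year, data = df, FUN = mean)

  list(correlation = r, by_year = by_year)
}
```

For example, `describe_dance_pop(spotify)` and `describe_dance_pop(spotify, min_pop = 30)` compare the full and filtered data; a small positive correlation would be consistent with the weak, positive trend seen in the plots, and `by_year` shows whether more recent years have higher averages.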

Olympic Swimming Analysis

# olympic swimmer data
olympics_swim <- read.csv("olympic_swimming.csv")
head(olympics_swim)

# few observations for breaststroke and for years before 1970
table(olympics_swim$Stroke)

## 
##   Backstroke Breaststroke    Butterfly    Freestyle 
##          133           29          173          271

table(olympics_swim$Year)

## 
## 1924 1928 1932 1936 1948 1952 1956 1960 1964 1968 1972 1976 1980 1984 1988 1992 
##    1    2    6    7    7    8    8    8   10   17   29   32   32   33   36   38 
## 1996 2000 2004 2008 2012 2016 2020 
##   35   40   40   52   54   55   56
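The two one-way tables above can be combined into a single cross-tabulation that shows how many observations each animation frame (year) would contain for each stroke. This is a minimal sketch; `count_by_year_stroke` is a hypothetical helper, and it assumes the `Gender` column uses the value "Men" as in the `filter()` call for the animated plot.

```r
# Hypothetical helper: count observations per Olympic year and stroke for one gender.
# Assumes columns Year, Stroke, and Gender (with values such as "Men").
count_by_year_stroke <- function(df, gender = "Men") {
  # which() drops rows where Gender is NA
  sub <- df[which(df$Gender == gender), ]
  # rows are years (animation frames), columns are strokes
  table(Year = sub$Year, Stroke = sub$Stroke)
}
```

Calling `count_by_year_stroke(olympics_swim)` highlights year/stroke cells with few or zero swimmers, which would appear as sparse frames in the animation.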

# animated plotly scatterplot of men's race times over the years
anim <- olympics_swim %>%
  filter(Year >= 1980 & Gender == "Men" & Stroke != "Breaststroke") %>%
  plot_ly(x = ~Rank, y = ~Results, 
          hoverinfo = "text",
          text = ~paste("Athlete:", Athlete, "<br>",
                        "Country:", Team, "<br>",
                        "Year:", Year, "<br>",
                        "Time:", Results)) %>%
  add_markers(frame = ~Year, color = ~Stroke) %>%
  layout(title = "Men's Olympic Swimming Performance (1980-2020)",
         xaxis = list(title = "Placement Rank (1-4)", tickvals = 1:4),
         yaxis = list(title = "Race Time (Seconds)", tickvals = 46:60))

# display the current year above the slider in large text (prefix must be a string)
anim %>%
  animation_slider(currentvalue = list(prefix = "", font = list(color = "steelblue", size = 30)))

Each frame of the animation is a scatterplot for a different Olympic year, and each point shows a swimmer's race time and final placement. Three strokes are plotted: freestyle, butterfly, and backstroke. Within every Olympic year shown, freestyle swimmers post faster times than butterfly swimmers, and both are faster than backstroke swimmers. From 1980 to 2020, freestyle times range from 47.02 to 52.22 seconds, butterfly times from 49.45 to 55.7 seconds, and backstroke times from 51.97 to 58.38 seconds. Overall, times tend to get faster with each Olympics, with a few exceptions.

When analyzing these plots, I focused on comparing the USA team to the other Olympic teams. The USA men won first place in backstroke in 7 of the 11 Olympics plotted, including every Games from 1996 through 2016; that streak ended in 2020, when Russian Olympian Evgeny Rylov took first place. In the 2012 Games, the USA dominated, winning first place in freestyle, butterfly, and backstroke. Michael Phelps won first place in butterfly at every Olympics from 2004 to 2012. Sweden appears to have been a competitive and successful team at the 1980 Olympics, and other teams have stood out since then. The United States, Russia, and Australia appear most often throughout these plots, suggesting they have been the most successful teams across these 11 Olympic years.

This plot provides many insights into men's Olympic swimming, and further analysis could show how races and times have changed over the years. I attempted to add lines connecting the points within each stroke, but they made the plot more confusing to read, so I left them out of the final version.
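The time ranges and first-place counts quoted above could also be computed from the data rather than read off the animation frame by frame. A minimal base R sketch follows; `summarise_mens_swims` is a hypothetical helper, and it assumes `Results` holds race times as numeric seconds, `Rank` is numeric, and the `Team` column uses the code "USA" for the United States (the actual values can be checked with `unique(olympics_swim$Team)`).

```r
# Hypothetical helper: per-stroke time ranges and first-place counts for one team.
# Assumes columns Year, Gender, Stroke, Rank, Results (numeric seconds), and Team.
summarise_mens_swims <- function(df, team = "USA", from_year = 1980) {
  # same subset as the animated plot: men, from_year onward, no breaststroke
  keep <- which(df$Year >= from_year & df$Gender == "Men" &
                  df$Stroke != "Breaststroke")
  men <- df[keep, ]
  stroke <- as.character(men$Stroke)

  # fastest and slowest recorded time for each stroke
  fastest <- sapply(split(men$Results, stroke), min)
  slowest <- sapply(split(men$Results, stroke), max)
  ranges <- data.frame(Stroke = names(fastest),
                       fastest = unname(fastest),
                       slowest = unname(slowest),
                       stringsAsFactors = FALSE)

  # number of first-place finishes by `team` in each stroke
  winners <- men[which(men$Rank == 1), ]
  wins <- sapply(split(winners$Team == team, as.character(winners$Stroke)), sum)

  list(ranges = ranges, wins = wins)
}
```

With the real data, `summarise_mens_swims(olympics_swim)$wins` would give the USA's first-place count per stroke, which can be compared against the 7 backstroke wins counted from the animation.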

Allison Buck