Project 1 2nd time

Author

Aaron Thomas

Olympic Rowing Medals Analysis (1896-2022)

The image won’t load

pngtree.com

In this analysis I aimed to explore the strength of different nations in Olympic rowing by visualizing the distribution of medals and points. The dataset is sourced from the SCORE Sports Data Repository.

Variables Description

Event: The specific rowing event.

NOC: National Olympic Committee or the nation competing.

Number of Athletes: The number of athletes in the boat for that event.

total_medals: The total number of medals for that country in that event, adjusted by the number of athletes.

total_points: The total number of points for that country in that event (gold = 3 points, silver = 2 points, bronze = 1 point), adjusted by the number of athletes.

Loading Libraries and Dataset

# Loading libraries
library(readr)
library(dplyr)

Attaching package: 'dplyr'
The following objects are masked from 'package:stats':

    filter, lag
The following objects are masked from 'package:base':

    intersect, setdiff, setequal, union
library(highcharter)
Warning: package 'highcharter' was built under R version 4.3.3
Registered S3 method overwritten by 'quantmod':
  method            from
  as.zoo.data.frame zoo 
library(shiny)
Warning: package 'shiny' was built under R version 4.3.3
# The dataset
rowing_data <- read_csv("rowing_medals.csv")
Rows: 733 Columns: 5
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (2): Event, NOC
dbl (3): Number of Athletes, total_medals, total_points

ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
# Data Cleaning
rowing_data$Event <- as.factor(rowing_data$Event)
rowing_data$NOC <- as.factor(rowing_data$NOC)

# Display the cleaned data structure
str(rowing_data)
spc_tbl_ [733 × 5] (S3: spec_tbl_df/tbl_df/tbl/data.frame)
 $ Event             : Factor w/ 25 levels "Rowing Men's 17-Man Naval Rowing Boats",..: 1 1 2 2 3 3 3 3 3 3 ...
 $ NOC               : Factor w/ 101 levels "ALG","ANG","ANZ",..: 36 47 36 47 3 4 5 6 9 13 ...
 $ Number of Athletes: num [1:733] 16 16 6 6 9 9 9 9 9 9 ...
 $ total_medals      : num [1:733] 3.44 11.44 9.17 30.5 0 ...
 $ total_points      : num [1:733] 7.25 22 19.33 58.67 0 ...
 - attr(*, "spec")=
  .. cols(
  ..   Event = col_character(),
  ..   NOC = col_character(),
  ..   `Number of Athletes` = col_double(),
  ..   total_medals = col_double(),
  ..   total_points = col_double()
  .. )
 - attr(*, "problems")=<externalptr> 

Visualization 1: Boxplot of Total Medals by Event (This one didn’t work out the way I wanted it to, I’m just bad at boxplots)

# Boxplot of Total Medals by Event 

hc_boxplot <- hchart(rowing_data, type = "boxplot", hcaes(x = Event, y = total_medals, group = Event)) %>%
  hc_title(text = "Distribution of Total Medals by Event") %>%
  hc_xAxis(title = list(text = "Event")) %>%
  hc_yAxis(title = list(text = "Total Medals")) %>%
  hc_colors(colors = rainbow(length(unique(rowing_data$Event))))

hc_boxplot

Visualization 2: Scatter Plot of Total Points vs. Total Medals

# Scatter plot of Total Points vs. Total Medals using highcharter

hc_scatter <- hchart(rowing_data, type = "scatter", hcaes(x = total_medals, y = total_points, group = NOC, color = NOC)) %>%
  hc_title(text = "Scatter Plot of Total Points vs. Total Medals") %>%
  hc_xAxis(title = list(text = "Total Medals")) %>%
  hc_yAxis(title = list(text = "Total Points")) %>%
  hc_colors(colors = rainbow(length(unique(rowing_data$NOC))))

hc_scatter

Interactive version of the plot above (It opens to another window. Also had chat gbt help with this thought this was interesting and wanted to try something new)

ui <- fluidPage(
  titlePanel("Interactive Olympic Rowing Medals Visualization"),
  sidebarLayout(
    sidebarPanel(
      selectInput("event", "Select Event:", choices = unique(rowing_data$Event)),
      selectInput("metric", "Select Metric:", choices = c("Total Medals" = "total_medals", "Total Points" = "total_points"))
    ),
    mainPanel(
      highchartOutput("interactivePlot")
    )
  )
)

server <- function(input, output) {
  output$interactivePlot <- renderHighchart({
    data_filtered <- rowing_data %>% filter(Event == input$event)
    
    if (input$metric == "total_medals") {
      hchart(data_filtered, type = "boxplot", hcaes(x = NOC, y = total_medals, group = NOC)) %>%
        hc_title(text = paste("Distribution of", input$metric, "by Country for", input$event)) %>%
        hc_xAxis(title = list(text = "Country")) %>%
        hc_yAxis(title = list(text = "Total Medals")) %>%
        hc_colors(colors = rainbow(length(unique(data_filtered$NOC))))
    } else {
      hchart(data_filtered, type = "scatter", hcaes(x = total_medals, y = total_points, group = NOC, color = NOC)) %>%
        hc_title(text = paste("Scatter Plot of Total Points vs. Total Medals for", input$event)) %>%
        hc_xAxis(title = list(text = "Total Medals")) %>%
        hc_yAxis(title = list(text = "Total Points")) %>%
        hc_colors(colors = rainbow(length(unique(data_filtered$NOC))))
    }
  })
}

shinyApp(ui = ui, server = server)

Shiny applications not supported in static R Markdown documents

Overall this dataset went really well. With the help of chat gbt cleaning the dataset was really easy mainly because it was relatively clean, with no missing values. The visualizations provided showed the power that certain nations had in Olympic rowing. For instance, countries like the USA and Germany consistently rank high in total medals and points across various events. The use of multiple colors helped distinguish between different events, making the plots more informative and visually appealing. I was trying new things out for this project so I had a little help from chat gbt. One challenge was ensuring the visualizations were both informative and aesthetically pleasing, which is the reason why I choose to use highcharter.