Project2

Author

NCowan

2022 Beijing Olympics Women’s Ice Skating Map

A. The Winter Olympics were last held in Beijing in 2022. The games were very exciting and my personal favorite sport in the winter Olympics is figure skating. So, I have taken this project as an opportunity to make a map showing where all the female skaters who competed are from and their scores. I thought this would be an interesting project, not only because I love figure skating and this is something I am genuinely interested in, but also because you may be able to see some patterns as to where the skaters who do well are from. To make this happen, I used a chat from Wikipedia that was a summary of several tables from ISU (International Skating Union). This table showed the skaters name, country they represented, their overall rank, and their rank and score for their short program and freestyle performances. I then merged it with a data set from Kaggle that has the coordinates of every country so I could plot where these skaters were from on a map.

First, I loaded the libraries needed for this project. I used tidyverse, rvest for web scraping, and leaflet for mapping.

library (plotly)
Loading required package: ggplot2

Attaching package: 'plotly'
The following object is masked from 'package:ggplot2':

    last_plot
The following object is masked from 'package:stats':

    filter
The following object is masked from 'package:graphics':

    layout
library(rvest)
library(leaflet)
library(tidyverse)
── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr     1.1.4     ✔ readr     2.1.5
✔ forcats   1.0.0     ✔ stringr   1.5.1
✔ lubridate 1.9.4     ✔ tibble    3.3.0
✔ purrr     1.0.4     ✔ tidyr     1.3.1
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter()         masks plotly::filter(), stats::filter()
✖ readr::guess_encoding() masks rvest::guess_encoding()
✖ dplyr::lag()            masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors

I web scraped for my main table from Wikipedia.

#Getting my website
wikipedia_url <- "https://en.wikipedia.org/wiki/Figure_skating_at_the_2022_Winter_Olympics_%E2%80%93_Women%27s_singles#cite_ref-isu-progress-W-TO_48-0"
wiki_page <- read_html(wikipedia_url)

#Getting the table I need into my datasets on R Studio
wiki_tables <- wiki_page %>% 
  html_nodes("table.wikitable") %>% 
  html_table()
wikipedia_data <- wiki_tables[[5]]

#I had to rename my header but the normal cleaning was not working so I just renamed my headers
names(wikipedia_data) <- c("rank_old", "name", "country", "total_score", "sp_rank", "sp_score", "fs_rank", "fs_score")

#The rank for 1st, 2nd, and 3rd on the Wikipedia sight were pictures of medals so I had to fix that
wikipedia_data <- wikipedia_data %>%
  mutate(rank = 1:n())
wikipedia_data <- wikipedia_data %>%
  select("name", "country", "total_score", "sp_rank", "sp_score", "fs_rank", "fs_score", "rank")
#I also needed to change ROC to Russia so that I could easily merge and map later and then UK was losted as Great Britain so I had to change that
wikipedia_data <- wikipedia_data %>%
  mutate(country = str_replace(country, "ROC", "Russia")) %>%
  mutate(country = str_replace(country, "Great Britain", "United Kingdom"))

I loaded my csv with the coordinates and merged them.

#Here I loaded my csv and merged my data
countries_data <- read_csv("world_country_and_usa_states_latitude_and_longitude_values.csv")
Rows: 245 Columns: 8
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (4): country_code, country, usa_state_code, usa_state
dbl (4): latitude, longitude, usa_state_latitude, usa_state_longitude

ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
final_data <- left_join(wikipedia_data, countries_data, by = "country")

#I had a lot of extra column from the csv that I deleted
final_data <- final_data %>%
  select("name", "country", "total_score", "sp_rank", "sp_score", "fs_rank", "fs_score", "rank", "latitude", "longitude")

#I replace dashes and N/A with "0"
final_data <- final_data %>%
  mutate(across(everything(), ~ str_replace(., "N/A", "0")))

#I made sure everything was numeric 
final_data <- final_data %>%
  mutate(across(c(sp_rank, sp_score, fs_rank, fs_score, total_score, rank, latitude, longitude), as.numeric))
Warning: There were 4 warnings in `mutate()`.
The first warning was:
ℹ In argument: `across(...)`.
Caused by warning:
! NAs introduced by coercion
ℹ Run `dplyr::last_dplyr_warnings()` to see the 3 remaining warnings.
head(final_data)
# A tibble: 6 × 10
  name      country total_score sp_rank sp_score fs_rank fs_score  rank latitude
  <chr>     <chr>         <dbl>   <dbl>    <dbl>   <dbl>    <dbl> <dbl>    <dbl>
1 Anna Shc… Russia         256.       1     80.2       2     176.     1     61.5
2 Alexandr… Russia         252.       3     74.6       1     177.     2     61.5
3 Kaori Sa… Japan          233.       2     79.8       3     153.     3     36.2
4 Wakaba H… Japan          214.       4     73.5       5     141.     4     36.2
5 You Young South …        213.       5     70.3       4     143.     5     35.9
6 Alysa Liu United…        209.       7     69.5       6     139.     6     37.1
# ℹ 1 more variable: longitude <dbl>

Plot

#I deleted the last few rows because those contain skaters that didnt ove onto the freestyle program
plot_data <- final_data %>%
  slice(1:(n()-6))

#making my plot
my_plot <- ggplot(plot_data, aes(
  x = sp_score,       
  y = fs_score,                 
  color = rank,              
  text = paste("Skater Name:", name, "\n",
               "Overall Rank:", rank, "\n",
               "Short Program Score:", sp_score, "\n",
               "Freestyle Score:", fs_score)
)) +
  geom_point(size = 3) +
  scale_color_gradient(low = "lightblue", high = "mediumpurple") +
  theme_bw() +
  labs(
    title = "Figure Skating: Short Program vs Freestyle Scores",
    x = "Short Program Score", 
    y = "Freestyle Score",
    color = "Overall Rank",
    caption = "Data Source: Wikipedia, ISU"
  )

ggplotly(my_plot, tooltip = "text")

Map

#Here I made my map
skaters_map <- final_data %>%
  leaflet() %>%
  addTiles() %>%
  #because there were some skaters from the same country I had to jitter
  addCircleMarkers(
    lng = ~jitter(longitude, amount = 1.5), 
    lat = ~jitter(latitude, amount = 1.5),   
    popup = ~paste("Skater Name: ", name, "<br>",
                   "Country: ", country, "<br>",
                   "Overall Rank: ", rank, "<br>",
                   "Overall Score: ", total_score, "<br>",
                   "Short Program Score: ", sp_score, "<br>",
                   "Freestyle Score: ", fs_score),
    clusterOptions = markerClusterOptions(),
    #madding colors to different places so its easier to see
    color = ~case_when(
      rank == 1 ~ "gold",
      rank == 2 ~ "gray80",
      rank == 3 ~ "brown", 
      rank <= 10 ~ "green",
      rank <= 24 ~ "purple",
      rank <= 29 ~ "red",
      TRUE ~ "black"
    ),
    radius = 8
  )
skaters_map

B. The visualization represents how each of the skaters that moved onto the free skate did in the short program. Do skaters that do better in the short program automatically do better in the freestyle too or are there variations in how high of a score a skater gets in the two performances. I noticed that a better short program score does not always mean a better freestyle score. While this didn’t exactly surprise me it made me think about how different scores could be if all competitors did both programs. In my map, you can see pretty well where most skaters are. The interesting thing is even though Europe produces the most figure skaters for the Olympics, most of the big winners are from Asia. I think I did everything I wanted to do!

C. I used wikipedia as my source: https://en.wikipedia.org/wiki/Figure_skating_at_the_2022_Winter_Olympics_%E2%80%93_Women%27s_singles I used a data set from kaggle found: https://www.kaggle.com/datasets/paultimothymooney/latitude-and-longitude-for-every-country-and-state?resource=download

And I just used the class notes and messed around with the code until I got it to do what I wanted.