A. The Winter Olympics were last held in Beijing in 2022. The games were very exciting and my personal favorite sport in the winter Olympics is figure skating. So, I have taken this project as an opportunity to make a map showing where all the female skaters who competed are from and their scores. I thought this would be an interesting project, not only because I love figure skating and this is something I am genuinely interested in, but also because you may be able to see some patterns as to where the skaters who do well are from. To make this happen, I used a chat from Wikipedia that was a summary of several tables from ISU (International Skating Union). This table showed the skaters name, country they represented, their overall rank, and their rank and score for their short program and freestyle performances. I then merged it with a data set from Kaggle that has the coordinates of every country so I could plot where these skaters were from on a map.
First, I loaded the libraries needed for this project. I used tidyverse, rvest for web scraping, and leaflet for mapping.
library (plotly)
Loading required package: ggplot2
Attaching package: 'plotly'
The following object is masked from 'package:ggplot2':
last_plot
The following object is masked from 'package:stats':
filter
The following object is masked from 'package:graphics':
layout
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks plotly::filter(), stats::filter()
✖ readr::guess_encoding() masks rvest::guess_encoding()
✖ dplyr::lag() masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
I web scraped for my main table from Wikipedia.
#Getting my websitewikipedia_url <-"https://en.wikipedia.org/wiki/Figure_skating_at_the_2022_Winter_Olympics_%E2%80%93_Women%27s_singles#cite_ref-isu-progress-W-TO_48-0"wiki_page <-read_html(wikipedia_url)#Getting the table I need into my datasets on R Studiowiki_tables <- wiki_page %>%html_nodes("table.wikitable") %>%html_table()wikipedia_data <- wiki_tables[[5]]#I had to rename my header but the normal cleaning was not working so I just renamed my headersnames(wikipedia_data) <-c("rank_old", "name", "country", "total_score", "sp_rank", "sp_score", "fs_rank", "fs_score")#The rank for 1st, 2nd, and 3rd on the Wikipedia sight were pictures of medals so I had to fix thatwikipedia_data <- wikipedia_data %>%mutate(rank =1:n())wikipedia_data <- wikipedia_data %>%select("name", "country", "total_score", "sp_rank", "sp_score", "fs_rank", "fs_score", "rank")#I also needed to change ROC to Russia so that I could easily merge and map later and then UK was losted as Great Britain so I had to change thatwikipedia_data <- wikipedia_data %>%mutate(country =str_replace(country, "ROC", "Russia")) %>%mutate(country =str_replace(country, "Great Britain", "United Kingdom"))
I loaded my csv with the coordinates and merged them.
#Here I loaded my csv and merged my datacountries_data <-read_csv("world_country_and_usa_states_latitude_and_longitude_values.csv")
Rows: 245 Columns: 8
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (4): country_code, country, usa_state_code, usa_state
dbl (4): latitude, longitude, usa_state_latitude, usa_state_longitude
ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
final_data <-left_join(wikipedia_data, countries_data, by ="country")#I had a lot of extra column from the csv that I deletedfinal_data <- final_data %>%select("name", "country", "total_score", "sp_rank", "sp_score", "fs_rank", "fs_score", "rank", "latitude", "longitude")#I replace dashes and N/A with "0"final_data <- final_data %>%mutate(across(everything(), ~str_replace(., "N/A", "0")))#I made sure everything was numeric final_data <- final_data %>%mutate(across(c(sp_rank, sp_score, fs_rank, fs_score, total_score, rank, latitude, longitude), as.numeric))
Warning: There were 4 warnings in `mutate()`.
The first warning was:
ℹ In argument: `across(...)`.
Caused by warning:
! NAs introduced by coercion
ℹ Run `dplyr::last_dplyr_warnings()` to see the 3 remaining warnings.
head(final_data)
# A tibble: 6 × 10
name country total_score sp_rank sp_score fs_rank fs_score rank latitude
<chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 Anna Shc… Russia 256. 1 80.2 2 176. 1 61.5
2 Alexandr… Russia 252. 3 74.6 1 177. 2 61.5
3 Kaori Sa… Japan 233. 2 79.8 3 153. 3 36.2
4 Wakaba H… Japan 214. 4 73.5 5 141. 4 36.2
5 You Young South … 213. 5 70.3 4 143. 5 35.9
6 Alysa Liu United… 209. 7 69.5 6 139. 6 37.1
# ℹ 1 more variable: longitude <dbl>
Plot
#I deleted the last few rows because those contain skaters that didnt ove onto the freestyle programplot_data <- final_data %>%slice(1:(n()-6))#making my plotmy_plot <-ggplot(plot_data, aes(x = sp_score, y = fs_score, color = rank, text =paste("Skater Name:", name, "\n","Overall Rank:", rank, "\n","Short Program Score:", sp_score, "\n","Freestyle Score:", fs_score))) +geom_point(size =3) +scale_color_gradient(low ="lightblue", high ="mediumpurple") +theme_bw() +labs(title ="Figure Skating: Short Program vs Freestyle Scores",x ="Short Program Score", y ="Freestyle Score",color ="Overall Rank",caption ="Data Source: Wikipedia, ISU" )ggplotly(my_plot, tooltip ="text")
Map
#Here I made my mapskaters_map <- final_data %>%leaflet() %>%addTiles() %>%#because there were some skaters from the same country I had to jitteraddCircleMarkers(lng =~jitter(longitude, amount =1.5), lat =~jitter(latitude, amount =1.5), popup =~paste("Skater Name: ", name, "<br>","Country: ", country, "<br>","Overall Rank: ", rank, "<br>","Overall Score: ", total_score, "<br>","Short Program Score: ", sp_score, "<br>","Freestyle Score: ", fs_score),clusterOptions =markerClusterOptions(),#madding colors to different places so its easier to seecolor =~case_when( rank ==1~"gold", rank ==2~"gray80", rank ==3~"brown", rank <=10~"green", rank <=24~"purple", rank <=29~"red",TRUE~"black" ),radius =8 )skaters_map
B. The visualization represents how each of the skaters that moved onto the free skate did in the short program. Do skaters that do better in the short program automatically do better in the freestyle too or are there variations in how high of a score a skater gets in the two performances. I noticed that a better short program score does not always mean a better freestyle score. While this didn’t exactly surprise me it made me think about how different scores could be if all competitors did both programs. In my map, you can see pretty well where most skaters are. The interesting thing is even though Europe produces the most figure skaters for the Olympics, most of the big winners are from Asia. I think I did everything I wanted to do!
C. I used wikipedia as my source: https://en.wikipedia.org/wiki/Figure_skating_at_the_2022_Winter_Olympics_%E2%80%93_Women%27s_singles I used a data set from kaggle found: https://www.kaggle.com/datasets/paultimothymooney/latitude-and-longitude-for-every-country-and-state?resource=download
And I just used the class notes and messed around with the code until I got it to do what I wanted.