Introduction:
The question we aim to answer with this Olympic dataset is which traits and characteristics help athletes succeed in their specific sport, and how their country benefits them and affects the outcome. This relates to real-world questions because it explores how different levels of opportunity can impact an individual's success. We will look at all the Olympic Games in our data, from the start of the modern Olympics through the most recent Games recorded. We will include trends over time and how the winning athletes and countries have changed. Our main focus is on traits such as height and weight and how much of an impact they have on winning. From there, we will explore how athletes' countries affect their ability to win, with the aim of reaching a conclusion about how opportunity affects success.
Our data comes from a public database of real Olympic results. The observational unit is one athlete's entry in a single event at a single Games, so an athlete who competed in several events or Games appears in several rows. Each row records the athlete's name along with information such as country, physical traits, event, and any medal won. The data spans from 1896, when the modern Olympics began, to 2016, the last Games shown in our figures. The key variables we will examine are country, height in centimeters, and weight in kilograms. We will use height and weight to draw conclusions about how they relate to winning, and then bring in the winners' countries to see whether country also has an effect.
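As a quick check on what one row represents, the sketch below counts rows against distinct athletes. It uses a handful of rows copied from the data preview in this report rather than the full file; the object names `entries`, `unit_check`, and `entries_per_athlete` are ours and purely illustrative.

```r
library(dplyr)

# A few rows copied from the data preview (not the full dataset)
entries <- tibble(
  ID    = c(1, 2, 3, 4, 5, 5),
  Name  = c("A Dijiang", "A Lamusi", "Gunnar Nielsen Aaby",
            "Edgar Lindenau Aabye", "Christine Jacoba Aaftink",
            "Christine Jacoba Aaftink"),
  Year  = c(1992, 2012, 1920, 1900, 1988, 1988),
  Event = c("Basketball Men's Basketball", "Judo Men's Extra-Lightweight",
            "Football Men's Football", "Tug-Of-War Men's Tug-Of-War",
            "Speed Skating Women's 500 metres",
            "Speed Skating Women's 1,000 metres"),
  Medal = c(NA, NA, NA, "Gold", NA, NA)
)

# Compare the number of rows with the number of distinct athletes
unit_check <- entries %>%
  summarise(n_rows = n(), n_athletes = n_distinct(ID))

# Count how many entries each athlete has
entries_per_athlete <- entries %>%
  count(ID, Name, name = "n_entries")
```

If the two counts differ, as they do here, the rows are athlete-event entries rather than one row per person, which matters when averaging height or weight across rows.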
## Rows: 271116 Columns: 15
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (10): Name, Sex, Team, NOC, Games, Season, City, Sport, Event, Medal
## dbl (5): ID, Age, Height, Weight, Year
## # A tibble: 271,116 × 15
## ID Name Sex Age Height Weight Team NOC
## <dbl> <chr> <chr> <dbl> <dbl> <dbl> <chr> <chr>
## 1 1 A Dijiang M 24 180 80 China CHN
## 2 2 A Lamusi M 23 170 60 China CHN
## 3 3 Gunnar Nielsen Aaby M 24 NA NA Denmark DEN
## 4 4 Edgar Lindenau Aabye M 34 NA NA Denmark/Sweden DEN
## 5 5 Christine Jacoba Aaftink F 21 185 82 Netherlands NED
## 6 5 Christine Jacoba Aaftink F 21 185 82 Netherlands NED
## 7 5 Christine Jacoba Aaftink F 25 185 82 Netherlands NED
## 8 5 Christine Jacoba Aaftink F 25 185 82 Netherlands NED
## 9 5 Christine Jacoba Aaftink F 27 185 82 Netherlands NED
## 10 5 Christine Jacoba Aaftink F 27 185 82 Netherlands NED
## Games Year Season City Sport
## <chr> <dbl> <chr> <chr> <chr>
## 1 1992 Summer 1992 Summer Barcelona Basketball
## 2 2012 Summer 2012 Summer London Judo
## 3 1920 Summer 1920 Summer Antwerpen Football
## 4 1900 Summer 1900 Summer Paris Tug-Of-War
## 5 1988 Winter 1988 Winter Calgary Speed Skating
## 6 1988 Winter 1988 Winter Calgary Speed Skating
## 7 1992 Winter 1992 Winter Albertville Speed Skating
## 8 1992 Winter 1992 Winter Albertville Speed Skating
## 9 1994 Winter 1994 Winter Lillehammer Speed Skating
## 10 1994 Winter 1994 Winter Lillehammer Speed Skating
## Event Medal
## <chr> <chr>
## 1 Basketball Men's Basketball <NA>
## 2 Judo Men's Extra-Lightweight <NA>
## 3 Football Men's Football <NA>
## 4 Tug-Of-War Men's Tug-Of-War Gold
## 5 Speed Skating Women's 500 metres <NA>
## 6 Speed Skating Women's 1,000 metres <NA>
## 7 Speed Skating Women's 500 metres <NA>
## 8 Speed Skating Women's 1,000 metres <NA>
## 9 Speed Skating Women's 500 metres <NA>
## 10 Speed Skating Women's 1,000 metres <NA>
## # ℹ 271,106 more rows
Fig. 1. A line plot displaying the average heights of Olympic swimmers over time, split by medal status. Data was taken from a database of Olympic participants.
Alt text: This is a line graph with the year on the x-axis and the average height on the y-axis. The years range from 1972 to 2016, and the average heights range from 173 cm to 184 cm. The average heights appear to increase until around 1990 and then level out. In addition, the average height of medal winners is consistently around 2-3 cm greater than that of non-medal-winners.
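The averages plotted in Fig. 1 could be computed along the following lines. This is a minimal sketch on invented toy values, not the report's actual code: the grouping by `Year` and `is_medal` matches the knit messages, but the definition of `is_medal` as "the Medal column is not missing" and the names `athletes` and `swim_heights` are assumptions.

```r
library(dplyr)

# Toy data with invented values (not taken from the Olympic dataset)
athletes <- tibble(
  Sport  = c("Swimming", "Swimming", "Swimming", "Swimming", "Judo", "Swimming"),
  Year   = c(1972, 1972, 1972, 2016, 2016, 2016),
  Height = c(180, 176, NA, 188, 170, 184),
  Medal  = c("Gold", NA, NA, "Silver", "Gold", NA)
)

swim_heights <- athletes %>%
  # Keep swimmers in the year range shown on the plot
  filter(Sport == "Swimming", Year >= 1972, Year <= 2016) %>%
  # Assumed definition: any non-missing Medal counts as a medal
  mutate(is_medal = !is.na(Medal)) %>%
  group_by(Year, is_medal) %>%
  # na.rm = TRUE skips athletes with no recorded height
  summarise(mean_height = mean(Height, na.rm = TRUE), .groups = "drop")
```

Setting `.groups = "drop"` returns an ungrouped result and avoids the regrouping message that `summarise()` otherwise prints.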
## Warning: Removed 16 rows containing non-finite outside the scale range
## (`stat_boxplot()`).
## Warning: Removed 16 rows containing missing values or values outside the scale range
## (`geom_point()`).
Fig. 2. Higher median and interquartile range of age among male 100-meter dash gold medalists than among silver and bronze medalists. Data collected observationally from Olympic athletes, filtered to athletes who competed from 1960 onward and grouped by the medal awarded to the athlete.
## # A tibble: 4 × 6
## Medal `Min Age` `Max Age` `Mean Age` `Med Age` `Age IQR`
## <fct> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 Gold 21 32 24.9 25 5.5
## 2 Silver 20 34 24.9 24 3
## 3 Bronze 20 30 24.5 24 4
## 4 No medal won 15 40 24.2 24 6
Alt text:
This is a jittered box plot representing the relationship between age and whether an athlete medaled in the men's 100-meter dash. On the x-axis is the medal awarded to the athlete, and on the y-axis is the age of the athlete. Age takes values between 15 and 40 and is measured in years. Medal takes the values Gold, Silver, Bronze, and No medal won. Based on the plot, it seems that the age of gold medalists has a higher median and interquartile range than that of silver and bronze medalists.
Interpretation:
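Though we observe a wide range of ages in the non-medal category (15 to 40), medalists fall within a narrower range (20 to 34), and the silver and bronze IQRs (3 and 4 years) are slimmer than the non-medal IQR (6 years). This suggests an association between medaling and an age between twenty and twenty-seven. It is, however, important to note that all four groups have a median age of 24 or 25, that the gold IQR (5.5 years) is close to the non-medal IQR, and that there are multiple outliers in each medaling category. The medal groups are also far smaller than the non-medal group, which by itself tends to narrow their observed range, so the association may be weak.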
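The age summary by medal reported with Fig. 2 could be produced roughly as follows. This is a sketch on invented toy ages, not the report's actual code; it uses `fct_na_value_to_level()`, the replacement that forcats recommends for the deprecated `fct_explicit_na()`, and the names `sprinters` and `age_summary` are illustrative.

```r
library(dplyr)
library(forcats)

# Toy data with invented ages (not taken from the Olympic dataset)
sprinters <- tibble(
  Age   = c(21, 25, 32, 20, 24, 34, 20, 24, 30, 15, 24, 40, NA),
  Medal = c("Gold", "Gold", "Gold", "Silver", "Silver", "Silver",
            "Bronze", "Bronze", "Bronze", NA, NA, NA, NA)
)

age_summary <- sprinters %>%
  mutate(
    # Order the medal levels, then turn missing medals into an explicit level
    Medal = factor(Medal, levels = c("Gold", "Silver", "Bronze")),
    Medal = fct_na_value_to_level(Medal, level = "No medal won")
  ) %>%
  group_by(Medal) %>%
  summarise(
    `Min Age`  = min(Age, na.rm = TRUE),
    `Max Age`  = max(Age, na.rm = TRUE),
    `Mean Age` = mean(Age, na.rm = TRUE),
    `Med Age`  = median(Age, na.rm = TRUE),
    `Age IQR`  = IQR(Age, na.rm = TRUE),
    .groups = "drop"
  )
```

Using `na.rm = TRUE` in each summary drops athletes with no recorded age, which is the same kind of removal the plotting warnings report for the box plot.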
Fig. 3. Higher mean height (cm) in male 100-meter dash gold and silver medalists. Data collected observationally from Olympic athletes, filtered to athletes who competed from 1960 onward and grouped by year of event and medal awarded to the athlete.
Alt text:
This is a line graph describing the relationship between mean height in centimeters and medaling by year. On the x-axis is the year the Olympic event took place, and on the y-axis is the mean height of the athletes in each medal group. The graph is faceted by medal. Year takes values between 1960 and 2016, mean height takes values between 167 cm and 196 cm, and medal takes the values Gold, Silver, Bronze, and No medal won. Based on the graph, it seems that both gold medalists and non-medalists have shown a positive trend in average height since 1960. Furthermore, it seems that the average height of gold and silver medalists has consistently remained above the average height of non-medalists over time.
Interpretation:
Based on the visible gap between the mean height of medalists and non-medalists in the graph, we may tentatively conclude that there is a positive association between height and medaling among 100-meter dash runners. This distinction is most pronounced among gold and silver medalists, who display a somewhat positive trend in height over time. Additionally, the male 100-meter dash gold medalist in 2008, 2012, and 2016 (Usain Bolt) was over 15 cm taller than the mean height of a 100-meter dash Olympian in that same period. Because each medal is usually awarded to a single runner per Games, each medalist "mean" largely reflects one individual, so one exceptional athlete can drive the apparent trend.
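One way to quantify the height gap discussed above is to compute, for each year, the difference between the medalists' and non-medalists' mean heights along with the number of medalists behind it. The sketch below uses invented toy heights and hypothetical names (`sprint`, `height_gap`); it is not the report's code.

```r
library(dplyr)

# Toy data with invented heights in cm (not taken from the Olympic dataset)
sprint <- tibble(
  Year   = c(2008, 2008, 2008, 2008, 2012, 2012, 2012, 2012),
  Medal  = c("Gold", NA, NA, NA, "Gold", NA, NA, NA),
  Height = c(190, 178, 180, 182, 190, 176, 180, 184)
)

height_gap <- sprint %>%
  mutate(is_medal = !is.na(Medal)) %>%
  group_by(Year) %>%
  summarise(
    medal_mean    = mean(Height[is_medal], na.rm = TRUE),
    no_medal_mean = mean(Height[!is_medal], na.rm = TRUE),
    # How many athletes the medalist mean is based on
    n_medalists   = sum(is_medal),
    .groups = "drop"
  ) %>%
  mutate(gap_cm = medal_mean - no_medal_mean)
```

Reporting `n_medalists` next to `gap_cm` makes it clear when a gap rests on a single runner rather than on a group average.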
Fig. 4. Higher mean weight (kg) in male 100-meter dash gold medalists. Data collected observationally from Olympic athletes, filtered to athletes who competed from 1960 onward and grouped by year of event and medal awarded to the athlete.
Alt text:
This is a line graph describing the relationship between mean weight in kilograms and medaling by year. On the x-axis is the year the Olympic event took place, and on the y-axis is the mean weight of the athletes in each medal group. The graph is faceted by medal. Year takes values between 1960 and 2016, mean weight takes values between 61 kg and 95 kg, and medal takes the values Gold, Silver, Bronze, and No medal won. Based on the graph, it seems that both gold medalists and non-medalists have shown a positive trend in average weight since 1960. Furthermore, it seems that the average weight of gold medalists has consistently remained above the average weight of non-medalists over time.
Interpretation:
The line graph above shows a visible gap between the mean weight in kilograms of 100-meter dash gold medalists and that of non-medalists over time. We also observe a somewhat positive trend in the mean weight of both non-medalists and gold medalists. We conclude that the gap suggests an association between weight and medaling. More specifically, the increase over time suggests that the typical build of a 100-meter dash Olympic runner has become heavier over the 56-year period.