── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag() masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
ggplot(medal_counts, aes(x = Year)) +geom_line(aes(y = prop_gold, color ="Gold"), linewidth =1) +geom_line(aes(y = prop_silver, color ="Silver"), linewidth =1) +geom_line(aes(y = prop_bronze, color ="Bronze"), linewidth =1) +facet_wrap(~ Season) +labs(title ="Proportion of Medal Types Over Time",y ="Proportion",color ="Medal Type")
This data shows that the overall medals have generally increased over the years. While there was a decrease early in the Summer Olympics, it has continued to increase over time.
Question 2: Average Age of Olympians
ages <- df %>%filter(!is.na(Age), Age >0)avg_age <- ages %>%group_by(Year) %>%summarise(mean_age =mean(Age, na.rm =TRUE))ggplot(avg_age, aes(x = Year, y = mean_age)) +geom_line(linewidth =1, color ="darkred") +geom_point(color ="darkred") +labs(title ="Average Age of Olympic Athletes Over Time",x ="Year",y ="Average Age") +theme_minimal()
The graph shows that while Olympic athletes have generally remained in their twenties over the years, 1932 was a noticeable outlier, as it had a much higher average age at around 32.
# A tibble: 10 × 2
NOC total_medals
<chr> <int>
1 USA 5637
2 URS 2503
3 GER 2165
4 GBR 2068
5 FRA 1777
6 ITA 1637
7 SWE 1536
8 CAN 1352
9 AUS 1320
10 RUS 1165
top10 <- country_totals %>%top_n(10, total_medals)ggplot(top10, aes(x =reorder(NOC, total_medals), y = total_medals, fill = NOC)) +geom_bar(stat ="identity") +coord_flip() +labs(title ="Top 10 Countries by Total Medals (All Years)",x ="Country (NOC)",y ="Total Medals") +theme_minimal() +theme(legend.position ="none")
The data shows that the USA has absolutely dominated in total medals taken home. A far second is the USR, otherwise known as the Soviet Union, which is impressive considering they don’t exist anymore. It is worth noting that Russia has it’s own separate count.
Question 4: Physiological Change over Time
athletes <-read_csv("data/athlete_events.csv")
Rows: 271116 Columns: 15
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (10): Name, Sex, Team, NOC, Games, Season, City, Sport, Event, Medal
dbl (5): ID, Age, Height, Weight, Year
ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
ggplot(phys_trends, aes(x = Year, y = avg_age)) +geom_line(color ="purple", linewidth =1.2) +geom_point(color ="purple") +labs(title ="Average Athlete Age Over Time",x ="Year",y ="Age" ) +theme_minimal(base_size =14)
Strangely enough, even those these are showing three different factors, the graphs look similar. They show that for height, weight, and age, they slowly increased, and when they peaked, they dramatically decreased, before leveling out recently, remaining fairly consistent. Notably, height and weight peaked at the same time, while age peaked a little later.