This is an extension of the tidytuesday assignment you have already done. Complete the questions below, using the screencast you chose for the tidytuesday assignment.
devtools::install_github("thebioengineer/tidytuesdayR")
library(tidytuesdayR)
library(tidyverse)
tuesdata <- tidytuesdayR::tt_load('2020-04-07')
tdf_winners <- tuesdata$tdf_winners
(tuesdata$tdf_winners)
## # A tibble: 106 x 19
## edition start_date winner_name winner_team distance time_overall time_margin
## <dbl> <date> <chr> <chr> <dbl> <dbl> <dbl>
## 1 1 1903-07-01 Maurice Ga… La Françai… 2428 94.6 2.99
## 2 2 1904-07-02 Henri Corn… Conte 2428 96.1 2.27
## 3 3 1905-07-09 Louis Trou… Peugeot–Wo… 2994 NA NA
## 4 4 1906-07-04 René Potti… Peugeot–Wo… 4637 NA NA
## 5 5 1907-07-08 Lucien Pet… Peugeot–Wo… 4488 NA NA
## 6 6 1908-07-13 Lucien Pet… Peugeot–Wo… 4497 NA NA
## 7 7 1909-07-05 François F… Alcyon–Dun… 4498 NA NA
## 8 8 1910-07-01 Octave Lap… Alcyon–Dun… 4734 NA NA
## 9 9 1911-07-02 Gustave Ga… Alcyon–Dun… 5343 NA NA
## 10 10 1912-06-30 Odile Defr… Alcyon–Dun… 5289 NA NA
## # … with 96 more rows, and 12 more variables: stage_wins <dbl>,
## # stages_led <dbl>, height <dbl>, weight <dbl>, age <dbl>, born <date>,
## # died <date>, full_name <chr>, nickname <chr>, birth_town <chr>,
## # birth_country <chr>, nationality <chr>
For this data set I had to google multiple things and install the devtools package, which I needed in order to download tidytuesdayR and open the data. The data set is a list of all the winners of the Tour de France since it first began. For each winner it records race variables such as distance, overall time, and time margin. It also covers each winner's physical attributes (height, weight, and age), when they were born and when they died, their nationality, and their birth country. The purpose of the data is to let us pull out different variables and compare the winners.
Hint: One graph of your choice.
tdf_winners %>%
  count(birth_country, sort = TRUE) %>%
  mutate(birth_country = fct_reorder(birth_country, n)) %>%
  ggplot(aes(n, birth_country)) +
  geom_col() +
  labs(y = "Birth Countries",
       x = "Number of Wins",
       title = "Most Successful Countries in the Tour De France")
This plot also caused me some issues when I was inputting it, and I had to go back and rewrite some of the code that I originally entered. I chose this graph because I thought it was really interesting to see which countries have produced the most winners of the Tour de France. After making the graph, I found it just as interesting that France has had the most successful riders throughout history, with 36 wins, when most other countries have only had a few. That makes me wonder whether there have ever been any scandals about French athletes cheating in attempts to win the Tour de France, since it is held in their home country, like how Russia cheated during the Winter Olympics when it hosted them back in 2014. The USA is also performing fairly well in the rankings, in 5th place with 10 Tour de France victories, thanks to Lance Armstrong, who won 7 of them in consecutive years.
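One way to check where a country's total comes from is to break the count down by rider. The sketch below is only an illustration: `toy_winners` is a made-up stand-in with hypothetical riders, not the real `tdf_winners` data, so that it runs without downloading anything. With the real data, swapping in `tdf_winners` should work the same way, assuming the `winner_name` and `birth_country` columns shown in the printout earlier.

```r
library(dplyr)
library(tibble)

# Toy stand-in for tdf_winners: hypothetical riders, NOT the real data.
toy_winners <- tibble(
  winner_name   = c("Rider A", "Rider A", "Rider A",
                    "Rider B", "Rider C", "Rider C"),
  birth_country = c("USA", "USA", "USA",
                    "USA", "France", "France")
)

# Count wins per rider within each birth country, largest first.
wins_by_rider <- toy_winners %>%
  count(birth_country, winner_name, sort = TRUE, name = "wins")

# Keep a single country to see which riders account for its total.
usa_breakdown <- wins_by_rider %>%
  filter(birth_country == "USA")

print(usa_breakdown)
```

Adding up the `wins` column of `usa_breakdown` should give the same number as that country's bar in the plot, which is a quick way to confirm how much of the total one rider is responsible for.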