This is an extension of the tidytuesday assignment you have already done. Complete the questions below, using the screencast you chose for the tidytuesday assignment.

Import data

# One-time setup: install tidytuesdayR from GitHub (requires the devtools package)
devtools::install_github("thebioengineer/tidytuesdayR")

library(tidytuesdayR)
library(tidyverse)
tuesdata <- tidytuesdayR::tt_load('2020-04-07')

tdf_winners <- tuesdata$tdf_winners
(tuesdata$tdf_winners)
## # A tibble: 106 x 19
##    edition start_date winner_name winner_team distance time_overall time_margin
##      <dbl> <date>     <chr>       <chr>          <dbl>        <dbl>       <dbl>
##  1       1 1903-07-01 Maurice Ga… La Françai…     2428         94.6        2.99
##  2       2 1904-07-02 Henri Corn… Conte           2428         96.1        2.27
##  3       3 1905-07-09 Louis Trou… Peugeot–Wo…     2994         NA         NA   
##  4       4 1906-07-04 René Potti… Peugeot–Wo…     4637         NA         NA   
##  5       5 1907-07-08 Lucien Pet… Peugeot–Wo…     4488         NA         NA   
##  6       6 1908-07-13 Lucien Pet… Peugeot–Wo…     4497         NA         NA   
##  7       7 1909-07-05 François F… Alcyon–Dun…     4498         NA         NA   
##  8       8 1910-07-01 Octave Lap… Alcyon–Dun…     4734         NA         NA   
##  9       9 1911-07-02 Gustave Ga… Alcyon–Dun…     5343         NA         NA   
## 10      10 1912-06-30 Odile Defr… Alcyon–Dun…     5289         NA         NA   
## # … with 96 more rows, and 12 more variables: stage_wins <dbl>,
## #   stages_led <dbl>, height <dbl>, weight <dbl>, age <dbl>, born <date>,
## #   died <date>, full_name <chr>, nickname <chr>, birth_town <chr>,
## #   birth_country <chr>, nationality <chr>

Description of the data and definition of variables

For this data set I had to google multiple things and install a package, devtools, before I could open the data. The data set lists every winner of the Tour de France since the race began. For each winner it includes variables about the race itself, such as distance, overall time, and time margin. It also covers the winner's physical attributes (height and weight), dates of birth and death, nationality, and birth country. The purpose of the data is to make it possible to compare the winners across these variables.

Visualize data

Hint: One graph of your choice.

tdf_winners %>%
  count(birth_country, sort = TRUE) %>%
  mutate(birth_country = fct_reorder(birth_country, n)) %>%
  ggplot(aes(n, birth_country)) +
  geom_col() +
  labs(y = "Birth Countries",
       x = "Number of Wins",
       title = "Most Successful Countries in the Tour De France")

What is the story behind the graph?

This plot also gave me some trouble: I had to go back and rewrite some of the code I originally entered. I chose this graph because I thought it was really interesting to see which countries have produced the most Tour de France winners. After making the graph, I found it just as interesting that France has had the most successful riders throughout history, with 36 wins, when most other countries have only had a few. That makes me wonder whether any scandals have come out about French athletes cheating in attempts to win the Tour de France, since it is held in their home country, much like the Russian doping scandal around the 2014 Winter Olympics, which Russia hosted. The USA is also performing fairly well in the rankings, in 5th place with 10 Tour de France victories, thanks largely to Lance Armstrong, who won 7 in consecutive years.
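
One way to check claims like these is to count wins per country and per rider. The sketch below uses a small made-up table (`toy_winners`, with hypothetical riders) in place of the real data so that it runs on its own; the same two `count()` calls should work on `tdf_winners`, assuming it has the `winner_name` and `birth_country` columns shown in the printout above.

```r
library(dplyr)

# Toy stand-in for tdf_winners: hypothetical rows, NOT the real data.
# Only the two columns used below are included.
toy_winners <- tibble(
  winner_name   = c("Rider A", "Rider A", "Rider B", "Rider C"),
  birth_country = c("France", "France", "France", "USA")
)

# Wins per birth country, most successful first
wins_by_country <- toy_winners %>%
  count(birth_country, sort = TRUE, name = "wins")

# Wins per rider, to see how much one rider adds to a country's total
wins_by_rider <- toy_winners %>%
  count(birth_country, winner_name, sort = TRUE, name = "wins")

print(wins_by_country)
print(wins_by_rider)
```

Swapping `toy_winners` for `tdf_winners` would show how many of a country's wins come from a single rider.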

Hide the messages, but display the code and its results on the webpage.
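
A minimal sketch of one way to do this, assuming the document is rendered with knitr (the usual engine for R Markdown): set the chunk options once in a setup chunk near the top of the file. Results are displayed by default, so only `echo` and `message` need to be set.

```r
# Assumes the knitr package is installed.
# echo = TRUE keeps the code visible on the page;
# message = FALSE hides messages such as those printed by library(tidyverse).
knitr::opts_chunk$set(echo = TRUE, message = FALSE)
```

The same options can also be set on a single chunk instead of globally if only one chunk is noisy.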

Write your name for the author at the top.

Use the correct slug.