library(tidyverse)
## Warning: package 'tidyverse' was built under R version 3.5.3
## -- Attaching packages --------------------------------------- tidyverse 1.2.1 --
## v ggplot2 3.1.1 v purrr 0.3.2
## v tibble 2.1.1 v dplyr 0.8.0.1
## v tidyr 0.8.3 v stringr 1.4.0
## v readr 1.3.1 v forcats 0.4.0
## Warning: package 'ggplot2' was built under R version 3.5.3
## Warning: package 'tibble' was built under R version 3.5.3
## Warning: package 'tidyr' was built under R version 3.5.3
## Warning: package 'readr' was built under R version 3.5.3
## Warning: package 'purrr' was built under R version 3.5.3
## Warning: package 'dplyr' was built under R version 3.5.3
## Warning: package 'stringr' was built under R version 3.5.3
## Warning: package 'forcats' was built under R version 3.5.3
## -- Conflicts ------------------------------------------ tidyverse_conflicts() --
## x dplyr::filter() masks stats::filter()
## x dplyr::lag() masks stats::lag()
library(lubridate)
## Warning: package 'lubridate' was built under R version 3.5.3
##
## Attaching package: 'lubridate'
## The following object is masked from 'package:base':
##
## date
theme_set(theme_light())
player_dob <- readr::read_csv("https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2019/2019-04-09/player_dob.csv")
## Parsed with column specification:
## cols(
## name = col_character(),
## grand_slam = col_character(),
## date_of_birth = col_date(format = ""),
## date_of_first_title = col_date(format = ""),
## age = col_double()
## )
# Removing some players in 1977 who were duplicated
grand_slams <- readr::read_csv("https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2019/2019-04-09/grand_slams.csv") %>%
  arrange(year, grand_slam, name, gender) %>%
  distinct(year, grand_slam, name, .keep_all = TRUE) %>%
  mutate(grand_slam = str_replace(str_to_title(str_replace(grand_slam, "_", " ")), "Us", "US"))
## Parsed with column specification:
## cols(
## year = col_double(),
## grand_slam = col_character(),
## name = col_character(),
## rolling_win_count = col_double(),
## tournament_date = col_date(format = ""),
## gender = col_character()
## )
grand_slam_timeline <- readr::read_csv("https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2019/2019-04-09/grand_slam_timeline.csv")
## Parsed with column specification:
## cols(
## player = col_character(),
## year = col_double(),
## tournament = col_character(),
## outcome = col_character(),
## gender = col_character()
## )
grand_slam_timeline %>%
  count(year, tournament, gender) %>%
  arrange(year) %>%
  View()
grand_slam_timeline %>%
  filter(year == 2018, tournament == "US Open") %>%
  count(outcome, sort = TRUE)
## # A tibble: 11 x 2
## outcome n
## <chr> <int>
## 1 2nd Round 8
## 2 3rd Round 6
## 3 Retired 6
## 4 1st Round 5
## 5 4th Round 5
## 6 Absent 5
## 7 Quarterfinalist 4
## 8 Semi-finalist 3
## 9 Finalist 2
## 10 Won 2
## 11 Qualification Stage 1 1
grand_slams %>%
  count(name, grand_slam, sort = TRUE) %>%
  # add_count() stores each player's total in `nn` because `n` already exists
  add_count(name, wt = n) %>%
  filter(nn >= 8) %>%
  mutate(name = fct_reorder(name, n, sum)) %>%
  ggplot(aes(name, n, fill = grand_slam)) +
  geom_col() +
  coord_flip() +
  labs(x = "",
       y = "# of Grand Slam tournaments won",
       title = "Tennis players with the most Grand Slam tournament wins",
       subtitle = "1968-Present",
       fill = "Grand Slam")
This chart shows the 15 tennis players, male and female, with the most Grand Slam tournament wins from 1968 to the present. The ranking, from 1st to 15th, is as follows: Serena Williams, Steffi Graf, Roger Federer, Martina Navratilova, Chris Evert, Rafael Nadal, Novak Djokovic, Pete Sampras, Margaret Court, Björn Borg, Monica Seles, Billie Jean King, Jimmy Connors, Ivan Lendl, and Andre Agassi.
To show where each player's wins came from, every bar is split by the four Grand Slam tournaments. Purple represents Wimbledon, blue represents the US Open, green represents the French Open, and red represents the Australian Open.
The chart shows not only how many Grand Slam tournaments each player has won in total, but also how many of each of the four tournaments they won. For example, Ivan Lendl and Monica Seles are both among the top 15 Grand Slam winners, yet neither has won Wimbledon. Björn Borg won only Wimbledon and French Open titles. Serena Williams has won the most Grand Slam tournaments overall, but Rafael Nadal has won more than three times as many French Open titles as she has.
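The per-tournament breakdown can also be checked in a table rather than read off the colors. The following is only a sketch on hypothetical toy counts (the players "A", "B", "C" and their numbers are made up, not taken from the data set); it assumes tidyr's `spread()` is available, as it is in the tidyr version loaded above.

```r
library(dplyr)
library(tidyr)

# Hypothetical toy counts (not the real results): wins per player per tournament
toy_counts <- tibble(
  name       = c("A", "A", "B", "B", "C"),
  grand_slam = c("Wimbledon", "French Open", "French Open", "US Open", "Wimbledon"),
  n          = c(2, 1, 3, 1, 1)
)

# One column per tournament; a missing player/tournament pair becomes 0
wins_wide <- toy_counts %>%
  spread(grand_slam, n, fill = 0)

# Players with no Wimbledon title in the toy data
no_wimbledon <- wins_wide %>%
  filter(Wimbledon == 0) %>%
  pull(name)

print(wins_wide)
print(no_wimbledon)
```

Applied to the output of `count(name, grand_slam)` on `grand_slams`, the same reshaping should make zero-title tournaments easy to spot for each player.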
This graph covers 51 years of Grand Slam wins, so the players listed include both active players and those who have since retired. For example, Björn Borg, once ranked No. 1 in the world, retired in 1983, and several players have since surpassed his records. Serena Williams was born only two years before Borg's retirement and first reached the No. 1 ranking on July 8, 2002. Steffi Graf has also retired, yet she ranks 2nd, between two players who are still active (Serena Williams and Roger Federer).
Pipe the grand_slams data set into count() to find the most frequent winners from 1968 to the present. Serena Williams has the most, with 23 Grand Slam wins.
Create a bar graph using ggplot() and aes().
Use labs() to label the y axis "# of Grand Slam tournaments won", set the title to "Tennis players with the most Grand Slam tournament wins", and set the subtitle to "1968-Present".
Add grand_slam to the count() call to determine which Grand Slams each player won, and set fill equal to grand_slam.
Use add_count() to find each player's total number of Grand Slam wins.
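To make the count() and add_count() steps concrete, here is a minimal sketch on hypothetical toy data (the players and wins are made up). It assumes a dplyr version whose `add_count()` accepts the `name` argument, used here to call the per-player total `total` instead of relying on the automatic `nn` column.

```r
library(dplyr)

# Hypothetical toy data: one row per tournament win (not the real results)
toy_wins <- tibble(
  name       = c("A", "A", "A", "B", "B", "C"),
  grand_slam = c("Wimbledon", "Wimbledon", "US Open",
                 "Wimbledon", "French Open", "US Open")
)

toy_counts <- toy_wins %>%
  count(name, grand_slam, sort = TRUE) %>%     # wins per player per tournament
  add_count(name, wt = n, name = "total") %>%  # player total, repeated on each row
  filter(total >= 2)                           # keep players with 2 or more wins overall

print(toy_counts)
```

Because the total is repeated on every row of a player, the filter keeps or drops all of that player's tournament rows together, which is what lets the stacked bars stay complete.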