Since my last post, my goal has been to create a somewhat visually appealing graph from my imported data set of tennis player Rafael Nadal's career statistics. My previous attempt at the intended graph was botched: I had not yet watched many of Danielle’s videos, so my knowledge of visualising data in R was quite limited.
library(tidyverse)
## ── Attaching packages ─────────────────────────────────────── tidyverse 1.3.0 ──
## ✓ ggplot2 3.3.3 ✓ purrr 0.3.4
## ✓ tibble 3.0.6 ✓ dplyr 1.0.4
## ✓ tidyr 1.1.2 ✓ stringr 1.4.0
## ✓ readr 1.4.0 ✓ forcats 0.5.1
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## x dplyr::filter() masks stats::filter()
## x dplyr::lag() masks stats::lag()
library(readxl)
RAFA <- read_excel("RAFA.xlsx")
For reference, this is the data that I imported.
print(RAFA)
## # A tibble: 4 x 14
## Surface M W L `Win%` `Set%` `Game%` `TB%` MS `Hld%` `Brk%`
## <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 Hard 636 494 142 77.7 73.1 5737 58.6 620 86 29
## 2 Clay 491 449 42 91.4 86.2 64.3 66.7 468 84.9 43.2
## 3 Grass 91 71 20 78 71.8 56.8 62.5 91 89.9 23.2
## 4 Carpet 8 2 6 25 30 45.2 33.3 6 73.5 13.2
## # … with 3 more variables: `A%` <dbl>, `DF%` <dbl>, `1stIn` <dbl>
My original goal was to create a graph of the win percentage (the Win% column) against the surface type.
skills <- ggplot(data = RAFA) +
  geom_point(
    mapping = aes(
      x = Surface,
      y = "Win%"
    )
  )
print(skills)
Instead, this abomination was made.
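A likely cause of the odd plot, sketched here under stated assumptions: inside aes(), "Win%" in double quotes is a literal string, so every point is given the same y value instead of the numbers in the column. Wrapping the name in backticks refers to the column itself. The data frame `rafa_small` and the object `fixed_plot` are hypothetical names used only for illustration; the values are copied from the Surface and Win% columns printed above.

```r
library(ggplot2)

# Hypothetical stand-in for the imported data: only the two columns needed here.
# check.names = FALSE keeps the non-syntactic column name `Win%` unchanged.
rafa_small <- data.frame(
  Surface = c("Hard", "Clay", "Grass", "Carpet"),
  `Win%` = c(77.7, 91.4, 78, 25),
  check.names = FALSE
)

# Backticks make aes() look up the column; double quotes would map a constant string.
fixed_plot <- ggplot(data = rafa_small) +
  geom_point(mapping = aes(x = Surface, y = `Win%`))

print(fixed_plot)
```

Backticks avoid the need to rename anything, although renaming the column usually makes later code easier to read.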
With some amazing help from Jenny, and after finishing the data visualisation series, I did some more fiddling around and managed to create something resembling an acceptable graph.
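# Give the awkward `Win%` column a syntactic name
RAFA_new <- RAFA %>% rename(win_percent = "Win%")
# Avoid calling the object `plot`, which would shadow the base R function
win_plot <- ggplot(RAFA_new) +
  geom_point(aes(Surface, win_percent)) +
  ggtitle(label = "Rafael Nadal Win Percentage Over Different Surfaces", subtitle = "By: Yours Truly")
print(win_plot)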
As you can see, I finally accomplished my goal, but I felt that the graph was still lacking in several respects.
A scatterplot did not seem at all appropriate for data with a single value per surface, and the y-axis should be properly labelled. The graph also looks quite boring, with no colour at all and none of the other variables used.
So I sought to amend these issues.
bar_plot <- ggplot(RAFA_new) +
  geom_col(aes(Surface, win_percent, fill = M)) +
  theme_classic() +
  scale_y_continuous(name = "Win Percentage") +
  scale_fill_continuous(name = "Matches Played") +
  ggtitle(label = "Rafael Nadal Career Statistics Over Different Surfaces", subtitle = "By: Yours Truly")
print(bar_plot)
Much better.
It turns out enough is never enough for me, though, so I made another version with the variables switched around. It now looks a lot more colourful and a little more informative.
another_one <- ggplot(RAFA_new) +
  geom_point(aes(M, win_percent, colour = Surface), size = 5) +
  theme_classic() +
  ylab("Win Percentage") +
  xlab("Matches Played") +
  ggtitle(label = "Rafael Nadal Career Statistics Over Different Surfaces", subtitle = "By: Yours Truly")
print(another_one)
Although it is still not of ‘science journal publishable’ quality, it will suffice for now.
I finally managed to fulfil my goal of creating that long-awaited graph. The main hindrance was the poor naming of the Win% variable, which messed up the code. Using Jenny’s helpful advice on renaming variables, I successfully produced the intended graph, and I expanded on it a little by adding another variable, changing the graph type and making it a little more aesthetically pleasing.
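Since most of the columns in this data set end in a percent sign, the same renaming idea could be applied to all of them at once. The following is only a sketch, not the approach used in this post: it relies on dplyr's rename_with(), and `rafa_small` and `rafa_clean` are hypothetical names, with values copied from the Hard and Clay rows of the table above.

```r
library(dplyr)

# Hypothetical two-row extract of the imported data, keeping the original column names
rafa_small <- tibble(
  Surface = c("Hard", "Clay"),
  `Win%` = c(77.7, 91.4),
  `Set%` = c(73.1, 86.2)
)

# Replace every literal "%" in the column names with "_percent"
rafa_clean <- rafa_small %>%
  rename_with(~ gsub("%", "_percent", .x, fixed = TRUE))

names(rafa_clean)
# "Surface" "Win_percent" "Set_percent"
```

With syntactic names like these, the columns can be used in aes() without quotes or backticks.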
The main challenge was finding something interesting to plot with the data I had imported. It is a very simple data set, so I couldn’t do anything too amazing with it, such as grouping variables and calculating means and standard deviations.
Now that I am quite familiar with R, I intend to do the following: