The Goal

Since last time, my goal has been to create a reasonably visually appealing graph from my imported data set of tennis player Rafael Nadal’s career statistics. My previous attempt at the intended graph was botched and, as I had not watched many of Danielle’s videos, my knowledge of visualising data in R was quite limited.

Progress

library(tidyverse)
## ── Attaching packages ─────────────────────────────────────── tidyverse 1.3.0 ──
## ✓ ggplot2 3.3.3     ✓ purrr   0.3.4
## ✓ tibble  3.0.6     ✓ dplyr   1.0.4
## ✓ tidyr   1.1.2     ✓ stringr 1.4.0
## ✓ readr   1.4.0     ✓ forcats 0.5.1
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## x dplyr::filter() masks stats::filter()
## x dplyr::lag()    masks stats::lag()
library(readxl)

RAFA <- read_excel("RAFA.xlsx")

For reference, this is the data that I imported.

print(RAFA)
## # A tibble: 4 x 14
##   Surface     M     W     L `Win%` `Set%` `Game%` `TB%`    MS `Hld%` `Brk%`
##   <chr>   <dbl> <dbl> <dbl>  <dbl>  <dbl>   <dbl> <dbl> <dbl>  <dbl>  <dbl>
## 1 Hard      636   494   142   77.7   73.1  5737    58.6   620   86     29  
## 2 Clay      491   449    42   91.4   86.2    64.3  66.7   468   84.9   43.2
## 3 Grass      91    71    20   78     71.8    56.8  62.5    91   89.9   23.2
## 4 Carpet      8     2     6   25     30      45.2  33.3     6   73.5   13.2
## # … with 3 more variables: `A%` <dbl>, `DF%` <dbl>, `1stIn` <dbl>

My original goal was to create a graph plotting the win percentage (Win%) against the surface type.

skills <- ggplot(data = RAFA) +
  geom_point(
    mapping = aes(
      x = Surface,
      y = "Win%"
    )
  )

print(skills)

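Instead, this abomination was made. Because "Win%" is in quotation marks, ggplot treats it as a constant string rather than the column name, so every point lands on a single y value labelled "Win%".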

With some amazing help from Jenny, and after finishing the data visualisation series, I did some more fiddling around and managed to create something resembling an acceptable graph.

RAFA_new <- RAFA %>% rename(win_percent = "Win%")

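# Build the scatterplot from the renamed column, then draw it
# (named win_plot so the object does not share a name with base R's plot())
win_plot <- ggplot(RAFA_new) +
  geom_point(aes(Surface, win_percent)) +
  ggtitle(label = "Rafael Nadal Win Percentage Over Different Surfaces", subtitle = "By: Yours Truly")

print(win_plot)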
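
Renaming is not the only way around the Win% problem. The sketch below is one possible alternative rather than what I actually did: wrapping the column name in backticks lets aes() refer to it directly. Only the Surface and Win% columns are reproduced (values copied from the printed tibble), and skills_fixed is just an illustrative name.

```r
library(ggplot2)

# Values copied from the printed tibble; only the two columns needed here
rafa <- data.frame(
  Surface = c("Hard", "Clay", "Grass", "Carpet"),
  `Win%` = c(77.7, 91.4, 78, 25),
  check.names = FALSE  # keep the name "Win%" exactly as it is
)

# Backticks refer to the column itself; quotes would give a constant string
skills_fixed <- ggplot(data = rafa) +
  geom_point(mapping = aes(x = Surface, y = `Win%`))

print(skills_fixed)
```

The y-axis is still labelled Win%, so renaming the column (or setting the axis label) remains the tidier option.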

As you can see, I finally accomplished my goal but felt that it was still lacking in many aspects.

A scatterplot did not seem at all appropriate for this data, and the y-axis needed a proper label. The graph also looked quite boring, with no colour at all and none of the other variables in use.

So I sought to amend these issues.

# Bar chart of win percentage per surface, shaded by matches played (M)
surface_plot <- ggplot(RAFA_new) +
  geom_col(aes(Surface, win_percent, fill = M)) +
  theme_classic() +
  scale_y_continuous(name = "Win Percentage") +
  scale_fill_continuous(name = "Matches Played") +
  ggtitle(label = "Rafael Nadal Career Statistics Over Different Surfaces", subtitle = "By: Yours Truly")

print(surface_plot)

Much better.

It turns out enough is never enough for me, so I made another version with the variables switched around. It now looks a lot more colourful and a little more informative.

another_one <- ggplot(RAFA_new) + 
  geom_point(aes(M, win_percent, colour = Surface), size = 5) +
  theme_classic() +
  ylab("Win Percentage") + 
  xlab("Matches Played") +
  ggtitle(label = "Rafael Nadal Career Statistics Over Different Surfaces", subtitle = "By: Yours Truly")

plot(another_one)

Although it is still not of ‘science journal publishable’ quality, it will suffice for now.

Challenges/Successes

I finally managed to fulfil my goal of creating that long-awaited graph. The main hindrance was the name of the Win% variable: the % sign means it is not a syntactic name in R, and putting it in quotes inside aes() messed up the plot. Using Jenny’s helpful advice on renaming variables, I successfully produced the intended graph, then expanded on it a little by adding another variable, changing the graph type and making it a little more aesthetically pleasing.
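
Several other columns in the table also end in %, so they would cause the same trouble. The following is a hedged sketch, not part of my original workflow, of renaming them all in one step with dplyr’s rename_with(). The lower-case _percent pattern is an assumption chosen to match win_percent, and only three columns from the printed tibble are reproduced.

```r
library(dplyr)

# Three columns copied from the printed tibble
rafa <- data.frame(
  Surface = c("Hard", "Clay", "Grass", "Carpet"),
  `Win%` = c(77.7, 91.4, 78, 25),
  `Set%` = c(73.1, 86.2, 71.8, 30),
  check.names = FALSE  # keep the % names as imported
)

# Replace the trailing % with _percent and lower-case the result,
# touching only the columns whose names end in %
rafa_clean <- rafa %>%
  rename_with(
    ~ tolower(gsub("%", "_percent", .x, fixed = TRUE)),
    .cols = ends_with("%")
  )

names(rafa_clean)
```

Restricting the rename with .cols leaves Surface untouched, so existing plotting code that refers to it keeps working.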

The main challenge was finding something interesting to plot. The data I had imported is a very simple set with a single summary row per surface, so I couldn’t do anything too amazing with it, such as grouping variables and calculating means and standard deviations.
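
One thing a summary table like this might still allow is comparing several percentage columns side by side. The sketch below is only an illustration under assumptions: it reshapes two columns with tidyr’s pivot_longer() and draws grouped bars. I am assuming Hld% and Brk% are hold and break percentages, and the names rafa_long, statistic, percent and serve_plot are made up for the example.

```r
library(tidyr)
library(ggplot2)

# Two columns copied from the printed tibble
rafa <- data.frame(
  Surface = c("Hard", "Clay", "Grass", "Carpet"),
  `Hld%` = c(86, 84.9, 89.9, 73.5),
  `Brk%` = c(29, 43.2, 23.2, 13.2),
  check.names = FALSE
)

# Reshape to one row per surface and statistic
rafa_long <- pivot_longer(
  rafa,
  cols = c(`Hld%`, `Brk%`),
  names_to = "statistic",
  values_to = "percent"
)

# Grouped bars: one pair of bars per surface
serve_plot <- ggplot(rafa_long) +
  geom_col(aes(Surface, percent, fill = statistic), position = "dodge") +
  theme_classic() +
  ylab("Percentage")

print(serve_plot)
```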

Next steps

Now that I am quite familiar with R, I intend to do the following: