Welcome to R at the Brandeis Library!

What is ggplot2?

Image from The Grammar of Graphics by Leland Wilkinson

Arguments for ggplot2 functions:

Let’s install/load tidyverse!

A package must be installed once before you can use it. After that, you only need to load it at the start of each session.

# If you have never installed tidyverse, uncomment the line below and run it
#install.packages('tidyverse')
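
If you would rather not comment and uncomment the install line by hand, one possible pattern is to install only when the package is missing. This is a sketch and not part of the original workshop code. It relies on base R's requireNamespace(), which returns FALSE when a package cannot be loaded, and the helper name install_if_missing is made up for illustration.

```r
# Hypothetical helper (not from the workshop): install a package only if it is missing.
install_if_missing <- function(pkg) {
  # requireNamespace() returns TRUE when the package is already installed
  if (!requireNamespace(pkg, quietly = TRUE)) {
    install.packages(pkg)
  }
  invisible(pkg)
}

install_if_missing("tidyverse")
```

You still need to call library(tidyverse) afterwards, because installing a package does not load it.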

Load tidyverse

library(tidyverse)
## ── Attaching packages ─────────────────────────────────────── tidyverse 1.3.0 ──
## ✓ ggplot2 3.3.2     ✓ purrr   0.3.4
## ✓ tibble  3.0.4     ✓ dplyr   1.0.2
## ✓ tidyr   1.1.2     ✓ stringr 1.4.0
## ✓ readr   1.3.1     ✓ forcats 0.5.0
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## x dplyr::filter() masks stats::filter()
## x dplyr::lag()    masks stats::lag()

Let’s learn ggplot2 with some wine

  • We will use the WineRatings.csv dataset.
wine_ratings <- read_csv('WineRatings.csv')
## Warning: Missing column names filled in: 'X1' [1]
## Parsed with column specification:
## cols(
##   X1 = col_double(),
##   country = col_character(),
##   description = col_character(),
##   designation = col_character(),
##   points = col_double(),
##   price = col_double(),
##   province = col_character(),
##   region_1 = col_character(),
##   region_2 = col_character(),
##   taster_name = col_character(),
##   taster_twitter_handle = col_character(),
##   title = col_character(),
##   variety = col_character(),
##   winery = col_character()
## )

We use the View function to look at our data frame and check that the data are tidy (each variable is a column and each observation is a row).

View(wine_ratings)
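
View() opens a spreadsheet-style viewer, so it is mostly useful in an interactive session such as RStudio. As a sketch of a console alternative, glimpse() and head() print a compact summary instead. The tibble toy_wines below is made-up stand-in data so the example runs without WineRatings.csv; in the workshop you would pass wine_ratings.

```r
library(tidyverse)

# Made-up stand-in rows (NOT from WineRatings.csv), used only so this runs on its own
toy_wines <- tibble(
  country = c("US", "Spain", "Italy"),
  points  = c(87, 90, 88),
  price   = c(15, NA, 22)
)

glimpse(toy_wines)  # one line per column: name, type, and the first values
head(toy_wines, 2)  # the first two rows
```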

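The first column of the file had no name, so read_csv called it X1 (see the warning above). We do not need it, so we can delete it.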

wine_ratings<-select(wine_ratings, -X1)

Let’s create a few graphs using ggplot2.

ggplot(data=wine_ratings)

Now we need to add aesthetics and geometric objects. Geoms are how you plot (point, line, bar, boxplot), and aesthetics (aes) are what you plot: the variables mapped to x, y, size, color, fill, and shape. Here we specify aes() inside each geom_*() so that we know which aesthetics correspond to each geom.

ggplot(data=wine_ratings)+
  geom_point(aes(x=points,
                 y=price))
## Warning: Removed 8996 rows containing missing values (geom_point).
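
To make the split between aesthetics and geoms concrete, here is a small sketch. It contrasts an aesthetic that is mapped to a column (inside aes()) with one that is set to a fixed value (outside aes()), and then swaps the geom while keeping the same aesthetics. The tibble toy_wines and the plot names are invented for illustration and are not part of the wine dataset.

```r
library(tidyverse)

# Made-up stand-in rows (NOT from WineRatings.csv)
toy_wines <- tibble(
  country = c("US", "US", "Spain", "Spain"),
  points  = c(85, 92, 88, 90),
  price   = c(12, 60, 18, 35)
)

# Mapped: color is inside aes(), so it varies with the country column
p_mapped <- ggplot(data = toy_wines) +
  geom_point(aes(x = points, y = price, color = country))

# Set: color is outside aes(), so every point gets the same fixed color
p_set <- ggplot(data = toy_wines) +
  geom_point(aes(x = points, y = price), color = "darkred")

# Same aesthetics, different geom: geom_line() instead of geom_point()
p_line <- ggplot(data = toy_wines) +
  geom_line(aes(x = points, y = price, color = country))
```

Print any of the three objects (for example, p_mapped) to draw it.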

I am going to create a new data frame to compare Spain and the U.S. We will leave out the most expensive bottles and keep only wines priced under 500.

Spain_and_US<- filter(wine_ratings, country %in% c("US","Spain"), price<500)
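
One detail worth knowing about filter(): it keeps only rows where every condition is TRUE, so rows where a condition evaluates to NA are dropped as well. That means Spain_and_US contains no wines with a missing price. The sketch below shows this on made-up rows (toy_wines is not the real dataset).

```r
library(tidyverse)

# Made-up stand-in rows (NOT from WineRatings.csv)
toy_wines <- tibble(
  country = c("US", "US", "Spain", "Spain", "Italy"),
  points  = c(85, 92, 88, 90, 87),
  price   = c(12, 650, NA, 35, 20)
)

# Same conditions as the workshop filter
toy_subset <- filter(toy_wines, country %in% c("US", "Spain"), price < 500)

toy_subset        # the US wine at 650, the Spain wine with NA price, and Italy are gone
nrow(toy_subset)  # 2
```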

Let’s add facets

ggplot(data=Spain_and_US)+
  geom_point(aes(x=points,
                 y=price))+
  facet_wrap(~country)

Let’s add a stat layer

ggplot(data=Spain_and_US)+
  geom_point(aes(x=points,
                 y=price))+
  facet_wrap(~country)+
  stat_smooth(aes(x=points, y=price), method="lm", formula = y ~ x)

We can also set the aesthetics once inside ggplot() so that every layer inherits them, and save the plot as an object so that we can add layers to it later.

p<-ggplot(Spain_and_US, aes(x=points, y=price))+geom_point()+facet_grid(~country)

p+stat_smooth(method="lm", formula = y ~ x)
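
With method="lm" and formula = y ~ x, the smooth in each panel is a straight-line regression of price on points fitted to that panel's data. As a rough sketch of what that means, the code below fits the same kind of model separately for each country and returns the intercept and slope. The rows in toy_wines are made up and chosen to lie exactly on a line (slope 2 for US, slope 1 for Spain); the object name fits is also just for illustration.

```r
library(tidyverse)

# Made-up stand-in rows (NOT from WineRatings.csv)
toy_wines <- tibble(
  country = c("US", "US", "US", "Spain", "Spain", "Spain"),
  points  = c(85, 90, 95, 85, 90, 95),
  price   = c(20, 30, 40, 10, 15, 20)
)

# Fit price ~ points separately within each country, one row of results per country
fits <- toy_wines %>%
  group_by(country) %>%
  summarise(
    intercept = coef(lm(price ~ points))[1],
    slope     = coef(lm(price ~ points))[2]
  )

fits
```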

Changing the theme

ggplot(data=Spain_and_US)+
  geom_point(aes(x=points,
                 y=price, color=country))+
  theme_minimal()

Adding Labels

ggplot(data=Spain_and_US)+
  geom_point(aes(x=points,
                 y=price, color=country))+
  theme_minimal()+
  labs(title = "Wine Scores and Price",
       x="Expert Scores",
       y= "Price")

Changing Legends

ggplot(data=Spain_and_US)+
  geom_point(aes(x=points,
                 y=price, color=country))+
  theme_minimal()+
  labs(title = "Wine Scores and Price",
       x="Score",
       y= "Price")+
  scale_color_discrete(name="Country", labels= c("Spain", "United States"))
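
Note that an unnamed labels vector is matched to the legend entries by position, and the entries here are in alphabetical order (Spain, then US), so swapping the two labels would silently mislabel the countries. As a hedged alternative, to my knowledge ggplot2's discrete scales also accept a named vector and match each label to the data value with that name. The sketch below uses made-up rows (toy_wines) rather than the real dataset.

```r
library(tidyverse)

# Made-up stand-in rows (NOT from WineRatings.csv)
toy_wines <- tibble(
  country = c("US", "US", "Spain", "Spain"),
  points  = c(85, 92, 88, 90),
  price   = c(12, 60, 18, 35)
)

p <- ggplot(data = toy_wines) +
  geom_point(aes(x = points, y = price, color = country)) +
  theme_minimal() +
  # Names are the values in the data; the order written here does not matter
  scale_color_discrete(
    name   = "Country",
    labels = c(US = "United States", Spain = "Spain")
  )

p
```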