ggplot2 is a visualization package that is part of the tidyverse.
ggplot2 follows the Grammar of Graphics (GoG); see [Create elegant data visualizations using the grammar of graphics, ggplot2](https://ggplot2.tidyverse.org/).
The idea is to build graphs from the following components:
[Figure: image from The Grammar of Graphics by Leland Wilkinson]
The first time you use a package, you need to install it.
# If you have never installed tidyverse, uncomment the line below and run it
# install.packages('tidyverse')
Load tidyverse
library(tidyverse)
## ── Attaching packages ─────────────────────────────────────── tidyverse 1.3.0 ──
## ✓ ggplot2 3.3.2 ✓ purrr 0.3.4
## ✓ tibble 3.0.4 ✓ dplyr 1.0.2
## ✓ tidyr 1.1.2 ✓ stringr 1.4.0
## ✓ readr 1.3.1 ✓ forcats 0.5.0
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## x dplyr::filter() masks stats::filter()
## x dplyr::lag() masks stats::lag()
wine_ratings <- read_csv('WineRatings.csv')
## Warning: Missing column names filled in: 'X1' [1]
## Parsed with column specification:
## cols(
## X1 = col_double(),
## country = col_character(),
## description = col_character(),
## designation = col_character(),
## points = col_double(),
## price = col_double(),
## province = col_character(),
## region_1 = col_character(),
## region_2 = col_character(),
## taster_name = col_character(),
## taster_twitter_handle = col_character(),
## title = col_character(),
## variety = col_character(),
## winery = col_character()
## )
We use the View function to look at the data frame and check that the data are tidy (each variable is a column and each observation is a row).
View(wine_ratings)
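View() opens a spreadsheet-style viewer and is meant for interactive sessions. As a minimal sketch of a console alternative, the block below inspects a small stand-in table with glimpse(), head(), and summary(). The `toy_wines` tibble and its values are made up for illustration and are not taken from WineRatings.csv.

```r
library(dplyr)

# Hypothetical stand-in for wine_ratings (values invented for illustration)
toy_wines <- tibble(
  country = c("US", "Spain", "France"),
  points  = c(87, 90, 92),
  price   = c(15, NA, 40)
)

glimpse(toy_wines)        # one line per column: name, type, first values
head(toy_wines, 2)        # first two rows
summary(toy_wines$price)  # numeric summary, including the NA count
```

If each column holds one variable and each row one observation, the same calls on `wine_ratings` should give a quick check that the data are tidy.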
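The X1 column was created by read_csv for an unnamed first column (see the warning above) and appears to be just a row index, so we can delete it.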
wine_ratings<-select(wine_ratings, -X1)
Let’s create a few graphs using ggplot2.
ggplot(data=wine_ratings)
Now we need to add aesthetics and geometric objects. Aesthetics (aes) are what you plot: the variables mapped to x, y, size, color, fill, or shape. Geoms are how you plot them (point, line, bar, boxplot). Specify aes() inside each geom_*() so that it is clear which aesthetics correspond to which geom.
ggplot(data=wine_ratings)+
geom_point(aes(x=points,
y=price))
## Warning: Removed 8996 rows containing missing values (geom_point).
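The warning above reports that rows with a missing value in one of the plotted variables were dropped before drawing. A minimal sketch of how to count and remove such rows yourself is shown below. The `toy_wines` tibble is a made-up stand-in, not the real data.

```r
library(dplyr)

# Hypothetical stand-in for wine_ratings (values invented for illustration)
toy_wines <- tibble(
  points = c(87, 90, 92, 85),
  price  = c(15, NA, 40, NA)
)

# Count rows where either plotted variable is missing
n_missing <- toy_wines %>%
  filter(is.na(points) | is.na(price)) %>%
  nrow()
n_missing

# Keep only complete rows so that plotting them raises no warning
complete_wines <- toy_wines %>%
  filter(!is.na(points), !is.na(price))
```

Filtering explicitly makes the choice to drop incomplete rows visible in the code rather than leaving it to the plotting layer.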
I am going to create a new data frame to compare Spain and the U.S. We will focus on cheaper wines (price below 500).
Spain_and_US<- filter(wine_ratings, country %in% c("US","Spain"), price<500)
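To see what this filter keeps, here is a small sketch on made-up data (the `toy_wines` tibble is hypothetical). Conditions separated by commas are combined with AND, and rows where a condition evaluates to NA, such as a missing price, are dropped as well.

```r
library(dplyr)

# Hypothetical stand-in for wine_ratings (values invented for illustration)
toy_wines <- tibble(
  country = c("US", "Spain", "France", "US", "Spain"),
  price   = c(20, 600, 30, NA, 45)
)

# Same pattern as above: country must match AND price must be below 500
toy_subset <- filter(toy_wines, country %in% c("US", "Spain"), price < 500)
toy_subset
```

Only the US wine at 20 and the Spanish wine at 45 remain: the 600 wine fails the price condition, France fails the country condition, and the wine with a missing price is removed because `NA < 500` is NA.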
Let’s add facets
ggplot(data=Spain_and_US)+
geom_point(aes(x=points,
y=price))+
facet_wrap(~country)
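facet_wrap() draws one panel per value of the faceting variable. As a sketch of two commonly used options, the block below stacks the panels in a single column with `ncol` and lets each panel have its own y axis with `scales`. The `toy_wines` data frame is made up for illustration.

```r
library(ggplot2)

# Hypothetical stand-in for Spain_and_US (values invented for illustration)
toy_wines <- data.frame(
  country = c("US", "US", "Spain", "Spain"),
  points  = c(85, 92, 88, 91),
  price   = c(20, 150, 12, 35)
)

p_facets <- ggplot(data = toy_wines) +
  geom_point(aes(x = points, y = price)) +
  # ncol = 1 stacks the panels; scales = "free_y" gives each its own y range
  facet_wrap(~country, ncol = 1, scales = "free_y")

p_facets
```

Free scales make within-panel patterns easier to see, at the cost of making prices harder to compare across countries.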
Let’s add a stat layer
ggplot(data=Spain_and_US)+
geom_point(aes(x=points,
y=price))+
facet_wrap(~country)+
stat_smooth(aes(x=points, y=price), method="lm", formula = y ~ x)
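Equivalently, we can set the aesthetics once in ggplot() so that every layer inherits them, and store the plot in an object before adding the stat layer:
p<-ggplot(Spain_and_US, aes(x=points, y=price))+geom_point()+facet_grid(~country)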
p+stat_smooth(method="lm", formula = y ~ x)
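With method="lm", stat_smooth() fits a straight line separately within each panel. One rough way to see the numbers behind those lines is to fit lm() yourself for each country, as sketched below in base R. The `toy_wines` data frame is made up, so the slopes are illustrative only.

```r
# Hypothetical stand-in for Spain_and_US (values invented for illustration)
toy_wines <- data.frame(
  country = rep(c("US", "Spain"), each = 4),
  points  = c(85, 88, 90, 93, 84, 87, 90, 92),
  price   = c(20, 27, 30, 36, 14, 16, 21, 22)
)

# Fit price ~ points separately for each country
fits <- lapply(
  split(toy_wines, toy_wines$country),
  function(d) lm(price ~ points, data = d)
)

# Slope = estimated change in price per additional point
slopes <- sapply(fits, function(fit) coef(fit)[["points"]])
slopes
```

The formula `price ~ points` plays the same role as `formula = y ~ x` in the plot, where y and x refer to the mapped aesthetics.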
Changing the theme
ggplot(data=Spain_and_US)+
geom_point(aes(x=points,
y=price, color=country))+
theme_minimal()
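The plot above also maps color to country inside aes(), which is what produces the legend. As a sketch of the difference between mapping and setting an aesthetic, the block below builds one plot where color depends on a variable and one where every point gets the same fixed color. The `toy_wines` data frame and the color "darkred" are arbitrary choices for illustration.

```r
library(ggplot2)

# Hypothetical stand-in for Spain_and_US (values invented for illustration)
toy_wines <- data.frame(
  country = c("US", "US", "Spain", "Spain"),
  points  = c(85, 92, 88, 91),
  price   = c(20, 150, 12, 35)
)

# Mapped: color is inside aes(), so it varies with country and gets a legend
p_mapped <- ggplot(data = toy_wines) +
  geom_point(aes(x = points, y = price, color = country)) +
  theme_minimal()

# Set: color is outside aes(), so all points are dark red and no legend appears
p_fixed <- ggplot(data = toy_wines) +
  geom_point(aes(x = points, y = price), color = "darkred") +
  theme_minimal()

p_mapped
p_fixed
```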
Adding Labels
ggplot(data=Spain_and_US)+
geom_point(aes(x=points,
y=price, color=country))+
theme_minimal()+
labs(title = "Wine Scores and Price",
x="Expert Scores",
y= "Price")
Changing Legends
ggplot(data=Spain_and_US)+
geom_point(aes(x=points,
y=price, color=country))+
theme_minimal()+
labs(title = "Wine Scores and Price",
x="Score",
y= "Price")+
scale_color_discrete(name="Country", labels= c("Spain", "United States"))
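The labels above are matched to the legend entries by position, so they must follow the order in which the country values appear in the legend. A slightly more defensive sketch is to pass `breaks` alongside `labels`, which pairs each data value with its label explicitly and also fixes the legend order. The `toy_wines` data frame is made up for illustration.

```r
library(ggplot2)

# Hypothetical stand-in for Spain_and_US (values invented for illustration)
toy_wines <- data.frame(
  country = c("US", "US", "Spain", "Spain"),
  points  = c(85, 92, 88, 91),
  price   = c(20, 150, 12, 35)
)

# breaks lists the data values; labels gives the legend text in the same order
country_scale <- scale_color_discrete(
  name   = "Country",
  breaks = c("US", "Spain"),
  labels = c("United States", "Spain")
)

p_legend <- ggplot(data = toy_wines) +
  geom_point(aes(x = points, y = price, color = country)) +
  theme_minimal() +
  country_scale

p_legend
```

Here "US" is shown as "United States" regardless of how the values happen to be sorted.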