ggplot2 and the grammar of graphics:
aesthetic (aes) = which variable goes where; geometry (geom) = the shape that is drawn (e.g. a boxplot).
Still using the tidyverse; + strings the layers together in ggplot.
Aesthetics are for the data; geoms change how the data is drawn.
Themes are cosmetic choices: style, preference, gridlines, etc. The default theme is used unless you add another one, e.g. + theme_classic().
To make a boxplot, supply the x axis and the y axis in the aesthetics: cereals %>% ggplot(aes(x = , y = )) + geom_boxplot()
library(tidyverse)
## Warning in system("timedatectl", intern = TRUE): running command 'timedatectl'
## had status 1
## ── Attaching packages ─────────────────────────────────────── tidyverse 1.3.1 ──
## ✓ ggplot2 3.3.3 ✓ purrr 0.3.4
## ✓ tibble 3.1.0 ✓ dplyr 1.0.5
## ✓ tidyr 1.1.3 ✓ stringr 1.4.0
## ✓ readr 1.4.0 ✓ forcats 0.5.1
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## x dplyr::filter() masks stats::filter()
## x dplyr::lag() masks stats::lag()
library(here)
## here() starts at /data/biostat/a089861/A089861/R Trainings
cereals <- read_csv(here("Course #2", "cereals.csv"))
##
## ── Column specification ────────────────────────────────────────────────────────
## cols(
## name = col_character(),
## mfr = col_character(),
## type = col_character(),
## calories = col_double(),
## protein = col_double(),
## fat = col_double(),
## sodium = col_double(),
## fiber = col_double(),
## carbo = col_double(),
## sugars = col_double(),
## potass = col_double(),
## vitamins = col_double(),
## shelf = col_double(),
## weight = col_double(),
## cups = col_double(),
## rating = col_double()
## )
library(ggplot2)
cereals <- read_csv("cereals.csv")
Clean Data and Add category
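## clean the data: convert mfr and type to factors and recode -1 (the missing-value code) as NA
cereals <- cereals %>%
  mutate(
    mfr = factor(mfr),
    type = factor(type),
    potass = na_if(potass, -1)
  ) %>%
  # same recoding for every numeric column (this also covers potass)
  mutate_if(is.numeric,
            ~ na_if(.x, -1))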
head(cereals)
## # A tibble: 6 x 16
## name mfr type calories protein fat sodium fiber carbo sugars potass
## <chr> <fct> <fct> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 100% Bran N C 70 4 1 130 10 5 6 280
## 2 100% Natu… Q C 120 3 5 15 2 8 8 135
## 3 All-Bran K C 70 4 1 260 9 7 5 320
## 4 All-Bran … K C 50 4 0 140 14 8 0 330
## 5 Almond De… R C 110 2 2 200 1 14 8 NA
## 6 Apple Cin… G C 110 2 2 180 1.5 10.5 10 70
## # … with 5 more variables: vitamins <dbl>, shelf <dbl>, weight <dbl>,
## # cups <dbl>, rating <dbl>
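To see what na_if() does in the cleaning step, here is a minimal sketch on a made-up tibble. The `toy` data and its values are hypothetical (not taken from cereals.csv), and it assumes -1 is the missing-value code, as in the cleaning code above.

```r
library(dplyr)

# Hypothetical toy data: -1 stands in for "missing"
toy <- tibble(
  name   = c("a", "b", "c"),
  sugars = c(6, -1, 8),
  potass = c(-1, 135, 320)
)

# Replace -1 with NA in every numeric column, leaving character columns alone
toy_clean <- toy %>%
  mutate_if(is.numeric, ~ na_if(.x, -1))

toy_clean
```

After this step, functions such as mean() need na.rm = TRUE to skip the NA values.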
Summarize sugars: mean, median, and standard deviation, ignoring missing values (na.rm = TRUE)
cereals %>%
summarize(
mean_sugar = mean(sugars, na.rm = TRUE),
median_sugar = median(sugars, na.rm = TRUE),
sd_sugar = sd(sugars, na.rm = TRUE)
)
## # A tibble: 1 x 3
## mean_sugar median_sugar sd_sugar
## <dbl> <dbl> <dbl>
## 1 7.03 7 4.38
make new variable: cal_per_cup
cereals <- cereals %>%
mutate(cal_per_cup = calories/cups)
Basic boxplot with the default theme
cereals %>%
  ggplot(aes(x = mfr, y = cal_per_cup)) +
  geom_boxplot()
make it pretty
cereals %>%
ggplot(aes(x = mfr , y = cal_per_cup, fill = mfr)) +
geom_boxplot()+
theme_classic()
FIND ONLINE:
Scales: https://ggplot2-book.org/scale-position.html
Themes: https://ggplot2-book.org/polishing.html
Add themes
coord_flip() swaps the x and y axes, so the boxplots switch between vertical and horizontal
cereals %>%
ggplot(aes(x = mfr , y = cal_per_cup, fill = mfr)) +
geom_boxplot()+
theme_classic()+
coord_flip()
Top 5 geometries: most visualizations are built from these geoms
#1 geom_bar() counts the rows in each category of the x aesthetic, so it only needs x
cereals %>%
ggplot(aes(x = mfr)) +
geom_bar()
add color
cereals %>%
ggplot(aes(x = mfr, fill = mfr)) +
geom_bar()
Another variable: calories per cup
Find the mean of cal_per_cup for each manufacturer
First compute it as a summary data frame
cereals %>%
group_by(mfr) %>%
summarize(avg_cal = mean(cal_per_cup))
## # A tibble: 7 x 2
## mfr avg_cal
## <fct> <dbl>
## 1 A 100
## 2 G 138.
## 3 K 145.
## 4 N 125.
## 5 P 195.
## 6 Q 125.
## 7 R 134.
Switch from geom_bar() to geom_col()
To put the average cal_per_cup on the bar plot we need y = avg_cal. geom_bar() does not accept a y aesthetic (it counts rows itself); only geom_col() does.
Pipe the summary into ggplot so it can plot it
cereals %>%
group_by(mfr) %>%
summarize(avg_cal = mean(cal_per_cup)) %>%
ggplot(aes(x = mfr, fill = mfr, y = avg_cal))+
geom_col()
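The difference between the two geoms can be sketched on a tiny made-up data set (the `toy` tibble below is hypothetical): geom_bar() counts rows per category on its own, while geom_col() draws a y value that was computed beforehand. Counting first and then using geom_col() should give the same bar heights.

```r
library(dplyr)
library(ggplot2)

# Hypothetical toy data: one row per cereal
toy <- tibble(mfr = c("G", "G", "K", "K", "K", "N"))

# geom_bar() counts the rows in each category itself
p_bar <- ggplot(toy, aes(x = mfr)) +
  geom_bar()

# geom_col() plots a y value that already exists in the data
p_col <- toy %>%
  count(mfr) %>%
  ggplot(aes(x = mfr, y = n)) +
  geom_col()

# Extract the bar heights that each plot would draw
bar_heights <- layer_data(p_bar)$y
col_heights <- layer_data(p_col)$y
```

Both `bar_heights` and `col_heights` hold the counts for G, K, and N, so the two plots look the same.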
How to use bar plots to compare categorical variables
1. Compare manufacturer and hot or cold cereal: put manufacturer on x and fill by type to see whether each cereal is hot or cold.
Both variables are categorical and we are not supplying a y, so we use geom_bar() to see whether the 2 categorical variables depend on each other. Here it shows which brands make hot cereals.
cereals %>%
ggplot(aes(x = mfr, fill = type)) +
geom_bar()
Show hot and cold cereals as proportions: position = "fill" stretches every bar to the same height, so each bar shows the share of each type
cereals %>%
ggplot(aes(x = mfr, fill = type)) +
geom_bar(position = "fill")
2. Box plot: use fill to compare a second category. With geom_boxplot(), compare mfr and calories per cup, filled by type.
cereals %>%
ggplot(aes(x = mfr, y = cal_per_cup, fill = type)) +
geom_boxplot()
3. Histogram: distribution of a single numeric variable, so only x is needed
cereals %>%
ggplot(aes(x = cal_per_cup)) +
geom_histogram()
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
Change the number of bins (the argument is bins, not bin):
cereals %>%
  ggplot(aes(x = cal_per_cup)) +
  geom_histogram(bins = 30)
Fewer bins:
cereals %>%
  ggplot(aes(x = cal_per_cup)) +
  geom_histogram(bins = 3)
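The stat_bin() message suggests picking a binwidth. As a small sketch of the two options, using made-up values in place of cal_per_cup (the `toy` data frame is hypothetical): bins sets how many bins there are, while binwidth sets how wide each bin is in the units of x.

```r
library(ggplot2)

# Hypothetical toy data standing in for cal_per_cup
toy <- data.frame(
  cal_per_cup = c(70, 90, 100, 110, 110, 120, 150, 200, 250, 440)
)

# bins sets the number of bins
p_bins <- ggplot(toy, aes(x = cal_per_cup)) +
  geom_histogram(bins = 5)

# binwidth sets the width of each bin in the units of x (here, 50 calories)
p_width <- ggplot(toy, aes(x = cal_per_cup)) +
  geom_histogram(binwidth = 50)
```

Supplying either argument also silences the "Pick better value with `binwidth`" message.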
Density: geom_density() shows the same distribution as a smoothed curve instead of bars
cereals %>%
ggplot(aes(x = cal_per_cup, fill = mfr)) +
geom_density()
## Warning: Groups with fewer than two data points have been dropped.
## Warning in max(ids, na.rm = TRUE): no non-missing arguments to max; returning -
## Inf
alpha: how see-through do I want it to be? (0 = fully transparent, 1 = fully opaque)
cereals %>%
ggplot(aes(x = cal_per_cup, fill = mfr)) +
geom_density(alpha = 0.5)
## Warning: Groups with fewer than two data points have been dropped.
## Warning in max(ids, na.rm = TRUE): no non-missing arguments to max; returning -
## Inf
#4 and #5 scatterplots/line plots
For points and lines, use color instead of fill (in ggplot, color applies to 1-dimensional marks such as points and lines; fill applies to 2-dimensional areas such as bars and boxes)
cereals %>%
ggplot(aes(x = sugars, y = calories)) +
geom_point()
## Warning: Removed 1 rows containing missing values (geom_point).
Use color (not fill) to change the dot color, because points are 1D
cereals %>%
ggplot(aes(x = sugars, y = calories, color = mfr)) +
geom_point()
## Warning: Removed 1 rows containing missing values (geom_point).
The shape of the points can be mapped to manufacturer
cereals %>%
ggplot(aes(x = sugars, y = calories, shape = mfr)) +
geom_point()
## Warning: The shape palette can deal with a maximum of 6 discrete values because
## more than 6 becomes difficult to discriminate; you have 7. Consider
## specifying shapes manually if you must have them.
## Warning: Removed 9 rows containing missing values (geom_point).
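The warning above suggests specifying shapes manually when there are more than 6 levels. A minimal sketch of that, using a made-up data frame with 7 manufacturer codes (the `toy` values and the choice of shape codes 0 to 6 are assumptions for illustration):

```r
library(ggplot2)

# Hypothetical toy data with 7 manufacturer codes
toy <- data.frame(
  sugars   = c(6, 8, 5, 0, 8, 10, 14),
  calories = c(70, 120, 70, 50, 110, 110, 110),
  mfr      = c("A", "G", "K", "N", "P", "Q", "R")
)

# Give every level its own shape code so that no points are dropped
p_shapes <- ggplot(toy, aes(x = sugars, y = calories, shape = mfr)) +
  geom_point(size = 3) +
  scale_shape_manual(values = c(0, 1, 2, 3, 4, 5, 6))
```

Seven shapes are still hard to tell apart, so color is often the more readable choice for this many categories.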
Shape and color together:
cereals %>%
ggplot(aes(x = sugars, y = calories, color = mfr, shape = type)) +
geom_point()
## Warning: Removed 1 rows containing missing values (geom_point).
Change the size of every point in the graph: set size outside aes()
cereals %>%
ggplot(aes(x = sugars, y = calories, color = mfr, shape = type )) +
geom_point(size = 2)
## Warning: Removed 1 rows containing missing values (geom_point).
Map size to a variable inside aes(), for example cups, to see which cereals have larger values of cups
cereals %>%
ggplot(aes(x = sugars, y = calories, color = mfr, shape = type, size = cups)) +
geom_point()
## Warning: Removed 1 rows containing missing values (geom_point).
The data look like they are rounded to whole numbers, so points sit on top of each other.
geom_jitter() randomly shakes each point a little, which makes the plot easier to read when values are rounded or exactly equal.
cereals %>%
ggplot(aes(x = sugars, y = calories)) +
geom_jitter()
## Warning: Removed 1 rows containing missing values (geom_point).
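How much the points are shaken can be controlled. The sketch below uses a made-up data frame (the `toy` values are hypothetical) and shows two options: width and height limit the jitter in each direction, and position_jitter(seed = ...) makes the random shake reproducible between runs.

```r
library(ggplot2)

# Hypothetical toy data with repeated (overlapping) points
toy <- data.frame(
  sugars   = c(3, 3, 3, 6, 6, 10),
  calories = c(110, 110, 110, 100, 100, 120)
)

# Jitter only horizontally, by at most 0.3 units; calories stay exact
p_jitter <- ggplot(toy, aes(x = sugars, y = calories)) +
  geom_jitter(width = 0.3, height = 0, alpha = 0.6)

# Same idea, but with a fixed seed so the plot looks the same every time
p_seeded <- ggplot(toy, aes(x = sugars, y = calories)) +
  geom_point(position = position_jitter(width = 0.3, height = 0, seed = 1))
```

Keeping the jitter small matters: too much noise can suggest a spread that is not in the data.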
ggplot cheat sheet: https://www.rstudio.com/wp-content/uploads/2015/03/ggplot2-cheatsheet.pdf
Multiple plots in one figure: save two plots as a and b
a <- cereals %>%
ggplot(aes(x = sugars, y = calories))+
geom_point()
b <- cereals %>%
ggplot(aes(x = sugars, y = calories))+
geom_jitter()
library(patchwork)
Compare a and b: with patchwork, a / b stacks a on top of b
a/b
## Warning: Removed 1 rows containing missing values (geom_point).
## Warning: Removed 1 rows containing missing values (geom_point).
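patchwork has more than one layout operator. A short sketch with made-up data (the `toy` data frame is hypothetical): / stacks plots vertically and | puts them side by side.

```r
library(ggplot2)
library(patchwork)

# Hypothetical toy data
toy <- data.frame(sugars = c(3, 6, 10), calories = c(110, 100, 120))

a <- ggplot(toy, aes(x = sugars, y = calories)) +
  geom_point()
b <- ggplot(toy, aes(x = sugars, y = calories)) +
  geom_jitter()

# a on top of b
stacked <- a / b

# a to the left of b
side_by_side <- a | b
```

Printing `stacked` or `side_by_side` draws the combined figure.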
facet_wrap(): one panel per level of a categorical variable (categorical variables only)
Sugars vs calories, with one panel per manufacturer
cereals %>%
ggplot(aes(x = sugars, y = calories))+
geom_jitter()+
facet_wrap(~mfr)
## Warning: Removed 1 rows containing missing values (geom_point).
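The panel layout can be adjusted too. A minimal sketch on made-up data (the `toy` data frame is hypothetical), using the ncol argument of facet_wrap() to choose how many columns of panels to draw:

```r
library(ggplot2)

# Hypothetical toy data with 3 manufacturers
toy <- data.frame(
  sugars   = c(3, 6, 10, 5, 8, 12),
  calories = c(110, 100, 120, 70, 110, 150),
  mfr      = c("G", "G", "K", "K", "N", "N")
)

# One panel per manufacturer, arranged in 2 columns
p_facets <- ggplot(toy, aes(x = sugars, y = calories)) +
  geom_point() +
  facet_wrap(~ mfr, ncol = 2)
```

All panels share the same axes by default, which makes the manufacturers directly comparable.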
templates in R Markdown:
What did you learn about cereals? Write a few sentences summarizing your findings, knit your document, and admire your handiwork!