1. Load the tidyverse

library(tidyverse)
data(mpg)
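Before plotting, it helps to check what `mpg` contains. Here is a quick look using dplyr's `glimpse()`, which comes with the tidyverse. The selected columns are the ones the plots below use.

```r
library(tidyverse)

# glimpse() prints one line per column: its name, type, and first few values
glimpse(mpg)

# the columns mapped to aesthetics in the plots that follow
mpg %>%
  select(displ, hwy, drv, cyl, class) %>%
  head()
```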

2. ggplot2

ggplot(mpg, aes(x = displ, y = hwy)) +
  geom_point()
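Every ggplot follows the template above: `ggplot()` holds the data and the aesthetic mappings, and each `+ geom_*()` adds a layer that inherits those mappings. The sketch below builds the plot in two steps and uses ggplot2's `layer_data()` to show the data frame computed for the point layer, which has one row per car.

```r
library(tidyverse)

# ggplot() stores the data and the mappings; no geom has been added yet
base <- ggplot(mpg, aes(x = displ, y = hwy))

# adding a geom creates a layer that inherits aes(x = displ, y = hwy)
p <- base + geom_point()

# the data ggplot2 computed for layer 1: one row per observation
head(layer_data(p, 1))

print(p)  # the same scatter plot as above
```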

2.1. Facets

`facet_wrap()` draws one panel per class and wraps the panels into rows:

ggplot(mpg, aes(x = displ, y = hwy)) +
  geom_point() +
  facet_wrap(~ class)
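`facet_wrap()` picks the number of rows and columns itself, but you can set it with `ncol` (or `nrow`). A small sketch that forces two columns and confirms there is one panel per class:

```r
library(tidyverse)

p <- ggplot(mpg, aes(x = displ, y = hwy)) +
  geom_point() +
  facet_wrap(~ class, ncol = 2)  # two columns of panels

# PANEL in the layer data records which panel each point lands in
length(unique(layer_data(p, 1)$PANEL))  # one panel per class
n_distinct(mpg$class)

print(p)
```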

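`facet_grid(. ~ class)` puts the variable to the right of `~`, so the panels form columns in a single row. The panels line up along one y axis, which helps comparisons, but with many categories each panel gets narrow:

ggplot(mpg, aes(x = displ, y = hwy)) +
  geom_point() +
  facet_grid(. ~ class)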

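`facet_grid(class ~ .)` puts the variable to the left of `~`, so the panels are stacked as rows in a single column:

ggplot(mpg, aes(x = displ, y = hwy)) +
  geom_point() +
  facet_grid(class ~ .)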
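`facet_grid()` also accepts the named arguments `rows` and `cols` with `vars()`, which some people find easier to read than the formula. The sketch builds both versions of each layout and checks that they assign points to the same panels.

```r
library(tidyverse)

base <- ggplot(mpg, aes(x = displ, y = hwy)) + geom_point()

# formula interface: right of ~ gives columns, left of ~ gives rows
p_cols_formula <- base + facet_grid(. ~ class)
p_rows_formula <- base + facet_grid(class ~ .)

# the same layouts with named arguments
p_cols_vars <- base + facet_grid(cols = vars(class))
p_rows_vars <- base + facet_grid(rows = vars(class))
```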

Faceting on two variables, with `drv` as rows and `cyl` as columns. An empty panel means no cars in the data have that combination:

ggplot(mpg, aes(x = displ, y = hwy)) +
  geom_point() +
  facet_grid(drv ~ cyl)
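You can list the empty panels directly by counting cars per `drv`/`cyl` combination. In this sketch, tidyr's `complete()` adds the combinations that never occur with a count of 0, and those correspond to the empty panels.

```r
library(tidyverse)

combos <- mpg %>%
  count(drv, cyl) %>%
  complete(drv, cyl, fill = list(n = 0L))  # add missing combinations as 0

# the combinations with no cars, i.e. the empty panels
combos %>% filter(n == 0)
```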

2.2. geom_smooth()

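`geom_smooth()` draws a smoothed trend line. The gray band around it is a confidence interval for the fit (95% by default), computed from the standard error:

ggplot(mpg, aes(x = displ, y = hwy)) +
  geom_smooth()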
## `geom_smooth()` using method = 'loess' and formula 'y ~ x'
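The width of the band is set by the `level` argument of `geom_smooth()`. This sketch compares the default 95% band with a 99% band by reading the `ymin`/`ymax` columns from `layer_data()`. The 99% band comes out wider.

```r
library(tidyverse)

p95 <- ggplot(mpg, aes(x = displ, y = hwy)) + geom_smooth()              # level = 0.95
p99 <- ggplot(mpg, aes(x = displ, y = hwy)) + geom_smooth(level = 0.99)

# ymin/ymax are the pointwise band limits around the fitted curve
band95 <- layer_data(p95, 1)
band99 <- layer_data(p99, 1)

# average band width for each level
mean(band95$ymax - band95$ymin)
mean(band99$ymax - band99$ymin)
```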

Removing the confidence band with `se = FALSE`:

ggplot(mpg, aes(x = displ, y = hwy)) +
  geom_smooth(se = FALSE)
## `geom_smooth()` using method = 'loess' and formula 'y ~ x'

Layering multiple geoms on one plot. Both layers inherit the mappings set in `ggplot()`:

ggplot(mpg, aes(x = displ, y = hwy)) +
  geom_point() +
  geom_smooth(se = FALSE)
## `geom_smooth()` using method = 'loess' and formula 'y ~ x'

Fitting a linear regression line instead of the default loess curve with `method = "lm"`:

ggplot(mpg, aes(x = displ, y = hwy)) +
  geom_point() +
  geom_smooth(method = "lm", se = FALSE)
## `geom_smooth()` using formula 'y ~ x'
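With `method = "lm"` the plotted line is an ordinary `lm(hwy ~ displ)` fit. This sketch fits the model directly and checks that its predictions match the y values ggplot2 computed for the smooth layer (layer 2).

```r
library(tidyverse)

p <- ggplot(mpg, aes(x = displ, y = hwy)) +
  geom_point() +
  geom_smooth(method = "lm", se = FALSE)

# the same model, fitted by hand
fit <- lm(hwy ~ displ, data = mpg)
coef(fit)  # intercept and slope of the plotted line

# layer 2 holds the line; compare its y values with lm's predictions at the same x
line <- layer_data(p, 2)
all.equal(line$y, unname(predict(fit, newdata = data.frame(displ = line$x))))
```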

2.2.1. Playing with color

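Grouping with color: mapping `color = drv` in `ggplot()` applies to every layer, so `geom_smooth()` fits a separate line for each of the 3 drive types:

ggplot(mpg, aes(x = displ, y = hwy, color = drv)) +
  geom_point() +
  geom_smooth(method = "lm", se = FALSE)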
## `geom_smooth()` using formula 'y ~ x'
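Each colored line is its own regression on one drive type's cars. This sketch fits those three models by hand with base R's `split()` and `lm()`, then checks that the slopes match the lines ggplot2 drew.

```r
library(tidyverse)

p <- ggplot(mpg, aes(x = displ, y = hwy, color = drv)) +
  geom_point() +
  geom_smooth(method = "lm", se = FALSE)

# split the rows by drv and fit one lm() per piece
coefs <- t(sapply(split(mpg, mpg$drv), function(d) coef(lm(hwy ~ displ, data = d))))
coefs  # one row per drv: intercept and slope
```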

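Color on the points only: mapping `color = drv` inside `geom_point()` colors the points but leaves `geom_smooth()` ungrouped, so it fits just 1 line:

ggplot(mpg, aes(x = displ, y = hwy)) +
  geom_point(aes(color = drv)) +
  geom_smooth(method = "lm", se = FALSE)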
## `geom_smooth()` using formula 'y ~ x'

Grouping without color: `group = drv` still splits the data, giving 3 lines drawn in a single color:

ggplot(mpg, aes(x = displ, y = hwy, group = drv)) +
  geom_point() +
  geom_smooth(method = "lm", se = FALSE)
## `geom_smooth()` using formula 'y ~ x'
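The three variants above differ only in where the grouping mapping lives. This sketch counts the separate groups in each plot's smooth layer. `n_lines()` is a small illustrative helper defined here, not a ggplot2 function.

```r
library(tidyverse)

# illustrative helper: number of distinct groups in layer 2 (the smooth)
n_lines <- function(p) length(unique(layer_data(p, 2)$group))

p_color_global <- ggplot(mpg, aes(x = displ, y = hwy, color = drv)) +
  geom_point() + geom_smooth(method = "lm", se = FALSE)
p_color_points <- ggplot(mpg, aes(x = displ, y = hwy)) +
  geom_point(aes(color = drv)) + geom_smooth(method = "lm", se = FALSE)
p_group <- ggplot(mpg, aes(x = displ, y = hwy, group = drv)) +
  geom_point() + geom_smooth(method = "lm", se = FALSE)

lines_per_plot <- sapply(
  list(color_global = p_color_global, color_points = p_color_points, group = p_group),
  n_lines
)
lines_per_plot  # 3, 1 and 3 lines respectively
```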

2.3. Boxplots

A simple boxplot of highway mileage:

ggplot(mpg, aes(y = hwy)) +
  geom_boxplot()
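By default the whiskers reach the most extreme values within 1.5 × IQR of the box (the `coef` argument of `geom_boxplot()`), and anything beyond is drawn as a separate outlier point. This sketch applies that rule by hand and compares the result with the outliers ggplot2 stores in `layer_data()`.

```r
library(tidyverse)

p <- ggplot(mpg, aes(y = hwy)) + geom_boxplot()

# quartiles and interquartile range of hwy
q   <- quantile(mpg$hwy, c(0.25, 0.75))
iqr <- q[[2]] - q[[1]]

# values more than 1.5 * IQR below Q1 or above Q3
manual_outliers <- mpg$hwy[mpg$hwy < q[[1]] - 1.5 * iqr | mpg$hwy > q[[2]] + 1.5 * iqr]
sort(manual_outliers)

# the outliers ggplot2 computed for the single box
layer_data(p, 1)$outliers[[1]]
```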

Colorful side-by-side boxplots, filled by drive type, are good for comparing distributions across groups:

ggplot(mpg, aes(y = hwy, fill = drv)) +
  geom_boxplot()
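Each box is drawn from the group's quartiles: the bottom edge is Q1, the middle line is the median, and the top edge is Q3. This sketch computes those quartiles with dplyr and lines them up against ggplot2's `lower`, `middle` and `upper` values. Mapping `x = drv` as well would also label each group on the x axis.

```r
library(tidyverse)

p <- ggplot(mpg, aes(y = hwy, fill = drv)) + geom_boxplot()

# quartiles per drive type, computed by hand
quartiles <- mpg %>%
  group_by(drv) %>%
  summarise(
    q1     = unname(quantile(hwy, 0.25)),
    median = median(hwy),
    q3     = unname(quantile(hwy, 0.75))
  )
quartiles

# ggplot2's values for the same boxes
boxes <- layer_data(p, 1)
boxes[, c("fill", "lower", "middle", "upper")]
```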

2.4. Bar Graphs

Loading a new dataset, `diamonds` (also from ggplot2), and making a simple bar graph. `geom_bar()` counts the diamonds in each cut:

data("diamonds")

ggplot(diamonds, aes(x = cut)) +
  geom_bar()
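`geom_bar()` uses `stat = "count"`, so it tallies rows per category before drawing. This sketch produces the same table with dplyr's `count()`. If your data already holds the counts, `geom_col()` draws them as they are.

```r
library(tidyverse)

p <- ggplot(diamonds, aes(x = cut)) + geom_bar()

# the same tallies, computed with dplyr
cut_counts <- count(diamonds, cut)
cut_counts

# the bar heights ggplot2 computed
layer_data(p, 1)[, c("x", "count")]

# pre-computed counts: geom_col() uses y as given instead of counting
ggplot(cut_counts, aes(x = cut, y = n)) + geom_col()
```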

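Adding color by mapping `cut` to both `x` and `fill`. This encodes the same variable twice, so the color carries no new information, but it makes the bars easier to tell apart:

ggplot(diamonds, aes(x = cut, fill = cut)) +
  geom_bar()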
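Because `fill` repeats `x`, the legend only repeats the axis labels. A minimal sketch that hides it with the layer's `show.legend = FALSE` argument:

```r
library(tidyverse)

# same plot, without the redundant fill legend
p <- ggplot(diamonds, aes(x = cut, fill = cut)) +
  geom_bar(show.legend = FALSE)

print(p)
```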

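A stacked bar graph, filled by `color`. This is bad for comparison: only the bottom segment of each bar starts from a common baseline, so the other color groups are hard to compare across cuts:

ggplot(diamonds, aes(x = cut, fill = color)) +
  geom_bar()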

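Dodged bars (`position = "dodge"`) place the color groups side by side as mini bar graphs, each starting from zero, which is better for comparison:

ggplot(diamonds, aes(x = cut, fill = color)) +
  geom_bar(position = "dodge")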
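The stacked and dodged versions draw the same numbers: one bar segment per `cut`/`color` combination. This sketch builds that table with `count(diamonds, cut, color)` and checks that it has one row per dodged bar.

```r
library(tidyverse)

p_dodge <- ggplot(diamonds, aes(x = cut, fill = color)) +
  geom_bar(position = "dodge")

# one row per cut/color combination
counts <- count(diamonds, cut, color)
counts

# one row per plotted bar
d <- layer_data(p_dodge, 1)
nrow(d) == nrow(counts)
```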

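`position = "fill"` also stacks the bars but scales each one to the same height, showing proportions instead of counts. This is useful for comparing the color mix across cuts:

ggplot(diamonds, aes(x = cut, fill = color)) +
  geom_bar(position = "fill")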
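The proportions behind a `position = "fill"` bar are each group's count divided by the total for that cut. This sketch computes them with dplyr and checks that the segments of every plotted bar stack to exactly 1.

```r
library(tidyverse)

p_fill <- ggplot(diamonds, aes(x = cut, fill = color)) +
  geom_bar(position = "fill")

# count per cut/color, then divide by the total for each cut
props <- diamonds %>%
  count(cut, color) %>%
  group_by(cut) %>%
  mutate(prop = n / sum(n)) %>%
  ungroup()
props

# segment heights per bar, summed: 1 for every cut
d <- layer_data(p_fill, 1)
tapply(d$ymax - d$ymin, as.numeric(d$x), sum)
```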

3. Exploratory Data Analysis (EDA)

EDA is an iterative cycle in which you:

  1. Generate questions about your data

  2. Search for answers by visualizing, transforming, and modeling your data

  3. Use what you learn to refine your questions and/or generate new questions

Questions to ask yourself:

  1. What type of variation occurs within my variables?

  2. Which values are the most common? Why?

  3. Which values are rare? Why? Does this match your expectations?

  4. Can you see any unusual patterns? What might explain them?

  5. What type of covariation occurs between my variables?
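Several of these questions map onto short pieces of code. The sketch below uses `mpg` and is only one possible starting point. Its choices are illustrative: `binwidth = 2` for the histogram, and boxplots ordered by median with `reorder()` to look at covariation.

```r
library(tidyverse)

# "Which values are the most common?" Count a categorical variable
class_counts <- mpg %>% count(class, sort = TRUE)
class_counts

# "What type of variation occurs within my variables?" Histogram of a continuous one
ggplot(mpg, aes(x = hwy)) +
  geom_histogram(binwidth = 2)

# "What type of covariation occurs between my variables?"
# hwy by class, with classes ordered by their median hwy
ggplot(mpg, aes(x = reorder(class, hwy, FUN = median), y = hwy)) +
  geom_boxplot() +
  labs(x = "class")
```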

Before you start, write down your expectations and preconceived notions; they give you a starting point. Then follow two main tips:

  1. Show the data. Don't over-process it: start with the rawest data possible and then refine it.

  2. Note what surprises you; otherwise you may forget how you reached your conclusions. Use R Markdown to keep that record.