library(ggplot2)  # provides ggplot() and the diamonds dataset
data("diamonds")
str(diamonds)
## tibble [53,940 x 10] (S3: tbl_df/tbl/data.frame)
##  $ carat  : num [1:53940] 0.23 0.21 0.23 0.29 0.31 0.24 0.24 0.26 0.22 0.23 ...
##  $ cut    : Ord.factor w/ 5 levels "Fair"<"Good"<..: 5 4 2 4 2 3 3 3 1 3 ...
##  $ color  : Ord.factor w/ 7 levels "D"<"E"<"F"<"G"<..: 2 2 2 6 7 7 6 5 2 5 ...
##  $ clarity: Ord.factor w/ 8 levels "I1"<"SI2"<"SI1"<..: 2 3 5 4 2 6 7 3 4 5 ...
##  $ depth  : num [1:53940] 61.5 59.8 56.9 62.4 63.3 62.8 62.3 61.9 65.1 59.4 ...
##  $ table  : num [1:53940] 55 61 65 58 58 57 57 55 61 61 ...
##  $ price  : int [1:53940] 326 326 327 334 335 336 336 337 337 338 ...
##  $ x      : num [1:53940] 3.95 3.89 4.05 4.2 4.34 3.94 3.95 4.07 3.87 4 ...
##  $ y      : num [1:53940] 3.98 3.84 4.07 4.23 4.35 3.96 3.98 4.11 3.78 4.05 ...
##  $ z      : num [1:53940] 2.43 2.31 2.31 2.63 2.75 2.48 2.47 2.53 2.49 2.39 ...

Basic Scatterplot

ggplot(diamonds, aes(x= carat, y= price))+
  geom_point()

Facets

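facet_grid() takes a formula of the form rows ~ columns. A period (.) means that dimension is not faceted, and the variable after the ~ is laid out across the columns. Example: facet_grid(. ~ cut) gives one column of panels per cut. Below, color ~ cut puts color on the rows and cut on the columns.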

ggplot(diamonds, aes(x= carat, y= price))+
  geom_point()+
  facet_grid(color~cut)
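
To see what the period does on each side of the formula, here is a minimal sketch that facets by cut only, first across columns and then down rows. The object names p_cols and p_rows are just illustrative and are not part of the original notes.

```r
library(ggplot2)

# cut on the columns only: a single row of five panels
p_cols <- ggplot(diamonds, aes(x = carat, y = price)) +
  geom_point() +
  facet_grid(. ~ cut)

# cut on the rows only: a single column of five panels
p_rows <- ggplot(diamonds, aes(x = carat, y = price)) +
  geom_point() +
  facet_grid(cut ~ .)

p_cols
p_rows
```

Putting the variable on the columns tends to make it easier to compare prices (the shared y axis), while putting it on the rows lines the panels up along the shared carat axis.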

Smooth

method = "lm" fits a linear model. se controls the standard-error band around the fitted line; se = FALSE hides it. An aesthetic set in the main statement, ggplot(..., aes(...)), applies to every layer of the graph. An aesthetic mapped to a variable inside a geom applies only to that geometry, and it must be wrapped in aes(), for example geom_point(aes(color = cut)).

ggplot(diamonds, aes(x= carat, y= price, color = cut)) +
  geom_point()+
  geom_smooth(method = "lm", se = FALSE)
## `geom_smooth()` using formula 'y ~ x'
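
For contrast, here is a sketch of the layer-level version described above: color is mapped only inside geom_point(), so the points are colored by cut but geom_smooth() does not inherit the grouping and should draw a single regression line for all diamonds. The name p_layer is illustrative, and formula = y ~ x is spelled out only to silence the message shown above.

```r
library(ggplot2)

# color is local to the point layer, so the smooth layer sees one group
p_layer <- ggplot(diamonds, aes(x = carat, y = price)) +
  geom_point(aes(color = cut)) +
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE)

p_layer
```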

Box Plot

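You can also use a violin plot (geom_violin()) instead.

The outliers look like a bold line beyond the whisker, but they are individual points that overlap because there are so many of them.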

ggplot(diamonds, aes(y= price)) +
  geom_boxplot()
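
A possible violin version of the same plot is sketched below. geom_violin() expects both an x and a y aesthetic, so a constant label is used for x here; the label text and the name p_violin are assumptions made for illustration.

```r
library(ggplot2)

# a single violin for all prices; the constant x value is only a label
p_violin <- ggplot(diamonds, aes(x = "all diamonds", y = price)) +
  geom_violin()

p_violin
```

The violin shows the shape of the price distribution (where the prices pile up), which the box plot summarizes with only the quartiles and the outlier points.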

Side by Side

Compare prices across the different cuts. Use fill to color the inside of the box plots, so that each cut has a different color.

ggplot(diamonds, aes(x= cut, y= price, fill= cut)) +
  geom_boxplot()

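Price variability increases moving from color D to color J. According to the ggplot2 documentation, D is the best color grade and J the worst, so the spread grows as the color grade gets worse.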

ggplot(diamonds, aes(x= color, y= price, fill= color)) +
  geom_boxplot(outlier.color="red", outlier.alpha=0.2, outlier.shape=1, outlier.size =.2)

Side-by-side box plots are a useful visual check before an ANOVA (a test for differences in group means).
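
As a rough sketch of the ANOVA that the box plot hints at, base R's aov() can test whether mean price differs by color. This is only one way to set it up, and it does not check the ANOVA assumptions (the box plots above suggest the spread is not equal across colors).

```r
# diamonds ships with ggplot2
data("diamonds", package = "ggplot2")

# one-way ANOVA: does mean price differ between the seven color grades?
fit <- aov(price ~ color, data = diamonds)
summary(fit)
```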

Bars (For categorical data)

ggplot(diamonds, aes(x= cut, fill = cut))+
  geom_bar()

Stacked bars are the default when fill is mapped to a second variable.

ggplot(diamonds, aes(x= cut, fill = clarity))+
  geom_bar()

Use position = "dodge" for a side-by-side bar graph.

ggplot(diamonds, aes(x= cut, fill = clarity))+
  geom_bar(position = "dodge")

Fill

position = "fill" stretches every bar to the same height, so it shows how clarity is distributed within each cut. We are comparing proportions instead of counts.

ggplot(diamonds, aes(x= cut, fill = clarity))+
  geom_bar(position = "fill")
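
The proportions drawn by position = "fill" can also be computed directly, which helps when reading exact values off the chart is hard. A minimal sketch with base R tables follows; counts and props are illustrative names.

```r
# diamonds ships with ggplot2
data("diamonds", package = "ggplot2")

# counts of each clarity within each cut (rows = cut, columns = clarity)
counts <- table(diamonds$cut, diamonds$clarity)

# margin = 1 divides each row by its total, so every cut sums to 1
props <- prop.table(counts, margin = 1)
round(props, 3)
```

Each row of props corresponds to one bar in the filled chart, and each value to the height of one colored segment.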