Installing and loading the necessary packages in RStudio for Ch. 1 of R for Data Science (Wickham et al., 2023):

Install tidyverse package (pg. 1-2)

Install a package ONLY the first time you use it! FYI - installing happens once, but you have to load packages with library() every time you start a new R session
install.packages("tidyverse")
## 
## The downloaded binary packages are in
##  /var/folders/8n/yt7q563d0kq2_9rbr216z8mc0000gn/T//Rtmp0YgZtr/downloaded_packages
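
One way to stick to the install-once rule is to check which packages are missing before installing anything. This is only a sketch: the helper name missing_packages() is made up for these notes and is not from the book, while requireNamespace() and install.packages() are base R.

```r
# Hypothetical helper (not from the book): return the packages that are
# not installed yet, so install.packages() only runs when it is needed.
missing_packages <- function(pkgs) {
  installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!installed]
}

to_install <- missing_packages(c("tidyverse", "palmerpenguins", "ggthemes"))

# Guarded with interactive() so nothing is downloaded when the file is
# knitted or run as a script.
if (interactive() && length(to_install) > 0) {
  install.packages(to_install)
}
```

If everything is already installed, to_install is empty and nothing happens.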

Load tidyverse (pg. 1-2)

library(tidyverse)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr     1.1.4     ✔ readr     2.1.5
## ✔ forcats   1.0.0     ✔ stringr   1.5.1
## ✔ ggplot2   3.5.1     ✔ tibble    3.2.1
## ✔ lubridate 1.9.3     ✔ tidyr     1.3.1
## ✔ purrr     1.0.2     
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
FYI:
The white boxes whose lines start with "##" are output (results, messages, warnings, errors, etc.).
The gray boxes are R code.


The Conflicts section says that dplyr's filter() and lag() now mask (take the place of) the stats functions with the same names. I am not worried about this right now because those functions aren't needed for Ch. 1 (I am interested in ggplot2 right now).
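
To see what masking means in practice, here is a small sketch that calls each version of filter() by its full package::function name. The data frame and variable names are made up for illustration.

```r
library(dplyr)

scores <- data.frame(x = 1:5)

# dplyr's filter() keeps the rows that match a condition.
kept <- dplyr::filter(scores, x > 3)
kept

# The masked stats version does something unrelated (a moving average
# here) and is still reachable through its package prefix.
moving_avg <- stats::filter(1:5, rep(1 / 3, 3))
moving_avg
```

After library(tidyverse), a plain filter() means the dplyr one; the stats one is not gone, it just needs the stats:: prefix.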

Install palmerpenguins & ggthemes packages to follow along (pg. 2)

Install a package ONLY the first time you use it! FYI - installing happens once, but you have to load packages with library() every time you start a new R session
install.packages("palmerpenguins")
## 
## The downloaded binary packages are in
##  /var/folders/8n/yt7q563d0kq2_9rbr216z8mc0000gn/T//Rtmp0YgZtr/downloaded_packages
install.packages("ggthemes")
## 
## The downloaded binary packages are in
##  /var/folders/8n/yt7q563d0kq2_9rbr216z8mc0000gn/T//Rtmp0YgZtr/downloaded_packages

Load palmerpenguins & ggthemes packages to follow along (pg. 2)

library(palmerpenguins)
library(ggthemes)

Open interactive data viewer of palmerpenguins data (pg. 3)

This opens the data in a separate tab in RStudio. If you only want a preview of the data, type "penguins" in the console and press Enter, and R will print the first few rows
view(penguins)
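
Another quick way to look at the data without opening a tab is glimpse(), which lists every column with its type and first few values. A minimal sketch, assuming dplyr and palmerpenguins are installed:

```r
library(dplyr)           # for glimpse(); already attached by library(tidyverse)
library(palmerpenguins)

# One line per column: name, type, and the first few values.
glimpse(penguins)

# Number of rows, then number of columns.
dim(penguins)
```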

Creating a ggplot (pg. 5-7)

Ultimate goal with penguin data in Ch. 1: create a visual representation of the relationship between body mass and flipper length, with consideration for the penguin species present in the data set (pg. 4).

1. Create a plot object or “blank canvas” that you can add layers to

ggplot(data = penguins)

2. Tell ggplot how data will be represented (i.e., define x & y axis)

ggplot(
  data = penguins,
  mapping = aes(x = flipper_length_mm, y = body_mass_g)
)

3. Define the geom (in this case, create a scatterplot)

ggplot(
  data = penguins,
  mapping = aes(x = flipper_length_mm, y = body_mass_g)
) +
  geom_point()
## Warning: Removed 2 rows containing missing values or values outside the scale range
## (`geom_point()`).

STOP!! Why the warning?!

Two penguins have missing (NA) values for flipper length and/or body mass, so ggplot2 has no way to place them on the plot and drops them. You will see this warning pop up as we layer the code; it is safe to ignore here.
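
A quick way to check where the "2 rows" in the warning come from is to count the penguins with a missing value in either plotted variable. This is a sketch; the variable names n_missing and p_quiet are just for illustration.

```r
library(ggplot2)
library(palmerpenguins)

# Rows where either plotted variable is NA; these are the rows dropped.
n_missing <- sum(is.na(penguins$flipper_length_mm) | is.na(penguins$body_mass_g))
n_missing

# na.rm = TRUE drops the same rows but without printing the warning.
p_quiet <- ggplot(penguins, aes(x = flipper_length_mm, y = body_mass_g)) +
  geom_point(na.rm = TRUE)
p_quiet
```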

Alright, let's carry on!

Adding Aesthetics & Layers (pg. 8-12)

4. Differentiate species by color in the scatterplot

ggplot(
  data = penguins,
  mapping = aes(x = flipper_length_mm, y = body_mass_g, color = species)
) +
  geom_point()
## Warning: Removed 2 rows containing missing values or values outside the scale range
## (`geom_point()`).

5. Differentiate species by color in the scatterplot + add trend lines for each species

ggplot(
  data = penguins,
  mapping = aes(x = flipper_length_mm, y = body_mass_g, color = species)
) +
  geom_point() +
  geom_smooth(method = "lm")
## `geom_smooth()` using formula = 'y ~ x'
## Warning: Removed 2 rows containing non-finite outside the scale range
## (`stat_smooth()`).
## Warning: Removed 2 rows containing missing values or values outside the scale range
## (`geom_point()`).

6. Differentiate species by color in the scatterplot + one general trend line

ggplot(
  data = penguins,
  mapping = aes(x = flipper_length_mm, y = body_mass_g)
) +
  geom_point(mapping = aes(color = species)) +
  geom_smooth(method = "lm")
## `geom_smooth()` using formula = 'y ~ x'
## Warning: Removed 2 rows containing non-finite outside the scale range
## (`stat_smooth()`).
## Warning: Removed 2 rows containing missing values or values outside the scale range
## (`geom_point()`).

7. Also map species to point shape (not just color) to make the plot more accessible

ggplot(
  data = penguins,
  mapping = aes(x = flipper_length_mm, y = body_mass_g)
) +
  geom_point(mapping = aes(color = species, shape = species)) +
  geom_smooth(method = "lm")
## `geom_smooth()` using formula = 'y ~ x'
## Warning: Removed 2 rows containing non-finite outside the scale range
## (`stat_smooth()`).
## Warning: Removed 2 rows containing missing values or values outside the scale range
## (`geom_point()`).

8. Add a title/subtitle, label the axes and legend, and switch to a colorblind-safe color scheme

ggplot(
  data = penguins,
  mapping = aes(x = flipper_length_mm, y = body_mass_g)
) +
  geom_point(mapping = aes(color = species, shape = species)) +
  geom_smooth(method = "lm") +
  labs(
    title = "Body Mass and Flipper Length",
    subtitle = "Dimensions for Adelie, Chinstrap, and Gentoo Penguins",
    x = "Flipper Length (mm)",
    y = "Body Mass (g)",
    color = "Species",
    shape = "Species"
  ) +
  scale_color_colorblind()
## `geom_smooth()` using formula = 'y ~ x'
## Warning: Removed 2 rows containing non-finite outside the scale range
## (`stat_smooth()`).
## Warning: Removed 2 rows containing missing values or values outside the scale range
## (`geom_point()`).
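
The steps above build one plot by stacking layers on the "blank canvas". Here is a sketch of the same idea with the plot stored in a variable so layers can be added one at a time. The variable names (canvas, with_points, with_trend) are made up for these notes, and counting layers with length(x$layers) is just one way to peek inside a plot object.

```r
library(ggplot2)
library(palmerpenguins)

# Step 2: the canvas with axes but no layers yet.
canvas <- ggplot(penguins, aes(x = flipper_length_mm, y = body_mass_g))

# Step 7: add the points, colored and shaped by species.
with_points <- canvas +
  geom_point(aes(color = species, shape = species), na.rm = TRUE)

# Then add one overall trend line on top.
with_trend <- with_points +
  geom_smooth(method = "lm", na.rm = TRUE)

# Each "+" adds one layer to the stored plot.
length(canvas$layers)
length(with_points$layers)
length(with_trend$layers)

# Printing the object draws the plot.
with_trend
```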

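The Pipe Operator (pg. 14)

(Compare this code with step 8: the pipe |> passes penguins in as the first argument of ggplot(), so "data = penguins" is no longer written inside ggplot().)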
penguins |>
  ggplot(aes(x = flipper_length_mm, y = body_mass_g)) +
  geom_point(mapping = aes(color = species, shape = species)) +
  geom_smooth(method = "lm") +
  labs(
    title = "Body Mass and Flipper Length",
    subtitle = "Dimensions for Adelie, Chinstrap, and Gentoo Penguins",
    x = "Flipper Length (mm)",
    y = "Body Mass (g)",
    color = "Species",
    shape = "Species"
  ) +
  scale_color_colorblind()
## `geom_smooth()` using formula = 'y ~ x'
## Warning: Removed 2 rows containing non-finite outside the scale range
## (`stat_smooth()`).
## Warning: Removed 2 rows containing missing values or values outside the scale range
## (`geom_point()`).
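
The pipe is easier to see on a tiny example with no plotting at all. A minimal sketch in base R (the native pipe |> needs R 4.1 or later); the variable names are just for illustration.

```r
values <- c(1, 4, 9)

# Nested calls are read from the inside out.
nested <- sum(sqrt(values))

# With the pipe, the same steps are read left to right:
# x |> f() is another way of writing f(x).
piped <- values |> sqrt() |> sum()

identical(nested, piped)
```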

Visualizing Distributions (pg. 14-18)

  Categorical Variable
  
penguins |>
ggplot(aes(x = species)) + geom_bar()

  Categorical Variable: Ordered Levels
penguins |>
ggplot(aes(x = fct_infreq(species))) + geom_bar()
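
To see what fct_infreq() changes, you can compare the order of the factor levels before and after. A small sketch, assuming forcats and palmerpenguins are installed:

```r
library(forcats)         # already attached by library(tidyverse)
library(palmerpenguins)

# Default order of the levels (alphabetical).
levels(penguins$species)

# How many penguins of each species there are.
table(penguins$species)

# fct_infreq() reorders the levels from most to least frequent,
# which is the order the bars are drawn in.
levels(fct_infreq(penguins$species))
```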

  Numerical Variable: Histogram
  
penguins |>
ggplot(aes(x = body_mass_g)) + geom_histogram(binwidth = 200)
## Warning: Removed 2 rows containing non-finite outside the scale range
## (`stat_bin()`).

  Numerical Variable: Histogram - Exploring Bin Width
  
penguins |>
ggplot(aes(x = body_mass_g)) + geom_histogram(binwidth = 20)
## Warning: Removed 2 rows containing non-finite outside the scale range
## (`stat_bin()`).

penguins |>
ggplot(aes(x = body_mass_g)) + geom_histogram(binwidth = 2000)
## Warning: Removed 2 rows containing non-finite outside the scale range
## (`stat_bin()`).
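
A rough way to see why the three bin widths look so different is to divide the range of body mass by each bin width. This is only back-of-the-envelope arithmetic, not the exact rule geom_histogram() uses to place bins, and the variable names are made up for illustration.

```r
library(palmerpenguins)

# Spread of the data: heaviest minus lightest penguin, ignoring NAs.
mass_range <- diff(range(penguins$body_mass_g, na.rm = TRUE))

binwidths <- c(20, 200, 2000)

# Approximate number of bars needed to cover the range at each width.
approx_bins <- ceiling(mass_range / binwidths)
names(approx_bins) <- binwidths
approx_bins
```

A bin width of 20 gives far too many narrow bars, and 2000 lumps almost everything into a couple of bars, which matches what the plots show.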

  Numerical Variable: Density Plot
  
penguins |>
ggplot(aes(x = body_mass_g)) + geom_density()
## Warning: Removed 2 rows containing non-finite outside the scale range
## (`stat_density()`).

Visualizing Relationships (pg. 19-27)

A Numerical and a Categorical Variable

    Boxplot
    
penguins |>
ggplot(aes(x = species, y = body_mass_g)) + geom_boxplot()
## Warning: Removed 2 rows containing non-finite outside the scale range
## (`stat_boxplot()`).

    Density Plot
    
penguins |>
ggplot(aes(x = body_mass_g, color = species)) + geom_density(linewidth = 0.75)
## Warning: Removed 2 rows containing non-finite outside the scale range
## (`stat_density()`).

    Density Plot: Add Transparency to Density Curves
    
penguins |>
ggplot(aes(x = body_mass_g, color = species, fill = species)) + geom_density(alpha = 0.5)
## Warning: Removed 2 rows containing non-finite outside the scale range
## (`stat_density()`).

Two Categorical Variables

    Stacked Bar Plot
    
penguins |>
ggplot(aes(x = island, fill = species)) + geom_bar()

    Relative Frequency Plot (proportions within each island)
    
penguins |>
ggplot(aes(x = island, fill = species)) + geom_bar(position = "fill")
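
The numbers behind the two bar plots can also be checked as tables. A sketch using base R's table() and prop.table(); the variable names counts and props are just for illustration.

```r
library(palmerpenguins)

# Counts of each species on each island (what the stacked bar plot shows).
counts <- table(penguins$island, penguins$species)
counts

# Share of each species within an island (what position = "fill" shows).
# margin = 1 means every row (island) adds up to 1.
props <- prop.table(counts, margin = 1)
round(props, 2)
```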

Two Numerical Variables

  Scatterplot
penguins |>
ggplot(aes(x = flipper_length_mm, y = body_mass_g)) + geom_point()
## Warning: Removed 2 rows containing missing values or values outside the scale range
## (`geom_point()`).

Three or More Variables

  Scatterplot: Add Aesthetics & Layers
penguins |>
ggplot(aes(x = flipper_length_mm, y = body_mass_g)) + geom_point(aes(color = species, shape = island))
## Warning: Removed 2 rows containing missing values or values outside the scale range
## (`geom_point()`).

  Scatterplot: Facets
penguins |>
ggplot(aes(x = flipper_length_mm, y = body_mass_g)) + geom_point(aes(color = species, shape = species)) + facet_wrap(~island)
## Warning: Removed 2 rows containing missing values or values outside the scale range
## (`geom_point()`).

Saving Your Plots (pg. 28)

  Saving plots as an image (.png)
penguins |>
ggplot(aes(x = flipper_length_mm, y = body_mass_g)) + geom_point(aes(color = species, shape = species)) + facet_wrap(~island)
## Warning: Removed 2 rows containing missing values or values outside the scale range
## (`geom_point()`).

ggsave(filename = "penguin-plot.png")
## Saving 7 x 5 in image
## Warning: Removed 2 rows containing missing values or values outside the scale range
## (`geom_point()`).
  Run ggsave(filename = "ADD-TITLE.png") right after the code for each plot that you want to export; by default it saves the most recently displayed plot. The png file is saved in your working directory (run getwd() to see where that is). I suggest creating a folder for projects so that everything is organized in one location.
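
If you want more control, here is a sketch that stores the plot in a variable and passes it to ggsave() along with a size. The file location is hypothetical (a temporary folder so nothing is left behind), and island_plot and out_file are names made up for these notes.

```r
library(ggplot2)
library(palmerpenguins)

island_plot <- ggplot(penguins, aes(x = flipper_length_mm, y = body_mass_g)) +
  geom_point(aes(color = species, shape = species), na.rm = TRUE) +
  facet_wrap(~island)

# Hypothetical location: a temporary folder. Swap in your own project
# folder when saving for real.
out_file <- file.path(tempdir(), "penguin-plot.png")

# plot = says exactly which plot to save; width and height set the size.
ggsave(filename = out_file, plot = island_plot, width = 7, height = 5, units = "in")

file.exists(out_file)
```

Naming the plot explicitly means it does not matter which plot was displayed last.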
  

Common Problems

-Make sure every ( is matched with a ) and every " is paired with another "

-When writing ggplot2 code, the “+” needs to come at the end of the line, not the start

-You can get help by running ?function_name (e.g., ?ggsave) in the console, or by highlighting the function name and pressing F1 in RStudio

-Read error messages carefully, and search online for the message if you can’t figure it out! :)
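
The “+” rule is the one that is easiest to get wrong, so here is a short sketch of both placements. The version that does not work is left as comments so the chunk still runs.

```r
library(ggplot2)
library(palmerpenguins)

# Works: the first line ends with "+", so R knows more is coming.
p <- ggplot(penguins, aes(x = flipper_length_mm, y = body_mass_g)) +
  geom_point(na.rm = TRUE)
p

# Does not work: R treats the first line as a finished plot, then fails on
# the second line because "+ geom_point()" has nothing on its left side.
# ggplot(penguins, aes(x = flipper_length_mm, y = body_mass_g))
#   + geom_point()
```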

END OF CHAPTER 1.