Note: This Project is based on key takeaways from the Google course
Data Analysis with R Programming.
Introduction.
This is the Author's first R Markdown Project. It provides examples of
the following activities:
* Running code chunks
* Loading some necessary packages and data frames
* Building plots and including them in this Project
* Using other R Markdown features (attaching plain and embedded links)
Step 1. Loading the Necessary Packages:
To begin, the required packages need to be installed (only once per R
installation) and then loaded into the current R session:
* ‘tidyverse’
* ‘palmerpenguins’
install.packages("tidyverse")
install.packages("palmerpenguins")
library(tidyverse)
library(palmerpenguins)
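Because install.packages() downloads the packages again every time the document is knitted, it can help to install only what is missing. The following is a minimal sketch, assuming a hypothetical helper named install_if_missing() (it is not part of any package) and a CRAN mirror URL chosen here purely for illustration:

```r
# Hypothetical helper (not part of any package): install only the packages
# that are not yet available and return the names that had to be installed.
# The default mirror URL is an assumption; any CRAN mirror would do.
install_if_missing <- function(pkgs, repos = "https://cloud.r-project.org") {
  # requireNamespace() returns TRUE when a package can be loaded.
  is_available <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  to_install <- pkgs[!is_available]
  if (length(to_install) > 0) {
    install.packages(to_install, repos = repos)
  }
  invisible(to_install)
}
```

Calling install_if_missing(c("tidyverse", "palmerpenguins")) before the library() calls would then skip the download whenever both packages are already installed.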
Step 2. Building a Basic Scatter Plot.
This step visualizes the ‘penguins’ data frame from the ‘palmerpenguins’
package with a scatter plot to examine the relationship between two
variables:
* flipper length (mm) on the x-axis
* body mass (g) on the y-axis
ggplot(data = penguins, aes(x = flipper_length_mm, y = body_mass_g)) +
geom_point()
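The scatter plot shows the relationship visually; a correlation coefficient can put a number on it. The sketch below is one possible check using base R, not part of the original steps. The variable names r and n_missing are chosen here for illustration. It also counts the rows with a missing measurement, which geom_point() leaves out of the plot with a warning.

```r
library(palmerpenguins)

# Pearson correlation between flipper length and body mass.
# use = "complete.obs" drops rows where either measurement is missing.
r <- cor(penguins$flipper_length_mm, penguins$body_mass_g,
         use = "complete.obs")

# Rows that cannot be drawn because at least one of the two values is NA.
n_missing <- sum(is.na(penguins$flipper_length_mm) |
                   is.na(penguins$body_mass_g))

print(round(r, 2))
print(n_missing)
```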
Step 3. Improving the Readability of the Basic Plot
The basic plot created in Step 2 uses the default appearance. The labs()
function makes the plot easier to read by:
* Adding a title and a caption to introduce the plot to the reader
* Changing the labels of the x- and y-axes
ggplot(data = penguins, aes(x = flipper_length_mm, y = body_mass_g)) +
geom_point() +
labs(title = "Palmer Penguins: Relationship between Flipper Length and Body Mass", caption = "Data was collected by Dr. Kristen Gorman", x = "Flipper Length (mm)", y = "Body Mass (g)")
Step 4. Adding a Trend Line to the Plot with the geom_smooth() function
Adding a trend line to the scatter plot created in Step 3 makes the
overall trend in the data easier to see. Here the line is fitted with
method = "loess", a locally weighted smoother:
ggplot(data = penguins, aes(x = flipper_length_mm, y = body_mass_g)) +
geom_point() +
labs(title = "Palmer Penguins: Relationship between Flipper Length and Body Mass", caption = "Data was collected by Dr. Kristen Gorman", x = "Flipper Length (mm)", y = "Body Mass (g)") +
geom_smooth(method = "loess")
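Since each step repeats the same ggplot() call and labels, the plot can also be built once and extended. The sketch below is one way to do that, not the approach used in this Project. The object names penguin_labs, base_plot, loess_plot and lm_plot are made up for illustration, and the straight-line variant with method = "lm" is an added alternative for comparison.

```r
library(ggplot2)
library(palmerpenguins)

# Shared labels, defined once and reused by every plot variant.
penguin_labs <- labs(
  title = "Palmer Penguins: Relationship between Flipper Length and Body Mass",
  caption = "Data was collected by Dr. Kristen Gorman",
  x = "Flipper Length (mm)",
  y = "Body Mass (g)"
)

# Base scatter plot from Steps 2 and 3.
base_plot <- ggplot(data = penguins,
                    aes(x = flipper_length_mm, y = body_mass_g)) +
  geom_point() +
  penguin_labs

# Variant 1: the loess smoother used in this step.
loess_plot <- base_plot + geom_smooth(method = "loess", formula = y ~ x)

# Variant 2 (an alternative, not from the original steps): a linear fit.
lm_plot <- base_plot + geom_smooth(method = "lm", formula = y ~ x)
```

Printing loess_plot or lm_plot in a code chunk draws the corresponding figure. Stating formula = y ~ x explicitly only silences the message geom_smooth() would otherwise print about the formula it picked.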
Step 5. Subsetting the Data Using Facet Functions
The visualization created in Step 4 can be split into subplots (facets),
one for each category of a specific variable (for example, species or
island), using the following facet functions:
* facet_wrap() for subsetting the data with one variable
* facet_grid() for subsetting the data with two variables
Scenario 1. The task is to examine the relationship between flipper
length and body mass for each of the three species, on separate plots
and with different colors:
ggplot(data = penguins, aes(x = flipper_length_mm, y = body_mass_g, color = species)) +
geom_point() +
labs(title = "Palmer Penguins: Relationship between Flipper Length and Body Mass", caption = "Data was collected by Dr. Kristen Gorman", x = "Flipper Length (mm)", y = "Body Mass (g)") +
geom_smooth(method = "loess") +
facet_wrap(~species)
It is also possible to subset the plot of the data by island:
ggplot(data = penguins, aes(x = flipper_length_mm, y = body_mass_g, color = island)) +
geom_point() +
labs(title = "Palmer Penguins: Relationship between Flipper Length and Body Mass", caption = "Data was collected by Dr. Kristen Gorman", x = "Flipper Length (mm)", y = "Body Mass (g)") +
geom_smooth(method = "loess") +
facet_wrap(~island)
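Each facet_wrap() panel contains only the rows that belong to one category, so the panels can hold very different amounts of data. As a quick check, the sketch below counts the rows per category with base R. The names species_counts and island_counts are chosen here for illustration.

```r
library(palmerpenguins)

# Penguins per species: one facet_wrap(~species) panel per entry.
species_counts <- table(penguins$species)

# Penguins per island: one facet_wrap(~island) panel per entry.
island_counts <- table(penguins$island)

print(species_counts)
print(island_counts)
```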
Scenario 2. The task is to subset the plot created in Step 4 by two
variables, sex and species, using the facet_grid() function.
In the formula sex~species, the first variable (sex) defines the rows of
the grid, arranged vertically, while the second (species) defines the
columns, arranged horizontally.
In this example, color is mapped to species and the trend lines have
been removed from the plot for simplicity.
ggplot(data = penguins, aes(x = flipper_length_mm, y = body_mass_g, color = species)) +
geom_point() +
labs(title = "Palmer Penguins: Relationship between Flipper Length and Body Mass", caption = "Data was collected by Dr. Kristen Gorman", x = "Flipper Length (mm)", y = "Body Mass (g)") +
facet_grid(sex~species)
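Some penguins in the data have no recorded sex, so the grid above gains an extra NA row. The sketch below shows one possible way to handle this, assuming it is acceptable to leave those penguins out. The names penguins_sexed and sexed_grid are made up for illustration. It also spells out the rows and columns with the rows = and cols = arguments of facet_grid(), which give the same layout as sex~species.

```r
library(ggplot2)
library(palmerpenguins)

# Keep only the penguins whose sex was recorded, so the grid has no NA row.
penguins_sexed <- subset(penguins, !is.na(sex))

sexed_grid <- ggplot(data = penguins_sexed,
                     aes(x = flipper_length_mm, y = body_mass_g,
                         color = species)) +
  geom_point() +
  # Same layout as facet_grid(sex~species), with the roles named explicitly.
  facet_grid(rows = vars(sex), cols = vars(species))
```

Printing sexed_grid in a code chunk draws a grid with two rows (female, male) and three columns (one per species).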
Additional Section.
More information on working with R Markdown files, including editing
them and adding further elements, can be found at
https://community.rstudio.com/.
In addition, the RStudio Official Learning Blog and the R for Data
Science Online Learning Community provide plenty of useful information,
answers to frequently asked questions, and tutorials in R.