Your first line of code will usually be importing packages.
library(tidyverse)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr 1.1.4 ✔ readr 2.1.5
## ✔ forcats 1.0.0 ✔ stringr 1.5.1
## ✔ ggplot2 3.4.4 ✔ tibble 3.2.1
## ✔ lubridate 1.9.3 ✔ tidyr 1.3.0
## ✔ purrr 1.0.2
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag() masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
We will look at the mpg data set, which records fuel economy and other details for a range of car models, so we can look for patterns in it.
data(mpg) #data() lets us load the data
head(mpg)
## # A tibble: 6 × 11
## manufacturer model displ year cyl trans drv cty hwy fl class
## <chr> <chr> <dbl> <int> <int> <chr> <chr> <int> <int> <chr> <chr>
## 1 audi a4 1.8 1999 4 auto(l5) f 18 29 p compa…
## 2 audi a4 1.8 1999 4 manual(m5) f 21 29 p compa…
## 3 audi a4 2 2008 4 manual(m6) f 20 31 p compa…
## 4 audi a4 2 2008 4 auto(av) f 21 30 p compa…
## 5 audi a4 2.8 1999 6 auto(l5) f 16 26 p compa…
## 6 audi a4 2.8 1999 6 manual(m5) f 18 26 p compa…
mpg is the data set that’s been loaded.
head() shows the first 6 rows by default.
view(mpg) lets you view the entire data set in
a separate tab.
Adding ? in front of the name of a data set or function opens the help tab, which
gives you a detailed explanation of it. Example:
?mpg will give you a rundown on the data.
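As a small sketch of the inspection helpers above (assuming only that ggplot2, which ships the mpg data, is installed), head() takes an n argument when you want more or fewer than the default six rows, and dim() reports the overall size. The name first_ten is made up for this example.

```r
library(ggplot2)  # provides the mpg data set

# head() returns 6 rows by default; n asks for a different number
first_ten <- head(mpg, n = 10)
nrow(first_ten)

# dim() gives the number of rows and columns of the whole data set
dim(mpg)

# names() lists the column names
names(mpg)
```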
ggplot(data=mpg) +
geom_point(mapping = aes(x=displ, y = hwy))
ggplot(): creates an empty plot and loads the dataset mpg.
geom_point(): used to plot points to make a scatterplot.
mapping = aes(x=z, y=j): tells ggplot how we want it to map the points on the graph. aes is short for aesthetic.
ggplot(data = mpg) +
geom_point(mapping = aes(x=displ, y = hwy, color = class))
By adding color = class, we have now given each point a colour based on its class! This will make it easier for us
to spot patterns in our scatter plot.
Notice how color = class is inside mapping.
ggplot(data=mpg) + geom_point(mapping = aes(x=displ, y=hwy), color= "blue")
We do it like this!
Notice how color = "blue" is outside mapping. This
is because making the points blue is not something that we are
mapping from the data but a design choice. Adding it inside mapping treats "blue" as a data
value rather than a colour, so the points come out in a default colour with a legend entry labelled blue.
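To see the difference for yourself, here is a minimal sketch that builds both versions and asks ggplot2 which colours it will actually draw. The names inside and outside are made up for this example; layer_data() is a ggplot2 helper that returns the computed values for a layer.

```r
library(ggplot2)

# color inside aes(): "blue" is treated as a data value, not a colour
inside <- ggplot(data = mpg) +
  geom_point(mapping = aes(x = displ, y = hwy, color = "blue"))

# color outside aes(): every point is drawn in blue
outside <- ggplot(data = mpg) +
  geom_point(mapping = aes(x = displ, y = hwy), color = "blue")

# layer_data() returns the values ggplot2 actually uses to draw a layer
unique(layer_data(inside)$colour)
unique(layer_data(outside)$colour)
```

Printing inside and outside one after the other shows the same thing visually: only the second plot has blue points.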
ggplot(data=mpg) + geom_smooth(mapping = aes(x=displ, y=hwy, color=class))
## `geom_smooth()` using method = 'loess' and formula = 'y ~ x'
The plot above uses geom_smooth, which draws a smoothed trend line through the data. There are many other “geoms” that allow us to
plot data differently. If we use geom_point and geom_smooth
together it will create something like this…
ggplot(data=mpg) + geom_smooth(mapping = aes(x = displ, y = hwy, color = class)) +
geom_point(mapping = aes(x = displ, y = hwy, color = class))
## `geom_smooth()` using method = 'loess' and formula = 'y ~ x'
Let’s simplify this code a bit. Mappings given inside ggplot() are shared by every geom, so we only need to write them once.
ggplot(data=mpg,mapping = aes(x = displ, y = hwy, color = class)) +
geom_point() + geom_smooth()
## `geom_smooth()` using method = 'loess' and formula = 'y ~ x'
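A geom can still add its own mapping on top of the shared one. As a sketch, keeping x and y in ggplot() but moving color into geom_point() colours the points by class while geom_smooth() fits one curve through all of them. The name p_shared is made up for this example.

```r
library(ggplot2)

# x and y are shared by both geoms; color is mapped for the points only
p_shared <- ggplot(data = mpg, mapping = aes(x = displ, y = hwy)) +
  geom_point(mapping = aes(color = class)) +
  geom_smooth()  # no color mapping here, so a single curve is fitted

p_shared
```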
Facets allow us to create multiple scatterplots split by a variable of our choice. Sometimes, even with the different colours, it can be hard to read the data when it is all plotted in a single graph.
ggplot(data=mpg,mapping = aes(x = displ, y = hwy, color = class)) +
geom_point() + geom_smooth() +
facet_wrap(~class, nrow = 2)
## `geom_smooth()` using method = 'loess' and formula = 'y ~ x'
facet_wrap() lets you create multiple scatterplots
split by a single variable, in this case, class.
nrow = 2 just lets the program know we want the plots
laid out in 2 rows.
Let’s use hwy as our variable.
ggplot(data=mpg,mapping = aes(x = displ, y = hwy, color = manufacturer)) +
geom_point() + geom_smooth() +
facet_wrap(~hwy, nrow = 4)
## `geom_smooth()` using method = 'loess' and formula = 'y ~ x'
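Because hwy is a number, facet_wrap(~hwy) makes one panel for every distinct value of hwy, which is a lot of panels. One possible way around this, sketched here as an assumption rather than something from the walkthrough above, is to bin hwy first with ggplot2's cut_width(). The names mpg_binned and hwy_bin are made up for this example, and the bin width of 10 is arbitrary.

```r
library(ggplot2)

# how many panels facet_wrap(~hwy) has to draw: one per distinct value
n_panels <- length(unique(mpg$hwy))
n_panels

# bin hwy into intervals that are 10 mpg wide (arbitrary width)
mpg_binned <- mpg
mpg_binned$hwy_bin <- cut_width(mpg_binned$hwy, width = 10)

# facet on the bins instead of the raw values
ggplot(data = mpg_binned, mapping = aes(x = displ, y = hwy)) +
  geom_point() +
  facet_wrap(~hwy_bin, nrow = 2)
```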
ggplot(data=mpg,mapping = aes(x = displ, y = hwy, color = manufacturer)) +
geom_point() +
facet_grid(drv ~ cyl)
facet_grid() takes two variables separated by the ~ sign
and allows you to facet on two variables, in this example, drv and
cyl.
drv is the type of drive train (4 = four-wheel, f = front-wheel, r = rear-wheel drive), cyl is the number of cylinders.
The graph might seem hard to read, but imagine it as a game of Battleship: each panel sits where a drv row meets a cyl column.
We can also use facet_grid on one variable by using . at
the beginning!
ggplot(data=mpg, mapping = aes(x = displ, y = hwy)) +
geom_point() +
facet_grid(. ~ cyl)
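The dot can also go on the other side of the ~. As a sketch, facet_grid(drv ~ .) puts drv on the rows and leaves the columns empty, so the panels are stacked vertically. The name p_rows is made up for this example.

```r
library(ggplot2)

# drv on the rows, nothing on the columns: one panel per drive type
p_rows <- ggplot(data = mpg, mapping = aes(x = displ, y = hwy)) +
  geom_point() +
  facet_grid(drv ~ .)

p_rows
```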
ggplots are built in layers made up of data, mappings, geoms and, optionally, stats. Data can be broken down into subsets using facets. When we + something, we are adding a new piece, such as a layer, to the ggplot.
The gapminder dataset contains life expectancy, population and GDP per capita for many countries from 1952 to 2007.
Now let’s load the library and the data!
library(gapminder)
data(gapminder)
head(gapminder)
## # A tibble: 6 × 6
## country continent year lifeExp pop gdpPercap
## <fct> <fct> <int> <dbl> <int> <dbl>
## 1 Afghanistan Asia 1952 28.8 8425333 779.
## 2 Afghanistan Asia 1957 30.3 9240934 821.
## 3 Afghanistan Asia 1962 32.0 10267083 853.
## 4 Afghanistan Asia 1967 34.0 11537966 836.
## 5 Afghanistan Asia 1972 36.1 13079460 740.
## 6 Afghanistan Asia 1977 38.4 14880372 786.
gm07 <- filter(gapminder, year == 2007)
head(gm07)
## # A tibble: 6 × 6
## country continent year lifeExp pop gdpPercap
## <fct> <fct> <int> <dbl> <int> <dbl>
## 1 Afghanistan Asia 2007 43.8 31889923 975.
## 2 Albania Europe 2007 76.4 3600523 5937.
## 3 Algeria Africa 2007 72.3 33333216 6223.
## 4 Angola Africa 2007 42.7 12420476 4797.
## 5 Argentina Americas 2007 75.3 40301927 12779.
## 6 Australia Oceania 2007 81.2 20434176 34435.
filter() lets us filter the data so that we only have
relevant information, here the rows where year is 2007.
ggplot(gm07, aes(x = gdpPercap, y = lifeExp, color = continent)) +
geom_point()
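filter() can take more than one condition at a time. As a sketch (assuming the dplyr and gapminder packages are installed), the call below keeps only the European rows from 2007. The name europe07 is made up for this example.

```r
library(dplyr)
library(gapminder)

# conditions separated by commas must all be true for a row to be kept
europe07 <- filter(gapminder, year == 2007, continent == "Europe")
head(europe07)
```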
GDP per capita spans a very wide range, so on a linear axis most of the points are squeezed together at the low end. To counter that, we can take the base-10 log of GDP per capita, which spreads the points out more evenly.
gm07 <- mutate(gm07, log10GdpPercap = log10(gdpPercap))
ggplot(gm07, aes(x=log10GdpPercap, y = lifeExp, color = continent)) +
geom_point()
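A possible alternative, not used in this walkthrough: ggplot2's scale_x_log10() applies the same base-10 log when drawing the axis, so no new column is needed and the axis labels stay in the original units. This sketch rebuilds the 2007 subset under a made-up name, gm07_raw, so that it runs on its own; p_log is also a made-up name.

```r
library(dplyr)
library(ggplot2)
library(gapminder)

# same 2007 subset as before, kept under a separate name
gm07_raw <- filter(gapminder, year == 2007)

# plot the raw gdpPercap column and let the scale take the log
p_log <- ggplot(gm07_raw, aes(x = gdpPercap, y = lifeExp, color = continent)) +
  geom_point() +
  scale_x_log10()

p_log
```

The points land in the same places as with the log10GdpPercap column; only the axis labelling differs.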
mutate() adds a new column, log10GdpPercap, that holds the base-10
log of gdpPercap. This was used for our x instead of gdpPercap.
Let’s mutate gapminder first.
gapminder <- mutate(gapminder, log10GdpPercap = log10(gdpPercap))
p <- ggplot(gapminder, aes(x=log10GdpPercap, y=lifeExp, color = continent))
g1 <- geom_point(alpha=0.1)
p1 <- p + g1
alpha controls the transparency. In this example the points are
10% opaque.
p #base of ggplot
g1 #the point layer on its own
## geom_point: na.rm = FALSE
## stat_identity: na.rm = FALSE
## position_identity
p1 #the combination of the two
We have to load a new library called gridExtra, but
let’s make a new plot first.
g2 <- geom_point(alpha = 0.5)
p2 <- p + g2
p2
Now that we’ve created a new plot, let’s look at them side by side!
library(gridExtra)
##
## Attaching package: 'gridExtra'
## The following object is masked from 'package:dplyr':
##
## combine
grid.arrange(p1, p2, nrow = 1)
grid.arrange() takes the plots and the number of
rows we want as its arguments.
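grid.arrange() also accepts ncol. As a sketch using two simple mpg plots (left and right are made-up names, standing in for p1 and p2), gridExtra's arrangeGrob() builds the same layout without drawing it, which can be handy if you want to inspect or save it first.

```r
library(ggplot2)
library(gridExtra)

# two stand-in plots that differ only in point transparency
left  <- ggplot(mpg, aes(x = displ, y = hwy)) + geom_point(alpha = 0.1)
right <- ggplot(mpg, aes(x = displ, y = hwy)) + geom_point(alpha = 0.5)

# arrangeGrob() builds the layout without drawing it
side_by_side <- arrangeGrob(left, right, nrow = 1)
dim(side_by_side)

# grid.arrange() draws it; ncol = 1 stacks the plots instead
grid.arrange(left, right, ncol = 1)
```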
There are many other ways to transform your data, such as
stat_smooth to fit curves.
p + stat_smooth()
## `geom_smooth()` using method = 'loess' and formula = 'y ~ x'
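stat_smooth() picks a smoothing method on its own, as the message above shows. As a sketch, you can choose one yourself: method = "lm" fits a straight line per group, and se = FALSE drops the confidence ribbon. Here log10() is applied inside aes() so the block runs without the mutate() step; p_lm is a made-up name.

```r
library(ggplot2)
library(gapminder)

# one straight-line fit per continent, without the confidence ribbon
p_lm <- ggplot(gapminder, aes(x = log10(gdpPercap), y = lifeExp, color = continent)) +
  stat_smooth(method = "lm", se = FALSE)

p_lm
```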