This is an R Markdown document. Markdown is a simple formatting syntax for authoring HTML, PDF, and MS Word documents. For more details on using R Markdown see http://rmarkdown.rstudio.com.
When you click the Knit button, a document will be generated that includes both the content and the output of any embedded R code chunks within the document. You can embed an R code chunk like this:
summary(cars)
## speed dist
## Min. : 4.0 Min. : 2.00
## 1st Qu.:12.0 1st Qu.: 26.00
## Median :15.0 Median : 36.00
## Mean :15.4 Mean : 42.98
## 3rd Qu.:19.0 3rd Qu.: 56.00
## Max. :25.0 Max. :120.00
You can also embed plots, for example:
Note that the echo = FALSE parameter was added to the code chunk to prevent printing of the R code that generated the plot.
library(ggplot2)
library(tidyverse)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr 1.1.4 ✔ readr 2.1.5
## ✔ forcats 1.0.0 ✔ stringr 1.5.1
## ✔ lubridate 1.9.3 ✔ tibble 3.2.1
## ✔ purrr 1.0.2 ✔ tidyr 1.3.1
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag() masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(dplyr) #already attached by tidyverse (see the message above), so this call has no further effect
?mpg #access information about the mpg dataset
## starting httpd help server ... done
mpg #display dataset
## # A tibble: 234 × 11
## manufacturer model displ year cyl trans drv cty hwy fl class
## <chr> <chr> <dbl> <int> <int> <chr> <chr> <int> <int> <chr> <chr>
## 1 audi a4 1.8 1999 4 auto… f 18 29 p comp…
## 2 audi a4 1.8 1999 4 manu… f 21 29 p comp…
## 3 audi a4 2 2008 4 manu… f 20 31 p comp…
## 4 audi a4 2 2008 4 auto… f 21 30 p comp…
## 5 audi a4 2.8 1999 6 auto… f 16 26 p comp…
## 6 audi a4 2.8 1999 6 manu… f 18 26 p comp…
## 7 audi a4 3.1 2008 6 auto… f 18 27 p comp…
## 8 audi a4 quattro 1.8 1999 4 manu… 4 18 26 p comp…
## 9 audi a4 quattro 1.8 1999 4 auto… 4 16 25 p comp…
## 10 audi a4 quattro 2 2008 4 manu… 4 20 28 p comp…
## # ℹ 224 more rows
head(mpg) #display the first 6 rows in the dataset
## # A tibble: 6 × 11
## manufacturer model displ year cyl trans drv cty hwy fl class
## <chr> <chr> <dbl> <int> <int> <chr> <chr> <int> <int> <chr> <chr>
## 1 audi a4 1.8 1999 4 auto(l5) f 18 29 p compa…
## 2 audi a4 1.8 1999 4 manual(m5) f 21 29 p compa…
## 3 audi a4 2 2008 4 manual(m6) f 20 31 p compa…
## 4 audi a4 2 2008 4 auto(av) f 21 30 p compa…
## 5 audi a4 2.8 1999 6 auto(l5) f 16 26 p compa…
## 6 audi a4 2.8 1999 6 manual(m5) f 18 26 p compa…
#the command is ggplot(); its first argument is the dataset, and the + sign adds a layer
#geom_point() creates a scatter plot; mapping defines which variables are plotted; aes
#stands for aesthetic; x is set to displ (engine displacement, in litres), y is set to hwy (highway miles per gallon, a measure of fuel efficiency)
#template: ggplot(data=<DATA>)+<GEOM_FUNCTION>(mapping=aes(<MAPPINGS>))
ggplot(data=mpg)+geom_point(mapping=aes(x=displ,y=hwy))
ggplot(data=mpg)+geom_point(mapping=aes(x=displ,y=cty))
ggplot(data=mpg)+geom_point(mapping=aes(x=displ,y=cyl))
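As a small sketch of how the template above can be filled in, the mapping can be given either to the geom layer (as in the calls above) or to ggplot() itself, in which case the layer inherits it. The object names p_layer and p_global are only illustrative.

```r
library(ggplot2)

# Template form used above: the mapping is given to the geom layer.
p_layer <- ggplot(data = mpg) +
  geom_point(mapping = aes(x = displ, y = hwy))

# Alternative form: the mapping is given to ggplot() and inherited by the layer.
p_global <- ggplot(data = mpg, mapping = aes(x = displ, y = hwy)) +
  geom_point()

# layer_data() returns the values ggplot2 computed for drawing the layer;
# comparing x and y checks that both forms place the points identically.
all.equal(layer_data(p_layer)[c("x", "y")], layer_data(p_global)[c("x", "y")])
```

The second form is convenient when several layers share the same x and y variables.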
#you can add a third variable, like class, to a two-dimensional scatterplot by mapping it to an aesthetic
#an aesthetic is a visual property of the objects in the plot
#for points, the aesthetics include size, shape, color, and alpha (transparency)
#convey information about your data by mapping the aesthetics in your plot to variables in the dataset
#map the color of the points to the class variable to reveal the class of each car
#each class value is displayed in a different color
#to map an aesthetic to a variable, associate the name of the aesthetic with the name of the variable inside aes()
#scaling: ggplot2 then assigns a unique level of the aesthetic (here, a color) to each unique value of the variable
ggplot(data=mpg)+geom_point(mapping=aes(x=displ,y=hwy, color=class))
ggplot(data=mpg)+geom_point(mapping=aes(x=displ,y=hwy, size=class)) #not advisable: class is an unordered categorical variable, while size implies an order
## Warning: Using size for a discrete variable is not advised.
ggplot(data=mpg)+geom_point(mapping=aes(x=displ,y=hwy, shape=class)) #displays each class with a different point shape; only 6 shapes are used by default, so the seventh class (suv) is not plotted
## Warning: The shape palette can deal with a maximum of 6 discrete values because more
## than 6 becomes difficult to discriminate
## ℹ you have requested 7 values. Consider specifying shapes manually if you need
## that many have them.
## Warning: Removed 62 rows containing missing values (`geom_point()`).
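The warning above suggests specifying shapes manually when there are more than 6 values. A minimal sketch of one way to do that is shown here, assuming scale_shape_manual() with the shape codes 0 to 6; the particular codes and the name class_shapes are arbitrary choices for illustration.

```r
library(ggplot2)

# One shape code per class; 0:6 are seven of the standard R plotting symbols.
# The particular codes are an arbitrary choice for this sketch.
class_shapes <- 0:6

p_shapes <- ggplot(data = mpg) +
  geom_point(mapping = aes(x = displ, y = hwy, shape = class)) +
  scale_shape_manual(values = class_shapes)

p_shapes
```

With seven shapes supplied, every class gets its own symbol and no rows should be dropped for lack of a shape.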
ggplot(data=mpg)+geom_point(mapping=aes(x=displ,y=hwy, alpha=class)) #displays different transparencies
## Warning: Using alpha for a discrete variable is not advised.
#you can also set an aesthetic manually, e.g. make all of the points blue
#important: to set an aesthetic, pass it as an argument of the geom function, outside of aes()
ggplot(data=mpg)+geom_point(mapping=aes(x=displ,y=hwy), color="blue")
ggplot(data=mpg)+geom_point(mapping=aes(x=displ,y=hwy, color=class))
ggplot(data=mpg)+geom_point(mapping=aes(x=displ,y=hwy, color="blue")) #inside aes(), "blue" is treated as a data value, not a color: the points get the default color and the legend shows a single entry labelled "blue"
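The difference between setting and mapping the color can be checked without looking at the plots. The sketch below uses layer_data() to inspect the colour ggplot2 computes for each point; the names p_set and p_mapped are only illustrative.

```r
library(ggplot2)

# Setting: color is a fixed argument of the geom, outside aes().
p_set <- ggplot(data = mpg) +
  geom_point(mapping = aes(x = displ, y = hwy), color = "blue")

# Mapping: "blue" inside aes() is treated as a constant data value.
p_mapped <- ggplot(data = mpg) +
  geom_point(mapping = aes(x = displ, y = hwy, color = "blue"))

# layer_data() returns the values ggplot2 computed for drawing each point.
unique(layer_data(p_set)$colour)     # the colour that was set
unique(layer_data(p_mapped)$colour)  # a colour chosen by the default colour scale
```

In the mapped version the string "blue" only ends up as a legend label, which is why the points are not drawn in blue.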
#facets: split the plot into subplots that each display one subset of the data
#useful for adding a categorical variable to the plot
#the faceting variable should be discrete; a continuous variable gets one panel per distinct value, which is rarely useful
ggplot(data=mpg)+geom_point(mapping=aes(x=displ,y=hwy)) + facet_wrap(~class, nrow=2) #creates 2 rows of plots
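Two related variations, sketched under stated assumptions: facet_grid() facets on the combination of two discrete variables, and a continuous variable can be cut into bins before faceting. The use of cut_number() with 3 bins on cty is an arbitrary choice for illustration, not something the notes above prescribe.

```r
library(ggplot2)

# Two discrete variables: rows by drv, columns by cyl.
p_grid <- ggplot(data = mpg) +
  geom_point(mapping = aes(x = displ, y = hwy)) +
  facet_grid(drv ~ cyl)

# A continuous variable (cty) cut into 3 bins with roughly equal counts
# before faceting; the choice of 3 bins is arbitrary.
p_binned <- ggplot(data = mpg) +
  geom_point(mapping = aes(x = displ, y = hwy)) +
  facet_wrap(vars(cut_number(cty, 3)), nrow = 1)

p_grid
p_binned
```

Binning keeps the number of panels small, whereas faceting directly on cty would produce one panel for every distinct mileage value.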