How to load data:
library(tidyverse)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr 1.1.4 ✔ readr 2.1.5
## ✔ forcats 1.0.0 ✔ stringr 1.5.1
## ✔ ggplot2 3.5.1 ✔ tibble 3.2.1
## ✔ lubridate 1.9.4 ✔ tidyr 1.3.1
## ✔ purrr 1.0.2
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag() masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
cars_df <- mtcars  # copy R's built-in mtcars dataset into a new data frame
(Modern Statistics with R, Chapter 2)
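Data usually comes from a file rather than a built-in dataset. The sketch below assumes a CSV file; because no real file is given here, a temporary file is created from mtcars as a stand-in, and csv_path and cars_from_file are illustrative names. It uses base R's read.csv(); readr::read_csv() is the tidyverse counterpart.

```r
# Assumption: a temporary CSV stands in for a real data file
csv_path <- tempfile(fileext = ".csv")
write.csv(mtcars, csv_path)  # row names (car models) are written as the first column

# Load the file; row.names = 1 turns that first column back into row names
cars_from_file <- read.csv(csv_path, row.names = 1)

str(cars_from_file)  # 32 observations of 11 variables
```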
Modifying existing data frames:
cars_df <- cars_df %>% select(mpg, hp)  # keep only the mpg and hp columns
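For comparison, the same column selection can be written in base R without dplyr. This is a sketch on a fresh copy of mtcars; cars_base and cars_no_hp are illustrative names.

```r
# Base R version of select(mpg, hp): index the columns by name
cars_base <- mtcars[, c("mpg", "hp")]

# Dropping a column by name works the same way, using setdiff() on the names
cars_no_hp <- mtcars[, setdiff(names(mtcars), "hp")]

dim(cars_base)  # 32 2
names(cars_no_hp)
```

The dplyr version reads more naturally inside a pipe; the bracket version avoids loading any package.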
Code chunks must open with “```{r}” and close with “```”; other delimiters will not create a chunk. Blank lines can be added inside a chunk with the “Enter” key.
Removing NA values:
air <- airquality
air <- air %>% select(Ozone, Month, Day) %>% na.omit()  # keep three columns, then drop rows containing any NA
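Two details are worth checking when removing NA values: how many values are missing, and the order of the steps. The sketch below uses base R only (air_complete is an illustrative name); the row counts come from the built-in airquality data, which has 153 rows.

```r
# Count missing values in each column before dropping anything
colSums(is.na(airquality))

# Select first, then drop incomplete rows (base R equivalent of the pipe above)
air_complete <- na.omit(airquality[, c("Ozone", "Month", "Day")])
nrow(air_complete)         # 116: only rows missing Ozone are removed

# Dropping NAs before selecting also removes rows that only lack Solar.R
nrow(na.omit(airquality))  # 111
```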
The dollar sign ($) selects a single column (variable) of a data frame:
summary(air$Ozone)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 1.00 18.00 31.50 42.13 63.25 168.00
summary(air$Month)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 5.000 6.000 7.000 7.198 8.250 9.000
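Double brackets do the same job as $, and also work when the column name is stored in a variable. A short sketch on the built-in airquality data; ozone_dollar, ozone_bracket, and col_name are illustrative names.

```r
# `$` and [[ ]] both extract one column as a plain vector
ozone_dollar  <- airquality$Ozone
ozone_bracket <- airquality[["Ozone"]]
identical(ozone_dollar, ozone_bracket)  # TRUE

# [[ ]] accepts a column name stored in a variable; `$` would instead look
# for a column literally called "col_name" and return NULL
col_name <- "Month"
summary(airquality[[col_name]])
```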
# summary() on one $ column summarizes only that column; to summarize several columns at once, call it on the whole data frame (or on a subset of its columns)
summary(air)
## Ozone Month Day
## Min. : 1.00 Min. :5.000 Min. : 1.00
## 1st Qu.: 18.00 1st Qu.:6.000 1st Qu.: 8.00
## Median : 31.50 Median :7.000 Median :16.00
## Mean : 42.13 Mean :7.198 Mean :15.53
## 3rd Qu.: 63.25 3rd Qu.:8.250 3rd Qu.:22.00
## Max. :168.00 Max. :9.000 Max. :31.00
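To summarize only some columns, subset the data frame before calling summary(). This sketch rebuilds the NA-free air data in base R so it runs on its own, and adds sapply() as one way to get a single statistic per column.

```r
# Rebuild the NA-free air data in base R so this block is self-contained
air <- na.omit(airquality[, c("Ozone", "Month", "Day")])

# Subset first, then summarize: one column of output per selected variable
summary(air[, c("Ozone", "Month")])

# sapply() applies one function (here mean) to each selected column
sapply(air[, c("Ozone", "Month")], mean)
```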
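Correlation: each variable is passed to cor() as a vector, here selected with “$”. By default cor() returns NA if either variable contains missing values; setting use = "complete.obs" drops incomplete pairs first: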
cor(cars_df$mpg, cars_df$hp)
## [1] -0.7761684
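The value above is the Pearson correlation, which equals the covariance of the two variables divided by the product of their standard deviations. The sketch below checks that identity on mtcars and shows the NA behaviour on airquality, where Ozone has missing values; r_manual and r_builtin are illustrative names.

```r
# Pearson correlation by hand: covariance / (sd of x * sd of y)
r_manual  <- cov(mtcars$mpg, mtcars$hp) / (sd(mtcars$mpg) * sd(mtcars$hp))
r_builtin <- cor(mtcars$mpg, mtcars$hp)
all.equal(r_manual, r_builtin)  # TRUE

# With missing values, the default use = "everything" returns NA ...
cor(airquality$Ozone, airquality$Temp)
# ... while use = "complete.obs" drops rows where either value is missing
cor(airquality$Ozone, airquality$Temp, use = "complete.obs")
```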
Scatter plot: the first argument to ggplot() is the data frame, aes() maps variables to the x and y axes, and + geom_point() adds a layer that draws each row as a point:
ggplot(cars_df, aes(x = mpg, y = hp)) + geom_point()
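If ggplot2 is not available, base R graphics can draw the same scatter plot. This is a sketch, not the notes' own method: it writes to a temporary PDF (plot_file is an illustrative name) and adds a least-squares line with abline(), which plays a role similar to adding geom_smooth(method = "lm") as a second ggplot layer.

```r
# Base-graphics version of the scatter plot, written to a temporary PDF
plot_file <- tempfile(fileext = ".pdf")
pdf(plot_file)

# Formula hp ~ mpg puts mpg on the x axis and hp on the y axis
plot(hp ~ mpg, data = mtcars,
     xlab = "Miles per gallon (mpg)", ylab = "Horsepower (hp)")

# Overlay a fitted linear regression line on the existing plot
abline(lm(hp ~ mpg, data = mtcars))

invisible(dev.off())  # close the PDF device so the file is written
```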