library("tidyverse")
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr 1.1.4 ✔ readr 2.1.6
## ✔ forcats 1.0.1 ✔ stringr 1.6.0
## ✔ ggplot2 4.0.1 ✔ tibble 3.3.0
## ✔ lubridate 1.9.4 ✔ tidyr 1.3.2
## ✔ purrr 1.2.0
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag() masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
mpg
## # A tibble: 234 × 11
## manufacturer model displ year cyl trans drv cty hwy fl class
## <chr> <chr> <dbl> <int> <int> <chr> <chr> <int> <int> <chr> <chr>
## 1 audi a4 1.8 1999 4 auto… f 18 29 p comp…
## 2 audi a4 1.8 1999 4 manu… f 21 29 p comp…
## 3 audi a4 2 2008 4 manu… f 20 31 p comp…
## 4 audi a4 2 2008 4 auto… f 21 30 p comp…
## 5 audi a4 2.8 1999 6 auto… f 16 26 p comp…
## 6 audi a4 2.8 1999 6 manu… f 18 26 p comp…
## 7 audi a4 3.1 2008 6 auto… f 18 27 p comp…
## 8 audi a4 quattro 1.8 1999 4 manu… 4 18 26 p comp…
## 9 audi a4 quattro 1.8 1999 4 auto… 4 16 25 p comp…
## 10 audi a4 quattro 2 2008 4 manu… 4 20 28 p comp…
## # ℹ 224 more rows
summary(mpg)
## manufacturer model displ year
## Length:234 Length:234 Min. :1.600 Min. :1999
## Class :character Class :character 1st Qu.:2.400 1st Qu.:1999
## Mode :character Mode :character Median :3.300 Median :2004
## Mean :3.472 Mean :2004
## 3rd Qu.:4.600 3rd Qu.:2008
## Max. :7.000 Max. :2008
## cyl trans drv cty
## Min. :4.000 Length:234 Length:234 Min. : 9.00
## 1st Qu.:4.000 Class :character Class :character 1st Qu.:14.00
## Median :6.000 Mode :character Mode :character Median :17.00
## Mean :5.889 Mean :16.86
## 3rd Qu.:8.000 3rd Qu.:19.00
## Max. :8.000 Max. :35.00
## hwy fl class
## Min. :12.00 Length:234 Length:234
## 1st Qu.:18.00 Class :character Class :character
## Median :24.00 Mode :character Mode :character
## Mean :23.44
## 3rd Qu.:27.00
## Max. :44.00
str(mpg)
## tibble [234 × 11] (S3: tbl_df/tbl/data.frame)
## $ manufacturer: chr [1:234] "audi" "audi" "audi" "audi" ...
## $ model : chr [1:234] "a4" "a4" "a4" "a4" ...
## $ displ : num [1:234] 1.8 1.8 2 2 2.8 2.8 3.1 1.8 1.8 2 ...
## $ year : int [1:234] 1999 1999 2008 2008 1999 1999 2008 1999 1999 2008 ...
## $ cyl : int [1:234] 4 4 4 4 6 6 6 4 4 4 ...
## $ trans : chr [1:234] "auto(l5)" "manual(m5)" "manual(m6)" "auto(av)" ...
## $ drv : chr [1:234] "f" "f" "f" "f" ...
## $ cty : int [1:234] 18 21 20 21 16 18 18 18 16 20 ...
## $ hwy : int [1:234] 29 29 31 30 26 26 27 26 25 28 ...
## $ fl : chr [1:234] "p" "p" "p" "p" ...
## $ class : chr [1:234] "compact" "compact" "compact" "compact" ...
names(mpg)
## [1] "manufacturer" "model" "displ" "year" "cyl"
## [6] "trans" "drv" "cty" "hwy" "fl"
## [11] "class"
We load the tidyverse, which provides the functions we need and, through ggplot2, the mpg data set used in this project. We then use summary(), str(), and names() to confirm that the data has the columns and types we expect.
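The same check can be made explicit in code. This is a minimal sketch, assuming dplyr and ggplot2 are installed; `needed_cols` is a name introduced here for illustration and is not part of the original analysis.

```r
library(dplyr)
library(ggplot2)  # ggplot2 ships the mpg data set

# Hypothetical helper vector: the columns the later steps rely on
needed_cols <- c("manufacturer", "class", "cty", "hwy")

# Stop early if any required column is missing from mpg
stopifnot(all(needed_cols %in% names(mpg)))

# Count missing values in each required column
mpg %>%
  summarise(across(all_of(needed_cols), ~ sum(is.na(.x))))
```

If the counts are all zero, no rows will be lost later because of missing values in these columns.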
mpg_data = mpg %>%
  select(manufacturer, class, cty, hwy) %>%
  mutate(avg_mpg = as.numeric((cty + hwy) / 2))
mpg_data
## # A tibble: 234 × 5
## manufacturer class cty hwy avg_mpg
## <chr> <chr> <int> <int> <dbl>
## 1 audi compact 18 29 23.5
## 2 audi compact 21 29 25
## 3 audi compact 20 31 25.5
## 4 audi compact 21 30 25.5
## 5 audi compact 16 26 21
## 6 audi compact 18 26 22
## 7 audi compact 18 27 22.5
## 8 audi compact 18 26 22
## 9 audi compact 16 25 20.5
## 10 audi compact 20 28 24
## # ℹ 224 more rows
We name the new table “mpg_data” and pipe mpg into the next steps with %>% so the data is cleaner and easier to read. In this chunk, select() keeps only the manufacturer, class, cty, and hwy columns, and mutate() creates a new column, “avg_mpg”, which is the sum of cty and hwy divided by 2.
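To see the select() and mutate() steps on something small, here is a sketch on a toy table. The `toy` data is invented for illustration and is not taken from mpg. It also suggests that the as.numeric() wrapper is probably optional, because dividing with `/` already returns a double.

```r
library(dplyr)

# Toy data, invented for illustration (not taken from mpg)
toy <- tibble(
  manufacturer = c("a", "b"),
  cty = c(18L, 21L),
  hwy = c(29L, 29L),
  year = c(1999L, 2008L)
)

toy_avg <- toy %>%
  select(manufacturer, cty, hwy) %>%  # keep only the columns we need
  mutate(avg_mpg = (cty + hwy) / 2)   # integer inputs, double result

toy_avg$avg_mpg
is.double(toy_avg$avg_mpg)
```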
mpg_data2 = mpg_data %>%
  filter(avg_mpg >= 25, !is.na(class)) %>%
  slice(-c(2, 5))
mpg_data2
## # A tibble: 41 × 5
## manufacturer class cty hwy avg_mpg
## <chr> <chr> <int> <int> <dbl>
## 1 audi compact 21 29 25
## 2 audi compact 21 30 25.5
## 3 chevrolet midsize 22 30 26
## 4 honda subcompact 24 32 28
## 5 honda subcompact 25 32 28.5
## 6 honda subcompact 23 29 26
## 7 honda subcompact 24 32 28
## 8 honda subcompact 26 34 30
## 9 honda subcompact 25 36 30.5
## 10 honda subcompact 24 36 30
## # ℹ 31 more rows
In this chunk we use filter() to narrow the table further, keeping only cars whose avg_mpg is at least 25 and whose class is not missing. We then use slice() with negative indices to drop rows 2 and 5 of the filtered table, which leaves 41 rows.
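Because slice() works on row positions, the order of filter() and slice() matters. The sketch below uses a toy table, invented for illustration, to show that the same two calls give different results when their order is swapped.

```r
library(dplyr)

# Toy data, invented for illustration
toy <- tibble(id = 1:6, avg_mpg = c(20, 25, 26, 24, 30, 27))

# filter() first: positions 2 and 4 refer to the filtered rows
filtered_then_sliced <- toy %>%
  filter(avg_mpg >= 25) %>%
  slice(-c(2, 4))

# slice() first: positions 2 and 4 refer to the original rows
sliced_then_filtered <- toy %>%
  slice(-c(2, 4)) %>%
  filter(avg_mpg >= 25)

filtered_then_sliced$id
sliced_then_filtered$id
```

In the report's chunk, filter() runs first, so the rows removed are the second and fifth of the cars that already passed the 25 mpg cutoff.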
mpgsummary = mpg_data2 %>%
  rename(brand = manufacturer, vehicle_type = class)
mpgsummary
## # A tibble: 41 × 5
## brand vehicle_type cty hwy avg_mpg
## <chr> <chr> <int> <int> <dbl>
## 1 audi compact 21 29 25
## 2 audi compact 21 30 25.5
## 3 chevrolet midsize 22 30 26
## 4 honda subcompact 24 32 28
## 5 honda subcompact 25 32 28.5
## 6 honda subcompact 23 29 26
## 7 honda subcompact 24 32 28
## 8 honda subcompact 26 34 30
## 9 honda subcompact 25 36 30.5
## 10 honda subcompact 24 36 30
## # ℹ 31 more rows
Here we take “mpg_data2” and use rename() to change manufacturer to brand and class to vehicle_type, which are more user friendly. Because we saved the result as “mpgsummary”, we can print it to confirm that the names changed.
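The argument order in rename() is easy to reverse by accident: the new name goes on the left and the existing name on the right. A short sketch on a one-row toy table, invented for illustration:

```r
library(dplyr)

# Toy data, invented for illustration
toy <- tibble(manufacturer = "audi", class = "compact", cty = 21L)

# rename() takes new_name = old_name pairs; other columns are untouched
renamed <- toy %>%
  rename(brand = manufacturer, vehicle_type = class)

names(renamed)
```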
mpg_summary1 <- mpgsummary %>%
  group_by(brand, vehicle_type) %>%
  summarise(avg_of_avg_mpg = mean(avg_mpg, na.rm = TRUE), count = n())
## `summarise()` has grouped output by 'brand'. You can override using the
## `.groups` argument.
mpg_summary1
## # A tibble: 11 × 4
## # Groups: brand [7]
## brand vehicle_type avg_of_avg_mpg count
## <chr> <chr> <dbl> <int>
## 1 audi compact 25.2 2
## 2 chevrolet midsize 26 1
## 3 honda subcompact 28.2 8
## 4 hyundai midsize 25.8 2
## 5 nissan compact 25 1
## 6 nissan midsize 27.2 2
## 7 toyota compact 28.3 8
## 8 toyota midsize 25.7 3
## 9 volkswagen compact 26.6 9
## 10 volkswagen midsize 25 2
## 11 volkswagen subcompact 33.2 3
In this chunk we use the filtered and renamed table to create our final summary table. group_by() forms a group for every combination of brand and vehicle type that appears in the data. summarise() then returns one row per group, with the mean of avg_mpg in avg_of_avg_mpg and the number of cars in count, so the 41 rows collapse to 11. The message under the code is only a note that the result is still grouped by brand.
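The grouping message can be avoided by telling summarise() what to do with the groups. The sketch below runs the same pattern on a toy table, invented for illustration, and adds `.groups = "drop"` so the result comes back ungrouped. This is one option and not the setting used in the report's chunk.

```r
library(dplyr)

# Toy data, invented for illustration
toy <- tibble(
  brand = c("a", "a", "a", "b"),
  vehicle_type = c("compact", "compact", "midsize", "compact"),
  avg_mpg = c(25, 26, 30, 28)
)

toy_summary <- toy %>%
  group_by(brand, vehicle_type) %>%
  summarise(
    avg_of_avg_mpg = mean(avg_mpg, na.rm = TRUE),  # mean within each group
    count = n(),                                   # rows in each group
    .groups = "drop"                               # return an ungrouped tibble
  )

toy_summary
```

Four input rows become three output rows, one for each brand and vehicle type pair that occurs.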