Data Management

Loading the Data Set

library("tidyverse")

## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr     1.1.4     ✔ readr     2.1.6
## ✔ forcats   1.0.1     ✔ stringr   1.6.0
## ✔ ggplot2   4.0.1     ✔ tibble    3.3.0
## ✔ lubridate 1.9.4     ✔ tidyr     1.3.2
## ✔ purrr     1.2.0     
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors

mpg

## # A tibble: 234 × 11
##    manufacturer model      displ  year   cyl trans drv     cty   hwy fl    class
##    <chr>        <chr>      <dbl> <int> <int> <chr> <chr> <int> <int> <chr> <chr>
##  1 audi         a4           1.8  1999     4 auto… f        18    29 p     comp…
##  2 audi         a4           1.8  1999     4 manu… f        21    29 p     comp…
##  3 audi         a4           2    2008     4 manu… f        20    31 p     comp…
##  4 audi         a4           2    2008     4 auto… f        21    30 p     comp…
##  5 audi         a4           2.8  1999     6 auto… f        16    26 p     comp…
##  6 audi         a4           2.8  1999     6 manu… f        18    26 p     comp…
##  7 audi         a4           3.1  2008     6 auto… f        18    27 p     comp…
##  8 audi         a4 quattro   1.8  1999     4 manu… 4        18    26 p     comp…
##  9 audi         a4 quattro   1.8  1999     4 auto… 4        16    25 p     comp…
## 10 audi         a4 quattro   2    2008     4 manu… 4        20    28 p     comp…
## # ℹ 224 more rows

summary(mpg)

##  manufacturer          model               displ            year     
##  Length:234         Length:234         Min.   :1.600   Min.   :1999  
##  Class :character   Class :character   1st Qu.:2.400   1st Qu.:1999  
##  Mode  :character   Mode  :character   Median :3.300   Median :2004  
##                                        Mean   :3.472   Mean   :2004  
##                                        3rd Qu.:4.600   3rd Qu.:2008  
##                                        Max.   :7.000   Max.   :2008  
##       cyl           trans               drv                 cty       
##  Min.   :4.000   Length:234         Length:234         Min.   : 9.00  
##  1st Qu.:4.000   Class :character   Class :character   1st Qu.:14.00  
##  Median :6.000   Mode  :character   Mode  :character   Median :17.00  
##  Mean   :5.889                                         Mean   :16.86  
##  3rd Qu.:8.000                                         3rd Qu.:19.00  
##  Max.   :8.000                                         Max.   :35.00  
##       hwy             fl               class          
##  Min.   :12.00   Length:234         Length:234        
##  1st Qu.:18.00   Class :character   Class :character  
##  Median :24.00   Mode  :character   Mode  :character  
##  Mean   :23.44                                        
##  3rd Qu.:27.00                                        
##  Max.   :44.00

str(mpg)

## tibble [234 × 11] (S3: tbl_df/tbl/data.frame)
##  $ manufacturer: chr [1:234] "audi" "audi" "audi" "audi" ...
##  $ model       : chr [1:234] "a4" "a4" "a4" "a4" ...
##  $ displ       : num [1:234] 1.8 1.8 2 2 2.8 2.8 3.1 1.8 1.8 2 ...
##  $ year        : int [1:234] 1999 1999 2008 2008 1999 1999 2008 1999 1999 2008 ...
##  $ cyl         : int [1:234] 4 4 4 4 6 6 6 4 4 4 ...
##  $ trans       : chr [1:234] "auto(l5)" "manual(m5)" "manual(m6)" "auto(av)" ...
##  $ drv         : chr [1:234] "f" "f" "f" "f" ...
##  $ cty         : int [1:234] 18 21 20 21 16 18 18 18 16 20 ...
##  $ hwy         : int [1:234] 29 29 31 30 26 26 27 26 25 28 ...
##  $ fl          : chr [1:234] "p" "p" "p" "p" ...
##  $ class       : chr [1:234] "compact" "compact" "compact" "compact" ...

names(mpg)

##  [1] "manufacturer" "model"        "displ"        "year"         "cyl"         
##  [6] "trans"        "drv"          "cty"          "hwy"          "fl"          
## [11] "class"

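We load the tidyverse, which attaches dplyr (the data-manipulation functions used below) and ggplot2 (which supplies the mpg data set used in this project). We then use summary(), str(), and names() to confirm that the data loaded as expected: 234 rows, 11 columns, and the column names and types we plan to work with.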
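As an optional extra check that is not part of the original workflow, we can count the missing values in each column of mpg before filtering. This sketch assumes only that mpg comes from ggplot2, as it does when the tidyverse is attached.

```r
library(ggplot2)  # provides the mpg data set (also attached by library("tidyverse"))

# is.na() on a data frame returns a logical matrix; colSums() counts the TRUEs per column
na_counts <- colSums(is.na(mpg))
na_counts
```

If every count is 0, the !is.na(class) condition used later acts as a safeguard rather than something that changes the result.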

Filtered Data

mpg_data = mpg %>%
  select(manufacturer, class, cty, hwy) %>%
  mutate(avg_mpg = as.numeric((cty + hwy) / 2))
mpg_data

## # A tibble: 234 × 5
##    manufacturer class     cty   hwy avg_mpg
##    <chr>        <chr>   <int> <int>   <dbl>
##  1 audi         compact    18    29    23.5
##  2 audi         compact    21    29    25  
##  3 audi         compact    20    31    25.5
##  4 audi         compact    21    30    25.5
##  5 audi         compact    16    26    21  
##  6 audi         compact    18    26    22  
##  7 audi         compact    18    27    22.5
##  8 audi         compact    18    26    22  
##  9 audi         compact    16    25    20.5
## 10 audi         compact    20    28    24  
## # ℹ 224 more rows

We store the result in a new table, “mpg_data”, and pipe mpg into it with %>% so that each step reads from left to right. The select() function keeps only the four columns we need (manufacturer, class, cty, and hwy), and mutate() adds a new column, “avg_mpg”, equal to the mean of the city and highway mileage, (cty + hwy) / 2. For the first row this is (18 + 29) / 2 = 23.5.
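A small worked example of the avg_mpg formula, using the cty and hwy values from the first three rows of mpg. The toy tibble is our own construction for illustration:

```r
library(dplyr)

# Toy tibble holding the cty/hwy values of the first three rows of mpg
toy <- tibble(
  manufacturer = c("audi", "audi", "audi"),
  cty = c(18L, 21L, 20L),
  hwy = c(29L, 29L, 31L)
)

# Same formula as the chunk above: the mean of city and highway mpg
toy <- toy %>%
  mutate(avg_mpg = (cty + hwy) / 2)

toy$avg_mpg           # 23.5 25.0 25.5, matching rows 1 to 3 of mpg_data
is.double(toy$avg_mpg) # TRUE: / on integers already returns a double
```

Because / always returns a double in R, the as.numeric() wrapper in the chunk above does not change the result; it is harmless but optional.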

Selecting Specific Criteria

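mpg_data2 = mpg_data %>%
  filter(avg_mpg >= 25, !is.na(class)) %>%  # keep cars averaging at least 25 mpg with a known class
  slice(-c(2, 5))                            # drop rows 2 and 5 of the filtered result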
mpg_data2

## # A tibble: 41 × 5
##    manufacturer class        cty   hwy avg_mpg
##    <chr>        <chr>      <int> <int>   <dbl>
##  1 audi         compact       21    29    25  
##  2 audi         compact       21    30    25.5
##  3 chevrolet    midsize       22    30    26  
##  4 honda        subcompact    24    32    28  
##  5 honda        subcompact    25    32    28.5
##  6 honda        subcompact    23    29    26  
##  7 honda        subcompact    24    32    28  
##  8 honda        subcompact    26    34    30  
##  9 honda        subcompact    25    36    30.5
## 10 honda        subcompact    24    36    30  
## # ℹ 31 more rows

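In this chunk we use filter() to narrow the table to cars whose average mpg (avg_mpg) is at least 25, and to drop any row with a missing class value (!is.na(class)). We then use slice(-c(2, 5)) to remove rows 2 and 5. Note that slice() counts positions in the already filtered table, not in the original mpg data, which leaves the 41 rows shown here.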
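To see why the order of filter() and slice() matters, here is a sketch on hypothetical toy data (the values are made up, not taken from mpg). The id column lets us check which rows survive:

```r
library(dplyr)

# Hypothetical toy data (not from mpg): a row id plus an average mpg value
toy <- tibble(
  id      = 1:8,
  avg_mpg = c(26, 20, 27, 28, 19, 30, 31, 25)
)

result <- toy %>%
  filter(avg_mpg >= 25) %>%  # keeps ids 1, 3, 4, 6, 7, 8
  slice(-c(2, 5))            # drops positions 2 and 5 of the filtered rows: ids 3 and 7

result$id
# [1] 1 4 6 8
```

If slice() ran before filter(), rows 2 and 5 of the unfiltered data would be removed instead, which can give a different result.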

Renaming

mpgsummary = mpg_data2 %>%
  rename(brand = manufacturer, vehicle_type = class)
mpgsummary

## # A tibble: 41 × 5
##    brand     vehicle_type   cty   hwy avg_mpg
##    <chr>     <chr>        <int> <int>   <dbl>
##  1 audi      compact         21    29    25  
##  2 audi      compact         21    30    25.5
##  3 chevrolet midsize         22    30    26  
##  4 honda     subcompact      24    32    28  
##  5 honda     subcompact      25    32    28.5
##  6 honda     subcompact      23    29    26  
##  7 honda     subcompact      24    32    28  
##  8 honda     subcompact      26    34    30  
##  9 honda     subcompact      25    36    30.5
## 10 honda     subcompact      24    36    30  
## # ℹ 31 more rows

Here we take “mpg_data2” and use rename() to give the manufacturer and class columns friendlier names, brand and vehicle_type (the syntax is new_name = old_name). Because we saved the result as “mpgsummary”, we can print it to confirm that the columns were renamed.
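A minimal sketch of rename() on a two-row toy tibble (values borrowed from the output above) shows the new_name = old_name order and that the data itself is untouched:

```r
library(dplyr)

# Toy tibble with the two columns we want to rename
toy <- tibble(
  manufacturer = c("audi", "honda"),
  class        = c("compact", "subcompact")
)

renamed <- toy %>%
  rename(brand = manufacturer, vehicle_type = class)  # new_name = old_name

names(renamed)      # "brand" "vehicle_type"
renamed$brand       # "audi" "honda": only the column name changed
```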

Final Table

mpg_summary1 <- mpgsummary %>%
  group_by(brand, vehicle_type) %>%
  summarise(avg_of_avg_mpg = mean(avg_mpg, na.rm = TRUE), count = n())

## `summarise()` has grouped output by 'brand'. You can override using the
## `.groups` argument.

mpg_summary1

## # A tibble: 11 × 4
## # Groups:   brand [7]
##    brand      vehicle_type avg_of_avg_mpg count
##    <chr>      <chr>                 <dbl> <int>
##  1 audi       compact                25.2     2
##  2 chevrolet  midsize                26       1
##  3 honda      subcompact             28.2     8
##  4 hyundai    midsize                25.8     2
##  5 nissan     compact                25       1
##  6 nissan     midsize                27.2     2
##  7 toyota     compact                28.3     8
##  8 toyota     midsize                25.7     3
##  9 volkswagen compact                26.6     9
## 10 volkswagen midsize                25       2
## 11 volkswagen subcompact             33.2     3

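In this chunk we use our updated and filtered table to create our final summary table. We use group_by() to split the data into every brand and vehicle type combination that appears in the table, and summarise() then collapses each group to a single row: avg_of_avg_mpg is the mean of avg_mpg within the group, and count (from n()) is the number of cars in that group. This turns the 41 individual cars into 11 brand and vehicle type rows. The message printed after the chunk means the result is still grouped by brand; it can be controlled with the .groups argument of summarise().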
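A sketch of the same grouping step on hypothetical toy rows shaped like mpgsummary (the avg_mpg values are made up). Unlike the chunk above, it passes .groups = "drop" so the result comes back ungrouped with no message, and sorts the groups from highest to lowest average:

```r
library(dplyr)

# Hypothetical toy rows shaped like mpgsummary
toy <- tibble(
  brand        = c("honda", "honda", "toyota", "toyota", "toyota"),
  vehicle_type = c("subcompact", "subcompact", "compact", "compact", "midsize"),
  avg_mpg      = c(28, 30, 27, 29, 26)
)

summary_tbl <- toy %>%
  group_by(brand, vehicle_type) %>%
  summarise(
    avg_of_avg_mpg = mean(avg_mpg, na.rm = TRUE),  # mean within each brand/type pair
    count = n(),                                    # number of rows in each pair
    .groups = "drop"                                # return an ungrouped tibble, no message
  ) %>%
  arrange(desc(avg_of_avg_mpg))                     # best average mpg first

summary_tbl
```

Here the five toy rows collapse into three groups: honda subcompact (mean 29, count 2), toyota compact (28, 2), and toyota midsize (26, 1).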