# Load Tidyverse
install.packages("tidyverse")
## Installing package into '/cloud/lib/x86_64-pc-linux-gnu-library/4.5'
## (as 'lib' is unspecified)
install.packages("dplyr")
## Installing package into '/cloud/lib/x86_64-pc-linux-gnu-library/4.5'
## (as 'lib' is unspecified)
library(dplyr)
## 
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union
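
The masking messages above mean that, once dplyr is attached, a bare call to filter() or lag() refers to the dplyr version rather than the one in stats. As a small sketch of how either version can still be reached explicitly with the package::function form (the object names moving_avg and light_cars are illustrative only, not part of the activity):

```r
library(dplyr)

# stats::filter() applies a linear filter (here a 3-point moving average)
# to a numeric series.
moving_avg <- stats::filter(1:5, rep(1 / 3, 3))

# dplyr::filter() keeps the rows of a data frame that satisfy a condition.
light_cars <- dplyr::filter(datasets::mtcars, wt < 2)

as.numeric(moving_avg)
nrow(light_cars)
```

Writing the prefix is optional in this activity, but it removes any doubt about which function is being called.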
## Load the Dataset

For today’s activity, we will use a built-in dataset so that everyone is working with the same data. Load the mtcars dataset.

data("mtcars")

Take a moment to look at the dataset.

head(mtcars)
##                    mpg cyl disp  hp drat    wt  qsec vs am gear carb
## Mazda RX4         21.0   6  160 110 3.90 2.620 16.46  0  1    4    4
## Mazda RX4 Wag     21.0   6  160 110 3.90 2.875 17.02  0  1    4    4
## Datsun 710        22.8   4  108  93 3.85 2.320 18.61  1  1    4    1
## Hornet 4 Drive    21.4   6  258 110 3.08 3.215 19.44  1  0    3    1
## Hornet Sportabout 18.7   8  360 175 3.15 3.440 17.02  0  0    3    2
## Valiant           18.1   6  225 105 2.76 3.460 20.22  1  0    3    1

Inspecting the Data

Before cleaning data, it is important to understand its structure. View the structure of your variables.

str(mtcars)
## 'data.frame':    32 obs. of  11 variables:
##  $ mpg : num  21 21 22.8 21.4 18.7 18.1 14.3 24.4 22.8 19.2 ...
##  $ cyl : num  6 6 4 6 8 6 8 4 4 6 ...
##  $ disp: num  160 160 108 258 360 ...
##  $ hp  : num  110 110 93 110 175 105 245 62 95 123 ...
##  $ drat: num  3.9 3.9 3.85 3.08 3.15 2.76 3.21 3.69 3.92 3.92 ...
##  $ wt  : num  2.62 2.88 2.32 3.21 3.44 ...
##  $ qsec: num  16.5 17 18.6 19.4 17 ...
##  $ vs  : num  0 0 1 1 0 1 0 1 1 1 ...
##  $ am  : num  1 1 1 0 0 0 0 0 0 0 ...
##  $ gear: num  4 4 4 3 3 3 3 4 4 4 ...
##  $ carb: num  4 4 1 1 2 1 4 2 2 4 ...
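
A few other base R calls can complement str() when getting to know a data frame. The following is a minimal sketch; cars_raw is a hypothetical name for an untouched copy of the dataset, used so the example does not depend on any other step.

```r
# Work on a fresh copy so this example is independent of any other changes.
cars_raw <- datasets::mtcars

dim(cars_raw)               # number of rows and columns
names(cars_raw)             # column names
sapply(cars_raw, class)     # storage class of each column
colSums(is.na(cars_raw))    # count of missing values per column
```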

Answer the following questions in text below (not as code):


Cleaning the Data

Some variables in this dataset are stored as numbers even though they represent categories.

Convert Variables to Factors

Convert the following variables to factors:

  • cyl (number of cylinders)

  • am (transmission type)


``` r
mtcars <- mtcars %>%
  mutate(
    cyl = factor(cyl),
    am = factor(am)
  )

Check that the conversion worked by looking at the structure again.

str(mtcars)
## 'data.frame':    32 obs. of  11 variables:
##  $ mpg : num  21 21 22.8 21.4 18.7 18.1 14.3 24.4 22.8 19.2 ...
##  $ cyl : Factor w/ 3 levels "4","6","8": 2 2 1 2 3 2 3 1 1 2 ...
##  $ disp: num  160 160 108 258 360 ...
##  $ hp  : num  110 110 93 110 175 105 245 62 95 123 ...
##  $ drat: num  3.9 3.9 3.85 3.08 3.15 2.76 3.21 3.69 3.92 3.92 ...
##  $ wt  : num  2.62 2.88 2.32 3.21 3.44 ...
##  $ qsec: num  16.5 17 18.6 19.4 17 ...
##  $ vs  : num  0 0 1 1 0 1 0 1 1 1 ...
##  $ am  : Factor w/ 2 levels "0","1": 2 2 2 1 1 1 1 1 1 1 ...
##  $ gear: num  4 4 4 3 3 3 3 4 4 4 ...
##  $ carb: num  4 4 1 1 2 1 4 2 2 4 ...
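
# Optional variation (a sketch, not part of the required steps): factor() also
# accepts a labels argument, which gives the levels readable names for tables
# and plot legends. Per ?mtcars, am is coded 0 = automatic, 1 = manual.
# cars_labeled is a hypothetical name; it starts from a fresh copy of the data
# so the mtcars object converted above is left unchanged.
cars_labeled <- datasets::mtcars
cars_labeled$am <- factor(
  cars_labeled$am,
  levels = c(0, 1),
  labels = c("automatic", "manual")
)
table(cars_labeled$am)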

Selecting Relevant Variables

For visualization, it is often helpful to work with only the variables you need.

Create a new object called cars_clean that contains only mpg, hp, wt, cyl, and am:

cars_clean <- mtcars %>%
  select(mpg, hp, wt, cyl, am)
head(cars_clean)
##                    mpg  hp    wt cyl am
## Mazda RX4         21.0 110 2.620   6  1
## Mazda RX4 Wag     21.0 110 2.875   6  1
## Datsun 710        22.8  93 2.320   4  1
## Hornet 4 Drive    21.4 110 3.215   6  0
## Hornet Sportabout 18.7 175 3.440   8  0
## Valiant           18.1 105 3.460   6  0

Filtering Observations

Now filter the dataset to include only cars with horsepower (hp) greater than 100.

Save the result as a new object called cars_hp.

cars_hp <- cars_clean %>%
  filter(hp > 100)
head(cars_hp)
##                    mpg  hp    wt cyl am
## Mazda RX4         21.0 110 2.620   6  1
## Mazda RX4 Wag     21.0 110 2.875   6  1
## Hornet 4 Drive    21.4 110 3.215   6  0
## Hornet Sportabout 18.7 175 3.440   8  0
## Valiant           18.1 105 3.460   6  0
## Duster 360        14.3 245 3.570   8  0

Check how many rows remain.

str(cars_hp)
## 'data.frame':    23 obs. of  5 variables:
##  $ mpg: num  21 21 21.4 18.7 18.1 14.3 19.2 17.8 16.4 17.3 ...
##  $ hp : num  110 110 110 175 105 245 123 123 180 180 ...
##  $ wt : num  2.62 2.88 3.21 3.44 3.46 ...
##  $ cyl: Factor w/ 3 levels "4","6","8": 2 2 2 3 2 3 2 2 3 3 ...
##  $ am : Factor w/ 2 levels "0","1": 2 2 1 1 1 1 1 1 1 1 ...
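
# Cross-check sketch (base R only): count directly how many cars pass and
# fail the hp > 100 condition. cars_check is a hypothetical name for a fresh
# copy of the data, so the objects created above are not modified.
cars_check <- datasets::mtcars
n_kept <- sum(cars_check$hp > 100)      # rows the filter keeps
n_dropped <- sum(cars_check$hp <= 100)  # rows the filter removes
c(kept = n_kept, dropped = n_dropped, total = nrow(cars_check))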

Creating New Variables

Create a new variable called power_to_weight defined as:


horsepower / weight

Add this variable to cars_hp.

cars_hp <- cars_hp %>%
  mutate(
    power_to_weight = hp / wt
  )
head(cars_hp)
##                    mpg  hp    wt cyl am power_to_weight
## Mazda RX4         21.0 110 2.620   6  1        41.98473
## Mazda RX4 Wag     21.0 110 2.875   6  1        38.26087
## Hornet 4 Drive    21.4 110 3.215   6  0        34.21462
## Hornet Sportabout 18.7 175 3.440   8  0        50.87209
## Valiant           18.1 105 3.460   6  0        30.34682
## Duster 360        14.3 245 3.570   8  0        68.62745
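
# Worked check (a sketch): per ?mtcars, wt is recorded in units of 1000 lbs,
# so hp / wt is horsepower per 1000 lbs. For the Mazda RX4 this is
# 110 / 2.620. ptw_check is a hypothetical name used only for this check.
ptw_check <- datasets::mtcars["Mazda RX4", "hp"] / datasets::mtcars["Mazda RX4", "wt"]
round(ptw_check, 5)  # 41.98473, the value shown for Mazda RX4 above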

Grouping and Summarizing Data

To prepare data for figures, we often summarize values by group.

Summary by Number of Cylinders

Create a summary table that shows:

  • Mean miles per gallon (mpg)

  • Mean horsepower (hp)

  • Number of observations

Grouped by:

  • cyl

Save this as summary_cyl.

summary_cyl <- cars_hp %>%
  group_by(cyl) %>%
  summarize(
    mean_mpg = mean(mpg),
    mean_hp = mean(hp),
    nrows = n()  # n() counts the rows in each group
  )

Display the table.

head(summary_cyl)
## # A tibble: 3 × 4
##   cyl   mean_mpg mean_hp nrows
##   <fct>    <dbl>   <dbl> <int>
## 1 4         25.9    111      2
## 2 6         19.7    122.     7
## 3 8         15.1    209.    14
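
# Cross-check sketch in base R: inside summarize(), n() counts the rows of the
# current group, whereas nrow() on the whole data frame would repeat the
# overall total for every group. table() gives the per-group counts directly.
# cars_big is a hypothetical name for a fresh, filtered copy of the data.
cars_big <- subset(datasets::mtcars, hp > 100)
group_sizes <- table(cars_big$cyl)
group_sizes
sum(group_sizes)  # equals nrow(cars_big)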

Summary by Transmission Type

Now create a second summary table grouped by am (transmission type).

Include:

Save this as summary_transmission.

summary_transmission <- cars_hp %>%
  group_by(am) %>%
  summarize(
    mean_mpg = mean(mpg),
    mean_power_to_weight_ratio = mean(power_to_weight)
  )
head(summary_transmission)
## # A tibble: 2 × 3
##   am    mean_mpg mean_power_to_weight_ratio
##   <fct>    <dbl>                      <dbl>
## 1 0         16.1                       44.6
## 2 1         20.6                       62.1

```


Interpreting the Summaries

Answer the following questions in text:


Reflection

In a short paragraph, describe:

Final Check

Before submitting, confirm that: