Homework 2

The SPSS data file relig-baseline.sav (relig-baseline.csv) includes data from a religious study. The data were collected from students at a private university in the West, a private university in the Midwest, a public university in the East, and a public university in the South. Using the data, do the following:

How many rows and columns of data are there in the data set?

dim(religstudy)

## [1] 138 373
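
The call above assumes religstudy already exists in the session; the loading step is not shown. A minimal sketch of one way to create such an object from a .csv file follows. The temporary file and its three toy rows are stand-ins invented so the block runs anywhere; with the real data, the path would presumably be "relig-baseline.csv" in the working directory.

```r
library(readr)

# Stand-in file: three made-up rows, NOT the real relig-baseline.csv.
toy_path <- tempfile(fileext = ".csv")
writeLines(c("id,age,depress1", "1,19,0", "2,21,2", "3,20,1"), toy_path)

# With the real data this would presumably be read_csv("relig-baseline.csv").
toy_study <- read_csv(toy_path, show_col_types = FALSE)

dim(toy_study)   # number of rows first, then number of columns
nrow(toy_study)  # rows only
ncol(toy_study)  # columns only
```

dim() reports both counts at once, while nrow() and ncol() are convenient when only one of them is needed.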

Are there any missing values in the participants’ ages? What is the average age of the participants? What is the most frequent age?

religstudy %>%
  summarise(
    missing_ages = sum(is.na(age)),
    mean_age = mean(age, na.rm = TRUE),
    most_frequent_age = age %>%
      na.omit() %>%
      table() %>%
      which.max()
  )

## # A tibble: 1 × 3
##   missing_ages mean_age most_frequent_age
##          <int>    <dbl>             <int>
## 1            0     23.6                 7
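
One caveat about the last column: which.max() applied to a table returns the position of the largest count, not the value it belongs to, so the 7 printed above is the position of the modal age in the sorted table rather than an age. A minimal sketch of recovering the age itself, using a made-up vector (ages is illustrative, not the study data):

```r
# Made-up ages for illustration only.
ages <- c(19, 20, 20, 21, 20, 22, 19)

# table() sorts the distinct ages and counts each one (NA is dropped by default).
age_counts <- table(ages)

# Position of the largest count within the table (here the second cell).
mode_position <- unname(which.max(age_counts))

# The age itself is stored in the names of the table.
modal_age <- as.numeric(names(which.max(age_counts)))

mode_position
modal_age
```

Inside the summarise() call, the same idea would read as.numeric(names(which.max(table(age)))).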

In the data set, the variables depress1, depress2, …, depress21 are measures of depression. Select these columns and save as a subset called relig_depress using the select() and starts_with() functions in Tidyverse.

relig_depress <- religstudy %>%
  select(starts_with("depress"))
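
starts_with("depress") keeps every column whose name begins with that prefix, so it would also pick up any other variable named that way. If that is a concern, num_range() is a stricter alternative. A small sketch on a toy tibble (all columns below, including depress_notes, are hypothetical and not taken from the study data):

```r
library(dplyr)

toy <- tibble(
  age = c(19, 21),
  depress1 = c(0, 2),
  depress2 = c(1, 3),
  depress_notes = c("a", "b")  # hypothetical extra column sharing the prefix
)

# Matches on the prefix alone, so depress_notes is kept as well.
by_prefix <- toy %>% select(starts_with("depress"))

# Matches only the prefix followed by a number in the given range.
by_range <- toy %>% select(num_range("depress", 1:2))

names(by_prefix)
names(by_range)
```

For the homework data, the stricter call would be num_range("depress", 1:21); whether the two selections differ depends on the other column names in the file.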

Calculate the total score of depression by adding the 21 depression variables together for each participant (set na.rm = TRUE) and add this total score as a new variable to the relig_depress subset, and save the new subset as relig_depress_total.

relig_depress_total <- relig_depress %>%
  mutate(
    depress_total = rowSums(across(everything()), na.rm = TRUE)
  )
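
With na.rm = TRUE, a missing item contributes 0 to the total, so a participant who skipped every item still receives a score of 0. A hedged sketch of recording how many items were missing next to the total, on made-up data (toy, items, and n_missing are illustrative names):

```r
library(dplyr)

# Made-up responses: the third participant answered nothing.
toy <- tibble(
  depress1 = c(1, NA, NA),
  depress2 = c(2, 3, NA)
)

# Pick the item columns once, before any new columns are added.
items <- toy %>% select(num_range("depress", 1:2))

toy_total <- toy %>%
  mutate(
    depress_total = rowSums(items, na.rm = TRUE),  # NA counts as 0
    n_missing = rowSums(is.na(items))              # items left blank per row
  )

toy_total
```

Keeping n_missing alongside the total makes it possible to tell a genuine score of 0 apart from a row with no answers.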

Save the relig_depress_total subset to a .csv data file. [Use the write.csv function here.]

write.csv(
  relig_depress_total,
  file = "relig_depress_total.csv",
  row.names = FALSE
)
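
A quick way to confirm the export is to read the file back and compare it with the object in memory. A minimal sketch on a toy data frame written to a temporary file (toy_total, tmp, and read_back are illustrative names):

```r
# Made-up data standing in for relig_depress_total.
toy_total <- data.frame(
  depress1 = c(1, 0),
  depress2 = c(2, 3),
  depress_total = c(3, 3)
)

# Write to a temporary location so nothing in the working directory is touched.
tmp <- tempfile(fileext = ".csv")
write.csv(toy_total, file = tmp, row.names = FALSE)

# Read the file back and inspect its shape and column names.
read_back <- read.csv(tmp)
names(read_back)
nrow(read_back)
```

Setting row.names = FALSE keeps R's row index out of the file, so the column names read back match the original ones.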

In this case study, you’re going to work with a famous dataset, Iris. This dataset contains three plant species (setosa, virginica, versicolor) and four features measured for each sample. Please use the Tidyverse (load the tidyverse library first) to solve the following questions.

The iris dataset is pre-installed in R. Please convert it to a tibble by using the as_tibble function and save the tibble as an object called iris_dat.

iris_dat <- as_tibble(iris)
glimpse(iris_dat)

## Rows: 150
## Columns: 5
## $ Sepal.Length <dbl> 5.1, 4.9, 4.7, 4.6, 5.0, 5.4, 4.6, 5.0, 4.4, 4.9, 5.4, 4.…
## $ Sepal.Width  <dbl> 3.5, 3.0, 3.2, 3.1, 3.6, 3.9, 3.4, 3.4, 2.9, 3.1, 3.7, 3.…
## $ Petal.Length <dbl> 1.4, 1.4, 1.3, 1.5, 1.4, 1.7, 1.4, 1.5, 1.4, 1.5, 1.5, 1.…
## $ Petal.Width  <dbl> 0.2, 0.2, 0.2, 0.2, 0.2, 0.4, 0.3, 0.2, 0.2, 0.1, 0.2, 0.…
## $ Species      <fct> setosa, setosa, setosa, setosa, setosa, setosa, setosa, s…

Use the select() function to keep only three variables—Sepal.Length, Petal.Length, and Species—and arrange them in this order: Species, Sepal.Length, Petal.Length. Save this subset as an object called iris_subset.

iris_subset <- iris_dat %>%
  select(Species, Sepal.Length, Petal.Length)
glimpse(iris_subset)

## Rows: 150
## Columns: 3
## $ Species      <fct> setosa, setosa, setosa, setosa, setosa, setosa, setosa, s…
## $ Sepal.Length <dbl> 5.1, 4.9, 4.7, 4.6, 5.0, 5.4, 4.6, 5.0, 4.4, 4.9, 5.4, 4.…
## $ Petal.Length <dbl> 1.4, 1.4, 1.3, 1.5, 1.4, 1.7, 1.4, 1.5, 1.4, 1.5, 1.5, 1.…

Within iris_subset, use filter() to keep only the rows where Sepal.Length is greater than 6.

iris_subset <- iris_subset %>%
  filter(Sepal.Length > 6)
glimpse(iris_subset)

## Rows: 61
## Columns: 3
## $ Species      <fct> versicolor, versicolor, versicolor, versicolor, versicolo…
## $ Sepal.Length <dbl> 7.0, 6.4, 6.9, 6.5, 6.3, 6.6, 6.1, 6.7, 6.2, 6.1, 6.3, 6.…
## $ Petal.Length <dbl> 4.7, 4.5, 4.9, 4.6, 4.7, 4.6, 4.7, 4.4, 4.5, 4.0, 4.9, 4.…
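
Because the comparison is a strict >, flowers with a sepal length of exactly 6 are dropped. A short sketch contrasting > with >= and counting the surviving rows per species (kept_gt and kept_ge are illustrative names):

```r
library(dplyr)

iris_dat <- as_tibble(iris)

# Strictly greater than 6, as in the answer above.
kept_gt <- iris_dat %>% filter(Sepal.Length > 6)

# Greater than or equal to 6, which also keeps values of exactly 6.
kept_ge <- iris_dat %>% filter(Sepal.Length >= 6)

nrow(kept_gt)  # 61, matching the glimpse() output above
nrow(kept_ge)

# Rows remaining per species after the strict filter.
kept_gt %>% count(Species)
```

count() lists only the species that still have rows, which is why the later grouped summary contains two species rather than three.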

Within iris_subset, compute average Petal.Length by species using group_by() and summarize().

iris_subset %>%
  group_by(Species) %>%
  summarise(
    avg_petal_length = mean(Petal.Length)
  )

## # A tibble: 2 × 2
##   Species    avg_petal_length
##   <fct>                 <dbl>
## 1 versicolor             4.58
## 2 virginica              5.68

Add the average Petal.Length as a new variable (column) to iris_subset.

iris_subset <- iris_subset %>%
  group_by(Species) %>%
  mutate(
    avg_petal_length = mean(Petal.Length)
  ) %>%
  ungroup()
glimpse(iris_subset)

## Rows: 61
## Columns: 4
## $ Species          <fct> versicolor, versicolor, versicolor, versicolor, versi…
## $ Sepal.Length     <dbl> 7.0, 6.4, 6.9, 6.5, 6.3, 6.6, 6.1, 6.7, 6.2, 6.1, 6.3…
## $ Petal.Length     <dbl> 4.7, 4.5, 4.9, 4.6, 4.7, 4.6, 4.7, 4.4, 4.5, 4.0, 4.9…
## $ avg_petal_length <dbl> 4.585000, 4.585000, 4.585000, 4.585000, 4.585000, 4.5…
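
An equivalent route, sketched here as an alternative rather than the required answer, is to compute the per-species means once with summarise() and attach them with left_join(). The names species_means, via_join, and via_mutate are illustrative:

```r
library(dplyr)

# Rebuild the filtered subset so the block runs on its own.
iris_subset <- as_tibble(iris) %>%
  select(Species, Sepal.Length, Petal.Length) %>%
  filter(Sepal.Length > 6)

# One row per species with its mean petal length.
species_means <- iris_subset %>%
  group_by(Species) %>%
  summarise(avg_petal_length = mean(Petal.Length))

# Attach the means to every row by matching on Species.
via_join <- iris_subset %>%
  left_join(species_means, by = "Species")

# The grouped mutate() approach used in the answer above.
via_mutate <- iris_subset %>%
  group_by(Species) %>%
  mutate(avg_petal_length = mean(Petal.Length)) %>%
  ungroup()

# Compare the new column produced by the two approaches.
all.equal(via_join$avg_petal_length, via_mutate$avg_petal_length)
```

The grouped mutate() is shorter; the join version can be handy when the summary table is also needed on its own.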


Johnathon Crince

2026-02-09
