Assignment 1

Author

Zac Howe

We will be using USM peer graduation rate comparison data from the UMS Dashboard.

Exercises

There are eleven exercises in this assignment.

Exercise 1

Load the tidyverse family of packages and read the graduation.csv file into a tibble called graduation. Suppress the message generated from loading tidyverse. The file is available at https://jsuleiman.com/datasets/graduation.csv

Exercise 1 has already been completed for you. It won’t be for future assignments.

library(tidyverse)
graduation <- read_csv("https://jsuleiman.com/datasets/graduation.csv")
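
Exercise 1 asks for the tidyverse startup message to be suppressed, and read_csv() prints a column specification message of its own. The sketch below shows one way to quiet both. It assumes a Quarto code chunk (the `#| message: false` option) and uses a small inline CSV with made-up values, named grad_sample here for illustration, so that it runs without a network connection. It attaches only readr, which library(tidyverse) would also attach.

```r
#| message: false
# The chunk option above hides messages in the rendered document (Quarto assumed).

# suppressPackageStartupMessages() silences the attach banner directly.
suppressPackageStartupMessages(library(readr))

# Hypothetical two-row sample standing in for graduation.csv.
csv_text <- I("year,peer_institution,six_yr_grad_rate\n2018,School A,3400.00%\n2018,School B,5200.00%\n")

# show_col_types = FALSE quiets the column specification message.
grad_sample <- read_csv(csv_text, show_col_types = FALSE)
grad_sample
```

Either approach on its own is usually enough; the chunk option hides every message from the chunk, while the function arguments target one message each.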

Exercise 2

glimpse() the graduation data in the code cell below.

# Load necessary library
suppressMessages(library(tidyverse))

# Read the graduation.csv file into a tibble
graduation <- read_csv("https://jsuleiman.com/datasets/graduation.csv")

Rows: 77 Columns: 5
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (2): peer_institution, six_yr_grad_rate
dbl (3): year, four_yr_grad_rate, retention

ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

# Use glimpse to inspect the dataset structure
glimpse(graduation)

Rows: 77
Columns: 5
$ year              <dbl> 2012, 2013, 2014, 2015, 2016, 2017, 2018, 2012, 2013…
$ peer_institution  <chr> "University of Southern Maine", "University of South…
$ four_yr_grad_rate <dbl> 17, 10, 9, 10, 13, 13, 14, 11, 9, 10, 9, 9, 11, 13, …
$ retention         <dbl> 64, 67, 65, 64, 63, 68, 70, 74, 80, 72, 72, 68, 77, …
$ six_yr_grad_rate  <chr> "3500.00%", "3300.00%", "3000.00%", "3400.00%", "320…

Describe any potential issues with six_yr_grad_rate in your narrative for this exercise.

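Potential issues with six_yr_grad_rate: it is stored as character (chr) rather than numeric, each value carries a trailing % sign, and the values appear inflated by a factor of 100 (for example “3500.00%”). The column cannot be sorted, summarized, or plotted as a number until it is converted.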

Exercise 3

Modify graduation so the data type for six_yr_grad_rate is dbl. Overwrite the existing tibble.

Exercise 3 has already been completed for you since we haven’t fully covered how to do this. The mutate() function should look familiar but parse_number() is new.

graduation <- graduation |> 
  mutate(six_yr_grad_rate = parse_number(six_yr_grad_rate)/100)
glimpse(graduation)

Rows: 77
Columns: 5
$ year              <dbl> 2012, 2013, 2014, 2015, 2016, 2017, 2018, 2012, 2013…
$ peer_institution  <chr> "University of Southern Maine", "University of South…
$ four_yr_grad_rate <dbl> 17, 10, 9, 10, 13, 13, 14, 11, 9, 10, 9, 9, 11, 13, …
$ retention         <dbl> 64, 67, 65, 64, 63, 68, 70, 74, 80, 72, 72, 68, 77, …
$ six_yr_grad_rate  <dbl> 35, 33, 30, 34, 32, 33, 34, 39, 37, 36, 38, 34, 38, …
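
To see what parse_number() does in Exercise 3, it can help to run it on a few strings in the same format as the raw column. This is a small illustration on a hand-typed vector (the name raw is made up for the example), not on the full dataset.

```r
library(readr)

# Values in the same format as the raw six_yr_grad_rate column.
raw <- c("3500.00%", "3300.00%", "3000.00%")

# parse_number() drops the non-numeric characters around the number.
parse_number(raw)        # 3500 3300 3000

# Dividing by 100 undoes the inflated scale seen in the raw file.
parse_number(raw) / 100  # 35 33 30
```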

Exercise 4

List the peer schools for University of Southern Maine without duplicates and without listing University of Southern Maine. For these exercises, if there is no code chunk (like this one), position your cursor in the whitespace above the next exercise heading and select Insert -> Executable Cell -> R.

# Exclude University of Southern Maine, then list each remaining peer institution once
graduation |> 
  filter(peer_institution != "University of Southern Maine") |> 
  distinct(peer_institution)

# A tibble: 10 × 1
   peer_institution                           
   <chr>                                      
 1 University of Michigan-Flint               
 2 University of Arkansas at Little Rock      
 3 Texas Woman's University                   
 4 Texas A & M University-Kingsville          
 5 Salem State University                     
 6 North Carolina Central University          
 7 Murray State University                    
 8 Fayetteville State University              
 9 Chicago State University                   
10 California State University-Dominguez Hills

If there is no whitespace, position your cursor to the left of the E in Exercise and hit enter to create a blank line.

Exercise 5

List the top two six_yr_grad_rate values and the institution for each. Filter by the most recent year in the dataset.

# Get the top two institutions by six year graduation rate for the most recent year
graduation |> 
  filter(year == 2018) |> 
  slice_max(six_yr_grad_rate, n = 2) |> 
  select(peer_institution, six_yr_grad_rate)

# A tibble: 2 × 2
  peer_institution        six_yr_grad_rate
  <chr>                              <dbl>
1 Salem State University                52
2 Murray State University               49
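
The code above hard-codes 2018 as the most recent year. As a hedged alternative, the filter can compare against max(year) so that it keeps working if later years are added. The sketch uses a made-up tibble, grad_demo, with hypothetical schools and rates rather than the real data.

```r
library(dplyr)

# Hypothetical stand-in for the graduation tibble (made-up values).
grad_demo <- tibble(
  year = c(2017, 2017, 2018, 2018, 2018),
  peer_institution = c("School A", "School B", "School A", "School B", "School C"),
  six_yr_grad_rate = c(40, 30, 42, 31, 50)
)

top_two <- grad_demo |>
  filter(year == max(year)) |>           # most recent year, not hard-coded
  slice_max(six_yr_grad_rate, n = 2) |>  # two highest rates in that year
  select(peer_institution, six_yr_grad_rate)

top_two
```

Note that slice_max() keeps ties by default, so it can return more than two rows when rates are equal.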

Exercise 6

List the top two four_yr_grad_rate values and the institution for each. Filter by the most recent year in the dataset.

# List the top two four year graduation rates and institutions for the most recent year
graduation |> 
  filter(year == 2018) |> 
  slice_max(four_yr_grad_rate, n = 2) |> 
  select(peer_institution, four_yr_grad_rate)

# A tibble: 2 × 2
  peer_institution        four_yr_grad_rate
  <chr>                               <dbl>
1 Salem State University                 28
2 Murray State University                25

Exercise 7

Use the arrange function arrange(desc(six_yr_grad_rate)) to show the six_yr_grad_rate for the most recent year in the dataset. In the narrative below the code, mention where USM ranks in the list.

# Use arrange to show the six year graduation rate for the most recent year
graduation |> 
  filter(year == 2018) |> 
  arrange(desc(six_yr_grad_rate)) |> 
  select(peer_institution, six_yr_grad_rate)

# A tibble: 11 × 2
   peer_institution                            six_yr_grad_rate
   <chr>                                                  <dbl>
 1 Salem State University                                    52
 2 Murray State University                                   49
 3 North Carolina Central University                         43
 4 California State University-Dominguez Hills               42
 5 Texas Woman's University                                  38
 6 University of Michigan-Flint                              37
 7 University of Southern Maine                              34
 8 Fayetteville State University                             32
 9 Texas A & M University-Kingsville                         29
10 University of Arkansas at Little Rock                     28
11 Chicago State University                                  14

USM ranks 7th of the 11 institutions in the list above, with a six-year graduation rate of 34.
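
The rank can also be computed rather than read off the table. The sketch below retypes the 2018 values from the output above into a tibble (the name rates_2018 is made up for the example) and uses dplyr's min_rank() with desc() so that the highest rate gets rank 1.

```r
library(dplyr)

# 2018 six-year rates retyped from the Exercise 7 output.
rates_2018 <- tibble(
  peer_institution = c(
    "Salem State University", "Murray State University",
    "North Carolina Central University",
    "California State University-Dominguez Hills",
    "Texas Woman's University", "University of Michigan-Flint",
    "University of Southern Maine", "Fayetteville State University",
    "Texas A & M University-Kingsville",
    "University of Arkansas at Little Rock", "Chicago State University"
  ),
  six_yr_grad_rate = c(52, 49, 43, 42, 38, 37, 34, 32, 29, 28, 14)
)

# Rank 1 is the highest rate.
usm_rank <- rates_2018 |>
  mutate(rank = min_rank(desc(six_yr_grad_rate))) |>
  filter(peer_institution == "University of Southern Maine") |>
  pull(rank)

usm_rank  # 7
```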

Exercise 8

Using ggplot() and geom_line() create a line plot of six_yr_grad_rate over time. The x-axis should be year. The line should be colored by peer_institution. Make sure only the graph displays, not the code.

# Use ggplot() and geom_line() to plot six_yr_grad_rate over time, with year on the x-axis and lines colored by peer_institution
graduation |> 
  ggplot(aes(x = year, y = six_yr_grad_rate, color = peer_institution)) + 
  geom_line()
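
Exercise 8 asks that only the graph display, not the code. In a Quarto document this is usually handled with the `echo: false` chunk option, which does not survive into the rendered output. The sketch below shows where the option would go, assuming a Quarto code chunk and using a tiny made-up data frame (plot_demo) in place of graduation.

```r
#| echo: false
# With echo: false, Quarto renders the plot but hides this code (Quarto assumed).
library(ggplot2)

# Hypothetical data standing in for the graduation tibble.
plot_demo <- data.frame(
  year = c(2017, 2018, 2017, 2018),
  six_yr_grad_rate = c(40, 42, 30, 31),
  peer_institution = c("School A", "School A", "School B", "School B")
)

p <- ggplot(plot_demo, aes(x = year, y = six_yr_grad_rate, color = peer_institution)) +
  geom_line()
p
```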

Exercise 9

There are too many institutions on the list. The code below will filter the list to only show the top and bottom two institutions along with USM. It creates a new tibble called top_bottom_usm. Use top_bottom_usm to create a line plot of six_yr_grad_rate over time. The x-axis should be year. The line should be colored by peer_institution. Make sure only the graph displays, not the code.

top <-
  graduation |> 
  filter(year == 2018) |>
  slice_max(six_yr_grad_rate, n = 2) |> 
  select(peer_institution) |> 
  pull()

bottom <-
  graduation |> 
  filter(year == 2018) |>
  slice_min(six_yr_grad_rate, n = 2) |> 
  select(peer_institution) |> 
  pull()

top_bottom_usm <-
  graduation |>
  filter(peer_institution %in% c(top, bottom, "University of Southern Maine"))
  
# Use top_bottom_usm to create a line plot of six year graduation rate over time. The x-axis should be `year`. The line should be colored by `peer_institution`
top_bottom_usm |> 
  ggplot(aes(x = year, y = six_yr_grad_rate, color = peer_institution)) + 
  geom_line()

Exercise 10

The lines are a bit thin. With Copilot enabled, create a code chunk that adds a comment line that reads: # make the lines in the prior line plot thicker. Describe what happened in the narrative below the code. If a warning is generated, describe that as well.

# make the lines in the prior line plot thicker
top_bottom_usm |> 
  ggplot(aes(x = year, y = six_yr_grad_rate, color = peer_institution)) + 
  geom_line(size = 1.5)

Warning: Using `size` aesthetic for lines was deprecated in ggplot2 3.4.0.
ℹ Please use `linewidth` instead.

The code that followed the comment thickens the lines by passing size = 1.5 to geom_line(). When running the code, this warning showed up in the console: “Warning: Using size aesthetic for lines was deprecated in ggplot2 3.4.0. Please use linewidth instead.”
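
The warning names its own fix: pass linewidth instead of size. A minimal sketch of that change, assuming ggplot2 3.4.0 or later and using a small made-up data frame (line_demo) in place of top_bottom_usm:

```r
library(ggplot2)

# Hypothetical data standing in for top_bottom_usm.
line_demo <- data.frame(
  year = c(2017, 2018, 2017, 2018),
  six_yr_grad_rate = c(40, 42, 30, 31),
  peer_institution = c("School A", "School A", "School B", "School B")
)

# linewidth controls line thickness and replaces size for lines.
thick_plot <- ggplot(line_demo, aes(x = year, y = six_yr_grad_rate, color = peer_institution)) +
  geom_line(linewidth = 1.5)
thick_plot
```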

Exercise 11

Take the graph from the prior exercise, add a title, and provide a better label for the y-axis. Use a minimal theme and if the code generates a warning either fix it or suppress the warning.

# Take the graph from the prior exercise, add a title, and provide a better label for the y-axis. Use a minimal theme
top_bottom_usm |> 
  ggplot(aes(x = year, y = six_yr_grad_rate, color = peer_institution)) + 
  geom_line(linewidth = 1.5) + 
  labs(title = "Six Year Graduation Rate Over Time", y = "Six Year Graduation Rate") + 
  theme_minimal()

Submission

To submit your assignment:

Change the author name to your name in the YAML portion at the top of this document.
Render your document to HTML and publish it to RPubs.
Submit the link to your RPubs document in the Brightspace comments section for this assignment.
Click on the “Add a File” button and upload your .qmd file for this assignment to Brightspace.