library(tidyverse)
graduation <- read_csv("https://jsuleiman.com/datasets/graduation.csv")
Assignment 1
We will be using USM peer graduation rate comparison data from the UMS Dashboard.
Exercises
There are eleven exercises in this assignment.
Exercise 1
Load the tidyverse family of packages and read the graduation.csv file into a tibble called graduation. Suppress the message generated from loading tidyverse. The file is available at https://jsuleiman.com/datasets/graduation.csv
Exercise 1 has already been completed for you. It won’t be for future assignments.
Exercise 2
glimpse() the graduation data in the code cell below.
glimpse(graduation)
Rows: 77
Columns: 5
$ year <dbl> 2012, 2013, 2014, 2015, 2016, 2017, 2018, 2012, 2013…
$ peer_institution <chr> "University of Southern Maine", "University of South…
$ four_yr_grad_rate <dbl> 17, 10, 9, 10, 13, 13, 14, 11, 9, 10, 9, 9, 11, 13, …
$ retention <dbl> 64, 67, 65, 64, 63, 68, 70, 74, 80, 72, 72, 68, 77, …
$ six_yr_grad_rate <chr> "3500.00%", "3300.00%", "3000.00%", "3400.00%", "320…
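Describe any potential issues with six_yr_grad_rate in your narrative for this exercise.
Unlike the other rate columns, six_yr_grad_rate is stored as a character column (<chr>) with values such as "3500.00%". The percent sign prevents numeric comparison and sorting, and the values are 100 times larger than the scale used by four_yr_grad_rate and retention, which makes them easy to misinterpret.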
Exercise 3
Modify graduation so the data type for six_yr_grad_rate is dbl. Overwrite the existing tibble.
Exercise 3 has already been completed for you since we haven’t fully covered how to do this. The mutate() function should look familiar, but parse_number() is new.
graduation <- graduation |>
  mutate(six_yr_grad_rate = parse_number(six_yr_grad_rate) / 100)
glimpse(graduation)
Rows: 77
Columns: 5
$ year <dbl> 2012, 2013, 2014, 2015, 2016, 2017, 2018, 2012, 2013…
$ peer_institution <chr> "University of Southern Maine", "University of South…
$ four_yr_grad_rate <dbl> 17, 10, 9, 10, 13, 13, 14, 11, 9, 10, 9, 9, 11, 13, …
$ retention <dbl> 64, 67, 65, 64, 63, 68, 70, 74, 80, 72, 72, 68, 77, …
$ six_yr_grad_rate <dbl> 35, 33, 30, 34, 32, 33, 34, 39, 37, 36, 38, 34, 38, …
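Since parse_number() is new here, the following is a minimal sketch of what it does to a few of the raw strings shown in the first glimpse() output. The names raw_rates, parsed, and rates are illustrative and not part of the assignment.

```r
library(readr)

# Sample strings copied from the first glimpse() output for six_yr_grad_rate
raw_rates <- c("3500.00%", "3300.00%", "3000.00%")

# parse_number() drops non-numeric characters such as "%" and returns a double
parsed <- parse_number(raw_rates)

# Dividing by 100 puts the values on the same scale as four_yr_grad_rate
rates <- parsed / 100
rates
#> [1] 35 33 30
```

These match the first three six_yr_grad_rate values in the glimpse() output after the conversion.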
Exercise 4
List the peer schools for University of Southern Maine without duplicates and without listing University of Southern Maine. For these exercises, if there is no code chunk (as is the case here), position your cursor in the whitespace above the next exercise heading and select Insert -> Executable Cell -> R.
If there is no whitespace, position your cursor to the left of the E in Exercise and press Enter to create a blank line.
graduation |>
  pull(peer_institution) |>
  unique() |>
  setdiff("University of Southern Maine")
[1] "University of Michigan-Flint"
[2] "University of Arkansas at Little Rock"
[3] "Texas Woman's University"
[4] "Texas A & M University-Kingsville"
[5] "Salem State University"
[6] "North Carolina Central University"
[7] "Murray State University"
[8] "Fayetteville State University"
[9] "Chicago State University"
[10] "California State University-Dominguez Hills"
Exercise 5
List the top two six_yr_grad_rate values and the institution for each. Filter by the most recent year in the dataset.
graduation |>
  filter(year == 2018) |>
  slice_max(six_yr_grad_rate, n = 2)
# A tibble: 2 × 5
year peer_institution four_yr_grad_rate retention six_yr_grad_rate
<dbl> <chr> <dbl> <dbl> <dbl>
1 2018 Salem State University 28 74 52
2 2018 Murray State University 25 79 49
Exercise 6
List the top two four_yr_grad_rate values and the institution for each. Filter by the most recent year in the dataset.
graduation |>
  filter(year == 2018) |>
  slice_max(four_yr_grad_rate, n = 2)
# A tibble: 2 × 5
year peer_institution four_yr_grad_rate retention six_yr_grad_rate
<dbl> <chr> <dbl> <dbl> <dbl>
1 2018 Salem State University 28 74 52
2 2018 Murray State University 25 79 49
Exercise 7
Use the arrange function, arrange(desc(six_yr_grad_rate)), to show the six_yr_grad_rate for the most recent year in the dataset. In the narrative below the code, mention where USM ranks in the list.
graduation |>
  arrange(desc(six_yr_grad_rate)) |>
  filter(year == 2018) |>
  pull(six_yr_grad_rate, peer_institution)
Salem State University
52
Murray State University
49
North Carolina Central University
43
California State University-Dominguez Hills
42
Texas Woman's University
38
University of Michigan-Flint
37
University of Southern Maine
34
Fayetteville State University
32
Texas A & M University-Kingsville
29
University of Arkansas at Little Rock
28
Chicago State University
14
"USM ranks 7th in the list."
[1] "USM ranks 7th in the list."
Exercise 8
Using ggplot() and geom_line(), create a line plot of six_yr_grad_rate over time. The x-axis should be year. The line should be colored by peer_institution. Make sure only the graph displays, not the code.
graduation |>
  ggplot(aes(x = year, y = six_yr_grad_rate, color = peer_institution)) +
  geom_line()
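One way to satisfy "only the graph displays, not the code" is a Quarto cell option, written as a #| comment on the first line of the cell. The sketch below assumes the first seven rows of the glimpse() output are the USM values for 2012 to 2018 and uses a small illustrative data frame named usm; the assignment itself would pipe the graduation tibble instead.

```r
#| echo: false

library(ggplot2)

# Assumed USM values, copied from the six_yr_grad_rate glimpse() output
usm <- data.frame(
  year = 2012:2018,
  six_yr_grad_rate = c(35, 33, 30, 34, 32, 33, 34)
)

# With echo: false, the rendered document shows the plot but hides this code
p <- ggplot(usm, aes(x = year, y = six_yr_grad_rate)) +
  geom_line()
p
```

Run outside Quarto, the #| line is treated as an ordinary comment, so the code still works in a plain R session.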
Exercise 9
There are too many institutions on the list. The code below will filter the list to only show the top and bottom two institutions along with USM. It creates a new tibble called top_bottom_usm. Use top_bottom_usm to create a line plot of six_yr_grad_rate over time. The x-axis should be year. The line should be colored by peer_institution. Make sure only the graph displays, not the code.
top <-
  graduation |>
  filter(year == 2018) |>
  slice_max(six_yr_grad_rate, n = 2) |>
  select(peer_institution) |>
  pull()
bottom <-
  graduation |>
  filter(year == 2018) |>
  slice_min(six_yr_grad_rate, n = 2) |>
  select(peer_institution) |>
  pull()
top_bottom_usm <-
  graduation |>
  filter(peer_institution %in% c(top, bottom, "University of Southern Maine"))
# add your code here
top_bottom_usm |>
  ggplot(aes(x = year, y = six_yr_grad_rate, color = peer_institution)) +
  geom_line()
Exercise 10
The lines are a bit thin. With Copilot enabled, create a code chunk that adds a comment line that reads: # make the lines in the prior line plot thicker. Describe what happened in the narrative below the code. If a warning is generated, describe that as well.
# make the lines in the prior line plot thicker
top_bottom_usm |>
  ggplot(aes(x = year, y = six_yr_grad_rate, color = peer_institution)) +
  geom_line(size = 1.5)
Warning: Using `size` aesthetic for lines was deprecated in ggplot2 3.4.0.
ℹ Please use `linewidth` instead.
The lines in the graph were made thicker by setting size = 1.5 in geom_line(). The code also generated a warning: the size aesthetic for lines was deprecated in ggplot2 3.4.0, and the warning suggests using linewidth instead.
Exercise 11
Take the graph from the prior exercise, add a title, and provide a better label for the y-axis. Use a minimal theme and if the code generates a warning either fix it or suppress the warning.
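No code is given for this exercise, so the following is only a sketch of one possible approach, not the expected answer. It swaps size for linewidth (the fix the warning in the prior exercise suggests), adds labels with labs(), and applies theme_minimal(). The title and axis label text are placeholders, and the small usm data frame assumes the first seven glimpse() values belong to USM; the assignment itself would use top_bottom_usm.

```r
library(ggplot2)

# Assumed USM values, copied from the six_yr_grad_rate glimpse() output;
# stands in for top_bottom_usm so the sketch runs on its own
usm <- data.frame(
  year = 2012:2018,
  peer_institution = "University of Southern Maine",
  six_yr_grad_rate = c(35, 33, 30, 34, 32, 33, 34)
)

p <- ggplot(usm, aes(x = year, y = six_yr_grad_rate, color = peer_institution)) +
  # linewidth replaces the deprecated size argument (ggplot2 3.4.0 or later)
  geom_line(linewidth = 1.5) +
  labs(
    title = "Six-Year Graduation Rate Over Time",  # placeholder title
    y = "Six-year graduation rate (%)",            # placeholder y-axis label
    color = "Institution"
  ) +
  theme_minimal()
p
```

Fixing the argument name removes the warning at its source; the alternative the exercise allows is to suppress it, for example with the Quarto cell option #| warning: false.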
Submission
To submit your assignment:
- Change the author name to your name in the YAML portion at the top of this document.
- Render your document to HTML and publish it to RPubs.
- Submit the link to your RPubs document in the Brightspace comments section for this assignment.
- Click on the “Add a File” button and upload your .qmd file for this assignment to Brightspace.