Jamal Butt Assignment 1

Go to the shared posit.cloud workspace for this class and open the lab01_assign01 project. Open the lab01.qmd file and complete the exercises. Below is an annotated guide to assist you. There is also a video in the Brightspace Todo section for this module.
We will be using USM peer graduation rate comparison data from the UMS Dashboard.
Since this is the first assignment, the grading rubric is provided at the top of the document. For future assignments, the rubric will be at the end of the document.
1 Grading Rubric
| Item (percent overall) | 100% - flawless | 67% - minor issues | 33% - moderate issues | 0% - major issues or not attempted |
|---|---|---|---|---|
| Narrative: typos and grammatical errors (8%) | | | | |
| Document formatting: correctly implemented instructions (8%) | | | | |
| Exercises (7% each) | | | | |
| Submitted properly to Brightspace (7%) | | NA | NA | You must submit according to instructions to receive any credit for this portion. |

2 Exercises
There are eleven exercises in this assignment.
2.1 Exercise 1
Load the tidyverse family of packages and read the graduation.csv file into a tibble called graduation. Suppress the message generated from loading tidyverse. The file is available at https://jsuleiman.com/datasets/graduation.csv
library(tidyverse)
── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr 1.1.4 ✔ readr 2.1.5
✔ forcats 1.0.0 ✔ stringr 1.5.1
✔ ggplot2 3.5.1 ✔ tibble 3.2.1
✔ lubridate 1.9.3 ✔ tidyr 1.3.1
✔ purrr 1.0.2
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag() masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
Rows: 77 Columns: 5
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (2): peer_institution, six_yr_grad_rate
dbl (3): year, four_yr_grad_rate, retention
ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
Note: Exercise 1 has already been completed for you. It won't be for future assignments.
library(tidyverse)
graduation <- read_csv("https://jsuleiman.com/datasets/graduation.csv")
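The exercise asks for the tidyverse message to be suppressed, but the output above still shows the attach banner and readr's column-specification message. Below is a minimal sketch of two ways to quiet them, assuming a standard tidyverse install. The first wraps `library()` in `suppressPackageStartupMessages()`. The second passes `show_col_types = FALSE` to `read_csv()`. In a .qmd chunk, the Quarto option `#| message: false` at the top of the chunk hides messages in the rendered document instead. The helper name `read_graduation()` is hypothetical.

```r
# Attach the tidyverse without printing its attach banner or conflicts report.
suppressPackageStartupMessages(library(tidyverse))

# Hypothetical helper: read the graduation CSV without readr's
# "Column specification" message.
read_graduation <- function(path = "https://jsuleiman.com/datasets/graduation.csv") {
  read_csv(path, show_col_types = FALSE)
}
```

Usage: `graduation <- read_graduation()`.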
2.2 Exercise 2
Use glimpse() on the graduation data in the code cell below.
Describe any potential issues with six_yr_grad_rate in your narrative for this exercise.
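The column specification above lists six_yr_grad_rate under chr rather than dbl. A value such as "52%" cannot be converted by as.numeric(), which is the likely reason readr guessed character. This sketch assumes the raw values carry a percent sign; the exact format in graduation.csv is not shown in this document. `find_non_numeric()` is a hypothetical helper that lists the offending values.

```r
# Hypothetical helper: return the distinct non-missing values of a character
# vector that as.numeric() cannot convert (for example, "52%").
find_non_numeric <- function(x) {
  converted <- suppressWarnings(as.numeric(x))  # unconvertible values become NA
  unique(x[is.na(converted) & !is.na(x)])
}
```

Usage: run `glimpse(graduation)`, then `find_non_numeric(graduation$six_yr_grad_rate)` to see which stored values block a numeric type.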
2.3 Exercise 3
Modify graduation so that six_yr_grad_rate is stored as a dbl instead of chr. Overwrite the existing tibble.
Note: Exercise 3 has already been completed for you, since we haven't fully covered how to do this. The mutate() function should look familiar, but parse_number() is new.
graduation <- graduation |>
  mutate(six_yr_grad_rate = parse_number(six_yr_grad_rate) / 100)
glimpse(graduation)
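This sketch shows why parse_number() is used here instead of as.numeric(). It assumes the raw strings look like "52%". The three sample values are the 2018 rates for Salem State, USM, and Chicago State as they appear in the Exercise 7 ranking. They are written as percent strings only for illustration.

```r
library(readr)

# Assumed raw format: percent strings.
raw_rates <- c("52%", "34%", "14%")

# as.numeric() cannot handle the "%" sign: every value becomes NA (with a warning).
suppressWarnings(as.numeric(raw_rates))

# parse_number() drops the non-numeric "%" and returns 52, 34, 14;
# dividing by 100 turns them into proportions between 0 and 1.
rates <- parse_number(raw_rates) / 100
rates
```

Dividing by 100 means later plots show values such as 0.52 rather than 52, which matters when labeling the y-axis.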
2.4 Exercise 4
List the peer schools for the University of Southern Maine without duplicates and without listing the University of Southern Maine itself.
graduation |>
  filter(peer_institution != "University of Southern Maine") |>
  distinct(peer_institution) |>
  pull()
[1] "University of Michigan-Flint"
[2] "University of Arkansas at Little Rock"
[3] "Texas Woman's University"
[4] "Texas A & M University-Kingsville"
[5] "Salem State University"
[6] "North Carolina Central University"
[7] "Murray State University"
[8] "Fayetteville State University"
[9] "Chicago State University"
[10] "California State University-Dominguez Hills"
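The same list can also be produced in base R with setdiff(). setdiff() removes the excluded value, drops duplicates, and keeps the remaining names in order of first appearance. This is a sketch. `list_peers()` is a hypothetical helper, and its default `home` value is the only institution name it assumes.

```r
# Hypothetical helper: distinct peer institutions, excluding the home school.
# setdiff() also discards duplicated values.
list_peers <- function(df, home = "University of Southern Maine") {
  setdiff(df$peer_institution, home)
}
```

Usage: `list_peers(graduation)`.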
2.5 Exercise 5
List the top two six_yr_grad_rate values and the institution for each. Filter by the most recent year in the dataset.
graduation |>
  filter(year == 2018) |> # most recent year in the dataset
  slice_max(six_yr_grad_rate, n = 2) |>
  select(peer_institution, six_yr_grad_rate)
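Hard-coding 2018 works only while 2018 is the latest year in the file. The sketch below computes the most recent year with max(year) instead. It also keeps the rate column next to the institution name so both appear in the answer. `top_rates_latest()` is a hypothetical helper. It assumes the year and peer_institution columns from the column specification in Exercise 1.

```r
library(dplyr)

# Hypothetical helper: the n highest values of a rate column in the most
# recent year, returned alongside the institution name.
top_rates_latest <- function(df, rate, n = 2) {
  df |>
    filter(year == max(year, na.rm = TRUE)) |>  # latest year, computed from the data
    slice_max({{ rate }}, n = n) |>
    select(peer_institution, {{ rate }})
}
```

Usage: `top_rates_latest(graduation, six_yr_grad_rate)`. The same call with `four_yr_grad_rate` answers Exercise 6.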
2.6 Exercise 6
List the top two four_yr_grad_rate values and the institution for each. Filter by the most recent year in the dataset.
graduation |>
  filter(year == 2018) |> # most recent year in the dataset
  slice_max(four_yr_grad_rate, n = 2) |>
  select(peer_institution, four_yr_grad_rate)
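By default slice_max() keeps ties. As a result, `n = 2` can return more than two rows when institutions share a rate. The sketch below uses made-up rates, not values from graduation.csv, to show the difference. `with_ties` is a documented slice_max() argument.

```r
library(dplyr)

# Made-up four-year rates for illustration only; B and C are tied for second.
rates <- data.frame(
  peer_institution = c("A", "B", "C"),
  four_yr_grad_rate = c(0.30, 0.25, 0.25)
)

# Default (with_ties = TRUE): the tie keeps both B and C, so three rows come back.
kept_ties <- slice_max(rates, four_yr_grad_rate, n = 2)

# with_ties = FALSE: exactly two rows, even when there is a tie.
exactly_two <- slice_max(rates, four_yr_grad_rate, n = 2, with_ties = FALSE)
```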
2.7 Exercise 7
Use arrange(desc(six_yr_grad_rate)) to show six_yr_grad_rate for the most recent year in the dataset. In the narrative below the code, mention where USM ranks in the list.
# A tibble: 11 × 2
peer_institution six_yr_grad_rate
<chr> <chr>
1 Salem State University 5200.00%
2 Murray State University 4900.00%
3 North Carolina Central University 4300.00%
4 California State University-Dominguez Hills 4200.00%
5 Texas Woman's University 3800.00%
6 University of Michigan-Flint 3700.00%
7 University of Southern Maine 3400.00%
8 Fayetteville State University 3200.00%
9 Texas A & M University-Kingsville 2900.00%
10 University of Arkansas at Little Rock 2800.00%
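11 Chicago State University 1400.00%

USM ranks 7th of the 11 institutions for the most recent year, below the University of Michigan-Flint and above Fayetteville State University.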
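The rank can also be computed rather than read off the sorted table. This sketch uses the same arrange(desc(...)) ordering and then numbers the rows. `rank_in_latest_year()` is a hypothetical helper, and USM is only its default argument.

```r
library(dplyr)

# Hypothetical helper: position of one institution when the most recent
# year's six-year rates are sorted from highest to lowest.
rank_in_latest_year <- function(df, institution = "University of Southern Maine") {
  df |>
    filter(year == max(year, na.rm = TRUE)) |>
    arrange(desc(six_yr_grad_rate)) |>
    mutate(rank = row_number()) |>   # 1 = highest rate
    filter(peer_institution == institution) |>
    pull(rank)
}
```

Usage: `rank_in_latest_year(graduation)`. row_number() breaks ties by row order. `min_rank(desc(six_yr_grad_rate))` would give tied schools the same rank.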
2.8 Exercise 8
Using ggplot() and geom_line(), create a line plot of six_yr_grad_rate over time. The x-axis should be year. The line should be colored by peer_institution. Make sure only the graph displays, not the code.
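A sketch of one way to meet the "only the graph displays" requirement. The Quarto option `#| echo: false` on the first line of a .qmd chunk hides the code in the rendered HTML. To R it is an ordinary comment. `plot_six_yr()` is a hypothetical helper name.

```r
#| echo: false
# The Quarto chunk option above hides this code in the rendered HTML.
library(ggplot2)

# Hypothetical helper: one line per institution, six-year rate over time.
plot_six_yr <- function(df) {
  ggplot(df, aes(x = year, y = six_yr_grad_rate, color = peer_institution)) +
    geom_line()
}
```

In the .qmd chunk, follow the definition with `plot_six_yr(graduation)` so the plot is drawn.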
2.9 Exercise 9
There are too many institutions on the list. The code below filters the list to show only the top and bottom two institutions, along with USM, and stores the result in a new tibble called top_bottom_usm. Use top_bottom_usm to create a line plot of six_yr_grad_rate over time. The x-axis should be year. The line should be colored by peer_institution. Make sure only the graph displays, not the code.
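top <- graduation |>
  filter(year == 2018) |>
  slice_max(six_yr_grad_rate, n = 2) |>
  select(peer_institution) |>
  pull()
bottom <- graduation |>
  filter(year == 2018) |>
  slice_min(six_yr_grad_rate, n = 2) |>
  select(peer_institution) |>
  pull()
top_bottom_usm <- graduation |>
  filter(peer_institution %in% c(top, bottom, "University of Southern Maine"))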
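The top/bottom filtering step can be packaged so the reference year and the number of schools are parameters. This is a sketch: `select_top_bottom()` and its arguments are hypothetical names. It keeps every year of data for the chosen institutions, as the filtering in this exercise does.

```r
library(dplyr)

# Hypothetical helper: keep all years for the n highest and n lowest
# institutions in a reference year, plus one school that is always shown.
select_top_bottom <- function(df, ref_year, n = 2,
                              keep = "University of Southern Maine") {
  ref <- filter(df, year == ref_year)
  top <- slice_max(ref, six_yr_grad_rate, n = n) |> pull(peer_institution)
  bottom <- slice_min(ref, six_yr_grad_rate, n = n) |> pull(peer_institution)
  filter(df, peer_institution %in% c(top, bottom, keep))
}
```

Usage: `top_bottom_usm <- select_top_bottom(graduation, ref_year = 2018)`.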
2.10 Exercise 10
The lines are a bit thin. With Copilot enabled, create a code chunk that adds a comment line that reads: # make the lines in the prior line plot thicker. Describe what happened in the narrative below the code. If a warning is generated, describe that as well.
top <- graduation |>
  filter(year == 2018) |>
  slice_max(six_yr_grad_rate, n = 2) |>
  select(peer_institution) |>
  pull()
bottom <- graduation |>
  filter(year == 2018) |>
  slice_min(six_yr_grad_rate, n = 2) |>
  select(peer_institution) |>
  pull()
top_bottom_usm <- graduation |>
  filter(peer_institution %in% c(top, bottom, "University of Southern Maine"))
# make the lines in the prior line plot thicker.
top_bottom_usm |>
  ggplot(aes(x = year, y = six_yr_grad_rate)) +
  geom_line(aes(color = peer_institution), size = 1.5) +
  theme_minimal()
Warning: Using `size` aesthetic for lines was deprecated in ggplot2 3.4.0.
ℹ Please use `linewidth` instead.
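After the comment "make the lines in the prior line plot thicker", the code redraws the plot with size = 1.5 in geom_line(), which makes the lines thicker. It also generates a deprecation warning: the size aesthetic for lines was deprecated in ggplot2 3.4.0, and linewidth should be used instead.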
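As the warning suggests, the same plot can be drawn with `linewidth` instead of `size`. This sketch assumes ggplot2 3.4.0 or later, where `linewidth` is the line-thickness aesthetic. `plot_thick()` and its `width` argument are hypothetical names.

```r
library(ggplot2)

# Hypothetical helper: the thick-line plot, using linewidth so no
# deprecation warning is raised.
plot_thick <- function(df, width = 1.5) {
  ggplot(df, aes(x = year, y = six_yr_grad_rate)) +
    geom_line(aes(color = peer_institution), linewidth = width) +
    theme_minimal()
}
```

Usage: `plot_thick(top_bottom_usm)`.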
2.11 Exercise 11
Take the graph from the prior exercise, add a title, and provide a better label for the y-axis. Use a minimal theme, and if the code generates a warning, either fix it or suppress it.
top <- graduation |>
  filter(year == 2018) |>
  slice_max(six_yr_grad_rate, n = 2) |>
  select(peer_institution) |>
  pull()
bottom <- graduation |>
  filter(year == 2018) |>
  slice_min(six_yr_grad_rate, n = 2) |>
  select(peer_institution) |>
  pull()
top_bottom_usm <- graduation |>
  filter(peer_institution %in% c(top, bottom, "University of Southern Maine"))
top_bottom_usm |>
  ggplot(aes(x = year, y = six_yr_grad_rate)) +
  # linewidth replaces the deprecated size aesthetic, so no warning is generated
  geom_line(aes(color = peer_institution), linewidth = 1.5) +
  labs(
    title = "USM vs. Top and Bottom Peer Institutions' Graduation Rates",
    y = "Six-Year Graduation %"
  ) +
  theme_minimal()
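Exercise 3 divides the rates by 100, so the y values are proportions such as 0.34, while the axis label mentions percent. This sketch formats the tick labels as percentages with `scales::label_percent()`, assuming six_yr_grad_rate holds proportions between 0 and 1. `plot_labelled()` is a hypothetical helper, and the axis and legend titles are suggestions.

```r
library(ggplot2)
library(scales)

# Hypothetical helper: the Exercise 11 plot with percentage tick labels.
plot_labelled <- function(df) {
  ggplot(df, aes(x = year, y = six_yr_grad_rate)) +
    geom_line(aes(color = peer_institution), linewidth = 1.5) +
    scale_y_continuous(labels = label_percent()) +  # 0.34 is shown as a percentage
    labs(
      title = "USM vs. Top and Bottom Peer Institutions' Graduation Rates",
      x = "Year",
      y = "Six-year graduation rate",
      color = "Institution"
    ) +
    theme_minimal()
}
```

Usage: `plot_labelled(top_bottom_usm)`.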
3 Submission
To submit your assignment:
1. Change the author name to your name in the YAML portion at the top of this document.
2. Render your document to HTML and publish it to RPubs.
3. Submit the link to your RPubs document in the Brightspace comments section for this assignment.
4. Click the "Add a File" button and upload your .qmd file for this assignment to Brightspace.