```{r}
#| message: false
library(tidyverse)
library(ggrepel)
library(gt)
cumberland_places <- read_csv("https://jsuleiman.com/datasets/cumberland_places.csv")
cumberland_edu <- read_csv("https://jsuleiman.com/datasets/cumberland_edu.csv")
```
Assignment 3
We will be using two files compiled from the American Community Survey five-year estimates for 2022 (which are built from surveys sent out over the five-year period 2018-2022). Both files contain data from towns in Cumberland County, Maine. The first file loaded is `cumberland_places.csv`, which contains the following columns:
- `town`: town name
- `county`: Cumberland for all records
- `state`: ME for all records
- `pop`: an estimate of the population
- `top5`: median income for the top 5% of earners in the town
- `median_income`: median income for all adults in the town
- `median_age`: median age for the town
The second file, `cumberland_edu.csv`, contains:
- `town`, `county`, `state`: same values as the first file
- `n_pop_over_25`: the number of people over 25 in the town
- `n_masters_above`: the number of people over 25 in the town who have a Master’s degree or higher
We’ll start by loading the tidyverse family of packages, `ggrepel` for our graph, and `gt` (optional) for making pretty tables, and then read the data into two tibbles (`cumberland_places` and `cumberland_edu`). We’ll use the `message: false` option to suppress the output messages from loading `tidyverse` and `gt`.
1 Exercises
There are five exercises in this assignment. The Grading Rubric is available at the end of this document. You will need to create your own code chunks for this assignment. Remember, you can do this with Insert -> Executable Cell -> R while in Visual mode.
1.1 Exercise 1
Create a tibble called `maine_data` that joins the two tibbles. Think about which matching columns you will use in your join `by` clause. Review the joined data and state explicitly in the narrative whether you see any NA values and why you think they might exist.
maine_data <- cumberland_places |>
  inner_join(cumberland_edu)
Joining with `by = join_by(town, county, state)`
glimpse(maine_data)
Rows: 28
Columns: 9
$ town <chr> "Baldwin", "Bridgton", "Brunswick", "Cape Elizabeth", …
$ county <chr> "Cumberland", "Cumberland", "Cumberland", "Cumberland"…
$ state <chr> "ME", "ME", "ME", "ME", "ME", "ME", "ME", "ME", "ME", …
$ pop <dbl> 1180, 5471, 21691, 9519, 3657, 597, 8443, 12504, 8700,…
$ top5 <dbl> 212856, 315914, 418666, 833724, 392392, 668189, 998035…
$ median_income <dbl> 68625, 78546, 71236, 144250, 60708, 58571, 144167, 144…
$ median_age <dbl> 50.3, 47.6, 41.3, 48.6, 51.1, 55.6, 40.6, 47.7, 47.5, …
$ n_pop_over_25 <dbl> 927, 4170, 14910, 6809, 3006, 430, 5782, 8800, 6516, 2…
$ n_masters_above <dbl> 88, 564, 3628, 2803, 465, 125, 2222, 2954, 1904, 4, 13…
The joined data contain one NA value: the `top5` value for Frye Island. It probably exists because Frye Island's estimated population is only 20, which is likely too few respondents for the survey to produce an income figure for the top 5% of earners.
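Scanning `glimpse()` output works for 28 rows, but missing values can also be counted in code. The following is a minimal sketch, not part of the assignment solution: `places` and `edu` are hypothetical two-row stand-ins for `cumberland_places` and `cumberland_edu`, and the join keys are written out explicitly, which also avoids the "Joining with" message.

```r
library(dplyr)
library(tibble)

# Hypothetical stand-ins for cumberland_places and cumberland_edu
places <- tibble(
  town = c("Baldwin", "Frye Island"),
  county = "Cumberland",
  state = "ME",
  top5 = c(212856, NA)
)
edu <- tibble(
  town = c("Baldwin", "Frye Island"),
  county = "Cumberland",
  state = "ME",
  n_pop_over_25 = c(927, 20)
)

# Name the join keys explicitly instead of relying on the natural join
joined <- places |>
  inner_join(edu, by = join_by(town, county, state))

# Count the missing values in every column
na_counts <- joined |>
  summarise(across(everything(), ~ sum(is.na(.x))))

# Keep only the rows that contain at least one NA
rows_with_na <- joined |>
  filter(if_any(everything(), is.na))
```

Printing `na_counts` and `rows_with_na` shows which column and which town carry the missing value.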
1.2 Exercise 2
Since the dataset has only 28 towns, you don’t need to show code to answer the questions in Exercise 2; you can simply look at the table and add the answers to your narrative. Make sure you specify the town name and the actual value that answers each question.
maine_data
# A tibble: 28 × 9
town county state pop top5 median_income median_age n_pop_over_25
<chr> <chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl>
1 Baldwin Cumbe… ME 1180 212856 68625 50.3 927
2 Bridgton Cumbe… ME 5471 315914 78546 47.6 4170
3 Brunswick Cumbe… ME 21691 418666 71236 41.3 14910
4 Cape Elizab… Cumbe… ME 9519 833724 144250 48.6 6809
5 Casco Cumbe… ME 3657 392392 60708 51.1 3006
6 Chebeague I… Cumbe… ME 597 668189 58571 55.6 430
7 Cumberland Cumbe… ME 8443 998035 144167 40.6 5782
8 Falmouth Cumbe… ME 12504 919545 144118 47.7 8800
9 Freeport Cumbe… ME 8700 535344 95398 47.5 6516
10 Frye Island Cumbe… ME 20 NA 101250 68.7 20
# ℹ 18 more rows
# ℹ 1 more variable: n_masters_above <dbl>
What town has the most people? Portland, with a population of 68280.
What town has the highest median age? Frye Island, with a median age of 68.7.
What town has the highest median income? Cape Elizabeth, with a median income of 144250.
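Although the exercise does not require code, answers like these can be double-checked with `slice_max()`. This is a sketch under the assumption of a hypothetical three-row tibble `towns`, filled with values copied from the table above; the same calls would apply to the full `maine_data`.

```r
library(dplyr)
library(tibble)

# Hypothetical three-row subset using values shown in the table above
towns <- tibble(
  town = c("Baldwin", "Cape Elizabeth", "Frye Island"),
  pop = c(1180, 9519, 20),
  median_income = c(68625, 144250, 101250),
  median_age = c(50.3, 48.6, 68.7)
)

# Town with the highest median age in this subset
oldest <- towns |>
  slice_max(median_age, n = 1) |>
  select(town, median_age)

# Town with the highest median income in this subset
richest <- towns |>
  slice_max(median_income, n = 1) |>
  select(town, median_income)
```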
1.3 Exercise 3
Add a column to `maine_data` called `pct_grad_degree` that shows the percentage of graduate degrees for the town, which is defined as `n_masters_above / n_pop_over_25`.
maine_data <- maine_data |>
  mutate(pct_grad_degree = n_masters_above / n_pop_over_25)
glimpse(maine_data)
Rows: 28
Columns: 10
$ town <chr> "Baldwin", "Bridgton", "Brunswick", "Cape Elizabeth", …
$ county <chr> "Cumberland", "Cumberland", "Cumberland", "Cumberland"…
$ state <chr> "ME", "ME", "ME", "ME", "ME", "ME", "ME", "ME", "ME", …
$ pop <dbl> 1180, 5471, 21691, 9519, 3657, 597, 8443, 12504, 8700,…
$ top5 <dbl> 212856, 315914, 418666, 833724, 392392, 668189, 998035…
$ median_income <dbl> 68625, 78546, 71236, 144250, 60708, 58571, 144167, 144…
$ median_age <dbl> 50.3, 47.6, 41.3, 48.6, 51.1, 55.6, 40.6, 47.7, 47.5, …
$ n_pop_over_25 <dbl> 927, 4170, 14910, 6809, 3006, 430, 5782, 8800, 6516, 2…
$ n_masters_above <dbl> 88, 564, 3628, 2803, 465, 125, 2222, 2954, 1904, 4, 13…
$ pct_grad_degree <dbl> 0.09492988, 0.13525180, 0.24332663, 0.41166104, 0.1546…
1.4 Exercise 4
What town has the lowest percentage of graduate degrees for people over 25?
maine_data |>
  select(town, pct_grad_degree) |>
  filter(pct_grad_degree == min(pct_grad_degree))
# A tibble: 1 × 2
town pct_grad_degree
<chr> <dbl>
1 Baldwin 0.0949
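Filtering on `== min(...)` works here because `pct_grad_degree` has no missing values. If a column did contain an NA, `min()` would return NA and the filter would return no rows. The sketch below illustrates this on a hypothetical tibble `toy` (the towns and values are made up for illustration) and shows `slice_min()` as one alternative that tolerates missing values.

```r
library(dplyr)
library(tibble)

# Hypothetical data with one missing percentage
toy <- tibble(
  town = c("A", "B", "C"),
  pct_grad_degree = c(0.25, 0.10, NA)
)

# min() returns NA here, so the comparison is NA for every row
strict <- toy |>
  filter(pct_grad_degree == min(pct_grad_degree))

# slice_min() orders NA values last, so the smallest non-missing value wins
lowest <- toy |>
  slice_min(pct_grad_degree, n = 1)
```

Passing `na.rm = TRUE` to `min()` inside the filter is another way to get the same row.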
1.5 Exercise 5
Replicate this graph. Note: use `geom_label_repel()` just like you would use `geom_label()`. Discuss any patterns you see in the narrative.
library(ggrepel)
maine_data |>
  ggplot(aes(x = median_income, y = pct_grad_degree)) +
  geom_label_repel(aes(label = town), color = "blue") +
  scale_y_continuous(labels = scales::comma) +
  theme_minimal()
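Because `pct_grad_degree` is a proportion, one possible variation formats the y-axis as a percentage and draws a point under each label. This is only a sketch and may not match the target graph: `towns` is a hypothetical four-row subset built from the values printed earlier, and the axis titles are assumptions.

```r
library(dplyr)
library(tibble)
library(ggplot2)
library(ggrepel)

# Hypothetical four-row subset using values printed in the glimpse() output
towns <- tibble(
  town = c("Baldwin", "Bridgton", "Brunswick", "Cape Elizabeth"),
  median_income = c(68625, 78546, 71236, 144250),
  n_pop_over_25 = c(927, 4170, 14910, 6809),
  n_masters_above = c(88, 564, 3628, 2803)
) |>
  mutate(pct_grad_degree = n_masters_above / n_pop_over_25)

p <- towns |>
  ggplot(aes(x = median_income, y = pct_grad_degree)) +
  geom_point() +                                        # mark each town's exact position
  geom_label_repel(aes(label = town), color = "blue") +
  scale_x_continuous(labels = scales::label_comma()) +
  scale_y_continuous(labels = scales::label_percent()) +  # 0.25 is shown as 25%
  labs(x = "Median income", y = "Graduate degrees (share of population over 25)") +
  theme_minimal()
```

Printing `p` renders the plot.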
2 Submission
To submit your assignment:
- Change the author name to your name in the YAML portion at the top of this document
- Render your document to HTML and publish it to RPubs.
- Submit the link to your RPubs document in the Brightspace comments section for this assignment.
- Click on the “Add a File” button and upload your .qmd file for this assignment to Brightspace.
3 Grading Rubric
| Item (percent overall) | 100% - flawless | 67% - minor issues | 33% - moderate issues | 0% - major issues or not attempted |
|---|---|---|---|---|
| Narrative: typos and grammatical errors (8%) | | | | |
| Document formatting: correctly implemented instructions (8%) | | | | |
| Exercises (15% each) | | | | |
| Submitted properly to Brightspace (9%) | NA | NA | You must submit according to instructions to receive any credit for this portion. | |