Assignment 3

Author

Theresa Anderson

We will be using two files compiled from the American Community Survey five-year estimates for 2022 (which are built from a sample of surveys sent out over the five-year period from 2018 to 2022). Both files contain data from towns in Cumberland County, Maine. The first file loaded is cumberland_places.csv, which contains the following columns: town, county, state, pop, top5, median_income, and median_age.

The second file, cumberland_edu.csv, contains town, county, state, n_pop_over_25, and n_masters_above.

We’ll start by loading the tidyverse family of packages, ggrepel for our graph, and gt (optional) for making pretty tables, then read the data into two tibbles (cumberland_places and cumberland_edu). We’ll use the message: false option to suppress the output messages from loading tidyverse and gt.

```{r}
#| message: false
library(tidyverse)
library(ggrepel)
library(gt)
cumberland_places <- read_csv("https://jsuleiman.com/datasets/cumberland_places.csv")
cumberland_edu <- read_csv("https://jsuleiman.com/datasets/cumberland_edu.csv")
```

1 Exercises

There are five exercises in this assignment. The Grading Rubric is available at the end of this document. You will need to create your own code chunks for this assignment. Remember, you can do this with Insert -> Executable Cell -> R while in Visual mode.

1.1 Exercise 1

Create a tibble called maine_data that joins the two tibbles. Think about what matching columns you will be using in your join by clause. Review the joined data and state explicitly in the narrative whether you see any NA values and why you think they might exist.

```{r}
glimpse(cumberland_places)
```

```
Rows: 28
Columns: 7
$ town          <chr> "Baldwin", "Bridgton", "Brunswick", "Cape Elizabeth", "C…
$ county        <chr> "Cumberland", "Cumberland", "Cumberland", "Cumberland", …
$ state         <chr> "ME", "ME", "ME", "ME", "ME", "ME", "ME", "ME", "ME", "M…
$ pop           <dbl> 1180, 5471, 21691, 9519, 3657, 597, 8443, 12504, 8700, 2…
$ top5          <dbl> 212856, 315914, 418666, 833724, 392392, 668189, 998035, …
$ median_income <dbl> 68625, 78546, 71236, 144250, 60708, 58571, 144167, 14411…
$ median_age    <dbl> 50.3, 47.6, 41.3, 48.6, 51.1, 55.6, 40.6, 47.7, 47.5, 68…
```

```{r}
glimpse(cumberland_edu)
```

```
Rows: 28
Columns: 5
$ town            <chr> "Baldwin", "Bridgton", "Brunswick", "Cape Elizabeth", …
$ county          <chr> "Cumberland", "Cumberland", "Cumberland", "Cumberland"…
$ state           <chr> "ME", "ME", "ME", "ME", "ME", "ME", "ME", "ME", "ME", …
$ n_pop_over_25   <dbl> 927, 4170, 14910, 6809, 3006, 430, 5782, 8800, 6516, 2…
$ n_masters_above <dbl> 88, 564, 3628, 2803, 465, 125, 2222, 2954, 1904, 4, 13…
```

```{r}
maine_data <- cumberland_places |>
  inner_join(cumberland_edu)
```

```
Joining with `by = join_by(town, county, state)`
```

```{r}
glimpse(maine_data)
```

```
Rows: 28
Columns: 9
$ town            <chr> "Baldwin", "Bridgton", "Brunswick", "Cape Elizabeth", …
$ county          <chr> "Cumberland", "Cumberland", "Cumberland", "Cumberland"…
$ state           <chr> "ME", "ME", "ME", "ME", "ME", "ME", "ME", "ME", "ME", …
$ pop             <dbl> 1180, 5471, 21691, 9519, 3657, 597, 8443, 12504, 8700,…
$ top5            <dbl> 212856, 315914, 418666, 833724, 392392, 668189, 998035…
$ median_income   <dbl> 68625, 78546, 71236, 144250, 60708, 58571, 144167, 144…
$ median_age      <dbl> 50.3, 47.6, 41.3, 48.6, 51.1, 55.6, 40.6, 47.7, 47.5, …
$ n_pop_over_25   <dbl> 927, 4170, 14910, 6809, 3006, 430, 5782, 8800, 6516, 2…
$ n_masters_above <dbl> 88, 564, 3628, 2803, 465, 125, 2222, 2954, 1904, 4, 13…
```
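
The join in this exercise lets inner_join() pick the shared columns on its own. As a minimal sketch of stating the keys explicitly, the example below rebuilds the first four rows of each file from the glimpse() output (only some columns are copied, and the `_toy` names are hypothetical, chosen for illustration). It lists the shared column names with intersect() and passes them to join_by(). The anti_join() step is an assumed extra check, not something the assignment asks for.

```r
library(dplyr)
library(tibble)

# First four rows of each file, copied from the glimpse() output
places_toy <- tibble(
  town = c("Baldwin", "Bridgton", "Brunswick", "Cape Elizabeth"),
  county = "Cumberland",
  state = "ME",
  pop = c(1180, 5471, 21691, 9519),
  median_income = c(68625, 78546, 71236, 144250)
)
edu_toy <- tibble(
  town = c("Baldwin", "Bridgton", "Brunswick", "Cape Elizabeth"),
  county = "Cumberland",
  state = "ME",
  n_pop_over_25 = c(927, 4170, 14910, 6809),
  n_masters_above = c(88, 564, 3628, 2803)
)

# Columns present in both tibbles are the candidate join keys
shared_cols <- intersect(names(places_toy), names(edu_toy))

# Same kind of join, with the keys written out
joined_toy <- places_toy |>
  inner_join(edu_toy, by = join_by(town, county, state))

# Rows of places_toy that have no partner in edu_toy
unmatched_toy <- places_toy |>
  anti_join(edu_toy, by = join_by(town, county, state))
```

Writing the keys out also silences the "Joining with" message, and an empty anti_join() result suggests that no town was dropped by the inner join.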

I see an NA value in the top5 column, which holds the median income for the top 5% of earners in a town. The full data may not have been collected for that town.
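
Spotting NA values by eye works for 28 rows, but they can also be counted. The sketch below is one possible approach, run on a small made-up tibble (the town names, values, and the `na_toy` name are hypothetical) rather than on maine_data itself.

```r
library(dplyr)
library(tibble)

# Hypothetical data: town names and values are made up for illustration
na_toy <- tibble(
  town = c("Town A", "Town B", "Town C"),
  top5 = c(212856, NA, 418666),
  median_income = c(68625, 78546, 71236)
)

# Number of missing values in each column
na_counts <- na_toy |>
  summarise(across(everything(), ~ sum(is.na(.x))))

# Rows that contain at least one missing value
na_rows <- na_toy |>
  filter(if_any(everything(), is.na))
```

Swapping na_toy for maine_data should show which column and which town carry the missing value.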

1.2 Exercise 2

Since the dataset only has 28 towns, you don’t need to show code to answer the questions in Exercise 2; you can simply look at the table and add the answers to your narrative. Make sure you specify the town name and the actual value that answers the question.

What town has the most people?

Portland has the most people, with a population of 68,280.

```{r}
maine_data |>
  select(town, pop) |>
  filter(pop == max(pop))
```

```
# A tibble: 1 × 2
  town       pop
  <chr>    <dbl>
1 Portland 68280
```

What town has the highest median age?

Frye Island is the town with the highest median age, at 68.7.

```{r}
maine_data |>
  select(town, median_age) |>
  filter(median_age == max(median_age))
```

```
# A tibble: 1 × 2
  town        median_age
  <chr>            <dbl>
1 Frye Island       68.7
```

What town has the highest median income?

Cape Elizabeth is the town with the highest median income at $144,250.

```{r}
maine_data |>
  select(town, median_income) |>
  filter(median_income == max(median_income))
```

```
# A tibble: 1 × 2
  town           median_income
  <chr>                  <dbl>
1 Cape Elizabeth        144250
```

I opted to run code for each of these because I feel as though I need additional practice.
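
The filter(x == max(x)) pattern works for pop, median_age, and median_income, but it returns zero rows when the column contains an NA, because max() then returns NA. A possible alternative is slice_max(), which sorts NA values last. The sketch below uses the first four rows from the glimpse() output, so the winner here is not the county-wide answer. The NA in top5 is placed in the second row purely for illustration, and `towns_toy` is a hypothetical name.

```r
library(dplyr)
library(tibble)

# First four rows from the glimpse() output; the NA in top5 is inserted
# here for illustration and is not where the real NA sits
towns_toy <- tibble(
  town = c("Baldwin", "Bridgton", "Brunswick", "Cape Elizabeth"),
  pop = c(1180, 5471, 21691, 9519),
  median_age = c(50.3, 47.6, 41.3, 48.6),
  top5 = c(212856, NA, 418666, 833724)
)

# Largest population among these four rows only
largest_pop <- towns_toy |>
  select(town, pop) |>
  slice_max(pop, n = 1)

# filter() with max() finds nothing when the column holds an NA ...
top5_filter <- towns_toy |>
  filter(top5 == max(top5))

# ... while slice_max() sorts NA values last and still returns a row
top5_slice <- towns_toy |>
  slice_max(top5, n = 1)
```

Another option is to keep filter() and pass na.rm = TRUE to max().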

1.3 Exercise 3

Add a column to maine_data called pct_grad_degree that shows the percentage of graduate degrees for the town, which is defined as n_masters_above / n_pop_over_25.

```{r}
maine_data <- maine_data |>
  mutate(pct_grad_degree = n_masters_above / n_pop_over_25)
glimpse(maine_data)
```

```
Rows: 28
Columns: 10
$ town            <chr> "Baldwin", "Bridgton", "Brunswick", "Cape Elizabeth", …
$ county          <chr> "Cumberland", "Cumberland", "Cumberland", "Cumberland"…
$ state           <chr> "ME", "ME", "ME", "ME", "ME", "ME", "ME", "ME", "ME", …
$ pop             <dbl> 1180, 5471, 21691, 9519, 3657, 597, 8443, 12504, 8700,…
$ top5            <dbl> 212856, 315914, 418666, 833724, 392392, 668189, 998035…
$ median_income   <dbl> 68625, 78546, 71236, 144250, 60708, 58571, 144167, 144…
$ median_age      <dbl> 50.3, 47.6, 41.3, 48.6, 51.1, 55.6, 40.6, 47.7, 47.5, …
$ n_pop_over_25   <dbl> 927, 4170, 14910, 6809, 3006, 430, 5782, 8800, 6516, 2…
$ n_masters_above <dbl> 88, 564, 3628, 2803, 465, 125, 2222, 2954, 1904, 4, 13…
$ pct_grad_degree <dbl> 0.09492988, 0.13525180, 0.24332663, 0.41166104, 0.1546…
```
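
The new column is stored as a proportion (0.0949 rather than 9.49%). Since gt is loaded for making tables, the sketch below shows one way it might be used to display that column as a percentage. It uses the first four rows from the glimpse() output. The `edu_toy` and `pct_table` names and the column labels are hypothetical choices.

```r
library(dplyr)
library(tibble)
library(gt)

# First four rows of the education counts, copied from the glimpse() output
edu_toy <- tibble(
  town = c("Baldwin", "Bridgton", "Brunswick", "Cape Elizabeth"),
  n_pop_over_25 = c(927, 4170, 14910, 6809),
  n_masters_above = c(88, 564, 3628, 2803)
)

pct_table <- edu_toy |>
  mutate(pct_grad_degree = n_masters_above / n_pop_over_25) |>
  select(town, pct_grad_degree) |>
  gt() |>
  # fmt_percent() expects proportions and multiplies by 100 for display
  fmt_percent(columns = pct_grad_degree, decimals = 1) |>
  cols_label(town = "Town", pct_grad_degree = "Graduate degree share")

pct_table
```

The underlying values stay as proportions, so later calculations are unaffected by the display format.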

1.4 Exercise 4

What town has the lowest percentage of graduate degrees for people over 25?

Baldwin has the lowest percentage of graduate degrees for people over 25 at a rate of 9.5%.

```{r}
maine_data |>
  select(town, pct_grad_degree) |>
  filter(pct_grad_degree == min(pct_grad_degree))
```

```
# A tibble: 1 × 2
  town    pct_grad_degree
  <chr>             <dbl>
1 Baldwin          0.0949
```
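
The 9.5% quoted in the narrative is the proportion 0.0949 multiplied by 100 and rounded to one decimal place. A small sketch of that conversion, assuming the scales package (installed alongside ggplot2) is available, with hypothetical variable names:

```r
library(scales)

# Proportion for Baldwin, computed from the counts in the glimpse() output
baldwin_share <- 88 / 927

# Formatter that multiplies by 100 and rounds to one decimal place
fmt_pct <- label_percent(accuracy = 0.1)
baldwin_label <- fmt_pct(baldwin_share)
baldwin_label
# → "9.5%"
```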

1.5 Exercise 5

Replicate this graph. Note: use geom_label_repel() just like you would use geom_label(). Discuss any patterns you see in the narrative.

```{r}
library(ggrepel)
maine_data |>
  ggplot(aes(x = median_income, y = pct_grad_degree)) +
  geom_label_repel(aes(label = town), color = "blue") +
  theme_minimal()
```
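
The target graph is not reproduced in this text, so the following is only a sketch of optional refinements, not a description of what the original looks like. It assumes that points under the labels, dollar and percent axis formats, and axis titles would be welcome. It uses the first four rows from the glimpse() output, and `plot_toy` and `income_plot` are hypothetical names.

```r
library(ggplot2)
library(ggrepel)
library(scales)

# First four towns, with the proportion recomputed from the raw counts
plot_toy <- data.frame(
  town = c("Baldwin", "Bridgton", "Brunswick", "Cape Elizabeth"),
  median_income = c(68625, 78546, 71236, 144250),
  pct_grad_degree = c(88 / 927, 564 / 4170, 3628 / 14910, 2803 / 6809)
)

income_plot <- ggplot(plot_toy, aes(x = median_income, y = pct_grad_degree)) +
  geom_point() +                                        # assumed addition: mark each town
  geom_label_repel(aes(label = town), color = "blue") +
  scale_x_continuous(labels = label_dollar()) +         # assumed addition: dollar axis
  scale_y_continuous(labels = label_percent()) +        # assumed addition: percent axis
  labs(
    x = "Median income",
    y = "Share of residents over 25 with a graduate degree"
  ) +
  theme_minimal()

income_plot
```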

The three towns with the highest median income (Cape Elizabeth, Cumberland, and Falmouth) also have the highest percentages of graduate degrees for people over 25. Overall, there is a positive correlation between median income and the rate of graduate degrees. However, quite a few towns with higher median incomes, such as Gorham and Windham, have a lower percentage of people over 25 with graduate degrees.
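
The visual pattern can be backed by a number. As a rough sketch, the Pearson correlation below is computed on the first four rows from the glimpse() output only, so the value does not describe the full county. The variable names are hypothetical.

```r
# First four towns: median income and graduate degree share
income <- c(68625, 78546, 71236, 144250)
grad_share <- c(88 / 927, 564 / 4170, 3628 / 14910, 2803 / 6809)

# use = "complete.obs" drops pairs with a missing value before computing
r_toy <- cor(income, grad_share, use = "complete.obs")
r_toy
```

Running the same call on maine_data$median_income and maine_data$pct_grad_degree would give the figure for all 28 towns.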

2 Submission

To submit your assignment:

  • Change the author name to your name in the YAML portion at the top of this document.
  • Render your document to HTML and publish it to RPubs.
  • Submit the link to your RPubs document in the Brightspace comments section for this assignment.
  • Click on the “Add a File” button and upload your .qmd file for this assignment to Brightspace.

3 Grading Rubric

| Item (percent overall) | 100% - flawless | 67% - minor issues | 33% - moderate issues | 0% - major issues or not attempted |
|---|---|---|---|---|
| Narrative: typos and grammatical errors (8%) | | | | |
| Document formatting: correctly implemented instructions (8%) | | | | |
| Exercises (15% each) | | | | |
| Submitted properly to Brightspace (9%) | | NA | NA | You must submit according to instructions to receive any credit for this portion. |