Part A - Non Spatial

Summary:

tidycensus is an R package that pulls U.S. Census Bureau data, including the decennial census and the American Community Survey (ACS), directly into an R session for data manipulation, visualization, and analysis. Created by Kyle Walker, it lets users explore census data efficiently without manually downloading files from the Census Bureau website. In this lab, we first explore the percentage of graduate degree holders by county in Arizona. We then explore poverty estimates at the census tract level for Rochester, New York. Because high poverty is often associated with elevated crime, these poverty estimates will provide useful context for a later analysis of crime data in the area.

Data Preparation

Loading the packages below makes their functions available for data manipulation, visualization, and mapping. Each package must be installed once (for example, with install.packages()) before it can be loaded.

library(tidyverse)    # loads ggplot2, dplyr, tidyr, stringr (str_remove), and more
library(scales)       # label_number() for axis labels
library(tidycensus)   # get_acs() for American Community Survey data
library(plotly)       # ggplotly() for interactive charts
library(mapview)      # interactive web maps

Use tidycensus to request census data for specific variables. Here, get_acs() requests county-level data for Arizona for variable DP02_0066P, the percentage of the population age 25 and over with a graduate or professional degree, and stores the result as az_perc_degree. Because get_acs() defaults to the 5-year ACS, year = 2021 returns the 2017–2021 5-year estimates.

az_perc_degree <- get_acs(  # requests ACS data from the Census API
  geography = "county",     # geographic level: counties
  state = "AZ",             # limits the request to Arizona
  variables = "DP02_0066P", # percent with a graduate or professional degree
  year = 2021               # end year of the 2017-2021 ACS 5-year estimates
)

Analysis

Once the data are retrieved, they can be visualized. Here, we build a ggplot from az_perc_degree and store it as az_perc_degree_int. The plot shows each county's estimated percentage of graduate degree holders along with its margin of error (MOE). Rather than a box and whisker plot, it uses a dot plot with error bars: each point marks the estimate, and each bar spans the estimate plus or minus the MOE.

az_perc_degree_int <- ggplot(az_perc_degree, aes(x = estimate,          # estimated percentage on the x axis
                           y = reorder(NAME, estimate))) +              # county names on the y axis, sorted by estimate
  geom_errorbar(aes(xmin = estimate - moe, xmax = estimate + moe),      # error bar spans estimate +/- MOE
                linewidth = 0.5) +                                      # `linewidth` replaces the deprecated `size`
  geom_point(color = "darkblue", size = 2) +                            # point marks the estimate itself
  scale_x_continuous(labels = label_number()) +                         # plain numeric labels on the x axis
  scale_y_discrete(labels = function(x) str_remove(x, ", Arizona")) +   # drops the ", Arizona" suffix from county names
  labs(title = "Arizona Percentage with Graduate Degree",               # title, subtitle, caption, and axis labels
       subtitle = "Counties in Arizona",
       caption = "Data acquired with R and tidycensus. Error bars represent margin of error around estimates.",
       x = "ACS Estimated Percentage",
       y = "") +
  theme_minimal(base_size = 12)                                         # clean theme with a 12 pt base font
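ACS margins of error are published at the 90% confidence level, so the error bars in this plot span roughly a 90% confidence interval. If a 95% interval is preferred, get_acs() accepts a moe_level argument (90, 95, or 99), or an existing MOE can be rescaled by the ratio of normal critical values. A minimal base R sketch of that rescaling follows; the helper name rescale_moe() and the 0.8-point MOE are hypothetical, illustrative values rather than figures from the Arizona data.

```r
# Hypothetical helper: rescale a 90% ACS MOE to another confidence level.
rescale_moe <- function(moe90, level = 0.95) {
  z90 <- qnorm(0.95)                    # ~1.645: two-sided 90% critical value
  z_new <- qnorm(1 - (1 - level) / 2)   # critical value for the target level
  moe90 * z_new / z90
}

moe95 <- rescale_moe(0.8)               # illustrative 90% MOE of 0.8 points
print(round(moe95, 3))
```

Widening the bars this way makes overlap between counties more likely, so the choice of confidence level should be stated in the caption.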


With the plot stored, ggplotly() converts it into an interactive chart. This single line lets users hover over points to see values, zoom in, and pan to compare counties.

ggplotly(az_perc_degree_int, tooltip = "x") # creates interactive visualization while adding a hover feature that displays the estimated percentages.

Which counties in the selected state have the largest percentages of graduate degree holders?

At first glance, Coconino County has the highest percentage of graduate degree holders, at about 15%, with Pima County a close second at about 14%. However, the two counties' error bars overlap, so Pima County could actually have the higher percentage: the upper end of Pima's MOE extends past 15%, while the lower end of Coconino's reaches about 14%.
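Overlapping error bars are a useful first check, but they are not a formal test: two estimates can have overlapping intervals and still differ significantly. A minimal sketch of the Census Bureau's standard comparison follows, assuming 90% MOEs (the ACS default): each MOE is converted to a standard error (MOE / 1.645), and the difference is significant when |Z| exceeds 1.645. The function compare_acs() and its input values are hypothetical placeholders, not the actual Coconino and Pima figures; tidycensus also provides a significance() helper for this purpose.

```r
# Hypothetical helper: compare two ACS estimates using their 90% MOEs.
compare_acs <- function(est1, moe1, est2, moe2) {
  se1 <- moe1 / 1.645                        # 90% MOE -> standard error
  se2 <- moe2 / 1.645
  z <- (est1 - est2) / sqrt(se1^2 + se2^2)   # Z statistic for the difference
  overlap <- (est1 - moe1) <= (est2 + moe2) && (est2 - moe2) <= (est1 + moe1)
  list(z = z,
       significant = abs(z) > 1.645,         # TRUE if significant at 90%
       overlap = overlap)                    # TRUE if the error bars overlap
}

# Illustrative values only (not the actual county estimates)
res <- compare_acs(est1 = 15.0, moe1 = 0.8, est2 = 14.2, moe2 = 0.9)
print(res)
```

With these illustrative inputs the intervals overlap and Z is about 1.09, so the difference would not be significant at the 90% level.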

Which have the smallest percentages?

The same pattern appears among the bottom three counties: the error bars for Graham, La Paz, and Greenlee counties overlap. Taking the estimates at face value, Greenlee County has the smallest percentage of graduate degree holders, at about 3%.

Does this method work well for your state?

This method works well for the top four counties. For the middle and bottom counties, however, the margins of error are large enough that their relative rankings could shift considerably. These large MOEs make it difficult to rank graduate degree attainment reliably outside the top four counties.
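One way to put a number on "large" MOEs is the coefficient of variation (CV): the standard error as a percentage of the estimate, with SE = MOE / 1.645 for 90% MOEs. A minimal sketch follows, assuming a data frame with the estimate and moe columns that get_acs() returns. The county rows are hypothetical stand-ins for az_perc_degree, and the 30% cutoff is an assumed rule of thumb, not an official Census Bureau threshold.

```r
# Hypothetical stand-in for az_perc_degree (same estimate/moe columns as get_acs())
acs <- data.frame(
  NAME = c("County A", "County B", "County C"),
  estimate = c(15.0, 6.0, 3.0),
  moe = c(0.8, 1.5, 2.0)
)

# Add CV and error bar bounds; `cutoff` is an assumed reliability threshold
add_cv <- function(df, cutoff = 30) {
  se <- df$moe / 1.645                 # 90% MOE -> standard error
  df$cv <- 100 * se / df$estimate      # CV in percent
  df$reliable <- df$cv <= cutoff       # flag estimates at or below the cutoff
  df$lower <- df$estimate - df$moe     # error bar bounds, as in the plot
  df$upper <- df$estimate + df$moe
  df[order(-df$estimate), ]            # highest estimate first
}

out <- add_cv(acs)
print(out)
```

Small estimates with moderate MOEs, like the hypothetical County C, produce high CVs, which matches the pattern seen among the bottom counties.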

Part B - Spatial

Summary:

For this section, we continue to use tidycensus to look at population data, this time poverty levels across Monroe County, New York, which contains the entire city of Rochester. For the final project, I will explore carjackings in Rochester from 2022 to 2023. Areas with high crime often correlate with areas of high poverty: a 2014 Bureau of Justice Statistics report found that persons in poor households experienced nonfatal violent victimization at more than double the rate of persons in high-income households (Harrell, 2014). Identifying high-poverty areas in Rochester may therefore help point to areas with higher crime, including carjackings.

Data Preparation

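Let's use tidycensus to request tract-level data for variable B17001_003, the estimated number of males whose income in the past 12 months was below the poverty level, limited to Monroe County, New York. Setting geometry = TRUE also returns each tract's boundary as an sf object so the data can be mapped.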

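# year is not set, so get_acs() uses the package's default ACS 5-year release
rochester_poverty <- get_acs(
  geography = "tract",       # census tracts
  variables = "B17001_003",  # males with income below the poverty level (count)
  state = "NY",
  county = "Monroe",         # Monroe County, which contains Rochester
  geometry = TRUE)           # include tract boundaries (sf) for mapping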
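Because B17001_003 is a count, tracts with more residents will tend to show higher values regardless of how common poverty is. One option is to convert counts to percentages using the table total, B17001_001 (the population for whom poverty status is determined), as the denominator. In tidycensus this total could be requested with get_acs(..., summary_var = "B17001_001"), which adds summary_est and summary_moe columns, and moe_prop() computes the MOE of the derived proportion. The base R sketch below spells out that Census Bureau proportion formula so the arithmetic is visible; the helper prop_with_moe() and the tract values are hypothetical.

```r
# Hypothetical helper following the Census Bureau formula for the MOE of a
# proportion (the numerator is a subset of the denominator).
prop_with_moe <- function(num, den, moe_num, moe_den) {
  p <- num / den
  radicand <- moe_num^2 - p^2 * moe_den^2
  # When the radicand is negative, fall back to the ratio formula
  moe <- ifelse(radicand < 0,
                sqrt(moe_num^2 + p^2 * moe_den^2),
                sqrt(pmax(radicand, 0))) / den
  data.frame(pct = 100 * p, pct_moe = 100 * moe)
}

# Illustrative tract values only (count below poverty, universe total)
out <- prop_with_moe(num = c(300, 900), den = c(2000, 1000),
                     moe_num = c(80, 20), moe_den = c(150, 100))
print(out)
```

The first illustrative tract uses the standard proportion formula (15% with an MOE of about 3.8 points); the second has a negative radicand and falls back to the ratio formula.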

After inspecting the estimates, let's set manual class breaks so that high-poverty tracts stand out on the map.

breaks <- c(0, 200, 400, 600, 800, 1000) # six break points define five classes of width 200
color <- c("#e7d7c1", "#a78a7f", "#bf4342", "#8c1c13", "#540804") # one color per class, light (low) to dark (high)
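The six breaks define five classes, which lines up with the five colors. Before mapping, a quick base R check with cut() shows how many tracts fall into each class and whether any value lies outside the 0 to 1000 range, since cut() returns NA for such values. This is a minimal sketch; the vector est holds hypothetical values standing in for rochester_poverty$estimate.

```r
# Same breaks and colors as defined for the map
breaks <- c(0, 200, 400, 600, 800, 1000)
color <- c("#e7d7c1", "#a78a7f", "#bf4342", "#8c1c13", "#540804")

# Six break points define five intervals, so five colors are needed
stopifnot(length(color) == length(breaks) - 1)

# Hypothetical tract estimates standing in for rochester_poverty$estimate
est <- c(35, 180, 250, 420, 515, 610, 870)

# Assign each value to a class; dig.lab = 4 keeps "1000" from printing as "1e+03"
classes <- cut(est, breaks = breaks, include.lowest = TRUE, dig.lab = 4)
counts <- table(classes)
print(counts)               # number of tracts in each class
print(sum(is.na(classes)))  # values outside 0-1000 become NA
```

With the real data, replacing est with rochester_poverty$estimate would confirm whether the top break of 1000 covers every tract.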

Analysis

A choropleth map built with ggplot shows where in Monroe County the highest poverty counts occur.

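ggplot(data = rochester_poverty) +
  geom_sf(aes(fill = estimate)) +              # shade each tract by its estimate
  scale_fill_distiller() +                     # continuous ColorBrewer scale (default "Blues")
  ggtitle("Rochester, NY Poverty Estimates") +
  theme_void()                                 # remove axes and gridlines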

The static map is hard to read because the continuous color scale makes it difficult to tell tracts apart. A different color scheme and some interactivity help, so let's switch to mapview, which lets users zoom in and pan around the map.

mapview(rochester_poverty, zcol = 'estimate', at = breaks, col.regions = color) # interactive map shaded by estimate using the manual breaks and colors

Conclusion:

In conclusion, suburban tracts outside the inner city of Monroe County have low counts of males below the poverty line. Some larger tracts in the west and southwest have elevated levels, but the majority of high-poverty tracts are in the inner city, toward the west. Only one census tract exceeds 800. At first glance, most tracts in the city fall between 200 and 600. Because these values are counts rather than rates, tracts with larger populations will tend to show higher values, so rates would allow a fairer comparison between tracts.

Sources

Harrell, E. (2014). Household Poverty and Nonfatal Violent Victimization, 2008–2012 (NCJ 248384). U.S. Department of Justice, Bureau of Justice Statistics.

Exploring Tidycensus and Relevant Data

Cody Longbotham

2024-04-01
