Hello and welcome to our Introduction to R for Non-Profits Workshop :)
Goals of this R Markdown (Rmd)
1. Import data using an R package connected to an API (Applicatoin Programming Interface)
- In other words, we don't have to:
- manually go to a website
- hit download
- move data from downloads to appropriate folder
- write code (ex. read_csv()) to import the data into R
- Instead, we [request an API key here](https://api.census.gov/data/key_signup.html)
- Place our unique api key in the code below
- Import all the census data we would like with the following function
- get_acs()
- Explore which variables exist (depending on survey) with the following function
- load_variables()
2. Visualize Family Poverty Data in Honolulu County
- Map
- Time Series
- Scatter plot with high school diploma attainment
It may be helpful to think of packages as toolboxes which contain functions, the tools
To save time, these packages are wrapped in a function that installs any package that you do not currently have installed
If you ever receive the error message: “package name” does not exist, the fix is simple: - Go to the console in R Studio and type install.packages(“name of package you want to install”)
knitr::opts_chunk$set(echo = TRUE)
if (!require(librarian)){
install.packages("librarian")
library(librarian)
}
librarian::shelf(tidycensus, # star of the show, what we will use to import data
tidyverse, # most popular R Package for manipulating data
reactable, # makes fun, nice looking, interactive tables
plotly, # makes static graphs interactive
ggiraph, # makes static maps interactive
ggpubr) # does some statistics and prints the results on our graphs
options(progress_enabled=FALSE) # picky but i didn't want the progress bar appearing in pdf
Request Here Place the API Key emailed to you in the code below replacing 06f9dcf5172cd1b403f9a6c34beea0d7929604f3 with your own
census_api_key("06f9dcf5172cd1b403f9a6c34beea0d7929604f3")
Now that we have access to census data, what variables can we explore?
This code chunk below shows us all the variables from the American Community Survey (ACS) 1 Year Survey Data Profiles for 2022
Data Profiles have the most frequently requested social, economic, housing, and demographic data. Each of these four subject areas is a separate data profile. The Data Profiles summarize the data for a single geographic area, both numbers and percent, to cover the most basic data on all topics.
Here is an amazing guide to navigating census data and using tidycensus
acs_2022_variables <- load_variables(2022, "acs1/profile")
The dataframe we just created (acs_2022_variables) has all the info we need, but is pretty bland Let’s create a table that is prettier and easier to navigate - works best when knitted
reactable(acs_2022_variables, filterable = TRUE, showPageSizeOptions = TRUE, minRows = 10)
Let’s investigate family poverty on Oahu at the census tract level
DP03_0119P is the code for Percent!!PERCENTAGE OF FAMILIES AND PEOPLE WHOSE INCOME IN THE PAST 12 MONTHS IS BELOW THE POVERTY LEVEL!!All families
oahu_family_poverty <- get_acs(
geography = "tract",
variables = c(percent_of_families_with_income_below_poverty_line = "DP03_0119P"),
state = "HI",
county = "Honolulu",
geometry = TRUE,
output = "wide",
year = 2022
) %>%
dplyr::filter(GEOID != "15003981200" & GEOID !="15003981900") # Tract in Northwestern Hawaiian Islands that makes map very small and Mamala Bay Golf Course that has %100 percent of families living below the poverty line
Congragulations! You have successfully imported census data into R
Now, let’s map the data
family_poverty_map <- ggplot(oahu_family_poverty) +
geom_sf_interactive(
aes(
fill = percent_of_families_with_income_below_poverty_lineE,
tooltip = paste(NAME, ": ", percent_of_families_with_income_below_poverty_lineE, "%"),
data_id = NAME
)
) +
scale_fill_viridis_c(option = "magma") +
theme_void()
girafe(ggobj = family_poverty_map, width = 700, height = 400)
Here is a list of the Tract Names with reference numbers
Now let’s look at family poverty in Honolulu County over time
years <- 2010:2019
names(years) <- years
family_poverty_2009_2019 <- map_dfr(years, ~{
get_acs(
geography = "county",
variables = c(percent_of_families_with_income_below_poverty_line = "DP03_0119P"),
state = "HI",
county = "Honolulu",
year = .x,
survey = "acs1"
)
}, .id = "year")
ggplot(family_poverty_2009_2019, aes(x = year, y = estimate, group = 1)) +
geom_line() +
geom_point()
This graph is okay, but we can do alot to make it more informative and nicer too look at
ggplot(family_poverty_2009_2019, aes(x = year, y = estimate, group = 1)) +
geom_ribbon(aes(ymax = estimate + moe, ymin = estimate - moe),
fill = "steelblue",
alpha = 0.4) +
geom_line(color = "steelblue") +
geom_point(color = "steelblue", size = 2) +
theme_minimal() +
#scale_y_continuous(labels = label_dollar(scale = .001, suffix = "k")) +
labs(title = "Percentage of Families Under the Poverty Line: Honolulu County",
x = "Year",
y = "%",
caption = "Shaded area represents margin of error around the estimate")
## Scatter Plot
Let’s explore the relationship between poverty and education with a scatter plot
oahu_family_poverty_college_education <- get_acs(
geography = "tract",
variables = c(percent_of_families_with_income_below_poverty_line = "DP03_0119P",
percent_of_indivivuals_with_bachelors_degree_or_higher = "DP02_0065P"),
state = "HI",
county = "Honolulu",
#geometry = TRUE,
output = "wide",
year = 2021
) %>%
dplyr::filter(GEOID != "15003981200" & GEOID !="15003981900")
ggplot(oahu_family_poverty_college_education, aes(x = percent_of_families_with_income_below_poverty_lineE,
y = percent_of_indivivuals_with_bachelors_degree_or_higherE)) +
geom_point() +
geom_smooth() +
stat_cor() +
theme_minimal()
This is pretty interesting
With each point being a census tract, our graph hints that there is some relationship between poverty and education at the tract level
- The number with the "R =" up top tells us that the correlation coefficient is -0.38
- This is on a scale from -1 to 1
- Negative correlations describe relationships where one variable increases, the other decreases (and vice versa)
- Positive correlations describe relationships where variables move together (up or down)
- Here is a [really neat tool for exploring correlations](https://rpsychologist.com/correlation/)
- So in summary, there is a negative correlation between the variables, but it is not very strong
This may cause you to think!!
- What variables included in the American Community Survey are the most correlated with family poverty?
- We can explore this with correlation matricies and machine learning tools that we can discuss in the future
Can you replace family poverty with another variable and