library(tidyverse)
library(tidycensus)
library(scales)

# look at the help section for the load_variables() function
# run the line of code below in your console and look at the help section
?load_variables
HELP
List 2020 Census variables
Data available from tidycensus:
The 2020 Census Redistricting Data (P.L. 94-171) - pl
Demographic and Housing Characteristics - dhc
Demographic Profile - dp
To view all of the variables in any of these datasets, use the load_variables() function.
# create a table of all variables in the 2020 redistricting file
pl_2020 <- load_variables(2020, "pl", cache = TRUE)
cache = TRUE saves the variable list on your computer, so it loads faster the next time.
Data in the Redistricting Dataset
P1. Race
P2. Hispanic or Latino, and Not Hispanic or Latino by Race
P3. Race for the Population 18 Years and Over
P4. Hispanic or Latino, and Not Hispanic or Latino by Race for the Population 18 Years and Over
P5. Group Quarters Population by Major Group Quarters Type
H1. Occupancy Status
Import Housing Units data
housing_units <- get_decennial(
  geography = "state",
  variables = c(housing_units = "H1_001N"),
  year = 2020
)
GEOID  NAME          variable       value
42     Pennsylvania  housing_units  5742828
06     California    housing_units  14392140
A question:
What percentage of housing units receive an American Community Survey each year?
The answer
# ACS questionnaires go to 3.5 million addresses each year
acs_percent <- 3500000 / sum(housing_units$value)
acs_percent
Use the reorder() function to order the states by percent vacant (without it, the states are plotted alphabetically)
Format the y-axis as %
ggplot(data = housing_2020, aes(x = reorder(state, pct_vacant), y = pct_vacant)) +
  geom_col() +
  scale_y_continuous(labels = percent_format(accuracy = 1)) +
  theme(axis.text.x = element_text(angle = 90, hjust = 1, vjust = 0.5)) +
  labs(x = "State", y = "Percent Vacant",
       title = "Proportion of Housing Units that are vacant")
Plot each state and reorder columns - code
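The plot code uses a dataframe called housing_2020 with state and pct_vacant columns, which the slides do not show being built. Here is one possible sketch. It assumes that H1_003N is the vacant-unit count in the H1 Occupancy Status table (confirm with load_variables()), and the names total_units and vacant_units are illustrative only. It also needs a Census API key set up with census_api_key().

```r
library(tidyverse)
library(tidycensus)

# Assumption: H1_001N = total housing units, H1_003N = vacant housing units
housing_2020 <- get_decennial(
  geography = "state",
  variables = c(total_units = "H1_001N", vacant_units = "H1_003N"),
  year = 2020,
  sumfile = "pl",
  output = "wide"          # one row per state, one column per variable
) %>%
  rename(state = NAME) %>% # the plot code expects a column called state
  mutate(pct_vacant = vacant_units / total_units)

head(housing_2020)
```

Asking for output = "wide" puts both counts on the same row, so the percentage is a single mutate() step.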
Redistricting Race/Ethnicity data
Let’s look at the list of variables again.
P1. Race
P2. Hispanic or Latino, and Not Hispanic or Latino by Race
P3. Race for the Population 18 Years and Over
P4. Hispanic or Latino, and Not Hispanic or Latino by Race for the Population 18 Years and Over
P5. Group Quarters Population by Major Group Quarters Type
H1. Occupancy Status
Find the variable for:
Total Population
Hispanic or Latino
Asian Alone, NOT Hispanic or Latino
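One way to search for these variables is to filter the pl_2020 table instead of scrolling through it. This is a minimal sketch, assuming the table returned by load_variables() has name, label, and concept columns. The search strings are only suggestions, so adjust them if nothing matches.

```r
library(tidyverse)
library(tidycensus)

pl_2020 <- load_variables(2020, "pl", cache = TRUE)

# Every variable whose label mentions Hispanic or Latino
pl_2020 %>%
  filter(str_detect(label, "Hispanic or Latino"))

# Only table P2 (variable names starting with P2_), narrowed to Asian alone
pl_2020 %>%
  filter(str_detect(name, "^P2_"), str_detect(label, "Asian alone"))
```

Reading the full label of each match shows whether a row is the Hispanic or the not-Hispanic version of a category.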
In-class Analysis
Use the get_decennial function to create a state dataframe with the following variables:
GEOID
State
Total Population (P1_001N)
Total Hispanic or Latino Population (P2_002N)
Percent Hispanic or Latino (P2_002N/P1_001N)
Total Black alone, not Hispanic or Latino (P2_006N)
Total Asian alone, not Hispanic or Latino
Total Native American alone, not Hispanic or Latino
Total Population of two or more races (P1_009N)
Calculate the percent for each race and ethnicity
There are a lot of race/ethnicity variables. It is not easy to determine which one to use!
Create a bar chart of the percent Black population, with the states ordered by population.
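Here is a possible starting point for the in-class analysis. It uses only the variable codes listed above. The dataframe and column names (state_race_2020, total_pop, hispanic, black_nh) are illustrative choices, not required names. The remaining race variables can be added to the variables vector in the same way once you have found their codes.

```r
library(tidyverse)
library(tidycensus)
library(scales)

state_race_2020 <- get_decennial(
  geography = "state",
  variables = c(total_pop = "P1_001N",
                hispanic  = "P2_002N",
                black_nh  = "P2_006N"),
  year = 2020,
  sumfile = "pl",
  output = "wide"
) %>%
  rename(state = NAME) %>%
  mutate(pct_hispanic = hispanic / total_pop,
         pct_black    = black_nh / total_pop)

# Bar chart of percent Black, with states ordered by total population
ggplot(state_race_2020, aes(x = reorder(state, total_pop), y = pct_black)) +
  geom_col() +
  scale_y_continuous(labels = percent_format(accuracy = 1)) +
  theme(axis.text.x = element_text(angle = 90, hjust = 1, vjust = 0.5)) +
  labs(x = "State (ordered by population)", y = "Percent Black alone, not Hispanic or Latino")
```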
Create a dataframe of estimated Median Household Income and selected race/ethnicity variables for every county in one state. Use this data to understand the relationship between race/ethnicity and income in this state. Explore the dataframe by:
looking at the data
calculating summary statistics for your state to determine:
the average median household income
the percent of the population for each race and ethnicity
creating plots
a bar plot of the median household income of each county, ordered ascending, colored by one of your % race variables
a scatter plot comparing median household income of each county to one of your % race variables
Write a paragraph explaining at least three things you have learned about your state by exploring the data. Include plots and/or statistics to support your conclusions. (You can upload the plots separately or create a PDF with text and images.)
See more instructions on the next slide
Assignment 7b: specific instructions
Use the get_decennial function to create a dataframe of all counties in one state (pick any state) with the following variables:
County
Total Population (P1_001N)
Total Hispanic or Latino Population
Total Black alone, not Hispanic or Latino
Total Asian alone, not Hispanic or Latino
Total Native American alone, not Hispanic or Latino
Total Population of two or more races (P1_009N)
Percent for each race and ethnicity
ex. Percent Hispanic or Latino = (P2_002N/P1_001N)
Use the get_acs function to create a dataframe of the estimated Median household income for all counties in the same state. Use the code below. We’ll learn more about ACS next week.
raw_mhi_2020 <- get_acs(
  geography = "county",
  variables = c(mhi = "B19013_001"), # renaming the variable as I import
  state = "GA",
  year = 2020,
  output = "wide",
  survey = "acs5"
)
Join these two dataframes together. Explore as described in the assignment overview.
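The join and a first look at the data might be sketched as follows. It uses Georgia and percent Hispanic or Latino as examples. The names county_race_2020, county_2020, total_pop, and hispanic are illustrative. The sketch also assumes that output = "wide" names the income estimate column mhiE (the variable name plus E for estimate), so check this with names(raw_mhi_2020).

```r
library(tidyverse)
library(tidycensus)
library(scales)

# County-level decennial counts for one state
county_race_2020 <- get_decennial(
  geography = "county", state = "GA",
  variables = c(total_pop = "P1_001N", hispanic = "P2_002N"),
  year = 2020, sumfile = "pl", output = "wide"
) %>%
  mutate(pct_hispanic = hispanic / total_pop)

# Median household income, as in the assignment instructions
raw_mhi_2020 <- get_acs(
  geography = "county", variables = c(mhi = "B19013_001"),
  state = "GA", year = 2020, output = "wide", survey = "acs5"
)

# GEOID identifies each county in both dataframes
county_2020 <- county_race_2020 %>%
  left_join(select(raw_mhi_2020, GEOID, mhiE), by = "GEOID")

# Summary statistics for the state
county_2020 %>%
  summarize(avg_mhi = mean(mhiE, na.rm = TRUE),
            pct_hispanic_state = sum(hispanic) / sum(total_pop))

# Scatter plot of income against percent Hispanic or Latino
ggplot(county_2020, aes(x = pct_hispanic, y = mhiE)) +
  geom_point() +
  scale_x_continuous(labels = percent_format(accuracy = 1)) +
  scale_y_continuous(labels = dollar_format()) +
  labs(x = "Percent Hispanic or Latino", y = "Median household income")
```

Joining on GEOID rather than NAME avoids mismatches if the two sources format county names differently.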