Identification of bias
Mitigation of bias
The idea that those in the positions to make decisions are unaware of the potential harms of their biases and blind spots.
They are the ones deciding:
To help identify and mitigate bias in data and research
The US Census follows “standards on race and ethnicity set by the U.S. Office of Management and Budget (OMB) in 1997. These standards guide how the federal government collects and presents data on these topics.”
Anyone want to share?
The specification of the list of folders to get to a file on your computer is called a path.
"/Users/sarahodges/spatial/methods1/part1/data/raw/invol_data_propublica.csv"
"data/raw/invol_data_propublica.csv"
In this class, we created an R project called part1.Rproj that defines the folder that your relative path starts with.
If you are using them
/Users/sarahodges/spatial/Data/tabular/msa/nhgis_fam_pov_20197bsa.csv
Use the file.path() function if you have issues:
file.path("C:\\Users\\sarahodges\\spatial\\Data\\tabular\\msa\\nhgis_fam_pov_20197.csv")
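As a minimal sketch of what file.path() does: it joins folder and file names with a forward slash, which R accepts on Windows, macOS, and Linux. The example reuses the relative path shown earlier and assumes your working directory is the part1 project folder.

```r
# Build the relative path piece by piece instead of typing the slashes.
rel_path <- file.path("data", "raw", "invol_data_propublica.csv")
print(rel_path)

# file.exists() returns TRUE only if the file is really at that location,
# which is a quick way to check a path before importing.
file.exists(rel_path)
```

If file.exists() returns FALSE, check that the part1 project is open and that the file sits in data/raw.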
You already have all the data you need for the first few classes. Let’s talk through the file structure we’ll all use.
data/raw
Create a new R script: File > New File > R Script
Save it in methods1/part1/scripts as ny_school_districts_student_poverty_2019.R
At the top of your script, write a comment describing the purpose of this file and the source:
Next, load the tidyverse collection of packages to your environment
Notice: you need to load a package into your R session every time.
install.packages(package_name) to install the package on your computer - ONLY ONCE
library(package_name) to load the packages you are using into the current R environment - IN EVERY SCRIPT
We’ll learn four functions that are the backbone of data transformation in R
read_csv() to import a dataset into your R Environment, and the metadata.
raw_stpov19 to indicate that this is the original form of the data

...1 | Postal | FIPS | County | CONUM | district_id | Name | Estimated_Total_Pop | Estimated Population 5-17 | Estimated_relevant_5_17_in_poverty |
---|---|---|---|---|---|---|---|---|---|
1 | NY | 36 | Steuben County | 36101 | 3602370 | Addison Central School District | 6856 | 1210 | 283 |
2 | NY | 36 | Oneida County | 36065 | 3605040 | Adirondack Central School District | 8424 | 1346 | 189 |
3 | NY | 36 | Chenango County | 36017 | 3602400 | Afton Central School District | 3696 | 583 | 115 |
4 | NY | 36 | Erie County | 36029 | 3602430 | Akron Central School District | 9618 | 1503 | 143 |
5 | NY | 36 | Albany County | 36001 | 3602460 | Albany City School District | 98257 | 10895 | 2897 |
variable | definition |
---|---|
Postal | State Postal code |
FIPS | State FIPS code |
district_id | unique school district identifier |
Name | school district name |
Estimated_Total_Pop | estimated population |
Estimated_Pop_5_17 | estimated population of school-age children, aged 5 to 17 years old |
Estimated_relevant_5_17_in_poverty | estimated population of school-age children, aged 5 to 17 years old that are living in poverty |
NA | NA |
NA | NA |
data source: U.S. Census Small Area Income and Poverty Estimates (SAIPE) Program, https://www.census.gov/programs-surveys/saipe.html | NA |
Our student poverty table has 9 columns; we may not need all of them.
Type names(stpov19) in your console to see column names
# Process the 2019 student-age poverty data from SAIPE for New York
# source = https://www.census.gov/data/datasets/2019/demo/saipe/2019-school-districts.html
library(tidyverse)
library(readxl) # read_excel() comes from readxl, which library(tidyverse) does not attach
raw_stpov19 <- read_csv("data/raw/school_district_child_poverty_2019.csv")
# import variable definitions to review
stpov_meta <- read_excel("data/raw/ny_student_poverty_metadata.xlsx")
# create student poverty rate & year column
# select necessary variables
stpov19 <- raw_stpov19 |>
  mutate(stud_pov_rate = Estimated_relevant_5_17_in_poverty /
           `Estimated Population 5-17`,
         year = "2019") |>
  select(Postal, district_id, Name, Estimated_Total_Pop,
         `Estimated Population 5-17`, stud_pov_rate, year)
Notice: use |> to add a new function.
mutate() all of your variables, then use |> to move on to select().
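To see the mutate() then select() pattern on something small, here is a sketch using two rows copied from the table above. The tibble toy and its short column names (pop_5_17, pov_5_17) are hypothetical stand-ins for this illustration, not part of the class data.

```r
library(dplyr)

# Toy data: values taken from the Addison and Albany rows of the table above.
# The short column names are made up for this example.
toy <- tibble(
  Name = c("Addison Central School District", "Albany City School District"),
  pop_5_17 = c(1210, 10895),
  pov_5_17 = c(283, 2897)
)

toy_rates <- toy |>
  # create the new columns first
  mutate(stud_pov_rate = pov_5_17 / pop_5_17,
         year = "2019") |>
  # then keep only the columns we need
  select(Name, stud_pov_rate, year)

print(toy_rates)
```

Because select() runs after mutate(), the new stud_pov_rate column already exists when you choose which columns to keep.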
Try these!
# filter based on text value
nyc <- stpov19 |>
  filter(Name == "New York City Department Of Education")
# remove New York City
ny_no_nyc <- stpov19 |>
  filter(Name != "New York City Department Of Education")
# remove all districts with more than 10,000 people
ny_no_large_districts <- stpov19 |>
  filter(Estimated_Total_Pop <= 10000)
# keep only districts with more than 500 and no more than 10,000 people
ny_medium_districts <- stpov19 |>
  filter(Estimated_Total_Pop <= 10000 & Estimated_Total_Pop > 500)
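A small sketch can help check what a filter() condition keeps and what it drops. The tibble toy below holds the five districts from the table above; the 5,000 cutoff in the second filter is an arbitrary value chosen for this illustration.

```r
library(dplyr)

# Toy data: district names and total population from the five rows shown above.
toy <- tibble(
  Name = c("Addison Central School District",
           "Adirondack Central School District",
           "Afton Central School District",
           "Akron Central School District",
           "Albany City School District"),
  Estimated_Total_Pop = c(6856, 8424, 3696, 9618, 98257)
)

# filter() KEEPS the rows where the condition is TRUE.
small <- toy |>
  filter(Estimated_Total_Pop <= 10000)
nrow(small)

# Two conditions joined with & must both be TRUE for a row to be kept.
medium <- toy |>
  filter(Estimated_Total_Pop > 5000 & Estimated_Total_Pop <= 10000)
medium$Name
```

Reading the condition as "keep rows where ..." makes it easier to write an accurate comment above each filter.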
Rename your columns: new_name = old_name. No spaces in variable names.
stpov19 <- raw_stpov19 |>
  mutate(stud_pov_rate = Estimated_relevant_5_17_in_poverty /
           `Estimated Population 5-17`,
         year = "2019") |>
  select(Postal, district_id, Name, Estimated_Total_Pop,
         `Estimated Population 5-17`, Estimated_relevant_5_17_in_poverty, stud_pov_rate, year) |>
  filter(`Estimated Population 5-17` >= 100) |>
  rename(id = district_id,
         district = Name,
         tpop = Estimated_Total_Pop,
         stpop = `Estimated Population 5-17`,
         stpov = Estimated_relevant_5_17_in_poverty,
         stpovrate = stud_pov_rate)
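The new_name = old_name order is easy to flip by accident, so here is a one-row sketch to confirm it. The tibble toy is a hypothetical stand-in built from the Addison row of the table above.

```r
library(dplyr)

# One-row toy table; the backticks are needed because the name contains spaces.
toy <- tibble(
  district_id = 3602370,
  Name = "Addison Central School District",
  `Estimated Population 5-17` = 1210
)

toy_renamed <- toy |>
  rename(id = district_id,          # new_name = old_name
         district = Name,
         stpop = `Estimated Population 5-17`)

names(toy_renamed)
```

After renaming, none of the column names contain spaces, so they no longer need backticks.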
Changes that you make to the data frame exist only in your R session until you write the data frame to a file on your computer.
We’ll use the write_csv() function to save the processed data frame to your computer.
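As a rough sketch of the save step, the example below writes a tiny data frame with write_csv() and reads it back to confirm the file was created. The temporary file from tempfile() is a stand-in so the example runs anywhere; in your own script you would pass a relative path inside your project instead, and the data frame toy is hypothetical.

```r
library(readr)

# Tiny stand-in for a processed data frame (values from the table above).
toy <- data.frame(
  id = c(3602370, 3602460),
  stpov = c(283, 2897)
)

# Write to a temporary file; replace out_path with your own project path.
out_path <- tempfile(fileext = ".csv")
write_csv(toy, out_path)

# Read the file back in to check that it saved as expected.
check <- read_csv(out_path, show_col_types = FALSE)
nrow(check)
```

write_csv() takes the data frame first and the file path second, and it overwrites an existing file at that path without asking.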
Getting help in R presentation.
OPTIONAL: For additional context and information on R, review Chapters 4 and 5 of R for Data Science by Hadley Wickham and Garrett Grolemund
Use the 2019 student poverty processing script to process the same dataset for 2022.
ny_school_districts_student_poverty_2022_your_name.R
Import the 2022 ACS New York Poverty Data by County and create a data frame of the poverty rate
ny_county_poverty_rate_22_your_name.R
raw_county_pov22
rename:
County FIPS Code to conum
Poverty Estimate, Age 5-17 in Families to county_child_poverty_count
select the following columns: NAME, conum, county_child_poverty_count, county_child_poverty_rate
ny_school_district_student_poverty_rate_2022.csv
Upload both of your scripts to their assignment in Canvas