The global COVID-19 pandemic catalyzed the emergence of new migration patterns as remote work and quarantines spread worldwide. One particular trend was a wave of migration to the rural countryside in search of a slower pace of life, social distance, and access to natural amenities. In many places across the country, this phenomenon has resulted in rural gentrification, where the influx of newcomers increases housing prices and pushes locals out. The granularity of census migration data leaves a lot of questions to be answered about the true characteristics of migration during the pandemic. Focusing on the case study of the Berkshires, a largely rural county in Western Massachusetts, this study uses mobile device location data that records monthly migration at the individual level. With this novel form of data, this study examines the complexity of new migration trends by establishing a classification of these new movement patterns, connecting origins and destinations to investigate the direction of migration on the urban hierarchy, and identifying socioeconomic trends in migration that can point to rural gentrification and local displacement. The results thus far have demonstrated increased complexity and transience in migration where temporary and cyclical migration has a stronger presence than ever before. Additionally, the findings have confirmed a strong migration trend between urban centers and rural amenity-rich areas. Further work will focus on replicability of the study by running the script on different counties across the US to compare relative trends between counties.
Soon to be added!
Key words: R, Spatial AnalysisSubject: Social and Behavioral Sciences: Geography:
Geographic Information SciencesDate created: 2023-09-15Date modified: 2023-09-15Spatial Coverage: Specify the geographic extent of your
study. This may be a place name and link to a feature in a gazetteer
like GeoNames or OpenStreetMap, or a well known text (WKT)
representation of a bounding box.Spatial Resolution: Specify the spatial resolution as a
scale factor, description of the level of detail of each unit of
observation (including administrative level of administrative areas),
and/or or distance of a raster GRID sizeSpatial Reference System: Specify the geographic or
projected coordinate system for the studyTemporal Coverage: Specify the temporal extent of your
study—i.e. the range of time represented by the data observations.Temporal Resolution: Specify the temporal resolution of
your study—i.e. the duration of time for which each observation
represents or the revisit period for repeated observationsFunding Name: NSF Directorate for Social, Behavioral
and Economic SciencesFunding Title: Transforming theory-building and STEM
education through reproductions and replications in the geographical
sciencesAward info URI: https://www.nsf.gov/awardsearch/showAward?AWD_ID=2049837Award number: BCS-2049837# install packages
#install.packages( c("tigris", "sf", "tidycensus", "lubridate", "dplyr", "ggplot2", "ggmap", "zoo", "data.table", "tidyr") )
# look for pop-up dialogue to agree to installation!
# load packages
library(tigris)
## To enable caching of data, set `options(tigris_use_cache = TRUE)`
## in your R script or .Rprofile.
library(sf)
## Warning: package 'sf' was built under R version 4.3.3
## Linking to GEOS 3.11.0, GDAL 3.5.3, PROJ 9.1.0; sf_use_s2() is TRUE
library(tidycensus)
## Warning: package 'tidycensus' was built under R version 4.3.3
library(lubridate)
##
## Attaching package: 'lubridate'
## The following objects are masked from 'package:base':
##
## date, intersect, setdiff, union
library(dplyr)
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
library(ggplot2)
library(ggmap)
## ℹ Google's Terms of Service: <https://mapsplatform.google.com>
## Stadia Maps' Terms of Service: <https://stadiamaps.com/terms-of-service/>
## OpenStreetMap's Tile Usage Policy: <https://operations.osmfoundation.org/policies/tiles/>
## ℹ Please cite ggmap if you use it! Use `citation("ggmap")` for details.
library(zoo)
##
## Attaching package: 'zoo'
## The following objects are masked from 'package:base':
##
## as.Date, as.Date.numeric
library(data.table)
##
## Attaching package: 'data.table'
## The following objects are masked from 'package:dplyr':
##
## between, first, last
## The following objects are masked from 'package:lubridate':
##
## hour, isoweek, mday, minute, month, quarter, second, wday, week,
## yday, year
library(tidyr)
#census_api_key(key = "ffa84ce2bb640d4a800340766ebeb796333dd0d2", install = TRUE)
# Load a CSV file into a data frame
movers <- read.csv("/Users/isaiahbennett/Downloads/fall 2024/migration/research/data/berkshire_moversR/BerkshireMovers.csv"
)
# query metadata for 2022 census
# Define Variables to input for census data query
target_state = "MA"
target_county = "Berkshire"
target_fips = 25003
# Set options for tigris
options(tigris_use_cache = TRUE)
# Check and load ACS 2022 variables
if (!exists("acs2022vars")) {
acs2022vars <- load_variables(year = 2022, dataset = "acs5")
print("ACS 2022 variables loaded!")
}
## [1] "ACS 2022 variables loaded!"
# Define target state and county (modify as needed)
target_state <- "MA" # Change to your desired state abbreviation
target_county <- "Berkshire" # Change to your desired county name
# Check and load Berkshire County census tracts
if (!exists("berks_tracts_2020")) {
tryCatch({
berks_tracts_2020 <- tracts(state = target_state, county = target_county, year = 2020, cb = TRUE)
print("Berkshire County tracts loaded!")
}, error = function(e) {
print("Error loading Berkshire County tracts")
print(e)
})
}
## [1] "Berkshire County tracts loaded!"
# Check and load US census tracts
if (!exists("US_tracts_2020")) {
tryCatch({
US_tracts_2020 <- tracts(year = 2020, cb = TRUE)
print("US tracts loaded!")
}, error = function(e) {
print("Error loading US tracts")
print(e)
})
}
## Retrieving Census tracts for the entire United States
## [1] "US tracts loaded!"
# Check and load US counties
if (!exists("US_counties_2020")) {
options(tigris_use_cache = FALSE) # Disable cache temporarily
tryCatch({
US_counties_2020 <- counties(year = 2020, cb = FALSE)
print("US counties loaded!")
}, error = function(e) {
print("Error loading US counties")
print(e)
})
options(tigris_use_cache = TRUE) # Re-enable cache
}
##
Downloading: 1.7 kB
Downloading: 1.7 kB
Downloading: 1.7 kB
Downloading: 1.7 kB
Downloading: 1.7 kB
Downloading: 1.7 kB
## Warning in unzip(file_loc, exdir = tmp): error 1 in extracting from zip file
## [1] "Error loading US counties"
## <Rcpp::exception: Cannot open "/private/var/folders/76/0f4vvnw909n136hz270cz5yc0000gn/T/Rtmp2Z7xLz"; The source could be corrupt or not supported. See `st_drivers()` for a list of supported formats.>
print("All datasets are now loaded and ready!")
## [1] "All datasets are now loaded and ready!"
# st_write(berks_tracts_2020, "BerksMovers.gpkg", layer = "tracts2020")
# st_write(US_tracts_2020, "BerksMovers.gpkg", layer = "UStracts2020")
# st_write(US_counties_2020, "BerksMovers.gpkg", layer = "UScounties2020")
# Convert decimal year to Year Month
movers$new_date <- as.yearmon(movers$date, format = "%d-%m-%Y")
movers$new_date <- as.character(movers$new_date)
# Add leading zeros to 4 digit county fips so that all county fips have 5 digits
movers$county_fips <- ifelse(nchar(movers$county_fips) == 4,
paste0("0", movers$county_fips),
movers$county_fips)
#Change data type for county fips to character for join
#movers$county_fips <- as.character(movers$county_fips) ### can delete?
#Add leading zeros to 10 digit tract fips
movers$tract_fips <- ifelse(nchar(movers$tract_fips) == 10,
paste0("0", movers$tract_fips),
movers$tract_fips)
#Change data type to character for joins
movers$tract_fips <- as.character(movers$tract_fips)
# get state codes to choose only states that exist in the data
#to query tracts from 2010 state by state for whats needed
extract_states <- substr(movers$tract_fips, 1, 2) #start at 1st and stop at 2nd character
# Combine and find unique state codes
all_state_codes <- unique(extract_states)
# query US 2010 census with state codes that are in movers data
# using 2010 because the tractfips from the data correspond
# with the tract delineation from 2010 and not 2020
# Initialize an empty list to store the tracts data for each state
all_tracts <- list()
# Loop through each state code and get the tracts data
for (state_code in all_state_codes) {
state_tracts <- tracts(state = state_code, year = 2010, cb = TRUE)
all_tracts[[state_code]] <- state_tracts
}
# Combine all tracts data into a single data frame
US_tracts_2010 <- do.call(rbind, all_tracts)
#create GEOID to join with movers
US_tracts_2010 <- US_tracts_2010 %>% mutate(GEOID = paste0(STATE, COUNTY, TRACT))
# selecting fields from tracts 2020 that i want to add to tracts 2010 such as the actual written out name of the counties and states
tracts_2020_names <- US_tracts_2020 %>% select(STATEFP, COUNTYFP, NAMELSADCO, STATE_NAME)
# Check if it's an sf object
if (inherits(tracts_2020_names, "sf")) {
# Convert it to a regular data frame by dropping the geometry to be able to join normally to sf tracts 2010
tracts_2020_names <- st_drop_geometry(tracts_2020_names)
}
# Select only unique STATEFP and COUNTYFP with state and county names so that it is not a many to many join and rather a one to many
tracts_2020_unique <- tracts_2020_names %>%
select(STATEFP, COUNTYFP, STATE_NAME, NAMELSADCO) %>%
distinct(STATEFP, COUNTYFP, .keep_all = TRUE)
#left join keeping all from the US_tracts_2010 and joining just the matching values from the unique names frame
US_tracts_2010 <- US_tracts_2010 %>%
left_join(tracts_2020_unique, by = c("STATEFP", "COUNTYFP"))
# save to geopackage
#st_write(US_tracts_2010, "BerksMovers.gpkg", layer = "UStracts2010")
categorized_movers <- movers %>%
arrange(caid, date) %>% # Sort by caid and date to track movements chronologically
group_by(caid) %>%
mutate(
T_first_outmove = first(tract_fips[outmove == 1], order_by = date), # First outmove tract
T_last_inmove = last(tract_fips[inmove == 1], order_by = date), # Final inmove tract
C_first_outmove = first(county_fips[outmove == 1], order_by = date), # First outmove county
C_last_inmove = last(county_fips[inmove == 1], order_by = date), # Final inmove county
date_last_inmove = last(date[T_last_inmove == tract_fips], order_by = date),
date_first_outmove = first(date[T_first_outmove == tract_fips], order_by = date),
date_first_berks = first(date[county_fips == target_fips], order_by = date),
date_last_berks = last(date[county_fips == target_fips], order_by = date),
county_fips_changes = n_distinct(county_fips), # Count the number of distinct county_fips
county_chunk = rleid(county_fips), #gives adjacent countys same unique ids
berkshire_flag = ifelse(county_fips == target_fips, 1, 2), #Mark if it's Berkshire or not by 1s or 2s
chunk_id = rleid(berkshire_flag), # creates distinct ids that represent changes in berkshire to non berkshire chunks
movement_pattern = case_when(
# permanent moved/resident
#One way single time
# maybe use july 2021 for edge effect
#period where migration is the greatest
### Single Origin Move to Berkshires
#allows multiple moves between county outside and before berkshires
C_first_outmove != target_fips & # first outmove is not berkshires
C_last_inmove == target_fips & # last inmove is berkshires
county_fips_changes == 2 &
sum(n_distinct(chunk_id)) == 2 # allows multiple moves in outside county
~ "SO Moved",
### multiple outside possibly between different counties ending up in berkshires move
C_first_outmove != target_fips &
C_last_inmove == target_fips & #berkshire last inmove
county_fips_changes >= 2 & # county fips greater than 2
sum(n_distinct(chunk_id)) == 2 # outside chunk to inside chunk
~ "MO Moved",
# permanent left - One way single time
### Single Destination left berkshire
C_first_outmove == target_fips &
C_last_inmove != target_fips & #berkshire last inmove
county_fips_changes == 2 & # county fips greater than 2
sum(n_distinct(chunk_id)) == 2 # outside chunk to inside chunk
~ "SD Left", # maybe most susceptible to displacement
### Multi Destination left berkshire
#allows moves in multiple counties after berkshires
C_first_outmove == target_fips & #first outmove berkshires
C_last_inmove != target_fips & #last inmove is not berkshires
county_fips_changes >= 2 & # more than one county
sum(n_distinct(chunk_id)) == 2 #allows multiple moves in outside county
~ "MD Left",
#intra only
# Moved only within the county
all(county_fips == target_fips)
~ "Intra-County",
#2 locations multiple moves
C_last_inmove == target_fips &
county_fips_changes == 2 & #between 2 counties
#sum(inmove == 1 & county_fips == 25003, na.rm = TRUE) > 1 & # more than 1 berkshire inmove
sum(county_fips != lag(county_fips), na.rm = TRUE) >= 3 & # the number of times the county changes at least two appearances of origin and destination
date_last_inmove < 2021.5
~ "Back and Forth then Inmover",
### back and forth
# has to touch both berkshires and outside county twice
county_fips_changes == 2 & #between 2 counties
#sum(inmove == 1 & county_fips == 25003, na.rm = TRUE) > 1 & # more than 1 berkshire inmove
sum(county_fips != lag(county_fips), na.rm = TRUE) >= 3 # the number of times the county changes at least two appearances of origin and destination
~ "Back and Forth",
### Temporary Outmover: first outmove and final inmove are both Berkshire
# one time get away to berkshire
C_first_outmove == target_fips &
C_first_outmove == C_last_inmove &
county_fips_changes == 2
~ "Temporary Outmover",
### Temporary visitor who left their origin, went to Berkshire, and returned to their origin
# One time getaway to berkshire and return
C_first_outmove != target_fips & # first outmove is not from berkshires
C_first_outmove == C_last_inmove & #origin = destination
county_fips_changes == 2
~ "Temporary Inmover", #return mover remote work or second home owner same tract
# Multiple moves multiple locations
### temporary out move same county but many stops in between
C_first_outmove == target_fips &
C_first_outmove == C_last_inmove &
sum(n_distinct(chunk_id)) == 3 #
~ "Transient Temporary Outmover",
### separate origin and destination with temp berks inmove
# allows for multiple counties before and after
C_first_outmove != target_fips & # first outmove is not from berkshires
C_last_inmove != target_fips &
sum(n_distinct(chunk_id)) == 3 # two changes in berkshire or non berkshire
~ "Transient Temporary Inmover", #leaves to berks getaway relocates in other county
### bounces between berkshires and at least 2 other counties before staying
C_last_inmove == target_fips &
date_last_inmove < 2021.5
~ "Transient then Inmover", #
### Visits more than once with other counties included
# could be from the berkshires or from somewhere else
#but they have a strong connection to the area through more than 1 multiweek visit
sum(berkshire_flag == 1) > 1 & # more than one berkshire flag
sum(n_distinct(chunk_id)) > 3 & # more than two changes between berks and non berks
county_fips_changes >= 3 # 3 or more different counties
~ "Transient", #was frequent visitor
# Catch all others
TRUE ~ "Unknown"
),
inmove_date = ifelse(inmove == 1, date, NA),
outmove_date = ifelse(outmove == 1, date, NA),
)
#create chunk date that is the first date for the first feature in each chunk
#helpful for filtering by time for edge control
categorized_movers <- categorized_movers %>%
group_by(caid, chunk_id) %>%
mutate(
chunk_date = first(date),
second_chunk_date = ifelse(chunk_id == 2, first(date), NA)
) %>%
ungroup()
# create duration to calculate the length of time movers spent in each county they visited
duration_categorized_movers <- categorized_movers %>%
group_by(caid, county_chunk) %>%
summarize(
outmove_date = last(outmove_date), # Last non-NA outmove_date
inmove_date = first(inmove_date), # First non-NA inmove_date
.groups = "drop"
) %>%
mutate(
duration = case_when(
is.na(outmove_date) | outmove_date < inmove_date ~ 2021.917 - inmove_date, # If no outmove_date for final inmove
is.na(inmove_date) ~ outmove_date - 2019.000, # If no inmove_date for first outmove
TRUE ~ outmove_date - inmove_date # Normal case
)
)
# Join duration back to categorized movers frame by caid, and county_chunk
categorized_movers <- categorized_movers %>%
left_join(
duration_categorized_movers %>% select(caid, county_chunk, duration),
by = c("caid", "county_chunk")
)
# Identify the singular, non berkshire longest stay
categorized_movers <- categorized_movers %>%
group_by(caid) %>%
mutate(
max_duration = if_else(
duration == max(duration[county_fips != 25003], na.rm = TRUE) & county_fips != 25003,
duration,
NA_real_
)
) %>%
ungroup()
## Warning: There were 15500 warnings in `mutate()`.
## The first warning was:
## ℹ In argument: `max_duration = if_else(...)`.
## ℹ In group 7: `caid =
## "000d732c0e8054420fc6e034bc544843bb5d2e9c2c0e48dd8d6fac6ce412f314"`.
## Caused by warning in `max()`:
## ! no non-missing arguments to max; returning -Inf
## ℹ Run `dplyr::last_dplyr_warnings()` to see the 15499 remaining warnings.
##Movement Pattern Totals
# calculate the total number of phones for each category and the count of distinct counties for each category
test_category_totals <- categorized_movers %>%
group_by(movement_pattern) %>%
summarize(
total_pop = n_distinct(caid),
count_counties = (n_distinct(county_fips) - 1),
count_tracts = n_distinct(tract_fips))
categorized_movers <- categorized_movers %>%
mutate(
temporal_simple_categories = case_when(
# single origin one way permanent
movement_pattern %in% c("SO Moved", "MO Moved", "Back and Forth then Inmover", "Transient then Inmover")
~ "Moved",
# Combine these categories into "Left"
movement_pattern %in% c("SD Left", "MD Left")
~ "Left",
# Combine these categories into "Temporary Inmover"
movement_pattern %in% c("Transient Temporary Inmover", "Temporary Inmover")
~ "Temporary Inmover",
# Combine these categories into "Temporary Outmover"
movement_pattern %in% c("Transient Temporary Outmover", "Temporary Outmover")
~ "Temporary Outmover",
# Catch all others
TRUE ~ movement_pattern
))
temporal_simple_category_totals <- categorized_movers %>%
group_by(temporal_simple_categories) %>%
summarize(
total_pop = n_distinct(caid),
count_counties = (n_distinct(county_fips) - 1),
count_tracts = n_distinct(tract_fips)
)
##Temporal Distribution of Total Movers
# Temporal graph of when movers arrived vs left
# Generate sequence of months from Jan 2019 to Dec 2021
dates <- seq(from = as.Date("2019-01-01"), to = as.Date("2021-12-01"), by = "month")
# Convert to "Mon YYYY" format and create temporal dataframe
temporal <- data.frame(Date = format(dates, "%b %Y"))
# Convert temporal Date to yearmon for joining
temporal$Date <- as.yearmon(temporal$Date, format = "%b %Y")
# Convert date columns in categorized_movers to yearmon format
categorized_movers <- categorized_movers %>%
mutate(date_last_berks = as.yearmon(date_last_berks, format = "%d-%m-%Y"),
date_first_berks = as.yearmon(date_first_berks, format = "%d-%m-%Y"),
second_chunk_date = as.yearmon(second_chunk_date, format = "%d-%m-%Y"))
# Summarize distinct `caid` counts for each simple movement pattern by month
simple_movement_summary <- categorized_movers %>%
mutate(date_key = case_when(
movement_pattern %in% c("SD Left", "MD Left") ~ as.yearmon(date_last_berks),
movement_pattern %in% c("SO Moved", "MO Moved", "Temporary Inmover", "Transient Temporary Inmover") ~ as.yearmon(date_first_berks),
movement_pattern %in% c("Temporary Outmover", "Transient Temporary Outmover") ~ as.yearmon(second_chunk_date),
TRUE ~ as.yearmon(NA) # Ensures NA is also numeric-compatible with yearmon
)) %>%
filter(!is.na(date_key)) %>%
group_by(date_key, temporal_simple_categories) %>%
summarize(total_movers = n_distinct(caid), .groups = "drop") %>%
pivot_wider(names_from = temporal_simple_categories, values_from = total_movers, values_fill = 0) # na is replaced w 0
# pivot wider is essential because instead of having multiple rows of each date for each movement pattern they are reoriented into columns
# Merge summarized movement data with temporal dataframe
simple_temporal_categorized_movers <- temporal %>%
left_join(simple_movement_summary, by = c("Date" = "date_key"))
write.csv(simple_temporal_categorized_movers, "new_simple_temporal_categorized_movers.csv", row.names = FALSE)
##Filter by study temporal edge constraints
# temporal control for edge effects and lockdown start
lockdown = 2020.1 # date variable to input into filtering data 2020.1 is between feb and march of 2020 for when lockdown began
endcap = 2021.5 # and june = 2021.417
time_filt_categories <- categorized_movers %>%
filter(
# if first berk inmove is within constraints
(movement_pattern == "SO Moved" & lockdown < date_first_berks & date_first_berks < endcap) |
(movement_pattern == "MO Moved" & lockdown < date_first_berks & date_first_berks < endcap) |
# if last berks outmove is within constraints (last berks for left categories is last outmove)
(movement_pattern == "SD Left" & lockdown < date_last_berks & date_last_berks < endcap) |
(movement_pattern == "MD Left" & lockdown < date_last_berks & date_last_berks < endcap) |
#first berks move within constraints
(movement_pattern == "Intra-County" & lockdown < date_last_berks) |# last date for berkshires just
(movement_pattern == "Back and Forth" & lockdown < date_last_berks) | # last date for berkshires just after lockdown
# all proceding filters where the first date of second chunk is within constraints
(movement_pattern == "Temporary Outmover" & any(chunk_id== 2 & lockdown < chunk_date & chunk_date < endcap)) |
(movement_pattern == "Transient Temporary Outmover" & any(chunk_id== 2 & lockdown < chunk_date & chunk_date < endcap)) |
(movement_pattern == "Temporary Inmover" & any(chunk_id== 2 & lockdown < chunk_date & chunk_date < endcap)) |
(movement_pattern == "Transient Temporary Inmover" & any(chunk_id== 2 & lockdown < chunk_date & chunk_date < endcap)) |
(movement_pattern == "Transient" & any(county_fips==target_fips & inmove == 1 & lockdown < date & date < endcap)) |
(movement_pattern == "Back and Forth then Inmover") | (movement_pattern == "Transient then Inmover")
)
# category totals
time_filt_category_totals <- time_filt_categories %>%
group_by(movement_pattern) %>%
summarize(
total_pop = n_distinct(caid),
count_counties = (n_distinct(county_fips) - 1),
count_tracts = n_distinct(tract_fips))
time_filt_simple_category_totals <- time_filt_categories %>%
group_by(temporal_simple_categories) %>%
summarize(
total_pop = n_distinct(caid),
count_counties = (n_distinct(county_fips) - 1),
count_tracts = n_distinct(tract_fips)
)
# Calculate the total sum of the total_pop column
total_sum <- sum(time_filt_category_totals$total_pop) - time_filt_category_totals$total_pop[time_filt_category_totals$movement_pattern == "Only Intra-County"]
# Add a new column for the percentage per category of total movers
#time_filt_category_totals$percent_of_total <- (time_filt_category_totals$total_pop / total_sum) * 100
# Group by tracts with all categories to be filtered later
time_filt_categories <- time_filt_categories %>% filter(movement_pattern != "Intra-County") %>%
group_by(caid) %>%
mutate(
#Records the tract_fips of the first row where max_duration is not NA
principal_track = if_else(row_number() == first(which(!is.na(max_duration))), tract_fips, NA_character_)
)
group_by_tract_categorized <- time_filt_categories %>%
group_by(movement_pattern, tract_fips, county_fips, temporal_simple_categories) %>%
summarize(
PopMovers = n_distinct(caid[tract_fips == principal_track], na.rm = TRUE),
GrossMigT = n_distinct(caid[inmove == 1], na.rm = TRUE))
## `summarise()` has grouped output by 'movement_pattern', 'tract_fips',
## 'county_fips'. You can override using the `.groups` argument.
urban_rural_NCHS <- read.csv("/Users/isaiahbennett/Downloads/fall 2024/migration/research/data/berkshire_moversR/NCHSURCodes2013.csv"
)
ERS_typology<- read.csv("/Users/isaiahbennett/Downloads/spring 2025/migration research/main/data/ers county typology/ERSCountyTypology2015Edition.csv")
#Add leading zeros to 4 digit county fips
urban_rural_NCHS$GEOID <- ifelse(nchar(urban_rural_NCHS$FIPS.code) == 4,
paste0("0", urban_rural_NCHS$FIPS.code),
urban_rural_NCHS$FIPS.code)
#Change data type to character for join
urban_rural_NCHS$GEOID <- as.character(urban_rural_NCHS$GEOID)
CountyCode = urban_rural_NCHS$X2013.code[urban_rural_NCHS$GEOID == target_fips]
###Comparison Counties Query
# Read county typology
ERS_typology <- read.csv("/Users/isaiahbennett/Downloads/spring 2025/migration research/main/data/ers county typology/ERSCountyTypology2015Edition.csv")
#add leading zeros for 4 digit county fips
ERS_typology$FIPStxt <- ifelse(nchar(ERS_typology$FIPStxt) == 4,
paste0("0", ERS_typology$FIPStxt),
ERS_typology$FIPStxt)
#join by fips
NCHS_ERS_join <- urban_rural_NCHS %>%
select(GEOID, CBSA.title, X2013.code, X2006.code, X1990.based.code, County.2012.pop) %>%
right_join(ERS_typology, by = c("GEOID" = "FIPStxt"))
#filter for recreational counties that are small metros
NCHS_ERS_filtered_4 <- NCHS_ERS_join %>% filter(Recreation_2015_Update..allows.overlap..1.yes.== 1, X2013.code == 4)
NCHS_ERS_filtered_3 <- NCHS_ERS_join %>% filter(Recreation_2015_Update..allows.overlap..1.yes.== 1, X2013.code == 3)
NCHS_ERS_filtered_5 <- NCHS_ERS_join %>% filter(Recreation_2015_Update..allows.overlap..1.yes.== 1, X2013.code == 5)
# group by counties for each movement pattern and simple category
# each county will have its own row for each movement pattern that it has a record in
county_categorized <- group_by_tract_categorized %>%
group_by(county_fips, movement_pattern, temporal_simple_categories) %>%
summarize(
PopMoversC = sum(PopMovers, na.rm = TRUE),
#PopOriginC = sum(PopOrigin, na.rm = TRUE),
#PopDestinationC = sum(PopDestination, na.rm = TRUE),
#PopTempOutC = sum(PopDestTempOut, na.rm = TRUE),
GrossMigC = sum(GrossMigT, na.rm = TRUE),
) %>%
ungroup() # Ungroup to group later
## `summarise()` has grouped output by 'county_fips', 'movement_pattern'. You can
## override using the `.groups` argument.
#right join where it retains all rows from group_by_tract_categorized (the right table), along with any matching rows from US_tracts_2010 (the left data table)
NCHS_join <- urban_rural_NCHS %>%
select(GEOID, CBSA.title, X2013.code, X2006.code, X1990.based.code) %>%
right_join(county_categorized, by = c("GEOID" = "county_fips"))
# calculate urban hierarchy shift ### MUST be noted that for resident simple category it should be the inversed aka /-1
#if its positive 4 - 1, 2, 3 destination then moving up urban hierarchy more urban
#if its negative then moving down urban hierarchy going more rural
#weighted average to see
#direction for left origin minus destination
NCHS_join$UrbanHierLeft <- CountyCode - NCHS_join$X2013.code
NCHS_join$UrbanHierMoved <- NCHS_join$X2013.code - CountyCode
# Nchs county urban to rural classification
#1 Large central metro
#2 Large fringe metro
#3 Medium metro
#4 Small metro
#5 Micropolitan
#6 Non-core
#read US counties 2020 since querying from census was not working earlier
US_counties_2020 <- st_read("BerksMovers.gpkg", layer = "UScounties2020")
## Reading layer `UScounties2020' from data source
## `/Users/isaiahbennett/Downloads/spring 2025/migration research/main/BerksMovers.gpkg'
## using driver `GPKG'
## Simple feature collection with 3234 features and 12 fields
## Geometry type: MULTIPOLYGON
## Dimension: XY
## Bounding box: xmin: -179.1467 ymin: -14.5487 xmax: 179.7785 ymax: 71.38782
## Geodetic CRS: NAD83
county_join <- US_counties_2020 %>%
right_join(NCHS_join, by = "GEOID")
#st_write(county_join, "BerksMovers.gpkg", layer = "final_county_join", append = FALSE)
#moving out where are they going to
#single origin single destination
weighted_shift_left <- county_join %>%
filter(temporal_simple_categories %in% c("Left", "Temporary Outmover")) %>%
group_by(movement_pattern) %>%
summarize(
UrbanShift = sum(UrbanHierLeft * PopMoversC, na.rm = TRUE),
PopMoversWS = sum(PopMoversC, na.rm = TRUE),
WeightedShift = UrbanShift / PopMoversWS
)
weighted_shift_moved <- county_join %>%
filter(temporal_simple_categories %in% c("Moved", "Temporary Inmover", "Back and Forth", "Transient")) %>%
group_by(movement_pattern) %>%
summarize(
UrbanShift = sum(UrbanHierMoved * PopMoversC, na.rm = TRUE),
PopMoversWS = sum(PopMoversC, na.rm = TRUE),
WeightedShift = UrbanShift / PopMoversWS)
# Combine both summarizations
weighted_shift <- bind_rows(weighted_shift_left, weighted_shift_moved)
weighted_shift <- st_drop_geometry(weighted_shift)
# Save as a comma-separated file
#write.csv(weighted_shift, "weighted_shift.csv", row.names = FALSE)
simple_w_shift_left <- county_join %>%
filter(temporal_simple_categories %in% c("Left", "Temporary Outmover")) %>%
group_by(temporal_simple_categories) %>%
summarize(
UrbanShift = sum(UrbanHierLeft * PopMoversC, na.rm = TRUE),
PopMoversWS = sum(PopMoversC, na.rm = TRUE),
WeightedShift = UrbanShift / PopMoversWS
)
simple_w_shift_moved <- county_join %>%
filter(temporal_simple_categories %in% c("Moved", "Temporary Inmover", "Back and Forth", "Transient")) %>%
group_by(temporal_simple_categories) %>%
summarize(
UrbanShift = sum(UrbanHierMoved * PopMoversC, na.rm = TRUE),
PopMoversWS = sum(PopMoversC, na.rm = TRUE),
WeightedShift = UrbanShift / PopMoversWS)
# Combine both summarizations
simple_weighted_shift <- bind_rows(simple_w_shift_left, simple_w_shift_moved)
simple_weighted_shift <- st_drop_geometry(simple_weighted_shift)
# Save as a comma-separated file
write.csv(simple_weighted_shift, "simple_weighted_shift.csv", row.names = FALSE)
##Visualizations / Create Centroids and Map
#create centroids
centroids_county_join <- st_centroid(county_join)
## Warning: st_centroid assumes attributes are constant over geometries
st_write(centroids_county_join, "CountiesNew.gpkg", layer = "Counties Centroids", append = FALSE)
## Deleting layer `Counties Centroids' using driver `GPKG'
## Writing layer `Counties Centroids' to data source
## `CountiesNew.gpkg' using driver `GPKG'
## Writing 3390 features with 22 fields and geometry type Point.
register_stadiamaps(key = "5d2cfcab-324c-4817-bb30-d413891ab37e")
# Filter the data to select rows where movement_pattern is "SO Moved"
SO_Moved_centroids <- centroids_county_join %>%
filter(movement_pattern == "SO Moved") %>% st_transform(crs = 4326)
SD_Left_centroids <- centroids_county_join %>%
filter(movement_pattern == "SD Left") %>% st_transform(crs = 4326)
#this works for basemap
basemap <- get_stadiamap(
bbox = c(left = -75.4, bottom = 38.4, right = -69.5, top = 45.1), # Bounding box for northeast US
zoom = 6, # Zoom level, adjust as needed
maptype = "alidade_smooth" # Use terrain map type
)
## ℹ © Stadia Maps © Stamen Design © OpenMapTiles © OpenStreetMap contributors.
one_way_permanent_map <- ggmap(basemap) + # Add the basemap
geom_sf(data = SD_Left_centroids, aes(size = PopMoversC, color = "Total Outmovers at Destination"), alpha = 0.7, inherit.aes = FALSE) + geom_sf(data = SO_Moved_centroids, aes(size = PopMoversC, color = "Total Inmovers at Origin"), alpha = 0.7, inherit.aes = FALSE) +
# Plot centroids
scale_size_continuous(range = c(0.5, 10), name = "Mover Count") + # Adjust symbol size range and add legend for size
scale_color_manual(values = c("Total Inmovers at Origin" = "yellow", "Total Outmovers at Destination" = "red"), name = "Movement Type") + # Custom color legend
theme_minimal() +
labs(
title = "One Way Permanent Migration Flows",
subtitle = ""
) +
theme(
legend.position = "bottom", # Adjust legend position
axis.text = element_blank(), # Remove latitude & longitude labels
axis.ticks = element_blank(), # Remove tick marks on axes
axis.title = element_blank() # Remove axis titles
) +
guides(
size = guide_legend(override.aes = list(color = "grey")) # Force size legend points to be orange
)
## Coordinate system already present. Adding new coordinate system, which will
## replace the existing one.
one_way_permanent_map
# Save the plot as a larger image
ggsave("larger_map3.png", plot = one_way_permanent_map, width = 12, height = 10, dpi = 300)
###Simple Categories Maps
#Map for simple categories Moved and Left
simple_centroids <- centroids_county_join %>% group_by(GEOID, temporal_simple_categories) %>% summarize(
PopMoversC = sum(PopMoversC, na.rm = TRUE),
#PopOriginC = sum(PopOriginC, na.rm = TRUE),
#PopDestinationC = sum(PopDestinationC, na.rm = TRUE),
#PopTempOutC = sum(PopTempOutC, na.rm = TRUE),
GrossMigC = sum(GrossMigC, na.rm = TRUE),
)
## `summarise()` has grouped output by 'GEOID'. You can override using the
## `.groups` argument.
# Filter the data to select rows where movement_pattern is "SO Moved"
Moved_centroids <- simple_centroids %>%
filter(temporal_simple_categories == "Moved") %>% st_transform(crs = 4326)
Left_centroids <- simple_centroids %>%
filter(temporal_simple_categories == "Left") %>% st_transform(crs = 4326)
#st_write(Moved_centroids, "BerksMovers.gpkg", layer = "Inmover Centroids")
#st_write(Left_centroids, "BerksMovers.gpkg", layer = "Outmover Centroids")
# inmovers mapped over outmovers - in yellow out red
one_way_permanent_simple_map <- ggmap(basemap) + # Add the basemap
geom_sf(data = Left_centroids, aes(size = PopMoversC, color = "Total Outmovers at Destination"), alpha = 0.8, inherit.aes = FALSE) + geom_sf(data = Moved_centroids, aes(size = PopMoversC, color = "Total Inmovers at Origin"), alpha = 0.7, inherit.aes = FALSE) +
# Plot centroids
scale_size_continuous(range = c(0.5, 20), name = "Mover Count") + # Adjust symbol size range and add legend for size
scale_color_manual(values = c("Total Inmovers at Origin" = "yellow", "Total Outmovers at Destination" = "red"), name = "Movement Type") + # Custom color legend
theme_minimal() +
labs(
title = "Permanent Migration Flows",
subtitle = ""
) +
theme(
legend.position = "bottom", # Adjust legend position
axis.text = element_blank(), # Remove latitude & longitude labels
axis.ticks = element_blank(), # Remove tick marks on axes
axis.title = element_blank() # Remove axis titles
) +
guides(
size = guide_legend(override.aes = list(color = "grey")) # Force size legend points to be orange
)
## Coordinate system already present. Adding new coordinate system, which will
## replace the existing one.
one_way_permanent_simple_map
# Save the plot as a larger image
#ggsave("one_way_simple_map.png", plot = one_way_permanent_simple_map, width = 12, height = 10, dpi = 300)
###Temporary Mover Map
# Filter the data to select rows where movement_pattern is "SO Moved"
Temp_in_centroids <- simple_centroids %>%
filter(temporal_simple_categories == "Temporary Inmover") %>% st_transform(crs = 4326)
Temp_out_centroids <- simple_centroids %>%
filter(temporal_simple_categories == "Temporary Outmover") %>% st_transform(crs = 4326)
#st_write(Temp_in_centroids, "BerksMovers.gpkg", layer = "Temporary Inmover Centroids")
#st_write(Temp_out_centroids, "BerksMovers.gpkg", layer = "Temporary Outmover Centroids")
temporary_movers_simple_map <- ggmap(basemap) + # Add the basemap
geom_sf(data = Temp_out_centroids, aes(size = PopMoversC, color = "Total Temporary Outmovers at Destination"), alpha = 0.9, inherit.aes = FALSE) +
geom_sf(data = Temp_in_centroids, aes(size = PopMoversC, color = "Total Temporary Inmovers at Origin"), alpha = 0.7, inherit.aes = FALSE) +
# Plot centroids
scale_size_continuous(range = c(0.5, 20), name = "Mover Count") + # Adjust symbol size range and add legend for size
scale_color_manual(values = c("Total Temporary Inmovers at Origin" = "orange", "Total Temporary Outmovers at Destination" = "blue"), name = "Movement Type") + # Custom color legend
theme_minimal() +
labs(
title = "Temporary Migration Flows",
subtitle = ""
) +
theme(
legend.position = "bottom", # Adjust legend position
axis.text = element_blank(), # Remove latitude & longitude labels
axis.ticks = element_blank(), # Remove tick marks on axes
axis.title = element_blank() # Remove axis titles
) +
guides(
size = guide_legend(override.aes = list(color = "grey")) # Force size legend points to be orange
)
## Coordinate system already present. Adding new coordinate system, which will
## replace the existing one.
temporary_movers_simple_map
# Save the plot as a larger image
#ggsave("new_temp_movers_simple_map.png", plot = temporary_movers_simple_map, width = 12, height = 10, dpi = 300)
###Cyclical Mover Map
#back and forth map
Back_and_forth_centroids <- simple_centroids %>%
filter(temporal_simple_categories == "Back and Forth", GEOID != 25003) %>% st_transform(crs = 4326)
#st_write(Back_and_forth_centroids, "BerksMovers.gpkg", layer = "Cyclical Centroids")
cyclical_movers_simple_map <- ggmap(basemap) + # Add the basemap
geom_sf(data = Back_and_forth_centroids, aes(size = GrossMigC, color = "Total Cyclical Movers at Outside County"), alpha = 0.7, inherit.aes = FALSE) + # Plot centroids
scale_size_continuous(range = c(0.5, 20), name = "Mover Count") + # Adjust symbol size range and add legend for size
scale_color_manual(values = c("Total Cyclical Movers at Outside County" = "purple"), name = "Movement Type") + # Custom color legend
theme_minimal() +
labs(
title = "Cyclical Mover Gross Migration Flows",
subtitle = ""
) +
theme(
legend.position = "bottom", # Adjust legend position
axis.text = element_blank(), # Remove latitude & longitude labels
axis.ticks = element_blank(), # Remove tick marks on axes
axis.title = element_blank() # Remove axis titles
) +
guides(
size = guide_legend(override.aes = list(color = "grey")) # Force size legend points to be orange
)
## Coordinate system already present. Adding new coordinate system, which will
## replace the existing one.
cyclical_movers_simple_map
# Save the plot as a larger image
#ggsave("new_cyclical_movers_simple_map.png", plot = cyclical_movers_simple_map, width = 12, height = 10, dpi = 300)
###Gross Multi Mover Map
#back and forth map
Transient_centroids <- simple_centroids %>%
filter(temporal_simple_categories == "Transient", GEOID != 25003) %>% st_transform(crs = 4326)
#st_write(Transient_centroids, "BerksMovers.gpkg", layer = "Transient Centroids")
Transient_simple_map <- ggmap(basemap) + # Add the basemap
geom_sf(data = Transient_centroids, aes(size = PopMoversC, color = "Gross Out Migration of Transient Movers"), alpha = 0.8, inherit.aes = FALSE) + # Plot centroids
scale_size_continuous(range = c(0.5, 20), name = "Mover Count") + # Adjust symbol size range and add legend for size
scale_color_manual(values = c("Gross Out Migration of Transient Movers" = "green"), name = "Movement Type") + # Custom color legend
theme_minimal() +
labs(
title = "Transient Mover Gross Migration Flows",
subtitle = ""
) +
theme(
legend.position = "bottom", # Adjust legend position
axis.text = element_blank(), # Remove latitude & longitude labels
axis.ticks = element_blank(), # Remove tick marks on axes
axis.title = element_blank() # Remove axis titles
) +
guides(
size = guide_legend(override.aes = list(color = "grey")) # Force size legend points to be orange
)
## Coordinate system already present. Adding new coordinate system, which will
## replace the existing one.
Transient_simple_map
# Save the plot as a larger image
#ggsave("new_transient_movers_simple_map.png", plot = Transient_simple_map, width = 12, height = 10, dpi = 300)
###Interactive Map
library(leaflet)
### commenting out for the knit
# Create an interactive map using leaflet
# leaflet() %>%
# addProviderTiles("CartoDB.Positron") %>% # Add a tile layer, can be customized with different map types
# # Add markers for SD_Left_centroids
# addCircleMarkers(
# data = SD_Left_centroids,
# lng = ~st_coordinates(geometry)[,1], # Extract longitude from geometry column
# lat = ~st_coordinates(geometry)[,2], # Extract latitude from geometry column
# radius = ~sqrt(PopMoversC) * 2, # Adjust size based on population; scale as needed
# color = "red",
# fillColor = "red",
# fillOpacity = 0.7,
# stroke = TRUE,
# weight = 1,
# popup = ~paste("Destination Total: ", PopMoversC) # Add a popup with population info
#
# ) %>%
#
# # Add markers for SO_Moved_centroids
# addCircleMarkers(
# data = SO_Moved_centroids,
# lng = ~st_coordinates(geometry)[,1], # Extract longitude from geometry column
# lat = ~st_coordinates(geometry)[,2], # Extract latitude from geometry column
# radius = ~sqrt(PopMoversC) * 2, # Adjust size based on population; scale as needed
# color = "yellow",
# fillColor = "yellow",
# fillOpacity = 0.7,
# stroke = TRUE, # To add a border around the circles
# weight = 1,
# popup = ~paste("Origin Total: ", PopMoversC) # Add a popup with population info
#
# ) %>%
#
# # Customize the appearance of the legend
# addLegend(
# position = "bottomright",
# colors = c("yellow", "red"),
# labels = c("Total Inmovers at Origin", "Total Outmovers at Destination"),
# title = "Permanent Migration Flows"
# )
##Plot Redfin Data
redfin_data <- read.csv("/Users/isaiahbennett/Downloads/spring 2025/migration research/main/data/redfin/Berkshire_redfin.csv"
)
redfin_data_refined <- redfin_data %>% select(region, period_begin, property_type, median_sale_price, median_sale_price_yoy, homes_sold, new_listings, median_dom) %>% arrange(property_type, period_begin) %>% filter(
period_begin > as.Date("2018-12-01") & period_begin < as.Date("2022-01-01"), property_type != "Townhouse")
redfin_data_refined$period_begin <- as.yearmon(redfin_data_refined$period_begin, format = "%Y-%m-%d")
redfin_single_fam <- redfin_data_refined %>%
filter(property_type == "Single Family Residential")
redfin_all_res <- redfin_data_refined %>%
filter(property_type == "All Residential")
ggplot(redfin_single_fam, aes(x = period_begin, y = median_sale_price)) +
geom_line(color = "blue", size = 1) +
labs(title = "Median Sale Price Over Time", x = "Year-Month", y = "Median Sale Price") +
theme_minimal()
## Warning: Using `size` aesthetic for lines was deprecated in ggplot2 3.4.0.
## ℹ Please use `linewidth` instead.
## This warning is displayed once every 8 hours.
## Call `lifecycle::last_lifecycle_warnings()` to see where this warning was
## generated.
## Warning: The `trans` argument of `continuous_scale()` is deprecated as of ggplot2 3.5.0.
## ℹ Please use the `transform` argument instead.
## This warning is displayed once every 8 hours.
## Call `lifecycle::last_lifecycle_warnings()` to see where this warning was
## generated.
ggplot(redfin_all_res, aes(x = period_begin, y = median_sale_price)) +
geom_line(color = "blue", size = 1) +
labs(title = "Median Sale Price Over Time", x = "Year-Month", y = "Median Sale Price") +
theme_minimal()
ggplot(redfin_data_refined, aes(x = period_begin, y = median_sale_price, color = property_type)) +
geom_line(size = 1) + # Line plot
labs(title = "Median Sale Price Over Time by Property Type",
x = "",
y = "Price ($)",
color = "Property Type") + # Legend title
theme_minimal()
median_sales_plot <- ggplot(redfin_data_refined, aes(x = period_begin, y = median_sale_price, color = property_type)) +
geom_line(size = 1) + # Line plot
labs(title = "Median Sale Price Over Time by Property Type",
x = "",
y = "Price ($)",
color = "Property Type") + # Legend title
theme_minimal()
#ggsave("median_sales_plot.png", plot = median_sales_plot, width = 8, height = 6, dpi = 300)
#write.csv(redfin_data_refined, "refin_data_refined.csv", row.names = FALSE)
ggplot(redfin_single_fam, aes(x = period_begin)) +
geom_line(aes(y = homes_sold, color = "Homes Sold"), size = 1) +
geom_line(aes(y = new_listings, color = "New Listings"), size = 1) +
labs(title = "Homes Sold & New Listings Over Time", x = "Year-Month", y = "Count") +
scale_color_manual(values = c("Homes Sold" = "red", "New Listings" = "green")) +
theme_minimal()
ggplot(redfin_data_refined, aes(x = period_begin, y = homes_sold, color = property_type)) +
geom_line(size = 1) +
labs(title = "Homes Sold Over Time by Property Type",
x = "Year-Month",
y = "Homes Sold",
color = "Property Type") +
theme_minimal()
ggplot(redfin_single_fam, aes(x = period_begin, y = median_dom)) +
geom_line(color = "purple", size = 1) +
labs(title = "Median Days on Market Over Time", x = "Year-Month", y = "Median DOM") +
theme_minimal()
#Question #3: Economic Shifts
#query and save rent data at the tract level
if (exists("econ_indicators")) {
# Code to execute if the dataframe exists
print("Dataframe exists! On to next step...")
} else {
econ_indicators <- get_acs(
geography = "tract",
variables = c(
"B25064_001", # Median Gross Rent
"B25077_001", # Median Home Value
"B19013_001", # Median Household Income
"B25119_003" # Median Renter Income
),
survey = "acs5",
state = all_state_codes,
year = 2022,
output = "wide",
geometry = FALSE # turn this on later once it's working
)
# write.csv(x = econ_indicators,
# file = "econ_indicators_22.csv",
# row.names = FALSE,
# na = "")
}
## Getting data from the 2018-2022 5-year ACS
## Fetching tract data by state and combining the result.
#add 2020 tract geometry since the query was failing to execute with geometry = TRUE
econ_indicators_spatial <- US_tracts_2020 %>% select(GEOID, geometry) %>%
right_join(econ_indicators, by = "GEOID")
### The categorized mover geographic attributes align with the census data from 2010. To be able to join the economic census data with 2020 tract geometries properly, they have to be joined by location since they some tracts have different tract ids.
#The goal is for the final econ_join to have geometry from 2010 census to align with movers
# Join categorized movers to 2010 tract geometry
categorized_movers_spatial_2010 <- US_tracts_2010 %>%
right_join(time_filt_categories, by = c("GEOID" = "tract_fips"))
#arrange by caid and date
categorized_movers_spatial_2010 <- arrange(categorized_movers_spatial_2010, caid, date)
#create a centroid on surface to avoid concave outlier centroids
categorized_movers_centroids <- categorized_movers_spatial_2010 %>%
mutate(centroid = st_point_on_surface(geometry)) %>%
st_set_geometry("centroid") # Temporarily set the geometry to the centroids
## Warning: There was 1 warning in `stopifnot()`.
## ℹ In argument: `centroid = st_point_on_surface(geometry)`.
## Caused by warning in `st_point_on_surface.sfc()`:
## ! st_point_on_surface may not give correct results for longitude/latitude data
#ensure same crs between econ data and movers
categorized_movers_centroids <- st_transform(categorized_movers_centroids, crs = 4326)
econ_indicators_spatial <- st_transform(econ_indicators_spatial, crs = 4326)
###join by location using centroids within econ tracts
#mover tracts are assigned the econ census data from the tract that their centroids falls within
categorized_movers_with_econ <- st_join(
categorized_movers_centroids,
econ_indicators_spatial,
join = st_within, # Ensures the centroid falls within the 2020 geometry
left = TRUE
)
# Set back to the original geometries
econ_join <- categorized_movers_with_econ %>%
st_set_geometry("geometry")
#arrange and erase redundant columns
econ_join <- econ_join %>% arrange(caid, date) %>% select(-c(1:9, ))
### filters out permanent simple category and creates origin and destination tract
permanent_econ_join <- econ_join %>% filter(temporal_simple_categories %in% c("Left", "Moved")) %>% group_by(caid) %>% mutate(
#latest target county date either before moving out or last moved into
last_target_tract = if_else(new_date == as.character(date_last_berks), GEOID.x, NA_character_),
# selecting the GEOID from the mover frame with the same index as the max duration tract
principal_outside_tract = if_else(row_number() == first(which(!is.na(max_duration))), GEOID.x, NA_character_)
) %>%
relocate(last_target_tract, principal_outside_tract, .before = everything())
# only selects origin and destination tract gets rid of other na tracts
filtered_permanent_econ_join <- permanent_econ_join %>%
filter(!is.na(last_target_tract) | !is.na(principal_outside_tract))
## filters for just Left
#pairs origin to destination in one row per caid and defines economic factors for origin and destination
left_econ_tracts_paired <- filtered_permanent_econ_join %>% filter(temporal_simple_categories == "Left") %>%
summarize(
movement_pattern = first(movement_pattern, na.rm = TRUE),
temporal_simple_categories = first(temporal_simple_categories, na.rm = TRUE),
last_target_tract = first(last_target_tract, na.rm = TRUE),
principal_outside_tract = last(principal_outside_tract, na.rm = TRUE),
CountyOrig = first(NAMELSADCO, na.rm = TRUE),
CountyDest = last(NAMELSADCO, na.rm = TRUE),
dest_county_fips = last(county_fips, na.rm = TRUE),
OrigHomeValue = first(B25077_001E, na.rm = TRUE),
DestHomeValue = last(B25077_001E, na.rm = TRUE),
OrigMedIncome = first(B19013_001E, na.rm = TRUE),
DestMedIncome = last(B19013_001E, na.rm = TRUE),
OrigMedMonthlyIncome = first(B19013_001E/12, na.rm = TRUE),
DestMedMonthlyIncome = last(B19013_001E/12, na.rm = TRUE),
OrigMedRenterMonthlyIncome = ifelse(is.na(first(B25119_003E/12)), first(B19013_001E/12, na.rm = TRUE), first(B25119_003E/12, na.rm = TRUE)), #if renter income doesnt exist use overall income
DestMedRenterMonthlyIncome = ifelse(is.na(last(B25119_003E/12)), last(B19013_001E/12, na.rm = TRUE), last(B25119_003E/12, na.rm = TRUE)), #if renter income doesnt exist use overall income
OrigGrsRent = first(B25064_001E, na.rm = TRUE),
DestGrsRent = last(B25064_001E, na.rm = TRUE),
AffordableRent = OrigMedMonthlyIncome * 0.3,
RentDiff = OrigGrsRent - DestGrsRent,
OrigPctIncomeSpentOnRent = OrigGrsRent/OrigMedRenterMonthlyIncome * 100,
DestPctIncomeSpentOnRent = DestGrsRent/DestMedRenterMonthlyIncome * 100,
OrigHomeValtoIncome = OrigHomeValue / OrigMedIncome,
DestHomeValtoIncome = DestHomeValue / OrigMedIncome,
) %>%
ungroup()
#counts total movers by berkshire origin tract
left_econ_tract_count <- left_econ_tracts_paired %>%
group_by(last_target_tract) %>%
summarize(
TotalOutmovers = n_distinct(caid),
across(
c(movement_pattern, dest_county_fips, OrigHomeValue, OrigMedIncome, OrigMedMonthlyIncome, OrigGrsRent, OrigPctIncomeSpentOnRent, OrigHomeValtoIncome),
~ first(.x, na.rm = TRUE),
.names = "{.col}"
)
)
#groups by same path and counts number of movers
left_econ_by_path <- left_econ_tracts_paired %>% group_by(last_target_tract, principal_outside_tract) %>% summarize(
TotalMovers = n_distinct(caid),
across(
c(movement_pattern, dest_county_fips, OrigHomeValue, OrigHomeValue, DestHomeValue, OrigMedIncome, DestMedIncome, OrigMedMonthlyIncome, DestMedMonthlyIncome, OrigMedRenterMonthlyIncome, DestMedRenterMonthlyIncome, OrigGrsRent, DestGrsRent, AffordableRent, RentDiff, OrigPctIncomeSpentOnRent, DestPctIncomeSpentOnRent, OrigHomeValtoIncome, DestHomeValtoIncome),
~ first(.x, na.rm = TRUE),
.names = "{.col}"
))
## `summarise()` has grouped output by 'last_target_tract'. You can override using
## the `.groups` argument.
## Create classes of rent burden intervals to later count total outmovers per class
left_econ_by_path <- left_econ_by_path %>% mutate(ClassesPctIncomeOnRent = case_when(
OrigPctIncomeSpentOnRent >= 0 & OrigPctIncomeSpentOnRent < 10 ~ "0-10",
OrigPctIncomeSpentOnRent >= 10 & OrigPctIncomeSpentOnRent < 20 ~ "10-20",
OrigPctIncomeSpentOnRent >= 20 & OrigPctIncomeSpentOnRent < 30 ~ "20-30",
OrigPctIncomeSpentOnRent >= 30 & OrigPctIncomeSpentOnRent < 40 ~ "30-40",
OrigPctIncomeSpentOnRent >= 40 & OrigPctIncomeSpentOnRent < 50 ~ "40-50",
OrigPctIncomeSpentOnRent >= 50 & OrigPctIncomeSpentOnRent < 60 ~ "50-60",
OrigPctIncomeSpentOnRent >= 60 & OrigPctIncomeSpentOnRent < 70 ~ "60-70",
OrigPctIncomeSpentOnRent >= 70 & OrigPctIncomeSpentOnRent < 80 ~ "70-80",
OrigPctIncomeSpentOnRent >= 80 & OrigPctIncomeSpentOnRent < 90 ~ "10-20",
TRUE ~ "No Census Data"),
DestClassesPctIncomeOnRent = case_when(
DestPctIncomeSpentOnRent >= 0 & DestPctIncomeSpentOnRent < 10 ~ "0-10",
DestPctIncomeSpentOnRent >= 10 & DestPctIncomeSpentOnRent < 20 ~ "10-20",
DestPctIncomeSpentOnRent >= 20 & DestPctIncomeSpentOnRent < 30 ~ "20-30",
DestPctIncomeSpentOnRent >= 30 & DestPctIncomeSpentOnRent < 40 ~ "30-40",
DestPctIncomeSpentOnRent >= 40 & DestPctIncomeSpentOnRent < 50 ~ "40-50",
DestPctIncomeSpentOnRent >= 50 & DestPctIncomeSpentOnRent < 60 ~ "50-60",
DestPctIncomeSpentOnRent >= 60 & DestPctIncomeSpentOnRent < 70 ~ "60-70",
DestPctIncomeSpentOnRent >= 70 & DestPctIncomeSpentOnRent < 80 ~ "70-80",
DestPctIncomeSpentOnRent >= 80 & DestPctIncomeSpentOnRent < 90 ~ "80-90",
DestPctIncomeSpentOnRent >= 90 ~ "90 and up",
TRUE ~ "No Census Data"),
)
# Group by classes and summarize total movers
left_econ_by_classes <- left_econ_by_path %>% group_by(ClassesPctIncomeOnRent) %>% summarize(TotalMovers = sum(TotalMovers))
# Calculate percent of total movers per class
left_econ_by_classes <- left_econ_by_classes %>% mutate(PctTotalMovers = round(TotalMovers/sum(left_econ_by_classes$TotalMovers) *100, 2))
### This ones not super helpful
# filters out displaced movers and counts their totals by rent burden classes
displaced_migration_paths_by_classes <- left_econ_by_path %>% filter(OrigPctIncomeSpentOnRent >=30 & DestPctIncomeSpentOnRent < 30) %>% group_by(DestClassesPctIncomeOnRent) %>% summarize(TotalMovers = sum(TotalMovers))
displaced_migration_paths_by_classes <- displaced_migration_paths_by_classes %>% mutate(PctTotalMovers = round(TotalMovers/sum(left_econ_by_classes$TotalMovers) *100, 2))
# Total count of outmovers based on the rent burden classes of their destination
left_dest_econ_by_classes <- left_econ_by_path %>% group_by(DestClassesPctIncomeOnRent) %>% summarize(TotalMovers = sum(TotalMovers))
left_dest_econ_by_classes <- left_dest_econ_by_classes %>% mutate(PctTotalMovers = round(TotalMovers/sum(left_econ_by_classes$TotalMovers) *100, 2))
#filters for movers who were rent burdened to renters no longer rent burdened
displaced_migration_paths <- left_econ_by_path %>% filter(OrigPctIncomeSpentOnRent >=30 & DestPctIncomeSpentOnRent < 30)
# Counts the total number of displaced movers by target tracts
total_displaced_movers_by_tract <- displaced_migration_paths %>% group_by(last_target_tract) %>% summarize(TotalDisplaced = sum(TotalMovers))
# Counts total number of outmovers designated as displaced
totaldisplaced = sum(total_displaced_movers_by_tract$TotalDisplaced)
## filters for just Moved
#pairs origin to destination in one row per caid and defines important economic indicators for origin and destination
moved_econ_tracts_paired <- filtered_permanent_econ_join %>% filter(temporal_simple_categories == "Moved") %>%
summarize(
movement_pattern = first(movement_pattern, na.rm = TRUE),
last_target_tract = last(last_target_tract, na.rm = TRUE),
principal_outside_tract = first(principal_outside_tract, na.rm = TRUE),
CountyOrig = first(NAMELSADCO, na.rm = TRUE),
orig_county_fips = first(county_fips, na.rm = TRUE),
OrigHomeValue = first(B25077_001E, na.rm = TRUE),
DestHomeValue = last(B25077_001E, na.rm = TRUE),
OrigMedIncome = first(B19013_001E, na.rm = TRUE),
DestMedIncome = last(B19013_001E, na.rm = TRUE),
PctChangeIncome = (OrigMedIncome - DestMedIncome)/OrigMedIncome *100 # change in income between origin and destination
) %>%
ungroup()
#counts total movers per unique migration path keeping their unique economic variables
moved_econ_by_path <- moved_econ_tracts_paired %>%
group_by(principal_outside_tract, last_target_tract) %>% summarize(
MovedTotalMovers = n_distinct(caid), # Count unique 'caid'
across(
c(movement_pattern, orig_county_fips, CountyOrig, OrigHomeValue, DestHomeValue, OrigMedIncome, DestMedIncome, PctChangeIncome),
~ first(.x, na.rm = TRUE),
.names = "{.col}"
))
## `summarise()` has grouped output by 'principal_outside_tract'. You can override
## using the `.groups` argument.
#Creates classes of percent change in income between origin and destination
moved_econ_by_path <- moved_econ_by_path %>%
mutate(ClassesPctChangeIncome = case_when(
PctChangeIncome < -100 ~ "Under -100",
PctChangeIncome >= -100 & PctChangeIncome < -75 ~ "-100 to -75",
PctChangeIncome >= -75 & PctChangeIncome < -50 ~ "-75 to -50",
PctChangeIncome >= -50 & PctChangeIncome < -25 ~ "-50 to -25",
PctChangeIncome >= -25 & PctChangeIncome < 0 ~ "-25 to 0",
PctChangeIncome >= 0 & PctChangeIncome < 25 ~ "0 to 25",
PctChangeIncome >= 25 & PctChangeIncome < 50 ~ "25 to 50",
PctChangeIncome >= 50 & PctChangeIncome < 75 ~ "50 to 75",
PctChangeIncome >= 75 & PctChangeIncome < 100 ~ "75 to 100",
PctChangeIncome >= 100 ~ "Over 100",
TRUE ~ "No Census Data"
))
#Orders classes so they are not rendered alphabetically
moved_econ_by_path$ClassesPctChangeIncome <- factor(
moved_econ_by_path$ClassesPctChangeIncome,
levels = c(
"Over 100",
"75 to 100",
"50 to 75",
"25 to 50",
"0 to 25",
"-25 to 0",
"-50 to -25",
"-75 to -50",
"-100 to -75",
"Under -100",
"No Census Data"
))
#Group by the classes and count total movers
moved_econ_by_income_classes <- moved_econ_by_path %>% group_by(ClassesPctChangeIncome) %>% summarize(MovedTotalMovers = sum(MovedTotalMovers))
#calculate percent of total movers
moved_econ_by_income_classes <- moved_econ_by_income_classes%>% mutate(PctTotalMovers = round(MovedTotalMovers/sum(moved_econ_by_income_classes$MovedTotalMovers) *100, 2))
gentrifier_migration_paths <- moved_econ_by_path %>% filter(PctChangeIncome >=25)
#Counts total inmovers whos origin has 25% higher median income or more by target tracts
total_gentrifier_movers_by_tract <- gentrifier_migration_paths %>% group_by(last_target_tract) %>% summarize(TotalGentrifiers = sum(MovedTotalMovers))
#Join the displaced by tract and gentrifier by tract
gent_disp_tract_join <- total_displaced_movers_by_tract %>% right_join(total_gentrifier_movers_by_tract, by = "last_target_tract")
write.csv(gent_disp_tract_join, "Gent and Disp Counts by Target Tract.csv", row.names = FALSE)
#Run a correlation test between the total gentrifiers and displaced movers per tract
correlation <- cor.test(
gent_disp_tract_join$TotalGentrifiers,
gent_disp_tract_join$TotalDisplaced,
method = "pearson"
)
correlation
##
## Pearson's product-moment correlation
##
## data: gent_disp_tract_join$TotalGentrifiers and gent_disp_tract_join$TotalDisplaced
## t = 7.6031, df = 18, p-value = 5.02e-07
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## 0.7019915 0.9490313
## sample estimates:
## cor
## 0.873243
###Group by
#group displaced movers by county and count total movers
displaced_county_group_by <- displaced_migration_paths %>% group_by(dest_county_fips, movement_pattern) %>% summarize(TotalMovers = sum(TotalMovers),
across(
c(OrigMedMonthlyIncome, DestMedMonthlyIncome, OrigMedRenterMonthlyIncome, DestMedRenterMonthlyIncome, OrigGrsRent, DestGrsRent, OrigPctIncomeSpentOnRent, DestPctIncomeSpentOnRent),
~ first(.x, na.rm = TRUE),
.names = "{.col}"
))
## `summarise()` has grouped output by 'dest_county_fips'. You can override using
## the `.groups` argument.
# group gentrifier movers by county and count total movers
gentrifier_county_group_by <- gentrifier_migration_paths %>% group_by(orig_county_fips, movement_pattern) %>% summarize(TotalMovers = sum(MovedTotalMovers),
across(
c(OrigMedIncome, DestMedIncome, PctChangeIncome),
~ first(.x, na.rm = TRUE),
.names = "{.col}"
))
## `summarise()` has grouped output by 'orig_county_fips'. You can override using
## the `.groups` argument.
#Create County Centroids for the US to be joined with both dataframes
US_counties_2020_centroids <- st_centroid(US_counties_2020)
## Warning: st_centroid assumes attributes are constant over geometries
displaced_county_spatial <- US_counties_2020_centroids %>%
select(GEOID, geom) %>% # Keep only GEOID and geometry before joining
right_join(displaced_county_group_by, by = c("GEOID" = "dest_county_fips"))
gentrifier_county_spatial <- US_counties_2020_centroids %>%
select(GEOID, geom) %>% # Keep only GEOID and geometry before joining
right_join(gentrifier_county_group_by, by = c("GEOID" = "orig_county_fips"))
#st_write(displaced_county_spatial, "BerksMovers.gpkg", layer = "displaced_county_centroids" )
#st_write(gentrifier_county_spatial, "BerksMovers.gpkg", layer = "gentrifier_county_centroids" )
##Urban Shift for Gentrifiers and Displaced Movers
NCHS_displaced_join <- urban_rural_NCHS %>%
select(GEOID, CBSA.title, X2013.code, X2006.code, X1990.based.code) %>%
right_join(displaced_county_group_by, by = c("GEOID" = "dest_county_fips"))
NCHS_gentrifier_join <- urban_rural_NCHS %>%
select(GEOID, CBSA.title, X2013.code, X2006.code, X1990.based.code) %>%
right_join(gentrifier_county_group_by, by = c("GEOID" = "orig_county_fips"))
NCHS_displaced_join$UrbanHierLeft <- CountyCode - NCHS_displaced_join$X2013.code
NCHS_gentrifier_join$UrbanHierMoved <- NCHS_gentrifier_join$X2013.code - CountyCode
Displaced_Weighted_Shift <- NCHS_displaced_join %>%
group_by(movement_pattern) %>% summarize(
UrbanShift = sum(UrbanHierLeft * TotalMovers, na.rm = TRUE),
PopMoversWS = sum(TotalMovers, na.rm = TRUE),
WeightedShift = UrbanShift / PopMoversWS
)
Gentrifier_Weighted_Shift <- NCHS_gentrifier_join %>%
group_by(movement_pattern) %>% summarize(
UrbanShift = sum(UrbanHierMoved * TotalMovers, na.rm = TRUE),
PopMoversWS = sum(TotalMovers, na.rm = TRUE),
WeightedShift = UrbanShift / PopMoversWS
)
##Join geometries to target tracts and save
#Join tract geometry for just the layer that has total number of displaced movers
displaced_join <- US_tracts_2010 %>%
select(GEOID, geometry) %>% # Keep only GEOID and geometry before joining
right_join(total_displaced_movers_by_tract, by = c("GEOID" = "last_target_tract"))
#create centroids
displaced_join_centroids <- st_centroid(displaced_join)
## Warning: st_centroid assumes attributes are constant over geometries
gentrifier_join <- US_tracts_2010 %>%
select(GEOID, geometry) %>% # Keep only GEOID and geometry before joining
right_join(total_gentrifier_movers_by_tract, by = c("GEOID" = "last_target_tract"))
#create centroids
gentrifier_join_centroids <- st_centroid(gentrifier_join)
## Warning: st_centroid assumes attributes are constant over geometries
target_tracts_econ <- US_tracts_2010 %>%
select(GEOID, geometry) %>% # Keep only GEOID and geometry before joining
right_join(left_econ_tract_count, by = c("GEOID" = "last_target_tract"))
#save to geopackage
#st_write(displaced_join_centroids, "BerksMovers.gpkg", layer = "displaced_tract_centroids", append = FALSE)
#st_write(gentrifier_join_centroids, "BerksMovers.gpkg", layer = "gentrifier_tract_centroids" )
#st_write(target_tracts_econ, "BerksMovers.gpkg", layer = "target_tracts_econ" )
Rights: LICENSE: BSD 3-Clause
“New” or “Revised”Resource type: CollectionResource language: EnglishConforms to: Template for Reproducible and Replicable
Research in Human-Environment and Geographical Sciences version 1.0, DOI:[10.17605/OSF.IO/W29MQ](DOI:%5B10.17605/OSF.IO/W29MQ){.uri}This research compendium is structured with four main directories:
data: contains subdirectories for raw data
and derived data.docs: contains subdirectories for
manuscript, presentation, and
reportprocedure: contains subdirectories for
code or software scripts, information about the
computational environment in which the research was
conducted, and non-code research protocolsresults: contains subdirectories for
figures, formatted data tables, or
other formats of research results.The data, procedures, and results of this repository are outlined in three tables: - Data: data/data_index.csv - Procedures: procedure/procedure_index.csv - Results: results/results_index.csv
Important local documents include: - Pre-analysis plan: docs/report/preanalysis.pdf - Study report: docs/report/report.pdf - Manuscript: docs/manuscript/manuscript.pdf - Presentation: docs/presentation/presentation.pdf
The template_readme.md file contains more information on the design of this template and references used in the design. The Template_LICENSE file provides the BSD 3-Clause license for using this template. To cite the template, please use template_reference.bib or: > Kedron, P., & Holler, J. (2023). Template for Reproducible and Replicable Research in Human-Environment and Geographical Sciences. https://doi.org/10.17605/OSF.IO/W29MQ
This is an R Markdown Notebook. When you execute code within the notebook, the results appear beneath the code.
Try executing this chunk by clicking the Run button within the chunk or by placing your cursor inside it and pressing Cmd+Shift+Enter.
Add a new chunk by clicking the Insert Chunk button on the toolbar or by pressing Cmd+Option+I.
When you save the notebook, an HTML file containing the code and output will be saved alongside it (click the Preview button or press Cmd+Shift+K to preview the HTML file).
The preview shows you a rendered HTML copy of the contents of the editor. Consequently, unlike Knit, Preview does not run any R code chunks. Instead, the output of the chunk when it was last run in the editor is displayed.