Mapping COVID-19 Amenity Migration to Explore Rural Gentrification and Local Displacement in the Berkshires

Contributors

Isaiah Bennett

Abstract

The global COVID-19 pandemic catalyzed the emergence of new migration patterns as remote work and quarantines spread worldwide. One particular trend was a wave of migration to the rural countryside in search of a slower pace of life, social distance, and access to natural amenities. In many places across the country, this phenomenon has resulted in rural gentrification, where the influx of newcomers increases housing prices and pushes locals out. The granularity of census migration data leaves a lot of questions to be answered about the true characteristics of migration during the pandemic. Focusing on the case study of the Berkshires, a largely rural county in Western Massachusetts, this study uses mobile device location data that records monthly migration at the individual level. With this novel form of data, this study examines the complexity of new migration trends by establishing a classification of these new movement patterns, connecting origins and destinations to investigate the direction of migration on the urban hierarchy, and identifying socioeconomic trends in migration that can point to rural gentrification and local displacement. The results thus far have demonstrated increased complexity and transience in migration where temporary and cyclical migration has a stronger presence than ever before. Additionally, the findings have confirmed a strong migration trend between urban centers and rural amenity-rich areas. Further work will focus on replicability of the study by running the script on different counties across the US to compare relative trends between counties.

Study Metadata

Soon to be added!

Key words: R, Spatial Analysis
Subject: Social and Behavioral Sciences: Geography: Geographic Information Sciences
Date created: 2023-09-15
Date modified: 2023-09-15
Spatial Coverage: Specify the geographic extent of your study. This may be a place name and link to a feature in a gazetteer like GeoNames or OpenStreetMap, or a well known text (WKT) representation of a bounding box.
Spatial Resolution: Specify the spatial resolution as a scale factor, description of the level of detail of each unit of observation (including administrative level of administrative areas), and/or or distance of a raster GRID size
Spatial Reference System: Specify the geographic or projected coordinate system for the study
Temporal Coverage: Specify the temporal extent of your study—i.e. the range of time represented by the data observations.
Temporal Resolution: Specify the temporal resolution of your study—i.e. the duration of time for which each observation represents or the revisit period for repeated observations
Funding Name: NSF Directorate for Social, Behavioral and Economic Sciences
Funding Title: Transforming theory-building and STEM education through reproductions and replications in the geographical sciences
Award info URI: https://www.nsf.gov/awardsearch/showAward?AWD_ID=2049837
Award number: BCS-2049837

Set up Environment

# install packages
#install.packages( c("tigris", "sf", "tidycensus", "lubridate", "dplyr", "ggplot2", "ggmap", "zoo", "data.table", "tidyr") )
# look for pop-up dialogue to agree to installation!



# load packages
library(tigris)

## To enable caching of data, set `options(tigris_use_cache = TRUE)`
## in your R script or .Rprofile.

library(sf)

## Warning: package 'sf' was built under R version 4.3.3

## Linking to GEOS 3.11.0, GDAL 3.5.3, PROJ 9.1.0; sf_use_s2() is TRUE

library(tidycensus)

## Warning: package 'tidycensus' was built under R version 4.3.3

library(lubridate)

## 
## Attaching package: 'lubridate'

## The following objects are masked from 'package:base':
## 
##     date, intersect, setdiff, union

library(dplyr)

## 
## Attaching package: 'dplyr'

## The following objects are masked from 'package:stats':
## 
##     filter, lag

## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union

library(ggplot2)
library(ggmap)

## ℹ Google's Terms of Service: <https://mapsplatform.google.com>
##   Stadia Maps' Terms of Service: <https://stadiamaps.com/terms-of-service/>
##   OpenStreetMap's Tile Usage Policy: <https://operations.osmfoundation.org/policies/tiles/>
## ℹ Please cite ggmap if you use it! Use `citation("ggmap")` for details.

library(zoo)

## 
## Attaching package: 'zoo'

## The following objects are masked from 'package:base':
## 
##     as.Date, as.Date.numeric

library(data.table)

## 
## Attaching package: 'data.table'

## The following objects are masked from 'package:dplyr':
## 
##     between, first, last

## The following objects are masked from 'package:lubridate':
## 
##     hour, isoweek, mday, minute, month, quarter, second, wday, week,
##     yday, year

library(tidyr)


#census_api_key(key = "ffa84ce2bb640d4a800340766ebeb796333dd0d2", install = TRUE)

# Load a CSV file into a data frame
movers <- read.csv("/Users/isaiahbennett/Downloads/fall 2024/migration/research/data/berkshire_moversR/BerkshireMovers.csv"
)

Census Data

# query metadata for 2022 census

# Define Variables to input for census data query
target_state = "MA"
target_county = "Berkshire"
target_fips = 25003


# Set options for tigris
options(tigris_use_cache = TRUE)

# Check and load ACS 2022 variables
if (!exists("acs2022vars")) {
  acs2022vars <- load_variables(year = 2022, dataset = "acs5")
  print("ACS 2022 variables loaded!")
}

## [1] "ACS 2022 variables loaded!"

# Define target state and county (modify as needed)
target_state <- "MA"  # Change to your desired state abbreviation
target_county <- "Berkshire"  # Change to your desired county name



# Check and load Berkshire County census tracts
if (!exists("berks_tracts_2020")) {
  tryCatch({
    berks_tracts_2020 <- tracts(state = target_state, county = target_county, year = 2020, cb = TRUE)
    print("Berkshire County tracts loaded!")
  }, error = function(e) {
    print("Error loading Berkshire County tracts")
    print(e)
  })
}

## [1] "Berkshire County tracts loaded!"

# Check and load US census tracts
if (!exists("US_tracts_2020")) {
  tryCatch({
    US_tracts_2020 <- tracts(year = 2020, cb = TRUE)
    print("US tracts loaded!")
  }, error = function(e) {
    print("Error loading US tracts")
    print(e)
  })
}

## Retrieving Census tracts for the entire United States

## [1] "US tracts loaded!"

# Check and load US counties
if (!exists("US_counties_2020")) {
  options(tigris_use_cache = FALSE)  # Disable cache temporarily
  tryCatch({
    US_counties_2020 <- counties(year = 2020, cb = FALSE)
    print("US counties loaded!")
  }, error = function(e) {
    print("Error loading US counties")
    print(e)
  })
  options(tigris_use_cache = TRUE)  # Re-enable cache
}

## 
Downloading: 1.7 kB     
Downloading: 1.7 kB     
Downloading: 1.7 kB     
Downloading: 1.7 kB     
Downloading: 1.7 kB     
Downloading: 1.7 kB

## Warning in unzip(file_loc, exdir = tmp): error 1 in extracting from zip file

## [1] "Error loading US counties"
## <Rcpp::exception: Cannot open "/private/var/folders/76/0f4vvnw909n136hz270cz5yc0000gn/T/Rtmp2Z7xLz"; The source could be corrupt or not supported. See `st_drivers()` for a list of supported formats.>

print("All datasets are now loaded and ready!")

## [1] "All datasets are now loaded and ready!"

# st_write(berks_tracts_2020, "BerksMovers.gpkg", layer = "tracts2020")
# st_write(US_tracts_2020, "BerksMovers.gpkg", layer = "UStracts2020")
# st_write(US_counties_2020, "BerksMovers.gpkg", layer = "UScounties2020")

Data Preparations

# Convert decimal year to Year Month
movers$new_date <- as.yearmon(movers$date, format = "%d-%m-%Y")
movers$new_date <- as.character(movers$new_date)

# Add leading zeros to 4 digit county fips so that all county fips have 5 digits
movers$county_fips <- ifelse(nchar(movers$county_fips) == 4,
                                           paste0("0", movers$county_fips),
                                           movers$county_fips)

#Change data type for county fips to character for join
#movers$county_fips <- as.character(movers$county_fips) ### can delete?

#Add leading zeros to 10 digit tract fips
movers$tract_fips <- ifelse(nchar(movers$tract_fips) == 10,
                             paste0("0", movers$tract_fips),
                             movers$tract_fips)

#Change data type to character for joins
movers$tract_fips <- as.character(movers$tract_fips)


# get state codes to choose only states that exist in the data 
#to query tracts from 2010 state by state for whats needed
extract_states <- substr(movers$tract_fips, 1, 2) #start at 1st and stop at 2nd character

# Combine and find unique state codes
all_state_codes <- unique(extract_states)

# query US 2010 census with state codes that are in movers data
# using 2010 because the tractfips from the data correspond 
# with the tract delineation from 2010 and not 2020 

# Initialize an empty list to store the tracts data for each state
all_tracts <- list()

# Loop through each state code and get the tracts data
for (state_code in all_state_codes) {
  state_tracts <- tracts(state = state_code, year = 2010, cb = TRUE)
  all_tracts[[state_code]] <- state_tracts
}

# Combine all tracts data into a single data frame
US_tracts_2010 <- do.call(rbind, all_tracts)

#create GEOID to join with movers
US_tracts_2010 <- US_tracts_2010 %>% mutate(GEOID = paste0(STATE, COUNTY, TRACT))

# selecting fields from tracts 2020 that i want to add to tracts 2010 such as the actual written out name of the counties and states
tracts_2020_names <- US_tracts_2020 %>% select(STATEFP, COUNTYFP, NAMELSADCO, STATE_NAME)

# Check if it's an sf object
if (inherits(tracts_2020_names, "sf")) {
  # Convert it to a regular data frame by dropping the geometry to be able to join normally to sf tracts 2010
  tracts_2020_names <- st_drop_geometry(tracts_2020_names)
}

# Select only unique STATEFP and COUNTYFP with state and county names so that it is not a many to many join and rather a one to many 
tracts_2020_unique <- tracts_2020_names %>%
  select(STATEFP, COUNTYFP, STATE_NAME, NAMELSADCO) %>%
  distinct(STATEFP, COUNTYFP, .keep_all = TRUE)

#left join keeping all from the US_tracts_2010 and joining just the matching values from the unique names frame 
US_tracts_2010 <- US_tracts_2010 %>%
  left_join(tracts_2020_unique, by = c("STATEFP", "COUNTYFP"))

# save to geopackage
#st_write(US_tracts_2010, "BerksMovers.gpkg", layer = "UStracts2010")

Question #1: Categorizing Movers

categorized_movers <- movers %>%
  arrange(caid, date) %>%  # Sort by caid and date to track movements chronologically
  group_by(caid) %>%
  mutate(
    
    
    T_first_outmove = first(tract_fips[outmove == 1], order_by = date),  # First outmove tract 
    T_last_inmove = last(tract_fips[inmove == 1], order_by = date),  # Final inmove tract
    
    C_first_outmove = first(county_fips[outmove == 1], order_by = date),  # First outmove county
    C_last_inmove = last(county_fips[inmove == 1], order_by = date),  # Final inmove county 
    
    date_last_inmove = last(date[T_last_inmove == tract_fips], order_by = date),
    date_first_outmove = first(date[T_first_outmove == tract_fips], order_by = date),
    
    date_first_berks = first(date[county_fips == target_fips], order_by = date),
    date_last_berks = last(date[county_fips == target_fips], order_by = date),
    
  
    county_fips_changes = n_distinct(county_fips),  # Count the number of distinct county_fips
    
    county_chunk = rleid(county_fips), #gives adjacent countys same unique ids 
    
    berkshire_flag = ifelse(county_fips == target_fips, 1, 2),  #Mark if it's Berkshire or not by 1s or 2s 
    
    chunk_id = rleid(berkshire_flag), # creates distinct ids that represent changes in berkshire to non berkshire chunks
    
    movement_pattern = case_when(
  

 # permanent moved/resident     
      #One way single time  
      # maybe use july 2021 for edge effect 
      #period where migration is the greatest 
      
      
      ### Single Origin Move to Berkshires 
        #allows multiple moves between county outside and before berkshires
      C_first_outmove != target_fips & # first outmove is not berkshires 
        C_last_inmove == target_fips & # last inmove is berkshires 
        county_fips_changes == 2 &
        sum(n_distinct(chunk_id)) == 2  # allows multiple moves in outside county
      ~ "SO Moved",  
      
      
      ### multiple outside possibly between different counties ending up in berkshires move 
      C_first_outmove != target_fips & 
        C_last_inmove == target_fips & #berkshire last inmove 
        county_fips_changes >= 2 & # county fips greater than 2   
        sum(n_distinct(chunk_id)) == 2  # outside chunk to inside chunk
      ~ "MO Moved",  
  
      
# permanent left - One way single time
 
      ### Single Destination left berkshire 
      C_first_outmove == target_fips & 
        C_last_inmove != target_fips & #berkshire last inmove 
        county_fips_changes == 2 & # county fips greater than 2   
        sum(n_distinct(chunk_id)) == 2  # outside chunk to inside chunk 
      ~ "SD Left", # maybe most susceptible to displacement

      ### Multi Destination left berkshire
        #allows moves in multiple counties after berkshires
      C_first_outmove == target_fips & #first outmove berkshires
        C_last_inmove != target_fips & #last inmove is not berkshires 
        county_fips_changes >= 2 & # more than one county 
        sum(n_distinct(chunk_id)) == 2 #allows multiple moves in outside county 
      ~ "MD Left", 


#intra only       
      # Moved only within the county 
      all(county_fips == target_fips) 
      ~ "Intra-County",


#2 locations multiple moves 
      C_last_inmove == target_fips &
      county_fips_changes == 2 & #between 2 counties 
      #sum(inmove == 1 & county_fips == 25003, na.rm = TRUE) > 1 & # more than 1 berkshire inmove
      sum(county_fips != lag(county_fips), na.rm = TRUE) >= 3 &  # the number of times the county changes at least two appearances of origin and destination
      date_last_inmove < 2021.5 
      ~ "Back and Forth then Inmover",

     ### back and forth 
     # has to touch both berkshires and outside county twice
     county_fips_changes == 2 & #between 2 counties 
      #sum(inmove == 1 & county_fips == 25003, na.rm = TRUE) > 1 & # more than 1 berkshire inmove
      sum(county_fips != lag(county_fips), na.rm = TRUE) >= 3  # the number of times the county changes at least two appearances of origin and destination
      ~ "Back and Forth",

      
      
      ### Temporary Outmover: first outmove and final inmove are both Berkshire
        # one time get away to berkshire
      C_first_outmove == target_fips & 
        C_first_outmove == C_last_inmove &
        county_fips_changes == 2
      ~ "Temporary Outmover", 
      

      ### Temporary visitor who left their origin, went to Berkshire, and returned to their origin 
        # One time getaway to berkshire and return
      C_first_outmove != target_fips & # first outmove is not from berkshires
        C_first_outmove == C_last_inmove & #origin = destination
        county_fips_changes == 2
      ~ "Temporary Inmover", #return mover remote work or second home owner same tract 


# Multiple moves multiple locations 
      

      ### temporary out move same county but many stops in between 
      C_first_outmove == target_fips & 
        C_first_outmove == C_last_inmove & 
        sum(n_distinct(chunk_id)) == 3 # 
      ~ "Transient Temporary Outmover",
      

      ### separate origin and destination with temp berks inmove 
        # allows for multiple counties before and after 
      C_first_outmove != target_fips & # first outmove is not from berkshires
        C_last_inmove != target_fips & 
        sum(n_distinct(chunk_id)) == 3 # two changes in berkshire or non berkshire
      ~ "Transient Temporary Inmover", #leaves to berks getaway relocates in other county
      
### bounces between berkshires and at least 2 other counties before staying 
      C_last_inmove == target_fips & 
      date_last_inmove < 2021.5 
      ~ "Transient then Inmover", #
      

      ### Visits more than once with other counties included 
        # could be from the berkshires or from somewhere else 
        #but they have a strong connection to the area through more than 1 multiweek visit
      sum(berkshire_flag == 1) > 1 & # more than one berkshire flag 
        sum(n_distinct(chunk_id)) > 3 & # more than two changes between berks and non berks 
        county_fips_changes >= 3 # 3 or more different counties
      ~ "Transient", #was frequent visitor

      
      # Catch all others
      TRUE ~ "Unknown"  
    ),
inmove_date = ifelse(inmove == 1, date, NA),
    outmove_date = ifelse(outmove == 1, date, NA),
  ) 

#create chunk date that is the first date for the first feature in each chunk
  #helpful for filtering by time for edge control
categorized_movers <- categorized_movers %>%
  group_by(caid, chunk_id) %>%
  mutate(
    chunk_date = first(date),
    second_chunk_date = ifelse(chunk_id == 2, first(date), NA)
  ) %>%
  ungroup()



# create duration to calculate the length of time movers spent in each county they visited 
duration_categorized_movers <- categorized_movers %>%
  group_by(caid, county_chunk) %>%
  summarize(
    outmove_date = last(outmove_date),  # Last non-NA outmove_date
    inmove_date = first(inmove_date),    # First non-NA inmove_date
    .groups = "drop"
  ) %>%
  mutate(
    duration = case_when(
      is.na(outmove_date) | outmove_date < inmove_date ~ 2021.917 - inmove_date,  # If no outmove_date for final inmove
      is.na(inmove_date)  ~ outmove_date - 2019.000,  # If no inmove_date for first outmove 
      TRUE ~ outmove_date - inmove_date              # Normal case
    )
  )

# Join duration back to categorized movers frame by caid, and county_chunk
categorized_movers <- categorized_movers %>%
  left_join(
    duration_categorized_movers %>% select(caid, county_chunk, duration), 
    by = c("caid", "county_chunk")
  )


# Identify the singular, non berkshire longest stay 
categorized_movers <- categorized_movers %>%
  group_by(caid) %>%
  mutate(
    max_duration = if_else(
      duration == max(duration[county_fips != 25003], na.rm = TRUE) & county_fips != 25003, 
      duration, 
      NA_real_
    )
  ) %>%
  ungroup()

## Warning: There were 15500 warnings in `mutate()`.
## The first warning was:
## ℹ In argument: `max_duration = if_else(...)`.
## ℹ In group 7: `caid =
##   "000d732c0e8054420fc6e034bc544843bb5d2e9c2c0e48dd8d6fac6ce412f314"`.
## Caused by warning in `max()`:
## ! no non-missing arguments to max; returning -Inf
## ℹ Run `dplyr::last_dplyr_warnings()` to see the 15499 remaining warnings.

##Movement Pattern Totals

# calculate the total number of phones for each category and the count of distinct counties for each category 
test_category_totals <- categorized_movers %>% 
  group_by(movement_pattern) %>% 
  summarize(
    total_pop = n_distinct(caid),
    count_counties = (n_distinct(county_fips) - 1),
    count_tracts = n_distinct(tract_fips))

Temporal Simple Categories

categorized_movers <- categorized_movers %>%
  mutate(
    temporal_simple_categories = case_when(

# single origin one way permanent 
      movement_pattern %in% c("SO Moved", "MO Moved", "Back and Forth then Inmover", "Transient then Inmover") 
      ~ "Moved",
      

      # Combine these categories into "Left"
      movement_pattern %in% c("SD Left", "MD Left") 
      ~ "Left",
      
      # Combine these categories into "Temporary Inmover"
      movement_pattern %in% c("Transient Temporary Inmover", "Temporary Inmover") 
      ~ "Temporary Inmover",
      
      # Combine these categories into "Temporary Outmover"
      movement_pattern %in% c("Transient Temporary Outmover", "Temporary Outmover") 
      ~ "Temporary Outmover",
      
      
      # Catch all others
      TRUE ~ movement_pattern
    ))

temporal_simple_category_totals <- categorized_movers %>% 
  group_by(temporal_simple_categories) %>% 
  summarize(
    total_pop = n_distinct(caid),
    count_counties = (n_distinct(county_fips) - 1),
    count_tracts = n_distinct(tract_fips)
  )

##Temporal Distribution of Total Movers

# Temporal graph of when movers arrived vs left 



# Generate sequence of months from Jan 2019 to Dec 2021
dates <- seq(from = as.Date("2019-01-01"), to = as.Date("2021-12-01"), by = "month")

# Convert to "Mon YYYY" format and create temporal dataframe
temporal <- data.frame(Date = format(dates, "%b %Y"))

# Convert temporal Date to yearmon for joining
temporal$Date <- as.yearmon(temporal$Date, format = "%b %Y")

# Convert date columns in categorized_movers to yearmon format
categorized_movers <- categorized_movers %>%
  mutate(date_last_berks = as.yearmon(date_last_berks, format = "%d-%m-%Y"),
         date_first_berks = as.yearmon(date_first_berks, format = "%d-%m-%Y"),
         second_chunk_date = as.yearmon(second_chunk_date, format = "%d-%m-%Y"))


# Summarize distinct `caid` counts for each simple movement pattern by month
simple_movement_summary <- categorized_movers %>%
  mutate(date_key = case_when(
    movement_pattern %in% c("SD Left", "MD Left") ~ as.yearmon(date_last_berks),
    movement_pattern %in% c("SO Moved", "MO Moved", "Temporary Inmover", "Transient Temporary Inmover") ~ as.yearmon(date_first_berks),
    movement_pattern %in% c("Temporary Outmover", "Transient Temporary Outmover") ~ as.yearmon(second_chunk_date),
    TRUE ~ as.yearmon(NA)  # Ensures NA is also numeric-compatible with yearmon
  )) %>%
  filter(!is.na(date_key)) %>%
  group_by(date_key, temporal_simple_categories) %>%
  summarize(total_movers = n_distinct(caid), .groups = "drop") %>%  
  pivot_wider(names_from = temporal_simple_categories, values_from = total_movers, values_fill = 0) # na is replaced w 0
# pivot wider is essential because instead of having multiple rows of each date for each movement pattern they are reoriented into columns 

# Merge summarized movement data with temporal dataframe
simple_temporal_categorized_movers <- temporal %>%
  left_join(simple_movement_summary, by = c("Date" = "date_key"))

write.csv(simple_temporal_categorized_movers, "new_simple_temporal_categorized_movers.csv",  row.names = FALSE)

##Filter by study temporal edge constraints

# temporal control for edge effects and lockdown start 
lockdown = 2020.1 # date variable to input into filtering data 2020.1 is between feb and march of 2020 for when lockdown began
endcap = 2021.5 # and june = 2021.417

time_filt_categories <- categorized_movers  %>% 
  filter(
    
       # if first berk inmove is within constraints
        (movement_pattern == "SO Moved" & lockdown < date_first_berks & date_first_berks < endcap) |  
        (movement_pattern == "MO Moved" & lockdown < date_first_berks & date_first_berks < endcap) | 
             
      # if last berks outmove is within constraints (last berks for left categories is last outmove)
        (movement_pattern == "SD Left" & lockdown < date_last_berks & date_last_berks < endcap) | 
        (movement_pattern == "MD Left" & lockdown < date_last_berks & date_last_berks < endcap) |

         #first berks move within constraints 
             
        (movement_pattern == "Intra-County" & lockdown < date_last_berks) |# last date for berkshires just 
        
        (movement_pattern == "Back and Forth" & lockdown < date_last_berks) | # last date for berkshires just after lockdown
           
        # all proceding filters where the first date of second chunk is within constraints 
        (movement_pattern == "Temporary Outmover" & any(chunk_id== 2 & lockdown < chunk_date & chunk_date < endcap)) | 
       
        (movement_pattern == "Transient Temporary Outmover" & any(chunk_id== 2 & lockdown < chunk_date & chunk_date < endcap)) |
        (movement_pattern == "Temporary Inmover" & any(chunk_id== 2 & lockdown < chunk_date & chunk_date < endcap)) |
        (movement_pattern == "Transient Temporary Inmover" & any(chunk_id== 2 & lockdown < chunk_date & chunk_date < endcap)) |
        (movement_pattern == "Transient" & any(county_fips==target_fips & inmove == 1 & lockdown < date & date < endcap)) | 
        (movement_pattern == "Back and Forth then Inmover") | (movement_pattern == "Transient then Inmover")
)

         
# category totals 
time_filt_category_totals <- time_filt_categories %>% 
  group_by(movement_pattern) %>% 
  summarize(
    total_pop = n_distinct(caid),
    count_counties = (n_distinct(county_fips) - 1),
    count_tracts = n_distinct(tract_fips))

time_filt_simple_category_totals <- time_filt_categories %>% 
  group_by(temporal_simple_categories) %>% 
  summarize(
    total_pop = n_distinct(caid),
    count_counties = (n_distinct(county_fips) - 1),
    count_tracts = n_distinct(tract_fips)
  )

# Calculate the total sum of the total_pop column
total_sum <- sum(time_filt_category_totals$total_pop) - time_filt_category_totals$total_pop[time_filt_category_totals$movement_pattern == "Only Intra-County"]

# Add a new column for the percentage per category of total movers 
#time_filt_category_totals$percent_of_total <- (time_filt_category_totals$total_pop / total_sum) * 100

Group By Tracts - All Categories

# Group by tracts with all categories to be filtered later

time_filt_categories <- time_filt_categories %>% filter(movement_pattern != "Intra-County") %>%
  group_by(caid) %>%
  mutate(
         
          #Records the tract_fips of the first row where max_duration is not NA
         principal_track = if_else(row_number() == first(which(!is.na(max_duration))), tract_fips, NA_character_)     
         )


group_by_tract_categorized <- time_filt_categories %>%
  group_by(movement_pattern, tract_fips, county_fips, temporal_simple_categories) %>%
  summarize(
    
    PopMovers = n_distinct(caid[tract_fips == principal_track], na.rm = TRUE),
    
    
    GrossMigT = n_distinct(caid[inmove == 1], na.rm = TRUE))

## `summarise()` has grouped output by 'movement_pattern', 'tract_fips',
## 'county_fips'. You can override using the `.groups` argument.

Question #2: Vulnerability to Urban Centers

Load and join NCHS urban-rural classification

urban_rural_NCHS <- read.csv("/Users/isaiahbennett/Downloads/fall 2024/migration/research/data/berkshire_moversR/NCHSURCodes2013.csv"
)

ERS_typology<- read.csv("/Users/isaiahbennett/Downloads/spring 2025/migration research/main/data/ers county typology/ERSCountyTypology2015Edition.csv")


#Add leading zeros to 4 digit county fips
urban_rural_NCHS$GEOID <- ifelse(nchar(urban_rural_NCHS$FIPS.code) == 4,
                             paste0("0", urban_rural_NCHS$FIPS.code),
                             urban_rural_NCHS$FIPS.code)

#Change data type to character for join
urban_rural_NCHS$GEOID <- as.character(urban_rural_NCHS$GEOID)

CountyCode = urban_rural_NCHS$X2013.code[urban_rural_NCHS$GEOID == target_fips]

###Comparison Counties Query

# Read county typology 
ERS_typology <- read.csv("/Users/isaiahbennett/Downloads/spring 2025/migration research/main/data/ers county typology/ERSCountyTypology2015Edition.csv")

#add leading zeros for 4 digit county fips 
ERS_typology$FIPStxt <- ifelse(nchar(ERS_typology$FIPStxt) == 4,
                                           paste0("0", ERS_typology$FIPStxt),
                                           ERS_typology$FIPStxt)
#join by fips 
NCHS_ERS_join <- urban_rural_NCHS %>%
  select(GEOID, CBSA.title, X2013.code, X2006.code, X1990.based.code, County.2012.pop) %>%
  right_join(ERS_typology, by = c("GEOID" = "FIPStxt"))

#filter for recreational counties that are small metros 
NCHS_ERS_filtered_4 <- NCHS_ERS_join %>% filter(Recreation_2015_Update..allows.overlap..1.yes.== 1, X2013.code == 4)

NCHS_ERS_filtered_3 <- NCHS_ERS_join %>% filter(Recreation_2015_Update..allows.overlap..1.yes.== 1, X2013.code == 3)

NCHS_ERS_filtered_5 <- NCHS_ERS_join %>% filter(Recreation_2015_Update..allows.overlap..1.yes.== 1, X2013.code == 5)

Group By County for Movement Pattern and Join Geometry and NCHS codes

# group by counties for each movement pattern and simple category 
  # each county will have its own row for each movement pattern that it has a record in 
county_categorized <- group_by_tract_categorized %>% 
  group_by(county_fips, movement_pattern, temporal_simple_categories) %>%
  summarize(
    PopMoversC = sum(PopMovers, na.rm = TRUE),
    #PopOriginC = sum(PopOrigin, na.rm = TRUE),
    #PopDestinationC = sum(PopDestination, na.rm = TRUE), 
    #PopTempOutC = sum(PopDestTempOut, na.rm = TRUE),
    GrossMigC = sum(GrossMigT, na.rm = TRUE),
    
  ) %>%
  ungroup() # Ungroup to group later

## `summarise()` has grouped output by 'county_fips', 'movement_pattern'. You can
## override using the `.groups` argument.

#right join where it retains all rows from group_by_tract_categorized (the right table), along with any matching rows from US_tracts_2010 (the left data table)


NCHS_join <- urban_rural_NCHS %>%
  select(GEOID, CBSA.title, X2013.code, X2006.code, X1990.based.code) %>%
  right_join(county_categorized, by = c("GEOID" = "county_fips"))

Calculate weighted average of urban hierarchy shift

# calculate urban hierarchy shift ### MUST be noted that for resident simple category it should be the inversed aka /-1 
#if its positive  4 - 1, 2, 3 destination then moving up urban hierarchy more urban 
#if its negative then moving down urban hierarchy going more rural 
#weighted average to see 

#direction for left origin minus destination 
NCHS_join$UrbanHierLeft <- CountyCode - NCHS_join$X2013.code 
NCHS_join$UrbanHierMoved <- NCHS_join$X2013.code - CountyCode


# Nchs county urban to rural classification 
#1 Large central metro 
#2 Large fringe metro
#3 Medium metro
#4 Small metro 
#5 Micropolitan 
#6 Non-core


#read US counties 2020 since querying from census was not working earlier 
US_counties_2020 <- st_read("BerksMovers.gpkg", layer = "UScounties2020")

## Reading layer `UScounties2020' from data source 
##   `/Users/isaiahbennett/Downloads/spring 2025/migration research/main/BerksMovers.gpkg' 
##   using driver `GPKG'
## Simple feature collection with 3234 features and 12 fields
## Geometry type: MULTIPOLYGON
## Dimension:     XY
## Bounding box:  xmin: -179.1467 ymin: -14.5487 xmax: 179.7785 ymax: 71.38782
## Geodetic CRS:  NAD83

county_join <- US_counties_2020 %>%
  right_join(NCHS_join, by = "GEOID")

#st_write(county_join, "BerksMovers.gpkg", layer = "final_county_join", append = FALSE)

#moving out where are they going to 
#single origin single destination 

weighted_shift_left <- county_join %>% 
  filter(temporal_simple_categories %in% c("Left", "Temporary Outmover")) %>% 
  group_by(movement_pattern) %>% 
  summarize(
    UrbanShift = sum(UrbanHierLeft * PopMoversC, na.rm = TRUE), 
    PopMoversWS = sum(PopMoversC, na.rm = TRUE),
    WeightedShift = UrbanShift / PopMoversWS
    )


weighted_shift_moved <- county_join %>% 
  filter(temporal_simple_categories %in% c("Moved", "Temporary Inmover", "Back and Forth", "Transient")) %>% 
  group_by(movement_pattern) %>% 
  summarize(
    UrbanShift = sum(UrbanHierMoved * PopMoversC, na.rm = TRUE), 
    PopMoversWS = sum(PopMoversC, na.rm = TRUE),
    WeightedShift = UrbanShift / PopMoversWS)

 


# Combine both summarizations
weighted_shift  <- bind_rows(weighted_shift_left, weighted_shift_moved)

weighted_shift <- st_drop_geometry(weighted_shift)

# Save as a comma-separated file
#write.csv(weighted_shift, "weighted_shift.csv", row.names = FALSE)

simple_w_shift_left <- county_join %>% 
  filter(temporal_simple_categories %in% c("Left", "Temporary Outmover")) %>% 
  group_by(temporal_simple_categories) %>% 
  summarize(
    UrbanShift = sum(UrbanHierLeft * PopMoversC, na.rm = TRUE), 
    PopMoversWS = sum(PopMoversC, na.rm = TRUE),
    WeightedShift = UrbanShift / PopMoversWS
    )



simple_w_shift_moved <- county_join %>% 
  filter(temporal_simple_categories %in% c("Moved", "Temporary Inmover", "Back and Forth", "Transient")) %>% 
  group_by(temporal_simple_categories) %>% 
  summarize(
    UrbanShift = sum(UrbanHierMoved * PopMoversC, na.rm = TRUE), 
    PopMoversWS = sum(PopMoversC, na.rm = TRUE),
    WeightedShift = UrbanShift / PopMoversWS)

 


# Combine both summarizations
simple_weighted_shift  <- bind_rows(simple_w_shift_left, simple_w_shift_moved)

simple_weighted_shift <- st_drop_geometry(simple_weighted_shift)

# Save as a comma-separated file
write.csv(simple_weighted_shift, "simple_weighted_shift.csv", row.names = FALSE)

##Visualizations / Create Centroids and Map

#create centroids 
centroids_county_join <- st_centroid(county_join)

## Warning: st_centroid assumes attributes are constant over geometries

st_write(centroids_county_join, "CountiesNew.gpkg", layer = "Counties Centroids", append = FALSE)

## Deleting layer `Counties Centroids' using driver `GPKG'
## Writing layer `Counties Centroids' to data source 
##   `CountiesNew.gpkg' using driver `GPKG'
## Writing 3390 features with 22 fields and geometry type Point.

register_stadiamaps(key = "5d2cfcab-324c-4817-bb30-d413891ab37e")

# Filter the data to select rows where movement_pattern is "SO Moved"
SO_Moved_centroids <- centroids_county_join %>%
  filter(movement_pattern == "SO Moved") %>% st_transform(crs = 4326)

SD_Left_centroids <- centroids_county_join %>%
  filter(movement_pattern == "SD Left") %>% st_transform(crs = 4326)

#this works for basemap 
basemap <- get_stadiamap(
  bbox = c(left = -75.4, bottom = 38.4, right = -69.5, top = 45.1),  # Bounding box for northeast US
  zoom = 6,  # Zoom level, adjust as needed
  maptype = "alidade_smooth"  # Use terrain map type
)

## ℹ © Stadia Maps © Stamen Design © OpenMapTiles © OpenStreetMap contributors.

one_way_permanent_map <- ggmap(basemap) +  # Add the basemap
  geom_sf(data = SD_Left_centroids, aes(size = PopMoversC, color = "Total Outmovers at Destination"), alpha = 0.7, inherit.aes = FALSE) +   geom_sf(data = SO_Moved_centroids, aes(size = PopMoversC, color = "Total Inmovers at Origin"), alpha = 0.7, inherit.aes = FALSE) + 
    # Plot centroids
  scale_size_continuous(range = c(0.5, 10), name = "Mover Count") +  # Adjust symbol size range and add legend for size
  scale_color_manual(values = c("Total Inmovers at Origin" = "yellow", "Total Outmovers at Destination" = "red"), name = "Movement Type") +  # Custom color legend
  theme_minimal() +
  labs(
    title = "One Way Permanent Migration Flows",
    subtitle = ""
  ) +
  theme(
    legend.position = "bottom",  # Adjust legend position
    axis.text = element_blank(),  # Remove latitude & longitude labels
    axis.ticks = element_blank(), # Remove tick marks on axes
    axis.title = element_blank()  # Remove axis titles
  ) +
  guides(
    size = guide_legend(override.aes = list(color = "grey"))  # Force size legend points to be orange
  )

## Coordinate system already present. Adding new coordinate system, which will
## replace the existing one.

one_way_permanent_map

# Save the plot as a larger image
ggsave("larger_map3.png", plot = one_way_permanent_map, width = 12, height = 10, dpi = 300)

###Simple Categories Maps

#Map for simple categories Moved and Left

simple_centroids <- centroids_county_join %>% group_by(GEOID, temporal_simple_categories) %>% summarize(
 PopMoversC = sum(PopMoversC, na.rm = TRUE),  
 #PopOriginC =  sum(PopOriginC, na.rm = TRUE), 
 #PopDestinationC = sum(PopDestinationC, na.rm = TRUE),
 #PopTempOutC = sum(PopTempOutC, na.rm = TRUE),
 GrossMigC = sum(GrossMigC, na.rm = TRUE),
 
 
  )

## `summarise()` has grouped output by 'GEOID'. You can override using the
## `.groups` argument.

# Filter the data to select rows where movement_pattern is "SO Moved"
Moved_centroids <- simple_centroids %>%
  filter(temporal_simple_categories == "Moved") %>% st_transform(crs = 4326)

Left_centroids <- simple_centroids %>%
  filter(temporal_simple_categories == "Left") %>% st_transform(crs = 4326)

#st_write(Moved_centroids, "BerksMovers.gpkg", layer = "Inmover Centroids")
#st_write(Left_centroids, "BerksMovers.gpkg", layer = "Outmover Centroids")

# inmovers mapped over outmovers - in yellow out red 

one_way_permanent_simple_map <- ggmap(basemap) +  # Add the basemap
  geom_sf(data = Left_centroids, aes(size = PopMoversC, color = "Total Outmovers at Destination"), alpha = 0.8, inherit.aes = FALSE) +   geom_sf(data = Moved_centroids, aes(size = PopMoversC, color = "Total Inmovers at Origin"), alpha = 0.7, inherit.aes = FALSE) + 
    # Plot centroids
  scale_size_continuous(range = c(0.5, 20), name = "Mover Count") +  # Adjust symbol size range and add legend for size
  scale_color_manual(values = c("Total Inmovers at Origin" = "yellow", "Total Outmovers at Destination" = "red"), name = "Movement Type") +  # Custom color legend
  theme_minimal() +
  labs(
    title = "Permanent Migration Flows",
    subtitle = ""
  ) +
  theme(
    legend.position = "bottom",  # Adjust legend position
    axis.text = element_blank(),  # Remove latitude & longitude labels
    axis.ticks = element_blank(), # Remove tick marks on axes
    axis.title = element_blank()  # Remove axis titles
  ) +
  guides(
    size = guide_legend(override.aes = list(color = "grey"))  # Force size legend points to be orange
  )

## Coordinate system already present. Adding new coordinate system, which will
## replace the existing one.

one_way_permanent_simple_map

# Save the plot as a larger image
#ggsave("one_way_simple_map.png", plot = one_way_permanent_simple_map, width = 12, height = 10, dpi = 300)

###Temporary Mover Map

# Filter the data to select rows where movement_pattern is "SO Moved"
Temp_in_centroids <- simple_centroids %>%
  filter(temporal_simple_categories == "Temporary Inmover") %>% st_transform(crs = 4326)

Temp_out_centroids <- simple_centroids %>%
  filter(temporal_simple_categories == "Temporary Outmover") %>% st_transform(crs = 4326)

#st_write(Temp_in_centroids, "BerksMovers.gpkg", layer = "Temporary Inmover Centroids")
#st_write(Temp_out_centroids, "BerksMovers.gpkg", layer = "Temporary Outmover Centroids")



temporary_movers_simple_map <- ggmap(basemap) +  # Add the basemap
  geom_sf(data = Temp_out_centroids, aes(size = PopMoversC, color = "Total Temporary Outmovers at Destination"), alpha = 0.9, inherit.aes = FALSE) +
  geom_sf(data = Temp_in_centroids, aes(size = PopMoversC, color = "Total Temporary Inmovers at Origin"), alpha = 0.7, inherit.aes = FALSE) + 
    # Plot centroids
  scale_size_continuous(range = c(0.5, 20), name = "Mover Count") +  # Adjust symbol size range and add legend for size
  scale_color_manual(values = c("Total Temporary Inmovers at Origin" = "orange", "Total Temporary Outmovers at Destination" = "blue"), name = "Movement Type") +  # Custom color legend
  theme_minimal() +
  labs(
    title = "Temporary Migration Flows",
    subtitle = ""
  ) +
  theme(
    legend.position = "bottom",  # Adjust legend position
    axis.text = element_blank(),  # Remove latitude & longitude labels
    axis.ticks = element_blank(), # Remove tick marks on axes
    axis.title = element_blank()  # Remove axis titles
  ) +
  guides(
    size = guide_legend(override.aes = list(color = "grey"))  # Force size legend points to be orange
  )

## Coordinate system already present. Adding new coordinate system, which will
## replace the existing one.

temporary_movers_simple_map

# Save the plot as a larger image
#ggsave("new_temp_movers_simple_map.png", plot = temporary_movers_simple_map, width = 12, height = 10, dpi = 300)

###Cyclical Mover Map

#back and forth map 
Back_and_forth_centroids <- simple_centroids %>%
  filter(temporal_simple_categories == "Back and Forth", GEOID != 25003) %>% st_transform(crs = 4326)

#st_write(Back_and_forth_centroids, "BerksMovers.gpkg", layer = "Cyclical Centroids")



cyclical_movers_simple_map <- ggmap(basemap) +  # Add the basemap
   
  geom_sf(data = Back_and_forth_centroids, aes(size = GrossMigC, color = "Total Cyclical Movers at Outside County"), alpha = 0.7, inherit.aes = FALSE) +  # Plot centroids
  scale_size_continuous(range = c(0.5, 20), name = "Mover Count") +  # Adjust symbol size range and add legend for size
  scale_color_manual(values = c("Total Cyclical Movers at Outside County" = "purple"), name = "Movement Type") +  # Custom color legend
  theme_minimal() +
  labs(
    title = "Cyclical Mover Gross Migration Flows",
    subtitle = ""
  ) +
  theme(
    legend.position = "bottom",  # Adjust legend position
    axis.text = element_blank(),  # Remove latitude & longitude labels
    axis.ticks = element_blank(), # Remove tick marks on axes
    axis.title = element_blank()  # Remove axis titles
  ) +
  guides(
    size = guide_legend(override.aes = list(color = "grey"))  # Force size legend points to be orange
  )

## Coordinate system already present. Adding new coordinate system, which will
## replace the existing one.

cyclical_movers_simple_map

# Save the plot as a larger image
#ggsave("new_cyclical_movers_simple_map.png", plot = cyclical_movers_simple_map, width = 12, height = 10, dpi = 300)

###Gross Multi Mover Map

#back and forth map 
Transient_centroids <- simple_centroids %>%
  filter(temporal_simple_categories == "Transient", GEOID != 25003) %>% st_transform(crs = 4326)

#st_write(Transient_centroids, "BerksMovers.gpkg", layer = "Transient Centroids")


Transient_simple_map <- ggmap(basemap) +  # Add the basemap
   
  geom_sf(data = Transient_centroids, aes(size = PopMoversC, color = "Gross Out Migration of Transient Movers"), alpha = 0.8, inherit.aes = FALSE) +  # Plot centroids
  scale_size_continuous(range = c(0.5, 20), name = "Mover Count") +  # Adjust symbol size range and add legend for size
  scale_color_manual(values = c("Gross Out Migration of Transient Movers" = "green"), name = "Movement Type") +  # Custom color legend
  theme_minimal() +
  labs(
    title = "Transient Mover Gross Migration Flows",
    subtitle = ""
  ) +
  theme(
    legend.position = "bottom",  # Adjust legend position
    axis.text = element_blank(),  # Remove latitude & longitude labels
    axis.ticks = element_blank(), # Remove tick marks on axes
    axis.title = element_blank()  # Remove axis titles
  ) +
  guides(
    size = guide_legend(override.aes = list(color = "grey"))  # Force size legend points to be orange
  )

## Coordinate system already present. Adding new coordinate system, which will
## replace the existing one.

Transient_simple_map

# Save the plot as a larger image
#ggsave("new_transient_movers_simple_map.png", plot = Transient_simple_map, width = 12, height = 10, dpi = 300)

###Interactive Map

library(leaflet)
### commenting out for the knit 
# Create an interactive map using leaflet
# leaflet() %>%
#   addProviderTiles("CartoDB.Positron") %>%  # Add a tile layer, can be customized with different map types
#    # Add markers for SD_Left_centroids
#   addCircleMarkers(
#     data = SD_Left_centroids, 
#     lng = ~st_coordinates(geometry)[,1],  # Extract longitude from geometry column
#     lat = ~st_coordinates(geometry)[,2],  # Extract latitude from geometry column
#     radius = ~sqrt(PopMoversC) * 2,  # Adjust size based on population; scale as needed
#     color = "red",
#     fillColor = "red", 
#     fillOpacity = 0.7,
#     stroke = TRUE,
#     weight = 1,
#     popup = ~paste("Destination Total: ", PopMoversC)  # Add a popup with population info
# 
#   ) %>% 
#   
#   # Add markers for SO_Moved_centroids
#   addCircleMarkers(
#     data = SO_Moved_centroids, 
#     lng = ~st_coordinates(geometry)[,1],  # Extract longitude from geometry column
#     lat = ~st_coordinates(geometry)[,2],  # Extract latitude from geometry column
#     radius = ~sqrt(PopMoversC) * 2,  # Adjust size based on population; scale as needed
#     color = "yellow",
#     fillColor = "yellow", 
#     fillOpacity = 0.7,
#     stroke = TRUE,  # To add a border around the circles
#     weight = 1,
#     popup = ~paste("Origin Total: ", PopMoversC)  # Add a popup with population info
#     
#   ) %>%
#  
#   # Customize the appearance of the legend
#   addLegend(
#     position = "bottomright", 
#     colors = c("yellow", "red"), 
#     labels = c("Total Inmovers at Origin", "Total Outmovers at Destination"), 
#     title = "Permanent Migration Flows"
#   )

##Plot Redfin Data

redfin_data <- read.csv("/Users/isaiahbennett/Downloads/spring 2025/migration research/main/data/redfin/Berkshire_redfin.csv"
)

redfin_data_refined <- redfin_data %>% select(region, period_begin, property_type, median_sale_price, median_sale_price_yoy, homes_sold, new_listings, median_dom) %>% arrange(property_type, period_begin) %>% filter(
         period_begin > as.Date("2018-12-01") & period_begin < as.Date("2022-01-01"), property_type != "Townhouse")

redfin_data_refined$period_begin <- as.yearmon(redfin_data_refined$period_begin, format = "%Y-%m-%d")




redfin_single_fam <- redfin_data_refined %>%
  filter(property_type == "Single Family Residential")

redfin_all_res <- redfin_data_refined %>%
  filter(property_type == "All Residential")



ggplot(redfin_single_fam, aes(x = period_begin, y = median_sale_price)) +
  geom_line(color = "blue", size = 1) +
  labs(title = "Median Sale Price Over Time", x = "Year-Month", y = "Median Sale Price") +
  theme_minimal()

## Warning: Using `size` aesthetic for lines was deprecated in ggplot2 3.4.0.
## ℹ Please use `linewidth` instead.
## This warning is displayed once every 8 hours.
## Call `lifecycle::last_lifecycle_warnings()` to see where this warning was
## generated.

## Warning: The `trans` argument of `continuous_scale()` is deprecated as of ggplot2 3.5.0.
## ℹ Please use the `transform` argument instead.
## This warning is displayed once every 8 hours.
## Call `lifecycle::last_lifecycle_warnings()` to see where this warning was
## generated.

ggplot(redfin_all_res, aes(x = period_begin, y = median_sale_price)) +
  geom_line(color = "blue", size = 1) +
  labs(title = "Median Sale Price Over Time", x = "Year-Month", y = "Median Sale Price") +
  theme_minimal()

ggplot(redfin_data_refined, aes(x = period_begin, y = median_sale_price, color = property_type)) +
  geom_line(size = 1) +  # Line plot
  labs(title = "Median Sale Price Over Time by Property Type",
       x = "",
       y = "Price ($)",
       color = "Property Type") +  # Legend title
  theme_minimal()

median_sales_plot <- ggplot(redfin_data_refined, aes(x = period_begin, y = median_sale_price, color = property_type)) +
  geom_line(size = 1) +  # Line plot
  labs(title = "Median Sale Price Over Time by Property Type",
       x = "",
       y = "Price ($)",
       color = "Property Type") +  # Legend title
  theme_minimal()

#ggsave("median_sales_plot.png", plot = median_sales_plot, width = 8, height = 6, dpi = 300)


#write.csv(redfin_data_refined, "refin_data_refined.csv", row.names = FALSE)

ggplot(redfin_single_fam, aes(x = period_begin)) +
  geom_line(aes(y = homes_sold, color = "Homes Sold"), size = 1) +
  geom_line(aes(y = new_listings, color = "New Listings"), size = 1) +
  labs(title = "Homes Sold & New Listings Over Time", x = "Year-Month", y = "Count") +
  scale_color_manual(values = c("Homes Sold" = "red", "New Listings" = "green")) +
  theme_minimal()

ggplot(redfin_data_refined, aes(x = period_begin, y = homes_sold, color = property_type)) +
  geom_line(size = 1) +
  labs(title = "Homes Sold Over Time by Property Type",
       x = "Year-Month",
       y = "Homes Sold",
       color = "Property Type") +
  theme_minimal()

ggplot(redfin_single_fam, aes(x = period_begin, y = median_dom)) +
  geom_line(color = "purple", size = 1) +
  labs(title = "Median Days on Market Over Time", x = "Year-Month", y = "Median DOM") +
  theme_minimal()

#Question #3: Economic Shifts

Query and Join Economic Indicator Data

#query and save rent data at the tract level

if (exists("econ_indicators")) {
  # Code to execute if the dataframe exists
  print("Dataframe exists! On to next step...")
} else {
  
    
econ_indicators <- get_acs(
  geography = "tract",
  variables = c(
    "B25064_001",  # Median Gross Rent
    "B25077_001",  # Median Home Value
    "B19013_001",  # Median Household Income
    "B25119_003"  # Median Renter Income
  ),
  survey = "acs5",
  state = all_state_codes,        
  year = 2022,
  output = "wide",
  geometry = FALSE     # turn this on later once it's working
)




# write.csv(x = econ_indicators,
#           file = "econ_indicators_22.csv", 
#           row.names = FALSE,
#           na = "")

}

## Getting data from the 2018-2022 5-year ACS

## Fetching tract data by state and combining the result.

#add 2020 tract geometry since the query was failing to execute with geometry = TRUE
econ_indicators_spatial <- US_tracts_2020 %>% select(GEOID, geometry) %>%
  right_join(econ_indicators, by = "GEOID")


### The categorized mover geographic attributes align with the census data from 2010. To be able to join the economic census data with 2020 tract geometries properly, they have to be joined by location since they some tracts have different tract ids. 

#The goal is for the final econ_join to have geometry from 2010 census to align with movers

# Join categorized movers to 2010 tract geometry 
categorized_movers_spatial_2010 <- US_tracts_2010 %>%
  right_join(time_filt_categories, by = c("GEOID" = "tract_fips"))

#arrange by caid and date 
categorized_movers_spatial_2010 <- arrange(categorized_movers_spatial_2010, caid, date)

#create a centroid on surface to avoid concave outlier centroids 
categorized_movers_centroids <- categorized_movers_spatial_2010 %>%
  mutate(centroid = st_point_on_surface(geometry)) %>%
  st_set_geometry("centroid")  # Temporarily set the geometry to the centroids

## Warning: There was 1 warning in `stopifnot()`.
## ℹ In argument: `centroid = st_point_on_surface(geometry)`.
## Caused by warning in `st_point_on_surface.sfc()`:
## ! st_point_on_surface may not give correct results for longitude/latitude data

#ensure same crs between econ data and movers 
categorized_movers_centroids <- st_transform(categorized_movers_centroids, crs = 4326)
econ_indicators_spatial <- st_transform(econ_indicators_spatial, crs = 4326)

###join by location using centroids within econ tracts
#mover tracts are assigned the econ census data from the tract that their centroids falls within
categorized_movers_with_econ <- st_join(
  categorized_movers_centroids,
  econ_indicators_spatial,
  join = st_within,  # Ensures the centroid falls within the 2020 geometry
  left = TRUE
)


# Set back to the original geometries
econ_join <- categorized_movers_with_econ %>%
  st_set_geometry("geometry")  

#arrange and erase redundant columns
econ_join <- econ_join %>% arrange(caid, date) %>% select(-c(1:9, ))

Filter Out Permanent Left and Calculate Rent Burden

### filters out permanent simple category and creates origin and destination tract
permanent_econ_join <- econ_join %>% filter(temporal_simple_categories %in% c("Left", "Moved")) %>% group_by(caid) %>% mutate(
  #latest target county date either before moving out or last moved into
  last_target_tract = if_else(new_date == as.character(date_last_berks), GEOID.x, NA_character_),
  # selecting the GEOID from the mover frame with the same index as the max duration tract
  principal_outside_tract = if_else(row_number() == first(which(!is.na(max_duration))), GEOID.x, NA_character_)
) %>% 
  relocate(last_target_tract, principal_outside_tract, .before = everything())

# only selects origin and destination tract gets rid of other na tracts
filtered_permanent_econ_join <- permanent_econ_join %>% 
  filter(!is.na(last_target_tract) | !is.na(principal_outside_tract)) 

    
## filters for just Left 
#pairs origin to destination in one row per caid and defines economic factors for origin and destination
left_econ_tracts_paired <- filtered_permanent_econ_join  %>% filter(temporal_simple_categories == "Left") %>%
  summarize(
    movement_pattern = first(movement_pattern, na.rm = TRUE),
    temporal_simple_categories = first(temporal_simple_categories, na.rm = TRUE),
    last_target_tract = first(last_target_tract, na.rm = TRUE),
    principal_outside_tract = last(principal_outside_tract, na.rm = TRUE),
    CountyOrig = first(NAMELSADCO, na.rm = TRUE),
    CountyDest = last(NAMELSADCO, na.rm = TRUE),  
    dest_county_fips = last(county_fips, na.rm = TRUE),
    OrigHomeValue = first(B25077_001E, na.rm = TRUE),
    DestHomeValue = last(B25077_001E, na.rm = TRUE),
    OrigMedIncome = first(B19013_001E, na.rm = TRUE),
    DestMedIncome = last(B19013_001E, na.rm = TRUE),
    OrigMedMonthlyIncome = first(B19013_001E/12, na.rm = TRUE),
    DestMedMonthlyIncome = last(B19013_001E/12, na.rm = TRUE),
    OrigMedRenterMonthlyIncome = ifelse(is.na(first(B25119_003E/12)), first(B19013_001E/12, na.rm = TRUE), first(B25119_003E/12, na.rm = TRUE)), #if renter income doesnt exist use overall income
    DestMedRenterMonthlyIncome = ifelse(is.na(last(B25119_003E/12)), last(B19013_001E/12, na.rm = TRUE), last(B25119_003E/12, na.rm = TRUE)), #if renter income doesnt exist use overall income
    OrigGrsRent = first(B25064_001E, na.rm = TRUE),
    DestGrsRent = last(B25064_001E, na.rm = TRUE),
    AffordableRent = OrigMedMonthlyIncome * 0.3, 
    RentDiff = OrigGrsRent - DestGrsRent, 
    OrigPctIncomeSpentOnRent = OrigGrsRent/OrigMedRenterMonthlyIncome * 100,
    DestPctIncomeSpentOnRent = DestGrsRent/DestMedRenterMonthlyIncome * 100,
    OrigHomeValtoIncome = OrigHomeValue / OrigMedIncome, 
    DestHomeValtoIncome = DestHomeValue / OrigMedIncome,
     

    ) %>%
  ungroup()


#counts total movers by berkshire origin tract 
left_econ_tract_count <- left_econ_tracts_paired %>%
  group_by(last_target_tract) %>%
  summarize(
    TotalOutmovers = n_distinct(caid),
    across(
      c(movement_pattern, dest_county_fips, OrigHomeValue, OrigMedIncome, OrigMedMonthlyIncome, OrigGrsRent, OrigPctIncomeSpentOnRent, OrigHomeValtoIncome),
      ~ first(.x, na.rm = TRUE),
      .names = "{.col}"
    )
  )

#groups by same path and counts number of movers 
left_econ_by_path <- left_econ_tracts_paired %>% group_by(last_target_tract, principal_outside_tract) %>% summarize( 
    TotalMovers = n_distinct(caid),
    across(
      c(movement_pattern, dest_county_fips, OrigHomeValue, OrigHomeValue, DestHomeValue, OrigMedIncome, DestMedIncome, OrigMedMonthlyIncome, DestMedMonthlyIncome, OrigMedRenterMonthlyIncome, DestMedRenterMonthlyIncome, OrigGrsRent, DestGrsRent, AffordableRent, RentDiff, OrigPctIncomeSpentOnRent, DestPctIncomeSpentOnRent, OrigHomeValtoIncome, DestHomeValtoIncome),
      ~ first(.x, na.rm = TRUE),
      .names = "{.col}"
    ))

## `summarise()` has grouped output by 'last_target_tract'. You can override using
## the `.groups` argument.

## Create classes of rent burden intervals to later count total outmovers per class 
left_econ_by_path <- left_econ_by_path %>% mutate(ClassesPctIncomeOnRent = case_when(
  OrigPctIncomeSpentOnRent >= 0 & OrigPctIncomeSpentOnRent < 10 ~ "0-10",
  OrigPctIncomeSpentOnRent >= 10 & OrigPctIncomeSpentOnRent < 20 ~ "10-20",
  OrigPctIncomeSpentOnRent >= 20 & OrigPctIncomeSpentOnRent < 30 ~ "20-30",
  OrigPctIncomeSpentOnRent >= 30 & OrigPctIncomeSpentOnRent < 40 ~ "30-40",
  OrigPctIncomeSpentOnRent >= 40 & OrigPctIncomeSpentOnRent < 50 ~ "40-50",
  OrigPctIncomeSpentOnRent >= 50 & OrigPctIncomeSpentOnRent < 60 ~ "50-60",
  OrigPctIncomeSpentOnRent >= 60 & OrigPctIncomeSpentOnRent < 70 ~ "60-70",
  OrigPctIncomeSpentOnRent >= 70 & OrigPctIncomeSpentOnRent < 80 ~ "70-80",
  OrigPctIncomeSpentOnRent >= 80 & OrigPctIncomeSpentOnRent < 90 ~ "10-20",
  TRUE ~ "No Census Data"), 
  DestClassesPctIncomeOnRent = case_when(
  DestPctIncomeSpentOnRent >= 0 & DestPctIncomeSpentOnRent < 10 ~ "0-10",
  DestPctIncomeSpentOnRent >= 10 & DestPctIncomeSpentOnRent < 20 ~ "10-20",
  DestPctIncomeSpentOnRent >= 20 & DestPctIncomeSpentOnRent < 30 ~ "20-30",
  DestPctIncomeSpentOnRent >= 30 & DestPctIncomeSpentOnRent < 40 ~ "30-40",
  DestPctIncomeSpentOnRent >= 40 & DestPctIncomeSpentOnRent < 50 ~ "40-50",
  DestPctIncomeSpentOnRent >= 50 & DestPctIncomeSpentOnRent < 60 ~ "50-60",
  DestPctIncomeSpentOnRent >= 60 & DestPctIncomeSpentOnRent < 70 ~ "60-70",
  DestPctIncomeSpentOnRent >= 70 & DestPctIncomeSpentOnRent < 80 ~ "70-80",
  DestPctIncomeSpentOnRent >= 80 & DestPctIncomeSpentOnRent < 90 ~ "80-90",
  DestPctIncomeSpentOnRent >= 90  ~ "90 and up",

  TRUE ~ "No Census Data"), 
  )

# Group by classes and summarize total movers 
left_econ_by_classes <- left_econ_by_path %>% group_by(ClassesPctIncomeOnRent) %>% summarize(TotalMovers = sum(TotalMovers)) 
# Calculate percent of total movers per class
left_econ_by_classes <- left_econ_by_classes %>% mutate(PctTotalMovers = round(TotalMovers/sum(left_econ_by_classes$TotalMovers) *100, 2))


### This ones not super helpful 
# filters out displaced movers and counts their totals by rent burden classes 
displaced_migration_paths_by_classes <- left_econ_by_path %>% filter(OrigPctIncomeSpentOnRent >=30 & DestPctIncomeSpentOnRent < 30) %>% group_by(DestClassesPctIncomeOnRent) %>% summarize(TotalMovers = sum(TotalMovers)) 

displaced_migration_paths_by_classes <- displaced_migration_paths_by_classes %>% mutate(PctTotalMovers = round(TotalMovers/sum(left_econ_by_classes$TotalMovers) *100, 2))


# Total count of outmovers based on the rent burden classes of their destination 
left_dest_econ_by_classes <- left_econ_by_path %>% group_by(DestClassesPctIncomeOnRent) %>% summarize(TotalMovers = sum(TotalMovers)) 
left_dest_econ_by_classes <- left_dest_econ_by_classes %>% mutate(PctTotalMovers = round(TotalMovers/sum(left_econ_by_classes$TotalMovers) *100, 2))

    

#filters for movers who were rent burdened to renters no longer rent burdened
displaced_migration_paths <- left_econ_by_path %>% filter(OrigPctIncomeSpentOnRent >=30 & DestPctIncomeSpentOnRent < 30)

# Counts the total number of displaced movers by target tracts  
total_displaced_movers_by_tract <- displaced_migration_paths %>% group_by(last_target_tract) %>% summarize(TotalDisplaced = sum(TotalMovers))

# Counts total number of outmovers designated as displaced 
totaldisplaced = sum(total_displaced_movers_by_tract$TotalDisplaced)

Filter out Permanent In-Mover and Calculate Percent Change in Income

## filters for just Moved 
#pairs origin to destination in one row per caid and defines important economic indicators for origin and destination
moved_econ_tracts_paired <- filtered_permanent_econ_join  %>% filter(temporal_simple_categories == "Moved") %>%
  summarize(
    movement_pattern = first(movement_pattern, na.rm = TRUE),
    last_target_tract = last(last_target_tract, na.rm = TRUE),
    principal_outside_tract = first(principal_outside_tract, na.rm = TRUE),
    CountyOrig = first(NAMELSADCO, na.rm = TRUE), 
    orig_county_fips = first(county_fips, na.rm = TRUE),
    OrigHomeValue = first(B25077_001E, na.rm = TRUE),
    DestHomeValue = last(B25077_001E, na.rm = TRUE),
    OrigMedIncome = first(B19013_001E, na.rm = TRUE),
    DestMedIncome = last(B19013_001E, na.rm = TRUE),
    PctChangeIncome = (OrigMedIncome - DestMedIncome)/OrigMedIncome *100 # change in income between origin and destination

    ) %>%
  ungroup()

#counts total movers per unique migration path keeping their unique economic variables 
moved_econ_by_path <- moved_econ_tracts_paired %>%
  group_by(principal_outside_tract, last_target_tract) %>% summarize(
    MovedTotalMovers = n_distinct(caid),  # Count unique 'caid'
    across(
      c(movement_pattern, orig_county_fips, CountyOrig, OrigHomeValue, DestHomeValue, OrigMedIncome, DestMedIncome, PctChangeIncome),
      ~ first(.x, na.rm = TRUE),
      .names = "{.col}"
    ))

## `summarise()` has grouped output by 'principal_outside_tract'. You can override
## using the `.groups` argument.

#Creates classes of percent change in income between origin and destination 
moved_econ_by_path <- moved_econ_by_path %>%
  mutate(ClassesPctChangeIncome = case_when(
    PctChangeIncome < -100 ~ "Under -100",
    PctChangeIncome >= -100 & PctChangeIncome < -75 ~ "-100 to -75",
    PctChangeIncome >= -75 & PctChangeIncome < -50 ~ "-75 to -50",
    PctChangeIncome >= -50 & PctChangeIncome < -25 ~ "-50 to -25",
    PctChangeIncome >= -25 & PctChangeIncome < 0 ~ "-25 to 0",
    PctChangeIncome >= 0 & PctChangeIncome < 25 ~ "0 to 25",
    PctChangeIncome >= 25 & PctChangeIncome < 50 ~ "25 to 50",
    PctChangeIncome >= 50 & PctChangeIncome < 75 ~ "50 to 75",
    PctChangeIncome >= 75 & PctChangeIncome < 100 ~ "75 to 100",
    PctChangeIncome >= 100 ~ "Over 100",
    TRUE ~ "No Census Data"
  ))

#Orders classes so they are not rendered alphabetically 
moved_econ_by_path$ClassesPctChangeIncome <- factor(
  moved_econ_by_path$ClassesPctChangeIncome,
  levels = c(
    "Over 100",
    "75 to 100",
    "50 to 75",
    "25 to 50",
    "0 to 25",
    "-25 to 0",
    "-50 to -25",
    "-75 to -50",
    "-100 to -75",
    "Under -100",
    "No Census Data"
  ))

#Group by the classes and count total movers
moved_econ_by_income_classes <- moved_econ_by_path %>% group_by(ClassesPctChangeIncome) %>% summarize(MovedTotalMovers = sum(MovedTotalMovers)) 

#calculate percent of total movers
moved_econ_by_income_classes <- moved_econ_by_income_classes%>% mutate(PctTotalMovers = round(MovedTotalMovers/sum(moved_econ_by_income_classes$MovedTotalMovers) *100, 2))

gentrifier_migration_paths <- moved_econ_by_path %>% filter(PctChangeIncome >=25)


#Counts total inmovers whos origin has 25% higher median income or more by target tracts
total_gentrifier_movers_by_tract <- gentrifier_migration_paths %>%  group_by(last_target_tract) %>% summarize(TotalGentrifiers = sum(MovedTotalMovers))





#Join the displaced by tract and gentrifier by tract
gent_disp_tract_join <- total_displaced_movers_by_tract %>% right_join(total_gentrifier_movers_by_tract, by = "last_target_tract")

write.csv(gent_disp_tract_join, "Gent and Disp Counts by Target Tract.csv", row.names = FALSE)

#Run a correlation test between the total gentrifiers and displaced movers per tract 
correlation <- cor.test(
  gent_disp_tract_join$TotalGentrifiers,
  gent_disp_tract_join$TotalDisplaced,
  method = "pearson"
)

correlation

## 
##  Pearson's product-moment correlation
## 
## data:  gent_disp_tract_join$TotalGentrifiers and gent_disp_tract_join$TotalDisplaced
## t = 7.6031, df = 18, p-value = 5.02e-07
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  0.7019915 0.9490313
## sample estimates:
##      cor 
## 0.873243

###Group by

#group displaced movers by county and count total movers 
displaced_county_group_by <- displaced_migration_paths %>% group_by(dest_county_fips, movement_pattern) %>% summarize(TotalMovers = sum(TotalMovers), 
  across(
      c(OrigMedMonthlyIncome, DestMedMonthlyIncome, OrigMedRenterMonthlyIncome, DestMedRenterMonthlyIncome, OrigGrsRent, DestGrsRent, OrigPctIncomeSpentOnRent, DestPctIncomeSpentOnRent),
      ~ first(.x, na.rm = TRUE),
      .names = "{.col}"
    ))

## `summarise()` has grouped output by 'dest_county_fips'. You can override using
## the `.groups` argument.

# group gentrifier movers by county and count total movers                       
gentrifier_county_group_by <- gentrifier_migration_paths %>% group_by(orig_county_fips, movement_pattern) %>% summarize(TotalMovers = sum(MovedTotalMovers), 
  across(
      c(OrigMedIncome, DestMedIncome, PctChangeIncome),
      ~ first(.x, na.rm = TRUE),
      .names = "{.col}"
    ))

## `summarise()` has grouped output by 'orig_county_fips'. You can override using
## the `.groups` argument.

#Create County Centroids for the US to be joined with both dataframes 
US_counties_2020_centroids <- st_centroid(US_counties_2020)

## Warning: st_centroid assumes attributes are constant over geometries

displaced_county_spatial <- US_counties_2020_centroids %>%
  select(GEOID, geom) %>%  # Keep only GEOID and geometry before joining
  right_join(displaced_county_group_by, by = c("GEOID" = "dest_county_fips"))

gentrifier_county_spatial <- US_counties_2020_centroids %>%
  select(GEOID, geom) %>%  # Keep only GEOID and geometry before joining
  right_join(gentrifier_county_group_by, by = c("GEOID" = "orig_county_fips"))

#st_write(displaced_county_spatial, "BerksMovers.gpkg", layer = "displaced_county_centroids" )
#st_write(gentrifier_county_spatial, "BerksMovers.gpkg", layer = "gentrifier_county_centroids" )

##Urban Shift for Gentrifiers and Displaced Movers

NCHS_displaced_join <- urban_rural_NCHS %>%
  select(GEOID, CBSA.title, X2013.code, X2006.code, X1990.based.code) %>%
  right_join(displaced_county_group_by, by = c("GEOID" = "dest_county_fips"))

NCHS_gentrifier_join <- urban_rural_NCHS %>%
  select(GEOID, CBSA.title, X2013.code, X2006.code, X1990.based.code) %>%
  right_join(gentrifier_county_group_by, by = c("GEOID" = "orig_county_fips"))

NCHS_displaced_join$UrbanHierLeft <- CountyCode - NCHS_displaced_join$X2013.code 
NCHS_gentrifier_join$UrbanHierMoved <- NCHS_gentrifier_join$X2013.code - CountyCode

Displaced_Weighted_Shift <- NCHS_displaced_join %>%  
  group_by(movement_pattern) %>% summarize(
    UrbanShift = sum(UrbanHierLeft * TotalMovers, na.rm = TRUE), 
    PopMoversWS = sum(TotalMovers, na.rm = TRUE),
    WeightedShift = UrbanShift / PopMoversWS
    )
Gentrifier_Weighted_Shift <- NCHS_gentrifier_join %>%  
  group_by(movement_pattern) %>% summarize(
    UrbanShift = sum(UrbanHierMoved * TotalMovers, na.rm = TRUE), 
    PopMoversWS = sum(TotalMovers, na.rm = TRUE),
    WeightedShift = UrbanShift / PopMoversWS
    )

##Join geometries to target tracts and save

#Join tract geometry for just the layer that has total number of displaced movers 
displaced_join <- US_tracts_2010 %>%
  select(GEOID, geometry) %>%  # Keep only GEOID and geometry before joining
  right_join(total_displaced_movers_by_tract, by = c("GEOID" = "last_target_tract"))
#create centroids
displaced_join_centroids <- st_centroid(displaced_join)

## Warning: st_centroid assumes attributes are constant over geometries

gentrifier_join <- US_tracts_2010 %>%
  select(GEOID, geometry) %>%  # Keep only GEOID and geometry before joining
  right_join(total_gentrifier_movers_by_tract, by = c("GEOID" = "last_target_tract"))
#create centroids
gentrifier_join_centroids <- st_centroid(gentrifier_join)

## Warning: st_centroid assumes attributes are constant over geometries

target_tracts_econ <- US_tracts_2010 %>%
  select(GEOID, geometry) %>%  # Keep only GEOID and geometry before joining
  right_join(left_econ_tract_count, by = c("GEOID" = "last_target_tract"))

#save to geopackage
#st_write(displaced_join_centroids, "BerksMovers.gpkg", layer = "displaced_tract_centroids", append = FALSE)
#st_write(gentrifier_join_centroids, "BerksMovers.gpkg", layer = "gentrifier_tract_centroids" )
#st_write(target_tracts_econ, "BerksMovers.gpkg", layer = "target_tracts_econ" )

Metadata for access

Rights: LICENSE: BSD 3-Clause “New” or “Revised”
Resource type: Collection
Resource language: English
Conforms to: Template for Reproducible and Replicable Research in Human-Environment and Geographical Sciences version 1.0, DOI:[10.17605/OSF.IO/W29MQ](DOI:%5B10.17605/OSF.IO/W29MQ){.uri}

Compendium structure and contents

This research compendium is structured with four main directories:

data: contains subdirectories for raw data and derived data.
docs: contains subdirectories for manuscript, presentation, and report
procedure: contains subdirectories for code or software scripts, information about the computational environment in which the research was conducted, and non-code research protocols
results: contains subdirectories for figures, formatted data tables, or other formats of research results.

The data, procedures, and results of this repository are outlined in three tables: - Data: data/data_index.csv - Procedures: procedure/procedure_index.csv - Results: results/results_index.csv

Important local documents include: - Pre-analysis plan: docs/report/preanalysis.pdf - Study report: docs/report/report.pdf - Manuscript: docs/manuscript/manuscript.pdf - Presentation: docs/presentation/presentation.pdf

Compendium reference

The template_readme.md file contains more information on the design of this template and references used in the design. The Template_LICENSE file provides the BSD 3-Clause license for using this template. To cite the template, please use template_reference.bib or: > Kedron, P., & Holler, J. (2023). Template for Reproducible and Replicable Research in Human-Environment and Geographical Sciences. https://doi.org/10.17605/OSF.IO/W29MQ

This is an R Markdown Notebook. When you execute code within the notebook, the results appear beneath the code.

Try executing this chunk by clicking the Run button within the chunk or by placing your cursor inside it and pressing Cmd+Shift+Enter.

Add a new chunk by clicking the Insert Chunk button on the toolbar or by pressing Cmd+Option+I.

When you save the notebook, an HTML file containing the code and output will be saved alongside it (click the Preview button or press Cmd+Shift+K to preview the HTML file).

The preview shows you a rendered HTML copy of the contents of the editor. Consequently, unlike Knit, Preview does not run any R code chunks. Instead, the output of the chunk when it was last run in the editor is displayed.

R Notebook