Introduction

This markdown file creates a map to visualize the population changes across U.S. counties from 2020 to 2023. Understanding population dynamics is crucial for effective planning and resource allocation. This analysis aims to highlight areas of significant population growth or decline, providing insights into demographic trends and regional development. Also this is a fun way to practice my data analysis skills. The main package that will be used for the map creation is sf.

1.1 Read in Data

The data used for this analysis was obtained from US Census Bureau under the name CO-EST2023-POP. This data encompasses population estimates for U.S. counties & equivalents. The data was processed to calculate the three-year population change, enabling a clear visual representation of shifts in population.

The Shape files for US county & equivalents comes from the Tiger/line shapefiles and represent the boundaries in 2023.

# Read data from first sheet 
raw_df <- read_xlsx(path = "~/Documents/Rpubs_Projects/Maps/co-est2023-pop.xlsx", 
          sheet = "CO-EST2023-POP",
          range = "A4:F3149",
          col_names = TRUE) |>
  rename(Geographic_Area = `...1`,
         April2020BasePop = `...2`)
## New names:
## • `` -> `...1`
## • `` -> `...2`
# read county FIPS Ids 
FIPS <- read_xlsx( path = "~/Documents/Rpubs_Projects/Maps/co-est2023-pop.xlsx",
                   sheet = "FIPS",
                   range = "A1:B3197",
                   col_names = TRUE) 
str(raw_df)
## tibble [3,145 × 6] (S3: tbl_df/tbl/data.frame)
##  $ Geographic_Area : chr [1:3145] "United States" ".Autauga County, Alabama" ".Baldwin County, Alabama" ".Barbour County, Alabama" ...
##  $ April2020BasePop: num [1:3145] 331464948 58809 231768 25229 22301 ...
##  $ 2020            : num [1:3145] 331526933 58915 233227 24969 22188 ...
##  $ 2021            : num [1:3145] 332048977 59203 239439 24533 22359 ...
##  $ 2022            : num [1:3145] 333271411 59726 246531 24700 21986 ...
##  $ 2023            : num [1:3145] 334914895 60342 253507 24585 21868 ...
str(FIPS)
## tibble [3,196 × 2] (S3: tbl_df/tbl/data.frame)
##  $ FIPS code: num [1:3196] 1000 1001 1003 1005 1007 ...
##  $ name     : chr [1:3196] "Alabama" "Autauga County" "Baldwin County" "Barbour County" ...
shapes <- st_read(dsn = "~/Documents/Rpubs_Projects/Maps/tl_2023_us_county/tl_2023_us_county.shp")
## Reading layer `tl_2023_us_county' from data source 
##   `/Users/avery/Documents/Rpubs_Projects/Maps/tl_2023_us_county/tl_2023_us_county.shp' 
##   using driver `ESRI Shapefile'
## Simple feature collection with 3235 features and 18 fields
## Geometry type: MULTIPOLYGON
## Dimension:     XY
## Bounding box:  xmin: -179.2311 ymin: -14.60181 xmax: 179.8597 ymax: 71.43979
## Geodetic CRS:  NAD83

2.0 Data Preparation and FIPS Code Standardization

2.1 Standarization

# add leading digets to standardize the FIPS codes 
FIPS <- FIPS |>
  mutate(`FIPS code` = sprintf("%05d", `FIPS code`)) 

In this step, the FIPS code is standardized to ensure that each code is represented by 5 digits. This is done by adding leading zeros to codes with fewer than 5 digits, using the sprintf() function.

2.2 Extract State Codes

# get the state codes 
StateCodes <- FIPS[grep("000$", FIPS$`FIPS code`),]

# grab first two charecters in string 
StateCodes$`FIPS code` <- substr(StateCodes$`FIPS code`,1,2)

StateCodes <- StateCodes |>
  rename(prefixes = `FIPS code`,
         names = name)

Here, state codes are extracted by identifying FIPS codes that end with “000”, which represent entire states. Then, the first two characters of these codes (representing the state prefix) are extracted for further use.

2.3 Fix County Names

# create function to make code more repoducable

fix_names <- function(StateCodes){
  for (i in 1:nrow(StateCodes)) {
  # Get current prefix and state name
    prefix <- StateCodes$prefixes[i]
    state_name <- StateCodes$names[i]
  
  # logical vector for rows where FIPS code starts with the current prefix
    matches <- grepl(paste0("^", prefix), FIPS$`FIPS code`)
  
    FIPS$name[matches] <- gsub("\\s*\\(.*?\\)\\s*", "", FIPS$name[matches])
    
  # Append the state name where matches are found
    FIPS$name[matches] <- paste(FIPS$name[matches], sep = ", ", state_name)
  }
return(FIPS)
}

This custom function appends state names to county names for each row where the FIPS code starts with the appropriate prefix. It also removes any text within parentheses (e.g., removing additional information in county names) using regular expressions.

# call function on StateCodes raw_df 
CleanFIPS <- fix_names(StateCodes)
head(CleanFIPS)
## # A tibble: 6 × 2
##   `FIPS code` name                   
##   <chr>       <chr>                  
## 1 01000       Alabama, Alabama       
## 2 01001       Autauga County, Alabama
## 3 01003       Baldwin County, Alabama
## 4 01005       Barbour County, Alabama
## 5 01007       Bibb County, Alabama   
## 6 01009       Blount County, Alabama

The fix_names() function is applied to the StateCodes dataset, updating the FIPS dataset by appending the state name to the relevant county names.

# fix raw_df county names 
raw_df$Geographic_Area <- gsub("\\.","",raw_df$Geographic_Area)

Here, we clean up the Geographic_Area column of the raw data by removing any periods in county names to ensure consistency during merging.

# CT has different names 
CT <- raw_df[agrep(", Connecticut", raw_df$Geographic_Area),]

CT$GEOID <- c("09110", "09120", "09130", 
              '09140', '09150', '09160', 
              '09170', '09180', '09190')

For counties in Connecticut, specific GEOIDs are manually added since the names vary and need special handling.

2.4 Join Data

# join population to FIPS data
pop_df <- inner_join(raw_df,CleanFIPS, by = c("Geographic_Area"= "name") )

pop_df |>
  rename(GEOID = `FIPS code`) ->pop_df

pop_df <- rbind(pop_df,CT)

The cleaned FIPS data is merged with theraw_df based on the geographic area name. The FIPS code column is then renamed to GEOID for consistency. Finally, the Connecticut county data is appended to the merged dataset.

# join  
map_data <- inner_join(pop_df,shapes,by = c("GEOID" = "GEOID"))

The final dataset merge is done by joining the population data pop_df with the shapefile data shapes using the GEOID field. This ensures that each county’s population data is mapped correctly to its geographic shape for visualization.

final_map_data <- map_data |>
  rename(longitude = INTPTLON,  
         latitude = INTPTLAT) |>
  filter(!STATEFP %in% c( "60", "64", "66", "68","69","70","72",
                          "74","78","81","86", "67", "89", "71",
                          "76", "95", "79", "02", "15")) |>
  mutate(pop_change = `2023` - `2020`, 
         transformed_pop_change = asinh(pop_change))

In this step, the geographic data map_data is cleaned up. Counties from states and territories like Puerto Rico, Guam, and Alaska are excluded by filtering out specific state FIPS codes. These states and territories were removed to focus on the contiguous U.S. counties.

Next, we calculate the population change between 2020 and 2023 for each county by creating a new variable pop_change. This is done by subtracting the 2020 population from the 2023 population.

To handle the wide range of population changes across counties, the inverse hyperbolic sine (asinh()) transformation is applied to the pop_change variable. This transformation ensures that small population changes are more visible on the map while preventing large changes from overwhelming the visual representation.

The Inverse hyperbolic sine function is defined as:

\[asinh(x) = ln(x+ \sqrt{x^2+1})\]

3.0 Make Maps

map <- final_map_data |>
  st_as_sf()

The map data is then converted into an sf (spatial features) object, which is necessary for spatial plotting using geom_sf() in ggplot.

ggplot() +
  geom_sf(data = map, aes(fill = transformed_pop_change)) +
  scale_fill_gradient2(
    low = "darkred", 
    mid = "white", 
    high = "darkblue", 
    midpoint = 0,
    name = "Population Change"
  )  +
  labs(title = "US Counties 2020 to 2023 Population Change") + 
  theme_classic() +
   theme(
    axis.text = element_blank(),      
    axis.ticks = element_blank(),
    axis.line = element_blank(),
    plot.title = element_text(size = 40) ) 

4.0 Conclusion and analysis

final_map_data |>
  dplyr::select(Geographic_Area, GEOID, pop_change) |>
  arrange(desc(pop_change)) |>
  head(n = 10L) |>
    kable(caption = "Table 1 countys with most population gain") |>
  kable_classic()
Table 1 countys with most population gain
Geographic_Area GEOID pop_change
Maricopa County, Arizona 04013 140812
Collin County, Texas 48085 119623
Harris County, Texas 48201 100333
Denton County, Texas 48121 93305
Polk County, Florida 12105 88172
Fort Bend County, Texas 48157 87669
Montgomery County, Texas 48339 86063
Williamson County, Texas 48491 81639
Bexar County, Texas 48029 72178
Riverside County, California 06065 69449
final_map_data |>
  arrange(desc(pop_change)) |>
  dplyr::select(Geographic_Area, GEOID, pop_change) |>
  tail(n = 10L) |>
  kable(caption = "Table 2 countys with most population lost") |>
  kable_classic()
Table 2 countys with most population lost
Geographic_Area GEOID pop_change
Philadelphia County, Pennsylvania 42101 -50142
Santa Clara County, California 06085 -53576
Alameda County, California 06001 -58278
San Francisco County, California 06075 -61530
New York County, New York 36061 -79781
Bronx County, New York 36005 -104675
Queens County, New York 36081 -136668
Kings County, New York 36047 -157222
Cook County, Illinois 17031 -176536
Los Angeles County, California 06037 -329468
final_map_data |>
  group_by(STATEFP) |>
  dplyr::summarise(StatePopChange = sum(pop_change)) |>
  arrange(desc(StatePopChange)) |>
  inner_join( StateCodes, by = c("STATEFP" = "prefixes")) |>
  dplyr::select(STATEFP, names, StatePopChange) |>
  kable(caption = "Table 3 State Population change Ranking") |>
  kable_classic()
Table 3 State Population change Ranking
STATEFP names StatePopChange
48 Texas 1268940
12 Florida 943089
37 North Carolina 381679
13 Georgia 296837
04 Arizona 244661
45 South Carolina 241404
47 Tennessee 200398
49 Utah 133752
16 Idaho 115387
08 Colorado 90021
40 Oklahoma 88590
53 Washington 88314
51 Virginia 78505
32 Nevada 78336
01 Alabama 72709
18 Indiana 72228
29 Missouri 65115
05 Arkansas 54304
30 Montana 45601
10 Delaware 40028
09 Connecticut 39590
46 South Dakota 31680
23 Maine 31205
27 Minnesota 26906
33 New Hampshire 23352
24 Maryland 20742
34 New Jersey 18449
21 Kentucky 17999
19 Iowa 16238
31 Nebraska 15106
55 Wisconsin 11364
11 District of Columbia 8133
56 Wyoming 6393
50 Vermont 4528
38 North Dakota 4363
25 Massachusetts 3686
20 Kansas 2422
44 Rhode Island -482
35 New Mexico -9243
41 Oregon -11686
39 Ohio -12357
28 Mississippi -18719
54 West Virginia -21491
26 Michigan -32797
42 Pennsylvania -32914
22 Louisiana -80206
17 Illinois -233741
36 New York -532105
06 California -538007

Key Takeaways

Population Growth Leaders: Texas and Florida are leading the nation in total population increases, which aligns with trends of migration to warmer climates and lower living costs.

Emerging States: States like North Carolina and Georgia are also showing substantial growth, indicating a broader trend of migration to the Southeast.

Western States: Arizona, Utah, Idaho, and Colorado are growing as well, reflecting the attractiveness of these regions for new residents, likely due to quality of life, job opportunities, and natural beauty.

Major Urban Areas: The counties experiencing the most significant declines are primarily urban centers, particularly in California and New York. This may reflect trends such as high housing prices, economic factors, or changing demographics.

Migration Patterns: There is a clear trend of population movement from urban centers in the Northeast and West Coast to suburban and rural areas in the South and Southwest.