Introduction

This markdown file creates a map to visualize the population changes across U.S. counties from 2020 to 2023. Understanding population dynamics is crucial for effective planning and resource allocation. This analysis aims to highlight areas of significant population growth or decline, providing insights into demographic trends and regional development. Also this is a fun way to practice my data analysis skills. The main package that will be used for the map creation is sf.

1.1 Read in Data

The data used for this analysis was obtained from US Census Bureau under the name CO-EST2023-POP. This data encompasses population estimates for U.S. counties & equivalents. The data was processed to calculate the three-year population change, enabling a clear visual representation of shifts in population.

The Shape files for US county & equivalents comes from the Tiger/line shapefiles and represent the boundaries in 2023.

# Read data from first sheet 
raw_df <- read_xlsx(path = "~/Documents/Rpubs_Projects/Maps/co-est2023-pop.xlsx", 
          sheet = "CO-EST2023-POP",
          range = "A4:F3149",
          col_names = TRUE) |>
  rename(Geographic_Area = `...1`,
         April2020BasePop = `...2`)

## New names:
## • `` -> `...1`
## • `` -> `...2`

# read county FIPS Ids 
FIPS <- read_xlsx( path = "~/Documents/Rpubs_Projects/Maps/co-est2023-pop.xlsx",
                   sheet = "FIPS",
                   range = "A1:B3197",
                   col_names = TRUE) 
str(raw_df)

## tibble [3,145 × 6] (S3: tbl_df/tbl/data.frame)
##  $ Geographic_Area : chr [1:3145] "United States" ".Autauga County, Alabama" ".Baldwin County, Alabama" ".Barbour County, Alabama" ...
##  $ April2020BasePop: num [1:3145] 331464948 58809 231768 25229 22301 ...
##  $ 2020            : num [1:3145] 331526933 58915 233227 24969 22188 ...
##  $ 2021            : num [1:3145] 332048977 59203 239439 24533 22359 ...
##  $ 2022            : num [1:3145] 333271411 59726 246531 24700 21986 ...
##  $ 2023            : num [1:3145] 334914895 60342 253507 24585 21868 ...

str(FIPS)

## tibble [3,196 × 2] (S3: tbl_df/tbl/data.frame)
##  $ FIPS code: num [1:3196] 1000 1001 1003 1005 1007 ...
##  $ name     : chr [1:3196] "Alabama" "Autauga County" "Baldwin County" "Barbour County" ...

shapes <- st_read(dsn = "~/Documents/Rpubs_Projects/Maps/tl_2023_us_county/tl_2023_us_county.shp")

## Reading layer `tl_2023_us_county' from data source 
##   `/Users/avery/Documents/Rpubs_Projects/Maps/tl_2023_us_county/tl_2023_us_county.shp' 
##   using driver `ESRI Shapefile'
## Simple feature collection with 3235 features and 18 fields
## Geometry type: MULTIPOLYGON
## Dimension:     XY
## Bounding box:  xmin: -179.2311 ymin: -14.60181 xmax: 179.8597 ymax: 71.43979
## Geodetic CRS:  NAD83

2.0 Data Preparation and FIPS Code Standardization

2.1 Standarization

# add leading digets to standardize the FIPS codes 
FIPS <- FIPS |>
  mutate(`FIPS code` = sprintf("%05d", `FIPS code`))

In this step, the FIPS code is standardized to ensure that each code is represented by 5 digits. This is done by adding leading zeros to codes with fewer than 5 digits, using the sprintf() function.

2.2 Extract State Codes

# get the state codes 
StateCodes <- FIPS[grep("000$", FIPS$`FIPS code`),]

# grab first two charecters in string 
StateCodes$`FIPS code` <- substr(StateCodes$`FIPS code`,1,2)

StateCodes <- StateCodes |>
  rename(prefixes = `FIPS code`,
         names = name)

Here, state codes are extracted by identifying FIPS codes that end with “000”, which represent entire states. Then, the first two characters of these codes (representing the state prefix) are extracted for further use.

2.3 Fix County Names

# create function to make code more repoducable

fix_names <- function(StateCodes){
  for (i in 1:nrow(StateCodes)) {
  # Get current prefix and state name
    prefix <- StateCodes$prefixes[i]
    state_name <- StateCodes$names[i]
  
  # logical vector for rows where FIPS code starts with the current prefix
    matches <- grepl(paste0("^", prefix), FIPS$`FIPS code`)
  
    FIPS$name[matches] <- gsub("\\s*\\(.*?\\)\\s*", "", FIPS$name[matches])
    
  # Append the state name where matches are found
    FIPS$name[matches] <- paste(FIPS$name[matches], sep = ", ", state_name)
  }
return(FIPS)
}

This custom function appends state names to county names for each row where the FIPS code starts with the appropriate prefix. It also removes any text within parentheses (e.g., removing additional information in county names) using regular expressions.

# call function on StateCodes raw_df 
CleanFIPS <- fix_names(StateCodes)
head(CleanFIPS)

## # A tibble: 6 × 2
##   `FIPS code` name                   
##   <chr>       <chr>                  
## 1 01000       Alabama, Alabama       
## 2 01001       Autauga County, Alabama
## 3 01003       Baldwin County, Alabama
## 4 01005       Barbour County, Alabama
## 5 01007       Bibb County, Alabama   
## 6 01009       Blount County, Alabama

The fix_names() function is applied to the StateCodes dataset, updating the FIPS dataset by appending the state name to the relevant county names.

# fix raw_df county names 
raw_df$Geographic_Area <- gsub("\\.","",raw_df$Geographic_Area)

Here, we clean up the Geographic_Area column of the raw data by removing any periods in county names to ensure consistency during merging.

# CT has different names 
CT <- raw_df[agrep(", Connecticut", raw_df$Geographic_Area),]

CT$GEOID <- c("09110", "09120", "09130", 
              '09140', '09150', '09160', 
              '09170', '09180', '09190')

For counties in Connecticut, specific GEOIDs are manually added since the names vary and need special handling.

2.4 Join Data

# join population to FIPS data
pop_df <- inner_join(raw_df,CleanFIPS, by = c("Geographic_Area"= "name") )

pop_df |>
  rename(GEOID = `FIPS code`) ->pop_df

pop_df <- rbind(pop_df,CT)

The cleaned FIPS data is merged with theraw_df based on the geographic area name. The FIPS code column is then renamed to GEOID for consistency. Finally, the Connecticut county data is appended to the merged dataset.

# join  
map_data <- inner_join(pop_df,shapes,by = c("GEOID" = "GEOID"))

The final dataset merge is done by joining the population data pop_df with the shapefile data shapes using the GEOID field. This ensures that each county’s population data is mapped correctly to its geographic shape for visualization.

final_map_data <- map_data |>
  rename(longitude = INTPTLON,  
         latitude = INTPTLAT) |>
  filter(!STATEFP %in% c( "60", "64", "66", "68","69","70","72",
                          "74","78","81","86", "67", "89", "71",
                          "76", "95", "79", "02", "15")) |>
  mutate(pop_change = `2023` - `2020`, 
         transformed_pop_change = asinh(pop_change))

In this step, the geographic data map_data is cleaned up. Counties from states and territories like Puerto Rico, Guam, and Alaska are excluded by filtering out specific state FIPS codes. These states and territories were removed to focus on the contiguous U.S. counties.

Next, we calculate the population change between 2020 and 2023 for each county by creating a new variable pop_change. This is done by subtracting the 2020 population from the 2023 population.

To handle the wide range of population changes across counties, the inverse hyperbolic sine (asinh()) transformation is applied to the pop_change variable. This transformation ensures that small population changes are more visible on the map while preventing large changes from overwhelming the visual representation.

The Inverse hyperbolic sine function is defined as:

\[asinh(x) = ln(x+ \sqrt{x^2+1})\]

3.0 Make Maps

map <- final_map_data |>
  st_as_sf()

The map data is then converted into an sf (spatial features) object, which is necessary for spatial plotting using geom_sf() in ggplot.

ggplot() +
  geom_sf(data = map, aes(fill = transformed_pop_change)) +
  scale_fill_gradient2(
    low = "darkred", 
    mid = "white", 
    high = "darkblue", 
    midpoint = 0,
    name = "Population Change"
  )  +
  labs(title = "US Counties 2020 to 2023 Population Change") + 
  theme_classic() +
   theme(
    axis.text = element_blank(),      
    axis.ticks = element_blank(),
    axis.line = element_blank(),
    plot.title = element_text(size = 40) )

4.0 Conclusion and analysis

final_map_data |>
  dplyr::select(Geographic_Area, GEOID, pop_change) |>
  arrange(desc(pop_change)) |>
  head(n = 10L) |>
    kable(caption = "Table 1 countys with most population gain") |>
  kable_classic()

Table 1 countys with most population gain
Geographic_Area	GEOID	pop_change
Maricopa County, Arizona	04013	140812
Collin County, Texas	48085	119623
Harris County, Texas	48201	100333
Denton County, Texas	48121	93305
Polk County, Florida	12105	88172
Fort Bend County, Texas	48157	87669
Montgomery County, Texas	48339	86063
Williamson County, Texas	48491	81639
Bexar County, Texas	48029	72178
Riverside County, California	06065	69449

final_map_data |>
  arrange(desc(pop_change)) |>
  dplyr::select(Geographic_Area, GEOID, pop_change) |>
  tail(n = 10L) |>
  kable(caption = "Table 2 countys with most population lost") |>
  kable_classic()

Table 2 countys with most population lost
Geographic_Area	GEOID	pop_change
Philadelphia County, Pennsylvania	42101	-50142
Santa Clara County, California	06085	-53576
Alameda County, California	06001	-58278
San Francisco County, California	06075	-61530
New York County, New York	36061	-79781
Bronx County, New York	36005	-104675
Queens County, New York	36081	-136668
Kings County, New York	36047	-157222
Cook County, Illinois	17031	-176536
Los Angeles County, California	06037	-329468

final_map_data |>
  group_by(STATEFP) |>
  dplyr::summarise(StatePopChange = sum(pop_change)) |>
  arrange(desc(StatePopChange)) |>
  inner_join( StateCodes, by = c("STATEFP" = "prefixes")) |>
  dplyr::select(STATEFP, names, StatePopChange) |>
  kable(caption = "Table 3 State Population change Ranking") |>
  kable_classic()

Table 3 State Population change Ranking
STATEFP	names	StatePopChange
48	Texas	1268940
12	Florida	943089
37	North Carolina	381679
13	Georgia	296837
04	Arizona	244661
45	South Carolina	241404
47	Tennessee	200398
49	Utah	133752
16	Idaho	115387
08	Colorado	90021
40	Oklahoma	88590
53	Washington	88314
51	Virginia	78505
32	Nevada	78336
01	Alabama	72709
18	Indiana	72228
29	Missouri	65115
05	Arkansas	54304
30	Montana	45601
10	Delaware	40028
09	Connecticut	39590
46	South Dakota	31680
23	Maine	31205
27	Minnesota	26906
33	New Hampshire	23352
24	Maryland	20742
34	New Jersey	18449
21	Kentucky	17999
19	Iowa	16238
31	Nebraska	15106
55	Wisconsin	11364
11	District of Columbia	8133
56	Wyoming	6393
50	Vermont	4528
38	North Dakota	4363
25	Massachusetts	3686
20	Kansas	2422
44	Rhode Island	-482
35	New Mexico	-9243
41	Oregon	-11686
39	Ohio	-12357
28	Mississippi	-18719
54	West Virginia	-21491
26	Michigan	-32797
42	Pennsylvania	-32914
22	Louisiana	-80206
17	Illinois	-233741
36	New York	-532105
06	California	-538007

Key Takeaways

Population Growth Leaders: Texas and Florida are leading the nation in total population increases, which aligns with trends of migration to warmer climates and lower living costs.

Emerging States: States like North Carolina and Georgia are also showing substantial growth, indicating a broader trend of migration to the Southeast.

Western States: Arizona, Utah, Idaho, and Colorado are growing as well, reflecting the attractiveness of these regions for new residents, likely due to quality of life, job opportunities, and natural beauty.

Major Urban Areas: The counties experiencing the most significant declines are primarily urban centers, particularly in California and New York. This may reflect trends such as high housing prices, economic factors, or changing demographics.

Migration Patterns: There is a clear trend of population movement from urban centers in the Northeast and West Coast to suburban and rural areas in the South and Southwest.

Population Change Maps

Avery Davis

2024-09-27