This markdown file creates a map to visualize the population changes
across U.S. counties from 2020 to 2023. Understanding population
dynamics is crucial for effective planning and resource allocation. This
analysis aims to highlight areas of significant population growth or
decline, providing insights into demographic trends and regional
development. Also this is a fun way to practice my data analysis skills.
The main package that will be used for the map creation is
sf
.
The data used for this analysis was obtained from US Census Bureau under the name CO-EST2023-POP. This data encompasses population estimates for U.S. counties & equivalents. The data was processed to calculate the three-year population change, enabling a clear visual representation of shifts in population.
The Shape files for US county & equivalents comes from the Tiger/line shapefiles and represent the boundaries in 2023.
# Read data from first sheet
raw_df <- read_xlsx(path = "~/Documents/Rpubs_Projects/Maps/co-est2023-pop.xlsx",
sheet = "CO-EST2023-POP",
range = "A4:F3149",
col_names = TRUE) |>
rename(Geographic_Area = `...1`,
April2020BasePop = `...2`)
## New names:
## • `` -> `...1`
## • `` -> `...2`
# read county FIPS Ids
FIPS <- read_xlsx( path = "~/Documents/Rpubs_Projects/Maps/co-est2023-pop.xlsx",
sheet = "FIPS",
range = "A1:B3197",
col_names = TRUE)
str(raw_df)
## tibble [3,145 × 6] (S3: tbl_df/tbl/data.frame)
## $ Geographic_Area : chr [1:3145] "United States" ".Autauga County, Alabama" ".Baldwin County, Alabama" ".Barbour County, Alabama" ...
## $ April2020BasePop: num [1:3145] 331464948 58809 231768 25229 22301 ...
## $ 2020 : num [1:3145] 331526933 58915 233227 24969 22188 ...
## $ 2021 : num [1:3145] 332048977 59203 239439 24533 22359 ...
## $ 2022 : num [1:3145] 333271411 59726 246531 24700 21986 ...
## $ 2023 : num [1:3145] 334914895 60342 253507 24585 21868 ...
str(FIPS)
## tibble [3,196 × 2] (S3: tbl_df/tbl/data.frame)
## $ FIPS code: num [1:3196] 1000 1001 1003 1005 1007 ...
## $ name : chr [1:3196] "Alabama" "Autauga County" "Baldwin County" "Barbour County" ...
shapes <- st_read(dsn = "~/Documents/Rpubs_Projects/Maps/tl_2023_us_county/tl_2023_us_county.shp")
## Reading layer `tl_2023_us_county' from data source
## `/Users/avery/Documents/Rpubs_Projects/Maps/tl_2023_us_county/tl_2023_us_county.shp'
## using driver `ESRI Shapefile'
## Simple feature collection with 3235 features and 18 fields
## Geometry type: MULTIPOLYGON
## Dimension: XY
## Bounding box: xmin: -179.2311 ymin: -14.60181 xmax: 179.8597 ymax: 71.43979
## Geodetic CRS: NAD83
# add leading digets to standardize the FIPS codes
FIPS <- FIPS |>
mutate(`FIPS code` = sprintf("%05d", `FIPS code`))
In this step, the FIPS code
is standardized to ensure
that each code is represented by 5 digits. This is done by adding
leading zeros to codes with fewer than 5 digits, using the
sprintf()
function.
# get the state codes
StateCodes <- FIPS[grep("000$", FIPS$`FIPS code`),]
# grab first two charecters in string
StateCodes$`FIPS code` <- substr(StateCodes$`FIPS code`,1,2)
StateCodes <- StateCodes |>
rename(prefixes = `FIPS code`,
names = name)
Here, state codes are extracted by identifying FIPS codes that end with “000”, which represent entire states. Then, the first two characters of these codes (representing the state prefix) are extracted for further use.
# create function to make code more repoducable
fix_names <- function(StateCodes){
for (i in 1:nrow(StateCodes)) {
# Get current prefix and state name
prefix <- StateCodes$prefixes[i]
state_name <- StateCodes$names[i]
# logical vector for rows where FIPS code starts with the current prefix
matches <- grepl(paste0("^", prefix), FIPS$`FIPS code`)
FIPS$name[matches] <- gsub("\\s*\\(.*?\\)\\s*", "", FIPS$name[matches])
# Append the state name where matches are found
FIPS$name[matches] <- paste(FIPS$name[matches], sep = ", ", state_name)
}
return(FIPS)
}
This custom function appends state names to county names for each row where the FIPS code starts with the appropriate prefix. It also removes any text within parentheses (e.g., removing additional information in county names) using regular expressions.
# call function on StateCodes raw_df
CleanFIPS <- fix_names(StateCodes)
head(CleanFIPS)
## # A tibble: 6 × 2
## `FIPS code` name
## <chr> <chr>
## 1 01000 Alabama, Alabama
## 2 01001 Autauga County, Alabama
## 3 01003 Baldwin County, Alabama
## 4 01005 Barbour County, Alabama
## 5 01007 Bibb County, Alabama
## 6 01009 Blount County, Alabama
The fix_names() function is applied to the StateCodes dataset, updating the FIPS dataset by appending the state name to the relevant county names.
# fix raw_df county names
raw_df$Geographic_Area <- gsub("\\.","",raw_df$Geographic_Area)
Here, we clean up the Geographic_Area column of the raw data by removing any periods in county names to ensure consistency during merging.
# CT has different names
CT <- raw_df[agrep(", Connecticut", raw_df$Geographic_Area),]
CT$GEOID <- c("09110", "09120", "09130",
'09140', '09150', '09160',
'09170', '09180', '09190')
For counties in Connecticut, specific GEOIDs are manually added since the names vary and need special handling.
# join population to FIPS data
pop_df <- inner_join(raw_df,CleanFIPS, by = c("Geographic_Area"= "name") )
pop_df |>
rename(GEOID = `FIPS code`) ->pop_df
pop_df <- rbind(pop_df,CT)
The cleaned FIPS data is merged with theraw_df
based on
the geographic area name. The FIPS code column is then renamed to
GEOID
for consistency. Finally, the Connecticut county data
is appended to the merged dataset.
# join
map_data <- inner_join(pop_df,shapes,by = c("GEOID" = "GEOID"))
The final dataset merge is done by joining the population data
pop_df
with the shapefile data shapes
using
the GEOID
field. This ensures that each county’s population
data is mapped correctly to its geographic shape for visualization.
final_map_data <- map_data |>
rename(longitude = INTPTLON,
latitude = INTPTLAT) |>
filter(!STATEFP %in% c( "60", "64", "66", "68","69","70","72",
"74","78","81","86", "67", "89", "71",
"76", "95", "79", "02", "15")) |>
mutate(pop_change = `2023` - `2020`,
transformed_pop_change = asinh(pop_change))
In this step, the geographic data map_data
is cleaned
up. Counties from states and territories like Puerto Rico, Guam, and
Alaska are excluded by filtering out specific state FIPS codes. These
states and territories were removed to focus on the contiguous U.S.
counties.
Next, we calculate the population change between 2020 and 2023 for each county by creating a new variable pop_change. This is done by subtracting the 2020 population from the 2023 population.
To handle the wide range of population changes across counties, the
inverse hyperbolic sine (asinh()
) transformation is applied
to the pop_change variable. This transformation ensures that small
population changes are more visible on the map while preventing large
changes from overwhelming the visual representation.
The Inverse hyperbolic sine function is defined as:
\[asinh(x) = ln(x+ \sqrt{x^2+1})\]
map <- final_map_data |>
st_as_sf()
The map data is then converted into an sf (spatial features) object,
which is necessary for spatial plotting using geom_sf()
in
ggplot.
ggplot() +
geom_sf(data = map, aes(fill = transformed_pop_change)) +
scale_fill_gradient2(
low = "darkred",
mid = "white",
high = "darkblue",
midpoint = 0,
name = "Population Change"
) +
labs(title = "US Counties 2020 to 2023 Population Change") +
theme_classic() +
theme(
axis.text = element_blank(),
axis.ticks = element_blank(),
axis.line = element_blank(),
plot.title = element_text(size = 40) )
final_map_data |>
dplyr::select(Geographic_Area, GEOID, pop_change) |>
arrange(desc(pop_change)) |>
head(n = 10L) |>
kable(caption = "Table 1 countys with most population gain") |>
kable_classic()
Geographic_Area | GEOID | pop_change |
---|---|---|
Maricopa County, Arizona | 04013 | 140812 |
Collin County, Texas | 48085 | 119623 |
Harris County, Texas | 48201 | 100333 |
Denton County, Texas | 48121 | 93305 |
Polk County, Florida | 12105 | 88172 |
Fort Bend County, Texas | 48157 | 87669 |
Montgomery County, Texas | 48339 | 86063 |
Williamson County, Texas | 48491 | 81639 |
Bexar County, Texas | 48029 | 72178 |
Riverside County, California | 06065 | 69449 |
final_map_data |>
arrange(desc(pop_change)) |>
dplyr::select(Geographic_Area, GEOID, pop_change) |>
tail(n = 10L) |>
kable(caption = "Table 2 countys with most population lost") |>
kable_classic()
Geographic_Area | GEOID | pop_change |
---|---|---|
Philadelphia County, Pennsylvania | 42101 | -50142 |
Santa Clara County, California | 06085 | -53576 |
Alameda County, California | 06001 | -58278 |
San Francisco County, California | 06075 | -61530 |
New York County, New York | 36061 | -79781 |
Bronx County, New York | 36005 | -104675 |
Queens County, New York | 36081 | -136668 |
Kings County, New York | 36047 | -157222 |
Cook County, Illinois | 17031 | -176536 |
Los Angeles County, California | 06037 | -329468 |
final_map_data |>
group_by(STATEFP) |>
dplyr::summarise(StatePopChange = sum(pop_change)) |>
arrange(desc(StatePopChange)) |>
inner_join( StateCodes, by = c("STATEFP" = "prefixes")) |>
dplyr::select(STATEFP, names, StatePopChange) |>
kable(caption = "Table 3 State Population change Ranking") |>
kable_classic()
STATEFP | names | StatePopChange |
---|---|---|
48 | Texas | 1268940 |
12 | Florida | 943089 |
37 | North Carolina | 381679 |
13 | Georgia | 296837 |
04 | Arizona | 244661 |
45 | South Carolina | 241404 |
47 | Tennessee | 200398 |
49 | Utah | 133752 |
16 | Idaho | 115387 |
08 | Colorado | 90021 |
40 | Oklahoma | 88590 |
53 | Washington | 88314 |
51 | Virginia | 78505 |
32 | Nevada | 78336 |
01 | Alabama | 72709 |
18 | Indiana | 72228 |
29 | Missouri | 65115 |
05 | Arkansas | 54304 |
30 | Montana | 45601 |
10 | Delaware | 40028 |
09 | Connecticut | 39590 |
46 | South Dakota | 31680 |
23 | Maine | 31205 |
27 | Minnesota | 26906 |
33 | New Hampshire | 23352 |
24 | Maryland | 20742 |
34 | New Jersey | 18449 |
21 | Kentucky | 17999 |
19 | Iowa | 16238 |
31 | Nebraska | 15106 |
55 | Wisconsin | 11364 |
11 | District of Columbia | 8133 |
56 | Wyoming | 6393 |
50 | Vermont | 4528 |
38 | North Dakota | 4363 |
25 | Massachusetts | 3686 |
20 | Kansas | 2422 |
44 | Rhode Island | -482 |
35 | New Mexico | -9243 |
41 | Oregon | -11686 |
39 | Ohio | -12357 |
28 | Mississippi | -18719 |
54 | West Virginia | -21491 |
26 | Michigan | -32797 |
42 | Pennsylvania | -32914 |
22 | Louisiana | -80206 |
17 | Illinois | -233741 |
36 | New York | -532105 |
06 | California | -538007 |
Key Takeaways
Population Growth Leaders: Texas and Florida are leading the nation in total population increases, which aligns with trends of migration to warmer climates and lower living costs.
Emerging States: States like North Carolina and Georgia are also showing substantial growth, indicating a broader trend of migration to the Southeast.
Western States: Arizona, Utah, Idaho, and Colorado are growing as well, reflecting the attractiveness of these regions for new residents, likely due to quality of life, job opportunities, and natural beauty.
Major Urban Areas: The counties experiencing the most significant declines are primarily urban centers, particularly in California and New York. This may reflect trends such as high housing prices, economic factors, or changing demographics.
Migration Patterns: There is a clear trend of population movement from urban centers in the Northeast and West Coast to suburban and rural areas in the South and Southwest.