This tutorial demonstrates how to work with the Longitudinal Tract Database (LTDB) to analyze demographic changes in U.S. neighborhoods between 2000 and 2010.
By the end of this tutorial, you’ll know how to:
Understanding how neighborhoods change over time helps us:
Let’s get started!
First, we need to load the R packages we’ll use. Think of packages as toolboxes with special functions.
# Load required packages
library(dplyr) # For data manipulation
library(ggplot2) # For creating plots
library(knitr) # For nice tables
What each package does:
The LTDB data is stored online as RDS files (R’s native data format). We’ll download three files:
# Download 2000 census data
URL1 <- "https://github.com/DS4PS/cpp-529-fall-2020/raw/main/LABS/data/rodeo/LTDB-2000.rds"
d2000 <- readRDS(gzcon(url(URL1)))
# Download 2010 census data
URL2 <- "https://github.com/DS4PS/cpp-529-fall-2020/raw/main/LABS/data/rodeo/LTDB-2010.rds"
d2010 <- readRDS(gzcon(url(URL2)))
# Download metadata
URLmd <- "https://github.com/DS4PS/cpp-529-fall-2020/raw/main/LABS/data/rodeo/LTDB-META-DATA.rds"
metadata <- readRDS(gzcon(url(URLmd)))
# Check what we got
cat("✓ 2000 Data:", nrow(d2000), "census tracts\n")
cat("✓ 2010 Data:", nrow(d2010), "census tracts\n")
## ✓ 2000 Data: 72693 census tracts
## ✓ 2010 Data: 74022 census tracts
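To see what the readRDS(gzcon(...)) pattern does without depending on the network, here is a small local sketch. It assumes a made-up data frame (toy_tracts, a stand-in for the LTDB files, with invented values), writes it with saveRDS(), which gzip-compresses by default, and reads it back through a connection wrapped in gzcon(), much as the download code wraps url().

```r
# Toy stand-in for an LTDB file (values are made up for illustration)
toy_tracts <- data.frame(
  tractid = c("fips-08-031-000102", "fips-08-031-000201"),
  pop00.x = c(1500, 2300)
)

# saveRDS() gzip-compresses by default
tmp <- tempfile(fileext = ".rds")
saveRDS(toy_tracts, tmp)

# Open the file as a raw binary connection and decompress it on the fly,
# the same role gzcon() plays around url() for a remote file
con <- gzcon(file(tmp, "rb"))
restored <- readRDS(con)
close(con)

# Compare the original and the round-tripped object
identical(toy_tracts, restored)
```

If the downloads are slow, one option is to save each object locally with saveRDS() after the first download and read the local copy in later sessions.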
What’s happening here?
- readRDS() reads R data files
- gzcon(url()) downloads and uncompresses the files
Let’s see what variables are in our datasets.
## First 10 columns in 2000 data:
## [1] "year" "tractid" "pop00.x" "nhwht00" "nhblk00" "ntv00" "asian00"
## [8] "hisp00" "haw00" "india00"
##
## First 5 rows (showing first 6 columns):
## year tractid pop00.x nhwht00 nhblk00 ntv00
## 1 2000 fips-01-001-020100 1920.975 1722.977 144.9981 28.99962
## 2 2000 fips-01-001-020200 1892.000 671.000 1177.0000 12.00000
## 3 2000 fips-01-001-020300 3339.000 2738.000 498.0000 16.00000
## 4 2000 fips-01-001-020400 4556.000 4273.000 118.0000 23.00000
## 5 2000 fips-01-001-020500 6053.912 5426.983 367.4790 36.10111
## 6 2000 fips-01-001-020600 3272.000 2615.275 553.0823 25.18413
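The chunk that produced this preview is not shown, so the following is only a guess at the approach: a minimal sketch using base R's names(), head(), and dim() on a made-up data frame (toy, with invented values) rather than the real d2000.

```r
# Made-up data with a few LTDB-style columns (values are illustrative only)
toy <- data.frame(
  year    = 2000,
  tractid = c("fips-01-001-020100", "fips-01-001-020200", "fips-01-001-020300"),
  pop00.x = c(1921, 1892, 3339),
  nhwht00 = c(1723, 671, 2738)
)

# Column names: the first 10, or fewer if the data has fewer columns
first_cols <- head(names(toy), 10)
print(first_cols)

# First rows, restricted to the first few columns
preview <- head(toy[, 1:3], 2)
print(preview)

# Overall size: rows (tracts) by columns (variables)
dim(toy)
```

Note that head() returns six rows when no count is given, which matches the six rows printed in the preview above.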
Key Variables to Know:
- tractid: Unique identifier for each census tract
- pop00 / pop12: Population in 2000 / 2010
- nhwht00 / nhwht12: Non-Hispanic White population
- mhmval00 / mhmval12: Median home value
- hinc00 / hinc12: Median household income
Let’s focus on Denver, Colorado (Denver County). We need to filter the data using the county FIPS code.
Denver County FIPS code: 08031
# The tractid format is "fips-08-031-..." so we need to search for that pattern
# Filter for Denver in 2000
denver_2000 <- d2000 %>%
  filter(grepl("fips-08-031", tractid))
# Filter for Denver in 2010
denver_2010 <- d2010 %>%
  filter(grepl("fips-08-031", tractid))
cat("Denver census tracts in 2000:", nrow(denver_2000), "\n")
cat("Denver census tracts in 2010:", nrow(denver_2010), "\n")
## Denver census tracts in 2000: 144
## Denver census tracts in 2010: 144
##
## Sample tract IDs:
## [1] "fips-08-031-000102" "fips-08-031-000201" "fips-08-031-000202"
What’s grepl() doing?
grepl("fips-08-031", tractid) returns TRUE for every tract ID that
contains “fips-08-031” (Denver County, Colorado: state code 08,
county code 031), and filter() keeps only those rows.
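Because tract IDs have a fixed layout, the unanchored pattern works here, but an anchored pattern states the intent more explicitly. The sketch below compares the two on a few made-up IDs (the second and third are hypothetical and exist only for illustration); startsWith() is a base R alternative that avoids regular expressions altogether.

```r
ids <- c(
  "fips-08-031-000102",   # Denver County, CO
  "fips-08-005-004951",   # hypothetical: a different Colorado county
  "fips-48-031-950100"    # hypothetical: same county code, different state
)

# Unanchored match, as used in the tutorial
unanchored <- grepl("fips-08-031", ids)

# Anchored match: the ID must start with the prefix, and the county code
# must be followed by "-"
anchored <- grepl("^fips-08-031-", ids)

# Literal prefix test with no regex at all
prefix <- startsWith(ids, "fips-08-031-")

ids[anchored]
```

All three approaches select the same rows on these IDs; the anchored forms simply rule out accidental matches elsewhere in the string.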
Now let’s combine the 2000 and 2010 data and calculate how things changed.
# Merge the datasets
denver_change <- denver_2000 %>%
  inner_join(denver_2010, by = "tractid")
cat("Merged dataset has", nrow(denver_change), "rows\n")
## Merged dataset has 144 rows
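Here all 144 tracts matched, but inner_join() silently drops any tract that appears in only one year, so it can be worth checking for unmatched rows. A minimal sketch on made-up data (the frames a and b and their values are invented for illustration), using dplyr's anti_join():

```r
library(dplyr)

# Two toy years of data; t1 exists only in a, t4 only in b
a <- data.frame(tractid = c("t1", "t2", "t3"), year = 2000, pop00.x = c(10, 20, 30))
b <- data.frame(tractid = c("t2", "t3", "t4"), year = 2010, pop12 = c(25, 28, 40))

merged <- inner_join(a, b, by = "tractid")

# The shared non-key column 'year' is kept twice, as year.x and year.y
names(merged)

# Rows that inner_join() dropped because they had no partner
only_2000 <- anti_join(a, b, by = "tractid")
only_2010 <- anti_join(b, a, by = "tractid")
```

The .x and .y suffixes also explain column names such as pop00.x: they typically appear when two joined tables share a non-key column name.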
# Calculate population change
# The 2000 data has pop00.x and 2010 data has pop12
denver_change <- denver_change %>%
  mutate(
    pop_change = pop12 - pop00.x,
    # Note: a tract with pop00.x == 0 produces Inf (or NaN) here
    pop_pct_change = ((pop12 - pop00.x) / pop00.x) * 100
  )
# Show summary
cat("\nSummary of Population Changes:\n")
##
## Summary of Population Changes:
## Mean change: 345 people
## Mean % change: Inf %
## Tracts that grew: 74
## Tracts that declined: 69
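A mean percent change of Inf is what you would expect if at least one tract has a 2000 population of zero, since dividing by zero yields Inf in R. The sketch below reproduces the effect on made-up numbers (the toy data frame and the pct_raw / pct_guarded column names are assumptions for illustration) and shows one possible guard: treating the percentage as missing when the base is zero.

```r
library(dplyr)

# Made-up tracts: the last one had no residents in 2000
toy <- data.frame(
  tractid = c("a", "b", "c"),
  pop00.x = c(1000, 2000, 0),
  pop12   = c(1100, 1800, 500)
)

toy <- toy %>%
  mutate(
    pop_change  = pop12 - pop00.x,
    # Unguarded: 500 / 0 gives Inf
    pct_raw     = (pop12 - pop00.x) / pop00.x * 100,
    # Guarded: an undefined percentage becomes NA instead
    pct_guarded = if_else(pop00.x > 0, pct_raw, NA_real_)
  )

mean_raw     <- mean(toy$pct_raw)                    # Inf
mean_guarded <- mean(toy$pct_guarded, na.rm = TRUE)  # average of +10 and -10
median_raw   <- median(toy$pct_raw)                  # unaffected by the Inf here
```

The median is another option: it is far less sensitive to a handful of extreme tracts than the mean.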
Understanding the code:
- inner_join() combines the two datasets by matching tractid
- The 2000 data uses pop00.x for population
- The 2010 data uses pop12 for the 2010-2012 population estimate
- mutate() creates new calculated columns
- pop_change = simple difference (pop12 - pop00.x)
- pop_pct_change = percentage change
Let’s create a histogram to see how population changed across Denver neighborhoods.
# Create histogram
ggplot(denver_change, aes(x = pop_pct_change)) +
  geom_histogram(bins = 25, fill = "steelblue", color = "white") +
  geom_vline(xintercept = 0, color = "red", linetype = "dashed", size = 1.2) +
  labs(
    title = "Population Change in Denver Census Tracts (2000-2010)",
    x = "Population Change (%)",
    y = "Number of Census Tracts",
    caption = "Red line = no change"
  ) +
  theme_minimal(base_size = 13) +
  theme(plot.title = element_text(face = "bold", hjust = 0.5))
Reading the plot:
A scatter plot shows the relationship between 2000 and 2010 populations.
# Create scatter plot comparing 2000 vs 2010 populations
ggplot(denver_change, aes(x = pop00.x, y = pop12)) +
  geom_point(alpha = 0.6, size = 3, color = "darkgreen") +
  geom_abline(intercept = 0, slope = 1, linetype = "dashed", color = "gray50") +
  labs(
    title = "Denver Census Tract Populations: 2000 vs 2010",
    x = "Population in 2000",
    y = "Population in 2010",
    caption = "Points above the diagonal line grew; points below declined"
  ) +
  theme_minimal(base_size = 13) +
  theme(plot.title = element_text(face = "bold", hjust = 0.5))
How to read this:
Let’s identify which neighborhoods changed the most.
# Top 5 growing tracts
top_growth <- denver_change %>%
  arrange(desc(pop_change)) %>%
  head(5) %>%
  select(tractid, pop00.x, pop12, pop_change, pop_pct_change)
kable(top_growth,
      caption = "Top 5 Fastest Growing Census Tracts in Denver",
      col.names = c("Tract ID", "Pop 2000", "Pop 2010", "Change", "% Change"),
      digits = 1)
| Tract ID | Pop 2000 | Pop 2010 | Change | % Change |
|---|---|---|---|---|
| fips-08-031-008389 | 5 | 8175 | 8170 | 163400.0 |
| fips-08-031-004106 | 2575 | 10137 | 7562 | 293.7 |
| fips-08-031-004405 | 2025 | 8065 | 6040 | 298.3 |
| fips-08-031-008388 | 878 | 6399 | 5521 | 628.8 |
| fips-08-031-008391 | 3185 | 8247 | 5062 | 158.9 |
# Top 5 declining tracts
top_decline <- denver_change %>%
  arrange(pop_change) %>%
  head(5) %>%
  select(tractid, pop00.x, pop12, pop_change, pop_pct_change)
kable(top_decline,
      caption = "Top 5 Fastest Declining Census Tracts in Denver",
      col.names = c("Tract ID", "Pop 2000", "Pop 2010", "Change", "% Change"),
      digits = 1)
| Tract ID | Pop 2000 | Pop 2010 | Change | % Change |
|---|---|---|---|---|
| fips-08-031-000402 | 7012 | 4970 | -2042 | -29.1 |
| fips-08-031-006814 | 5607 | 4270 | -1337 | -23.8 |
| fips-08-031-004403 | 5053 | 3937 | -1116 | -22.1 |
| fips-08-031-003601 | 5662 | 4592 | -1070 | -18.9 |
| fips-08-031-000600 | 3330 | 2332 | -998 | -30.0 |
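Both tables rank tracts by absolute change (the Change column), which is not the same as ranking by percent change: a tract starting from a handful of residents can post an enormous percentage. As a rough sketch on made-up numbers, the code below ranks the same toy tracts both ways; min_base is a hypothetical cutoff introduced here to ignore tracts with a very small 2000 population.

```r
library(dplyr)

# Made-up tracts; tract "a" starts from almost nothing
toy <- data.frame(
  tractid = c("a", "b", "c", "d"),
  pop00.x = c(5, 2500, 900, 3000),
  pop12   = c(8000, 10000, 6400, 8200)
) %>%
  mutate(
    pop_change     = pop12 - pop00.x,
    pop_pct_change = pop_change / pop00.x * 100
  )

# Ranked by absolute gain, as in the tables above
by_count <- toy %>%
  arrange(desc(pop_change)) %>%
  pull(tractid)

# Ranked by percent gain, keeping only tracts with a meaningful 2000 base
min_base <- 500   # hypothetical cutoff; pick one that suits your data
by_rate <- toy %>%
  filter(pop00.x >= min_base) %>%
  arrange(desc(pop_pct_change)) %>%
  pull(tractid)
```

Which ranking is more informative depends on the question: absolute change highlights where the most people moved in or out, while percent change highlights where a neighborhood was transformed relative to its starting size.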
Let’s create a bubble chart where the bubble size represents the 2000 population.
# Filter out extreme outliers for better visualization
denver_viz <- denver_change %>%
  filter(pop_pct_change > -100 & pop_pct_change < 200)
ggplot(denver_viz, aes(x = pop00.x, y = pop_pct_change, size = pop00.x, color = pop_pct_change)) +
  geom_point(alpha = 0.6) +
  scale_size_continuous(range = c(2, 15), guide = "none") +
  scale_color_gradient2(
    low = "#d7191c",   # Red for decline
    mid = "#ffffbf",   # Yellow for no change
    high = "#2c7bb6",  # Blue for growth
    midpoint = 0,
    name = "% Change"
  ) +
  geom_hline(yintercept = 0, linetype = "dashed", color = "gray30", size = 0.8) +
  labs(
    title = "Population Change by Tract Size",
    subtitle = "Bubble size represents 2000 population",
    x = "Population in 2000",
    y = "Percent Change (2000-2010)",
    caption = "Larger bubbles = larger tracts in 2000"
  ) +
  theme_minimal(base_size = 12) +
  theme(
    plot.title = element_text(face = "bold", hjust = 0.5, size = 16),
    plot.subtitle = element_text(hjust = 0.5, color = "gray40"),
    legend.position = "right"
  )
What does this show?
Let’s compare the distribution of populations in 2000 vs 2010 side-by-side.
# Prepare data for density plot
library(tidyr)
density_data <- denver_change %>%
  select(tractid, pop00.x, pop12) %>%
  pivot_longer(cols = c(pop00.x, pop12),
               names_to = "Year",
               values_to = "Population") %>%
  mutate(Year = ifelse(Year == "pop00.x", "2000", "2010"))
ggplot(density_data, aes(x = Population, fill = Year)) +
  geom_density(alpha = 0.5, size = 1) +
  scale_fill_manual(values = c("2000" = "#e66101", "2010" = "#5e3c99")) +
  labs(
    title = "Distribution of Tract Populations: 2000 vs 2010",
    x = "Population per Census Tract",
    y = "Density",
    fill = "Year"
  ) +
  theme_minimal(base_size = 13) +
  theme(
    plot.title = element_text(face = "bold", hjust = 0.5),
    legend.position = "top"
  )
Interpretation:
A visual comparison of the tracts that gained the most population and those that lost the most.
# Get top 5 gainers and top 5 losers
top_combined <- denver_change %>%
  arrange(desc(pop_change)) %>%
  slice(c(1:5, (n()-4):n())) %>%
  mutate(
    # Keep only the 6-digit tract code (characters 13-18 of "fips-SS-CCC-TTTTTT")
    tract_short = substr(tractid, 13, 18),
    change_type = ifelse(pop_change > 0, "Growth", "Decline")
  )
ggplot(top_combined, aes(x = reorder(tract_short, pop_change),
                         y = pop_change,
                         fill = change_type)) +
  geom_col(width = 0.7) +
  coord_flip() +
  scale_fill_manual(values = c("Growth" = "#1b7837", "Decline" = "#c51b7d")) +
  labs(
    title = "Top 5 Growing & Declining Census Tracts",
    x = "Census Tract",
    y = "Population Change (2000-2010)",
    fill = NULL
  ) +
  theme_minimal(base_size = 12) +
  theme(
    plot.title = element_text(face = "bold", hjust = 0.5, size = 15),
    legend.position = "top",
    panel.grid.major.y = element_blank()
  ) +
  geom_hline(yintercept = 0, color = "black", size = 0.5)
Why this is useful:
Let’s make a summary table of key statistics.
# Calculate summary statistics
summary_stats <- data.frame(
  Metric = c(
    "Total Population (2000)",
    "Total Population (2010)",
    "Total Change",
    "Number of Growing Tracts",
    "Number of Declining Tracts",
    "Average Tract Population (2000)",
    "Average Tract Population (2010)"
  ),
  Value = c(
    format(sum(denver_change$pop00.x, na.rm = TRUE), big.mark = ","),
    format(sum(denver_change$pop12, na.rm = TRUE), big.mark = ","),
    format(sum(denver_change$pop_change, na.rm = TRUE), big.mark = ","),
    sum(denver_change$pop_change > 0, na.rm = TRUE),
    sum(denver_change$pop_change < 0, na.rm = TRUE),
    format(round(mean(denver_change$pop00.x, na.rm = TRUE), 0), big.mark = ","),
    format(round(mean(denver_change$pop12, na.rm = TRUE), 0), big.mark = ",")
  )
)
kable(summary_stats,
      caption = "Denver Demographic Summary (2000-2010)",
      col.names = c("Metric", "Value"))
| Metric | Value |
|---|---|
| Total Population (2000) | 554,692.5 |
| Total Population (2010) | 604,356 |
| Total Change | 49,663.52 |
| Number of Growing Tracts | 74 |
| Number of Declining Tracts | 69 |
| Average Tract Population (2000) | 3,852 |
| Average Tract Population (2010) | 4,197 |
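The 2000 total and the total change carry decimals because the 2000 tract populations are themselves fractional (for example 1920.975 in the preview earlier), presumably because they are estimates adjusted to a common set of tract boundaries rather than raw counts. If whole-number labels are preferred, one option is to round before formatting. A minimal sketch on made-up values (pop_2000 is an invented vector, not the Denver data):

```r
# Made-up fractional tract populations, similar in spirit to the 2000 values
pop_2000 <- c(1920.975, 1892.000, 6053.912)

total <- sum(pop_2000, na.rm = TRUE)

# Without rounding, the decimals carry into the label
raw_label <- format(total, big.mark = ",")

# Rounding first gives a whole-number label
label <- format(round(total), big.mark = ",")
label
```

The averages in the table already use round(..., 0), which is why they print without decimals.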
In this tutorial, we:
About the Data:
- The LTDB provides consistent census tract data over time
- Census tracts are small geographic areas (about 4,000 people)
- We can track demographic changes at the neighborhood level
About R Programming:
- filter() selects specific rows
- mutate() creates new columns
- inner_join() merges two datasets
- ggplot2 creates visualizations layer by layer
You can adapt this code for any U.S. city by changing the FIPS code pattern in Step 4:
Popular cities:
- Los Angeles: "fips-06-037"
- Chicago: "fips-17-031"
- Houston: "fips-48-201"
- Phoenix: "fips-04-013"
- Seattle: "fips-53-033"
- Boston: "fips-25-025"
- Atlanta: "fips-13-121"
- San Francisco: "fips-06-075"
Format explanation:
- First 2 digits = State code
- Next 3 digits = County code
- Pattern: "fips-SS-CCC" where SS is state, CCC is county
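To avoid typing the pattern by hand, the format above can be wrapped in a small helper. The function below, fips_pattern(), is a hypothetical convenience written for this example (it is not part of the tutorial, dplyr, or any package); it pads the state and county codes with zeros and returns an anchored pattern suitable for grepl().

```r
# Hypothetical helper: build a tract-ID pattern from numeric FIPS codes
fips_pattern <- function(state, county) {
  # %02d and %03d left-pad with zeros: 8 -> "08", 31 -> "031"
  sprintf("^fips-%02d-%03d-", as.integer(state), as.integer(county))
}

denver_pattern <- fips_pattern(8, 31)   # Denver County, CO
la_pattern     <- fips_pattern(6, 37)   # Los Angeles County, CA

# Example use on a couple of tract IDs in the tutorial's format
ids <- c("fips-08-031-000102", "fips-06-037-101110")
denver_ids <- ids[grepl(denver_pattern, ids)]
```

Keep in mind that each pattern selects a whole county, which is only an approximation of a city whose limits are smaller than, or extend beyond, that county.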