Introduction

This tutorial demonstrates how to work with the Longitudinal Tract Database (LTDB) to analyze demographic changes in U.S. neighborhoods between 2000 and 2010.

What You’ll Learn

By the end of this tutorial, you’ll know how to:

  1. Load census data from online sources
  2. Explore and understand the data structure
  3. Filter data for a specific city
  4. Calculate changes between two time periods
  5. Create simple visualizations

Why This Matters

Understanding how neighborhoods change over time helps us:

  • Track urban development patterns
  • Identify areas of growth or decline
  • Understand housing market dynamics
  • Support evidence-based policy decisions

Let’s get started!


Step 1: Load Packages

First, we need to load the R packages we’ll use. Think of packages as toolboxes with special functions.

# Load required packages
library(dplyr)      # For data manipulation
library(ggplot2)    # For creating plots
library(knitr)      # For nice tables

What each package does:

  • dplyr: Helps us filter, sort, and transform data easily
  • ggplot2: Creates beautiful visualizations
  • knitr: Makes tables look nice in our report
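
If one of these packages is not installed, library() stops with an error. The sketch below checks availability up front. The names needed, installed, and missing_pkgs are illustrative, and tidyr is included because Step 8C loads it later.

```r
# Packages this tutorial loads (tidyr is used later, in Step 8C)
needed <- c("dplyr", "ggplot2", "knitr", "tidyr")

# requireNamespace() returns FALSE instead of erroring when a package is absent
installed <- vapply(needed, requireNamespace, logical(1), quietly = TRUE)
missing_pkgs <- needed[!installed]

if (length(missing_pkgs) > 0) {
  message("Install first: install.packages(c(",
          paste0('"', missing_pkgs, '"', collapse = ", "), "))")
} else {
  message("All packages are available.")
}
```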

Step 2: Load the Data

The LTDB data is stored online as RDS files (R’s native data format). We’ll download three files:

  1. Census 2000 data
  2. Census 2010 data
  3. Metadata (explains what each variable means)

# Download 2000 census data
URL1 <- "https://github.com/DS4PS/cpp-529-fall-2020/raw/main/LABS/data/rodeo/LTDB-2000.rds"
d2000 <- readRDS(gzcon(url(URL1)))

# Download 2010 census data
URL2 <- "https://github.com/DS4PS/cpp-529-fall-2020/raw/main/LABS/data/rodeo/LTDB-2010.rds"
d2010 <- readRDS(gzcon(url(URL2)))

# Download metadata
URLmd <- "https://github.com/DS4PS/cpp-529-fall-2020/raw/main/LABS/data/rodeo/LTDB-META-DATA.rds"
metadata <- readRDS(gzcon(url(URLmd)))

# Check what we got
cat("✓ 2000 Data:", nrow(d2000), "census tracts\n")
## ✓ 2000 Data: 72693 census tracts
cat("✓ 2010 Data:", nrow(d2010), "census tracts\n")
## ✓ 2010 Data: 74022 census tracts

What’s happening here?

  • readRDS() reads R data files
  • url() opens a connection to the online file, and gzcon() decompresses it as it is read
  • These datasets contain ALL U.S. census tracts
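
The code above downloads each file again every time it runs. One option is to download once and reuse a local copy. This is a minimal sketch: read_rds_cached() and the file names are hypothetical, not part of the tutorial's code, and the demonstration uses a small stand-in file so it runs offline.

```r
# Hypothetical helper: download an RDS file once, then reuse the local copy.
read_rds_cached <- function(src_url, dest) {
  if (!file.exists(dest)) {
    # mode = "wb" writes the file as binary so it is not corrupted on Windows
    download.file(src_url, destfile = dest, mode = "wb")
  }
  readRDS(dest)
}

# Offline demonstration: the local copy already exists, so nothing is downloaded.
demo_path <- file.path(tempdir(), "demo-tracts.rds")
saveRDS(data.frame(tractid = "fips-00-000-000000", pop = 100), demo_path)
demo <- read_rds_cached("https://example.org/unused.rds", demo_path)
nrow(demo)
```

With the real data, a call such as read_rds_cached(URL1, "LTDB-2000.rds") would download on the first run only.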

Step 3: Peek at the Data

Let’s see what variables are in our datasets.

# Show first 10 variable names
cat("First 10 columns in 2000 data:\n")
## First 10 columns in 2000 data:
head(names(d2000), 10)
##  [1] "year"    "tractid" "pop00.x" "nhwht00" "nhblk00" "ntv00"   "asian00"
##  [8] "hisp00"  "haw00"   "india00"
# Look at a few rows
cat("\nFirst 5 rows (showing first 6 columns):\n")
## 
## First 5 rows (showing first 6 columns):
head(d2000[, 1:6])
##   year            tractid  pop00.x  nhwht00   nhblk00    ntv00
## 1 2000 fips-01-001-020100 1920.975 1722.977  144.9981 28.99962
## 2 2000 fips-01-001-020200 1892.000  671.000 1177.0000 12.00000
## 3 2000 fips-01-001-020300 3339.000 2738.000  498.0000 16.00000
## 4 2000 fips-01-001-020400 4556.000 4273.000  118.0000 23.00000
## 5 2000 fips-01-001-020500 6053.912 5426.983  367.4790 36.10111
## 6 2000 fips-01-001-020600 3272.000 2615.275  553.0823 25.18413

Key Variables to Know:

  • tractid: Unique identifier for each census tract
  • pop00.x / pop12: Population in 2000 / 2010 (note the .x suffix on the 2000 column name)
  • nhwht00 / nhwht12: Non-Hispanic White population
  • mhmval00 / mhmval12: Median home value
  • hinc00 / hinc12: Median household income
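
With many columns, searching the names by pattern is quicker than scanning them by eye. This small sketch uses the ten column names printed above; the cols vector stands in for names(d2000).

```r
cols <- c("year", "tractid", "pop00.x", "nhwht00", "nhblk00",
          "ntv00", "asian00", "hisp00", "haw00", "india00")

# Columns whose name starts with "pop"
pop_cols <- grep("^pop", cols, value = TRUE)

# Columns ending in ".x" or ".y" (suffixes that merges often add to duplicate names)
suffixed <- grep("\\.[xy]$", cols, value = TRUE)

pop_cols
suffixed
```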

Step 4: Filter for One City

Let’s focus on Denver, Colorado (Denver County). We need to filter the data using the county FIPS code.

Denver County FIPS code: 08031

# The tractid format is "fips-08-031-..." so we need to search for that pattern
# Filter for Denver in 2000
denver_2000 <- d2000 %>%
  filter(grepl("fips-08-031", tractid))

# Filter for Denver in 2010
denver_2010 <- d2010 %>%
  filter(grepl("fips-08-031", tractid))

cat("Denver census tracts in 2000:", nrow(denver_2000), "\n")
## Denver census tracts in 2000: 144
cat("Denver census tracts in 2010:", nrow(denver_2010), "\n")
## Denver census tracts in 2010: 144
# Show a sample
cat("\nSample tract IDs:\n")
## 
## Sample tract IDs:
head(denver_2000$tractid, 3)
## [1] "fips-08-031-000102" "fips-08-031-000201" "fips-08-031-000202"

What’s grepl() doing?

grepl("fips-08-031", tractid) returns TRUE for every tract ID that contains “fips-08-031”, the prefix for Denver County, Colorado (state code 08, county code 031). filter() then keeps only those rows.
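
To see the matching in isolation, here is a small sketch on three tract IDs. The second and third IDs are made up for illustration. It also shows startsWith() as a stricter alternative that only matches at the beginning of the ID.

```r
ids <- c("fips-08-031-000102",   # Denver County, CO
         "fips-08-001-007801",   # made-up ID in a different Colorado county
         "fips-48-031-950100")   # made-up ID: county 031 in a different state

# Substring match, as used in the filter above
contains_match <- grepl("fips-08-031", ids)

# Stricter alternative: the ID must start with the prefix, including the trailing dash
prefix_match <- startsWith(ids, "fips-08-031-")

contains_match
prefix_match
```

Both approaches select only the first ID here. The prefix version avoids accidental matches if the pattern ever appears elsewhere in an ID.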


Step 5: Merge and Calculate Changes

Now let’s combine the 2000 and 2010 data and calculate how things changed.

# Merge the datasets
denver_change <- denver_2000 %>%
  inner_join(denver_2010, by = "tractid")

cat("Merged dataset has", nrow(denver_change), "rows\n")
## Merged dataset has 144 rows
# Calculate population change
# The 2000 data has pop00.x and 2010 data has pop12
denver_change <- denver_change %>%
  mutate(
    pop_change = pop12 - pop00.x,
    pop_pct_change = ((pop12 - pop00.x) / pop00.x) * 100
  )

# Show summary
cat("\nSummary of Population Changes:\n")
## 
## Summary of Population Changes:
cat("Mean change:", round(mean(denver_change$pop_change, na.rm = TRUE), 0), "people\n")
## Mean change: 345 people
cat("Mean % change:", round(mean(denver_change$pop_pct_change, na.rm = TRUE), 1), "%\n")
## Mean % change: Inf %
cat("Tracts that grew:", sum(denver_change$pop_change > 0, na.rm = TRUE), "\n")
## Tracts that grew: 74
cat("Tracts that declined:", sum(denver_change$pop_change < 0, na.rm = TRUE), "\n")
## Tracts that declined: 69
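
The Inf reported for the mean percent change comes from division by zero. A tract with a 2000 population of 0 and a positive 2010 population has an infinite percent change, and a single infinite value makes the mean infinite. The sketch below shows one way to guard against this. It uses made-up numbers rather than the Denver data, and treating such tracts as NA is an assumption, not the only reasonable choice.

```r
pop_2000 <- c(0, 1000, 2000)   # toy values; the first tract was empty in 2000
pop_2010 <- c(50, 1100, 1800)

naive_pct <- (pop_2010 - pop_2000) / pop_2000 * 100
naive_mean <- mean(naive_pct)  # Inf, because 50 / 0 is Inf

# Treat percent change as undefined (NA) when the 2000 base is zero
safe_pct <- ifelse(pop_2000 > 0, (pop_2010 - pop_2000) / pop_2000 * 100, NA_real_)
safe_mean <- mean(safe_pct, na.rm = TRUE)      # average of +10 and -10
safe_median <- median(safe_pct, na.rm = TRUE)  # less sensitive to extreme values
```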

Understanding the code:

  • inner_join() combines the two datasets by matching tractid
  • The 2000 data stores population in pop00.x
  • The 2010 data stores the 2010-2012 population estimate in pop12
  • mutate() creates new calculated columns
  • pop_change = simple difference (pop12 - pop00.x)
  • pop_pct_change = percentage change relative to 2000. A tract with a 2000 population of 0 produces Inf (division by zero), which is why the mean % change above prints Inf
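
inner_join() silently drops tracts that appear in only one of the two years. In the Denver data both years have 144 tracts and so does the merged result, so nothing was lost, but for other counties it is worth checking. This sketch uses toy data frames; a and b are illustrative stand-ins for the 2000 and 2010 data.

```r
library(dplyr)

a <- data.frame(tractid = c("t1", "t2", "t3"), pop00.x = c(100, 200, 300))
b <- data.frame(tractid = c("t2", "t3", "t4"), pop12   = c(210, 290, 400))

merged    <- inner_join(a, b, by = "tractid")  # keeps t2 and t3
only_in_a <- anti_join(a, b, by = "tractid")   # t1 has no 2010 match
only_in_b <- anti_join(b, a, by = "tractid")   # t4 has no 2000 match

nrow(merged)
only_in_a$tractid
only_in_b$tractid
```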

Step 6: Visualize the Changes

Let’s create a histogram to see how population changed across Denver neighborhoods.

# Create histogram
ggplot(denver_change, aes(x = pop_pct_change)) +
  geom_histogram(bins = 25, fill = "steelblue", color = "white") +
  geom_vline(xintercept = 0, color = "red", linetype = "dashed", size = 1.2) +
  labs(
    title = "Population Change in Denver Census Tracts (2000-2010)",
    x = "Population Change (%)",
    y = "Number of Census Tracts",
    caption = "Red line = no change"
  ) +
  theme_minimal(base_size = 13) +
  theme(plot.title = element_text(face = "bold", hjust = 0.5))

Reading the plot:

  • Bars to the RIGHT of the red line = neighborhoods that grew
  • Bars to the LEFT of the red line = neighborhoods that lost population
  • The height of each bar shows how many tracts had that amount of change
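
ggplot2 drops non-finite values from a histogram with a warning, and one extreme value can stretch the x-axis until most tracts fall into a single bar. The sketch below counts how many values fall outside a chosen display window before plotting. The values and the window of -100 to 200 are illustrative assumptions.

```r
pct <- c(-29.1, -5.0, 3.2, 12.4, 293.7, 163400, Inf, NA)

finite_pct <- pct[is.finite(pct)]   # drops Inf and NA
in_window  <- finite_pct[finite_pct > -100 & finite_pct < 200]

n_not_finite <- length(pct) - length(finite_pct)        # values that cannot be plotted
n_outside    <- length(finite_pct) - length(in_window)  # finite but outside the window

n_not_finite
n_outside
```

Reporting these counts alongside the plot makes it clear how many tracts the figure leaves out.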

Step 7: Compare 2000 vs 2010 Populations

A scatter plot shows the relationship between 2000 and 2010 populations.

# Create scatter plot comparing 2000 vs 2010 populations
ggplot(denver_change, aes(x = pop00.x, y = pop12)) +
  geom_point(alpha = 0.6, size = 3, color = "darkgreen") +
  geom_abline(intercept = 0, slope = 1, linetype = "dashed", color = "gray50") +
  labs(
    title = "Denver Census Tract Populations: 2000 vs 2010",
    x = "Population in 2000",
    y = "Population in 2010",
    caption = "Points above the diagonal line grew; points below declined"
  ) +
  theme_minimal(base_size = 13) +
  theme(plot.title = element_text(face = "bold", hjust = 0.5))

How to read this:

  • Each point represents one census tract
  • The diagonal line shows where points would be if there was NO change
  • Points above the line = gained population
  • Points below the line = lost population
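
The same reading can be done numerically: a point is above the diagonal exactly when its 2010 value exceeds its 2000 value. This is a sketch on toy values, not the Denver data.

```r
pop_2000 <- c(3000, 4500, 2000, 5000)   # toy values
pop_2010 <- c(3400, 4100, 2000, 5600)

status <- ifelse(pop_2010 > pop_2000, "above line (grew)",
          ifelse(pop_2010 < pop_2000, "below line (declined)",
                 "on line (no change)"))

table(status)
```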

Step 8: Find Top Gainers and Decliners

Let’s identify which tracts gained or lost the most residents. Both tables below rank tracts by absolute change (pop_change), not by percent change.

# Top 5 growing tracts
top_growth <- denver_change %>%
  arrange(desc(pop_change)) %>%
  head(5) %>%
  select(tractid, pop00.x, pop12, pop_change, pop_pct_change)

kable(top_growth, 
      caption = "Top 5 Fastest Growing Census Tracts in Denver",
      col.names = c("Tract ID", "Pop 2000", "Pop 2010", "Change", "% Change"),
      digits = 1)
Top 5 Fastest Growing Census Tracts in Denver

Tract ID             Pop 2000   Pop 2010   Change   % Change
fips-08-031-008389          5       8175     8170   163400.0
fips-08-031-004106       2575      10137     7562      293.7
fips-08-031-004405       2025       8065     6040      298.3
fips-08-031-008388        878       6399     5521      628.8
fips-08-031-008391       3185       8247     5062      158.9

# Top 5 declining tracts  
top_decline <- denver_change %>%
  arrange(pop_change) %>%
  head(5) %>%
  select(tractid, pop00.x, pop12, pop_change, pop_pct_change)

kable(top_decline,
      caption = "Top 5 Fastest Declining Census Tracts in Denver", 
      col.names = c("Tract ID", "Pop 2000", "Pop 2010", "Change", "% Change"),
      digits = 1)
Top 5 Fastest Declining Census Tracts in Denver

Tract ID             Pop 2000   Pop 2010   Change   % Change
fips-08-031-000402       7012       4970    -2042      -29.1
fips-08-031-006814       5607       4270    -1337      -23.8
fips-08-031-004403       5053       3937    -1116      -22.1
fips-08-031-003601       5662       4592    -1070      -18.9
fips-08-031-000600       3330       2332     -998      -30.0
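
Ranking by percent change instead of absolute change is sensitive to tiny starting populations: the tract that went from 5 to 8,175 residents shows a 163,400% increase. One common workaround is to require a minimum 2000 population before ranking. The sketch below uses the rounded values from the growth table. The cutoff of 500 is an assumption chosen only for illustration.

```r
top <- data.frame(
  tractid  = c("fips-08-031-008389", "fips-08-031-004106", "fips-08-031-004405",
               "fips-08-031-008388", "fips-08-031-008391"),
  pop_2000 = c(5, 2575, 2025, 878, 3185),
  pop_2010 = c(8175, 10137, 8065, 6399, 8247)
)
top$pct_change <- (top$pop_2010 - top$pop_2000) / top$pop_2000 * 100

min_base <- 500   # assumed cutoff, for illustration only
ranked <- top[top$pop_2000 >= min_base, ]
ranked <- ranked[order(-ranked$pct_change), ]   # largest percent change first

ranked[, c("tractid", "pct_change")]
```

With the cutoff applied, tract 008388 ranks first by percent change, and the tract with only 5 residents in 2000 is excluded.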

Step 8B: Bubble Chart - Population Matters!

Let’s create a bubble chart where the bubble size represents the 2000 population.

# Filter out extreme outliers for better visualization
denver_viz <- denver_change %>%
  filter(pop_pct_change > -100 & pop_pct_change < 200)

ggplot(denver_viz, aes(x = pop00.x, y = pop_pct_change, size = pop00.x, color = pop_pct_change)) +
  geom_point(alpha = 0.6) +
  scale_size_continuous(range = c(2, 15), guide = "none") +
  scale_color_gradient2(
    low = "#d7191c",      # Red for decline
    mid = "#ffffbf",      # Yellow for no change
    high = "#2c7bb6",     # Blue for growth
    midpoint = 0,
    name = "% Change"
  ) +
  geom_hline(yintercept = 0, linetype = "dashed", color = "gray30", size = 0.8) +
  labs(
    title = "Population Change by Tract Size",
    subtitle = "Bubble size represents 2000 population",
    x = "Population in 2000",
    y = "Percent Change (2000-2010)",
    caption = "Larger bubbles = larger tracts in 2000"
  ) +
  theme_minimal(base_size = 12) +
  theme(
    plot.title = element_text(face = "bold", hjust = 0.5, size = 16),
    plot.subtitle = element_text(hjust = 0.5, color = "gray40"),
    legend.position = "right"
  )

What does this show?

  • Bubble size = how big the tract was in 2000
  • Color = whether it grew (blue) or declined (red)
  • Y-axis position = how much it changed
  • Larger tracts often show more moderate percent changes, partly because a small 2000 population inflates the percentage

Step 8C: Distribution Comparison

Let’s compare the distribution of populations in 2000 vs 2010 side-by-side.

# Prepare data for density plot
library(tidyr)

density_data <- denver_change %>%
  select(tractid, pop00.x, pop12) %>%
  pivot_longer(cols = c(pop00.x, pop12), 
               names_to = "Year", 
               values_to = "Population") %>%
  mutate(Year = ifelse(Year == "pop00.x", "2000", "2010"))

ggplot(density_data, aes(x = Population, fill = Year)) +
  geom_density(alpha = 0.5, size = 1) +
  scale_fill_manual(values = c("2000" = "#e66101", "2010" = "#5e3c99")) +
  labs(
    title = "Distribution of Tract Populations: 2000 vs 2010",
    x = "Population per Census Tract",
    y = "Density",
    fill = "Year"
  ) +
  theme_minimal(base_size = 13) +
  theme(
    plot.title = element_text(face = "bold", hjust = 0.5),
    legend.position = "top"
  )

Interpretation:

  • Shows how population is distributed across tracts
  • If the curves overlap, distributions are similar
  • Shifts show whether tracts got larger or smaller overall
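
A shift between the two curves can also be checked with a few quantiles. This is a sketch on toy values, not the Denver data. With the real data, pop_2000 and pop_2010 would be the pop00.x and pop12 columns.

```r
pop_2000 <- c(1500, 2800, 3600, 4200, 6100)   # toy values
pop_2010 <- c(1700, 3000, 4100, 4900, 8200)

probs <- c(0.25, 0.50, 0.75)
quantile_table <- rbind(`2000` = quantile(pop_2000, probs),
                        `2010` = quantile(pop_2010, probs))
quantile_table
```

If every quantile is higher in 2010, the whole distribution has shifted toward larger tracts.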

Step 8D: Top 10 Changes Bar Chart

A visual comparison of the tracts with the largest gains and the largest losses.

# Get top 5 gainers and top 5 losers
top_combined <- denver_change %>%
  arrange(desc(pop_change)) %>%
  slice(c(1:5, (n()-4):n())) %>%
  mutate(
    tract_short = substr(tractid, 13, 18),  # Keep only the 6-digit tract code for display
    change_type = ifelse(pop_change > 0, "Growth", "Decline")
  )

ggplot(top_combined, aes(x = reorder(tract_short, pop_change), 
                         y = pop_change, 
                         fill = change_type)) +
  geom_col(width = 0.7) +
  coord_flip() +
  scale_fill_manual(values = c("Growth" = "#1b7837", "Decline" = "#c51b7d")) +
  labs(
    title = "Top 5 Growing & Declining Census Tracts",
    x = "Census Tract",
    y = "Population Change (2000-2010)",
    fill = NULL
  ) +
  theme_minimal(base_size = 12) +
  theme(
    plot.title = element_text(face = "bold", hjust = 0.5, size = 15),
    legend.position = "top",
    panel.grid.major.y = element_blank()
  ) +
  geom_hline(yintercept = 0, color = "black", size = 0.5)

Why this is useful:

  • Easy to compare magnitude of changes
  • Clearly shows which neighborhoods are hot spots
  • Horizontal bars make tract IDs easier to read
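
Shortening IDs by character position depends on counting the characters correctly. A pattern-based alternative removes the "fips-SS-CCC-" prefix whatever its position. This sketch assumes every ID follows the format shown in Step 4.

```r
ids <- c("fips-08-031-008389", "fips-08-031-004106")

# Remove the "fips-SS-CCC-" prefix, leaving only the tract code
tract_code <- sub("^fips-[0-9]{2}-[0-9]{3}-", "", ids)
tract_code
```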

Step 9: Create a Simple Summary

Let’s make a summary table of key statistics.

# Calculate summary statistics
summary_stats <- data.frame(
  Metric = c(
    "Total Population (2000)",
    "Total Population (2010)",
    "Total Change",
    "Number of Growing Tracts",
    "Number of Declining Tracts",
    "Average Tract Population (2000)",
    "Average Tract Population (2010)"
  ),
  Value = c(
    format(sum(denver_change$pop00.x, na.rm = TRUE), big.mark = ","),
    format(sum(denver_change$pop12, na.rm = TRUE), big.mark = ","),
    format(sum(denver_change$pop_change, na.rm = TRUE), big.mark = ","),
    sum(denver_change$pop_change > 0, na.rm = TRUE),
    sum(denver_change$pop_change < 0, na.rm = TRUE),
    format(round(mean(denver_change$pop00.x, na.rm = TRUE), 0), big.mark = ","),
    format(round(mean(denver_change$pop12, na.rm = TRUE), 0), big.mark = ",")
  )
)

kable(summary_stats, 
      caption = "Denver Demographic Summary (2000-2010)",
      col.names = c("Metric", "Value"))
Denver Demographic Summary (2000-2010)

Metric                             Value
Total Population (2000)            554,692.5
Total Population (2010)            604,356
Total Change                       49,663.52
Number of Growing Tracts           74
Number of Declining Tracts         69
Average Tract Population (2000)    3,852
Average Tract Population (2010)    4,197
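
Some 2000 values in this dataset are fractional (for example, pop00.x is 1920.975 in the first row shown in Step 3), which is why two of the totals print with decimals. Rounding before formatting gives whole-number totals. This sketch uses the first three pop00.x values from Step 3.

```r
pop_2000 <- c(1920.975, 1892, 3339)   # first three pop00.x values shown in Step 3

total <- sum(pop_2000)
total_label <- format(round(total), big.mark = ",")   # round first, then add separators
total_label
```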

Conclusion

What We Accomplished

In this tutorial, we:

  1. Loaded census data from online sources
  2. Filtered for a specific city (Denver)
  3. Calculated changes between 2000 and 2010
  4. Created visualizations to understand the patterns
  5. Identified top growing and declining neighborhoods

Key Takeaways

About the Data:

  • The LTDB provides consistent census tract data over time
  • Census tracts are small geographic areas (about 4,000 people)
  • We can track demographic changes at the neighborhood level

About R Programming:

  • filter() selects specific rows
  • mutate() creates new columns
  • inner_join() merges two datasets
  • ggplot2 creates visualizations layer by layer

Try It Yourself!

You can adapt this code for other U.S. cities by changing the FIPS code pattern in Step 4. Each pattern selects a whole county, which may cover more area than the city itself (Denver is a special case where city and county share the same boundary).

Popular cities (pattern for the county that contains each city):

  • Los Angeles: "fips-06-037"
  • Chicago: "fips-17-031"
  • Houston: "fips-48-201"
  • Phoenix: "fips-04-013"
  • Seattle: "fips-53-033"
  • Boston: "fips-25-025"
  • Atlanta: "fips-13-121"
  • San Francisco: "fips-06-075"

Format explanation:

  • First 2 digits = State code
  • Next 3 digits = County code
  • Pattern: "fips-SS-CCC" where SS is state, CCC is county
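
County FIPS codes are often published as a single 5-digit string such as "08031". This is a minimal sketch of turning that into the pattern used in Step 4. fips_pattern() is a hypothetical helper, not part of the tutorial's code.

```r
# Hypothetical helper: build the tractid prefix from a 5-digit county FIPS code.
# The code must be a character string so that leading zeros are preserved.
fips_pattern <- function(county_fips) {
  stopifnot(is.character(county_fips), nchar(county_fips) == 5)
  paste0("fips-", substr(county_fips, 1, 2), "-", substr(county_fips, 3, 5))
}

denver_pattern <- fips_pattern("08031")   # Denver County
la_pattern     <- fips_pattern("06037")   # Los Angeles County
denver_pattern
la_pattern
```

The result can be passed straight to grepl() in the Step 4 filter.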