Introduction

This tutorial demonstrates how to work with the Longitudinal Tract Database (LTDB) to analyze demographic changes in U.S. neighborhoods between 2000 and 2010.

What You’ll Learn

By the end of this tutorial, you’ll know how to:

Load census data from online sources
Explore and understand the data structure
Filter data for a specific city
Calculate changes between two time periods
Create simple visualizations

Why This Matters

Understanding how neighborhoods change over time helps us:

Track urban development patterns
Identify areas of growth or decline
Understand housing market dynamics
Support evidence-based policy decisions

Let’s get started!

Step 1: Load Packages

First, we need to load the R packages we’ll use. Think of packages as toolboxes with special functions.

# Load required packages
library(dplyr)      # For data manipulation
library(ggplot2)    # For creating plots
library(knitr)      # For nice tables

What each package does:

dplyr: Helps us filter, sort, and transform data easily
ggplot2: Creates beautiful visualizations
knitr: Makes tables look nice in our report
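
If one of these packages is not installed yet, library() stops with an error. The following is a minimal sketch of a check you could run first, using only base R. The variable names (needed, installed, missing_pkgs) are illustrative and not part of the original tutorial.

```r
# Packages this tutorial loads (tidyr is used later, in Step 8C)
needed <- c("dplyr", "ggplot2", "knitr", "tidyr")

# requireNamespace() returns TRUE if a package is installed, FALSE otherwise
installed <- vapply(needed, requireNamespace, logical(1), quietly = TRUE)
missing_pkgs <- needed[!installed]

if (length(missing_pkgs) > 0) {
  message("Install these first: ", paste(missing_pkgs, collapse = ", "))
  # install.packages(missing_pkgs)  # uncomment to install from CRAN
} else {
  message("All required packages are installed.")
}
```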

Step 2: Load the Data

The LTDB data is stored online as RDS files (R’s native data format). We’ll download three files:

Census 2000 data
Census 2010 data
Metadata (explains what each variable means)

# Download 2000 census data
URL1 <- "https://github.com/DS4PS/cpp-529-fall-2020/raw/main/LABS/data/rodeo/LTDB-2000.rds"
d2000 <- readRDS(gzcon(url(URL1)))

# Download 2010 census data
URL2 <- "https://github.com/DS4PS/cpp-529-fall-2020/raw/main/LABS/data/rodeo/LTDB-2010.rds"
d2010 <- readRDS(gzcon(url(URL2)))

# Download metadata
URLmd <- "https://github.com/DS4PS/cpp-529-fall-2020/raw/main/LABS/data/rodeo/LTDB-META-DATA.rds"
metadata <- readRDS(gzcon(url(URLmd)))

# Check what we got
cat("✓ 2000 Data:", nrow(d2000), "census tracts\n")

## ✓ 2000 Data: 72693 census tracts

cat("✓ 2010 Data:", nrow(d2010), "census tracts\n")

## ✓ 2010 Data: 74022 census tracts

What’s happening here?

readRDS() reads R data files
gzcon(url()) downloads and uncompresses the files
These datasets contain ALL U.S. census tracts
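
Reading straight from a URL downloads the files again on every run. One possible alternative, sketched below, is to save each file locally the first time and reuse it afterwards. The helper read_rds_cached() and the "data" folder are hypothetical names introduced here for illustration; they are not part of the original tutorial.

```r
# Hypothetical helper: download an RDS file once, then reuse the local copy
read_rds_cached <- function(remote_url, local_path) {
  if (!file.exists(local_path)) {
    # mode = "wb" keeps the binary file intact on Windows
    download.file(remote_url, destfile = local_path, mode = "wb", quiet = TRUE)
  }
  readRDS(local_path)
}

# Usage (assumes a "data" folder inside your project is acceptable):
# dir.create("data", showWarnings = FALSE)
# d2000 <- read_rds_cached(URL1, file.path("data", "LTDB-2000.rds"))
```

The second and later calls skip the download entirely, which also lets the analysis run offline.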

Step 3: Peek at the Data

Let’s see what variables are in our datasets.

# Show first 10 variable names
cat("First 10 columns in 2000 data:\n")

## First 10 columns in 2000 data:

head(names(d2000), 10)

##  [1] "year"    "tractid" "pop00.x" "nhwht00" "nhblk00" "ntv00"   "asian00"
##  [8] "hisp00"  "haw00"   "india00"

# Look at a few rows
cat("\nFirst 6 rows (showing first 6 columns):\n")

## 
## First 6 rows (showing first 6 columns):

head(d2000[, 1:6])

##   year            tractid  pop00.x  nhwht00   nhblk00    ntv00
## 1 2000 fips-01-001-020100 1920.975 1722.977  144.9981 28.99962
## 2 2000 fips-01-001-020200 1892.000  671.000 1177.0000 12.00000
## 3 2000 fips-01-001-020300 3339.000 2738.000  498.0000 16.00000
## 4 2000 fips-01-001-020400 4556.000 4273.000  118.0000 23.00000
## 5 2000 fips-01-001-020500 6053.912 5426.983  367.4790 36.10111
## 6 2000 fips-01-001-020600 3272.000 2615.275  553.0823 25.18413

Key Variables to Know:

tractid: Unique identifier for each census tract
pop00.x / pop12: Population in 2000 / 2010 (the 2000 column is named pop00.x in this file, as the output above shows)
nhwht00 / nhwht12: Non-Hispanic White population
mhmval00 / mhmval12: Median home value
hinc00 / hinc12: Median household income
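
With many columns, it is often easier to search the names than to scroll through them. The sketch below uses base R's grep() and %in%. The toy data frame stands in for d2000 so that the example runs without the download; its column names come from the output above, and its values are invented.

```r
# Toy stand-in for d2000 (values invented for illustration)
toy <- data.frame(
  tractid = c("fips-01-001-020100", "fips-01-001-020200"),
  pop00.x = c(1921, 1892),
  nhwht00 = c(1723, 671),
  hisp00  = c(12, 40)
)

# All column names that start with "pop"
pop_cols <- grep("^pop", names(toy), value = TRUE)
pop_cols

# Check whether the columns you plan to use are present
wanted <- c("tractid", "pop00.x", "mhmval00")
present <- wanted %in% names(toy)
present
```

With the real data, replace toy with d2000 or d2010.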

Step 4: Filter for One City

Let’s focus on Denver, Colorado (Denver County). We need to filter the data using the county FIPS code.

Denver County FIPS code: 08031

# The tractid format is "fips-08-031-..." so we need to search for that pattern
# Filter for Denver in 2000
denver_2000 <- d2000 %>%
  filter(grepl("fips-08-031", tractid))

# Filter for Denver in 2010
denver_2010 <- d2010 %>%
  filter(grepl("fips-08-031", tractid))

cat("Denver census tracts in 2000:", nrow(denver_2000), "\n")

## Denver census tracts in 2000: 144

cat("Denver census tracts in 2010:", nrow(denver_2010), "\n")

## Denver census tracts in 2010: 144

# Show a sample
cat("\nSample tract IDs:\n")

## 
## Sample tract IDs:

head(denver_2000$tractid, 3)

## [1] "fips-08-031-000102" "fips-08-031-000201" "fips-08-031-000202"

What’s grepl() doing?

grepl("fips-08-031", tractid) returns TRUE for every tract ID that contains “fips-08-031” (state code 08 for Colorado, county code 031 for Denver County), and filter() keeps only those rows.
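
Because grepl() treats its pattern as a regular expression and matches it anywhere in the string, a prefix test can state the intent more directly. Here is a minimal sketch with base R's startsWith(), assuming tractid is a character column. The third ID is made up to show a non-match.

```r
# Example tract IDs: the first two follow the Denver pattern shown above,
# the third is an invented ID from a different county
ids <- c("fips-08-031-000102", "fips-08-031-000201", "fips-08-059-009800")

# TRUE only when the ID begins with the Denver County prefix
is_denver <- startsWith(ids, "fips-08-031-")
ids[is_denver]
```

Inside the pipeline, the equivalent call would be filter(startsWith(tractid, "fips-08-031-")).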

Step 5: Merge and Calculate Changes

Now let’s combine the 2000 and 2010 data and calculate how things changed.

# Merge the datasets
denver_change <- denver_2000 %>%
  inner_join(denver_2010, by = "tractid")

cat("Merged dataset has", nrow(denver_change), "rows\n")

## Merged dataset has 144 rows

# Calculate population change
# The 2000 data has pop00.x and 2010 data has pop12
denver_change <- denver_change %>%
  mutate(
    pop_change = pop12 - pop00.x,
    pop_pct_change = ((pop12 - pop00.x) / pop00.x) * 100
  )

# Show summary
cat("\nSummary of Population Changes:\n")

## 
## Summary of Population Changes:

cat("Mean change:", round(mean(denver_change$pop_change, na.rm = TRUE), 0), "people\n")

## Mean change: 345 people

cat("Mean % change:", round(mean(denver_change$pop_pct_change, na.rm = TRUE), 1), "%\n")

## Mean % change: Inf %

cat("Tracts that grew:", sum(denver_change$pop_change > 0, na.rm = TRUE), "\n")

## Tracts that grew: 74

cat("Tracts that declined:", sum(denver_change$pop_change < 0, na.rm = TRUE), "\n")

## Tracts that declined: 69

Understanding the code:

inner_join() combines the two datasets by matching tractid
The 2000 data has pop00.x for population
The 2010 data has pop12 for population (in the LTDB, the "12" suffix marks estimates from the 2008-2012 American Community Survey)
mutate() creates new calculated columns
pop_change = simple difference (pop12 - pop00.x)
pop_pct_change = percentage change relative to 2000 ((pop12 - pop00.x) / pop00.x * 100)
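
The Inf reported for the mean percent change comes from division by zero: at least one tract has a 2000 population of 0 and a positive 2010 population, and a single infinite value makes the whole mean infinite. One way to handle this, sketched here on invented numbers, is to set the percent change to NA when the base population is zero so that na.rm = TRUE skips it.

```r
# Invented example: the third tract had no residents in 2000
toy <- data.frame(
  pop00.x = c(1000, 2000, 0),
  pop12   = c(1100, 1800, 50)
)

# Percent change, with NA where the 2000 population is zero
toy$pop_pct_change <- ifelse(
  toy$pop00.x > 0,
  (toy$pop12 - toy$pop00.x) / toy$pop00.x * 100,
  NA_real_
)

toy$pop_pct_change

# The mean now covers only tracts with a defined percent change
mean_pct <- mean(toy$pop_pct_change, na.rm = TRUE)
mean_pct
```

In the tutorial's mutate() call, the same ifelse() expression could take the place of the pop_pct_change line.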

Step 6: Visualize the Changes

Let’s create a histogram to see how population changed across Denver neighborhoods.

# Create histogram
ggplot(denver_change, aes(x = pop_pct_change)) +
  geom_histogram(bins = 25, fill = "steelblue", color = "white") +
  geom_vline(xintercept = 0, color = "red", linetype = "dashed", linewidth = 1.2) +  # linewidth replaces size for lines in ggplot2 3.4.0 and later
  labs(
    title = "Population Change in Denver Census Tracts (2000-2010)",
    x = "Population Change (%)",
    y = "Number of Census Tracts",
    caption = "Red line = no change"
  ) +
  theme_minimal(base_size = 13) +
  theme(plot.title = element_text(face = "bold", hjust = 0.5))

Reading the plot:

Bars to the RIGHT of the red line = neighborhoods that grew
Bars to the LEFT of the red line = neighborhoods that lost population
The height of each bar shows how many tracts had that amount of change
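
The counts behind a histogram can also be tabulated directly, which is handy when you need exact numbers. Here is a sketch with base R's cut() and table(). The percent changes are invented, and the bin edges are arbitrary choices for illustration.

```r
# Invented percent changes for eight tracts
pct <- c(-25, -8, -3, 2, 4, 12, 35, 90)

# Assign each value to a bin; intervals are closed on the right, e.g. (-10, 0]
bins <- cut(pct, breaks = c(-100, -10, 0, 10, 50, 200))

# Number of tracts in each bin
bin_counts <- table(bins)
bin_counts
```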

Step 7: Compare 2000 vs 2010 Populations

A scatter plot shows the relationship between 2000 and 2010 populations.

# Create scatter plot comparing 2000 vs 2010 populations
ggplot(denver_change, aes(x = pop00.x, y = pop12)) +
  geom_point(alpha = 0.6, size = 3, color = "darkgreen") +
  geom_abline(intercept = 0, slope = 1, linetype = "dashed", color = "gray50") +
  labs(
    title = "Denver Census Tract Populations: 2000 vs 2010",
    x = "Population in 2000",
    y = "Population in 2010",
    caption = "Points above the diagonal line grew; points below declined"
  ) +
  theme_minimal(base_size = 13) +
  theme(plot.title = element_text(face = "bold", hjust = 0.5))

How to read this:

Each point represents one census tract
The diagonal line shows where points would fall if there were NO change
Points above the line = gained population
Points below the line = lost population
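
The same reading can be expressed as a label for each tract. In Step 5 the growing and declining counts add up to 143 of the 144 tracts, so it can help to include an explicit category for tracts that sit exactly on the line. This is a sketch on invented numbers; the column name direction is illustrative.

```r
# Invented populations for four tracts
toy <- data.frame(
  pop00.x = c(3000, 4000, 2500, 5000),
  pop12   = c(3500, 3800, 2500, 6100)
)

# Above the diagonal = grew, below = declined, on it = no change
toy$direction <- ifelse(
  toy$pop12 > toy$pop00.x, "grew",
  ifelse(toy$pop12 < toy$pop00.x, "declined", "no change")
)

table(toy$direction)
```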

Step 8: Find the Biggest Gainers and Losers

Let’s identify which neighborhoods changed the most.

# Top 5 growing tracts
top_growth <- denver_change %>%
  arrange(desc(pop_change)) %>%
  head(5) %>%
  select(tractid, pop00.x, pop12, pop_change, pop_pct_change)

kable(top_growth,
      caption = "Top 5 Census Tracts in Denver by Population Gain",
      col.names = c("Tract ID", "Pop 2000", "Pop 2010", "Change", "% Change"),
      digits = 1)

Top 5 Census Tracts in Denver by Population Gain
Tract ID	Pop 2000	Pop 2010	Change	% Change
fips-08-031-008389	5	8175	8170	163400.0
fips-08-031-004106	2575	10137	7562	293.7
fips-08-031-004405	2025	8065	6040	298.3
fips-08-031-008388	878	6399	5521	628.8
fips-08-031-008391	3185	8247	5062	158.9

# Top 5 declining tracts  
top_decline <- denver_change %>%
  arrange(pop_change) %>%
  head(5) %>%
  select(tractid, pop00.x, pop12, pop_change, pop_pct_change)

kable(top_decline,
      caption = "Top 5 Census Tracts in Denver by Population Loss",
      col.names = c("Tract ID", "Pop 2000", "Pop 2010", "Change", "% Change"),
      digits = 1)

Top 5 Census Tracts in Denver by Population Loss
Tract ID	Pop 2000	Pop 2010	Change	% Change
fips-08-031-000402	7012	4970	-2042	-29.1
fips-08-031-006814	5607	4270	-1337	-23.8
fips-08-031-004403	5053	3937	-1116	-22.1
fips-08-031-003601	5662	4592	-1070	-18.9
fips-08-031-000600	3330	2332	-998	-30.0
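
Both tables rank tracts by absolute change (pop_change). Ranking by percent change instead is very sensitive to tiny starting populations: the first tract in the growth table goes from 5 to 8,175 residents, a 163,400% increase. The sketch below applies a minimum 2000 population before ranking by percent. The cutoff of 100 is an arbitrary assumption, and the rows reuse four tracts from the growth table.

```r
# Four tracts copied from the growth table above
growth <- data.frame(
  tractid = c("fips-08-031-008389", "fips-08-031-004106",
              "fips-08-031-004405", "fips-08-031-008388"),
  pop00.x = c(5, 2575, 2025, 878),
  pop12   = c(8175, 10137, 8065, 6399)
)
growth$pop_pct_change <- (growth$pop12 - growth$pop00.x) / growth$pop00.x * 100

# Hypothetical cutoff: ignore tracts with fewer than 100 residents in 2000
min_base <- 100
eligible <- growth[growth$pop00.x >= min_base, ]

# Sort from largest to smallest percent change
ranked <- eligible[order(-eligible$pop_pct_change), ]
ranked[, c("tractid", "pop00.x", "pop12")]
```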

Step 8B: Bubble Chart - Population Matters!

Let’s create a bubble chart where the bubble size represents the 2000 population.

# Filter out extreme outliers (this also drops infinite and missing values) for better visualization
denver_viz <- denver_change %>%
  filter(pop_pct_change > -100 & pop_pct_change < 200)

ggplot(denver_viz, aes(x = pop00.x, y = pop_pct_change, size = pop00.x, color = pop_pct_change)) +
  geom_point(alpha = 0.6) +
  scale_size_continuous(range = c(2, 15), guide = "none") +
  scale_color_gradient2(
    low = "#d7191c",      # Red for decline
    mid = "#ffffbf",      # Yellow for no change
    high = "#2c7bb6",     # Blue for growth
    midpoint = 0,
    name = "% Change"
  ) +
  geom_hline(yintercept = 0, linetype = "dashed", color = "gray30", linewidth = 0.8) +
  labs(
    title = "Population Change by Tract Size",
    subtitle = "Bubble size represents 2000 population",
    x = "Population in 2000",
    y = "Percent Change (2000-2010)",
    caption = "Larger bubbles = larger tracts in 2000"
  ) +
  theme_minimal(base_size = 12) +
  theme(
    plot.title = element_text(face = "bold", hjust = 0.5, size = 16),
    plot.subtitle = element_text(hjust = 0.5, color = "gray40"),
    legend.position = "right"
  )

What does this show?

Bubble size = how big the tract was in 2000
Color = whether it grew (blue) or declined (red)
Y-axis position = how much it changed
Larger tracts often had more moderate changes

Step 8C: Distribution Comparison

Let’s compare the distribution of populations in 2000 vs 2010 side-by-side.

# Prepare data for density plot
library(tidyr)

density_data <- denver_change %>%
  select(tractid, pop00.x, pop12) %>%
  pivot_longer(cols = c(pop00.x, pop12), 
               names_to = "Year", 
               values_to = "Population") %>%
  mutate(Year = ifelse(Year == "pop00.x", "2000", "2010"))

ggplot(density_data, aes(x = Population, fill = Year)) +
  geom_density(alpha = 0.5, linewidth = 1) +
  scale_fill_manual(values = c("2000" = "#e66101", "2010" = "#5e3c99")) +
  labs(
    title = "Distribution of Tract Populations: 2000 vs 2010",
    x = "Population per Census Tract",
    y = "Density",
    fill = "Year"
  ) +
  theme_minimal(base_size = 13) +
  theme(
    plot.title = element_text(face = "bold", hjust = 0.5),
    legend.position = "top"
  )

Interpretation:

Shows how population is distributed across tracts
If the curves overlap, distributions are similar
Shifts show whether tracts got larger or smaller overall
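
A numeric companion to the density plot is to compare quartiles for the two years. Here is a sketch with base R's quantile(). The populations are invented, and the names pop_2000, pop_2010 and q_table are illustrative.

```r
# Invented tract populations for the two years
pop_2000 <- c(1200, 2500, 3100, 3900, 4800, 6000)
pop_2010 <- c(1500, 2600, 3300, 4400, 5200, 7100)

# 25th, 50th (median) and 75th percentiles, one row per year
probs <- c(0.25, 0.5, 0.75)
q_table <- rbind(
  "2000" = quantile(pop_2000, probs),
  "2010" = quantile(pop_2010, probs)
)
q_table
```

If every quartile is higher in the 2010 row, the distribution has shifted toward larger tracts.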

Step 8D: Top 10 Changes Bar Chart

A visual comparison of the tracts with the largest gains and the largest losses.

# Get top 5 gainers and top 5 losers
top_combined <- denver_change %>%
  arrange(desc(pop_change)) %>%
  slice(c(1:5, (n()-4):n())) %>%
  mutate(
    tract_short = substr(tractid, 13, 18),  # Keep only the 6-digit tract code for display
    change_type = ifelse(pop_change > 0, "Growth", "Decline")
  )

ggplot(top_combined, aes(x = reorder(tract_short, pop_change), 
                         y = pop_change, 
                         fill = change_type)) +
  geom_col(width = 0.7) +
  coord_flip() +
  scale_fill_manual(values = c("Growth" = "#1b7837", "Decline" = "#c51b7d")) +
  labs(
    title = "Top 5 Growing & Declining Census Tracts",
    x = "Census Tract",
    y = "Population Change (2000-2010)",
    fill = NULL
  ) +
  theme_minimal(base_size = 12) +
  theme(
    plot.title = element_text(face = "bold", hjust = 0.5, size = 15),
    legend.position = "top",
    panel.grid.major.y = element_blank()
  ) +
  geom_hline(yintercept = 0, color = "black", linewidth = 0.5)

Why this is useful:

Easy to compare magnitude of changes
Clearly shows which neighborhoods are hot spots
Horizontal bars make tract IDs easier to read

Step 9: Create a Simple Summary

Let’s make a summary table of key statistics.

# Calculate summary statistics
summary_stats <- data.frame(
  Metric = c(
    "Total Population (2000)",
    "Total Population (2010)",
    "Total Change",
    "Number of Growing Tracts",
    "Number of Declining Tracts",
    "Average Tract Population (2000)",
    "Average Tract Population (2010)"
  ),
  Value = c(
    format(sum(denver_change$pop00.x, na.rm = TRUE), big.mark = ","),
    format(sum(denver_change$pop12, na.rm = TRUE), big.mark = ","),
    format(sum(denver_change$pop_change, na.rm = TRUE), big.mark = ","),
    sum(denver_change$pop_change > 0, na.rm = TRUE),
    sum(denver_change$pop_change < 0, na.rm = TRUE),
    format(round(mean(denver_change$pop00.x, na.rm = TRUE), 0), big.mark = ","),
    format(round(mean(denver_change$pop12, na.rm = TRUE), 0), big.mark = ",")
  )
)

kable(summary_stats, 
      caption = "Denver Demographic Summary (2000-2010)",
      col.names = c("Metric", "Value"))

Denver Demographic Summary (2000-2010)
Metric	Value
Total Population (2000)	554,692.5
Total Population (2010)	604,356
Total Change	49,663.52
Number of Growing Tracts	74
Number of Declining Tracts	69
Average Tract Population (2000)	3,852
Average Tract Population (2010)	4,197
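
The totals in this table also give a citywide percent change, which stays finite even though the mean of the tract-level percentages in Step 5 was Inf. Here is a worked calculation using the figures printed in the table.

```r
# Figures copied from the summary table above
total_2000   <- 554692.5
total_change <- 49663.52

# Citywide percent change = total change / total 2000 population * 100
city_pct_change <- round(total_change / total_2000 * 100, 2)
city_pct_change  # 8.95
```

The citywide figure weights every resident equally, while the mean of tract percentages weights every tract equally, so the two can differ widely.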

Conclusion

What We Accomplished

In this tutorial, we:

✅ Loaded census data from online sources
✅ Filtered for a specific city (Denver)
✅ Calculated changes between 2000 and 2010
✅ Created visualizations to understand the patterns
✅ Identified top growing and declining neighborhoods

Key Takeaways

About the Data:

The LTDB provides consistent census tract data over time
Census tracts are small geographic areas (about 4,000 people)
We can track demographic changes at the neighborhood level

About R Programming:

filter() selects specific rows
mutate() creates new columns
inner_join() merges two datasets
ggplot2 creates visualizations layer by layer

Try It Yourself!

You can adapt this code for any U.S. city by changing the FIPS code pattern in Step 4:

Popular cities (each pattern selects a whole county, which can be larger than the city itself):

Los Angeles: "fips-06-037"
Chicago: "fips-17-031"
Houston: "fips-48-201"
Phoenix: "fips-04-013"
Seattle: "fips-53-033"
Boston: "fips-25-025"
Atlanta: "fips-13-121"
San Francisco: "fips-06-075"

Format explanation:

First 2 digits = State code
Next 3 digits = County code
Pattern: "fips-SS-CCC" where SS is state, CCC is county
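
The pattern can also be built from numeric codes so that the zero padding is never mistyped. Here is a small sketch using base R's sprintf(). The helper name make_fips_pattern() is hypothetical and not part of the original tutorial.

```r
# Hypothetical helper: build the "fips-SS-CCC" pattern from numeric codes
make_fips_pattern <- function(state, county) {
  # %02d pads the state code to 2 digits, %03d pads the county code to 3
  sprintf("fips-%02d-%03d", as.integer(state), as.integer(county))
}

make_fips_pattern(8, 31)   # "fips-08-031" (Denver County, Colorado)
make_fips_pattern(17, 31)  # "fips-17-031" (the Chicago pattern listed above)
```

The result can be passed to grepl() in Step 4 in place of the hard-coded string.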

Working with Census Tract Data: A Simple Guide to Analyzing Neighborhood Change

Sinchana Praveen

2025-12-02