The Impact of Community Design on Public Health: A Focus on Obesity and COPD
Introduction
Chronic conditions such as obesity and chronic obstructive pulmonary disease (COPD) are major public health concerns in the United States. The obesity rate currently sits at 42%, with associated healthcare costs exceeding $173 billion annually [5, 6]. COPD, which is often linked to smoking and is a risk factor for heart disease, further burdens the healthcare system. Research suggests a connection between community design and the prevalence of these health problems. Adults living in walkable neighborhoods with good street connectivity and green space tend to engage in more physical activity, have lower BMIs, and may experience better overall heart health [1]. However, historical policies such as redlining have disproportionately affected communities of color, often leaving neighborhoods with limited walkability, which hinders physical activity and may contribute to higher health risks [1].

This project examines the relationship between community design and public health outcomes, focusing on obesity and COPD rates. By examining how walkability and design features relate to physical activity and overall health, it aims to highlight the potential of community design as a tool for promoting public health and reducing healthcare burdens.
Facts
Adults in the US who live in highly walkable neighborhoods are more likely to get an adequate amount of physical activity, walk more often, and have a lower BMI (Morris, 2023).
This matters because in many areas of the US, past racial segregation and policies such as redlining have reduced walkability, street connectivity, and green space in neighborhoods where many people of color live (Morris, 2023).
COPD (chronic obstructive pulmonary disease) is a lung disease that causes obstructed airflow and coughing. People with COPD often also have heart disease, making the prevalence of COPD a possible indicator of a population's overall heart health (Mayo Clinic, 2020).
COPD and heart disease often occur together in the same patient. According to some research, individuals with COPD are twice as likely to develop cardiovascular problems, and smoking is often a contributing factor (Harvard Health, 2022).
Load Libraries/Set Directory
library(tidyverse)
── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr 1.1.2 ✔ readr 2.1.4
✔ forcats 1.0.0 ✔ stringr 1.5.0
✔ ggplot2 3.4.4 ✔ tibble 3.2.1
✔ lubridate 1.9.2 ✔ tidyr 1.3.0
✔ purrr 1.0.2
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag() masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(sf)
Linking to GEOS 3.11.0, GDAL 3.5.3, PROJ 9.1.0; sf_use_s2() is TRUE
New names:
• `` -> `...1`
Rows: 1406 Columns: 16
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (9): State, Census Tract, copdrates, 95% Confidence Interval, Confidence...
dbl (6): ...1, StateFIPS, CensusTract, Year, Number, parkdistancepopulation
lgl (1): ...11
ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
walkability <- read_csv("marylandwalk.csv")
New names:
• `` -> `...1`
Rows: 3926 Columns: 118
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr   (2): CSA_Name, CBSA_Name
dbl (116): ...1, OBJECTID, GEOID10, GEOID20, STATEFP, COUNTYFP, TRACTCE, BLK...
ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
Clean Datasets
Use gsub() to remove percent signs from values
Visualization 1 (Walkability Data)
# This scatter plot shows the relationship between the working-age population
# within a census tract (P_WrkAge) and the Walkability Index of that tract
ggplot(walkability, aes(x = P_WrkAge, y = NatWalkInd)) +
  geom_point(alpha = 0.5) +
  labs(x = "Workage", y = "Walkability Score") +
  theme_minimal()
Visual 2: Walkability Index Scores versus Working-Age Population
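# Pct_AO2p versus the Walkability Index
# (the x-axis label follows the one used for Pct_AO2p later in this report)
ggplot(walkability, aes(x = Pct_AO2p, y = NatWalkInd)) +
  geom_point(alpha = 0.5) +
  labs(x = "% of 2+ Car Owning Homes", y = "Walkability Score") +
  theme_minimal()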
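Visual 3: Proportion of Population Earning Less Than $1,250 Monthly versus Walkability Index
# The x-axis label now names the plotted variable instead of the copied "Workage" label
ggplot(walkability, aes(x = E_LowWageWk, y = NatWalkInd)) +
  geom_point(alpha = 0.5) +
  labs(x = "Low-Wage Workers (E_LowWageWk)", y = "Walkability Score") +
  theme_minimal()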
Preliminary Linear Model
# y ~ x represents dependent variable ~ independent variable
model2 <- lm(NatWalkInd ~ E_LowWageWk, data = walkability)
summary(model2)

Call:
lm(formula = NatWalkInd ~ E_LowWageWk, data = walkability)

Residuals:
    Min      1Q  Median      3Q     Max
-11.778  -3.867   0.559   3.381   8.895

Coefficients:
             Estimate Std. Error t value Pr(>|t|)
(Intercept) 1.010e+01  7.240e-02  139.54   <2e-16 ***
E_LowWageWk 2.737e-03  2.171e-04   12.61   <2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 4.123 on 3924 degrees of freedom
Multiple R-squared:  0.03894, Adjusted R-squared:  0.03869
F-statistic:   159 on 1 and 3924 DF,  p-value: < 2.2e-16
Histogram
ggplot(walkability, aes(x = R_MedWageWk)) +
  geom_histogram(binwidth = 2, fill = "red", color = "black", alpha = 0.7) +
  labs(title = "Income Distributions (less than 3300 but more than 1250 USD)",
       y = "Frequency") +
  theme_minimal()
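# moco_walkability is not created in the code shown in this report;
# it is assumed to be a subset of walkability prepared elsewhere
ggplot(moco_walkability, aes(x = Pct_AO2p, y = NatWalkInd)) +
  geom_point(alpha = 0.5) +
  labs(x = "% of 2+ Car Owning Homes", y = "Walkability Score") +
  # stat_ellipse()
  # geom_smooth()
  theme_minimal()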
Remove % signs from the oldhousing column (the percentage of housing in a census tract built prior to 1980) and from the copdrates column
# Remove the "%" symbol using gsub
community$oldhousing <- gsub("%", "", community$oldhousing)
community$copdrates <- gsub("%", "", community$copdrates)

# Convert the columns to numeric (depending on later results)
community$oldhousing <- as.numeric(community$oldhousing)
community$copdrates <- as.numeric(community$copdrates)
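Stripping "%" and calling as.numeric() can quietly turn unparseable entries into NA, so it may be worth checking which values fail to convert. The sketch below uses a made-up character vector (`raw`) in place of the real columns; the sample values and the names `stripped`, `converted`, and `failed` are illustrative assumptions, not part of the project data.

```r
# Hypothetical raw values standing in for a column such as community$oldhousing
raw <- c("45.2%", "10%", NA, "n/a", " 7.5 %")

# fixed = TRUE treats "%" as a literal character rather than a regular expression
stripped <- trimws(gsub("%", "", raw, fixed = TRUE))

# as.numeric() warns and returns NA for anything that is not a number
converted <- suppressWarnings(as.numeric(stripped))

# Values that were present in the raw data but failed to convert
failed <- !is.na(raw) & is.na(converted)

print(converted)
print(raw[failed])
```

Entries flagged in `failed` were present in the source file but are not numeric, which is different from values that were already missing.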
# Refer to columns by name inside aes(); ggplot looks them up in the data argument
ggplot(community, aes(x = parkdistancepopulation, y = Number)) +
  geom_bar(stat = "identity") +
  labs(title = "Number of People Living Near Park per County",
       x = "Counties",
       y = "Frequency") +
  theme_classic()
# y ~ x represents dependent variable ~ independent variable
model3 <- lm(copdrates ~ oldhousing, data = community)
summary(model3)

Call:
lm(formula = copdrates ~ oldhousing, data = community)

Residuals:
    Min      1Q  Median      3Q     Max
-4.9315 -1.2323 -0.3673  1.0400 10.1918

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept) 3.993742   0.126692   31.52   <2e-16 ***
oldhousing  0.024456   0.001935   12.64   <2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 1.844 on 1266 degrees of freedom
  (138 observations deleted due to missingness)
Multiple R-squared:  0.1121, Adjusted R-squared:  0.1114
F-statistic: 159.8 on 1 and 1266 DF,  p-value: < 2.2e-16
# Model Analysis: The coefficient of 0.024 indicates that each one-unit increase
# in oldhousing is associated with a 0.024 increase in copdrates. The p-values
# are far below 0.05, suggesting a statistically significant relationship between
# housing age and COPD rates in MD census tracts. However, the R-squared value is
# low: the model explains only about 11.2% of the variance in COPD rates, which
# suggests that other variables may have more of an impact. I will therefore
# likely use a multiple regression model that includes more variables to improve
# the fit.
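To make the size of the slope more concrete, the fitted line can be evaluated at two housing-age levels. This is a rough sketch that copies the intercept and slope from the summary(model3) output above; the two tract values (20% and 80% old housing) are hypothetical and chosen only for illustration.

```r
# Coefficients copied from the summary(model3) output
intercept <- 3.993742
slope <- 0.024456

# Hypothetical tracts, chosen only for illustration
old_housing_pct <- c(20, 80)

# Fitted COPD rate at each level: intercept + slope * oldhousing
predicted_copd <- intercept + slope * old_housing_pct

# Difference between the two fitted values
predicted_gap <- diff(predicted_copd)

print(round(predicted_copd, 2))  # 4.48 5.95
print(round(predicted_gap, 2))   # 1.47
```

Under this model, a 60-point difference in old housing corresponds to a fitted difference of about 1.47 in the COPD rate, which is small next to the residual standard error of 1.844.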
# geom_smooth(method = "lm") fits its own y ~ x line, which matches model3,
# so the unsupported model argument is dropped; labs() is joined to the plot with +
ggplot(community, aes(x = oldhousing, y = copdrates)) +
  geom_point(color = "red") +
  geom_smooth(method = "lm", aes(linetype = "Linear Model"), color = "blue") +
  labs(title = "Sample Scatter Plot",
       x = "% of Old Housing In Census Tract",
       y = "Crude Percent of COPD In Census Tract")
Data Analysis & Methodology
Guiding Question 1: Is there a measurable correlation between a community’s Walk Score (a standardized measure of walkability) and average resident BMI and COPD Rates?
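Guiding Question 1 does not name a statistical method. One possible approach, shown here only as a sketch, is Pearson's correlation test with cor.test() from base R. The data below are simulated, and the names `walk_index` and `copd_rate` are placeholders rather than columns from the project datasets.

```r
set.seed(1)
n <- 150

# Simulated tract-level values; the real analysis would use joined project data
walk_index <- runif(n, min = 1, max = 20)
copd_rate <- 7 - 0.1 * walk_index + rnorm(n, sd = 1.5)

# Pearson correlation with a 95% confidence interval and p-value
result <- cor.test(walk_index, copd_rate, method = "pearson")

print(result$estimate)
print(result$conf.int)
print(result$p.value)
```

The same call could be repeated with a BMI or obesity measure in place of `copd_rate`, assuming such a column is available.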
Guiding Question 2: Can we identify specific design variables (e.g., age of housing, distance from park, etc.) within the community design dataset that are statistically associated with higher walkability scores (from the walkability dataset)?
Statistical Method (2): Multiple Linear Regression: analyzes the relationship between multiple independent variables (community design features such as age of housing and park distance) and a single dependent variable (Walk Score).
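A minimal sketch of what such a multiple regression could look like in R is given below. It runs on simulated data, and the data frame `tracts` and its columns are placeholders. It also assumes the community design and walkability datasets have already been joined on a shared geographic identifier, a step not shown here.

```r
set.seed(42)
n <- 200

# Simulated tract-level data; column names are placeholders
tracts <- data.frame(
  oldhousing = runif(n, min = 0, max = 100),
  parkdistance = runif(n, min = 0, max = 100)
)
tracts$walk_index <- 8 + 0.03 * tracts$oldhousing +
  0.02 * tracts$parkdistance + rnorm(n, sd = 2)

# Several predictors are added to the right-hand side with +
fit <- lm(walk_index ~ oldhousing + parkdistance, data = tracts)

# Each slope is the estimated change in walk_index per one-unit change
# in that predictor, holding the other predictor constant
print(summary(fit)$coefficients)
print(summary(fit)$adj.r.squared)
```

Adjusted R-squared is usually the more useful figure here because it penalizes predictors that add little explanatory power.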
Guiding Question 3: Is there a statistically significant correlation between a community’s median household income and its walkability score (from the walkability dataset)?
Statistical Method (3): Bootstrapping (to gauge how confident I can be in the findings)
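As a rough illustration of the bootstrapping idea, the sketch below resamples rows with replacement and recomputes the correlation each time, then takes a percentile confidence interval. The data are simulated; `income` and `walk_index` are placeholder names, and the 2000 resamples and the percentile method are assumptions rather than choices made in this project.

```r
set.seed(123)
n <- 300

# Simulated data; real values would come from the project datasets
income <- rnorm(n, mean = 70000, sd = 20000)
walk_index <- 10 + 0.00003 * income + rnorm(n, sd = 3)

# Correlation in the original sample
observed <- cor(income, walk_index)

# Resample rows with replacement and recompute the correlation each time
n_boot <- 2000
boot_cor <- replicate(n_boot, {
  idx <- sample.int(n, size = n, replace = TRUE)
  cor(income[idx], walk_index[idx])
})

# Percentile bootstrap 95% confidence interval
ci <- quantile(boot_cor, probs = c(0.025, 0.975))

print(observed)
print(ci)
```

If the interval excludes zero, the resampling offers some support for a nonzero correlation without relying on the normality assumptions behind the usual t-test.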
Guiding Question 4: Are low-income areas more likely to have residents experiencing these public health problems (high COPD rates and obesity rates)?
Statistical Method (4): Chi-Square Test or Logistic Regression
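Both options can be sketched on the same simulated data. The binary indicators `low_income` and `high_copd` are placeholders, and the cutoffs that would define them in the real data are assumptions that the analysis would still need to choose.

```r
set.seed(7)
n <- 500

# Simulated tract-level indicators (1 = yes, 0 = no)
low_income <- rbinom(n, size = 1, prob = 0.4)
high_copd <- rbinom(n, size = 1, prob = ifelse(low_income == 1, 0.5, 0.3))

# Chi-square test of independence on the 2x2 contingency table
tab <- table(low_income = low_income, high_copd = high_copd)
chi <- chisq.test(tab)
print(tab)
print(chi$p.value)

# Logistic regression: the exponentiated coefficient is an odds ratio
logit_fit <- glm(high_copd ~ low_income, family = binomial)
odds_ratio <- exp(coef(logit_fit)[["low_income"]])
print(odds_ratio)
```

The chi-square test only indicates whether the two indicators are associated, while the logistic model also estimates the size of the association and can take additional predictors.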
Guiding Question 5: What policy considerations can be developed based on this data?