MATH 217 Final: Assessing The Relationship Between Community Design and Health Outcomes

Introduction

  1. Background

Chronic health issues like obesity and Chronic Obstructive Pulmonary Disease (COPD) are major concerns in the United States. The adult obesity rate currently sits at a staggering 42%, with associated healthcare costs exceeding $173 billion annually (CDC, 2022). COPD, often linked to smoking and a risk factor for heart disease, further burdens the healthcare system. Interestingly, research suggests a connection between community design and the prevalence of these health problems. Studies have shown that adults residing in walkable neighborhoods with good street connectivity and green spaces tend to engage in more physical activity, have lower BMIs, and potentially experience better overall heart health (Morris, 2023). However, historical policies like redlining have disproportionately impacted communities of color, often creating neighborhoods with limited walkability, hindering physical activity, and potentially contributing to higher health risks (Morris, 2023).

This project focuses on one specific public health outcome: COPD rates in a given census tract. COPD, or chronic obstructive pulmonary disease, is a type of lung disease that causes obstructed airflow and coughing. These symptoms can be significantly worsened by obesity (which can occur when one is not engaging in an active lifestyle). Excess weight puts a strain on the diaphragm, the muscle responsible for breathing, making it harder to take in air (CDC, 2023). Furthermore, fat deposits around the chest and abdomen can compress the lungs, further limiting their capacity. Additionally, people with COPD often struggle with heart disease, making the prevalence of COPD a possible indicator of a population’s overall heart health (Mayo Clinic, 2020). This is because COPD and heart disease often occur together in an individual (Harvard Health, 2022).

Even though COPD is a debilitating condition that has severe impacts on individuals and on Americans as a whole, it can be prevented with a healthy diet and exercise. For people with COPD, exercise is a crucial part of condition management and keeps patients out of the hospital since it increases circulation within the body (CDC, 2023). While a healthy diet and an active lifestyle are well-established measures for preventing and managing COPD and obesity, access to these measures is often unequal. People living in certain neighborhoods may have limited access to parks or safe, walkable streets. Additionally, poorly maintained older housing with cramped living quarters can further discourage physical activity. It has become increasingly difficult for people to move to and live in ‘greener’ and ‘healthier’ areas, as housing in these areas comes at a premium price. In 2019, before the pandemic, premiums were 35-45% higher compared to housing in less walkable areas (Chamberlain, 2023).

These social determinants of health create disparities in COPD, highlighting the need for further exploration to understand and address inequities in public health. This project delves into the relationship between community design and public health outcomes, specifically focusing on COPD rates. By examining the impact of walkability and design features on physical activity levels and overall health, this research aims to highlight the potential for community design to be a powerful tool in promoting public health and reducing healthcare burdens.

  2. Data Sources

For this project, two data sets are used. The first is the EPA Walkability Index data set (queried for Maryland and filtered to Montgomery County where needed) and the second is a merged data set built from the CDC Public Health Tracking Network using their Data Explorer Tool. To get the CDC data, I queried the data within their system, then filtered and exported it into 3 CSV files that were merged to create the final ‘community’ data set used.

Sample Pathway: CDC Public Health Tracking Network -> Data Explorer Tool Site -> Step 1 (Select Content, Indicator and Measure) [Ex: Community Design, Access to Parks, Number of People living within 1 Mile of a Park] -> Step 2 (Select Geography) [Ex: State By Census Tracts] -> Step 3 (Geography) [Ex: Maryland] -> Step 4 (Time) [Ex: 2020] -> Step 5 (Advanced Options) [Ex: Distance to Parks: 1/2 Mile, Ethnicity] -> Step 6 (Export Data) [Ex: CSV]

The data in the EPA data set was collected by the EPA through nationwide data collection and analysis, and the data in the CDC data set was collected by the CDC. Because these variables are standardized across all CDC and EPA offices, I don’t anticipate much bias or many issues within the data. In terms of statistical methods, most of this project uses linear modeling.

2a. Variable Definitions

The EPA Walkability Dataset has over 100 variables, so I define only the ones used in this project.

  1. NatWalkInd: This is a score between 0-20 that measures the walkability of a certain area. The higher the number the more walkable a location is.

  2. P_WrkAge: Percent of population in a given census tract that is working aged 18 to 64 years. (2020 Census)

  3. E_LowWageWk: # of workers earning $1250/month or less (work location) (2020 Census)

  4. E_HiWageWk: # of workers earning $3333/month or more

  5. Pct_AO2p : Percent of two-plus-car households in area (Census 2020)

The CDC ‘community’ data set also includes geospatial variables that were not used in the overall project; some of them are described here along with the variables that were used.

  1. Census Tract: Set of numbers identifying a census tract (Geospatial)
  2. County: Name of county (Variable added by student)
  3. ‘copdrates’: Percent of population with COPD (modeled data by CDC)
  4. ‘oldhousing’: Percent of housing built prior to 1980
  5. ‘Number’: Number of housing units built prior to 1980
  6. ‘parkdistancepopulation’: Number of people living within 1/2 mile of a park
  7. ‘Value’: Percent of population living within 1/2 mile of a park

By examining variables like housing age, the percentage of the population living near green space, COPD rates, and the walkability index, we can investigate how these factors interrelate. This exploration can shed light on how environmental factors like neighborhood design can influence health outcomes. By understanding these connections, we can work towards creating communities that promote healthy lifestyles for all residents, regardless of zip code.

  3. Guiding Questions

These questions guide this investigation and will be answered through the Data Work conducted.

  1. What is the relationship between walkability and households with cars?
  2. Is there a relationship between walkability/greenspace and income (Variables: E_HiWageWk, NatWalkInd, E_LowWageWk)?
  3. Is there a relationship between COPD rates and greenspace?
  4. Is there a relationship between COPD rates and age of housing?
  5. Are there trends in which locations have the ‘better housing’ (are they located in certain counties or certain areas within MoCo)?

Data Work

EDA: Exploratory Data Analysis is first conducted to explore relationships between variables and get a general idea for what stories the data is telling.

  1. Load Libraries

    library(tidyverse)
    ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
    ✔ dplyr     1.1.2     ✔ readr     2.1.4
    ✔ forcats   1.0.0     ✔ stringr   1.5.0
    ✔ ggplot2   3.4.4     ✔ tibble    3.2.1
    ✔ lubridate 1.9.2     ✔ tidyr     1.3.0
    ✔ purrr     1.0.2     
    ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
    ✖ dplyr::filter() masks stats::filter()
    ✖ dplyr::lag()    masks stats::lag()
    ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
    library(sf)
    Linking to GEOS 3.11.0, GDAL 3.5.3, PROJ 9.1.0; sf_use_s2() is TRUE
    library(pROC)
    Type 'citation("pROC")' for a citation.
    
    Attaching package: 'pROC'
    
    The following objects are masked from 'package:stats':
    
        cov, smooth, var
    setwd("/Users/blossomanyanwu/Documents/MATH 217 HM")
  2. Load Data Sets+Clean+Merge

    community<-read_csv("finalmerge.csv")
    New names:
    Rows: 1406 Columns: 16
    ── Column specification
    ──────────────────────────────────────────────────────── Delimiter: "," chr
    (9): State, Census Tract, copdrates, 95% Confidence Interval, Confidence... dbl
    (6): ...1, StateFIPS, CensusTract, Year, Number, parkdistancepopulation lgl
    (1): ...11
    ℹ Use `spec()` to retrieve the full column specification for this data. ℹ
    Specify the column types or set `show_col_types = FALSE` to quiet this message.
    • `` -> `...1`
    walkability<- read_csv("marylandwalk.csv")
    New names:
    Rows: 3926 Columns: 118
    ── Column specification
    ──────────────────────────────────────────────────────── Delimiter: "," chr
    (2): CSA_Name, CBSA_Name dbl (116): ...1, OBJECTID, GEOID10, GEOID20, STATEFP,
    COUNTYFP, TRACTCE, BLK...
    ℹ Use `spec()` to retrieve the full column specification for this data. ℹ
    Specify the column types or set `show_col_types = FALSE` to quiet this message.
    • `` -> `...1`
    # The next lines read the % of people living near a park and merge it into the 'community' data set
    parkpercent<-read_csv("parkpercent.csv") 
    New names:
    Rows: 1463 Columns: 9
    ── Column specification
    ──────────────────────────────────────────────────────── Delimiter: "," chr
    (4): State, Census Tract, Value, Distance to Parks dbl (3): StateFIPS,
    CensusTract, Year lgl (2): Data Comment, ...8
    ℹ Use `spec()` to retrieve the full column specification for this data. ℹ
    Specify the column types or set `show_col_types = FALSE` to quiet this message.
    • `` -> `...8`
    # Full join using dplyr
    community <- full_join(community, parkpercent, by = c("StateFIPS", "State", "CensusTract", "Year", "Distance to Parks", "Census Tract"))
    # Remove %s using Gsub
    community$oldhousing <- gsub("%", "", community$oldhousing)
    community$copdrates <- gsub("%", "", community$copdrates)
    community$oldhousing <- as.numeric(community$oldhousing)
    community$copdrates <- as.numeric(community$copdrates)
    Warning: NAs introduced by coercion
    community$Value <- gsub("%", "", community$Value)
    community$Value <- as.numeric(community$Value)
    # Remove Empty Columns
    community <- community[, !(names(community) %in% "Data Comment")]
    community <- community[, !(names(community) %in% "...11")]
    community <- community[, !(names(community) %in% "...8")]
    # Define a named vector mapping 5-digit state + county FIPS prefixes to Maryland county names
    county_codes <- c('24001' = 'Allegany', '24003' = 'Anne Arundel', '24510' = 'Baltimore City', '24005' = 'Baltimore', '24009' = 'Calvert', '24011' = 'Caroline', '24013' = 'Carroll', '24015' = 'Cecil', '24017' = 'Charles', '24019' = 'Dorchester', '24021' = 'Frederick', '24023' = 'Garrett', '24025' = 'Harford', '24027' = 'Howard', '24031' = 'Montgomery', '24033' = 'P.G County', '24035' = "Queen Anne's", '24039' = 'Somerset', '24037' = "St. Mary's", '24041' = 'Talbot', '24043' = 'Washington', '24045' = 'Wicomico', '24047' = 'Worcester', '24029' = 'Kent County')
    
    # Create a new column 'County' based on the first 5 digits of 'CensusTract' (state + county FIPS)
    community$County <- unname(county_codes[substr(as.character(community$CensusTract), 1, 5)])
  3. Basic Plots + Observations

Summary Statistics (‘NatWalkInd’ and ‘Value’)

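# Walkability Mean
# Montgomery County's county FIPS code is 031 (COUNTYFP 3 is Anne Arundel County),
# so the mean printed below should be re-run after this filter
moco_walkability <- walkability %>%
  filter(COUNTYFP == 31)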
mean_walk <- mean(moco_walkability$NatWalkInd)
print(mean_walk)
[1] 9.288462
# 'Value' (Percent Near Parks) Mean
community <- community[!is.na(community$Value), ]
mean_park <- mean(community$Value)
print(mean_park)
[1] 75.01141

Observation: The mean walkability score computed above is about 9.3 out of 20, which suggests that walkability is below average and that more can be done to improve the situation. The mean share of residents living within 1/2 mile of a park is about 75%. Because this mean is taken over every tract in the community data set, it describes Maryland as a whole rather than Montgomery County alone. Either way, a large share of residents can possibly access a park.

```{r}
# Visual 1 (Walkability Data Set): Scatter plot of the percent of working-age people in a census tract (P_WrkAge) against the tract's Walkability Index
ggplot(moco_walkability, aes(x = P_WrkAge, y = NatWalkInd)) +
  geom_point(alpha = 0.5) +
  labs(x = "Percent Working Age (18-64)", y = "Walkability Score", title = "Working Age Vs Walkability") +
  theme_minimal()
```

Observations: The majority of census tracts in Montgomery County have between 50% and 75% of residents within the working age range (18-64), so it is difficult to determine a relationship based on this alone. However, it appears that some areas with larger working-age populations are more walkable.

```{r}
# Visual 2: Number of workers earning $1250/month or less (E_LowWageWk) and Walkability Index
ggplot(moco_walkability, aes(x = E_LowWageWk, y = NatWalkInd)) +
  geom_point(alpha = 0.5) +
  labs(x = "Number of Low-Wage Workers (E_LowWageWk)", y = "Walkability Score", title = "Low-Wage Workers Vs Walkability") +
  theme_minimal()
```

Observations: Montgomery County census tracts in general have a low number of documented workers earning $1250 or less monthly. However, in areas where the number of low-wage workers is higher, the walkability scores are often below the median (9.00).

```{r}
# Visual 3: County Comparison Bar Graphs
community <- community[!is.na(community$copdrates), ]
copd_averages <- community %>%
  group_by(County) %>%
  summarise(Average_COPD_Rate = mean(copdrates))
ggplot(copd_averages, aes(x = County, y = Average_COPD_Rate)) +
geom_bar(stat = "identity", fill = "blue") +  
labs(title = "Average COPD Rate per County", x = "County", y = "Average COPD Rate") +
theme_classic()
```

Observation: Montgomery County has the second lowest COPD rate in the state. Compared statewide, health factors in Montgomery County appear to be better; however, within the county there are disparities that need to be explored.

    # Visual 4: County Comparison Housing Age
    community <- community[!is.na(community$oldhousing), ]
    oldhousing1 <- community %>%
      group_by(County) %>%
      summarise(avg_house = mean(oldhousing))
    # The y axis shows the average percent of pre-1980 housing, not the COPD rate
    ggplot(oldhousing1, aes(x = County, y = avg_house)) +
      geom_bar(stat = "identity", fill = "blue") +
      labs(title = "Average % Of Old Housing", x = "County", y = "Average % of Housing Built Before 1980") +
      theme_classic()

Observations: Over 50% of housing in Montgomery County was built prior to 1980, which can be an issue. Compared to other counties, MoCo did not perform as well and still has a higher average than several counties.

Guiding Questions And Statistical Methods:

Guiding Theme: How Does Community Design Impact Health+Health Access

  1. Is there a relationship between COPD rates and the age of housing in Maryland and MoCo? (Linear Regression Model)
# People living closer to green space and parks have more opportunity to exercise, which is an important health factor in preventing COPD. In Montgomery County, where a mean of 76.39% of people live near a park, it's important to understand this relationship.

modelwalk <- lm(copdrates ~ oldhousing, data = community)  # y ~ x represents dependent variable ~ independent
summary(modelwalk)

Call:
lm(formula = copdrates ~ oldhousing, data = community)

Residuals:
    Min      1Q  Median      3Q     Max 
-4.9315 -1.2323 -0.3673  1.0400 10.1918 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept) 3.993742   0.126692   31.52   <2e-16 ***
oldhousing  0.024456   0.001935   12.64   <2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 1.844 on 1266 degrees of freedom
  (5 observations deleted due to missingness)
Multiple R-squared:  0.1121,    Adjusted R-squared:  0.1114 
F-statistic: 159.8 on 1 and 1266 DF,  p-value: < 2.2e-16

Model Analysis (Statewide):

  1. Intercept (3.993742): The estimated average COPD rate is 3.99% when the percent of housing built before 1980 is 0.
  2. Oldhousing coefficient (0.024456): Indicates a positive association. In simple terms, for every one-percentage-point increase in oldhousing, the estimated COPD rate in a census tract rises by about 0.024 percentage points.
  3. P-values (<2e-16): Less than 0.05, suggesting statistical significance.
  4. R-squared values (0.1121 multiple, 0.1114 adjusted) show that around 11% of the variance in COPD rates can be explained by the percent of old housing.
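
To make the statewide coefficients concrete, the short sketch below plugs the estimates reported in the model summary into the fitted line. The helper name `predict_copd` and the example housing percentages (0, 50, 100) are illustrative assumptions and were not part of the original analysis.

```r
# Coefficients copied from the statewide model summary reported above
intercept <- 3.993742
slope     <- 0.024456

# Hypothetical helper: fitted COPD rate (%) for a given percent of pre-1980 housing
predict_copd <- function(oldhousing_pct) {
  intercept + slope * oldhousing_pct
}

# Fitted values for tracts with 0%, 50%, and 100% old housing
fitted_rates <- predict_copd(c(0, 50, 100))
print(round(fitted_rates, 2))
```

Under this fitted line, moving from a tract with no pre-1980 housing to one made up entirely of it raises the fitted COPD rate by about 2.4 percentage points (100 times the slope).
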
# Now same model is conducted for MoCo instead of the whole state

# Specify my target county
target_county <- "Montgomery"

# Subset the data for the target county
moco_comm <- community %>%
  filter(County == target_county)
modelmoco <- lm(copdrates ~ oldhousing, data = moco_comm)  # y ~ x represents dependent variable ~ independent
summary(modelmoco)

Call:
lm(formula = copdrates ~ oldhousing, data = moco_comm)

Residuals:
    Min      1Q  Median      3Q     Max 
-1.7611 -0.5312 -0.1415  0.4527  6.6521 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept) 3.590294   0.151766  23.657  < 2e-16 ***
oldhousing  0.006242   0.002352   2.654  0.00861 ** 
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.9562 on 194 degrees of freedom
Multiple R-squared:  0.03504,   Adjusted R-squared:  0.03006 
F-statistic: 7.044 on 1 and 194 DF,  p-value: 0.008612

Model Analysis Pt2

  1. Intercept (3.590294): This is the estimated average COPD rate in Montgomery County tracts when the percent of old housing is 0%.

  2. Oldhousing coefficient (0.006242): This indicates a positive association, but a weaker one than in the statewide model. For every one-percentage-point increase in oldhousing, there is an estimated increase of 0.006242 percentage points in the COPD rate.

  3. P-values: 0.00861 (oldhousing) and <2e-16 (intercept). Both are less than 0.05, showing statistical significance.

  4. The model is weaker for MoCo (R-squared of 0.035 versus 0.112 statewide), so other variables must be at play.

  2. How do housing age and the percent of the population near a park relate to COPD rates? How much of the variance do these variables account for? (Multiple Regression Analysis)

# Maryland/Statewide Model
finalmodel<-lm(formula = copdrates ~ oldhousing + Value, data = community)
# MoCo Model with same variables
moco2<-lm(formula = copdrates ~ oldhousing + Value, data = moco_comm)
summary(moco2)

Call:
lm(formula = copdrates ~ oldhousing + Value, data = moco_comm)

Residuals:
    Min      1Q  Median      3Q     Max 
-1.5752 -0.5421 -0.0954  0.3569  5.2706 

Coefficients:
             Estimate Std. Error t value Pr(>|t|)    
(Intercept)  6.600334   0.551446  11.969  < 2e-16 ***
oldhousing   0.008790   0.002230   3.941 0.000113 ***
Value       -0.033065   0.005856  -5.646  5.8e-08 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.8881 on 193 degrees of freedom
Multiple R-squared:  0.1718,    Adjusted R-squared:  0.1632 
F-statistic: 20.02 on 2 and 193 DF,  p-value: 1.255e-08

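The models above were statistically significant but explained relatively little of the variance (for example, an R-squared of 0.17 for the MoCo model), so I moved to develop a more specific model.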
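
One way to judge whether a larger model is actually an improvement is to compare nested models directly. The sketch below does this on simulated stand-in data: the column names mirror the report, but the values, sample size, and effect sizes are made up for illustration, so the printed numbers say nothing about the real tracts.

```r
# Simulated stand-in data; values are invented, only the column names match the report
set.seed(217)
n <- 200
sim <- data.frame(
  oldhousing = runif(n, 0, 100),
  Value      = runif(n, 40, 100)
)
sim$copdrates <- 6 + 0.01 * sim$oldhousing - 0.03 * sim$Value + rnorm(n, sd = 0.9)

# Two nested models: the larger one adds Value to the smaller one
small <- lm(copdrates ~ oldhousing, data = sim)
large <- lm(copdrates ~ oldhousing + Value, data = sim)

# Adjusted R-squared penalizes extra predictors, so it is the fairer comparison
adj_r2 <- c(small = summary(small)$adj.r.squared,
            large = summary(large)$adj.r.squared)
print(round(adj_r2, 3))

# Partial F-test: does adding Value significantly reduce the residual error?
comparison <- anova(small, large)
print(comparison)
```

The same two calls, `summary(...)$adj.r.squared` and `anova(small, large)`, could be applied to the report's own nested models as long as both are fit on the same rows.
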

# Best Model
finalmodel<-lm(formula = copdrates ~ oldhousing + Value + Number + parkdistancepopulation, data = community)
summary(finalmodel)

Call:
lm(formula = copdrates ~ oldhousing + Value + Number + parkdistancepopulation, 
    data = community)

Residuals:
    Min      1Q  Median      3Q     Max 
-4.5862 -1.0351 -0.2293  0.8854  9.7649 

Coefficients:
                         Estimate Std. Error t value Pr(>|t|)    
(Intercept)             5.217e+00  1.474e-01  35.383  < 2e-16 ***
oldhousing              1.043e-02  3.084e-03   3.383 0.000739 ***
Value                   2.981e-03  2.637e-03   1.130 0.258537    
Number                  9.821e-04  1.451e-04   6.770 1.97e-11 ***
parkdistancepopulation -4.981e-04  4.785e-05 -10.409  < 2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 1.684 on 1263 degrees of freedom
  (5 observations deleted due to missingness)
Multiple R-squared:  0.2618,    Adjusted R-squared:  0.2594 
F-statistic:   112 on 4 and 1263 DF,  p-value: < 2.2e-16

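Model Analysis: This model is the strongest of all the ones created. It has a multiple R-squared of 0.2618 (adjusted 0.2594), meaning it accounts for about 26% of the variance in COPD rates. Note that Value is not statistically significant in this model (p = 0.26), while oldhousing, Number, and parkdistancepopulation all are.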
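
The phrase "accounts for the variance" refers to how R-squared is defined: one minus the ratio of the residual sum of squares to the total sum of squares. The sketch below recomputes it by hand on simulated data (the variables `x` and `y` are made up for illustration) and compares it with the value that `summary()` reports.

```r
# Simulated data purely for illustration
set.seed(1)
x <- runif(100, 0, 100)
y <- 4 + 0.02 * x + rnorm(100, sd = 1.8)
fit <- lm(y ~ x)

rss <- sum(residuals(fit)^2)   # variation left unexplained by the model
tss <- sum((y - mean(y))^2)    # total variation in the outcome
r2_manual <- 1 - rss / tss

# The hand-computed value matches the one reported by summary()
print(c(manual = r2_manual, reported = summary(fit)$r.squared))
```
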

  3. Does living in an area with more low wage workers mean living in a more walkable location? (Bootstrapping)
# Create Variable
# Reasoning 0.25 or 1/4 is a significant amount of a population to be low income/low wage earning
walkability$Income_Level <- ifelse(walkability$R_PCTLOWWAGE > 0.2500000, "siglow", "minlow")
library(infer)
diff_mean_ci <- walkability |>
# Specify measure vs variable
specify(NatWalkInd ~ Income_Level) |>
# Generate 1500 bootstrap replicates
generate(reps = 1500, type = "bootstrap") |>
# Calculate the difference in means, minimal low wage workers vs significant low wage workers
calculate(stat = "diff in means", order = c("minlow", "siglow"))
# Calculate the 95% CI via percentile method
diff_mean_ci |>
get_confidence_interval(level = 0.95, type = "percentile")
# A tibble: 1 × 2
  lower_ci upper_ci
     <dbl>    <dbl>
1    -1.64   -0.992

Observation: The difference is computed as the mean of “minlow” minus the mean of “siglow” (the order passed to calculate()). Since both the lower and upper bounds of the 95% confidence interval are negative, the “minlow” group (minimal low wage workers) has a lower average “NatWalkInd” than the “siglow” group (significant low wage workers). In other words, areas where more than a quarter of workers are low wage tend to have higher walkability scores, by roughly 1.0 to 1.6 index points.
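
For readers who want to see what the percentile bootstrap is doing under the hood, here is a minimal base R sketch of the same idea. It uses simulated stand-in data (group sizes, means, and spreads are assumptions), and the helper `boot_diff` is a hypothetical name; it is not how the infer package is implemented internally, only the same resampling logic written out.

```r
set.seed(217)
# Simulated stand-in for the walkability data (values are made up)
sim <- data.frame(
  Income_Level = rep(c("minlow", "siglow"), times = c(1500, 1000)),
  NatWalkInd   = c(rnorm(1500, mean = 9, sd = 4), rnorm(1000, mean = 10.3, sd = 4))
)

# One bootstrap replicate: resample rows with replacement, then take minlow - siglow
boot_diff <- function(data) {
  resampled <- data[sample(nrow(data), replace = TRUE), ]
  means <- tapply(resampled$NatWalkInd, resampled$Income_Level, mean)
  unname(means["minlow"] - means["siglow"])
}

diffs <- replicate(1500, boot_diff(sim))

# Percentile method: the middle 95% of the bootstrap differences
ci <- quantile(diffs, probs = c(0.025, 0.975))
print(ci)
```

Because the subtraction is written as minlow minus siglow, an interval that lies entirely below zero means the minlow group has the smaller mean.
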

  4. Is there an association between county and walkability score (MoCo, PG, Howard, Baltimore City, Baltimore County)? (Chi-Squared Test of Independence/Walkability Data)
# Filter data based on COUNTYFP values (33 = PG, 31 = MoCo, 27 = Howard, 510 = Baltimore City, 5 = Baltimore County)
desired_values <- c(33, 31, 27, 510, 5)
filtered_df <- walkability[walkability$COUNTYFP %in% desired_values, ]
# Turn numerical FP into character strings
filtered_df$COUNTYFP <- as.character(filtered_df$COUNTYFP)
filtered_df |>
  ggplot(aes(x = NatWalkInd, fill = COUNTYFP)) +
  # Add bar layer of counts, stacked by county
  geom_bar()

Perform Chi Squared

library(tidymodels)
── Attaching packages ────────────────────────────────────── tidymodels 1.1.1 ──
✔ broom        1.0.5     ✔ rsample      1.2.0
✔ dials        1.2.0     ✔ tune         1.1.2
✔ modeldata    1.3.0     ✔ workflows    1.1.3
✔ parsnip      1.1.1     ✔ workflowsets 1.0.1
✔ recipes      1.0.9     ✔ yardstick    1.3.0
── Conflicts ───────────────────────────────────────── tidymodels_conflicts() ──
✖ scales::discard() masks purrr::discard()
✖ dplyr::filter()   masks stats::filter()
✖ recipes::fixed()  masks stringr::fixed()
✖ dplyr::lag()      masks stats::lag()
✖ yardstick::spec() masks readr::spec()
✖ recipes::step()   masks stats::step()
• Use suppressPackageStartupMessages() to eliminate package startup messages
obs <- filtered_df |>
select(Income_Level, COUNTYFP) |>
table()


obs |>
# Convert the table to a tibble with one row per cell and a count column n
as_tibble() |>
# Expand out the counts
uncount(n)
# A tibble: 2,473 × 2
   Income_Level COUNTYFP
   <chr>        <chr>   
 1 minlow       27      
 2 minlow       27      
 3 minlow       27      
 4 minlow       27      
 5 minlow       27      
 6 minlow       27      
 7 minlow       27      
 8 minlow       27      
 9 minlow       27      
10 minlow       27      
# ℹ 2,463 more rows
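
The table built above crosses Income_Level with COUNTYFP, and the test of independence itself can be run on such a table with base R's `chisq.test()`. The sketch below shows the call on a hypothetical table of counts: every number in `obs_example` is invented for illustration and does not come from the walkability data.

```r
# Hypothetical counts by income level (rows) and county code (columns)
obs_example <- matrix(
  c(120, 300, 150, 210, 90,
     40,  60,  30, 260, 70),
  nrow = 2, byrow = TRUE,
  dimnames = list(
    Income_Level = c("minlow", "siglow"),
    COUNTYFP     = c("27", "31", "33", "510", "5")
  )
)

# Pearson chi-squared test of independence between the two classifications
chi_result <- chisq.test(obs_example)
print(chi_result)

# Expected counts under independence; the chi-squared approximation is usually
# considered reliable when these are all at least 5
print(round(chi_result$expected, 1))
```

With the report's own table, the equivalent call would be `chisq.test(obs)`. A small p-value would indicate that the share of "siglow" areas differs across the selected counties.
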

ANOVA: Does income level (presence of low wage workers) impact walkability?

aov_county_ <- aov(NatWalkInd ~ Income_Level, data = walkability)
# summary(aov_county_)
# Tidy the model
tidy(aov_county_)
# A tibble: 2 × 6
  term            df  sumsq meansq statistic   p.value
  <chr>        <dbl>  <dbl>  <dbl>     <dbl>     <dbl>
1 Income_Level     1   978.  978.       56.1  8.51e-14
2 Residuals     3924 68438.   17.4      NA   NA       
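
Because Income_Level has only two levels, a one-way ANOVA on it is equivalent to a pooled-variance two-sample t-test: the F statistic equals the square of the t statistic. The sketch below checks this on simulated stand-in data (group sizes and means are assumptions chosen for illustration).

```r
set.seed(217)
# Simulated stand-in data with the same two group labels as the report
sim <- data.frame(
  Income_Level = rep(c("minlow", "siglow"), times = c(150, 100)),
  NatWalkInd   = c(rnorm(150, mean = 9, sd = 4), rnorm(100, mean = 10.5, sd = 4))
)

# One-way ANOVA and its F statistic
fit_aov <- aov(NatWalkInd ~ Income_Level, data = sim)
f_stat  <- summary(fit_aov)[[1]][["F value"]][1]

# Pooled-variance t-test on the same two groups
t_stat <- t.test(NatWalkInd ~ Income_Level, data = sim, var.equal = TRUE)$statistic

print(c(F = f_stat, t_squared = unname(t_stat)^2))
```

The ANOVA therefore says that the two group means differ, but not in which direction; the sign of the difference has to be read from the group means or from a confidence interval.
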
# Scatter plot of housing age vs COPD with a simple linear fit.
# geom_smooth() has no `model` argument, so a multi-predictor model cannot be passed in;
# method = "lm" fits copdrates ~ oldhousing directly.
ggplot(community, aes(x = oldhousing, y = copdrates)) +
  geom_point(color = "blue") +  # Add data points
  geom_smooth(method = "lm", color = "red") +  # Add regression line
  labs(title = "Housing Vs COPD (With Linear Fit)",
       x = "Percent of Housing Built Before 1980", y = "COPD Rate (%)")
Warning: Removed 5 rows containing non-finite values (`stat_smooth()`).
Warning: Removed 5 rows containing missing values (`geom_point()`).

Conclusions

Through the data exploration, numerous observations helped show how community design relates to health variables and how financial variables relate to community design. The bootstrap interval for the difference in mean walkability (minlow minus siglow) was entirely negative, so areas with a larger share of low wage workers tended to have higher, not lower, walkability scores. The reasons for this pattern would need further investigation.

The first half of this exploration was dedicated to understanding how community design relates to health factors like COPD. The simple and multiple linear models show that these community factors (housing age, park distance) have a statistically significant association with COPD rates, although they explain only part of the variance. Once that understanding was established, I moved towards understanding walkability and what factors relate to it. One very interesting finding was that when an area has a significant share of low wage workers (more than 25%), its mean Walkability Index score differs from that of areas with fewer low wage workers: the 95% bootstrap confidence interval for the difference (minlow minus siglow) ran from -1.64 to -0.99. This highlights how unevenly community design factors are distributed in Maryland and MoCo.

Overall, the exploration went well. Because the data I chose lacked non-geospatial categorical variables, it was difficult for me to find statistical methods to use other than linear and multiple regression analysis. As a result, I often had to create variables or add columns with categorical variables. This was quite difficult, as I had to create data dictionaries and alter the data set. I felt confident that doing this did not negatively affect the integrity of the data, as I worked with the numerical geospatial data and made it categorical. For instance, I used the FIPS codes to add county names, and I used the R_PCTLOWWAGE variable to create the Income_Level variable. I wish my data in general had more categorical observations, because instead of using statistics to answer more questions, I worked backwards, developing questions for each of the 4 statistical methods I was required to use, which hindered my statistical analysis.

After the general exploration, I found that I still had some questions. Since the best model I made only accounted for about 26% of the variance, I wonder what other variables within the CDC Public Health Tracking Network could have had a greater impact.

Bibliography

CDC. (2022, May 17). Adult Obesity Facts. Centers for Disease Control and Prevention. https://www.cdc.gov/obesity/data/adult.html

Chamberlain, L. (2023, March). Why walkable urban areas are efficient economic areas in the US. World Economic Forum. https://www.weforum.org/agenda/2023/03/why-walkable-urban-areas-are-america-s-efficient-economic-engines/

Cleveland Clinic. (2022). COPD Exercises Can Keep You Out of the Hospital. Cleveland Clinic. https://health.clevelandclinic.org/have-copd-exercise-helps-keep-you-out-of-the-hospital

Harvard Health. (2022). Understanding COPD from a cardiovascular perspective. Harvard Health. https://www.health.harvard.edu/heart-health/understanding-copd-from-a-cardiovascular-perspective

Mayo Clinic. (2020, April 15). COPD. Mayo Clinic. https://www.mayoclinic.org/diseases-conditions/copd/symptoms-causes/syc-20353679

Morris, V. (2023, February 2). US Neighborhood Walkability Influences Physical Activity, BMI Levels. Boston University School of Public Health. https://www.bu.edu/sph/news/articles/2023/us-neighborhood-walkability-influences-physical-activity-bmi-levels/