The Centers for Disease Control and Prevention (CDC) Social Vulnerability Index (SVI) identifies communities that are especially at risk during public health emergencies because of factors such as socioeconomic status, household composition, racial composition of neighborhoods, and housing type and transportation. The CDC SVI uses 15 U.S. census variables to identify communities that may need support before, during, or after disasters.
The Condition value is the overall ranking across the four social themes: lower values indicate higher vulnerability, and higher values indicate lower vulnerability.
Quintiles for this condition were determined across all Census tracts in King County. Quintile 1 represents the most vulnerable residents and Quintile 5 the least vulnerable.
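The source does not state how the quintile breaks were computed. As a minimal sketch, assuming equal-count quintiles cut at the 20th, 40th, 60th, and 80th percentiles of Condition, the assignment could look like the following. The `condition` vector holds made-up illustrative values, not King County data.

```r
# Hypothetical Condition values for ten tracts (illustrative only, not real data)
condition <- seq(5, 95, by = 10)

# Break points at the 0th, 20th, 40th, 60th, 80th, and 100th percentiles
breaks <- quantile(condition, probs = seq(0, 1, by = 0.2))

# Assign each tract to a quintile; 1 = lowest Condition (most vulnerable)
quintile <- cut(condition, breaks = breaks, include.lowest = TRUE, labels = FALSE)

data.frame(condition, quintile)
```

With this convention, low Condition values land in Quintile 1, which matches the description of Quintile 1 as the most vulnerable group.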
Data are released every two years, following the American Community Survey release in December of the year after the survey. The most recent data, for 2018, were downloaded from the ATSDR website.
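# Load packages used below: dplyr for %>% and select(), ggplot2 for the plots
library(dplyr)
library(ggplot2)
# Load dataset
cdc <- read.csv("CDC_Social_Vulnerability_Index__CDCSVI_.csv")
# Drop `the_geom`, a Well-Known Text (WKT) geometry column
cdc <- cdc %>%
  select(-the_geom)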
# Show the first 6 rows (the head() default)
head(cdc)
## Condition Condition_TotalPop F_TOTAL FeatureID Geography Name_Geography
## 1 56.06 2708.819 0 53033001702 County King County
## 2 30.21 2030.414 1 53033000401 County King County
## 3 71.17 2849.647 0 53033001701 County King County
## 4 65.00 2742.350 0 53033029307 County King County
## 5 57.38 1975.593 0 53033029306 County King County
## 6 92.58 4391.995 0 53033002900 County King County
## NotSociallyVulnerable OBJECTID Quintile RPL_Themes Shape__Area Shape__Length
## 1 0 19 3 0.4394 13779307 17476.41
## 2 0 4 2 0.6979 14673110 15741.89
## 3 0 18 3 0.2883 9609032 17164.46
## 4 0 279 3 0.3500 41663148 26347.76
## 5 0 278 3 0.4262 27509913 20992.20
## 6 0 30 5 0.0742 10614781 13366.32
## TotalPopulation WeightedAvgQuintile Year
## 1 4832 0.6342126 2018
## 2 6721 0.3792233 2018
## 3 4004 0.6342126 2018
## 4 4219 0.6342126 2018
## 5 3443 0.6342126 2018
## 6 4744 0.9520116 2018
# Histogram of Total Population
hist(cdc$TotalPopulation, main="Total Population Histogram", col="blue", xlab= "Total Population", ylab = "Frequency")
# Histogram of census tract areas on a log10 x-axis
ggplot(cdc, aes(x = Shape__Area)) +
geom_histogram(fill = "yellow", color = "black", bins = 30) +
scale_x_log10() +
labs(
title = "Distribution of Census Tract Areas",
x = "Shape Area (Log Transform)",
y = "Count"
) +
theme_classic() +
theme(plot.title = element_text(hjust = 0.5, face = "bold"))
# Plot of condition of total population by quintile
ggplot(data = cdc, aes(x= Quintile, y = Condition_TotalPop))+
geom_col(fill = "lightgreen") +
labs(
title = "Total Population Condition Level",
subtitle = "By Quintile",
x = "Quintile",
y = "Total Condition of Population"
) +
theme_classic() +
theme(
plot.title = element_text(hjust = 0.5, face = "bold"),
plot.subtitle = element_text(hjust = 0.5, face = "bold")
)
These plots suggest that larger, more developed areas tend to have more resources, which lowers vulnerability. Conversely, tracts with small populations, which are likely rural, appear more vulnerable because they lack resources.
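Because geom_col() stacks every row that shares an x value, each bar in the quintile plot is the sum of Condition_TotalPop over all tracts in that quintile, so bar height reflects how many tracts and people a quintile contains as well as their Condition. The sketch below reproduces that sum on the six rows printed by head() and also computes a population-weighted mean Condition, which is one possible way to compare quintiles on a common scale. The six-row subset and the `weighted_condition` name are illustrative assumptions, not part of the original analysis.

```r
# Six rows copied from the head(cdc) output shown earlier
rows <- data.frame(
  Quintile           = c(3, 2, 3, 3, 3, 5),
  Condition_TotalPop = c(2708.819, 2030.414, 2849.647, 2742.350, 1975.593, 4391.995),
  TotalPopulation    = c(4832, 6721, 4004, 4219, 3443, 4744)
)

# Per-quintile sums; the Condition_TotalPop sum is what geom_col() draws as bar height
sums <- aggregate(cbind(Condition_TotalPop, TotalPopulation) ~ Quintile,
                  data = rows, FUN = sum)

# Population-weighted mean Condition (0 to 1 scale) within each quintile
sums$weighted_condition <- sums$Condition_TotalPop / sums$TotalPopulation
sums
```

Running the same aggregation on the full `cdc` data frame would show whether the bar heights are driven mainly by tract counts or by Condition itself.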
RPL_Themes represents the overall social vulnerability score for each census tract, combining socioeconomic status, household composition, minority status, and housing/transportation factors into a single percentile ranking. Higher values indicate greater vulnerability relative to other tracts.
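In the six rows printed by head(), Condition equals 100 × (1 − RPL_Themes), which would explain why the two measures run in opposite directions. The sketch below checks that relationship on those six rows only; whether it holds for every tract is an assumption that would need checking against the full `cdc` data frame.

```r
# Values copied from the head(cdc) output shown earlier
rpl_themes <- c(0.4394, 0.6979, 0.2883, 0.3500, 0.4262, 0.0742)
condition  <- c(56.06, 30.21, 71.17, 65.00, 57.38, 92.58)

# Condition reconstructed as the complement of the percentile rank, on a 0-100 scale
reconstructed <- 100 * (1 - rpl_themes)

# Compare at the two-decimal precision of the printed output
matches <- all(abs(reconstructed - condition) < 0.005)
matches
```

On the full data, a comparable check would be `with(cdc, all.equal(Condition, 100 * (1 - RPL_Themes)))`.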
# Plot condition of total population vs RPL_Themes
ggplot(data = cdc, aes(x= RPL_Themes, y = Condition_TotalPop))+
geom_point( color = "black", alpha = 0.7, size= 2) +
stat_smooth(method = "lm", col = "red", se = TRUE) +
labs(
title = "RPL Themes vs Condition",
x = "RPL Themes",
y = "Condition of Total Population"
) +
theme_classic() +
theme(
plot.title = element_text(hjust = 0.5, face = "bold")
)
The scatterplot shows a clear negative relationship between overall social vulnerability (RPL_Themes) and Condition_TotalPop. More vulnerable tracts tend to have lower condition values. This mirrors the pattern in the quintile bar plot, where the most vulnerable groups consistently showed lower population condition levels.
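The direction of the fitted line can be checked numerically with lm(), the same kind of linear model that stat_smooth(method = "lm") draws. This sketch uses only the six rows printed by head(), so it illustrates the call rather than reproducing the full-data fit.

```r
# Six rows copied from the head(cdc) output shown earlier
rows <- data.frame(
  RPL_Themes         = c(0.4394, 0.6979, 0.2883, 0.3500, 0.4262, 0.0742),
  Condition_TotalPop = c(2708.819, 2030.414, 2849.647, 2742.350, 1975.593, 4391.995)
)

# Ordinary least squares fit of Condition_TotalPop on RPL_Themes
fit <- lm(Condition_TotalPop ~ RPL_Themes, data = rows)

# A negative slope means higher vulnerability goes with lower Condition_TotalPop
slope <- unname(coef(fit)["RPL_Themes"])
slope

# Pearson correlation as a scale-free summary of the same relationship
r <- cor(rows$RPL_Themes, rows$Condition_TotalPop)
r
```

Replacing `rows` with `cdc` would give the slope and correlation for all King County tracts.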
# Boxplot of Condition within each SVI quintile
ggplot(cdc, aes(x = factor(Quintile), y = Condition)) +
geom_boxplot(fill = "green") +
labs(
title = "Condition Distribution by SVI Quintile",
x = "Social Vulnerability Index Quintile",
y = "Condition"
) +
theme_classic() +
theme(plot.title = element_text(hjust = 0.5, face = "bold"))
Less vulnerable communities (where Quintile = 5) tend to have higher Condition values.