Data source

The Centers for Disease Control and Prevention (CDC) Social Vulnerability Index (SVI) identifies communities that are especially at risk during public health emergencies because of factors such as socioeconomic status, household composition, racial composition of neighborhoods, and housing type and transportation. The CDC SVI uses 15 U.S. Census variables to identify communities that may need support before, during, or after disasters.

The condition is the overall ranking across the four social themes, where lower values indicate higher vulnerability and higher values indicate lower vulnerability.
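As a quick check of this definition, the six rows printed by head(cdc) later in this report suggest that Condition is RPL_Themes flipped and rescaled to 0-100. The sketch below tests that on those rows only. The formula is an inference from the printed values, not something the data source states, and `sample_rows` is a hypothetical name.

```r
# First six rows as printed by head(cdc) in this report
sample_rows <- data.frame(
  Condition  = c(56.06, 30.21, 71.17, 65.00, 57.38, 92.58),
  RPL_Themes = c(0.4394, 0.6979, 0.2883, 0.3500, 0.4262, 0.0742)
)

# Assumed relationship (inferred, not documented): Condition = 100 * (1 - RPL_Themes)
sample_rows$Condition_check <- 100 * (1 - sample_rows$RPL_Themes)

# Largest absolute gap between the reported and the reconstructed value
max_abs_diff <- max(abs(sample_rows$Condition - sample_rows$Condition_check))
print(sample_rows)
```

If the same check holds on the full `cdc` data frame, Condition and RPL_Themes carry the same information on opposite scales.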

Quintiles for this condition were determined across all Census tracts in King County. Quintile 1 represents the most vulnerable residents; Quintile 5 represents the least vulnerable.
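The source does not say how the quintile breaks were computed. One common approach, shown here as a minimal sketch on made-up values, is to cut Condition at its 20th, 40th, 60th, and 80th percentiles, so that the lowest fifth (most vulnerable) lands in Quintile 1. The vector `condition` and the choice of `quantile()` with `cut()` are assumptions for illustration.

```r
# Hypothetical Condition values for ten tracts (not taken from the dataset)
condition <- c(10, 25, 30, 45, 50, 62, 70, 81, 90, 95)

# Break points at the 0th, 20th, 40th, 60th, 80th, and 100th percentiles
breaks <- quantile(condition, probs = seq(0, 1, by = 0.2))

# Assign each tract to a quintile; 1 = lowest Condition = most vulnerable
quintile <- cut(condition, breaks = breaks, include.lowest = TRUE, labels = FALSE)

# Count of tracts per quintile
table(quintile)
```

With heavily tied values the break points can coincide, in which case `cut()` stops with an error about non-unique breaks and a rank-based rule would be needed instead.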

Data are released every two years, after the American Community Survey (ACS) release each December of the year following the survey. The most recent data used here, for 2018, were downloaded from the ATSDR website.



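# Load packages: dplyr for %>% and select(), ggplot2 for the plots below
library(dplyr)
library(ggplot2)

# Load dataset
cdc <- read.csv("CDC_Social_Vulnerability_Index__CDCSVI_.csv")

# Drop `the_geom`, a Well-Known Text (WKT) geometry column
cdc <- cdc %>%
  select(-the_geom)

# Show the first 6 rows (the default for head())
head(cdc)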
##   Condition Condition_TotalPop F_TOTAL   FeatureID Geography Name_Geography
## 1     56.06           2708.819       0 53033001702    County    King County
## 2     30.21           2030.414       1 53033000401    County    King County
## 3     71.17           2849.647       0 53033001701    County    King County
## 4     65.00           2742.350       0 53033029307    County    King County
## 5     57.38           1975.593       0 53033029306    County    King County
## 6     92.58           4391.995       0 53033002900    County    King County
##   NotSociallyVulnerable OBJECTID Quintile RPL_Themes Shape__Area Shape__Length
## 1                     0       19        3     0.4394    13779307      17476.41
## 2                     0        4        2     0.6979    14673110      15741.89
## 3                     0       18        3     0.2883     9609032      17164.46
## 4                     0      279        3     0.3500    41663148      26347.76
## 5                     0      278        3     0.4262    27509913      20992.20
## 6                     0       30        5     0.0742    10614781      13366.32
##   TotalPopulation WeightedAvgQuintile Year
## 1            4832           0.6342126 2018
## 2            6721           0.3792233 2018
## 3            4004           0.6342126 2018
## 4            4219           0.6342126 2018
## 5            3443           0.6342126 2018
## 6            4744           0.9520116 2018


Data exploration


# Histogram of total population per tract
hist(cdc$TotalPopulation, main = "Total Population Histogram", col = "blue",
     xlab = "Total Population", ylab = "Frequency")

# Histogram of census tract areas on a log10 x-axis
ggplot(cdc, aes(x = Shape__Area)) +
  geom_histogram(fill = "yellow", color = "black", bins = 30) +
  scale_x_log10() +
  labs(
    title = "Distribution of Census Tract Areas",
    x = "Shape Area (Log Transform)",
    y = "Count"
  ) +
  theme_classic() +
  theme(plot.title = element_text(hjust = 0.5, face = "bold"))


# Bar chart of Condition_TotalPop by quintile
# (geom_col stacks the tracts, so each bar is the sum within that quintile)
ggplot(data = cdc, aes(x = Quintile, y = Condition_TotalPop)) +
  geom_col(fill = "lightgreen") + 
  labs(
    title = "Total Population Condition Level",
    subtitle = "By Quintile",
    x = "Quintile",
    y = "Total Condition of Population"
  ) + 
  theme_classic() +
  theme(
    plot.title = element_text(hjust = 0.5, face = "bold"),
    plot.subtitle = element_text(hjust = 0.5, face = "bold")
  )



The bar chart suggests that larger, more developed areas tend to have more resources, which lowers vulnerability. Conversely, tracts with small populations, which are likely rural, appear more vulnerable, possibly because they lack resources.
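When reading the bar chart, it may help to see how Condition_TotalPop relates to the other columns. In the six rows printed by head(cdc), it matches Condition × TotalPopulation / 100 to within rounding, so bar heights reflect both how many people live in a quintile's tracts and their Condition scores. The sketch below checks that relationship on those rows only. It is inferred from the printed values, not documented by the data source, and `sample_rows` is a hypothetical name.

```r
# First six rows as printed by head(cdc) in this report
sample_rows <- data.frame(
  Condition          = c(56.06, 30.21, 71.17, 65.00, 57.38, 92.58),
  TotalPopulation    = c(4832, 6721, 4004, 4219, 3443, 4744),
  Condition_TotalPop = c(2708.819, 2030.414, 2849.647, 2742.350, 1975.593, 4391.995)
)

# Assumed relationship (inferred, not documented):
# Condition_TotalPop = Condition * TotalPopulation / 100
weighted_check <- sample_rows$Condition * sample_rows$TotalPopulation / 100

# Largest absolute gap; small gaps come from rounding in the printed output
max_abs_diff <- max(abs(sample_rows$Condition_TotalPop - weighted_check))
max_abs_diff
```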


RPL_Themes is the overall social vulnerability score for each census tract. It combines socioeconomic status, household composition, minority status, and housing/transportation factors into a single percentile ranking. Higher values indicate greater vulnerability relative to other tracts.

# Scatterplot of Condition_TotalPop against RPL_Themes with a linear fit
ggplot(data = cdc, aes(x = RPL_Themes, y = Condition_TotalPop)) +
  geom_point(color = "black", alpha = 0.7, size = 2) +
  stat_smooth(method = "lm", color = "red", se = TRUE) +
  labs(
    title = "RPL Themes vs Condition",
    x = "RPL Themes",
    y = "Condition of Total Population"
  ) +
  theme_classic() +
  theme(
    plot.title = element_text(hjust = 0.5, face = "bold")
  )

The scatterplot shows a clear negative relationship between overall social vulnerability (RPL_Themes) and Condition_TotalPop. More vulnerable tracts tend to have lower condition values. This mirrors the pattern in the quintile bar plot, where the most vulnerable groups consistently showed lower population condition levels.
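The strength of this relationship can be quantified with a Pearson correlation and the slope of the same kind of linear fit that `stat_smooth(method = "lm")` draws. The sketch below runs on the six rows printed by head(cdc) so that it is self-contained. Replacing `sample_rows` (a hypothetical name) with the full `cdc` data frame would give the values behind the plot.

```r
# First six rows as printed by head(cdc) in this report
sample_rows <- data.frame(
  RPL_Themes         = c(0.4394, 0.6979, 0.2883, 0.3500, 0.4262, 0.0742),
  Condition_TotalPop = c(2708.819, 2030.414, 2849.647, 2742.350, 1975.593, 4391.995)
)

# Pearson correlation between vulnerability and population-weighted condition
r <- cor(sample_rows$RPL_Themes, sample_rows$Condition_TotalPop)

# Ordinary least squares fit; the slope is the change in Condition_TotalPop
# per unit increase in RPL_Themes
fit <- lm(Condition_TotalPop ~ RPL_Themes, data = sample_rows)
slope <- unname(coef(fit)["RPL_Themes"])

c(correlation = r, slope = slope)
```

Six rows are far too few to draw conclusions from; the sample only shows that both quantities come out negative, in line with the plot.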


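# Boxplot of Condition by SVI quintile
# (Quintile is converted to a factor so each level gets its own box)
ggplot(cdc, aes(x = factor(Quintile), y = Condition)) +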
  geom_boxplot(fill = "green") +
  labs(
    title = "Condition Distribution by SVI Quintile",
    x = "Social Vulnerability Index Quintile",
    y = "Condition"
  ) +
  theme_classic() +
  theme(plot.title = element_text(hjust = 0.5, face = "bold"))

Less vulnerable communities (Quintile 5) tend to have higher Condition values, which correspond to lower vulnerability.
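The centre line of each box can also be tabulated directly. This minimal sketch computes the median Condition per quintile with base R's `aggregate()`, again using only the six rows printed by head(cdc); `sample_rows` and `med` are hypothetical names, and the full `cdc` data frame would be needed to reproduce the boxplot.

```r
# First six rows as printed by head(cdc) in this report
sample_rows <- data.frame(
  Condition = c(56.06, 30.21, 71.17, 65.00, 57.38, 92.58),
  Quintile  = c(3, 2, 3, 3, 3, 5)
)

# Median Condition within each quintile present in the sample
med <- aggregate(Condition ~ Quintile, data = sample_rows, FUN = median)
med
```

Quintiles 1 and 4 do not occur in these six rows, so they are absent from the result.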