The Impacts of COVID-19 Policy on People’s Subjective Well-being: A Clustering Analysis

1. Variables used for clustering and the rationale behind their selection

To cluster the data set, I selected the following variables: Life.Satisfaction, school_closing_x, workplace_closing_x, cancel_events_x, stay_home_restrictions_x, and demo_age.

* Rationale: The primary goal of this analysis is to explore how COVID-19 policies relate to people’s sense of well-being. Life.Satisfaction therefore serves as the core well-being metric. The four policy variables (school closures, workplace closures, canceled events, and stay-at-home restrictions) represent the range of external, government-mandated disruptions to daily life. Finally, I included demo_age because age is a critical demographic factor that influenced both a person’s vulnerability to the virus and how strongly these policy shifts affected their lifestyle and psychological state.

2. Process for Hierarchical and K-means Clustering

* Data Preparation: First, I subsetted the data to include only the relevant variables, converted every column to numeric, and removed rows with missing values (na.omit) so that the distance calculations would run cleanly. I then standardized the variables with the scale() function so that variables with larger numerical ranges (like age) would not dominate the distance calculations.
* Hierarchical Clustering: I calculated the Euclidean distances between data points and applied agglomerative hierarchical clustering using hclust with the ward.D option. I plotted a dendrogram and cut the tree into \(k=3\) clusters. To evaluate the stability of these clusters, I used the clusterboot function with 100 bootstrap resamples.
* K-Means Clustering: I first used the Mclust function from the mclust package to identify a suggested number of clusters (it suggested 5 components). However, to compare the two methods on equal footing, I fixed K-Means at \(k=3\) clusters, with a maximum of 100 iterations and 100 random starts (nstart=100). Finally, I evaluated the stability of this model using clusterboot with 100 resamples.
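The effect of standardizing can be seen on a very small example. The sketch below uses made-up values (a hypothetical 0 to 3 `policy_level` column alongside `demo_age` in years), not the survey data, and compares Euclidean distances before and after scale().

```r
# Synthetic, illustrative data only (not the survey):
# a 0-3 policy index and age in years.
toy <- data.frame(
  policy_level = c(0, 3, 0, 3),
  demo_age     = c(25, 26, 60, 61)
)
rownames(toy) <- c("A", "B", "C", "D")

# Unscaled Euclidean distances: age differences swamp policy differences.
d_raw <- as.matrix(dist(toy))

# After scale(), each column has mean 0 and standard deviation 1.
toy_scaled <- scale(toy)
d_scaled <- as.matrix(dist(toy_scaled))

# A and B differ by the full policy range; A and C differ by 35 years of age.
print(round(d_raw[c("B", "C"), "A"], 2))
print(round(d_scaled[c("B", "C"), "A"], 2))
```

On the raw values, the age gap between A and C makes them look far more different than A and B, even though A and B sit at opposite ends of the policy scale. After scaling, the two distances are of similar size, so neither variable decides the grouping on its own.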

3. Comparison of Results

The two methods did not produce equally reliable clusters.

* Hierarchical clustering yielded highly stable results. The bootstrap evaluation returned mean Jaccard values of 0.996, 0.978, and 0.978. Under the usual guideline for clusterboot output (mean Jaccard of 0.85 or above), all three clusters count as highly stable.
* K-Means clustering, by contrast, was largely unstable for a 3-cluster solution. Its bootstrap evaluation returned mean Jaccard values of 0.458, 0.478, and 0.884. Two of the three clusters fell well below 0.60, the level under which clusters are generally not trusted, while the third cleared the 0.85 mark. This indicates that K-Means struggled to find consistent groupings for two of the three clusters in this six-variable space.
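To make the stability numbers concrete, here is a minimal sketch of the Jaccard similarity that underlies them, followed by a small helper that labels the reported means. The helper `stability_label` and its label wording are my own illustration; the cut-offs (0.6, 0.75, 0.85) follow the rule of thumb given in the fpc clusterboot documentation.

```r
# Jaccard similarity between two sets of row indices:
# size of the intersection divided by size of the union.
jaccard <- function(a, b) length(intersect(a, b)) / length(union(a, b))

# Hypothetical example: a cluster holding rows 1-10 is recovered as rows 3-12
# in a resample, so 8 rows are shared out of 12 distinct rows.
j_example <- jaccard(1:10, 3:12)
print(j_example)

# Label mean Jaccard values using rule-of-thumb cut-offs (0.6, 0.75, 0.85).
stability_label <- function(x) {
  as.character(cut(
    x,
    breaks = c(-Inf, 0.6, 0.75, 0.85, Inf),
    labels = c("not trustworthy", "indicates a pattern",
               "stable", "highly stable"),
    right = FALSE
  ))
}

# Mean Jaccard values reported in the comparison above.
hc_means <- c(0.996, 0.978, 0.978)
km_means <- c(0.458, 0.478, 0.884)

print(data.frame(
  method       = rep(c("hierarchical", "k-means"), each = 3),
  mean_jaccard = c(hc_means, km_means),
  label        = stability_label(c(hc_means, km_means))
))
```

clusterboot reports, for each original cluster, the mean over resamples of its best Jaccard match, so a value near 1 means the same rows keep ending up together.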

4. Selection of the “Better” Solution and Group Descriptions

Based on its much higher bootstrap stability scores, the Hierarchical Clustering solution is the “better” solution for this dataset. Based on the partitioning of the hierarchical model, the respondents can be categorized into three groups:

* Group 1: The Heavily Disrupted. This group generally represents individuals who experienced the highest convergence of strict policy measures (simultaneous workplace closures, school closures, and strict stay-at-home orders). Their life satisfaction scores tend to reflect the strain of high environmental disruption.
* Group 2: The Moderately Restricted. This group consists of individuals who faced moderate policy interventions (e.g., event cancellations and workplace adjustments, but perhaps softer stay-at-home restrictions).
* Group 3: The Least Impacted. This group represents respondents who experienced the lowest levels of direct policy-based disruption to their daily routines. It also differs from Group 1 in demographic profile (often in age) and in baseline subjective well-being.
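Group descriptions like these are normally read off a table of per-cluster means on the original, unscaled variables. The sketch below shows one way to build such a table. It runs on synthetic data (the column names follow the report, but the values are randomly generated), so its output says nothing about the actual survey groups.

```r
set.seed(1)  # arbitrary seed for the synthetic example

# Synthetic stand-in for the survey: names follow the report, values are made up.
n <- 90
toy <- data.frame(
  Life.Satisfaction        = sample(1:10, n, replace = TRUE),
  workplace_closing_x      = sample(0:3, n, replace = TRUE),
  stay_home_restrictions_x = sample(0:3, n, replace = TRUE),
  demo_age                 = sample(18:80, n, replace = TRUE)
)

# Same steps as the report: scale, Euclidean distances, hclust, cut at k = 3.
hc <- hclust(dist(scale(toy)), method = "ward.D")
cluster <- cutree(hc, k = 3)

# Cluster sizes and per-cluster means in the original (unscaled) units.
sizes <- table(cluster)
profile <- aggregate(toy, by = list(cluster = cluster), FUN = mean)

print(sizes)
print(round(profile, 2))
```

With the real data, the same two calls would be `table(clusters_hc)` and `aggregate(mydata, by = list(cluster = clusters_hc), FUN = mean)`, using the objects defined in the appendix.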

5. Insights on the Survey Respondents

This exercise revealed that the population responding to the “wellbeing after COVID” survey is not monolithic. People’s subjective well-being is closely tied to the specific combination of restrictions they faced. The strong performance of hierarchical clustering is consistent with the idea that the impacts of COVID-19 policies are nested and cumulative: the psychological impact compounds as layers of restrictions (school + work + home) are added. It also suggests that policy impact is highly segmented, likely dividing populations along the lines of geographical severity and generational (age) vulnerability.

Appendix: R Code Used

# Setup and loading packages
library(cluster)
library(fpc)
library(mclust)
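library(dplyr)

# Fix the random seed so kmeans() and clusterboot() give repeatable results
# (the value is arbitrary and was not part of the original run)
set.seed(123)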

# Load data
covid_data <- read.csv(file.choose(), header=TRUE)

# Select variables
mydata <- covid_data %>%
  select(Life.Satisfaction, school_closing_x, workplace_closing_x, 
         cancel_events_x, stay_home_restrictions_x, demo_age)

# Force numeric conversion and clean missing values
# (non-numeric entries become NA with a warning and are dropped by na.omit)
mydata <- as.data.frame(lapply(mydata, as.numeric))
mydata <- na.omit(mydata)

# Scale data
mydata_scaled <- scale(mydata)

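# --- Hierarchical Clustering ---
distances <- dist(mydata_scaled, method="euclidean")
# Note: per the hclust documentation, "ward.D" matches Ward's criterion only when
# given squared distances; "ward.D2" is the variant meant for unsquared Euclidean
# distances. "ward.D" is kept here because the reported results were produced with it.
hc_fit <- hclust(distances, method="ward.D")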

# Visualize dendrogram
plot(hc_fit, main="Hierarchical Clustering Dendrogram", xlab="", sub="", cex=0.9)
clusters_hc <- cutree(hc_fit, k=3)
rect.hclust(hc_fit, k=3, border="red")

# Evaluate stability
hc_boot <- clusterboot(mydata_scaled, B=100, clustermethod=hclustCBI, method="ward.D", k=3, count=FALSE)
print(hc_boot$bootmean)

# --- K-Means Clustering ---
# Guess optimal clusters
guess <- Mclust(mydata_scaled)
print(summary(guess))

# Fit K-Means (k=3 for comparison)
clusters_k <- 3 
k_fit <- kmeans(mydata_scaled, centers=clusters_k, iter.max=100, nstart=100)

# Visualize K-Means
clusplot(mydata_scaled, k_fit$cluster, color=TRUE, shade=TRUE, labels=2, lines=0, main="K-Means Cluster Plot")

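# Evaluate stability
# Note: kmeansCBI refits k-means on each resample; its number of random starts is
# set by its own `runs` argument, not by the nstart=100 used for k_fit above.
km_boot <- clusterboot(mydata_scaled, B=100, clustermethod=kmeansCBI, k=clusters_k, count=FALSE)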
print(km_boot$bootmean)

Clustering Analysis Report

Yaohui

2026-04-14
