DATA 607 - Communication in Data Science

Name - Amit Sunil Chaudhary

UID - 121232120

An Analysis of Elderly Protection and Economic Support Policies

Abstract

This DATA 607 (Communication in Data Science) project examines the relationship between government financial support to the general public during COVID-19 and its effect on population mobility and subsequent disease spread. Using data from the COVID19 R package combined with Google mobility reports spanning 131 countries from 2020-2022, I analyzed how elderly protection measures and economic support policies influenced outdoor mobility patterns and case trajectories. The multivariate analysis employs k-means clustering to identify distinct policy response patterns, multiple regression models to quantify policy effects with uncertainty measures, and logistic regression classification with ROC analysis to predict high-case periods.

Introduction

The COVID-19 pandemic caused panic worldwide, and differing governmental responses set off chain reactions that in some cases produced good results and in others failed outright. Understanding the effectiveness of these interventions is crucial for informing future pandemic preparedness strategies.


This analysis focuses on two policy dimensions: elderly protection measures and economic support indices. It examines how these government actions influenced population mobility, a key behavioral indicator linked to virus transmission, and in turn the trajectory of COVID-19 cases.


This report addresses three questions:

1. How do elderly protection policies correlate with outdoor mobility patterns?

2. Does economic support influence people’s ability to reduce their mobility?

3. What is the relationship between mobility and COVID-19 case counts?

Data and Methods


Data was obtained from the COVID19 R package, which aggregates information from multiple authoritative sources including Johns Hopkins University, Oxford COVID-19 Government Response Tracker, and Google Mobility Reports. The dataset contains 287,783 country-day observations covering 131 countries from January 2020 to October 2024.


Data cleaning steps:

  1. Converted the date columns to the proper Date type and ensured that text columns were stored as character.

  2. Removed rows with missing country identifiers or dates and eliminated duplicate records.

  3. Replaced negative cumulative case/death counts with NA.

  4. Capped extreme mobility values at -100% and flagged values exceeding +300% as potential errors.

  5. Calculated daily cases from cumulative totals, 7-day moving averages, infection rates per 100,000 population, and lagged mobility variables.

The final analysis dataset contained approximately 100,000 complete observations across 131 countries.
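The cleaning steps listed above can be illustrated with a minimal sketch. It is written in plain Python rather than the R used for the project, and it is not the project's actual code. The function name clean_country, the dictionary keys, and the assumption that consecutive rows are consecutive calendar days are all hypothetical choices made for illustration.

```python
from datetime import date

def clean_country(rows, population):
    """Clean one country's records (illustrative sketch only).

    rows: list of dicts with hypothetical keys 'date' (ISO string or None),
    'confirmed' (cumulative cases or None), 'mobility' (% change or None).
    Assumes that, after sorting, consecutive rows are consecutive days.
    """
    cleaned, seen = [], set()
    for r in rows:
        if not r.get("date"):                  # step 2: drop rows without a date
            continue
        day = date.fromisoformat(r["date"])    # step 1: parse into a real date type
        if day in seen:                        # step 2: drop duplicate days
            continue
        seen.add(day)
        confirmed = r.get("confirmed")
        if confirmed is not None and confirmed < 0:
            confirmed = None                   # step 3: negative cumulative -> missing
        mobility = r.get("mobility")
        flagged = mobility is not None and mobility > 300   # step 4: flag > +300%
        if mobility is not None:
            mobility = max(mobility, -100.0)   # step 4: floor at -100%
        cleaned.append({"date": day, "confirmed": confirmed,
                        "mobility": mobility, "mobility_flag": flagged})
    cleaned.sort(key=lambda r: r["date"])

    # step 5: daily cases, trailing 7-day mean, and rate per 100,000 population
    for i, row in enumerate(cleaned):
        prev = cleaned[i - 1]["confirmed"] if i > 0 else None
        cur = row["confirmed"]
        row["daily"] = cur - prev if cur is not None and prev is not None else None
        window = [c["daily"] for c in cleaned[max(0, i - 6): i + 1]]
        complete = len(window) == 7 and all(v is not None for v in window)
        row["daily_ma7"] = sum(window) / 7 if complete else None
        row["daily_per_100k"] = (row["daily"] / population * 100_000
                                 if row["daily"] is not None else None)
    return cleaned
```

The moving average is left missing until seven complete daily values are available, which is one reasonable convention among several.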

Variables

Outcome Variables -

Daily COVID-19 cases, 7-day moving average of cases, binary high-case period indicator (above median cases per million).

Policy Predictors -

Elderly protection level (0-3 scale), economic support index (0-100), stringency index (0-100).

Mobility Variables - 

Outdoor mobility index (composite of retail, parks, and transit changes from baseline), residential mobility, and 14/21-day lagged versions.
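As a rough illustration of the mobility variables, the sketch below builds a composite index and its lagged versions. The report does not state how the three components were combined, so the unweighted mean used here is an assumption, and the function names outdoor_index and lag are hypothetical.

```python
def outdoor_index(retail, parks, transit):
    """Assumed composite: unweighted mean of whichever components are present."""
    parts = [v for v in (retail, parks, transit) if v is not None]
    return sum(parts) / len(parts) if parts else None

def lag(series, days):
    """Shift a daily series so position t holds the value observed at t - days."""
    pad = min(days, len(series))
    return [None] * pad + series[:max(len(series) - days, 0)]

# Example: a 14-day lag pairs each day's cases with mobility two weeks earlier.
mobility = [float(v) for v in range(20)]
mobility_lag14 = lag(mobility, 14)
```

Lagging within each country separately matters in practice, since shifting a stacked multi-country series would leak values across country boundaries.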

Statistical Methods 

EDA -

Exploratory analysis examined how the key factors evolved over time, including changes in mobility patterns, sudden spikes in COVID-19 cases, and policy changes, and how these factors relate to one another.

K Means Clustering -

Countries were grouped into clusters based on their average stringency, economic support, elderly protection, and mobility. Variables were standardized prior to clustering, and the elbow method suggested k=4 clusters.
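The clustering step can be sketched as follows. This is a plain-Python version of Lloyd's algorithm for illustration, not the implementation used in the project, and the function names are hypothetical. Standardization uses the sample standard deviation (n - 1), and the within-cluster sum of squares (WSS) returned by kmeans is the quantity an elbow plot would show against k.

```python
import random

def standardize(rows):
    """Z-score each column so no single variable dominates the distance."""
    cols = list(zip(*rows))
    means = [sum(c) / len(c) for c in cols]
    # sample SD; a constant column falls back to 1.0 to avoid division by zero
    sds = [(sum((v - m) ** 2 for v in c) / (len(c) - 1)) ** 0.5 or 1.0
           for c, m in zip(cols, means)]
    return [[(v - m) / s for v, m, s in zip(r, means, sds)] for r in rows]

def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(points, k, seed=0, iters=100):
    """Lloyd's algorithm; returns (labels, within-cluster sum of squares)."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)          # random distinct starting points
    labels = None
    for _ in range(iters):
        new_labels = [min(range(k), key=lambda j: sq_dist(p, centers[j]))
                      for p in points]
        if new_labels == labels:             # assignments stable: converged
            break
        labels = new_labels
        for j in range(k):
            members = [p for p, l in zip(points, labels) if l == j]
            if members:                      # keep old center if a cluster empties
                centers[j] = [sum(c) / len(members) for c in zip(*members)]
    wss = sum(sq_dist(p, centers[l]) for p, l in zip(points, labels))
    return labels, wss
```

For an elbow check, one would run kmeans for k = 1, 2, 3, ... on the standardized country averages and look for the k beyond which WSS stops dropping sharply.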

Regression Analysis -

Three models examined, respectively, the effect of policy on mobility, the effect of lagged mobility on cases, and interaction effects. All models were reported with 95% confidence intervals, along with bootstrap confidence intervals for key coefficients.
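To show what a bootstrap confidence interval for a regression coefficient involves, here is a minimal sketch using a percentile bootstrap. For brevity it fits a single-predictor regression, whereas the report's models have several predictors. The function names, the number of resamples, and the choice of the percentile method are assumptions, since the report does not specify them.

```python
import random

def ols_slope(x, y):
    """Least-squares slope of y on a single predictor x."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxx = sum((a - mx) ** 2 for a in x)
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / sxx

def bootstrap_ci(x, y, n_boot=2000, alpha=0.05, seed=1):
    """Percentile bootstrap CI for the slope, resampling (x, y) pairs."""
    rng = random.Random(seed)
    n = len(x)
    slopes = []
    while len(slopes) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]   # sample rows with replacement
        xs = [x[i] for i in idx]
        if len(set(xs)) < 2:                         # skip resamples with no spread in x
            continue
        slopes.append(ols_slope(xs, [y[i] for i in idx]))
    slopes.sort()
    lo = slopes[int((alpha / 2) * n_boot)]
    hi = slopes[int((1 - alpha / 2) * n_boot) - 1]
    return lo, hi
```

With panel data like country-day observations, resampling whole countries rather than individual rows would better respect the dependence within each country. That is a design consideration and not something the report describes.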

Classification - 

Logistic regression predicted binary high-case periods using lagged mobility and policy features. Model performance was evaluated using ROC curve analysis with AUC, along with sensitivity, specificity, and precision metrics on a held-out 30% test set.
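The evaluation metrics named above can be computed as in the following sketch. It covers only the evaluation side (the 70/30 split, AUC, and threshold-based metrics) and does not fit the logistic regression. All function names are hypothetical, and the AUC is computed through its rank interpretation, which is equivalent to the area under the ROC curve.

```python
import random

def train_test_split(rows, test_frac=0.3, seed=0):
    """Shuffle and hold out a fraction of rows for testing."""
    rows = list(rows)
    random.Random(seed).shuffle(rows)
    n_test = int(round(len(rows) * test_frac))
    return rows[n_test:], rows[:n_test]

def auc(scores, labels):
    """Probability that a random positive outranks a random negative (ties = 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def threshold_metrics(scores, labels, threshold=0.5):
    """Accuracy, sensitivity, specificity, and precision at a given cutoff."""
    tp = fp = tn = fn = 0
    for s, y in zip(scores, labels):
        pred = 1 if s >= threshold else 0
        if pred == 1 and y == 1:
            tp += 1
        elif pred == 1 and y == 0:
            fp += 1
        elif pred == 0 and y == 0:
            tn += 1
        else:
            fn += 1
    return {
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "sensitivity": tp / (tp + fn) if tp + fn else None,
        "specificity": tn / (tn + fp) if tn + fp else None,
        "precision": tp / (tp + fp) if tp + fp else None,
    }
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect separation, which gives a scale for reading the value reported in the results.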

Results

The dataset revealed substantial variation across all key variables. The outdoor mobility index ranged from -100% to over +200% change from baseline, with a mean near -5%, indicating overall reduced mobility during the pandemic period. Daily cases were extremely right-skewed, necessitating a log transformation for regression analysis. The stringency index averaged 45.5 (SD = 24.3), while economic support averaged 33.4 (SD = 32.1), reflecting considerable heterogeneity in government responses.

The boxplot analysis revealed a clear monotonic relationship between elderly protection levels and mobility. Countries with no elderly protection measures showed average outdoor mobility of +22.4% relative to baseline (SD = 45.7), while those with mandatory protection for all showed -20.1% (SD = 30.6).

The box plot for economic support strongly suggests that higher economic support is associated with a lower average outdoor mobility index, supporting the hypothesis that more economic support may enable people to reduce their outdoor movement and stay at home.

Model 1 quantified these relationships. Each unit increase in elderly protection level was associated with a 5.86 percentage point decrease in outdoor mobility (95% CI: -6.07 to -5.65, p < 0.001). The stringency index showed the strongest effect (β = -0.76, p < 0.001), while economic support had a smaller but significant negative association (β = -0.12, p < 0.001). The model explained 32.6% of the variance in outdoor mobility (R² = 0.326).

Model 2 examined the lagged relationship between mobility and cases. Outdoor mobility 14 days prior showed a small but significant negative association with log-transformed cases (β = -0.0066, 95% CI: -0.0071 to -0.0061). This counter-intuitive finding may reflect confounding: countries with high case burdens implemented stricter measures that reduced mobility. The model’s low R² (0.050) indicates that lagged mobility alone explains little variance in case outcomes, highlighting the complex, multifactorial nature of disease transmission.
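To give a sense of scale for the Model 2 coefficient, the sketch below converts a coefficient on log-transformed cases into an approximate percentage change. It assumes the transformation was a natural logarithm, which the report does not state. The helper name pct_change_in_cases is hypothetical, and the result describes an association, not a causal effect.

```python
import math

def pct_change_in_cases(beta, delta_mobility):
    """Percent change in cases implied by a log-linear coefficient.

    Assumes cases entered the model as a natural log, so a change of
    delta_mobility multiplies expected cases by exp(beta * delta_mobility).
    """
    return 100.0 * (math.exp(beta * delta_mobility) - 1.0)

# Point estimate and 95% CI bounds from Model 2, for a 10-point mobility increase
for beta in (-0.0071, -0.0066, -0.0061):
    print(beta, round(pct_change_in_cases(beta, 10), 2))
```

Under that assumption, a 10 percentage point increase in lagged outdoor mobility corresponds to roughly 6.4% lower cases at the point estimate, which shows how small the association is relative to the variation in case counts.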

Model 3 examined interaction effects. The interaction between mobility and elderly protection was significant and positive (β = 0.0053, p < 0.001), indicating that the relationship between mobility and cases varied across protection levels.

In the clustering analysis, Cluster 1 (low-response countries) maintained high mobility but had the lowest case rates, potentially reflecting geographic isolation or early pandemic timing. Cluster 2 (high economic support) showed reduced mobility but higher case rates, suggesting these were often high-income countries with extensive testing. Cluster 4 (strict policies, low support) achieved the greatest mobility reductions and the lowest case rates among active-response clusters.

The logistic regression classifier achieved an AUC of 0.597 on the held-out test set, indicating modest discriminative ability. At the optimal threshold, the model achieved 57.8% accuracy, 58.2% sensitivity, and 57.4% specificity. While performance exceeded random chance, the moderate AUC suggests that policy and mobility features alone are insufficient for reliably predicting high-case periods, likely due to unmeasured confounders such as testing capacity, variant emergence, and population immunity.

Conclusions

Key findings from this analysis include:

1. Elderly protection policies showed strong, graded associations with reduced outdoor mobility, with each protection level corresponding to approximately 6 percentage points lower mobility.

2. Economic support demonstrated modest negative associations with mobility, potentially enabling populations to reduce movement without economic hardship.

3. K-means clustering identified four distinct national policy response patterns, revealing heterogeneity in the relationship between policy stringency, economic support, and outcomes.

4. The lagged mobility-case relationship was complex and confounded, with the regression model explaining only 5% of variance in case outcomes.

5. Classification of high-case periods achieved modest performance (AUC = 0.597), indicating policy and mobility features provide useful but incomplete predictive information.

These findings contribute to our understanding of pandemic policy effectiveness while highlighting the challenges of drawing causal conclusions from observational data during a complex, evolving public health emergency.