Erin Wright

Opening Thoughts

Vaccines have recently begun circulating after the FDA issued EUA (Emergency Use Authorization) approval1 for the Pfizer and Moderna vaccines. Many employers have launched surveys to gauge employee support and are strongly suggesting vaccination, or in some cases have announced plans to require it.

This analysis explores data available from the VAERS (Vaccine Adverse Event Reporting System)2 related to COVID-19.

This data set does not provide a way to compare the total number of vaccines administered to the number of adverse events reported. A good next step would be to incorporate the vaccine administration data I found available here3.
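As a rough sketch of what that normalization could look like once doses-administered data is pulled in (the `doses_admin` table and its `manufacturer`/`doses` columns are placeholders, not an actual schema; `VAX_MANU` is the manufacturer field from the VAERS vaccine file):

library(dplyr)

# VAERS reports per manufacturer
reports_by_mfr <- vaers_covid %>%
  count(VAX_MANU, name = "reports")

# Join to a hypothetical doses-administered table and compute a rate per 100k doses
reports_by_mfr %>%
  left_join(doses_admin, by = c("VAX_MANU" = "manufacturer")) %>%
  mutate(reports_per_100k_doses = reports / doses * 1e5)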

Also, I did not normalize the data for each company; Pfizer is larger and distributing more doses, so this analysis cannot be used to say one company has more or less of anything than the other without further data preparation.

WorldOMeters is a recommended source for current COVID-19 case statistics. As of 2021-01-30, the COVID-19 death rate in the USA is 1,339 persons per 1 million population, or about 0.13%. Looking at the numbers flat, about 2% of confirmed cases have resulted in death thus far in the USA, and 2.16% for the world.

Overview

The VAERS data contains 2213 events reported from January 1, 2021 to January 22, 2021, and 92.77% of these are from the new COVID-19 vaccines.

Of the 2053 reported adverse events related to COVID-19, 9.79% resulted in life-threatening illness, and 272 persons, or 13.25% of reported cases, died.
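These counts can be reproduced with a few lines of dplyr, assuming the January VAERS files have already been merged into a single data frame (here called `vaers`) carrying the standard VAERS fields `VAX_TYPE`, `L_THREAT`, and `DIED`:

library(dplyr)

# Subset to the COVID-19 vaccine reports
covid <- vaers %>% filter(VAX_TYPE == "COVID19")

nrow(vaers)                                             # 2213 reported events
nrow(covid) / nrow(vaers)                               # ~0.9277 from COVID-19 vaccines
sum(covid$L_THREAT == "Y", na.rm = TRUE) / nrow(covid)  # ~0.0979 life-threatening
sum(covid$DIED == "Y", na.rm = TRUE)                    # 272 reported deaths
sum(covid$DIED == "Y", na.rm = TRUE) / nrow(covid)      # ~0.1325 of reports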

Delivery Comparisons4

Number of Vaccines

Demographics

Life-threatening and Death

Life-threatening

Demographics

Reported Death

Demographics

What could be contributing?

Correlations

There appear to be no strong correlations between any of the variables in general for those who reported…

…but age and length of hospital stay appear to be correlated with deaths.
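A minimal sketch of that correlation check, assuming a numeric-only view of the cleaned data and a 0/1 death indicator (here `died_num`) that I add for illustration; `AGE_YRS` and `HOSPDAYS` are the VAERS fields for age and hospital days:

library(dplyr)
library(corrplot)

# Keep only numeric columns, then compute pairwise correlations
vaers_num <- vaers_covid %>% select(where(is.numeric))
cor_mat   <- cor(vaers_num, use = "pairwise.complete.obs")

corrplot(cor_mat, method = "color", type = "upper", tl.cex = 0.7)

# Spot-check the pair called out above against the death indicator
cor_mat[c("AGE_YRS", "HOSPDAYS"), "died_num"]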

Model

I decided to explore a little further and apply a model to see whether there are enough variables in this data set to build a predictive model. I did not spend time researching or testing to find the best model. Based on the categorical nature of the data, I picked the SVM5 model, and it did not disappoint.
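The output below is caret's summary of the fit. As a minimal sketch of how such a model could be set up (assuming the cleaned data frame is called `model_data`, the outcome is a two-level factor `died` with levels 'No'/'Yes', and a 70/30 train/test split, which is consistent with the 1438 training samples shown):

library(caret)

set.seed(42)  # arbitrary seed for reproducibility

# 70/30 train/test split stratified on the outcome
idx       <- createDataPartition(model_data$died, p = 0.7, list = FALSE)
train_set <- model_data[idx, ]
test_set  <- model_data[-idx, ]

# Linear-kernel SVM with 10-fold cross-validation; C stays at its default of 1
svm_fit <- train(
  died ~ .,
  data      = train_set,
  method    = "svmLinear",
  trControl = trainControl(method = "cv", number = 10)
)
svm_fit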

## Support Vector Machines with Linear Kernel 
## 
## 1438 samples
##   80 predictor
##    2 classes: 'No', 'Yes' 
## 
## No pre-processing
## Resampling: Cross-Validated (10 fold) 
## Summary of sample sizes: 1295, 1294, 1293, 1295, 1295, 1294, ... 
## Resampling results:
## 
##   Accuracy   Kappa    
##   0.9346438  0.6775986
## 
## Tuning parameter 'C' was held constant at a value of 1

I cleaned up the data set a little to run it through (shared below), and the SVM model was able to produce an accuracy of 93.82% on the hold-out set.

Looking at the ROC curve plot, we don’t see a tight curve in the top-left corner indicating high specificity and sensitivity, so this is a ‘pretty good’ but not great model.
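A sketch of how such an ROC curve can be drawn with pROC, assuming the model was trained with `classProbs = TRUE` in `trainControl()` so class probabilities are available, and continuing with the `svm_fit`/`test_set` names from the sketch above:

library(pROC)

# Predicted probability of the 'Yes' (died) class on the hold-out set
probs <- predict(svm_fit, newdata = test_set, type = "prob")[, "Yes"]

roc_obj <- roc(response = test_set$died, predictor = probs, levels = c("No", "Yes"))
plot(roc_obj, print.auc = TRUE)  # a curve hugging the top-left corner would indicate a stronger model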

The confusion matrix indicates that the model did well at predicting the true values overall, but it also produced some false negatives (27 ‘Yes’ cases predicted as ‘No’). The p-value also indicates the results are statistically significant, with the accuracy exceeding the no-information rate.

## Confusion Matrix and Statistics
## 
##           Reference
## Prediction  No Yes
##        No  523  27
##        Yes  11  54
##                                           
##                Accuracy : 0.9382          
##                  95% CI : (0.9162, 0.9559)
##     No Information Rate : 0.8683          
##     P-Value [Acc > NIR] : 1.405e-08       
##                                           
##                   Kappa : 0.7051          
##                                           
##  Mcnemar's Test P-Value : 0.01496         
##                                           
##             Sensitivity : 0.9794          
##             Specificity : 0.6667          
##          Pos Pred Value : 0.9509          
##          Neg Pred Value : 0.8308          
##              Prevalence : 0.8683          
##          Detection Rate : 0.8504          
##    Detection Prevalence : 0.8943          
##       Balanced Accuracy : 0.8230          
##                                           
##        'Positive' Class : No              
## 
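For reference, that table comes from caret’s `confusionMatrix()` applied to the hold-out predictions (a sketch, again assuming the `svm_fit` and `test_set` names used above):

preds <- predict(svm_fit, newdata = test_set)
confusionMatrix(preds, test_set$died)  # 'No' is the first factor level, so caret treats it as the positive class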

I wanted to see what factors could be important. Interestingly, a person’s age and how the vaccine was administered (intramuscular, subcutaneous, etc.) were the most important predictors.
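A ranked view of predictor importance for a caret model can be pulled with `varImp()`; for an SVM, caret falls back to a model-free, filter-based ranking, so it is a rough guide rather than a property of the fitted model. A sketch:

imp <- varImp(svm_fit)  # filter-based importance when the model has none built in
plot(imp, top = 10)     # age and route of administration were the top predictors in this analysis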

To illustrate that there was variance in delivery, I plotted this as well, though I think the data recorded for this variable could be misleading. This is supported by the data interpretation guide6: for example, ‘syringe’ is recorded for IM (intramuscular) delivery as well as for other routes. This leaves age and whether allergies were noted as the likely factors.
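One way such a plot can be drawn with ggplot2 (a sketch; `VAX_ROUTE` and `VAX_MANU` are the VAERS fields for route of administration and manufacturer, and the manufacturer breakdown is my own choice of fill):

library(ggplot2)

ggplot(vaers_covid, aes(x = VAX_ROUTE, fill = VAX_MANU)) +
  geom_bar(position = "dodge") +
  labs(x = "Route of administration", y = "Reports",
       title = "Reported delivery route by manufacturer")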

Data Preparation Notes

Model Clean Data
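A sketch of the kind of clean-up step used before modeling. The column names come from the VAERS data guide, but the exact set of variables kept here and the recoding choices are assumptions for illustration:

library(dplyr)
library(tidyr)

model_data <- vaers_covid %>%
  # keep a modeling-friendly subset of fields
  select(DIED, AGE_YRS, SEX, VAX_MANU, VAX_ROUTE, VAX_SITE,
         L_THREAT, HOSPDAYS, ALLERGIES) %>%
  mutate(
    # recode the outcome as a two-level factor ('Y' or blank in the raw data)
    died = factor(if_else(!is.na(DIED) & DIED == "Y", "Yes", "No")),
    # treat remaining character fields as categorical predictors
    across(where(is.character), as.factor)
  ) %>%
  select(-DIED) %>%
  drop_na(AGE_YRS)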

Data Updated through: 01/22/2021