Covid-19 Data Analysis

Introduction

This report primarily analyzes COVID-19 data between 2020 and 2024 from the CDC, focusing on how vaccinations, seasonal trends, and travel patterns correlate with Covid-19 cases and deaths. A second dataset was used for travel patterns. For your convenience, the summary statistics for the Covid-19 dataset are shown below.

summary(df)
##    iso_code          continent           location              date           
##  Length:429435      Length:429435      Length:429435      Min.   :2020-01-01  
##  Class :character   Class :character   Class :character   1st Qu.:2021-03-05  
##  Mode  :character   Mode  :character   Mode  :character   Median :2022-04-20  
##                                                           Mean   :2022-04-21  
##                                                           3rd Qu.:2023-06-08  
##                                                           Max.   :2024-08-14  
##                                                                               
##   total_cases          new_cases        new_cases_smoothed  total_deaths    
##  Min.   :        0   Min.   :       0   Min.   :      0    Min.   :      0  
##  1st Qu.:     6281   1st Qu.:       0   1st Qu.:      0    1st Qu.:     43  
##  Median :    63653   Median :       0   Median :     12    Median :    799  
##  Mean   :  7365292   Mean   :    8017   Mean   :   8041    Mean   :  81260  
##  3rd Qu.:   758272   3rd Qu.:       0   3rd Qu.:    313    3rd Qu.:   9574  
##  Max.   :775866783   Max.   :44236227   Max.   :6319461    Max.   :7057132  
##  NA's   :17631       NA's   :19276      NA's   :20506      NA's   :17631    
##    new_deaths        new_deaths_smoothed total_cases_per_million
##  Min.   :     0.00   Min.   :    0.000   Min.   :     0         
##  1st Qu.:     0.00   1st Qu.:    0.000   1st Qu.:  1916         
##  Median :     0.00   Median :    0.000   Median : 29146         
##  Mean   :    71.85   Mean   :   72.061   Mean   :112096         
##  3rd Qu.:     0.00   3rd Qu.:    3.143   3rd Qu.:156770         
##  Max.   :103719.00   Max.   :14817.000   Max.   :763599         
##  NA's   :18827       NA's   :20057       NA's   :17631          
##  new_cases_per_million new_cases_smoothed_per_million total_deaths_per_million
##  Min.   :     0.0      Min.   :    0.00               Min.   :   0.00         
##  1st Qu.:     0.0      1st Qu.:    0.00               1st Qu.:  24.57         
##  Median :     0.0      Median :    2.79               Median : 295.09         
##  Mean   :   122.4      Mean   :  122.71               Mean   : 835.51         
##  3rd Qu.:     0.0      3rd Qu.:   56.25               3rd Qu.:1283.82         
##  Max.   :241758.2      Max.   :34536.89               Max.   :6601.11         
##  NA's   :19276         NA's   :20506                  NA's   :17631           
##  new_deaths_per_million new_deaths_smoothed_per_million reproduction_rate
##  Min.   :  0.000        Min.   :  0.000                 Min.   :-0.07    
##  1st Qu.:  0.000        1st Qu.:  0.000                 1st Qu.: 0.72    
##  Median :  0.000        Median :  0.000                 Median : 0.95    
##  Mean   :  0.762        Mean   :  0.765                 Mean   : 0.91    
##  3rd Qu.:  0.000        3rd Qu.:  0.357                 3rd Qu.: 1.14    
##  Max.   :893.655        Max.   :127.665                 Max.   : 5.87    
##  NA's   :18827          NA's   :20057                   NA's   :244618   
##   icu_patients    icu_patients_per_million hosp_patients   
##  Min.   :    0    Min.   :  0.0            Min.   :     0  
##  1st Qu.:   21    1st Qu.:  2.3            1st Qu.:   186  
##  Median :   90    Median :  6.4            Median :   776  
##  Mean   :  661    Mean   : 15.7            Mean   :  3912  
##  3rd Qu.:  413    3rd Qu.: 18.8            3rd Qu.:  3051  
##  Max.   :28891    Max.   :180.7            Max.   :154497  
##  NA's   :390319   NA's   :390319           NA's   :388779  
##  hosp_patients_per_million weekly_icu_admissions
##  Min.   :   0.0            Min.   :   0.0       
##  1st Qu.:  31.0            1st Qu.:  17.0       
##  Median :  74.2            Median :  92.0       
##  Mean   : 126.0            Mean   : 317.9       
##  3rd Qu.: 159.8            3rd Qu.: 353.0       
##  Max.   :1526.8            Max.   :4838.0       
##  NA's   :388779            NA's   :418442       
##  weekly_icu_admissions_per_million weekly_hosp_admissions
##  Min.   :  0.0                     Min.   :     0        
##  1st Qu.:  1.5                     1st Qu.:   223        
##  Median :  4.6                     Median :   864        
##  Mean   :  9.7                     Mean   :  4292        
##  3rd Qu.: 12.7                     3rd Qu.:  3893        
##  Max.   :225.0                     Max.   :153977        
##  NA's   :418442                    NA's   :404938        
##  weekly_hosp_admissions_per_million  total_tests           new_tests       
##  Min.   :  0.0                      Min.   :         0   Min.   :       1  
##  1st Qu.: 23.7                      1st Qu.:    364660   1st Qu.:    2244  
##  Median : 56.3                      Median :   2067330   Median :    8783  
##  Mean   : 82.6                      Mean   :  21104573   Mean   :   67285  
##  3rd Qu.:110.0                      3rd Qu.:  10248295   3rd Qu.:   37229  
##  Max.   :717.1                      Max.   :9214000000   Max.   :35855632  
##  NA's   :404938                     NA's   :    350048   NA's   :354032    
##  total_tests_per_thousand new_tests_per_thousand new_tests_smoothed
##  Min.   :    0.0          Min.   :  0.0          Min.   :       0  
##  1st Qu.:   43.6          1st Qu.:  0.3          1st Qu.:    1486  
##  Median :  234.1          Median :  1.0          Median :    6570  
##  Mean   :  924.3          Mean   :  3.3          Mean   :  142178  
##  3rd Qu.:  894.4          3rd Qu.:  2.9          3rd Qu.:   32205  
##  Max.   :32925.8          Max.   :531.1          Max.   :14769984  
##  NA's   :350048           NA's   :354032         NA's   :325470    
##  new_tests_smoothed_per_thousand positive_rate    tests_per_case     
##  Min.   :  0.0                   Min.   :0.0      Min.   :      1.0  
##  1st Qu.:  0.2                   1st Qu.:0.0      1st Qu.:      7.1  
##  Median :  0.9                   Median :0.1      Median :     17.5  
##  Mean   :  2.8                   Mean   :0.1      Mean   :   2403.6  
##  3rd Qu.:  2.6                   3rd Qu.:0.1      3rd Qu.:     54.6  
##  Max.   :147.6                   Max.   :1.0      Max.   :1023631.9  
##  NA's   :325470                  NA's   :333508   NA's   :335087     
##  tests_units        total_vaccinations    people_vaccinated   
##  Length:429435      Min.   :          0   Min.   :         0  
##  Class :character   1st Qu.:    1970788   1st Qu.:   1050028  
##  Mode  :character   Median :   14394348   Median :   6900885  
##                     Mean   :  561697983   Mean   : 248706410  
##                     3rd Qu.:  116197175   3rd Qu.:  50932936  
##                     Max.   :13578774356   Max.   :5631263739  
##                     NA's   :     344018   NA's   :    348303  
##  people_fully_vaccinated total_boosters       new_vaccinations  
##  Min.   :         1      Min.   :         1   Min.   :       0  
##  1st Qu.:    964400      1st Qu.:    602288   1st Qu.:    2010  
##  Median :   6191345      Median :   5765384   Median :   20531  
##  Mean   : 228663910      Mean   : 150581058   Mean   :  739864  
##  3rd Qu.:  47731850      3rd Qu.:  40188947   3rd Qu.:  173612  
##  Max.   :5177942957      Max.   :2817381093   Max.   :49673198  
##  NA's   :    351374      NA's   :    375835   NA's   :358464    
##  new_vaccinations_smoothed total_vaccinations_per_hundred
##  Min.   :       0          Min.   :  0.0                 
##  1st Qu.:     279          1st Qu.: 44.8                 
##  Median :    3871          Median :130.6                 
##  Mean   :  283876          Mean   :124.3                 
##  3rd Qu.:   31803          3rd Qu.:195.0                 
##  Max.   :43691814          Max.   :410.2                 
##  NA's   :234406            NA's   :344018                
##  people_vaccinated_per_hundred people_fully_vaccinated_per_hundred
##  Min.   :  0.0                 Min.   :  0.0                      
##  1st Qu.: 27.9                 1st Qu.: 21.2                      
##  Median : 64.3                 Median : 57.9                      
##  Mean   : 53.5                 Mean   : 48.7                      
##  3rd Qu.: 77.8                 3rd Qu.: 73.6                      
##  Max.   :129.1                 Max.   :126.9                      
##  NA's   :348303                NA's   :351374                     
##  total_boosters_per_hundred new_vaccinations_smoothed_per_million
##  Min.   :  0.0              Min.   :     0                       
##  1st Qu.:  5.9              1st Qu.:   106                       
##  Median : 35.9              Median :   605                       
##  Mean   : 36.3              Mean   :  1851                       
##  3rd Qu.: 57.6              3rd Qu.:  2402                       
##  Max.   :150.5              Max.   :117113                       
##  NA's   :375835             NA's   :234406                       
##  new_people_vaccinated_smoothed new_people_vaccinated_smoothed_per_hundred
##  Min.   :       0               Min.   : 0.00                             
##  1st Qu.:      43               1st Qu.: 0.00                             
##  Median :     771               Median : 0.01                             
##  Mean   :  106071               Mean   : 0.07                             
##  3rd Qu.:    9307               3rd Qu.: 0.07                             
##  Max.   :21071266               Max.   :11.71                             
##  NA's   :237258                 NA's   :237258                            
##  stringency_index population_density   median_age    aged_65_older   
##  Min.   :  0.00   Min.   :    0.14   Min.   :15.10   Min.   : 1.14   
##  1st Qu.: 22.22   1st Qu.:   37.73   1st Qu.:22.20   1st Qu.: 3.53   
##  Median : 42.85   Median :   88.12   Median :29.70   Median : 6.29   
##  Mean   : 42.88   Mean   :  394.07   Mean   :30.46   Mean   : 8.68   
##  3rd Qu.: 62.04   3rd Qu.:  222.87   3rd Qu.:38.70   3rd Qu.:13.93   
##  Max.   :100.00   Max.   :20546.77   Max.   :48.20   Max.   :27.05   
##  NA's   :233245   NA's   :68943      NA's   :94772   NA's   :106165  
##  aged_70_older   gdp_per_capita     extreme_poverty  cardiovasc_death_rate
##  Min.   : 0.53   Min.   :   661.2   Min.   : 0.10    Min.   : 79.37       
##  1st Qu.: 2.06   1st Qu.:  4227.6   1st Qu.: 0.60    1st Qu.:175.70       
##  Median : 3.87   Median : 12294.9   Median : 2.50    Median :245.46       
##  Mean   : 5.49   Mean   : 18904.2   Mean   :13.92    Mean   :264.64       
##  3rd Qu.: 8.64   3rd Qu.: 27216.4   3rd Qu.:21.40    3rd Qu.:333.44       
##  Max.   :18.49   Max.   :116935.6   Max.   :77.60    Max.   :724.42       
##  NA's   :98120   NA's   :101143     NA's   :217439   NA's   :100570       
##  diabetes_prevalence female_smokers    male_smokers    handwashing_facilities
##  Min.   : 0.99       Min.   : 0.10    Min.   : 7.7     Min.   :  1.19        
##  1st Qu.: 5.35       1st Qu.: 1.90    1st Qu.:22.6     1st Qu.: 20.86        
##  Median : 7.20       Median : 6.30    Median :33.1     Median : 49.54        
##  Mean   : 8.56       Mean   :10.77    Mean   :33.1     Mean   : 50.65        
##  3rd Qu.:10.79       3rd Qu.:19.30    3rd Qu.:41.5     3rd Qu.: 82.50        
##  Max.   :30.53       Max.   :44.00    Max.   :78.1     Max.   :100.00        
##  NA's   :83524       NA's   :182270   NA's   :185618   NA's   :267694        
##  hospital_beds_per_thousand life_expectancy human_development_index
##  Min.   : 0.10              Min.   :53.28   Min.   :0.39           
##  1st Qu.: 1.30              1st Qu.:69.50   1st Qu.:0.60           
##  Median : 2.50              Median :75.05   Median :0.74           
##  Mean   : 3.11              Mean   :73.70   Mean   :0.72           
##  3rd Qu.: 4.21              3rd Qu.:79.46   3rd Qu.:0.83           
##  Max.   :13.80              Max.   :86.75   Max.   :0.96           
##  NA's   :138746             NA's   :39136   NA's   :110308         
##    population         excess_mortality_cumulative_absolute
##  Min.   :        47   Min.   : -37726.1                   
##  1st Qu.:    523798   1st Qu.:    176.5                   
##  Median :   6336393   Median :   6815.2                   
##  Mean   : 152033640   Mean   :  56047.7                   
##  3rd Qu.:  32969520   3rd Qu.:  39128.0                   
##  Max.   :7975105024   Max.   :1349776.4                   
##                       NA's   :416024                      
##  excess_mortality_cumulative excess_mortality
##  Min.   :-44.2               Min.   :-95.9   
##  1st Qu.:  2.1               1st Qu.: -1.5   
##  Median :  8.1               Median :  5.7   
##  Mean   :  9.8               Mean   : 10.9   
##  3rd Qu.: 15.2               3rd Qu.: 15.6   
##  Max.   : 78.1               Max.   :378.2   
##  NA's   :416024              NA's   :416024  
##  excess_mortality_cumulative_per_million
##  Min.   :-2936.5                        
##  1st Qu.:  116.9                        
##  Median : 1270.8                        
##  Mean   : 1772.7                        
##  3rd Qu.: 2883.0                        
##  Max.   :10293.5                        
##  NA's   :416024
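
The NA's rows above show that several columns are mostly empty; icu_patients, for example, is missing in 390,319 of the 429,435 rows. A minimal sketch of how the share of missing values per column could be tabulated follows. It uses a small made-up data frame (toy) that only borrows three column names from the dataset, so the numbers are illustrative and not results from the real data.

```r
# Toy stand-in for df; the values are invented for illustration only.
toy <- data.frame(
  location     = c("A", "A", "B", "B"),
  new_cases    = c(10, NA, 3, 7),
  icu_patients = c(NA, NA, NA, 2)
)

# is.na() on a data frame gives a logical matrix; colMeans() turns it
# into the fraction of missing values in each column.
na_share <- sort(colMeans(is.na(toy)), decreasing = TRUE)
print(na_share)
```

Applied to the real data, the same idea would be colMeans(is.na(df)).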

Effect of Vaccination on New COVID-19 Cases

This line chart compares new vaccinations with new cases over time. The x-axis shows dates, and the y-axis shows counts per million people, summed across all locations in the dataset.

Most vaccinations were administered during 2021. Interestingly, the largest spike in Covid-19 cases occurred after the mass vaccinations of 2021, when vaccination rates had dwindled. One explanation is that cases went up because vaccination rates went down. However, vaccinations are not meant to be a short-term solution; they are meant to protect for years, if not a lifetime. This could mean that the vaccinations were at best only temporarily effective in preventing Covid-19.

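library(dplyr)    # %>%, group_by(), summarise(), mutate(), case_when()
library(ggplot2)  # ggplot(), geom_line(), geom_bar(), geom_point()

# Sum the per-million figures across all locations for each date
lineplot <- df %>%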
  group_by(date) %>%
  summarise(
    newcasespermil = sum(new_cases_per_million, na.rm = TRUE),
    newvaccinationspermil = sum(new_vaccinations_smoothed_per_million, na.rm = TRUE)
  )

# linewidth replaces the deprecated size argument for lines (ggplot2 >= 3.4.0)
p1 <- ggplot(lineplot, aes(x = date)) +
  geom_line(aes(y = newcasespermil, color = "New Cases Per Million"), linewidth = 1) +
  geom_line(aes(y = newvaccinationspermil, color = "New Vaccinations Per Million"), linewidth = 1) +
  scale_color_manual(values = c("New Cases Per Million" = "red", "New Vaccinations Per Million" = "blue")) +
  labs(title = "Effect of Vaccination on New COVID-19 Cases", 
       x = "Date", 
       y = "Per Million People", 
       color = "Legend") +
  theme_minimal() +
  scale_x_date(
    date_labels = "%b %Y",  # Format: "Jan 2020"
    date_breaks = "6 months"  # Show labels every 6 months
  ) +
  theme(legend.position = "top")

p1
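
The line chart compares the two series only visually. One way the relationship could be quantified is a lagged Pearson correlation, pairing vaccinations on one day with cases some days later. The sketch below is base R on synthetic data; the helper lagged_cor() and the 14-day lag are assumptions made for illustration, not part of the original analysis.

```r
# Hypothetical helper: correlate x on day t with y on day t + lag_days.
lagged_cor <- function(x, y, lag_days) {
  n <- length(x)
  stopifnot(length(y) == n, lag_days >= 0, lag_days < n - 2)
  cor(x[seq_len(n - lag_days)], y[(lag_days + 1):n], use = "complete.obs")
}

# Synthetic series: y is simply x delayed by 14 days (zero-padded at the start).
x <- sin(seq(0, 6 * pi, length.out = 200))
y <- c(rep(0, 14), x[1:186])

same_day <- lagged_cor(x, y, 0)   # no shift
shifted  <- lagged_cor(x, y, 14)  # shift that matches the built-in delay
print(c(same_day = same_day, shifted = shifted))
```

With the real data, x and y would be the newvaccinationspermil and newcasespermil columns of lineplot, ordered by date, and several lags could be tried to see where the correlation is strongest.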

8-Season Analysis of COVID-19 Cases

This grouped bar chart presents the relationship between the seasons of each year and Covid-19 cases. Each season is divided into an early and a late phase for greater specificity.

Covid-19 is caused by a virus that is killed by sunlight. Furthermore, biological analysis shows that the virus thrives best in colder weather, allowing for mutations. Finally, the human body is weakened by colder temperatures. Based on this information, it is worthwhile to look at the correlation between seasons and Covid-19 cases.

As expected, cases were highest during the Early Winter, Late Winter, and Early Spring seasons in almost every year. During these times, the days are at their shortest. One explanation for why Late Fall does not have high cases is that the effect of temperature on the human body is delayed: only after prolonged exposure to the cold does one’s immune system weaken.

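library(lubridate)  # month(), year()

# Assign each date to one of eight seasons based on its calendar month
df <- df %>%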
  mutate(season = case_when(
    month(date) == 12 ~ "Early Winter",
    month(date) %in% c(1, 2) ~ "Late Winter",
    month(date) %in% c(3, 4) ~ "Early Spring",
    month(date) %in% c(5) ~ "Late Spring",
    month(date) %in% c(6, 7) ~ "Early Summer",
    month(date) %in% c(8) ~ "Late Summer",
    month(date) %in% c(9, 10) ~ "Early Fall",
    month(date) %in% c(11) ~ "Late Fall"
  ))

seasonal_data <- df %>%
  group_by(year = year(date), season) %>%
  summarise(
    total_cases = sum(new_cases, na.rm = TRUE),
    total_deaths = sum(new_deaths, na.rm = TRUE)
  ) %>%
  ungroup()

season_order <- c("Early Winter", "Late Winter", "Early Spring", "Late Spring",
                  "Early Summer", "Late Summer", "Early Fall", "Late Fall")
seasonal_data$season <- factor(seasonal_data$season, levels = season_order)

p2 <- ggplot(seasonal_data, aes(x = season, y = total_cases, fill = as.factor(year))) +
  geom_bar(stat = "identity", position = "dodge") +
  labs(
    title = "8-Season Analysis of COVID-19 Cases",
    x = "Season",
    y = "Total Cases",
    fill = "Year"
  ) +
  theme_minimal() +
  theme(legend.position = "top", axis.text.x = element_text(angle = 45, hjust = 1))

p2
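
One caveat when comparing the bars is that the eight seasons defined above do not cover the same number of months: Late Winter, Early Spring, Early Summer, and Early Fall span two months each, while the other four seasons span one. A possible refinement, sketched here in base R on invented numbers, is to divide each total by the number of months in its season. The lookup vector months_per_season and the column cases_per_month are names introduced for this illustration.

```r
# Months covered by each season, following the case_when() mapping above.
months_per_season <- c(
  "Early Winter" = 1, "Late Winter" = 2,
  "Early Spring" = 2, "Late Spring" = 1,
  "Early Summer" = 2, "Late Summer" = 1,
  "Early Fall"   = 2, "Late Fall"   = 1
)

# Toy stand-in for seasonal_data; the totals are made up.
toy <- data.frame(
  year        = c(2021, 2021),
  season      = c("Late Winter", "Late Spring"),
  total_cases = c(2000, 1500)
)

# as.character() matters if season is a factor: indexing a named vector
# by a factor would use the integer codes instead of the labels.
toy$cases_per_month <- toy$total_cases /
  unname(months_per_season[as.character(toy$season)])
print(toy)
```

In this toy example the two-month season has the larger total but the smaller monthly average, which is the kind of reversal the normalization is meant to reveal.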

8-Season Analysis of COVID-19 Deaths

This grouped bar chart is almost identical to the previous one. It differs in that it shows deaths by season rather than cases by season.

As Covid-19 began to spread rapidly, scientists were worried about an increased risk of death among the homeless population as a result of poorer hygiene, less access to information, and limited ability to quarantine. However, the opposite was true, and homeless people were among the least affected by Covid-19. The commonly accepted theory is that those living outside have the highest rates of sunlight exposure and, consequently, higher vitamin D levels, which have been shown to prevent complications of Covid-19, including death.

Just as with cases in the previous chart, deaths in each year were highest in the seasons with limited sunlight, peaking in Late Winter. As noted above, vitamin D deficiencies are highest during these times, and immune defense mechanisms are weakened.

p3 <- ggplot(seasonal_data, aes(x = season, y = total_deaths, fill = as.factor(year))) +
  geom_bar(stat = "identity", position = "dodge") +
  labs(
    title = "8-Season Analysis of COVID-19 Deaths",
    x = "Season",
    y = "Total Deaths",
    fill = "Year"
  ) +
  theme_minimal() +
  theme(legend.position = "top", axis.text.x = element_text(angle = 45, hjust = 1))

p3

Correlation Between Vaccination Rates and Case Fatality Rates

This bubble chart uses points to show the relationship between vaccination rates and case fatality rates.

This graph differs from the first one in that it asks whether vaccinations were effective in lowering mortality rather than new cases. Although there are many outliers, there is some correlation between the two: as more vaccinations were administered, fewer infected people died from Covid-19. Because vaccination coverage increases with time, it is also possible that mortality decreased because of herd immunity, or because older and weaker people had already died earlier in the pandemic.

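# Keep rows with a known vaccination rate and at least one case, so the
# ratio below is never NaN (0/0) or Inf (x/0)
bubble_data <- df %>%
  filter(!is.na(people_vaccinated_per_hundred), total_cases > 0) %>%
  group_by(people_vaccinated_per_hundred) %>%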
  summarise(
    case_fatality_rate = mean(total_deaths / total_cases, na.rm = TRUE),
    count = n()
  )

p5 <- ggplot(bubble_data, aes(x = people_vaccinated_per_hundred, y = case_fatality_rate, size = count)) +
  geom_point(color = "blue", alpha = 0.6) +
  labs(
    title = "Correlation Between Vaccination Rates and Case Fatality Rates",
    x = "Vaccination Rate (per hundred)",
    y = "Case Fatality Rate",
    size = "Number of Observations"
  ) +
  theme_minimal() +
  theme(
    plot.title = element_text(hjust = 0.5, size = 16, face = "bold"),
    axis.title = element_text(size = 12),
    axis.text = element_text(size = 10)
  )

p5
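
Grouping by the exact value of people_vaccinated_per_hundred creates one group per distinct value, which can leave many bubbles with only a handful of observations. A possible alternative, sketched below in base R on made-up rows, is to bin the vaccination rate into 10-point intervals before averaging, and then summarize the trend with a Spearman rank correlation. The bin width, the column names vacc_bin and cfr, and the toy values are all assumptions for illustration.

```r
# Toy stand-in for df; the values are invented.
toy <- data.frame(
  people_vaccinated_per_hundred = c(0.4, 0.6, 10.2, 10.4, 55.1, NA),
  total_cases  = c(100, 200, 0, 1000, 5000, 300),
  total_deaths = c(5, 8, 0, 20, 50, 6)
)

# Drop rows with an unknown vaccination rate or no cases (undefined ratio).
valid <- subset(toy, !is.na(people_vaccinated_per_hundred) & total_cases > 0)

# Row-level case fatality rate and a 10-point vaccination bin (0, 10, 20, ...).
valid$cfr      <- valid$total_deaths / valid$total_cases
valid$vacc_bin <- floor(valid$people_vaccinated_per_hundred / 10) * 10

# Mean case fatality rate per bin, sorted by bin.
binned <- aggregate(cfr ~ vacc_bin, data = valid, FUN = mean)

# Rank correlation between bin and mean case fatality rate.
rho <- cor(binned$vacc_bin, binned$cfr, method = "spearman")
print(binned)
print(rho)
```

A rank correlation is used here because it only assumes a monotonic trend, not a linear one; whether that is the right summary for the real data is a judgment call.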

Conclusion

This analysis highlights the impact of vaccinations, seasonal trends, and travel patterns on COVID-19 cases and deaths. Key findings include:

- Vaccinations were effective in reducing case fatality rates but had a limited impact on preventing new cases.
- Seasonal trends showed higher cases and deaths during winter months, likely due to lower vitamin D levels and weakened immune systems.
- Air travel was strongly correlated with COVID-19 cases during the early stages of the pandemic but less so as herd immunity developed.