This report analyzes CDC COVID-19 data from 2020 to 2024, focusing on how vaccinations, seasonal trends, and travel patterns relate to COVID-19 cases and deaths. A second dataset was used for travel patterns. For reference, the summary statistics for the COVID-19 dataset are shown below.
summary(df)
## iso_code continent location date
## Length:429435 Length:429435 Length:429435 Min. :2020-01-01
## Class :character Class :character Class :character 1st Qu.:2021-03-05
## Mode :character Mode :character Mode :character Median :2022-04-20
## Mean :2022-04-21
## 3rd Qu.:2023-06-08
## Max. :2024-08-14
##
## total_cases new_cases new_cases_smoothed total_deaths
## Min. : 0 Min. : 0 Min. : 0 Min. : 0
## 1st Qu.: 6281 1st Qu.: 0 1st Qu.: 0 1st Qu.: 43
## Median : 63653 Median : 0 Median : 12 Median : 799
## Mean : 7365292 Mean : 8017 Mean : 8041 Mean : 81260
## 3rd Qu.: 758272 3rd Qu.: 0 3rd Qu.: 313 3rd Qu.: 9574
## Max. :775866783 Max. :44236227 Max. :6319461 Max. :7057132
## NA's :17631 NA's :19276 NA's :20506 NA's :17631
## new_deaths new_deaths_smoothed total_cases_per_million
## Min. : 0.00 Min. : 0.000 Min. : 0
## 1st Qu.: 0.00 1st Qu.: 0.000 1st Qu.: 1916
## Median : 0.00 Median : 0.000 Median : 29146
## Mean : 71.85 Mean : 72.061 Mean :112096
## 3rd Qu.: 0.00 3rd Qu.: 3.143 3rd Qu.:156770
## Max. :103719.00 Max. :14817.000 Max. :763599
## NA's :18827 NA's :20057 NA's :17631
## new_cases_per_million new_cases_smoothed_per_million total_deaths_per_million
## Min. : 0.0 Min. : 0.00 Min. : 0.00
## 1st Qu.: 0.0 1st Qu.: 0.00 1st Qu.: 24.57
## Median : 0.0 Median : 2.79 Median : 295.09
## Mean : 122.4 Mean : 122.71 Mean : 835.51
## 3rd Qu.: 0.0 3rd Qu.: 56.25 3rd Qu.:1283.82
## Max. :241758.2 Max. :34536.89 Max. :6601.11
## NA's :19276 NA's :20506 NA's :17631
## new_deaths_per_million new_deaths_smoothed_per_million reproduction_rate
## Min. : 0.000 Min. : 0.000 Min. :-0.07
## 1st Qu.: 0.000 1st Qu.: 0.000 1st Qu.: 0.72
## Median : 0.000 Median : 0.000 Median : 0.95
## Mean : 0.762 Mean : 0.765 Mean : 0.91
## 3rd Qu.: 0.000 3rd Qu.: 0.357 3rd Qu.: 1.14
## Max. :893.655 Max. :127.665 Max. : 5.87
## NA's :18827 NA's :20057 NA's :244618
## icu_patients icu_patients_per_million hosp_patients
## Min. : 0 Min. : 0.0 Min. : 0
## 1st Qu.: 21 1st Qu.: 2.3 1st Qu.: 186
## Median : 90 Median : 6.4 Median : 776
## Mean : 661 Mean : 15.7 Mean : 3912
## 3rd Qu.: 413 3rd Qu.: 18.8 3rd Qu.: 3051
## Max. :28891 Max. :180.7 Max. :154497
## NA's :390319 NA's :390319 NA's :388779
## hosp_patients_per_million weekly_icu_admissions
## Min. : 0.0 Min. : 0.0
## 1st Qu.: 31.0 1st Qu.: 17.0
## Median : 74.2 Median : 92.0
## Mean : 126.0 Mean : 317.9
## 3rd Qu.: 159.8 3rd Qu.: 353.0
## Max. :1526.8 Max. :4838.0
## NA's :388779 NA's :418442
## weekly_icu_admissions_per_million weekly_hosp_admissions
## Min. : 0.0 Min. : 0
## 1st Qu.: 1.5 1st Qu.: 223
## Median : 4.6 Median : 864
## Mean : 9.7 Mean : 4292
## 3rd Qu.: 12.7 3rd Qu.: 3893
## Max. :225.0 Max. :153977
## NA's :418442 NA's :404938
## weekly_hosp_admissions_per_million total_tests new_tests
## Min. : 0.0 Min. : 0 Min. : 1
## 1st Qu.: 23.7 1st Qu.: 364660 1st Qu.: 2244
## Median : 56.3 Median : 2067330 Median : 8783
## Mean : 82.6 Mean : 21104573 Mean : 67285
## 3rd Qu.:110.0 3rd Qu.: 10248295 3rd Qu.: 37229
## Max. :717.1 Max. :9214000000 Max. :35855632
## NA's :404938 NA's : 350048 NA's :354032
## total_tests_per_thousand new_tests_per_thousand new_tests_smoothed
## Min. : 0.0 Min. : 0.0 Min. : 0
## 1st Qu.: 43.6 1st Qu.: 0.3 1st Qu.: 1486
## Median : 234.1 Median : 1.0 Median : 6570
## Mean : 924.3 Mean : 3.3 Mean : 142178
## 3rd Qu.: 894.4 3rd Qu.: 2.9 3rd Qu.: 32205
## Max. :32925.8 Max. :531.1 Max. :14769984
## NA's :350048 NA's :354032 NA's :325470
## new_tests_smoothed_per_thousand positive_rate tests_per_case
## Min. : 0.0 Min. :0.0 Min. : 1.0
## 1st Qu.: 0.2 1st Qu.:0.0 1st Qu.: 7.1
## Median : 0.9 Median :0.1 Median : 17.5
## Mean : 2.8 Mean :0.1 Mean : 2403.6
## 3rd Qu.: 2.6 3rd Qu.:0.1 3rd Qu.: 54.6
## Max. :147.6 Max. :1.0 Max. :1023631.9
## NA's :325470 NA's :333508 NA's :335087
## tests_units total_vaccinations people_vaccinated
## Length:429435 Min. : 0 Min. : 0
## Class :character 1st Qu.: 1970788 1st Qu.: 1050028
## Mode :character Median : 14394348 Median : 6900885
## Mean : 561697983 Mean : 248706410
## 3rd Qu.: 116197175 3rd Qu.: 50932936
## Max. :13578774356 Max. :5631263739
## NA's : 344018 NA's : 348303
## people_fully_vaccinated total_boosters new_vaccinations
## Min. : 1 Min. : 1 Min. : 0
## 1st Qu.: 964400 1st Qu.: 602288 1st Qu.: 2010
## Median : 6191345 Median : 5765384 Median : 20531
## Mean : 228663910 Mean : 150581058 Mean : 739864
## 3rd Qu.: 47731850 3rd Qu.: 40188947 3rd Qu.: 173612
## Max. :5177942957 Max. :2817381093 Max. :49673198
## NA's : 351374 NA's : 375835 NA's :358464
## new_vaccinations_smoothed total_vaccinations_per_hundred
## Min. : 0 Min. : 0.0
## 1st Qu.: 279 1st Qu.: 44.8
## Median : 3871 Median :130.6
## Mean : 283876 Mean :124.3
## 3rd Qu.: 31803 3rd Qu.:195.0
## Max. :43691814 Max. :410.2
## NA's :234406 NA's :344018
## people_vaccinated_per_hundred people_fully_vaccinated_per_hundred
## Min. : 0.0 Min. : 0.0
## 1st Qu.: 27.9 1st Qu.: 21.2
## Median : 64.3 Median : 57.9
## Mean : 53.5 Mean : 48.7
## 3rd Qu.: 77.8 3rd Qu.: 73.6
## Max. :129.1 Max. :126.9
## NA's :348303 NA's :351374
## total_boosters_per_hundred new_vaccinations_smoothed_per_million
## Min. : 0.0 Min. : 0
## 1st Qu.: 5.9 1st Qu.: 106
## Median : 35.9 Median : 605
## Mean : 36.3 Mean : 1851
## 3rd Qu.: 57.6 3rd Qu.: 2402
## Max. :150.5 Max. :117113
## NA's :375835 NA's :234406
## new_people_vaccinated_smoothed new_people_vaccinated_smoothed_per_hundred
## Min. : 0 Min. : 0.00
## 1st Qu.: 43 1st Qu.: 0.00
## Median : 771 Median : 0.01
## Mean : 106071 Mean : 0.07
## 3rd Qu.: 9307 3rd Qu.: 0.07
## Max. :21071266 Max. :11.71
## NA's :237258 NA's :237258
## stringency_index population_density median_age aged_65_older
## Min. : 0.00 Min. : 0.14 Min. :15.10 Min. : 1.14
## 1st Qu.: 22.22 1st Qu.: 37.73 1st Qu.:22.20 1st Qu.: 3.53
## Median : 42.85 Median : 88.12 Median :29.70 Median : 6.29
## Mean : 42.88 Mean : 394.07 Mean :30.46 Mean : 8.68
## 3rd Qu.: 62.04 3rd Qu.: 222.87 3rd Qu.:38.70 3rd Qu.:13.93
## Max. :100.00 Max. :20546.77 Max. :48.20 Max. :27.05
## NA's :233245 NA's :68943 NA's :94772 NA's :106165
## aged_70_older gdp_per_capita extreme_poverty cardiovasc_death_rate
## Min. : 0.53 Min. : 661.2 Min. : 0.10 Min. : 79.37
## 1st Qu.: 2.06 1st Qu.: 4227.6 1st Qu.: 0.60 1st Qu.:175.70
## Median : 3.87 Median : 12294.9 Median : 2.50 Median :245.46
## Mean : 5.49 Mean : 18904.2 Mean :13.92 Mean :264.64
## 3rd Qu.: 8.64 3rd Qu.: 27216.4 3rd Qu.:21.40 3rd Qu.:333.44
## Max. :18.49 Max. :116935.6 Max. :77.60 Max. :724.42
## NA's :98120 NA's :101143 NA's :217439 NA's :100570
## diabetes_prevalence female_smokers male_smokers handwashing_facilities
## Min. : 0.99 Min. : 0.10 Min. : 7.7 Min. : 1.19
## 1st Qu.: 5.35 1st Qu.: 1.90 1st Qu.:22.6 1st Qu.: 20.86
## Median : 7.20 Median : 6.30 Median :33.1 Median : 49.54
## Mean : 8.56 Mean :10.77 Mean :33.1 Mean : 50.65
## 3rd Qu.:10.79 3rd Qu.:19.30 3rd Qu.:41.5 3rd Qu.: 82.50
## Max. :30.53 Max. :44.00 Max. :78.1 Max. :100.00
## NA's :83524 NA's :182270 NA's :185618 NA's :267694
## hospital_beds_per_thousand life_expectancy human_development_index
## Min. : 0.10 Min. :53.28 Min. :0.39
## 1st Qu.: 1.30 1st Qu.:69.50 1st Qu.:0.60
## Median : 2.50 Median :75.05 Median :0.74
## Mean : 3.11 Mean :73.70 Mean :0.72
## 3rd Qu.: 4.21 3rd Qu.:79.46 3rd Qu.:0.83
## Max. :13.80 Max. :86.75 Max. :0.96
## NA's :138746 NA's :39136 NA's :110308
## population excess_mortality_cumulative_absolute
## Min. : 47 Min. : -37726.1
## 1st Qu.: 523798 1st Qu.: 176.5
## Median : 6336393 Median : 6815.2
## Mean : 152033640 Mean : 56047.7
## 3rd Qu.: 32969520 3rd Qu.: 39128.0
## Max. :7975105024 Max. :1349776.4
## NA's :416024
## excess_mortality_cumulative excess_mortality
## Min. :-44.2 Min. :-95.9
## 1st Qu.: 2.1 1st Qu.: -1.5
## Median : 8.1 Median : 5.7
## Mean : 9.8 Mean : 10.9
## 3rd Qu.: 15.2 3rd Qu.: 15.6
## Max. : 78.1 Max. :378.2
## NA's :416024 NA's :416024
## excess_mortality_cumulative_per_million
## Min. :-2936.5
## 1st Qu.: 116.9
## Median : 1270.8
## Mean : 1772.7
## 3rd Qu.: 2883.0
## Max. :10293.5
## NA's :416024
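The maximums above (about 776 million total cases and a population of about 8 billion) match worldwide totals. This suggests that `df` contains aggregate rows (world, continents, income groups) alongside individual countries, so summing `new_cases` over every row would count each case several times. Assuming the file follows the Our World in Data layout, where aggregate rows have an `iso_code` beginning with `OWID_`, a minimal base-R filter might look like this. `drop_aggregates()` is a hypothetical helper, and the toy rows are invented.

```r
# Hypothetical helper: keep only country-level rows.
# Assumes aggregate rows (World, continents, income groups) carry an
# iso_code prefixed with "OWID_", as in the Our World in Data layout.
drop_aggregates <- function(data) {
  is_aggregate <- grepl("^OWID_", data$iso_code)  # NA codes give FALSE, so they are kept
  data[!is_aggregate, , drop = FALSE]
}

# Toy example: two countries plus a world total that repeats their cases
toy_countries <- data.frame(
  iso_code  = c("USA", "FRA", "OWID_WRL"),
  location  = c("United States", "France", "World"),
  new_cases = c(100, 50, 150)
)

sum(toy_countries$new_cases)                   # 300: every case counted twice
sum(drop_aggregates(toy_countries)$new_cases)  # 150
```

Running `df <- drop_aggregates(df)` before the `group_by()` summaries would restrict them to country-level rows.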
This line chart compares new vaccinations with new cases over time. The x-axis shows the date, and the y-axis shows daily values per million people, summed across all locations in the dataset.
Most vaccinations were administered during 2021. Interestingly, the largest spike in COVID-19 cases occurred after the mass vaccinations of 2021, once vaccination rates had dwindled. One explanation is that cases rose because vaccination rates fell. However, vaccines are generally intended to provide lasting protection rather than a short-term fix, so this pattern could mean that the vaccines were, at best, only temporarily effective in preventing COVID-19 infection.
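The argument above is about timing: whether cases respond to vaccination immediately or only after a delay. A lagged correlation between the two series is one way to make that concrete. In the base-R sketch below, `lagged_cor()` is a hypothetical helper and the weekly series are synthetic; a strong correlation at some lag would still not establish cause and effect.

```r
# Hypothetical helper: Pearson correlation between x at time t and y at time t + lag
lagged_cor <- function(x, y, lag) {
  n <- length(x)
  stopifnot(length(y) == n, lag >= 0, lag < n - 1)
  cor(x[1:(n - lag)], y[(1 + lag):n], use = "complete.obs")
}

# Toy weekly series: y repeats the shape of x two weeks later
x <- c(0, 1, 3, 6, 3, 1, 0, 0, 0, 0)
y <- c(0, 0, 0, 1, 3, 6, 3, 1, 0, 0)

lags <- 0:4
cors <- sapply(lags, function(k) lagged_cor(x, y, k))
lags[which.max(cors)]  # 2: the lag with the strongest positive correlation
```

With the real data, `x` and `y` could be the `newvaccinationspermil` and `newcasespermil` columns of `lineplot`, which `summarise()` returns sorted by date.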
# Sum the daily per-million values across all locations for each date
lineplot <- df %>%
group_by(date) %>%
summarise(
newcasespermil = sum(new_cases_per_million, na.rm = TRUE),
newvaccinationspermil = sum(new_vaccinations_smoothed_per_million, na.rm = TRUE)
)
p1 <- ggplot(lineplot, aes(x = date)) +
geom_line(aes(y = newcasespermil, color = "New Cases Per Million"), linewidth = 1) +
geom_line(aes(y = newvaccinationspermil, color = "New Vaccinations Per Million"), linewidth = 1) +
scale_color_manual(values = c("New Cases Per Million" = "red", "New Vaccinations Per Million" = "blue")) +
labs(title = "Effect of Vaccination on New COVID-19 Cases",
x = "Date",
y = "Per Million People",
color = "Legend") +
theme_minimal() +
scale_x_date(
date_labels = "%b %Y", # Format: "Jan 2020"
date_breaks = "6 months" # Show labels every 6 months
) +
theme(legend.position = "top")
p1
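Summing `new_cases_per_million` across locations adds together rates that refer to different populations, so the result is not a count per million people for any real population. A population-weighted alternative divides the summed counts by the summed population. A minimal base-R sketch, with `pooled_per_million()` as a hypothetical helper and toy numbers:

```r
# Hypothetical helper: pooled rate per million across locations.
# Rows with a missing count are dropped together with their population,
# so the numerator and denominator cover the same locations.
pooled_per_million <- function(counts, population) {
  keep <- !is.na(counts) & !is.na(population)
  sum(counts[keep]) / sum(population[keep]) * 1e6
}

# Toy example: a large and a small country on the same day
counts     <- c(500, 50)
population <- c(50e6, 1e6)

counts / population * 1e6               # 10 and 50 per million
sum(counts / population * 1e6)          # 60: summed rates
pooled_per_million(counts, population)  # about 10.78 (550 cases per 51 million people)
```

Inside the existing `group_by(date)` pipeline, the helper could be called as `pooled_per_million(new_cases, population)` within `summarise()`.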
This grouped bar chart shows COVID-19 cases by season for each year. For more detail, each season is split into an early and a late phase, and months are assigned to seasons using the Northern Hemisphere calendar.
SARS-CoV-2, the virus that causes COVID-19, is inactivated by the ultraviolet light in sunlight. Furthermore, biological analyses suggest that the virus survives better in colder weather, giving it more opportunity to spread and mutate. Finally, the human body is weakened by colder temperatures. Based on this information, it is worthwhile to look at the relationship between seasons and COVID-19 cases.
As expected, cases were highest during Late Winter, Early Winter, and Early Spring in almost every year, and Early and Late Winter contain the shortest days of the year. One explanation for why Late Fall does not have high case counts is that the effect of temperature on the body is delayed: only after prolonged exposure to the cold does the immune system weaken. Late Fall is also a one-month season, so its total covers fewer days than the two-month seasons.
# Assign each month to one of eight seasons (Northern Hemisphere calendar)
df <- df %>%
mutate(season = case_when(
month(date) == 12 ~ "Early Winter",
month(date) %in% c(1, 2) ~ "Late Winter",
month(date) %in% c(3, 4) ~ "Early Spring",
month(date) %in% c(5) ~ "Late Spring",
month(date) %in% c(6, 7) ~ "Early Summer",
month(date) %in% c(8) ~ "Late Summer",
month(date) %in% c(9, 10) ~ "Early Fall",
month(date) %in% c(11) ~ "Late Fall"
))
# Total new cases and deaths for each year and season
seasonal_data <- df %>%
group_by(year = year(date), season) %>%
summarise(
total_cases = sum(new_cases, na.rm = TRUE),
total_deaths = sum(new_deaths, na.rm = TRUE)
) %>%
ungroup()
season_order <- c("Early Winter", "Late Winter", "Early Spring", "Late Spring",
"Early Summer", "Late Summer", "Early Fall", "Late Fall")
seasonal_data$season <- factor(seasonal_data$season, levels = season_order)
p2 <- ggplot(seasonal_data, aes(x = season, y = total_cases, fill = as.factor(year))) +
geom_bar(stat = "identity", position = "dodge") +
labs(
title = "8-Season Analysis of COVID-19 Cases",
x = "Season",
y = "Total Cases",
fill = "Year"
) +
theme_minimal() +
theme(legend.position = "top", axis.text.x = element_text(angle = 45, hjust = 1))
p2
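The eight seasons defined above are not equally long: Early Winter, Late Spring, Late Summer, and Late Fall each cover one month, while the other four cover two, so summed totals favor the two-month seasons. Dividing each total by the number of days it covers gives an average daily count that can be compared across seasons. The base-R sketch below uses a hypothetical `season_of()` lookup that mirrors the `case_when()` mapping, and a toy series of a constant 100 cases per day (not real data), so any difference between the seasonal totals comes purely from season length.

```r
# Hypothetical month-to-season lookup mirroring the case_when() mapping above
season_of <- function(dates) {
  lookup <- c("Late Winter", "Late Winter", "Early Spring", "Early Spring",
              "Late Spring", "Early Summer", "Early Summer", "Late Summer",
              "Early Fall", "Early Fall", "Late Fall", "Early Winter")
  lookup[as.integer(format(dates, "%m"))]
}

# Toy data: a constant 100 new cases on every day of 2021
dates <- seq(as.Date("2021-01-01"), as.Date("2021-12-31"), by = "day")
daily <- data.frame(season = season_of(dates), new_cases = 100)

# Total cases and number of days per season
totals <- aggregate(new_cases ~ season, data = daily, FUN = sum)
days   <- aggregate(new_cases ~ season, data = daily, FUN = length)
names(days)[2] <- "n_days"

# Average daily cases: identical for every season despite unequal totals
per_day <- merge(totals, days, by = "season")
per_day$cases_per_day <- per_day$new_cases / per_day$n_days
per_day
```

With the real data, the pipeline that builds `seasonal_data` could add `n_days = n_distinct(date)` inside `summarise()` and plot `total_cases / n_days` instead of `total_cases`.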
This grouped bar chart is almost identical to the previous one, except that it shows deaths by season rather than cases by season.
As COVID-19 began to spread rapidly, scientists worried about an increased risk of death among homeless people because of poorer hygiene, less access to health information, and limited ability to quarantine. However, the opposite was reportedly true, with homeless people among the least affected by COVID-19. One proposed explanation is that people living outside get the most sunlight exposure and, consequently, higher vitamin D levels, which some studies have associated with less severe COVID-19 outcomes, including lower mortality.
Just as with cases in the previous chart, deaths in each year were highest in the seasons with limited sunlight, peaking in Late Winter. As noted above, vitamin D deficiencies are most common during these times, and immune defenses are weakened.
p3 <- ggplot(seasonal_data, aes(x = season, y = total_deaths, fill = as.factor(year))) +
geom_bar(stat = "identity", position = "dodge") +
labs(
title = "8-Season Analysis of COVID-19 Deaths",
x = "Season",
y = "Total Deaths",
fill = "Year"
) +
theme_minimal() +
theme(legend.position = "top", axis.text.x = element_text(angle = 45, hjust = 1))
p3
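Whether deaths peaked in Late Winter every year can be checked directly by selecting the season with the largest total in each year, rather than reading it off the bars. A base-R sketch on a toy table with the same columns as `seasonal_data` (`year`, `season`, `total_deaths`); the numbers are invented:

```r
# Toy table with the same columns as seasonal_data (values are invented)
toy_seasonal <- data.frame(
  year         = c(2021, 2021, 2021, 2022, 2022, 2022),
  season       = c("Late Winter", "Early Summer", "Late Fall",
                   "Late Winter", "Early Summer", "Late Fall"),
  total_deaths = c(900, 300, 400, 700, 200, 750)
)

# For each year, keep the row with the largest death total
peak_by_year <- do.call(rbind, lapply(
  split(toy_seasonal, toy_seasonal$year),
  function(d) d[which.max(d$total_deaths), c("year", "season", "total_deaths")]
))
peak_by_year  # 2021: Late Winter (900), 2022: Late Fall (750)
```

With the real `seasonal_data`, the dplyr equivalent would be `seasonal_data %>% group_by(year) %>% slice_max(total_deaths, n = 1)`.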
This dual-axis line chart uses two lines on separate y-axes to compare trends in U.S. airline passenger numbers with COVID-19 cases over time. The case counts are summed over every location in the COVID-19 dataset, not only the United States.
As the chart shows, increases in travel coincided with increases in COVID-19 cases between 2020 and 2022. Travel declined again only during the COVID-19 peak in early 2022, and only briefly.
Interestingly, after this, COVID-19 cases dwindled even as travel kept growing. One possible explanation is that vaccines had been widely distributed by then, but as the first graph showed, the relationship between vaccinations and COVID-19 cases is not strong. More likely, the population had developed herd immunity, and those most vulnerable to COVID-19 had already been infected or had died.
# Load the U.S. airline routes and fares dataset (local file path)
travel_data <- read.csv("C:/Users/Conrad/Downloads/US Airline Flight Routes and Fares 1993-2024.csv", header = TRUE)
travel_data <- travel_data %>%
filter(Year >= 2020)
travel_agg <- travel_data %>%
group_by(Year, quarter) %>%
summarise(total_passengers = sum(passengers, na.rm = TRUE)) %>%
ungroup()
# Aggregate COVID-19 cases to the same year-quarter grain as the travel data
covid_agg <- df %>%
mutate(Year = year(date), quarter = quarter(date)) %>%
group_by(Year, quarter) %>%
summarise(total_cases = sum(new_cases, na.rm = TRUE)) %>%
ungroup()
combined_data <- merge(travel_agg, covid_agg, by = c("Year", "quarter"))
combined_data <- combined_data %>%
mutate(date = as.Date(paste(Year, (quarter - 1) * 3 + 1, "01", sep = "-")))
p4 <- ggplot(combined_data, aes(x = date)) +
geom_line(aes(y = total_passengers, color = "Passengers"), linewidth = 1) +
# Cases are divided by 1000 to share the primary axis; sec_axis() multiplies back
geom_line(aes(y = total_cases / 1000, color = "COVID-19 Cases (in thousands)"), linewidth = 1) +
scale_y_continuous(
name = "Total Passengers",
sec.axis = sec_axis(~ . * 1000, name = "COVID-19 Cases")
) +
labs(
title = "Trends in Air Travel and COVID-19 Cases Over Time",
x = "Year",
y = "Total Passengers",
color = "Legend"
) +
theme_minimal() +
theme(axis.text.x = element_text(angle = 45, hjust = 1))
p4
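The text above describes a relationship that changes over time: travel and cases rise together at first, then diverge. One way to quantify this is to compute the correlation separately before and after a cutoff. In the base-R sketch below, the January 2022 cutoff and all values are assumptions chosen for illustration, and the toy table only mimics the columns of `combined_data`. With only a few quarters per period, such correlations are fragile and are best read as descriptive.

```r
# Toy quarterly table shaped like combined_data (values are invented)
toy_combined <- data.frame(
  date             = seq(as.Date("2021-01-01"), by = "quarter", length.out = 8),
  total_passengers = c(4, 6, 8, 10, 11, 12, 13, 14),
  total_cases      = c(1, 2, 3, 4, 8, 6, 4, 2)
)

# Label each quarter as before or after an assumed cutoff date
cutoff <- as.Date("2022-01-01")
toy_combined$period <- ifelse(toy_combined$date < cutoff, "before 2022", "2022 onward")

# Pearson correlation between passengers and cases within each period
period_cor <- sapply(split(toy_combined, toy_combined$period),
                     function(d) cor(d$total_passengers, d$total_cases))
period_cor  # 1 before the cutoff, -1 from 2022 onward in this toy data
```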
This bubble chart uses points to show the relationship between vaccination rates and case fatality rates.
This graph differs from the first one in that it examines whether vaccinations lowered mortality rates, rather than new cases. Although there are many outliers, there is some correlation between the two: as more vaccinations were administered, fewer people died from COVID-19. Because vaccination rates also rose over time, the decline in mortality could again reflect herd immunity, or the fact that many of the oldest and most vulnerable people had already died.
# Mean case fatality ratio (total deaths / total cases) for each distinct vaccination rate
bubble_data <- df %>%
group_by(people_vaccinated_per_hundred) %>%
summarise(
case_fatality_rate = mean(total_deaths / total_cases, na.rm = TRUE),
count = n()
)
p5 <- ggplot(bubble_data, aes(x = people_vaccinated_per_hundred, y = case_fatality_rate, size = count)) +
geom_point(color = "blue", alpha = 0.6) +
labs(
title = "Correlation Between Vaccination Rates and Case Fatality Rates",
x = "Vaccination Rate (per hundred)",
y = "Case Fatality Rate",
size = "Number of Observations"
) +
theme_minimal() +
theme(
plot.title = element_text(hjust = 0.5, size = 16, face = "bold"),
axis.title = element_text(size = 12),
axis.text = element_text(size = 10)
)
p5
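Grouping by the exact value of `people_vaccinated_per_hundred` creates a separate group for nearly every distinct decimal value, so most bubbles stand for a single row; missing rates also form one large `NA` group that cannot be plotted, and rows with zero cases make the ratio undefined. A sketch of one alternative, in base R with invented toy rows: drop rows without a vaccination rate or without cases, then bin the rate to whole percentage points (the bin width of 1 is an assumption).

```r
# Toy rows shaped like df (values are invented)
toy_rows <- data.frame(
  people_vaccinated_per_hundred = c(10.2, 10.4, 55.7, 55.1, NA, 80.0),
  total_cases                   = c(1000, 2000, 4000, 0, 500, 1000),
  total_deaths                  = c(20, 40, 40, 0, 10, 5)
)

# Keep rows with a known vaccination rate and at least one case,
# so the case fatality ratio is always defined
usable <- subset(toy_rows, !is.na(people_vaccinated_per_hundred) & total_cases > 0)

# Bin vaccination rates to whole percentage points (assumed bin width of 1)
usable$vacc_bin <- round(usable$people_vaccinated_per_hundred)
usable$cfr      <- usable$total_deaths / usable$total_cases

# Mean ratio and number of rows per bin
bubble_binned <- aggregate(cfr ~ vacc_bin, data = usable, FUN = mean)
bubble_binned$count <- as.vector(table(usable$vacc_bin)[as.character(bubble_binned$vacc_bin)])
bubble_binned
```

The binned table could then be passed to the same `ggplot()` call used for `bubble_data`, with `vacc_bin` on the x-axis, `cfr` on the y-axis, and `count` as the bubble size.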
This analysis highlights how vaccinations, seasonal trends, and travel patterns relate to COVID-19 cases and deaths. Key findings include:
- Higher vaccination rates were associated with lower case fatality rates, but vaccinations appeared to have a limited impact on preventing new cases.
- Cases and deaths were higher during winter months, possibly due to lower vitamin D levels and weakened immune systems.
- Air travel moved closely with COVID-19 cases during the early stages of the pandemic, but less so later, possibly as herd immunity developed.