The data used in this analysis cover COVID-19 cases, deaths, recoveries, and testing. Attributes include location (country or region, continent, WHO region), population, and cumulative and daily counts. Depending on the file, each record is either a country-level summary or a daily snapshot for a specific country or region.
The dataset consists of COVID-19-related metrics for different countries or regions. Here’s an introduction to each column:
Country.Region: The name of the country or region.
Confirmed: The total number of confirmed COVID-19 cases.
Deaths: The total number of deaths attributed to COVID-19.
Recovered: The total number of individuals who have recovered from COVID-19.
Active: The number of active COVID-19 cases (Confirmed - Deaths - Recovered).
New.cases: The number of new confirmed COVID-19 cases reported.
New.deaths: The number of new deaths attributed to COVID-19 reported.
New.recovered: The number of new recoveries reported.
Deaths...100.Cases: The number of deaths per 100 confirmed cases.
Recovered...100.Cases: The number of recoveries per 100 confirmed cases.
Deaths...100.Recovered: The number of deaths per 100 recovered cases.
Confirmed.last.week: The total number of confirmed cases reported in the previous week.
X1.week.change: The change in confirmed cases compared to the previous week.
X1.week...increase: The percentage increase in confirmed cases compared to the previous week.
WHO.Region: The World Health Organization (WHO) region to which the country or region belongs.
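The derived columns in the list above follow from the raw counts. The sketch below shows one way they could be computed with pandas. The toy rows are invented for illustration, and the exact formulas (in particular, the weekly increase as a percentage of last week's total) are assumptions inferred from the column descriptions, not taken from the dataset's documentation.

```python
import pandas as pd

# Hypothetical toy rows; the numbers are invented for illustration only.
toy = pd.DataFrame({
    "Country.Region": ["A", "B"],
    "Confirmed": [1000, 200],
    "Deaths": [50, 10],
    "Recovered": [700, 100],
    "Confirmed.last.week": [800, 160],
})

# Active cases: confirmed minus deaths minus recoveries.
toy["Active"] = toy["Confirmed"] - toy["Deaths"] - toy["Recovered"]

# Ratio columns, expressed per 100 cases (or per 100 recoveries).
toy["Deaths...100.Cases"] = (100 * toy["Deaths"] / toy["Confirmed"]).round(2)
toy["Recovered...100.Cases"] = (100 * toy["Recovered"] / toy["Confirmed"]).round(2)
toy["Deaths...100.Recovered"] = (100 * toy["Deaths"] / toy["Recovered"]).round(2)

# Week-over-week change: absolute, then as a percentage of last week's total
# (the percentage definition is an assumption).
toy["X1.week.change"] = toy["Confirmed"] - toy["Confirmed.last.week"]
toy["X1.week...increase"] = (
    100 * toy["X1.week.change"] / toy["Confirmed.last.week"]
).round(2)

# Transpose so each derived column prints on its own line.
print(toy.T)
```

Recomputing these columns from the raw counts and comparing them with the stored values is a quick consistency check on a freshly loaded file.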
Importing three datasets into this project
- covid: This dataset contains Country/Region, Continent, Population, TotalCases, NewCases, TotalDeaths, NewDeaths, TotalRecovered, NewRecovered, ActiveCases, Serious,Critical (a single column), Tot Cases/1M pop, Deaths/1M pop, TotalTests, Tests/1M pop, WHO Region, iso_alpha.
- covid_grouped: This dataset contains Date (from 2020-01-22 to 2020-07-27), Country/Region, Confirmed, Deaths, Recovered, Active, New cases, New deaths, New recovered, WHO Region, iso_alpha.
- coviddeath: This dataset contains real-world records of COVID-19 deaths and the reasons behind them.
library(tidyr)
library(dplyr)
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
library(ggplot2)
# Load the dataset
data <- read.csv("covid.csv")
data_grouped <- read.csv("covid_grouped.csv")
data_death <- read.csv("coviddeath.csv")
head(data)
## Country.Region Continent Population TotalCases NewCases TotalDeaths
## 1 USA North America 331198130 5032179 NA 162804
## 2 Brazil South America 212710692 2917562 NA 98644
## 3 India Asia 1381344997 2025409 NA 41638
## 4 Russia Europe 145940924 871894 NA 14606
## 5 South Africa Africa 59381566 538184 NA 9604
## 6 Mexico North America 129066160 462690 6590 50517
## NewDeaths TotalRecovered NewRecovered ActiveCases Serious.Critical
## 1 NA 2576668 NA 2292707 18296
## 2 NA 2047660 NA 771258 8318
## 3 NA 1377384 NA 606387 8944
## 4 NA 676357 NA 180931 2300
## 5 NA 387316 NA 141264 539
## 6 819 308848 4140 103325 3987
## Tot.Cases.1M.pop Deaths.1M.pop TotalTests Tests.1M.pop WHO.Region
## 1 15194 492 63139605 190640 Americas
## 2 13716 464 13206188 62085 Americas
## 3 1466 30 22149351 16035 South-EastAsia
## 4 5974 100 29716907 203623 Europe
## 5 9063 162 3149807 53044 Africa
## 6 3585 391 1056915 8189 Americas
## iso_alpha
## 1 USA
## 2 BRA
## 3 IND
## 4 RUS
## 5 ZAF
## 6 MEX
dim(data)
## [1] 209 17
str(data)
## 'data.frame': 209 obs. of 17 variables:
## $ Country.Region : chr "USA" "Brazil" "India" "Russia" ...
## $ Continent : chr "North America" "South America" "Asia" "Europe" ...
## $ Population : num 3.31e+08 2.13e+08 1.38e+09 1.46e+08 5.94e+07 ...
## $ TotalCases : int 5032179 2917562 2025409 871894 538184 462690 455409 366671 357710 354530 ...
## $ NewCases : num NA NA NA NA NA 6590 NA NA NA NA ...
## $ TotalDeaths : num 162804 98644 41638 14606 9604 ...
## $ NewDeaths : num NA NA NA NA NA 819 NA NA NA NA ...
## $ TotalRecovered : num 2576668 2047660 1377384 676357 387316 ...
## $ NewRecovered : num NA NA NA NA NA 4140 NA NA NA NA ...
## $ ActiveCases : num 2292707 771258 606387 180931 141264 ...
## $ Serious.Critical: num 18296 8318 8944 2300 539 ...
## $ Tot.Cases.1M.pop: num 15194 13716 1466 5974 9063 ...
## $ Deaths.1M.pop : num 492 464 30 100 162 391 619 517 234 610 ...
## $ TotalTests : num 63139605 13206188 22149351 29716907 3149807 ...
## $ Tests.1M.pop : num 190640 62085 16035 203623 53044 ...
## $ WHO.Region : chr "Americas" "Americas" "South-EastAsia" "Europe" ...
## $ iso_alpha : chr "USA" "BRA" "IND" "RUS" ...
import pandas as pd
df = r.data  # Access the R variable `data` from Python via reticulate
df = pd.DataFrame(df)
print(df.columns)
## Index(['Country.Region', 'Continent', 'Population', 'TotalCases', 'NewCases',
## 'TotalDeaths', 'NewDeaths', 'TotalRecovered', 'NewRecovered',
## 'ActiveCases', 'Serious.Critical', 'Tot.Cases.1M.pop', 'Deaths.1M.pop',
## 'TotalTests', 'Tests.1M.pop', 'WHO.Region', 'iso_alpha'],
## dtype='object')
# Drop the NewCases, NewDeaths, and NewRecovered columns from df
df.drop(['NewCases', 'NewDeaths', 'NewRecovered'],
        axis=1, inplace=True)
# Show a random sample of 5 rows from df
df.sample(5)
## Country.Region Continent ... WHO.Region iso_alpha
## 167 Isle of Man Europe ... IMN
## 43 Portugal Europe ... Europe PRT
## 29 Sweden Europe ... Europe SWE
## 65 Costa Rica North America ... Americas CRI
## 99 Libya Africa ... EasternMediterranean LBY
##
## [5 rows x 14 columns]
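Before dropping columns, it can help to measure how sparse they actually are. The following is a minimal sketch of that check, not the method used in this analysis: `toy` is a hypothetical stand-in for `df` with invented values, and the 50% threshold is an arbitrary assumption.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for df; the values are invented for illustration.
toy = pd.DataFrame({
    "Country.Region": ["A", "B", "C", "D"],
    "TotalCases": [400, 300, 200, 100],
    "NewCases": [np.nan, np.nan, 12, np.nan],
    "NewDeaths": [np.nan, np.nan, 1, np.nan],
})

# Share of missing values in each column (mean of a boolean mask).
na_share = toy.isna().mean()

# Columns whose missing share exceeds the assumed 50% threshold.
sparse_cols = na_share[na_share > 0.5].index.tolist()

# Non-mutating drop: returns a new frame and leaves `toy` unchanged.
cleaned = toy.drop(columns=sparse_cols)

print(na_share)
print(cleaned.columns.tolist())
```

Using `drop(columns=...)` without `inplace=True` keeps the original frame available for comparison, which is convenient while exploring.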
# Load required packages
library(reticulate)
library(plotly)
##
## Attaching package: 'plotly'
## The following object is masked from 'package:ggplot2':
##
## last_plot
## The following object is masked from 'package:stats':
##
## filter
## The following object is masked from 'package:graphics':
##
## layout
library(DT)
df <- py$df # Import the Python variable df into R
# Create interactive table using DT package
datatable(df, options = list(
pageLength = 5,
scrollX = TRUE
))
The bar chart below uses ggplot2 to show total cases for the 15 countries with the most confirmed cases.
top15 <- df %>%
arrange(desc(TotalCases)) %>%
slice_head(n = 15)
ggplot(top15, aes(x = reorder(`Country.Region`, TotalCases), y = TotalCases, fill = TotalCases)) +
geom_bar(stat = "identity") +
coord_flip() +
labs(title = "Top 15 Countries by Total Cases", x = "Country", y = "Total Cases") +
theme_minimal()
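The selection step behind this chart (sort by TotalCases in descending order, keep the first rows) can also be expressed in pandas. The sketch below is an illustration only: `toy` and the helper `top_n` are hypothetical names, and the values are invented.

```python
import pandas as pd

# Hypothetical toy data; the values are invented for illustration.
toy = pd.DataFrame({
    "Country.Region": ["A", "B", "C", "D", "E"],
    "TotalCases": [50, 400, 120, 900, 10],
})


def top_n(frame, column, n):
    """Return the n rows with the largest values in `column`, largest first."""
    return frame.nlargest(n, column).reset_index(drop=True)


top3 = top_n(toy, "TotalCases", 3)
print(top3)
```

With the full data, `top_n(df, "TotalCases", 15)` would play the role of `arrange(desc(TotalCases))` followed by `slice_head(n = 15)`.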
# Top 15 countries by TotalCases
top15_df <- df %>%
arrange(desc(TotalCases)) %>%
slice(1:15)
# Bar plot colored by TotalCases
ggplot(top15_df, aes(x = reorder(Country.Region, -TotalCases), y = TotalCases, fill = TotalCases)) +
geom_bar(stat = "identity") +
labs(x = "Country", y = "Total Cases", title = "Top 15 Countries by Total COVID Cases") +
theme_minimal() +
theme(axis.text.x = element_text(angle = 45, hjust = 1))
ggplot(top15_df, aes(x = reorder(Country.Region, -TotalCases), y = TotalCases, fill = TotalDeaths)) +
geom_bar(stat = "identity") +
labs(x = "Country", y = "Total Cases", title = "Top 15 Countries - Total Cases Colored by Total Deaths") +
theme_minimal() +
theme(axis.text.x = element_text(angle = 45, hjust = 1))
ggplot(top15_df, aes(x = reorder(Country.Region, -TotalCases), y = TotalCases, fill = TotalRecovered)) +
geom_bar(stat = "identity") +
labs(x = "Country", y = "Total Cases", title = "Top 15 Countries - Total Cases Colored by Total Recovered") +
theme_minimal() +
theme(axis.text.x = element_text(angle = 45, hjust = 1))
ggplot(top15_df, aes(x = TotalTests, y = reorder(Country.Region, TotalTests), fill = TotalTests)) +
geom_bar(stat = "identity") +
labs(x = "Total Tests", y = "Country", title = "Horizontal Bar Plot - Total Tests by Country") +
theme_minimal()
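# Bars stack the countries within each continent, so order continents by their summed tests
ggplot(top15_df, aes(x = TotalTests, y = reorder(Continent, TotalTests, FUN = sum, na.rm = TRUE), fill = Continent)) +
geom_bar(stat = "identity") +
labs(x = "Total Tests", y = "Continent", title = "Total Tests by Continent (Top 15 Countries)") +
theme_minimal()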
ggplot(df, aes(x = Continent, y = TotalCases, size = TotalCases, color = TotalCases)) +
geom_point(alpha = 0.6) +
labs(title = "Bubble Chart: Total Cases by Continent") +
theme_minimal()
ggplot(df, aes(x = Continent, y = TotalCases, size = TotalCases, color = TotalCases)) +
geom_point(alpha = 0.6) +
scale_y_log10() +
labs(title = "Bubble Chart: Total Cases by Continent (Log Scale)") +
theme_minimal()
The heatmap visualizes the relative intensity of the pandemic impact across nations, helping to quickly identify severely affected areas.
# Select and transform top 15 countries
heatmap_df <- df %>%
arrange(desc(TotalCases)) %>%
slice(1:15) %>%
select(Country.Region, TotalCases, TotalDeaths, TotalRecovered, TotalTests) %>%
pivot_longer(cols = -Country.Region, names_to = "Metric", values_to = "Value")
Each metric was divided by its maximum among the 15 countries, so every value falls between 0 and 1 and metrics with very different numerical ranges remain comparable. A sequential color palette then provides the visual cue for interpretation.
# Normalize each metric by its maximum for better heatmap contrast
heatmap_df <- heatmap_df %>%
group_by(Metric) %>%
mutate(NormalizedValue = Value / max(Value, na.rm = TRUE)) %>%
ungroup() # drop the grouping so later steps operate on the whole table
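The reshaping and per-metric scaling can be checked on a small example. The pandas sketch below mirrors the two steps (wide to long, then divide by the maximum of each metric) on invented numbers; `toy` and `long_df` are hypothetical names used only for this illustration.

```python
import pandas as pd

# Hypothetical toy data; the values are invented for illustration.
toy = pd.DataFrame({
    "Country.Region": ["A", "B", "C"],
    "TotalCases": [1000, 500, 250],
    "TotalDeaths": [20, 40, 10],
})

# Wide to long: one row per (country, metric) pair, similar to pivot_longer().
long_df = toy.melt(id_vars="Country.Region", var_name="Metric", value_name="Value")

# Divide each value by the maximum of its own metric.
# transform("max") skips missing values by default, like na.rm = TRUE.
metric_max = long_df.groupby("Metric")["Value"].transform("max")
long_df["NormalizedValue"] = long_df["Value"] / metric_max

print(long_df)
```

After scaling, the largest value of every metric is exactly 1, so the darkest tile in each heatmap column marks the country that leads on that metric. The scaling is relative to the selected countries, not to the whole dataset.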
The heatmap was created using ggplot2.
Rows represent the top 15 countries or regions by total cases.
Columns represent the selected metrics: total cases, deaths, recoveries, and tests.
ggplot(heatmap_df, aes(x = Metric, y = reorder(Country.Region, desc(Country.Region)), fill = NormalizedValue)) +
geom_tile(color = "white") +
scale_fill_gradient(low = "lightblue", high = "darkblue") +
labs(title = "COVID-19 Metrics Heatmap (Top 15 Countries)",
x = "Metric", y = "Country", fill = "Normalized\nValue") +
theme_minimal() +
theme(axis.text.x = element_text(angle = 45, hjust = 1))