COVID-19 is a highly infectious respiratory illness. The virus spreads primarily through respiratory droplets produced when an infected person talks, coughs, or sneezes, and it can also spread when someone touches a contaminated surface. Common symptoms include fever, cough, shortness of breath, fatigue, headache, and loss of taste or smell. Preventive measures such as wearing masks, washing hands frequently, and maintaining physical distance can help control the spread of the virus. At the time of writing, more than 488 million confirmed cases of COVID-19 and more than 6.2 million deaths had been reported worldwide (World Health Organization, https://www.who.int/health-topics/coronavirus#tab=tab_1).
install.packages("dplyr")
library(readxl)
library(dplyr)
## Warning: package 'dplyr' was built under R version 4.1.2
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
First, we imported the data from the coronavirus and comorbidity Excel sheets into two separate variables. We then joined the two into a third variable so that each record carries the columns from both sheets. Next, we treated ages of 120 years or more as invalid and replaced them with NA. Finally, we did not want China to be labeled "Mainland China", so we changed the label to "China".
#import the data
corona <- read_excel("coronavirus.xlsx")
comorbid <- read_excel("comorbidity.xlsx")
#join the two sheets and clean the data
join <- inner_join(corona, comorbid)
## Joining with `by = join_by(ID)`
join <- mutate(join, Age = ifelse(Age >= 120, NA, Age))
join <- mutate(join, Country = ifelse(Country == "Mainland China", "China", Country))
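The join and cleaning steps above can be sketched on a small made-up example. The data frames `corona_toy` and `comorbid_toy` are hypothetical stand-ins for the two spreadsheets, not the real data. The sketch names the join key explicitly with `by = "ID"` and uses `anti_join()` to list rows that an inner join would silently drop; whether the real sheets contain such unmatched rows is not known from this report.

```r
library(dplyr)

# Hypothetical stand-ins for the two spreadsheets (not the real data)
corona_toy <- data.frame(
  ID      = c(1, 2, 3, 4),
  Age     = c(34, 150, 61, 120),
  Country = c("Mainland China", "Italy", "US", "Mainland China"),
  stringsAsFactors = FALSE
)
comorbid_toy <- data.frame(
  ID          = c(1, 2, 3, 5),
  Comorbidity = c("None", "Lupus", "HIV", "None"),
  stringsAsFactors = FALSE
)

# Naming the key explicitly avoids relying on the automatic column match
joined_toy <- inner_join(corona_toy, comorbid_toy, by = "ID")

# Rows of the first sheet with no partner in the second; an inner join drops these
unmatched_toy <- anti_join(corona_toy, comorbid_toy, by = "ID")

# Same two cleaning rules as in the report
cleaned_toy <- joined_toy %>%
  mutate(
    Age     = ifelse(Age >= 120, NA, Age),
    Country = ifelse(Country == "Mainland China", "China", Country)
  )

print(cleaned_toy)
print(unmatched_toy)
```

Checking `unmatched_toy` before cleaning is one way to confirm that the number of joined rows is what you expect.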
The joined data set contains 1764 observations and 12 variables. Of these, 49.9% are female and 50.1% are male. The median age of people who had COVID-19 was 28 years. Among people who died, the mean age was 28.4 years, with a standard deviation of 15.9 years. People who had COVID-19 and were from China, Italy, or the US made up 30.7% of all observations. Among people who had COVID-19, 3.9% had heart disease, 7.5% had HIV, 1.9% had liver disease, 15.3% had lupus, and 71.3% had no comorbidity.
# number of observations and variables
dim(join)
## [1] 1764 12
#percentage of females and males
prop.table(table(join$Sex))*100
##
## F M
## 49.88662 50.11338
#median age of people who have corona
median(join$Age[join$CoronaVirus == 1], na.rm = TRUE)
## [1] 28
#mean age of people who died
mean(join$Age[join$Death == 1], na.rm = TRUE)
## [1] 28.39723
#standard deviation of the age of people who died
sd(join$Age[join$Death == 1], na.rm = TRUE)
## [1] 15.92602
#percentage of all observations that have covid and are from China, Italy, or the US
prop.table(table(join$CoronaVirus == 1 & (join$Country == "China" | join$Country == "Italy" | join$Country == "US")))*100
##
## FALSE TRUE
## 69.33107 30.66893
#comorbidity breakdown among people who have covid
prop.table(table(join$Comorbidity[join$CoronaVirus == 1]))*100
##
## Heart Disease HIV Liver Disease Lupus None
## 3.885714 7.542857 1.942857 15.314286 71.314286
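Two of the percentages above depend on which group is used as the denominator, so it may help to see both directions side by side. The sketch below uses a hypothetical data frame `toy` (made-up values, not the study data) with the same column names. It contrasts the share of all rows that are positive and from the three countries with the positivity rate inside those countries, and it contrasts the comorbidity mix among positive cases with the positivity rate inside each comorbidity group.

```r
# Hypothetical toy data, not the study data
toy <- data.frame(
  CoronaVirus = c(1, 1, 0, 0, 1, 0, 1, 0),
  Country     = c("China", "Italy", "US", "China", "Japan", "Japan", "US", "Italy"),
  Comorbidity = c("None", "Lupus", "None", "HIV", "None", "Lupus", "HIV", "None"),
  stringsAsFactors = FALSE
)

in_three <- toy$Country %in% c("China", "Italy", "US")

# Share of ALL rows that are positive and from one of the three countries
share_of_all <- mean(toy$CoronaVirus == 1 & in_three) * 100

# Positivity rate WITHIN the three countries only
rate_within <- mean(toy$CoronaVirus[in_three] == 1) * 100

counts <- table(toy$Comorbidity, toy$CoronaVirus)

# Each column sums to 100: comorbidity mix within each covid status
by_status <- prop.table(counts, margin = 2) * 100

# Each row sums to 100: covid status within each comorbidity group
by_comorbidity <- prop.table(counts, margin = 1) * 100

print(c(share_of_all = share_of_all, rate_within = rate_within))
print(by_status)
print(by_comorbidity)
```

The calculations in this report correspond to `share_of_all` and to the `"1"` column of `by_status`; the other two quantities answer different questions and would need to be computed separately.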
The box plot shows the median, quartiles, and outliers of age for people who have COVID-19, alongside a second box for people who do not. Outliers are drawn as individual points: they are people much older than the upper quartile or much younger than the lower quartile.
#Creates a boxplot of age, grouped by whether or not people have covid
boxplot(Age ~ CoronaVirus, data = join,
        main = "Age by Corona Virus Status",
        xlab = "Age",
        ylab = "Corona",
        horizontal = TRUE)
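To make the quartiles and outliers in a box plot concrete, the numbers behind the picture can be computed directly. The sketch below uses a hypothetical vector `ages` (made-up values, not the study data) and `boxplot.stats()`, which is the function `boxplot()` calls internally. It then reproduces the default outlier rule by hand, under which a point is flagged when it lies more than 1.5 times the box height beyond either edge of the box.

```r
# Hypothetical ages, not the study data
ages <- c(2, 19, 22, 25, 28, 30, 33, 36, 41, 90)

stats <- boxplot.stats(ages)

# Lower whisker, lower hinge, median, upper hinge, upper whisker
print(stats$stats)

# Points that would be drawn individually as outliers
print(stats$out)

# Reproduce the default rule by hand from the two hinges (edges of the box)
hinges     <- stats$stats[c(2, 4)]
box_height <- diff(hinges)
fence_low  <- hinges[1] - 1.5 * box_height
fence_high <- hinges[2] + 1.5 * box_height
flagged    <- ages[ages < fence_low | ages > fence_high]

print(flagged)
```

Applying the same call to each group's ages, for example with `tapply()`, would give the values behind each box in the plot.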
## Citation
World Health Organization. Coronavirus disease outbreak. https://www.who.int/health-topics/coronavirus#tab=tab_1