COVID-19 is the infectious disease caused by the most recently discovered coronavirus. This new virus and the disease it causes were unknown before the outbreak began in Wuhan, China, in December 2019.
Coronaviruses are a large family of viruses that may cause illness in animals or humans. In humans, several coronaviruses are known to cause respiratory infections ranging from the common cold to more severe diseases such as Middle East Respiratory Syndrome (MERS) and Severe Acute Respiratory Syndrome (SARS). The most recently discovered coronavirus causes coronavirus disease COVID-19. Symptoms of COVID-19 include:
–fever
–tiredness
–dry cough
–aches and pains
–nasal congestion
–runny nose
–sore throat or diarrhea
This is a data analysis project in R. I wanted to understand the underlying patterns among COVID patients and the gradual increase in the number of COVID-positive patients in India and across the world. For this, I used several COVID datasets updated on 7 May 2020. For reference and datasets, you can visit covid 19 in india. For official information, you can visit the Ministry of Health and Family Welfare website. Please note that this analysis is purely for personal understanding.
According to the datasets provided by Johns Hopkins University on GitHub, the US ranks first, with the highest number of positive cases and the highest number of deaths. Let us look at the top countries by COVID-positive cases, recoveries and deaths.
# Reading data from the CSV file
conag<-read.csv("countries-aggregated_csv.csv")
# Find the top 10 countries by confirmed, recovered and death counts (max of each cumulative series per country)
confirmed<-aggregate(conag$Confirmed~conag$Country,conag,FUN = max)
confirmed10<-confirmed[order(-confirmed$`conag$Confirmed`),][1:10,]
recovered<-aggregate(conag$Recovered~conag$Country,conag,FUN = max)
recovered10<-recovered[order(-recovered$`conag$Recovered`),][1:10,]
dead<-aggregate(conag$Deaths~conag$Country,conag,FUN= max)
dead10<-dead[order(-dead$`conag$Deaths`),][1:10,]
#top countries with highest positive cases
confirmed10
## conag$Country conag$Confirmed
## 179 US 1367638
## 158 Spain 264663
## 140 Russia 221334
## 177 United Kingdom 219183
## 86 Italy 219070
## 63 France 176970
## 67 Germany 169430
## 24 Brazil 162699
## 173 Turkey 138657
## 82 Iran 103135
#top 10 countries with highest recovered cases
recovered10
## conag$Country conag$Recovered
## 179 US 256336
## 158 Spain 176439
## 67 Germany 145600
## 86 Italy 105186
## 173 Turkey 92691
## 82 Iran 86143
## 37 China 78977
## 24 Brazil 64957
## 63 France 56217
## 140 Russia 39801
#top 10 countries with the highest number of deaths
dead10
## conag$Country conag$Deaths
## 179 US 80787
## 177 United Kingdom 31885
## 86 Italy 29958
## 158 Spain 26621
## 63 France 26380
## 24 Brazil 11123
## 17 Belgium 8415
## 67 Germany 7569
## 82 Iran 6640
## 122 Netherlands 5306
# Plot the three charts side by side on one device
par(mfrow=c(1,3),mar=c(12,5,3,2),cex=0.8)
barplot(confirmed10$`conag$Confirmed`,names.arg = confirmed10$`conag$Country`,
main="Highest covid confirmed cases",las=3,col="red")
barplot(recovered10$`conag$Recovered`,names.arg=recovered10$`conag$Country`,
main="Highest covid recovered cases",las=3,col="green")
barplot(dead10$`conag$Deaths`,names.arg = dead10$`conag$Country`,
main="Highest covid death cases",las=3,col="grey")
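The "aggregate, sort, take the first ten" pattern above is repeated once per metric. A minimal sketch of a reusable helper, assuming a data frame with a `Country` column and one cumulative count column per metric (as in `countries-aggregated_csv.csv`). The name `top_n_countries` and the toy data frame are hypothetical, for illustration only. Because the counts are cumulative, the per-country maximum is the latest total.

```r
# Hypothetical helper: top-n countries by the maximum of a cumulative count column
top_n_countries <- function(df, column, n = 10) {
  # Build the formula `column ~ Country` from a string, e.g. Confirmed ~ Country
  f <- stats::reformulate("Country", response = column)
  totals <- stats::aggregate(f, data = df, FUN = max)
  # Sort in descending order of the count column and keep at most n rows
  totals <- totals[order(-totals[[column]]), , drop = FALSE]
  head(totals, n)
}

# Toy data, made up for illustration (not real COVID figures)
toy <- data.frame(
  Country   = c("A", "A", "B", "B", "C"),
  Confirmed = c(5, 9, 3, 12, 7),
  Deaths    = c(0, 1, 0, 2, 1)
)

top2 <- top_n_countries(toy, "Confirmed", n = 2)
print(top2)
```

With the real data, `top_n_countries(conag, "Deaths")` would return plain `Country` and `Deaths` columns, so the bar plots could use `names.arg = top$Country` instead of the backquoted `conag$...` names.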
As we already know, the number of COVID patients is increasing day by day.
As of 11 May, 7:58 PM, India had 67,152 confirmed cases, of which 20,917 had recovered and 2,206 had died.
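These three figures also give the number of active cases and the share of cases that had recovered or died so far. A small worked sketch using only the numbers quoted above; note that the death share is a naive ratio, not a final case fatality rate, because many cases were still unresolved on that date.

```r
# India figures as quoted above (11 May 2020, 7:58 PM)
confirmed <- 67152
recovered <- 20917
deaths    <- 2206

# Active cases = confirmed - recovered - deaths
active <- confirmed - recovered - deaths

# Share of all confirmed cases that had recovered or died by this date
recovery_share <- recovered / confirmed
death_share    <- deaths / confirmed

cat("Active cases:", active, "\n")                              # 44029
cat("Recovered share:", round(100 * recovery_share, 1), "%\n")  # 31.1
cat("Death share:", round(100 * death_share, 1), "%\n")         # 3.3
```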
Let's see the COVID situation in India.
# Read the combined time-series dataset
timeseries<-read.csv("time-series-19-covid-combined_csv.csv")
# Keep only India and convert Date from a factor to a Date so the x axis is continuous
india<-subset(timeseries,Country.Region=="India")
india$Date<-as.Date(india$Date)
tail(india)
##             Date Country.Region Province.State Lat Long Confirmed Recovered Deaths
## 15510 2020-05-02          India                 21   78     39699     10819   1323
## 15511 2020-05-03          India                 21   78     42505     11775   1391
## 15512 2020-05-04          India                 21   78     46437     12847   1566
## 15513 2020-05-05          India                 21   78     49400     14142   1693
## 15514 2020-05-06          India                 21   78     52987     15331   1785
## 15515 2020-05-07          India                 21   78     56351     16776   1889
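The `Confirmed` column is cumulative, so the number of new cases per day is the difference between consecutive rows. A minimal sketch using the six rows printed above; for the full series, `diff(india$Confirmed)` would do the same, assuming the rows are sorted by date.

```r
# Cumulative confirmed cases for India, 2 to 7 May 2020 (from tail(india) above)
dates     <- as.Date(c("2020-05-02", "2020-05-03", "2020-05-04",
                       "2020-05-05", "2020-05-06", "2020-05-07"))
confirmed <- c(39699, 42505, 46437, 49400, 52987, 56351)

# diff() turns a cumulative series into day-over-day increments
new_cases <- diff(confirmed)
names(new_cases) <- format(dates[-1])

print(new_cases)   # 2806 3932 2963 3587 3364
mean(new_cases)    # 3330.4
```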
# Plot confirmed, recovered and death counts over time
library(ggplot2)
q1<-ggplot(india,aes(x=Date,y=Confirmed,color="Confirmed"))+geom_point(size=4)
q2<-q1+geom_point(aes(x=Date,y=Recovered,color="Recovered"),data=india,size=4)
q3<-q2+geom_point(aes(x=Date,y=Deaths,color="Deaths"),data=india,size=4)
q3+theme_light()+labs(x="Date",y="Number of cases",
title="Covid Effect in India",
subtitle = "Confirmed, recovered and death cases")+
theme(
legend.position = c(0.05,0.95),
legend.justification = c("left", "top"),
legend.box.just = "left",
legend.margin = margin(6, 6, 6, 6)
)
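An alternative to adding one layer per column is to reshape the data to long format, with one row per date and status, and map the status to colour; ggplot2 then builds the legend itself. A minimal sketch using only the six rows printed above (the full `india` data frame would work the same way once `Date` is converted with `as.Date`). The plotting step runs only if ggplot2 is installed.

```r
# Sample rows copied from tail(india) above
india_tail <- data.frame(
  Date      = as.Date(c("2020-05-02", "2020-05-03", "2020-05-04",
                        "2020-05-05", "2020-05-06", "2020-05-07")),
  Confirmed = c(39699, 42505, 46437, 49400, 52987, 56351),
  Recovered = c(10819, 11775, 12847, 14142, 15331, 16776),
  Deaths    = c(1323, 1391, 1566, 1693, 1785, 1889)
)

# Reshape to long format: one row per (Date, Status) pair.
# unlist() concatenates the columns in order: Confirmed, Recovered, Deaths.
status_cols <- c("Confirmed", "Recovered", "Deaths")
india_long <- data.frame(
  Date   = rep(india_tail$Date, times = length(status_cols)),
  Status = rep(status_cols, each = nrow(india_tail)),
  Count  = unlist(india_tail[status_cols], use.names = FALSE)
)

# Draw the chart only when ggplot2 is available
if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(india_long, aes(x = Date, y = Count, colour = Status)) +
    geom_line() +
    geom_point() +
    theme_light() +
    labs(x = "Date", y = "Number of cases", title = "Covid Effect in India")
  print(p)
}
```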
In R, exploratory data analysis (EDA) uses graphs like these to compare groups. Next, we read the state-wise testing dataset, preprocess it, and then plot it.
# Read the state-wise testing dataset
swtd<-read.csv("StatewiseTestingDetails.csv")
# Look at the structure of the data
str(swtd)
## 'data.frame': 862 obs. of 5 variables:
## $ Date : Factor w/ 42 levels "2020-02-04","2020-02-16",..: 19 26 29 33 4 12 13 14 15 16 ...
## $ State : Factor w/ 33 levels "Andaman and Nicobar Islands",..: 1 1 1 1 2 2 2 2 2 2 ...
## $ TotalSamples: num 1403 2679 2848 3754 1800 ...
## $ Negative : num 1210 NA NA NA 1175 ...
## $ Positive : num 12 27 33 33 132 365 381 405 432 473 ...
states<-aggregate(swtd$TotalSamples~swtd$State,swtd,FUN = max)
states10<-states[order(-states$`swtd$TotalSamples`),][1:10,]
# states with highest covid tests
states10
## swtd$State swtd$TotalSamples
## 28 Tamil Nadu 216416
## 19 Maharashtra 200477
## 2 Andhra Pradesh 156681
## 26 Rajasthan 152245
## 31 Uttar Pradesh 119688
## 10 Gujarat 105386
## 15 Karnataka 98081
## 8 Delhi 77234
## 18 Madhya Pradesh 63705
## 23 Odisha 52974
par(mfrow=c(1,2),mar=c(12,6,3,2),cex=0.8)
barplot(states10$`swtd$TotalSamples`,names.arg =states10$`swtd$State`,col="grey",main="States with highest covid tests",las=3 )
pstates<-aggregate(swtd$Positive~swtd$State,swtd,FUN = max)
pstates10<-pstates[order(-pstates$`swtd$Positive`),][1:10,]
#states with highest positives
pstates10
## swtd$State swtd$Positive
## 19 Maharashtra 17974
## 10 Gujarat 7402
## 28 Tamil Nadu 6009
## 8 Delhi 5980
## 26 Rajasthan 3579
## 18 Madhya Pradesh 3341
## 31 Uttar Pradesh 3214
## 2 Andhra Pradesh 1887
## 25 Punjab 1731
## 33 West Bengal 1678
barplot(pstates10$`swtd$Positive`,names.arg =pstates10$`swtd$State`,col="red",main="States with highest covid +ve cases",las=3 )
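Comparing the two rankings suggests a test positivity rate (positive cases divided by samples tested) per state. A rough sketch using the values printed in `states10` and `pstates10` for the five states that appear in both tables; since each maximum is taken separately, the two numbers may come from different dates, so the ratios are approximate. For all states, the two aggregated tables could first be joined on the state column with `merge()`.

```r
# Values copied from states10 (tests) and pstates10 (positives) above
tests <- c("Tamil Nadu" = 216416, "Maharashtra" = 200477, "Gujarat" = 105386,
           "Delhi" = 77234, "Rajasthan" = 152245)
positives <- c("Maharashtra" = 17974, "Gujarat" = 7402, "Tamil Nadu" = 6009,
               "Delhi" = 5980, "Rajasthan" = 3579)

# Align by state name before dividing, since the two tables are ordered differently
common <- intersect(names(tests), names(positives))
positivity <- positives[common] / tests[common]

# Highest to lowest positivity, shown as percentages
round(100 * sort(positivity, decreasing = TRUE), 2)
```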
WHO stated that people in the 21-60 age group are the most affected. To check this, I gathered data from various websites such as Wikipedia and extracted the relevant information. Here is the result.
#reading data from the dataset
age<-read.csv("AgeGroupDetails.csv")
#plotting the pie chart
slices<-age$percent
lbls<-age$AgeGroup
par(mfrow=c(1,1),mar=c(4,3,2,2))
pie(slices,labels = lbls,main="Age Groups of Covid Patients")
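The pie chart above labels each slice with its age group only. A small sketch of a hypothetical helper, `pie_labels`, that appends the percentage to each label; it assumes the percent column holds either numbers or strings such as `"12.5%"`, which is an assumption about the CSV format. The values below are made up for illustration and are not the real age distribution.

```r
# Hypothetical helper: build "group (x.x%)" labels for pie()
pie_labels <- function(groups, pct) {
  # Strip a trailing "%" if present (assumed format), then convert to numbers
  pct <- as.numeric(sub("%", "", as.character(pct), fixed = TRUE))
  paste0(groups, " (", sprintf("%.1f", pct), "%)")
}

# Made-up values for illustration only
groups <- c("group A", "group B", "group C")
pct    <- c("25%", "50%", "25%")
labels <- pie_labels(groups, pct)
print(labels)

# With numeric slices from the real data, this could be used as:
# pie(slices, labels = pie_labels(lbls, slices), main = "Age Groups of Covid Patients")
```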
This data is large and complex, and it keeps growing every day, even every minute. In this critical situation, please be aware and stay safe.