COVID-19

COVID-19 is the infectious disease caused by the most recently discovered coronavirus. This new virus and disease were unknown before the outbreak began in Wuhan, China, in December 2019.

What is a coronavirus?

Coronaviruses are a large family of viruses that may cause illness in animals or humans. In humans, several coronaviruses are known to cause respiratory infections ranging from the common cold to more severe diseases such as Middle East Respiratory Syndrome (MERS) and Severe Acute Respiratory Syndrome (SARS). The most recently discovered coronavirus causes coronavirus disease COVID-19.

What are the symptoms of COVID-19?

–fever
–tiredness
–dry cough
–aches and pains
–nasal congestion
–runny nose
–sore throat or diarrhea

What does this document include?

This is a data analysis project in R. I wanted to understand the underlying patterns among COVID patients and the gradual increase in the count of COVID-positive patients in the country and across the world. So I used different datasets about COVID patients, updated on 7th May 2020. For reference and for the datasets, you can visit covid 19 in india. For official information, you can visit the Ministry of Health and Family Welfare website. Please note that this analysis is purely for personal understanding.

1. World Present Situation

According to the datasets provided by Johns Hopkins University on GitHub, the US ranks first, with the highest number of positive cases and the highest number of deaths. Let us see the top countries with the highest COVID positives, recoveries and deaths.

# Reading data from the dataset
conag<-read.csv("countries-aggregated_csv.csv")
#taking each country's maximum count to find the top 10 countries
confirmed<-aggregate(conag$Confirmed~conag$Country,conag,FUN = max)
confirmed10<-confirmed[order(-confirmed$`conag$Confirmed`),][1:10,]
recovered<-aggregate(conag$Recovered~conag$Country,conag,FUN = max)
recovered10<-recovered[order(-recovered$`conag$Recovered`),][1:10,]
dead<-aggregate(conag$Deaths~conag$Country,conag,FUN= max)
dead10<-dead[order(-dead$`conag$Deaths`),][1:10,]
#top 10 countries with highest positive cases
confirmed10
##      conag$Country conag$Confirmed
## 179             US         1367638
## 158          Spain          264663
## 140         Russia          221334
## 177 United Kingdom          219183
## 86           Italy          219070
## 63          France          176970
## 67         Germany          169430
## 24          Brazil          162699
## 173         Turkey          138657
## 82            Iran          103135
#top 10 countries with highest recovered cases
recovered10
##     conag$Country conag$Recovered
## 179            US          256336
## 158         Spain          176439
## 67        Germany          145600
## 86          Italy          105186
## 173        Turkey           92691
## 82           Iran           86143
## 37          China           78977
## 24         Brazil           64957
## 63         France           56217
## 140        Russia           39801
#top 10 countries with highest death cases
dead10
##      conag$Country conag$Deaths
## 179             US        80787
## 177 United Kingdom        31885
## 86           Italy        29958
## 158          Spain        26621
## 63          France        26380
## 24          Brazil        11123
## 17         Belgium         8415
## 67         Germany         7569
## 82            Iran         6640
## 122    Netherlands         5306
#plotting these on one device
par(mfrow=c(1,3),mar=c(12,5,3,2),cex=0.8)
barplot(confirmed10$`conag$Confirmed`,names.arg = confirmed10$`conag$Country`,
        main="Highest covid confirmed cases",las=3,col="red")
barplot(recovered10$`conag$Recovered`,names.arg=recovered10$`conag$Country`,
        main="Highest covid recovered cases",las=3,col="green")
barplot(dead10$`conag$Deaths`,names.arg = dead10$`conag$Country`,
        main="Highest covid death cases",las=3,col="grey")
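
The top-10 tables above come from taking each country's maximum cumulative count and sorting in descending order. As a minimal sketch of that same step, the helper below wraps the aggregate/order pattern and returns plain column names instead of `conag$Country`. The function name `top_n_by_max` is hypothetical and not part of the original analysis, and the toy data frame uses made-up numbers purely for illustration.

```r
# Hypothetical helper: top n groups ranked by the maximum of one column.
top_n_by_max <- function(df, value_col, group_col = "Country", n = 10) {
  # Maximum of value_col within each group (NA values are ignored)
  peak <- aggregate(df[[value_col]], by = list(df[[group_col]]),
                    FUN = max, na.rm = TRUE)
  names(peak) <- c(group_col, value_col)
  # Sort descending and keep the first n rows
  head(peak[order(-peak[[value_col]]), ], n)
}

# Made-up cumulative counts for three placeholder countries
toy <- data.frame(
  Date      = rep(c("2020-05-01", "2020-05-02"), times = 3),
  Country   = rep(c("A", "B", "C"), each = 2),
  Confirmed = c(10, 15, 100, 120, 50, 40)
)

top2 <- top_n_by_max(toy, "Confirmed", n = 2)
print(top2)
```

With the real data, the same call would look like `top_n_by_max(conag, "Confirmed")`, assuming the columns are named as in the output shown above.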

2. Comparison of gradual covid rate in India

As we already know, the count of COVID patients is increasing day by day.
As of now, there are 67,152 cases (updated on 11 May, 7:58 PM), of which 20,917 have recovered and 2,206 have died.

Let’s see the COVID effect in India.

#reading the data from dataset
timeseries<-read.csv("time-series-19-covid-combined_csv.csv")
#subsetting the dataset to the rows for India
india<-subset(timeseries,Country.Region=="India")
tail(india)
##             Date Country.Region Province.State Lat Long Confirmed Recovered
## 15510 2020-05-02          India                 21   78     39699     10819
## 15511 2020-05-03          India                 21   78     42505     11775
## 15512 2020-05-04          India                 21   78     46437     12847
## 15513 2020-05-05          India                 21   78     49400     14142
## 15514 2020-05-06          India                 21   78     52987     15331
## 15515 2020-05-07          India                 21   78     56351     16776
##       Deaths
## 15510   1323
## 15511   1391
## 15512   1566
## 15513   1693
## 15514   1785
## 15515   1889
#plotting the graph to compare confirmed,recovered and death count
library(ggplot2)
q1<-ggplot(india,aes(x=Date,y=Confirmed,color="Confirmed"))+geom_point(size=4)
 
q2<-q1+geom_point(aes(x=Date,y=Recovered,color="Recovered"),data=india,size=4)
q3<-q2+geom_point(aes(x=Date,y=Deaths,color="Deaths"),data=india,size=4)
q3+theme_light()+labs(x="Dates",y="Total Range",
                      title="Covid Effect in India",
                      subtitle = "Positive,Recovered,Death cases")+
    theme(
        legend.position = c(0.05,0.95),
        legend.justification = c("left", "top"),
        legend.box.just = "left",
        legend.margin = margin(6, 6, 6, 6)
    )
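
The plot shows cumulative totals, so the day-to-day increase has to be read off the slope. A small sketch of how the daily new cases could be computed directly is given below. It uses only the six Confirmed values printed by `tail(india)` above; the column name `NewCases` is an assumption introduced here. Converting `Date` with `as.Date()` is also worth doing before plotting, since `read.csv` may load it as text or a factor, and ggplot2 only draws a continuous date axis for columns of class `Date`.

```r
# Last six days of confirmed cases for India, copied from the tail() output
india_tail <- data.frame(
  Date      = as.Date(c("2020-05-02", "2020-05-03", "2020-05-04",
                        "2020-05-05", "2020-05-06", "2020-05-07")),
  Confirmed = c(39699, 42505, 46437, 49400, 52987, 56351)
)

# Daily increase = difference between consecutive cumulative counts.
# The first day has no previous value in this excerpt, so it is NA.
india_tail$NewCases <- c(NA, diff(india_tail$Confirmed))

print(india_tail)
```

On the full `india` data frame the same two steps would be `india$Date<-as.Date(india$Date)` followed by the `diff()` line.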

3. Comparison between states in covid tests and covid positives in India

In R, exploratory data analysis provides graphs for comparing quantities like these. First we read the dataset and look at its structure, then we aggregate it and plot.

#reading the dataset statewise 
swtd<-read.csv("StatewiseTestingDetails.csv")
#Looking at the data
str(swtd)
## 'data.frame':    862 obs. of  5 variables:
##  $ Date        : Factor w/ 42 levels "2020-02-04","2020-02-16",..: 19 26 29 33 4 12 13 14 15 16 ...
##  $ State       : Factor w/ 33 levels "Andaman and Nicobar Islands",..: 1 1 1 1 2 2 2 2 2 2 ...
##  $ TotalSamples: num  1403 2679 2848 3754 1800 ...
##  $ Negative    : num  1210 NA NA NA 1175 ...
##  $ Positive    : num  12 27 33 33 132 365 381 405 432 473 ...
states<-aggregate(swtd$TotalSamples~swtd$State,swtd,FUN = max)
states10<-states[order(-states$`swtd$TotalSamples`),][1:10,]
# states with highest covid tests
states10
##        swtd$State swtd$TotalSamples
## 28     Tamil Nadu            216416
## 19    Maharashtra            200477
## 2  Andhra Pradesh            156681
## 26      Rajasthan            152245
## 31  Uttar Pradesh            119688
## 10        Gujarat            105386
## 15      Karnataka             98081
## 8           Delhi             77234
## 18 Madhya Pradesh             63705
## 23         Odisha             52974
par(mfrow=c(1,2),mar=c(12,6,3,2),cex=0.8)
barplot(states10$`swtd$TotalSamples`,names.arg =states10$`swtd$State`,col="grey",main="States with highest covid tests",las=3 )
pstates<-aggregate(swtd$Positive~swtd$State,swtd,FUN = max)
pstates10<-pstates[order(-pstates$`swtd$Positive`),][1:10,]
#states with highest positives
pstates10
##        swtd$State swtd$Positive
## 19    Maharashtra         17974
## 10        Gujarat          7402
## 28     Tamil Nadu          6009
## 8           Delhi          5980
## 26      Rajasthan          3579
## 18 Madhya Pradesh          3341
## 31  Uttar Pradesh          3214
## 2  Andhra Pradesh          1887
## 25         Punjab          1731
## 33    West Bengal          1678
barplot(pstates10$`swtd$Positive`,names.arg =pstates10$`swtd$State`,col="red",main="States with highest covid +ve cases",las=3 )
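
The two bar charts rank states by tests and by positives separately. One way to compare them on a common scale is the share of samples that tested positive. The sketch below joins the two top-10 tables printed above with `merge()` and computes that percentage. It assumes both maxima refer to roughly the same date, which the tables do not guarantee, and the column name `PositivityPct` is introduced here for illustration.

```r
# Top-10 tables copied from the output above
tests <- data.frame(
  State = c("Tamil Nadu", "Maharashtra", "Andhra Pradesh", "Rajasthan",
            "Uttar Pradesh", "Gujarat", "Karnataka", "Delhi",
            "Madhya Pradesh", "Odisha"),
  TotalSamples = c(216416, 200477, 156681, 152245, 119688,
                   105386, 98081, 77234, 63705, 52974)
)
positives <- data.frame(
  State = c("Maharashtra", "Gujarat", "Tamil Nadu", "Delhi", "Rajasthan",
            "Madhya Pradesh", "Uttar Pradesh", "Andhra Pradesh",
            "Punjab", "West Bengal"),
  Positive = c(17974, 7402, 6009, 5980, 3579, 3341, 3214, 1887, 1731, 1678)
)

# Inner join: keeps only the states that appear in both tables
rates <- merge(tests, positives, by = "State")

# Percentage of samples that were positive, sorted from highest to lowest
rates$PositivityPct <- round(100 * rates$Positive / rates$TotalSamples, 2)
rates <- rates[order(-rates$PositivityPct), ]

print(rates)
```

Running the same join on the full `states` and `pstates` tables, rather than the top 10 of each, would cover every state.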

4. Age groups that are most affected by COVID

WHO stated that the age groups between 21 and 60 are the most affected. To check this, I gathered datasets from various websites such as wiki and extracted the needed information. Take a look.

#reading data from the dataset
age<-read.csv("AgeGroupDetails.csv")
#plotting the pie chart
slices<-age$percent
lbls<-age$AgeGroup
par(mfrow=c(1,1),mar=c(4,3,2,2))
pie(slices,labels = lbls,main="Age Groups of Covid Patients")
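
A pie chart is easier to read when each slice carries its percentage, and the 21-60 claim can be checked by summing the relevant rows. The sketch below shows both steps. The values in `age_toy` are placeholders and not real figures, and the group labels are assumed; the actual contents of AgeGroupDetails.csv are not shown in this document.

```r
# Placeholder data: NOT real figures, for illustration only
age_toy <- data.frame(
  AgeGroup = c("0-20", "21-40", "41-60", "61+"),
  percent  = c(10, 40, 35, 15)
)

# Attach the percentage to each slice label
lbls <- paste0(age_toy$AgeGroup, " (", age_toy$percent, "%)")

# Combined share of the groups that fall between 21 and 60
working_age <- sum(age_toy$percent[age_toy$AgeGroup %in% c("21-40", "41-60")])

pie(age_toy$percent, labels = lbls, main = "Age Groups (placeholder data)")
print(working_age)
```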

Data is always large and complex. It keeps growing day by day, even minute by minute. So, in this critical situation,
please be aware and stay safe.