Covid-19 cases around the world continue to rise, so we analyze the Covid-19 cases in Indonesia using the worldwide Covid-19 dataset. Data source: https://covid.ourworldindata.org/data/owid-covid-data.csv
library(tidyverse)
## -- Attaching packages --------------------------------------- tidyverse 1.3.1 --
## v ggplot2 3.3.5 v purrr 0.3.4
## v tibble 3.1.4 v dplyr 1.0.7
## v tidyr 1.1.4 v stringr 1.4.0
## v readr 2.0.2 v forcats 0.5.1
## -- Conflicts ------------------------------------------ tidyverse_conflicts() --
## x dplyr::filter() masks stats::filter()
## x dplyr::lag() masks stats::lag()
library(scales)
##
## Attaching package: 'scales'
## The following object is masked from 'package:purrr':
##
## discard
## The following object is masked from 'package:readr':
##
## col_factor
dat <- read_csv("https://covid.ourworldindata.org/data/owid-covid-data.csv")
## Rows: 148106 Columns: 67
## -- Column specification --------------------------------------------------------
## Delimiter: ","
## chr (4): iso_code, continent, location, tests_units
## dbl (62): total_cases, new_cases, new_cases_smoothed, total_deaths, new_dea...
## date (1): date
##
## i Use `spec()` to retrieve the full column specification for this data.
## i Specify the column types or set `show_col_types = FALSE` to quiet this message.
# read_csv() already parsed this column as Date (ISO year-month-day), so no custom format is needed
dat$date <- as.Date(dat$date)
Subset the Covid-19 data for Indonesia from March 2, 2020 (2020-03-02) to December 17, 2021 (2021-12-17) and show the first 6 observations.
# Keep Indonesian rows inside the inclusive date window (ISO dates, conditions joined with &)
subset.dat <- subset(dat, date >= as.Date("2020-03-02") & date <= as.Date("2021-12-17") &
location == "Indonesia")
head(subset.dat,6)
## # A tibble: 6 x 67
## iso_code continent location date total_cases new_cases new_cases_smoot~
## <chr> <chr> <chr> <date> <dbl> <dbl> <dbl>
## 1 IDN Asia Indonesia 2020-03-02 2 2 NA
## 2 IDN Asia Indonesia 2020-03-03 2 0 NA
## 3 IDN Asia Indonesia 2020-03-04 2 0 NA
## 4 IDN Asia Indonesia 2020-03-05 2 0 NA
## 5 IDN Asia Indonesia 2020-03-06 4 2 NA
## 6 IDN Asia Indonesia 2020-03-07 4 0 0.571
## # ... with 60 more variables: total_deaths <dbl>, new_deaths <dbl>,
## # new_deaths_smoothed <dbl>, total_cases_per_million <dbl>,
## # new_cases_per_million <dbl>, new_cases_smoothed_per_million <dbl>,
## # total_deaths_per_million <dbl>, new_deaths_per_million <dbl>,
## # new_deaths_smoothed_per_million <dbl>, reproduction_rate <dbl>,
## # icu_patients <dbl>, icu_patients_per_million <dbl>, hosp_patients <dbl>,
## # hosp_patients_per_million <dbl>, weekly_icu_admissions <dbl>, ...
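Filtering on a date window is easy to get subtly wrong, so the following is a minimal sketch on toy data (the `toy` data frame and its values are hypothetical, not taken from the OWID file). It assumes the intent is an inclusive window for a single country, and it shows that a day-month-year string needs an explicit `format` before it can be compared with a `Date` column.

```r
# Toy data standing in for the OWID table (illustrative values only).
toy <- data.frame(
  location = c("Indonesia", "Indonesia", "Indonesia", "Malaysia"),
  date     = as.Date(c("2020-03-01", "2020-03-02", "2021-12-18", "2020-03-02"))
)

start <- as.Date("2020-03-02")
end   <- as.Date("2021-12-17")

# A row is kept only if all three conditions hold; both bounds are inclusive.
keep <- toy$location == "Indonesia" & toy$date >= start & toy$date <= end
toy_subset <- toy[keep, ]
print(toy_subset)

# A day-month-year string must be parsed with a matching format.
parsed <- as.Date("02-03-2020", format = "%d-%m-%Y")
print(parsed)
```

Joining the bounds with `&` rather than `|` matters here: with `|`, any date satisfies at least one of the two bounds, so the window would not restrict anything.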
Using the Indonesia data, make a bar plot of date versus new_cases overlaid with a bar plot of date versus new_deaths, and interpret the result.
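# Drop rows with missing new_deaths, then convert the tibble to a plain data frame
data2 <- subset.dat[!is.na(subset.dat$new_deaths), ]
data2 <- as.data.frame(data2)
library(ggplot2) # already attached by tidyverse; harmless to load again
# Note: this draws new_deaths against new_cases, with bars shaded by date
ggplot(data2, aes(x = new_cases,
y = new_deaths,
fill = date)) + geom_bar(stat = "identity")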
Interpretation: The dataset contains so many observations that the individual bars in the bar plot above are hard to distinguish. Even so, the plot suggests that new deaths increase as new cases increase.
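One possible way to overlay the two series on a shared date axis is sketched below. This is only an illustration on toy data (the `toy` data frame and its values are hypothetical), and it assumes that two semi-transparent `geom_col()` layers are an acceptable reading of "overlapped bar plots".

```r
library(ggplot2)

# Toy data (illustrative values only, not taken from the OWID file).
toy <- data.frame(
  date       = seq(as.Date("2021-07-01"), by = "day", length.out = 10),
  new_cases  = c(120, 150, 180, 210, 260, 300, 280, 240, 200, 170),
  new_deaths = c(10, 12, 15, 18, 22, 27, 25, 21, 17, 14)
)

# Two bar layers share the date axis; transparency keeps both series visible.
p <- ggplot(toy, aes(x = date)) +
  geom_col(aes(y = new_cases, fill = "new_cases"), alpha = 0.5) +
  geom_col(aes(y = new_deaths, fill = "new_deaths"), alpha = 0.8) +
  labs(x = "Date", y = "Count", fill = "Series")
print(p)
```

Because deaths are usually far fewer than cases, the death bars can be nearly invisible on a common scale; a log-scaled y axis or separate panels may be worth considering in that situation.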
From the dataset used in point a), number the dates so that the first date is 1 and the last date is n (the last observation), and store this as the independent variable x. Take total_cases as the dependent variable y. Show the first 6 observations, make a plot of x versus y, and give an interpretation.
# Day index 1..n, where n is the number of observations in the subset
x <- seq_len(nrow(subset.dat))
y <- subset.dat$total_cases
df2 <- data.frame(x, y)
head(df2,6)
## x y
## 1 1 2
## 2 2 2
## 3 3 2
## 4 4 2
## 5 5 4
## 6 6 4
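The day index can also be derived from the dates themselves instead of from the row count, which stays correct if a date happens to be missing from the data. The sketch below is an assumption about how one might do this, using six consecutive dates that match the first rows shown earlier; `dates` and `day_index` are illustrative names.

```r
# Six consecutive dates starting at the first observation date.
dates <- seq(as.Date("2020-03-02"), by = "day", length.out = 6)

# Days elapsed since the first date, shifted so the first date maps to 1.
day_index <- as.integer(dates - min(dates)) + 1L
print(day_index)
```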
plot(x, y, col = "green", main = "Day Index vs Total Cases",
xlab = "Day index (1 = first date)", ylab = "Total Cases")
From the plot above, x (the day index, running from 1 to n) and y (the total cases) have a positive linear relationship, shown by the green points sloping toward the upper right.
We conclude that the total number of Covid-19 cases increases as the number of days increases.
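A visual impression of a positive linear relationship can be checked numerically. The sketch below is a hedged illustration that uses only the six (x, y) pairs displayed in the output above, not the full series, so its numbers say nothing about the complete data. It assumes the Pearson correlation and a simple least-squares line are suitable summaries.

```r
# The six observations shown in the head(df2, 6) output.
x <- 1:6
y <- c(2, 2, 2, 2, 4, 4)

# Pearson correlation: a positive value indicates y tends to rise with x.
r <- cor(x, y)

# Least-squares line y = a + b * x; the slope b is the average change per day.
fit <- lm(y ~ x)
slope <- coef(fit)[["x"]]

print(r)
print(slope)
```

Applied to the full data, the same calls would be `cor(df2$x, df2$y)` and `lm(y ~ x, data = df2)`. A plot of the residuals would then show whether a straight line is an adequate description of the cumulative curve.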