Convert the date, location and continent columns to the Date, factor and factor types, respectively.
library(dplyr) # provides glimpse()
covid <- read.csv("owid-covid-data.csv", stringsAsFactors = FALSE)
covid$date <- as.Date(covid$date) # convert date from character to Date
covid$location <- as.factor(covid$location) # convert location to factor
covid$continent <- as.factor(covid$continent) # convert continent to factor
glimpse(covid) # take a quick look at the column types
## Rows: 33,823
## Columns: 34
## $ iso_code <chr> "ABW", "ABW", "ABW", "ABW", "ABW", "A…
## $ continent <fct> North America, North America, North A…
## $ location <fct> Aruba, Aruba, Aruba, Aruba, Aruba, Ar…
## $ date <date> 2020-03-13, 2020-03-20, 2020-03-24, …
## $ total_cases <dbl> 2, 4, 12, 17, 19, 28, 28, 28, 50, 55,…
## $ new_cases <dbl> 2, 2, 8, 5, 2, 9, 0, 0, 22, 5, 0, 5, …
## $ total_deaths <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0…
## $ new_deaths <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0…
## $ total_cases_per_million <dbl> 18.733, 37.465, 112.395, 159.227, 177…
## $ new_cases_per_million <dbl> 18.733, 18.733, 74.930, 46.831, 18.73…
## $ total_deaths_per_million <dbl> 0.000, 0.000, 0.000, 0.000, 0.000, 0.…
## $ new_deaths_per_million <dbl> 0.000, 0.000, 0.000, 0.000, 0.000, 0.…
## $ new_tests <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
## $ total_tests <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
## $ total_tests_per_thousand <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
## $ new_tests_per_thousand <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
## $ new_tests_smoothed <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
## $ new_tests_smoothed_per_thousand <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
## $ tests_units <chr> "", "", "", "", "", "", "", "", "", "…
## $ stringency_index <dbl> 0.00, 33.33, 44.44, 44.44, 44.44, 44.…
## $ population <dbl> 106766, 106766, 106766, 106766, 10676…
## $ population_density <dbl> 584.8, 584.8, 584.8, 584.8, 584.8, 58…
## $ median_age <dbl> 41.2, 41.2, 41.2, 41.2, 41.2, 41.2, 4…
## $ aged_65_older <dbl> 13.085, 13.085, 13.085, 13.085, 13.08…
## $ aged_70_older <dbl> 7.452, 7.452, 7.452, 7.452, 7.452, 7.…
## $ gdp_per_capita <dbl> 35973.78, 35973.78, 35973.78, 35973.7…
## $ extreme_poverty <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
## $ cardiovasc_death_rate <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
## $ diabetes_prevalence <dbl> 11.62, 11.62, 11.62, 11.62, 11.62, 11…
## $ female_smokers <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
## $ male_smokers <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
## $ handwashing_facilities <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
## $ hospital_beds_per_thousand <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
## $ life_expectancy <dbl> 76.29, 76.29, 76.29, 76.29, 76.29, 76…
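The conversions can also be confirmed without reading the full glimpse() output by checking the column classes directly. This is a minimal sketch on a hypothetical three-row data frame; the `toy` object and its values are illustrative and are not taken from the dataset.

```r
# Hypothetical three-row stand-in for owid-covid-data.csv
toy <- data.frame(
  date = c("2020-03-13", "2020-03-20", "2020-03-24"),
  location = c("Aruba", "Aruba", "Indonesia"),
  continent = c("North America", "North America", "Asia"),
  stringsAsFactors = FALSE
)

# Same conversions as applied to covid above
toy$date <- as.Date(toy$date)
toy$location <- as.factor(toy$location)
toy$continent <- as.factor(toy$continent)

# One class name per column: Date, factor, factor
col_types <- sapply(toy, class)
col_types
```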
Period covered by the data (earliest and latest date):
## [1] "2019-12-31" "2020-07-31"
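The call that produced this output is not shown. A minimal sketch, assuming it was `range()` applied to the Date column, run here on a small hypothetical stand-in for `covid`:

```r
# Hypothetical stand-in for the covid data frame (three rows only)
toy <- data.frame(date = as.Date(c("2020-03-13", "2019-12-31", "2020-07-31")))

# range() returns the earliest and latest value of a Date vector
period <- range(toy$date)
as.character(period)
```

On the real data the equivalent call would presumably be `range(covid$date)`.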
Ranking the locations by total_deaths shows that the United States has the highest number of total deaths of any country.
latest <- covid[covid$date == as.Date("2020-07-31"), ] # keep only the latest date
death <- aggregate(total_deaths ~ location, latest, sum)
death <- death[order(death$total_deaths, decreasing = TRUE), ] # sort by total_deaths, highest first
death <- death[-1, ] # drop the first row, which is the World aggregate
death
Counting the unique values of location shows that the data contains 212 locations (countries, along with aggregate entries such as World).
## [1] 212
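The call behind this count is not shown. A minimal sketch, assuming it counts the distinct values of the location column; the `toy` data frame below is hypothetical:

```r
# Hypothetical stand-in: four rows, three distinct locations
toy <- data.frame(location = factor(c("Aruba", "Aruba", "Indonesia", "World")))

# unique() drops repeated values; length() counts what is left
n_locations <- length(unique(toy$location))
n_locations
```

Because location is a factor, `nlevels(toy$location)` gives the same count here, as long as no unused levels remain after subsetting.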
The exploration shows that North America, Europe and South America are the three continents with the highest total deaths.
continents <- aggregate(total_deaths ~ continent, latest, sum) # aggregate total_deaths by continent
continents <- continents[-1, ] # drop the first row, which has an empty continent name
continents <- continents[order(continents$total_deaths, decreasing = TRUE), ] # sort by total_deaths, highest first
continents
We want to know the highest number of daily new cases for each continent and location.
newcases <- covid[covid$location != "World", ]
newcases <- aggregate(x = list(new_cases = newcases$new_cases),
                      by = list(continent = newcases$continent,
                                location = newcases$location),
                      FUN = max)
newcases <- newcases[order(newcases$new_cases, decreasing = TRUE), ]
newcases
Next we look more closely at the pandemic in Indonesia. First we check new_cases, total_deaths and new_deaths for the period from 2019-12-31 to 2020-07-31.
indo <- covid[covid$location == "Indonesia", ]
new_cases <- formatC(sum(indo$new_cases), big.mark = ",", format = "d")
# total_deaths is a cumulative column, so take its maximum (latest) value instead of summing it
total_deaths <- formatC(max(indo$total_deaths), big.mark = ",", format = "d")
new_deaths <- formatC(sum(indo$new_deaths), big.mark = ",", format = "d")
# named indo_summary to avoid masking base R's summary()
indo_summary <- data.frame(INDONESIA = c("new_cases", "total_deaths", "new_deaths"),
                           Amount = c(new_cases, total_deaths, new_deaths))
indo_summary
We want to know whether the Indonesian government has met the WHO guideline on the minimum number of tests relative to population. First, we check the population.
## [1] "273,523,621"
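The code for this output is not shown. A minimal sketch, assuming the population was read from the population column and formatted with `formatC()` as in the summary above; `toy_indo` is a hypothetical stand-in for `indo`:

```r
# Hypothetical stand-in for indo: population is repeated on every row
toy_indo <- data.frame(population = c(273523621, 273523621, 273523621))

# max() collapses the repeated value; big.mark adds thousands separators
population <- formatC(max(toy_indo$population), big.mark = ",", format = "d")
population
```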
Indonesia has a population of 273,523,621. Following the WHO guideline of at least 1 test per 1,000 people, at least 273,524 tests are needed (273,523,621 / 1,000 = 273,523.621, rounded up). Let's check the actual numbers.
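The threshold follows from a single division. A short worked sketch, with the variable names chosen here for illustration:

```r
population <- 273523621          # population reported above
min_tests <- population / 1000   # 1 test per 1,000 people

# Round up, since a fraction of a test is not possible
min_tests_whole <- ceiling(min_tests)
min_tests_whole
```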
newtest <- aggregate(new_tests ~ population, indo, sum)
newtest$diff. <- sum(indo$new_tests, na.rm = TRUE) - max(indo$population / 1000) # tests performed minus the minimum required
newtest
The diff. column shows that Indonesia has exceeded the minimum number of tests for its population.
We want to know the median number of new_cases per day, as a simple statistical summary.
## [1] 297
The median number of new cases reported in Indonesia per day is 297.
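The call behind this figure is not shown. A minimal sketch, assuming it was `median()` on the new_cases column; the vector below is hypothetical and chosen only so that its middle value matches the reported result:

```r
# Hypothetical daily new_cases values (not taken from the dataset)
toy_new_cases <- c(0, 2, 297, 1043, 2040)

# median() returns the middle value of the sorted vector
median_cases <- median(toy_new_cases)
median_cases
```

On the real data the equivalent call would presumably be `median(indo$new_cases)`. The median is less sensitive than the mean to the many early days with zero cases and to single-day spikes.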
The period from 2019-12-31 to 2020-07-31 spans 213 days. Dividing the total number of tests by this number of days gives the average number of tests the Indonesian government carried out per day.
## [1] 3204.493
On average only about 3,204 tests were carried out per day. Testing therefore needs to be roughly ten times higher to reach the 30,000 daily tests that Indonesia's President, Mr. Joko Widodo, had targeted in order to keep track of the spread of infection.
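The day count and the gap to the target can be reproduced from the figures quoted above. This is a minimal sketch; the daily average is hard-coded from the output shown, and the formula in the comment is an assumption about how it was obtained.

```r
start <- as.Date("2019-12-31")
end <- as.Date("2020-07-31")

# Subtracting two Dates gives the difference in days
n_days <- as.numeric(end - start)

# Assumed formula on the real data: sum(indo$new_tests, na.rm = TRUE) / n_days
daily_tests <- 3204.493   # value reported above
target <- 30000           # daily target quoted above

# How many times higher the target is than the observed daily average
ratio <- target / daily_tests
round(ratio, 1)           # about 9.4
```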
The government has issued the following rules to reduce or stop the spread of infection:
Wear a proper face mask whenever you are outside.
Keep your hands clean by washing them with running water and soap, especially before touching your face or eating.
Practice physical distancing by avoiding crowded public areas.
Change and wash your clothes as soon as you return home, then take a shower.
Check your body temperature; if you have a fever, stay at home and self-isolate for 2 weeks.