library(dplyr) # provides %>%, mutate() and filter() used below
webpage_url <- "https://www.worldometers.info/coronavirus/"
webpage <- xml2::read_html(webpage_url)
# take the first table on the page
ExOffndrsRaw <- rvest::html_table(webpage)[[1]] %>%
  tibble::as_tibble(.name_repair = "unique") # repair the repeated columns
CV <- ExOffndrsRaw %>% dplyr::glimpse(45)
## Observations: 106
## Variables: 9
## $ `Country,Other` <chr> "China", "S. ...
## $ TotalCases <chr> "80,703", "7,...
## $ NewCases <chr> "+52", "+272"...
## $ TotalDeaths <chr> "3,098", "50"...
## $ NewDeaths <int> 28, 2, 49, NA...
## $ TotalRecovered <chr> "57,333", "13...
## $ ActiveCases <chr> "20,272", "7,...
## $ `Serious,Critical` <chr> "5,264", "36"...
## $ `Tot Cases/1M pop` <dbl> 56.1, 142.6, ...
str(CV)
## Classes 'tbl_df', 'tbl' and 'data.frame': 106 obs. of 9 variables:
## $ Country,Other : chr "China" "S. Korea" "Iran" "Italy" ...
## $ TotalCases : chr "80,703" "7,313" "6,566" "5,883" ...
## $ NewCases : chr "+52" "+272" "+743" "" ...
## $ TotalDeaths : chr "3,098" "50" "194" "233" ...
## $ NewDeaths : int 28 2 49 NA NA NA NA 7 1 NA ...
## $ TotalRecovered : chr "57,333" "130" "2,134" "589" ...
## $ ActiveCases : chr "20,272" "7,133" "4,238" "5,061" ...
## $ Serious,Critical: chr "5,264" "36" "" "567" ...
## $ Tot Cases/1M pop: num 56.1 142.6 78.2 97.3 12.2 ...
View(CV)
Load ggplot2 for plotting.
library(ggplot2)
CV %>% ggplot(aes(`Country,Other`, TotalCases)) + geom_bar(stat = "identity")
You can see that the x-axis is overcrowded, so it is hard to tell what is going on, and the plot needs some changes. If you look closely at the y-axis, the values also look off: TotalCases is stored as character because some values contain commas (for example "80,703"), so ggplot treats it as a discrete variable. This has to be fixed before the column can be used on a continuous y-axis.
To fix that, I used the mutate function from the dplyr package with gsub to remove the commas, and created one extra column named "Total_cases".
CV <- mutate(CV, Total_cases = as.numeric(gsub(",", "", TotalCases))) # strip commas, then convert to numeric
str(CV)
## Classes 'tbl_df', 'tbl' and 'data.frame': 106 obs. of 10 variables:
## $ Country,Other : chr "China" "S. Korea" "Iran" "Italy" ...
## $ TotalCases : chr "80,703" "7,313" "6,566" "5,883" ...
## $ NewCases : chr "+52" "+272" "+743" "" ...
## $ TotalDeaths : chr "3,098" "50" "194" "233" ...
## $ NewDeaths : int 28 2 49 NA NA NA NA 7 1 NA ...
## $ TotalRecovered : chr "57,333" "130" "2,134" "589" ...
## $ ActiveCases : chr "20,272" "7,133" "4,238" "5,061" ...
## $ Serious,Critical: chr "5,264" "36" "" "567" ...
## $ Tot Cases/1M pop: num 56.1 142.6 78.2 97.3 12.2 ...
## $ Total_cases : num 80703 7313 6566 5883 1018 ...
View(CV)
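The same cleanup would apply to the other character columns, such as NewCases, which also carries a leading "+" and empty strings. Here is a minimal sketch using only base R. The helper name to_number is my own assumption for illustration and is not part of the code above.

```r
# to_number() is a hypothetical helper, named here only for illustration
to_number <- function(x) {
  # drop thousands separators and plus signs, then convert to numeric
  as.numeric(gsub("[,+]", "", x))
}

# sample values taken from the str() output above; the empty string becomes NA
to_number(c("80,703", "+272", ""))
```

It could then be used inside mutate, for example mutate(CV, New_cases = to_number(NewCases)), where New_cases is again just an illustrative column name.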
Here I modified the x-axis for better readability by rotating the axis text by 45 degrees.
CV %>% ggplot(aes(`Country,Other`, Total_cases)) +
geom_bar(stat = "identity") + theme(axis.text.x = element_text(angle = 45, hjust = 1))
However, the x-axis is still quite tough to read, so we can remove the countries that have reported only a few cases, say fewer than 15. I will use the filter function from dplyr to do that.
CV %>% filter(Total_cases >= 15) %>% ggplot(aes(`Country,Other`, Total_cases)) +
  geom_bar(stat = "identity") + theme(axis.text.x = element_text(angle = 45, hjust = 1))
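To see what the threshold does, here is a small sketch on a toy table. The country names and counts are made up for illustration and are not from the scraped data. It also shows one optional way to make the bars easier to compare, by ordering the x-axis with reorder() instead of alphabetically.

```r
library(dplyr)
library(ggplot2)

# toy data standing in for CV (made-up values)
toy <- tibble::tibble(
  country = c("A", "B", "C", "D"),
  Total_cases = c(5, 15, 120, 8000)
)

# keep countries with at least 15 cases; "B" sits exactly on the threshold
kept <- toy %>% filter(Total_cases >= 15)
kept$country

# order the bars from largest to smallest count
p <- ggplot(kept, aes(reorder(country, -Total_cases), Total_cases)) +
  geom_bar(stat = "identity")
```

With >= the country with exactly 15 cases stays in the plot; with > it would be dropped.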
Let's transform the y-axis for better comparison. I scaled it with log10; you can pick a different transformation if it suits your data better.
CV %>% filter(Total_cases >= 15) %>% ggplot(aes(`Country,Other`, Total_cases)) +
  geom_bar(stat = "identity") + theme(axis.text.x = element_text(angle = 45, hjust = 1)) + scale_y_continuous(trans = "log10")
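To get a feel for what the log10 transformation does, here is a short sketch on made-up numbers (not from the scraped table). Each tenfold increase takes up the same distance on the axis, which is why small and large counts become comparable. The sketch uses scale_y_log10(), the ggplot2 shorthand for scale_y_continuous(trans = "log10").

```r
library(ggplot2)

# toy data (made-up values), each count 100 times the previous one
toy <- data.frame(
  country = c("A", "B", "C"),
  Total_cases = c(10, 1000, 100000)
)

# positions of the three values on a log10 axis: equally spaced
log10(toy$Total_cases)

p <- ggplot(toy, aes(country, Total_cases)) +
  geom_bar(stat = "identity") +
  scale_y_log10()
```

Keep in mind that on a log axis the bar height is no longer proportional to the count, so read the values from the axis labels rather than from the bar lengths.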