Introduction

This project demonstrates some basic data visualizations using a specific data set. The data set combines two data sets on Covid-19 cases in Brazil and Peru, merged into a single table for ease of use. It contains the total cases, daily cases, active cases, total deaths and daily deaths for both countries from the 15th of February to the 7th of June, 2020.

library(readxl)
covid19stats_2020 <- read_excel("brazil_&_peru_covid19stats_2020.xlsx")
library(ggplot2)
library(gapminder)
library(gganimate)
## No renderer backend detected. gganimate will default to writing frames to separate files
## Consider installing:
## - the `gifski` package for gif output
## - the `av` package for video output
## and restarting the R session
library(plotly)
## 
## Attaching package: 'plotly'
## The following object is masked from 'package:ggplot2':
## 
##     last_plot
## The following object is masked from 'package:stats':
## 
##     filter
## The following object is masked from 'package:graphics':
## 
##     layout
print(covid19stats_2020)
## # A tibble: 114 × 11
##    date                Peru_total_cases brazil_total_cases Peru_daily_cases
##    <dttm>                         <dbl>              <dbl>            <dbl>
##  1 2020-02-15 00:00:00                0                  0                0
##  2 2020-02-16 00:00:00                0                  0                0
##  3 2020-02-17 00:00:00                0                  0                0
##  4 2020-02-18 00:00:00                0                  0                0
##  5 2020-02-19 00:00:00                0                  0                0
##  6 2020-02-20 00:00:00                0                  0                0
##  7 2020-02-21 00:00:00                0                  0                0
##  8 2020-02-22 00:00:00                0                  0                0
##  9 2020-02-23 00:00:00                0                  0                0
## 10 2020-02-24 00:00:00                0                  0                0
## # ℹ 104 more rows
## # ℹ 7 more variables: brazil_daily_cases <dbl>, Peru_active_cases <dbl>,
## #   brazil_active_cases <dbl>, Peru_total_deaths <dbl>,
## #   brazil_total_deaths <dbl>, Peru_daily_deaths <dbl>,
## #   brazil_daily_deaths <dbl>
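
The report does not show how the two country tables were merged, so the following is only a minimal sketch of one way it could be done in base R. The tables `peru` and `brazil` and their values are made-up placeholders, not the project's data. Deriving daily counts as day-to-day differences of the cumulative totals is likewise an assumption about how the daily columns relate to the total columns.

```r
# Placeholder per-country tables; the values are invented for illustration
# and are not taken from the project's data set.
peru <- data.frame(
  date = as.Date("2020-01-01") + 0:2,
  Peru_total_cases = c(1, 2, 3)
)
brazil <- data.frame(
  date = as.Date("2020-01-01") + 0:2,
  brazil_total_cases = c(10, 20, 30)
)

# Join the two tables on their shared date column.
merged <- merge(peru, brazil, by = "date")

# Assumed relationship: daily cases are the day-to-day change in the
# cumulative total, with the first day keeping its own total.
merged$Peru_daily_cases <- c(merged$Peru_total_cases[1],
                             diff(merged$Peru_total_cases))
merged$brazil_daily_cases <- c(merged$brazil_total_cases[1],
                               diff(merged$brazil_total_cases))

print(merged)
```

Joining on `date` keeps one row per day, which matches the one-row-per-date layout of the table printed above.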

Peru Daily Cases displayed through a bar chart

Now we look at some data visualizations based on the table above. We first plot a bar chart of Peru's daily cases over the period described above.

bar_chart_1 <- ggplot(data = covid19stats_2020, aes(x = date, y = Peru_daily_cases)) +
  geom_bar(stat = "identity", fill = "red") +
  labs(title = "Peru Daily Cases", x = "Months", y = "Total Daily Cases")

ggplotly(bar_chart_1)

From the bar chart above we can see that daily cases rose steadily from the middle of March to a peak of 8,805 cases on the 31st of May, and then fell in early June.
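
A peak like the one read off the chart can also be looked up directly in the table. The sketch below uses a hypothetical helper, `peak_day`, and a small made-up table named `toy`. Neither is part of the original project.

```r
# Hypothetical helper: return the date and value of the row where
# `column` is largest (the first such row if there are ties).
peak_day <- function(df, column) {
  df[which.max(df[[column]]), c("date", column)]
}

# Made-up stand-in for the real table; the values are illustrative only.
toy <- data.frame(
  date = as.Date("2020-01-01") + 0:3,
  Peru_daily_cases = c(5, 7, 9, 4)
)

peak <- peak_day(toy, "Peru_daily_cases")
print(peak)
```

With the project's data loaded, the same call would be `peak_day(covid19stats_2020, "Peru_daily_cases")`.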

Comparative total daily cases between Brazil & Peru

Next we look at a different kind of visualization: a comparative scatter plot of the daily cases in the two countries over the same period.

# Map colour inside aes() so that scale_color_manual() takes effect
# and a legend identifying each country is drawn.
scatter_plot_1 <- ggplot(data = covid19stats_2020, aes(x = date)) +
  geom_point(aes(y = brazil_daily_cases, color = "Brazil"), size = 3) +
  geom_point(aes(y = Peru_daily_cases, color = "Peru"), size = 3) +
  labs(title = "Comparative Scatterplot of daily cases of Brazil & Peru", x = "Months", y = "Total Daily Cases", color = "Country") +
  scale_color_manual(values = c("Brazil" = "blue", "Peru" = "red"))

ggplotly(scatter_plot_1)

The plot above is a comparative scatter plot of daily COVID-19 cases in Brazil and Peru. The red points correspond to Peru and the blue points to Brazil. The plot shows that Brazil recorded far more daily cases than Peru. This may be due in part to Brazil's larger population, which meant that more people were exposed to the virus.
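
A two-country comparison like this can also be drawn from a long-format table, with one row per date and country, so that ggplot2 assigns colours and builds the legend from a `country` column. The sketch below is one possible way to do this, not the approach used in the project. The table `wide` and its values are made up, and the names `long`, `country`, `daily_cases` and `scatter_long` are introduced only for illustration.

```r
library(ggplot2)

# Made-up stand-in for the wide table used in this report.
wide <- data.frame(
  date = as.Date("2020-01-01") + 0:2,
  brazil_daily_cases = c(10, 20, 30),
  Peru_daily_cases = c(1, 2, 3)
)

# Stack the two country columns into one long table with a country label.
long <- rbind(
  data.frame(date = wide$date, country = "Brazil",
             daily_cases = wide$brazil_daily_cases),
  data.frame(date = wide$date, country = "Peru",
             daily_cases = wide$Peru_daily_cases)
)

# Mapping colour to `country` inside aes() lets ggplot2 draw the legend.
scatter_long <- ggplot(long, aes(x = date, y = daily_cases, color = country)) +
  geom_point(size = 3) +
  scale_color_manual(values = c(Brazil = "blue", Peru = "red")) +
  labs(x = "Months", y = "Total Daily Cases", color = "Country")
```

The long layout extends to further countries by adding rows rather than extra `geom_point()` layers.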

Total Cases of Covid-19 outbreak in Brazil

Now we look at an interactive line graph of the total Covid-19 cases in Brazil.

# `linewidth` replaces the `size` aesthetic for lines, which was deprecated in ggplot2 3.4.0.
line_chart <- ggplot(covid19stats_2020, aes(x = date, y = brazil_total_cases, group = 1)) +
  geom_line(color = "red", linewidth = 1.5) +
  labs(title = "Total Cases of Covid-19 outbreak in Brazil",
       x = "Months",
       y = "Total Cases") +
  theme(
    plot.title = element_text(size = 18, face = "bold", color = "blue"),
    panel.background = element_rect(fill = "lightgray"),
    panel.grid.major = element_line(color = "gray", linetype = "dashed"),
    panel.grid.minor = element_line(color = "gray", linetype = "dotted")
  )
ggplotly(line_chart)

From the interactive line graph above, we can see that total Covid-19 cases in Brazil increased sharply over the period. They climbed steeply from the first case on the 25th of February to 691,962 cases by the 7th of June, 2020.
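
The date of the first recorded case and the final cumulative total can be checked against the table rather than read off the graph. The sketch below uses a hypothetical helper, `first_case_date`, and a made-up table named `toy`. Neither comes from the original project.

```r
# Hypothetical helper: date of the first row where `column` is above zero.
# Returns NA if the column never rises above zero.
first_case_date <- function(df, column) {
  df$date[which(df[[column]] > 0)[1]]
}

# Made-up stand-in for the real table; the values are illustrative only.
toy <- data.frame(
  date = as.Date("2020-01-01") + 0:3,
  brazil_total_cases = c(0, 0, 2, 5)
)

first_day <- first_case_date(toy, "brazil_total_cases")

# The last row of a cumulative column holds the total at the end of the period.
latest_total <- tail(toy$brazil_total_cases, 1)

print(first_day)
print(latest_total)
```

With the project's data loaded, the same call would be `first_case_date(covid19stats_2020, "brazil_total_cases")`.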

Conclusion

The purpose of this project was to illustrate basic data visualizations using R, focusing on the COVID-19 outbreak data-set for Brazil and Peru from February 15th to June 7th, 2020.