Mapping Deaths by COVID-19 in Brazil

We will download a COVID-19 dataset from the Brasil.io website that contains the number of deaths per city per day. The dataset used here was downloaded on 11-17-2020.

1. Preparing the Environment

As always, we start by setting the working directory and loading the packages we are going to use. We use the ggmap package to get the coordinates of the cities so that we can plot them on a map, and the leaflet package to draw the map itself.

#Setting directory
#getwd()
#setwd("your_directory")

#Loading packages
#Format the dataset
library(dplyr)
library(tidyverse)
library(reshape2)
library(data.table)
#Get coordinates
library(ggmap)
#Plot
library(ggplot2)
library(leaflet)

2. Reading the Data

Once the data is downloaded, we read it. Here I use the fread function from data.table because it lets us read only the columns of interest. Then we remove rows containing NA values and convert the date column to Date objects with the as.Date() function. The dataset has no column with region information. Since some of our plots need it, we create one from the state column.

#Reading the dataset
coviddata <- fread("caso_full.csv", sep = ",", header = TRUE,
                   na.strings = "NA", select = c("city", 
                                                 "city_ibge_code",
                                                 "date",
                                                 "epidemiological_week",
                                                 "place_type", "state",
                                                 "new_deaths"))
#Removing rows containing NA
coviddata <- drop_na(coviddata)

#Formatting dates
coviddata$date <- as.Date(coviddata$date)
#str(coviddata)

#Defining state regions and creating region column
norte <- c("AM", "RR", "AP", "PA", "TO", "RO", "AC")
nordeste <- c("MA", "PI", "CE", "RN", "PE", "PB", "SE", "AL", "BA")
centro_oeste <- c("MT", "MS", "GO", "DF")
sudeste <- c("SP", "RJ", "ES", "MG")
sul <- c("PR", "SC", "RS")

coviddata <- coviddata %>%
             mutate(region = case_when(state %in% norte        ~ "North",
                                       state %in% nordeste     ~ "Northeast",
                                       state %in% centro_oeste ~ "Midwest",
                                       state %in% sudeste      ~ "Southeast",
                                       state %in% sul          ~ "South"))
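As a cross-check on the region assignment, the same mapping can be written as a lookup table. The following is a minimal sketch in base R on a toy data frame; the names toy and region_lookup are made up for this illustration and are not part of the original analysis.

```r
# Toy data: a handful of state codes, for illustration only
toy <- data.frame(state = c("SP", "AM", "BA", "RS", "DF"),
                  stringsAsFactors = FALSE)

# Named vector used as a lookup table: names are state codes, values are regions
region_lookup <- c(
  setNames(rep("North", 7),     c("AM", "RR", "AP", "PA", "TO", "RO", "AC")),
  setNames(rep("Northeast", 9), c("MA", "PI", "CE", "RN", "PE", "PB", "SE", "AL", "BA")),
  setNames(rep("Midwest", 4),   c("MT", "MS", "GO", "DF")),
  setNames(rep("Southeast", 4), c("SP", "RJ", "ES", "MG")),
  setNames(rep("South", 3),     c("PR", "SC", "RS"))
)

# Index the table by state code; a code missing from the table would give NA
toy$region <- unname(region_lookup[toy$state])
print(toy)
```

A lookup table keeps all 27 state codes in one object, which makes it easy to confirm that none is missing with length(region_lookup).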

3. Making Some Plots

We will plot the total number of COVID-19 deaths in each region per epidemiological week. First we calculate the weekly totals, then we plot them.

#Plotting deaths by region for each epidemiological week
#Calculate total number of deaths per region per week
region_totals <- coviddata                %>%
    filter(place_type == "state")         %>%
    group_by(epidemiological_week,region) %>%  
    summarize(tot = sum(new_deaths))

#Plot
region_totals %>% 
    ggplot(aes(x = epidemiological_week, 
               y = tot,
               group = region, color = region))        + 
    geom_line()                                        +  
    geom_point()                                       +
    xlab("Epidemiological Week")                       +
    ylab("Total Deaths")                               +
    ggtitle("Deaths by COVID-19 per Region of Brazil") +
    theme_classic()                                    +
    #theme() goes after theme_classic(), which would otherwise reset the title position
    theme(plot.title = element_text(hjust = 0.5))
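To check the grouping logic without the full dataset, we can run the same kind of aggregation on a few made-up rows. This sketch uses base R's aggregate() instead of dplyr, and the numbers in toy are invented for illustration.

```r
# Toy data: made-up daily counts for two regions over two weeks
toy <- data.frame(
  epidemiological_week = c(10, 10, 10, 11, 11),
  region               = c("North", "North", "South", "North", "South"),
  new_deaths           = c(1, 2, 4, 3, 5)
)

# Sum new_deaths within each (week, region) pair
weekly <- aggregate(new_deaths ~ epidemiological_week + region,
                    data = toy, FUN = sum)
print(weekly)
```

Each (week, region) pair ends up as one row, which is the shape the line plot expects: one point per region per week.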

We can also represent the total number of deaths per region as a pie chart.

#Total by region
somaregiao <- tapply(region_totals$tot, region_totals$region, sum)

#barplot(somaregiao,
#        horiz = TRUE,
#        xlab = "Number of Deaths by COVID-19",
#        ylab = "Region",
#        col = c("red"))
pie(somaregiao, main = "Total Deaths by COVID-19 per Region")
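Slice sizes in a pie chart are hard to compare by eye, so it can help to print each region's share in the label. The sketch below shows one way to do that with prop.table(); the totals are made-up numbers, not values from the dataset.

```r
# Toy totals per region (made-up numbers, for illustration only)
totals <- c(North = 10, Northeast = 30, Midwest = 10, Southeast = 40, South = 10)

# Share of each region in percent, rounded to one decimal place
shares <- round(100 * prop.table(totals), 1)

# Labels such as "Southeast (40%)"
labels <- paste0(names(totals), " (", shares, "%)")

pie(totals, labels = labels, main = "Share of Deaths per Region (toy data)")
```

With the real data, totals would be replaced by the somaregiao object.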

We can also plot the total number of deaths by state.

#Filter the lines that contain the information by state
state <- coviddata %>% 
         filter(place_type == "state")
#Plot
state %>% 
    #Stacking the daily values makes the height of each bar the state's total
    ggplot(aes(x = state, y = new_deaths)) + geom_bar(stat = "identity") +
    ylab("New Deaths") + xlab("State") +
    ggtitle("Deaths by COVID-19 per State - Brazil") +
    theme_classic() +
    theme(plot.title = element_text(hjust = 0.5))

4. Mapping the Total Number of Deaths by City

Now we prepare the data for mapping by computing the total number of deaths per city. To get the coordinates of the cities, we need a vector of place names to use as input to the geocoding function. Some city names occur in more than one state, so the more information we give about each place, the better. Here we have the country (Brazil), the state (in the state column), and the name of the city. We concatenate these three pieces of information into a single string and store it in a new column (citst).

#Calculating total deaths by city
#Filtering the rows with city information
covidcity <- coviddata    %>%
             filter(place_type == "city")

#Getting the sum per date and region
covidcity                 %>% 
    group_by(date,region) %>% 
    summarize(tot=sum(new_deaths)) -> totals

#Creating the column with the place names that will be searched
#paste() separates its arguments with a single space by default
covidcity$citst <- paste(covidcity$city, covidcity$state, "Brazil")
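To see why the state is needed in the search string, consider a small example. Bom Jesus is the name of municipalities in more than one state, so the city name alone does not identify a place. The toy data frame below is for illustration only.

```r
# Toy data: the same city name appears in two different states
toy <- data.frame(city  = c("Bom Jesus", "Bom Jesus", "Santos"),
                  state = c("PI", "RS", "SP"),
                  stringsAsFactors = FALSE)

# Build the search key from city, state and country
toy$citst <- paste(toy$city, toy$state, "Brazil")

length(unique(toy$city))   # distinct city names
length(unique(toy$citst))  # distinct search keys
print(toy$citst)
```

The city column has only two distinct values, while the combined key has three, one per actual place.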

To look up the coordinates of the cities we use the geocode() function from the ggmap package. This requires an API key, which can be obtained from the Google Cloud Platform. The result is a data frame with three columns: longitude, latitude, and the place name.

#Replace the placeholder with your own API key
key <- "YOUR KEY HERE"
register_google(key, account_type = "standard")
#If your dataset is very big, it can take a while to search for all the cities
local   <- unique(covidcity$citst)
longlat <- geocode(local) %>% 
    mutate(loc = local)
#Write this object to a csv file so you can reuse it later without repeating the search.
#write.csv(longlat, "longlat.csv")
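Because the lookup is slow and counts against the API quota, it is worth caching the result on disk. The following is a minimal sketch of that idea. The helper geocode_cached() and the stand-in fake_geocode() are hypothetical names invented here so the example runs without an API key; in practice the geocode_fn argument would be ggmap's geocode().

```r
# Hypothetical stand-in for ggmap::geocode(): returns dummy (0, 0) coordinates
fake_geocode <- function(places) {
  data.frame(lon = rep(0, length(places)),
             lat = rep(0, length(places)))
}

# Read the cached coordinates if the file exists, otherwise geocode and save
geocode_cached <- function(places, cache_file, geocode_fn) {
  if (file.exists(cache_file)) {
    return(read.csv(cache_file, stringsAsFactors = FALSE))
  }
  result <- geocode_fn(places)
  result$loc <- places
  write.csv(result, cache_file, row.names = FALSE)
  result
}

cache_file <- tempfile(fileext = ".csv")
places <- c("Santos SP Brazil")

# First call runs the geocoder and writes the cache
first <- geocode_cached(places, cache_file, fake_geocode)

# Second call reads the cache, so the geocoder passed here is never invoked
second <- geocode_cached(places, cache_file,
                         function(p) stop("geocoder should not be called"))
print(second)
```

Passing the geocoder as an argument keeps the caching logic testable offline, at the cost of one extra parameter.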

We will create a data frame with the information to be mapped: the coordinates of each place and its total number of deaths.

#Joining the two data frames by the place name
covidcity %>%
    group_by(citst)                              %>%
    summarize(cases = sum(new_deaths))           %>% 
    inner_join(longlat, by = c("citst" = "loc")) %>% 
    mutate(LatLon = paste(lat, lon, sep = ":"))  -> formapping

#Repeating each row once per death so that the marker clusters reflect the totals
num_of_times_to_repeat <- formapping$cases
long_formapping <- formapping[rep(seq_len(nrow(formapping)),
                                  num_of_times_to_repeat),]
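The row-repetition step is easier to follow on a small example. In this sketch the toy data frame and its counts are made up; the indexing pattern is the same one used above.

```r
# Toy data: three places with made-up counts
toy <- data.frame(citst = c("A", "B", "C"),
                  cases = c(2, 0, 3),
                  stringsAsFactors = FALSE)

# Row 1 is repeated twice, row 2 is dropped (count 0), row 3 three times
idx <- rep(seq_len(nrow(toy)), times = toy$cases)
long_toy <- toy[idx, ]

nrow(long_toy)          # equals sum(toy$cases)
table(long_toy$citst)
```

Note that rep() fails on negative counts, so if corrections in the source data ever produced a negative total for a place, it would need clamping first, for example with pmax(toy$cases, 0).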

Finally, we use leaflet to plot the map.

#Plot the map
leaflet(long_formapping)     %>% 
    addTiles()               %>% 
    addMarkers(clusterOptions = markerClusterOptions())
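An alternative to repeating rows is to draw one circle per place and scale its radius by the count. The sketch below is not the approach used in this post, only a possible variation. The coordinates are approximate and the counts are made up for illustration.

```r
library(leaflet)

# Toy data: approximate city coordinates and made-up counts
toy <- data.frame(
  citst = c("São Paulo SP Brazil", "Rio de Janeiro RJ Brazil"),
  lat   = c(-23.55, -22.91),
  lon   = c(-46.63, -43.17),
  cases = c(400, 100)
)

# One circle per place; the square root keeps large totals from dominating
toy_map <- leaflet(toy) %>%
  addTiles() %>%
  addCircleMarkers(lng = ~lon, lat = ~lat,
                   radius = ~sqrt(cases),
                   popup  = ~paste0(citst, ": ", cases))

# Print toy_map in an interactive session to display it
```

This keeps the data frame at one row per place, which matters when the totals are large, but it loses the automatic grouping that markerClusterOptions() provides when zooming out.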