Data Visualization on Ebola Data

Background

The dataset can be downloaded from [1]. It consists of 95 CSV files, each of which reports 38 variables for the counties of Liberia [2].

The variables include:

Specimens collected
Specimens pending for testing
Total specimens tested
Newly reported deaths
Total death/s in confirmed cases
Total death/s in probable cases
Total death/s in suspected cases
Total death/s in confirmed, probable, suspected cases
Case Fatality Rate (CFR) - Confirmed & Probable Cases
Newly reported contacts
Total contacts listed
Currently under follow-up
Contacts seen
Contacts who completed 21 day follow-up
Contacts lost to follow-up
New admissions
Total no. currently in Treatment Units
Total discharges
Cumulative admission/isolation
Newly Reported Cases in HCW
Cumulative cases among HCW
Newly Reported deaths in HCW
Cumulative deaths among HCW
New Case/s (Suspected)
New Case/s (Probable)
New case/s (confirmed)
Total suspected cases
Total probable cases
Total confirmed cases
Total Number of Confirmed Cases of Sierra Leonean Nationality
Total Number of Confirmed Cases of Guinean Nationality
Cumulative confirmed, probable and suspected cases

The 15 counties in Liberia are:

Bomi County
Bong County
Gbarpolu County
Grand Bassa
Grand Cape Mount
Grand Gedeh
Grand Kru
Lofa County
Margibi County
Maryland County
Montserrado County
Nimba County
River Gee County
RiverCess County
Sinoe County

1. Aggregation

Suppose all the CSV files have already been downloaded to the local machine. First, we need to aggregate them into a single workable dataset.

#set working directory
setwd("C:/[DATA]/DataLiterate/P2")

#set the csv directory
csvDir = paste0(getwd(), "/csv")

#get list of csv
csvFiles = list.files(path=csvDir, full.names=TRUE)

#combine all csv data into single data frame
library(plyr)
allData = do.call(rbind.fill, lapply(csvFiles, read.csv))

#change all column names as lowercase
colnames(allData) = tolower(colnames(allData))
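
The call to rbind.fill matters here because the files do not all share the same columns. As a rough base-R sketch of the same idea (the two toy files, their contents and the extra "source" column are assumptions made up for illustration, not part of the original data), each file can be padded with NA columns before binding, and tagged with its file name so that odd columns can be traced back later:

```r
# Toy example: two small CSV files with different column sets (hypothetical data)
tmpDir = tempfile("csvdemo")
dir.create(tmpDir)
writeLines(c("Date,Variable,National",
             "6/16/2014,Newly reported deaths,2"),
           file.path(tmpDir, "a.csv"))
writeLines(c("Date,Variable,National,X",
             "12/1/2014,New Case/s (Suspected),25,1"),
           file.path(tmpDir, "b.csv"))

# read only files ending in .csv and remember where each row came from
files = list.files(path = tmpDir, pattern = "\\.csv$", full.names = TRUE)
frames = lapply(files, function(f) {
  d = read.csv(f, stringsAsFactors = FALSE)
  d$source = basename(f)  # hypothetical helper column
  d
})

# pad every data frame with the columns it lacks, then bind the rows
allCols = unique(unlist(lapply(frames, names)))
padded = lapply(frames, function(d) {
  for (col in setdiff(allCols, names(d))) d[[col]] = NA
  d[allCols]
})
demo = do.call(rbind, padded)

# which file introduced the extra column?
unique(demo$source[!is.na(demo$X)])
```

Keeping the file name on each row is optional, but it makes it easier to find the file responsible for an unexpected column or value.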

Let’s see if there are really 38 different variables.

variableNames = unique(allData$variable)

As we can see, there are 45 different variable names, rather than 38, across all CSV files. They are:

##  [1] Case Fatality Rate (CFR) - Confirmed & Probable Cases           
##  [2] Contacts lost to follow-up                                      
##  [3] Contacts seen                                                   
##  [4] Contacts who completed 21 day follow-up                         
##  [5] Cumulative admission/isolation                                  
##  [6] Cumulative cases among HCW                                      
##  [7] Cumulative deaths among HCW                                     
##  [8] Currently under follow-up                                       
##  [9] New admissions                                                  
## [10] New case/s (confirmed)                                          
## [11] New Case/s (Probable)                                           
## [12] New Case/s (Suspected)                                          
## [13] Newly Reported Cases in HCW                                     
## [14] Newly reported contacts                                         
## [15] Newly reported deaths                                           
## [16] Newly Reported deaths in HCW                                    
## [17] Specimens collected                                             
## [18] Specimens pending for testing                                   
## [19] Total confirmed cases                                           
## [20] Total contacts listed                                           
## [21] Total death/s in confirmed cases                                
## [22] Total death/s in confirmed, probable, suspected cases           
## [23] Total death/s in probable cases                                 
## [24] Total death/s in suspected cases                                
## [25] Total discharges                                                
## [26] Total no. currently in Treatment Units                          
## [27] Total Number of Confirmed Cases of Guinean Nationality          
## [28] Total Number of Confirmed Cases of Sierra Leonean Nationality   
## [29] Total probable cases                                            
## [30] Total specimens tested                                          
## [31] Total suspected cases                                           
## [32] Cumulative (confirmed + probable + suspected)                   
## [33] Cumulative CFR                                                  
## [34] Total case/s (confirmed)                                        
## [35] Total Case/s (Probable)                                         
## [36] Total Case/s (Suspected)                                        
## [37] Cumulative (confirmed + probable + suspects)                    
## [38] Cumulative confirmed, probable and suspected cases              
## [39] Total death/s in confirmed,  probable, suspected cases          
## [40] Case Fatality Rate (CFR) - \n Confirmed & Probable Cases        
## [41] Contacts who completed 21 day \n follow-up                      
## [42] Total death/s in confirmed, \n probable, suspected cases        
## [43] Total no. currently in Treatment \n Units                       
## [44] Total Number of Confirmed Cases \n of Guinean Nationality       
## [45] Total Number of Confirmed Cases \n of Sierra Leonean Nationality
## 45 Levels: Case Fatality Rate (CFR) - Confirmed & Probable Cases ...

Thus, it is necessary to rename or combine a number of variables. For example, the line break “\n” has to be removed from the variable names and doubled spaces have to be collapsed. Furthermore, to make manipulation easier, all variable names are converted to lowercase.

allData$variable = gsub("\n ", "", allData$variable)
allData$variable = gsub("  ", " ", allData$variable)
allData$variable = tolower(allData$variable)
variableNames = unique(allData$variable)
sort(variableNames)

##  [1] "case fatality rate (cfr) - confirmed & probable cases"        
##  [2] "contacts lost to follow-up"                                   
##  [3] "contacts seen"                                                
##  [4] "contacts who completed 21 day follow-up"                      
##  [5] "cumulative (confirmed + probable + suspected)"                
##  [6] "cumulative (confirmed + probable + suspects)"                 
##  [7] "cumulative admission/isolation"                               
##  [8] "cumulative cases among hcw"                                   
##  [9] "cumulative cfr"                                               
## [10] "cumulative confirmed, probable and suspected cases"           
## [11] "cumulative deaths among hcw"                                  
## [12] "currently under follow-up"                                    
## [13] "new admissions"                                               
## [14] "new case/s (confirmed)"                                       
## [15] "new case/s (probable)"                                        
## [16] "new case/s (suspected)"                                       
## [17] "newly reported cases in hcw"                                  
## [18] "newly reported contacts"                                      
## [19] "newly reported deaths"                                        
## [20] "newly reported deaths in hcw"                                 
## [21] "specimens collected"                                          
## [22] "specimens pending for testing"                                
## [23] "total case/s (confirmed)"                                     
## [24] "total case/s (probable)"                                      
## [25] "total case/s (suspected)"                                     
## [26] "total confirmed cases"                                        
## [27] "total contacts listed"                                        
## [28] "total death/s in confirmed cases"                             
## [29] "total death/s in confirmed, probable, suspected cases"        
## [30] "total death/s in probable cases"                              
## [31] "total death/s in suspected cases"                             
## [32] "total discharges"                                             
## [33] "total no. currently in treatment units"                       
## [34] "total number of confirmed cases of guinean nationality"       
## [35] "total number of confirmed cases of sierra leonean nationality"
## [36] "total probable cases"                                         
## [37] "total specimens tested"                                       
## [38] "total suspected cases"
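
Even after this cleanup, some of the 38 names look like different spellings of the same measure, for example "total case/s (confirmed)" and "total confirmed cases". Whether they are truly equivalent is an assumption that should be checked against the reports. The lookup table and the helper canonicalName below are hypothetical and only sketch how such names could be merged:

```r
# hypothetical synonym table: names on the left are rewritten to names on the right
# (the equivalences are an assumption, not confirmed by the data source)
synonyms = c(
  "total case/s (confirmed)" = "total confirmed cases",
  "total case/s (probable)"  = "total probable cases",
  "total case/s (suspected)" = "total suspected cases",
  "cumulative (confirmed + probable + suspected)" =
    "cumulative confirmed, probable and suspected cases",
  "cumulative (confirmed + probable + suspects)" =
    "cumulative confirmed, probable and suspected cases"
)

# replace known synonyms and leave every other name untouched
canonicalName = function(x) {
  hit = x %in% names(synonyms)
  x[hit] = unname(synonyms[x[hit]])
  x
}

canonicalName(c("total case/s (confirmed)", "contacts seen"))
```

Applied to the real data, this would be something like allData$variable = canonicalName(allData$variable), after which unique(allData$variable) should return fewer names.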

With the variable names cleaned up, let’s now look at the column names.

sort(colnames(allData))

##  [1] "bomi.county"        "bong.county"        "date"              
##  [4] "gbarpolu.county"    "grand.bassa"        "grand.cape.mount"  
##  [7] "grand.gedeh"        "grand.kru"          "lofa.county"       
## [10] "margibi.county"     "maryland.county"    "montserrado.county"
## [13] "national"           "nimba.county"       "river.gee.county"  
## [16] "rivercess.county"   "sinoe.county"       "variable"          
## [19] "x"

There are 19 columns across all CSV files, including a strange column named “x”. I wondered how many rows have non-NA values in that column.

allData[!is.na(allData$x), ]

##           date               variable national bomi.county bong.county
## 2920 12/1/2014 new case/s (suspected)       25           0           3
## 2921 12/1/2014  new case/s (probable)        9           0           0
## 2922 12/1/2014 new case/s (confirmed)        1           0           0
##      grand.kru lofa.county margibi.county maryland.county
## 2920         0           0              0               5
## 2921         0           0              0               1
## 2922         0           0              0               0
##      montserrado.county nimba.county river.gee.county rivercess.county
## 2920                  0           12                0                0
## 2921                  0            7                0                0
## 2922                  0            1                0                0
##      sinoe.county gbarpolu.county grand.bassa grand.cape.mount grand.gedeh
## 2920            0               0           2               NA           2
## 2921            0               0           1               NA           0
## 2922            0               0           0                0          NA
##      x
## 2920 1
## 2921 0
## 2922 0

As the result above shows, only 3 rows have non-NA values in column ‘x’. Those values (0 or 1) do not seem meaningful, so I decided to drop this column.

allData$x = NULL #drop column x
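
The same question can be asked of every column at once. A minimal sketch on a made-up data frame (the toy values are assumptions for illustration only) counts the non-NA entries per column, which makes nearly empty columns such as x stand out:

```r
# hypothetical toy data: column x is filled in only once
toy = data.frame(date     = c("12/1/2014", "12/1/2014", "12/2/2014", "12/2/2014"),
                 national = c(25, 9, 1, NA),
                 x        = c(1, NA, NA, NA))

# number of non-NA entries in each column
nonNA = colSums(!is.na(toy))
nonNA

# columns that are filled in fewer than half of the rows
names(nonNA)[nonNA < nrow(toy) / 2]
```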

Next, I’d like to check whether the date formats are consistent.

unique(allData$date)

##  [1] 6/16/2014  6/17/2014  6/22/2014  6/24/2014  6/25/14    6/28/2014 
##  [7] 6/29/2014  7/1/2014   7/2/2014   7/3/2014   7/7/2014   7/8/2014  
## [13] 7/10/2014  7/13/2014  7/17/2014  7/20/2014  7/24/2014  7/26/2014 
## [19] 8/2/2014   8/4/14     8/12/2014  8/15/2014  8/17/2014  8/18/2014 
## [25] 8/20/14    8/25/2014  8/28/2014  9/1/2014   9/2/2014   9/3/2014  
## [31] 9/4/2014   9/5/2014   9/6/2014   9/7/2014   9/8/2014   9/10/2014 
## [37] 9/11/2014  9/12/2014  9/13/2014  9/14/2014  9/15/2014  9/16/2014 
## [43] 9/17/2014  9/20/2014  9/21/2014  09/23/2014 09/25/2014 09/26/2014
## [49] 09/27/2014 9/28/2014  9/30/2014  10/1/2014  10/3/2014  10/4/14   
## [55] 10/5/14    10/7/2014  10/8/2014  10/9/2014  10/10/2014 10/11/14  
## [61] 10/12/14   10/13/14   10/16/14   10/17/14   10/18/14   10/19/14  
## [67] 10/20/14   10/21/2014 10/22/2014 10/23/2014 10/24/14   10/25/2014
## [73] 10/27/14   10/28/2014 10/29/2014 10/30/2014 10/31/2014 11/2/2014 
## [79] 11/4/2014  11/8/2014  11/14/2014 11/15/2014 11/19/2014 11/20/2014
## [85] 11/21/2014 11/23/2014 11/24/2014 11/26/2014 11/27/2014 11/28/2014
## [91] 11/29/2014 11/30/2014 12/1/2014  12/2/2014 
## 94 Levels: 6/16/2014 6/17/2014 6/22/2014 6/24/2014 6/25/14 ... 12/2/2014

There are 94 distinct report dates between 16 June 2014 and 2 December 2014, so the data was not updated every day during this period. The date formats are also inconsistent: some dates use a two-digit year (e.g. 14 for 2014) and some have a leading zero in the month (e.g. 09 for September). The dates therefore need to be converted to a consistent format.

#correct the date format
allData$date = as.Date(gsub("/14$", "/2014", allData$date), "%m/%d/%Y")
unique(allData$date)

##  [1] "2014-06-16" "2014-06-17" "2014-06-22" "2014-06-24" "2014-06-25"
##  [6] "2014-06-28" "2014-06-29" "2014-07-01" "2014-07-02" "2014-07-03"
## [11] "2014-07-07" "2014-07-08" "2014-07-10" "2014-07-13" "2014-07-17"
## [16] "2014-07-20" "2014-07-24" "2014-07-26" "2014-08-02" "2014-08-04"
## [21] "2014-08-12" "2014-08-15" "2014-08-17" "2014-08-18" "2014-08-20"
## [26] "2014-08-25" "2014-08-28" "2014-09-01" "2014-09-02" "2014-09-03"
## [31] "2014-09-04" "2014-09-05" "2014-09-06" "2014-09-07" "2014-09-08"
## [36] "2014-09-10" "2014-09-11" "2014-09-12" "2014-09-13" "2014-09-14"
## [41] "2014-09-15" "2014-09-16" "2014-09-17" "2014-09-20" "2014-09-21"
## [46] "2014-09-23" "2014-09-25" "2014-09-26" "2014-09-27" "2014-09-28"
## [51] "2014-09-30" "2014-10-01" "2014-10-03" "2014-10-04" "2014-10-05"
## [56] "2014-10-07" "2014-10-08" "2014-10-09" "2014-10-10" "2014-10-11"
## [61] "2014-10-12" "2014-10-13" "2014-10-16" "2014-10-17" "2014-10-18"
## [66] "2014-10-19" "2014-10-20" "2014-10-21" "2014-10-22" "2014-10-23"
## [71] "2014-10-24" "2014-10-25" "2014-10-27" "2014-10-28" "2014-10-29"
## [76] "2014-10-30" "2014-10-31" "2014-11-02" "2014-11-04" "2014-11-08"
## [81] "2014-11-14" "2014-11-15" "2014-11-19" "2014-11-20" "2014-11-21"
## [86] "2014-11-23" "2014-11-24" "2014-11-26" "2014-11-27" "2014-11-28"
## [91] "2014-11-29" "2014-11-30" "2014-12-01" "2014-12-02"
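
To see the conversion in isolation, here is a small sketch on a few toy dates in the mixed formats found above. It uses a slightly more general pattern than the one in the text (any trailing two-digit year is expanded, under the assumption that it means 20xx), and the variable names are made up for illustration:

```r
# toy dates in the mixed formats seen in the data
rawDates = c("6/25/14", "09/23/2014", "10/4/14", "12/2/2014")

# expand a trailing two-digit year (assumed to mean 20xx), then parse
fullYear = sub("/([0-9]{2})$", "/20\\1", rawDates)
parsed = as.Date(fullYear, format = "%m/%d/%Y")
format(parsed)

# entries that failed to parse would show up as NA
sum(is.na(parsed))

# gaps between consecutive report dates, in days
as.integer(diff(sort(unique(parsed))))
```

Counting the NA values after parsing is a cheap check that no unexpected format slipped through, and the day gaps show directly that the reports are not daily.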

2. Visualization

Total death/s in confirmed cases

Using the collected data, let’s check the total deaths in confirmed cases in Liberia.

totalDeaths = allData[allData$variable == "total death/s in confirmed cases", ]
#remove the column about variable
totalDeaths$variable = NULL

#replace the 0 entry with NA in national
totalDeaths[which(totalDeaths$national == 0), "national"] = NA

library(ggplot2)
library(reshape2)
library(RColorBrewer)

t = melt(totalDeaths, id.vars="date")
colnames(t)[2] = "location" 
colorCount = length(unique(t$location))
getPalette = colorRampPalette(brewer.pal(12, "Paired"))

lp1 = ggplot(data=t, aes(x=date, y=value, group=location, color=location)) + geom_point() + geom_line(data=t[!is.na(t$value),], size=2)
lp1 + scale_colour_manual(name = "Location",
                          values=getPalette(colorCount),
                          breaks=c("national", "bomi.county", "bong.county",
                                   "grand.kru", "lofa.county", "margibi.county",
                                   "maryland.county", "montserrado.county", "nimba.county",
                                   "river.gee.county", "rivercess.county", "sinoe.county",
                                   "gbarpolu.county", "grand.bassa", "grand.cape.mount",
                                   "grand.gedeh"),
                          labels=c("National", "Bomi County", "Bong County",
                                   "Grand Kru County", "Lofa County", "Margibi County",
                                   "Maryland County", "Montserrado County", "Nimba County",
                                   "River Gee County", "Rivercess County", "Sinoe County",
                                   "Gbarpolu County", "Grand Bassa County", "Grand Cape Mount County",
                                   "Grand Gedeh County")) +
  xlab("Year 2014") +
  ylab("Total Number of Death(s)")  + 
  ggtitle("Total death(s) in Confirmed Ebola Cases in Liberia")

[Figure: Total death(s) in Confirmed Ebola Cases in Liberia]

According to the graph, the total number of deaths in confirmed cases increased from July to mid-October 2014, and Montserrado County had the highest number of such deaths. Since the data contains no updates for this variable after mid-October, we cannot tell whether the number increased or decreased afterwards.
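
The plotting code in this subsection first reshapes the table from one column per location to one row per date and location, and it drops the NA rows before drawing the lines so that a line connects the known values across missing reports. A base-R sketch of that reshaping on made-up numbers (stack is used here as a stand-in for melt, and all values are hypothetical):

```r
# hypothetical wide table: one column per location
wide = data.frame(date        = as.Date(c("2014-09-01", "2014-09-02")),
                  national    = c(10, 12),
                  lofa.county = c(3, NA))

# long format: one row per (date, location) pair
long = data.frame(date = rep(wide$date, times = ncol(wide) - 1),
                  stack(wide[-1]))
names(long) = c("date", "value", "location")
long

# rows kept for the line layer: missing values are skipped, not drawn as gaps
lineData = long[!is.na(long$value), ]
nrow(lineData)
```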

New Case/s (Probable)

Let’s check the number of new probable Ebola cases in Liberia.

newCasesProb = allData[allData$variable == "new case/s (probable)", ]

#remove the column about variable
newCasesProb$variable = NULL
t = melt(newCasesProb, id.vars="date")

lp1 = ggplot(data=t, aes(x=date, y=value, group=variable, color=variable)) + geom_point() + geom_line(data=t[!is.na(t$value),], size=2)

lp1 + scale_colour_manual(name = "Location",
                          values=getPalette(colorCount),
                          breaks=c("national", "bomi.county", "bong.county",
                                   "grand.kru", "lofa.county", "margibi.county",
                                   "maryland.county", "montserrado.county", "nimba.county",
                                   "river.gee.county", "rivercess.county", "sinoe.county",
                                   "gbarpolu.county", "grand.bassa", "grand.cape.mount",
                                   "grand.gedeh"),
                          labels=c("National", "Bomi County", "Bong County",
                                   "Grand Kru County", "Lofa County", "Margibi County",
                                   "Maryland County", "Montserrado County", "Nimba County",
                                   "River Gee County", "Rivercess County", "Sinoe County",
                                   "Gbarpolu County", "Grand Bassa County", "Grand Cape Mount County",
                                   "Grand Gedeh County"))  +
  xlab("Year 2014") +
  ylab("Number of New Probable Ebola Cases in Liberia")  + 
  ggtitle("New Probable Cases")

[Figure: New Probable Cases]

According to the graph, the number of new probable cases fluctuated considerably from July to December 2014. There was a large increase in Lofa County in August 2014, and there were two peaks in Montserrado County around mid-September and mid-October. After mid-October 2014, however, the number of probable cases decreased.

Total Confirmed Ebola Cases in Liberia

When I first plotted the graph with the original data, I found inconsistent values in the total confirmed Ebola cases: the accumulated value sometimes dropped, which is invalid for a cumulative count. Thus, I defined a function to remove the inconsistent values.

library(zoo) #for na.locf (last observation carried forward)

#remove inconsistent values from a cumulative series
removeInconsistent = function(values){
  #forward check: mark a value NA when it is lower than the last known value
  idx = which(diff(na.locf(values, na.rm=FALSE))<0) + 1
  if(length(idx) != 0){
    values[idx] = NA  
  }
  
  #backward check: mark a value NA when it is higher than the next known value
  idx = which(rev(diff(na.locf(rev(values), na.rm=FALSE)) > 0))
  if(length(idx) != 0){
    values[idx] = NA  
  }
  values
}

totalConfirmedCases = allData[allData$variable == "total confirmed cases", ]
#remove the column about variable
totalConfirmedCases$variable = NULL
#clean every column from the 2nd (the 1st is the date) to the last
for (i in 2:ncol(totalConfirmedCases)){
  totalConfirmedCases[,i] = removeInconsistent(totalConfirmedCases[, i])  
}
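
To make the two checks concrete, here is a small walk-through on a made-up cumulative series. The helper locf is a base-R stand-in written for this sketch (it is meant to behave like na.locf with na.rm=FALSE, which is an assumption of the example), and the numbers are hypothetical:

```r
# stand-in for carrying the last known value forward; leading NAs stay NA
locf = function(x) {
  idx = cumsum(!is.na(x))
  idx[idx == 0] = NA
  x[!is.na(x)][idx]
}

# hypothetical cumulative series: the drop from 5 to 2 is invalid
cumulative = c(1, 5, 2, 3, 6)
cleaned = cumulative

# forward check: the value right after a drop is suspicious
afterDrop = which(diff(locf(cleaned)) < 0) + 1
cleaned[afterDrop] = NA

# backward check: a value higher than the next known value is suspicious too
beforeDrop = which(rev(diff(locf(rev(cleaned))) > 0))
cleaned[beforeDrop] = NA

cleaned
```

In this toy series both 5 and 2 end up as NA: the data alone cannot say which of the two was the reporting error, so neither is kept, and the plotted line simply connects 1 to 3.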

t = melt(totalConfirmedCases, id.vars="date")


lp1 = ggplot(data=t, aes(x=date, y=value, group=variable, color=variable)) + geom_point() + geom_line(data=t[!is.na(t$value),], size=2)

lp1 + scale_colour_manual(name = "Location",
                          values=getPalette(colorCount),
                          breaks=c("national", "bomi.county", "bong.county",
                                   "grand.kru", "lofa.county", "margibi.county",
                                   "maryland.county", "montserrado.county", "nimba.county",
                                   "river.gee.county", "rivercess.county", "sinoe.county",
                                   "gbarpolu.county", "grand.bassa", "grand.cape.mount",
                                   "grand.gedeh"),
                          labels=c("National", "Bomi County", "Bong County",
                                   "Grand Kru County", "Lofa County", "Margibi County",
                                   "Maryland County", "Montserrado County", "Nimba County",
                                   "River Gee County", "Rivercess County", "Sinoe County",
                                   "Gbarpolu County", "Grand Bassa County", "Grand Cape Mount County",
                                   "Grand Gedeh County")) +
  xlab("Year 2014") +
  ylab("Total Number of Confirmed  Cases")  + 
  ggtitle("Total Number of Confirmed Ebola Cases in Liberia")

[Figure: Total Number of Confirmed Ebola Cases in Liberia]

Montserrado County had a large increase in the total number of confirmed Ebola cases at the beginning of September and in mid-October 2014. Lofa County also had a sharp increase at the beginning of September 2014. From October 2014, however, the number of confirmed Ebola cases in Lofa County increased only steadily.

Choropleth map - The Number of New Ebola Cases (confirmed, probable and suspected) in Liberia

To get an idea of how Ebola spread in Liberia, I created a choropleth map of the number of new cases (confirmed, probable and suspected). I mainly referred to [3] to learn how to plot a choropleth, and I downloaded the administrative boundaries (level 1) of Liberia from [4].

Since the original data on new Ebola cases is separated into three variables (confirmed, probable and suspected), these three values need to be summed to obtain the total number of new cases for each county in each month. I then plotted the corresponding choropleth maps.

#extract the new cases
newCases = allData[grepl("^new case", allData$variable), ]
#drop variable, national column
newCases$variable = NULL
newCases$national = NULL

#convert the date to store month value only
newCases$date = format(newCases$date, "%m")

#calculate the total number of new cases for each month
newCases = ddply(newCases, .(date), function(x) colSums(x[,-1], na.rm = TRUE))

At this stage, the data is ready for plotting the choropleth.

newCases

##   date bomi.county bong.county grand.kru lofa.county margibi.county
## 1   06           0           0         0          24              2
## 2   07          17          12         0          25              2
## 3   08           5          34         0         143             23
## 4   09          36         148         4         137            333
## 5   10          10         143         5          33            148
## 6   11           9          54         2           4             32
## 7   12           4           3         0           0              0
##   maryland.county montserrado.county nimba.county river.gee.county
## 1               0                 14            0                0
## 2               0                 35            2                0
## 3               4                100           17                4
## 4               8                681          117                7
## 5               0                741           29                0
## 6               2                233            8               12
## 7               9                  0           41                2
##   rivercess.county sinoe.county gbarpolu.county grand.bassa
## 1                0            0               0           0
## 2                0            0               0           0
## 3                0            0               0           2
## 4                0            9               0          46
## 5                3            9               8          20
## 6               31           22              14          19
## 7                0            0               0           6
##   grand.cape.mount grand.gedeh
## 1                0           0
## 2                0           1
## 3                2           1
## 4                6           0
## 5               29           2
## 6               27           1
## 7                0           6
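
The monthly totals above come from two steps: keeping only the three "new case" rows and summing them per month while ignoring missing values. A base-R sketch of the same logic on a tiny made-up table (aggregate is used here in place of ddply, and all numbers are hypothetical):

```r
# hypothetical daily records for two counties
toy = data.frame(
  date = as.Date(c("2014-08-28", "2014-08-28", "2014-09-01", "2014-09-01")),
  variable = c("new case/s (suspected)", "new case/s (confirmed)",
               "new case/s (suspected)", "total confirmed cases"),
  lofa.county  = c(3, 1, 2, 40),
  nimba.county = c(NA, 2, 5, 10)
)

# keep only the "new case" rows; the cumulative row is left out
newOnly = toy[grepl("^new case", toy$variable), ]

# sum per month, treating NA as "nothing reported"
monthly = aggregate(newOnly[c("lofa.county", "nimba.county")],
                    by = list(month = format(newOnly$date, "%m")),
                    FUN = sum, na.rm = TRUE)
monthly
```

Note that na.rm = TRUE turns a month with only missing reports into 0, so a 0 in the result can mean either "no new cases" or "no data".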

# Multiple plot function
#
# ggplot objects can be passed in ..., or to plotlist (as a list of ggplot objects)
# - cols:   Number of columns in layout
# - layout: A matrix specifying the layout. If present, 'cols' is ignored.
#
# If the layout is something like matrix(c(1,2,3,3), nrow=2, byrow=TRUE),
# then plot 1 will go in the upper left, 2 will go in the upper right, and
# 3 will go all the way across the bottom.
#
# from: http://www.cookbook-r.com/Graphs/Multiple_graphs_on_one_page_(ggplot2)/
multiplot <- function(..., plotlist=NULL, file, cols=1, layout=NULL) {
  require(grid)
  
  # Make a list from the ... arguments and plotlist
  plots <- c(list(...), plotlist)
  
  numPlots = length(plots)
  
  # If layout is NULL, then use 'cols' to determine layout
  if (is.null(layout)) {
    # Make the panel
    # ncol: Number of columns of plots
    # nrow: Number of rows needed, calculated from # of cols
    layout <- matrix(seq(1, cols * ceiling(numPlots/cols)),
                     ncol = cols, nrow = ceiling(numPlots/cols))
  }
  
  if (numPlots==1) {
    print(plots[[1]])
    
  } else {
    # Set up the page
    grid.newpage()
    pushViewport(viewport(layout = grid.layout(nrow(layout), ncol(layout))))
    
    # Make each plot, in the correct location
    for (i in 1:numPlots) {
      # Get the i,j matrix positions of the regions that contain this subplot
      matchidx <- as.data.frame(which(layout == i, arr.ind = TRUE))
      
      print(plots[[i]], vp = viewport(layout.pos.row = matchidx$row,
                                      layout.pos.col = matchidx$col))
    }
  }
}


# melt the data frame 
newCases = melt(newCases, id.vars="date")


#mapping for the map
#replace the name
mapping = data.frame(shortForm = c("bomi.county", "bong.county",
                                   "grand.kru", "lofa.county", "margibi.county",
                                   "maryland.county", "montserrado.county", "nimba.county",
                                   "river.gee.county", "rivercess.county", "sinoe.county",
                                   "gbarpolu.county", "grand.bassa", "grand.cape.mount",
                                   "grand.gedeh"),
                     fullName = c("BOMI", "BONG", 
                                  "GRANDKRU", "LOFA", "MARGIBI",
                                  "MARYLAND", "MONTSERRADO", "NIMBA",
                                  "RIVER GEE", "RIVER CESS", "SINOE",
                                  "GBAPOLU", "GRANDBASSA", "GRAND CAPE MOUNT", 
                                  "GRANDGEDEH"))

#update the county name
newCases$variable = factor(newCases$variable, levels=mapping$shortForm, labels=mapping$fullName)
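
The recoding only helps if the new names match the region names in the shapefile exactly; a county whose name does not match would simply stay unfilled on the map. A small sketch of such a check follows. The three-row lookup table and the shapeIds vector are made up for illustration, and the real shapefile spellings are not reproduced here:

```r
# hypothetical lookup from data column names to map region names
lookup = data.frame(shortForm = c("lofa.county", "grand.kru", "rivercess.county"),
                    fullName  = c("LOFA", "GRANDKRU", "RIVER CESS"),
                    stringsAsFactors = FALSE)

# hypothetical region ids as they might come out of a fortified shapefile
shapeIds = c("LOFA", "GRANDKRU", "RIVERCESS")

# recode the column names to map names, as done in the text
counties = c("lofa.county", "rivercess.county", "grand.kru")
recoded = factor(counties, levels = lookup$shortForm, labels = lookup$fullName)
as.character(recoded)

# names used in the data but absent from the map
unmatched = setdiff(as.character(recoded), shapeIds)
unmatched
```

With the real objects, the equivalent check would compare the recoded county names against the unique region ids of the map data before plotting.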


#plot the map
library(scales)
library(rgeos)
library(maptools)

lbr_dist <- readShapeSpatial("./LBR_adm/LBR_adm1.shp") #admin boundary level 1
lbr_dist <- fortify(lbr_dist, region = "NAME_1") # get the name of county for plotting
lbr_dist$id <- toupper(lbr_dist$id) #change the id to uppercase
#calculate the center
distcenters <- ddply(lbr_dist, .(id), summarize, clat = mean(lat), clong = mean(long))

### prepare the plot for each month ###
jun = newCases[newCases$date == "06", c("variable", "value")]
p1 = ggplot() + geom_map(data =jun, aes(map_id = variable, fill = value), 
                         map = lbr_dist) + expand_limits(x = lbr_dist$long, y = lbr_dist$lat) + 
  scale_fill_gradient2(low = muted("blue"), mid="orange", midpoint = 400, high = muted("red"), limits = c(0, 800)) + 
  geom_text(data = distcenters, aes(x = clong, y = clat, label = id, size = 0.2)) +
  xlab("")+ ylab("") +
  ggtitle("The Number of New Ebola Cases in Liberia\nJun 2014")


jul = newCases[newCases$date == "07", c("variable", "value")]
p2 = ggplot() + geom_map(data =jul, aes(map_id = variable, fill = value), 
                         map = lbr_dist) + expand_limits(x = lbr_dist$long, y = lbr_dist$lat) + 
  scale_fill_gradient2(low = muted("blue"), mid="orange", midpoint = 400, high = muted("red"), limits = c(0, 800)) + 
  geom_text(data = distcenters, aes(x = clong, y = clat, label = id, size = 0.2)) +
  xlab("")+ ylab("") +
  ggtitle("The Number of New Ebola Cases in Liberia\nJul 2014")

aug = newCases[newCases$date == "08", c("variable", "value")]
p3 = ggplot() + geom_map(data =aug, aes(map_id = variable, fill = value), 
                         map = lbr_dist) + expand_limits(x = lbr_dist$long, y = lbr_dist$lat) + 
  scale_fill_gradient2(low = muted("blue"), mid="orange", midpoint = 400, high = muted("red"), limits = c(0, 800)) + 
  geom_text(data = distcenters, aes(x = clong, y = clat, label = id, size = 0.2)) +
  xlab("")+ ylab("") +
  ggtitle("The Number of New Ebola Cases in Liberia\nAug 2014")

sept = newCases[newCases$date == "09", c("variable", "value")]
p4 = ggplot() + geom_map(data =sept, aes(map_id = variable, fill = value), 
                         map = lbr_dist) + expand_limits(x = lbr_dist$long, y = lbr_dist$lat) + 
  scale_fill_gradient2(low = muted("blue"), mid="orange", midpoint = 400, high = muted("red"), limits = c(0, 800)) + 
  geom_text(data = distcenters, aes(x = clong, y = clat, label = id, size = 0.2)) +
  xlab("")+ ylab("") +
  ggtitle("The Number of New Ebola Cases in Liberia\nSept 2014")

oct = newCases[newCases$date == "10", c("variable", "value")]
p5 = ggplot() + geom_map(data =oct, aes(map_id = variable, fill = value), 
                         map = lbr_dist) + expand_limits(x = lbr_dist$long, y = lbr_dist$lat) + 
  scale_fill_gradient2(low = muted("blue"), mid="orange", midpoint = 400, high = muted("red"), limits = c(0, 800)) + 
  geom_text(data = distcenters, aes(x = clong, y = clat, label = id, size = 0.2)) +
  xlab("")+ ylab("") +
  ggtitle("The Number of New Ebola Cases in Liberia\nOct 2014")

nov = newCases[newCases$date == "11", c("variable", "value")]
p6 = ggplot() + geom_map(data =nov, aes(map_id = variable, fill = value), 
                         map = lbr_dist) + expand_limits(x = lbr_dist$long, y = lbr_dist$lat) + 
  scale_fill_gradient2(low = muted("blue"), mid="orange", midpoint = 400, high = muted("red"), limits = c(0, 800)) + 
  geom_text(data = distcenters, aes(x = clong, y = clat, label = id, size = 0.2)) +
  xlab("")+ ylab("") +
  ggtitle("The Number of New Ebola Cases in Liberia\nNov 2014")

dec = newCases[newCases$date == "12", c("variable", "value")]
p7 = ggplot() + geom_map(data =dec, aes(map_id = variable, fill = value), 
                         map = lbr_dist) + expand_limits(x = lbr_dist$long, y = lbr_dist$lat) + 
  scale_fill_gradient2(low = muted("blue"), mid="orange", midpoint = 400, high = muted("red"), limits = c(0, 800)) + 
  geom_text(data = distcenters, aes(x = clong, y = clat, label = id, size = 0.2)) +
  xlab("")+ ylab("") +
  ggtitle("The Number of New Ebola Cases in Liberia\nDec 2014")




p1 #Jun