The dataset would be downloaded from [1]. There are 95 csv files. Each of them contains 38 variables among different counties in Liberia. [2]
Those 38 variables are:
16 counties in Liberia:
Suppose all the csv files are already downloaded in the machine. First, we need to aggregate all csv files into a single workable dataset.
#set working directory
setwd("C:/[DATA]/DataLiterate/P2")
#set the csv directory
csvDir = paste0(getwd(), "/csv")
#get list of csv
csvFiles = list.files(path=csvDir, full.names=TRUE)
#combine all csv data into single data frame
library(plyr)
allData = do.call(rbind.fill, lapply(csvFiles, read.csv))
#change all column names as lowercase
colnames(allData) = tolower(colnames(allData))
Let’s see if there are really 38 different variables.
variableNames = unique(allData$variable)
As we can see, there are 45 difference variable, rather than 38, among all csv files. They are:
## [1] Case Fatality Rate (CFR) - Confirmed & Probable Cases
## [2] Contacts lost to follow-up
## [3] Contacts seen
## [4] Contacts who completed 21 day follow-up
## [5] Cumulative admission/isolation
## [6] Cumulative cases among HCW
## [7] Cumulative deaths among HCW
## [8] Currently under follow-up
## [9] New admissions
## [10] New case/s (confirmed)
## [11] New Case/s (Probable)
## [12] New Case/s (Suspected)
## [13] Newly Reported Cases in HCW
## [14] Newly reported contacts
## [15] Newly reported deaths
## [16] Newly Reported deaths in HCW
## [17] Specimens collected
## [18] Specimens pending for testing
## [19] Total confirmed cases
## [20] Total contacts listed
## [21] Total death/s in confirmed cases
## [22] Total death/s in confirmed, probable, suspected cases
## [23] Total death/s in probable cases
## [24] Total death/s in suspected cases
## [25] Total discharges
## [26] Total no. currently in Treatment Units
## [27] Total Number of Confirmed Cases of Guinean Nationality
## [28] Total Number of Confirmed Cases of Sierra Leonean Nationality
## [29] Total probable cases
## [30] Total specimens tested
## [31] Total suspected cases
## [32] Cumulative (confirmed + probable + suspected)
## [33] Cumulative CFR
## [34] Total case/s (confirmed)
## [35] Total Case/s (Probable)
## [36] Total Case/s (Suspected)
## [37] Cumulative (confirmed + probable + suspects)
## [38] Cumulative confirmed, probable and suspected cases
## [39] Total death/s in confirmed, probable, suspected cases
## [40] Case Fatality Rate (CFR) - \n Confirmed & Probable Cases
## [41] Contacts who completed 21 day \n follow-up
## [42] Total death/s in confirmed, \n probable, suspected cases
## [43] Total no. currently in Treatment \n Units
## [44] Total Number of Confirmed Cases \n of Guinean Nationality
## [45] Total Number of Confirmed Cases \n of Sierra Leonean Nationality
## 45 Levels: Case Fatality Rate (CFR) - Confirmed & Probable Cases ...
Thus, it is necessary to rename/combine a number of variables. For example, remove “” from the variable names. Furthermore, to ease for manipulation, all variable names were converted into lowercase.
allData$variable = gsub("\n ", "", allData$variable)
allData$variable = gsub(" ", " ", allData$variable)
allData$variable = tolower(allData$variable)
variableNames = unique(allData$variable)
sort(variableNames)
## [1] "case fatality rate (cfr) - confirmed & probable cases"
## [2] "contacts lost to follow-up"
## [3] "contacts seen"
## [4] "contacts who completed 21 day follow-up"
## [5] "cumulative (confirmed + probable + suspected)"
## [6] "cumulative (confirmed + probable + suspects)"
## [7] "cumulative admission/isolation"
## [8] "cumulative cases among hcw"
## [9] "cumulative cfr"
## [10] "cumulative confirmed, probable and suspected cases"
## [11] "cumulative deaths among hcw"
## [12] "currently under follow-up"
## [13] "new admissions"
## [14] "new case/s (confirmed)"
## [15] "new case/s (probable)"
## [16] "new case/s (suspected)"
## [17] "newly reported cases in hcw"
## [18] "newly reported contacts"
## [19] "newly reported deaths"
## [20] "newly reported deaths in hcw"
## [21] "specimens collected"
## [22] "specimens pending for testing"
## [23] "total case/s (confirmed)"
## [24] "total case/s (probable)"
## [25] "total case/s (suspected)"
## [26] "total confirmed cases"
## [27] "total contacts listed"
## [28] "total death/s in confirmed cases"
## [29] "total death/s in confirmed, probable, suspected cases"
## [30] "total death/s in probable cases"
## [31] "total death/s in suspected cases"
## [32] "total discharges"
## [33] "total no. currently in treatment units"
## [34] "total number of confirmed cases of guinean nationality"
## [35] "total number of confirmed cases of sierra leonean nationality"
## [36] "total probable cases"
## [37] "total specimens tested"
## [38] "total suspected cases"
After finished the update of the variable names, now let’s see the column names.
sort(colnames(allData))
## [1] "bomi.county" "bong.county" "date"
## [4] "gbarpolu.county" "grand.bassa" "grand.cape.mount"
## [7] "grand.gedeh" "grand.kru" "lofa.county"
## [10] "margibi.county" "maryland.county" "montserrado.county"
## [13] "national" "nimba.county" "river.gee.county"
## [16] "rivercess.county" "sinoe.county" "variable"
## [19] "x"
There are 19 columns among all csv files. There is a weird column namded “x”. I wondered how many row(s) with non-NA values in such column.
allData[!is.na(allData$x), ]
## date variable national bomi.county bong.county
## 2920 12/1/2014 new case/s (suspected) 25 0 3
## 2921 12/1/2014 new case/s (probable) 9 0 0
## 2922 12/1/2014 new case/s (confirmed) 1 0 0
## grand.kru lofa.county margibi.county maryland.county
## 2920 0 0 0 5
## 2921 0 0 0 1
## 2922 0 0 0 0
## montserrado.county nimba.county river.gee.county rivercess.county
## 2920 0 12 0 0
## 2921 0 7 0 0
## 2922 0 1 0 0
## sinoe.county gbarpolu.county grand.bassa grand.cape.mount grand.gedeh
## 2920 0 0 2 NA 2
## 2921 0 0 1 NA 0
## 2922 0 0 0 0 NA
## x
## 2920 1
## 2921 0
## 2922 0
From the result shown above, there are only 3 rows with non-NA values in column ‘x’. It seems that those values are not meaningful (with value 0 or 1) and I decided to drop this column.
allData$x = NULL #drop column x
Next, I’d like to check the date format and see if they are consistent.
unique(allData$date)
## [1] 6/16/2014 6/17/2014 6/22/2014 6/24/2014 6/25/14 6/28/2014
## [7] 6/29/2014 7/1/2014 7/2/2014 7/3/2014 7/7/2014 7/8/2014
## [13] 7/10/2014 7/13/2014 7/17/2014 7/20/2014 7/24/2014 7/26/2014
## [19] 8/2/2014 8/4/14 8/12/2014 8/15/2014 8/17/2014 8/18/2014
## [25] 8/20/14 8/25/2014 8/28/2014 9/1/2014 9/2/2014 9/3/2014
## [31] 9/4/2014 9/5/2014 9/6/2014 9/7/2014 9/8/2014 9/10/2014
## [37] 9/11/2014 9/12/2014 9/13/2014 9/14/2014 9/15/2014 9/16/2014
## [43] 9/17/2014 9/20/2014 9/21/2014 09/23/2014 09/25/2014 09/26/2014
## [49] 09/27/2014 9/28/2014 9/30/2014 10/1/2014 10/3/2014 10/4/14
## [55] 10/5/14 10/7/2014 10/8/2014 10/9/2014 10/10/2014 10/11/14
## [61] 10/12/14 10/13/14 10/16/14 10/17/14 10/18/14 10/19/14
## [67] 10/20/14 10/21/2014 10/22/2014 10/23/2014 10/24/14 10/25/2014
## [73] 10/27/14 10/28/2014 10/29/2014 10/30/2014 10/31/2014 11/2/2014
## [79] 11/4/2014 11/8/2014 11/14/2014 11/15/2014 11/19/2014 11/20/2014
## [85] 11/21/2014 11/23/2014 11/24/2014 11/26/2014 11/27/2014 11/28/2014
## [91] 11/29/2014 11/30/2014 12/1/2014 12/2/2014
## 94 Levels: 6/16/2014 6/17/2014 6/22/2014 6/24/2014 6/25/14 ... 12/2/2014
There are 94 Ebloa records between 6 June 2014 and 2 December 2014. Based on this information, we can see that the data was not updated every day in such period. Also, the date format of the data are inconsistent. Some of them use the short form of the year (e.g. 14 for 2014) and some of them with the leading zero for the month value (e.g. 09 for September). It is required to convert the date value in a consistent format.
#correct the date format
allData$date = as.Date(gsub("/14$", "/2014", allData$date), "%m/%d/%Y")
unique(allData$date)
## [1] "2014-06-16" "2014-06-17" "2014-06-22" "2014-06-24" "2014-06-25"
## [6] "2014-06-28" "2014-06-29" "2014-07-01" "2014-07-02" "2014-07-03"
## [11] "2014-07-07" "2014-07-08" "2014-07-10" "2014-07-13" "2014-07-17"
## [16] "2014-07-20" "2014-07-24" "2014-07-26" "2014-08-02" "2014-08-04"
## [21] "2014-08-12" "2014-08-15" "2014-08-17" "2014-08-18" "2014-08-20"
## [26] "2014-08-25" "2014-08-28" "2014-09-01" "2014-09-02" "2014-09-03"
## [31] "2014-09-04" "2014-09-05" "2014-09-06" "2014-09-07" "2014-09-08"
## [36] "2014-09-10" "2014-09-11" "2014-09-12" "2014-09-13" "2014-09-14"
## [41] "2014-09-15" "2014-09-16" "2014-09-17" "2014-09-20" "2014-09-21"
## [46] "2014-09-23" "2014-09-25" "2014-09-26" "2014-09-27" "2014-09-28"
## [51] "2014-09-30" "2014-10-01" "2014-10-03" "2014-10-04" "2014-10-05"
## [56] "2014-10-07" "2014-10-08" "2014-10-09" "2014-10-10" "2014-10-11"
## [61] "2014-10-12" "2014-10-13" "2014-10-16" "2014-10-17" "2014-10-18"
## [66] "2014-10-19" "2014-10-20" "2014-10-21" "2014-10-22" "2014-10-23"
## [71] "2014-10-24" "2014-10-25" "2014-10-27" "2014-10-28" "2014-10-29"
## [76] "2014-10-30" "2014-10-31" "2014-11-02" "2014-11-04" "2014-11-08"
## [81] "2014-11-14" "2014-11-15" "2014-11-19" "2014-11-20" "2014-11-21"
## [86] "2014-11-23" "2014-11-24" "2014-11-26" "2014-11-27" "2014-11-28"
## [91] "2014-11-29" "2014-11-30" "2014-12-01" "2014-12-02"
By using the collected data, let’s check the total death/s in confirmed cases in Liberia.
totalDeaths = allData[allData$variable == "total death/s in confirmed cases", ]
#remove the column about variable
totalDeaths$variable = NULL
#replace the 0 entry with NA in national
totalDeaths[which(totalDeaths$national == 0), "national"] = NA
library(ggplot2)
library(reshape2)
library(RColorBrewer)
t = melt(totalDeaths, id.vars="date")
colnames(t)[2] = "location"
colorCount = length(unique(t$location))
getPalette = colorRampPalette(brewer.pal(12, "Paired"))
lp1 = ggplot(data=t, aes(x=date, y=value, group=location, color=location)) + geom_point() + geom_line(data=t[!is.na(t$value),], size=2)
lp1 + scale_colour_manual(name = "Location",
values=getPalette(colorCount),
breaks=c("national", "bomi.county", "bong.county",
"grand.kru", "lofa.county", "margibi.county",
"maryland.county", "montserrado.county", "nimba.county",
"river.gee.county", "rivercess.county", "sinoe.county",
"gbarpolu.county", "grand.bassa", "grand.cape.mount",
"grand.gedeh"),
labels=c("National", "Bomi County", "Bong County",
"Grand Kru County", "Lofa County", "Margibi County",
"Maryland County", "Montserrado County", "Nimba County",
"River Gee County", "Rivercess County", "Sinoe County",
"Gbarpolu County", "Grand Bassa County", "Grand Cape Mount County",
"Grand Gedeh County")) +
xlab("Year 2014") +
ylab("Total Number of Death(s)") +
ggtitle("Total death(s) in Confirmed Ebola Cases in Liberia")
Refer to the graph, the total number of deaths was increased from July to mid-October 2014. In Liberia, Montserrado County had the most number of deaths in confirmed cases. Since the data did not provide any update about the total number of death in confirmed cases after mid-October, we have no idea if there is any increase /decrease after mid-October.
Let’s check the the number of new probable Ebola cases in Liberia.
newCasesProb = allData[allData$variable == "new case/s (probable)", ]
#remove the column about variable
newCasesProb$variable = NULL
t = melt(newCasesProb, id.vars="date")
lp1 = ggplot(data=t, aes(x=date, y=value, group=variable, color=variable)) + geom_point() + geom_line(data=t[!is.na(t$value),], size=2)
lp1 + scale_colour_manual(name = "Location",
values=getPalette(colorCount),
breaks=c("national", "bomi.county", "bong.county",
"grand.kru", "lofa.county", "margibi.county",
"maryland.county", "montserrado.county", "nimba.county",
"river.gee.county", "rivercess.county", "sinoe.county",
"gbarpolu.county", "grand.bassa", "grand.cape.mount",
"grand.gedeh"),
labels=c("National", "Bomi County", "Bong County",
"Grand Kru County", "Lofa County", "Margibi County",
"Maryland County", "Montserrado County", "Nimba County",
"River Gee County", "Rivercess County", "Sinoe County",
"Gbarpolu County", "Grand Bassa County", "Grand Cape Mount County",
"Grand Gedeh County")) +
xlab("Year 2014") +
ylab("Number of New Probable Ebola Cases in Liberia") +
ggtitle("New Probable Cases")
Refer to the graph, we can see that the number of new probable cases was quite fluctuate from July to December 2014. There was a great increase in Lofa County in August 2014. In addition, there were two peaks in Montserrado County around mid-September and mid-October. However, we can see that the number of probable case was decreasing after mid-October 2014.
When I first plotted the graph with the orginal data, I found that there is inconsistent value in the total confirmed Ebola cases in the data (the accumlated value dropped which was invalid). Thus, I defined a function to remove the inconsistent value.
library(zoo) #for locf function
#remove the inconsistent accumlated value
removeInconsistent = function(values){
orgValues = values
#forward check
idx = which(diff(na.locf(values, na.rm=FALSE))<0) + 1
if(length(idx) != 0){
values[idx] = NA
}
#backward check
idx = which(rev(diff(na.locf(rev(values), na.rm=FALSE)) > 0))
if(length(idx) != 0){
values[idx] = NA
}
values
}
totalConfirmedCases = allData[allData$variable == "total confirmed cases", ]
#remove the column about variable
totalConfirmedCases$variable = NULL
#for each 2nd column till the last column, remove the inconsistent value
for (i in 2:ncol(totalConfirmedCases)){
totalConfirmedCases[,i] = removeInconsistent(totalConfirmedCases[, i])
}
t = melt(totalConfirmedCases, id.vars="date")
lp1 = ggplot(data=t, aes(x=date, y=value, group=variable, color=variable)) + geom_point() + geom_line(data=t[!is.na(t$value),], size=2)
lp1 + scale_colour_manual(name = "Location",
values=getPalette(colorCount),
breaks=c("national", "bomi.county", "bong.county",
"grand.kru", "lofa.county", "margibi.county",
"maryland.county", "montserrado.county", "nimba.county",
"river.gee.county", "rivercess.county", "sinoe.county",
"gbarpolu.county", "grand.bassa", "grand.cape.mount",
"grand.gedeh"),
labels=c("National", "Bomi County", "Bong County",
"Grand Kru County", "Lofa County", "Margibi County",
"Maryland County", "Montserrado County", "Nimba County",
"River Gee County", "Rivercess County", "Sinoe County",
"Gbarpolu County", "Grand Bassa County", "Grand Cape Mount County",
"Grand Gedeh County")) +
xlab("Year 2014") +
ylab("Total Number of Confirmed Cases") +
ggtitle("Total Number of Confirmed Ebola Cases in Liberia")
Montserrado County had a great increased in the total number of confirmed Ebola cases at the beginning of September and in the mid-October 2014. Lofa County also has a shape increase at the beginning of September 2014. However, in Lofa County, the number of confirmed Ebola cases was only steadily increasing from October 2014.
To get some idea how Ebola spread in Liberia, a choropleth map about the number of new cases (confirmed, probable and suspected) in Liberia was created. I mainly refer to [3] to learn how to plot choropleth and I downloaded the adminstrative boundary level 1 of Liberia from [4].
Since the orignal data about the new Ebola cases are seperarted into three variables: confirmed, probable and suspected, It is required to sum these three values to obtain the total value of news cases for each county in each month. Then, I plotted thecorrsponding choropleth map accordingly.
#extract the new cases
newCases = allData[grepl("new case*", allData$variable), ]
#drop variable, national column
newCases$variable = NULL
newCases$national = NULL
#convert the date to store month value only
newCases$date = unlist(lapply(newCases$date, function(x){format(x, "%m")}))
#calculate the total number of new cases for each month
newCases = ddply(newCases, .(date), function(x) colSums(x[,-1], na.rm = TRUE))
At this stage, we finished the preparing the data for plotting Choropleth.
newCases
## date bomi.county bong.county grand.kru lofa.county margibi.county
## 1 06 0 0 0 24 2
## 2 07 17 12 0 25 2
## 3 08 5 34 0 143 23
## 4 09 36 148 4 137 333
## 5 10 10 143 5 33 148
## 6 11 9 54 2 4 32
## 7 12 4 3 0 0 0
## maryland.county montserrado.county nimba.county river.gee.county
## 1 0 14 0 0
## 2 0 35 2 0
## 3 4 100 17 4
## 4 8 681 117 7
## 5 0 741 29 0
## 6 2 233 8 12
## 7 9 0 41 2
## rivercess.county sinoe.county gbarpolu.county grand.bassa
## 1 0 0 0 0
## 2 0 0 0 0
## 3 0 0 0 2
## 4 0 9 0 46
## 5 3 9 8 20
## 6 31 22 14 19
## 7 0 0 0 6
## grand.cape.mount grand.gedeh
## 1 0 0
## 2 0 1
## 3 2 1
## 4 6 0
## 5 29 2
## 6 27 1
## 7 0 6
# Multiple plot function
#
# ggplot objects can be passed in ..., or to plotlist (as a list of ggplot objects)
# - cols: Number of columns in layout
# - layout: A matrix specifying the layout. If present, 'cols' is ignored.
#
# If the layout is something like matrix(c(1,2,3,3), nrow=2, byrow=TRUE),
# then plot 1 will go in the upper left, 2 will go in the upper right, and
# 3 will go all the way across the bottom.
#
# from: http://www.cookbook-r.com/Graphs/Multiple_graphs_on_one_page_(ggplot2)/
multiplot <- function(..., plotlist=NULL, file, cols=1, layout=NULL) {
require(grid)
# Make a list from the ... arguments and plotlist
plots <- c(list(...), plotlist)
numPlots = length(plots)
# If layout is NULL, then use 'cols' to determine layout
if (is.null(layout)) {
# Make the panel
# ncol: Number of columns of plots
# nrow: Number of rows needed, calculated from # of cols
layout <- matrix(seq(1, cols * ceiling(numPlots/cols)),
ncol = cols, nrow = ceiling(numPlots/cols))
}
if (numPlots==1) {
print(plots[[1]])
} else {
# Set up the page
grid.newpage()
pushViewport(viewport(layout = grid.layout(nrow(layout), ncol(layout))))
# Make each plot, in the correct location
for (i in 1:numPlots) {
# Get the i,j matrix positions of the regions that contain this subplot
matchidx <- as.data.frame(which(layout == i, arr.ind = TRUE))
print(plots[[i]], vp = viewport(layout.pos.row = matchidx$row,
layout.pos.col = matchidx$col))
}
}
}
# melt the data frame
newCases = melt(newCases, id.vars="date")
#mapping for the map
#replace the name
mapping = data.frame(shortForm = c("bomi.county", "bong.county",
"grand.kru", "lofa.county", "margibi.county",
"maryland.county", "montserrado.county", "nimba.county",
"river.gee.county", "rivercess.county", "sinoe.county",
"gbarpolu.county", "grand.bassa", "grand.cape.mount",
"grand.gedeh"),
fullName = c("BOMI", "BONG",
"GRANDKRU", "LOFA", "MARGIBI",
"MARYLAND", "MONTSERRADO", "NIMBA",
"RIVER GEE", "RIVER CESS", "SINOE",
"GBAPOLU", "GRANDBASSA", "GRAND CAPE MOUNT",
"GRANDGEDEH"))
#update the county name
newCases$variable = factor(newCases$variable, levels=mapping$shortForm, labels=mapping$fullName)
#plot the map
library(scales)
library(rgeos)
library(maptools)
lbr_dist <- readShapeSpatial("./LBR_adm/LBR_adm1.shp") #admin boundary level 1
lbr_dist <- fortify(lbr_dist, region = "NAME_1") # get the name of county for plotting
lbr_dist$id <- toupper(lbr_dist$id) #change the id to uppercase
#calculate the center
distcenters <- ddply(lbr_dist, .(id), summarize, clat = mean(lat), clong = mean(long))
### prepare the plot for each month ###
jun = newCases[newCases$date == "06", c("variable", "value")]
p1 = ggplot() + geom_map(data =jun, aes(map_id = variable, fill = value),
map = lbr_dist) + expand_limits(x = lbr_dist$long, y = lbr_dist$lat) +
scale_fill_gradient2(low = muted("blue"), mid="orange", midpoint = 400, high = muted("red"), limits = c(0, 800)) +
geom_text(data = distcenters, aes(x = clong, y = clat, label = id, size = 0.2)) +
xlab("")+ ylab("") +
ggtitle("The Number of New Ebola Cases in Liberia\nJun 2014")
jul = newCases[newCases$date == "07", c("variable", "value")]
p2 = ggplot() + geom_map(data =jul, aes(map_id = variable, fill = value),
map = lbr_dist) + expand_limits(x = lbr_dist$long, y = lbr_dist$lat) +
scale_fill_gradient2(low = muted("blue"), mid="orange", midpoint = 400, high = muted("red"), limits = c(0, 800)) +
geom_text(data = distcenters, aes(x = clong, y = clat, label = id, size = 0.2)) +
xlab("")+ ylab("") +
ggtitle("The Number of New Ebola Cases in Liberia\nJul 2014")
aug = newCases[newCases$date == "08", c("variable", "value")]
p3 = ggplot() + geom_map(data =aug, aes(map_id = variable, fill = value),
map = lbr_dist) + expand_limits(x = lbr_dist$long, y = lbr_dist$lat) +
scale_fill_gradient2(low = muted("blue"), mid="orange", midpoint = 400, high = muted("red"), limits = c(0, 800)) +
geom_text(data = distcenters, aes(x = clong, y = clat, label = id, size = 0.2)) +
xlab("")+ ylab("") +
ggtitle("The Number of New Ebola Cases in Liberia\nAug 2014")
sept = newCases[newCases$date == "09", c("variable", "value")]
p4 = ggplot() + geom_map(data =sept, aes(map_id = variable, fill = value),
map = lbr_dist) + expand_limits(x = lbr_dist$long, y = lbr_dist$lat) +
scale_fill_gradient2(low = muted("blue"), mid="orange", midpoint = 400, high = muted("red"), limits = c(0, 800)) +
geom_text(data = distcenters, aes(x = clong, y = clat, label = id, size = 0.2)) +
xlab("")+ ylab("") +
ggtitle("The Number of New Ebola Cases in Liberia\nSept 2014")
oct = newCases[newCases$date == "10", c("variable", "value")]
p5 = ggplot() + geom_map(data =oct, aes(map_id = variable, fill = value),
map = lbr_dist) + expand_limits(x = lbr_dist$long, y = lbr_dist$lat) +
scale_fill_gradient2(low = muted("blue"), mid="orange", midpoint = 400, high = muted("red"), limits = c(0, 800)) +
geom_text(data = distcenters, aes(x = clong, y = clat, label = id, size = 0.2)) +
xlab("")+ ylab("") +
ggtitle("The Number of New Ebola Cases in Liberia\nOct 2014")
nov = newCases[newCases$date == "11", c("variable", "value")]
p6 = ggplot() + geom_map(data =nov, aes(map_id = variable, fill = value),
map = lbr_dist) + expand_limits(x = lbr_dist$long, y = lbr_dist$lat) +
scale_fill_gradient2(low = muted("blue"), mid="orange", midpoint = 400, high = muted("red"), limits = c(0, 800)) +
geom_text(data = distcenters, aes(x = clong, y = clat, label = id, size = 0.2)) +
xlab("")+ ylab("") +
ggtitle("The Number of New Ebola Cases in Liberia\nNov 2014")
dec = newCases[newCases$date == "12", c("variable", "value")]
p7 = ggplot() + geom_map(data =dec, aes(map_id = variable, fill = value),
map = lbr_dist) + expand_limits(x = lbr_dist$long, y = lbr_dist$lat) +
scale_fill_gradient2(low = muted("blue"), mid="orange", midpoint = 400, high = muted("red"), limits = c(0, 800)) +
geom_text(data = distcenters, aes(x = clong, y = clat, label = id, size = 0.2)) +
xlab("")+ ylab("") +
ggtitle("The Number of New Ebola Cases in Liberia\nDec 2014")
p1 #Jun
p2 #Jul
p3 #Aug
p4 #Sept
p5 #Oct
p6 #Nov
p7 #Dec
Refer to these graphs, we can see that the protential source of the Ebola cases in Liberia were from Lofa and Monsterrado . Since Bong is located in between Lofa and Montserrado, Ebola spreaded to Bong in July and the counties next to Monserrado, i.e. Bomi and Grandbassa.
In August 2014, Lofa and Monserrado had a great increase in the number of new Ebloa cases and Ebloa spread outwards to counties such as Numba.
In September 2014, Monserrado had the most number of new Ebloa cases. There is also a great increases in Lofa, Bong, Nuba. The counties next to Monserrado, such as Margibi, Bomi and Grandassa, also had an increase in the new Ebloa cases.
In October 2014, Monserrado reached around 750 new cases and there was a trend that Eloba spread to the boundary of Liberia.
In November and Decmeber 2014, the number of new cases were greatly decreased. It showed that the situation was under control. Although Monserrado still has around 400 new cases, the other nearby counties did not have any more increase in the number of new cases.
The potential sources location of Eloba in Liberia are Lofa and Monserrado. It spreads to the nearby counties in short period of time and greatly affected the other countries in September 2014. In October 2014, It seems that the spread was stopped, but, Montserrado still had a great number of new cases. The great number of new cases in Montserrado would tigger another wave of Eloba spread. Lucky that the spread was halted in November 2014. Since the data was in-complete in December (have collected data up to 02 December 2014), we could not draw any the conclusion that if Ebola already stopped spreading to the other counties. However, from the news, the spread of Ebola is now under-controlled. ### Reference
[1] - https://github.com/tristantao/ebola/tree/master/liberia_data
[2] - https://github.com/tristantao/ebola/blob/master/liberia_data/help_me/template.csv
[3] - http://bl.ocks.org/prabhasp/raw/5030005/
[4] - http://gadm.org/