This report covers a project whose goal was to plot the impact locations of bomb strikes in Rotterdam between 1940 and 1945 on a map. To do so, a PDF file containing dates and strings with information about bomb strikes in Rotterdam is cleaned, manipulated and transformed to eventually obtain coordinates that can be used for plotting.
The idea behind this report is to thoroughly cover how I got from messy data to approximations of addresses and coordinates using regex, R and Leaflet. Hence it might be exhaustive, but if you’re looking to learn more about these subjects, it can be worth the read!
The following packages were used for this project. Make sure to install and load them before running the rest of the code.
# Library
library(XLConnect)
library(readr)
library(dplyr)
library(stringdist)
library(stringr)
library(leaflet)
library(rgdal)
library(ggplot2)
library(readxl)
library(RColorBrewer)
library(leaflet.extras)
library(knitr)
library(kableExtra)
library(htmltools)
In order to identify the impact locations of bomb strikes between 1940 and 1945, a PDF file from the municipal archive of Rotterdam was used, which lists the date and impact location of each strike. This document was converted to an Excel file, after which it could be read into R. As the Excel file contains many sheets, the XLConnect package was used to load the workbook and read in all sheets at once as a list.
In addition, a file that links all addresses in Rotterdam to longitude and latitude coordinates was used to place each address on the map of Rotterdam. Lastly, all addresses were linked to districts, which in turn were linked to an ID file to connect the districts to the district IDs in a shapefile used later on.
The PDF file of the addresses is available here
# Read in data
wb <- loadWorkbook("bombs.xlsx")
lst <- readWorksheet(wb, sheet = getSheets(wb))
adresloc <- read.csv2("adresses.csv", header=TRUE, sep=";", encoding="UTF-8")
names(adresloc)[1] <- "STRAATNUMM"
ID <- read_excel("ID_Gebied.xlsx")
# List to data frame
df <- do.call("rbind", lst)
rm(lst)
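As a small illustration of the list-to-data-frame step, the sketch below stacks a hypothetical list of two tiny data frames with do.call and rbind. The list stands in for the worksheets; the sheet names A and B are made up, and the sketch assumes that every sheet has the same columns.

```r
# Hypothetical stand-in for the list returned by readWorksheet():
# one data frame per sheet, all with identical columns
sheets <- list(
  A = data.frame(Datum = as.Date(c("1945-02-25", "1944-02-16")),
                 Locatie = c("Adrianalaan.; vliegt. bommen",
                             "Agniesestr. 60/62; granaten"),
                 stringsAsFactors = FALSE),
  B = data.frame(Datum = as.Date("1940-10-05"),
                 Locatie = "Bellamystr.",
                 stringsAsFactors = FALSE)
)

# rbind() is called once, with all list elements as its arguments
stacked <- do.call(rbind, sheets)

# Row names are derived from the list names, so reset them for a clean table
rownames(stacked) <- NULL
stacked
```

Resetting the row names here serves the same purpose as the row.names call used before printing the raw data.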
The raw data for this project looks as follows.
table <- df %>%
`row.names<-`(.,c()) %>%
head(20)
kable(table, format="html", caption="The raw data", align=c("l", "r")) %>%
kable_styling(bootstrap_options=c("striped", "hover"), font_size=12.5, full_width=FALSE)
| Datum | Locatie |
|---|---|
| 1945-02-25 | Adrianalaan.; vliegt. bommen |
| 1944-02-16 | Agniesestr. 60/62; granaten |
| 1941-11-15 | Albr. Engelmanstr.;4b.1g,ms |
| 1944-03-04 | Aleidisstr. 68 |
| 1940-10-09 | Almondestr.;3b,2d,2g,ms |
| 1944-03-04 | Anjerstr. 7; 4 doden en brandschade |
| 1940-10-09 | Anna Mariastr.;3b,2d,2g,ms |
| 1944-09-20 | Azaleastr. 66b; projectiel |
| 1940-10-05 | Bellamystr. |
| 1944-10-06 | Benedenrijweg; doden en gewonden |
| 1944-09-13 | Berghaven, HvH; bommen nabij |
| 1940-06-26 | Bergweg; bomschade aan wegdek en huizen |
| 1944-03-02 | Bergweg;illegale krantjes uit lucht |
| 1940-06-04 | Bierhaven; huizen verwoest |
| 1940-07-02 | Bilderdykstr.,; glasschade |
| 1940-10-05 | Bilderdykstr.;doden en gew., veel schade |
| 1940-12-05 | Blokmakerstr.;4b,1d,1g, ms |
| 1944-11-30 | Boergoensestr.; inslag luchtafw.granaat |
| 1944-01-11 | Bomstr. 9 inslag afweergranaat |
| 1940-10-05 | Boschpolderplein |
The first column Datum, which shows the date of the bomb strikes, seems to be properly registered. The second column Locatie, which contains the location of the strikes, seems to be the exact opposite and is rather messy. In just the first 20 rows of the data, a combination of streets, classifications of bomb strikes (air strike, projectile, grenade), house numbers, casualties, amounts of wounded civilians and abbreviations of all these registrations are found. As the main purpose of this project is to extract the streetname and house number per record in the data, the Locatie column needs to be cleaned accordingly to achieve this.
To extract these elements from the data, the first step is to remove records that are either outside of Rotterdam or are too general (i.e. the registered location is a district, instead of a specific street). This is done as follows.
# Remove non-Rotterdam and too general addresses
index <- c(grep("[Hh]oek [Vv]an|[Hh][Vv][Hh]|[Hh]oek [Vv]\\.* [Hh]olland", df$Locatie),
grep("Hoogvliet", df$Locatie),
grep("Oude Hoek", df$Locatie),
grep("Pernis", df$Locatie),
grep("Rhoon", df$Locatie),
grep("Europa", df$Locatie),
grep("Blydorp|Blijdorp", df$Locatie),
grep("Diwero", df$Locatie),
grep("Spangen", df$Locatie),
grep("^Gem\\.", df$Locatie),
grep("Feijenoord,", df$Locatie),
grep("M,", df$Locatie),
grep("^Shell$", df$Locatie),
grep("Rooyen", df$Locatie))
df <- df[-index,]
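The filtering idea can be sketched on a handful of strings. The example below is a minimal variant that uses a logical mask from grepl instead of negative indices; the vector locaties and the two patterns are illustrative only, not the full list used above.

```r
# Hypothetical location strings in the style of the raw data
locaties <- c("Bergweg; bomschade",
              "Berghaven, HvH; bommen nabij",
              "Pernis; glasschade",
              "Aleidisstr. 68")

# Combine several patterns into one regular expression
patterns <- c("[Hh][Vv][Hh]", "Pernis")
drop <- grepl(paste(patterns, collapse = "|"), locaties)

# Keep everything that matched none of the patterns
kept <- locaties[!drop]
kept

# Pitfall of negative indexing: if nothing matches, grep() returns integer(0)
# and x[-integer(0)] selects nothing instead of everything
nothing_left <- locaties[-grep("Rhoon", locaties)]
length(nothing_left)
```

With the patterns used in this project every grep call is combined into one non-empty index, so the negative indexing above works; the logical mask is simply the more defensive form.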
Next, this dataset is used twice to (1) extract the streetname and (2) to extract the housenumber for each record.
To extract the streetnames, the following steps are taken:
1. The STRAATNAAM column of the adresloc dataset, which contains the names of all streets in Rotterdam, is checked for correctness. Mistakes found in the streetnames are corrected using gsub;
2. The Locatie column of the main dataset is put in a separate vector, clnv, and is thoroughly cleaned, with the goal of only retaining the streetname per record. This is done mainly by using gsub;
3. clnv is matched with the previously cleaned STRAATNAAM column. This step has two purposes: (1) adjust the spelling of old streetnames to their contemporary spelling, and (2) find the streets which no longer exist. The results of matching these vectors are stored in the vector matches;
4. The matches vector is added to the main data frame, after which the data frame is filtered to exclude all non-matched addresses.
Cleaning the STRAATNAAM column is rather easy, as only a few addresses have mistakes in them. The following code corrects these mistakes.
# Clean addresses
adresloc$STRAATNAAM <- gsub("^.*sener", "Rösener", adresloc$STRAATNAAM)
adresloc$STRAATNAAM <- gsub(".*Galile.*", "Galileistraat", adresloc$STRAATNAAM)
adresloc$STRAATNAAM <- gsub(".*Hooft.*", "P.C. Hooftplein", adresloc$STRAATNAAM)
Cleaning the Locatie column is a rather long and meticulous process, as the data is very messy and contains a total of 1508 records. First, the addresses are stored in a vector called clnv. To support the cleaning process, the following function was used to look for partial string matches in the main data frame and the adresloc data frame.
# Put addresses in separate vector
clnv <- df$Locatie
# Search function to find matching addresses
sf <- function(searchterm) {
x <- grep(searchterm, clnv, value=T)
y <- unique(grep(searchterm, adresloc$STRAATNAAM, value=T))
searchlist <- list(`In clnv`=x, `In adresloc`=y)
searchlist
}
To clean clnv, gsub was used to either remove parts of the location strings or amend them, to ensure that they can subsequently be correctly matched. The actual code that was used can be found in appendix A, as the code got rather lengthy to account for all the exceptions and abnormalities in the strings.
Because clnv is cleaned before it is matched, a low margin of error can be used in the matching function, leading to better matches. Using the cleaned addresses in clnv, the following loop was used to match the strings with streetnames in adresloc. To match strings, the optimal string alignment distance (method "osa", the restricted variant of the Damerau-Levenshtein distance) was used with a maximum distance of 4.
# Matching
matches <- list()
search <- unique(adresloc$STRAATNAAM)
for (i in 1:length(clnv)) {
x <- clnv[i]
index <- amatch(x, search, maxDist=4, method="osa")
y <- search[index]
matches[[i]] <- data.frame(Original=x, Match=y)
}
matches <- do.call(rbind, matches)
rm(x, y, index, i, clnv, search)
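To make the matching step more concrete, here is a minimal sketch on three made-up cleaned strings and a three-street register. The vectors cleaned and streets are hypothetical; amatch and stringdist come from the stringdist package loaded at the top.

```r
library(stringdist)

# Hypothetical mini versions of the cleaned locations and the street register
cleaned <- c("Agniesestraat", "Goudscheweg", "Zzzzzzzzzzzz")
streets <- c("Agniesestraat", "Goudseweg", "Bergweg")

# amatch() returns, per element, the index of the closest street within
# maxDist edits, or NA when no street is close enough
idx <- amatch(cleaned, streets, maxDist = 4, method = "osa")
data.frame(Original = cleaned, Match = streets[idx])

# "osa" is the restricted variant of the full Damerau-Levenshtein distance ("dl"):
# it does not allow a transposed pair to be edited again
d_osa <- stringdist("ca", "abc", method = "osa")  # 3
d_dl  <- stringdist("ca", "abc", method = "dl")   # 2
```

Since amatch accepts a whole vector as its first argument, the loop above could probably be collapsed into a single call; the loop version is kept here because it is what produced the reported results.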
The matching process has resulted in 1225 matched records and 283 non-matched records. The results are added to the main data frame, after which the non-matched records are removed.
# Add matched records and remove non-matched records
df <- df %>%
mutate(Matchstreet=matches[,2]) %>%
filter(!is.na(Matchstreet))
rm(matches)
Having extracted streetnames, the next step is to extract house numbers. There is a lot of variability in house numbers, however, as the records can contain a single, a range or a series of house numbers. Additionally, other numeric information can also be stored in each record.
The general strategy to extract the house numbers is as follows:
Take the Locatie column from the main data frame and store it in the vector numbers.
As ranges are indicated with either a comma, a hyphen, a forward slash or en (the Dutch word for and), the above strategy is applied to numbers twice: first to extract all ranges with commas, hyphens and forward slashes, then a second time to extract ranges containing en.
For the first extraction, the following code is used to clean numbers.
# Address numbers with punctuation marks
numbers <- df$Locatie
numbers <- gsub(".*Jobshavean.*", "Sint Jobshaven 105", numbers)
numbers <- gsub(";.*$", "", numbers) # Remove text after semicolons
numbers <- gsub("[^0-9\\/\\,\\-]", "", numbers) # Remove everything except set in brackets
numbers[grep("[0-9]+", numbers, invert=T)] <- NA # Change strings with no digits to NA
numbers <- gsub("^,|^\\/|^\\-", "", numbers) # Remove commas, slashes and hyphens at start of strings
numbers <- gsub("\\,$", "", numbers) # Remove commas at the end of strings
The result looks as follows.
head(numbers, 30)
[1] NA "60/62" NA "68" NA "7" NA "66"
[9] NA NA NA NA NA NA NA NA
[17] NA "9" NA NA "2/214" NA "11" NA
[25] NA "130" "12-14" NA NA "2"
What remains are single numbers, ranges or NA’s. To take these differences into account, two new vectors are generated. firstnum is used to store the first number of each string, while secnum is used to store the last number in each string. When records contain only one number, the first and last number will be the same, while in case of ranges, these newly created vectors will contain the lower and upper bound of the range.
# Create vectors with first and last numbers
firstnum <- gsub("^([0-9]*).*", "\\1", numbers) # Set of digits until first non-digit character
secnum <- gsub(".*[\\,|\\/|\\-]([0-9]+)$", "\\1", numbers) # Last set of digits, preceded by a comma, slash or hyphen
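The effect of the two patterns is easiest to see on a few values. The sketch below applies the same idea to a hypothetical vector of already cleaned number strings; the separator class is written here as [,/-], a simplified form of the pattern used above.

```r
# Hypothetical cleaned number strings, in the shape produced by the steps above
numbers <- c("60/62", "68", "12-14", "2/214", "1,3,5", NA)

# First run of digits at the start of the string
firstnum <- sub("^([0-9]*).*", "\\1", numbers)

# Last run of digits after a comma, slash or hyphen;
# strings without a separator are left unchanged
secnum <- sub(".*[,/-]([0-9]+)$", "\\1", numbers)

cbind(numbers, firstnum, secnum)
```

For a series such as "1,3,5" only the outer values 1 and 5 survive, which is why the result is treated as a range later on.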
This process is repeated for the records containing en.
# Address numbers with 'en'
numbers <- df$Locatie
numbers <- gsub(";.*$", "", numbers) # Remove text after semicolons
numbers[!grepl(" en ", numbers)] <- NA # Remove all records without en in it
numbers[grep("[0-9]+", numbers, invert=T)] <- NA # Change strings with no digits to NA
numbers <- gsub("\\(.*\\)$", "", numbers) # Remove bracketed text at the end of strings
numbers <- gsub(", tuinen", "", numbers)
numbers <- gsub(".*?([0-9].*$)", "\\1", numbers) # Remove the part of each string before the first digit
numbers <- gsub("\\s$", "", numbers) # Remove white space at the end of strings
numbers <- gsub("\\D$", "", numbers) # Remove non-digit characters at the end of strings
# Create vectors with first and last numbers
firstnum2 <- gsub("^([0-9]*).*", "\\1", numbers) # Set of digits until first non-digit character
secnum2 <- gsub(".*\\D([0-9]+$)", "\\1", numbers) # Last set of digits, preceded by any non-digit character
The results are then combined in a data frame and look as follows:
index <- which(!is.na(firstnum2))
firstnum[index] <- firstnum2[index]
secnum[index] <- secnum2[index]
numbers <- cbind(firstnum, secnum)
rm(firstnum, secnum, firstnum2, secnum2, index)
inival <- df$Locatie %>%
gsub(";.*$", "", .)
table <- cbind(inival[18:35], numbers[18:35,])
kable(table, format="html", align=c("l", "c", "c"), caption="Extracted (range of) housenumber(s) per address") %>%
kable_styling(bootstrap_options=c("striped", "hover"), font_size=12.5, full_width=FALSE)
| Locatie | firstnum | secnum |
|---|---|---|
| Bomstr. 9 inslag afweergranaat | 9 | 9 |
| Boschpolderplein | NA | NA |
| Bovenstr. | NA | NA |
| Brielsel. 2 t/m 214 | 2 | 214 |
| Brielsel. | NA | NA |
| Brikstr. 11 | 11 | 11 |
| Burg. Meineszplein | NA | NA |
| Buytewechstr., | NA | NA |
| Charloise Lagedijk 130 | 130 | 130 |
| Cillaarshoekstr. 12-14 | 12 | 14 |
| Crooswyk | NA | NA |
| Deenschestr. | NA | NA |
| Delflandsestr. 2 | 2 | 2 |
| Dordtschel. 60 | 60 | 60 |
| Dordtsestraatweg 809 granaat | 809 | 809 |
| Dordtsestraatweg | NA | NA |
| Dorpsweg | NA | NA |
| Dreef 121-131 | 121 | 131 |
To obtain a final house number per address, a loop is used. The loop checks which records have house numbers and then either (1) stores the house number if firstnum and secnum are the same or (2) samples a number from the range with lower bound firstnum and upper bound secnum. If a record does not have a house number, the loop looks up all available house numbers for the street of that record in adresloc, after which it samples one from that set. The definitive house numbers are then added to the main data frame.
# Definitive number
defnum <- vector()
resample <- function(x, ...) x[sample.int(length(x), ...)]
set.seed(9112017)
for (i in 1:nrow(numbers)){
if (!is.na(numbers[i,1])) {
range <- c(numbers[i,1]:numbers[i,2])
num <- resample(range, 1)
defnum[i] <- num
} else {
street <- df$Matchstreet[i]
impnum <- adresloc %>%
filter(STRAATNAAM==street) %>%
select(HUISNUMMER) %>%
unlist() %>%
resample(., 1) # resample() also behaves correctly when a street has a single house number
defnum[i] <- impnum
}
}
df <- df %>%
mutate(Matchnumber=defnum)
rm(defnum, i, range, num, numbers, street, impnum)
It should be noted that these definitive house numbers can be approximations, even if house numbers were available in the original data. Take for example a record which states that house numbers 24 and 57 were hit. The loop used above will then sample a number between 24 and 57 and use that as a definitive number, while only these specific addresses were hit. Although the loop can still return 24 or 57 as its definitive number, odds are it will be a number between these bounds. This approach was taken because it was pragmatic and yielded a reasonable approximation of the impact locations without spending too much time on the intricacies of the house number ranges. Alternatively, the middle of number ranges could have been chosen as the impact location. The data contains a lot of records of strikes in the same streets but without house numbers, however, so this would have resulted in a lot of bomb strikes being clustered at the same location. Hence, random sampling was chosen.
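The sampling step relies on a small helper because base R's sample treats a single number n as the sequence 1:n. The sketch below shows the difference; the seed and the example numbers are arbitrary.

```r
# Helper from the code above: always samples from the elements of x themselves
resample <- function(x, ...) x[sample.int(length(x), ...)]

set.seed(1)  # arbitrary seed, only to make the sketch reproducible

# A range such as "12-14": pick one of 12, 13 or 14
picked <- resample(12:14, 1)

# A single house number: resample() returns it unchanged ...
single <- resample(130, 1)

# ... whereas sample(130, 1) draws from 1:130
drawn <- sample(130, 1)

c(picked = picked, single = single, drawn = drawn)
```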
Having extracted a streetname and house number for each record in the remaining data set, the next step is to look up the longitude and latitude coordinates of each address, and add these to the data frame. To do so, a loop is used again. The loop takes the streetname and house number of each record and checks in adresloc for a match. If a match is found, it extracts the longitude and latitude coordinates from the lon and lat columns in adresloc. If a match is not found, the loop looks for the closest housenumber to the one in the main data frame, and then stores the coordinates of that address.
# Find longitude and latitude
datalist <- list()
for (i in 1:nrow(df)){
street <- as.character(df$Matchstreet[i])
number <- as.numeric(df$Matchnumber[i])
coords <- adresloc %>%
filter(STRAATNAAM==street)
if (number %in% coords$HUISNUMMER) {
coords <- coords %>%
filter(HUISNUMMER==number) %>%
select(lat, lon)
datalist[[i]] <- coords
} else {
index <- which.min(abs(coords$HUISNUMMER-number))
number <- coords$HUISNUMMER[index]
coords <- coords %>%
filter(HUISNUMMER==number) %>%
select(lat, lon)
datalist[[i]] <- coords
}
}
datalist <- datalist %>%
lapply(function(x) x[1,]) %>%
do.call(rbind, .)
df <- cbind(df, datalist)
rm(datalist, street, number, coords, i, index)
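The nearest-number fallback can be sketched for a single street. The data frame coords and the function lookup below are hypothetical, and the coordinates are made up; only the which.min logic mirrors the loop above.

```r
# Hypothetical address register for one street (coordinates are made up)
coords <- data.frame(HUISNUMMER = c(2, 10, 40),
                     lat = c(51.9101, 51.9105, 51.9120),
                     lon = c(4.4701, 4.4710, 4.4750))

lookup <- function(number, coords) {
  if (!(number %in% coords$HUISNUMMER)) {
    # No exact hit: fall back to the numerically closest house number
    number <- coords$HUISNUMMER[which.min(abs(coords$HUISNUMMER - number))]
  }
  # Keep the first row in case a house number occurs more than once
  coords[coords$HUISNUMMER == number, c("lat", "lon")][1, ]
}

exact   <- lookup(40, coords)  # number 40 exists
nearest <- lookup(12, coords)  # no number 12, falls back to number 10
```

Taking the first row corresponds to the x[1,] step in the loop, which guards against house numbers that appear several times in adresloc.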
The results of the extractions, matches and coordinate search look as follows.
table <- df[1:10,]
kable(table, format="html", align=c("l", "l", "l", "c", "c", "c"), caption="Matched addresses and coordinates per record") %>%
kable_styling(bootstrap_options=c("striped", "hover"), font_size=12.5, full_width=FALSE) %>%
add_header_above(c("Original data"=2, "Matched address"=2, "Coordinates"=2))
| Datum | Locatie | Matchstreet | Matchnumber | lat | lon |
|---|---|---|---|---|---|
| 1945-02-25 | Adrianalaan.; vliegt. bommen | Adrianalaan | 377 | 51.96656 | 4.465249 |
| 1944-02-16 | Agniesestr. 60/62; granaten | Agniesestraat | 61 | 51.93105 | 4.475136 |
| 1941-11-15 | Albr. Engelmanstr.;4b.1g,ms | Albregt-Engelmanstraat | 25 | 51.90761 | 4.443347 |
| 1944-03-04 | Aleidisstr. 68 | Aleidisstraat | 68 | 51.91848 | 4.457668 |
| 1940-10-09 | Almondestr.;3b,2d,2g,ms | Almondestraat | 211 | 51.92914 | 4.480394 |
| 1944-03-04 | Anjerstr. 7; 4 doden en brandschade | Anjerstraat | 7 | 51.91266 | 4.494461 |
| 1940-10-09 | Anna Mariastr.;3b,2d,2g,ms | Sint-Mariastraat | 22 | 51.92001 | 4.468466 |
| 1944-09-20 | Azaleastr. 66b; projectiel | Azaleastraat | 66 | 51.94251 | 4.472607 |
| 1940-10-05 | Bellamystr. | Bellamystraat | 54 | 51.91672 | 4.432556 |
| 1944-10-06 | Benedenrijweg; doden en gewonden | Benedenrijweg | 47 | 51.89902 | 4.574057 |
Having extracted the (estimated) coordinates of each matchable record from the original data frame, a few additional data manipulations are applied before plotting the data on the map of Rotterdam. In addition to simply plotting the locations, the idea is to color the districts of Rotterdam by their number of bomb strikes and to show the date and address of each strike in a pop-up.
In order to distinguish the districts of Rotterdam, a shapefile of the municipality of Rotterdam was used. As the shapefile uses a different coordinate system than the longitude and latitude coordinates of the addresses, however, a function had to be used to transform it into a polygon object that can be plotted from R. Special thanks to Vincent Peijnenburg for figuring this function out! The code is as follows.
# Shapefile function
ShapeFileConverter <- function(x) {
x <- fortify(x)
x <- x %>% mutate(dx =(long-155000)*10^-5)
x <- x %>% mutate(dy =(lat-463000)*10^-5)
x <- x %>% mutate(somn= (3235.65389 * dy) + (-32.58297 * dx ^ 2) + (-0.2475 * dy ^ 2) + (-0.84978 * dx ^ 2 * dy) + (-0.0655 * dy ^ 3) + (-0.01709 * dx ^ 2 * dy ^ 2) + (-0.00738 * dx) + (0.0053 * dx ^ 4) + (-0.00039 * dx ^ 2 * dy ^ 3) + (0.00033 * dx ^ 4 * dy) + (-0.00012 * dx * dy))
x <- x %>% mutate(some= (5260.52916 * dx) + (105.94684 * dx * dy) + (2.45656 * dx * dy ^ 2) + (-0.81885 * dx ^ 3) + (0.05594 * dx * dy ^ 3) + (-0.05607 * dx ^ 3 * dy) + (0.01199 * dy) + (-0.00256 * dx ^ 3 * dy ^ 2) + (0.00128 * dx * dy ^ 4) + (0.00022 * dy ^ 2) + (-0.00022 * dx ^ 2) + (0.00026 * dx ^ 5))
x <- x %>% mutate(lon84= 5.387206 + (some / 3600))
x <- x %>% mutate(lat84=52.15517+(somn/3600))
x <- x[, c(12,13,6)]
colnames(x) <- c("lon", "lat", "id")
x$id <- as.numeric(x$id)
x_list <- split(x, x$id)
x_list <- lapply(x_list, function(x) { x["id"] <- NULL; x })
ps_x <- lapply(x_list, Polygon)
p1_x <- lapply(seq_along(ps_x), function(i) Polygons(list(ps_x[[i]]), ID = names(x_list)[i]))
my_spatial_polys_x <- SpatialPolygons(p1_x, proj4string = CRS("+proj=longlat +datum=WGS84"))
output_ShapeFileConverter<<- SpatialPolygonsDataFrame(my_spatial_polys_x, data.frame(id = unique(x$id), row.names = unique(x$id)))
}
# Make shape file for R
district_areas <- readOGR(dsn = "C:/Users/Rotterdam.LAP13971/Desktop/Coursera/Roffabomb", layer = "Wijken_adressen", verbose=FALSE)
ShapeFileConverter(district_areas)
district_shape <- output_ShapeFileConverter
rm(output_ShapeFileConverter, district_areas)
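The coordinate part of the converter can be isolated into a small function, which makes it easier to check on single points. The sketch below reuses the polynomial and constants from ShapeFileConverter; the function name to_wgs84 and the test point are hypothetical, and x and y are assumed to be projected coordinates in metres in the shapefile's system.

```r
# Approximate conversion to WGS84, using the same polynomial as ShapeFileConverter
to_wgs84 <- function(x, y) {
  dx <- (x - 155000) * 1e-5
  dy <- (y - 463000) * 1e-5
  somn <- (3235.65389 * dy) + (-32.58297 * dx^2) + (-0.2475 * dy^2) +
    (-0.84978 * dx^2 * dy) + (-0.0655 * dy^3) + (-0.01709 * dx^2 * dy^2) +
    (-0.00738 * dx) + (0.0053 * dx^4) + (-0.00039 * dx^2 * dy^3) +
    (0.00033 * dx^4 * dy) + (-0.00012 * dx * dy)
  some <- (5260.52916 * dx) + (105.94684 * dx * dy) + (2.45656 * dx * dy^2) +
    (-0.81885 * dx^3) + (0.05594 * dx * dy^3) + (-0.05607 * dx^3 * dy) +
    (0.01199 * dy) + (-0.00256 * dx^3 * dy^2) + (0.00128 * dx * dy^4) +
    (0.00022 * dy^2) + (-0.00022 * dx^2) + (0.00026 * dx^5)
  data.frame(lon = 5.387206 + some / 3600, lat = 52.15517 + somn / 3600)
}

# At the reference point all polynomial terms vanish
origin <- to_wgs84(155000, 463000)

# A hypothetical point that lands close to the map centre used for the final map
rotterdam <- to_wgs84(92000, 437000)
rotterdam
```

Because the function is vectorised, it can be applied directly to the long and lat columns of a fortified shapefile.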
Next, columns must be added to the main data frame which indicate the district to which the records belong and which combine the date and address data. To add each district ID, the adresloc data frame is used to generate a data frame with every streetname and its district, which is then joined with the main data frame. Next, the ID data frame is cleaned so that it perfectly matches the main data frame, after which it is joined with the main data frame to add ID numbers for each district.
Adding strings to the data frame to be used for the pop-ups on the map is rather easy, as this can be done by pasting together the date, streetname and house number. The following code was used to add these columns to the main data frame.
# Add district and shapefile ID
districts <- adresloc %>%
select(STRAATNAAM, GEBDNAAM) %>%
arrange(STRAATNAAM) %>%
filter(!duplicated(STRAATNAAM),
STRAATNAAM %in% df$Matchstreet) %>%
mutate(GEBDNAAM=tolower(GEBDNAAM)) %>%
`colnames<-`(.,c("Matchstreet", "District"))
df <- left_join(df, districts, by="Matchstreet")
ID <- ID %>%
mutate(Gebied=tolower(Gebied)) %>%
`colnames<-`(., c("District", "ID"))
ID$District <- gsub(".*hillegersberg.*", "hillegersberg/schiebroek", ID$District)
ID$District <- gsub("rotterdam centrum", "stadscentrum", ID$District)
df <- left_join(df, ID, by="District")
# Add year column and date + address column for pop-ups
# format() keeps the dates as shown in Datum, assuming Datum is a date-time (POSIXct) column as read by readWorksheet
df <- df %>%
mutate(Year=format(Datum, "%Y"),
`Pop-up`=paste(format(Datum, "%Y-%m-%d"),
paste(Matchstreet, Matchnumber), sep=": "))
The final data frame now looks as follows.
table <- df[1:10,]
kable(table, format="html", align=c("l", "l", "l", "c", "c", "c", "l", "c", "c"), caption="Matched addresses and coordinates per record") %>%
kable_styling(bootstrap_options=c("striped", "hover"), font_size=12.5, full_width=FALSE) %>%
add_header_above(c("Original data"=2, "Matched address"=2, "Coordinates"=2, " "=1, " "=1, " "=1, " "=1))
| Datum | Locatie | Matchstreet | Matchnumber | lat | lon | District | ID | Year | Pop-up |
|---|---|---|---|---|---|---|---|---|---|
| 1945-02-25 | Adrianalaan.; vliegt. bommen | Adrianalaan | 377 | 51.96656 | 4.465249 | hillegersberg/schiebroek | 17 | 1945 | 1945-02-25: Adrianalaan 377 |
| 1944-02-16 | Agniesestr. 60/62; granaten | Agniesestraat | 61 | 51.93105 | 4.475136 | noord | 1 | 1944 | 1944-02-16: Agniesestraat 61 |
| 1941-11-15 | Albr. Engelmanstr.;4b.1g,ms | Albregt-Engelmanstraat | 25 | 51.90761 | 4.443347 | delfshaven | 10 | 1941 | 1941-11-15: Albregt-Engelmanstraat 25 |
| 1944-03-04 | Aleidisstr. 68 | Aleidisstraat | 68 | 51.91848 | 4.457668 | delfshaven | 10 | 1944 | 1944-03-04: Aleidisstraat 68 |
| 1940-10-09 | Almondestr.;3b,2d,2g,ms | Almondestraat | 211 | 51.92914 | 4.480394 | noord | 1 | 1940 | 1940-10-09: Almondestraat 211 |
| 1944-03-04 | Anjerstr. 7; 4 doden en brandschade | Anjerstraat | 7 | 51.91266 | 4.494461 | feijenoord | 8 | 1944 | 1944-03-04: Anjerstraat 7 |
| 1940-10-09 | Anna Mariastr.;3b,2d,2g,ms | Sint-Mariastraat | 22 | 51.92001 | 4.468466 | stadscentrum | 13 | 1940 | 1940-10-09: Sint-Mariastraat 22 |
| 1944-09-20 | Azaleastr. 66b; projectiel | Azaleastraat | 66 | 51.94251 | 4.472607 | hillegersberg/schiebroek | 17 | 1944 | 1944-09-20: Azaleastraat 66 |
| 1940-10-05 | Bellamystr. | Bellamystraat | 54 | 51.91672 | 4.432556 | delfshaven | 10 | 1940 | 1940-10-05: Bellamystraat 54 |
| 1944-10-06 | Benedenrijweg; doden en gewonden | Benedenrijweg | 47 | 51.89902 | 4.574057 | ijsselmonde | 7 | 1944 | 1944-10-06: Benedenrijweg 47 |
The final step before generating the map is to add the total number of bomb strikes per district to the district shapefile, so that these aggregates can be used to color the districts of Rotterdam. This can be done with the following code.
# Amount per district
bombs <- df %>%
group_by(ID) %>%
summarize(Total=n())
colnames(bombs)[1] <- "id"
district_shape@data <- left_join(district_shape@data, bombs, by="id")
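The count-and-join step can be sketched with base R on toy data. This is a swapped-in variant using table and merge rather than the dplyr calls above, and the data frames strikes and districts are hypothetical.

```r
# Hypothetical strikes (one row per strike) and district table
strikes <- data.frame(ID = c(1, 1, 10, 17, 10, 1))
districts <- data.frame(id = c(1, 7, 10, 17))

# Count strikes per district ID
counts <- as.data.frame(table(id = strikes$ID),
                        responseName = "Total",
                        stringsAsFactors = FALSE)
counts$id <- as.numeric(counts$id)

# Left join: keep every district, including those without any strike
totals <- merge(districts, counts, by = "id", all.x = TRUE)
totals
```

A district without strikes ends up with NA rather than 0. Note that merge sorts its result by the key, whereas left_join keeps the row order of the left table; that order matters for district_shape@data, whose rows need to stay aligned with the polygons.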
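Now that all data has been generated and prepared for the map, the only step that remains is to generate it. Before doing so, a color palette is selected for coloring the districts and a bomb icon taken from the internet is used as a marker. Then leaflet is used to generate a map that lets the user switch between three base maps (OpenStreetMap, black and white, and Stamen terrain) and toggle three overlays: the bomb markers with their pop-ups, the districts colored by number of strikes, and a heatmap.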
The code to achieve the above and the resulting map are shown below.
# Leaflet
reds <- brewer.pal(9, "Reds")
bombIcon <- makeIcon(
iconUrl = "explosion-417894_640.png",
iconWidth=50, iconHeight=50.15674)
bombMap <- leaflet(df) %>%
addTiles(group="OSM (default)",
options=providerTileOptions(minZoom=11, maxZoom=17)) %>%
addProviderTiles("OpenMapSurfer.Grayscale",
options=providerTileOptions(minZoom=11, maxZoom=17),
group="Black and White") %>%
addProviderTiles("Stamen.Terrain",
options=providerTileOptions(minZoom=11, maxZoom=17),
group="Stamen") %>%
setView(lng=4.4777325, lat=51.912448, zoom=12) %>%
addMarkers(~lon, ~lat, icon=bombIcon,
clusterOptions=markerClusterOptions(disableClusteringAtZoom=15,
spiderfyOnMaxZoom=FALSE),
popup=~htmlEscape(`Pop-up`),
group="Show Bombs") %>%
addPolygons(data = district_shape, weight = 2, color = "black", fillOpacity = 0.80,
fillColor = ~colorNumeric(reds, Total)(Total),
group="Show Districts") %>%
addHeatmap(~lon, ~lat, group="Show Heatmap", blur=45, cellSize=(40)) %>%
addLayersControl(
baseGroups=c("OSM (default)", "Black and White", "Stamen"),
overlayGroups=c("Show Bombs", "Show Districts", "Show Heatmap"),
options=layersControlOptions(collapsed=FALSE)) %>%
hideGroup(c("Show Districts", "Show Heatmap"))
# Final map
bombMap
Having plotted the map, this concludes the report of this project. For those who have made it all the way down here, thank you for taking the time to read through my report and I hope you found it informative!
Appendix A: the code used to clean the Locatie column.
# Cleaning of location column
clnv <- gsub(";.*$", "", clnv)
clnv <- gsub("v\\.", "van", clnv)
clnv <- gsub("str\\.|str$", "straat", clnv)
clnv <- gsub("str(\\,)", "straat\\1", clnv)
clnv <- gsub(".*Hanno.*", "Heijplaatweg", clnv)
clnv <- gsub("Vondel\\..*$", "Vondelingenplaat", clnv)
clnv <- gsub("Math\\..*$", "Mathenesserdijk", clnv)
clnv <- gsub("Albr.*$", "Albregt-Engelmanstraat", clnv)
clnv <- gsub("Schied\\.", "Schiedamse", clnv)
clnv <- gsub(".*Glashaven.*", "Glashaven", clnv)
clnv <- gsub(".*Bierhaven.*", "Jufferstraat", clnv)
clnv <- gsub("^Ged\\. Binnenrotte", "Binnenrotte", clnv)
clnv <- gsub("^(.*)\\sde$", "De \\1", clnv)
clnv <- gsub("^de", "De", clnv)
clnv <- gsub("dyk", "dijk", clnv)
clnv <- gsub("wyk", "wijk", clnv)
clnv <- gsub("Burg.", "Burgemeester", clnv) # The unescaped dot matches any single character after "Burg"
clnv <- gsub("^(.*)Burg$", "Burgemeester \\1", clnv)
clnv <- gsub("pl\\.", "plein", clnv)
clnv <- gsub("l\\.", "laan", clnv)
clnv <- gsub("Jac\\.", "Jacob", clnv)
clnv <- gsub("Nw\\.", "Nieuwe", clnv)
clnv <- gsub("R[Oo]sener", "Rösener", clnv)
clnv <- gsub("RosenMainzstraat", "Rösener Manzstraat", clnv)
clnv <- gsub(".*Nelle.*", "Van Nelleweg", clnv)
clnv <- gsub(".*Oostdijk.*", "Oostdijk", clnv)
clnv <- gsub(".*Rotte.*", "Rottestraat", clnv)
clnv <- gsub("Nic\\.", "Nicolaas", clnv)
clnv <- gsub(".*PC.*", "P.C. Hooftplein", clnv)
clnv <- gsub("\\s*\\(.*\\)*", "", clnv)
clnv <- gsub("Ger\\.", "Gerard", clnv)
clnv <- gsub("Pr\\.", "Prins", clnv)
clnv <- gsub("St.", "Sint", clnv) # The unescaped dot matches any single character after "St"
clnv <- gsub("Wm\\.", "Willem", clnv)
clnv <- gsub(".*Franciscus.*", "Kleiweg 500", clnv)
clnv <- gsub(".*Bomstraat.*", "Bomstraat", clnv)
clnv <- gsub("(Buyte.*)", "Willem \\1", clnv)
clnv <- gsub("granaat", "", clnv)
clnv <- gsub("Hoffmanplein (.*)", "\\1 Hoffmanplein", clnv)
clnv <- gsub(".*Kerkedijk.*", "Noorder Kerkedijk", clnv)
clnv <- gsub("Goudschestraat", "Goudseweg", clnv)
clnv <- gsub("Mathen\\.", "Mathenesser", clnv)
clnv <- gsub("^Waalhaven$", "Waalhaven O.z.", clnv)
clnv <- gsub("Hilleg\\.bergstraat", "Hillegaersbergstraat", clnv)
clnv <- gsub("Wolph\\. ", "Wolphaerts", clnv)
clnv <- gsub("(Kamphofstraat)", "Van \\1", clnv)
clnv <- gsub(".*[Bb]ier.*", "Bierstraat", clnv)
clnv <- gsub("Duylstraat van", "Van Duylstraat", clnv)
clnv <- gsub("Bruyn", "Bruijn", clnv)
clnv <- gsub("Shell- ", "", clnv)
clnv <- gsub("Witte de Withstraat en", "Witte de Withstraat", clnv)
clnv <- gsub(".*Superfosfaatfabriek.*", "Vondelingenweg", clnv)
clnv <- gsub(".*Genestet.*", "De Genestetplein", clnv)
clnv <- gsub(".*Tollen.*", "Tollenstraat", clnv)
clnv <- gsub(".*Jaffad.*", "Jaffadwarsstraat", clnv)
clnv <- gsub(".*Geref.*", "Jacob Catsstraat", clnv)
clnv <- gsub(".*gashouders.*", "Lusthofstraat", clnv)
clnv <- gsub(".*speeltuin.*", "Olympiaweg", clnv)
clnv <- gsub(".*Vakenoorsch.*", "West-Varkenoordseweg", clnv)
clnv <- gsub(".*Schiedamscheweg.*", "Schiedamseweg", clnv)
clnv <- gsub(".*Groote.*", "Grote Visserijstraat", clnv)
clnv <- gsub(".*Maasveem.*", "Maashaven O.z.", clnv)
clnv <- gsub(".*pier.*", "Waalhavenweg", clnv)
clnv <- gsub(".*Petra.*", "Petroleumweg", clnv)
clnv <- gsub("str$", "straat", clnv)
clnv <- gsub(".*Vollenhovenstraat.*", "Van Vollenhovenstraat", clnv)
clnv <- gsub(".*Sinteltjesstraat.*", "Rosestraat", clnv)
clnv <- gsub(".*Agatha.*", "Sint-Agathastraat", clnv)
clnv <- gsub("[Rr]\\.[Kk]\\.", "Mathenesserlaan", clnv)
clnv <- gsub(".*Adr\\..*", "Adrien Milderstraat", clnv)
clnv <- gsub(".Yzer.*", "1e IJzerstraat", clnv)
clnv <- gsub("Heysche", "Heyse", clnv)
clnv <- gsub("Overschie$", "", clnv)
clnv <- gsub("IJsselmonde$", "", clnv)
clnv <- gsub(".*Mart\\..*", "Martinus Steijnstraat", clnv)
clnv <- gsub(".*Tankweg.*", "Tankweg", clnv)
clnv <- gsub(".*Insulinde.*", "Insulindestraat", clnv)
clnv <- gsub("(.*) vanaf.*", "\\1", clnv)
clnv <- gsub(".*Hofplein.*", "Hofplein", clnv)
clnv <- gsub(".*Raampoort.*", "Raampoortstraat", clnv)
clnv <- gsub(".*Haem.*", "Witte van Haemstedestraat", clnv)
maasindex <- grep("Maashaven", clnv)
newmaas <- sample(grep("Maashaven", unique(adresloc$STRAATNAAM), value=T), size=length(maasindex), replace=T)
clnv[maasindex] <- newmaas
rm(maasindex, newmaas)
clnv <- gsub("\\,.*$", "", clnv)
clnv <- gsub("-", "", clnv)
clnv <- gsub("\\d+", "", clnv)
clnv <- gsub("\\|\\/|\\[|\\]|\"", "", clnv)
clnv <- gsub(" t\\/m", "", clnv)
clnv <- gsub(" en ", "", clnv)
clnv <- gsub("t\\.o\\.", "", clnv)
clnv <- gsub("\\/.*$", "", clnv)