Introduction

This report covers a project that aimed to plot the impact locations of bomb strikes in Rotterdam between 1940 and 1945 on a map. The source is a PDF file containing dates and free-text strings describing bomb strikes in Rotterdam. This data is cleaned, manipulated and transformed to eventually obtain coordinates that can be used for plotting.

The idea behind this report is to thoroughly cover how I got from messy data to approximations of addresses and coordinates using regex, R and Leaflet. Hence it might be exhaustive, but if you’re looking to learn more about these subjects, it can be worth the read!

Library

The following packages were used for this project. Make sure to install and load them before running the rest of the code.

# Library
library(XLConnect)
library(readr)
library(dplyr)
library(stringdist)
library(stringr)
library(leaflet)
library(rgdal)
library(ggplot2)
library(readxl)
library(RColorBrewer)
library(leaflet.extras)
library(knitr)
library(kableExtra)
library(htmltools)

Loading the data

In order to identify the impact locations of bomb strikes between 1940 and 1945, a PDF file from the municipal archive of Rotterdam was used, which lists the date and impact location of each strike. This document was converted to an Excel file, after which it could be read into R. As the Excel file contains many sheets, the XLConnect package was used to load the workbook and read in all sheets at once as a list.

In addition, a file which contains all addresses of Rotterdam and links these to longitude and latitude coordinates was used to place each address on the map of Rotterdam. Lastly, all addresses were linked to districts, which in turn were linked to an ID file to connect the districts to the district IDs in a shapefile used later on.

The PDF file of the addresses is available here

# Read in data
wb <- loadWorkbook("bombs.xlsx")
lst <- readWorksheet(wb, sheet = getSheets(wb))
adresloc <- read.csv2("adresses.csv", header=TRUE, sep=";", encoding="UTF-8")
names(adresloc)[1] <- "STRAATNUMM"
ID <- read_excel("ID_Gebied.xlsx")

# List to data frame
df <- do.call("rbind", lst)
rm(lst)
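
As a minimal sketch of the list-to-data-frame step, assuming every sheet has the same two columns: the two toy sheets below are made up for illustration and stand in for the list that readWorksheet returns.

```r
# Two hypothetical sheets with identical columns, standing in for the list of sheets
sheets <- list(
  Sheet1 = data.frame(Datum = "1940-10-05", Locatie = "Bellamystr.", stringsAsFactors = FALSE),
  Sheet2 = data.frame(Datum = "1944-03-04", Locatie = "Aleidisstr. 68", stringsAsFactors = FALSE)
)

# do.call() passes every list element as a separate argument to rbind()
combined <- do.call("rbind", sheets)
rownames(combined) <- NULL   # drop the row names derived from the sheet names
print(combined)
```

This only works when all sheets share the same column names, which is why the approach suits a workbook that repeats one layout across many sheets.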

Cleaning the data

The raw data for this project looks as follows.

table <- df %>%
    `row.names<-`(.,c()) %>%
    head(20)

kable(table, format="html", caption="The raw data", align=c("l", "r")) %>%
    kable_styling(bootstrap_options=c("striped", "hover"), font_size=12.5, full_width=FALSE)
The raw data

Datum | Locatie
1945-02-25 | Adrianalaan.; vliegt. bommen
1944-02-16 | Agniesestr. 60/62; granaten
1941-11-15 | Albr. Engelmanstr.;4b.1g,ms
1944-03-04 | Aleidisstr. 68
1940-10-09 | Almondestr.;3b,2d,2g,ms
1944-03-04 | Anjerstr. 7; 4 doden en brandschade
1940-10-09 | Anna Mariastr.;3b,2d,2g,ms
1944-09-20 | Azaleastr. 66b; projectiel
1940-10-05 | Bellamystr.
1944-10-06 | Benedenrijweg; doden en gewonden
1944-09-13 | Berghaven, HvH; bommen nabij
1940-06-26 | Bergweg; bomschade aan wegdek en huizen
1944-03-02 | Bergweg;illegale krantjes uit lucht
1940-06-04 | Bierhaven; huizen verwoest
1940-07-02 | Bilderdykstr.,; glasschade
1940-10-05 | Bilderdykstr.;doden en gew., veel schade
1940-12-05 | Blokmakerstr.;4b,1d,1g, ms
1944-11-30 | Boergoensestr.; inslag luchtafw.granaat
1944-01-11 | Bomstr. 9 inslag afweergranaat
1940-10-05 | Boschpolderplein

The first column, Datum, which shows the date of each bomb strike, seems to be properly registered. The second column, Locatie, which contains the location of the strikes, is the exact opposite and is rather messy. In just the first 20 rows of the data, the column mixes street names, classifications of bomb strikes (air strike, projectile, grenade), house numbers, casualties, numbers of wounded civilians and abbreviations of all of these. As the main purpose of this project is to extract the streetname and house number per record, the Locatie column needs to be cleaned accordingly.

To extract these elements from the data, the first step is to remove records that are either outside of Rotterdam or are too general (i.e. the registered location is a district, instead of a specific street). This is done as follows.

# Remove non-Rotterdam and too general addresses
index <- c(grep("[Hh]oek [Vv]an|[Hh][Vv][Hh]|[Hh]oek [Vv]\\.* [Hh]olland", df$Locatie),
           grep("Hoogvliet", df$Locatie),
           grep("Oude Hoek", df$Locatie),
           grep("Pernis", df$Locatie),
           grep("Rhoon", df$Locatie),
           grep("Europa", df$Locatie),
           grep("Blydorp|Blijdorp", df$Locatie),
           grep("Diwero", df$Locatie),
           grep("Spangen", df$Locatie),
           grep("^Gem\\.", df$Locatie),
           grep("Feijenoord,", df$Locatie),
           grep("M,", df$Locatie),
           grep("^Shell$", df$Locatie),
           grep("Rooyen", df$Locatie))
df <- df[-index,]
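
The same kind of filter can also be sketched with a single combined pattern and logical indexing. This is an illustration on toy strings rather than the code used for the project: the first, second and fourth strings come from the raw data shown earlier, while the "Pernis" record is made up.

```r
# Toy location strings; the Pernis record is hypothetical
locatie <- c("Bergweg; bomschade aan wegdek en huizen",
             "Berghaven, HvH; bommen nabij",
             "Pernis; bommen",
             "Bellamystr.")

# Collapse several patterns into one regular expression
patterns <- c("[Hh]oek [Vv]an", "[Hh][Vv][Hh]", "Pernis")
drop <- grepl(paste(patterns, collapse = "|"), locatie)

# Logical indexing keeps every element when nothing matches,
# whereas negative indexing with an empty index returns an empty vector
kept <- locatie[!drop]
print(kept)
print(length(locatie[-integer(0)]))
```

The last line shows why logical indexing is the safer habit: had none of the patterns matched, negative indexing would have silently emptied the data.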

Next, this dataset is used twice: (1) to extract the streetname and (2) to extract the house number of each record.

Extracting streetnames

To extract the streetnames, the following steps are taken:

  1. The STRAATNAAM column of the adresloc dataset, which contains all streetnames of the streets in Rotterdam, is checked on correctness. Mistakes found in the streetnames are corrected using gsub;
  2. The Locatie column of the main dataset is put in a separate vector, clnv, and is thoroughly cleaned, with the goal of only retaining the streetname per record. This is done mainly by using gsub;
  3. clnv is matched with the previously cleaned STRAATNAAM column. This step has two purposes: (1) adjust the spelling of old streetnames to their contemporary spelling, and (2) find the streets which no longer exist. The results of matching these vectors are stored in the vector matches;
  4. The matches vector is added to the main data frame, after which the data frame is filtered to exclude all non-matched addresses.

Cleaning the STRAATNAAM column is rather easy, as only a few addresses have mistakes in them. The following code corrects for these mistakes.

# Clean addresses
adresloc$STRAATNAAM <- gsub("^.*sener", "Rösener", adresloc$STRAATNAAM)
adresloc$STRAATNAAM <- gsub(".*Galile.*", "Galileistraat", adresloc$STRAATNAAM)
adresloc$STRAATNAAM <- gsub(".*Hooft.*", "P.C. Hooftplein", adresloc$STRAATNAAM)

Cleaning the Locatie column is a rather long and meticulous process, as the data is very messy and contains a total of 1508 records. First, the addresses are stored in a vector called clnv. To support the cleaning process, the following function was used to look for partial string matches in the main data frame and the adresloc data frame.

# Put addresses in separate vector
clnv <- df$Locatie

# Search function to find matching addresses
sf <- function(searchterm) {
      x <- grep(searchterm, clnv, value=T)
      y <- unique(grep(searchterm, adresloc$STRAATNAAM, value=T))
      searchlist <- list(`In clnv`=x, `In adresloc`=y)
      searchlist
}
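
To show what such a search returns, here is a small sketch on made-up stand-ins for clnv and adresloc$STRAATNAAM. The function sf_toy and both toy vectors are hypothetical. Unlike sf, it takes the two vectors as arguments so that it runs on its own.

```r
# Hypothetical stand-ins for clnv and adresloc$STRAATNAAM
clnv_toy <- c("Bilderdykstr.,; glasschade",
              "Bergweg; bomschade",
              "Bilderdykstr.;doden en gew.")
straatnaam_toy <- c("Bilderdijkstraat", "Bergweg", "Bergweg", "Bergselaan")

# Same idea as sf(), with the two vectors passed in as arguments
sf_toy <- function(searchterm, locations, streets) {
  list(`In clnv` = grep(searchterm, locations, value = TRUE),
       `In adresloc` = unique(grep(searchterm, streets, value = TRUE)))
}

res <- sf_toy("Bilderd", clnv_toy, straatnaam_toy)
print(res)
```

Searching on a short, stable part of a name ("Bilderd") finds both the old spelling in the raw strings and the contemporary spelling in the address list, which is what makes this helper useful while writing the cleaning rules.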

To clean clnv, gsub was used to either remove parts of the location strings or amend them, to ensure that they can subsequently be correctly matched. The actual code that was used can be found in appendix A, as the code got rather lengthy to account for all the exceptions and abnormalities in the strings.

Because clnv is cleaned before it is matched, a small maximum distance can be used in the matching function, which leads to better matches. Using the cleaned addresses in clnv, the following loop was used to match the strings with streetnames in adresloc. To match the strings, the optimal string alignment distance (a restricted form of the Damerau-Levenshtein distance, method="osa" in the stringdist package) was used with a maximum distance of 4.

# Matching
matches <- list()
search <- unique(adresloc$STRAATNAAM)
for (i in 1:length(clnv)) {
      x <- clnv[i]
      index <- amatch(x, search, maxDist=4, method="osa")
      y <- search[index]
      matches[[i]] <- data.frame(Original=x, Match=y)
}
matches <- do.call(rbind, matches)
rm(x, y, index, i, clnv, search)
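
The idea of matching within a maximum edit distance can be sketched without extra packages. Note that this sketch swaps in base R's adist, which computes the plain Levenshtein distance (no transpositions), instead of amatch with method="osa". The street names are illustrative stand-ins and max_dist mirrors the maxDist=4 used above.

```r
# Hypothetical cleaned locations (old spellings) and reference street names
cleaned <- c("Deenschestraat", "Dordtschelaan", "Crooswijk")
streets <- c("Deensestraat", "Dordtselaan", "Bergweg")

# adist() returns a matrix of edit distances: rows = cleaned, columns = streets
d <- adist(cleaned, streets)

# For each cleaned string, take the closest street if it is within max_dist edits
max_dist <- 4
best <- apply(d, 1, which.min)
best_dist <- d[cbind(seq_along(cleaned), best)]
matched <- ifelse(best_dist <= max_dist, streets[best], NA_character_)

print(data.frame(Original = cleaned, Match = matched))
```

The first two names differ from their contemporary spelling by two deletions and are matched, while "Crooswijk" is too far from every candidate and stays NA, just like the non-matched records in the real data.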

The matching process has resulted in 1225 matched records and 283 non-matched records. The results are added to the main data frame, after which the non-matched records are removed.

# Add matched records and remove non-matched records
df <- df %>%
      mutate(Matchstreet=matches[,2]) %>%
      filter(!is.na(Matchstreet))
rm(matches)

Extracting house numbers

Having extracted streetnames, the next step is to extract house numbers. There is a lot of variability in house numbers, however, as the records can contain a single, a range or a series of house numbers. Additionally, other numeric information can also be stored in each record.

The general strategy to extract the house numbers is as follows:

  1. Use the previously filtered Locatie column from the main data frame and store it in the vector numbers;
  2. Remove all non-address information in each string. This turned out to be rather easy, as additional information regarding bomb strikes is generally stored after the address, separated by a semicolon. Hence, removing all text after semicolons solves this issue;
  3. Remove all characters except numerics and a set of characters used to indicate series and ranges of house numbers;
  4. Extract the first and the last number of the cleaned strings.

As ranges are indicated with a comma, a hyphen, a forward slash or en (the Dutch word for and), the above strategy is applied to numbers twice: first to extract all ranges with commas, hyphens and forward slashes, then a second time to extract ranges containing en.

For the first extraction, the following code is used to clean numbers.

# Address numbers with punctuation marks
numbers <- df$Locatie
numbers <- gsub(".*Jobshavean.*", "Sint Jobshaven 105", numbers)
numbers <- gsub(";.*$", "", numbers)                  # Remove text after semicolons
numbers <- gsub("[^0-9\\/\\,\\-]", "", numbers)       # Remove everything except set in brackets
numbers[grep("[0-9]+", numbers, invert=T)] <- NA      # Change strings with no digits to NA
numbers <- gsub("^,|^\\/|^\\-", "", numbers)          # Remove commas, slashes and hyphens at start of strings
numbers <- gsub("\\,$", "", numbers)                  # Remove commas at the end of strings

The result looks as follows.

head(numbers, 30)
 [1] NA      "60/62" NA      "68"    NA      "7"     NA      "66"   
 [9] NA      NA      NA      NA      NA      NA      NA      NA     
[17] NA      "9"     NA      NA      "2/214" NA      "11"    NA     
[25] NA      "130"   "12-14" NA      NA      "2"    

What remains are single numbers, ranges or NA’s. To take these differences into account, two new vectors are generated. firstnum is used to store the first number of each string, while secnum is used to store the last number in each string. When records contain only one number, the first and last number will be the same, while in case of ranges, these newly created vectors will contain the lower and upper bound of the range.

# Create vectors with first and last numbers
firstnum <- gsub("^([0-9]*).*", "\\1", numbers)             # Set of digits until first non-digit character
secnum <- gsub(".*[\\,|\\/|\\-]([0-9]+)$", "\\1", numbers)  # Last set of digits, preceded by a comma, slash or hyphen

This process is repeated for the records containing en.

# Address numbers with 'en'
numbers <- df$Locatie
numbers <- gsub(";.*$", "", numbers)                  # Remove text after semicolons
numbers[!grepl(" en ", numbers)] <- NA                # Set all records without ' en ' to NA (also safe when nothing matches)
numbers[!grepl("[0-9]", numbers)] <- NA               # Change strings with no digits to NA
numbers <- gsub("\\(.*\\)$", "", numbers)             # Remove text in parentheses at the end of strings
numbers <- gsub(", tuinen", "", numbers)
numbers <- gsub(".*?([0-9].*$)", "\\1", numbers)      # Remove the part of each string before the first digit
numbers <- gsub("\\s$", "", numbers)                  # Remove white space at the end of strings
numbers <- gsub("\\D$", "", numbers)                  # Remove non-digit characters at the end of strings

# Create vectors with first and last numbers
firstnum2 <- gsub("^([0-9]*).*", "\\1", numbers)      # Set of digits until first non-digit character
secnum2 <- gsub(".*\\D([0-9]+$)", "\\1", numbers)     # Last set of digits, preceded by any non-digit character

The results are then combined in a two-column matrix and look as follows:

index <- which(!is.na(firstnum2))

firstnum[index] <- firstnum2[index]
secnum[index] <- secnum2[index]

numbers <- cbind(firstnum, secnum)
rm(firstnum, secnum, firstnum2, secnum2, index)

inival <- df$Locatie %>%
    gsub(";.*$", "", .)

table <- cbind(inival[18:35], numbers[18:35,])

kable(table, format="html", align=c("l", "c", "c"), caption="Extracted (range of) housenumber(s) per address") %>%
        kable_styling(bootstrap_options=c("striped", "hover"), font_size=12.5, full_width=FALSE)
Extracted (range of) housenumber(s) per address

Locatie | firstnum | secnum
Bomstr. 9 inslag afweergranaat | 9 | 9
Boschpolderplein | NA | NA
Bovenstr. | NA | NA
Brielsel. 2 t/m 214 | 2 | 214
Brielsel. | NA | NA
Brikstr. 11 | 11 | 11
Burg. Meineszplein | NA | NA
Buytewechstr., | NA | NA
Charloise Lagedijk 130 | 130 | 130
Cillaarshoekstr. 12-14 | 12 | 14
Crooswyk | NA | NA
Deenschestr. | NA | NA
Delflandsestr. 2 | 2 | 2
Dordtschel. 60 | 60 | 60
Dordtsestraatweg 809 granaat | 809 | 809
Dordtsestraatweg | NA | NA
Dorpsweg | NA | NA
Dreef 121-131 | 121 | 131

To obtain a definitive house number per address, a loop is used. The loop checks which records have house numbers and then either (1) stores the house number if firstnum and secnum are the same or (2) samples a number from the range with lower bound firstnum and upper bound secnum. If a record does not have a house number, the loop looks up all available house numbers for the street of that record in adresloc and samples one from that set. The definitive house numbers are then added to the main data frame.

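# Definitive number
defnum <- numeric(nrow(numbers))
# sample(x, 1) draws from 1:x when x has length one, so sampling is done on the index instead
resample <- function(x, ...) x[sample.int(length(x), ...)]
set.seed(9112017)

for (i in 1:nrow(numbers)){
      if (!is.na(numbers[i,1])) {
            # House number(s) registered: sample from the range firstnum:secnum
            range <- as.numeric(numbers[i,1]):as.numeric(numbers[i,2])
            num <- resample(range, 1)
            defnum[i] <- num
      } else {
          # No house number registered: sample one of the known house numbers in the street
          street <- df$Matchstreet[i]
          impnum <- adresloc %>%
              filter(STRAATNAAM==street) %>%
              select(HUISNUMMER) %>%
              unlist() %>%
              resample(1)
          defnum[i] <- impnum
      }
}

df <- df %>%
      mutate(Matchnumber=defnum)
rm(defnum, i, range, num, numbers, street, impnum)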

It should be noted that these definitive house numbers can be approximations, even if house numbers were available in the original data. Take for example a record which states that house numbers 24 and 57 were hit. The loop used above will then sample a number between 24 and 57 and use that as a definitive number, while only these specific addresses were hit. Although the loop can still return 24 or 57 as its definitive number, odds are it will be a number between these bounds. This approach was taken because it was pragmatic and yielded a reasonable approximation of the impact locations without spending too much time on the intricacies of the house number ranges. Alternatively, the middle of number ranges could have been chosen as the impact location. The data contains a lot of records of strikes in the same streets but without house numbers, however, so this would have resulted in a lot of bomb strikes being clustered at the same location. Hence, random sampling was chosen.
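
The reason for a helper like resample is a well-known quirk of sample: when it is given a single number x, it samples from 1:x rather than returning x. A short sketch of the difference, with an arbitrary seed and the house numbers 130 and 24 to 57 chosen purely for illustration:

```r
resample <- function(x, ...) x[sample.int(length(x), ...)]
set.seed(1)  # arbitrary seed for this illustration

# A "range" that consists of a single house number
single <- 130
draws_sample <- replicate(20, sample(single, 1))       # draws from 1:130
draws_resample <- replicate(20, resample(single, 1))   # always returns 130

# For a range with more than one value, resample() draws from those values
in_range <- resample(24:57, 1)

print(range(draws_sample))
print(unique(draws_resample))
print(in_range)
```

This matters for streets or records with exactly one candidate house number: with sample, a street whose only address is number 130 could end up with any number between 1 and 130.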

Adding coordinates

Having extracted a streetname and house number for each record in the remaining data set, the next step is to look up the longitude and latitude coordinates of each address, and add these to the data frame. To do so, a loop is used again. The loop takes the streetname and house number of each record and checks in adresloc for a match. If a match is found, it extracts the longitude and latitude coordinates from the lon and lat columns in adresloc. If a match is not found, the loop looks for the closest housenumber to the one in the main data frame, and then stores the coordinates of that address.

# Find longitude and latitude
datalist <- list()
for (i in 1:nrow(df)){
      street <- as.character(df$Matchstreet[i])
      number <- as.numeric(df$Matchnumber[i])
      coords <- adresloc %>%
          filter(STRAATNAAM==street)
      if (number %in% coords$HUISNUMMER) {
          coords <- coords %>%
              filter(HUISNUMMER==number) %>%
              select(lat, lon)
          datalist[[i]] <- coords
      } else {
          index <- which.min(abs(coords$HUISNUMMER-number))
          number <- coords$HUISNUMMER[index]
          coords <- coords %>%
              filter(HUISNUMMER==number) %>%
              select(lat, lon)
          datalist[[i]] <- coords
      }
}

datalist <- datalist %>%
    lapply(function(x) x[1,]) %>%
    do.call(rbind, .)

df <- cbind(df, datalist)
rm(datalist, street, number, coords, i, index)
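
The lookup logic can be sketched on a toy address table. The data frame adres_toy, its coordinates and the helper lookup_coords are all made up for illustration. Only the column names follow adresloc.

```r
# Hypothetical address table for one street; the coordinates are invented
adres_toy <- data.frame(
  STRAATNAAM = "Bergweg",
  HUISNUMMER = c(2, 10, 48, 120),
  lat = c(51.930, 51.931, 51.933, 51.936),
  lon = c(4.470, 4.471, 4.473, 4.476),
  stringsAsFactors = FALSE
)

lookup_coords <- function(street, number, addresses) {
  cand <- addresses[addresses$STRAATNAAM == street, ]
  # which.min() returns the first index with the smallest absolute difference,
  # which is the exact match whenever one exists
  nearest <- which.min(abs(cand$HUISNUMMER - number))
  cand[nearest, c("HUISNUMMER", "lat", "lon")]
}

exact <- lookup_coords("Bergweg", 48, adres_toy)    # number exists
approx <- lookup_coords("Bergweg", 60, adres_toy)   # falls back to the nearest number, 48
print(exact)
print(approx)
```

Because an exact match has a difference of zero, the nearest-number search covers both branches of the loop above. The separate exact-match branch is kept there for readability.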

The results of the extractions, matches and coordinate search look as follows.

table <- df[1:10,]

kable(table, format="html", align=c("l", "l", "l", "c", "c", "c"), caption="Matched addresses and coordinates per record") %>%
        kable_styling(bootstrap_options=c("striped", "hover"), font_size=12.5, full_width=FALSE) %>%
    add_header_above(c("Original data"=2, "Matched address"=2, "Coordinates"=2))
Matched addresses and coordinates per record

Original data: Datum, Locatie. Matched address: Matchstreet, Matchnumber. Coordinates: lat, lon.

Datum | Locatie | Matchstreet | Matchnumber | lat | lon
1945-02-25 | Adrianalaan.; vliegt. bommen | Adrianalaan | 377 | 51.96656 | 4.465249
1944-02-16 | Agniesestr. 60/62; granaten | Agniesestraat | 61 | 51.93105 | 4.475136
1941-11-15 | Albr. Engelmanstr.;4b.1g,ms | Albregt-Engelmanstraat | 25 | 51.90761 | 4.443347
1944-03-04 | Aleidisstr. 68 | Aleidisstraat | 68 | 51.91848 | 4.457668
1940-10-09 | Almondestr.;3b,2d,2g,ms | Almondestraat | 211 | 51.92914 | 4.480394
1944-03-04 | Anjerstr. 7; 4 doden en brandschade | Anjerstraat | 7 | 51.91266 | 4.494461
1940-10-09 | Anna Mariastr.;3b,2d,2g,ms | Sint-Mariastraat | 22 | 51.92001 | 4.468466
1944-09-20 | Azaleastr. 66b; projectiel | Azaleastraat | 66 | 51.94251 | 4.472607
1940-10-05 | Bellamystr. | Bellamystraat | 54 | 51.91672 | 4.432556
1944-10-06 | Benedenrijweg; doden en gewonden | Benedenrijweg | 47 | 51.89902 | 4.574057

Data manipulations

Having extracted the (estimated) coordinates of each matchable record from the original data frame, a few additional data manipulations are applied before plotting the data on the map of Rotterdam. In addition to simply plotting the locations, the idea is to:

  1. Add a shapefile to the map which separates the districts of Rotterdam;
  2. Link all observations in the data to districts, so the shapefile can be used to indicate on the map which districts were hit the most;
  3. Add pop-ups for each location, which show the address and date of each bomb strike.

In order to distinguish the districts of Rotterdam, a shapefile of the municipality of Rotterdam was used. As the shapefile uses a different coordinate system than the longitude and latitude coordinates used so far, however, a function had to be used to transform it into polygons with longitude and latitude coordinates that R and Leaflet can plot. Special thanks to Vincent Peijnenburg for figuring this function out! The code is as follows.

# Shapefile function
ShapeFileConverter <- function(x) {
    x <- fortify(x)
    x <- x %>% mutate(dx =(long-155000)*10^-5)
    x <- x %>% mutate(dy =(lat-463000)*10^-5)
    x <- x %>% mutate(somn= (3235.65389 * dy) + (-32.58297 * dx ^ 2) + (-0.2475 * dy ^ 2) + (-0.84978 * dx ^ 2 * dy) + (-0.0655 * dy ^ 3) + (-0.01709 * dx ^ 2 * dy ^ 2) + (-0.00738 * dx) + (0.0053 * dx ^ 4) + (-0.00039 * dx ^ 2 * dy ^ 3) + (0.00033 * dx ^ 4 * dy) + (-0.00012 * dx * dy))
    x <- x %>% mutate(some= (5260.52916 * dx) + (105.94684 * dx * dy) + (2.45656 * dx * dy ^ 2) + (-0.81885 * dx ^ 3) + (0.05594 * dx * dy ^ 3) + (-0.05607 * dx ^ 3 * dy) + (0.01199 * dy) + (-0.00256 * dx ^ 3 * dy ^ 2) + (0.00128 * dx * dy ^ 4) + (0.00022 * dy ^ 2) + (-0.00022 * dx ^ 2) + (0.00026 * dx ^ 5))
    x <- x %>% mutate(lon84= 5.387206 + (some / 3600))
    x <- x %>% mutate(lat84=52.15517+(somn/3600))
    x <- x[, c(12,13,6)]
    colnames(x) <- c("lon", "lat", "id")
    x$id <- as.numeric(x$id)
    x_list <- split(x, x$id)
    x_list <- lapply(x_list, function(x) { x["id"] <- NULL; x })
    ps_x <- lapply(x_list, Polygon)
    p1_x <- lapply(seq_along(ps_x), function(i) Polygons(list(ps_x[[i]]), ID = names(x_list)[i]))
    my_spatial_polys_x <- SpatialPolygons(p1_x, proj4string = CRS("+proj=longlat +datum=WGS84"))
    output_ShapeFileConverter<<-  SpatialPolygonsDataFrame(my_spatial_polys_x, data.frame(id = unique(x$id), row.names = unique(x$id)))
}

# Make shape file for R
district_areas <- readOGR(dsn = "C:/Users/Rotterdam.LAP13971/Desktop/Coursera/Roffabomb", layer = "Wijken_adressen", verbose=FALSE)
ShapeFileConverter(district_areas)
district_shape <- output_ShapeFileConverter
rm(output_ShapeFileConverter, district_areas)
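
The constants in ShapeFileConverter appear to match the commonly used polynomial approximation for converting Dutch RD (Rijksdriehoek) grid coordinates to WGS84 latitude and longitude. That reading is an assumption based on the offsets 155000 and 463000, since the report does not name the coordinate system. Under that assumption, the conversion of a single point can be sketched as follows. The function name rd_to_wgs84 and the example point are hypothetical.

```r
# Convert one point (x, y in metres) with the same polynomial as ShapeFileConverter
rd_to_wgs84 <- function(x, y) {
  dx <- (x - 155000) * 1e-5
  dy <- (y - 463000) * 1e-5
  somn <- 3235.65389 * dy - 32.58297 * dx^2 - 0.2475 * dy^2 - 0.84978 * dx^2 * dy -
    0.0655 * dy^3 - 0.01709 * dx^2 * dy^2 - 0.00738 * dx + 0.0053 * dx^4 -
    0.00039 * dx^2 * dy^3 + 0.00033 * dx^4 * dy - 0.00012 * dx * dy
  some <- 5260.52916 * dx + 105.94684 * dx * dy + 2.45656 * dx * dy^2 - 0.81885 * dx^3 +
    0.05594 * dx * dy^3 - 0.05607 * dx^3 * dy + 0.01199 * dy - 0.00256 * dx^3 * dy^2 +
    0.00128 * dx * dy^4 + 0.00022 * dy^2 - 0.00022 * dx^2 + 0.00026 * dx^5
  # somn and some are offsets in arc seconds relative to the reference point
  c(lat = 52.15517 + somn / 3600, lon = 5.387206 + some / 3600)
}

origin <- rd_to_wgs84(155000, 463000)   # the reference point maps to the base lat/lon
point <- rd_to_wgs84(92500, 437500)     # a made-up point that should land in central Rotterdam
print(origin)
print(point)
```

At the reference point both offsets are zero, so the function returns exactly the base latitude and longitude. This makes for a quick sanity check of the formula.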

Next, columns must be added to the main data frame which indicate the district to which the records belong and which combine the date and address data. To add each district ID, the adresloc data frame is used to generate a data frame with every streetname and its district, which is then joined with the main data frame. Next, the ID data frame is cleaned so that it perfectly matches the main data frame, after which it is joined with the main data frame to add ID numbers for each district.

Adding strings to the data frame to be used for the pop-ups on the map is rather easy, as this can be done by pasting together the date, streetname and house number. The following code was used to add these columns to the main data frame.

# Add district and shapefile ID
districts <- adresloc %>%
    select(STRAATNAAM, GEBDNAAM) %>%
    arrange(STRAATNAAM) %>%
    filter(!duplicated(STRAATNAAM),
           STRAATNAAM %in% df$Matchstreet) %>%
    mutate(GEBDNAAM=tolower(GEBDNAAM)) %>%
    `colnames<-`(.,c("Matchstreet", "District"))

df <- left_join(df, districts, by="Matchstreet")

ID <- ID %>%
    mutate(Gebied=tolower(Gebied)) %>%
    `colnames<-`(., c("District", "ID"))
ID$District <- gsub(".*hillegersberg.*", "hillegersberg/schiebroek", ID$District)
ID$District <- gsub("rotterdam centrum", "stadscentrum", ID$District)

df <- left_join(df, ID, by="District")

# Add year column and date + address column for pop-ups
# Assumes Datum is a Date or POSIXct column. format() is applied to Datum directly,
# because as.Date() on a date-time value converts via UTC by default, which can shift the date by one day
df <- df %>%
    mutate(Year=format(Datum, "%Y"),
           `Pop-up`=paste(format(Datum, "%Y-%m-%d"),
                              paste(Matchstreet, Matchnumber), sep=": "))

The final data frame now looks as follows.

table <- df[1:10,]

kable(table, format="html", align=c("l", "l", "l", "c", "c", "c", "l", "c", "c"), caption="Matched addresses and coordinates per record") %>%
        kable_styling(bootstrap_options=c("striped", "hover"), font_size=12.5, full_width=FALSE) %>%
    add_header_above(c("Original data"=2, "Matched address"=2, "Coordinates"=2, " "=1, " "=1, " "=1, " "=1))
Matched addresses and coordinates per record

Original data: Datum, Locatie. Matched address: Matchstreet, Matchnumber. Coordinates: lat, lon.

Datum | Locatie | Matchstreet | Matchnumber | lat | lon | District | ID | Year | Pop-up
1945-02-25 | Adrianalaan.; vliegt. bommen | Adrianalaan | 377 | 51.96656 | 4.465249 | hillegersberg/schiebroek | 17 | 1945 | 1945-02-25: Adrianalaan 377
1944-02-16 | Agniesestr. 60/62; granaten | Agniesestraat | 61 | 51.93105 | 4.475136 | noord | 1 | 1944 | 1944-02-16: Agniesestraat 61
1941-11-15 | Albr. Engelmanstr.;4b.1g,ms | Albregt-Engelmanstraat | 25 | 51.90761 | 4.443347 | delfshaven | 10 | 1941 | 1941-11-15: Albregt-Engelmanstraat 25
1944-03-04 | Aleidisstr. 68 | Aleidisstraat | 68 | 51.91848 | 4.457668 | delfshaven | 10 | 1944 | 1944-03-04: Aleidisstraat 68
1940-10-09 | Almondestr.;3b,2d,2g,ms | Almondestraat | 211 | 51.92914 | 4.480394 | noord | 1 | 1940 | 1940-10-09: Almondestraat 211
1944-03-04 | Anjerstr. 7; 4 doden en brandschade | Anjerstraat | 7 | 51.91266 | 4.494461 | feijenoord | 8 | 1944 | 1944-03-04: Anjerstraat 7
1940-10-09 | Anna Mariastr.;3b,2d,2g,ms | Sint-Mariastraat | 22 | 51.92001 | 4.468466 | stadscentrum | 13 | 1940 | 1940-10-09: Sint-Mariastraat 22
1944-09-20 | Azaleastr. 66b; projectiel | Azaleastraat | 66 | 51.94251 | 4.472607 | hillegersberg/schiebroek | 17 | 1944 | 1944-09-20: Azaleastraat 66
1940-10-05 | Bellamystr. | Bellamystraat | 54 | 51.91672 | 4.432556 | delfshaven | 10 | 1940 | 1940-10-05: Bellamystraat 54
1944-10-06 | Benedenrijweg; doden en gewonden | Benedenrijweg | 47 | 51.89902 | 4.574057 | ijsselmonde | 7 | 1944 | 1944-10-06: Benedenrijweg 47
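
A point of attention when building date strings: if Datum is stored as a date-time (POSIXct) value, which is an assumption here, converting it with as.Date goes through UTC unless a time zone is given. A local midnight can then end up on the previous day. The sketch below illustrates this with a fixed UTC+1 offset. The time zone name and the example date are chosen for illustration only.

```r
# Midnight at a fixed offset of one hour ahead of UTC ("Etc/GMT-1" means UTC+1)
datum <- as.POSIXct("1945-02-25 00:00:00", tz = "Etc/GMT-1")

shifted <- as.Date(datum, tz = "UTC")   # the same instant is 23:00 on the previous day in UTC
kept <- format(datum, "%Y-%m-%d")       # formatted in the value's own time zone
year <- format(datum, "%Y")

print(shifted)
print(kept)
print(year)
```

Formatting the date-time value directly, or passing the intended time zone to as.Date, avoids the shift.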

The final step before generating the map is to add the total number of bomb strikes per district to the district shapefile, so that these totals can be used to color the districts of Rotterdam. This can be done with the following code.

# Amount per district
bombs <- df %>%
    group_by(ID) %>%
    summarize(Total=n())

colnames(bombs)[1] <- "id"
district_shape@data <- left_join(district_shape@data, bombs, by="id")
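
What this join has to preserve is the row order of the shapefile data, as each row belongs to one polygon. A base R sketch of the same idea, using made-up district IDs and match instead of left_join:

```r
# Hypothetical district IDs of individual strikes, and the id column of a shapefile
strikes <- data.frame(ID = c(1, 10, 10, 17, 1, 10))
shape_data <- data.frame(id = c(1, 7, 10, 17))

# Count strikes per district
tab <- table(strikes$ID)

# Look up each shapefile id in the counts; the row order of shape_data is untouched
pos <- match(shape_data$id, as.numeric(names(tab)))
shape_data$Total <- as.integer(tab)[pos]

print(shape_data)
```

Districts without any strikes (id 7 in this toy example) get NA rather than 0, which is also what a left join produces.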

The map

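Now that all data has been generated and prepared for the map, the only step that remains is to generate it. Before doing so, a color palette is selected for coloring the districts and a bomb icon from the internet is used as a marker. Then leaflet is used to generate a map, offering the user the following options:

  1. Switch between three base maps (OpenStreetMap, a black and white map and a Stamen terrain map);
  2. Show or hide the bomb strike markers, which are clustered when zoomed out and show the date and address of a strike in a pop-up;
  3. Show or hide the districts, colored by their total number of bomb strikes;
  4. Show or hide a heatmap of the bomb strikes.

The code to achieve the above and the resulting map are shown below.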

# Leaflet
reds <- brewer.pal(9, "Reds")

bombIcon <- makeIcon(
    iconUrl = "explosion-417894_640.png",
    iconWidth=50, iconHeight=50.15674)

bombMap <- leaflet(df) %>%
    addTiles(group="OSM (default)",
             options=providerTileOptions(minZoom=11, maxZoom=17)) %>%
    addProviderTiles("OpenMapSurfer.Grayscale", 
                     options=providerTileOptions(minZoom=11, maxZoom=17),
                     group="Black and White") %>%
    addProviderTiles("Stamen.Terrain",
                     options=providerTileOptions(minZoom=11, maxZoom=17),
                     group="Stamen") %>%
    setView(lng=4.4777325, lat=51.912448, zoom=12) %>%
    addMarkers(~lon, ~lat, icon=bombIcon, 
               clusterOptions=markerClusterOptions(disableClusteringAtZoom=15,
                                                   spiderfyOnMaxZoom=FALSE),
               popup=~htmlEscape(`Pop-up`),
               group="Show Bombs") %>%
    addPolygons(data = district_shape, weight = 2, color = "black", fillOpacity = 0.80, 
                fillColor = ~colorNumeric(reds, Total)(Total),
                group="Show Districts") %>%
    addHeatmap(~lon, ~lat, group="Show Heatmap", blur=45, cellSize=(40)) %>%
    addLayersControl(
        baseGroups=c("OSM (default)", "Black and White", "Stamen"),
        overlayGroups=c("Show Bombs", "Show Districts", "Show Heatmap"),
        options=layersControlOptions(collapsed=FALSE)) %>%
    hideGroup(c("Show Districts", "Show Heatmap"))

# Final map
bombMap



Having plotted the map, this concludes the report of this project. For those who have made it all the way down here, thank you for taking the time to read through my report and I hope you found it informative!

Appendix

A. Cleaning the address vector

# Cleaning of location column
clnv <- gsub(";.*$", "", clnv)
clnv <- gsub("v\\.", "van", clnv)
clnv <- gsub("str\\.|str$", "straat", clnv)
clnv <- gsub("str(\\,)", "straat\\1", clnv)
clnv <- gsub(".*Hanno.*", "Heijplaatweg", clnv)
clnv <- gsub("Vondel\\..*$", "Vondelingenplaat", clnv)
clnv <- gsub("Math\\..*$", "Mathenesserdijk", clnv)
clnv <- gsub("Albr.*$", "Albregt-Engelmanstraat", clnv)
clnv <- gsub("Schied\\.", "Schiedamse", clnv)
clnv <- gsub(".*Glashaven.*", "Glashaven", clnv)
clnv <- gsub(".*Bierhaven.*", "Jufferstraat", clnv)
clnv <- gsub("^Ged\\. Binnenrotte", "Binnenrotte", clnv)
clnv <- gsub("^(.*)\\sde$", "De \\1", clnv)
clnv <- gsub("^de", "De", clnv)
clnv <- gsub("dyk", "dijk", clnv)
clnv <- gsub("wyk", "wijk", clnv)
clnv <- gsub("Burg\\.", "Burgemeester", clnv)
clnv <- gsub("^(.*)Burg$", "Burgemeester \\1", clnv)
clnv <- gsub("pl\\.", "plein", clnv)
clnv <- gsub("l\\.", "laan", clnv)
clnv <- gsub("Jac\\.", "Jacob", clnv)
clnv <- gsub("Nw\\.", "Nieuwe", clnv)
clnv <- gsub("R[Oo]sener", "Rösener", clnv)
clnv <- gsub("RosenMainzstraat", "Rösener Manzstraat", clnv)
clnv <- gsub(".*Nelle.*", "Van Nelleweg", clnv)
clnv <- gsub(".*Oostdijk.*", "Oostdijk", clnv)
clnv <- gsub(".*Rotte.*", "Rottestraat", clnv)
clnv <- gsub("Nic\\.", "Nicolaas", clnv)
clnv <- gsub(".*PC.*", "P.C. Hooftplein", clnv)
clnv <- gsub("\\s*\\(.*\\)*", "", clnv)
clnv <- gsub("Ger\\.", "Gerard", clnv)
clnv <- gsub("Pr\\.", "Prins", clnv)
clnv <- gsub("St\\.", "Sint", clnv)
clnv <- gsub("Wm\\.", "Willem", clnv)
clnv <- gsub(".*Franciscus.*", "Kleiweg 500", clnv)
clnv <- gsub(".*Bomstraat.*", "Bomstraat", clnv)
clnv <- gsub("(Buyte.*)", "Willem \\1", clnv)
clnv <- gsub("granaat", "", clnv)
clnv <- gsub("Hoffmanplein (.*)", "\\1 Hoffmanplein", clnv)
clnv <- gsub(".*Kerkedijk.*", "Noorder Kerkedijk", clnv)
clnv <- gsub("Goudschestraat", "Goudseweg", clnv)
clnv <- gsub("Mathen\\.", "Mathenesser", clnv)
clnv <- gsub("^Waalhaven$", "Waalhaven O.z.", clnv)
clnv <- gsub("Hilleg\\.bergstraat", "Hillegaersbergstraat", clnv)
clnv <- gsub("Wolph\\. ", "Wolphaerts", clnv)
clnv <- gsub("(Kamphofstraat)", "Van \\1", clnv)
clnv <- gsub(".*[Bb]ier.*", "Bierstraat", clnv)
clnv <- gsub("Duylstraat van", "Van Duylstraat", clnv)
clnv <- gsub("Bruyn", "Bruijn", clnv)
clnv <- gsub("Shell- ", "", clnv)
clnv <- gsub("Witte de Withstraat en", "Witte de Withstraat", clnv)
clnv <- gsub(".*Superfosfaatfabriek.*", "Vondelingenweg", clnv)
clnv <- gsub(".*Genestet.*", "De Genestetplein", clnv)
clnv <- gsub(".*Tollen.*", "Tollenstraat", clnv)
clnv <- gsub(".*Jaffad.*", "Jaffadwarsstraat", clnv)
clnv <- gsub(".*Geref.*", "Jacob Catsstraat", clnv)
clnv <- gsub(".*gashouders.*", "Lusthofstraat", clnv)
clnv <- gsub(".*speeltuin.*", "Olympiaweg", clnv)
clnv <- gsub(".*Vakenoorsch.*", "West-Varkenoordseweg", clnv)
clnv <- gsub(".*Schiedamscheweg.*", "Schiedamseweg", clnv)
clnv <- gsub(".*Groote.*", "Grote Visserijstraat", clnv)
clnv <- gsub(".*Maasveem.*", "Maashaven O.z.", clnv)
clnv <- gsub(".*pier.*", "Waalhavenweg", clnv)
clnv <- gsub(".*Petra.*", "Petroleumweg", clnv)
clnv <- gsub("str$", "straat", clnv)
clnv <- gsub(".*Vollenhovenstraat.*", "Van Vollenhovenstraat", clnv)
clnv <- gsub(".*Sinteltjesstraat.*", "Rosestraat", clnv)
clnv <- gsub(".*Agatha.*", "Sint-Agathastraat", clnv)
clnv <- gsub("[Rr]\\.[Kk]\\.", "Mathenesserlaan", clnv)
clnv <- gsub(".*Adr\\..*", "Adrien Milderstraat", clnv)
clnv <- gsub(".*Yzer.*", "1e IJzerstraat", clnv)
clnv <- gsub("Heysche", "Heyse", clnv)
clnv <- gsub("Overschie$", "", clnv)
clnv <- gsub("IJsselmonde$", "", clnv)
clnv <- gsub(".*Mart\\..*", "Martinus Steijnstraat", clnv)
clnv <- gsub(".*Tankweg.*", "Tankweg", clnv)
clnv <- gsub(".*Insulinde.*", "Insulindestraat", clnv)
clnv <- gsub("(.*) vanaf.*", "\\1", clnv)
clnv <- gsub(".*Hofplein.*", "Hofplein", clnv)
clnv <- gsub(".*Raampoort.*", "Raampoortstraat", clnv)
clnv <- gsub(".*Haem.*", "Witte van Haemstedestraat", clnv)

maasindex <- grep("Maashaven", clnv)
newmaas <- sample(grep("Maashaven", unique(adresloc$STRAATNAAM), value=T), size=length(maasindex), replace=T)
clnv[maasindex] <- newmaas
rm(maasindex, newmaas)

clnv <- gsub("\\,.*$", "", clnv)
clnv <- gsub("-", "", clnv)
clnv <- gsub("\\d+", "", clnv)
clnv <- gsub("\\|\\/|\\[|\\]|\"", "", clnv)
clnv <- gsub(" t\\/m", "", clnv)
clnv <- gsub(" en ", "", clnv)
clnv <- gsub("t\\.o\\.", "", clnv)
clnv <- gsub("\\/.*$", "", clnv)
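
Many of the patterns in this appendix expand abbreviations that end in a period. In a regular expression, an unescaped dot matches any character, so the escaping matters. A short sketch with illustrative street names:

```r
streets <- c("St. Jobshaven", "Statenweg", "Burg. Meineszplein")

unescaped <- gsub("St.", "Sint", streets)              # "." matches any character
escaped <- gsub("St\\.", "Sint", streets)              # "\\." matches a literal period
literal <- gsub("St.", "Sint", streets, fixed = TRUE)  # fixed = TRUE treats the pattern as plain text

print(unescaped)
print(escaped)
print(literal)
```

With the unescaped pattern, "Statenweg" is turned into "Sinttenweg", while the escaped and fixed variants only touch the abbreviation.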