In this makeover, the main purpose is to draw a map that clearly shows the locations of the ATMs of the bank Spar Nord in Denmark. The map also shows how the ATMs are distributed across manufacturers and the error rate of each machine.
The visualization consists of two map plots, one per ATM manufacturer, each showing the locations of that manufacturer's machines. Each location bubble is colored by the error rate of the machine.
sketch
To let users see the machine locations more clearly, I use the tmap package to build an interactive map, so that users can easily zoom out to see the overall picture or zoom in on a specific region of Denmark.
# Install any missing packages, then load them all.
packages = c('sf', 'tmap', 'tidyverse')
for (p in packages){
  if(!require(p, character.only = TRUE)){
    install.packages(p)
  }
  library(p, character.only = TRUE)
}
The data come in two CSV files; read both and combine them with the rbind() function to get the full dataset.
data1 = read_csv("/Users/vickywei/smu/visual/open_dataset/atm_data_part1.csv")
data2 = read_csv("/Users/vickywei/smu/visual/open_dataset/atm_data_part2.csv")
data = rbind(data1, data2)
To calculate the error rate of each machine from message_code (a transaction that produced an error carries a message code; otherwise the field is NA), first convert message_code to a 0/1 indicator stored in a new column named error: error is 0 if message_code is NA and 1 otherwise. Then calculate the error rate for each machine using the group_by() function, as below. Finally, select the columns that may be used in further analysis, remove duplicated rows, and sort the data by atm_id.
data <- within(data,
  error <- ifelse(!is.na(message_code), 1, 0)
)
data <- data %>%
  group_by(atm_id, atm_manufacturer, atm_location, atm_streetname, atm_lat, atm_lon) %>% # do calculations per machine
  mutate(error_rate = sum(error) / length(error) * 100) # share of transactions with an error, in percent
data <- data[c("atm_id","atm_manufacturer","atm_location","atm_streetname","atm_street_number","atm_zipcode","atm_lat","atm_lon","card_type","error_rate")]
data <- unique(data) # drop duplicated rows
data <- data[with(data, order(atm_id)), ] # sort by atm_id
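To make the error-rate calculation concrete, here is a minimal sketch on a toy transaction log. The values, including the message code, are made up for illustration, and the sketch uses base R's aggregate() instead of group_by() so that it runs without any packages.

```r
# Toy transaction log (hypothetical values): one row per transaction.
toy <- data.frame(
  atm_id       = c(1, 1, 1, 1, 2, 2),
  message_code = c(NA, "4014", NA, NA, NA, NA),  # made-up code, NA = no error
  stringsAsFactors = FALSE
)

# Flag each transaction: 1 if a message code is present, 0 otherwise.
toy$error <- ifelse(!is.na(toy$message_code), 1, 0)

# Error rate per machine, in percent: errors / transactions * 100.
toy_rate <- aggregate(error ~ atm_id, data = toy,
                      FUN = function(x) sum(x) / length(x) * 100)
names(toy_rate)[2] <- "error_rate"
print(toy_rate)
```

Machine 1 has one error in four transactions, so its error rate is 25; machine 2 has no errors, so its rate is 0.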
Convert the data into a simple feature data frame using st_as_sf() from the sf package. Because atm_lon and atm_lat are longitude and latitude in degrees, set crs to 4326 (WGS 84) so that the machine positions are placed correctly on the map. Set the tmap mode to interactive viewing. The error_rate variable is used as the colour attribute, and the atm_manufacturer variable is used to split the display into two maps, one per manufacturer. Setting sync to TRUE gives the two maps synchronised zoom and pan.
data_sf <- st_as_sf(data, coords = c("atm_lon","atm_lat"), crs = 4326)
tmap_mode("view")
tm_shape(data_sf) +
  tm_bubbles(col = "error_rate",
             size = 1,
             alpha = 1,
             border.col = "darkgray",
             border.lwd = 0.4) +
  tm_facets(by = "atm_manufacturer",
            nrow = 1,
            sync = TRUE)
In this plot, we can see that Spar Nord has more ATMs in the north of Denmark, with the highest density near Aalborg. The bank's ATMs come from two manufacturers, and more of them are produced by NCR than by Diebold Nixdorf. Overall, the error rate of Spar Nord's ATMs is low, but certain machines have a relatively high error rate, for example machines 82 and 79, both produced by NCR.
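One way to check which machines stand out, besides reading the colours off the map, is to rank the machines by error rate. The sketch below does this in base R on a small hypothetical table; atm_summary and its values are invented for illustration, and in practice the same steps would be applied to the data table built earlier.

```r
# Hypothetical per-machine table (made-up IDs and rates, not real results).
atm_summary <- data.frame(
  atm_id           = c(10, 11, 12, 13),
  atm_manufacturer = c("NCR", "NCR", "Diebold Nixdorf", "NCR"),
  error_rate       = c(0.5, 7.2, 1.1, 3.4)
)

# Keep one row per machine, then sort by error rate, highest first.
per_atm <- unique(atm_summary[c("atm_id", "atm_manufacturer", "error_rate")])
ranked  <- per_atm[order(-per_atm$error_rate), ]

# Show the three machines with the highest error rates.
head(ranked, 3)
```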