1.0 Overview

In this project, we examine the failure rate of Spar Nord ATMs by ATM ID, so that Spar Nord can adjust its strategy for improving operations.

1.1 Proposed Design

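A funnel plot makes outliers easy to identify, so we will create one, with the number of records on the x-axis and the failure rate on the y-axis. The control limits around the average failure rate narrow as the number of records grows, so points outside the limits stand out as unusually high or low failure rates.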
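This assumes the normal approximation to the binomial distribution that the Section 4 code relies on. Under that assumption, the funnel limits for an ATM-month with n records can be written as below. Here \bar{p} is the centre line (y.fem in the code), and z is 1.96 for the 95% limits or 3.29 for the 99.9% limits:

```latex
\bar{p} \pm z \sqrt{\frac{\bar{p}\,(1 - \bar{p})}{n}}
```

The half-width shrinks in proportion to 1/sqrt(n), which is what gives the limits their funnel shape.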

Spar Nord Failure Rate by Funnel Plot

2.0 Installing and Launching R Packages

# Install any missing packages, then attach them
packages <- c('tidyverse', 'plotly')
for (p in packages) {
  if (!require(p, character.only = TRUE)) {
    install.packages(p)
  }
  library(p, character.only = TRUE)
}

3.0 Importing and Preparing The Data Set

In our project, we use the Spar Nord Open Data. The dataset contains information about each ATM's location and manufacturer, the currency, card type, weather conditions, and more. Please download the data here. The original data set is in CSV format.

3.1 Importing the data set

In the code chunk below, read.csv() from base R (the utils package) is used to import atm_all.csv into R as a standard data frame.

data = read.csv("data/atm_all.csv")
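The failure definition in Section 3.2 depends on empty message_text fields being read as empty strings. The quick check below runs on a toy CSV. The rows are hypothetical and only mimic the columns used later. The check confirms that read.csv() keeps blank character fields as "" rather than NA:

```r
# Toy CSV text standing in for atm_all.csv (hypothetical rows, not real data)
csv_text <- "atm_id,atm_status,message_text,month
1,Active,,January
1,Active,Cash dispenser jam,January
2,Inactive,,February"

toy <- read.csv(text = csv_text, stringsAsFactors = FALSE)

# Blank fields in a character column are read as "" (not NA) by read.csv()
print(toy$message_text)

# Rows that would count as failures: non-empty message_text
failures <- toy[toy$message_text != "", ]
print(nrow(failures))
```

readr's read_csv() would instead turn those blanks into NA by default. The `!= ''` filter would still drop them, because filter() discards NA results.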

3.2 Preparing the data

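We summarise the records by ATM ID, month and location, after filtering out inactive ATMs. data1 counts all records in each group, and data2 counts the records with a non-empty message_text, which we treat as failures. data3 joins the two and computes the failure rate.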

# Count all records per ATM, month and location (active ATMs only)
data1 <- data %>%
  filter(atm_status == "Active") %>%
  group_by(atm_id, month, atm_location) %>%
  summarise(Total_Number = n(), .groups = "drop")

# Count the records that carry an error message (treated as failures)
data2 <- data %>%
  filter(atm_status == "Active") %>%
  filter(message_text != '') %>%
  group_by(atm_id, month, atm_location) %>%
  summarise(Failure_Number = n(), .groups = "drop")

# merge() performs an inner join, so ATM-months with no failures are dropped
data3 <- merge(x = data1, y = data2, by = c("atm_id", "month", "atm_location"))

data3$Failure_Rate <- data3$Failure_Number / data3$Total_Number
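The inner join above keeps only ATM-months with at least one failure. The base-R sketch below uses hypothetical toy records, not real Spar Nord data. It counts failures as the sum of a logical flag, so ATM-months with no failures stay in with a rate of 0. Keeping those zeros would give them a standard error of 0. That in turn breaks the inverse-variance weighting used later for the funnel centre line, so they would need a different centre-line estimate.

```r
# Hypothetical toy records (not real Spar Nord data)
toy <- data.frame(
  atm_id       = c(1, 1, 1, 2, 2, 3),
  month        = rep("January", 6),
  atm_location = c("A", "A", "A", "B", "B", "C"),
  atm_status   = c("Active", "Active", "Active", "Active", "Active", "Inactive"),
  message_text = c("", "Card reader error", "", "", "", "Cash jam"),
  stringsAsFactors = FALSE
)

# Keep active ATMs and flag records with an error message as failures
active <- toy[toy$atm_status == "Active", ]
active$is_failure <- active$message_text != ""

# Total records per group (length of the flag vector)
totals <- aggregate(is_failure ~ atm_id + month + atm_location, data = active, FUN = length)
names(totals)[names(totals) == "is_failure"] <- "Total_Number"

# Failures per group (sum of the logical flag, 0 when there are none)
fails <- aggregate(is_failure ~ atm_id + month + atm_location, data = active, FUN = sum)
names(fails)[names(fails) == "is_failure"] <- "Failure_Number"

rates <- merge(totals, fails, by = c("atm_id", "month", "atm_location"))
rates$Failure_Rate <- rates$Failure_Number / rates$Total_Number
print(rates)
```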

4.0 Funnel Plot to show Failure rate

4.1 Basic Point plot

First, we plot each ATM-month as a point, with the total number of records on the x-axis (x) and the failure rate on the y-axis (y).

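# x: total records per ATM-month; y: failure rate (both are reused further below)
x <- data3$Total_Number
y <- data3$Failure_Rate

ggplot(data3, aes(x = Total_Number, y = Failure_Rate)) +
  geom_point()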

4.2 Point plot in Animation

Next, we animate the same scatter plot by month with plotly. Each frame shows one month, points are coloured by ATM location, and the ids aesthetic lets plotly track each ATM from one frame to the next.

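x <- data3$Total_Number
y <- data3$Failure_Rate
# Order months chronologically (assumes month holds full English month names)
data3$month <- factor(data3$month, levels = month.name)
# frame and ids are plotly aesthetics: ggplot2 warns that it ignores them,
# but ggplotly() uses them to build one animation frame per month
p_plot <- ggplot(data3, aes(x = Total_Number, y = Failure_Rate, color = atm_location)) +
  geom_point(aes(frame = month, ids = atm_id))

ggplotly(p_plot)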

4.3 Standard Error Calculation

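We need to calculate the standard error (SE) of each failure rate and the inverse-variance weighted mean of y, which serves as the centre line of the funnel. From these we derive the 95% and 99.9% control limits.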

# Binomial standard error of each ATM-month failure rate
y.se <- sqrt(y * (1 - y) / x)
df <- data.frame(Atm_ID = data3$atm_id, x, y, y.se)
# Inverse-variance weighted mean (fixed-effect estimate) used as the centre line;
# a rate of exactly 0 or 1 gives y.se = 0 and an infinite weight
y.fem <- weighted.mean(y, 1 / y.se^2)
## lower and upper limits for 95% and 99.9% CI, based on FEM estimator
number.seq <- seq(0.001, max(x), 0.1)
number.se <- sqrt(y.fem * (1 - y.fem) / number.seq)
number.ll95 <- y.fem - 1.96 * number.se
number.ul95 <- y.fem + 1.96 * number.se
number.ll999 <- y.fem - 3.29 * number.se
number.ul999 <- y.fem + 3.29 * number.se
dfCI <- data.frame(number.ll95, number.ul95, number.ll999, number.ul999, number.seq, y.fem)
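The centre line above is an inverse-variance weighted mean. Its weights are 1/se^2 = n / (p(1 - p)), so they grow as the rate shrinks and lower rates pull the mean down. The toy sketch below uses hypothetical counts, not real data. It compares that estimate with the pooled proportion, total failures over total records, which equals weighted.mean(rate, total). The pooled proportion is a common alternative centre line for funnel plots of proportions.

```r
# Hypothetical ATM-month counts (not real data)
total    <- c(200, 1000, 4000)
failures <- c(4, 15, 40)
rate     <- failures / total

# Binomial standard error of each rate
rate_se <- sqrt(rate * (1 - rate) / total)

# Inverse-variance weighted mean, as computed with weighted.mean() in the text
fem <- weighted.mean(rate, 1 / rate_se^2)

# Pooled proportion: total failures divided by total records
pooled <- sum(failures) / sum(total)

print(c(fem = fem, pooled = pooled))
```

With these numbers the inverse-variance mean (about 0.0109) sits below the pooled proportion (about 0.0113).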

4.4 Combined lines with point plot

Some points lie very close to zero on the y-axis, so plotting the full range does not give a clear view, and the y-axis is therefore restricted to 0 to 0.07. So far, I have not managed to combine the points and the control-limit lines in an animated plot, because doing so produced errors.

ggplot() +
  # frame is ignored by static ggplot2; it only has an effect through ggplotly()
  geom_point(data = data3, aes(x = Total_Number, y = Failure_Rate, color = month, frame = month)) +
  geom_line(aes(x = number.seq, y = number.ll95), data = dfCI) +
  geom_line(aes(x = number.seq, y = number.ul95), data = dfCI) +
  geom_line(aes(x = number.seq, y = number.ll999), linetype = "dashed", data = dfCI) +
  geom_line(aes(x = number.seq, y = number.ul999), linetype = "dashed", data = dfCI) +
  geom_hline(yintercept = y.fem) +
  # Zoom to 0-0.07 without removing the data that falls outside that range
  coord_cartesian(ylim = c(0, 0.07)) +
  ggtitle("Funnel Plot in Failure Rate") +
  xlab("Number of Records") + ylab("Failure rate") + theme_bw() +
  theme(plot.title = element_text(hjust = 0.5))
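The funnel plot shows outliers visually. A minimal base-R sketch for listing them is below. It assumes a hypothetical helper, funnel_limits(), and uses toy points that are not real data. The helper takes z from qnorm() rather than hard-coding 1.96 and 3.29. Unlike the code above, it also clamps the limits to [0, 1], so lower limits never go negative at small n.

```r
# Hypothetical helper: control limits around a target proportion p0 for sizes n
funnel_limits <- function(p0, n, level = 0.95) {
  z <- qnorm((1 + level) / 2)            # 0.95 -> about 1.96, 0.999 -> about 3.29
  half_width <- z * sqrt(p0 * (1 - p0) / n)
  data.frame(n = n,
             lower = pmax(0, p0 - half_width),   # a rate cannot be negative
             upper = pmin(1, p0 + half_width))   # or larger than 1
}

# Hypothetical ATM-month points (not real data)
pts <- data.frame(atm_id = c(1, 2, 3),
                  x = c(500, 4000, 4500),
                  y = c(0.010, 0.030, 0.012))
p0 <- 0.012   # hypothetical centre line, playing the role of y.fem

# Flag points outside the 99.9% limits for their own number of records
lim <- funnel_limits(p0, pts$x, level = 0.999)
pts$outside_999 <- pts$y < lim$lower | pts$y > lim$upper
print(pts)
```

Here only the second point, with a 3% failure rate over 4,000 records, falls outside the 99.9% limits.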

5.0 What the Data Revealed

  1. ATM 39 performed well from January to November, but its failure rate then rose sharply. We suggest that Spar Nord check and fix it.

  2. ATMs 83 and 79 performed very poorly from January to February and have no records from March to May. They appear to have been fixed, since their failure rates did not rise again in the following months.

  3. ATMs 45, 39 and 10 each have more than 4,000 records, and their record counts kept increasing throughout the year. Strangely, the number of records for ATM 41 increased sharply, and it ranked first.
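The observations above come from reading the plots. Two of them, a sudden jump in an ATM's failure rate and a ranking of ATMs by record count, could also be checked numerically. The sketch below runs on a hypothetical monthly summary, not real Spar Nord data, with the same column names as data3.

```r
# Hypothetical monthly summary for two ATMs (not real data)
monthly <- data.frame(
  atm_id       = c(1, 1, 1, 2, 2, 2),
  month        = factor(c("October", "November", "December",
                          "October", "November", "December"), levels = month.name),
  Total_Number = c(300, 320, 310, 500, 520, 540),
  Failure_Rate = c(0.010, 0.011, 0.060, 0.020, 0.019, 0.021)
)
monthly <- monthly[order(monthly$atm_id, monthly$month), ]

# Month-over-month change in failure rate within each ATM (NA for the first month)
monthly$rate_change <- ave(monthly$Failure_Rate, monthly$atm_id,
                           FUN = function(r) c(NA, diff(r)))

# Total records per ATM over the period, ranked from most to least
totals <- aggregate(Total_Number ~ atm_id, data = monthly, FUN = sum)
totals <- totals[order(totals$Total_Number, decreasing = TRUE), ]

print(monthly)
print(totals)
```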

6.0 Advantages of Animation Compared to a Static Plot

  1. An animated graph is more engaging than a static graph.

  2. An animated graph shows trends in the data clearly, especially in presentations.

  3. Users can interact with an animated graph, for example by stepping through different months to compare the failure rates of ATMs.

  4. It clearly indicates correlations in the data (positive, negative, strong or weak relationships), illustrates non-linear patterns, shows the spread of the data and outliers, highlights atypical relationships, and can be used for extrapolation and interpolation.