Introduction

The Swiss National Bank (SNB) issues Swiss franc banknotes in denominations of 10, 20, 50, 100, 200 and 1,000 francs. Since autumn 2012, the SNB has been aware that a limited number of 1000-franc notes in circulation were not issued by the SNB. According to investigations, about 1,800 notes worth CHF 1.8 million were taken during the manufacturing process at Orell Füssli, and these notes had not gone through the entire manufacturing process. Moreover, Switzerland is unique among wealthy nations in that its banknotes used to expire: the SNB declared numerous earlier series no longer legal tender shortly after introducing a newer series. Notes from these "recalled" series could be exchanged at the SNB for valid notes for up to 20 years after the recall date, after which they lost all value. Consequently, a counterfeit detection system is needed so that holders of 1000-franc notes can verify that their notes are genuine. The banknote studied here is the 1000-franc note of the second series.

Description of Data

The data set contains six measurements, in millimetres, made on 100 genuine and 100 counterfeit old Swiss 1000-franc banknotes. The variables are:
\(X_1 =\) length of the bill (Length)
\(X_2 =\) height of the bill, left side (Left)
\(X_3 =\) height of the bill, right side (Right)
\(X_4 =\) distance of the inner frame to the lower border (Bottom)
\(X_5 =\) distance of the inner frame to the upper border (Top)
\(X_6 =\) length of the diagonal of the central picture (Diagonal)

Loading Packages and Importing Data

# Load the tidyverse (this also attaches dplyr, readr, tidyr and ggplot2)
library(tidyverse)
# Get the working directory
getwd()
## [1] "C:/Users/USER/Documents/Banknote"
# Import the banknote.csv data set (relative to the working directory above)
banknotes <- read_csv("banknote.csv")

Cleaning Data

# Viewing structure of data
head(banknotes)
## # A tibble: 6 x 7
##   Status  Length  Left Right Bottom   Top Diagonal
##   <chr>    <dbl> <dbl> <dbl>  <dbl> <dbl>    <dbl>
## 1 genuine   215.  131   131.    9     9.7     141 
## 2 genuine   215.  130.  130.    8.1   9.5     142.
## 3 genuine   215.  130.  130.    8.7   9.6     142.
## 4 genuine   215.  130.  130.    7.5  10.4     142 
## 5 genuine   215   130.  130.   10.4   7.7     142.
## 6 genuine   216.  131.  130.    9    10.1     141.
glimpse(banknotes)
## Rows: 200
## Columns: 7
## $ Status   <chr> "genuine", "genuine", "genuine", "genuine", "genuine", "genui~
## $ Length   <dbl> 214.8, 214.6, 214.8, 214.8, 215.0, 215.7, 215.5, 214.5, 214.9~
## $ Left     <dbl> 131.0, 129.7, 129.7, 129.7, 129.6, 130.8, 129.5, 129.6, 129.4~
## $ Right    <dbl> 131.1, 129.7, 129.7, 129.6, 129.7, 130.5, 129.7, 129.2, 129.7~
## $ Bottom   <dbl> 9.0, 8.1, 8.7, 7.5, 10.4, 9.0, 7.9, 7.2, 8.2, 9.2, 7.9, 7.7, ~
## $ Top      <dbl> 9.7, 9.5, 9.6, 10.4, 7.7, 10.1, 9.6, 10.7, 11.0, 10.0, 11.7, ~
## $ Diagonal <dbl> 141.0, 141.7, 142.2, 142.0, 141.8, 141.4, 141.6, 141.7, 141.9~
# Convert Status column to a factor variable
banknotes$Status <- as.factor(banknotes$Status)

# View banknotes data
banknotes
## # A tibble: 200 x 7
##    Status  Length  Left Right Bottom   Top Diagonal
##    <fct>    <dbl> <dbl> <dbl>  <dbl> <dbl>    <dbl>
##  1 genuine   215.  131   131.    9     9.7     141 
##  2 genuine   215.  130.  130.    8.1   9.5     142.
##  3 genuine   215.  130.  130.    8.7   9.6     142.
##  4 genuine   215.  130.  130.    7.5  10.4     142 
##  5 genuine   215   130.  130.   10.4   7.7     142.
##  6 genuine   216.  131.  130.    9    10.1     141.
##  7 genuine   216.  130.  130.    7.9   9.6     142.
##  8 genuine   214.  130.  129.    7.2  10.7     142.
##  9 genuine   215.  129.  130.    8.2  11       142.
## 10 genuine   215.  130.  130.    9.2  10       141.
## # ... with 190 more rows
# Checking for number of missing values and duplicates
missing_values <- sum(is.na(banknotes))
duplicates <- sum(duplicated(banknotes))
missing_values
## [1] 0
duplicates
## [1] 0

The old Swiss 1000-franc banknote data set has 200 observations (rows) of seven variables. banknotes$Status was changed from character to factor to prepare it for analysis; the remaining columns are numeric double (<dbl>) variables. missing_values shows that there are 0 missing values, and duplicates shows that there are also 0 duplicated rows.
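The two checks above rely on base R behaviour that is easy to confirm on a toy table. The sketch below uses a few values copied from the glimpse() output plus two hypothetical rows (a deliberate duplicate of row 1 and a counterfeit row with a missing Length) to show that sum(is.na()) counts missing cells, sum(duplicated()) counts rows identical to an earlier row, and factor() orders levels alphabetically, which is why counterfeit is listed before genuine in the tables that follow.

```r
# Toy data: three genuine rows from glimpse(), plus a hypothetical duplicate of
# row 1 and a hypothetical counterfeit row whose Length is missing
toy <- data.frame(
  Status = c("genuine", "genuine", "genuine", "genuine", "counterfeit"),
  Length = c(214.8, 214.6, 214.8, 214.8, NA),
  Bottom = c(9.0, 8.1, 8.7, 9.0, 9.7)
)
toy$Status <- factor(toy$Status)

n_missing    <- sum(is.na(toy))       # counts every NA cell in the table
n_duplicates <- sum(duplicated(toy))  # counts rows that repeat an earlier row

levels(toy$Status)  # alphabetical order: "counterfeit" "genuine"
n_missing
n_duplicates
```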

Summary Statistics of each variable

library(vtable)
## Loading required package: kableExtra
## 
## Attaching package: 'kableExtra'
## The following object is masked from 'package:dplyr':
## 
##     group_rows
# Summary statistics of variables
sumtable(banknotes, title = 'Summary Statistics of Banknote Variables')
Summary Statistics of Banknote Variables
Variable        N     Mean      Std. Dev.   Min     Pctl. 25   Pctl. 75   Max
Status          200
… counterfeit   100   50%
… genuine       100   50%
Length          200   214.896   0.377       213.8   214.6      215.1      216.3
Left            200   130.121   0.361       129     129.9      130.4      131
Right           200   129.957   0.404       129     129.7      130.225    131.1
Bottom          200   9.418     1.445       7.2     8.2        10.6       12.7
Top             200   10.65     0.803       7.7     10.1       11.2       12.3
Diagonal        200   140.483   1.152       137.8   139.5      141.5      142.4
# Summary statistics of variables based on Status
sumtable(banknotes, title = 'Summary Statistics of Banknote Variables based on Status',
         group = 'Status', group.test = TRUE)
Summary Statistics of Banknote Variables based on Status
                Status: counterfeit        Status: genuine
Variable        N     Mean      SD         N     Mean      SD       Test
Length          100   214.823   0.352      100   214.969   0.388    F=7.772***
Left            100   130.3     0.255      100   129.943   0.364    F=64.49***
Right           100   130.193   0.298      100   129.72    0.355    F=103.962***
Bottom          100   10.53     1.132      100   8.305     0.643    F=292.155***
Top             100   11.133    0.636      100   10.168    0.649    F=112.788***
Diagonal        100   139.45    0.558      100   141.517   0.447    F=836.069***
Statistical significance markers: * p<0.1; ** p<0.05; *** p<0.01
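With group.test = TRUE, the Test column reports an F statistic. Assuming it is a classical one-way ANOVA (the "F=" label suggests so; this is an assumption, not a quote from the vtable documentation), it can be approximately recovered from the rounded group sizes, means and SDs in the table. For two groups, this F also equals the square of the pooled-variance two-sample t statistic. The helper name anova_f_from_summary() is hypothetical.

```r
# One-way ANOVA F statistic computed from per-group summaries (n, mean, SD).
# Assumes equal-variance ANOVA: F = MS_between / MS_within.
anova_f_from_summary <- function(n, mean, sd) {
  k <- length(n)                                # number of groups
  grand_mean <- sum(n * mean) / sum(n)          # weighted overall mean
  ss_between <- sum(n * (mean - grand_mean)^2)  # between-group sum of squares
  ss_within  <- sum((n - 1) * sd^2)             # pooled within-group sum of squares
  (ss_between / (k - 1)) / (ss_within / (sum(n) - k))
}

# Rounded values from the grouped table, in the order (counterfeit, genuine)
f_diagonal <- anova_f_from_summary(n = c(100, 100),
                                   mean = c(139.45, 141.517),
                                   sd = c(0.558, 0.447))
f_bottom   <- anova_f_from_summary(n = c(100, 100),
                                   mean = c(10.53, 8.305),
                                   sd = c(1.132, 0.643))
round(c(Diagonal = f_diagonal, Bottom = f_bottom), 1)
```

Small differences from the printed F values come from rounding the means and SDs to three decimals before recomputing.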
# Filtering `banknotes` based on Status and removing the Status Column
banknotes_genuine <- banknotes %>% 
  filter(Status == "genuine") %>%  select(-Status)

banknotes_counterfeit <- banknotes %>% 
  filter(Status == "counterfeit") %>% select(-Status)

# Convert banknotes from wide to long format
banknotes_long <- pivot_longer(banknotes,
                               cols = c("Length", "Diagonal", "Top", "Bottom", "Left", "Right"),
                               names_to = "banknote_part")

# Merge banknote_part and Status column 
banknotes_long2 <- banknotes_long %>% 
  mutate(banknote_part_status = paste0(banknote_part, "_", Status)) %>% 
  select(-Status, -banknote_part)

banknotes_long2$banknote_part_status <- as.factor(banknotes_long2$banknote_part_status)

banknotes_long2
## # A tibble: 1,200 x 2
##    value banknote_part_status
##    <dbl> <fct>               
##  1 215.  Length_genuine      
##  2 141   Diagonal_genuine    
##  3   9.7 Top_genuine         
##  4   9   Bottom_genuine      
##  5 131   Left_genuine        
##  6 131.  Right_genuine       
##  7 215.  Length_genuine      
##  8 142.  Diagonal_genuine    
##  9   9.5 Top_genuine         
## 10   8.1 Bottom_genuine      
## # ... with 1,190 more rows
# Mean, standard deviation, minimum, maximum and 95% confidence interval of the mean
# for each measurement/status group (standard error = sd / sqrt(n))
summary_banknotes_long2 <- banknotes_long2 %>% 
  group_by(banknote_part_status) %>% 
  summarise(mean = mean(value), sd = sd(value), min = min(value), max = max(value),
            lower_ci = mean(value) - 1.96 * sd(value) / sqrt(n()), 
            upper_ci = mean(value) + 1.96 * sd(value) / sqrt(n())) %>% 
  arrange(desc(banknote_part_status))  

summary_banknotes_long2
## # A tibble: 12 x 7
##    banknote_part_status   mean    sd   min   max lower_ci upper_ci
##    <fct>                 <dbl> <dbl> <dbl> <dbl>    <dbl>    <dbl>
##  1 Top_genuine           10.2  0.649   7.7  11.7    10.0     10.3 
##  2 Top_counterfeit       11.1  0.636   9.1  12.3    11.0     11.3 
##  3 Right_genuine        130.   0.355 129   131.    130.     130.  
##  4 Right_counterfeit    130.   0.298 129.  131.    130.     130.  
##  5 Length_genuine       215.   0.388 214.  216.    215.     215.  
##  6 Length_counterfeit   215.   0.352 214.  216.    215.     215.  
##  7 Left_genuine         130.   0.364 129   131     130.     130.  
##  8 Left_counterfeit     130.   0.255 130.  131.    130.     130.  
##  9 Diagonal_genuine     142.   0.447 140.  142.    141.     142.  
## 10 Diagonal_counterfeit 139.   0.558 138.  141.    139.     140.  
## 11 Bottom_genuine         8.30 0.643   7.2  10.4     8.18     8.43
## 12 Bottom_counterfeit    10.5  1.13    7.4  12.7    10.3     10.8 

The banknotes data set was reshaped to long format (banknotes_long, then banknotes_long2) to prepare the grouped summary summary_banknotes_long2. It contains the mean, standard deviation, minimum, maximum and 95% confidence interval of the mean for each measurement paired with its status; the grouping variable is banknote_part_status.
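The 95% confidence interval of a mean is built from the standard error sd / sqrt(n), so for n = 100 the half-width is 1.96 * sd / 10. Dividing by n instead of sqrt(n) would make the half-width ten times too small at n = 100, and the interval would collapse onto the mean. The helpers ci_from_summary() and ci_from_vector() are hypothetical names for this sketch, which uses the same normal approximation (z = 1.96) as the summary code.

```r
# 95% normal-approximation confidence interval for a mean:
#   mean +/- z * sd / sqrt(n)
ci_from_summary <- function(mean, sd, n, z = 1.96) {
  half_width <- z * sd / sqrt(n)  # standard error is sd / sqrt(n), not sd / n
  c(lower = mean - half_width, upper = mean + half_width)
}

# Same interval computed from raw values
ci_from_vector <- function(x, z = 1.96) {
  ci_from_summary(mean(x), sd(x), length(x), z)
}

# Top_genuine from the grouped table: mean 10.168, SD 0.649, n = 100
ci_top_genuine <- ci_from_summary(10.168, 0.649, 100)
ci_top_genuine  # about 10.04 to 10.30

# The first ten genuine Top values shown by glimpse()
top_first10 <- c(9.7, 9.5, 9.6, 10.4, 7.7, 10.1, 9.6, 10.7, 11.0, 10.0)
ci_from_vector(top_first10)
```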

Plots of Summary Statistics of Banknote

# Plot of Status of 1000 old-Swiss francs Banknote Variables vs. Mean
summary_banknotes_long2 %>% 
  ggplot(aes(banknote_part_status,mean, fill = banknote_part_status)) + 
  geom_col() + geom_text(aes(label = mean)) + coord_flip() + labs(title = "Status of 1000 old-Swiss francs Banknote Variables vs. Mean")

# Plot of Status of 1000 old-Swiss francs Banknote Variables vs. Max Values
summary_banknotes_long2 %>% 
  ggplot(aes(banknote_part_status, max, fill = banknote_part_status)) + 
  geom_col() + geom_text(aes(label = max)) + coord_flip() + labs(title = "Status of 1000 old-Swiss francs Banknote Variables vs. Max Values")

# Plot of Status of 1000 old-Swiss francs Banknote Variables vs. Min Values
summary_banknotes_long2 %>% 
  ggplot(aes(banknote_part_status, min, fill = banknote_part_status)) + 
  geom_col() + geom_text(aes(label = min)) + coord_flip() + labs(title = "Status of 1000 old-Swiss francs Banknote Variables vs. Min Values")

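All the bar plots above show a margin of difference between genuine and counterfeit notes for each measurement. Because the bars start at zero, gaps of one or two millimetres are hard to see on values of roughly 130 to 215 mm; the grouped summary table shows the largest mean gaps for Bottom (10.53 counterfeit vs 8.305 genuine) and Diagonal (139.45 vs 141.517), and the smallest for Length (214.823 vs 214.969).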
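One hedged way to put numbers on the visual gaps is a standardized mean difference: the counterfeit-minus-genuine gap divided by the pooled standard deviation. This measure is not part of the original analysis. The sketch uses the rounded means and SDs from the grouped summary table, and the simple pooled-SD formula assumes equal group sizes (100 notes each).

```r
# Group means and SDs copied from the grouped summary table (rounded)
summ <- data.frame(
  variable  = c("Length", "Left", "Right", "Bottom", "Top", "Diagonal"),
  mean_coun = c(214.823, 130.3, 130.193, 10.53, 11.133, 139.45),
  mean_gen  = c(214.969, 129.943, 129.72, 8.305, 10.168, 141.517),
  sd_coun   = c(0.352, 0.255, 0.298, 1.132, 0.636, 0.558),
  sd_gen    = c(0.388, 0.364, 0.355, 0.643, 0.649, 0.447)
)

# Raw gap (counterfeit minus genuine) and the gap in pooled-SD units
summ$diff      <- summ$mean_coun - summ$mean_gen
summ$pooled_sd <- sqrt((summ$sd_coun^2 + summ$sd_gen^2) / 2)  # valid for equal n
summ$std_diff  <- summ$diff / summ$pooled_sd

# Largest standardized gaps first
summ[order(-abs(summ$std_diff)), c("variable", "diff", "std_diff")]
```

A negative std_diff means counterfeits are smaller on average; the ordering gives a scale-free ranking that bar heights starting at zero cannot show.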

Correlation Between 1000 old-Swiss francs Banknote numerical variables

# Helper that adds a prefix and/or suffix to every column name
appendDataFrameColumns <- function(df, prefix = "", suffix ="", sep = "") {
  colnames(df) <- paste(prefix, colnames(df), suffix, sep = sep)
  return(df) 
}

banknotes_genuine <- appendDataFrameColumns(banknotes_genuine, suffix = "_gen")
banknotes_genuine
## # A tibble: 100 x 6
##    Length_gen Left_gen Right_gen Bottom_gen Top_gen Diagonal_gen
##         <dbl>    <dbl>     <dbl>      <dbl>   <dbl>        <dbl>
##  1       215.     131       131.        9       9.7         141 
##  2       215.     130.      130.        8.1     9.5         142.
##  3       215.     130.      130.        8.7     9.6         142.
##  4       215.     130.      130.        7.5    10.4         142 
##  5       215      130.      130.       10.4     7.7         142.
##  6       216.     131.      130.        9      10.1         141.
##  7       216.     130.      130.        7.9     9.6         142.
##  8       214.     130.      129.        7.2    10.7         142.
##  9       215.     129.      130.        8.2    11           142.
## 10       215.     130.      130.        9.2    10           141.
## # ... with 90 more rows
banknotes_counterfeit <- appendDataFrameColumns(banknotes_counterfeit, suffix = "_coun")
banknotes_counterfeit
## # A tibble: 100 x 6
##    Length_coun Left_coun Right_coun Bottom_coun Top_coun Diagonal_coun
##          <dbl>     <dbl>      <dbl>       <dbl>    <dbl>         <dbl>
##  1        214.      130.       130.         9.7     11.7          140.
##  2        215.      130.       130.        11       11.5          140.
##  3        215.      130.       130.         8.7     11.7          140.
##  4        215       130.       131.         9.9     10.9          140.
##  5        215.      130.       130.        11.8     10.9          140.
##  6        215       130.       130.        10.6     10.7          140.
##  7        215.      130.       130.         9.3     12.1          140.
##  8        215.      130.       130.         9.8     11.5          140.
##  9        215       130.       130.        10       11.9          139.
## 10        215.      131.       131.        10.4     11.2          140.
## # ... with 90 more rows
# Cross join: pair every genuine note with every counterfeit note (100 x 100 = 10,000 rows)
banknotes_gen_coun <- banknotes_genuine %>% 
  full_join(banknotes_counterfeit, by = character())
banknotes_gen_coun
## # A tibble: 10,000 x 12
##    Length_gen Left_gen Right_gen Bottom_gen Top_gen Diagonal_gen Length_coun
##         <dbl>    <dbl>     <dbl>      <dbl>   <dbl>        <dbl>       <dbl>
##  1       215.      131      131.          9     9.7          141        214.
##  2       215.      131      131.          9     9.7          141        215.
##  3       215.      131      131.          9     9.7          141        215.
##  4       215.      131      131.          9     9.7          141        215 
##  5       215.      131      131.          9     9.7          141        215.
##  6       215.      131      131.          9     9.7          141        215 
##  7       215.      131      131.          9     9.7          141        215.
##  8       215.      131      131.          9     9.7          141        215.
##  9       215.      131      131.          9     9.7          141        215 
## 10       215.      131      131.          9     9.7          141        215.
## # ... with 9,990 more rows, and 5 more variables: Left_coun <dbl>,
## #   Right_coun <dbl>, Bottom_coun <dbl>, Top_coun <dbl>, Diagonal_coun <dbl>
# Correlation Matrix of numerical variables
banknotes.cor <- round(cor(banknotes_gen_coun), 2)
banknotes.cor
##               Length_gen Left_gen Right_gen Bottom_gen Top_gen Diagonal_gen
## Length_gen          1.00     0.41      0.42       0.23    0.06         0.03
## Left_gen            0.41     1.00      0.66       0.24    0.21        -0.26
## Right_gen           0.42     0.66      1.00       0.25    0.13        -0.15
## Bottom_gen          0.23     0.24      0.25       1.00   -0.63         0.00
## Top_gen             0.06     0.21      0.13      -0.63    1.00        -0.26
## Diagonal_gen        0.03    -0.26     -0.15       0.00   -0.26         1.00
## Length_coun         0.00     0.00      0.00       0.00    0.00         0.00
## Left_coun           0.00     0.00      0.00       0.00    0.00         0.00
## Right_coun          0.00     0.00      0.00       0.00    0.00         0.00
## Bottom_coun         0.00     0.00      0.00       0.00    0.00         0.00
## Top_coun            0.00     0.00      0.00       0.00    0.00         0.00
## Diagonal_coun       0.00     0.00      0.00       0.00    0.00         0.00
##               Length_coun Left_coun Right_coun Bottom_coun Top_coun
## Length_gen           0.00      0.00       0.00        0.00     0.00
## Left_gen             0.00      0.00       0.00        0.00     0.00
## Right_gen            0.00      0.00       0.00        0.00     0.00
## Bottom_gen           0.00      0.00       0.00        0.00     0.00
## Top_gen              0.00      0.00       0.00        0.00     0.00
## Diagonal_gen         0.00      0.00       0.00        0.00     0.00
## Length_coun          1.00      0.35       0.23       -0.25     0.09
## Left_coun            0.35      1.00       0.61       -0.08    -0.07
## Right_coun           0.23      0.61       1.00       -0.06     0.00
## Bottom_coun         -0.25     -0.08      -0.06        1.00    -0.68
## Top_coun             0.09     -0.07       0.00       -0.68     1.00
## Diagonal_coun        0.06     -0.04       0.21        0.38    -0.06
##               Diagonal_coun
## Length_gen             0.00
## Left_gen               0.00
## Right_gen              0.00
## Bottom_gen             0.00
## Top_gen                0.00
## Diagonal_gen           0.00
## Length_coun            0.06
## Left_coun             -0.04
## Right_coun             0.21
## Bottom_coun            0.38
## Top_coun              -0.06
## Diagonal_coun          1.00
# Correlogram of banknote numerical variables
library(corrplot)
## corrplot 0.92 loaded
corrplot(banknotes.cor, title = "Correlogram of Banknote numerical variables based on Status", mar=c(0,0,2,0))

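In banknotes.cor, every entry of the genuine-versus-counterfeit block is exactly 0.00. This follows from the cross join rather than from the notes themselves: every genuine note is paired with every counterfeit note, so the two sets of columns vary independently by construction. The informative parts of the matrix and the correlogram are the two within-group blocks; for example, Bottom and Top are negatively correlated in both genuine (-0.63) and counterfeit (-0.68) notes, while Left and Right are positively correlated (0.66 and 0.61).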
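Why is the genuine-versus-counterfeit block zero? In a full cross join each genuine value appears once alongside every counterfeit value, so the covariance between a genuine column and a counterfeit column factorises into a product of two sums of deviations from the mean, each of which is zero. Replicating every row the same number of times also leaves within-group correlations unchanged. The sketch checks both facts on five notes per group copied from the printed tables, using base R's merge(by = NULL), which returns the Cartesian product, as a stand-in for full_join(by = character()).

```r
# First five genuine and first five counterfeit notes (Bottom, Top) from the printed tables
gen  <- data.frame(Bottom_gen  = c(9.0, 8.1, 8.7, 7.5, 10.4),
                   Top_gen     = c(9.7, 9.5, 9.6, 10.4, 7.7))
coun <- data.frame(Bottom_coun = c(9.7, 11.0, 8.7, 9.9, 11.8),
                   Top_coun    = c(11.7, 11.5, 11.7, 10.9, 10.9))

# merge() with by = NULL returns every gen row paired with every coun row
crossed <- merge(gen, coun, by = NULL)
nrow(crossed)  # 25

# Across the two blocks the correlation is zero by construction (up to rounding error)
cross_cor <- cor(crossed$Bottom_gen, crossed$Bottom_coun)

# Within a block, equal replication of rows leaves the correlation unchanged
within_cross <- cor(crossed$Bottom_gen, crossed$Top_gen)
within_raw   <- cor(gen$Bottom_gen, gen$Top_gen)
c(cross = cross_cor, within_crossed = within_cross, within_raw = within_raw)
```

On the full data, the within-group matrices can be obtained directly with cor(banknotes_genuine) and cor(banknotes_counterfeit), with no join needed.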

Exact Nature of a Counterfeit or Genuine Currency

# Row-wise difference between counterfeit and genuine measurements
banknotes_error <- banknotes_counterfeit - banknotes_genuine
banknotes_error <- rename(banknotes_error, Length = Length_coun , Left = Left_coun, Right = Right_coun,
                          Bottom = Bottom_coun, Top = Top_coun,
                          Diagonal = Diagonal_coun)

head(banknotes_error, n = 19) 
##    Length Left Right Bottom  Top Diagonal
## 1    -0.4 -0.9  -0.8    0.7  2.0     -1.2
## 2     0.3  0.8   0.5    2.9  2.0     -2.2
## 3     0.1  0.6   0.4    0.0  2.1     -2.0
## 4     0.2  0.7   1.0    2.4  0.5     -1.7
## 5    -0.3  0.6   0.6    1.4  3.2     -2.1
## 6    -0.7 -0.6  -0.3    1.6  0.6     -1.5
## 7    -0.2  0.8   0.4    1.4  2.5     -1.4
## 8     0.3  0.5   1.2    2.6  0.8     -1.8
## 9     0.1  0.8   0.2    1.8  0.9     -2.5
## 10    0.0  0.2   0.5    1.2  1.2     -0.4
## 11   -0.1  0.0   0.0    0.1 -0.2     -2.6
## 12    0.0  1.0   0.7    2.9  1.0     -2.1
## 13    0.2 -0.1   1.5    1.8  1.0     -0.8
## 14    0.2  0.7   0.2    3.7  0.1     -1.8
## 15    0.0  0.4   0.3    2.9  0.0     -2.1
## 16    1.0  0.6   0.2   -1.1  2.7     -2.4
## 17    0.1  0.7   0.0    3.6  0.7     -1.9
## 18   -0.3  0.5   0.4    3.1  1.4     -2.0
## 19   -0.4  0.9   0.6    3.6 -0.5     -1.5
# Number of error values greater than zero in each column
# (counterfeit larger than genuine)
banknotes_greater <- colSums(banknotes_error > 0)

# Number of error values less than zero in each column
# (counterfeit smaller than genuine)
banknotes_less <- colSums(banknotes_error < 0)

# Counterfeits with measurements bigger than genuine
banknotes_greater
##   Length     Left    Right   Bottom      Top Diagonal 
##       33       77       83       95       80        1
# Counterfeits with measurements smaller than genuine
banknotes_less
##   Length     Left    Right   Bottom      Top Diagonal 
##       59       20       11        4       16       99

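banknotes_error is the row-wise difference Counterfeit minus Genuine: a value greater than 0 means the counterfeit note in that row measures larger than the genuine note in the same row, and a value less than 0 means it measures smaller; zeros (ties) are counted in neither total. For each variable, banknotes_greater and banknotes_less are compared side by side, and the larger count indicates which group tends to measure larger. Because genuine and counterfeit notes are independent samples, the row pairing is arbitrary, so these counts are rough indications rather than exact comparisons.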
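A pairing-free alternative, not used in the original analysis, compares every counterfeit note with every genuine note and reports the share of pairs in which the counterfeit measures larger, counting ties as one half (often called the probability of superiority). This equals the Mann-Whitney W statistic divided by the number of pairs. The helper prob_superiority() is a hypothetical name; the sketch uses the Bottom values of the first five notes in each group from the printed tables.

```r
# Share of all (counterfeit, genuine) pairs in which the counterfeit value is larger,
# with ties counted as one half
prob_superiority <- function(counterfeit, genuine) {
  diffs <- outer(counterfeit, genuine, "-")  # diffs[i, j] = counterfeit[i] - genuine[j]
  mean(diffs > 0) + 0.5 * mean(diffs == 0)
}

coun_bottom <- c(9.7, 11.0, 8.7, 9.9, 11.8)
gen_bottom  <- c(9.0, 8.1, 8.7, 7.5, 10.4)

p_bottom <- prob_superiority(coun_bottom, gen_bottom)
p_bottom  # 0.82: counterfeit larger in 20 of 25 pairs, plus 1 tie

# Same quantity via the Mann-Whitney W statistic
# (suppressWarnings hides the "cannot compute exact p-value with ties" warning)
w_bottom <- suppressWarnings(wilcox.test(coun_bottom, gen_bottom))$statistic
unname(w_bottom) / (length(coun_bottom) * length(gen_bottom))
```

On the full data, prob_superiority(banknotes_counterfeit$Bottom_coun, banknotes_genuine$Bottom_gen) would use all 10,000 pairs instead of the 100 row-matched ones.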

Conclusion

Compared with genuine notes, the counterfeit old Swiss 1000-franc notes tend to have larger Left and Right heights and larger Bottom and Top margins, but a shorter Diagonal. Length differs least: counterfeits are on average only about 0.15 mm shorter (214.823 vs 214.969), although the F-test in the grouped summary table still flags this small difference as significant (F = 7.772, p < 0.01).