Construction of Automated Banking Machine Location Data

This appendix outlines the steps taken to construct the automated banking machine (ABM) data used in this paper. The construction process proceeded in three steps, outlined below.

Step 1: Identify and Drop Duplicates in Mastercard Data

Identify Duplicates (2023)

Code
# load packages 
library(tidyverse)
library(janitor)
library(knitr)
library(kableExtra)
library(gt)

# import raw MasterCard data

mc_raw_2023 <- read_csv("data/unprocessed/mastercard/BOC_All_ATMImage(December 2023).txt", guess_max = 50000) %>% 
  clean_names()

Within Sponsors

In this step, we are identifying duplicates within each sponsor. We treat both the terminal ID and sponsor name as a combined key. We then add a column called duplicate_tag_1 that indicates whether each terminal ID appears once, twice, or more than twice within the same sponsor:

  • Tag 0: The terminal ID appears only once within a sponsor.
  • Tag 1: The terminal ID appears exactly twice within a sponsor.
  • Tag 2: The terminal ID appears more than twice within a sponsor.

This helps us manage duplicates at the sponsor level and ensures we accurately identify repeated terminal IDs within each sponsor’s dataset.

Code
tagged_2_2023 <- mc_raw_2023 %>% 
  group_by(inst_assigned_term_id, sponsor_name) %>%
  mutate(duplicate_tag_1 = case_when(
    n() == 1 ~ 0,
    n() == 2 ~ 1,
    n() > 2 ~ 2
  )) %>% 
  ungroup()

tagged_2_2023 %>% 
  count(sponsor_name, duplicate_tag_1) %>% 
  pivot_wider(names_from = duplicate_tag_1, values_from = n, names_prefix = "tag_") %>% 
  # Replace missing values (NA) with 0 in the reshaped data
  mutate(across(where(is.integer), ~ replace_na(., 0))) %>% 
  # Create a formatted table with 'kable' for better presentation
  kbl(caption = "Table 2: Duplicates Within Sponsors (2023)", 
      col.names = c("Sponsor Name", "Tag 0", "Tag 1", "Tag 2")) %>% 
  kable_classic()
Table 2: Duplicates Within Sponsors (2023)
Sponsor Name Tag 0 Tag 1 Tag 2
BANK OF MONTREAL 37875 52 0
BANK OF NOVA SCOTIA 2434 48 18
CANADIAN IMPERIAL BANK OF COMMERCE 2968 0 0
DIRECTCASH BANK 7255 128 0
EVERLINK PAYMENT SERVICES INC. 1851 0 0
FEDERATION DES CAISSES DESJARDINS DU QUEBEC 1875 0 0
HSBC BANK CANADA 168 0 0
MONERIS SOLUTIONS CORPORATION 26397 0 0
NATIONAL BANK OF CANADA 927 0 0
ROYAL BANK OF CANADA 3955 0 0
TORONTO-DOMINION BANK, THE 2691 0 0

For Directcash Bank, some of the duplicates are due to the same record assigned once to its actual address, and once to a PO Box. Otherwise, the records look exactly the same.

Code
tagged_2_2023 %>% 
  filter(duplicate_tag_1 == 1) %>%
  select(sponsor_name, owner_name, 
         inst_assigned_term_id, address1, 
         city_name, location_name) %>% 
  filter(inst_assigned_term_id == "D1018430" | 
           inst_assigned_term_id == "D1027205" | 
           inst_assigned_term_id == "D1032094") %>% 
  arrange(inst_assigned_term_id) %>% 
  kbl(col.names = 
        c("Sponsor Name", "Owner Name", "Terminal ID", 
          "Address", "City", "Location Name")) %>% 
  kable_classic() %>% 
  row_spec(1, color = "red") %>% 
  row_spec(2, color = "red") %>% 
  row_spec(3, color = "green") %>% 
  row_spec(4, color = "green") %>% 
  row_spec(5, color = "blue") %>% 
  row_spec(6, color = "blue")
Sponsor Name Owner Name Terminal ID Address City Location Name
DIRECTCASH BANK ALVENA CO-OPERATIVE ASSOCIATION LTD D1018430 103 RAILWAY AVE ALVENA ALVENA COOP
DIRECTCASH BANK ALVENA CO-OPERATIVE ASSOCIATION LTD D1018430 PO BOX 150 ALVENA ALVENA COOP
DIRECTCASH BANK DC PAYMENTS CA D1027205 215 VICTORIA AVE MONARCH MONARCH POST OFFICE
DIRECTCASH BANK DC PAYMENTS CA D1027205 PO BOX 168 MONARCH MONARCH POST OFFICE
DIRECTCASH BANK TIN ROOSTER HOLDINGS LTD D1032094 BOULDER AVE DEASE LAKE DEASE LAKE SUPER A
DIRECTCASH BANK TIN ROOSTER HOLDINGS LTD D1032094 PO BOX 160 DEASE LAKE DEASE LAKE SUPER A

Deduplication rule: When DirectCash Bank records share the same terminal ID but have different addresses, we drop the record where the address contains a PO Box and retain the record with a physical address.

Scotiabank duplicates have the same record with its address written differently, such as different city names with the same address, or one record ignores the street number while the other does have it. These are real duplicates as well.

Code
tagged_2_2023 %>% 
  filter(duplicate_tag_1 != 0) %>%
  select(sponsor_name, owner_name, last_reported, inst_assigned_term_id, address1, city_name, latitude, longitude) %>%
  filter(inst_assigned_term_id == "AE25" | inst_assigned_term_id == "AH08") %>% 
  arrange(inst_assigned_term_id) %>% 
  kbl(col.names = c("Sponsor Name", "Owner Name", "Last Reported",
                    "Terminal ID", "Address", "City", "Latitude", "Longitude")) %>% 
  kable_classic() %>% 
  row_spec(1, color = "red") %>% 
  row_spec(2, color = "red") %>% 
  row_spec(3, color = "red") %>% 
  row_spec(4, color = "green") %>% 
  row_spec(5, color = "green")
Sponsor Name Owner Name Last Reported Terminal ID Address City Latitude Longitude
BANK OF NOVA SCOTIA SCOTIABANK 24-11-2023 14:20:10 AE25 18TH ST RICHMOND AVE BRANDON 49.82750 -99.96201
BANK OF NOVA SCOTIA SCOTIABANK 24-11-2023 14:20:10 AE25 1570 18TH ST BRANDON 49.82204 -99.96358
BANK OF NOVA SCOTIA SCOTIABANK 24-11-2023 14:20:10 AE25 18TH ST RICHMOND AVE BRANDON 49.82750 -99.96201
BANK OF NOVA SCOTIA SCOTIABANK 24-11-2023 14:20:10 AH08 3401 DUFFERIN ST TORONTO 43.72598 -79.45241
BANK OF NOVA SCOTIA SCOTIABANK 24-11-2023 14:20:10 AH08 3401 DUFFERIN ST NORTH YORK 43.72611 -79.45227

BMO ATM duplicates seem to be the same ATM reported under ownership of both Cardtronics and Direct Cash. Since Cardtronics had already acquired Direct Cash in 2017, this seems to be real duplicates as well.

Code
tagged_2_2023 %>% 
  filter(duplicate_tag_1 != 0) %>%
  select(sponsor_name, owner_name, inst_assigned_term_id, address1, city_name) %>% 
  filter(inst_assigned_term_id == "D1018373" | 
           inst_assigned_term_id == "D1057885" | 
           inst_assigned_term_id == "D1066100") %>% 
  arrange(inst_assigned_term_id) %>% 
  kbl(col.names = c("Sponsor Name", "Owner Name", "Terminal ID", "Address", "City")) %>% 
  kable_classic() %>% 
  row_spec(1, color = "red") %>% 
  row_spec(2, color = "red") %>% 
  row_spec(3, color = "green") %>% 
  row_spec(4, color = "green") %>% 
  row_spec(5, color = "orange") %>% 
  row_spec(6, color = "orange")
Sponsor Name Owner Name Terminal ID Address City
BANK OF MONTREAL DIRECT CASH D1018373 4650 SHEPPARD AVE E SCARBOROUGH
BANK OF MONTREAL CARDTRONICS D1018373 1691 CT GAS SCARBOROUGH SCARBOROUGH
BANK OF MONTREAL DIRECT CASH D1057885 165 MAIN ST ANTIGONISH
BANK OF MONTREAL CARDTRONICS D1057885 476 CENTRE ST N NAPANEE
BANK OF MONTREAL DIRECT CASH D1066100 6250 NO 3 RD RICHMOND
BANK OF MONTREAL CARDTRONICS D1066100 RICHMOND BRIDGEHOUSE STATION 6250 NO 3 ROAD RICHMOND

Deduplication rule: When records within the same sponsor (BMO) share the same terminal ID but have different owner names, specifically Cardtronics and Direct Cash, we drop the record associated with Cardtronics and retain the record associated with DirectCash..

The other BMO ATM duplicates have two owners as well: MBE and RapidCash.

Code
tagged_2_2023 %>%
  filter(duplicate_tag_1 != 0) %>%
  select(sponsor_name, owner_name, inst_assigned_term_id, address1, city_name) %>%
  filter(inst_assigned_term_id == "RCA00539" | inst_assigned_term_id == "RCA00540" |
    inst_assigned_term_id == "RCA00630") %>% 
  arrange(desc(inst_assigned_term_id)) %>% 
  kbl(col.names = c("Sponsor Name", "Owner Name", "Terminal ID", "Address", "City")) %>% 
  kable_classic() %>% 
  row_spec(1, color = "red") %>% 
  row_spec(2, color = "red") %>% 
  row_spec(3, color = "green") %>% 
  row_spec(4, color = "green") %>% 
  row_spec(5, color = "orange") %>% 
  row_spec(6, color = "orange")
Sponsor Name Owner Name Terminal ID Address City
BANK OF MONTREAL MBE RCA00630 1ST ST VERNON
BANK OF MONTREAL RAPID CASH ATM LTD RCA00630 12900 29 AVE VERNON
BANK OF MONTREAL MBE RCA00540 TC HWY 1 CALGARY
BANK OF MONTREAL RAPID CASH ATM LTD RCA00540 1414 3 ST SW CALGARY
BANK OF MONTREAL MBE RCA00539 TC HWY 1 CALGARY
BANK OF MONTREAL RAPID CASH ATM LTD RCA00539 1414 3 ST SW CALGARY

Deduplication rule: When records within the same sponsor (BMO) share the same terminal ID but have different owner names, specifically MBE and Rapid Cash, we drop the record associated with MBE and retain the record associated with Rapid Cash.

Across Sponsors

We begin by treating the terminal ID as a unique identifier. To manage duplicates, we add a column called duplicate tag that indicates whether each terminal ID appears once, twice, or more than twice. This tagging helps highlight duplicates in the data:

  • Tag 0: The terminal ID appears only once (i.e., it’s unique).
  • Tag 1: The terminal ID appears exactly twice.
  • Tag 2: The terminal ID appears more than twice.

Finally, we count the number of occurrences for each combination of sponsor name and duplicate tag to better understand the distribution of duplicates.

Code
# Group the raw Mastercard 2023 data by 'inst_assigned_term_id'
# inst_assigned_term_id is terminal ID in the 2023 mastercard data

tagged_2023 <- mc_raw_2023 %>% 
  group_by(inst_assigned_term_id) %>%
  
  # Create a new column 'duplicate_tag' to label the number of duplicates:
  # - Assign 0 if there's only one record for this 'inst_assigned_term_id'
  # - Assign 1 if there are exactly two records
  # - Assign 2 if there are more than two records
  mutate(duplicate_tag = case_when(
    n() == 1 ~ 0,
    n() == 2 ~ 1,
    n() > 2 ~ 2
  )) %>% 
  
  # Remove the grouping structure
  ungroup()

# Count the occurrences of each 'sponsor_name' by 'duplicate_tag', 
# then reshape the data to have columns for each 'duplicate_tag' (0, 1, 2)
tagged_2023 %>% 
  count(sponsor_name, duplicate_tag) %>% 
  pivot_wider(names_from = duplicate_tag, values_from = n, names_prefix = "tag_") %>% 
  
  # Replace missing values (NA) with 0 in the reshaped data
  mutate(across(where(is.integer), ~ replace_na(., 0))) %>% 
  
  # Create a formatted table with 'kable' for better presentation
  kbl(caption = "Table 1: Duplicates Across Sponsors (2023)", col.names = c("Sponsor Name", "Tag 0", "Tag 1", "Tag 2")) %>% 
  kable_classic() %>% 
  kable_styling(full_width = FALSE)
Table 1: Duplicates Across Sponsors (2023)
Sponsor Name Tag 0 Tag 1 Tag 2
BANK OF MONTREAL 9915 27845 167
BANK OF NOVA SCOTIA 2341 136 23
CANADIAN IMPERIAL BANK OF COMMERCE 2714 239 15
DIRECTCASH BANK 4493 2806 84
EVERLINK PAYMENT SERVICES INC. 1747 1 103
FEDERATION DES CAISSES DESJARDINS DU QUEBEC 1875 0 0
HSBC BANK CANADA 168 0 0
MONERIS SOLUTIONS CORPORATION 1460 24833 104
NATIONAL BANK OF CANADA 927 0 0
ROYAL BANK OF CANADA 3861 92 2
TORONTO-DOMINION BANK, THE 2691 0 0

It seems that the duplicates are concentrated in BMO (i.e. Moneris) and Directcash (which has been acquired by BMO according to MC).

BMO- Moneris Duplicates

BMO has 9915 unique terminal IDs, but 27845 Terminals appear more than once. And 167 Terminals appear at least twice. Moneris has 1460 unique terminal IDs, and 24833 are duplicates. And 104 terminals appear more than once. These are the primary drivers of the duplicates.

Here are examples of BMO- Moneris duplicates.

Code
filter_bmo_examples <- c("001BCLVS", "001NYKBA", "TOT00005")

tagged_2023 %>% 
  select(sponsor_name, owner_name, inst_assigned_term_id, address1, city_name, location_name, duplicate_tag) %>%
  filter(duplicate_tag == 1) %>% 
  arrange(inst_assigned_term_id, sponsor_name) %>% 
  select(sponsor_name, owner_name, inst_assigned_term_id, address1, city_name, location_name) %>% 
  filter(inst_assigned_term_id %in% filter_bmo_examples) %>% 
  kbl(col.names = c("Sponsor Name", "Owner Name", "Terminal ID", "Address", "City", "Location Name")) %>%  
  kable_classic() %>% 
  #kbl(col.names = c("Sponsor Name", "Owner Name", "Terminal ID", "Address", "City", "Location Name")) %>% 
  row_spec(1, color = "red") %>% 
  row_spec(2, color = "red") %>% 
  row_spec(3, color = "green") %>% 
  row_spec(4, color = "green") %>% 
  row_spec(5, color = "blue") %>% 
  row_spec(6, color = "blue") 
Sponsor Name Owner Name Terminal ID Address City Location Name
BANK OF MONTREAL FLEX TOUCH CANADA 4 001BCLVS 19925 WILLOWBROOK DR UNIT 101 LANGLEY SAMMY J PEPPERS
MONERIS SOLUTIONS CORPORATION FLEX TOUCH CANADA 4 001BCLVS 19925 WILLOWBROOK DR UNIT 101 LANGLEY SAMMY J PEPPERS
BANK OF MONTREAL FLEX TOUCH CANADA 4 001NYKBA 1731 TAPSCOTT RD SCARBOROUGH E GAMES PLUS
MONERIS SOLUTIONS CORPORATION FLEX TOUCH CANADA 4 001NYKBA 1731 TAPSCOTT RD SCARBOROUGH E GAMES PLUS
BANK OF MONTREAL CASH IN TIME TOT00005 2694 BROOKDALE AVE CORNWALL SUPER 8
MONERIS SOLUTIONS CORPORATION CASH IN TIME TOT00005 2694 BROOKDALE AVE CORNWALL SUPER 8

Deduplication rule: When BMO and Moneris share the same terminal ID, we use the Sponsor Name to determine which record to keep. We drop the record associated with Moneris and retain the record associated with BMO.

BMO- Direct Cash Duplicates

We also know that direct cash has been acquired by BMO. In these cases, we see the sponsor is Direct Cash in one entry, and the owner is direct cash in another entry.

Code
filter_dc_examples <- c("D4024406", "D4026911", "D4018524", "D4018462")

tagged_2023 %>% 
  select(sponsor_name, owner_name, inst_assigned_term_id, address1, city_name, location_name, duplicate_tag) %>%
  filter(duplicate_tag == 1) %>% 
  filter(inst_assigned_term_id %in% filter_dc_examples) %>% 
  select(-duplicate_tag) %>% 
  arrange(desc(inst_assigned_term_id)) %>% 
  kbl(col.names = c("Sponsor Name", "Owner Name", "Terminal ID", "Address", "City", "Location Name")) %>% 
  kable_classic() %>% 
  row_spec(1, color = "red") %>% 
  row_spec(2, color = "red") %>% 
  row_spec(3, color = "green") %>% 
  row_spec(4, color = "green") %>% 
  row_spec(5, color = "blue") %>% 
  row_spec(6, color = "blue") %>% 
  row_spec(7, color = "orange") %>% 
  row_spec(8, color = "orange")
Sponsor Name Owner Name Terminal ID Address City Location Name
BANK OF MONTREAL DIRECT CASH D4026911 204 DAKOTA DUNES WAY WHITECAP SIGA DAKOTA DUNES 1
DIRECTCASH BANK SASKATCHEWAN INDIAN GAMING AUTHORITY INC D4026911 204 DAKOTA DUNES WAY WHITECAP SIGA DAKOTA DUNES 1
BANK OF MONTREAL DIRECT CASH D4024406 9208 JASPER AVE NW EDMONTON RIVERVIEW HUSKY 2
DIRECTCASH BANK ROCKY STAR ENTERPRISE LTD D4024406 9208 JASPER AVE NW EDMONTON RIVERVIEW HUSKY 2
BANK OF MONTREAL DIRECT CASH D4018524 377 RIVERSIDE DR E WINDSOR CAESARS WINDSOR EMPLOYEES LOUNGE
DIRECTCASH BANK DC PAYMENTS CA D4018524 377 RIVERSIDE DR E WINDSOR CAESARS WINDSOR EMPLOYEES LOUNGE
BANK OF MONTREAL DIRECT CASH D4018462 44 MARQUIS RD W PRINCE ALBERT SIGA NORTHERN LIGHTS 2
DIRECTCASH BANK DC PAYMENTS CA D4018462 44 MARQUIS RD W PRINCE ALBERT SIGA NORTHERN LIGHTS 2

Deduplication rule: When BMO and DirectCash Bank share the same terminal ID, we use the Sponsor Name to determine which record to keep. We drop the record associated with DirectCash Bank and retain the record associated with BMO.

Other

Most of the duplicates in the 2023 data originate from Moneris and Direct Cash. By applying the rules above, we can clean up most of the data, but a few other duplicates may still need attention.

For the few duplicate CIBC ATMs, it seems that they are actually owned by Cardtronics, but branded by CIBC.

Code
filter_cibc_examples <- c("MS909C9E", "MS909G9J", "MS909G9C", "MS909F9W")

tagged_2023 %>% 
  select(sponsor_name, owner_name, inst_assigned_term_id, address1, city_name, location_name, duplicate_tag) %>%
  filter(duplicate_tag == 1) %>% 
  filter(inst_assigned_term_id %in% filter_cibc_examples) %>% 
  select(-duplicate_tag) %>% 
  arrange(desc(inst_assigned_term_id)) %>% 
  kbl(col.names = c("Sponsor Name", "Owner Name", "Terminal ID", "Address", "City", "Location Name")) %>% 
  kable_classic() %>% 
  row_spec(1, color = "red") %>% 
  row_spec(2, color = "red") %>% 
  row_spec(3, color = "green") %>% 
  row_spec(4, color = "green") %>% 
  row_spec(5, color = "blue") %>% 
  row_spec(6, color = "blue") %>% 
  row_spec(7, color = "orange") %>% 
  row_spec(8, color = "orange")
Sponsor Name Owner Name Terminal ID Address City Location Name
CANADIAN IMPERIAL BANK OF COMMERCE CIBC MS909G9J 1131 DUNDAS ST W MISSISSAUGA GIANT TIGER
BANK OF MONTREAL CARDTRONICS MS909G9J 1131 DUNDAS ST W UNIT M1 MISSISSAUGA GIANT TIGER 234
CANADIAN IMPERIAL BANK OF COMMERCE CIBC MS909G9C 9225 HWY 93 MIDLAND GIANT TIGER
BANK OF MONTREAL CARDTRONICS MS909G9C 9225 COUNTY RD 93 HURONIA MALL MIDLAND GIANT TIGER 128
CANADIAN IMPERIAL BANK OF COMMERCE CIBC MS909F9W 800 BOUL THIBEAU TROISRIVIRES GIANT TIGER
BANK OF MONTREAL CARDTRONICS MS909F9W 800 BOUL THIBEAU TROISRIVIRES GIANT TIGER 116
CANADIAN IMPERIAL BANK OF COMMERCE CIBC MS909C9E 1250 MAIN ST OTTAWA IDA PHARMACY
BANK OF MONTREAL CARDTRONICS MS909C9E 1250 MAIN ST OTTAWA IDA PHARMACY

Deduplication rule: Retain the record associated with CIBC

It seems that the RBC and Scotiabank’s “duplicated ATMs” are not duplicates, but different ATMs present in their respective branches. This could be because the terminal variable is only 4 digits in each case. For example, there are NOT duplicates.

Code
filter_rbc_examples <- c("WD21", "WC23", "VM20")

tagged_2023 %>% 
  select(sponsor_name, owner_name, inst_assigned_term_id, address1, city_name, location_name, duplicate_tag) %>%
  filter(duplicate_tag == 1) %>% 
  filter(inst_assigned_term_id %in% filter_rbc_examples) %>% 
  select(-duplicate_tag) %>% 
  arrange(inst_assigned_term_id) %>% 
  kbl(col.names = c("Sponsor Name", "Owner Name", "Terminal ID", "Address", "City", "Location Name")) %>% 
  kable_classic() 
Sponsor Name Owner Name Terminal ID Address City Location Name
ROYAL BANK OF CANADA ROYAL BANK OF CANADA VM20 2288 WHATCOM RD ABBOTSFORD SUMAS MOUNTAIN VI DT
BANK OF NOVA SCOTIA SCOTIABANK VM20 7030 ALBERNI ST POWELL RIVER POWELL RIVER 1
ROYAL BANK OF CANADA ROYAL BANK OF CANADA WC23 220 PORTAGE AVE WINNIPEG WINNIPEG MNWE54AUD
BANK OF NOVA SCOTIA SCOTIABANK WC23 1091 ST MARYS RD WINNIPEG DAKOTA VILLAGE 3
ROYAL BANK OF CANADA ROYAL BANK OF CANADA WD21 85 MAIN ST MELITA MELITA BRWH28 AUD
BANK OF NOVA SCOTIA SCOTIABANK WD21 1195 ARTHUR ST W THUNDER BAY ARTHUR PARKDALE

Deduplication rule: These are not real duplicates. Nothing to do here.

Now, we explore the cases of duplicates with Tag 2 – i.e., when there are more than two instances of the same terminal ID.

There are two cases of double ATMs in Scotiabank branches that have the same 4-digit terminal ID as RBC ATMs. As mentioned in the previous paragraph, these are not real duplicates, so we do not drop them.

Code
filter_trip_bmo_rbc_examples <- c("CM25", "CM23")

tagged_2023 %>% 
  arrange(inst_assigned_term_id) %>% 
  filter(duplicate_tag == 2) %>%
  arrange(desc(inst_assigned_term_id)) %>% 
  select(sponsor_name, owner_name, inst_assigned_term_id, address1, location_name, duplicate_tag) %>% 
  filter(inst_assigned_term_id %in% filter_trip_bmo_rbc_examples) %>% 
  select(-duplicate_tag) %>% 
  kbl(col.names = c("Sponsor Name", "Owner Name", "Terminal ID", "Address", "Location Name")) %>% 
  kable_classic()
Sponsor Name Owner Name Terminal ID Address Location Name
ROYAL BANK OF CANADA ROYAL BANK OF CANADA CM25 4804 50 ST SANDSTONE HENDAY AUD
BANK OF NOVA SCOTIA SCOTIABANK CM25 1265 YORK MILLS RD PARKWOOD VILLAGE 3
BANK OF NOVA SCOTIA SCOTIABANK CM25 1265 YORK MILLS RD PARKWOOD VILLAGE 3
ROYAL BANK OF CANADA ROYAL BANK OF CANADA CM23 65 GASOLINE ALLEY E PETROCAN AUD
BANK OF NOVA SCOTIA SCOTIABANK CM23 1265 YORK MILLS RD PARKWOOD VILLAGE 1
BANK OF NOVA SCOTIA SCOTIABANK CM23 1265 YORK MILLS RD PARKWOOD VILLAGE 1

Deduplication rule: These are not true duplicates across sponsors, so we do not perform deduplication between different sponsors.

For some Direct Cash ATMs, there were two duplicate records already in 2019. Coupled with the BMO record in 2023, that makes three of the same.

Code
tagged_2023 %>% 
  select(sponsor_name, owner_name, last_reported, inst_assigned_term_id, address1, location_name, duplicate_tag) %>% 
  filter(inst_assigned_term_id == "D2069425" | inst_assigned_term_id == "D2069137") %>% 
  select(-duplicate_tag) %>% 
  arrange(desc(inst_assigned_term_id)) %>% 
  kbl(col.names = c("Sponsor Name", "Owner Name", "Last Reported", "Terminal ID", "Address", "Location Name")) %>% 
  kable_classic() %>% 
  row_spec(1, color = "red") %>% 
  row_spec(2, color = "red") %>% 
  row_spec(3, color = "red") %>% 
  row_spec(4, color = "blue") %>% 
  row_spec(5, color = "blue") %>% 
  row_spec(6, color = "blue")
Sponsor Name Owner Name Last Reported Terminal ID Address Location Name
DIRECTCASH BANK DC PAYMENTS CA 24-05-2019 09:40:38 D2069425 3456 HIGHWAY 138 MACEWEN RANGER CONVENIENCE
DIRECTCASH BANK DC PAYMENTS CA 23-05-2019 16:59:37 D2069425 PO BOX 45 MACEWEN RANGER CONVENIENCE
BANK OF MONTREAL CARDTRONICS 13-12-2023 15:55:39 D2069425 3456 HWY 138 ESSO MONKLAND 25005
BANK OF MONTREAL DIRECT CASH 13-12-2023 15:59:40 D2069137 455 BROADWAY ST W GALLAGHER CENTRE
DIRECTCASH BANK GALLAGHER CENTRE- CITY OF YORKTON 24-05-2019 09:40:30 D2069137 455 BROADWAY AVE W BOX 908 GALLAGHER CENTRE
DIRECTCASH BANK GALLAGHER CENTRE- CITY OF YORKTON 23-05-2019 16:58:50 D2069137 PO BOX 908 STN DELCENTRE GALLAGHER CENTRE

Deduplication rule: Keep BMO.

Some CIBC-Branded ATMs were originally Direct Cash ATMs, acquired by Cardtronics and reported via BMO. For these ATMs there are three sponsors reporting them as well.

Code
tagged_2023 %>% 
  select(sponsor_name, owner_name, inst_assigned_term_id, address1, city_name, location_name, duplicate_tag) %>%
  filter(duplicate_tag == 2) %>% 
  filter(inst_assigned_term_id == "F4257559") %>% 
  select(-duplicate_tag) %>% 
  kbl(col.names = c("Sponsor Name", "Owner Name", "Terminal ID", "Address", "City", "Location Name")) %>% 
  kable_classic() %>% 
  row_spec(1, color = "red") %>% 
  row_spec(2, color = "red") %>% 
  row_spec(3, color = "red")
Sponsor Name Owner Name Terminal ID Address City Location Name
BANK OF MONTREAL DIRECT CASH F4257559 20 YORK ST TORONTO CIBC GTAA UP UNION STATION
CANADIAN IMPERIAL BANK OF COMMERCE CIBC F4257559 20 YORK ST TORONTO UNION STATION UP EXPRESS TRAIN PLATFOR
DIRECTCASH BANK DC PAYMENTS CA F4257559 20 YORK ST TORONTO GTAA UP UNION STATION

Deduplication rule: .

Deduplicate (2023)

There are 88642 observations in the 2023 Mastercard data.

Eliminate duplicates within sponsor

Code
df_1_2023 <- mc_raw_2023 %>%
  group_by(sponsor_name, inst_assigned_term_id, state_prov) %>%   
  mutate(tag1 = case_when(
    n() == 1 ~ 0,
    n() == 2 ~ 1,
    n() > 2 ~ 2
  )) %>% 
  ungroup() %>%
  arrange(state_prov, tag1, inst_assigned_term_id) %>%        
  mutate(todrop = 0,                                         
         todrop = if_else(owner_name == "MBE", todrop + 1, todrop),
         todrop = if_else(grepl("PO BOX", address1), todrop + 1, todrop),
         todrop = if_else(is.na(latitude), todrop + 1, todrop),
         todrop = if_else(tag1 == 1 & owner_name == "DIRECT CASH", todrop + 1, todrop)) %>% 
  arrange(desc(tag1), desc(inst_assigned_term_id), desc(todrop), desc(address1))

df_2_2023 <- df_1_2023 %>% 
  arrange(tag1, inst_assigned_term_id, todrop, address1) %>%
  group_by(tag1,inst_assigned_term_id) %>%
  mutate(dupeorder = row_number()) %>%
  ungroup() %>%
  filter(tag1 < 1 | dupeorder == 1) %>%
  select(-todrop, -tag1)

# mc_raw_2023 %>%
#   group_by(inst_assigned_term_id) %>%
#   # Flag PO Box entries for removal if the same terminal ID has a real address
#   mutate(todrop = if_else(grepl("PO BOX", address1) & n() > 1, 1, 0)) %>%
#   # Remove the flagged PO Box entries
#   filter(todrop == 0) %>%
#   ungroup() %>%
#   select(-todrop)

Then eliminate duplicates with different sponsors

Code
df_3_2023 <- df_2_2023 %>%
  group_by(inst_assigned_term_id, state_prov) %>%   
  mutate(tag = case_when(
    n() == 1 ~ 0,
    n() == 2 ~ 1,
    n() > 2 ~ 2
  )) %>% 
  ungroup() %>% 
  mutate(tokeep = 0) %>%
  # when BMO and Moneris share a terminal ID, we are keeping BMO
  mutate(tokeep = if_else(tag >= 1 & sponsor_name == "BANK OF MONTREAL", tokeep + 1, tokeep),
         # When directcash bank and some other sponsor (BMO) share an terminal ID, we remove
         # direct cash bank 
         tokeep = if_else(tag >= 1 & sponsor_name == "DIRECTCASH BANK", tokeep - 1, tokeep),
         # if cibc shares an ATM with another sponsor we retain CIBC
         tokeep = if_else(tag >= 1 & sponsor_name == "CANADIAN IMPERIAL BANK OF COMMERCE", tokeep + 1, tokeep),
         
         ### RBC AND SCOTIA BANK DUPLICATES ACROSS SPONSORS ARE NOT REAL DUPLICATES
         tokeep = if_else(tag >= 1 & sponsor_name == "ROYAL BANK OF CANADA", tokeep + 100, tokeep),
         tokeep = if_else(tag >= 1 & sponsor_name == "BANK OF NOVA SCOTIA", tokeep + 100, tokeep)) %>% 
  arrange(inst_assigned_term_id, desc(tokeep)) %>% 
  arrange(tokeep)


df_4_2023 <- df_3_2023 %>% 
  group_by(inst_assigned_term_id) %>%
  mutate(dupeorder = row_number()) %>%
  ungroup() %>% 
  filter(tokeep >= 10 | dupeorder == 1) %>% 
  select(-tokeep, -tag) %>% 
  # I am dropping this here for now because I wanted the observations to match what Hong Yu had in his stata script.
  # when I update the document I will probably move this to a later stage
  filter(owner_name != "FEDERATION DES CAISSES POPULAIRES ACADIENNES")

After deduplication, there are 60306 observations in the 2023 Mastercard data.

Identify Duplicates (2022)

Code
mc_raw_2022 <- read_csv("data/unprocessed/mastercard/BOC_All_ATMImage(November 2022).txt", guess_max = 50000) %>% 
  clean_names()

Within Sponsors

Code
tagged_2_2023 <- mc_raw_2022 %>% 
  group_by(inst_assigned_term_id, sponsor_name) %>%
  mutate(duplicate_tag_1 = case_when(
    n() == 1 ~ 0,
    n() == 2 ~ 1,
    n() > 2 ~ 2
  )) %>% 
  ungroup()

tagged_2_2023 %>% 
  count(sponsor_name, duplicate_tag_1) %>% 
  pivot_wider(names_from = duplicate_tag_1, values_from = n, names_prefix = "tag_") %>% 
  # Replace missing values (NA) with 0 in the reshaped data
  mutate(across(where(is.integer), ~ replace_na(., 0))) %>% 
  # Create a formatted table with 'kable' for better presentation
  kbl(caption = "Table 4: Duplicates Within Sponsors (2022)", col.names = c("Sponsor Name", "Tag 0", "Tag 1")) %>% 
  kable_classic()
Table 4: Duplicates Within Sponsors (2022)
Sponsor Name Tag 0 Tag 1
BANK OF MONTREAL 2950 0
BANK OF NOVA SCOTIA 2402 80
CANADIAN IMPERIAL BANK OF COMMERCE 2999 0
DIRECTCASH BANK 7271 96
EVERLINK PAYMENT SERVICES INC. 2279 0
FEDERATION DES CAISSES DESJARDINS DU QUEBEC 1875 0
MONERIS SOLUTIONS CORPORATION 34540 0
NATIONAL BANK OF CANADA 929 0
ROYAL BANK OF CANADA 3989 0
TORONTO-DOMINION BANK THE 2749 0

For Directcash Bank, the duplicates are due to the same record assigned once to its actual address, and once to a PO Box. Otherwise the records look exactly the same.

Code
tagged_2_2023 %>%
  filter(duplicate_tag_1 != 0) %>%
  select(sponsor_name, owner_name, inst_assigned_term_id, address1, city_name) %>% 
  filter(sponsor_name == "DIRECTCASH BANK") %>% 
  arrange(inst_assigned_term_id) %>% 
  filter(inst_assigned_term_id != "C4066615") %>% 
  filter(inst_assigned_term_id != "C4066617") %>% 
  head(6) %>% 
  kbl(col.names = c("Sponsor Name", "Owner Name", "Terminal ID", "Address", "City"))%>% 
  kable_classic() %>% 
  row_spec(1, color = "red") %>% 
  row_spec(2, color = "red") %>% 
  row_spec(3, color = "green") %>% 
  row_spec(4, color = "green") %>% 
  row_spec(5, color = "orange") %>% 
  row_spec(6, color = "orange")
Sponsor Name Owner Name Terminal ID Address City
DIRECTCASH BANK DC PAYMENTS CA C4066599 PO BOX 3725 STN CENTRAL VICTORIA
DIRECTCASH BANK DC PAYMENTS CA C4066599 NELLES BLOCK CFB ESQUIMALT VICTORIA
DIRECTCASH BANK DC PAYMENTS CA C4066616 PO BOX 250 BUILDING 100 SHEARWATER
DIRECTCASH BANK DC PAYMENTS CA C4066616 12 WING SHEARWATER BUILDING 100 SHEARWATER
DIRECTCASH BANK CHARLIES LUNCH D1016013 PO BOX 172 EASTEND
DIRECTCASH BANK CHARLIES LUNCH D1016013 327 RED COAT DR EASTEND

Deduplication rule: When DirectCash Bank records share the same terminal ID but have different addresses, we drop the record where the address contains a PO Box and retain the record with a physical address.

Scotiabank duplicates have the same record with its address written differently, such as different city names with the same address, or one record ignores the street number while the other does have it. These are real duplicates as well.

Code
tagged_2_2023 %>%
  filter(duplicate_tag_1 != 0) %>%
  select(sponsor_name, owner_name, inst_assigned_term_id, address1, city_name) %>% 
  filter(sponsor_name == "BANK OF NOVA SCOTIA") %>% 
  arrange(inst_assigned_term_id) %>% 
  head(6) %>% 
  kbl(col.names = c("Sponsor Name", "Owner Name", "Terminal ID", "Address", "City"))%>% 
  kable_classic() %>% 
  row_spec(1, color = "red") %>% 
  row_spec(2, color = "red") %>% 
  row_spec(3, color = "green") %>% 
  row_spec(4, color = "green") %>% 
  row_spec(5, color = "orange") %>% 
  row_spec(6, color = "orange")
Sponsor Name Owner Name Terminal ID Address City
BANK OF NOVA SCOTIA SCOTIABANK AA13 305 PORT UNION RD WEST HILL
BANK OF NOVA SCOTIA SCOTIABANK AA13 305 PORT UNION RD SCARBOROUGH
BANK OF NOVA SCOTIA SCOTIABANK AA21 305 PORT UNION RD WEST HILL
BANK OF NOVA SCOTIA SCOTIABANK AA21 305 PORT UNION RD SCARBOROUGH
BANK OF NOVA SCOTIA SCOTIABANK AE25 1414 18TH ST 1901 RICHMOND AVE BRANDON
BANK OF NOVA SCOTIA SCOTIABANK AE25 1570 18TH ST BRANDON

Across Sponsors

Code
tagged_2022 <- mc_raw_2022 %>% 
  group_by(inst_assigned_term_id) %>%
  mutate(duplicate_tag = case_when(
    n() == 1 ~ 0,
    n() == 2 ~ 1,
    n() > 2 ~ 2
  )) %>% 
  ungroup()

tagged_2022 %>% 
  count(sponsor_name, duplicate_tag) %>% 
  pivot_wider(names_from = duplicate_tag, values_from = n, names_prefix = "tag_") %>% 
  mutate(across(where(is.integer), ~ replace_na(., 0))) %>% 
  # Create a formatted table with 'kable' for better presentation
  kbl(caption = "Table 3: Duplicates Across Sponsors (2022)", col.names = c("Sponsor Name", "Tag 0", "Tag 1", "Tag 2")) %>% 
  kable_classic() %>% 
  kable_styling(full_width = FALSE)
Table 3: Duplicates Across Sponsors (2022)
Sponsor Name Tag 0 Tag 1 Tag 2
BANK OF MONTREAL 2950 0 0
BANK OF NOVA SCOTIA 2307 171 4
CANADIAN IMPERIAL BANK OF COMMERCE 2755 244 0
DIRECTCASH BANK 3312 3991 64
EVERLINK PAYMENT SERVICES INC. 2279 0 0
FEDERATION DES CAISSES DESJARDINS DU QUEBEC 1875 0 0
MONERIS SOLUTIONS CORPORATION 30334 4174 32
NATIONAL BANK OF CANADA 816 113 0
ROYAL BANK OF CANADA 3893 94 2
TORONTO-DOMINION BANK THE 2636 113 0

Most duplicates (around 4000) have the sponsor being Directcash in one entry and the owner being Direct Cash in the other entry

Direct Cash Bank - Moneris
Code
tagged_2022 %>% 
  filter(duplicate_tag == 1) %>% 
  select(sponsor_name, owner_name, inst_assigned_term_id, address1, city_name, 
         location_name) %>% 
  filter(inst_assigned_term_id == "D1014793" | 
           inst_assigned_term_id == "D1037192") %>% 
  arrange(inst_assigned_term_id) %>% 
  head(4) %>% 
  kbl(col.names = c("Sponsor Name", "Owner Name", "Terminal ID", "Address", "City", "Location Name")) %>% 
  kable_classic() %>% 
  row_spec(1, color = "red") %>% 
  row_spec(2, color = "red") %>% 
  row_spec(3, color = "green") %>% 
  row_spec(4, color = "green")
Sponsor Name Owner Name Terminal ID Address City Location Name
DIRECTCASH BANK AVONDALE STORES LIMITED D1014793 242 KILLALY ST W PORT COLBORNE AVONDALE 033
MONERIS SOLUTIONS CORPORATION DIRECT CASH D1014793 242 KILLALY ST W PORT COLBORNE AVONDALE 033
MONERIS SOLUTIONS CORPORATION DIRECT CASH D1037192 252 THE ESPLANADE TORONTO VILLAGES MINI MART
DIRECTCASH BANK DC PAYMENTS CA D1037192 252 THE ESPLANADE TORONTO VILLAGES MINI MART

Deduplication rule: When DIRECTCASH BANK and MONERIS share the same terminal ID, we use the Sponsor Name to determine which record to keep. We drop the record associated with DirectCash Bank and retain the record associated with MONERIS.

Other

For the few duplicate CIBC ATMs, it seems that they are actually owned by Cardtronics, but branded by CIBC.

Code
tagged_2022 %>% 
  filter(duplicate_tag == 1) %>% 
  select(sponsor_name, owner_name, inst_assigned_term_id, address1, city_name, 
         location_name) %>% 
  group_by(inst_assigned_term_id) %>%
  filter(all(c("CANADIAN IMPERIAL BANK OF COMMERCE", "MONERIS SOLUTIONS CORPORATION") %in% sponsor_name)) %>%
  ungroup() %>% 
  arrange(desc(inst_assigned_term_id)) %>%
  head(6) %>% 
  kbl(col.names = c("Sponsor Name", "Owner Name", "Terminal ID", "Address", "City", "Location Name"))%>% 
  kable_classic() %>% 
  row_spec(1, color = "red") %>% 
  row_spec(2, color = "red") %>% 
  row_spec(3, color = "green") %>% 
  row_spec(4, color = "green") %>% 
  row_spec(5, color = "orange") %>% 
  row_spec(6, color = "orange")
Sponsor Name Owner Name Terminal ID Address City Location Name
CANADIAN IMPERIAL BANK OF COMMERCE CIBC PI909G9X 1315 COUNTY RD 28 FRASERVILLE PIONEER GAS STATION
MONERIS SOLUTIONS CORPORATION CARDTRONICS PI909G9X 1315 COUNTY ROAD 28 FRASERVILLE FRASERVILLE OTR 43251
MONERIS SOLUTIONS CORPORATION CARDTRONICS PI909G9W 453 DUNLOP ST W BARRIE PIONEER OTR42339
CANADIAN IMPERIAL BANK OF COMMERCE CIBC PI909G9W 453 DUNLOP ST W BARRIE PIONEER GAS STATION
CANADIAN IMPERIAL BANK OF COMMERCE CIBC PI909G9V 1089 TALBOT ST ST THOMAS PIONEER GAS STATION
MONERIS SOLUTIONS CORPORATION CARDTRONICS PI909G9V 1089 TALBOT ST ST THOMAS PIONEER OTR 43233

Deduplication rule: Retain the record associated with CIBC

It seems that the RBC and Scotiabank’s “duplicated ATMs” are actually not duplicates, but different ATMs present in their respective branches. This could to be because the terminal ID variable is only 4 digit in each case.

Code
tagged_2022 %>% 
  filter(duplicate_tag == 1) %>% 
  select(sponsor_name, owner_name, inst_assigned_term_id, address1, city_name, 
         location_name) %>% 
  group_by(inst_assigned_term_id) %>%
  filter(all(c("ROYAL BANK OF CANADA", "BANK OF NOVA SCOTIA") %in% sponsor_name)) %>%
  ungroup() %>% 
  arrange(desc(inst_assigned_term_id)) %>%
  head(6) %>% 
  kbl(col.names = c("Sponsor Name", "Owner Name", "Terminal ID", "Address", "City", "Location Name"))%>% 
  kable_classic() %>% 
  row_spec(1, color = "red") %>% 
  row_spec(2, color = "red") %>% 
  row_spec(3, color = "green") %>% 
  row_spec(4, color = "green") %>% 
  row_spec(5, color = "orange") %>% 
  row_spec(6, color = "orange")
Sponsor Name Owner Name Terminal ID Address City Location Name
ROYAL BANK OF CANADA ROYAL BANK OF CANADA WD21 85 MAIN ST MELITA MELITA BRWH28 AUD
BANK OF NOVA SCOTIA SCOTIABANK WD21 1195 ARTHUR ST W THUNDER BAY ARTHUR PARKDALE
ROYAL BANK OF CANADA ROYAL BANK OF CANADA WC23 220 PORTAGE AVE WINNIPEG WINNIPEG MNWE54AUD
BANK OF NOVA SCOTIA SCOTIABANK WC23 1091 ST MARYS RD WINNIPEG DAKOTA VILLAGE 3
ROYAL BANK OF CANADA ROYAL BANK OF CANADA WC21 220 PORTAGE AVE WINNIPEG WINNIPEG MAIN BR AUD
BANK OF NOVA SCOTIA SCOTIABANK WC21 318 1 ST WINNIPEGOSIS WINNIPEGOSIS BRANCH

Deduplication rule: These are not real duplicates. Nothing to do here.

It is also true that NBC and TD’s “duplicated ATMs” are actually not duplicates, but different ATMs present in their respective branches.

Code
tagged_2022 %>% 
  filter(duplicate_tag == 1) %>% 
  select(sponsor_name, owner_name, inst_assigned_term_id, address1, city_name, 
         location_name) %>% 
  group_by(inst_assigned_term_id) %>%
  filter(all(c("TORONTO-DOMINION BANK THE", "NATIONAL BANK OF CANADA") %in% sponsor_name)) %>%
  ungroup() %>% 
  arrange(desc(inst_assigned_term_id)) %>%
  head(6) %>% 
  kbl(col.names = c("Sponsor Name", "Owner Name", "Terminal ID", "Address", "City", "Location Name"))%>% 
  kable_classic() %>% 
  row_spec(1, color = "red") %>% 
  row_spec(2, color = "red") %>% 
  row_spec(3, color = "green") %>% 
  row_spec(4, color = "green") %>% 
  row_spec(5, color = "orange") %>% 
  row_spec(6, color = "orange")
Sponsor Name Owner Name Terminal ID Address City Location Name
TORONTO-DOMINION BANK THE TORONTO-DOMINION BANK 7772 2025 PRINCE OF WALES DR REGINA VICTORIA SQUARE
NATIONAL BANK OF CANADA NATIONAL BANK OF CANADA 7772 560 RUE SACR-COEUR O ALMA SUCC FERME
NATIONAL BANK OF CANADA NATIONAL BANK OF CANADA 7771 3261 1IRE AV RAWDON SUCCURSALE
TORONTO-DOMINION BANK THE TORONTO-DOMINION BANK 7771 2025 PRINCE OF WALES DR REGINA VICTORIA SQUARE
NATIONAL BANK OF CANADA NATIONAL BANK OF CANADA 7770 585 AV SAINT-CHARLES VAUDREUIL-DORION SUCCURSALE
TORONTO-DOMINION BANK THE TORONTO-DOMINION BANK 7770 2025 PRINCE OF WALES DR REGINA VICTORIA SQUARE

Deduplication rule: These are not real duplicates. Nothing to do here.

Some Scotiabank ATMs are double counted due to same address but different city names

Code
tagged_2022 %>% 
  filter(duplicate_tag == 1) %>% 
  select(sponsor_name, owner_name, inst_assigned_term_id, address1, city_name, 
         location_name) %>% 
  #group_by(inst_assigned_term_id) %>%
  filter(inst_assigned_term_id == "AA13" | inst_assigned_term_id == "AA21") %>% 
 # filter(all(c("TORONTO-DOMINION BANK THE", "NATIONAL BANK OF CANADA") %in% sponsor_name)) %>%
 # ungroup() %>% 
  arrange(desc(inst_assigned_term_id)) %>%
  head(4) %>% 
  kbl(col.names = c("Sponsor Name", "Owner Name", "Terminal ID", "Address", "City", "Location Name"))%>% 
  kable_classic() %>% 
  row_spec(1, color = "red") %>% 
  row_spec(2, color = "red") %>% 
  row_spec(3, color = "green") %>% 
  row_spec(4, color = "green") 
Sponsor Name Owner Name Terminal ID Address City Location Name
BANK OF NOVA SCOTIA SCOTIABANK AA21 305 PORT UNION RD WEST HILL PORT UNION FANFARE 2
BANK OF NOVA SCOTIA SCOTIABANK AA21 305 PORT UNION RD SCARBOROUGH PORT UNION FANFARE 2
BANK OF NOVA SCOTIA SCOTIABANK AA13 305 PORT UNION RD WEST HILL PORT UNION FANFARE 1
BANK OF NOVA SCOTIA SCOTIABANK AA13 305 PORT UNION RD SCARBOROUGH PORT UNION FANFARE 1

Deduplication rule: Keep one observation

Deduplicate (2022)

There are 62159 observations in the 2023 Mastercard data.

Eliminate duplicates within sponsor

Code
# Tag duplicates within the same sponsor and province
df_1_2022 <- mc_raw_2022 %>%
  group_by(sponsor_name, inst_assigned_term_id, state_prov) %>%   
  mutate(tag1 = case_when(
    n() == 1 ~ 0,
    n() == 2 ~ 1,
    n() > 2 ~ 2
  )) %>% 
  ungroup() %>%
  arrange(state_prov, tag1, inst_assigned_term_id) %>%
  mutate(todrop = 0,
         todrop = if_else(grepl("PO BOX", address1), todrop + 1, todrop),
         todrop = if_else(is.na(latitude), todrop + 1, todrop)) %>% 
  arrange(desc(tag1), desc(inst_assigned_term_id), desc(todrop), desc(address1))

# Drop the duplicates within the same sponsor and province
df_2_2022 <- df_1_2022 %>%
  arrange(tag1, inst_assigned_term_id, todrop, address1) %>%
  group_by(tag1, inst_assigned_term_id) %>%
  mutate(dupeorder = row_number()) %>%
  ungroup() %>%
  filter(tag1 < 1 | dupeorder == 1) %>%
  select(-todrop, -tag1)

Then eliminate duplicates with different sponsors

Code
# Tag duplicates across different sponsors
df_3_2022 <- df_2_2022 %>%
  group_by(inst_assigned_term_id, state_prov) %>%   
  mutate(tag = case_when(
    n() == 1 ~ 0,
    n() == 2 ~ 1,
    n() > 2 ~ 2
  )) %>% 
  ungroup() %>% 
  mutate(tokeep = 0) %>%
  mutate(tokeep = if_else(tag >= 1 & sponsor_name == "MONERIS SOLUTIONS CORPORATION", tokeep + 1, tokeep),
         tokeep = if_else(tag >= 1 & sponsor_name == "DIRECTCASH BANK", tokeep - 1, tokeep),
         tokeep = if_else(tag >= 1 & sponsor_name == "CANADIAN IMPERIAL BANK OF COMMERCE", tokeep + 1, tokeep),
         tokeep = if_else(tag >= 1 & sponsor_name == "ROYAL BANK OF CANADA", tokeep + 100, tokeep),
         tokeep = if_else(tag >= 1 & sponsor_name == "BANK OF NOVA SCOTIA", tokeep + 100, tokeep),
         tokeep = if_else(tag >= 1 & sponsor_name == "NATIONAL BANK OF CANADA", tokeep + 100, tokeep),
         tokeep = if_else(tag >= 1 & sponsor_name == "TORONTO-DOMINION BANK THE", tokeep + 100, tokeep)) %>%
  arrange(inst_assigned_term_id, desc(tokeep)) %>% 
  arrange(tokeep)

# Final filtering and cleaning
df_4_2022 <- df_3_2022 %>% 
  group_by(inst_assigned_term_id) %>%
  mutate(dupeorder = row_number()) %>%
  ungroup() %>%
  filter(tokeep >= 10 | dupeorder == 1) %>%
  select(-tokeep, -tag) 

After deduplication, there are 57698 observations in the 2022 Mastercard data.

Step 2: Updating Desjardins ATM Data

Mastercard data is unreliable for Desjardins, as the 2019, 2022, and 2023 datasets show identical ATM counts, which is unlikely given Desjardins’ ongoing ATM closures (see Table 5 below). Therefore, we rely on external data sources for Desjardins.

Code
mc_raw_2019 <- read_csv("data/unprocessed/mastercard/BOC_All_ATMImage_(Nov 2019).txt", guess_max = 50000) %>% 
  clean_names()

desjardins_19 <- mc_raw_2019 %>% 
  count(owner_name) %>% 
  filter(owner_name == "LA FEDERATION DES CAISSES DESJARDINS DU QUEBEC" 
         | owner_name == "LA FEDERATION DES CAISSES POPULAIRES DE L'ONTARIO") %>% 
  mutate(date = "Nov 2019")

desjardins_22 <- df_4_2022 %>% 
  count(owner_name) %>% 
  filter(owner_name == "LA FEDERATION DES CAISSES DESJARDINS DU QUEBEC" 
         | owner_name == "LA FEDERATION DES CAISSES POPULAIRES DE LONTARIO") %>% 
  mutate(date = "Nov 2022")

desjardins_23 <- df_4_2023 %>% 
  count(owner_name) %>% 
   filter(owner_name == "LA FEDERATION DES CAISSES DESJARDINS DU QUEBEC" 
         | owner_name == "LA FEDERATION DES CAISSES POPULAIRES DE L'ONTARIO") %>% 
  mutate(date = "Dec 2023")

desjardins_19 %>% 
  bind_rows(desjardins_22) %>% 
  bind_rows(desjardins_23) %>% 
  mutate(
    owner_name = case_when(
    str_detect(owner_name, "LA FEDERATION DES CAISSES DESJARDINS DU QUEBEC") ~ "LA FEDERATION DES CAISSES DESJARDINS DU QUEBEC",
    str_detect(owner_name, "LA FEDERATION DES CAISSES POPULAIRES DE L'ONTARIO|LA FEDERATION DES CAISSES POPULAIRES DE LONTARIO") ~ "LA FEDERATION DES CAISSES POPULAIRES DE L'ONTARIO",
    TRUE ~ owner_name
  )) %>% 
   pivot_wider(names_from = date, values_from = n) %>% 
  kbl(caption = "Table 5: Desjardins ATM Counts Across Years (Mastercard Data)", col.names = c("Owner Name", "November 2019", "November 2022", "November 2023"))%>% 
  kable_classic()
Table 5: Desjardins ATM Counts Across Years (Mastercard Data)
Owner Name November 2019 November 2022 November 2023
LA FEDERATION DES CAISSES DESJARDINS DU QUEBEC 1706 1706 1706
LA FEDERATION DES CAISSES POPULAIRES DE L'ONTARIO 63 63 63

In the fall of 2022 and 2023, we webscraped Desjardins ATM data directly from their website. Unfortunately, we were unable to do this for 2019, so our improvements focus on the 2022 and 2023 datasets. The matching process for Desjardins ATMs involves the following steps:

  1. Fuzzy Matching: We first fuzzy match the addresses from the Mastercard ATMs to the webscraped ATMs using the fuzzyjoin package. We use the Jaro-Winkler string distance with a threshold of 0.2 (lower values indicate better matches) and reject any matches exceeding this threshold. This results in a matched dataset and two sets of rejected matches (one from Mastercard, one from the webscraped data).

  2. Manual Review: Next, we manually review the rejected matches to identify any false negatives. We carefully inspect and manually match addresses that were not captured by the fuzzy matching process. This results in four datasets: string-matched, manually-matched, rejected Mastercard matches, and rejected webscraped matches.

  3. Consolidation: We then combine the string-matched and manually-matched datasets to create a consolidated set of matched ATMs.

  4. Final Integration: Finally, since the webscraped data is more current and reliable, we add back the unmatched webscraped ATMs (those that didn’t have a corresponding match in the Mastercard data) to the matched dataset.

In summary, the final matched dataset is built through a process of initial fuzzy matching, manual review of unmatched data, and integrating unmatched webscraped ATMs back into the dataset.

Code
## READ IN OUR WEBSCRAPED DATA


# read in webscraped Desjardins ATM data 2023
desjardins_atm_2023_webscraped <- read_csv("data/unprocessed/webscraped/desjardins_2023_atm_geocoded.csv") %>% 
  mutate(latitude = lat) %>% 
  mutate(longitude = lng) %>% 
  select(-lat, -lng) %>% 
  rename(webscraped_id = id) %>% 
  # clean addresses
  mutate(address_clean = str_replace_all(address_clean, ",", ""))

# read in webscraped Desjardins ATM data 2022
desjardins_atm_2022_webscraped <- read_csv("data/unprocessed/webscraped/desjardins_2022_atm_geocoded.csv") %>% 
  mutate(latitude = lat) %>% 
  mutate(longitude = lng) %>% 
  select(-lat, -lng, -score, -caisse_atm_name) %>% 
  rename(webscraped_id = id, name = caisse_caisse_name, address_clean = caisse_atm_address) %>% 
  mutate(address_clean = str_replace_all(address_clean, ",", ""))

# read in webscraped Desjardins 2022 ATM
desjardins_ontario_atm_2022 <- read_csv("data/unprocessed/webscraped/desjardins_ontario_atm_2022.csv") %>% 
  mutate(address_clean = str_replace_all(string = atm_address, "\n", " ")) %>% 
  select(-atm_address, -atm_name)

# read in web scraped Ontario desjardins ATMs
desjardins_ontario_atm_2023 <- read_csv("data/unprocessed/webscraped/desjardins_ontario_atm_2023.csv") %>% 
  mutate(address_clean = str_replace_all(string = atm_address, "\n", " ")) %>% 
  select(-atm_address, -atm_name)

# read in ontario web scraped branches
desjardins_branch_ont_23 <- read_csv("data/unprocessed/webscraped/desjardins_branches_ontario_2023.csv") %>% 
  select(branch_address)

desjardins_branch_ont_22 <- read_csv("data/unprocessed/webscraped/desjardins_branches_ontario_2022.csv") %>% 
  select(branch_address)

## Subset deduplicated 2023 MasterCard data for quebec desjardin ATMs

desjardins_2023_mc <- df_4_2023 %>% 
  filter(owner_name == "LA FEDERATION DES CAISSES DESJARDINS DU QUEBEC") %>% 
  mutate(address_clean = paste0(address1, " ", city_name, " ", postal_code, " Quebec")) %>% 
  mutate(address_clean = str_to_lower(address_clean)) %>% 
  select(inst_assigned_term_id, owner_name, latitude, longitude, address_clean) %>% 
  mutate(mc_id = row_number()) %>% 
  mutate(address_clean = str_replace_all(address_clean, pattern = "quebec", "Québec")) %>% 
  mutate(address_clean = str_replace_all(address_clean, "(?i)saint(\\S)", "saint \\1"))

## Subset deduplicated 2023 MasterCard data for Ontario desjardin ATMs

desjardins_2023_mc_ontario <- df_4_2023 %>% 
  filter(owner_name == "LA FEDERATION DES CAISSES POPULAIRES DE L'ONTARIO") %>% 
  mutate(address_clean = paste0(address1, " ", city_name, " ", postal_code, " Ontario")) %>% 
  mutate(address_clean = str_to_lower(address_clean)) %>% 
  select(inst_assigned_term_id, owner_name, latitude, longitude, address_clean) %>% 
  mutate(mc_id = row_number())

## Subset deduplicated 2022 MasterCard data for quebec desjardin ATMs

desjardins_2022_mc <- df_4_2022 %>% 
  filter(owner_name == "LA FEDERATION DES CAISSES DESJARDINS DU QUEBEC") %>% 
  mutate(address_clean = paste0(address1, " ", city_name, " ", postal_code, " Quebec")) %>% 
  mutate(address_clean = str_to_lower(address_clean)) %>% 
  select(inst_assigned_term_id, owner_name, latitude, longitude, address_clean) %>% 
  mutate(mc_id = row_number()) %>% 
  mutate(address_clean = str_replace_all(address_clean, pattern = "quebec", "Québec")) %>% 
  mutate(address_clean = str_replace_all(address_clean, "(?i)saint(\\S)", "saint \\1"))

## Subset deduplicated 2023 MasterCard data for Ontario desjardin ATMs

desjardins_2022_mc_ontario <- df_4_2022 %>% 
  filter(owner_name == "LA FEDERATION DES CAISSES POPULAIRES DE L'ONTARIO") %>% 
  mutate(address_clean = paste0(address1, " ", city_name, " ", postal_code, " Ontario")) %>% 
  mutate(address_clean = str_to_lower(address_clean)) %>% 
  select(inst_assigned_term_id, owner_name, latitude, longitude, address_clean) %>% 
  mutate(mc_id = row_number())

String Match (2023)

Code
library(downloadthis)
library(gt)
library(fuzzyjoin)

# Perform fuzzy matching
matched_data_23 <- stringdist_join(
  x = desjardins_2023_mc,
  y = desjardins_atm_2023_webscraped,
  by = "address_clean",
  method = "jw",
  max_dist = 0.21,
  ignore_case = TRUE,
  mode = "inner",
  distance_col = "dist"
)

# Filter to keep the best match for each mc_id
best_matches_23 <- matched_data_23 %>%
  group_by(mc_id) %>%
  slice_min(order_by = dist, n = 1, with_ties = FALSE) %>%
  ungroup() %>% 
  select(-latitude.y, -longitude.y, -name) %>% 
  rename(longitude = longitude.x,
         latitude = latitude.x,
         address_mc = address_clean.x,
         address_webscraped = address_clean.y) %>% 
  select(mc_id, webscraped_id, everything())

# Identify unmatched records in dataset 1 (to discard)
unmatched_mc <- desjardins_2023_mc %>%
  anti_join(best_matches_23, by = "mc_id")

# Identify unmatched records in dataset 2 (to retain)
unmatched_webscraped <- desjardins_atm_2023_webscraped %>%
  anti_join(best_matches_23, by = "webscraped_id")

### After manual inspection there were many addresses that should have been matched but were missed in the string matching procedure
### In this section I am adding them back into the mc data

We matched 1354 ATMs from the Mastercard data to the web scraped data after string matching. You can download the data below.

Code
library(downloadthis)

best_matches_23 %>%
  download_this(
    output_name = "desjardin_matching_23_v1",
    output_extension = ".xlsx",
    button_label = "Download data as xlsx",
    button_type = "default",
    has_icon = TRUE,
    icon = "fa fa-save"
  )

Next, we review the ATMs that were rejected from both the Mastercard and webscraped data, and performed manual matching where necessary. The table below gives us the ATMs from the webscraped data that did not meet the threshold for string matching

Code
library(DT)

unmatched_webscraped %>% 
  select(-name) %>% 
  datatable() 

The table below gives us the ATMs from the Mastercard data that did not meet the threshold for string matching

Code
unmatched_mc %>% 
  select(mc_id, address_clean, latitude, longitude) %>% 
  datatable() 

Manual Review (2023)

Using the two tables above, we manually match the false negatives based on their IDs. You can search for addresses or IDs using the search bar. The code chunk below creates a dataframe to capture these false negatives based on their IDs.

Code
# Define the manually matched IDs based on your input
matched_mc_webscraped <- tibble::tribble(
  ~mc_id, ~webscraped_id,
  1, 542,
  2, 542,
  114, 350,
  124, 356,
  171, 412,
  192, 363,
  193, 363,
  194, 362,
  197, 362,
  198, 362,
  199, 362,
  201, 362,
  222, 101,
  224, 101,
  321, 230,
  322, 230,
  323, 230,
  331, 120,
  332, 123,
  348, 371,
  351, 371,
  375, 144,
  384, 232,
  385, 232,
  395, 152,
  396, 152,
  399, 45,
  400, 45,
  401, 45,
  403, 45,
  409, 235,
  433, 903,
  434, 903,
  499, 282,
  504, 286,
  548, 9,
  639, 678,
  640, 678,
  645, 682,
  646, 682,
  670, 610,
  671, 610,
  672, 610,
  721, 841,
  722, 842,
  729, 687,
  730, 691,
  756, 844,
  779, 751,
  780, 751,
  811, 806,
  901, 703,
  902, 703,
  903, 703,
  907, 706,
  911, 704,
  912, 704,
  913, 704,
  957, 768,
  958, 768,
  959, 768,
  960, 777,
  961, 777,
  968, 778,
  979, 635,
  980, 635,
  981, 635,
  982, 635,
  984, 633,
  985, 633,
  986, 638,
  987, 638,
  990, 638,
  996, 714,
  997, 714,
  1008, 779,
  1009, 779,
  1038, 657,
  1049, 658,
  1053, 653,
  1054, 653,
  1055, 717,
  1056, 717,
  1079, 660,
  1080, 660,
  1106, 722,
  1107, 722,
  1108, 722,
  1148, 599,
  1149, 599,
  1150, 599,
  1244, 29,
  1245, 29,
  1306, 444,
  1307, 444,
  1308, 444,
  1330, 460,
  1387, 66,
  1401, 82,
  1425, 300,
  1426, 300,
  1427, 300,
  1437, 302,
  1448, 313,
  1463, 318,
  1485, 335,
  1605, 479,
  1611, 479,
  1612, 479,
  1615, 479,
  1620, 479,
  1634, 540,
  1635, 540,
  1645, 377,
  1650, 377,
  1675, 916,
  1676, 916,
  1686, 921,
  1695, 924
)
Code
# join Id's that we matched manually
mc_clean <- unmatched_mc %>% 
  rename(address_clean_mc = address_clean) %>% 
  rename(latitude_mc = latitude,
         longitude_mc = longitude)

web_clean <- unmatched_webscraped %>% 
  rename(address_web = address_clean) %>% 
  rename(latitude_web = latitude,
         longitude_web = longitude) %>% 
  select(-name)

# join manually matched id
matched_mc_webscraped %>% 
  inner_join(mc_clean) %>% 
  inner_join(web_clean) %>% 
  select(mc_id, webscraped_id, address_clean_mc, address_web)

Next, we review the unmatched data from both the Mastercard and webscraped datasets—those records that remained unmatched after both string matching and manual matching. Below, you have the option to download these unmatched records for further analysis or review.

Code
unmatched_mc_final <- unmatched_mc %>%
  filter(!mc_id %in% matched_mc_webscraped$mc_id)

unmatched_webscraped_final <- unmatched_webscraped %>%
  filter(!webscraped_id %in% matched_mc_webscraped$webscraped_id) %>% 
  select(-name) %>% 
  rename(latitude_web = latitude,
         longitude_web = longitude,
         address_web = address_clean)

unmatched_mc_final %>%
  download_this(
    output_name = "unmatched_mc_23",
    output_extension = ".xlsx",
    button_label = "Download data as xlsx",
    button_type = "default",
    has_icon = TRUE,
    icon = "fa fa-save"
  )
Code
unmatched_webscraped_final %>%
  download_this(
    output_name = "unmatched_webscraped_23",
    output_extension = ".xlsx",
    button_label = "Download data as xlsx",
    button_type = "default",
    has_icon = TRUE,
    icon = "fa fa-save"
  )

Consolidate (2023)

We consolidate the string-matched and manually matched data, then add the unmatched webscraped data to create the final dataset for Desjardins ATMs in Quebec. The Ontario matching will be handled in the next step.

Code
# 
matched_final <- matched_mc_webscraped %>% 
  inner_join(mc_clean) %>% 
  inner_join(web_clean) %>% 
  bind_rows(unmatched_webscraped_final) %>% 
  mutate(address_clean = address_web) %>% 
  mutate(address_clean = ifelse(webscraped_id == 350, "501 rue Beaudoin Saint-Léonard-d’Aston Québec J0C 1M0", address_clean)) %>% 
  mutate(address_clean = ifelse(webscraped_id == 777, "170 boulevard Harwood Vaudreuil-Dorion Québec J7V 1Y2", address_clean)) %>% 
  mutate(address_clean = ifelse(webscraped_id == 758, "745 rue Roy Est Montréal Québec H2L 1E1", address_clean)) %>% 
  filter(webscraped_id != 488) %>% 
  filter(webscraped_id != 778) %>% 
  mutate(latitude = latitude_mc,
         longitude = longitude_mc) %>% 
  mutate(latitude = ifelse(is.na(latitude), latitude_web, latitude)) %>% 
  mutate(longitude = ifelse(is.na(longitude), latitude_web, longitude)) %>%  
  mutate(owner_name = ifelse(is.na(owner_name), "Desjardins (webscraped)", owner_name)) %>% 
  select(mc_id, webscraped_id, inst_assigned_term_id, owner_name, 
         address_webscraped = address_web, address_mc = address_clean_mc, address_clean, latitude, longitude) %>% 
  mutate(matched = "manual") %>% 
  mutate(matched = ifelse(is.na(mc_id), "unmatched webscraped", matched))

best_matches_23 %>% 
  mutate(address_clean = address_webscraped) %>% 
  mutate(matched = "string match") %>% 
  bind_rows(matched_final) %>% 
  rename(string_distance = dist) %>% 
  select(mc_id, webscraped_id, address_mc, address_webscraped, string_distance, matched) %>% 
  download_this(
    output_name = "matched_desjardins_final_23.",
    output_extension = ".xlsx",
    button_label = "Download data as xlsx",
    button_type = "default",
    has_icon = TRUE,
    icon = "fa fa-save"
  )

Ontario (2023)

Next, we will match the Ontario Desjardins ATMs. This matching procedure differs slightly from the one used for Quebec. In 2022 and 2023, we webscraped Ontario Desjardins ATMs, but discovered that this data only included off-site ATMs and excluded those located inside bank branches. After inspecting the Mastercard data for Ontario, we found that it included ATMs inside bank branches.

Therefore, our matching procedure is as follows: First, we fuzzy match the Ontario Desjardins ATMs from Mastercard to the webscraped Ontario Desjardins bank branches, removing any Mastercard ATMs that do not meet a specific address matching threshold. Next, we manually inspect any false negatives and add them back if necessary. Finally, we include the webscraped Ontario Desjardins ATMs in the dataset, as they represent relevant off-site ATMs that should be retained.

Code
desjardins_2023_mc_ontario %>%
  select(mc_id, address_clean) 
# A tibble: 63 × 2
   mc_id address_clean                          
   <int> <chr>                                  
 1     1 2900 gibford dr ottawa k1v 2r4 ontario 
 2     2 1212 wellington st w ottawa k1y ontario
 3     3 147 rideau st ottawa k1n 5x4 ontario   
 4     4 520 ch montral ottawa k1k ontario      
 5     5 3118 ch gendron hammond k0a ontario    
 6     6 523 ch limoges limoges k0a ontario     
 7     7 2591 st joseph blvd orlans k1c ontario 
 8     8 2591 st joseph blvd orlans k1c ontario 
 9     9 1545 rue laurier rockland k4k ontario  
10    10 1545 rue laurier rockland k4k ontario  
# ℹ 53 more rows
Code
desjardins_branch_ont_23 <- desjardins_branch_ont_23 %>% 
  mutate(branch_address = str_replace_all(branch_address, "\n", " ")) %>%
  mutate(branch_id = row_number()) %>% 
  mutate(address_clean = branch_address) 
  

# Perform fuzzy matching
matched_data_23_ont <- stringdist_join(
  x = desjardins_2023_mc_ontario,
  y = desjardins_branch_ont_23,
  by = "address_clean",
  method = "jw",
  max_dist = 0.21,
  ignore_case = TRUE,
  mode = "inner",
  distance_col = "dist"
)

matched_data_23_ont  %>% View()

best_matches_ontario_23 <- matched_data_23_ont %>% 
  group_by(mc_id) %>%
  slice_min(order_by = dist, n = 1, with_ties = FALSE) %>%
  ungroup() %>% 
  rename(address_mc = address_clean.x,
         address_webscraped = address_clean.y) %>% 
  select(mc_id, branch_id, address_mc, branch_address) 


# Identify unmatched records in dataset 1 (to discard)
unmatched_mc_ontario_23 <- desjardins_2023_mc_ontario %>%
  anti_join(best_matches_ontario_23, by = "mc_id") %>% 
  select(mc_id, address_clean)

unmatched_mc_ontario_23 %>% View()
# Identify unmatched records in dataset 2 (to retain)
unmatched_branches_ontario_23 <- desjardins_branch_ont_23 %>%
  anti_join(best_matches_ontario_23, by = "branch_id") %>% 
  select(-branch_address)