YourCustomer, Co. commissioned a project to analyze Suspicious Activity Reports (SAR) from the US Department of Treasury – Financial Crimes Enforcement Network and demographic data of United States counties within the 50 states. The goal of the analysis is to look for patterns between the SAR data points (month/year, instrument type, product type) and county socioeconomic descriptors. For this project, SARS for Credit Unions filed between 2015 to 2018 for the U.S. (excluding District of Columbia and territories) was analyzed.
The data sets used for this analysis are publicly available and are described below:
Data was accessed from the US Department of Treasury – Financial Crimes Enforcement Network website and the SAR Stat portal. The following criteria was used in downloading the SAR filings for Credit Unions (2015 – 2018):
Socioeconomic data for each county was acquired from the US Department of Agriculture’s Atlas of Rural and Small-Town America. The data is compiled for each county in the country from several Federal agencies and spans various time frames. The Atlas is comprised of four files, one for each topic: People, Jobs, Income, Veterans, and County Classifications. The County Classifications data is the main source of socioeconomic data for this project, but select variables from People, Jobs, and Income are also included to add supplementary demographic data points.
Descriptions of the data from the USDA:
Before looking at SARS filed by county, the data was analyzed as a whole to look at filings over time, by Product, and by Instrument.
Looking at the number of Suspicious Activity Reports filed over time, there is a clear trend upward over time with each year having a larger share of SARS filing than the last. Looking at the line plot, there is an interesting spike in activity at the end of 2018.
SARS_by_Year <- SARStats_demos %>%
group_by(as.character(Year)) %>%
summarize(Total_SARS = sum(Count)) %>%
mutate(Perc_SARS = (Total_SARS / sum(Total_SARS)))
SARS_by_Year %>%
mutate(Perc_SARS = sprintf("%0.1f%%", Perc_SARS*100)) %>%
kable(align = "c", col.names=c("Year", "Total SARS", "% of SARS"),
format.args = list(big.mark = ","))
| Year | Total SARS | % of SARS |
|---|---|---|
| 2015 | 12,919 | 15.9% |
| 2016 | 16,397 | 20.2% |
| 2017 | 23,055 | 28.3% |
| 2018 | 28,976 | 35.6% |
SARStats_demos %>%
group_by(Date) %>%
summarize(Total = sum(Count)) %>%
ggplot(aes(Date, Total)) + geom_line(size = 1.5) +
labs(title="Total Suspicious Activity Reports: 2015-2018", y="Total SARS")
The data was also viewed year-over-year to check for any seasonality. There does not appear to be any, however, it is more obvious now to see the increase in SARS filed year-over-year. Nearly every month, the next year has more SARS filed than the prior year. Additionally, the sharp increase in activity in late-2018 is even more evident.
SARStats_demos %>%
group_by(Date) %>%
summarize(Total = sum(Count)) %>%
ggplot(aes(month(Date), Total, color = factor(year(Date)))) + geom_line(size = 1.5) +
scale_color_brewer(palette = "Dark2") +
scale_x_discrete(limits = month.abb) +
theme(legend.title=element_blank(), legend.position="top") +
labs(title="Total Suspicious Activity Reports by Month & Year", x="Month", y="Total SARS", color="Year")
Of the Product types determined by Financial Crimes Enforcement Network, most of the SARS filed are for Credit Cards or Other. If just the top five products (Credit Card, Debit Card, Deposit Account, Home Equity Line of Credit, Other), the change of SARS filed over time is more evident.
SARS_by_Product <- SARStats_demos %>%
group_by(Product) %>%
summarize(Total_SARS = sum(Count)) %>%
mutate(Perc_SARS = (Total_SARS / sum(Total_SARS))) %>%
arrange(desc(Total_SARS))
SARS_by_Product %>%
mutate(Perc_SARS = sprintf("%0.1f%%", Perc_SARS*100)) %>%
kable(align = "c", col.names=c("Product", "Total SARS", "% of SARS"),
format.args = list(big.mark = ","))
| Product | Total SARS | % of SARS |
|---|---|---|
| Debit Card | 40,178 | 49.4% |
| Other | 24,957 | 30.7% |
| Credit Card | 7,727 | 9.5% |
| Deposit Account | 3,400 | 4.2% |
| Home Equity Line of Credit | 2,385 | 2.9% |
| Home Equity Loan | 747 | 0.9% |
| Residential Mortgage | 564 | 0.7% |
| Insurance/Annuity Products | 398 | 0.5% |
| Prepaid Access | 348 | 0.4% |
| Bonds/Notes | 160 | 0.2% |
| Commercial Paper | 109 | 0.1% |
| Stocks | 96 | 0.1% |
| Forex Transactions | 79 | 0.1% |
| Commercial Mortgage | 74 | 0.1% |
| Mutual Fund | 47 | 0.1% |
| Security Futures Products | 26 | 0.0% |
| Futures/Options on Futures | 19 | 0.0% |
| Microcap Securities | 13 | 0.0% |
| Swap, Hybrid, or Other Derivative | 10 | 0.0% |
| Hedge Fund | 8 | 0.0% |
| Options on Securities | 2 | 0.0% |
Debit Cards has the most clear upward trend in filings year-over-year, but Deposit Accounts did not appear in the data until late 2018 and had a sharp increase in filings each month. Further research is needed to know if Deposit Account was introduced as a new Product category on the SARS filing form to explain this trend. Even so, it is noteworthy that Deposit Accounts were only in 6 months of the time frame studied and was the 5th largest Product category.
SARStats_demos %>%
filter(Product == c("Debit Card", "Other", "Credit Card",
"Deposit Account", "Home Equity Line of Credit")) %>%
group_by(Year, Product) %>%
summarize(Total = sum(Count)) %>%
ggplot(aes(Year, Total, fill = Product)) + geom_col(color = 'white') +
scale_fill_brewer(palette = "Dark2") +
theme(legend.title=element_blank(), legend.position="top") +
labs(title="Total Suspicious Activity Reports by Product & Year", y="Total SARS")
SARStats_demos %>%
filter(Product == c("Debit Card", "Other", "Credit Card",
"Deposit Account", "Home Equity Line of Credit")) %>%
group_by(Date, Product) %>%
summarize(Total = sum(Count)) %>%
ggplot(aes(Date, Total, color = Product)) + geom_line(size = 1.25) +
scale_color_brewer(palette = "Dark2") +
theme(legend.title=element_blank(), legend.position="bottom") +
labs(y="Total SARS")
Ten Instrument types are listed on SARS filings. Of those, U.S. Currency, Personal/Business Check, Funds Transfer, and Bank/Cashier’s Check make up the bulk of filings.
SARS_by_Instrument <- SARStats_demos %>%
group_by(Instrument) %>%
summarize(Total_SARS = sum(Count)) %>%
mutate(Perc_SARS = (Total_SARS / sum(Total_SARS))) %>%
arrange(desc(Total_SARS))
SARS_by_Instrument %>%
mutate(Perc_SARS = sprintf("%0.1f%%", Perc_SARS*100)) %>%
kable(align = "c", col.names=c("Instrument", "Total SARS", "% of SARS"),
format.args = list(big.mark = ","))
| Instrument | Total SARS | % of SARS |
|---|---|---|
| U.S. Currency | 37,704 | 46.3% |
| Personal/Business Check | 18,850 | 23.2% |
| Funds Transfer | 12,623 | 15.5% |
| Bank/Cashier’s Check | 6,626 | 8.1% |
| Other | 3,443 | 4.2% |
| Money Orders | 1,305 | 1.6% |
| Government Payment | 582 | 0.7% |
| Foreign Currency | 104 | 0.1% |
| Gaming Instruments | 81 | 0.1% |
| Travelers Checks | 29 | 0.0% |
Looking at the filings by Instrument over time, there is a sharp increase in filings at the end of 2018 for U.S. Currency.
SARStats_demos %>%
filter(Instrument == c("U.S. Currency", "Personal/Business Check",
"Funds Transfer", "Bank/Cashier's Check")) %>%
group_by(Year, Instrument) %>%
summarize(Total = sum(Count)) %>%
ggplot(aes(Year, Total, fill = Instrument)) + geom_col(color = 'white') +
scale_fill_brewer(palette = "Dark2") +
theme(legend.title=element_blank(), legend.position="top") +
labs(title="Total Suspicious Activity Reports by Instrument & Year", y="Total SARS")
SARStats_demos %>%
filter(Instrument == c("U.S. Currency", "Personal/Business Check",
"Funds Transfer", "Bank/Cashier's Check")) %>%
group_by(Date, Instrument) %>%
summarize(Total = sum(Count)) %>%
ggplot(aes(Date, Total, color = Instrument)) + geom_line(size = 1.25) +
scale_color_brewer(palette = "Dark2") +
theme(legend.title=element_blank(), legend.position="bottom") +
labs(y="Total SARS")
YourCompany requested SARS activity to be analyzed by county characteristics determined by the USDA’s Atlas of Rural and Small-Town America. Summary statistics of SARS by County Classification and Major Industry are shown below.
Counties can be of three classifications: Metropolitan, Micropolitan, and Non-Core.
Metro counties account for nearly all of the SARS filings. 94.3% of SARS filed were in metro counties, but metro counties only account for 60.5% of counties in the data set.
SARS_by_Classification <- SARStats_demos %>%
group_by(Classification) %>%
summarize(Total_SARS = sum(Count),
Avg_County = round((sum(Count)/n_distinct(FIPStxt)), digits = 1),
N_Counties = n_distinct(FIPStxt)) %>%
mutate(Perc_SARS = (Total_SARS / sum(Total_SARS)),
Perc_Counties = (N_Counties / sum(N_Counties)))
SARS_by_Classification %>%
mutate(Perc_SARS = sprintf("%0.1f%%", Perc_SARS*100),
Perc_Counties = sprintf("%0.1f%%", Perc_Counties*100)) %>%
kable(align = "c", col.names=c("Classification", "Total SARS",
"Avg. SARS per County", "# Counties",
"% of SARS", "% of Counties" ),
format.args = list(big.mark = ","))
| Classification | Total SARS | Avg. SARS per County | # Counties | % of SARS | % of Counties |
|---|---|---|---|---|---|
| Metro | 76,704 | 97.2 | 789 | 94.3% | 60.5% |
| Micro | 3,024 | 9.6 | 316 | 3.7% | 24.2% |
| Non-core | 1,619 | 8.1 | 200 | 2.0% | 15.3% |
Looking at the SARS filings over time, Metro counties has trended upward over time and has a sharp increase in SARS in the latter half of 2018.
SARStats_demos %>%
group_by(Date, Classification) %>%
summarize(Total = sum(Count)) %>%
ggplot(aes(Date, Total, color = Classification)) + geom_line(size = 1.2) +
theme(legend.title=element_blank(), legend.position="top") +
scale_color_brewer(palette = "Dark2") +
labs(title="Total Suspicious Activity Reports by Classification", y="Total SARS")
Counties can have major industry designations if the county’s economy is dependent on said industry. The Atlas designated major industries as Farming, Government, Manufacturing, Mining, and Recreation. Counties that do not have a specific major industry are tagged as Non-specialized.
The share of Total SARS versus Number of Counties is more similar between industries than when compared by County Classifications. However, while half of counties in the data set are Non-specialized, these counties accounted for 62.7% of the SARS filed. Government-dependent counties also had a higher share of SARS than the proportion of counties represented.
SARS_by_Industry <- SARStats_demos %>%
group_by(Major_Industry) %>%
summarize(Total_SARS = sum(Count),
Avg_County = round((sum(Count)/n_distinct(FIPStxt)), digits = 1),
N_Counties = n_distinct(FIPStxt)) %>%
mutate(Perc_SARS = (Total_SARS / sum(Total_SARS)),
Perc_Counties = (N_Counties / sum(N_Counties)))
SARS_by_Industry %>%
mutate(Perc_SARS = sprintf("%0.1f%%", Perc_SARS*100),
Perc_Counties = sprintf("%0.1f%%", Perc_Counties*100)) %>%
kable(align = "c", col.names=c("Major Industry", "Total SARS",
"Avg. SARS per County", "# Counties",
"% of SARS", "% of Counties"),
format.args = list(big.mark = ","))
| Major Industry | Total SARS | Avg. SARS per County | # Counties | % of SARS | % of Counties |
|---|---|---|---|---|---|
| Farming | 247 | 7.3 | 34 | 0.3% | 2.6% |
| Government | 16,531 | 76.9 | 215 | 20.3% | 16.5% |
| Manufacturing | 6,441 | 29.8 | 216 | 7.9% | 16.6% |
| Mining | 1,708 | 38.0 | 45 | 2.1% | 3.4% |
| Non-specialized | 50,964 | 78.4 | 650 | 62.7% | 49.8% |
| Recreation | 5,456 | 37.6 | 145 | 6.7% | 11.1% |
As noted in prior sections of this analysis, there is a large spike in SARS filed in the second half of 2018. Non-specialized counties had a notable upward trend in SARS filed with a notable increase in the last 6 months of 2018. Manufacturing counties had a more gradual increase in activity, but had that same spike in SARS filed in the end of 2018.
SARStats_demos %>%
group_by(Date, Major_Industry) %>%
summarize(Total = sum(Count)) %>%
ggplot(aes(Date, Total, color = Major_Industry)) + geom_line(size = 1.2) +
theme(legend.title=element_blank(), legend.position="top") +
scale_color_brewer(palette = "Dark2") +
labs(title="Total Suspicious Activity Reports by Major Industry", y="Total SARS")
SARS filings include location data and by analyzing by state and county, it is clear that particular geographies account for the spike in SARS activity.
Looking at the 10 states with the most SARS filed, California and Florida both have a larger proportion of SARS than what would be expected from their share of counties in the data. They both have about 4% of the total number of counties, but California has nearly 15% and Florida has 11.5% of SARS filed.
Top10_SARS_State <- SARStats_demos %>%
group_by(State) %>%
summarize(Total_SARS = sum(Count),
N_Counties = n_distinct(FIPStxt)) %>%
mutate(Perc_SARS = (Total_SARS / sum(Total_SARS)),
Perc_Counties = (N_Counties/sum(N_Counties))) %>%
arrange(desc(Total_SARS)) %>%
slice(1:10)
Top_10_State <- Top10_SARS_State$State
Top10_SARS_State %>%
mutate(Perc_SARS = sprintf("%0.1f%%", Perc_SARS*100),
Perc_Counties = sprintf("%0.1f%%", Perc_Counties*100)) %>%
kable(align = "c", col.names=c("State", "Total SARS", "# Counties", "% of SARS", "% of Counties"),
format.args = list(big.mark = ","))
| State | Total SARS | # Counties | % of SARS | % of Counties |
|---|---|---|---|---|
| CA | 11,987 | 44 | 14.7% | 3.4% |
| FL | 9,352 | 51 | 11.5% | 3.9% |
| WA | 6,093 | 28 | 7.5% | 2.1% |
| PA | 5,408 | 46 | 6.6% | 3.5% |
| HI | 3,832 | 4 | 4.7% | 0.3% |
| NY | 3,796 | 53 | 4.7% | 4.1% |
| VA | 3,533 | 54 | 4.3% | 4.1% |
| OR | 3,118 | 26 | 3.8% | 2.0% |
| TX | 3,000 | 73 | 3.7% | 5.6% |
| MI | 2,261 | 64 | 2.8% | 4.9% |
Plotting the data over time, California has a very noticeable spike in SARS filed in the last few months of 2018. Interestingly, Florida actually saw a decline in activity during those final months.
SARStats_demos %>%
filter(State %in% Top_10_State) %>%
group_by(Date, State) %>%
summarize(Total = sum(Count)) %>%
ggplot(aes(Date, Total)) + geom_line() + facet_wrap(~State, nrow = 2) +
scale_color_brewer(palette = "Dark2") +
scale_x_date(breaks = "2 years", date_labels = "%Y") +
theme(strip.text.x = element_text(color="black", face = "bold")) +
labs(title="Total Suspicious Activity Reports by State", y="Total SARS")
Because there are so many counties in the data set, each one only represents a small portion of the total SARS activity. However, these 10 counties account for nearly 30% of SARS filed. Looking at the list, California has 3 counties in the top 10.
Top10_SARS_Counties <- SARStats_demos %>%
group_by(County_State) %>%
summarize(Total_SARS = sum(Count)) %>%
mutate(Perc_SARS = (Total_SARS / sum(Total_SARS))) %>%
arrange(desc(Total_SARS)) %>%
slice(1:10)
Top_10_county <- Top10_SARS_Counties$County_State
Top10_SARS_Counties %>%
mutate(Perc_SARS = sprintf("%0.1f%%", Perc_SARS*100)) %>%
kable(align = "c", col.names=c("County", "Total SARS", "% of SARS"),
format.args = list(big.mark = ","))
| County | Total SARS | % of SARS |
|---|---|---|
| Hillsborough, FL | 5,093 | 6.3% |
| Honolulu, HI | 3,247 | 4.0% |
| Los Angeles, CA | 2,643 | 3.2% |
| Santa Clara, CA | 2,524 | 3.1% |
| Dauphin, PA | 2,161 | 2.7% |
| King, WA | 2,136 | 2.6% |
| Fairfax County, VA | 2,002 | 2.5% |
| Sacramento, CA | 1,434 | 1.8% |
| Maricopa, AZ | 1,309 | 1.6% |
| Tarrant, TX | 1,098 | 1.3% |
Looking at the counties’ SARS activity over time, it is clear that Los Angeles, CA and Santa Clara, CA have shown the spike in activity seen in prior sections for those last 6 months of the time frame. Similarly as seen in the plot by State, Hillsborough, FL showed a noticeable decline in the final months.
SARStats_demos %>%
filter(County_State %in% Top_10_county) %>%
group_by(Date, County_State) %>%
summarize(Total = sum(Count)) %>%
ggplot(aes(Date, Total)) + geom_line() + facet_wrap(~County_State, nrow = 2) +
scale_color_brewer(palette = "Dark2") +
theme(strip.text.x = element_text(color="black", face = "bold")) +
scale_x_date(breaks = "2 years", date_labels = "%Y") +
labs(title="Total Suspicious Activity Reports by County", y="Total SARS")
To see if any of these counties have common characteristics, additional demographic data analyzed. Averages were used for population and unemployment rates of the annual values in the time frame of study. The metric for the number of SARS filed per 1,000 people was created to normalize for the variations in population.
All of the top 10 counties are metro areas, most are non-specialized, and there is a small delta in unemployment rates. Population growth varied across counties. The SARS per 1,000 people had a fairly large spread. Perhaps the most interesting was Dauphin, PA - the smallest of this group of counties - had 7.9 SARS filed per 1,000 people. This is more than double the next highest value (Hillsborough, FL with 3.6).
Top10_county_demos <- SARStats_counties %>%
filter(County_State %in% Top_10_county) %>%
select(-FIPStxt, -County, -State) %>%
merge(Top10_SARS_Counties) %>%
mutate(AvgUnempRate = round((UnempRate2015 + UnempRate2016 + UnempRate2017 + UnempRate2018)/4, digits = 1),
AvgPopulation = trunc((TotalPopEst2015 + TotalPopEst2016 + TotalPopEst2017 + TotalPopEst2018)/4),
SARS_per_1000 = round(Total_SARS/(AvgPopulation/1000), digits = 1),
PopGrowthRate = (TotalPopEst2018 - TotalPopEst2015)/TotalPopEst2015 *100) %>%
select(County_State, Classification, Major_Industry, AvgUnempRate, AvgPopulation,
PopGrowthRate, Total_SARS, SARS_per_1000) %>%
arrange(desc(Total_SARS))
Top10_county_demos %>%
mutate(PopGrowthRate = sprintf("%0.1f%%", PopGrowthRate)) %>%
kable(align = "c",
col.names=c("County", "Classification", "Major Industry", "Avg. Unemployment Rate",
"Avg. Population", "Pop. Growth Rate", "Total SARS", "SARS/1000 people"),
format.args = list(big.mark = ","))
| County | Classification | Major Industry | Avg. Unemployment Rate | Avg. Population | Pop. Growth Rate | Total SARS | SARS/1000 people |
|---|---|---|---|---|---|---|---|
| Hillsborough, FL | Metro | Non-specialized | 4.3 | 1,409,855 | 6.4% | 5,093 | 3.6 |
| Honolulu, HI | Metro | Government | 2.6 | 987,385 | -1.1% | 3,247 | 3.3 |
| Los Angeles, CA | Metro | Non-specialized | 5.4 | 10,092,185 | -0.1% | 2,643 | 0.3 |
| Santa Clara, CA | Metro | Manufacturing | 3.5 | 1,927,375 | 0.8% | 2,524 | 1.3 |
| Dauphin, PA | Metro | Government | 4.5 | 274,964 | 1.4% | 2,161 | 7.9 |
| King, WA | Metro | Non-specialized | 3.7 | 2,181,182 | 4.8% | 2,136 | 1.0 |
| Fairfax County, VA | Metro | Non-specialized | 3.0 | 1,145,445 | 0.7% | 2,002 | 1.7 |
| Sacramento, CA | Metro | Government | 5.0 | 1,517,677 | 3.0% | 1,434 | 0.9 |
| Maricopa, AZ | Metro | Non-specialized | 4.5 | 4,289,658 | 5.5% | 1,309 | 0.3 |
| Tarrant, TX | Metro | Non-specialized | 3.8 | 2,036,583 | 4.9% | 1,098 | 0.5 |