Summary

YourCustomer, Co. commissioned a project to analyze Suspicious Activity Reports (SAR) from the US Department of Treasury – Financial Crimes Enforcement Network and demographic data of United States counties within the 50 states. The goal of the analysis is to look for patterns between the SAR data points (month/year, instrument type, product type) and county socioeconomic descriptors. For this project, SARS for Credit Unions filed between 2015 to 2018 for the U.S. (excluding District of Columbia and territories) was analyzed.

The data sets used for this analysis are publicly available and are described below:

Suspicious Activity Reports

Data was accessed from the US Department of Treasury – Financial Crimes Enforcement Network website and the SAR Stat portal. The following criteria was used in downloading the SAR filings for Credit Unions (2015 – 2018):

  • Industry type: Depository Institution
  • Year & Month: 2015, 2016, 2017, 2018 (“Display months” selected)
  • Suspicious Activity Category / Type: All selected
  • State / Territories: Exclude American Samoa, Guam, Marshall Islands, Micronesia, Northern Mariana Islands, Unknown, Virgin Islands, and APO/DPO/FPO.
    • District of Columbia and Puerto Rico were included in the original data pull, but were filtered out of the data during data preparation stages in R.
  • County / Metro & Micro Area: County selected, All counties selected
  • Instrument Type(s) / Payment Mechanism(s): All selected
  • Product Type: All selected
  • Relationship: Not available at county level
  • Regulator: NCUA

US Department of Agriculture: Atlas of Rural and Small-Town America

Socioeconomic data for each county was acquired from the US Department of Agriculture’s Atlas of Rural and Small-Town America. The data is compiled for each county in the country from several Federal agencies and spans various time frames. The Atlas is comprised of four files, one for each topic: People, Jobs, Income, Veterans, and County Classifications. The County Classifications data is the main source of socioeconomic data for this project, but select variables from People, Jobs, and Income are also included to add supplementary demographic data points.

Descriptions of the data from the USDA:

  • County Classifications: Categorical variables including the rural-urban continuum codes, economic dependence codes, persistent poverty, persistent child poverty, population loss, onshore oil/natural gas counties, and other ERS county typology codes.
  • People: Demographic data from the American Community Survey (ACS), including age, race and ethnicity, migration and immigration, education, household size, and family composition.
  • Jobs: Economic data from the Bureau of Labor Statistics and other sources, including information on employment trends, unemployment, and industrial composition of employment from the ACS.
  • Income: Data on median household income, per capita income, and poverty (including child poverty).

Overall SARS Analysis

Before looking at SARS filed by county, the data was analyzed as a whole to look at filings over time, by Product, and by Instrument.

SARS by Year

Looking at the number of Suspicious Activity Reports filed over time, there is a clear trend upward over time with each year having a larger share of SARS filing than the last. Looking at the line plot, there is an interesting spike in activity at the end of 2018.

SARS_by_Year <- SARStats_demos %>%
  group_by(as.character(Year)) %>%
  summarize(Total_SARS = sum(Count)) %>%
  mutate(Perc_SARS = (Total_SARS / sum(Total_SARS)))

SARS_by_Year %>%
  mutate(Perc_SARS = sprintf("%0.1f%%", Perc_SARS*100)) %>%
  kable(align = "c", col.names=c("Year", "Total SARS", "% of SARS"),
      format.args = list(big.mark = ","))
Year Total SARS % of SARS
2015 12,919 15.9%
2016 16,397 20.2%
2017 23,055 28.3%
2018 28,976 35.6%
SARStats_demos %>%
  group_by(Date) %>%
  summarize(Total = sum(Count)) %>% 
  ggplot(aes(Date, Total)) + geom_line(size = 1.5) + 
  labs(title="Total Suspicious Activity Reports: 2015-2018", y="Total SARS")


The data was also viewed year-over-year to check for any seasonality. There does not appear to be any, however, it is more obvious now to see the increase in SARS filed year-over-year. Nearly every month, the next year has more SARS filed than the prior year. Additionally, the sharp increase in activity in late-2018 is even more evident.

SARStats_demos %>%
  group_by(Date) %>%
  summarize(Total = sum(Count)) %>% 
  ggplot(aes(month(Date), Total, color = factor(year(Date)))) + geom_line(size = 1.5) +
  scale_color_brewer(palette = "Dark2") +
  scale_x_discrete(limits = month.abb) +
  theme(legend.title=element_blank(), legend.position="top") +
  labs(title="Total Suspicious Activity Reports by Month & Year", x="Month", y="Total SARS", color="Year")

SARS by Product

Of the Product types determined by Financial Crimes Enforcement Network, most of the SARS filed are for Credit Cards or Other. If just the top five products (Credit Card, Debit Card, Deposit Account, Home Equity Line of Credit, Other), the change of SARS filed over time is more evident.

SARS_by_Product <- SARStats_demos %>%
  group_by(Product) %>%
  summarize(Total_SARS = sum(Count)) %>%
  mutate(Perc_SARS = (Total_SARS / sum(Total_SARS))) %>%
  arrange(desc(Total_SARS))

SARS_by_Product %>%
  mutate(Perc_SARS = sprintf("%0.1f%%", Perc_SARS*100)) %>%
  kable(align = "c", col.names=c("Product", "Total SARS", "% of SARS"),
      format.args = list(big.mark = ","))
Product Total SARS % of SARS
Debit Card 40,178 49.4%
Other 24,957 30.7%
Credit Card 7,727 9.5%
Deposit Account 3,400 4.2%
Home Equity Line of Credit 2,385 2.9%
Home Equity Loan 747 0.9%
Residential Mortgage 564 0.7%
Insurance/Annuity Products 398 0.5%
Prepaid Access 348 0.4%
Bonds/Notes 160 0.2%
Commercial Paper 109 0.1%
Stocks 96 0.1%
Forex Transactions 79 0.1%
Commercial Mortgage 74 0.1%
Mutual Fund 47 0.1%
Security Futures Products 26 0.0%
Futures/Options on Futures 19 0.0%
Microcap Securities 13 0.0%
Swap, Hybrid, or Other Derivative 10 0.0%
Hedge Fund 8 0.0%
Options on Securities 2 0.0%


Debit Cards has the most clear upward trend in filings year-over-year, but Deposit Accounts did not appear in the data until late 2018 and had a sharp increase in filings each month. Further research is needed to know if Deposit Account was introduced as a new Product category on the SARS filing form to explain this trend. Even so, it is noteworthy that Deposit Accounts were only in 6 months of the time frame studied and was the 5th largest Product category.

SARStats_demos %>%
  filter(Product == c("Debit Card", "Other", "Credit Card",
  "Deposit Account", "Home Equity Line of Credit")) %>%
  group_by(Year, Product) %>%
  summarize(Total = sum(Count)) %>%
  ggplot(aes(Year, Total, fill = Product)) + geom_col(color = 'white') +
  scale_fill_brewer(palette = "Dark2") +
  theme(legend.title=element_blank(), legend.position="top") +
  labs(title="Total Suspicious Activity Reports by Product & Year", y="Total SARS")

SARStats_demos %>%
  filter(Product == c("Debit Card", "Other", "Credit Card",
                      "Deposit Account", "Home Equity Line of Credit")) %>%
  group_by(Date, Product) %>%
  summarize(Total = sum(Count)) %>%
  ggplot(aes(Date, Total, color = Product)) + geom_line(size = 1.25) +
  scale_color_brewer(palette = "Dark2") +
  theme(legend.title=element_blank(), legend.position="bottom") +
  labs(y="Total SARS")

SARS by Instrument

Ten Instrument types are listed on SARS filings. Of those, U.S. Currency, Personal/Business Check, Funds Transfer, and Bank/Cashier’s Check make up the bulk of filings.

SARS_by_Instrument <- SARStats_demos %>%
  group_by(Instrument) %>%
  summarize(Total_SARS = sum(Count)) %>%
  mutate(Perc_SARS = (Total_SARS / sum(Total_SARS))) %>%
  arrange(desc(Total_SARS))

SARS_by_Instrument %>%
  mutate(Perc_SARS = sprintf("%0.1f%%", Perc_SARS*100)) %>%
  kable(align = "c", col.names=c("Instrument", "Total SARS", "% of SARS"),
      format.args = list(big.mark = ","))
Instrument Total SARS % of SARS
U.S. Currency 37,704 46.3%
Personal/Business Check 18,850 23.2%
Funds Transfer 12,623 15.5%
Bank/Cashier’s Check 6,626 8.1%
Other 3,443 4.2%
Money Orders 1,305 1.6%
Government Payment 582 0.7%
Foreign Currency 104 0.1%
Gaming Instruments 81 0.1%
Travelers Checks 29 0.0%


Looking at the filings by Instrument over time, there is a sharp increase in filings at the end of 2018 for U.S. Currency.

SARStats_demos %>%
  filter(Instrument == c("U.S. Currency", "Personal/Business Check",
                         "Funds Transfer", "Bank/Cashier's Check")) %>%
  group_by(Year, Instrument) %>%
  summarize(Total = sum(Count)) %>%
  ggplot(aes(Year, Total, fill = Instrument)) + geom_col(color = 'white') +
  scale_fill_brewer(palette = "Dark2") +
  theme(legend.title=element_blank(), legend.position="top") +
  labs(title="Total Suspicious Activity Reports by Instrument & Year", y="Total SARS")

SARStats_demos %>%
  filter(Instrument == c("U.S. Currency", "Personal/Business Check",
                         "Funds Transfer", "Bank/Cashier's Check")) %>%
  group_by(Date, Instrument) %>%
  summarize(Total = sum(Count)) %>%
  ggplot(aes(Date, Total, color = Instrument)) + geom_line(size = 1.25) +
  scale_color_brewer(palette = "Dark2") +
  theme(legend.title=element_blank(), legend.position="bottom") +
  labs(y="Total SARS")

County Classifications Analysis

YourCompany requested SARS activity to be analyzed by county characteristics determined by the USDA’s Atlas of Rural and Small-Town America. Summary statistics of SARS by County Classification and Major Industry are shown below.

SARS by County Classification

Counties can be of three classifications: Metropolitan, Micropolitan, and Non-Core.

Metro counties account for nearly all of the SARS filings. 94.3% of SARS filed were in metro counties, but metro counties only account for 60.5% of counties in the data set.

SARS_by_Classification <- SARStats_demos %>%
  group_by(Classification)  %>%
  summarize(Total_SARS = sum(Count),
            Avg_County = round((sum(Count)/n_distinct(FIPStxt)), digits = 1),
            N_Counties = n_distinct(FIPStxt)) %>%
  mutate(Perc_SARS = (Total_SARS / sum(Total_SARS)),
         Perc_Counties = (N_Counties / sum(N_Counties)))

SARS_by_Classification %>%
  mutate(Perc_SARS = sprintf("%0.1f%%", Perc_SARS*100),
         Perc_Counties = sprintf("%0.1f%%", Perc_Counties*100)) %>%
  kable(align = "c", col.names=c("Classification", "Total SARS", 
                    "Avg. SARS per County", "# Counties",
                    "% of SARS", "% of Counties" ),
      format.args = list(big.mark = ","))
Classification Total SARS Avg. SARS per County # Counties % of SARS % of Counties
Metro 76,704 97.2 789 94.3% 60.5%
Micro 3,024 9.6 316 3.7% 24.2%
Non-core 1,619 8.1 200 2.0% 15.3%


Looking at the SARS filings over time, Metro counties has trended upward over time and has a sharp increase in SARS in the latter half of 2018.

SARStats_demos %>%
  group_by(Date, Classification)  %>%
  summarize(Total = sum(Count)) %>%
  ggplot(aes(Date, Total, color = Classification)) + geom_line(size = 1.2) +
  theme(legend.title=element_blank(), legend.position="top") +
  scale_color_brewer(palette = "Dark2") +
  labs(title="Total Suspicious Activity Reports by Classification", y="Total SARS")

SARS by Major Industry

Counties can have major industry designations if the county’s economy is dependent on said industry. The Atlas designated major industries as Farming, Government, Manufacturing, Mining, and Recreation. Counties that do not have a specific major industry are tagged as Non-specialized.

The share of Total SARS versus Number of Counties is more similar between industries than when compared by County Classifications. However, while half of counties in the data set are Non-specialized, these counties accounted for 62.7% of the SARS filed. Government-dependent counties also had a higher share of SARS than the proportion of counties represented.

SARS_by_Industry <- SARStats_demos %>%
  group_by(Major_Industry)  %>%
  summarize(Total_SARS = sum(Count), 
            Avg_County = round((sum(Count)/n_distinct(FIPStxt)), digits = 1),
            N_Counties = n_distinct(FIPStxt)) %>%
  mutate(Perc_SARS = (Total_SARS / sum(Total_SARS)),
         Perc_Counties = (N_Counties / sum(N_Counties)))

SARS_by_Industry %>%
  mutate(Perc_SARS = sprintf("%0.1f%%", Perc_SARS*100),
         Perc_Counties = sprintf("%0.1f%%", Perc_Counties*100)) %>%
  kable(align = "c", col.names=c("Major Industry", "Total SARS", 
                    "Avg. SARS per County", "# Counties",
                    "% of SARS", "% of Counties"),
        format.args = list(big.mark = ","))
Major Industry Total SARS Avg. SARS per County # Counties % of SARS % of Counties
Farming 247 7.3 34 0.3% 2.6%
Government 16,531 76.9 215 20.3% 16.5%
Manufacturing 6,441 29.8 216 7.9% 16.6%
Mining 1,708 38.0 45 2.1% 3.4%
Non-specialized 50,964 78.4 650 62.7% 49.8%
Recreation 5,456 37.6 145 6.7% 11.1%


As noted in prior sections of this analysis, there is a large spike in SARS filed in the second half of 2018. Non-specialized counties had a notable upward trend in SARS filed with a notable increase in the last 6 months of 2018. Manufacturing counties had a more gradual increase in activity, but had that same spike in SARS filed in the end of 2018.

SARStats_demos %>%
  group_by(Date, Major_Industry)  %>%
  summarize(Total = sum(Count)) %>%
  ggplot(aes(Date, Total, color = Major_Industry)) + geom_line(size = 1.2) +
  theme(legend.title=element_blank(), legend.position="top") +
  scale_color_brewer(palette = "Dark2") +
  labs(title="Total Suspicious Activity Reports by Major Industry", y="Total SARS")

Geographic Analysis

SARS filings include location data and by analyzing by state and county, it is clear that particular geographies account for the spike in SARS activity.

SARS by State

Looking at the 10 states with the most SARS filed, California and Florida both have a larger proportion of SARS than what would be expected from their share of counties in the data. They both have about 4% of the total number of counties, but California has nearly 15% and Florida has 11.5% of SARS filed.

Top10_SARS_State <- SARStats_demos %>%
  group_by(State) %>%
  summarize(Total_SARS = sum(Count),
            N_Counties = n_distinct(FIPStxt)) %>%
  mutate(Perc_SARS = (Total_SARS / sum(Total_SARS)),
         Perc_Counties = (N_Counties/sum(N_Counties))) %>%
  arrange(desc(Total_SARS)) %>%
  slice(1:10)

Top_10_State <- Top10_SARS_State$State

Top10_SARS_State %>%
  mutate(Perc_SARS = sprintf("%0.1f%%", Perc_SARS*100),
         Perc_Counties = sprintf("%0.1f%%", Perc_Counties*100)) %>%
  kable(align = "c", col.names=c("State", "Total SARS", "# Counties", "% of SARS", "% of Counties"),
      format.args = list(big.mark = ","))
State Total SARS # Counties % of SARS % of Counties
CA 11,987 44 14.7% 3.4%
FL 9,352 51 11.5% 3.9%
WA 6,093 28 7.5% 2.1%
PA 5,408 46 6.6% 3.5%
HI 3,832 4 4.7% 0.3%
NY 3,796 53 4.7% 4.1%
VA 3,533 54 4.3% 4.1%
OR 3,118 26 3.8% 2.0%
TX 3,000 73 3.7% 5.6%
MI 2,261 64 2.8% 4.9%


Plotting the data over time, California has a very noticeable spike in SARS filed in the last few months of 2018. Interestingly, Florida actually saw a decline in activity during those final months.

SARStats_demos %>%
  filter(State %in% Top_10_State) %>%
  group_by(Date, State) %>%
  summarize(Total = sum(Count)) %>%
  ggplot(aes(Date, Total)) + geom_line() + facet_wrap(~State, nrow = 2) +
  scale_color_brewer(palette = "Dark2") +
  scale_x_date(breaks = "2 years", date_labels = "%Y") +
  theme(strip.text.x = element_text(color="black", face = "bold")) +
   labs(title="Total Suspicious Activity Reports by State", y="Total SARS")

SARS by County

Because there are so many counties in the data set, each one only represents a small portion of the total SARS activity. However, these 10 counties account for nearly 30% of SARS filed. Looking at the list, California has 3 counties in the top 10.

Top10_SARS_Counties <- SARStats_demos %>%
  group_by(County_State) %>%
  summarize(Total_SARS = sum(Count)) %>%
  mutate(Perc_SARS = (Total_SARS / sum(Total_SARS))) %>%
  arrange(desc(Total_SARS)) %>%
  slice(1:10)

Top_10_county <- Top10_SARS_Counties$County_State

Top10_SARS_Counties %>%
  mutate(Perc_SARS = sprintf("%0.1f%%", Perc_SARS*100)) %>%
  kable(align = "c", col.names=c("County", "Total SARS", "% of SARS"),
      format.args = list(big.mark = ","))
County Total SARS % of SARS
Hillsborough, FL 5,093 6.3%
Honolulu, HI 3,247 4.0%
Los Angeles, CA 2,643 3.2%
Santa Clara, CA 2,524 3.1%
Dauphin, PA 2,161 2.7%
King, WA 2,136 2.6%
Fairfax County, VA 2,002 2.5%
Sacramento, CA 1,434 1.8%
Maricopa, AZ 1,309 1.6%
Tarrant, TX 1,098 1.3%


Looking at the counties’ SARS activity over time, it is clear that Los Angeles, CA and Santa Clara, CA have shown the spike in activity seen in prior sections for those last 6 months of the time frame. Similarly as seen in the plot by State, Hillsborough, FL showed a noticeable decline in the final months.

SARStats_demos %>%
  filter(County_State %in% Top_10_county) %>%
  group_by(Date, County_State) %>%
  summarize(Total = sum(Count)) %>%
  ggplot(aes(Date, Total)) + geom_line() + facet_wrap(~County_State, nrow = 2) +
  scale_color_brewer(palette = "Dark2") +
  theme(strip.text.x = element_text(color="black", face = "bold")) +
  scale_x_date(breaks = "2 years", date_labels = "%Y") +
  labs(title="Total Suspicious Activity Reports by County", y="Total SARS")


To see if any of these counties have common characteristics, additional demographic data analyzed. Averages were used for population and unemployment rates of the annual values in the time frame of study. The metric for the number of SARS filed per 1,000 people was created to normalize for the variations in population.

All of the top 10 counties are metro areas, most are non-specialized, and there is a small delta in unemployment rates. Population growth varied across counties. The SARS per 1,000 people had a fairly large spread. Perhaps the most interesting was Dauphin, PA - the smallest of this group of counties - had 7.9 SARS filed per 1,000 people. This is more than double the next highest value (Hillsborough, FL with 3.6).

Top10_county_demos <- SARStats_counties %>%
  filter(County_State %in% Top_10_county) %>%
  select(-FIPStxt, -County, -State) %>%
  merge(Top10_SARS_Counties) %>%
  mutate(AvgUnempRate = round((UnempRate2015 + UnempRate2016 + UnempRate2017 + UnempRate2018)/4, digits = 1),
            AvgPopulation = trunc((TotalPopEst2015 + TotalPopEst2016 + TotalPopEst2017 + TotalPopEst2018)/4),
            SARS_per_1000 = round(Total_SARS/(AvgPopulation/1000), digits = 1),
            PopGrowthRate = (TotalPopEst2018 - TotalPopEst2015)/TotalPopEst2015 *100) %>%
  select(County_State, Classification, Major_Industry, AvgUnempRate, AvgPopulation,
         PopGrowthRate, Total_SARS, SARS_per_1000) %>%
  arrange(desc(Total_SARS))

Top10_county_demos %>%
  mutate(PopGrowthRate = sprintf("%0.1f%%", PopGrowthRate)) %>% 
kable(align = "c", 
      col.names=c("County", "Classification", "Major Industry", "Avg. Unemployment Rate",
                  "Avg. Population", "Pop. Growth Rate", "Total SARS", "SARS/1000 people"),
      format.args = list(big.mark = ","))
County Classification Major Industry Avg. Unemployment Rate Avg. Population Pop. Growth Rate Total SARS SARS/1000 people
Hillsborough, FL Metro Non-specialized 4.3 1,409,855 6.4% 5,093 3.6
Honolulu, HI Metro Government 2.6 987,385 -1.1% 3,247 3.3
Los Angeles, CA Metro Non-specialized 5.4 10,092,185 -0.1% 2,643 0.3
Santa Clara, CA Metro Manufacturing 3.5 1,927,375 0.8% 2,524 1.3
Dauphin, PA Metro Government 4.5 274,964 1.4% 2,161 7.9
King, WA Metro Non-specialized 3.7 2,181,182 4.8% 2,136 1.0
Fairfax County, VA Metro Non-specialized 3.0 1,145,445 0.7% 2,002 1.7
Sacramento, CA Metro Government 5.0 1,517,677 3.0% 1,434 0.9
Maricopa, AZ Metro Non-specialized 4.5 4,289,658 5.5% 1,309 0.3
Tarrant, TX Metro Non-specialized 3.8 2,036,583 4.9% 1,098 0.5