class: center, middle, inverse, title-slide

.title[
# SBA Bank Loan Exploratory Data Analysis
]
.subtitle[
## Performance Analysis on Three Types of Random Samples
]
.author[
### Andrew Heneghan
]
.institute[
### West Chester University of Pennsylvania
]
.date[
### 10/25/2022
STA 490: Capstone Statistics
]

---
class: middle, center

# <font color = "red">Agenda</font>

## <font size = 6>Data/Research Question</font>

## <font size = 6>Methods for Analysis</font>

## <font size = 6>Random Samples</font>

## <font size = 6>Results and Conclusion</font>

---
class: top

## <center>Description of Data</center>

.pull-left[

- The dataset is from the U.S. Small Business Administration (SBA).
- It covers data from 1987 to 2014.
- I used a simple random sample, a systematic random sample, and a stratified random sample for my analysis.
- I used the FranchiseCode variable as my stratification variable.
- Most observations did not have a franchise code.
- This made it easier for me to find an appropriate sample size.

]

.pull-right[

<img src = "https://i0.wp.com/thrownstone.org/wp-content/uploads/2021/01/sba-banner.jpg?w=960&ssl=1" width="600" height="500">

]

---
class: middle, center

# Research Question

## <font size = 6>With FranchiseCode as a stratification variable, which of the three random sampling plans (simple, systematic, and stratified) will perform best in an analysis of the SBA Bank Loan Dataset?</font>

<img src = "https://cdn.scribbr.com/wp-content/uploads/2019/09/probability-sampling.png" width="260" height="230">

---
class: middle, center

## Methods for Analysis

- I divided the franchise codes into categories by their first two digits.
- I deleted any categories that were unclassified or too small to use; the remaining loans define the study population.
- I computed loan default rates for each franchise-code category.
- I ran the three types of random sampling plans, each with a target size of 5000 observations.

| 10| 13| 15| 16| 17| 18| 19| 20| 21| 22| 23| 24| 25| 26| 28| 29| 30| 31| 32| 33| 34| 35| 36| 37| 38| 39| 40| 41| 42| 43| 44| 45| 46| 47| 48| 49| 50| 51| 52| 53| 54| 55| 56| 57| 58| 59| 60| 61| 62| 63| 64| 65| 66| 67| 68| 69| 70| 71| 72| 73| 74| 75| 76| 77| 78| 79| 80| 81| 82| 83| 84| 85| 88| 89| 90| 91|
|----:|---:|---:|---:|----:|---:|---:|---:|----:|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|----:|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|----:|---:|----:|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|----:|----:|---:|---:|---:|---:|---:|---:|---:|---:|---:|----:|----:|---:|---:|---:|---:|---:|---:|---:|---:|---:|---:|
| 2548| 328| 746| 667| 1571| 320| 434| 779| 1827| 776| 314| 827| 908| 567| 231| 538| 300| 288| 177| 305| 978| 1098| 512| 244| 849| 546| 231| 408| 388| 480| 328| 694| 370| 236| 537| 407| 1707| 283| 1374| 700| 411| 476| 682| 246| 203| 218| 359| 710| 336| 389| 679| 927| 736| 1010| 2688| 273| 577| 257| 489| 735| 377| 795| 276| 493| 3658| 1247| 632| 858| 268| 338| 745| 191| 312| 626| 607| 297|

---
class: middle, center

## Simple Random Sampling Plan

A sample of 5000 loans was chosen at random. I defined a sampling list and added it to the study population.

| Size| Var.count|
|----:|---------:|
| 5000| 30|

<img src = "https://www.statisticshowto.com/wp-content/uploads/2014/12/Simple_random_sampling.png" width="300" height="200">

---
class: middle, center

## Systematic Random Sampling Plan

A jump size, which involves rounding error, was used to randomly choose a sample of about 5000 loans at a regular interval. As a result, the actual systematic sample size is larger than the target size.

| Size| Var.count|
|----:|---------:|
| 5094| 30|

<img src = "https://www.qualtrics.com/m/assets/wp-content/uploads/2022/02/srs2.png" width="300" height="200">

---
class: middle, center

## Stratified Random Sampling Plan

- FranchiseCode was used as the stratification variable.
- A sample of 5000 loans was randomly taken.
- Most observations did not have a franchise code, which made it easier to find an appropriate sample size.
- Codes were grouped into categories by their first two digits.
- Categories were deleted if they were too small or represented no franchise.
- A simple random sample (SRS) was taken from each remaining FranchiseCode stratum.

| 10| 13| 15| 16| 17| 18| 19| 20| 21| 22| 23| 24| 25| 26| 28| 29| 30| 31| 32| 33| 34| 35| 36| 37| 38| 39| 40| 41| 42| 43| 44| 45| 46| 47| 48| 49| 50| 51| 52| 53| 54| 55| 56| 57| 58| 59| 60| 61| 62| 63| 64| 65| 66| 67| 68| 69| 70| 71| 72| 73| 74| 75| 76| 77| 78| 79| 80| 81| 82| 83| 84| 85| 88| 89| 90| 91|
|---:|--:|--:|--:|---:|--:|--:|--:|---:|--:|--:|--:|--:|--:|--:|--:|--:|--:|--:|--:|--:|---:|--:|--:|--:|--:|--:|--:|--:|--:|--:|--:|--:|--:|--:|--:|---:|--:|---:|--:|--:|--:|--:|--:|--:|--:|--:|--:|--:|--:|--:|--:|--:|--:|---:|--:|--:|--:|--:|--:|--:|--:|--:|--:|---:|---:|--:|--:|--:|--:|--:|--:|--:|--:|--:|--:|
| 250| 32| 73| 65| 154| 31| 43| 76| 179| 76| 31| 81| 89| 56| 23| 53| 29| 28| 17| 30| 96| 108| 50| 24| 83| 54| 23| 40| 38| 47| 32| 68| 36| 23| 53| 40| 168| 28| 135| 69| 40| 47| 67| 24| 20| 21| 35| 70| 33| 38| 67| 91| 72| 99| 264| 27| 57| 25| 48| 72| 37| 78| 27| 48| 359| 122| 62| 84| 26| 33| 73| 19| 31| 61| 60| 29|

---
class: middle, center

## Results

I ran a comparative performance analysis on the default rates to determine which random sample worked best for this analysis.

<img src = "https://azheneghan.github.io/aheneghan/images/FranchiseCode_Default_Rate_Comparison_Graph.PNG" width="600" height="450">

---
class: middle, center

## Conclusion

- The simple random sampling plan performed better than the systematic and stratified sampling plans.
- I believe the simple random sampling plan is the best sample type for the SBA data analysis.
- Many observations were not used because they did not have a franchise code.
- I wonder if the performance analysis would have turned out differently if more observations had franchise codes.

<img src = "https://azheneghan.github.io/aheneghan/images/SamplingPlans_MSE_difference.PNG" width="400" height="300">
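
---
class: top

## Appendix: Sketch of the Three Sampling Plans

The three sampling plans described in this deck can be sketched in a few lines of Python. This is only an illustration, not the code used for the analysis: the function names, the toy population, and the seed are all made up. The stratum sample sizes on the slides look roughly proportional to the stratum sizes, so the stratified sketch assumes proportional allocation. The systematic sketch assumes the jump size is the population size divided by the target size, rounded down, which is one way the actual sample can end up larger than the target.

```python
import random
from collections import Counter, defaultdict


def simple_random_sample(population, n, rng):
    """Draw n records without replacement; every record is equally likely."""
    return rng.sample(population, n)


def systematic_sample(population, n, rng):
    """Take every k-th record after a random start, with k = floor(N / n).

    Because the jump size k is rounded down, the sample can come out
    slightly larger than the target size n.
    """
    jump = max(len(population) // n, 1)
    start = rng.randrange(jump)
    return population[start::jump]


def stratified_sample(population, n, stratum_of, rng):
    """Proportional allocation (an assumption): an SRS inside each stratum."""
    strata = defaultdict(list)
    for record in population:
        strata[stratum_of(record)].append(record)
    fraction = n / len(population)
    sample = []
    for label in sorted(strata):
        members = strata[label]
        size = round(fraction * len(members))  # rounding may shift the total slightly
        sample.extend(rng.sample(members, size))
    return sample


def toy_stratum(i):
    """Hypothetical strata: about 60% 'A', 30% 'B', and 10% 'C'."""
    last_digit = i % 10
    if last_digit < 6:
        return "A"
    if last_digit < 9:
        return "B"
    return "C"


# Toy population (made up): 1019 records, each a pair (id, stratum label).
rng = random.Random(490)
population = [(i, toy_stratum(i)) for i in range(1019)]
target = 100

srs = simple_random_sample(population, target, rng)
systematic = systematic_sample(population, target, rng)
stratified = stratified_sample(population, target, lambda rec: rec[1], rng)
stratified_counts = Counter(label for _, label in stratified)

print("SRS size:", len(srs))
print("Systematic size:", len(systematic))
print("Stratified counts:", dict(stratified_counts))
```

With a population size that is not an exact multiple of the target, the systematic sample is never smaller than the target and is usually a little larger, which is the same effect as the 5094 versus 5000 sizes reported above.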