class: center, middle, inverse, title-slide

.title[
# U.S. Bank Loans
]
.subtitle[
## Sampling Strategies
]
.author[
### Tyler Battaglini & Ryan Lebo
]
.date[
### 2025-03-27
]

---

## Table of Contents

<ul style="font-size: 1.6em; line-height: 1.6;">
<li>Introduction</li>
<li>Variables</li>
<li>Missing Values/Converting Variables</li>
<li>Variable Rework</li>
<li>Default Rates/Discretize GrAppv</li>
<li>Study Population/Sample Calculations</li>
<li>Sampling Methods</li>
<li>Comparison of Methods</li>
<li>Conclusion</li>
</ul>

---

## Introduction

<ul style="font-size: 1.6em; line-height: 1.6;">
<li>EDA of the bank loan data set</li>
<li>Combined data set</li>
<li>Data come from the SBA (U.S. Small Business Administration)</li>
</ul>

---

## Variables

<ul style="font-size: 1.6em; line-height: 1.6;">
<li>Mis_Status</li>
<li>DisbursementGross</li>
<li>BalanceGross</li>
<li>ChgOffPrinGr</li>
<li>GrAppv</li>
<li>SBA_Appv</li>
</ul>

---

## Missing Values/Converting Variables

<ul style="font-size: 1.6em; line-height: 1.6;">
<li>Checked for missing Mis_Status values</li>
<li>Converted DisbursementGross, BalanceGross, ChgOffPrinGr, GrAppv, and SBA_Appv</li>
</ul>

---

## Variable Rework

<ul style="font-size: 1.6em; line-height: 1.6;">
<li>6 regional categories, plus an Unknown group</li>
<li>Makes regional patterns visible</li>
<li>Easier to visualize</li>
</ul>

```
Mid-Atlantic      Midwest    Northeast    Southeast    Southwest      Unknown 
      142486       211424       133479       138304        85461         5733 
        West 
      182277 
```

---

## Default Rates/Discretize GrAppv

<ul style="font-size: 1.6em; line-height: 1.6;">
<li>Calculated default rates of SBA-backed loans</li>
<li>Compare regions</li>
<li>Easier to interpret</li>
</ul>

|BankRegion   | Total_Loans| Defaults| Default_Rate|
|:------------|-----------:|--------:|------------:|
|Mid-Atlantic |      142486|    29375|    20.616060|
|Midwest      |      211424|    33263|    15.732840|
|Northeast    |      133479|    19692|    14.752883|
|Southeast    |      138304|    30791|    22.263275|
|Southwest    |       85461|    10167|    11.896655|
|Unknown      |        5733|      443|     7.727193|
|West         |      182277|    33827|    18.558019|
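
The default-rate calculation summarized in the table above can be sketched in a few lines. This is a minimal Python illustration, not the authors' code (the slides do not show it): the function name `default_rates`, the `(region, defaulted)` record layout, and the toy records are all assumptions made for the example. The rate is taken to be defaults divided by total loans, expressed as a percentage.

```python
from collections import Counter

def default_rates(records):
    """Return {region: (total, defaults, rate_percent)} from (region, defaulted) pairs."""
    totals = Counter()
    defaults = Counter()
    for region, defaulted in records:
        totals[region] += 1
        if defaulted:
            defaults[region] += 1
    return {
        region: (n, defaults[region], 100.0 * defaults[region] / n)
        for region, n in sorted(totals.items())
    }

# Toy records (hypothetical, not the SBA data).
toy = [
    ("Midwest", True), ("Midwest", False), ("Midwest", False), ("Midwest", False),
    ("West", True), ("West", False),
]
rates = default_rates(toy)
print(rates)

# Same arithmetic applied to the Mid-Atlantic counts in the table above.
mid_atlantic_rate = 100.0 * 29375 / 142486
print(round(mid_atlantic_rate, 1))
```

Applied to the Mid-Atlantic counts (29,375 defaults out of 142,486 loans), the same arithmetic gives about 20.6%, in line with the table.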
---

## Default Rates/Discretize GrAppv

<ul style="font-size: 1.6em; line-height: 1.6;">
<li>Easier to interpret</li>
<li>Evenly distributed</li>
<li>Split into 5 groups</li>
</ul>

```
 Very Low       Low    Medium      High Very High 
   180997    179102    179575    179677    179813 
```

---

## Study Population/Sample Calculations

<ul style="font-size: 1.6em; line-height: 1.6;">
<li>Removed the Unknown group</li>
<li>Calculations are made from the sample</li>
</ul>

| Mid-Atlantic| Midwest| Northeast| Southeast| Southwest|   West|
|------------:|-------:|---------:|---------:|---------:|------:|
|       142486|  211424|    133479|    138304|     85461| 182277|

---

## Simple Random Sample

<ul style="font-size: 1.6em; line-height: 1.6;">
<li>Performed a simple random sample</li>
<li>Assigned a unique index to each observation</li>
<li>Randomly selected 32,724 observations</li>
</ul>

|  Size| Var.count|
|-----:|---------:|
| 32724|        30|

---

## Systematic Sampling

<ul style="font-size: 1.6em; line-height: 1.6;">
<li>Samples at fixed intervals after a given starting point</li>
<li>Ensures representation throughout the dataset</li>
<li>Reduces bias</li>
</ul>

|  Size| Var.count|
|-----:|---------:|
| 33090|        30|

---

## Stratified Sampling

<ul style="font-size: 1.6em; line-height: 1.6;">
<li>Ensures each region is proportionally represented</li>
<li>Improved accuracy</li>
<li>Reduces bias</li>
</ul>

| Mid-Atlantic| Midwest| Northeast| Southeast| Southwest| West|
|------------:|-------:|---------:|---------:|---------:|----:|
|         5219|    7744|      4889|      5066|      3130| 6676|

---

## Cluster Sampling

<ul style="font-size: 1.6em; line-height: 1.6;">
<li>Cluster selection bias</li>
<li>Potential over- or under-representation of the population</li>
<li>Enhances comparability between groups</li>
</ul>

|  Size| Var.count|
|-----:|---------:|
| 32305|        29|

---

<div style="display: flex; justify-content: center; align-items: center; height: 80vh; font-size: 50px;">
Comparison of Methods
</div>

---

## Population-level Default Rates

<ul style="font-size: 1.6em; line-height: 1.6;">
<li>Population default rates will be used for comparison</li>
<li>Rates relatively close to one another</li>
</ul>

|             | no.lab| default| no.default| default.rate|
|:------------|------:|-------:|----------:|------------:|
|Mid-Atlantic |    141|   29375|     112970|         20.6|
|Midwest      |    143|   33263|     178018|         15.7|
|Northeast    |   1396|   19692|     112391|         14.9|
|Southeast    |     99|   30791|     107414|         22.3|
|Southwest    |     57|   10167|      75237|         11.9|
|Unknown      |     54|     443|       5236|          7.8|
|West         |    107|   33827|     148343|         18.6|

---

## Region-specific Default Rates for Differing Samples

<ul style="font-size: 1.6em; line-height: 1.6;">
<li>Little to no variation between methods</li>
<li>Almost equal to the population default rates</li>
</ul>

|             | default.rate.pop| default.rate.srs| default.rate.sys| default.rate.str| default.rate.cluster|
|:------------|----------------:|----------------:|----------------:|----------------:|--------------------:|
|Mid-Atlantic |             20.6|             21.0|             20.6|             20.8|                 21.7|
|Midwest      |             15.7|             15.6|             15.3|             16.2|                 15.1|
|Northeast    |             14.9|             14.3|             14.7|             15.2|                 15.6|
|Southeast    |             22.3|             22.4|             22.6|             22.3|                 21.5|
|Southwest    |             11.9|             11.9|             12.7|             13.4|                 12.9|
|West         |             18.6|             18.5|             19.0|             18.6|                 20.6|

---

## Visualization

<img src="Bank-Loan-Draft-2_files/figure-html/unnamed-chunk-20-1.png" width="100%" />

---

<div style="display: flex; justify-content: center; align-items: center; height: 80vh; font-size: 50px;">
Conclusion
</div>

---

## Conclusion

<ul style="font-size: 2.0em; line-height: 1.8;">
<li> Stratified/Systematic sampling most accurate </li>
<li> Representative selection across methods </li>
<li> Default rates under the different sampling plans are similar to the population default rates </li>
<li> A large sample likely reduces random variation </li>
</ul>

---

## Limitations

<ul style="font-size: 2.0em; line-height: 1.8;">
<li> Economic factors differ by region </li>
<li> Major economic shifts occurred over the 30 years covered by the dataset </li>
<li> Densely populated areas could inflate rates for the whole region </li>
</ul>

---
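
## Appendix: Sampling Sketch

The four sampling plans compared in this deck can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the authors' code: the function names, the seed, the use of row indices in place of loan records, the jump size `N // n`, the proportional allocation rule, and the toy cluster labels are all hypothetical choices made for the example. Only the region sizes and the sample size of 32,724 are taken from the slides.

```python
import random

# Region sizes from the study-population table (Unknown removed).
STRATA = {
    "Mid-Atlantic": 142486, "Midwest": 211424, "Northeast": 133479,
    "Southeast": 138304, "Southwest": 85461, "West": 182277,
}
N = sum(STRATA.values())  # study population size
n = 32724                 # simple random sample size reported on the slides

def srs_indices(pop_size, size, rng):
    """Simple random sample: draw `size` distinct row indices without replacement."""
    return rng.sample(range(pop_size), size)

def systematic_indices(pop_size, jump, rng):
    """Systematic sample: random start in [0, jump), then every `jump`-th row."""
    start = rng.randrange(jump)
    return list(range(start, pop_size, jump))

def proportional_allocation(strata, size):
    """Stratified sample sizes proportional to each stratum's population share."""
    total = sum(strata.values())
    return {name: round(size * count / total) for name, count in strata.items()}

def cluster_indices(cluster_of_row, n_clusters, rng):
    """Cluster sample: pick whole clusters at random and keep every row in them."""
    chosen = set(rng.sample(sorted(set(cluster_of_row)), n_clusters))
    return [i for i, c in enumerate(cluster_of_row) if c in chosen]

rng = random.Random(2025)  # arbitrary fixed seed, for reproducibility only
jump = N // n              # assumed jump size for the systematic sample
srs = srs_indices(N, n, rng)
sys_sample = systematic_indices(N, jump, rng)
alloc = proportional_allocation(STRATA, n)

# Toy cluster labels (hypothetical): 10 rows spread over 5 clusters of 2 rows.
toy_clusters = ["a", "a", "b", "b", "c", "c", "d", "d", "e", "e"]
clu = cluster_indices(toy_clusters, 2, rng)

print(N, jump, len(srs), len(sys_sample))
print(alloc)
print(clu)
```

Under these assumptions, proportional allocation reproduces the stratum sizes shown on the Stratified Sampling slide, and a jump size of 27 yields 33,090 or 33,091 rows depending on the random start, close to the size reported for the systematic sample. The slides do not state the allocation rule or jump size actually used, so this agreement is suggestive rather than confirmed.

---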
<div style="display: flex; justify-content: center; align-items: center; height: 80vh; font-size: 50px;">
Thank You
</div>

---

## Contributors

<ul style="font-size: 1.8em; line-height: 1.8;">
<li> Ryan Lebo - Slides Beginning to Simple Random Sampling </li>
<li> Tyler Battaglini - Slides Systematic Sampling to Conclusion </li>
</ul>