class: center, middle, inverse, title-slide

.title[
# Sampling Presentation
]
.subtitle[
## Data set: bankLoan
]
.author[
### Zack Shin
]
.institute[
### West Chester University of Pennsylvania
]
.date[
### October 20, 2022
Prepared for
STA490: Capstone Statistics
]

---

## I Introduction

We compare the performance of three random sampling plans: simple random sampling (SRS), systematic sampling (SS), and stratified sampling, using a large bank loan data set as the finite population. We do this by categorizing the data by Fiscal Year of Commitment, labelled ApprovalFY, and then taking samples from the population.

The original data set was split into 9 subsets that are stored on GitHub. We first load these subsets into R and then combine them into a single data set.

<img src="data:image/png;base64,#../../../GitHub Stuff/myweb/img/datasettable.png" width="959" height="60%" style="display: block; margin: auto;" />

---

## Stratification Variable

In this analysis, we modify the fiscal year of commitment (ApprovalFY) to define a stratification variable for stratified sampling. ApprovalFY is a 4-digit code, and we use its last two digits as the basis for the stratification variable.

The population contains 52 distinct ApprovalFY years, and 21 of them have fewer than 900 small businesses.

| Population.size| Number.of.years| Sub.Pop.less.900|
|---------------:|---------------:|----------------:|
|          899164|              52|               21|

---

## Description of existing 2-digit Approval year

Next, we explore the frequency distribution of the 2-digit ApprovalFY years and decide which small categories could be combined.

<img src="data:image/png;base64,#../../../GitHub Stuff/myweb/img/2digitkable.png" width="959" style="display: block; margin: auto;" />

From the table above we can see that many of the earliest years (1962 through roughly 1976) have relatively small counts.

---

## Combining Categories

We now combine the 2-digit ApprovalFY codes: the years **1962** through **1989**, inclusive, are combined and renamed **19607080**. We created a string variable **stryear** to represent these modified 2-digit ApprovalFY years.
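The recoding described above can be sketched in a few lines. This is a minimal illustration in Python, not the R code used for the analysis. The function name `stratum_label` is hypothetical, and the sketch assumes ApprovalFY is available as a 4-digit integer.

```python
from collections import Counter


def stratum_label(approval_fy: int) -> str:
    """Map a 4-digit ApprovalFY to a stryear-style stratum label.

    Hypothetical helper: years 1962-1989 are pooled into "19607080";
    every other year keeps its last two digits, zero-padded.
    """
    if 1962 <= approval_fy <= 1989:
        return "19607080"
    return f"{approval_fy % 100:02d}"


# Made-up ApprovalFY values, for illustration only (not taken from the data set).
example_years = [1975, 1988, 1994, 2005, 2005, 2011]
labels = [stratum_label(y) for y in example_years]

# Frequency of each stratum label among the example years.
label_counts = Counter(labels)
print(label_counts)
```

Keeping the labels as strings preserves the leading zero in codes such as 05, which an integer code would lose.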
---

## Study Population

Based on the above frequency distribution of the modified 2-digit ApprovalFY years, we define the **study population** by excluding the small-size categories 12, 13, 14, 90, 91, 92, and 93.

<img src="data:image/png;base64,#../../../GitHub Stuff/myweb/img/populationtable.png" width="959" style="display: block; margin: auto;" />

---

## Loan Default Rates by Year

We now find the loan default rates by approval year, as defined by the stratification variable stryear. The default.rate column is a percentage, equal to default / (default + no.default).

|          | no.lab| default| no.default| default.rate|
|:---------|------:|-------:|----------:|------------:|
|2000      |     29|    4266|      33086|         11.4|
|2001      |     33|    4450|      32867|         11.9|
|2002      |     84|    5187|      39120|         11.7|
|2003      |    193|    8425|      49575|         14.5|
|2004      |     95|   12306|      55889|         18.0|
|2005      |    567|   19479|      57479|         25.3|
|2006      |    284|   26517|      49239|         35.0|
|2007      |    227|   30658|      40991|         42.8|
|2008      |     82|   16250|      23208|         41.2|
|2009      |     23|    3971|      15132|         20.8|
|2010      |     20|    2312|      14516|         13.7|
|2011      |     15|     989|      11604|          7.9|
|2012      |      5|     343|       5649|          5.7|
|2013      |      3|      70|       2385|          2.9|
|2014      |      0|       5|        263|          1.9|
|1962-1989 |     84|    7791|      20088|         27.9|
|1990      |      0|     663|      14196|          4.5|
|1991      |      6|     443|      15217|          2.8|
|1992      |     10|     450|      20425|          2.2|
|1993      |      6|     434|      22865|          1.9|
|1994      |     14|     696|      30888|          2.2|
|1995      |     70|    1296|      44392|          2.8|
|1996      |     91|    1646|      38375|          4.1|
|1997      |     30|    2247|      35471|          6.0|
|1998      |     11|    2970|      33035|          8.2|
|1999      |     15|    3694|      33654|          9.9|

---

# Performance Analysis of Random Samples

Table: Population size, default counts, and population default rates

|         | no.lab| default| no.default| default.rate|
|:--------|------:|-------:|----------:|------------:|
|00       |     29|    4266|      33086|         11.4|
|01       |     33|    4450|      32867|         11.9|
|02       |     84|    5187|      39120|         11.7|
|03       |    193|    8425|      49575|         14.5|
|04       |     95|   12306|      55889|         18.0|
|05       |    567|   19479|      57479|         25.3|
|06       |    284|   26517|      49239|         35.0|
|07       |    227|   30658|      40991|         42.8|
|08       |     82|   16250|      23208|         41.2|
|09       |     23|    3971|      15132|         20.8|
|10       |     20|    2312|      14516|         13.7|
|11       |     15|     989|      11604|          7.9|
|12       |      5|     343|       5649|          5.7|
|13       |      3|      70|       2385|          2.9|
|14       |      0|       5|        263|          1.9|
|19607080 |     84|    7791|      20088|         27.9|
|90       |      0|     663|      14196|          4.5|
|91       |      6|     443|      15217|          2.8|
|92       |     10|     450|      20425|          2.2|
|93       |      6|     434|      22865|          1.9|
|94       |     14|     696|      30888|          2.2|
|95       |     70|    1296|      44392|          2.8|
|96       |     91|    1646|      38375|          4.1|
|97       |     30|    2247|      35471|          6.0|
|98       |     11|    2970|      33035|          8.2|
|99       |     15|    3694|      33654|          9.9|

---

## Year-specific default rates for SRS, systematic, and stratified samples

Table: Comparison of year-specific default rates (%) between the population, the SRS sample, the systematic sample, and the stratified sample.

|          | default.rate.pop| default.rate.srs| default.rate.sys| default.rate.str|
|:---------|----------------:|----------------:|----------------:|----------------:|
|2000      |             11.4|             10.6|             13.9|              9.8|
|2001      |             11.9|              8.4|             14.0|             15.8|
|2002      |             11.7|             16.8|             10.1|             11.5|
|2003      |             14.5|             13.8|             11.5|             14.7|
|2004      |             18.0|             16.8|             18.2|             20.9|
|2005      |             25.3|             26.4|             22.6|             29.6|
|2006      |             35.0|             37.2|             39.5|             38.5|
|2007      |             42.8|             42.9|             46.0|             40.6|
|2008      |             41.2|             43.2|             33.7|             38.5|
|2009      |             20.8|             16.0|             16.7|             22.3|
|2010      |             13.7|             15.9|             15.6|             14.5|
|2011      |              7.9|              5.5|             13.4|              8.1|
|1962-1989 |             27.9|             29.3|             28.2|             24.1|
|1994      |              2.2|              2.0|              3.8|              1.9|
|1995      |              2.8|              3.7|              2.2|              2.2|
|1996      |              4.1|              4.3|              5.5|              4.1|
|1997      |              6.0|              6.6|              5.2|              5.9|
|1998      |              8.2|             10.7|              6.3|              9.0|
|1999      |              9.9|             13.0|              8.6|             10.9|

---

## Visualization

<img src="data:image/png;base64,#SamplingPresYuh_files/figure-html/visualization-1.png" style="display: block; margin: auto;" />

Table: Approval years and the corresponding codes

|year.adj.name |year.code |
|:-------------|:---------|
|2000          |00        |
|2001          |01        |
|2002          |02        |
|2003          |03        |
|2004          |04        |
|2005          |05        |
|2006          |06        |
|2007          |07        |
|2008          |08        |
|2009          |09        |
|2010          |10        |
|2011          |11        |
|1962-1989     |19607080  |
|1994          |94        |
|1995          |95        |
|1996          |96        |
|1997          |97        |
|1998          |98        |
|1999          |99        |
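
---

## Sketch of the Three Sampling Plans

The three sampling plans compared above can be illustrated with a small sketch. It is written in Python rather than the R used for the analysis, and it runs on a **synthetic** population: the stratum sizes, default probabilities, sample size, and all function names are assumptions made for illustration, not values or code from the bank loan data.

```python
import random
from collections import defaultdict


def srs(population, n, rng):
    """Simple random sample of n records, without replacement."""
    return rng.sample(population, n)


def systematic(population, n, rng):
    """Systematic sample: random start, then every k-th record."""
    k = len(population) // n          # jump size
    start = rng.randrange(k)          # random start in [0, k)
    return [population[start + i * k] for i in range(n)]


def stratified(population, n, rng):
    """Proportional-allocation stratified sample; stratum is record[0]."""
    strata = defaultdict(list)
    for record in population:
        strata[record[0]].append(record)
    sample = []
    for members in strata.values():
        n_h = max(1, round(n * len(members) / len(population)))
        sample.extend(rng.sample(members, min(n_h, len(members))))
    return sample


def default_rates(records):
    """Default rate (%) per stratum for (stratum, defaulted) records."""
    counts = defaultdict(lambda: [0, 0])   # stratum -> [defaults, total]
    for stratum, defaulted in records:
        counts[stratum][0] += defaulted
        counts[stratum][1] += 1
    return {s: round(100 * d / t, 1) for s, (d, t) in counts.items()}


# Synthetic population: (stratum label, size, assumed default probability).
rng = random.Random(2022)
design = [("05", 3000, 0.25), ("07", 3000, 0.43), ("95", 2000, 0.03)]
population = [
    (label, int(rng.random() < p))
    for label, size, p in design
    for _ in range(size)
]

n = 400
samples = {
    "pop": population,
    "srs": srs(population, n, rng),
    "sys": systematic(population, n, rng),
    "str": stratified(population, n, rng),
}
for name, records in samples.items():
    print(name, default_rates(records))
```

Because the synthetic population is ordered by stratum, the systematic sample here behaves much like a proportionally stratified one. With a differently ordered population, the systematic plan could behave differently.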