Ass4-presentation.knit

October 25, 2022

Research Question

- What is the best sampling method for this data set?

- Simple Random Sampling    
- Systematic Random Sampling    
- Stratified Sampling

The Sampling Methods

Simple Random Sampling
- a subset of a larger population in which every individual is chosen at random with the same probability

Systematic Random Sampling
- a probability sampling method in which members of a larger population are selected at a fixed periodic interval, beginning from a random starting point

Stratified Sampling
- divides the population into smaller subgroups, known as strata, based on members’ shared attributes or characteristics; each stratum is then randomly sampled independently

The Data

899,164 observations
27 variables
Provides information on loans that were guaranteed to some degree by the SBA between 1987 and 2014
The strata are defined by the stratification variable “State”, after the states are combined into categories by region
“State” was chosen because geographical location may affect other variables, such as default rate

The Data

Checking whether any category of the variable “State” has a small number of observations

Population.size	Number.of.Regions	Sub.Pop.less.1000
899164	7	0
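
The check summarized in the table above can be sketched as follows. This is a minimal illustration in Python rather than the R used for the analysis; the state-to-region mapping and the toy records are hypothetical, and only the 1,000-observation threshold comes from the table header.

```python
from collections import Counter

# Hypothetical state-to-region mapping (illustrative only; the actual
# grouping used in the analysis is not shown in the slides).
STATE_TO_REGION = {
    "NY": "MidAtlantic", "NJ": "MidAtlantic",
    "OH": "Midwest", "IL": "Midwest",
    "CA": "West", "NV": "West",
}

def small_strata(states, threshold=1000):
    """Return the regions whose observation count falls below the threshold."""
    counts = Counter(STATE_TO_REGION[s] for s in states)
    return {region: n for region, n in counts.items() if n < threshold}

# Toy data: 1,500 MidAtlantic, 1,200 Midwest, and only 300 West records.
states = ["NY"] * 1500 + ["OH"] * 1200 + ["CA"] * 300
flagged = small_strata(states)
print(flagged)
```

An empty result corresponds to the 0 in the Sub.Pop.less.1000 column: no region is too small to sample from.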

Simple Random Sampling

The simple random sample was drawn using the R random number generator
The table shows the number of observations and the number of variables in the sample
Checking that the number of variables is at most 5% of the number of observations

Size	Var.count
4000	28
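
A rough sketch of this step, written in Python as a stand-in for the R workflow: it draws row indices without replacement and then applies the 5% check. The seed is arbitrary and the row indices are placeholders for the actual loan records.

```python
import random

N_POPULATION = 899_164   # observations in the full data set (from the slides)
N_SAMPLE = 4_000         # simple random sample size (from the table)
N_VARIABLES = 28         # variables in the sample (from the table)

rng = random.Random(2022)  # arbitrary seed, chosen only for reproducibility

# Draw row indices without replacement; every row is equally likely.
srs_rows = rng.sample(range(N_POPULATION), N_SAMPLE)

# Check from the slide: variables should be at most 5% of observations.
ratio = N_VARIABLES / len(srs_rows)
within_limit = ratio <= 0.05
print(len(srs_rows), f"{ratio:.2%}", within_limit)
```

With 28 variables and 4,000 observations the ratio is 0.7%, well under the 5% limit.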

Systematic Random Sampling

Size	Var.count
4014	28
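
The slide gives only the resulting sample size. One common way to draw a systematic sample is sketched below in Python; the target of 4,000 and the jump size floor(N / target) are assumptions, not details taken from the original R analysis.

```python
import random

N_POPULATION = 899_164  # observations in the full data set (from the slides)
TARGET = 4_000          # assumed target size, matching the simple random sample

def systematic_rows(n_population, target, rng):
    """Pick every k-th row index after a random start in the first interval."""
    k = n_population // target   # jump size, rounded down
    start = rng.randrange(k)     # random start in 0 .. k-1
    return list(range(start, n_population, k)), k

rng = random.Random(2022)  # arbitrary seed for reproducibility
rows, k = systematic_rows(N_POPULATION, TARGET, rng)
print(k, len(rows))
```

Under these assumptions the jump size is 224. Because 899,164 is not an exact multiple of 224, the sample contains 4,014 or 4,015 rows depending on the random start, which is consistent with the 4,014 reported in the table.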

Stratified Sampling

Taking a share of observations from each region category, using the frequency function in R
The shares are all equal: each region contributes 571 observations

MidAtlantic	Midwest	Northeast	Northwest	Southeast	Southwest	West
571	571	571	571	571	571	571
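
A minimal sketch of equal allocation across the seven regions, in Python rather than R. It assumes the per-region count comes from dividing an overall target of 4,000 by 7 and rounding down; the population here is synthetic.

```python
import random
from collections import defaultdict

REGIONS = ["MidAtlantic", "Midwest", "Northeast", "Northwest",
           "Southeast", "Southwest", "West"]
PER_STRATUM = 4_000 // len(REGIONS)  # 571, assuming an overall target of 4,000

def stratified_sample(records, key, per_stratum, rng):
    """Group records by key(record), then draw the same number from each stratum."""
    strata = defaultdict(list)
    for rec in records:
        strata[key(rec)].append(rec)
    return {name: rng.sample(members, per_stratum)
            for name, members in strata.items()}

# Synthetic population: (row_id, region) pairs, 2,000 rows per region.
population = [(i, REGIONS[i % len(REGIONS)]) for i in range(14_000)]
rng = random.Random(2022)  # arbitrary seed for reproducibility
sample = stratified_sample(population, key=lambda r: r[1],
                           per_stratum=PER_STRATUM, rng=rng)
sizes = {name: len(rows) for name, rows in sample.items()}
print(sizes)
```

Seven strata of 571 give 3,997 observations in total, slightly under 4,000 because of the rounding.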

The Results

The Results in a Graph

The Conclusion

After creating three samples with three different sampling methods and stratifying the data by “State”, a table was created from the default rates extracted from the samples
Together, the table and graph support the idea that the systematic sampling method best represents the original population
With the best sampling method identified, further analysis can explore trends in the bank loan data
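
One way to make this comparison concrete is sketched below in Python. It assumes the criterion is how close each sample's default rate is to the population's, which the slides do not state explicitly. The flags and method names are made up for illustration and are not results from the SBA data.

```python
def default_rate(flags):
    """Share of loans flagged as defaulted (1) among all loans in `flags`."""
    return sum(flags) / len(flags)

def closest_method(population_rate, sample_rates):
    """Name of the sample whose default rate is nearest the population rate."""
    return min(sample_rates,
               key=lambda m: abs(sample_rates[m] - population_rate))

# Made-up default flags, NOT results from the SBA data.
population = [1, 0, 0, 0, 1, 0, 0, 0, 0, 0]   # rate 0.20
samples = {
    "method_a": [1, 1, 0, 0, 0],               # rate 0.40
    "method_b": [1, 0, 0, 0, 0],               # rate 0.20
    "method_c": [0, 0, 0, 0, 0],               # rate 0.00
}
rates = {name: default_rate(flags) for name, flags in samples.items()}
best = closest_method(default_rate(population), rates)
print(rates, best)
```

The same comparison could be repeated within each region to see whether a method tracks the population consistently across strata.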