class: center, middle, inverse, title-slide

.title[
# Comparing Potential Sampling Plans
]
.author[
### Kyle Weber and Ian Vanwright
]
.institute[
### Prepared For: STA 490
]
.date[
### West Chester University of Pennsylvania
Slides available at:
https://rpubs.com/IV246
AND
https://github.com/Kyle-Weber/STA490
]

---
class: top, center

## Table of Contents:

.left[
- Data Cleaning
- Combining Categories
- Default Rate
- Analyzing Four Sampling Plans
- Comparing Default Rates
- Default Rate Visualization
- Conclusion
]

---
class: top, center

## Data Cleaning

.left[
- The original data set contained 899,150 observations split across 9 files
- All 9 data files were combined into 1 data set when loaded
- Removed observations with missing values for MIS_Status or State
- Converted currency values to numeric
- 897,153 observations remain
]

---
class: top, center

## Combining Categories

.left[
- Variable "State" combined into 5 regions
- Regions represent geographical areas
- Frequency table of regions
- Southwest (SW) has the lowest frequency
]

|Region | Frequency|
|:------|---------:|
|MW     |    202581|
|NE     |    202423|
|SE     |    140841|
|SW     |     90202|
|W      |    263103|

---

## Calculating Default Rates

.left[
- Original default rates for the full data set
- Provide a baseline for comparing the sampling plans
- Southeast (SE) has the highest rate
]

<!--If need time can talk about why they are important-->

|Region | Default Rate|
|:------|------------:|
|MW     |    0.1583021|
|NE     |    0.1611872|
|SE     |    0.2141990|
|SW     |    0.1934325|
|W      |    0.1719593|

---
class: top, center

## Simple Random Sampling Process

.left[
- Random sample of 4000 <!-- why 4000 less than 5% of original data set -->
- Taken from the cleaned data set
- Sample size and default rate tables
]

| Size| Var.count|
|----:|---------:|
| 4000|        28|

|Region | Sample_Default_Rate|
|:------|-------------------:|
|SE     |           0.2083333|
|W      |           0.1719418|
|MW     |           0.1455982|
|SW     |           0.2198391|
|NE     |           0.1445148|

---
class: top, center

## Systematic Sample

.left[
- Jump size set for a target sample of 4000 observations
- Selects observations at fixed intervals
- Representative of the entire population
- Rounding the jump size gives 4014 observations rather than 4000
]

| Size| Var.count|
|----:|---------:|
| 4014|        28|

|Region | Sample_Default_Rate|
|:------|-------------------:|
|SE     |           0.1855346|
|NE     |           0.1571429|
|W      |           0.1692177|
|MW     |           0.1399317|
|SW     |           0.1985472|

---
class: top, center

## Stratified Sampling

.left[
- Stratum sizes by region
]

|  MW|  NE|  SE|  SW|    W|
|---:|---:|---:|---:|----:|
| 901| 901| 627| 401| 1170|

|Region | Sample_Default_Rate|
|:------|-------------------:|
|MW     |           0.1642619|
|NE     |           0.1653718|
|SE     |           0.2216906|
|SW     |           0.1695761|
|W      |           0.1632479|

---
class: top, center

## Cluster Sampling

.left[
- Zip code used to define clusters
- A cluster sample of Zip codes is taken; its size and variable count are shown below
]

| Size| Var.count|
|----:|---------:|
|  283|        28|

|Region | Default Rate|
|:------|------------:|
|MW     |    0.0714286|
|NE     |    0.1931818|
|SE     |    0.2575758|
|SW     |    0.0000000|
|W      |    0.2127660|

---
class: top, center

## Default Rate Comparison

.left[
- Comparison of default rates across regions
- Sample default rate vs. subpopulation default rate
- Systematic or stratified
]

<div class="figure">
<p class="caption">Summary of inferential statistics of the full model</p>
</div>

---

## Comparison Between Sampling Plans

- Default rate for each region, by sampling plan

<img src="490Grp2.Rmd_files/figure-html/interactive-comparison-1.png" width="120%" />

---

## Visualization for final choice

<img src="490Grp2.Rmd_files/figure-html/interactive sample defaul-1.png" width="120%" />

---

<img src="490Grp2.Rmd_files/figure-html/interactive sample default-1.png" width="120%" />

---

## Visualization for final choice

<img src="490Grp2.Rmd_files/figure-html/interactive-sample default 2-1.png" width="130%" />

---

## Conclusion

- Why this is important
- Uses
- Questions
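
---

## Appendix: Stratum Size Sketch

The stratum sizes on the Stratified Sampling slide are consistent with splitting a 4000-observation sample across regions in proportion to the region frequencies. The slides do not state the allocation rule, so the following is only a minimal Python sketch under that assumption (proportional allocation, rounded to the nearest integer); the function name `proportional_allocation` is hypothetical and not taken from the original analysis.

```python
# Region frequencies copied from the "Combining Categories" slide.
REGION_COUNTS = {"MW": 202581, "NE": 202423, "SE": 140841, "SW": 90202, "W": 263103}


def proportional_allocation(counts, sample_size):
    """Split sample_size across strata in proportion to stratum frequency.

    Assumption: each share is rounded to the nearest integer, so in general
    the allocations are not guaranteed to sum exactly to sample_size.
    """
    total = sum(counts.values())
    return {name: round(sample_size * n / total) for name, n in counts.items()}


strata_sizes = proportional_allocation(REGION_COUNTS, 4000)
print(strata_sizes)
# {'MW': 901, 'NE': 901, 'SE': 627, 'SW': 401, 'W': 1170}
```

With these frequencies the rounded allocations match the stratum sizes shown on the Stratified Sampling slide and happen to sum to exactly 4000.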