R Markdown

This is an R Markdown document. Markdown is a simple formatting syntax for authoring HTML, PDF, and MS Word documents. For more details on using R Markdown see http://rmarkdown.rstudio.com.

When you click the Knit button a document will be generated that includes both content as well as the output of any embedded R code chunks within the document. You can embed an R code chunk like this:

summary(cars)
##      speed           dist       
##  Min.   : 4.0   Min.   :  2.00  
##  1st Qu.:12.0   1st Qu.: 26.00  
##  Median :15.0   Median : 36.00  
##  Mean   :15.4   Mean   : 42.98  
##  3rd Qu.:19.0   3rd Qu.: 56.00  
##  Max.   :25.0   Max.   :120.00


Define the following terms and, where appropriate, compare with the alternative method.

1. EDA

Exploratory Data Analysis (EDA) is the first step in the data exploration process, or at least it should be. The idea behind EDA is to look for trends, patterns, and anomalies. This is done by creating figures and tables (i.e., visualizations) and computing simple statistics (e.g., correlations). The results can then be used to formulate and test early hypotheses.

The 5 core activities of the data analysis process are:
a. State/refine the question
b. Explore the data
c. Build formal statistical models
d. Interpret the results
e. Communicate the results

Some books list more steps in the data analysis process by splitting (b), data exploration, into two or three steps (data collection, data cleaning, and/or processing). All of this is done before building any formal statistical models.

An easy way to start in R is to load the DataExplorer package and call create_report(). The report will help you:
A. Determine whether there are any problems with your data set (such as missing values/NAs, extraneous entries, etc.)
B. Decide whether or not your question can be answered with the data set
C. Produce preliminary visualizations for your hypothesis testing

Commonly used graphs are scatterplots, boxplots, histograms (which help show skewness, spread, and central tendency), and heat maps. It’s also very common to run a quartile table (for example with summary() or quantile()) to see the spread of your data and to use an outlier function in R (for example outlier() from the outliers package, or boxplot.stats() in base R) to pinpoint outliers. It’s a good idea to make sure that an outlier is not the result of a clerical error, such as entering a person’s height as 650 inches instead of 65 inches.

2. One-Way ANOVA versus Two-Way ANOVA

ANOVA is an abbreviation for Analysis of Variance. It is an inference procedure that compares the means of several groups by weighing the variance between the group means against the variance within the individual groups. The null hypothesis (H0) is that all the group means are equal. The alternative hypothesis (H1) is that not all means are equal, i.e., at least one differs. A one-way ANOVA determines how one factor or treatment affects a response variable in your population (e.g., the effect of number of study hours on a student’s grade for 3 different sections of the same class).

However, ANOVA cannot be used on every experiment. For an accurate analysis:
a. Samples must be independent of each other
b. The data in each group must be approximately normally distributed
c. Standard deviations must be equal across the groups

Table 1. One-way ANOVA table

SSR: regression sum of squares
SSE: error sum of squares
SST: total sum of squares
df: degrees of freedom
k: total number of sample groups
N: total number of observations
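As a rough illustration of how the quantities in this legend fit together, the sketch below runs a one-way ANOVA in R. It uses the built-in PlantGrowth data set purely as a stand-in (it is not the data analyzed in this report), and the variable names (`fit`, `tab`, `F_manual`) are chosen here for illustration.

```r
# One-way ANOVA sketch on the built-in PlantGrowth data
# (an illustrative stand-in, not the data used in this report).
data(PlantGrowth)

fit <- aov(weight ~ group, data = PlantGrowth)
tab <- summary(fit)[[1]]          # the ANOVA table as a data frame
print(tab)

k <- nlevels(PlantGrowth$group)   # total number of sample groups
N <- nrow(PlantGrowth)            # total number of observations

SSR <- tab[["Sum Sq"]][1]         # between-group ("regression") sum of squares
SSE <- tab[["Sum Sq"]][2]         # within-group (error) sum of squares
SST <- sum((PlantGrowth$weight - mean(PlantGrowth$weight))^2)  # total sum of squares

df_between <- tab[["Df"]][1]      # should equal k - 1
df_within  <- tab[["Df"]][2]      # should equal N - k

# F statistic rebuilt by hand from the sums of squares and degrees of freedom
F_manual <- (SSR / (k - 1)) / (SSE / (N - k))
```

The hand-built `F_manual` should agree with the "F value" column of `tab`, and `SSR + SSE` should reproduce `SST`, which is the decomposition a one-way ANOVA table summarizes.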

Use a two-way ANOVA to test more complex experiments: it models 2 independent variables (factors) against one dependent variable, and it can also test the interaction between the two factors. Alternatives to ANOVA include the t-test (for two groups), generalized linear models, Bayesian analysis, mixed-effects models, random forests, permutation tests, and the Kruskal-Wallis test.

3. The method of least squares for regression

The method of least squares fits a regression model by choosing the coefficients that minimize the sum of the squared differences between each observed value and the value the model predicts for it. Averaging these squared differences gives the mean squared error (MSE), a measure of the error in the model. The goal is to minimize the sum of all these “gaps” between your model and the observed data; the model with the smallest sum is the one that best fits your data.
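The idea of minimizing the squared "gaps" can be sketched with the built-in cars data set summarized at the top of this document. This is only a minimal illustration: the model `dist ~ speed` and the names `res`, `sse`, `mse`, and `sse_other` are assumptions made for the example.

```r
# Least squares sketch: stopping distance as a linear function of speed.
fit <- lm(dist ~ speed, data = cars)

res <- cars$dist - fitted(fit)   # gaps between observed and predicted values
sse <- sum(res^2)                # sum of squared errors (what lm minimizes)
mse <- mean(res^2)               # mean squared error (divides by n, not n - 2)

# Closed-form least squares solution for a single predictor
slope     <- cov(cars$speed, cars$dist) / var(cars$speed)
intercept <- mean(cars$dist) - slope * mean(cars$speed)

# Any other line, here one with a slightly steeper slope, has a larger SSE
sse_other <- sum((cars$dist - (intercept + (slope + 0.5) * cars$speed))^2)

print(c(intercept = intercept, slope = slope, SSE = sse, MSE = mse))
```

The hand-computed `intercept` and `slope` should match `coef(fit)`, and `sse_other` exceeding `sse` shows in a small way why the fitted line is the "best fit" under this criterion.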

Figure 1. Mean Squared Error of your model

4. Assumptions of a linear model and how to check them

a. Linearity – the relationship is linear
   A. Check residual plots
b. Independence – observations are independent of each other
   A. Plot residuals against time
   B. Make a correlation plot and look for strong correlations among the factors. Zero correlation does not automatically imply independence, but it can help guide you in the right direction.
c. Homoscedasticity – the variance of the residuals is relatively constant
   A. Residuals vs. Fitted plot
   B. Compare the variance between the groups in the population
d. Normality – the residuals are normally distributed
   A. Q-Q plot
   B. Transform the dependent variable if the residuals are not normal
e. No multicollinearity – predictors are not strongly correlated with each other
   A. Variance Inflation Factor (VIF)
   B. ANOVA with interaction terms
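Several of the checks listed above can be sketched in base R. The example below is an assumption-laden illustration: it fits a model to the built-in mtcars data (not this report's data), the helper `diagnostic_plots()` is a hypothetical name introduced here, and the VIF is computed by hand from its definition rather than with a dedicated package.

```r
# Illustrative model on the built-in mtcars data (a stand-in data set).
fit <- lm(mpg ~ wt + hp + disp, data = mtcars)

# Hypothetical helper: draws the two standard diagnostic plots.
diagnostic_plots <- function(model) {
  plot(model, which = 1)   # Residuals vs. Fitted: linearity, homoscedasticity
  plot(model, which = 2)   # Normal Q-Q plot: normality of the residuals
}

# Numerical companion to the Q-Q plot: Shapiro-Wilk test on the residuals
sw <- shapiro.test(residuals(fit))

# VIF from its definition: regress each predictor on the others,
# then VIF = 1 / (1 - R^2). Larger values suggest multicollinearity.
predictors <- c("wt", "hp", "disp")
vif <- sapply(predictors, function(p) {
  others <- setdiff(predictors, p)
  r2 <- summary(lm(reformulate(others, response = p), data = mtcars))$r.squared
  1 / (1 - r2)
})

print(round(vif, 2))
print(sw$p.value)
```

Calling `diagnostic_plots(fit)` in an interactive session or a knitted chunk draws the two plots. The performance package loaded later in this document also provides `check_model()`, which bundles similar visual checks into one call.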


library(tidyverse)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr     1.1.4     ✔ readr     2.1.5
## ✔ forcats   1.0.0     ✔ stringr   1.5.1
## ✔ ggplot2   3.5.1     ✔ tibble    3.2.1
## ✔ lubridate 1.9.3     ✔ tidyr     1.3.1
## ✔ purrr     1.0.2     
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(DataExplorer)
library(ggplot2)
library(gtsummary)
library(patchwork)
library(data.table)
## 
## Attaching package: 'data.table'
## 
## The following objects are masked from 'package:lubridate':
## 
##     hour, isoweek, mday, minute, month, quarter, second, wday, week,
##     yday, year
## 
## The following objects are masked from 'package:dplyr':
## 
##     between, first, last
## 
## The following object is masked from 'package:purrr':
## 
##     transpose
pacman::p_load("pROC")
library(performance)
library(ggeffects)
library(sjPlot)
library(gtsummary)
pacman::p_load("equatiomatic")
pacman::p_load("vip")
library(mgcv)
## Loading required package: nlme
## 
## Attaching package: 'nlme'
## 
## The following object is masked from 'package:dplyr':
## 
##     collapse
## 
## This is mgcv 1.9-1. For overview type 'help("mgcv-package")'.
library(cowplot)
## 
## Attaching package: 'cowplot'
## 
## The following objects are masked from 'package:sjPlot':
## 
##     plot_grid, save_plot
## 
## The following object is masked from 'package:ggeffects':
## 
##     get_title
## 
## The following object is masked from 'package:patchwork':
## 
##     align_plots
## 
## The following object is masked from 'package:lubridate':
## 
##     stamp
library(randomForest)
## randomForest 4.7-1.2
## Type rfNews() to see new features/changes/bug fixes.
## 
## Attaching package: 'randomForest'
## 
## The following object is masked from 'package:dplyr':
## 
##     combine
## 
## The following object is masked from 'package:ggplot2':
## 
##     margin
library(ISLR)
pacman::p_load(tidycensus)

Data Cleaning

The goal of this section is to explore the data set and get it ready for analysis. There are no missing values in the data set, but there are some incorrect entries that must be identified and removed before completing the analysis. Age can be regarded as quantitative, and any value less than 18 is invalid. Length of residence (LenRes) ranges from zero to a person’s age, so LenRes should not be higher than Age. Income is coded as an ordinal value ranging from 1 to 12; it’s left to you to decide whether it should be treated as continuous or categorical.

You should create a simple 1-2 paragraph summary of this section. Be sure to fully explain the reasoning behind transforming any columns and removing any rows. Simply saying “Campbell told me to” is not sufficient. Justify why it makes sense to exclude rows whose age is less than 18 and rows in which length of residence is larger than age.
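The two validity rules described above can be sketched as follows. To stay self-contained, the example builds a small hypothetical data frame `toy` that copies a few rows' worth of values from the catalog data shown below; the flag columns `bad_age` and `bad_lenres` and the ordered-factor column `IncomeOrd` are names invented for this illustration, and treating Income as an ordered factor is just one of the two options the prompt leaves open.

```r
# Hypothetical mini data set mirroring a few rows of the catalog data.
toy <- data.frame(
  SpendRat = c(11.83, 16.83, 230.89, 9.92, 11.38),
  Age      = c(0, 35, 20, 31, 46),
  LenRes   = c(2, 3, 21, 35, 9),
  Income   = c(3, 5, 5, 1, 5)
)

# Flag the two kinds of invalid rows before dropping anything,
# so the number of removals can be reported and justified.
toy$bad_age    <- toy$Age < 18          # catalog customers must be adults
toy$bad_lenres <- toy$LenRes > toy$Age  # cannot live somewhere longer than you have lived

print(colSums(toy[c("bad_age", "bad_lenres")]))

# Keep only rows that pass both rules
clean <- subset(toy, !bad_age & !bad_lenres,
                select = c(SpendRat, Age, LenRes, Income))

# One option for Income: an ordered factor over the 12 coded levels
clean$IncomeOrd <- factor(clean$Income, levels = 1:12, ordered = TRUE)

print(clean)
```

Applied to the full data, the same filter would be written against `d1` instead of `toy`. Counting the flagged rows first gives concrete numbers to cite in the written summary.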

d1 <- read.csv("catalog.csv")
d1
##     SpendRat Age LenRes Income TotAsset SecAssets ShortLiq LongLiq WlthIdx
## 1      11.83   0      2      3      122        27      225     422     286
## 2      16.83  35      3      5      195        36      220     420     430
## 3      11.38  46      9      5      123        24      200     420     290
## 4      31.33  41      2      2      117        25      222     419     279
## 5       1.90  46      7      9      493       105      310     500     520
## 6      84.13  46     15      5      138        27      340     450     440
## 7       2.15  46     16      4      162        25      230     430     360
## 8      38.00  56     31      6      117        27      300     440     400
## 9     136.28  48      8      5      119        23      250     430     360
## 10     61.46  54      8      5       50        10      200     420     230
## 11      2.73  43     15      4      135        21      230     430     340
## 12      5.86  66      8      4       81        19      250     430     320
## 13    113.00  61      8      5      999       245      999     720     880
## 14      8.40  77      8      3      198        45      210     430     340
## 15     31.67   0      8      6      243        55      220     450     370
## 16     19.00  50      7      4      412        95      290     470     490
## 17     70.85  49     19      5      192        37      200     440     330
## 18     20.17  76     12      5       64        15      190     410     280
## 19      0.59  49      2      7      229        44      270     450     430
## 20     19.48  63      7      5       95        22      270     440     360
## 21     86.20  67     32      5       87        20      250     430     330
## 22     25.46  70     36      5      511        99      270     540     500
## 23     76.82  77     15      4      242        56      300     470     450
## 24      0.77  71      9      5       43        10      200     420     210
## 25    335.94  58      6      6       88        20      220     430     290
## 26     18.78  48      3      5      534        94      240     520     460
## 27    205.07  77      9      6      131        30      205     433     295
## 28    216.57  66      9      7       85        20      220     420     300
## 29     77.21  21      3      4      206        12      280     460     420
## 30      0.26  89     11      5       57        13      230     420     280
## 31     92.53  81     37      6      999       209      280     670     530
## 32     46.55  56     19      3      143        23      220     410     380
## 33      6.86  68     20      5      338        78      280     480     490
## 34     18.14  40      7      5      306        47      280     430     490
## 35    120.50  78      3      4      285        55      290     440     470
## 36     37.08  69      7      3      100        23      200     420     300
## 37     19.20  46     27      4       52        12      210     420     250
## 38      3.35   0      2      6      112        26      310     440     400
## 39     23.08  74     25      3       84        19      260     430     330
## 40    165.13  54      9      4      164        31      225     433     333
## 41      6.50  70      9      4      109        18      194     414     265
## 42      2.31   0      2      5      132        28      227     424     303
## 43      8.29  73      3      4       73        17      250     430     310
## 44      3.95  53      8      7      127        25      240     430     350
## 45     14.88  42      7      6      155        31      229     427     328
## 46    125.32  64     25      6      196        32      240     420     430
## 47     39.43  50      5      7      353        68      240     440     440
## 48     39.81  37     10      5      216        33      280     440     440
## 49    230.89  20     21      5      564       109      250     520     490
## 50      3.61  46     16      5      244        47      220     440     390
## 51     37.78  57     12      4      277        64      230     440     410
## 52      9.23  54      3      7      210        36      210     410     390
## 53     44.00  47     17      5      315        61      280     450     480
## 54     86.96  43     15      5       45         7      230     420     270
## 55      2.37  55      7      6      127        29      216     424     292
## 56      4.81  56     15      6      228        53      260     460     410
## 57      0.16  57     10      6      139        32      310     450     420
## 58     99.10  36      2      5       51         8      230     410     300
## 59     11.85  38      8      7      260        46      260     440     450
## 60     12.80  38      7      5      195        30      230     430     370
## 61     21.00  55     19      6      145        28      190     410     360
## 62     17.53  59      8      5      243        56      330     480     480
## 63     72.91  55     22      6      140        27      280     450     390
## 64     47.18   0      8      4      397        75      240     480     430
## 65    113.71  49     19      4      129        25      200     420     290
## 66     10.38  52      7      6       98        19      202     417     252
## 67      9.36  41      9      4      170        26      190     410     380
## 68    304.61  49      8      4      267        52      190     410     520
## 69      6.76   0      4      7      258        50      230     470     410
## 70     41.63  52     18      4      188        35      230     430     380
## 71      1.37  41      8      3      196        33      220     420     360
## 72     46.08   0      2      5      130        25      220     423     297
## 73     24.12  34     13      3      183        42      220     420     400
## 74     33.15  32      3      5       32         2      220     410     260
## 75      9.80   0      5      5       81        16      194     411     230
## 76     22.44  78     40      4       38         9      210     420     230
## 77     11.79  44      8      3      237        18      200     420     350
## 78      1.36  81      8      2       92        21      240     430     340
## 79      0.71  86     42      2       90        21      280     440     360
## 80     39.69  76      8      5      107        34      206     431     274
## 81      4.33  79     18      4       77        18      220     430     300
## 82      8.39  84     22      2       91        21      260     440     340
## 83     10.80  49     14      2      117        23      230     430     320
## 84      5.80  55     16      4      169        35      238     432     347
## 85      1.33  70     17      3       94        23      190     420     254
## 86      4.13  43     16      2      139        24      230     430     340
## 87     17.94  85     42      1      123        32      207     432     289
## 88      3.23  56     25      2      126        29      260     430     380
## 89     45.54  42      8      3      168        26      240     420     400
## 90      6.77  45      9      3       40         6      160     400     260
## 91      4.53  55      7      2      143        28      216     428     311
## 92     13.50  62     14      2       77        18      260     430     330
## 93    129.72  40      9      2       58         9      270     420     320
## 94     19.27  44     17      4      180        27      240     440     370
## 95    153.67  71     38      4       69        16      200     420     260
## 96     61.90  61     31      4      209        48      290     440     490
## 97    275.91  79     32      2       63        15      230     420     290
## 98      5.60  51     13      4      222        36      260     440     450
## 99    401.42  52     24      5      260        50      270     470     440
## 100    15.00  35      8      4      141         8      250     420     370
## 101     3.67  62      6      4      184        42      310     470     440
## 102     7.60  30      7      4       20         1      220     400     200
## 103     5.56  54     35      4      273        51      260     460     440
## 104     9.58   0      4      6      170        33      270     440     410
## 105    16.69  60     14      5      999       999      190     999     540
## 106     5.92  60     16      5       74        17      180     410     240
## 107    39.73  41     34      5       62         9      270     420     320
## 108     2.22  45      7      2      119        18      240     420     360
## 109     1.33  73      9      5      307        47      220     460     390
## 110     4.65  35     11      8      166         9      220     410     380
## 111    31.65   0      9      6       92        21      240     430     320
## 112     6.13  41     11      5      218        33      210     410     430
## 113    13.14  61     30      5      305        70      220     470     390
## 114    44.13  87      2      1       48        21      178     419     198
## 115    26.43  50      9      5      475        92      230     460     460
## 116     9.88  66      8      5      192        38      240     440     390
## 117    13.28  63     27      5      202        39      230     430     410
## 118    14.93  53     24      2      140        27      200     430     300
## 119    30.73  42      9      4      117        21      206     421     285
## 120     5.50  40      9      4      122        19      250     420     350
## 121    15.95  54     29      5      206        40      250     440     410
## 122     6.93  34     10      4       35         2      240     410     270
## 123    91.50  50      8      5       36         7      180     400     220
## 124    50.55  57      3      6      242        39      270     450     430
## 125    11.73  57     18      5      141        32      224     428     310
## 126     1.24  57      6      5      116        22      200     420     280
## 127     3.69  33     12      4       54         3      190     410     230
## 128    50.18  50     12      5      246        48      210     420     430
## 129    43.85  43      8      5      201        42      230     430     380
## 130     3.55  64      7      5       73        17      190     400     310
## 131    14.58  37      8      5      208        32      220     410     450
## 132     5.00  39     19      4      169        26      210     430     310
## 133    36.53  52     35      3       48         9      220     420     250
## 134    10.45  47     14      5      107        21      200     430     260
## 135     1.00   0      8      6      240        37      200     420     380
## 136     8.50   0      9      7      139        19      210     400     390
## 137    11.50  55     30      4       69        13      210     410     320
## 138    68.36  50      8      7       56        11      210     410     290
## 139     8.82  31     10      4      133        23      222     420     298
## 140     4.06  42      5      3      118        18      250     430     360
## 141     1.67  49     19      2      150        31      229     427     318
## 142    85.00  50     23      4      136        26      220     420     370
## 143    44.79  52     21      5      150        26      250     430     380
## 144    13.20  69     12      3      292        67      230     460     400
## 145    30.30  46     10      5      320        62      290     460     480
## 146     8.22  52     13      5      236        46      190     410     430
## 147     6.39  53     10      4      232        45      210     420     420
## 148     3.50  40      8      6      179        27      230     410     420
## 149    11.24  53     16      5      215        42      180     410     410
## 150     3.18  53     30      5       89        17      270     440     350
## 151    22.79  72     37      5       61         9      230     420     310
## 152    18.83  61     11      5      108        25      250     440     350
## 153    32.11  77      3      4       43        10      210     410     240
## 154     4.56   0      9      6       63        15      220     430     270
## 155    17.74  36     12      4      194        37      260     440     410
## 156    17.45  52     11      4      218        42      300     460     440
## 157     2.00  56      3      5      155        36      240     440     370
## 158    23.13  78      4      6      260        52      300     460     470
## 159    61.76  42      7      4      147        34      220     430     360
## 160     4.25  75     30      4      124        30      209     428     293
## 161    60.29  55     26      4      114         6      280     420     370
## 162    32.50  57     46      3       79        18      240     430     310
## 163     0.08  78      9      2       65        15      250     430     300
## 164     4.56  39      9      5      215        34      230     430     430
## 165    14.50  68     35      2       94        22      270     440     360
## 166    72.37  52     14      7      332        65      300     450     520
## 167    11.74  44     17      5      157        24      240     430     380
## 168   125.44  68     37      6      138        32      270     450     390
## 169    52.45  74     39      6      188        43      200     440     320
## 170    39.50  41      8      5      200        11      180     400     300
## 171    94.15  48      0      6      131        28      220     430     340
## 172    34.37  57     14      6      265        61      270     460     450
## 173   111.83  53      8      6      243        47      310     460     470
## 174    48.80  61      7      4      134        19      200     410     320
## 175   329.71  73     34      5       69        16      220     420     300
## 176    32.35  60     10      6      177        41      280     460     410
## 177     1.11   0      9      6      187        43      200     420     320
## 178    54.88  46     11      6      149        29      250     450     380
## 179    70.23  52     11      5      229        44      300     450     460
## 180    25.24  42     16      7      307        17      260     430     440
## 181     1.83  78     26      7        5         0      180     400      90
## 182     9.79  40     14      5      349        53      230     430     450
## 183   151.93  42      9      6      231        35      290     450     460
## 184   119.87  64     15      6      133        31      230     440     330
## 185     3.81  37     13      4      150        23      250     420     410
## 186    69.04  58     14      4      161        37      190     420     370
## 187    52.67  49      2      2       87        20      190     414     239
## 188     9.92  31     35      1      181        39      242     438     363
## 189    23.45  49     22      3      180        35      190     410     360
## 190    45.50  76     17      3       44        10      200     420     220
## 191    60.82  57     18      3      430        99      380     520     570
## 192     2.23  57     14      3      112        17      240     420     340
## 193    36.00  41     10      4      322        54      230     440     430
## 194   307.15  40     15      4      309        71      280     450     520
## 195    12.92  46      6      5      999       346      230     610     590
## 196   111.19  49     24      2      173        40      270     450     400
## 197     2.42  46      2      4      237        46      280     440     440
## 198    29.77  53     22      3      224        43      180     410     420
## 199    19.35  57     18      4      174        40      210     430     360
## 200     4.42  59      9      6      169        39      220     460     330
##     SpendVol SpenVel CollGifts BricMortar MarthaHome SunAds ThemeColl CustDec
## 1        503     285         1          0          0      1         0       1
## 2        690     570         0          1          1      0         0       1
## 3        600     280         1          0          0      1         1       1
## 4        543     308         1          0          0      1         1       0
## 5        680     100         0          1          1      0         0       1
## 6        440      50         0          1          1      0         0       1
## 7        690     180         1          0          0      1         0       0
## 8        500      10         1          1          1      0         1       1
## 9        610       0         1          0          1      0         1       1
## 10       660       0         0          1          0      0         0       0
## 11       610      50         1          1          0      1         1       0
## 12       220       0         0          0          1      1         0       1
## 13       570     220         1          0          0      1         1       0
## 14       630     300         0          0          0      0         0       0
## 15       580     170         0          0          0      0         0       0
## 16       770      30         1          0          1      1         1       1
## 17       620     170         1          0          0      1         1       0
## 18       570     610         1          1          1      1         0       1
## 19       710     130         0          0          0      0         0       0
## 20       340      30         1          0          1      1         1       1
## 21       380      20         0          1          1      0         0       1
## 22       680     160         0          0          1      1         0       1
## 23       660      20         1          0          1      1         0       1
## 24       300      10         0          0          0      0         1       0
## 25       490       0         1          1          1      1         1       0
## 26       730     140         1          0          1      0         1       1
## 27       458     188         1          1          1      1         0       1
## 28       590     200         1          1          1      1         1       0
## 29       690      20         1          1          0      0         0       0
## 30         0       0         0          0          0      0         0       0
## 31       700      80         1          0          1      0         1       0
## 32       620     380         0          1          1      0         0       1
## 33       660     230         1          0          0      1         1       0
## 34       750     240         0          1          1      0         0       0
## 35       690      80         0          1          1      1         1       0
## 36       560     450         1          0          1      0         0       0
## 37       260      10         1          0          0      1         1       0
## 38         0       0         0          0          0      0         0       0
## 39       260       0         0          1          0      0         0       0
## 40       534     238         1          1          1      0         0       1
## 41       599     312         0          0          1      0         0       1
## 42       534     281         0          0          0      1         0       0
## 43       250       0         0          0          0      1         1       0
## 44       710     100         0          0          0      0         0       0
## 45       564     251         0          0          1      0         1       0
## 46       640     380         0          0          0      1         1       0
## 47       730     190         1          0          1      0         0       0
## 48       700      90         1          0          0      1         0       1
## 49       700     130         1          1          1      1         1       1
## 50       660     300         0          0          0      0         0       0
## 51       660     220         0          1          1      0         0       1
## 52       650     400         1          0          1      1         0       1
## 53       670     210         1          0          1      1         1       1
## 54       480      70         1          0          1      1         1       0
## 55       526     238         0          1          0      1         0       0
## 56       630      10         1          0          0      1         0       1
## 57       370      30         0          0          0      1         0       1
## 58       610     330         1          0          1      1         0       1
## 59       670     240         1          0          0      0         1       1
## 60       740     110         1          0          1      0         1       1
## 61       670     750         1          0          0      0         0       0
## 62       600      40         0          0          1      0         1       0
## 63       590       0         1          1          1      0         1       1
## 64       630     260         0          1          1      0         0       1
## 65       590      90         1          0          1      1         1       0
## 66       506     308         1          0          0      0         0       0
## 67       720     750         0          0          0      1         1       0
## 68       650     900         1          0          0      1         1       0
## 69       580     230         1          0          0      0         1       0
## 70       650     160         1          1          0      0         0       0
## 71       660     210         1          0          0      1         1       0
## 72       547     300         0          0          0      0         0       0
## 73       650     570         0          0          0      0         0       0
## 74       590     310         0          0          0      0         0       0
## 75       541     329         0          1          1      0         0       1
## 76         0       0         0          1          0      0         0       0
## 77       680     250         0          0          0      0         1       0
## 78       480     150         0          0          0      1         1       0
## 79       180      10         0          0          0      0         0       0
## 80       376     132         0          1          0      0         0       0
## 81       490      70         0          0          0      1         0       1
## 82       240       0         1          0          0      1         1       0
## 83       600       0         1          0          0      1         1       1
## 84       552     230         0          0          1      0         0       0
## 85       489     218         1          0          0      1         0       0
## 86       640      90         0          0          0      0         0       0
## 87       437     166         0          0          0      0         0       0
## 88       560     140         0          0          0      0         0       0
## 89       630     360         0          0          0      0         1       0
## 90       560     999         0          0          0      1         1       0
## 91       516     241         0          1          1      0         0       0
## 92        80       0         1          0          1      1         1       0
## 93       350      30         1          1          1      0         0       1
## 94       640      60         0          0          1      0         0       0
## 95       460     130         1          1          1      0         0       1
## 96       610     430         1          0          1      1         0       0
## 97       360      30         0          0          0      0         0       0
## 98       690     280         1          1          0      0         1       0
## 99       620      80         1          1          1      0         1       1
## 100      710     100         0          0          0      1         0       1
## 101      570       0         1          0          0      0         1       0
## 102      560      10         0          1          0      0         0       0
## 103      610     150         1          0          0      1         0       1
## 104      650      60         1          0          0      1         0       1
## 105      620     490         0          0          0      1         1       1
## 106      590     580         0          0          0      0         0       0
## 107      580      10         0          1          0      1         0       1
## 108      670     130         0          0          0      0         0       0
## 109      680     130         0          0          0      0         0       0
## 110      650     450         1          0          0      1         1       0
## 111      410      30         0          1          0      0         0       1
## 112      700     660         1          0          1      1         1       1
## 113      700     150         0          1          0      0         0       1
## 114      362     205         1          0          0      1         0       0
## 115      740     220         0          1          0      0         0       0
## 116      610     310         0          0          0      0         1       1
## 117      620     330         1          0          1      1         1       1
## 118      610     130         0          0          0      1         0       1
## 119      552     304         0          1          0      0         0       0
## 120      660       0         0          0          0      0         0       0
## 121      600     210         0          0          1      1         1       1
## 122      480     210         0          0          0      0         0       0
## 123      570     680         1          0          0      1         1       1
## 124      660      40         1          0          0      1         0       1
## 125      513     219         0          0          0      1         0       0
## 126      580     140         1          0          1      1         0       1
## 127      580     230         0          0          0      0         0       0
## 128      680     600         1          1          0      1         1       0
## 129      690     220         1          0          0      0         0       0
## 130      750     690         0          0          0      0         0       0
## 131      640     640         0          1          1      0         0       0
## 132      780       0         0          0          0      0         0       0
## 133      130       0         0          0          0      1         1       0
## 134      630       0         0          0          0      0         0       0
## 135      680     410         1          0          0      1         0       0
## 136      710     600         0          0          0      0         1       1
## 137      520     510         0          0          0      0         0       0
## 138      540     480         0          1          1      0         0       1
## 139      584     334         0          0          0      0         0       0
## 140      610      50         0          1          0      0         0       0
## 141      553     254         0          0          0      0         0       0
## 142      620     390         1          0          0      1         1       0
## 143      600     190         1          0          0      1         1       1
## 144      620     100         0          0          0      0         0       0
## 145      680     130         1          0          0      0         1       1
## 146      710     750         0          0          1      0         0       0
## 147      720     580         1          0          0      0         0       0
## 148      680     450         0          0          0      0         0       0
## 149      720     840         0          0          0      0         0       0
## 150      230       0         0          0          1      0         0       0
## 151      540     240         1          0          0      1         1       0
## 152      430      70         0          0          0      1         1       1
## 153      460     110         1          0          0      1         1       0
## 154      260       0         0          0          1      1         0       1
## 155      640     140         1          0          0      1         1       0
## 156      700       0         0          0          0      0         0       0
## 157      600     120         0          0          0      0         0       0
## 158      640     130         1          1          0      1         0       0
## 159      620     300         1          0          1      1         0       1
## 160      480     191         0          0          0      1         1       0
## 161      580      10         0          0          0      0         0       0
## 162        0       0         0          0          0      0         0       0
## 163       80       0         0          0          0      0         0       0
## 164      670     420         0          1          0      0         1       0
## 165      360      40         1          0          1      1         0       0
## 166      660     260         1          0          1      1         1       1
## 167      670     110         1          0          0      1         0       1
## 168      550      40         1          1          0      1         1       1
## 169      570     250         1          0          0      0         0       0
## 170      700     560         1          0          0      0         1       0
## 171      540     120         1          1          1      1         0       1
## 172      670     120         1          0          1      0         0       0
## 173      620     100         1          1          1      0         1       1
## 174      640     330         1          0          0      1         1       0
## 175      510     200         1          1          1      1         1       1
## 176      500      60         1          0          1      1         0       0
## 177      670     270         0          0          0      0         0       0
## 178      590     100         1          1          1      0         0       0
## 179      650      50         0          1          0      1         1       0
## 180      720      40         1          1          1      0         0       0
## 181      350     630         0          0          0      0         0       0
## 182      740     370         0          1          0      0         0       0
## 183      660     190         1          1          1      1         0       1
## 184      580       0         1          0          1      0         1       1
## 185      650     310         0          0          0      1         1       1
## 186      590     650         0          1          1      0         1       0
## 187      511     266         1          0          0      0         1       0
## 188      520     194         1          0          0      1         0       0
## 189      630     650         1          1          0      0         0       0
## 190      170       0         0          1          1      0         0       0
## 191      690      80         0          1          1      0         0       0
## 192      630      40         0          0          0      0         0       0
## 193      680     330         0          1          1      0         0       1
## 194      670     420         1          0          1      1         1       1
## 195      650     540         0          0          0      0         0       0
## 196      640      70         1          0          0      1         1       1
## 197      650      20         1          0          0      1         1       1
## 198      670     880         1          0          0      0         0       0
## 199      590     370         0          0          0      0         0       0
## 200      540       0         1          0          0      0         1       0
##     RetailKids TeenWr Carlovers CountryColl
## 1            1      1         0           1
## 2            1      0         0           0
## 3            1      0         0           1
## 4            0      0         0           1
## 5            0      0         0           0
## 6            0      0         1           0
## 7            0      0         0           1
## 8            1      1         1           0
## 9            0      0         0           1
## 10           0      1         0           0
## 11           0      1         0           1
## 12           1      0         0           0
## 13           1      1         0           1
## 14           0      0         0           1
## 15           0      0         0           0
## 16           0      0         0           1
## 17           0      1         1           1
## 18           1      0         0           1
## 19           0      0         0           0
## 20           1      0         0           1
## 21           0      1         0           0
## 22           1      0         0           0
## 23           0      1         0           1
## 24           0      1         0           0
## 25           1      1         1           1
## 26           0      1         0           0
## 27           0      0         1           1
## 28           1      1         1           1
## 29           0      1         1           1
## 30           0      0         0           0
## 31           1      1         1           1
## 32           0      1         0           0
## 33           0      1         0           1
## 34           0      1         0           0
## 35           0      1         1           0
## 36           0      0         0           0
## 37           1      0         1           1
## 38           0      1         0           0
## 39           0      0         0           0
## 40           0      1         0           0
## 41           0      0         1           0
## 42           0      0         0           0
## 43           0      1         0           1
## 44           0      0         0           0
## 45           0      1         0           0
## 46           0      0         1           0
## 47           1      1         0           0
## 48           1      1         1           0
## 49           0      1         0           1
## 50           0      1         0           0
## 51           0      1         0           0
## 52           1      0         0           0
## 53           1      1         1           1
## 54           0      1         1           1
## 55           0      1         1           1
## 56           1      0         1           1
## 57           1      0         1           0
## 58           1      1         0           1
## 59           0      0         1           1
## 60           0      0         0           0
## 61           1      1         1           0
## 62           0      1         0           0
## 63           1      1         0           1
## 64           0      1         0           1
## 65           0      0         1           1
## 66           0      1         0           0
## 67           0      1         0           0
## 68           0      0         0           1
## 69           0      0         0           1
## 70           0      1         0           1
## 71           0      0         0           1
## 72           0      0         1           0
## 73           0      0         0           0
## 74           0      1         0           0
## 75           0      0         0           0
## 76           0      0         0           0
## 77           1      1         0           0
## 78           0      0         0           1
## 79           0      1         0           0
## 80           0      1         1           0
## 81           1      0         0           0
## 82           0      0         0           1
## 83           1      1         0           1
## 84           0      0         0           0
## 85           0      0         0           1
## 86           0      0         0           0
## 87           0      0         0           0
## 88           0      0         1           0
## 89           1      1         0           1
## 90           0      1         0           1
## 91           1      1         1           0
## 92           0      0         0           1
## 93           1      0         0           0
## 94           0      1         0           0
## 95           1      1         0           1
## 96           0      0         0           1
## 97           0      1         0           0
## 98           0      1         0           1
## 99           0      1         0           1
## 100          1      0         1           0
## 101          0      0         0           1
## 102          0      1         1           0
## 103          1      0         0           1
## 104          1      0         0           1
## 105          1      0         0           0
## 106          0      0         0           0
## 107          1      1         1           0
## 108          0      0         0           0
## 109          0      0         0           0
## 110          1      0         0           0
## 111          1      1         0           0
## 112          1      1         1           1
## 113          0      0         0           0
## 114          0      0         0           1
## 115          0      1         0           0
## 116          0      0         0           0
## 117          0      0         0           1
## 118          1      1         0           0
## 119          0      1         0           0
## 120          0      0         1           0
## 121          1      0         0           0
## 122          0      1         0           0
## 123          1      0         0           1
## 124          1      1         1           0
## 125          0      0         0           0
## 126          1      0         0           1
## 127          1      1         0           0
## 128          1      1         0           1
## 129          0      1         1           0
## 130          0      0         0           0
## 131          0      1         0           0
## 132          0      1         0           0
## 133          0      0         1           1
## 134          0      1         0           0
## 135          0      0         0           1
## 136          0      0         0           0
## 137          0      1         1           0
## 138          0      1         0           0
## 139          1      0         0           0
## 140          0      0         1           0
## 141          0      0         0           0
## 142          1      1         1           1
## 143          0      1         1           1
## 144          0      0         1           0
## 145          0      1         0           0
## 146          0      1         1           0
## 147          0      0         0           0
## 148          0      0         0           0
## 149          0      0         0           1
## 150          0      1         0           0
## 151          1      0         1           1
## 152          1      1         0           0
## 153          0      0         1           1
## 154          1      0         0           0
## 155          1      1         0           1
## 156          0      0         0           0
## 157          0      0         0           0
## 158          0      1         0           1
## 159          1      1         1           1
## 160          0      0         0           1
## 161          0      1         0           0
## 162          0      0         0           0
## 163          0      0         0           0
## 164          1      1         0           0
## 165          0      0         0           1
## 166          1      0         1           0
## 167          1      1         1           1
## 168          1      1         0           1
## 169          1      1         0           0
## 170          1      1         0           0
## 171          1      0         0           1
## 172          0      1         1           0
## 173          0      1         1           0
## 174          0      0         0           1
## 175          1      1         0           1
## 176          0      1         1           1
## 177          0      0         0           0
## 178          1      1         0           0
## 179          0      1         0           1
## 180          1      1         0           0
## 181          0      0         0           0
## 182          1      0         1           0
## 183          1      0         0           1
## 184          0      0         1           0
## 185          1      0         0           1
## 186          0      1         0           0
## 187          0      0         0           0
## 188          1      0         1           1
## 189          0      1         0           1
## 190          0      0         0           1
## 191          0      1         0           0
## 192          0      1         0           0
## 193          1      1         0           1
## 194          1      1         0           1
## 195          0      0         1           0
## 196          0      1         1           1
## 197          1      1         0           1
## 198          1      0         1           0
## 199          1      1         0           0
## 200          0      1         0           1

Filter out rows where Age is 18 or younger (keep only Age > 18)
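
The choice of comparison operator decides whether respondents who are exactly 18 are kept. The following is a minimal sketch of that boundary, assuming the `dplyr` package is installed; the `toy` data frame and its values are hypothetical and are not taken from `d1`.

```r
library(dplyr)

# Hypothetical toy data for illustration only (not part of d1)
toy <- data.frame(id = 1:5, Age = c(17, 18, 19, 35, NA))

# Strict comparison: drops ages 17 and 18, and the row where Age is NA
strict <- filter(toy, Age > 18)

# Inclusive comparison: drops age 17 and the NA row, keeps age 18
inclusive <- filter(toy, Age >= 18)

nrow(strict)
nrow(inclusive)
```

Note that `filter()` also drops rows where the condition evaluates to `NA`, so rows with a missing `Age` are removed under either operator. The filter applied to `d1` below uses the strict form.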

# Keep rows with Age strictly greater than 18 (dplyr::filter also drops rows where Age is NA)
d1a <- filter(d1, Age > 18)
d1a
##     SpendRat Age LenRes Income TotAsset SecAssets ShortLiq LongLiq WlthIdx
## 1      16.83  35      3      5      195        36      220     420     430
## 2      11.38  46      9      5      123        24      200     420     290
## 3      31.33  41      2      2      117        25      222     419     279
## 4       1.90  46      7      9      493       105      310     500     520
## 5      84.13  46     15      5      138        27      340     450     440
## 6       2.15  46     16      4      162        25      230     430     360
## 7      38.00  56     31      6      117        27      300     440     400
## 8     136.28  48      8      5      119        23      250     430     360
## 9      61.46  54      8      5       50        10      200     420     230
## 10      2.73  43     15      4      135        21      230     430     340
## 11      5.86  66      8      4       81        19      250     430     320
## 12    113.00  61      8      5      999       245      999     720     880
## 13      8.40  77      8      3      198        45      210     430     340
## 14     19.00  50      7      4      412        95      290     470     490
## 15     70.85  49     19      5      192        37      200     440     330
## 16     20.17  76     12      5       64        15      190     410     280
## 17      0.59  49      2      7      229        44      270     450     430
## 18     19.48  63      7      5       95        22      270     440     360
## 19     86.20  67     32      5       87        20      250     430     330
## 20     25.46  70     36      5      511        99      270     540     500
## 21     76.82  77     15      4      242        56      300     470     450
## 22      0.77  71      9      5       43        10      200     420     210
## 23    335.94  58      6      6       88        20      220     430     290
## 24     18.78  48      3      5      534        94      240     520     460
## 25    205.07  77      9      6      131        30      205     433     295
## 26    216.57  66      9      7       85        20      220     420     300
## 27     77.21  21      3      4      206        12      280     460     420
## 28      0.26  89     11      5       57        13      230     420     280
## 29     92.53  81     37      6      999       209      280     670     530
## 30     46.55  56     19      3      143        23      220     410     380
## 31      6.86  68     20      5      338        78      280     480     490
## 32     18.14  40      7      5      306        47      280     430     490
## 33    120.50  78      3      4      285        55      290     440     470
## 34     37.08  69      7      3      100        23      200     420     300
## 35     19.20  46     27      4       52        12      210     420     250
## 36     23.08  74     25      3       84        19      260     430     330
## 37    165.13  54      9      4      164        31      225     433     333
## 38      6.50  70      9      4      109        18      194     414     265
## 39      8.29  73      3      4       73        17      250     430     310
## 40      3.95  53      8      7      127        25      240     430     350
## 41     14.88  42      7      6      155        31      229     427     328
## 42    125.32  64     25      6      196        32      240     420     430
## 43     39.43  50      5      7      353        68      240     440     440
## 44     39.81  37     10      5      216        33      280     440     440
## 45    230.89  20     21      5      564       109      250     520     490
## 46      3.61  46     16      5      244        47      220     440     390
## 47     37.78  57     12      4      277        64      230     440     410
## 48      9.23  54      3      7      210        36      210     410     390
## 49     44.00  47     17      5      315        61      280     450     480
## 50     86.96  43     15      5       45         7      230     420     270
## 51      2.37  55      7      6      127        29      216     424     292
## 52      4.81  56     15      6      228        53      260     460     410
## 53      0.16  57     10      6      139        32      310     450     420
## 54     99.10  36      2      5       51         8      230     410     300
## 55     11.85  38      8      7      260        46      260     440     450
## 56     12.80  38      7      5      195        30      230     430     370
## 57     21.00  55     19      6      145        28      190     410     360
## 58     17.53  59      8      5      243        56      330     480     480
## 59     72.91  55     22      6      140        27      280     450     390
## 60    113.71  49     19      4      129        25      200     420     290
## 61     10.38  52      7      6       98        19      202     417     252
## 62      9.36  41      9      4      170        26      190     410     380
## 63    304.61  49      8      4      267        52      190     410     520
## 64     41.63  52     18      4      188        35      230     430     380
## 65      1.37  41      8      3      196        33      220     420     360
## 66     24.12  34     13      3      183        42      220     420     400
## 67     33.15  32      3      5       32         2      220     410     260
## 68     22.44  78     40      4       38         9      210     420     230
## 69     11.79  44      8      3      237        18      200     420     350
## 70      1.36  81      8      2       92        21      240     430     340
## 71      0.71  86     42      2       90        21      280     440     360
## 72     39.69  76      8      5      107        34      206     431     274
## 73      4.33  79     18      4       77        18      220     430     300
## 74      8.39  84     22      2       91        21      260     440     340
## 75     10.80  49     14      2      117        23      230     430     320
## 76      5.80  55     16      4      169        35      238     432     347
## 77      1.33  70     17      3       94        23      190     420     254
## 78      4.13  43     16      2      139        24      230     430     340
## 79     17.94  85     42      1      123        32      207     432     289
## 80      3.23  56     25      2      126        29      260     430     380
## 81     45.54  42      8      3      168        26      240     420     400
## 82      6.77  45      9      3       40         6      160     400     260
## 83      4.53  55      7      2      143        28      216     428     311
## 84     13.50  62     14      2       77        18      260     430     330
## 85    129.72  40      9      2       58         9      270     420     320
## 86     19.27  44     17      4      180        27      240     440     370
## 87    153.67  71     38      4       69        16      200     420     260
## 88     61.90  61     31      4      209        48      290     440     490
## 89    275.91  79     32      2       63        15      230     420     290
## 90      5.60  51     13      4      222        36      260     440     450
## 91    401.42  52     24      5      260        50      270     470     440
## 92     15.00  35      8      4      141         8      250     420     370
## 93      3.67  62      6      4      184        42      310     470     440
## 94      7.60  30      7      4       20         1      220     400     200
## 95      5.56  54     35      4      273        51      260     460     440
## 96     16.69  60     14      5      999       999      190     999     540
## 97      5.92  60     16      5       74        17      180     410     240
## 98     39.73  41     34      5       62         9      270     420     320
## 99      2.22  45      7      2      119        18      240     420     360
## 100     1.33  73      9      5      307        47      220     460     390
## 101     4.65  35     11      8      166         9      220     410     380
## 102     6.13  41     11      5      218        33      210     410     430
## 103    13.14  61     30      5      305        70      220     470     390
## 104    44.13  87      2      1       48        21      178     419     198
## 105    26.43  50      9      5      475        92      230     460     460
## 106     9.88  66      8      5      192        38      240     440     390
## 107    13.28  63     27      5      202        39      230     430     410
## 108    14.93  53     24      2      140        27      200     430     300
## 109    30.73  42      9      4      117        21      206     421     285
## 110     5.50  40      9      4      122        19      250     420     350
## 111    15.95  54     29      5      206        40      250     440     410
## 112     6.93  34     10      4       35         2      240     410     270
## 113    91.50  50      8      5       36         7      180     400     220
## 114    50.55  57      3      6      242        39      270     450     430
## 115    11.73  57     18      5      141        32      224     428     310
## 116     1.24  57      6      5      116        22      200     420     280
## 117     3.69  33     12      4       54         3      190     410     230
## 118    50.18  50     12      5      246        48      210     420     430
## 119    43.85  43      8      5      201        42      230     430     380
## 120     3.55  64      7      5       73        17      190     400     310
## 121    14.58  37      8      5      208        32      220     410     450
## 122     5.00  39     19      4      169        26      210     430     310
## 123    36.53  52     35      3       48         9      220     420     250
## 124    10.45  47     14      5      107        21      200     430     260
## 125    11.50  55     30      4       69        13      210     410     320
## 126    68.36  50      8      7       56        11      210     410     290
## 127     8.82  31     10      4      133        23      222     420     298
## 128     4.06  42      5      3      118        18      250     430     360
## 129     1.67  49     19      2      150        31      229     427     318
## 130    85.00  50     23      4      136        26      220     420     370
## 131    44.79  52     21      5      150        26      250     430     380
## 132    13.20  69     12      3      292        67      230     460     400
## 133    30.30  46     10      5      320        62      290     460     480
## 134     8.22  52     13      5      236        46      190     410     430
## 135     6.39  53     10      4      232        45      210     420     420
## 136     3.50  40      8      6      179        27      230     410     420
## 137    11.24  53     16      5      215        42      180     410     410
## 138     3.18  53     30      5       89        17      270     440     350
## 139    22.79  72     37      5       61         9      230     420     310
## 140    18.83  61     11      5      108        25      250     440     350
## 141    32.11  77      3      4       43        10      210     410     240
## 142    17.74  36     12      4      194        37      260     440     410
## 143    17.45  52     11      4      218        42      300     460     440
## 144     2.00  56      3      5      155        36      240     440     370
## 145    23.13  78      4      6      260        52      300     460     470
## 146    61.76  42      7      4      147        34      220     430     360
## 147     4.25  75     30      4      124        30      209     428     293
## 148    60.29  55     26      4      114         6      280     420     370
## 149    32.50  57     46      3       79        18      240     430     310
## 150     0.08  78      9      2       65        15      250     430     300
## 151     4.56  39      9      5      215        34      230     430     430
## 152    14.50  68     35      2       94        22      270     440     360
## 153    72.37  52     14      7      332        65      300     450     520
## 154    11.74  44     17      5      157        24      240     430     380
## 155   125.44  68     37      6      138        32      270     450     390
## 156    52.45  74     39      6      188        43      200     440     320
## 157    39.50  41      8      5      200        11      180     400     300
## 158    94.15  48      0      6      131        28      220     430     340
## 159    34.37  57     14      6      265        61      270     460     450
## 160   111.83  53      8      6      243        47      310     460     470
## 161    48.80  61      7      4      134        19      200     410     320
## 162   329.71  73     34      5       69        16      220     420     300
## 163    32.35  60     10      6      177        41      280     460     410
## 164    54.88  46     11      6      149        29      250     450     380
## 165    70.23  52     11      5      229        44      300     450     460
## 166    25.24  42     16      7      307        17      260     430     440
## 167     1.83  78     26      7        5         0      180     400      90
## 168     9.79  40     14      5      349        53      230     430     450
## 169   151.93  42      9      6      231        35      290     450     460
## 170   119.87  64     15      6      133        31      230     440     330
## 171     3.81  37     13      4      150        23      250     420     410
## 172    69.04  58     14      4      161        37      190     420     370
## 173    52.67  49      2      2       87        20      190     414     239
## 174     9.92  31     35      1      181        39      242     438     363
## 175    23.45  49     22      3      180        35      190     410     360
## 176    45.50  76     17      3       44        10      200     420     220
## 177    60.82  57     18      3      430        99      380     520     570
## 178     2.23  57     14      3      112        17      240     420     340
## 179    36.00  41     10      4      322        54      230     440     430
## 180   307.15  40     15      4      309        71      280     450     520
## 181    12.92  46      6      5      999       346      230     610     590
## 182   111.19  49     24      2      173        40      270     450     400
## 183     2.42  46      2      4      237        46      280     440     440
## 184    29.77  53     22      3      224        43      180     410     420
## 185    19.35  57     18      4      174        40      210     430     360
## 186     4.42  59      9      6      169        39      220     460     330
##     SpendVol SpenVel CollGifts BricMortar MarthaHome SunAds ThemeColl CustDec
## 1        690     570         0          1          1      0         0       1
## 2        600     280         1          0          0      1         1       1
## 3        543     308         1          0          0      1         1       0
## 4        680     100         0          1          1      0         0       1
## 5        440      50         0          1          1      0         0       1
## 6        690     180         1          0          0      1         0       0
## 7        500      10         1          1          1      0         1       1
## 8        610       0         1          0          1      0         1       1
## 9        660       0         0          1          0      0         0       0
## 10       610      50         1          1          0      1         1       0
## 11       220       0         0          0          1      1         0       1
## 12       570     220         1          0          0      1         1       0
## 13       630     300         0          0          0      0         0       0
## 14       770      30         1          0          1      1         1       1
## 15       620     170         1          0          0      1         1       0
## 16       570     610         1          1          1      1         0       1
## 17       710     130         0          0          0      0         0       0
## 18       340      30         1          0          1      1         1       1
## 19       380      20         0          1          1      0         0       1
## 20       680     160         0          0          1      1         0       1
## 21       660      20         1          0          1      1         0       1
## 22       300      10         0          0          0      0         1       0
## 23       490       0         1          1          1      1         1       0
## 24       730     140         1          0          1      0         1       1
## 25       458     188         1          1          1      1         0       1
## 26       590     200         1          1          1      1         1       0
## 27       690      20         1          1          0      0         0       0
## 28         0       0         0          0          0      0         0       0
## 29       700      80         1          0          1      0         1       0
## 30       620     380         0          1          1      0         0       1
## 31       660     230         1          0          0      1         1       0
## 32       750     240         0          1          1      0         0       0
## 33       690      80         0          1          1      1         1       0
## 34       560     450         1          0          1      0         0       0
## 35       260      10         1          0          0      1         1       0
## 36       260       0         0          1          0      0         0       0
## 37       534     238         1          1          1      0         0       1
## 38       599     312         0          0          1      0         0       1
## 39       250       0         0          0          0      1         1       0
## 40       710     100         0          0          0      0         0       0
## 41       564     251         0          0          1      0         1       0
## 42       640     380         0          0          0      1         1       0
## 43       730     190         1          0          1      0         0       0
## 44       700      90         1          0          0      1         0       1
## 45       700     130         1          1          1      1         1       1
## 46       660     300         0          0          0      0         0       0
## 47       660     220         0          1          1      0         0       1
## 48       650     400         1          0          1      1         0       1
## 49       670     210         1          0          1      1         1       1
## 50       480      70         1          0          1      1         1       0
## 51       526     238         0          1          0      1         0       0
## 52       630      10         1          0          0      1         0       1
## 53       370      30         0          0          0      1         0       1
## 54       610     330         1          0          1      1         0       1
## 55       670     240         1          0          0      0         1       1
## 56       740     110         1          0          1      0         1       1
## 57       670     750         1          0          0      0         0       0
## 58       600      40         0          0          1      0         1       0
## 59       590       0         1          1          1      0         1       1
## 60       590      90         1          0          1      1         1       0
## 61       506     308         1          0          0      0         0       0
## 62       720     750         0          0          0      1         1       0
## 63       650     900         1          0          0      1         1       0
## 64       650     160         1          1          0      0         0       0
## 65       660     210         1          0          0      1         1       0
## 66       650     570         0          0          0      0         0       0
## 67       590     310         0          0          0      0         0       0
## 68         0       0         0          1          0      0         0       0
## 69       680     250         0          0          0      0         1       0
## 70       480     150         0          0          0      1         1       0
## 71       180      10         0          0          0      0         0       0
## 72       376     132         0          1          0      0         0       0
## 73       490      70         0          0          0      1         0       1
## 74       240       0         1          0          0      1         1       0
## 75       600       0         1          0          0      1         1       1
## 76       552     230         0          0          1      0         0       0
## 77       489     218         1          0          0      1         0       0
## 78       640      90         0          0          0      0         0       0
## 79       437     166         0          0          0      0         0       0
## 80       560     140         0          0          0      0         0       0
## 81       630     360         0          0          0      0         1       0
## 82       560     999         0          0          0      1         1       0
## 83       516     241         0          1          1      0         0       0
## 84        80       0         1          0          1      1         1       0
## 85       350      30         1          1          1      0         0       1
## 86       640      60         0          0          1      0         0       0
## 87       460     130         1          1          1      0         0       1
## 88       610     430         1          0          1      1         0       0
## 89       360      30         0          0          0      0         0       0
## 90       690     280         1          1          0      0         1       0
## 91       620      80         1          1          1      0         1       1
## 92       710     100         0          0          0      1         0       1
## 93       570       0         1          0          0      0         1       0
## 94       560      10         0          1          0      0         0       0
## 95       610     150         1          0          0      1         0       1
## 96       620     490         0          0          0      1         1       1
## 97       590     580         0          0          0      0         0       0
## 98       580      10         0          1          0      1         0       1
## 99       670     130         0          0          0      0         0       0
## 100      680     130         0          0          0      0         0       0
## 101      650     450         1          0          0      1         1       0
## 102      700     660         1          0          1      1         1       1
## 103      700     150         0          1          0      0         0       1
## 104      362     205         1          0          0      1         0       0
## 105      740     220         0          1          0      0         0       0
## 106      610     310         0          0          0      0         1       1
## 107      620     330         1          0          1      1         1       1
## 108      610     130         0          0          0      1         0       1
## 109      552     304         0          1          0      0         0       0
## 110      660       0         0          0          0      0         0       0
## 111      600     210         0          0          1      1         1       1
## 112      480     210         0          0          0      0         0       0
## 113      570     680         1          0          0      1         1       1
## 114      660      40         1          0          0      1         0       1
## 115      513     219         0          0          0      1         0       0
## 116      580     140         1          0          1      1         0       1
## 117      580     230         0          0          0      0         0       0
## 118      680     600         1          1          0      1         1       0
## 119      690     220         1          0          0      0         0       0
## 120      750     690         0          0          0      0         0       0
## 121      640     640         0          1          1      0         0       0
## 122      780       0         0          0          0      0         0       0
## 123      130       0         0          0          0      1         1       0
## 124      630       0         0          0          0      0         0       0
## 125      520     510         0          0          0      0         0       0
## 126      540     480         0          1          1      0         0       1
## 127      584     334         0          0          0      0         0       0
## 128      610      50         0          1          0      0         0       0
## 129      553     254         0          0          0      0         0       0
## 130      620     390         1          0          0      1         1       0
## 131      600     190         1          0          0      1         1       1
## 132      620     100         0          0          0      0         0       0
## 133      680     130         1          0          0      0         1       1
## 134      710     750         0          0          1      0         0       0
## 135      720     580         1          0          0      0         0       0
## 136      680     450         0          0          0      0         0       0
## 137      720     840         0          0          0      0         0       0
## 138      230       0         0          0          1      0         0       0
## 139      540     240         1          0          0      1         1       0
## 140      430      70         0          0          0      1         1       1
## 141      460     110         1          0          0      1         1       0
## 142      640     140         1          0          0      1         1       0
## 143      700       0         0          0          0      0         0       0
## 144      600     120         0          0          0      0         0       0
## 145      640     130         1          1          0      1         0       0
## 146      620     300         1          0          1      1         0       1
## 147      480     191         0          0          0      1         1       0
## 148      580      10         0          0          0      0         0       0
## 149        0       0         0          0          0      0         0       0
## 150       80       0         0          0          0      0         0       0
## 151      670     420         0          1          0      0         1       0
## 152      360      40         1          0          1      1         0       0
## 153      660     260         1          0          1      1         1       1
## 154      670     110         1          0          0      1         0       1
## 155      550      40         1          1          0      1         1       1
## 156      570     250         1          0          0      0         0       0
## 157      700     560         1          0          0      0         1       0
## 158      540     120         1          1          1      1         0       1
## 159      670     120         1          0          1      0         0       0
## 160      620     100         1          1          1      0         1       1
## 161      640     330         1          0          0      1         1       0
## 162      510     200         1          1          1      1         1       1
## 163      500      60         1          0          1      1         0       0
## 164      590     100         1          1          1      0         0       0
## 165      650      50         0          1          0      1         1       0
## 166      720      40         1          1          1      0         0       0
## 167      350     630         0          0          0      0         0       0
## 168      740     370         0          1          0      0         0       0
## 169      660     190         1          1          1      1         0       1
## 170      580       0         1          0          1      0         1       1
## 171      650     310         0          0          0      1         1       1
## 172      590     650         0          1          1      0         1       0
## 173      511     266         1          0          0      0         1       0
## 174      520     194         1          0          0      1         0       0
## 175      630     650         1          1          0      0         0       0
## 176      170       0         0          1          1      0         0       0
## 177      690      80         0          1          1      0         0       0
## 178      630      40         0          0          0      0         0       0
## 179      680     330         0          1          1      0         0       1
## 180      670     420         1          0          1      1         1       1
## 181      650     540         0          0          0      0         0       0
## 182      640      70         1          0          0      1         1       1
## 183      650      20         1          0          0      1         1       1
## 184      670     880         1          0          0      0         0       0
## 185      590     370         0          0          0      0         0       0
## 186      540       0         1          0          0      0         1       0
##     RetailKids TeenWr Carlovers CountryColl
## 1            1      0         0           0
## 2            1      0         0           1
## 3            0      0         0           1
## 4            0      0         0           0
## 5            0      0         1           0
## 6            0      0         0           1
## 7            1      1         1           0
## 8            0      0         0           1
## 9            0      1         0           0
## 10           0      1         0           1
## 11           1      0         0           0
## 12           1      1         0           1
## 13           0      0         0           1
## 14           0      0         0           1
## 15           0      1         1           1
## 16           1      0         0           1
## 17           0      0         0           0
## 18           1      0         0           1
## 19           0      1         0           0
## 20           1      0         0           0
## 21           0      1         0           1
## 22           0      1         0           0
## 23           1      1         1           1
## 24           0      1         0           0
## 25           0      0         1           1
## 26           1      1         1           1
## 27           0      1         1           1
## 28           0      0         0           0
## 29           1      1         1           1
## 30           0      1         0           0
## 31           0      1         0           1
## 32           0      1         0           0
## 33           0      1         1           0
## 34           0      0         0           0
## 35           1      0         1           1
## 36           0      0         0           0
## 37           0      1         0           0
## 38           0      0         1           0
## 39           0      1         0           1
## 40           0      0         0           0
## 41           0      1         0           0
## 42           0      0         1           0
## 43           1      1         0           0
## 44           1      1         1           0
## 45           0      1         0           1
## 46           0      1         0           0
## 47           0      1         0           0
## 48           1      0         0           0
## 49           1      1         1           1
## 50           0      1         1           1
## 51           0      1         1           1
## 52           1      0         1           1
## 53           1      0         1           0
## 54           1      1         0           1
## 55           0      0         1           1
## 56           0      0         0           0
## 57           1      1         1           0
## 58           0      1         0           0
## 59           1      1         0           1
## 60           0      0         1           1
## 61           0      1         0           0
## 62           0      1         0           0
## 63           0      0         0           1
## 64           0      1         0           1
## 65           0      0         0           1
## 66           0      0         0           0
## 67           0      1         0           0
## 68           0      0         0           0
## 69           1      1         0           0
## 70           0      0         0           1
## 71           0      1         0           0
## 72           0      1         1           0
## 73           1      0         0           0
## 74           0      0         0           1
## 75           1      1         0           1
## 76           0      0         0           0
## 77           0      0         0           1
## 78           0      0         0           0
## 79           0      0         0           0
## 80           0      0         1           0
## 81           1      1         0           1
## 82           0      1         0           1
## 83           1      1         1           0
## 84           0      0         0           1
## 85           1      0         0           0
## 86           0      1         0           0
## 87           1      1         0           1
## 88           0      0         0           1
## 89           0      1         0           0
## 90           0      1         0           1
## 91           0      1         0           1
## 92           1      0         1           0
## 93           0      0         0           1
## 94           0      1         1           0
## 95           1      0         0           1
## 96           1      0         0           0
## 97           0      0         0           0
## 98           1      1         1           0
## 99           0      0         0           0
## 100          0      0         0           0
## 101          1      0         0           0
## 102          1      1         1           1
## 103          0      0         0           0
## 104          0      0         0           1
## 105          0      1         0           0
## 106          0      0         0           0
## 107          0      0         0           1
## 108          1      1         0           0
## 109          0      1         0           0
## 110          0      0         1           0
## 111          1      0         0           0
## 112          0      1         0           0
## 113          1      0         0           1
## 114          1      1         1           0
## 115          0      0         0           0
## 116          1      0         0           1
## 117          1      1         0           0
## 118          1      1         0           1
## 119          0      1         1           0
## 120          0      0         0           0
## 121          0      1         0           0
## 122          0      1         0           0
## 123          0      0         1           1
## 124          0      1         0           0
## 125          0      1         1           0
## 126          0      1         0           0
## 127          1      0         0           0
## 128          0      0         1           0
## 129          0      0         0           0
## 130          1      1         1           1
## 131          0      1         1           1
## 132          0      0         1           0
## 133          0      1         0           0
## 134          0      1         1           0
## 135          0      0         0           0
## 136          0      0         0           0
## 137          0      0         0           1
## 138          0      1         0           0
## 139          1      0         1           1
## 140          1      1         0           0
## 141          0      0         1           1
## 142          1      1         0           1
## 143          0      0         0           0
## 144          0      0         0           0
## 145          0      1         0           1
## 146          1      1         1           1
## 147          0      0         0           1
## 148          0      1         0           0
## 149          0      0         0           0
## 150          0      0         0           0
## 151          1      1         0           0
## 152          0      0         0           1
## 153          1      0         1           0
## 154          1      1         1           1
## 155          1      1         0           1
## 156          1      1         0           0
## 157          1      1         0           0
## 158          1      0         0           1
## 159          0      1         1           0
## 160          0      1         1           0
## 161          0      0         0           1
## 162          1      1         0           1
## 163          0      1         1           1
## 164          1      1         0           0
## 165          0      1         0           1
## 166          1      1         0           0
## 167          0      0         0           0
## 168          1      0         1           0
## 169          1      0         0           1
## 170          0      0         1           0
## 171          1      0         0           1
## 172          0      1         0           0
## 173          0      0         0           0
## 174          1      0         1           1
## 175          0      1         0           1
## 176          0      0         0           1
## 177          0      1         0           0
## 178          0      1         0           0
## 179          1      1         0           1
## 180          1      1         0           1
## 181          0      0         1           0
## 182          0      1         1           1
## 183          1      1         0           1
## 184          1      0         1           0
## 185          1      1         0           0
## 186          0      1         0           1
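
Before a residence filter like the one in the next step is applied, it can help to count how many rows it would drop. The following is a minimal sketch in base R, not the author's method. The data frame `toy` and its values are hypothetical; only the column names `Age` and `LenRes` come from the data above.

```r
# Hypothetical toy data; only the column names Age and LenRes mirror the real data.
toy <- data.frame(
  Age    = c(35, 46, 20, 60),
  LenRes = c(3, 9, 25, 60)
)

# Flag rows that fail the rule LenRes < Age (this includes LenRes == Age).
bad <- !(toy$LenRes < toy$Age)
n_dropped <- sum(bad, na.rm = TRUE)

# Base R equivalent of filter(toy, LenRes < Age).
# which() discards NA comparisons, so rows with a missing Age or LenRes are dropped.
kept <- toy[which(toy$LenRes < toy$Age), , drop = FALSE]

n_dropped
nrow(kept)
```

Comparing `n_dropped` with `nrow(kept)` shows how much of the data the rule removes. A large count may point to a coding problem in `Age` or `LenRes` rather than to genuinely impossible records.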

# Keep only rows where length of residence is less than age (drops rows with LenRes >= Age)

d1a <- filter(d1a, LenRes < Age)
d1a
##     SpendRat Age LenRes Income TotAsset SecAssets ShortLiq LongLiq WlthIdx
## 1      16.83  35      3      5      195        36      220     420     430
## 2      11.38  46      9      5      123        24      200     420     290
## 3      31.33  41      2      2      117        25      222     419     279
## 4       1.90  46      7      9      493       105      310     500     520
## 5      84.13  46     15      5      138        27      340     450     440
## 6       2.15  46     16      4      162        25      230     430     360
## 7      38.00  56     31      6      117        27      300     440     400
## 8     136.28  48      8      5      119        23      250     430     360
## 9      61.46  54      8      5       50        10      200     420     230
## 10      2.73  43     15      4      135        21      230     430     340
## 11      5.86  66      8      4       81        19      250     430     320
## 12    113.00  61      8      5      999       245      999     720     880
## 13      8.40  77      8      3      198        45      210     430     340
## 14     19.00  50      7      4      412        95      290     470     490
## 15     70.85  49     19      5      192        37      200     440     330
## 16     20.17  76     12      5       64        15      190     410     280
## 17      0.59  49      2      7      229        44      270     450     430
## 18     19.48  63      7      5       95        22      270     440     360
## 19     86.20  67     32      5       87        20      250     430     330
## 20     25.46  70     36      5      511        99      270     540     500
## 21     76.82  77     15      4      242        56      300     470     450
## 22      0.77  71      9      5       43        10      200     420     210
## 23    335.94  58      6      6       88        20      220     430     290
## 24     18.78  48      3      5      534        94      240     520     460
## 25    205.07  77      9      6      131        30      205     433     295
## 26    216.57  66      9      7       85        20      220     420     300
## 27     77.21  21      3      4      206        12      280     460     420
## 28      0.26  89     11      5       57        13      230     420     280
## 29     92.53  81     37      6      999       209      280     670     530
## 30     46.55  56     19      3      143        23      220     410     380
## 31      6.86  68     20      5      338        78      280     480     490
## 32     18.14  40      7      5      306        47      280     430     490
## 33    120.50  78      3      4      285        55      290     440     470
## 34     37.08  69      7      3      100        23      200     420     300
## 35     19.20  46     27      4       52        12      210     420     250
## 36     23.08  74     25      3       84        19      260     430     330
## 37    165.13  54      9      4      164        31      225     433     333
## 38      6.50  70      9      4      109        18      194     414     265
## 39      8.29  73      3      4       73        17      250     430     310
## 40      3.95  53      8      7      127        25      240     430     350
## 41     14.88  42      7      6      155        31      229     427     328
## 42    125.32  64     25      6      196        32      240     420     430
## 43     39.43  50      5      7      353        68      240     440     440
## 44     39.81  37     10      5      216        33      280     440     440
## 45      3.61  46     16      5      244        47      220     440     390
## 46     37.78  57     12      4      277        64      230     440     410
## 47      9.23  54      3      7      210        36      210     410     390
## 48     44.00  47     17      5      315        61      280     450     480
## 49     86.96  43     15      5       45         7      230     420     270
## 50      2.37  55      7      6      127        29      216     424     292
## 51      4.81  56     15      6      228        53      260     460     410
## 52      0.16  57     10      6      139        32      310     450     420
## 53     99.10  36      2      5       51         8      230     410     300
## 54     11.85  38      8      7      260        46      260     440     450
## 55     12.80  38      7      5      195        30      230     430     370
## 56     21.00  55     19      6      145        28      190     410     360
## 57     17.53  59      8      5      243        56      330     480     480
## 58     72.91  55     22      6      140        27      280     450     390
## 59    113.71  49     19      4      129        25      200     420     290
## 60     10.38  52      7      6       98        19      202     417     252
## 61      9.36  41      9      4      170        26      190     410     380
## 62    304.61  49      8      4      267        52      190     410     520
## 63     41.63  52     18      4      188        35      230     430     380
## 64      1.37  41      8      3      196        33      220     420     360
## 65     24.12  34     13      3      183        42      220     420     400
## 66     33.15  32      3      5       32         2      220     410     260
## 67     22.44  78     40      4       38         9      210     420     230
## 68     11.79  44      8      3      237        18      200     420     350
## 69      1.36  81      8      2       92        21      240     430     340
## 70      0.71  86     42      2       90        21      280     440     360
## 71     39.69  76      8      5      107        34      206     431     274
## 72      4.33  79     18      4       77        18      220     430     300
## 73      8.39  84     22      2       91        21      260     440     340
## 74     10.80  49     14      2      117        23      230     430     320
## 75      5.80  55     16      4      169        35      238     432     347
## 76      1.33  70     17      3       94        23      190     420     254
## 77      4.13  43     16      2      139        24      230     430     340
## 78     17.94  85     42      1      123        32      207     432     289
## 79      3.23  56     25      2      126        29      260     430     380
## 80     45.54  42      8      3      168        26      240     420     400
## 81      6.77  45      9      3       40         6      160     400     260
## 82      4.53  55      7      2      143        28      216     428     311
## 83     13.50  62     14      2       77        18      260     430     330
## 84    129.72  40      9      2       58         9      270     420     320
## 85     19.27  44     17      4      180        27      240     440     370
## 86    153.67  71     38      4       69        16      200     420     260
## 87     61.90  61     31      4      209        48      290     440     490
## 88    275.91  79     32      2       63        15      230     420     290
## 89      5.60  51     13      4      222        36      260     440     450
## 90    401.42  52     24      5      260        50      270     470     440
## 91     15.00  35      8      4      141         8      250     420     370
## 92      3.67  62      6      4      184        42      310     470     440
## 93      7.60  30      7      4       20         1      220     400     200
## 94      5.56  54     35      4      273        51      260     460     440
## 95     16.69  60     14      5      999       999      190     999     540
## 96      5.92  60     16      5       74        17      180     410     240
## 97     39.73  41     34      5       62         9      270     420     320
## 98      2.22  45      7      2      119        18      240     420     360
## 99      1.33  73      9      5      307        47      220     460     390
## 100     4.65  35     11      8      166         9      220     410     380
## 101     6.13  41     11      5      218        33      210     410     430
## 102    13.14  61     30      5      305        70      220     470     390
## 103    44.13  87      2      1       48        21      178     419     198
## 104    26.43  50      9      5      475        92      230     460     460
## 105     9.88  66      8      5      192        38      240     440     390
## 106    13.28  63     27      5      202        39      230     430     410
## 107    14.93  53     24      2      140        27      200     430     300
## 108    30.73  42      9      4      117        21      206     421     285
## 109     5.50  40      9      4      122        19      250     420     350
## 110    15.95  54     29      5      206        40      250     440     410
## 111     6.93  34     10      4       35         2      240     410     270
## 112    91.50  50      8      5       36         7      180     400     220
## 113    50.55  57      3      6      242        39      270     450     430
## 114    11.73  57     18      5      141        32      224     428     310
## 115     1.24  57      6      5      116        22      200     420     280
## 116     3.69  33     12      4       54         3      190     410     230
## 117    50.18  50     12      5      246        48      210     420     430
## 118    43.85  43      8      5      201        42      230     430     380
## 119     3.55  64      7      5       73        17      190     400     310
## 120    14.58  37      8      5      208        32      220     410     450
## 121     5.00  39     19      4      169        26      210     430     310
## 122    36.53  52     35      3       48         9      220     420     250
## 123    10.45  47     14      5      107        21      200     430     260
## 124    11.50  55     30      4       69        13      210     410     320
## 125    68.36  50      8      7       56        11      210     410     290
## 126     8.82  31     10      4      133        23      222     420     298
## 127     4.06  42      5      3      118        18      250     430     360
## 128     1.67  49     19      2      150        31      229     427     318
## 129    85.00  50     23      4      136        26      220     420     370
## 130    44.79  52     21      5      150        26      250     430     380
## 131    13.20  69     12      3      292        67      230     460     400
## 132    30.30  46     10      5      320        62      290     460     480
## 133     8.22  52     13      5      236        46      190     410     430
## 134     6.39  53     10      4      232        45      210     420     420
## 135     3.50  40      8      6      179        27      230     410     420
## 136    11.24  53     16      5      215        42      180     410     410
## 137     3.18  53     30      5       89        17      270     440     350
## 138    22.79  72     37      5       61         9      230     420     310
## 139    18.83  61     11      5      108        25      250     440     350
## 140    32.11  77      3      4       43        10      210     410     240
## 141    17.74  36     12      4      194        37      260     440     410
## 142    17.45  52     11      4      218        42      300     460     440
## 143     2.00  56      3      5      155        36      240     440     370
## 144    23.13  78      4      6      260        52      300     460     470
## 145    61.76  42      7      4      147        34      220     430     360
## 146     4.25  75     30      4      124        30      209     428     293
## 147    60.29  55     26      4      114         6      280     420     370
## 148    32.50  57     46      3       79        18      240     430     310
## 149     0.08  78      9      2       65        15      250     430     300
## 150     4.56  39      9      5      215        34      230     430     430
## 151    14.50  68     35      2       94        22      270     440     360
## 152    72.37  52     14      7      332        65      300     450     520
## 153    11.74  44     17      5      157        24      240     430     380
## 154   125.44  68     37      6      138        32      270     450     390
## 155    52.45  74     39      6      188        43      200     440     320
## 156    39.50  41      8      5      200        11      180     400     300
## 157    94.15  48      0      6      131        28      220     430     340
## 158    34.37  57     14      6      265        61      270     460     450
## 159   111.83  53      8      6      243        47      310     460     470
## 160    48.80  61      7      4      134        19      200     410     320
## 161   329.71  73     34      5       69        16      220     420     300
## 162    32.35  60     10      6      177        41      280     460     410
## 163    54.88  46     11      6      149        29      250     450     380
## 164    70.23  52     11      5      229        44      300     450     460
## 165    25.24  42     16      7      307        17      260     430     440
## 166     1.83  78     26      7        5         0      180     400      90
## 167     9.79  40     14      5      349        53      230     430     450
## 168   151.93  42      9      6      231        35      290     450     460
## 169   119.87  64     15      6      133        31      230     440     330
## 170     3.81  37     13      4      150        23      250     420     410
## 171    69.04  58     14      4      161        37      190     420     370
## 172    52.67  49      2      2       87        20      190     414     239
## 173    23.45  49     22      3      180        35      190     410     360
## 174    45.50  76     17      3       44        10      200     420     220
## 175    60.82  57     18      3      430        99      380     520     570
## 176     2.23  57     14      3      112        17      240     420     340
## 177    36.00  41     10      4      322        54      230     440     430
## 178   307.15  40     15      4      309        71      280     450     520
## 179    12.92  46      6      5      999       346      230     610     590
## 180   111.19  49     24      2      173        40      270     450     400
## 181     2.42  46      2      4      237        46      280     440     440
## 182    29.77  53     22      3      224        43      180     410     420
## 183    19.35  57     18      4      174        40      210     430     360
## 184     4.42  59      9      6      169        39      220     460     330
##     SpendVol SpenVel CollGifts BricMortar MarthaHome SunAds ThemeColl CustDec
## 1        690     570         0          1          1      0         0       1
## 2        600     280         1          0          0      1         1       1
## 3        543     308         1          0          0      1         1       0
## 4        680     100         0          1          1      0         0       1
## 5        440      50         0          1          1      0         0       1
## 6        690     180         1          0          0      1         0       0
## 7        500      10         1          1          1      0         1       1
## 8        610       0         1          0          1      0         1       1
## 9        660       0         0          1          0      0         0       0
## 10       610      50         1          1          0      1         1       0
## 11       220       0         0          0          1      1         0       1
## 12       570     220         1          0          0      1         1       0
## 13       630     300         0          0          0      0         0       0
## 14       770      30         1          0          1      1         1       1
## 15       620     170         1          0          0      1         1       0
## 16       570     610         1          1          1      1         0       1
## 17       710     130         0          0          0      0         0       0
## 18       340      30         1          0          1      1         1       1
## 19       380      20         0          1          1      0         0       1
## 20       680     160         0          0          1      1         0       1
## 21       660      20         1          0          1      1         0       1
## 22       300      10         0          0          0      0         1       0
## 23       490       0         1          1          1      1         1       0
## 24       730     140         1          0          1      0         1       1
## 25       458     188         1          1          1      1         0       1
## 26       590     200         1          1          1      1         1       0
## 27       690      20         1          1          0      0         0       0
## 28         0       0         0          0          0      0         0       0
## 29       700      80         1          0          1      0         1       0
## 30       620     380         0          1          1      0         0       1
## 31       660     230         1          0          0      1         1       0
## 32       750     240         0          1          1      0         0       0
## 33       690      80         0          1          1      1         1       0
## 34       560     450         1          0          1      0         0       0
## 35       260      10         1          0          0      1         1       0
## 36       260       0         0          1          0      0         0       0
## 37       534     238         1          1          1      0         0       1
## 38       599     312         0          0          1      0         0       1
## 39       250       0         0          0          0      1         1       0
## 40       710     100         0          0          0      0         0       0
## 41       564     251         0          0          1      0         1       0
## 42       640     380         0          0          0      1         1       0
## 43       730     190         1          0          1      0         0       0
## 44       700      90         1          0          0      1         0       1
## 45       660     300         0          0          0      0         0       0
## 46       660     220         0          1          1      0         0       1
## 47       650     400         1          0          1      1         0       1
## 48       670     210         1          0          1      1         1       1
## 49       480      70         1          0          1      1         1       0
## 50       526     238         0          1          0      1         0       0
## 51       630      10         1          0          0      1         0       1
## 52       370      30         0          0          0      1         0       1
## 53       610     330         1          0          1      1         0       1
## 54       670     240         1          0          0      0         1       1
## 55       740     110         1          0          1      0         1       1
## 56       670     750         1          0          0      0         0       0
## 57       600      40         0          0          1      0         1       0
## 58       590       0         1          1          1      0         1       1
## 59       590      90         1          0          1      1         1       0
## 60       506     308         1          0          0      0         0       0
## 61       720     750         0          0          0      1         1       0
## 62       650     900         1          0          0      1         1       0
## 63       650     160         1          1          0      0         0       0
## 64       660     210         1          0          0      1         1       0
## 65       650     570         0          0          0      0         0       0
## 66       590     310         0          0          0      0         0       0
## 67         0       0         0          1          0      0         0       0
## 68       680     250         0          0          0      0         1       0
## 69       480     150         0          0          0      1         1       0
## 70       180      10         0          0          0      0         0       0
## 71       376     132         0          1          0      0         0       0
## 72       490      70         0          0          0      1         0       1
## 73       240       0         1          0          0      1         1       0
## 74       600       0         1          0          0      1         1       1
## 75       552     230         0          0          1      0         0       0
## 76       489     218         1          0          0      1         0       0
## 77       640      90         0          0          0      0         0       0
## 78       437     166         0          0          0      0         0       0
## 79       560     140         0          0          0      0         0       0
## 80       630     360         0          0          0      0         1       0
## 81       560     999         0          0          0      1         1       0
## 82       516     241         0          1          1      0         0       0
## 83        80       0         1          0          1      1         1       0
## 84       350      30         1          1          1      0         0       1
## 85       640      60         0          0          1      0         0       0
## 86       460     130         1          1          1      0         0       1
## 87       610     430         1          0          1      1         0       0
## 88       360      30         0          0          0      0         0       0
## 89       690     280         1          1          0      0         1       0
## 90       620      80         1          1          1      0         1       1
## 91       710     100         0          0          0      1         0       1
## 92       570       0         1          0          0      0         1       0
## 93       560      10         0          1          0      0         0       0
## 94       610     150         1          0          0      1         0       1
## 95       620     490         0          0          0      1         1       1
## 96       590     580         0          0          0      0         0       0
## 97       580      10         0          1          0      1         0       1
## 98       670     130         0          0          0      0         0       0
## 99       680     130         0          0          0      0         0       0
## 100      650     450         1          0          0      1         1       0
## 101      700     660         1          0          1      1         1       1
## 102      700     150         0          1          0      0         0       1
## 103      362     205         1          0          0      1         0       0
## 104      740     220         0          1          0      0         0       0
## 105      610     310         0          0          0      0         1       1
## 106      620     330         1          0          1      1         1       1
## 107      610     130         0          0          0      1         0       1
## 108      552     304         0          1          0      0         0       0
## 109      660       0         0          0          0      0         0       0
## 110      600     210         0          0          1      1         1       1
## 111      480     210         0          0          0      0         0       0
## 112      570     680         1          0          0      1         1       1
## 113      660      40         1          0          0      1         0       1
## 114      513     219         0          0          0      1         0       0
## 115      580     140         1          0          1      1         0       1
## 116      580     230         0          0          0      0         0       0
## 117      680     600         1          1          0      1         1       0
## 118      690     220         1          0          0      0         0       0
## 119      750     690         0          0          0      0         0       0
## 120      640     640         0          1          1      0         0       0
## 121      780       0         0          0          0      0         0       0
## 122      130       0         0          0          0      1         1       0
## 123      630       0         0          0          0      0         0       0
## 124      520     510         0          0          0      0         0       0
## 125      540     480         0          1          1      0         0       1
## 126      584     334         0          0          0      0         0       0
## 127      610      50         0          1          0      0         0       0
## 128      553     254         0          0          0      0         0       0
## 129      620     390         1          0          0      1         1       0
## 130      600     190         1          0          0      1         1       1
## 131      620     100         0          0          0      0         0       0
## 132      680     130         1          0          0      0         1       1
## 133      710     750         0          0          1      0         0       0
## 134      720     580         1          0          0      0         0       0
## 135      680     450         0          0          0      0         0       0
## 136      720     840         0          0          0      0         0       0
## 137      230       0         0          0          1      0         0       0
## 138      540     240         1          0          0      1         1       0
## 139      430      70         0          0          0      1         1       1
## 140      460     110         1          0          0      1         1       0
## 141      640     140         1          0          0      1         1       0
## 142      700       0         0          0          0      0         0       0
## 143      600     120         0          0          0      0         0       0
## 144      640     130         1          1          0      1         0       0
## 145      620     300         1          0          1      1         0       1
## 146      480     191         0          0          0      1         1       0
## 147      580      10         0          0          0      0         0       0
## 148        0       0         0          0          0      0         0       0
## 149       80       0         0          0          0      0         0       0
## 150      670     420         0          1          0      0         1       0
## 151      360      40         1          0          1      1         0       0
## 152      660     260         1          0          1      1         1       1
## 153      670     110         1          0          0      1         0       1
## 154      550      40         1          1          0      1         1       1
## 155      570     250         1          0          0      0         0       0
## 156      700     560         1          0          0      0         1       0
## 157      540     120         1          1          1      1         0       1
## 158      670     120         1          0          1      0         0       0
## 159      620     100         1          1          1      0         1       1
## 160      640     330         1          0          0      1         1       0
## 161      510     200         1          1          1      1         1       1
## 162      500      60         1          0          1      1         0       0
## 163      590     100         1          1          1      0         0       0
## 164      650      50         0          1          0      1         1       0
## 165      720      40         1          1          1      0         0       0
## 166      350     630         0          0          0      0         0       0
## 167      740     370         0          1          0      0         0       0
## 168      660     190         1          1          1      1         0       1
## 169      580       0         1          0          1      0         1       1
## 170      650     310         0          0          0      1         1       1
## 171      590     650         0          1          1      0         1       0
## 172      511     266         1          0          0      0         1       0
## 173      630     650         1          1          0      0         0       0
## 174      170       0         0          1          1      0         0       0
## 175      690      80         0          1          1      0         0       0
## 176      630      40         0          0          0      0         0       0
## 177      680     330         0          1          1      0         0       1
## 178      670     420         1          0          1      1         1       1
## 179      650     540         0          0          0      0         0       0
## 180      640      70         1          0          0      1         1       1
## 181      650      20         1          0          0      1         1       1
## 182      670     880         1          0          0      0         0       0
## 183      590     370         0          0          0      0         0       0
## 184      540       0         1          0          0      0         1       0
##     RetailKids TeenWr Carlovers CountryColl
## 1            1      0         0           0
## 2            1      0         0           1
## 3            0      0         0           1
## 4            0      0         0           0
## 5            0      0         1           0
## 6            0      0         0           1
## 7            1      1         1           0
## 8            0      0         0           1
## 9            0      1         0           0
## 10           0      1         0           1
## 11           1      0         0           0
## 12           1      1         0           1
## 13           0      0         0           1
## 14           0      0         0           1
## 15           0      1         1           1
## 16           1      0         0           1
## 17           0      0         0           0
## 18           1      0         0           1
## 19           0      1         0           0
## 20           1      0         0           0
## 21           0      1         0           1
## 22           0      1         0           0
## 23           1      1         1           1
## 24           0      1         0           0
## 25           0      0         1           1
## 26           1      1         1           1
## 27           0      1         1           1
## 28           0      0         0           0
## 29           1      1         1           1
## 30           0      1         0           0
## 31           0      1         0           1
## 32           0      1         0           0
## 33           0      1         1           0
## 34           0      0         0           0
## 35           1      0         1           1
## 36           0      0         0           0
## 37           0      1         0           0
## 38           0      0         1           0
## 39           0      1         0           1
## 40           0      0         0           0
## 41           0      1         0           0
## 42           0      0         1           0
## 43           1      1         0           0
## 44           1      1         1           0
## 45           0      1         0           0
## 46           0      1         0           0
## 47           1      0         0           0
## 48           1      1         1           1
## 49           0      1         1           1
## 50           0      1         1           1
## 51           1      0         1           1
## 52           1      0         1           0
## 53           1      1         0           1
## 54           0      0         1           1
## 55           0      0         0           0
## 56           1      1         1           0
## 57           0      1         0           0
## 58           1      1         0           1
## 59           0      0         1           1
## 60           0      1         0           0
## 61           0      1         0           0
## 62           0      0         0           1
## 63           0      1         0           1
## 64           0      0         0           1
## 65           0      0         0           0
## 66           0      1         0           0
## 67           0      0         0           0
## 68           1      1         0           0
## 69           0      0         0           1
## 70           0      1         0           0
## 71           0      1         1           0
## 72           1      0         0           0
## 73           0      0         0           1
## 74           1      1         0           1
## 75           0      0         0           0
## 76           0      0         0           1
## 77           0      0         0           0
## 78           0      0         0           0
## 79           0      0         1           0
## 80           1      1         0           1
## 81           0      1         0           1
## 82           1      1         1           0
## 83           0      0         0           1
## 84           1      0         0           0
## 85           0      1         0           0
## 86           1      1         0           1
## 87           0      0         0           1
## 88           0      1         0           0
## 89           0      1         0           1
## 90           0      1         0           1
## 91           1      0         1           0
## 92           0      0         0           1
## 93           0      1         1           0
## 94           1      0         0           1
## 95           1      0         0           0
## 96           0      0         0           0
## 97           1      1         1           0
## 98           0      0         0           0
## 99           0      0         0           0
## 100          1      0         0           0
## 101          1      1         1           1
## 102          0      0         0           0
## 103          0      0         0           1
## 104          0      1         0           0
## 105          0      0         0           0
## 106          0      0         0           1
## 107          1      1         0           0
## 108          0      1         0           0
## 109          0      0         1           0
## 110          1      0         0           0
## 111          0      1         0           0
## 112          1      0         0           1
## 113          1      1         1           0
## 114          0      0         0           0
## 115          1      0         0           1
## 116          1      1         0           0
## 117          1      1         0           1
## 118          0      1         1           0
## 119          0      0         0           0
## 120          0      1         0           0
## 121          0      1         0           0
## 122          0      0         1           1
## 123          0      1         0           0
## 124          0      1         1           0
## 125          0      1         0           0
## 126          1      0         0           0
## 127          0      0         1           0
## 128          0      0         0           0
## 129          1      1         1           1
## 130          0      1         1           1
## 131          0      0         1           0
## 132          0      1         0           0
## 133          0      1         1           0
## 134          0      0         0           0
## 135          0      0         0           0
## 136          0      0         0           1
## 137          0      1         0           0
## 138          1      0         1           1
## 139          1      1         0           0
## 140          0      0         1           1
## 141          1      1         0           1
## 142          0      0         0           0
## 143          0      0         0           0
## 144          0      1         0           1
## 145          1      1         1           1
## 146          0      0         0           1
## 147          0      1         0           0
## 148          0      0         0           0
## 149          0      0         0           0
## 150          1      1         0           0
## 151          0      0         0           1
## 152          1      0         1           0
## 153          1      1         1           1
## 154          1      1         0           1
## 155          1      1         0           0
## 156          1      1         0           0
## 157          1      0         0           1
## 158          0      1         1           0
## 159          0      1         1           0
## 160          0      0         0           1
## 161          1      1         0           1
## 162          0      1         1           1
## 163          1      1         0           0
## 164          0      1         0           1
## 165          1      1         0           0
## 166          0      0         0           0
## 167          1      0         1           0
## 168          1      0         0           1
## 169          0      0         1           0
## 170          1      0         0           1
## 171          0      1         0           0
## 172          0      0         0           0
## 173          0      1         0           1
## 174          0      0         0           1
## 175          0      1         0           0
## 176          0      1         0           0
## 177          1      1         0           1
## 178          1      1         0           1
## 179          0      0         1           0
## 180          0      1         1           1
## 181          1      1         0           1
## 182          1      0         1           0
## 183          1      1         0           0
## 184          0      1         0           1

Data cleaning

We have to filter out rows where the customer's age is less than 18 because of the recent enactment of the SCOPE Act in the state of Texas. (Other states, e.g. Utah and Arkansas, have enacted similar laws.) Effective September 1, 2024, the SCOPE Act requires digital service providers, such as companies that own websites, apps, and software, to protect minor children (under 18) from harmful content and data collection practices. The law primarily applies to digital services that provide an online platform for social interaction between users and that (1) allow users to create a public or semi-public profile to use the service, and (2) allow users to create or post content that can be viewed by other users of the service. This includes digital services such as message boards, chat rooms, video channels, or a main feed that presents users with content created and posted by other users.

On a personal note, as a mom of a 13-year-old and a 15-year-old, I agree. I don't like some of the YouTube ads and shorts that are spliced in between the other videos and shorts they are watching. I think there's a difference between what is appropriate for a 13-year-old and what is appropriate for a 17-year-old, but I feel like YouTube puts them in the same category. Just because they watch the same Fortnite videos doesn't mean they should see the same ads.

The length of residence must also be less than or equal to the customer's age: a customer can't have lived at a residence longer than they've been alive.
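The two cleaning rules above can be sketched in base R as follows. This is only a minimal illustration, not the code used for this report: the data frame name `raw` and its toy rows are assumptions made up for the example, and only the column names `Age` and `LenRes` and the cleaned name `d1a` come from the output shown in this document.

```r
# Toy stand-in for the raw catalog data (hypothetical name and made-up values).
raw <- data.frame(
  SpendRat = c(16.8, 11.4, 31.3, 1.9),
  Age      = c(35, 16, 41, 46),
  LenRes   = c(3, 9, 50, 7)
)

# Rule 1: keep adults only (Age >= 18).
# Rule 2: length of residence cannot exceed age (LenRes <= Age).
keep <- raw$Age >= 18 & raw$LenRes <= raw$Age

# which() drops rows where the comparison is NA instead of returning NA rows.
d1a <- raw[which(keep), ]

# Report how many rows fail each rule and how many survive.
cat("Rows with Age < 18:    ", sum(raw$Age < 18, na.rm = TRUE), "\n")
cat("Rows with LenRes > Age:", sum(raw$LenRes > raw$Age, na.rm = TRUE), "\n")
cat("Rows kept:             ", nrow(d1a), "\n")
```

Counting the rows that fail each rule before dropping them makes it easier to confirm that the filter removed only what was intended.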

Basic Summary

Provide a basic summary of the cleaned data set. Include a table of univariate statistics to summarize each variable. Choose meaningful summary statistics for each type of variable. You should also include a basic summary of the catalog spending (SpendRat) including an appropriate graphical display.

Structure

str(d1a)
## 'data.frame':    184 obs. of  21 variables:
##  $ SpendRat   : num  16.8 11.4 31.3 1.9 84.1 ...
##  $ Age        : int  35 46 41 46 46 46 56 48 54 43 ...
##  $ LenRes     : int  3 9 2 7 15 16 31 8 8 15 ...
##  $ Income     : int  5 5 2 9 5 4 6 5 5 4 ...
##  $ TotAsset   : int  195 123 117 493 138 162 117 119 50 135 ...
##  $ SecAssets  : int  36 24 25 105 27 25 27 23 10 21 ...
##  $ ShortLiq   : int  220 200 222 310 340 230 300 250 200 230 ...
##  $ LongLiq    : int  420 420 419 500 450 430 440 430 420 430 ...
##  $ WlthIdx    : int  430 290 279 520 440 360 400 360 230 340 ...
##  $ SpendVol   : int  690 600 543 680 440 690 500 610 660 610 ...
##  $ SpenVel    : int  570 280 308 100 50 180 10 0 0 50 ...
##  $ CollGifts  : int  0 1 1 0 0 1 1 1 0 1 ...
##  $ BricMortar : int  1 0 0 1 1 0 1 0 1 1 ...
##  $ MarthaHome : int  1 0 0 1 1 0 1 1 0 0 ...
##  $ SunAds     : int  0 1 1 0 0 1 0 0 0 1 ...
##  $ ThemeColl  : int  0 1 1 0 0 0 1 1 0 1 ...
##  $ CustDec    : int  1 1 0 1 1 0 1 1 0 0 ...
##  $ RetailKids : int  1 1 0 0 0 0 1 0 0 0 ...
##  $ TeenWr     : int  0 0 0 0 0 0 1 0 1 1 ...
##  $ Carlovers  : int  0 0 0 0 1 0 1 0 0 0 ...
##  $ CountryColl: int  0 1 1 0 0 1 0 1 0 1 ...

Summary of Cleaned Dataset

dim(d1a)
## [1] 184  21
pacman::p_load("skimr")
skim(d1a)
Data summary
Name d1a
Number of rows 184
Number of columns 21
_______________________
Column type frequency:
numeric 21
________________________
Group variables None

Variable type: numeric

skim_variable n_missing complete_rate mean sd p0 p25 p50 p75 p100 hist
SpendRat 0 1 43.79 66.10 0.08 6.08 18.8 50.27 401.42 ▇▁▁▁▁
Age 0 1 54.71 13.64 21.00 44.75 53.0 63.00 89.00 ▁▆▇▃▂
LenRes 0 1 14.58 9.94 0.00 8.00 11.0 19.00 46.00 ▇▆▂▂▁
Income 0 1 4.47 1.40 1.00 4.00 5.0 5.00 9.00 ▂▇▇▅▁
TotAsset 0 1 184.67 155.01 5.00 94.75 150.0 222.50 999.00 ▇▃▁▁▁
SecAssets 0 1 40.90 79.83 0.00 19.00 28.0 42.00 999.00 ▇▁▁▁▁
ShortLiq 0 1 240.64 66.92 160.00 210.00 230.0 260.00 999.00 ▇▁▁▁▁
LongLiq 0 1 439.49 55.63 400.00 420.00 430.0 440.00 999.00 ▇▁▁▁▁
WlthIdx 0 1 367.12 90.04 90.00 300.00 360.0 430.00 880.00 ▁▇▅▁▁
SpendVol 0 1 568.40 154.00 0.00 532.00 610.0 670.00 780.00 ▁▁▂▇▇
SpenVel 0 1 219.52 217.31 0.00 40.00 160.0 310.00 999.00 ▇▅▁▁▁
CollGifts 0 1 0.49 0.50 0.00 0.00 0.0 1.00 1.00 ▇▁▁▁▇
BricMortar 0 1 0.29 0.45 0.00 0.00 0.0 1.00 1.00 ▇▁▁▁▃
MarthaHome 0 1 0.36 0.48 0.00 0.00 0.0 1.00 1.00 ▇▁▁▁▅
SunAds 0 1 0.43 0.50 0.00 0.00 0.0 1.00 1.00 ▇▁▁▁▆
ThemeColl 0 1 0.40 0.49 0.00 0.00 0.0 1.00 1.00 ▇▁▁▁▅
CustDec 0 1 0.35 0.48 0.00 0.00 0.0 1.00 1.00 ▇▁▁▁▅
RetailKids 0 1 0.35 0.48 0.00 0.00 0.0 1.00 1.00 ▇▁▁▁▅
TeenWr 0 1 0.52 0.50 0.00 0.00 1.0 1.00 1.00 ▇▁▁▁▇
Carlovers 0 1 0.28 0.45 0.00 0.00 0.0 1.00 1.00 ▇▁▁▁▃
CountryColl 0 1 0.42 0.49 0.00 0.00 0.0 1.00 1.00 ▇▁▁▁▆
pacman::p_load(summarytools)
d1b <- summarytools::descr(d1a)
view(d1b)
## Switching method to 'browser'
## Output file written: C:\Users\aliso\AppData\Local\Temp\RtmpQrnDjK\file7de45d766736.html
pacman::p_load(psych)

psych::describe(d1a)
##             vars   n   mean     sd median trimmed    mad    min    max  range
## SpendRat       1 184  43.79  66.10   18.8   29.17  22.57   0.08 401.42 401.34
## Age            2 184  54.71  13.64   53.0   54.05  13.34  21.00  89.00  68.00
## LenRes         3 184  14.58   9.94   11.0   13.34   5.93   0.00  46.00  46.00
## Income         4 184   4.47   1.40    5.0    4.50   1.48   1.00   9.00   8.00
## TotAsset       5 184 184.67 155.01  150.0  160.48  91.18   5.00 999.00 994.00
## SecAssets      6 184  40.90  79.83   28.0   30.30  16.31   0.00 999.00 999.00
## ShortLiq       7 184 240.64  66.92  230.0  234.53  34.84 160.00 999.00 839.00
## LongLiq        8 184 439.49  55.63  430.0  430.93  14.83 400.00 999.00 599.00
## WlthIdx        9 184 367.12  90.04  360.0  364.30  88.96  90.00 880.00 790.00
## SpendVol      10 184 568.40 154.00  610.0  594.56  88.96   0.00 780.00 780.00
## SpenVel       11 184 219.52 217.31  160.0  186.17 192.74   0.00 999.00 999.00
## CollGifts     12 184   0.49   0.50    0.0    0.49   0.00   0.00   1.00   1.00
## BricMortar    13 184   0.29   0.45    0.0    0.24   0.00   0.00   1.00   1.00
## MarthaHome    14 184   0.36   0.48    0.0    0.33   0.00   0.00   1.00   1.00
## SunAds        15 184   0.43   0.50    0.0    0.41   0.00   0.00   1.00   1.00
## ThemeColl     16 184   0.40   0.49    0.0    0.37   0.00   0.00   1.00   1.00
## CustDec       17 184   0.35   0.48    0.0    0.31   0.00   0.00   1.00   1.00
## RetailKids    18 184   0.35   0.48    0.0    0.32   0.00   0.00   1.00   1.00
## TeenWr        19 184   0.52   0.50    1.0    0.52   0.00   0.00   1.00   1.00
## Carlovers     20 184   0.28   0.45    0.0    0.22   0.00   0.00   1.00   1.00
## CountryColl   21 184   0.42   0.49    0.0    0.40   0.00   0.00   1.00   1.00
##              skew kurtosis    se
## SpendRat     2.96     9.91  4.87
## Age          0.43    -0.45  1.01
## LenRes       1.11     0.48  0.73
## Income      -0.05     0.10  0.10
## TotAsset     3.29    14.19 11.43
## SecAssets    9.84   111.25  5.89
## ShortLiq     7.97    86.90  4.93
## LongLiq      6.81    58.36  4.10
## WlthIdx      0.95     4.74  6.64
## SpendVol    -1.74     3.03 11.35
## SpenVel      1.26     1.12 16.02
## CollGifts    0.04    -2.01  0.04
## BricMortar   0.93    -1.14  0.03
## MarthaHome   0.56    -1.70  0.04
## SunAds       0.28    -1.93  0.04
## ThemeColl    0.42    -1.83  0.04
## CustDec      0.63    -1.61  0.04
## RetailKids   0.61    -1.64  0.04
## TeenWr      -0.06    -2.01  0.04
## Carlovers    0.99    -1.03  0.03
## CountryColl  0.33    -1.90  0.04

Summary statistics for each variable

summary(d1a)
##     SpendRat            Age            LenRes          Income     
##  Min.   :  0.080   Min.   :21.00   Min.   : 0.00   Min.   :1.000  
##  1st Qu.:  6.077   1st Qu.:44.75   1st Qu.: 8.00   1st Qu.:4.000  
##  Median : 18.805   Median :53.00   Median :11.00   Median :5.000  
##  Mean   : 43.792   Mean   :54.71   Mean   :14.58   Mean   :4.473  
##  3rd Qu.: 50.273   3rd Qu.:63.00   3rd Qu.:19.00   3rd Qu.:5.000  
##  Max.   :401.420   Max.   :89.00   Max.   :46.00   Max.   :9.000  
##     TotAsset        SecAssets        ShortLiq        LongLiq     
##  Min.   :  5.00   Min.   :  0.0   Min.   :160.0   Min.   :400.0  
##  1st Qu.: 94.75   1st Qu.: 19.0   1st Qu.:210.0   1st Qu.:420.0  
##  Median :150.00   Median : 28.0   Median :230.0   Median :430.0  
##  Mean   :184.67   Mean   : 40.9   Mean   :240.6   Mean   :439.5  
##  3rd Qu.:222.50   3rd Qu.: 42.0   3rd Qu.:260.0   3rd Qu.:440.0  
##  Max.   :999.00   Max.   :999.0   Max.   :999.0   Max.   :999.0  
##     WlthIdx         SpendVol        SpenVel        CollGifts     
##  Min.   : 90.0   Min.   :  0.0   Min.   :  0.0   Min.   :0.0000  
##  1st Qu.:300.0   1st Qu.:532.0   1st Qu.: 40.0   1st Qu.:0.0000  
##  Median :360.0   Median :610.0   Median :160.0   Median :0.0000  
##  Mean   :367.1   Mean   :568.4   Mean   :219.5   Mean   :0.4891  
##  3rd Qu.:430.0   3rd Qu.:670.0   3rd Qu.:310.0   3rd Qu.:1.0000  
##  Max.   :880.0   Max.   :780.0   Max.   :999.0   Max.   :1.0000  
##    BricMortar      MarthaHome         SunAds         ThemeColl     
##  Min.   :0.000   Min.   :0.0000   Min.   :0.0000   Min.   :0.0000  
##  1st Qu.:0.000   1st Qu.:0.0000   1st Qu.:0.0000   1st Qu.:0.0000  
##  Median :0.000   Median :0.0000   Median :0.0000   Median :0.0000  
##  Mean   :0.288   Mean   :0.3641   Mean   :0.4293   Mean   :0.3967  
##  3rd Qu.:1.000   3rd Qu.:1.0000   3rd Qu.:1.0000   3rd Qu.:1.0000  
##  Max.   :1.000   Max.   :1.0000   Max.   :1.0000   Max.   :1.0000  
##     CustDec         RetailKids         TeenWr         Carlovers     
##  Min.   :0.0000   Min.   :0.0000   Min.   :0.0000   Min.   :0.0000  
##  1st Qu.:0.0000   1st Qu.:0.0000   1st Qu.:0.0000   1st Qu.:0.0000  
##  Median :0.0000   Median :0.0000   Median :1.0000   Median :0.0000  
##  Mean   :0.3478   Mean   :0.3533   Mean   :0.5163   Mean   :0.2772  
##  3rd Qu.:1.0000   3rd Qu.:1.0000   3rd Qu.:1.0000   3rd Qu.:1.0000  
##  Max.   :1.0000   Max.   :1.0000   Max.   :1.0000   Max.   :1.0000  
##   CountryColl    
##  Min.   :0.0000  
##  1st Qu.:0.0000  
##  Median :0.0000  
##  Mean   :0.4185  
##  3rd Qu.:1.0000  
##  Max.   :1.0000
library(ISLR)
?d1a
## No documentation for 'd1a' in specified packages and libraries:
## you could try '??d1a'

Descriptive statistics by catalog purchase category

tab_outcome <- d1a |>
  tbl_summary(
    by = CollGifts, 
    statistic = list(all_continuous() ~ "{mean} ({sd})", 
                     all_categorical() ~ "{n} / {N} ({p}%)"),
    digits = all_continuous() ~ 2) %>%
  modify_caption("CollGifts Non-customer & Customer Characteristics (N = {N})")

tab_outcome |>
  as_gt()
CollGifts Non-customer & Customer Characteristics (N = 184)
Characteristic    0, N = 94¹    1, N = 90¹
SpendRat 22.91 (36.37) 65.60 (81.61)
Age 55.19 (14.57) 54.21 (12.67)
LenRes 15.02 (10.09) 14.11 (9.82)
Income

    1 1 / 94 (1.1%) 1 / 90 (1.1%)
    2 10 / 94 (11%) 8 / 90 (8.9%)
    3 14 / 94 (15%) 5 / 90 (5.6%)
    4 26 / 94 (28%) 23 / 90 (26%)
    5 33 / 94 (35%) 26 / 90 (29%)
    6 5 / 94 (5.3%) 20 / 90 (22%)
    7 4 / 94 (4.3%) 6 / 90 (6.7%)
    8 0 / 94 (0%) 1 / 90 (1.1%)
    9 1 / 94 (1.1%) 0 / 90 (0%)
TotAsset 175.22 (158.84) 194.54 (151.16)
SecAssets 43.70 (106.85) 37.98 (34.04)
ShortLiq 233.89 (36.67) 247.68 (87.78)
LongLiq 438.63 (65.22) 440.39 (43.78)
WlthIdx 355.32 (84.95) 379.44 (93.96)
SpendVol 547.15 (181.23) 590.59 (116.07)
SpenVel 225.33 (226.27) 213.46 (208.63)
BricMortar 29 / 94 (31%) 24 / 90 (27%)
MarthaHome 25 / 94 (27%) 42 / 90 (47%)
SunAds 22 / 94 (23%) 57 / 90 (63%)
ThemeColl 21 / 94 (22%) 52 / 90 (58%)
CustDec 22 / 94 (23%) 42 / 90 (47%)
RetailKids 21 / 94 (22%) 44 / 90 (49%)
TeenWr 45 / 94 (48%) 50 / 90 (56%)
Carlovers 20 / 94 (21%) 31 / 90 (34%)
CountryColl 13 / 94 (14%) 64 / 90 (71%)
1 Mean (SD); n / N (%)
tab_outcome <- d1a |>
  tbl_summary(
    by = BricMortar, 
    statistic = list(all_continuous() ~ "{mean} ({sd})", 
                     all_categorical() ~ "{n} / {N} ({p}%)"),
    digits = all_continuous() ~ 2) %>%
  modify_caption("BricMortar Non-customer & Customer Characteristics (N = {N})")

tab_outcome |>
  as_gt()
BricMortar Non-customer & Customer Characteristics (N = 184)
Characteristic    0, N = 131¹    1, N = 53¹
SpendRat 31.57 (50.78) 74.00 (87.31)
Age 55.06 (13.63) 53.85 (13.77)
LenRes 14.62 (9.91) 14.47 (10.13)
Income

    1 2 / 131 (1.5%) 0 / 53 (0%)
    2 16 / 131 (12%) 2 / 53 (3.8%)
    3 13 / 131 (9.9%) 6 / 53 (11%)
    4 36 / 131 (27%) 13 / 53 (25%)
    5 42 / 131 (32%) 17 / 53 (32%)
    6 14 / 131 (11%) 11 / 53 (21%)
    7 7 / 131 (5.3%) 3 / 53 (5.7%)
    8 1 / 131 (0.8%) 0 / 53 (0%)
    9 0 / 131 (0%) 1 / 53 (1.9%)
TotAsset 187.14 (170.17) 178.58 (110.07)
SecAssets 43.87 (93.51) 33.57 (22.15)
ShortLiq 239.26 (74.95) 244.04 (41.24)
LongLiq 441.19 (64.43) 435.28 (22.08)
WlthIdx 364.81 (92.78) 372.83 (83.46)
SpendVol 567.05 (158.37) 571.74 (144.05)
SpenVel 230.47 (225.59) 192.47 (194.71)
CollGifts 66 / 131 (50%) 24 / 53 (45%)
MarthaHome 36 / 131 (27%) 31 / 53 (58%)
SunAds 64 / 131 (49%) 15 / 53 (28%)
ThemeColl 58 / 131 (44%) 15 / 53 (28%)
CustDec 41 / 131 (31%) 23 / 53 (43%)
RetailKids 45 / 131 (34%) 20 / 53 (38%)
TeenWr 56 / 131 (43%) 39 / 53 (74%)
Carlovers 36 / 131 (27%) 15 / 53 (28%)
CountryColl 55 / 131 (42%) 22 / 53 (42%)
1 Mean (SD); n / N (%)
tab_outcome <- d1a |>
  tbl_summary(
    by = MarthaHome, 
    statistic = list(all_continuous() ~ "{mean} ({sd})", 
                     all_categorical() ~ "{n} / {N} ({p}%)"),
    digits = all_continuous() ~ 2) %>%
  modify_caption("MarthaHome Non-customer & Customer Characteristics (N = {N})")

tab_outcome |>
  as_gt()
MarthaHome Non-customer & Customer Characteristics (N = 184)
Characteristic    0, N = 117¹    1, N = 67¹
SpendRat 27.13 (44.11) 72.89 (85.69)
Age 54.48 (14.64) 55.12 (11.80)
LenRes 14.79 (10.14) 14.19 (9.65)
Income

    1 2 / 117 (1.7%) 0 / 67 (0%)
    2 14 / 117 (12%) 4 / 67 (6.0%)
    3 15 / 117 (13%) 4 / 67 (6.0%)
    4 33 / 117 (28%) 16 / 67 (24%)
    5 36 / 117 (31%) 23 / 67 (34%)
    6 12 / 117 (10%) 13 / 67 (19%)
    7 4 / 117 (3.4%) 6 / 67 (9.0%)
    8 1 / 117 (0.9%) 0 / 67 (0%)
    9 0 / 117 (0%) 1 / 67 (1.5%)
TotAsset 173.75 (156.64) 203.75 (151.40)
SecAssets 41.29 (97.44) 40.22 (31.34)
ShortLiq 235.98 (78.01) 248.76 (40.21)
LongLiq 437.77 (63.41) 442.49 (38.72)
WlthIdx 355.22 (93.29) 387.90 (80.58)
SpendVol 565.49 (161.73) 573.48 (140.53)
SpenVel 232.07 (230.79) 197.61 (191.20)
CollGifts 48 / 117 (41%) 42 / 67 (63%)
BricMortar 22 / 117 (19%) 31 / 67 (46%)
SunAds 50 / 117 (43%) 29 / 67 (43%)
ThemeColl 46 / 117 (39%) 27 / 67 (40%)
CustDec 24 / 117 (21%) 40 / 67 (60%)
RetailKids 36 / 117 (31%) 29 / 67 (43%)
TeenWr 56 / 117 (48%) 39 / 67 (58%)
Carlovers 31 / 117 (26%) 20 / 67 (30%)
CountryColl 47 / 117 (40%) 30 / 67 (45%)
1 Mean (SD); n / N (%)
tab_outcome <- d1a |>
  tbl_summary(
    by = SunAds, 
    statistic = list(all_continuous() ~ "{mean} ({sd})", 
                     all_categorical() ~ "{n} / {N} ({p}%)"),
    digits = all_continuous() ~ 2) %>%
  modify_caption("SunAds Non-customer & Customer Characteristics (N = {N})")

tab_outcome |>
  as_gt()
SunAds Non-customer & Customer Characteristics (N = 184)
Characteristic    0, N = 105¹    1, N = 79¹
SpendRat 35.18 (55.44) 55.24 (76.94)
Age 53.56 (13.80) 56.24 (13.37)
LenRes 14.45 (9.97) 14.75 (9.97)
Income

    1 1 / 105 (1.0%) 1 / 79 (1.3%)
    2 10 / 105 (9.5%) 8 / 79 (10%)
    3 15 / 105 (14%) 4 / 79 (5.1%)
    4 25 / 105 (24%) 24 / 79 (30%)
    5 33 / 105 (31%) 26 / 79 (33%)
    6 13 / 105 (12%) 12 / 79 (15%)
    7 7 / 105 (6.7%) 3 / 79 (3.8%)
    8 0 / 105 (0%) 1 / 79 (1.3%)
    9 1 / 105 (1.0%) 0 / 79 (0%)
TotAsset 187.06 (151.81) 181.51 (160.10)
SecAssets 36.74 (40.52) 46.43 (112.74)
ShortLiq 234.90 (37.27) 248.27 (92.48)
LongLiq 436.44 (36.31) 443.54 (73.98)
WlthIdx 362.28 (83.01) 373.56 (98.79)
SpendVol 574.23 (165.58) 560.65 (137.78)
SpenVel 225.30 (222.36) 211.85 (211.57)
CollGifts 33 / 105 (31%) 57 / 79 (72%)
BricMortar 38 / 105 (36%) 15 / 79 (19%)
MarthaHome 38 / 105 (36%) 29 / 79 (37%)
ThemeColl 24 / 105 (23%) 49 / 79 (62%)
CustDec 24 / 105 (23%) 40 / 79 (51%)
RetailKids 22 / 105 (21%) 43 / 79 (54%)
TeenWr 59 / 105 (56%) 36 / 79 (46%)
Carlovers 23 / 105 (22%) 28 / 79 (35%)
CountryColl 17 / 105 (16%) 60 / 79 (76%)
1 Mean (SD); n / N (%)
tab_outcome <- d1a |>
  tbl_summary(
    by = ThemeColl, 
    statistic = list(all_continuous() ~ "{mean} ({sd})", 
                     all_categorical() ~ "{n} / {N} ({p}%)"),
    digits = all_continuous() ~ 2) %>%
  modify_caption("ThemeColl Non-customer & Customer Characteristics (N = {N})")

tab_outcome |>
  as_gt()
ThemeColl Non-customer & Customer Characteristics (N = 184)
Characteristic    0, N = 111¹    1, N = 73¹
SpendRat 31.52 (43.96) 62.46 (86.98)
Age 54.97 (14.55) 54.32 (12.22)
LenRes 14.92 (10.38) 14.05 (9.29)
Income

    1 2 / 111 (1.8%) 0 / 73 (0%)
    2 11 / 111 (9.9%) 7 / 73 (9.6%)
    3 14 / 111 (13%) 5 / 73 (6.8%)
    4 30 / 111 (27%) 19 / 73 (26%)
    5 31 / 111 (28%) 28 / 73 (38%)
    6 15 / 111 (14%) 10 / 73 (14%)
    7 7 / 111 (6.3%) 3 / 73 (4.1%)
    8 0 / 111 (0%) 1 / 73 (1.4%)
    9 1 / 111 (0.9%) 0 / 73 (0%)
TotAsset 170.69 (126.69) 205.93 (189.19)
SecAssets 34.08 (36.08) 51.27 (118.45)
ShortLiq 234.49 (36.12) 249.99 (96.14)
LongLiq 433.50 (27.66) 448.59 (80.98)
WlthIdx 356.50 (81.11) 383.27 (100.58)
SpendVol 563.67 (165.52) 575.59 (135.42)
SpenVel 218.80 (212.93) 220.62 (225.29)
CollGifts 38 / 111 (34%) 52 / 73 (71%)
BricMortar 38 / 111 (34%) 15 / 73 (21%)
MarthaHome 40 / 111 (36%) 27 / 73 (37%)
SunAds 30 / 111 (27%) 49 / 73 (67%)
CustDec 34 / 111 (31%) 30 / 73 (41%)
RetailKids 34 / 111 (31%) 31 / 73 (42%)
TeenWr 53 / 111 (48%) 42 / 73 (58%)
Carlovers 29 / 111 (26%) 22 / 73 (30%)
CountryColl 27 / 111 (24%) 50 / 73 (68%)
1 Mean (SD); n / N (%)
tab_outcome <- d1a |>
  tbl_summary(
    by = CustDec, 
    statistic = list(all_continuous() ~ "{mean} ({sd})", 
                     all_categorical() ~ "{n} / {N} ({p}%)"),
    digits = all_continuous() ~ 2) %>%
  modify_caption("CustDec Non-customer & Customer Characteristics (N = {N})")

tab_outcome |>
  as_gt()
CustDec Non-customer & Customer Characteristics (N = 184)
Characteristic    0, N = 120¹    1, N = 64¹
SpendRat 33.72 (54.75) 62.67 (80.46)
Age 55.42 (14.63) 53.39 (11.56)
LenRes 14.52 (9.94) 14.69 (10.04)
Income

    1 2 / 120 (1.7%) 0 / 64 (0%)
    2 14 / 120 (12%) 4 / 64 (6.3%)
    3 18 / 120 (15%) 1 / 64 (1.6%)
    4 34 / 120 (28%) 15 / 64 (23%)
    5 31 / 120 (26%) 28 / 64 (44%)
    6 14 / 120 (12%) 11 / 64 (17%)
    7 6 / 120 (5.0%) 4 / 64 (6.3%)
    8 1 / 120 (0.8%) 0 / 64 (0%)
    9 0 / 120 (0%) 1 / 64 (1.6%)
TotAsset 175.60 (157.85) 201.69 (149.28)
SecAssets 35.03 (42.16) 51.91 (122.34)
ShortLiq 236.78 (78.51) 247.88 (35.86)
LongLiq 434.73 (42.18) 448.42 (74.19)
WlthIdx 356.39 (96.25) 387.23 (73.62)
SpendVol 554.78 (173.19) 593.92 (105.92)
SpenVel 238.03 (236.28) 184.81 (172.75)
CollGifts 48 / 120 (40%) 42 / 64 (66%)
BricMortar 30 / 120 (25%) 23 / 64 (36%)
MarthaHome 27 / 120 (23%) 40 / 64 (63%)
SunAds 39 / 120 (33%) 40 / 64 (63%)
ThemeColl 43 / 120 (36%) 30 / 64 (47%)
RetailKids 25 / 120 (21%) 40 / 64 (63%)
TeenWr 64 / 120 (53%) 31 / 64 (48%)
Carlovers 31 / 120 (26%) 20 / 64 (31%)
CountryColl 45 / 120 (38%) 32 / 64 (50%)
1 Mean (SD); n / N (%)
tab_outcome <- d1a |>
  tbl_summary(
    by = RetailKids, 
    statistic = list(all_continuous() ~ "{mean} ({sd})", 
                     all_categorical() ~ "{n} / {N} ({p}%)"),
    digits = all_continuous() ~ 2) %>%
  modify_caption("RetailKids Non-customer & Customer Characteristics (N = {N})")

tab_outcome |>
  as_gt()
RetailKids Non-customer & Customer Characteristics (N = 184)
Characteristic    0, N = 119¹    1, N = 65¹
SpendRat 38.25 (60.70) 53.94 (74.41)
Age 56.48 (14.03) 51.48 (12.37)
LenRes 14.19 (9.60) 15.28 (10.59)
Income

    1 2 / 119 (1.7%) 0 / 65 (0%)
    2 14 / 119 (12%) 4 / 65 (6.2%)
    3 16 / 119 (13%) 3 / 65 (4.6%)
    4 33 / 119 (28%) 16 / 65 (25%)
    5 36 / 119 (30%) 23 / 65 (35%)
    6 12 / 119 (10%) 13 / 65 (20%)
    7 5 / 119 (4.2%) 5 / 65 (7.7%)
    8 0 / 119 (0%) 1 / 65 (1.5%)
    9 1 / 119 (0.8%) 0 / 65 (0%)
TotAsset 170.97 (124.97) 209.77 (197.29)
SecAssets 35.13 (35.39) 51.46 (125.46)
ShortLiq 236.72 (38.02) 247.80 (100.31)
LongLiq 434.95 (27.51) 447.80 (85.70)
WlthIdx 358.50 (84.95) 382.91 (97.38)
SpendVol 554.24 (172.89) 594.31 (107.94)
SpenVel 208.21 (221.50) 240.23 (209.51)
CollGifts 46 / 119 (39%) 44 / 65 (68%)
BricMortar 33 / 119 (28%) 20 / 65 (31%)
MarthaHome 38 / 119 (32%) 29 / 65 (45%)
SunAds 36 / 119 (30%) 43 / 65 (66%)
ThemeColl 42 / 119 (35%) 31 / 65 (48%)
CustDec 24 / 119 (20%) 40 / 65 (62%)
TeenWr 57 / 119 (48%) 38 / 65 (58%)
Carlovers 29 / 119 (24%) 22 / 65 (34%)
CountryColl 44 / 119 (37%) 33 / 65 (51%)
1 Mean (SD); n / N (%)
tab_outcome <- d1a |>
  tbl_summary(
    by = TeenWr, 
    statistic = list(all_continuous() ~ "{mean} ({sd})", 
                     all_categorical() ~ "{n} / {N} ({p}%)"),
    digits = all_continuous() ~ 2) %>%
  modify_caption("TeenWr Non-customer & Customer Characteristics (N = {N})")

tab_outcome |>
  as_gt()
TeenWr Non-customer & Customer Characteristics (N = 184)
Characteristic    0, N = 89¹    1, N = 95¹
SpendRat 30.11 (49.27) 56.61 (76.75)
Age 57.35 (14.28) 52.24 (12.59)
LenRes 14.54 (10.42) 14.61 (9.53)
Income

    1 2 / 89 (2.2%) 0 / 95 (0%)
    2 12 / 89 (13%) 6 / 95 (6.3%)
    3 12 / 89 (13%) 7 / 95 (7.4%)
    4 22 / 89 (25%) 27 / 95 (28%)
    5 25 / 89 (28%) 34 / 95 (36%)
    6 8 / 89 (9.0%) 17 / 95 (18%)
    7 6 / 89 (6.7%) 4 / 95 (4.2%)
    8 1 / 89 (1.1%) 0 / 95 (0%)
    9 1 / 89 (1.1%) 0 / 95 (0%)
TotAsset 175.53 (156.80) 193.24 (153.65)
SecAssets 45.22 (109.25) 36.85 (34.64)
ShortLiq 232.90 (34.70) 247.88 (86.50)
LongLiq 439.27 (66.35) 439.69 (43.62)
WlthIdx 357.13 (85.60) 376.47 (93.49)
SpendVol 538.44 (184.85) 596.46 (111.98)
SpenVel 229.67 (222.20) 210.01 (213.36)
CollGifts 40 / 89 (45%) 50 / 95 (53%)
BricMortar 14 / 89 (16%) 39 / 95 (41%)
MarthaHome 28 / 89 (31%) 39 / 95 (41%)
SunAds 43 / 89 (48%) 36 / 95 (38%)
ThemeColl 31 / 89 (35%) 42 / 95 (44%)
CustDec 33 / 89 (37%) 31 / 95 (33%)
RetailKids 27 / 89 (30%) 38 / 95 (40%)
Carlovers 22 / 89 (25%) 29 / 95 (31%)
CountryColl 37 / 89 (42%) 40 / 95 (42%)
1 Mean (SD); n / N (%)
tab_outcome <- d1a |>
  tbl_summary(
    by = Carlovers, 
    statistic = list(all_continuous() ~ "{mean} ({sd})", 
                     all_categorical() ~ "{n} / {N} ({p}%)"),
    digits = all_continuous() ~ 2) %>%
  modify_caption("Carlovers Non-customer & Customer Characteristics (N = {N})")

tab_outcome |>
  as_gt()
Carlovers Non-customer & Customer Characteristics (N = 184)
Characteristic    0, N = 133¹    1, N = 51¹
SpendRat 39.54 (66.68) 54.87 (63.86)
Age 55.36 (13.91) 53.02 (12.90)
LenRes 14.43 (10.17) 14.96 (9.41)
Income

    1 2 / 133 (1.5%) 0 / 51 (0%)
    2 15 / 133 (11%) 3 / 51 (5.9%)
    3 15 / 133 (11%) 4 / 51 (7.8%)
    4 37 / 133 (28%) 12 / 51 (24%)
    5 44 / 133 (33%) 15 / 51 (29%)
    6 11 / 133 (8.3%) 14 / 51 (27%)
    7 7 / 133 (5.3%) 3 / 51 (5.9%)
    8 1 / 133 (0.8%) 0 / 51 (0%)
    9 1 / 133 (0.8%) 0 / 51 (0%)
TotAsset 180.78 (143.87) 194.82 (182.08)
SecAssets 41.07 (88.21) 40.47 (52.78)
ShortLiq 240.00 (75.52) 242.29 (36.49)
LongLiq 439.37 (59.55) 439.80 (44.32)
WlthIdx 364.16 (93.12) 374.84 (81.83)
SpendVol 562.11 (165.32) 584.80 (119.45)
SpenVel 229.03 (219.12) 194.73 (212.62)
CollGifts 59 / 133 (44%) 31 / 51 (61%)
BricMortar 38 / 133 (29%) 15 / 51 (29%)
MarthaHome 47 / 133 (35%) 20 / 51 (39%)
SunAds 51 / 133 (38%) 28 / 51 (55%)
ThemeColl 51 / 133 (38%) 22 / 51 (43%)
CustDec 44 / 133 (33%) 20 / 51 (39%)
RetailKids 43 / 133 (32%) 22 / 51 (43%)
TeenWr 66 / 133 (50%) 29 / 51 (57%)
CountryColl 54 / 133 (41%) 23 / 51 (45%)
1 Mean (SD); n / N (%)
tab_outcome <- d1a |>
  tbl_summary(
    by = CountryColl, 
    statistic = list(all_continuous() ~ "{mean} ({sd})", 
                     all_categorical() ~ "{n} / {N} ({p}%)"),
    digits = all_continuous() ~ 2) %>%
  modify_caption("CountryColl Non-customer & Customer Characteristics (N = {N})")

tab_outcome |>
  as_gt()
CountryColl Non-customer & Customer Characteristics (N = 184)
Characteristic    0, N = 107¹    1, N = 77¹
SpendRat 29.19 (40.13) 64.09 (86.93)
Age 53.71 (13.43) 56.10 (13.90)
LenRes 14.37 (9.95) 14.86 (10.00)
Income

    1 1 / 107 (0.9%) 1 / 77 (1.3%)
    2 11 / 107 (10%) 7 / 77 (9.1%)
    3 11 / 107 (10%) 8 / 77 (10%)
    4 24 / 107 (22%) 25 / 77 (32%)
    5 37 / 107 (35%) 22 / 77 (29%)
    6 13 / 107 (12%) 12 / 77 (16%)
    7 8 / 107 (7.5%) 2 / 77 (2.6%)
    8 1 / 107 (0.9%) 0 / 77 (0%)
    9 1 / 107 (0.9%) 0 / 77 (0%)
TotAsset 188.04 (154.14) 180.00 (157.11)
SecAssets 44.06 (100.33) 36.52 (35.70)
ShortLiq 236.06 (37.31) 247.00 (93.66)
LongLiq 439.37 (61.94) 439.65 (45.83)
WlthIdx 364.57 (83.50) 370.66 (98.87)
SpendVol 572.31 (164.94) 562.96 (138.25)
SpenVel 225.47 (215.72) 211.26 (220.65)
CollGifts 26 / 107 (24%) 64 / 77 (83%)
BricMortar 31 / 107 (29%) 22 / 77 (29%)
MarthaHome 37 / 107 (35%) 30 / 77 (39%)
SunAds 19 / 107 (18%) 60 / 77 (78%)
ThemeColl 23 / 107 (21%) 50 / 77 (65%)
CustDec 32 / 107 (30%) 32 / 77 (42%)
RetailKids 32 / 107 (30%) 33 / 77 (43%)
TeenWr 55 / 107 (51%) 40 / 77 (52%)
Carlovers 28 / 107 (26%) 23 / 77 (30%)
1 Mean (SD); n / N (%)
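
The ten tables above repeat the same call with a different `by` variable. As a compact cross-check, the mean and standard deviation of `SpendRat` for buyers and non-buyers of each catalog can be collected in one pass. The sketch below uses base R on a small made-up data frame (`toy`, with only three indicator columns and simulated spending values); it is an illustration under those assumptions, not the `tbl_summary()` workflow used above.

```r
set.seed(1)
# Made-up stand-in for d1a: one response and three 0/1 purchase indicators
toy <- data.frame(
  SpendRat   = rexp(50, rate = 1 / 40),
  CollGifts  = rep(c(0, 1), times = 25),
  BricMortar = rep(c(0, 0, 1, 0, 1), times = 10),
  MarthaHome = rep(c(1, 0), each = 25)
)

indicators <- c("CollGifts", "BricMortar", "MarthaHome")

# For each indicator, compute n, mean and SD of SpendRat in the 0 and 1 groups
by_catalog <- do.call(rbind, lapply(indicators, function(v) {
  grp <- split(toy$SpendRat, toy[[v]])
  data.frame(
    catalog = v,
    group   = names(grp),
    n       = vapply(grp, length, integer(1)),
    mean    = vapply(grp, mean, numeric(1)),
    sd      = vapply(grp, sd, numeric(1)),
    row.names = NULL
  )
}))

by_catalog
```

With the real data, `indicators` would list all ten catalog columns and `toy` would be replaced by `d1a`; the result is one long table that makes it easy to see which catalogs separate high and low spenders.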

Summary and descriptive statistics of Spending Ratio

psych::describe(d1a$SpendRat)
##    vars   n  mean   sd median trimmed   mad  min    max  range skew kurtosis
## X1    1 184 43.79 66.1   18.8   29.17 22.57 0.08 401.42 401.34 2.96     9.91
##      se
## X1 4.87
summary(d1a$SpendRat)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   0.080   6.077  18.805  43.792  50.273 401.420

Histograms of Spending Ratio

hist(d1a$SpendRat)

Change the default bin width to see the spread of the spending ratio more clearly

d1a |>
  ggplot(aes(x=(SpendRat))) + geom_histogram(binwidth = 5, fill="blue", color = "white") + ylab("Frequency") + xlab("Spending Ratio") + ggtitle("Spending Ratio Distribution")

Take the log of the spending ratio to reduce the right skew of the histogram and show the pattern more clearly

d1a |>
  ggplot(aes(x=log(SpendRat))) + geom_histogram(binwidth = 0.5, fill="blue", color = "white") + ylab("Frequency") + xlab("log of Spending Ratio") + ggtitle("Spending Ratio Distribution")
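
To check numerically that a log transform reduces right skew, the sample skewness can be compared before and after transforming. The sketch below uses simulated log-normal values rather than the catalog data, and the helper `skewness_simple()` is a hypothetical name defined here for illustration (a plain moment-based estimate, which may differ slightly from the value reported by `psych::describe()`).

```r
set.seed(42)
# Simulated right-skewed, strictly positive values (a stand-in for SpendRat)
x <- rlnorm(500, meanlog = 3, sdlog = 1)

# Moment-based sample skewness: third central moment divided by the
# 3/2 power of the second central moment
skewness_simple <- function(v) {
  m <- mean(v)
  mean((v - m)^3) / (mean((v - m)^2))^(3 / 2)
}

skew_raw <- skewness_simple(x)
skew_log <- skewness_simple(log(x))

c(raw = skew_raw, log = skew_log)
```

The log is only defined for positive values; that holds for `SpendRat` here because its minimum is 0.08.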

Additional histogram with the spending ratio divided by 10

d1a |>
  ggplot(aes(x=(SpendRat/10))) + geom_histogram(binwidth = 1, fill="blue", color = "white") + ylab("Frequency") + xlab("Spending Ratio") + ggtitle("Spending Ratio Distribution")

# Bars stack the SpendRat values of all customers who share the same age
d1a |>
  ggplot(aes(x = Age, y = SpendRat)) + geom_col(fill = "blue") + ylab("Spending Ratio") + xlab("Age") + ggtitle("Spending Ratio vs Age")

EDA summary statistics

create_report(d1a)
## processing file: report.rmd
## Output created: report.html

Graphical display for each variable: histogram of each variable

plot_histogram(d1a)

1. Produce a scatterplot matrix which includes all of the variables in the data set.

Scatter plot of SpendRat against each variable

plot(SpendRat ~ ., pch = 19, col="blue", data = d1a)

pairs(d1a, col = "orange", pch=19)

library(GGally)
## Registered S3 method overwritten by 'GGally':
##   method from   
##   +.gg   ggplot2
Scatter_Matrix <-ggpairs(d1a)

ggsave("Scatter plot matrix1.png", Scatter_Matrix, width = 15, 
       height = 15, units = "in")

Scatter_Matrix

2. Use the lm() function to perform a multiple linear regression with SpendRat as the response and all other variables as the predictors. Use the summary() function to print the results. Comment on the output. For instance:

Look for correlations –>

pacman::p_load("Hmisc")

rcorr_matrix <- rcorr(as.matrix(d1a))
print(rcorr_matrix)
##             SpendRat   Age LenRes Income TotAsset SecAssets ShortLiq LongLiq
## SpendRat        1.00  0.08   0.11   0.09     0.02     -0.01     0.11    0.03
## Age             0.08  1.00   0.35  -0.13    -0.05      0.04     0.01    0.11
## LenRes          0.11  0.35   1.00  -0.16    -0.06     -0.02    -0.01    0.04
## Income          0.09 -0.13  -0.16   1.00     0.23      0.10     0.10    0.13
## TotAsset        0.02 -0.05  -0.06   0.23     1.00      0.74     0.46    0.82
## SecAssets      -0.01  0.04  -0.02   0.10     0.74      1.00     0.18    0.93
## ShortLiq        0.11  0.01  -0.01   0.10     0.46      0.18     1.00    0.43
## LongLiq         0.03  0.11   0.04   0.13     0.82      0.93     0.43    1.00
## WlthIdx         0.09 -0.17  -0.08   0.23     0.79      0.45     0.66    0.56
## SpendVol        0.00 -0.49  -0.32   0.29     0.42      0.16     0.02    0.11
## SpenVel        -0.03 -0.19  -0.13   0.07     0.10      0.11    -0.29   -0.09
## CollGifts       0.32 -0.04  -0.05   0.17     0.06     -0.04     0.10    0.02
## BricMortar      0.29 -0.04  -0.01   0.15    -0.03     -0.06     0.03   -0.05
## MarthaHome      0.33  0.02  -0.03   0.23     0.09     -0.01     0.09    0.04
## SunAds          0.15  0.10   0.01   0.02    -0.02      0.06     0.10    0.06
## ThemeColl       0.23 -0.02  -0.04   0.06     0.11      0.11     0.11    0.13
## CustDec         0.21 -0.07   0.01   0.23     0.08      0.10     0.08    0.12
## RetailKids      0.11 -0.18   0.05   0.21     0.12      0.10     0.08    0.11
## TeenWr          0.20 -0.19   0.00   0.13     0.06     -0.05     0.11    0.00
## Carlovers       0.10 -0.08   0.02   0.16     0.04      0.00     0.02    0.00
## CountryColl     0.26  0.09   0.02  -0.07    -0.03     -0.05     0.08    0.00
##             WlthIdx SpendVol SpenVel CollGifts BricMortar MarthaHome SunAds
## SpendRat       0.09     0.00   -0.03      0.32       0.29       0.33   0.15
## Age           -0.17    -0.49   -0.19     -0.04      -0.04       0.02   0.10
## LenRes        -0.08    -0.32   -0.13     -0.05      -0.01      -0.03   0.01
## Income         0.23     0.29    0.07      0.17       0.15       0.23   0.02
## TotAsset       0.79     0.42    0.10      0.06      -0.03       0.09  -0.02
## SecAssets      0.45     0.16    0.11     -0.04      -0.06      -0.01   0.06
## ShortLiq       0.66     0.02   -0.29      0.10       0.03       0.09   0.10
## LongLiq        0.56     0.11   -0.09      0.02      -0.05       0.04   0.06
## WlthIdx        1.00     0.47    0.06      0.13       0.04       0.18   0.06
## SpendVol       0.47     1.00    0.34      0.14       0.01       0.03  -0.04
## SpenVel        0.06     0.34    1.00     -0.03      -0.08      -0.08  -0.03
## CollGifts      0.13     0.14   -0.03      1.00      -0.05       0.21   0.40
## BricMortar     0.04     0.01   -0.08     -0.05       1.00       0.29  -0.19
## MarthaHome     0.18     0.03   -0.08      0.21       0.29       1.00   0.01
## SunAds         0.06    -0.04   -0.03      0.40      -0.19       0.01   1.00
## ThemeColl      0.15     0.04    0.00      0.36      -0.15       0.01   0.40
## CustDec        0.16     0.12   -0.12      0.24       0.12       0.40   0.29
## RetailKids     0.13     0.12    0.07      0.28       0.03       0.13   0.35
## TeenWr         0.11     0.19   -0.05      0.08       0.28       0.10  -0.11
## Carlovers      0.05     0.07   -0.07      0.15       0.01       0.04   0.15
## CountryColl    0.03    -0.03   -0.03      0.58       0.00       0.04   0.60
##             ThemeColl CustDec RetailKids TeenWr Carlovers CountryColl
## SpendRat         0.23    0.21       0.11   0.20      0.10        0.26
## Age             -0.02   -0.07      -0.18  -0.19     -0.08        0.09
## LenRes          -0.04    0.01       0.05   0.00      0.02        0.02
## Income           0.06    0.23       0.21   0.13      0.16       -0.07
## TotAsset         0.11    0.08       0.12   0.06      0.04       -0.03
## SecAssets        0.11    0.10       0.10  -0.05      0.00       -0.05
## ShortLiq         0.11    0.08       0.08   0.11      0.02        0.08
## LongLiq          0.13    0.12       0.11   0.00      0.00        0.00
## WlthIdx          0.15    0.16       0.13   0.11      0.05        0.03
## SpendVol         0.04    0.12       0.12   0.19      0.07       -0.03
## SpenVel          0.00   -0.12       0.07  -0.05     -0.07       -0.03
## CollGifts        0.36    0.24       0.28   0.08      0.15        0.58
## BricMortar      -0.15    0.12       0.03   0.28      0.01        0.00
## MarthaHome       0.01    0.40       0.13   0.10      0.04        0.04
## SunAds           0.40    0.29       0.35  -0.11      0.15        0.60
## ThemeColl        1.00    0.11       0.12   0.10      0.04        0.44
## CustDec          0.11    1.00       0.42  -0.05      0.06        0.12
## RetailKids       0.12    0.42       1.00   0.10      0.10        0.13
## TeenWr           0.10   -0.05       0.10   1.00      0.06        0.01
## Carlovers        0.04    0.06       0.10   0.06      1.00        0.04
## CountryColl      0.44    0.12       0.13   0.01      0.04        1.00
## 
## n= 184 
## 
## 
## P
##             SpendRat Age    LenRes Income TotAsset SecAssets ShortLiq LongLiq
## SpendRat             0.3115 0.1541 0.2037 0.7831   0.9383    0.1534   0.6763 
## Age         0.3115          0.0000 0.0819 0.4715   0.5607    0.8711   0.1313 
## LenRes      0.1541   0.0000        0.0288 0.4383   0.7993    0.8717   0.5658 
## Income      0.2037   0.0819 0.0288        0.0019   0.1738    0.1936   0.0880 
## TotAsset    0.7831   0.4715 0.4383 0.0019          0.0000    0.0000   0.0000 
## SecAssets   0.9383   0.5607 0.7993 0.1738 0.0000             0.0175   0.0000 
## ShortLiq    0.1534   0.8711 0.8717 0.1936 0.0000   0.0175             0.0000 
## LongLiq     0.6763   0.1313 0.5658 0.0880 0.0000   0.0000    0.0000          
## WlthIdx     0.1996   0.0207 0.3038 0.0016 0.0000   0.0000    0.0000   0.0000 
## SpendVol    0.9976   0.0000 0.0000 0.0000 0.0000   0.0353    0.7548   0.1297 
## SpenVel     0.6821   0.0098 0.0732 0.3638 0.1834   0.1343    0.0000   0.2155 
## CollGifts   0.0000   0.6274 0.5363 0.0233 0.3995   0.6281    0.1631   0.8307 
## BricMortar  0.0000   0.5867 0.9281 0.0364 0.7357   0.4294    0.6622   0.5156 
## MarthaHome  0.0000   0.7601 0.6944 0.0017 0.2075   0.9308    0.2136   0.5808 
## SunAds      0.0413   0.1882 0.8405 0.7789 0.8107   0.4167    0.1805   0.3925 
## ThemeColl   0.0017   0.7499 0.5656 0.4215 0.1318   0.1535    0.1246   0.0718 
## CustDec     0.0044   0.3387 0.9120 0.0019 0.2781   0.1728    0.2851   0.1119 
## RetailKids  0.1240   0.0170 0.4813 0.0035 0.1048   0.1856    0.2844   0.1346 
## TeenWr      0.0062   0.0108 0.9614 0.0898 0.4401   0.4787    0.1294   0.9589 
## Carlovers   0.1597   0.2987 0.7462 0.0348 0.5837   0.9639    0.8358   0.9622 
## CountryColl 0.0003   0.2414 0.7460 0.3160 0.7297   0.5291    0.2750   0.9737 
##             WlthIdx SpendVol SpenVel CollGifts BricMortar MarthaHome SunAds
## SpendRat    0.1996  0.9976   0.6821  0.0000    0.0000     0.0000     0.0413
## Age         0.0207  0.0000   0.0098  0.6274    0.5867     0.7601     0.1882
## LenRes      0.3038  0.0000   0.0732  0.5363    0.9281     0.6944     0.8405
## Income      0.0016  0.0000   0.3638  0.0233    0.0364     0.0017     0.7789
## TotAsset    0.0000  0.0000   0.1834  0.3995    0.7357     0.2075     0.8107
## SecAssets   0.0000  0.0353   0.1343  0.6281    0.4294     0.9308     0.4167
## ShortLiq    0.0000  0.7548   0.0000  0.1631    0.6622     0.2136     0.1805
## LongLiq     0.0000  0.1297   0.2155  0.8307    0.5156     0.5808     0.3925
## WlthIdx             0.0000   0.3983  0.0691    0.5856     0.0174     0.4017
## SpendVol    0.0000           0.0000  0.0556    0.8522     0.7359     0.5552
## SpenVel     0.3983  0.0000           0.7121    0.2840     0.3020     0.6790
## CollGifts   0.0691  0.0556   0.7121            0.5335     0.0045     0.0000
## BricMortar  0.5856  0.8522   0.2840  0.5335               0.0000     0.0106
## MarthaHome  0.0174  0.7359   0.3020  0.0045    0.0000                0.9427
## SunAds      0.4017  0.5552   0.6790  0.0000    0.0106     0.9427           
## ThemeColl   0.0481  0.6088   0.9560  0.0000    0.0452     0.8964     0.0000
## CustDec     0.0265  0.1008   0.1138  0.0008    0.1200     0.0000     0.0000
## RetailKids  0.0787  0.0917   0.3408  0.0001    0.6656     0.0884     0.0000
## TeenWr      0.1459  0.0103   0.5411  0.2998    0.0001     0.1785     0.1553
## Carlovers   0.4727  0.3723   0.3392  0.0464    0.9109     0.6269     0.0425
## CountryColl 0.6520  0.6858   0.6630  0.0000    0.9531     0.5449     0.0000
##             ThemeColl CustDec RetailKids TeenWr Carlovers CountryColl
## SpendRat    0.0017    0.0044  0.1240     0.0062 0.1597    0.0003     
## Age         0.7499    0.3387  0.0170     0.0108 0.2987    0.2414     
## LenRes      0.5656    0.9120  0.4813     0.9614 0.7462    0.7460     
## Income      0.4215    0.0019  0.0035     0.0898 0.0348    0.3160     
## TotAsset    0.1318    0.2781  0.1048     0.4401 0.5837    0.7297     
## SecAssets   0.1535    0.1728  0.1856     0.4787 0.9639    0.5291     
## ShortLiq    0.1246    0.2851  0.2844     0.1294 0.8358    0.2750     
## LongLiq     0.0718    0.1119  0.1346     0.9589 0.9622    0.9737     
## WlthIdx     0.0481    0.0265  0.0787     0.1459 0.4727    0.6520     
## SpendVol    0.6088    0.1008  0.0917     0.0103 0.3723    0.6858     
## SpenVel     0.9560    0.1138  0.3408     0.5411 0.3392    0.6630     
## CollGifts   0.0000    0.0008  0.0001     0.2998 0.0464    0.0000     
## BricMortar  0.0452    0.1200  0.6656     0.0001 0.9109    0.9531     
## MarthaHome  0.8964    0.0000  0.0884     0.1785 0.6269    0.5449     
## SunAds      0.0000    0.0000  0.0000     0.1553 0.0425    0.0000     
## ThemeColl             0.1464  0.1014     0.1958 0.5546    0.0000     
## CustDec     0.1464            0.0000     0.5294 0.4371    0.1027     
## RetailKids  0.1014    0.0000             0.1724 0.1717    0.0705     
## TeenWr      0.1958    0.5294  0.1724            0.3819    0.9421     
## Carlovers   0.5546    0.4371  0.1717     0.3819           0.5824     
## CountryColl 0.0000    0.1027  0.0705     0.9421 0.5824
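
The full matrix is hard to scan when only the response row matters. As a minimal sketch, the same kind of information (correlation with the response plus its p-value) can be collected into one sorted table with base R's `cor.test()`. The example runs on the built-in `mtcars` data with `mpg` standing in for the response, which is an assumption made only so the code is self-contained; it is not the exam's `d1a` data.

```r
# Illustration on the built-in mtcars data (NOT the exam's d1a data set).
response <- "mpg"
predictors <- setdiff(names(mtcars), response)

# One Pearson correlation test per predictor against the response.
cor_table <- do.call(rbind, lapply(predictors, function(v) {
  ct <- cor.test(mtcars[[response]], mtcars[[v]])
  data.frame(variable = v, r = unname(ct$estimate), p = ct$p.value)
}))

# Strongest absolute correlations first.
cor_table <- cor_table[order(-abs(cor_table$r)), ]
print(cor_table, row.names = FALSE, digits = 3)
```

Sorting by absolute correlation puts the candidate predictors at the top, which is the reading the response row of the matrix is used for here.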

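ANOVA to look for significant predictors. Note that aov() reports sequential sums of squares, so each row is adjusted only for the terms listed above it: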

anova_spend <- aov(SpendRat ~ ., data = d1a)

summary(anova_spend)
##              Df Sum Sq Mean Sq F value   Pr(>F)    
## Age           1   4500    4500   1.325  0.25133    
## LenRes        1   5750    5750   1.693  0.19499    
## Income        1  10797   10797   3.180  0.07640 .  
## TotAsset      1      6       6   0.002  0.96770    
## SecAssets     1    722     722   0.213  0.64531    
## ShortLiq      1   8182    8182   2.410  0.12251    
## LongLiq       1    102     102   0.030  0.86271    
## WlthIdx       1   8525    8525   2.511  0.11500    
## SpendVol      1    821     821   0.242  0.62355    
## SpenVel       1    457     457   0.135  0.71413    
## CollGifts     1  70947   70947  20.896 9.54e-06 ***
## BricMortar    1  72171   72171  21.256 8.08e-06 ***
## MarthaHome    1  28263   28263   8.324  0.00444 ** 
## SunAds        1   2889    2889   0.851  0.35768    
## ThemeColl     1  20612   20612   6.071  0.01478 *  
## CustDec       1    352     352   0.104  0.74779    
## RetailKids    1    348     348   0.102  0.74936    
## TeenWr        1   6792    6792   2.000  0.15916    
## Carlovers     1   3066    3066   0.903  0.34336    
## CountryColl   1    802     802   0.236  0.62761    
## Residuals   163 553431    3395                     
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
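
Because the table above is sequential, only the last term (CountryColl, p = 0.62761) is tested after adjusting for every other predictor, and its p-value is the one that matches the coefficient t-test in the regression summary. The sketch below illustrates the order dependence on the built-in `mtcars` data; the data set and the variables `wt` and `hp` are stand-ins chosen for illustration, not part of the exam data.

```r
# Illustration on mtcars; the variable choice is arbitrary.
fit_ab <- lm(mpg ~ wt + hp, data = mtcars)  # hp entered last
fit_ba <- lm(mpg ~ hp + wt, data = mtcars)  # hp entered first

# Sequential (Type I) tests: the row for 'hp' depends on its position.
p_hp_last  <- anova(fit_ab)["hp", "Pr(>F)"]
p_hp_first <- anova(fit_ba)["hp", "Pr(>F)"]

# Marginal test: hp adjusted for all other terms, independent of order.
p_hp_marginal <- drop1(fit_ab, test = "F")["hp", "Pr(>F)"]

# The coefficient t-test in summary() is the same marginal test.
p_hp_t <- summary(fit_ab)$coefficients["hp", "Pr(>|t|)"]

print(c(first = p_hp_first, last = p_hp_last,
        marginal = p_hp_marginal, t_test = p_hp_t))
```

When the goal is to screen predictors, `drop1(model, test = "F")` or the t-tests from `summary()` give the same answer regardless of the column order of the data frame.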

Visualization of the correlation matrix:

plot_correlation(d1a)

In the ANOVA table, the significant predictors are CollGifts, BricMortar, MarthaHome, and ThemeColl, with Income only marginal (p = 0.076).

lm_fit = lm(SpendRat ~ ., data = d1a)

summary(lm_fit)
## 
## Call:
## lm(formula = SpendRat ~ ., data = d1a)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -101.616  -31.110   -8.238   16.176  273.558 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)   
## (Intercept) -37.53015  179.15157  -0.209  0.83433   
## Age           0.44224    0.40620   1.089  0.27789   
## LenRes        0.74215    0.50013   1.484  0.13976   
## Income       -1.45995    3.53909  -0.413  0.68050   
## TotAsset     -0.04689    0.08417  -0.557  0.57818   
## SecAssets     0.10371    0.26069   0.398  0.69128   
## ShortLiq      0.12096    0.13839   0.874  0.38341   
## LongLiq      -0.07824    0.46023  -0.170  0.86521   
## WlthIdx      -0.02007    0.11985  -0.167  0.86724   
## SpendVol      0.01635    0.04444   0.368  0.71349   
## SpenVel       0.02413    0.02743   0.880  0.38040   
## CollGifts    25.96189   11.76414   2.207  0.02872 * 
## BricMortar   35.20239   11.07492   3.179  0.00177 **
## MarthaHome   28.37021   10.72825   2.644  0.00898 **
## SunAds       -0.70414   13.12672  -0.054  0.95729   
## ThemeColl    21.83030   10.45189   2.089  0.03829 * 
## CustDec       8.27352   11.63807   0.711  0.47816   
## RetailKids   -4.49706   11.16289  -0.403  0.68758   
## TeenWr       13.49246    9.72981   1.387  0.16742   
## Carlovers     9.93914   10.06260   0.988  0.32475   
## CountryColl   6.64051   13.66314   0.486  0.62761   
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 58.27 on 163 degrees of freedom
## Multiple R-squared:  0.3078, Adjusted R-squared:  0.2229 
## F-statistic: 3.624 on 20 and 163 DF,  p-value: 2.3e-06
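    1. Is there a relationship between the predictors and the response? Yes. The overall F-statistic is 3.624 on 20 and 163 DF (p = 2.3e-06), so at least one predictor is related to SpendRat.
    2. Which predictors appear to have a statistically significant relationship to the response? At the 0.05 level, the significant predictors are the catalog variables CollGifts, BricMortar, MarthaHome, and ThemeColl. Income is no longer considered significant; in the ANOVA it reached only the 0.1 level, and we are using the 0.05 level. This agrees with the correlation matrix, where these four are among the variables most strongly correlated with SpendRat.
    3. What does the coefficient for the Age variable suggest? Holding the other predictors fixed, each additional year of age is associated with an increase of about 0.442 in SpendRat, measured in the units of SpendRat (not a 44% increase). The estimate is not statistically significant (p = 0.278).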
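
The "per one-unit change, other predictors held fixed" reading of a coefficient can be checked directly with `predict()`. This is a small sketch on the built-in `mtcars` data; the model and the two hypothetical cars are assumptions for illustration only.

```r
# Illustration on mtcars (not the exam data).
fit_demo <- lm(mpg ~ wt + hp, data = mtcars)

# Two hypothetical cases that differ by one unit of hp and nothing else.
base_case <- data.frame(wt = 3, hp = 110)
plus_one  <- data.frame(wt = 3, hp = 111)

# The change in the prediction equals the hp coefficient.
diff_pred <- unname(predict(fit_demo, plus_one) - predict(fit_demo, base_case))
hp_coef   <- unname(coef(fit_demo)["hp"])
print(c(prediction_change = diff_pred, coefficient = hp_coef))
```

The change is in the units of the response per unit of the predictor, which is why the Age coefficient should be read as about 0.44 units of SpendRat per year rather than as a percentage.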

Perform MLR using only the significant predictors:

fit<-lm(SpendRat~ CollGifts+BricMortar+MarthaHome+ThemeColl, data = d1a)
summary(fit)
## 
## Call:
## lm(formula = SpendRat ~ CollGifts + BricMortar + MarthaHome + 
##     ThemeColl, data = d1a)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -88.945 -29.201  -6.086  13.794 281.444 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)   -2.287      7.234  -0.316 0.752216    
## CollGifts     29.795      9.322   3.196 0.001646 ** 
## BricMortar    39.163      9.906   3.953 0.000111 ***
## MarthaHome    28.301      9.452   2.994 0.003142 ** 
## ThemeColl     25.004      9.372   2.668 0.008330 ** 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 57.37 on 179 degrees of freedom
## Multiple R-squared:  0.2632, Adjusted R-squared:  0.2467 
## F-statistic: 15.98 on 4 and 179 DF,  p-value: 3.303e-11

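Multiple R-squared went down with the smaller model (0.3078 to 0.2632), as it must when predictors are dropped, but adjusted R-squared went up (0.2229 to 0.2467) and the residual standard error fell slightly (58.27 to 57.37).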
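
The adjusted R-squared values in the two summaries can be reproduced from the multiple R-squared, the sample size (n = 184), and the number of predictors. The helper name `adj_r2` below is made up for this sketch; the inputs are the rounded values printed in the outputs above.

```r
# Adjusted R-squared from R-squared, sample size n, and number of predictors p.
adj_r2 <- function(r2, n, p) 1 - (1 - r2) * (n - 1) / (n - p - 1)

full    <- adj_r2(0.3078, n = 184, p = 20)  # full model, 20 predictors
reduced <- adj_r2(0.2632, n = 184, p = 4)   # four-predictor model

print(round(c(full = full, reduced = reduced), 4))
```

The penalty factor (n - 1) / (n - p - 1) is much larger for 20 predictors than for 4, which is why the full model's higher R-squared turns into a lower adjusted R-squared.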

  3. Use forward, backward, or stepwise selection to find a suitable subset of this model and comment on the performance of this new model to the full model from 2.
intercept_only <- lm(SpendRat ~ 1, data = d1a)
all <- lm(SpendRat ~ ., data = d1a)
fwd.model <- step(intercept_only, direction = "forward", scope=formula(all), trace = 0)
fwd.model$anova
##           Step Df Deviance Resid. Df Resid. Dev      AIC
## 1              NA       NA       183   799534.7 1543.340
## 2 + MarthaHome -1 89197.96       182   710336.8 1523.575
## 3  + CollGifts -1 53987.21       181   656349.5 1511.030
## 4 + BricMortar -1 43810.72       180   612538.8 1500.319
## 5  + ThemeColl -1 23427.28       179   589111.5 1495.144
## 6     + LenRes -1 13895.59       178   575216.0 1492.752
fwd.model$coefficients
## (Intercept)  MarthaHome   CollGifts  BricMortar   ThemeColl      LenRes 
## -15.7432470  28.6728771  30.3193733  39.2957330  25.5835398   0.8778631
bwd.model <- step(all, direction = "backward", scope=formula(all), trace = 0)

bwd.model$anova
##             Step Df    Deviance Resid. Df Resid. Dev      AIC
## 1                NA          NA       163   553430.6 1515.648
## 2       - SunAds  1    9.769722       164   553440.4 1513.651
## 3      - LongLiq  1   91.825742       165   553532.2 1511.682
## 4      - WlthIdx  1   48.460361       166   553580.6 1509.698
## 5     - SpendVol  1  402.667916       167   553983.3 1507.832
## 6       - Income  1  508.634049       168   554492.0 1506.000
## 7   - RetailKids  1  893.205537       169   555385.2 1504.297
## 8      - CustDec  1 1205.661022       170   556590.8 1502.696
## 9  - CountryColl  1 1236.098256       171   557826.9 1501.104
## 10   - SecAssets  1 1750.985529       172   559577.9 1499.680
## 11    - TotAsset  1 2601.227973       173   562179.1 1498.534
## 12    - ShortLiq  1 1368.055642       174   563547.2 1496.981
## 13     - SpenVel  1 1816.592462       175   565363.8 1495.573
## 14   - Carlovers  1 2105.016071       176   567468.8 1494.257
## 15         - Age  1 3604.670636       177   571073.5 1493.422
## 16      - TeenWr  1 4142.496165       178   575216.0 1492.752
bwd.model$coefficients
## (Intercept)      LenRes   CollGifts  BricMortar  MarthaHome   ThemeColl 
## -15.7432470   0.8778631  30.3193733  39.2957330  28.6728771  25.5835398
both.model <- step(intercept_only, direction = "both", scope=formula(all), trace = 0)

both.model$anova
##           Step Df Deviance Resid. Df Resid. Dev      AIC
## 1              NA       NA       183   799534.7 1543.340
## 2 + MarthaHome -1 89197.96       182   710336.8 1523.575
## 3  + CollGifts -1 53987.21       181   656349.5 1511.030
## 4 + BricMortar -1 43810.72       180   612538.8 1500.319
## 5  + ThemeColl -1 23427.28       179   589111.5 1495.144
## 6     + LenRes -1 13895.59       178   575216.0 1492.752
both.model$coefficients
## (Intercept)  MarthaHome   CollGifts  BricMortar   ThemeColl      LenRes 
## -15.7432470  28.6728771  30.3193733  39.2957330  25.5835398   0.8778631

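Forward, backward, and both-direction selection all arrive at the same five-predictor model: the four catalog variables plus Length of Residence (LenRes). Compared with the four-predictor model above, the differences are the added LenRes term and the change in the intercept (from -2.29 to -15.74); the other coefficient values do not change much. Compared with the full model from 2, the selected model drops 15 predictors and lowers the AIC reported by step() from 1515.65 to 1492.75.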
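
For a linear model, the AIC column that `step()` prints comes from `extractAIC()`, which computes n * log(RSS / n) + 2 * edf and drops additive constants. That is why these values (around 1500) are on a different scale from the AICc values that `aictab()` reports for the same models. The sketch below reproduces two entries of the forward-selection table from the printed residual deviances; `step_aic` is a helper name introduced here, and the `mtcars` check at the end is only an illustration.

```r
# AIC as used by step() for lm fits: n * log(RSS / n) + k * edf.
step_aic <- function(rss, n, edf, k = 2) n * log(rss / n) + k * edf

# Values taken from the forward-selection table (n = 184).
aic_null  <- step_aic(799534.7, n = 184, edf = 1)  # intercept only
aic_final <- step_aic(575216.0, n = 184, edf = 6)  # intercept + 5 predictors
print(round(c(null = aic_null, final = aic_final), 3))

# Cross-check the formula against extractAIC() on a small built-in example.
fit_demo <- lm(mpg ~ wt + hp, data = mtcars)
rss_demo <- sum(residuals(fit_demo)^2)
n_demo   <- nrow(mtcars)
edf_demo <- n_demo - df.residual(fit_demo)
print(c(formula = step_aic(rss_demo, n_demo, edf_demo),
        extractAIC = extractAIC(fit_demo)[2]))
```

Only differences in AIC between models fitted to the same data matter, so the dropped constants do not affect which model is selected.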

fit2<-lm(SpendRat~ CollGifts+BricMortar+MarthaHome+ThemeColl+LenRes, data = d1a)
summary(fit2)
## 
## Call:
## lm(formula = SpendRat ~ CollGifts + BricMortar + MarthaHome + 
##     ThemeColl + LenRes, data = d1a)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -97.342 -30.315  -7.095  13.601 272.223 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) -15.7432     9.6691  -1.628  0.10525    
## CollGifts    30.3194     9.2406   3.281  0.00124 ** 
## BricMortar   39.2957     9.8165   4.003 9.17e-05 ***
## MarthaHome   28.6729     9.3680   3.061  0.00255 ** 
## ThemeColl    25.5835     9.2909   2.754  0.00651 ** 
## LenRes        0.8779     0.4233   2.074  0.03955 *  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 56.85 on 178 degrees of freedom
## Multiple R-squared:  0.2806, Adjusted R-squared:  0.2604 
## F-statistic: 13.88 on 5 and 178 DF,  p-value: 1.873e-11
  4. Use the plot() function to produce diagnostic plots of the linear regression fit. Comment on any problems you see with the fit. Do the residual plots suggest any unusually large outliers? Does the leverage plot identify any observations with unusually high leverage?

Select the model based on AICc:

pacman::p_load(AICcmodavg)
model.set <- list(fit2, fit, lm_fit)
model.names <- c("fit2", "fit", "lm_fit")
aictab(model.set, modnames = model.names)
## 
## Model selection based on AICc:
## 
##         K    AICc Delta_AICc AICcWt Cum.Wt       LL
## fit2    7 2017.56       0.00   0.75   0.75 -1001.46
## fit     6 2019.79       2.23   0.25   1.00 -1003.66
## lm_fit 22 2046.10      28.55   0.00   1.00  -997.91
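
The AICc and AICcWt columns can be reproduced from the K and LL columns of the table. The sketch below uses the standard small-sample correction and the usual Akaike-weight formula; the function names `aicc` and `akaike_weights` are introduced here for illustration, and the inputs are the rounded values printed above (n = 184).

```r
# AICc from log-likelihood, parameter count K, and sample size n.
aicc <- function(loglik, k, n) -2 * loglik + 2 * k + 2 * k * (k + 1) / (n - k - 1)

# Akaike weights: relative support for each model within the candidate set.
akaike_weights <- function(aicc_values) {
  delta <- aicc_values - min(aicc_values)
  rel <- exp(-delta / 2)
  rel / sum(rel)
}

models <- data.frame(
  name = c("fit2", "fit", "lm_fit"),
  K    = c(7, 6, 22),
  LL   = c(-1001.46, -1003.66, -997.91)
)
models$AICc   <- aicc(models$LL, models$K, n = 184)
models$weight <- akaike_weights(models$AICc)
print(models, digits = 6)
```

A weight of 0.75 is a statement about the three models in this set relative to each other. It says nothing about how much of the variation in SpendRat the model explains; that is what R-squared measures.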

Make diagnostic plots of each model:

par(mfrow=c(2,2))
plot(fit2)

plot(fit2)

plot(fit)

par(mfrow=c(2,2))
plot(fit)

plot(lm_fit)

## Warning in sqrt(crit * p * (1 - hh)/hh): NaNs produced
## Warning in sqrt(crit * p * (1 - hh)/hh): NaNs produced

par(mfrow=c(2,2))
plot(lm_fit)
## Warning in sqrt(crit * p * (1 - hh)/hh): NaNs produced
## Warning in sqrt(crit * p * (1 - hh)/hh): NaNs produced

4. Comments on the diagnostic plots: Yes, there are a fair number of outliers in all models, mostly large positive residuals (the largest residual in fit2 is 272.2, against a minimum of -97.3). The residuals-vs-leverage plot for fit2 flags observations 88, 90, and 161. In the AICc table, fit2 has the lowest AICc and an AICc weight of 0.75, meaning it carries about 75% of the relative support among the three candidate models (it does not mean the model explains 75% of the variation). fit has a weight of 0.25, but its Delta AICc is more than 2 points (2.23) above that of fit2.
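
By default, plot() on an lm object labels the three most extreme points in every panel, so a labeled observation is not automatically an outlier. A numeric check can back up the visual reading. The sketch below uses base R's `hatvalues()`, `rstandard()`, and `cooks.distance()` on the built-in `mtcars` data; the cut-offs (2p/n for leverage, |standardized residual| > 2, Cook's distance > 4/n) are common rules of thumb assumed here, not thresholds taken from the exam.

```r
# Illustration on mtcars (not the exam data).
fit_demo <- lm(mpg ~ wt + hp, data = mtcars)
n <- nobs(fit_demo)
p <- length(coef(fit_demo))  # number of coefficients, including the intercept

diag_table <- data.frame(
  leverage  = hatvalues(fit_demo),
  std_resid = rstandard(fit_demo),
  cooks_d   = cooks.distance(fit_demo)
)

# Rule-of-thumb flags (assumed conventions, adjust as needed).
diag_table$high_leverage <- diag_table$leverage > 2 * p / n
diag_table$large_resid   <- abs(diag_table$std_resid) > 2
diag_table$influential   <- diag_table$cooks_d > 4 / n

flagged <- diag_table[diag_table$high_leverage |
                      diag_table$large_resid |
                      diag_table$influential, ]
print(round(flagged[, c("leverage", "std_resid", "cooks_d")], 3))
```

Replacing `fit_demo` with a fitted model such as `fit2` would list the observations that cross any of the three thresholds, which can then be compared with the points labeled in the plots.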

  5. Try a few different transformations of the variables, such as log(X), √X, X^2. Comment on your findings and repeat 3 & 4.

Length of Residence is the only variable in the smaller model that is actually continuous; the others are binary, so LenRes is the only one transformed below.

Squared LenRes:

squared.fit2=lm(SpendRat ~ CollGifts+BricMortar+MarthaHome+ThemeColl+
              LenRes+I(LenRes^2), data = d1a)

summary (squared.fit2)
## 
## Call:
## lm(formula = SpendRat ~ CollGifts + BricMortar + MarthaHome + 
##     ThemeColl + LenRes + I(LenRes^2), data = d1a)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -97.262 -30.931  -7.209  13.440 273.756 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept) -11.90724   14.20330  -0.838  0.40297    
## CollGifts    30.15382    9.27391   3.251  0.00137 ** 
## BricMortar   39.22160    9.84248   3.985 9.85e-05 ***
## MarthaHome   28.77539    9.39488   3.063  0.00253 ** 
## ThemeColl    25.80250    9.33237   2.765  0.00630 ** 
## LenRes        0.31421    1.58343   0.198  0.84293    
## I(LenRes^2)   0.01402    0.03795   0.369  0.71221    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 56.99 on 177 degrees of freedom
## Multiple R-squared:  0.2811, Adjusted R-squared:  0.2567 
## F-statistic: 11.54 on 6 and 177 DF,  p-value: 6.987e-11

Square root of LenRes:

root.fit2=lm(SpendRat ~ CollGifts+BricMortar+MarthaHome+ThemeColl+
              LenRes+I(LenRes^(.5)), data = d1a)

summary (root.fit2)
## 
## Call:
## lm(formula = SpendRat ~ CollGifts + BricMortar + MarthaHome + 
##     ThemeColl + LenRes + I(LenRes^(0.5)), data = d1a)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -97.891 -30.757  -7.095  13.520 273.400 
## 
## Coefficients:
##                 Estimate Std. Error t value Pr(>|t|)    
## (Intercept)       -0.249     30.685  -0.008 0.993535    
## CollGifts         29.759      9.319   3.193 0.001664 ** 
## BricMortar        39.114      9.842   3.974 0.000103 ***
## MarthaHome        28.756      9.388   3.063 0.002534 ** 
## ThemeColl         25.995      9.342   2.783 0.005976 ** 
## LenRes             1.943      2.047   0.950 0.343645    
## I(LenRes^(0.5))   -8.572     16.108  -0.532 0.595281    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 56.96 on 177 degrees of freedom
## Multiple R-squared:  0.2817, Adjusted R-squared:  0.2574 
## F-statistic: 11.57 on 6 and 177 DF,  p-value: 6.52e-11

Log of LenRes:

log.fit2=lm(SpendRat ~ CollGifts+BricMortar+MarthaHome+ThemeColl+
              LenRes+log1p(LenRes), data = d1a)

summary (log.fit2)
## 
## Call:
## lm(formula = SpendRat ~ CollGifts + BricMortar + MarthaHome + 
##     ThemeColl + LenRes + log1p(LenRes), data = d1a)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -98.041 -30.635  -7.027  13.471 273.185 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept)    -0.7191    29.2642  -0.025 0.980424    
## CollGifts      29.7037     9.3278   3.184 0.001714 ** 
## BricMortar     39.1503     9.8396   3.979 0.000101 ***
## MarthaHome     28.7787     9.3886   3.065 0.002515 ** 
## ThemeColl      26.0015     9.3410   2.784 0.005960 ** 
## LenRes          1.4339     1.1065   1.296 0.196714    
## log1p(LenRes)  -9.0387    16.6129  -0.544 0.587072    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 56.96 on 177 degrees of freedom
## Multiple R-squared:  0.2818, Adjusted R-squared:  0.2574 
## F-statistic: 11.57 on 6 and 177 DF,  p-value: 6.481e-11
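
Each transformed term above was added alongside the linear LenRes term, so each of these fits contains fit2 as a special case and the two can be compared with a nested F-test. When a single term is added, that F-statistic is the square of the term's t value in the summary, so the non-significant t values above already answer the question. A minimal sketch on the built-in `mtcars` data, with `wt` standing in for the continuous predictor (an assumption for illustration):

```r
# Illustration on mtcars (not the exam data).
fit_linear <- lm(mpg ~ wt, data = mtcars)
fit_quad   <- lm(mpg ~ wt + I(wt^2), data = mtcars)

# Nested-model F-test: does the squared term improve the fit?
cmp <- anova(fit_linear, fit_quad)
f_stat <- cmp[2, "F"]
p_f    <- cmp[2, "Pr(>F)"]

# The same test, read from the coefficient table of the larger model.
coef_row <- summary(fit_quad)$coefficients["I(wt^2)", ]
t_stat <- unname(coef_row["t value"])
p_t    <- unname(coef_row["Pr(>|t|)"])

print(c(F = f_stat, t_squared = t_stat^2, p_from_F = p_f, p_from_t = p_t))
```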

Stepwise selection with log1p(LenRes) in the scope:

log_intercept_only <- lm(SpendRat ~ 1, data = d1a)

all1 <- lm(SpendRat ~ . + log1p(LenRes), data = d1a)

log_fwd_model <- step(log_intercept_only, direction = "forward", scope=formula(all1), trace = 0)

log_fwd_model$anova
##           Step Df Deviance Resid. Df Resid. Dev      AIC
## 1              NA       NA       183   799534.7 1543.340
## 2 + MarthaHome -1 89197.96       182   710336.8 1523.575
## 3  + CollGifts -1 53987.21       181   656349.5 1511.030
## 4 + BricMortar -1 43810.72       180   612538.8 1500.319
## 5  + ThemeColl -1 23427.28       179   589111.5 1495.144
## 6     + LenRes -1 13895.59       178   575216.0 1492.752
log_fwd_model$coefficients
## (Intercept)  MarthaHome   CollGifts  BricMortar   ThemeColl      LenRes 
## -15.7432470  28.6728771  30.3193733  39.2957330  25.5835398   0.8778631
log_bwd_model <- step(all1, direction = "backward", scope=formula(all1), trace = 0)

log_bwd_model$anova
##               Step Df   Deviance Resid. Df Resid. Dev      AIC
## 1                  NA         NA       162   551074.8 1516.863
## 2         - SunAds  1   50.70110       163   551125.5 1514.880
## 3        - WlthIdx  1   63.00329       164   551188.5 1512.901
## 4        - LongLiq  1   95.15733       165   551283.7 1510.933
## 5         - Income  1  619.04464       166   551902.7 1509.139
## 6       - SpendVol  1  566.49222       167   552469.2 1507.328
## 7     - RetailKids  1 1122.29669       168   553591.5 1505.701
## 8        - CustDec  1 1112.83280       169   554704.3 1504.071
## 9    - CountryColl  1 1226.51877       170   555930.8 1502.477
## 10 - log1p(LenRes)  1 1896.07435       171   557826.9 1501.104
## 11     - SecAssets  1 1750.98553       172   559577.9 1499.680
## 12      - TotAsset  1 2601.22797       173   562179.1 1498.534
## 13      - ShortLiq  1 1368.05564       174   563547.2 1496.981
## 14       - SpenVel  1 1816.59246       175   565363.8 1495.573
## 15     - Carlovers  1 2105.01607       176   567468.8 1494.257
## 16           - Age  1 3604.67064       177   571073.5 1493.422
## 17        - TeenWr  1 4142.49616       178   575216.0 1492.752
log_bwd_model$coefficients
## (Intercept)      LenRes   CollGifts  BricMortar  MarthaHome   ThemeColl 
## -15.7432470   0.8778631  30.3193733  39.2957330  28.6728771  25.5835398
log_both_model <- step(log_intercept_only, direction = "both", scope=formula(all1), trace = 0)

log_both_model$anova
##           Step Df Deviance Resid. Df Resid. Dev      AIC
## 1              NA       NA       183   799534.7 1543.340
## 2 + MarthaHome -1 89197.96       182   710336.8 1523.575
## 3  + CollGifts -1 53987.21       181   656349.5 1511.030
## 4 + BricMortar -1 43810.72       180   612538.8 1500.319
## 5  + ThemeColl -1 23427.28       179   589111.5 1495.144
## 6     + LenRes -1 13895.59       178   575216.0 1492.752
log_both_model$coefficients
## (Intercept)  MarthaHome   CollGifts  BricMortar   ThemeColl      LenRes 
## -15.7432470  28.6728771  30.3193733  39.2957330  25.5835398   0.8778631

Plots of the log1p(LenRes) models:

plot(log_fwd_model)

plot(log_bwd_model)

plot(log_both_model)

plot(log.fit2)

AICc of all models:

log.model.set <- list(log.fit2, log_both_model, log_fwd_model,log_bwd_model, all1)

log.model.names <- c("log.fit2", "log_both_model", "log_fwd_model","log_bwd_model", "all1")

aictab(log.model.set, modnames = log.model.names)
## Warning in aictab.AIClm(log.model.set, modnames = log.model.names): 
## Check model structure carefully as some models may be redundant
## 
## Model selection based on AICc:
## 
##                 K    AICc Delta_AICc AICcWt Cum.Wt       LL
## log_both_model  7 2017.56       0.00   0.29   0.29 -1001.46
## log_fwd_model   7 2017.56       0.00   0.29   0.59 -1001.46
## log_bwd_model   7 2017.56       0.00   0.29   0.88 -1001.46
## log.fit2        8 2019.44       1.88   0.12   1.00 -1001.31
## all1           23 2047.93      30.37   0.00   1.00  -997.52

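With log1p(LenRes) added to the scope, all three stepwise procedures dropped the log term and returned exactly the original fit2 model (same coefficients, AICc = 2017.56). Their AICc weight of 0.29 each reflects three identical models splitting the weight, which is what the aictab warning about redundant models points to, so these are not better than the original linear model. log.fit2, which keeps the log term, has a higher AICc (2019.44, Delta AICc = 1.88), and none of the squared, square-root, or log terms is significant. The outliers are much the same: 88, 90, and 161. The backward and forward model plots flagged 62 instead of 161.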
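
The effect of listing the same model three times can be seen by recomputing the weights from the Delta_AICc column with and without the duplicates. This is a sketch using the printed deltas; `akaike_weights` is a helper name introduced here.

```r
# Akaike weights from AICc differences.
akaike_weights <- function(delta) {
  rel <- exp(-delta / 2)
  rel / sum(rel)
}

# Delta_AICc values as printed in the table above.
delta_all <- c(log_both_model = 0, log_fwd_model = 0, log_bwd_model = 0,
               log.fit2 = 1.88, all1 = 30.37)
w_all <- akaike_weights(delta_all)

# Keep one copy of the three identical stepwise models.
delta_unique <- delta_all[c("log_both_model", "log.fit2", "all1")]
w_unique <- akaike_weights(delta_unique)

print(round(w_all, 2))
print(round(w_unique, 2))
```

With the duplicates removed, the stepwise model's weight is about 0.72 rather than 0.29, so the low weight in the table is an artifact of the redundant entries, not evidence against the model.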

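  1. Comment on the results obtained. How well do these models fit the data? Can we use any of them to predict the spending ratio?
    I think that the fit2 model does the best job of the models compared: according to the AICcWt it carries 75% of the relative support among the three candidates. It can be used to predict the spending ratio, with the caveat that the fit is modest (R-squared = 0.2806, residual standard error = 56.85), so individual predictions will be rough. The fitted equation is Spending Ratio = 30.32 CollGifts + 39.30 BricMortar + 28.67 MarthaHome + 25.58 ThemeColl + 0.88 LenRes - 15.74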
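
As a worked example of using the fit2 equation, the sketch below plugs a made-up customer into the coefficients printed in the fit2 summary. The customer profile (buys from CollGifts and MarthaHome, 10 years at the current residence) is hypothetical, and the band of plus or minus two residual standard errors is only a rough guide that ignores uncertainty in the coefficients.

```r
# Coefficients copied from the fit2 summary output.
coefs <- c("(Intercept)" = -15.7432, CollGifts = 30.3194, BricMortar = 39.2957,
           MarthaHome = 28.6729, ThemeColl = 25.5835, LenRes = 0.8779)

# Hypothetical customer (values invented for illustration).
customer <- c("(Intercept)" = 1, CollGifts = 1, BricMortar = 0,
              MarthaHome = 1, ThemeColl = 0, LenRes = 10)

# Predicted spending ratio: sum of coefficient times value.
predicted <- sum(coefs * customer[names(coefs)])

# Rough band of +/- 2 residual standard errors (56.85 from the summary).
rough_band <- predicted + c(-2, 2) * 56.85

print(round(predicted, 2))
print(round(rough_band, 2))
```

The width of the band relative to the prediction shows how imprecise a single prediction is with a residual standard error of 56.85.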

  2. Explore the data graphically in order to investigate the association between income and the other features. Which of the other features seem most likely to be useful in predicting income? Scatterplots and boxplots may be useful tools to answer this question. Describe your findings.

Package your code and responses into a single pdf and upload it to canvas. It has been my absolute pleasure. Best of luck next semester.

adult <- read_csv("adult.csv", na = "?")
## Rows: 32561 Columns: 15
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (9): work_class, education, marital_status, occupation, relationship, ra...
## dbl (6): age, wgt, education_num, capital_gain, capital_loss, hours_per_week
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
create_report(adult)
## 
## 
## processing file: report.rmd
## output file: C:/Users/aliso/Documents/UTSA/Statisctical Modeling/Final Exam/report.knit.md
## "C:/Program Files/RStudio/resources/app/bin/quarto/bin/tools/pandoc" +RTS -K512m -RTS "C:\Users\aliso\DOCUME~1\UTSA\STATIS~1\FINALE~1\REPORT~1.MD" --to html4 --from markdown+autolink_bare_uris+tex_math_single_backslash --output pandoc7de46f8fba2.html --lua-filter "C:\Users\aliso\AppData\Local\R\win-library\4.4\rmarkdown\rmarkdown\lua\pagebreak.lua" --lua-filter "C:\Users\aliso\AppData\Local\R\win-library\4.4\rmarkdown\rmarkdown\lua\latex-div.lua" --embed-resources --standalone --variable bs3=TRUE --section-divs --table-of-contents --toc-depth 6 --template "C:\Users\aliso\AppData\Local\R\win-library\4.4\rmarkdown\rmd\h\default.html" --no-highlight --variable highlightjs=1 --variable theme=yeti --mathjax --variable "mathjax-url=https://mathjax.rstudio.com/latest/MathJax.js?config=TeX-AMS-MML_HTMLorMML" --include-in-header "C:\Users\aliso\AppData\Local\Temp\RtmpQrnDjK\rmarkdown-str7de47f8371a0.html"
## 
## Output created: report.html
library(tidyr)
clean_adult <- drop_na(adult)
dim(clean_adult)
## [1] 30162    15
str(clean_adult)
## tibble [30,162 × 15] (S3: tbl_df/tbl/data.frame)
##  $ age           : num [1:30162] 39 50 38 53 28 37 49 52 31 42 ...
##  $ work_class    : chr [1:30162] "State-gov" "Self-emp-not-inc" "Private" "Private" ...
##  $ wgt           : num [1:30162] 77516 83311 215646 234721 338409 ...
##  $ education     : chr [1:30162] "Bachelors" "Bachelors" "HS-grad" "11th" ...
##  $ education_num : num [1:30162] 13 13 9 7 13 14 5 9 14 13 ...
##  $ marital_status: chr [1:30162] "Never-married" "Married-civ-spouse" "Divorced" "Married-civ-spouse" ...
##  $ occupation    : chr [1:30162] "Adm-clerical" "Exec-managerial" "Handlers-cleaners" "Handlers-cleaners" ...
##  $ relationship  : chr [1:30162] "Not-in-family" "Husband" "Not-in-family" "Husband" ...
##  $ race          : chr [1:30162] "White" "White" "White" "Black" ...
##  $ sex           : chr [1:30162] "Male" "Male" "Male" "Male" ...
##  $ capital_gain  : num [1:30162] 2174 0 0 0 0 ...
##  $ capital_loss  : num [1:30162] 0 0 0 0 0 0 0 0 0 0 ...
##  $ hours_per_week: num [1:30162] 40 13 40 40 40 40 16 45 50 40 ...
##  $ native_country: chr [1:30162] "United-States" "United-States" "United-States" "United-States" ...
##  $ income        : chr [1:30162] "<=50K" "<=50K" "<=50K" "<=50K" ...
plot_missing(clean_adult)

plot_correlation(clean_adult)
## 1 features with more than 20 categories ignored!
## native_country: 41 categories

plot_boxplot(clean_adult, by = "income")

plot_scatterplot(clean_adult, by = "income")

hrs_age_plot <- ggplot(clean_adult, aes(age,hours_per_week, color = income )) + geom_point(alpha = 0.5) +
  labs(title = "Hrs per week vs Age by Income Group", x="Age", y = "Hrs per week")

print(hrs_age_plot)

# The trailing `+` attaches labs() to the plot; without it the labels are dropped
hrs_edu_plot <- ggplot(clean_adult, aes(hours_per_week, education_num, color = income)) + geom_point(alpha = 0.5) +
  labs(title = "Education Years vs Hrs per week by Income", x = "Hrs per week", y = "Education Years")
print(hrs_edu_plot)

# The `+` after theme_minimal() keeps labs() and theme() part of the plot;
# geom_bar() draws counts, so the y label says so
Income_Ed <- ggplot(clean_adult, aes(x = education, fill = income)) +
  geom_bar(position = "dodge") + coord_flip() + theme_minimal() +
  labs(title = "Income vs. Education", y = "Count") +
  theme(axis.text.x = element_text(angle = 90, hjust = 1))
print(Income_Ed)

library(dplyr)

# Convert categorical variables to factors (all nine character columns)
clean_adult <- clean_adult %>%
  mutate(across(where(is.character), as.factor))
Work_Class <- ggplot(clean_adult, aes(x = work_class, fill = income)) +
  geom_bar(position = "dodge") + coord_flip() +
  labs(title = "Income vs. Work Class", y = "Count") +
  theme(axis.text.x = element_text(angle = 90, hjust = 1))

print(Work_Class)

Marital_Status <- ggplot(clean_adult, aes(x = marital_status, fill = income)) +
  geom_bar(position = "dodge") + coord_flip() +
  labs(title = "Income vs. Marital Status", y = "Count") +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))

print(Marital_Status)

Occupation <- ggplot(clean_adult, aes(x = occupation, fill = income)) +
  geom_bar(position = "dodge") + coord_flip() +
  labs(title = "Income vs. Occupation", y = "Count") +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))

print(Occupation)

native_country_nonUS <- filter(clean_adult, 
                               native_country != "United-States")
native_country_nonUS <- ggplot(native_country_nonUS, aes(x = native_country, fill = income)) +
  geom_bar(position = "dodge")  + 
  labs(title = "Income vs. native_country", y = "Count") + 
  theme(axis.text.x = element_text(angle = 90, hjust = 1))

print(native_country_nonUS)

relationship <- ggplot(clean_adult, aes(x = relationship, fill = income)) +
  geom_bar(position = "dodge") + coord_flip() +
  labs(title = "Income vs. relationship", y = "Count") +
  theme(axis.text.y = element_text(angle = 0, hjust = 1))

print(relationship)

race <- ggplot(clean_adult, aes(x = race, fill = income)) +
  geom_bar(position = "dodge")+
  labs(title = "Income vs. race", y = "Count") +
  theme(axis.text.y = element_text(angle = 0, hjust = 1))

print(race)

non_white <- filter(clean_adult, race != "White")
non_white <- ggplot(non_white, aes(x = race, fill = income)) +
  geom_bar(position = "dodge")+
  labs(title = "Income vs. non-white race", y = "Count") +
  theme(axis.text.y = element_text(angle = 0, hjust = 1))

print(non_white)

gender <- ggplot(clean_adult, aes(x = sex, fill = income)) +
  geom_bar(position = "dodge")+
  labs(title = "Income vs. gender", y = "Count") +
  theme(axis.text.y = element_text(angle = 0, hjust = 1))

print(gender)

IncomeAge <- ggplot(clean_adult, aes(x = income, y = age, fill = income)) +
  geom_boxplot() +
  labs(title = "Income vs. Age")

print(IncomeAge)
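
Because `geom_bar(position = "dodge")` draws raw counts, large groups can dominate the bar charts. If within-group proportions are of interest, a row-wise proportion table is one option. The sketch below uses the built-in `mtcars` data as a stand-in (an assumption for illustration: `am` plays the role of income and `cyl` the role of a categorical predictor), so it runs without adult.csv.

```r
# Built-in mtcars stands in for the adult data (illustrative assumption)
counts <- table(cyl = mtcars$cyl, am = mtcars$am)

# Row-wise proportions: within each cyl level, the share of each am class
props <- prop.table(counts, margin = 1)
print(round(props, 2))

# Stacked bars scaled to 1, the base-R analogue of geom_bar(position = "fill")
barplot(t(props), legend.text = TRUE,
        xlab = "cyl", ylab = "Proportion", main = "Share of am within cyl")
```

In ggplot2, replacing `position = "dodge"` with `position = "fill"` should give the same kind of proportion view for the plots above.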

3. Perform logistic regression in order to predict income using all the variables that seemed most associated with income in (b) (exclude wgt; you don’t need it). What is the R^2 of the model obtained?

Tjur’s R2: 0.4480447

Logistic regression with all variables except wgt:

model <- glm(income ~ . - wgt, data = clean_adult, family = binomial)
## Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred
summary(model)
## 
## Call:
## glm(formula = income ~ . - wgt, family = binomial, data = clean_adult)
## 
## Coefficients: (1 not defined because of singularities)
##                                            Estimate Std. Error z value Pr(>|z|)
## (Intercept)                              -6.251e+00  7.634e-01  -8.189 2.63e-16
## age                                       2.510e-02  1.709e-03  14.689  < 2e-16
## work_classLocal-gov                      -6.954e-01  1.129e-01  -6.161 7.22e-10
## work_classPrivate                        -5.006e-01  9.369e-02  -5.343 9.12e-08
## work_classSelf-emp-inc                   -3.318e-01  1.237e-01  -2.681 0.007337
## work_classSelf-emp-not-inc               -9.958e-01  1.099e-01  -9.057  < 2e-16
## work_classState-gov                      -8.190e-01  1.253e-01  -6.535 6.37e-11
## work_classWithout-pay                    -1.329e+01  1.970e+02  -0.067 0.946234
## education11th                             8.585e-02  2.138e-01   0.402 0.688013
## education12th                             4.361e-01  2.780e-01   1.568 0.116768
## education1st-4th                         -4.378e-01  4.963e-01  -0.882 0.377746
## education5th-6th                         -4.163e-01  3.592e-01  -1.159 0.246485
## education7th-8th                         -5.731e-01  2.434e-01  -2.354 0.018554
## education9th                             -2.457e-01  2.702e-01  -0.909 0.363237
## educationAssoc-acdm                       1.264e+00  1.797e-01   7.035 2.00e-12
## educationAssoc-voc                        1.255e+00  1.728e-01   7.259 3.90e-13
## educationBachelors                        1.890e+00  1.607e-01  11.762  < 2e-16
## educationDoctorate                        2.930e+00  2.230e-01  13.144  < 2e-16
## educationHS-grad                          7.654e-01  1.563e-01   4.896 9.76e-07
## educationMasters                          2.246e+00  1.718e-01  13.076  < 2e-16
## educationPreschool                       -2.008e+01  1.911e+02  -0.105 0.916328
## educationProf-school                      2.832e+00  2.068e-01  13.694  < 2e-16
## educationSome-college                     1.102e+00  1.586e-01   6.946 3.75e-12
## education_num                                    NA         NA      NA       NA
## marital_statusMarried-AF-spouse           2.778e+00  5.770e-01   4.814 1.48e-06
## marital_statusMarried-civ-spouse          2.108e+00  2.748e-01   7.672 1.70e-14
## marital_statusMarried-spouse-absent       2.929e-03  2.408e-01   0.012 0.990292
## marital_statusNever-married              -4.834e-01  8.922e-02  -5.418 6.03e-08
## marital_statusSeparated                  -7.598e-02  1.654e-01  -0.459 0.645914
## marital_statusWidowed                     1.818e-01  1.582e-01   1.150 0.250240
## occupationArmed-Forces                   -1.106e+00  1.515e+00  -0.730 0.465401
## occupationCraft-repair                    6.031e-02  8.072e-02   0.747 0.454997
## occupationExec-managerial                 8.016e-01  7.788e-02  10.293  < 2e-16
## occupationFarming-fishing                -1.004e+00  1.408e-01  -7.129 1.01e-12
## occupationHandlers-cleaners              -6.983e-01  1.447e-01  -4.826 1.39e-06
## occupationMachine-op-inspct              -2.671e-01  1.026e-01  -2.602 0.009274
## occupationOther-service                  -8.359e-01  1.191e-01  -7.019 2.24e-12
## occupationPriv-house-serv                -4.213e+00  1.727e+00  -2.439 0.014709
## occupationProf-specialty                  5.150e-01  8.245e-02   6.247 4.19e-10
## occupationProtective-serv                 5.999e-01  1.262e-01   4.753 2.00e-06
## occupationSales                           2.914e-01  8.313e-02   3.505 0.000456
## occupationTech-support                    6.615e-01  1.117e-01   5.923 3.16e-09
## occupationTransport-moving               -9.265e-02  1.000e-01  -0.926 0.354209
## relationshipNot-in-family                 4.605e-01  2.717e-01   1.695 0.090154
## relationshipOther-relative               -3.920e-01  2.477e-01  -1.583 0.113524
## relationshipOwn-child                    -7.317e-01  2.708e-01  -2.703 0.006881
## relationshipUnmarried                     3.394e-01  2.874e-01   1.181 0.237651
## relationshipWife                          1.350e+00  1.056e-01  12.786  < 2e-16
## raceAsian-Pac-Islander                    8.378e-01  2.858e-01   2.932 0.003369
## raceBlack                                 5.240e-01  2.398e-01   2.185 0.028869
## raceOther                                 1.593e-01  3.791e-01   0.420 0.674265
## raceWhite                                 6.337e-01  2.287e-01   2.770 0.005599
## sexMale                                   8.728e-01  8.084e-02  10.796  < 2e-16
## capital_gain                              3.229e-04  1.074e-05  30.067  < 2e-16
## capital_loss                              6.406e-04  3.840e-05  16.679  < 2e-16
## hours_per_week                            2.931e-02  1.701e-03  17.229  < 2e-16
## native_countryCanada                     -8.387e-01  6.893e-01  -1.217 0.223734
## native_countryChina                      -1.918e+00  7.034e-01  -2.727 0.006387
## native_countryColumbia                   -3.285e+00  1.032e+00  -3.183 0.001457
## native_countryCuba                       -7.665e-01  7.031e-01  -1.090 0.275646
## native_countryDominican-Republic         -2.950e+00  1.220e+00  -2.418 0.015627
## native_countryEcuador                    -1.427e+00  9.563e-01  -1.492 0.135628
## native_countryEl-Salvador                -1.756e+00  7.939e-01  -2.212 0.026958
## native_countryEngland                    -8.930e-01  7.009e-01  -1.274 0.202589
## native_countryFrance                     -5.944e-01  8.128e-01  -0.731 0.464586
## native_countryGermany                    -7.193e-01  6.785e-01  -1.060 0.289141
## native_countryGreece                     -2.199e+00  8.385e-01  -2.623 0.008724
## native_countryGuatemala                  -1.382e+00  9.770e-01  -1.415 0.157206
## native_countryHaiti                      -1.225e+00  9.284e-01  -1.320 0.186888
## native_countryHoland-Netherlands         -1.180e+01  8.827e+02  -0.013 0.989339
## native_countryHonduras                   -2.361e+00  2.685e+00  -0.879 0.379144
## native_countryHong                       -1.334e+00  8.990e-01  -1.484 0.137761
## native_countryHungary                    -1.306e+00  9.884e-01  -1.321 0.186544
## native_countryIndia                      -1.696e+00  6.689e-01  -2.536 0.011217
## native_countryIran                       -1.167e+00  7.583e-01  -1.539 0.123898
## native_countryIreland                    -6.878e-01  8.886e-01  -0.774 0.438911
## native_countryItaly                      -3.635e-01  7.098e-01  -0.512 0.608628
## native_countryJamaica                    -1.190e+00  7.712e-01  -1.543 0.122726
## native_countryJapan                      -9.626e-01  7.286e-01  -1.321 0.186428
## native_countryLaos                       -1.846e+00  1.041e+00  -1.773 0.076226
## native_countryMexico                     -1.610e+00  6.652e-01  -2.421 0.015493
## native_countryNicaragua                  -1.781e+00  1.015e+00  -1.756 0.079117
## native_countryOutlying-US(Guam-USVI-etc) -1.341e+01  2.110e+02  -0.064 0.949340
## native_countryPeru                       -1.952e+00  1.057e+00  -1.848 0.064620
## native_countryPhilippines                -8.900e-01  6.446e-01  -1.381 0.167361
## native_countryPoland                     -1.187e+00  7.453e-01  -1.593 0.111173
## native_countryPortugal                   -1.189e+00  8.859e-01  -1.342 0.179602
## native_countryPuerto-Rico                -1.471e+00  7.382e-01  -1.992 0.046345
## native_countryScotland                   -1.444e+00  1.084e+00  -1.332 0.182779
## native_countrySouth                      -2.470e+00  7.367e-01  -3.353 0.000800
## native_countryTaiwan                     -1.394e+00  7.547e-01  -1.847 0.064682
## native_countryThailand                   -1.840e+00  1.017e+00  -1.809 0.070439
## native_countryTrinadad&Tobago            -1.626e+00  1.058e+00  -1.537 0.124348
## native_countryUnited-States              -9.990e-01  6.307e-01  -1.584 0.113230
## native_countryVietnam                    -2.429e+00  8.467e-01  -2.869 0.004117
## native_countryYugoslavia                 -4.723e-01  9.190e-01  -0.514 0.607310
##                                             
## (Intercept)                              ***
## age                                      ***
## work_classLocal-gov                      ***
## work_classPrivate                        ***
## work_classSelf-emp-inc                   ** 
## work_classSelf-emp-not-inc               ***
## work_classState-gov                      ***
## work_classWithout-pay                       
## education11th                               
## education12th                               
## education1st-4th                            
## education5th-6th                            
## education7th-8th                         *  
## education9th                                
## educationAssoc-acdm                      ***
## educationAssoc-voc                       ***
## educationBachelors                       ***
## educationDoctorate                       ***
## educationHS-grad                         ***
## educationMasters                         ***
## educationPreschool                          
## educationProf-school                     ***
## educationSome-college                    ***
## education_num                               
## marital_statusMarried-AF-spouse          ***
## marital_statusMarried-civ-spouse         ***
## marital_statusMarried-spouse-absent         
## marital_statusNever-married              ***
## marital_statusSeparated                     
## marital_statusWidowed                       
## occupationArmed-Forces                      
## occupationCraft-repair                      
## occupationExec-managerial                ***
## occupationFarming-fishing                ***
## occupationHandlers-cleaners              ***
## occupationMachine-op-inspct              ** 
## occupationOther-service                  ***
## occupationPriv-house-serv                *  
## occupationProf-specialty                 ***
## occupationProtective-serv                ***
## occupationSales                          ***
## occupationTech-support                   ***
## occupationTransport-moving                  
## relationshipNot-in-family                .  
## relationshipOther-relative                  
## relationshipOwn-child                    ** 
## relationshipUnmarried                       
## relationshipWife                         ***
## raceAsian-Pac-Islander                   ** 
## raceBlack                                *  
## raceOther                                   
## raceWhite                                ** 
## sexMale                                  ***
## capital_gain                             ***
## capital_loss                             ***
## hours_per_week                           ***
## native_countryCanada                        
## native_countryChina                      ** 
## native_countryColumbia                   ** 
## native_countryCuba                          
## native_countryDominican-Republic         *  
## native_countryEcuador                       
## native_countryEl-Salvador                *  
## native_countryEngland                       
## native_countryFrance                        
## native_countryGermany                       
## native_countryGreece                     ** 
## native_countryGuatemala                     
## native_countryHaiti                         
## native_countryHoland-Netherlands            
## native_countryHonduras                      
## native_countryHong                          
## native_countryHungary                       
## native_countryIndia                      *  
## native_countryIran                          
## native_countryIreland                       
## native_countryItaly                         
## native_countryJamaica                       
## native_countryJapan                         
## native_countryLaos                       .  
## native_countryMexico                     *  
## native_countryNicaragua                  .  
## native_countryOutlying-US(Guam-USVI-etc)    
## native_countryPeru                       .  
## native_countryPhilippines                   
## native_countryPoland                        
## native_countryPortugal                      
## native_countryPuerto-Rico                *  
## native_countryScotland                      
## native_countrySouth                      ***
## native_countryTaiwan                     .  
## native_countryThailand                   .  
## native_countryTrinadad&Tobago               
## native_countryUnited-States                 
## native_countryVietnam                    ** 
## native_countryYugoslavia                    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 33851  on 30161  degrees of freedom
## Residual deviance: 19504  on 30067  degrees of freedom
## AIC: 19694
## 
## Number of Fisher Scoring iterations: 13
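
The report does not show how the Tjur's R2 value was obtained. As a minimal sketch, Tjur's R2 is the difference between the mean fitted probability among observed 1s and among observed 0s. The helper `tjur_r2` below is written for illustration, and the model is fit on the built-in `mtcars` data rather than on `clean_adult`, so the number it prints is unrelated to the 0.448 reported above.

```r
# Tjur's coefficient of discrimination for a binomial glm
tjur_r2 <- function(model) {
  p <- fitted(model)   # predicted probabilities
  y <- model$y         # 0/1 response as coded internally by glm
  mean(p[y == 1]) - mean(p[y == 0])
}

# Stand-in model on built-in data (illustrative assumption)
fit <- glm(am ~ wt + hp, data = mtcars, family = binomial)
r2 <- tjur_r2(fit)
print(r2)
```

Calling `tjur_r2(model)` on a model fit to `clean_adult` should work the same way; the `performance` package also provides `r2_tjur()` if an installed package is preferred.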

Logistic regression with age, occupation, education, hours_per_week, marital_status:

model2 <- glm(income ~ age + occupation + education + hours_per_week + marital_status, data = clean_adult, family = binomial)

summary(model2)
## 
## Call:
## glm(formula = income ~ age + occupation + education + hours_per_week + 
##     marital_status, family = binomial, data = clean_adult)
## 
## Coefficients:
##                                       Estimate Std. Error z value Pr(>|z|)    
## (Intercept)                          -6.336393   0.196041 -32.322  < 2e-16 ***
## age                                   0.029400   0.001553  18.934  < 2e-16 ***
## occupationArmed-Forces               -0.480258   1.271359  -0.378 0.705615    
## occupationCraft-repair                0.009334   0.071178   0.131 0.895664    
## occupationExec-managerial             0.789590   0.069096  11.428  < 2e-16 ***
## occupationFarming-fishing            -1.237178   0.126702  -9.764  < 2e-16 ***
## occupationHandlers-cleaners          -0.735531   0.135207  -5.440 5.33e-08 ***
## occupationMachine-op-inspct          -0.323060   0.094361  -3.424 0.000618 ***
## occupationOther-service              -0.992080   0.110795  -8.954  < 2e-16 ***
## occupationPriv-house-serv            -2.952415   1.143405  -2.582 0.009819 ** 
## occupationProf-specialty              0.449912   0.074055   6.075 1.24e-09 ***
## occupationProtective-serv             0.404928   0.113168   3.578 0.000346 ***
## occupationSales                       0.260351   0.073111   3.561 0.000369 ***
## occupationTech-support                0.618523   0.103320   5.986 2.14e-09 ***
## occupationTransport-moving           -0.156920   0.091232  -1.720 0.085432 .  
## education11th                         0.126298   0.204964   0.616 0.537766    
## education12th                         0.441703   0.259745   1.701 0.089032 .  
## education1st-4th                     -0.658946   0.456660  -1.443 0.149029    
## education5th-6th                     -0.550468   0.339213  -1.623 0.104636    
## education7th-8th                     -0.647111   0.234530  -2.759 0.005795 ** 
## education9th                         -0.361168   0.260739  -1.385 0.166000    
## educationAssoc-acdm                   1.386957   0.170289   8.145 3.80e-16 ***
## educationAssoc-voc                    1.356264   0.163669   8.287  < 2e-16 ***
## educationBachelors                    2.020826   0.152548  13.247  < 2e-16 ***
## educationDoctorate                    3.006100   0.208265  14.434  < 2e-16 ***
## educationHS-grad                      0.828217   0.148671   5.571 2.54e-08 ***
## educationMasters                      2.375606   0.162058  14.659  < 2e-16 ***
## educationPreschool                  -11.390495 109.217823  -0.104 0.916938    
## educationProf-school                  3.092973   0.193644  15.972  < 2e-16 ***
## educationSome-college                 1.163966   0.150768   7.720 1.16e-14 ***
## hours_per_week                        0.030994   0.001565  19.809  < 2e-16 ***
## marital_statusMarried-AF-spouse       2.894988   0.498793   5.804 6.48e-09 ***
## marital_statusMarried-civ-spouse      2.139345   0.058433  36.612  < 2e-16 ***
## marital_statusMarried-spouse-absent  -0.049302   0.213693  -0.231 0.817538    
## marital_statusNever-married          -0.438015   0.075954  -5.767 8.08e-09 ***
## marital_statusSeparated              -0.070958   0.146292  -0.485 0.627644    
## marital_statusWidowed                -0.006857   0.139514  -0.049 0.960800    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 33851  on 30161  degrees of freedom
## Residual deviance: 22009  on 30125  degrees of freedom
## AIC: 22083
## 
## Number of Fisher Scoring iterations: 13

Logistic regression with age, occupation, education:

model3 <- glm(income ~ age + occupation + education, data = clean_adult, family = binomial)

summary(model3)
## 
## Call:
## glm(formula = income ~ age + occupation + education, family = binomial, 
##     data = clean_adult)
## 
## Coefficients:
##                               Estimate Std. Error z value Pr(>|z|)    
## (Intercept)                  -4.883248   0.162226 -30.101  < 2e-16 ***
## age                           0.043744   0.001244  35.155  < 2e-16 ***
## occupationArmed-Forces        0.046730   1.119509   0.042  0.96670    
## occupationCraft-repair        0.880111   0.064463  13.653  < 2e-16 ***
## occupationExec-managerial     1.381802   0.061124  22.607  < 2e-16 ***
## occupationFarming-fishing    -0.087277   0.116397  -0.750  0.45336    
## occupationHandlers-cleaners  -0.260250   0.127232  -2.045  0.04081 *  
## occupationMachine-op-inspct   0.346618   0.087538   3.960 7.51e-05 ***
## occupationOther-service      -0.952888   0.104044  -9.159  < 2e-16 ***
## occupationPriv-house-serv    -2.878147   1.010065  -2.849  0.00438 ** 
## occupationProf-specialty      0.782144   0.065519  11.938  < 2e-16 ***
## occupationProtective-serv     1.150886   0.101883  11.296  < 2e-16 ***
## occupationSales               0.827292   0.064493  12.828  < 2e-16 ***
## occupationTech-support        0.913039   0.091077  10.025  < 2e-16 ***
## occupationTransport-moving    0.791922   0.083560   9.477  < 2e-16 ***
## education11th                 0.020604   0.197805   0.104  0.91704    
## education12th                 0.363583   0.246167   1.477  0.13968    
## education1st-4th             -0.678134   0.450170  -1.506  0.13197    
## education5th-6th             -0.514083   0.332777  -1.545  0.12239    
## education7th-8th             -0.516120   0.228077  -2.263  0.02364 *  
## education9th                 -0.273016   0.253884  -1.075  0.28222    
## educationAssoc-acdm           1.443970   0.161296   8.952  < 2e-16 ***
## educationAssoc-voc            1.450754   0.156588   9.265  < 2e-16 ***
## educationBachelors            2.030138   0.146431  13.864  < 2e-16 ***
## educationDoctorate            3.125040   0.191120  16.351  < 2e-16 ***
## educationHS-grad              0.918808   0.143684   6.395 1.61e-10 ***
## educationMasters              2.335259   0.153560  15.208  < 2e-16 ***
## educationPreschool          -10.599256  73.002880  -0.145  0.88456    
## educationProf-school          3.306028   0.178654  18.505  < 2e-16 ***
## educationSome-college         1.191885   0.145366   8.199 2.42e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 33851  on 30161  degrees of freedom
## Residual deviance: 27018  on 30132  degrees of freedom
## AIC: 27078
## 
## Number of Fisher Scoring iterations: 12
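
The summaries so far report AIC values of 19694, 22083, and 27078, and the smaller models are nested within the larger ones. One way to compare such models side by side is `AIC()` together with a likelihood-ratio test from `anova()`. The sketch below uses two small models on the built-in `mtcars` data as stand-ins (an assumption for illustration), so it runs without adult.csv.

```r
# Two nested logistic models on built-in data (illustrative stand-ins)
small <- glm(am ~ wt, data = mtcars, family = binomial)
large <- glm(am ~ wt + hp, data = mtcars, family = binomial)

# Lower AIC indicates a better trade-off between fit and complexity
aic_table <- AIC(small, large)
print(aic_table)

# Likelihood-ratio test: does adding hp reduce the deviance significantly?
lrt <- anova(small, large, test = "Chisq")
print(lrt)
```

The same calls should apply to the models above, for example `AIC(model, model2, model3)` and `anova(model3, model2, test = "Chisq")`.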

Logistic regression with age, occupation, education, hours_per_week, marital_status, race, sex, work_class, capital_gain, capital_loss:

model4 <- glm(income ~ age + occupation + education + hours_per_week + marital_status + race + sex + work_class + capital_gain + capital_loss, data = clean_adult, family = binomial)
## Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred
summary(model4)
## 
## Call:
## glm(formula = income ~ age + occupation + education + hours_per_week + 
##     marital_status + race + sex + work_class + capital_gain + 
##     capital_loss, family = binomial, data = clean_adult)
## 
## Coefficients:
##                                       Estimate Std. Error z value Pr(>|z|)    
## (Intercept)                         -6.538e+00  3.151e-01 -20.747  < 2e-16 ***
## age                                  2.588e-02  1.673e-03  15.472  < 2e-16 ***
## occupationArmed-Forces              -1.166e+00  1.539e+00  -0.757  0.44881    
## occupationCraft-repair              -4.007e-04  7.913e-02  -0.005  0.99596    
## occupationExec-managerial            7.692e-01  7.585e-02  10.142  < 2e-16 ***
## occupationFarming-fishing           -1.072e+00  1.401e-01  -7.652 1.98e-14 ***
## occupationHandlers-cleaners         -7.657e-01  1.436e-01  -5.333 9.65e-08 ***
## occupationMachine-op-inspct         -3.210e-01  1.012e-01  -3.171  0.00152 ** 
## occupationOther-service             -8.447e-01  1.170e-01  -7.217 5.31e-13 ***
## occupationPriv-house-serv           -4.531e+00  1.740e+00  -2.605  0.00920 ** 
## occupationProf-specialty             4.948e-01  8.049e-02   6.148 7.86e-10 ***
## occupationProtective-serv            5.616e-01  1.254e-01   4.481 7.45e-06 ***
## occupationSales                      2.383e-01  8.132e-02   2.931  0.00338 ** 
## occupationTech-support               6.271e-01  1.095e-01   5.726 1.03e-08 ***
## occupationTransport-moving          -1.377e-01  9.892e-02  -1.392  0.16392    
## education11th                        5.680e-02  2.130e-01   0.267  0.78974    
## education12th                        4.173e-01  2.757e-01   1.514  0.13010    
## education1st-4th                    -6.565e-01  4.776e-01  -1.375  0.16927    
## education5th-6th                    -5.799e-01  3.480e-01  -1.666  0.09565 .  
## education7th-8th                    -6.134e-01  2.426e-01  -2.528  0.01146 *  
## education9th                        -2.993e-01  2.692e-01  -1.112  0.26617    
## educationAssoc-acdm                  1.315e+00  1.789e-01   7.347 2.03e-13 ***
## educationAssoc-voc                   1.261e+00  1.721e-01   7.326 2.38e-13 ***
## educationBachelors                   1.920e+00  1.601e-01  11.992  < 2e-16 ***
## educationDoctorate                   2.913e+00  2.220e-01  13.122  < 2e-16 ***
## educationHS-grad                     7.818e-01  1.558e-01   5.017 5.25e-07 ***
## educationMasters                     2.257e+00  1.709e-01  13.204  < 2e-16 ***
## educationPreschool                  -2.084e+01  1.606e+02  -0.130  0.89672    
## educationProf-school                 2.842e+00  2.061e-01  13.786  < 2e-16 ***
## educationSome-college                1.108e+00  1.581e-01   7.007 2.44e-12 ***
## hours_per_week                       2.929e-02  1.680e-03  17.432  < 2e-16 ***
## marital_statusMarried-AF-spouse      3.051e+00  5.044e-01   6.048 1.46e-09 ***
## marital_statusMarried-civ-spouse     2.164e+00  6.785e-02  31.892  < 2e-16 ***
## marital_statusMarried-spouse-absent  1.093e-02  2.349e-01   0.047  0.96290    
## marital_statusNever-married         -5.113e-01  8.386e-02  -6.097 1.08e-09 ***
## marital_statusSeparated             -8.433e-02  1.616e-01  -0.522  0.60174    
## marital_statusWidowed                1.393e-02  1.546e-01   0.090  0.92822    
## raceAsian-Pac-Islander               4.732e-01  2.481e-01   1.907  0.05651 .  
## raceBlack                            5.021e-01  2.365e-01   2.123  0.03373 *  
## raceOther                           -1.302e-01  3.714e-01  -0.351  0.72585    
## raceWhite                            6.270e-01  2.260e-01   2.774  0.00554 ** 
## sexMale                              1.555e-01  5.376e-02   2.893  0.00382 ** 
## work_classLocal-gov                 -7.080e-01  1.118e-01  -6.333 2.41e-10 ***
## work_classPrivate                   -5.065e-01  9.296e-02  -5.448 5.09e-08 ***
## work_classSelf-emp-inc              -3.322e-01  1.235e-01  -2.691  0.00713 ** 
## work_classSelf-emp-not-inc          -9.975e-01  1.094e-01  -9.117  < 2e-16 ***
## work_classState-gov                 -8.342e-01  1.245e-01  -6.698 2.11e-11 ***
## work_classWithout-pay               -1.318e+01  1.985e+02  -0.066  0.94706    
## capital_gain                         3.256e-04  1.064e-05  30.599  < 2e-16 ***
## capital_loss                         6.476e-04  3.824e-05  16.933  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 33851  on 30161  degrees of freedom
## Residual deviance: 19849  on 30112  degrees of freedom
## AIC: 19949
## 
## Number of Fisher Scoring iterations: 13

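Coefficient summary of the full model (every predictor except wgt). education_num is reported as NA because it is a linear combination of terms already in the model, which R flags as a singularity: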

summary(model)
## 
## Call:
## glm(formula = income ~ . - wgt, family = binomial, data = clean_adult)
## 
## Coefficients: (1 not defined because of singularities)
##                                            Estimate Std. Error z value Pr(>|z|)
## (Intercept)                              -6.251e+00  7.634e-01  -8.189 2.63e-16
## age                                       2.510e-02  1.709e-03  14.689  < 2e-16
## work_classLocal-gov                      -6.954e-01  1.129e-01  -6.161 7.22e-10
## work_classPrivate                        -5.006e-01  9.369e-02  -5.343 9.12e-08
## work_classSelf-emp-inc                   -3.318e-01  1.237e-01  -2.681 0.007337
## work_classSelf-emp-not-inc               -9.958e-01  1.099e-01  -9.057  < 2e-16
## work_classState-gov                      -8.190e-01  1.253e-01  -6.535 6.37e-11
## work_classWithout-pay                    -1.329e+01  1.970e+02  -0.067 0.946234
## education11th                             8.585e-02  2.138e-01   0.402 0.688013
## education12th                             4.361e-01  2.780e-01   1.568 0.116768
## education1st-4th                         -4.378e-01  4.963e-01  -0.882 0.377746
## education5th-6th                         -4.163e-01  3.592e-01  -1.159 0.246485
## education7th-8th                         -5.731e-01  2.434e-01  -2.354 0.018554
## education9th                             -2.457e-01  2.702e-01  -0.909 0.363237
## educationAssoc-acdm                       1.264e+00  1.797e-01   7.035 2.00e-12
## educationAssoc-voc                        1.255e+00  1.728e-01   7.259 3.90e-13
## educationBachelors                        1.890e+00  1.607e-01  11.762  < 2e-16
## educationDoctorate                        2.930e+00  2.230e-01  13.144  < 2e-16
## educationHS-grad                          7.654e-01  1.563e-01   4.896 9.76e-07
## educationMasters                          2.246e+00  1.718e-01  13.076  < 2e-16
## educationPreschool                       -2.008e+01  1.911e+02  -0.105 0.916328
## educationProf-school                      2.832e+00  2.068e-01  13.694  < 2e-16
## educationSome-college                     1.102e+00  1.586e-01   6.946 3.75e-12
## education_num                                    NA         NA      NA       NA
## marital_statusMarried-AF-spouse           2.778e+00  5.770e-01   4.814 1.48e-06
## marital_statusMarried-civ-spouse          2.108e+00  2.748e-01   7.672 1.70e-14
## marital_statusMarried-spouse-absent       2.929e-03  2.408e-01   0.012 0.990292
## marital_statusNever-married              -4.834e-01  8.922e-02  -5.418 6.03e-08
## marital_statusSeparated                  -7.598e-02  1.654e-01  -0.459 0.645914
## marital_statusWidowed                     1.818e-01  1.582e-01   1.150 0.250240
## occupationArmed-Forces                   -1.106e+00  1.515e+00  -0.730 0.465401
## occupationCraft-repair                    6.031e-02  8.072e-02   0.747 0.454997
## occupationExec-managerial                 8.016e-01  7.788e-02  10.293  < 2e-16
## occupationFarming-fishing                -1.004e+00  1.408e-01  -7.129 1.01e-12
## occupationHandlers-cleaners              -6.983e-01  1.447e-01  -4.826 1.39e-06
## occupationMachine-op-inspct              -2.671e-01  1.026e-01  -2.602 0.009274
## occupationOther-service                  -8.359e-01  1.191e-01  -7.019 2.24e-12
## occupationPriv-house-serv                -4.213e+00  1.727e+00  -2.439 0.014709
## occupationProf-specialty                  5.150e-01  8.245e-02   6.247 4.19e-10
## occupationProtective-serv                 5.999e-01  1.262e-01   4.753 2.00e-06
## occupationSales                           2.914e-01  8.313e-02   3.505 0.000456
## occupationTech-support                    6.615e-01  1.117e-01   5.923 3.16e-09
## occupationTransport-moving               -9.265e-02  1.000e-01  -0.926 0.354209
## relationshipNot-in-family                 4.605e-01  2.717e-01   1.695 0.090154
## relationshipOther-relative               -3.920e-01  2.477e-01  -1.583 0.113524
## relationshipOwn-child                    -7.317e-01  2.708e-01  -2.703 0.006881
## relationshipUnmarried                     3.394e-01  2.874e-01   1.181 0.237651
## relationshipWife                          1.350e+00  1.056e-01  12.786  < 2e-16
## raceAsian-Pac-Islander                    8.378e-01  2.858e-01   2.932 0.003369
## raceBlack                                 5.240e-01  2.398e-01   2.185 0.028869
## raceOther                                 1.593e-01  3.791e-01   0.420 0.674265
## raceWhite                                 6.337e-01  2.287e-01   2.770 0.005599
## sexMale                                   8.728e-01  8.084e-02  10.796  < 2e-16
## capital_gain                              3.229e-04  1.074e-05  30.067  < 2e-16
## capital_loss                              6.406e-04  3.840e-05  16.679  < 2e-16
## hours_per_week                            2.931e-02  1.701e-03  17.229  < 2e-16
## native_countryCanada                     -8.387e-01  6.893e-01  -1.217 0.223734
## native_countryChina                      -1.918e+00  7.034e-01  -2.727 0.006387
## native_countryColumbia                   -3.285e+00  1.032e+00  -3.183 0.001457
## native_countryCuba                       -7.665e-01  7.031e-01  -1.090 0.275646
## native_countryDominican-Republic         -2.950e+00  1.220e+00  -2.418 0.015627
## native_countryEcuador                    -1.427e+00  9.563e-01  -1.492 0.135628
## native_countryEl-Salvador                -1.756e+00  7.939e-01  -2.212 0.026958
## native_countryEngland                    -8.930e-01  7.009e-01  -1.274 0.202589
## native_countryFrance                     -5.944e-01  8.128e-01  -0.731 0.464586
## native_countryGermany                    -7.193e-01  6.785e-01  -1.060 0.289141
## native_countryGreece                     -2.199e+00  8.385e-01  -2.623 0.008724
## native_countryGuatemala                  -1.382e+00  9.770e-01  -1.415 0.157206
## native_countryHaiti                      -1.225e+00  9.284e-01  -1.320 0.186888
## native_countryHoland-Netherlands         -1.180e+01  8.827e+02  -0.013 0.989339
## native_countryHonduras                   -2.361e+00  2.685e+00  -0.879 0.379144
## native_countryHong                       -1.334e+00  8.990e-01  -1.484 0.137761
## native_countryHungary                    -1.306e+00  9.884e-01  -1.321 0.186544
## native_countryIndia                      -1.696e+00  6.689e-01  -2.536 0.011217
## native_countryIran                       -1.167e+00  7.583e-01  -1.539 0.123898
## native_countryIreland                    -6.878e-01  8.886e-01  -0.774 0.438911
## native_countryItaly                      -3.635e-01  7.098e-01  -0.512 0.608628
## native_countryJamaica                    -1.190e+00  7.712e-01  -1.543 0.122726
## native_countryJapan                      -9.626e-01  7.286e-01  -1.321 0.186428
## native_countryLaos                       -1.846e+00  1.041e+00  -1.773 0.076226
## native_countryMexico                     -1.610e+00  6.652e-01  -2.421 0.015493
## native_countryNicaragua                  -1.781e+00  1.015e+00  -1.756 0.079117
## native_countryOutlying-US(Guam-USVI-etc) -1.341e+01  2.110e+02  -0.064 0.949340
## native_countryPeru                       -1.952e+00  1.057e+00  -1.848 0.064620
## native_countryPhilippines                -8.900e-01  6.446e-01  -1.381 0.167361
## native_countryPoland                     -1.187e+00  7.453e-01  -1.593 0.111173
## native_countryPortugal                   -1.189e+00  8.859e-01  -1.342 0.179602
## native_countryPuerto-Rico                -1.471e+00  7.382e-01  -1.992 0.046345
## native_countryScotland                   -1.444e+00  1.084e+00  -1.332 0.182779
## native_countrySouth                      -2.470e+00  7.367e-01  -3.353 0.000800
## native_countryTaiwan                     -1.394e+00  7.547e-01  -1.847 0.064682
## native_countryThailand                   -1.840e+00  1.017e+00  -1.809 0.070439
## native_countryTrinadad&Tobago            -1.626e+00  1.058e+00  -1.537 0.124348
## native_countryUnited-States              -9.990e-01  6.307e-01  -1.584 0.113230
## native_countryVietnam                    -2.429e+00  8.467e-01  -2.869 0.004117
## native_countryYugoslavia                 -4.723e-01  9.190e-01  -0.514 0.607310
##                                             
## (Intercept)                              ***
## age                                      ***
## work_classLocal-gov                      ***
## work_classPrivate                        ***
## work_classSelf-emp-inc                   ** 
## work_classSelf-emp-not-inc               ***
## work_classState-gov                      ***
## work_classWithout-pay                       
## education11th                               
## education12th                               
## education1st-4th                            
## education5th-6th                            
## education7th-8th                         *  
## education9th                                
## educationAssoc-acdm                      ***
## educationAssoc-voc                       ***
## educationBachelors                       ***
## educationDoctorate                       ***
## educationHS-grad                         ***
## educationMasters                         ***
## educationPreschool                          
## educationProf-school                     ***
## educationSome-college                    ***
## education_num                               
## marital_statusMarried-AF-spouse          ***
## marital_statusMarried-civ-spouse         ***
## marital_statusMarried-spouse-absent         
## marital_statusNever-married              ***
## marital_statusSeparated                     
## marital_statusWidowed                       
## occupationArmed-Forces                      
## occupationCraft-repair                      
## occupationExec-managerial                ***
## occupationFarming-fishing                ***
## occupationHandlers-cleaners              ***
## occupationMachine-op-inspct              ** 
## occupationOther-service                  ***
## occupationPriv-house-serv                *  
## occupationProf-specialty                 ***
## occupationProtective-serv                ***
## occupationSales                          ***
## occupationTech-support                   ***
## occupationTransport-moving                  
## relationshipNot-in-family                .  
## relationshipOther-relative                  
## relationshipOwn-child                    ** 
## relationshipUnmarried                       
## relationshipWife                         ***
## raceAsian-Pac-Islander                   ** 
## raceBlack                                *  
## raceOther                                   
## raceWhite                                ** 
## sexMale                                  ***
## capital_gain                             ***
## capital_loss                             ***
## hours_per_week                           ***
## native_countryCanada                        
## native_countryChina                      ** 
## native_countryColumbia                   ** 
## native_countryCuba                          
## native_countryDominican-Republic         *  
## native_countryEcuador                       
## native_countryEl-Salvador                *  
## native_countryEngland                       
## native_countryFrance                        
## native_countryGermany                       
## native_countryGreece                     ** 
## native_countryGuatemala                     
## native_countryHaiti                         
## native_countryHoland-Netherlands            
## native_countryHonduras                      
## native_countryHong                          
## native_countryHungary                       
## native_countryIndia                      *  
## native_countryIran                          
## native_countryIreland                       
## native_countryItaly                         
## native_countryJamaica                       
## native_countryJapan                         
## native_countryLaos                       .  
## native_countryMexico                     *  
## native_countryNicaragua                  .  
## native_countryOutlying-US(Guam-USVI-etc)    
## native_countryPeru                       .  
## native_countryPhilippines                   
## native_countryPoland                        
## native_countryPortugal                      
## native_countryPuerto-Rico                *  
## native_countryScotland                      
## native_countrySouth                      ***
## native_countryTaiwan                     .  
## native_countryThailand                   .  
## native_countryTrinadad&Tobago               
## native_countryUnited-States                 
## native_countryVietnam                    ** 
## native_countryYugoslavia                    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 33851  on 30161  degrees of freedom
## Residual deviance: 19504  on 30067  degrees of freedom
## AIC: 19694
## 
## Number of Fisher Scoring iterations: 13
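
The predictors of model4 are a subset of those in the full model, and both fits share the same null deviance, so the two residual deviances can be compared with a likelihood-ratio test. This is a minimal sketch, assuming the rounded deviances and degrees of freedom printed in the two summaries above. With both objects in memory, `anova(model4, model, test = "Chisq")` should give the same comparison from unrounded values.

```r
# Residual deviance and residual df copied from the printed summaries
dev_model4 <- 19849; df_model4 <- 30112   # model4 (10 predictors)
dev_full   <- 19504; df_full   <- 30067   # full model (income ~ . - wgt)

# Likelihood-ratio statistic: drop in deviance from the extra terms
lr_stat <- dev_model4 - dev_full          # 345
lr_df   <- df_model4 - df_full            # 45 extra coefficients

# Upper-tail chi-square probability
p_value <- pchisq(lr_stat, df = lr_df, lower.tail = FALSE)

# AIC difference, using the AIC values printed by summary()
aic_diff <- 19949 - 19694

cat("LR statistic:", lr_stat, "on", lr_df, "df; p-value:",
    format.pval(p_value), "; AIC difference:", aic_diff, "\n")
```

Both the test and the lower AIC favour the full model, although with about 30,000 observations even small improvements reach significance.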

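Analysis of deviance for the full model. Terms are added sequentially, so each row tests a term given the terms listed above it: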

anova(model)
## Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred
## Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred
## Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred
## Analysis of Deviance Table
## 
## Model: binomial, link: logit
## 
## Response: income
## 
## Terms added sequentially (first to last)
## 
## 
##                Df Deviance Resid. Df Resid. Dev  Pr(>Chi)    
## NULL                           30161      33851              
## age             1   1738.5     30160      32112 < 2.2e-16 ***
## work_class      6    426.1     30154      31686 < 2.2e-16 ***
## education      15   3570.2     30139      28116 < 2.2e-16 ***
## education_num   0      0.0     30139      28116              
## marital_status  6   5091.4     30133      23024 < 2.2e-16 ***
## occupation     13    765.5     30120      22259 < 2.2e-16 ***
## relationship    5    199.5     30115      22059 < 2.2e-16 ***
## race            4     21.3     30111      22038 0.0002802 ***
## sex             1    165.7     30110      21872 < 2.2e-16 ***
## capital_gain    1   1684.3     30109      20188 < 2.2e-16 ***
## capital_loss    1    294.9     30108      19893 < 2.2e-16 ***
## hours_per_week  1    307.6     30107      19586 < 2.2e-16 ***
## native_country 40     81.7     30067      19504 0.0001101 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Finding the most influential variables with a random forest:

most_model <- randomForest(income ~ age + work_class + education + marital_status + occupation + relationship + race + sex + capital_gain + capital_loss + hours_per_week + native_country, data = clean_adult)

vip(most_model)

Model fit indices:

performance(model)
## # Indices of model performance
## 
## AIC       |      AICc |       BIC | Tjur's R2 |  RMSE | Sigma | Log_loss | Score_log | Score_spherical |   PCP
## --------------------------------------------------------------------------------------------------------------
## 19693.846 | 19694.452 | 20483.708 |     0.448 | 0.322 | 1.000 |    0.323 |      -Inf |       4.671e-04 | 0.794
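
The information criteria in this table can be reproduced from the log-likelihood. This is a minimal sketch with three assumed inputs: the log-likelihood reported by `pR2(model)` later in this report (-9751.923), n = 30162 observations (null deviance df + 1), and k = 95 estimated coefficients (n minus the residual df of 30067).

```r
loglik <- -9751.923        # llh as printed by pR2(model)
n <- 30161 + 1             # null deviance df + 1
k <- n - 30067             # estimated coefficients = n - residual df

aic  <- -2 * loglik + 2 * k
aicc <- aic + 2 * k * (k + 1) / (n - k - 1)   # small-sample correction
bic  <- -2 * loglik + k * log(n)
log_loss <- -loglik / n                        # mean negative log-likelihood

round(c(AIC = aic, AICc = aicc, BIC = bic, Log_loss = log_loss), 3)
```

BIC penalises each coefficient by log(n), about 10.3 here, instead of 2, which is why it sits well above the AIC for a model with 95 coefficients.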

Tjur's R^2 of the model:

pacman::p_load(effectsize)
pacman::p_load(performance)

r2_value<- r2(model)$R2

r2_value
## Tjur's R2 
## 0.4480447
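
Tjur's R^2 is the difference between the mean fitted probability among the observed ">50K" cases and the mean among the "<=50K" cases. This is a minimal sketch of that definition on a hypothetical four-row toy data set. The vectors `y` and `p_hat` are made up for illustration and are not taken from clean_adult.

```r
# Hypothetical toy data: observed outcome (1 = ">50K") and fitted probabilities
y     <- c(0, 0, 1, 1)
p_hat <- c(0.10, 0.40, 0.35, 0.80)

# Tjur's R2 (coefficient of discrimination):
# mean fitted probability among the 1s minus the mean among the 0s
tjur_r2 <- mean(p_hat[y == 1]) - mean(p_hat[y == 0])
tjur_r2
```

For the fitted model, the same quantity would presumably come from `fitted(model)` split by `clean_adult$income`, assuming no rows were dropped during fitting.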

By Cohen's (1988) rules, this R^2 is interpreted as "substantial":

interpret_r2(r2_value)
##     Tjur's R2 
## "substantial" 
## (Rules: cohen1988)

ROC curve and AUC of the model:

roc_curve <- roc(income ~ fitted.values(model), data=clean_adult,
                 plot = TRUE, legacy.axes = TRUE,
                 print.auc = TRUE, ci = TRUE)
## Setting levels: control = <=50K, case = >50K
## Setting direction: controls < cases

Prediction plots:

# using [all] gets smooth plots
prediction_relation<-ggpredict(model, terms="relationship") 
prediction_age<-ggpredict(model, terms="age [all]")


# Plot each term individually
plot_relation<-plot(prediction_relation) + theme(axis.text.x = element_text(angle = 90, hjust = 1))

plot_age <-plot(prediction_age)

# Combine plots into a single figure
plot_grid(plot_relation, plot_age)

prediction_education<-ggpredict(model, terms="education [all]")
prediction_marital_status<-ggpredict(model, terms="marital_status [all]")

plot_education <- plot(prediction_education) + theme(axis.text.x = element_text(angle = 90, hjust = 1))

plot_marital_status <- plot(prediction_marital_status) + theme(axis.text.x = element_text(angle = 90, hjust = 1))

plot_grid(plot_education, plot_marital_status)

prediction_occupation<-ggpredict(model, terms="occupation [all]")

plot_occupation <- plot(prediction_occupation) + theme(axis.text.x = element_text(angle = 90, hjust = 1))

plot_grid(plot_occupation)

McFadden's pseudo-R^2 for the logistic regression, first with pscl::pR2() and then by hand from the log-likelihoods of the fitted and null models:

pacman::p_load(pscl)
pR2(model)
## fitting null model for pseudo-r2
##           llh       llhNull            G2      McFadden          r2ML 
## -9.751923e+03 -1.692535e+04  1.434686e+04  4.238275e-01  3.785254e-01 
##          r2CU 
##  5.612201e-01
model <- glm(income ~ . - wgt, data = clean_adult, family = binomial)
## Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred
null_model <- glm(income ~ 1, data = clean_adult, family = binomial)

pseudoR2 <- 1 - as.numeric(logLik(model)) / as.numeric(logLik(null_model))

pseudoR2
## [1] 0.4238275
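
The other entries in the pR2() output follow from the same two log-likelihoods. This is a minimal sketch, assuming the rounded values printed above and n = 30162 observations (null deviance df + 1). The formulas are the usual McFadden, Cox-Snell (r2ML) and Cragg-Uhler/Nagelkerke (r2CU) definitions.

```r
# Values copied from the pR2(model) output
llh      <- -9751.923      # log-likelihood of the fitted model
llh_null <- -16925.35      # log-likelihood of the intercept-only model
n        <- 30162          # observations (null deviance df + 1)

g2       <- -2 * (llh_null - llh)                 # likelihood-ratio statistic (G2)
mcfadden <- 1 - llh / llh_null
r2_ml    <- 1 - exp(-g2 / n)                      # Cox-Snell / maximum likelihood
r2_cu    <- r2_ml / (1 - exp(2 * llh_null / n))   # Cragg-Uhler (Nagelkerke)

round(c(G2 = g2, McFadden = mcfadden, r2ML = r2_ml, r2CU = r2_cu), 4)
```

r2CU rescales r2ML by its maximum attainable value, which is why it is the largest of the three.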

Pick a significant variable and use either effects plots or odds ratios to interpret the coefficient.

Effects analysis and plots:

pacman::p_load(effects)

adult_higher_ed <- clean_adult |>
  filter (education %in% c("Bachelors", "Masters", "Doctorate"))

model_higher_ed <- glm(income ~ education, data = adult_higher_ed, family = binomial)

higher_ed_effect <- effect("education", model_higher_ed, type = "response")

print(higher_ed_effect)
## 
##  education effect
## education
## Bachelors Doctorate   Masters 
## 0.4214909 0.7466667 0.5642286
summary(model_higher_ed)
## 
## Call:
## glm(formula = income ~ education, family = binomial, data = adult_higher_ed)
## 
## Coefficients:
##                    Estimate Std. Error z value Pr(>|z|)    
## (Intercept)        -0.31666    0.02851  -11.11   <2e-16 ***
## educationDoctorate  1.39757    0.12211   11.45   <2e-16 ***
## educationMasters    0.57500    0.05756    9.99   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 9745.3  on 7045  degrees of freedom
## Residual deviance: 9520.6  on 7043  degrees of freedom
## AIC: 9526.6
## 
## Number of Fisher Scoring iterations: 4

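Relative to a bachelor's degree (the reference level in this subset), a doctorate multiplies the odds of an income > 50k by exp(1.398), about 4.0, which is a 300% increase in the odds. The predicted probability rises from 0.42 to 0.75.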
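
The educationDoctorate coefficient is on the log-odds scale, so exponentiating it gives an odds ratio, and the inverse logit gives predicted probabilities. This is a minimal sketch, assuming the rounded coefficients printed by summary(model_higher_ed). With the model object in memory, `exp(coef(model_higher_ed))` would give the odds ratios directly.

```r
# Coefficients copied from summary(model_higher_ed); Bachelors is the reference level
b_intercept <- -0.31666
b_doctorate <-  1.39757

odds_ratio  <- exp(b_doctorate)                   # odds of >50K, Doctorate vs Bachelors
p_bachelors <- plogis(b_intercept)                # inverse logit
p_doctorate <- plogis(b_intercept + b_doctorate)

round(c(odds_ratio = odds_ratio, p_bachelors = p_bachelors, p_doctorate = p_doctorate), 3)
```

The two probabilities reproduce the Bachelors and Doctorate entries of the effect table. Note that the odds quadruple while the probability itself does not.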

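# Note: "Married-civ_spouse" does not match the level "Married-civ-spouse",
# so that group is absent from the output below
adult_married_status <- clean_adult |>
  filter(marital_status %in% c("Married-AF-spouse", "Married-civ_spouse", "Married-spouse-absent"))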

model_married_status <- glm(income ~ marital_status, data = adult_married_status, family = binomial)

married_effect <- effect("marital_status", model_married_status, type = "response")

print(married_effect)
## 
##  marital_status effect
## marital_status
##     Married-AF-spouse Married-spouse-absent 
##            0.47619048            0.08378378
summary(model_married_status)
## 
## Call:
## glm(formula = income ~ marital_status, family = binomial, data = adult_married_status)
## 
## Coefficients:
##                                     Estimate Std. Error z value Pr(>|z|)    
## (Intercept)                         -0.09531    0.43693  -0.218    0.827    
## marital_statusMarried-spouse-absent -2.29670    0.47552  -4.830 1.37e-06 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 262.46  on 390  degrees of freedom
## Residual deviance: 242.12  on 389  degrees of freedom
## AIC: 246.12
## 
## Number of Fisher Scoring iterations: 5

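Only Married-AF-spouse and Married-spouse-absent appear in this fit. Relative to Married-AF-spouse, having an absent spouse multiplies the odds of an income > 50k by exp(-2.297), about 0.10, which is roughly 90% lower odds. The predicted probability falls from 0.48 to 0.08.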
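
The effect table lists two marital statuses although the filter names three. The level is spelled "Married-civ-spouse" elsewhere in this report, while the filter string is "Married-civ_spouse", and `%in%` silently ignores values that match nothing. This is a minimal sketch of a check for that kind of mismatch, using a hypothetical toy factor rather than clean_adult.

```r
# Hypothetical toy factor using level names that appear in this report
status <- factor(c("Married-civ-spouse", "Married-AF-spouse",
                   "Married-spouse-absent", "Never-married"))

wanted <- c("Married-AF-spouse", "Married-civ_spouse", "Married-spouse-absent")

# Requested values that are not actual levels (typos show up here)
unmatched <- setdiff(wanted, levels(status))
unmatched

# Values that survive the filter
kept <- as.character(status[status %in% wanted])
kept
```

Running `setdiff()` against `levels()` before filtering is a cheap way to catch a misspelled category.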

model_race <- glm(income ~ race, data = clean_adult, family = binomial)

race_effect <- effect("race", model_race, type = "response")

print(race_effect)
## 
##  race effect
## race
## Amer-Indian-Eskimo Asian-Pac-Islander              Black              Other 
##          0.1188811          0.2770950          0.1299255          0.0909091 
##              White 
##          0.2637180
race_model <- glm(formula = income ~ race, family = binomial, data = clean_adult)
summary(race_model)
## 
## Call:
## glm(formula = income ~ race, family = binomial, data = clean_adult)
## 
## Coefficients:
##                        Estimate Std. Error z value Pr(>|z|)    
## (Intercept)             -2.0031     0.1827 -10.964  < 2e-16 ***
## raceAsian-Pac-Islander   1.0442     0.1974   5.290 1.22e-07 ***
## raceBlack                0.1015     0.1911   0.531    0.596    
## raceOther               -0.2995     0.2928  -1.023    0.306    
## raceWhite                0.9763     0.1832   5.328 9.92e-08 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 33851  on 30161  degrees of freedom
## Residual deviance: 33504  on 30157  degrees of freedom
## AIC: 33514
## 
## Number of Fisher Scoring iterations: 4

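Relative to the reference level (Amer-Indian-Eskimo), being Asian or Pacific Islander multiplies the odds of an income > 50k by exp(1.044), about 2.84, which is a 184% increase in the odds. The predicted probability rises from 0.12 to 0.28.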
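
Odds ratios and ratios of probabilities are easy to confuse. This is a minimal sketch, assuming the predicted probabilities printed by print(race_effect), that computes both for Asian-Pac-Islander against the reference level. The helper `odds()` is introduced here for illustration.

```r
# Predicted probabilities copied from print(race_effect)
p_ref   <- 0.1188811   # Amer-Indian-Eskimo (reference level)
p_asian <- 0.2770950   # Asian-Pac-Islander

odds <- function(p) p / (1 - p)

odds_ratio <- odds(p_asian) / odds(p_ref)   # equals exp(1.0442) from the summary
risk_ratio <- p_asian / p_ref               # ratio of probabilities

round(c(odds_ratio = odds_ratio, risk_ratio = risk_ratio), 2)
```

The odds ratio matches the exponentiated coefficient, while the ratio of probabilities is smaller, so "chances" and "odds" should not be used interchangeably.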

model_age <- glm(income ~ age, data = clean_adult, family = binomial)

age_effect <- effect("age", model_age, type = "response")

print(age_effect)
## 
##  age effect
## age
##        20        40        50        70        90 
## 0.1237647 0.2478050 0.3347221 0.5399135 0.7324117
summary(model_age)
## 
## Call:
## glm(formula = income ~ age, family = binomial, data = clean_adult)
## 
## Coefficients:
##              Estimate Std. Error z value Pr(>|z|)    
## (Intercept) -2.804151   0.045583  -61.52   <2e-16 ***
## age          0.042345   0.001044   40.58   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 33851  on 30161  degrees of freedom
## Residual deviance: 32112  on 30160  degrees of freedom
## AIC: 32116
## 
## Number of Fisher Scoring iterations: 4
plot(age_effect, main = "Effect of Age on Income >50k",
xlab = "Age",
ylab = "Prediction Probability")
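
For a numeric predictor such as age, the coefficient gives the change in log-odds per year. This is a minimal sketch, assuming the rounded coefficients printed by summary(model_age), that converts it to odds ratios per year and per decade and recomputes the probabilities shown in the effect table.

```r
# Coefficients copied from summary(model_age)
b0    <- -2.804151
b_age <-  0.042345

or_per_year   <- exp(b_age)        # odds multiplier for one extra year
or_per_decade <- exp(10 * b_age)   # odds multiplier for ten extra years

# Predicted probabilities at the ages shown in print(age_effect)
ages  <- c(20, 40, 50, 70, 90)
p_hat <- plogis(b0 + b_age * ages)

round(c(per_year = or_per_year, per_decade = or_per_decade), 3)
round(setNames(p_hat, ages), 4)
```

In this age-only model, each additional year raises the odds of an income > 50k by about 4%, and each decade by about 53%.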

plot(higher_ed_effect, main = "Effect of Higher Education on Income >50k",
xlab = "Higher Ed Level",
ylab = "Prediction Probability")

model_occupation <- glm(income ~ occupation, data = clean_adult, family = binomial)

occupation_effect <- effect("occupation", model_occupation, type = "response")

print(occupation_effect)
## 
##  occupation effect
## occupation
##      Adm-clerical      Armed-Forces      Craft-repair   Exec-managerial 
##       0.133834991       0.111111111       0.225310174       0.485220441 
##   Farming-fishing Handlers-cleaners Machine-op-inspct     Other-service 
##       0.116279070       0.061481481       0.124618515       0.041095890 
##   Priv-house-serv    Prof-specialty   Protective-serv             Sales 
##       0.006993007       0.448489351       0.326086957       0.270647321 
##      Tech-support  Transport-moving 
##       0.304824561       0.202926209
summary(model_occupation)
## 
## Call:
## glm(formula = income ~ occupation, family = binomial, data = clean_adult)
## 
## Coefficients:
##                             Estimate Std. Error z value Pr(>|z|)    
## (Intercept)                 -1.86747    0.04815 -38.785  < 2e-16 ***
## occupationArmed-Forces      -0.21197    1.06175  -0.200  0.84176    
## occupationCraft-repair       0.63248    0.06115  10.342  < 2e-16 ***
## occupationExec-managerial    1.80833    0.05763  31.378  < 2e-16 ***
## occupationFarming-fishing   -0.16068    0.11026  -1.457  0.14505    
## occupationHandlers-cleaners -0.85810    0.12311  -6.970 3.16e-12 ***
## occupationMachine-op-inspct -0.08193    0.08355  -0.981  0.32677    
## occupationOther-service     -1.28242    0.10109 -12.686  < 2e-16 ***
## occupationPriv-house-serv   -3.08836    1.00456  -3.074  0.00211 ** 
## occupationProf-specialty     1.66069    0.05762  28.824  < 2e-16 ***
## occupationProtective-serv    1.14153    0.09687  11.784  < 2e-16 ***
## occupationSales              0.87613    0.06109  14.342  < 2e-16 ***
## occupationTech-support       1.04304    0.08656  12.050  < 2e-16 ***
## occupationTransport-moving   0.49936    0.07906   6.316 2.69e-10 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 33851  on 30161  degrees of freedom
## Residual deviance: 29954  on 30148  degrees of freedom
## AIC: 29982
## 
## Number of Fisher Scoring iterations: 7

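Relative to Adm-clerical (the reference level), an executive/managerial occupation multiplies the odds of an income > 50k by exp(1.808), about 6.10, which is roughly a 510% increase in the odds. The predicted probability rises from 0.13 to 0.49.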
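
An odds ratio is easier to judge with an interval around it. This is a minimal sketch of a 95% Wald interval for the Exec-managerial odds ratio, assuming the rounded estimate and standard error printed by summary(model_occupation). A profile-likelihood interval would differ slightly.

```r
# Estimate and standard error copied from summary(model_occupation);
# Adm-clerical is the reference level
est <- 1.80833
se  <- 0.05763

z <- qnorm(0.975)                  # about 1.96 for a 95% interval
odds_ratio <- exp(est)
or_lower   <- exp(est - z * se)    # interval built on the log-odds scale,
or_upper   <- exp(est + z * se)    # then exponentiated

round(c(odds_ratio = odds_ratio, lower = or_lower, upper = or_upper), 2)
```

The interval is not symmetric around the odds ratio because it is symmetric on the log-odds scale.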

model_occupation <- glm(income ~ occupation, data = clean_adult, family = binomial)

occupation_effect <- effect("occupation", model_occupation, type = "response")

occupation_effectdf <- as.data.frame(occupation_effect)

ggplot(occupation_effectdf, aes(x = fit, y = reorder (occupation,fit))) +
  geom_point(size = 3, color = "orange") +
  geom_errorbarh(aes(xmin = lower, xmax = upper), height = 0.2, color = "blue") +
  labs(title = "Effect of Occupation on Income >50k", x = "Predicted Probability", y = "Occupation") + theme_minimal(base_size = 10)

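The interval for Armed-Forces is very wide. Its coefficient has a standard error of 1.06, against roughly 0.06 to 0.12 for most other occupations, which points to very few Armed-Forces respondents in the data. Set pay ranks in the military may also contribute, but the output here cannot confirm that.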
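
The width of the Armed-Forces interval can be approximated from the printed coefficients. This is a minimal sketch, assuming the rounded values from summary(model_occupation) and a Wald-type interval, which may differ slightly from the one effect() draws. It relies on the fact that in a one-factor model the group estimates are independent, so the variance of a contrast is the sum of the two group variances.

```r
# Coefficients and standard errors copied from summary(model_occupation)
b0   <- -1.86747; se_b0   <- 0.04815   # intercept (Adm-clerical)
b_af <- -0.21197; se_b_af <- 1.06175   # Armed-Forces contrast

# Var(contrast) = Var(group logit) + Var(reference logit)
logit_af <- b0 + b_af
se_af    <- sqrt(se_b_af^2 - se_b0^2)

# Wald interval on the logit scale, mapped back to probabilities
ci_af <- plogis(logit_af + c(-1, 1) * qnorm(0.975) * se_af)

# Group size implied by se^2 = 1 / (n * p * (1 - p))
p_af      <- plogis(logit_af)
n_implied <- 1 / (se_af^2 * p_af * (1 - p_af))

round(c(p = p_af, lower = ci_af[1], upper = ci_af[2], n_implied = n_implied), 3)
```

Under these assumptions the interval runs from about 0.02 to 0.50, and the implied group size is about nine respondents, which would be enough on its own to explain the wide range.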