Define the following terms and, where appropriate, compare with the alternative method.

1. EDA

Exploratory Data Analysis (EDA) is the first step in the data exploration process, or at least it should be. The idea behind conducting EDA is to look for trends, patterns, and anomalies. This is accomplished by creating figures and tables (i.e., visualizations) and calculating simple statistics (e.g., correlations). The results can then be used to formulate and test early hypotheses.

The 5 core activities of the data analysis process are:
a. State/refine the question
b. Explore the data
c. Build formal statistical models
d. Interpret the results
e. Communicate the results

Some books list more steps in the data analysis process by breaking (b), data exploration, into two or three steps (data collection, data cleaning, and/or processing). These steps are done before building any formal statistical models.

An easy way to start EDA in R is to load the DataExplorer package and call create_report(). This report will help you:
A. Determine whether there are any problems with your data set (such as missing values/NAs, extraneous entries, etc.)
B. Determine whether your question can be answered with the data set
C. Produce preliminary visualizations for your hypothesis testing

Some commonly used graphs are scatterplots, boxplots, histograms (which can help show skewness, spread, and the median), and heat maps. It is also very common to compute a table of quartiles to see the spread of your data and to use a function such as boxplot.stats() in base R, or outlier() from the outliers package, to pinpoint outliers. It is a good idea to make sure that an outlier is not the result of a clerical error, such as entering a person’s height as 650 inches instead of 65 inches.

2. One-Way ANOVA versus Two-Way ANOVA

ANOVA is an abbreviation for Analysis of Variance. It is an inference procedure that compares the means of several samples by comparing the variance between the sample means with the variance within the individual samples. The null hypothesis (H0) is that all the group means are equal. The alternative hypothesis (H1) is that not all means are equal.

A one-way ANOVA determines how one factor or treatment affects a response variable for your population (e.g., the effect of number of study hours on a student’s grade for 3 different sections of the same class). However, ANOVA tests cannot be used on all experiments. For an accurate analysis:
a. Samples must be independent of each other
b. The data in each group must be approximately normally distributed
c. Standard deviations of the groups must be equal

Table 1. One-way ANOVA table
SSR: regression sum of squares
SSE: error sum of squares
SST: total sum of squares
df: degrees of freedom
k: total number of sample groups
N: total number of observations
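The quantities in the legend above can be rebuilt by hand and compared with what aov() reports. The following is a minimal sketch under stated assumptions: the `grades` data frame and its values are hypothetical, invented only for illustration, and the variable names (`SSR`, `SSE`, `SST`, `k`, `N`) simply mirror the legend.

```r
# Toy data: exam grades for three class sections (hypothetical values).
grades <- data.frame(
  section = factor(rep(c("A", "B", "C"), each = 5)),
  grade   = c(78, 82, 85, 80, 75,
              88, 90, 85, 92, 86,
              70, 72, 68, 75, 74)
)

# Fit the one-way ANOVA; summary() prints the usual ANOVA table.
fit <- aov(grade ~ section, data = grades)
print(summary(fit))

# Rebuild the table entries by hand.
k <- nlevels(grades$section)   # total number of sample groups
N <- nrow(grades)              # total number of observations
grand_mean  <- mean(grades$grade)
group_means <- ave(grades$grade, grades$section)  # each row's own group mean

SSR <- sum((group_means - grand_mean)^2)     # between-group sum of squares
SSE <- sum((grades$grade - group_means)^2)   # within-group (error) sum of squares
SST <- sum((grades$grade - grand_mean)^2)    # total sum of squares

# F statistic: between-group mean square over within-group mean square.
F_stat  <- (SSR / (k - 1)) / (SSE / (N - k))
p_value <- pf(F_stat, df1 = k - 1, df2 = N - k, lower.tail = FALSE)
```

A small p-value here would lead you to reject H0 that all section means are equal. It does not say which sections differ; that needs a follow-up comparison.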
Use a two-way ANOVA for testing more complex experiments. It tests 2 independent variables (factors) against one dependent variable, and it can also test for an interaction between the two factors. Alternatives to ANOVA are the t-test (when only two groups are compared), generalized linear models, Bayesian analysis, mixed-effects models, random forests, permutation tests, and the Kruskal-Wallis test.

3. The method of least squares for regression

The method of least squares fits a regression model by measuring the error of the model: square the difference between each observed value and the value the model predicts for it, then sum (or average) these squared differences. The model that best fits your data is the one that minimizes the sum of all these “gaps” between the model and the observed data.
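As a point of reference, the least squares criterion described above is often written as shown below. The notation is an assumption rather than something taken from the original: $y_i$ is the $i$-th observed value, $\hat{y}_i$ is the model's prediction for it, and $N$ is the number of observations, matching the $N$ and SSE used in the legend of Table 1.

```latex
\mathrm{SSE} = \sum_{i=1}^{N} \left( y_i - \hat{y}_i \right)^2,
\qquad
\mathrm{MSE} = \frac{1}{N} \sum_{i=1}^{N} \left( y_i - \hat{y}_i \right)^2 = \frac{\mathrm{SSE}}{N}
```

Least squares picks the model coefficients that make SSE as small as possible; because $N$ is fixed, the same coefficients also minimize MSE. Some texts divide by the residual degrees of freedom instead of $N$.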
Figure 1. Mean Squared Error of your model

4. Assumptions of a linear model and how to check them

a. Linearity – the relationship between the predictors and the response is linear
   A. Check residual plots
b. Independence – observations are independent of each other
   A. Plot residuals against time
   B. Make a correlation plot and look for strong correlations among the factors. Zero correlation does not automatically imply independence, but it can help guide you in the right direction.
c. Homoscedasticity – the variance of the residuals is relatively constant
   A. Residuals vs. fitted plot
   B. Compare the variance between the groups in the population
d. Normality – the residuals are normally distributed
   A. Q-Q plot
   B. Transform the dependent variable if the residuals are not normally distributed
e. No multicollinearity – predictors are not strongly correlated with each other
   A. Variance Inflation Factor (VIF)
   B. ANOVA with interaction terms
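Several of the checks listed above can be run with base R alone. The following is a minimal sketch, assuming the built-in mtcars data and an arbitrary choice of predictors (`wt`, `hp`, `disp`) purely for illustration; none of this comes from the original analysis. The independence check is left out because mtcars has no time ordering, and VIF is computed by hand from its definition rather than with a package function.

```r
# Fit a linear model on the built-in mtcars data (illustration only).
fit <- lm(mpg ~ wt + hp + disp, data = mtcars)
res <- residuals(fit)

# Least squares summary: mean squared error of the fitted model.
mse <- mean(res^2)

# a. Linearity and c. homoscedasticity: residuals vs. fitted values.
# Look for curvature (non-linearity) or a funnel shape (unequal variance).
plot(fit, which = 1)

# d. Normality of the residuals: Q-Q plot plus a Shapiro-Wilk test.
plot(fit, which = 2)
shapiro_p <- shapiro.test(res)$p.value

# e. Multicollinearity: VIF_j = 1 / (1 - R^2_j), where R^2_j comes from
# regressing predictor j on the remaining predictors.
predictors <- c("wt", "hp", "disp")
vif <- sapply(predictors, function(p) {
  others <- setdiff(predictors, p)
  r2 <- summary(lm(reformulate(others, response = p), data = mtcars))$r.squared
  1 / (1 - r2)
})
print(round(vif, 2))
```

A VIF close to 1 means a predictor is nearly unrelated to the others. Larger values signal growing overlap, and the cutoff used to flag a problem is a convention that varies between texts.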
library(tidyverse)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr 1.1.4 ✔ readr 2.1.5
## ✔ forcats 1.0.0 ✔ stringr 1.5.1
## ✔ ggplot2 3.5.1 ✔ tibble 3.2.1
## ✔ lubridate 1.9.3 ✔ tidyr 1.3.1
## ✔ purrr 1.0.2
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag() masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(DataExplorer)
library(ggplot2)
library(gtsummary)
library(patchwork)
library(data.table)
##
## Attaching package: 'data.table'
##
## The following objects are masked from 'package:lubridate':
##
## hour, isoweek, mday, minute, month, quarter, second, wday, week,
## yday, year
##
## The following objects are masked from 'package:dplyr':
##
## between, first, last
##
## The following object is masked from 'package:purrr':
##
## transpose
pacman::p_load("pROC")
library(performance)
library(ggeffects)
library(sjPlot)
pacman::p_load("equatiomatic")
pacman::p_load("vip")
library(mgcv)
## Loading required package: nlme
##
## Attaching package: 'nlme'
##
## The following object is masked from 'package:dplyr':
##
## collapse
##
## This is mgcv 1.9-1. For overview type 'help("mgcv-package")'.
library(cowplot)
##
## Attaching package: 'cowplot'
##
## The following objects are masked from 'package:sjPlot':
##
## plot_grid, save_plot
##
## The following object is masked from 'package:ggeffects':
##
## get_title
##
## The following object is masked from 'package:patchwork':
##
## align_plots
##
## The following object is masked from 'package:lubridate':
##
## stamp
library(randomForest)
## randomForest 4.7-1.2
## Type rfNews() to see new features/changes/bug fixes.
##
## Attaching package: 'randomForest'
##
## The following object is masked from 'package:dplyr':
##
## combine
##
## The following object is masked from 'package:ggplot2':
##
## margin
library(ISLR)
pacman::p_load(tidycensus)
Data Cleaning

The goal of this section is to explore the data set and get it ready for analysis. There are no missing values in the data set, but there are some incorrect entries that must be identified and removed before completing the analysis. Age can be regarded as quantitative, and any value less than 18 is invalid. Length of residence (LenRes) ranges from zero to a person’s age, so LenRes should never be higher than Age. Income is coded as an ordinal value ranging from 1 to 12; it is left to you to decide whether it should be treated as continuous or categorical.

You should create a simple 1-2 paragraph summary of this section. Be sure to fully explain the reasoning behind transforming any columns and removing any rows. Simply saying “Campbell told me to” is not sufficient. Justify why it makes sense to exclude rows whose age is less than 18 and why we shouldn’t use rows in which length of residence is larger than age.
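The two validity rules above can be sketched on a small example before applying them to the real file. This is a minimal sketch under stated assumptions: the `toy` data frame and its values are hypothetical (not rows from catalog.csv), and only the column names Age, LenRes, and Income are taken from the description above. It uses base R subsetting; the same condition can be passed to dplyr's filter().

```r
# Toy customer records (hypothetical values, not rows from catalog.csv).
toy <- data.frame(
  Age    = c(0, 35, 22, 46, 30),
  LenRes = c(2,  3, 25,  9, 40),
  Income = c(3,  5,  5,  5,  1)
)

# Rule 1: an Age below 18 is invalid.
# Rule 2: nobody can have lived at an address longer than they have been alive.
valid <- toy$Age >= 18 & toy$LenRes <= toy$Age
clean <- toy[valid, ]

# Count how many rows each rule flags, to report in the written summary.
n_under_18   <- sum(toy$Age < 18)
n_lenres_bad <- sum(toy$LenRes > toy$Age)

# If Income is treated as categorical, an ordered factor keeps the 1-12 ranking.
clean$IncomeOrd <- factor(clean$Income, levels = 1:12, ordered = TRUE)
print(clean)
```

Counting the flagged rows per rule makes it easier to justify each removal in the summary, since a row can violate both rules at once.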
d1 <- read.csv("catalog.csv")
d1
## SpendRat Age LenRes Income TotAsset SecAssets ShortLiq LongLiq WlthIdx
## 1 11.83 0 2 3 122 27 225 422 286
## 2 16.83 35 3 5 195 36 220 420 430
## 3 11.38 46 9 5 123 24 200 420 290
## 4 31.33 41 2 2 117 25 222 419 279
## 5 1.90 46 7 9 493 105 310 500 520
## 6 84.13 46 15 5 138 27 340 450 440
## 7 2.15 46 16 4 162 25 230 430 360
## 8 38.00 56 31 6 117 27 300 440 400
## 9 136.28 48 8 5 119 23 250 430 360
## 10 61.46 54 8 5 50 10 200 420 230
## 11 2.73 43 15 4 135 21 230 430 340
## 12 5.86 66 8 4 81 19 250 430 320
## 13 113.00 61 8 5 999 245 999 720 880
## 14 8.40 77 8 3 198 45 210 430 340
## 15 31.67 0 8 6 243 55 220 450 370
## 16 19.00 50 7 4 412 95 290 470 490
## 17 70.85 49 19 5 192 37 200 440 330
## 18 20.17 76 12 5 64 15 190 410 280
## 19 0.59 49 2 7 229 44 270 450 430
## 20 19.48 63 7 5 95 22 270 440 360
## 21 86.20 67 32 5 87 20 250 430 330
## 22 25.46 70 36 5 511 99 270 540 500
## 23 76.82 77 15 4 242 56 300 470 450
## 24 0.77 71 9 5 43 10 200 420 210
## 25 335.94 58 6 6 88 20 220 430 290
## 26 18.78 48 3 5 534 94 240 520 460
## 27 205.07 77 9 6 131 30 205 433 295
## 28 216.57 66 9 7 85 20 220 420 300
## 29 77.21 21 3 4 206 12 280 460 420
## 30 0.26 89 11 5 57 13 230 420 280
## 31 92.53 81 37 6 999 209 280 670 530
## 32 46.55 56 19 3 143 23 220 410 380
## 33 6.86 68 20 5 338 78 280 480 490
## 34 18.14 40 7 5 306 47 280 430 490
## 35 120.50 78 3 4 285 55 290 440 470
## 36 37.08 69 7 3 100 23 200 420 300
## 37 19.20 46 27 4 52 12 210 420 250
## 38 3.35 0 2 6 112 26 310 440 400
## 39 23.08 74 25 3 84 19 260 430 330
## 40 165.13 54 9 4 164 31 225 433 333
## 41 6.50 70 9 4 109 18 194 414 265
## 42 2.31 0 2 5 132 28 227 424 303
## 43 8.29 73 3 4 73 17 250 430 310
## 44 3.95 53 8 7 127 25 240 430 350
## 45 14.88 42 7 6 155 31 229 427 328
## 46 125.32 64 25 6 196 32 240 420 430
## 47 39.43 50 5 7 353 68 240 440 440
## 48 39.81 37 10 5 216 33 280 440 440
## 49 230.89 20 21 5 564 109 250 520 490
## 50 3.61 46 16 5 244 47 220 440 390
## 51 37.78 57 12 4 277 64 230 440 410
## 52 9.23 54 3 7 210 36 210 410 390
## 53 44.00 47 17 5 315 61 280 450 480
## 54 86.96 43 15 5 45 7 230 420 270
## 55 2.37 55 7 6 127 29 216 424 292
## 56 4.81 56 15 6 228 53 260 460 410
## 57 0.16 57 10 6 139 32 310 450 420
## 58 99.10 36 2 5 51 8 230 410 300
## 59 11.85 38 8 7 260 46 260 440 450
## 60 12.80 38 7 5 195 30 230 430 370
## 61 21.00 55 19 6 145 28 190 410 360
## 62 17.53 59 8 5 243 56 330 480 480
## 63 72.91 55 22 6 140 27 280 450 390
## 64 47.18 0 8 4 397 75 240 480 430
## 65 113.71 49 19 4 129 25 200 420 290
## 66 10.38 52 7 6 98 19 202 417 252
## 67 9.36 41 9 4 170 26 190 410 380
## 68 304.61 49 8 4 267 52 190 410 520
## 69 6.76 0 4 7 258 50 230 470 410
## 70 41.63 52 18 4 188 35 230 430 380
## 71 1.37 41 8 3 196 33 220 420 360
## 72 46.08 0 2 5 130 25 220 423 297
## 73 24.12 34 13 3 183 42 220 420 400
## 74 33.15 32 3 5 32 2 220 410 260
## 75 9.80 0 5 5 81 16 194 411 230
## 76 22.44 78 40 4 38 9 210 420 230
## 77 11.79 44 8 3 237 18 200 420 350
## 78 1.36 81 8 2 92 21 240 430 340
## 79 0.71 86 42 2 90 21 280 440 360
## 80 39.69 76 8 5 107 34 206 431 274
## 81 4.33 79 18 4 77 18 220 430 300
## 82 8.39 84 22 2 91 21 260 440 340
## 83 10.80 49 14 2 117 23 230 430 320
## 84 5.80 55 16 4 169 35 238 432 347
## 85 1.33 70 17 3 94 23 190 420 254
## 86 4.13 43 16 2 139 24 230 430 340
## 87 17.94 85 42 1 123 32 207 432 289
## 88 3.23 56 25 2 126 29 260 430 380
## 89 45.54 42 8 3 168 26 240 420 400
## 90 6.77 45 9 3 40 6 160 400 260
## 91 4.53 55 7 2 143 28 216 428 311
## 92 13.50 62 14 2 77 18 260 430 330
## 93 129.72 40 9 2 58 9 270 420 320
## 94 19.27 44 17 4 180 27 240 440 370
## 95 153.67 71 38 4 69 16 200 420 260
## 96 61.90 61 31 4 209 48 290 440 490
## 97 275.91 79 32 2 63 15 230 420 290
## 98 5.60 51 13 4 222 36 260 440 450
## 99 401.42 52 24 5 260 50 270 470 440
## 100 15.00 35 8 4 141 8 250 420 370
## 101 3.67 62 6 4 184 42 310 470 440
## 102 7.60 30 7 4 20 1 220 400 200
## 103 5.56 54 35 4 273 51 260 460 440
## 104 9.58 0 4 6 170 33 270 440 410
## 105 16.69 60 14 5 999 999 190 999 540
## 106 5.92 60 16 5 74 17 180 410 240
## 107 39.73 41 34 5 62 9 270 420 320
## 108 2.22 45 7 2 119 18 240 420 360
## 109 1.33 73 9 5 307 47 220 460 390
## 110 4.65 35 11 8 166 9 220 410 380
## 111 31.65 0 9 6 92 21 240 430 320
## 112 6.13 41 11 5 218 33 210 410 430
## 113 13.14 61 30 5 305 70 220 470 390
## 114 44.13 87 2 1 48 21 178 419 198
## 115 26.43 50 9 5 475 92 230 460 460
## 116 9.88 66 8 5 192 38 240 440 390
## 117 13.28 63 27 5 202 39 230 430 410
## 118 14.93 53 24 2 140 27 200 430 300
## 119 30.73 42 9 4 117 21 206 421 285
## 120 5.50 40 9 4 122 19 250 420 350
## 121 15.95 54 29 5 206 40 250 440 410
## 122 6.93 34 10 4 35 2 240 410 270
## 123 91.50 50 8 5 36 7 180 400 220
## 124 50.55 57 3 6 242 39 270 450 430
## 125 11.73 57 18 5 141 32 224 428 310
## 126 1.24 57 6 5 116 22 200 420 280
## 127 3.69 33 12 4 54 3 190 410 230
## 128 50.18 50 12 5 246 48 210 420 430
## 129 43.85 43 8 5 201 42 230 430 380
## 130 3.55 64 7 5 73 17 190 400 310
## 131 14.58 37 8 5 208 32 220 410 450
## 132 5.00 39 19 4 169 26 210 430 310
## 133 36.53 52 35 3 48 9 220 420 250
## 134 10.45 47 14 5 107 21 200 430 260
## 135 1.00 0 8 6 240 37 200 420 380
## 136 8.50 0 9 7 139 19 210 400 390
## 137 11.50 55 30 4 69 13 210 410 320
## 138 68.36 50 8 7 56 11 210 410 290
## 139 8.82 31 10 4 133 23 222 420 298
## 140 4.06 42 5 3 118 18 250 430 360
## 141 1.67 49 19 2 150 31 229 427 318
## 142 85.00 50 23 4 136 26 220 420 370
## 143 44.79 52 21 5 150 26 250 430 380
## 144 13.20 69 12 3 292 67 230 460 400
## 145 30.30 46 10 5 320 62 290 460 480
## 146 8.22 52 13 5 236 46 190 410 430
## 147 6.39 53 10 4 232 45 210 420 420
## 148 3.50 40 8 6 179 27 230 410 420
## 149 11.24 53 16 5 215 42 180 410 410
## 150 3.18 53 30 5 89 17 270 440 350
## 151 22.79 72 37 5 61 9 230 420 310
## 152 18.83 61 11 5 108 25 250 440 350
## 153 32.11 77 3 4 43 10 210 410 240
## 154 4.56 0 9 6 63 15 220 430 270
## 155 17.74 36 12 4 194 37 260 440 410
## 156 17.45 52 11 4 218 42 300 460 440
## 157 2.00 56 3 5 155 36 240 440 370
## 158 23.13 78 4 6 260 52 300 460 470
## 159 61.76 42 7 4 147 34 220 430 360
## 160 4.25 75 30 4 124 30 209 428 293
## 161 60.29 55 26 4 114 6 280 420 370
## 162 32.50 57 46 3 79 18 240 430 310
## 163 0.08 78 9 2 65 15 250 430 300
## 164 4.56 39 9 5 215 34 230 430 430
## 165 14.50 68 35 2 94 22 270 440 360
## 166 72.37 52 14 7 332 65 300 450 520
## 167 11.74 44 17 5 157 24 240 430 380
## 168 125.44 68 37 6 138 32 270 450 390
## 169 52.45 74 39 6 188 43 200 440 320
## 170 39.50 41 8 5 200 11 180 400 300
## 171 94.15 48 0 6 131 28 220 430 340
## 172 34.37 57 14 6 265 61 270 460 450
## 173 111.83 53 8 6 243 47 310 460 470
## 174 48.80 61 7 4 134 19 200 410 320
## 175 329.71 73 34 5 69 16 220 420 300
## 176 32.35 60 10 6 177 41 280 460 410
## 177 1.11 0 9 6 187 43 200 420 320
## 178 54.88 46 11 6 149 29 250 450 380
## 179 70.23 52 11 5 229 44 300 450 460
## 180 25.24 42 16 7 307 17 260 430 440
## 181 1.83 78 26 7 5 0 180 400 90
## 182 9.79 40 14 5 349 53 230 430 450
## 183 151.93 42 9 6 231 35 290 450 460
## 184 119.87 64 15 6 133 31 230 440 330
## 185 3.81 37 13 4 150 23 250 420 410
## 186 69.04 58 14 4 161 37 190 420 370
## 187 52.67 49 2 2 87 20 190 414 239
## 188 9.92 31 35 1 181 39 242 438 363
## 189 23.45 49 22 3 180 35 190 410 360
## 190 45.50 76 17 3 44 10 200 420 220
## 191 60.82 57 18 3 430 99 380 520 570
## 192 2.23 57 14 3 112 17 240 420 340
## 193 36.00 41 10 4 322 54 230 440 430
## 194 307.15 40 15 4 309 71 280 450 520
## 195 12.92 46 6 5 999 346 230 610 590
## 196 111.19 49 24 2 173 40 270 450 400
## 197 2.42 46 2 4 237 46 280 440 440
## 198 29.77 53 22 3 224 43 180 410 420
## 199 19.35 57 18 4 174 40 210 430 360
## 200 4.42 59 9 6 169 39 220 460 330
## SpendVol SpenVel CollGifts BricMortar MarthaHome SunAds ThemeColl CustDec
## 1 503 285 1 0 0 1 0 1
## 2 690 570 0 1 1 0 0 1
## 3 600 280 1 0 0 1 1 1
## 4 543 308 1 0 0 1 1 0
## 5 680 100 0 1 1 0 0 1
## 6 440 50 0 1 1 0 0 1
## 7 690 180 1 0 0 1 0 0
## 8 500 10 1 1 1 0 1 1
## 9 610 0 1 0 1 0 1 1
## 10 660 0 0 1 0 0 0 0
## 11 610 50 1 1 0 1 1 0
## 12 220 0 0 0 1 1 0 1
## 13 570 220 1 0 0 1 1 0
## 14 630 300 0 0 0 0 0 0
## 15 580 170 0 0 0 0 0 0
## 16 770 30 1 0 1 1 1 1
## 17 620 170 1 0 0 1 1 0
## 18 570 610 1 1 1 1 0 1
## 19 710 130 0 0 0 0 0 0
## 20 340 30 1 0 1 1 1 1
## 21 380 20 0 1 1 0 0 1
## 22 680 160 0 0 1 1 0 1
## 23 660 20 1 0 1 1 0 1
## 24 300 10 0 0 0 0 1 0
## 25 490 0 1 1 1 1 1 0
## 26 730 140 1 0 1 0 1 1
## 27 458 188 1 1 1 1 0 1
## 28 590 200 1 1 1 1 1 0
## 29 690 20 1 1 0 0 0 0
## 30 0 0 0 0 0 0 0 0
## 31 700 80 1 0 1 0 1 0
## 32 620 380 0 1 1 0 0 1
## 33 660 230 1 0 0 1 1 0
## 34 750 240 0 1 1 0 0 0
## 35 690 80 0 1 1 1 1 0
## 36 560 450 1 0 1 0 0 0
## 37 260 10 1 0 0 1 1 0
## 38 0 0 0 0 0 0 0 0
## 39 260 0 0 1 0 0 0 0
## 40 534 238 1 1 1 0 0 1
## 41 599 312 0 0 1 0 0 1
## 42 534 281 0 0 0 1 0 0
## 43 250 0 0 0 0 1 1 0
## 44 710 100 0 0 0 0 0 0
## 45 564 251 0 0 1 0 1 0
## 46 640 380 0 0 0 1 1 0
## 47 730 190 1 0 1 0 0 0
## 48 700 90 1 0 0 1 0 1
## 49 700 130 1 1 1 1 1 1
## 50 660 300 0 0 0 0 0 0
## 51 660 220 0 1 1 0 0 1
## 52 650 400 1 0 1 1 0 1
## 53 670 210 1 0 1 1 1 1
## 54 480 70 1 0 1 1 1 0
## 55 526 238 0 1 0 1 0 0
## 56 630 10 1 0 0 1 0 1
## 57 370 30 0 0 0 1 0 1
## 58 610 330 1 0 1 1 0 1
## 59 670 240 1 0 0 0 1 1
## 60 740 110 1 0 1 0 1 1
## 61 670 750 1 0 0 0 0 0
## 62 600 40 0 0 1 0 1 0
## 63 590 0 1 1 1 0 1 1
## 64 630 260 0 1 1 0 0 1
## 65 590 90 1 0 1 1 1 0
## 66 506 308 1 0 0 0 0 0
## 67 720 750 0 0 0 1 1 0
## 68 650 900 1 0 0 1 1 0
## 69 580 230 1 0 0 0 1 0
## 70 650 160 1 1 0 0 0 0
## 71 660 210 1 0 0 1 1 0
## 72 547 300 0 0 0 0 0 0
## 73 650 570 0 0 0 0 0 0
## 74 590 310 0 0 0 0 0 0
## 75 541 329 0 1 1 0 0 1
## 76 0 0 0 1 0 0 0 0
## 77 680 250 0 0 0 0 1 0
## 78 480 150 0 0 0 1 1 0
## 79 180 10 0 0 0 0 0 0
## 80 376 132 0 1 0 0 0 0
## 81 490 70 0 0 0 1 0 1
## 82 240 0 1 0 0 1 1 0
## 83 600 0 1 0 0 1 1 1
## 84 552 230 0 0 1 0 0 0
## 85 489 218 1 0 0 1 0 0
## 86 640 90 0 0 0 0 0 0
## 87 437 166 0 0 0 0 0 0
## 88 560 140 0 0 0 0 0 0
## 89 630 360 0 0 0 0 1 0
## 90 560 999 0 0 0 1 1 0
## 91 516 241 0 1 1 0 0 0
## 92 80 0 1 0 1 1 1 0
## 93 350 30 1 1 1 0 0 1
## 94 640 60 0 0 1 0 0 0
## 95 460 130 1 1 1 0 0 1
## 96 610 430 1 0 1 1 0 0
## 97 360 30 0 0 0 0 0 0
## 98 690 280 1 1 0 0 1 0
## 99 620 80 1 1 1 0 1 1
## 100 710 100 0 0 0 1 0 1
## 101 570 0 1 0 0 0 1 0
## 102 560 10 0 1 0 0 0 0
## 103 610 150 1 0 0 1 0 1
## 104 650 60 1 0 0 1 0 1
## 105 620 490 0 0 0 1 1 1
## 106 590 580 0 0 0 0 0 0
## 107 580 10 0 1 0 1 0 1
## 108 670 130 0 0 0 0 0 0
## 109 680 130 0 0 0 0 0 0
## 110 650 450 1 0 0 1 1 0
## 111 410 30 0 1 0 0 0 1
## 112 700 660 1 0 1 1 1 1
## 113 700 150 0 1 0 0 0 1
## 114 362 205 1 0 0 1 0 0
## 115 740 220 0 1 0 0 0 0
## 116 610 310 0 0 0 0 1 1
## 117 620 330 1 0 1 1 1 1
## 118 610 130 0 0 0 1 0 1
## 119 552 304 0 1 0 0 0 0
## 120 660 0 0 0 0 0 0 0
## 121 600 210 0 0 1 1 1 1
## 122 480 210 0 0 0 0 0 0
## 123 570 680 1 0 0 1 1 1
## 124 660 40 1 0 0 1 0 1
## 125 513 219 0 0 0 1 0 0
## 126 580 140 1 0 1 1 0 1
## 127 580 230 0 0 0 0 0 0
## 128 680 600 1 1 0 1 1 0
## 129 690 220 1 0 0 0 0 0
## 130 750 690 0 0 0 0 0 0
## 131 640 640 0 1 1 0 0 0
## 132 780 0 0 0 0 0 0 0
## 133 130 0 0 0 0 1 1 0
## 134 630 0 0 0 0 0 0 0
## 135 680 410 1 0 0 1 0 0
## 136 710 600 0 0 0 0 1 1
## 137 520 510 0 0 0 0 0 0
## 138 540 480 0 1 1 0 0 1
## 139 584 334 0 0 0 0 0 0
## 140 610 50 0 1 0 0 0 0
## 141 553 254 0 0 0 0 0 0
## 142 620 390 1 0 0 1 1 0
## 143 600 190 1 0 0 1 1 1
## 144 620 100 0 0 0 0 0 0
## 145 680 130 1 0 0 0 1 1
## 146 710 750 0 0 1 0 0 0
## 147 720 580 1 0 0 0 0 0
## 148 680 450 0 0 0 0 0 0
## 149 720 840 0 0 0 0 0 0
## 150 230 0 0 0 1 0 0 0
## 151 540 240 1 0 0 1 1 0
## 152 430 70 0 0 0 1 1 1
## 153 460 110 1 0 0 1 1 0
## 154 260 0 0 0 1 1 0 1
## 155 640 140 1 0 0 1 1 0
## 156 700 0 0 0 0 0 0 0
## 157 600 120 0 0 0 0 0 0
## 158 640 130 1 1 0 1 0 0
## 159 620 300 1 0 1 1 0 1
## 160 480 191 0 0 0 1 1 0
## 161 580 10 0 0 0 0 0 0
## 162 0 0 0 0 0 0 0 0
## 163 80 0 0 0 0 0 0 0
## 164 670 420 0 1 0 0 1 0
## 165 360 40 1 0 1 1 0 0
## 166 660 260 1 0 1 1 1 1
## 167 670 110 1 0 0 1 0 1
## 168 550 40 1 1 0 1 1 1
## 169 570 250 1 0 0 0 0 0
## 170 700 560 1 0 0 0 1 0
## 171 540 120 1 1 1 1 0 1
## 172 670 120 1 0 1 0 0 0
## 173 620 100 1 1 1 0 1 1
## 174 640 330 1 0 0 1 1 0
## 175 510 200 1 1 1 1 1 1
## 176 500 60 1 0 1 1 0 0
## 177 670 270 0 0 0 0 0 0
## 178 590 100 1 1 1 0 0 0
## 179 650 50 0 1 0 1 1 0
## 180 720 40 1 1 1 0 0 0
## 181 350 630 0 0 0 0 0 0
## 182 740 370 0 1 0 0 0 0
## 183 660 190 1 1 1 1 0 1
## 184 580 0 1 0 1 0 1 1
## 185 650 310 0 0 0 1 1 1
## 186 590 650 0 1 1 0 1 0
## 187 511 266 1 0 0 0 1 0
## 188 520 194 1 0 0 1 0 0
## 189 630 650 1 1 0 0 0 0
## 190 170 0 0 1 1 0 0 0
## 191 690 80 0 1 1 0 0 0
## 192 630 40 0 0 0 0 0 0
## 193 680 330 0 1 1 0 0 1
## 194 670 420 1 0 1 1 1 1
## 195 650 540 0 0 0 0 0 0
## 196 640 70 1 0 0 1 1 1
## 197 650 20 1 0 0 1 1 1
## 198 670 880 1 0 0 0 0 0
## 199 590 370 0 0 0 0 0 0
## 200 540 0 1 0 0 0 1 0
## RetailKids TeenWr Carlovers CountryColl
## 1 1 1 0 1
## 2 1 0 0 0
## 3 1 0 0 1
## 4 0 0 0 1
## 5 0 0 0 0
## 6 0 0 1 0
## 7 0 0 0 1
## 8 1 1 1 0
## 9 0 0 0 1
## 10 0 1 0 0
## 11 0 1 0 1
## 12 1 0 0 0
## 13 1 1 0 1
## 14 0 0 0 1
## 15 0 0 0 0
## 16 0 0 0 1
## 17 0 1 1 1
## 18 1 0 0 1
## 19 0 0 0 0
## 20 1 0 0 1
## 21 0 1 0 0
## 22 1 0 0 0
## 23 0 1 0 1
## 24 0 1 0 0
## 25 1 1 1 1
## 26 0 1 0 0
## 27 0 0 1 1
## 28 1 1 1 1
## 29 0 1 1 1
## 30 0 0 0 0
## 31 1 1 1 1
## 32 0 1 0 0
## 33 0 1 0 1
## 34 0 1 0 0
## 35 0 1 1 0
## 36 0 0 0 0
## 37 1 0 1 1
## 38 0 1 0 0
## 39 0 0 0 0
## 40 0 1 0 0
## 41 0 0 1 0
## 42 0 0 0 0
## 43 0 1 0 1
## 44 0 0 0 0
## 45 0 1 0 0
## 46 0 0 1 0
## 47 1 1 0 0
## 48 1 1 1 0
## 49 0 1 0 1
## 50 0 1 0 0
## 51 0 1 0 0
## 52 1 0 0 0
## 53 1 1 1 1
## 54 0 1 1 1
## 55 0 1 1 1
## 56 1 0 1 1
## 57 1 0 1 0
## 58 1 1 0 1
## 59 0 0 1 1
## 60 0 0 0 0
## 61 1 1 1 0
## 62 0 1 0 0
## 63 1 1 0 1
## 64 0 1 0 1
## 65 0 0 1 1
## 66 0 1 0 0
## 67 0 1 0 0
## 68 0 0 0 1
## 69 0 0 0 1
## 70 0 1 0 1
## 71 0 0 0 1
## 72 0 0 1 0
## 73 0 0 0 0
## 74 0 1 0 0
## 75 0 0 0 0
## 76 0 0 0 0
## 77 1 1 0 0
## 78 0 0 0 1
## 79 0 1 0 0
## 80 0 1 1 0
## 81 1 0 0 0
## 82 0 0 0 1
## 83 1 1 0 1
## 84 0 0 0 0
## 85 0 0 0 1
## 86 0 0 0 0
## 87 0 0 0 0
## 88 0 0 1 0
## 89 1 1 0 1
## 90 0 1 0 1
## 91 1 1 1 0
## 92 0 0 0 1
## 93 1 0 0 0
## 94 0 1 0 0
## 95 1 1 0 1
## 96 0 0 0 1
## 97 0 1 0 0
## 98 0 1 0 1
## 99 0 1 0 1
## 100 1 0 1 0
## 101 0 0 0 1
## 102 0 1 1 0
## 103 1 0 0 1
## 104 1 0 0 1
## 105 1 0 0 0
## 106 0 0 0 0
## 107 1 1 1 0
## 108 0 0 0 0
## 109 0 0 0 0
## 110 1 0 0 0
## 111 1 1 0 0
## 112 1 1 1 1
## 113 0 0 0 0
## 114 0 0 0 1
## 115 0 1 0 0
## 116 0 0 0 0
## 117 0 0 0 1
## 118 1 1 0 0
## 119 0 1 0 0
## 120 0 0 1 0
## 121 1 0 0 0
## 122 0 1 0 0
## 123 1 0 0 1
## 124 1 1 1 0
## 125 0 0 0 0
## 126 1 0 0 1
## 127 1 1 0 0
## 128 1 1 0 1
## 129 0 1 1 0
## 130 0 0 0 0
## 131 0 1 0 0
## 132 0 1 0 0
## 133 0 0 1 1
## 134 0 1 0 0
## 135 0 0 0 1
## 136 0 0 0 0
## 137 0 1 1 0
## 138 0 1 0 0
## 139 1 0 0 0
## 140 0 0 1 0
## 141 0 0 0 0
## 142 1 1 1 1
## 143 0 1 1 1
## 144 0 0 1 0
## 145 0 1 0 0
## 146 0 1 1 0
## 147 0 0 0 0
## 148 0 0 0 0
## 149 0 0 0 1
## 150 0 1 0 0
## 151 1 0 1 1
## 152 1 1 0 0
## 153 0 0 1 1
## 154 1 0 0 0
## 155 1 1 0 1
## 156 0 0 0 0
## 157 0 0 0 0
## 158 0 1 0 1
## 159 1 1 1 1
## 160 0 0 0 1
## 161 0 1 0 0
## 162 0 0 0 0
## 163 0 0 0 0
## 164 1 1 0 0
## 165 0 0 0 1
## 166 1 0 1 0
## 167 1 1 1 1
## 168 1 1 0 1
## 169 1 1 0 0
## 170 1 1 0 0
## 171 1 0 0 1
## 172 0 1 1 0
## 173 0 1 1 0
## 174 0 0 0 1
## 175 1 1 0 1
## 176 0 1 1 1
## 177 0 0 0 0
## 178 1 1 0 0
## 179 0 1 0 1
## 180 1 1 0 0
## 181 0 0 0 0
## 182 1 0 1 0
## 183 1 0 0 1
## 184 0 0 1 0
## 185 1 0 0 1
## 186 0 1 0 0
## 187 0 0 0 0
## 188 1 0 1 1
## 189 0 1 0 1
## 190 0 0 0 1
## 191 0 1 0 0
## 192 0 1 0 0
## 193 1 1 0 1
## 194 1 1 0 1
## 195 0 0 1 0
## 196 0 1 1 1
## 197 1 1 0 1
## 198 1 0 1 0
## 199 1 1 0 0
## 200 0 1 0 1
# Keep valid ages only: the prompt treats Age < 18 as invalid, so Age == 18 is kept
d1a <- filter(d1, Age >= 18)
d1a
## SpendRat Age LenRes Income TotAsset SecAssets ShortLiq LongLiq WlthIdx
## 1 16.83 35 3 5 195 36 220 420 430
## 2 11.38 46 9 5 123 24 200 420 290
## 3 31.33 41 2 2 117 25 222 419 279
## 4 1.90 46 7 9 493 105 310 500 520
## 5 84.13 46 15 5 138 27 340 450 440
## 6 2.15 46 16 4 162 25 230 430 360
## 7 38.00 56 31 6 117 27 300 440 400
## 8 136.28 48 8 5 119 23 250 430 360
## 9 61.46 54 8 5 50 10 200 420 230
## 10 2.73 43 15 4 135 21 230 430 340
## 11 5.86 66 8 4 81 19 250 430 320
## 12 113.00 61 8 5 999 245 999 720 880
## 13 8.40 77 8 3 198 45 210 430 340
## 14 19.00 50 7 4 412 95 290 470 490
## 15 70.85 49 19 5 192 37 200 440 330
## 16 20.17 76 12 5 64 15 190 410 280
## 17 0.59 49 2 7 229 44 270 450 430
## 18 19.48 63 7 5 95 22 270 440 360
## 19 86.20 67 32 5 87 20 250 430 330
## 20 25.46 70 36 5 511 99 270 540 500
## 21 76.82 77 15 4 242 56 300 470 450
## 22 0.77 71 9 5 43 10 200 420 210
## 23 335.94 58 6 6 88 20 220 430 290
## 24 18.78 48 3 5 534 94 240 520 460
## 25 205.07 77 9 6 131 30 205 433 295
## 26 216.57 66 9 7 85 20 220 420 300
## 27 77.21 21 3 4 206 12 280 460 420
## 28 0.26 89 11 5 57 13 230 420 280
## 29 92.53 81 37 6 999 209 280 670 530
## 30 46.55 56 19 3 143 23 220 410 380
## 31 6.86 68 20 5 338 78 280 480 490
## 32 18.14 40 7 5 306 47 280 430 490
## 33 120.50 78 3 4 285 55 290 440 470
## 34 37.08 69 7 3 100 23 200 420 300
## 35 19.20 46 27 4 52 12 210 420 250
## 36 23.08 74 25 3 84 19 260 430 330
## 37 165.13 54 9 4 164 31 225 433 333
## 38 6.50 70 9 4 109 18 194 414 265
## 39 8.29 73 3 4 73 17 250 430 310
## 40 3.95 53 8 7 127 25 240 430 350
## 41 14.88 42 7 6 155 31 229 427 328
## 42 125.32 64 25 6 196 32 240 420 430
## 43 39.43 50 5 7 353 68 240 440 440
## 44 39.81 37 10 5 216 33 280 440 440
## 45 230.89 20 21 5 564 109 250 520 490
## 46 3.61 46 16 5 244 47 220 440 390
## 47 37.78 57 12 4 277 64 230 440 410
## 48 9.23 54 3 7 210 36 210 410 390
## 49 44.00 47 17 5 315 61 280 450 480
## 50 86.96 43 15 5 45 7 230 420 270
## 51 2.37 55 7 6 127 29 216 424 292
## 52 4.81 56 15 6 228 53 260 460 410
## 53 0.16 57 10 6 139 32 310 450 420
## 54 99.10 36 2 5 51 8 230 410 300
## 55 11.85 38 8 7 260 46 260 440 450
## 56 12.80 38 7 5 195 30 230 430 370
## 57 21.00 55 19 6 145 28 190 410 360
## 58 17.53 59 8 5 243 56 330 480 480
## 59 72.91 55 22 6 140 27 280 450 390
## 60 113.71 49 19 4 129 25 200 420 290
## 61 10.38 52 7 6 98 19 202 417 252
## 62 9.36 41 9 4 170 26 190 410 380
## 63 304.61 49 8 4 267 52 190 410 520
## 64 41.63 52 18 4 188 35 230 430 380
## 65 1.37 41 8 3 196 33 220 420 360
## 66 24.12 34 13 3 183 42 220 420 400
## 67 33.15 32 3 5 32 2 220 410 260
## 68 22.44 78 40 4 38 9 210 420 230
## 69 11.79 44 8 3 237 18 200 420 350
## 70 1.36 81 8 2 92 21 240 430 340
## 71 0.71 86 42 2 90 21 280 440 360
## 72 39.69 76 8 5 107 34 206 431 274
## 73 4.33 79 18 4 77 18 220 430 300
## 74 8.39 84 22 2 91 21 260 440 340
## 75 10.80 49 14 2 117 23 230 430 320
## 76 5.80 55 16 4 169 35 238 432 347
## 77 1.33 70 17 3 94 23 190 420 254
## 78 4.13 43 16 2 139 24 230 430 340
## 79 17.94 85 42 1 123 32 207 432 289
## 80 3.23 56 25 2 126 29 260 430 380
## 81 45.54 42 8 3 168 26 240 420 400
## 82 6.77 45 9 3 40 6 160 400 260
## 83 4.53 55 7 2 143 28 216 428 311
## 84 13.50 62 14 2 77 18 260 430 330
## 85 129.72 40 9 2 58 9 270 420 320
## 86 19.27 44 17 4 180 27 240 440 370
## 87 153.67 71 38 4 69 16 200 420 260
## 88 61.90 61 31 4 209 48 290 440 490
## 89 275.91 79 32 2 63 15 230 420 290
## 90 5.60 51 13 4 222 36 260 440 450
## 91 401.42 52 24 5 260 50 270 470 440
## 92 15.00 35 8 4 141 8 250 420 370
## 93 3.67 62 6 4 184 42 310 470 440
## 94 7.60 30 7 4 20 1 220 400 200
## 95 5.56 54 35 4 273 51 260 460 440
## 96 16.69 60 14 5 999 999 190 999 540
## 97 5.92 60 16 5 74 17 180 410 240
## 98 39.73 41 34 5 62 9 270 420 320
## 99 2.22 45 7 2 119 18 240 420 360
## 100 1.33 73 9 5 307 47 220 460 390
## 101 4.65 35 11 8 166 9 220 410 380
## 102 6.13 41 11 5 218 33 210 410 430
## 103 13.14 61 30 5 305 70 220 470 390
## 104 44.13 87 2 1 48 21 178 419 198
## 105 26.43 50 9 5 475 92 230 460 460
## 106 9.88 66 8 5 192 38 240 440 390
## 107 13.28 63 27 5 202 39 230 430 410
## 108 14.93 53 24 2 140 27 200 430 300
## 109 30.73 42 9 4 117 21 206 421 285
## 110 5.50 40 9 4 122 19 250 420 350
## 111 15.95 54 29 5 206 40 250 440 410
## 112 6.93 34 10 4 35 2 240 410 270
## 113 91.50 50 8 5 36 7 180 400 220
## 114 50.55 57 3 6 242 39 270 450 430
## 115 11.73 57 18 5 141 32 224 428 310
## 116 1.24 57 6 5 116 22 200 420 280
## 117 3.69 33 12 4 54 3 190 410 230
## 118 50.18 50 12 5 246 48 210 420 430
## 119 43.85 43 8 5 201 42 230 430 380
## 120 3.55 64 7 5 73 17 190 400 310
## 121 14.58 37 8 5 208 32 220 410 450
## 122 5.00 39 19 4 169 26 210 430 310
## 123 36.53 52 35 3 48 9 220 420 250
## 124 10.45 47 14 5 107 21 200 430 260
## 125 11.50 55 30 4 69 13 210 410 320
## 126 68.36 50 8 7 56 11 210 410 290
## 127 8.82 31 10 4 133 23 222 420 298
## 128 4.06 42 5 3 118 18 250 430 360
## 129 1.67 49 19 2 150 31 229 427 318
## 130 85.00 50 23 4 136 26 220 420 370
## 131 44.79 52 21 5 150 26 250 430 380
## 132 13.20 69 12 3 292 67 230 460 400
## 133 30.30 46 10 5 320 62 290 460 480
## 134 8.22 52 13 5 236 46 190 410 430
## 135 6.39 53 10 4 232 45 210 420 420
## 136 3.50 40 8 6 179 27 230 410 420
## 137 11.24 53 16 5 215 42 180 410 410
## 138 3.18 53 30 5 89 17 270 440 350
## 139 22.79 72 37 5 61 9 230 420 310
## 140 18.83 61 11 5 108 25 250 440 350
## 141 32.11 77 3 4 43 10 210 410 240
## 142 17.74 36 12 4 194 37 260 440 410
## 143 17.45 52 11 4 218 42 300 460 440
## 144 2.00 56 3 5 155 36 240 440 370
## 145 23.13 78 4 6 260 52 300 460 470
## 146 61.76 42 7 4 147 34 220 430 360
## 147 4.25 75 30 4 124 30 209 428 293
## 148 60.29 55 26 4 114 6 280 420 370
## 149 32.50 57 46 3 79 18 240 430 310
## 150 0.08 78 9 2 65 15 250 430 300
## 151 4.56 39 9 5 215 34 230 430 430
## 152 14.50 68 35 2 94 22 270 440 360
## 153 72.37 52 14 7 332 65 300 450 520
## 154 11.74 44 17 5 157 24 240 430 380
## 155 125.44 68 37 6 138 32 270 450 390
## 156 52.45 74 39 6 188 43 200 440 320
## 157 39.50 41 8 5 200 11 180 400 300
## 158 94.15 48 0 6 131 28 220 430 340
## 159 34.37 57 14 6 265 61 270 460 450
## 160 111.83 53 8 6 243 47 310 460 470
## 161 48.80 61 7 4 134 19 200 410 320
## 162 329.71 73 34 5 69 16 220 420 300
## 163 32.35 60 10 6 177 41 280 460 410
## 164 54.88 46 11 6 149 29 250 450 380
## 165 70.23 52 11 5 229 44 300 450 460
## 166 25.24 42 16 7 307 17 260 430 440
## 167 1.83 78 26 7 5 0 180 400 90
## 168 9.79 40 14 5 349 53 230 430 450
## 169 151.93 42 9 6 231 35 290 450 460
## 170 119.87 64 15 6 133 31 230 440 330
## 171 3.81 37 13 4 150 23 250 420 410
## 172 69.04 58 14 4 161 37 190 420 370
## 173 52.67 49 2 2 87 20 190 414 239
## 174 9.92 31 35 1 181 39 242 438 363
## 175 23.45 49 22 3 180 35 190 410 360
## 176 45.50 76 17 3 44 10 200 420 220
## 177 60.82 57 18 3 430 99 380 520 570
## 178 2.23 57 14 3 112 17 240 420 340
## 179 36.00 41 10 4 322 54 230 440 430
## 180 307.15 40 15 4 309 71 280 450 520
## 181 12.92 46 6 5 999 346 230 610 590
## 182 111.19 49 24 2 173 40 270 450 400
## 183 2.42 46 2 4 237 46 280 440 440
## 184 29.77 53 22 3 224 43 180 410 420
## 185 19.35 57 18 4 174 40 210 430 360
## 186 4.42 59 9 6 169 39 220 460 330
## SpendVol SpenVel CollGifts BricMortar MarthaHome SunAds ThemeColl CustDec
## 1 690 570 0 1 1 0 0 1
## 2 600 280 1 0 0 1 1 1
## 3 543 308 1 0 0 1 1 0
## 4 680 100 0 1 1 0 0 1
## 5 440 50 0 1 1 0 0 1
## 6 690 180 1 0 0 1 0 0
## 7 500 10 1 1 1 0 1 1
## 8 610 0 1 0 1 0 1 1
## 9 660 0 0 1 0 0 0 0
## 10 610 50 1 1 0 1 1 0
## 11 220 0 0 0 1 1 0 1
## 12 570 220 1 0 0 1 1 0
## 13 630 300 0 0 0 0 0 0
## 14 770 30 1 0 1 1 1 1
## 15 620 170 1 0 0 1 1 0
## 16 570 610 1 1 1 1 0 1
## 17 710 130 0 0 0 0 0 0
## 18 340 30 1 0 1 1 1 1
## 19 380 20 0 1 1 0 0 1
## 20 680 160 0 0 1 1 0 1
## 21 660 20 1 0 1 1 0 1
## 22 300 10 0 0 0 0 1 0
## 23 490 0 1 1 1 1 1 0
## 24 730 140 1 0 1 0 1 1
## 25 458 188 1 1 1 1 0 1
## 26 590 200 1 1 1 1 1 0
## 27 690 20 1 1 0 0 0 0
## 28 0 0 0 0 0 0 0 0
## 29 700 80 1 0 1 0 1 0
## 30 620 380 0 1 1 0 0 1
## 31 660 230 1 0 0 1 1 0
## 32 750 240 0 1 1 0 0 0
## 33 690 80 0 1 1 1 1 0
## 34 560 450 1 0 1 0 0 0
## 35 260 10 1 0 0 1 1 0
## 36 260 0 0 1 0 0 0 0
## 37 534 238 1 1 1 0 0 1
## 38 599 312 0 0 1 0 0 1
## 39 250 0 0 0 0 1 1 0
## 40 710 100 0 0 0 0 0 0
## 41 564 251 0 0 1 0 1 0
## 42 640 380 0 0 0 1 1 0
## 43 730 190 1 0 1 0 0 0
## 44 700 90 1 0 0 1 0 1
## 45 700 130 1 1 1 1 1 1
## 46 660 300 0 0 0 0 0 0
## 47 660 220 0 1 1 0 0 1
## 48 650 400 1 0 1 1 0 1
## 49 670 210 1 0 1 1 1 1
## 50 480 70 1 0 1 1 1 0
## 51 526 238 0 1 0 1 0 0
## 52 630 10 1 0 0 1 0 1
## 53 370 30 0 0 0 1 0 1
## 54 610 330 1 0 1 1 0 1
## 55 670 240 1 0 0 0 1 1
## 56 740 110 1 0 1 0 1 1
## 57 670 750 1 0 0 0 0 0
## 58 600 40 0 0 1 0 1 0
## 59 590 0 1 1 1 0 1 1
## 60 590 90 1 0 1 1 1 0
## 61 506 308 1 0 0 0 0 0
## 62 720 750 0 0 0 1 1 0
## 63 650 900 1 0 0 1 1 0
## 64 650 160 1 1 0 0 0 0
## 65 660 210 1 0 0 1 1 0
## 66 650 570 0 0 0 0 0 0
## 67 590 310 0 0 0 0 0 0
## 68 0 0 0 1 0 0 0 0
## 69 680 250 0 0 0 0 1 0
## 70 480 150 0 0 0 1 1 0
## 71 180 10 0 0 0 0 0 0
## 72 376 132 0 1 0 0 0 0
## 73 490 70 0 0 0 1 0 1
## 74 240 0 1 0 0 1 1 0
## 75 600 0 1 0 0 1 1 1
## 76 552 230 0 0 1 0 0 0
## 77 489 218 1 0 0 1 0 0
## 78 640 90 0 0 0 0 0 0
## 79 437 166 0 0 0 0 0 0
## 80 560 140 0 0 0 0 0 0
## 81 630 360 0 0 0 0 1 0
## 82 560 999 0 0 0 1 1 0
## 83 516 241 0 1 1 0 0 0
## 84 80 0 1 0 1 1 1 0
## 85 350 30 1 1 1 0 0 1
## 86 640 60 0 0 1 0 0 0
## 87 460 130 1 1 1 0 0 1
## 88 610 430 1 0 1 1 0 0
## 89 360 30 0 0 0 0 0 0
## 90 690 280 1 1 0 0 1 0
## 91 620 80 1 1 1 0 1 1
## 92 710 100 0 0 0 1 0 1
## 93 570 0 1 0 0 0 1 0
## 94 560 10 0 1 0 0 0 0
## 95 610 150 1 0 0 1 0 1
## 96 620 490 0 0 0 1 1 1
## 97 590 580 0 0 0 0 0 0
## 98 580 10 0 1 0 1 0 1
## 99 670 130 0 0 0 0 0 0
## 100 680 130 0 0 0 0 0 0
## 101 650 450 1 0 0 1 1 0
## 102 700 660 1 0 1 1 1 1
## 103 700 150 0 1 0 0 0 1
## 104 362 205 1 0 0 1 0 0
## 105 740 220 0 1 0 0 0 0
## 106 610 310 0 0 0 0 1 1
## 107 620 330 1 0 1 1 1 1
## 108 610 130 0 0 0 1 0 1
## 109 552 304 0 1 0 0 0 0
## 110 660 0 0 0 0 0 0 0
## 111 600 210 0 0 1 1 1 1
## 112 480 210 0 0 0 0 0 0
## 113 570 680 1 0 0 1 1 1
## 114 660 40 1 0 0 1 0 1
## 115 513 219 0 0 0 1 0 0
## 116 580 140 1 0 1 1 0 1
## 117 580 230 0 0 0 0 0 0
## 118 680 600 1 1 0 1 1 0
## 119 690 220 1 0 0 0 0 0
## 120 750 690 0 0 0 0 0 0
## 121 640 640 0 1 1 0 0 0
## 122 780 0 0 0 0 0 0 0
## 123 130 0 0 0 0 1 1 0
## 124 630 0 0 0 0 0 0 0
## 125 520 510 0 0 0 0 0 0
## 126 540 480 0 1 1 0 0 1
## 127 584 334 0 0 0 0 0 0
## 128 610 50 0 1 0 0 0 0
## 129 553 254 0 0 0 0 0 0
## 130 620 390 1 0 0 1 1 0
## 131 600 190 1 0 0 1 1 1
## 132 620 100 0 0 0 0 0 0
## 133 680 130 1 0 0 0 1 1
## 134 710 750 0 0 1 0 0 0
## 135 720 580 1 0 0 0 0 0
## 136 680 450 0 0 0 0 0 0
## 137 720 840 0 0 0 0 0 0
## 138 230 0 0 0 1 0 0 0
## 139 540 240 1 0 0 1 1 0
## 140 430 70 0 0 0 1 1 1
## 141 460 110 1 0 0 1 1 0
## 142 640 140 1 0 0 1 1 0
## 143 700 0 0 0 0 0 0 0
## 144 600 120 0 0 0 0 0 0
## 145 640 130 1 1 0 1 0 0
## 146 620 300 1 0 1 1 0 1
## 147 480 191 0 0 0 1 1 0
## 148 580 10 0 0 0 0 0 0
## 149 0 0 0 0 0 0 0 0
## 150 80 0 0 0 0 0 0 0
## 151 670 420 0 1 0 0 1 0
## 152 360 40 1 0 1 1 0 0
## 153 660 260 1 0 1 1 1 1
## 154 670 110 1 0 0 1 0 1
## 155 550 40 1 1 0 1 1 1
## 156 570 250 1 0 0 0 0 0
## 157 700 560 1 0 0 0 1 0
## 158 540 120 1 1 1 1 0 1
## 159 670 120 1 0 1 0 0 0
## 160 620 100 1 1 1 0 1 1
## 161 640 330 1 0 0 1 1 0
## 162 510 200 1 1 1 1 1 1
## 163 500 60 1 0 1 1 0 0
## 164 590 100 1 1 1 0 0 0
## 165 650 50 0 1 0 1 1 0
## 166 720 40 1 1 1 0 0 0
## 167 350 630 0 0 0 0 0 0
## 168 740 370 0 1 0 0 0 0
## 169 660 190 1 1 1 1 0 1
## 170 580 0 1 0 1 0 1 1
## 171 650 310 0 0 0 1 1 1
## 172 590 650 0 1 1 0 1 0
## 173 511 266 1 0 0 0 1 0
## 174 520 194 1 0 0 1 0 0
## 175 630 650 1 1 0 0 0 0
## 176 170 0 0 1 1 0 0 0
## 177 690 80 0 1 1 0 0 0
## 178 630 40 0 0 0 0 0 0
## 179 680 330 0 1 1 0 0 1
## 180 670 420 1 0 1 1 1 1
## 181 650 540 0 0 0 0 0 0
## 182 640 70 1 0 0 1 1 1
## 183 650 20 1 0 0 1 1 1
## 184 670 880 1 0 0 0 0 0
## 185 590 370 0 0 0 0 0 0
## 186 540 0 1 0 0 0 1 0
## RetailKids TeenWr Carlovers CountryColl
## 1 1 0 0 0
## 2 1 0 0 1
## 3 0 0 0 1
## 4 0 0 0 0
## 5 0 0 1 0
## 6 0 0 0 1
## 7 1 1 1 0
## 8 0 0 0 1
## 9 0 1 0 0
## 10 0 1 0 1
## 11 1 0 0 0
## 12 1 1 0 1
## 13 0 0 0 1
## 14 0 0 0 1
## 15 0 1 1 1
## 16 1 0 0 1
## 17 0 0 0 0
## 18 1 0 0 1
## 19 0 1 0 0
## 20 1 0 0 0
## 21 0 1 0 1
## 22 0 1 0 0
## 23 1 1 1 1
## 24 0 1 0 0
## 25 0 0 1 1
## 26 1 1 1 1
## 27 0 1 1 1
## 28 0 0 0 0
## 29 1 1 1 1
## 30 0 1 0 0
## 31 0 1 0 1
## 32 0 1 0 0
## 33 0 1 1 0
## 34 0 0 0 0
## 35 1 0 1 1
## 36 0 0 0 0
## 37 0 1 0 0
## 38 0 0 1 0
## 39 0 1 0 1
## 40 0 0 0 0
## 41 0 1 0 0
## 42 0 0 1 0
## 43 1 1 0 0
## 44 1 1 1 0
## 45 0 1 0 1
## 46 0 1 0 0
## 47 0 1 0 0
## 48 1 0 0 0
## 49 1 1 1 1
## 50 0 1 1 1
## 51 0 1 1 1
## 52 1 0 1 1
## 53 1 0 1 0
## 54 1 1 0 1
## 55 0 0 1 1
## 56 0 0 0 0
## 57 1 1 1 0
## 58 0 1 0 0
## 59 1 1 0 1
## 60 0 0 1 1
## 61 0 1 0 0
## 62 0 1 0 0
## 63 0 0 0 1
## 64 0 1 0 1
## 65 0 0 0 1
## 66 0 0 0 0
## 67 0 1 0 0
## 68 0 0 0 0
## 69 1 1 0 0
## 70 0 0 0 1
## 71 0 1 0 0
## 72 0 1 1 0
## 73 1 0 0 0
## 74 0 0 0 1
## 75 1 1 0 1
## 76 0 0 0 0
## 77 0 0 0 1
## 78 0 0 0 0
## 79 0 0 0 0
## 80 0 0 1 0
## 81 1 1 0 1
## 82 0 1 0 1
## 83 1 1 1 0
## 84 0 0 0 1
## 85 1 0 0 0
## 86 0 1 0 0
## 87 1 1 0 1
## 88 0 0 0 1
## 89 0 1 0 0
## 90 0 1 0 1
## 91 0 1 0 1
## 92 1 0 1 0
## 93 0 0 0 1
## 94 0 1 1 0
## 95 1 0 0 1
## 96 1 0 0 0
## 97 0 0 0 0
## 98 1 1 1 0
## 99 0 0 0 0
## 100 0 0 0 0
## 101 1 0 0 0
## 102 1 1 1 1
## 103 0 0 0 0
## 104 0 0 0 1
## 105 0 1 0 0
## 106 0 0 0 0
## 107 0 0 0 1
## 108 1 1 0 0
## 109 0 1 0 0
## 110 0 0 1 0
## 111 1 0 0 0
## 112 0 1 0 0
## 113 1 0 0 1
## 114 1 1 1 0
## 115 0 0 0 0
## 116 1 0 0 1
## 117 1 1 0 0
## 118 1 1 0 1
## 119 0 1 1 0
## 120 0 0 0 0
## 121 0 1 0 0
## 122 0 1 0 0
## 123 0 0 1 1
## 124 0 1 0 0
## 125 0 1 1 0
## 126 0 1 0 0
## 127 1 0 0 0
## 128 0 0 1 0
## 129 0 0 0 0
## 130 1 1 1 1
## 131 0 1 1 1
## 132 0 0 1 0
## 133 0 1 0 0
## 134 0 1 1 0
## 135 0 0 0 0
## 136 0 0 0 0
## 137 0 0 0 1
## 138 0 1 0 0
## 139 1 0 1 1
## 140 1 1 0 0
## 141 0 0 1 1
## 142 1 1 0 1
## 143 0 0 0 0
## 144 0 0 0 0
## 145 0 1 0 1
## 146 1 1 1 1
## 147 0 0 0 1
## 148 0 1 0 0
## 149 0 0 0 0
## 150 0 0 0 0
## 151 1 1 0 0
## 152 0 0 0 1
## 153 1 0 1 0
## 154 1 1 1 1
## 155 1 1 0 1
## 156 1 1 0 0
## 157 1 1 0 0
## 158 1 0 0 1
## 159 0 1 1 0
## 160 0 1 1 0
## 161 0 0 0 1
## 162 1 1 0 1
## 163 0 1 1 1
## 164 1 1 0 0
## 165 0 1 0 1
## 166 1 1 0 0
## 167 0 0 0 0
## 168 1 0 1 0
## 169 1 0 0 1
## 170 0 0 1 0
## 171 1 0 0 1
## 172 0 1 0 0
## 173 0 0 0 0
## 174 1 0 1 1
## 175 0 1 0 1
## 176 0 0 0 1
## 177 0 1 0 0
## 178 0 1 0 0
## 179 1 1 0 1
## 180 1 1 0 1
## 181 0 0 1 0
## 182 0 1 1 1
## 183 1 1 0 1
## 184 1 0 1 0
## 185 1 1 0 0
## 186 0 1 0 1
# Keep only rows where length of residence is less than age (LenRes < Age)
d1a <- dplyr::filter(d1a, LenRes < Age)
d1a
## SpendRat Age LenRes Income TotAsset SecAssets ShortLiq LongLiq WlthIdx
## 1 16.83 35 3 5 195 36 220 420 430
## 2 11.38 46 9 5 123 24 200 420 290
## 3 31.33 41 2 2 117 25 222 419 279
## 4 1.90 46 7 9 493 105 310 500 520
## 5 84.13 46 15 5 138 27 340 450 440
## 6 2.15 46 16 4 162 25 230 430 360
## 7 38.00 56 31 6 117 27 300 440 400
## 8 136.28 48 8 5 119 23 250 430 360
## 9 61.46 54 8 5 50 10 200 420 230
## 10 2.73 43 15 4 135 21 230 430 340
## 11 5.86 66 8 4 81 19 250 430 320
## 12 113.00 61 8 5 999 245 999 720 880
## 13 8.40 77 8 3 198 45 210 430 340
## 14 19.00 50 7 4 412 95 290 470 490
## 15 70.85 49 19 5 192 37 200 440 330
## 16 20.17 76 12 5 64 15 190 410 280
## 17 0.59 49 2 7 229 44 270 450 430
## 18 19.48 63 7 5 95 22 270 440 360
## 19 86.20 67 32 5 87 20 250 430 330
## 20 25.46 70 36 5 511 99 270 540 500
## 21 76.82 77 15 4 242 56 300 470 450
## 22 0.77 71 9 5 43 10 200 420 210
## 23 335.94 58 6 6 88 20 220 430 290
## 24 18.78 48 3 5 534 94 240 520 460
## 25 205.07 77 9 6 131 30 205 433 295
## 26 216.57 66 9 7 85 20 220 420 300
## 27 77.21 21 3 4 206 12 280 460 420
## 28 0.26 89 11 5 57 13 230 420 280
## 29 92.53 81 37 6 999 209 280 670 530
## 30 46.55 56 19 3 143 23 220 410 380
## 31 6.86 68 20 5 338 78 280 480 490
## 32 18.14 40 7 5 306 47 280 430 490
## 33 120.50 78 3 4 285 55 290 440 470
## 34 37.08 69 7 3 100 23 200 420 300
## 35 19.20 46 27 4 52 12 210 420 250
## 36 23.08 74 25 3 84 19 260 430 330
## 37 165.13 54 9 4 164 31 225 433 333
## 38 6.50 70 9 4 109 18 194 414 265
## 39 8.29 73 3 4 73 17 250 430 310
## 40 3.95 53 8 7 127 25 240 430 350
## 41 14.88 42 7 6 155 31 229 427 328
## 42 125.32 64 25 6 196 32 240 420 430
## 43 39.43 50 5 7 353 68 240 440 440
## 44 39.81 37 10 5 216 33 280 440 440
## 45 3.61 46 16 5 244 47 220 440 390
## 46 37.78 57 12 4 277 64 230 440 410
## 47 9.23 54 3 7 210 36 210 410 390
## 48 44.00 47 17 5 315 61 280 450 480
## 49 86.96 43 15 5 45 7 230 420 270
## 50 2.37 55 7 6 127 29 216 424 292
## 51 4.81 56 15 6 228 53 260 460 410
## 52 0.16 57 10 6 139 32 310 450 420
## 53 99.10 36 2 5 51 8 230 410 300
## 54 11.85 38 8 7 260 46 260 440 450
## 55 12.80 38 7 5 195 30 230 430 370
## 56 21.00 55 19 6 145 28 190 410 360
## 57 17.53 59 8 5 243 56 330 480 480
## 58 72.91 55 22 6 140 27 280 450 390
## 59 113.71 49 19 4 129 25 200 420 290
## 60 10.38 52 7 6 98 19 202 417 252
## 61 9.36 41 9 4 170 26 190 410 380
## 62 304.61 49 8 4 267 52 190 410 520
## 63 41.63 52 18 4 188 35 230 430 380
## 64 1.37 41 8 3 196 33 220 420 360
## 65 24.12 34 13 3 183 42 220 420 400
## 66 33.15 32 3 5 32 2 220 410 260
## 67 22.44 78 40 4 38 9 210 420 230
## 68 11.79 44 8 3 237 18 200 420 350
## 69 1.36 81 8 2 92 21 240 430 340
## 70 0.71 86 42 2 90 21 280 440 360
## 71 39.69 76 8 5 107 34 206 431 274
## 72 4.33 79 18 4 77 18 220 430 300
## 73 8.39 84 22 2 91 21 260 440 340
## 74 10.80 49 14 2 117 23 230 430 320
## 75 5.80 55 16 4 169 35 238 432 347
## 76 1.33 70 17 3 94 23 190 420 254
## 77 4.13 43 16 2 139 24 230 430 340
## 78 17.94 85 42 1 123 32 207 432 289
## 79 3.23 56 25 2 126 29 260 430 380
## 80 45.54 42 8 3 168 26 240 420 400
## 81 6.77 45 9 3 40 6 160 400 260
## 82 4.53 55 7 2 143 28 216 428 311
## 83 13.50 62 14 2 77 18 260 430 330
## 84 129.72 40 9 2 58 9 270 420 320
## 85 19.27 44 17 4 180 27 240 440 370
## 86 153.67 71 38 4 69 16 200 420 260
## 87 61.90 61 31 4 209 48 290 440 490
## 88 275.91 79 32 2 63 15 230 420 290
## 89 5.60 51 13 4 222 36 260 440 450
## 90 401.42 52 24 5 260 50 270 470 440
## 91 15.00 35 8 4 141 8 250 420 370
## 92 3.67 62 6 4 184 42 310 470 440
## 93 7.60 30 7 4 20 1 220 400 200
## 94 5.56 54 35 4 273 51 260 460 440
## 95 16.69 60 14 5 999 999 190 999 540
## 96 5.92 60 16 5 74 17 180 410 240
## 97 39.73 41 34 5 62 9 270 420 320
## 98 2.22 45 7 2 119 18 240 420 360
## 99 1.33 73 9 5 307 47 220 460 390
## 100 4.65 35 11 8 166 9 220 410 380
## 101 6.13 41 11 5 218 33 210 410 430
## 102 13.14 61 30 5 305 70 220 470 390
## 103 44.13 87 2 1 48 21 178 419 198
## 104 26.43 50 9 5 475 92 230 460 460
## 105 9.88 66 8 5 192 38 240 440 390
## 106 13.28 63 27 5 202 39 230 430 410
## 107 14.93 53 24 2 140 27 200 430 300
## 108 30.73 42 9 4 117 21 206 421 285
## 109 5.50 40 9 4 122 19 250 420 350
## 110 15.95 54 29 5 206 40 250 440 410
## 111 6.93 34 10 4 35 2 240 410 270
## 112 91.50 50 8 5 36 7 180 400 220
## 113 50.55 57 3 6 242 39 270 450 430
## 114 11.73 57 18 5 141 32 224 428 310
## 115 1.24 57 6 5 116 22 200 420 280
## 116 3.69 33 12 4 54 3 190 410 230
## 117 50.18 50 12 5 246 48 210 420 430
## 118 43.85 43 8 5 201 42 230 430 380
## 119 3.55 64 7 5 73 17 190 400 310
## 120 14.58 37 8 5 208 32 220 410 450
## 121 5.00 39 19 4 169 26 210 430 310
## 122 36.53 52 35 3 48 9 220 420 250
## 123 10.45 47 14 5 107 21 200 430 260
## 124 11.50 55 30 4 69 13 210 410 320
## 125 68.36 50 8 7 56 11 210 410 290
## 126 8.82 31 10 4 133 23 222 420 298
## 127 4.06 42 5 3 118 18 250 430 360
## 128 1.67 49 19 2 150 31 229 427 318
## 129 85.00 50 23 4 136 26 220 420 370
## 130 44.79 52 21 5 150 26 250 430 380
## 131 13.20 69 12 3 292 67 230 460 400
## 132 30.30 46 10 5 320 62 290 460 480
## 133 8.22 52 13 5 236 46 190 410 430
## 134 6.39 53 10 4 232 45 210 420 420
## 135 3.50 40 8 6 179 27 230 410 420
## 136 11.24 53 16 5 215 42 180 410 410
## 137 3.18 53 30 5 89 17 270 440 350
## 138 22.79 72 37 5 61 9 230 420 310
## 139 18.83 61 11 5 108 25 250 440 350
## 140 32.11 77 3 4 43 10 210 410 240
## 141 17.74 36 12 4 194 37 260 440 410
## 142 17.45 52 11 4 218 42 300 460 440
## 143 2.00 56 3 5 155 36 240 440 370
## 144 23.13 78 4 6 260 52 300 460 470
## 145 61.76 42 7 4 147 34 220 430 360
## 146 4.25 75 30 4 124 30 209 428 293
## 147 60.29 55 26 4 114 6 280 420 370
## 148 32.50 57 46 3 79 18 240 430 310
## 149 0.08 78 9 2 65 15 250 430 300
## 150 4.56 39 9 5 215 34 230 430 430
## 151 14.50 68 35 2 94 22 270 440 360
## 152 72.37 52 14 7 332 65 300 450 520
## 153 11.74 44 17 5 157 24 240 430 380
## 154 125.44 68 37 6 138 32 270 450 390
## 155 52.45 74 39 6 188 43 200 440 320
## 156 39.50 41 8 5 200 11 180 400 300
## 157 94.15 48 0 6 131 28 220 430 340
## 158 34.37 57 14 6 265 61 270 460 450
## 159 111.83 53 8 6 243 47 310 460 470
## 160 48.80 61 7 4 134 19 200 410 320
## 161 329.71 73 34 5 69 16 220 420 300
## 162 32.35 60 10 6 177 41 280 460 410
## 163 54.88 46 11 6 149 29 250 450 380
## 164 70.23 52 11 5 229 44 300 450 460
## 165 25.24 42 16 7 307 17 260 430 440
## 166 1.83 78 26 7 5 0 180 400 90
## 167 9.79 40 14 5 349 53 230 430 450
## 168 151.93 42 9 6 231 35 290 450 460
## 169 119.87 64 15 6 133 31 230 440 330
## 170 3.81 37 13 4 150 23 250 420 410
## 171 69.04 58 14 4 161 37 190 420 370
## 172 52.67 49 2 2 87 20 190 414 239
## 173 23.45 49 22 3 180 35 190 410 360
## 174 45.50 76 17 3 44 10 200 420 220
## 175 60.82 57 18 3 430 99 380 520 570
## 176 2.23 57 14 3 112 17 240 420 340
## 177 36.00 41 10 4 322 54 230 440 430
## 178 307.15 40 15 4 309 71 280 450 520
## 179 12.92 46 6 5 999 346 230 610 590
## 180 111.19 49 24 2 173 40 270 450 400
## 181 2.42 46 2 4 237 46 280 440 440
## 182 29.77 53 22 3 224 43 180 410 420
## 183 19.35 57 18 4 174 40 210 430 360
## 184 4.42 59 9 6 169 39 220 460 330
## SpendVol SpenVel CollGifts BricMortar MarthaHome SunAds ThemeColl CustDec
## 1 690 570 0 1 1 0 0 1
## 2 600 280 1 0 0 1 1 1
## 3 543 308 1 0 0 1 1 0
## 4 680 100 0 1 1 0 0 1
## 5 440 50 0 1 1 0 0 1
## 6 690 180 1 0 0 1 0 0
## 7 500 10 1 1 1 0 1 1
## 8 610 0 1 0 1 0 1 1
## 9 660 0 0 1 0 0 0 0
## 10 610 50 1 1 0 1 1 0
## 11 220 0 0 0 1 1 0 1
## 12 570 220 1 0 0 1 1 0
## 13 630 300 0 0 0 0 0 0
## 14 770 30 1 0 1 1 1 1
## 15 620 170 1 0 0 1 1 0
## 16 570 610 1 1 1 1 0 1
## 17 710 130 0 0 0 0 0 0
## 18 340 30 1 0 1 1 1 1
## 19 380 20 0 1 1 0 0 1
## 20 680 160 0 0 1 1 0 1
## 21 660 20 1 0 1 1 0 1
## 22 300 10 0 0 0 0 1 0
## 23 490 0 1 1 1 1 1 0
## 24 730 140 1 0 1 0 1 1
## 25 458 188 1 1 1 1 0 1
## 26 590 200 1 1 1 1 1 0
## 27 690 20 1 1 0 0 0 0
## 28 0 0 0 0 0 0 0 0
## 29 700 80 1 0 1 0 1 0
## 30 620 380 0 1 1 0 0 1
## 31 660 230 1 0 0 1 1 0
## 32 750 240 0 1 1 0 0 0
## 33 690 80 0 1 1 1 1 0
## 34 560 450 1 0 1 0 0 0
## 35 260 10 1 0 0 1 1 0
## 36 260 0 0 1 0 0 0 0
## 37 534 238 1 1 1 0 0 1
## 38 599 312 0 0 1 0 0 1
## 39 250 0 0 0 0 1 1 0
## 40 710 100 0 0 0 0 0 0
## 41 564 251 0 0 1 0 1 0
## 42 640 380 0 0 0 1 1 0
## 43 730 190 1 0 1 0 0 0
## 44 700 90 1 0 0 1 0 1
## 45 660 300 0 0 0 0 0 0
## 46 660 220 0 1 1 0 0 1
## 47 650 400 1 0 1 1 0 1
## 48 670 210 1 0 1 1 1 1
## 49 480 70 1 0 1 1 1 0
## 50 526 238 0 1 0 1 0 0
## 51 630 10 1 0 0 1 0 1
## 52 370 30 0 0 0 1 0 1
## 53 610 330 1 0 1 1 0 1
## 54 670 240 1 0 0 0 1 1
## 55 740 110 1 0 1 0 1 1
## 56 670 750 1 0 0 0 0 0
## 57 600 40 0 0 1 0 1 0
## 58 590 0 1 1 1 0 1 1
## 59 590 90 1 0 1 1 1 0
## 60 506 308 1 0 0 0 0 0
## 61 720 750 0 0 0 1 1 0
## 62 650 900 1 0 0 1 1 0
## 63 650 160 1 1 0 0 0 0
## 64 660 210 1 0 0 1 1 0
## 65 650 570 0 0 0 0 0 0
## 66 590 310 0 0 0 0 0 0
## 67 0 0 0 1 0 0 0 0
## 68 680 250 0 0 0 0 1 0
## 69 480 150 0 0 0 1 1 0
## 70 180 10 0 0 0 0 0 0
## 71 376 132 0 1 0 0 0 0
## 72 490 70 0 0 0 1 0 1
## 73 240 0 1 0 0 1 1 0
## 74 600 0 1 0 0 1 1 1
## 75 552 230 0 0 1 0 0 0
## 76 489 218 1 0 0 1 0 0
## 77 640 90 0 0 0 0 0 0
## 78 437 166 0 0 0 0 0 0
## 79 560 140 0 0 0 0 0 0
## 80 630 360 0 0 0 0 1 0
## 81 560 999 0 0 0 1 1 0
## 82 516 241 0 1 1 0 0 0
## 83 80 0 1 0 1 1 1 0
## 84 350 30 1 1 1 0 0 1
## 85 640 60 0 0 1 0 0 0
## 86 460 130 1 1 1 0 0 1
## 87 610 430 1 0 1 1 0 0
## 88 360 30 0 0 0 0 0 0
## 89 690 280 1 1 0 0 1 0
## 90 620 80 1 1 1 0 1 1
## 91 710 100 0 0 0 1 0 1
## 92 570 0 1 0 0 0 1 0
## 93 560 10 0 1 0 0 0 0
## 94 610 150 1 0 0 1 0 1
## 95 620 490 0 0 0 1 1 1
## 96 590 580 0 0 0 0 0 0
## 97 580 10 0 1 0 1 0 1
## 98 670 130 0 0 0 0 0 0
## 99 680 130 0 0 0 0 0 0
## 100 650 450 1 0 0 1 1 0
## 101 700 660 1 0 1 1 1 1
## 102 700 150 0 1 0 0 0 1
## 103 362 205 1 0 0 1 0 0
## 104 740 220 0 1 0 0 0 0
## 105 610 310 0 0 0 0 1 1
## 106 620 330 1 0 1 1 1 1
## 107 610 130 0 0 0 1 0 1
## 108 552 304 0 1 0 0 0 0
## 109 660 0 0 0 0 0 0 0
## 110 600 210 0 0 1 1 1 1
## 111 480 210 0 0 0 0 0 0
## 112 570 680 1 0 0 1 1 1
## 113 660 40 1 0 0 1 0 1
## 114 513 219 0 0 0 1 0 0
## 115 580 140 1 0 1 1 0 1
## 116 580 230 0 0 0 0 0 0
## 117 680 600 1 1 0 1 1 0
## 118 690 220 1 0 0 0 0 0
## 119 750 690 0 0 0 0 0 0
## 120 640 640 0 1 1 0 0 0
## 121 780 0 0 0 0 0 0 0
## 122 130 0 0 0 0 1 1 0
## 123 630 0 0 0 0 0 0 0
## 124 520 510 0 0 0 0 0 0
## 125 540 480 0 1 1 0 0 1
## 126 584 334 0 0 0 0 0 0
## 127 610 50 0 1 0 0 0 0
## 128 553 254 0 0 0 0 0 0
## 129 620 390 1 0 0 1 1 0
## 130 600 190 1 0 0 1 1 1
## 131 620 100 0 0 0 0 0 0
## 132 680 130 1 0 0 0 1 1
## 133 710 750 0 0 1 0 0 0
## 134 720 580 1 0 0 0 0 0
## 135 680 450 0 0 0 0 0 0
## 136 720 840 0 0 0 0 0 0
## 137 230 0 0 0 1 0 0 0
## 138 540 240 1 0 0 1 1 0
## 139 430 70 0 0 0 1 1 1
## 140 460 110 1 0 0 1 1 0
## 141 640 140 1 0 0 1 1 0
## 142 700 0 0 0 0 0 0 0
## 143 600 120 0 0 0 0 0 0
## 144 640 130 1 1 0 1 0 0
## 145 620 300 1 0 1 1 0 1
## 146 480 191 0 0 0 1 1 0
## 147 580 10 0 0 0 0 0 0
## 148 0 0 0 0 0 0 0 0
## 149 80 0 0 0 0 0 0 0
## 150 670 420 0 1 0 0 1 0
## 151 360 40 1 0 1 1 0 0
## 152 660 260 1 0 1 1 1 1
## 153 670 110 1 0 0 1 0 1
## 154 550 40 1 1 0 1 1 1
## 155 570 250 1 0 0 0 0 0
## 156 700 560 1 0 0 0 1 0
## 157 540 120 1 1 1 1 0 1
## 158 670 120 1 0 1 0 0 0
## 159 620 100 1 1 1 0 1 1
## 160 640 330 1 0 0 1 1 0
## 161 510 200 1 1 1 1 1 1
## 162 500 60 1 0 1 1 0 0
## 163 590 100 1 1 1 0 0 0
## 164 650 50 0 1 0 1 1 0
## 165 720 40 1 1 1 0 0 0
## 166 350 630 0 0 0 0 0 0
## 167 740 370 0 1 0 0 0 0
## 168 660 190 1 1 1 1 0 1
## 169 580 0 1 0 1 0 1 1
## 170 650 310 0 0 0 1 1 1
## 171 590 650 0 1 1 0 1 0
## 172 511 266 1 0 0 0 1 0
## 173 630 650 1 1 0 0 0 0
## 174 170 0 0 1 1 0 0 0
## 175 690 80 0 1 1 0 0 0
## 176 630 40 0 0 0 0 0 0
## 177 680 330 0 1 1 0 0 1
## 178 670 420 1 0 1 1 1 1
## 179 650 540 0 0 0 0 0 0
## 180 640 70 1 0 0 1 1 1
## 181 650 20 1 0 0 1 1 1
## 182 670 880 1 0 0 0 0 0
## 183 590 370 0 0 0 0 0 0
## 184 540 0 1 0 0 0 1 0
## RetailKids TeenWr Carlovers CountryColl
## 1 1 0 0 0
## 2 1 0 0 1
## 3 0 0 0 1
## 4 0 0 0 0
## 5 0 0 1 0
## 6 0 0 0 1
## 7 1 1 1 0
## 8 0 0 0 1
## 9 0 1 0 0
## 10 0 1 0 1
## 11 1 0 0 0
## 12 1 1 0 1
## 13 0 0 0 1
## 14 0 0 0 1
## 15 0 1 1 1
## 16 1 0 0 1
## 17 0 0 0 0
## 18 1 0 0 1
## 19 0 1 0 0
## 20 1 0 0 0
## 21 0 1 0 1
## 22 0 1 0 0
## 23 1 1 1 1
## 24 0 1 0 0
## 25 0 0 1 1
## 26 1 1 1 1
## 27 0 1 1 1
## 28 0 0 0 0
## 29 1 1 1 1
## 30 0 1 0 0
## 31 0 1 0 1
## 32 0 1 0 0
## 33 0 1 1 0
## 34 0 0 0 0
## 35 1 0 1 1
## 36 0 0 0 0
## 37 0 1 0 0
## 38 0 0 1 0
## 39 0 1 0 1
## 40 0 0 0 0
## 41 0 1 0 0
## 42 0 0 1 0
## 43 1 1 0 0
## 44 1 1 1 0
## 45 0 1 0 0
## 46 0 1 0 0
## 47 1 0 0 0
## 48 1 1 1 1
## 49 0 1 1 1
## 50 0 1 1 1
## 51 1 0 1 1
## 52 1 0 1 0
## 53 1 1 0 1
## 54 0 0 1 1
## 55 0 0 0 0
## 56 1 1 1 0
## 57 0 1 0 0
## 58 1 1 0 1
## 59 0 0 1 1
## 60 0 1 0 0
## 61 0 1 0 0
## 62 0 0 0 1
## 63 0 1 0 1
## 64 0 0 0 1
## 65 0 0 0 0
## 66 0 1 0 0
## 67 0 0 0 0
## 68 1 1 0 0
## 69 0 0 0 1
## 70 0 1 0 0
## 71 0 1 1 0
## 72 1 0 0 0
## 73 0 0 0 1
## 74 1 1 0 1
## 75 0 0 0 0
## 76 0 0 0 1
## 77 0 0 0 0
## 78 0 0 0 0
## 79 0 0 1 0
## 80 1 1 0 1
## 81 0 1 0 1
## 82 1 1 1 0
## 83 0 0 0 1
## 84 1 0 0 0
## 85 0 1 0 0
## 86 1 1 0 1
## 87 0 0 0 1
## 88 0 1 0 0
## 89 0 1 0 1
## 90 0 1 0 1
## 91 1 0 1 0
## 92 0 0 0 1
## 93 0 1 1 0
## 94 1 0 0 1
## 95 1 0 0 0
## 96 0 0 0 0
## 97 1 1 1 0
## 98 0 0 0 0
## 99 0 0 0 0
## 100 1 0 0 0
## 101 1 1 1 1
## 102 0 0 0 0
## 103 0 0 0 1
## 104 0 1 0 0
## 105 0 0 0 0
## 106 0 0 0 1
## 107 1 1 0 0
## 108 0 1 0 0
## 109 0 0 1 0
## 110 1 0 0 0
## 111 0 1 0 0
## 112 1 0 0 1
## 113 1 1 1 0
## 114 0 0 0 0
## 115 1 0 0 1
## 116 1 1 0 0
## 117 1 1 0 1
## 118 0 1 1 0
## 119 0 0 0 0
## 120 0 1 0 0
## 121 0 1 0 0
## 122 0 0 1 1
## 123 0 1 0 0
## 124 0 1 1 0
## 125 0 1 0 0
## 126 1 0 0 0
## 127 0 0 1 0
## 128 0 0 0 0
## 129 1 1 1 1
## 130 0 1 1 1
## 131 0 0 1 0
## 132 0 1 0 0
## 133 0 1 1 0
## 134 0 0 0 0
## 135 0 0 0 0
## 136 0 0 0 1
## 137 0 1 0 0
## 138 1 0 1 1
## 139 1 1 0 0
## 140 0 0 1 1
## 141 1 1 0 1
## 142 0 0 0 0
## 143 0 0 0 0
## 144 0 1 0 1
## 145 1 1 1 1
## 146 0 0 0 1
## 147 0 1 0 0
## 148 0 0 0 0
## 149 0 0 0 0
## 150 1 1 0 0
## 151 0 0 0 1
## 152 1 0 1 0
## 153 1 1 1 1
## 154 1 1 0 1
## 155 1 1 0 0
## 156 1 1 0 0
## 157 1 0 0 1
## 158 0 1 1 0
## 159 0 1 1 0
## 160 0 0 0 1
## 161 1 1 0 1
## 162 0 1 1 1
## 163 1 1 0 0
## 164 0 1 0 1
## 165 1 1 0 0
## 166 0 0 0 0
## 167 1 0 1 0
## 168 1 0 0 1
## 169 0 0 1 0
## 170 1 0 0 1
## 171 0 1 0 0
## 172 0 0 0 0
## 173 0 1 0 1
## 174 0 0 0 1
## 175 0 1 0 0
## 176 0 1 0 0
## 177 1 1 0 1
## 178 1 1 0 1
## 179 0 0 1 0
## 180 0 1 1 1
## 181 1 1 0 1
## 182 1 0 1 0
## 183 1 1 0 0
## 184 0 1 0 1
Data cleaning
We have to filter out rows where the age of the customer is less than 18 because of the recent enactment of the SCOPE Act in the state of Texas. (Other states, e.g. Utah and Arkansas, have enacted similar laws.) Effective September 1, 2024, the SCOPE Act requires digital service providers, such as companies that own websites, apps, and software, to protect minor children (under 18) from harmful content and data collection practices. The law primarily applies to digital services that provide an online platform for social interaction between users and that: (1) allow users to create a public or semi-public profile to use the service, and (2) allow users to create or post content that can be viewed by other users of the service. This includes digital services such as message boards, chat rooms, video channels, or a main feed that presents users with content created and posted by other users.
On a personal note, as a mom of a 13-year-old and a 15-year-old, I agree. I don’t like some of the YouTube ads and Shorts that are spliced in between the videos and Shorts they are watching. I think there’s a difference between what is appropriate for a 13-year-old and what is appropriate for a 17-year-old, but I feel like YouTube puts them in the same category. Just because they watch the same Fortnite videos doesn’t mean they should see the same ads.
The length of residence cannot exceed the age of the customer: a customer can’t have lived at a residence longer than they’ve been alive. Note that the filter above uses the strict comparison LenRes < Age, so a row where the two values are equal is dropped as well.
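The two cleaning rules can be sketched on a small toy data frame. This is only an illustration: the values below are made up (they are not the catalog data), and base R's `subset()` is used so the sketch has no package dependencies. The second condition uses `<=`, as the rule is stated in the text.

```r
# Toy data (hypothetical values, not the catalog data set)
toy <- data.frame(
  Age    = c(35, 17, 31, 46, 40),
  LenRes = c( 3,  5, 35,  9, 40)
)

# Rule 1: keep adults only (Age >= 18)
# Rule 2: keep rows where length of residence does not exceed age
keep <- toy$Age >= 18 & toy$LenRes <= toy$Age

cleaned <- subset(toy, keep)   # rows that pass both rules
dropped <- subset(toy, !keep)  # rows set aside for inspection

print(cleaned)
print(dropped)
```

Keeping the dropped rows in their own object makes it easy to confirm that only implausible or out-of-scope records were removed.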
Basic Summary
Provide a basic summary of the cleaned data set. Include a table of univariate statistics to summarize each variable. Choose meaningful summary statistics for each type of variable. You should also include a basic summary of the catalog spending (SpendRat) including an appropriate graphical display.
Structure
str(d1a)
## 'data.frame': 184 obs. of 21 variables:
## $ SpendRat : num 16.8 11.4 31.3 1.9 84.1 ...
## $ Age : int 35 46 41 46 46 46 56 48 54 43 ...
## $ LenRes : int 3 9 2 7 15 16 31 8 8 15 ...
## $ Income : int 5 5 2 9 5 4 6 5 5 4 ...
## $ TotAsset : int 195 123 117 493 138 162 117 119 50 135 ...
## $ SecAssets : int 36 24 25 105 27 25 27 23 10 21 ...
## $ ShortLiq : int 220 200 222 310 340 230 300 250 200 230 ...
## $ LongLiq : int 420 420 419 500 450 430 440 430 420 430 ...
## $ WlthIdx : int 430 290 279 520 440 360 400 360 230 340 ...
## $ SpendVol : int 690 600 543 680 440 690 500 610 660 610 ...
## $ SpenVel : int 570 280 308 100 50 180 10 0 0 50 ...
## $ CollGifts : int 0 1 1 0 0 1 1 1 0 1 ...
## $ BricMortar : int 1 0 0 1 1 0 1 0 1 1 ...
## $ MarthaHome : int 1 0 0 1 1 0 1 1 0 0 ...
## $ SunAds : int 0 1 1 0 0 1 0 0 0 1 ...
## $ ThemeColl : int 0 1 1 0 0 0 1 1 0 1 ...
## $ CustDec : int 1 1 0 1 1 0 1 1 0 0 ...
## $ RetailKids : int 1 1 0 0 0 0 1 0 0 0 ...
## $ TeenWr : int 0 0 0 0 0 0 1 0 1 1 ...
## $ Carlovers : int 0 0 0 0 1 0 1 0 0 0 ...
## $ CountryColl: int 0 1 1 0 0 1 0 1 0 1 ...
Summary of Cleaned Dataset
dim(d1a)
## [1] 184 21
pacman::p_load("skimr")
skim(d1a)
Name | d1a |
Number of rows | 184 |
Number of columns | 21 |
_______________________ | |
Column type frequency: | |
numeric | 21 |
________________________ | |
Group variables | None |
Variable type: numeric
skim_variable | n_missing | complete_rate | mean | sd | p0 | p25 | p50 | p75 | p100 | hist |
---|---|---|---|---|---|---|---|---|---|---|
SpendRat | 0 | 1 | 43.79 | 66.10 | 0.08 | 6.08 | 18.8 | 50.27 | 401.42 | ▇▁▁▁▁ |
Age | 0 | 1 | 54.71 | 13.64 | 21.00 | 44.75 | 53.0 | 63.00 | 89.00 | ▁▆▇▃▂ |
LenRes | 0 | 1 | 14.58 | 9.94 | 0.00 | 8.00 | 11.0 | 19.00 | 46.00 | ▇▆▂▂▁ |
Income | 0 | 1 | 4.47 | 1.40 | 1.00 | 4.00 | 5.0 | 5.00 | 9.00 | ▂▇▇▅▁ |
TotAsset | 0 | 1 | 184.67 | 155.01 | 5.00 | 94.75 | 150.0 | 222.50 | 999.00 | ▇▃▁▁▁ |
SecAssets | 0 | 1 | 40.90 | 79.83 | 0.00 | 19.00 | 28.0 | 42.00 | 999.00 | ▇▁▁▁▁ |
ShortLiq | 0 | 1 | 240.64 | 66.92 | 160.00 | 210.00 | 230.0 | 260.00 | 999.00 | ▇▁▁▁▁ |
LongLiq | 0 | 1 | 439.49 | 55.63 | 400.00 | 420.00 | 430.0 | 440.00 | 999.00 | ▇▁▁▁▁ |
WlthIdx | 0 | 1 | 367.12 | 90.04 | 90.00 | 300.00 | 360.0 | 430.00 | 880.00 | ▁▇▅▁▁ |
SpendVol | 0 | 1 | 568.40 | 154.00 | 0.00 | 532.00 | 610.0 | 670.00 | 780.00 | ▁▁▂▇▇ |
SpenVel | 0 | 1 | 219.52 | 217.31 | 0.00 | 40.00 | 160.0 | 310.00 | 999.00 | ▇▅▁▁▁ |
CollGifts | 0 | 1 | 0.49 | 0.50 | 0.00 | 0.00 | 0.0 | 1.00 | 1.00 | ▇▁▁▁▇ |
BricMortar | 0 | 1 | 0.29 | 0.45 | 0.00 | 0.00 | 0.0 | 1.00 | 1.00 | ▇▁▁▁▃ |
MarthaHome | 0 | 1 | 0.36 | 0.48 | 0.00 | 0.00 | 0.0 | 1.00 | 1.00 | ▇▁▁▁▅ |
SunAds | 0 | 1 | 0.43 | 0.50 | 0.00 | 0.00 | 0.0 | 1.00 | 1.00 | ▇▁▁▁▆ |
ThemeColl | 0 | 1 | 0.40 | 0.49 | 0.00 | 0.00 | 0.0 | 1.00 | 1.00 | ▇▁▁▁▅ |
CustDec | 0 | 1 | 0.35 | 0.48 | 0.00 | 0.00 | 0.0 | 1.00 | 1.00 | ▇▁▁▁▅ |
RetailKids | 0 | 1 | 0.35 | 0.48 | 0.00 | 0.00 | 0.0 | 1.00 | 1.00 | ▇▁▁▁▅ |
TeenWr | 0 | 1 | 0.52 | 0.50 | 0.00 | 0.00 | 1.0 | 1.00 | 1.00 | ▇▁▁▁▇ |
Carlovers | 0 | 1 | 0.28 | 0.45 | 0.00 | 0.00 | 0.0 | 1.00 | 1.00 | ▇▁▁▁▃ |
CountryColl | 0 | 1 | 0.42 | 0.49 | 0.00 | 0.00 | 0.0 | 1.00 | 1.00 | ▇▁▁▁▆ |
pacman::p_load(summarytools)
# Univariate descriptive statistics; view() renders them as an HTML report
d1b <- summarytools::descr(d1a)
summarytools::view(d1b)
## Switching method to 'browser'
## Output file written: C:\Users\aliso\AppData\Local\Temp\RtmpQrnDjK\file7de45d766736.html
pacman::p_load(psych)
psych::describe(d1a)
## vars n mean sd median trimmed mad min max range
## SpendRat 1 184 43.79 66.10 18.8 29.17 22.57 0.08 401.42 401.34
## Age 2 184 54.71 13.64 53.0 54.05 13.34 21.00 89.00 68.00
## LenRes 3 184 14.58 9.94 11.0 13.34 5.93 0.00 46.00 46.00
## Income 4 184 4.47 1.40 5.0 4.50 1.48 1.00 9.00 8.00
## TotAsset 5 184 184.67 155.01 150.0 160.48 91.18 5.00 999.00 994.00
## SecAssets 6 184 40.90 79.83 28.0 30.30 16.31 0.00 999.00 999.00
## ShortLiq 7 184 240.64 66.92 230.0 234.53 34.84 160.00 999.00 839.00
## LongLiq 8 184 439.49 55.63 430.0 430.93 14.83 400.00 999.00 599.00
## WlthIdx 9 184 367.12 90.04 360.0 364.30 88.96 90.00 880.00 790.00
## SpendVol 10 184 568.40 154.00 610.0 594.56 88.96 0.00 780.00 780.00
## SpenVel 11 184 219.52 217.31 160.0 186.17 192.74 0.00 999.00 999.00
## CollGifts 12 184 0.49 0.50 0.0 0.49 0.00 0.00 1.00 1.00
## BricMortar 13 184 0.29 0.45 0.0 0.24 0.00 0.00 1.00 1.00
## MarthaHome 14 184 0.36 0.48 0.0 0.33 0.00 0.00 1.00 1.00
## SunAds 15 184 0.43 0.50 0.0 0.41 0.00 0.00 1.00 1.00
## ThemeColl 16 184 0.40 0.49 0.0 0.37 0.00 0.00 1.00 1.00
## CustDec 17 184 0.35 0.48 0.0 0.31 0.00 0.00 1.00 1.00
## RetailKids 18 184 0.35 0.48 0.0 0.32 0.00 0.00 1.00 1.00
## TeenWr 19 184 0.52 0.50 1.0 0.52 0.00 0.00 1.00 1.00
## Carlovers 20 184 0.28 0.45 0.0 0.22 0.00 0.00 1.00 1.00
## CountryColl 21 184 0.42 0.49 0.0 0.40 0.00 0.00 1.00 1.00
## skew kurtosis se
## SpendRat 2.96 9.91 4.87
## Age 0.43 -0.45 1.01
## LenRes 1.11 0.48 0.73
## Income -0.05 0.10 0.10
## TotAsset 3.29 14.19 11.43
## SecAssets 9.84 111.25 5.89
## ShortLiq 7.97 86.90 4.93
## LongLiq 6.81 58.36 4.10
## WlthIdx 0.95 4.74 6.64
## SpendVol -1.74 3.03 11.35
## SpenVel 1.26 1.12 16.02
## CollGifts 0.04 -2.01 0.04
## BricMortar 0.93 -1.14 0.03
## MarthaHome 0.56 -1.70 0.04
## SunAds 0.28 -1.93 0.04
## ThemeColl 0.42 -1.83 0.04
## CustDec 0.63 -1.61 0.04
## RetailKids 0.61 -1.64 0.04
## TeenWr -0.06 -2.01 0.04
## Carlovers 0.99 -1.03 0.03
## CountryColl 0.33 -1.90 0.04
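The skew column helps explain why the mean of SpendRat (43.79) sits well above its median (18.8): a large positive skew means a long right tail. Below is a minimal sketch of a moment-based skewness on toy vectors. The helper `skewness_moment()` is hypothetical, and this is one common definition; `psych::describe()` may apply a slightly different small-sample adjustment, so the numbers need not match its output exactly.

```r
# Moment-based skewness: third central moment divided by the
# second central moment raised to the power 3/2 (hypothetical helper)
skewness_moment <- function(x) {
  x  <- x[!is.na(x)]
  m  <- mean(x)
  m2 <- mean((x - m)^2)
  m3 <- mean((x - m)^3)
  m3 / m2^(3 / 2)
}

right_skewed <- c(1, 2, 2, 3, 3, 4, 50)  # one large value pulls the tail right
symmetric    <- c(1, 2, 3, 4, 5)

skew_right <- skewness_moment(right_skewed)  # positive for a right tail
skew_sym   <- skewness_moment(symmetric)     # zero for symmetric data

print(c(right = skew_right, symmetric = skew_sym))

# With a right tail, the mean is pulled above the median
print(mean(right_skewed) > median(right_skewed))
```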
Summary statistics for each variable
summary(d1a)
## SpendRat Age LenRes Income
## Min. : 0.080 Min. :21.00 Min. : 0.00 Min. :1.000
## 1st Qu.: 6.077 1st Qu.:44.75 1st Qu.: 8.00 1st Qu.:4.000
## Median : 18.805 Median :53.00 Median :11.00 Median :5.000
## Mean : 43.792 Mean :54.71 Mean :14.58 Mean :4.473
## 3rd Qu.: 50.273 3rd Qu.:63.00 3rd Qu.:19.00 3rd Qu.:5.000
## Max. :401.420 Max. :89.00 Max. :46.00 Max. :9.000
## TotAsset SecAssets ShortLiq LongLiq
## Min. : 5.00 Min. : 0.0 Min. :160.0 Min. :400.0
## 1st Qu.: 94.75 1st Qu.: 19.0 1st Qu.:210.0 1st Qu.:420.0
## Median :150.00 Median : 28.0 Median :230.0 Median :430.0
## Mean :184.67 Mean : 40.9 Mean :240.6 Mean :439.5
## 3rd Qu.:222.50 3rd Qu.: 42.0 3rd Qu.:260.0 3rd Qu.:440.0
## Max. :999.00 Max. :999.0 Max. :999.0 Max. :999.0
## WlthIdx SpendVol SpenVel CollGifts
## Min. : 90.0 Min. : 0.0 Min. : 0.0 Min. :0.0000
## 1st Qu.:300.0 1st Qu.:532.0 1st Qu.: 40.0 1st Qu.:0.0000
## Median :360.0 Median :610.0 Median :160.0 Median :0.0000
## Mean :367.1 Mean :568.4 Mean :219.5 Mean :0.4891
## 3rd Qu.:430.0 3rd Qu.:670.0 3rd Qu.:310.0 3rd Qu.:1.0000
## Max. :880.0 Max. :780.0 Max. :999.0 Max. :1.0000
## BricMortar MarthaHome SunAds ThemeColl
## Min. :0.000 Min. :0.0000 Min. :0.0000 Min. :0.0000
## 1st Qu.:0.000 1st Qu.:0.0000 1st Qu.:0.0000 1st Qu.:0.0000
## Median :0.000 Median :0.0000 Median :0.0000 Median :0.0000
## Mean :0.288 Mean :0.3641 Mean :0.4293 Mean :0.3967
## 3rd Qu.:1.000 3rd Qu.:1.0000 3rd Qu.:1.0000 3rd Qu.:1.0000
## Max. :1.000 Max. :1.0000 Max. :1.0000 Max. :1.0000
## CustDec RetailKids TeenWr Carlovers
## Min. :0.0000 Min. :0.0000 Min. :0.0000 Min. :0.0000
## 1st Qu.:0.0000 1st Qu.:0.0000 1st Qu.:0.0000 1st Qu.:0.0000
## Median :0.0000 Median :0.0000 Median :1.0000 Median :0.0000
## Mean :0.3478 Mean :0.3533 Mean :0.5163 Mean :0.2772
## 3rd Qu.:1.0000 3rd Qu.:1.0000 3rd Qu.:1.0000 3rd Qu.:1.0000
## Max. :1.0000 Max. :1.0000 Max. :1.0000 Max. :1.0000
## CountryColl
## Min. :0.0000
## 1st Qu.:0.0000
## Median :0.0000
## Mean :0.4185
## 3rd Qu.:1.0000
## Max. :1.0000
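library(ISLR)
# d1a is a data frame created in this session, not a packaged data set, so no help page exists for it
?d1a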
## No documentation for 'd1a' in specified packages and libraries:
## you could try '??d1a'
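The prompt also asks for a graphical display of SpendRat. Because the summaries above show a strongly right-skewed variable, one reasonable option is a pair of histograms, one on the original scale and one on the log scale. The sketch below uses simulated values as a stand-in (the vector `spend` and the log-normal parameters are assumptions for illustration only); in the report itself `d1a$SpendRat` would take its place.

```r
# Simulated right-skewed values standing in for d1a$SpendRat (hypothetical)
set.seed(1)
spend <- rlnorm(184, meanlog = 3, sdlog = 1.3)

# Numeric summary to accompany the plots
print(summary(spend))

# Side-by-side histograms: original scale and log scale
op <- par(mfrow = c(1, 2))
hist(spend, breaks = 30,
     main = "SpendRat (simulated)", xlab = "SpendRat")
hist(log(spend), breaks = 30,
     main = "log(SpendRat) (simulated)", xlab = "log(SpendRat)")
par(op)  # restore the previous plotting layout
```

The log scale spreads out the many small values that are squeezed into the first bar of the raw histogram, which makes the shape of the bulk of the distribution easier to read.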
Descriptive statistics by catalog purchase category
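# gtsummary provides tbl_summary(), modify_caption(), and as_gt()
pacman::p_load(gtsummary)
# Summarize every variable, split by CollGifts (0 = non-customer, 1 = customer)
tab_outcome <- d1a |>
  tbl_summary(
    by = CollGifts,
    statistic = list(all_continuous() ~ "{mean} ({sd})",
                     all_categorical() ~ "{n} / {N} ({p}%)"),
    digits = all_continuous() ~ 2) |>
  modify_caption("CollGifts Non-customer & Customer Characteristics (N = {N})")
tab_outcome |>
  as_gt()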
Characteristic | 0, N = 94¹ | 1, N = 90¹ |
---|---|---|
SpendRat | 22.91 (36.37) | 65.60 (81.61) |
Age | 55.19 (14.57) | 54.21 (12.67) |
LenRes | 15.02 (10.09) | 14.11 (9.82) |
Income | ||
1 | 1 / 94 (1.1%) | 1 / 90 (1.1%) |
2 | 10 / 94 (11%) | 8 / 90 (8.9%) |
3 | 14 / 94 (15%) | 5 / 90 (5.6%) |
4 | 26 / 94 (28%) | 23 / 90 (26%) |
5 | 33 / 94 (35%) | 26 / 90 (29%) |
6 | 5 / 94 (5.3%) | 20 / 90 (22%) |
7 | 4 / 94 (4.3%) | 6 / 90 (6.7%) |
8 | 0 / 94 (0%) | 1 / 90 (1.1%) |
9 | 1 / 94 (1.1%) | 0 / 90 (0%) |
TotAsset | 175.22 (158.84) | 194.54 (151.16) |
SecAssets | 43.70 (106.85) | 37.98 (34.04) |
ShortLiq | 233.89 (36.67) | 247.68 (87.78) |
LongLiq | 438.63 (65.22) | 440.39 (43.78) |
WlthIdx | 355.32 (84.95) | 379.44 (93.96) |
SpendVol | 547.15 (181.23) | 590.59 (116.07) |
SpenVel | 225.33 (226.27) | 213.46 (208.63) |
BricMortar | 29 / 94 (31%) | 24 / 90 (27%) |
MarthaHome | 25 / 94 (27%) | 42 / 90 (47%) |
SunAds | 22 / 94 (23%) | 57 / 90 (63%) |
ThemeColl | 21 / 94 (22%) | 52 / 90 (58%) |
CustDec | 22 / 94 (23%) | 42 / 90 (47%) |
RetailKids | 21 / 94 (22%) | 44 / 90 (49%) |
TeenWr | 45 / 94 (48%) | 50 / 90 (56%) |
Carlovers | 20 / 94 (21%) | 31 / 90 (34%) |
CountryColl | 13 / 94 (14%) | 64 / 90 (71%) |
1 Mean (SD); n / N (%) |
tab_outcome <- d1a |>
tbl_summary(
by = BricMortar,
statistic = list(all_continuous() ~ "{mean} ({sd})",
all_categorical() ~ "{n} / {N} ({p}%)"),
digits = all_continuous() ~ 2) %>%
modify_caption("BricMortar Non-customer & Customer Characteristics (N = {N})")
tab_outcome |>
as_gt()
Characteristic | 0, N = 131¹ | 1, N = 53¹ |
---|---|---|
SpendRat | 31.57 (50.78) | 74.00 (87.31) |
Age | 55.06 (13.63) | 53.85 (13.77) |
LenRes | 14.62 (9.91) | 14.47 (10.13) |
Income | ||
1 | 2 / 131 (1.5%) | 0 / 53 (0%) |
2 | 16 / 131 (12%) | 2 / 53 (3.8%) |
3 | 13 / 131 (9.9%) | 6 / 53 (11%) |
4 | 36 / 131 (27%) | 13 / 53 (25%) |
5 | 42 / 131 (32%) | 17 / 53 (32%) |
6 | 14 / 131 (11%) | 11 / 53 (21%) |
7 | 7 / 131 (5.3%) | 3 / 53 (5.7%) |
8 | 1 / 131 (0.8%) | 0 / 53 (0%) |
9 | 0 / 131 (0%) | 1 / 53 (1.9%) |
TotAsset | 187.14 (170.17) | 178.58 (110.07) |
SecAssets | 43.87 (93.51) | 33.57 (22.15) |
ShortLiq | 239.26 (74.95) | 244.04 (41.24) |
LongLiq | 441.19 (64.43) | 435.28 (22.08) |
WlthIdx | 364.81 (92.78) | 372.83 (83.46) |
SpendVol | 567.05 (158.37) | 571.74 (144.05) |
SpenVel | 230.47 (225.59) | 192.47 (194.71) |
CollGifts | 66 / 131 (50%) | 24 / 53 (45%) |
MarthaHome | 36 / 131 (27%) | 31 / 53 (58%) |
SunAds | 64 / 131 (49%) | 15 / 53 (28%) |
ThemeColl | 58 / 131 (44%) | 15 / 53 (28%) |
CustDec | 41 / 131 (31%) | 23 / 53 (43%) |
RetailKids | 45 / 131 (34%) | 20 / 53 (38%) |
TeenWr | 56 / 131 (43%) | 39 / 53 (74%) |
Carlovers | 36 / 131 (27%) | 15 / 53 (28%) |
CountryColl | 55 / 131 (42%) | 22 / 53 (42%) |
1 Mean (SD); n / N (%) |
tab_outcome <- d1a |>
tbl_summary(
by = MarthaHome,
statistic = list(all_continuous() ~ "{mean} ({sd})",
all_categorical() ~ "{n} / {N} ({p}%)"),
digits = all_continuous() ~ 2) %>%
modify_caption("MarthaHome Non-customer & Customer Characteristics (N = {N})")
tab_outcome |>
as_gt()
Characteristic | 0, N = 117¹ | 1, N = 67¹ |
---|---|---|
SpendRat | 27.13 (44.11) | 72.89 (85.69) |
Age | 54.48 (14.64) | 55.12 (11.80) |
LenRes | 14.79 (10.14) | 14.19 (9.65) |
Income | ||
1 | 2 / 117 (1.7%) | 0 / 67 (0%) |
2 | 14 / 117 (12%) | 4 / 67 (6.0%) |
3 | 15 / 117 (13%) | 4 / 67 (6.0%) |
4 | 33 / 117 (28%) | 16 / 67 (24%) |
5 | 36 / 117 (31%) | 23 / 67 (34%) |
6 | 12 / 117 (10%) | 13 / 67 (19%) |
7 | 4 / 117 (3.4%) | 6 / 67 (9.0%) |
8 | 1 / 117 (0.9%) | 0 / 67 (0%) |
9 | 0 / 117 (0%) | 1 / 67 (1.5%) |
TotAsset | 173.75 (156.64) | 203.75 (151.40) |
SecAssets | 41.29 (97.44) | 40.22 (31.34) |
ShortLiq | 235.98 (78.01) | 248.76 (40.21) |
LongLiq | 437.77 (63.41) | 442.49 (38.72) |
WlthIdx | 355.22 (93.29) | 387.90 (80.58) |
SpendVol | 565.49 (161.73) | 573.48 (140.53) |
SpenVel | 232.07 (230.79) | 197.61 (191.20) |
CollGifts | 48 / 117 (41%) | 42 / 67 (63%) |
BricMortar | 22 / 117 (19%) | 31 / 67 (46%) |
SunAds | 50 / 117 (43%) | 29 / 67 (43%) |
ThemeColl | 46 / 117 (39%) | 27 / 67 (40%) |
CustDec | 24 / 117 (21%) | 40 / 67 (60%) |
RetailKids | 36 / 117 (31%) | 29 / 67 (43%) |
TeenWr | 56 / 117 (48%) | 39 / 67 (58%) |
Carlovers | 31 / 117 (26%) | 20 / 67 (30%) |
CountryColl | 47 / 117 (40%) | 30 / 67 (45%) |
1 Mean (SD); n / N (%) |
tab_outcome <- d1a |>
tbl_summary(
by = SunAds,
statistic = list(all_continuous() ~ "{mean} ({sd})",
all_categorical() ~ "{n} / {N} ({p}%)"),
digits = all_continuous() ~ 2) %>%
modify_caption("SunAds Non-customer & Customer Characteristics (N = {N})")
tab_outcome |>
as_gt()
Characteristic | 0, N = 105¹ | 1, N = 79¹ |
---|---|---|
SpendRat | 35.18 (55.44) | 55.24 (76.94) |
Age | 53.56 (13.80) | 56.24 (13.37) |
LenRes | 14.45 (9.97) | 14.75 (9.97) |
Income | ||
1 | 1 / 105 (1.0%) | 1 / 79 (1.3%) |
2 | 10 / 105 (9.5%) | 8 / 79 (10%) |
3 | 15 / 105 (14%) | 4 / 79 (5.1%) |
4 | 25 / 105 (24%) | 24 / 79 (30%) |
5 | 33 / 105 (31%) | 26 / 79 (33%) |
6 | 13 / 105 (12%) | 12 / 79 (15%) |
7 | 7 / 105 (6.7%) | 3 / 79 (3.8%) |
8 | 0 / 105 (0%) | 1 / 79 (1.3%) |
9 | 1 / 105 (1.0%) | 0 / 79 (0%) |
TotAsset | 187.06 (151.81) | 181.51 (160.10) |
SecAssets | 36.74 (40.52) | 46.43 (112.74) |
ShortLiq | 234.90 (37.27) | 248.27 (92.48) |
LongLiq | 436.44 (36.31) | 443.54 (73.98) |
WlthIdx | 362.28 (83.01) | 373.56 (98.79) |
SpendVol | 574.23 (165.58) | 560.65 (137.78) |
SpenVel | 225.30 (222.36) | 211.85 (211.57) |
CollGifts | 33 / 105 (31%) | 57 / 79 (72%) |
BricMortar | 38 / 105 (36%) | 15 / 79 (19%) |
MarthaHome | 38 / 105 (36%) | 29 / 79 (37%) |
ThemeColl | 24 / 105 (23%) | 49 / 79 (62%) |
CustDec | 24 / 105 (23%) | 40 / 79 (51%) |
RetailKids | 22 / 105 (21%) | 43 / 79 (54%) |
TeenWr | 59 / 105 (56%) | 36 / 79 (46%) |
Carlovers | 23 / 105 (22%) | 28 / 79 (35%) |
CountryColl | 17 / 105 (16%) | 60 / 79 (76%) |
1 Mean (SD); n / N (%) |
tab_outcome <- d1a |>
tbl_summary(
by = ThemeColl,
statistic = list(all_continuous() ~ "{mean} ({sd})",
all_categorical() ~ "{n} / {N} ({p}%)"),
digits = all_continuous() ~ 2) %>%
modify_caption("ThemeColl Non-customer & Customer Characteristics (N = {N})")
tab_outcome |>
as_gt()
Characteristic | 0, N = 111¹ | 1, N = 73¹ |
---|---|---|
SpendRat | 31.52 (43.96) | 62.46 (86.98) |
Age | 54.97 (14.55) | 54.32 (12.22) |
LenRes | 14.92 (10.38) | 14.05 (9.29) |
Income | ||
1 | 2 / 111 (1.8%) | 0 / 73 (0%) |
2 | 11 / 111 (9.9%) | 7 / 73 (9.6%) |
3 | 14 / 111 (13%) | 5 / 73 (6.8%) |
4 | 30 / 111 (27%) | 19 / 73 (26%) |
5 | 31 / 111 (28%) | 28 / 73 (38%) |
6 | 15 / 111 (14%) | 10 / 73 (14%) |
7 | 7 / 111 (6.3%) | 3 / 73 (4.1%) |
8 | 0 / 111 (0%) | 1 / 73 (1.4%) |
9 | 1 / 111 (0.9%) | 0 / 73 (0%) |
TotAsset | 170.69 (126.69) | 205.93 (189.19) |
SecAssets | 34.08 (36.08) | 51.27 (118.45) |
ShortLiq | 234.49 (36.12) | 249.99 (96.14) |
LongLiq | 433.50 (27.66) | 448.59 (80.98) |
WlthIdx | 356.50 (81.11) | 383.27 (100.58) |
SpendVol | 563.67 (165.52) | 575.59 (135.42) |
SpenVel | 218.80 (212.93) | 220.62 (225.29) |
CollGifts | 38 / 111 (34%) | 52 / 73 (71%) |
BricMortar | 38 / 111 (34%) | 15 / 73 (21%) |
MarthaHome | 40 / 111 (36%) | 27 / 73 (37%) |
SunAds | 30 / 111 (27%) | 49 / 73 (67%) |
CustDec | 34 / 111 (31%) | 30 / 73 (41%) |
RetailKids | 34 / 111 (31%) | 31 / 73 (42%) |
TeenWr | 53 / 111 (48%) | 42 / 73 (58%) |
Carlovers | 29 / 111 (26%) | 22 / 73 (30%) |
CountryColl | 27 / 111 (24%) | 50 / 73 (68%) |
1 Mean (SD); n / N (%) |
tab_outcome <- d1a |>
tbl_summary(
by = CustDec,
statistic = list(all_continuous() ~ "{mean} ({sd})",
all_categorical() ~ "{n} / {N} ({p}%)"),
digits = all_continuous() ~ 2) %>%
modify_caption("CustDec Non-customer & Customer Characteristics (N = {N})")
tab_outcome |>
as_gt()
Characteristic | 0, N = 120¹ | 1, N = 64¹ |
---|---|---|
SpendRat | 33.72 (54.75) | 62.67 (80.46) |
Age | 55.42 (14.63) | 53.39 (11.56) |
LenRes | 14.52 (9.94) | 14.69 (10.04) |
Income | ||
1 | 2 / 120 (1.7%) | 0 / 64 (0%) |
2 | 14 / 120 (12%) | 4 / 64 (6.3%) |
3 | 18 / 120 (15%) | 1 / 64 (1.6%) |
4 | 34 / 120 (28%) | 15 / 64 (23%) |
5 | 31 / 120 (26%) | 28 / 64 (44%) |
6 | 14 / 120 (12%) | 11 / 64 (17%) |
7 | 6 / 120 (5.0%) | 4 / 64 (6.3%) |
8 | 1 / 120 (0.8%) | 0 / 64 (0%) |
9 | 0 / 120 (0%) | 1 / 64 (1.6%) |
TotAsset | 175.60 (157.85) | 201.69 (149.28) |
SecAssets | 35.03 (42.16) | 51.91 (122.34) |
ShortLiq | 236.78 (78.51) | 247.88 (35.86) |
LongLiq | 434.73 (42.18) | 448.42 (74.19) |
WlthIdx | 356.39 (96.25) | 387.23 (73.62) |
SpendVol | 554.78 (173.19) | 593.92 (105.92) |
SpenVel | 238.03 (236.28) | 184.81 (172.75) |
CollGifts | 48 / 120 (40%) | 42 / 64 (66%) |
BricMortar | 30 / 120 (25%) | 23 / 64 (36%) |
MarthaHome | 27 / 120 (23%) | 40 / 64 (63%) |
SunAds | 39 / 120 (33%) | 40 / 64 (63%) |
ThemeColl | 43 / 120 (36%) | 30 / 64 (47%) |
RetailKids | 25 / 120 (21%) | 40 / 64 (63%) |
TeenWr | 64 / 120 (53%) | 31 / 64 (48%) |
Carlovers | 31 / 120 (26%) | 20 / 64 (31%) |
CountryColl | 45 / 120 (38%) | 32 / 64 (50%) |
1 Mean (SD); n / N (%) |
tab_outcome <- d1a |>
tbl_summary(
by = RetailKids,
statistic = list(all_continuous() ~ "{mean} ({sd})",
all_categorical() ~ "{n} / {N} ({p}%)"),
digits = all_continuous() ~ 2) %>%
modify_caption("RetailKids Non-customer & Customer Characteristics (N = {N})")
tab_outcome |>
as_gt()
Characteristic | 0, N = 119¹ | 1, N = 65¹ |
---|---|---|
SpendRat | 38.25 (60.70) | 53.94 (74.41) |
Age | 56.48 (14.03) | 51.48 (12.37) |
LenRes | 14.19 (9.60) | 15.28 (10.59) |
Income | ||
1 | 2 / 119 (1.7%) | 0 / 65 (0%) |
2 | 14 / 119 (12%) | 4 / 65 (6.2%) |
3 | 16 / 119 (13%) | 3 / 65 (4.6%) |
4 | 33 / 119 (28%) | 16 / 65 (25%) |
5 | 36 / 119 (30%) | 23 / 65 (35%) |
6 | 12 / 119 (10%) | 13 / 65 (20%) |
7 | 5 / 119 (4.2%) | 5 / 65 (7.7%) |
8 | 0 / 119 (0%) | 1 / 65 (1.5%) |
9 | 1 / 119 (0.8%) | 0 / 65 (0%) |
TotAsset | 170.97 (124.97) | 209.77 (197.29) |
SecAssets | 35.13 (35.39) | 51.46 (125.46) |
ShortLiq | 236.72 (38.02) | 247.80 (100.31) |
LongLiq | 434.95 (27.51) | 447.80 (85.70) |
WlthIdx | 358.50 (84.95) | 382.91 (97.38) |
SpendVol | 554.24 (172.89) | 594.31 (107.94) |
SpenVel | 208.21 (221.50) | 240.23 (209.51) |
CollGifts | 46 / 119 (39%) | 44 / 65 (68%) |
BricMortar | 33 / 119 (28%) | 20 / 65 (31%) |
MarthaHome | 38 / 119 (32%) | 29 / 65 (45%) |
SunAds | 36 / 119 (30%) | 43 / 65 (66%) |
ThemeColl | 42 / 119 (35%) | 31 / 65 (48%) |
CustDec | 24 / 119 (20%) | 40 / 65 (62%) |
TeenWr | 57 / 119 (48%) | 38 / 65 (58%) |
Carlovers | 29 / 119 (24%) | 22 / 65 (34%) |
CountryColl | 44 / 119 (37%) | 33 / 65 (51%) |
1 Mean (SD); n / N (%) |
tab_outcome <- d1a |>
tbl_summary(
by = TeenWr,
statistic = list(all_continuous() ~ "{mean} ({sd})",
all_categorical() ~ "{n} / {N} ({p}%)"),
digits = all_continuous() ~ 2) %>%
modify_caption("TeenWr Non-customer & Customer Characteristics (N = {N})")
tab_outcome |>
as_gt()
Characteristic | 0, N = 89¹ | 1, N = 95¹ |
---|---|---|
SpendRat | 30.11 (49.27) | 56.61 (76.75) |
Age | 57.35 (14.28) | 52.24 (12.59) |
LenRes | 14.54 (10.42) | 14.61 (9.53) |
Income | ||
1 | 2 / 89 (2.2%) | 0 / 95 (0%) |
2 | 12 / 89 (13%) | 6 / 95 (6.3%) |
3 | 12 / 89 (13%) | 7 / 95 (7.4%) |
4 | 22 / 89 (25%) | 27 / 95 (28%) |
5 | 25 / 89 (28%) | 34 / 95 (36%) |
6 | 8 / 89 (9.0%) | 17 / 95 (18%) |
7 | 6 / 89 (6.7%) | 4 / 95 (4.2%) |
8 | 1 / 89 (1.1%) | 0 / 95 (0%) |
9 | 1 / 89 (1.1%) | 0 / 95 (0%) |
TotAsset | 175.53 (156.80) | 193.24 (153.65) |
SecAssets | 45.22 (109.25) | 36.85 (34.64) |
ShortLiq | 232.90 (34.70) | 247.88 (86.50) |
LongLiq | 439.27 (66.35) | 439.69 (43.62) |
WlthIdx | 357.13 (85.60) | 376.47 (93.49) |
SpendVol | 538.44 (184.85) | 596.46 (111.98) |
SpenVel | 229.67 (222.20) | 210.01 (213.36) |
CollGifts | 40 / 89 (45%) | 50 / 95 (53%) |
BricMortar | 14 / 89 (16%) | 39 / 95 (41%) |
MarthaHome | 28 / 89 (31%) | 39 / 95 (41%) |
SunAds | 43 / 89 (48%) | 36 / 95 (38%) |
ThemeColl | 31 / 89 (35%) | 42 / 95 (44%) |
CustDec | 33 / 89 (37%) | 31 / 95 (33%) |
RetailKids | 27 / 89 (30%) | 38 / 95 (40%) |
Carlovers | 22 / 89 (25%) | 29 / 95 (31%) |
CountryColl | 37 / 89 (42%) | 40 / 95 (42%) |
1 Mean (SD); n / N (%) |
tab_outcome <- d1a |>
tbl_summary(
by = Carlovers,
statistic = list(all_continuous() ~ "{mean} ({sd})",
all_categorical() ~ "{n} / {N} ({p}%)"),
digits = all_continuous() ~ 2) %>%
modify_caption("Carlovers Non-customer & Customer Characteristics (N = {N})")
tab_outcome |>
as_gt()
Characteristic | 0, N = 133¹ | 1, N = 51¹ |
---|---|---|
SpendRat | 39.54 (66.68) | 54.87 (63.86) |
Age | 55.36 (13.91) | 53.02 (12.90) |
LenRes | 14.43 (10.17) | 14.96 (9.41) |
Income | ||
1 | 2 / 133 (1.5%) | 0 / 51 (0%) |
2 | 15 / 133 (11%) | 3 / 51 (5.9%) |
3 | 15 / 133 (11%) | 4 / 51 (7.8%) |
4 | 37 / 133 (28%) | 12 / 51 (24%) |
5 | 44 / 133 (33%) | 15 / 51 (29%) |
6 | 11 / 133 (8.3%) | 14 / 51 (27%) |
7 | 7 / 133 (5.3%) | 3 / 51 (5.9%) |
8 | 1 / 133 (0.8%) | 0 / 51 (0%) |
9 | 1 / 133 (0.8%) | 0 / 51 (0%) |
TotAsset | 180.78 (143.87) | 194.82 (182.08) |
SecAssets | 41.07 (88.21) | 40.47 (52.78) |
ShortLiq | 240.00 (75.52) | 242.29 (36.49) |
LongLiq | 439.37 (59.55) | 439.80 (44.32) |
WlthIdx | 364.16 (93.12) | 374.84 (81.83) |
SpendVol | 562.11 (165.32) | 584.80 (119.45) |
SpenVel | 229.03 (219.12) | 194.73 (212.62) |
CollGifts | 59 / 133 (44%) | 31 / 51 (61%) |
BricMortar | 38 / 133 (29%) | 15 / 51 (29%) |
MarthaHome | 47 / 133 (35%) | 20 / 51 (39%) |
SunAds | 51 / 133 (38%) | 28 / 51 (55%) |
ThemeColl | 51 / 133 (38%) | 22 / 51 (43%) |
CustDec | 44 / 133 (33%) | 20 / 51 (39%) |
RetailKids | 43 / 133 (32%) | 22 / 51 (43%) |
TeenWr | 66 / 133 (50%) | 29 / 51 (57%) |
CountryColl | 54 / 133 (41%) | 23 / 51 (45%) |
1 Mean (SD); n / N (%) |
tab_outcome <- d1a |>
tbl_summary(
by = CountryColl,
statistic = list(all_continuous() ~ "{mean} ({sd})",
all_categorical() ~ "{n} / {N} ({p}%)"),
digits = all_continuous() ~ 2) %>%
modify_caption("CountryColl Non-customer & Customer Characteristics (N = {N})")
tab_outcome |>
as_gt()
Characteristic | 0, N = 107¹ | 1, N = 77¹ |
---|---|---|
SpendRat | 29.19 (40.13) | 64.09 (86.93) |
Age | 53.71 (13.43) | 56.10 (13.90) |
LenRes | 14.37 (9.95) | 14.86 (10.00) |
Income | ||
1 | 1 / 107 (0.9%) | 1 / 77 (1.3%) |
2 | 11 / 107 (10%) | 7 / 77 (9.1%) |
3 | 11 / 107 (10%) | 8 / 77 (10%) |
4 | 24 / 107 (22%) | 25 / 77 (32%) |
5 | 37 / 107 (35%) | 22 / 77 (29%) |
6 | 13 / 107 (12%) | 12 / 77 (16%) |
7 | 8 / 107 (7.5%) | 2 / 77 (2.6%) |
8 | 1 / 107 (0.9%) | 0 / 77 (0%) |
9 | 1 / 107 (0.9%) | 0 / 77 (0%) |
TotAsset | 188.04 (154.14) | 180.00 (157.11) |
SecAssets | 44.06 (100.33) | 36.52 (35.70) |
ShortLiq | 236.06 (37.31) | 247.00 (93.66) |
LongLiq | 439.37 (61.94) | 439.65 (45.83) |
WlthIdx | 364.57 (83.50) | 370.66 (98.87) |
SpendVol | 572.31 (164.94) | 562.96 (138.25) |
SpenVel | 225.47 (215.72) | 211.26 (220.65) |
CollGifts | 26 / 107 (24%) | 64 / 77 (83%) |
BricMortar | 31 / 107 (29%) | 22 / 77 (29%) |
MarthaHome | 37 / 107 (35%) | 30 / 77 (39%) |
SunAds | 19 / 107 (18%) | 60 / 77 (78%) |
ThemeColl | 23 / 107 (21%) | 50 / 77 (65%) |
CustDec | 32 / 107 (30%) | 32 / 77 (42%) |
RetailKids | 32 / 107 (30%) | 33 / 77 (43%) |
TeenWr | 55 / 107 (51%) | 40 / 77 (52%) |
Carlovers | 28 / 107 (26%) | 23 / 77 (30%) |
1 Mean (SD); n / N (%) |
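The ten tables above repeat the same summary with a different grouping variable each time. A minimal base R sketch of that pattern as a loop is shown here. `toy`, `mean_sd`, and `summarise_by` are hypothetical names, and the toy values are made up rather than taken from `d1a`.

```r
# Toy stand-in for d1a (made-up values; the real data are not reproduced here)
toy <- data.frame(
  SpendRat   = c(10, 20, 30, 40, 50, 60),
  Age        = c(50, 55, 60, 45, 65, 70),
  CollGifts  = c(0, 0, 1, 1, 0, 1),
  BricMortar = c(0, 1, 0, 1, 1, 1)
)

# Format "mean (sd)" with two decimals, like the continuous rows in the tables
mean_sd <- function(x) sprintf("%.2f (%.2f)", mean(x), sd(x))

# Summarise each continuous column within the levels of one 0/1 column
summarise_by <- function(data, group, vars) {
  sapply(vars, function(v) tapply(data[[v]], data[[group]], mean_sd))
}

# Loop over the grouping columns instead of copying the chunk
groups <- c("CollGifts", "BricMortar")
tables <- lapply(groups, summarise_by, data = toy, vars = c("SpendRat", "Age"))
names(tables) <- groups
print(tables$CollGifts)
```

The same loop could wrap the tbl_summary() call used above, with the grouping column passed in as a string.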
Summary and descriptive statistics of the spending ratio (SpendRat):
psych::describe(d1a$SpendRat)
## vars n mean sd median trimmed mad min max range skew kurtosis
## X1 1 184 43.79 66.1 18.8 29.17 22.57 0.08 401.42 401.34 2.96 9.91
## se
## X1 4.87
summary(d1a$SpendRat)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.080 6.077 18.805 43.792 50.273 401.420
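The extra columns in the psych::describe() output can be reproduced with base R, which makes them easier to read. The sketch below uses a made-up vector `x` (not the SpendRat data) and assumes the defaults of a 10% trimmed mean and the scaled MAD from stats::mad().

```r
# Made-up right-skewed vector: nine small values and one large outlier
x <- c(1:9, 100)

stats <- c(
  mean    = mean(x),                  # pulled upward by the outlier
  median  = median(x),                # robust to the outlier
  trimmed = mean(x, trim = 0.1),      # drops 10% of values from each tail
  mad     = mad(x),                   # median absolute deviation, scaled by 1.4826
  se      = sd(x) / sqrt(length(x))   # standard error of the mean
)
print(round(stats, 2))
```

For SpendRat the same standard-error formula gives 66.1 / sqrt(184) ≈ 4.87, which matches the se column. The mean (43.79) sits well above the median (18.8) and the trimmed mean (29.17), which is the usual sign of a right-skewed variable.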
Histogram of the spending ratio:
hist(d1a$SpendRat)
Change the default bin width to show the spread of the spending ratio more clearly:
d1a |>
ggplot(aes(x=(SpendRat))) + geom_histogram(binwidth = 5, fill="blue", color = "white") + ylab("Frequency") + xlab("Spending Ratio") + ggtitle("Spending Ratio Distribution")
Take the log of the spending ratio to make the right-skewed distribution closer to symmetric and show its pattern more clearly:
d1a |>
ggplot(aes(x=log(SpendRat))) + geom_histogram(binwidth = 0.5, fill="blue", color = "white") + ylab("Frequency") + xlab("log of Spending Ratio") + ggtitle("Spending Ratio Distribution")
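The effect of the log transform can also be checked numerically. The sketch below uses simulated lognormal data as a stand-in for SpendRat (the parameters are arbitrary assumptions) and a simple moment-based `skewness` helper, which may differ slightly from the skew value psych::describe() reports.

```r
# Moment-based sample skewness: third central moment over variance^(3/2)
skewness <- function(v) {
  m <- mean(v)
  mean((v - m)^3) / mean((v - m)^2)^1.5
}

# Simulated right-skewed data (hypothetical stand-in, same n as d1a)
set.seed(1)
x <- rlnorm(184, meanlog = 3, sdlog = 1.5)

skew_raw <- skewness(x)       # strongly positive for right-skewed data
skew_log <- skewness(log(x))  # much closer to zero after the log

print(c(raw = skew_raw, log = skew_log))
```

The log is only defined for positive values. That holds for SpendRat, whose minimum is 0.08.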
An additional histogram with the spending ratio divided by 10:
d1a |>
ggplot(aes(x=(SpendRat/10))) + geom_histogram(binwidth = 1, fill="blue", color = "white") + ylab("Frequency") + xlab("Spending Ratio / 10") + ggtitle("Spending Ratio Distribution")
d1a |>
ggplot(aes(x = Age, y = SpendRat)) + geom_col(fill = "blue") + ylab("Spending Ratio") + xlab("Age") + ggtitle("Spending Ratio vs Age")
EDA summary report generated with DataExplorer:
create_report(d1a)
## Output created: report.html
Graphical display of each variable (a histogram of each variable):
plot_histogram(d1a)
1. Produce a scatterplot matrix which includes all of the variables in
the data set.
Scatter plots of SpendRat against each variable, followed by the full scatterplot matrix:
plot(SpendRat ~ ., pch = 19, col="blue", data = d1a)
pairs(d1a, col = "orange", pch=19)
library(GGally)
## Registered S3 method overwritten by 'GGally':
## method from
## +.gg ggplot2
Scatter_Matrix <-ggpairs(d1a)
ggsave("Scatter plot matrix1.png", Scatter_Matrix, width = 15,
height = 15, units = "in")
Scatter_Matrix
2. Use the lm() function to perform a multiple linear regression
with SpendRat as the response and all other variables as the
predictors. Use the summary() function to print the results.
Comment on the output. For instance:
Look for correlations:
pacman::p_load("Hmisc")
rcorr_matrix <- rcorr(as.matrix(d1a))
print(rcorr_matrix)
## SpendRat Age LenRes Income TotAsset SecAssets ShortLiq LongLiq
## SpendRat 1.00 0.08 0.11 0.09 0.02 -0.01 0.11 0.03
## Age 0.08 1.00 0.35 -0.13 -0.05 0.04 0.01 0.11
## LenRes 0.11 0.35 1.00 -0.16 -0.06 -0.02 -0.01 0.04
## Income 0.09 -0.13 -0.16 1.00 0.23 0.10 0.10 0.13
## TotAsset 0.02 -0.05 -0.06 0.23 1.00 0.74 0.46 0.82
## SecAssets -0.01 0.04 -0.02 0.10 0.74 1.00 0.18 0.93
## ShortLiq 0.11 0.01 -0.01 0.10 0.46 0.18 1.00 0.43
## LongLiq 0.03 0.11 0.04 0.13 0.82 0.93 0.43 1.00
## WlthIdx 0.09 -0.17 -0.08 0.23 0.79 0.45 0.66 0.56
## SpendVol 0.00 -0.49 -0.32 0.29 0.42 0.16 0.02 0.11
## SpenVel -0.03 -0.19 -0.13 0.07 0.10 0.11 -0.29 -0.09
## CollGifts 0.32 -0.04 -0.05 0.17 0.06 -0.04 0.10 0.02
## BricMortar 0.29 -0.04 -0.01 0.15 -0.03 -0.06 0.03 -0.05
## MarthaHome 0.33 0.02 -0.03 0.23 0.09 -0.01 0.09 0.04
## SunAds 0.15 0.10 0.01 0.02 -0.02 0.06 0.10 0.06
## ThemeColl 0.23 -0.02 -0.04 0.06 0.11 0.11 0.11 0.13
## CustDec 0.21 -0.07 0.01 0.23 0.08 0.10 0.08 0.12
## RetailKids 0.11 -0.18 0.05 0.21 0.12 0.10 0.08 0.11
## TeenWr 0.20 -0.19 0.00 0.13 0.06 -0.05 0.11 0.00
## Carlovers 0.10 -0.08 0.02 0.16 0.04 0.00 0.02 0.00
## CountryColl 0.26 0.09 0.02 -0.07 -0.03 -0.05 0.08 0.00
## WlthIdx SpendVol SpenVel CollGifts BricMortar MarthaHome SunAds
## SpendRat 0.09 0.00 -0.03 0.32 0.29 0.33 0.15
## Age -0.17 -0.49 -0.19 -0.04 -0.04 0.02 0.10
## LenRes -0.08 -0.32 -0.13 -0.05 -0.01 -0.03 0.01
## Income 0.23 0.29 0.07 0.17 0.15 0.23 0.02
## TotAsset 0.79 0.42 0.10 0.06 -0.03 0.09 -0.02
## SecAssets 0.45 0.16 0.11 -0.04 -0.06 -0.01 0.06
## ShortLiq 0.66 0.02 -0.29 0.10 0.03 0.09 0.10
## LongLiq 0.56 0.11 -0.09 0.02 -0.05 0.04 0.06
## WlthIdx 1.00 0.47 0.06 0.13 0.04 0.18 0.06
## SpendVol 0.47 1.00 0.34 0.14 0.01 0.03 -0.04
## SpenVel 0.06 0.34 1.00 -0.03 -0.08 -0.08 -0.03
## CollGifts 0.13 0.14 -0.03 1.00 -0.05 0.21 0.40
## BricMortar 0.04 0.01 -0.08 -0.05 1.00 0.29 -0.19
## MarthaHome 0.18 0.03 -0.08 0.21 0.29 1.00 0.01
## SunAds 0.06 -0.04 -0.03 0.40 -0.19 0.01 1.00
## ThemeColl 0.15 0.04 0.00 0.36 -0.15 0.01 0.40
## CustDec 0.16 0.12 -0.12 0.24 0.12 0.40 0.29
## RetailKids 0.13 0.12 0.07 0.28 0.03 0.13 0.35
## TeenWr 0.11 0.19 -0.05 0.08 0.28 0.10 -0.11
## Carlovers 0.05 0.07 -0.07 0.15 0.01 0.04 0.15
## CountryColl 0.03 -0.03 -0.03 0.58 0.00 0.04 0.60
## ThemeColl CustDec RetailKids TeenWr Carlovers CountryColl
## SpendRat 0.23 0.21 0.11 0.20 0.10 0.26
## Age -0.02 -0.07 -0.18 -0.19 -0.08 0.09
## LenRes -0.04 0.01 0.05 0.00 0.02 0.02
## Income 0.06 0.23 0.21 0.13 0.16 -0.07
## TotAsset 0.11 0.08 0.12 0.06 0.04 -0.03
## SecAssets 0.11 0.10 0.10 -0.05 0.00 -0.05
## ShortLiq 0.11 0.08 0.08 0.11 0.02 0.08
## LongLiq 0.13 0.12 0.11 0.00 0.00 0.00
## WlthIdx 0.15 0.16 0.13 0.11 0.05 0.03
## SpendVol 0.04 0.12 0.12 0.19 0.07 -0.03
## SpenVel 0.00 -0.12 0.07 -0.05 -0.07 -0.03
## CollGifts 0.36 0.24 0.28 0.08 0.15 0.58
## BricMortar -0.15 0.12 0.03 0.28 0.01 0.00
## MarthaHome 0.01 0.40 0.13 0.10 0.04 0.04
## SunAds 0.40 0.29 0.35 -0.11 0.15 0.60
## ThemeColl 1.00 0.11 0.12 0.10 0.04 0.44
## CustDec 0.11 1.00 0.42 -0.05 0.06 0.12
## RetailKids 0.12 0.42 1.00 0.10 0.10 0.13
## TeenWr 0.10 -0.05 0.10 1.00 0.06 0.01
## Carlovers 0.04 0.06 0.10 0.06 1.00 0.04
## CountryColl 0.44 0.12 0.13 0.01 0.04 1.00
##
## n= 184
##
##
## P
## SpendRat Age LenRes Income TotAsset SecAssets ShortLiq LongLiq
## SpendRat 0.3115 0.1541 0.2037 0.7831 0.9383 0.1534 0.6763
## Age 0.3115 0.0000 0.0819 0.4715 0.5607 0.8711 0.1313
## LenRes 0.1541 0.0000 0.0288 0.4383 0.7993 0.8717 0.5658
## Income 0.2037 0.0819 0.0288 0.0019 0.1738 0.1936 0.0880
## TotAsset 0.7831 0.4715 0.4383 0.0019 0.0000 0.0000 0.0000
## SecAssets 0.9383 0.5607 0.7993 0.1738 0.0000 0.0175 0.0000
## ShortLiq 0.1534 0.8711 0.8717 0.1936 0.0000 0.0175 0.0000
## LongLiq 0.6763 0.1313 0.5658 0.0880 0.0000 0.0000 0.0000
## WlthIdx 0.1996 0.0207 0.3038 0.0016 0.0000 0.0000 0.0000 0.0000
## SpendVol 0.9976 0.0000 0.0000 0.0000 0.0000 0.0353 0.7548 0.1297
## SpenVel 0.6821 0.0098 0.0732 0.3638 0.1834 0.1343 0.0000 0.2155
## CollGifts 0.0000 0.6274 0.5363 0.0233 0.3995 0.6281 0.1631 0.8307
## BricMortar 0.0000 0.5867 0.9281 0.0364 0.7357 0.4294 0.6622 0.5156
## MarthaHome 0.0000 0.7601 0.6944 0.0017 0.2075 0.9308 0.2136 0.5808
## SunAds 0.0413 0.1882 0.8405 0.7789 0.8107 0.4167 0.1805 0.3925
## ThemeColl 0.0017 0.7499 0.5656 0.4215 0.1318 0.1535 0.1246 0.0718
## CustDec 0.0044 0.3387 0.9120 0.0019 0.2781 0.1728 0.2851 0.1119
## RetailKids 0.1240 0.0170 0.4813 0.0035 0.1048 0.1856 0.2844 0.1346
## TeenWr 0.0062 0.0108 0.9614 0.0898 0.4401 0.4787 0.1294 0.9589
## Carlovers 0.1597 0.2987 0.7462 0.0348 0.5837 0.9639 0.8358 0.9622
## CountryColl 0.0003 0.2414 0.7460 0.3160 0.7297 0.5291 0.2750 0.9737
## WlthIdx SpendVol SpenVel CollGifts BricMortar MarthaHome SunAds
## SpendRat 0.1996 0.9976 0.6821 0.0000 0.0000 0.0000 0.0413
## Age 0.0207 0.0000 0.0098 0.6274 0.5867 0.7601 0.1882
## LenRes 0.3038 0.0000 0.0732 0.5363 0.9281 0.6944 0.8405
## Income 0.0016 0.0000 0.3638 0.0233 0.0364 0.0017 0.7789
## TotAsset 0.0000 0.0000 0.1834 0.3995 0.7357 0.2075 0.8107
## SecAssets 0.0000 0.0353 0.1343 0.6281 0.4294 0.9308 0.4167
## ShortLiq 0.0000 0.7548 0.0000 0.1631 0.6622 0.2136 0.1805
## LongLiq 0.0000 0.1297 0.2155 0.8307 0.5156 0.5808 0.3925
## WlthIdx 0.0000 0.3983 0.0691 0.5856 0.0174 0.4017
## SpendVol 0.0000 0.0000 0.0556 0.8522 0.7359 0.5552
## SpenVel 0.3983 0.0000 0.7121 0.2840 0.3020 0.6790
## CollGifts 0.0691 0.0556 0.7121 0.5335 0.0045 0.0000
## BricMortar 0.5856 0.8522 0.2840 0.5335 0.0000 0.0106
## MarthaHome 0.0174 0.7359 0.3020 0.0045 0.0000 0.9427
## SunAds 0.4017 0.5552 0.6790 0.0000 0.0106 0.9427
## ThemeColl 0.0481 0.6088 0.9560 0.0000 0.0452 0.8964 0.0000
## CustDec 0.0265 0.1008 0.1138 0.0008 0.1200 0.0000 0.0000
## RetailKids 0.0787 0.0917 0.3408 0.0001 0.6656 0.0884 0.0000
## TeenWr 0.1459 0.0103 0.5411 0.2998 0.0001 0.1785 0.1553
## Carlovers 0.4727 0.3723 0.3392 0.0464 0.9109 0.6269 0.0425
## CountryColl 0.6520 0.6858 0.6630 0.0000 0.9531 0.5449 0.0000
## ThemeColl CustDec RetailKids TeenWr Carlovers CountryColl
## SpendRat 0.0017 0.0044 0.1240 0.0062 0.1597 0.0003
## Age 0.7499 0.3387 0.0170 0.0108 0.2987 0.2414
## LenRes 0.5656 0.9120 0.4813 0.9614 0.7462 0.7460
## Income 0.4215 0.0019 0.0035 0.0898 0.0348 0.3160
## TotAsset 0.1318 0.2781 0.1048 0.4401 0.5837 0.7297
## SecAssets 0.1535 0.1728 0.1856 0.4787 0.9639 0.5291
## ShortLiq 0.1246 0.2851 0.2844 0.1294 0.8358 0.2750
## LongLiq 0.0718 0.1119 0.1346 0.9589 0.9622 0.9737
## WlthIdx 0.0481 0.0265 0.0787 0.1459 0.4727 0.6520
## SpendVol 0.6088 0.1008 0.0917 0.0103 0.3723 0.6858
## SpenVel 0.9560 0.1138 0.3408 0.5411 0.3392 0.6630
## CollGifts 0.0000 0.0008 0.0001 0.2998 0.0464 0.0000
## BricMortar 0.0452 0.1200 0.6656 0.0001 0.9109 0.9531
## MarthaHome 0.8964 0.0000 0.0884 0.1785 0.6269 0.5449
## SunAds 0.0000 0.0000 0.0000 0.1553 0.0425 0.0000
## ThemeColl 0.1464 0.1014 0.1958 0.5546 0.0000
## CustDec 0.1464 0.0000 0.5294 0.4371 0.1027
## RetailKids 0.1014 0.0000 0.1724 0.1717 0.0705
## TeenWr 0.1958 0.5294 0.1724 0.3819 0.9421
## Carlovers 0.5546 0.4371 0.1717 0.3819 0.5824
## CountryColl 0.0000 0.1027 0.0705 0.9421 0.5824
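The P block above pairs each correlation with a p-value. A sketch of the usual t-based calculation for a Pearson correlation is given below and checked against cor.test() on the built-in `mtcars` data. `cor_p_value` is a hypothetical helper, and it is an assumption here that rcorr() uses the same approximation for its P matrix.

```r
# Two-sided p-value for a Pearson correlation r from n pairs:
# t = r * sqrt(n - 2) / sqrt(1 - r^2), compared with a t distribution on n - 2 df
cor_p_value <- function(r, n) {
  t_stat <- r * sqrt(n - 2) / sqrt(1 - r^2)
  2 * pt(-abs(t_stat), df = n - 2)
}

# Check against cor.test() on built-in data
r <- cor(mtcars$mpg, mtcars$wt)
p_manual  <- cor_p_value(r, nrow(mtcars))
p_builtin <- cor.test(mtcars$mpg, mtcars$wt)$p.value
print(c(manual = p_manual, builtin = p_builtin))

# With n = 184, even modest correlations reach small p-values
print(cor_p_value(c(0.10, 0.15, 0.32), n = 184))
```

Because the printed correlations are rounded to two decimals, plugging them into this formula reproduces the printed p-values only approximately.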
ANOVA to look for significant predictors:
anova_spend <- aov(SpendRat ~ ., data = d1a)
summary(anova_spend)
## Df Sum Sq Mean Sq F value Pr(>F)
## Age 1 4500 4500 1.325 0.25133
## LenRes 1 5750 5750 1.693 0.19499
## Income 1 10797 10797 3.180 0.07640 .
## TotAsset 1 6 6 0.002 0.96770
## SecAssets 1 722 722 0.213 0.64531
## ShortLiq 1 8182 8182 2.410 0.12251
## LongLiq 1 102 102 0.030 0.86271
## WlthIdx 1 8525 8525 2.511 0.11500
## SpendVol 1 821 821 0.242 0.62355
## SpenVel 1 457 457 0.135 0.71413
## CollGifts 1 70947 70947 20.896 9.54e-06 ***
## BricMortar 1 72171 72171 21.256 8.08e-06 ***
## MarthaHome 1 28263 28263 8.324 0.00444 **
## SunAds 1 2889 2889 0.851 0.35768
## ThemeColl 1 20612 20612 6.071 0.01478 *
## CustDec 1 352 352 0.104 0.74779
## RetailKids 1 348 348 0.102 0.74936
## TeenWr 1 6792 6792 2.000 0.15916
## Carlovers 1 3066 3066 0.903 0.34336
## CountryColl 1 802 802 0.236 0.62761
## Residuals 163 553431 3395
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
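One caution when reading this table: aov() on a multi-predictor formula reports sequential sums of squares, so each term is tested after the terms listed before it, and the result depends on the order of the predictors. A small sketch on the built-in `mtcars` data (a stand-in, not `d1a`) shows the effect.

```r
# Same two predictors, entered in opposite orders
fit_a <- lm(mpg ~ wt + hp, data = mtcars)
fit_b <- lm(mpg ~ hp + wt, data = mtcars)

# Sum of squares credited to wt depends on whether it enters first or last
ss_wt_first <- anova(fit_a)["wt", "Sum Sq"]
ss_wt_last  <- anova(fit_b)["wt", "Sum Sq"]
print(c(first = ss_wt_first, last = ss_wt_last))

# Only the last term's F test matches the t test from summary()
p_anova_last <- anova(fit_a)["hp", "Pr(>F)"]
p_t_test     <- summary(fit_a)$coefficients["hp", "Pr(>|t|)"]
print(c(anova = p_anova_last, t_test = p_t_test))
```

The same pattern is visible in this document: CountryColl, the last term, has p = 0.62761 in both the ANOVA table and the lm() summary, while the earlier terms do not match.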
Visualization of the correlation matrix:
plot_correlation(d1a)
The most significant predictors of SpendRat in the ANOVA above are CollGifts, BricMortar, MarthaHome, and ThemeColl, with Income only marginal (p = 0.076).
lm_fit = lm(SpendRat ~ ., data = d1a)
summary(lm_fit)
##
## Call:
## lm(formula = SpendRat ~ ., data = d1a)
##
## Residuals:
## Min 1Q Median 3Q Max
## -101.616 -31.110 -8.238 16.176 273.558
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -37.53015 179.15157 -0.209 0.83433
## Age 0.44224 0.40620 1.089 0.27789
## LenRes 0.74215 0.50013 1.484 0.13976
## Income -1.45995 3.53909 -0.413 0.68050
## TotAsset -0.04689 0.08417 -0.557 0.57818
## SecAssets 0.10371 0.26069 0.398 0.69128
## ShortLiq 0.12096 0.13839 0.874 0.38341
## LongLiq -0.07824 0.46023 -0.170 0.86521
## WlthIdx -0.02007 0.11985 -0.167 0.86724
## SpendVol 0.01635 0.04444 0.368 0.71349
## SpenVel 0.02413 0.02743 0.880 0.38040
## CollGifts 25.96189 11.76414 2.207 0.02872 *
## BricMortar 35.20239 11.07492 3.179 0.00177 **
## MarthaHome 28.37021 10.72825 2.644 0.00898 **
## SunAds -0.70414 13.12672 -0.054 0.95729
## ThemeColl 21.83030 10.45189 2.089 0.03829 *
## CustDec 8.27352 11.63807 0.711 0.47816
## RetailKids -4.49706 11.16289 -0.403 0.68758
## TeenWr 13.49246 9.72981 1.387 0.16742
## Carlovers 9.93914 10.06260 0.988 0.32475
## CountryColl 6.64051 13.66314 0.486 0.62761
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 58.27 on 163 degrees of freedom
## Multiple R-squared: 0.3078, Adjusted R-squared: 0.2229
## F-statistic: 3.624 on 20 and 163 DF, p-value: 2.3e-06
Perform MLR using only the significant predictors:
fit<-lm(SpendRat~ CollGifts+BricMortar+MarthaHome+ThemeColl, data = d1a)
summary(fit)
##
## Call:
## lm(formula = SpendRat ~ CollGifts + BricMortar + MarthaHome +
## ThemeColl, data = d1a)
##
## Residuals:
## Min 1Q Median 3Q Max
## -88.945 -29.201 -6.086 13.794 281.444
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -2.287 7.234 -0.316 0.752216
## CollGifts 29.795 9.322 3.196 0.001646 **
## BricMortar 39.163 9.906 3.953 0.000111 ***
## MarthaHome 28.301 9.452 2.994 0.003142 **
## ThemeColl 25.004 9.372 2.668 0.008330 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 57.37 on 179 degrees of freedom
## Multiple R-squared: 0.2632, Adjusted R-squared: 0.2467
## F-statistic: 15.98 on 4 and 179 DF, p-value: 3.303e-11
Multiple R-squared went down with the smaller model (0.3078 to 0.2632), as it must when predictors are removed, but adjusted R-squared went up (0.2229 to 0.2467).
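Multiple R-squared cannot increase when predictors are dropped, so adjusted R-squared is the fairer way to compare the two fits. A short sketch of the standard adjustment, using the R-squared values printed above (`adj_r2` is a hypothetical helper name):

```r
# Adjusted R-squared: penalise R-squared for the number of predictors p
adj_r2 <- function(r2, n, p) 1 - (1 - r2) * (n - 1) / (n - p - 1)

# Full model: 20 predictors; reduced model: 4 predictors; n = 184 in both
full    <- adj_r2(0.3078, n = 184, p = 20)
reduced <- adj_r2(0.2632, n = 184, p = 4)
print(round(c(full = full, reduced = reduced), 4))
```

Both results agree with the adjusted R-squared values in the two summary() outputs, so the smaller model explains slightly less variance in total but more per predictor.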
intercept_only <- lm(SpendRat ~ 1, data = d1a)
all <- lm(SpendRat ~ ., data = d1a)
fwd.model <- step(intercept_only, direction = "forward", scope=formula(all), trace = 0)
fwd.model$anova
## Step Df Deviance Resid. Df Resid. Dev AIC
## 1 NA NA 183 799534.7 1543.340
## 2 + MarthaHome -1 89197.96 182 710336.8 1523.575
## 3 + CollGifts -1 53987.21 181 656349.5 1511.030
## 4 + BricMortar -1 43810.72 180 612538.8 1500.319
## 5 + ThemeColl -1 23427.28 179 589111.5 1495.144
## 6 + LenRes -1 13895.59 178 575216.0 1492.752
fwd.model$coefficients
## (Intercept) MarthaHome CollGifts BricMortar ThemeColl LenRes
## -15.7432470 28.6728771 30.3193733 39.2957330 25.5835398 0.8778631
bwd.model <- step(all, direction = "backward", scope=formula(all), trace = 0)
bwd.model$anova
## Step Df Deviance Resid. Df Resid. Dev AIC
## 1 NA NA 163 553430.6 1515.648
## 2 - SunAds 1 9.769722 164 553440.4 1513.651
## 3 - LongLiq 1 91.825742 165 553532.2 1511.682
## 4 - WlthIdx 1 48.460361 166 553580.6 1509.698
## 5 - SpendVol 1 402.667916 167 553983.3 1507.832
## 6 - Income 1 508.634049 168 554492.0 1506.000
## 7 - RetailKids 1 893.205537 169 555385.2 1504.297
## 8 - CustDec 1 1205.661022 170 556590.8 1502.696
## 9 - CountryColl 1 1236.098256 171 557826.9 1501.104
## 10 - SecAssets 1 1750.985529 172 559577.9 1499.680
## 11 - TotAsset 1 2601.227973 173 562179.1 1498.534
## 12 - ShortLiq 1 1368.055642 174 563547.2 1496.981
## 13 - SpenVel 1 1816.592462 175 565363.8 1495.573
## 14 - Carlovers 1 2105.016071 176 567468.8 1494.257
## 15 - Age 1 3604.670636 177 571073.5 1493.422
## 16 - TeenWr 1 4142.496165 178 575216.0 1492.752
bwd.model$coefficients
## (Intercept) LenRes CollGifts BricMortar MarthaHome ThemeColl
## -15.7432470 0.8778631 30.3193733 39.2957330 28.6728771 25.5835398
both.model <- step(intercept_only, direction = "both", scope=formula(all), trace = 0)
both.model$anova
## Step Df Deviance Resid. Df Resid. Dev AIC
## 1 NA NA 183 799534.7 1543.340
## 2 + MarthaHome -1 89197.96 182 710336.8 1523.575
## 3 + CollGifts -1 53987.21 181 656349.5 1511.030
## 4 + BricMortar -1 43810.72 180 612538.8 1500.319
## 5 + ThemeColl -1 23427.28 179 589111.5 1495.144
## 6 + LenRes -1 13895.59 178 575216.0 1492.752
both.model$coefficients
## (Intercept) MarthaHome CollGifts BricMortar ThemeColl LenRes
## -15.7432470 28.6728771 30.3193733 39.2957330 25.5835398 0.8778631
The main difference between the model from question 2 and the stepwise models is the addition of the Length of Residence (LenRes) predictor and the resulting change in the intercept. The other coefficient values change very little, and the selected model is identical whether the search runs forward, backward, or in both directions. –>
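The AIC column in the step() tables can be reproduced by hand, which helps when comparing it with other AIC values later on. The sketch below assumes the form used for linear models by `extractAIC()`, n * log(RSS / n) + 2 * edf. The helper name `step_aic` is hypothetical, and n = 184 is inferred from the 183 residual degrees of freedom of the intercept-only model.

```r
# AIC as reported by step() for a linear model:
# n * log(RSS / n) + 2 * edf, where edf counts the estimated coefficients.
step_aic <- function(rss, n, edf) {
  n * log(rss / n) + 2 * edf
}

n_obs <- 184  # inferred: 183 residual df for the intercept-only model, plus 1

# Residual deviances (RSS) copied from the first and last rows of the
# forward-selection table above.
aic_null  <- step_aic(799534.7, n_obs, edf = 1)  # intercept only
aic_final <- step_aic(575216.0, n_obs, edf = 6)  # intercept + 5 predictors
print(round(c(null = aic_null, final = aic_final), 3))
```

Because this form drops additive constants, the values (around 1493 for the selected model) sit on a different scale from the AICc values reported by aictab() (around 2018). Only differences between models on the same scale are meaningful.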
fit2<-lm(SpendRat~ CollGifts+BricMortar+MarthaHome+ThemeColl+LenRes, data = d1a)
summary(fit2)
##
## Call:
## lm(formula = SpendRat ~ CollGifts + BricMortar + MarthaHome +
## ThemeColl + LenRes, data = d1a)
##
## Residuals:
## Min 1Q Median 3Q Max
## -97.342 -30.315 -7.095 13.601 272.223
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -15.7432 9.6691 -1.628 0.10525
## CollGifts 30.3194 9.2406 3.281 0.00124 **
## BricMortar 39.2957 9.8165 4.003 9.17e-05 ***
## MarthaHome 28.6729 9.3680 3.061 0.00255 **
## ThemeColl 25.5835 9.2909 2.754 0.00651 **
## LenRes 0.8779 0.4233 2.074 0.03955 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 56.85 on 178 degrees of freedom
## Multiple R-squared: 0.2806, Adjusted R-squared: 0.2604
## F-statistic: 13.88 on 5 and 178 DF, p-value: 1.873e-11
Select the model based on AIC –>
pacman::p_load(AICcmodavg)
model.set <- list(fit2, fit, lm_fit)
model.names <- c("fit2", "fit", "lm_fit")
aictab(model.set, modnames = model.names)
##
## Model selection based on AICc:
##
## K AICc Delta_AICc AICcWt Cum.Wt LL
## fit2 7 2017.56 0.00 0.75 0.75 -1001.46
## fit 6 2019.79 2.23 0.25 1.00 -1003.66
## lm_fit 22 2046.10 28.55 0.00 1.00 -997.91
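To see where the AICc column comes from, the sketch below recomputes it from the LL and K columns of the table. It assumes the usual small-sample correction, AICc = -2 LL + 2K + 2K(K + 1) / (n - K - 1), with n = 184 inferred from the residual degrees of freedom shown earlier. The helper name `aicc` is made up for illustration.

```r
# Small-sample corrected AIC from the log-likelihood (LL), the number of
# estimated parameters K (coefficients plus the residual variance), and n.
aicc <- function(ll, k, n) {
  -2 * ll + 2 * k + (2 * k * (k + 1)) / (n - k - 1)
}

n_obs <- 184  # inferred from the 183 residual df of the intercept-only model

# LL and K copied from the aictab() output above.
aicc_values <- c(
  fit2   = aicc(-1001.46, k = 7,  n = n_obs),
  fit    = aicc(-1003.66, k = 6,  n = n_obs),
  lm_fit = aicc(-997.91,  k = 22, n = n_obs)
)
print(round(aicc_values, 2))
```

The recomputed values agree with the table up to rounding of LL. lm_fit has the highest log-likelihood but pays a large penalty for its 22 parameters, which is why it ranks last.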
Make diagnostic plots of each model –>
par(mfrow=c(2,2))
plot(fit2)
par(mfrow=c(2,2))
plot(fit)
par(mfrow=c(2,2))
plot(lm_fit)
## Warning in sqrt(crit * p * (1 - hh)/hh): NaNs produced
## Warning in sqrt(crit * p * (1 - hh)/hh): NaNs produced
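Reading outliers and leverage points off the four diagnostic panels can be backed up with numeric checks. The sketch below uses the built-in mtcars data as a stand-in, since d1a is not available here. The cut-offs are common rules of thumb and are assumptions rather than fixed thresholds.

```r
# mtcars is a stand-in here; substitute fit2, fit, or lm_fit as needed.
demo_fit <- lm(mpg ~ wt + hp, data = mtcars)

lev   <- hatvalues(demo_fit)        # leverage of each observation
cooks <- cooks.distance(demo_fit)   # influence on the fitted coefficients
rstd  <- rstandard(demo_fit)        # standardized residuals

p <- length(coef(demo_fit))  # number of estimated coefficients
n <- nobs(demo_fit)          # number of observations

# Rules of thumb (conventions, not hard cut-offs):
# leverage above 2p/n, |standardized residual| above 3, Cook's D above 4/n.
flags <- data.frame(
  high_leverage = lev > 2 * p / n,
  large_resid   = abs(rstd) > 3,
  influential   = cooks > 4 / n
)

# Show only the observations flagged by at least one rule.
print(flags[rowSums(flags) > 0, ])
```

Run on fit2, a table like this would give the observation numbers directly instead of relying on the labels that plot() prints for the three most extreme points in each panel.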
4. Use the plot() function to produce diagnostic plots of the linear
regression fit. Comment on any problems you see with the fit. Do the
residual plots suggest any unusually large outliers? Does the leverage
plot identify any observations with unusually high leverage? –> Yes.
There are a decent number of outliers in all models. The leverage plot
for fit2 flags observations 88, 90, and 161. Among the three candidate
models, fit2 carries 75% of the Akaike weight (AICcWt), meaning it is
the best-supported model in the set; this is not the share of variance
it explains. The fit model carries the remaining 25%, but its Delta
AICc is more than 2 points above that of fit2. –>
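The AICcWt column can be reproduced from the Delta_AICc column, which makes its meaning clearer. The sketch below assumes the standard Akaike-weight formula, in which each model's relative likelihood exp(-Delta / 2) is divided by the sum over all candidates. The helper name `akaike_weights` is hypothetical.

```r
# Akaike weights: relative likelihood of each model, normalized to sum to 1
# across the candidate set.
akaike_weights <- function(delta) {
  rel_lik <- exp(-delta / 2)
  rel_lik / sum(rel_lik)
}

# Delta_AICc values copied from the aictab() output above.
delta_aicc <- c(fit2 = 0.00, fit = 2.23, lm_fit = 28.55)
weights <- akaike_weights(delta_aicc)
print(round(weights, 2))
```

The weights depend on which models are in the candidate set, so they rank the candidates against each other and say nothing about how much variance any one model explains.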
Squared LenRes –>
squared.fit2=lm(SpendRat ~ CollGifts+BricMortar+MarthaHome+ThemeColl+
LenRes+I(LenRes^2), data = d1a)
summary (squared.fit2)
##
## Call:
## lm(formula = SpendRat ~ CollGifts + BricMortar + MarthaHome +
## ThemeColl + LenRes + I(LenRes^2), data = d1a)
##
## Residuals:
## Min 1Q Median 3Q Max
## -97.262 -30.931 -7.209 13.440 273.756
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -11.90724 14.20330 -0.838 0.40297
## CollGifts 30.15382 9.27391 3.251 0.00137 **
## BricMortar 39.22160 9.84248 3.985 9.85e-05 ***
## MarthaHome 28.77539 9.39488 3.063 0.00253 **
## ThemeColl 25.80250 9.33237 2.765 0.00630 **
## LenRes 0.31421 1.58343 0.198 0.84293
## I(LenRes^2) 0.01402 0.03795 0.369 0.71221
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 56.99 on 177 degrees of freedom
## Multiple R-squared: 0.2811, Adjusted R-squared: 0.2567
## F-statistic: 11.54 on 6 and 177 DF, p-value: 6.987e-11
Square Root LenRes–>
root.fit2=lm(SpendRat ~ CollGifts+BricMortar+MarthaHome+ThemeColl+
LenRes+I(LenRes^(.5)), data = d1a)
summary (root.fit2)
##
## Call:
## lm(formula = SpendRat ~ CollGifts + BricMortar + MarthaHome +
## ThemeColl + LenRes + I(LenRes^(0.5)), data = d1a)
##
## Residuals:
## Min 1Q Median 3Q Max
## -97.891 -30.757 -7.095 13.520 273.400
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -0.249 30.685 -0.008 0.993535
## CollGifts 29.759 9.319 3.193 0.001664 **
## BricMortar 39.114 9.842 3.974 0.000103 ***
## MarthaHome 28.756 9.388 3.063 0.002534 **
## ThemeColl 25.995 9.342 2.783 0.005976 **
## LenRes 1.943 2.047 0.950 0.343645
## I(LenRes^(0.5)) -8.572 16.108 -0.532 0.595281
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 56.96 on 177 degrees of freedom
## Multiple R-squared: 0.2817, Adjusted R-squared: 0.2574
## F-statistic: 11.57 on 6 and 177 DF, p-value: 6.52e-11
Log LenRes–>
log.fit2=lm(SpendRat ~ CollGifts+BricMortar+MarthaHome+ThemeColl+
LenRes+log1p(LenRes), data = d1a)
summary (log.fit2)
##
## Call:
## lm(formula = SpendRat ~ CollGifts + BricMortar + MarthaHome +
## ThemeColl + LenRes + log1p(LenRes), data = d1a)
##
## Residuals:
## Min 1Q Median 3Q Max
## -98.041 -30.635 -7.027 13.471 273.185
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -0.7191 29.2642 -0.025 0.980424
## CollGifts 29.7037 9.3278 3.184 0.001714 **
## BricMortar 39.1503 9.8396 3.979 0.000101 ***
## MarthaHome 28.7787 9.3886 3.065 0.002515 **
## ThemeColl 26.0015 9.3410 2.784 0.005960 **
## LenRes 1.4339 1.1065 1.296 0.196714
## log1p(LenRes) -9.0387 16.6129 -0.544 0.587072
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 56.96 on 177 degrees of freedom
## Multiple R-squared: 0.2818, Adjusted R-squared: 0.2574
## F-statistic: 11.57 on 6 and 177 DF, p-value: 6.481e-11
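In the three summaries above, the added squared, square-root, and log terms have p-values of roughly 0.71, 0.60, and 0.59, so none of them improves on the model with LenRes alone. The same question can be asked with a nested-model F-test through anova(). The sketch below uses mtcars as a stand-in for d1a, with wt in the role of LenRes; both substitutions are for illustration only.

```r
# mtcars is a stand-in for d1a; wt plays the role of LenRes.
base_fit <- lm(mpg ~ wt, data = mtcars)
quad_fit <- lm(mpg ~ wt + I(wt^2), data = mtcars)

# Nested-model F-test: does the squared term reduce the residual sum of
# squares by more than chance would explain?
comparison <- anova(base_fit, quad_fit)
print(comparison)

# With a single added term, the F statistic is the square of that term's
# t statistic, so both tests give the same p-value.
f_stat <- comparison$F[2]
t_stat <- summary(quad_fit)$coefficients["I(wt^2)", "t value"]
print(c(F = f_stat, t_squared = t_stat^2))
```

With d1a available, the equivalent call would be anova(fit2, squared.fit2), and likewise for root.fit2 and log.fit2.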
Stepwise with log1p(LenRes) –>
log_intercept_only <- lm(SpendRat ~ 1, data = d1a)
all1 <- lm(SpendRat ~ . + log1p(LenRes), data = d1a)
log_fwd_model <- step(log_intercept_only, direction = "forward", scope=formula(all1), trace = 0)
log_fwd_model$anova
## Step Df Deviance Resid. Df Resid. Dev AIC
## 1 NA NA 183 799534.7 1543.340
## 2 + MarthaHome -1 89197.96 182 710336.8 1523.575
## 3 + CollGifts -1 53987.21 181 656349.5 1511.030
## 4 + BricMortar -1 43810.72 180 612538.8 1500.319
## 5 + ThemeColl -1 23427.28 179 589111.5 1495.144
## 6 + LenRes -1 13895.59 178 575216.0 1492.752
log_fwd_model$coefficients
## (Intercept) MarthaHome CollGifts BricMortar ThemeColl LenRes
## -15.7432470 28.6728771 30.3193733 39.2957330 25.5835398 0.8778631
log_bwd_model <- step(all1, direction = "backward", scope=formula(all1), trace = 0)
log_bwd_model$anova
## Step Df Deviance Resid. Df Resid. Dev AIC
## 1 NA NA 162 551074.8 1516.863
## 2 - SunAds 1 50.70110 163 551125.5 1514.880
## 3 - WlthIdx 1 63.00329 164 551188.5 1512.901
## 4 - LongLiq 1 95.15733 165 551283.7 1510.933
## 5 - Income 1 619.04464 166 551902.7 1509.139
## 6 - SpendVol 1 566.49222 167 552469.2 1507.328
## 7 - RetailKids 1 1122.29669 168 553591.5 1505.701
## 8 - CustDec 1 1112.83280 169 554704.3 1504.071
## 9 - CountryColl 1 1226.51877 170 555930.8 1502.477
## 10 - log1p(LenRes) 1 1896.07435 171 557826.9 1501.104
## 11 - SecAssets 1 1750.98553 172 559577.9 1499.680
## 12 - TotAsset 1 2601.22797 173 562179.1 1498.534
## 13 - ShortLiq 1 1368.05564 174 563547.2 1496.981
## 14 - SpenVel 1 1816.59246 175 565363.8 1495.573
## 15 - Carlovers 1 2105.01607 176 567468.8 1494.257
## 16 - Age 1 3604.67064 177 571073.5 1493.422
## 17 - TeenWr 1 4142.49616 178 575216.0 1492.752
log_bwd_model$coefficients
## (Intercept) LenRes CollGifts BricMortar MarthaHome ThemeColl
## -15.7432470 0.8778631 30.3193733 39.2957330 28.6728771 25.5835398
log_both_model <- step(log_intercept_only, direction = "both", scope=formula(all1), trace = 0)
log_both_model$anova
## Step Df Deviance Resid. Df Resid. Dev AIC
## 1 NA NA 183 799534.7 1543.340
## 2 + MarthaHome -1 89197.96 182 710336.8 1523.575
## 3 + CollGifts -1 53987.21 181 656349.5 1511.030
## 4 + BricMortar -1 43810.72 180 612538.8 1500.319
## 5 + ThemeColl -1 23427.28 179 589111.5 1495.144
## 6 + LenRes -1 13895.59 178 575216.0 1492.752
log_both_model$coefficients
## (Intercept) MarthaHome CollGifts BricMortar ThemeColl LenRes
## -15.7432470 28.6728771 30.3193733 39.2957330 25.5835398 0.8778631
Plot of log1p(LenRes) models –>
plot(log_fwd_model)
plot(log_bwd_model)
plot(log_both_model)
plot(log.fit2)
AIC of all models –>
log.model.set <- list(log.fit2, log_both_model, log_fwd_model,log_bwd_model, all1)
log.model.names <- c("log.fit2", "log_both_model", "log_fwd_model","log_bwd_model", "all1")
aictab(log.model.set, modnames = log.model.names)
## Warning in aictab.AIClm(log.model.set, modnames = log.model.names):
## Check model structure carefully as some models may be redundant
##
## Model selection based on AICc:
##
## K AICc Delta_AICc AICcWt Cum.Wt LL
## log_both_model 7 2017.56 0.00 0.29 0.29 -1001.46
## log_fwd_model 7 2017.56 0.00 0.29 0.59 -1001.46
## log_bwd_model 7 2017.56 0.00 0.29 0.88 -1001.46
## log.fit2 8 2019.44 1.88 0.12 1.00 -1001.31
## all1 23 2047.93 30.37 0.00 1.00 -997.52
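With log1p(LenRes) added to the scope, all three stepwise searches dropped the log term and returned the same model as the original fit2, so their AICc (2017.56) matches it exactly. The AICcWt is only 29% each because three identical models split the weight between them, which is what the redundancy warning refers to, so these are not better than the original linear model. Pretty much the same as far as outliers: 88, 90, and 161. The backward and forward models had 62 instead of 161 as being an outlier. –>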
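The effect of the redundant models on the weights can be worked out by hand, assuming the standard Akaike-weight formula $w_i = e^{-\Delta_i/2} / \sum_j e^{-\Delta_j/2}$ and ignoring all1, whose contribution is negligible. With three identical stepwise models at $\Delta = 0$ and log.fit2 at $\Delta = 1.88$:

$$
w_{\text{stepwise}} = \frac{1}{3 + e^{-0.94}} \approx 0.29,
\qquad
w_{\text{log.fit2}} = \frac{e^{-0.94}}{3 + e^{-0.94}} \approx 0.12
$$

If only one copy of the stepwise model is kept in the candidate set, the same formula gives:

$$
w_{\text{stepwise}} = \frac{1}{1 + e^{-0.94}} \approx 0.72,
\qquad
w_{\text{log.fit2}} = \frac{e^{-0.94}}{1 + e^{-0.94}} \approx 0.28
$$

So the drop from 0.75 to 0.29 comes from listing the same model three times, and the fit itself has not become worse.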
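Comment on the results obtained. How well do these models fit the
data? Can we use any of them to predict the spending ratio?
I think that the fit2 model does the best job of the models compared:
according to the AICcWt it carries 75% of the Akaike weight among the
candidates. Its multiple R-squared is 0.2806, so it explains about 28%
of the variation in the spending ratio and its predictions will be
approximate. So the Spending Ratio = -15.74 + 30.32 CollGifts + 39.30
BricMortar + 28.67 MarthaHome + 25.58 ThemeColl + 0.88 LenRes –>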
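To show how the fitted equation turns into a prediction, the sketch below plugs a customer profile into the fit2 coefficients copied from the summary above. The helper `predict_spend_ratio` and the customer values are hypothetical and invented for illustration. Treating the four shopping variables as 0/1 indicators is also an assumption.

```r
# Coefficients copied from summary(fit2) above.
fit2_coef <- c(
  "(Intercept)" = -15.7432,
  CollGifts     = 30.3194,
  BricMortar    = 39.2957,
  MarthaHome    = 28.6729,
  ThemeColl     = 25.5835,
  LenRes        = 0.8779
)

predict_spend_ratio <- function(customer, coefs = fit2_coef) {
  # Prepend 1 for the intercept, then align the predictors by name.
  x <- c("(Intercept)" = 1, customer)[names(coefs)]
  sum(coefs * x)
}

# Hypothetical customer, invented for illustration only.
customer <- c(CollGifts = 1, BricMortar = 0, MarthaHome = 1,
              ThemeColl = 0, LenRes = 10)
predicted <- predict_spend_ratio(customer)
print(predicted)
```

With the fitted object available, predict(fit2, newdata = ...) does the same calculation and, with interval = "prediction", also reports the uncertainty. That uncertainty will be wide here, given the residual standard error of 56.85.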
Explore the data graphically in order to investigate the association between income and the other features. Which of the other features seem most likely to be useful in predicting income? Scatterplots and boxplots may be useful tools to answer this question. Describe your findings. –>
Package your code and responses into a single pdf and upload it to canvas. It has been my absolute pleasure. Best of luck next semester. –>
adult <- read_csv("adult.csv", na = "?")
## Rows: 32561 Columns: 15
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (9): work_class, education, marital_status, occupation, relationship, ra...
## dbl (6): age, wgt, education_num, capital_gain, capital_loss, hours_per_week
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
create_report(adult)
## Output created: report.html
library(tidyr)
clean_adult <- drop_na(adult)
dim(clean_adult)
## [1] 30162 15
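drop_na() removed 32561 - 30162 = 2399 rows. Before dropping rows it can help to see which columns the missing values come from. The sketch below shows the idea on a small made-up data frame; the toy values are not from the adult data.

```r
# Toy data frame standing in for the adult data (values invented).
toy <- data.frame(
  age        = c(39, 50, NA, 53),
  occupation = c("Sales", NA, "Tech-support", "Sales"),
  income     = c("<=50K", ">50K", "<=50K", "<=50K")
)

# Number of missing values in each column.
na_per_column <- colSums(is.na(toy))
print(na_per_column)

# Number of rows that a complete-case filter such as drop_na() would remove.
rows_dropped <- sum(!complete.cases(toy))
print(rows_dropped)
```

Applied to the adult data, colSums(is.na(adult)) would show whether the 2399 dropped rows come from a few columns or are spread across many.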
str(clean_adult)
## tibble [30,162 × 15] (S3: tbl_df/tbl/data.frame)
## $ age : num [1:30162] 39 50 38 53 28 37 49 52 31 42 ...
## $ work_class : chr [1:30162] "State-gov" "Self-emp-not-inc" "Private" "Private" ...
## $ wgt : num [1:30162] 77516 83311 215646 234721 338409 ...
## $ education : chr [1:30162] "Bachelors" "Bachelors" "HS-grad" "11th" ...
## $ education_num : num [1:30162] 13 13 9 7 13 14 5 9 14 13 ...
## $ marital_status: chr [1:30162] "Never-married" "Married-civ-spouse" "Divorced" "Married-civ-spouse" ...
## $ occupation : chr [1:30162] "Adm-clerical" "Exec-managerial" "Handlers-cleaners" "Handlers-cleaners" ...
## $ relationship : chr [1:30162] "Not-in-family" "Husband" "Not-in-family" "Husband" ...
## $ race : chr [1:30162] "White" "White" "White" "Black" ...
## $ sex : chr [1:30162] "Male" "Male" "Male" "Male" ...
## $ capital_gain : num [1:30162] 2174 0 0 0 0 ...
## $ capital_loss : num [1:30162] 0 0 0 0 0 0 0 0 0 0 ...
## $ hours_per_week: num [1:30162] 40 13 40 40 40 40 16 45 50 40 ...
## $ native_country: chr [1:30162] "United-States" "United-States" "United-States" "United-States" ...
## $ income : chr [1:30162] "<=50K" "<=50K" "<=50K" "<=50K" ...
plot_missing(clean_adult)
plot_correlation(clean_adult)
## 1 features with more than 20 categories ignored!
## native_country: 41 categories
plot_boxplot(clean_adult, by = "income")
plot_scatterplot(clean_adult, by = "income")
hrs_age_plot <- ggplot(clean_adult, aes(age,hours_per_week, color = income )) + geom_point(alpha = 0.5) +
labs(title = "Hrs per week vs Age by Income Group", x="Age", y = "Hrs per week")
print(hrs_age_plot)
hrs_edu_plot <- ggplot(clean_adult, aes(hours_per_week,education_num,color = income)) + geom_point(alpha = 0.5) +
labs(title = "Education Years vs Hrs per week by Income", x="Hrs per week", y = "Education Years")
print(hrs_edu_plot)
Income_Ed <- ggplot(clean_adult, aes(x = education, fill = income)) +
geom_bar(position = "dodge") + coord_flip() + theme_minimal() +
labs(title = "Income vs. Education", y = "Count") +
theme(axis.text.x = element_text(angle = 90, hjust = 1))
print(Income_Ed)
library(dplyr)
# Convert categorical variables to factors
clean_adult$income <- as.factor(clean_adult$income)
clean_adult$work_class <- as.factor(clean_adult$work_class)
clean_adult$education <- as.factor(clean_adult$education)
clean_adult$marital_status <- as.factor(clean_adult$marital_status)
clean_adult$occupation <- as.factor(clean_adult$occupation)
clean_adult$relationship <- as.factor(clean_adult$relationship)
clean_adult$race <- as.factor(clean_adult$race)
clean_adult$native_country <- as.factor(clean_adult$native_country)
clean_adult$sex <- as.factor(clean_adult$sex)
Work_Class <- ggplot(clean_adult, aes(x = work_class, fill = income)) +
geom_bar(position = "dodge") + coord_flip() +
labs(title = "Income vs. Work Class", y = "Count") +
theme(axis.text.x = element_text(angle = 90, hjust = 1))
print(Work_Class)
Marital_Status <- ggplot(clean_adult, aes(x = marital_status, fill = income)) +
geom_bar(position = "dodge") + coord_flip() +
labs(title = "Income vs. Marital Status", y = "Count") +
theme(axis.text.x = element_text(angle = 45, hjust = 1))
print(Marital_Status)
Occupation <- ggplot(clean_adult, aes(x = occupation, fill = income)) +
geom_bar(position = "dodge") + coord_flip() +
labs(title = "Income vs. Occupation", y = "Count") +
theme(axis.text.x = element_text(angle = 45, hjust = 1))
print(Occupation)
native_country_nonUS <- filter(clean_adult,
native_country != "United-States")
native_country_nonUS <- ggplot(native_country_nonUS, aes(x = native_country, fill = income)) +
geom_bar(position = "dodge") +
labs(title = "Income vs. native_country", y = "Count") +
theme(axis.text.x = element_text(angle = 90, hjust = 1))
print(native_country_nonUS)
relationship <- ggplot(clean_adult, aes(x = relationship, fill = income)) +
geom_bar(position = "dodge") + coord_flip() +
labs(title = "Income vs. relationship", y = "Count") +
theme(axis.text.y = element_text(angle = 0, hjust = 1))
print(relationship)
race <- ggplot(clean_adult, aes(x = race, fill = income)) +
geom_bar(position = "dodge")+
labs(title = "Income vs. race", y = "Count") +
theme(axis.text.y = element_text(angle = 0, hjust = 1))
print(race)
non_white <- filter(clean_adult, race != "White")
non_white <- ggplot(non_white, aes(x = race, fill = income)) +
geom_bar(position = "dodge")+
labs(title = "Income vs. non-white race", y = "Count") +
theme(axis.text.y = element_text(angle = 0, hjust = 1))
print(non_white)
gender <- ggplot(clean_adult, aes(x = sex, fill = income)) +
geom_bar(position = "dodge")+
labs(title = "Income vs. gender", y = "Count") +
theme(axis.text.y = element_text(angle = 0, hjust = 1))
print(gender)
IncomeAge <- ggplot(clean_adult, aes(x = income, y = age, fill = income)) +
geom_boxplot() +
labs(title = "Income vs. Age")
print(IncomeAge)
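geom_bar(position = "dodge") draws counts, so large groups dominate the bar charts above regardless of their income mix. To compare groups of different sizes, the share of each income class within a group is more informative. The sketch below computes those row proportions in base R on a small made-up data frame; the toy values are not from the adult data.

```r
# Toy data standing in for clean_adult (values invented).
toy <- data.frame(
  education = c("HS-grad", "HS-grad", "HS-grad",
                "Bachelors", "Bachelors", "Masters"),
  income    = c("<=50K", "<=50K", ">50K", "<=50K", ">50K", ">50K")
)

# Cross-tabulate, then convert each row to proportions (margin = 1),
# so every education level sums to 1.
counts    <- table(toy$education, toy$income)
row_props <- prop.table(counts, margin = 1)
print(round(row_props, 2))
```

In ggplot2, geom_bar(position = "fill") plots the same within-group shares directly, which would put a true proportion on the axis.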
3. Perform logistic regression in order to predict income using all the
variables that seemed most associated with income in (b) (exclude wgt,
you don’t need it). What is the R^2 of the model obtained? –> Tjur’s
R2 = 0.4480447 –>
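Tjur's R2 (the coefficient of discrimination) is the difference between the mean fitted probability among the observed events and the mean among the non-events. The sketch below computes it by hand on a stand-in model, since the code that produced the value above is not shown. The helper name `tjur_r2` is made up, and mtcars with am as the 0/1 outcome replaces clean_adult for illustration.

```r
# mtcars stands in for clean_adult; am (0/1) plays the role of income.
demo_model <- glm(am ~ wt, data = mtcars, family = binomial)

# Tjur's R2: mean fitted probability for the 1s minus that for the 0s.
tjur_r2 <- function(model) {
  p_hat <- fitted(model)  # fitted probabilities
  y     <- model$y        # observed 0/1 response stored by glm()
  mean(p_hat[y == 1]) - mean(p_hat[y == 0])
}

demo_r2 <- tjur_r2(demo_model)
print(demo_r2)
```

A value near 0 means the model assigns similar probabilities to both classes, and a value near 1 means it separates them almost perfectly. Packages such as performance offer an r2_tjur() helper that should give the same figure.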
logistic regression all variables except wgt –>
model <- glm(income ~ . - wgt, data = clean_adult, family = binomial)
## Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred
summary(model)
##
## Call:
## glm(formula = income ~ . - wgt, family = binomial, data = clean_adult)
##
## Coefficients: (1 not defined because of singularities)
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -6.251e+00 7.634e-01 -8.189 2.63e-16
## age 2.510e-02 1.709e-03 14.689 < 2e-16
## work_classLocal-gov -6.954e-01 1.129e-01 -6.161 7.22e-10
## work_classPrivate -5.006e-01 9.369e-02 -5.343 9.12e-08
## work_classSelf-emp-inc -3.318e-01 1.237e-01 -2.681 0.007337
## work_classSelf-emp-not-inc -9.958e-01 1.099e-01 -9.057 < 2e-16
## work_classState-gov -8.190e-01 1.253e-01 -6.535 6.37e-11
## work_classWithout-pay -1.329e+01 1.970e+02 -0.067 0.946234
## education11th 8.585e-02 2.138e-01 0.402 0.688013
## education12th 4.361e-01 2.780e-01 1.568 0.116768
## education1st-4th -4.378e-01 4.963e-01 -0.882 0.377746
## education5th-6th -4.163e-01 3.592e-01 -1.159 0.246485
## education7th-8th -5.731e-01 2.434e-01 -2.354 0.018554
## education9th -2.457e-01 2.702e-01 -0.909 0.363237
## educationAssoc-acdm 1.264e+00 1.797e-01 7.035 2.00e-12
## educationAssoc-voc 1.255e+00 1.728e-01 7.259 3.90e-13
## educationBachelors 1.890e+00 1.607e-01 11.762 < 2e-16
## educationDoctorate 2.930e+00 2.230e-01 13.144 < 2e-16
## educationHS-grad 7.654e-01 1.563e-01 4.896 9.76e-07
## educationMasters 2.246e+00 1.718e-01 13.076 < 2e-16
## educationPreschool -2.008e+01 1.911e+02 -0.105 0.916328
## educationProf-school 2.832e+00 2.068e-01 13.694 < 2e-16
## educationSome-college 1.102e+00 1.586e-01 6.946 3.75e-12
## education_num NA NA NA NA
## marital_statusMarried-AF-spouse 2.778e+00 5.770e-01 4.814 1.48e-06
## marital_statusMarried-civ-spouse 2.108e+00 2.748e-01 7.672 1.70e-14
## marital_statusMarried-spouse-absent 2.929e-03 2.408e-01 0.012 0.990292
## marital_statusNever-married -4.834e-01 8.922e-02 -5.418 6.03e-08
## marital_statusSeparated -7.598e-02 1.654e-01 -0.459 0.645914
## marital_statusWidowed 1.818e-01 1.582e-01 1.150 0.250240
## occupationArmed-Forces -1.106e+00 1.515e+00 -0.730 0.465401
## occupationCraft-repair 6.031e-02 8.072e-02 0.747 0.454997
## occupationExec-managerial 8.016e-01 7.788e-02 10.293 < 2e-16
## occupationFarming-fishing -1.004e+00 1.408e-01 -7.129 1.01e-12
## occupationHandlers-cleaners -6.983e-01 1.447e-01 -4.826 1.39e-06
## occupationMachine-op-inspct -2.671e-01 1.026e-01 -2.602 0.009274
## occupationOther-service -8.359e-01 1.191e-01 -7.019 2.24e-12
## occupationPriv-house-serv -4.213e+00 1.727e+00 -2.439 0.014709
## occupationProf-specialty 5.150e-01 8.245e-02 6.247 4.19e-10
## occupationProtective-serv 5.999e-01 1.262e-01 4.753 2.00e-06
## occupationSales 2.914e-01 8.313e-02 3.505 0.000456
## occupationTech-support 6.615e-01 1.117e-01 5.923 3.16e-09
## occupationTransport-moving -9.265e-02 1.000e-01 -0.926 0.354209
## relationshipNot-in-family 4.605e-01 2.717e-01 1.695 0.090154
## relationshipOther-relative -3.920e-01 2.477e-01 -1.583 0.113524
## relationshipOwn-child -7.317e-01 2.708e-01 -2.703 0.006881
## relationshipUnmarried 3.394e-01 2.874e-01 1.181 0.237651
## relationshipWife 1.350e+00 1.056e-01 12.786 < 2e-16
## raceAsian-Pac-Islander 8.378e-01 2.858e-01 2.932 0.003369
## raceBlack 5.240e-01 2.398e-01 2.185 0.028869
## raceOther 1.593e-01 3.791e-01 0.420 0.674265
## raceWhite 6.337e-01 2.287e-01 2.770 0.005599
## sexMale 8.728e-01 8.084e-02 10.796 < 2e-16
## capital_gain 3.229e-04 1.074e-05 30.067 < 2e-16
## capital_loss 6.406e-04 3.840e-05 16.679 < 2e-16
## hours_per_week 2.931e-02 1.701e-03 17.229 < 2e-16
## native_countryCanada -8.387e-01 6.893e-01 -1.217 0.223734
## native_countryChina -1.918e+00 7.034e-01 -2.727 0.006387
## native_countryColumbia -3.285e+00 1.032e+00 -3.183 0.001457
## native_countryCuba -7.665e-01 7.031e-01 -1.090 0.275646
## native_countryDominican-Republic -2.950e+00 1.220e+00 -2.418 0.015627
## native_countryEcuador -1.427e+00 9.563e-01 -1.492 0.135628
## native_countryEl-Salvador -1.756e+00 7.939e-01 -2.212 0.026958
## native_countryEngland -8.930e-01 7.009e-01 -1.274 0.202589
## native_countryFrance -5.944e-01 8.128e-01 -0.731 0.464586
## native_countryGermany -7.193e-01 6.785e-01 -1.060 0.289141
## native_countryGreece -2.199e+00 8.385e-01 -2.623 0.008724
## native_countryGuatemala -1.382e+00 9.770e-01 -1.415 0.157206
## native_countryHaiti -1.225e+00 9.284e-01 -1.320 0.186888
## native_countryHoland-Netherlands -1.180e+01 8.827e+02 -0.013 0.989339
## native_countryHonduras -2.361e+00 2.685e+00 -0.879 0.379144
## native_countryHong -1.334e+00 8.990e-01 -1.484 0.137761
## native_countryHungary -1.306e+00 9.884e-01 -1.321 0.186544
## native_countryIndia -1.696e+00 6.689e-01 -2.536 0.011217
## native_countryIran -1.167e+00 7.583e-01 -1.539 0.123898
## native_countryIreland -6.878e-01 8.886e-01 -0.774 0.438911
## native_countryItaly -3.635e-01 7.098e-01 -0.512 0.608628
## native_countryJamaica -1.190e+00 7.712e-01 -1.543 0.122726
## native_countryJapan -9.626e-01 7.286e-01 -1.321 0.186428
## native_countryLaos -1.846e+00 1.041e+00 -1.773 0.076226
## native_countryMexico -1.610e+00 6.652e-01 -2.421 0.015493
## native_countryNicaragua -1.781e+00 1.015e+00 -1.756 0.079117
## native_countryOutlying-US(Guam-USVI-etc) -1.341e+01 2.110e+02 -0.064 0.949340
## native_countryPeru -1.952e+00 1.057e+00 -1.848 0.064620
## native_countryPhilippines -8.900e-01 6.446e-01 -1.381 0.167361
## native_countryPoland -1.187e+00 7.453e-01 -1.593 0.111173
## native_countryPortugal -1.189e+00 8.859e-01 -1.342 0.179602
## native_countryPuerto-Rico -1.471e+00 7.382e-01 -1.992 0.046345
## native_countryScotland -1.444e+00 1.084e+00 -1.332 0.182779
## native_countrySouth -2.470e+00 7.367e-01 -3.353 0.000800
## native_countryTaiwan -1.394e+00 7.547e-01 -1.847 0.064682
## native_countryThailand -1.840e+00 1.017e+00 -1.809 0.070439
## native_countryTrinadad&Tobago -1.626e+00 1.058e+00 -1.537 0.124348
## native_countryUnited-States -9.990e-01 6.307e-01 -1.584 0.113230
## native_countryVietnam -2.429e+00 8.467e-01 -2.869 0.004117
## native_countryYugoslavia -4.723e-01 9.190e-01 -0.514 0.607310
##
## (Intercept) ***
## age ***
## work_classLocal-gov ***
## work_classPrivate ***
## work_classSelf-emp-inc **
## work_classSelf-emp-not-inc ***
## work_classState-gov ***
## work_classWithout-pay
## education11th
## education12th
## education1st-4th
## education5th-6th
## education7th-8th *
## education9th
## educationAssoc-acdm ***
## educationAssoc-voc ***
## educationBachelors ***
## educationDoctorate ***
## educationHS-grad ***
## educationMasters ***
## educationPreschool
## educationProf-school ***
## educationSome-college ***
## education_num
## marital_statusMarried-AF-spouse ***
## marital_statusMarried-civ-spouse ***
## marital_statusMarried-spouse-absent
## marital_statusNever-married ***
## marital_statusSeparated
## marital_statusWidowed
## occupationArmed-Forces
## occupationCraft-repair
## occupationExec-managerial ***
## occupationFarming-fishing ***
## occupationHandlers-cleaners ***
## occupationMachine-op-inspct **
## occupationOther-service ***
## occupationPriv-house-serv *
## occupationProf-specialty ***
## occupationProtective-serv ***
## occupationSales ***
## occupationTech-support ***
## occupationTransport-moving
## relationshipNot-in-family .
## relationshipOther-relative
## relationshipOwn-child **
## relationshipUnmarried
## relationshipWife ***
## raceAsian-Pac-Islander **
## raceBlack *
## raceOther
## raceWhite **
## sexMale ***
## capital_gain ***
## capital_loss ***
## hours_per_week ***
## native_countryCanada
## native_countryChina **
## native_countryColumbia **
## native_countryCuba
## native_countryDominican-Republic *
## native_countryEcuador
## native_countryEl-Salvador *
## native_countryEngland
## native_countryFrance
## native_countryGermany
## native_countryGreece **
## native_countryGuatemala
## native_countryHaiti
## native_countryHoland-Netherlands
## native_countryHonduras
## native_countryHong
## native_countryHungary
## native_countryIndia *
## native_countryIran
## native_countryIreland
## native_countryItaly
## native_countryJamaica
## native_countryJapan
## native_countryLaos .
## native_countryMexico *
## native_countryNicaragua .
## native_countryOutlying-US(Guam-USVI-etc)
## native_countryPeru .
## native_countryPhilippines
## native_countryPoland
## native_countryPortugal
## native_countryPuerto-Rico *
## native_countryScotland
## native_countrySouth ***
## native_countryTaiwan .
## native_countryThailand .
## native_countryTrinadad&Tobago
## native_countryUnited-States
## native_countryVietnam **
## native_countryYugoslavia
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 33851 on 30161 degrees of freedom
## Residual deviance: 19504 on 30067 degrees of freedom
## AIC: 19694
##
## Number of Fisher Scoring iterations: 13
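The note "(1 not defined because of singularities)" and the NA row for education_num indicate that this column carries no information beyond the education factor. In the str() output each education label appears to map to a single number (for example Bachelors to 13 and HS-grad to 9), so the two are perfectly collinear. The sketch below reproduces the effect on a small made-up data set; all values are invented.

```r
# Toy data: edu_num is a one-to-one numeric recoding of the edu factor.
toy <- data.frame(
  y   = c(0, 1, 0, 1, 1, 0, 1, 0),
  edu = factor(c("HS", "BS", "HS", "MS", "BS", "MS", "HS", "BS"))
)
toy$edu_num <- unname(c(HS = 9, BS = 13, MS = 14)[as.character(toy$edu)])

# Each factor level pairs with exactly one number.
print(table(toy$edu, toy$edu_num))

# The recoded column is a linear combination of the factor's dummy
# variables, so glm() cannot estimate it and reports NA.
toy_fit <- glm(y ~ edu + edu_num, data = toy, family = binomial)
print(coef(toy_fit))
```

Dropping either education or education_num from the formula removes the singularity without changing the fitted probabilities.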
logistic regression age,occupation,education,hours_per_week,marital_status –>
model2 <- glm(income ~ age + occupation + education + hours_per_week + marital_status, data = clean_adult, family = binomial)
summary(model2)
##
## Call:
## glm(formula = income ~ age + occupation + education + hours_per_week +
## marital_status, family = binomial, data = clean_adult)
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -6.336393 0.196041 -32.322 < 2e-16 ***
## age 0.029400 0.001553 18.934 < 2e-16 ***
## occupationArmed-Forces -0.480258 1.271359 -0.378 0.705615
## occupationCraft-repair 0.009334 0.071178 0.131 0.895664
## occupationExec-managerial 0.789590 0.069096 11.428 < 2e-16 ***
## occupationFarming-fishing -1.237178 0.126702 -9.764 < 2e-16 ***
## occupationHandlers-cleaners -0.735531 0.135207 -5.440 5.33e-08 ***
## occupationMachine-op-inspct -0.323060 0.094361 -3.424 0.000618 ***
## occupationOther-service -0.992080 0.110795 -8.954 < 2e-16 ***
## occupationPriv-house-serv -2.952415 1.143405 -2.582 0.009819 **
## occupationProf-specialty 0.449912 0.074055 6.075 1.24e-09 ***
## occupationProtective-serv 0.404928 0.113168 3.578 0.000346 ***
## occupationSales 0.260351 0.073111 3.561 0.000369 ***
## occupationTech-support 0.618523 0.103320 5.986 2.14e-09 ***
## occupationTransport-moving -0.156920 0.091232 -1.720 0.085432 .
## education11th 0.126298 0.204964 0.616 0.537766
## education12th 0.441703 0.259745 1.701 0.089032 .
## education1st-4th -0.658946 0.456660 -1.443 0.149029
## education5th-6th -0.550468 0.339213 -1.623 0.104636
## education7th-8th -0.647111 0.234530 -2.759 0.005795 **
## education9th -0.361168 0.260739 -1.385 0.166000
## educationAssoc-acdm 1.386957 0.170289 8.145 3.80e-16 ***
## educationAssoc-voc 1.356264 0.163669 8.287 < 2e-16 ***
## educationBachelors 2.020826 0.152548 13.247 < 2e-16 ***
## educationDoctorate 3.006100 0.208265 14.434 < 2e-16 ***
## educationHS-grad 0.828217 0.148671 5.571 2.54e-08 ***
## educationMasters 2.375606 0.162058 14.659 < 2e-16 ***
## educationPreschool -11.390495 109.217823 -0.104 0.916938
## educationProf-school 3.092973 0.193644 15.972 < 2e-16 ***
## educationSome-college 1.163966 0.150768 7.720 1.16e-14 ***
## hours_per_week 0.030994 0.001565 19.809 < 2e-16 ***
## marital_statusMarried-AF-spouse 2.894988 0.498793 5.804 6.48e-09 ***
## marital_statusMarried-civ-spouse 2.139345 0.058433 36.612 < 2e-16 ***
## marital_statusMarried-spouse-absent -0.049302 0.213693 -0.231 0.817538
## marital_statusNever-married -0.438015 0.075954 -5.767 8.08e-09 ***
## marital_statusSeparated -0.070958 0.146292 -0.485 0.627644
## marital_statusWidowed -0.006857 0.139514 -0.049 0.960800
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 33851 on 30161 degrees of freedom
## Residual deviance: 22009 on 30125 degrees of freedom
## AIC: 22083
##
## Number of Fisher Scoring iterations: 13
logistic regression age,occupation,education –>
model3 <- glm(income ~ age + occupation + education, data = clean_adult, family = binomial)
summary(model3)
##
## Call:
## glm(formula = income ~ age + occupation + education, family = binomial,
## data = clean_adult)
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -4.883248 0.162226 -30.101 < 2e-16 ***
## age 0.043744 0.001244 35.155 < 2e-16 ***
## occupationArmed-Forces 0.046730 1.119509 0.042 0.96670
## occupationCraft-repair 0.880111 0.064463 13.653 < 2e-16 ***
## occupationExec-managerial 1.381802 0.061124 22.607 < 2e-16 ***
## occupationFarming-fishing -0.087277 0.116397 -0.750 0.45336
## occupationHandlers-cleaners -0.260250 0.127232 -2.045 0.04081 *
## occupationMachine-op-inspct 0.346618 0.087538 3.960 7.51e-05 ***
## occupationOther-service -0.952888 0.104044 -9.159 < 2e-16 ***
## occupationPriv-house-serv -2.878147 1.010065 -2.849 0.00438 **
## occupationProf-specialty 0.782144 0.065519 11.938 < 2e-16 ***
## occupationProtective-serv 1.150886 0.101883 11.296 < 2e-16 ***
## occupationSales 0.827292 0.064493 12.828 < 2e-16 ***
## occupationTech-support 0.913039 0.091077 10.025 < 2e-16 ***
## occupationTransport-moving 0.791922 0.083560 9.477 < 2e-16 ***
## education11th 0.020604 0.197805 0.104 0.91704
## education12th 0.363583 0.246167 1.477 0.13968
## education1st-4th -0.678134 0.450170 -1.506 0.13197
## education5th-6th -0.514083 0.332777 -1.545 0.12239
## education7th-8th -0.516120 0.228077 -2.263 0.02364 *
## education9th -0.273016 0.253884 -1.075 0.28222
## educationAssoc-acdm 1.443970 0.161296 8.952 < 2e-16 ***
## educationAssoc-voc 1.450754 0.156588 9.265 < 2e-16 ***
## educationBachelors 2.030138 0.146431 13.864 < 2e-16 ***
## educationDoctorate 3.125040 0.191120 16.351 < 2e-16 ***
## educationHS-grad 0.918808 0.143684 6.395 1.61e-10 ***
## educationMasters 2.335259 0.153560 15.208 < 2e-16 ***
## educationPreschool -10.599256 73.002880 -0.145 0.88456
## educationProf-school 3.306028 0.178654 18.505 < 2e-16 ***
## educationSome-college 1.191885 0.145366 8.199 2.42e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 33851 on 30161 degrees of freedom
## Residual deviance: 27018 on 30132 degrees of freedom
## AIC: 27078
##
## Number of Fisher Scoring iterations: 12
Logistic regression with age, occupation, education, hours_per_week, marital_status, race, sex, work_class, capital_gain, and capital_loss:
model4 <- glm(income ~ age + occupation + education + hours_per_week + marital_status + race + sex + work_class + capital_gain + capital_loss, data = clean_adult, family = binomial)
## Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred
summary(model4)
##
## Call:
## glm(formula = income ~ age + occupation + education + hours_per_week +
## marital_status + race + sex + work_class + capital_gain +
## capital_loss, family = binomial, data = clean_adult)
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -6.538e+00 3.151e-01 -20.747 < 2e-16 ***
## age 2.588e-02 1.673e-03 15.472 < 2e-16 ***
## occupationArmed-Forces -1.166e+00 1.539e+00 -0.757 0.44881
## occupationCraft-repair -4.007e-04 7.913e-02 -0.005 0.99596
## occupationExec-managerial 7.692e-01 7.585e-02 10.142 < 2e-16 ***
## occupationFarming-fishing -1.072e+00 1.401e-01 -7.652 1.98e-14 ***
## occupationHandlers-cleaners -7.657e-01 1.436e-01 -5.333 9.65e-08 ***
## occupationMachine-op-inspct -3.210e-01 1.012e-01 -3.171 0.00152 **
## occupationOther-service -8.447e-01 1.170e-01 -7.217 5.31e-13 ***
## occupationPriv-house-serv -4.531e+00 1.740e+00 -2.605 0.00920 **
## occupationProf-specialty 4.948e-01 8.049e-02 6.148 7.86e-10 ***
## occupationProtective-serv 5.616e-01 1.254e-01 4.481 7.45e-06 ***
## occupationSales 2.383e-01 8.132e-02 2.931 0.00338 **
## occupationTech-support 6.271e-01 1.095e-01 5.726 1.03e-08 ***
## occupationTransport-moving -1.377e-01 9.892e-02 -1.392 0.16392
## education11th 5.680e-02 2.130e-01 0.267 0.78974
## education12th 4.173e-01 2.757e-01 1.514 0.13010
## education1st-4th -6.565e-01 4.776e-01 -1.375 0.16927
## education5th-6th -5.799e-01 3.480e-01 -1.666 0.09565 .
## education7th-8th -6.134e-01 2.426e-01 -2.528 0.01146 *
## education9th -2.993e-01 2.692e-01 -1.112 0.26617
## educationAssoc-acdm 1.315e+00 1.789e-01 7.347 2.03e-13 ***
## educationAssoc-voc 1.261e+00 1.721e-01 7.326 2.38e-13 ***
## educationBachelors 1.920e+00 1.601e-01 11.992 < 2e-16 ***
## educationDoctorate 2.913e+00 2.220e-01 13.122 < 2e-16 ***
## educationHS-grad 7.818e-01 1.558e-01 5.017 5.25e-07 ***
## educationMasters 2.257e+00 1.709e-01 13.204 < 2e-16 ***
## educationPreschool -2.084e+01 1.606e+02 -0.130 0.89672
## educationProf-school 2.842e+00 2.061e-01 13.786 < 2e-16 ***
## educationSome-college 1.108e+00 1.581e-01 7.007 2.44e-12 ***
## hours_per_week 2.929e-02 1.680e-03 17.432 < 2e-16 ***
## marital_statusMarried-AF-spouse 3.051e+00 5.044e-01 6.048 1.46e-09 ***
## marital_statusMarried-civ-spouse 2.164e+00 6.785e-02 31.892 < 2e-16 ***
## marital_statusMarried-spouse-absent 1.093e-02 2.349e-01 0.047 0.96290
## marital_statusNever-married -5.113e-01 8.386e-02 -6.097 1.08e-09 ***
## marital_statusSeparated -8.433e-02 1.616e-01 -0.522 0.60174
## marital_statusWidowed 1.393e-02 1.546e-01 0.090 0.92822
## raceAsian-Pac-Islander 4.732e-01 2.481e-01 1.907 0.05651 .
## raceBlack 5.021e-01 2.365e-01 2.123 0.03373 *
## raceOther -1.302e-01 3.714e-01 -0.351 0.72585
## raceWhite 6.270e-01 2.260e-01 2.774 0.00554 **
## sexMale 1.555e-01 5.376e-02 2.893 0.00382 **
## work_classLocal-gov -7.080e-01 1.118e-01 -6.333 2.41e-10 ***
## work_classPrivate -5.065e-01 9.296e-02 -5.448 5.09e-08 ***
## work_classSelf-emp-inc -3.322e-01 1.235e-01 -2.691 0.00713 **
## work_classSelf-emp-not-inc -9.975e-01 1.094e-01 -9.117 < 2e-16 ***
## work_classState-gov -8.342e-01 1.245e-01 -6.698 2.11e-11 ***
## work_classWithout-pay -1.318e+01 1.985e+02 -0.066 0.94706
## capital_gain 3.256e-04 1.064e-05 30.599 < 2e-16 ***
## capital_loss 6.476e-04 3.824e-05 16.933 < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 33851 on 30161 degrees of freedom
## Residual deviance: 19849 on 30112 degrees of freedom
## AIC: 19949
##
## Number of Fisher Scoring iterations: 13
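Because model3's predictors are a subset of model4's, the two fits can be compared with a likelihood-ratio test built from the deviances printed above. The sketch below uses only the rounded values from the two summaries, so the result is approximate. With the fitted objects in memory, `anova(model3, model4, test = "Chisq")` should give the same comparison.

```r
# Values copied from the summary() output above (rounded as printed)
null_df <- 30161
dev3 <- 27018; df3 <- 30132   # model3: age + occupation + education
dev4 <- 19849; df4 <- 30112   # model4: adds seven more predictors

# Likelihood-ratio statistic: the drop in residual deviance, with the
# difference in residual degrees of freedom as its df
lr_stat <- dev3 - dev4
lr_df   <- df3 - df4
p_value <- pchisq(lr_stat, df = lr_df, lower.tail = FALSE)

# For a 0/1 response, AIC = residual deviance + 2 * (number of coefficients)
k3 <- null_df - df3 + 1
k4 <- null_df - df4 + 1
aic3 <- dev3 + 2 * k3
aic4 <- dev4 + 2 * k4

cat("LR statistic:", lr_stat, "on", lr_df, "df; p-value:", format.pval(p_value), "\n")
cat("AIC model3:", aic3, " AIC model4:", aic4, "\n")
```

The AIC values rebuilt this way agree with the ones printed in the two summaries, which is a quick check that the numbers were copied correctly.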
Coefficient summary of the full model (all predictors except wgt):
summary(model)
##
## Call:
## glm(formula = income ~ . - wgt, family = binomial, data = clean_adult)
##
## Coefficients: (1 not defined because of singularities)
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -6.251e+00 7.634e-01 -8.189 2.63e-16
## age 2.510e-02 1.709e-03 14.689 < 2e-16
## work_classLocal-gov -6.954e-01 1.129e-01 -6.161 7.22e-10
## work_classPrivate -5.006e-01 9.369e-02 -5.343 9.12e-08
## work_classSelf-emp-inc -3.318e-01 1.237e-01 -2.681 0.007337
## work_classSelf-emp-not-inc -9.958e-01 1.099e-01 -9.057 < 2e-16
## work_classState-gov -8.190e-01 1.253e-01 -6.535 6.37e-11
## work_classWithout-pay -1.329e+01 1.970e+02 -0.067 0.946234
## education11th 8.585e-02 2.138e-01 0.402 0.688013
## education12th 4.361e-01 2.780e-01 1.568 0.116768
## education1st-4th -4.378e-01 4.963e-01 -0.882 0.377746
## education5th-6th -4.163e-01 3.592e-01 -1.159 0.246485
## education7th-8th -5.731e-01 2.434e-01 -2.354 0.018554
## education9th -2.457e-01 2.702e-01 -0.909 0.363237
## educationAssoc-acdm 1.264e+00 1.797e-01 7.035 2.00e-12
## educationAssoc-voc 1.255e+00 1.728e-01 7.259 3.90e-13
## educationBachelors 1.890e+00 1.607e-01 11.762 < 2e-16
## educationDoctorate 2.930e+00 2.230e-01 13.144 < 2e-16
## educationHS-grad 7.654e-01 1.563e-01 4.896 9.76e-07
## educationMasters 2.246e+00 1.718e-01 13.076 < 2e-16
## educationPreschool -2.008e+01 1.911e+02 -0.105 0.916328
## educationProf-school 2.832e+00 2.068e-01 13.694 < 2e-16
## educationSome-college 1.102e+00 1.586e-01 6.946 3.75e-12
## education_num NA NA NA NA
## marital_statusMarried-AF-spouse 2.778e+00 5.770e-01 4.814 1.48e-06
## marital_statusMarried-civ-spouse 2.108e+00 2.748e-01 7.672 1.70e-14
## marital_statusMarried-spouse-absent 2.929e-03 2.408e-01 0.012 0.990292
## marital_statusNever-married -4.834e-01 8.922e-02 -5.418 6.03e-08
## marital_statusSeparated -7.598e-02 1.654e-01 -0.459 0.645914
## marital_statusWidowed 1.818e-01 1.582e-01 1.150 0.250240
## occupationArmed-Forces -1.106e+00 1.515e+00 -0.730 0.465401
## occupationCraft-repair 6.031e-02 8.072e-02 0.747 0.454997
## occupationExec-managerial 8.016e-01 7.788e-02 10.293 < 2e-16
## occupationFarming-fishing -1.004e+00 1.408e-01 -7.129 1.01e-12
## occupationHandlers-cleaners -6.983e-01 1.447e-01 -4.826 1.39e-06
## occupationMachine-op-inspct -2.671e-01 1.026e-01 -2.602 0.009274
## occupationOther-service -8.359e-01 1.191e-01 -7.019 2.24e-12
## occupationPriv-house-serv -4.213e+00 1.727e+00 -2.439 0.014709
## occupationProf-specialty 5.150e-01 8.245e-02 6.247 4.19e-10
## occupationProtective-serv 5.999e-01 1.262e-01 4.753 2.00e-06
## occupationSales 2.914e-01 8.313e-02 3.505 0.000456
## occupationTech-support 6.615e-01 1.117e-01 5.923 3.16e-09
## occupationTransport-moving -9.265e-02 1.000e-01 -0.926 0.354209
## relationshipNot-in-family 4.605e-01 2.717e-01 1.695 0.090154
## relationshipOther-relative -3.920e-01 2.477e-01 -1.583 0.113524
## relationshipOwn-child -7.317e-01 2.708e-01 -2.703 0.006881
## relationshipUnmarried 3.394e-01 2.874e-01 1.181 0.237651
## relationshipWife 1.350e+00 1.056e-01 12.786 < 2e-16
## raceAsian-Pac-Islander 8.378e-01 2.858e-01 2.932 0.003369
## raceBlack 5.240e-01 2.398e-01 2.185 0.028869
## raceOther 1.593e-01 3.791e-01 0.420 0.674265
## raceWhite 6.337e-01 2.287e-01 2.770 0.005599
## sexMale 8.728e-01 8.084e-02 10.796 < 2e-16
## capital_gain 3.229e-04 1.074e-05 30.067 < 2e-16
## capital_loss 6.406e-04 3.840e-05 16.679 < 2e-16
## hours_per_week 2.931e-02 1.701e-03 17.229 < 2e-16
## native_countryCanada -8.387e-01 6.893e-01 -1.217 0.223734
## native_countryChina -1.918e+00 7.034e-01 -2.727 0.006387
## native_countryColumbia -3.285e+00 1.032e+00 -3.183 0.001457
## native_countryCuba -7.665e-01 7.031e-01 -1.090 0.275646
## native_countryDominican-Republic -2.950e+00 1.220e+00 -2.418 0.015627
## native_countryEcuador -1.427e+00 9.563e-01 -1.492 0.135628
## native_countryEl-Salvador -1.756e+00 7.939e-01 -2.212 0.026958
## native_countryEngland -8.930e-01 7.009e-01 -1.274 0.202589
## native_countryFrance -5.944e-01 8.128e-01 -0.731 0.464586
## native_countryGermany -7.193e-01 6.785e-01 -1.060 0.289141
## native_countryGreece -2.199e+00 8.385e-01 -2.623 0.008724
## native_countryGuatemala -1.382e+00 9.770e-01 -1.415 0.157206
## native_countryHaiti -1.225e+00 9.284e-01 -1.320 0.186888
## native_countryHoland-Netherlands -1.180e+01 8.827e+02 -0.013 0.989339
## native_countryHonduras -2.361e+00 2.685e+00 -0.879 0.379144
## native_countryHong -1.334e+00 8.990e-01 -1.484 0.137761
## native_countryHungary -1.306e+00 9.884e-01 -1.321 0.186544
## native_countryIndia -1.696e+00 6.689e-01 -2.536 0.011217
## native_countryIran -1.167e+00 7.583e-01 -1.539 0.123898
## native_countryIreland -6.878e-01 8.886e-01 -0.774 0.438911
## native_countryItaly -3.635e-01 7.098e-01 -0.512 0.608628
## native_countryJamaica -1.190e+00 7.712e-01 -1.543 0.122726
## native_countryJapan -9.626e-01 7.286e-01 -1.321 0.186428
## native_countryLaos -1.846e+00 1.041e+00 -1.773 0.076226
## native_countryMexico -1.610e+00 6.652e-01 -2.421 0.015493
## native_countryNicaragua -1.781e+00 1.015e+00 -1.756 0.079117
## native_countryOutlying-US(Guam-USVI-etc) -1.341e+01 2.110e+02 -0.064 0.949340
## native_countryPeru -1.952e+00 1.057e+00 -1.848 0.064620
## native_countryPhilippines -8.900e-01 6.446e-01 -1.381 0.167361
## native_countryPoland -1.187e+00 7.453e-01 -1.593 0.111173
## native_countryPortugal -1.189e+00 8.859e-01 -1.342 0.179602
## native_countryPuerto-Rico -1.471e+00 7.382e-01 -1.992 0.046345
## native_countryScotland -1.444e+00 1.084e+00 -1.332 0.182779
## native_countrySouth -2.470e+00 7.367e-01 -3.353 0.000800
## native_countryTaiwan -1.394e+00 7.547e-01 -1.847 0.064682
## native_countryThailand -1.840e+00 1.017e+00 -1.809 0.070439
## native_countryTrinadad&Tobago -1.626e+00 1.058e+00 -1.537 0.124348
## native_countryUnited-States -9.990e-01 6.307e-01 -1.584 0.113230
## native_countryVietnam -2.429e+00 8.467e-01 -2.869 0.004117
## native_countryYugoslavia -4.723e-01 9.190e-01 -0.514 0.607310
##
## (Intercept) ***
## age ***
## work_classLocal-gov ***
## work_classPrivate ***
## work_classSelf-emp-inc **
## work_classSelf-emp-not-inc ***
## work_classState-gov ***
## work_classWithout-pay
## education11th
## education12th
## education1st-4th
## education5th-6th
## education7th-8th *
## education9th
## educationAssoc-acdm ***
## educationAssoc-voc ***
## educationBachelors ***
## educationDoctorate ***
## educationHS-grad ***
## educationMasters ***
## educationPreschool
## educationProf-school ***
## educationSome-college ***
## education_num
## marital_statusMarried-AF-spouse ***
## marital_statusMarried-civ-spouse ***
## marital_statusMarried-spouse-absent
## marital_statusNever-married ***
## marital_statusSeparated
## marital_statusWidowed
## occupationArmed-Forces
## occupationCraft-repair
## occupationExec-managerial ***
## occupationFarming-fishing ***
## occupationHandlers-cleaners ***
## occupationMachine-op-inspct **
## occupationOther-service ***
## occupationPriv-house-serv *
## occupationProf-specialty ***
## occupationProtective-serv ***
## occupationSales ***
## occupationTech-support ***
## occupationTransport-moving
## relationshipNot-in-family .
## relationshipOther-relative
## relationshipOwn-child **
## relationshipUnmarried
## relationshipWife ***
## raceAsian-Pac-Islander **
## raceBlack *
## raceOther
## raceWhite **
## sexMale ***
## capital_gain ***
## capital_loss ***
## hours_per_week ***
## native_countryCanada
## native_countryChina **
## native_countryColumbia **
## native_countryCuba
## native_countryDominican-Republic *
## native_countryEcuador
## native_countryEl-Salvador *
## native_countryEngland
## native_countryFrance
## native_countryGermany
## native_countryGreece **
## native_countryGuatemala
## native_countryHaiti
## native_countryHoland-Netherlands
## native_countryHonduras
## native_countryHong
## native_countryHungary
## native_countryIndia *
## native_countryIran
## native_countryIreland
## native_countryItaly
## native_countryJamaica
## native_countryJapan
## native_countryLaos .
## native_countryMexico *
## native_countryNicaragua .
## native_countryOutlying-US(Guam-USVI-etc)
## native_countryPeru .
## native_countryPhilippines
## native_countryPoland
## native_countryPortugal
## native_countryPuerto-Rico *
## native_countryScotland
## native_countrySouth ***
## native_countryTaiwan .
## native_countryThailand .
## native_countryTrinadad&Tobago
## native_countryUnited-States
## native_countryVietnam **
## native_countryYugoslavia
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 33851 on 30161 degrees of freedom
## Residual deviance: 19504 on 30067 degrees of freedom
## AIC: 19694
##
## Number of Fisher Scoring iterations: 13
anova(model)
## Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred
## Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred
## Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred
## Analysis of Deviance Table
##
## Model: binomial, link: logit
##
## Response: income
##
## Terms added sequentially (first to last)
##
##
## Df Deviance Resid. Df Resid. Dev Pr(>Chi)
## NULL 30161 33851
## age 1 1738.5 30160 32112 < 2.2e-16 ***
## work_class 6 426.1 30154 31686 < 2.2e-16 ***
## education 15 3570.2 30139 28116 < 2.2e-16 ***
## education_num 0 0.0 30139 28116
## marital_status 6 5091.4 30133 23024 < 2.2e-16 ***
## occupation 13 765.5 30120 22259 < 2.2e-16 ***
## relationship 5 199.5 30115 22059 < 2.2e-16 ***
## race 4 21.3 30111 22038 0.0002802 ***
## sex 1 165.7 30110 21872 < 2.2e-16 ***
## capital_gain 1 1684.3 30109 20188 < 2.2e-16 ***
## capital_loss 1 294.9 30108 19893 < 2.2e-16 ***
## hours_per_week 1 307.6 30107 19586 < 2.2e-16 ***
## native_country 40 81.7 30067 19504 0.0001101 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
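The `NA` row for education_num in the coefficient table and its 0 degrees of freedom in the deviance table are what R reports when a predictor is an exact linear combination of others. Here education_num appears to be a numeric coding of education, which would explain it. A minimal sketch on simulated data (the variable names and values are made up for illustration):

```r
set.seed(42)
n <- 300

# A three-level factor and a numeric recoding of the same information
edu <- factor(sample(c("HS", "BA", "MA"), n, replace = TRUE),
              levels = c("HS", "BA", "MA"))
edu_num <- as.integer(edu)   # 1, 2, 3: carries nothing beyond edu
y <- rbinom(n, 1, plogis(-1.5 + 0.7 * edu_num))

fit <- glm(y ~ edu + edu_num, family = binomial)

coef(fit)    # edu_num is NA: it is aliased with the edu dummy variables
anova(fit)   # edu_num adds 0 Df and 0 deviance once edu is in the model
```

Dropping one of the two variables from the formula removes the singularity without changing the fitted values.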
Find the most influential variables with a random forest and a variable importance plot:
most_model <- randomForest(income ~ age + work_class + education + marital_status + occupation + relationship + race + sex + capital_gain + capital_loss + hours_per_week + native_country, data = clean_adult)
vip(most_model)
Model fit indices:
performance(model)
## # Indices of model performance
##
## AIC | AICc | BIC | Tjur's R2 | RMSE | Sigma | Log_loss | Score_log | Score_spherical | PCP
## --------------------------------------------------------------------------------------------------------------
## 19693.846 | 19694.452 | 20483.708 | 0.448 | 0.322 | 1.000 | 0.323 | -Inf | 4.671e-04 | 0.794
R² of the model (Tjur's R² for a logistic regression):
pacman::p_load(effectsize)
pacman::p_load(performance)
r2_value<- r2(model)$R2
r2_value
## Tjur's R2
## 0.4480447
Under Cohen's (1988) rules, this R² is interpreted as substantial:
interpret_r2(r2_value)
## Tjur's R2
## "substantial"
## (Rules: cohen1988)
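Tjur's R², the statistic that `r2()` reports above for this logistic model, is the difference between the mean fitted probability among the observed cases and the mean among the observed non-cases. A minimal sketch on simulated data, since clean_adult is not reproduced here:

```r
set.seed(7)
n <- 400

# Simulated stand-in for a binary outcome with one predictor
x <- rnorm(n)
y <- rbinom(n, 1, plogis(0.3 + 1.2 * x))

fit   <- glm(y ~ x, family = binomial)
p_hat <- fitted(fit)

# Tjur's R2: mean fitted probability for y == 1 minus that for y == 0
tjur_r2 <- mean(p_hat[y == 1]) - mean(p_hat[y == 0])
tjur_r2
```

A value near 0 means the model barely separates the two groups; a value near 1 means the fitted probabilities sit close to 0 and 1 in the right groups.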
ROC curve and AUC for the model's fitted probabilities:
roc_curve <- roc(income ~ fitted.values(model), data=clean_adult,
plot = TRUE, legacy.axes = TRUE,
print.auc = TRUE, ci = TRUE)
## Setting levels: control = <=50K, case = >50K
## Setting direction: controls < cases
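The AUC printed on the ROC plot can be read as the probability that a randomly chosen > 50K respondent receives a higher fitted probability than a randomly chosen <= 50K respondent. The sketch below computes that quantity two ways on simulated data. It illustrates the definition and is not a replacement for `roc()`.

```r
set.seed(11)
n <- 400

# Simulated stand-in for a binary outcome with one predictor
x <- rnorm(n)
y <- rbinom(n, 1, plogis(-0.5 + 1.5 * x))
fit   <- glm(y ~ x, family = binomial)
p_hat <- fitted(fit)

n1 <- sum(y == 1)
n0 <- sum(y == 0)

# Mann-Whitney form of the AUC: rank sum of the cases, rescaled to [0, 1]
auc_rank <- (sum(rank(p_hat)[y == 1]) - n1 * (n1 + 1) / 2) / (n1 * n0)

# Same quantity by brute force over all case/control pairs (ties count 1/2)
cmp <- outer(p_hat[y == 1], p_hat[y == 0], "-")
auc_pairs <- mean((cmp > 0) + 0.5 * (cmp == 0))

c(rank_based = auc_rank, pairwise = auc_pairs)
```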
Predicted-probability plots for selected predictors:
# using [all] gets smooth plots
prediction_relation<-ggpredict(model, terms="relationship")
prediction_age<-ggpredict(model, terms="age [all]")
# Plot each term individually
plot_relation<-plot(prediction_relation) + theme(axis.text.x = element_text(angle = 90, hjust = 1))
plot_age <-plot(prediction_age)
# Combine plots into a single figure
plot_grid(plot_relation, plot_age)
prediction_education<-ggpredict(model, terms="education [all]")
prediction_marital_status<-ggpredict(model, terms="marital_status [all]")
plot_education <- plot(prediction_education) + theme(axis.text.x = element_text(angle = 90, hjust = 1))
plot_marital_status <- plot(prediction_marital_status) + theme(axis.text.x = element_text(angle = 90, hjust = 1))
plot_grid(plot_education, plot_marital_status)
prediction_occupation<-ggpredict(model, terms="occupation [all]")
plot_occupation <- plot(prediction_occupation) + theme(axis.text.x = element_text(angle = 90, hjust = 1))
plot_grid(plot_occupation)
McFadden's pseudo-R² for the same logistic regression, first with pscl::pR2() and then computed by hand from the log-likelihoods of the full and null models:
pacman::p_load(pscl)
pR2(model)
## fitting null model for pseudo-r2
## llh llhNull G2 McFadden r2ML
## -9.751923e+03 -1.692535e+04 1.434686e+04 4.238275e-01 3.785254e-01
## r2CU
## 5.612201e-01
model <- glm(income ~ . - wgt, data = clean_adult, family = binomial)
## Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred
null_model <- glm(income ~ 1, data = clean_adult, family = binomial)
pseudoR2 <- 1 - as.numeric(logLik(model)) / as.numeric(logLik(null_model))
pseudoR2
## [1] 0.4238275
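As a check, McFadden's value can be reproduced from the two log-likelihoods that `pR2()` prints, and the same log-likelihoods tie back to the deviances in the model summary (for a 0/1 response, deviance = -2 × log-likelihood). The numbers below are copied from the output above.

```r
llh      <- -9751.923   # log-likelihood of the full model (pR2 output)
llh_null <- -16925.35   # log-likelihood of the intercept-only model

# McFadden's pseudo-R2: 1 minus the ratio of the two log-likelihoods
mcfadden <- 1 - llh / llh_null

# Likelihood-ratio statistic, reported as G2 by pR2()
g2 <- 2 * (llh - llh_null)

c(McFadden = mcfadden, G2 = g2,
  resid_deviance = -2 * llh, null_deviance = -2 * llh_null)
```

The two deviances recovered this way round to the residual and null deviances shown in the summary of the full model.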
Pick a significant variable and use either effects plots or odds ratios to interpret the coefficient.
Effects analysis and plots:
pacman::p_load(effects)
adult_higher_ed <- clean_adult |>
filter (education %in% c("Bachelors", "Masters", "Doctorate"))
model_higher_ed <- glm(income ~ education, data = adult_higher_ed, family = binomial)
higher_ed_effect <- effect("education", model_higher_ed, type = "response")
print(higher_ed_effect)
##
## education effect
## education
## Bachelors Doctorate Masters
## 0.4214909 0.7466667 0.5642286
summary(model_higher_ed)
##
## Call:
## glm(formula = income ~ education, family = binomial, data = adult_higher_ed)
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -0.31666 0.02851 -11.11 <2e-16 ***
## educationDoctorate 1.39757 0.12211 11.45 <2e-16 ***
## educationMasters 0.57500 0.05756 9.99 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 9745.3 on 7045 degrees of freedom
## Residual deviance: 9520.6 on 7043 degrees of freedom
## AIC: 9526.6
##
## Number of Fisher Scoring iterations: 4
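Relative to a bachelor's degree (the reference level in this model), a doctorate multiplies the odds of an income > 50k by exp(1.398) ≈ 4.05, an increase of about 305% in the odds. The predicted probability rises from 0.42 to 0.75.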
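The coefficients in this summary are on the log-odds scale, so exponentiating them gives odds ratios relative to the reference level (Bachelors). The sketch below does this with the estimates printed above and cross-checks the result against the probabilities from the effect output.

```r
# Estimates copied from summary(model_higher_ed)
coefs <- c(Doctorate = 1.39757, Masters = 0.57500)
odds_ratios <- exp(coefs)   # odds relative to Bachelors

# Probabilities copied from print(higher_ed_effect)
p <- c(Bachelors = 0.4214909, Doctorate = 0.7466667, Masters = 0.5642286)
odds <- p / (1 - p)

# The ratio of odds should match the exponentiated coefficient
or_from_probs <- odds[["Doctorate"]] / odds[["Bachelors"]]

round(odds_ratios, 2)
round(or_from_probs, 2)
```

An odds ratio is not a ratio of probabilities: the odds go up by a factor of about 4, while the probability goes from 0.42 to 0.75, which is less than double.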
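# NOTE: "Married-civ_spouse" (underscore) matches no level; the level is spelled
# "Married-civ-spouse", so only the two other groups appear in the output below.
adult_married_status <- clean_adult |>
  filter(marital_status %in% c("Married-AF-spouse", "Married-civ_spouse", "Married-spouse-absent"))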
model_married_status <- glm(income ~ marital_status, data = adult_married_status, family = binomial)
married_effect <- effect("marital_status", model_married_status, type = "response")
print(married_effect)
##
## marital_status effect
## marital_status
## Married-AF-spouse Married-spouse-absent
## 0.47619048 0.08378378
summary(model_married_status)
##
## Call:
## glm(formula = income ~ marital_status, family = binomial, data = adult_married_status)
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -0.09531 0.43693 -0.218 0.827
## marital_statusMarried-spouse-absent -2.29670 0.47552 -4.830 1.37e-06 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 262.46 on 390 degrees of freedom
## Residual deviance: 242.12 on 389 degrees of freedom
## AIC: 246.12
##
## Number of Fisher Scoring iterations: 5
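Relative to Married-AF-spouse, the only other group left after the filter, having an absent spouse multiplies the odds of an income > 50k by exp(-2.297) ≈ 0.10, a decrease of about 90% in the odds. The predicted probability drops from 0.48 to 0.08.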
model_race <- glm(income ~ race, data = clean_adult, family = binomial)
race_effect <- effect("race", model_race, type = "response")
print(race_effect)
##
## race effect
## race
## Amer-Indian-Eskimo Asian-Pac-Islander Black Other
## 0.1188811 0.2770950 0.1299255 0.0909091
## White
## 0.2637180
race_model <- glm(formula = income ~ race, family = binomial, data = clean_adult)
summary(race_model)
##
## Call:
## glm(formula = income ~ race, family = binomial, data = clean_adult)
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -2.0031 0.1827 -10.964 < 2e-16 ***
## raceAsian-Pac-Islander 1.0442 0.1974 5.290 1.22e-07 ***
## raceBlack 0.1015 0.1911 0.531 0.596
## raceOther -0.2995 0.2928 -1.023 0.306
## raceWhite 0.9763 0.1832 5.328 9.92e-08 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 33851 on 30161 degrees of freedom
## Residual deviance: 33504 on 30157 degrees of freedom
## AIC: 33514
##
## Number of Fisher Scoring iterations: 4
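Relative to the reference level (Amer-Indian-Eskimo), being Asian or Pacific Islander multiplies the odds of an income > 50k by exp(1.044) ≈ 2.84, an increase of about 184% in the odds. The predicted probability rises from 0.12 to 0.28.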
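An odds ratio is easier to judge with an interval around it. A rough sketch of 95% Wald intervals, using the estimates and standard errors from the race model summary above. With the fitted object available, `exp(confint.default(race_model))` should give the same kind of interval.

```r
# Estimates and standard errors copied from summary(race_model)
est <- c("Asian-Pac-Islander" = 1.0442, Black = 0.1015,
         Other = -0.2995, White = 0.9763)
se  <- c("Asian-Pac-Islander" = 0.1974, Black = 0.1911,
         Other = 0.2928, White = 0.1832)

# Wald interval on the log-odds scale, then exponentiated
z <- qnorm(0.975)
or_table <- data.frame(
  odds_ratio = exp(est),
  lower      = exp(est - z * se),
  upper      = exp(est + z * se)
)

round(or_table, 2)
```

Intervals that contain 1 correspond to the levels without significance stars in the summary (Black and Other), since an odds ratio of 1 means no difference from the reference level.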
model_age <- glm(income ~ age, data = clean_adult, family = binomial)
age_effect <- effect("age", model_age, type = "response")
print(age_effect)
##
## age effect
## age
## 20 40 50 70 90
## 0.1237647 0.2478050 0.3347221 0.5399135 0.7324117
summary(model_age)
##
## Call:
## glm(formula = income ~ age, family = binomial, data = clean_adult)
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -2.804151 0.045583 -61.52 <2e-16 ***
## age 0.042345 0.001044 40.58 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 33851 on 30161 degrees of freedom
## Residual deviance: 32112 on 30160 degrees of freedom
## AIC: 32116
##
## Number of Fisher Scoring iterations: 4
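With a single numeric predictor, the fitted probabilities follow directly from the two coefficients through the inverse logit. The sketch below plugs in the estimates printed above and should reproduce the age effect values up to rounding.

```r
# Estimates copied from summary(model_age)
b0 <- -2.804151   # intercept (log-odds at age 0)
b1 <- 0.042345    # change in log-odds per year of age

ages  <- c(20, 40, 50, 70, 90)
p_hat <- plogis(b0 + b1 * ages)   # inverse logit: 1 / (1 + exp(-eta))
setNames(round(p_hat, 4), ages)

# Odds ratios for one extra year and for ten extra years of age
or_1yr  <- exp(b1)
or_10yr <- exp(10 * b1)
round(c(one_year = or_1yr, ten_years = or_10yr), 3)
```

The per-year odds ratio is constant, but the change in probability per year is not: it is largest where the fitted probability is near 0.5.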
plot(age_effect, main = "Effect of Age on Income >50k",
xlab = "Age",
ylab = "Prediction Probability")
plot(higher_ed_effect, main = "Effect of Higher Education on Income >50k",
xlab = "Higher Ed Level",
ylab = "Prediction Probability")
model_occupation <- glm(income ~ occupation, data = clean_adult, family = binomial)
occupation_effect <- effect("occupation", model_occupation, type = "response")
print(occupation_effect)
##
## occupation effect
## occupation
## Adm-clerical Armed-Forces Craft-repair Exec-managerial
## 0.133834991 0.111111111 0.225310174 0.485220441
## Farming-fishing Handlers-cleaners Machine-op-inspct Other-service
## 0.116279070 0.061481481 0.124618515 0.041095890
## Priv-house-serv Prof-specialty Protective-serv Sales
## 0.006993007 0.448489351 0.326086957 0.270647321
## Tech-support Transport-moving
## 0.304824561 0.202926209
summary(model_occupation)
##
## Call:
## glm(formula = income ~ occupation, family = binomial, data = clean_adult)
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -1.86747 0.04815 -38.785 < 2e-16 ***
## occupationArmed-Forces -0.21197 1.06175 -0.200 0.84176
## occupationCraft-repair 0.63248 0.06115 10.342 < 2e-16 ***
## occupationExec-managerial 1.80833 0.05763 31.378 < 2e-16 ***
## occupationFarming-fishing -0.16068 0.11026 -1.457 0.14505
## occupationHandlers-cleaners -0.85810 0.12311 -6.970 3.16e-12 ***
## occupationMachine-op-inspct -0.08193 0.08355 -0.981 0.32677
## occupationOther-service -1.28242 0.10109 -12.686 < 2e-16 ***
## occupationPriv-house-serv -3.08836 1.00456 -3.074 0.00211 **
## occupationProf-specialty 1.66069 0.05762 28.824 < 2e-16 ***
## occupationProtective-serv 1.14153 0.09687 11.784 < 2e-16 ***
## occupationSales 0.87613 0.06109 14.342 < 2e-16 ***
## occupationTech-support 1.04304 0.08656 12.050 < 2e-16 ***
## occupationTransport-moving 0.49936 0.07906 6.316 2.69e-10 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 33851 on 30161 degrees of freedom
## Residual deviance: 29954 on 30148 degrees of freedom
## AIC: 29982
##
## Number of Fisher Scoring iterations: 7
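Relative to the reference level (Adm-clerical), an executive or managerial occupation multiplies the odds of an income > 50k by exp(1.808) ≈ 6.10, an increase of about 510% in the odds. The predicted probability rises from 0.13 to 0.49.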
model_occupation <- glm(income ~ occupation, data = clean_adult, family = binomial)
occupation_effect <- effect("occupation", model_occupation, type = "response")
occupation_effectdf <- as.data.frame(occupation_effect)
ggplot(occupation_effectdf, aes(x = fit, y = reorder(occupation, fit))) +
  geom_point(size = 3, color = "orange") +
  geom_errorbarh(aes(xmin = lower, xmax = upper), height = 0.2, color = "blue") +
  labs(title = "Effect of Occupation on Income >50k", x = "Predicted Probability", y = "Occupation") +
  theme_minimal(base_size = 10)
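The interval for Armed-Forces is by far the widest. Its standard error in the summary above (1.06) points to very few Armed-Forces respondents in the data; the set pay ranks in the military may also contribute.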