ParkerDMCH5HW

Introduction

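This analysis examines parsimony and false discovery rates using the DirectMarketing.csv data set. It fits a linear model of AmountSpent on Catalogs and Salary, then applies a false discovery rate (FDR) correction to the model's coefficient p-values, drawing on other topics covered in Chapter 5.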

library(tidyverse)

## -- Attaching packages ------------------------------------------------------ tidyverse 1.3.0 --

## v ggplot2 3.3.2     v purrr   0.3.4
## v tibble  3.0.3     v dplyr   1.0.2
## v tidyr   1.1.2     v stringr 1.4.0
## v readr   1.3.1     v forcats 0.5.0

## -- Conflicts --------------------------------------------------------- tidyverse_conflicts() --
## x dplyr::filter() masks stats::filter()
## x dplyr::lag()    masks stats::lag()

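library(pander)   # formatted tables for the knitted report
# read_csv() comes from readr and dplyr verbs from dplyr, both attached by tidyverse above
DirectMarketing <- read_csv("DirectMarketing.csv")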

## Parsed with column specification:
## cols(
##   Age = col_character(),
##   Gender = col_character(),
##   OwnHome = col_character(),
##   Married = col_character(),
##   Location = col_character(),
##   Salary = col_double(),
##   Children = col_double(),
##   History = col_character(),
##   Catalogs = col_double(),
##   AmountSpent = col_double()
## )

head(DirectMarketing)

## # A tibble: 6 x 10
##   Age   Gender OwnHome Married Location Salary Children History Catalogs
##   <chr> <chr>  <chr>   <chr>   <chr>     <dbl>    <dbl> <chr>      <dbl>
## 1 Old   Female Own     Single  Far       47500        0 High           6
## 2 Midd~ Male   Rent    Single  Close     63600        0 High           6
## 3 Young Female Rent    Single  Close     13500        0 Low           18
## 4 Midd~ Male   Own     Married Close     85600        1 High          18
## 5 Midd~ Female Own     Single  Close     68400        0 High          12
## 6 Young Male   Own     Married Close     30400        0 Low            6
## # ... with 1 more variable: AmountSpent <dbl>

summary(DirectMarketing)

##      Age               Gender            OwnHome            Married         
##  Length:1000        Length:1000        Length:1000        Length:1000       
##  Class :character   Class :character   Class :character   Class :character  
##  Mode  :character   Mode  :character   Mode  :character   Mode  :character  
##                                                                             
##                                                                             
##                                                                             
##    Location             Salary          Children       History         
##  Length:1000        Min.   : 10100   Min.   :0.000   Length:1000       
##  Class :character   1st Qu.: 29975   1st Qu.:0.000   Class :character  
##  Mode  :character   Median : 53700   Median :1.000   Mode  :character  
##                     Mean   : 56104   Mean   :0.934                     
##                     3rd Qu.: 77025   3rd Qu.:2.000                     
##                     Max.   :168800   Max.   :3.000                     
##     Catalogs      AmountSpent    
##  Min.   : 6.00   Min.   :  38.0  
##  1st Qu.: 6.00   1st Qu.: 488.2  
##  Median :12.00   Median : 962.0  
##  Mean   :14.68   Mean   :1216.8  
##  3rd Qu.:18.00   3rd Qu.:1688.5  
##  Max.   :24.00   Max.   :6217.0
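Because Age, Gender, OwnHome, Married, Location and History are read as character columns, `summary()` reports only their length and class rather than level counts. A minimal sketch, assuming dplyr 1.0.0 or later for `across()` and `where()` (this session loads dplyr 1.0.2), converts every character column to a factor so that `summary()` tabulates the levels. The helper `to_factors()` and the small `example_df` tibble are hypothetical illustrations, not rows from DirectMarketing.csv; on the real data, call `to_factors(DirectMarketing)` instead.

```r
library(dplyr)

# Hypothetical helper: convert every character column to a factor
# so that summary() reports level counts instead of "Class :character"
to_factors <- function(df) {
  df %>% mutate(across(where(is.character), as.factor))
}

# Hypothetical stand-in rows (not taken from DirectMarketing.csv)
example_df <- tibble(
  Gender  = c("Female", "Male", "Female"),
  OwnHome = c("Own", "Rent", "Own"),
  Salary  = c(40000, 20000, 30000)
)

example_factors <- to_factors(example_df)
summary(example_factors)
```

Numeric columns such as Salary pass through unchanged, since `where(is.character)` selects only the character columns.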

# Model 1: amount spent regressed on catalogs received and salary
m1 <- lm(formula = AmountSpent ~ Catalogs + Salary, data = DirectMarketing)
summary(m1)

## 
## Call:
## lm(formula = AmountSpent ~ Catalogs + Salary, data = DirectMarketing)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -1761.3  -327.9    14.6   270.6  3387.8 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept) -6.591e+02  5.368e+01  -12.28   <2e-16 ***
## Catalogs     5.170e+01  2.912e+00   17.75   <2e-16 ***
## Salary       1.991e-02  6.299e-04   31.61   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 599.2 on 997 degrees of freedom
## Multiple R-squared:  0.6121, Adjusted R-squared:  0.6113 
## F-statistic: 786.5 on 2 and 997 DF,  p-value: < 2.2e-16
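Parsimony asks whether each added predictor earns its place. One way to check this for `m1`, offered as an illustrative sketch rather than the procedure used in this analysis, is to compare a reduced model (Salary only) against the full model (Catalogs + Salary) with a partial F-test from `anova()` and with `AIC()`. The helper `compare_nested()` is hypothetical, and the `sim` data are simulated from arbitrary coefficients loosely on the scale of `m1` purely so the block runs on its own; on the real data, call `compare_nested(DirectMarketing)`.

```r
# Hypothetical helper: reduced model (Salary only) vs full model (Catalogs + Salary)
compare_nested <- function(data) {
  reduced <- lm(AmountSpent ~ Salary, data = data)
  full    <- lm(AmountSpent ~ Catalogs + Salary, data = data)
  list(
    f_test = anova(reduced, full),  # partial F-test: does adding Catalogs reduce RSS enough?
    aic    = AIC(reduced, full)     # lower AIC indicates a better fit/complexity trade-off
  )
}

# Simulated stand-in data (arbitrary coefficients, not estimates from DirectMarketing)
set.seed(1)
n <- 200
sim <- data.frame(
  Catalogs = sample(c(6, 12, 18, 24), n, replace = TRUE),
  Salary   = runif(n, 10000, 170000)
)
sim$AmountSpent <- -600 + 50 * sim$Catalogs + 0.02 * sim$Salary + rnorm(n, sd = 600)

result <- compare_nested(sim)
result$f_test
result$aic
```

If the F-test p-value is small and the full model has the lower AIC, the extra predictor is justified; otherwise the simpler model is the more parsimonious choice.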

summary(m1)$coefficients

##                  Estimate   Std. Error   t value      Pr(>|t|)
## (Intercept) -659.14813353 5.367602e+01 -12.28012  2.219934e-32
## Catalogs      51.69516193 2.911923e+00  17.75293  1.749434e-61
## Salary         0.01990824 6.299047e-04  31.60516 1.924160e-152

# Extract the p-value column, Pr(>|t|), for each coefficient
m1.pvals <- summary(m1)$coefficients[, 4]

# Compare raw p-values with FDR-adjusted p-values and flag those below .05
p1.examples <-
  tibble(p1 = m1.pvals) %>%
  mutate(p1.fdr = p.adjust(p1, method = "fdr"),
         p1.sig = ifelse(p1 < .05, "*", ""),
         p1.fdr.sig = ifelse(p1.fdr < .05, "*", ""))

p1.examples %>%
  arrange(p1) %>%
  pander(caption = "Coefficient p values for m1, with and without FDR correction applied.")

Coefficient p values for m1, with and without FDR correction applied.
p1	p1.fdr	p1.sig	p1.fdr.sig
1.924e-152	5.772e-152	*	*
1.749e-61	2.624e-61	*	*
2.22e-32	2.22e-32	*	*
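In R, `p.adjust(method = "fdr")` is an alias for the Benjamini-Hochberg procedure: with m tests, the i-th smallest p-value is multiplied by m/i, and a running minimum taken from the largest p-value downward keeps the adjusted values monotone (capped at 1). The sketch below writes that rule out by hand; the helper name `bh_adjust()` is hypothetical, and the p-values are copied from `summary(m1)$coefficients` above. With m = 3, the smallest p-value (Salary) is tripled, the middle one (Catalogs) is multiplied by 1.5, and the largest (Intercept) is unchanged, which reproduces the p1.fdr column.

```r
# Hypothetical helper: Benjamini-Hochberg adjustment written out step by step
bh_adjust <- function(p) {
  m <- length(p)
  o <- order(p, decreasing = TRUE)          # indices from largest to smallest p-value
  ranks <- m:1                              # rank of each sorted p-value (largest = m)
  adj <- pmin(1, cummin(m / ranks * p[o]))  # scale by m / rank, enforce monotonicity
  setNames(adj[order(o)], names(p))         # restore original order and names
}

# Coefficient p-values copied from summary(m1)$coefficients above
p1 <- c(`(Intercept)` = 2.219934e-32, Catalogs = 1.749434e-61, Salary = 1.924160e-152)

bh_adjust(p1)
p.adjust(p1, method = "fdr")
```

Because all three raw p-values are far below .05, the correction cannot change any significance call for this model; it matters when many coefficients or tests sit near the threshold.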


Nickolas Parker

9/23/2020


Conclusion