Discussion 12

Using R, build a multiple regression model for data that interests you. Include in this model at least one quadratic term, one dichotomous term, and one dichotomous vs. quantitative interaction term. Interpret all coefficients. Conduct residual analysis. Was the linear model appropriate? Why or why not?

rm(list=ls())
library(ggplot2)
clothing.dt<-read.csv('https://raw.githubusercontent.com/VioletaStoyanova/Data605/master/Clothing.csv', header=TRUE)
head(clothing.dt)
##     tsale     sales margin nown  nfull  npart   naux hoursw  hourspw
## 1  750000  4411.765     41    1 1.0000 1.0000 1.5357     76 16.75596
## 2 1926395  4280.878     39    2 2.0000 3.0000 1.5357    192 22.49376
## 3 1250000  4166.667     40    1 2.0000 2.2222 1.4091    114 17.19120
## 4  694227  2670.104     40    1 1.0000 1.2833 1.3673    100 21.50260
## 5  750000 15000.000     44    2 1.9556 1.2833 1.3673    104 15.74279
## 6  400000  4444.444     41    2 1.9556 1.2833 1.3673     72 10.89885
##        inv1     inv2 ssize start
## 1  17166.67 27177.04   170  1984
## 2  17166.67 27177.04   450  1972
## 3 292857.20 71570.55   300  1952
## 4  22207.04 15000.00   260  1966
## 5  22207.04 10000.00    50  1996
## 6  22207.04 22859.85    90  1947
summary(clothing.dt)
##      tsale             sales           margin           nown       
##  Min.   :  50000   Min.   :  300   Min.   :16.00   Min.   : 1.000  
##  1st Qu.: 495340   1st Qu.: 3904   1st Qu.:37.00   1st Qu.: 1.000  
##  Median : 694227   Median : 5279   Median :39.00   Median : 1.000  
##  Mean   : 833584   Mean   : 6335   Mean   :38.77   Mean   : 1.284  
##  3rd Qu.: 976817   3rd Qu.: 7740   3rd Qu.:41.00   3rd Qu.: 1.295  
##  Max.   :5000000   Max.   :27000   Max.   :66.00   Max.   :10.000  
##      nfull           npart            naux           hoursw     
##  Min.   :1.000   Min.   :1.000   Min.   :1.000   Min.   : 32.0  
##  1st Qu.:1.923   1st Qu.:1.283   1st Qu.:1.333   1st Qu.: 80.0  
##  Median :1.956   Median :1.283   Median :1.367   Median :104.0  
##  Mean   :2.069   Mean   :1.566   Mean   :1.390   Mean   :121.1  
##  3rd Qu.:2.066   3rd Qu.:2.000   3rd Qu.:1.367   3rd Qu.:145.2  
##  Max.   :8.000   Max.   :9.000   Max.   :4.000   Max.   :582.0  
##     hourspw            inv1              inv2            ssize       
##  Min.   : 5.708   Min.   :   1000   Min.   :   350   Min.   :  16.0  
##  1st Qu.:13.541   1st Qu.:  20000   1st Qu.: 10000   1st Qu.:  80.0  
##  Median :17.745   Median :  22207   Median : 22860   Median : 120.0  
##  Mean   :18.955   Mean   :  58257   Mean   : 27829   Mean   : 151.1  
##  3rd Qu.:24.303   3rd Qu.:  62269   3rd Qu.: 22860   3rd Qu.: 190.0  
##  Max.   :43.326   Max.   :1500000   Max.   :400000   Max.   :1214.0  
##      start     
##  Min.   :1945  
##  1st Qu.:1959  
##  Median :1978  
##  Mean   :1978  
##  3rd Qu.:1996  
##  Max.   :2015
pairs(clothing.dt,gap=0.5)

clothing.lm <- lm(sales ~ tsale+ margin+ nown+ nfull + npart+ naux+ hoursw + hourspw +inv1 +inv2+ssize+start, data =clothing.dt)
summary(clothing.lm)
## 
## Call:
## lm(formula = sales ~ tsale + margin + nown + nfull + npart + 
##     naux + hoursw + hourspw + inv1 + inv2 + ssize + start, data = clothing.dt)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -7684.6 -1149.0  -560.0   571.4 14962.2 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept) -1.395e+04  1.076e+04  -1.297   0.1956    
## tsale        5.272e-03  3.212e-04  16.412   <2e-16 ***
## margin       5.118e+01  2.294e+01   2.231   0.0263 *  
## nown         1.872e+02  2.882e+02   0.650   0.5164    
## nfull       -2.431e+02  2.451e+02  -0.992   0.3219    
## npart       -2.540e+02  2.185e+02  -1.162   0.2459    
## naux        -2.249e+02  3.522e+02  -0.639   0.5234    
## hoursw       1.587e+01  8.266e+00   1.919   0.0557 .  
## hourspw     -6.933e+01  5.661e+01  -1.225   0.2215    
## inv1         9.753e-04  1.194e-03   0.817   0.4145    
## inv2        -2.890e-03  3.059e-03  -0.945   0.3454    
## ssize       -2.673e+01  1.312e+00 -20.378   <2e-16 ***
## start        9.270e+00  5.390e+00   1.720   0.0863 .  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 2207 on 387 degrees of freedom
## Multiple R-squared:  0.6622, Adjusted R-squared:  0.6517 
## F-statistic: 63.22 on 12 and 387 DF,  p-value: < 2.2e-16
par(mfrow=c(2,2))
hist(clothing.lm$residuals, main = "Histogram of Residuals", xlab= "")
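# Residuals vs. fitted: fitted values go on the x-axis, residuals on the y-axis
plot(fitted(clothing.lm), clothing.lm$residuals, xlab = "Fitted values", ylab = "Residuals", main = "Residuals vs. Fitted")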
qqnorm(clothing.lm$residuals)
qqline(clothing.lm$residuals)
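
The plots above can be complemented with a numeric check. The following is a minimal sketch, not part of the original analysis: it assumes a fitted `lm` object and uses base R's `shapiro.test` to test the residuals for normality. The helper name `check_residuals` and the simulated data are hypothetical and serve only as illustration.

```r
# Hypothetical helper: summarize residual normality and skew for a fitted lm.
check_residuals <- function(fit) {
  res <- residuals(fit)
  sw  <- shapiro.test(res)   # Shapiro-Wilk test; H0: residuals are normal
  list(
    shapiro_p = sw$p.value,
    # Sample skewness; values far from 0 indicate an asymmetric distribution
    skewness  = mean((res - mean(res))^3) / sd(res)^3
  )
}

# Simulated data (an assumption, NOT the clothing data): right-skewed errors
set.seed(1)
x <- runif(200)
y <- 2 + 3 * x + rexp(200)   # exponential noise is strongly right-skewed
demo.lm <- lm(y ~ x)
check_residuals(demo.lm)
```

On the model above, the call would be `check_residuals(clothing.lm)`; a small p-value would be evidence against normally distributed residuals.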

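Keeping only the predictors that are significant at the 5% level (tsale, margin, and ssize), with coefficients taken from the full fit above, the equation is: sales^ = -1.395e+04 + 5.272e-03*tsale + 5.118e+01*margin - 2.673e+01*ssize. Holding the other predictors fixed, a one-unit increase in tsale is associated with an increase of about 0.0053 in sales, a one-unit increase in margin with an increase of about 51.18, and a one-unit increase in ssize with a decrease of about 26.73.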

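R-squared/Adjusted R^2: values of 0.6622 and 0.6517 respectively, which means that the model explains about 66% of the variance in sales (about 65% after adjusting for the number of predictors). F-statistic: value of 63.22 on 12 and 387 degrees of freedom with a small p-value < 2.2e-16, so the predictors are jointly significant.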
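
As a quick consistency check, the adjusted R-squared and the F-statistic can be recomputed from the reported R-squared and degrees of freedom. This is a small sketch using only the numbers printed in the summary output; the variable names are illustrative.

```r
# Values copied from the summary(clothing.lm) output
r2     <- 0.6622          # Multiple R-squared
p      <- 12              # number of predictors (numerator DF of the F-statistic)
df_res <- 387             # residual degrees of freedom
n      <- df_res + p + 1  # number of observations

# Adjusted R-squared penalizes R-squared for the number of predictors
adj_r2 <- 1 - (1 - r2) * (n - 1) / df_res

# Overall F-statistic expressed through R-squared
f_stat <- (r2 / p) / ((1 - r2) / df_res)

round(adj_r2, 4)  # 0.6517
round(f_stat, 2)  # about 63.22; small differences come from the rounded r2
```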

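The residuals follow the Q-Q line only loosely, so we cannot conclude that they are normally distributed. The residual summary also points to right skew: the median is -560.0, while the maximum (14962.2) is about twice the size of the minimum (-7684.6).
For these reasons, I don't think that the multiple regression model is appropriate in this case.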
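
The prompt also asks for a quadratic term, a dichotomous term, and a dichotomous vs. quantitative interaction term, none of which appear in the model above. The following is a minimal sketch of how such terms could be written in R's formula syntax. Several pieces are assumptions for illustration only: the helper name `fit_extended`, the dummy variable `newer` (stores that opened in 1978, the median start year, or later), the choice of `ssize` for the quadratic term, and the simulated stand-in data.

```r
# Hypothetical helper: refit with a quadratic, a dichotomous, and an interaction term.
fit_extended <- function(dt, cutoff = 1978) {
  # Dichotomous term (assumption): 1 if the store opened in `cutoff` or later
  dt$newer <- as.integer(dt$start >= cutoff)
  lm(sales ~ tsale + margin + ssize +
       I(ssize^2) +   # quadratic term in store size
       newer +        # dichotomous term
       newer:tsale,   # dichotomous x quantitative interaction
     data = dt)
}

# Simulated stand-in with the same column names (NOT the clothing data)
set.seed(42)
n <- 100
demo.dt <- data.frame(
  tsale  = runif(n, 5e4, 2e6),
  margin = runif(n, 30, 45),
  ssize  = runif(n, 20, 500),
  start  = sample(1945:2015, n, replace = TRUE)
)
demo.dt$sales <- 5000 + 0.005 * demo.dt$tsale - 25 * demo.dt$ssize +
  0.02 * demo.dt$ssize^2 + rnorm(n, sd = 500)

demo.fit <- fit_extended(demo.dt)
names(coef(demo.fit))
```

With the real data the call would be `fit_extended(clothing.dt)`. The interaction coefficient would then estimate how the slope on `tsale` differs between older and newer stores, and the `I(ssize^2)` coefficient would indicate curvature in the effect of store size.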