5202assign5

10.13

Cosmetics sales.

a.

cosmetic_sales <- read.table("~/Documents/5202assign5/cosmetic_sales.txt", quote="\"", comment.char="", stringsAsFactors=FALSE)
names(cosmetic_sales)<-c("Y","X1","X2","X3")
Y <- cosmetic_sales$Y
X1 <- cosmetic_sales$X1
X2 <- cosmetic_sales$X2
X3 <- cosmetic_sales$X3
n = length(Y)
fit2 <- lm('Y~X1+X2+X3',data=cosmetic_sales)
summary(fit2)

## 
## Call:
## lm(formula = "Y~X1+X2+X3", data = cosmetic_sales)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -5.4217 -0.9115  0.0703  1.1420  3.5479 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)  
## (Intercept)   1.0233     1.2029   0.851   0.4000  
## X1            0.9657     0.7092   1.362   0.1809  
## X2            0.6292     0.7783   0.808   0.4237  
## X3            0.6760     0.3557   1.900   0.0646 .
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1.825 on 40 degrees of freedom
## Multiple R-squared:  0.7417, Adjusted R-squared:  0.7223 
## F-statistic: 38.28 on 3 and 40 DF,  p-value: 7.821e-12

print(fit2$fitted.values)

##         1         2         3         4         5         6         7 
## 12.523331 11.247489  9.232069 12.005132  8.594978 12.861599 15.557943 
##         8         9        10        11        12        13        14 
##  9.971563  7.134733  7.893822 11.358116  7.858766  5.889261 11.179547 
##        15        16        17        18        19        20        21 
##  8.541178  2.974351  6.328273 10.691463  7.226361 10.817552  9.962447 
##        22        23        24        25        26        27        28 
##  8.217182  4.404481 12.971120  8.623391  9.705286 11.580732  9.295027 
##        29        30        31        32        33        34        35 
## 12.812187  8.381651  5.989922  5.777189 11.658598  3.133836  9.459454 
##        36        37        38        39        40        41        42 
## 10.213518 16.930693  7.134775  6.559452  6.221696  8.324271  9.802153 
##        43        44 
##  9.014906 13.198505

The regression model is \[Y=1.0233+0.9657X_1+0.6292X_2+0.6760X_3+\epsilon\]

b.

fit2.aov <- anova(fit2)
SSR <- sum(fit2.aov[1:2, 2])
SSE <- sum(fit2$residuals^2)
F <- (SSR)/(3-1)/(SSE/fit2$df.residual)
print(sprintf('F*=%f',F))

## [1] "F*=55.613439"

print(sprintf('F(0.95;%d,%d)=%f',3-1,fit2$df.residual,df(0.95,3-1,fit2$df.residual)))

## [1] "F(0.95;2,40)=0.377368"

\[H_0:\beta_1=\beta_2=\beta_3=0\qquad\] \[H_a: \text{not all }\beta_k=0\ (k=1,2,3)\] The test statistic is \[\begin{align*} F^*&=\dfrac{\frac{SSR(X_1,X_2,X_3)}{p-1}}{\frac{SSE(X_1,X_2,X_3)}{n-p}}\\ &=\dfrac{MSR}{MSE} \end{align*}\]

Given \(\alpha\), the decision rule is

If \(F^*\leqslant F(1-\alpha;2,40),\) then conclude \(H_0\);

If \(F^*> F(1-\alpha;2,40),\) then conclude \(H_a\);

Here, \(F^*=55.613439>F(0.95;2,40)=0.377368\), therefore, conclude \(H_a\), i.e. not all \(\beta_k=0\ (k=1,2,3)\).

c.

SSE123 <- sum(fit2$residuals^2)
SSR1_23 <- sum(lm('Y~X2+X3',data=cosmetic_sales)$residuals^2)-SSE123
SSR2_13 <- sum(lm('Y~X1+X3',data=cosmetic_sales)$residuals^2)-SSE123
SSR3_12 <- sum(lm('Y~X1+X2',data=cosmetic_sales)$residuals^2)-SSE123
F1 <- SSR1_23/(SSE123/fit2$df.residual)
F2 <- SSR2_13/(SSE123/fit2$df.residual)
F3 <- SSR3_12/(SSE123/fit2$df.residual)
print(sprintf('When k=1, F*=%f',F1))

## [1] "When k=1, F*=1.854008"

print(sprintf('When k=2, F*=%f',F2))

## [1] "When k=2, F*=0.653481"

print(sprintf('When k=3, F*=%f',F3))

## [1] "When k=3, F*=3.611251"

print(sprintf('F(0.95;%d,%d)=%f',1,fit2$df.residual,df(0.95,1,fit2$df.residual)))

## [1] "F(0.95;1,40)=0.251396"

For \(k=1,2,3\), \[H_0:\beta_k=0\qquad H_a: \beta_k\neq 0\]
The test statistic is \[\begin{align*} F^*&=\dfrac{\frac{SSR(X_k|X_j,\text{ for }j=1,2,3,\ j\neq k)}{1}}{\frac{SSE(X_1,X_2,X_3)}{n-p}}\\ &=\dfrac{MSR(X_k|X_j,\text{ for }j=1,2,3,\ j\neq k)}{MSE(X_1,X_2,X_3)} \end{align*}\]

Given \(\alpha\), the decision rule is

If \(F^*\leqslant F(1-\alpha;1,40),\) then conclude \(H_0\);

If \(F^*> F(1-\alpha;1,40),\) then conclude \(H_a\);

Here, \(F^*=\begin{cases} 1.854008&,k=1\\ 0.653481&,k=2\\ 3.611251&,k=3 \end{cases}>F(0.95;1,40)=0.251396\), therefore, conclude \(H_a\), i.e. \(\beta_k\neq 0\ (k=1,2,3)\).

d.

cor(cbind(X1,X2,X3))

##           X1        X2        X3
## X1 1.0000000 0.9744313 0.3759509
## X2 0.9744313 1.0000000 0.4099208
## X3 0.3759509 0.4099208 1.0000000

e.

Under this model, the expect sales when \(X_1\) is increased by \(1\) thousand dollars and \(X_2\) and \(X_3\) are held constant, is \(\beta_1\). It can be estimated by \(b_1\). Since \(X_1\) and \(X_2\) may be linear related from (d), when \(X_2\) is fixed, \(X_1\) is almost fixed, i.e. \(b_1\approx0\). There is a contradiction and therefore, the data may not be suitable for the research objective.

11.11

a.

copier_maintenance <- read.table("~/Documents/5202assign5/copier_maintenance.txt", quote="\"", stringsAsFactors=FALSE)
names(copier_maintenance)<-c("Y","X1")
copier=rbind(copier_maintenance,data.frame(Y=c(132,166),X1=c(6,5)))
modelold=lm(Y~X1,data=copier_maintenance)
model47=lm(Y~X1,data=copier)
plot(copier$X1,copier$Y)
abline(model47,col='blue')
abline(modelold,col='green')
points(c(6,5),c(132,166),col='red')

Data plot with fitted line as shown. Blue line are 47 case fitted function, green line is 45 case fitted function, red points are added 2 cases. The 47 cases fitted line is shifted upward due to 2 extreme cases added.

b.

ei=model47$residuals
mad=1/0.6745*median(abs(ei-median(ei)))    
ui=ei/mad
wui=ifelse(abs(ui)<=1.345,1,1.345/abs(ui))
which.min(wui)

## 47 
## 47

From R output, case 47 received smallest Huber weights. Huber weights gives smallest weight to outlier to reduce the influence of influential cases, which is 47.

c.

newdata=data.frame(Y=copier$Y*wui,X1=copier$X1)
newmodel=lm(Y~X1,data=newdata)
model47;newmodel

## 
## Call:
## lm(formula = Y ~ X1, data = copier)
## 
## Coefficients:
## (Intercept)           X1  
##       1.886       15.109

## 
## Call:
## lm(formula = Y ~ X1, data = newdata)
## 
## Coefficients:
## (Intercept)           X1  
##       3.889       13.018

above all, the new coefficients are 3.889 and 13.018. \(\beta_0\) has increased and x1 has decreased from 15.109 to 13.018

14.14

a.

\(\beta_0\)=-1.17716; \(\beta_1\)=0.07279; \(\beta_2\)=-0.09899; \(\beta_3\)=0.43397 Fitted response function:

where \(x_31=1\) if male client

b.

Values of \(exp(\beta_i)\) shown below.

flu_shots <- read.table("~/Documents/5202assign5/flu_shots.txt", stringsAsFactors=FALSE)
names(flu_shots)<-c("Y","X1","X2","X3")
flu_shots$X3=as.factor(flu_shots$X3)
model=glm(Y~X1+X2+X3,data=flu_shots,family = binomial("logit"))
model$coefficients

## (Intercept)          X1          X2         X31 
## -1.17715922  0.07278802 -0.09898649  0.43397485

exp(model$coefficients[-1])

##        X1        X2       X31 
## 1.0755025 0.9057549 1.5433801

\(exp(b_1)\) means the odds of flu shot is 1.07 times higher for per unit increase in age. \(exp(b_2)\) means the odds of flu shot is 0.91 times for each increasing the health awareness index. \(exp(b_3)\) mean odds of flu hot is 1.54 times higher for male compared to female.

Q1

a.

Respective coefficients as follow

library(MASS)
model0.5=lm.ridge(Y~X1+X2+X3,data=cosmetic_sales,lambda =0.5 )
model5=lm.ridge(Y~X1+X2+X3,data=cosmetic_sales,lambda =5 )
model50=lm.ridge(Y~X1+X2+X3,data=cosmetic_sales,lambda =50 )
model0.5$coef

##        X1        X2        X3 
## 1.5673180 1.1104455 0.5723968

model5$coef

##        X1        X2        X3 
## 1.3347843 1.2167963 0.5591836

model50$coef

##        X1        X2        X3 
## 0.8768038 0.8661714 0.4393605

b.

Yes, almost all coefficients decreased when \(\lambda\) increases. Although the coefficient of \(x_2\) increases when \(\lambda\) is changed from 0.5 to 5, it significantly decreases when \(\lambda\) changed to 50.

5202assign5_cx

CX

15/04/2018

10.13

a.

b.

c.

d.

e.

11.11

a.

b.

c.

14.14

a.

b.

Q1

a.

b.

Q2