Cosmetics sales.
cosmetic_sales <- read.table("~/Documents/5202assign5/cosmetic_sales.txt", quote="\"", comment.char="", stringsAsFactors=FALSE)
names(cosmetic_sales)<-c("Y","X1","X2","X3")
Y <- cosmetic_sales$Y
X1 <- cosmetic_sales$X1
X2 <- cosmetic_sales$X2
X3 <- cosmetic_sales$X3
n = length(Y)
fit2 <- lm('Y~X1+X2+X3',data=cosmetic_sales)
summary(fit2)
##
## Call:
## lm(formula = "Y~X1+X2+X3", data = cosmetic_sales)
##
## Residuals:
## Min 1Q Median 3Q Max
## -5.4217 -0.9115 0.0703 1.1420 3.5479
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 1.0233 1.2029 0.851 0.4000
## X1 0.9657 0.7092 1.362 0.1809
## X2 0.6292 0.7783 0.808 0.4237
## X3 0.6760 0.3557 1.900 0.0646 .
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1.825 on 40 degrees of freedom
## Multiple R-squared: 0.7417, Adjusted R-squared: 0.7223
## F-statistic: 38.28 on 3 and 40 DF, p-value: 7.821e-12
print(fit2$fitted.values)
## 1 2 3 4 5 6 7
## 12.523331 11.247489 9.232069 12.005132 8.594978 12.861599 15.557943
## 8 9 10 11 12 13 14
## 9.971563 7.134733 7.893822 11.358116 7.858766 5.889261 11.179547
## 15 16 17 18 19 20 21
## 8.541178 2.974351 6.328273 10.691463 7.226361 10.817552 9.962447
## 22 23 24 25 26 27 28
## 8.217182 4.404481 12.971120 8.623391 9.705286 11.580732 9.295027
## 29 30 31 32 33 34 35
## 12.812187 8.381651 5.989922 5.777189 11.658598 3.133836 9.459454
## 36 37 38 39 40 41 42
## 10.213518 16.930693 7.134775 6.559452 6.221696 8.324271 9.802153
## 43 44
## 9.014906 13.198505
The regression model is \[Y=1.0233+0.9657X_1+0.6292X_2+0.6760X_3+\epsilon\]
fit2.aov <- anova(fit2)
SSR <- sum(fit2.aov[1:2, 2])
SSE <- sum(fit2$residuals^2)
F <- (SSR)/(3-1)/(SSE/fit2$df.residual)
print(sprintf('F*=%f',F))
## [1] "F*=55.613439"
print(sprintf('F(0.95;%d,%d)=%f',3-1,fit2$df.residual,df(0.95,3-1,fit2$df.residual)))
## [1] "F(0.95;2,40)=0.377368"
\[H_0:\beta_1=\beta_2=\beta_3=0\qquad\] \[H_a: \text{not all }\beta_k=0\ (k=1,2,3)\] The test statistic is \[\begin{align*} F^*&=\dfrac{\frac{SSR(X_1,X_2,X_3)}{p-1}}{\frac{SSE(X_1,X_2,X_3)}{n-p}}\\ &=\dfrac{MSR}{MSE} \end{align*}\]
Given \(\alpha\), the decision rule is
If \(F^*\leqslant F(1-\alpha;2,40),\) then conclude \(H_0\);
If \(F^*> F(1-\alpha;2,40),\) then conclude \(H_a\);
Here, \(F^*=55.613439>F(0.95;2,40)=0.377368\), therefore, conclude \(H_a\), i.e. not all \(\beta_k=0\ (k=1,2,3)\).
SSE123 <- sum(fit2$residuals^2)
SSR1_23 <- sum(lm('Y~X2+X3',data=cosmetic_sales)$residuals^2)-SSE123
SSR2_13 <- sum(lm('Y~X1+X3',data=cosmetic_sales)$residuals^2)-SSE123
SSR3_12 <- sum(lm('Y~X1+X2',data=cosmetic_sales)$residuals^2)-SSE123
F1 <- SSR1_23/(SSE123/fit2$df.residual)
F2 <- SSR2_13/(SSE123/fit2$df.residual)
F3 <- SSR3_12/(SSE123/fit2$df.residual)
print(sprintf('When k=1, F*=%f',F1))
## [1] "When k=1, F*=1.854008"
print(sprintf('When k=2, F*=%f',F2))
## [1] "When k=2, F*=0.653481"
print(sprintf('When k=3, F*=%f',F3))
## [1] "When k=3, F*=3.611251"
print(sprintf('F(0.95;%d,%d)=%f',1,fit2$df.residual,df(0.95,1,fit2$df.residual)))
## [1] "F(0.95;1,40)=0.251396"
For \(k=1,2,3\), \[H_0:\beta_k=0\qquad H_a: \beta_k\neq 0\]
The test statistic is \[\begin{align*}
F^*&=\dfrac{\frac{SSR(X_k|X_j,\text{ for }j=1,2,3,\ j\neq k)}{1}}{\frac{SSE(X_1,X_2,X_3)}{n-p}}\\
&=\dfrac{MSR(X_k|X_j,\text{ for }j=1,2,3,\ j\neq k)}{MSE(X_1,X_2,X_3)}
\end{align*}\]
Given \(\alpha\), the decision rule is
If \(F^*\leqslant F(1-\alpha;1,40),\) then conclude \(H_0\);
If \(F^*> F(1-\alpha;1,40),\) then conclude \(H_a\);
Here, \(F^*=\begin{cases} 1.854008&,k=1\\ 0.653481&,k=2\\ 3.611251&,k=3 \end{cases}>F(0.95;1,40)=0.251396\), therefore, conclude \(H_a\), i.e. \(\beta_k\neq 0\ (k=1,2,3)\).
cor(cbind(X1,X2,X3))
## X1 X2 X3
## X1 1.0000000 0.9744313 0.3759509
## X2 0.9744313 1.0000000 0.4099208
## X3 0.3759509 0.4099208 1.0000000
Under this model, the expect sales when \(X_1\) is increased by \(1\) thousand dollars and \(X_2\) and \(X_3\) are held constant, is \(\beta_1\). It can be estimated by \(b_1\). Since \(X_1\) and \(X_2\) may be linear related from (d), when \(X_2\) is fixed, \(X_1\) is almost fixed, i.e. \(b_1\approx0\). There is a contradiction and therefore, the data may not be suitable for the research objective.
copier_maintenance <- read.table("~/Documents/5202assign5/copier_maintenance.txt", quote="\"", stringsAsFactors=FALSE)
names(copier_maintenance)<-c("Y","X1")
copier=rbind(copier_maintenance,data.frame(Y=c(132,166),X1=c(6,5)))
modelold=lm(Y~X1,data=copier_maintenance)
model47=lm(Y~X1,data=copier)
plot(copier$X1,copier$Y)
abline(model47,col='blue')
abline(modelold,col='green')
points(c(6,5),c(132,166),col='red')
Data plot with fitted line as shown. Blue line are 47 case fitted function, green line is 45 case fitted function, red points are added 2 cases. The 47 cases fitted line is shifted upward due to 2 extreme cases added.
ei=model47$residuals
mad=1/0.6745*median(abs(ei-median(ei)))
ui=ei/mad
wui=ifelse(abs(ui)<=1.345,1,1.345/abs(ui))
which.min(wui)
## 47
## 47
From R output, case 47 received smallest Huber weights. Huber weights gives smallest weight to outlier to reduce the influence of influential cases, which is 47.
newdata=data.frame(Y=copier$Y*wui,X1=copier$X1)
newmodel=lm(Y~X1,data=newdata)
model47;newmodel
##
## Call:
## lm(formula = Y ~ X1, data = copier)
##
## Coefficients:
## (Intercept) X1
## 1.886 15.109
##
## Call:
## lm(formula = Y ~ X1, data = newdata)
##
## Coefficients:
## (Intercept) X1
## 3.889 13.018
above all, the new coefficients are 3.889 and 13.018. \(\beta_0\) has increased and x1 has decreased from 15.109 to 13.018
\(\beta_0\)=-1.17716; \(\beta_1\)=0.07279; \(\beta_2\)=-0.09899; \(\beta_3\)=0.43397 Fitted response function:
where \(x_31=1\) if male client
Values of \(exp(\beta_i)\) shown below.
flu_shots <- read.table("~/Documents/5202assign5/flu_shots.txt", stringsAsFactors=FALSE)
names(flu_shots)<-c("Y","X1","X2","X3")
flu_shots$X3=as.factor(flu_shots$X3)
model=glm(Y~X1+X2+X3,data=flu_shots,family = binomial("logit"))
model$coefficients
## (Intercept) X1 X2 X31
## -1.17715922 0.07278802 -0.09898649 0.43397485
exp(model$coefficients[-1])
## X1 X2 X31
## 1.0755025 0.9057549 1.5433801
\(exp(b_1)\) means the odds of flu shot is 1.07 times higher for per unit increase in age. \(exp(b_2)\) means the odds of flu shot is 0.91 times for each increasing the health awareness index. \(exp(b_3)\) mean odds of flu hot is 1.54 times higher for male compared to female.
Respective coefficients as follow
library(MASS)
model0.5=lm.ridge(Y~X1+X2+X3,data=cosmetic_sales,lambda =0.5 )
model5=lm.ridge(Y~X1+X2+X3,data=cosmetic_sales,lambda =5 )
model50=lm.ridge(Y~X1+X2+X3,data=cosmetic_sales,lambda =50 )
model0.5$coef
## X1 X2 X3
## 1.5673180 1.1104455 0.5723968
model5$coef
## X1 X2 X3
## 1.3347843 1.2167963 0.5591836
model50$coef
## X1 X2 X3
## 0.8768038 0.8661714 0.4393605
Yes, almost all coefficients decreased when \(\lambda\) increases. Although the coefficient of \(x_2\) increases when \(\lambda\) is changed from 0.5 to 5, it significantly decreases when \(\lambda\) changed to 50.