Refer to the Brand preference data from a small-scale experimental study of the relation between the degree of brand liking (Y) and the moisture content (X1) and sweetness (X2) of the product. (30 pts)
The diagnostic plots show no outliers, and the distribution of each variable looks roughly symmetric. The correlation matrix shows a strong positive correlation between Y and X1 (r = 0.89) and a weaker positive correlation between Y and X2 (r = 0.39), while X1 and X2 are uncorrelated (r = 0.00).
library(knitr)
Brand.Preference <- read.csv("/cloud/project/Brand Preference.csv")
plot(Brand.Preference)
round(cor(Brand.Preference),2)
## Y X1 X2
## Y 1.00 0.89 0.39
## X1 0.89 1.00 0.00
## X2 0.39 0.00 1.00
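The correlation matrix gives only point estimates. As a minimal sketch of how to check which correlations are statistically significant, the block below runs `cor.test()` on each pair of variables. It assumes the 16 observations typed in by hand (the data frame name `bp` is hypothetical); these values reproduce the correlations and regression coefficients printed in this section, but reading the CSV as above remains the safer source.

```r
# Brand Preference data, transcribed by hand (assumption: matches the CSV).
# X1 (moisture) has 4 levels and X2 (sweetness) 2 levels in a balanced design.
bp <- data.frame(
  Y  = c(64, 73, 61, 76, 72, 80, 71, 83, 83, 89, 86, 93, 88, 95, 94, 100),
  X1 = rep(c(4, 6, 8, 10), each = 4),
  X2 = rep(c(2, 4), times = 8)
)

# Pearson correlation test (H0: rho = 0) for each pair of variables.
var.pairs <- list(c("Y", "X1"), c("Y", "X2"), c("X1", "X2"))
res <- t(sapply(var.pairs, function(p) {
  ct <- cor.test(bp[[p[1]]], bp[[p[2]]])
  c(r = unname(ct$estimate), p.value = ct$p.value)
}))
rownames(res) <- sapply(var.pairs, paste, collapse = "-")
print(round(res, 4))
```

With only 16 runs, the Y-X2 correlation (r of about 0.39) is not significant on its own at the 0.05 level, even though X2 is clearly significant in the multiple regression once X1 is accounted for.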
par(mfrow=c(1,3))
boxplot(Brand.Preference$Y,main="Y");boxplot(Brand.Preference$X1,main="X1");boxplot(Brand.Preference$X2,main="X2")
The fitted regression equation is Ŷ = 37.65 + 4.425X1 + 4.375X2. Holding sweetness (X2) constant, a one-unit increase in moisture content (X1) increases the estimated brand liking by 4.425; holding X1 constant, a one-unit increase in X2 increases it by 4.375. Both predictors are significant, with p-values of 1.78e-09 for X1 and 2.01e-05 for X2 (both < 0.05), and the model explains about 95% of the variation in brand liking (R-squared = 0.9521).
f.pr1<-lm(Y~X1+X2,data=Brand.Preference)
summary(f.pr1)
##
## Call:
## lm(formula = Y ~ X1 + X2, data = Brand.Preference)
##
## Residuals:
## Min 1Q Median 3Q Max
## -4.400 -1.762 0.025 1.587 4.200
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 37.6500 2.9961 12.566 1.20e-08 ***
## X1 4.4250 0.3011 14.695 1.78e-09 ***
## X2 4.3750 0.6733 6.498 2.01e-05 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 2.693 on 13 degrees of freedom
## Multiple R-squared: 0.9521, Adjusted R-squared: 0.9447
## F-statistic: 129.1 on 2 and 13 DF, p-value: 2.658e-09
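The "holding the other variable constant" interpretation can be checked numerically. This sketch again assumes the hand-transcribed data (`bp`, `fit`, `pts`, and `slope.alone` are hypothetical names). It predicts liking for two products that differ only by one unit of X1 and confirms that the difference equals the X1 slope, 4.425. It also shows a consequence of X1 and X2 being uncorrelated: the X1 slope is the same whether or not X2 is in the model.

```r
# Brand Preference data, transcribed by hand (assumption: matches the CSV).
bp <- data.frame(
  Y  = c(64, 73, 61, 76, 72, 80, 71, 83, 83, 89, 86, 93, 88, 95, 94, 100),
  X1 = rep(c(4, 6, 8, 10), each = 4),
  X2 = rep(c(2, 4), times = 8)
)
fit <- lm(Y ~ X1 + X2, data = bp)
print(coef(fit))

# Two hypothetical products with the same sweetness (X2 = 3) whose moisture
# content differs by one unit; their fitted difference is the X1 slope.
pts <- data.frame(X1 = c(6, 7), X2 = c(3, 3))
pred <- predict(fit, newdata = pts)
print(diff(pred))

# X1 and X2 are uncorrelated in this balanced design, so dropping X2
# leaves the X1 slope unchanged.
slope.alone <- coef(lm(Y ~ X1, data = bp))[["X1"]]
print(slope.alone)
```

The last point only holds because the design is balanced; with correlated predictors, a simple-regression slope and a multiple-regression slope generally differ.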
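The boxplot of the residuals shows no outliers and is roughly symmetric about zero (median 0.025), which is consistent with, though not proof of, normally distributed errors.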
resid<-f.pr1$residuals
boxplot(resid)
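A boxplot of the residuals checks for outliers but says little about the shape of the tails. A minimal sketch of complementary checks, assuming the same hand-transcribed data (`bp`, `fit`, `r`, and `out` are hypothetical names): `boxplot.stats()` lists the points beyond the 1.5 * IQR whiskers that `boxplot()` draws, and a normal Q-Q plot plus `shapiro.test()` address normality more directly.

```r
# Brand Preference data, transcribed by hand (assumption: matches the CSV).
bp <- data.frame(
  Y  = c(64, 73, 61, 76, 72, 80, 71, 83, 83, 89, 86, 93, 88, 95, 94, 100),
  X1 = rep(c(4, 6, 8, 10), each = 4),
  X2 = rep(c(2, 4), times = 8)
)
fit <- lm(Y ~ X1 + X2, data = bp)
r <- residuals(fit)

# Residuals beyond the 1.5 * IQR whiskers that boxplot() draws.
out <- boxplot.stats(r)$out
print(out)

# Normal Q-Q plot: points close to the line suggest roughly normal errors.
qqnorm(r, main = "Normal Q-Q plot of residuals")
qqline(r)

# Shapiro-Wilk test (H0: the residuals come from a normal distribution).
print(shapiro.test(r))
```

For these data `out` is empty, in line with the boxplot. With the objects defined earlier in this problem, the same checks are `boxplot.stats(resid)$out`, `qqnorm(resid)`, and `shapiro.test(resid)`.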
### Problem 2:
2- Refer to the Commercial properties data, which record the age (X1), operating expenses and taxes (X2), vacancy rates (X3), total square footage (X4), and rental rates (Y) of commercial properties. (50 pts)
Looking first at the correlation matrix, Y has a weak negative correlation with X1 (r = -0.25), moderate positive correlations with X2 (r = 0.41) and X4 (r = 0.54), and almost no correlation with X3 (r = 0.07). Among the predictors, X2 is moderately correlated with X1 (0.39), X3 (-0.38), and X4 (0.44). The scatter plot matrix shows a few outliers, particularly in X3, and suggests that several of the variables take discrete values.
Commercial.Properties <- read.csv("/cloud/project/Commercial Properties.csv")
plot(Commercial.Properties)
round(cor(Commercial.Properties),2)
## Y X1 X2 X3 X4
## Y 1.00 -0.25 0.41 0.07 0.54
## X1 -0.25 1.00 0.39 -0.25 0.29
## X2 0.41 0.39 1.00 -0.38 0.44
## X3 0.07 -0.25 -0.38 1.00 0.08
## X4 0.54 0.29 0.44 0.08 1.00
The fitted regression function is Ŷ = 12.20 - 0.1420X1 + 0.2820X2 + 0.6193X3 + 7.924e-06X4. At the 0.05 level, X1, X2, and X4 are statistically significant (all p < 0.001), while X3 is not (p = 0.57). Together the four predictors explain about 58% of the variation in rental rates (R-squared = 0.5847).
f.pr2<-lm(Y~X1+X2+X3+X4,data=Commercial.Properties)
summary(f.pr2)
##
## Call:
## lm(formula = Y ~ X1 + X2 + X3 + X4, data = Commercial.Properties)
##
## Residuals:
## Min 1Q Median 3Q Max
## -3.1872 -0.5911 -0.0910 0.5579 2.9441
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 1.220e+01 5.780e-01 21.110 < 2e-16 ***
## X1 -1.420e-01 2.134e-02 -6.655 3.89e-09 ***
## X2 2.820e-01 6.317e-02 4.464 2.75e-05 ***
## X3 6.193e-01 1.087e+00 0.570 0.57
## X4 7.924e-06 1.385e-06 5.722 1.98e-07 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1.137 on 76 degrees of freedom
## Multiple R-squared: 0.5847, Adjusted R-squared: 0.5629
## F-statistic: 26.76 on 4 and 76 DF, p-value: 7.272e-14
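The significance statements can be reproduced from the printed estimates alone. A minimal sketch: the slope estimates and standard errors are copied from the summary output, with 76 residual degrees of freedom (81 observations minus 5 coefficients), and the names `est`, `se`, `t.val`, `p.val`, and `ci` are hypothetical. It recomputes each t statistic, its two-sided p-value, and a 95% confidence interval; a slope is significant at the 0.05 level exactly when its interval excludes zero.

```r
# Slope estimates and standard errors copied from summary(f.pr2).
est <- c(X1 = -1.420e-01, X2 = 2.820e-01, X3 = 6.193e-01, X4 = 7.924e-06)
se  <- c(X1 =  2.134e-02, X2 = 6.317e-02, X3 = 1.087e+00, X4 = 1.385e-06)
df.resid <- 76  # n - p = 81 - 5

t.val <- est / se
p.val <- 2 * pt(-abs(t.val), df = df.resid)  # two-sided p-value
crit  <- qt(0.975, df = df.resid)            # 95% two-sided critical value
ci <- cbind(lower = est - crit * se, upper = est + crit * se)

print(signif(cbind(t = t.val, p = p.val, ci), 4))
```

Only the X3 interval contains zero. With the fitted model object, `confint(f.pr2)` returns the same intervals without rounding the inputs.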
The residual boxplot looks roughly symmetric about zero, consistent with approximately normal errors, but it shows outliers on both sides of the median.
resid2<-f.pr2$residuals
boxplot(resid2)
X1, X3, and X4 have right-skewed distributions, while X2 is slightly left-skewed. Y and the residuals look approximately normal. Both X3 and the residuals have outliers: all of X3's outliers lie above its median, whereas the residual boxplot shows outliers both above and below the median.
par(mfrow=c(2,3))
boxplot(resid2,main="Residuals");boxplot(Commercial.Properties$Y,main="Y");boxplot(Commercial.Properties$X1,main="X1")
boxplot(Commercial.Properties$X2,main="X2");boxplot(Commercial.Properties$X3,main="X3");boxplot(Commercial.Properties$X4,main="X4")
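The skewness and outlier statements above are read off the boxplots by eye. A minimal sketch that quantifies them: the hypothetical helper `describe_shape()` reports the median, a moment-based sample skewness (positive means a longer upper tail), and how many boxplot outliers fall below and above the median, using the same 1.5 * IQR rule as `boxplot()`. The demo vector `toy` is made up purely for illustration and is not the Commercial Properties data.

```r
# Summarise the shape of a numeric vector: median, skewness, and the side of
# the median on which boxplot outliers fall (boxplot.stats() default rule).
describe_shape <- function(x) {
  stats <- boxplot.stats(x)
  med <- median(x)
  m <- mean(x)
  # Moment-based sample skewness: > 0 means a longer right (upper) tail.
  skew <- mean((x - m)^3) / mean((x - m)^2)^(3 / 2)
  c(median = med,
    skewness = skew,
    n_low_outliers = sum(stats$out < med),
    n_high_outliers = sum(stats$out > med))
}

# Hypothetical vacancy-rate-like vector (NOT the Commercial Properties data).
toy <- c(0, 0, 0.02, 0.03, 0.05, 0.05, 0.06, 0.08, 0.10, 0.12, 0.60)
res <- describe_shape(toy)
print(round(res, 3))
```

Applied to the real data, `sapply(Commercial.Properties, describe_shape)` summarises every column at once, and `describe_shape(resid2)` does the same for the residuals.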
### Problem 3:
3- Refer to Problem 4 (Commercial properties data) (20 pts). Three properties with the following characteristics did not have any rental information available.

| Variable | Property 1 | Property 2 | Property 3 |
|---|---|---|---|
| X1 | 4.0 | 6.0 | 12.0 |
| X2 | 10.0 | 11.5 | 12.5 |
| X3 | 0.10 | 0 | 0.32 |
| X4 | 80,000 | 120,000 | 34,000 |

Predict the rental rates of these three properties.
Using the fitted regression function Ŷ = 12.20 - 0.1420X1 + 0.2820X2 + 0.6193X3 + 7.924e-06X4:
eq. 1: Ŷ = 12.20 - 0.1420(4.0) + 0.2820(10.0) + 0.6193(0.10) + 7.924e-06(80,000) = 15.148. Predicted rental rate ≈ 15.15.
eq. 2: Ŷ = 12.20 - 0.1420(6.0) + 0.2820(11.5) + 0.6193(0) + 7.924e-06(120,000) = 15.542. Predicted rental rate ≈ 15.54.
eq. 3: Ŷ = 12.20 - 0.1420(12.0) + 0.2820(12.5) + 0.6193(0.32) + 7.924e-06(34,000) = 14.489. Predicted rental rate ≈ 14.49.
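The hand calculations can be reproduced as a single matrix product, Ŷ = Xb. A minimal sketch, assuming the rounded coefficients from the summary output with a positive intercept of 12.20, which is what `lm()` reports when the CSV is read with `<-` (reading it with `<--` parses as `<- -read.csv(...)`, negating every column, which flips the sign of the intercept but leaves the slopes unchanged). The names `b`, `new.props`, and `y.hat` are hypothetical.

```r
# Rounded coefficients from summary(f.pr2), assuming the data were read with `<-`.
b <- c(`(Intercept)` = 12.20, X1 = -0.1420, X2 = 0.2820, X3 = 0.6193, X4 = 7.924e-06)

# The three properties from the table, one row each.
new.props <- data.frame(
  X1 = c(4.0, 6.0, 12.0),
  X2 = c(10.0, 11.5, 12.5),
  X3 = c(0.10, 0, 0.32),
  X4 = c(80000, 120000, 34000)
)

# Design matrix: a leading column of 1s for the intercept, then X1..X4.
X <- cbind(1, as.matrix(new.props))
y.hat <- drop(X %*% b)
print(round(y.hat, 3))
```

With the fitted model object itself, `predict(f.pr2, newdata = new.props)` gives the same predictions without rounding the coefficients, and adding `interval = "prediction"` returns 95% prediction intervals for the three new properties.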