Problem 1:

Refer to the Brand preference data, from a small-scale experimental study of the relation between degree of brand liking (Y) and the moisture content (X1) and sweetness (X2) of the product. (30 pts)

Part A

  1. Obtain the scatter plot matrix and the correlation matrix. What information do these diagnostic aids provide here?

The scatter plot matrix and box plots show no obvious outliers, and each variable looks roughly symmetric. The correlation matrix shows that Y is strongly positively correlated with X1 (r = 0.89) and more weakly positively correlated with X2 (r = 0.39), while X1 and X2 are uncorrelated with each other (r = 0.00).

library(knitr)
Brand.Preference <- read.csv("/cloud/project/Brand Preference.csv")
plot(Brand.Preference)

round(cor(Brand.Preference),2)
##       Y   X1   X2
## Y  1.00 0.89 0.39
## X1 0.89 1.00 0.00
## X2 0.39 0.00 1.00
par(mfrow=c(1,3))
boxplot(Brand.Preference$Y,main="Y")
boxplot(Brand.Preference$X1,main="X1")
boxplot(Brand.Preference$X2,main="X2")
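
The zero correlation between X1 and X2 is what a balanced factorial layout would produce. A minimal sketch, assuming (hypothetically) that four moisture levels are each crossed with two sweetness levels and replicated twice; the level values below are illustrative and are not read from the data file:

```r
# Hypothetical balanced design: every X1 level paired with every X2 level, twice.
design <- expand.grid(X1 = c(4, 6, 8, 10), X2 = c(2, 4), rep = 1:2)

# Each X2 level appears equally often at each X1 level, so the sample
# covariance, and hence the correlation, between X1 and X2 is zero.
r12 <- cor(design$X1, design$X2)
print(nrow(design))
print(r12)
```

When the predictors are uncorrelated like this, the estimated effect of one predictor does not change when the other is added to or dropped from the model.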

Part B

  1. Fit the regression model to the data. State the estimated regression function. Interpret the regression coefficients. (10 pts)

The fitted regression function is Ŷ = 37.65 + 4.425 X1 + 4.375 X2. Holding X2 constant, a one-unit increase in X1 raises the expected degree of brand liking by 4.425; holding X1 constant, a one-unit increase in X2 raises it by 4.375. Both X1 and X2 are statistically significant, since the p-value for each coefficient is well below 0.05.

f.pr1<-lm(Y~X1+X2,data=Brand.Preference)
summary(f.pr1)
## 
## Call:
## lm(formula = Y ~ X1 + X2, data = Brand.Preference)
## 
## Residuals:
##    Min     1Q Median     3Q    Max 
## -4.400 -1.762  0.025  1.587  4.200 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  37.6500     2.9961  12.566 1.20e-08 ***
## X1            4.4250     0.3011  14.695 1.78e-09 ***
## X2            4.3750     0.6733   6.498 2.01e-05 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 2.693 on 13 degrees of freedom
## Multiple R-squared:  0.9521, Adjusted R-squared:  0.9447 
## F-statistic: 129.1 on 2 and 13 DF,  p-value: 2.658e-09
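
To make the interpretation of the coefficients concrete, the fitted function can be evaluated by hand. A short sketch using the printed estimates; the helper name `fitted_liking` and the inputs X1 = 8, X2 = 4 are hypothetical, chosen only for illustration:

```r
# Fitted function with the coefficients from the summary above.
fitted_liking <- function(x1, x2) {
  37.65 + 4.425 * x1 + 4.375 * x2
}

y_a <- fitted_liking(8, 4)  # 37.65 + 35.40 + 17.50
y_b <- fitted_liking(9, 4)  # same X2, X1 one unit higher

print(y_a)
print(y_b - y_a)            # change in fitted Y per unit of X1, X2 held fixed
```

The difference between the two fitted values equals the X1 coefficient, which is exactly what "holding X2 constant" means.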

Part C

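  1. Obtain the residuals, and prepare a box plot of the residuals. What information does this plot provide? (10pts)

The box plot shows no outliers, and the residuals are roughly symmetric about a median near zero (0.025). This is consistent with normally distributed errors, although a box plot alone cannot confirm normality.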

res1 <- residuals(f.pr1)  # named res1 so the vector does not mask the resid() extractor
boxplot(res1, main = "Residuals")
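
The "no outliers" reading can be cross-checked against the usual 1.5 x IQR rule using the residual summary printed in Part B. This is only a sketch: it uses the printed quartiles as stand-ins for the hinges that `boxplot` actually computes, and the two can differ slightly in small samples.

```r
# Residual summary values copied from the summary() output in Part B.
q1 <- -1.762
q3 <- 1.587
res_min <- -4.400
res_max <- 4.200

# Box plot fences: points beyond these would be drawn as outliers.
iqr <- q3 - q1
lower_fence <- q1 - 1.5 * iqr
upper_fence <- q3 + 1.5 * iqr

any_outlier <- res_min < lower_fence || res_max > upper_fence
print(c(lower_fence, upper_fence))
print(any_outlier)
```

Both the smallest and the largest residual fall inside the fences, which agrees with a box plot that flags no points.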

Problem 2:

2- Refer to the Commercial properties data. The variables are age (X1), operating expenses and taxes (X2), vacancy rates (X3), total square footage (X4), and rental rates (Y). (50pts)

Part A

  1. Obtain the scatter plot matrix and the correlation matrix. Interpret these and state your principal findings. (15 pts)

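Looking at the correlation matrix, Y is negatively correlated with X1 (r = -0.25), moderately positively correlated with X2 (r = 0.41) and X4 (r = 0.54), and almost uncorrelated with X3 (r = 0.07). Among the predictors, the largest correlations are between X2 and X4 (0.44), X1 and X2 (0.39), and X2 and X3 (-0.38), so there is some intercorrelation but none of it is severe. The scatter plot matrix shows a few outliers, particularly for X3. It also suggests that the variables are discrete, taking only a limited set of distinct values.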

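# Assign with `<-`. Note that `<--` parses as `<- -`, which would negate every column and flip the sign of the fitted intercept.
Commercial.Properties <- read.csv("/cloud/project/Commercial Properties.csv")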
plot(Commercial.Properties)

round(cor(Commercial.Properties),2)
##        Y    X1    X2    X3   X4
## Y   1.00 -0.25  0.41  0.07 0.54
## X1 -0.25  1.00  0.39 -0.25 0.29
## X2  0.41  0.39  1.00 -0.38 0.44
## X3  0.07 -0.25 -0.38  1.00 0.08
## X4  0.54  0.29  0.44  0.08 1.00
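
One way to summarize the matrix is to rank the predictors by the absolute size of their correlation with Y. A small sketch with the Y row of the printed matrix typed in by hand; the vector name `r_y` is just for illustration:

```r
# Correlations of each predictor with Y, copied from the matrix above.
r_y <- c(X1 = -0.25, X2 = 0.41, X3 = 0.07, X4 = 0.54)

# Sort by magnitude so the sign does not affect the ranking.
ranked <- r_y[order(abs(r_y), decreasing = TRUE)]
print(ranked)
```

On this ranking X4 and X2 have the strongest linear association with Y, and X3 the weakest.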

Part B

  1. Fit regression model for four predictor variables to the data. State the estimated regression function. (10pts)

The estimated regression function is Ŷ = 12.20 - 0.1420 X1 + 0.2820 X2 + 0.6193 X3 + 7.924e-06 X4. X1, X2, and X4 are statistically significant at the 0.05 level; X3 is not (p = 0.57).

f.pr2<-lm(Y~X1+X2+X3+X4,data=Commercial.Properties)
summary(f.pr2)
## 
## Call:
## lm(formula = Y ~ X1 + X2 + X3 + X4, data = Commercial.Properties)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -3.1872 -0.5911 -0.0910  0.5579  2.9441 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  1.220e+01  5.780e-01  21.110  < 2e-16 ***
## X1          -1.420e-01  2.134e-02  -6.655 3.89e-09 ***
## X2           2.820e-01  6.317e-02   4.464 2.75e-05 ***
## X3           6.193e-01  1.087e+00   0.570     0.57    
## X4           7.924e-06  1.385e-06   5.722 1.98e-07 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1.137 on 76 degrees of freedom
## Multiple R-squared:  0.5847, Adjusted R-squared:  0.5629 
## F-statistic: 26.76 on 4 and 76 DF,  p-value: 7.272e-14
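
The conclusion about X3 can be verified from its printed estimate and standard error. A minimal sketch, assuming the usual two-sided t test on the 76 residual degrees of freedom reported above:

```r
# Estimate and standard error for X3, copied from the coefficient table.
est <- 6.193e-01
se  <- 1.087e+00
df  <- 76

t_stat <- est / se                  # t value = estimate / standard error
p_val  <- 2 * pt(-abs(t_stat), df)  # two-sided p-value from the t distribution

print(round(t_stat, 3))
print(p_val)
```

The p-value is far above 0.05, so there is no evidence that vacancy rate adds to the model once the other three predictors are included.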

Part C

  1. Obtain the residuals and plot the residuals. Does the distribution appear to be fairly symmetrical? (10 pts)

The distribution of the residuals is fairly symmetric about a median close to zero, but there are outliers on both sides of the median, with more of them to the left of the median.

resid2<-f.pr2$residuals
boxplot(resid2)

Part D

  1. Plot the residuals against Y, and each predictor variable. Analyze your plots and summarize your findings. (15pts)

The box plots show that X1, X3, and X4 are skewed to the right, while X2 is slightly skewed to the left. Y and the residuals look roughly symmetric. Both X3 and the residuals have outliers, but X3 has outliers only above the median, while the residual box plot shows outliers both above and below the median.

resid2<-f.pr2$residuals
par(mfrow=c(2,3))  # 2 x 3 grid so that all six box plots fit on one page
boxplot(resid2,main="Residuals")
for (v in c("Y","X1","X2","X3","X4")) boxplot(Commercial.Properties[[v]],main=v)
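
Because the question asks for the residuals plotted against the fitted values and each predictor, scatter plots of that kind are a useful complement to the box plots. The sketch below is an illustration under stated assumptions: it simulates a stand-in data frame (`sim`, a hypothetical name) with the same column names and 81 rows instead of reading the actual file, so the pictures it draws say nothing about the real data.

```r
# Simulated stand-in for the data set (assumption: not the real values).
set.seed(1)
n <- 81
sim <- data.frame(X1 = runif(n, 0, 20), X2 = runif(n, 3, 15),
                  X3 = runif(n, 0, 0.7), X4 = runif(n, 3e4, 5e5))
sim$Y <- 12 - 0.14 * sim$X1 + 0.28 * sim$X2 + 8e-06 * sim$X4 + rnorm(n)

fit <- lm(Y ~ X1 + X2 + X3 + X4, data = sim)
res <- residuals(fit)

# Residuals against fitted values, then against each predictor in turn.
par(mfrow = c(2, 3))
plot(fitted(fit), res, xlab = "Fitted Y", ylab = "Residual")
abline(h = 0, lty = 2)
for (v in c("X1", "X2", "X3", "X4")) {
  plot(sim[[v]], res, xlab = v, ylab = "Residual")
  abline(h = 0, lty = 2)
}
```

In plots like these, a random horizontal band around zero supports the linear model with constant variance, while a curve or a funnel shape points to a missing term or unequal error variance.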

Problem 3:

3- Refer to Problem 4 (Commercial properties data) (20 pts). Three properties with the following characteristics did not have any rental information available.

          1        2        3
X1:     4.0      6.0     12.0
X2:    10.0     11.5     12.5
X3:    0.10        0     0.32
X4:  80,000  120,000   34,000

Predict rental rates based on the data above in the table.

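Each prediction substitutes the property's values into the fitted function Ŷ = 12.20 - 0.1420 X1 + 0.2820 X2 + 0.6193 X3 + 7.924e-06 X4, using the coefficients as printed in the summary.

eq. 1: Ŷ = 12.20 - 0.1420(4.0) + 0.2820(10.0) + 0.6193(0.10) + 7.924e-06(80,000) = 12.20 - 0.568 + 2.820 + 0.062 + 0.634. Rental rate = 15.15
eq. 2: Ŷ = 12.20 - 0.1420(6.0) + 0.2820(11.5) + 0.6193(0) + 7.924e-06(120,000) = 12.20 - 0.852 + 3.243 + 0 + 0.951. Rental rate = 15.54
eq. 3: Ŷ = 12.20 - 0.1420(12.0) + 0.2820(12.5) + 0.6193(0.32) + 7.924e-06(34,000) = 12.20 - 1.704 + 3.525 + 0.198 + 0.269. Rental rate = 14.49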
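
The same predictions can be computed in one step with matrix arithmetic. This is a sketch with the coefficients entered by hand from the printed summary, so the results carry the rounding of those printed values. The intercept is taken as +12.20, an assumption made so that the predicted rental rates come out positive. The names `b` and `new_x` are illustrative.

```r
# Coefficients in the order intercept, X1, X2, X3, X4.
# Assumption: intercept entered as +12.20 so predicted rates are positive.
b <- c(12.20, -0.1420, 0.2820, 0.6193, 7.924e-06)

# One row per property; the leading 1 multiplies the intercept.
new_x <- rbind(
  c(1,  4.0, 10.0, 0.10,  80000),
  c(1,  6.0, 11.5, 0.00, 120000),
  c(1, 12.0, 12.5, 0.32,  34000)
)

y_hat <- as.vector(new_x %*% b)
print(round(y_hat, 2))
```

With the fitted model object available, `predict(f.pr2, newdata = ...)` on a data frame holding the same three rows would give the equivalent result using the unrounded coefficients.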