Problem Statement

The clock prices data set contains the selling Price (in pounds Stirling) of 32 antique grandfather clocks in different auctions, along with the Age of the clocks in years and the number of Bidders participating in that auction.

The data used for the analysis is available in https://drive.google.com/open?id=1yvDDNzmpJrDfbLI744fNSgrlXwvY-bLm

VARIABLES

  1. Age - Age of the clock (years)

  2. Bidders - Number of individuals participating in the bidding

  3. Price - Selling price (pounds Stirling)

We’re interested in modeling the Price (Dependent Variable) based on Age and Bidders. The Model can be Simple Linear Regression Model or a Multiple Linear Regression Model. Finally, we choose a Model that is best able to represent the given data.

clock.data <- read.table("C:\\Users\\Lenovo\\Desktop\\clock_prices.data", header = TRUE)
head(clock.data)
##   Age Bidders Price
## 1 127      13  1235
## 2 115      12  1080
## 3 127       7   845
## 4 150       9  1522
## 5 156       6  1047
## 6 182      11  1979
summary(clock.data)
##       Age           Bidders           Price     
##  Min.   :108.0   Min.   : 5.000   Min.   : 729  
##  1st Qu.:117.0   1st Qu.: 7.000   1st Qu.:1053  
##  Median :140.0   Median : 9.000   Median :1258  
##  Mean   :144.9   Mean   : 9.531   Mean   :1327  
##  3rd Qu.:168.5   3rd Qu.:11.250   3rd Qu.:1561  
##  Max.   :194.0   Max.   :15.000   Max.   :2131
str(clock.data)
## 'data.frame':    32 obs. of  3 variables:
##  $ Age    : int  127 115 127 150 156 182 156 132 137 113 ...
##  $ Bidders: int  13 12 7 9 6 11 12 10 9 9 ...
##  $ Price  : int  1235 1080 845 1522 1047 1979 1822 1253 1297 946 ...

Data Analysis

1. Graphically analyze the data and comment on how the age of the clock and the number of bidders are affecting the auctioned selling price.

Plotting the given data yields :

plot(clock.data, pch=16)

A 3-D view of the Data would look like :

scatterplot3d(clock.data)

Looking at the 2D plots, we can answer the following :

  1. Price of the Clock seems to be linearly related to the Age of the Clock.

  2. The Price of the Clock seems to be linearly related to the Number of Bidders on the Clock.

  3. Age of the Clock and Number of Bidders don’t seem to have a strong correlation between each other.

# Regression Plane

s3d <-scatterplot3d(clock.data, pch=16, highlight.3d=TRUE,
  type="h", main="3D Scatterplot")
fit <- lm(Price ~ Age+Bidders,data = clock.data) 
s3d$plane3d(fit)

Looking at the 3D plot allows us to infer the following :

Price of the Clock appears to have a linear relationship with Age of the Clocks and the Number of Bidders.

The immediate next step in our analysis would be to check the Correlation between the different Variables in the data.

cor(clock.data)
##                Age    Bidders     Price
## Age      1.0000000 -0.2537491 0.7302332
## Bidders -0.2537491  1.0000000 0.3946404
## Price    0.7302332  0.3946404 1.0000000

The correlation Matrix simply confirms our inferences from the visual inspection of plots.

Fitting a First Order Model

2. Fit a first order multiple regression model to the data and answer the following based on this model :

Before we proceed with fitting a Full First Order Model, it is important to explore if a Full First Order Model itself is needed to explain the relationship between Price of the Clock and Age of the Clock and/or Number of Bidders for the Clock.

First we fit a Simple Linear Regression Model (SLRM) by considering only Price (Dependent Variable) and Age of the Clock (Regressor).

slrm1 <- lm(Price ~ Age, data=clock.data)
summary(slrm1)
## 
## Call:
## lm(formula = Price ~ Age, data = clock.data)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -485.29 -192.66   30.75  157.21  541.21 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  -191.66     263.89  -0.726    0.473    
## Age            10.48       1.79   5.854  2.1e-06 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 273 on 30 degrees of freedom
## Multiple R-squared:  0.5332, Adjusted R-squared:  0.5177 
## F-statistic: 34.27 on 1 and 30 DF,  p-value: 2.096e-06

We can see that 53.32% of the Total Variability in Price can be explained by Age of the Clock. (Based on R2 value we have obtained)

Similarly, we can fit a SLRM by considering only Number of Bidders as Regressor Variable.

slrm2 <- lm(Price ~ Bidders, data=clock.data)
summary(slrm2)
## 
## Call:
## lm(formula = Price ~ Bidders, data = clock.data)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -516.31 -355.27  -29.49  302.76  688.23 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)   
## (Intercept)   806.40     230.68   3.496  0.00149 **
## Bidders        54.64      23.23   2.352  0.02540 * 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 367.2 on 30 degrees of freedom
## Multiple R-squared:  0.1557, Adjusted R-squared:  0.1276 
## F-statistic: 5.534 on 1 and 30 DF,  p-value: 0.0254

As can be seen from the R2 values, Number of Bidders is able to explain only 15.57% of the variability in Price of the Clock.

Next we can try fitting a Full First Order Multiple Regression Model as follows :

mlrm1 <- lm(Price ~ Age + Bidders, data=clock.data)
summary(mlrm1)
## 
## Call:
## lm(formula = Price ~ Age + Bidders, data = clock.data)
## 
## Residuals:
##    Min     1Q Median     3Q    Max 
## -207.2 -117.8   16.5  102.7  213.5 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept) -1336.7221   173.3561  -7.711 1.67e-08 ***
## Age            12.7362     0.9024  14.114 1.60e-14 ***
## Bidders        85.8151     8.7058   9.857 9.14e-11 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 133.1 on 29 degrees of freedom
## Multiple R-squared:  0.8927, Adjusted R-squared:  0.8853 
## F-statistic: 120.7 on 2 and 29 DF,  p-value: 8.769e-15

As can be observed from the R2 value, the Model with both Age of the Clock and Number of Bidders is explaining 89.27% of the variability in Price.

Here we can conclude that including both Bidders and Age in the Model has allowed us to explain more of the variability in Price as compared to SLRM for Bidders and Age individually.

The next immediate questions that arise from fitting such a Model would be :

  1. What was left unexplained in Price after Age was included in the Model?

  2. How much of this unexplained variability can Bidders explain if it is also included in the Model?

To answer both the above questions, we can take the help of Partial Regression Plots. (Added-Variable Plots)

In the 2 predictor Model, we have Price, Age and Bidders for the Clock. The partial Regression plot for Bidders gives us information on including Bidders in the model, AFTER Age has already been included. It is created based on the Residuals from the following 2 Models :

  1. lm(Price ~ Age)

  2. lm(Bidders ~ Age)

slrm1 <- lm(Price ~ Age, data=clock.data)

#Regression for Bidders on Age
slrm3 <- lm(Bidders ~ Age, data=clock.data)

#Plotting Residuals against each other
plot(slrm3$residuals, slrm1$residuals, pch=16)

If a Linear Relationship is observed here, it is better to include the Bidders as the second predictor.

mlrm.reg.part <- lm(slrm1$residuals ~ slrm3$residuals)
summary(mlrm.reg.part)
## 
## Call:
## lm(formula = slrm1$residuals ~ slrm3$residuals)
## 
## Residuals:
##    Min     1Q Median     3Q    Max 
## -207.2 -117.8   16.5  102.7  213.5 
## 
## Coefficients:
##                  Estimate Std. Error t value Pr(>|t|)    
## (Intercept)     1.347e-14  2.314e+01    0.00        1    
## slrm3$residuals 8.582e+01  8.559e+00   10.03 4.31e-11 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 130.9 on 30 degrees of freedom
## Multiple R-squared:  0.7701, Adjusted R-squared:  0.7625 
## F-statistic: 100.5 on 1 and 30 DF,  p-value: 4.309e-11

The R2 value here indicates that Adding Bidders to the Model (Price ~ Age) is able to explain 77.01% of the variability that was left unexplained by that model

The fitted line in the Partial Regression Plot also gives us the Regression Coefficient for Bidders in the MLRM.

plot(slrm3$residuals, slrm1$residuals, pch=16)
abline(mlrm.reg.part)

Since we have fit a First Order Model, we can proceed with answering the rest of the questions.

a. Is the Model useful?

To answer this, we can check the F-values and the corresponding p-values obtained from our Model.

summary(mlrm1)
## 
## Call:
## lm(formula = Price ~ Age + Bidders, data = clock.data)
## 
## Residuals:
##    Min     1Q Median     3Q    Max 
## -207.2 -117.8   16.5  102.7  213.5 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept) -1336.7221   173.3561  -7.711 1.67e-08 ***
## Age            12.7362     0.9024  14.114 1.60e-14 ***
## Bidders        85.8151     8.7058   9.857 9.14e-11 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 133.1 on 29 degrees of freedom
## Multiple R-squared:  0.8927, Adjusted R-squared:  0.8853 
## F-statistic: 120.7 on 2 and 29 DF,  p-value: 8.769e-15

The low p-values observed for both \(\beta\)0 and \(\beta\)1 is quite low allowing us to conclude that both the values are significant.

We can also proceed with creation of ANOVA Table that allows us to infer if R2 obtained is significant or not.

anova(mlrm1)
## Analysis of Variance Table
## 
## Response: Price
##           Df  Sum Sq Mean Sq F value    Pr(>F)    
## Age        1 2554859 2554859 144.136 8.957e-13 ***
## Bidders    1 1722301 1722301  97.166 9.135e-11 ***
## Residuals 29  514035   17725                      
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

As can be seen, p-values for both Age and Bidders is nearly 0, allowing us to conclude that both \(\beta\)0 and \(\beta\)1 are both significant.

Therefore, we can conclude that the following Model that has been fitted is useful :

Price = -1336.7221 + 12.7362(Age) + 85.8151(Bidders)

b. Given the age of a clock, by what amount can one expect the selling price to go up for one more person participating in the auction?

Using the fitted Model described above, we can say that - for a Clock with given age, an increase of 1 Bidder in the number of Bidders, is associated with an increase of 85.8151? in the Mean Price of the Clock.

  1. An auction house has acquired several grandfather clocks each 100 years old paying an average price of ?500 per clock. From the past experience it has found that such auctions (for antique grandfather clocks) typically attract about 10-12 bidders. What can be said about its expected profit per clock with 95% confidence?

We need to find 95% Confidence Interval for the (Price - 500)? for a clock that is 100 years old and has 10 Bidders.

Effectively we are finding E(Price|Age = 100, Bidders = 10), E(Price|Age = 100, Bidders = 11), and E(Price|Age = 100, Bidders = 12).

#For Bidders = 10
exp.value <- predict(mlrm1, newdata = data.frame(Age = 100, Bidders = 10),interval = "confidence", level = .95)
exp.value[2]-500
## [1] 200.6368
#For Bidders = 11
exp.value <- predict(mlrm1, newdata = data.frame(Age = 100, Bidders = 11),interval = "confidence", level = .95)
exp.value[2]-500
## [1] 287.1706
#For Bidders = 12
exp.value <- predict(mlrm1, newdata = data.frame(Age = 100, Bidders = 12),interval = "confidence", level = .95)
exp.value[2]-500
## [1] 370.3602

The Expected Profit per Clock that is 100 years old and has 10 Bidders, with 95% confidence is 200.6368?

The Expected Profit per Clock that is 100 years old and has 11 Bidders, with 95% confidence is 287.1706?

The Expected Profit per Clock that is 100 years old and has 10 Bidders, with 95% confidence is 370.3602?

c. You walk into an auction selling an antique 150 year old grandfather clock and find that there are 15 bidders (including yourself) participating in the auction. You are extremely keen in acquiring the clock. At least what amount should you bid for the clock, so that, you are 99% certain that nobody else can out-bid you?

For this, we need to predict a lower bound for a Predicted value of Y (\(\hat{Y}\)) for the given values of Age = 150 years and Bidders = 15.

MSE <- mean(mlrm1$residuals^2)

beta_matrix <- as.matrix(mlrm1$coefficients, ncol=1)


new_data <- as.matrix(c(1, 150,15))

prediction_price <- t(new_data) %*% beta_matrix
t.value <- qt(0.99,30)
x <- cbind(clock.data$Age,clock.data$Bidders)
x_h <- matrix(c(150,15), nrow=1,ncol = 2)

val <- x_h %*% solve( t(x) %*% x ) %*% t(x_h)
lower.bound.price <- prediction_price-t.value*sqrt(MSE)*sqrt(1+val)
lower.bound.price
##         [,1]
## [1,] 1533.38

From the above calculations, we can say that if we bid at a Price higher than 1533.38?, we can be 99% certain that no one else can out-bid us.

d. Find the partial correlation coefficients, compare them with the corresponding marginal correlation coefficients, and comment on the nature of the relationships between the independent variables and the dependent variable.

Marginal COrrelation Coefficients can be calculated as follows :

  1. R2Price,Age is the R2 value obtained for model lm(Price ~ Age)

  2. R2Price,Bidders is the R2 value obtained for model lm(Price ~ Bidders)

marginal.corr.coeff.Age <- summary(slrm1)$r.squared
marginal.corr.coeff.Bidders <- summary(slrm2)$r.squared
marginal.corr.coeff.Age
## [1] 0.5332405
marginal.corr.coeff.Bidders
## [1] 0.155741

Marginal Correlation Coefficient for the Model Price ~ Age is : 0.5332405

Marginal Correlation coefficient for the Model Price ~ Bidders is : 0.155741

We can infer that Age alone is able to explain 53.32% of the variability in Price and Bidders alone is able to explain 15.57% of the variability in Price.

For Partial Correlation coefficients, we can proceed to calculate

R2~Price , Age|Bidders~ and R2~Price , Bidders|Age~ as follows :

# Here - SSR(Age, Bidders) = SSR(Age) + SSR(Bidders|Age) - which is actually sequential sum of square output from anova.

first.order.mlrm.full <- lm(Price ~  Age + Bidders, data = clock.data)

# SSR(Age, Bidders)
SSReg.Age.and.Bidders <- sum(anova(first.order.mlrm.full)[-3,2])
SSReg.Age.and.Bidders
## [1] 4277160
# Total sum of square 
SStotal.Age.and.Bidders <- sum(anova(first.order.mlrm.full)[,2])
SStotal.Age.and.Bidders
## [1] 4791194
SSE.Age.and.Bidders <- SStotal.Age.and.Bidders - SSReg.Age.and.Bidders

##Calculation of SSR(Bidders|Age)
slrm1 <- lm(Price ~ Age, data=clock.data)
summary(anova(slrm1))
##        Df            Sum Sq           Mean Sq           F value     
##  Min.   : 1.00   Min.   :2236335   Min.   :  74545   Min.   :34.27  
##  1st Qu.: 8.25   1st Qu.:2315966   1st Qu.: 694623   1st Qu.:34.27  
##  Median :15.50   Median :2395597   Median :1314702   Median :34.27  
##  Mean   :15.50   Mean   :2395597   Mean   :1314702   Mean   :34.27  
##  3rd Qu.:22.75   3rd Qu.:2475228   3rd Qu.:1934780   3rd Qu.:34.27  
##  Max.   :30.00   Max.   :2554859   Max.   :2554859   Max.   :34.27  
##                                                      NA's   :1      
##      Pr(>F)       
##  Min.   :2.1e-06  
##  1st Qu.:2.1e-06  
##  Median :2.1e-06  
##  Mean   :2.1e-06  
##  3rd Qu.:2.1e-06  
##  Max.   :2.1e-06  
##  NA's   :1
SSReg.Age <- sum(anova(slrm1)[-2,2])
#SSReg.Age

SStotal.Age <- sum(anova(slrm1)[,2])
#SStotal.Age

SSE.Age <- SStotal.Age - SSReg.Age

#Calculate SSR(Bidders|Age) <- SSE(Age) - SSE(Age,Bidders)
SSReg.Bidders.given.Age <- SSE.Age - SSE.Age.and.Bidders
#SSReg.Bidders.given.Age

#Calculation of SSR(Age|Bidders)
slrm2 <- lm(Price ~ Bidders, data=clock.data)
SSReg.Bidders <- sum(anova(slrm2)[-2,2])
#SSReg.Bidders

SSTotal.Bidders <- sum(anova(slrm2)[,2])
#SSTotal.Bidders

SSE.Bidders <- SSTotal.Bidders - SSReg.Bidders

#Calculate SSR(Age|Bidders) <- SSE(Bidders) - SSE(Age,Bidders)
SSReg.Age.given.Bidders <- SSE.Bidders - SSE.Age.and.Bidders
#SSReg.Age.given.Bidders


R.squared.Age.given.Bidders <- SSReg.Age.given.Bidders/SSE.Bidders

R.squared.Bidders.given.Age <- SSReg.Bidders.given.Age/SSE.Age

cat("R2 value for Price,Age|Bidders : ",R.squared.Age.given.Bidders,"\nR2 value for Price,Bidders|Age : ",R.squared.Bidders.given.Age)
## R2 value for Price,Age|Bidders :  0.8729213 
## R2 value for Price,Bidders|Age :  0.7701442

We can interpret the above results as follows :

  1. Age is able to explain 87.29% of the variation remaining in Price after including Bidders.

  2. Bidders is able to explain 77.01% of the variation remaining in Price after including Age.

Observing the Marginal and Partial Coefficients of determination, we can conclude that :

  1. Age alone is able to explain more variability in Price compared to Bidders alone.

  2. Inclusion of Age in the model for Price (on top of Bidders) is able to explain more of the remaining variability in Price, as compared to inclusion of Bidders in the model for Price (on top of Age).

f. In presence of the other, which of the two factors, age of the clock or the number of bidders, is more important in determining the selling price of a clock?

To answer this, we first build a standardized First Order Linear Model as follows :

std.clock <- clock.data

std.clock$Price <- (clock.data$Price - mean(clock.data$Price))/sd(clock.data$Price)
std.clock$Age <- (clock.data$Age - mean(clock.data$Age))/sd(clock.data$Age)
std.clock$Bidders <- (clock.data$Bidders - mean(clock.data$Bidders))/sd(clock.data$Bidders)

standard.model <- lm(Price ~ -1 + Age + Bidders,data = std.clock)

summary(standard.model)
## 
## Call:
## lm(formula = Price ~ -1 + Age + Bidders, data = std.clock)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.52699 -0.29976  0.04196  0.26121  0.54305 
## 
## Coefficients:
##         Estimate Std. Error t value Pr(>|t|)    
## Age      0.88752    0.06183   14.36 5.60e-15 ***
## Bidders  0.61985    0.06183   10.03 4.31e-11 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.333 on 30 degrees of freedom
## Multiple R-squared:  0.8927, Adjusted R-squared:  0.8856 
## F-statistic: 124.8 on 2 and 30 DF,  p-value: 2.872e-15
vcov(standard.model)
##                  Age      Bidders
## Age     0.0038223614 0.0009699208
## Bidders 0.0009699208 0.0038223614
val <- (0.88752-0.61985)/sqrt(2*(0.0038223614-0.0009699208))
p.val <- 2*(1-pt(val,30))
cat("p-value for Statistical significance between Coefficients of Age and Bidders",p.val)
## p-value for Statistical significance between Coefficients of Age and Bidders 0.001314188

As can be observed from the calculations above, coefficient of standardized Age of the Clock is higher than the coefficient of standardized Number of Bidders for the Clock. The difference between the two values is also significant as indicated by a p-value of 0.001314. (Considering \(\alpha\) = 5%)

We can conclude that Age of the Clcok is more important in determining the selling price of the Clock compared to Number of Bidders.

3. Is the first order model acceptable? Fit as appropriate a Model as possible for the auctioned selling price of grandfather clocks, based on the information on the age of the clock and the number of bidders, and then based on this model answer the same questions as in 2. b, c, and d above.

To answer whether a Model is acceptable or not, we proceed with plotting the residuals obtained after fitting the model.

For the 1st Order Model we fitted earlier :

#res_analysis(mlrm1, clock.data)
  
sres<-residuals(mlrm1)/(133.1*sqrt(1-influence(mlrm1)$hat))
hist(sres)

boxplot(sres,main="Sres")

plot(mlrm1)

plot(clock.data$Age,sres)

plot(clock.data$Bidders,sres)

normtest(sres)
## Warning: package 'nortest' was built under R version 3.4.1
##                                           Method   P.Value
## 1                    Shapiro-Wilk normality test 0.1522839
## 2                Anderson-Darling normality test 0.2488978
## 3                Cramer-von Mises normality test 0.2872351
## 4 Lilliefors (Kolmogorov-Smirnov) normality test 0.2001644
## 5                 Shapiro-Francia normality test 0.3231334
bptest(mlrm1)
## 
##  studentized Breusch-Pagan test
## 
## data:  mlrm1
## BP = 0.43689, df = 2, p-value = 0.8038

As can be seen from the Residual plots, there are no visible patterns in the Residuals obtained from the fitted model.

It can also be seen from the Normality Tests and the Breusch-Pagan Tests that the Residuals follow a Normal Distribution and are homoscedastic in nature.

Based on the above, we can conclude that the First Order Model we have fitted is Acceptable.

Higher Order Model

We belive that a second order Model might be able to explain the variance in Prices better than the First Order Model already fitted. We can check this by fitting a Second Order Model as follows :

second.mlrm <- lm(Price ~ Age + Bidders + I(Age*Bidders), data = clock.data)
summary(second.mlrm)
## 
## Call:
## lm(formula = Price ~ Age + Bidders + I(Age * Bidders), data = clock.data)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -146.772  -70.985    2.108   47.535  201.959 
## 
## Coefficients:
##                  Estimate Std. Error t value Pr(>|t|)    
## (Intercept)      322.7544   293.3251   1.100  0.28056    
## Age                0.8733     2.0197   0.432  0.66877    
## Bidders          -93.4099    29.7077  -3.144  0.00392 ** 
## I(Age * Bidders)   1.2979     0.2110   6.150 1.22e-06 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 88.37 on 28 degrees of freedom
## Multiple R-squared:  0.9544, Adjusted R-squared:  0.9495 
## F-statistic: 195.2 on 3 and 28 DF,  p-value: < 2.2e-16
#fit <- lm(formula = Price ~ Age + Bidders + I(Age*Bidders),data = clock.data)
fit.aov <- anova(second.mlrm)
tab <- as.table(cbind(
  'SS' = c("SSR(x1, x2, x3)" = sum(fit.aov[1:3, 2]),
         "SSR(x1)"           = fit.aov[1, 2],
         "SSR(x2|x1)"        = fit.aov[2, 2],
         "SSR(x3|x1, x2)"    = fit.aov[3, 2],
         "SSE"               = fit.aov[4, 2],
         "Total"             = sum(fit.aov[, 2])),

  'Df' = c(                    sum(fit.aov[1:3, 1]),
                               fit.aov[1, 1],
                               fit.aov[2, 1],
                               fit.aov[3, 1],
                               fit.aov[4, 1],
                               sum(fit.aov$Df)),

  'MS' = c(                    sum(fit.aov[1:3, 2]) / sum(fit.aov[1:3, 1]),
                               fit.aov[1, 3],
                               fit.aov[2, 3],
                               fit.aov[3, 3],
                               fit.aov[4, 3],
                               NA)
))

round(tab, 2)
##                         SS         Df         MS
## SSR(x1, x2, x3) 4572547.99       3.00 1524182.66
## SSR(x1)         2554859.01       1.00 2554859.01
## SSR(x2|x1)      1722300.69       1.00 1722300.69
## SSR(x3|x1, x2)   295388.28       1.00  295388.28
## SSE              218646.23      28.00    7808.79
## Total           4791194.22      31.00

As can be seen from the R2 value obtained, this Model is able to explain 95.44% of the variance in Price.

a. Is the Model useful?

To check if the Model is useful, we can observe the F-value and p-value for each of the Variables from the ANOVA table below.

anova(second.mlrm)
## Analysis of Variance Table
## 
## Response: Price
##                  Df  Sum Sq Mean Sq F value    Pr(>F)    
## Age               1 2554859 2554859 327.177 < 2.2e-16 ***
## Bidders           1 1722301 1722301 220.559 8.372e-15 ***
## I(Age * Bidders)  1  295388  295388  37.828 1.222e-06 ***
## Residuals        28  218646    7809                      
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

As can be seen, the p values are quite low - indicating that Age, Bidders and interaction between Age and Bidders is significant.

Since this establishes the usefulness of the Model, we can proceed with answering the questions.

b. Given the age of a clock, by what amount can one expect the selling price to go up for one more person participating in the auction?

We can write the fitted Model as :

E(Price) = 322.7544 + 0.8733(Age) - 93.4099(Bidders) + 1.2979(Age * Bidders)

This can be re-written as :

E(Price) = 322.7544 + 0.8733(Age) + (-93.4099 + 1.2979 x Age) x Bidders

For a given Age, the Expected selling Price of a Clock will go up by (-93.4099 + 1.2979 x Age) where Age will be a constant given to us.

c. An auction house has acquired several grandfather clocks each 100 years old paying an average price of ?500 per clock. From the past experience it has found that such auctions (for antique grandfather clocks) typically attract about 10-12 bidders. What can be said about its expected profit per clock with 95% confidence?

We need to find 95% Confidence Interval for the (Price - 500)? for a clock that is 100 years old and has 10 to 12 Bidders.

Effectively we are finding E(Price|Age = 100, Bidders = 10), E(Price|Age = 100, Bidders = 11), and E(Price|Age = 100, Bidders = 12).

#For Bidders = 10
exp.value1 <- predict(second.mlrm, newdata = data.frame(Age = 100, Bidders = 10),interval = "confidence", level = .95)
exp.value1[2] - 500
## [1] 210.7254
#For Bidders = 11
exp.value2 <- predict(second.mlrm, newdata = data.frame(Age = 100, Bidders = 11),interval = "confidence", level = .95)
exp.value2[2]-500
## [1] 243.6869
#For Bidders = 12
exp.value3 <- predict(second.mlrm, newdata = data.frame(Age = 100, Bidders = 12),interval = "confidence", level = .95)
exp.value3[2]-500
## [1] 271.1562

The Expected Profit per Clock that is 100 years old and has 10 Bidders, with 95% confidence is 210.7254?

The Expected Profit per Clock that is 100 years old and has 11 Bidders, with 95% confidence is 243.6869?

The Expected Profit per Clock that is 100 years old and has 12 Bidders, with 95% confidence is 271.1562?

d. You walk into an auction selling an antique 150 year old grandfather clock and find that there are 15 bidders (including yourself) participating in the auction. You are extremely keen in acquiring the clock. At least what amount should you bid for the clock, so that, you are 99% certain that nobody else can out-bid you?

For this, we need to predict a lower bound for a Predicted value of Y (\(\hat{Y}\)) for the given values of Age = 150 years and Bidders = 15.

MSE <- mean(second.mlrm$residuals^2)

beta_matrix <- as.matrix(second.mlrm$coefficients, ncol=1)
beta_matrix
##                         [,1]
## (Intercept)      322.7543531
## Age                0.8732878
## Bidders          -93.4099199
## I(Age * Bidders)   1.2978983
new_data <- as.matrix(c(1, 150,15, 2250))
new_data
##      [,1]
## [1,]    1
## [2,]  150
## [3,]   15
## [4,] 2250
prediction_price <- t(new_data) %*% beta_matrix
t.value <- qt(0.99,30) #df = 30
x <- cbind(clock.data$Age,clock.data$Bidders, clock.data$Age*clock.data$Bidders)
x_h <- matrix(c(150,15,2250), nrow=1,ncol = 3)

val <- x_h %*% solve( t(x) %*% x ) %*% t(x_h)
lower.bound.price <- prediction_price-t.value*sqrt(MSE)*sqrt(1+val)
lower.bound.price
##          [,1]
## [1,] 1750.339

From the above calculations, we can say that if we bid at a Price higher than 1750.339?, we can be 99% certain that no one else can out-bid us.