dat <- read.csv(file.choose())  # choose the data file interactively
obs <- dat$y                    # response: clathrate formation
x1 <- dat$x1
x2 <- dat$x2

a)

model <- lm(obs~x1+x2+x1:x2, data = dat)
summary(model)
## 
## Call:
## lm(formula = obs ~ x1 + x2 + x1:x2, data = dat)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -7.0753 -3.6781  0.4395  3.1321  8.8448 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  12.50128    1.89347   6.602 1.92e-07 ***
## x1          256.73740   73.72914   3.482  0.00146 ** 
## x2            0.09879    0.01193   8.281 1.84e-09 ***
## x1:x2         0.76127    0.51026   1.492  0.14551    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 4.696 on 32 degrees of freedom
## Multiple R-squared:  0.8518, Adjusted R-squared:  0.8379 
## F-statistic: 61.31 on 3 and 32 DF,  p-value: 2.318e-13

This is the model with the interaction term, relating clathrate formation to surfactant and time.
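As an aside, R's formula syntax offers a shorthand for this model: `x1 * x2` expands to `x1 + x2 + x1:x2`. The sketch below checks this on simulated data. The data frame `sim` and its values are hypothetical and are not the clathrate data.

```r
# Hypothetical simulated data, used only to compare the two formula spellings
set.seed(1)
n <- 36
sim <- data.frame(x1 = runif(n), x2 = runif(n, 0, 100))
sim$y <- 10 + 5 * sim$x1 + 0.1 * sim$x2 + rnorm(n)

fit_explicit <- lm(y ~ x1 + x2 + x1:x2, data = sim)  # spelled out, as in part a
fit_star     <- lm(y ~ x1 * x2, data = sim)          # shorthand

# Same terms and same estimates
print(names(coef(fit_star)))
print(all.equal(coef(fit_explicit), coef(fit_star)))
```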

b)

plot(model, 2)  # normal Q-Q plot of the residuals

The normal Q-Q plot suggests that the normality assumption is reasonably satisfied.
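A formal test can complement the visual check. The sketch below applies the Shapiro-Wilk test (`shapiro.test()` from base R's `stats` package) to the residuals of an interaction model. To keep it self-contained, it uses hypothetical simulated data in place of `dat`. With the real data, `shapiro.test(residuals(model))` is the corresponding call.

```r
# Hypothetical simulated data standing in for `dat`
set.seed(2)
n <- 36
sim <- data.frame(x1 = runif(n), x2 = runif(n, 0, 100))
sim$y <- 10 + 5 * sim$x1 + 0.1 * sim$x2 + rnorm(n)
fit <- lm(y ~ x1 * x2, data = sim)

# Shapiro-Wilk test of H0: the residuals come from a normal distribution
sw <- shapiro.test(residuals(fit))
print(sw$p.value)  # a large p-value gives no evidence against normality
```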

plot(model, 1)  # residuals versus fitted values

This plot suggests that the constant-variance assumption may be violated, so we check whether a Box-Cox transformation of the response is needed.
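A numeric check for non-constant variance is the studentized (Koenker) Breusch-Pagan test. It regresses the squared residuals on the model's regressors and compares n·R² with a χ² distribution whose degrees of freedom equal the number of regressors. The sketch below computes it by hand in base R on hypothetical simulated data whose error spread grows with `x2`. The `lmtest` package's `bptest()` offers a packaged version, but that package is not used here.

```r
# Hypothetical simulated data: error SD increases with x2 (heteroscedastic)
set.seed(3)
n <- 36
sim <- data.frame(x1 = runif(n), x2 = runif(n, 0, 100))
sim$y <- 10 + 5 * sim$x1 + 0.1 * sim$x2 + rnorm(n, sd = 0.05 * sim$x2 + 0.5)
fit <- lm(y ~ x1 * x2, data = sim)

# Studentized Breusch-Pagan statistic: n * R^2 from the auxiliary regression
e2  <- residuals(fit)^2
aux <- lm(e2 ~ x1 * x2, data = sim)
bp  <- n * summary(aux)$r.squared
k   <- length(coef(aux)) - 1          # regressors, excluding the intercept
p   <- pchisq(bp, df = k, lower.tail = FALSE)
print(c(statistic = bp, df = k, p.value = p))  # small p suggests non-constant variance
```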

library(MASS)
boxcox(model)

Since λ = 1 lies within the 95% confidence interval for λ shown in the Box-Cox plot, no transformation of the response is needed.
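The interval can also be read off numerically. With `plotit = FALSE`, `MASS::boxcox()` returns the λ grid (`x`) and the profile log-likelihood (`y`). The 95% interval consists of the λ values whose log-likelihood lies within `qchisq(0.95, 1) / 2` of the maximum, which is the horizontal cut-off drawn on the plot. This sketch uses hypothetical simulated data with a strictly positive response, as Box-Cox requires.

```r
library(MASS)  # provides boxcox()

# Hypothetical simulated data with a strictly positive response
set.seed(4)
n <- 36
sim <- data.frame(x1 = runif(n), x2 = runif(n, 0, 100))
sim$y <- 20 + 5 * sim$x1 + 0.1 * sim$x2 + rnorm(n)

bc <- boxcox(y ~ x1 * x2, data = sim,
             lambda = seq(-2, 2, by = 0.01), plotit = FALSE)

# 95% interval for lambda (truncated to the grid if it extends past [-2, 2])
cutoff     <- max(bc$y) - qchisq(0.95, df = 1) / 2
ci         <- range(bc$x[bc$y > cutoff])
lambda_hat <- bc$x[which.max(bc$y)]
print(c(lambda_hat = lambda_hat, lower = ci[1], upper = ci[2]))
print(ci[1] <= 1 && 1 <= ci[2])  # TRUE means no transformation is indicated
```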

c)

model1 <- lm(obs~x1+x2, data = dat)
anova(model,model1)
## Analysis of Variance Table
## 
## Model 1: obs ~ x1 + x2 + x1:x2
## Model 2: obs ~ x1 + x2
##   Res.Df    RSS Df Sum of Sq      F Pr(>F)
## 1     32 705.66                           
## 2     33 754.74 -1   -49.084 2.2259 0.1455

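Since the p-value (0.1455) is greater than 0.05, we fail to reject the null hypothesis that the interaction coefficient is zero, so the interaction term can be dropped. This agrees with the t-test for x1:x2 in part a, which gives the same p-value.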
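The partial F statistic in the ANOVA table can be reproduced from the two residual sums of squares: F = ((RSS_reduced − RSS_full) / q) / (RSS_full / df_full), with q = 1 dropped term. The sketch below plugs in the RSS values printed above. Because only one term is dropped, F also equals the square of that term's t value from part a. Small differences from the table are due to the rounding of the printed RSS.

```r
# Values copied from the anova() and summary() output above
rss_full    <- 705.66   # RSS of the model with x1:x2 (32 residual df)
rss_reduced <- 754.74   # RSS of the model without x1:x2 (33 residual df)
df_full     <- 32
q           <- 1        # number of coefficients dropped

F_stat <- ((rss_reduced - rss_full) / q) / (rss_full / df_full)
p_val  <- pf(F_stat, q, df_full, lower.tail = FALSE)
print(c(F = F_stat, p = p_val))   # about 2.226 and 0.1455

# With one dropped term, the partial F equals the square of its t value
t_int <- 1.492
print(t_int^2)                    # about 2.226
```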

d)

Based on the conclusion of part c, the reduced (additive) model is:

summary(model1)
## 
## Call:
## lm(formula = obs ~ x1 + x2, data = dat)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -9.7716 -4.1656  0.0802  3.8323  8.3349 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 1.109e+01  1.669e+00   6.642 1.48e-07 ***
## x1          3.501e+02  3.968e+01   8.823 3.38e-10 ***
## x2          1.089e-01  9.983e-03  10.912 1.74e-12 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 4.782 on 33 degrees of freedom
## Multiple R-squared:  0.8415, Adjusted R-squared:  0.8319 
## F-statistic:  87.6 on 2 and 33 DF,  p-value: 6.316e-14

Since the p-values for both predictors and for the intercept are all below 0.05, every term is significant, and this is our final model.
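Written out, the fitted final model uses the estimates from `summary(model1)` above, rounded to four significant figures:

```latex
\hat{y} = 11.09 + 350.1\,x_1 + 0.1089\,x_2
```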