Question 1: regression model with 8 variables
##
## Call:
## lm(formula = logC ~ PT + CT + logN + logS + D + NE + logT2 +
## PR, data = data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.29131 -0.09935 0.02178 0.09351 0.24800
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -15.22561 3.37328 -4.514 0.000156 ***
## PT -0.21572 0.11451 -1.884 0.072280 .
## CT 0.11462 0.06227 1.841 0.078631 .
## logN -0.07873 0.04249 -1.853 0.076751 .
## logS 0.68246 0.12805 5.330 2.07e-05 ***
## D 0.22722 0.04394 5.171 3.06e-05 ***
## NE 0.25895 0.07379 3.509 0.001886 **
## logT2 0.30186 0.22833 1.322 0.199155
## PR -0.09336 0.07022 -1.330 0.196709
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.1578 on 23 degrees of freedom
## Multiple R-squared: 0.8706, Adjusted R-squared: 0.8256
## F-statistic: 19.34 on 8 and 23 DF, p-value: 1.709e-08
From the output above, the estimate and standard error for the variable PT are -0.21572 and 0.11451, respectively.
Using this model, we can calculate the percentage reduction in cost for nuclear plants with a partial turnkey guarantee relative to those without one, holding the other variables fixed. Because the response is logC, the reduction is 1 - exp(-0.21572), which is about 19.4%, as shown below.
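# regress8 is the fitted 8-variable model summarised above
x <- coef(regress8)["PT"]   # estimated PT coefficient (-0.21572)
y <- 1 - exp(x)             # proportional cost reduction (0.194 = 19.4%)
names(y) <- "reduced cost (by %)"
y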
## reduced cost (by %)
## 0.1940423
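The 19.4% figure follows from the log-scale response. A short derivation, assuming PT is a 0/1 indicator and logC is the natural log of cost (neither is stated explicitly in the output above), with all other variables held fixed:

$$
\log C_{PT=1} - \log C_{PT=0} = \hat\beta_{PT}
\quad\Longrightarrow\quad
\frac{C_{PT=1}}{C_{PT=0}} = e^{\hat\beta_{PT}} = e^{-0.21572} \approx 0.806
$$

$$
\text{relative reduction} = 1 - e^{\hat\beta_{PT}} = 1 - e^{-0.21572} \approx 0.194
$$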
Question 2: regression model with 6 variables
##
## Call:
## lm(formula = logC ~ PT + CT + logN + logS + D + NE, data = data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.32721 -0.07620 0.02920 0.08115 0.28946
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -13.26031 3.13950 -4.224 0.000278 ***
## PT -0.22610 0.11355 -1.991 0.057490 .
## CT 0.14039 0.06042 2.323 0.028582 *
## logN -0.08758 0.04147 -2.112 0.044891 *
## logS 0.72341 0.11882 6.088 2.31e-06 ***
## D 0.21241 0.04326 4.910 4.70e-05 ***
## NE 0.24902 0.07414 3.359 0.002510 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.1592 on 25 degrees of freedom
## Multiple R-squared: 0.8569, Adjusted R-squared: 0.8225
## F-statistic: 24.95 on 6 and 25 DF, p-value: 2.058e-09
Question 3: make some plots
Plot 1: Residuals vs. fitted values
Plot 2: Residuals vs. logN
Plot 3: Residuals vs. normal order statistics (normal Q-Q plot)
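The three plots listed above could be produced along the following lines. This is a minimal sketch, not the code used for the report: `plot_diagnostics` is a hypothetical helper, and because the nuclear plant data are not shown here, the example runs on R's built-in `mtcars` data as a stand-in.

```r
# Hypothetical helper: draws the three residual diagnostics for any lm fit.
# `x` is the explanatory variable to plot the residuals against,
# `xlab` is the label used for that variable.
plot_diagnostics <- function(fit, x, xlab) {
  res <- resid(fit)
  old <- par(mfrow = c(1, 3))   # three panels side by side
  on.exit(par(old))             # restore the previous layout on exit

  # Plot 1: residuals vs. fitted values
  plot(fitted(fit), res, xlab = "Fitted value", ylab = "Residual",
       main = "Residuals vs. fitted values")
  abline(h = 0, lty = 2)

  # Plot 2: residuals vs. one explanatory variable
  plot(x, res, xlab = xlab, ylab = "Residual",
       main = paste("Residuals vs.", xlab))
  abline(h = 0, lty = 2)

  # Plot 3: residuals vs. normal order statistics (normal Q-Q plot)
  qqnorm(res, main = "Normal Q-Q plot of residuals")
  qqline(res)

  invisible(res)
}

# Stand-in example (assumption): mtcars, NOT the data used in this report
fit <- lm(log(mpg) ~ log(wt) + log(hp), data = mtcars)
plot_diagnostics(fit, log(mtcars$wt), "log(wt)")
```

With the report's own objects, the call would look like `plot_diagnostics(regress6, data$logN, "logN")`, where `regress6` is an assumed name for the 6-variable fit.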
Question 4: some explanations for the plots
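The first two plots check that the residuals show no systematic dependence on the fitted values or on logN, that is, that the mean structure is adequate and the residual variance is roughly constant.
The Q-Q plot checks whether the residuals are normally distributed.
When the assumptions are met, the residuals plotted against the fitted values and against the explanatory variable (logN) should scatter randomly around zero with no visible pattern, and the points in the Q-Q plot should lie close to a straight line with no obvious outliers.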
Question 5: regression model with 6 variables and an interaction
##
## Call:
## lm(formula = logC ~ PT + CT + logN + Z + D + NE + Z:PT, data = data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.32866 -0.05714 0.02067 0.07979 0.29282
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -13.08645 3.23858 -4.041 0.000475 ***
## PT -2.18759 5.85357 -0.374 0.711895
## CT 0.13998 0.06154 2.275 0.032156 *
## logN -0.08683 0.04229 -2.053 0.051102 .
## Z 0.71761 0.12222 5.872 4.68e-06 ***
## D 0.21044 0.04444 4.735 8.14e-05 ***
## NE 0.24841 0.07551 3.290 0.003088 **
## PT:Z 0.29159 0.87002 0.335 0.740418
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.1621 on 24 degrees of freedom
## Multiple R-squared: 0.8575, Adjusted R-squared: 0.816
## F-statistic: 20.64 on 7 and 24 DF, p-value: 1.033e-08
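With the interaction term, the PT effect is no longer a single number but depends on Z. A worked reading of the coefficients above, assuming PT is a 0/1 indicator and holding the other variables fixed:

$$
\widehat{\log C}\,\big|_{PT=1} - \widehat{\log C}\,\big|_{PT=0}
= \hat\beta_{PT} + \hat\beta_{PT:Z}\, Z
= -2.18759 + 0.29159\, Z
$$

Equivalently, the fitted slope on Z is 0.71761 when PT = 0 and 0.71761 + 0.29159 = 1.00920 when PT = 1. Both the PT and PT:Z estimates have large standard errors in the output above (p-values of about 0.71 and 0.74), so these values are very imprecise.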
Question 6: a nicer-looking table containing only the estimates and standard errors
| | Estimate | Std. Error |
|---|---|---|
| (Intercept) | -13.09 | 3.24 |
| PT | -2.19 | 5.85 |
| CT | 0.14 | 0.06 |
| logN | -0.09 | 0.04 |
| Z | 0.72 | 0.12 |
| D | 0.21 | 0.04 |
| NE | 0.25 | 0.08 |
| PT:Z | 0.29 | 0.87 |
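
A table like the one above can be built by keeping only the first two columns of the coefficient matrix and rounding. The sketch below is one possible way to do it, not necessarily how this table was produced: `coef_table` is a hypothetical helper, and the fit uses R's built-in `mtcars` data as a stand-in for the report's model.

```r
# Hypothetical helper: estimates and standard errors only, rounded.
coef_table <- function(fit, digits = 2) {
  est <- coef(summary(fit))[, c("Estimate", "Std. Error"), drop = FALSE]
  round(est, digits)
}

# Stand-in example (assumption): mtcars, NOT the data used in this report
fit <- lm(log(mpg) ~ log(wt) + log(hp), data = mtcars)
tab <- coef_table(fit)
print(tab)

# Optional: render as a markdown table when the knitr package is installed
if (requireNamespace("knitr", quietly = TRUE)) {
  print(knitr::kable(tab))
}
```

Rounding the matrix before formatting keeps the printed and the rendered versions of the table consistent.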