R Lab Session: Week 2
Multiple Linear Regression
###########################
# READING IN DATA and MLR
###########################
plot <- read.csv('multreg (1).csv')
####################################
# MULTIPLE LINEAR REGRESSION MODEL
####################################
plot$Pool <- as.factor(plot$Pool)
fit <- lm(Price ~ PlotSize * FloorArea + Trees + Distance + Pool,
data=plot)
summary(fit)
Call:
lm(formula = Price ~ PlotSize * FloorArea + Trees + Distance +
    Pool, data = plot)
Residuals:
     Min       1Q   Median       3Q      Max
-2236.21 -1017.96   -63.39   961.08  2746.10
Coefficients:
                    Estimate Std. Error t value Pr(>|t|)
(Intercept)         4580.868   3955.676   1.158   0.2520
PlotSize           -1089.536   1133.065  -0.962   0.3406
FloorArea             -4.308      7.508  -0.574   0.5686
Trees                 20.218      7.373   2.742   0.0083 **
Distance             -28.877     18.441  -1.566   0.1233
PoolYes              556.901    331.834   1.678   0.0992 .
PlotSize:FloorArea     1.889      1.899   0.995   0.3245
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 1240 on 53 degrees of freedom
Multiple R-squared:  0.3114,  Adjusted R-squared:  0.2335
F-statistic: 3.996 on 6 and 53 DF,  p-value: 0.002263
confint(fit)
                          2.5 %       97.5 %
(Intercept)        -3353.213721 12514.949265
PlotSize           -3362.176657  1183.104428
FloorArea            -19.367116    10.751992
Trees                  5.430623    35.005380
Distance             -65.864202     8.110505
PoolYes             -108.673910  1222.475267
PlotSize:FloorArea    -1.920375     5.697525
########################
# CORRELATION MATRIX
########################
plot_numeric <- plot[, c(1,2,3,4,5)]
cor(plot_numeric)
               Price   PlotSize  FloorArea      Trees    Distance
Price      1.0000000  0.2999366  0.3777987 0.38911228 -0.23047264
PlotSize   0.2999366  1.0000000  0.8260654 0.28226271 -0.19443280
FloorArea  0.3777987  0.8260654  1.0000000 0.36211823 -0.19657973
Trees      0.3891123  0.2822627  0.3621182 1.00000000  0.07952912
Distance  -0.2304726 -0.1944328 -0.1965797 0.07952912  1.00000000
Question 1
What is the estimated slope regression coefficient related to PlotSize? [answer to 2 decimal places]
\(-1089.54\)
Question 2
What would the intercept of the estimated regression model equation be for properties without a pool? [answer to 2 decimal places]
fit_numeric <- lm(Price ~ PlotSize * FloorArea + Trees + Distance,
data=plot)
summary(fit_numeric)
Call:
lm(formula = Price ~ PlotSize * FloorArea + Trees + Distance,
    data = plot)
Residuals:
    Min      1Q  Median      3Q     Max
-1866.4 -1050.8  -139.0   916.7  2507.1
Coefficients:
                    Estimate Std. Error t value Pr(>|t|)
(Intercept)         4892.972   4017.212   1.218   0.2285
PlotSize           -1255.629   1147.562  -1.094   0.2787
FloorArea             -3.803      7.627  -0.499   0.6201
Trees                 18.636      7.434   2.507   0.0152 *
Distance             -29.534     18.744  -1.576   0.1210
PlotSize:FloorArea     2.073      1.927   1.075   0.2870
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 1261 on 54 degrees of freedom
Multiple R-squared:  0.2749,  Adjusted R-squared:  0.2077
F-statistic: 4.094 on 5 and 54 DF,  p-value: 0.003182
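\(4580.87\). Pool enters the full model as the dummy PoolYes, so properties without a pool are the reference level and their intercept is the intercept of the full model. Refitting without Pool, as above, estimates a different model (intercept \(4892.97\)), not the no-pool intercept of the original one.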
Question 3
What is the estimated slope regression coefficient related to Distance? [answer to 2 decimal places]
\(-28.88\)
Question 4
What would the intercept of the estimated regression model equation be for properties with a pool?
\(4580.868 + 556.901 = 5137.77\)
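The intercepts asked about in Questions 2 and 4 follow from how R codes the Pool factor: the coefficient is labelled PoolYes, so "no pool" is the reference level. A minimal sketch, with the two estimates typed in from the summary(fit) output instead of taken from a live fit (the names `b`, `intercept_no_pool` and `intercept_pool` are illustrative only):

```r
# Estimates copied by hand from the summary(fit) output (assumption: no live fit here)
b <- c("(Intercept)" = 4580.868, PoolYes = 556.901)

# Reference level (no pool): the dummy equals 0, so only the intercept remains
intercept_no_pool <- unname(b["(Intercept)"])

# Pool present: the dummy equals 1, so the PoolYes estimate shifts the intercept
intercept_pool <- unname(b["(Intercept)"] + b["PoolYes"])

round(c(no_pool = intercept_no_pool, pool = intercept_pool), 2)
```

With a live model, the same quantities would come from `coef(fit)`.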
Question 5
What percentage of variation in Price is explained by the variation in the independent variables, without making any sort of adjustment for the model complexity? (Do not add a percentage % symbol, just provide the numeric value).
\(31.14\)
Question 6
What percentage of variation in Price is explained by the variation in the independent variables, when adjusting for model complexity? (Do not add a percentage % symbol, just provide the numeric value).
\(23.35\)
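The two percentages in Questions 5 and 6 are linked by the usual adjustment formula, adjusted R-squared = 1 - (1 - R-squared)(n - 1)/(n - p - 1). A rough check using the rounded values printed by summary(fit); the sample size n = 60 is inferred from 53 residual degrees of freedom plus 7 estimated coefficients, and the variable names are illustrative:

```r
# Values copied from the summary(fit) output (rounded, so the check is approximate)
r2 <- 0.3114   # Multiple R-squared
n  <- 60       # inferred: 53 residual df + 7 estimated coefficients
p  <- 6        # slope terms, counting the dummy and the interaction

# Penalise R-squared for the number of terms in the model
adj_r2 <- 1 - (1 - r2) * (n - 1) / (n - p - 1)

round(100 * c(unadjusted = r2, adjusted = adj_r2), 2)
```

Because the input R-squared is already rounded to four decimals, the last digit may differ slightly from the value R prints.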
Question 7
If all other independent variables remain unchanged, the estimated Price (in thousands of rands) decreases by ___________ for every additional hectare in the size of the plot. [2 decimal places]
\(1089.54\)
Question 8
If all other independent variables remain unchanged, the estimated Price (in thousands of rands) increases by __________ for every additional tree present on the plot. [2 decimal places]
\(20.22\)
Question 9
What is the value of the test statistic related to the hypothesis test for checking the overall significance of this fitted multiple linear regression model? Provide the answer with three decimal places.
\(3.996\)
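The overall F-statistic can be recovered from R-squared and the degrees of freedom, F = (R-squared / p) / ((1 - R-squared) / (n - p - 1)). A sketch using the rounded figures from summary(fit), so agreement is only to about two decimals; the variable names are illustrative:

```r
# Rounded values copied from the summary(fit) output
r2 <- 0.3114
n  <- 60   # inferred from 53 residual df + 7 coefficients
p  <- 6    # numerator degrees of freedom

# Explained variation per model df over unexplained variation per residual df
f_stat <- (r2 / p) / ((1 - r2) / (n - p - 1))

# Upper-tail probability under the F(6, 53) distribution
p_value <- pf(f_stat, df1 = p, df2 = n - p - 1, lower.tail = FALSE)

c(F = f_stat, p_value = p_value)
```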
Question 10
How many independent variables could be considered to have a significant linear relationship with the outcome of interest (Price), assuming a 5% level of significance?
\(1\)
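The count in Question 10 amounts to comparing each slope's p-value with 0.05. A small sketch with the Pr(>|t|) column typed in from summary(fit) (the intercept is left out because it is not an independent variable; the names `p_values` and `significant` are illustrative):

```r
# p-values copied from the Pr(>|t|) column of summary(fit), intercept excluded
p_values <- c(PlotSize = 0.3406, FloorArea = 0.5686, Trees = 0.0083,
              Distance = 0.1233, PoolYes = 0.0992,
              "PlotSize:FloorArea" = 0.3245)

alpha <- 0.05

# Keep the terms whose p-value falls below the significance level
significant <- names(p_values)[p_values < alpha]

significant
length(significant)
```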
Question 11
What is the value of the test statistic related to the hypothesis test that determines whether a significant linear relationship exists between Price and Trees? [2 decimal places]
\(2.74\)
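The t value for Trees is the estimate divided by its standard error, and the two-sided p-value comes from the t distribution on the residual degrees of freedom. A sketch with the rounded numbers from summary(fit); the variable names are illustrative:

```r
# Trees row of the coefficient table, copied from summary(fit)
estimate  <- 20.218
std_error <- 7.373
df_resid  <- 53

# Test statistic for H0: the Trees slope equals 0
t_value <- estimate / std_error

# Two-sided p-value from the t distribution with 53 degrees of freedom
p_value <- 2 * pt(abs(t_value), df = df_resid, lower.tail = FALSE)

round(c(t = t_value, p = p_value), 4)
```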
Question 12
The upper bound of the 95% confidence interval related to the estimated slope regression coefficient for Trees is ____________. [2 decimal places]
\(35.01\)
Question 13
The lower bound of the 95% confidence interval related to the estimated slope regression coefficient for the interaction term (between PlotSize and FloorArea) is _____________. [2 decimal places]
\(-1.92\)
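The bounds reported by confint(fit) in Questions 12 and 13 are estimate plus or minus a t critical value times the standard error. A sketch reproducing them from the rounded coefficient table, so agreement is to about two decimals; the names `est`, `se`, `t_crit` and `ci` are illustrative:

```r
# Estimates and standard errors copied from summary(fit)
est <- c(Trees = 20.218, "PlotSize:FloorArea" = 1.889)
se  <- c(Trees = 7.373,  "PlotSize:FloorArea" = 1.899)

# Critical value for a 95% interval on 53 residual degrees of freedom
t_crit <- qt(0.975, df = 53)

# Lower and upper bounds, one row per coefficient
ci <- cbind(lower = est - t_crit * se, upper = est + t_crit * se)

round(ci, 2)
```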
Question 14
Within the context of performing an appropriate hypothesis test, there is some evidence to suggest that the model is overall significant.
True
False
True
Question 15
Referring to an appropriate value, there is some evidence to suggest that this model is a very good fit.
True
False
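False. \(R^2 = 0.3114\) (adjusted \(0.2335\)), so the model explains less than a third of the variation in Price; \(\text{RSE}=1240\).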
Question 16
The unadjusted coefficient of determination never decreases in value when a new independent variable is added to the multiple linear regression model.
True
False
True
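The statement in Question 16 can be illustrated by adding a pure-noise predictor to a model and comparing R-squared before and after. A minimal sketch on simulated data (not the lab data set; the seed, sample size and variable names are arbitrary assumptions):

```r
set.seed(1)  # arbitrary seed; the data below are simulated, not the lab data

n     <- 60
x1    <- rnorm(n)
noise <- rnorm(n)                 # generated independently of y
y     <- 2 + 3 * x1 + rnorm(n)

fit_small <- lm(y ~ x1)
fit_big   <- lm(y ~ x1 + noise)   # same model plus one extra predictor

r2_small <- summary(fit_small)$r.squared
r2_big   <- summary(fit_big)$r.squared

# Unadjusted R-squared for the nested and the larger model
c(r2_small = r2_small, r2_big = r2_big)

# Adjusted R-squared is penalised for the extra term and may go either way
c(adj_small = summary(fit_small)$adj.r.squared,
  adj_big   = summary(fit_big)$adj.r.squared)
```

Least squares can always set the new coefficient to zero, so the larger model can never fit worse than the nested one.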
Question 17
Linear relationship
Errors are normally distributed with a mean of \(0\)
Constant error variance
Errors are independent
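The four assumptions listed above are usually checked through the residuals of the fitted model. A rough sketch of some numeric checks on simulated data (the data, seed and names such as `fit_sim` are assumptions for illustration; on the lab model the same calls would be applied to `fit`):

```r
set.seed(2)  # arbitrary seed; simulated data that satisfy the assumptions

n  <- 60
x1 <- runif(n, 0, 10)
x2 <- runif(n, 0, 5)
y  <- 1 + 2 * x1 - x2 + rnorm(n, sd = 1.5)

fit_sim <- lm(y ~ x1 + x2)
res     <- resid(fit_sim)

# Mean of 0: least-squares residuals average to zero when an intercept is fitted
mean_res <- mean(res)

# Normality: Shapiro-Wilk test, H0 is that the residuals are normal
normality <- shapiro.test(res)

# Constant variance: crude check for a trend in residual spread across fitted values
spread_trend <- cor(abs(res), fitted(fit_sim))

# Independence: lag-1 autocorrelation of the residuals in row order
lag1_acf <- acf(res, plot = FALSE)$acf[2]

# Linearity and the checks above are also commonly inspected visually with
# plot(fit_sim, which = 1:2)
c(mean_res = mean_res, shapiro_p = normality$p.value,
  spread_trend = spread_trend, lag1_acf = lag1_acf)
```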