R Lab Session: Week 2

Multiple Linear Regression

####################################
# READING IN DATA
####################################

plot <- read.csv('multreg (1).csv')

####################################
# MULTIPLE LINEAR REGRESSION MODEL
####################################

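# Treat Pool as a categorical predictor so that lm() creates a dummy variable
plot$Pool <- as.factor(plot$Pool)

# PlotSize * FloorArea expands to both main effects plus their interaction
fit <- lm(Price ~ PlotSize * FloorArea + Trees + Distance + Pool,
          data = plot)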

summary(fit)

Call:
lm(formula = Price ~ PlotSize * FloorArea + Trees + Distance + 
    Pool, data = plot)

Residuals:
     Min       1Q   Median       3Q      Max 
-2236.21 -1017.96   -63.39   961.08  2746.10 

Coefficients:
                    Estimate Std. Error t value Pr(>|t|)   
(Intercept)         4580.868   3955.676   1.158   0.2520   
PlotSize           -1089.536   1133.065  -0.962   0.3406   
FloorArea             -4.308      7.508  -0.574   0.5686   
Trees                 20.218      7.373   2.742   0.0083 **
Distance             -28.877     18.441  -1.566   0.1233   
PoolYes              556.901    331.834   1.678   0.0992 . 
PlotSize:FloorArea     1.889      1.899   0.995   0.3245   
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 1240 on 53 degrees of freedom
Multiple R-squared:  0.3114,    Adjusted R-squared:  0.2335 
F-statistic: 3.996 on 6 and 53 DF,  p-value: 0.002263

confint(fit)
                          2.5 %       97.5 %
(Intercept)        -3353.213721 12514.949265
PlotSize           -3362.176657  1183.104428
FloorArea            -19.367116    10.751992
Trees                  5.430623    35.005380
Distance             -65.864202     8.110505
PoolYes             -108.673910  1222.475267
PlotSize:FloorArea    -1.920375     5.697525

########################
# CORRELATION MATRIX
########################

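# Keep only the numeric columns (Pool is a factor and cannot be passed to cor())
plot_numeric <- plot[, c("Price", "PlotSize", "FloorArea", "Trees", "Distance")]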
cor(plot_numeric)
               Price   PlotSize  FloorArea      Trees    Distance
Price      1.0000000  0.2999366  0.3777987 0.38911228 -0.23047264
PlotSize   0.2999366  1.0000000  0.8260654 0.28226271 -0.19443280
FloorArea  0.3777987  0.8260654  1.0000000 0.36211823 -0.19657973
Trees      0.3891123  0.2822627  0.3621182 1.00000000  0.07952912
Distance  -0.2304726 -0.1944328 -0.1965797 0.07952912  1.00000000

Question 1

What is the estimated slope regression coefficient related to PlotSize? [answer to 2 decimal places]

\(-1089.54\)

Question 2

What would the intercept of the estimated regression model equation be for properties without a pool? [answer to 2 decimal places]

fit_numeric <- lm(Price ~ PlotSize * FloorArea + Trees + Distance, 
          data=plot)
summary(fit_numeric)

Call:
lm(formula = Price ~ PlotSize * FloorArea + Trees + Distance, 
    data = plot)

Residuals:
    Min      1Q  Median      3Q     Max 
-1866.4 -1050.8  -139.0   916.7  2507.1 

Coefficients:
                    Estimate Std. Error t value Pr(>|t|)  
(Intercept)         4892.972   4017.212   1.218   0.2285  
PlotSize           -1255.629   1147.562  -1.094   0.2787  
FloorArea             -3.803      7.627  -0.499   0.6201  
Trees                 18.636      7.434   2.507   0.0152 *
Distance             -29.534     18.744  -1.576   0.1210  
PlotSize:FloorArea     2.073      1.927   1.075   0.2870  
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 1261 on 54 degrees of freedom
Multiple R-squared:  0.2749,    Adjusted R-squared:  0.2077 
F-statistic: 4.094 on 5 and 54 DF,  p-value: 0.003182

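Refitting without Pool gives a different model, so its intercept (\(4892.97\)) does not answer the question. In the full model fit, the reported dummy variable is PoolYes, so properties without a pool are the reference level and their intercept is the (Intercept) estimate of that model:

\(4580.87\)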

Question 3

What is the estimated slope regression coefficient related to Distance? [answer to 2 decimal places]

\(-28.88\)

Question 4

What would the intercept of the estimated regression model equation be for properties with a pool?

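For properties with a pool, the PoolYes coefficient is added to the intercept of the full model: \(4580.868 + 556.901 = 5137.769\).

\(5137.77\)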
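The intercepts asked for in Questions 2 and 4 follow from R's default treatment (dummy) coding of the factor Pool: the reference level is absorbed into (Intercept), and PoolYes shifts the intercept for properties with a pool. A minimal sketch, assuming the reference level is the "no pool" group (suggested by the dummy being labelled PoolYes) and using estimates copied by hand from the summary(fit) output; the variable names are illustrative only:

```r
# Estimates copied from the summary(fit) output above
b_intercept <- 4580.868   # (Intercept): baseline for the reference level of Pool
b_pool_yes  <- 556.901    # PoolYes: shift in the intercept when a pool is present

intercept_no_pool   <- b_intercept
intercept_with_pool <- b_intercept + b_pool_yes

round(c(no_pool = intercept_no_pool, with_pool = intercept_with_pool), 2)
```

With the fitted object in the session, the same sum can be read off as coef(fit)[["(Intercept)"]] + coef(fit)[["PoolYes"]].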

Question 5

What percentage of variation in Price is explained by the variation in the independent variables, without making any sort of adjustment for the model complexity? (Do not add a percentage % symbol, just provide the numeric value).

\(31.14\)

Question 6

What percentage of variation in Price is explained by the variation in the independent variables, when adjusting for model complexity? (Do not add a percentage % symbol, just provide the numeric value).

\(23.35\)
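The link between the two values in Questions 5 and 6 can be checked by hand. The sketch below applies the usual adjustment \(1 - (1 - R^2)(n - 1)/(n - k - 1)\) to the rounded figures printed by summary(fit); the sample size is not stated in the output, so it is inferred here from the degrees of freedom, and the variable names are illustrative:

```r
r2       <- 0.3114          # Multiple R-squared from summary(fit)
df_resid <- 53              # residual degrees of freedom
k        <- 6               # number of slope terms (numerator df of the F-test)
n        <- df_resid + k + 1  # implied number of observations

# Adjusted R-squared penalises R-squared for the number of terms in the model
adj_r2 <- 1 - (1 - r2) * (n - 1) / df_resid

round(100 * c(unadjusted = r2, adjusted = adj_r2), 2)
```

Because the input is itself rounded to four decimals, the recomputed adjusted value can differ from the printed 0.2335 in the last digit.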

Question 7

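If all other independent variables remain unchanged, the estimated Price (in thousands of rands) decreases by ___________ for every additional hectare in the size of the plot. [2 decimal places]

\(1089.54\)

The PlotSize coefficient is \(-1089.54\); the sentence already says "decreases", so the magnitude is entered. Because the model contains a PlotSize:FloorArea interaction, this coefficient is strictly the change in Price per hectare when FloorArea is 0.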

Question 8

If all other independent variables remain unchanged, the estimated Price (in thousands of rands) increases by __________ for every additional tree present on the plot. [2 decimal places]

\(20.22\)

Question 9

What is the value of the test statistic related to the hypothesis test for checking the overall significance of this fitted multiple linear regression model? Provide the answer with three decimal places.

\(3.996\)
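The overall F-statistic can be reconstructed from the coefficient of determination and the two degrees of freedom, which shows where the reported value comes from. A rough sketch using the rounded figures from summary(fit), with illustrative variable names:

```r
r2       <- 0.3114   # Multiple R-squared from summary(fit)
df_model <- 6        # numerator degrees of freedom (slope terms)
df_resid <- 53       # denominator (residual) degrees of freedom

# F = (explained variation per model df) / (unexplained variation per residual df)
f_stat <- (r2 / df_model) / ((1 - r2) / df_resid)

# Upper-tail probability of the reported statistic under the F(6, 53) distribution
p_value <- pf(3.996, df_model, df_resid, lower.tail = FALSE)

c(F = f_stat, p = p_value)
```

The recomputed statistic agrees with the printed 3.996 only up to the rounding of R-squared.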

Question 10

How many independent variables could be considered to have a significant linear relationship with the outcome of interest (Price), assuming a 5% level of significance?

\(1\)
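The count can be verified by comparing each slope's p-value with the 5% level. The sketch below uses p-values copied by hand from the Pr(>|t|) column of summary(fit); the intercept is left out because it is not an independent variable:

```r
# Pr(>|t|) values copied from the summary(fit) coefficient table
p_values <- c(PlotSize = 0.3406, FloorArea = 0.5686, Trees = 0.0083,
              Distance = 0.1233, PoolYes = 0.0992,
              "PlotSize:FloorArea" = 0.3245)
alpha <- 0.05

# Names of the terms whose p-value falls below the significance level
significant <- names(p_values)[p_values < alpha]
significant
length(significant)
```

With the fitted object available, the same column is coef(summary(fit))[, "Pr(>|t|)"].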

Question 11

What is the value of the test statistic related to the hypothesis test that determines whether a significant linear relationship exists between Price and Trees? [2 decimal places]

\(2.74\)
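The t value for Trees is simply the estimate divided by its standard error, and the two-sided p-value comes from the t distribution on the residual degrees of freedom. A short check using figures copied from summary(fit), with illustrative variable names:

```r
est      <- 20.218   # Estimate for Trees
se       <- 7.373    # Std. Error for Trees
df_resid <- 53       # residual degrees of freedom

t_stat  <- est / se
# Two-sided p-value: probability of a |t| at least this large under H0
p_value <- 2 * pt(abs(t_stat), df_resid, lower.tail = FALSE)

round(c(t = t_stat, p = p_value), 4)
```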

Question 12

The upper bound of the 95% confidence interval related to the estimated slope regression coefficient for Trees is ____________. [2 decimal places]

\(35.01\)

Question 13

The lower bound of the 95% confidence interval related to the estimated slope regression coefficient for the interaction term (between PlotSize and FloorArea) is _____________. [2 decimal places]

\(-1.92\)
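The bounds reported by confint(fit) in Questions 12 and 13 are the estimate plus or minus a t critical value times the standard error. A minimal sketch that rebuilds both intervals from the rounded summary(fit) figures; the helper ci() is hypothetical and defined only for this illustration:

```r
df_resid <- 53
t_crit   <- qt(0.975, df = df_resid)   # critical value for a 95% two-sided interval

# Hypothetical helper: estimate +/- t_crit * standard error
ci <- function(est, se) est + c(lower = -1, upper = 1) * t_crit * se

ci_trees <- ci(20.218, 7.373)   # Trees
ci_inter <- ci(1.889, 1.899)    # PlotSize:FloorArea

round(rbind(Trees = ci_trees, "PlotSize:FloorArea" = ci_inter), 2)
```

Small differences from the confint(fit) output in the third decimal are expected, because the inputs here are rounded.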

Question 14

Within the context of performing an appropriate hypothesis test, there is some evidence to suggest that the model is overall significant.

  • True

  • False

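True. The overall F-test gives \(F = 3.996\) on 6 and 53 degrees of freedom with a p-value of \(0.002263 < 0.05\).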

Question 15

Referring to an appropriate value, there is some evidence to suggest that this model is a very good fit.

  • True

  • False

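False. \(R^2 = 0.3114\) (adjusted \(0.2335\)), so the model explains less than a third of the variation in Price; \(\text{RSE}=1240\).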

Question 16

The unadjusted coefficient of determination never decreases in value when a new independent variable is added to the multiple linear regression model.

  • True

  • False

True
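This property can be illustrated on any data set. The sketch below is an illustration only: it uses R's built-in mtcars data as a stand-in (not the lab data) and adds a column of pure random noise as the extra predictor, so the variable names and models are assumptions made for the example:

```r
set.seed(1)
dat <- mtcars
dat$noise <- rnorm(nrow(dat))   # random column, unrelated to the response

small <- lm(mpg ~ wt + hp, data = dat)
big   <- lm(mpg ~ wt + hp + noise, data = dat)

# Unadjusted R-squared cannot go down when a term is added
r2  <- c(small = summary(small)$r.squared,     big = summary(big)$r.squared)
# Adjusted R-squared is penalised for the extra term and may go down
adj <- c(small = summary(small)$adj.r.squared, big = summary(big)$adj.r.squared)

rbind(r2, adj)
```

Least squares can always set the new coefficient to zero and recover the smaller model, which is why the unadjusted value cannot decrease.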

Question 17

  • Linear relationship

  • Errors are normally distributed with a mean of \(0\)

  • Constant error variance

  • Errors are independent
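The four items listed above are usually examined through the residuals of the fitted model. A minimal sketch of some common checks, again using the built-in mtcars data as a stand-in because the lab data file is not reproduced here; with the lab model, fit would replace fit_demo:

```r
fit_demo <- lm(mpg ~ wt + hp, data = mtcars)   # stand-in model for illustration
res <- residuals(fit_demo)

# With an intercept in the model, least-squares residuals average to zero
mean_resid <- mean(res)

# Shapiro-Wilk test as one possible check of normality of the errors
normality <- shapiro.test(res)

# Residuals vs fitted values: curvature suggests non-linearity,
# a funnel shape suggests non-constant error variance
plot(fitted(fit_demo), res, xlab = "Fitted values", ylab = "Residuals")
abline(h = 0, lty = 2)

# Normal Q-Q plot of the residuals
qqnorm(res)
qqline(res)
```

Independence of the errors depends mainly on how the data were collected and is not checked by these plots.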