According to the textbook, a record with an area of 0 is a vacant lot with no house on it, so it is of no interest for this analysis. We therefore first remove the records with an area of 0 from the data set.
index <- house$area > 0
house <- house[ index, ]
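One detail of this filtering step is worth a small sketch. The text does not say whether house$area contains missing values, so the following is a precaution rather than a correction. The data frame `toy` and its values are invented for illustration. It shows that plain logical indexing keeps an all-NA row wherever the condition is NA, while wrapping the condition in which() keeps only the rows where it is TRUE.

```r
# Toy stand-in for `house`; the values are made up for illustration.
toy <- data.frame(
  area  = c(0, 1200, NA, 850),
  price = c(50000, 300000, 210000, 180000)
)

# Logical indexing: the NA in `area` produces an all-NA row in the result.
by_logical <- toy[toy$area > 0, ]

# which() returns only the positions where the condition is TRUE.
by_which <- toy[which(toy$area > 0), ]

print(nrow(by_logical))
print(nrow(by_which))
```

If the area column has no missing values, both forms give the same result.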
summary(house$age)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.    NA's
##    1896    1979    1995    1990    2006    2015     221
## Warning: Removed 232 rows containing missing values (geom_point).

The scatter plot suggests that newer houses tend to sell for higher prices. (The summary above shows that age holds the year built, so larger values mean newer houses.)
##
## Call:
## lm(formula = price ~ age, data = house)
##
## Residuals:
##     Min      1Q  Median      3Q     Max
## -387509 -142729  -37316  103754  598689
##
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)
## (Intercept) -4004422.1  1224810.4  -3.269 0.001234 **
## age             2199.7      615.7   3.573 0.000426 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 216100 on 243 degrees of freedom
##   (232 observations deleted due to missingness)
## Multiple R-squared: 0.0499, Adjusted R-squared: 0.04599
## F-statistic: 12.76 on 1 and 243 DF, p-value: 0.0004261
y = -4004422.1 + 2199.7x + e
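To make the fitted line concrete, here is a minimal sketch that plugs values into it, using the coefficient estimates reported in the summary output. The helper name `predict_price` and the example years are assumptions chosen for illustration. With an R-squared of about 0.05, such predictions are only rough.

```r
# Coefficient estimates copied from the summary output for price ~ age.
b0 <- -4004422.1  # intercept
b1 <- 2199.7      # slope: change in fitted price per one-year increase in age

# Hypothetical helper: fitted price for a given value of `age` (year built).
predict_price <- function(year_built) b0 + b1 * year_built

# Fitted price for a house built in 2000.
print(predict_price(2000))

# Difference in fitted price between houses built ten years apart (10 * slope).
gap <- predict_price(2010) - predict_price(2000)
print(gap)
```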
## Warning: Removed 232 rows containing non-finite values (stat_smooth).
## Warning: Removed 232 rows containing missing values (geom_point).

## Warning: Removed 132 rows containing missing values (geom_point).

##
## Call:
## lm(formula = price ~ taxes, data = house)
##
## Residuals:
##     Min      1Q  Median      3Q     Max
## -408214 -112316  -28914   67032  709869
##
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)
## (Intercept) 95838.413  20404.245   4.697 3.83e-06 ***
## taxes         103.164      7.018  14.701  < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 165400 on 343 degrees of freedom
##   (132 observations deleted due to missingness)
## Multiple R-squared: 0.3865, Adjusted R-squared: 0.3847
## F-statistic: 216.1 on 1 and 343 DF, p-value: < 2.2e-16
y = 95838.41 + 103.16x + e
## Warning: Removed 132 rows containing non-finite values (stat_smooth).
## Warning: Removed 132 rows containing missing values (geom_point).

## Warning: Removed 104 rows containing missing values (geom_point).

##
## Call:
## lm(formula = price ~ bed, data = house)
##
## Residuals:
##     Min      1Q  Median      3Q     Max
## -330223 -145207  -45739  110177  640245
##
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)
## (Intercept)   204771      29228   7.006 1.16e-11 ***
## bed            54984       9370   5.868 9.79e-09 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 201200 on 371 degrees of freedom
##   (104 observations deleted due to missingness)
## Multiple R-squared: 0.08493, Adjusted R-squared: 0.08246
## F-statistic: 34.43 on 1 and 371 DF, p-value: 9.793e-09
## Warning: Removed 104 rows containing non-finite values (stat_smooth).
## Warning: Removed 104 rows containing missing values (geom_point).

y = 204771 + 54984x + e
We next explore how several variables relate to price together, focusing on age and number of bedrooms. We define a large unit as a house with more than three bedrooms and store this indicator in house$bigunit. Fitting a multiple linear regression with an interaction between age and unit size gives the following results:
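# Indicator for large units: 1 if more than three bedrooms, otherwise 0
house$bigunit <- factor( ifelse( house$bed > 3, 1, 0 ) )
# age * bigunit expands to age + bigunit + age:bigunit
m2 <- lm( formula = price ~ age * bigunit,
          data = house )
# Scatter plot with one fitted line per bigunit group
ggplot( data = house, mapping = aes( x = age, y = price, colour = bigunit ) ) +
  geom_point( alpha = .4 ) +
  geom_smooth( method = 'lm', se = FALSE )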
## `geom_smooth()` using formula 'y ~ x'
## Warning: Removed 232 rows containing non-finite values (stat_smooth).
## Warning: Removed 232 rows containing missing values (geom_point).
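The two lines in such a plot correspond to two slopes that can be read off an interaction model. The sketch below illustrates this on synthetic data, not on the house data. The data frame `toy`, its price-generating process, and all numbers in it are invented for illustration. For the baseline group (bigunit = 0) the slope is the age coefficient, and for large units it is the age coefficient plus the interaction coefficient.

```r
# Synthetic stand-in for `house`; every value here is made up.
set.seed(1)
n <- 200
toy <- data.frame(
  age = sample(1950:2015, n, replace = TRUE),  # year built
  bed = sample(1:6, n, replace = TRUE)         # number of bedrooms
)
toy$bigunit <- factor(ifelse(toy$bed > 3, 1, 0))

# Hypothetical price process: large units gain more per year built.
toy$price <- 1000 * (toy$age - 1950) +
  ifelse(toy$bigunit == 1, 2000 * (toy$age - 1950), 0) +
  rnorm(n, sd = 20000)

# age * bigunit expands to age + bigunit + age:bigunit.
fit <- lm(price ~ age * bigunit, data = toy)
cf <- coef(fit)

slope_small <- cf[["age"]]                          # slope when bigunit == 0
slope_big   <- cf[["age"]] + cf[["age:bigunit1"]]   # slope when bigunit == 1
print(c(small = slope_small, big = slope_big))
```

Applied to a fitted model on the real data, the same two expressions would give the slopes of the two lines drawn by geom_smooth(), since fitting a separate line per group yields the same fitted lines as the full interaction model.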

Thank you!