Part 1: Modeling house prices based on house age and location

  1. The data set describes house prices (price per unit area) and three predictor variables that may affect them. It contains 414 observations, and the predictors are house age, distance to the nearest Mass Rapid Transit (MRT) station, and the number of nearby stores.

  2. Price per unit area is weakly and negatively correlated with house age, with an r-value of -0.21. Price per unit area is strongly and negatively correlated with distance MRT, with an r-value of -0.67. Price per unit area is strongly and positively correlated with stores, with an r-value of 0.57. Stores is strongly and negatively correlated with distance MRT, with an r-value of -0.60.

##                       house_age distance_MRT      stores price_per_unit_area
## house_age            1.00000000   0.02562205  0.04959251          -0.2105670
## distance_MRT         0.02562205   1.00000000 -0.60251914          -0.6736129
## stores               0.04959251  -0.60251914  1.00000000           0.5710049
## price_per_unit_area -0.21056705  -0.67361286  0.57100491           1.0000000

## 
## Call:
## lm(formula = price_per_unit_area ~ house_age + distance_MRT + 
##     stores, data = real_estate)
## 
## Coefficients:
##  (Intercept)     house_age  distance_MRT        stores  
##    42.977286     -0.252856     -0.005379      1.297442
## 
## Call:
## lm(formula = price_per_unit_area ~ house_age + distance_MRT + 
##     stores, data = real_estate)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -37.304  -5.430  -1.738   4.325  77.315 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  42.977286   1.384542  31.041  < 2e-16 ***
## house_age    -0.252856   0.040105  -6.305 7.47e-10 ***
## distance_MRT -0.005379   0.000453 -11.874  < 2e-16 ***
## stores        1.297443   0.194290   6.678 7.91e-11 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 9.251 on 410 degrees of freedom
## Multiple R-squared:  0.5411, Adjusted R-squared:  0.5377 
## F-statistic: 161.1 on 3 and 410 DF,  p-value: < 2.2e-16
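
As a cross-check on the summary above, the t values and the adjusted R-squared can be recomputed from the printed numbers alone. This is a minimal sketch in base R. The variable names (`est`, `se`, `r2`, `n`, `p`) are illustrative, and the inputs are copied from the printed output, so the results should agree with the summary only up to rounding.

```r
# Estimates and standard errors copied from the summary output above
est <- c(intercept = 42.977286, house_age = -0.252856,
         distance_MRT = -0.005379, stores = 1.297443)
se  <- c(intercept = 1.384542, house_age = 0.040105,
         distance_MRT = 0.000453, stores = 0.194290)

# Each t value is the estimate divided by its standard error
t_values <- est / se

# Adjusted R-squared from multiple R-squared, n observations, p predictors
r2 <- 0.5411
n  <- 414   # 410 residual degrees of freedom + 4 estimated coefficients
p  <- 3
adj_r2 <- 1 - (1 - r2) * (n - 1) / (n - p - 1)

print(round(t_values, 2))
print(round(adj_r2, 4))
```

The recomputed values should line up with the `t value` column and the reported adjusted R-squared.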
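
  1. According to the linear model in question 3, all three predictor variables (house age, distance MRT, and stores) are statistically significant, each with a p-value below 0.001. The adjusted R-squared value is 0.5377, so the model explains roughly 54% of the variation in price per unit area.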

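  2. There appears to be some multicollinearity between distance MRT and stores. The r-value between the two is -0.60, which indicates a strong negative correlation. A pairwise correlation of this size suggests multicollinearity but does not by itself show that it is severe enough to distort the coefficient estimates.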

  3. The predicted price per unit area of a 20-year-old house with 4 nearby stores, located 500 meters from the nearest MRT station, is about 40.42 thousand dollars.
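
The last two answers can be checked from the printed output alone. The sketch below, in base R, first computes variance inflation factors (VIFs) as the diagonal of the inverse of the predictor correlation matrix, then reproduces the prediction by plugging the new house's values into the fitted equation. The names `R`, `vif`, `coefs`, `new_house`, and `predicted_price` are illustrative, and all inputs are copied from the rounded output above, so the final digits may differ slightly from a result computed on the full-precision model.

```r
# Predictor correlations copied from the correlation matrix output
predictors <- c("house_age", "distance_MRT", "stores")
R <- matrix(c(1.00000000,  0.02562205,  0.04959251,
              0.02562205,  1.00000000, -0.60251914,
              0.04959251, -0.60251914,  1.00000000),
            nrow = 3, byrow = TRUE,
            dimnames = list(predictors, predictors))

# VIFs are the diagonal entries of the inverse correlation matrix
vif <- diag(solve(R))
print(round(vif, 2))

# Coefficients copied from the summary output (rounded as printed)
coefs <- c(intercept = 42.977286, house_age = -0.252856,
           distance_MRT = -0.005379, stores = 1.297443)

# New observation: 1 for the intercept, age 20 years, 500 m to MRT, 4 stores
new_house <- c(intercept = 1, house_age = 20, distance_MRT = 500, stores = 4)

# The prediction is the sum of each coefficient times its value
predicted_price <- sum(coefs * new_house)
print(round(predicted_price, 2))
```

With these inputs the VIFs come out near 1.0 for house age and about 1.6 for distance MRT and stores, below the commonly cited rule-of-thumb thresholds of 5 or 10. That would suggest the collinearity between distance MRT and stores is mild. The prediction rounds to 40.42.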

Part 2: Impact of seasonality and COVID on real estate in two US cities

  1. According to the plots above, seasonality plays a small but clear role in listing prices in Miami and an obvious role in Columbus. In both plots, prices spike in the middle of every year from 2017 to 2021. A likely explanation is that warmer weather makes both cities more comfortable to live in, so demand for houses rises in the summer months and prices rise with it.

  2. As mentioned above, seasonality affects listing prices in both Miami and Columbus. These graphs show that it is also a factor in median days on market, with spikes around June and December. While seasonality is a factor, COVID is an even larger one. In the graph of median listing price, both lines show a drastic increase since the start of the pandemic. In median days on market, both lines show a drastic decrease since the start of the pandemic. However, the two cities differ in active listing count. When the pandemic started, the listing count in Miami dropped substantially, whereas the listing count in Columbus stayed roughly the same.