Part I: Modeling house prices based on house age and location

Data source: https://www.kaggle.com/jashwanthram10/multilinear-regression/data?select=Real+estate.csv

This report has two parts. In Part I, I model the price per unit area of houses from house age, distance to the nearest MRT, and the number of nearby stores, and I examine the correlations between these variables. In Part II, I compare listing prices, days on the market, and active listing counts between Miami, FL and Columbus, OH, and look at how COVID-19 has affected the real estate market in those two cities.

Correlation matrix and Heat Map

##                       house_age distance_MRT      stores price_per_unit_area
## house_age            1.00000000   0.02562205  0.04959251          -0.2105670
## distance_MRT         0.02562205   1.00000000 -0.60251914          -0.6736129
## stores               0.04959251  -0.60251914  1.00000000           0.5710049
## price_per_unit_area -0.21056705  -0.67361286  0.57100491           1.0000000

Linear Model

This model predicts the price per unit area from three predictor variables: house age, distance to the nearest MRT, and number of nearby stores.

## 
## Call:
## lm(formula = price_per_unit_area ~ house_age + distance_MRT + 
##     stores, data = Real_estate)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -37.304  -5.430  -1.738   4.325  77.315 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  42.977286   1.384542  31.041  < 2e-16 ***
## house_age    -0.252856   0.040105  -6.305 7.47e-10 ***
## distance_MRT -0.005379   0.000453 -11.874  < 2e-16 ***
## stores        1.297443   0.194290   6.678 7.91e-11 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 9.251 on 410 degrees of freedom
## Multiple R-squared:  0.5411, Adjusted R-squared:  0.5377 
## F-statistic: 161.1 on 3 and 410 DF,  p-value: < 2.2e-16
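
As a quick consistency check on the summary above, the adjusted R-squared and the F-statistic can be recomputed from the multiple R-squared and the degrees of freedom. The following is a minimal sketch in Python rather than the R used for the analysis. The sample size is not printed in the summary, so it is inferred here as the 410 residual degrees of freedom plus the 4 estimated coefficients, and the variable names are illustrative.

```python
# Values copied from the lm() summary above.
r_squared = 0.5411      # Multiple R-squared
residual_df = 410       # residual degrees of freedom
n_predictors = 3        # house_age, distance_MRT, stores

# Assumption: observations = residual df + number of estimated coefficients
# (three slopes plus the intercept). This value is inferred, not reported.
n_obs = residual_df + n_predictors + 1

# Adjusted R-squared penalizes R-squared for the number of predictors.
adj_r_squared = 1 - (1 - r_squared) * (n_obs - 1) / residual_df

# Overall F-statistic: explained variation per predictor divided by
# unexplained variation per residual degree of freedom.
f_statistic = (r_squared / n_predictors) / ((1 - r_squared) / residual_df)

print(f"adjusted R-squared: {adj_r_squared:.4f}")
print(f"F-statistic: {f_statistic:.1f}")
```

Both values should agree with the summary up to rounding, since the multiple R-squared used as input is itself rounded to four decimals.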

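This is a reasonably good model. Two important values are the adjusted R-squared (0.5377), which means the model explains about 54% of the variation in price per unit area, and the overall p-value (< 2.2e-16), which shows that the model as a whole is statistically significant. All three predictors have three stars, meaning each is significant at the 0.001 level, the strictest level shown in the output. Multicollinearity does not appear to be a serious problem: the strongest correlation between predictors is -0.60 (distance_MRT and stores), which is moderate, and house_age is nearly uncorrelated with the other two.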
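
One common way to quantify multicollinearity is the variance inflation factor (VIF), which for each predictor equals the corresponding diagonal entry of the inverse of the predictors' correlation matrix. The sketch below (in Python, not the R used for the analysis) computes the three VIFs from the pairwise correlations reported in the correlation matrix, using the closed form for a 3x3 correlation matrix. The function name is illustrative. Values close to 1 indicate little inflation, and a frequently cited rule of thumb treats values above about 5 as a concern.

```python
def vifs_from_correlations(r12, r13, r23):
    """Return the VIFs of three predictors given their pairwise correlations.

    For a 3x3 correlation matrix, each diagonal entry of the inverse is
    (1 - r_jk^2) / det, where r_jk is the correlation between the other
    two predictors and det is the determinant of the matrix.
    """
    det = 1 + 2 * r12 * r13 * r23 - r12**2 - r13**2 - r23**2
    return (
        (1 - r23**2) / det,  # predictor 1
        (1 - r13**2) / det,  # predictor 2
        (1 - r12**2) / det,  # predictor 3
    )


# Pairwise correlations copied from the correlation matrix in this report.
r_age_mrt = 0.02562205      # house_age vs distance_MRT
r_age_stores = 0.04959251   # house_age vs stores
r_mrt_stores = -0.60251914  # distance_MRT vs stores

vif_age, vif_mrt, vif_stores = vifs_from_correlations(
    r_age_mrt, r_age_stores, r_mrt_stores
)

for name, value in [
    ("house_age", vif_age),
    ("distance_MRT", vif_mrt),
    ("stores", vif_stores),
]:
    print(f"{name}: VIF = {value:.2f}")
```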

##        1 
## 40.42037

This predicts that the price per unit area for a 20-year-old house with 4 nearby stores, located 500 meters from the nearest MRT, would be about 40.42, in the data set's price units.
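
The prediction can be reproduced by hand from the fitted coefficients. This is a minimal sketch in Python (the analysis itself was run in R) that plugs the three inputs into the regression equation; the function and argument names are illustrative.

```python
# Coefficients copied from the lm() summary in this report.
INTERCEPT = 42.977286
B_HOUSE_AGE = -0.252856
B_DISTANCE_MRT = -0.005379
B_STORES = 1.297443


def predict_price_per_unit_area(house_age, distance_mrt, stores):
    """Evaluate the fitted linear model for one house."""
    return (
        INTERCEPT
        + B_HOUSE_AGE * house_age
        + B_DISTANCE_MRT * distance_mrt
        + B_STORES * stores
    )


# A 20-year-old house, 500 meters from the nearest MRT, with 4 nearby stores.
prediction = predict_price_per_unit_area(house_age=20, distance_mrt=500, stores=4)
print(f"{prediction:.2f}")  # prints 40.42
```

Any difference from the R output in the later decimal places comes from using the rounded coefficients shown in the summary.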

Part II: Impact of Seasonality and COVID on real estate in two US cities

Data source: https://www.realtor.com/research/data/

I will now compare the real estate markets in Columbus, Ohio, and Miami, Florida.

Based on these graphs, seasonality seems to be a bigger factor in Columbus than in Miami. In Columbus, the price goes down in the winter and spikes every summer. In Miami, the price rises somewhat in the winter and goes down in the summer.

Finally, I will be examining the impact that the COVID pandemic has had on the real estate market, specifically in these two cities.