Part I: Modeling house prices based on house age and location

## Rows: 414 Columns: 4
## -- Column specification --------------------------------------------------------
## Delimiter: ","
## dbl (4): house_age, distance_MRT, stores, price_per_unit_area
## 
## i Use `spec()` to retrieve the full column specification for this data.
## i Specify the column types or set `show_col_types = FALSE` to quiet this message.
## # A tibble: 414 x 4
##    house_age distance_MRT stores price_per_unit_area
##        <dbl>        <dbl>  <dbl>               <dbl>
##  1      32           84.9     10                37.9
##  2      19.5        307.       9                42.2
##  3      13.3        562.       5                47.3
##  4      13.3        562.       5                54.8
##  5       5          391.       5                43.1
##  6       7.1       2175.       3                32.1
##  7      34.5        623.       7                40.3
##  8      20.3        288.       6                46.7
##  9      31.7       5512.       1                18.8
## 10      17.9       1783.       3                22.1
## # ... with 404 more rows

Description

1- This data set comes from China and is used to model house prices from variables that may affect them. House age, distance to the nearest public transit (MRT) station, and the number of nearby stores are examined for their effect on the price per unit area.

Correlation Matrix

##                       house_age distance_MRT      stores price_per_unit_area
## house_age            1.00000000   0.02562205  0.04959251          -0.2105670
## distance_MRT         0.02562205   1.00000000 -0.60251914          -0.6736129
## stores               0.04959251  -0.60251914  1.00000000           0.5710049
## price_per_unit_area -0.21056705  -0.67361286  0.57100491           1.0000000

2- The heatmap below shows the correlation between the price of the house and each predictor variable, and it should be read together with the correlation matrix above. Lighter blue marks higher (more positive) correlations and darker blue marks lower (more negative) ones. The strength of a relationship is given by the absolute value of the correlation, and the sign gives its direction.

Distance to the MRT has the strongest relationship with the price of the house (r = -0.67). The relationship is negative: the farther a house is from the MRT, the lower its price per unit area. The dark blue of this cell therefore reflects a strongly negative correlation, not a weak one.

The age of the house shows a weak negative relationship with the price per unit area (r = -0.21): the older the house, the slightly lower the price.

Besides these two, the number of stores located in the area shows a moderate positive relationship with the price of the house (r = 0.57). The more stores there are in the area, the higher the price; the fewer stores there are, the lower the price.
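To make the comparison of strengths concrete, the correlations with price can be ranked by absolute value. This is only a small sketch: the values are copied by hand from the correlation matrix above, and the names `price_cor`, `ranked`, and `strongest` are illustrative, not part of the original analysis.

```r
# Correlations with price_per_unit_area, copied from the matrix above
price_cor <- c(house_age    = -0.2105670,
               distance_MRT = -0.6736129,
               stores       =  0.5710049)

# Strength is judged by magnitude; the sign only gives the direction
ranked <- price_cor[order(abs(price_cor), decreasing = TRUE)]
print(ranked)

# Name of the predictor most strongly correlated with price
strongest <- names(ranked)[1]
```

Ranking by `abs()` rather than by the raw value avoids mistaking a strongly negative correlation for a weak one.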

Heat map

3- Linear Model

## 
## Call:
## lm(formula = price_per_unit_area ~ house_age + distance_MRT + 
##     stores, data = Real_estate)
## 
## Coefficients:
##  (Intercept)     house_age  distance_MRT        stores  
##    42.977286     -0.252856     -0.005379      1.297442

Model summary for the price per unit area

## 
## Call:
## lm(formula = price_per_unit_area ~ house_age + distance_MRT + 
##     stores, data = Real_estate)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -37.304  -5.430  -1.738   4.325  77.315 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  42.977286   1.384542  31.041  < 2e-16 ***
## house_age    -0.252856   0.040105  -6.305 7.47e-10 ***
## distance_MRT -0.005379   0.000453 -11.874  < 2e-16 ***
## stores        1.297443   0.194290   6.678 7.91e-11 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 9.251 on 410 degrees of freedom
## Multiple R-squared:  0.5411, Adjusted R-squared:  0.5377 
## F-statistic: 161.1 on 3 and 410 DF,  p-value: < 2.2e-16
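Two of the figures in this summary can be checked by hand, which helps when reading the table. The sketch below assumes only the printed values: each t value is the estimate divided by its standard error, and the adjusted R-squared follows from the multiple R-squared with n = 414 rows and p = 3 predictors. The variable names are illustrative.

```r
# Estimates and standard errors copied from the summary table above
est <- c(house_age = -0.252856, distance_MRT = -0.005379, stores = 1.297443)
se  <- c(house_age =  0.040105, distance_MRT =  0.000453, stores = 0.194290)

# t value = estimate / standard error
t_values <- est / se
print(round(t_values, 3))

# Adjusted R-squared from the multiple R-squared:
# 1 - (1 - R^2) * (n - 1) / (n - p - 1), where n - p - 1 = 410 residual df
n  <- 414
p  <- 3
r2 <- 0.5411
adj_r2 <- 1 - (1 - r2) * (n - 1) / (n - p - 1)
print(round(adj_r2, 4))
```

Small differences from the printed table can arise because the inputs here are already rounded.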

Model’s quality

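4- This is a reasonably good model: with an adjusted R-squared of 0.5377, it explains about 54% of the variation in price per unit area, and all three predictors are statistically significant (p < 0.001). The residual standard error of 9.251 shows that individual predictions can still be off by a sizeable margin.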

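5- There is no severe multicollinearity. The only notable correlation among the predictors is between distance_MRT and stores (r = -0.60); house_age is nearly uncorrelated with both.

The implication is that each coefficient can be interpreted with the other predictors held constant. Each additional store is associated with an increase of about 1.30 in price per unit area, each additional year of house age with a decrease of about 0.25, and each additional meter of distance from the MRT with a decrease of about 0.005. All three predictors, not only stores, affect the price of the house.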
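One common way to quantify multicollinearity is the variance inflation factor (VIF). As a sketch, the VIFs can be obtained from the diagonal of the inverse of the predictors' correlation matrix. The matrix below is copied by hand from the correlation output earlier in this part; the names `R` and `vif` are illustrative, and the rule-of-thumb threshold mentioned in the comment is a convention, not something stated in the original analysis.

```r
predictors <- c("house_age", "distance_MRT", "stores")

# Predictor-only block of the correlation matrix shown earlier
R <- matrix(c(1.00000000,  0.02562205,  0.04959251,
              0.02562205,  1.00000000, -0.60251914,
              0.04959251, -0.60251914,  1.00000000),
            nrow = 3, byrow = TRUE,
            dimnames = list(predictors, predictors))

# VIF of each predictor = corresponding diagonal entry of the inverse matrix
vif <- diag(solve(R))
print(round(vif, 2))

# Values near 1 mean little inflation; values above roughly 5 to 10
# are often taken as a warning sign
```

With a fitted model object available, a packaged VIF function would give the same kind of diagnostic; the matrix route is used here only because it needs nothing beyond base R.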

6- Example of a prediction:

##        1 
## 40.42037

For a 20-year-old house with 4 stores nearby, located 500 m from the nearest MRT station, the predicted price per unit area is about 40.42.
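The prediction can be reproduced by hand from the fitted coefficients, which makes the contribution of each term visible. This is a sketch using the rounded coefficients printed in the model summary; the names `coefs`, `new_house`, and `predicted` are illustrative.

```r
# Coefficients copied from the model summary above
coefs <- c(intercept    = 42.977286,
           house_age    = -0.252856,
           distance_MRT = -0.005379,
           stores       =  1.297443)

# The example house: 20 years old, 500 m from the MRT, 4 stores nearby
new_house <- c(intercept = 1, house_age = 20, distance_MRT = 500, stores = 4)

# Prediction = intercept + sum of coefficient * value
predicted <- sum(coefs * new_house)
print(round(predicted, 2))
```

With the fitted model object at hand, `predict()` with a one-row `newdata` data frame performs the same calculation using the unrounded coefficients.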

Part II: Impact of seasonality and COVID on real estate in two US cities

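1- This data set is used to examine the impact of seasonality and of the COVID period on the real estate markets in Columbus and Miami. It contains monthly listing data for many US metropolitan areas (57,771 rows and 40 columns).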

## Rows: 57771 Columns: 40
## -- Column specification --------------------------------------------------------
## Delimiter: ","
## chr  (1): cbsa_title
## dbl (39): month_date_yyyymm, cbsa_code, HouseholdRank, median_listing_price,...
## 
## i Use `spec()` to retrieve the full column specification for this data.
## i Specify the column types or set `show_col_types = FALSE` to quiet this message.
## # A tibble: 57,771 x 40
##    month_date_yyyymm cbsa_code cbsa_title         HouseholdRank median_listing_~
##                <dbl>     <dbl> <chr>                      <dbl>            <dbl>
##  1            202109     35620 new york-newark-j~             1          607500 
##  2            202109     31080 los angeles-long ~             2          967500 
##  3            202109     16980 chicago-napervill~             3          332450 
##  4            202109     19100 dallas-fort worth~             4          396480 
##  5            202109     26420 houston-the woodl~             5          363210 
##  6            202109     37980 philadelphia-camd~             6          321950.
##  7            202109     47900 washington-arling~             7          509500 
##  8            202109     33100 miami-fort lauder~             8          463495 
##  9            202109     12060 atlanta-sandy spr~             9          398000 
## 10            202109     14460 boston-cambridge-~            10          675000 
## # ... with 57,761 more rows, and 35 more variables:
## #   median_listing_price_mm <dbl>, median_listing_price_yy <dbl>,
## #   active_listing_count <dbl>, active_listing_count_mm <dbl>,
## #   active_listing_count_yy <dbl>, median_days_on_market <dbl>,
## #   median_days_on_market_mm <dbl>, median_days_on_market_yy <dbl>,
## #   new_listing_count <dbl>, new_listing_count_mm <dbl>,
## #   new_listing_count_yy <dbl>, price_increased_count <dbl>, ...

2- A time series for the median listing price in Columbus, OH

## # A tibble: 63 x 40
##    month_date_yyyymm cbsa_code cbsa_title   HouseholdRank median_listing_price
##                <dbl>     <dbl> <chr>                <dbl>                <dbl>
##  1            202109     18140 columbus, oh            31               289450
##  2            202108     18140 columbus, oh            31               299900
##  3            202107     18140 columbus, oh            31               305000
##  4            202106     18140 columbus, oh            31               299900
##  5            202105     18140 columbus, oh            31               308500
##  6            202104     18140 columbus, oh            31               314750
##  7            202103     18140 columbus, oh            31               324500
##  8            202102     18140 columbus, oh            31               324950
##  9            202101     18140 columbus, oh            31               307000
## 10            202012     18140 columbus, oh            31               306200
## # ... with 53 more rows, and 35 more variables: median_listing_price_mm <dbl>,
## #   median_listing_price_yy <dbl>, active_listing_count <dbl>,
## #   active_listing_count_mm <dbl>, active_listing_count_yy <dbl>,
## #   median_days_on_market <dbl>, median_days_on_market_mm <dbl>,
## #   median_days_on_market_yy <dbl>, new_listing_count <dbl>,
## #   new_listing_count_mm <dbl>, new_listing_count_yy <dbl>,
## #   price_increased_count <dbl>, price_increased_count_mm <dbl>, ...
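A per-city series like the one above can be obtained by filtering on the metro code and turning the `yyyymm` number into a proper date. The sketch below uses base R and a tiny hand-built sample (three months each for codes 18140 and 33100, with values taken from the printed output) in place of the full data set; the data frame name `listings` is an assumption.

```r
# Small stand-in for the full data set, using values printed in the output
listings <- data.frame(
  month_date_yyyymm    = c(202109, 202108, 202107, 202109, 202108, 202107),
  cbsa_code            = c(18140, 18140, 18140, 33100, 33100, 33100),
  median_listing_price = c(289450, 299900, 305000, 463495, 455750, 450000)
)

# Keep only Columbus, OH (cbsa_code 18140 in the output above)
columbus <- listings[listings$cbsa_code == 18140, ]

# Convert 202109 -> "20210901" -> Date, so the series plots on a time axis
columbus$month <- as.Date(paste0(columbus$month_date_yyyymm, "01"),
                          format = "%Y%m%d")

# The raw data is newest-first; sort oldest-first for a time series
columbus <- columbus[order(columbus$month), ]
print(columbus)
```

Filtering on `cbsa_code` rather than `cbsa_title` avoids depending on the exact spelling of the metro name.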

3- A time series for the median listing price in Miami, FL

## # A tibble: 63 x 40
##    month_date_yyyymm cbsa_code cbsa_title         HouseholdRank median_listing_~
##                <dbl>     <dbl> <chr>                      <dbl>            <dbl>
##  1            202109     33100 miami-fort lauder~             8           463495
##  2            202108     33100 miami-fort lauder~             8           455750
##  3            202107     33100 miami-fort lauder~             8           450000
##  4            202106     33100 miami-fort lauder~             8           447000
##  5            202105     33100 miami-fort lauder~             8           425900
##  6            202104     33100 miami-fort lauder~             8           417950
##  7            202103     33100 miami-fort lauder~             8           402450
##  8            202102     33100 miami-fort lauder~             8           399450
##  9            202101     33100 miami-fort lauder~             8           400000
## 10            202012     33100 miami-fort lauder~             8           409000
## # ... with 53 more rows, and 35 more variables: median_listing_price_mm <dbl>,
## #   median_listing_price_yy <dbl>, active_listing_count <dbl>,
## #   active_listing_count_mm <dbl>, active_listing_count_yy <dbl>,
## #   median_days_on_market <dbl>, median_days_on_market_mm <dbl>,
## #   median_days_on_market_yy <dbl>, new_listing_count <dbl>,
## #   new_listing_count_mm <dbl>, new_listing_count_yy <dbl>,
## #   price_increased_count <dbl>, price_increased_count_mm <dbl>, ...

4- Columbus and Miami median listing prices

Both graphs show an overall increase in median listing prices in the Columbus and Miami real estate markets. However, the Columbus graph shows large fluctuations that seem to be seasonal: each year, house prices go down during the cold months and go up during the summer.

On the other hand, Miami, with its warm weather and no cold winter, does not undergo such large changes. Prices continue to increase slightly throughout the year.
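One rough way to put a number on "fluctuation" is the standard deviation of month-over-month percentage changes. The sketch below uses only the ten months printed in the tables above (December 2020 to September 2021, reordered oldest-first), so it is an illustration of the calculation and not a seasonality test; a fuller check would use all 63 months and compare averages by calendar month. The helper `pct_change` is hypothetical.

```r
# Median listing prices, oldest to newest, copied from the printed rows
columbus <- c(306200, 307000, 324950, 324500, 314750,
              308500, 299900, 305000, 299900, 289450)
miami    <- c(409000, 400000, 399450, 402450, 417950,
              425900, 447000, 450000, 455750, 463495)

# Month-over-month change in percent
pct_change <- function(x) 100 * diff(x) / head(x, -1)

# Spread of the monthly changes: larger means a bumpier series
sd_columbus <- sd(pct_change(columbus))
sd_miami    <- sd(pct_change(miami))
print(round(c(columbus = sd_columbus, miami = sd_miami), 2))
```

The data set also carries a `median_listing_price_mm` column, which may already hold this month-over-month change; that is an assumption based on the column name.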

5- Comparison time series

6- COVID Period