## Rows: 414 Columns: 4
## -- Column specification --------------------------------------------------------
## Delimiter: ","
## dbl (4): house_age, distance_MRT, stores, price_per_unit_area
##
## i Use `spec()` to retrieve the full column specification for this data.
## i Specify the column types or set `show_col_types = FALSE` to quiet this message.
## # A tibble: 414 x 4
##    house_age distance_MRT stores price_per_unit_area
##        <dbl>        <dbl>  <dbl>               <dbl>
##  1      32           84.9     10                37.9
##  2      19.5        307.       9                42.2
##  3      13.3        562.       5                47.3
##  4      13.3        562.       5                54.8
##  5       5          391.       5                43.1
##  6       7.1       2175.       3                32.1
##  7      34.5        623.       7                40.3
##  8      20.3        288.       6                46.7
##  9      31.7       5512.       1                18.8
## 10      17.9       1783.       3                22.1
## # ... with 404 more rows
1- This data set comes from China and is used to model house prices based on several variables that may affect them. House age, distance from public transportation (MRT), and the number of stores nearby are examined to see their effect on the price per unit area.
##                       house_age distance_MRT      stores price_per_unit_area
## house_age            1.00000000   0.02562205  0.04959251          -0.2105670
## distance_MRT         0.02562205   1.00000000 -0.60251914          -0.6736129
## stores               0.04959251  -0.60251914  1.00000000           0.5710049
## price_per_unit_area -0.21056705  -0.67361286  0.57100491           1.0000000
2- The heatmap below shows the correlation between the price per unit area and each predictor variable. The color scale appears to encode the signed correlation coefficient, so a dark blue cell marks a strongly negative correlation rather than a weak one.
The dark blue cell for distance_MRT corresponds to a correlation of about -0.67 with the price, which is the strongest relationship in the matrix. In other words, the farther a house is from the nearest MRT station, the lower its price per unit area.
The age of the house shows a weak negative relationship with the price per unit area (about -0.21): older houses tend to be slightly cheaper.
Besides the age of the house, the number of stores in the area has a moderately strong positive relationship with the price (about 0.57). The more stores there are in the area, the higher the price; the fewer stores there are, the lower the price.
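The correlation matrix and heatmap discussed above can be reproduced along the following lines. This is a minimal sketch in base R, not the original code: the data frame name `real_estate_sample` is hypothetical, it holds only the ten rows printed above (with distances as rounded in that printout), and the color ramp is an assumption, so the values differ from the full 414-row matrix.

```r
# Illustrative subset: the ten rows printed above, not the full data set.
real_estate_sample <- data.frame(
  house_age           = c(32, 19.5, 13.3, 13.3, 5, 7.1, 34.5, 20.3, 31.7, 17.9),
  distance_MRT        = c(84.9, 307, 562, 562, 391, 2175, 623, 288, 5512, 1783),
  stores              = c(10, 9, 5, 5, 5, 3, 7, 6, 1, 3),
  price_per_unit_area = c(37.9, 42.2, 47.3, 54.8, 43.1, 32.1, 40.3, 46.7, 18.8, 22.1)
)

# Pearson correlation between every pair of columns.
cor_matrix <- cor(real_estate_sample)
print(round(cor_matrix, 2))

# Heatmap of the signed correlations: dark blue = most negative,
# light blue = most positive (assumed color ramp).
blue_ramp <- colorRampPalette(c("darkblue", "lightblue"))(20)
heatmap(cor_matrix, Rowv = NA, Colv = NA, symm = TRUE,
        scale = "none", col = blue_ramp)
```

With a ramp like this one, the darkest off-diagonal cell is the most negative correlation, which is why a dark cell should be read as a strong negative relationship.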
##
## Call:
## lm(formula = price_per_unit_area ~ house_age + distance_MRT +
## stores, data = Real_estate)
##
## Coefficients:
##  (Intercept)     house_age  distance_MRT        stores
##    42.977286     -0.252856     -0.005379      1.297442
3- Predicting the price per unit area
##
## Call:
## lm(formula = price_per_unit_area ~ house_age + distance_MRT +
## stores, data = Real_estate)
##
## Residuals:
##     Min      1Q  Median      3Q     Max
## -37.304  -5.430  -1.738   4.325  77.315
##
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)
## (Intercept)  42.977286   1.384542  31.041  < 2e-16 ***
## house_age    -0.252856   0.040105  -6.305 7.47e-10 ***
## distance_MRT -0.005379   0.000453 -11.874  < 2e-16 ***
## stores        1.297443   0.194290   6.678 7.91e-11 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 9.251 on 410 degrees of freedom
## Multiple R-squared: 0.5411, Adjusted R-squared: 0.5377
## F-statistic: 161.1 on 3 and 410 DF, p-value: < 2.2e-16
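4- This is a moderately good model: the adjusted R-squared of 0.5377 means that the three predictors together explain about 54% of the variation in the price per unit area. All three coefficients are statistically significant (p < 0.001), and the residual standard error is 9.251.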
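As a quick consistency check, the adjusted R-squared and the t values can be recomputed by hand from the numbers printed in the summary. The sketch below only uses values copied from that output; the variable names are chosen here for illustration.

```r
# Values copied from the model summary above.
r_squared <- 0.5411   # Multiple R-squared
n <- 414              # number of rows
p <- 3                # number of predictors

# Adjusted R-squared penalizes R-squared for the number of predictors.
adj_r_squared <- 1 - (1 - r_squared) * (n - 1) / (n - p - 1)
print(round(adj_r_squared, 4))

# Each t value is the estimate divided by its standard error.
estimates  <- c(house_age = -0.252856, distance_MRT = -0.005379, stores = 1.297443)
std_errors <- c(house_age =  0.040105, distance_MRT =  0.000453, stores = 0.194290)
t_values <- estimates / std_errors
print(round(t_values, 2))
```

Because the printed inputs are rounded, the recomputed t values can differ from the summary in the last digit.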
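5- There is no severe multicollinearity. The largest correlation between two predictors is -0.60 (distance_MRT and stores), which is moderate, and house_age is nearly uncorrelated with the other two predictors (0.03 and 0.05).
The implication is that each coefficient can be read as a separate effect, holding the other predictors fixed. The number of stores is positively related to the price (about +1.30 per additional store), whereas distance_MRT and house_age both lower it (about -0.0054 per meter and -0.25 per year of age). All three predictors are statistically significant, so the price of the house is affected by all of them, not only by the number of stores.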
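One common way to quantify multicollinearity is the variance inflation factor (VIF). For standardized predictors, the VIFs are the diagonal of the inverse of the predictor correlation matrix, so they can be sketched directly from the correlations printed earlier. The threshold of 5 in the comment is a common rule of thumb, not something stated in this report.

```r
# Predictor-only correlations, copied from the correlation matrix above.
predictors <- c("house_age", "distance_MRT", "stores")
predictor_cor <- matrix(
  c(1.00000000,  0.02562205,  0.04959251,
    0.02562205,  1.00000000, -0.60251914,
    0.04959251, -0.60251914,  1.00000000),
  nrow = 3, byrow = TRUE,
  dimnames = list(predictors, predictors)
)

# VIF_j = 1 / (1 - R_j^2), which equals the j-th diagonal entry
# of the inverse correlation matrix.
vif <- diag(solve(predictor_cor))
print(round(vif, 2))

# Rule of thumb (assumption): values above roughly 5 signal a problem.
print(all(vif < 5))
```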
6- Example of a prediction:
## 1
## 40.42037
For a 20-year-old house with 4 stores nearby, located 500 m from the nearest MRT station, the predicted price per unit area is about 40.42.
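The same prediction can be checked by hand by plugging the inputs into the fitted equation. The sketch below uses the coefficients as printed in the model summary; small differences from 40.42037 come from rounding in the printed coefficients.

```r
# Coefficients copied from the model summary above.
coefs <- c(intercept = 42.977286, house_age = -0.252856,
           distance_MRT = -0.005379, stores = 1.297443)

# Inputs: 1 for the intercept, then age (years), distance (m), stores.
new_house <- c(1, 20, 500, 4)

# Linear prediction: intercept + sum of coefficient * value.
predicted_price <- sum(coefs * new_house)
print(round(predicted_price, 2))
```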
1- This data set tracks monthly real estate market metrics for US metropolitan areas. It is used here to examine the impact of COVID on the real estate markets in Columbus and Miami.
## Rows: 57771 Columns: 40
## -- Column specification --------------------------------------------------------
## Delimiter: ","
## chr (1): cbsa_title
## dbl (39): month_date_yyyymm, cbsa_code, HouseholdRank, median_listing_price,...
##
## i Use `spec()` to retrieve the full column specification for this data.
## i Specify the column types or set `show_col_types = FALSE` to quiet this message.
## # A tibble: 57,771 x 40
##    month_date_yyyymm cbsa_code cbsa_title         HouseholdRank median_listing_~
##                <dbl>     <dbl> <chr>                      <dbl>            <dbl>
##  1            202109     35620 new york-newark-j~             1          607500
##  2            202109     31080 los angeles-long ~             2          967500
##  3            202109     16980 chicago-napervill~             3          332450
##  4            202109     19100 dallas-fort worth~             4          396480
##  5            202109     26420 houston-the woodl~             5          363210
##  6            202109     37980 philadelphia-camd~             6          321950.
##  7            202109     47900 washington-arling~             7          509500
##  8            202109     33100 miami-fort lauder~             8          463495
##  9            202109     12060 atlanta-sandy spr~             9          398000
## 10            202109     14460 boston-cambridge-~            10          675000
## # ... with 57,761 more rows, and 35 more variables:
## # median_listing_price_mm <dbl>, median_listing_price_yy <dbl>,
## # active_listing_count <dbl>, active_listing_count_mm <dbl>,
## # active_listing_count_yy <dbl>, median_days_on_market <dbl>,
## # median_days_on_market_mm <dbl>, median_days_on_market_yy <dbl>,
## # new_listing_count <dbl>, new_listing_count_mm <dbl>,
## # new_listing_count_yy <dbl>, price_increased_count <dbl>, ...
2- A time series for the median listing price in Columbus, OH
## # A tibble: 63 x 40
##    month_date_yyyymm cbsa_code cbsa_title   HouseholdRank median_listing_price
##                <dbl>     <dbl> <chr>                <dbl>                <dbl>
##  1            202109     18140 columbus, oh            31               289450
##  2            202108     18140 columbus, oh            31               299900
##  3            202107     18140 columbus, oh            31               305000
##  4            202106     18140 columbus, oh            31               299900
##  5            202105     18140 columbus, oh            31               308500
##  6            202104     18140 columbus, oh            31               314750
##  7            202103     18140 columbus, oh            31               324500
##  8            202102     18140 columbus, oh            31               324950
##  9            202101     18140 columbus, oh            31               307000
## 10            202012     18140 columbus, oh            31               306200
## # ... with 53 more rows, and 35 more variables: median_listing_price_mm <dbl>,
## # median_listing_price_yy <dbl>, active_listing_count <dbl>,
## # active_listing_count_mm <dbl>, active_listing_count_yy <dbl>,
## # median_days_on_market <dbl>, median_days_on_market_mm <dbl>,
## # median_days_on_market_yy <dbl>, new_listing_count <dbl>,
## # new_listing_count_mm <dbl>, new_listing_count_yy <dbl>,
## # price_increased_count <dbl>, price_increased_count_mm <dbl>, ...
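To plot these rows as a time series, the numeric `month_date_yyyymm` column first has to become a real date. The following is a minimal base-R sketch using only the ten Columbus rows printed above; the data frame name `columbus_sample` and the new `month` column are hypothetical.

```r
# Illustrative subset: the ten Columbus rows printed above.
columbus_sample <- data.frame(
  month_date_yyyymm = c(202109, 202108, 202107, 202106, 202105,
                        202104, 202103, 202102, 202101, 202012),
  median_listing_price = c(289450, 299900, 305000, 299900, 308500,
                           314750, 324500, 324950, 307000, 306200)
)

# Append "01" so each yyyymm value parses as the first day of its month.
columbus_sample$month <- as.Date(
  paste0(columbus_sample$month_date_yyyymm, "01"), format = "%Y%m%d"
)

# Sort chronologically before drawing the line.
columbus_sample <- columbus_sample[order(columbus_sample$month), ]

plot(columbus_sample$month, columbus_sample$median_listing_price,
     type = "l", xlab = "Month", ylab = "Median listing price (USD)",
     main = "Columbus, OH (sample rows)")
```

The same conversion applies unchanged to the Miami rows.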
3- A time series for the median listing price in Miami, FL
## # A tibble: 63 x 40
##    month_date_yyyymm cbsa_code cbsa_title         HouseholdRank median_listing_~
##                <dbl>     <dbl> <chr>                      <dbl>            <dbl>
##  1            202109     33100 miami-fort lauder~             8           463495
##  2            202108     33100 miami-fort lauder~             8           455750
##  3            202107     33100 miami-fort lauder~             8           450000
##  4            202106     33100 miami-fort lauder~             8           447000
##  5            202105     33100 miami-fort lauder~             8           425900
##  6            202104     33100 miami-fort lauder~             8           417950
##  7            202103     33100 miami-fort lauder~             8           402450
##  8            202102     33100 miami-fort lauder~             8           399450
##  9            202101     33100 miami-fort lauder~             8           400000
## 10            202012     33100 miami-fort lauder~             8           409000
## # ... with 53 more rows, and 35 more variables: median_listing_price_mm <dbl>,
## # median_listing_price_yy <dbl>, active_listing_count <dbl>,
## # active_listing_count_mm <dbl>, active_listing_count_yy <dbl>,
## # median_days_on_market <dbl>, median_days_on_market_mm <dbl>,
## # median_days_on_market_yy <dbl>, new_listing_count <dbl>,
## # new_listing_count_mm <dbl>, new_listing_count_yy <dbl>,
## # price_increased_count <dbl>, price_increased_count_mm <dbl>, ...
4- Columbus and Miami median listing prices
Both graphs show a steady increase in median listing prices in the Columbus and Miami real estate markets. However, the Columbus graph shows large fluctuations that seem to be seasonal: prices go down each year during the cold months and go up during the summer.
On the other hand, Miami, with its warm weather and mild winters, does not undergo big changes. Prices and sales continue to increase slightly throughout the year.
6- COVID Period
Median listing price: since 2017, real estate prices have been rising at different speeds in Columbus and Miami, possibly in connection with the election of President Trump. In Miami, the median price was almost stable at around $400,000, whereas in Columbus prices rose sharply from $200,000 to more than $300,000. However, with the pandemic, prices in Columbus started to decline in 2021, while in Miami they reached their peak, exceeding $450,000.
Median days on market: the Columbus and Miami markets used to move in opposite directions. In Columbus, the median number of days a house stays on the market increases during the winter and decreases during the summer. Miami does not have such big changes, but there the median days on market decreases slightly during the winter and increases during the summer. Since summer 2020, both markets have moved in the same rhythm and the median days on market has been decreasing continuously, maybe because there are not enough houses available and demand is higher than supply.
Active listing count: prior to the pandemic, more than 40,000 houses used to be listed on the market in Miami, while Columbus never passed 10,000 active listings. Since summer 2020, Miami has seen a drop of more than 20,000 listings, while Columbus has remained fairly stable, with a small decrease in the number of houses listed.
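The 2021 divergence between the two cities can be put into numbers with the rows printed in the tables above. This is a small sketch comparing February 2021 with September 2021; the helper `pct_change` is hypothetical, and the choice of these two months is only for illustration.

```r
# Median listing prices copied from the tables above.
columbus <- c(feb_2021 = 324950, sep_2021 = 289450)
miami    <- c(feb_2021 = 399450, sep_2021 = 463495)

# Percentage change from the first value to the second.
pct_change <- function(prices) {
  100 * (prices[[2]] - prices[[1]]) / prices[[1]]
}

columbus_change <- pct_change(columbus)
miami_change    <- pct_change(miami)

print(round(c(columbus = columbus_change, miami = miami_change), 2))
```

Over these months the Columbus median listing price fell by roughly 11%, while the Miami median rose by roughly 16%.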