The historical market data set for real estate valuation was collected from Sindian Dist., New Taipei City, Taiwan, and was donated on 8/17/2018. The purpose of this analysis is to model the house price per unit area in New Taipei City with a linear regression model.
## # A tibble: 414 × 8
## No `X1 transaction date` X2 hous…¹ X3 di…² X4 nu…³ X5 la…⁴ X6 lo…⁵ Y hou…⁶
## <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 1 2013. 32 84.9 10 25.0 122. 37.9
## 2 2 2013. 19.5 307. 9 25.0 122. 42.2
## 3 3 2014. 13.3 562. 5 25.0 122. 47.3
## 4 4 2014. 13.3 562. 5 25.0 122. 54.8
## 5 5 2013. 5 391. 5 25.0 122. 43.1
## 6 6 2013. 7.1 2175. 3 25.0 122. 32.1
## 7 7 2013. 34.5 623. 7 25.0 122. 40.3
## 8 8 2013. 20.3 288. 6 25.0 122. 46.7
## 9 9 2014. 31.7 5512. 1 25.0 121. 18.8
## 10 10 2013. 17.9 1783. 3 25.0 122. 22.1
## # … with 404 more rows, and abbreviated variable names ¹`X2 house age`,
## # ²`X3 distance to the nearest MRT station`,
## # ³`X4 number of convenience stores`, ⁴`X5 latitude`, ⁵`X6 longitude`,
## # ⁶`Y house price of unit area`
First, I rename the columns to shorter names without spaces.
df1 <- df %>%
  rename("Transaction_date"    = "X1 transaction date",
         "House_age"           = "X2 house age",
         "Distance_MRTstation" = "X3 distance to the nearest MRT station",
         "Convenience_stores"  = "X4 number of convenience stores",
         "Latitude"            = "X5 latitude",
         "Longitude"           = "X6 longitude",
         "House_price"         = "Y house price of unit area")
Next, I check for missing values.
which(is.na(df1))
## integer(0)
There are no missing values.
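When missing values do exist, a per-column count is often easier to read than the linear indices returned by `which(is.na(...))`. The following is a minimal sketch on a small made-up data frame (`toy` and its values are hypothetical and are not taken from the real data set); it only uses base R.

```r
# Toy data frame standing in for df1 (hypothetical values, not the real data)
toy <- data.frame(
  House_age   = c(32, NA, 13.3),
  House_price = c(37.9, 42.2, NA),
  Latitude    = c(24.98, 24.98, 24.99)
)

# Number of missing values in each column
na_per_column <- colSums(is.na(toy))
print(na_per_column)

# Indices of rows that contain at least one missing value
rows_with_na <- which(!complete.cases(toy))
print(rows_with_na)
```

Applied to `df1`, the same two calls would be expected to return all zeros and an empty vector, in line with the result above.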
summary(df1)
## No Transaction_date House_age Distance_MRTstation
## Min. : 1.0 Min. :2013 Min. : 0.000 Min. : 23.38
## 1st Qu.:104.2 1st Qu.:2013 1st Qu.: 9.025 1st Qu.: 289.32
## Median :207.5 Median :2013 Median :16.100 Median : 492.23
## Mean :207.5 Mean :2013 Mean :17.713 Mean :1083.89
## 3rd Qu.:310.8 3rd Qu.:2013 3rd Qu.:28.150 3rd Qu.:1454.28
## Max. :414.0 Max. :2014 Max. :43.800 Max. :6488.02
## Convenience_stores Latitude Longitude House_price
## Min. : 0.000 Min. :24.93 Min. :121.5 Min. : 7.60
## 1st Qu.: 1.000 1st Qu.:24.96 1st Qu.:121.5 1st Qu.: 27.70
## Median : 4.000 Median :24.97 Median :121.5 Median : 38.45
## Mean : 4.094 Mean :24.97 Mean :121.5 Mean : 37.98
## 3rd Qu.: 6.000 3rd Qu.:24.98 3rd Qu.:121.5 3rd Qu.: 46.60
## Max. :10.000 Max. :25.01 Max. :121.6 Max. :117.50
Research question: using this historical real estate valuation data set, can we predict the price of a house in New Taipei City, Taiwan?
There are 414 instances in this data set.
The data are hosted on the UC Irvine website and may be used for any purpose, provided that appropriate credit is given.
This is an observational study.
Data was donated on 8/17/2018 to UC Irvine and is available here: https://archive-beta.ics.uci.edu/dataset/477/real+estate+valuation+data+set
House price is our dependent variable in this regression model. This variable is quantitative.
There are 3 quantitative variables: house age, distance to the nearest MRT station, and number of convenience stores.
av_house <- mean(df1$`House_price`)
av_house
## [1] 37.98019
The average house price per unit area in this sample from New Taipei City, Taiwan is 37.98 units, where one unit equals 10,000 New Taiwan Dollars per ping.
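To make the unit concrete, the mean can be converted into New Taiwan Dollars per ping and, under an extra assumption, per square metre. This is a small arithmetic sketch: the value 37.98019 is the mean printed above, while the conversion factor of 3.3 square metres per ping is an assumption based on the data set's documentation and is not computed in this analysis. The variable names are illustrative.

```r
# Mean house price per unit area, copied from the output above
mean_price_units <- 37.98019

# One unit is 10000 New Taiwan Dollars per ping
ntd_per_unit <- 10000

# ASSUMPTION: 1 ping is about 3.3 square metres (per the data set documentation)
m2_per_ping <- 3.3

# Mean price in NTD per ping and in NTD per square metre
mean_ntd_per_ping <- mean_price_units * ntd_per_unit
mean_ntd_per_m2   <- mean_ntd_per_ping / m2_per_ping

print(mean_ntd_per_ping)
print(mean_ntd_per_m2)
```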
df2 <- df1 %>%
  select(-No, -Transaction_date)
library(corrplot)
corrplot(cor(df2),
         method = "shade",
         type = "full",
         diag = TRUE,
         tl.col = "black",
         bg = "white",
         title = "",
         col = NULL)
Several of the variables are correlated with one another. The plot suggests that we should decide which variables to keep for further analysis before fitting any model.
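One way to support that decision is to list the variable pairs whose absolute correlation exceeds some cutoff. The sketch below is illustrative only: `high_cor_pairs()` is a hypothetical helper, the 0.7 threshold is an arbitrary assumption, and the data are simulated rather than taken from `df2`.

```r
# Simulated data (hypothetical), standing in for df2
set.seed(1)
n <- 50
toy <- data.frame(x1 = rnorm(n))
toy$x2 <- 2 * toy$x1 + rnorm(n, sd = 0.1)  # built to correlate strongly with x1
toy$x3 <- rnorm(n)                         # generated independently of x1 and x2

# Hypothetical helper: return variable pairs with |r| above a threshold.
# Only the upper triangle is scanned, so each pair appears once.
high_cor_pairs <- function(data, threshold = 0.7) {
  r <- cor(data)
  idx <- which(upper.tri(r) & abs(r) > threshold, arr.ind = TRUE)
  data.frame(
    var1 = rownames(r)[idx[, "row"]],
    var2 = colnames(r)[idx[, "col"]],
    r    = r[idx],
    stringsAsFactors = FALSE
  )
}

pairs_found <- high_cor_pairs(toy, threshold = 0.7)
print(pairs_found)
```

Calling `high_cor_pairs(df2)` would give a similar table for the real variables; which member of a correlated pair to drop remains a modelling judgment.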