Two Numeric Explanatory Variables
Load Packages and Dataset
The packages used are fst (for reading .fst files), dplyr (data manipulation), ggplot2 (plotting), and broom (tidying model output). The dataset, taiwan_real_estate, covers property prices in Taiwan.
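A minimal sketch of the loading step, assuming all packages are installed. The file path is hypothetical and is not given in these notes; tidyr, magrittr, and plot3D are included because later code calls expand_grid(), %$%, and scatter3D().

```r
# Packages named in the text, plus those needed by later code.
library(fst)       # read_fst()
library(dplyr)     # mutate(), %>%
library(tidyr)     # expand_grid()
library(ggplot2)   # ggplot(), geom_point(), scale_color_viridis_c()
library(broom)     # tidy(), glance(), augment()
library(magrittr)  # %$% exposition pipe
library(plot3D)    # scatter3D()

# Hypothetical file path: adjust to wherever the .fst file is stored.
data_path <- "taiwan_real_estate.fst"

if (file.exists(data_path)) {
  taiwan_real_estate <- read_fst(data_path)
  glimpse(taiwan_real_estate)  # quick look at column names and types
} else {
  message("File not found: ", data_path)
}
```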
Visualizing Three Numeric Variables
- 3D scatter plot: can suffer from perspective issues and be difficult to interpret
- 2D scatter plot with response as color
3D Visualization
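To keep the 3D plotting code readable, we use the exposition pipe %$% from the magrittr package, which exposes the columns of the data frame to the function on its right. scatter3D() comes from the plot3D package. Here x = number of convenience stores, y = square root of the distance to the nearest MRT station, z = price per square meter.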
taiwan_real_estate %$%
  scatter3D(n_convenience, sqrt(dist_to_mrt_m), price_twd_msq)
2D Plot with Response as Color
Using taiwan_real_estate, plot the square root of the distance to the nearest MRT station against the number of convenience stores, colored by price. Make it a scatter plot and use the continuous viridis "plasma" color scale. The flat plot is easier to interpret in this case.
ggplot(taiwan_real_estate, aes(n_convenience, sqrt(dist_to_mrt_m), color = price_twd_msq)) +
  geom_point() +
  scale_color_viridis_c(option = "plasma")
Modelling Two Numeric Explanatory Variables
A Linear Regression
Model and predict the house price from the number of nearby convenience stores and the square root of the distance to the nearest MRT station.
The dplyr, tidyr, and ggplot2 packages are used.
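Without an interaction term, the model described above predicts price as an intercept plus one slope times the store count plus one slope times the square root of the distance. The sketch below checks this by hand. It uses synthetic stand-in data (not the Taiwan dataset); the object names toy, mdl, and new_obs and the simulated coefficients are assumptions for illustration only.

```r
# Synthetic stand-in data (NOT the Taiwan dataset); column names mirror the text.
set.seed(1)
toy <- data.frame(
  n_convenience = sample(0:10, 200, replace = TRUE),
  dist_to_mrt_m = runif(200, min = 0, max = 6400)
)
toy$price_twd_msq <- 15 + 0.2 * toy$n_convenience -
  0.15 * sqrt(toy$dist_to_mrt_m) + rnorm(200, sd = 1)

# Same formula shape as the no-interaction model in these notes.
mdl <- lm(price_twd_msq ~ n_convenience + sqrt(dist_to_mrt_m), data = toy)
b <- coef(mdl)  # intercept, slope for n_convenience, slope for sqrt(dist_to_mrt_m)

# Predict for one hypothetical property, first by hand, then with predict().
new_obs <- data.frame(n_convenience = 3, dist_to_mrt_m = 900)
by_hand <- b[["(Intercept)"]] +
  b[["n_convenience"]] * new_obs$n_convenience +
  b[["sqrt(dist_to_mrt_m)"]] * sqrt(new_obs$dist_to_mrt_m)

# The two predictions should agree up to floating-point tolerance.
all.equal(by_hand, unname(predict(mdl, new_obs)))
```

Note that predict() takes dist_to_mrt_m on its original scale; the square root in the formula is applied automatically.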
Two Numeric Explanatory Variables with no Interaction
mdl_price_vs_conv_dist <- lm(price_twd_msq ~ n_convenience + sqrt(dist_to_mrt_m), data = taiwan_real_estate)
Create an expanded grid of explanatory variables with expand_grid().
explanatory_data <- expand_grid(
  n_convenience = 0:10,
  dist_to_mrt_m = seq(from = 0, to = 80, by = 10)^2
)
explanatory_data
## # A tibble: 99 × 2
##    n_convenience dist_to_mrt_m
##            <int>         <dbl>
##  1             0             0
##  2             0           100
##  3             0           400
##  4             0           900
##  5             0          1600
##  6             0          2500
##  7             0          3600
##  8             0          4900
##  9             0          6400
## 10             1             0
## # … with 89 more rows
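The 99 rows come from pairing every store count with every distance value. A short sketch, assuming tidyr is installed, confirms the count; n_values, dist_values, and grid are illustrative names that do not appear in the original notes.

```r
library(tidyr)

n_values <- 0:10                                   # 11 store counts
dist_values <- seq(from = 0, to = 80, by = 10)^2   # 9 squared distances

# expand_grid() returns one row per combination of its inputs.
grid <- expand_grid(n_convenience = n_values, dist_to_mrt_m = dist_values)

# Row count equals the product of the input lengths (11 * 9).
nrow(grid) == length(n_values) * length(dist_values)

# Squaring the sequence means the grid is evenly spaced on the sqrt scale,
# which is the scale used by the model and by the plot's y-axis.
sqrt(unique(grid$dist_to_mrt_m))
```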
Add the model's predictions to the explanatory data to create the prediction data.
prediction_data <- explanatory_data %>%
  mutate(price_twd_msq = predict(mdl_price_vs_conv_dist, explanatory_data))
Add predictions to plot.
ggplot(
  taiwan_real_estate,
  aes(n_convenience, sqrt(dist_to_mrt_m), color = price_twd_msq)
) +
  geom_point() +
  scale_color_viridis_c(option = "plasma") +
  geom_point(data = prediction_data, color = "yellow", size = 3)
Two Numeric Explanatory Variables with Interaction
mdl_price_vs_conv_dist <- lm(price_twd_msq ~ n_convenience * sqrt(dist_to_mrt_m), data = taiwan_real_estate)
The rest of the process is the same as in Two Numeric Explanatory Variables with no Interaction.
explanatory_data <- expand_grid(n_convenience = 0:10, dist_to_mrt_m = seq(0, 80, 10) ^ 2)
prediction_data <- explanatory_data %>%
  mutate(price_twd_msq = predict(mdl_price_vs_conv_dist, explanatory_data))
ggplot(
  taiwan_real_estate,
  aes(n_convenience, sqrt(dist_to_mrt_m), color = price_twd_msq)
) +
  geom_point() +
  scale_color_viridis_c(option = "plasma") +
  geom_point(data = prediction_data, color = "yellow", size = 3)
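In an R formula, a * b is shorthand for a + b + a:b, so the interaction model has one extra coefficient, and the effect of one more convenience store then depends on the distance to the MRT station. The sketch below illustrates this on synthetic stand-in data (not the Taiwan dataset); toy, mdl_star, mdl_explicit, and slope_n are hypothetical names introduced only for this example.

```r
# Synthetic stand-in data (NOT the Taiwan dataset); column names mirror the text.
set.seed(2)
toy <- data.frame(
  n_convenience = sample(0:10, 200, replace = TRUE),
  dist_to_mrt_m = runif(200, min = 0, max = 6400)
)
toy$price_twd_msq <- 15 + 0.3 * toy$n_convenience -
  0.15 * sqrt(toy$dist_to_mrt_m) -
  0.002 * toy$n_convenience * sqrt(toy$dist_to_mrt_m) + rnorm(200, sd = 1)

# The `*` shorthand and the spelled-out formula fit the same model.
mdl_star <- lm(price_twd_msq ~ n_convenience * sqrt(dist_to_mrt_m), data = toy)
mdl_explicit <- lm(
  price_twd_msq ~ n_convenience + sqrt(dist_to_mrt_m) +
    n_convenience:sqrt(dist_to_mrt_m),
  data = toy
)
names(coef(mdl_star))  # four terms: intercept, two main effects, interaction
all.equal(coef(mdl_star), coef(mdl_explicit))

# With an interaction, the slope for n_convenience varies with distance.
b <- coef(mdl_star)
slope_n <- function(dist_m) {
  b[["n_convenience"]] +
    b[["n_convenience:sqrt(dist_to_mrt_m)"]] * sqrt(dist_m)
}
slope_n(c(0, 900, 6400))
```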