Develop a Model & Assess Predictor Significance
Step 1: Install and Load R packages
# if (!require(car)) install.packages("car") # provides leveneTest() for Levene's test
#install.packages("readxl")
#install.packages("tidyr")
library(car)
## Warning: package 'car' was built under R version 4.4.2
## Loading required package: carData
## Warning: package 'carData' was built under R version 4.4.2
library(readxl)
## Warning: package 'readxl' was built under R version 4.4.2
library(tidyr)
## Warning: package 'tidyr' was built under R version 4.4.2
library(ggplot2)
## Warning: package 'ggplot2' was built under R version 4.4.2
Step 2: Import & summarize the data
combined_data <- read_excel(file.choose())
str(combined_data)
## tibble [12 × 4] (S3: tbl_df/tbl/data.frame)
## $ Month : chr [1:12] "Jan" "Feb" "March" "April" ...
## $ AVG $ SFR : num [1:12] 1914174 1862336 2031186 2089503 2118960 ...
## $ AVG $ TH : num [1:12] 1198247 1155453 1216500 1280010 1310374 ...
## $ AVG $ Condo: num [1:12] 776460 775791 883392 893087 836370 ...
head(combined_data)
## # A tibble: 6 × 4
## Month `AVG $ SFR` `AVG $ TH` `AVG $ Condo`
## <chr> <dbl> <dbl> <dbl>
## 1 Jan 1914174 1198247 776460
## 2 Feb 1862336 1155453 775791
## 3 March 2031186 1216500 883392
## 4 April 2089503 1280010 893087
## 5 May 2118960 1310374 836370
## 6 June 2162057 1282466 927731
# Keep the source column order (SFR, TH, Condo) so each label matches its data
colnames(combined_data) <- c("Month", "Avg_SFR", "Avg_TH", "Avg_Condo")
summary(combined_data)
## Month Avg_SFR Avg_TH Avg_Condo
## Length:12 Min. :1862336 Min. :1155453 Min. :775791
## Class :character 1st Qu.:2002821 1st Qu.:1211937 1st Qu.:827934
## Mode :character Median :2087018 Median :1250368 Median :861735
## Mean :2065312 Mean :1246097 Mean :852054
## 3rd Qu.:2154952 3rd Qu.:1285062 3rd Qu.:884279
## Max. :2217444 Max. :1316528 Max. :927731
Data Description: The table below describes the price variables used in the analysis.
Variable |Definition
---------------|-------------
1. Avg_SFR |Single-family homes average price per month in 2023
2. Avg_TH |Townhomes average price per month in 2023
3. Avg_Condo |Condos average price per month in 2023
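The plotting and testing code that follows refers to a long-format table `combined_long` with columns `Type` and `Avg_Price`, whose construction is not shown. A minimal sketch of one way to build it with `tidyr::pivot_longer()` is given here. It assumes the columns are named `Avg_SFR`, `Avg_TH`, and `Avg_Condo`, and for illustration it uses only the six months printed by `head()` above rather than the full 12-row file.

```r
library(tidyr)

# First six months copied from the head() output above (illustration only);
# the real analysis would use all 12 rows of combined_data.
combined_data <- data.frame(
  Month     = c("Jan", "Feb", "March", "April", "May", "June"),
  Avg_SFR   = c(1914174, 1862336, 2031186, 2089503, 2118960, 2162057),
  Avg_TH    = c(1198247, 1155453, 1216500, 1280010, 1310374, 1282466),
  Avg_Condo = c(776460, 775791, 883392, 893087, 836370, 927731)
)

# One row per month and housing type: Type holds the former column name,
# Avg_Price holds the corresponding value.
combined_long <- pivot_longer(
  combined_data,
  cols      = c(Avg_SFR, Avg_TH, Avg_Condo),
  names_to  = "Type",
  values_to = "Avg_Price"
)

# Treat Type as a factor so it can serve as the grouping variable later
combined_long$Type <- factor(combined_long$Type,
                             levels = c("Avg_SFR", "Avg_TH", "Avg_Condo"))

str(combined_long)
```

With the full data set, the same call would yield 36 rows (12 months × 3 types), which matches the 33 residual degrees of freedom reported by Levene's test later in the analysis.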
Step 4: Data visualization
# Boxplot to compare price distribution
ggplot(combined_long, aes(x = Type, y = Avg_Price, fill = Type)) +
  geom_boxplot() +
  labs(title = "Price Distribution by Housing Type in 2023",
       x = "Housing Type",
       y = "Average Price") +
  theme_minimal()

Interpretation: The boxplot shows the distribution of monthly average prices for the three housing types in 2023.
(1) Avg_SFR: The highest median price among the three housing types. Its IQR is the widest, indicating more month-to-month price variability than in the other categories.
(2) Avg_TH: The second-highest median price. Its IQR is smaller than that of Avg_SFR, suggesting less price variability.
(3) Avg_Condo: The lowest median price, with a relatively small IQR, indicating comparatively consistent pricing within this category.
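The medians and IQRs read off the boxplot can also be computed directly. The sketch below is an illustration only: it uses the six months shown by `head()` earlier (not the full year), and the object names `prices` and `group_stats` are hypothetical. Figures from the full data set will differ.

```r
# First six months from the head() output above (illustration only)
prices <- data.frame(
  Type = rep(c("Avg_SFR", "Avg_TH", "Avg_Condo"), each = 6),
  Avg_Price = c(
    1914174, 1862336, 2031186, 2089503, 2118960, 2162057,  # single-family
    1198247, 1155453, 1216500, 1280010, 1310374, 1282466,  # townhomes
    776460, 775791, 883392, 893087, 836370, 927731         # condos
  )
)

# Split prices by housing type, then compute median and IQR for each group
by_type <- split(prices$Avg_Price, prices$Type)
group_stats <- t(sapply(by_type, function(x) {
  c(median = median(x), IQR = IQR(x))
}))

print(group_stats)
```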
Step 6: Check Assumptions of ANOVA
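The checks in this step operate on an object named `anova_result`, whose creation is not shown. A minimal sketch follows, assuming it is a one-way ANOVA of `Avg_Price` on `Type` fitted with `aov()`. The data here are only the six months printed by `head()` earlier, so the numbers will not match the outputs reported in this step.

```r
# Long-format data built from the first six months shown by head()
# (illustration only; the original analysis uses all 12 months).
combined_long <- data.frame(
  Type = factor(rep(c("Avg_SFR", "Avg_TH", "Avg_Condo"), each = 6)),
  Avg_Price = c(
    1914174, 1862336, 2031186, 2089503, 2118960, 2162057,  # single-family
    1198247, 1155453, 1216500, 1280010, 1310374, 1282466,  # townhomes
    776460, 775791, 883392, 893087, 836370, 927731         # condos
  )
)

# Assumed model: one-way ANOVA with housing type as the only factor
anova_result <- aov(Avg_Price ~ Type, data = combined_long)
print(summary(anova_result))

# The residuals from this fit are what the Shapiro-Wilk test examines
head(residuals(anova_result))
```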
6.1 Residual Normality: Shapiro-Wilk Test
shapiro_test <- shapiro.test(residuals(anova_result))
print(shapiro_test)
##
## Shapiro-Wilk normality test
##
## data: residuals(anova_result)
## W = 0.97851, p-value = 0.695
if (shapiro_test$p.value < 0.05) {
  print("Residuals are NOT normally distributed (p < 0.05). Consider transformations or non-parametric methods.")
} else {
  print("Residuals are normally distributed (p >= 0.05).")
}
## [1] "Residuals are normally distributed (p >= 0.05)."
Interpretation: The Shapiro-Wilk test assesses whether the residuals from the ANOVA model are normally distributed.
Since p = 0.695 > 0.05, we fail to reject the null hypothesis of normality: the residuals do not deviate significantly from a normal distribution, so no data transformation or alternative method is needed on this account.
6.2 Homogeneity of Variance: Levene’s Test
levene_test <- leveneTest(Avg_Price ~ Type, data = combined_long)
print("Levene's Test for Homogeneity of Variances:")
## [1] "Levene's Test for Homogeneity of Variances:"
print(levene_test)
## Levene's Test for Homogeneity of Variance (center = median)
## Df F value Pr(>F)
## group 2 4.4973 0.01874 *
## 33
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
if (levene_test$`Pr(>F)`[1] < 0.05) {
  print("Variance is NOT equal across groups (p < 0.05). Consider alternatives like Welch's ANOVA or transformations.")
} else {
  print("Variance is equal across groups (p >= 0.05). ANOVA assumption of equal variances is met.")
}
## [1] "Variance is NOT equal across groups (p < 0.05). Consider alternatives like Welch's ANOVA or transformations."
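Interpretation: Levene's test checks whether the variance of average price is equal across the three housing types.
Since p = 0.01874 < 0.05, the variances differ significantly between groups, so the equal-variance assumption of classical ANOVA is violated.
Consider alternatives such as Welch's ANOVA, which does not assume equal variances, or a variance-stabilizing transformation.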
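One way to follow up on the unequal variances is Welch's ANOVA, available in base R as `oneway.test()` with `var.equal = FALSE`. The sketch below is illustrative: it uses only the six months printed by `head()` earlier, and `welch_result` is a hypothetical name, so the statistics will differ from a run on the full data.

```r
# Long-format data from the first six months shown by head()
# (illustration only; not the full 12-month data set).
combined_long <- data.frame(
  Type = factor(rep(c("Avg_SFR", "Avg_TH", "Avg_Condo"), each = 6)),
  Avg_Price = c(
    1914174, 1862336, 2031186, 2089503, 2118960, 2162057,  # single-family
    1198247, 1155453, 1216500, 1280010, 1310374, 1282466,  # townhomes
    776460, 775791, 883392, 893087, 836370, 927731         # condos
  )
)

# Welch's one-way test: compares group means without assuming equal variances
welch_result <- oneway.test(Avg_Price ~ Type, data = combined_long,
                            var.equal = FALSE)
print(welch_result)
```

A small p-value here would indicate that at least one housing type has a different mean price, without relying on the equal-variance assumption that Levene's test rejected.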
Step 7: Correlation Analysis
correlation_matrix <- cor(combined_data[, 2:4], use = "complete.obs") # Exclude missing values
print("Correlation Matrix:")
## [1] "Correlation Matrix:"
print(correlation_matrix)
## Avg_SFR Avg_TH Avg_Condo
## Avg_SFR 1.0000000 0.7563963 0.7844563
## Avg_TH 0.7563963 1.0000000 0.5705040
## Avg_Condo 0.7844563 0.5705040 1.0000000
Interpretation: The correlation matrix displays the pairwise Pearson correlation coefficients between the monthly average prices of single-family homes, townhomes, and condos.
(1) Avg_SFR and Avg_TH: Correlation coefficient = 0.756. This is a strong positive correlation: months with higher average single-family home prices also tend to have higher average townhome prices.
(2) Avg_SFR and Avg_Condo: Correlation coefficient = 0.784. This is a strong positive correlation: months with higher average single-family home prices also tend to have higher average condo prices.
(3) Avg_TH and Avg_Condo: Correlation coefficient = 0.571. This is a moderate positive correlation, a weaker but still meaningful relationship between townhome and condo prices.
Overall, the moderate-to-strong positive correlations suggest that the average prices of the three housing types tend to move together.
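Correlation coefficients from a small sample are worth pairing with a significance test. The sketch below runs `cor.test()` on each pair of price series. It is an illustration only: it uses the six months printed by `head()` earlier, and the names `wide`, `pairs`, and `results` are hypothetical, so the coefficients will not match the 12-month matrix above.

```r
# First six months from the head() output (illustration only)
wide <- data.frame(
  Avg_SFR   = c(1914174, 1862336, 2031186, 2089503, 2118960, 2162057),
  Avg_TH    = c(1198247, 1155453, 1216500, 1280010, 1310374, 1282466),
  Avg_Condo = c(776460, 775791, 883392, 893087, 836370, 927731)
)

# All unordered pairs of columns
pairs <- combn(names(wide), 2, simplify = FALSE)

# Pearson correlation with its p-value for each pair
results <- do.call(rbind, lapply(pairs, function(p) {
  ct <- cor.test(wide[[p[1]]], wide[[p[2]]], method = "pearson")
  data.frame(var1 = p[1], var2 = p[2],
             r = unname(ct$estimate), p_value = ct$p.value)
}))

print(results)
```

With only 12 monthly observations in the full data, confidence intervals around each coefficient will be wide, so the p-values help judge which relationships are distinguishable from zero.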