Introduction

The global housing crisis has been a major topic of concern in 2025.
Definition of key terms:
- Dwellings: Buildings that are primarily designed for long-term residential purposes
- Housing stress: Housing costs that exceed 30% of household income, applied to low-income households
- Housing affordability: The relationship between housing costs and household income, which can vary between locations
- Affordable housing: Housing provided at below-market rates, typically targeted towards low-to-middle income households
- Social housing: Government- or community-managed housing, typically provided to those who are unable to access affordable, adequate housing in the market
- Overcrowding: The number of residents living in a dwelling exceeds its designed capacity, often due to affordability challenges or housing shortages
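
The 30% threshold in the housing stress definition amounts to a simple ratio check. The sketch below is illustrative only: the household figures and the column names `income` and `housing_cost` are hypothetical and are not drawn from the dataset used later.

```r
# Hypothetical monthly figures for three households (not from the dataset)
households <- data.frame(
  income       = c(3000, 4500, 6000),
  housing_cost = c(1200, 1200, 1500)
)

# Share of income spent on housing
households$cost_ratio <- households$housing_cost / households$income

# Flag households whose housing costs exceed 30% of income
households$in_stress <- households$cost_ratio > 0.30

print(households)
```

Only the first household, which spends 40% of its income on housing, is flagged.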

Introduction cont.

Key findings from Australia’s National Housing Supply and Affordability Council include:
- Rises in housing prices and rent outpaced the rise in the median household income
- Housing affordability is deteriorating
- Supply of housing is near its lowest levels in a decade
Key findings from CBRE Investment Management on the state of housing in the United States include:
- Home prices are reaching historical highs
- The income required to buy a home has doubled since 2019
- The key problem is the gap between the supply of and demand for houses in the US
  - There is not enough housing, and the wrong types of houses are being supplied for current demand
- Regulatory and cost barriers include:
  - Local zoning regulations
  - Higher cost of capital, goods and labour
  - Rising interest rates, which significantly decreased development activity despite strong consumer demand for housing

Problem Statement

“How do different variables (e.g. property prices, furnishing statuses, salaries) interact with the purchasing prices of properties?”
Statistics will be employed to:
- Explore the average property prices across different countries
- Explore the relationship between monthly salary and property prices
- Explore the relationship between property price and the number of previous owners
- Explore the relationship between property price and buyer decisions
- Explore the relationship between furnishing status and buyer decisions

Data

The data was collected from Kaggle.com by searching for a recently updated housing dataset.
Open Data Reference: https://www.kaggle.com/datasets/mohankrishnathalla/global-house-purchase-decision-dataset
List of important variables in the dataset:
- property_type describes the type of the developed property, such as a studio, an apartment or a townhouse
- furnishing_status describes the furnishing condition of the developed property, such as unfurnished or furnished
- price describes the price of the property in the local currency
- monthly_expenses describes the monthly expenses of the customer who is looking to buy the property
- customer_salary describes the monthly salary of the customer who is looking to buy the property

Data cont.

Preprocessing done to the data set:
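- Converted the decision variable into a factor, with 0 labelled No and 1 labelled Yes
  - This helps with plotting, and treats the purchase decision as a categorical outcome
- Converted furnishing_status into an ordered factor (Unfurnished < Semi-Furnished < Fully-Furnished)
- Restricted the data to Australia, Canada, France, Singapore, the UK and the USA
- Converted all monetary values from local currency to USD for easier comparison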

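# Packages used throughout: dplyr (pipes, summaries), quantmod (getQuote),
# ggplot2 and scales (plots), car (leveneTest)
library(dplyr)
library(quantmod)
library(ggplot2)
library(scales)
library(car)

ds <- read.csv("global_house_purchase_dataset.csv")

# Label the purchase decision and order the furnishing levels
ds$decision <- ds$decision %>% factor(levels = c(0, 1), labels = c("No", "Yes"))
ds$furnishing_status <- ds$furnishing_status %>%
  factor(levels = c("Unfurnished", "Semi-Furnished", "Fully-Furnished"), ordered = TRUE)

# Drop the countries that are outside the scope of this analysis
ds_filtered <- subset(ds, !country %in% c("Brazil", "China", "India", "Germany", "Japan", "South Africa", "UAE"))

# Fetch exchange rates to USD (live quotes, so values change between runs)
currency_pairs <- c("AUDUSD=X", "CADUSD=X", "EURUSD=X", "SGDUSD=X", "GBPUSD=X")
exchange_data <- getQuote(currency_pairs)
exchange_rates <- exchange_data$Last  # assumes rows come back in the order requested
names(exchange_rates) <- c("AUD", "CAD", "EUR", "SGD", "GBP")
exchange_rates <- c("USD" = 1, exchange_rates)

# Map each country to its currency, then convert prices and salaries to USD
country_currency <- c("Australia" = "AUD", "Canada" = "CAD", "France" = "EUR", "Singapore" = "SGD", "UK" = "GBP", "USA" = "USD")
ds_filtered$currency <- unname(country_currency[as.character(ds_filtered$country)])
ds_filtered$price_usd <- ds_filtered$price * unname(exchange_rates[ds_filtered$currency])
ds_filtered$salary_usd <- ds_filtered$customer_salary * unname(exchange_rates[ds_filtered$currency])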
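
Because the conversion above relies on live quotes from getQuote(), the USD figures will shift slightly every time the document is knitted. One possible alternative, sketched below, is to use a fixed lookup of rates. The rates shown are placeholders chosen for illustration, not actual market rates, and `toy` is a made-up stand-in for `ds_filtered`.

```r
# Illustrative, hard-coded rates (USD per one unit of local currency).
# These values are placeholders, not actual market rates.
fixed_rates <- c(USD = 1, AUD = 0.65, CAD = 0.73, EUR = 1.08, SGD = 0.74, GBP = 1.27)

country_currency <- c(Australia = "AUD", Canada = "CAD", France = "EUR",
                      Singapore = "SGD", UK = "GBP", USA = "USD")

# Toy data standing in for ds_filtered
toy <- data.frame(country = c("Australia", "UK", "USA"),
                  price   = c(1000000, 500000, 750000),
                  stringsAsFactors = FALSE)

# Look up each row's currency, then its rate; unname() drops the lookup names
toy$currency  <- unname(country_currency[toy$country])
toy$price_usd <- toy$price * unname(fixed_rates[toy$currency])

print(toy)
```

Recording the rates and the date they were taken alongside the analysis would make the summary tables reproducible.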

Descriptive Statistics and Visualisations

The table below summarises the mean and median property price and the mean customer salary, in USD, for each country.

country_summary <- ds_filtered %>%
  group_by(country) %>%
  summarise(avg_price_usd = mean(price_usd, na.rm = TRUE),
    median_price_usd = median(price_usd, na.rm = TRUE),
    avg_salary_usd = mean(salary_usd, na.rm = TRUE))
knitr::kable(country_summary)

country	avg_price_usd	median_price_usd	avg_salary_usd
Australia	665715.5	666912.2	35767.41
Canada	794184.8	793537.5	39405.12
France	1562263.2	1567218.1	63955.33
Singapore	1726977.8	1716941.5	42327.34
UK	1710936.8	1714401.0	73586.92
USA	1603145.7	1600616.0	54950.75

Descriptive Statistics and Visualisations cont.

# Boxplot: Price comparison by Country
ggplot(ds_filtered, aes(x = country, y = price_usd, fill = country)) +
  geom_boxplot(fill = "white", color = "black") +
  xlab("Country") +  ylab("Property Price (USD)") +  theme_classic() +
  labs(title = "Boxplot of Property Prices by Country") +
  scale_y_continuous(labels = scales::dollar) +
  theme(legend.position = "none", axis.text.x = element_text(angle = 90, size = 10),
    axis.title.x = element_text(size = 12), axis.title.y = element_text(size = 12),
    plot.title = element_text(size = 16))

The boxplot describes the distribution of property prices in each of the countries analysed in the data set.
The median property price (solid black line) is higher in Singapore, the UK, the USA and France (roughly 1.6 to 1.7 million USD) than in Australia and Canada (roughly 0.7 to 0.8 million USD).

Descriptive Statistics and Visualisations cont.

ggplot(ds_filtered, aes(x = country, fill = decision)) +
  geom_bar(position = "dodge") + labs(title = "Decision to Buy by Country", 
  x = "Country", y = "Count of Decisions", fill = "Decision") + theme_minimal()

The bar plot compares buyer decisions across countries.
Buyers in Singapore and the USA appear less likely to commit to buying a property.
The USA has the largest number of buyers who decided against buying a property.
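
Since the bar plot shows raw counts, comparing how likely buyers are to commit is easier with within-country proportions. The sketch below shows one way to compute them with table() and prop.table(). The data frame `toy` is made up for illustration and stands in for `ds_filtered`.

```r
# Toy data standing in for ds_filtered (values are made up)
toy <- data.frame(
  country  = c("USA", "USA", "USA", "USA", "UK", "UK", "UK", "UK"),
  decision = factor(c("No", "No", "No", "Yes", "No", "Yes", "Yes", "Yes"),
                    levels = c("No", "Yes"))
)

# Counts of each decision per country
counts <- table(toy$country, toy$decision)

# Row-wise proportions: each country's row sums to 1
rates <- prop.table(counts, margin = 1)

print(round(rates, 2))
```

Applied to the real data, the "Yes" column would give each country's purchase rate regardless of how many buyers were sampled there.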

Descriptive Statistics and Visualisations cont.

# Count properties by country, construction year and property type
property_counts <- ds_filtered %>%
  group_by(country, constructed_year, property_type) %>%
  summarise(count = n(), .groups = "drop")
filtered_counts <- property_counts %>% filter(property_type %in% c("Apartment", "Independent House"))
ggplot(filtered_counts, aes(x = constructed_year, y = count, colour = property_type)) +
  geom_line(size = 0.75) + facet_wrap(~ country) +
  labs(title = "Number of apartments and independent houses built by year in each country",
       x = "Constructed Year", y = "Number of Properties", colour = "Property Type") + theme_minimal() +
  theme(legend.text = element_text(size = 8), legend.title = element_text(size = 9), legend.position = "bottom")

Based on the chart, some countries (Canada, France and the UK) appear to be building more apartments and fewer independent houses over time
- This could indicate that a growing share of the population is moving into major city areas
Other countries, such as Australia, Singapore and the USA, show a slowdown in the construction of both property types
- This could indicate broader economic issues, such as rising building material costs or regulatory constraints

Hypothesis testing: Welch 2-sample t-test

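# Split property prices by purchase decision
# Note: price is in local currency; price_usd is the USD-converted column
yes_prices <- ds_filtered$price[ds_filtered$decision == "Yes"]
no_prices <- ds_filtered$price[ds_filtered$decision == "No"]

# Normal Q-Q plots for each group, side by side
par(mfrow=c(1,2))
yes_prices %>% qqnorm(main = "Normal Q-Q Plot: Yes")
no_prices %>% qqnorm(main = "Normal Q-Q Plot: No")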

The qqnorm() plots for both groups show an S-shaped curve, indicating that the normality assumption is violated
- However, since the sample size is large (well above 30), the central limit theorem implies that the sampling distribution of the mean is approximately normal
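
The large-sample argument can be illustrated with a small simulation. The sketch below uses simulated exponential data rather than the housing data, and the sample size of 500 and the 2000 repetitions are arbitrary choices for illustration.

```r
set.seed(1)  # for reproducibility

# A strongly right-skewed population (exponential), clearly non-normal
population <- rexp(100000, rate = 1)

# Draw 2000 samples of size 500 and record each sample mean
sample_means <- replicate(2000, mean(sample(population, size = 500)))

# Compare the raw data with the distribution of the sample means
par(mfrow = c(1, 2))
qqnorm(population, main = "Raw data")
qqline(population)
qqnorm(sample_means, main = "Sample means (n = 500)")
qqline(sample_means)
```

The raw data should bend sharply away from the reference line, while the sample means should lie close to it, which is the behaviour the t-test relies on.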

Welch 2-sample t-test cont.

car::leveneTest(price ~ decision, data = ds_filtered)

## Levene's Test for Homogeneity of Variance (center = median)
##          Df F value    Pr(>F)    
## group     1  225.64 < 2.2e-16 ***
##       92441                      
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

t.test(yes_prices, no_prices)

## 
##  Welch Two Sample t-test
## 
## data:  yes_prices and no_prices
## t = -23.743, df = 39214, p-value < 2.2e-16
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -160893.0 -136354.8
## sample estimates:
## mean of x mean of y 
##   1319175   1467799

Levene's test was used to check for homogeneity of variances; the results show that this assumption is violated
- The reported p-value is < 2.2e-16, which is statistically significant, so the Welch test (which does not assume equal variances) is appropriate
The results of the Welch 2-sample t-test show that the mean property price for buyers who purchased is 1,319,175, whereas the mean property price for those who did not purchase is 1,467,799.
The t-value is -23.743, with 39214 degrees of freedom and a p-value of < 2.2e-16.
The result is statistically significant, indicating strong evidence of a difference in mean property price between the 2 decision groups (95% CI for the difference: -160,893 to -136,355).
Properties that were purchased are, on average, cheaper than properties that buyers decided not to purchase.
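
To make the test statistic and the fractional degrees of freedom less of a black box, the sketch below computes the Welch t statistic and the Welch-Satterthwaite degrees of freedom by hand and compares them with t.test(). The vectors `x` and `y` are simulated stand-ins, not the real `yes_prices` and `no_prices`.

```r
set.seed(42)  # for reproducibility

# Simulated stand-ins for the two price groups (not the real data)
x <- rnorm(200, mean = 100, sd = 10)
y <- rnorm(300, mean = 105, sd = 20)

# Squared standard error of each group mean
vx <- var(x) / length(x)
vy <- var(y) / length(y)

# Welch t statistic and Welch-Satterthwaite degrees of freedom
t_stat   <- (mean(x) - mean(y)) / sqrt(vx + vy)
welch_df <- (vx + vy)^2 / (vx^2 / (length(x) - 1) + vy^2 / (length(y) - 1))

# Two-sided p-value
p_value <- 2 * pt(-abs(t_stat), welch_df)

# Compare with R's built-in test (Welch is the default for t.test)
builtin <- t.test(x, y)
print(c(manual_t = t_stat, builtin_t = unname(builtin$statistic)))
print(c(manual_df = welch_df, builtin_df = unname(builtin$parameter)))
```

The manual and built-in values should agree, which shows why the reported degrees of freedom need not equal the total sample size minus 2.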

Regression analysis: One-way ANOVA test

ds_filtered$furnishing_status <- as.factor(ds_filtered$furnishing_status)

# Run one-way ANOVA of price on furnishing status
anova_result <- aov(price ~ furnishing_status, data = ds_filtered)
# Show the diagnostic plots in a 2 x 2 grid
par(mfrow=c(2,2))
plot(anova_result)

summary(anova_result)

##                      Df    Sum Sq   Mean Sq F value Pr(>F)
## furnishing_status     2 2.453e+12 1.227e+12   1.698  0.183
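## Residuals         92440 6.679e+16 7.225e+11

p-value: 0.183 (> 0.05), therefore we fail to reject the null hypothesis that mean property price is the same across furnishing statuses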
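
The F value and p-value in the ANOVA table can be checked directly from the mean squares and degrees of freedom. The sketch below uses the rounded values printed in the table, so small rounding differences are possible.

```r
# Values copied from the ANOVA table above
mean_sq_between <- 1.227e12   # Mean Sq for furnishing_status
mean_sq_within  <- 7.225e11   # Mean Sq for Residuals
df_between <- 2
df_within  <- 92440

# F is the ratio of between-group to within-group mean squares
f_value <- mean_sq_between / mean_sq_within

# The upper-tail probability of the F distribution gives the p-value
p_value <- pf(f_value, df_between, df_within, lower.tail = FALSE)

print(round(c(F = f_value, p = p_value), 3))
```

The results should come out close to the reported F value of 1.698 and p-value of 0.183.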

Categorical association

Statement: Does furnishing status influence the buying decision of a given property?
Null hypothesis: There is no association between furnishing status and the buying decision
Alternative: There is an association between the furnishing status and buying decision

furnishing_decision_table <- table(ds_filtered$furnishing_status, ds_filtered$decision)
chi_result <- chisq.test(furnishing_decision_table)
chi_result

## 
##  Pearson's Chi-squared test
## 
## data:  furnishing_decision_table
## X-squared = 4.81, df = 2, p-value = 0.09026

p-value: 0.09026 (> 0.05), therefore we fail to reject the null hypothesis
Conclusion: The result of the chi-squared test is not statistically significant. There is therefore not enough evidence to conclude that furnishing status is associated with the buying decision for a property.
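
For readers unfamiliar with how the chi-squared statistic is built, the sketch below reproduces it by hand from expected counts and compares it with chisq.test(). The table `observed` contains made-up counts, not the real furnishing-by-decision table.

```r
# Toy 3 x 2 table of counts (made-up numbers, not the real data)
observed <- matrix(c(30, 20,
                     25, 25,
                     20, 30),
                   nrow = 3, byrow = TRUE,
                   dimnames = list(
                     furnishing = c("Unfurnished", "Semi-Furnished", "Fully-Furnished"),
                     decision   = c("No", "Yes")))

# Expected counts under independence: (row total * column total) / grand total
expected <- outer(rowSums(observed), colSums(observed)) / sum(observed)

# Pearson statistic, degrees of freedom and p-value
x_squared <- sum((observed - expected)^2 / expected)
dof       <- (nrow(observed) - 1) * (ncol(observed) - 1)
p_value   <- pchisq(x_squared, df = dof, lower.tail = FALSE)

# Compare with the built-in test
builtin <- chisq.test(observed)
print(c(manual = x_squared, builtin = unname(builtin$statistic)))
```

The same degrees-of-freedom rule, (rows - 1) x (columns - 1), accounts for df = 2 in the furnishing-by-decision output above.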

Categorical association cont.

Statement: Do buyers in certain countries have higher purchase rates than others?
Null hypothesis: There is no association between a buyer’s country and their purchase decision
Alternative: There is an association

country_decision_table <- table(ds_filtered$country, ds_filtered$decision)
chi_result <- chisq.test(country_decision_table)
chi_result

## 
##  Pearson's Chi-squared test
## 
## data:  country_decision_table
## X-squared = 487.55, df = 5, p-value < 2.2e-16

p-value: < 2.2e-16 (< 0.05), therefore we reject the null hypothesis
Conclusion: The result of the chi-squared test is statistically significant. There is therefore enough evidence to conclude that purchase rates differ by country.
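
With a sample this large, a very small p-value does not by itself say how strong the association is. One common effect-size measure is Cramér's V, sketched below. It was not part of the original analysis, and the sample size `n` is an assumption inferred from the degrees of freedom in the ANOVA table (2 + 92440 + 1), which holds only if no rows were dropped between tests.

```r
# Values taken from the outputs above
x_squared <- 487.55   # chi-squared statistic for country x decision
n         <- 92443    # assumed number of rows, inferred from the ANOVA table
n_rows    <- 6        # countries
n_cols    <- 2        # decision levels

# Cramer's V: sqrt(X^2 / (n * min(r - 1, c - 1)))
cramers_v <- sqrt(x_squared / (n * min(n_rows - 1, n_cols - 1)))

print(round(cramers_v, 3))
```

Under these assumptions V is about 0.07, which would usually be read as a weak association despite the highly significant test.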

Categorical association cont.

Statement: Are certain property types more likely to be furnished or not?
Null hypothesis: There is no association between the furnishing status and property types
Alternative: There is an association

property_furnishing_table <- table(ds_filtered$property_type, ds_filtered$furnishing_status)
chi_result <- chisq.test(property_furnishing_table)
chi_result

## 
##  Pearson's Chi-squared test
## 
## data:  property_furnishing_table
## X-squared = 7.8341, df = 10, p-value = 0.645

p-value: 0.645 (> 0.05), therefore we fail to reject the null hypothesis
Conclusion: The result of the chi-squared test is not statistically significant. There is therefore not enough evidence to conclude that any property type is more likely to be furnished than another.

An analysis on global housing market trends

Focusing specifically on Australia, Canada, France, Singapore, UK and USA

Discussion

References