Factors Influencing Real Estate Lead Conversion in Lagos

Author: Jadesola Ajose

Published: May 19, 2026

Code
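# Load the raw lead records; keep text columns as character for now
df <- read.csv("raw_data/DA Data.csv", stringsAsFactors = FALSE)

# Standardise column names: lower case, with dots, spaces, and hyphens as underscores
names(df) <- tolower(names(df))
names(df) <- gsub("\\.", "_", names(df))
names(df) <- gsub(" ", "_", names(df))
names(df) <- gsub("-", "_", names(df))

# Convert the inquiry date from text to Date.
# as.Date() uses the first format in tryFormats that parses the first non-missing value.
df$inquiry_date <- as.Date(
  df$inquiry_date,
  tryFormats = c("%m/%d/%y", "%m/%d/%Y", "%Y-%m-%d")
)

# Treat the grouping variables as factors
df$lead_source <- as.factor(df$lead_source)
df$property_type <- as.factor(df$property_type)
df$location <- as.factor(df$location)
df$inspected <- as.factor(df$inspected)

# Derived variables: 0/1 inspection outcome and natural-log budget
df$inspected_binary <- ifelse(df$inspected == "Yes", 1, 0)
df$log_budget <- log(df$budget_naira)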

1. Executive Summary

This case study examines the operational factors that influence whether Lagos real estate leads progress from initial inquiry to property inspection. The analysis uses 100 anonymised lead records independently assembled from JahDay Real Estate’s buyer engagement activity between November 2025 and May 2026. The dataset captures inquiry date, lead source, property type, location, budget, response time, follow-up activity, and inspection outcome. Five CS1 techniques were applied: exploratory data analysis, data visualisation, hypothesis testing, correlation analysis, and logistic regression.

The analysis shows that inquiry volume alone is not a reliable measure of lead quality. Instagram produced the highest number of leads (47 of 100), but only 40% of them progressed to inspection, compared with 79% of referral leads. Inspection progression therefore appears to depend on a wider combination of source quality, budget profile, response behaviour, and follow-up activity. Budget values were highly skewed because of luxury property inquiries, so log-transformed budget was used to improve interpretation. Overall, the findings support a more structured lead prioritisation process. JahDay Real Estate should focus on leads with stronger budget alignment, credible source quality, and consistent follow-up potential, rather than relying only on speed of response or total inquiry volume.

2. Professional Disclosure

I work in the Lagos real estate sector through JahDay Real Estate, where I support high-value client acquisition, buyer advisory, lead qualification, inspection coordination, developer engagement, and investment-focused property transactions. My role requires evaluating buyer intent across multiple inquiry channels, including Instagram, WhatsApp, referrals, agent networks, and website leads. Because property inspection is a key milestone between initial interest and serious transaction activity, understanding what moves a lead from inquiry to inspection is directly relevant to my professional decision making.

The business question guiding this analysis is: Which operational factors influence whether a Lagos real estate inquiry progresses into a serious property inspection opportunity?

Exploratory Data Analysis relevance: Exploratory Data Analysis is relevant because it provides the first view of lead behaviour before deeper testing. In my work, it helps identify where inquiries are coming from, how buyer budgets are distributed, which property types attract interest, and whether the data contains missing values, outliers, or unusual patterns that could affect decision making.

Data Visualisation relevance: Data visualisation is important because real estate decisions are often discussed with non-technical stakeholders, including buyers, developers, agents, and strategic partners. Visuals make it easier to communicate channel performance, property demand, budget patterns, and inspection behaviour in a clear and practical way.

Hypothesis Testing relevance: Hypothesis testing is useful because it allows me to move beyond observation and test whether differences in lead behaviour are statistically meaningful. For example, it can help determine whether inspected leads have different budget profiles from non-inspected leads, or whether inspection outcomes are associated with lead source.

Correlation Analysis relevance: Correlation analysis helps assess whether numeric operational variables move together. In this case, it supports a better understanding of how budget, response time, follow-up activity, and inspection outcome relate to one another. This is useful for deciding which lead management factors deserve closer attention.

Regression relevance: Logistic regression is appropriate because the main business outcome is binary: a lead either inspected or did not inspect. This technique helps estimate how multiple factors, such as budget, response time, follow-up activity, lead source, and property type, are associated with the probability of inspection. In practice, this supports a more structured and evidence-based approach to client prioritisation and inspection conversion.

3. Data Collection and Sampling

The dataset used for this analysis was collected from anonymised Lagos real estate inquiry records connected to active property marketing and buyer engagement activity through JahDay Real Estate.

Each row represents one unique lead inquiry. The dataset contains 100 observations and 9 original variables, with additional derived variables created during analysis for inspection outcome and log-transformed budget. The time period covered by the dataset runs from November 2025 to May 2026.

The sampling approach was purposive sampling because the dataset focused specifically on real estate inquiries connected to active Lagos property marketing operations. The sampling frame includes inquiries received through Instagram, WhatsApp, referrals, agent networks, and website channels. These channels are directly relevant to my professional work because they represent the main ways real estate leads enter the buyer engagement process.

To protect confidentiality, all personally identifiable information was removed before analysis. The dataset does not include buyer names, phone numbers, email addresses, or sensitive client information. The data is used only for academic analysis and operational learning.

Code
data_collection_summary <- data.frame(
  Item = c(
    "Number of observations",
    "Number of original variables",
    "Earliest inquiry date",
    "Latest inquiry date",
    "Lead source channels",
    "Property type categories",
    "Inspection outcome variable"
  ),
  Value = c(
    nrow(df),
    "9 original variables, plus 2 derived variables for analysis",
    as.character(min(df$inquiry_date, na.rm = TRUE)),
    as.character(max(df$inquiry_date, na.rm = TRUE)),
    paste(levels(df$lead_source), collapse = ", "),
    paste(levels(df$property_type), collapse = ", "),
    "Inspected: Yes or No"
  )
)

knitr::kable(data_collection_summary)
| Item | Value |
|:---|:---|
| Number of observations | 100 |
| Number of original variables | 9 original variables, plus 2 derived variables for analysis |
| Earliest inquiry date | 2025-11-05 |
| Latest inquiry date | 2026-05-13 |
| Lead source channels | Agent Network, Instagram, Referral, Website, WhatsApp |
| Property type categories | Apartment, Bungalow, Duplex, Land |
| Inspection outcome variable | Inspected: Yes or No |

Sampling justification: A sample size of 100 lead inquiries is sufficient for this exploratory and inferential case study because it meets the assignment requirement and provides enough variation across lead source, property type, budget, response time, follow-up activity, and inspection outcome. The dataset is not intended to represent the entire Lagos real estate market. Instead, it is intended to analyse operational lead behaviour within my own professional context.

4. Data Description

The dataset contains categorical, numeric, date, and outcome variables. This makes it suitable for CS1 because it supports exploratory data analysis, visualisation, hypothesis testing, correlation analysis, and regression modelling.

The main outcome variable is inspected, which shows whether a real estate inquiry progressed into a property inspection opportunity. The main predictor variables include lead source, property type, location, budget, response time, and follow-up activity.

Code
variable_table <- data.frame(
  Variable = c(
    "lead_id",
    "inquiry_date",
    "lead_source",
    "property_type",
    "location",
    "budget_naira",
    "response_time_hours",
    "follow_ups",
    "inspected",
    "inspected_binary",
    "log_budget"
  ),
  Description = c(
    "Anonymous identifier for each lead",
    "Date the inquiry was received",
    "Channel where the inquiry came from",
    "Property category requested by the lead",
    "Property location associated with the inquiry",
    "Estimated property value or client budget in Naira",
    "Time taken to respond to the inquiry",
    "Number of follow-up interactions after first contact",
    "Whether the lead progressed to inspection",
    "Numeric version of inspection outcome, where Yes = 1 and No = 0",
    "Log-transformed budget used to reduce skewness"
  ),
  Type = c(
    "Categorical",
    "Date",
    "Categorical",
    "Categorical",
    "Categorical",
    "Numeric",
    "Numeric",
    "Numeric",
    "Categorical outcome",
    "Numeric outcome",
    "Numeric"
  )
)

knitr::kable(variable_table)
| Variable | Description | Type |
|:---|:---|:---|
| lead_id | Anonymous identifier for each lead | Categorical |
| inquiry_date | Date the inquiry was received | Date |
| lead_source | Channel where the inquiry came from | Categorical |
| property_type | Property category requested by the lead | Categorical |
| location | Property location associated with the inquiry | Categorical |
| budget_naira | Estimated property value or client budget in Naira | Numeric |
| response_time_hours | Time taken to respond to the inquiry | Numeric |
| follow_ups | Number of follow-up interactions after first contact | Numeric |
| inspected | Whether the lead progressed to inspection | Categorical outcome |
| inspected_binary | Numeric version of inspection outcome, where Yes = 1 and No = 0 | Numeric outcome |
| log_budget | Log-transformed budget used to reduce skewness | Numeric |
Code
requirement_check <- data.frame(
  Requirement = c(
    "Minimum observations",
    "Minimum variables",
    "At least 3 numeric variables",
    "At least 2 categorical variables",
    "At least 1 date variable",
    "At least 1 outcome variable"
  ),
  Dataset_Status = c(
    paste(nrow(df), "observations"),
    paste(ncol(df), "variables after derived variables"),
    "budget_naira, response_time_hours, follow_ups, inspected_binary, log_budget",
    "lead_source, property_type, location, inspected",
    "inquiry_date",
    "inspected"
  ),
  Meets_Requirement = c("Yes", "Yes", "Yes", "Yes", "Yes", "Yes")
)

knitr::kable(requirement_check)
| Requirement | Dataset_Status | Meets_Requirement |
|:---|:---|:---|
| Minimum observations | 100 observations | Yes |
| Minimum variables | 11 variables after derived variables | Yes |
| At least 3 numeric variables | budget_naira, response_time_hours, follow_ups, inspected_binary, log_budget | Yes |
| At least 2 categorical variables | lead_source, property_type, location, inspected | Yes |
| At least 1 date variable | inquiry_date | Yes |
| At least 1 outcome variable | inspected | Yes |
Code
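# Print Naira amounts in full with thousands separators instead of scientific notation
fmt_naira <- function(x) {
  format(round(x), big.mark = ",", scientific = FALSE, trim = TRUE)
}

numeric_summary <- data.frame(
  Metric = c(
    "Minimum budget (Naira)",
    "Median budget (Naira)",
    "Mean budget (Naira)",
    "Maximum budget (Naira)",
    "Average response time (hours)",
    "Average follow-ups",
    "Inspection rate"
  ),
  Value = c(
    fmt_naira(min(df$budget_naira, na.rm = TRUE)),
    fmt_naira(median(df$budget_naira, na.rm = TRUE)),
    fmt_naira(mean(df$budget_naira, na.rm = TRUE)),
    fmt_naira(max(df$budget_naira, na.rm = TRUE)),
    round(mean(df$response_time_hours, na.rm = TRUE), 2),
    round(mean(df$follow_ups, na.rm = TRUE), 2),
    round(mean(df$inspected_binary, na.rm = TRUE), 2)
  )
)

knitr::kable(numeric_summary)
| Metric | Value |
|:---|:---|
| Minimum budget (Naira) | 50,000,000 |
| Median budget (Naira) | 150,000,000 |
| Mean budget (Naira) | 1,606,200,000 |
| Maximum budget (Naira) | 17,500,000,000 |
| Average response time (hours) | 5.14 |
| Average follow-ups | 6.91 |
| Inspection rate | 0.51 |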
Code
lead_source_summary <- as.data.frame(table(df$lead_source))
names(lead_source_summary) <- c("Lead Source", "Number of Leads")

knitr::kable(lead_source_summary)
| Lead Source | Number of Leads |
|:---|---:|
| Agent Network | 9 |
| Instagram | 47 |
| Referral | 19 |
| Website | 3 |
| WhatsApp | 22 |
Code
property_type_summary <- as.data.frame(table(df$property_type))
names(property_type_summary) <- c("Property Type", "Number of Leads")

knitr::kable(property_type_summary)
| Property Type | Number of Leads |
|:---|---:|
| Apartment | 27 |
| Bungalow | 26 |
| Duplex | 25 |
| Land | 22 |
Code
inspection_summary <- as.data.frame(table(df$inspected))
names(inspection_summary) <- c("Inspection Outcome", "Number of Leads")

knitr::kable(inspection_summary)
| Inspection Outcome | Number of Leads |
|:---|---:|
| No | 49 |
| Yes | 51 |

Data description interpretation: The dataset meets the CS1 data requirements because it contains 100 observations, more than 6 variables, at least 3 numeric variables, at least 2 categorical variables, a date variable, and a clear outcome variable. The outcome variable, inspected, allows the analysis to focus on real estate lead conversion rather than only general inquiry activity.

5. Analysis 1: Exploratory Data Analysis

Exploratory Data Analysis was conducted to understand the structure, quality, and operational behaviour of the Lagos real estate lead dataset before applying statistical tests and regression modelling.

For JahDay Real Estate, this stage is important because lead conversion decisions should not be based only on instinct. EDA helps identify which channels generate inquiries, how budgets are distributed, whether response times vary meaningfully, and whether the dataset contains missing values, outliers, or unusual patterns that could affect interpretation.

Code
# row.names = NULL stops the variable names being repeated as row labels
missing_values <- data.frame(
  Variable = names(df),
  Missing_Count = colSums(is.na(df)),
  row.names = NULL
)

knitr::kable(missing_values)
| Variable | Missing_Count |
|:---|---:|
| lead_id | 0 |
| inquiry_date | 0 |
| lead_source | 0 |
| property_type | 0 |
| location | 0 |
| budget_naira | 0 |
| response_time_hours | 0 |
| follow_ups | 0 |
| inspected | 0 |
| inspected_binary | 0 |
| log_budget | 0 |
Code
# Print Naira amounts in full with thousands separators instead of scientific notation
fmt_naira <- function(x) {
  format(round(x), big.mark = ",", scientific = FALSE, trim = TRUE)
}

eda_summary <- data.frame(
  Metric = c(
    "Total leads",
    "Inspection rate",
    "Average budget (Naira)",
    "Median budget (Naira)",
    "Average response time (hours)",
    "Median response time (hours)",
    "Average follow-ups",
    "Median follow-ups"
  ),
  Value = c(
    nrow(df),
    round(mean(df$inspected_binary, na.rm = TRUE), 2),
    fmt_naira(mean(df$budget_naira, na.rm = TRUE)),
    fmt_naira(median(df$budget_naira, na.rm = TRUE)),
    round(mean(df$response_time_hours, na.rm = TRUE), 2),
    median(df$response_time_hours, na.rm = TRUE),
    round(mean(df$follow_ups, na.rm = TRUE), 2),
    median(df$follow_ups, na.rm = TRUE)
  )
)

knitr::kable(eda_summary)
| Metric | Value |
|:---|:---|
| Total leads | 100 |
| Inspection rate | 0.51 |
| Average budget (Naira) | 1,606,200,000 |
| Median budget (Naira) | 150,000,000 |
| Average response time (hours) | 5.14 |
| Median response time (hours) | 4 |
| Average follow-ups | 6.91 |
| Median follow-ups | 6 |
Code
budget_outliers <- boxplot.stats(df$budget_naira)$out
response_time_outliers <- boxplot.stats(df$response_time_hours)$out
follow_up_outliers <- boxplot.stats(df$follow_ups)$out

outlier_summary <- data.frame(
  Variable = c("budget_naira", "response_time_hours", "follow_ups"),
  Number_of_Outliers = c(
    length(budget_outliers),
    length(response_time_outliers),
    length(follow_up_outliers)
  ),
  Minimum_Outlier = c(
    ifelse(length(budget_outliers) > 0, min(budget_outliers), NA),
    ifelse(length(response_time_outliers) > 0, min(response_time_outliers), NA),
    ifelse(length(follow_up_outliers) > 0, min(follow_up_outliers), NA)
  ),
  Maximum_Outlier = c(
    ifelse(length(budget_outliers) > 0, max(budget_outliers), NA),
    ifelse(length(response_time_outliers) > 0, max(response_time_outliers), NA),
    ifelse(length(follow_up_outliers) > 0, max(follow_up_outliers), NA)
  )
)

knitr::kable(outlier_summary)
| Variable | Number_of_Outliers | Minimum_Outlier | Maximum_Outlier |
|:---|---:|---:|---:|
| budget_naira | 8 | 1.75e+10 | 1.75e+10 |
| response_time_hours | 0 | NA | NA |
| follow_ups | 0 | NA | NA |
Code
inspection_by_source <- as.data.frame.matrix(table(df$lead_source, df$inspected))

inspection_by_source$Lead_Source <- rownames(inspection_by_source)
rownames(inspection_by_source) <- NULL

knitr::kable(inspection_by_source)
| No | Yes | Lead_Source |
|---:|---:|:---|
| 6 | 3 | Agent Network |
| 28 | 19 | Instagram |
| 4 | 15 | Referral |
| 2 | 1 | Website |
| 9 | 13 | WhatsApp |
Code
inspection_by_property <- as.data.frame.matrix(table(df$property_type, df$inspected))

inspection_by_property$Property_Type <- rownames(inspection_by_property)
rownames(inspection_by_property) <- NULL

knitr::kable(inspection_by_property)
| No | Yes | Property_Type |
|---:|---:|:---|
| 12 | 15 | Apartment |
| 9 | 17 | Bungalow |
| 12 | 13 | Duplex |
| 16 | 6 | Land |

EDA interpretation: The missing-value check found no missing values in any of the 11 variables, so the dataset is complete for analysis. The key summary statistics show that the budget variable is highly right-skewed: the mean budget (about ₦1.61 billion) is more than ten times the median budget (₦150 million). This is expected in Lagos real estate because a small number of luxury inquiries can raise the average significantly. To handle this, a log-transformed budget variable was created for later visualisation and regression. A second data quality issue was the date format. The inquiry date was imported from CSV as text and converted into a proper date variable before analysis. Outlier detection flagged 8 budget values, all equal to ₦17.5 billion, and no outliers in response time or follow-up activity. These unusual values were identified and interpreted rather than ignored.
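
The skew and outlier checks described above can be illustrated on a small scale. The following is a minimal sketch using seven made-up budget values (the vector `toy_budgets` is hypothetical and is not taken from the JahDay dataset). It shows how a single luxury inquiry pulls the mean away from the median, how `boxplot.stats()` flags it under its default 1.5 × spread rule, and how the log transform narrows the gap between mean and median.

```r
# Hypothetical budgets in Naira: six ordinary inquiries and one luxury inquiry
toy_budgets <- c(5e7, 8e7, 1e8, 1.5e8, 2e8, 2.5e8, 1.75e10)

# A mean far above the median signals right skew
toy_mean <- mean(toy_budgets)
toy_median <- median(toy_budgets)

# boxplot.stats() returns values beyond 1.5 x the hinge spread in $out
toy_outliers <- boxplot.stats(toy_budgets)$out

# On the log scale the mean and median sit much closer together
toy_log <- log(toy_budgets)
log_gap <- mean(toy_log) - median(toy_log)

print(c(mean = toy_mean, median = toy_median))
print(toy_outliers)
print(round(log_gap, 3))
```

In this toy example only the luxury value is flagged, which mirrors the pattern in the report where every flagged budget sits at the top of the range.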

6. Analysis 2: Data Visualisation

Data visualisation was used to communicate the Lagos real estate lead conversion story in a clear and manager-friendly way. The visual narrative moves from lead source volume, to inspection quality, to property demand, budget profile, and operational engagement behaviour.

These charts help distinguish lead volume from lead quality. A source may generate many inquiries, but that does not automatically mean it produces inspection-ready buyers.

6.1 Lead Volume by Source

This chart shows where the 100 real estate inquiries came from. It helps identify which channels are generating the highest inquiry volume.

Code
lead_source_counts <- sort(table(df$lead_source), decreasing = TRUE)

par(mar = c(5, 12, 4, 4))

bp1 <- barplot(
  lead_source_counts,
  horiz = TRUE,
  col = "lightblue",
  border = "grey30",
  main = "Lead Volume by Source",
  xlab = "Number of Leads",
  las = 1,
  xlim = c(0, max(lead_source_counts) * 1.25)
)

text(
  x = lead_source_counts,
  y = bp1,
  labels = lead_source_counts,
  pos = 4,
  cex = 0.9
)

Chart interpretation: Instagram generated the highest number of inquiries (47), followed by WhatsApp (22) and referrals (19). Agent networks (9) and the website (3) contributed far fewer. This shows that digital visibility and direct relationship channels are important sources of buyer interest. However, inquiry volume alone does not confirm lead quality, so this chart should be read together with the inspection rate chart below.

6.2 Inspection Rate by Lead Source

This chart shows the share of leads from each source that progressed to inspection. It is useful because lead quality is not the same as lead volume.

Code
inspection_rate_source <- aggregate(
  inspected_binary ~ lead_source,
  data = df,
  mean
)

inspection_rate_source <- inspection_rate_source[
  order(inspection_rate_source$inspected_binary, decreasing = TRUE),
]

par(mar = c(5, 12, 4, 4))

bp2 <- barplot(
  inspection_rate_source$inspected_binary,
  names.arg = inspection_rate_source$lead_source,
  horiz = TRUE,
  col = "lightgreen",
  border = "grey30",
  main = "Inspection Rate by Lead Source",
  xlab = "Inspection Rate",
  xlim = c(0, 1.15),
  las = 1
)

text(
  x = inspection_rate_source$inspected_binary,
  y = bp2,
  labels = paste0(round(inspection_rate_source$inspected_binary * 100, 0), "%"),
  pos = 4,
  cex = 0.9
)

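Chart interpretation: This chart shifts the focus from quantity to quality. Referrals had the highest inspection rate (15 of 19 leads, 79%), followed by WhatsApp (13 of 22, 59%). Instagram, the largest channel by volume, converted 19 of 47 leads (40%), while Agent Network (3 of 9) and Website (1 of 3) each converted 33%, although both rest on very small samples. A source with fewer total inquiries can therefore still produce stronger inspection movement. This helps JahDay Real Estate avoid overvaluing high-volume channels and instead identify channels that are more likely to produce serious buyers.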
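
The rate calculation behind this chart can be checked by hand. The sketch below rebuilds the inspection rates from the counts in the inspection-by-source cross-tabulation shown in Section 5; the data frame `source_counts` is an illustrative name introduced here, not an object from the original analysis.

```r
# Counts copied from the inspection-by-source cross-tabulation in this report
source_counts <- data.frame(
  lead_source = c("Agent Network", "Instagram", "Referral", "Website", "WhatsApp"),
  no = c(6, 28, 4, 2, 9),
  yes = c(3, 19, 15, 1, 13)
)

# Inspection rate = inspected leads / all leads from that source
source_counts$total <- source_counts$no + source_counts$yes
source_counts$rate_pct <- round(100 * source_counts$yes / source_counts$total)

# Rank by rate, keeping the sample size visible next to each percentage
source_counts <- source_counts[order(-source_counts$rate_pct), ]
print(source_counts[, c("lead_source", "total", "rate_pct")], row.names = FALSE)
```

Keeping `total` next to `rate_pct` is a deliberate choice: a percentage based on three website leads should not be read with the same confidence as one based on 47 Instagram leads.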

6.3 Lead Volume by Property Type

This chart shows the distribution of inquiries across apartments, bungalows, duplexes, and land. It helps identify which property categories attract the most interest.

Code
property_type_counts <- sort(table(df$property_type), decreasing = TRUE)

par(mar = c(5, 10, 4, 4))

bp3 <- barplot(
  property_type_counts,
  horiz = TRUE,
  col = "lightgrey",
  border = "grey30",
  main = "Lead Volume by Property Type",
  xlab = "Number of Leads",
  las = 1,
  xlim = c(0, max(property_type_counts) * 1.25)
)

text(
  x = property_type_counts,
  y = bp3,
  labels = property_type_counts,
  pos = 4,
  cex = 0.9
)

Chart interpretation: Property interest is fairly spread across the four categories, with apartments and bungalows slightly ahead. This suggests that the lead pool is not concentrated in only one property type, which makes the dataset more useful for comparing buyer behaviour across different real estate categories.

6.4 Lead Volume by Budget Band

This chart groups buyer budgets into practical business bands. This is easier to interpret than a raw budget histogram because Lagos real estate values are highly skewed by a small number of luxury inquiries.

Code
df$budget_band <- cut(
  df$budget_naira,
  breaks = c(-Inf, 100000000, 250000000, 500000000, 1000000000, Inf),
  labels = c(
    "Up to 100M",
    "101M to 250M",
    "251M to 500M",
    "501M to 1B",
    "Above 1B"
  )
)

budget_band_counts <- sort(table(df$budget_band), decreasing = TRUE)

par(mar = c(5, 12, 4, 5))

bp4 <- barplot(
  budget_band_counts,
  horiz = TRUE,
  col = "orange",
  border = "grey30",
  main = "Lead Volume by Budget Band",
  xlab = "Number of Leads",
  las = 1,
  xlim = c(0, max(budget_band_counts) * 1.3)
)

text(
  x = budget_band_counts,
  y = bp4,
  labels = budget_band_counts,
  pos = 4,
  cex = 0.9
)

mtext(
  "Budget bands are in Naira",
  side = 1,
  line = 4,
  cex = 0.8
)

Chart interpretation: Most leads fall within the lower and mid-range budget bands, while fewer leads sit in the highest luxury budget category. This is important because the average budget can be misleading when a few very high-value leads are present. Grouping budgets into bands gives a more practical view of buyer affordability and helps with lead qualification.
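
One detail of the banding step is worth illustrating: `cut()` uses right-closed intervals by default, so a budget exactly equal to a break falls into the lower band. The following sketch applies the same breaks and labels to a few hypothetical budgets (the values in `toy_budgets` are made up to sit on or just past a boundary).

```r
# Same breaks and labels as the budget_band step in this report
band_breaks <- c(-Inf, 100000000, 250000000, 500000000, 1000000000, Inf)
band_labels <- c("Up to 100M", "101M to 250M", "251M to 500M", "501M to 1B", "Above 1B")

# Hypothetical budgets chosen to sit on or just past a boundary
toy_budgets <- c(100000000, 100500000, 250000000, 1000000000, 17500000000)

# cut() defaults to right = TRUE, so each band includes its upper break
toy_bands <- cut(toy_budgets, breaks = band_breaks, labels = band_labels)

print(data.frame(
  budget = format(toy_budgets, big.mark = ",", scientific = FALSE),
  band = toy_bands
))
```

Under these breaks a budget of exactly ₦1 billion is counted in "501M to 1B", and a budget of ₦100.5 million is counted in "101M to 250M" even though the label starts at 101M.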

6.5 Response Time by Inspection Outcome

This chart compares response time for leads that inspected versus leads that did not inspect. It helps assess whether faster response appears linked to inspection progression.

Code
par(mar = c(5, 5, 4, 2))

boxplot(
  response_time_hours ~ inspected,
  data = df,
  col = c("lightcoral", "lightblue"),
  main = "Response Time by Inspection Outcome",
  xlab = "Inspection Outcome",
  ylab = "Response Time in Hours"
)

Chart interpretation: This chart shows whether inspected leads were generally responded to faster than non-inspected leads. The correlation matrix in Section 8 reports r = -0.189 between response time and inspection outcome, so inspected leads tended to receive somewhat faster responses, but the relationship is weak. Speed may support conversion, but it is not enough on its own to explain inspection movement.
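
A boxplot comparison is easier to report when the group medians are stated as numbers. The sketch below shows one way to compute them with `aggregate()`. The data frame `toy_leads` is hypothetical and only mimics the column names used in this report; the same call could be pointed at `df`.

```r
# Hypothetical lead records; the values are illustrative only
toy_leads <- data.frame(
  inspected = c("Yes", "Yes", "Yes", "Yes", "No", "No", "No", "No"),
  response_time_hours = c(1, 2, 3, 6, 2, 5, 7, 10),
  follow_ups = c(3, 4, 6, 7, 5, 8, 9, 10)
)

# Median of each numeric variable within each inspection group
group_medians <- aggregate(
  cbind(response_time_hours, follow_ups) ~ inspected,
  data = toy_leads,
  FUN = median
)

print(group_medians)
```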

6.6 Follow-Up Activity by Inspection Outcome

This chart compares the number of follow-ups for inspected and non-inspected leads. It helps evaluate whether sustained engagement is associated with inspection movement.

Code
par(mar = c(5, 5, 4, 2))

boxplot(
  follow_ups ~ inspected,
  data = df,
  col = c("lightcoral", "lightblue"),
  main = "Follow-Up Activity by Inspection Outcome",
  xlab = "Inspection Outcome",
  ylab = "Number of Follow-Ups"
)

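Chart interpretation: This chart shows whether inspected leads received more follow-up activity than non-inspected leads. The correlation matrix in Section 8 reports r = -0.225 between follow-up count and inspection outcome, which means inspected leads tended to receive fewer follow-ups, not more. This does not show that follow-up discourages inspection; one possible reading is that leads who have not yet inspected simply accumulate more follow-up attempts. In Lagos real estate, follow-up can still be especially important because buyers often need additional information about location, title, documentation, pricing, and inspection logistics before taking action.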

6.7 Response Time and Follow-Up Activity

This scatterplot shows the relationship between response time and follow-up activity, with inspection outcome shown by colour. It helps show whether leads that inspect have a distinct response or follow-up pattern.

Code
par(mar = c(5, 5, 4, 2))

plot(
  df$response_time_hours,
  df$follow_ups,
  pch = 19,
  col = ifelse(df$inspected == "Yes", "darkgreen", "darkred"),
  main = "Response Time and Follow-Up Activity",
  xlab = "Response Time in Hours",
  ylab = "Number of Follow-Ups"
)

legend(
  "topright",
  legend = c("Inspected = Yes", "Inspected = No"),
  col = c("darkgreen", "darkred"),
  pch = 19,
  bty = "n"
)

Chart interpretation: The scatterplot shows that response time and follow-up activity do not form a perfectly simple pattern. Some leads require many follow-ups even when response time is fast, while others do not progress despite engagement. This supports the broader finding that inspection conversion is influenced by multiple factors, not one operational variable alone.

Overall visualisation interpretation: The visualisations show that lead volume, lead quality, budget profile, and engagement behaviour should be evaluated together. Instagram may produce the highest inquiry volume, but inspection rate and budget band distribution provide deeper insight into lead seriousness. The response time and follow-up charts show that operational discipline matters, but conversion is not explained by speed alone. Together, the charts support a more structured approach to client prioritisation and inspection conversion.

7. Analysis 3: Hypothesis Testing

Hypothesis testing was used to evaluate whether observed differences in the data are statistically meaningful or likely due to random variation. Two hypotheses were tested. The first examines whether inspected and non-inspected leads differ by budget profile. The second examines whether inspection outcome is associated with lead source.

Hypothesis 1: Budget Difference by Inspection Outcome

H0: The average log-transformed budget is the same for inspected and non-inspected leads.

H1: The average log-transformed budget is different for inspected and non-inspected leads.

A Welch two-sample t-test was used because the test compares the mean of a numeric variable across two independent groups. Log-transformed budget was used instead of raw budget because the original budget values were highly skewed by a small number of luxury property inquiries.
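
For reference, the quantities used in this test can be written out. This is a standard statement of the Welch statistic and of the pooled-standard-deviation form of Cohen's d that the effect-size calculation in this section uses; the subscripts Y and N (inspected and non-inspected leads) are notation introduced here for illustration.

```latex
\[
t = \frac{\bar{x}_{Y} - \bar{x}_{N}}
         {\sqrt{\dfrac{s_{Y}^{2}}{n_{Y}} + \dfrac{s_{N}^{2}}{n_{N}}}},
\qquad
\nu \approx
\frac{\left(\dfrac{s_{Y}^{2}}{n_{Y}} + \dfrac{s_{N}^{2}}{n_{N}}\right)^{2}}
     {\dfrac{\left(s_{Y}^{2}/n_{Y}\right)^{2}}{n_{Y}-1}
      + \dfrac{\left(s_{N}^{2}/n_{N}\right)^{2}}{n_{N}-1}}
\]

\[
s_{p} = \sqrt{\frac{(n_{Y}-1)\,s_{Y}^{2} + (n_{N}-1)\,s_{N}^{2}}{n_{Y}+n_{N}-2}},
\qquad
d = \frac{\bar{x}_{Y} - \bar{x}_{N}}{s_{p}}
\]
```

Here the means and variances are those of the log-transformed budget in each group, and the approximate degrees of freedom come from the Welch-Satterthwaite formula. Because d is computed as inspected minus non-inspected, a negative d means the inspected group has the lower mean log budget. Note that the pooled standard deviation treats the two group variances as similar, whereas the Welch test itself does not require that.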

Code
yes_budget <- df$log_budget[df$inspected == "Yes"]
no_budget <- df$log_budget[df$inspected == "No"]

budget_t_test <- t.test(yes_budget, no_budget)

pooled_sd <- sqrt(
  ((length(yes_budget) - 1) * var(yes_budget, na.rm = TRUE) +
     (length(no_budget) - 1) * var(no_budget, na.rm = TRUE)) /
    (length(yes_budget) + length(no_budget) - 2)
)

cohens_d <- (mean(yes_budget, na.rm = TRUE) - mean(no_budget, na.rm = TRUE)) / pooled_sd

hypothesis_1_results <- data.frame(
  Measure = c(
    "Test used",
    "Mean log-transformed budget for inspected leads",
    "Mean log-transformed budget for non-inspected leads",
    "p-value",
    "Effect size: Cohen's d",
    "Decision at 5 percent significance level"
  ),
  Result = c(
    "Welch two-sample t-test",
    round(mean(yes_budget, na.rm = TRUE), 3),
    round(mean(no_budget, na.rm = TRUE), 3),
    round(budget_t_test$p.value, 4),
    round(cohens_d, 3),
    ifelse(budget_t_test$p.value < 0.05, "Reject H0", "Fail to reject H0")
  )
)

knitr::kable(hypothesis_1_results)
| Measure | Result |
|:---|:---|
| Test used | Welch two-sample t-test |
| Mean log-transformed budget for inspected leads | 19 |
| Mean log-transformed budget for non-inspected leads | 19.572 |
| p-value | 0.0559 |
| Effect size: Cohen’s d | -0.393 |
| Decision at 5 percent significance level | Fail to reject H0 |
Code
assumption_check_h1 <- data.frame(
  Assumption = c(
    "Independent observations",
    "Numeric dependent variable",
    "Two comparison groups",
    "Equal variance requirement"
  ),
  Check = c(
    "Each row represents one unique lead inquiry",
    "Log-transformed budget is numeric",
    "Inspection outcome has two groups: Yes and No",
    "Welch t-test was used because it does not require equal variances"
  )
)

knitr::kable(assumption_check_h1)
| Assumption | Check |
|:---|:---|
| Independent observations | Each row represents one unique lead inquiry |
| Numeric dependent variable | Log-transformed budget is numeric |
| Two comparison groups | Inspection outcome has two groups: Yes and No |
| Equal variance requirement | Welch t-test was used because it does not require equal variances |

Hypothesis 1 interpretation: The Welch two-sample t-test produced a p-value of 0.0559, slightly above the 0.05 significance level. This means there is not enough statistical evidence to conclude that inspected and non-inspected leads have different average log-transformed budgets. Therefore, I fail to reject the null hypothesis. However, the result is close to the threshold, and the Cohen’s d value of -0.393 indicates a small to moderate difference between the two groups. The negative sign shows the direction: inspected leads had the lower mean log budget (19.000 against 19.572). From a business perspective, budget profile may still matter, but it should not be used alone to predict inspection conversion. JahDay Real Estate should evaluate budget together with lead source, response behaviour, and follow-up activity.
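
Means on the log scale are hard to read in business terms, so it can help to convert them back to Naira. The sketch below takes the two mean log budgets from the Hypothesis 1 results table and back-transforms them with `exp()`, assuming the natural logarithm used when `log_budget` was created. The result is a geometric mean, not the ordinary average, and the variable names are illustrative.

```r
# Mean log budgets as reported in the Hypothesis 1 results table
mean_log_yes <- 19.000
mean_log_no <- 19.572

# exp() of a mean of logs is the geometric mean on the original Naira scale
geo_mean_yes <- exp(mean_log_yes)
geo_mean_no <- exp(mean_log_no)

# A difference in log means corresponds to a ratio of geometric means
geo_ratio <- exp(mean_log_no - mean_log_yes)

# Geometric means in millions of Naira, then the ratio
print(round(c(inspected = geo_mean_yes, not_inspected = geo_mean_no) / 1e6))
print(round(geo_ratio, 2))
```

On these rounded inputs the typical (geometric mean) budget is roughly ₦178 million for inspected leads and roughly ₦316 million for non-inspected leads, a ratio of about 1.77. The t-test above indicates that this gap is not statistically significant at the 5 percent level.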

Hypothesis 2: Lead Source and Inspection Outcome

H0: Lead source and inspection outcome are independent.

H1: Lead source and inspection outcome are associated.

A chi-square test was used because both lead source and inspection outcome are categorical variables.

Code
lead_source_table <- table(df$lead_source, df$inspected)

chi_test <- chisq.test(lead_source_table, simulate.p.value = TRUE, B = 10000)

cramers_v <- sqrt(
  as.numeric(chi_test$statistic) /
    (sum(lead_source_table) * (min(dim(lead_source_table)) - 1))
)

hypothesis_2_results <- data.frame(
  Measure = c(
    "Test used",
    "p-value",
    "Effect size: Cramer's V",
    "Decision at 5 percent significance level"
  ),
  Result = c(
    "Chi-square test with simulated p-value",
    round(chi_test$p.value, 4),
    round(cramers_v, 3),
    ifelse(chi_test$p.value < 0.05, "Reject H0", "Fail to reject H0")
  )
)

knitr::kable(as.data.frame.matrix(lead_source_table))
| | No | Yes |
|:---|---:|---:|
| Agent Network | 6 | 3 |
| Instagram | 28 | 19 |
| Referral | 4 | 15 |
| Website | 2 | 1 |
| WhatsApp | 9 | 13 |
Code
knitr::kable(hypothesis_2_results)
| Measure | Result |
|:---|:---|
| Test used | Chi-square test with simulated p-value |
| p-value | 0.0343 |
| Effect size: Cramer’s V | 0.318 |
| Decision at 5 percent significance level | Reject H0 |
Code
assumption_check_h2 <- data.frame(
  Assumption = c(
    "Categorical variables",
    "Independent observations",
    "Expected cell count concern"
  ),
  Check = c(
    "Lead source and inspection outcome are both categorical",
    "Each row represents one unique lead inquiry",
    "Simulated p-value was used because some lead source categories have small counts"
  )
)

knitr::kable(assumption_check_h2)
| Assumption | Check |
|:---|:---|
| Categorical variables | Lead source and inspection outcome are both categorical |
| Independent observations | Each row represents one unique lead inquiry |
| Expected cell count concern | Simulated p-value was used because some lead source categories have small counts |

Hypothesis 2 interpretation: The chi-square test evaluates whether inspection outcome differs by lead source. The simulated p-value of 0.0343 is below 0.05, so I reject the null hypothesis: lead source and inspection outcome are statistically associated. Cramer’s V measures the strength of that association, and the value of 0.318 indicates a moderate relationship. Because the p-value is simulated, it can vary slightly between runs. In business terms, some inquiry channels are more likely to produce inspection-ready leads than others, with referrals and WhatsApp converting at higher rates than Instagram. For JahDay Real Estate, this is important because time and follow-up effort should not be allocated only based on lead volume. Channels that produce stronger inspection movement should receive greater attention in lead prioritisation and client engagement planning.
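
The reasoning behind the simulated p-value and the effect size can be traced from the cross-tabulation alone. The sketch below rebuilds the table from the counts reported above, computes the expected counts under independence, and recomputes the chi-square statistic and Cramer's V by hand. The object names are illustrative, and the calculation assumes the counts shown in this report.

```r
# Counts copied from the lead source by inspection table (columns: No, Yes)
lead_tab <- matrix(
  c(6, 28, 4, 2, 9,
    3, 19, 15, 1, 13),
  ncol = 2,
  dimnames = list(
    c("Agent Network", "Instagram", "Referral", "Website", "WhatsApp"),
    c("No", "Yes")
  )
)

n_leads <- sum(lead_tab)

# Expected count in each cell = row total x column total / grand total
expected <- outer(rowSums(lead_tab), colSums(lead_tab)) / n_leads

# Cells with an expected count below 5 weaken the chi-square approximation
small_cells <- sum(expected < 5)

# Pearson chi-square statistic and Cramer's V
chi_sq <- sum((lead_tab - expected)^2 / expected)
cramers_v <- sqrt(chi_sq / (n_leads * (min(dim(lead_tab)) - 1)))

print(round(expected, 2))
print(small_cells)
print(round(c(chi_sq = chi_sq, cramers_v = cramers_v), 3))
```

Four of the ten cells (both Agent Network cells and both Website cells) have expected counts below 5, which is the concern the simulated p-value addresses. The hand calculation also reproduces the reported Cramer's V of 0.318.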

Overall hypothesis testing interpretation: The hypothesis tests show that budget profile alone does not explain inspection conversion (p = 0.0559), while lead source is significantly associated with inspection outcome (p = 0.0343, Cramer’s V = 0.318). This supports the broader conclusion that conversion is influenced by a combination of buyer profile, acquisition channel, and operational follow-up rather than one single variable.

8. Analysis 4: Correlation Analysis

Correlation analysis was used to examine the strength and direction of relationships between numeric variables in the dataset. This helps identify whether budget, response time, follow-up activity, and inspection outcome move together in ways that may be useful for real estate lead prioritisation.

Correlation does not prove causation, but it helps identify operational relationships that deserve further investigation.

Code
numeric_data <- data.frame(
  budget_naira = df$budget_naira,
  response_time_hours = df$response_time_hours,
  follow_ups = df$follow_ups,
  inspected_binary = df$inspected_binary,
  log_budget = df$log_budget
)

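# Pearson correlation on complete rows; for the 0/1 inspected_binary column
# this is equivalent to a point-biserial correlation
cor_matrix <- cor(numeric_data, use = "complete.obs", method = "pearson")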

knitr::kable(round(cor_matrix, 3))
| | budget_naira | response_time_hours | follow_ups | inspected_binary | log_budget |
|:---|---:|---:|---:|---:|---:|
| budget_naira | 1.000 | -0.099 | 0.246 | -0.230 | 0.883 |
| response_time_hours | -0.099 | 1.000 | -0.075 | -0.189 | -0.098 |
| follow_ups | 0.246 | -0.075 | 1.000 | -0.225 | 0.156 |
| inspected_binary | -0.230 | -0.189 | -0.225 | 1.000 | -0.195 |
| log_budget | 0.883 | -0.098 | 0.156 | -0.195 | 1.000 |
Code
heatmap(
  cor_matrix,
  Rowv = NA,
  Colv = NA,
  scale = "none",
  main = "Correlation Heatmap of Numeric Variables"
)

Code
cor_values <- as.data.frame(as.table(cor_matrix))
names(cor_values) <- c("Variable_1", "Variable_2", "Correlation")

cor_values <- cor_values[cor_values$Variable_1 != cor_values$Variable_2, ]

cor_values$Pair <- apply(
  cor_values[, c("Variable_1", "Variable_2")],
  1,
  function(x) paste(sort(x), collapse = " and ")
)

cor_values_unique <- cor_values[!duplicated(cor_values$Pair), ]

cor_values_unique$Absolute_Correlation <- abs(cor_values_unique$Correlation)

strongest_correlations <- cor_values_unique[
  order(-cor_values_unique$Absolute_Correlation),
]

knitr::kable(head(strongest_correlations, 3))
| | Variable_1 | Variable_2 | Correlation | Pair | Absolute_Correlation |
|:---|:---|:---|---:|:---|---:|
| 5 | log_budget | budget_naira | 0.8831326 | budget_naira and log_budget | 0.8831326 |
| 3 | follow_ups | budget_naira | 0.2460385 | budget_naira and follow_ups | 0.2460385 |
| 4 | inspected_binary | budget_naira | -0.2295360 | budget_naira and inspected_binary | 0.2295360 |

Correlation interpretation: The correlation matrix shows how the numeric variables relate to one another. A positive correlation means that two variables tend to increase together, while a negative correlation means that one variable tends to decrease as the other increases. The only strong correlation, 0.883 between budget and log budget, is mechanical because one variable is a transformation of the other. The key managerial question is whether response time, follow-up activity, and budget profile are meaningfully related to inspection outcome. In this dataset all three relationships are weak and negative: budget (-0.230), follow-ups (-0.225), and response time (-0.189). Higher budgets and slower responses are therefore loosely associated with fewer inspections. The negative follow-up correlation should not be read as evidence that follow-ups discourage inspection; it may simply reflect that leads who have not yet inspected receive more follow-up attempts. Correlation should not be treated as proof of causation. A stronger causal test would require tracking similar leads over time and comparing outcomes based on controlled differences in response strategy or follow-up intensity.
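Whether correlations of this size are distinguishable from zero can be checked directly from the coefficient and the sample size. The sketch below is illustrative only: `cor_p_value` is a hypothetical helper, and the sample size of 100 assumes that `use = "complete.obs"` did not drop any rows. It applies the usual t statistic for a Pearson correlation, t = r * sqrt(n - 2) / sqrt(1 - r^2).

```r
# Hypothetical helper: two-sided p-value for a Pearson correlation r
# computed from n paired observations (t test with n - 2 degrees of freedom).
cor_p_value <- function(r, n) {
  t_stat <- r * sqrt(n - 2) / sqrt(1 - r^2)
  2 * pt(-abs(t_stat), df = n - 2)
}

# Correlations with inspected_binary, copied from the matrix in this section
reported_r <- c(
  budget_naira        = -0.230,
  response_time_hours = -0.189,
  follow_ups          = -0.225
)

n_leads <- 100  # assumption: every lead had complete numeric data

p_values <- cor_p_value(reported_r, n_leads)
round(p_values, 3)
```

Under that assumption, the budget and follow-up correlations fall below the 5 percent threshold while the response time correlation does not. When the raw data are available, `cor.test(df$follow_ups, df$inspected_binary)` runs the same test without the manual step.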

9. Analysis 5: Logistic Regression

Logistic regression was used because the main business outcome in this case study is binary: a lead either progressed to inspection or did not progress to inspection. The model estimates how selected lead characteristics are associated with the probability of inspection.

For JahDay Real Estate, this is useful because it supports a more structured approach to lead prioritisation. Instead of treating all inquiries equally, the model helps assess whether budget profile, response time, follow-up activity, lead source, and property type are associated with inspection conversion.

Code
logistic_model <- glm(
  inspected_binary ~ log_budget + response_time_hours + follow_ups + lead_source + property_type,
  data = df,
  family = binomial
)

summary(logistic_model)

Call:
glm(formula = inspected_binary ~ log_budget + response_time_hours + 
    follow_ups + lead_source + property_type, family = binomial, 
    data = df)

Coefficients:
                      Estimate Std. Error z value Pr(>|z|)  
(Intercept)            7.58100    4.39521   1.725   0.0846 .
log_budget            -0.38802    0.22758  -1.705   0.0882 .
response_time_hours   -0.07850    0.08823  -0.890   0.3736  
follow_ups            -0.10492    0.07710  -1.361   0.1736  
lead_sourceInstagram   0.78683    0.93380   0.843   0.3994  
lead_sourceReferral    2.52967    1.01770   2.486   0.0129 *
lead_sourceWebsite     1.38343    1.65843   0.834   0.4042  
lead_sourceWhatsApp    1.10769    0.86989   1.273   0.2029  
property_typeBungalow  0.21030    0.62058   0.339   0.7347  
property_typeDuplex    0.23756    0.70396   0.337   0.7358  
property_typeLand     -0.73168    0.81930  -0.893   0.3718  
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 138.59  on 99  degrees of freedom
Residual deviance: 113.97  on 89  degrees of freedom
AIC: 135.97

Number of Fisher Scoring iterations: 4
Code
model_coefficients <- as.data.frame(summary(logistic_model)$coefficients)

model_coefficients$Term <- rownames(model_coefficients)
rownames(model_coefficients) <- NULL

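# Exponentiate the log-odds estimates to obtain odds ratios
model_coefficients$Odds_Ratio <- exp(model_coefficients$Estimate)
# Wald 95% interval: estimate +/- 1.96 standard errors on the log-odds scale, then exponentiated
model_coefficients$Lower_95_CI <- exp(model_coefficients$Estimate - 1.96 * model_coefficients$`Std. Error`)
model_coefficients$Upper_95_CI <- exp(model_coefficients$Estimate + 1.96 * model_coefficients$`Std. Error`)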

model_coefficients <- model_coefficients[, c(
  "Term",
  "Estimate",
  "Std. Error",
  "z value",
  "Pr(>|z|)",
  "Odds_Ratio",
  "Lower_95_CI",
  "Upper_95_CI"
)]

knitr::kable(model_coefficients)
Term Estimate Std. Error z value Pr(>|z|) Odds_Ratio Lower_95_CI Upper_95_CI
(Intercept) 7.5809985 4.3952088 1.7248324 0.0845577 1960.5857245 0.3557202 1.080595e+07
log_budget -0.3880243 0.2275817 -1.7049889 0.0881965 0.6783958 0.4342719 1.059753e+00
response_time_hours -0.0784991 0.0882256 -0.8897533 0.3735984 0.9245029 0.7776947 1.099025e+00
follow_ups -0.1049216 0.0771022 -1.3608113 0.1735733 0.9003951 0.7741095 1.047283e+00
lead_sourceInstagram 0.7868346 0.9337989 0.8426168 0.3994428 2.1964328 0.3522510 1.369568e+01
lead_sourceReferral 2.5296685 1.0177012 2.4856691 0.0129308 12.5493447 1.7074040 9.223713e+01
lead_sourceWebsite 1.3834267 1.6584273 0.8341799 0.4041796 3.9885457 0.1545727 1.029192e+02
lead_sourceWhatsApp 1.1076936 0.8698896 1.2733727 0.2028858 3.0273681 0.5503010 1.665445e+01
property_typeBungalow 0.2103005 0.6205842 0.3388750 0.7347039 1.2340488 0.3656625 4.164706e+00
property_typeDuplex 0.2375575 0.7039583 0.3374596 0.7357704 1.2681480 0.3191159 5.039545e+00
property_typeLand -0.7316781 0.8192953 -0.8930579 0.3718262 0.4811010 0.0965691 2.396815e+00
Code
significant_terms <- model_coefficients[model_coefficients$`Pr(>|z|)` < 0.05, ]

if (nrow(significant_terms) == 0) {
  significant_terms <- data.frame(
    Note = "No predictor was statistically significant at the 5 percent level in this model."
  )
}

knitr::kable(significant_terms)
| | Term | Estimate | Std. Error | z value | Pr(>\|z\|) | Odds_Ratio | Lower_95_CI | Upper_95_CI |
|:---|:---|---:|---:|---:|---:|---:|---:|---:|
| 6 | lead_sourceReferral | 2.529669 | 1.017701 | 2.485669 | 0.0129308 | 12.54934 | 1.707404 | 92.23713 |
Code
model_fit <- data.frame(
  Metric = c(
    "Null deviance",
    "Residual deviance",
    "AIC"
  ),
  Value = c(
    logistic_model$null.deviance,
    logistic_model$deviance,
    logistic_model$aic
  )
)

knitr::kable(model_fit)
Metric Value
Null deviance 138.5894
Residual deviance 113.9696
AIC 135.9696
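The deviance figures above are enough to judge whether the predictors improve on an intercept-only model. The following sketch is an illustration rather than part of the original analysis: it runs a likelihood-ratio test on the drop in deviance and computes McFadden's pseudo R-squared, using the values printed in the model summary. The pseudo R-squared shortcut assumes ungrouped 0/1 outcomes, where deviance equals minus twice the log-likelihood.

```r
# Values copied from the model summary above
null_deviance     <- 138.5894
residual_deviance <- 113.9696
null_df           <- 99
residual_df       <- 89

# Likelihood-ratio test of the fitted model against the intercept-only model
lr_statistic <- null_deviance - residual_deviance
lr_df        <- null_df - residual_df
lr_p_value   <- pchisq(lr_statistic, df = lr_df, lower.tail = FALSE)

# McFadden's pseudo R-squared for a binary-outcome model
mcfadden_r2 <- 1 - residual_deviance / null_deviance

round(c(statistic = lr_statistic, df = lr_df, p_value = lr_p_value, mcfadden_r2 = mcfadden_r2), 4)
```

The deviance drops by about 24.6 on 10 degrees of freedom, which corresponds to a p-value below 0.01, so the predictors jointly add information even though most individual terms are not significant. With the fitted object available, `anova(logistic_model, test = "Chisq")` gives a term-by-term version of the same comparison.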
Code
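# In-sample fitted probabilities (the model is evaluated on the data used to fit it)
df$predicted_probability <- predict(logistic_model, type = "response")
# Classify a lead as "Yes" when its fitted probability is at least 0.5
df$predicted_class <- ifelse(df$predicted_probability >= 0.5, "Yes", "No")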

confusion_matrix <- table(
  Actual = df$inspected,
  Predicted = df$predicted_class
)

knitr::kable(as.data.frame.matrix(confusion_matrix))
| Actual / Predicted | No | Yes |
|:---|---:|---:|
| No | 36 | 13 |
| Yes | 16 | 35 |
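The confusion matrix can be summarised with a few standard classification measures. This is a minimal sketch that re-enters the counts from the table above by hand (rows are actual outcomes, columns are predicted outcomes); the variable names are illustrative.

```r
# Counts copied from the confusion matrix above
confusion <- matrix(
  c(36, 13,
    16, 35),
  nrow = 2, byrow = TRUE,
  dimnames = list(Actual = c("No", "Yes"), Predicted = c("No", "Yes"))
)

# Share of all leads classified correctly
accuracy <- sum(diag(confusion)) / sum(confusion)
# Share of inspected leads that the model predicted as "Yes"
sensitivity <- confusion["Yes", "Yes"] / sum(confusion["Yes", ])
# Share of non-inspected leads that the model predicted as "No"
specificity <- confusion["No", "No"] / sum(confusion["No", ])

round(c(accuracy = accuracy, sensitivity = sensitivity, specificity = specificity), 3)
```

These figures are in-sample, so they are likely to be optimistic compared with performance on new leads. A useful benchmark is the 51 percent accuracy obtained by predicting "Yes" for every lead.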
Code
hist(
  df$predicted_probability,
  main = "Predicted Probability of Inspection",
  xlab = "Predicted Probability",
  ylab = "Number of Leads",
  breaks = 12
)

Code
plot(
  cooks.distance(logistic_model),
  type = "h",
  main = "Cook's Distance Diagnostic Plot",
  xlab = "Observation Number",
  ylab = "Cook's Distance"
)

Regression interpretation: The logistic regression model estimates the likelihood that a lead will progress to inspection based on lead characteristics. Odds ratios above 1 indicate higher inspection odds and odds ratios below 1 indicate lower inspection odds, in each case holding the other predictors constant. Only one term is statistically significant at the 5 percent level: Referral leads have an odds ratio of about 12.5 relative to the reference lead source (95% CI 1.71 to 92.24, p = 0.013). The wide interval shows that the size of this effect is estimated imprecisely, but the direction supports giving referrals greater priority in marketing and response planning. Log budget has an odds ratio of 0.68 (p = 0.088), which points to lower inspection odds at higher budgets but is not significant at the 5 percent level. Response time (odds ratio 0.92) and follow-ups (odds ratio 0.90) are not statistically significant, so the model does not by itself justify changes to follow-up intensity. At a 0.5 threshold the model classifies 71 of the 100 leads correctly, measured on the same data used to fit it. The model should not be treated as a perfect prediction system because the sample size is limited, but it provides a practical evidence base for improving lead scoring and inspection conversion.
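Every lead source odds ratio is measured against a reference category, which is the factor level that does not appear among the `lead_source` coefficients. The sketch below shows how R chooses that baseline and how it can be changed. The channel labels are taken from the coefficient table, but the observations are made up, and the report's own baseline level is not reproduced here.

```r
# Channel labels from the coefficient table; the observations are made up
lead_source <- factor(c("WhatsApp", "Referral", "Instagram", "Website", "Referral"))

levels(lead_source)      # factor levels are sorted alphabetically by default
levels(lead_source)[1]   # the first level is the baseline absorbed into the intercept

# Choose a different baseline so that each odds ratio is relative to WhatsApp
lead_source_wa <- relevel(lead_source, ref = "WhatsApp")
levels(lead_source_wa)[1]
```

Releveling changes which comparisons the coefficient table reports, not the fitted probabilities, so it is worth choosing the baseline that makes the business comparison easiest to read.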

10. Integrated Findings

The five analyses collectively show that Lagos real estate lead conversion is not driven by one isolated factor. Inspection progression is shaped by a combination of lead source quality, buyer budget profile, response behaviour, and follow-up activity. This is important because real estate inquiries can appear promising at first glance, but not every inquiry represents a buyer who is financially ready, serious, or prepared to move toward inspection.

The exploratory data analysis confirmed that the dataset was suitable for CS1 and met the required structure for analysis. It also revealed two important data quality issues. First, budget values were highly skewed because a small number of luxury property inquiries increased the average budget. This meant that the mean budget alone could give a misleading impression of the typical buyer profile. To address this, log-transformed budget was created for analysis, while budget bands were used in the visual section to make the pattern easier to interpret from a business perspective. Second, inquiry date was imported from the CSV as a text field and was converted into a proper date variable before analysis. These cleaning steps made the dataset more reliable for interpretation.

The visualisation section showed that lead volume and lead quality are not the same. Instagram generated the highest number of inquiries, which confirms its usefulness as a visibility and discovery channel. However, the inspection rate chart showed that high inquiry volume does not automatically translate into stronger inspection conversion. This distinction is important for decision making because a business can waste time chasing a large number of low-readiness leads if it focuses only on volume. The budget band chart also gave a more practical view of buyer affordability by showing how many leads fall into each budget range instead of allowing a few luxury inquiries to distort the full picture.

The hypothesis testing section added statistical discipline to the business interpretation. The first test showed that the difference in log-transformed budget between inspected and non-inspected leads was not statistically significant at the 5 percent level, although the result was close enough to remain operationally relevant. This suggests that budget profile may matter, but budget alone should not be treated as a reliable predictor of inspection conversion. The second test showed that lead source and inspection outcome were statistically associated (simulated p-value 0.0343, Cramer’s V 0.318). This finding is more actionable because it indicates that where a lead comes from is related to the likelihood of inspection. In practical terms, source quality deserves attention alongside buyer budget.

The correlation analysis reinforced the idea that no single numeric variable fully explains inspection behaviour. Budget, response time, follow-up activity, and inspection outcome should be interpreted together rather than separately. The relationships between these variables help identify patterns, but they do not prove causation. For example, follow-up activity was weakly negatively correlated with inspection (-0.225), which may reflect the fact that leads who have not yet inspected attract more follow-up attempts rather than follow-ups reducing conversion. This means the business should use correlation as a signal for further investigation, not as final proof.

The logistic regression analysis brought the variables together into one model and provided a more structured way to assess inspection probability. Because the outcome variable is binary, logistic regression was the appropriate method for this business problem. The model supports the broader finding that inspection conversion should be viewed as a probability influenced by several factors, not a simple yes-or-no judgment based on one characteristic. Only the Referral lead source was statistically significant at the 5 percent level, but the model is still useful as a decision-support tool because it shows how multiple operational and buyer characteristics can be considered together.

The single recommendation is that JahDay Real Estate should adopt a simple lead prioritisation framework. Leads should not be ranked only by who came in first, which channel produced the most inquiries, or who appears to have the highest budget. Instead, priority should be given to leads with stronger source quality, realistic budget alignment, clear inspection intent, and consistent follow-up potential. This would help focus time, inspection coordination, buyer advisory effort, and developer engagement on the leads most likely to progress into serious transaction opportunities.

Overall, the analysis supports a shift from reactive lead handling to more structured client prioritisation. The business should continue responding quickly, but speed should be supported by better qualification. A practical next step would be to create a simple scoring system that assigns weight to lead source, budget band, follow-up engagement, and inspection readiness. This would make the lead management process more evidence-based while still allowing room for professional judgment in a relationship-driven market like Lagos real estate.
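One possible starting point for such a scoring system is to reuse the fitted logistic regression coefficients as weights. The sketch below is an assumption about how that could work, not a system the business already uses. `score_lead` is a hypothetical helper, the example lead is made up, and only a subset of the coefficients from Section 9 is included, so the score applies to leads whose property type is the model's reference category.

```r
# Coefficients copied from the logistic regression output in Section 9
coefs <- c(
  intercept           =  7.5809985,
  log_budget          = -0.3880243,
  response_time_hours = -0.0784991,
  follow_ups          = -0.1049216,
  lead_sourceReferral =  2.5296685,
  lead_sourceWhatsApp =  1.1076936
)

# Hypothetical helper: estimated inspection probability for one lead.
# `source_term` names a lead source coefficient, or is NULL for the
# reference lead source.
score_lead <- function(budget_naira, response_time_hours, follow_ups, source_term = NULL) {
  linear_predictor <- coefs[["intercept"]] +
    coefs[["log_budget"]] * log(budget_naira) +
    coefs[["response_time_hours"]] * response_time_hours +
    coefs[["follow_ups"]] * follow_ups
  if (!is.null(source_term)) {
    linear_predictor <- linear_predictor + coefs[[source_term]]
  }
  plogis(linear_predictor)  # inverse logit: converts log-odds to a probability
}

# Made-up example lead: NGN 80 million budget, 2-hour response, 3 follow-ups
p_reference <- score_lead(80e6, 2, 3)
p_referral  <- score_lead(80e6, 2, 3, source_term = "lead_sourceReferral")

c(reference_source = p_reference, referral = p_referral)
```

Because most coefficients are estimated imprecisely on 100 leads, scores like these are better suited to ranking leads into broad priority bands than to precise probability statements. With the fitted object in memory, `predict(logistic_model, newdata = ..., type = "response")` produces the same kind of score without copying coefficients by hand.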

11. Limitations and Further Work

This study is limited by the sample size of 100 observations. Although this meets the assignment requirement, a larger dataset collected over a longer period would provide more stable and generalisable findings. The data reflects lead behaviour within one real estate business context and should not be interpreted as a complete representation of the entire Lagos real estate market.

The dataset also does not include some variables that may strongly influence inspection conversion. These include buyer income level, financing structure, urgency, title concerns, property documentation stage, marketing spend, developer reputation, and final sale outcome. These factors are important in Lagos real estate because buyers often require trust, legal comfort, and financial readiness before progressing from inquiry to inspection.

Another limitation is that the analysis focuses on inspection as the outcome variable, not final purchase. Inspection is an important operational milestone, but it does not always result in a completed transaction.

With more data and time, I would collect additional observations across multiple property campaigns and track each lead from first inquiry to inspection, negotiation, and final sale. Further work could also apply customer segmentation, predictive modelling, and time-based analysis to improve lead scoring, marketing allocation, and follow-up planning.

References

Adi, B. (2026). Data Analytics 1 Course Materials. Lagos Business School.

JahDay Real Estate. (2026). Anonymised Lagos real estate lead inquiry dataset, November 2025 to May 2026 [Unpublished primary dataset].

R Core Team. (2026). R: A language and environment for statistical computing. R Foundation for Statistical Computing.

Xie, Y. (2026). knitr: A general-purpose package for dynamic report generation in R.

Appendix: AI Usage Statement

ChatGPT was used as a support tool during the preparation of this report, specifically for code troubleshooting, Quarto formatting guidance, and refinement of interpretation language. The dataset was independently assembled, cleaned, and anonymised by the author from real estate lead activity connected to JahDay Real Estate. The business question, professional context, selection of analytical techniques, review of outputs, interpretation of results, and final recommendations were independently evaluated and validated by the author. AI assistance was used to support presentation, formatting, and clarity, while the underlying data ownership, analytical judgement, and business reasoning remained with the author.