library(tidyverse)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr 1.1.4 ✔ readr 2.1.5
## ✔ forcats 1.0.0 ✔ stringr 1.5.1
## ✔ ggplot2 3.5.1 ✔ tibble 3.2.1
## ✔ lubridate 1.9.3 ✔ tidyr 1.3.1
## ✔ purrr 1.0.2
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag() masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(ggthemes)
library(ggrepel)
library(AmesHousing)
library(boot)
library(broom)
library(lindia)
# remove scientific notation
options(scipen = 6)
# default theme, unless otherwise noted
theme_set(theme_minimal())
ames <- read.csv("/Users/rupeshswarnakar/Downloads/AmesHousing.csv")
A construction company wants to identify the factors that most increase its profit from building and selling houses, considering attributes such as the number of bedrooms, location, and year built.
The analysis below supports the claim that adding bedrooms to a house can be more profitable than adding stories.
Let’s compare how the sale price of a house changes with the number of bedrooms versus the number of stories.
# Linear regression for Sale Price vs Number of Bedrooms
bedroom_lm <- lm(SalePrice ~ Bedroom.AbvGr, data = ames)
summary(bedroom_lm)
##
## Call:
## lm(formula = SalePrice ~ Bedroom.AbvGr, data = ames)
##
## Residuals:
## Min 1Q Median 3Q Max
## -156142 -52820 -19820 32180 558290
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 141152 5245 26.910 < 2e-16 ***
## Bedroom.AbvGr 13890 1765 7.869 4.98e-15 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 79070 on 2928 degrees of freedom
## Multiple R-squared: 0.02071, Adjusted R-squared: 0.02038
## F-statistic: 61.92 on 1 and 2928 DF, p-value: 4.98e-15
# Linear regression for Sale Price vs House Style (Stories)
story_lm <- lm(SalePrice ~ House.Style, data = ames)
summary(story_lm)
##
## Call:
## lm(formula = SalePrice ~ House.Style, data = ames)
##
## Residuals:
## Min 1Q Median 3Q Max
## -166990 -47520 -15700 28010 548010
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 137530 4335 31.723 < 2e-16 ***
## House.Style1.5Unf -27867 18150 -1.535 0.124800
## House.Style1Story 41170 4773 8.626 < 2e-16 ***
## House.Style2.5Fin 82470 27505 2.998 0.002737 **
## House.Style2.5Unf 39628 16270 2.436 0.014922 *
## House.Style2Story 69460 5055 13.740 < 2e-16 ***
## House.StyleSFoyer 5943 9482 0.627 0.530863
## House.StyleSLvl 27998 8056 3.475 0.000518 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 76820 on 2922 degrees of freedom
## Multiple R-squared: 0.07744, Adjusted R-squared: 0.07523
## F-statistic: 35.04 on 7 and 2922 DF, p-value: < 2.2e-16
The analysis above fits two linear regression models: one of sale price on the number of bedrooms, and one of sale price on house style (stories).
In the bedroom model, each additional bedroom is associated with an increase of $13,890 in sale price. This is statistically significant, as shown by the very small p-value of 4.98e-15.
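In the house style model, each coefficient is measured against the baseline style, 1.5Fin, which is the level missing from the coefficient table. A one-story house sells for $41,170 more than the baseline and a two-story house for $69,460 more, so the difference between a two-story and a one-story house is $69,460 - $41,170 = $28,290, or roughly $28,000. Both coefficients are individually significant (p < 2e-16), although these p-values compare each style with the baseline rather than testing the one-story to two-story difference itself.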
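Because the house style coefficients are relative to a baseline level, the gap between two non-baseline styles is not tested directly. A minimal sketch of one way to get that comparison, using `relevel()` on made-up toy data (the `toy` data frame and its prices are hypothetical, not Ames values):

```r
# Toy data only: these prices are invented for illustration.
toy <- data.frame(
  House.Style = c("1.5Fin", "1.5Fin", "1Story", "1Story", "2Story", "2Story"),
  SalePrice   = c(130000, 140000, 170000, 180000, 200000, 210000)
)

# Default fit: the first factor level ("1.5Fin") is the baseline.
default_fit <- lm(SalePrice ~ House.Style, data = toy)

# Make "1Story" the baseline, so the 2Story coefficient is the
# 1Story -> 2Story difference and gets its own standard error.
toy$House.Style <- relevel(factor(toy$House.Style), ref = "1Story")
releveled_fit <- lm(SalePrice ~ House.Style, data = toy)

coef(default_fit)
coef(releveled_fit)
```

Applied to `ames`, the same `relevel()` call followed by refitting the house style model would report the one-story to two-story gap together with its own p-value.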
However, building an additional story costs more than adding a single bedroom. If the company adds three bedrooms instead of one additional story (a living room, a bedroom, and a bathroom), the bedroom model predicts a sale price increase of 3 × $13,890 = $41,670, or approximately $42,000. This is higher than the increase associated with a second story.
Hence, from a business perspective, the construction company may be better off investing in additional bedrooms than in additional stories.
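The arithmetic behind this comparison can be checked directly. The sketch below hard-codes the coefficients printed in the two `summary()` outputs; the variable names are illustrative only.

```r
# Coefficients copied from the summary() outputs above.
bedroom_effect   <- 13890  # Bedroom.AbvGr
one_story_effect <- 41170  # House.Style1Story (relative to the baseline style)
two_story_effect <- 69460  # House.Style2Story (relative to the baseline style)

# Price difference associated with moving from one story to two stories.
story_gap <- two_story_effect - one_story_effect

# Price difference associated with three additional bedrooms.
three_bedrooms <- 3 * bedroom_effect

story_gap                    # 28290
three_bedrooms               # 41670
three_bedrooms > story_gap   # TRUE
```

In the report itself, the same values could be pulled from the fitted models with `coef(bedroom_lm)` and `coef(story_lm)` instead of being typed in by hand.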
Let’s illustrate this further with a plot comparing 1-story, 2-bedroom houses against 2-story, 2-bedroom houses (1 bedroom on each floor).
# Filter the dataset for the two conditions
# 1-story houses with 2 bedrooms above grade
group_2_bedroom <- ames |>
  filter(Bedroom.AbvGr == 2 & House.Style == "1Story") |>
  select(SalePrice)
# 2-story houses with 2 bedrooms above grade
group_2_story_2_bedroom <- ames |>
  filter(Bedroom.AbvGr == 2 & House.Style == "2Story") |>
  select(SalePrice)
# Calculate the average sale price for each group
avg_sale_price_2_bedroom <- mean(group_2_bedroom$SalePrice, na.rm = TRUE)
avg_sale_price_2_story_2_bedroom <- mean(group_2_story_2_bedroom$SalePrice, na.rm = TRUE)
# Print the results
cat("Average Sale Price for 2-bedroom houses with 1-Story: $", round(avg_sale_price_2_bedroom, 2), "\n")
## Average Sale Price for 2-bedroom houses with 1-Story: $ 172282.9
cat("Average Sale Price for 2-story, 2-bedroom houses: $", round(avg_sale_price_2_story_2_bedroom, 2), "\n")
## Average Sale Price for 2-story, 2-bedroom houses: $ 140514.5
# Create a plot to visualize the comparison
avg_prices <- tibble(
  house_type = c("1-Story, 2-bedroom", "2-Story, 2-bedroom"),
  avg_price = c(avg_sale_price_2_bedroom, avg_sale_price_2_story_2_bedroom)
)
ggplot(avg_prices, aes(x = house_type, y = avg_price)) +
  geom_col(fill = c("skyblue", "orange")) +
  labs(title = "Comparison of Sale Prices", y = "Average Sale Price", x = "House Type") +
  theme_minimal()
The plot shows that 1-story houses with 2 bedrooms have a higher average sale price ($172,283) than 2-story houses with 2 bedrooms ($140,515). This comparison is consistent with the claim that adding bedrooms to a house is more profitable than adding stories.
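A comparison of two averages is easier to judge when the group sizes are reported next to it. The sketch below shows one possible way to do that with `group_by()` and `summarise()`. It runs on a small made-up data frame (`toy`, with hypothetical prices), so its numbers say nothing about the Ames data.

```r
library(dplyr)

# Toy rows only: hypothetical prices, not taken from the Ames data.
toy <- data.frame(
  House.Style   = c("1Story", "1Story", "1Story", "2Story", "2Story"),
  Bedroom.AbvGr = c(2, 2, 3, 2, 2),
  SalePrice     = c(150000, 170000, 200000, 130000, 140000)
)

# Keep 2-bedroom houses of the two styles, then summarise each group.
group_summary <- toy |>
  filter(Bedroom.AbvGr == 2, House.Style %in% c("1Story", "2Story")) |>
  group_by(House.Style) |>
  summarise(
    n            = n(),
    mean_price   = mean(SalePrice, na.rm = TRUE),
    median_price = median(SalePrice, na.rm = TRUE)
  )

group_summary
```

Replacing `toy` with `ames` would give the number of houses behind each bar in the plot, along with the median as a check against a few very expensive sales pulling the mean upward.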