---
title: "Dynamic Regression Analysis"
author: "Hasin Anwar"
date: "2024-06-09"
output: html_document
---
# **Analysis of House Prices**
## **1. Plot the Dependent Variable Against Each Independent Variable**
### **Price vs Lotsize**
```r
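# Load the packages used throughout this analysis
library(readxl)   # read_excel()
library(ggplot2)  # ggplot() and the geoms used below
library(plotly)   # ggplotly()
library(car)      # vif()
# Select the Excel file using a file dialog (works in an interactive session)
file_path <- file.choose()
# Load the data
hprice1 <- read_excel(file_path)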
# Plot price against lotsize
p1 <- ggplot(hprice1, aes(x = lotsize, y = price)) +
  geom_point(color = "blue") +
  geom_smooth(method = "lm", se = FALSE, color = "red") +
  labs(title = "Price vs Lotsize",
       x = "Lotsize",
       y = "Price") +
  theme_minimal() +
  theme(plot.title = element_text(size = 14, face = "bold"))
ggplotly(p1)
## `geom_smooth()` using formula = 'y ~ x'
# Plot price against sqrft
p2 <- ggplot(hprice1, aes(x = sqrft, y = price)) +
  geom_point(color = "green") +
  geom_smooth(method = "lm", se = FALSE, color = "purple") +
  labs(title = "Price vs Square Footage",
       x = "Square Footage",
       y = "Price") +
  theme_minimal() +
  theme(plot.title = element_text(size = 14, face = "bold"))
ggplotly(p2)
## `geom_smooth()` using formula = 'y ~ x'
# Plot price against bdrms
p3 <- ggplot(hprice1, aes(x = bdrms, y = price)) +
  geom_point(color = "orange") +
  geom_smooth(method = "lm", se = FALSE, color = "blue") +
  labs(title = "Price vs Bedrooms",
       x = "Bedrooms",
       y = "Price") +
  theme_minimal() +
  theme(plot.title = element_text(size = 14, face = "bold"))
ggplotly(p3)
## `geom_smooth()` using formula = 'y ~ x'
# Descriptive statistics for every variable in the dataset
summary(hprice1)
##      price           assess          bdrms          lotsize          sqrft
##  Min.   :111.0   Min.   :198.7   Min.   :2.000   Min.   : 1000   Min.   :1171
##  1st Qu.:230.0   1st Qu.:253.9   1st Qu.:3.000   1st Qu.: 5733   1st Qu.:1660
##  Median :265.5   Median :290.2   Median :3.000   Median : 6430   Median :1845
##  Mean   :293.5   Mean   :315.7   Mean   :3.568   Mean   : 9020   Mean   :2014
##  3rd Qu.:326.2   3rd Qu.:352.1   3rd Qu.:4.000   3rd Qu.: 8583   3rd Qu.:2227
##  Max.   :725.0   Max.   :708.6   Max.   :7.000   Max.   :92681   Max.   :3880
##     colonial
##  Min.   :0.0000
##  1st Qu.:0.0000
##  Median :1.0000
##  Mean   :0.6932
##  3rd Qu.:1.0000
##  Max.   :1.0000
# Fit a linear regression of price on lot size, square footage, and bedrooms
model <- lm(price ~ lotsize + sqrft + bdrms, data = hprice1)
summary(model)
##
## Call:
## lm(formula = price ~ lotsize + sqrft + bdrms, data = hprice1)
##
## Residuals:
##      Min       1Q   Median       3Q      Max
## -120.026  -38.530   -6.555   32.323  209.376
##
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)
## (Intercept) -2.177e+01  2.948e+01  -0.739  0.46221
## lotsize      2.068e-03  6.421e-04   3.220  0.00182 **
## sqrft        1.228e-01  1.324e-02   9.275 1.66e-14 ***
## bdrms        1.385e+01  9.010e+00   1.537  0.12795
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 59.83 on 84 degrees of freedom
## Multiple R-squared:  0.6724, Adjusted R-squared:  0.6607
## F-statistic: 57.46 on 3 and 84 DF,  p-value: < 2.2e-16
# Plot residuals against fitted values to look for non-constant variance or curvature
residuals_vs_fitted <- ggplot() +
  geom_point(aes(x = fitted(model), y = residuals(model)), color = "darkred") +
  geom_hline(yintercept = 0, linetype = "dashed", color = "red") +
  labs(title = "Residuals vs Fitted Values",
       x = "Fitted Values",
       y = "Residuals") +
  theme_minimal() +
  theme(plot.title = element_text(size = 14, face = "bold"))
ggplotly(residuals_vs_fitted)
# Variance inflation factors: values near 1 indicate little multicollinearity
vif(model)
##  lotsize    sqrft    bdrms
## 1.037211 1.418654 1.396663
# Create Q-Q plot
qqnorm_plot <- ggplot() +
  geom_qq(aes(sample = resid(model)), color = "blue") +
  geom_qq_line(aes(sample = resid(model)), color = "red") +
  labs(title = "Q-Q Plot of Residuals") +
  theme_minimal() +
  theme(plot.title = element_text(size = 14, face = "bold"))
ggplotly(qqnorm_plot)
```
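The coefficient estimates in the regression output above can be read as a prediction equation. The sketch below is an illustration, not part of the original analysis: it copies the rounded estimates and sample means from the printed output (the variable names `coefs`, `means`, `pred_at_means`, and `effect_100_sqrft` are introduced here only for the example) and evaluates the fitted equation by hand. Because a least-squares fit with an intercept passes through the sample means, the prediction at the mean predictor values should land close to the mean price.

```r
# Coefficient estimates copied from the summary(model) output in this report.
coefs <- c(intercept = -21.77, lotsize = 0.002068, sqrft = 0.1228, bdrms = 13.85)

# Sample means copied from the summary(hprice1) output in this report.
means <- c(lotsize = 9020, sqrft = 2014, bdrms = 3.568)

# Predicted price = intercept + sum of (coefficient * predictor value).
pred_at_means <- unname(coefs["intercept"] + sum(coefs[names(means)] * means))
print(round(pred_at_means, 1))  # about 293.6, close to the reported mean price of 293.5

# Holding lot size and bedrooms fixed, 100 extra square feet changes the
# predicted price by 100 * 0.1228 in the dataset's price units.
effect_100_sqrft <- unname(100 * coefs["sqrft"])
print(effect_100_sqrft)
```

With the data loaded, `predict(model, newdata = ...)` would give the same kind of prediction from the unrounded coefficients.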
This project analyzes housing prices using the `hprice1` dataset. It explores the relationships between price and three independent variables (lot size, square footage, and number of bedrooms), summarizes the data with descriptive statistics, and fits a linear regression model. It then examines the residuals, checks for multicollinearity, and assesses the normality of the residuals, using visualizations and standard diagnostics to judge how well the model's assumptions hold.

## **Key Components**
- Exploratory data analysis: plots of price against each independent variable.
- Descriptive statistics summarizing the dataset.
- Linear regression analysis to identify significant predictors of housing prices.
- Residual analysis, a multicollinearity check (variance inflation factors), and a normality check of the residuals (Q-Q plot).
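
The multicollinearity and normality checks can also be expressed numerically. The following is a minimal sketch under stated assumptions, not the analysis performed in this report: it uses R's built-in `mtcars` data as a stand-in for `hprice1` so that it runs without the Excel file, and the model `mpg ~ wt + hp + qsec` is chosen only for illustration. It computes variance inflation factors by hand from auxiliary regressions and runs a Shapiro-Wilk test on the residuals, using base R only.

```r
# Stand-in data (assumption): the built-in mtcars dataset replaces hprice1,
# so this sketch runs without the Excel file used in the report.
fit <- lm(mpg ~ wt + hp + qsec, data = mtcars)

# Manual VIF: regress each predictor on the other predictors,
# then compute VIF_j = 1 / (1 - R_j^2) from that auxiliary regression.
predictors <- c("wt", "hp", "qsec")
manual_vif <- sapply(predictors, function(p) {
  others <- setdiff(predictors, p)
  aux <- lm(reformulate(others, response = p), data = mtcars)
  1 / (1 - summary(aux)$r.squared)
})
print(manual_vif)

# Shapiro-Wilk test: the null hypothesis is that the residuals are
# normally distributed, so a small p-value is evidence against normality.
sw <- shapiro.test(residuals(fit))
print(sw)
```

For a model with only numeric predictors, the hand-computed values should agree with `car::vif()`. Applying the same steps to the housing model would mean swapping in `hprice1` and the predictors `lotsize`, `sqrft`, and `bdrms`.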