---
title: "Dynamic Regression Analysis"
author: "Hasin Anwar"
date: "2024-06-09"
output: html_document
---



# **Analysis of House Prices**

## **1. Plot the Dependent Variable Against Each Independent Variable**

### **Price vs Lotsize**

```r
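# Load the packages used throughout the analysis
library(readxl)   # read_excel()
library(ggplot2)  # ggplot() and related plotting functions
library(plotly)   # ggplotly()
library(car)      # vif()

# Select the Excel file using a file dialog
file_path <- file.choose()

# Load the data
hprice1 <- read_excel(file_path)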

# Plot price against lotsize
p1 <- ggplot(hprice1, aes(x = lotsize, y = price)) +
  geom_point(color = "blue") +
  geom_smooth(method = "lm", se = FALSE, color = "red") +
  labs(title = "Price vs Lotsize",
       x = "Lotsize",
       y = "Price") +
  theme_minimal() +
  theme(plot.title = element_text(size = 14, face = "bold"))
ggplotly(p1)
## `geom_smooth()` using formula = 'y ~ x'
```

### **Price vs Square Footage**

```r
# Plot price against sqrft
p2 <- ggplot(hprice1, aes(x = sqrft, y = price)) +
  geom_point(color = "green") +
  geom_smooth(method = "lm", se = FALSE, color = "purple") +
  labs(title = "Price vs Square Footage",
       x = "Square Footage",
       y = "Price") +
  theme_minimal() +
  theme(plot.title = element_text(size = 14, face = "bold"))
ggplotly(p2)
## `geom_smooth()` using formula = 'y ~ x'
```

### **Price vs Bedrooms**

```r
# Plot price against bdrms
p3 <- ggplot(hprice1, aes(x = bdrms, y = price)) +
  geom_point(color = "orange") +
  geom_smooth(method = "lm", se = FALSE, color = "blue") +
  labs(title = "Price vs Bedrooms",
       x = "Bedrooms",
       y = "Price") +
  theme_minimal() +
  theme(plot.title = element_text(size = 14, face = "bold"))
ggplotly(p3)
## `geom_smooth()` using formula = 'y ~ x'
```

## **2. Descriptive Statistics**

```r
summary(hprice1)
##      price           assess          bdrms          lotsize          sqrft     
##  Min.   :111.0   Min.   :198.7   Min.   :2.000   Min.   : 1000   Min.   :1171  
##  1st Qu.:230.0   1st Qu.:253.9   1st Qu.:3.000   1st Qu.: 5733   1st Qu.:1660  
##  Median :265.5   Median :290.2   Median :3.000   Median : 6430   Median :1845  
##  Mean   :293.5   Mean   :315.7   Mean   :3.568   Mean   : 9020   Mean   :2014  
##  3rd Qu.:326.2   3rd Qu.:352.1   3rd Qu.:4.000   3rd Qu.: 8583   3rd Qu.:2227  
##  Max.   :725.0   Max.   :708.6   Max.   :7.000   Max.   :92681   Max.   :3880  
##     colonial     
##  Min.   :0.0000  
##  1st Qu.:0.0000  
##  Median :1.0000  
##  Mean   :0.6932  
##  3rd Qu.:1.0000  
##  Max.   :1.0000
```

## **3. Perform Linear Regression**

### **Linear Regression Results**

```r
model <- lm(price ~ lotsize + sqrft + bdrms, data = hprice1)
summary(model)
## 
## Call:
## lm(formula = price ~ lotsize + sqrft + bdrms, data = hprice1)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -120.026  -38.530   -6.555   32.323  209.376 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept) -2.177e+01  2.948e+01  -0.739  0.46221    
## lotsize      2.068e-03  6.421e-04   3.220  0.00182 ** 
## sqrft        1.228e-01  1.324e-02   9.275 1.66e-14 ***
## bdrms        1.385e+01  9.010e+00   1.537  0.12795    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 59.83 on 84 degrees of freedom
## Multiple R-squared:  0.6724, Adjusted R-squared:  0.6607 
## F-statistic: 57.46 on 3 and 84 DF,  p-value: < 2.2e-16
```

## **4. Additional Tasks**

### **a. Plot Residuals Against Fitted Values**

```r
residuals_vs_fitted <- ggplot() +
  geom_point(aes(x = fitted(model), y = residuals(model)), color = "darkred") +
  geom_hline(yintercept = 0, linetype = "dashed", color = "red") +
  labs(title = "Residuals vs Fitted Values",
       x = "Fitted Values",
       y = "Residuals") +
  theme_minimal() +
  theme(plot.title = element_text(size = 14, face = "bold"))
ggplotly(residuals_vs_fitted)
```

### **b. Test for Multicollinearity**

```r
vif(model)
##  lotsize    sqrft    bdrms 
## 1.037211 1.418654 1.396663
```

### **c. Test for Normality of Residuals Using Q-Q Plot**

```r
# Create Q-Q plot
qqnorm_plot <- ggplot() +
  geom_qq(aes(sample = resid(model)), color = "blue") +
  geom_qq_line(aes(sample = resid(model)), color = "red") +
  labs(title = "Q-Q Plot of Residuals") +
  theme_minimal() +
  theme(plot.title = element_text(size = 14, face = "bold"))
ggplotly(qqnorm_plot)

```
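
The Q-Q plot gives a visual check of residual normality. As a possible numeric complement, the sketch below runs a Shapiro-Wilk test on the residuals with base R's `shapiro.test()`. The helper name `residual_normality()` is made up for this illustration and is not part of the original analysis. The guarded call at the end assumes the fitted `model` object from the regression section is still in the workspace.

```r
# Illustrative helper (hypothetical name, not part of the original analysis):
# runs a Shapiro-Wilk test on the residuals of a fitted lm object and
# returns the test statistic W together with its p-value.
residual_normality <- function(fit) {
  test <- shapiro.test(resid(fit))
  c(W = unname(test$statistic), p_value = test$p.value)
}

# Apply it to the regression fitted earlier, if that object is available.
if (exists("model") && inherits(model, "lm")) {
  print(residual_normality(model))
}
```

A small p-value would suggest that the residuals depart from normality. The test is best read alongside the Q-Q plot rather than in place of it.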

This project presents an analysis of housing prices using the `hprice1` dataset. It explores the relationships between price and the key independent variables, reports descriptive statistics, and fits a linear regression model. It also examines the residuals, multicollinearity, and the normality of residuals to check the robustness of the regression model, using visualizations and statistical tests.

## **Key Components**

- Exploratory data analysis with plots of the dependent variable against each independent variable.
- Descriptive statistics summarizing the dataset.
- Linear regression analysis to determine significant predictors of housing prices.
- Residual analysis, multicollinearity test, and normality check of residuals.
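
The variance inflation factors reported by `vif(model)` can be reproduced by hand, which makes clear what the multicollinearity test measures. The sketch below regresses each predictor on the remaining ones and computes 1 / (1 - R²). The helper name `manual_vif()` is hypothetical, and the code assumes a model with an intercept and at least two numeric predictors, as in the regression above.

```r
# Illustrative helper (hypothetical name, not part of the original analysis):
# VIF_j = 1 / (1 - R_j^2), where R_j^2 comes from regressing predictor j
# on the remaining predictors.
# Assumes an intercept and at least two numeric predictors.
manual_vif <- function(fit) {
  X <- model.matrix(fit)[, -1, drop = FALSE]  # drop the intercept column
  vapply(colnames(X), function(v) {
    others <- X[, colnames(X) != v, drop = FALSE]
    r2 <- summary(lm(X[, v] ~ others))$r.squared
    1 / (1 - r2)
  }, numeric(1))
}

# Apply it to the regression fitted earlier, if that object is available.
if (exists("model") && inherits(model, "lm")) {
  print(manual_vif(model))
}
```

For a model with only numeric predictors, the result should match the output of `vif(model)`. Values close to 1 indicate that a predictor is only weakly explained by the others.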