ISA 388 - Homework 7

7.1
After reading the dataset into R (naming it using your student ID as usual), identify the number of observations, data types, as well as the mean, median, and quartiles for each variable in the dataset. Use any function(s) that you like. Show your function(s) and your output. Consider other statistics or graphics you may want to create.

getwd()

## [1] "C:/Users/rafid/Desktop"

setwd("C:/Users/rafid/Desktop")
data28 <- read.csv("CommercialProperties.csv")
str(data28)

## 'data.frame':    82 obs. of  5 variables:
##  $ Rental.Rates                : num  13.5 12 10.5 15 14 10.5 14 16.5 17.5 16.5 ...
##  $ Age                         : int  1 14 16 4 11 15 2 1 1 8 ...
##  $ Operating.Expenses.and.Taxes: num  5.02 8.19 3 10.7 8.97 ...
##  $ Vacancy.Rates               : num  0.14 0.27 0 0.05 0.07 0.24 0.19 0.6 0 0.03 ...
##  $ Total.Square.Footage        : int  123000 104079 39998 57112 60000 101385 31300 248172 215000 251015 ...

#The number of observations#
no_obsv <- nrow(data28)
cat("The Number of Observations:", no_obsv, "\n")

## The Number of Observations: 82

#The Data Types#
data_type <- sapply(data28, class)
print(data_type)

##                 Rental.Rates                          Age 
##                    "numeric"                    "integer" 
## Operating.Expenses.and.Taxes                Vacancy.Rates 
##                    "numeric"                    "numeric" 
##         Total.Square.Footage 
##                    "integer"

#Finding the mean, median and quartile for each variable in the dataset# 
summary(data28)

##   Rental.Rates       Age         Operating.Expenses.and.Taxes Vacancy.Rates    
##  Min.   :10.5   Min.   : 0.000   Min.   : 3.000               Min.   :0.00000  
##  1st Qu.:14.0   1st Qu.: 2.000   1st Qu.: 8.145               1st Qu.:0.00000  
##  Median :15.0   Median : 4.000   Median :10.370               Median :0.03000  
##  Mean   :15.2   Mean   : 8.012   Mean   : 9.814               Mean   :0.08244  
##  3rd Qu.:16.5   3rd Qu.:15.000   3rd Qu.:11.620               3rd Qu.:0.09750  
##  Max.   :20.0   Max.   :20.000   Max.   :20.000               Max.   :0.73000  
##  Total.Square.Footage
##  Min.   : 27000      
##  1st Qu.: 70500      
##  Median :129614      
##  Mean   :164772      
##  3rd Qu.:239000      
##  Max.   :500020

summaryStatistics <- sapply(data28, function(x) {
  c(Mean = mean(x, na.rm = TRUE),
    Median = median(x, na.rm = TRUE),
    First_Quartile = quantile(x, 0.25, na.rm = TRUE),
    Third_Quartile = quantile(x, 0.75, na.rm = TRUE))
})

print(summaryStatistics)

##                    Rental.Rates       Age Operating.Expenses.and.Taxes
## Mean                   15.19817  8.012195                     9.813902
## Median                 15.00000  4.000000                    10.370000
## First_Quartile.25%     14.00000  2.000000                     8.145000
## Third_Quartile.75%     16.50000 15.000000                    11.620000
##                    Vacancy.Rates Total.Square.Footage
## Mean                  0.08243902             164772.1
## Median                0.03000000             129614.0
## First_Quartile.25%    0.00000000              70500.0
## Third_Quartile.75%    0.09750000             239000.0

#Histogram Data of the Rental Rates#
hist(data28$Rental.Rates,
     main="Distribution of Commercial Rental Rates",
     xlab="Rental Rates ($USD per square foot)",
     col="darkblue",
     border="grey")

7.2
Perform a series of regressions where you predict

7.2.1 Rental rates of the properties (Rental Rates) using Age,

rentalRatesByAgeModel <- lm(Rental.Rates ~ Age, data = data28)
summary(rentalRatesByAgeModel)

## 
## Call:
## lm(formula = Rental.Rates ~ Age, data = data28)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -4.3739 -0.9308  0.1209  1.0112  5.3582 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 15.57003    0.30593  50.895   <2e-16 ***
## Age         -0.04641    0.02932  -1.583    0.117    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1.775 on 80 degrees of freedom
## Multiple R-squared:  0.03038,    Adjusted R-squared:  0.01826 
## F-statistic: 2.506 on 1 and 80 DF,  p-value: 0.1173

7.2.2 Rental rates of the properties (Rental Rates) using Operating Expenses & Taxes,

rentalRatesByExpenses <- lm(Rental.Rates ~ Operating.Expenses.and.Taxes, data = data28)
summary(rentalRatesByExpenses)

## 
## Call:
## lm(formula = Rental.Rates ~ Operating.Expenses.and.Taxes, data = data28)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -4.5862 -0.9241 -0.1766  0.7576  4.8161 
## 
## Coefficients:
##                              Estimate Std. Error t value Pr(>|t|)    
## (Intercept)                  12.17872    0.63735  19.109  < 2e-16 ***
## Operating.Expenses.and.Taxes  0.30767    0.06247   4.925 4.45e-06 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1.579 on 80 degrees of freedom
## Multiple R-squared:  0.2327, Adjusted R-squared:  0.2231 
## F-statistic: 24.26 on 1 and 80 DF,  p-value: 4.452e-06

7.2.3 Rental rates of the properties (Rental Rates) using Vacancy Rates

rentalRatesByVacancy <- lm(Rental.Rates ~ Vacancy.Rates, data = data28)
summary(rentalRatesByVacancy)

## 
## Call:
## lm(formula = Rental.Rates ~ Vacancy.Rates, data = data28)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -4.8924 -1.1243 -0.0965  1.1264  4.6569 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept)    15.0965     0.2329  64.815   <2e-16 ***
## Vacancy.Rates   1.2329     1.4841   0.831    0.409    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1.795 on 80 degrees of freedom
## Multiple R-squared:  0.008552,   Adjusted R-squared:  -0.003841 
## F-statistic: 0.6901 on 1 and 80 DF,  p-value: 0.4086

7.2.4 Rental rates of the properties (Rental Rates) using Total Square Footage

rentalRatesBySQFT <- lm(Rental.Rates ~  Total.Square.Footage, data = data28)
summary(rentalRatesBySQFT)

## 
## Call:
## lm(formula = Rental.Rates ~ Total.Square.Footage, data = data28)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -4.1236 -0.7636  0.2848  1.0718  3.3699 
## 
## Coefficients:
##                       Estimate Std. Error t value Pr(>|t|)    
## (Intercept)          1.370e+01  2.848e-01  48.117  < 2e-16 ***
## Total.Square.Footage 9.065e-06  1.421e-06   6.377 1.08e-08 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1.468 on 80 degrees of freedom
## Multiple R-squared:  0.337,  Adjusted R-squared:  0.3287 
## F-statistic: 40.67 on 1 and 80 DF,  p-value: 1.084e-08

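Explanation : Operating expenses and taxes (p = 4.45e-06) and total square footage (p = 1.08e-08) are significant predictors of rental rates, explaining 23.3% and 33.7% of the variation respectively. Property age (p = 0.117) and vacancy rates (p = 0.409) are not significant at the 0.05 level, and each explains less than 4% of the variation in rent.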
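As a quick cross-check on the four models, the slope p-values printed in the summaries above can be compared against a significance level. The sketch below copies those values by hand and assumes a 0.05 threshold, which the assignment does not specify; the data frame name `slopeTests` is introduced only for illustration.

```r
# Slope p-values and R-squared values copied from the four summary() outputs above
slopeTests <- data.frame(
  predictor = c("Age", "Operating.Expenses.and.Taxes",
                "Vacancy.Rates", "Total.Square.Footage"),
  p_value   = c(0.117, 4.45e-06, 0.409, 1.08e-08),
  r_squared = c(0.03038, 0.2327, 0.008552, 0.337)
)

# Assumed significance level (not stated in the assignment)
alpha <- 0.05
slopeTests$significant <- slopeTests$p_value < alpha

print(slopeTests)
```

Under that assumed threshold, only the operating expenses and total square footage slopes differ significantly from zero.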

7.3
Look at your output for the regression of rental rates and total square footage. How many observations are analyzed (what is the “n” value)? What is the regression equation? What is the R2 value? [use the “Multiple R-Squared”]. What percentage of variation does this model explain? According to this model, how much would a property of 300,000 square feet rent for?

Explanation

Number of Observations Analyzed (n value): The dataset contains 82 observations, consistent with the 80 residual degrees of freedom reported for a model with two estimated coefficients.

Regression Equation: Rental Rates = 13.70 + 0.000009065 × Total Square Footage

R² Value (Multiple R-Squared): The R² value is 0.337.

Percentage of Variation Explained: Total square footage explains 33.7% of the variation in rental rates.
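The reported figures can be checked against one another using only numbers printed in the regression summary. This is a rough sketch: it uses the rounded R² of 0.337, so the recomputed F-statistic only approximately matches the reported 40.67. The variable names are illustrative.

```r
# Values copied from summary(rentalRatesBySQFT)
residual_df <- 80      # "on 80 degrees of freedom"
r_squared   <- 0.337   # Multiple R-squared (rounded)

# A simple regression estimates two coefficients (intercept and slope)
n <- residual_df + 2

# Percentage of variation explained
pct_explained <- 100 * r_squared

# With one predictor, F = (R^2 / (1 - R^2)) * residual degrees of freedom
f_check <- r_squared / (1 - r_squared) * residual_df

cat("n =", n, "\n")
cat("Variation explained:", pct_explained, "%\n")
cat("Approximate F-statistic:", round(f_check, 2), "\n")
```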

Rental Rate for a Property of 300,000 Square Feet: To find the rental rate for a property of 300,000 square feet, you use the regression equation: Rental Rates for 300,000 sq. ft = 13.70 + 0.000009065 × 300,000 = 16.42 USD per square foot

Therefore, Rental Rates for 300,000 sq. ft = 16.42 USD per square foot
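The same arithmetic can be reproduced in R. The sketch below uses the rounded coefficients from the summary output rather than the fitted model object, so the result may differ slightly in later decimals from what `predict()` would return; the variable names are illustrative.

```r
# Rounded coefficients copied from summary(rentalRatesBySQFT)
intercept <- 13.70
slope     <- 9.065e-06

square_feet    <- 300000
predicted_rate <- intercept + slope * square_feet

cat("Predicted rental rate:", round(predicted_rate, 2), "\n")

# With the fitted model in the workspace, the equivalent call would be:
# predict(rentalRatesBySQFT,
#         newdata = data.frame(Total.Square.Footage = 300000))
```

Because 300,000 square feet lies within the observed range of the data (27,000 to 500,020), this prediction is an interpolation rather than an extrapolation.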

7.4
Use the ggplot() to check the required conditions for the model [only do model 4, the one with total square footage as the independent variable and rental rates as the dependent variable]. Show your commands and your output. Choose your own colors for your graphics. Be sure you have appropriate titles on all of your graphics, and that all axes are labeled appropriately. State your conclusion about the usefulness of the model.

library(ggplot2)

#Leverage Values#
data28$leverage <- hatvalues(rentalRatesBySQFT)
data28$residuals <- resid(rentalRatesBySQFT)
data28$fitted.values <- fitted(rentalRatesBySQFT)

#Plot 1#
plot_1 <- ggplot(data28, aes(x = fitted.values, y = residuals)) +
  geom_point(color = "darkgreen") +
  geom_hline(yintercept = 0, linetype = "dashed", color = "gray") +
  labs(title = "Residuals vs. Fitted Values", x = "Fitted Values", y = "Residuals") +
  theme(plot.title = element_text(hjust = 0.5))

print(plot_1)

Explanation : This plot evaluates the linear relationship assumption and homoscedasticity (constant variance) of residuals. Ideally, the points should be randomly dispersed around the horizontal line at 0, without forming any patterns. Patterns or a funnel shape would indicate potential issues with the model.

#Plot 2#
plot_2 <- ggplot(data28, aes(x = residuals)) +
  geom_histogram(fill = "coral", color = "black", binwidth = 0.5) +
  labs(title = "Histogram of Residuals", x = "Residuals", y = "Frequency") +
  theme(plot.title = element_text(hjust = 0.5))

print(plot_2)

Explanation : The histogram of residuals checks for the normality of residuals. A well-fitting model should have residuals that approximately follow a normal distribution. Deviations from this pattern suggest the model might not be capturing some aspects of the data’s structure.

#Plot 3#
plot_3 <- ggplot(data28, aes(sample = residuals)) +
  geom_qq(color = "purple") +
  geom_qq_line(color = "darkred") +
  labs(title = "QQ Plot of Residuals") +
  theme(plot.title = element_text(hjust = 0.5))

print(plot_3)

Explanation : The QQ plot provides another way to assess the normality of residuals. Points following closely along the reference line indicate that residuals are normally distributed. Significant deviations from the line, especially at the tails, would signal that the residuals do not follow a normal distribution, potentially undermining the model’s assumptions.
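A formal test can complement the visual check. The sketch below wraps `shapiro.test()` from base R's stats package in a helper; the helper name `checkResidualNormality` is hypothetical, and the built-in `cars` data is used only as a stand-in so the block runs on its own. For this homework, the model passed in would be `rentalRatesBySQFT`.

```r
# Hypothetical helper: Shapiro-Wilk test on a fitted model's residuals
checkResidualNormality <- function(model) {
  shapiro.test(resid(model))
}

# Stand-in model on the built-in cars data, used only so the example runs
standInModel  <- lm(dist ~ speed, data = cars)
normalityTest <- checkResidualNormality(standInModel)
print(normalityTest)

# For the homework model:
# checkResidualNormality(rentalRatesBySQFT)
```

A small p-value would be evidence against normality; a large one does not prove normality, so the test is best read alongside the QQ plot.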

#plot 4#
plot_4 <- ggplot(data28, aes(x = leverage, y = residuals)) +
  geom_point(color = "steelblue") +
  labs(title = "Residuals vs. Leverage", x = "Leverage", y = "Residuals") +
  theme(plot.title = element_text(hjust = 0.5)) # Centering the title

# Print the plot
print(plot_4)

Explanation : This plot helps identify influential observations in the regression model. Points with high leverage have unusual square footage values and can have a disproportionate effect on the estimated coefficients, especially when they also have large residuals. Ideally, residuals are scattered around zero with no pattern, and no point combines high leverage with a large residual; in that case no single property is driving the fit.
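One common rule of thumb flags observations whose leverage exceeds twice the average leverage, 2p/n, where p is the number of estimated coefficients. That cutoff is an assumption here, not something the assignment prescribes; the sketch computes it for this model from n = 82 and p = 2.

```r
n <- 82   # observations in the dataset
p <- 2    # estimated coefficients: intercept and slope

# Average leverage is p / n; the rule-of-thumb cutoff is twice that
leverage_cutoff <- 2 * p / n
cat("Leverage cutoff:", round(leverage_cutoff, 4), "\n")

# With the leverage column created above, high-leverage rows could be listed:
# data28[data28$leverage > leverage_cutoff, ]
```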

Conclusion
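The slope for total square footage is statistically significant (p = 1.08e-08), but the model explains only 33.7% of the variation in rental rates, with a residual standard error of 1.468. Provided the diagnostic plots above show no clear pattern, funnel shape, or strong departure from normality, the model is useful for describing how rent rises with property size, although predictions based on square footage alone will be imprecise.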

Mohammed Hossain

2024-03-13