knitr::opts_chunk$set(echo = TRUE)
library(ggplot2)
library(reshape2)
homes_data <- read.csv("https://www.lock5stat.com/datasets3e/HomesForSale.csv")

Introduction

In this project, we will analyze home price data, focusing on California and then comparing it with three other states. We will address the following questions:

  1. How much does the size of a home influence its price?

  2. How does the number of bedrooms of a home influence its price?

  3. How does the number of bathrooms of a home influence its price?

  4. How do the size, the number of bedrooms, and the number of bathrooms of a home jointly influence its price?

  5. Are there significant differences in home prices among the four states (CA, NY, NJ, PA)?

By analyzing the data and answering these questions, we will be able to learn more about California home prices.

Data Analysis

Q1: How much does the size of a home influence its price?

california_homes <- homes_data[homes_data$State == "CA", ]

# Create a basic scatter plot
plot(california_homes$Size, california_homes$Price,
     xlab = "Size (in 1000 sq ft)", 
     ylab = "Price (in $1000s)", 
     main = "Price vs Size of Homes in California")

This plot shows how the size of a home relates to its price. There is a general upward trend: smaller homes tend to be cheaper and larger homes tend to be more expensive. A few homes around 1,500 square feet (1.5 on the size axis) are priced well above that trend, which may reflect location rather than size.
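To put a number on the trend in the scatter plot, one option is a simple linear regression of price on size. The sketch below is only an illustration: the names `size_fit` and `size_cor` are introduced here, and a straight-line relationship is assumed rather than checked.

```r
# Reload the data so this chunk runs on its own
homes_data <- read.csv("https://www.lock5stat.com/datasets3e/HomesForSale.csv")
california_homes <- homes_data[homes_data$State == "CA", ]

# Fit Price (in $1000s) as a linear function of Size (in 1000 sq ft)
size_fit <- lm(Price ~ Size, data = california_homes)
summary(size_fit)  # slope = estimated change in price per extra 1000 sq ft

# Pearson correlation between size and price
size_cor <- cor(california_homes$Size, california_homes$Price)
size_cor

# Redraw the scatter plot with the fitted line on top
plot(california_homes$Size, california_homes$Price,
     xlab = "Size (in 1000 sq ft)",
     ylab = "Price (in $1000s)",
     main = "Price vs Size of Homes in California")
abline(size_fit, col = "red")
```

The slope and R-squared in the summary give a direct answer to "how much" size influences price, which the plot alone can only suggest.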

Q2: How does the number of bedrooms of a home influence its price?

california_homes <- homes_data[homes_data$State == "CA", ]

# Create a boxplot for Bedrooms vs Price
ggplot(california_homes, aes(x = factor(Beds), y = Price)) +
  geom_boxplot() +
  labs(x = "Number of Bedrooms", 
       y = "Price (in $1000s)", 
       title = "Price Distribution by Number of Bedrooms in California Homes") +
  theme_minimal()

This plot shows how the number of bedrooms in a house affects its price. Price tends to rise with the number of bedrooms: a house with only 2 bedrooms is much cheaper than a house with 4 or 5. Houses with 4 and 5 bedrooms are similar in price, which is interesting, but the overall trend makes sense.
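The boxplot can be backed up with a small table of median prices per bedroom count. This is a sketch under simple assumptions: `bed_medians` and `bed_cor` are names introduced here, and the Spearman rank correlation is just one reasonable way to summarize a trend across a count variable.

```r
# Reload the data so this chunk runs on its own
homes_data <- read.csv("https://www.lock5stat.com/datasets3e/HomesForSale.csv")
california_homes <- homes_data[homes_data$State == "CA", ]

# Median price (in $1000s) for each bedroom count
bed_medians <- aggregate(Price ~ Beds, data = california_homes, FUN = median)
bed_medians

# Rank correlation between bedroom count and price
bed_cor <- cor(california_homes$Beds, california_homes$Price, method = "spearman")
bed_cor
```

Comparing the medians for 4 and 5 bedrooms shows whether those two groups really are as close as the boxplot makes them look.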

Q3: How does the number of bathrooms of a home influence its price?

california_homes <- homes_data[homes_data$State == "CA", ]

# Create a boxplot for Bathrooms vs Price
ggplot(california_homes, aes(x = factor(Baths), y = Price)) +
  geom_boxplot() +
  labs(x = "Number of Bathrooms", 
       y = "Price (in $1000s)", 
       title = "Price Distribution by Number of Bathrooms in California Homes") +
  theme_minimal()

This plot shows the effect the number of bathrooms has on the price of a house. Prices trend higher as the number of bathrooms goes up. There are too few houses with more than 3.5 bathrooms to draw conclusions from them. Among houses with fewer than 3.5 bathrooms, the more bathrooms there are, the more expensive the house is.
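Because some bathroom counts have very few homes, it helps to look at group sizes next to the median prices. The sketch below assumes there are no missing values in `Baths` or `Price`; the name `bath_summary` is introduced here for illustration.

```r
# Reload the data so this chunk runs on its own
homes_data <- read.csv("https://www.lock5stat.com/datasets3e/HomesForSale.csv")
california_homes <- homes_data[homes_data$State == "CA", ]

# Number of homes and median price for each bathroom count
bath_counts  <- table(california_homes$Baths)
bath_medians <- tapply(california_homes$Price, california_homes$Baths, median)

bath_summary <- data.frame(
  Baths       = names(bath_counts),
  Homes       = as.vector(bath_counts),
  MedianPrice = as.vector(bath_medians)
)
bath_summary
```

Rows with only one or two homes should be read with caution, since a single listing can set the whole box in the boxplot.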

Q4: How do the size, the number of bedrooms, and the number of bathrooms of a home jointly influence its price?

california_homes <- homes_data[homes_data$State == "CA", ]

# Select only numerical columns for correlation
numerical_data <- california_homes[, c("Price", "Size", "Beds", "Baths")]

# Calculate the correlation matrix
cor_matrix <- cor(numerical_data)

# Reshape the correlation matrix for ggplot
melted_cor_matrix <- melt(cor_matrix)

# Create a correlation heatmap
ggplot(melted_cor_matrix, aes(x = Var1, y = Var2, fill = value)) +
  geom_tile() +
  scale_fill_gradient2(low = "red", high = "green", mid = "yellow", midpoint = 0) +
  labs(title = "Correlation Heatmap of Price, Size, Beds, and Baths (California)",
       x = "Variable", y = "Variable") +
  theme_minimal() +
  theme(axis.text.x = element_text(angle = 90, hjust = 1))

This heat map shows how strongly each pair of variables is correlated. Price is more strongly correlated with size and the number of bathrooms than with the number of bedrooms.
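Pairwise correlations look at one variable at a time, so they do not fully answer how the three variables act jointly. One possible way to address that is a multiple linear regression with all three predictors. This is a sketch, not a tuned model: `joint_fit` is a name introduced here, and linear, additive effects are assumed.

```r
# Reload the data so this chunk runs on its own
homes_data <- read.csv("https://www.lock5stat.com/datasets3e/HomesForSale.csv")
california_homes <- homes_data[homes_data$State == "CA", ]

# Model price using size, bedrooms, and bathrooms together
joint_fit <- lm(Price ~ Size + Beds + Baths, data = california_homes)

# Each coefficient is the estimated change in price (in $1000s) for a
# one-unit increase in that variable, holding the other two fixed
summary(joint_fit)
```

Since size, bedrooms, and bathrooms are correlated with each other, individual coefficients may look weaker here than the single-variable plots suggest; the overall F-test and R-squared describe their combined effect.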

Q5: Are there significant differences in home prices among the four states (CA, NY, NJ, PA)?

ggplot(homes_data, aes(x = State, y = Price)) +
  geom_boxplot(fill = "lightblue", color = "darkblue") +  # Boxplot style
  labs(title = "Home Price Distribution Across States", 
       x = "State", 
       y = "Price (in $1000s)") +
  theme_minimal()

As you can see in this plot, California has the highest house prices of the four states. All four states are known for a high cost of living, which can push housing prices up. One possible reason California is higher than New Jersey, New York, and Pennsylvania is the climate, although this dataset cannot confirm it. New Jersey, New York, and Pennsylvania border each other on the East Coast, while California is on the West Coast. California has a more favorable year-round climate than the other three states, with warm summers and mild winters. Most areas in California see little to no winter weather, which many homeowners prefer. California is also known for bigger houses and mansions. That is my reasoning for why California's home prices are much higher than those of New Jersey, New York, and Pennsylvania.
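The boxplot shows the differences visually, but the question asks whether they are significant. One common way to check is a one-way ANOVA followed by Tukey's HSD for pairwise comparisons. This is a sketch: `state_anova` and `state_pairs` are names introduced here, and the usual ANOVA assumptions (roughly normal residuals, similar variances across states) are taken for granted rather than verified.

```r
# Reload the data so this chunk runs on its own
homes_data <- read.csv("https://www.lock5stat.com/datasets3e/HomesForSale.csv")

# Treat State as a categorical grouping variable
homes_data$State <- factor(homes_data$State)

# One-way ANOVA: does mean price differ across states?
state_anova <- aov(Price ~ State, data = homes_data)
summary(state_anova)

# Pairwise comparisons between states with Tukey's HSD
state_pairs <- TukeyHSD(state_anova)
state_pairs
```

A small p-value in the ANOVA table would indicate that at least one state differs in mean price, and the Tukey output shows which pairs of states drive that difference.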

Summary

We analyzed the questions above to see what drives California home prices and why California home prices are higher than those in New Jersey, New York, and Pennsylvania. We now understand that home prices are tied more closely to size and the number of bathrooms than to the number of bedrooms. This makes sense, as a house with 5 bedrooms and 2 bathrooms looks less appealing than a house with 2 bedrooms and 2 bathrooms. We also understand that climate may play a role in why California's home prices are much higher than those of the other three states.