knitr::opts_chunk$set(echo = TRUE)
library(ggplot2)
library(reshape2)
homes_data <- read.csv("https://www.lock5stat.com/datasets3e/HomesForSale.csv")
In this project, we analyze home price data, with a focus on California. We address the following questions:
How much does the size of a home influence its price?
How does the number of bedrooms of a home influence its price?
How does the number of bathrooms of a home influence its price?
How do the size, the number of bedrooms, and the number of bathrooms of a home jointly influence its price?
Are there significant differences in home prices among the four states (CA, NY, NJ, PA)?
By analyzing the data and answering these questions, we will be able to learn more about California home prices.
california_homes <- homes_data[homes_data$State == "CA", ]
# Create a basic scatter plot
plot(california_homes$Size, california_homes$Price,
xlab = "Size (in 1000 sq ft)",
ylab = "Price (in $1000s)",
main = "Price vs Size of Homes in California")
This plot shows how the size of a home relates to its price. There is a
general upward trend: smaller homes tend to be cheaper and larger homes
more expensive. A few homes around 1,500 square feet are priced well
above that trend, which may reflect location rather than size.
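To put a number on the trend in the scatter plot, one option is a simple linear regression of price on size. The sketch below is an illustration and not part of the original analysis; `fit_price_on_size` is a hypothetical helper name, and it assumes a data frame with numeric `Price` and `Size` columns, such as `california_homes`.

```r
# Hypothetical helper (not from the original report): fit Price ~ Size
# and return the slope and R-squared of the fitted line.
# Assumes `homes` has numeric columns Price (in $1000s) and Size (in 1000 sq ft).
fit_price_on_size <- function(homes) {
  model <- lm(Price ~ Size, data = homes)
  list(
    # Estimated change in price (in $1000s) per additional 1000 sq ft
    slope = unname(coef(model)["Size"]),
    # Share of the variation in price explained by size alone
    r_squared = summary(model)$r.squared
  )
}
```

Calling `fit_price_on_size(california_homes)` would return the estimated slope and R-squared for the California subset. A low R-squared would support the idea that factors other than size, such as location, matter for price.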
california_homes <- homes_data[homes_data$State == "CA", ]
# Create a boxplot for Bedrooms vs Price
ggplot(california_homes, aes(x = factor(Beds), y = Price)) +
geom_boxplot() +
labs(x = "Number of Bedrooms",
y = "Price (in $1000s)",
title = "Price Distribution by Number of Bedrooms in California Homes") +
theme_minimal()
This plot shows how the number of bedrooms in a house affects its
price. There is a clear trend between number of bedrooms and price: a
house with only 2 bedrooms is much cheaper than a house with 4 or 5.
Houses with 4 and 5 bedrooms are priced similarly, which is
interesting, but the overall trend makes sense.
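The comparison between bedroom groups can also be read off a small summary table. The sketch below is a hypothetical helper (`summarise_price_by_beds` is not from the original report) that assumes a data frame with `Price` and `Beds` columns. It reports the median price and the number of homes in each group, since a group with only a few homes gives a less reliable box.

```r
# Hypothetical helper (not from the original report): median price and
# number of homes for each bedroom count.
# Assumes `homes` has columns Price (in $1000s) and Beds.
summarise_price_by_beds <- function(homes) {
  medians <- aggregate(Price ~ Beds, data = homes, FUN = median)
  counts  <- aggregate(Price ~ Beds, data = homes, FUN = length)
  names(medians)[2] <- "MedianPrice"
  names(counts)[2]  <- "Homes"
  # One row per bedroom count, sorted by Beds
  merge(medians, counts, by = "Beds")
}
```

For example, `summarise_price_by_beds(california_homes)` would show whether the 4-bedroom and 5-bedroom medians are close and how many homes each is based on.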
california_homes <- homes_data[homes_data$State == "CA", ]
# Create a boxplot for Bathrooms vs Price
ggplot(california_homes, aes(x = factor(Baths), y = Price)) +
geom_boxplot() +
labs(x = "Number of Bathrooms",
y = "Price (in $1000s)",
title = "Price Distribution by Number of Bathrooms in California Homes") +
theme_minimal()
This plot shows the effect of the number of bathrooms on the price of a
house. Prices tend to rise as the number of bathrooms goes up. There
are few houses with more than 3.5 bathrooms, so little can be concluded
for that group. Among houses with fewer than 3.5 bathrooms, more
bathrooms go with a higher price.
california_homes <- homes_data[homes_data$State == "CA", ]
# Select only numerical columns for correlation
numerical_data <- california_homes[, c("Price", "Size", "Beds", "Baths")]
# Calculate the correlation matrix
cor_matrix <- cor(numerical_data)
# Reshape the correlation matrix for ggplot
melted_cor_matrix <- melt(cor_matrix)
# Create a correlation heatmap
ggplot(melted_cor_matrix, aes(x = Var1, y = Var2, fill = value)) +
geom_tile() +
scale_fill_gradient2(low = "red", high = "green", mid = "yellow", midpoint = 0) +
labs(title = "Correlation Heatmap",
x = "Variable", y = "Variable") +
theme_minimal() +
theme(axis.text.x = element_text(angle = 90, hjust = 1))
This heat map shows how strongly each pair of variables is correlated.
Price is more strongly correlated with size and the number of bathrooms
than with the number of bedrooms.
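Pairwise correlations look at one pair of variables at a time. To look at the joint influence of size, bedrooms, and bathrooms, one common approach is a multiple linear regression. The sketch below is an assumption about how that could be done, not the method used in this report; `fit_joint_model` is a hypothetical name.

```r
# Hypothetical helper (not from the original report): regress Price on
# Size, Beds, and Baths together.
# Assumes `homes` has numeric columns Price, Size, Beds, and Baths.
fit_joint_model <- function(homes) {
  lm(Price ~ Size + Beds + Baths, data = homes)
}
```

Running `summary(fit_joint_model(california_homes))` would give one coefficient per predictor, each describing the change in price associated with that predictor while the other two are held fixed. Because size, bedrooms, and bathrooms are correlated with each other, individual coefficients can be unstable and should be read with caution.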
ggplot(homes_data, aes(x = State, y = Price)) +
geom_boxplot(fill = "lightblue", color = "darkblue") + # Boxplot style
labs(title = "Home Price Distribution Across States",
x = "State",
y = "Price (in $1000s)") +
theme_minimal()
As you can see in this plot, California has the highest prices for
houses out of the 4 states. All 4 of these states are known for a high
quality of living, which can drive up housing prices. My reasoning for
why California is higher than New Jersey, New York, and Pennsylvania is
the climate. New Jersey, New York, and Pennsylvania all border each
other on the east coast, while California is on the west coast.
California has a more favorable year-round climate than the other 3
states: it typically sees warm summers and mild winters, and most areas
experience little to no winter weather, which many homeowners favor.
California is also known for bigger houses and mansions. That is my
reasoning for why California’s home prices are much higher than those
of New Jersey, New York, and Pennsylvania.
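The boxplot compares the states visually, but it does not by itself say whether the differences are statistically significant. One way to check is a one-way ANOVA of price across states. The sketch below is an illustration and not part of the original analysis; `test_state_differences` is a hypothetical name, and it assumes a data frame with `Price` and `State` columns, such as `homes_data`.

```r
# Hypothetical helper (not from the original report): one-way ANOVA of
# Price across State.
# Assumes `homes` has a numeric Price column and a State column.
test_state_differences <- function(homes) {
  model <- aov(Price ~ factor(State), data = homes)
  anova_table <- summary(model)[[1]]
  list(
    # First row of the ANOVA table is the State term
    f_statistic = anova_table[["F value"]][1],
    p_value     = anova_table[["Pr(>F)"]][1]
  )
}
```

A small p-value from `test_state_differences(homes_data)` would suggest that at least one state's mean price differs from the others. ANOVA assumes roughly normal residuals with similar spread in each group, and home prices are often skewed, so the result should be treated as a rough check.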
We analyzed the questions above to understand what drives California home prices and why they are higher than those in New Jersey, New York, and Pennsylvania. In this data, price is more strongly related to size and the number of bathrooms than to the number of bedrooms. This makes sense, as a house with 5 bedrooms and only 2 bathrooms may be less appealing than a house with 2 bedrooms and 2 bathrooms. Climate may also play a role in why California’s home prices are much higher than those of the other 3 states.