Use the Housing dataset from the Ecdat package.

It gives sales prices and other variables for 546 homes in Windsor, Canada, in 1987.

The code below draws a scatterplot of sales price against lot size for these homes, with loess curves fitted separately by whether or not the home is in a preferred neighborhood.

library(ggplot2)
library(Ecdat)
## Loading required package: Ecfun
## 
## Attaching package: 'Ecfun'
## The following object is masked from 'package:base':
## 
##     sign
## 
## Attaching package: 'Ecdat'
## The following object is masked from 'package:datasets':
## 
##     Orange
# Scatterplot of price vs. lot size, colored by preferred-neighborhood status
ggplot(Housing, aes(x = lotsize, y = price, color = prefarea)) +
  geom_point(alpha = 0.5) +
  # One loess curve per prefarea group (color acts as the grouping aesthetic)
  geom_smooth(method = "loess") +
  labs(title = "Sales Prices vs. Lot Size of Homes in Windsor, Canada in 1987",
       x = "Lot Size (Square Feet)", y = "Price", color = "Preferred Neighborhood?") +
  # Breaks stop at the upper axis limit; breaks outside the limits are never drawn
  scale_y_continuous(breaks = seq(25000, 200000, by = 25000), limits = c(25000, 200000)) +
  scale_x_continuous(breaks = seq(1500, 16500, by = 3000)) +
  theme(axis.text = element_text(size = 14, face = "bold"),
        axis.title = element_text(size = 15, face = "bold"),
        plot.title = element_text(size = 20),
        axis.ticks.length = unit(0.25, "cm"),
        axis.ticks = element_line(color = "firebrick"),
        legend.key.size = unit(1.3, "cm"),
        legend.text = element_text(size = 16),
        legend.title = element_text(size = 16))
## `geom_smooth()` using formula = 'y ~ x'

Analysis

The scatterplot of price against lot size for homes in Windsor, Canada shows that this dataset contains more homes outside a preferred neighborhood than inside one. For neither group does the relationship between lot size (in square feet) and price appear to be linear. Homes in preferred neighborhoods do generally have higher prices: the blue (preferred) points tend to sit at higher prices than the red (not preferred) points. Consistent with this, the loess curve for preferred neighborhoods lies above the loess curve for homes that are not in preferred neighborhoods.
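The two group-level claims above (more homes outside preferred neighborhoods, and generally higher prices inside them) can also be checked numerically rather than read off the plot. The following is a minimal sketch, assuming the Ecdat package is installed and that prefarea is a factor with levels "no" and "yes". The median is used here as one reasonable summary of "generally higher"; it is a choice made for this sketch, not something the plot itself computes.

```r
library(Ecdat)  # provides the Housing data frame

# Number of homes in each group (assumes prefarea has levels "no" and "yes")
group_counts <- table(Housing$prefarea)
print(group_counts)

# Median sale price within each group, as a simple summary of typical price
median_price <- tapply(Housing$price, Housing$prefarea, median)
print(median_price)
```

If the counts and medians agree with the plot, the "no" group should be the larger one and the "yes" group should have the higher median price.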

There is a cluster of homes with small lots and lower prices. The overall trend for both groups is an increase in price with lot size, as shown by the loess curves generally rising as lot size increases. Looking at the points apart from the curves, however, there is a lot of variation within both groups, and many of the highest-priced homes have lot sizes in the middle of the range. This rise in price at “average” lot sizes (those toward the center of the distribution) is especially evident for homes in preferred neighborhoods, where the blue loess curve shifts upward at lot sizes of about 6,000 to 7,500 square feet and then trends slightly downward. There are also several homes outside preferred neighborhoods with lots larger than 10,500 square feet that have prices below 100,000. It could be that lot size does not play a large role in determining price, or that homes with an average amount of land have other qualities that tend to increase their value. Either way, being in a preferred neighborhood does seem to be associated with a higher selling price for these homes in Windsor.
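The large-lot, low-price homes mentioned above can be listed directly. The following is a minimal sketch, assuming the same Housing data frame and the same thresholds quoted in the text (lot size above 10,500 square feet, price below 100,000). The name big_cheap is a label introduced only for this example.

```r
library(Ecdat)  # provides the Housing data frame

# Homes outside preferred neighborhoods with large lots but prices under 100,000
big_cheap <- subset(Housing,
                    prefarea == "no" & lotsize > 10500 & price < 100000,
                    select = c(price, lotsize, prefarea))

print(big_cheap)
nrow(big_cheap)  # how many such homes there are
```

Listing these rows shows how many points drive the observation, which helps judge how much weight the right-hand end of the red loess curve deserves.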