Cincinnati Housing Analysis
Cincinnati Housing Data Introduction
Housing markets across the United States follow similar broad trends, but they differ considerably from state to state and even from city to city. Cincinnati is home to many professional sports teams, businesses, and private schools, so I thought it would be interesting to look at the city's current housing market.
Zillow is a website where realtors can list properties for sale so that home buyers can easily search for a home that matches their needs. With a quick location search, I narrowed the criteria down to properties in the Cincinnati area. The map outlining the area Zillow pulls listings from was interesting: a “Cincinnati” search does not include properties in Kenwood or Newtown, which are not far from downtown, but it does include properties in Withamsville and Pleasant Run, which are farther from downtown. Zillow lets you customize the search area, but for this project I kept the default Cincinnati search area using this link: https://www.zillow.com/cincinnati-oh/
Zillow also offers a number of APIs, which it says are intended for commercial use by businesses, as well as some CSV files for research and academic use. For this project, however, scraping the listings I am searching for does not overload Zillow’s servers, and the data is well suited to this kind of analysis.
Data Worth Scraping for Analysis
After searching for properties listed for sale in Cincinnati, this is what I see: a list of homes showing each property's descriptors, price, address, and realtor company, along with a scrollable set of pictures of the property. In the top right, Zillow also says it is sorting by “Homes for You”, but since I was not signed in, I do not know whether this sorting has any significant meaning.
So the variables I want to scrape are almost everything I can see on each of these tiles on the page:
Address
Realtor company
Descriptors
  Number of bedrooms
  Number of bathrooms
  Square footage
  Type of property (e.g., condo or house)
Price
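If the descriptors appear as one short line of text on each tile, they still have to be split into separate variables before any analysis. Below is a minimal Python sketch of that step. It assumes the text looks like "3 bds 2 ba 1,450 sqft - House for sale"; that format is an assumption (Zillow's actual wording and HTML may differ), and the pattern and function names are hypothetical.

```python
import re

# ASSUMPTION: each tile's descriptor text looks roughly like
# "3 bds 2 ba 1,450 sqft - House for sale". Check a few real tiles
# before relying on these patterns.
BEDS_RE = re.compile(r"(\d+)\s*bds?\b")          # "1 bd" or "3 bds"
BATHS_RE = re.compile(r"(\d+(?:\.\d+)?)\s*ba\b")  # "2 ba" or "2.5 ba"
SQFT_RE = re.compile(r"([\d,]+)\s*sqft\b")        # "1,450 sqft"
TYPE_RE = re.compile(r"-\s*(.+?)\s+for sale")     # "- House for sale"


def parse_descriptor(text: str) -> dict:
    """Split one tile's descriptor text into bedrooms, bathrooms, sqft, and type.

    A field that is missing (e.g. bedrooms on a Lot/Land listing) becomes None.
    """
    beds = BEDS_RE.search(text)
    baths = BATHS_RE.search(text)
    sqft = SQFT_RE.search(text)
    ptype = TYPE_RE.search(text)
    return {
        "bedrooms": int(beds.group(1)) if beds else None,
        "bathrooms": float(baths.group(1)) if baths else None,
        "sqft": int(sqft.group(1).replace(",", "")) if sqft else None,
        "type": ptype.group(1) if ptype else None,
    }


print(parse_descriptor("3 bds 2 ba 1,450 sqft - House for sale"))
# -> {'bedrooms': 3, 'bathrooms': 2.0, 'sqft': 1450, 'type': 'House'}
print(parse_descriptor("0.25 acres lot - Lot/Land for sale"))
# -> {'bedrooms': None, 'bathrooms': None, 'sqft': None, 'type': 'Lot/Land'}
```

Returning None for missing fields, instead of raising, keeps Lot/Land listings in the data frame so they can still appear in the price-by-type comparison.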
Minor Issue I Came Across
When I was scraping the Zillow property data, it only grabbed the first 9 listings on each page. My amateur guess is that the page loads listings dynamically as you scroll, and the scraper cannot trigger that loading. I noticed that when I scroll quickly, there is sometimes a slight delay before the listings pop up. Whether or not this is the cause, I could not find a workaround that would let me scrape the full page of probably 50 or so listings. I had to stick with the 9 per page it was giving me for each of my variables and run the scraper over enough pages to collect a good amount of data to analyze.
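With only 9 listings per page, the number of pages to visit is a simple ceiling division: collecting 135 properties takes 135 / 9 = 15 result pages. The sketch below computes that and builds the page URLs. The "<n>_p/" suffix for later pages is an assumption about Zillow's URL scheme (confirm it by clicking to page 2 in a browser), and the helper names are hypothetical. This only decides which pages to request; it does not fix the 9-per-page limit.

```python
import math

BASE_URL = "https://www.zillow.com/cincinnati-oh/"
LISTINGS_PER_PAGE = 9  # what the scraper actually returned per page


def pages_needed(target: int, per_page: int = LISTINGS_PER_PAGE) -> int:
    """Number of result pages to visit to collect `target` listings."""
    return math.ceil(target / per_page)


def page_urls(n_pages: int, base: str = BASE_URL) -> list[str]:
    """Build the search-result URLs for pages 1..n_pages.

    ASSUMPTION: pages after the first use a "<n>_p/" suffix,
    e.g. https://www.zillow.com/cincinnati-oh/2_p/
    """
    return [base if n == 1 else f"{base}{n}_p/" for n in range(1, n_pages + 1)]


print(pages_needed(135))  # -> 15
for url in page_urls(3):
    print(url)
```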
Here is the link to the final data frame after scraping, which contains 135 different properties:
Analysis
There are plenty of insights that can be gained from real estate data, and I explore several of them below.
How does the type of property/listing relate to price?
There were many different property types I was able to scrape from Zillow, including:
Condo
House
Lot/Land
Multi-family home (e.g., duplex, triplex, suite)
New Construction (meaning the building has not been built yet)
Townhouse
I created a boxplot to visualize the differences in price between the property types up for sale. I filtered out properties priced over $2,000,000 because the few outliers above that price stretched the graph and made the rest of it harder to read.
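Each box in a boxplot is drawn from a five-number summary (minimum, first quartile, median, third quartile, maximum), so computing those numbers per property type is a quick way to check what the plot shows. A minimal Python sketch using the standard library, applying the same $2,000,000 cap; the input shape and the example rows are made up for illustration, and plotting tools can differ slightly in how they compute quartiles.

```python
from collections import defaultdict
from statistics import median, quantiles

PRICE_CAP = 2_000_000  # same cutoff used for the boxplot


def price_summary(listings, cap=PRICE_CAP):
    """Five-number summary of price per property type, ignoring prices above `cap`.

    `listings` is an iterable of (property_type, price) pairs.
    """
    by_type = defaultdict(list)
    for ptype, price in listings:
        if price <= cap:
            by_type[ptype].append(price)

    summary = {}
    for ptype, prices in by_type.items():
        if len(prices) >= 2:
            # "inclusive" treats min/max as the 0th/100th percentiles and
            # interpolates linearly between data points
            q1, _, q3 = quantiles(prices, n=4, method="inclusive")
        else:
            q1 = q3 = prices[0]  # a single listing has no spread
        summary[ptype] = {
            "n": len(prices),
            "min": min(prices),
            "q1": q1,
            "median": median(prices),
            "q3": q3,
            "max": max(prices),
        }
    return summary


# Made-up example rows, not the scraped data
example = [
    ("House", 200_000), ("House", 300_000), ("House", 400_000),
    ("House", 2_500_000),  # above the cap, so it is dropped
    ("Condo", 250_000), ("Condo", 350_000),
]
for ptype, stats in price_summary(example).items():
    print(ptype, stats)
```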
From this visualization, I can draw several conclusions about property type and price. New constructions tend to be the most expensive, whereas buying just a lot or land is the cheapest. This makes sense: a new home has modern features and new electrical, plumbing, and other systems, which need more maintenance the older a property gets. A lot does not have any building on it yet, which makes it worth less. Houses and condos have fairly similar medians, but houses have a larger spread, which could be because there are more houses (77) than condos (12) in the dataset, or because houses come in many different forms. Condos can sometimes be pricier even with less area because they are downtown or in more populated areas.
Relationship between square footage and price
Once again, I filtered out properties priced over $2,000,000, since the 2 or 3 of them just stretch the graph.
This is a fairly straightforward visual showing the relationship between square footage and price. As I assumed, more square footage tends to go with higher prices, since the home and/or the lot is larger. However, it would be interesting to collect much more data to see the true relationship at higher square footage: the points with lower square footage and prices are tightly packed on the graph, but they spread out as square footage increases. Overall, I will claim that there is a positive relationship between square footage and price.
Types of properties by Realtors
I wanted to see whether any realtors made up most of the Cincinnati data I scraped and, on top of that, whether they sell more of a certain type of property. To make this visualization, I combined realtors such as Sibcy Cline whose names appeared in a handful of different variations. I also filtered out all the realtors that had only 1 property, to focus on the overall trend.
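Combining name variants and dropping single-listing realtors can be sketched in a few lines of Python: normalize each name to lowercase letters only, map known variants to one canonical name, then count. The alias table, the example name variants, and the function names here are hypothetical; only Sibcy Cline comes from the write-up.

```python
import re
from collections import Counter

# HYPOTHETICAL alias table: normalized name prefix -> canonical name.
# Add an entry for each firm whose name shows up in several spellings.
ALIASES = {"sibcy cline": "Sibcy Cline"}


def normalize_realtor(name: str) -> str:
    """Collapse spelling variants such as "Sibcy Cline, Inc." or "SIBCY-CLINE"."""
    # lowercase, then turn every run of non-letters into a single space
    key = re.sub(r"[^a-z]+", " ", name.lower()).strip()
    for prefix, canonical in ALIASES.items():
        if key.startswith(prefix):
            return canonical
    return name.strip()


def realtor_counts(names, min_listings=2):
    """Listings per normalized realtor, dropping realtors below `min_listings`."""
    counts = Counter(normalize_realtor(n) for n in names)
    return {realtor: c for realtor, c in counts.items() if c >= min_listings}


# Made-up name variants, not the scraped data
names = ["Sibcy Cline, Inc.", "Sibcy Cline Realtors", "SIBCY-CLINE", "Solo Realty"]
print(realtor_counts(names))  # -> {'Sibcy Cline': 3}
```

Counting unique groups before the filter, e.g. `len(Counter(normalize_realtor(n) for n in names))`, gives the total number of distinct realtor groups.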
One takeaway from this visualization is that there are a lot of different realtor groups. Counting the ones I filtered out of this visual, there are 47 unique groups, which is a lot given that I collected only 135 properties (roughly one group for every three listings). I like this visual because I can clearly see that Coldwell Banker has the most listings in the data I scraped, and it has some of every type of property. Also, across the board, house appears to be the most common property type for these realtor groups, which makes sense because it was the most common type across all the listings too.
This type of analysis might be most useful on the competitive side of the business. For example, a realtor could look at this graph, see a competitor like Coldwell with a wider variety of properties, and decide to diversify their own company’s portfolio, knowing that Coldwell is booming right now.
Do more beds correlate to more bathrooms?
From the data I have, I wanted to see whether a higher number of bedrooms goes with a higher number of bathrooms. I made a scatter plot and used jitter to spread the points out slightly, so I could also see the property type of each point, since that might show something too. I decided jittering was acceptable because both axes are discrete: you cannot have 2.3 bedrooms or 1.8 bathrooms, so without jitter many points would sit exactly on top of each other and hide the property types.
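Jitter simply adds a small random offset to each point before plotting. The one design rule is to keep the offset smaller than half the gap between neighboring values, so a point never drifts into the next column: under 0.5 for whole bedrooms, and under 0.25 for bathrooms, which come in half-bath steps. A minimal Python sketch; the function name, widths, and example values are illustrative choices, not the settings used for the graph.

```python
import random


def jitter(values, width, seed=None):
    """Return a copy of `values` with uniform noise in [-width, width] added.

    Keep `width` below half the gap between neighboring levels so points
    never drift into a neighbor's column: < 0.5 for whole bedrooms,
    < 0.25 for bathrooms counted in half-bath steps.
    """
    if width < 0:
        raise ValueError("width must be non-negative")
    rng = random.Random(seed)  # a fixed seed makes the plot reproducible
    return [v + rng.uniform(-width, width) for v in values]


# Made-up example values, not the scraped data
bedrooms = [2, 3, 3, 3, 4]
bathrooms = [1, 2, 2, 2.5, 3]
x = jitter(bedrooms, width=0.3, seed=42)
y = jitter(bathrooms, width=0.2, seed=7)
```

The jittered `x` and `y` are only for drawing; any statistics should still be computed on the original counts.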
From the graph above, I can see a slight upward (positive) relationship between bedrooms and bathrooms, meaning that properties with more bedrooms tend to have more bathrooms. It is also interesting that the house property type is spread fairly evenly throughout, while condos and townhouses tend to have fewer bedrooms and bathrooms. Multi-family homes and new construction properties fall more in the middle, with 3-4 bedrooms.
I also ran a linear model to see what the R-squared was for my data. My scraped data is only a small sample of the market, but the model output shows that the relationship is statistically significant (the p-value is below 0.05) and that about 40% of the variance in bedrooms is explained by the number of bathrooms.
How do more bedrooms / bathrooms relate to price?
For my final analysis, I was interested in how the number of bedrooms and the number of bathrooms each relate to price. For these two graphs, I again included the property type and used jitter to look for any trends there.
For bedrooms, I see an upward (positive) trend: more bedrooms generally means a higher price. This makes sense for the most part, but some properties, such as condos in cities, have become quite expensive to buy outright. I see some of this in the graph too: some of the condos (the red dots) with 2 and 3 bedrooms are more expensive than many of the houses with 2 and 3 bedrooms.
Similar to bedrooms, more bathrooms appears to have a positive effect on price. In this graph, I filtered out properties with more than 6.5 bathrooms, because there were 2 properties with around 7 bathrooms that were very cheap, and I am not sure what is going on with them. Many of the new construction properties have 2-5 bathrooms, and their prices keep climbing as the bathroom count goes up. As with bedrooms, houses span the whole range of bathroom counts, and their prices rise too, but not as much as new constructions.
Concluding Thoughts
Housing data can be useful for many different purposes and audiences. Individuals who want to compare properties without clicking on each one and scrolling through the details can use data like this to do that quickly. Realtors could also use it to look at competitors’ properties, or their own, to plan strategies for the future.
Other variables would be great to have in this dataset, such as year built, whether there is a garage, pool, or basement, and the last sale price and year. Getting them would take a lot more scraping and creative coding, because those details only appear after clicking into a property listing, which goes to its own unique URL. I am sure that with more time and practice I could do this. Overall, this housing data was very interesting to work with, and I can picture many ways to use similar forms of data.
I would also love to figure out what is preventing the scrape function from collecting more than 9 properties per page. That would give me much more data to analyze and gain insights from.