Ames, the focus of this project, is a small city in the United States, located in Story County, central Iowa.
This city is best known for Iowa State University, which is home to over 33,000 students. Given that the population of Ames is just over 66,000, roughly half of its residents are students. This suggests that there may be considerable demand for accommodation near the university.
Interestingly, in 2010, Ames was ninth on CNNMoney’s “Best Places to Live” list.
Our database contains data on the characteristics and prices of houses sold in the city of Ames, Iowa, between 2006 and 2010. It includes 2,930 observations and initially 83 variables.
This database was used some time ago in a Kaggle competition in which the task was to build a model predicting the sale prices of houses. A detailed description of the variables (types and definitions) can be found here.
In this visualization project, we have two main aims:
Let us first look at the distribution of sold houses on the map of Ames. Looking at the map, one may notice that Iowa State University is located in the central part of the city, and that the largest number of house sales in the years considered took place in its vicinity. Another important area is near the airport, although far fewer transactions took place there. In the eastern part of the town, on the other hand, only a few houses were sold over the five years (between 2006 and 2010).
When we zoom in slightly on the map, we can see the specific location of every house sold. The points representing the houses are marked with different colors to distinguish the districts in which they are located. In total, there are as many as 28 districts in our database. However, because some of them saw very few transactions, some charts are limited to the 11 most important ones.
Hovering the cursor over a house interactively displays the district name, the price and the year of construction of that house. It is worth noting at this point that the districts of Northridge Heights, Gilbert and Stone Brook are located in the northern part of the city. Old Town is just to the right of the university on the map.
As for the number of transactions in each district (the plot below), North Ames had by far the most. As already mentioned, and as can now be seen, there were relatively few sales in some districts. Importantly, the two districts of the highest quality, Stone Brook and Northridge Heights, are both located in the northern part of the city.
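The per-district counts, and the restriction of some charts to the most active districts, can be sketched as follows. This is only a minimal illustration in Python: the records are invented toy data, the helper name `top_districts` is hypothetical, and the cut-off of 3 stands in for the 11 districts used in the project.

```python
from collections import Counter

# Hypothetical toy records: one district label per sale (not real data).
sales_districts = [
    "North Ames", "North Ames", "North Ames", "North Ames",
    "Old Town", "Old Town", "Old Town",
    "Gilbert", "Gilbert",
    "Stone Brook",
]

def top_districts(districts, n):
    """Return the n districts with the most sales, most frequent first."""
    counts = Counter(districts)
    return [name for name, _ in counts.most_common(n)]

# Number of transactions per district.
counts = Counter(sales_districts)

# Keep only the sales that fall in the n most active districts.
kept = top_districts(sales_districts, 3)
filtered = [d for d in sales_districts if d in kept]
```

Filtering this way keeps sparsely populated districts from cluttering per-district charts, at the cost of hiding them entirely.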
Let us now move to the distribution of the variable most interesting for our analysis, i.e. price. The map above presents its distribution, with color indicating the price level. It is not difficult to notice that the highest prices were paid for houses located in the aforementioned northern districts. It is surprising, however, that prices were lowest in the immediate vicinity of the university and in the Old Town.
The plot above shows the distribution of sale prices in Ames as a histogram. We can clearly see that the distribution is right-skewed, so the median price lies closer to the left edge of the graph than it would under a normal distribution. There are also a handful of outliers, represented by nearly invisible bars on the right side of the plot. The vast majority of the sales are below $300,000. The two most numerous groups fall in the $100,000 - $200,000 interval and together make up more than half of all sales.
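A quick numerical counterpart to reading skew off a histogram is to compare the mean with the median and to compute the share of sales in a price interval. The sketch below uses invented prices, and the helper `share_in_range` is hypothetical. Note that "mean above median" is only a common rule of thumb for right skew, not a formal test.

```python
from statistics import mean, median

# Hypothetical sale prices in dollars, chosen only to illustrate a right skew.
prices = [95_000, 120_000, 135_000, 150_000, 160_000,
          175_000, 190_000, 240_000, 320_000, 615_000]

def share_in_range(values, low, high):
    """Fraction of values v with low <= v < high."""
    return sum(low <= v < high for v in values) / len(values)

avg = mean(prices)
mid = median(prices)

# A long right tail pulls the mean above the median.
is_right_skewed = avg > mid

# Share of sales in the $100,000 - $200,000 interval.
core_share = share_in_range(prices, 100_000, 200_000)
```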
In the boxplot above, we can observe the differences in the price distribution among the districts. As could be seen on the previous map, Northridge Heights (on the north-west side) stands out as the most expensive district. Apart from having the highest median and almost no outliers, Northridge Heights also seems to have the widest range of values, as represented by the boxplot’s whiskers. Interestingly, the cheapest of the analyzed districts is the Old Town. This contrasts with the European standard, where Old Towns tend to be the historic, atmospheric centre of the city, and thus their prices are usually among the highest. However, Ames is a relatively young, small city, and it seems that the Old Town there is indeed the oldest district, but not old enough to have become prestigious. It may be cheaper due to obsolete plumbing, poor-quality finishing, or other age-related factors. The remaining districts transition quite smoothly from one to another, without any major jumps in price distribution. It is also worth noting that Sawyer West has not a single outlier.
Next, let us look at how the median price in each district changed over the 2006-2010 period.
Northridge Heights occupied the top spot throughout the whole period, while Old Town occupied the bottom. Northridge Heights again stands out in terms of price, being the only district with any shade of blue. In fact, there were just a few changes in position, mainly among spots 4-6. Gilbert and Northwest Ames were overtaken by Sawyer West in 2007, only for it to return to 6th place two years later. All in all, however, the volatility of the median prices is rather low and we do not observe many swaps.
It would then seem that prices are relatively constant over the years. Let us take a closer look at this issue.
We can observe that the yearly median sale price fluctuates closely around the overall median price (which is $160,000). The largest deviations were a mere $5,000, which shows that the year of sale does not affect the median price significantly. The number of houses sold each year (represented by bars) is also stable, except for the last year. That is mainly because the dataset covers only half of 2010.
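The comparison of yearly medians against the overall median can be sketched as below. The (year, price) pairs are invented for illustration and the helper `median_by_year` is a hypothetical name, so the deviations it produces say nothing about the real dataset.

```python
from collections import defaultdict
from statistics import median

# Hypothetical (year_sold, sale_price) pairs, not taken from the dataset.
sales = [
    (2006, 150_000), (2006, 165_000), (2006, 170_000),
    (2007, 155_000), (2007, 160_000), (2007, 180_000),
    (2008, 140_000), (2008, 158_000), (2008, 175_000),
]

def median_by_year(records):
    """Group prices by year of sale and return the median for each year."""
    by_year = defaultdict(list)
    for year, price in records:
        by_year[year].append(price)
    return {year: median(p) for year, p in sorted(by_year.items())}

overall = median(price for _, price in sales)
yearly = median_by_year(sales)

# Signed distance of each yearly median from the overall median.
deviation = {year: m - overall for year, m in yearly.items()}
```

The same grouping works at a monthly resolution by keying on a (year, month) tuple instead of the year alone.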
One might wonder whether this still holds at a finer time resolution. The plot below breaks the previous plot down further by month.
Now we can observe that a monthly pattern in the number of houses sold recurs each year. Far more houses are sold in the summer months than in any other season (holidays probably contribute to this as well). The fewest are sold during the winter.
Even though the last plot showed the stability of the median price over the years, the monthly median is much more volatile, especially in the first year of observations. The monthly medians do not seem to follow any consistent trend, though. As mentioned before, the data for 2010 do not cover the whole year; nevertheless, we can see a decrease in the number of houses sold in the summer in comparison to previous years. Whether this is due to incomplete data or some other exogenous factor, we do not know. The fact that a July bar is present on the plot may suggest that the data for June are complete.
Now we move on to another variable that is very important for price: the year the house was built. As presented on the map below, the vast majority of new houses are located on the western and northern outskirts of the city. The oldest houses are near the university and, unsurprisingly, mainly in the Old Town district.
In the context of the age of the building, it is crucial whether it has been renovated. For this reason, the plot below shows how the price of a house varies with its construction year and with whether it has been renovated since then.
In the top chart, one can notice that all the houses built up to 1950 have since been renovated. However, for construction years after 1950, the share of renovated houses is usually less than half.
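The share of renovated houses by construction period can be computed along these lines. This sketch rests on two assumptions that are not stated in the text: that a house counts as renovated when its remodel year is later than its construction year, and that construction years are grouped into decades for brevity. The records and the helper `renovated_share_by_decade` are invented for illustration.

```python
from collections import defaultdict

# Hypothetical (year_built, year_remodelled) pairs. Assumption: the remodel
# year equals the construction year when the house was never renovated.
houses = [
    (1925, 1950), (1940, 1995), (1948, 2003),
    (1965, 1965), (1972, 1972), (1978, 2001),
    (1999, 1999), (2004, 2004), (2005, 2006),
]

def renovated_share_by_decade(records):
    """Return {decade: share of houses built in that decade and later renovated}."""
    totals = defaultdict(int)
    renovated = defaultdict(int)
    for built, remodelled in records:
        decade = built // 10 * 10
        totals[decade] += 1
        if remodelled > built:
            renovated[decade] += 1
    return {d: renovated[d] / totals[d] for d in sorted(totals)}

shares = renovated_share_by_decade(houses)
```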
But does renovation affect the price? The chart on the right shows that the price distributions of renovated and non-renovated houses are quite similar. Moreover, in the small chart attached to it, we can see that the relative share of this binary variable at the different price levels is 50:50 almost everywhere.
Finally, we combine the information from those distributions and analyze the central chart. It indicates that the dependence of price on the year built does not seem to be linear. For relatively young houses, the price seems to increase significantly with the year of construction, while for the oldest buildings it flattens out.
Another important variable is living area. First, note that for almost every district, the distribution of this variable is somewhat right-skewed (the median is smaller than the mean). The largest houses (in terms of living space) were clearly sold in Northridge Heights. Interestingly, a very large variation in living area can be observed among houses sold in Old Town. The span between the 1st and 9th deciles is over 1,300 square feet there, whereas in the Gilbert district, for example, this span is almost half as large. One can also interactively check the values for other districts.
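The span between the 1st and 9th deciles can be computed as sketched below. The living areas and the district names "District A" and "District B" are invented placeholders, and `decile_span` is a hypothetical helper. The exact decile values also depend on the interpolation method, which here is the default of `statistics.quantiles`.

```python
from statistics import quantiles

# Hypothetical living areas in square feet for two districts (not real data).
living_area = {
    "District A": [800, 950, 1100, 1250, 1400, 1600, 1850, 2100, 2400, 2900],
    "District B": [1300, 1350, 1400, 1450, 1500, 1550, 1600, 1650, 1700, 1800],
}

def decile_span(values):
    """Distance between the 1st and 9th deciles of the values."""
    cuts = quantiles(values, n=10)  # nine cut points: deciles 1 through 9
    return cuts[-1] - cuts[0]

spans = {name: decile_span(v) for name, v in living_area.items()}
```

Unlike the full range, the decile span ignores the most extreme 10% of houses on each side, so a single unusually large or small house does not dominate the comparison between districts.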
In the graph above, we present kernel density estimates of the sale price for different levels of the overall quality of the house. The three or four lowest levels are located almost entirely below $100,000. Each subsequent level lies further right on the x-axis (the whole distributions move towards higher prices). The “average” level has a taller distribution than those surrounding it (except “poor”), perhaps because people label a house as “average” if they think it is “good enough”. Then we can see a clear tendency for the distributions to flatten and to shift by larger distances than before. The “very excellent” category has a really flat distribution, which means that houses of this quality can be found in many price ranges (its fat tails also mean more houses at the extreme prices), but they are still, on average, the most expensive.
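For readers unfamiliar with kernel density estimates, the sketch below shows one common variant, a Gaussian kernel with a fixed bandwidth, written out by hand. It is not necessarily the estimator used for the plot. The prices, the quality labels used as keys, the bandwidth of 25 and the function name `gaussian_kde` are all illustrative assumptions.

```python
import math

def gaussian_kde(samples, bandwidth):
    """Return a function estimating the density of `samples` at a point x."""
    n = len(samples)
    norm = 1.0 / (n * bandwidth * math.sqrt(2.0 * math.pi))

    def density(x):
        # Sum of Gaussian bumps, one centred on each observation.
        return norm * sum(
            math.exp(-0.5 * ((x - s) / bandwidth) ** 2) for s in samples
        )

    return density

# Hypothetical sale prices (in thousands of dollars) for two quality levels.
prices_by_quality = {
    "average": [120, 130, 135, 140, 150],
    "excellent": [250, 320, 400, 480, 600],
}

# The bandwidth of 25 is an arbitrary illustrative choice.
densities = {
    q: gaussian_kde(p, bandwidth=25.0) for q, p in prices_by_quality.items()
}

# Location and height of each curve's peak on a coarse price grid.
grid = range(0, 801, 5)
peak_at = {q: max(grid, key=f) for q, f in densities.items()}
peak_height = {q: f(peak_at[q]) for q, f in densities.items()}
```

In this toy example the tightly clustered "average" prices give a tall, narrow curve, while the widely spread "excellent" prices give a flat curve shifted to the right, which is the pattern described above.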
The graph above is similar in idea but shows the levels of a different variable: the overall condition of the house. Here the differences between levels are not that significant and follow a very weak trend at best. The distribution of houses in “poor” condition lies clearly to the left of that of houses in “excellent” condition, but from “average” upwards we cannot really tell the distributions apart. The only notable difference is the large right tail of the “excellent” condition, and the same is true of “average”. There is also a spike of “poor” condition houses in the $350,000 - $400,000 segment. Thus, we can conclude that if a house is rather expensive, it will most probably be in excellent, average, or poor condition.
The purpose of this heatmap is to show the overwhelming dominance of the single-family detached house type in the Ames market. It makes up more than 80% of the whole dataset. The second most popular type of housing is the townhouse end unit; however, it accounts for a mere fraction of the main type.
The interactive plot above lets us explore the connections between the type of material used for the foundation of the house and the overall condition of the house. What stands out is that almost all houses with poured concrete foundations are in “average” condition. Cinder block makes up a good part of every condition level, and it accounts for more than half of the “poor” and “above average” categories. It is hard to identify a “superior” foundation material, although brick and tile might be one: it has more houses in the top levels than in “average” and “poor”, which cannot be said of the other materials. Its low total number may be due to this exclusiveness or simply to its being an uncommon practice among builders in the US.
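The relationship between two categorical variables such as these is usually summarized as a cross-tabulation, optionally normalized to shares within each row. The sketch below shows the idea on invented (foundation, condition) pairs. The helper names `crosstab` and `row_shares` are hypothetical, and the counts do not reflect the real data.

```python
from collections import Counter, defaultdict

# Hypothetical (foundation, condition) pairs, invented for illustration.
records = [
    ("poured concrete", "average"), ("poured concrete", "average"),
    ("poured concrete", "average"), ("poured concrete", "good"),
    ("cinder block", "poor"), ("cinder block", "poor"),
    ("cinder block", "average"), ("cinder block", "good"),
    ("brick and tile", "poor"), ("brick and tile", "good"),
]

def crosstab(pairs):
    """Count how often each (row, column) combination occurs."""
    table = defaultdict(Counter)
    for row, col in pairs:
        table[row][col] += 1
    return table

def row_shares(table):
    """Convert counts to shares within each row (each row sums to 1)."""
    return {
        row: {col: n / sum(cols.values()) for col, n in cols.items()}
        for row, cols in table.items()
    }

table = crosstab(records)
shares = row_shares(table)
```

Normalizing by row answers "how are houses with a given foundation distributed across conditions?", whereas normalizing by column would answer "which foundations make up a given condition level?". The two readings can differ considerably when one material dominates the dataset.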
The city analyzed in this project is quite a special one: relatively small, yet home to a state university. Despite that, the results for most of the analyzed aspects were reasonable rather than ground-breaking. There were of course some unexpected findings, such as the Old Town being the cheapest of the 11 districts taken into account. Nevertheless, the visualizations proved insightful in studying the housing market of the city of Ames. We have learned about many dependencies, connections and distributions, and such knowledge may prove invaluable to anyone interested in getting to know the specifics of Ames’ housing market. The emphasis in this work was mainly on prices and districts, but we also took a look at other, less obvious variables and their impact on those two. The results of our work are informative on their own, but should one want to extend the study further, into modelling or predicting the prices, this work provides a solid background and starting point for such research.