# A tibble: 6,885 × 19
parcel_id purchaser cps norwood_schools street_address unit_id street_name
<chr> <chr> <lgl> <lgl> <dbl> <chr> <chr>
1 068-0002-… LAURENT … TRUE FALSE 713 #1 E MCMILLAN…
2 041-0002-… BRABES I… TRUE FALSE 3443 #1 SHAW AVE
3 053-0001-… MITCHELL… TRUE FALSE 2324 #1809 MADISON RD
4 053-0001-… OKADA KE… TRUE FALSE 2324 #1810 MADISON RD
5 055-0004-… HUBER MA… TRUE FALSE 1720 #2 DEXTER AVE
6 086-0001-… TDPDX LLC TRUE FALSE 536 #2 LIBERTY HI…
7 041-0002-… 3443 SHA… TRUE FALSE 3443 #2 SHAW AVE
8 068-0003-… WYATT DA… TRUE FALSE 719 #3 E MCMILLAN…
9 086-0001-… RUDY AND… TRUE FALSE 534 #3 LIBERTY HI…
10 086-0001-… SCHINDEW… TRUE FALSE 534 #4 LIBERTY HI…
# ℹ 6,875 more rows
# ℹ 12 more variables: use <dbl>, yr_blt <dbl>, value <dbl>,
# neighborhood <chr>, total_rooms <dbl>, bedrooms <dbl>, full_bath <dbl>,
# half_bath <dbl>, finished_sqft <dbl>, date <date>, dwelling <lgl>,
# value_category <chr>
#4 Assignment
Introduction
- Housing around university campuses usually has distinctive characteristics. This document examines house prices around the Xavier University campus using variables such as school district, neighborhood, number of rooms, bedrooms, and bathrooms, property size, and transaction value. Based on the analysis, we can estimate the value of a property with similar traits.
Data Preparation:
2.1 Load in Data
2.2 Data preparation
Data Errors:
The data set contains some suspicious records:
For some properties, the year built is later than the year sold (for example, sold in 2019 but built in 2020). This could be because the buyers purchased only the land and then built on it, or because they rebuilt the property after buying it.
There are 10 properties with a transaction value of $0. This could be caused by inherited property, donations, etc.
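The two checks above can be sketched in a few lines. This is only an illustration in plain Python, not the code used for this report: the column names `yr_blt`, `value`, and `date` come from the data preview, while the three sample rows and the helper `flag_suspicious` are made up for the example.

```python
from datetime import date

# Hypothetical sample rows; only the column names come from the data set.
sales = [
    {"parcel_id": "A", "yr_blt": 2020, "value": 250000, "date": date(2019, 6, 1)},
    {"parcel_id": "B", "yr_blt": 1925, "value": 0, "date": date(2021, 3, 15)},
    {"parcel_id": "C", "yr_blt": 1950, "value": 180000, "date": date(2020, 9, 30)},
]


def flag_suspicious(row):
    """Return the reasons a sale looks suspicious (empty list if none)."""
    reasons = []
    if row["yr_blt"] > row["date"].year:
        reasons.append("built after sale")
    if row["value"] == 0:
        reasons.append("zero transaction value")
    return reasons


flags = {row["parcel_id"]: flag_suspicious(row) for row in sales}
for parcel, reasons in flags.items():
    print(parcel, reasons)
```

Flagging rows rather than dropping them keeps the decision about how to treat each case (keep, exclude, or investigate) separate from the detection step.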
Variable Creation
Data-Dwelling
Simple Trends & Analysis
3.1
According to the boxplot, most properties marked as single-family dwellings are between 1,000 and 3,000 square feet, although there are some outliers ranging from 5,000 to 9,000 square feet. Judging from the visualization, finished square footage (SQFT) is not normally distributed but skewed to the right.
3.2
Before computing the ratio, I expected the ratio of full bathrooms to bedrooms in each neighborhood to be less than or equal to 1, since it would be unusual for a home with 2 bedrooms to have 3 or more full bathrooms.
# A tibble: 9 × 4
neighborhood bed.sum bath.sum ratio
<chr> <dbl> <dbl> <dbl>
1 Avondale 2119 1070 0.505
2 Clifton 1775 1049 0.591
3 Evanston 2275 1133 0.498
4 Hyde Park 4931 3217 0.652
5 Mount Adams 654 535 0.818
6 Mount Auburn 1457 892 0.612
7 N Avondale 1334 742 0.556
8 Norwood 6469 3373 0.521
9 Walnut Hills 1471 1006 0.684
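As a quick check, the `ratio` column can be reproduced from the bedroom and bathroom totals in the table above. The snippet below is a small sketch in plain Python rather than the code that produced the table; the variable names are illustrative.

```python
# (bedroom total, full-bathroom total) per neighborhood, copied from the table.
totals = {
    "Avondale": (2119, 1070),
    "Clifton": (1775, 1049),
    "Evanston": (2275, 1133),
    "Hyde Park": (4931, 3217),
    "Mount Adams": (654, 535),
    "Mount Auburn": (1457, 892),
    "N Avondale": (1334, 742),
    "Norwood": (6469, 3373),
    "Walnut Hills": (1471, 1006),
}

# ratio = total full bathrooms / total bedrooms, rounded to 3 decimals as in the table
ratios = {name: round(bath / bed, 3) for name, (bed, bath) in totals.items()}

# Print neighborhoods from the highest ratio to the lowest.
for name, ratio in sorted(ratios.items(), key=lambda item: item[1], reverse=True):
    print(f"{name:<13} {ratio:.3f}")

# True when no neighborhood exceeds one full bathroom per bedroom.
all_at_most_one = all(ratio <= 1 for ratio in ratios.values())
```

Note that this is a ratio of neighborhood totals, so it confirms the expectation at the neighborhood level only; individual homes could still have more full bathrooms than bedrooms.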
3.3
According to the visualization, Hyde Park and Norwood have noticeably higher total transaction values than the other neighborhoods. At the very bottom is the pink line for Walnut Hills. The total transaction value of each neighborhood does not show a clear pattern in any specific month, so it does not exhibit seasonality.
However, when the transaction values were summed by neighborhood and month, the year was not taken into account, so each monthly total pools transactions from different years.
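The difference between the two groupings can be illustrated with a small sketch. It is written in plain Python with made-up sales (only the column names `neighborhood`, `date`, and `value` come from the data set) and is not the code behind the plot.

```python
from collections import defaultdict
from datetime import date

# Hypothetical sales; only the column names mirror the data set.
sales = [
    {"neighborhood": "Norwood", "date": date(2019, 5, 10), "value": 150000},
    {"neighborhood": "Norwood", "date": date(2020, 5, 22), "value": 170000},
    {"neighborhood": "Norwood", "date": date(2020, 11, 3), "value": 160000},
]

by_month = defaultdict(int)       # pools the same month across all years
by_year_month = defaultdict(int)  # keeps each year separate

for sale in sales:
    d = sale["date"]
    by_month[(sale["neighborhood"], d.month)] += sale["value"]
    by_year_month[(sale["neighborhood"], d.year, d.month)] += sale["value"]

print(dict(by_month))
print(dict(by_year_month))
```

In the first grouping the two May sales from different years are added together, while the second grouping keeps them apart, which is what a year-aware seasonality check would need.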
Directed Analysis
Price vs Neighborhood:
According to the visualization, properties in Mount Adams have the highest average transaction value. Therefore, if I wanted to make an investment, I would consider buying property in Mount Adams.
Features: (size, bedrooms, bathrooms)
To decide which features of the property I want, I made some visualizations to support my assumptions.
First, I filtered to properties in Mount Adams only, since it has high property values and mostly lacks apartment buildings (which would add properties with as many as 14 rooms, which would not make sense for this comparison).
According to the scatter plot, houses with 4 bedrooms and about 5,500 square feet usually sell for higher prices. My hypothesis is that the target customers are families with children, since they need a more spacious house than people without dependents. Families are also usually more comfortable purchasing a large house. Therefore, if I were investing, I would invest in a house with similar qualities (3-4 bedrooms, about 4,500-5,000 square feet).
How old?
Preparation before generating visualization:
Calculate the age of the house at the time it was sold by taking the difference between the year of the transaction and the year the property was built. There might be negative values, but this is expected, since people could buy the land first and build houses on it, which is more likely when a corporation is the purchaser.
Categorize the ages into 4 groups: young (younger than 50 years), fair (50-100), a bit old (100-150), and old (greater than 150).
Calculate the average transaction value within each group.
Generate a bar plot with the age group on the x-axis and the average transaction value on the y-axis.
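The first three steps can be sketched as follows. This is a minimal illustration in plain Python with made-up sales, not the code used for the report. The helper `age_group` is hypothetical, and it assumes that an age of exactly 100 belongs to "a bit old", since the group definitions do not say which group owns that boundary.

```python
from collections import defaultdict
from datetime import date
from statistics import mean

# Hypothetical sales; only the column names come from the data set.
sales = [
    {"yr_blt": 2005, "date": date(2020, 4, 1), "value": 400000},
    {"yr_blt": 1995, "date": date(2021, 7, 1), "value": 300000},
    {"yr_blt": 1940, "date": date(2019, 2, 1), "value": 220000},
    {"yr_blt": 1900, "date": date(2020, 8, 1), "value": 180000},
    {"yr_blt": 1860, "date": date(2022, 1, 1), "value": 150000},
]


def age_group(age):
    """Map an age in years to one of the four groups."""
    if age < 50:
        return "young"      # also catches negative ages (built after sale)
    if age < 100:
        return "fair"
    if age <= 150:
        return "a bit old"  # assumption: exactly 100 falls here, not in "fair"
    return "old"


values_by_group = defaultdict(list)
for sale in sales:
    age = sale["date"].year - sale["yr_blt"]  # age at the time of sale
    values_by_group[age_group(age)].append(sale["value"])

# Average transaction value within each age group.
avg_value = {group: mean(values) for group, values in values_by_group.items()}
print(avg_value)
```

The resulting `avg_value` mapping holds one number per age group, which is the input the bar plot needs.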
Conclusion:
If I were an investor, I would buy a young property, since young properties have a higher value. Young properties have a more up-to-date style and architecture that suits most buyers’ tastes. Moreover, young properties are generally better kept than older properties, which may require refurbishment before people can live in them and therefore have a lower value.
Time to sell
I only took into account transaction values from 2020 (the most recent year), because the trend would then be most accurate for the present.
I made 2 visualizations to show the trend in property values throughout 2020. The first shows in which month the most properties were purchased. The second shows the mean value of properties purchased in each month of 2020.
In the first visualization, winter (months 10-12) is the season with the most property purchases. However, in the second visualization, property values in winter are also the lowest. This might be the result of affordability: in winter there are usually fewer buyers, which lowers prices, which in turn attracts purchasers looking for an affordable house. Overall, the average value of properties is highest in late spring and summer. Therefore, late spring and summer is the best time to sell a house.
Self-directed Analysis
My hypothesis is that corporations buying properties and releasing them for rent increases the average value of properties overall.
To support this assumption, I made 2 visualizations of average values, one for 2018-2019 and one for 2021-2022, to see how values changed between the 2 time intervals. 2021-2022 is closer to the present, and in this time frame the trend of investment firms purchasing properties had already appeared and was growing. In 2018-2019, on the other hand, the phenomenon was still not very common.
The number of observations in the 2018-2019 and 2021-2022 samples is roughly equal (2021-2022 has about 200 fewer observations than 2018-2019). Therefore, I am not concerned that the difference in sample size would significantly skew the calculated means.
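The comparison can be sketched as below, in plain Python with made-up sales rather than the code used for the report. The names `PERIODS` and `period_of` are hypothetical; the two year ranges are the ones described above.

```python
from collections import defaultdict
from datetime import date
from statistics import mean

# Hypothetical sales; only the column names come from the data set.
sales = [
    {"neighborhood": "Hyde Park", "date": date(2018, 3, 1), "value": 300000},
    {"neighborhood": "Hyde Park", "date": date(2019, 9, 1), "value": 340000},
    {"neighborhood": "Hyde Park", "date": date(2021, 5, 1), "value": 380000},
    {"neighborhood": "Evanston", "date": date(2019, 6, 1), "value": 120000},
    {"neighborhood": "Evanston", "date": date(2022, 2, 1), "value": 150000},
    {"neighborhood": "Evanston", "date": date(2020, 7, 1), "value": 140000},
]

# The two comparison windows, as inclusive (first year, last year) pairs.
PERIODS = {"2018-2019": (2018, 2019), "2021-2022": (2021, 2022)}


def period_of(year):
    """Return the label of the window containing `year`, or None."""
    for label, (start, end) in PERIODS.items():
        if start <= year <= end:
            return label
    return None


grouped = defaultdict(list)
for sale in sales:
    label = period_of(sale["date"].year)
    if label is not None:  # sales from 2020 fall outside both windows
        grouped[(label, sale["neighborhood"])].append(sale["value"])

avg_value = {key: mean(values) for key, values in grouped.items()}
counts = {key: len(values) for key, values in grouped.items()}
print(avg_value)
print(counts)
```

Keeping `counts` next to `avg_value` makes it easy to check how many observations stand behind each mean in the two periods.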
Conclusion:
In the left graph, only 4 neighborhoods have an average property value over $200,000, while in the right graph, more than 5 neighborhoods have an average value greater than $200,000.
In 2018-2019, the highest average property value is about $500,000, while in 2021-2022 the highest value rose to about $550,000.
Therefore, I would conclude that the rise in property purchases by investment firms increases the value of properties.