#4 Assignment

Author

Hanh Nguyen

Introduction

  • Housing around university campuses often has distinctive characteristics. This document examines the trend in house prices around the Xavier University campus based on variables such as school district, neighborhood, number of rooms, bedrooms, and bathrooms, property size, and transaction value. Based on this analysis, we can estimate the value of a property with similar traits.

Data Preparation:

2.1 Load in Data

2.2 Data preparation

Data Errors:

There are some suspicious records in the data set:

  • For some properties, the year built is later than the year sold (for example, sold in 2019 but built in 2020). This could be because the buyers purchased only the land and then built a property on it, or because they rebuilt the property after buying it.

  • Ten properties have a transaction value of $0. This could be caused by inherited property, donations, etc.
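
Both checks are simple row filters. The following is a minimal sketch written in Python for illustration (the report's tibble output suggests the original work was done in R); the `Sale` class, the `flag_suspicious` function, and the sample rows are hypothetical, and only the field names `parcel_id`, `yr_blt`, `value`, and `date` come from the data set.

```python
import datetime
from dataclasses import dataclass


@dataclass
class Sale:
    # Hypothetical record type; field names mirror the tibble columns
    # parcel_id, yr_blt, value and date.
    parcel_id: str
    yr_blt: int
    value: float
    date: datetime.date


def flag_suspicious(sales):
    """Return the parcel IDs of (a) properties built after the sale year and
    (b) transactions recorded at $0."""
    built_after_sale = [s.parcel_id for s in sales if s.yr_blt > s.date.year]
    zero_value = [s.parcel_id for s in sales if s.value == 0]
    return built_after_sale, zero_value


# Made-up example rows, not taken from the real data set.
sales = [
    Sale("A", 2020, 350000, datetime.date(2019, 6, 1)),  # built after sale
    Sale("B", 1925, 0, datetime.date(2020, 3, 15)),      # $0 transaction
    Sale("C", 1990, 210000, datetime.date(2021, 8, 2)),  # no issue
]
print(flag_suspicious(sales))  # (['A'], ['B'])
```

Flagging rather than deleting keeps these rows available, which matters later when negative house ages are interpreted as land purchases.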

Variable Creation

Data-Dwelling

# A tibble: 6,885 × 19
   parcel_id  purchaser cps   norwood_schools street_address unit_id street_name
   <chr>      <chr>     <lgl> <lgl>                    <dbl> <chr>   <chr>      
 1 068-0002-… LAURENT … TRUE  FALSE                      713 #1      E MCMILLAN…
 2 041-0002-… BRABES I… TRUE  FALSE                     3443 #1      SHAW AVE   
 3 053-0001-… MITCHELL… TRUE  FALSE                     2324 #1809   MADISON RD 
 4 053-0001-… OKADA KE… TRUE  FALSE                     2324 #1810   MADISON RD 
 5 055-0004-… HUBER MA… TRUE  FALSE                     1720 #2      DEXTER AVE 
 6 086-0001-… TDPDX LLC TRUE  FALSE                      536 #2      LIBERTY HI…
 7 041-0002-… 3443 SHA… TRUE  FALSE                     3443 #2      SHAW AVE   
 8 068-0003-… WYATT DA… TRUE  FALSE                      719 #3      E MCMILLAN…
 9 086-0001-… RUDY AND… TRUE  FALSE                      534 #3      LIBERTY HI…
10 086-0001-… SCHINDEW… TRUE  FALSE                      534 #4      LIBERTY HI…
# ℹ 6,875 more rows
# ℹ 12 more variables: use <dbl>, yr_blt <dbl>, value <dbl>,
#   neighborhood <chr>, total_rooms <dbl>, bedrooms <dbl>, full_bath <dbl>,
#   half_bath <dbl>, finished_sqft <dbl>, date <date>, dwelling <lgl>,
#   value_category <chr>

Directed Analysis

Price vs Neighborhood:

According to the visualization, properties in Mount Adams have the highest average transaction value. Therefore, if I wanted to make an investment, I would consider buying property in Mount Adams.
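
The ranking behind this visualization is a group-by-and-average. Here is a minimal Python sketch, assuming each sale is reduced to a `(neighborhood, value)` pair; the function name and the values are made up for illustration, and "Neighborhood B" and "Neighborhood C" are placeholders.

```python
from collections import defaultdict
from statistics import mean


def mean_value_by_neighborhood(sales):
    """Group (neighborhood, value) pairs and return {neighborhood: mean value}."""
    groups = defaultdict(list)
    for neighborhood, value in sales:
        groups[neighborhood].append(value)
    return {n: mean(vals) for n, vals in groups.items()}


# Made-up values for illustration only.
sales = [
    ("Mount Adams", 600000), ("Mount Adams", 500000),
    ("Neighborhood B", 150000), ("Neighborhood B", 170000),
    ("Neighborhood C", 220000),
]
averages = mean_value_by_neighborhood(sales)
best = max(averages, key=averages.get)  # neighborhood with the highest average
print(best, averages[best])  # Mount Adams 550000
```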

Features (size, bedrooms, bathrooms):

To decide which property features to target, I made several visualizations to test my assumptions.

First, I filtered the data to properties in Mount Adams, since the neighborhood has high property values and mostly does not contain apartments. Including apartment buildings would produce records such as a single property with 14 rooms, which does not make sense for this comparison.

According to the scatter plot, houses with 4 bedrooms and a size of about 5,500 square feet usually sell at higher prices. My hypothesis is that the target customers are families with children, since they need more spacious houses than people without dependents. Families are also usually more comfortable purchasing a large house. Therefore, if I were investing, I would invest in houses with similar qualities (3-4 bedrooms, about 4,500-5,000 square feet).
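
The scatter plot itself is not reproduced here, but a numeric complement to it is easy to sketch: filter to one neighborhood, then average the sale value for each bedroom count. This Python sketch is illustrative only; the row layout `(neighborhood, bedrooms, finished_sqft, value)`, the function name, and all values are hypothetical.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical rows: (neighborhood, bedrooms, finished_sqft, value)
sales = [
    ("Mount Adams", 4, 5500, 900000),
    ("Mount Adams", 4, 4800, 800000),
    ("Mount Adams", 2, 1500, 350000),
    ("Neighborhood B", 14, 12000, 1200000),  # e.g. an apartment building
]


def mean_value_by_bedrooms(rows, neighborhood):
    """Keep one neighborhood, then average the sale value per bedroom count."""
    groups = defaultdict(list)
    for hood, bedrooms, _sqft, value in rows:
        if hood == neighborhood:  # the neighborhood filter described above
            groups[bedrooms].append(value)
    return {b: mean(v) for b, v in sorted(groups.items())}


print(mean_value_by_bedrooms(sales, "Mount Adams"))  # {2: 350000, 4: 850000}
```

Filtering first means the 14-bedroom row never enters the averages, which is the point of restricting the analysis to Mount Adams.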

How old?

Preparation before generating the visualization:

  • Calculate the age of each house at the time it was sold by subtracting the year it was built from the year of the transaction. There may be negative values, but this is expected, since people could buy the land first and build a house on it, which is more likely when a corporation is the purchaser.

  • Categorize the ages into 4 groups: young (younger than 50 years), fair (50-100), a bit old (100-150), and old (older than 150).

  • Calculate the average transaction value within each group.

  • Generate a bar plot with the age group on the x-axis and the average transaction value on the y-axis.
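
The first three steps can be sketched in Python as follows (plotting omitted). The group boundaries in the list above overlap at 50, 100, and 150, so this sketch assumes half-open intervals, where an age of exactly 50 counts as "fair". The function names and sample rows are hypothetical.

```python
from collections import defaultdict
from statistics import mean


def age_group(age):
    """Bucket an age into the four groups.
    Assumption: half-open intervals, so exactly 50 counts as 'fair'.
    Negative ages (built after the sale) land in 'young'."""
    if age < 50:
        return "young"
    if age < 100:
        return "fair"
    if age < 150:
        return "a bit old"
    return "old"


def mean_value_by_age_group(rows):
    """rows: (sale_year, yr_blt, value) tuples -> {group: mean value}."""
    groups = defaultdict(list)
    for sale_year, yr_blt, value in rows:
        age = sale_year - yr_blt  # age of the house when it was sold
        groups[age_group(age)].append(value)
    return {g: mean(v) for g, v in groups.items()}


# Made-up rows for illustration only.
rows = [(2019, 2020, 400000), (2020, 1990, 300000), (2020, 1950, 250000),
        (2021, 1900, 200000), (2022, 1850, 150000)]
print(mean_value_by_age_group(rows))
```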

Conclusion:

If I were an investor, I would buy a young property, since young properties have a higher value. They tend to have more up-to-date styles and architecture that suit most buyers' tastes. Moreover, young properties are usually better kept than older ones, which may require some refurbishment before people can live in them, lowering their value.

Time to sell

I only took into account transaction values from 2020 (the most recent year), because the trend would then be most representative of recent conditions.

I made two visualizations to show how property values changed throughout 2020. The first shows in which months the most properties were purchased. The second shows the mean value of properties purchased each month in 2020.

The first visualization shows that the late fall and winter months (October through December) have the highest number of purchases. However, the second visualization shows that property values in those months are also the lowest. This might be the result of affordability: in winter there are usually fewer buyers, which lowers prices and attracts purchasers who are looking for an affordable house. Overall, however, average property values are highest in late spring and summer, so that is the best time to sell houses.
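
Both visualizations rest on two per-month aggregates for a single year: a count of sales and a mean sale value. A minimal Python sketch, assuming each sale is a `(sale_date, value)` pair; the function name and sample rows are hypothetical.

```python
import datetime
from collections import Counter, defaultdict
from statistics import mean


def monthly_summary(rows, year=2020):
    """rows: (sale_date, value) pairs.
    Returns ({month: number of sales}, {month: mean value}) for one year."""
    counts = Counter()
    values = defaultdict(list)
    for sale_date, value in rows:
        if sale_date.year == year:  # keep only the chosen year
            counts[sale_date.month] += 1
            values[sale_date.month].append(value)
    means = {m: mean(v) for m, v in sorted(values.items())}
    return dict(sorted(counts.items())), means


# Made-up rows for illustration only.
rows = [
    (datetime.date(2020, 6, 10), 320000),
    (datetime.date(2020, 11, 2), 180000),
    (datetime.date(2020, 11, 20), 200000),
    (datetime.date(2019, 11, 5), 500000),  # excluded: not in 2020
]
counts, means = monthly_summary(rows)
busiest = max(counts, key=counts.get)  # month with the most purchases
```

In this toy data, November has the most sales but the lower mean value, the same pattern described above.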

Self-directed Analysis

My hypothesis is that corporations buying properties and renting them out increases the average value of properties overall.

  • To test this assumption, I made two visualizations of average property values by neighborhood, one for 2018-2019 and one for 2021-2022, to compare values across the two time intervals. 2021-2022 is closer to the present, and in this period the trend of investment firms purchasing properties had already appeared and was growing. In 2018-2019, by contrast, the phenomenon was still uncommon.

  • The number of observations in the 2018-2019 and 2021-2022 samples is roughly equal (2021-2022 has about 200 fewer observations than 2018-2019). Therefore, I am not worried that the difference in sample size would significantly skew the calculated means.
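
The period comparison can be sketched as assigning each sale to a period, counting observations per period, and averaging values per neighborhood within each period. This Python sketch assumes rows of `(sale_year, neighborhood, value)`. The `PERIODS` mapping and the function names are hypothetical, and sales from 2020 fall outside both windows and are skipped.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical period definitions matching the two windows compared above.
PERIODS = {"2018-2019": (2018, 2019), "2021-2022": (2021, 2022)}


def period_of(year):
    """Return the period label for a year, or None if it is outside both."""
    for label, (start, end) in PERIODS.items():
        if start <= year <= end:
            return label
    return None


def summarize_periods(rows):
    """rows: (sale_year, neighborhood, value).
    Returns ({period: n_observations}, {period: {neighborhood: mean value}})."""
    counts = defaultdict(int)
    groups = defaultdict(lambda: defaultdict(list))
    for year, hood, value in rows:
        label = period_of(year)
        if label is None:
            continue  # e.g. 2020 sales are not part of this comparison
        counts[label] += 1
        groups[label][hood].append(value)
    means = {p: {h: mean(v) for h, v in hoods.items()} for p, hoods in groups.items()}
    return dict(counts), means


# Made-up rows for illustration only.
rows = [(2018, "Mount Adams", 480000), (2019, "Mount Adams", 520000),
        (2020, "Mount Adams", 999999), (2021, "Mount Adams", 540000),
        (2022, "Mount Adams", 560000), (2019, "Neighborhood B", 150000),
        (2022, "Neighborhood B", 210000)]
counts, means = summarize_periods(rows)
```

Comparing `counts` for the two periods is the sample-size check described in the second bullet.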

Conclusion:

  • In the left graph (2018-2019), only 4 neighborhoods have an average property value over $200,000, while in the right graph (2021-2022) more than 5 neighborhoods have an average value greater than $200,000.

  • In 2018-2019, the highest average property value is about $500,000, while in 2021-2022 the highest value rose to about $550,000.

  • Therefore, I conclude that the rise in property purchases by investment firms increases property values.
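
These comparisons reduce to simple arithmetic. For example, a rise in the highest average from about $500,000 to about $550,000 is roughly a 10% increase. A minimal Python sketch of both checks follows; the function names and per-neighborhood averages are hypothetical, and the sketch does not reproduce the real figures.

```python
def neighborhoods_above(means, threshold=200_000):
    """Count neighborhoods whose average value is strictly above the threshold."""
    return sum(1 for v in means.values() if v > threshold)


def pct_change(old, new):
    """Percentage change from old to new."""
    return (new - old) * 100 / old


# Hypothetical per-neighborhood averages for the two periods.
means_2018_2019 = {"A": 500_000, "B": 250_000, "C": 180_000}
means_2021_2022 = {"A": 550_000, "B": 260_000, "C": 205_000}

print(neighborhoods_above(means_2018_2019), neighborhoods_above(means_2021_2022))  # 2 3
print(pct_change(max(means_2018_2019.values()), max(means_2021_2022.values())))  # 10.0
```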