The Covid-19 pandemic affected not only people but also many economic policies, which in turn affected the prices of residential properties in the Cincinnati neighborhoods adjacent to Xavier’s campus. Residents have been expressing concern about housing affordability in the city. In this analysis, I investigate residential property sales to better understand the market and help address these interests and concerns.
# A tibble: 1 × 19
parcel_id purchaser cps norwood_schools street_address unit_id street_name
<chr> <chr> <lgl> <lgl> <dbl> <chr> <chr>
1 218-0060-0… CRAIG RI… TRUE FALSE 430 <NA> WEST CLIFF…
# ℹ 12 more variables: use <dbl>, yr_blt <dbl>, day <dbl>, month <dbl>,
# year <dbl>, value <dbl>, neighborhood <chr>, total_rooms <dbl>,
# bedrooms <dbl>, full_bath <dbl>, half_bath <dbl>, finished_sqft <dbl>
Data Error
While examining the data, I noticed that the value column has 1,995 missing values. This may indicate that the sale price was not publicly disclosed for many transactions. Some entries also look questionable. One house has over 10,000 sq ft but fewer than 15 rooms, which doesn’t seem realistic. Another house was recorded with 0 sq ft, which I changed to NA.
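The checks described above can be sketched on a small made-up table. This is only an illustration: `toy_sales` and its values are hypothetical stand-ins for the real data, and `na_if()` is used here as one possible way to turn a recorded 0 into NA without relying on a row number.

```r
library(dplyr)

# Hypothetical toy data standing in for the real sales table (values are made up).
toy_sales <- tibble(
  parcel_id     = c("A", "B", "C", "D"),
  value         = c(150000, NA, 320000, NA),
  finished_sqft = c(1500, 0, 12000, 2100),
  total_rooms   = c(6, 5, 9, 8)
)

# Count how many sale values are missing.
n_missing_value <- sum(is.na(toy_sales$value))

# Treat a recorded size of 0 sq ft as missing, wherever it occurs.
toy_clean <- toy_sales %>%
  mutate(finished_sqft = na_if(finished_sqft, 0))

# Flag very large homes with comparatively few rooms for manual review.
suspicious <- toy_clean %>%
  filter(finished_sqft > 10000, total_rooms < 15)
```

Replacing by condition rather than by position keeps the cleanup valid even if the rows are later reordered or filtered.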
Simple Trends and Analysis
Looking at the histogram, finished square footage is not quite normally distributed: most homes sit at the low end, with a long tail toward larger homes, so the distribution is right-skewed. Many houses are around 1,500 square feet. Looking at the bathroom-to-bedroom ratio graph, Norwood had one of the highest ratios of full bathrooms to bedrooms, probably because much of its housing is for students and many apartments come with their own bathroom. In the graph of total home transaction value, the totals are noticeably higher during the summer, which suggests that more people buy houses then.
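One rough way to check the direction of skew seen in a histogram is to compare the mean with the median: a long tail toward large values usually pulls the mean above the median. The sketch below uses made-up square footages, not the real data, and the comparison is a rule of thumb rather than a formal test.

```r
# Hypothetical square footages with a long tail toward large homes.
sqft <- c(900, 1100, 1300, 1500, 1500, 1600, 1800, 2400, 4200, 7500)

mean_sqft   <- mean(sqft)    # 2380
median_sqft <- median(sqft)  # 1550

# Mean above median hints at a right (positive) skew.
is_right_skewed <- mean_sqft > median_sqft
```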
Directed Analysis
To decide which neighborhood I would want to live in, I grouped the sales by neighborhood and compared median transaction values. Based on this, Mount Adams and Hyde Park are both great locations. Furthermore, based on square footage and number of bedrooms, many houses in Cincinnati have many bedrooms and quite a bit of square footage. Many of the houses sold are also over 100 years old. It would be great to buy either a house that is around 100 years old or one that is just a couple of years old. The best time of year to sell would be July, which has the highest total transaction value for houses.
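The grouping described above can be sketched as follows. The table is a made-up stand-in (`toy_sales` and its values are hypothetical), so the ranking it produces says nothing about the real neighborhoods. It only shows one way to compute a median value per neighborhood and a total value per month.

```r
library(dplyr)

# Hypothetical toy data (values are made up for illustration).
toy_sales <- tibble(
  neighborhood = c("Hyde Park", "Hyde Park", "Norwood",
                   "Norwood", "Mount Adams", "Mount Adams"),
  month        = c(7, 3, 7, 1, 7, 11),
  value        = c(400000, 500000, 150000, NA, 600000, 550000)
)

# Median transaction value per neighborhood, highest first.
median_by_neighborhood <- toy_sales %>%
  group_by(neighborhood) %>%
  summarize(
    median_value = median(value, na.rm = TRUE),
    n_sales      = sum(!is.na(value)),
    .groups      = "drop"
  ) %>%
  arrange(desc(median_value))

# Total transaction value per month, highest first.
total_by_month <- toy_sales %>%
  group_by(month) %>%
  summarize(total_value = sum(value, na.rm = TRUE), .groups = "drop") %>%
  arrange(desc(total_value))
```

Keeping the count of sales next to each median helps judge whether a neighborhood's ranking rests on only a handful of transactions.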
Source Code
---
title: "Analysis of Covid-19 and the Impact on Housing" # Name of your HTML output
author: "Nathaly Munnicha" # Author name
toc: true # Generates an automatic table of contents.
format: # Options related to formatting.
  html: # Options related to HTML output.
    code-tools: TRUE # Allow the code tools option showing in the output.
    embed-resources: TRUE # Embeds all components into a single HTML file.
execute: # Options related to the execution of code chunks.
  warning: FALSE # FALSE: Code chunk warnings are hidden by default.
  message: FALSE # FALSE: Code chunk messages are hidden by default.
  echo: FALSE # FALSE: Code is hidden in the output by default.
---

## Introduction

The Covid-19 pandemic affected not only people but also many economic policies, which in turn affected the prices of residential properties in the Cincinnati neighborhoods adjacent to Xavier's campus. Residents have been expressing concern about housing affordability in the city. In this analysis, I investigate residential property sales to better understand the market and help address these interests and concerns.

```{r}
#| label: introduction
library(tidyverse) # For all the tidy things
library(skimr)     # For better summary statistics
library(lubridate) # For the dates
```

```{r}
#| label: data-preparation-data-errors
property_sales <- read_csv("http://asayanalytics.com/xu_prop-csv")

# Impossible dates (e.g. February 30 or April 31) are set to NA.
# Note: this rule also treats February 29 as invalid.
# The result is assigned back so that the change is actually kept.
property_sales <- property_sales %>%
  mutate(day = ifelse(
    (month == 2 & day > 28) | (month %in% c(4, 6, 9, 11) & day > 30),
    NA, day
  ))

# Looking at the summary of the data
skim(property_sales)

# Looking at extreme high-end outliers to see whether they are even possible
property_sales %>%
  filter(finished_sqft > 10000 & total_rooms < 15)

# Changing the finished square feet to NA instead of 0
# (row 3052, column 19 = finished_sqft)
property_sales[3052, 19] <- NA
```

## Data Error

While examining the data, I noticed that the value column has 1,995 missing values. This may indicate that the sale price was not publicly disclosed for many transactions.

Some entries also look questionable. One house has over 10,000 sq ft but fewer than 15 rooms, which doesn't seem realistic. Another house was recorded with 0 sq ft, which I changed to NA.

```{r}
#| label: data-preparation-variable-creation
# Fully functional date vector
property_sales <- property_sales %>%
  mutate(date = dmy(paste(day, month, year, sep = "-")))

# Deleting day, month, year columns
property_sales <- property_sales %>%
  select(-day, -month, -year)

# Multi-family or not (stored as a logical rather than as text)
property_sales <- property_sales %>%
  mutate(multi_dwell = use %in% c(401, 402, 403))

# Named value_sd / value_mean to avoid masking the sd() and mean() functions
value_sd   <- sd(property_sales$value, na.rm = TRUE)
value_mean <- mean(property_sales$value, na.rm = TRUE)

# Within 1 SD of the mean (absolute distance, so both sides are covered)
property_sales <- property_sales %>%
  mutate(mean_SD = ifelse(abs(value - value_mean) <= value_sd,
                          "Within 1 SD", "Outside 1 SD"))

# More than 1 SD above the mean
property_sales <- property_sales %>%
  mutate(SD_abovemean = ifelse(value - value_mean > value_sd,
                               "More than 1 SD above mean",
                               "Not more than 1 SD above mean"))

# More than 1 SD below the mean
property_sales <- property_sales %>%
  mutate(SD_belowmean = ifelse(value < value_mean - value_sd,
                               "More than 1 SD below mean",
                               "Not more than 1 SD below mean"))

# Value is missing
property_sales <- property_sales %>%
  mutate(value_missing = ifelse(is.na(value), "Missing", "Not Missing"))
```

```{r}
#| label: simple-trends-and-analysis
# Distribution of single-family dwellings by finished square feet
property_sales %>%
  filter(use == 510) %>%
  ggplot(aes(x = finished_sqft)) +
  geom_histogram(bins = 50) +
  labs(title = "Distribution of Finished Square Feet",
       x = "Finished Square Feet")

# Ratio of full bathrooms to bedrooms
# (homes with 0 bedrooms get NA instead of an infinite ratio)
property_sales <- property_sales %>%
  mutate(ratio_bath_rooms = if_else(bedrooms > 0, full_bath / bedrooms, NA_real_))

property_sales %>%
  group_by(neighborhood) %>%
  summarize(avg_ratio_bath_bed = mean(ratio_bath_rooms, na.rm = TRUE)) %>%
  ggplot(aes(x = neighborhood, y = avg_ratio_bath_bed)) +
  geom_col() + # a geom is needed, otherwise the plot is drawn empty
  labs(title = "Average Bathroom-to-Bedroom Ratio by Neighborhood",
       x = "Neighborhood",
       y = "Bathroom-to-Bedroom Ratio")

# Total value of home transactions
# x is the month, y is the total transaction value, colored by neighborhood
property_sales <- property_sales %>%
  mutate(month = month(date))

monthly_sales <- property_sales %>%
  group_by(neighborhood, month) %>%
  summarize(total_sum_sales = sum(value, na.rm = TRUE), .groups = "drop")

ggplot(monthly_sales, aes(x = month, y = total_sum_sales, color = neighborhood)) +
  geom_point() +
  scale_y_continuous(labels = scales::comma) +
  labs(title = "Total Value of Home Transactions by Month & Neighborhood",
       x = "Month",
       y = "Total Transaction Value")
```

## Simple Trends and Analysis

Looking at the histogram, finished square footage is not quite normally distributed: most homes sit at the low end, with a long tail toward larger homes, so the distribution is right-skewed. Many houses are around 1,500 square feet. Looking at the bathroom-to-bedroom ratio graph, Norwood had one of the highest ratios of full bathrooms to bedrooms, probably because much of its housing is for students and many apartments come with their own bathroom. In the graph of total home transaction value, the totals are noticeably higher during the summer, which suggests that more people buy houses then.

```{r}
#| label: directed-analysis
# Neighborhood I would want to be located in
property_sales %>%
  ggplot(aes(x = neighborhood, y = value)) +
  geom_boxplot() +
  scale_y_continuous(labels = scales::comma) +
  labs(title = "Transaction Value of Houses by Neighborhood",
       x = "Neighborhoods",
       y = "Transaction Value")

# What features (size, rooms, bedrooms) would you want it to have
property_sales %>%
  ggplot(aes(x = finished_sqft, y = total_rooms)) +
  geom_point() +
  labs(title = "Home Size vs. Number of Rooms",
       x = "Finished Square Footage",
       y = "Total Number of Rooms")

property_sales %>%
  ggplot(aes(x = as.factor(bedrooms), y = finished_sqft)) +
  geom_boxplot() +
  labs(title = "Distribution of Home Sizes by Number of Bedrooms",
       x = "Number of Bedrooms",
       y = "Finished Square Footage")

# How old do you want it to be (age is measured from today's date)
property_sales <- property_sales %>%
  mutate(home_age = year(Sys.Date()) - yr_blt)

property_sales %>%
  ggplot(aes(x = home_age, y = value)) +
  geom_point() +
  scale_y_continuous(labels = scales::comma) +
  labs(title = "Home Age vs. Transaction Value",
       x = "Home Age",
       y = "Transaction Value")

# What time of year or day of the week would you want to sell
monthly_sales <- property_sales %>%
  group_by(month) %>%
  summarize(total_sales = sum(value, na.rm = TRUE))

ggplot(monthly_sales, aes(x = month, y = total_sales)) +
  geom_point() +
  scale_y_continuous(labels = scales::comma) +
  labs(title = "Total Home Sales Value by Month",
       x = "Month",
       y = "Total Transaction Value")
```

## Directed Analysis

To decide which neighborhood I would want to live in, I grouped the sales by neighborhood and compared median transaction values. Based on this, Mount Adams and Hyde Park are both great locations. Furthermore, based on square footage and number of bedrooms, many houses in Cincinnati have many bedrooms and quite a bit of square footage. Many of the houses sold are also over 100 years old. It would be great to buy either a house that is around 100 years old or one that is just a couple of years old. The best time of year to sell would be July, which has the highest total transaction value for houses.

```{r}
#| label: self-directed-analysis
library(stringr)

# Purchasers whose name contains "LLC" are treated as corporations
property_sales <- property_sales %>%
  mutate(buyer_type = ifelse(str_detect(purchaser, "LLC"),
                             "corporation", "individual"))

property_sales %>%
  filter(year(date) >= 2017) %>%
  mutate(year = year(date)) %>%
  group_by(year, buyer_type) %>%
  summarize(avg_price = mean(value, na.rm = TRUE), .groups = "drop") %>%
  ggplot(aes(x = year, y = avg_price, color = buyer_type)) +
  geom_line() +
  scale_y_continuous(labels = scales::comma) +
  labs(title = "Average Property Sale Price by Ownership Type (2017-2021)",
       x = "Year",
       y = "Average Sale Price")
```