Assignment 4 HW

Author

KC

Introduction:

Xavier Property Sales

This data set records sales of properties near the Xavier University campus. Each row is a single transaction, and each column gives one detail about that transaction. Xavier University is located in Hamilton County, OH, which has a website where more information about nearby properties can be found.

Information

For more information about the data set used, please visit: http://asayanalytics.com/xu_prop-csv

Data:

The date attribute is given in three separate columns holding the year, month, and day. Combining them with make_date(), for example inside a mutate() call, produces a single date value per transaction, which allows for more useful operations.

#| label: DATE
#| message: false
#| echo: false

library(lubridate)  # provides make_date()
date_vector <- make_date(Xavier_Sales$year, Xavier_Sales$month, Xavier_Sales$day)
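As a minimal sketch of why a single date column is more useful than three separate ones, the example below builds dates from year, month, and day parts and then does date arithmetic on them. The `sales` data frame and its values are hypothetical, and base R's `ISOdate()` and `as.Date()` are used here only so the sketch runs without extra packages; `make_date(year, month, day)` from lubridate should give the same dates.

```r
# Toy transactions (hypothetical values, not taken from the Xavier data)
sales <- data.frame(
  year  = c(2019, 2020, 2021),
  month = c(12, 2, 7),
  day   = c(31, 29, 4)
)

# ISOdate() builds a date-time from the parts; as.Date() keeps only the date
sales$date <- as.Date(ISOdate(sales$year, sales$month, sales$day))

# With a real Date column, differences and ordering work directly
sales$days_since_first <- as.numeric(sales$date - min(sales$date))

print(sales)
```

Once the parts are combined, sorting by sale date or filtering to a date range no longer needs to compare three columns at once.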

Family Size:

For this data set, we assume that houses with more bedrooms house larger families. This assumption is not reasonable for real-world cases: more variables than family size affect the size of a house, and income and location affect it directly. Households with three or more bedrooms are treated as having more than one person living in the house.

#| label: dummy-variable
#| message: false
#| echo: false
# 1 = three or more bedrooms (treated as a family household), 0 = otherwise
Fam_Dew <- ifelse(Xavier_Sales$bedrooms >= 3, 1, 0)
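The dummy-variable rule above can be illustrated on a few made-up bedroom counts. This is only a sketch: the `bedrooms` vector is hypothetical, and `fam_dummy` and `fam_ifelse` are illustrative names, not columns in the data set. It also shows that a missing bedroom count stays missing rather than being silently coded as 0.

```r
# Hypothetical bedroom counts, including one missing value
bedrooms <- c(1, 2, 3, 4, NA)

# A logical comparison converted to integer gives the same 0/1 coding
fam_dummy <- as.integer(bedrooms >= 3)

# The ifelse() form used in the report, for comparison
fam_ifelse <- ifelse(bedrooms >= 3, 1, 0)

# Count how many rows fall in each group, keeping NA visible
print(table(fam_dummy, useNA = "ifany"))
```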

AVG Value - Standard Deviation

Home buyers are looking to get the best value for their dream property. Understanding how a property's value compares with the average value of surrounding properties gives the buyer a baseline offering price.

#| label: discrete-variable
#| message: false
#| echo: false
mean_value <- mean(Xavier_Sales$value, na.rm = TRUE)
std_value <- sd(Xavier_Sales$value, na.rm = TRUE)
# Band each sale by its distance from the mean; the first matching condition wins
value_sd_mean <- case_when(
  is.na(Xavier_Sales$value) ~ "N/A",
  Xavier_Sales$value >= (mean_value + std_value) ~ "SD +1 than Mean",
  Xavier_Sales$value <= (mean_value - std_value) ~ "SD -1 than Mean",
  Xavier_Sales$value >= (mean_value - std_value) & Xavier_Sales$value <= (mean_value + std_value) ~ "SD w/in Mean")
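To make the banding rule concrete, here is a small worked sketch on made-up sale values. The `value` vector and the band labels ("above", "below", "within") are hypothetical and chosen for illustration; the logic mirrors the one-standard-deviation cutoffs above using nested `ifelse()` calls in base R.

```r
# Hypothetical sale values, including one missing value
value <- c(100, 200, 300, 400, 1000, NA)

m <- mean(value, na.rm = TRUE)  # 400
s <- sd(value, na.rm = TRUE)    # about 353.6

# Same cutoffs as above: at least one SD over the mean, at least one SD under, or in between
band <- ifelse(value >= m + s, "above",
        ifelse(value <= m - s, "below", "within"))

print(data.frame(value, band))
```

In this toy sample only the 1000 sale lands in the "above" band, and the missing value stays NA, which shows how a single expensive sale can pull the mean and widen the standard deviation.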

Standard Deviation for Single Fam.

A single-family household is any house that contains fewer than 3 bedrooms. The histogram shows a distribution that is skewed left. The majority of houses have roughly 2,000 sq ft of finished space.

#| label: visualization-sqft-for-single-fam
#| message: false
#| echo: false

# As coded, homes with more than 3 bedrooms are labeled "Single" and all others "Multi"
Xavier_Sales <- Xavier_Sales %>%
  mutate(single_fam = ifelse(bedrooms > 3, "Single", "Multi"))

Xavier_Sales %>%
  group_by(single_fam) %>%
  summarize(count = n())
# A tibble: 2 × 2
  single_fam count
  <chr>      <int>
1 Multi       4306
2 Single      2579
hist(Xavier_Sales$finished_sqft, 
     main = "Histogram of Finished Square Feet", 
     xlab = "Finished Square Feet")
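One quick way to sanity-check the direction of skew that a histogram suggests is to compare the mean with the median and to compute a simple moment-based skewness. The sketch below uses a hypothetical `sqft` vector, not the real data, and the mean-versus-median comparison is a rule of thumb rather than a guarantee.

```r
# Hypothetical finished square footage with one very large house
sqft <- c(900, 1200, 1500, 1800, 2000, 2100, 2200, 6000)

mean_sqft   <- mean(sqft)
median_sqft <- median(sqft)

# Moment-based skewness: positive suggests a long right tail, negative a long left tail
dev <- sqft - mean_sqft
skewness <- mean(dev^3) / mean(dev^2)^(3 / 2)

cat("mean:", mean_sqft, " median:", median_sqft, " skewness:", round(skewness, 2), "\n")
```

Running the same three lines on `Xavier_Sales$finished_sqft` (with `na.rm = TRUE` where needed) would indicate whether the tail of the histogram runs to the left or to the right.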

Visualizations

Ratio of Full Bathrooms to Bedrooms per Neighborhood

#| label: bd-to-br
#| message: false
#| echo: false

# Full bathrooms per bedroom; a row with 0 bedrooms and at least one full bath gives Inf
BD_TO_BR <- (Xavier_Sales$full_bath/Xavier_Sales$bedrooms)
summary(BD_TO_BR)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
   0.00    0.40    0.50     Inf    0.75     Inf 
Xavier_Sales %>%
  ggplot(aes(x = neighborhood, y = BD_TO_BR)) +
  geom_boxplot(aes(color = neighborhood))
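The `Inf` values in the summary above come from dividing by zero bedrooms. A minimal sketch of one way to guard against this is shown below; the `full_bath` and `bedrooms` vectors are hypothetical, `ratio_safe` is an illustrative name, and treating zero-bedroom rows as missing is an assumption rather than the only reasonable choice.

```r
# Hypothetical counts; the third property lists 0 bedrooms
full_bath <- c(1, 2, 1, 2)
bedrooms  <- c(2, 4, 0, 3)

# Plain division: a single zero-bedroom row makes the mean infinite
ratio_raw <- full_bath / bedrooms
print(mean(ratio_raw))

# Guarded division: mark zero-bedroom rows as missing instead
ratio_safe <- ifelse(bedrooms > 0, full_bath / bedrooms, NA_real_)
print(mean(ratio_safe, na.rm = TRUE))
```

With the guarded ratio, the mean and the boxplot scale are no longer dominated by the infinite values.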

Total Value by month for each neighborhood

Gifted Property

A property was given as a gift. To take full advantage of this gift, it is important to look at which neighborhood would be the best option for the gifted property, the best time to resell it, the size of the house on the property, and the age of the house. All of these factors will affect the resale value.

#| label: gifted-pro
#| message: false

Xavier_Sales %>%
  group_by(neighborhood) %>%
  summarize(avg_price = mean(value, na.rm = TRUE)) %>%
  ggplot(aes(x = neighborhood, y = avg_price)) +
  geom_col() +  
  labs(title = "Neighborhood's Average Price",
       x = "Neighborhood",
       y = "Average Price")

Xavier_Sales %>%
  group_by(neighborhood, month, year) %>%
  summarize(avg_price = mean(value, na.rm = TRUE),
            avg_sqft = mean(finished_sqft, na.rm = TRUE),  
            avg_bd = mean(bedrooms, na.rm = TRUE),  
            avg_br = mean(full_bath, na.rm = TRUE)) -> neighborhood_month_year


# Age here is the number of years elapsed since the sale year (year() comes from lubridate)
CURRENT_YEAR <- year(Sys.Date())
Xavier_Sales$Age <- CURRENT_YEAR - Xavier_Sales$year


Xavier_Sales %>%
  group_by(Age) %>%
  summarize(avg_price = mean(value, na.rm = TRUE)) -> avg_price_by_age


hist(Xavier_Sales$Age, main = "Histogram of Property Age", 
     xlab = "Age (Years)",
     col = "lightblue")

Xavier_Sales %>%
  group_by(month) %>%
  summarize(avg_price = mean(value, na.rm = TRUE)) -> avg_price_by_month


ggplot(avg_price_by_month, aes(x = factor(month), y = avg_price)) +  # Ensure 'month' is treated as a factor
  geom_point() +
  labs(title = "Average Price by Month",
       x = "Month",
       y = "Average Price")

Ownership Controversy in Cincinnati

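# Purchasers whose name contains "LLC" are treated as companies, all others as individuals
Xavier_Sales$owner <- ifelse(grepl("LLC", Xavier_Sales$purchaser), "Company", "Indy")

# Number of years between the sale year and 2025
Xavier_Sales$past_4_years <- 2025 - Xavier_Sales$year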


Xavier_Sales %>%
  group_by(owner) %>%
  summarize(count = n()) %>%
  ggplot(aes(x = owner, y = count)) +
  geom_col() +
  labs(title = "Company vs Indy",
       x = "Owner",
       y = "Frequency")
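The company-versus-individual split relies on finding "LLC" in the purchaser name. The sketch below, using made-up purchaser names, shows how a plain substring match behaves and how a case-insensitive whole-word pattern might differ. The names and the stricter pattern are assumptions for illustration only, not part of the original analysis.

```r
# Hypothetical purchaser names
purchaser <- c("ACME HOLDINGS LLC", "SMITH JOHN",
               "Main St Properties Llc", "WILLCO TRUST")

# Plain substring match: case-sensitive, and also hits "LLC" inside "WILLCO"
plain <- grepl("LLC", purchaser)

# Whole-word, case-insensitive match (\\b marks a word boundary)
strict <- grepl("\\bLLC\\b", purchaser, ignore.case = TRUE, perl = TRUE)

print(data.frame(purchaser, plain, strict))
```

Comparing the two columns on the real purchaser names would show how many rows change category under the stricter rule.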