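#| label: DATE
#| message: false
#| echo: false
library(tidyverse)  # dplyr verbs and ggplot2, used throughout
library(lubridate)  # make_date() and year()
date_vector <- make_date(Xavier_Sales$year, Xavier_Sales$month, Xavier_Sales$day)
Assignment 4 HW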
Introduction:
Xavier Property Sales
This data set records sales of properties near the Xavier University campus. Each row is a single transaction, and each column gives one detail of that transaction. Xavier University is located in Hamilton County, OH, which has a website where more information about nearby properties can be found.
Information
For more information about the data set used, please visit: http://asayanalytics.com/xu_prop-csv
Data:
The date attribute is stored in three separate columns: year, month, and day. Combining them into a single date column, for example with make_date() inside mutate(), allows more useful operations.
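As a minimal sketch of that step, the example below builds a date column on a small made-up table. The table `toy_sales` and the column name `sale_date` are assumptions for illustration only; they are not part of the Xavier data.

```r
library(dplyr)
library(lubridate)

# Toy stand-in for Xavier_Sales; the real data has many more columns.
toy_sales <- tibble(
  year  = c(2019, 2020, 2021),
  month = c(3, 11, 7),
  day   = c(15, 2, 30)
)

# make_date() builds a Date from numeric year, month, and day parts.
toy_sales <- toy_sales %>%
  mutate(sale_date = make_date(year, month, day))

toy_sales$sale_date
```

Keeping the date inside the data frame, rather than in a separate vector, lets later steps such as group_by() or filter() use it directly.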
Family Size:
For this data set, houses with more bedrooms are assumed to house larger families. This assumption is not realistic for real-world cases: more variables than family size affect the size of a house. Income and location, for example, directly affect it. Households with three or more bedrooms will be considered to have more than one person living in the house.
#| label: dummy-variable
#| message: false
#| echo: false
# Dummy variable: 1 if the house has three or more bedrooms, 0 otherwise
Fam_Dew <- ifelse(Xavier_Sales$bedrooms >= 3, 1, 0)
AVG Value - Standard Deviation
Home buyers are looking to get the best value for their dream property. Understanding how a property's value compares with the average value of surrounding properties gives the buyer a baseline offering price.
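One common way to express that comparison is a z-score, the number of standard deviations a value sits from the mean. The sketch below uses made-up values (`toy_value` is hypothetical, not taken from the data set) to show the arithmetic and a simple banding around one standard deviation.

```r
# Hypothetical property values in dollars; not taken from Xavier_Sales.
toy_value <- c(100000, 150000, 200000, 250000, 300000)

# z-score: how many standard deviations each value sits from the mean.
z <- (toy_value - mean(toy_value)) / sd(toy_value)

# Band each value relative to one standard deviation around the mean.
band <- ifelse(z >= 1, "above +1 SD",
        ifelse(z <= -1, "below -1 SD", "within 1 SD"))

data.frame(toy_value, z = round(z, 2), band)
```

With these five values only the lowest and highest fall outside one standard deviation of the mean.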
#| label: discrete-variable
#| message: false
#| echo: false
mean_value <- mean(Xavier_Sales$value, na.rm = TRUE)
std_value <- sd(Xavier_Sales$value, na.rm = TRUE)
# Band each value by its distance from the mean; case_when() uses the first matching condition
value_sd_mean <- case_when(
  is.na(Xavier_Sales$value) ~ "N/A",
  Xavier_Sales$value >= (mean_value + std_value) ~ "SD +1 than Mean",
  Xavier_Sales$value <= (mean_value - std_value) ~ "SD -1 than Mean",
  Xavier_Sales$value >= (mean_value - std_value) & Xavier_Sales$value <= (mean_value + std_value) ~ "SD w/in Mean"
)
Standard Deviation for Single Fam.
A single-family household is any house that contains fewer than 3 bedrooms. The histogram shows a left-skewed distribution. The majority of houses have roughly 2,000 sq ft of finished space.
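#| label: Visualization-SQFT-for-single-fam
#| message: false
#| echo: false
# Note: this labels houses with more than 3 bedrooms as "Single", which differs from the definition in the text
Xavier_Sales <- Xavier_Sales %>%
  mutate(single_fam = ifelse(bedrooms > 3, "Single", "Multi"))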
Xavier_Sales %>%
  group_by(single_fam) %>%
  summarize(count = n())
# A tibble: 2 × 2
  single_fam count
  <chr>      <int>
1 Multi       4306
2 Single      2579
hist(Xavier_Sales$finished_sqft,
     main = "Histogram of Finished Square Feet",
     xlab = "Finished Square Feet")
Visualizations
Ratio of Bedrooms to Full Bathrooms per Neighborhood
#| label: BD-to-BR
#| message: false
#| echo: false
# Full bathrooms per bedroom; rows with zero bedrooms produce Inf
BD_TO_BR <- (Xavier_Sales$full_bath / Xavier_Sales$bedrooms)
summary(BD_TO_BR)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
   0.00    0.40    0.50     Inf    0.75     Inf
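The Inf mean in this summary comes from dividing by zero bedrooms. One possible way to handle it, sketched here on made-up rows (`toy`, `raw_ratio`, and `safe_ratio` are hypothetical names), is to treat zero-bedroom properties as missing before averaging. Whether those rows should be dropped or kept is a judgment call for the analysis.

```r
# Hypothetical rows; the second one has zero bedrooms.
toy <- data.frame(full_bath = c(1, 1, 2), bedrooms = c(2, 0, 4))

# Plain division yields Inf for the zero-bedroom row.
raw_ratio <- toy$full_bath / toy$bedrooms

# Treat zero-bedroom rows as missing so they do not dominate the mean.
safe_ratio <- ifelse(toy$bedrooms > 0, toy$full_bath / toy$bedrooms, NA_real_)

mean(raw_ratio)                  # Inf
mean(safe_ratio, na.rm = TRUE)   # 0.5
```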
Xavier_Sales %>%
  ggplot(aes(x = neighborhood, y = BD_TO_BR)) +
  geom_boxplot(aes(color = neighborhood))
Total Value by month for each neighborhood
Gifted Property
Suppose a property is given as a gift. To take full advantage of this gift, it is important to look at which neighborhood would be the best option for the gifted property, when the best time to resell it would be, the size of the house on the property, and the age of the house. All of these factors affect the resale value.
#| label: gifted-pro
#| message: false
# Average value per neighborhood
Xavier_Sales %>%
  group_by(neighborhood) %>%
  summarize(avg_price = mean(value, na.rm = TRUE)) %>%
  ggplot(aes(x = neighborhood, y = avg_price)) +
  geom_col() +
  labs(title = "Neighborhood's Average Price",
       x = "Neighborhood",
       y = "Average Price")
# Averages per neighborhood, month, and year
Xavier_Sales %>%
  group_by(neighborhood, month, year) %>%
  summarize(avg_price = mean(value, na.rm = TRUE),
            avg_sqft = mean(finished_sqft, na.rm = TRUE),
            avg_bd = mean(bedrooms, na.rm = TRUE),
            avg_br = mean(full_bath, na.rm = TRUE)) -> neighborhood_month_year
CURRENT_YEAR <- year(Sys.Date())
# Age here is the number of years since the sale year in the data
Xavier_Sales$Age <- CURRENT_YEAR - Xavier_Sales$year
Xavier_Sales %>%
  group_by(Age) %>%
  summarize(avg_price = mean(value, na.rm = TRUE)) -> avg_price_by_age
hist(Xavier_Sales$Age, main = "Histogram of Property Age",
     xlab = "Age (Years)",
     col = "lightblue")
Xavier_Sales %>%
  group_by(month) %>%
  summarize(avg_price = mean(value, na.rm = TRUE)) -> avg_price_by_month
ggplot(avg_price_by_month, aes(x = factor(month), y = avg_price)) + # Ensure 'month' is treated as a factor
  geom_point() +
  labs(title = "Average Price by Month",
       x = "Month",
       y = "Average Price")
Ownership Controversy in Cincinnati
# Purchasers whose name contains "LLC" are treated as companies, all others as individuals
Xavier_Sales$owner <- ifelse(grepl("LLC", Xavier_Sales$purchaser), "Company", "Indy")
# Years elapsed between the sale year and 2025
Xavier_Sales$past_4_years <- 2025 - Xavier_Sales$year
Xavier_Sales %>%
  group_by(owner) %>%
  summarize(count = n()) %>%
  ggplot(aes(x = owner, y = count)) +
  geom_bar(stat = "identity") +
  labs(title = "Company vs Indy",
       x = "Owner",
       y = "Frequency")
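The company flag depends on how "LLC" is matched. As a hedged sketch on made-up names (the vector `toy_purchaser` is hypothetical and not taken from the data), the example below compares the plain substring match with a whole-word, case-insensitive match, which may or may not suit the real purchaser column.

```r
# Hypothetical purchaser names, not taken from the data set.
toy_purchaser <- c("SMITH JOHN", "Norwood Homes LLC",
                   "acme holdings llc", "ALLCOTT MARY")

# Case-sensitive substring match: also matches the letters inside "ALLCOTT".
substring_match <- grepl("LLC", toy_purchaser)

# Whole-word, case-insensitive match using word boundaries.
word_match <- grepl("\\bLLC\\b", toy_purchaser, ignore.case = TRUE, perl = TRUE)

data.frame(toy_purchaser, substring_match, word_match)
```

The substring match misses the lowercase "llc" and flags "ALLCOTT MARY", while the whole-word match does neither.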