In this project, I evaluate the txhousing dataset, which contains information about housing prices in Texas.
txhousing is a data frame with 8602 observations and 9 variables:
city: Name of multiple listing service (MLS) area
year, month, date: Year, month, and date of the observation
sales: Number of sales
volume: Total value of sales
median: Median sale price
listings: Total active listings
inventory: Months inventory (the amount of time it would take to sell all current listings at the current pace of sales)

Create a new data frame, txh, that contains all of the variables in txhousing, plus three more that you will create:
The txhousing dataset includes the median sale price of all sales in a city in a given month, but it does not include the mean sale price. Create a new variable, mean_price, that is the mean sale price of all sales in a city in a given month, calculated from the total volume and the number of sales.
The median sale price in a given month will generally be different from mean_price. Create a new variable, price_dif, that is the difference between the two (mean_price - median).
Create a new variable, sales_prop, that calculates the proportion of listings that resulted in sales in a given month.

library(tidyverse)  # loads dplyr and ggplot2; ggplot2 provides txhousing

txh <- txhousing %>%
  mutate(mean_price = volume / sales,        # mean sale price per city-month
         price_dif = mean_price - median,    # gap between mean and median price
         sales_prop = sales / listings)      # share of listings that sold
txh
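
To make the arithmetic behind the three new variables concrete, here is a minimal sketch on a hand-made data frame. The object names toy and toy_out and all of the numbers are invented for illustration; they are not taken from txhousing.

```r
library(dplyr)

# Hypothetical toy data: two city-months with made-up numbers
toy <- tibble(
  city     = c("A", "B"),
  sales    = c(10, 20),
  volume   = c(2000000, 3000000),
  median   = c(180000, 140000),
  listings = c(40, 16)
)

toy_out <- toy %>%
  mutate(mean_price = volume / sales,        # 200000 and 150000
         price_dif  = mean_price - median,   # 20000 and 10000
         sales_prop = sales / listings)      # 0.25 and 1.25
toy_out
```

In the second row, sales exceed listings, so sales_prop comes out above one.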
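
Are there any cases in which sales_prop is greater than one? List them.

txh %>%
  filter(sales_prop > 1) %>%                            # months with more sales than listings
  select(city, year, month, sales_prop, everything())   # move the key columns to the front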
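
As a possible follow-up, the qualifying rows can be counted per city to see where they concentrate. This is only a sketch: it recomputes sales_prop from txhousing so that it runs on its own, and the name over_one is made up for this example. Note that filter() drops rows where sales_prop is NA, and that a zero in listings would give Inf, which also passes the filter.

```r
library(dplyr)
library(ggplot2)  # provides the txhousing data

over_one <- txhousing %>%
  mutate(sales_prop = sales / listings) %>%
  filter(sales_prop > 1) %>%     # rows with NA in sales_prop are dropped here
  count(city, sort = TRUE)       # number of qualifying months per city
over_one
```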

Find the total number of sales, the total volume of sales, and the number of cities in this dataset.

txh %>%
  summarise(sales_total = sum(sales, na.rm = TRUE),     # total number of sales
            volume_total = sum(volume, na.rm = TRUE),   # total value of sales
            cities_num = n_distinct(city))              # number of distinct cities
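
The na.rm = TRUE argument matters because a single missing value otherwise makes the whole summary NA. A small illustration with a made-up vector (x and the result names are hypothetical):

```r
# Made-up values with one missing entry
x <- c(100, NA, 250)

total_raw <- sum(x)                 # NA: the missing value propagates
total_ok  <- sum(x, na.rm = TRUE)   # 350: NA is dropped before summing
mean_ok   <- mean(x, na.rm = TRUE)  # 175: mean of the two remaining values
```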

Find the mean number of sales per month, the median of the median price per month, and the median of the mean_price each month.

txh %>%
  summarise(sales_avg = mean(sales, na.rm = TRUE),            # mean monthly sales
            median_med = median(median, na.rm = TRUE),        # median of the median price
            median_mean = median(mean_price, na.rm = TRUE))   # median of the mean price

For each city, find the median price_dif and list them in descending order of their magnitude.

txh %>%
  group_by(city) %>%
  summarise(med_price_dif = median(price_dif, na.rm = TRUE)) %>%
  arrange(desc(med_price_dif))   # largest median difference first
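
If "magnitude" is read as absolute value, so that a large negative difference ranks alongside a large positive one, the sort key can be wrapped in abs(). This is one possible interpretation, not necessarily the intended one. The sketch recomputes price_dif from txhousing so that it runs on its own; by_magnitude is a made-up name.

```r
library(dplyr)
library(ggplot2)  # provides the txhousing data

by_magnitude <- txhousing %>%
  mutate(price_dif = volume / sales - median) %>%
  group_by(city) %>%
  summarise(med_price_dif = median(price_dif, na.rm = TRUE)) %>%
  arrange(desc(abs(med_price_dif)))   # sort by size, ignoring sign
by_magnitude
```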

For each city and each year, find the mean number of monthly sales and the median of the price variables: median, mean_price, price_dif.

txh6 <- txh %>%
  group_by(city, year) %>%
  summarise(sales_avg = mean(sales, na.rm = TRUE),            # mean monthly sales
            median_med = median(median, na.rm = TRUE),        # median of the median price
            median_mean = median(mean_price, na.rm = TRUE),   # median of the mean price
            median_dif = median(price_dif, na.rm = TRUE))     # median of the price difference
txh6

Plot the mean sales per year for each city, using a different color line for each city.

txh6 %>%
  ggplot() +
  geom_line(mapping = aes(x = year, y = sales_avg, color = city))
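
With many cities on a single plot, the lines and legend can become hard to read. One option is to plot only a handful of cities. The sketch below assumes the four names in cities_shown appear in the city column; the selection itself is arbitrary, and cities_shown and p are made-up names.

```r
library(dplyr)
library(ggplot2)  # provides the txhousing data

# Hypothetical selection: any names present in txhousing$city would do
cities_shown <- c("Austin", "Dallas", "Houston", "San Antonio")

p <- txhousing %>%
  filter(city %in% cities_shown) %>%
  group_by(city, year) %>%
  summarise(sales_avg = mean(sales, na.rm = TRUE), .groups = "drop") %>%
  ggplot(aes(x = year, y = sales_avg, color = city)) +
  geom_line() +
  labs(x = "Year", y = "Mean monthly sales", color = "City")
p
```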

Make a boxplot of the median price_dif for each city.

txh6 %>%
  ggplot() +
  geom_boxplot(mapping = aes(x = median_dif, y = city))
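
By default the cities are ordered alphabetically on the axis. A possible refinement is to order them by their typical median_dif with reorder(), which makes the ranking visible at a glance. The sketch rebuilds the yearly summary from txhousing so that it runs on its own; yearly_dif and p_box are made-up names.

```r
library(dplyr)
library(ggplot2)  # provides the txhousing data

yearly_dif <- txhousing %>%
  mutate(price_dif = volume / sales - median) %>%
  group_by(city, year) %>%
  summarise(median_dif = median(price_dif, na.rm = TRUE), .groups = "drop") %>%
  filter(!is.na(median_dif))   # drop city-years with no usable data

p_box <- yearly_dif %>%
  ggplot(aes(x = median_dif,
             y = reorder(city, median_dif, FUN = median))) +  # order cities by median
  geom_boxplot() +
  labs(x = "Yearly median of price_dif", y = "City")
p_box
```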