# Load packages
library(ggplot2)
library(dplyr)

# Import data
census <- read.csv("/resources/rstudio/BusinessStatistics/Data/census.csv")

str(census)

summary(census)

Suppose that you are writing a report on the economic importance of snowbirds in the Lakes Region Planning Commission’s region. The director asks you what the share of seasonal homes in total (housing_percentOfseasonal) is for a typical town in the region.

## Q1 Would you choose the mean or the median for the typical value?

I would choose the median for the typical value.

## Q2 Explain your answer.

I would do this because the data has outliers that would skew the result if I were to use the mean.

## Q3 What is the highest percentage of the seasonal homes by any town in the region?

The highest percentage of seasonal homes in any town in the region is about 28%.
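To illustrate the reasoning behind choosing the median, here is a minimal sketch of how a single extreme town moves the mean much more than the median. The vector `share` and its values are hypothetical and are not taken from census.csv.

```r
# Hypothetical shares of seasonal homes (percent) for seven towns;
# these values are illustrative only and do not come from census.csv.
share <- c(5, 6, 7, 8, 9, 10, 60)

mean_share   <- mean(share)    # pulled upward by the single extreme value
median_share <- median(share)  # middle value, insensitive to how large 60 is

# Drop the extreme town and recompute to see which statistic moves more
share_trimmed <- share[share < 60]
mean_shift    <- mean_share - mean(share_trimmed)
median_shift  <- median_share - median(share_trimmed)

print(c(mean = mean_share, median = median_share))
print(c(mean_shift = mean_shift, median_shift = median_shift))
```

In this toy example the mean shifts far more than the median when the extreme town is removed, which is why the median is the safer summary of a "typical" town when outliers are present.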

# Create histogram of the share of seasonal homes
ggplot(census, aes(x = housing_percentOfseasonal)) +
  geom_histogram()


# Create box plot of the share of seasonal homes
ggplot(census, aes(x = 1, y = housing_percentOfseasonal)) +
  geom_boxplot()


# Create density plot for the same data
ggplot(census, aes(x = housing_percentOfseasonal)) +
  geom_density(alpha = .3)

# If data has extreme values
census %>%
  summarize(median = median(housing_percentOfseasonal, na.rm = TRUE),
            IQR = IQR(housing_percentOfseasonal, na.rm = TRUE))

# If data doesn't have extreme values
census %>%
  summarize(mean = mean(housing_percentOfseasonal, na.rm = TRUE),
            sd = sd(housing_percentOfseasonal, na.rm = TRUE))
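One possible way to decide between the two summaries above is to check for extreme values numerically rather than by eye. The sketch below uses base R's `boxplot.stats()`, which flags points lying more than 1.5 times the interquartile range beyond the hinges. The vector `share` is hypothetical, and the 1.5 × IQR cutoff is a common convention assumed here, not a rule stated in the assignment.

```r
# Hypothetical values, for illustration only (not from census.csv)
share <- c(5, 6, 7, 8, 9, 10, 60, NA)

# boxplot.stats() ignores NAs and returns, in $out, the points lying more
# than 1.5 * IQR beyond the lower or upper hinge
extreme <- boxplot.stats(share)$out

if (length(extreme) > 0) {
  # Extreme values present: use the robust pair
  center <- median(share, na.rm = TRUE)
  spread <- IQR(share, na.rm = TRUE)
} else {
  # No extreme values flagged: mean and standard deviation are reasonable
  center <- mean(share, na.rm = TRUE)
  spread <- sd(share, na.rm = TRUE)
}

print(extreme)
print(c(center = center, spread = spread))
```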

Suppose that the director suspects that the share of seasonal homes may be associated with the educational level of residents (popBA_percent). Divide the towns into two groups: 1) educated towns (the share of the population with a Bachelor’s degree or higher is above the average) and 2) other towns (the share of the population with a Bachelor’s degree or higher is below the average).

## Q4 What is the share of seasonal homes in total in a typical educated town?

27%.

## Q5 What is the share of seasonal homes in total in a typical less educated town?

Just above 31%.

## Q6 What possible explanation may you have for the significant difference, if any?

The answers above were calculated using the median. If I were to use the mean, both percentages would have gone up by 2% due to the outliers in the data set.

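# Create a new variable: is the unemployment rate at/above or below average?
# na.rm = TRUE keeps the average from becoming NA if any rate is missing
UR_ave <- mean(census$unemplRate, na.rm = TRUE)

census$UR_aboveAve <- ifelse(census$unemplRate >= UR_ave, "equal or above ave", "below ave")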

# Create box plots of the share of seasonal homes by UR_aboveAve
ggplot(census, aes(x = UR_aboveAve, y = housing_percentOfseasonal)) +
  geom_boxplot()


# If data has extreme values
census %>%
  group_by(UR_aboveAve) %>%
  summarize(median = median(housing_percentOfseasonal, na.rm = TRUE),
            IQR = IQR(housing_percentOfseasonal, na.rm = TRUE))

# If data doesn't have extreme values
census %>%
  group_by(UR_aboveAve) %>%
  summarize(mean = mean(housing_percentOfseasonal, na.rm = TRUE),
            sd = sd(housing_percentOfseasonal, na.rm = TRUE))
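The grouping above is built on unemplRate, while the question asks about education. A minimal sketch of the same comparison grouped by popBA_percent might look like the following. It assumes popBA_percent holds the share of the population with a Bachelor’s degree or higher; the data frame `towns` and its values are invented so the example runs on its own, and `BA_ave` and `BA_aboveAve` are hypothetical names.

```r
library(dplyr)

# Hypothetical stand-in for census.csv so the sketch is self-contained;
# column names mirror those used above, the values are invented.
towns <- data.frame(
  popBA_percent             = c(18, 22, 25, 31, 38, 45),
  housing_percentOfseasonal = c(35, 30, 33, 20, 28, 26)
)

# Average share of residents with a Bachelor's degree or higher
BA_ave <- mean(towns$popBA_percent, na.rm = TRUE)

# Label each town relative to that average
towns$BA_aboveAve <- ifelse(towns$popBA_percent >= BA_ave,
                            "equal or above ave", "below ave")

# Typical share of seasonal homes within each education group
by_education <- towns %>%
  group_by(BA_aboveAve) %>%
  summarize(median = median(housing_percentOfseasonal, na.rm = TRUE),
            IQR = IQR(housing_percentOfseasonal, na.rm = TRUE))

print(by_education)
```

To apply this to the real data, replace `towns` with `census`; the resulting medians would then answer Q4 and Q5 for the education grouping.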