ECON 465 Stage 1
PACKAGES
Airbnb: Kaggle NYC Airbnb dataset (URL)
Bank: UCI Bank Marketing dataset (URL)
install.packages("tidyverse")
install.packages("janitor")
library(tidyverse)
library(janitor)
1.1 DATA IMPORT
airbnb <- read_csv("data/Airbnb_data.csv")
bank <- read_csv2("data/Bank_data.csv")
1.1 DATA CLEANING
airbnb <- airbnb %>% clean_names() %>% drop_na()
bank <- bank %>% clean_names() %>% drop_na()
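Called with no arguments, drop_na() removes every row that has a missing value in any column, so it can help to count missing values per column before dropping. A minimal base R sketch on a small made-up data frame (`toy` and its values are hypothetical, not taken from either dataset):

```r
# Hypothetical toy data standing in for a cleaned data frame
toy <- data.frame(
  price = c(100, 250, NA, 80),
  reviews_per_month = c(NA, 1.2, 0.4, NA),
  room_type = c("Private room", "Entire home/apt", "Private room", NA)
)

# Number of missing values in each column
na_counts <- colSums(is.na(toy))
print(na_counts)

# Number of rows with no missing value in any column
rows_kept <- sum(complete.cases(toy))
print(rows_kept)
```

If one column accounts for most of the missing values, passing only the columns needed for the analysis to drop_na() may keep more rows.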
1.2 ECONOMIC QUESTIONS
Airbnb: What factors determine Airbnb listing prices in NYC?
Bank: Can we predict whether a customer will subscribe to a term deposit?
1.3 SUMMARY STATISTICS
glimpse(airbnb)
glimpse(bank)
AIRBNB SUMMARY (TARGET = price)
summary(airbnb$price)
sd(airbnb$price)
BANK SUMMARY (TARGET = y)
table(bank$y)
prop.table(table(bank$y))
1.3 HISTOGRAMS
Airbnb price distribution
ggplot(airbnb, aes(x = price)) +
  geom_histogram(bins = 50) +
  theme_minimal() +
  labs(title = "Airbnb Price Distribution")
Bank target distribution (y)
ggplot(bank, aes(x = y)) +
  geom_bar() +
  theme_minimal() +
  labs(title = "Bank Term Deposit Subscription (y)")
1.3 LOG TRANSFORMATION
airbnb <- airbnb %>% filter(price > 0) %>% mutate(log_price = log(price))
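The filter on price > 0 matters because the natural log of zero is -Inf, which would distort the histogram and any later summary statistics. A short base R illustration with made-up prices (the vector `prices` is hypothetical):

```r
# Hypothetical prices, including a zero-priced listing
prices <- c(0, 50, 120, 999)

# log(0) evaluates to -Inf in R
raw_logs <- log(prices)
print(raw_logs)

# Keep strictly positive prices before taking logs
kept <- prices[prices > 0]
log_kept <- log(kept)
print(all(is.finite(log_kept)))
```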
1.3 LOG HISTOGRAMS
ggplot(airbnb, aes(x = log_price)) +
  geom_histogram(bins = 50) +
  labs(title = "Log Airbnb Price Distribution")
# 1.4 PROBABILITY INTERPRETATION
# AIRBNB INTERPRETATION
# Price is right-skewed due to a small number of expensive listings.
# After the log transformation, the distribution becomes more symmetric.
# This is consistent with an approximately log-normal price distribution.
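One way to back up the visual impression with a number is to compare sample skewness before and after taking logs. The sketch below uses synthetic log-normal draws, not the Airbnb data; the seed, the parameters `meanlog = 5` and `sdlog = 0.7`, and the helper `skewness()` are all assumptions made for illustration.

```r
set.seed(465)

# Synthetic draws (NOT the Airbnb data): log-normal with assumed parameters
x <- rlnorm(5000, meanlog = 5, sdlog = 0.7)

# Sample skewness: third central moment divided by the
# second central moment raised to the power 1.5
skewness <- function(v) {
  m <- mean(v)
  mean((v - m)^3) / (mean((v - m)^2))^1.5
}

skew_raw <- skewness(x)       # strongly positive for right-skewed data
skew_log <- skewness(log(x))  # close to zero if the data are log-normal
print(c(raw = skew_raw, log = skew_log))
```

On the real data, the same helper could be applied to airbnb$price and airbnb$log_price. A normal Q-Q plot of the logged values, via qqnorm() followed by qqline(), is a complementary visual check.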