── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag() masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(janitor)
Attaching package: 'janitor'
The following objects are masked from 'package:stats':
chisq.test, fisher.test
1.1 DATA IMPORT
airbnb <- read_csv("data/Airbnb_data.csv")
bank <- read_csv2("data/Bank_data.csv")
1.1 DATA CLEANING
airbnb <- airbnb %>% clean_names() %>% drop_na()
bank <- bank %>% clean_names() %>% drop_na()
1.2 ECONOMIC QUESTIONS
Airbnb: What factors determine Airbnb listing prices in NYC?
Bank: Can we predict whether a customer will subscribe to a term deposit?
Price is right-skewed because a small number of very expensive listings stretch the upper tail. After a log transformation, the distribution becomes more symmetric, which suggests that price is approximately log-normally distributed.
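One way to put a number on the skewness claim is to compare sample skewness before and after taking logs. The sketch below is illustrative only: `skewness()` is a small helper defined here (not from any package), and `sim_price` is simulated log-normal data standing in for the listing prices, not the Airbnb file. The commented lines at the end assume the cleaned data has a numeric column named `price`.

```r
# Sample skewness (third standardized moment), base R only.
# Hypothetical helper defined for this sketch.
skewness <- function(x) {
  x <- x[!is.na(x)]
  m <- mean(x)
  s <- sqrt(mean((x - m)^2))
  mean((x - m)^3) / s^3
}

# Simulated stand-in for listing prices, NOT the Airbnb data.
set.seed(1)
sim_price <- rlnorm(5000, meanlog = 5, sdlog = 0.7)

skew_raw <- skewness(sim_price)       # clearly positive: long right tail
skew_log <- skewness(log(sim_price))  # near zero: roughly symmetric

# With the real data (assuming a numeric `price` column), zero prices
# would need to be dropped first, since log(0) is -Inf:
# pos_price <- airbnb$price[airbnb$price > 0]
# skewness(pos_price); skewness(log(pos_price))
```

A skewness near zero after the log transformation is consistent with a log-normal shape but does not prove it; a histogram or Q-Q plot of the logged values is a useful complement.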
BANK INTERPRETATION
The target variable (y) is binary (yes/no), so each observation can be modeled as a Bernoulli trial. The sample proportion of successes (yes) is an estimate of the success probability p.
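The Bernoulli reading can be sketched in a few lines of R. The vector `y_toy` below is a made-up toy example, not the bank data. The commented line at the end assumes the target column in `bank` is named `y` and is coded as "yes"/"no".

```r
# Hypothetical toy target vector, for illustration only.
y_toy <- c("no", "no", "yes", "no", "yes", "no", "no", "no", "no", "no")

# The proportion of "yes" is the estimate of the Bernoulli parameter p.
p_hat <- mean(y_toy == "yes")

# For a Bernoulli variable the variance is p * (1 - p).
var_hat <- p_hat * (1 - p_hat)

# Standard error of the estimated proportion.
se_hat <- sqrt(var_hat / length(y_toy))

# With the real data (assuming the column is `y` with values "yes"/"no"):
# mean(bank$y == "yes")
```

If the estimated p is far from 0.5, the classes are imbalanced, which is worth keeping in mind when judging a classifier by accuracy alone.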