The goal of this homework is to get more hands-on practice with data cleaning and feature engineering, experience with classification trees and kNNs, and understanding of overfitting. You will:
The accompanying file “airbnb_hw2.csv” (posted on Canvas) contains data from 10,000 Airbnb.com listings, mostly from the US. This is a larger subset of the data that you will eventually use for the class project. The data dictionary is available on ELMS.
Your task is to develop models to predict the target variable “high_booking_rate”, which labels whether a listing is popular (i.e., is booked most of the time) or not.
Please answer the questions below clearly and concisely, providing tables or plots where applicable. Turn in a well-formatted compiled HTML document using R Markdown, containing clear answers to the questions and R code in the appropriate places.
RUBRIC: To receive a passing score on this assignment, you must do the following:
Note that this assignment is somewhat open-ended and there are many ways to answer these questions. I don’t require that we have exactly the same answers in order for you to receive full credit.
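library(tidyverse) # read_csv(), parse_number(), and the dplyr verbs used below
library(caret) # dummyVars()
airbnb <- read_csv("airbnb_hw2.csv") # read the dataset into R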
## Rows: 10000 Columns: 19
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (11): name, bed_type, cancellation_policy, cleaning_fee, price, property...
## dbl (7): accommodates, bedrooms, beds, host_total_listings_count, high_book...
## lgl (1): host_is_superhost
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
names(airbnb) #variables used in dataset
## [1] "name" "accommodates"
## [3] "bed_type" "bedrooms"
## [5] "beds" "cancellation_policy"
## [7] "cleaning_fee" "host_total_listings_count"
## [9] "price" "property_type"
## [11] "room_type" "high_booking_rate"
## [13] "bathrooms" "extra_people"
## [15] "host_acceptance_rate" "host_is_superhost"
## [17] "host_response_rate" "minimum_nights"
## [19] "market"
What is the mean of the accommodates variable?
ANSWER: The mean number of people that can be accommodated in a listing in this dataset is 3.522893.
accommodates_mean <- airbnb %>%
  summarise(mean_accommodates = mean(accommodates))
price_per_person is the nightly price divided by accommodates
has_cleaning_fee is YES if there is a cleaning fee, and NO otherwise
bed_category is “bed” if the bed_type is Real Bed and “other” otherwise
property_category has the following values:
make sure to convert property_category to a factor!
ppp_ind is 1 if the price_per_person is greater than the median for the property_category, and 0 otherwise
Make sure these variables are factors: property_category, bed_category, cancellation_policy, room_type, and ppp_ind.
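The group-wise indicator described above can be sketched on a toy data frame. This is a minimal illustration rather than the homework solution: the six listings are made up, and base R's `ave()` is used (instead of dplyr) so the sketch runs without extra packages.

```r
# Toy listings; every value here is invented for illustration.
toy <- data.frame(
  price             = c(100, 60, 90, 200, 150, 80),
  accommodates      = c(2, 2, 3, 4, 2, 4),
  property_category = c("apartment", "apartment", "apartment",
                        "house", "house", "house")
)

# Nightly price divided by the number of guests the listing accommodates.
toy$price_per_person <- toy$price / toy$accommodates

# ave() returns, for every row, the median of that row's own category.
toy$group_median <- ave(toy$price_per_person, toy$property_category,
                        FUN = median)

# 1 if strictly above the category median, 0 otherwise, stored as a factor.
toy$ppp_ind <- factor(ifelse(toy$price_per_person > toy$group_median, 1, 0),
                      levels = c(0, 1))

print(toy)
```

Because the comparison is strict, a listing whose price per person equals its category median gets 0, which is why the two classes need not split exactly in half.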
#PUT QUESTION 1a CODE HERE
cleandata <- airbnb %>%
  mutate(
    # merge the rare super_strict_30 policy into strict
    cancellation_policy = as.factor(ifelse(cancellation_policy == 'super_strict_30', 'strict', cancellation_policy)),
    # parse the price and cleaning_fee strings as numbers
    price = parse_number(price, na = c("", "NA"), locale = default_locale(), trim_ws = TRUE),
    cleaning_fee = parse_number(cleaning_fee, na = c("", "NA"), locale = default_locale(), trim_ws = TRUE),
    # missing fees and prices are treated as 0
    cleaning_fee = ifelse(is.na(cleaning_fee), 0, cleaning_fee),
    price = ifelse(is.na(price), 0, price),
    # mean-impute the remaining numeric columns
    accommodates = ifelse(is.na(accommodates), mean(accommodates, na.rm = TRUE), accommodates),
    bedrooms = ifelse(is.na(bedrooms), mean(bedrooms, na.rm = TRUE), bedrooms),
    beds = ifelse(is.na(beds), mean(beds, na.rm = TRUE), beds),
    host_total_listings_count = ifelse(is.na(host_total_listings_count),
                                       mean(host_total_listings_count, na.rm = TRUE),
                                       host_total_listings_count),
    # engineered features
    price_per_person = price / accommodates,
    has_cleaning_fee = ifelse(cleaning_fee != 0, "YES", "NO"),
    bed_category = ifelse(bed_type == "Real Bed", "bed", "other"),
    property_category = as.factor(case_when(
      property_type %in% c("Bed & Breakfast", "Boutique hotel", "Hostel") ~ "hotel",
      property_type %in% c("Apartment", "Serviced apartment", "Loft") ~ "apartment",
      property_type %in% c("Townhouse", "Condominium") ~ "condo",
      property_type %in% c("Bungalow", "House") ~ "house",
      TRUE ~ "other"
    )),
    bed_type = as.factor(bed_type),
    room_type = as.factor(room_type)
  ) %>%
  # median price per person within each property_category
  group_by(property_category) %>%
  mutate(median_CC = median(price_per_person)) %>%
  ungroup() %>%
  mutate(ppp_ind = as.factor(ifelse(price_per_person > median_CC, 1, 0)))
summary(cleandata)
## name accommodates bed_type bedrooms
## Length:10000 Min. : 1.000 Airbed : 67 Min. : 0.000
## Class :character 1st Qu.: 2.000 Couch : 19 1st Qu.: 1.000
## Mode :character Median : 3.000 Futon : 113 Median : 1.000
## Mean : 3.523 Pull-out Sofa: 77 Mean : 1.365
## 3rd Qu.: 4.000 Real Bed :9724 3rd Qu.: 2.000
## Max. :16.000 Max. :11.000
##
## beds cancellation_policy cleaning_fee
## Min. : 0.000 flexible :2216 Min. : 0
## 1st Qu.: 1.000 moderate :3081 1st Qu.: 12
## Median : 1.000 strict :4687 Median : 40
## Mean : 1.892 super_strict_60: 16 Mean : 55
## 3rd Qu.: 2.000 3rd Qu.: 80
## Max. :16.000 Max. :950
##
## host_total_listings_count price property_type
## Min. : 0.000 Min. : 0.0 Length:10000
## 1st Qu.: 1.000 1st Qu.: 71.0 Class :character
## Median : 1.000 Median : 109.0 Mode :character
## Mean : 9.176 Mean : 154.4
## 3rd Qu.: 3.000 3rd Qu.: 175.0
## Max. :992.000 Max. :5000.0
##
## room_type high_booking_rate bathrooms extra_people
## Entire home/apt:6080 Min. :0.0000 Min. : 0.000 Length:10000
## Private room :3659 1st Qu.:0.0000 1st Qu.: 1.000 Class :character
## Shared room : 261 Median :0.0000 Median : 1.000 Mode :character
## Mean :0.2443 Mean : 1.287
## 3rd Qu.:0.0000 3rd Qu.: 1.000
## Max. :1.0000 Max. :17.000
## NA's :31
## host_acceptance_rate host_is_superhost host_response_rate minimum_nights
## Length:10000 Mode :logical Length:10000 Min. : 1.000
## Class :character FALSE:7445 Class :character 1st Qu.: 1.000
## Mode :character TRUE :2536 Mode :character Median : 2.000
## NA's :19 Mean : 3.378
## 3rd Qu.: 3.000
## Max. :1125.000
##
## market price_per_person has_cleaning_fee bed_category
## Length:10000 Min. : 0.00 Length:10000 Length:10000
## Class :character 1st Qu.: 27.50 Class :character Class :character
## Mode :character Median : 39.50 Mode :character Mode :character
## Mean : 46.80
## 3rd Qu.: 57.17
## Max. :1600.00
##
## property_category median_CC ppp_ind
## apartment:5687 Min. :35.00 0:5118
## condo : 633 1st Qu.:35.00 1:4882
## hotel : 70 Median :42.50
## house :3197 Mean :39.54
## other : 413 3rd Qu.:42.50
## Max. :42.50
##
final_cleandata <- cleandata %>%
  mutate(
    # impute missing bathrooms with the median and missing superhost flags with FALSE
    bathrooms = ifelse(is.na(bathrooms), median(bathrooms, na.rm = TRUE), bathrooms),
    host_is_superhost = ifelse(is.na(host_is_superhost), FALSE, host_is_superhost),
    charges_for_extra = as.factor(ifelse(parse_number(extra_people) > 0, "YES", "NO")),
    # bucket the rate strings into MISSING / ALL (exactly "100%") / SOME
    host_acceptance = as.factor(ifelse(is.na(host_acceptance_rate), "MISSING",
                                       ifelse(host_acceptance_rate == "100%", "ALL", "SOME"))),
    host_response = as.factor(ifelse(is.na(host_response_rate), "MISSING",
                                     ifelse(host_response_rate == "100%", "ALL", "SOME"))),
    has_min_nights = ifelse(minimum_nights > 1, "YES", "NO"),
    # lump missing markets and markets with fewer than 300 listings into OTHER
    market = as.factor(ifelse(is.na(market) | table(market)[market] < 300, "OTHER", market)),
    high_booking_rate = as.factor(high_booking_rate)
  )
summary(final_cleandata)
## name accommodates bed_type bedrooms
## Length:10000 Min. : 1.000 Airbed : 67 Min. : 0.000
## Class :character 1st Qu.: 2.000 Couch : 19 1st Qu.: 1.000
## Mode :character Median : 3.000 Futon : 113 Median : 1.000
## Mean : 3.523 Pull-out Sofa: 77 Mean : 1.365
## 3rd Qu.: 4.000 Real Bed :9724 3rd Qu.: 2.000
## Max. :16.000 Max. :11.000
##
## beds cancellation_policy cleaning_fee
## Min. : 0.000 flexible :2216 Min. : 0
## 1st Qu.: 1.000 moderate :3081 1st Qu.: 12
## Median : 1.000 strict :4687 Median : 40
## Mean : 1.892 super_strict_60: 16 Mean : 55
## 3rd Qu.: 2.000 3rd Qu.: 80
## Max. :16.000 Max. :950
##
## host_total_listings_count price property_type
## Min. : 0.000 Min. : 0.0 Length:10000
## 1st Qu.: 1.000 1st Qu.: 71.0 Class :character
## Median : 1.000 Median : 109.0 Mode :character
## Mean : 9.176 Mean : 154.4
## 3rd Qu.: 3.000 3rd Qu.: 175.0
## Max. :992.000 Max. :5000.0
##
## room_type high_booking_rate bathrooms extra_people
## Entire home/apt:6080 0:7557 Min. : 0.000 Length:10000
## Private room :3659 1:2443 1st Qu.: 1.000 Class :character
## Shared room : 261 Median : 1.000 Mode :character
## Mean : 1.286
## 3rd Qu.: 1.000
## Max. :17.000
##
## host_acceptance_rate host_is_superhost host_response_rate minimum_nights
## Length:10000 Mode :logical Length:10000 Min. : 1.000
## Class :character FALSE:7464 Class :character 1st Qu.: 1.000
## Mode :character TRUE :2536 Mode :character Median : 2.000
## Mean : 3.378
## 3rd Qu.: 3.000
## Max. :1125.000
##
## market price_per_person has_cleaning_fee bed_category
## New York :3307 Min. : 0.00 Length:10000 Length:10000
## Los Angeles:2106 1st Qu.: 27.50 Class :character Class :character
## OTHER : 621 Median : 39.50 Mode :character Mode :character
## D.C. : 492 Mean : 46.80
## Austin : 491 3rd Qu.: 57.17
## New Orleans: 428 Max. :1600.00
## (Other) :2555
## property_category median_CC ppp_ind charges_for_extra host_acceptance
## apartment:5687 Min. :35.00 0:5118 NO :4693 ALL : 488
## condo : 633 1st Qu.:35.00 1:4882 YES:5307 MISSING:9216
## hotel : 70 Median :42.50 SOME : 296
## house :3197 Mean :39.54
## other : 413 3rd Qu.:42.50
## Max. :42.50
##
## host_response has_min_nights
## ALL :6581 Length:10000
## MISSING:1662 Class :character
## SOME :1757 Mode :character
##
##
##
##
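The market clean-up in the step above lumps missing and infrequent markets into a single OTHER level by looking up each row's own market count in a frequency table. A minimal base-R sketch of that idea follows. The market names and the threshold `min_count` are made up for illustration (the homework code uses 300).

```r
# Toy market column with one missing value; all values are invented.
market <- c("New York", "New York", "New York",
            "Austin", "Austin", "Boise", NA)

min_count <- 2  # hypothetical threshold for this toy example

# Frequency of each market; table() ignores NA by default.
counts <- table(market)

# Index the table by the column itself to get each row's own market count.
# Rows with a missing market get NA here.
n_per_row <- as.vector(counts[market])

# is.na(market) is TRUE for missing rows, and TRUE | NA is TRUE,
# so missing markets are lumped into OTHER as well.
lumped <- ifelse(is.na(market) | n_per_row < min_count, "OTHER", market)
lumped <- factor(lumped)

print(table(lumped))
```

Lumping before dummy coding keeps the number of market columns small and avoids levels that would appear in only a handful of training rows.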
How many dummy variables do you end up with in your resulting data frame?
ANSWER TO QUESTION 2a HERE: 36 columns in total: 29 dummy (0/1) columns, 6 numeric predictors, and the target high_booking_rate.
# keep only the modeling columns
final_data <- final_cleandata %>%
  select(accommodates, bedrooms, beds, cancellation_policy, has_cleaning_fee, host_total_listings_count, price, ppp_ind,
         property_category, bed_category, bathrooms, charges_for_extra, host_acceptance, host_response,
         has_min_nights, market, host_is_superhost, high_booking_rate)
# fullRank = TRUE drops one reference level per categorical variable
dummy_variable <- dummyVars("~.", data = final_data, fullRank = TRUE)
final_data1 <- data.frame(predict(dummy_variable, newdata = final_data))
# rebuild the target as a factor, then drop the dummy-coded copy of it (second-to-last column)
final_data1$high_booking_rate <- as.factor(final_data1$high_booking_rate)
final_data1 <- final_data1[, -(ncol(final_data1) - 1)]
summary(final_data1)
## accommodates bedrooms beds
## Min. : 1.000 Min. : 0.000 Min. : 0.000
## 1st Qu.: 2.000 1st Qu.: 1.000 1st Qu.: 1.000
## Median : 3.000 Median : 1.000 Median : 1.000
## Mean : 3.523 Mean : 1.365 Mean : 1.892
## 3rd Qu.: 4.000 3rd Qu.: 2.000 3rd Qu.: 2.000
## Max. :16.000 Max. :11.000 Max. :16.000
## cancellation_policy.moderate cancellation_policy.strict
## Min. :0.0000 Min. :0.0000
## 1st Qu.:0.0000 1st Qu.:0.0000
## Median :0.0000 Median :0.0000
## Mean :0.3081 Mean :0.4687
## 3rd Qu.:1.0000 3rd Qu.:1.0000
## Max. :1.0000 Max. :1.0000
## cancellation_policy.super_strict_60 has_cleaning_feeYES
## Min. :0.0000 Min. :0.0000
## 1st Qu.:0.0000 1st Qu.:1.0000
## Median :0.0000 Median :1.0000
## Mean :0.0016 Mean :0.7998
## 3rd Qu.:0.0000 3rd Qu.:1.0000
## Max. :1.0000 Max. :1.0000
## host_total_listings_count price ppp_ind.1
## Min. : 0.000 Min. : 0.0 Min. :0.0000
## 1st Qu.: 1.000 1st Qu.: 71.0 1st Qu.:0.0000
## Median : 1.000 Median : 109.0 Median :0.0000
## Mean : 9.176 Mean : 154.4 Mean :0.4882
## 3rd Qu.: 3.000 3rd Qu.: 175.0 3rd Qu.:1.0000
## Max. :992.000 Max. :5000.0 Max. :1.0000
## property_category.condo property_category.hotel property_category.house
## Min. :0.0000 Min. :0.000 Min. :0.0000
## 1st Qu.:0.0000 1st Qu.:0.000 1st Qu.:0.0000
## Median :0.0000 Median :0.000 Median :0.0000
## Mean :0.0633 Mean :0.007 Mean :0.3197
## 3rd Qu.:0.0000 3rd Qu.:0.000 3rd Qu.:1.0000
## Max. :1.0000 Max. :1.000 Max. :1.0000
## property_category.other bed_categoryother bathrooms
## Min. :0.0000 Min. :0.0000 Min. : 0.000
## 1st Qu.:0.0000 1st Qu.:0.0000 1st Qu.: 1.000
## Median :0.0000 Median :0.0000 Median : 1.000
## Mean :0.0413 Mean :0.0276 Mean : 1.286
## 3rd Qu.:0.0000 3rd Qu.:0.0000 3rd Qu.: 1.000
## Max. :1.0000 Max. :1.0000 Max. :17.000
## charges_for_extra.YES host_acceptance.MISSING host_acceptance.SOME
## Min. :0.0000 Min. :0.0000 Min. :0.0000
## 1st Qu.:0.0000 1st Qu.:1.0000 1st Qu.:0.0000
## Median :1.0000 Median :1.0000 Median :0.0000
## Mean :0.5307 Mean :0.9216 Mean :0.0296
## 3rd Qu.:1.0000 3rd Qu.:1.0000 3rd Qu.:0.0000
## Max. :1.0000 Max. :1.0000 Max. :1.0000
## host_response.MISSING host_response.SOME has_min_nightsYES market.Boston
## Min. :0.0000 Min. :0.0000 Min. :0.0000 Min. :0.0000
## 1st Qu.:0.0000 1st Qu.:0.0000 1st Qu.:0.0000 1st Qu.:0.0000
## Median :0.0000 Median :0.0000 Median :1.0000 Median :0.0000
## Mean :0.1662 Mean :0.1757 Mean :0.6329 Mean :0.0329
## 3rd Qu.:0.0000 3rd Qu.:0.0000 3rd Qu.:1.0000 3rd Qu.:0.0000
## Max. :1.0000 Max. :1.0000 Max. :1.0000 Max. :1.0000
## market.Chicago market.D.C. market.Denver market.Los.Angeles
## Min. :0.0000 Min. :0.0000 Min. :0.0000 Min. :0.0000
## 1st Qu.:0.0000 1st Qu.:0.0000 1st Qu.:0.0000 1st Qu.:0.0000
## Median :0.0000 Median :0.0000 Median :0.0000 Median :0.0000
## Mean :0.0369 Mean :0.0492 Mean :0.0302 Mean :0.2106
## 3rd Qu.:0.0000 3rd Qu.:0.0000 3rd Qu.:0.0000 3rd Qu.:0.0000
## Max. :1.0000 Max. :1.0000 Max. :1.0000 Max. :1.0000
## market.Nashville market.New.Orleans market.New.York market.OTHER
## Min. :0.0000 Min. :0.0000 Min. :0.0000 Min. :0.0000
## 1st Qu.:0.0000 1st Qu.:0.0000 1st Qu.:0.0000 1st Qu.:0.0000
## Median :0.0000 Median :0.0000 Median :0.0000 Median :0.0000
## Mean :0.0419 Mean :0.0428 Mean :0.3307 Mean :0.0621
## 3rd Qu.:0.0000 3rd Qu.:0.0000 3rd Qu.:1.0000 3rd Qu.:0.0000
## Max. :1.0000 Max. :1.0000 Max. :1.0000 Max. :1.0000
## market.Portland market.San.Diego market.San.Francisco host_is_superhostTRUE
## Min. :0.0000 Min. :0.0000 Min. :0.0000 Min. :0.0000
## 1st Qu.:0.0000 1st Qu.:0.0000 1st Qu.:0.0000 1st Qu.:0.0000
## Median :0.0000 Median :0.0000 Median :0.0000 Median :0.0000
## Mean :0.0386 Mean :0.0367 Mean :0.0383 Mean :0.2536
## 3rd Qu.:0.0000 3rd Qu.:0.0000 3rd Qu.:0.0000 3rd Qu.:1.0000
## Max. :1.0000 Max. :1.0000 Max. :1.0000 Max. :1.0000
## high_booking_rate
## 0:7557
## 1:2443
##
##
##
##
ncol(final_data1)
## [1] 36
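To see how full-rank dummy coding determines the column count, here is a small sketch. It is an illustration under assumptions: it uses base R's `model.matrix()` as a stand-in for caret's `dummyVars(fullRank = TRUE)` (both drop one reference level per factor), and the toy columns are invented.

```r
# Toy data: one numeric column and two factors (values are made up).
toy <- data.frame(
  beds      = c(1, 2, 1, 3),
  room_type = factor(c("Entire home/apt", "Private room",
                       "Shared room", "Private room")),
  has_fee   = factor(c("YES", "NO", "YES", "YES"))
)

# model.matrix() expands factors into 0/1 columns, dropping the first
# level of each factor; [, -1] removes the intercept column.
X <- model.matrix(~ ., data = toy)[, -1]
print(colnames(X))

# Count columns whose values are all 0 or 1. Note this heuristic would
# also count a numeric column that happens to contain only 0s and 1s.
n_dummies <- sum(apply(X, 2, function(col) all(col %in% c(0, 1))))
n_dummies
```

A factor with k levels contributes k - 1 columns, so here the three-level `room_type` gives two columns and the two-level `has_fee` gives one, alongside the untouched numeric `beds`. The same bookkeeping separates dummy columns from numeric ones in a larger frame.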
train_insts <- sample(nrow(final_data1),0.7*nrow(final_data1))
train_data <- final_data1[train_insts,]
valid_data <- final_data1[-train_insts,]
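A random 70/30 split like the one above gives different rows, and therefore slightly different accuracies, on every run unless the random seed is fixed. The sketch below shows the same split pattern on a toy data frame with a seed. The seed value and the toy data are arbitrary assumptions for illustration.

```r
set.seed(1)  # arbitrary seed; any fixed value makes the split repeatable

# Toy data frame standing in for the modeling data.
n <- 10
toy <- data.frame(id = seq_len(n), x = rnorm(n))

# Draw 70% of the row indices without replacement for training.
train_idx <- sample(n, size = round(0.7 * n))

train <- toy[train_idx, ]
valid <- toy[-train_idx, ]   # negative indexing keeps the remaining rows

nrow(train)
nrow(valid)
```

Negative indexing guarantees the two sets are disjoint and together cover every row.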
ANSWER TO QUESTION 2c HERE: 0.7786667
model_log <- glm(high_booking_rate ~ ., data = train_data, family = "binomial")
# predict.glm() takes newdata; score the validation rows
prediction1 <- predict(model_log, newdata = valid_data, type = "response")
classification1 <- ifelse(prediction1 > 0.5, 1, 0)
classification1 <- as.factor(classification1)
a <- ifelse(classification1 == valid_data$high_booking_rate, 1, 0)
accuracy <- sum(a) / length(classification1)
accuracy
## [1] 0.7786667
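When computing validation accuracy for a logistic regression, the argument name passed to `predict()` matters: `predict.glm()` expects `newdata`, and a misspelled name such as `new_data` is absorbed silently, so fitted values for the training rows come back instead. The sketch below illustrates this on simulated data. The data, seed, and coefficient are made-up assumptions.

```r
set.seed(42)  # arbitrary seed for a repeatable toy example

# Simulated binary outcome that depends noisily on one predictor.
n <- 200
x <- rnorm(n)
y <- rbinom(n, 1, plogis(0.8 * x))
toy <- data.frame(x = x, y = factor(y))

idx   <- sample(n, 140)
train <- toy[idx, ]
valid <- toy[-idx, ]

fit <- glm(y ~ x, data = train, family = "binomial")

# Correct: newdata = valid gives one probability per validation row.
p_valid <- predict(fit, newdata = valid, type = "response")

# Misspelled argument: silently ignored, so the training rows are scored.
p_wrong <- predict(fit, new_data = valid, type = "response")

length(p_valid)  # equals nrow(valid)
length(p_wrong)  # equals nrow(train)

# Accuracy at a 0.5 cutoff, comparing like with like.
pred     <- ifelse(p_valid > 0.5, 1, 0)
accuracy <- mean(pred == as.numeric(as.character(valid$y)))
accuracy
```

Checking that the prediction vector has the same length as the validation set is a cheap guard: a length mismatch shows up as a "longer object length is not a multiple of shorter object length" warning when the two vectors are compared.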
ANSWER TO QUESTION 3a HERE: 204
library(tree)
myset = tree.control(nrow(train_data), mincut = 5, minsize = 10, mindev = 0.0005)
full_tree= tree(high_booking_rate~., control = myset, train_data)
further <- full_tree$frame %>%
filter(var == '<leaf>')
terminal_nodes <- nrow(further)
terminal_nodes
## [1] 204
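Two conventions help when reading a printed classification tree from the tree package: the children of node k are numbered 2k and 2k + 1, and the deviance reported for a node matches -2 * n * sum(p * log(p)) over the class probabilities at that node. The helper functions below are hypothetical names introduced for illustration. The deviance formula was checked by hand against a few of the printed nodes, not taken from the package source.

```r
# Children of node k are 2k and 2k + 1, so depth follows from the number.
node_children <- function(k) c(left = 2 * k, right = 2 * k + 1)
node_depth    <- function(k) floor(log2(k))   # root (node 1) has depth 0

# Deviance of a classification node with n rows and class probabilities
# yprob; classes with probability 0 contribute nothing (0 * log(0) := 0).
node_deviance <- function(n, yprob) {
  p <- yprob[yprob > 0]
  -2 * n * sum(p * log(p))
}

node_children(65)   # nodes 130 and 131
node_depth(65)

# A node with 12 rows split 8 / 4 between the classes; compare with the
# printed deviance of node 64 (15.280).
node_deviance(12, c(8, 4) / 12)

# A pure node has zero deviance, so it cannot be split further.
node_deviance(6, c(0, 1))
```

Leaves with deviance 0 and only a handful of rows are a visible sign of how closely a tree grown with a very small `mindev` follows the training data.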
full_tree
## node), split, n, deviance, yval, (yprob)
## * denotes terminal node
##
## 1) root 7000 7777.000 0 ( 0.75614 0.24386 )
## 2) host_response.MISSING < 0.5 5839 7031.000 0 ( 0.71005 0.28995 )
## 4) host_is_superhostTRUE < 0.5 4122 4489.000 0 ( 0.76565 0.23435 )
## 8) has_min_nightsYES < 0.5 1623 2091.000 0 ( 0.65496 0.34504 )
## 16) ppp_ind.1 < 0.5 1023 1372.000 0 ( 0.60606 0.39394 )
## 32) bedrooms < 0.5 70 90.010 1 ( 0.34286 0.65714 )
## 64) host_acceptance.MISSING < 0.5 12 15.280 0 ( 0.66667 0.33333 ) *
## 65) host_acceptance.MISSING > 0.5 58 68.320 1 ( 0.27586 0.72414 )
## 130) property_category.other < 0.5 52 64.190 1 ( 0.30769 0.69231 ) *
## 131) property_category.other > 0.5 6 0.000 1 ( 0.00000 1.00000 ) *
## 33) bedrooms > 0.5 953 1261.000 0 ( 0.62539 0.37461 )
## 66) host_total_listings_count < 2.5 536 669.700 0 ( 0.68284 0.31716 )
## 132) beds < 1.5 323 368.100 0 ( 0.74303 0.25697 )
## 264) property_category.house < 0.5 236 248.900 0 ( 0.77966 0.22034 )
## 528) host_acceptance.MISSING < 0.5 28 37.520 0 ( 0.60714 0.39286 ) *
## 529) host_acceptance.MISSING > 0.5 208 206.500 0 ( 0.80288 0.19712 )
## 1058) price < 135.5 199 202.400 0 ( 0.79397 0.20603 )
## 2116) accommodates < 1.5 19 7.835 0 ( 0.94737 0.05263 ) *
## 2117) accommodates > 1.5 180 190.700 0 ( 0.77778 0.22222 ) *
## 1059) price > 135.5 9 0.000 0 ( 1.00000 0.00000 ) *
## 265) property_category.house > 0.5 87 113.300 0 ( 0.64368 0.35632 )
## 530) price < 91 82 103.900 0 ( 0.67073 0.32927 )
## 1060) price < 72.5 76 98.900 0 ( 0.64474 0.35526 ) *
## 1061) price > 72.5 6 0.000 0 ( 1.00000 0.00000 ) *
## 531) price > 91 5 5.004 1 ( 0.20000 0.80000 ) *
## 133) beds > 1.5 213 288.100 0 ( 0.59155 0.40845 )
## 266) bathrooms < 2.25 195 267.100 0 ( 0.56410 0.43590 )
## 532) price < 255 187 254.100 0 ( 0.58289 0.41711 )
## 1064) price < 133 132 182.700 0 ( 0.52273 0.47727 )
## 2128) market.Chicago < 0.5 127 176.100 0 ( 0.50394 0.49606 )
## 4256) beds < 2.5 91 124.300 0 ( 0.57143 0.42857 ) *
## 4257) beds > 2.5 36 45.830 1 ( 0.33333 0.66667 )
## 8514) cancellation_policy.moderate < 0.5 25 34.300 1 ( 0.44000 0.56000 ) *
## 8515) cancellation_policy.moderate > 0.5 11 6.702 1 ( 0.09091 0.90909 ) *
## 2129) market.Chicago > 0.5 5 0.000 0 ( 1.00000 0.00000 ) *
## 1065) price > 133 55 64.450 0 ( 0.72727 0.27273 )
## 2130) property_category.house < 0.5 37 47.970 0 ( 0.64865 0.35135 )
## 4260) price < 231 32 38.020 0 ( 0.71875 0.28125 ) *
## 4261) price > 231 5 5.004 1 ( 0.20000 0.80000 ) *
## 2131) property_category.house > 0.5 18 12.560 0 ( 0.88889 0.11111 ) *
## 533) price > 255 8 6.028 1 ( 0.12500 0.87500 ) *
## 267) bathrooms > 2.25 18 12.560 0 ( 0.88889 0.11111 )
## 534) beds < 5.5 13 0.000 0 ( 1.00000 0.00000 ) *
## 535) beds > 5.5 5 6.730 0 ( 0.60000 0.40000 ) *
## 67) host_total_listings_count > 2.5 417 573.600 0 ( 0.55156 0.44844 )
## 134) host_total_listings_count < 29 385 532.800 0 ( 0.52468 0.47532 )
## 268) property_category.other < 0.5 364 504.600 0 ( 0.50549 0.49451 )
## 536) property_category.house < 0.5 223 306.300 1 ( 0.44395 0.55605 )
## 1072) price < 24 6 0.000 0 ( 1.00000 0.00000 ) *
## 1073) price > 24 217 296.400 1 ( 0.42857 0.57143 )
## 2146) host_acceptance.SOME < 0.5 211 286.000 1 ( 0.41232 0.58768 )
## 4292) market.Los.Angeles < 0.5 131 168.600 1 ( 0.34351 0.65649 )
## 8584) charges_for_extra.YES < 0.5 32 24.110 1 ( 0.12500 0.87500 ) *
## 8585) charges_for_extra.YES > 0.5 99 134.300 1 ( 0.41414 0.58586 )
## 17170) has_cleaning_feeYES < 0.5 15 11.780 1 ( 0.13333 0.86667 )
## 34340) price < 67 10 0.000 1 ( 0.00000 1.00000 ) *
## 34341) price > 67 5 6.730 1 ( 0.40000 0.60000 ) *
## 17171) has_cleaning_feeYES > 0.5 84 116.000 1 ( 0.46429 0.53571 )
## 34342) host_total_listings_count < 3.5 26 33.540 0 ( 0.65385 0.34615 ) *
## 34343) host_total_listings_count > 3.5 58 76.990 1 ( 0.37931 0.62069 )
## 68686) market.D.C. < 0.5 50 68.590 1 ( 0.44000 0.56000 ) *
## 68687) market.D.C. > 0.5 8 0.000 1 ( 0.00000 1.00000 ) *
## 4293) market.Los.Angeles > 0.5 80 110.700 0 ( 0.52500 0.47500 )
## 8586) price < 62.5 19 19.560 0 ( 0.78947 0.21053 )
## 17172) host_total_listings_count < 6.5 11 0.000 0 ( 1.00000 0.00000 ) *
## 17173) host_total_listings_count > 6.5 8 11.090 0 ( 0.50000 0.50000 ) *
## 8587) price > 62.5 61 83.760 1 ( 0.44262 0.55738 )
## 17174) price < 74 5 0.000 1 ( 0.00000 1.00000 ) *
## 17175) price > 74 56 77.560 1 ( 0.48214 0.51786 )
## 34350) cancellation_policy.moderate < 0.5 40 54.550 0 ( 0.57500 0.42500 )
## 68700) accommodates < 5.5 22 25.780 0 ( 0.72727 0.27273 )
## 137400) price < 105 11 15.160 1 ( 0.45455 0.54545 ) *
## 137401) price > 105 11 0.000 0 ( 1.00000 0.00000 ) *
## 68701) accommodates > 5.5 18 24.060 1 ( 0.38889 0.61111 )
## 137402) charges_for_extra.YES < 0.5 5 5.004 0 ( 0.80000 0.20000 ) *
## 137403) charges_for_extra.YES > 0.5 13 14.050 1 ( 0.23077 0.76923 ) *
## 34351) cancellation_policy.moderate > 0.5 16 17.990 1 ( 0.25000 0.75000 ) *
## 2147) host_acceptance.SOME > 0.5 6 0.000 0 ( 1.00000 0.00000 ) *
## 537) property_category.house > 0.5 141 189.500 0 ( 0.60284 0.39716 )
## 1074) beds < 5.5 131 178.800 0 ( 0.57252 0.42748 )
## 2148) host_total_listings_count < 25.5 126 170.100 0 ( 0.59524 0.40476 )
## 4296) price < 74.5 93 128.400 0 ( 0.53763 0.46237 )
## 8592) market.New.York < 0.5 79 106.700 0 ( 0.59494 0.40506 )
## 17184) price < 32.5 13 11.160 0 ( 0.84615 0.15385 ) *
## 17185) price > 32.5 66 90.950 0 ( 0.54545 0.45455 )
## 34370) host_total_listings_count < 14.5 60 81.500 0 ( 0.58333 0.41667 ) *
## 34371) host_total_listings_count > 14.5 6 5.407 1 ( 0.16667 0.83333 ) *
## 8593) market.New.York > 0.5 14 14.550 1 ( 0.21429 0.78571 )
## 17186) price < 47 8 10.590 1 ( 0.37500 0.62500 ) *
## 17187) price > 47 6 0.000 1 ( 0.00000 1.00000 ) *
## 4297) price > 74.5 33 36.550 0 ( 0.75758 0.24242 ) *
## 2149) host_total_listings_count > 25.5 5 0.000 1 ( 0.00000 1.00000 ) *
## 1075) beds > 5.5 10 0.000 0 ( 1.00000 0.00000 ) *
## 269) property_category.other > 0.5 21 17.220 0 ( 0.85714 0.14286 )
## 538) accommodates < 3.5 12 0.000 0 ( 1.00000 0.00000 ) *
## 539) accommodates > 3.5 9 11.460 0 ( 0.66667 0.33333 ) *
## 135) host_total_listings_count > 29 32 24.110 0 ( 0.87500 0.12500 )
## 270) accommodates < 3.5 7 9.561 0 ( 0.57143 0.42857 ) *
## 271) accommodates > 3.5 25 8.397 0 ( 0.96000 0.04000 ) *
## 17) ppp_ind.1 > 0.5 600 689.800 0 ( 0.73833 0.26167 )
## 34) market.New.York < 0.5 411 421.600 0 ( 0.79075 0.20925 )
## 68) price < 177 299 334.600 0 ( 0.75251 0.24749 )
## 136) beds < 1.5 268 285.000 0 ( 0.77612 0.22388 )
## 272) cancellation_policy.moderate < 0.5 203 195.700 0 ( 0.81281 0.18719 )
## 544) cancellation_policy.strict < 0.5 102 73.890 0 ( 0.88235 0.11765 ) *
## 545) cancellation_policy.strict > 0.5 101 115.200 0 ( 0.74257 0.25743 )
## 1090) accommodates < 2.5 91 108.900 0 ( 0.71429 0.28571 )
## 2180) host_total_listings_count < 1.5 28 22.970 0 ( 0.85714 0.14286 ) *
## 2181) host_total_listings_count > 1.5 63 81.520 0 ( 0.65079 0.34921 )
## 4362) price < 54 5 0.000 0 ( 1.00000 0.00000 ) *
## 4363) price > 54 58 76.990 0 ( 0.62069 0.37931 ) *
## 1091) accommodates > 2.5 10 0.000 0 ( 1.00000 0.00000 ) *
## 273) cancellation_policy.moderate > 0.5 65 83.200 0 ( 0.66154 0.33846 )
## 546) price < 143.5 57 65.700 0 ( 0.73684 0.26316 )
## 1092) host_total_listings_count < 4.5 49 60.360 0 ( 0.69388 0.30612 ) *
## 1093) host_total_listings_count > 4.5 8 0.000 0 ( 1.00000 0.00000 ) *
## 547) price > 143.5 8 6.028 1 ( 0.12500 0.87500 ) *
## 137) beds > 1.5 31 42.680 0 ( 0.54839 0.45161 )
## 274) host_response.SOME < 0.5 26 35.890 1 ( 0.46154 0.53846 ) *
## 275) host_response.SOME > 0.5 5 0.000 0 ( 1.00000 0.00000 ) *
## 69) price > 177 112 76.270 0 ( 0.89286 0.10714 )
## 138) host_response.SOME < 0.5 75 62.530 0 ( 0.85333 0.14667 )
## 276) host_total_listings_count < 4.5 55 33.510 0 ( 0.90909 0.09091 )
## 552) price < 294 30 27.030 0 ( 0.83333 0.16667 )
## 1104) accommodates < 3.5 11 0.000 0 ( 1.00000 0.00000 ) *
## 1105) accommodates > 3.5 19 21.900 0 ( 0.73684 0.26316 )
## 2210) cancellation_policy.moderate < 0.5 12 16.300 0 ( 0.58333 0.41667 ) *
## 2211) cancellation_policy.moderate > 0.5 7 0.000 0 ( 1.00000 0.00000 ) *
## 553) price > 294 25 0.000 0 ( 1.00000 0.00000 ) *
## 277) host_total_listings_count > 4.5 20 24.430 0 ( 0.70000 0.30000 ) *
## 139) host_response.SOME > 0.5 37 9.195 0 ( 0.97297 0.02703 )
## 278) property_category.other < 0.5 32 0.000 0 ( 1.00000 0.00000 ) *
## 279) property_category.other > 0.5 5 5.004 0 ( 0.80000 0.20000 ) *
## 35) market.New.York > 0.5 189 250.200 0 ( 0.62434 0.37566 )
## 70) accommodates < 1.5 41 40.470 0 ( 0.80488 0.19512 )
## 140) price < 96.5 31 35.400 0 ( 0.74194 0.25806 )
## 280) host_total_listings_count < 3 24 21.630 0 ( 0.83333 0.16667 )
## 560) host_response.SOME < 0.5 15 17.400 0 ( 0.73333 0.26667 )
## 1120) bathrooms < 1.25 10 13.460 0 ( 0.60000 0.40000 ) *
## 1121) bathrooms > 1.25 5 0.000 0 ( 1.00000 0.00000 ) *
## 561) host_response.SOME > 0.5 9 0.000 0 ( 1.00000 0.00000 ) *
## 281) host_total_listings_count > 3 7 9.561 1 ( 0.42857 0.57143 ) *
## 141) price > 96.5 10 0.000 0 ( 1.00000 0.00000 ) *
## 71) accommodates > 1.5 148 201.900 0 ( 0.57432 0.42568 )
## 142) host_total_listings_count < 5.5 141 193.900 0 ( 0.55319 0.44681 )
## 284) price < 149.5 56 76.490 1 ( 0.42857 0.57143 ) *
## 285) price > 149.5 85 111.500 0 ( 0.63529 0.36471 )
## 570) host_total_listings_count < 2.5 74 91.720 0 ( 0.68919 0.31081 )
## 1140) beds < 1.5 34 24.630 0 ( 0.88235 0.11765 ) *
## 1141) beds > 1.5 40 55.350 0 ( 0.52500 0.47500 )
## 2282) host_response.SOME < 0.5 33 45.470 1 ( 0.45455 0.54545 ) *
## 2283) host_response.SOME > 0.5 7 5.742 0 ( 0.85714 0.14286 ) *
## 571) host_total_listings_count > 2.5 11 12.890 1 ( 0.27273 0.72727 ) *
## 143) host_total_listings_count > 5.5 7 0.000 0 ( 1.00000 0.00000 ) *
## 9) has_min_nightsYES > 0.5 2499 2218.000 0 ( 0.83754 0.16246 )
## 18) ppp_ind.1 < 0.5 1175 1211.000 0 ( 0.78894 0.21106 )
## 36) host_total_listings_count < 17.5 1089 1159.000 0 ( 0.77594 0.22406 )
## 72) host_response.SOME < 0.5 835 930.800 0 ( 0.75449 0.24551 )
## 144) market.New.York < 0.5 570 591.900 0 ( 0.78596 0.21404 )
## 288) price < 220.5 528 566.000 0 ( 0.77273 0.22727 )
## 576) price < 206.5 522 553.000 0 ( 0.77778 0.22222 )
## 1152) property_category.other < 0.5 501 519.700 0 ( 0.78643 0.21357 ) *
## 1153) property_category.other > 0.5 21 28.680 0 ( 0.57143 0.42857 )
## 2306) host_total_listings_count < 1.5 15 20.190 1 ( 0.40000 0.60000 )
## 4612) beds < 2.5 8 6.028 1 ( 0.12500 0.87500 ) *
## 4613) beds > 2.5 7 8.376 0 ( 0.71429 0.28571 ) *
## 2307) host_total_listings_count > 1.5 6 0.000 0 ( 1.00000 0.00000 ) *
## 577) price > 206.5 6 7.638 1 ( 0.33333 0.66667 ) *
## 289) price > 220.5 42 16.080 0 ( 0.95238 0.04762 )
## 578) cancellation_policy.moderate < 0.5 35 0.000 0 ( 1.00000 0.00000 ) *
## 579) cancellation_policy.moderate > 0.5 7 8.376 0 ( 0.71429 0.28571 ) *
## 145) market.New.York > 0.5 265 329.500 0 ( 0.68679 0.31321 )
## 290) cancellation_policy.strict < 0.5 129 142.300 0 ( 0.75969 0.24031 )
## 580) price < 38 10 13.460 1 ( 0.40000 0.60000 ) *
## 581) price > 38 119 122.300 0 ( 0.78992 0.21008 )
## 1162) price < 127.5 99 109.700 0 ( 0.75758 0.24242 )
## 2324) accommodates < 3.5 77 75.940 0 ( 0.80519 0.19481 ) *
## 2325) accommodates > 3.5 22 29.770 0 ( 0.59091 0.40909 ) *
## 1163) price > 127.5 20 7.941 0 ( 0.95000 0.05000 ) *
## 291) cancellation_policy.strict > 0.5 136 180.900 0 ( 0.61765 0.38235 ) *
## 73) host_response.SOME > 0.5 254 217.800 0 ( 0.84646 0.15354 )
## 146) beds < 1.94589 130 88.830 0 ( 0.89231 0.10769 )
## 292) market.New.York < 0.5 56 17.260 0 ( 0.96429 0.03571 ) *
## 293) market.New.York > 0.5 74 65.600 0 ( 0.83784 0.16216 )
## 586) property_category.house < 0.5 67 49.010 0 ( 0.88060 0.11940 ) *
## 587) property_category.house > 0.5 7 9.561 1 ( 0.42857 0.57143 ) *
## 147) beds > 1.94589 124 124.700 0 ( 0.79839 0.20161 )
## 294) accommodates < 2.5 9 0.000 0 ( 1.00000 0.00000 ) *
## 295) accommodates > 2.5 115 120.400 0 ( 0.78261 0.21739 )
## 590) bedrooms < 1.18246 46 58.090 0 ( 0.67391 0.32609 )
## 1180) accommodates < 5.5 41 47.690 0 ( 0.73171 0.26829 ) *
## 1181) accommodates > 5.5 5 5.004 1 ( 0.20000 0.80000 ) *
## 591) bedrooms > 1.18246 69 57.110 0 ( 0.85507 0.14493 )
## 1182) accommodates < 5.5 28 31.490 0 ( 0.75000 0.25000 ) *
## 1183) accommodates > 5.5 41 21.460 0 ( 0.92683 0.07317 )
## 2366) charges_for_extra.YES < 0.5 16 15.440 0 ( 0.81250 0.18750 )
## 4732) price < 189.5 8 10.590 0 ( 0.62500 0.37500 ) *
## 4733) price > 189.5 8 0.000 0 ( 1.00000 0.00000 ) *
## 2367) charges_for_extra.YES > 0.5 25 0.000 0 ( 1.00000 0.00000 ) *
## 37) host_total_listings_count > 17.5 86 32.360 0 ( 0.95349 0.04651 ) *
## 19) ppp_ind.1 > 0.5 1324 968.100 0 ( 0.88066 0.11934 )
## 38) bathrooms < 1.25 932 763.900 0 ( 0.85730 0.14270 )
## 76) accommodates < 1.5 132 60.360 0 ( 0.93939 0.06061 )
## 152) market.New.York < 0.5 59 0.000 0 ( 1.00000 0.00000 ) *
## 153) market.New.York > 0.5 73 50.470 0 ( 0.89041 0.10959 )
## 306) charges_for_extra.YES < 0.5 42 40.900 0 ( 0.80952 0.19048 ) *
## 307) charges_for_extra.YES > 0.5 31 0.000 0 ( 1.00000 0.00000 ) *
## 77) accommodates > 1.5 800 693.400 0 ( 0.84375 0.15625 )
## 154) accommodates < 6.5 789 669.300 0 ( 0.84918 0.15082 )
## 308) price < 191.5 531 492.700 0 ( 0.82486 0.17514 )
## 616) price < 184 508 455.600 0 ( 0.83465 0.16535 )
## 1232) bed_categoryother < 0.5 492 449.700 0 ( 0.82927 0.17073 )
## 2464) price < 99.5 110 122.600 0 ( 0.75455 0.24545 )
## 4928) host_total_listings_count < 5.5 101 105.900 0 ( 0.78218 0.21782 )
## 9856) price < 90.5 57 46.240 0 ( 0.85965 0.14035 ) *
## 9857) price > 90.5 44 55.040 0 ( 0.68182 0.31818 )
## 19714) price < 94.5 8 10.590 1 ( 0.37500 0.62500 ) *
## 19715) price > 94.5 36 40.490 0 ( 0.75000 0.25000 ) *
## 4929) host_total_listings_count > 5.5 9 12.370 1 ( 0.44444 0.55556 ) *
## 2465) price > 99.5 382 321.900 0 ( 0.85079 0.14921 )
## 4930) host_acceptance.MISSING < 0.5 31 37.350 0 ( 0.70968 0.29032 ) *
## 4931) host_acceptance.MISSING > 0.5 351 280.100 0 ( 0.86325 0.13675 )
## 9862) host_total_listings_count < 3.5 306 259.000 0 ( 0.84967 0.15033 )
## 19724) bedrooms < 0.5 63 69.160 0 ( 0.76190 0.23810 ) *
## 19725) bedrooms > 0.5 243 185.500 0 ( 0.87243 0.12757 )
## 39450) price < 108.5 29 0.000 0 ( 1.00000 0.00000 ) *
## 39451) price > 108.5 214 177.100 0 ( 0.85514 0.14486 )
## 78902) price < 178.5 201 172.800 0 ( 0.84577 0.15423 )
## 157804) property_category.house < 0.5 174 160.000 0 ( 0.82759 0.17241 )
## 315608) bedrooms < 1.5 161 135.600 0 ( 0.85093 0.14907 ) *
## 315609) bedrooms > 1.5 13 17.940 0 ( 0.53846 0.46154 ) *
## 157805) property_category.house > 0.5 27 8.554 0 ( 0.96296 0.03704 ) *
## 78903) price > 178.5 13 0.000 0 ( 1.00000 0.00000 ) *
## 9863) host_total_listings_count > 3.5 45 16.360 0 ( 0.95556 0.04444 )
## 19726) beds < 1.94589 34 0.000 0 ( 1.00000 0.00000 ) *
## 19727) beds > 1.94589 11 10.430 0 ( 0.81818 0.18182 ) *
## 1233) bed_categoryother > 0.5 16 0.000 0 ( 1.00000 0.00000 ) *
## 617) price > 184 23 30.790 0 ( 0.60870 0.39130 )
## 1234) accommodates < 3.5 16 17.990 0 ( 0.75000 0.25000 ) *
## 1235) accommodates > 3.5 7 8.376 1 ( 0.28571 0.71429 ) *
## 309) price > 191.5 258 168.600 0 ( 0.89922 0.10078 )
## 618) market.New.York < 0.5 136 49.180 0 ( 0.95588 0.04412 ) *
## 619) market.New.York > 0.5 122 108.900 0 ( 0.83607 0.16393 )
## 1238) beds < 2.5 101 69.530 0 ( 0.89109 0.10891 )
## 2476) bedrooms < 0.5 23 0.000 0 ( 1.00000 0.00000 ) *
## 2477) bedrooms > 0.5 78 63.460 0 ( 0.85897 0.14103 )
## 4954) has_cleaning_feeYES < 0.5 13 0.000 0 ( 1.00000 0.00000 ) *
## 4955) has_cleaning_feeYES > 0.5 65 59.110 0 ( 0.83077 0.16923 ) *
## 1239) beds > 2.5 21 28.680 0 ( 0.57143 0.42857 ) *
## 155) accommodates > 6.5 11 15.160 1 ( 0.45455 0.54545 )
## 310) price < 346 6 5.407 1 ( 0.16667 0.83333 ) *
## 311) price > 346 5 5.004 0 ( 0.80000 0.20000 ) *
## 39) bathrooms > 1.25 392 186.000 0 ( 0.93622 0.06378 )
## 78) market.New.York < 0.5 312 108.100 0 ( 0.95833 0.04167 )
## 156) property_category.house < 0.5 96 55.070 0 ( 0.91667 0.08333 )
## 312) accommodates < 5.5 61 17.600 0 ( 0.96721 0.03279 )
## 624) cancellation_policy.moderate < 0.5 41 0.000 0 ( 1.00000 0.00000 ) *
## 625) cancellation_policy.moderate > 0.5 20 13.000 0 ( 0.90000 0.10000 ) *
## 313) accommodates > 5.5 35 32.070 0 ( 0.82857 0.17143 ) *
## 157) property_category.house > 0.5 216 47.540 0 ( 0.97685 0.02315 )
## 314) charges_for_extra.YES < 0.5 116 41.220 0 ( 0.95690 0.04310 ) *
## 315) charges_for_extra.YES > 0.5 100 0.000 0 ( 1.00000 0.00000 ) *
## 79) market.New.York > 0.5 80 67.630 0 ( 0.85000 0.15000 )
## 158) beds < 2.5 43 9.499 0 ( 0.97674 0.02326 )
## 316) price < 383 37 0.000 0 ( 1.00000 0.00000 ) *
## 317) price > 383 6 5.407 0 ( 0.83333 0.16667 ) *
## 159) beds > 2.5 37 45.030 0 ( 0.70270 0.29730 )
## 318) bedrooms < 2.5 9 11.460 1 ( 0.33333 0.66667 ) *
## 319) bedrooms > 2.5 28 26.280 0 ( 0.82143 0.17857 )
## 638) bathrooms < 1.75 5 6.730 1 ( 0.40000 0.60000 ) *
## 639) bathrooms > 1.75 23 13.590 0 ( 0.91304 0.08696 )
## 1278) accommodates < 8.5 16 0.000 0 ( 1.00000 0.00000 ) *
## 1279) accommodates > 8.5 7 8.376 0 ( 0.71429 0.28571 ) *
## 5) host_is_superhostTRUE > 0.5 1717 2340.000 0 ( 0.57659 0.42341 )
## 10) has_min_nightsYES < 0.5 594 819.900 1 ( 0.46128 0.53872 )
## 20) bedrooms < 0.5 43 44.120 1 ( 0.20930 0.79070 )
## 40) host_total_listings_count < 6.5 38 29.590 1 ( 0.13158 0.86842 ) *
## 41) host_total_listings_count > 6.5 5 5.004 0 ( 0.80000 0.20000 ) *
## 21) bedrooms > 0.5 551 763.000 1 ( 0.48094 0.51906 )
## 42) price < 60.5 115 155.600 0 ( 0.59130 0.40870 )
## 84) host_total_listings_count < 9.5 110 150.200 0 ( 0.57273 0.42727 ) *
## 85) host_total_listings_count > 9.5 5 0.000 0 ( 1.00000 0.00000 ) *
## 43) price > 60.5 436 600.400 1 ( 0.45183 0.54817 )
## 86) ppp_ind.1 < 0.5 231 307.000 1 ( 0.38095 0.61905 )
## 172) host_total_listings_count < 6.5 202 261.900 1 ( 0.35149 0.64851 )
## 344) market.New.Orleans < 0.5 185 231.600 1 ( 0.31892 0.68108 )
## 688) accommodates < 2.5 47 65.130 1 ( 0.48936 0.51064 )
## 1376) cancellation_policy.moderate < 0.5 31 40.320 1 ( 0.35484 0.64516 ) *
## 1377) cancellation_policy.moderate > 0.5 16 17.990 0 ( 0.75000 0.25000 )
## 2754) price < 68.5 7 9.561 1 ( 0.42857 0.57143 ) *
## 2755) price > 68.5 9 0.000 0 ( 1.00000 0.00000 ) *
## 689) accommodates > 2.5 138 158.400 1 ( 0.26087 0.73913 )
## 1378) price < 71 7 0.000 1 ( 0.00000 1.00000 ) *
## 1379) price > 71 131 154.100 1 ( 0.27481 0.72519 ) *
## 345) market.New.Orleans > 0.5 17 20.600 0 ( 0.70588 0.29412 ) *
## 173) host_total_listings_count > 6.5 29 39.340 0 ( 0.58621 0.41379 ) *
## 87) ppp_ind.1 > 0.5 205 283.400 0 ( 0.53171 0.46829 )
## 174) beds < 1.44589 143 197.900 1 ( 0.47552 0.52448 )
## 348) price < 237 138 190.300 1 ( 0.45652 0.54348 )
## 696) market.San.Francisco < 0.5 130 180.100 1 ( 0.48462 0.51538 ) *
## 697) market.San.Francisco > 0.5 8 0.000 1 ( 0.00000 1.00000 ) *
## 349) price > 237 5 0.000 0 ( 1.00000 0.00000 ) *
## 175) beds > 1.44589 62 79.380 0 ( 0.66129 0.33871 )
## 350) has_cleaning_feeYES < 0.5 6 0.000 0 ( 1.00000 0.00000 ) *
## 351) has_cleaning_feeYES > 0.5 56 74.100 0 ( 0.62500 0.37500 )
## 702) host_total_listings_count < 2.5 35 39.900 0 ( 0.74286 0.25714 )
## 1404) market.New.York < 0.5 29 26.660 0 ( 0.82759 0.17241 )
## 2808) price < 181.5 10 13.460 0 ( 0.60000 0.40000 ) *
## 2809) price > 181.5 19 7.835 0 ( 0.94737 0.05263 ) *
## 1405) market.New.York > 0.5 6 7.638 1 ( 0.33333 0.66667 ) *
## 703) host_total_listings_count > 2.5 21 28.680 1 ( 0.42857 0.57143 ) *
## 11) has_min_nightsYES > 0.5 1123 1471.000 0 ( 0.63758 0.36242 )
## 22) market.New.York < 0.5 857 1066.000 0 ( 0.68611 0.31389 )
## 44) price < 149.5 547 728.500 0 ( 0.61609 0.38391 )
## 88) price < 61.5 96 105.700 0 ( 0.76042 0.23958 )
## 176) host_response.SOME < 0.5 90 90.070 0 ( 0.80000 0.20000 )
## 352) charges_for_extra.YES < 0.5 45 55.800 0 ( 0.68889 0.31111 )
## 704) property_category.condo < 0.5 40 51.800 0 ( 0.65000 0.35000 )
## 1408) ppp_ind.1 < 0.5 29 39.890 0 ( 0.55172 0.44828 ) *
## 1409) ppp_ind.1 > 0.5 11 6.702 0 ( 0.90909 0.09091 ) *
## 705) property_category.condo > 0.5 5 0.000 0 ( 1.00000 0.00000 ) *
## 353) charges_for_extra.YES > 0.5 45 27.000 0 ( 0.91111 0.08889 )
## 706) market.Chicago < 0.5 40 9.353 0 ( 0.97500 0.02500 ) *
## 707) market.Chicago > 0.5 5 6.730 1 ( 0.40000 0.60000 ) *
## 177) host_response.SOME > 0.5 6 5.407 1 ( 0.16667 0.83333 ) *
## 89) price > 61.5 451 612.000 0 ( 0.58537 0.41463 )
## 178) property_category.other < 0.5 410 551.000 0 ( 0.60244 0.39756 )
## 356) bedrooms < 0.5 51 70.520 1 ( 0.47059 0.52941 ) *
## 357) bedrooms > 0.5 359 476.400 0 ( 0.62117 0.37883 )
## 714) price < 69.5 31 42.170 1 ( 0.41935 0.58065 )
## 1428) host_total_listings_count < 2.5 25 29.650 1 ( 0.28000 0.72000 )
## 2856) host_total_listings_count < 1.5 12 6.884 1 ( 0.08333 0.91667 ) *
## 2857) host_total_listings_count > 1.5 13 17.940 1 ( 0.46154 0.53846 ) *
## 1429) host_total_listings_count > 2.5 6 0.000 0 ( 1.00000 0.00000 ) *
## 715) price > 69.5 328 428.600 0 ( 0.64024 0.35976 )
## 1430) accommodates < 1.5 13 7.051 0 ( 0.92308 0.07692 ) *
## 1431) accommodates > 1.5 315 415.600 0 ( 0.62857 0.37143 )
## 2862) beds < 1.94589 163 204.100 0 ( 0.68098 0.31902 )
## 5724) market.San.Diego < 0.5 151 194.500 0 ( 0.65563 0.34437 )
## 11448) price < 143.5 146 190.100 0 ( 0.64384 0.35616 )
## 22896) price < 81 27 25.870 0 ( 0.81481 0.18519 )
## 45792) host_total_listings_count < 1.5 11 0.000 0 ( 1.00000 0.00000 ) *
## 45793) host_total_listings_count > 1.5 16 19.870 0 ( 0.68750 0.31250 ) *
## 22897) price > 81 119 159.700 0 ( 0.60504 0.39496 )
## 45794) cancellation_policy.strict < 0.5 69 95.640 0 ( 0.50725 0.49275 )
## 91588) host_total_listings_count < 4.5 64 88.160 0 ( 0.54688 0.45312 ) *
## 91589) host_total_listings_count > 4.5 5 0.000 1 ( 0.00000 1.00000 ) *
## 45795) cancellation_policy.strict > 0.5 50 57.310 0 ( 0.74000 0.26000 )
## 91590) price < 132 45 45.040 0 ( 0.80000 0.20000 ) *
## 91591) price > 132 5 5.004 1 ( 0.20000 0.80000 ) *
## 11449) price > 143.5 5 0.000 0 ( 1.00000 0.00000 ) *
## 5725) market.San.Diego > 0.5 12 0.000 0 ( 1.00000 0.00000 ) *
## 2863) beds > 1.94589 152 207.500 0 ( 0.57237 0.42763 )
## 5726) market.San.Francisco < 0.5 141 188.600 0 ( 0.60993 0.39007 ) *
## 5727) market.San.Francisco > 0.5 11 6.702 1 ( 0.09091 0.90909 ) *
## 179) property_category.other > 0.5 41 55.640 1 ( 0.41463 0.58537 )
## 358) charges_for_extra.YES < 0.5 21 23.050 1 ( 0.23810 0.76190 )
## 716) price < 87.5 7 0.000 1 ( 0.00000 1.00000 ) *
## 717) price > 87.5 14 18.250 1 ( 0.35714 0.64286 ) *
## 359) charges_for_extra.YES > 0.5 20 26.920 0 ( 0.60000 0.40000 ) *
## 45) price > 149.5 310 301.700 0 ( 0.80968 0.19032 )
## 90) market.New.Orleans < 0.5 279 287.900 0 ( 0.78853 0.21147 )
## 180) ppp_ind.1 < 0.5 89 110.800 0 ( 0.68539 0.31461 ) *
## 181) ppp_ind.1 > 0.5 190 169.100 0 ( 0.83684 0.16316 )
## 362) host_total_listings_count < 2.5 127 134.000 0 ( 0.77953 0.22047 )
## 724) property_category.condo < 0.5 114 127.100 0 ( 0.75439 0.24561 )
## 1448) market.Nashville < 0.5 103 120.500 0 ( 0.72816 0.27184 )
## 2896) market.Portland < 0.5 96 115.900 0 ( 0.70833 0.29167 ) *
## 2897) market.Portland > 0.5 7 0.000 0 ( 1.00000 0.00000 ) *
## 1449) market.Nashville > 0.5 11 0.000 0 ( 1.00000 0.00000 ) *
## 725) property_category.condo > 0.5 13 0.000 0 ( 1.00000 0.00000 ) *
## 363) host_total_listings_count > 2.5 63 24.120 0 ( 0.95238 0.04762 )
## 726) property_category.house < 0.5 28 19.070 0 ( 0.89286 0.10714 )
## 1452) host_total_listings_count < 8.5 13 14.050 0 ( 0.76923 0.23077 ) *
## 1453) host_total_listings_count > 8.5 15 0.000 0 ( 1.00000 0.00000 ) *
## 727) property_category.house > 0.5 35 0.000 0 ( 1.00000 0.00000 ) *
## 91) market.New.Orleans > 0.5 31 0.000 0 ( 1.00000 0.00000 ) *
## 23) market.New.York > 0.5 266 368.400 1 ( 0.48120 0.51880 )
## 46) accommodates < 3.5 165 226.500 0 ( 0.55758 0.44242 )
## 92) price < 204.5 158 218.100 0 ( 0.53797 0.46203 )
## 184) price < 182.5 151 207.400 0 ( 0.55629 0.44371 ) *
## 185) price > 182.5 7 5.742 1 ( 0.14286 0.85714 ) *
## 93) price > 204.5 7 0.000 0 ( 1.00000 0.00000 ) *
## 47) accommodates > 3.5 101 131.600 1 ( 0.35644 0.64356 )
## 94) host_total_listings_count < 1.5 61 65.720 1 ( 0.22951 0.77049 )
## 188) beds < 3.5 52 60.580 1 ( 0.26923 0.73077 )
## 376) price < 182 29 23.270 1 ( 0.13793 0.86207 )
## 752) price < 132.5 17 18.550 1 ( 0.23529 0.76471 ) *
## 753) price > 132.5 12 0.000 1 ( 0.00000 1.00000 ) *
## 377) price > 182 23 31.490 1 ( 0.43478 0.56522 ) *
## 189) beds > 3.5 9 0.000 1 ( 0.00000 1.00000 ) *
## 95) host_total_listings_count > 1.5 40 55.050 0 ( 0.55000 0.45000 )
## 190) bathrooms < 1.25 34 47.020 1 ( 0.47059 0.52941 ) *
## 191) bathrooms > 1.25 6 0.000 0 ( 1.00000 0.00000 ) *
## 3) host_response.MISSING > 0.5 1161 151.500 0 ( 0.98794 0.01206 )
## 6) host_is_superhostTRUE < 0.5 1125 104.800 0 ( 0.99200 0.00800 )
## 12) host_total_listings_count < 1.5 995 52.120 0 ( 0.99598 0.00402 )
## 24) ppp_ind.1 < 0.5 399 44.780 0 ( 0.98997 0.01003 )
## 48) market.New.York < 0.5 166 0.000 0 ( 1.00000 0.00000 ) *
## 49) market.New.York > 0.5 233 40.450 0 ( 0.98283 0.01717 ) *
## 25) ppp_ind.1 > 0.5 596 0.000 0 ( 1.00000 0.00000 ) *
## 13) host_total_listings_count > 1.5 130 42.390 0 ( 0.96154 0.03846 )
## 26) market.New.York < 0.5 53 0.000 0 ( 1.00000 0.00000 ) *
## 27) market.New.York > 0.5 77 37.010 0 ( 0.93506 0.06494 ) *
## 7) host_is_superhostTRUE > 0.5 36 29.010 0 ( 0.86111 0.13889 )
## 14) has_cleaning_feeYES < 0.5 12 15.280 0 ( 0.66667 0.33333 ) *
## 15) has_cleaning_feeYES > 0.5 24 8.314 0 ( 0.95833 0.04167 ) *
Hint: you might want to change the y-axis of your plot from the default by adding the ylim() argument to plot().
ANSWER TO QUESTION 3b HERE:
tree_sizes <- c(2,4,6,8,10,15,20,25,30,35,40)
tr_accs <- rep(0, length(tree_sizes))
va_accs <- rep(0, length(tree_sizes))
# Share of predictions that match the actual labels
accuracy <- function(classifications, actuals){
  correct_classifications <- ifelse(classifications == actuals, 1, 0)
  acc <- sum(correct_classifications)/length(classifications)
  return(acc)
}
### USE A FOR LOOP
for (i in seq_along(tree_sizes)) {
  # Prune the full tree down to the requested number of terminal nodes
  pruned_tree <- prune.tree(full_tree, best = tree_sizes[i])
  if (is.null(pruned_tree)) {
    cat("Pruning failed for size:", tree_sizes[i], "\n")
  } else {
    plot(pruned_tree)
    text(pruned_tree, pretty = 0)
    # Training accuracy: column 2 holds the predicted probability of class 1
    tree_preds_tr <- predict(pruned_tree, newdata = train_data)
    classifications_tr <- ifelse(tree_preds_tr[,2] > 0.5, 1, 0)
    tr_accs[i] <- accuracy(classifications_tr, train_data$high_booking_rate)
    # Validation accuracy with the same 0.5 cutoff
    tree_preds_va <- predict(pruned_tree, newdata = valid_data)
    classifications_va <- ifelse(tree_preds_va[,2] > 0.5, 1, 0)
    va_accs[i] <- accuracy(classifications_va, valid_data$high_booking_rate)
  }
}
tr_accs
## [1] 0.7561429 0.7561429 0.7627143 0.7641429 0.7672857 0.7672857 0.7672857
## [8] 0.7705714 0.7711429 0.7764286 0.7765714
va_accs
## [1] 0.7546667 0.7546667 0.7656667 0.7580000 0.7590000 0.7590000 0.7590000
## [8] 0.7680000 0.7673333 0.7650000 0.7656667
plot(tree_sizes,tr_accs, col = "orange", type = 'l', ylim = c(0.7,0.8))
lines(tree_sizes,va_accs, col = "black")
legend("topright", legend = c("Training Accuracy", "Validation Accuracy"), col = c("orange", "black"), lty = 1)
points(tree_sizes, tr_accs, col = "orange", pch = 16)
points(tree_sizes, va_accs, col = "black", pch = 16)
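The curves above can also be read as numbers. The sketch below is illustrative only: it re-enters the accuracies printed above (rounded as displayed) and computes the gap between training and validation accuracy, which is one rough indicator of overfitting. The names `sizes_printed`, `tr_printed`, `va_printed`, `acc_summary`, and `best_row` are made up for this example and are not part of the assignment code.

```r
# Values copied from the printed output above (rounded as displayed)
sizes_printed <- c(2, 4, 6, 8, 10, 15, 20, 25, 30, 35, 40)
tr_printed <- c(0.7561429, 0.7561429, 0.7627143, 0.7641429, 0.7672857,
                0.7672857, 0.7672857, 0.7705714, 0.7711429, 0.7764286,
                0.7765714)
va_printed <- c(0.7546667, 0.7546667, 0.7656667, 0.7580000, 0.7590000,
                0.7590000, 0.7590000, 0.7680000, 0.7673333, 0.7650000,
                0.7656667)

# One row per tree size, with the train-minus-validation gap
acc_summary <- data.frame(
  size  = sizes_printed,
  train = tr_printed,
  valid = va_printed,
  gap   = tr_printed - va_printed
)
print(acc_summary)

# Row with the highest validation accuracy
best_row <- acc_summary[which.max(acc_summary$valid), ]
print(best_row)
```

In these numbers the gap is widest at 35 and 40 terminal nodes, while validation accuracy peaks at 25.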
ANSWER TO QUESTION 3c HERE: The best tree size is 25 terminal nodes, chosen because it gives the highest validation accuracy (about 76.8%) among the sizes tested. This is slightly better than the logistic regression model’s accuracy of about 75.47%. Weighing interpretability, simplicity, and accuracy, I would choose the decision tree to predict high_booking_rate, because it performs better on the validation set and is easy to interpret.
my_tree_size <- tree_sizes[which.max(va_accs)]
my_tree_size
## [1] 25
my_va_accs <- va_accs[which.max(va_accs)]
my_va_accs
## [1] 0.768
tra_x <- train_data[,-which(names(train_data) == "high_booking_rate")]
tra_y <- train_data$high_booking_rate
val_x <- valid_data[,-which(names(valid_data) == "high_booking_rate")]
val_y <- valid_data$high_booking_rate
4b. Compute kNN estimates in the training and validation data using k values of 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 15, 20, 50, 100, and 200. Assume a cutoff of 0.5. Plot the accuracy in the validation and training sets for each k value. Make sure the two sets of points are different colors! Once again, you might want to play with the ylim() on your graph. It also might help to plot the log(k) on the x-axis rather than just k.
Note: kNN will take a little more time than logistic regression or trees to run, so be patient!
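The 0.5 cutoff can be made explicit with `knn()` from the `class` package. The sketch below uses a toy one-feature dataset (hypothetical, not the Airbnb data) and assumes the labels are coded 0/1. With `prob = TRUE`, the `"prob"` attribute holds the vote share of the winning class, so it has to be flipped for cases predicted as 0 before it can be treated as a probability of class 1. All `toy_*` names, `win_share`, and `p_class1` are made up for this example.

```r
library(class)  # provides knn()

# Toy data (hypothetical): one numeric feature and a binary label
toy_train_x <- matrix(c(1, 2, 3, 10, 11, 12), ncol = 1)
toy_train_y <- factor(c(0, 0, 0, 1, 1, 1))
toy_test_x  <- matrix(c(2, 6.2, 11), ncol = 1)

# Majority-vote predictions, keeping the vote share of the winning class
toy_preds <- knn(toy_train_x, toy_test_x, toy_train_y, k = 3, prob = TRUE)

# Convert "share of the winning class" into "share of class 1"
win_share <- attr(toy_preds, "prob")
p_class1  <- ifelse(toy_preds == "1", win_share, 1 - win_share)

# Apply the 0.5 cutoff explicitly
toy_class <- ifelse(p_class1 > 0.5, 1, 0)
print(data.frame(x = toy_test_x[, 1], p_class1 = p_class1, class = toy_class))
```

For two classes, the majority vote that `knn()` returns agrees with a 0.5 cutoff except when the vote is exactly tied, which can happen for even values of k.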
ANSWER TO QUESTION 4b HERE:
kvec <- c(1,2,3,4,5,6,7,8,9,10,15,20,50,100,200)
va_acc <- rep(0, length(kvec))
tr_acc <- rep(0, length(kvec))
for(i in seq_along(kvec)){
  inner_k <- kvec[i]
  inner_tr_preds <- knn(tra_x, tra_x, tra_y, k=inner_k, prob = TRUE)
  inner_tr_acc <- accuracy(inner_tr_preds, tra_y)
  tr_acc[i] <- inner_tr_acc
  inner_va_preds <- knn(tra_x, val_x, tra_y, k=inner_k, prob = TRUE)
  inner_va_acc <- accuracy(inner_va_preds, val_y)
  va_acc[i] <- inner_va_acc
}
tr_acc
## [1] 0.9974286 0.8391429 0.8360000 0.8061429 0.7984286 0.7888571 0.7877143
## [8] 0.7802857 0.7777143 0.7760000 0.7690000 0.7651429 0.7568571 0.7561429
## [15] 0.7561429
va_acc
## [1] 0.6806667 0.7063333 0.7226667 0.7223333 0.7300000 0.7293333 0.7386667
## [8] 0.7376667 0.7440000 0.7473333 0.7513333 0.7530000 0.7536667 0.7546667
## [15] 0.7546667
plot(log(kvec),tr_acc, col = "lightgreen", type = 'l', ylim = c(0.6,1))
lines(log(kvec),va_acc, col = "black")
legend("topright", legend = c("Training Accuracy", "Validation Accuracy"), col = c("lightgreen", "black"), lty = 1)
points(log(kvec), tr_acc, col = "lightgreen", pch = 16)
points(log(kvec), va_acc, col = "black", pch = 16)
bst_k <- kvec[which.max(va_acc)]
bst_k
## [1] 100
bst_va_acc <- va_acc[which.max(va_acc)]
bst_va_acc
## [1] 0.7546667
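One detail in the selection above: the printed validation accuracies for k = 100 and k = 200 are identical, and `which.max()` returns the index of the first maximum, which is why k = 100 is reported. The sketch below re-enters the printed values (rounded as displayed, so a tie here is only as exact as the printout) and lists every k that reaches the maximum. The names `k_values`, `va_acc_printed`, `first_best`, and `tied_k` are made up for this example.

```r
# Values copied from the printed output above (rounded as displayed)
k_values <- c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 15, 20, 50, 100, 200)
va_acc_printed <- c(0.6806667, 0.7063333, 0.7226667, 0.7223333, 0.7300000,
                    0.7293333, 0.7386667, 0.7376667, 0.7440000, 0.7473333,
                    0.7513333, 0.7530000, 0.7536667, 0.7546667, 0.7546667)

# which.max() keeps only the first position of the maximum
first_best <- k_values[which.max(va_acc_printed)]

# All k values whose validation accuracy equals the maximum
tied_k <- k_values[va_acc_printed == max(va_acc_printed)]

print(first_best)
print(tied_k)
```

When several k values tie, picking among them is a judgment call rather than something the accuracy numbers decide.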