Due March 10, 2024

Problem Overview

The goal of this homework is to give you more hands-on practice with data cleaning and feature engineering, experience with classification trees and k-nearest neighbors (kNN), and a better understanding of overfitting. You will:

  1. Clean and create additional features in the Airbnb dataset (which you will use for your project).
  2. Develop logistic regression, classification tree, and kNN models.
  3. Construct fitting or complexity-performance curves.
  4. Select the model with the best performance.

The Assignment

The data in the accompanying file “airbnb_hw2.csv” (posted on Canvas) contains data from 10,000 Airbnb.com listings, mostly from the US. This is a larger subset of the data that you will eventually use for the class project. The data dictionary is available on ELMS.

Your task is to develop models to predict the target variable “high_booking_rate”, which labels whether a listing is popular (i.e. spends most of the time booked) or not.

Please answer the questions below clearly and concisely, providing tables or plots where applicable. Turn in a well-formatted compiled HTML document using R Markdown, containing clear answers to the questions and R code in the appropriate places.

RUBRIC: To receive a passing score on this assignment, you must do the following:

  1. Turn in a well-formatted compiled HTML document using R Markdown.
  2. Provide clear answers to the questions and the correct R commands as necessary, in the appropriate places. You may answer up to three sub-questions incorrectly (e.g., an incorrect R command or a missing answer) and still receive a P on this assignment; each lettered part, such as 1(a), counts as one sub-question.
  3. The entire document must be clear, concise, and readable.

Note that this assignment is somewhat open-ended, and there are many ways to answer these questions. Your answers do not need to match mine exactly for you to receive full credit.

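library(tidyverse)  # read_csv(), parse_number(), and the dplyr verbs used below
library(caret)      # dummyVars()
airbnb <- read_csv("airbnb_hw2.csv")  # read the dataset into R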
## Rows: 10000 Columns: 19
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (11): name, bed_type, cancellation_policy, cleaning_fee, price, property...
## dbl  (7): accommodates, bedrooms, beds, host_total_listings_count, high_book...
## lgl  (1): host_is_superhost
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
names(airbnb)                       #variables used in dataset
##  [1] "name"                      "accommodates"             
##  [3] "bed_type"                  "bedrooms"                 
##  [5] "beds"                      "cancellation_policy"      
##  [7] "cleaning_fee"              "host_total_listings_count"
##  [9] "price"                     "property_type"            
## [11] "room_type"                 "high_booking_rate"        
## [13] "bathrooms"                 "extra_people"             
## [15] "host_acceptance_rate"      "host_is_superhost"        
## [17] "host_response_rate"        "minimum_nights"           
## [19] "market"

0: Example answer

What is the mean of the accommodates variable?

ANSWER: The mean number of people that can be accommodated in a listing in this dataset is 3.522893.

accommodates_mean <- airbnb %>%
  summarise(mean_accommodates = mean(accommodates))

1: EDA and Data Cleaning

  a. Repeat the data cleaning from HW1; you should be able to reuse most of your code. As a reminder, this requires the following steps (I recommend doing them in the order listed):

Make sure these variables are factors:

  - property_category
  - bed_category
  - cancellation_policy
  - room_type
  - ppp_ind
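A quick way to confirm this requirement is to check each listed column with `is.factor()`. The sketch below is a hypothetical helper (`check_factors()` is not part of any package) written in base R; it returns the names of required columns that are not factors, so an empty result means the check passed.

```r
# Hypothetical helper (base R): return the required columns that are NOT factors.
check_factors <- function(df, required_factors) {
  missing_cols <- setdiff(required_factors, names(df))
  if (length(missing_cols) > 0) {
    stop("Columns not found: ", paste(missing_cols, collapse = ", "))
  }
  is_fac <- vapply(df[required_factors], is.factor, logical(1))
  names(is_fac)[!is_fac]  # character(0) when every listed column is a factor
}

# Toy data: bed_category was built with ifelse() and never converted
toy <- data.frame(
  property_category = factor(c("house", "apartment")),
  bed_category = c("Bed", "other"),
  stringsAsFactors = FALSE
)
check_factors(toy, c("property_category", "bed_category"))
# returns "bed_category"
```

On the cleaned data you would call something like `check_factors(cleandata, c("property_category", "bed_category", "cancellation_policy", "room_type", "ppp_ind"))`. Because `ifelse()` returns a plain character vector, any column created with it needs an explicit `as.factor()`.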

#PUT QUESTION 1a CODE HERE
cleandata <- airbnb %>%
  mutate(
    # fold super_strict_30 into strict
    cancellation_policy = as.factor(ifelse(cancellation_policy == "super_strict_30",
                                           "strict", cancellation_policy)),
    # parse the character price and fee columns into numbers; treat missing values as 0
    price = parse_number(price),
    cleaning_fee = parse_number(cleaning_fee),
    cleaning_fee = ifelse(is.na(cleaning_fee), 0, cleaning_fee),
    price = ifelse(is.na(price), 0, price),
    # mean-impute missing numeric values
    accommodates = ifelse(is.na(accommodates), mean(accommodates, na.rm = TRUE), accommodates),
    bedrooms = ifelse(is.na(bedrooms), mean(bedrooms, na.rm = TRUE), bedrooms),
    beds = ifelse(is.na(beds), mean(beds, na.rm = TRUE), beds),
    host_total_listings_count = ifelse(is.na(host_total_listings_count),
                                       mean(host_total_listings_count, na.rm = TRUE),
                                       host_total_listings_count),
    # engineered features
    price_per_person = price / accommodates,
    has_cleaning_fee = ifelse(cleaning_fee != 0, "YES", "NO"),
    bed_category = as.factor(ifelse(bed_type == "Real Bed", "Bed", "other")),
    property_category = as.factor(case_when(
      property_type %in% c("Bed & Breakfast", "Boutique hotel", "Hostel") ~ "hotel",
      property_type %in% c("Apartment", "Serviced apartment", "Loft") ~ "apartment",
      property_type %in% c("Townhouse", "Condominium") ~ "condo",
      property_type %in% c("Bungalow", "House") ~ "house",
      TRUE ~ "other")),
    bed_type = as.factor(bed_type),
    room_type = as.factor(room_type)) %>%
  # ppp_ind = 1 if price per person is above the median of its property category
  group_by(property_category) %>%
  mutate(median_CC = median(price_per_person)) %>%
  ungroup() %>%
  mutate(ppp_ind = as.factor(ifelse(price_per_person > median_CC, 1, 0)))
summary(cleandata)
##      name            accommodates             bed_type       bedrooms     
##  Length:10000       Min.   : 1.000   Airbed       :  67   Min.   : 0.000  
##  Class :character   1st Qu.: 2.000   Couch        :  19   1st Qu.: 1.000  
##  Mode  :character   Median : 3.000   Futon        : 113   Median : 1.000  
##                     Mean   : 3.523   Pull-out Sofa:  77   Mean   : 1.365  
##                     3rd Qu.: 4.000   Real Bed     :9724   3rd Qu.: 2.000  
##                     Max.   :16.000                        Max.   :11.000  
##                                                                           
##       beds             cancellation_policy  cleaning_fee
##  Min.   : 0.000   flexible       :2216     Min.   :  0  
##  1st Qu.: 1.000   moderate       :3081     1st Qu.: 12  
##  Median : 1.000   strict         :4687     Median : 40  
##  Mean   : 1.892   super_strict_60:  16     Mean   : 55  
##  3rd Qu.: 2.000                            3rd Qu.: 80  
##  Max.   :16.000                            Max.   :950  
##                                                         
##  host_total_listings_count     price        property_type     
##  Min.   :  0.000           Min.   :   0.0   Length:10000      
##  1st Qu.:  1.000           1st Qu.:  71.0   Class :character  
##  Median :  1.000           Median : 109.0   Mode  :character  
##  Mean   :  9.176           Mean   : 154.4                     
##  3rd Qu.:  3.000           3rd Qu.: 175.0                     
##  Max.   :992.000           Max.   :5000.0                     
##                                                               
##            room_type    high_booking_rate   bathrooms      extra_people      
##  Entire home/apt:6080   Min.   :0.0000    Min.   : 0.000   Length:10000      
##  Private room   :3659   1st Qu.:0.0000    1st Qu.: 1.000   Class :character  
##  Shared room    : 261   Median :0.0000    Median : 1.000   Mode  :character  
##                         Mean   :0.2443    Mean   : 1.287                     
##                         3rd Qu.:0.0000    3rd Qu.: 1.000                     
##                         Max.   :1.0000    Max.   :17.000                     
##                                           NA's   :31                         
##  host_acceptance_rate host_is_superhost host_response_rate minimum_nights    
##  Length:10000         Mode :logical     Length:10000       Min.   :   1.000  
##  Class :character     FALSE:7445        Class :character   1st Qu.:   1.000  
##  Mode  :character     TRUE :2536        Mode  :character   Median :   2.000  
##                       NA's :19                             Mean   :   3.378  
##                                                            3rd Qu.:   3.000  
##                                                            Max.   :1125.000  
##                                                                              
##     market          price_per_person  has_cleaning_fee   bed_category      
##  Length:10000       Min.   :   0.00   Length:10000       Length:10000      
##  Class :character   1st Qu.:  27.50   Class :character   Class :character  
##  Mode  :character   Median :  39.50   Mode  :character   Mode  :character  
##                     Mean   :  46.80                                        
##                     3rd Qu.:  57.17                                        
##                     Max.   :1600.00                                        
##                                                                            
##  property_category   median_CC     ppp_ind 
##  apartment:5687    Min.   :35.00   0:5118  
##  condo    : 633    1st Qu.:35.00   1:4882  
##  hotel    :  70    Median :42.50           
##  house    :3197    Mean   :39.54           
##  other    : 413    3rd Qu.:42.50           
##                    Max.   :42.50           
## 
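The `ppp_ind` step compares each listing's price per person with the median of its own `property_category`. The base-R sketch below reproduces that logic on five made-up listings (toy data, not from the Airbnb file), using `ave()` in place of `group_by()` plus `mutate()`. As in the code above, a listing exactly at its group median gets 0, because the comparison is a strict `>`.

```r
# Toy listings (made up for illustration)
listings <- data.frame(
  property_category = c("house", "house", "house", "apartment", "apartment"),
  price = c(100, 300, 200, 90, 120),
  accommodates = c(2, 3, 4, 3, 2)
)

listings$price_per_person <- listings$price / listings$accommodates
# ave() returns, for each row, the median of that row's own group
listings$median_ppp <- ave(listings$price_per_person,
                           listings$property_category,
                           FUN = median)
# 1 = above the group median, 0 = at or below it
listings$ppp_ind <- factor(ifelse(listings$price_per_person > listings$median_ppp, 1, 0),
                           levels = c(0, 1))
listings
```

Here the house medians are 50 and the apartment medians 45, so only the second and fifth listings get `ppp_ind = 1`.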
  b. This dataset includes additional variables. Clean them as follows to prepare them for use in models:
final_cleandata <- cleandata %>%
  mutate(
    # median-impute bathrooms; treat a missing superhost flag as FALSE
    bathrooms = ifelse(is.na(bathrooms), median(bathrooms, na.rm = TRUE), bathrooms),
    host_is_superhost = ifelse(is.na(host_is_superhost), FALSE, host_is_superhost),
    # YES if the listing charges anything for extra guests
    charges_for_extra = as.factor(ifelse(parse_number(extra_people) > 0, "YES", "NO")),
    # ALL = "100%", SOME = any other rate, MISSING = no rate reported
    host_acceptance = as.factor(ifelse(is.na(host_acceptance_rate), "MISSING",
                                       ifelse(host_acceptance_rate == "100%", "ALL", "SOME"))),
    host_response = as.factor(ifelse(is.na(host_response_rate), "MISSING",
                                     ifelse(host_response_rate == "100%", "ALL", "SOME"))),
    has_min_nights = ifelse(minimum_nights > 1, "YES", "NO"),
    # markets with fewer than 300 listings (or no market) are grouped into OTHER
    market = as.factor(ifelse(is.na(market) | table(market)[market] < 300, "OTHER", market)),
    high_booking_rate = as.factor(high_booking_rate))
summary(final_cleandata)
##      name            accommodates             bed_type       bedrooms     
##  Length:10000       Min.   : 1.000   Airbed       :  67   Min.   : 0.000  
##  Class :character   1st Qu.: 2.000   Couch        :  19   1st Qu.: 1.000  
##  Mode  :character   Median : 3.000   Futon        : 113   Median : 1.000  
##                     Mean   : 3.523   Pull-out Sofa:  77   Mean   : 1.365  
##                     3rd Qu.: 4.000   Real Bed     :9724   3rd Qu.: 2.000  
##                     Max.   :16.000                        Max.   :11.000  
##                                                                           
##       beds             cancellation_policy  cleaning_fee
##  Min.   : 0.000   flexible       :2216     Min.   :  0  
##  1st Qu.: 1.000   moderate       :3081     1st Qu.: 12  
##  Median : 1.000   strict         :4687     Median : 40  
##  Mean   : 1.892   super_strict_60:  16     Mean   : 55  
##  3rd Qu.: 2.000                            3rd Qu.: 80  
##  Max.   :16.000                            Max.   :950  
##                                                         
##  host_total_listings_count     price        property_type     
##  Min.   :  0.000           Min.   :   0.0   Length:10000      
##  1st Qu.:  1.000           1st Qu.:  71.0   Class :character  
##  Median :  1.000           Median : 109.0   Mode  :character  
##  Mean   :  9.176           Mean   : 154.4                     
##  3rd Qu.:  3.000           3rd Qu.: 175.0                     
##  Max.   :992.000           Max.   :5000.0                     
##                                                               
##            room_type    high_booking_rate   bathrooms      extra_people      
##  Entire home/apt:6080   0:7557            Min.   : 0.000   Length:10000      
##  Private room   :3659   1:2443            1st Qu.: 1.000   Class :character  
##  Shared room    : 261                     Median : 1.000   Mode  :character  
##                                           Mean   : 1.286                     
##                                           3rd Qu.: 1.000                     
##                                           Max.   :17.000                     
##                                                                              
##  host_acceptance_rate host_is_superhost host_response_rate minimum_nights    
##  Length:10000         Mode :logical     Length:10000       Min.   :   1.000  
##  Class :character     FALSE:7464        Class :character   1st Qu.:   1.000  
##  Mode  :character     TRUE :2536        Mode  :character   Median :   2.000  
##                                                            Mean   :   3.378  
##                                                            3rd Qu.:   3.000  
##                                                            Max.   :1125.000  
##                                                                              
##          market     price_per_person  has_cleaning_fee   bed_category      
##  New York   :3307   Min.   :   0.00   Length:10000       Length:10000      
##  Los Angeles:2106   1st Qu.:  27.50   Class :character   Class :character  
##  OTHER      : 621   Median :  39.50   Mode  :character   Mode  :character  
##  D.C.       : 492   Mean   :  46.80                                        
##  Austin     : 491   3rd Qu.:  57.17                                        
##  New Orleans: 428   Max.   :1600.00                                        
##  (Other)    :2555                                                          
##  property_category   median_CC     ppp_ind  charges_for_extra host_acceptance
##  apartment:5687    Min.   :35.00   0:5118   NO :4693          ALL    : 488   
##  condo    : 633    1st Qu.:35.00   1:4882   YES:5307          MISSING:9216   
##  hotel    :  70    Median :42.50                              SOME   : 296   
##  house    :3197    Mean   :39.54                                             
##  other    : 413    3rd Qu.:42.50                                             
##                    Max.   :42.50                                             
##                                                                              
##  host_response  has_min_nights    
##  ALL    :6581   Length:10000      
##  MISSING:1662   Class :character  
##  SOME   :1757   Mode  :character  
##                                   
##                                   
##                                   
## 
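Two recoding patterns in 1b are worth isolating: keeping a missing rate as its own MISSING level instead of dropping the row, and lumping rare markets into OTHER. The helpers below (`rate_to_category()` and `lump_rare()` are hypothetical names, base R only) sketch both on toy vectors. `lump_rare()` looks up counts with `match()` on the table names, which avoids indexing a table with `NA`.

```r
# ALL = "100%", SOME = any other rate, MISSING = NA (kept as its own level)
rate_to_category <- function(rate) {
  factor(ifelse(is.na(rate), "MISSING",
                ifelse(rate == "100%", "ALL", "SOME")),
         levels = c("ALL", "MISSING", "SOME"))
}

# Replace levels seen fewer than min_count times (and NAs) by "OTHER"
lump_rare <- function(x, min_count) {
  x <- as.character(x)       # work on labels: ifelse() with a factor yields integer codes
  counts <- table(x)         # table() drops NA by default
  n_per_level <- as.vector(counts)[match(x, names(counts))]  # NA where x is NA
  rare <- is.na(x) | n_per_level < min_count
  factor(ifelse(rare, "OTHER", x))
}

rate_to_category(c("100%", "85%", NA))
# ALL SOME MISSING
lump_rare(c("Austin", "Austin", "Austin", "Boston", NA), min_count = 2)
# Austin Austin Austin OTHER OTHER
```

In 1b the threshold is 300 listings, which is why a market such as Austin (491 listings) keeps its own level while the smaller markets end up in OTHER (621 listings in total).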

2: Modeling Setup

  a. Select the variables listed below from your data frame; these will be the features used in our models (plus high_booking_rate, the target variable). Create a data frame of dummy variables from the resulting set of variables. Reminder: include the argument fullRank = TRUE when you call dummyVars()! Convert the dummy variable associated with high_booking_rate to a factor.

How many dummy variables do you end up with in your resulting data frame?

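ANSWER TO QUESTION 2a HERE: The resulting data frame has 36 columns: 29 dummy variables, 6 numeric features (accommodates, bedrooms, beds, host_total_listings_count, price, bathrooms), and the high_booking_rate target.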

final_data <- final_cleandata %>%
  select(accommodates, bedrooms, beds, cancellation_policy, has_cleaning_fee,
         host_total_listings_count, price, ppp_ind, property_category, bed_category,
         bathrooms, charges_for_extra, host_acceptance, host_response,
         has_min_nights, market, host_is_superhost, high_booking_rate)
dummy_variable <- dummyVars(~ ., data = final_data, fullRank = TRUE)
final_data1 <- data.frame(predict(dummy_variable, newdata = final_data))
# dummyVars() turns the factor target into a 0/1 column named high_booking_rate.1;
# convert it back to a factor under its original name
final_data1 <- final_data1 %>%
  mutate(high_booking_rate = as.factor(high_booking_rate.1)) %>%
  select(-high_booking_rate.1)
summary(final_data1)
##   accommodates       bedrooms           beds       
##  Min.   : 1.000   Min.   : 0.000   Min.   : 0.000  
##  1st Qu.: 2.000   1st Qu.: 1.000   1st Qu.: 1.000  
##  Median : 3.000   Median : 1.000   Median : 1.000  
##  Mean   : 3.523   Mean   : 1.365   Mean   : 1.892  
##  3rd Qu.: 4.000   3rd Qu.: 2.000   3rd Qu.: 2.000  
##  Max.   :16.000   Max.   :11.000   Max.   :16.000  
##  cancellation_policy.moderate cancellation_policy.strict
##  Min.   :0.0000               Min.   :0.0000            
##  1st Qu.:0.0000               1st Qu.:0.0000            
##  Median :0.0000               Median :0.0000            
##  Mean   :0.3081               Mean   :0.4687            
##  3rd Qu.:1.0000               3rd Qu.:1.0000            
##  Max.   :1.0000               Max.   :1.0000            
##  cancellation_policy.super_strict_60 has_cleaning_feeYES
##  Min.   :0.0000                      Min.   :0.0000     
##  1st Qu.:0.0000                      1st Qu.:1.0000     
##  Median :0.0000                      Median :1.0000     
##  Mean   :0.0016                      Mean   :0.7998     
##  3rd Qu.:0.0000                      3rd Qu.:1.0000     
##  Max.   :1.0000                      Max.   :1.0000     
##  host_total_listings_count     price          ppp_ind.1     
##  Min.   :  0.000           Min.   :   0.0   Min.   :0.0000  
##  1st Qu.:  1.000           1st Qu.:  71.0   1st Qu.:0.0000  
##  Median :  1.000           Median : 109.0   Median :0.0000  
##  Mean   :  9.176           Mean   : 154.4   Mean   :0.4882  
##  3rd Qu.:  3.000           3rd Qu.: 175.0   3rd Qu.:1.0000  
##  Max.   :992.000           Max.   :5000.0   Max.   :1.0000  
##  property_category.condo property_category.hotel property_category.house
##  Min.   :0.0000          Min.   :0.000           Min.   :0.0000         
##  1st Qu.:0.0000          1st Qu.:0.000           1st Qu.:0.0000         
##  Median :0.0000          Median :0.000           Median :0.0000         
##  Mean   :0.0633          Mean   :0.007           Mean   :0.3197         
##  3rd Qu.:0.0000          3rd Qu.:0.000           3rd Qu.:1.0000         
##  Max.   :1.0000          Max.   :1.000           Max.   :1.0000         
##  property_category.other bed_categoryother   bathrooms     
##  Min.   :0.0000          Min.   :0.0000    Min.   : 0.000  
##  1st Qu.:0.0000          1st Qu.:0.0000    1st Qu.: 1.000  
##  Median :0.0000          Median :0.0000    Median : 1.000  
##  Mean   :0.0413          Mean   :0.0276    Mean   : 1.286  
##  3rd Qu.:0.0000          3rd Qu.:0.0000    3rd Qu.: 1.000  
##  Max.   :1.0000          Max.   :1.0000    Max.   :17.000  
##  charges_for_extra.YES host_acceptance.MISSING host_acceptance.SOME
##  Min.   :0.0000        Min.   :0.0000          Min.   :0.0000      
##  1st Qu.:0.0000        1st Qu.:1.0000          1st Qu.:0.0000      
##  Median :1.0000        Median :1.0000          Median :0.0000      
##  Mean   :0.5307        Mean   :0.9216          Mean   :0.0296      
##  3rd Qu.:1.0000        3rd Qu.:1.0000          3rd Qu.:0.0000      
##  Max.   :1.0000        Max.   :1.0000          Max.   :1.0000      
##  host_response.MISSING host_response.SOME has_min_nightsYES market.Boston   
##  Min.   :0.0000        Min.   :0.0000     Min.   :0.0000    Min.   :0.0000  
##  1st Qu.:0.0000        1st Qu.:0.0000     1st Qu.:0.0000    1st Qu.:0.0000  
##  Median :0.0000        Median :0.0000     Median :1.0000    Median :0.0000  
##  Mean   :0.1662        Mean   :0.1757     Mean   :0.6329    Mean   :0.0329  
##  3rd Qu.:0.0000        3rd Qu.:0.0000     3rd Qu.:1.0000    3rd Qu.:0.0000  
##  Max.   :1.0000        Max.   :1.0000     Max.   :1.0000    Max.   :1.0000  
##  market.Chicago    market.D.C.     market.Denver    market.Los.Angeles
##  Min.   :0.0000   Min.   :0.0000   Min.   :0.0000   Min.   :0.0000    
##  1st Qu.:0.0000   1st Qu.:0.0000   1st Qu.:0.0000   1st Qu.:0.0000    
##  Median :0.0000   Median :0.0000   Median :0.0000   Median :0.0000    
##  Mean   :0.0369   Mean   :0.0492   Mean   :0.0302   Mean   :0.2106    
##  3rd Qu.:0.0000   3rd Qu.:0.0000   3rd Qu.:0.0000   3rd Qu.:0.0000    
##  Max.   :1.0000   Max.   :1.0000   Max.   :1.0000   Max.   :1.0000    
##  market.Nashville market.New.Orleans market.New.York   market.OTHER   
##  Min.   :0.0000   Min.   :0.0000     Min.   :0.0000   Min.   :0.0000  
##  1st Qu.:0.0000   1st Qu.:0.0000     1st Qu.:0.0000   1st Qu.:0.0000  
##  Median :0.0000   Median :0.0000     Median :0.0000   Median :0.0000  
##  Mean   :0.0419   Mean   :0.0428     Mean   :0.3307   Mean   :0.0621  
##  3rd Qu.:0.0000   3rd Qu.:0.0000     3rd Qu.:1.0000   3rd Qu.:0.0000  
##  Max.   :1.0000   Max.   :1.0000     Max.   :1.0000   Max.   :1.0000  
##  market.Portland  market.San.Diego market.San.Francisco host_is_superhostTRUE
##  Min.   :0.0000   Min.   :0.0000   Min.   :0.0000       Min.   :0.0000       
##  1st Qu.:0.0000   1st Qu.:0.0000   1st Qu.:0.0000       1st Qu.:0.0000       
##  Median :0.0000   Median :0.0000   Median :0.0000       Median :0.0000       
##  Mean   :0.0386   Mean   :0.0367   Mean   :0.0383       Mean   :0.2536       
##  3rd Qu.:0.0000   3rd Qu.:0.0000   3rd Qu.:0.0000       3rd Qu.:1.0000       
##  Max.   :1.0000   Max.   :1.0000   Max.   :1.0000       Max.   :1.0000       
##  high_booking_rate
##  0:7557           
##  1:2443           
##                   
##                   
##                   
## 
ncol(final_data1)
## [1] 36
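Where the count comes from: with `fullRank = TRUE`, `dummyVars()` encodes factors consistently with base R's `model.matrix()`, so a factor with k levels contributes k - 1 dummy columns and its first level becomes the reference. That is why, for example, `flexible` has no cancellation_policy column in the summary above. The sketch below shows the same behavior with `model.matrix()` on toy data. Note that `model.matrix()` joins names without a separator (`cancellation_policymoderate`), whereas the `dummyVars()` output above uses a `.` for factor dummies.

```r
# Full-rank dummy coding with base R model.matrix() (toy data)
toy <- data.frame(
  cancellation_policy = factor(c("flexible", "moderate", "strict", "strict")),
  price = c(80, 120, 95, 60)
)

# model.matrix() adds an "(Intercept)" column; drop it to keep only features
X <- model.matrix(~ ., data = toy)[, -1]
colnames(X)
# "cancellation_policymoderate" "cancellation_policystrict" "price"
ncol(X)
# 3: two dummies for the 3-level factor (flexible is the reference) plus price
```

Dropping the reference level avoids perfect collinearity with the intercept; with a full set of dummies, `glm()` would report `NA` for one of the aliased coefficients.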
  b. Split your data into 70% training and 30% validation.
set.seed(1)  # any fixed seed makes the split reproducible
train_insts <- sample(nrow(final_data1), 0.7 * nrow(final_data1))
train_data <- final_data1[train_insts, ]
valid_data <- final_data1[-train_insts, ]
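`sample()` draws a different split on every run unless the random seed is fixed. The sketch below wraps the split in a hypothetical helper, `split_70_30()`, that sets a seed first (the default `seed = 1` is arbitrary) and rounds the training size so it is always a whole number of rows.

```r
# Hypothetical helper (base R): reproducible train/validation split.
# Note: set.seed() also changes the global random-number state.
split_70_30 <- function(df, train_frac = 0.7, seed = 1) {
  set.seed(seed)
  train_idx <- sample(nrow(df), size = round(train_frac * nrow(df)))
  list(train = df[train_idx, , drop = FALSE],
       valid = df[-train_idx, , drop = FALSE])
}

toy <- data.frame(id = 1:10, y = rep(c(0, 1), 5))
parts <- split_70_30(toy)
nrow(parts$train)  # 7
nrow(parts$valid)  # 3
```

Because the rows are sampled without replacement, every listing lands in exactly one of the two sets.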
  c. Train a logistic regression model to predict high_booking_rate on the training data (using the remaining variables as predictors). Report the accuracy on the validation data.

ANSWER TO QUESTION 2c HERE: 0.7786667

model_log <- glm(high_booking_rate ~ ., data = train_data, family = "binomial")
# the argument is `newdata`; a misspelled `new_data` is ignored, and predict()
# then returns fitted values for the 7,000 training rows instead
prediction1 <- predict(model_log, newdata = valid_data, type = "response")
# classify at a 0.5 cutoff, using the same factor levels as the target
classification1 <- factor(ifelse(prediction1 > 0.5, 1, 0), levels = c(0, 1))
accuracy <- mean(classification1 == valid_data$high_booking_rate)
accuracy
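The `newdata` spelling matters: `predict()` for a `glm` only recognizes the argument `newdata`, so a misspelled `new_data` falls into `...` and the call returns one prediction per training row (7,000) rather than per validation row (3,000). Comparing those with the validation labels recycles vectors and yields a meaningless accuracy. The sketch below reproduces the effect on simulated data (not the Airbnb file). It also computes a majority-class baseline, the natural yardstick here: 7,557 of the 10,000 listings have `high_booking_rate = 0`, so always predicting 0 would already score roughly 0.756 on the full data.

```r
# Simulated data (not the Airbnb data): one predictor, binary factor target
set.seed(42)
n <- 200
sim <- data.frame(x = rnorm(n))
sim$y <- factor(rbinom(n, 1, plogis(1.5 * sim$x)), levels = c(0, 1))

train_idx <- sample(n, 140)
train <- sim[train_idx, ]
valid <- sim[-train_idx, ]

fit <- glm(y ~ x, data = train, family = "binomial")

wrong <- predict(fit, new_data = valid, type = "response")  # argument not recognized
right <- predict(fit, newdata = valid, type = "response")
length(wrong)  # 140: one value per training row
length(right)  # 60: one value per validation row

# Classify at 0.5 with the same factor levels as the target, then compare labels
pred_class <- factor(ifelse(right > 0.5, 1, 0), levels = c(0, 1))
accuracy <- mean(pred_class == valid$y)

# Baseline: always predict the most common class in the training data
majority <- names(which.max(table(train$y)))
baseline <- mean(valid$y == majority)
c(accuracy = accuracy, baseline = baseline)
```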

3: Classification trees

  a. Use the following code to create an unpruned tree (replace YOUR_Y_VAR and YOUR_TRAINING_DATA with the appropriate variable names, then uncomment the line starting with “full_tree…”). How many terminal nodes are in the full tree? Which variable has the highest information gain (i.e., leads to the biggest decrease in impurity)? How do you know?

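ANSWER TO QUESTION 3a HERE: The full tree has 204 terminal nodes. The variable with the highest information gain is host_response.MISSING, the first (root) split in the printout. tree() grows the tree greedily, and at the root it compares every candidate split on the full training set, so the root split is the one with the largest decrease in deviance (impurity).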
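Impurity in `tree()` output is the deviance column. For a classification node with class counts n_k out of n, the deviance is -2 * sum(n_k * log(n_k / n)), and a split's decrease in impurity is the parent's deviance minus the sum of its two children's. The base-R sketch below recomputes this from the `full_tree` printout in this section. The class counts are reconstructed as n × yprob (rounded), node 3 (the root's other child) is obtained as root minus node 2 because the two children partition the root's rows, and the helper names are illustrative, not functions from the tree package.

```r
# Deviance of a classification node from its class counts
node_deviance <- function(counts) {
  counts <- counts[counts > 0]  # 0 * log(0) is taken as 0
  -2 * sum(counts * log(counts / sum(counts)))
}

# Decrease in deviance produced by splitting `parent` into `left` and `right`
split_gain <- function(parent, left, right) {
  node_deviance(parent) - node_deviance(left) - node_deviance(right)
}

# (class 0, class 1) counts reconstructed as n * yprob from the printout
root  <- c(5293, 1707)  # node 1: n = 7000, yprob (0.75614, 0.24386)
node2 <- c(4146, 1693)  # node 2: n = 5839, yprob (0.71005, 0.28995)
node3 <- root - node2   # node 3: the remaining rows of the root

node_deviance(root)   # about 7777, matching the printed root deviance
node_deviance(node2)  # about 7031, matching node 2
split_gain(root, node2, node3)
```

`split_gain()` is non-negative for any split, because dividing a node can never increase the total deviance of the training data it contains.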

library(tree)

myset = tree.control(nrow(train_data), mincut = 5, minsize = 10, mindev = 0.0005)

full_tree= tree(high_booking_rate~., control = myset, train_data)

further <- full_tree$frame %>%
          filter(var == '<leaf>')
terminal_nodes <- nrow(further)
terminal_nodes
## [1] 204
full_tree
## node), split, n, deviance, yval, (yprob)
##       * denotes terminal node
## 
##      1) root 7000 7777.000 0 ( 0.75614 0.24386 )  
##        2) host_response.MISSING < 0.5 5839 7031.000 0 ( 0.71005 0.28995 )  
##          4) host_is_superhostTRUE < 0.5 4122 4489.000 0 ( 0.76565 0.23435 )  
##            8) has_min_nightsYES < 0.5 1623 2091.000 0 ( 0.65496 0.34504 )  
##             16) ppp_ind.1 < 0.5 1023 1372.000 0 ( 0.60606 0.39394 )  
##               32) bedrooms < 0.5 70   90.010 1 ( 0.34286 0.65714 )  
##                 64) host_acceptance.MISSING < 0.5 12   15.280 0 ( 0.66667 0.33333 ) *
##                 65) host_acceptance.MISSING > 0.5 58   68.320 1 ( 0.27586 0.72414 )  
##                  130) property_category.other < 0.5 52   64.190 1 ( 0.30769 0.69231 ) *
##                  131) property_category.other > 0.5 6    0.000 1 ( 0.00000 1.00000 ) *
##               33) bedrooms > 0.5 953 1261.000 0 ( 0.62539 0.37461 )  
##                 66) host_total_listings_count < 2.5 536  669.700 0 ( 0.68284 0.31716 )  
##                  132) beds < 1.5 323  368.100 0 ( 0.74303 0.25697 )  
##                    264) property_category.house < 0.5 236  248.900 0 ( 0.77966 0.22034 )  
##                      528) host_acceptance.MISSING < 0.5 28   37.520 0 ( 0.60714 0.39286 ) *
##                      529) host_acceptance.MISSING > 0.5 208  206.500 0 ( 0.80288 0.19712 )  
##                       1058) price < 135.5 199  202.400 0 ( 0.79397 0.20603 )  
##                         2116) accommodates < 1.5 19    7.835 0 ( 0.94737 0.05263 ) *
##                         2117) accommodates > 1.5 180  190.700 0 ( 0.77778 0.22222 ) *
##                       1059) price > 135.5 9    0.000 0 ( 1.00000 0.00000 ) *
##                    265) property_category.house > 0.5 87  113.300 0 ( 0.64368 0.35632 )  
##                      530) price < 91 82  103.900 0 ( 0.67073 0.32927 )  
##                       1060) price < 72.5 76   98.900 0 ( 0.64474 0.35526 ) *
##                       1061) price > 72.5 6    0.000 0 ( 1.00000 0.00000 ) *
##                      531) price > 91 5    5.004 1 ( 0.20000 0.80000 ) *
##                  133) beds > 1.5 213  288.100 0 ( 0.59155 0.40845 )  
##                    266) bathrooms < 2.25 195  267.100 0 ( 0.56410 0.43590 )  
##                      532) price < 255 187  254.100 0 ( 0.58289 0.41711 )  
##                       1064) price < 133 132  182.700 0 ( 0.52273 0.47727 )  
##                         2128) market.Chicago < 0.5 127  176.100 0 ( 0.50394 0.49606 )  
##                           4256) beds < 2.5 91  124.300 0 ( 0.57143 0.42857 ) *
##                           4257) beds > 2.5 36   45.830 1 ( 0.33333 0.66667 )  
##                             8514) cancellation_policy.moderate < 0.5 25   34.300 1 ( 0.44000 0.56000 ) *
##                             8515) cancellation_policy.moderate > 0.5 11    6.702 1 ( 0.09091 0.90909 ) *
##                         2129) market.Chicago > 0.5 5    0.000 0 ( 1.00000 0.00000 ) *
##                       1065) price > 133 55   64.450 0 ( 0.72727 0.27273 )  
##                         2130) property_category.house < 0.5 37   47.970 0 ( 0.64865 0.35135 )  
##                           4260) price < 231 32   38.020 0 ( 0.71875 0.28125 ) *
##                           4261) price > 231 5    5.004 1 ( 0.20000 0.80000 ) *
##                         2131) property_category.house > 0.5 18   12.560 0 ( 0.88889 0.11111 ) *
##                      533) price > 255 8    6.028 1 ( 0.12500 0.87500 ) *
##                    267) bathrooms > 2.25 18   12.560 0 ( 0.88889 0.11111 )  
##                      534) beds < 5.5 13    0.000 0 ( 1.00000 0.00000 ) *
##                      535) beds > 5.5 5    6.730 0 ( 0.60000 0.40000 ) *
##                 67) host_total_listings_count > 2.5 417  573.600 0 ( 0.55156 0.44844 )  
##                  134) host_total_listings_count < 29 385  532.800 0 ( 0.52468 0.47532 )  
##                    268) property_category.other < 0.5 364  504.600 0 ( 0.50549 0.49451 )  
##                      536) property_category.house < 0.5 223  306.300 1 ( 0.44395 0.55605 )  
##                       1072) price < 24 6    0.000 0 ( 1.00000 0.00000 ) *
##                       1073) price > 24 217  296.400 1 ( 0.42857 0.57143 )  
##                         2146) host_acceptance.SOME < 0.5 211  286.000 1 ( 0.41232 0.58768 )  
##                           4292) market.Los.Angeles < 0.5 131  168.600 1 ( 0.34351 0.65649 )  
##                             8584) charges_for_extra.YES < 0.5 32   24.110 1 ( 0.12500 0.87500 ) *
##                             8585) charges_for_extra.YES > 0.5 99  134.300 1 ( 0.41414 0.58586 )  
##                              17170) has_cleaning_feeYES < 0.5 15   11.780 1 ( 0.13333 0.86667 )  
##                                34340) price < 67 10    0.000 1 ( 0.00000 1.00000 ) *
##                                34341) price > 67 5    6.730 1 ( 0.40000 0.60000 ) *
##                              17171) has_cleaning_feeYES > 0.5 84  116.000 1 ( 0.46429 0.53571 )  
##                                34342) host_total_listings_count < 3.5 26   33.540 0 ( 0.65385 0.34615 ) *
##                                34343) host_total_listings_count > 3.5 58   76.990 1 ( 0.37931 0.62069 )  
##                                  68686) market.D.C. < 0.5 50   68.590 1 ( 0.44000 0.56000 ) *
##                                  68687) market.D.C. > 0.5 8    0.000 1 ( 0.00000 1.00000 ) *
##                           4293) market.Los.Angeles > 0.5 80  110.700 0 ( 0.52500 0.47500 )  
##                             8586) price < 62.5 19   19.560 0 ( 0.78947 0.21053 )  
##                              17172) host_total_listings_count < 6.5 11    0.000 0 ( 1.00000 0.00000 ) *
##                              17173) host_total_listings_count > 6.5 8   11.090 0 ( 0.50000 0.50000 ) *
##                             8587) price > 62.5 61   83.760 1 ( 0.44262 0.55738 )  
##                              17174) price < 74 5    0.000 1 ( 0.00000 1.00000 ) *
##                              17175) price > 74 56   77.560 1 ( 0.48214 0.51786 )  
##                                34350) cancellation_policy.moderate < 0.5 40   54.550 0 ( 0.57500 0.42500 )  
##                                  68700) accommodates < 5.5 22   25.780 0 ( 0.72727 0.27273 )  
##                                   137400) price < 105 11   15.160 1 ( 0.45455 0.54545 ) *
##                                   137401) price > 105 11    0.000 0 ( 1.00000 0.00000 ) *
##                                  68701) accommodates > 5.5 18   24.060 1 ( 0.38889 0.61111 )  
##                                   137402) charges_for_extra.YES < 0.5 5    5.004 0 ( 0.80000 0.20000 ) *
##                                   137403) charges_for_extra.YES > 0.5 13   14.050 1 ( 0.23077 0.76923 ) *
##                                34351) cancellation_policy.moderate > 0.5 16   17.990 1 ( 0.25000 0.75000 ) *
##                         2147) host_acceptance.SOME > 0.5 6    0.000 0 ( 1.00000 0.00000 ) *
##                      537) property_category.house > 0.5 141  189.500 0 ( 0.60284 0.39716 )  
##                       1074) beds < 5.5 131  178.800 0 ( 0.57252 0.42748 )  
##                         2148) host_total_listings_count < 25.5 126  170.100 0 ( 0.59524 0.40476 )  
##                           4296) price < 74.5 93  128.400 0 ( 0.53763 0.46237 )  
##                             8592) market.New.York < 0.5 79  106.700 0 ( 0.59494 0.40506 )  
##                              17184) price < 32.5 13   11.160 0 ( 0.84615 0.15385 ) *
##                              17185) price > 32.5 66   90.950 0 ( 0.54545 0.45455 )  
##                                34370) host_total_listings_count < 14.5 60   81.500 0 ( 0.58333 0.41667 ) *
##                                34371) host_total_listings_count > 14.5 6    5.407 1 ( 0.16667 0.83333 ) *
##                             8593) market.New.York > 0.5 14   14.550 1 ( 0.21429 0.78571 )  
##                              17186) price < 47 8   10.590 1 ( 0.37500 0.62500 ) *
##                              17187) price > 47 6    0.000 1 ( 0.00000 1.00000 ) *
##                           4297) price > 74.5 33   36.550 0 ( 0.75758 0.24242 ) *
##                         2149) host_total_listings_count > 25.5 5    0.000 1 ( 0.00000 1.00000 ) *
##                       1075) beds > 5.5 10    0.000 0 ( 1.00000 0.00000 ) *
##                    269) property_category.other > 0.5 21   17.220 0 ( 0.85714 0.14286 )  
##                      538) accommodates < 3.5 12    0.000 0 ( 1.00000 0.00000 ) *
##                      539) accommodates > 3.5 9   11.460 0 ( 0.66667 0.33333 ) *
##                  135) host_total_listings_count > 29 32   24.110 0 ( 0.87500 0.12500 )  
##                    270) accommodates < 3.5 7    9.561 0 ( 0.57143 0.42857 ) *
##                    271) accommodates > 3.5 25    8.397 0 ( 0.96000 0.04000 ) *
##             17) ppp_ind.1 > 0.5 600  689.800 0 ( 0.73833 0.26167 )  
##               34) market.New.York < 0.5 411  421.600 0 ( 0.79075 0.20925 )  
##                 68) price < 177 299  334.600 0 ( 0.75251 0.24749 )  
##                  136) beds < 1.5 268  285.000 0 ( 0.77612 0.22388 )  
##                    272) cancellation_policy.moderate < 0.5 203  195.700 0 ( 0.81281 0.18719 )  
##                      544) cancellation_policy.strict < 0.5 102   73.890 0 ( 0.88235 0.11765 ) *
##                      545) cancellation_policy.strict > 0.5 101  115.200 0 ( 0.74257 0.25743 )  
##                       1090) accommodates < 2.5 91  108.900 0 ( 0.71429 0.28571 )  
##                         2180) host_total_listings_count < 1.5 28   22.970 0 ( 0.85714 0.14286 ) *
##                         2181) host_total_listings_count > 1.5 63   81.520 0 ( 0.65079 0.34921 )  
##                           4362) price < 54 5    0.000 0 ( 1.00000 0.00000 ) *
##                           4363) price > 54 58   76.990 0 ( 0.62069 0.37931 ) *
##                       1091) accommodates > 2.5 10    0.000 0 ( 1.00000 0.00000 ) *
##                    273) cancellation_policy.moderate > 0.5 65   83.200 0 ( 0.66154 0.33846 )  
##                      546) price < 143.5 57   65.700 0 ( 0.73684 0.26316 )  
##                       1092) host_total_listings_count < 4.5 49   60.360 0 ( 0.69388 0.30612 ) *
##                       1093) host_total_listings_count > 4.5 8    0.000 0 ( 1.00000 0.00000 ) *
##                      547) price > 143.5 8    6.028 1 ( 0.12500 0.87500 ) *
##                  137) beds > 1.5 31   42.680 0 ( 0.54839 0.45161 )  
##                    274) host_response.SOME < 0.5 26   35.890 1 ( 0.46154 0.53846 ) *
##                    275) host_response.SOME > 0.5 5    0.000 0 ( 1.00000 0.00000 ) *
##                 69) price > 177 112   76.270 0 ( 0.89286 0.10714 )  
##                  138) host_response.SOME < 0.5 75   62.530 0 ( 0.85333 0.14667 )  
##                    276) host_total_listings_count < 4.5 55   33.510 0 ( 0.90909 0.09091 )  
##                      552) price < 294 30   27.030 0 ( 0.83333 0.16667 )  
##                       1104) accommodates < 3.5 11    0.000 0 ( 1.00000 0.00000 ) *
##                       1105) accommodates > 3.5 19   21.900 0 ( 0.73684 0.26316 )  
##                         2210) cancellation_policy.moderate < 0.5 12   16.300 0 ( 0.58333 0.41667 ) *
##                         2211) cancellation_policy.moderate > 0.5 7    0.000 0 ( 1.00000 0.00000 ) *
##                      553) price > 294 25    0.000 0 ( 1.00000 0.00000 ) *
##                    277) host_total_listings_count > 4.5 20   24.430 0 ( 0.70000 0.30000 ) *
##                  139) host_response.SOME > 0.5 37    9.195 0 ( 0.97297 0.02703 )  
##                    278) property_category.other < 0.5 32    0.000 0 ( 1.00000 0.00000 ) *
##                    279) property_category.other > 0.5 5    5.004 0 ( 0.80000 0.20000 ) *
##               35) market.New.York > 0.5 189  250.200 0 ( 0.62434 0.37566 )  
##                 70) accommodates < 1.5 41   40.470 0 ( 0.80488 0.19512 )  
##                  140) price < 96.5 31   35.400 0 ( 0.74194 0.25806 )  
##                    280) host_total_listings_count < 3 24   21.630 0 ( 0.83333 0.16667 )  
##                      560) host_response.SOME < 0.5 15   17.400 0 ( 0.73333 0.26667 )  
##                       1120) bathrooms < 1.25 10   13.460 0 ( 0.60000 0.40000 ) *
##                       1121) bathrooms > 1.25 5    0.000 0 ( 1.00000 0.00000 ) *
##                      561) host_response.SOME > 0.5 9    0.000 0 ( 1.00000 0.00000 ) *
##                    281) host_total_listings_count > 3 7    9.561 1 ( 0.42857 0.57143 ) *
##                  141) price > 96.5 10    0.000 0 ( 1.00000 0.00000 ) *
##                 71) accommodates > 1.5 148  201.900 0 ( 0.57432 0.42568 )  
##                  142) host_total_listings_count < 5.5 141  193.900 0 ( 0.55319 0.44681 )  
##                    284) price < 149.5 56   76.490 1 ( 0.42857 0.57143 ) *
##                    285) price > 149.5 85  111.500 0 ( 0.63529 0.36471 )  
##                      570) host_total_listings_count < 2.5 74   91.720 0 ( 0.68919 0.31081 )  
##                       1140) beds < 1.5 34   24.630 0 ( 0.88235 0.11765 ) *
##                       1141) beds > 1.5 40   55.350 0 ( 0.52500 0.47500 )  
##                         2282) host_response.SOME < 0.5 33   45.470 1 ( 0.45455 0.54545 ) *
##                         2283) host_response.SOME > 0.5 7    5.742 0 ( 0.85714 0.14286 ) *
##                      571) host_total_listings_count > 2.5 11   12.890 1 ( 0.27273 0.72727 ) *
##                  143) host_total_listings_count > 5.5 7    0.000 0 ( 1.00000 0.00000 ) *
##            9) has_min_nightsYES > 0.5 2499 2218.000 0 ( 0.83754 0.16246 )  
##             18) ppp_ind.1 < 0.5 1175 1211.000 0 ( 0.78894 0.21106 )  
##               36) host_total_listings_count < 17.5 1089 1159.000 0 ( 0.77594 0.22406 )  
##                 72) host_response.SOME < 0.5 835  930.800 0 ( 0.75449 0.24551 )  
##                  144) market.New.York < 0.5 570  591.900 0 ( 0.78596 0.21404 )  
##                    288) price < 220.5 528  566.000 0 ( 0.77273 0.22727 )  
##                      576) price < 206.5 522  553.000 0 ( 0.77778 0.22222 )  
##                       1152) property_category.other < 0.5 501  519.700 0 ( 0.78643 0.21357 ) *
##                       1153) property_category.other > 0.5 21   28.680 0 ( 0.57143 0.42857 )  
##                         2306) host_total_listings_count < 1.5 15   20.190 1 ( 0.40000 0.60000 )  
##                           4612) beds < 2.5 8    6.028 1 ( 0.12500 0.87500 ) *
##                           4613) beds > 2.5 7    8.376 0 ( 0.71429 0.28571 ) *
##                         2307) host_total_listings_count > 1.5 6    0.000 0 ( 1.00000 0.00000 ) *
##                      577) price > 206.5 6    7.638 1 ( 0.33333 0.66667 ) *
##                    289) price > 220.5 42   16.080 0 ( 0.95238 0.04762 )  
##                      578) cancellation_policy.moderate < 0.5 35    0.000 0 ( 1.00000 0.00000 ) *
##                      579) cancellation_policy.moderate > 0.5 7    8.376 0 ( 0.71429 0.28571 ) *
##                  145) market.New.York > 0.5 265  329.500 0 ( 0.68679 0.31321 )  
##                    290) cancellation_policy.strict < 0.5 129  142.300 0 ( 0.75969 0.24031 )  
##                      580) price < 38 10   13.460 1 ( 0.40000 0.60000 ) *
##                      581) price > 38 119  122.300 0 ( 0.78992 0.21008 )  
##                       1162) price < 127.5 99  109.700 0 ( 0.75758 0.24242 )  
##                         2324) accommodates < 3.5 77   75.940 0 ( 0.80519 0.19481 ) *
##                         2325) accommodates > 3.5 22   29.770 0 ( 0.59091 0.40909 ) *
##                       1163) price > 127.5 20    7.941 0 ( 0.95000 0.05000 ) *
##                    291) cancellation_policy.strict > 0.5 136  180.900 0 ( 0.61765 0.38235 ) *
##                 73) host_response.SOME > 0.5 254  217.800 0 ( 0.84646 0.15354 )  
##                  146) beds < 1.94589 130   88.830 0 ( 0.89231 0.10769 )  
##                    292) market.New.York < 0.5 56   17.260 0 ( 0.96429 0.03571 ) *
##                    293) market.New.York > 0.5 74   65.600 0 ( 0.83784 0.16216 )  
##                      586) property_category.house < 0.5 67   49.010 0 ( 0.88060 0.11940 ) *
##                      587) property_category.house > 0.5 7    9.561 1 ( 0.42857 0.57143 ) *
##                  147) beds > 1.94589 124  124.700 0 ( 0.79839 0.20161 )  
##                    294) accommodates < 2.5 9    0.000 0 ( 1.00000 0.00000 ) *
##                    295) accommodates > 2.5 115  120.400 0 ( 0.78261 0.21739 )  
##                      590) bedrooms < 1.18246 46   58.090 0 ( 0.67391 0.32609 )  
##                       1180) accommodates < 5.5 41   47.690 0 ( 0.73171 0.26829 ) *
##                       1181) accommodates > 5.5 5    5.004 1 ( 0.20000 0.80000 ) *
##                      591) bedrooms > 1.18246 69   57.110 0 ( 0.85507 0.14493 )  
##                       1182) accommodates < 5.5 28   31.490 0 ( 0.75000 0.25000 ) *
##                       1183) accommodates > 5.5 41   21.460 0 ( 0.92683 0.07317 )  
##                         2366) charges_for_extra.YES < 0.5 16   15.440 0 ( 0.81250 0.18750 )  
##                           4732) price < 189.5 8   10.590 0 ( 0.62500 0.37500 ) *
##                           4733) price > 189.5 8    0.000 0 ( 1.00000 0.00000 ) *
##                         2367) charges_for_extra.YES > 0.5 25    0.000 0 ( 1.00000 0.00000 ) *
##               37) host_total_listings_count > 17.5 86   32.360 0 ( 0.95349 0.04651 ) *
##             19) ppp_ind.1 > 0.5 1324  968.100 0 ( 0.88066 0.11934 )  
##               38) bathrooms < 1.25 932  763.900 0 ( 0.85730 0.14270 )  
##                 76) accommodates < 1.5 132   60.360 0 ( 0.93939 0.06061 )  
##                  152) market.New.York < 0.5 59    0.000 0 ( 1.00000 0.00000 ) *
##                  153) market.New.York > 0.5 73   50.470 0 ( 0.89041 0.10959 )  
##                    306) charges_for_extra.YES < 0.5 42   40.900 0 ( 0.80952 0.19048 ) *
##                    307) charges_for_extra.YES > 0.5 31    0.000 0 ( 1.00000 0.00000 ) *
##                 77) accommodates > 1.5 800  693.400 0 ( 0.84375 0.15625 )  
##                  154) accommodates < 6.5 789  669.300 0 ( 0.84918 0.15082 )  
##                    308) price < 191.5 531  492.700 0 ( 0.82486 0.17514 )  
##                      616) price < 184 508  455.600 0 ( 0.83465 0.16535 )  
##                       1232) bed_categoryother < 0.5 492  449.700 0 ( 0.82927 0.17073 )  
##                         2464) price < 99.5 110  122.600 0 ( 0.75455 0.24545 )  
##                           4928) host_total_listings_count < 5.5 101  105.900 0 ( 0.78218 0.21782 )  
##                             9856) price < 90.5 57   46.240 0 ( 0.85965 0.14035 ) *
##                             9857) price > 90.5 44   55.040 0 ( 0.68182 0.31818 )  
##                              19714) price < 94.5 8   10.590 1 ( 0.37500 0.62500 ) *
##                              19715) price > 94.5 36   40.490 0 ( 0.75000 0.25000 ) *
##                           4929) host_total_listings_count > 5.5 9   12.370 1 ( 0.44444 0.55556 ) *
##                         2465) price > 99.5 382  321.900 0 ( 0.85079 0.14921 )  
##                           4930) host_acceptance.MISSING < 0.5 31   37.350 0 ( 0.70968 0.29032 ) *
##                           4931) host_acceptance.MISSING > 0.5 351  280.100 0 ( 0.86325 0.13675 )  
##                             9862) host_total_listings_count < 3.5 306  259.000 0 ( 0.84967 0.15033 )  
##                              19724) bedrooms < 0.5 63   69.160 0 ( 0.76190 0.23810 ) *
##                              19725) bedrooms > 0.5 243  185.500 0 ( 0.87243 0.12757 )  
##                                39450) price < 108.5 29    0.000 0 ( 1.00000 0.00000 ) *
##                                39451) price > 108.5 214  177.100 0 ( 0.85514 0.14486 )  
##                                  78902) price < 178.5 201  172.800 0 ( 0.84577 0.15423 )  
##                                   157804) property_category.house < 0.5 174  160.000 0 ( 0.82759 0.17241 )  
##                                     315608) bedrooms < 1.5 161  135.600 0 ( 0.85093 0.14907 ) *
##                                     315609) bedrooms > 1.5 13   17.940 0 ( 0.53846 0.46154 ) *
##                                   157805) property_category.house > 0.5 27    8.554 0 ( 0.96296 0.03704 ) *
##                                  78903) price > 178.5 13    0.000 0 ( 1.00000 0.00000 ) *
##                             9863) host_total_listings_count > 3.5 45   16.360 0 ( 0.95556 0.04444 )  
##                              19726) beds < 1.94589 34    0.000 0 ( 1.00000 0.00000 ) *
##                              19727) beds > 1.94589 11   10.430 0 ( 0.81818 0.18182 ) *
##                       1233) bed_categoryother > 0.5 16    0.000 0 ( 1.00000 0.00000 ) *
##                      617) price > 184 23   30.790 0 ( 0.60870 0.39130 )  
##                       1234) accommodates < 3.5 16   17.990 0 ( 0.75000 0.25000 ) *
##                       1235) accommodates > 3.5 7    8.376 1 ( 0.28571 0.71429 ) *
##                    309) price > 191.5 258  168.600 0 ( 0.89922 0.10078 )  
##                      618) market.New.York < 0.5 136   49.180 0 ( 0.95588 0.04412 ) *
##                      619) market.New.York > 0.5 122  108.900 0 ( 0.83607 0.16393 )  
##                       1238) beds < 2.5 101   69.530 0 ( 0.89109 0.10891 )  
##                         2476) bedrooms < 0.5 23    0.000 0 ( 1.00000 0.00000 ) *
##                         2477) bedrooms > 0.5 78   63.460 0 ( 0.85897 0.14103 )  
##                           4954) has_cleaning_feeYES < 0.5 13    0.000 0 ( 1.00000 0.00000 ) *
##                           4955) has_cleaning_feeYES > 0.5 65   59.110 0 ( 0.83077 0.16923 ) *
##                       1239) beds > 2.5 21   28.680 0 ( 0.57143 0.42857 ) *
##                  155) accommodates > 6.5 11   15.160 1 ( 0.45455 0.54545 )  
##                    310) price < 346 6    5.407 1 ( 0.16667 0.83333 ) *
##                    311) price > 346 5    5.004 0 ( 0.80000 0.20000 ) *
##               39) bathrooms > 1.25 392  186.000 0 ( 0.93622 0.06378 )  
##                 78) market.New.York < 0.5 312  108.100 0 ( 0.95833 0.04167 )  
##                  156) property_category.house < 0.5 96   55.070 0 ( 0.91667 0.08333 )  
##                    312) accommodates < 5.5 61   17.600 0 ( 0.96721 0.03279 )  
##                      624) cancellation_policy.moderate < 0.5 41    0.000 0 ( 1.00000 0.00000 ) *
##                      625) cancellation_policy.moderate > 0.5 20   13.000 0 ( 0.90000 0.10000 ) *
##                    313) accommodates > 5.5 35   32.070 0 ( 0.82857 0.17143 ) *
##                  157) property_category.house > 0.5 216   47.540 0 ( 0.97685 0.02315 )  
##                    314) charges_for_extra.YES < 0.5 116   41.220 0 ( 0.95690 0.04310 ) *
##                    315) charges_for_extra.YES > 0.5 100    0.000 0 ( 1.00000 0.00000 ) *
##                 79) market.New.York > 0.5 80   67.630 0 ( 0.85000 0.15000 )  
##                  158) beds < 2.5 43    9.499 0 ( 0.97674 0.02326 )  
##                    316) price < 383 37    0.000 0 ( 1.00000 0.00000 ) *
##                    317) price > 383 6    5.407 0 ( 0.83333 0.16667 ) *
##                  159) beds > 2.5 37   45.030 0 ( 0.70270 0.29730 )  
##                    318) bedrooms < 2.5 9   11.460 1 ( 0.33333 0.66667 ) *
##                    319) bedrooms > 2.5 28   26.280 0 ( 0.82143 0.17857 )  
##                      638) bathrooms < 1.75 5    6.730 1 ( 0.40000 0.60000 ) *
##                      639) bathrooms > 1.75 23   13.590 0 ( 0.91304 0.08696 )  
##                       1278) accommodates < 8.5 16    0.000 0 ( 1.00000 0.00000 ) *
##                       1279) accommodates > 8.5 7    8.376 0 ( 0.71429 0.28571 ) *
##          5) host_is_superhostTRUE > 0.5 1717 2340.000 0 ( 0.57659 0.42341 )  
##           10) has_min_nightsYES < 0.5 594  819.900 1 ( 0.46128 0.53872 )  
##             20) bedrooms < 0.5 43   44.120 1 ( 0.20930 0.79070 )  
##               40) host_total_listings_count < 6.5 38   29.590 1 ( 0.13158 0.86842 ) *
##               41) host_total_listings_count > 6.5 5    5.004 0 ( 0.80000 0.20000 ) *
##             21) bedrooms > 0.5 551  763.000 1 ( 0.48094 0.51906 )  
##               42) price < 60.5 115  155.600 0 ( 0.59130 0.40870 )  
##                 84) host_total_listings_count < 9.5 110  150.200 0 ( 0.57273 0.42727 ) *
##                 85) host_total_listings_count > 9.5 5    0.000 0 ( 1.00000 0.00000 ) *
##               43) price > 60.5 436  600.400 1 ( 0.45183 0.54817 )  
##                 86) ppp_ind.1 < 0.5 231  307.000 1 ( 0.38095 0.61905 )  
##                  172) host_total_listings_count < 6.5 202  261.900 1 ( 0.35149 0.64851 )  
##                    344) market.New.Orleans < 0.5 185  231.600 1 ( 0.31892 0.68108 )  
##                      688) accommodates < 2.5 47   65.130 1 ( 0.48936 0.51064 )  
##                       1376) cancellation_policy.moderate < 0.5 31   40.320 1 ( 0.35484 0.64516 ) *
##                       1377) cancellation_policy.moderate > 0.5 16   17.990 0 ( 0.75000 0.25000 )  
##                         2754) price < 68.5 7    9.561 1 ( 0.42857 0.57143 ) *
##                         2755) price > 68.5 9    0.000 0 ( 1.00000 0.00000 ) *
##                      689) accommodates > 2.5 138  158.400 1 ( 0.26087 0.73913 )  
##                       1378) price < 71 7    0.000 1 ( 0.00000 1.00000 ) *
##                       1379) price > 71 131  154.100 1 ( 0.27481 0.72519 ) *
##                    345) market.New.Orleans > 0.5 17   20.600 0 ( 0.70588 0.29412 ) *
##                  173) host_total_listings_count > 6.5 29   39.340 0 ( 0.58621 0.41379 ) *
##                 87) ppp_ind.1 > 0.5 205  283.400 0 ( 0.53171 0.46829 )  
##                  174) beds < 1.44589 143  197.900 1 ( 0.47552 0.52448 )  
##                    348) price < 237 138  190.300 1 ( 0.45652 0.54348 )  
##                      696) market.San.Francisco < 0.5 130  180.100 1 ( 0.48462 0.51538 ) *
##                      697) market.San.Francisco > 0.5 8    0.000 1 ( 0.00000 1.00000 ) *
##                    349) price > 237 5    0.000 0 ( 1.00000 0.00000 ) *
##                  175) beds > 1.44589 62   79.380 0 ( 0.66129 0.33871 )  
##                    350) has_cleaning_feeYES < 0.5 6    0.000 0 ( 1.00000 0.00000 ) *
##                    351) has_cleaning_feeYES > 0.5 56   74.100 0 ( 0.62500 0.37500 )  
##                      702) host_total_listings_count < 2.5 35   39.900 0 ( 0.74286 0.25714 )  
##                       1404) market.New.York < 0.5 29   26.660 0 ( 0.82759 0.17241 )  
##                         2808) price < 181.5 10   13.460 0 ( 0.60000 0.40000 ) *
##                         2809) price > 181.5 19    7.835 0 ( 0.94737 0.05263 ) *
##                       1405) market.New.York > 0.5 6    7.638 1 ( 0.33333 0.66667 ) *
##                      703) host_total_listings_count > 2.5 21   28.680 1 ( 0.42857 0.57143 ) *
##           11) has_min_nightsYES > 0.5 1123 1471.000 0 ( 0.63758 0.36242 )  
##             22) market.New.York < 0.5 857 1066.000 0 ( 0.68611 0.31389 )  
##               44) price < 149.5 547  728.500 0 ( 0.61609 0.38391 )  
##                 88) price < 61.5 96  105.700 0 ( 0.76042 0.23958 )  
##                  176) host_response.SOME < 0.5 90   90.070 0 ( 0.80000 0.20000 )  
##                    352) charges_for_extra.YES < 0.5 45   55.800 0 ( 0.68889 0.31111 )  
##                      704) property_category.condo < 0.5 40   51.800 0 ( 0.65000 0.35000 )  
##                       1408) ppp_ind.1 < 0.5 29   39.890 0 ( 0.55172 0.44828 ) *
##                       1409) ppp_ind.1 > 0.5 11    6.702 0 ( 0.90909 0.09091 ) *
##                      705) property_category.condo > 0.5 5    0.000 0 ( 1.00000 0.00000 ) *
##                    353) charges_for_extra.YES > 0.5 45   27.000 0 ( 0.91111 0.08889 )  
##                      706) market.Chicago < 0.5 40    9.353 0 ( 0.97500 0.02500 ) *
##                      707) market.Chicago > 0.5 5    6.730 1 ( 0.40000 0.60000 ) *
##                  177) host_response.SOME > 0.5 6    5.407 1 ( 0.16667 0.83333 ) *
##                 89) price > 61.5 451  612.000 0 ( 0.58537 0.41463 )  
##                  178) property_category.other < 0.5 410  551.000 0 ( 0.60244 0.39756 )  
##                    356) bedrooms < 0.5 51   70.520 1 ( 0.47059 0.52941 ) *
##                    357) bedrooms > 0.5 359  476.400 0 ( 0.62117 0.37883 )  
##                      714) price < 69.5 31   42.170 1 ( 0.41935 0.58065 )  
##                       1428) host_total_listings_count < 2.5 25   29.650 1 ( 0.28000 0.72000 )  
##                         2856) host_total_listings_count < 1.5 12    6.884 1 ( 0.08333 0.91667 ) *
##                         2857) host_total_listings_count > 1.5 13   17.940 1 ( 0.46154 0.53846 ) *
##                       1429) host_total_listings_count > 2.5 6    0.000 0 ( 1.00000 0.00000 ) *
##                      715) price > 69.5 328  428.600 0 ( 0.64024 0.35976 )  
##                       1430) accommodates < 1.5 13    7.051 0 ( 0.92308 0.07692 ) *
##                       1431) accommodates > 1.5 315  415.600 0 ( 0.62857 0.37143 )  
##                         2862) beds < 1.94589 163  204.100 0 ( 0.68098 0.31902 )  
##                           5724) market.San.Diego < 0.5 151  194.500 0 ( 0.65563 0.34437 )  
##                            11448) price < 143.5 146  190.100 0 ( 0.64384 0.35616 )  
##                              22896) price < 81 27   25.870 0 ( 0.81481 0.18519 )  
##                                45792) host_total_listings_count < 1.5 11    0.000 0 ( 1.00000 0.00000 ) *
##                                45793) host_total_listings_count > 1.5 16   19.870 0 ( 0.68750 0.31250 ) *
##                              22897) price > 81 119  159.700 0 ( 0.60504 0.39496 )  
##                                45794) cancellation_policy.strict < 0.5 69   95.640 0 ( 0.50725 0.49275 )  
##                                  91588) host_total_listings_count < 4.5 64   88.160 0 ( 0.54688 0.45312 ) *
##                                  91589) host_total_listings_count > 4.5 5    0.000 1 ( 0.00000 1.00000 ) *
##                                45795) cancellation_policy.strict > 0.5 50   57.310 0 ( 0.74000 0.26000 )  
##                                  91590) price < 132 45   45.040 0 ( 0.80000 0.20000 ) *
##                                  91591) price > 132 5    5.004 1 ( 0.20000 0.80000 ) *
##                            11449) price > 143.5 5    0.000 0 ( 1.00000 0.00000 ) *
##                           5725) market.San.Diego > 0.5 12    0.000 0 ( 1.00000 0.00000 ) *
##                         2863) beds > 1.94589 152  207.500 0 ( 0.57237 0.42763 )  
##                           5726) market.San.Francisco < 0.5 141  188.600 0 ( 0.60993 0.39007 ) *
##                           5727) market.San.Francisco > 0.5 11    6.702 1 ( 0.09091 0.90909 ) *
##                  179) property_category.other > 0.5 41   55.640 1 ( 0.41463 0.58537 )  
##                    358) charges_for_extra.YES < 0.5 21   23.050 1 ( 0.23810 0.76190 )  
##                      716) price < 87.5 7    0.000 1 ( 0.00000 1.00000 ) *
##                      717) price > 87.5 14   18.250 1 ( 0.35714 0.64286 ) *
##                    359) charges_for_extra.YES > 0.5 20   26.920 0 ( 0.60000 0.40000 ) *
##               45) price > 149.5 310  301.700 0 ( 0.80968 0.19032 )  
##                 90) market.New.Orleans < 0.5 279  287.900 0 ( 0.78853 0.21147 )  
##                  180) ppp_ind.1 < 0.5 89  110.800 0 ( 0.68539 0.31461 ) *
##                  181) ppp_ind.1 > 0.5 190  169.100 0 ( 0.83684 0.16316 )  
##                    362) host_total_listings_count < 2.5 127  134.000 0 ( 0.77953 0.22047 )  
##                      724) property_category.condo < 0.5 114  127.100 0 ( 0.75439 0.24561 )  
##                       1448) market.Nashville < 0.5 103  120.500 0 ( 0.72816 0.27184 )  
##                         2896) market.Portland < 0.5 96  115.900 0 ( 0.70833 0.29167 ) *
##                         2897) market.Portland > 0.5 7    0.000 0 ( 1.00000 0.00000 ) *
##                       1449) market.Nashville > 0.5 11    0.000 0 ( 1.00000 0.00000 ) *
##                      725) property_category.condo > 0.5 13    0.000 0 ( 1.00000 0.00000 ) *
##                    363) host_total_listings_count > 2.5 63   24.120 0 ( 0.95238 0.04762 )  
##                      726) property_category.house < 0.5 28   19.070 0 ( 0.89286 0.10714 )  
##                       1452) host_total_listings_count < 8.5 13   14.050 0 ( 0.76923 0.23077 ) *
##                       1453) host_total_listings_count > 8.5 15    0.000 0 ( 1.00000 0.00000 ) *
##                      727) property_category.house > 0.5 35    0.000 0 ( 1.00000 0.00000 ) *
##                 91) market.New.Orleans > 0.5 31    0.000 0 ( 1.00000 0.00000 ) *
##             23) market.New.York > 0.5 266  368.400 1 ( 0.48120 0.51880 )  
##               46) accommodates < 3.5 165  226.500 0 ( 0.55758 0.44242 )  
##                 92) price < 204.5 158  218.100 0 ( 0.53797 0.46203 )  
##                  184) price < 182.5 151  207.400 0 ( 0.55629 0.44371 ) *
##                  185) price > 182.5 7    5.742 1 ( 0.14286 0.85714 ) *
##                 93) price > 204.5 7    0.000 0 ( 1.00000 0.00000 ) *
##               47) accommodates > 3.5 101  131.600 1 ( 0.35644 0.64356 )  
##                 94) host_total_listings_count < 1.5 61   65.720 1 ( 0.22951 0.77049 )  
##                  188) beds < 3.5 52   60.580 1 ( 0.26923 0.73077 )  
##                    376) price < 182 29   23.270 1 ( 0.13793 0.86207 )  
##                      752) price < 132.5 17   18.550 1 ( 0.23529 0.76471 ) *
##                      753) price > 132.5 12    0.000 1 ( 0.00000 1.00000 ) *
##                    377) price > 182 23   31.490 1 ( 0.43478 0.56522 ) *
##                  189) beds > 3.5 9    0.000 1 ( 0.00000 1.00000 ) *
##                 95) host_total_listings_count > 1.5 40   55.050 0 ( 0.55000 0.45000 )  
##                  190) bathrooms < 1.25 34   47.020 1 ( 0.47059 0.52941 ) *
##                  191) bathrooms > 1.25 6    0.000 0 ( 1.00000 0.00000 ) *
##        3) host_response.MISSING > 0.5 1161  151.500 0 ( 0.98794 0.01206 )  
##          6) host_is_superhostTRUE < 0.5 1125  104.800 0 ( 0.99200 0.00800 )  
##           12) host_total_listings_count < 1.5 995   52.120 0 ( 0.99598 0.00402 )  
##             24) ppp_ind.1 < 0.5 399   44.780 0 ( 0.98997 0.01003 )  
##               48) market.New.York < 0.5 166    0.000 0 ( 1.00000 0.00000 ) *
##               49) market.New.York > 0.5 233   40.450 0 ( 0.98283 0.01717 ) *
##             25) ppp_ind.1 > 0.5 596    0.000 0 ( 1.00000 0.00000 ) *
##           13) host_total_listings_count > 1.5 130   42.390 0 ( 0.96154 0.03846 )  
##             26) market.New.York < 0.5 53    0.000 0 ( 1.00000 0.00000 ) *
##             27) market.New.York > 0.5 77   37.010 0 ( 0.93506 0.06494 ) *
##          7) host_is_superhostTRUE > 0.5 36   29.010 0 ( 0.86111 0.13889 )  
##           14) has_cleaning_feeYES < 0.5 12   15.280 0 ( 0.66667 0.33333 ) *
##           15) has_cleaning_feeYES > 0.5 24    8.314 0 ( 0.95833 0.04167 ) *
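In the tree package's printout, each line reads "node), split, n, deviance, yval, (yprob)", and a * marks a terminal node. For a classification tree, the deviance of a node is -2 times the sum over classes of n_k × log(p_k). Here n_k is the number of listings of class k in the node, and p_k is that class's share. The minimal base-R sketch below recomputes this quantity for three nodes printed above (445, 447 and 448). The helper node_deviance is hypothetical and is not a function of the tree package.

```r
# Hypothetical helper: deviance of one classification-tree node,
# -2 * sum_k n_k * log(p_k), with class counts n_k = n * p_k
node_deviance <- function(n, yprob) {
  counts <- n * yprob
  # Empty classes contribute 0 (the limit of x * log(x) as x -> 0)
  terms <- ifelse(counts > 0, counts * log(yprob), 0)
  -2 * sum(terms)
}

# Node 445: n = 5, yprob = (0.8, 0.2); printed deviance 5.004
dev_445 <- node_deviance(5, c(0.80000, 0.20000))
# Node 447: n = 16, yprob = (0.25, 0.75); printed deviance 17.990
dev_447 <- node_deviance(16, c(0.25000, 0.75000))
# Node 448: a pure node (all class 0); printed deviance 0.000
dev_448 <- node_deviance(6, c(1.00000, 0.00000))

round(c(dev_445, dev_447, dev_448), 3)
```

The recomputed values (about 5.004, 17.99 and 0) match the printed deviances. A deviance of 0 always marks a pure node.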
3b. Create pruned trees of size 2, 4, 6, 8, 10, 15, 20, 25, 30, 35, and 40. Plot fitting or complexity-performance curves consisting of the accuracy in the validation and training sets for each pruned tree (assuming a cutoff of 0.5). Make sure the two sets of points are different colors.

Hint: you might want to change the y-axis of your plot from the default by adding the ylim argument to plot().

ANSWER TO QUESTION 3b HERE:

tree_sizes <- c(2, 4, 6, 8, 10, 15, 20, 25, 30, 35, 40)
tr_accs <- rep(0, length(tree_sizes))  # training accuracy for each size
va_accs <- rep(0, length(tree_sizes))  # validation accuracy for each size

accuracy <- function(classifications, actuals){
  correct_classifications <- ifelse(classifications == actuals, 1, 0)
  acc <- sum(correct_classifications)/length(classifications)
  return(acc)
}
### USE A FOR LOOP

for (i in seq_along(tree_sizes)) {
  # Prune the full tree to the requested number of terminal nodes; if the
  # cost-complexity sequence has no subtree of exactly that size,
  # prune.tree() returns the next largest one
  pruned_tree <- prune.tree(full_tree, best = tree_sizes[i])
  if (is.null(pruned_tree)) {
    cat("Pruning failed for size:", tree_sizes[i], "\n")
  } else {
    plot(pruned_tree)
    text(pruned_tree, pretty = 0)

    # predict() returns class probabilities; column 2 is the probability of class "1"
    tree_preds_tr <- predict(pruned_tree, newdata = train_data)
    classifications_tr <- ifelse(tree_preds_tr[, 2] > 0.5, 1, 0)
    tr_accs[i] <- accuracy(classifications_tr, train_data$high_booking_rate)

    tree_preds_va <- predict(pruned_tree, newdata = valid_data)
    classifications_va <- ifelse(tree_preds_va[, 2] > 0.5, 1, 0)
    va_accs[i] <- accuracy(classifications_va, valid_data$high_booking_rate)
  }
}

tr_accs
##  [1] 0.7561429 0.7561429 0.7627143 0.7641429 0.7672857 0.7672857 0.7672857
##  [8] 0.7705714 0.7711429 0.7764286 0.7765714
va_accs
##  [1] 0.7546667 0.7546667 0.7656667 0.7580000 0.7590000 0.7590000 0.7590000
##  [8] 0.7680000 0.7673333 0.7650000 0.7656667
plot(tree_sizes, tr_accs, col = "orange", type = "l", ylim = c(0.7, 0.8),
     xlab = "Tree size (terminal nodes)", ylab = "Accuracy",
     main = "Pruned tree: training vs. validation accuracy")
lines(tree_sizes, va_accs, col = "black")
points(tree_sizes, tr_accs, col = "orange", pch = 16)
points(tree_sizes, va_accs, col = "black", pch = 16)
legend("bottomright", legend = c("Training Accuracy", "Validation Accuracy"),
       col = c("orange", "black"), lty = 1)
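One goal of this homework is understanding overfitting, so it helps to tabulate the gap between training and validation accuracy for each pruned tree. The minimal base-R sketch below does this using the accuracies printed above. The values are copied into new, hypothetical variable names so that the document's own tr_accs and va_accs are not overwritten.

```r
# Accuracies copied from the printed tr_accs / va_accs output above
# (hypothetical names, so the document's own variables are untouched)
tree_size_printed <- c(2, 4, 6, 8, 10, 15, 20, 25, 30, 35, 40)
tree_tr_printed <- c(0.7561429, 0.7561429, 0.7627143, 0.7641429, 0.7672857,
                     0.7672857, 0.7672857, 0.7705714, 0.7711429, 0.7764286,
                     0.7765714)
tree_va_printed <- c(0.7546667, 0.7546667, 0.7656667, 0.7580000, 0.7590000,
                     0.7590000, 0.7590000, 0.7680000, 0.7673333, 0.7650000,
                     0.7656667)

# Training minus validation accuracy; a gap that grows with tree size
# is the usual signature of overfitting
tree_gap <- tree_tr_printed - tree_va_printed
round(data.frame(size = tree_size_printed, train = tree_tr_printed,
                 valid = tree_va_printed, gap = tree_gap), 4)

# Tree size with the widest train/validation gap
tree_size_printed[which.max(tree_gap)]
```

With these numbers, every gap is below about 1.2 percentage points, and the widest gaps belong to the 35- and 40-node trees. The pruned trees therefore overfit only mildly, and the effect is strongest for the largest trees.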

3c. Which tree size is the best, and how did you select the best one? Report the validation accuracy of your best tree. Which would you choose if you needed to create a model to predict high_booking_rate - the best tree, or your logistic regression from above?

ANSWER TO QUESTION 3c HERE: The best tree size is 25 terminal nodes. It was selected as the size with the highest validation accuracy (which.max(va_accs) below), giving a validation accuracy of about 76.8%. The logistic regression model's validation accuracy is about 75.47%, so the decision tree slightly outperforms it. Sizes 25 through 40 all score within about 0.3 percentage points of each other on validation, and even the 6-node tree reaches 76.6%, so the advantage of any particular larger tree is small. Weighing interpretability, simplicity, and accuracy, I would choose the decision tree to predict high_booking_rate, because it performs better on the validation set and its splits are easy to understand.

my_tree_size <- tree_sizes[which.max(va_accs)]
my_tree_size
## [1] 25
my_va_accs <- va_accs[which.max(va_accs)]
my_va_accs
## [1] 0.768

4: kNN

4a. Set up for running kNN by separating the training and validation X matrices from the y variables.
tra_x <- train_data[,-which(names(train_data) == "high_booking_rate")]
tra_y <- train_data$high_booking_rate
val_x <- valid_data[,-which(names(valid_data) == "high_booking_rate")]
val_y <- valid_data$high_booking_rate
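The -which(names(...) == "high_booking_rate") idiom above has a known R pitfall. If the column name is ever missing or misspelled, which() returns integer(0), and indexing with -integer(0) selects no columns at all rather than all of them. The sketch below shows a more defensive variant using setdiff() on a made-up toy data frame. The toy values and the names toy, toy_x and toy_y are illustrative only.

```r
# Toy data frame standing in for train_data (values are made up)
toy <- data.frame(price = c(50, 120, 80),
                  accommodates = c(2, 4, 3),
                  high_booking_rate = c(1, 0, 0))

# Pitfall: a misspelled target name makes which() return integer(0),
# and toy[, -integer(0)] keeps zero columns instead of dropping one
bad_x <- toy[, -which(names(toy) == "high_booking_rte"), drop = FALSE]
ncol(bad_x)  # 0

# Defensive version: confirm the target exists, then keep every other column
target <- "high_booking_rate"
stopifnot(target %in% names(toy))
toy_x <- toy[, setdiff(names(toy), target), drop = FALSE]
toy_y <- toy[[target]]
names(toy_x)
```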

4b. Compute kNN estimates in the training and validation data using k values of 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 15, 20, 50, 100, and 200. Assume a cutoff of 0.5. Plot the accuracy in the validation and training sets for each k value. Make sure the two sets of points are different colors! Once again, you might want to play with the ylim() on your graph. It also might help to plot the log(k) on the x-axis rather than just k.

Note: kNN will take a little more time than logistic regression or trees to run, so be patient!

ANSWER TO QUESTION 4b HERE:

library(class)  # provides knn()

kvec <- c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 15, 20, 50, 100, 200)
va_acc <- rep(0, length(kvec))
tr_acc <- rep(0, length(kvec))

for (i in seq_along(kvec)) {
  inner_k <- kvec[i]

  # knn() returns the majority-vote class (ties broken at random), which for a
  # 0/1 target roughly corresponds to a 0.5 cutoff on the share of "1" neighbours
  inner_tr_preds <- knn(tra_x, tra_x, tra_y, k = inner_k, prob = TRUE)
  tr_acc[i] <- accuracy(inner_tr_preds, tra_y)

  inner_va_preds <- knn(tra_x, val_x, tra_y, k = inner_k, prob = TRUE)
  va_acc[i] <- accuracy(inner_va_preds, val_y)
}
tr_acc
##  [1] 0.9974286 0.8391429 0.8360000 0.8061429 0.7984286 0.7888571 0.7877143
##  [8] 0.7802857 0.7777143 0.7760000 0.7690000 0.7651429 0.7568571 0.7561429
## [15] 0.7561429
va_acc
##  [1] 0.6806667 0.7063333 0.7226667 0.7223333 0.7300000 0.7293333 0.7386667
##  [8] 0.7376667 0.7440000 0.7473333 0.7513333 0.7530000 0.7536667 0.7546667
## [15] 0.7546667
plot(log(kvec), tr_acc, col = "lightgreen", type = "l", ylim = c(0.6, 1),
     xlab = "log(k)", ylab = "Accuracy",
     main = "kNN: training vs. validation accuracy")
lines(log(kvec), va_acc, col = "black")
points(log(kvec), tr_acc, col = "lightgreen", pch = 16)
points(log(kvec), va_acc, col = "black", pch = 16)
legend("topright", legend = c("Training Accuracy", "Validation Accuracy"),
       col = c("lightgreen", "black"), lty = 1)
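For a 0/1 target, the majority vote that class::knn() performs (Euclidean distance) is essentially a 0.5 cutoff on the share of the k nearest training points labelled 1. The from-scratch sketch below illustrates that rule on made-up toy data. The function name knn_vote is hypothetical. Unlike knn(), it breaks an exact 50/50 vote toward class 0 instead of at random, and it does not add extra neighbours tied at the k-th distance.

The sketch also shows why the k = 1 training accuracy above is 0.9974 rather than exactly 1. When the training set is scored against itself, each point's nearest neighbour is itself. The shortfall is therefore likely due to duplicate feature rows that carry different labels.

```r
# Hypothetical from-scratch kNN for a 0/1 target, for illustration only
knn_vote <- function(train_x, test_x, train_y, k, cutoff = 0.5) {
  train_x <- as.matrix(train_x)
  test_x <- as.matrix(test_x)
  preds <- numeric(nrow(test_x))
  for (j in seq_len(nrow(test_x))) {
    # Euclidean distance from test row j to every training row
    d <- sqrt(colSums((t(train_x) - test_x[j, ])^2))
    nn <- order(d)[seq_len(k)]          # indices of the k closest rows
    share_1 <- mean(train_y[nn] == 1)   # fraction of neighbours labelled 1
    preds[j] <- ifelse(share_1 > cutoff, 1, 0)
  }
  preds
}

# Made-up toy data: two features, two well-separated clusters
pts_x <- matrix(c(1, 1,  1, 2,  2, 1,  8, 8,  8, 9,  9, 8),
                ncol = 2, byrow = TRUE)
pts_y <- c(0, 0, 0, 1, 1, 1)
new_pts <- matrix(c(1.5, 1.5,  8.5, 8.5), ncol = 2, byrow = TRUE)

knn_vote(pts_x, new_pts, pts_y, k = 3)
# With k = 1 and no duplicate rows, scoring the training set against itself
# returns every point's own label, so training accuracy is perfect
mean(knn_vote(pts_x, pts_x, pts_y, k = 1) == pts_y)
```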

4c. Which k is the best? Report the validation accuracy from the best kNN model. Are you satisfied with the values of k that you tried, or do you think it might improve your model to try more values of k?

ANSWER TO QUESTION 4c HERE: The best k is 100, with a validation accuracy of about 75.5% (0.7547). It was selected by comparing validation accuracy across all k values tried and choosing the one with the highest accuracy. Note that k = 200 ties at exactly the same value, and which.max() simply returns the first (smaller) k. These training and validation accuracies are also identical to those of the 2-node tree in 3b, which suggests the large-k models may be predicting the majority class for nearly every listing. The values tried span the full range of behavior: heavy overfitting at k = 1 (99.7% training vs. 68.1% validation accuracy) through a flat plateau at k = 100 and 200. Exploring values beyond 200, or finer increments around 100, could improve the model slightly, but given the plateau, substantial improvements are unlikely.
bst_k <- kvec[which.max(va_acc)]
bst_k
## [1] 100
bst_va_acc <- va_acc[which.max(va_acc)]
bst_va_acc
## [1] 0.7546667
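which.max() reports only the first position of the maximum, so bst_k hides the fact that k = 100 and k = 200 tie. The short base-R check below uses the printed va_acc values. The variable names are hypothetical, chosen so that the document's own kvec and va_acc are not overwritten.

```r
# Validation accuracies copied from the printed va_acc output above
knn_k_printed <- c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 15, 20, 50, 100, 200)
knn_va_printed <- c(0.6806667, 0.7063333, 0.7226667, 0.7223333, 0.7300000,
                    0.7293333, 0.7386667, 0.7376667, 0.7440000, 0.7473333,
                    0.7513333, 0.7530000, 0.7536667, 0.7546667, 0.7546667)

# which.max() returns only the FIRST index of the maximum
knn_k_printed[which.max(knn_va_printed)]
# Listing every k that attains the maximum exposes ties
knn_k_printed[knn_va_printed == max(knn_va_printed)]
```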
4d. Now which model would you pick - the best kNN, the best decision tree, or logistic regression?

ANSWER TO QUESTION 4d HERE: On validation accuracy, the best decision tree (25 terminal nodes, about 76.8%) beats both the best kNN (k = 100, about 75.5%) and logistic regression (about 75.47%), which are essentially tied. Context also matters. If interpretability is a key factor, decision trees and logistic regression are easier to explain than kNN, whose predictions can only be justified by pointing to similar listings. Since the tree is both the most accurate and interpretable, I would pick the best decision tree.
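One compact way to line up the three candidates is a small table sorted by validation accuracy. The sketch below uses the figures reported in this section. The logistic regression value is the approximate 75.47% quoted in the 3c answer, and the variable name results is hypothetical.

```r
# Best validation accuracies reported in this section; the logistic
# regression figure is the approximate value quoted in the 3c answer
results <- data.frame(
  model = c("Classification tree (25 nodes)", "kNN (k = 100)",
            "Logistic regression"),
  valid_acc = c(0.7680000, 0.7546667, 0.7547),
  stringsAsFactors = FALSE
)
results <- results[order(results$valid_acc, decreasing = TRUE), ]
results
results$model[1]
```

kNN and logistic regression differ by well under 0.1 percentage point, so on accuracy alone only the tree stands apart.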