PART 1: Introduction

Motivation

Airbnb has taken a non-asset-based approach to housing and hospitality, enabling individuals to earn a profit by commercializing their own private properties. This is a product of the sharing economy, which has become a disruptive force that caters to evolving consumer tastes.

For Airbnb in particular, the Superhost program creates incentives and opportunities for ambitious individuals. When we explored the dataset chosen for this analysis, we found that hosts in the Superhost program earned up to 22% more in profit than their counterparts and attracted more customers.

Goals and Hypothesis

Our goal for this project is to use a data-driven approach to propose whether any changes should be made to the Airbnb Superhost program. We will train several machine learning algorithms and base our proposal on the best-performing one.

HYPOTHESIS: We expect the following features to stand out in our analysis, given what we know about the program:

  • Amenities
  • Review scores rating
  • Price

Data Sources

Inside Airbnb is an independent, non-commercial set of tools and data that allows users to explore how Airbnb is being used in cities around the world (http://insideairbnb.com/get-the-data.html).

The ‘Los Angeles Listings’ dataset contains 96 features and 43,047 records.
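
As a minimal sketch of how such a listings file might be loaded and its size checked, the base R snippet below reads a CSV and reports its dimensions. The temporary file is a hypothetical stand-in for the downloaded dataset: it holds only two rows and three columns taken from the preview shown later in this report, and the `na.strings` setting is an assumption rather than the authors' confirmed choice.

```r
# Hypothetical stand-in for the downloaded listings file:
# a tiny CSV written to a temporary path so the sketch runs on its own.
csv_path <- tempfile(fileext = ".csv")
writeLines(c(
  "id,name,experiences_offered",
  "109,Amazing bright elegant condo park front *UPGRADED*,none",
  "344,Family perfect;Pool;Near Studios!,none"
), csv_path)

# Read the file, treating empty strings and common placeholders as missing
# (assumed settings, not confirmed by the report).
listings <- read.csv(csv_path,
                     stringsAsFactors = FALSE,
                     na.strings = c("", "NA", "N/A"))

# Rows are records, columns are features.
dim(listings)

# Column types at a glance.
str(listings)
```

Run against the full Los Angeles file, `dim()` would be expected to report the 43,047 records and 96 features stated above.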

Data Preprocessing

The following steps were used to clean and prepare the data:

  • Reducing attributes
  • Changing the characteristics
  • Creating dummy variables for categorical data
  • Bucketing data groups into smaller groups
  • Handling ‘NA’ or missing values
  • Transforming data
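
Several of the steps listed above can be sketched in base R on a toy data frame. This is an illustration under stated assumptions, not the cleaning code used in the project: the column names (`room_type`, `price`, `review_scores_rating`), the values, the price bands, and the choice of median imputation are all hypothetical.

```r
# Toy data standing in for the listings table (all values are illustrative).
listings <- data.frame(
  id = 1:6,
  listing_url = paste0("https://www.airbnb.com/rooms/", 1:6),
  room_type = c("Entire home/apt", "Private room", "Shared room",
                "Private room", "Entire home/apt", "Private room"),
  price = c("$122.00", "$168.00", "$79.00", "$140.00", "$80.00", "$1,200.00"),
  review_scores_rating = c(80, 97, 98, NA, 97, 92),
  stringsAsFactors = FALSE
)

# Attribute reduction: drop a column that carries no predictive signal.
listings$listing_url <- NULL

# Data transformation: strip "$" and "," so price becomes numeric.
listings$price <- as.numeric(gsub("[$,]", "", listings$price))

# Missing values: replace NA ratings with the median (one simple option).
rating_median <- median(listings$review_scores_rating, na.rm = TRUE)
missing_rating <- is.na(listings$review_scores_rating)
listings$review_scores_rating[missing_rating] <- rating_median

# Bucketing: group prices into three bands (cut points are assumptions).
listings$price_band <- cut(listings$price,
                           breaks = c(0, 100, 200, Inf),
                           labels = c("low", "mid", "high"))

# Dummy variables: one indicator column per room type.
dummies <- model.matrix(~ room_type - 1, data = listings)

prepared <- cbind(
  listings[, c("id", "price", "review_scores_rating", "price_band")],
  dummies
)
str(prepared)
```

Packages loaded in this report, such as caret and mice, provide richer alternatives for dummy coding and imputation. Base R is used here only to keep the sketch self-contained.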

PART 2: Libraries and Data Import/Cleaning

LIBRARIES

# Libraries for machine learning
library(tidyverse)
library(class)
library(gmodels)
library(caret)
library(ipred)
library(adabag)
library(vcd)
library(randomForest)
library(e1071)
library(C50)
library(klaR)
library(rJava)
library(RWeka)
library(magrittr)
library(ROCR)
library(pROC)
library(neuralnet)
library(kernlab)
library(VIM)
library(mice)

# Libraries for data cleaning and preprocessing
library(dplyr)
library(stringr)
library(lubridate)
library(ggplot2)
library(corrplot)
library(Boruta)
##     id                       listing_url    scrape_id last_scraped
## 1  109  https://www.airbnb.com/rooms/109 2.018121e+13   2018-12-07
## 2  344  https://www.airbnb.com/rooms/344 2.018121e+13   2018-12-07
## 3 2708 https://www.airbnb.com/rooms/2708 2.018121e+13   2018-12-06
## 4 2732 https://www.airbnb.com/rooms/2732 2.018121e+13   2018-12-06
## 5 2864 https://www.airbnb.com/rooms/2864 2.018121e+13   2018-12-06
## 6 3021 https://www.airbnb.com/rooms/3021 2.018121e+13   2018-12-07
##                                                 name
## 1 Amazing bright elegant condo park front *UPGRADED*
## 2                  Family perfect;Pool;Near Studios!
## 3 Gold Memory Foam Bed & Breakfast in West Hollywood
## 4                              Zen Life at the Beach
## 5  *Upscale Professional Home with Beautiful Studio*
## 6    Hollywood Hills Zen Modern style Apt/Guesthouse
##                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                             summary
## 1                                                                                  *** Unit upgraded with new bamboo flooring, brand new Ultra HD 50" Sony TV, new paint, new lighting, new mattresses, ultra fast cable Internet connection, Apple TV, (Hidden by Airbnb) Chromecast. *** Gorgeous and Elegant Furnished Condo in front of Culver City Fox Hills Park.  Upper corner unit, total silence protected by trees. Short walk to the new Westfield Mall. Tennis courts, heated pool and jacuzzi hot tub.
## 2                                                                                                                     This home is perfect for families; aspiring child actors w/parents; and friends vacationing for the summer or holidays.  The pool is large, back patio terrific for evening dinners/parties around the firepit while folks nighttime swim during the summer. Chilly firepit fun during the winter. Quiet neighborhood minutes from Burbank Airport and all freeways.  GREAT CENTRAL LOCATION!
## 3            Our best memory foam pillows you'll ever sleep on. First Morning: Starbuck's & Peet's coffee, latte-style coffee also protein bars, granola bars, and a fresh baked Swedish cinnamon roll, continental breakfast as well as breakfast requests.  A welcome bottle of Voss artesian water from Norway. Terry robe & slippers. Handmade Amish wildflower soap. Candy bowl & trail mix jar. SoCal: beaches, Walk of Fame, clubs. Then back here for R&R.   Pamper yourself in West Hollywood, California.
## 4                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  
## 5 Centrally located.... Furnished with 42 inch Sony Plasma tv/hbo/showtime.  Fiber optic WIFI. 4.6 cu ft. Refridgerator, Microwave, Convection Oven.  Large bathroom with Jaccuzi and large shower, brazilian cherry hardwood floors, thomasville bedroom furniture, ikea office, fireplace, ceiling fan, etc.  Many restaurants and Shopping.  Cheesecake Factory, Fronks (best burgers and ribs), Marino's, Royal Taste (Thai), California Pizza Kitchen, The Nest (breakfast!), Bike Trail, Beach in 15 minutes.
## 6                                                                                                                                                                                  A very Modern Hollywood Hills Zen style gallery-esque abode , Dark Brazilian Hardwood Floors, Sleek modern Concrete decor infused with Asian feng shui sensibilities, Artisinal in all aspects  with all the modern conveniences. Located in beautiful and Musically  Historic  Laurel Canyon.  Approximate size is 460 sq feet.
##                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                               space
## 1                          *** Unit upgraded with new bamboo flooring, brand new Ultra HD 50" Sony TV, new paint, new lighting, new mattresses, ultra fast cable Internet connection. *** Gorgeous and Elegant Furnished Apartment in front of Culver City Fox Hills Park.  Upper corner unit, total silence protected by trees. Short walk to the new Westfield Mall. Tennis courts, heated pool and jacuzzi hot tub. *** Upgraded with bamboo flooring and new paint the whole apartment *** Just installed gorgeous high quality bamboo hardwood floor in the whole apartment! (pictures here shows bamboo only in the living room, now also the bedrooms have bamboo flooring) Gorgeous and Elegant Furnished Apartment in front of Culver City Fox Hills Park.  Upper corner unit, totally silent protected by trees. Short walk to the new Westfield Mall. MANY MORE PICTURES AVAILABLE HERE: (URL HIDDEN) Listing Type: Short or Long Term Rental  Listing Description: Gorgeous and Elegant Furnished Apartment in front of the Park Bedrooms: 2 bedrooms B
## 2                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                               Cheerful & comfortable; near studios, amusement parks, downtown, beaches!  Central and modern; private, convenient, family-friendly, pool.  3 bedrooms.  Terrific host - feel welcome and relaxed while you travel.
## 3 Flickering fireplace display heater.  Decorated with fresh flowers for the Holidays. Blendtec® Designer 625 Blender Bundle with Twister Jar MORE THAN JUST SMOOTHIES: From hot soup to ice cream and everything in between, your imagination is the only limit to the creative power you can unleash with Blendtec blender. This space is completely upgraded and updated.  Luxury Gold Queen size memory foam bed in fully private 10 ft. x 13 ft. 6 in. walled off completely enclosed space, screened off,  of huge living room. 17' high exposed beam ceiling.  Brand new plush carpeting and plank flooring  throughout. • Wireless Router WIFI • Great looking new building  • Great looking lobby  • Right off Sunset Blvd. (west of La Brea)  • Perfect for a model or actor.  • Walk to Formosa Cafe  • 5 Minutes: Beverly Hills, Paramount, Pantages Theater, Television City.  • 8 Minutes: Burbank, Universal, Disney, NBC, Fox, SAG.  • Laundry Facilities  • Jacuzzi and sundeck  • Public transportation close by  • Newl
## 4                          This is a three story townhouse with the following layout. The first floor, where my guest stays, is an open space filled with light. Bright, airy, cheerful. You will be sleeping on a sleeper couch facing a beautiful garden overflowing with flowers to greet you every morning that offers a lovely patio area to sit and have a meal. Very peaceful and serene.  There is no door but Japanese screens are provided if you want to nest in. While this area is considered a shared space, you will have it completely to yourself and have any privacy you seek. On the same floor, is the shared kitchen fully stocked. The 2nd floor is my office and private bath for my guest. The third floor is my quarters. We could possibly not run into one another. Located in Santa Monica, a few blocks from the very hip Main Street area that has cafes, both fine and casual dining that will appeal to those of you who are foodies. Walking distance to yoga studios, farmers markets and wonderful unique shops. The pristine c
## 5                                                                                                                                                                                                                                                                                                                                                                                                                        The space is furnished with Thomasville furniture, brazillian cherrywood flooring, jacuzzi, large bathroom, large closet, large office desk area/furniture.   2 minutes to the freeway, 12 minutes to closest beach, 30 minutes to Hollywood, 5 minutes to restaurants, shopping and grocery stores.  There are many parks close by and very close to the San Gabriel Pass where the cyclists ride down to Seal Beach.  It's a safe, quiet place with tenants with a very busy lifestyle and there are no children.  No smoking in the house and no alcoholics/drugs.  I'm a very friendly home owner and I respect everyone's privacy! :)
## 6                          Stay amongst the Stars when you visit the Hollywood Hills! One of the safest areas !!! Sleep just minutes away from Jim Morrison's house, .....The Mamas and Papas home, and many more present and past celebrities!! ...the Hollywood Hills / Laurel Canyon Welcome to Paradise , quiet, lush ,song birds greenery, , and refreshing breezes; yet in the heart of Hollywood, quick access to Sunset Strip, West Hollywood, Downtown, Beverly Hills, and Universal City Walk and other film studios. This gem is nestled up the infamous Laurel Canyon and has all of the amenities and comforts of a custom designer home , completely separate entrance, plenty of easy parking, and fully loaded! After your active touring, relax in this  fully-furnished Soho gallery-esque modern guest house. Master bedroom includes- queen size bed, fresh linens and comforter, White leather Corbusier chair, custom corner desk, and a walk-in closet; media/living room features a 42" plasma screen with slime Warner cable ,  Italian le
##                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                 description
## 1  *** Unit upgraded with new bamboo flooring, brand new Ultra HD 50" Sony TV, new paint, new lighting, new mattresses, ultra fast cable Internet connection, Apple TV, (Hidden by Airbnb) Chromecast. *** Gorgeous and Elegant Furnished Condo in front of Culver City Fox Hills Park.  Upper corner unit, total silence protected by trees. Short walk to the new Westfield Mall. Tennis courts, heated pool and jacuzzi hot tub. *** Unit upgraded with new bamboo flooring, brand new Ultra HD 50" Sony TV, new paint, new lighting, new mattresses, ultra fast cable Internet connection. *** Gorgeous and Elegant Furnished Apartment in front of Culver City Fox Hills Park.  Upper corner unit, total silence protected by trees. Short walk to the new Westfield Mall. Tennis courts, heated pool and jacuzzi hot tub. *** Upgraded with bamboo flooring and new paint the whole apartment *** Just installed gorgeous high quality bamboo hardwood floor in the whole apartment! (pictures here shows bamboo only in the living r
## 2  This home is perfect for families; aspiring child actors w/parents; and friends vacationing for the summer or holidays.  The pool is large, back patio terrific for evening dinners/parties around the firepit while folks nighttime swim during the summer. Chilly firepit fun during the winter. Quiet neighborhood minutes from Burbank Airport and all freeways.  GREAT CENTRAL LOCATION! Cheerful & comfortable; near studios, amusement parks, downtown, beaches!  Central and modern; private, convenient, family-friendly, pool.  3 bedrooms.  Terrific host - feel welcome and relaxed while you travel. Pool, patio and self-contained main house all accessible freely by guests.  Garage, pool house and back caretaker unit not accessible. Host and caretaker may be available throughout your stay to assist in troubleshooting with local information/amenities. During holiday time, you may have the place to yourself.  Host available for support by phone. Quiet-yet-close to all the fun in LA! Hollywood, Univers
## 3 Our best memory foam pillows you'll ever sleep on. First Morning: Starbuck's & Peet's coffee, latte-style coffee also protein bars, granola bars, and a fresh baked Swedish cinnamon roll, continental breakfast as well as breakfast requests.  A welcome bottle of Voss artesian water from Norway. Terry robe & slippers. Handmade Amish wildflower soap. Candy bowl & trail mix jar. SoCal: beaches, Walk of Fame, clubs. Then back here for R&R.   Pamper yourself in West Hollywood, California. Flickering fireplace display heater.  Decorated with fresh flowers for the Holidays. Blendtec® Designer 625 Blender Bundle with Twister Jar MORE THAN JUST SMOOTHIES: From hot soup to ice cream and everything in between, your imagination is the only limit to the creative power you can unleash with Blendtec blender. This space is completely upgraded and updated.  Luxury Gold Queen size memory foam bed in fully private 10 ft. x 13 ft. 6 in. walled off completely enclosed space, screened off,  of huge living roo
## 4  This is a three story townhouse with the following layout. The first floor, where my guest stays, is an open space filled with light. Bright, airy, cheerful. You will be sleeping on a sleeper couch facing a beautiful garden overflowing with flowers to greet you every morning that offers a lovely patio area to sit and have a meal. Very peaceful and serene.  There is no door but Japanese screens are provided if you want to nest in. While this area is considered a shared space, you will have it completely to yourself and have any privacy you seek. On the same floor, is the shared kitchen fully stocked. The 2nd floor is my office and private bath for my guest. The third floor is my quarters. We could possibly not run into one another. Located in Santa Monica, a few blocks from the very hip Main Street area that has cafes, both fine and casual dining that will appeal to those of you who are foodies. Walking distance to yoga studios, farmers markets and wonderful unique shops. The pristine c
## 5  Centrally located.... Furnished with 42 inch Sony Plasma tv/hbo/showtime.  Fiber optic WIFI. 4.6 cu ft. Refridgerator, Microwave, Convection Oven.  Large bathroom with Jaccuzi and large shower, brazilian cherry hardwood floors, thomasville bedroom furniture, ikea office, fireplace, ceiling fan, etc.  Many restaurants and Shopping.  Cheesecake Factory, Fronks (best burgers and ribs), Marino's, Royal Taste (Thai), California Pizza Kitchen, The Nest (breakfast!), Bike Trail, Beach in 15 minutes. The space is furnished with Thomasville furniture, brazillian cherrywood flooring, jacuzzi, large bathroom, large closet, large office desk area/furniture.   2 minutes to the freeway, 12 minutes to closest beach, 30 minutes to Hollywood, 5 minutes to restaurants, shopping and grocery stores.  There are many parks close by and very close to the San Gabriel Pass where the cyclists ride down to Seal Beach.  It's a safe, quiet place with tenants with a very busy lifestyle and there are no children.  N
## 6  A very Modern Hollywood Hills Zen style gallery-esque abode , Dark Brazilian Hardwood Floors, Sleek modern Concrete decor infused with Asian feng shui sensibilities, Artisinal in all aspects  with all the modern conveniences. Located in beautiful and Musically  Historic  Laurel Canyon.  Approximate size is 460 sq feet. Stay amongst the Stars when you visit the Hollywood Hills! One of the safest areas !!! Sleep just minutes away from Jim Morrison's house, .....The Mamas and Papas home, and many more present and past celebrities!! ...the Hollywood Hills / Laurel Canyon Welcome to Paradise , quiet, lush ,song birds greenery, , and refreshing breezes; yet in the heart of Hollywood, quick access to Sunset Strip, West Hollywood, Downtown, Beverly Hills, and Universal City Walk and other film studios. This gem is nestled up the infamous Laurel Canyon and has all of the amenities and comforts of a custom designer home , completely separate entrance, plenty of easy parking, and fully loaded! Af
##   experiences_offered
## 1                none
## 2                none
## 3                none
## 4                none
## 5                none
## 6                none
##                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                   neighborhood_overview
## 1                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      
## 2                                                                                                                                                                                                                                                                                                                                                                                                                               Quiet-yet-close to all the fun in LA! Hollywood, Universal Studios, beaches, great hikes and more are all minutes away.
## 3 We are minutes away from the Mentor Language Institute, Kings College, Musicians Institute, and many film schools including AFI, and the American Academy of Dramatic Arts.  Halfway between UCLA and USC.  We are minutes away from the Hollywood Boulevard Walk of Fame and all the clubs on Sunset Strip. All the comedy clubs are here, as well. Minutes from the Grove and Rodeo Drive. I'll give you maps and directions to everything.  Universal City is just up the road. Magic Mountain is a short drive out of town. Disneyland , as well.
## 4                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      
## 5                                                      What makes the neighborhood unique is that there are 5 grocery stores within 5 minutes and 2 Malls within 7 minutes.  There are also many parks and with the San Gabriel Pass being a few minutes away, you can actually ride a bike to Seal Beach.  The 91 freeway is 2 minutes away and the 605 3 minutes.  The 105 freeway about 6 minutes and the 5 freeway about 6 minutes.  The closest beach is about 12 minutes away.  Downtown LA is about 20 minutes.  Disneyland is about 12 minutes.
## 6                                                                                                                                                                                                                                                                                                                                                                                                                                                   This is the famous Hollywood hills.. Historical for Music , many nighbor are well known celebrities
##                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                       notes
## 1                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          
## 2                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                             One dog may be on premises, friendly and cared for by caretaker.  A great addition to stabilize kids-away-from-home and bring a family feel to your vacation.
## 3 Decorated for the Holidays. Blendtec® Designer 625 Blender Bundle with Twister Jar MORE THAN JUST SMOOTHIES: From hot soup to ice cream and everything in between, your imagination is the only limit to the creative power you can unleash with Blendtec blender. Our memory foam pillows are the best you'll ever sleep on. They are customizable utilizing exclusive Variable Fill Technology ensuring a pillow that is tailored just for you. This is the only memory foam pillow in the world that is adjustable. You can sculpt it much like a down pillow - it will shift and change into whatever shape you desire. We offer a continental breakfast and/or light breakfast fare. Wake up coffee or you can make your own.  The first night and morning for all guests.  There is a candy bowl with and without, sugar-free.  There is a white terrycloth robe and slippers as well as fluffy thick bath and hand towels and a facecloth.  Handmade Amish Wildflower Soap. A luffa mitt and other arrival bath amenities.  A wel
## 4                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          
## 5                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                If you are doing business travel, this studio is excellent because it offers one large desk and also a built in desk that would give you lots of room.  Fiber optic wifi is very stable and fast 100 mbps.
## 6                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          
##                                                                                                                                                                                                                                                                               transit
## 1                                                                                                                                                                                                                                                                                    
## 2                                                 Short drive to subway and elevated trains running to major tourist spots in LA; freeways minutes away as well.  Car is advised for maximum accessibility to greater Los Angeles. Uber-friendly suburb, close to Hollywood and more.
## 3 There are many buses; bus stops going in every direction are just around the corner. The subway is five minutes away. We are in the heart of Los Angeles, West Hollywood, Hollywood, California, USA.  Convenient to all the major studios. Beverly Hills is minutes away, as well.
## 4                                                                                                                                                                                                                                                                                    
## 5                                                                                                                                                                                                                       Public transportation is a 3 minutes walk to the main street.
## 6                                                                                                                                                                                                                       Car, Bike and Hike !! Uber access , Bus stop walking distance
##                                                                                                                                                                                                                                                                                        access
## 1                                                                                                                                                                                                                                                                                            
## 2                                                                                                                                                      Pool, patio and self-contained main house all accessible freely by guests.  Garage, pool house and back caretaker unit not accessible.
## 3 Kitchen with new refrigerator, dishwasher, stove and oven with new plank floors. Jacuzzi and sundeck New gym with new treadmill and elliptical   Sauna Your own secure parking space  Washer Dryer in building Shared brand new updated Bath with new glass enclosure and new plank floors.
## 4                                                                                                                                                                                                                                                                                            
## 5                                                                                                                                                                                                                                 Good access to all things in Los Angeles and Orange County.
## 6                                                                                                                                                                                                                                                                                            
##                                                                                                                                                                                                                                                                                                interaction
## 1                                                                                                                                                                                                                                                                                                         
## 2                                                                                   Host and caretaker may be available throughout your stay to assist in troubleshooting with local information/amenities. During holiday time, you may have the place to yourself.  Host available for support by phone.
## 3 I am friendly and available to help you with your needs even before you arrive. I am seldom seen as I am in and out with my daily tasks.   I always greet you with a smile if we do run into each other.  I am happy to help you find things to do especially if it is about the entertainment industry.
## 4                                                                                                                                                                                                                                                                                                         
## 5                                                                                                                                                                                                                                      I am always available for questions throughout the travellers stay.
## 6                                                                                                                                                                                                                                                                                                         
##                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    house_rules
## 1 Camelot NEW RESIDENTS’ GENERAL INFORMATION   File:  New Residents Info  1 Created on 12/13/05  Hello, and welcome to the Camelot Condominium Complex.  Below is some information  to help you become oriented to your building and the complex.  1. The Camelot complex consists of five buildings.  Your new unit is in bldg._______.(URL HIDDEN)You need to always use your building number plus your unit number when  contacting either our property management company, Real Support Property  Management Co. at ((PHONE NUMBER HIDDEN) or the Camelot office at (PHONE NUMBER HIDDEN).   The Camelot office hours are Monday through Friday 8:30 am to 3:30 pm.   2.  Parking in our Structures:  Parking for residents is in assigned, numbered  spaces that legally belong to each unit.  (Some units only have one parking space.)   You, any guest, or temporary worker you might have may only park in one of your  assigned spaces.  NOTE:  Any vehicle parked illegally in another resident’s slot or  in a common area can
## 2                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                             Host asks that guests refrain from partying loudly into the evening on back patio/pool area.  Guest swim at their own risk; guest booking indicates agreement by Guest that Host is not responsible for any injuries related to the use of the pool or from being in or around the pool area. Finally, plumbing in the house is a bit sensitive. No feminine items down the toilet and nothing at all allowed in the garbage disposal in the kitchen. Thank you!
## 3                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      I just have one rule. The Golden Rule Do unto others as you would have them do unto you. This is a no smoking drug free place. No pets.
## 4                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      ABOUT YOU.  Friendly travelers or people coming to LA for work are welcome to stay .I am open to interns who visit Santa Monica. This isnâ\200\231t a party house, but if youâ\200\231re looking for a party there are plenty of great bars, music and comedyvenues within walking distance. Please tell me about yourself, and we can decide if itâ\200\231s going to be a good fit. A few requestsâ\200¦  -No smoking -No guests apart from those registered, and no parties  -Please remove shoes in the house.  -I keep my home clean, and would ask you to do the same.
## 5                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                             
## 6                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                     No Drugs, No partying, No unreasonable  loudness of anykind after 11pm, no smoking, please keep voices low when entering property after 11pm as to not disturb neighbors
##   thumbnail_url medium_url
## 1            NA         NA
## 2            NA         NA
## 3            NA         NA
## 4            NA         NA
## 5            NA         NA
## 6            NA         NA
##                                                                                     picture_url
## 1            https://a0.muscache.com/im/pictures/4321499/1da9892a_original.jpg?aki_policy=large
## 2 https://a0.muscache.com/im/pictures/cc4b724d-db8b-4dd8-9c01-25841c4ba6ca.jpg?aki_policy=large
## 3           https://a0.muscache.com/im/pictures/40618141/2ac0b446_original.jpg?aki_policy=large
## 4            https://a0.muscache.com/im/pictures/1082974/0f74c9d1_original.jpg?aki_policy=large
## 5           https://a0.muscache.com/im/pictures/23817858/de20cdd9_original.jpg?aki_policy=large
## 6 https://a0.muscache.com/im/pictures/5147dcd2-efad-495c-8c31-d781cc626878.jpg?aki_policy=large
##   xl_picture_url host_id                               host_url      host_name
## 1             NA     521  https://www.airbnb.com/users/show/521          Paolo
## 2             NA     767  https://www.airbnb.com/users/show/767        Melissa
## 3             NA    3008 https://www.airbnb.com/users/show/3008          Chas.
## 4             NA    3041 https://www.airbnb.com/users/show/3041 Yoga Priestess
## 5             NA    3207 https://www.airbnb.com/users/show/3207      Bernadine
## 6             NA    3415 https://www.airbnb.com/users/show/3415        Nataraj
##   host_since                            host_location
## 1 2008-06-27 San Francisco, California, United States
## 2 2008-07-11       Burbank, California, United States
## 3 2008-09-16   Los Angeles, California, United States
## 4 2008-09-17  Santa Monica, California, United States
## 5 2008-09-25    Long Beach, California, United States
## 6 2008-10-02   Los Angeles, California, United States
##                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              
                                                                                                                  host_about
## 1 Search for me on the Internet with the keyword pppaolo\n\nPolyhedric Lateral Thinker Entrepreneur, a Human Network Router and a Serendipity Innovator\n\n"Ahead of Number One" Paolo is a young technology entrepreneur and visionary, specializing in structuring progressive business models that capture the moment and stay on top of the future.\n\nClass of 1977, a master in computer science and one in marketing. 15 yrs of deep experience in Internet technology and business strategy, served more than 100 businesses.\nHe founded his first Internet company when he was 18. Then he founded Digitix, in 1999 in Italy and in 2002 in the United States, when he moved, first to NY, then LA and finally SF; involved in the operations and partner in 5 startups.\n\nIn 2010 he founded in Silicon Valley along with other partners, Doochoo, a revolutionary platform for the opinions in Internet as an innovative marketing and user engagement tool between brands and consumers: signed with Ikea, the first client in 2011.\n\nIn 2011 he held the position of Head of Innovation and Emerging Media in H-art, company of H-farm group, interactive agency for strategic marketing and communication projects, acquired by WPP, the world's' largest marketing and comm. 
group.\nPaolo now works full time and focuses only on his venture Doochoo ( (Website hidden by Airbnb) \n\nVery active in SV, connecting together experienced entrepreneurs, investors, , start-up rookies, and working to create an "Int'l Accelerator and TT Center", meanwhile, for years he has created a bridge between Italian and int'l companies.\n\nHe received several career awards from excellence centers, universities and conferences in Italy and USA; interviewed and mentioned on Int'l papers such as Financial Times, Wired, TechCrunch, La Repubblica, Il Sole 24 Ore, RAI, and many more.\n\nPaolo speaks Italian English Spanish French, restless traveler, power rollerskater, addicted photographer.\n\nTo date, Paolo commutes monthly between SF, NY and Venezia, and now with his company Pick1 (Doochoo Inc) he is part of Start-Up Chile amazing program and 500 Startups! 
## 2                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                   Single mother, CEO and Owner of an international coaching and training business.  \n\nLove to travel! Family-focused and single friendly due to my own status! 
Hail from Washington, DC originally.  International interests.\n\n"RIOT FOR JOY" is my motto.  Looking forward to getting to know YOU.
## 3                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                 Writer.\nLiterary 
Manager.\nPhotographer.\nProducing Partner.\nI work all the time.\nI wear many hats.\nProfessional.\nPleasant.\nRespectful.\nOptimistic and cheerful.
## 4                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                         I have been teaching yoga and meditation for 30 years.\nWorld-traveled,passionate,love life and 
committed to making the world a healthier place one person and one company at a time. Enjoy meeting new and interesting people.
## 5                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            
                                                         Fair, open, honest and very informative for new guests to the area.
## 6                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            
                              Music Industry, Record producer, Songwriter, Composer, Multi Instrumentalist, Recording Artist
##   host_response_time host_response_rate host_acceptance_rate host_is_superhost
## 1                N/A                N/A                  N/A                 f
## 2       within a day               100%                  N/A                 f
## 3     within an hour               100%                  N/A                 t
## 4 within a few hours               100%                  N/A                 f
## 5                N/A                N/A                  N/A                 f
## 6                N/A                N/A                  N/A                 f
##                                                                                           host_thumbnail_url
## 1          https://a0.muscache.com/im/users/521/profile_pic/1429917533/original.jpg?aki_policy=profile_small
## 2          https://a0.muscache.com/im/users/767/profile_pic/1259093012/original.jpg?aki_policy=profile_small
## 3 https://a0.muscache.com/im/pictures/user/d17cfddd-9f98-4d0c-bfee-c005cc38a7de.jpg?aki_policy=profile_small
## 4         https://a0.muscache.com/im/users/3041/profile_pic/1331080494/original.jpg?aki_policy=profile_small
## 5      https://a0.muscache.com/im/pictures/8b82a267-bc4b-4d8b-935a-463a39c8c5ae.jpg?aki_policy=profile_small
## 6         https://a0.muscache.com/im/users/3415/profile_pic/1281545642/original.jpg?aki_policy=profile_small
##                                                                                                host_picture_url
## 1          https://a0.muscache.com/im/users/521/profile_pic/1429917533/original.jpg?aki_policy=profile_x_medium
## 2          https://a0.muscache.com/im/users/767/profile_pic/1259093012/original.jpg?aki_policy=profile_x_medium
## 3 https://a0.muscache.com/im/pictures/user/d17cfddd-9f98-4d0c-bfee-c005cc38a7de.jpg?aki_policy=profile_x_medium
## 4         https://a0.muscache.com/im/users/3041/profile_pic/1331080494/original.jpg?aki_policy=profile_x_medium
## 5      https://a0.muscache.com/im/pictures/8b82a267-bc4b-4d8b-935a-463a39c8c5ae.jpg?aki_policy=profile_x_medium
## 6         https://a0.muscache.com/im/users/3415/profile_pic/1281545642/original.jpg?aki_policy=profile_x_medium
##   host_neighbourhood host_listings_count host_total_listings_count
## 1        Culver City                   1                         1
## 2            Burbank                   1                         1
## 3          Hollywood                   2                         2
## 4       Santa Monica                   2                         2
## 5         Bellflower                   1                         1
## 6      Laurel Canyon                   3                         3
##                                                                 host_verifications
## 1                                 ['email', 'phone', 'facebook', 'reviews', 'kba']
## 2                   ['email', 'phone', 'reviews', 'jumio', 'kba', 'government_id']
## 3                                 ['email', 'phone', 'facebook', 'reviews', 'kba']
## 4 ['email', 'phone', 'reviews', 'jumio', 'offline_government_id', 'government_id']
## 5                                            ['email', 'phone', 'facebook', 'kba']
## 6                          ['email', 'phone', 'reviews', 'jumio', 'government_id']
##   host_has_profile_pic host_identity_verified                          street
## 1                    t                      t  Culver City, CA, United States
## 2                    t                      t      Burbank, CA, United States
## 3                    t                      t  Los Angeles, CA, United States
## 4                    t                      f Santa Monica, CA, United States
## 5                    t                      t   Bellflower, CA, United States
## 6                    t                      t  Los Angeles, CA, United States
##   neighbourhood neighbourhood_cleansed neighbourhood_group_cleansed
## 1   Culver City            Culver City                           NA
## 2       Burbank                Burbank                           NA
## 3                            Hollywood                           NA
## 4  Santa Monica           Santa Monica                           NA
## 5    Bellflower             Bellflower                           NA
## 6 Laurel Canyon   Hollywood Hills West                           NA
##           city state zipcode      market   smart_location country_code
## 1  Culver City    CA   90230 Los Angeles  Culver City, CA           US
## 2      Burbank    CA   91505 Los Angeles      Burbank, CA           US
## 3  Los Angeles    CA   90046 Los Angeles  Los Angeles, CA           US
## 4 Santa Monica    CA   90405 Los Angeles Santa Monica, CA           US
## 5   Bellflower    CA   90706 Los Angeles   Bellflower, CA           US
## 6  Los Angeles    CA   90046 Los Angeles  Los Angeles, CA           US
##         country latitude longitude is_location_exact property_type
## 1 United States 33.98209 -118.3849                 t   Condominium
## 2 United States 34.16562 -118.3346                 t         House
## 3 United States 34.09768 -118.3460                 t     Apartment
## 4 United States 34.00475 -118.4813                 t     Apartment
## 5 United States 33.87619 -118.1140                 t     Apartment
## 6 United States 34.11132 -118.3823                 t   Guest suite
##         room_type accommodates bathrooms bedrooms beds      bed_type
## 1 Entire home/apt            6       2.0        2    3      Real Bed
## 2 Entire home/apt            6       1.0        3    3      Real Bed
## 3    Private room            1       1.5        1    1      Real Bed
## 4    Private room            1       1.0        1    1 Pull-out Sofa
## 5 Entire home/apt            2       1.0        1    1      Real Bed
## 6 Entire home/apt            2       1.0        1    2      Real Bed
##                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      amenities
## 1                                                                                                                                                                                                                                              {TV,"Cable TV",Internet,Wifi,"Air conditioning","Wheelchair accessible",Pool,Kitchen,"Free parking on premises","Pets allowed",Gym,Elevator,"Hot tub","Indoor fireplace","Buzzer/wireless intercom",Heating,"Family/kid friendly","Suitable for events",Washer,Dryer,"Smoke detector","Carbon monoxide detector","First aid kit","Safety card","Fire extinguisher",Essentials,Shampoo,"24-hour check-in",Hangers,"Hair dryer",Iron,"Laptop friendly workspace"}
## 2                                                                                                                    {TV,"Cable TV",Internet,Wifi,"Air conditioning",Pool,Kitchen,"Pets live on this property",Dog(s),"Free street parking",Heating,"Family/kid friendly",Washer,Dryer,"Smoke detector","Carbon monoxide detector","First aid kit",Essentials,Shampoo,"24-hour check-in",Hangers,"Hair dryer",Iron,"Laptop friendly workspace","Children’s books and toys","Fireplace guards","Children’s dinnerware","Hot water",Microwave,"Coffee maker",Refrigerator,Dishwasher,"Dishes and silverware","Cooking basics",Oven,Stove,"Single level home","BBQ grill","Patio or balcony","Luggage dropoff allowed",Other}
## 3 {Internet,Wifi,"Air conditioning","Wheelchair accessible",Kitchen,"Free parking on premises",Gym,Breakfast,Elevator,"Free street parking","Hot tub","Buzzer/wireless intercom",Heating,Washer,Dryer,"Smoke detector","Carbon monoxide detector","First aid kit","Safety card","Fire extinguisher",Essentials,Shampoo,"24-hour check-in",Hangers,"Hair dryer",Iron,"Laptop friendly workspace","translation missing: en.hosting_amenity_49","translation missing: en.hosting_amenity_50","Hot water","Bed linens","Extra pillows and blankets",Microwave,"Coffee maker",Refrigerator,Dishwasher,"Dishes and silverware","Cooking basics",Oven,Stove,"Single level home","Patio or balcony","Host greets you"}
## 4                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      {Internet,Wifi,Kitchen,Heating,Washer,Dryer,"Smoke detector",Essentials,Shampoo,Hangers,"Hair dryer","Host greets you"}
## 5                                                                                                                                                                                                                                                                                                                                                                        {TV,"Cable TV",Internet,Wifi,"Air conditioning",Kitchen,"Free parking on premises","Hot tub","Indoor fireplace",Heating,Washer,Dryer,"Smoke detector","First aid kit","Fire extinguisher",Hangers,"Hair dryer","Laptop friendly workspace","translation missing: en.hosting_amenity_49","translation missing: en.hosting_amenity_50"}
## 6                                                                                                                                                                                                                                                                                                      {TV,"Cable TV",Wifi,"Air conditioning",Kitchen,"Free parking on premises","Pets allowed","Free street parking",Heating,"Family/kid friendly","Smoke detector","Carbon monoxide detector",Essentials,Shampoo,Hangers,"Hair dryer",Iron,"Laptop friendly workspace","translation missing: en.hosting_amenity_50","Private entrance","Hot water","Bed linens","Long term stays allowed","Host greets you"}
##   square_feet   price weekly_price monthly_price security_deposit cleaning_fee
## 1          NA $122.00      $904.00     $2,851.00          $500.00      $240.00
## 2          NA $168.00                                       $0.00      $100.00
## 3          NA  $79.00      $399.00       $949.00          $299.00       $85.00
## 4          NA $140.00      $800.00     $1,879.00                       $100.00
## 5          NA  $80.00      $399.00     $1,400.00          $100.00       $75.00
## 6          NA  $82.00      $790.00     $2,450.00          $250.00       $60.00
##   guests_included extra_people minimum_nights maximum_nights calendar_updated
## 1               3       $25.00              7            730     9 months ago
## 2               6        $0.00              2             14       4 days ago
## 3               1        $0.00              6            366            today
## 4               1        $0.00              1            180      4 weeks ago
## 5               1       $25.00              2            730     4 months ago
## 6               1        $9.00              3            730      3 weeks ago
##   has_availability availability_30 availability_60 availability_90
## 1                t               0               0               0
## 2                t               0               0               0
## 3                t               0               0               6
## 4                t              25              55              85
## 5                t               0               0               0
## 6                t               0               0               7
##   availability_365 calendar_last_scraped number_of_reviews first_review
## 1              236            2018-12-07                 2   2011-08-15
## 2              135            2018-12-07                 4   2016-06-14
## 3              260            2018-12-06                13   2014-06-09
## 4              360            2018-12-06                18   2011-06-06
## 5                0            2018-12-06                 0             
## 6              282            2018-12-07                23   2013-09-03
##   last_review review_scores_rating review_scores_accuracy
## 1  2016-05-15                   80                     10
## 2  2018-10-21                   93                     10
## 3  2018-09-07                   97                     10
## 4  2018-11-15                   96                      9
## 5                               NA                     NA
## 6  2018-10-31                   81                      8
##   review_scores_cleanliness review_scores_checkin review_scores_communication
## 1                        10                     6                           8
## 2                        10                    10                          10
## 3                        10                    10                          10
## 4                         9                    10                          10
## 5                        NA                    NA                          NA
## 6                         8                     8                           9
##   review_scores_location review_scores_value requires_license license
## 1                     10                   8                f        
## 2                     10                   9                f        
## 3                     10                  10                f        
## 4                     10                   9                t  228269
## 5                     NA                  NA                f        
## 6                      9                   8                f        
##              jurisdiction_names instant_bookable is_business_travel_ready
## 1         {"Culver City"," CA"}                f                        f
## 2                                              t                        f
## 3 {"City of Los Angeles"," CA"}                t                        f
## 4              {"Santa Monica"}                f                        f
## 5                                              f                        f
## 6 {"City of Los Angeles"," CA"}                f                        f
##           cancellation_policy require_guest_profile_picture
## 1 strict_14_with_grace_period                             t
## 2                    flexible                             f
## 3 strict_14_with_grace_period                             f
## 4 strict_14_with_grace_period                             f
## 5 strict_14_with_grace_period                             f
## 6 strict_14_with_grace_period                             f
##   require_guest_phone_verification calculated_host_listings_count
## 1                                f                              1
## 2                                f                              1
## 3                                f                              2
## 4                                f                              2
## 5                                f                              1
## 6                                f                              3
##   reviews_per_month
## 1              0.02
## 2              0.13
## 3              0.24
## 4              0.20
## 5                NA
## 6              0.36

DATA CLEANING AND PREPROCESSING

We need to select the features that are useful for description, modeling, and prediction by subsetting the data. We will do the following, in no particular order:

  • We will not consider most of the text-based features
  • We will not consider features that contain URLs
  • We will remove ‘host_verifications’. It is more important to know whether a host is verified than to know every document used for verification.
  • We will drop other features case by case, as noted in the code comments that follow.

Los_Angeles_Listings_subset <- Los_Angeles_Listings %>% 
  dplyr::select(id,host_id,host_since, host_response_time, host_response_rate,
                experiences_offered, host_acceptance_rate, host_is_superhost,
                host_listings_count, host_total_listings_count, host_has_profile_pic,
                host_identity_verified, neighbourhood_cleansed, city, state,
                zipcode, market, country_code, country, is_location_exact,
                property_type, room_type, accommodates, bathrooms, bedrooms,
                beds, bed_type,amenities, square_feet, price, weekly_price, monthly_price,
                security_deposit, cleaning_fee, guests_included, extra_people, minimum_nights,
                maximum_nights, has_availability, availability_30,
                availability_60, availability_90, availability_365, number_of_reviews,
                first_review, last_review, review_scores_rating, review_scores_accuracy,
                review_scores_cleanliness, review_scores_checkin, review_scores_communication,
                review_scores_location, review_scores_value, requires_license,
                instant_bookable, is_business_travel_ready, cancellation_policy,
                require_guest_profile_picture, require_guest_phone_verification,
                calculated_host_listings_count, reviews_per_month, neighbourhood_group_cleansed)
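
The URL rule from the list above can also be applied by name pattern instead of typing out every column to keep. The following is a minimal sketch in base R, not the selection used in this report: the `toy` data frame, its values, and the `_url$` pattern are assumptions made for illustration.

```r
# Toy stand-in for the listings data (hypothetical values)
toy <- data.frame(
  id                 = 1:3,
  host_thumbnail_url = c("a", "b", "c"),
  host_picture_url   = c("d", "e", "f"),
  price              = c("$122.00", "$168.00", "$79.00"),
  stringsAsFactors   = FALSE
)

# Flag every column whose name ends in "_url"
is_url_col <- grepl("_url$", names(toy))

# Keep all remaining columns; drop = FALSE preserves the data frame
toy_no_urls <- toy[, !is_url_col, drop = FALSE]
names(toy_no_urls)
```

With dplyr loaded, `dplyr::select(toy, -dplyr::ends_with("_url"))` should give the same result. A name pattern only works when the columns follow a naming convention, which is why the explicit list above remains the safer choice for the free-text features.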

# Take a look at a descriptive summary of the Los_Angeles_Listings_subset dataset
summary(Los_Angeles_Listings_subset)
##        id              host_id               host_since   
##  Min.   :     109   Min.   :       59   2017-05-10:  162  
##  1st Qu.:11624794   1st Qu.: 10488351   2015-11-02:   88  
##  Median :19916052   Median : 37791461   2015-04-19:   83  
##  Mean   :18186850   Mean   : 64844796   2018-05-02:   82  
##  3rd Qu.:25741808   3rd Qu.:106317829   2016-07-09:   79  
##  Max.   :30584243   Max.   :229328694   2014-07-11:   76  
##                                         (Other)   :42477  
##           host_response_time host_response_rate experiences_offered
##                    :    3    100%   :23975      none:43047         
##  a few days or more:  700    N/A    :12011                         
##  N/A               :12011    90%    :  870                         
##  within a day      : 2626    99%    :  619                         
##  within a few hours: 4663    98%    :  584                         
##  within an hour    :23044    0%     :  476                         
##                              (Other): 4512                         
##  host_acceptance_rate host_is_superhost host_listings_count
##     :    3             :    3           Min.   :  0.000    
##  N/A:43044            f:31649           1st Qu.:  1.000    
##                       t:11395           Median :  2.000    
##                                         Mean   :  9.598    
##                                         3rd Qu.:  5.000    
##                                         Max.   :803.000    
##                                         NA's   :3          
##  host_total_listings_count host_has_profile_pic host_identity_verified
##  Min.   :  0.000            :    3               :    3               
##  1st Qu.:  1.000           f:   49              f:22651               
##  Median :  2.000           t:42995              t:20393               
##  Mean   :  9.598                                                      
##  3rd Qu.:  5.000                                                      
##  Max.   :803.000                                                      
##  NA's   :3                                                            
##      neighbourhood_cleansed             city           state      
##  Hollywood      : 2757      Los Angeles   :27454   CA     :43000  
##  Venice         : 2691      Long Beach    : 1497   Ca     :   32  
##  Downtown       : 1642      West Hollywood: 1001   ca     :    6  
##  Long Beach     : 1499      Santa Monica  :  962          :    2  
##  Hollywood Hills: 1092      Marina del Rey:  747   NY     :    2  
##  Westlake       : 1006      Beverly Hills :  709   加州 :    1  
##  (Other)        :32360      (Other)       :10677   (Other):    4  
##     zipcode                        market      country_code
##  90291  : 2193   Los Angeles          :41609   US:43047    
##  90046  : 1661   Other (Domestic)     : 1023               
##  90028  : 1659   Malibu               :  265               
##  90026  : 1263                        :   74               
##  90068  : 1007   Fontana              :   35               
##  90036  :  988   Coastal Orange County:   13               
##  (Other):34276   (Other)              :   28               
##           country      is_location_exact     property_type  
##  United States:43047   f: 9296           Apartment  :16187  
##                        t:33751           House      :14510  
##                                          Condominium: 2406  
##                                          Guesthouse : 2202  
##                                          Townhouse  : 1368  
##                                          Guest suite: 1324  
##                                          (Other)    : 5050  
##            room_type      accommodates      bathrooms         bedrooms     
##  Entire home/apt:26835   Min.   : 1.000   Min.   : 0.000   Min.   : 0.000  
##  Private room   :14261   1st Qu.: 2.000   1st Qu.: 1.000   1st Qu.: 1.000  
##  Shared room    : 1951   Median : 3.000   Median : 1.000   Median : 1.000  
##                          Mean   : 3.678   Mean   : 1.451   Mean   : 1.415  
##                          3rd Qu.: 5.000   3rd Qu.: 2.000   3rd Qu.: 2.000  
##                          Max.   :40.000   Max.   :22.000   Max.   :50.000  
##                                           NA's   :28       NA's   :18      
##       beds                 bed_type    
##  Min.   : 0.000   Airbed       :  131  
##  1st Qu.: 1.000   Couch        :   82  
##  Median : 1.000   Futon        :  232  
##  Mean   : 1.981   Pull-out Sofa:  157  
##  3rd Qu.: 2.000   Real Bed     :42445  
##  Max.   :50.000                        
##  NA's   :34                            
##                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          amenities    
##  {}                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                           :  140  
##  {TV,Wifi,"Air conditioning",Pool,Kitchen,Heating,Essentials,Shampoo,Hangers}                                                                                                                                                                                                                                                                                                                                                                                                                                                 :   34  
##  {TV,Wifi,Kitchen,"Smoke detector","Carbon monoxide detector","Fire extinguisher",Essentials,Shampoo,Hangers,"Hair dryer",Iron,"Private entrance","Hot water","Body soap","Bed linens",Microwave,"Coffee maker",Refrigerator,"Dishes and silverware","Cooking basics",Stove,"Host greets you"}                                                                                                                                                                                                                                :   30  
##  {"Family/kid friendly"}                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      :   29  
##  {TV,Wifi,"Air conditioning",Pool,Kitchen,Gym,Elevator,"Hot tub",Heating,Washer,Dryer,"Smoke detector","Carbon monoxide detector","Fire extinguisher",Essentials,Shampoo,Hangers,"Hair dryer",Iron,"Laptop friendly workspace","Self check-in",Lockbox,"Hot water","Bed linens","Ethernet connection",Microwave,"Coffee maker",Refrigerator,Dishwasher,"Dishes and silverware","Cooking basics",Oven,Stove,"BBQ grill","Patio or balcony","Long term stays allowed","Paid parking on premises"}                               :   23  
##  {TV,"Cable TV",Wifi,Pool,Kitchen,"Free parking on premises",Gym,"Pets live on this property",Elevator,"Indoor fireplace",Heating,Washer,Dryer,"Smoke detector","Carbon monoxide detector","Fire extinguisher",Essentials,Hangers,"Hair dryer",Iron,"Laptop friendly workspace","Self check-in",Lockbox,"Private entrance",Bathtub,"Hot water","Bed linens","Ethernet connection",Microwave,"Coffee maker",Refrigerator,Dishwasher,"Dishes and silverware",Oven,Stove,"Patio or balcony","Long term stays allowed",Beachfront}:   21  
##  (Other)                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      :42770  
##   square_feet         price        weekly_price     monthly_price  
##  Min.   :   0.0   $100.00: 1381          :37410            :37901  
##  1st Qu.: 400.0   $150.00: 1216   $500.00:  203   $3,000.00:  161  
##  Median : 800.0   $75.00 : 1194   $600.00:  192   $2,500.00:  147  
##  Mean   : 991.7   $50.00 : 1055   $800.00:  172   $1,500.00:  141  
##  3rd Qu.:1200.0   $99.00 : 1015   $700.00:  170   $1,800.00:  128  
##  Max.   :7000.0   $125.00:  909   $650.00:  160   $2,000.00:  123  
##  NA's   :42700    (Other):36277   (Other): 4740   (Other)  : 4446  
##  security_deposit  cleaning_fee   guests_included   extra_people  
##         :11450           : 6336   Min.   : 1.000   $0.00  :20313  
##  $0.00  : 8578    $50.00 : 2796   1st Qu.: 1.000   $10.00 : 4148  
##  $100.00: 4492    $100.00: 2609   Median : 1.000   $25.00 : 3724  
##  $500.00: 3607    $25.00 : 1775   Mean   : 1.909   $20.00 : 3509  
##  $200.00: 2773    $0.00  : 1756   3rd Qu.: 2.000   $15.00 : 2820  
##  $300.00: 2085    $150.00: 1751   Max.   :16.000   $50.00 : 1949  
##  (Other):10062    (Other):26024                    (Other): 6584  
##  minimum_nights     maximum_nights      has_availability availability_30
##  Min.   :   1.000   Min.   :      1.0   t:43047          Min.   : 0.00  
##  1st Qu.:   1.000   1st Qu.:     30.0                    1st Qu.: 1.00  
##  Median :   2.000   Median :   1125.0                    Median :13.00  
##  Mean   :   5.104   Mean   :    666.9                    Mean   :13.66  
##  3rd Qu.:   3.000   3rd Qu.:   1125.0                    3rd Qu.:25.00  
##  Max.   :3000.000   Max.   :1000000.0                    Max.   :30.00  
##                                                                         
##  availability_60 availability_90 availability_365 number_of_reviews
##  Min.   : 0.0    Min.   : 0.00   Min.   :  0      Min.   :  0.0    
##  1st Qu.: 8.0    1st Qu.:18.00   1st Qu.: 51      1st Qu.:  1.0    
##  Median :36.0    Median :64.00   Median :160      Median :  7.0    
##  Mean   :32.2    Mean   :52.24   Mean   :177      Mean   : 28.5    
##  3rd Qu.:54.0    3rd Qu.:83.00   3rd Qu.:335      3rd Qu.: 32.0    
##  Max.   :60.0    Max.   :90.00   Max.   :365      Max.   :739.0    
##                                                                    
##      first_review       last_review    review_scores_rating
##            : 8720             : 8720   Min.   : 20.00      
##  2018-11-12:  119   2018-12-02: 1196   1st Qu.: 93.00      
##  2018-10-28:  116   2018-11-12: 1131   Median : 97.00      
##  2018-11-11:  111   2018-11-25: 1113   Mean   : 94.48      
##  2018-07-08:  106   2018-11-18:  921   3rd Qu.:100.00      
##  2018-08-12:  105   2018-11-24:  893   Max.   :100.00      
##  (Other)   :33770   (Other)   :29073   NA's   :9276        
##  review_scores_accuracy review_scores_cleanliness review_scores_checkin
##  Min.   : 2.000         Min.   : 2.000            Min.   : 2.000       
##  1st Qu.:10.000         1st Qu.: 9.000            1st Qu.:10.000       
##  Median :10.000         Median :10.000            Median :10.000       
##  Mean   : 9.645         Mean   : 9.438            Mean   : 9.777       
##  3rd Qu.:10.000         3rd Qu.:10.000            3rd Qu.:10.000       
##  Max.   :10.000         Max.   :10.000            Max.   :10.000       
##  NA's   :9299           NA's   :9297              NA's   :9335         
##  review_scores_communication review_scores_location review_scores_value
##  Min.   : 2.000              Min.   : 2.000         Min.   : 2.000     
##  1st Qu.:10.000              1st Qu.: 9.000         1st Qu.: 9.000     
##  Median :10.000              Median :10.000         Median :10.000     
##  Mean   : 9.765              Mean   : 9.655         Mean   : 9.483     
##  3rd Qu.:10.000              3rd Qu.:10.000         3rd Qu.:10.000     
##  Max.   :10.000              Max.   :10.000         Max.   :10.000     
##  NA's   :9304                NA's   :9340           NA's   :9348       
##  requires_license instant_bookable is_business_travel_ready
##  f:41439          f:23614          f:43047                 
##  t: 1608          t:19433                                  
##                                                            
##                                                            
##                                                            
##                                                            
##                                                            
##                   cancellation_policy require_guest_profile_picture
##  flexible                   :12890    f:42255                      
##  moderate                   :11888    t:  792                      
##  strict                     :   63                                 
##  strict_14_with_grace_period:18008                                 
##  super_strict_30            :    9                                 
##  super_strict_60            :  189                                 
##                                                                    
##  require_guest_phone_verification calculated_host_listings_count
##  f:41950                          Min.   :  1.000               
##  t: 1097                          1st Qu.:  1.000               
##                                   Median :  2.000               
##                                   Mean   :  5.834               
##                                   3rd Qu.:  5.000               
##                                   Max.   :152.000               
##                                                                 
##  reviews_per_month neighbourhood_group_cleansed
##  Min.   : 0.010    Mode:logical                
##  1st Qu.: 0.390    NA's:43047                  
##  Median : 1.180                                
##  Mean   : 1.898                                
##  3rd Qu.: 2.860                                
##  Max.   :17.840                                
##  NA's   :8720
# Remove the original dataset to free up memory
rm(Los_Angeles_Listings)


# The summary output shows that three features carry no information:
# "neighbourhood_group_cleansed" is NA for every record,
# "host_acceptance_rate" is "N/A" or blank for every record, and
# "experiences_offered" is "none" for every record. We remove them.
Los_Angeles_Listings_subset$neighbourhood_group_cleansed <- NULL
Los_Angeles_Listings_subset$host_acceptance_rate <- NULL
Los_Angeles_Listings_subset$experiences_offered <- NULL
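
Reading `summary()` output by eye works, but the share of missing values per column can also be computed directly. The sketch below is one possible way to do this, assuming that blank strings and the literal text "N/A" should count as missing alongside `NA`. The `toy` data frame and the `missing_share` helper are hypothetical and are not part of the original analysis.

```r
# Toy stand-in for the listings data (hypothetical values)
toy <- data.frame(
  host_acceptance_rate         = c("N/A", "N/A", "", "N/A"),
  neighbourhood_group_cleansed = c(NA, NA, NA, NA),
  bedrooms                     = c(2, 3, NA, 1),
  stringsAsFactors             = FALSE
)

# Share of values that are NA or equal to a textual missing marker
# (the marker list is an assumption based on the summary output)
missing_share <- function(x, markers = c("", "N/A")) {
  mean(is.na(x) | as.character(x) %in% markers)
}

shares <- sapply(toy, missing_share)

# Columns that are missing in more than 90% of rows (threshold is arbitrary)
mostly_missing <- names(shares)[shares > 0.9]
mostly_missing
```

A column such as "experiences_offered", where every record holds the same non-missing value, would not be caught by this check and needs a separate test for constant columns.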

# The summary output also shows that "country_code", "country",
# "has_availability" and "is_business_travel_ready" each hold a single value,
# and "state" is "CA" (or a spelling variant of it) for all but a handful of
# records. They are not useful for modeling or prediction, so we remove them.
Los_Angeles_Listings_subset$country_code <- NULL
Los_Angeles_Listings_subset$country <- NULL
Los_Angeles_Listings_subset$state <- NULL
Los_Angeles_Listings_subset$has_availability <- NULL
Los_Angeles_Listings_subset$is_business_travel_ready <- NULL
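
Columns that hold a single value can also be found programmatically by counting distinct non-missing values. This is a minimal sketch on a hypothetical `toy` data frame, not the procedure used in this report.

```r
# Toy stand-in for the listings data (hypothetical values)
toy <- data.frame(
  country_code     = c("US", "US", "US"),
  has_availability = c("t", "t", "t"),
  room_type        = c("Entire home/apt", "Private room", "Private room"),
  stringsAsFactors = FALSE
)

# Number of distinct non-missing values in each column
n_distinct_values <- sapply(toy, function(x) length(unique(x[!is.na(x)])))

# A column with at most one distinct value cannot help a model
constant_cols <- names(n_distinct_values)[n_distinct_values <= 1]
constant_cols
```

A nearly constant column such as "state", which has several spellings of the same value, would not be flagged by this check. For such columns, the share of the most frequent value, for example from `table()`, would be a more suitable criterion.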

# host_total_listings_count and host_listings_count have identical summary
# statistics and appear to hold the same information, so we keep only
# host_listings_count
Los_Angeles_Listings_subset$host_total_listings_count <- NULL
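
Matching summary statistics suggest that two columns are duplicates, but they do not prove it. An element-wise comparison would confirm it before one of the columns is dropped. The sketch below shows such a check on a hypothetical `toy` data frame.

```r
# Toy stand-in for the listings data (hypothetical values)
toy <- data.frame(
  host_listings_count            = c(1, 1, 2, NA),
  host_total_listings_count      = c(1, 1, 2, NA),
  calculated_host_listings_count = c(1, 1, 2, 3)
)

# identical() compares the vectors element by element; NA matches NA
same_as_total <- identical(toy$host_listings_count,
                           toy$host_total_listings_count)
same_as_calculated <- identical(toy$host_listings_count,
                                toy$calculated_host_listings_count)

same_as_total
same_as_calculated
```

On the real data, this check would have to run before "host_total_listings_count" is removed.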

# square_feet has 42,700 NA values, approximately 99% of all 43,047 records,
# so we remove this feature as well
summary(Los_Angeles_Listings_subset$square_feet)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.    NA's 
##     0.0   400.0   800.0   991.7  1200.0  7000.0   42700
Los_Angeles_Listings_subset$square_feet <- NULL
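
The "approximately 99%" figure follows from the NA count in the summary and the number of records in the dataset. As a quick worked check using only those two numbers:

```r
# Counts taken from the summary output and the dataset description
n_missing <- 42700   # NA values in square_feet
n_total   <- 43047   # records in the Los Angeles listings data

# Percentage of records with no square_feet value, to one decimal place
pct_missing <- round(100 * n_missing / n_total, 1)
pct_missing
```

This leaves only 347 listings with a recorded area, too few to support the feature in a model.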

# We determined that the following remaining features are not needed for this analysis:
Los_Angeles_Listings_subset$weekly_price<- NULL
Los_Angeles_Listings_subset$monthly_price<- NULL
Los_Angeles_Listings_subset$first_review<- NULL
Los_Angeles_Listings_subset$last_review<- NULL
Los_Angeles_Listings_subset$market<- NULL
Los_Angeles_Listings_subset$zipcode<- NULL
Los_Angeles_Listings_subset$city<- NULL
Los_Angeles_Listings_subset$neighbourhood_cleansed<- NULL
Los_Angeles_Listings_subset$id<- NULL
Los_Angeles_Listings_subset$host_id<- NULL
Los_Angeles_Listings_subset$host_since<- NULL
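
A list of unneeded features like the one above can also be dropped in a single statement, which keeps the list easy to review. A minimal sketch on a toy data frame; `drop_cols` is an illustrative name and the values are made up.

```r
# Toy data frame (hypothetical values)
toy <- data.frame(
  id      = 1:3,
  host_id = 4:6,
  zipcode = c("90001", "90002", "90003"),
  price   = c(100, 150, 75)
)

# Features judged unnecessary (a subset of the list above, for illustration)
drop_cols <- c("id", "host_id", "zipcode")

# setdiff() keeps every column whose name is not in drop_cols
toy <- toy[, setdiff(names(toy), drop_cols), drop = FALSE]
names(toy)
```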

# Checking the summary statistics of the updated dataset
summary(Los_Angeles_Listings_subset)
##           host_response_time host_response_rate host_is_superhost
##                    :    3    100%   :23975       :    3          
##  a few days or more:  700    N/A    :12011      f:31649          
##  N/A               :12011    90%    :  870      t:11395          
##  within a day      : 2626    99%    :  619                       
##  within a few hours: 4663    98%    :  584                       
##  within an hour    :23044    0%     :  476                       
##                              (Other): 4512                       
##  host_listings_count host_has_profile_pic host_identity_verified
##  Min.   :  0.000      :    3               :    3               
##  1st Qu.:  1.000     f:   49              f:22651               
##  Median :  2.000     t:42995              t:20393               
##  Mean   :  9.598                                                
##  3rd Qu.:  5.000                                                
##  Max.   :803.000                                                
##  NA's   :3                                                      
##  is_location_exact     property_type             room_type      accommodates   
##  f: 9296           Apartment  :16187   Entire home/apt:26835   Min.   : 1.000  
##  t:33751           House      :14510   Private room   :14261   1st Qu.: 2.000  
##                    Condominium: 2406   Shared room    : 1951   Median : 3.000  
##                    Guesthouse : 2202                           Mean   : 3.678  
##                    Townhouse  : 1368                           3rd Qu.: 5.000  
##                    Guest suite: 1324                           Max.   :40.000  
##                    (Other)    : 5050                                           
##    bathrooms         bedrooms           beds                 bed_type    
##  Min.   : 0.000   Min.   : 0.000   Min.   : 0.000   Airbed       :  131  
##  1st Qu.: 1.000   1st Qu.: 1.000   1st Qu.: 1.000   Couch        :   82  
##  Median : 1.000   Median : 1.000   Median : 1.000   Futon        :  232  
##  Mean   : 1.451   Mean   : 1.415   Mean   : 1.981   Pull-out Sofa:  157  
##  3rd Qu.: 2.000   3rd Qu.: 2.000   3rd Qu.: 2.000   Real Bed     :42445  
##  Max.   :22.000   Max.   :50.000   Max.   :50.000                        
##  NA's   :28       NA's   :18       NA's   :34                            
##                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          amenities    
##  {}                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                           :  140  
##  {TV,Wifi,"Air conditioning",Pool,Kitchen,Heating,Essentials,Shampoo,Hangers}                                                                                                                                                                                                                                                                                                                                                                                                                                                 :   34  
##  {TV,Wifi,Kitchen,"Smoke detector","Carbon monoxide detector","Fire extinguisher",Essentials,Shampoo,Hangers,"Hair dryer",Iron,"Private entrance","Hot water","Body soap","Bed linens",Microwave,"Coffee maker",Refrigerator,"Dishes and silverware","Cooking basics",Stove,"Host greets you"}                                                                                                                                                                                                                                :   30  
##  {"Family/kid friendly"}                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      :   29  
##  {TV,Wifi,"Air conditioning",Pool,Kitchen,Gym,Elevator,"Hot tub",Heating,Washer,Dryer,"Smoke detector","Carbon monoxide detector","Fire extinguisher",Essentials,Shampoo,Hangers,"Hair dryer",Iron,"Laptop friendly workspace","Self check-in",Lockbox,"Hot water","Bed linens","Ethernet connection",Microwave,"Coffee maker",Refrigerator,Dishwasher,"Dishes and silverware","Cooking basics",Oven,Stove,"BBQ grill","Patio or balcony","Long term stays allowed","Paid parking on premises"}                               :   23  
##  {TV,"Cable TV",Wifi,Pool,Kitchen,"Free parking on premises",Gym,"Pets live on this property",Elevator,"Indoor fireplace",Heating,Washer,Dryer,"Smoke detector","Carbon monoxide detector","Fire extinguisher",Essentials,Hangers,"Hair dryer",Iron,"Laptop friendly workspace","Self check-in",Lockbox,"Private entrance",Bathtub,"Hot water","Bed linens","Ethernet connection",Microwave,"Coffee maker",Refrigerator,Dishwasher,"Dishes and silverware",Oven,Stove,"Patio or balcony","Long term stays allowed",Beachfront}:   21  
##  (Other)                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      :42770  
##      price       security_deposit  cleaning_fee   guests_included 
##  $100.00: 1381          :11450           : 6336   Min.   : 1.000  
##  $150.00: 1216   $0.00  : 8578    $50.00 : 2796   1st Qu.: 1.000  
##  $75.00 : 1194   $100.00: 4492    $100.00: 2609   Median : 1.000  
##  $50.00 : 1055   $500.00: 3607    $25.00 : 1775   Mean   : 1.909  
##  $99.00 : 1015   $200.00: 2773    $0.00  : 1756   3rd Qu.: 2.000  
##  $125.00:  909   $300.00: 2085    $150.00: 1751   Max.   :16.000  
##  (Other):36277   (Other):10062    (Other):26024                   
##   extra_people   minimum_nights     maximum_nights      availability_30
##  $0.00  :20313   Min.   :   1.000   Min.   :      1.0   Min.   : 0.00  
##  $10.00 : 4148   1st Qu.:   1.000   1st Qu.:     30.0   1st Qu.: 1.00  
##  $25.00 : 3724   Median :   2.000   Median :   1125.0   Median :13.00  
##  $20.00 : 3509   Mean   :   5.104   Mean   :    666.9   Mean   :13.66  
##  $15.00 : 2820   3rd Qu.:   3.000   3rd Qu.:   1125.0   3rd Qu.:25.00  
##  $50.00 : 1949   Max.   :3000.000   Max.   :1000000.0   Max.   :30.00  
##  (Other): 6584                                                         
##  availability_60 availability_90 availability_365 number_of_reviews
##  Min.   : 0.0    Min.   : 0.00   Min.   :  0      Min.   :  0.0    
##  1st Qu.: 8.0    1st Qu.:18.00   1st Qu.: 51      1st Qu.:  1.0    
##  Median :36.0    Median :64.00   Median :160      Median :  7.0    
##  Mean   :32.2    Mean   :52.24   Mean   :177      Mean   : 28.5    
##  3rd Qu.:54.0    3rd Qu.:83.00   3rd Qu.:335      3rd Qu.: 32.0    
##  Max.   :60.0    Max.   :90.00   Max.   :365      Max.   :739.0    
##                                                                    
##  review_scores_rating review_scores_accuracy review_scores_cleanliness
##  Min.   : 20.00       Min.   : 2.000         Min.   : 2.000           
##  1st Qu.: 93.00       1st Qu.:10.000         1st Qu.: 9.000           
##  Median : 97.00       Median :10.000         Median :10.000           
##  Mean   : 94.48       Mean   : 9.645         Mean   : 9.438           
##  3rd Qu.:100.00       3rd Qu.:10.000         3rd Qu.:10.000           
##  Max.   :100.00       Max.   :10.000         Max.   :10.000           
##  NA's   :9276         NA's   :9299           NA's   :9297             
##  review_scores_checkin review_scores_communication review_scores_location
##  Min.   : 2.000        Min.   : 2.000              Min.   : 2.000        
##  1st Qu.:10.000        1st Qu.:10.000              1st Qu.: 9.000        
##  Median :10.000        Median :10.000              Median :10.000        
##  Mean   : 9.777        Mean   : 9.765              Mean   : 9.655        
##  3rd Qu.:10.000        3rd Qu.:10.000              3rd Qu.:10.000        
##  Max.   :10.000        Max.   :10.000              Max.   :10.000        
##  NA's   :9335          NA's   :9304                NA's   :9340          
##  review_scores_value requires_license instant_bookable
##  Min.   : 2.000      f:41439          f:23614         
##  1st Qu.: 9.000      t: 1608          t:19433         
##  Median :10.000                                       
##  Mean   : 9.483                                       
##  3rd Qu.:10.000                                       
##  Max.   :10.000                                       
##  NA's   :9348                                         
##                   cancellation_policy require_guest_profile_picture
##  flexible                   :12890    f:42255                      
##  moderate                   :11888    t:  792                      
##  strict                     :   63                                 
##  strict_14_with_grace_period:18008                                 
##  super_strict_30            :    9                                 
##  super_strict_60            :  189                                 
##                                                                    
##  require_guest_phone_verification calculated_host_listings_count
##  f:41950                          Min.   :  1.000               
##  t: 1097                          1st Qu.:  1.000               
##                                   Median :  2.000               
##                                   Mean   :  5.834               
##                                   3rd Qu.:  5.000               
##                                   Max.   :152.000               
##                                                                 
##  reviews_per_month
##  Min.   : 0.010   
##  1st Qu.: 0.390   
##  Median : 1.180   
##  Mean   : 1.898   
##  3rd Qu.: 2.860   
##  Max.   :17.840   
##  NA's   :8720
str(Los_Angeles_Listings_subset)
## 'data.frame':    43047 obs. of  41 variables:
##  $ host_response_time              : Factor w/ 6 levels "","a few days or more",..: 3 4 6 5 3 3 6 6 5 6 ...
##  $ host_response_rate              : Factor w/ 66 levels "","0%","10%",..: 66 4 4 4 66 66 4 4 4 4 ...
##  $ host_is_superhost               : Factor w/ 3 levels "","f","t": 2 2 3 2 2 2 2 3 3 2 ...
##  $ host_listings_count             : int  1 1 2 2 1 3 13 2 1 3 ...
##  $ host_has_profile_pic            : Factor w/ 3 levels "","f","t": 3 3 3 3 3 3 3 3 3 3 ...
##  $ host_identity_verified          : Factor w/ 3 levels "","f","t": 3 3 3 2 3 3 3 3 3 2 ...
##  $ is_location_exact               : Factor w/ 2 levels "f","t": 2 2 2 2 2 2 2 2 2 1 ...
##  $ property_type                   : Factor w/ 44 levels "Aparthotel","Apartment",..: 16 26 2 2 2 22 43 2 2 26 ...
##  $ room_type                       : Factor w/ 3 levels "Entire home/apt",..: 1 1 2 2 1 1 2 2 1 2 ...
##  $ accommodates                    : int  6 6 1 1 2 2 2 1 2 2 ...
##  $ bathrooms                       : num  2 1 1.5 1 1 1 1 1.5 1 1 ...
##  $ bedrooms                        : int  2 3 1 1 1 1 1 1 1 1 ...
##  $ beds                            : int  3 3 1 1 1 2 1 1 1 1 ...
##  $ bed_type                        : Factor w/ 5 levels "Airbed","Couch",..: 5 5 5 4 5 5 5 5 5 5 ...
##  $ amenities                       : Factor w/ 40292 levels "{\"Air conditioning\",\"Fire extinguisher\",Essentials,Shampoo,Hangers}",..: 3226 7439 738 2180 3891 10322 3126 740 2133 34974 ...
##  $ price                           : Factor w/ 867 levels "$0.00","$1,000.00",..: 135 187 785 155 796 801 811 862 796 562 ...
##  $ security_deposit                : Factor w/ 214 levels "","$0.00","$1,000.00",..: 166 2 114 1 25 102 52 130 1 1 ...
##  $ cleaning_fee                    : Factor w/ 294 levels "","$0.00","$1,000.00",..: 103 8 277 8 262 236 109 282 109 137 ...
##  $ guests_included                 : int  3 6 1 1 1 1 2 1 2 1 ...
##  $ extra_people                    : Factor w/ 99 levels "$0.00","$10.00",..: 36 1 1 1 36 93 1 1 36 1 ...
##  $ minimum_nights                  : int  7 2 6 1 2 3 5 6 2 1 ...
##  $ maximum_nights                  : int  730 14 366 180 730 730 30 375 365 730 ...
##  $ availability_30                 : int  0 0 0 25 0 0 15 5 19 30 ...
##  $ availability_60                 : int  0 0 0 55 0 0 15 14 49 60 ...
##  $ availability_90                 : int  0 0 6 85 0 7 15 44 73 90 ...
##  $ availability_365                : int  236 135 260 360 0 282 15 319 313 179 ...
##  $ number_of_reviews               : int  2 4 13 18 0 23 22 12 184 0 ...
##  $ review_scores_rating            : int  80 93 97 96 NA 81 89 96 94 NA ...
##  $ review_scores_accuracy          : int  10 10 10 9 NA 8 8 10 10 NA ...
##  $ review_scores_cleanliness       : int  10 10 10 9 NA 8 8 9 9 NA ...
##  $ review_scores_checkin           : int  6 10 10 10 NA 8 9 10 10 NA ...
##  $ review_scores_communication     : int  8 10 10 10 NA 9 9 10 10 NA ...
##  $ review_scores_location          : int  10 10 10 10 NA 9 9 9 9 NA ...
##  $ review_scores_value             : int  8 9 10 9 NA 8 8 9 9 NA ...
##  $ requires_license                : Factor w/ 2 levels "f","t": 1 1 1 2 1 1 1 1 1 1 ...
##  $ instant_bookable                : Factor w/ 2 levels "f","t": 1 2 2 1 1 1 1 2 1 1 ...
##  $ cancellation_policy             : Factor w/ 6 levels "flexible","moderate",..: 4 1 4 4 4 4 4 4 4 1 ...
##  $ require_guest_profile_picture   : Factor w/ 2 levels "f","t": 2 1 1 1 1 1 1 1 1 1 ...
##  $ require_guest_phone_verification: Factor w/ 2 levels "f","t": 1 1 1 1 1 1 1 1 1 1 ...
##  $ calculated_host_listings_count  : int  1 1 2 2 1 3 5 2 1 2 ...
##  $ reviews_per_month               : num  0.02 0.13 0.24 0.2 NA 0.36 0.19 0.1 1.59 NA ...
# Now we will copy the subset into a new object with a clearer name.

LA_Listings_Cleaned <- Los_Angeles_Listings_subset

Converting Factor to Numeric

# Converting host_response_rate to a numeric column
# ("N/A" and blank entries become NA during the conversion)
LA_Listings_Cleaned$host_response_rate <- as.numeric(
  gsub( "%", "", as.character(LA_Listings_Cleaned$host_response_rate)))

# Converting price to numeric column
LA_Listings_Cleaned$price <- as.numeric(
  gsub( "[\\$,]", "", as.character(LA_Listings_Cleaned$price)))


# Converting security_deposit to numeric column
LA_Listings_Cleaned$security_deposit <- as.numeric(
  gsub( "[\\$,]", "", as.character(LA_Listings_Cleaned$security_deposit)))

# Converting cleaning_fee to numeric column
LA_Listings_Cleaned$cleaning_fee <- as.numeric(
  gsub( "[\\$,]", "", as.character(LA_Listings_Cleaned$cleaning_fee)))

# Converting extra_people to numeric column
LA_Listings_Cleaned$extra_people <- as.numeric(
  gsub( "[\\$,]", "", as.character(LA_Listings_Cleaned$extra_people)))
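
The five conversions above follow one pattern: convert the factor to character, strip the formatting symbols, then convert to numeric. The `as.character()` step matters because calling `as.numeric()` directly on a factor returns the internal level codes rather than the displayed values. A minimal sketch of that pattern as a reusable function; `to_numeric` is a hypothetical helper, not part of the original code.

```r
# Hypothetical helper: strip "$", "," and "%" and convert to numeric.
# Blank strings and "N/A" are mapped to NA explicitly, which avoids the
# "NAs introduced by coercion" warning from as.numeric().
to_numeric <- function(x) {
  x <- as.character(x)
  x[x %in% c("", "N/A")] <- NA
  as.numeric(gsub("[$,%]", "", x))
}

# Toy factor mixing the formats seen in the price and rate columns
converted <- to_numeric(factor(c("$1,000.00", "$75.00", "", "95%", "N/A")))
converted
```

With such a helper, each column conversion would reduce to a single call, for example `to_numeric(LA_Listings_Cleaned$price)`.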

Handling Missing Values

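# Looking at the summary of the whole dataset with column 22 removed.
# Note: column 22 is "maximum_nights", not "amenities" (which is column 15),
# so the output below omits maximum_nights and still includes amenities.
# To exclude amenities instead, select by name:
# summary(LA_Listings_Cleaned[, names(LA_Listings_Cleaned) != "amenities"])
summary(LA_Listings_Cleaned[,-22])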
##           host_response_time host_response_rate host_is_superhost
##                    :    3    Min.   :  0.00      :    3          
##  a few days or more:  700    1st Qu.:100.00     f:31649          
##  N/A               :12011    Median :100.00     t:11395          
##  within a day      : 2626    Mean   : 95.32                      
##  within a few hours: 4663    3rd Qu.:100.00                      
##  within an hour    :23044    Max.   :100.00                      
##                              NA's   :12014                       
##  host_listings_count host_has_profile_pic host_identity_verified
##  Min.   :  0.000      :    3               :    3               
##  1st Qu.:  1.000     f:   49              f:22651               
##  Median :  2.000     t:42995              t:20393               
##  Mean   :  9.598                                                
##  3rd Qu.:  5.000                                                
##  Max.   :803.000                                                
##  NA's   :3                                                      
##  is_location_exact     property_type             room_type      accommodates   
##  f: 9296           Apartment  :16187   Entire home/apt:26835   Min.   : 1.000  
##  t:33751           House      :14510   Private room   :14261   1st Qu.: 2.000  
##                    Condominium: 2406   Shared room    : 1951   Median : 3.000  
##                    Guesthouse : 2202                           Mean   : 3.678  
##                    Townhouse  : 1368                           3rd Qu.: 5.000  
##                    Guest suite: 1324                           Max.   :40.000  
##                    (Other)    : 5050                                           
##    bathrooms         bedrooms           beds                 bed_type    
##  Min.   : 0.000   Min.   : 0.000   Min.   : 0.000   Airbed       :  131  
##  1st Qu.: 1.000   1st Qu.: 1.000   1st Qu.: 1.000   Couch        :   82  
##  Median : 1.000   Median : 1.000   Median : 1.000   Futon        :  232  
##  Mean   : 1.451   Mean   : 1.415   Mean   : 1.981   Pull-out Sofa:  157  
##  3rd Qu.: 2.000   3rd Qu.: 2.000   3rd Qu.: 2.000   Real Bed     :42445  
##  Max.   :22.000   Max.   :50.000   Max.   :50.000                        
##  NA's   :28       NA's   :18       NA's   :34                            
##                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          amenities    
##  {}                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                           :  140  
##  {TV,Wifi,"Air conditioning",Pool,Kitchen,Heating,Essentials,Shampoo,Hangers}                                                                                                                                                                                                                                                                                                                                                                                                                                                 :   34  
##  {TV,Wifi,Kitchen,"Smoke detector","Carbon monoxide detector","Fire extinguisher",Essentials,Shampoo,Hangers,"Hair dryer",Iron,"Private entrance","Hot water","Body soap","Bed linens",Microwave,"Coffee maker",Refrigerator,"Dishes and silverware","Cooking basics",Stove,"Host greets you"}                                                                                                                                                                                                                                :   30  
##  {"Family/kid friendly"}                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      :   29  
##  {TV,Wifi,"Air conditioning",Pool,Kitchen,Gym,Elevator,"Hot tub",Heating,Washer,Dryer,"Smoke detector","Carbon monoxide detector","Fire extinguisher",Essentials,Shampoo,Hangers,"Hair dryer",Iron,"Laptop friendly workspace","Self check-in",Lockbox,"Hot water","Bed linens","Ethernet connection",Microwave,"Coffee maker",Refrigerator,Dishwasher,"Dishes and silverware","Cooking basics",Oven,Stove,"BBQ grill","Patio or balcony","Long term stays allowed","Paid parking on premises"}                               :   23  
##  {TV,"Cable TV",Wifi,Pool,Kitchen,"Free parking on premises",Gym,"Pets live on this property",Elevator,"Indoor fireplace",Heating,Washer,Dryer,"Smoke detector","Carbon monoxide detector","Fire extinguisher",Essentials,Hangers,"Hair dryer",Iron,"Laptop friendly workspace","Self check-in",Lockbox,"Private entrance",Bathtub,"Hot water","Bed linens","Ethernet connection",Microwave,"Coffee maker",Refrigerator,Dishwasher,"Dishes and silverware",Oven,Stove,"Patio or balcony","Long term stays allowed",Beachfront}:   21  
##  (Other)                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      :42770  
##      price         security_deposit  cleaning_fee    guests_included 
##  Min.   :    0.0   Min.   :   0.0   Min.   :   0.0   Min.   : 1.000  
##  1st Qu.:   69.0   1st Qu.:   0.0   1st Qu.:  30.0   1st Qu.: 1.000  
##  Median :  105.0   Median : 200.0   Median :  60.0   Median : 1.000  
##  Mean   :  198.6   Mean   : 391.5   Mean   :  83.3   Mean   : 1.909  
##  3rd Qu.:  179.0   3rd Qu.: 400.0   3rd Qu.: 100.0   3rd Qu.: 2.000  
##  Max.   :25000.0   Max.   :5100.0   Max.   :1500.0   Max.   :16.000  
##                    NA's   :11450    NA's   :6336                     
##   extra_people minimum_nights     availability_30 availability_60
##  Min.   :  0   Min.   :   1.000   Min.   : 0.00   Min.   : 0.0   
##  1st Qu.:  0   1st Qu.:   1.000   1st Qu.: 1.00   1st Qu.: 8.0   
##  Median : 10   Median :   2.000   Median :13.00   Median :36.0   
##  Mean   : 15   Mean   :   5.104   Mean   :13.66   Mean   :32.2   
##  3rd Qu.: 20   3rd Qu.:   3.000   3rd Qu.:25.00   3rd Qu.:54.0   
##  Max.   :300   Max.   :3000.000   Max.   :30.00   Max.   :60.0   
##                                                                  
##  availability_90 availability_365 number_of_reviews review_scores_rating
##  Min.   : 0.00   Min.   :  0      Min.   :  0.0     Min.   : 20.00      
##  1st Qu.:18.00   1st Qu.: 51      1st Qu.:  1.0     1st Qu.: 93.00      
##  Median :64.00   Median :160      Median :  7.0     Median : 97.00      
##  Mean   :52.24   Mean   :177      Mean   : 28.5     Mean   : 94.48      
##  3rd Qu.:83.00   3rd Qu.:335      3rd Qu.: 32.0     3rd Qu.:100.00      
##  Max.   :90.00   Max.   :365      Max.   :739.0     Max.   :100.00      
##                                                     NA's   :9276        
##  review_scores_accuracy review_scores_cleanliness review_scores_checkin
##  Min.   : 2.000         Min.   : 2.000            Min.   : 2.000       
##  1st Qu.:10.000         1st Qu.: 9.000            1st Qu.:10.000       
##  Median :10.000         Median :10.000            Median :10.000       
##  Mean   : 9.645         Mean   : 9.438            Mean   : 9.777       
##  3rd Qu.:10.000         3rd Qu.:10.000            3rd Qu.:10.000       
##  Max.   :10.000         Max.   :10.000            Max.   :10.000       
##  NA's   :9299           NA's   :9297              NA's   :9335         
##  review_scores_communication review_scores_location review_scores_value
##  Min.   : 2.000              Min.   : 2.000         Min.   : 2.000     
##  1st Qu.:10.000              1st Qu.: 9.000         1st Qu.: 9.000     
##  Median :10.000              Median :10.000         Median :10.000     
##  Mean   : 9.765              Mean   : 9.655         Mean   : 9.483     
##  3rd Qu.:10.000              3rd Qu.:10.000         3rd Qu.:10.000     
##  Max.   :10.000              Max.   :10.000         Max.   :10.000     
##  NA's   :9304                NA's   :9340           NA's   :9348       
##  requires_license instant_bookable                  cancellation_policy
##  f:41439          f:23614          flexible                   :12890   
##  t: 1608          t:19433          moderate                   :11888   
##                                    strict                     :   63   
##                                    strict_14_with_grace_period:18008   
##                                    super_strict_30            :    9   
##                                    super_strict_60            :  189   
##                                                                        
##  require_guest_profile_picture require_guest_phone_verification
##  f:42255                       f:41950                         
##  t:  792                       t: 1097                         
##                                                                
##                                                                
##                                                                
##                                                                
##                                                                
##  calculated_host_listings_count reviews_per_month
##  Min.   :  1.000                Min.   : 0.010   
##  1st Qu.:  1.000                1st Qu.: 0.390   
##  Median :  2.000                Median : 1.180   
##  Mean   :  5.834                Mean   : 1.898   
##  3rd Qu.:  5.000                3rd Qu.: 2.860   
##  Max.   :152.000                Max.   :17.840   
##                                 NA's   :8720
# For categorical variables we will replace NAs and blanks with the mode.
# For numerical variables we will replace NAs and blanks with the median.
# For each variable in which we impute values, we will also create a
# corresponding flag variable that records whether each value is imputed (1)
# or original (0).
# Knowing whether a value was imputed can sometimes improve the
# predictive ability of a model

LA_Listings_Cleaned$flag_host_response_rate <- 
  ifelse(is.na(LA_Listings_Cleaned$host_response_rate) |
           LA_Listings_Cleaned$host_response_rate=='' , 1,0)

LA_Listings_Cleaned$host_response_rate <- as.numeric(
  ifelse(is.na(LA_Listings_Cleaned$host_response_rate) |
           LA_Listings_Cleaned$host_response_rate=='' , 
         median(LA_Listings_Cleaned$host_response_rate,na.rm = TRUE),
         as.character(LA_Listings_Cleaned$host_response_rate)))
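
The flag-then-impute pattern for numeric features repeats for every column below, so it can be expressed once as a function. A minimal sketch on toy data; `impute_median` is a hypothetical helper, not part of the original code. It returns an integer flag, whereas the `ifelse()` calls in this section produce numeric 1/0 values.

```r
# Hypothetical helper: returns the imputed vector and a 0/1 flag
impute_median <- function(x) {
  missing <- is.na(x)
  # The median is computed from the observed values only
  x[missing] <- median(x, na.rm = TRUE)
  list(value = x, flag = as.integer(missing))
}

# Toy column (hypothetical values)
toy <- data.frame(bathrooms = c(1, NA, 2, 1.5, NA))
res <- impute_median(toy$bathrooms)
toy$flag_bathrooms <- res$flag
toy$bathrooms      <- res$value
toy
```

The flag is computed before the values are overwritten, which is why the original code also creates each flag column before imputing.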

LA_Listings_Cleaned$flag_host_response_time <- 
  ifelse(LA_Listings_Cleaned$host_response_time=='N/A' |
           LA_Listings_Cleaned$host_response_time=='' , 1,0)

LA_Listings_Cleaned$host_response_time <- as.factor(
  ifelse(LA_Listings_Cleaned$host_response_time=='N/A' |
           LA_Listings_Cleaned$host_response_time=='' , 'within an hour',
         as.character(LA_Listings_Cleaned$host_response_time)))
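
For categorical features, the replacement level can be computed from the data rather than hard-coded, which guards against typing the wrong level. A minimal sketch on a toy factor; `mode_level` and `impute_mode` are hypothetical helpers, and treating "" and "N/A" as placeholders is an assumption based on the values seen in the summary output.

```r
# Hypothetical helper: most frequent level, ignoring placeholder values
mode_level <- function(x, placeholders = c("", "N/A")) {
  x <- as.character(x)
  counts <- table(x[!is.na(x) & !(x %in% placeholders)])
  names(counts)[which.max(counts)]
}

# Hypothetical helper: replace placeholders with the mode and return a flag
impute_mode <- function(x, placeholders = c("", "N/A")) {
  chr <- as.character(x)
  missing <- is.na(chr) | chr %in% placeholders
  chr[missing] <- mode_level(chr, placeholders)
  list(value = factor(chr), flag = as.integer(missing))
}

# Toy factor (hypothetical values)
toy <- factor(c("within an hour", "N/A", "within a day", "within an hour", ""))
res <- impute_mode(toy)
res$value
res$flag
```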

LA_Listings_Cleaned$flag_host_is_superhost <- 
  ifelse(LA_Listings_Cleaned$host_is_superhost=='N/A' |
           LA_Listings_Cleaned$host_is_superhost=='' , 1,0)

LA_Listings_Cleaned$host_is_superhost <- as.factor(
  ifelse(LA_Listings_Cleaned$host_is_superhost=='N/A' |
           LA_Listings_Cleaned$host_is_superhost=='' , 'f',
         as.character(LA_Listings_Cleaned$host_is_superhost)))


LA_Listings_Cleaned$flag_host_listings_count <- 
  ifelse(is.na(LA_Listings_Cleaned$host_listings_count) |
           LA_Listings_Cleaned$host_listings_count=='' , 1,0)

LA_Listings_Cleaned$host_listings_count <- as.numeric(
  ifelse(is.na(LA_Listings_Cleaned$host_listings_count) |
           LA_Listings_Cleaned$host_listings_count=='' , 
         median(LA_Listings_Cleaned$host_listings_count,na.rm = TRUE),
         as.character(LA_Listings_Cleaned$host_listings_count)))


LA_Listings_Cleaned$flag_host_has_profile_pic <- 
  ifelse(LA_Listings_Cleaned$host_has_profile_pic=='N/A' |
           LA_Listings_Cleaned$host_has_profile_pic=='' , 1,0)

LA_Listings_Cleaned$host_has_profile_pic <- as.factor(
  ifelse(LA_Listings_Cleaned$host_has_profile_pic=='N/A' |
           LA_Listings_Cleaned$host_has_profile_pic=='' , 't',
         as.character(LA_Listings_Cleaned$host_has_profile_pic)))


LA_Listings_Cleaned$flag_host_identity_verified <- 
  ifelse(LA_Listings_Cleaned$host_identity_verified=='N/A' |
           LA_Listings_Cleaned$host_identity_verified=='' , 1,0)

# The mode of host_identity_verified is 'f' (22651 'f' vs 20393 't'),
# so blanks are replaced with 'f'
LA_Listings_Cleaned$host_identity_verified <- as.factor(
  ifelse(LA_Listings_Cleaned$host_identity_verified=='N/A' |
           LA_Listings_Cleaned$host_identity_verified=='' , 'f',
         as.character(LA_Listings_Cleaned$host_identity_verified)))


LA_Listings_Cleaned$flag_bathrooms <- 
  ifelse(is.na(LA_Listings_Cleaned$bathrooms) |
           LA_Listings_Cleaned$bathrooms=='' , 1,0)

LA_Listings_Cleaned$bathrooms <- as.numeric(
  ifelse(is.na(LA_Listings_Cleaned$bathrooms) |
           LA_Listings_Cleaned$bathrooms=='' , 
         median(LA_Listings_Cleaned$bathrooms,na.rm = TRUE),
         as.character(LA_Listings_Cleaned$bathrooms)))


LA_Listings_Cleaned$flag_bedrooms <- 
  ifelse(is.na(LA_Listings_Cleaned$bedrooms) |
           LA_Listings_Cleaned$bedrooms=='' , 1,0)

LA_Listings_Cleaned$bedrooms <- as.numeric(
  ifelse(is.na(LA_Listings_Cleaned$bedrooms) |
           LA_Listings_Cleaned$bedrooms=='' , 
         median(LA_Listings_Cleaned$bedrooms,na.rm = TRUE),
         as.character(LA_Listings_Cleaned$bedrooms)))


LA_Listings_Cleaned$flag_beds <- 
  ifelse(is.na(LA_Listings_Cleaned$beds) |
           LA_Listings_Cleaned$beds=='' , 1,0)

LA_Listings_Cleaned$beds <- as.numeric(
  ifelse(is.na(LA_Listings_Cleaned$beds) |
           LA_Listings_Cleaned$beds=='' , 
         median(LA_Listings_Cleaned$beds,na.rm = TRUE),
         as.character(LA_Listings_Cleaned$beds)))


LA_Listings_Cleaned$flag_security_deposit <- 
  ifelse(is.na(LA_Listings_Cleaned$security_deposit) |
           LA_Listings_Cleaned$security_deposit=='' , 1,0)

LA_Listings_Cleaned$security_deposit <- as.numeric(
  ifelse(is.na(LA_Listings_Cleaned$security_deposit) |
           LA_Listings_Cleaned$security_deposit=='' , 
         median(LA_Listings_Cleaned$security_deposit,na.rm = TRUE),
         as.character(LA_Listings_Cleaned$security_deposit)))


LA_Listings_Cleaned$flag_cleaning_fee <- 
  ifelse(is.na(LA_Listings_Cleaned$cleaning_fee) |
           LA_Listings_Cleaned$cleaning_fee=='' , 1,0)

LA_Listings_Cleaned$cleaning_fee <- as.numeric(
  ifelse(is.na(LA_Listings_Cleaned$cleaning_fee) |
           LA_Listings_Cleaned$cleaning_fee=='' , 
         median(LA_Listings_Cleaned$cleaning_fee,na.rm = TRUE),
         as.character(LA_Listings_Cleaned$cleaning_fee)))


LA_Listings_Cleaned$flag_review_scores_rating <- 
  ifelse(is.na(LA_Listings_Cleaned$review_scores_rating) |
           LA_Listings_Cleaned$review_scores_rating=='' , 1,0)

LA_Listings_Cleaned$review_scores_rating <- as.numeric(
  ifelse(is.na(LA_Listings_Cleaned$review_scores_rating) |
           LA_Listings_Cleaned$review_scores_rating=='' , 
         median(LA_Listings_Cleaned$review_scores_rating,na.rm = TRUE),
         as.character(LA_Listings_Cleaned$review_scores_rating)))


LA_Listings_Cleaned$flag_review_scores_accuracy <- 
  ifelse(is.na(LA_Listings_Cleaned$review_scores_accuracy) |
           LA_Listings_Cleaned$review_scores_accuracy=='' , 1,0)

LA_Listings_Cleaned$review_scores_accuracy <- as.numeric(
  ifelse(is.na(LA_Listings_Cleaned$review_scores_accuracy) |
           LA_Listings_Cleaned$review_scores_accuracy=='' , 
         median(LA_Listings_Cleaned$review_scores_accuracy,na.rm = TRUE),
         as.character(LA_Listings_Cleaned$review_scores_accuracy)))


LA_Listings_Cleaned$flag_review_scores_cleanliness <- 
  ifelse(is.na(LA_Listings_Cleaned$review_scores_cleanliness) |
           LA_Listings_Cleaned$review_scores_cleanliness=='' , 1,0)

LA_Listings_Cleaned$review_scores_cleanliness <- as.numeric(
  ifelse(is.na(LA_Listings_Cleaned$review_scores_cleanliness) |
           LA_Listings_Cleaned$review_scores_cleanliness=='' , 
         median(LA_Listings_Cleaned$review_scores_cleanliness,na.rm = TRUE),
         as.character(LA_Listings_Cleaned$review_scores_cleanliness)))


LA_Listings_Cleaned$flag_review_scores_checkin <- 
  ifelse(is.na(LA_Listings_Cleaned$review_scores_checkin) |
           LA_Listings_Cleaned$review_scores_checkin=='' , 1,0)

LA_Listings_Cleaned$review_scores_checkin <- as.numeric(
  ifelse(is.na(LA_Listings_Cleaned$review_scores_checkin) |
           LA_Listings_Cleaned$review_scores_checkin=='' , 
         median(LA_Listings_Cleaned$review_scores_checkin,na.rm = TRUE),
         as.character(LA_Listings_Cleaned$review_scores_checkin)))


LA_Listings_Cleaned$flag_review_scores_communication <- 
  ifelse(is.na(LA_Listings_Cleaned$review_scores_communication) |
           LA_Listings_Cleaned$review_scores_communication=='' , 1,0)


LA_Listings_Cleaned$review_scores_communication <- as.numeric(
  ifelse(is.na(LA_Listings_Cleaned$review_scores_communication) |
           LA_Listings_Cleaned$review_scores_communication=='' , 
         median(LA_Listings_Cleaned$review_scores_communication,na.rm = TRUE),
         as.character(LA_Listings_Cleaned$review_scores_communication)))


LA_Listings_Cleaned$flag_review_scores_location <- 
  ifelse(is.na(LA_Listings_Cleaned$review_scores_location) |
           LA_Listings_Cleaned$review_scores_location=='' , 1,0)

LA_Listings_Cleaned$review_scores_location <- as.numeric(
  ifelse(is.na(LA_Listings_Cleaned$review_scores_location) |
           LA_Listings_Cleaned$review_scores_location=='' , 
         median(LA_Listings_Cleaned$review_scores_location,na.rm = TRUE),
         as.character(LA_Listings_Cleaned$review_scores_location)))


LA_Listings_Cleaned$flag_review_scores_value <- 
  ifelse(is.na(LA_Listings_Cleaned$review_scores_value) |
           LA_Listings_Cleaned$review_scores_value=='' , 1,0)


LA_Listings_Cleaned$review_scores_value <- as.numeric(
  ifelse(is.na(LA_Listings_Cleaned$review_scores_value) |
           LA_Listings_Cleaned$review_scores_value=='' , 
         median(LA_Listings_Cleaned$review_scores_value,na.rm = TRUE),
         as.character(LA_Listings_Cleaned$review_scores_value)))


LA_Listings_Cleaned$flag_reviews_per_month <- 
  ifelse(is.na(LA_Listings_Cleaned$reviews_per_month) |
           LA_Listings_Cleaned$reviews_per_month=='' , 1,0)

LA_Listings_Cleaned$reviews_per_month <- as.numeric(
  ifelse(is.na(LA_Listings_Cleaned$reviews_per_month) |
           LA_Listings_Cleaned$reviews_per_month=='' , 
         median(LA_Listings_Cleaned$reviews_per_month,na.rm = TRUE),
         as.character(LA_Listings_Cleaned$reviews_per_month)))
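
The flag-then-impute pattern above is repeated for every numeric column. As a sketch only, the same idea could be wrapped in a helper. `impute_median_with_flag` is a hypothetical name that does not appear in the original analysis, the toy data frame stands in for `LA_Listings_Cleaned`, and the helper assumes the column is already numeric with missing values stored as `NA`.

```r
# Hypothetical helper (not part of the original analysis):
# adds flag_<col> (1 = value was missing) and replaces the missing
# values in <col> with the median of the observed values.
impute_median_with_flag <- function(df, col) {
  x <- df[[col]]
  missing <- is.na(x)
  df[[paste0("flag_", col)]] <- ifelse(missing, 1, 0)
  x[missing] <- median(x, na.rm = TRUE)
  df[[col]] <- as.numeric(x)
  df
}

# Toy data standing in for the listings table
toy <- data.frame(beds = c(1, NA, 3, 2, NA),
                  bathrooms = c(1, 1.5, NA, 2, 1))

for (col in c("beds", "bathrooms")) {
  toy <- impute_median_with_flag(toy, col)
}
print(toy)
```

Keeping the flag columns alongside the imputed values lets a model distinguish a genuine median value from a filled-in one.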

Feature Engineering

# Creating some meaningful features from less important features
LA_Listings_Cleaned$amenities_count <-
  str_count(LA_Listings_Cleaned$amenities, ",")+1

# Removing actual amenities variable
LA_Listings_Cleaned$amenities <- NULL
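
Counting commas and adding one treats an empty amenity list as a single amenity. The sketch below illustrates that edge case with base R only. It assumes amenities are stored as a brace-delimited, comma-separated string per listing, and both `count_commas` and the toy values are illustrative rather than taken from the data.

```r
# Base-R stand-in for str_count(x, ","): number of commas in each string
count_commas <- function(x) {
  lengths(regmatches(x, gregexpr(",", x, fixed = TRUE)))
}

# Assumed storage format: one brace-delimited list per listing
amenities <- c("{TV,Wifi,Kitchen}", "{Wifi}", "{}")

# The approach used above: commas + 1
naive_count <- count_commas(amenities) + 1

# Variant that reports 0 when the list is empty
safe_count <- ifelse(amenities %in% c("{}", ""), 0, naive_count)

print(data.frame(amenities, naive_count, safe_count))
```

Whether the distinction matters depends on how many listings actually have an empty amenity list, which this section does not report.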

Dummy Coding Categorical Variables

# host_response_time variable
LA_Listings_Cleaned$host_response_within_few_days_or_more <-
  ifelse(LA_Listings_Cleaned$host_response_time=='a few days or more',1,0)


LA_Listings_Cleaned$host_response_within_a_days <-
  ifelse(LA_Listings_Cleaned$host_response_time=='within a day',1,0)


LA_Listings_Cleaned$host_response_within_few_hours <-
  ifelse(LA_Listings_Cleaned$host_response_time=='within a few hours',1,0)


LA_Listings_Cleaned$host_response_within_an_hour <-
  ifelse(LA_Listings_Cleaned$host_response_time=='within an hour',1,0)

# Removing actual variable
LA_Listings_Cleaned$host_response_time <- NULL


# host_has_profile_pic variable
LA_Listings_Cleaned$host_has_profile_pic<-
  ifelse(LA_Listings_Cleaned$host_has_profile_pic == 't',1,0)

# host_identity_verified variable
LA_Listings_Cleaned$host_identity_verified<-
  ifelse(LA_Listings_Cleaned$host_identity_verified == 'f',0,1)


#  is_location_exact variable
LA_Listings_Cleaned$is_location_exact<-
  ifelse(LA_Listings_Cleaned$is_location_exact == 'f',0,1)

# requires_license variable
LA_Listings_Cleaned$requires_license<-
  ifelse(LA_Listings_Cleaned$requires_license == 'f',0,1)


# instant_bookable variable
LA_Listings_Cleaned$instant_bookable<-
  ifelse(LA_Listings_Cleaned$instant_bookable == 'f',0,1)

# require_guest_profile_picture variable
LA_Listings_Cleaned$require_guest_profile_picture<-
  ifelse(LA_Listings_Cleaned$require_guest_profile_picture == 'f',0,1)

# require_guest_phone_verification variable
LA_Listings_Cleaned$require_guest_phone_verification<-
  ifelse(LA_Listings_Cleaned$require_guest_phone_verification == 'f',0,1)

# host_is_superhost variable
LA_Listings_Cleaned$host_is_superhost<-
  ifelse(LA_Listings_Cleaned$host_is_superhost == 'f',0,1)
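
The indicator columns above are built one `ifelse()` at a time. As a hedged alternative, base R's `model.matrix()` can generate one 0/1 column per factor level in a single call. The toy factor below only imitates `host_response_time`.

```r
# Toy factor standing in for host_response_time
response_time <- factor(c("within an hour", "within a day",
                          "within an hour", "a few days or more"))

# "- 1" removes the intercept so every level gets its own 0/1 column
dummies <- model.matrix(~ response_time - 1)
colnames(dummies) <- levels(response_time)
print(dummies)

# Each row has exactly one 1, so the columns always sum to 1
print(rowSums(dummies))
```

Because a full set of indicators sums to 1 in every row, it is perfectly collinear with an intercept term. Tree-based learners are unaffected, but regression-style models usually drop one level as the reference category.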

Bucketing Dummified Features

# cancellation_policy variable

summary(LA_Listings_Cleaned$cancellation_policy)
##                    flexible                    moderate 
##                       12890                       11888 
##                      strict strict_14_with_grace_period 
##                          63                       18008 
##             super_strict_30             super_strict_60 
##                           9                         189
# Bucketing strict, strict_14_with_grace_period, super_strict_30 and
# super_strict_60 under a single "strict" category

LA_Listings_Cleaned$cancellation_policy_strict <-
  ifelse(LA_Listings_Cleaned$cancellation_policy == 'strict' |
           LA_Listings_Cleaned$cancellation_policy == 'strict_14_with_grace_period' |
           LA_Listings_Cleaned$cancellation_policy == 'super_strict_30' |
           LA_Listings_Cleaned$cancellation_policy == 'super_strict_60',1,0)


LA_Listings_Cleaned$cancellation_policy_flexible <-
  ifelse(LA_Listings_Cleaned$cancellation_policy == 'flexible',1,0)


LA_Listings_Cleaned$cancellation_policy_moderate <-
  ifelse(LA_Listings_Cleaned$cancellation_policy == 'moderate',1,0)

# Removing actual variable
LA_Listings_Cleaned$cancellation_policy <- NULL


# property_type variable
summary(LA_Listings_Cleaned$property_type)
##             Aparthotel              Apartment                   Barn 
##                     38                  16187                      7 
##      Bed and breakfast                   Boat         Boutique hotel 
##                    186                     34                    132 
##               Bungalow                    Bus                  Cabin 
##                   1269                      4                     89 
##              Camper/RV               Campsite Casa particular (Cuba) 
##                    182                      8                      2 
##                 Castle                   Cave                 Chalet 
##                     10                      1                     15 
##            Condominium                Cottage             Dome house 
##                   2406                    176                      4 
##                   Dorm            Earth house              Farm stay 
##                      2                     11                     30 
##            Guest suite             Guesthouse                 Hostel 
##                   1324                   2202                    341 
##                  Hotel                  House              Houseboat 
##                     40                  14510                      3 
##                    Hut                 Island             Lighthouse 
##                      5                      2                      1 
##                   Loft         Minsu (Taiwan)                  Other 
##                   1006                      2                    169 
##                  Plane                 Resort     Serviced apartment 
##                      1                      2                    356 
##                   Tent             Tiny house                   Tipi 
##                     25                     50                      6 
##              Townhouse                  Train              Treehouse 
##                   1368                      1                     12 
##                  Villa                   Yurt 
##                    814                     14
# Bucketing property type into 3 categories named House, Apartment and Other
LA_Listings_Cleaned$property_type <- as.factor(
  ifelse(LA_Listings_Cleaned$property_type == 'House','House',
         ifelse(LA_Listings_Cleaned$property_type == 'Apartment','Apartment',
                'Other'))
)

# Dummification of property type
LA_Listings_Cleaned$property_type_house <-
  ifelse(LA_Listings_Cleaned$property_type == 'House',1,0)


LA_Listings_Cleaned$property_type_apartment <-
  ifelse(LA_Listings_Cleaned$property_type == 'Apartment',1,0)


LA_Listings_Cleaned$property_type_other <-
  ifelse(LA_Listings_Cleaned$property_type == 'Other',1,0)

# Removing actual variable
LA_Listings_Cleaned$property_type <- NULL

### room type variable
summary(LA_Listings_Cleaned$room_type)
## Entire home/apt    Private room     Shared room 
##           26835           14261            1951
LA_Listings_Cleaned$room_type_private <- 
  ifelse(LA_Listings_Cleaned$room_type == 'Private room',1,0)

LA_Listings_Cleaned$room_type_shared <- 
  ifelse(LA_Listings_Cleaned$room_type == 'Shared room',1,0)

LA_Listings_Cleaned$room_type_entire_home <- 
  ifelse(LA_Listings_Cleaned$room_type == 'Entire home/apt',1,0)

# Removing actual column
LA_Listings_Cleaned$room_type <- NULL

### bed type variable
summary(LA_Listings_Cleaned$bed_type)
##        Airbed         Couch         Futon Pull-out Sofa      Real Bed 
##           131            82           232           157         42445
LA_Listings_Cleaned$bed_type <-
  ifelse(LA_Listings_Cleaned$bed_type == 'Real Bed' ,1,0)

#str(LA_Listings_Cleaned)
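
The cancellation-policy bucketing above chains four `==` comparisons with `|`. A more compact way to express the same grouping is `%in%`, sketched here on a toy vector. The level names are taken from the `summary()` output, and everything else is illustrative.

```r
# Toy vector containing each policy level once
policy <- c("flexible", "strict", "super_strict_30", "moderate",
            "strict_14_with_grace_period", "super_strict_60")

# Levels that are folded into the single "strict" bucket
strict_levels <- c("strict", "strict_14_with_grace_period",
                   "super_strict_30", "super_strict_60")

# 1 if the policy belongs to the strict bucket, 0 otherwise
cancellation_policy_strict <- as.integer(policy %in% strict_levels)
print(data.frame(policy, cancellation_policy_strict))
```

One behavioural difference is worth knowing: `%in%` returns `FALSE` for a missing value, whereas `==` returns `NA`, so the two forms only agree when the column has no `NA` entries.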

PART 3: Data Exploration and Final Dataset

##   0%  25%  50%  75% 100% 
##    0    1    2    5  803

##   0%  25%  50%  75% 100% 
##    1    2    3    5   40

##   0%  25%  50%  75% 100% 
##    0    1    1    2   22

##   0%  25%  50%  75% 100% 
##    0    1    1    2   50

##   0%  25%  50%  75% 100% 
##    0    1    1    2   50

##    0%   25%   50%   75%  100% 
##     0    69   105   179 25000

##   0%  25%  50%  75% 100% 
##    0  100  200  300 5100

##   0%  25%  50%  75% 100% 
##    0   35   60  100 1500

##   0%  25%  50%  75% 100% 
##    1    1    1    2   16

##   0%  25%  50%  75% 100% 
##    0    0   10   20  300

##   0%  25%  50%  75% 100% 
##    1    1    2    3 3000

##   0%  25%  50%  75% 100% 
##    0    1    7   32  739

##   0%  25%  50%  75% 100% 
##   20   94   97   99  100

##   0%  25%  50%  75% 100% 
##    2   10   10   10   10

##   0%  25%  50%  75% 100% 
##    2    9   10   10   10

##   0%  25%  50%  75% 100% 
##    2   10   10   10   10

##   0%  25%  50%  75% 100% 
##    2   10   10   10   10

##   0%  25%  50%  75% 100% 
##    2   10   10   10   10

##   0%  25%  50%  75% 100% 
##    2    9   10   10   10

##    0%   25%   50%   75%  100% 
##  0.01  0.56  1.18  2.31 17.84

##   0%  25%  50%  75% 100% 
##    1   18   24   33  112
##   host_is_superhost avg_price
## 1                 0       100
## 2                 1       110

##    host_listings_count avg_price
## 1                    0     100.0
## 2                    1     104.0
## 3                    2     100.0
## 4                    3     100.0
## 5                    4      99.5
## 6                    5     100.0
## 7                    6     115.0
## 8                    7     109.0
## 9                    8     105.0
## 10                   9     115.0
## 11                  10     130.0
## 12                  11     110.0
## 13                  12     110.0
## 14                  13     125.0
## 15                  14     129.0
## 16                  15     115.0
## 17                  16      90.0
## 18                  17     135.5
## 19                  18     100.0
## 20                  19     135.0
## 21                  20     118.5
## 22                  21      89.0
## 23                  22      79.0
## 24                  23      72.5
## 25                  24      63.0
## 26                  25     115.0
## 27                  26      75.0
## 28                  27      59.0
## 29                  28     159.0
## 30                  29      95.0
## 31                  30     115.0
## 32                  31     179.0
## 33                  32      25.0
## 34                  33     139.5
## 35                  34      69.0
## 36                  35     179.0
## 37                  36      52.0
## 38                  37     130.0
## 39                  38      39.0
## 40                  39     139.0
## 41                  40     299.0
## 42                  41     110.0
## 43                  43     199.0
## 44                  45      39.0
## 45                  46      25.0
## 46                  47      72.0
## 47                  48     500.0
## 48                  49     220.0
## 49                  50     179.0
## 50                  51     200.0
## 51                  53      84.0
## 52                  56     231.0
## 53                  58     175.0
## 54                  59     199.0
## 55                  60     230.5
## 56                  61     250.0
## 57                  62     214.0
## 58                  63     196.0
## 59                  66     204.5
## 60                  67     575.0
## 61                  69      80.0
## 62                  70     299.0
## 63                  76      23.5
## 64                  89     299.0
## 65                  90     109.0
## 66                  95     249.0
## 67                  98     299.0
## 68                 100      35.0
## 69                 115      90.0
## 70                 116     196.0
## 71                 117    2233.5
## 72                 148     300.0
## 73                 152    4562.5
## 74                 165     192.0
## 75                 185     499.0
## 76                 209    1100.0
## 77                 218     400.0
## 78                 223     560.0
## 79                 272     323.0
## 80                 280     454.0
## 81                 343     128.5
## 82                 388     645.0
## 83                 447     269.0
## 84                 483     259.0
## 85                 520     200.0
## 86                 571     162.0
## 87                 664     100.0
## 88                 803     249.0

##   host_has_profile_pic avg_price
## 1                    0       130
## 2                    1       105

##   host_identity_verified avg_price
## 1                      0       100
## 2                      1       109

##    accommodates avg_price
## 1             1      50.0
## 2             2      80.0
## 3             3     100.0
## 4             4     131.5
## 5             5     160.0
## 6             6     200.0
## 7             7     225.0
## 8             8     299.0
## 9             9     265.0
## 10           10     450.0
## 11           11     280.0
## 12           12     550.0
## 13           13     329.0
## 14           14     566.0
## 15           15     262.5
## 16           16     499.0
## 17           20    3995.0
## 18           40     340.0

##   cancelation_policy median_price
## 1             Strict          125
## 2           Moderate          100
## 3           Flexible           90

##   property_type median_price
## 1         House          100
## 2     Apartment          105
## 3         Other          110

##      room_type median_price
## 1      private           65
## 2       shared           30
## 3 entire house          149
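
The code behind the summaries above is not shown in this section. As a minimal sketch under that caveat, output with the same layout could be produced with `quantile()` and a grouped median. The toy values below are invented and are not the listings data.

```r
# Invented toy values, not the listings data
toy <- data.frame(
  price = c(50, 80, 120, 200, 65, 150, 95, 300),
  host_is_superhost = c(0, 0, 1, 1, 0, 1, 0, 1)
)

# Quantiles in the same 0% / 25% / 50% / 75% / 100% layout
print(quantile(toy$price))

# Median price within each group
med_by_group <- aggregate(price ~ host_is_superhost, data = toy, FUN = median)
names(med_by_group)[2] <- "median_price"
print(med_by_group)
```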

Data Transformations and Final Dataset

Many of the variables we explored were heavily skewed in one direction or the other, which made data transformations necessary. The transformed variables included in the final dataset are created below:

########## Data transformations for outlier treatment #############

LA_Listings_Cleaned$log_host_listings_count <- log(LA_Listings_Cleaned$host_listings_count+1)
LA_Listings_Cleaned$log_accommodate <- log(LA_Listings_Cleaned$accommodates+1)
LA_Listings_Cleaned$log_bathrooms <- log(LA_Listings_Cleaned$bathrooms+1)
LA_Listings_Cleaned$log_bedrooms <- log(LA_Listings_Cleaned$bedrooms+1)
LA_Listings_Cleaned$log_price <- log(LA_Listings_Cleaned$price+1)
LA_Listings_Cleaned$log_security_deposit <- log(LA_Listings_Cleaned$security_deposit+1)
LA_Listings_Cleaned$log_cleaning_fee <- log(LA_Listings_Cleaned$cleaning_fee+1)
LA_Listings_Cleaned$log_extra_people <- log(LA_Listings_Cleaned$extra_people+1)
LA_Listings_Cleaned$log_minimum_nights <- log(LA_Listings_Cleaned$minimum_nights+1)
LA_Listings_Cleaned$log_number_of_reviews <- log(LA_Listings_Cleaned$number_of_reviews+1)
LA_Listings_Cleaned$cuberoot_review_scores_rating <-LA_Listings_Cleaned$review_scores_rating^(1/3)
LA_Listings_Cleaned$log_reviews_per_month<- log(LA_Listings_Cleaned$reviews_per_month+1)


# Removing the original columns, since their transformed versions
# are used going forward
LA_Listings_Cleaned$host_listings_count<- NULL
LA_Listings_Cleaned$accommodates<- NULL
LA_Listings_Cleaned$bathrooms<- NULL
LA_Listings_Cleaned$bedrooms<- NULL
LA_Listings_Cleaned$price<- NULL
LA_Listings_Cleaned$security_deposit<- NULL
LA_Listings_Cleaned$cleaning_fee<- NULL
LA_Listings_Cleaned$extra_people<- NULL
LA_Listings_Cleaned$minimum_nights<- NULL
LA_Listings_Cleaned$number_of_reviews<- NULL
LA_Listings_Cleaned$reviews_per_month<- NULL
LA_Listings_Cleaned$review_scores_rating <- NULL
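
To illustrate why `log(x + 1)` helps with right-skewed columns, the sketch below compares a simple moment-based skewness before and after the transformation. The five values are the price quantiles reported earlier, used here purely as a toy sample, and `skewness` is a helper defined only for this example.

```r
# Toy sample: the five price quantiles reported in the exploration step
x <- c(0, 69, 105, 179, 25000)

# log1p(x) computes log(1 + x); like log(x + 1) it is defined at x = 0
print(all.equal(log(x + 1), log1p(x)))

# Simple moment-based skewness (helper defined for this example only)
skewness <- function(v) {
  d <- v - mean(v)
  mean(d^3) / mean(d^2)^1.5
}

raw_skew <- skewness(x)
log_skew <- skewness(log1p(x))
print(c(raw = raw_skew, log = log_skew))
```

The transformed values are far less dominated by the single extreme observation, which is the effect the transformations above aim for.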

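# Write the cleaned data to disk and read it back in for modelling.
# write.csv() stores the row names by default, so the re-read data frame
# gains an extra leading index column.
write.csv(LA_Listings_Cleaned, 'LA_Listings_Cleaned_FINAL.csv')
LA_Listings_Training <- read.csv('LA_Listings_Cleaned_FINAL.csv')

# Drop columns by position. Each index refers to the data frame as it
# stands after the previous line has run.
LA_Listings_Training <- LA_Listings_Training[,-26:-44]
LA_Listings_Training <- LA_Listings_Training[,-1]
LA_Listings_Training <- LA_Listings_Training[,-3]
LA_Listings_Training <- LA_Listings_Training[,-8]

# Save the reduced dataset, reload it, and drop the index column that
# write.csv() added again.
write.csv(LA_Listings_Training, 'LA_Listings_Cleaned_FINAL.csv')
LA_Listings_Training <- read.csv('LA_Listings_Cleaned_FINAL.csv')
LA_Listings_Training <- LA_Listings_Training[,-1]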
str(LA_Listings_Training)
## 'data.frame':    43047 obs. of  48 variables:
##  $ host_response_rate                   : int  100 100 100 100 100 100 100 100 100 100 ...
##  $ host_is_superhost                    : int  0 0 1 0 0 0 0 1 1 0 ...
##  $ host_identity_verified               : int  1 1 1 0 1 1 1 1 1 0 ...
##  $ is_location_exact                    : int  1 1 1 1 1 1 1 1 1 0 ...
##  $ beds                                 : int  3 3 1 1 1 2 1 1 1 1 ...
##  $ bed_type                             : int  1 1 1 0 1 1 1 1 1 1 ...
##  $ guests_included                      : int  3 6 1 1 1 1 2 1 2 1 ...
##  $ availability_30                      : int  0 0 0 25 0 0 15 5 19 30 ...
##  $ availability_60                      : int  0 0 0 55 0 0 15 14 49 60 ...
##  $ availability_90                      : int  0 0 6 85 0 7 15 44 73 90 ...
##  $ availability_365                     : int  236 135 260 360 0 282 15 319 313 179 ...
##  $ review_scores_accuracy               : int  10 10 10 9 10 8 8 10 10 10 ...
##  $ review_scores_cleanliness            : int  10 10 10 9 10 8 8 9 9 10 ...
##  $ review_scores_checkin                : int  6 10 10 10 10 8 9 10 10 10 ...
##  $ review_scores_communication          : int  8 10 10 10 10 9 9 10 10 10 ...
##  $ review_scores_location               : int  10 10 10 10 10 9 9 9 9 10 ...
##  $ review_scores_value                  : int  8 9 10 9 10 8 8 9 9 10 ...
##  $ requires_license                     : int  0 0 0 1 0 0 0 0 0 0 ...
##  $ instant_bookable                     : int  0 1 1 0 0 0 0 1 0 0 ...
##  $ require_guest_profile_picture        : int  1 0 0 0 0 0 0 0 0 0 ...
##  $ require_guest_phone_verification     : int  0 0 0 0 0 0 0 0 0 0 ...
##  $ calculated_host_listings_count       : int  1 1 2 2 1 3 5 2 1 2 ...
##  $ amenities_count                      : int  32 41 43 12 20 24 39 57 12 13 ...
##  $ host_response_within_few_days_or_more: int  0 0 0 0 0 0 0 0 0 0 ...
##  $ host_response_within_a_days          : int  0 1 0 0 0 0 0 0 0 0 ...
##  $ host_response_within_few_hours       : int  0 0 0 1 0 0 0 0 1 0 ...
##  $ host_response_within_an_hour         : int  1 0 1 0 1 1 1 1 0 1 ...
##  $ cancellation_policy_strict           : int  1 0 1 1 1 1 1 1 1 0 ...
##  $ cancellation_policy_flexible         : int  0 1 0 0 0 0 0 0 0 1 ...
##  $ cancellation_policy_moderate         : int  0 0 0 0 0 0 0 0 0 0 ...
##  $ property_type_house                  : int  0 1 0 0 0 0 0 0 0 1 ...
##  $ property_type_apartment              : int  0 0 1 1 1 0 0 1 1 0 ...
##  $ property_type_other                  : int  1 0 0 0 0 1 1 0 0 0 ...
##  $ room_type_private                    : int  0 0 1 1 0 0 1 1 0 1 ...
##  $ room_type_shared                     : int  0 0 0 0 0 0 0 0 0 0 ...
##  $ room_type_entire_home                : int  1 1 0 0 1 1 0 0 1 0 ...
##  $ log_host_listings_count              : num  0.693 0.693 1.099 1.099 0.693 ...
##  $ log_accommodate                      : num  1.946 1.946 0.693 0.693 1.099 ...
##  $ log_bathrooms                        : num  1.099 0.693 0.916 0.693 0.693 ...
##  $ log_bedrooms                         : num  1.099 1.386 0.693 0.693 0.693 ...
##  $ log_price                            : num  4.81 5.13 4.38 4.95 4.39 ...
##  $ log_security_deposit                 : num  6.22 0 5.7 5.3 4.62 ...
##  $ log_cleaning_fee                     : num  5.48 4.62 4.45 4.62 4.33 ...
##  $ log_extra_people                     : num  3.26 0 0 0 3.26 ...
##  $ log_minimum_nights                   : num  2.079 1.099 1.946 0.693 1.099 ...
##  $ log_number_of_reviews                : num  1.1 1.61 2.64 2.94 0 ...
##  $ cuberoot_review_scores_rating        : num  4.31 4.53 4.59 4.58 4.59 ...
##  $ log_reviews_per_month                : num  0.0198 0.1222 0.2151 0.1823 0.7793 ...
#summary(LA_Listings_Training)
remove('LA_Listings_Cleaned')
remove('Los_Angeles_Listings_subset')
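
Dropping columns by position, as done above, silently breaks if the column order changes. A hedged alternative is to drop by name or pattern and to write the file without row names. This section does not say which columns the positional indices refer to, so the `id` column and the `flag_` pattern below are purely illustrative.

```r
# Toy data frame; column names are illustrative only
toy <- data.frame(id = 1:3,
                  price = c(100, 150, 90),
                  flag_price = c(0, 1, 0),
                  flag_beds = c(0, 0, 1))

# Drop by name or pattern instead of by position
drop_cols <- c("id", grep("^flag_", names(toy), value = TRUE))
toy_model <- toy[, !(names(toy) %in% drop_cols), drop = FALSE]

# row.names = FALSE keeps the index column out of the file,
# so nothing has to be removed after reading it back
path <- tempfile(fileext = ".csv")
write.csv(toy_model, path, row.names = FALSE)
reread <- read.csv(path)
print(names(reread))
```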

PART 4: Evaluating the Models

We built several models, each using a different algorithm, to predict Superhost status. Every model was trained on 36,171 listings and tested on the remaining 6,876, which is close to the intended 85%/15% split (about 84% and 16% of the 43,047 listings). The results for each algorithm follow:
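
The splitting code itself is not shown here. As a sketch only, a random hold-out split at a nominal 85%/15% ratio can be drawn in base R as follows. The seed is arbitrary, and the resulting counts will not necessarily match the ones reported in the model output, since the original split may have been made differently.

```r
set.seed(123)            # arbitrary seed, chosen only for reproducibility
n <- 43047               # number of listings in the final dataset
train_fraction <- 0.85   # nominal training share

# Sample row indices for training; the rest form the test set
train_idx <- sample(n, size = round(train_fraction * n))
test_idx  <- setdiff(seq_len(n), train_idx)

print(c(train = length(train_idx), test = length(test_idx)))
```

Indexing a data frame with `train_idx` and `test_idx` then yields the two disjoint subsets.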

kNN Algorithm

## 
##  
##    Cell Contents
## |-------------------------|
## |                       N |
## |           N / Row Total |
## |           N / Col Total |
## |         N / Table Total |
## |-------------------------|
## 
##  
## Total Observations in Table:  6876 
## 
##  
##              | predicted 
##       actual |         0 |         1 | Row Total | 
## -------------|-----------|-----------|-----------|
##            0 |      4718 |       306 |      5024 | 
##              |     0.939 |     0.061 |     0.731 | 
##              |     0.770 |     0.407 |           | 
##              |     0.686 |     0.045 |           | 
## -------------|-----------|-----------|-----------|
##            1 |      1406 |       446 |      1852 | 
##              |     0.759 |     0.241 |     0.269 | 
##              |     0.230 |     0.593 |           | 
##              |     0.204 |     0.065 |           | 
## -------------|-----------|-----------|-----------|
## Column Total |      6124 |       752 |      6876 | 
##              |     0.891 |     0.109 |           | 
## -------------|-----------|-----------|-----------|
## 
## 
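
The kNN output above contains only the cross table, without summary statistics. The sketch below derives them from the four cell counts, which are copied from that table. The metric definitions are the standard ones, with class 1 (Superhost) treated as the positive class.

```r
# Counts copied from the kNN cross table (rows = actual, columns = predicted)
knn_tab <- matrix(c(4718, 306,
                    1406, 446),
                  nrow = 2, byrow = TRUE,
                  dimnames = list(actual = c("0", "1"),
                                  predicted = c("0", "1")))

accuracy    <- sum(diag(knn_tab)) / sum(knn_tab)
sensitivity <- knn_tab["1", "1"] / sum(knn_tab["1", ])  # superhosts correctly found
specificity <- knn_tab["0", "0"] / sum(knn_tab["0", ])  # non-superhosts correctly found

print(round(c(accuracy = accuracy,
              sensitivity = sensitivity,
              specificity = specificity), 4))
```

The accuracy comes out at roughly 0.751, and the low sensitivity (about 0.24) shows that this kNN model misses most actual Superhosts.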

Naive-Bayes Algorithm

## 
##  
##    Cell Contents
## |-------------------------|
## |                       N |
## |           N / Row Total |
## |           N / Col Total |
## |-------------------------|
## 
##  
## Total Observations in Table:  6876 
## 
##  
##              | actual 
##    predicted |        No |       Yes | Row Total | 
## -------------|-----------|-----------|-----------|
##           No |      3867 |        94 |      3961 | 
##              |     0.976 |     0.024 |     0.576 | 
##              |     0.770 |     0.051 |           | 
## -------------|-----------|-----------|-----------|
##          Yes |      1157 |      1758 |      2915 | 
##              |     0.397 |     0.603 |     0.424 | 
##              |     0.230 |     0.949 |           | 
## -------------|-----------|-----------|-----------|
## Column Total |      5024 |      1852 |      6876 | 
##              |     0.731 |     0.269 |           | 
## -------------|-----------|-----------|-----------|
## 
## 
## Confusion Matrix and Statistics
## 
##           Reference
## Prediction   No  Yes
##        No  3867   94
##        Yes 1157 1758
##                                           
##                Accuracy : 0.8181          
##                  95% CI : (0.8087, 0.8271)
##     No Information Rate : 0.7307          
##     P-Value [Acc > NIR] : < 2.2e-16       
##                                           
##                   Kappa : 0.6087          
##                                           
##  Mcnemar's Test P-Value : < 2.2e-16       
##                                           
##             Sensitivity : 0.9492          
##             Specificity : 0.7697          
##          Pos Pred Value : 0.6031          
##          Neg Pred Value : 0.9763          
##              Prevalence : 0.2693          
##          Detection Rate : 0.2557          
##    Detection Prevalence : 0.4239          
##       Balanced Accuracy : 0.8595          
##                                           
##        'Positive' Class : Yes             
## 
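
The Kappa statistic reported above corrects accuracy for the agreement expected by chance. As a worked check, it can be recomputed from the four counts in the Naive Bayes confusion matrix. The counts are copied from the output, and `cohen_kappa` is just a variable name chosen for this example.

```r
# Counts copied from the Naive Bayes confusion matrix
# (rows = prediction, columns = reference)
nb_tab <- matrix(c(3867,   94,
                   1157, 1758),
                 nrow = 2, byrow = TRUE,
                 dimnames = list(prediction = c("No", "Yes"),
                                 reference = c("No", "Yes")))

n  <- sum(nb_tab)
po <- sum(diag(nb_tab)) / n                         # observed agreement (accuracy)
pe <- sum(rowSums(nb_tab) * colSums(nb_tab)) / n^2  # agreement expected by chance
cohen_kappa <- (po - pe) / (1 - pe)

print(round(c(accuracy = po, kappa = cohen_kappa), 4))
```

Both values agree with the reported Accuracy (0.8181) and Kappa (0.6087) to four decimal places.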

C5.0 Algorithm

## 
## Call:
## C5.0.default(x = superhost_train[, -2], y = superhost_train$host_is_superhost)
## 
## Classification Tree
## Number of samples: 36171 
## Number of predictors: 47 
## 
## Tree size: 197 
## 
## Non-standard options: attempt to group attributes
## 
##  
##    Cell Contents
## |-------------------------|
## |                       N |
## |         N / Table Total |
## |-------------------------|
## 
##  
## Total Observations in Table:  6876 
## 
##  
##                   | Predicted Superhosts 
## Actual Superhosts |        No |       Yes | Row Total | 
## ------------------|-----------|-----------|-----------|
##                No |      4588 |       436 |      5024 | 
##                   |     0.667 |     0.063 |           | 
## ------------------|-----------|-----------|-----------|
##               Yes |       571 |      1281 |      1852 | 
##                   |     0.083 |     0.186 |           | 
## ------------------|-----------|-----------|-----------|
##      Column Total |      5159 |      1717 |      6876 | 
## ------------------|-----------|-----------|-----------|
## 
## 
## Confusion Matrix and Statistics
## 
##           Reference
## Prediction   No  Yes
##        No  4588  571
##        Yes  436 1281
##                                          
##                Accuracy : 0.8535         
##                  95% CI : (0.845, 0.8618)
##     No Information Rate : 0.7307         
##     P-Value [Acc > NIR] : < 2.2e-16      
##                                          
##                   Kappa : 0.6191         
##                                          
##  Mcnemar's Test P-Value : 2.414e-05      
##                                          
##             Sensitivity : 0.6917         
##             Specificity : 0.9132         
##          Pos Pred Value : 0.7461         
##          Neg Pred Value : 0.8893         
##              Prevalence : 0.2693         
##          Detection Rate : 0.1863         
##    Detection Prevalence : 0.2497         
##       Balanced Accuracy : 0.8025         
##                                          
##        'Positive' Class : Yes            
## 

OneR Algorithm

## 
## === Summary ===
## 
## Correctly Classified Instances       28569               78.9832 %
## Incorrectly Classified Instances      7602               21.0168 %
## Kappa statistic                          0.3537
## Mean absolute error                      0.2102
## Root mean squared error                  0.4584
## Relative absolute error                 54.1038 %
## Root relative squared error            104.0237 %
## Total Number of Instances            36171     
## 
## === Confusion Matrix ===
## 
##      a     b   <-- classified as
##  25249  1379 |     a = No
##   6223  3320 |     b = Yes
## 
##  
##    Cell Contents
## |-------------------------|
## |                       N |
## |         N / Table Total |
## |-------------------------|
## 
##  
## Total Observations in Table:  6876 
## 
##  
##                   | Predicted Superhosts 
## Actual Superhosts |        No |       Yes | Row Total | 
## ------------------|-----------|-----------|-----------|
##                No |      4795 |       229 |      5024 | 
##                   |     0.697 |     0.033 |           | 
## ------------------|-----------|-----------|-----------|
##               Yes |      1178 |       674 |      1852 | 
##                   |     0.171 |     0.098 |           | 
## ------------------|-----------|-----------|-----------|
##      Column Total |      5973 |       903 |      6876 | 
## ------------------|-----------|-----------|-----------|
## 
## 
## Confusion Matrix and Statistics
## 
##           Reference
## Prediction   No  Yes
##        No  4795 1178
##        Yes  229  674
##                                           
##                Accuracy : 0.7954          
##                  95% CI : (0.7856, 0.8049)
##     No Information Rate : 0.7307          
##     P-Value [Acc > NIR] : < 2.2e-16       
##                                           
##                   Kappa : 0.3798          
##                                           
##  Mcnemar's Test P-Value : < 2.2e-16       
##                                           
##             Sensitivity : 0.36393         
##             Specificity : 0.95442         
##          Pos Pred Value : 0.74640         
##          Neg Pred Value : 0.80278         
##              Prevalence : 0.26934         
##          Detection Rate : 0.09802         
##    Detection Prevalence : 0.13133         
##       Balanced Accuracy : 0.65917         
##                                           
##        'Positive' Class : Yes             
## 
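
The Kappa, No Information Rate, and Balanced Accuracy figures above can be reproduced by hand from the four cell counts, which helps explain why an accuracy near 79.5% is only a modest gain over always predicting ‘No’. The following is a minimal base-R sketch; the variable names are ours for illustration and do not come from the original analysis.

```r
# Cell counts copied from the confusion matrix above
# (rows = prediction, columns = reference).
cm <- matrix(c(4795, 229, 1178, 674), nrow = 2,
             dimnames = list(Prediction = c("No", "Yes"),
                             Reference  = c("No", "Yes")))
n <- sum(cm)

# Observed agreement (accuracy).
accuracy <- sum(diag(cm)) / n

# No Information Rate: accuracy from always predicting the majority class.
nir <- max(colSums(cm)) / n

# Agreement expected by chance, from the row and column marginals.
expected <- sum(rowSums(cm) * colSums(cm)) / n^2

# Cohen's kappa: agreement beyond chance.
kappa_stat <- (accuracy - expected) / (1 - expected)

# Balanced accuracy: mean of sensitivity and specificity ("Yes" is positive).
sensitivity <- cm["Yes", "Yes"] / sum(cm[, "Yes"])
specificity <- cm["No", "No"] / sum(cm[, "No"])
balanced_accuracy <- (sensitivity + specificity) / 2

round(c(accuracy = accuracy, nir = nir, kappa = kappa_stat,
        balanced_accuracy = balanced_accuracy), 4)
```

Because most listings are not Superhosts, accuracy alone flatters a model that rarely predicts ‘Yes’; Kappa and Balanced Accuracy correct for that imbalance.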

RIPPER Algorithm

## 
## === Summary ===
## 
## Correctly Classified Instances       31044               85.8257 %
## Incorrectly Classified Instances      5127               14.1743 %
## Kappa statistic                          0.6116
## Mean absolute error                      0.2347
## Root mean squared error                  0.3425
## Relative absolute error                 60.413  %
## Root relative squared error             77.7264 %
## Total Number of Instances            36171     
## 
## === Confusion Matrix ===
## 
##      a     b   <-- classified as
##  24963  1665 |     a = No
##   3462  6081 |     b = Yes
## 
##  
##    Cell Contents
## |-------------------------|
## |                       N |
## |         N / Table Total |
## |-------------------------|
## 
##  
## Total Observations in Table:  6876 
## 
##  
##                   | Predicted Superhosts 
## Actual Superhosts |        No |       Yes | Row Total | 
## ------------------|-----------|-----------|-----------|
##                No |      4659 |       365 |      5024 | 
##                   |     0.678 |     0.053 |           | 
## ------------------|-----------|-----------|-----------|
##               Yes |       676 |      1176 |      1852 | 
##                   |     0.098 |     0.171 |           | 
## ------------------|-----------|-----------|-----------|
##      Column Total |      5335 |      1541 |      6876 | 
## ------------------|-----------|-----------|-----------|
## 
## 
## Confusion Matrix and Statistics
## 
##           Reference
## Prediction   No  Yes
##        No  4659  676
##        Yes  365 1176
##                                          
##                Accuracy : 0.8486         
##                  95% CI : (0.8399, 0.857)
##     No Information Rate : 0.7307         
##     P-Value [Acc > NIR] : < 2.2e-16      
##                                          
##                   Kappa : 0.5938         
##                                          
##  Mcnemar's Test P-Value : < 2.2e-16      
##                                          
##             Sensitivity : 0.6350         
##             Specificity : 0.9273         
##          Pos Pred Value : 0.7631         
##          Neg Pred Value : 0.8733         
##              Prevalence : 0.2693         
##          Detection Rate : 0.1710         
##    Detection Prevalence : 0.2241         
##       Balanced Accuracy : 0.7812         
##                                          
##        'Positive' Class : Yes            
## 
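
McNemar's test in the output above asks whether the two kinds of error are equally common. Below is a minimal sketch using base R's `mcnemar.test` on the RIPPER test-set counts; we assume the reported p-value comes from this standard, continuity-corrected test, and the object names are illustrative.

```r
# RIPPER test-set counts copied from the confusion matrix above
# (rows = prediction, columns = reference).
cm_ripper <- matrix(c(4659, 365, 676, 1176), nrow = 2,
                    dimnames = list(Prediction = c("No", "Yes"),
                                    Reference  = c("No", "Yes")))

# The two off-diagonal cells are the two kinds of error.
missed_superhosts <- cm_ripper["No", "Yes"]   # actual Superhosts predicted "No"
false_superhosts  <- cm_ripper["Yes", "No"]   # non-Superhosts predicted "Yes"

# McNemar's test compares only the off-diagonal counts.
result <- mcnemar.test(cm_ripper)
result$statistic
result$p.value
```

A very small p-value here indicates that the errors are lopsided: the model misses Superhosts (676) far more often than it wrongly flags one (365).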

Random Forest Algorithm

##                 Length Class  Mode     
## call                3  -none- call     
## type                1  -none- character
## predicted       36171  factor numeric  
## err.rate         1500  -none- numeric  
## confusion           6  -none- numeric  
## votes           72342  matrix numeric  
## oob.times       36171  -none- numeric  
## classes             2  -none- character
## importance         47  -none- numeric  
## importanceSD        0  -none- NULL     
## localImportance     0  -none- NULL     
## proximity           0  -none- NULL     
## ntree               1  -none- numeric  
## mtry                1  -none- numeric  
## forest             14  -none- list     
## y               36171  factor numeric  
## test                0  -none- NULL     
## inbag               0  -none- NULL     
## terms               3  terms  call
## 
##  
##    Cell Contents
## |-------------------------|
## |                       N |
## |         N / Table Total |
## |-------------------------|
## 
##  
## Total Observations in Table:  6876 
## 
##  
##                                     | superhost_randomForest_predict 
## superhost_subtest$host_is_superhost |        No |       Yes | Row Total | 
## ------------------------------------|-----------|-----------|-----------|
##                                  No |      5012 |        41 |      5053 | 
##                                     |     0.729 |     0.006 |           | 
## ------------------------------------|-----------|-----------|-----------|
##                                 Yes |       114 |      1709 |      1823 | 
##                                     |     0.017 |     0.249 |           | 
## ------------------------------------|-----------|-----------|-----------|
##                        Column Total |      5126 |      1750 |      6876 | 
## ------------------------------------|-----------|-----------|-----------|
## 
## 

PART 5: Making a Data-Driven Decision

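Based on the model evaluations, we have determined that the ‘Random Forest’ algorithm outperformed all the other models tested: its cross table above shows 6,721 of 6,876 test listings classified correctly (about 97.7%), compared with 84.9% accuracy for RIPPER.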
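
The comparison can be made concrete by computing the same metrics for each model from the test-set counts printed above. This is a minimal base-R sketch, not the original evaluation code; the data frame and its column names are ours, and the first row is labelled generically because it refers to the model whose output precedes the RIPPER section.

```r
# Test-set counts copied from the cross tables above.
# tn/fp/fn/tp treat "Yes" (Superhost) as the positive class.
models <- data.frame(
  model = c("Model preceding RIPPER", "RIPPER", "Random Forest"),
  tn = c(4795, 4659, 5012),
  fp = c(229, 365, 41),
  fn = c(1178, 676, 114),
  tp = c(674, 1176, 1709)
)

# Share of all listings classified correctly.
models$accuracy <- with(models, (tp + tn) / (tp + tn + fp + fn))
# Share of actual Superhosts that the model found.
models$sensitivity <- with(models, tp / (tp + fn))
# Share of predicted Superhosts that really are Superhosts.
models$precision <- with(models, tp / (tp + fp))

# Print the models from most to least accurate.
models[order(-models$accuracy),
       c("model", "accuracy", "sensitivity", "precision")]
```

Note that the Random Forest table has different row totals (5,053 and 1,823) than the other two tables (5,024 and 1,852), so if the test subsets differ, the comparison is indicative rather than exact.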

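We revisited the hypothesis and used the Random Forest model’s ‘importance’ measure to identify its strongest variables. The breakdown below lists every feature in the model by its mean decrease in Gini impurity (MeanDecreaseGini), in descending order of importance. All three hypothesized features appear in the top ten: review scores rating (3rd), amenities count (4th), and price (8th).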

##                                  feature MeanDecreaseGini
## 1                  log_number_of_reviews      1902.849607
## 2                  log_reviews_per_month      1403.107873
## 3          cuberoot_review_scores_rating      1151.949485
## 4                        amenities_count       834.175902
## 5                log_host_listings_count       601.631644
## 6         calculated_host_listings_count       549.289563
## 7                       availability_365       533.870086
## 8                              log_price       522.865651
## 9                       log_cleaning_fee       463.861478
## 10                       availability_90       458.176886
## 11                       availability_60       425.546695
## 12                       availability_30       383.321351
## 13                  log_security_deposit       362.576043
## 14             review_scores_cleanliness       335.175444
## 15                      log_extra_people       315.250710
## 16                    log_minimum_nights       292.749317
## 17                review_scores_accuracy       275.235448
## 18                   review_scores_value       267.842379
## 19                       log_accommodate       255.051576
## 20                                  beds       202.411203
## 21                    host_response_rate       197.656280
## 22                       guests_included       185.722499
## 23                         log_bathrooms       158.628532
## 24          cancellation_policy_moderate       156.998636
## 25                          log_bedrooms       152.516979
## 26           review_scores_communication       149.028336
## 27                host_identity_verified       119.109564
## 28          cancellation_policy_flexible       116.849558
## 29                 review_scores_checkin       110.313383
## 30                      instant_bookable       109.120240
## 31                review_scores_location       103.964534
## 32               property_type_apartment       102.375983
## 33                   property_type_other        84.496325
## 34                   property_type_house        84.093639
## 35                     is_location_exact        82.458249
## 36            cancellation_policy_strict        80.394675
## 37          host_response_within_an_hour        77.265232
## 38                      requires_license        68.270094
## 39                     room_type_private        64.770366
## 40        host_response_within_few_hours        63.534209
## 41                 room_type_entire_home        62.387486
## 42           host_response_within_a_days        47.737785
## 43      require_guest_phone_verification        25.057622
## 44         require_guest_profile_picture        20.313194
## 45                      room_type_shared        19.571724
## 46                              bed_type        15.360340
## 47 host_response_within_few_days_or_more         6.364909
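
A ranking like the one above can be produced from a fitted forest with the `importance()` function of the `randomForest` package. The sketch below is illustrative only: it fits a small forest on the built-in `iris` data as a stand-in for the listings data, and `rf_model` and `importance_table` are hypothetical names, not the objects used in this analysis.

```r
library(randomForest)

set.seed(42)  # make the illustrative fit reproducible

# Stand-in data: iris replaces the listings data, which is not reproduced here.
rf_model <- randomForest(Species ~ ., data = iris, ntree = 100)

# For a classification forest fitted with default settings, importance()
# returns a one-column matrix named "MeanDecreaseGini".
imp <- importance(rf_model)

# Turn the matrix into a two-column table like the one printed above.
importance_table <- data.frame(
  feature = rownames(imp),
  MeanDecreaseGini = imp[, "MeanDecreaseGini"],
  row.names = NULL
)

# Sort from most to least important.
importance_table <- importance_table[
  order(importance_table$MeanDecreaseGini, decreasing = TRUE), ]
rownames(importance_table) <- NULL
importance_table
```

MeanDecreaseGini measures how much splits on a feature reduce node impurity, summed across the forest, so larger values mean the feature contributed more to separating the classes.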

PART 6: Recommendation and Conclusion

RECOMMENDATION

As shown in Part 5, the Random Forest algorithm was the most accurate at predicting whether a host is a Superhost.

This best-performing model ranks the following features as the most important in determining the Superhost status of a host:

  • Number of reviews
  • Reviews per month
  • Review Scores Rating
  • Amenities
  • Listings Count per Host
  • 365-Day Availability
  • Price
  • Cleaning Fee
  • 90-Day Availability
  • 60-Day Availability
  • 30-Day Availability

Based on these findings, we propose the following:

  • Incorporate ‘availability’ as an additional threshold requirement for the Airbnb Superhost Program, and/or
  • Create a new tier of host with the ‘availability’ feature in mind

Loyal hosts who participate in the Superhost program should be rewarded for their dedication to Airbnb, and an availability-based requirement or tier would give hosts an incentive to keep their listings available throughout the year.

FUTURE WORK

This study focused solely on the Los Angeles, CA market, so further work is needed to strengthen the case for the proposal. The following are a few areas where we can expand on this project:

  • Expand on the ‘Amenities’ feature and other similarly formatted features by parsing the text data for target words and creating more dummy variables, which would open up more opportunities for classification analysis
  • Run the same analysis on other cities in the United States wherever data is available
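
The first item could be approached along the following lines. This is a minimal base-R sketch under stated assumptions: the toy `listings` data frame, the brace-delimited format of the `amenities` strings, the `target_words` vector, and the `amenity_` column prefix are all illustrative and are not taken from the original dataset or code.

```r
# Toy stand-in for the listings data; the amenities format is assumed.
listings <- data.frame(
  id = 1:3,
  amenities = c('{TV,Wifi,"Air conditioning",Kitchen}',
                '{Wifi,"Free parking on premises"}',
                '{}'),
  stringsAsFactors = FALSE
)

# Hypothetical target words to turn into dummy variables.
target_words <- c("Wifi", "Kitchen", "Free parking")

for (word in target_words) {
  # Build a syntactic column name, e.g. "Free parking" -> "amenity_free_parking".
  col_name <- paste0("amenity_", gsub("[^a-z0-9]+", "_", tolower(word)))
  # 1 if the amenities text contains the word (case-sensitive), otherwise 0.
  listings[[col_name]] <- as.integer(grepl(word, listings$amenities, fixed = TRUE))
}

listings[, c("id", "amenity_wifi", "amenity_kitchen", "amenity_free_parking")]
```

Each new 0/1 column can then be added to the model in place of, or alongside, the single amenities count.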