Airbnb is a platform business that links two groups: hosts and guests. Anybody with an open room or free space can become a host on Airbnb and offer it to the global community. It is a good way to earn extra income with minimal effort, and an easy way to advertise a space because the platform has the traffic and the global user base to support it. Airbnb gives hosts an easy way to monetize a space that would otherwise go to waste.
On the other side we have guests with very specific needs: some might be seeking affordable accommodation close to the city attractions, while others want a luxurious apartment by the sea. They might be groups, families or individuals, both local and foreign. After every visit guests have an opportunity to rate their stay and leave feedback.
We will try to find out what contributes to a listing’s popularity and predict, based on its attributes, whether a listing has the potential to make it into the Top 100 most reviewed accommodations.
Since the data contains both current and historical listings, we will measure their popularity by the total number of reviews received. We will rank all the listings and categorize them as is_top_100 true or false.
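The ranking step described above can be sketched in base R. This is a minimal illustration on toy data, not the code used in this analysis: the column name `number_of_reviews` and the cut-off variable `top_n` are assumptions, and the cut-off is set to 2 instead of 100 only to keep the example readable.

```r
# Toy stand-in for the listings table; `number_of_reviews` is an assumed column name.
listings <- data.frame(
  id = 1:6,
  number_of_reviews = c(12, 250, 0, 87, 250, 3)
)

top_n <- 2  # the analysis targets the Top 100; 2 keeps the toy example small

# Ranking the negated counts gives rank 1 to the most reviewed listing;
# ties.method = "min" lets tied listings share the better rank.
listings$review_rank <- rank(-listings$number_of_reviews, ties.method = "min")

# Flag every listing whose rank falls within the cut-off.
listings$is_top_100 <- listings$review_rank <= top_n

print(listings)
```

With this tie handling, listings that share a review count are flagged together, so slightly more than `top_n` rows can end up marked as true when the tie sits at the cut-off.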
For that purpose we will use publicly available Airbnb data, which can be sourced from the Inside Airbnb website. The data covers all Barcelona listing details, customer reviews and associated geolocation information collected on the 9th of June 2018 and is published in the form of csv files.
A copy of the data can be found at Inside Airbnb.
A quick inspection of the files revealed that the summary files contain only a limited number of columns, all of which are also available in the details files, so they will not be used further. The information contained in the calendar file is already present in the listing details, so we will not use it either.
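One way to check that a summary file adds nothing beyond the details file is to compare column names. The sketch below uses hypothetical column sets; with real data the two vectors would come from `names()` of the imported data frames.

```r
# Hypothetical column sets standing in for a summary file and a details file.
summary_cols <- c("id", "name", "host_id", "latitude", "longitude")
detail_cols  <- c("id", "listing_url", "name", "host_id", "latitude",
                  "longitude", "room_type")

# Columns that exist only in the summary file; an empty result means the
# details file already covers everything the summary file offers.
only_in_summary <- setdiff(summary_cols, detail_cols)
summary_is_redundant <- length(only_in_summary) == 0

print(only_in_summary)
print(summary_is_redundant)
```

This only compares column names; it does not verify that the values in the shared columns agree.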
For data import, exploration and visualisation we will use the R language with additional packages. The code below imports all the csv files for further analysis and exploration. To preserve text in multiple languages, ‘utf-8’ encoding will be used across the files.
library(readr)  # provides read_csv()
lis_det <- read_csv('C:/Airbnb/listing_details.csv', guess_max = 10000)
## Parsed with column specification:
## cols(
## .default = col_character(),
## id = col_integer(),
## scrape_id = col_double(),
## last_scraped = col_date(format = ""),
## host_id = col_integer(),
## host_since = col_date(format = ""),
## host_listings_count = col_integer(),
## host_total_listings_count = col_integer(),
## latitude = col_double(),
## longitude = col_double(),
## accommodates = col_integer(),
## bathrooms = col_double(),
## bedrooms = col_integer(),
## beds = col_integer(),
## square_feet = col_integer(),
## guests_included = col_integer(),
## minimum_nights = col_integer(),
## maximum_nights = col_integer(),
## availability_30 = col_integer(),
## availability_60 = col_integer(),
## availability_90 = col_integer()
## # ... with 14 more columns
## )
## See spec(...) for full column specifications.
library(geojsonio)  # provides geojson_read()
nb_geo <- geojson_read('C:/Airbnb/neighbourhoods.geojson', what = 'sp')
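The import above relies on readr's default of reading files as UTF-8. A minimal sketch of stating the encoding explicitly is shown here on a small temporary file; the file contents are made up for illustration and the `readr` package is assumed to be installed.

```r
library(readr)

# Write a tiny UTF-8 file with accented text so the example is self-contained.
tmp <- tempfile(fileext = ".csv")
rows <- c("id,neighbourhood",
          "1,la Sagrada Fam\u00edlia",
          "2,la Vila de Gr\u00e0cia")
writeLines(enc2utf8(rows), tmp, useBytes = TRUE)

toy <- read_csv(
  tmp,
  locale = locale(encoding = "UTF-8"),  # readr's default, spelled out here
  guess_max = 10000                     # maximum number of rows used to guess column types
)

print(toy$neighbourhood)
```

A larger `guess_max` helps with sparse columns, where the first rows may all be missing and would otherwise lead to a wrong type guess.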
We will start by familiarizing ourselves with the columns in the dataset to understand what each feature represents. This is important because a poor understanding of the features could cause us to make mistakes in the data analysis and the modeling process. We will also try to reduce the number of columns by dropping those that are either contained elsewhere or do not carry information that can be used to answer our questions.
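A compact per-column overview can support this kind of review. The sketch below is one possible approach in base R, run on a toy data frame that borrows a few values from the listing file; the summary columns (`type`, `share_missing`, `n_distinct`) are names chosen for this illustration.

```r
# Toy stand-in for a few listing columns.
toy <- data.frame(
  id = c(18666, 18674, 19157, 20345),
  square_feet = c(75, NA, NA, 732),
  medium_url = c(NA, NA, NA, NA),
  room_type = c("Entire home/apt", "Entire home/apt",
                "Private room", "Private room"),
  stringsAsFactors = FALSE
)

# One row per column: its type, the share of missing values,
# and the number of distinct non-missing values.
overview <- data.frame(
  column = names(toy),
  type = vapply(toy, function(x) class(x)[1], character(1)),
  share_missing = colMeans(is.na(toy)),
  n_distinct = vapply(toy, function(x) length(unique(x[!is.na(x)])), integer(1)),
  row.names = NULL
)

print(overview)
```

Columns that are entirely missing, or that hold a single distinct value, are natural candidates for removal because they cannot help distinguish one listing from another.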
The file contains all historical and active listings captured in Barcelona on the 9th of June 2018. We will inspect the file’s composition first.
dim(lis_det)
## [1] 17788 96
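With 96 columns, the previews that follow page through the file 20 columns at a time. The index ranges for such blocks can be derived as in this small sketch; `chunk_size`, `blocks` and `block_ranges` are names introduced only for this example.

```r
n_cols <- 96       # number of columns reported by dim(lis_det)
chunk_size <- 20   # columns shown per preview table

col_idx <- seq_len(n_cols)

# Dividing by the chunk size and rounding up assigns columns 1-20 to
# block 1, 21-40 to block 2, and so on; the last block holds the remainder.
blocks <- split(col_idx, ceiling(col_idx / chunk_size))

# First and last column index of each block.
block_ranges <- vapply(blocks, range, integer(2))
print(block_ranges)
```

Each element of `blocks` could then be used as the column index in a call such as `lis_det[1:5, blocks[[1]]]`.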
library(knitr)       # kable()
library(kableExtra)  # kable_styling(), scroll_box() and the %>% pipe
kable(lis_det[1:5,1:20]) %>%
kable_styling(bootstrap_options = c("striped", "hover", "condensed", "responsive"), full_width = F, font_size = 9) %>%
scroll_box(width = "910px", height = "400px")
id | listing_url | scrape_id | last_scraped | name | summary | space | description | experiences_offered | neighborhood_overview | notes | transit | access | interaction | house_rules | thumbnail_url | medium_url | picture_url | xl_picture_url | host_id |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
18666 | https://www.airbnb.com/rooms/18666 | 2.018071e+13 | 2018-07-10 | Flat with Sunny Terrace | Apartment located near the “Plaza de las Glorias” and the second-hand market (Encants). The accommodation is also close to the National Theatre of Catalunya and the Agbar Tower which has become one of the new symbols of Barcelona. Licence number: HUTB-(PHONE NUMBER HIDDEN) | Nice apartment situated on the penthouse floor of a building with elevator. Huge Living/dining-room with double sofa-bed 1 bedroom with two single beds 1 bedroom with double bed Nice kitchen opened to the living/dining-room and fully equipped for 6 people Bathroom with shower The accommodation has been recently renovated and tastefully decorated with a comfortable furniture and wood floor. Also it is equipped with air-conditioning and heating. | Apartment located near the “Plaza de las Glorias” and the second-hand market (Encants). The accommodation is also close to the National Theatre of Catalunya and the Agbar Tower which has become one of the new symbols of Barcelona. Licence number: HUTB-(PHONE NUMBER HIDDEN) Nice apartment situated on the penthouse floor of a building with elevator. Huge Living/dining-room with double sofa-bed 1 bedroom with two single beds 1 bedroom with double bed Nice kitchen opened to the living/dining-room and fully equipped for 6 people Bathroom with shower The accommodation has been recently renovated and tastefully decorated with a comfortable furniture and wood floor. Also it is equipped with air-conditioning and heating. Free Wifi - air conditioning. We will provide basic amenities like shower gel, shampoo,and hand soap. Also, 1 set of bed linen and towels per person will be included. 
We can provide you all kind of entrance and tickets for monuments and shows in Barcelona in order you avo | none | Apartment in Barcelona near to the Plaza de las Glorias, the old market (Encants), the Agbar Tower one of the new symbols of Barcelona and the Teatre Nacional de Catalunya. All kinds of services in surroundings (shops, supermarkets, restaurants, bars). | NA | Good transports connections, 50 m from the metro station “Clot” (line 1 and 2, red and purple), just 10 minutes by metro to Passeig de Gracia (La Pedrera, Casa Batllo), Plaza Catalunya and Las Ramblas. | Free Wifi - air conditioning. We will provide basic amenities like shower gel, shampoo,and hand soap. Also, 1 set of bed linen and towels per person will be included. | We can provide you all kind of entrance and tickets for monuments and shows in Barcelona in order you avoid queues and plan your trip in advance. Also we can organize shuttle from/to airport. All that you need for get a perfect stay in our nice city you only have to ask us. | Cleaning fee: 40 euros (to pay at arrival) Tourist tax at arrival: 2.48 Eur/person/night (to pay at arrival) Arrival and departure times: Check-in: 3 pm to 9 pm. Check-out: Before 11 am. Checking-in and checking-out times will be flexible and can be arranged provided there are not other bookings made prior or after your reservation. The client must always inform of the approximate arrival time. If the client arrives between 9 pm and 12 am, he/she shall pay a 30 euros extra charge. Arrivals after 12 am and until 2 am shall have an additional extra charge of 50 euros | NA | NA | https://a0.muscache.com/im/pictures/47f88bc6-6561-445a-beec-f8ec4ddc1038.jpg?aki_policy=large | NA | 71615 |
18674 | https://www.airbnb.com/rooms/18674 | 2.018071e+13 | 2018-07-10 | Huge flat for 8 people close to Sagrada Familia | 110m2 apartment to rent in Barcelona. Located in the Eixample district, near the Sagrada Familia. It has a small balcony where you can see the temple of Gaudi. Capacity for 8 people. Licence number: HUTB-002062 | Apartment with 110 m2 located in the 6th floor in a building with elevator Huge living/dinig-room 1 double bedrrom 1 bedroom with 2 single beds 1 bedroom with bunk beds Kitchen fully equipped for 8 people 1 bathroom with bathtub 1 small bathroom with shower balcony The accommodation has been recently renovated and tastefully decorated with a comfortable furniture and wood floor. Also it is equipped with heating, air conditioning and wifi. | 110m2 apartment to rent in Barcelona. Located in the Eixample district, near the Sagrada Familia. It has a small balcony where you can see the temple of Gaudi. Capacity for 8 people. Licence number: HUTB-002062 Apartment with 110 m2 located in the 6th floor in a building with elevator Huge living/dinig-room 1 double bedrrom 1 bedroom with 2 single beds 1 bedroom with bunk beds Kitchen fully equipped for 8 people 1 bathroom with bathtub 1 small bathroom with shower balcony The accommodation has been recently renovated and tastefully decorated with a comfortable furniture and wood floor. Also it is equipped with heating, air conditioning and wifi. Free Wifi - air conditioning. We will provide basic amenities like shower gel, shampoo,and hand soap. Also, 1 set of bed linen and towels per person will be included. We can provide you all kind of entrance and tickets for monuments and shows in Barcelona in order you avoid queues and plan your trip in advance. Also we can organize sh | none | Apartment in Barcelona located in the heart of Eixample district, within only 150 m form the great Sagrada Familia and really near of Gaudí Avenue and the famous Sant Pau Hospital . 
All kind of services in surroundings (shops, supermarkets, restaurants, bars). | NA | Good transport connection, 150 m from metro “Sagarda Familia” (L5 and L2) and within only 15 minutes you can arrive by metro to “Plaza Catalunya”, “Paseo de Gracia” and “Ciutat Vella”. Also there are bus stations in surroundings. | Free Wifi - air conditioning. We will provide basic amenities like shower gel, shampoo,and hand soap. Also, 1 set of bed linen and towels per person will be included. | We can provide you all kind of entrance and tickets for monuments and shows in Barcelona in order you avoid queues and plan your trip in advance. Also we can organize shuttle from/to airport. All that you need for get a perfect stay in our nice city you only have to ask us. | Tourist tax at arrival: 2.48 Eur/person/night (to pay at arrival) Arrival and departure times: Check-in: 3 pm to 9 pm. Check-out: Before 11 am. Checking-in and checking-out times will be flexible and can be arranged provided there are not other bookings made prior or after your reservation. The client must always inform of the approximate arrival time. If the client arrives between 9 pm and 12 am, he/she shall pay a 30 euros extra charge. Arrivals after 12 am and until 2 am shall have an additional extra charge of 50 euros. | NA | NA | https://a0.muscache.com/im/pictures/13031453/413cdbfc_original.jpg?aki_policy=large | NA | 71615 |
19157 | https://www.airbnb.com/rooms/19157 | 2.018071e+13 | 2018-07-10 | Great Place in Sagrada Familia, Bcn | We offer a Room in a very well located apartment and close to major attractions by metro, bus or walking. If you want an accesible place to discover the city, then this is a great option for you. The room is located in a very basic apartment, there are no luxuries. The apartment is not modern. If you are looking for great location, quiet place then this is a good option for you. Keep in mind that you will be sharing with me and another roommate and that you are not renting the whole flat. | Cozy, well located apartment located just two blocks from amazing Sagrada Familia Church. Our neighborhood is safe, lively during the day and quiet at night. Also, there are many nice restaurants, bars and local shops. The building has an elevator and It is two blocks away from two metro stations, plus there are plenty of bus stops nearby so you can get to any part of the city quickly. We are only 10 minutes away by Metro from Plaza Cataluña. The guests’ room is furnished with a double bed, open cupboard to put personal belongings and essentials ( toiletries, towels, maps, etc) and it also has a pine wooden wardrobe to put your clothes, two night tables, and reading lamp. ** Bed linen and towels are provided for your convenience, specially for those who want to travel light. >> Toiletries like shampoo, bath gel and conditioner are provided and left in the bathroom for guests use. >> Please note that the room is big enough for two people, although if you are used to bigger room sp | We offer a Room in a very well located apartment and close to major attractions by metro, bus or walking. If you want an accesible place to discover the city, then this is a great option for you. The room is located in a very basic apartment, there are no luxuries. The apartment is not modern. If you are looking for great location, quiet place then this is a good option for you. 
Keep in mind that you will be sharing with me and another roommate and that you are not renting the whole flat. Cozy, well located apartment located just two blocks from amazing Sagrada Familia Church. Our neighborhood is safe, lively during the day and quiet at night. Also, there are many nice restaurants, bars and local shops. The building has an elevator and It is two blocks away from two metro stations, plus there are plenty of bus stops nearby so you can get to any part of the city quickly. We are only 10 minutes away by Metro from Plaza Cataluña. The guests’ room is furnished with a double bed, open | none | The neighbourhood has a local and touristy balanced combination. The vicinity to Sagrada Familia and Hospital Sant Pau makes the area a major attraction without the massive amount of tourists that you will get in other parts of Barcelona. There are many local stores with very nice cafés and terraces where you can enjoy and get a feel of Barcelona’s Mediterranean lifestyle | Please take into account that there is no Air Conditioning in the apartment nor the room. The bathroom is shared, you can leave your amenities there for your convenience. Some guests have complaint about noise: we are quiet in the apartment, but neighbours sometimes are not quiet. So if you are too sensitive to noise, I honestly don´t recommend staying home. Please, once again, take into account that you are only renting a room not the whole apartment and please keep in mind that you will be sharing the apartment with me and my roommate. | Just two blocks from the apartment you have 2 metro stations: Hospital Sant Pau Station( Metro Blue Line) and Sagrada Familia Station (Blue and Purple line). By metro you are only 3 stops away from Ramblas and you can get there in 12-15 minutes. 
If you prefer to take the bus, then two blocks away you have several bus stops with lines 19, 45, 47 that will take to Plaza Cataluña, or line 92 that will take you to Glories Shopping Mall if you are in the mood for shopping. | Guests will have access to bathroom, kitchen (Please note that heavy cooking is not allowed). The living room is not shared. The bathroom and kitchen are to be shared. This means that you have to clean up after use. ***** The cleaning service is only provided before your stay to clean and arrange the room and the bathroom before your arrival. This means that during your stay you have to also collaborate on keeping common areas clean since there is no but there is no permanent cleaning during your stay. For this purpose cleaning material and gloves are permanently left in the bathroom*** - Shampoo and liquid soap are provided and left in the bathroom for those guests who travel light. - In the kitchen basic equipment is provided ( cups, dishes, bowls, cutlery) in case you want to prepare breakfast. Heavy cooking is not allowed. *** Kitchen premises must be left clean and tidy after every meal (this means doing the washing up, cleaning the stove, counter areas, and sweeping the flo | Due to working schedules, sometimes I cannot welcome the guests or be home. Check in will be coordinated though for guests arrival. Late check in is not a problem. I will be at guests disposal for questions and requests at their arrival and during their stay if we coincide. In case of any question, issue, need or recommendation feel free to use Airbnb chat or free instant messaging tools like wassup, telegram or iMessage and I will gladly get back to guest with recommendations or information requested. | CLEANING FEE: it is to cover the cleaning lady that cleans and organises your room, bathroom and kitchen before your arrival. It does not mean that there is a permanent cleaning service in the house. 
- Keeping things clean and tidy is also a guests responsibility in common shared areas. – Shall you need extra cleaning services please do not hesitate to mention it, the cleaning lady can be hired on your behalf for an extra fee of 12 x cleaning service. -KITCHEN: heavy cooking is not allowed. If you need to prepare breakfast or a snack that is fine. All the areas must be left clean and organized after use ( this means doing the washing up of used utensils, leaving stove, counters and floor clean). All utensils used will be left clean and tidy. No food or beverage is included in the use of the kitchen, so you will have to buy all your food/ drink products. *** CHECK-IN is flexible and this means that it will depend on your arrival schedule and my working schedule availability. I | NA | NA | https://a0.muscache.com/im/pictures/10556089/29e5de9f_original.jpg?aki_policy=large | NA | 73099 |
20345 | https://www.airbnb.com/rooms/20345 | 2.018071e+13 | 2018-07-10 | 2 Double rooms for 4 persons, WI-FI | HOME SHARING!!! Hello everybady! My name is Mila, hospitality is the best feature of me, you will feel at home as in yours. I am very respectful, sociable, calm and friendly person. I have hosted guests since 2010, I have a lot of experience and patience. I will be happy to host you and share my home with you! You are WELCOME !! WELCOME!!! WILKOMMEN !!! BIENBENUE !!! | Apartment (3 bedrooms, living room, kitchen and bathroom) is ideal for Groups of 4 adults or for families with children. Rent 2 BEDROOMS (large and medium) in the comfortable apartment, furnished, clean, bright and quiet, WI-FI is available. One BEDROOM is large, has air conditioning, has 2 single beds (90x190) with a nice window, you can see the far hills and the gardens of the neighborhood, big closet, night tables, radio. Another BEDROOM is medium with a nice window, it is ideal for couples or two children. It has 1 double bed (135x190), closet, night table. LIVING ROOM has TV, DVD, CD player and a pretty balcony. Breakfast is included and you are welcome to use the kitchen to cook. Attention for families with young children! Children under 6! Apply the 30% off after communicating his age. For stays longer do the 20% discount. The price of 450 for a month for a person applies from November to April, except the week of Christmas and Easter. ADDITIONAL PAYMENTS: 20 BY CLEAN | HOME SHARING!!! Hello everybady! My name is Mila, hospitality is the best feature of me, you will feel at home as in yours. I am very respectful, sociable, calm and friendly person. I have hosted guests since 2010, I have a lot of experience and patience. I will be happy to host you and share my home with you! You are WELCOME !! WELCOME!!! WILKOMMEN !!! BIENBENUE !!! Apartment (3 bedrooms, living room, kitchen and bathroom) is ideal for Groups of 4 adults or for families with children. 
Rent 2 BEDROOMS (large and medium) in the comfortable apartment, furnished, clean, bright and quiet, WI-FI is available. One BEDROOM is large, has air conditioning, has 2 single beds (90x190) with a nice window, you can see the far hills and the gardens of the neighborhood, big closet, night tables, radio. Another BEDROOM is medium with a nice window, it is ideal for couples or two children. It has 1 double bed (135x190), closet, night table. LIVING ROOM has TV, DVD, CD player and a pretty balcony. | none | En la misma plaza hay parque y una placita para los juegos de los niños. El passeig Fabra i Puig es una zona comercial donde se encuentran muchos restaurantes, supermercados y otras tiendas. En 7-10 min. a pie hay un centro comercial " Herron City“, Mercadona, El”Cort Ingles“, piscina y gimnacios. En 4-5 min. a pie esta estación de trenes y busos”Sant Andreu Arenal" y entrada al metro. | Cleaning-15. Cleaning during stay for guests desire, the price can be negotiated. Price for used washing machine - 10 Check- In or Check- Out of 23:00 until 7:00 pm. -15 Penalty for delay, input at the time had not foreseen - 15 | DIRECTIONS to the apartment from the AIRPORT. Transport is cheap, fast and modern. From “El Prat” airport to the apartment you can get: 1. Now with subway line N9!!! 2. take a green bus free from Terminal 1 to the Train Station Commuter (Terminal 2) and take train on R2-Nord to “CLOT-ARAGO”, where he will transfer to the station “CLOT” Metro (line 1, red) to “Fabra I Puig” station (direction “FONDO”), get off at “Fabra I Puig” and leave the last wagon on the right to the stairs or take the acensor in the rear of the train, cross the Meridiana avenue , riding down the street ESCOCIA and walk about 3-4 min. to find PLAZA GARRIGO. The trip takes about 40 minutos. 3. Or take bus L46 Plaza Spain (or night bus to Plaza Catalunya N17 ), where you have to take the metro (line 1, red) directly to “Fabra I Puig”. The time on the road depends on traffic. 
| DIRECTIONS TO MILA´s HOUSE. AIRPORT TRANSPORTATION is inexpensive, fast and modern. From the airport “el PRAT” : 1. Now with subway, line N9!!! 2. Or you can arrive to MILA´s apartment taking a free green bus from Terminal 1 to the Rodalies train station ( Terminal 2) and take the R2-NORD, until station “CLOT- ARAGO”, where you will transfer to the metro station “CLOT” (red line) until station “FABRA I PUIG” ( heading to “FONDO”) get off at “FABRA I PUIG” and exit to the stairs at the rear of the train, cross the Meridian Avenue, look for the street ESCOCIA and walk until you find PLAZA GARRIGO. The trip will take around 40 minits. 3. Or take the L46 bus to Plaza España (or nite bus 17 to Plaza Catalunya) where you will transfer to the metro (red line) direct to “FABRA I PUIG”. The time on the road depends on traffic is. Intercom: call button “cinque-2”, (that is written in catalan) … Plant 5, Gate 2. Llamar por interfono: el botón “cinque-2” (quinta planta, puerta 2) Intercom-Anru | In general …I will be present during your stay and help my guests depends on your needs | RESPECT! GOOD CHEER AND GOOD SENSE OF HUMOR! Please pray: You can smoke on the balcony! You can not make too much noise! | NA | NA | https://a0.muscache.com/im/pictures/623060/17f03910_original.jpg?aki_policy=large | NA | 76809 |
25786 | https://www.airbnb.com/rooms/25786 | 2.018071e+13 | 2018-07-10 | NICE ROOM AVAILABLE IN THE HEART OF GRACIA | JUST GO THROUGH THE MANY REVIEWS I GOT THROUGH THE YEARS, NO BETTER FEEDBACK THAN THAT. WELCOME. | Room available for rent.- PEDRO PEREZ. Shared with a Catalan male aged 38, Ayurvedic massage therapist and Yoga practitioner. Looking for people non-smoking, enthusiastic willing to share more than just the space in a centric beautiful flat in PLaça Vila de Gracia. i am very flexible you can use anything in the house feel free to ask anything! The neighborhood is really special you could live here and not needing anything from outside, such an experience, just 100 years ago was a village in the outskirts of barcelona, we do have our own cultural program throughout the year, very Catalan place. The area is full of bohemians, artisans and modern artists. Most of the area has been taken over by us over the past 10 years making it a mix between the past and the present-future. Metro stations around are: Diagonal L3-L5, Fontana L3, Joanic L4, 10-15 minutes walking to city center Ramblas. Separate Wardrobe room available Kitchen and bathroom shared Bills included available for renti | JUST GO THROUGH THE MANY REVIEWS I GOT THROUGH THE YEARS, NO BETTER FEEDBACK THAN THAT. WELCOME. Room available for rent.- PEDRO PEREZ. Shared with a Catalan male aged 38, Ayurvedic massage therapist and Yoga practitioner. Looking for people non-smoking, enthusiastic willing to share more than just the space in a centric beautiful flat in PLaça Vila de Gracia. i am very flexible you can use anything in the house feel free to ask anything! The neighborhood is really special you could live here and not needing anything from outside, such an experience, just 100 years ago was a village in the outskirts of barcelona, we do have our own cultural program throughout the year, very Catalan place. The area is full of bohemians, artisans and modern artists. 
Most of the area has been taken over by us over the past 10 years making it a mix between the past and the present-future. Metro stations around are: Diagonal L3-L5, Fontana L3, Joanic L4, 10-15 minutes walking to city center Ramblas. S | none | Solo decir que a menudo ni salgo del barrio. Muy entretenido con sus gentes y lugares. | No dudes en perdir una cita para un masaje relajante o terapeutico. Masaje ayurvedico y tailandés disponibles. *Airport car service available for 25 one way | Metro con las principales estaciones. A pie. El autobús. Bicicletas para alquilar. Si miras el mapa de Barcelona veras esta justo en el corazón | All access with respect. Kitchen facilities need permission. Feel free to ask. Avoid Noise after midnight and early.morning | Available for interaction. Nonproblem feel free to ask. | Clean Bathroom after use and quick shower At all times. avoid noise early mornings and after mindnight. Clean and tidy room. Communication is essential. Ask permission to use kitchen facilities, cooking, washing machine and fridge. Weekends are a bit noisy. This is an essential part of this famous area. SUNDAY TO THURSDAY IS FINE. i follow these rules strictly myself. | NA | NA | https://a0.muscache.com/im/pictures/3a27896a-95ce-4d69-9fc4-39116ed3dd9c.jpg?aki_policy=large | NA | 108310 |
Out of the first 20 columns we will keep the following:
And remove all the below:
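Dropping columns by name could be sketched as follows in base R. The columns chosen in `drop_cols` are an illustrative assumption and not necessarily the ones removed in this analysis.

```r
# Toy frame with a handful of the first 20 column names.
toy <- data.frame(
  id = c(18666, 18674),
  listing_url = c("https://www.airbnb.com/rooms/18666",
                  "https://www.airbnb.com/rooms/18674"),
  scrape_id = c(2.018071e+13, 2.018071e+13),
  name = c("Flat with Sunny Terrace",
           "Huge flat for 8 people close to Sagrada Familia"),
  host_id = c(71615, 71615),
  stringsAsFactors = FALSE
)

# Illustrative choice only; not the list used in this analysis.
drop_cols <- c("listing_url", "scrape_id")

# Keep every column whose name is not in drop_cols.
toy_reduced <- toy[, setdiff(names(toy), drop_cols), drop = FALSE]
print(names(toy_reduced))
```

Selecting by name instead of by position keeps the step valid even if the column order changes in a later data release.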
kable(lis_det[1:5,21:40]) %>%
kable_styling(bootstrap_options = c("striped", "hover", "condensed", "responsive"), full_width = F, font_size = 9) %>%
scroll_box(width = "910px", height = "400px")
host_url | host_name | host_since | host_location | host_about | host_response_time | host_response_rate | host_acceptance_rate | host_is_superhost | host_thumbnail_url | host_picture_url | host_neighbourhood | host_listings_count | host_total_listings_count | host_verifications | host_has_profile_pic | host_identity_verified | street | neighbourhood | neighbourhood_cleansed |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
https://www.airbnb.com/users/show/71615 | Mireia And Maria | 2010-01-19 | Barcelona, Cataluña, Spain | We are Mireia (39) & Maria (41), two multilingual entrepreneurs loving Barcelona and having big experience in the touristic market. In our apartments you are going feel youself like at home. The location of our flats perfectly suites for travelling and sightseeing. We are looking forward to sincerely host you in our apartments. | within an hour | 99% | N/A | f | https://a0.muscache.com/im/users/71615/profile_pic/1426612511/original.jpg?aki_policy=profile_small | https://a0.muscache.com/im/users/71615/profile_pic/1426612511/original.jpg?aki_policy=profile_x_medium | El Camp de l’Arpa del Clot | 50 | 50 | [‘email’, ‘phone’, ‘reviews’, ‘jumio’, ‘government_id’] | t | t | Barcelona, CT, Spain | El Camp de l’Arpa del Clot | el Camp de l’Arpa del Clot |
https://www.airbnb.com/users/show/71615 | Mireia And Maria | 2010-01-19 | Barcelona, Cataluña, Spain | We are Mireia (39) & Maria (41), two multilingual entrepreneurs loving Barcelona and having big experience in the touristic market. In our apartments you are going feel youself like at home. The location of our flats perfectly suites for travelling and sightseeing. We are looking forward to sincerely host you in our apartments. | within an hour | 99% | N/A | f | https://a0.muscache.com/im/users/71615/profile_pic/1426612511/original.jpg?aki_policy=profile_small | https://a0.muscache.com/im/users/71615/profile_pic/1426612511/original.jpg?aki_policy=profile_x_medium | El Camp de l’Arpa del Clot | 50 | 50 | [‘email’, ‘phone’, ‘reviews’, ‘jumio’, ‘government_id’] | t | t | Barcelona, CT, Spain | La Sagrada Família | la Sagrada Família |
https://www.airbnb.com/users/show/73099 | Urania | 2010-01-24 | Barcelona, Cataluña, Spain | Hi there, We love art, music and gastronomy, not to mention travelling and giving tips to our visitors so they can discover Barcelona. We have decorated the apartment so it reflects the spirit of a city like Barcelona: a mix between the Mediterranean and Europe, cosmopolitan and local at the same time, where people from all cultures are welcomed to leave their imprint with creativity and where you can feel like at home. We are very independent and leave a lot of space to our guests so they can keep their privacy. | within an hour | 90% | N/A | f | https://a0.muscache.com/im/users/73099/profile_pic/1281190747/original.jpg?aki_policy=profile_small | https://a0.muscache.com/im/users/73099/profile_pic/1281190747/original.jpg?aki_policy=profile_x_medium | la Sagrada Família | 2 | 2 | [‘email’, ‘phone’, ‘reviews’] | t | f | Barcelona, CT, Spain | La Sagrada Família | la Sagrada Família |
https://www.airbnb.com/users/show/76809 | Mila | 2010-02-02 | Barcelona, Cataluña, Spain | NA | within a few hours | 100% | N/A | f | https://a0.muscache.com/im/users/76809/profile_pic/1265162222/original.jpg?aki_policy=profile_small | https://a0.muscache.com/im/users/76809/profile_pic/1265162222/original.jpg?aki_policy=profile_x_medium | Vilapicina i la Torre Llobeta | 1 | 1 | [‘phone’, ‘facebook’, ‘reviews’, ‘jumio’] | t | f | Barcelona, Catalonia, Spain | Vilapicina i la Torre Llobeta | Vilapicina i la Torre Llobeta |
https://www.airbnb.com/users/show/108310 | Pedro | 2010-04-14 | Barcelona, Catalonia, Spain | Hola! as i say in my add i look for enthusiastic people willing to share things, experiences not just coming to Barcelona and sightseeing. Of course if your option is so, Go ahead. Many people has come up to my place so far through airbnb and the experience has been great. Let me introduce myself! I think when i was born i had the force to travel and meet people, this is my goal in life! I studied Photography for 3 years and when the degree was finished i had the urge to travel and so i did until Today!! Half way i was in India and met and ayurvedic Massage Master i was totally impressed by so i became his disciple until today. I now make my living out of massage therapy and when i can i travel back there to keep going my studies in new techniques and philosophy of the indian traditions on reality, such an amazing country. I am flexible and not only centered to one thing, that is why when airbnb came i thought it would be amazing being able to share my place, get some extra money “of course” and at the same time being able to host people from all over the world. So far so many people has come over and made new friends. Some people did not want to talk much but appreciated the location and the experience, cause at the end of the day it is quite revolutionary that we can do this. thanks to the guys of airbnb.!! You can meet me at my apartment, feel free! Pedro | within an hour | 100% | N/A | f | https://a0.muscache.com/im/pictures/user/43199285-d4a5-412d-8a06-5c91efb78042.jpg?aki_policy=profile_small | https://a0.muscache.com/im/pictures/user/43199285-d4a5-412d-8a06-5c91efb78042.jpg?aki_policy=profile_x_medium | Vila de Gràcia | 1 | 1 | [‘email’, ‘phone’, ‘reviews’, ‘jumio’, ‘offline_government_id’, ‘selfie’, ‘government_id’, ‘identity_manual’] | t | t | Barcelona, Barcelona, Spain | Vila de Gràcia | la Vila de Gràcia |
From the next 20 columns we will keep the following:
And remove all the below:
kable(lis_det[1:5,41:60]) %>%
kable_styling(bootstrap_options = c("striped", "hover", "condensed", "responsive"), full_width = F, font_size = 9) %>%
scroll_box(width = "910px", height = "400px")
neighbourhood_group_cleansed | city | state | zipcode | market | smart_location | country_code | country | latitude | longitude | is_location_exact | property_type | room_type | accommodates | bathrooms | bedrooms | beds | bed_type | amenities | square_feet |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Sant Martí | Barcelona | CT | 08026 | Barcelona | Barcelona, Spain | ES | Spain | 41.40889 | 2.185545 | t | Apartment | Entire home/apt | 6 | 1 | 2 | 4 | Real Bed | {TV,Internet,Wifi,“Air conditioning”,“Wheelchair accessible”,Kitchen,Elevator,“Free street parking”,Heating,“Family/kid friendly”,Washer,Dryer,Essentials,Shampoo,“Hair dryer”,“Hot water”,“Host greets you”,“Paid parking on premises”} | 75 |
Eixample | Barcelona | CT | 08025 | Barcelona | Barcelona, Spain | ES | Spain | 41.40420 | 2.173058 | t | Apartment | Entire home/apt | 8 | 2 | 3 | 6 | Real Bed | {TV,Internet,Wifi,“Air conditioning”,“Wheelchair accessible”,Kitchen,Elevator,“Free street parking”,“Buzzer/wireless intercom”,Heating,“Family/kid friendly”,Washer,Essentials,Shampoo,Hangers,“Hair dryer”,Iron,“Laptop friendly workspace”,Crib,“Hot water”,“Host greets you”,“Paid parking on premises”} | NA |
Eixample | Barcelona | CT | 08025 | Barcelona | Barcelona, Spain | ES | Spain | 41.40793 | 2.174540 | t | Apartment | Private room | 2 | 1 | 1 | 1 | Real Bed | {Internet,Wifi,Kitchen,“Smoking allowed”,Elevator,“First aid kit”,Essentials,Shampoo,“translation missing: en.hosting_amenity_49”,“translation missing: en.hosting_amenity_50”} | NA |
Nou Barris | Barcelona | Catalonia | 08016 | Barcelona | Barcelona, Spain | ES | Spain | 41.42950 | 2.181558 | t | Apartment | Private room | 4 | 1 | 2 | 3 | Real Bed | {TV,Wifi,“Air conditioning”,Kitchen,“Paid parking off premises”,“Pets allowed”,Breakfast,Elevator,“Free street parking”,“Buzzer/wireless intercom”,Heating,“Family/kid friendly”,Washer,Dryer,Essentials,Shampoo,“24-hour check-in”,Hangers,“Hair dryer”,Iron,Crib,“Room-darkening shades”,“Hot water”,“Host greets you”} | 732 |
Gràcia | Barcelona | Barcelona | 08012 | Barcelona | Barcelona, Spain | ES | Spain | 41.40145 | 2.156446 | t | Apartment | Private room | 2 | 1 | 1 | 1 | Real Bed | {TV,Wifi,“Air conditioning”,Kitchen,Elevator,Heating,“Family/kid friendly”,Washer,“Fire extinguisher”,Essentials,Shampoo,“Lock on bedroom door”,Hangers,“Hair dryer”,“Hot water”,“Luggage dropoff allowed”} | NA |
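From the next 20 columns we will keep the following: neighbourhood_group_cleansed, latitude, longitude, property_type, room_type, accommodates, bathrooms, bedrooms, beds, bed_type and amenities.
And remove all the below: city, state, zipcode, market, smart_location, country_code, country, is_location_exact and square_feet.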
kable(lis_det[1:5,61:80]) %>%
kable_styling(bootstrap_options = c("striped", "hover", "condensed", "responsive"), full_width = F, font_size = 9) %>%
scroll_box(width = "910px", height = "200px")
price | weekly_price | monthly_price | security_deposit | cleaning_fee | guests_included | extra_people | minimum_nights | maximum_nights | calendar_updated | has_availability | availability_30 | availability_60 | availability_90 | availability_365 | calendar_last_scraped | number_of_reviews | first_review | last_review | review_scores_rating |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
$130.00 | NA | NA | $150.00 | $42.00 | 2 | $25.00 | 3 | 730 | today | t | 0 | 0 | 0 | 61 | 2018-07-10 | 1 | 2015-10-10 | 2015-10-10 | 80 |
$140.00 | NA | NA | $150.00 | $50.00 | 2 | $30.00 | 1 | 1125 | today | t | 3 | 17 | 47 | 132 | 2018-07-10 | 5 | 2013-05-27 | 2018-06-18 | 85 |
$30.00 | $185.00 | $580.00 | NA | $20.00 | 2 | $15.00 | 2 | 180 | 4 weeks ago | t | 1 | 2 | 12 | 96 | 2018-07-10 | 165 | 2010-08-18 | 2018-06-23 | 89 |
$25.00 | NA | $450.00 | $100.00 | $20.00 | 1 | $20.00 | 2 | 365 | 3 weeks ago | t | 11 | 40 | 70 | 345 | 2018-07-10 | 72 | 2010-06-16 | 2018-06-17 | 84 |
$42.00 | NA | NA | NA | NA | 1 | $31.00 | 1 | 730 | 3 weeks ago | t | 16 | 31 | 52 | 94 | 2018-07-10 | 191 | 2010-08-11 | 2018-07-06 | 95 |
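From the next 20 columns we will keep the following: price, security_deposit, cleaning_fee, guests_included, extra_people, minimum_nights, number_of_reviews, first_review and last_review.
And remove all the below: weekly_price, monthly_price, maximum_nights, calendar_updated, has_availability, availability_30, availability_60, availability_90, availability_365, calendar_last_scraped and review_scores_rating.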
kable(lis_det[1:5,81:96]) %>%
kable_styling(bootstrap_options = c("striped", "hover", "condensed", "responsive"), full_width = F, font_size = 9) %>%
scroll_box(width = "910px", height = "200px")
review_scores_accuracy | review_scores_cleanliness | review_scores_checkin | review_scores_communication | review_scores_location | review_scores_value | requires_license | license | jurisdiction_names | instant_bookable | is_business_travel_ready | cancellation_policy | require_guest_profile_picture | require_guest_phone_verification | calculated_host_listings_count | reviews_per_month |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
10 | 10 | 2 | 10 | 10 | 8 | t | HUTB-003004 | NA | f | f | flexible | f | f | 29 | 0.03 |
9 | 10 | 10 | 10 | 9 | 9 | t | HUTB-002062 | NA | t | f | strict_14_with_grace_period | f | f | 29 | 0.08 |
9 | 9 | 10 | 10 | 10 | 9 | t | NA | NA | t | f | strict_14_with_grace_period | f | f | 2 | 1.72 |
8 | 9 | 9 | 9 | 9 | 8 | t | NA | NA | f | f | strict_14_with_grace_period | t | t | 1 | 0.73 |
9 | 9 | 10 | 10 | 10 | 9 | t | NA | NA | t | f | moderate | t | t | 1 | 1.98 |
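From the final set of columns we will keep the following: review_scores_accuracy, review_scores_cleanliness, review_scores_checkin, review_scores_communication, review_scores_location, review_scores_value, instant_bookable, cancellation_policy, require_guest_profile_picture, require_guest_phone_verification and calculated_host_listings_count.
And remove all the below: requires_license, license, jurisdiction_names, is_business_travel_ready and reviews_per_month.
We will now extract all the columns of interest and store them in a separate data frame - lis_det_sel. We will also rename the id column to “listing_id” to make it consistent with the other files.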
# Select columns by name; cancellation_policy keeps its original name because later steps refer to it
lis_det_sel <- lis_det[c('id', 'last_scraped', 'host_name', 'host_since', 'host_location', 'host_about', 'host_is_superhost', 'host_has_profile_pic', 'host_identity_verified', 'neighbourhood_cleansed', 'neighbourhood_group_cleansed', 'latitude', 'longitude', 'property_type', 'room_type', 'accommodates', 'bathrooms', 'bedrooms', 'beds', 'bed_type', 'amenities', 'price', 'security_deposit', 'cleaning_fee', 'guests_included', 'extra_people', 'minimum_nights', 'first_review', 'last_review', 'number_of_reviews', 'review_scores_accuracy', 'review_scores_cleanliness', 'review_scores_checkin', 'review_scores_communication', 'review_scores_location', 'review_scores_value', 'instant_bookable', 'cancellation_policy', 'require_guest_profile_picture', 'require_guest_phone_verification', 'calculated_host_listings_count')]
names(lis_det_sel)[1] <- "listing_id"
We will now perform data cleansing and pre-processing to prepare the data for further analysis.
First we will remove all the listings that have no reviews, as most of their review-related fields will be NA.
lis_det_sel <- subset(lis_det_sel, number_of_reviews > 0)
dim(lis_det_sel)
## [1] 14764 41
This will reduce the number of records to 14764.
Secondly we will impute 0 where a value is not explicitly specified within the listing for the following columns: security_deposit and cleaning_fee.
We will also convert host_about from NA to an empty string to allow a character count.
lis_det_sel$security_deposit[is.na(lis_det_sel$security_deposit)] <- 0
lis_det_sel$cleaning_fee[is.na(lis_det_sel$cleaning_fee)] <- 0
lis_det_sel$host_about[is.na(lis_det_sel$host_about)] <- ''
And lastly we will drop all the records that have any remaining NAs, as we would not be able to perform further calculations on them. This includes records with missing review scores, host_since or first_review dates.
lis_det_sel <- subset(lis_det_sel, rowSums(is.na(lis_det_sel))==0)
dim(lis_det_sel)
## [1] 14416 41
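This will leave us with 14416 records out of 17788 original records.
We will now apply cleansing / conversion to the below fields: the t/f flags (host_is_superhost, host_has_profile_pic, host_identity_verified, instant_bookable, require_guest_profile_picture, require_guest_phone_verification) are converted to 1/0, and the currency strings (price, security_deposit, cleaning_fee, extra_people) are converted to numbers.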
lis_det_sel$host_is_superhost <- as.numeric(ifelse(lis_det_sel$host_is_superhost == 't', 1, 0))
lis_det_sel$host_has_profile_pic <- as.numeric(ifelse(lis_det_sel$host_has_profile_pic == 't', 1, 0))
lis_det_sel$host_identity_verified <- as.numeric(ifelse(lis_det_sel$host_identity_verified == 't', 1, 0))
lis_det_sel$price <- lis_det_sel$price %>% str_extract_all("\\(?[0-9,.]+\\)?") %>% gsub(",", "", .) %>% as.numeric()
lis_det_sel$security_deposit <- lis_det_sel$security_deposit %>% str_extract_all("\\(?[0-9,.]+\\)?") %>% gsub(",", "", .) %>% as.numeric()
lis_det_sel$cleaning_fee <- lis_det_sel$cleaning_fee %>% str_extract_all("\\(?[0-9,.]+\\)?") %>% gsub(",", "", .) %>% as.numeric()
lis_det_sel$extra_people <- lis_det_sel$extra_people %>% str_extract_all("\\(?[0-9,.]+\\)?") %>% gsub(",", "", .) %>% as.numeric()
lis_det_sel$instant_bookable <- as.numeric(ifelse(lis_det_sel$instant_bookable == 't', 1, 0))
lis_det_sel$require_guest_profile_picture <- as.numeric(ifelse(lis_det_sel$require_guest_profile_picture == 't', 1, 0))
lis_det_sel$require_guest_phone_verification <- as.numeric(ifelse(lis_det_sel$require_guest_phone_verification == 't', 1, 0))
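The conversions above repeat two patterns: mapping 't'/'f' flags to 1/0 and stripping currency formatting. As a minimal sketch, the two patterns could be wrapped in small helpers. The names `flag_to_num` and `currency_to_num` are hypothetical and not part of the original analysis, and the input values are made up for illustration.

```r
# Hypothetical helpers (not part of the original analysis).

# Map Airbnb-style 't'/'f' flags to 1/0; NA stays NA.
flag_to_num <- function(x) as.numeric(x == "t")

# Strip everything except digits and the decimal point, then convert.
# Assumes values look like "$1,200.00" or a plain "0".
currency_to_num <- function(x) as.numeric(gsub("[^0-9.]", "", x))

# Made-up example values
flags  <- c("t", "f", "t")
prices <- c("$130.00", "$1,200.00", "0")

print(flag_to_num(flags))
print(currency_to_num(prices))
```

Each helper could then be applied column by column, for example `lis_det_sel$price <- currency_to_num(lis_det_sel$price)`.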
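And we can add the following calculated fields: listing_duration (days between first_review and last_scraped), hosting_duration (days between host_since and last_scraped), host_local (1 if host_location mentions Barcelona), host_about_len (character count of host_about), total_amenities (number of listed amenities), price_per_person (price divided by accommodates) and our target is_top_100.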
lis_det_sel <- lis_det_sel %>%
  mutate(
    # Days between the first review / host registration and the scrape date
    listing_duration = as.numeric(difftime(last_scraped, first_review, units = 'days')),
    hosting_duration = as.numeric(difftime(last_scraped, host_since, units = 'days')),
    # 1 if the host location mentions Barcelona
    host_local = as.numeric(str_detect(host_location, 'barcelona|Barcelona')),
    host_about_len = ifelse(is.na(host_about), 0, nchar(host_about)),
    # amenities looks like "{TV,Wifi,...}"; "{}" means no amenities
    total_amenities = ifelse(nchar(amenities) > 2, str_count(amenities, ',') + 1, 0),
    price_per_person = price / accommodates
  )
lis_det_sel$is_top_100 <- ifelse(rank(-lis_det_sel$number_of_reviews) <= 100, 1, 0)
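One detail of the ranking step is how ties are handled. `rank()` uses `ties.method = "average"` by default, so listings tied at the cut-off can push the flagged group below or above the intended size. The training and test sets later in this report contain 64 and 35 positives, 99 in total, which is consistent with a tie at the boundary. A minimal sketch on made-up review counts, with a cut-off of 2 standing in for 100:

```r
# Toy review counts with a tie at the cut-off (made-up values).
reviews <- c(50, 40, 40, 30, 10)
top_n <- 2  # stands in for 100 in the analysis

# Default ties.method = "average": the two 40s both get rank 2.5,
# so only one listing satisfies rank <= 2.
flag_average <- as.numeric(rank(-reviews) <= top_n)

# ties.method = "min": tied listings share the best rank (2),
# so both are included and the group can exceed top_n.
flag_min <- as.numeric(rank(-reviews, ties.method = "min") <= top_n)

# ties.method = "first": ties are broken by position,
# so exactly top_n listings are flagged.
flag_first <- as.numeric(rank(-reviews, ties.method = "first") <= top_n)

print(rbind(flag_average, flag_min, flag_first))
```

Which method is appropriate depends on whether tied listings should share the label or whether the group must contain exactly 100 listings.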
And convert categorical values into dummy variables:
lis_det_sel$neighbourhood_group_cleansed <- str_replace_all(lis_det_sel$neighbourhood_group_cleansed, "[^[:alnum:]]", "_")
lis_det_sel$property_type <- str_replace_all(lis_det_sel$property_type, "[^[:alnum:]]", "_")
lis_det_sel$room_type <- str_replace_all(lis_det_sel$room_type, "[^[:alnum:]]", "_")
lis_det_sel$bed_type <- str_replace_all(lis_det_sel$bed_type, "[^[:alnum:]]", "_")
nb_group_dummy <- dummy(lis_det_sel$neighbourhood_group_cleansed, sep = "_")
lis_det_sel <- cbind(lis_det_sel, nb_group_dummy)
property_type_dummy <- dummy(lis_det_sel$property_type, sep = "_")
lis_det_sel <- cbind(lis_det_sel, property_type_dummy)
room_type_dummy <- dummy(lis_det_sel$room_type, sep = "_")
lis_det_sel <- cbind(lis_det_sel, room_type_dummy)
bed_type_dummy <- dummy(lis_det_sel$bed_type, sep = "_")
lis_det_sel <- cbind(lis_det_sel, bed_type_dummy)
cancellation_policy_dummy <- dummy(lis_det_sel$cancellation_policy, sep = "_")
lis_det_sel <- cbind(lis_det_sel, cancellation_policy_dummy)
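The `dummy()` calls above come from an add-on package whose loading is not shown in this section. For readers without it, base R's `model.matrix()` produces a comparable one-hot encoding. The sketch below uses a made-up data frame; note that the generated column names follow `model.matrix()` conventions and will differ from those created by `dummy()`.

```r
# Toy data frame with one categorical column (made-up values).
df <- data.frame(
  room_type = c("Entire_home_apt", "Private_room", "Shared_room", "Private_room"),
  stringsAsFactors = TRUE
)

# "~ room_type - 1" drops the intercept so every level gets its own 0/1 column.
room_type_dummy <- model.matrix(~ room_type - 1, data = df)

# Bind the indicator columns back onto the data frame.
df <- cbind(df, room_type_dummy)
print(df)
```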
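We will now create two dataframes which are subsets of lis_det_sel: lis_det_clean, which keeps the categorical columns in their original form, and lis_det_clean_dummy, which uses the dummy variables instead.
In both cases we will drop the number_of_reviews column as its information is already contained within is_top_100.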
lis_det_clean <- lis_det_sel[, c(7:9, 11:11, 14:15, 17:20, 23:27, 31:48)]
lis_det_clean_dummy <- lis_det_sel[, c(7:9, 17:19, 23:27, 31:37, 39:96)]
We will now visualise the distribution of all the features broken down by popularity.
We will use bar charts that show relative density grouped by our target value.
discrete <- c("host_is_superhost", "host_has_profile_pic", "host_identity_verified", "instant_bookable", "require_guest_profile_picture", "require_guest_phone_verification", "host_local")
for (colname in discrete) {
temp <- subset(lis_det_clean, is_top_100 == 1)
temp <- temp %>%
group_by(is_top_100, temp[,colname]) %>%
summarise(density = n()/nrow(.))
colnames(temp)[2] <- colname
temp1 <- subset(lis_det_clean, is_top_100 == 0)
temp1 <- temp1 %>%
group_by(is_top_100, temp1[,colname]) %>%
summarise(density = n()/nrow(.))
colnames(temp1)[2] <- colname
temp2 <- rbind(temp, temp1)
plot <- ggplot(data=temp2, aes(x=as.factor(temp2[[colname]]), y=density, fill=as.factor(is_top_100))) +
geom_bar(position = 'dodge', stat='identity') + labs(fill = "is_top_100", x = colname,
title = paste(colname, " relative density grouped by is_top_100")) +
theme(axis.text.x = element_text(angle = 90, hjust = 1))
print(plot)
}
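The loop above computes, for each value of a feature, its share within the Top 100 group and within the remaining listings. The same quantity can be obtained with a row-normalised cross-tabulation. This is a minimal base R sketch on a made-up data frame standing in for lis_det_clean, not a replacement for the plotting code.

```r
# Toy stand-in for lis_det_clean (made-up values).
toy <- data.frame(
  is_top_100        = c(1, 1, 0, 0, 0, 0),
  host_is_superhost = c(1, 0, 0, 0, 1, 0)
)

# Cross-tabulate target vs feature, then normalise each row
# so that the shares within each is_top_100 group sum to 1.
counts  <- table(is_top_100 = toy$is_top_100,
                 host_is_superhost = toy$host_is_superhost)
density <- prop.table(counts, margin = 1)
print(density)

# Long format with a Freq column, convenient as input for ggplot.
density_long <- as.data.frame(density)
print(density_long)
```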
Based on the above barcharts we can conclude that:
We will again use bar charts that show relative density grouped by our target value.
categorical <- c("neighbourhood_group_cleansed", "property_type", "room_type", "bed_type", "cancellation_policy")
for (colname in categorical) {
temp <- subset(lis_det_clean, is_top_100 == 1)
temp <- temp %>%
group_by(is_top_100, temp[,colname]) %>%
summarise(density = n()/nrow(.))
colnames(temp)[2] <- colname
temp1 <- subset(lis_det_clean, is_top_100 == 0)
temp1 <- temp1 %>%
group_by(is_top_100, temp1[,colname]) %>%
summarise(density = n()/nrow(.))
colnames(temp1)[2] <- colname
temp2 <- rbind(temp, temp1)
plot <- ggplot(data=temp2, aes(x=temp2[[colname]], y=density, fill=as.factor(is_top_100))) +
geom_bar(position = 'dodge', stat='identity') + labs(fill = "is_top_100", x = colname,
title = paste(colname, " relative density grouped by is_top_100")) +
theme(axis.text.x = element_text(angle = 90, hjust = 1))
print(plot)
}
Based on the above barcharts we can conclude that:
For continuous values we will use a box plot to better understand how the data is distributed between the groups.
continous <- c("bathrooms", "bedrooms", "beds", "price_per_person", "security_deposit", "cleaning_fee", "guests_included", "extra_people", "minimum_nights", "review_scores_accuracy", "review_scores_cleanliness", "review_scores_checkin", "review_scores_communication", "review_scores_location", "review_scores_value", "calculated_host_listings_count", "listing_duration", "hosting_duration", "host_about_len", "total_amenities")
for (colname in continous) {
plot <- ggplot(data=lis_det_clean, aes(x=as.factor(is_top_100), y=lis_det_clean[[colname]])) +
geom_boxplot(fill="lightblue") + labs(x = "is_top_100", y = colname,
title = paste(colname, " grouped by is_top_100")) +
theme(axis.text.x = element_text(angle = 90, hjust = 1)) +
scale_y_continuous(limits = quantile(lis_det_clean[[colname]], c(0.1, 0.9)))
print(plot)
}
Based on the above boxplots we can conclude that:
We will now identify and remove near-zero variance predictors using the following code. The resulting data will then be used in our predictive models.
nzv <- nearZeroVar(select(lis_det_clean, -is_top_100))
lis_det_clean_nzv <- lis_det_clean[, -nzv]
nzv_dummy <- nearZeroVar(select(lis_det_clean_dummy, -is_top_100))
lis_det_clean_dummy_nzv <- lis_det_clean_dummy[, -nzv_dummy]
By applying the nearZeroVar function we have reduced our datasets to 28 and 36 columns, including our target.
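One caveat with dropping columns by position: the indices returned by nearZeroVar refer to the data frame without is_top_100. In lis_det_clean the target is the last column, so positions line up, but in lis_det_clean_dummy the target appears to sit before the dummy columns, so the positions of later columns may be shifted by one. Dropping by name avoids this (caret's nearZeroVar also accepts `names = TRUE`). The sketch below uses a made-up data frame and a hypothetical helper, `near_zero_var_names`, that approximates the two statistics with the documented default cut-offs; it is an illustration, not a reimplementation of caret.

```r
# Base R approximation of the two statistics nearZeroVar() relies on.
# Cut-offs mirror caret's documented defaults (freqCut = 95/5, uniqueCut = 10).
near_zero_var_names <- function(df, freq_cut = 95 / 5, unique_cut = 10) {
  flagged <- vapply(df, function(x) {
    counts <- sort(table(x), decreasing = TRUE)
    if (length(counts) < 2) return(TRUE)          # constant column
    freq_ratio     <- counts[[1]] / counts[[2]]   # most common vs second most common value
    percent_unique <- 100 * length(counts) / length(x)
    freq_ratio > freq_cut && percent_unique < unique_cut
  }, logical(1))
  names(df)[flagged]
}

set.seed(1)
toy <- data.frame(
  rare_flag  = c(rep(0, 99), 1),      # 99:1 split, near-zero variance
  is_top_100 = rep(c(0, 1), 50),      # target, deliberately not the last column
  price      = runif(100, 20, 200)    # varied, should be kept
)

predictors <- toy[, setdiff(names(toy), "is_top_100"), drop = FALSE]
nzv_names  <- near_zero_var_names(predictors)

# Dropping by name is unaffected by where the target column sits.
toy_nzv <- toy[, !(names(toy) %in% nzv_names), drop = FALSE]
print(names(toy_nzv))
```

Name-based dropping also behaves sensibly when nothing is flagged, whereas negative indexing with an empty index vector returns no columns at all.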
The neighbourhoods GEOJSON file contains the full list of Barcelona neighbourhoods with geospatial data that we will use to visualise information on the map. We will use the Leaflet R package and display listings from both groups using the lat/long information from the listing details dataset. This will give us an idea of the geographical distribution, with red points marking the Top 100 most popular listings.
other <- lis_det_sel %>%
filter(is_top_100 == 0)
top_100 <- lis_det_sel %>%
filter(is_top_100 == 1)
leaflet() %>% setView(lng = 2.154007, lat = 41.390205, zoom = 12) %>%
addTiles() %>%
addPolygons(data = nb_geo, color = "#444444", weight = 2, opacity = 1) %>%
addCircleMarkers( lng = other$longitude,
lat = other$latitude,
radius = 2,
stroke = FALSE,
color = "blue",
fillOpacity = 0.5,
group = "Other"
) %>%
addCircleMarkers( lng = top_100$longitude,
lat = top_100$latitude,
radius = 3,
stroke = FALSE,
color = "red",
fillOpacity = 0.9,
group = "Top 100"
)
We will now try to build our initial model based on the data we have created in prior steps. Before we start we will try to remove highly correlated features. Naive Bayes treats features as conditionally independent, so two highly correlated features effectively count the same evidence twice, over-inflating their importance.
descrCor <- cor(lis_det_clean_dummy_nzv)
highlyCorrelated <- findCorrelation(descrCor, cutoff=0.7)
highlyCorCol <- colnames(lis_det_clean_dummy_nzv)[highlyCorrelated]
highlyCorCol
## [1] "bedrooms"
lis_det_clean_dummy_nzv_uncor <- lis_det_clean_dummy_nzv[, -which(colnames(lis_det_clean_dummy_nzv) %in% highlyCorCol)]
Unsurprisingly, bedrooms is highly correlated with the number of beds available. Only bedrooms was flagged, so it has been removed from the new dataset while beds is kept.
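findCorrelation returns only the columns it suggests dropping, not the pairs behind that decision. To see which variables are correlated with which, the correlation matrix can be scanned directly. The following is a minimal base R sketch on made-up data in which bedrooms is constructed to track beds; the 0.7 cut-off matches the one used above.

```r
set.seed(42)
beds <- sample(1:6, 200, replace = TRUE)
toy <- data.frame(
  beds      = beds,
  bedrooms  = pmax(1, round(beds / 2)),          # constructed to track beds
  bathrooms = sample(1:3, 200, replace = TRUE)   # unrelated to the others
)

cor_mat <- cor(toy)

# Keep each pair once (upper triangle) and list those above the cut-off.
cor_mat[lower.tri(cor_mat, diag = TRUE)] <- NA
high <- which(abs(cor_mat) > 0.7, arr.ind = TRUE)
pairs <- data.frame(
  var1 = rownames(cor_mat)[high[, "row"]],
  var2 = colnames(cor_mat)[high[, "col"]],
  cor  = cor_mat[high]
)
print(pairs)
```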
We will now split our data into training and test sets with a 60/40 split, then use the Naive Bayes method to make predictions.
set.seed(132)
nb_sub <- sample(nrow(lis_det_clean_dummy_nzv_uncor), floor(nrow(lis_det_clean_dummy_nzv_uncor) * 0.6))
nb_train <- lis_det_clean_dummy_nzv_uncor[nb_sub, ]
nb_test <- lis_det_clean_dummy_nzv_uncor[-nb_sub, ]
nb <- naiveBayes(as.factor(is_top_100) ~ ., data = nb_train)
nb_prediction <- predict(nb, nb_test)
nb_conf <- table(nb_test$is_top_100, nb_prediction)
print(nb_conf)
## nb_prediction
## 0 1
## 0 2943 2789
## 1 1 34
nb_accuracy <- sum(diag(nb_conf))/sum(nb_conf)
print(nb_accuracy)
## [1] 0.5162129
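# Rows of nb_conf are actual classes, columns are predicted classes.
# Precision = TP / (TP + FP): share of predicted Top 100 that are truly Top 100.
nb_precision <- nb_conf[2,2] / (nb_conf[2,2] + nb_conf[1,2])
print(nb_precision)
## [1] 0.01204392
# Recall = TP / (TP + FN): share of actual Top 100 that the model finds.
nb_recall <- nb_conf[2,2] / (nb_conf[2,2] + nb_conf[2,1])
print(nb_recall)
## [1] 0.9714286
nb_roc <- performance(prediction(as.numeric(nb_prediction), as.numeric(nb_test$is_top_100)), "tpr", "fpr")
plot(nb_roc, colorize=TRUE)
abline(0, 1, lty = 2)
As we can see, our model misclassifies almost 49% of the non Top 100 listings within the test sample (2789 of 5732) but finds Top 100 listings very well, with a recall of 97% (34 of 35). Precision, however, is only about 1.2%: of the 2823 listings predicted as Top 100, just 34 actually are. This is also well represented on the ROC plot.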
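The metrics for the Naive Bayes model can be checked by hand from the four cells of the confusion matrix printed above. The sketch below rebuilds that matrix from the reported counts (rows are actual classes, columns are predicted classes) and derives each metric with explicit true/false positive and negative counts; the variable names are illustrative.

```r
# Counts taken from the Naive Bayes confusion matrix above
# (rows = actual class, columns = predicted class).
nb_conf <- matrix(c(2943, 1, 2789, 34), nrow = 2,
                  dimnames = list(actual = c("0", "1"), predicted = c("0", "1")))

tp <- nb_conf["1", "1"]  # Top 100 predicted as Top 100
fn <- nb_conf["1", "0"]  # Top 100 predicted as other
fp <- nb_conf["0", "1"]  # other predicted as Top 100
tn <- nb_conf["0", "0"]  # other predicted as other

metrics <- c(
  accuracy  = (tp + tn) / sum(nb_conf),
  precision = tp / (tp + fp),  # share of predicted Top 100 that are correct
  recall    = tp / (tp + fn),  # share of actual Top 100 that are found
  fpr       = fp / (fp + tn)   # share of non Top 100 flagged by mistake
)
print(round(metrics, 4))
```

Naming the four cells first makes it harder to mix up the row and column positions when writing the precision and recall formulas.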
We will now try to build our second model based on the data we have created in prior steps. We will first split our data into training and test sets with a 60/40 split, then use the Decision Tree method to make predictions.
set.seed(132)
lis_det_clean_nzv1 <- lis_det_clean_nzv
dt_sub <- sample(nrow(lis_det_clean_nzv1), floor(nrow(lis_det_clean_nzv1) * 0.6))
dt_train <- lis_det_clean_nzv1[dt_sub, ]
dt_test <- lis_det_clean_nzv1[-dt_sub, ]
dt_model <- rpart(is_top_100 ~ ., data = dt_train, method = "class", control = rpart.control(cp = 0.01, minbucket = 5))
fancyRpartPlot(dt_model, caption = "")
printcp(dt_model)
##
## Classification tree:
## rpart(formula = is_top_100 ~ ., data = dt_train, method = "class",
## control = rpart.control(cp = 0.01, minbucket = 5))
##
## Variables actually used in tree construction:
## [1] cancellation_policy cleaning_fee
## [3] extra_people host_is_superhost
## [5] hosting_duration listing_duration
## [7] minimum_nights neighbourhood_group_cleansed
## [9] review_scores_location
##
## Root node error: 64/8649 = 0.0073997
##
## n= 8649
##
## CP nsplit rel error xerror xstd
## 1 0.018229 0 1.00000 1.0000 0.12454
## 2 0.010417 10 0.79688 1.1875 0.13562
## 3 0.010000 13 0.76562 1.1719 0.13473
plotcp(dt_model)
dt_prediction <- predict(dt_model, dt_test, type = "class")
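# Score the held-out test set (not the training data) so the ROC curve reflects out-of-sample performance
dt_pred <- prediction(predict(dt_model, dt_test, type = "prob")[, 2], dt_test$is_top_100)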
dt_conf <- table(dt_test$is_top_100, dt_prediction)
print(dt_conf)
## dt_prediction
## 0 1
## 0 5722 10
## 1 31 4
dt_accuracy <- sum(diag(dt_conf))/sum(dt_conf)
print(dt_accuracy)
## [1] 0.9928906
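# Rows of dt_conf are actual classes, columns are predicted classes.
# Precision = TP / (TP + FP)
dt_precision <- dt_conf[2,2] / (dt_conf[2,2] + dt_conf[1,2])
print(dt_precision)
## [1] 0.2857143
# Recall = TP / (TP + FN)
dt_recall <- dt_conf[2,2] / (dt_conf[2,2] + dt_conf[2,1])
print(dt_recall)
## [1] 0.1142857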
dt_roc <- performance(dt_pred, measure="tpr", x.measure="fpr")
plot(dt_roc, colorize=TRUE)
abline(0, 1, lty = 2)
In the case of the decision tree, the model correctly classifies almost all non Top 100 listings (5722 of 5732 true negatives) but detects only 4 of the 35 actual Top 100 listings. Overall accuracy is over 99%; the ROC plot shows the corresponding trade-off between true and false positive rates.
In general the second model works better, even though it misses most of the true positive cases. This is due to the fact that Top 100 cases are under-represented within the data set. Because it classifies the vast majority of true negatives correctly, overall accuracy stands above 99%.
The first model identifies true positives better but is poor at identifying true negatives.
Both models will require further work and adjustment to make predictions accurate.
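Because Top 100 listings make up well under 1% of the test set, accuracy on its own says little about model quality. As a rough illustration, the sketch below compares the decision tree's accuracy with a baseline that always predicts "not Top 100", and adds balanced accuracy, which weighs both classes equally. The counts come from the decision tree confusion matrix above; the choice of balanced accuracy as a complementary metric is a suggestion, not something used in the original analysis.

```r
# Test-set counts from the decision tree confusion matrix above
# (rows = actual class, columns = predicted class).
dt_conf <- matrix(c(5722, 31, 10, 4), nrow = 2,
                  dimnames = list(actual = c("0", "1"), predicted = c("0", "1")))

accuracy <- sum(diag(dt_conf)) / sum(dt_conf)

# Baseline: always predict the majority class ("0").
baseline_accuracy <- sum(dt_conf["0", ]) / sum(dt_conf)

# Balanced accuracy averages the per-class hit rates, so the rare class
# counts as much as the common one.
sensitivity <- dt_conf["1", "1"] / sum(dt_conf["1", ])
specificity <- dt_conf["0", "0"] / sum(dt_conf["0", ])
balanced_accuracy <- (sensitivity + specificity) / 2

print(round(c(accuracy = accuracy, baseline = baseline_accuracy,
              balanced_accuracy = balanced_accuracy), 4))
```

On these counts the majority-class baseline scores slightly higher than the tree on plain accuracy, which is why precision, recall or balanced accuracy are more informative for comparing the two models.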
Data exploration shed some light on the geographical distribution of the data and the correlation between number of reviews and the following features:
As for geographical distribution, the most popular neighbourhoods for Top 100 listings are Eixample and Ciutat Vella, which does not come as a surprise as they are where the city's best attractions are located, namely Sagrada Familia, Les Rambles and the Gothic Quarter.
However, the data set on its own did not provide enough information to make accurate predictions.
This will require further work and application of more than one method.