knitr::opts_chunk$set(echo = FALSE)
# Load the 'dplyr' library
library(dplyr)
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
library(ggplot2)
# Load the data into hotel_data for further use
#hotel_data <- read.csv(file.choose())
hotel_data <- read.csv('C:/Users/amitg/Documents/workspaceR/data/hotels.csv')
In this section:
1. The data set 'hotel_data' is summarized.
2. The number of rows in hotel_data is found with nrow() and assigned to the variable hotel_data_length.
3. The subsample size (50% of hotel_data_length) is calculated and printed.
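Because the report hides its code (echo = FALSE), the three steps can be sketched as follows. This is a minimal illustration, not the original chunk: the small hotel_data frame is a hypothetical stand-in for the CSV, and the use of floor() for the 50% size is an assumption.

```r
# Hypothetical stand-in for hotel_data so the sketch runs without the CSV file.
hotel_data <- data.frame(
  hotel = c("City Hotel", "City Hotel", "City Hotel", "City Hotel", "Resort Hotel",
            "City Hotel", "City Hotel", "Resort Hotel", "Resort Hotel", "City Hotel"),
  lead_time = c(208, 164, 41, 43, 80, 10, 3, 74, 10, 41),
  stringsAsFactors = FALSE
)

# Step 1: summarize every column of the data set.
print(summary(hotel_data))

# Step 2: number of rows in the data set.
hotel_data_length <- nrow(hotel_data)

# Step 3: subsample size is 50% of the row count (floor() is an assumption
# about how an odd row count would be handled).
subsample_size <- floor(hotel_data_length * 0.5)

cat("Data set size:", hotel_data_length, "\n")
cat("Subsample size:", subsample_size, "\n")
```

With the real data, 119390 rows give a subsample size of 59695, which matches the figures printed in the report.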
## hotel is_canceled lead_time arrival_date_year
## Length:119390 Min. :0.0000 Min. : 0 Min. :2015
## Class :character 1st Qu.:0.0000 1st Qu.: 18 1st Qu.:2016
## Mode :character Median :0.0000 Median : 69 Median :2016
## Mean :0.3704 Mean :104 Mean :2016
## 3rd Qu.:1.0000 3rd Qu.:160 3rd Qu.:2017
## Max. :1.0000 Max. :737 Max. :2017
##
## arrival_date_month arrival_date_week_number arrival_date_day_of_month
## Length:119390 Min. : 1.00 Min. : 1.0
## Class :character 1st Qu.:16.00 1st Qu.: 8.0
## Mode :character Median :28.00 Median :16.0
## Mean :27.17 Mean :15.8
## 3rd Qu.:38.00 3rd Qu.:23.0
## Max. :53.00 Max. :31.0
##
## stays_in_weekend_nights stays_in_week_nights adults
## Min. : 0.0000 Min. : 0.0 Min. : 0.000
## 1st Qu.: 0.0000 1st Qu.: 1.0 1st Qu.: 2.000
## Median : 1.0000 Median : 2.0 Median : 2.000
## Mean : 0.9276 Mean : 2.5 Mean : 1.856
## 3rd Qu.: 2.0000 3rd Qu.: 3.0 3rd Qu.: 2.000
## Max. :19.0000 Max. :50.0 Max. :55.000
##
## children babies meal country
## Min. : 0.0000 Min. : 0.000000 Length:119390 Length:119390
## 1st Qu.: 0.0000 1st Qu.: 0.000000 Class :character Class :character
## Median : 0.0000 Median : 0.000000 Mode :character Mode :character
## Mean : 0.1039 Mean : 0.007949
## 3rd Qu.: 0.0000 3rd Qu.: 0.000000
## Max. :10.0000 Max. :10.000000
## NA's :4
## market_segment distribution_channel is_repeated_guest
## Length:119390 Length:119390 Min. :0.00000
## Class :character Class :character 1st Qu.:0.00000
## Mode :character Mode :character Median :0.00000
## Mean :0.03191
## 3rd Qu.:0.00000
## Max. :1.00000
##
## previous_cancellations previous_bookings_not_canceled reserved_room_type
## Min. : 0.00000 Min. : 0.0000 Length:119390
## 1st Qu.: 0.00000 1st Qu.: 0.0000 Class :character
## Median : 0.00000 Median : 0.0000 Mode :character
## Mean : 0.08712 Mean : 0.1371
## 3rd Qu.: 0.00000 3rd Qu.: 0.0000
## Max. :26.00000 Max. :72.0000
##
## assigned_room_type booking_changes deposit_type agent
## Length:119390 Min. : 0.0000 Length:119390 Length:119390
## Class :character 1st Qu.: 0.0000 Class :character Class :character
## Mode :character Median : 0.0000 Mode :character Mode :character
## Mean : 0.2211
## 3rd Qu.: 0.0000
## Max. :21.0000
##
## company days_in_waiting_list customer_type adr
## Length:119390 Min. : 0.000 Length:119390 Min. : -6.38
## Class :character 1st Qu.: 0.000 Class :character 1st Qu.: 69.29
## Mode :character Median : 0.000 Mode :character Median : 94.58
## Mean : 2.321 Mean : 101.83
## 3rd Qu.: 0.000 3rd Qu.: 126.00
## Max. :391.000 Max. :5400.00
##
## required_car_parking_spaces total_of_special_requests reservation_status
## Min. :0.00000 Min. :0.0000 Length:119390
## 1st Qu.:0.00000 1st Qu.:0.0000 Class :character
## Median :0.00000 Median :0.0000 Mode :character
## Mean :0.06252 Mean :0.5714
## 3rd Qu.:0.00000 3rd Qu.:1.0000
## Max. :8.00000 Max. :5.0000
##
## reservation_status_date
## Length:119390
## Class :character
## Mode :character
##
##
##
##
## Data set size: 119390
## Subsample size: 59695
In this section, 5 subsamples are created, each of size subsample_size
(where subsample_size is 50% of hotel_data_length).
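A minimal sketch of how five subsamples might be drawn with dplyr's sample_n(), which the report names as the sampling function. The stand-in data frame, the seed, and sampling without replacement (the sample_n() default) are assumptions; the report does not state them.

```r
library(dplyr)

# Hypothetical stand-in for hotel_data so the sketch runs without the CSV file.
hotel_data <- data.frame(
  hotel = c("City Hotel", "City Hotel", "City Hotel", "City Hotel", "Resort Hotel",
            "City Hotel", "City Hotel", "Resort Hotel", "Resort Hotel", "City Hotel"),
  lead_time = c(208, 164, 41, 43, 80, 10, 3, 74, 10, 41),
  stringsAsFactors = FALSE
)
subsample_size <- floor(nrow(hotel_data) * 0.5)

# Assumption: a fixed seed, only to make this sketch reproducible.
set.seed(1)

# Each call draws subsample_size rows at random, without replacement.
hotel_data_subsample_df_1 <- sample_n(hotel_data, subsample_size)
hotel_data_subsample_df_2 <- sample_n(hotel_data, subsample_size)
hotel_data_subsample_df_3 <- sample_n(hotel_data, subsample_size)
hotel_data_subsample_df_4 <- sample_n(hotel_data, subsample_size)
hotel_data_subsample_df_5 <- sample_n(hotel_data, subsample_size)
```

Each subsample is drawn independently from the full data set, so the same booking can appear in more than one subsample.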
Scrutinize #1
In this section, the 1st subsample, hotel_data_subsample_df_1, is
scrutinized.
1. Print the length (number of rows) of subsample hotel_data_subsample_df_1.
2. Print the first five rows (for a few columns) of the subsample.
3. Print the last five rows (for a few columns) of the subsample.
Note: Comparing the output of steps 2 and 3 helps verify data consistency.
4. Print the internal structure of the subsample.
5. Summarize the subsample.
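The five scrutiny steps can be sketched in base R as follows. The stand-in data frame reuses the ten rows visible in the head and tail output of this section; the choice of nrow(), head(), tail(), str() and summary() is an assumption inferred from the shape of the printed output.

```r
# Hypothetical stand-in for hotel_data_subsample_df_1 (ten rows only).
hotel_data_subsample_df_1 <- data.frame(
  hotel = c("City Hotel", "City Hotel", "City Hotel", "City Hotel", "Resort Hotel",
            "City Hotel", "City Hotel", "Resort Hotel", "Resort Hotel", "City Hotel"),
  lead_time = c(208, 164, 41, 43, 80, 10, 3, 74, 10, 41),
  meal = c("BB", "BB", "HB", "HB", "HB", "SC", "BB", "BB", "BB", "HB"),
  market_segment = c("Online TA", "Groups", "Offline TA/TO", "Groups", "Offline TA/TO",
                     "Online TA", "Corporate", "Online TA", "Online TA", "Offline TA/TO"),
  distribution_channel = c("TA/TO", "TA/TO", "TA/TO", "TA/TO", "TA/TO",
                           "TA/TO", "Corporate", "TA/TO", "TA/TO", "TA/TO"),
  country = c("BEL", "PRT", "ITA", "PRT", "PRT", "NLD", "PRT", "PRT", "FRA", "ITA"),
  stringsAsFactors = FALSE
)

# Columns shown in the head/tail output of the report.
selected_columns <- c("hotel", "lead_time", "meal", "market_segment",
                      "distribution_channel", "country")

# 1. Length (number of rows) of the subsample.
print(nrow(hotel_data_subsample_df_1))

# 2. First five rows, selected columns only.
first_rows <- head(hotel_data_subsample_df_1[, selected_columns], 5)
print(first_rows)

# 3. Last five rows, selected columns only.
last_rows <- tail(hotel_data_subsample_df_1[, selected_columns], 5)
print(last_rows)

# 4. Internal structure of the subsample.
str(hotel_data_subsample_df_1)

# 5. Summary statistics for every column.
print(summary(hotel_data_subsample_df_1))
```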
## [1] 59695
## hotel lead_time meal market_segment distribution_channel country
## 1 City Hotel 208 BB Online TA TA/TO BEL
## 2 City Hotel 164 BB Groups TA/TO PRT
## 3 City Hotel 41 HB Offline TA/TO TA/TO ITA
## 4 City Hotel 43 HB Groups TA/TO PRT
## 5 Resort Hotel 80 HB Offline TA/TO TA/TO PRT
## hotel lead_time meal market_segment distribution_channel country
## 59691 City Hotel 10 SC Online TA TA/TO NLD
## 59692 City Hotel 3 BB Corporate Corporate PRT
## 59693 Resort Hotel 74 BB Online TA TA/TO PRT
## 59694 Resort Hotel 10 BB Online TA TA/TO FRA
## 59695 City Hotel 41 HB Offline TA/TO TA/TO ITA
## 'data.frame': 59695 obs. of 32 variables:
## $ hotel : chr "City Hotel" "City Hotel" "City Hotel" "City Hotel" ...
## $ is_canceled : int 0 1 0 1 0 1 0 0 0 1 ...
## $ lead_time : int 208 164 41 43 80 341 54 72 116 247 ...
## $ arrival_date_year : int 2017 2017 2015 2015 2015 2015 2016 2016 2016 2015 ...
## $ arrival_date_month : chr "June" "May" "September" "July" ...
## $ arrival_date_week_number : int 22 20 36 27 34 39 46 12 20 41 ...
## $ arrival_date_day_of_month : int 1 15 4 3 17 23 11 16 12 9 ...
## $ stays_in_weekend_nights : int 1 1 0 0 2 0 2 0 2 1 ...
## $ stays_in_week_nights : int 3 2 1 2 5 2 3 3 3 2 ...
## $ adults : int 3 1 2 1 2 2 2 2 2 1 ...
## $ children : int 0 0 0 0 0 0 0 0 0 0 ...
## $ babies : int 0 0 0 0 0 0 0 0 0 0 ...
## $ meal : chr "BB" "BB" "HB" "HB" ...
## $ country : chr "BEL" "PRT" "ITA" "PRT" ...
## $ market_segment : chr "Online TA" "Groups" "Offline TA/TO" "Groups" ...
## $ distribution_channel : chr "TA/TO" "TA/TO" "TA/TO" "TA/TO" ...
## $ is_repeated_guest : int 0 0 0 0 0 0 0 0 0 0 ...
## $ previous_cancellations : int 0 0 0 0 0 1 0 0 0 1 ...
## $ previous_bookings_not_canceled: int 0 0 0 0 0 0 0 0 0 0 ...
## $ reserved_room_type : chr "D" "A" "A" "A" ...
## $ assigned_room_type : chr "D" "A" "A" "A" ...
## $ booking_changes : int 0 0 0 0 0 0 0 0 2 0 ...
## $ deposit_type : chr "No Deposit" "Non Refund" "No Deposit" "No Deposit" ...
## $ agent : chr "7" "NULL" "39" "1" ...
## $ company : chr "NULL" "NULL" "NULL" "NULL" ...
## $ days_in_waiting_list : int 0 0 38 0 0 0 0 0 0 0 ...
## $ customer_type : chr "Transient" "Transient" "Transient-Party" "Transient-Party" ...
## $ adr : num 128 160 110 63 133 ...
## $ required_car_parking_spaces : int 0 0 0 0 0 0 0 0 0 0 ...
## $ total_of_special_requests : int 1 0 0 0 0 0 1 1 1 0 ...
## $ reservation_status : chr "Check-Out" "Canceled" "Check-Out" "Canceled" ...
## $ reservation_status_date : chr "2017-06-05" "2017-01-31" "2015-09-05" "2015-06-16" ...
## hotel is_canceled lead_time arrival_date_year
## Length:59695 Min. :0.0000 Min. : 0.0 Min. :2015
## Class :character 1st Qu.:0.0000 1st Qu.: 18.0 1st Qu.:2016
## Mode :character Median :0.0000 Median : 69.0 Median :2016
## Mean :0.3694 Mean :104.2 Mean :2016
## 3rd Qu.:1.0000 3rd Qu.:160.0 3rd Qu.:2017
## Max. :1.0000 Max. :737.0 Max. :2017
## arrival_date_month arrival_date_week_number arrival_date_day_of_month
## Length:59695 Min. : 1.00 Min. : 1.00
## Class :character 1st Qu.:16.00 1st Qu.: 8.00
## Mode :character Median :28.00 Median :16.00
## Mean :27.23 Mean :15.78
## 3rd Qu.:38.00 3rd Qu.:23.00
## Max. :53.00 Max. :31.00
## stays_in_weekend_nights stays_in_week_nights adults children
## Min. : 0.000 Min. : 0.000 Min. : 0.000 Min. :0.0000
## 1st Qu.: 0.000 1st Qu.: 1.000 1st Qu.: 2.000 1st Qu.:0.0000
## Median : 1.000 Median : 2.000 Median : 2.000 Median :0.0000
## Mean : 0.927 Mean : 2.494 Mean : 1.857 Mean :0.1042
## 3rd Qu.: 2.000 3rd Qu.: 3.000 3rd Qu.: 2.000 3rd Qu.:0.0000
## Max. :16.000 Max. :41.000 Max. :50.000 Max. :3.0000
## babies meal country market_segment
## Min. :0.000000 Length:59695 Length:59695 Length:59695
## 1st Qu.:0.000000 Class :character Class :character Class :character
## Median :0.000000 Mode :character Mode :character Mode :character
## Mean :0.007907
## 3rd Qu.:0.000000
## Max. :2.000000
## distribution_channel is_repeated_guest previous_cancellations
## Length:59695 Min. :0.00000 Min. : 0.0000
## Class :character 1st Qu.:0.00000 1st Qu.: 0.0000
## Mode :character Median :0.00000 Median : 0.0000
## Mean :0.03195 Mean : 0.0883
## 3rd Qu.:0.00000 3rd Qu.: 0.0000
## Max. :1.00000 Max. :26.0000
## previous_bookings_not_canceled reserved_room_type assigned_room_type
## Min. : 0.0000 Length:59695 Length:59695
## 1st Qu.: 0.0000 Class :character Class :character
## Median : 0.0000 Mode :character Mode :character
## Mean : 0.1362
## 3rd Qu.: 0.0000
## Max. :72.0000
## booking_changes deposit_type agent company
## Min. : 0.0000 Length:59695 Length:59695 Length:59695
## 1st Qu.: 0.0000 Class :character Class :character Class :character
## Median : 0.0000 Mode :character Mode :character Mode :character
## Mean : 0.2227
## 3rd Qu.: 0.0000
## Max. :20.0000
## days_in_waiting_list customer_type adr
## Min. : 0.00 Length:59695 Min. : 0.00
## 1st Qu.: 0.00 Class :character 1st Qu.: 69.00
## Median : 0.00 Mode :character Median : 94.78
## Mean : 2.26 Mean :101.95
## 3rd Qu.: 0.00 3rd Qu.:126.00
## Max. :391.00 Max. :510.00
## required_car_parking_spaces total_of_special_requests reservation_status
## Min. :0.00000 Min. :0.0000 Length:59695
## 1st Qu.:0.00000 1st Qu.:0.0000 Class :character
## Median :0.00000 Median :0.0000 Mode :character
## Mean :0.06331 Mean :0.5731
## 3rd Qu.:0.00000 3rd Qu.:1.0000
## Max. :8.00000 Max. :5.0000
## reservation_status_date
## Length:59695
## Class :character
## Mode :character
##
##
##
Scrutinize #2
In this section, the 2nd subsample, hotel_data_subsample_df_2, is
scrutinized.
1. Print the length (number of rows) of subsample hotel_data_subsample_df_2.
2. Print the first five rows (for a few columns) of the subsample.
3. Print the last five rows (for a few columns) of the subsample.
Note: Comparing the output of steps 2 and 3 helps verify data consistency.
4. Print the internal structure of the subsample.
5. Summarize the subsample.
## [1] 59695
## hotel lead_time meal market_segment distribution_channel country
## 1 City Hotel 208 BB Online TA TA/TO BEL
## 2 City Hotel 164 BB Groups TA/TO PRT
## 3 City Hotel 41 HB Offline TA/TO TA/TO ITA
## 4 City Hotel 43 HB Groups TA/TO PRT
## 5 Resort Hotel 80 HB Offline TA/TO TA/TO PRT
## hotel lead_time meal market_segment distribution_channel country
## 59691 City Hotel 10 SC Online TA TA/TO NLD
## 59692 City Hotel 3 BB Corporate Corporate PRT
## 59693 Resort Hotel 74 BB Online TA TA/TO PRT
## 59694 Resort Hotel 10 BB Online TA TA/TO FRA
## 59695 City Hotel 41 HB Offline TA/TO TA/TO ITA
## 'data.frame': 59695 obs. of 32 variables:
## $ hotel : chr "City Hotel" "Resort Hotel" "Resort Hotel" "City Hotel" ...
## $ is_canceled : int 1 0 0 0 0 0 1 0 0 0 ...
## $ lead_time : int 25 100 16 45 11 11 62 11 62 10 ...
## $ arrival_date_year : int 2016 2016 2016 2016 2017 2017 2015 2015 2017 2016 ...
## $ arrival_date_month : chr "March" "July" "May" "January" ...
## $ arrival_date_week_number : int 14 29 22 4 29 12 30 31 17 23 ...
## $ arrival_date_day_of_month : int 29 15 22 23 21 21 19 1 24 31 ...
## $ stays_in_weekend_nights : int 0 0 2 1 2 0 2 4 1 0 ...
## $ stays_in_week_nights : int 4 1 1 1 3 3 5 10 3 3 ...
## $ adults : int 2 2 1 2 2 2 2 1 2 2 ...
## $ children : int 2 2 0 0 0 0 0 0 0 0 ...
## $ babies : int 0 0 0 0 0 0 0 0 0 0 ...
## $ meal : chr "BB" "BB" "BB" "BB" ...
## $ country : chr "GBR" "NLD" "PRT" "BEL" ...
## $ market_segment : chr "Online TA" "Online TA" "Direct" "Online TA" ...
## $ distribution_channel : chr "TA/TO" "TA/TO" "Corporate" "TA/TO" ...
## $ is_repeated_guest : int 0 0 0 0 0 0 0 0 0 0 ...
## $ previous_cancellations : int 0 0 1 0 0 0 0 0 0 0 ...
## $ previous_bookings_not_canceled: int 0 0 5 0 0 0 0 0 0 0 ...
## $ reserved_room_type : chr "F" "G" "A" "A" ...
## $ assigned_room_type : chr "F" "G" "A" "A" ...
## $ booking_changes : int 0 3 1 0 0 0 0 0 0 0 ...
## $ deposit_type : chr "No Deposit" "No Deposit" "No Deposit" "No Deposit" ...
## $ agent : chr "9" "240" "NULL" "9" ...
## $ company : chr "NULL" "NULL" "292" "NULL" ...
## $ days_in_waiting_list : int 0 0 0 0 0 0 0 0 0 0 ...
## $ customer_type : chr "Transient" "Transient-Party" "Transient" "Transient-Party" ...
## $ adr : num 216 226 70 80.3 123.2 ...
## $ required_car_parking_spaces : int 0 0 0 0 0 0 0 0 0 0 ...
## $ total_of_special_requests : int 0 1 0 0 1 0 1 0 1 1 ...
## $ reservation_status : chr "Canceled" "Check-Out" "Check-Out" "Check-Out" ...
## $ reservation_status_date : chr "2016-03-26" "2016-07-16" "2016-05-25" "2016-01-25" ...
## hotel is_canceled lead_time arrival_date_year
## Length:59695 Min. :0.0000 Min. : 0.0 Min. :2015
## Class :character 1st Qu.:0.0000 1st Qu.: 18.0 1st Qu.:2016
## Mode :character Median :0.0000 Median : 69.0 Median :2016
## Mean :0.3709 Mean :104.4 Mean :2016
## 3rd Qu.:1.0000 3rd Qu.:162.0 3rd Qu.:2017
## Max. :1.0000 Max. :709.0 Max. :2017
##
## arrival_date_month arrival_date_week_number arrival_date_day_of_month
## Length:59695 Min. : 1.00 Min. : 1.00
## Class :character 1st Qu.:16.00 1st Qu.: 8.00
## Mode :character Median :28.00 Median :16.00
## Mean :27.22 Mean :15.82
## 3rd Qu.:38.00 3rd Qu.:23.00
## Max. :53.00 Max. :31.00
##
## stays_in_weekend_nights stays_in_week_nights adults
## Min. : 0.0000 Min. : 0.000 Min. : 0.000
## 1st Qu.: 0.0000 1st Qu.: 1.000 1st Qu.: 2.000
## Median : 1.0000 Median : 2.000 Median : 2.000
## Mean : 0.9279 Mean : 2.498 Mean : 1.859
## 3rd Qu.: 2.0000 3rd Qu.: 3.000 3rd Qu.: 2.000
## Max. :19.0000 Max. :50.000 Max. :55.000
##
## children babies meal country
## Min. : 0.0000 Min. : 0.000000 Length:59695 Length:59695
## 1st Qu.: 0.0000 1st Qu.: 0.000000 Class :character Class :character
## Median : 0.0000 Median : 0.000000 Mode :character Mode :character
## Mean : 0.1044 Mean : 0.008393
## 3rd Qu.: 0.0000 3rd Qu.: 0.000000
## Max. :10.0000 Max. :10.000000
## NA's :3
## market_segment distribution_channel is_repeated_guest
## Length:59695 Length:59695 Min. :0.00000
## Class :character Class :character 1st Qu.:0.00000
## Mode :character Mode :character Median :0.00000
## Mean :0.03198
## 3rd Qu.:0.00000
## Max. :1.00000
##
## previous_cancellations previous_bookings_not_canceled reserved_room_type
## Min. : 0.00000 Min. : 0.0000 Length:59695
## 1st Qu.: 0.00000 1st Qu.: 0.0000 Class :character
## Median : 0.00000 Median : 0.0000 Mode :character
## Mean : 0.08585 Mean : 0.1308
## 3rd Qu.: 0.00000 3rd Qu.: 0.0000
## Max. :26.00000 Max. :68.0000
##
## assigned_room_type booking_changes deposit_type agent
## Length:59695 Min. : 0.000 Length:59695 Length:59695
## Class :character 1st Qu.: 0.000 Class :character Class :character
## Mode :character Median : 0.000 Mode :character Mode :character
## Mean : 0.221
## 3rd Qu.: 0.000
## Max. :17.000
##
## company days_in_waiting_list customer_type adr
## Length:59695 Min. : 0.000 Length:59695 Min. : -6.38
## Class :character 1st Qu.: 0.000 Class :character 1st Qu.: 70.00
## Mode :character Median : 0.000 Mode :character Median : 95.00
## Mean : 2.403 Mean :101.97
## 3rd Qu.: 0.000 3rd Qu.:126.00
## Max. :391.000 Max. :450.00
##
## required_car_parking_spaces total_of_special_requests reservation_status
## Min. :0.0000 Min. :0.0000 Length:59695
## 1st Qu.:0.0000 1st Qu.:0.0000 Class :character
## Median :0.0000 Median :0.0000 Mode :character
## Mean :0.0628 Mean :0.5766
## 3rd Qu.:0.0000 3rd Qu.:1.0000
## Max. :8.0000 Max. :5.0000
##
## reservation_status_date
## Length:59695
## Class :character
## Mode :character
##
##
##
##
Scrutinize #3
In this section, the 3rd subsample, hotel_data_subsample_df_3, is
scrutinized.
1. Print the length (number of rows) of subsample hotel_data_subsample_df_3.
2. Print the first five rows (for a few columns) of the subsample.
3. Print the last five rows (for a few columns) of the subsample.
Note: Comparing the output of steps 2 and 3 helps verify data consistency.
4. Print the internal structure of the subsample.
5. Summarize the subsample.
## [1] 59695
## hotel lead_time meal market_segment distribution_channel country
## 1 City Hotel 208 BB Online TA TA/TO BEL
## 2 City Hotel 164 BB Groups TA/TO PRT
## 3 City Hotel 41 HB Offline TA/TO TA/TO ITA
## 4 City Hotel 43 HB Groups TA/TO PRT
## 5 Resort Hotel 80 HB Offline TA/TO TA/TO PRT
## hotel lead_time meal market_segment distribution_channel country
## 59691 City Hotel 10 SC Online TA TA/TO NLD
## 59692 City Hotel 3 BB Corporate Corporate PRT
## 59693 Resort Hotel 74 BB Online TA TA/TO PRT
## 59694 Resort Hotel 10 BB Online TA TA/TO FRA
## 59695 City Hotel 41 HB Offline TA/TO TA/TO ITA
## 'data.frame': 59695 obs. of 32 variables:
## $ hotel : chr "City Hotel" "City Hotel" "City Hotel" "Resort Hotel" ...
## $ is_canceled : int 0 1 1 0 1 0 0 0 0 0 ...
## $ lead_time : int 115 265 478 152 15 181 4 77 38 58 ...
## $ arrival_date_year : int 2017 2015 2017 2016 2016 2015 2016 2017 2015 2017 ...
## $ arrival_date_month : chr "August" "July" "August" "May" ...
## $ arrival_date_week_number : int 34 28 32 19 53 32 7 19 35 30 ...
## $ arrival_date_day_of_month : int 20 9 8 2 27 6 12 11 25 28 ...
## $ stays_in_weekend_nights : int 2 0 0 1 0 0 1 0 2 2 ...
## $ stays_in_week_nights : int 1 2 4 2 1 2 2 3 6 3 ...
## $ adults : int 1 2 1 2 2 2 2 2 2 2 ...
## $ children : int 0 0 0 0 0 0 2 0 1 0 ...
## $ babies : int 0 0 0 0 0 0 0 0 0 0 ...
## $ meal : chr "BB" "BB" "BB" "BB" ...
## $ country : chr "GBR" "PRT" "PRT" "ARG" ...
## $ market_segment : chr "Offline TA/TO" "Groups" "Offline TA/TO" "Offline TA/TO" ...
## $ distribution_channel : chr "TA/TO" "TA/TO" "TA/TO" "TA/TO" ...
## $ is_repeated_guest : int 0 0 0 0 0 0 0 0 0 0 ...
## $ previous_cancellations : int 0 1 0 0 0 0 0 0 0 0 ...
## $ previous_bookings_not_canceled: int 0 0 0 0 0 0 0 0 0 0 ...
## $ reserved_room_type : chr "D" "A" "A" "A" ...
## $ assigned_room_type : chr "D" "A" "C" "A" ...
## $ booking_changes : int 1 0 2 0 0 0 0 0 2 0 ...
## $ deposit_type : chr "No Deposit" "No Deposit" "No Deposit" "No Deposit" ...
## $ agent : chr "98" "1" "229" "336" ...
## $ company : chr "NULL" "NULL" "NULL" "NULL" ...
## $ days_in_waiting_list : int 0 0 0 0 0 0 0 0 0 0 ...
## $ customer_type : chr "Transient-Party" "Contract" "Transient-Party" "Transient" ...
## $ adr : num 125 62 88.5 47.4 65 ...
## $ required_car_parking_spaces : int 0 0 0 1 0 0 0 0 0 0 ...
## $ total_of_special_requests : int 0 0 0 1 0 2 1 0 1 2 ...
## $ reservation_status : chr "Check-Out" "Canceled" "Canceled" "Check-Out" ...
## $ reservation_status_date : chr "2017-08-23" "2015-01-01" "2017-07-28" "2016-05-05" ...
## hotel is_canceled lead_time arrival_date_year
## Length:59695 Min. :0.0000 Min. : 0.0 Min. :2015
## Class :character 1st Qu.:0.0000 1st Qu.: 18.0 1st Qu.:2016
## Mode :character Median :0.0000 Median : 69.0 Median :2016
## Mean :0.3686 Mean :103.1 Mean :2016
## 3rd Qu.:1.0000 3rd Qu.:159.0 3rd Qu.:2017
## Max. :1.0000 Max. :737.0 Max. :2017
##
## arrival_date_month arrival_date_week_number arrival_date_day_of_month
## Length:59695 Min. : 1.00 Min. : 1.00
## Class :character 1st Qu.:16.00 1st Qu.: 8.00
## Mode :character Median :28.00 Median :16.00
## Mean :27.18 Mean :15.81
## 3rd Qu.:38.00 3rd Qu.:23.00
## Max. :53.00 Max. :31.00
##
## stays_in_weekend_nights stays_in_week_nights adults children
## Min. : 0.0000 Min. : 0.000 Min. : 0.000 Min. :0.000
## 1st Qu.: 0.0000 1st Qu.: 1.000 1st Qu.: 2.000 1st Qu.:0.000
## Median : 1.0000 Median : 2.000 Median : 2.000 Median :0.000
## Mean : 0.9267 Mean : 2.483 Mean : 1.854 Mean :0.103
## 3rd Qu.: 2.0000 3rd Qu.: 3.000 3rd Qu.: 2.000 3rd Qu.:0.000
## Max. :19.0000 Max. :50.000 Max. :55.000 Max. :3.000
## NA's :2
## babies meal country market_segment
## Min. : 0.000000 Length:59695 Length:59695 Length:59695
## 1st Qu.: 0.000000 Class :character Class :character Class :character
## Median : 0.000000 Mode :character Mode :character Mode :character
## Mean : 0.008259
## 3rd Qu.: 0.000000
## Max. :10.000000
##
## distribution_channel is_repeated_guest previous_cancellations
## Length:59695 Min. :0.00000 Min. : 0.00000
## Class :character 1st Qu.:0.00000 1st Qu.: 0.00000
## Mode :character Median :0.00000 Median : 0.00000
## Mean :0.03319 Mean : 0.08972
## 3rd Qu.:0.00000 3rd Qu.: 0.00000
## Max. :1.00000 Max. :26.00000
##
## previous_bookings_not_canceled reserved_room_type assigned_room_type
## Min. : 0.0000 Length:59695 Length:59695
## 1st Qu.: 0.0000 Class :character Class :character
## Median : 0.0000 Mode :character Mode :character
## Mean : 0.1412
## 3rd Qu.: 0.0000
## Max. :70.0000
##
## booking_changes deposit_type agent company
## Min. : 0.000 Length:59695 Length:59695 Length:59695
## 1st Qu.: 0.000 Class :character Class :character Class :character
## Median : 0.000 Mode :character Mode :character Mode :character
## Mean : 0.223
## 3rd Qu.: 0.000
## Max. :18.000
##
## days_in_waiting_list customer_type adr
## Min. : 0.000 Length:59695 Min. : -6.38
## 1st Qu.: 0.000 Class :character 1st Qu.: 68.46
## Median : 0.000 Mode :character Median : 94.35
## Mean : 2.317 Mean :101.27
## 3rd Qu.: 0.000 3rd Qu.:125.10
## Max. :391.000 Max. :402.00
##
## required_car_parking_spaces total_of_special_requests reservation_status
## Min. :0.00000 Min. :0.0000 Length:59695
## 1st Qu.:0.00000 1st Qu.:0.0000 Class :character
## Median :0.00000 Median :0.0000 Mode :character
## Mean :0.06198 Mean :0.5695
## 3rd Qu.:0.00000 3rd Qu.:1.0000
## Max. :2.00000 Max. :5.0000
##
## reservation_status_date
## Length:59695
## Class :character
## Mode :character
##
##
##
##
Scrutinize #4
In this section, the 4th subsample, hotel_data_subsample_df_4, is
scrutinized.
1. Print the length (number of rows) of subsample hotel_data_subsample_df_4.
2. Print the first five rows (for a few columns) of the subsample.
3. Print the last five rows (for a few columns) of the subsample.
Note: Comparing the output of steps 2 and 3 helps verify data consistency.
4. Print the internal structure of the subsample.
5. Summarize the subsample.
## [1] 59695
## hotel lead_time meal market_segment distribution_channel country
## 1 City Hotel 208 BB Online TA TA/TO BEL
## 2 City Hotel 164 BB Groups TA/TO PRT
## 3 City Hotel 41 HB Offline TA/TO TA/TO ITA
## 4 City Hotel 43 HB Groups TA/TO PRT
## 5 Resort Hotel 80 HB Offline TA/TO TA/TO PRT
## hotel lead_time meal market_segment distribution_channel country
## 59691 City Hotel 10 SC Online TA TA/TO NLD
## 59692 City Hotel 3 BB Corporate Corporate PRT
## 59693 Resort Hotel 74 BB Online TA TA/TO PRT
## 59694 Resort Hotel 10 BB Online TA TA/TO FRA
## 59695 City Hotel 41 HB Offline TA/TO TA/TO ITA
## 'data.frame': 59695 obs. of 32 variables:
## $ hotel : chr "City Hotel" "Resort Hotel" "City Hotel" "City Hotel" ...
## $ is_canceled : int 0 0 1 1 0 0 0 1 1 0 ...
## $ lead_time : int 116 1 102 73 3 54 33 5 226 48 ...
## $ arrival_date_year : int 2017 2016 2016 2017 2016 2015 2015 2016 2016 2016 ...
## $ arrival_date_month : chr "April" "May" "November" "April" ...
## $ arrival_date_week_number : int 14 20 47 14 14 52 41 38 36 35 ...
## $ arrival_date_day_of_month : int 2 9 17 3 29 22 8 12 1 24 ...
## $ stays_in_weekend_nights : int 1 0 2 1 0 0 0 1 2 0 ...
## $ stays_in_week_nights : int 0 0 3 3 2 5 3 1 5 4 ...
## $ adults : int 2 1 2 2 2 3 2 1 2 3 ...
## $ children : int 0 0 0 0 0 0 0 0 2 0 ...
## $ babies : int 0 0 0 0 0 0 0 0 0 0 ...
## $ meal : chr "SC" "BB" "BB" "BB" ...
## $ country : chr "FRA" "GBR" "GBR" "PRT" ...
## $ market_segment : chr "Online TA" "Direct" "Online TA" "Groups" ...
## $ distribution_channel : chr "TA/TO" "Direct" "TA/TO" "TA/TO" ...
## $ is_repeated_guest : int 0 0 0 0 0 0 0 1 0 0 ...
## $ previous_cancellations : int 0 0 0 0 0 0 0 2 0 0 ...
## $ previous_bookings_not_canceled: int 0 0 0 0 0 0 0 1 0 0 ...
## $ reserved_room_type : chr "A" "G" "A" "A" ...
## $ assigned_room_type : chr "A" "G" "A" "A" ...
## $ booking_changes : int 0 2 0 0 0 0 2 0 0 0 ...
## $ deposit_type : chr "No Deposit" "No Deposit" "No Deposit" "Non Refund" ...
## $ agent : chr "9" "NULL" "9" "20" ...
## $ company : chr "NULL" "NULL" "NULL" "NULL" ...
## $ days_in_waiting_list : int 0 0 0 0 0 0 0 0 0 0 ...
## $ customer_type : chr "Transient" "Transient" "Transient" "Transient" ...
## $ adr : num 99 0 85 105 58 ...
## $ required_car_parking_spaces : int 0 0 0 0 0 1 1 0 0 0 ...
## $ total_of_special_requests : int 1 0 2 0 0 1 0 0 0 1 ...
## $ reservation_status : chr "Check-Out" "Check-Out" "Canceled" "Canceled" ...
## $ reservation_status_date : chr "2017-04-03" "2016-05-09" "2016-09-21" "2017-01-20" ...
## hotel is_canceled lead_time arrival_date_year
## Length:59695 Min. :0.00 Min. : 0.0 Min. :2015
## Class :character 1st Qu.:0.00 1st Qu.: 18.0 1st Qu.:2016
## Mode :character Median :0.00 Median : 68.0 Median :2016
## Mean :0.37 Mean :103.9 Mean :2016
## 3rd Qu.:1.00 3rd Qu.:160.0 3rd Qu.:2017
## Max. :1.00 Max. :737.0 Max. :2017
##
## arrival_date_month arrival_date_week_number arrival_date_day_of_month
## Length:59695 Min. : 1.00 Min. : 1.0
## Class :character 1st Qu.:16.00 1st Qu.: 8.0
## Mode :character Median :28.00 Median :16.0
## Mean :27.16 Mean :15.8
## 3rd Qu.:38.00 3rd Qu.:23.0
## Max. :53.00 Max. :31.0
##
## stays_in_weekend_nights stays_in_week_nights adults
## Min. : 0.0000 Min. : 0.000 Min. : 0.000
## 1st Qu.: 0.0000 1st Qu.: 1.000 1st Qu.: 2.000
## Median : 1.0000 Median : 2.000 Median : 2.000
## Mean : 0.9229 Mean : 2.503 Mean : 1.858
## 3rd Qu.: 2.0000 3rd Qu.: 3.000 3rd Qu.: 2.000
## Max. :18.0000 Max. :42.000 Max. :50.000
##
## children babies meal country
## Min. : 0.0000 Min. : 0.000000 Length:59695 Length:59695
## 1st Qu.: 0.0000 1st Qu.: 0.000000 Class :character Class :character
## Median : 0.0000 Median : 0.000000 Mode :character Mode :character
## Mean : 0.1034 Mean : 0.008945
## 3rd Qu.: 0.0000 3rd Qu.: 0.000000
## Max. :10.0000 Max. :10.000000
## NA's :2
## market_segment distribution_channel is_repeated_guest
## Length:59695 Length:59695 Min. :0.00000
## Class :character Class :character 1st Qu.:0.00000
## Mode :character Mode :character Median :0.00000
## Mean :0.03265
## 3rd Qu.:0.00000
## Max. :1.00000
##
## previous_cancellations previous_bookings_not_canceled reserved_room_type
## Min. : 0.00000 Min. : 0.0000 Length:59695
## 1st Qu.: 0.00000 1st Qu.: 0.0000 Class :character
## Median : 0.00000 Median : 0.0000 Mode :character
## Mean : 0.09021 Mean : 0.1407
## 3rd Qu.: 0.00000 3rd Qu.: 0.0000
## Max. :26.00000 Max. :71.0000
##
## assigned_room_type booking_changes deposit_type agent
## Length:59695 Min. : 0.0000 Length:59695 Length:59695
## Class :character 1st Qu.: 0.0000 Class :character Class :character
## Mode :character Median : 0.0000 Mode :character Mode :character
## Mean : 0.2207
## 3rd Qu.: 0.0000
## Max. :17.0000
##
## company days_in_waiting_list customer_type adr
## Length:59695 Min. : 0.000 Length:59695 Min. : -6.38
## Class :character 1st Qu.: 0.000 Class :character 1st Qu.: 68.42
## Mode :character Median : 0.000 Mode :character Median : 94.50
## Mean : 2.343 Mean : 101.78
## 3rd Qu.: 0.000 3rd Qu.: 126.00
## Max. :391.000 Max. :5400.00
##
## required_car_parking_spaces total_of_special_requests reservation_status
## Min. :0.00000 Min. :0.0000 Length:59695
## 1st Qu.:0.00000 1st Qu.:0.0000 Class :character
## Median :0.00000 Median :0.0000 Mode :character
## Mean :0.06166 Mean :0.5683
## 3rd Qu.:0.00000 3rd Qu.:1.0000
## Max. :3.00000 Max. :5.0000
##
## reservation_status_date
## Length:59695
## Class :character
## Mode :character
##
##
##
##
Scrutinize #5
In this section, the 5th subsample, hotel_data_subsample_df_5, is
scrutinized.
1. Print the length (number of rows) of subsample hotel_data_subsample_df_5.
2. Print the first five rows (for a few columns) of the subsample.
3. Print the last five rows (for a few columns) of the subsample.
Note: Comparing the output of steps 2 and 3 helps verify data consistency.
4. Print the internal structure of the subsample.
5. Summarize the subsample.
## [1] 59695
## hotel meal market_segment distribution_channel country
## 1 City Hotel BB Online TA TA/TO BEL
## 2 City Hotel BB Groups TA/TO PRT
## 3 City Hotel HB Offline TA/TO TA/TO ITA
## 4 City Hotel HB Groups TA/TO PRT
## 5 Resort Hotel HB Offline TA/TO TA/TO PRT
## hotel meal market_segment distribution_channel country
## 59691 City Hotel SC Online TA TA/TO NLD
## 59692 City Hotel BB Corporate Corporate PRT
## 59693 Resort Hotel BB Online TA TA/TO PRT
## 59694 Resort Hotel BB Online TA TA/TO FRA
## 59695 City Hotel HB Offline TA/TO TA/TO ITA
## 'data.frame': 59695 obs. of 32 variables:
## $ hotel : chr "City Hotel" "City Hotel" "Resort Hotel" "City Hotel" ...
## $ is_canceled : int 0 0 1 1 1 1 1 0 1 0 ...
## $ lead_time : int 0 160 168 15 80 75 80 6 39 4 ...
## $ arrival_date_year : int 2015 2017 2016 2017 2015 2016 2015 2015 2016 2017 ...
## $ arrival_date_month : chr "December" "June" "April" "August" ...
## $ arrival_date_week_number : int 51 25 16 32 40 45 45 34 27 10 ...
## $ arrival_date_day_of_month : int 16 22 12 7 28 30 2 18 27 6 ...
## $ stays_in_weekend_nights : int 0 0 0 1 1 2 1 0 1 1 ...
## $ stays_in_week_nights : int 1 3 2 3 3 0 1 3 0 0 ...
## $ adults : int 1 2 2 2 2 2 2 2 2 1 ...
## $ children : int 0 0 0 0 0 0 0 0 0 0 ...
## $ babies : int 0 0 0 0 0 0 0 0 0 0 ...
## $ meal : chr "BB" "SC" "HB" "BB" ...
## $ country : chr "ESP" "BEL" "PRT" "NLD" ...
## $ market_segment : chr "Direct" "Online TA" "Groups" "Direct" ...
## $ distribution_channel : chr "Direct" "TA/TO" "TA/TO" "Direct" ...
## $ is_repeated_guest : int 0 0 0 0 0 0 0 0 0 0 ...
## $ previous_cancellations : int 0 0 0 0 0 0 0 0 0 0 ...
## $ previous_bookings_not_canceled: int 0 0 0 0 0 0 0 0 0 0 ...
## $ reserved_room_type : chr "A" "A" "A" "A" ...
## $ assigned_room_type : chr "D" "A" "A" "A" ...
## $ booking_changes : int 0 0 0 0 0 0 0 0 0 0 ...
## $ deposit_type : chr "No Deposit" "No Deposit" "Non Refund" "No Deposit" ...
## $ agent : chr "NULL" "9" "245" "14" ...
## $ company : chr "NULL" "NULL" "NULL" "NULL" ...
## $ days_in_waiting_list : int 0 0 0 0 0 0 60 0 0 0 ...
## $ customer_type : chr "Transient" "Transient" "Transient" "Transient" ...
## $ adr : num 75 99 86 175 69.5 105 111 75 89.1 62.4 ...
## $ required_car_parking_spaces : int 0 0 0 0 0 0 0 0 0 0 ...
## $ total_of_special_requests : int 0 0 0 0 2 0 0 2 0 0 ...
## $ reservation_status : chr "Check-Out" "Check-Out" "Canceled" "Canceled" ...
## $ reservation_status_date : chr "2015-12-17" "2017-06-25" "2016-01-05" "2017-07-28" ...
## hotel is_canceled lead_time arrival_date_year
## Length:59695 Min. :0.000 Min. : 0.0 Min. :2015
## Class :character 1st Qu.:0.000 1st Qu.: 18.0 1st Qu.:2016
## Mode :character Median :0.000 Median : 69.0 Median :2016
## Mean :0.369 Mean :103.9 Mean :2016
## 3rd Qu.:1.000 3rd Qu.:160.5 3rd Qu.:2017
## Max. :1.000 Max. :737.0 Max. :2017
##
## arrival_date_month arrival_date_week_number arrival_date_day_of_month
## Length:59695 Min. : 1.00 Min. : 1.00
## Class :character 1st Qu.:16.00 1st Qu.: 8.00
## Mode :character Median :28.00 Median :16.00
## Mean :27.17 Mean :15.86
## 3rd Qu.:38.00 3rd Qu.:24.00
## Max. :53.00 Max. :31.00
##
## stays_in_weekend_nights stays_in_week_nights adults children
## Min. : 0.0000 Min. : 0.000 Min. : 0.000 Min. :0.0000
## 1st Qu.: 0.0000 1st Qu.: 1.000 1st Qu.: 2.000 1st Qu.:0.0000
## Median : 1.0000 Median : 2.000 Median : 2.000 Median :0.0000
## Mean : 0.9324 Mean : 2.524 Mean : 1.859 Mean :0.1048
## 3rd Qu.: 2.0000 3rd Qu.: 3.000 3rd Qu.: 2.000 3rd Qu.:0.0000
## Max. :18.0000 Max. :42.000 Max. :50.000 Max. :3.0000
## NA's :3
## babies meal country market_segment
## Min. : 0.000000 Length:59695 Length:59695 Length:59695
## 1st Qu.: 0.000000 Class :character Class :character Class :character
## Median : 0.000000 Mode :character Mode :character Mode :character
## Mean : 0.008292
## 3rd Qu.: 0.000000
## Max. :10.000000
##
## distribution_channel is_repeated_guest previous_cancellations
## Length:59695 Min. :0.0000 Min. : 0.00000
## Class :character 1st Qu.:0.0000 1st Qu.: 0.00000
## Mode :character Median :0.0000 Median : 0.00000
## Mean :0.0319 Mean : 0.08927
## 3rd Qu.:0.0000 3rd Qu.: 0.00000
## Max. :1.0000 Max. :26.00000
##
## previous_bookings_not_canceled reserved_room_type assigned_room_type
## Min. : 0.0000 Length:59695 Length:59695
## 1st Qu.: 0.0000 Class :character Class :character
## Median : 0.0000 Mode :character Mode :character
## Mean : 0.1388
## 3rd Qu.: 0.0000
## Max. :71.0000
##
## booking_changes deposit_type agent company
## Min. : 0.0000 Length:59695 Length:59695 Length:59695
## 1st Qu.: 0.0000 Class :character Class :character Class :character
## Median : 0.0000 Mode :character Mode :character Mode :character
## Mean : 0.2218
## 3rd Qu.: 0.0000
## Max. :17.0000
##
## days_in_waiting_list customer_type adr
## Min. : 0.000 Length:59695 Min. : -6.38
## 1st Qu.: 0.000 Class :character 1st Qu.: 69.21
## Median : 0.000 Mode :character Median : 94.50
## Mean : 2.215 Mean : 102.12
## 3rd Qu.: 0.000 3rd Qu.: 126.00
## Max. :391.000 Max. :5400.00
##
## required_car_parking_spaces total_of_special_requests reservation_status
## Min. :0.00000 Min. :0.0000 Length:59695
## 1st Qu.:0.00000 1st Qu.:0.0000 Class :character
## Median :0.00000 Median :0.0000 Mode :character
## Mean :0.06389 Mean :0.5772
## 3rd Qu.:0.00000 3rd Qu.:1.0000
## Max. :8.00000 Max. :5.0000
##
## reservation_status_date
## Length:59695
## Class :character
## Mode :character
##
##
##
##
The graph above shows the distribution of lead_time across all subsamples.
Note: in the plot above, the labels mean 1 = subsample 1, 2 = subsample 2,
3 = subsample 3, 4 = subsample 4 and 5 = subsample 5.
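The plotting code is not shown in this report. As a minimal sketch only, the comparison could be drawn as a boxplot with one box per subsample. The example assumes a small synthetic stand-in for hotel_data, and the names toy_data, subsamples, stacked and lead_time_plot are hypothetical; the original plot type and data may differ.

```r
library(dplyr)
library(ggplot2)

set.seed(42)  # fixed seed so the sketch is reproducible

# Synthetic stand-in for hotel_data (assumption: only lead_time is needed here)
toy_data <- data.frame(lead_time = rexp(1000, rate = 1 / 100))

# Draw five subsamples, each 50% of the data, as in the report
subsample_size <- floor(nrow(toy_data) * 0.5)
subsamples <- lapply(1:5, function(i) sample_n(toy_data, subsample_size))

# Stack the subsamples; .id records which subsample (1 to 5) each row came from
stacked <- bind_rows(subsamples, .id = "subsample")

# One box per subsample, labelled 1 to 5 on the x-axis
lead_time_plot <- ggplot(stacked, aes(x = subsample, y = lead_time)) +
  geom_boxplot() +
  labs(x = "Subsample", y = "Lead Time")

print(lead_time_plot)
```

Stacking the subsamples into one data frame keeps the plot call short and lets ggplot2 handle the grouping.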
#Anomaly Detection for subsample data frame 1: hotel_data_subsample_df_1
## # A tibble: 2 × 2
## hotel count
## <chr> <int>
## 1 City Hotel 39893
## 2 Resort Hotel 19802
## # A tibble: 2 × 2
## hotel count
## <chr> <int>
## 1 City Hotel 39906
## 2 Resort Hotel 19789
## # A tibble: 2 × 2
## hotel count
## <chr> <int>
## 1 City Hotel 39651
## 2 Resort Hotel 20044
## # A tibble: 2 × 2
## hotel count
## <chr> <int>
## 1 City Hotel 39608
## 2 Resort Hotel 20087
## # A tibble: 2 × 2
## hotel count
## <chr> <int>
## 1 City Hotel 39607
## 2 Resort Hotel 20088
In the section above, each subsample was created with the sample_n()
function of the dplyr package, at 50% of the size of the original data set.
For anomaly detection, the data were grouped by “hotel” and the number of
observations for each hotel type within the subsample was counted.
Observations:
1. Every subsample contains the same two hotel types, but the count for each
type differs slightly from one subsample to the next.
2. In the output shown above, assuming the tables are listed in subsample
order, the number of Resort Hotel bookings is higher in
hotel_data_subsample_df_3, hotel_data_subsample_df_4 and
hotel_data_subsample_df_5 (20044, 20087 and 20088) than in
hotel_data_subsample_df_1 and hotel_data_subsample_df_2 (19802 and 19789).
Because the subsamples are drawn at random, these counts change from run to
run unless a seed is fixed.
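The subsampling and counting steps described above can be sketched as follows. This is an illustration under stated assumptions: toy_data is a synthetic stand-in for hotel_data with an assumed 2:1 mix of hotel types, and the names toy_subsample and hotel_counts are hypothetical.

```r
library(dplyr)

set.seed(123)  # fixed seed so the same subsample is drawn on every run

# Synthetic stand-in for hotel_data (assumption: a 2:1 mix of hotel types)
toy_data <- data.frame(
  hotel = sample(c("City Hotel", "Resort Hotel"), size = 1000,
                 replace = TRUE, prob = c(2 / 3, 1 / 3))
)

# Subsample size: 50% of the rows
subsample_size <- floor(nrow(toy_data) * 0.5)

# Draw one subsample without replacement
toy_subsample <- sample_n(toy_data, subsample_size)

# Count observations per hotel type within the subsample
hotel_counts <- toy_subsample %>%
  group_by(hotel) %>%
  summarise(count = n())

print(hotel_counts)
```

In dplyr 1.0.0 and later, sample_n() is superseded by slice_sample(n = ...), which could be used here in the same way.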
## Observed Chi-Square Test Statistic: 1681060
## P-value of sub sample 1: 0.428
## Observed Chi-Square Test Statistic: 1662916
## P-value of sub sample 2: 0.442
## Observed Chi-Square Test Statistic: 1671291
## P-value of sub sample 3: 0.476
## Observed Chi-Square Test Statistic: 1738752
## P-value of sub sample 4: 0.46
## Observed Chi-Square Test Statistic: 1664807
## P-value of sub sample 5: 0.45
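The code that produced these statistics is not shown, so the exact test is unknown. As one possible illustration only, the sketch below runs a chi-square goodness-of-fit test with a Monte Carlo p-value using base R's chisq.test(). It takes the hotel counts printed in the first count table above and compares them with an assumed population split of 2/3 City Hotel and 1/3 Resort Hotel. That split is a hypothetical placeholder, not a figure from this report, and the result is not expected to match the output above.

```r
# Observed hotel counts from the first count table above
observed <- c("City Hotel" = 39893, "Resort Hotel" = 19802)

# Assumed population proportions (hypothetical placeholder values)
expected_prop <- c(2 / 3, 1 / 3)

set.seed(1)  # the Monte Carlo p-value depends on random draws

# Goodness-of-fit test; the p-value is estimated from B simulated tables
result <- chisq.test(observed, p = expected_prop,
                     simulate.p.value = TRUE, B = 2000)

cat("Observed Chi-Square Test Statistic:", unname(result$statistic), "\n")
cat("P-value:", result$p.value, "\n")
```

A large p-value in a test of this kind would mean the subsample's hotel mix is consistent with the assumed proportions.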
#Monte Carlo Simulations of data set hotel_data_subsample_df_1
The Monte Carlo simulation histogram is a graphical representation of the frequency distribution of a data set built from random numbers generated by Monte Carlo simulation.
In the histogram above, the title “Monte Carlo Simulation of Lead Time” indicates that the plot represents the distribution of lead times (on the x-axis) derived from the Monte Carlo simulation. The x-axis label, Lead Time, specifies the variable being measured.
The height of each bar represents the frequency of lead times falling within a specific range. A higher bar means that lead times within that range occur more often.
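The simulation code is not included in this report. A minimal sketch of one way to build such a histogram is given below, assuming the simulated lead times are obtained by resampling observed lead times with replacement. The synthetic observed_lead_time vector, the 20-day bin width and the names simulated and mc_histogram are all assumptions made for illustration.

```r
library(ggplot2)

set.seed(2024)  # fixed seed so the simulated draws are reproducible

# Synthetic stand-in for the lead_time column (assumption, not the real data)
observed_lead_time <- round(rexp(500, rate = 1 / 100))

# Monte Carlo step: draw simulated lead times by resampling with replacement
n_simulations <- 10000
simulated <- data.frame(
  lead_time = sample(observed_lead_time, size = n_simulations, replace = TRUE)
)

# Histogram: each bar counts the simulated lead times in one 20-day bin
mc_histogram <- ggplot(simulated, aes(x = lead_time)) +
  geom_histogram(binwidth = 20) +
  labs(title = "Monte Carlo Simulation of Lead Time",
       x = "Lead Time", y = "Frequency")

print(mc_histogram)
```

Changing binwidth changes the range each bar covers, so bar heights should be read together with the bin width.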
The histogram can be compared to expected or historical lead time data. Discrepancies may highlight areas where the simulation differs from real-world observations, prompting further investigation or refinement of the simulation model.
The insights gained from the histogram can be used for decision support, especially in scenarios where lead time variability is a critical factor. Understanding the distribution helps in making informed decisions and developing strategies to manage lead time uncertainties.
Thank you!