1. Initial setup and configuration of the data set.
2. Load the data set file into the variable hotel_data.
3. Data set - Hotels: this data comes from an open hotel booking demand dataset by Antonio, Almeida and Nunes.
knitr::opts_chunk$set(echo = FALSE)
# Load the 'dplyr' library
library(dplyr)
## 
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union
library(ggplot2)
# Load the data into hotel_data for further use
hotel_data <- read.csv(file.choose())

In this section:
1. The data set hotel_data is summarized.
2. The length of hotel_data is found with nrow() and assigned to the variable hotel_data_length.
3. The size of a subsample (50% of hotel_data_length) is calculated and printed.
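
The report hides its code chunks (echo = FALSE), so the three steps above are shown here only as a minimal sketch. The small data frame is a hypothetical stand-in for the loaded file, and rounding the 50% size down with floor() is an assumption; for the real data set 119390 / 2 is exactly 59695, so the rounding rule does not change the result.

```r
# Hypothetical stand-in so the sketch runs on its own; in the report,
# hotel_data is the data frame loaded with read.csv().
hotel_data <- data.frame(
  hotel       = rep(c("City Hotel", "Resort Hotel"), times = 5),
  lead_time   = c(186, 109, 52, 265, 14, 7, 221, 89, 309, 48),
  is_canceled = c(1, 0, 1, 1, 1, 0, 0, 1, 1, 0)
)

# Step 1: summarize every column of the data set
print(summary(hotel_data))

# Step 2: number of rows in the data set
hotel_data_length <- nrow(hotel_data)

# Step 3: subsample size is 50% of the data set length
# (floor() is an assumed rounding rule for odd lengths)
subsample_size <- floor(0.5 * hotel_data_length)

cat("Data set size:", hotel_data_length, "\n")
cat("Subsample size:", subsample_size, "\n")
```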

##     hotel            is_canceled       lead_time   arrival_date_year
##  Length:119390      Min.   :0.0000   Min.   :  0   Min.   :2015     
##  Class :character   1st Qu.:0.0000   1st Qu.: 18   1st Qu.:2016     
##  Mode  :character   Median :0.0000   Median : 69   Median :2016     
##                     Mean   :0.3704   Mean   :104   Mean   :2016     
##                     3rd Qu.:1.0000   3rd Qu.:160   3rd Qu.:2017     
##                     Max.   :1.0000   Max.   :737   Max.   :2017     
##                                                                     
##  arrival_date_month arrival_date_week_number arrival_date_day_of_month
##  Length:119390      Min.   : 1.00            Min.   : 1.0             
##  Class :character   1st Qu.:16.00            1st Qu.: 8.0             
##  Mode  :character   Median :28.00            Median :16.0             
##                     Mean   :27.17            Mean   :15.8             
##                     3rd Qu.:38.00            3rd Qu.:23.0             
##                     Max.   :53.00            Max.   :31.0             
##                                                                       
##  stays_in_weekend_nights stays_in_week_nights     adults      
##  Min.   : 0.0000         Min.   : 0.0         Min.   : 0.000  
##  1st Qu.: 0.0000         1st Qu.: 1.0         1st Qu.: 2.000  
##  Median : 1.0000         Median : 2.0         Median : 2.000  
##  Mean   : 0.9276         Mean   : 2.5         Mean   : 1.856  
##  3rd Qu.: 2.0000         3rd Qu.: 3.0         3rd Qu.: 2.000  
##  Max.   :19.0000         Max.   :50.0         Max.   :55.000  
##                                                               
##     children           babies              meal             country         
##  Min.   : 0.0000   Min.   : 0.000000   Length:119390      Length:119390     
##  1st Qu.: 0.0000   1st Qu.: 0.000000   Class :character   Class :character  
##  Median : 0.0000   Median : 0.000000   Mode  :character   Mode  :character  
##  Mean   : 0.1039   Mean   : 0.007949                                        
##  3rd Qu.: 0.0000   3rd Qu.: 0.000000                                        
##  Max.   :10.0000   Max.   :10.000000                                        
##  NA's   :4                                                                  
##  market_segment     distribution_channel is_repeated_guest
##  Length:119390      Length:119390        Min.   :0.00000  
##  Class :character   Class :character     1st Qu.:0.00000  
##  Mode  :character   Mode  :character     Median :0.00000  
##                                          Mean   :0.03191  
##                                          3rd Qu.:0.00000  
##                                          Max.   :1.00000  
##                                                           
##  previous_cancellations previous_bookings_not_canceled reserved_room_type
##  Min.   : 0.00000       Min.   : 0.0000                Length:119390     
##  1st Qu.: 0.00000       1st Qu.: 0.0000                Class :character  
##  Median : 0.00000       Median : 0.0000                Mode  :character  
##  Mean   : 0.08712       Mean   : 0.1371                                  
##  3rd Qu.: 0.00000       3rd Qu.: 0.0000                                  
##  Max.   :26.00000       Max.   :72.0000                                  
##                                                                          
##  assigned_room_type booking_changes   deposit_type          agent          
##  Length:119390      Min.   : 0.0000   Length:119390      Length:119390     
##  Class :character   1st Qu.: 0.0000   Class :character   Class :character  
##  Mode  :character   Median : 0.0000   Mode  :character   Mode  :character  
##                     Mean   : 0.2211                                        
##                     3rd Qu.: 0.0000                                        
##                     Max.   :21.0000                                        
##                                                                            
##    company          days_in_waiting_list customer_type           adr         
##  Length:119390      Min.   :  0.000      Length:119390      Min.   :  -6.38  
##  Class :character   1st Qu.:  0.000      Class :character   1st Qu.:  69.29  
##  Mode  :character   Median :  0.000      Mode  :character   Median :  94.58  
##                     Mean   :  2.321                         Mean   : 101.83  
##                     3rd Qu.:  0.000                         3rd Qu.: 126.00  
##                     Max.   :391.000                         Max.   :5400.00  
##                                                                              
##  required_car_parking_spaces total_of_special_requests reservation_status
##  Min.   :0.00000             Min.   :0.0000            Length:119390     
##  1st Qu.:0.00000             1st Qu.:0.0000            Class :character  
##  Median :0.00000             Median :0.0000            Mode  :character  
##  Mean   :0.06252             Mean   :0.5714                              
##  3rd Qu.:0.00000             3rd Qu.:1.0000                              
##  Max.   :8.00000             Max.   :5.0000                              
##                                                                          
##  reservation_status_date
##  Length:119390          
##  Class :character       
##  Mode  :character       
##                         
##                         
##                         
## 
## Data set size: 119390
## Subsample size: 59695

In this section, five subsamples are created, each of size subsample_size (50% of hotel_data_length).
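
One way the five subsamples could be drawn is sketched below in base R. The helper draw_subsample(), the seed, the stand-in data frame, and the choice of sampling rows without replacement are all assumptions, since the report does not show how the subsamples were created. With dplyr loaded, slice_sample(hotel_data, n = subsample_size) would be a comparable call.

```r
# Hypothetical stand-in for the full data set
hotel_data <- data.frame(
  id        = 1:10,
  lead_time = c(186, 109, 52, 265, 14, 7, 221, 89, 309, 48)
)
subsample_size <- floor(0.5 * nrow(hotel_data))

# Assumed seed, used only to make the sketch reproducible
set.seed(42)

# Hypothetical helper: draw `size` random rows without replacement
draw_subsample <- function(data, size) {
  rows <- sample(nrow(data), size)
  out <- data[rows, , drop = FALSE]
  rownames(out) <- NULL  # renumber rows 1..size
  out
}

hotel_data_subsample_df_1 <- draw_subsample(hotel_data, subsample_size)
hotel_data_subsample_df_2 <- draw_subsample(hotel_data, subsample_size)
hotel_data_subsample_df_3 <- draw_subsample(hotel_data, subsample_size)
hotel_data_subsample_df_4 <- draw_subsample(hotel_data, subsample_size)
hotel_data_subsample_df_5 <- draw_subsample(hotel_data, subsample_size)
```

Each call draws independently from the full data set, so the subsamples may share rows with one another.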

Scrutinize #1

In this section, the first subsample, hotel_data_subsample_df_1, is scrutinized:
1. Print the length of the subsample hotel_data_subsample_df_1.
2. Print the first five rows (for a few columns) of the subsample.
3. Print the last five rows (for a few columns) of the subsample.
Note: comparing the output of steps 2 and 3 allows data consistency to be verified.
4. Print the internal structure of the subsample.
5. Summarize the subsample.
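
The five steps above map onto standard base R calls, sketched below. The data frame is a hypothetical ten-row stand-in with illustrative values, and preview_cols is an assumed name for the handful of columns printed in steps 2 and 3.

```r
# Hypothetical ten-row stand-in for the first subsample
hotel_data_subsample_df_1 <- data.frame(
  hotel                = rep("City Hotel", 10),
  lead_time            = c(186, 109, 52, 265, 14, 352, 165, 45, 156, 73),
  meal                 = c("SC", "BB", "SC", "BB", "BB", "SC", "BB", "BB", "BB", "BB"),
  market_segment       = c("Online TA", "Online TA", "Online TA", "Groups", "Online TA",
                           "Online TA", "Online TA", "Online TA", "Online TA", "Groups"),
  distribution_channel = rep("TA/TO", 10),
  country              = c("BRA", "BEL", "GBR", "PRT", "ESP", "GBR", "ITA", "DEU", "SWE", "PRT")
)

# Assumed name for the columns shown in the row previews
preview_cols <- c("hotel", "lead_time", "meal", "market_segment",
                  "distribution_channel", "country")

# Step 1: length (number of rows) of the subsample
print(nrow(hotel_data_subsample_df_1))

# Step 2: first five rows, selected columns only
first_rows <- head(hotel_data_subsample_df_1[, preview_cols], 5)
print(first_rows)

# Step 3: last five rows, selected columns only
last_rows <- tail(hotel_data_subsample_df_1[, preview_cols], 5)
print(last_rows)

# Step 4: internal structure (column types and leading values)
str(hotel_data_subsample_df_1)

# Step 5: per-column summary
print(summary(hotel_data_subsample_df_1))
```

The same calls apply to the other subsamples by swapping in the corresponding data frame name.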

## [1] 59695
##        hotel lead_time meal market_segment distribution_channel country
## 1 City Hotel       186   SC      Online TA                TA/TO     BRA
## 2 City Hotel       109   BB      Online TA                TA/TO     BEL
## 3 City Hotel        52   SC      Online TA                TA/TO     GBR
## 4 City Hotel       265   BB         Groups                TA/TO     PRT
## 5 City Hotel        14   BB      Online TA                TA/TO     ESP
##            hotel lead_time meal market_segment distribution_channel country
## 59691 City Hotel       352   SC      Online TA                TA/TO     GBR
## 59692 City Hotel       165   BB      Online TA                TA/TO     ITA
## 59693 City Hotel        45   BB      Online TA                TA/TO     DEU
## 59694 City Hotel       156   BB      Online TA                TA/TO     SWE
## 59695 City Hotel        73   BB         Groups                TA/TO     PRT
## 'data.frame':    59695 obs. of  32 variables:
##  $ hotel                         : chr  "City Hotel" "City Hotel" "City Hotel" "City Hotel" ...
##  $ is_canceled                   : int  1 0 1 1 1 0 0 1 1 0 ...
##  $ lead_time                     : int  186 109 52 265 14 7 221 89 309 48 ...
##  $ arrival_date_year             : int  2017 2016 2017 2016 2017 2016 2015 2016 2017 2016 ...
##  $ arrival_date_month            : chr  "April" "September" "June" "April" ...
##  $ arrival_date_week_number      : int  16 37 26 17 16 45 42 42 19 38 ...
##  $ arrival_date_day_of_month     : int  22 7 28 20 18 31 17 10 13 14 ...
##  $ stays_in_weekend_nights       : int  1 0 0 0 0 1 2 1 0 0 ...
##  $ stays_in_week_nights          : int  1 3 2 4 3 2 1 1 1 2 ...
##  $ adults                        : int  2 2 1 2 3 2 3 2 1 2 ...
##  $ children                      : int  0 0 0 0 0 0 0 0 0 0 ...
##  $ babies                        : int  0 0 0 0 0 0 0 0 0 0 ...
##  $ meal                          : chr  "SC" "BB" "SC" "BB" ...
##  $ country                       : chr  "BRA" "BEL" "GBR" "PRT" ...
##  $ market_segment                : chr  "Online TA" "Online TA" "Online TA" "Groups" ...
##  $ distribution_channel          : chr  "TA/TO" "TA/TO" "TA/TO" "TA/TO" ...
##  $ is_repeated_guest             : int  0 0 0 0 0 0 0 0 0 0 ...
##  $ previous_cancellations        : int  0 0 0 0 0 0 0 0 0 0 ...
##  $ previous_bookings_not_canceled: int  0 0 0 0 0 1 0 0 0 0 ...
##  $ reserved_room_type            : chr  "A" "A" "A" "A" ...
##  $ assigned_room_type            : chr  "A" "A" "A" "A" ...
##  $ booking_changes               : int  0 0 0 0 0 1 1 0 1 0 ...
##  $ deposit_type                  : chr  "No Deposit" "No Deposit" "No Deposit" "Non Refund" ...
##  $ agent                         : chr  "9" "9" "9" "30" ...
##  $ company                       : chr  "NULL" "NULL" "NULL" "NULL" ...
##  $ days_in_waiting_list          : int  0 0 0 0 0 0 0 0 0 0 ...
##  $ customer_type                 : chr  "Transient" "Transient" "Transient" "Transient" ...
##  $ adr                           : num  99 130 120 101 215 ...
##  $ required_car_parking_spaces   : int  0 0 0 0 0 1 0 0 0 0 ...
##  $ total_of_special_requests     : int  0 1 1 0 1 1 0 1 0 0 ...
##  $ reservation_status            : chr  "Canceled" "Check-Out" "Canceled" "Canceled" ...
##  $ reservation_status_date       : chr  "2016-10-18" "2016-09-10" "2017-05-09" "2015-07-30" ...
##     hotel            is_canceled      lead_time     arrival_date_year
##  Length:59695       Min.   :0.000   Min.   :  0.0   Min.   :2015     
##  Class :character   1st Qu.:0.000   1st Qu.: 18.0   1st Qu.:2016     
##  Mode  :character   Median :0.000   Median : 69.0   Median :2016     
##                     Mean   :0.367   Mean   :103.9   Mean   :2016     
##                     3rd Qu.:1.000   3rd Qu.:161.0   3rd Qu.:2017     
##                     Max.   :1.000   Max.   :629.0   Max.   :2017     
##                                                                      
##  arrival_date_month arrival_date_week_number arrival_date_day_of_month
##  Length:59695       Min.   : 1.00            Min.   : 1.0             
##  Class :character   1st Qu.:16.00            1st Qu.: 8.0             
##  Mode  :character   Median :28.00            Median :16.0             
##                     Mean   :27.15            Mean   :15.8             
##                     3rd Qu.:38.00            3rd Qu.:23.0             
##                     Max.   :53.00            Max.   :31.0             
##                                                                       
##  stays_in_weekend_nights stays_in_week_nights     adults          children     
##  Min.   : 0.000          Min.   : 0.000       Min.   : 0.000   Min.   :0.0000  
##  1st Qu.: 0.000          1st Qu.: 1.000       1st Qu.: 2.000   1st Qu.:0.0000  
##  Median : 1.000          Median : 2.000       Median : 2.000   Median :0.0000  
##  Mean   : 0.931          Mean   : 2.505       Mean   : 1.857   Mean   :0.1053  
##  3rd Qu.: 2.000          3rd Qu.: 3.000       3rd Qu.: 2.000   3rd Qu.:0.0000  
##  Max.   :19.000          Max.   :50.000       Max.   :50.000   Max.   :3.0000  
##                                                                NA's   :3       
##      babies             meal             country          market_segment    
##  Min.   :0.000000   Length:59695       Length:59695       Length:59695      
##  1st Qu.:0.000000   Class :character   Class :character   Class :character  
##  Median :0.000000   Mode  :character   Mode  :character   Mode  :character  
##  Mean   :0.008158                                                           
##  3rd Qu.:0.000000                                                           
##  Max.   :2.000000                                                           
##                                                                             
##  distribution_channel is_repeated_guest previous_cancellations
##  Length:59695         Min.   :0.00000   Min.   : 0.0000       
##  Class :character     1st Qu.:0.00000   1st Qu.: 0.0000       
##  Mode  :character     Median :0.00000   Median : 0.0000       
##                       Mean   :0.03161   Mean   : 0.0856       
##                       3rd Qu.:0.00000   3rd Qu.: 0.0000       
##                       Max.   :1.00000   Max.   :26.0000       
##                                                               
##  previous_bookings_not_canceled reserved_room_type assigned_room_type
##  Min.   : 0.0000                Length:59695       Length:59695      
##  1st Qu.: 0.0000                Class :character   Class :character  
##  Median : 0.0000                Mode  :character   Mode  :character  
##  Mean   : 0.1401                                                     
##  3rd Qu.: 0.0000                                                     
##  Max.   :72.0000                                                     
##                                                                      
##  booking_changes   deposit_type          agent             company         
##  Min.   : 0.0000   Length:59695       Length:59695       Length:59695      
##  1st Qu.: 0.0000   Class :character   Class :character   Class :character  
##  Median : 0.0000   Mode  :character   Mode  :character   Mode  :character  
##  Mean   : 0.2212                                                           
##  3rd Qu.: 0.0000                                                           
##  Max.   :15.0000                                                           
##                                                                            
##  days_in_waiting_list customer_type           adr        
##  Min.   :  0.00       Length:59695       Min.   :  0.00  
##  1st Qu.:  0.00       Class :character   1st Qu.: 69.66  
##  Median :  0.00       Mode  :character   Median : 95.00  
##  Mean   :  2.34                          Mean   :101.83  
##  3rd Qu.:  0.00                          3rd Qu.:126.00  
##  Max.   :391.00                          Max.   :510.00  
##                                                          
##  required_car_parking_spaces total_of_special_requests reservation_status
##  Min.   :0.00000             Min.   :0.0000            Length:59695      
##  1st Qu.:0.00000             1st Qu.:0.0000            Class :character  
##  Median :0.00000             Median :0.0000            Mode  :character  
##  Mean   :0.06315             Mean   :0.5679                              
##  3rd Qu.:0.00000             3rd Qu.:1.0000                              
##  Max.   :3.00000             Max.   :5.0000                              
##                                                                          
##  reservation_status_date
##  Length:59695           
##  Class :character       
##  Mode  :character       
##                         
##                         
##                         
## 

Scrutinize #2
In this section, the second subsample, hotel_data_subsample_df_2, is scrutinized:
1. Print the length of the subsample hotel_data_subsample_df_2.
2. Print the first five rows (for a few columns) of the subsample.
3. Print the last five rows (for a few columns) of the subsample.
Note: comparing the output of steps 2 and 3 allows data consistency to be verified.
4. Print the internal structure of the subsample.
5. Summarize the subsample.

## [1] 59695
##        hotel lead_time meal market_segment distribution_channel country
## 1 City Hotel       186   SC      Online TA                TA/TO     BRA
## 2 City Hotel       109   BB      Online TA                TA/TO     BEL
## 3 City Hotel        52   SC      Online TA                TA/TO     GBR
## 4 City Hotel       265   BB         Groups                TA/TO     PRT
## 5 City Hotel        14   BB      Online TA                TA/TO     ESP
##            hotel lead_time meal market_segment distribution_channel country
## 59691 City Hotel       352   SC      Online TA                TA/TO     GBR
## 59692 City Hotel       165   BB      Online TA                TA/TO     ITA
## 59693 City Hotel        45   BB      Online TA                TA/TO     DEU
## 59694 City Hotel       156   BB      Online TA                TA/TO     SWE
## 59695 City Hotel        73   BB         Groups                TA/TO     PRT
## 'data.frame':    59695 obs. of  32 variables:
##  $ hotel                         : chr  "City Hotel" "City Hotel" "City Hotel" "City Hotel" ...
##  $ is_canceled                   : int  0 1 1 0 1 0 1 1 0 0 ...
##  $ lead_time                     : int  22 35 330 243 228 102 87 105 89 100 ...
##  $ arrival_date_year             : int  2016 2016 2015 2017 2016 2016 2017 2016 2016 2015 ...
##  $ arrival_date_month            : chr  "September" "March" "September" "May" ...
##  $ arrival_date_week_number      : int  38 13 37 22 35 22 15 15 37 42 ...
##  $ arrival_date_day_of_month     : int  15 25 12 29 27 23 15 6 4 15 ...
##  $ stays_in_weekend_nights       : int  0 1 2 1 2 1 2 0 2 1 ...
##  $ stays_in_week_nights          : int  1 2 2 0 4 2 2 1 4 3 ...
##  $ adults                        : int  2 2 2 2 2 2 2 2 1 2 ...
##  $ children                      : int  0 1 0 0 0 1 0 0 0 0 ...
##  $ babies                        : int  0 0 0 0 0 1 0 0 0 0 ...
##  $ meal                          : chr  "BB" "BB" "BB" "SC" ...
##  $ country                       : chr  "GBR" "PRT" "PRT" "GBR" ...
##  $ market_segment                : chr  "Online TA" "Offline TA/TO" "Groups" "Online TA" ...
##  $ distribution_channel          : chr  "TA/TO" "TA/TO" "TA/TO" "TA/TO" ...
##  $ is_repeated_guest             : int  0 0 0 0 0 0 0 0 0 0 ...
##  $ previous_cancellations        : int  0 0 1 0 0 0 0 0 0 0 ...
##  $ previous_bookings_not_canceled: int  0 0 0 0 0 0 0 0 0 0 ...
##  $ reserved_room_type            : chr  "A" "A" "A" "A" ...
##  $ assigned_room_type            : chr  "B" "A" "A" "A" ...
##  $ booking_changes               : int  0 0 0 0 0 0 0 0 1 0 ...
##  $ deposit_type                  : chr  "No Deposit" "No Deposit" "No Deposit" "No Deposit" ...
##  $ agent                         : chr  "9" "3" "1" "9" ...
##  $ company                       : chr  "NULL" "NULL" "NULL" "NULL" ...
##  $ days_in_waiting_list          : int  0 0 0 0 0 0 0 0 0 0 ...
##  $ customer_type                 : chr  "Transient" "Transient" "Transient-Party" "Transient" ...
##  $ adr                           : num  159 76 62 107.1 80.8 ...
##  $ required_car_parking_spaces   : int  0 0 0 0 0 0 0 0 0 0 ...
##  $ total_of_special_requests     : int  3 1 0 1 0 2 0 0 0 1 ...
##  $ reservation_status            : chr  "Check-Out" "Canceled" "Canceled" "Check-Out" ...
##  $ reservation_status_date       : chr  "2016-09-16" "2016-02-19" "2015-07-06" "2017-05-30" ...
##     hotel            is_canceled       lead_time     arrival_date_year
##  Length:59695       Min.   :0.0000   Min.   :  0.0   Min.   :2015     
##  Class :character   1st Qu.:0.0000   1st Qu.: 18.0   1st Qu.:2016     
##  Mode  :character   Median :0.0000   Median : 69.0   Median :2016     
##                     Mean   :0.3729   Mean   :103.5   Mean   :2016     
##                     3rd Qu.:1.0000   3rd Qu.:160.0   3rd Qu.:2017     
##                     Max.   :1.0000   Max.   :709.0   Max.   :2017     
##  arrival_date_month arrival_date_week_number arrival_date_day_of_month
##  Length:59695       Min.   : 1.00            Min.   : 1.0             
##  Class :character   1st Qu.:16.00            1st Qu.: 8.0             
##  Mode  :character   Median :27.00            Median :16.0             
##                     Mean   :27.12            Mean   :15.7             
##                     3rd Qu.:38.00            3rd Qu.:23.0             
##                     Max.   :53.00            Max.   :31.0             
##  stays_in_weekend_nights stays_in_week_nights     adults          children     
##  Min.   : 0.0000         Min.   : 0.000       Min.   : 0.000   Min.   :0.0000  
##  1st Qu.: 0.0000         1st Qu.: 1.000       1st Qu.: 2.000   1st Qu.:0.0000  
##  Median : 1.0000         Median : 2.000       Median : 2.000   Median :0.0000  
##  Mean   : 0.9266         Mean   : 2.492       Mean   : 1.859   Mean   :0.1002  
##  3rd Qu.: 2.0000         3rd Qu.: 3.000       3rd Qu.: 2.000   3rd Qu.:0.0000  
##  Max.   :19.0000         Max.   :50.000       Max.   :55.000   Max.   :3.0000  
##      babies             meal             country          market_segment    
##  Min.   :0.000000   Length:59695       Length:59695       Length:59695      
##  1st Qu.:0.000000   Class :character   Class :character   Class :character  
##  Median :0.000000   Mode  :character   Mode  :character   Mode  :character  
##  Mean   :0.008007                                                           
##  3rd Qu.:0.000000                                                           
##  Max.   :2.000000                                                           
##  distribution_channel is_repeated_guest previous_cancellations
##  Length:59695         Min.   :0.00000   Min.   : 0.00000      
##  Class :character     1st Qu.:0.00000   1st Qu.: 0.00000      
##  Mode  :character     Median :0.00000   Median : 0.00000      
##                       Mean   :0.03293   Mean   : 0.09644      
##                       3rd Qu.:0.00000   3rd Qu.: 0.00000      
##                       Max.   :1.00000   Max.   :26.00000      
##  previous_bookings_not_canceled reserved_room_type assigned_room_type
##  Min.   : 0.0000                Length:59695       Length:59695      
##  1st Qu.: 0.0000                Class :character   Class :character  
##  Median : 0.0000                Mode  :character   Mode  :character  
##  Mean   : 0.1422                                                     
##  3rd Qu.: 0.0000                                                     
##  Max.   :72.0000                                                     
##  booking_changes   deposit_type          agent             company         
##  Min.   : 0.0000   Length:59695       Length:59695       Length:59695      
##  1st Qu.: 0.0000   Class :character   Class :character   Class :character  
##  Median : 0.0000   Mode  :character   Mode  :character   Mode  :character  
##  Mean   : 0.2224                                                           
##  3rd Qu.: 0.0000                                                           
##  Max.   :21.0000                                                           
##  days_in_waiting_list customer_type           adr         
##  Min.   :  0.000      Length:59695       Min.   :   0.00  
##  1st Qu.:  0.000      Class :character   1st Qu.:  69.36  
##  Median :  0.000      Mode  :character   Median :  94.50  
##  Mean   :  2.246                         Mean   : 101.88  
##  3rd Qu.:  0.000                         3rd Qu.: 126.00  
##  Max.   :391.000                         Max.   :5400.00  
##  required_car_parking_spaces total_of_special_requests reservation_status
##  Min.   :0.00000             Min.   :0.0000            Length:59695      
##  1st Qu.:0.00000             1st Qu.:0.0000            Class :character  
##  Median :0.00000             Median :0.0000            Mode  :character  
##  Mean   :0.06217             Mean   :0.5728                              
##  3rd Qu.:0.00000             3rd Qu.:1.0000                              
##  Max.   :8.00000             Max.   :5.0000                              
##  reservation_status_date
##  Length:59695           
##  Class :character       
##  Mode  :character       
##                         
##                         
## 

Scrutinize #3
In this section, the third subsample, hotel_data_subsample_df_3, is scrutinized:
1. Print the length of the subsample hotel_data_subsample_df_3.
2. Print the first five rows (for a few columns) of the subsample.
3. Print the last five rows (for a few columns) of the subsample.
Note: comparing the output of steps 2 and 3 allows data consistency to be verified.
4. Print the internal structure of the subsample.
5. Summarize the subsample.

## [1] 59695
##        hotel lead_time meal market_segment distribution_channel country
## 1 City Hotel       186   SC      Online TA                TA/TO     BRA
## 2 City Hotel       109   BB      Online TA                TA/TO     BEL
## 3 City Hotel        52   SC      Online TA                TA/TO     GBR
## 4 City Hotel       265   BB         Groups                TA/TO     PRT
## 5 City Hotel        14   BB      Online TA                TA/TO     ESP
##            hotel lead_time meal market_segment distribution_channel country
## 59691 City Hotel       352   SC      Online TA                TA/TO     GBR
## 59692 City Hotel       165   BB      Online TA                TA/TO     ITA
## 59693 City Hotel        45   BB      Online TA                TA/TO     DEU
## 59694 City Hotel       156   BB      Online TA                TA/TO     SWE
## 59695 City Hotel        73   BB         Groups                TA/TO     PRT
## 'data.frame':    59695 obs. of  32 variables:
##  $ hotel                         : chr  "Resort Hotel" "Resort Hotel" "City Hotel" "City Hotel" ...
##  $ is_canceled                   : int  0 0 1 0 1 1 1 1 1 0 ...
##  $ lead_time                     : int  1 211 333 8 115 195 166 92 2 144 ...
##  $ arrival_date_year             : int  2016 2016 2016 2016 2016 2017 2016 2017 2016 2017 ...
##  $ arrival_date_month            : chr  "November" "May" "September" "November" ...
##  $ arrival_date_week_number      : int  47 20 39 45 13 30 45 33 2 17 ...
##  $ arrival_date_day_of_month     : int  17 14 20 5 20 23 1 16 8 27 ...
##  $ stays_in_weekend_nights       : int  2 1 0 0 2 2 0 0 0 2 ...
##  $ stays_in_week_nights          : int  7 1 2 1 0 5 3 3 2 4 ...
##  $ adults                        : int  2 2 2 2 1 2 2 2 2 2 ...
##  $ children                      : int  0 0 0 0 0 0 0 0 0 0 ...
##  $ babies                        : int  0 0 0 0 0 0 0 0 0 0 ...
##  $ meal                          : chr  "BB" "HB" "BB" "BB" ...
##  $ country                       : chr  "PRT" "DEU" "PRT" "CHN" ...
##  $ market_segment                : chr  "Direct" "Groups" "Offline TA/TO" "Offline TA/TO" ...
##  $ distribution_channel          : chr  "Direct" "TA/TO" "TA/TO" "TA/TO" ...
##  $ is_repeated_guest             : int  1 0 0 0 0 0 0 0 0 0 ...
##  $ previous_cancellations        : int  0 0 1 0 1 0 0 0 0 0 ...
##  $ previous_bookings_not_canceled: int  2 0 0 0 0 0 0 0 0 0 ...
##  $ reserved_room_type            : chr  "A" "A" "A" "A" ...
##  $ assigned_room_type            : chr  "E" "A" "A" "A" ...
##  $ booking_changes               : int  1 0 0 0 0 0 0 0 0 0 ...
##  $ deposit_type                  : chr  "No Deposit" "No Deposit" "Non Refund" "No Deposit" ...
##  $ agent                         : chr  "NULL" "298" "58" "359" ...
##  $ company                       : chr  "NULL" "NULL" "NULL" "NULL" ...
##  $ days_in_waiting_list          : int  0 0 19 0 22 0 0 0 0 0 ...
##  $ customer_type                 : chr  "Transient" "Transient-Party" "Transient" "Transient-Party" ...
##  $ adr                           : num  59.3 85 90 130 70 ...
##  $ required_car_parking_spaces   : int  0 0 0 0 0 0 0 0 0 0 ...
##  $ total_of_special_requests     : int  0 0 0 0 0 0 0 0 0 0 ...
##  $ reservation_status            : chr  "Check-Out" "Check-Out" "Canceled" "Check-Out" ...
##  $ reservation_status_date       : chr  "2016-11-26" "2016-05-16" "2015-11-11" "2016-11-06" ...
##     hotel            is_canceled       lead_time     arrival_date_year
##  Length:59695       Min.   :0.0000   Min.   :  0.0   Min.   :2015     
##  Class :character   1st Qu.:0.0000   1st Qu.: 18.0   1st Qu.:2016     
##  Mode  :character   Median :0.0000   Median : 69.0   Median :2016     
##                     Mean   :0.3678   Mean   :103.8   Mean   :2016     
##                     3rd Qu.:1.0000   3rd Qu.:160.0   3rd Qu.:2017     
##                     Max.   :1.0000   Max.   :629.0   Max.   :2017     
##                                                                       
##  arrival_date_month arrival_date_week_number arrival_date_day_of_month
##  Length:59695       Min.   : 1.00            Min.   : 1.00            
##  Class :character   1st Qu.:16.00            1st Qu.: 8.00            
##  Mode  :character   Median :28.00            Median :16.00            
##                     Mean   :27.19            Mean   :15.81            
##                     3rd Qu.:38.00            3rd Qu.:23.00            
##                     Max.   :53.00            Max.   :31.00            
##                                                                       
##  stays_in_weekend_nights stays_in_week_nights     adults          children     
##  Min.   : 0.0000         Min.   : 0.000       Min.   : 0.000   Min.   :0.0000  
##  1st Qu.: 0.0000         1st Qu.: 1.000       1st Qu.: 2.000   1st Qu.:0.0000  
##  Median : 1.0000         Median : 2.000       Median : 2.000   Median :0.0000  
##  Mean   : 0.9255         Mean   : 2.498       Mean   : 1.856   Mean   :0.1029  
##  3rd Qu.: 2.0000         3rd Qu.: 3.000       3rd Qu.: 2.000   3rd Qu.:0.0000  
##  Max.   :16.0000         Max.   :41.000       Max.   :26.000   Max.   :3.0000  
##                                                                NA's   :3       
##      babies             meal             country          market_segment    
##  Min.   :0.000000   Length:59695       Length:59695       Length:59695      
##  1st Qu.:0.000000   Class :character   Class :character   Class :character  
##  Median :0.000000   Mode  :character   Mode  :character   Mode  :character  
##  Mean   :0.008376                                                           
##  3rd Qu.:0.000000                                                           
##  Max.   :2.000000                                                           
##                                                                             
##  distribution_channel is_repeated_guest previous_cancellations
##  Length:59695         Min.   :0.00000   Min.   : 0.00000      
##  Class :character     1st Qu.:0.00000   1st Qu.: 0.00000      
##  Mode  :character     Median :0.00000   Median : 0.00000      
##                       Mean   :0.03005   Mean   : 0.08279      
##                       3rd Qu.:0.00000   3rd Qu.: 0.00000      
##                       Max.   :1.00000   Max.   :26.00000      
##                                                               
##  previous_bookings_not_canceled reserved_room_type assigned_room_type
##  Min.   : 0.0000                Length:59695       Length:59695      
##  1st Qu.: 0.0000                Class :character   Class :character  
##  Median : 0.0000                Mode  :character   Mode  :character  
##  Mean   : 0.1248                                                     
##  3rd Qu.: 0.0000                                                     
##  Max.   :71.0000                                                     
##                                                                      
##  booking_changes   deposit_type          agent             company         
##  Min.   : 0.0000   Length:59695       Length:59695       Length:59695      
##  1st Qu.: 0.0000   Class :character   Class :character   Class :character  
##  Median : 0.0000   Mode  :character   Mode  :character   Mode  :character  
##  Mean   : 0.2242                                                           
##  3rd Qu.: 0.0000                                                           
##  Max.   :20.0000                                                           
##                                                                            
##  days_in_waiting_list customer_type           adr        
##  Min.   :  0.000      Length:59695       Min.   : -6.38  
##  1st Qu.:  0.000      Class :character   1st Qu.: 69.00  
##  Median :  0.000      Mode  :character   Median : 94.70  
##  Mean   :  2.246                         Mean   :101.85  
##  3rd Qu.:  0.000                         3rd Qu.:126.00  
##  Max.   :391.000                         Max.   :508.00  
##                                                          
##  required_car_parking_spaces total_of_special_requests reservation_status
##  Min.   :0.00000             Min.   :0.0000            Length:59695      
##  1st Qu.:0.00000             1st Qu.:0.0000            Class :character  
##  Median :0.00000             Median :0.0000            Mode  :character  
##  Mean   :0.06253             Mean   :0.5738                              
##  3rd Qu.:0.00000             3rd Qu.:1.0000                              
##  Max.   :3.00000             Max.   :5.0000                              
##                                                                          
##  reservation_status_date
##  Length:59695           
##  Class :character       
##  Mode  :character       
##                         
##                         
##                         
## 

Scrutinize #4
In this section, the fourth subsample, sub_sample_hotel_data_4, is scrutinized.
1. Print the length of the subsample hotel_data_subsample_df_4.
2. Print the first five rows (for a few columns) of the subsample.
3. Print the last five rows (for a few columns) of the subsample.
Note: Comparing 2 and 3 gives a quick check of data consistency.
4. Print the internal structure of the subsample.
5. Summarize the subsample.
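
The code behind these steps is hidden in the report. A minimal sketch of the five steps might look like the following; the toy hotel_data frame and the few_cols vector are assumptions made only so the example runs on its own, and the real data set has 32 columns.

```r
library(dplyr)

# Toy stand-in for hotel_data (assumption: the real data comes from a CSV file).
set.seed(1)
hotel_data <- data.frame(
  hotel = sample(c("City Hotel", "Resort Hotel"), 200, replace = TRUE),
  lead_time = rpois(200, 100),
  meal = sample(c("BB", "SC", "HB"), 200, replace = TRUE),
  stringsAsFactors = FALSE
)

# Draw a 50% subsample without replacement.
subsample_size <- floor(0.5 * nrow(hotel_data))
hotel_data_subsample_df_4 <- sample_n(hotel_data, subsample_size)

# 1. Length of the subsample
print(nrow(hotel_data_subsample_df_4))

# 2 and 3. First and last five rows for a few columns (few_cols is hypothetical)
few_cols <- c("hotel", "lead_time", "meal")
print(head(hotel_data_subsample_df_4[, few_cols], 5))
print(tail(hotel_data_subsample_df_4[, few_cols], 5))

# 4. Internal structure
str(hotel_data_subsample_df_4)

# 5. Summary statistics
print(summary(hotel_data_subsample_df_4))
```

Because sample_n() draws rows at random, the printed rows change from run to run unless a seed is set with set.seed().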

## [1] 59695
##        hotel lead_time meal market_segment distribution_channel country
## 1 City Hotel       186   SC      Online TA                TA/TO     BRA
## 2 City Hotel       109   BB      Online TA                TA/TO     BEL
## 3 City Hotel        52   SC      Online TA                TA/TO     GBR
## 4 City Hotel       265   BB         Groups                TA/TO     PRT
## 5 City Hotel        14   BB      Online TA                TA/TO     ESP
##            hotel lead_time meal market_segment distribution_channel country
## 59691 City Hotel       352   SC      Online TA                TA/TO     GBR
## 59692 City Hotel       165   BB      Online TA                TA/TO     ITA
## 59693 City Hotel        45   BB      Online TA                TA/TO     DEU
## 59694 City Hotel       156   BB      Online TA                TA/TO     SWE
## 59695 City Hotel        73   BB         Groups                TA/TO     PRT
## 'data.frame':    59695 obs. of  32 variables:
##  $ hotel                         : chr  "Resort Hotel" "City Hotel" "City Hotel" "City Hotel" ...
##  $ is_canceled                   : int  0 0 1 0 1 1 1 1 0 0 ...
##  $ lead_time                     : int  70 151 175 152 5 626 40 195 63 34 ...
##  $ arrival_date_year             : int  2015 2016 2017 2017 2016 2016 2017 2017 2017 2016 ...
##  $ arrival_date_month            : chr  "August" "October" "July" "May" ...
##  $ arrival_date_week_number      : int  34 44 28 18 7 46 5 33 30 50 ...
##  $ arrival_date_day_of_month     : int  22 25 15 4 12 7 31 19 26 5 ...
##  $ stays_in_weekend_nights       : int  2 0 2 0 0 1 0 2 0 2 ...
##  $ stays_in_week_nights          : int  4 4 1 3 2 2 3 1 3 5 ...
##  $ adults                        : int  2 2 2 2 2 2 2 3 2 2 ...
##  $ children                      : int  2 0 0 0 0 0 0 0 2 0 ...
##  $ babies                        : int  0 0 0 0 0 0 0 0 0 0 ...
##  $ meal                          : chr  "HB" "BB" "BB" "BB" ...
##  $ country                       : chr  "PRT" "ROU" "CN" "DEU" ...
##  $ market_segment                : chr  "Direct" "Online TA" "Online TA" "Online TA" ...
##  $ distribution_channel          : chr  "Direct" "TA/TO" "TA/TO" "TA/TO" ...
##  $ is_repeated_guest             : int  0 0 0 0 0 0 0 0 0 0 ...
##  $ previous_cancellations        : int  0 0 0 0 0 0 0 0 0 0 ...
##  $ previous_bookings_not_canceled: int  0 0 0 0 0 0 0 0 0 0 ...
##  $ reserved_room_type            : chr  "C" "A" "A" "D" ...
##  $ assigned_room_type            : chr  "C" "A" "A" "D" ...
##  $ booking_changes               : int  1 0 0 0 0 0 0 1 4 0 ...
##  $ deposit_type                  : chr  "No Deposit" "No Deposit" "No Deposit" "No Deposit" ...
##  $ agent                         : chr  "250" "9" "9" "9" ...
##  $ company                       : chr  "NULL" "NULL" "NULL" "NULL" ...
##  $ days_in_waiting_list          : int  0 0 0 0 0 0 0 0 0 0 ...
##  $ customer_type                 : chr  "Transient" "Transient" "Transient" "Transient-Party" ...
##  $ adr                           : num  211 90.9 107.1 139.5 91 ...
##  $ required_car_parking_spaces   : int  1 0 0 0 0 0 0 0 0 0 ...
##  $ total_of_special_requests     : int  0 1 1 2 1 0 1 0 1 1 ...
##  $ reservation_status            : chr  "Check-Out" "Check-Out" "Canceled" "Check-Out" ...
##  $ reservation_status_date       : chr  "2015-08-28" "2016-10-29" "2017-01-28" "2017-05-07" ...
##     hotel            is_canceled       lead_time     arrival_date_year
##  Length:59695       Min.   :0.0000   Min.   :  0.0   Min.   :2015     
##  Class :character   1st Qu.:0.0000   1st Qu.: 18.0   1st Qu.:2016     
##  Mode  :character   Median :0.0000   Median : 69.0   Median :2016     
##                     Mean   :0.3699   Mean   :104.1   Mean   :2016     
##                     3rd Qu.:1.0000   3rd Qu.:161.0   3rd Qu.:2017     
##                     Max.   :1.0000   Max.   :629.0   Max.   :2017     
##                                                                       
##  arrival_date_month arrival_date_week_number arrival_date_day_of_month
##  Length:59695       Min.   : 1.00            Min.   : 1.00            
##  Class :character   1st Qu.:16.00            1st Qu.: 8.00            
##  Mode  :character   Median :28.00            Median :16.00            
##                     Mean   :27.28            Mean   :15.79            
##                     3rd Qu.:38.00            3rd Qu.:23.00            
##                     Max.   :53.00            Max.   :31.00            
##                                                                       
##  stays_in_weekend_nights stays_in_week_nights     adults      
##  Min.   : 0.0000         Min.   : 0.000       Min.   : 0.000  
##  1st Qu.: 0.0000         1st Qu.: 1.000       1st Qu.: 2.000  
##  Median : 1.0000         Median : 2.000       Median : 2.000  
##  Mean   : 0.9266         Mean   : 2.502       Mean   : 1.851  
##  3rd Qu.: 2.0000         3rd Qu.: 3.000       3rd Qu.: 2.000  
##  Max.   :16.0000         Max.   :41.000       Max.   :55.000  
##                                                               
##     children           babies             meal             country         
##  Min.   : 0.0000   Min.   :0.000000   Length:59695       Length:59695      
##  1st Qu.: 0.0000   1st Qu.:0.000000   Class :character   Class :character  
##  Median : 0.0000   Median :0.000000   Mode  :character   Mode  :character  
##  Mean   : 0.1039   Mean   :0.008292                                        
##  3rd Qu.: 0.0000   3rd Qu.:0.000000                                        
##  Max.   :10.0000   Max.   :9.000000                                        
##  NA's   :4                                                                 
##  market_segment     distribution_channel is_repeated_guest
##  Length:59695       Length:59695         Min.   :0.00000  
##  Class :character   Class :character     1st Qu.:0.00000  
##  Mode  :character   Mode  :character     Median :0.00000  
##                                          Mean   :0.03211  
##                                          3rd Qu.:0.00000  
##                                          Max.   :1.00000  
##                                                           
##  previous_cancellations previous_bookings_not_canceled reserved_room_type
##  Min.   : 0.00000       Min.   : 0.0000                Length:59695      
##  1st Qu.: 0.00000       1st Qu.: 0.0000                Class :character  
##  Median : 0.00000       Median : 0.0000                Mode  :character  
##  Mean   : 0.09446       Mean   : 0.1369                                  
##  3rd Qu.: 0.00000       3rd Qu.: 0.0000                                  
##  Max.   :26.00000       Max.   :72.0000                                  
##                                                                          
##  assigned_room_type booking_changes   deposit_type          agent          
##  Length:59695       Min.   : 0.0000   Length:59695       Length:59695      
##  Class :character   1st Qu.: 0.0000   Class :character   Class :character  
##  Mode  :character   Median : 0.0000   Mode  :character   Mode  :character  
##                     Mean   : 0.2233                                        
##                     3rd Qu.: 0.0000                                        
##                     Max.   :17.0000                                        
##                                                                            
##    company          days_in_waiting_list customer_type           adr       
##  Length:59695       Min.   :  0.000      Length:59695       Min.   :  0.0  
##  Class :character   1st Qu.:  0.000      Class :character   1st Qu.: 69.0  
##  Mode  :character   Median :  0.000      Mode  :character   Median : 95.0  
##                     Mean   :  2.286                         Mean   :101.8  
##                     3rd Qu.:  0.000                         3rd Qu.:126.0  
##                     Max.   :391.000                         Max.   :451.5  
##                                                                            
##  required_car_parking_spaces total_of_special_requests reservation_status
##  Min.   :0.0000              Min.   :0.0000            Length:59695      
##  1st Qu.:0.0000              1st Qu.:0.0000            Class :character  
##  Median :0.0000              Median :0.0000            Mode  :character  
##  Mean   :0.0619              Mean   :0.5735                              
##  3rd Qu.:0.0000              3rd Qu.:1.0000                              
##  Max.   :3.0000              Max.   :5.0000                              
##                                                                          
##  reservation_status_date
##  Length:59695           
##  Class :character       
##  Mode  :character       
##                         
##                         
##                         
## 

Scrutinize #5
In this section, the fifth subsample, sub_sample_hotel_data_5, is scrutinized.
1. Print the length of the subsample hotel_data_subsample_df_5.
2. Print the first five rows (for a few columns) of the subsample.
3. Print the last five rows (for a few columns) of the subsample.
Note: Comparing 2 and 3 gives a quick check of data consistency.
4. Print the internal structure of the subsample.
5. Summarize the subsample.

## [1] 59695
##        hotel meal market_segment distribution_channel country
## 1 City Hotel   SC      Online TA                TA/TO     BRA
## 2 City Hotel   BB      Online TA                TA/TO     BEL
## 3 City Hotel   SC      Online TA                TA/TO     GBR
## 4 City Hotel   BB         Groups                TA/TO     PRT
## 5 City Hotel   BB      Online TA                TA/TO     ESP
##            hotel meal market_segment distribution_channel country
## 59691 City Hotel   SC      Online TA                TA/TO     GBR
## 59692 City Hotel   BB      Online TA                TA/TO     ITA
## 59693 City Hotel   BB      Online TA                TA/TO     DEU
## 59694 City Hotel   BB      Online TA                TA/TO     SWE
## 59695 City Hotel   BB         Groups                TA/TO     PRT
## 'data.frame':    59695 obs. of  32 variables:
##  $ hotel                         : chr  "City Hotel" "Resort Hotel" "City Hotel" "City Hotel" ...
##  $ is_canceled                   : int  1 0 1 1 0 0 0 0 1 0 ...
##  $ lead_time                     : int  414 24 308 53 212 147 287 37 13 3 ...
##  $ arrival_date_year             : int  2015 2016 2016 2016 2017 2016 2015 2016 2017 2016 ...
##  $ arrival_date_month            : chr  "December" "July" "November" "June" ...
##  $ arrival_date_week_number      : int  49 30 48 27 35 38 42 26 4 44 ...
##  $ arrival_date_day_of_month     : int  5 21 25 27 28 12 15 21 24 23 ...
##  $ stays_in_weekend_nights       : int  2 0 0 1 2 1 0 0 8 2 ...
##  $ stays_in_week_nights          : int  1 0 2 2 5 3 3 3 22 1 ...
##  $ adults                        : int  2 2 2 2 2 2 2 2 1 1 ...
##  $ children                      : int  0 0 0 0 0 0 0 0 0 0 ...
##  $ babies                        : int  0 0 0 0 0 0 0 0 0 0 ...
##  $ meal                          : chr  "BB" "BB" "BB" "SC" ...
##  $ country                       : chr  "PRT" "GBR" "PRT" "ESP" ...
##  $ market_segment                : chr  "Groups" "Online TA" "Offline TA/TO" "Online TA" ...
##  $ distribution_channel          : chr  "TA/TO" "TA/TO" "TA/TO" "TA/TO" ...
##  $ is_repeated_guest             : int  0 0 0 0 0 0 0 0 0 0 ...
##  $ previous_cancellations        : int  1 0 0 0 0 0 0 0 0 0 ...
##  $ previous_bookings_not_canceled: int  0 0 0 0 0 0 0 0 0 0 ...
##  $ reserved_room_type            : chr  "A" "A" "A" "A" ...
##  $ assigned_room_type            : chr  "A" "A" "A" "A" ...
##  $ booking_changes               : int  0 0 0 0 0 0 1 0 1 0 ...
##  $ deposit_type                  : chr  "Non Refund" "No Deposit" "Non Refund" "No Deposit" ...
##  $ agent                         : chr  "1" "NULL" "20" "9" ...
##  $ company                       : chr  "NULL" "NULL" "NULL" "NULL" ...
##  $ days_in_waiting_list          : int  0 0 0 0 0 0 0 0 0 0 ...
##  $ customer_type                 : chr  "Transient" "Transient" "Transient" "Transient" ...
##  $ adr                           : num  62 0 52 101.1 92.2 ...
##  $ required_car_parking_spaces   : int  0 0 0 0 0 0 0 0 0 0 ...
##  $ total_of_special_requests     : int  0 0 0 0 1 1 0 1 1 0 ...
##  $ reservation_status            : chr  "Canceled" "Check-Out" "Canceled" "Canceled" ...
##  $ reservation_status_date       : chr  "2015-07-23" "2016-07-21" "2016-03-15" "2016-05-09" ...
##     hotel            is_canceled       lead_time     arrival_date_year
##  Length:59695       Min.   :0.0000   Min.   :  0.0   Min.   :2015     
##  Class :character   1st Qu.:0.0000   1st Qu.: 18.0   1st Qu.:2016     
##  Mode  :character   Median :0.0000   Median : 70.0   Median :2016     
##                     Mean   :0.3744   Mean   :104.8   Mean   :2016     
##                     3rd Qu.:1.0000   3rd Qu.:161.0   3rd Qu.:2017     
##                     Max.   :1.0000   Max.   :629.0   Max.   :2017     
##                                                                       
##  arrival_date_month arrival_date_week_number arrival_date_day_of_month
##  Length:59695       Min.   : 1.00            Min.   : 1.00            
##  Class :character   1st Qu.:16.00            1st Qu.: 8.00            
##  Mode  :character   Median :28.00            Median :16.00            
##                     Mean   :27.23            Mean   :15.73            
##                     3rd Qu.:38.00            3rd Qu.:23.00            
##                     Max.   :53.00            Max.   :31.00            
##                                                                       
##  stays_in_weekend_nights stays_in_week_nights     adults          children     
##  Min.   : 0.0000         Min.   : 0.000       Min.   : 0.000   Min.   :0.0000  
##  1st Qu.: 0.0000         1st Qu.: 1.000       1st Qu.: 2.000   1st Qu.:0.0000  
##  Median : 1.0000         Median : 2.000       Median : 2.000   Median :0.0000  
##  Mean   : 0.9312         Mean   : 2.504       Mean   : 1.859   Mean   :0.1034  
##  3rd Qu.: 2.0000         3rd Qu.: 3.000       3rd Qu.: 2.000   3rd Qu.:0.0000  
##  Max.   :18.0000         Max.   :42.000       Max.   :50.000   Max.   :3.0000  
##                                                                NA's   :1       
##      babies              meal             country          market_segment    
##  Min.   : 0.000000   Length:59695       Length:59695       Length:59695      
##  1st Qu.: 0.000000   Class :character   Class :character   Class :character  
##  Median : 0.000000   Mode  :character   Mode  :character   Mode  :character  
##  Mean   : 0.007639                                                           
##  3rd Qu.: 0.000000                                                           
##  Max.   :10.000000                                                           
##                                                                              
##  distribution_channel is_repeated_guest previous_cancellations
##  Length:59695         Min.   :0.00000   Min.   : 0.00000      
##  Class :character     1st Qu.:0.00000   1st Qu.: 0.00000      
##  Mode  :character     Median :0.00000   Median : 0.00000      
##                       Mean   :0.03176   Mean   : 0.09393      
##                       3rd Qu.:0.00000   3rd Qu.: 0.00000      
##                       Max.   :1.00000   Max.   :26.00000      
##                                                               
##  previous_bookings_not_canceled reserved_room_type assigned_room_type
##  Min.   : 0.0000                Length:59695       Length:59695      
##  1st Qu.: 0.0000                Class :character   Class :character  
##  Median : 0.0000                Mode  :character   Mode  :character  
##  Mean   : 0.1389                                                     
##  3rd Qu.: 0.0000                                                     
##  Max.   :72.0000                                                     
##                                                                      
##  booking_changes   deposit_type          agent             company         
##  Min.   : 0.0000   Length:59695       Length:59695       Length:59695      
##  1st Qu.: 0.0000   Class :character   Class :character   Class :character  
##  Median : 0.0000   Mode  :character   Mode  :character   Mode  :character  
##  Mean   : 0.2235                                                           
##  3rd Qu.: 0.0000                                                           
##  Max.   :20.0000                                                           
##                                                                            
##  days_in_waiting_list customer_type           adr         
##  Min.   :  0.00       Length:59695       Min.   :   0.00  
##  1st Qu.:  0.00       Class :character   1st Qu.:  69.29  
##  Median :  0.00       Mode  :character   Median :  95.00  
##  Mean   :  2.39                          Mean   : 102.08  
##  3rd Qu.:  0.00                          3rd Qu.: 126.00  
##  Max.   :391.00                          Max.   :5400.00  
##                                                           
##  required_car_parking_spaces total_of_special_requests reservation_status
##  Min.   :0.00000             Min.   :0.0000            Length:59695      
##  1st Qu.:0.00000             1st Qu.:0.0000            Class :character  
##  Median :0.00000             Median :0.0000            Mode  :character  
##  Mean   :0.06364             Mean   :0.5694                              
##  3rd Qu.:0.00000             3rd Qu.:1.0000                              
##  Max.   :8.00000             Max.   :5.0000                              
##                                                                          
##  reservation_status_date
##  Length:59695           
##  Class :character       
##  Mode  :character       
##                         
##                         
##                         
## 

Box plot for the numeric variable ‘lead_time’ in each subsample.

The graph above shows the distribution of lead_time across all subsamples.
Note: in the plot above, the labels 1 to 5 correspond to subsample-1 through subsample-5.
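
A box plot of this kind could be produced roughly as follows. This is a sketch under assumptions: the toy hotel_data frame, the subsamples list, and the combined data frame are illustrative names, not the report's own code.

```r
library(dplyr)
library(ggplot2)

# Toy stand-in for the lead_time column of hotel_data (assumption).
set.seed(42)
hotel_data <- data.frame(lead_time = rpois(500, 100))

# Five 50% subsamples, named "1" to "5" to match the plot labels.
subsample_size <- floor(0.5 * nrow(hotel_data))
subsamples <- lapply(1:5, function(i) sample_n(hotel_data, subsample_size))
names(subsamples) <- as.character(1:5)

# Stack the subsamples; the list names become the "subsample" column.
combined <- bind_rows(subsamples, .id = "subsample")

# One box per subsample.
lead_time_plot <- ggplot(combined, aes(x = subsample, y = lead_time)) +
  geom_boxplot() +
  labs(title = "Lead time by subsample", x = "Subsample", y = "Lead time")
print(lead_time_plot)
```

Boxes with similar medians and interquartile ranges suggest that the subsamples agree on the bulk of the lead_time distribution, while the whiskers and outlier points show where they differ in the tails.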



Ask: How did you approach each subsample with respect to anomalies?
Anomaly detection for subsample data frames.

## # A tibble: 2 × 2
##   hotel        count
##   <chr>        <int>
## 1 City Hotel   39534
## 2 Resort Hotel 20161
## # A tibble: 2 × 2
##   hotel        count
##   <chr>        <int>
## 1 City Hotel   39747
## 2 Resort Hotel 19948
## # A tibble: 2 × 2
##   hotel        count
##   <chr>        <int>
## 1 City Hotel   39494
## 2 Resort Hotel 20201
## # A tibble: 2 × 2
##   hotel        count
##   <chr>        <int>
## 1 City Hotel   39618
## 2 Resort Hotel 20077
## # A tibble: 2 × 2
##   hotel        count
##   <chr>        <int>
## 1 City Hotel   39607
## 2 Resort Hotel 20088

In the section above, the subsamples were created with the sample_n() function of the dplyr package, each 50% of the size of the original data set.
For anomaly detection, we grouped the data by “hotel” and then calculated the count of observations for each hotel type within each subsample.
Observations:
1. Each subsample contains both hotel types, but the count per type differs slightly between subsamples.
2. Reading the tables above in subsample order, Resort Hotel bookings are most frequent in hotel_data_subsample_df_3 (20201) and hotel_data_subsample_df_1 (20161), and least frequent in hotel_data_subsample_df_2 (19948). The spread is small: Resort Hotel makes up roughly 33.4% to 33.8% of every subsample.
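
The grouping and counting step can be sketched as below. The toy hotel_data frame and the name hotel_counts are assumptions for illustration; only sample_n() and the “hotel” grouping come from the description above.

```r
library(dplyr)

# Toy stand-in for the hotel column of hotel_data (assumption).
set.seed(7)
hotel_data <- data.frame(
  hotel = sample(c("City Hotel", "Resort Hotel"), 1000,
                 replace = TRUE, prob = c(2 / 3, 1 / 3)),
  stringsAsFactors = FALSE
)

# One 50% subsample.
hotel_data_subsample_df_1 <- sample_n(hotel_data, floor(0.5 * nrow(hotel_data)))

# Count observations per hotel type and add each type's share of the subsample.
hotel_counts <- hotel_data_subsample_df_1 %>%
  group_by(hotel) %>%
  summarise(count = n()) %>%
  mutate(share = count / sum(count))
print(hotel_counts)
```

Comparing the share column across subsamples is easier than comparing raw counts, since it shows directly how far each subsample drifts from the others.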

Determining frequency tables for consistency analysis

Ask: How did you approach each subsample with respect to consistency analysis?

  1. Create a frequency table of the “country” variable for each subsample (using the table() function) and assign each table to a variable.
  2. For the consistency analysis, compare the distributions of countries across the subsamples. Consistency implies that the distribution of countries remains relatively stable across the subsamples.
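
The two steps above can be sketched as follows. The toy data, the five-country list, and the names freq_tables and proportions are assumptions; the real “country” variable has many more levels.

```r
# Toy stand-in for the country column of hotel_data (assumption).
set.seed(3)
hotel_data <- data.frame(
  country = sample(c("PRT", "GBR", "FRA", "ESP", "DEU"), 1000,
                   replace = TRUE, prob = c(0.40, 0.20, 0.15, 0.15, 0.10)),
  stringsAsFactors = FALSE
)

# Five 50% subsamples drawn without replacement.
subsample_size <- floor(0.5 * nrow(hotel_data))
subsamples <- lapply(1:5, function(i) {
  hotel_data[sample(nrow(hotel_data), subsample_size), , drop = FALSE]
})

# Step 1: one frequency table per subsample, over a shared set of levels
# so that the tables line up even if a country is missing from a subsample.
country_levels <- sort(unique(hotel_data$country))
freq_tables <- lapply(subsamples, function(df) {
  table(factor(df$country, levels = country_levels))
})

# Step 2: convert counts to proportions and place them side by side.
proportions <- sapply(freq_tables, function(tab) as.numeric(prop.table(tab)))
rownames(proportions) <- country_levels
colnames(proportions) <- paste0("subsample_", 1:5)
print(round(proportions, 3))
```

Rows whose values stay close across the five columns indicate a stable country distribution.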

Performing a Monte Carlo simulation of the chi-square test to determine consistency.

## Observed Chi-Square Test Statistic: 1729186
## P-value of sub sample 1: 0.444

## Observed Chi-Square Test Statistic: 1651152
## P-value of sub sample 2: 0.436

## Observed Chi-Square Test Statistic: 1723060
## P-value of sub sample 3: 0.452

## Observed Chi-Square Test Statistic: 1748038
## P-value of sub sample 4: 0.45

## Observed Chi-Square Test Statistic: 1711324
## P-value of sub sample 5: 0.44

  1. The chi-square test was run on every subsample to assess consistency across the subsamples.
  2. Every p-value lies between 0.436 and 0.452, well above 0.1, so we cannot conclude that a significant difference exists.
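
One way to run such a test is a chi-square goodness-of-fit test with a simulated p-value, comparing a subsample's country counts with the proportions in the full data set. This is a sketch under assumptions, not necessarily the procedure used in this report: the toy data, the reference proportions full_props, and the choice of B = 2000 replicates are illustrative.

```r
# Toy stand-in for the country column of hotel_data (assumption).
set.seed(11)
hotel_data <- data.frame(
  country = sample(c("PRT", "GBR", "FRA", "ESP", "DEU"), 2000,
                   replace = TRUE, prob = c(0.40, 0.20, 0.15, 0.15, 0.10)),
  stringsAsFactors = FALSE
)
country_levels <- sort(unique(hotel_data$country))

# Reference proportions taken from the full data set.
full_props <- as.numeric(
  prop.table(table(factor(hotel_data$country, levels = country_levels)))
)

# Observed counts in one 50% subsample.
subsample <- hotel_data[sample(nrow(hotel_data), 1000), , drop = FALSE]
observed <- as.numeric(table(factor(subsample$country, levels = country_levels)))

# simulate.p.value = TRUE obtains the p-value by Monte Carlo simulation
# with B replicates instead of the chi-square approximation.
result <- chisq.test(x = observed, p = full_props,
                     simulate.p.value = TRUE, B = 2000)
cat("Observed Chi-Square Test Statistic:", unname(result$statistic), "\n")
cat("P-value:", result$p.value, "\n")
```

A large p-value means the subsample's country counts are compatible with the reference proportions; it does not prove that the distributions are identical.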

Monte Carlo simulations of data set hotel_data_subsample_df_1

Monte Carlo Simulations

A Monte Carlo simulation histogram is a graphical representation of the frequency distribution of a data set built from random numbers generated by Monte Carlo simulation.

In the histogram above, the title of the graph is “Monte Carlo Simulation of Lead Time”, which indicates that the plot represents the distribution of lead times derived from the Monte Carlo simulation. The x-axis label, Lead Time, specifies the variable being measured.

The height of each bar represents the frequency of lead times falling within a specific range. A higher bar means more lead times fall within that range.
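
A histogram of this kind could be generated as sketched below. The simulation scheme is an assumption: here the simulated lead times are drawn by resampling observed values with replacement, which may differ from the method used in this report. The toy lead_time vector, n_sim, and the bin count are also illustrative.

```r
library(ggplot2)

# Toy stand-in for hotel_data_subsample_df_1$lead_time (assumption).
set.seed(123)
lead_time <- rpois(1000, 100)

# Monte Carlo step: draw n_sim random lead times from the observed values.
n_sim <- 10000
simulated <- data.frame(lead_time = sample(lead_time, n_sim, replace = TRUE))

# Each bar counts the simulated lead times that fall into one bin.
lead_time_hist <- ggplot(simulated, aes(x = lead_time)) +
  geom_histogram(bins = 30) +
  labs(title = "Monte Carlo Simulation of Lead Time",
       x = "Lead Time", y = "Frequency")
print(lead_time_hist)
```

Changing bins alters the width of the ranges and therefore the bar heights, so bar heights are only comparable between plots that use the same binning.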

The histogram can be compared to expected or historical lead time data. Discrepancies may highlight areas where the simulation differs from real-world observations, prompting further investigation or refinement of the simulation model.

The insights gained from the histogram can be used for decision support, especially in scenarios where lead time variability is a critical factor. Understanding the distribution helps in making informed decisions and developing strategies to manage lead time uncertainties.

Thank you!