1. Initial setup and configuration of the data set.
2. Load the data set file into the variable hotel_data.
3. Data set - Hotels: this data comes from the open hotel booking demand data set by Antonio, Almeida and Nunes.
knitr::opts_chunk$set(echo = FALSE)
# Load the 'dplyr' library
library(dplyr)
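# Load the 'ggplot2' library for plotting
library(ggplot2)
# Load the data into hotel_data for further use.
# Alternative: choose the file interactively with read.csv(file.choose())
# Adjust the path below to the location of hotels.csv on your machine.
hotel_data <- read.csv('C:/Users/amitg/Documents/workspaceR/data/hotels.csv')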

In this section:
1. The data set hotel_data is summarized.
2. The length of hotel_data (its number of rows) is found with nrow() and assigned to the variable hotel_data_length.
3. The size of the subsample (50% of hotel_data_length) is calculated and printed.

##     hotel            is_canceled       lead_time   arrival_date_year
##  Length:119390      Min.   :0.0000   Min.   :  0   Min.   :2015     
##  Class :character   1st Qu.:0.0000   1st Qu.: 18   1st Qu.:2016     
##  Mode  :character   Median :0.0000   Median : 69   Median :2016     
##                     Mean   :0.3704   Mean   :104   Mean   :2016     
##                     3rd Qu.:1.0000   3rd Qu.:160   3rd Qu.:2017     
##                     Max.   :1.0000   Max.   :737   Max.   :2017     
##                                                                     
##  arrival_date_month arrival_date_week_number arrival_date_day_of_month
##  Length:119390      Min.   : 1.00            Min.   : 1.0             
##  Class :character   1st Qu.:16.00            1st Qu.: 8.0             
##  Mode  :character   Median :28.00            Median :16.0             
##                     Mean   :27.17            Mean   :15.8             
##                     3rd Qu.:38.00            3rd Qu.:23.0             
##                     Max.   :53.00            Max.   :31.0             
##                                                                       
##  stays_in_weekend_nights stays_in_week_nights     adults      
##  Min.   : 0.0000         Min.   : 0.0         Min.   : 0.000  
##  1st Qu.: 0.0000         1st Qu.: 1.0         1st Qu.: 2.000  
##  Median : 1.0000         Median : 2.0         Median : 2.000  
##  Mean   : 0.9276         Mean   : 2.5         Mean   : 1.856  
##  3rd Qu.: 2.0000         3rd Qu.: 3.0         3rd Qu.: 2.000  
##  Max.   :19.0000         Max.   :50.0         Max.   :55.000  
##                                                               
##     children           babies              meal             country         
##  Min.   : 0.0000   Min.   : 0.000000   Length:119390      Length:119390     
##  1st Qu.: 0.0000   1st Qu.: 0.000000   Class :character   Class :character  
##  Median : 0.0000   Median : 0.000000   Mode  :character   Mode  :character  
##  Mean   : 0.1039   Mean   : 0.007949                                        
##  3rd Qu.: 0.0000   3rd Qu.: 0.000000                                        
##  Max.   :10.0000   Max.   :10.000000                                        
##  NA's   :4                                                                  
##  market_segment     distribution_channel is_repeated_guest
##  Length:119390      Length:119390        Min.   :0.00000  
##  Class :character   Class :character     1st Qu.:0.00000  
##  Mode  :character   Mode  :character     Median :0.00000  
##                                          Mean   :0.03191  
##                                          3rd Qu.:0.00000  
##                                          Max.   :1.00000  
##                                                           
##  previous_cancellations previous_bookings_not_canceled reserved_room_type
##  Min.   : 0.00000       Min.   : 0.0000                Length:119390     
##  1st Qu.: 0.00000       1st Qu.: 0.0000                Class :character  
##  Median : 0.00000       Median : 0.0000                Mode  :character  
##  Mean   : 0.08712       Mean   : 0.1371                                  
##  3rd Qu.: 0.00000       3rd Qu.: 0.0000                                  
##  Max.   :26.00000       Max.   :72.0000                                  
##                                                                          
##  assigned_room_type booking_changes   deposit_type          agent          
##  Length:119390      Min.   : 0.0000   Length:119390      Length:119390     
##  Class :character   1st Qu.: 0.0000   Class :character   Class :character  
##  Mode  :character   Median : 0.0000   Mode  :character   Mode  :character  
##                     Mean   : 0.2211                                        
##                     3rd Qu.: 0.0000                                        
##                     Max.   :21.0000                                        
##                                                                            
##    company          days_in_waiting_list customer_type           adr         
##  Length:119390      Min.   :  0.000      Length:119390      Min.   :  -6.38  
##  Class :character   1st Qu.:  0.000      Class :character   1st Qu.:  69.29  
##  Mode  :character   Median :  0.000      Mode  :character   Median :  94.58  
##                     Mean   :  2.321                         Mean   : 101.83  
##                     3rd Qu.:  0.000                         3rd Qu.: 126.00  
##                     Max.   :391.000                         Max.   :5400.00  
##                                                                              
##  required_car_parking_spaces total_of_special_requests reservation_status
##  Min.   :0.00000             Min.   :0.0000            Length:119390     
##  1st Qu.:0.00000             1st Qu.:0.0000            Class :character  
##  Median :0.00000             Median :0.0000            Mode  :character  
##  Mean   :0.06252             Mean   :0.5714                              
##  3rd Qu.:0.00000             3rd Qu.:1.0000                              
##  Max.   :8.00000             Max.   :5.0000                              
##                                                                          
##  reservation_status_date
##  Length:119390          
##  Class :character       
##  Mode  :character       
##                         
##                         
##                         
## 
## Data set size: 119390
## Subsample size: 59695
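The three steps above can be sketched in R as below. This is a minimal sketch, not the hidden chunk that produced the output: the helper name subsample_size_for() is hypothetical, and using floor() to keep the size a whole number when the row count is odd is an assumption.

```r
# Hypothetical helper: report the data set length and a fractional subsample size.
subsample_size_for <- function(df, fraction = 0.5) {
  # nrow() gives the number of rows, i.e. the "length" of the data set
  data_length <- nrow(df)
  # floor() keeps the size an integer if the row count is odd (assumption)
  size <- floor(data_length * fraction)
  cat("Data set size:", data_length, "\n")
  cat("Subsample size:", size, "\n")
  invisible(size)
}
```

With the data loaded, summary(hotel_data) covers step 1, and hotel_data_length <- nrow(hotel_data) followed by subsample_size <- subsample_size_for(hotel_data) covers steps 2 and 3; for the 119390-row file this prints 119390 and 59695.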

In this section, 5 subsamples are created, each of size subsample_size (50% of hotel_data_length).
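One way to draw the 5 subsamples is sketched below. The original code is not shown, so this is an assumption-laden sketch: make_subsamples() is a hypothetical helper, each subsample is drawn without replacement, and the optional seed is only there for reproducibility. Row names are reset so each subsample is numbered 1 to size.

```r
# Hypothetical helper: draw n_samples random subsamples of a fixed size.
make_subsamples <- function(df, size, n_samples = 5, seed = NULL) {
  # Optional seed so the same subsamples can be reproduced
  if (!is.null(seed)) set.seed(seed)
  lapply(seq_len(n_samples), function(i) {
    # Row indices drawn without replacement within one subsample
    idx <- sample.int(nrow(df), size)
    sub <- df[idx, , drop = FALSE]
    # Renumber the rows 1..size
    rownames(sub) <- NULL
    sub
  })
}
```

For example, subsamples <- make_subsamples(hotel_data, subsample_size, seed = 1) followed by hotel_data_subsample_df_1 <- subsamples[[1]] gives the first subsample. Sampling without replacement inside each subsample means no booking appears twice in the same subsample, although the same booking can appear in several subsamples.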

Scrutinize #1

In this section, the first subsample, hotel_data_subsample_df_1, is scrutinized:
1. Print the length (number of rows) of hotel_data_subsample_df_1.
2. Print the first five rows of the subsample (for a few columns).
3. Print the last five rows of the subsample (for a few columns).
Note: steps 2 and 3 give a quick visual spot check of data consistency.
4. Print the internal structure of the subsample.
5. Summarize the subsample.
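Steps 1 to 5 can be sketched as a single R function. scrutinize() is a hypothetical helper, and its default columns simply mirror the ones printed below. Passing the subsample in as an argument, rather than repeating the calls in every section, makes it harder for one section to accidentally inspect another section's subsample.

```r
# Hypothetical helper: run steps 1-5 on one subsample.
scrutinize <- function(sub, cols = c("hotel", "lead_time", "meal",
                                     "market_segment", "distribution_channel",
                                     "country")) {
  print(nrow(sub))                             # 1. length (number of rows)
  print(head(sub[, cols, drop = FALSE], 5))    # 2. first five rows, selected columns
  print(tail(sub[, cols, drop = FALSE], 5))    # 3. last five rows, selected columns
  str(sub)                                     # 4. internal structure
  print(summary(sub))                          # 5. summary statistics
  invisible(sub)
}
```

Each section then reduces to one call, for example scrutinize(hotel_data_subsample_df_1).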

## [1] 59695
##          hotel lead_time meal market_segment distribution_channel country
## 1   City Hotel       208   BB      Online TA                TA/TO     BEL
## 2   City Hotel       164   BB         Groups                TA/TO     PRT
## 3   City Hotel        41   HB  Offline TA/TO                TA/TO     ITA
## 4   City Hotel        43   HB         Groups                TA/TO     PRT
## 5 Resort Hotel        80   HB  Offline TA/TO                TA/TO     PRT
##              hotel lead_time meal market_segment distribution_channel country
## 59691   City Hotel        10   SC      Online TA                TA/TO     NLD
## 59692   City Hotel         3   BB      Corporate            Corporate     PRT
## 59693 Resort Hotel        74   BB      Online TA                TA/TO     PRT
## 59694 Resort Hotel        10   BB      Online TA                TA/TO     FRA
## 59695   City Hotel        41   HB  Offline TA/TO                TA/TO     ITA
## 'data.frame':    59695 obs. of  32 variables:
##  $ hotel                         : chr  "City Hotel" "City Hotel" "City Hotel" "City Hotel" ...
##  $ is_canceled                   : int  0 1 0 1 0 1 0 0 0 1 ...
##  $ lead_time                     : int  208 164 41 43 80 341 54 72 116 247 ...
##  $ arrival_date_year             : int  2017 2017 2015 2015 2015 2015 2016 2016 2016 2015 ...
##  $ arrival_date_month            : chr  "June" "May" "September" "July" ...
##  $ arrival_date_week_number      : int  22 20 36 27 34 39 46 12 20 41 ...
##  $ arrival_date_day_of_month     : int  1 15 4 3 17 23 11 16 12 9 ...
##  $ stays_in_weekend_nights       : int  1 1 0 0 2 0 2 0 2 1 ...
##  $ stays_in_week_nights          : int  3 2 1 2 5 2 3 3 3 2 ...
##  $ adults                        : int  3 1 2 1 2 2 2 2 2 1 ...
##  $ children                      : int  0 0 0 0 0 0 0 0 0 0 ...
##  $ babies                        : int  0 0 0 0 0 0 0 0 0 0 ...
##  $ meal                          : chr  "BB" "BB" "HB" "HB" ...
##  $ country                       : chr  "BEL" "PRT" "ITA" "PRT" ...
##  $ market_segment                : chr  "Online TA" "Groups" "Offline TA/TO" "Groups" ...
##  $ distribution_channel          : chr  "TA/TO" "TA/TO" "TA/TO" "TA/TO" ...
##  $ is_repeated_guest             : int  0 0 0 0 0 0 0 0 0 0 ...
##  $ previous_cancellations        : int  0 0 0 0 0 1 0 0 0 1 ...
##  $ previous_bookings_not_canceled: int  0 0 0 0 0 0 0 0 0 0 ...
##  $ reserved_room_type            : chr  "D" "A" "A" "A" ...
##  $ assigned_room_type            : chr  "D" "A" "A" "A" ...
##  $ booking_changes               : int  0 0 0 0 0 0 0 0 2 0 ...
##  $ deposit_type                  : chr  "No Deposit" "Non Refund" "No Deposit" "No Deposit" ...
##  $ agent                         : chr  "7" "NULL" "39" "1" ...
##  $ company                       : chr  "NULL" "NULL" "NULL" "NULL" ...
##  $ days_in_waiting_list          : int  0 0 38 0 0 0 0 0 0 0 ...
##  $ customer_type                 : chr  "Transient" "Transient" "Transient-Party" "Transient-Party" ...
##  $ adr                           : num  128 160 110 63 133 ...
##  $ required_car_parking_spaces   : int  0 0 0 0 0 0 0 0 0 0 ...
##  $ total_of_special_requests     : int  1 0 0 0 0 0 1 1 1 0 ...
##  $ reservation_status            : chr  "Check-Out" "Canceled" "Check-Out" "Canceled" ...
##  $ reservation_status_date       : chr  "2017-06-05" "2017-01-31" "2015-09-05" "2015-06-16" ...
##     hotel            is_canceled       lead_time     arrival_date_year
##  Length:59695       Min.   :0.0000   Min.   :  0.0   Min.   :2015     
##  Class :character   1st Qu.:0.0000   1st Qu.: 18.0   1st Qu.:2016     
##  Mode  :character   Median :0.0000   Median : 69.0   Median :2016     
##                     Mean   :0.3694   Mean   :104.2   Mean   :2016     
##                     3rd Qu.:1.0000   3rd Qu.:160.0   3rd Qu.:2017     
##                     Max.   :1.0000   Max.   :737.0   Max.   :2017     
##  arrival_date_month arrival_date_week_number arrival_date_day_of_month
##  Length:59695       Min.   : 1.00            Min.   : 1.00            
##  Class :character   1st Qu.:16.00            1st Qu.: 8.00            
##  Mode  :character   Median :28.00            Median :16.00            
##                     Mean   :27.23            Mean   :15.78            
##                     3rd Qu.:38.00            3rd Qu.:23.00            
##                     Max.   :53.00            Max.   :31.00            
##  stays_in_weekend_nights stays_in_week_nights     adults          children     
##  Min.   : 0.000          Min.   : 0.000       Min.   : 0.000   Min.   :0.0000  
##  1st Qu.: 0.000          1st Qu.: 1.000       1st Qu.: 2.000   1st Qu.:0.0000  
##  Median : 1.000          Median : 2.000       Median : 2.000   Median :0.0000  
##  Mean   : 0.927          Mean   : 2.494       Mean   : 1.857   Mean   :0.1042  
##  3rd Qu.: 2.000          3rd Qu.: 3.000       3rd Qu.: 2.000   3rd Qu.:0.0000  
##  Max.   :16.000          Max.   :41.000       Max.   :50.000   Max.   :3.0000  
##      babies             meal             country          market_segment    
##  Min.   :0.000000   Length:59695       Length:59695       Length:59695      
##  1st Qu.:0.000000   Class :character   Class :character   Class :character  
##  Median :0.000000   Mode  :character   Mode  :character   Mode  :character  
##  Mean   :0.007907                                                           
##  3rd Qu.:0.000000                                                           
##  Max.   :2.000000                                                           
##  distribution_channel is_repeated_guest previous_cancellations
##  Length:59695         Min.   :0.00000   Min.   : 0.0000       
##  Class :character     1st Qu.:0.00000   1st Qu.: 0.0000       
##  Mode  :character     Median :0.00000   Median : 0.0000       
##                       Mean   :0.03195   Mean   : 0.0883       
##                       3rd Qu.:0.00000   3rd Qu.: 0.0000       
##                       Max.   :1.00000   Max.   :26.0000       
##  previous_bookings_not_canceled reserved_room_type assigned_room_type
##  Min.   : 0.0000                Length:59695       Length:59695      
##  1st Qu.: 0.0000                Class :character   Class :character  
##  Median : 0.0000                Mode  :character   Mode  :character  
##  Mean   : 0.1362                                                     
##  3rd Qu.: 0.0000                                                     
##  Max.   :72.0000                                                     
##  booking_changes   deposit_type          agent             company         
##  Min.   : 0.0000   Length:59695       Length:59695       Length:59695      
##  1st Qu.: 0.0000   Class :character   Class :character   Class :character  
##  Median : 0.0000   Mode  :character   Mode  :character   Mode  :character  
##  Mean   : 0.2227                                                           
##  3rd Qu.: 0.0000                                                           
##  Max.   :20.0000                                                           
##  days_in_waiting_list customer_type           adr        
##  Min.   :  0.00       Length:59695       Min.   :  0.00  
##  1st Qu.:  0.00       Class :character   1st Qu.: 69.00  
##  Median :  0.00       Mode  :character   Median : 94.78  
##  Mean   :  2.26                          Mean   :101.95  
##  3rd Qu.:  0.00                          3rd Qu.:126.00  
##  Max.   :391.00                          Max.   :510.00  
##  required_car_parking_spaces total_of_special_requests reservation_status
##  Min.   :0.00000             Min.   :0.0000            Length:59695      
##  1st Qu.:0.00000             1st Qu.:0.0000            Class :character  
##  Median :0.00000             Median :0.0000            Mode  :character  
##  Mean   :0.06331             Mean   :0.5731                              
##  3rd Qu.:0.00000             3rd Qu.:1.0000                              
##  Max.   :8.00000             Max.   :5.0000                              
##  reservation_status_date
##  Length:59695           
##  Class :character       
##  Mode  :character       
##                         
##                         
## 

Scrutinize #2
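In this section, the second subsample, hotel_data_subsample_df_2, is scrutinized:
1. Print the length (number of rows) of hotel_data_subsample_df_2.
2. Print the first five rows of the subsample (for a few columns).
3. Print the last five rows of the subsample (for a few columns).
Note: steps 2 and 3 give a quick visual spot check of data consistency. The rows printed below are identical to those printed for the first subsample, while the structure output differs, so they appear to come from hotel_data_subsample_df_1 rather than from this subsample.
4. Print the internal structure of the subsample.
5. Summarize the subsample.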
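Steps 2 and 3 only eyeball a handful of rows. A slightly stronger check is sketched below; check_subsample_consistency() is a hypothetical helper, and treating "consistency" as matching column names, matching column classes and the expected row count is an assumption about what the check should cover.

```r
# Hypothetical helper: structural consistency of a subsample with the full data.
check_subsample_consistency <- function(sub, full, expected_size) {
  # Same columns in the same order
  same_cols <- identical(names(sub), names(full))
  # Same (first) class for every column, e.g. chr stays chr, int stays int
  col_class <- function(df) vapply(df, function(col) class(col)[1], character(1))
  same_classes <- identical(col_class(sub), col_class(full))
  # Expected number of rows
  right_size <- nrow(sub) == expected_size
  c(same_columns = same_cols, same_classes = same_classes, right_size = right_size)
}
```

For example, check_subsample_consistency(hotel_data_subsample_df_2, hotel_data, subsample_size) should return TRUE for all three checks if the subsample was drawn as described.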

## [1] 59695
##          hotel lead_time meal market_segment distribution_channel country
## 1   City Hotel       208   BB      Online TA                TA/TO     BEL
## 2   City Hotel       164   BB         Groups                TA/TO     PRT
## 3   City Hotel        41   HB  Offline TA/TO                TA/TO     ITA
## 4   City Hotel        43   HB         Groups                TA/TO     PRT
## 5 Resort Hotel        80   HB  Offline TA/TO                TA/TO     PRT
##              hotel lead_time meal market_segment distribution_channel country
## 59691   City Hotel        10   SC      Online TA                TA/TO     NLD
## 59692   City Hotel         3   BB      Corporate            Corporate     PRT
## 59693 Resort Hotel        74   BB      Online TA                TA/TO     PRT
## 59694 Resort Hotel        10   BB      Online TA                TA/TO     FRA
## 59695   City Hotel        41   HB  Offline TA/TO                TA/TO     ITA
## 'data.frame':    59695 obs. of  32 variables:
##  $ hotel                         : chr  "City Hotel" "Resort Hotel" "Resort Hotel" "City Hotel" ...
##  $ is_canceled                   : int  1 0 0 0 0 0 1 0 0 0 ...
##  $ lead_time                     : int  25 100 16 45 11 11 62 11 62 10 ...
##  $ arrival_date_year             : int  2016 2016 2016 2016 2017 2017 2015 2015 2017 2016 ...
##  $ arrival_date_month            : chr  "March" "July" "May" "January" ...
##  $ arrival_date_week_number      : int  14 29 22 4 29 12 30 31 17 23 ...
##  $ arrival_date_day_of_month     : int  29 15 22 23 21 21 19 1 24 31 ...
##  $ stays_in_weekend_nights       : int  0 0 2 1 2 0 2 4 1 0 ...
##  $ stays_in_week_nights          : int  4 1 1 1 3 3 5 10 3 3 ...
##  $ adults                        : int  2 2 1 2 2 2 2 1 2 2 ...
##  $ children                      : int  2 2 0 0 0 0 0 0 0 0 ...
##  $ babies                        : int  0 0 0 0 0 0 0 0 0 0 ...
##  $ meal                          : chr  "BB" "BB" "BB" "BB" ...
##  $ country                       : chr  "GBR" "NLD" "PRT" "BEL" ...
##  $ market_segment                : chr  "Online TA" "Online TA" "Direct" "Online TA" ...
##  $ distribution_channel          : chr  "TA/TO" "TA/TO" "Corporate" "TA/TO" ...
##  $ is_repeated_guest             : int  0 0 0 0 0 0 0 0 0 0 ...
##  $ previous_cancellations        : int  0 0 1 0 0 0 0 0 0 0 ...
##  $ previous_bookings_not_canceled: int  0 0 5 0 0 0 0 0 0 0 ...
##  $ reserved_room_type            : chr  "F" "G" "A" "A" ...
##  $ assigned_room_type            : chr  "F" "G" "A" "A" ...
##  $ booking_changes               : int  0 3 1 0 0 0 0 0 0 0 ...
##  $ deposit_type                  : chr  "No Deposit" "No Deposit" "No Deposit" "No Deposit" ...
##  $ agent                         : chr  "9" "240" "NULL" "9" ...
##  $ company                       : chr  "NULL" "NULL" "292" "NULL" ...
##  $ days_in_waiting_list          : int  0 0 0 0 0 0 0 0 0 0 ...
##  $ customer_type                 : chr  "Transient" "Transient-Party" "Transient" "Transient-Party" ...
##  $ adr                           : num  216 226 70 80.3 123.2 ...
##  $ required_car_parking_spaces   : int  0 0 0 0 0 0 0 0 0 0 ...
##  $ total_of_special_requests     : int  0 1 0 0 1 0 1 0 1 1 ...
##  $ reservation_status            : chr  "Canceled" "Check-Out" "Check-Out" "Check-Out" ...
##  $ reservation_status_date       : chr  "2016-03-26" "2016-07-16" "2016-05-25" "2016-01-25" ...
##     hotel            is_canceled       lead_time     arrival_date_year
##  Length:59695       Min.   :0.0000   Min.   :  0.0   Min.   :2015     
##  Class :character   1st Qu.:0.0000   1st Qu.: 18.0   1st Qu.:2016     
##  Mode  :character   Median :0.0000   Median : 69.0   Median :2016     
##                     Mean   :0.3709   Mean   :104.4   Mean   :2016     
##                     3rd Qu.:1.0000   3rd Qu.:162.0   3rd Qu.:2017     
##                     Max.   :1.0000   Max.   :709.0   Max.   :2017     
##                                                                       
##  arrival_date_month arrival_date_week_number arrival_date_day_of_month
##  Length:59695       Min.   : 1.00            Min.   : 1.00            
##  Class :character   1st Qu.:16.00            1st Qu.: 8.00            
##  Mode  :character   Median :28.00            Median :16.00            
##                     Mean   :27.22            Mean   :15.82            
##                     3rd Qu.:38.00            3rd Qu.:23.00            
##                     Max.   :53.00            Max.   :31.00            
##                                                                       
##  stays_in_weekend_nights stays_in_week_nights     adults      
##  Min.   : 0.0000         Min.   : 0.000       Min.   : 0.000  
##  1st Qu.: 0.0000         1st Qu.: 1.000       1st Qu.: 2.000  
##  Median : 1.0000         Median : 2.000       Median : 2.000  
##  Mean   : 0.9279         Mean   : 2.498       Mean   : 1.859  
##  3rd Qu.: 2.0000         3rd Qu.: 3.000       3rd Qu.: 2.000  
##  Max.   :19.0000         Max.   :50.000       Max.   :55.000  
##                                                               
##     children           babies              meal             country         
##  Min.   : 0.0000   Min.   : 0.000000   Length:59695       Length:59695      
##  1st Qu.: 0.0000   1st Qu.: 0.000000   Class :character   Class :character  
##  Median : 0.0000   Median : 0.000000   Mode  :character   Mode  :character  
##  Mean   : 0.1044   Mean   : 0.008393                                        
##  3rd Qu.: 0.0000   3rd Qu.: 0.000000                                        
##  Max.   :10.0000   Max.   :10.000000                                        
##  NA's   :3                                                                  
##  market_segment     distribution_channel is_repeated_guest
##  Length:59695       Length:59695         Min.   :0.00000  
##  Class :character   Class :character     1st Qu.:0.00000  
##  Mode  :character   Mode  :character     Median :0.00000  
##                                          Mean   :0.03198  
##                                          3rd Qu.:0.00000  
##                                          Max.   :1.00000  
##                                                           
##  previous_cancellations previous_bookings_not_canceled reserved_room_type
##  Min.   : 0.00000       Min.   : 0.0000                Length:59695      
##  1st Qu.: 0.00000       1st Qu.: 0.0000                Class :character  
##  Median : 0.00000       Median : 0.0000                Mode  :character  
##  Mean   : 0.08585       Mean   : 0.1308                                  
##  3rd Qu.: 0.00000       3rd Qu.: 0.0000                                  
##  Max.   :26.00000       Max.   :68.0000                                  
##                                                                          
##  assigned_room_type booking_changes  deposit_type          agent          
##  Length:59695       Min.   : 0.000   Length:59695       Length:59695      
##  Class :character   1st Qu.: 0.000   Class :character   Class :character  
##  Mode  :character   Median : 0.000   Mode  :character   Mode  :character  
##                     Mean   : 0.221                                        
##                     3rd Qu.: 0.000                                        
##                     Max.   :17.000                                        
##                                                                           
##    company          days_in_waiting_list customer_type           adr        
##  Length:59695       Min.   :  0.000      Length:59695       Min.   : -6.38  
##  Class :character   1st Qu.:  0.000      Class :character   1st Qu.: 70.00  
##  Mode  :character   Median :  0.000      Mode  :character   Median : 95.00  
##                     Mean   :  2.403                         Mean   :101.97  
##                     3rd Qu.:  0.000                         3rd Qu.:126.00  
##                     Max.   :391.000                         Max.   :450.00  
##                                                                             
##  required_car_parking_spaces total_of_special_requests reservation_status
##  Min.   :0.0000              Min.   :0.0000            Length:59695      
##  1st Qu.:0.0000              1st Qu.:0.0000            Class :character  
##  Median :0.0000              Median :0.0000            Mode  :character  
##  Mean   :0.0628              Mean   :0.5766                              
##  3rd Qu.:0.0000              3rd Qu.:1.0000                              
##  Max.   :8.0000              Max.   :5.0000                              
##                                                                          
##  reservation_status_date
##  Length:59695           
##  Class :character       
##  Mode  :character       
##                         
##                         
##                         
## 

Scrutinize #3
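In this section, the third subsample, hotel_data_subsample_df_3, is scrutinized:
1. Print the length (number of rows) of hotel_data_subsample_df_3.
2. Print the first five rows of the subsample (for a few columns).
3. Print the last five rows of the subsample (for a few columns).
Note: steps 2 and 3 give a quick visual spot check of data consistency. The rows printed below are identical to those printed for the first subsample, while the structure output differs, so they appear to come from hotel_data_subsample_df_1 rather than from this subsample.
4. Print the internal structure of the subsample.
5. Summarize the subsample.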
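Step 5 prints a separate summary() for every subsample, which makes side-by-side comparison tedious. A minimal sketch that collects the mean of every numeric column into one table is shown below; compare_means() is a hypothetical helper, and na.rm = TRUE is an assumption made because the children column contains NA values.

```r
# Hypothetical helper: one row of numeric-column means per data frame.
compare_means <- function(frames) {
  rows <- lapply(frames, function(df) {
    # Keep only numeric (int or num) columns
    num <- df[vapply(df, is.numeric, logical(1))]
    colMeans(num, na.rm = TRUE)
  })
  # Row names come from the names of the input list
  do.call(rbind, rows)
}
```

Assuming the subsamples are stored under the names used in this document, compare_means(list(full = hotel_data, s1 = hotel_data_subsample_df_1, s2 = hotel_data_subsample_df_2, s3 = hotel_data_subsample_df_3)) puts the full data set and the subsamples in adjacent rows, so differences such as the means of is_canceled or adr are easy to spot.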

## [1] 59695
##          hotel lead_time meal market_segment distribution_channel country
## 1   City Hotel       208   BB      Online TA                TA/TO     BEL
## 2   City Hotel       164   BB         Groups                TA/TO     PRT
## 3   City Hotel        41   HB  Offline TA/TO                TA/TO     ITA
## 4   City Hotel        43   HB         Groups                TA/TO     PRT
## 5 Resort Hotel        80   HB  Offline TA/TO                TA/TO     PRT
##              hotel lead_time meal market_segment distribution_channel country
## 59691   City Hotel        10   SC      Online TA                TA/TO     NLD
## 59692   City Hotel         3   BB      Corporate            Corporate     PRT
## 59693 Resort Hotel        74   BB      Online TA                TA/TO     PRT
## 59694 Resort Hotel        10   BB      Online TA                TA/TO     FRA
## 59695   City Hotel        41   HB  Offline TA/TO                TA/TO     ITA
## 'data.frame':    59695 obs. of  32 variables:
##  $ hotel                         : chr  "City Hotel" "City Hotel" "City Hotel" "Resort Hotel" ...
##  $ is_canceled                   : int  0 1 1 0 1 0 0 0 0 0 ...
##  $ lead_time                     : int  115 265 478 152 15 181 4 77 38 58 ...
##  $ arrival_date_year             : int  2017 2015 2017 2016 2016 2015 2016 2017 2015 2017 ...
##  $ arrival_date_month            : chr  "August" "July" "August" "May" ...
##  $ arrival_date_week_number      : int  34 28 32 19 53 32 7 19 35 30 ...
##  $ arrival_date_day_of_month     : int  20 9 8 2 27 6 12 11 25 28 ...
##  $ stays_in_weekend_nights       : int  2 0 0 1 0 0 1 0 2 2 ...
##  $ stays_in_week_nights          : int  1 2 4 2 1 2 2 3 6 3 ...
##  $ adults                        : int  1 2 1 2 2 2 2 2 2 2 ...
##  $ children                      : int  0 0 0 0 0 0 2 0 1 0 ...
##  $ babies                        : int  0 0 0 0 0 0 0 0 0 0 ...
##  $ meal                          : chr  "BB" "BB" "BB" "BB" ...
##  $ country                       : chr  "GBR" "PRT" "PRT" "ARG" ...
##  $ market_segment                : chr  "Offline TA/TO" "Groups" "Offline TA/TO" "Offline TA/TO" ...
##  $ distribution_channel          : chr  "TA/TO" "TA/TO" "TA/TO" "TA/TO" ...
##  $ is_repeated_guest             : int  0 0 0 0 0 0 0 0 0 0 ...
##  $ previous_cancellations        : int  0 1 0 0 0 0 0 0 0 0 ...
##  $ previous_bookings_not_canceled: int  0 0 0 0 0 0 0 0 0 0 ...
##  $ reserved_room_type            : chr  "D" "A" "A" "A" ...
##  $ assigned_room_type            : chr  "D" "A" "C" "A" ...
##  $ booking_changes               : int  1 0 2 0 0 0 0 0 2 0 ...
##  $ deposit_type                  : chr  "No Deposit" "No Deposit" "No Deposit" "No Deposit" ...
##  $ agent                         : chr  "98" "1" "229" "336" ...
##  $ company                       : chr  "NULL" "NULL" "NULL" "NULL" ...
##  $ days_in_waiting_list          : int  0 0 0 0 0 0 0 0 0 0 ...
##  $ customer_type                 : chr  "Transient-Party" "Contract" "Transient-Party" "Transient" ...
##  $ adr                           : num  125 62 88.5 47.4 65 ...
##  $ required_car_parking_spaces   : int  0 0 0 1 0 0 0 0 0 0 ...
##  $ total_of_special_requests     : int  0 0 0 1 0 2 1 0 1 2 ...
##  $ reservation_status            : chr  "Check-Out" "Canceled" "Canceled" "Check-Out" ...
##  $ reservation_status_date       : chr  "2017-08-23" "2015-01-01" "2017-07-28" "2016-05-05" ...
##     hotel            is_canceled       lead_time     arrival_date_year
##  Length:59695       Min.   :0.0000   Min.   :  0.0   Min.   :2015     
##  Class :character   1st Qu.:0.0000   1st Qu.: 18.0   1st Qu.:2016     
##  Mode  :character   Median :0.0000   Median : 69.0   Median :2016     
##                     Mean   :0.3686   Mean   :103.1   Mean   :2016     
##                     3rd Qu.:1.0000   3rd Qu.:159.0   3rd Qu.:2017     
##                     Max.   :1.0000   Max.   :737.0   Max.   :2017     
##                                                                       
##  arrival_date_month arrival_date_week_number arrival_date_day_of_month
##  Length:59695       Min.   : 1.00            Min.   : 1.00            
##  Class :character   1st Qu.:16.00            1st Qu.: 8.00            
##  Mode  :character   Median :28.00            Median :16.00            
##                     Mean   :27.18            Mean   :15.81            
##                     3rd Qu.:38.00            3rd Qu.:23.00            
##                     Max.   :53.00            Max.   :31.00            
##                                                                       
##  stays_in_weekend_nights stays_in_week_nights     adults          children    
##  Min.   : 0.0000         Min.   : 0.000       Min.   : 0.000   Min.   :0.000  
##  1st Qu.: 0.0000         1st Qu.: 1.000       1st Qu.: 2.000   1st Qu.:0.000  
##  Median : 1.0000         Median : 2.000       Median : 2.000   Median :0.000  
##  Mean   : 0.9267         Mean   : 2.483       Mean   : 1.854   Mean   :0.103  
##  3rd Qu.: 2.0000         3rd Qu.: 3.000       3rd Qu.: 2.000   3rd Qu.:0.000  
##  Max.   :19.0000         Max.   :50.000       Max.   :55.000   Max.   :3.000  
##                                                                NA's   :2      
##      babies              meal             country          market_segment    
##  Min.   : 0.000000   Length:59695       Length:59695       Length:59695      
##  1st Qu.: 0.000000   Class :character   Class :character   Class :character  
##  Median : 0.000000   Mode  :character   Mode  :character   Mode  :character  
##  Mean   : 0.008259                                                           
##  3rd Qu.: 0.000000                                                           
##  Max.   :10.000000                                                           
##                                                                              
##  distribution_channel is_repeated_guest previous_cancellations
##  Length:59695         Min.   :0.00000   Min.   : 0.00000      
##  Class :character     1st Qu.:0.00000   1st Qu.: 0.00000      
##  Mode  :character     Median :0.00000   Median : 0.00000      
##                       Mean   :0.03319   Mean   : 0.08972      
##                       3rd Qu.:0.00000   3rd Qu.: 0.00000      
##                       Max.   :1.00000   Max.   :26.00000      
##                                                               
##  previous_bookings_not_canceled reserved_room_type assigned_room_type
##  Min.   : 0.0000                Length:59695       Length:59695      
##  1st Qu.: 0.0000                Class :character   Class :character  
##  Median : 0.0000                Mode  :character   Mode  :character  
##  Mean   : 0.1412                                                     
##  3rd Qu.: 0.0000                                                     
##  Max.   :70.0000                                                     
##                                                                      
##  booking_changes  deposit_type          agent             company         
##  Min.   : 0.000   Length:59695       Length:59695       Length:59695      
##  1st Qu.: 0.000   Class :character   Class :character   Class :character  
##  Median : 0.000   Mode  :character   Mode  :character   Mode  :character  
##  Mean   : 0.223                                                           
##  3rd Qu.: 0.000                                                           
##  Max.   :18.000                                                           
##                                                                           
##  days_in_waiting_list customer_type           adr        
##  Min.   :  0.000      Length:59695       Min.   : -6.38  
##  1st Qu.:  0.000      Class :character   1st Qu.: 68.46  
##  Median :  0.000      Mode  :character   Median : 94.35  
##  Mean   :  2.317                         Mean   :101.27  
##  3rd Qu.:  0.000                         3rd Qu.:125.10  
##  Max.   :391.000                         Max.   :402.00  
##                                                          
##  required_car_parking_spaces total_of_special_requests reservation_status
##  Min.   :0.00000             Min.   :0.0000            Length:59695      
##  1st Qu.:0.00000             1st Qu.:0.0000            Class :character  
##  Median :0.00000             Median :0.0000            Mode  :character  
##  Mean   :0.06198             Mean   :0.5695                              
##  3rd Qu.:0.00000             3rd Qu.:1.0000                              
##  Max.   :2.00000             Max.   :5.0000                              
##                                                                          
##  reservation_status_date
##  Length:59695           
##  Class :character       
##  Mode  :character       
##                         
##                         
##                         
## 

Scrutinize #4
In this section, the fourth subsample, hotel_data_subsample_df_4, is scrutinized:
1. Print the length (number of rows) of the subsample hotel_data_subsample_df_4.
2. Print the first five rows (for a few columns) of the subsample.
3. Print the last five rows (for a few columns) of the subsample.
Note: Comparing the output of steps 2 and 3 helps verify data consistency.
4. Print the internal structure of the subsample.
5. Summarize the subsample.
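The five steps above could be wrapped in one helper so that every subsample is checked the same way. This is a minimal sketch, not the author's code: the helper name scrutinize_subsample() and the small toy_df data frame are hypothetical, standing in for hotel_data_subsample_df_4 so that the example runs on its own.

```r
# Hypothetical helper that runs the five scrutiny steps on one subsample.
scrutinize_subsample <- function(df, cols) {
  print(nrow(df))             # 1. number of rows (length) of the subsample
  print(head(df[, cols], 5))  # 2. first five rows of the selected columns
  print(tail(df[, cols], 5))  # 3. last five rows of the selected columns
  str(df)                     # 4. internal structure of every column
  print(summary(df))          # 5. summary statistics per column
  invisible(nrow(df))         # return the row count without printing it again
}

# Toy stand-in for hotel_data_subsample_df_4 (hypothetical values).
toy_df <- data.frame(
  hotel     = c("City Hotel", "Resort Hotel", "City Hotel", "City Hotel",
                "Resort Hotel", "City Hotel"),
  lead_time = c(208L, 164L, 41L, 43L, 80L, 10L),
  meal      = c("BB", "BB", "HB", "HB", "HB", "SC"),
  country   = c("BEL", "PRT", "ITA", "PRT", "PRT", "NLD")
)

n_rows <- scrutinize_subsample(toy_df, cols = c("hotel", "lead_time", "meal"))
```

With the real data, a call such as scrutinize_subsample(hotel_data_subsample_df_4, c("hotel", "lead_time", "meal")) would print its results in the same order as the five steps.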

## [1] 59695
##          hotel lead_time meal market_segment distribution_channel country
## 1   City Hotel       208   BB      Online TA                TA/TO     BEL
## 2   City Hotel       164   BB         Groups                TA/TO     PRT
## 3   City Hotel        41   HB  Offline TA/TO                TA/TO     ITA
## 4   City Hotel        43   HB         Groups                TA/TO     PRT
## 5 Resort Hotel        80   HB  Offline TA/TO                TA/TO     PRT
##              hotel lead_time meal market_segment distribution_channel country
## 59691   City Hotel        10   SC      Online TA                TA/TO     NLD
## 59692   City Hotel         3   BB      Corporate            Corporate     PRT
## 59693 Resort Hotel        74   BB      Online TA                TA/TO     PRT
## 59694 Resort Hotel        10   BB      Online TA                TA/TO     FRA
## 59695   City Hotel        41   HB  Offline TA/TO                TA/TO     ITA
## 'data.frame':    59695 obs. of  32 variables:
##  $ hotel                         : chr  "City Hotel" "Resort Hotel" "City Hotel" "City Hotel" ...
##  $ is_canceled                   : int  0 0 1 1 0 0 0 1 1 0 ...
##  $ lead_time                     : int  116 1 102 73 3 54 33 5 226 48 ...
##  $ arrival_date_year             : int  2017 2016 2016 2017 2016 2015 2015 2016 2016 2016 ...
##  $ arrival_date_month            : chr  "April" "May" "November" "April" ...
##  $ arrival_date_week_number      : int  14 20 47 14 14 52 41 38 36 35 ...
##  $ arrival_date_day_of_month     : int  2 9 17 3 29 22 8 12 1 24 ...
##  $ stays_in_weekend_nights       : int  1 0 2 1 0 0 0 1 2 0 ...
##  $ stays_in_week_nights          : int  0 0 3 3 2 5 3 1 5 4 ...
##  $ adults                        : int  2 1 2 2 2 3 2 1 2 3 ...
##  $ children                      : int  0 0 0 0 0 0 0 0 2 0 ...
##  $ babies                        : int  0 0 0 0 0 0 0 0 0 0 ...
##  $ meal                          : chr  "SC" "BB" "BB" "BB" ...
##  $ country                       : chr  "FRA" "GBR" "GBR" "PRT" ...
##  $ market_segment                : chr  "Online TA" "Direct" "Online TA" "Groups" ...
##  $ distribution_channel          : chr  "TA/TO" "Direct" "TA/TO" "TA/TO" ...
##  $ is_repeated_guest             : int  0 0 0 0 0 0 0 1 0 0 ...
##  $ previous_cancellations        : int  0 0 0 0 0 0 0 2 0 0 ...
##  $ previous_bookings_not_canceled: int  0 0 0 0 0 0 0 1 0 0 ...
##  $ reserved_room_type            : chr  "A" "G" "A" "A" ...
##  $ assigned_room_type            : chr  "A" "G" "A" "A" ...
##  $ booking_changes               : int  0 2 0 0 0 0 2 0 0 0 ...
##  $ deposit_type                  : chr  "No Deposit" "No Deposit" "No Deposit" "Non Refund" ...
##  $ agent                         : chr  "9" "NULL" "9" "20" ...
##  $ company                       : chr  "NULL" "NULL" "NULL" "NULL" ...
##  $ days_in_waiting_list          : int  0 0 0 0 0 0 0 0 0 0 ...
##  $ customer_type                 : chr  "Transient" "Transient" "Transient" "Transient" ...
##  $ adr                           : num  99 0 85 105 58 ...
##  $ required_car_parking_spaces   : int  0 0 0 0 0 1 1 0 0 0 ...
##  $ total_of_special_requests     : int  1 0 2 0 0 1 0 0 0 1 ...
##  $ reservation_status            : chr  "Check-Out" "Check-Out" "Canceled" "Canceled" ...
##  $ reservation_status_date       : chr  "2017-04-03" "2016-05-09" "2016-09-21" "2017-01-20" ...
##     hotel            is_canceled     lead_time     arrival_date_year
##  Length:59695       Min.   :0.00   Min.   :  0.0   Min.   :2015     
##  Class :character   1st Qu.:0.00   1st Qu.: 18.0   1st Qu.:2016     
##  Mode  :character   Median :0.00   Median : 68.0   Median :2016     
##                     Mean   :0.37   Mean   :103.9   Mean   :2016     
##                     3rd Qu.:1.00   3rd Qu.:160.0   3rd Qu.:2017     
##                     Max.   :1.00   Max.   :737.0   Max.   :2017     
##                                                                     
##  arrival_date_month arrival_date_week_number arrival_date_day_of_month
##  Length:59695       Min.   : 1.00            Min.   : 1.0             
##  Class :character   1st Qu.:16.00            1st Qu.: 8.0             
##  Mode  :character   Median :28.00            Median :16.0             
##                     Mean   :27.16            Mean   :15.8             
##                     3rd Qu.:38.00            3rd Qu.:23.0             
##                     Max.   :53.00            Max.   :31.0             
##                                                                       
##  stays_in_weekend_nights stays_in_week_nights     adults      
##  Min.   : 0.0000         Min.   : 0.000       Min.   : 0.000  
##  1st Qu.: 0.0000         1st Qu.: 1.000       1st Qu.: 2.000  
##  Median : 1.0000         Median : 2.000       Median : 2.000  
##  Mean   : 0.9229         Mean   : 2.503       Mean   : 1.858  
##  3rd Qu.: 2.0000         3rd Qu.: 3.000       3rd Qu.: 2.000  
##  Max.   :18.0000         Max.   :42.000       Max.   :50.000  
##                                                               
##     children           babies              meal             country         
##  Min.   : 0.0000   Min.   : 0.000000   Length:59695       Length:59695      
##  1st Qu.: 0.0000   1st Qu.: 0.000000   Class :character   Class :character  
##  Median : 0.0000   Median : 0.000000   Mode  :character   Mode  :character  
##  Mean   : 0.1034   Mean   : 0.008945                                        
##  3rd Qu.: 0.0000   3rd Qu.: 0.000000                                        
##  Max.   :10.0000   Max.   :10.000000                                        
##  NA's   :2                                                                  
##  market_segment     distribution_channel is_repeated_guest
##  Length:59695       Length:59695         Min.   :0.00000  
##  Class :character   Class :character     1st Qu.:0.00000  
##  Mode  :character   Mode  :character     Median :0.00000  
##                                          Mean   :0.03265  
##                                          3rd Qu.:0.00000  
##                                          Max.   :1.00000  
##                                                           
##  previous_cancellations previous_bookings_not_canceled reserved_room_type
##  Min.   : 0.00000       Min.   : 0.0000                Length:59695      
##  1st Qu.: 0.00000       1st Qu.: 0.0000                Class :character  
##  Median : 0.00000       Median : 0.0000                Mode  :character  
##  Mean   : 0.09021       Mean   : 0.1407                                  
##  3rd Qu.: 0.00000       3rd Qu.: 0.0000                                  
##  Max.   :26.00000       Max.   :71.0000                                  
##                                                                          
##  assigned_room_type booking_changes   deposit_type          agent          
##  Length:59695       Min.   : 0.0000   Length:59695       Length:59695      
##  Class :character   1st Qu.: 0.0000   Class :character   Class :character  
##  Mode  :character   Median : 0.0000   Mode  :character   Mode  :character  
##                     Mean   : 0.2207                                        
##                     3rd Qu.: 0.0000                                        
##                     Max.   :17.0000                                        
##                                                                            
##    company          days_in_waiting_list customer_type           adr         
##  Length:59695       Min.   :  0.000      Length:59695       Min.   :  -6.38  
##  Class :character   1st Qu.:  0.000      Class :character   1st Qu.:  68.42  
##  Mode  :character   Median :  0.000      Mode  :character   Median :  94.50  
##                     Mean   :  2.343                         Mean   : 101.78  
##                     3rd Qu.:  0.000                         3rd Qu.: 126.00  
##                     Max.   :391.000                         Max.   :5400.00  
##                                                                              
##  required_car_parking_spaces total_of_special_requests reservation_status
##  Min.   :0.00000             Min.   :0.0000            Length:59695      
##  1st Qu.:0.00000             1st Qu.:0.0000            Class :character  
##  Median :0.00000             Median :0.0000            Mode  :character  
##  Mean   :0.06166             Mean   :0.5683                              
##  3rd Qu.:0.00000             3rd Qu.:1.0000                              
##  Max.   :3.00000             Max.   :5.0000                              
##                                                                          
##  reservation_status_date
##  Length:59695           
##  Class :character       
##  Mode  :character       
##                         
##                         
##                         
## 

Scrutinize #5
In this section, the fifth subsample, hotel_data_subsample_df_5, is scrutinized:
1. Print the length (number of rows) of the subsample hotel_data_subsample_df_5.
2. Print the first five rows (for a few columns) of the subsample.
3. Print the last five rows (for a few columns) of the subsample.
Note: Comparing the output of steps 2 and 3 helps verify data consistency.
4. Print the internal structure of the subsample.
5. Summarize the subsample.

## [1] 59695
##          hotel meal market_segment distribution_channel country
## 1   City Hotel   BB      Online TA                TA/TO     BEL
## 2   City Hotel   BB         Groups                TA/TO     PRT
## 3   City Hotel   HB  Offline TA/TO                TA/TO     ITA
## 4   City Hotel   HB         Groups                TA/TO     PRT
## 5 Resort Hotel   HB  Offline TA/TO                TA/TO     PRT
##              hotel meal market_segment distribution_channel country
## 59691   City Hotel   SC      Online TA                TA/TO     NLD
## 59692   City Hotel   BB      Corporate            Corporate     PRT
## 59693 Resort Hotel   BB      Online TA                TA/TO     PRT
## 59694 Resort Hotel   BB      Online TA                TA/TO     FRA
## 59695   City Hotel   HB  Offline TA/TO                TA/TO     ITA
## 'data.frame':    59695 obs. of  32 variables:
##  $ hotel                         : chr  "City Hotel" "City Hotel" "Resort Hotel" "City Hotel" ...
##  $ is_canceled                   : int  0 0 1 1 1 1 1 0 1 0 ...
##  $ lead_time                     : int  0 160 168 15 80 75 80 6 39 4 ...
##  $ arrival_date_year             : int  2015 2017 2016 2017 2015 2016 2015 2015 2016 2017 ...
##  $ arrival_date_month            : chr  "December" "June" "April" "August" ...
##  $ arrival_date_week_number      : int  51 25 16 32 40 45 45 34 27 10 ...
##  $ arrival_date_day_of_month     : int  16 22 12 7 28 30 2 18 27 6 ...
##  $ stays_in_weekend_nights       : int  0 0 0 1 1 2 1 0 1 1 ...
##  $ stays_in_week_nights          : int  1 3 2 3 3 0 1 3 0 0 ...
##  $ adults                        : int  1 2 2 2 2 2 2 2 2 1 ...
##  $ children                      : int  0 0 0 0 0 0 0 0 0 0 ...
##  $ babies                        : int  0 0 0 0 0 0 0 0 0 0 ...
##  $ meal                          : chr  "BB" "SC" "HB" "BB" ...
##  $ country                       : chr  "ESP" "BEL" "PRT" "NLD" ...
##  $ market_segment                : chr  "Direct" "Online TA" "Groups" "Direct" ...
##  $ distribution_channel          : chr  "Direct" "TA/TO" "TA/TO" "Direct" ...
##  $ is_repeated_guest             : int  0 0 0 0 0 0 0 0 0 0 ...
##  $ previous_cancellations        : int  0 0 0 0 0 0 0 0 0 0 ...
##  $ previous_bookings_not_canceled: int  0 0 0 0 0 0 0 0 0 0 ...
##  $ reserved_room_type            : chr  "A" "A" "A" "A" ...
##  $ assigned_room_type            : chr  "D" "A" "A" "A" ...
##  $ booking_changes               : int  0 0 0 0 0 0 0 0 0 0 ...
##  $ deposit_type                  : chr  "No Deposit" "No Deposit" "Non Refund" "No Deposit" ...
##  $ agent                         : chr  "NULL" "9" "245" "14" ...
##  $ company                       : chr  "NULL" "NULL" "NULL" "NULL" ...
##  $ days_in_waiting_list          : int  0 0 0 0 0 0 60 0 0 0 ...
##  $ customer_type                 : chr  "Transient" "Transient" "Transient" "Transient" ...
##  $ adr                           : num  75 99 86 175 69.5 105 111 75 89.1 62.4 ...
##  $ required_car_parking_spaces   : int  0 0 0 0 0 0 0 0 0 0 ...
##  $ total_of_special_requests     : int  0 0 0 0 2 0 0 2 0 0 ...
##  $ reservation_status            : chr  "Check-Out" "Check-Out" "Canceled" "Canceled" ...
##  $ reservation_status_date       : chr  "2015-12-17" "2017-06-25" "2016-01-05" "2017-07-28" ...
##     hotel            is_canceled      lead_time     arrival_date_year
##  Length:59695       Min.   :0.000   Min.   :  0.0   Min.   :2015     
##  Class :character   1st Qu.:0.000   1st Qu.: 18.0   1st Qu.:2016     
##  Mode  :character   Median :0.000   Median : 69.0   Median :2016     
##                     Mean   :0.369   Mean   :103.9   Mean   :2016     
##                     3rd Qu.:1.000   3rd Qu.:160.5   3rd Qu.:2017     
##                     Max.   :1.000   Max.   :737.0   Max.   :2017     
##                                                                      
##  arrival_date_month arrival_date_week_number arrival_date_day_of_month
##  Length:59695       Min.   : 1.00            Min.   : 1.00            
##  Class :character   1st Qu.:16.00            1st Qu.: 8.00            
##  Mode  :character   Median :28.00            Median :16.00            
##                     Mean   :27.17            Mean   :15.86            
##                     3rd Qu.:38.00            3rd Qu.:24.00            
##                     Max.   :53.00            Max.   :31.00            
##                                                                       
##  stays_in_weekend_nights stays_in_week_nights     adults          children     
##  Min.   : 0.0000         Min.   : 0.000       Min.   : 0.000   Min.   :0.0000  
##  1st Qu.: 0.0000         1st Qu.: 1.000       1st Qu.: 2.000   1st Qu.:0.0000  
##  Median : 1.0000         Median : 2.000       Median : 2.000   Median :0.0000  
##  Mean   : 0.9324         Mean   : 2.524       Mean   : 1.859   Mean   :0.1048  
##  3rd Qu.: 2.0000         3rd Qu.: 3.000       3rd Qu.: 2.000   3rd Qu.:0.0000  
##  Max.   :18.0000         Max.   :42.000       Max.   :50.000   Max.   :3.0000  
##                                                                NA's   :3       
##      babies              meal             country          market_segment    
##  Min.   : 0.000000   Length:59695       Length:59695       Length:59695      
##  1st Qu.: 0.000000   Class :character   Class :character   Class :character  
##  Median : 0.000000   Mode  :character   Mode  :character   Mode  :character  
##  Mean   : 0.008292                                                           
##  3rd Qu.: 0.000000                                                           
##  Max.   :10.000000                                                           
##                                                                              
##  distribution_channel is_repeated_guest previous_cancellations
##  Length:59695         Min.   :0.0000    Min.   : 0.00000      
##  Class :character     1st Qu.:0.0000    1st Qu.: 0.00000      
##  Mode  :character     Median :0.0000    Median : 0.00000      
##                       Mean   :0.0319    Mean   : 0.08927      
##                       3rd Qu.:0.0000    3rd Qu.: 0.00000      
##                       Max.   :1.0000    Max.   :26.00000      
##                                                               
##  previous_bookings_not_canceled reserved_room_type assigned_room_type
##  Min.   : 0.0000                Length:59695       Length:59695      
##  1st Qu.: 0.0000                Class :character   Class :character  
##  Median : 0.0000                Mode  :character   Mode  :character  
##  Mean   : 0.1388                                                     
##  3rd Qu.: 0.0000                                                     
##  Max.   :71.0000                                                     
##                                                                      
##  booking_changes   deposit_type          agent             company         
##  Min.   : 0.0000   Length:59695       Length:59695       Length:59695      
##  1st Qu.: 0.0000   Class :character   Class :character   Class :character  
##  Median : 0.0000   Mode  :character   Mode  :character   Mode  :character  
##  Mean   : 0.2218                                                           
##  3rd Qu.: 0.0000                                                           
##  Max.   :17.0000                                                           
##                                                                            
##  days_in_waiting_list customer_type           adr         
##  Min.   :  0.000      Length:59695       Min.   :  -6.38  
##  1st Qu.:  0.000      Class :character   1st Qu.:  69.21  
##  Median :  0.000      Mode  :character   Median :  94.50  
##  Mean   :  2.215                         Mean   : 102.12  
##  3rd Qu.:  0.000                         3rd Qu.: 126.00  
##  Max.   :391.000                         Max.   :5400.00  
##                                                           
##  required_car_parking_spaces total_of_special_requests reservation_status
##  Min.   :0.00000             Min.   :0.0000            Length:59695      
##  1st Qu.:0.00000             1st Qu.:0.0000            Class :character  
##  Median :0.00000             Median :0.0000            Mode  :character  
##  Mean   :0.06389             Mean   :0.5772                              
##  3rd Qu.:0.00000             3rd Qu.:1.0000                              
##  Max.   :8.00000             Max.   :5.0000                              
##                                                                          
##  reservation_status_date
##  Length:59695           
##  Class :character       
##  Mode  :character       
##                         
##                         
##                         
## 

Box plot of the numeric variable ‘lead_time’ in each subsample.

The graph above shows the distribution of lead_time across all subsamples.
Note: In the plot above, the labels 1, 2, 3, 4 and 5 denote subsample-1 through subsample-5.
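The plotting code is not shown in the document. One way to draw such a box plot, sketched under assumptions, is to stack the subsamples into one data frame with a subsample identifier and pass it to ggplot2. The toy_hotel data (Poisson-distributed lead times) is hypothetical and only makes the example self-contained; with the real data, the five hotel_data_subsample_df_* data frames would go into the list instead.

```r
library(dplyr)
library(ggplot2)

set.seed(42)
# Toy stand-in for hotel_data (hypothetical lead times)
toy_hotel <- data.frame(lead_time = rpois(1000, lambda = 100))

# Five 50% subsamples, analogous to hotel_data_subsample_df_1 ... _5
subsamples <- lapply(1:5, function(i) sample_n(toy_hotel, size = nrow(toy_hotel) / 2))

# Stack them; with an unnamed list, .id fills "subsample" with "1" ... "5"
stacked <- bind_rows(subsamples, .id = "subsample")

p <- ggplot(stacked, aes(x = subsample, y = lead_time)) +
  geom_boxplot() +
  labs(x = "Subsample", y = "Lead time (days)",
       title = "Distribution of lead_time across subsamples")
print(p)
```

Putting all subsamples in one long data frame lets ggplot2 draw the five boxes on a shared axis, which makes differences in median and spread easy to compare.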


#Anomaly Detection for Subsample Data Frame #1: hotel_data_subsample_df_1

## # A tibble: 2 × 2
##   hotel        count
##   <chr>        <int>
## 1 City Hotel   39893
## 2 Resort Hotel 19802
## # A tibble: 2 × 2
##   hotel        count
##   <chr>        <int>
## 1 City Hotel   39906
## 2 Resort Hotel 19789
## # A tibble: 2 × 2
##   hotel        count
##   <chr>        <int>
## 1 City Hotel   39651
## 2 Resort Hotel 20044
## # A tibble: 2 × 2
##   hotel        count
##   <chr>        <int>
## 1 City Hotel   39608
## 2 Resort Hotel 20087
## # A tibble: 2 × 2
##   hotel        count
##   <chr>        <int>
## 1 City Hotel   39607
## 2 Resort Hotel 20088

Ask: How did you approach each subsample, along with anomaly detection?

In the section above, each subsample was created with the sample_n() function from the dplyr package, using 50% of the original data set's rows. To detect anomalies, we grouped the data by “hotel” and then counted the observations for each hotel type within each subsample. Observations:
1. Each subsample contains a different number of bookings for each hotel type.
2. The differences are small: Resort Hotel counts range from 19,789 to 20,088 and City Hotel counts from 39,607 to 39,906. Reading the tables in the order printed above, hotel_data_subsample_df_3, hotel_data_subsample_df_4 and hotel_data_subsample_df_5 contain slightly more Resort Hotel bookings than hotel_data_subsample_df_1 and hotel_data_subsample_df_2.
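A minimal sketch of the subsampling and grouping steps described above, assuming a hypothetical toy_hotel data frame in place of hotel_data. It draws a 50% subsample with dplyr's sample_n() and counts observations per hotel type with group_by() and summarise(), which yields a two-column tibble of the same shape as the ones printed above.

```r
library(dplyr)

set.seed(1)
# Toy stand-in for hotel_data (hypothetical): 1000 bookings, roughly 2:1 City vs Resort
toy_hotel <- data.frame(
  hotel = sample(c("City Hotel", "Resort Hotel"), 1000, replace = TRUE, prob = c(2, 1))
)

# One 50% subsample, created with sample_n() as described above
sub_1 <- sample_n(toy_hotel, size = nrow(toy_hotel) * 0.5)

# Count observations per hotel type within the subsample
hotel_counts <- sub_1 %>%
  group_by(hotel) %>%
  summarise(count = n())
print(hotel_counts)
```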

Determining frequency tables for consistency analysis

Ask: How did you approach each subsample, along with consistency analysis?

  1. Create a frequency table of the “country” variable for each subsample with the table() function, and assign each table to a variable for consistency analysis.
  2. For consistency analysis, compare the distribution of countries across the subsamples. Consistency would imply that the distribution of countries remains relatively stable across the subsamples.
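The two steps above might look like the following sketch. The toy country data and the five subsamples drawn with base R's sample() are hypothetical stand-ins. Step 2 converts each table to proportions with prop.table(), since proportions are easier to compare side by side than raw counts; fixing the factor levels ensures every subsample's table lists the same countries in the same order.

```r
set.seed(7)
countries <- c("PRT", "GBR", "FRA", "ESP", "DEU")
# Toy stand-in for hotel_data (hypothetical country mix)
toy_hotel <- data.frame(country = sample(countries, 2000, replace = TRUE,
                                         prob = c(0.4, 0.2, 0.15, 0.15, 0.1)))
# Five hypothetical 50% subsamples
subsamples <- lapply(1:5, function(i)
  toy_hotel[sample(nrow(toy_hotel), nrow(toy_hotel) / 2), , drop = FALSE])

# Step 1: one frequency table of "country" per subsample
country_tables <- lapply(subsamples, function(d) table(d$country))

# Step 2: convert counts to proportions, one column per subsample
country_props <- sapply(subsamples, function(d)
  prop.table(table(factor(d$country, levels = countries))))
colnames(country_props) <- paste0("subsample_", 1:5)
print(round(country_props, 3))
```

If the rows of the printed matrix are nearly constant across columns, the country distribution is stable across subsamples.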

Performing a Monte Carlo simulation of the chi-square test to determine consistency.

## Observed Chi-Square Test Statistic: 1681060
## P-value of sub sample 1: 0.428

## Observed Chi-Square Test Statistic: 1662916
## P-value of sub sample 2: 0.442

## Observed Chi-Square Test Statistic: 1671291
## P-value of sub sample 3: 0.476

## Observed Chi-Square Test Statistic: 1738752
## P-value of sub sample 4: 0.46

## Observed Chi-Square Test Statistic: 1664807
## P-value of sub sample 5: 0.45

  1. The chi-square test was run on every subsample to determine its consistency.
  2. The p-values of the chi-square tests range from about 0.43 to 0.48, well above 0.1, so we cannot conclude that a significant difference exists.
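The document does not show how the simulated chi-square test was set up. One standard option in base R is chisq.test() with simulate.p.value = TRUE, which estimates the p-value from B Monte Carlo replicates instead of the asymptotic chi-square distribution. The sketch below assumes a goodness-of-fit design, testing each subsample's country counts against the country proportions of the full data set; the toy data is hypothetical, and this is not necessarily the author's method.

```r
set.seed(123)
countries <- c("PRT", "GBR", "FRA", "ESP", "DEU")
# Toy stand-in for hotel_data (hypothetical country mix)
toy_hotel <- data.frame(country = sample(countries, 2000, replace = TRUE,
                                         prob = c(0.4, 0.2, 0.15, 0.15, 0.1)))

# Expected proportions are taken from the full (toy) data set
full_props <- prop.table(table(factor(toy_hotel$country, levels = countries)))

p_values <- numeric(5)
for (i in 1:5) {
  # Draw a 50% subsample of row indices
  idx <- sample(nrow(toy_hotel), nrow(toy_hotel) / 2)
  observed <- table(factor(toy_hotel$country[idx], levels = countries))
  # Goodness-of-fit test; the p-value is estimated from B = 2000 Monte Carlo replicates
  test <- chisq.test(x = as.vector(observed), p = as.vector(full_props),
                     simulate.p.value = TRUE, B = 2000)
  p_values[i] <- test$p.value
  cat(sprintf("Observed Chi-Square Test Statistic: %.3f\n", test$statistic))
  cat(sprintf("P-value of sub sample %d: %.3f\n", i, test$p.value))
}
```

A large p-value means the subsample's country counts are compatible with the full data's proportions; the Monte Carlo p-value avoids relying on the large-sample approximation when some countries have small counts.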

#Monte Carlo Simulations of data set hotel_data_subsample_df_1

Monte Carlo Simulations

A Monte Carlo simulation histogram is a graphical representation of the frequency distribution of values generated from random numbers in a Monte Carlo simulation.

In the histogram above, the title, “Monte Carlo Simulation of Lead Time”, indicates that the plot represents the distribution of lead times (on the x-axis) derived from the Monte Carlo simulation. The x-axis label, Lead Time, specifies the variable being measured.

The height of each bar represents the frequency of lead times falling within a specific range. A higher bar means that lead times fall within that range more often.
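The simulation model behind the histogram is not described. The sketch below assumes the simplest Monte Carlo scheme: random lead times are drawn by resampling, with replacement, from observed lead times. The observed_lead_time vector is hypothetical (rounded exponential draws with mean 100) so that the example runs without the hotel data.

```r
set.seed(2024)
# Toy stand-in for the observed lead_time column (hypothetical, right-skewed)
observed_lead_time <- round(rexp(5000, rate = 1 / 100))

# Monte Carlo step: draw n_sim random lead times by resampling the observed values
n_sim <- 10000
simulated_lead_time <- sample(observed_lead_time, size = n_sim, replace = TRUE)

# Histogram: each bar's height is the number of simulated lead times in that bin
h <- hist(simulated_lead_time, breaks = 30,
          main = "Monte Carlo Simulation of Lead Time",
          xlab = "Lead Time", ylab = "Frequency")
```

The object returned by hist() holds the bin edges (h$breaks) and bar heights (h$counts), so the frequencies behind each bar can be inspected numerically as well as visually.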

The histogram can be compared to expected or historical lead time data. Discrepancies may highlight areas where the simulation differs from real-world observations, prompting further investigation or refinement of the simulation model.
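One simple way to compare simulated and observed lead times is to line up their quantiles. In this sketch both the observed data and the simulation model (an exponential distribution with the observed mean) are hypothetical; large differences at a given quantile mark where the simulation departs from the observations.

```r
set.seed(2024)
observed_lead_time <- round(rexp(5000, rate = 1 / 100))  # hypothetical observed data
# Hypothetical simulation model: exponential draws with the observed mean
simulated_lead_time <- round(rexp(10000, rate = 1 / mean(observed_lead_time)))

probs <- c(0.1, 0.25, 0.5, 0.75, 0.9)
comparison <- data.frame(
  quantile  = probs,
  observed  = quantile(observed_lead_time, probs, names = FALSE),
  simulated = quantile(simulated_lead_time, probs, names = FALSE)
)
# Positive values: the simulation produces longer lead times than observed at that quantile
comparison$difference <- comparison$simulated - comparison$observed
print(comparison)
```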

The insights gained from the histogram can be used for decision support, especially in scenarios where lead time variability is a critical factor. Understanding the distribution helps in making informed decisions and developing strategies to manage lead time uncertainties.
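As an illustration of decision support, simulated lead times can be turned into direct answers such as “what share of bookings are made more than 180 days ahead?”. The 180-day threshold and the resampled toy data are hypothetical choices for this sketch.

```r
set.seed(99)
observed_lead_time <- round(rexp(5000, rate = 1 / 100))  # hypothetical observed data
simulated_lead_time <- sample(observed_lead_time, 10000, replace = TRUE)

threshold <- 180L  # hypothetical planning threshold, in days
p_long <- mean(simulated_lead_time > threshold)           # share of simulated bookings above the threshold
p90 <- quantile(simulated_lead_time, 0.9, names = FALSE)  # lead time exceeded by 10% of simulated bookings
cat(sprintf("P(lead time > %d days) is about %.3f\n", threshold, p_long))
cat(sprintf("90th percentile of lead time: %.0f days\n", p90))
```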

Thank you!