Preliminary Work

Load Data Frames

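# Packages assumed from the functions used below: car, psych, ggplot2, reshape2, lattice, RColorBrewer, corrgram, leaps, coefplot
invisible(lapply(c("car","psych","ggplot2","reshape2","lattice","RColorBrewer","corrgram","leaps","coefplot"), library, character.only = TRUE))
load("C:/Users/Himanshu/Desktop/Hotel/Project_Hotel.RData")
attach(fill)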

The initial dataframe has already been cleaned (appropriate columns were selected and new columns were derived from old ones), so the final usable dataframe ‘fill’ is loaded directly.
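Because an .RData file restores objects under their saved names, it can silently overwrite objects already in the workspace, and attach() can mask variables of the same name. A minimal sketch of a more defensive pattern follows. It uses a hypothetical toy data frame and a temporary file rather than the real Project_Hotel.RData, so every name and value in it is an assumption for illustration.

```r
# Toy stand-in for the real data; values are invented for illustration only.
toy <- data.frame(
  hotel_category = factor(c("gostays", "regular", "regular")),
  hotel_star_rating = c(3L, 2L, 0L)
)
tmp <- tempfile(fileext = ".RData")
save(toy, file = tmp)

# Load into a dedicated environment so nothing in the workspace is overwritten.
hotel_env <- new.env()
loaded <- load(tmp, envir = hotel_env)   # load() returns the names it restored
print(loaded)

# with() evaluates an expression inside the data frame, so attach() is not needed.
toy_loaded <- hotel_env$toy
mean_star <- with(toy_loaded, mean(hotel_star_rating))
print(mean_star)
```

The same idea would apply to ‘fill’: inspect the names returned by load() first, then refer to columns through with() or the data argument.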

Inspecting data

dim(fill)
## [1] 1978   26
str(fill)
## 'data.frame':    1978 obs. of  26 variables:
##  $ guest_recommendation: int  85 87 50 63 76 90 67 0 84 92 ...
##  $ hotel_category      : Factor w/ 2 levels "gostays","regular": 1 2 2 2 1 2 2 2 2 2 ...
##  $ hotel_star_rating   : int  2 0 0 2 3 0 0 3 3 0 ...
##  $ image_count         : int  20 17 28 6 20 13 3 17 7 6 ...
##  $ property_type       : Factor w/ 18 levels "Beach Hut","BnB",..: 15 6 15 9 9 9 9 14 9 15 ...
##  $ room_count          : int  17 18 15 20 24 4 22 21 10 14 ...
##  $ site_review_count   : int  87 8 2 121 550 21 28 1 57 127 ...
##  $ site_review_rating  : num  4 4.5 2.5 2.8 4 4.3 3.6 1 4.2 4.5 ...
##  $ poi                 : num  18 3 8 11 24 5 1 1 3 2 ...
##  $ internet            : Factor w/ 2 levels "no","yes": 2 1 1 2 2 2 2 2 2 1 ...
##  $ room_service        : Factor w/ 2 levels "no","yes": 2 2 1 1 2 2 2 2 2 2 ...
##  $ pool                : Factor w/ 2 levels "no","yes": 1 1 2 1 1 1 1 1 1 1 ...
##  $ gym                 : Factor w/ 2 levels "no","yes": 1 2 1 1 1 1 1 1 1 1 ...
##  $ restaurant          : Factor w/ 2 levels "no","yes": 2 1 2 2 2 1 1 1 1 2 ...
##  $ doc                 : Factor w/ 2 levels "no","yes": 2 2 1 1 2 2 1 2 2 2 ...
##  $ tour                : Factor w/ 2 levels "no","yes": 1 1 1 2 2 1 1 1 1 1 ...
##  $ AC                  : Factor w/ 2 levels "no","yes": 1 2 2 1 2 1 2 2 1 2 ...
##  $ rev_positive        : int  74 8 1 56 452 19 22 0 50 122 ...
##  $ rev_critical        : int  13 0 1 65 98 2 6 1 7 5 ...
##  $ rev_images          : int  13 2 1 17 58 7 2 0 11 42 ...
##  $ service_quality     : num  3.9 4.7 2.5 2.7 4 4.5 3.5 1 4.2 4.5 ...
##  $ amenities           : num  3.7 4.7 2.5 2.6 3.9 4.1 3.7 1 4.1 4.4 ...
##  $ food_drinks         : num  3.8 4.3 1 2.5 4.1 4.2 3.4 1 4.1 4.4 ...
##  $ value_money         : num  4.1 4.7 2.5 2.9 4 4.5 3.8 1 4.1 4.6 ...
##  $ location            : num  4 4.8 2.5 2.9 4 3.9 4 1 4.2 4.7 ...
##  $ cleanliness         : num  4.1 4.8 1 2.6 4.1 4.5 3.8 1 4.1 4.6 ...
some(fill)
##      guest_recommendation hotel_category hotel_star_rating image_count
## 304                    79        gostays                 2           6
## 700                    85        regular                 3           9
## 845                    80        regular                 4          26
## 2496                   82        regular                 3          12
## 2819                   76        regular                 3          18
## 3094                   81        gostays                 2          10
## 3132                   80        regular                 3           6
## 3186                   64        regular                 1          10
## 3402                   75        regular                 2          11
## 3610                   60        regular                 4          18
##      property_type room_count site_review_count site_review_rating poi
## 304          Hotel         10               134                4.1   9
## 700          Hotel        112               102                4.4   3
## 845          Hotel         45                70                4.5  29
## 2496         Hotel         27                46                3.7   7
## 2819        Resort          8                50                4.4   1
## 3094         Hotel         30               131                3.9   2
## 3132         Hotel         16                57                4.6   5
## 3186         Hotel         44                87                3.0  34
## 3402         Hotel         40                16                2.8   2
## 3610         Hotel         11                 5                3.7   5
##      internet room_service pool gym restaurant doc tour  AC rev_positive
## 304       yes           no  yes  no        yes  no   no yes          113
## 700       yes          yes  yes  no        yes yes  yes yes           98
## 845       yes          yes   no yes        yes yes   no yes           66
## 2496      yes          yes   no  no        yes yes   no yes           37
## 2819       no          yes   no  no         no  no   no  no           44
## 3094      yes          yes   no  no        yes yes   no  no          105
## 3132      yes          yes   no  no        yes  no   no yes           56
## 3186      yes          yes   no  no         no yes  yes  no           50
## 3402       no          yes   no  no         no  no   no yes            9
## 3610      yes          yes   no  no         no yes  yes yes            4
##      rev_critical rev_images service_quality amenities food_drinks
## 304            21         29             4.0       4.0         4.1
## 700             4         19             4.3       4.5         4.4
## 845             4         10             4.4       4.4         4.4
## 2496            9          6             3.6       3.8         3.9
## 2819            6         11             4.3       4.1         4.1
## 3094           26         13             3.9       3.8         3.7
## 3132            1          8             4.6       4.4         4.5
## 3186           37         17             2.8       2.9         3.1
## 3402            7          1             2.9       2.6         2.3
## 3610            1          1             3.5       3.5         4.0
##      value_money location cleanliness
## 304          4.2      4.0         4.3
## 700          4.4      4.4         4.4
## 845          4.5      4.7         4.6
## 2496         3.7      4.0         4.4
## 2819         4.3      4.4         4.5
## 3094         3.9      4.1         3.9
## 3132         4.6      4.4         4.6
## 3186         3.0      3.1         3.4
## 3402         2.7      2.7         3.1
## 3610         3.5      3.3         4.0

Describing data

summary(fill)
##  guest_recommendation hotel_category hotel_star_rating  image_count    
##  Min.   :  0.00       gostays: 125   Min.   :0.000     Min.   :  0.00  
##  1st Qu.: 69.00       regular:1853   1st Qu.:0.000     1st Qu.:  8.00  
##  Median : 80.00                      Median :2.000     Median : 14.00  
##  Mean   : 76.39                      Mean   :1.973     Mean   : 17.57  
##  3rd Qu.: 89.00                      3rd Qu.:3.000     3rd Qu.: 23.00  
##  Max.   :100.00                      Max.   :5.000     Max.   :129.00  
##                                                                        
##            property_type    room_count      site_review_count
##  Hotel            :1319   Min.   :   0.00   Min.   :   1.00  
##  Resort           : 323   1st Qu.:  11.00   1st Qu.:   5.00  
##  Guest House      :  87   Median :  20.00   Median :  21.00  
##  Service Apartment:  68   Mean   :  31.55   Mean   :  53.80  
##  Homestay         :  43   3rd Qu.:  35.00   3rd Qu.:  64.75  
##  Lodge            :  35   Max.   :5874.00   Max.   :2094.00  
##  (Other)          : 103                                      
##  site_review_rating      poi         internet   room_service  pool     
##  Min.   :1.000      Min.   : 1.000   no : 558   no : 184     no :1615  
##  1st Qu.:3.500      1st Qu.: 3.000   yes:1420   yes:1794     yes: 363  
##  Median :4.000      Median : 6.000                                     
##  Mean   :3.807      Mean   : 8.234                                     
##  3rd Qu.:4.300      3rd Qu.:11.000                                     
##  Max.   :5.000      Max.   :57.000                                     
##                                                                        
##   gym       restaurant  doc        tour        AC        rev_positive   
##  no :1639   no : 697   no : 609   no :1104   no : 822   Min.   :   0.0  
##  yes: 339   yes:1281   yes:1369   yes: 874   yes:1156   1st Qu.:   4.0  
##                                                         Median :  16.0  
##                                                         Mean   :  45.5  
##                                                         3rd Qu.:  55.0  
##                                                         Max.   :1806.0  
##                                                                         
##   rev_critical     rev_images      service_quality   amenities    
##  Min.   :  0.0   Min.   :  0.000   Min.   :1.000   Min.   :1.000  
##  1st Qu.:  1.0   1st Qu.:  1.000   1st Qu.:3.400   1st Qu.:3.400  
##  Median :  3.0   Median :  3.000   Median :3.900   Median :3.900  
##  Mean   :  8.3   Mean   :  8.369   Mean   :3.764   Mean   :3.714  
##  3rd Qu.:  9.0   3rd Qu.: 10.000   3rd Qu.:4.300   3rd Qu.:4.200  
##  Max.   :287.0   Max.   :370.000   Max.   :5.000   Max.   :5.000  
##                                                                   
##   food_drinks     value_money       location      cleanliness   
##  Min.   :1.000   Min.   :1.000   Min.   :1.000   Min.   :1.000  
##  1st Qu.:3.400   1st Qu.:3.500   1st Qu.:3.600   1st Qu.:3.500  
##  Median :4.000   Median :4.000   Median :4.000   Median :4.100  
##  Mean   :3.791   Mean   :3.823   Mean   :3.895   Mean   :3.877  
##  3rd Qu.:4.400   3rd Qu.:4.300   3rd Qu.:4.400   3rd Qu.:4.500  
##  Max.   :5.000   Max.   :5.000   Max.   :5.000   Max.   :5.000  
## 
describe(Filter(is.numeric,fill))[,c(1:5,7:10)]
##                      vars    n  mean     sd median   mad min  max range
## guest_recommendation    1 1978 76.39  21.03   80.0 14.83   0  100   100
## hotel_star_rating       2 1978  1.97   1.52    2.0  1.48   0    5     5
## image_count             3 1978 17.57  13.57   14.0 10.38   0  129   129
## room_count              4 1978 31.55 135.28   20.0 14.83   0 5874  5874
## site_review_count       5 1978 53.80 100.08   21.0 28.17   1 2094  2093
## site_review_rating      6 1978  3.81   0.78    4.0  0.59   1    5     4
## poi                     7 1978  8.23   7.92    6.0  5.93   1   57    56
## rev_positive            8 1978 45.50  87.33   16.0 22.24   0 1806  1806
## rev_critical            9 1978  8.30  16.13    3.0  4.45   0  287   287
## rev_images             10 1978  8.37  16.23    3.0  4.45   0  370   370
## service_quality        11 1978  3.76   0.81    3.9  0.59   1    5     4
## amenities              12 1978  3.71   0.81    3.9  0.59   1    5     4
## food_drinks            13 1978  3.79   0.90    4.0  0.74   1    5     4
## value_money            14 1978  3.82   0.80    4.0  0.59   1    5     4
## location               15 1978  3.90   0.76    4.0  0.59   1    5     4
## cleanliness            16 1978  3.88   0.90    4.1  0.74   1    5     4
table(hotel_category)
## hotel_category
## gostays regular 
##     125    1853
aggregate(hotel_star_rating, by= list(hotel_category),mean)
##   Group.1        x
## 1 gostays 2.576000
## 2 regular 1.932002
aggregate(site_review_rating, by= list(hotel_category),mean)
##   Group.1        x
## 1 gostays 4.109600
## 2 regular 3.786077

Gostays hotels have higher average star ratings (2.58 vs. 1.93) and site ratings (4.11 vs. 3.79) than regular ones, although only 125 of the 1978 hotels are gostays.

table(property_type)
## property_type
##         Beach Hut               BnB          Bungalow           Cottage 
##                 2                27                 4                22 
##         Farm Stay       Guest House          Homestay            Hostel 
##                 1                87                43                 5 
##             Hotel         Houseboat             Lodge      Luxury Yacht 
##              1319                11                35                 0 
##             Motel            Palace            Resort Service Apartment 
##                 1                 7               323                68 
##              Tent             Villa 
##                11                12
aggregate(hotel_star_rating, by= list(property_type),mean)
##              Group.1         x
## 1          Beach Hut 1.0000000
## 2                BnB 0.2222222
## 3           Bungalow 0.7500000
## 4            Cottage 0.7727273
## 5          Farm Stay 0.0000000
## 6        Guest House 0.3908046
## 7           Homestay 0.3488372
## 8             Hostel 0.8000000
## 9              Hotel 2.1539045
## 10         Houseboat 0.9090909
## 11             Lodge 0.2571429
## 12             Motel 0.0000000
## 13            Palace 3.0000000
## 14            Resort 2.7461300
## 15 Service Apartment 0.6176471
## 16              Tent 0.5454545
## 17             Villa 0.4166667
aggregate(site_review_rating, by= list(property_type),mean)
##              Group.1        x
## 1          Beach Hut 4.300000
## 2                BnB 3.829630
## 3           Bungalow 4.575000
## 4            Cottage 4.090909
## 5          Farm Stay 1.300000
## 6        Guest House 3.828736
## 7           Homestay 4.074419
## 8             Hostel 3.280000
## 9              Hotel 3.757998
## 10         Houseboat 4.109091
## 11             Lodge 3.497143
## 12             Motel 2.000000
## 13            Palace 3.714286
## 14            Resort 3.934985
## 15 Service Apartment 3.936765
## 16              Tent 4.363636
## 17             Villa 3.658333

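Hotels, resorts and palaces have the highest average star ratings (2.15, 2.75 and 3.00), while bungalows, tents and beach huts have the highest average site ratings (4.58, 4.36 and 4.30). The latter groups are very small (4, 11 and 2 properties), so their means should be read with caution.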
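Group means are easier to judge when the group size is shown next to them, since several property types have only a handful of rows. The sketch below shows one way to get both from a single aggregate() call. It runs on invented toy values, not on ‘fill’, and the column names mean_rating and n are my own choice.

```r
# Toy data (invented values) standing in for fill.
toy <- data.frame(
  property_type = factor(c("Hotel", "Hotel", "Hotel", "Tent", "Resort", "Resort")),
  site_review_rating = c(3.5, 4.0, 3.9, 4.4, 4.1, 3.7)
)

# Return both the mean and the group size for every property type.
by_type <- aggregate(
  site_review_rating ~ property_type, data = toy,
  FUN = function(x) c(mean = mean(x), n = length(x))
)
# aggregate() stores the two values as a matrix column; flatten it.
by_type <- do.call(data.frame, by_type)
names(by_type) <- c("property_type", "mean_rating", "n")
print(by_type)
```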

aggregate(cbind(hotel_star_rating, site_review_rating) ~ internet,  data = fill, mean)
##   internet hotel_star_rating site_review_rating
## 1       no          1.548387           3.663978
## 2      yes          2.139437           3.862535
aggregate(cbind(hotel_star_rating, site_review_rating) ~ room_service,  data = fill, mean)
##   room_service hotel_star_rating site_review_rating
## 1           no          1.103261           3.816304
## 2          yes          2.061873           3.805518
aggregate(cbind(hotel_star_rating, site_review_rating) ~ pool,  data = fill, mean)
##   pool hotel_star_rating site_review_rating
## 1   no          1.708978           3.767059
## 2  yes          3.146006           3.982094
aggregate(cbind(hotel_star_rating, site_review_rating) ~ gym,  data = fill, mean)
##   gym hotel_star_rating site_review_rating
## 1  no          1.723612           3.760891
## 2 yes          3.176991           4.027139
aggregate(cbind(hotel_star_rating, site_review_rating) ~ restaurant,  data = fill, mean)
##   restaurant hotel_star_rating site_review_rating
## 1         no          1.096126           3.643902
## 2        yes          2.449649           3.895004
aggregate(cbind(hotel_star_rating, site_review_rating) ~ doc,  data = fill, mean)
##   doc hotel_star_rating site_review_rating
## 1  no          1.832512           3.858292
## 2 yes          2.035062           3.783492
aggregate(cbind(hotel_star_rating, site_review_rating) ~ tour,  data = fill, mean)
##   tour hotel_star_rating site_review_rating
## 1   no          1.788043           3.790036
## 2  yes          2.205950           3.827346
aggregate(cbind(hotel_star_rating, site_review_rating) ~ AC,  data = fill, mean)
##    AC hotel_star_rating site_review_rating
## 1  no          1.585158           3.724939
## 2 yes          2.248270           3.864533

Hotels with a pool, gym or restaurant average about 1.4 more stars than hotels without them, the largest gaps among the eight amenities. Their site ratings are also higher, but only by about 0.2 to 0.3 points.
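The eight aggregate() calls above differ only in the amenity column, so the comparison can also be written as a loop that returns the yes-minus-no gap for each amenity. This is a sketch on invented toy data with just two amenity columns; with the real data, amenity_cols would list all eight factor columns.

```r
# Toy data (invented values) with two of the yes/no amenity columns.
toy <- data.frame(
  hotel_star_rating = c(0, 2, 3, 4, 1, 5),
  pool = factor(c("no", "no", "yes", "yes", "no", "yes")),
  gym  = factor(c("no", "yes", "no", "yes", "no", "yes"))
)
amenity_cols <- c("pool", "gym")

# For each amenity, mean star rating of "yes" minus mean of "no".
star_gap <- sapply(amenity_cols, function(col) {
  m <- tapply(toy$hotel_star_rating, toy[[col]], mean)
  unname(m["yes"] - m["no"])
})
print(sort(star_gap, decreasing = TRUE))
```

Sorting the gaps makes it easy to see which amenities separate the groups most.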

Visualization

ggplot(fill, aes(hotel_star_rating))+ geom_bar(fill="steelblue")

ggplot(fill, aes(site_review_rating))+ geom_histogram(fill="steelblue", color="gray",bins = 5)

rat =melt(fill[,21:26], variable.name = "feature", value.name = "rating")
ggplot(data = rat, aes(x=feature, y=rating))+ geom_boxplot(aes(fill=feature))

review =melt(fill[,c(18:20)], variable.name = "review_type", value.name = "count")
ggplot(data = review, aes(x=review_type, y=count))+ geom_boxplot(aes(fill=review_type))+ ylim(0,100)

scatterplot(guest_recommendation~site_review_rating, pch=19,cex=0.7)

ggplot(aggregate(guest_recommendation~hotel_star_rating,fill,mean),aes(hotel_star_rating,guest_recommendation))+ geom_bar(stat = "identity", fill="slateblue")

scatterplot(site_review_count~site_review_rating, ylim = c(0,800), pch=19,cex=0.7)

ggplot(aggregate(site_review_count~hotel_star_rating,fill,mean),aes(hotel_star_rating,site_review_count))+ geom_bar(stat = "identity", fill="slateblue")

scatterplot(image_count~site_review_rating, ylim=c(0,100), pch=19,cex=0.7)

ggplot(aggregate(image_count~hotel_star_rating,fill,mean),aes(hotel_star_rating,image_count))+ geom_bar(stat = "identity", fill="slateblue")

scatterplot(room_count~site_review_rating, ylim= c(0,300), pch=19,cex=0.7)

ggplot(aggregate(room_count~hotel_star_rating,fill,mean),aes(hotel_star_rating,room_count))+ geom_bar(stat = "identity", fill="slateblue")

scatterplot(poi~site_review_rating, pch=19,cex=0.7)

ggplot(aggregate(poi~hotel_star_rating,fill,mean),aes(hotel_star_rating,poi))+ geom_bar(stat = "identity", fill="slateblue")

xyplot(site_review_rating~ rev_positive+rev_critical+rev_images, xlim = c(0,500),pch = 19, cex = 0.5, auto.key =TRUE)

mycolors <- brewer.pal(3,"Set1")[1:2]  # brewer.pal() needs n >= 3, so take the first two of three colours
names(mycolors) <- levels(hotel_category)
scatterplotMatrix(~hotel_star_rating+site_review_rating+guest_recommendation+poi+site_review_count+image_count|hotel_category, data=fill , reg.line="" , smoother="", col=mycolors, smoother.args=list(col="grey") , cex=0.7 , pch=c(16,16) , main="Scatter Plot of hotel ratings and variables",legend.plot= FALSE)
legend(x="topright", legend = levels(hotel_category), col=mycolors, pch=c(16,16), cex = 0.5)

scatterplotMatrix(~site_review_rating+service_quality+amenities+food_drinks+value_money+location+cleanliness, data=fill , reg.line="" , smoother="", col="purple", smoother.args=list(col="grey") , cex=0.5 , pch=c(16,16) , main="Scatter Plot of different site ratings",legend.plot= FALSE)

Correlations

num_fill = Filter(is.numeric,fill)

Dataset with numeric columns

round(cor(num_fill),2)
##                      guest_recommendation hotel_star_rating image_count
## guest_recommendation                 1.00              0.07        0.06
## hotel_star_rating                    0.07              1.00        0.31
## image_count                          0.06              0.31        1.00
## room_count                           0.01              0.11        0.12
## site_review_count                    0.08              0.25        0.22
## site_review_rating                   0.54              0.21        0.17
## poi                                 -0.02              0.13        0.10
## rev_positive                         0.10              0.26        0.22
## rev_critical                        -0.06              0.14        0.13
## rev_images                           0.10              0.25        0.24
## service_quality                      0.54              0.21        0.16
## amenities                            0.52              0.23        0.18
## food_drinks                          0.43              0.20        0.15
## value_money                          0.55              0.18        0.16
## location                             0.49              0.21        0.17
## cleanliness                          0.44              0.20        0.16
##                      room_count site_review_count site_review_rating   poi
## guest_recommendation       0.01              0.08               0.54 -0.02
## hotel_star_rating          0.11              0.25               0.21  0.13
## image_count                0.12              0.22               0.17  0.10
## room_count                 1.00              0.05               0.02  0.01
## site_review_count          0.05              1.00               0.15  0.08
## site_review_rating         0.02              0.15               1.00  0.00
## poi                        0.01              0.08               0.00  1.00
## rev_positive               0.05              0.99               0.18  0.07
## rev_critical               0.03              0.82              -0.07  0.11
## rev_images                 0.05              0.93               0.18  0.05
## service_quality            0.03              0.13               0.95  0.00
## amenities                  0.03              0.14               0.95  0.02
## food_drinks                0.00              0.13               0.83 -0.02
## value_money                0.02              0.15               0.94  0.01
## location                   0.03              0.15               0.89  0.04
## cleanliness                0.00              0.14               0.86  0.00
##                      rev_positive rev_critical rev_images service_quality
## guest_recommendation         0.10        -0.06       0.10            0.54
## hotel_star_rating            0.26         0.14       0.25            0.21
## image_count                  0.22         0.13       0.24            0.16
## room_count                   0.05         0.03       0.05            0.03
## site_review_count            0.99         0.82       0.93            0.13
## site_review_rating           0.18        -0.07       0.18            0.95
## poi                          0.07         0.11       0.05            0.00
## rev_positive                 1.00         0.76       0.94            0.17
## rev_critical                 0.76         1.00       0.69           -0.08
## rev_images                   0.94         0.69       1.00            0.16
## service_quality              0.17        -0.08       0.16            1.00
## amenities                    0.18        -0.07       0.17            0.95
## food_drinks                  0.16        -0.05       0.15            0.81
## value_money                  0.18        -0.05       0.17            0.95
## location                     0.18        -0.03       0.17            0.88
## cleanliness                  0.17        -0.05       0.16            0.83
##                      amenities food_drinks value_money location
## guest_recommendation      0.52        0.43        0.55     0.49
## hotel_star_rating         0.23        0.20        0.18     0.21
## image_count               0.18        0.15        0.16     0.17
## room_count                0.03        0.00        0.02     0.03
## site_review_count         0.14        0.13        0.15     0.15
## site_review_rating        0.95        0.83        0.94     0.89
## poi                       0.02       -0.02        0.01     0.04
## rev_positive              0.18        0.16        0.18     0.18
## rev_critical             -0.07       -0.05       -0.05    -0.03
## rev_images                0.17        0.15        0.17     0.17
## service_quality           0.95        0.81        0.95     0.88
## amenities                 1.00        0.81        0.95     0.88
## food_drinks               0.81        1.00        0.79     0.76
## value_money               0.95        0.79        1.00     0.89
## location                  0.88        0.76        0.89     1.00
## cleanliness               0.83        0.93        0.81     0.78
##                      cleanliness
## guest_recommendation        0.44
## hotel_star_rating           0.20
## image_count                 0.16
## room_count                  0.00
## site_review_count           0.14
## site_review_rating          0.86
## poi                         0.00
## rev_positive                0.17
## rev_critical               -0.05
## rev_images                  0.16
## service_quality             0.83
## amenities                   0.83
## food_drinks                 0.93
## value_money                 0.81
## location                    0.78
## cleanliness                 1.00
corrgram(num_fill, upper.panel = panel.pie)

Site ratings for the individual categories and site_review_rating are highly correlated with each other (0.76 to 0.95).
site_review_count, rev_positive, rev_critical and rev_images are highly correlated with each other (0.69 to 0.99).
guest_recommendation is moderately correlated with site_review_rating (0.54) and with the category ratings (0.43 to 0.55).
The remaining pairs are only weakly correlated, mostly slightly positive (0.31 or less in absolute value).
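Reading a 16 x 16 correlation matrix by eye is error-prone, so the strongly correlated pairs can also be pulled out programmatically. The sketch below does this on simulated toy data. The 0.8 cutoff is an arbitrary assumption, and toy stands in for num_fill.

```r
# Toy numeric data (simulated) standing in for num_fill.
set.seed(1)
x <- rnorm(200)
toy <- data.frame(a = x, b = x + rnorm(200, sd = 0.1), c = rnorm(200))

cm <- cor(toy)
# Keep each pair once by blanking the diagonal and lower triangle.
cm[lower.tri(cm, diag = TRUE)] <- NA
idx <- which(abs(cm) >= 0.8, arr.ind = TRUE)
high_pairs <- data.frame(
  var1 = rownames(cm)[idx[, "row"]],
  var2 = colnames(cm)[idx[, "col"]],
  r    = round(cm[idx], 2)
)
print(high_pairs)
```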

Hypothesis Tests

t.test(site_review_rating~room_service)
## 
##  Welch Two Sample t-test
## 
## data:  site_review_rating by room_service
## t = 0.18376, df = 224.86, p-value = 0.8544
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -0.1048790  0.1264509
## sample estimates:
##  mean in group no mean in group yes 
##          3.816304          3.805518
t.test(site_review_rating~doc)
## 
##  Welch Two Sample t-test
## 
## data:  site_review_rating by doc
## t = 2.026, df = 1238.2, p-value = 0.04297
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  0.002369018 0.147232348
## sample estimates:
##  mean in group no mean in group yes 
##          3.858292          3.783492
t.test(site_review_rating~tour)
## 
##  Welch Two Sample t-test
## 
## data:  site_review_rating by tour
## t = -1.0853, df = 1974.6, p-value = 0.2779
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -0.10472890  0.03011029
## sample estimates:
##  mean in group no mean in group yes 
##          3.790036          3.827346
t.test(site_review_rating~AC)
## 
##  Welch Two Sample t-test
## 
## data:  site_review_rating by AC
## t = -3.8084, df = 1520.3, p-value = 0.0001455
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -0.2114926 -0.0676948
## sample estimates:
##  mean in group no mean in group yes 
##          3.724939          3.864533

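The t-tests give no evidence that site_review_rating differs with the availability of room service (p = 0.85) or tours (p = 0.28). The differences for doc (p = 0.043) and AC (p = 0.00015) are significant at the 5% level, although the mean differences are small (about 0.07 and 0.14 points).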
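To make the t.test() output above less of a black box, the Welch statistic, its degrees of freedom and the p-value can be reproduced by hand. This sketch uses simulated ratings for two groups, not the hotel data, and compares the manual numbers with what t.test() reports.

```r
# Simulated ratings for two groups (not the hotel data).
set.seed(42)
no  <- rnorm(40, mean = 3.8, sd = 0.8)
yes <- rnorm(60, mean = 3.8, sd = 0.7)

# Welch statistic: difference in means over the unpooled standard error.
se2_no  <- var(no)  / length(no)
se2_yes <- var(yes) / length(yes)
t_manual <- (mean(no) - mean(yes)) / sqrt(se2_no + se2_yes)

# Welch-Satterthwaite degrees of freedom.
df_manual <- (se2_no + se2_yes)^2 /
  (se2_no^2 / (length(no) - 1) + se2_yes^2 / (length(yes) - 1))
p_manual <- 2 * pt(-abs(t_manual), df_manual)

# Compare with the built-in test (var.equal = FALSE is the default).
res <- t.test(no, yes)
print(c(manual = t_manual, builtin = unname(res$statistic)))
```

The non-integer degrees of freedom in the outputs above (for example 224.86) come from this Welch-Satterthwaite approximation.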

chisq.test(table(hotel_star_rating,pool))
## 
##  Pearson's Chi-squared test
## 
## data:  table(hotel_star_rating, pool)
## X-squared = 509.03, df = 5, p-value < 2.2e-16
chisq.test(table(hotel_star_rating,gym))
## 
##  Pearson's Chi-squared test
## 
## data:  table(hotel_star_rating, gym)
## X-squared = 462.88, df = 5, p-value < 2.2e-16
chisq.test(table(hotel_star_rating,restaurant))
## 
##  Pearson's Chi-squared test
## 
## data:  table(hotel_star_rating, restaurant)
## X-squared = 369.18, df = 5, p-value < 2.2e-16

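Thus hotel star rating is strongly associated with the availability of a pool, gym and restaurant (p < 2.2e-16 in all three chi-squared tests). The tests establish association, not the direction of dependence.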
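With 1978 rows, a tiny p-value says little about how strong the association is. One common effect-size measure for a chi-squared test is Cramér's V, which I add here as a supplement; it is not part of the original analysis. The sketch computes it from the statistics printed above, assuming 6 x 2 tables as implied by df = 5.

```r
# Chi-squared statistics as reported above; n is the number of hotels.
chisq_stat <- c(pool = 509.03, gym = 462.88, restaurant = 369.18)
n <- 1978
# The tables are 6 x 2 (star ratings 0-5 by no/yes), so min(r - 1, c - 1) = 1.
k <- min(6 - 1, 2 - 1)

# Cramér's V = sqrt(X^2 / (n * min(r - 1, c - 1)))
cramers_v <- sqrt(chisq_stat / (n * k))
print(round(cramers_v, 2))
```

The resulting values lie between roughly 0.4 and 0.5.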

Model Selection

detach(fill)
attach(num_fill)
model1 = hotel_star_rating~.
fit1 = lm(model1, data = num_fill)
summary(fit1)
## 
## Call:
## lm(formula = model1, data = num_fill)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -5.2289 -1.2751  0.2619  1.0489  3.8225 
## 
## Coefficients:
##                        Estimate Std. Error t value Pr(>|t|)    
## (Intercept)           0.3384935  0.1797488   1.883  0.05983 .  
## guest_recommendation -0.0025758  0.0017748  -1.451  0.14683    
## image_count           0.0246398  0.0024103  10.223  < 2e-16 ***
## room_count            0.0006943  0.0002308   3.008  0.00266 ** 
## site_review_count    -1.2879715  1.0520393  -1.224  0.22100    
## site_review_rating   -0.0319941  0.1631926  -0.196  0.84459    
## poi                   0.0185037  0.0039796   4.650 3.55e-06 ***
## rev_positive          1.2914874  1.0520135   1.228  0.21973    
## rev_critical          1.2830521  1.0521528   1.219  0.22282    
## rev_images            0.0023266  0.0055500   0.419  0.67511    
## service_quality       0.2367631  0.1523872   1.554  0.12042    
## amenities             0.7564474  0.1592815   4.749 2.19e-06 ***
## food_drinks           0.0888872  0.0984196   0.903  0.36656    
## value_money          -0.9239230  0.1516589  -6.092 1.34e-09 ***
## location              0.1966440  0.0975099   2.017  0.04387 *  
## cleanliness          -0.0162286  0.1049361  -0.155  0.87711    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1.375 on 1962 degrees of freedom
## Multiple R-squared:  0.1903, Adjusted R-squared:  0.1841 
## F-statistic: 30.73 on 15 and 1962 DF,  p-value: < 2.2e-16
leap1 = regsubsets(model1, data = num_fill, nbest = 1)
#summary(leap1)
plot(leap1, scale="adjr2")

Best fit model

model2 = hotel_star_rating~image_count+room_count+poi+rev_positive+service_quality+amenities+value_money+location
fit2 = lm(model2, data = num_fill)
summary(fit2)
## 
## Call:
## lm(formula = model2, data = num_fill)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -5.2218 -1.2668  0.2744  1.0552  3.7486 
## 
## Coefficients:
##                   Estimate Std. Error t value Pr(>|t|)    
## (Intercept)      0.2187715  0.1663037   1.315  0.18850    
## image_count      0.0251043  0.0023882  10.512  < 2e-16 ***
## room_count       0.0006800  0.0002304   2.951  0.00321 ** 
## poi              0.0174416  0.0039420   4.425 1.02e-05 ***
## rev_positive     0.0030391  0.0003683   8.252 2.82e-16 ***
## service_quality  0.2511091  0.1397814   1.796  0.07258 .  
## amenities        0.8102837  0.1535247   5.278 1.45e-07 ***
## value_money     -0.9687292  0.1467228  -6.602 5.19e-11 ***
## location         0.1947483  0.0943291   2.065  0.03910 *  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1.375 on 1969 degrees of freedom
## Multiple R-squared:  0.1871, Adjusted R-squared:  0.1838 
## F-statistic: 56.65 on 8 and 1969 DF,  p-value: < 2.2e-16
coefplot(fit2, intercept= FALSE, outerCI=1.96,coefficients=c("image_count","room_count","poi","rev_positive","service_quality","amenities","value_money","location"))

summary(fit1)$adj.r.squared
## [1] 0.1840776
summary(fit2)$adj.r.squared
## [1] 0.1837894
AIC(fit1)
## [1] 6889.977
AIC(fit2)
## [1] 6883.72

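Adjusted R-squared for Model 2 (0.1838) is marginally lower than for Model 1 (0.1841).
AIC for Model 2 (6883.72) is lower than for Model 1 (6889.98).

Thus, Model 2 is preferred as the ordinary least squares model: it fits almost as well as Model 1 with eight predictors instead of fifteen. Model 2 predicts the hotel star rating as a function of the following explanatory variables: “image_count”, “room_count”, “poi”, “rev_positive”, “service_quality”, “amenities”, “value_money”, “location”. Both models explain only about 18% of the variance in star rating.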
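Since the smaller model is nested in the larger one, a partial F-test via anova() is another way to compare them alongside AIC and adjusted R-squared. The sketch below shows the pattern on simulated data, not the hotel data. With the real objects, the equivalent call would presumably be anova(fit2, fit1).

```r
# Simulated data (not the hotel data): y depends on x1 and x2 only.
set.seed(7)
n  <- 300
x1 <- rnorm(n); x2 <- rnorm(n); x3 <- rnorm(n)
y  <- 1 + 0.5 * x1 - 0.3 * x2 + rnorm(n)
sim <- data.frame(y, x1, x2, x3)

full    <- lm(y ~ x1 + x2 + x3, data = sim)
reduced <- lm(y ~ x1 + x2, data = sim)

# Partial F-test: does the extra predictor improve the fit significantly?
cmp <- anova(reduced, full)
print(cmp)

# AIC and adjusted R-squared side by side, as in the comparison above.
print(c(AIC_full = AIC(full), AIC_reduced = AIC(reduced)))
print(c(adjR2_full = summary(full)$adj.r.squared,
        adjR2_reduced = summary(reduced)$adj.r.squared))
```

A large p-value in the second row of the anova table would support keeping the reduced model.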
