Load Data Frames
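# Packages assumed from the functions called below: car (some, scatterplot, scatterplotMatrix), psych (describe), ggplot2, reshape2 (melt), lattice (xyplot), RColorBrewer, corrgram, leaps (regsubsets), coefplot
invisible(lapply(c("car","psych","ggplot2","reshape2","lattice","RColorBrewer","corrgram","leaps","coefplot"), library, character.only = TRUE))
load("C:/Users/Himanshu/Desktop/Hotel/Project_Hotel.RData")
attach(fill)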
I have already cleaned the initial data frame (selected the relevant columns and derived new columns from existing ones), so only the final usable data frame, ‘fill’, is loaded here.
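As a side note, it can help to confirm which objects an .RData file actually restores, and to query a data frame without attaching it. The sketch below uses a hypothetical object `fill_demo` and a temporary file, not the project file above.

```r
# Hypothetical demo object and temporary file; not the author's Project_Hotel.RData.
demo_path <- file.path(tempdir(), "demo.RData")
fill_demo <- data.frame(hotel_star_rating = c(2, 0, 3),
                        pool = c("no", "no", "yes"))
save(fill_demo, file = demo_path)
rm(fill_demo)

# load() invisibly returns the names of the objects it restored.
loaded <- load(demo_path)
print(loaded)

# with() evaluates an expression inside the data frame without attach().
with(fill_demo, tapply(hotel_star_rating, pool, mean))
```

Using with() or a `data =` argument avoids name clashes that attach() can cause when two attached data frames share column names.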
dim(fill)
## [1] 1978 26
str(fill)
## 'data.frame': 1978 obs. of 26 variables:
## $ guest_recommendation: int 85 87 50 63 76 90 67 0 84 92 ...
## $ hotel_category : Factor w/ 2 levels "gostays","regular": 1 2 2 2 1 2 2 2 2 2 ...
## $ hotel_star_rating : int 2 0 0 2 3 0 0 3 3 0 ...
## $ image_count : int 20 17 28 6 20 13 3 17 7 6 ...
## $ property_type : Factor w/ 18 levels "Beach Hut","BnB",..: 15 6 15 9 9 9 9 14 9 15 ...
## $ room_count : int 17 18 15 20 24 4 22 21 10 14 ...
## $ site_review_count : int 87 8 2 121 550 21 28 1 57 127 ...
## $ site_review_rating : num 4 4.5 2.5 2.8 4 4.3 3.6 1 4.2 4.5 ...
## $ poi : num 18 3 8 11 24 5 1 1 3 2 ...
## $ internet : Factor w/ 2 levels "no","yes": 2 1 1 2 2 2 2 2 2 1 ...
## $ room_service : Factor w/ 2 levels "no","yes": 2 2 1 1 2 2 2 2 2 2 ...
## $ pool : Factor w/ 2 levels "no","yes": 1 1 2 1 1 1 1 1 1 1 ...
## $ gym : Factor w/ 2 levels "no","yes": 1 2 1 1 1 1 1 1 1 1 ...
## $ restaurant : Factor w/ 2 levels "no","yes": 2 1 2 2 2 1 1 1 1 2 ...
## $ doc : Factor w/ 2 levels "no","yes": 2 2 1 1 2 2 1 2 2 2 ...
## $ tour : Factor w/ 2 levels "no","yes": 1 1 1 2 2 1 1 1 1 1 ...
## $ AC : Factor w/ 2 levels "no","yes": 1 2 2 1 2 1 2 2 1 2 ...
## $ rev_positive : int 74 8 1 56 452 19 22 0 50 122 ...
## $ rev_critical : int 13 0 1 65 98 2 6 1 7 5 ...
## $ rev_images : int 13 2 1 17 58 7 2 0 11 42 ...
## $ service_quality : num 3.9 4.7 2.5 2.7 4 4.5 3.5 1 4.2 4.5 ...
## $ amenities : num 3.7 4.7 2.5 2.6 3.9 4.1 3.7 1 4.1 4.4 ...
## $ food_drinks : num 3.8 4.3 1 2.5 4.1 4.2 3.4 1 4.1 4.4 ...
## $ value_money : num 4.1 4.7 2.5 2.9 4 4.5 3.8 1 4.1 4.6 ...
## $ location : num 4 4.8 2.5 2.9 4 3.9 4 1 4.2 4.7 ...
## $ cleanliness : num 4.1 4.8 1 2.6 4.1 4.5 3.8 1 4.1 4.6 ...
some(fill)
## guest_recommendation hotel_category hotel_star_rating image_count
## 304 79 gostays 2 6
## 700 85 regular 3 9
## 845 80 regular 4 26
## 2496 82 regular 3 12
## 2819 76 regular 3 18
## 3094 81 gostays 2 10
## 3132 80 regular 3 6
## 3186 64 regular 1 10
## 3402 75 regular 2 11
## 3610 60 regular 4 18
## property_type room_count site_review_count site_review_rating poi
## 304 Hotel 10 134 4.1 9
## 700 Hotel 112 102 4.4 3
## 845 Hotel 45 70 4.5 29
## 2496 Hotel 27 46 3.7 7
## 2819 Resort 8 50 4.4 1
## 3094 Hotel 30 131 3.9 2
## 3132 Hotel 16 57 4.6 5
## 3186 Hotel 44 87 3.0 34
## 3402 Hotel 40 16 2.8 2
## 3610 Hotel 11 5 3.7 5
## internet room_service pool gym restaurant doc tour AC rev_positive
## 304 yes no yes no yes no no yes 113
## 700 yes yes yes no yes yes yes yes 98
## 845 yes yes no yes yes yes no yes 66
## 2496 yes yes no no yes yes no yes 37
## 2819 no yes no no no no no no 44
## 3094 yes yes no no yes yes no no 105
## 3132 yes yes no no yes no no yes 56
## 3186 yes yes no no no yes yes no 50
## 3402 no yes no no no no no yes 9
## 3610 yes yes no no no yes yes yes 4
## rev_critical rev_images service_quality amenities food_drinks
## 304 21 29 4.0 4.0 4.1
## 700 4 19 4.3 4.5 4.4
## 845 4 10 4.4 4.4 4.4
## 2496 9 6 3.6 3.8 3.9
## 2819 6 11 4.3 4.1 4.1
## 3094 26 13 3.9 3.8 3.7
## 3132 1 8 4.6 4.4 4.5
## 3186 37 17 2.8 2.9 3.1
## 3402 7 1 2.9 2.6 2.3
## 3610 1 1 3.5 3.5 4.0
## value_money location cleanliness
## 304 4.2 4.0 4.3
## 700 4.4 4.4 4.4
## 845 4.5 4.7 4.6
## 2496 3.7 4.0 4.4
## 2819 4.3 4.4 4.5
## 3094 3.9 4.1 3.9
## 3132 4.6 4.4 4.6
## 3186 3.0 3.1 3.4
## 3402 2.7 2.7 3.1
## 3610 3.5 3.3 4.0
summary(fill)
## guest_recommendation hotel_category hotel_star_rating image_count
## Min. : 0.00 gostays: 125 Min. :0.000 Min. : 0.00
## 1st Qu.: 69.00 regular:1853 1st Qu.:0.000 1st Qu.: 8.00
## Median : 80.00 Median :2.000 Median : 14.00
## Mean : 76.39 Mean :1.973 Mean : 17.57
## 3rd Qu.: 89.00 3rd Qu.:3.000 3rd Qu.: 23.00
## Max. :100.00 Max. :5.000 Max. :129.00
##
## property_type room_count site_review_count
## Hotel :1319 Min. : 0.00 Min. : 1.00
## Resort : 323 1st Qu.: 11.00 1st Qu.: 5.00
## Guest House : 87 Median : 20.00 Median : 21.00
## Service Apartment: 68 Mean : 31.55 Mean : 53.80
## Homestay : 43 3rd Qu.: 35.00 3rd Qu.: 64.75
## Lodge : 35 Max. :5874.00 Max. :2094.00
## (Other) : 103
## site_review_rating poi internet room_service pool
## Min. :1.000 Min. : 1.000 no : 558 no : 184 no :1615
## 1st Qu.:3.500 1st Qu.: 3.000 yes:1420 yes:1794 yes: 363
## Median :4.000 Median : 6.000
## Mean :3.807 Mean : 8.234
## 3rd Qu.:4.300 3rd Qu.:11.000
## Max. :5.000 Max. :57.000
##
## gym restaurant doc tour AC rev_positive
## no :1639 no : 697 no : 609 no :1104 no : 822 Min. : 0.0
## yes: 339 yes:1281 yes:1369 yes: 874 yes:1156 1st Qu.: 4.0
## Median : 16.0
## Mean : 45.5
## 3rd Qu.: 55.0
## Max. :1806.0
##
## rev_critical rev_images service_quality amenities
## Min. : 0.0 Min. : 0.000 Min. :1.000 Min. :1.000
## 1st Qu.: 1.0 1st Qu.: 1.000 1st Qu.:3.400 1st Qu.:3.400
## Median : 3.0 Median : 3.000 Median :3.900 Median :3.900
## Mean : 8.3 Mean : 8.369 Mean :3.764 Mean :3.714
## 3rd Qu.: 9.0 3rd Qu.: 10.000 3rd Qu.:4.300 3rd Qu.:4.200
## Max. :287.0 Max. :370.000 Max. :5.000 Max. :5.000
##
## food_drinks value_money location cleanliness
## Min. :1.000 Min. :1.000 Min. :1.000 Min. :1.000
## 1st Qu.:3.400 1st Qu.:3.500 1st Qu.:3.600 1st Qu.:3.500
## Median :4.000 Median :4.000 Median :4.000 Median :4.100
## Mean :3.791 Mean :3.823 Mean :3.895 Mean :3.877
## 3rd Qu.:4.400 3rd Qu.:4.300 3rd Qu.:4.400 3rd Qu.:4.500
## Max. :5.000 Max. :5.000 Max. :5.000 Max. :5.000
##
describe(Filter(is.numeric,fill))[,c(1:5,7:10)]
## vars n mean sd median mad min max range
## guest_recommendation 1 1978 76.39 21.03 80.0 14.83 0 100 100
## hotel_star_rating 2 1978 1.97 1.52 2.0 1.48 0 5 5
## image_count 3 1978 17.57 13.57 14.0 10.38 0 129 129
## room_count 4 1978 31.55 135.28 20.0 14.83 0 5874 5874
## site_review_count 5 1978 53.80 100.08 21.0 28.17 1 2094 2093
## site_review_rating 6 1978 3.81 0.78 4.0 0.59 1 5 4
## poi 7 1978 8.23 7.92 6.0 5.93 1 57 56
## rev_positive 8 1978 45.50 87.33 16.0 22.24 0 1806 1806
## rev_critical 9 1978 8.30 16.13 3.0 4.45 0 287 287
## rev_images 10 1978 8.37 16.23 3.0 4.45 0 370 370
## service_quality 11 1978 3.76 0.81 3.9 0.59 1 5 4
## amenities 12 1978 3.71 0.81 3.9 0.59 1 5 4
## food_drinks 13 1978 3.79 0.90 4.0 0.74 1 5 4
## value_money 14 1978 3.82 0.80 4.0 0.59 1 5 4
## location 15 1978 3.90 0.76 4.0 0.59 1 5 4
## cleanliness 16 1978 3.88 0.90 4.1 0.74 1 5 4
table(hotel_category)
## hotel_category
## gostays regular
## 125 1853
aggregate(hotel_star_rating, by= list(hotel_category),mean)
## Group.1 x
## 1 gostays 2.576000
## 2 regular 1.932002
aggregate(site_review_rating, by= list(hotel_category),mean)
## Group.1 x
## 1 gostays 4.109600
## 2 regular 3.786077
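Gostays hotels have a higher mean star rating (2.58 vs 1.93) and a higher mean site rating (4.11 vs 3.79) than regular ones, although only 125 of the 1978 hotels are gostays.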
table(property_type)
## property_type
## Beach Hut BnB Bungalow Cottage
## 2 27 4 22
## Farm Stay Guest House Homestay Hostel
## 1 87 43 5
## Hotel Houseboat Lodge Luxury Yacht
## 1319 11 35 0
## Motel Palace Resort Service Apartment
## 1 7 323 68
## Tent Villa
## 11 12
aggregate(hotel_star_rating, by= list(property_type),mean)
## Group.1 x
## 1 Beach Hut 1.0000000
## 2 BnB 0.2222222
## 3 Bungalow 0.7500000
## 4 Cottage 0.7727273
## 5 Farm Stay 0.0000000
## 6 Guest House 0.3908046
## 7 Homestay 0.3488372
## 8 Hostel 0.8000000
## 9 Hotel 2.1539045
## 10 Houseboat 0.9090909
## 11 Lodge 0.2571429
## 12 Motel 0.0000000
## 13 Palace 3.0000000
## 14 Resort 2.7461300
## 15 Service Apartment 0.6176471
## 16 Tent 0.5454545
## 17 Villa 0.4166667
aggregate(site_review_rating, by= list(property_type),mean)
## Group.1 x
## 1 Beach Hut 4.300000
## 2 BnB 3.829630
## 3 Bungalow 4.575000
## 4 Cottage 4.090909
## 5 Farm Stay 1.300000
## 6 Guest House 3.828736
## 7 Homestay 4.074419
## 8 Hostel 3.280000
## 9 Hotel 3.757998
## 10 Houseboat 4.109091
## 11 Lodge 3.497143
## 12 Motel 2.000000
## 13 Palace 3.714286
## 14 Resort 3.934985
## 15 Service Apartment 3.936765
## 16 Tent 4.363636
## 17 Villa 3.658333
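Hotels, resorts and palaces have the highest mean star ratings (2.15, 2.75 and 3.00).
Bungalows, tents and beach huts have the highest mean site ratings (4.58, 4.36 and 4.30), but these groups contain only 4, 11 and 2 properties, so their means are not reliable.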
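Since the table(property_type) output shows that several property types have very few rows, it may be useful to print the group size next to each group mean. This is a minimal sketch on a synthetic stand-in for `fill`; the data frame `toy` and its values are made up for illustration.

```r
# Synthetic stand-in for `fill`; the values are made up for illustration.
set.seed(1)
toy <- data.frame(
  property_type = factor(c(rep("Hotel", 50), rep("Resort", 20),
                           rep("Beach Hut", 2))),
  site_review_rating = round(runif(72, 1, 5), 1)
)

# Mean rating and group size side by side, so tiny groups stand out.
by_type <- aggregate(site_review_rating ~ property_type, data = toy,
                     FUN = function(x) c(mean = mean(x), n = length(x)))
by_type <- do.call(data.frame, by_type)   # flatten the matrix column
names(by_type) <- c("property_type", "mean_rating", "n")
by_type[order(-by_type$n), ]
```

Groups with only a handful of rows can then be flagged or pooled before their means are compared.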
aggregate(cbind(hotel_star_rating, site_review_rating) ~ internet, data = fill, mean)
## internet hotel_star_rating site_review_rating
## 1 no 1.548387 3.663978
## 2 yes 2.139437 3.862535
aggregate(cbind(hotel_star_rating, site_review_rating) ~ room_service, data = fill, mean)
## room_service hotel_star_rating site_review_rating
## 1 no 1.103261 3.816304
## 2 yes 2.061873 3.805518
aggregate(cbind(hotel_star_rating, site_review_rating) ~ pool, data = fill, mean)
## pool hotel_star_rating site_review_rating
## 1 no 1.708978 3.767059
## 2 yes 3.146006 3.982094
aggregate(cbind(hotel_star_rating, site_review_rating) ~ gym, data = fill, mean)
## gym hotel_star_rating site_review_rating
## 1 no 1.723612 3.760891
## 2 yes 3.176991 4.027139
aggregate(cbind(hotel_star_rating, site_review_rating) ~ restaurant, data = fill, mean)
## restaurant hotel_star_rating site_review_rating
## 1 no 1.096126 3.643902
## 2 yes 2.449649 3.895004
aggregate(cbind(hotel_star_rating, site_review_rating) ~ doc, data = fill, mean)
## doc hotel_star_rating site_review_rating
## 1 no 1.832512 3.858292
## 2 yes 2.035062 3.783492
aggregate(cbind(hotel_star_rating, site_review_rating) ~ tour, data = fill, mean)
## tour hotel_star_rating site_review_rating
## 1 no 1.788043 3.790036
## 2 yes 2.205950 3.827346
aggregate(cbind(hotel_star_rating, site_review_rating) ~ AC, data = fill, mean)
## AC hotel_star_rating site_review_rating
## 1 no 1.585158 3.724939
## 2 yes 2.248270 3.864533
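Hotels with a pool, gym or restaurant have markedly higher mean star ratings (about 1.4 stars higher in each case). Their mean site ratings are also higher, but only by about 0.2 to 0.3 points. These are differences in group means and do not by themselves show a causal effect.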
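The eight aggregate() calls above can be condensed into one table of "yes minus no" differences, which makes the amenities easier to rank. The sketch below assumes a synthetic data frame `toy` with three of the amenity columns; the values are random, so the differences it prints are meaningless in themselves.

```r
# Synthetic stand-in for `fill`; random values, for illustration only.
set.seed(2)
n <- 200
toy <- data.frame(
  hotel_star_rating  = sample(0:5, n, replace = TRUE),
  site_review_rating = round(runif(n, 1, 5), 1),
  pool = factor(sample(c("no", "yes"), n, replace = TRUE)),
  gym  = factor(sample(c("no", "yes"), n, replace = TRUE)),
  AC   = factor(sample(c("no", "yes"), n, replace = TRUE))
)
amenities <- c("pool", "gym", "AC")

# For each amenity: mean of the "yes" group minus mean of the "no" group.
diffs <- t(sapply(amenities, function(a) {
  yes <- toy[[a]] == "yes"
  c(star_diff = mean(toy$hotel_star_rating[yes]) -
                mean(toy$hotel_star_rating[!yes]),
    site_diff = mean(toy$site_review_rating[yes]) -
                mean(toy$site_review_rating[!yes]))
}))
round(diffs, 2)
```

On the real data, `amenities` would list all eight yes/no columns and `toy` would be replaced by `fill`.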
ggplot(fill, aes(hotel_star_rating))+ geom_bar(fill="steelblue")
ggplot(fill, aes(site_review_rating))+ geom_histogram(fill="steelblue", color="gray",bins = 5)
rat =melt(fill[,21:26], variable.name = "feature", value.name = "rating")
ggplot(data = rat, aes(x=feature, y=rating))+ geom_boxplot(aes(fill=feature))
review =melt(fill[,c(18:20)], variable.name = "review_type", value.name = "count")
ggplot(data = review, aes(x=review_type, y=count))+ geom_boxplot(aes(fill=review_type))+ ylim(0,100)
scatterplot(guest_recommendation~site_review_rating, pch=19,cex=0.7)
ggplot(aggregate(guest_recommendation~hotel_star_rating,fill,mean),aes(hotel_star_rating,guest_recommendation))+ geom_bar(stat = "identity", fill="slateblue")
scatterplot(site_review_count~site_review_rating, ylim = c(0,800), pch=19,cex=0.7)
ggplot(aggregate(site_review_count~hotel_star_rating,fill,mean),aes(hotel_star_rating,site_review_count))+ geom_bar(stat = "identity", fill="slateblue")
scatterplot(image_count~site_review_rating, ylim=c(0,100), pch=19,cex=0.7)
ggplot(aggregate(image_count~hotel_star_rating,fill,mean),aes(hotel_star_rating,image_count))+ geom_bar(stat = "identity", fill="slateblue")
scatterplot(room_count~site_review_rating, ylim= c(0,300), pch=19,cex=0.7)
ggplot(aggregate(room_count~hotel_star_rating,fill,mean),aes(hotel_star_rating,room_count))+ geom_bar(stat = "identity", fill="slateblue")
scatterplot(poi~site_review_rating, pch=19,cex=0.7)
ggplot(aggregate(poi~hotel_star_rating,fill,mean),aes(hotel_star_rating,poi))+ geom_bar(stat = "identity", fill="slateblue")
xyplot(site_review_rating~ rev_positive+rev_critical+rev_images, xlim = c(0,500),pch = 19, cex = 0.5, auto.key =TRUE)
# brewer.pal() needs n >= 3, so request 3 colours and keep the first 2
mycolors <- brewer.pal(3,"Set1")[1:2]
names(mycolors) <- levels(hotel_category)
scatterplotMatrix(~hotel_star_rating+site_review_rating+guest_recommendation+poi+site_review_count+image_count|hotel_category, data=fill , reg.line="" , smoother="", col=mycolors, smoother.args=list(col="grey") , cex=0.7 , pch=c(16,16) , main="Scatter Plot of hotel ratings and variables",legend.plot= FALSE)
legend(x="topright", legend = levels(hotel_category), col=mycolors, pch=c(16,16), cex = 0.5)
scatterplotMatrix(~site_review_rating+service_quality+amenities+food_drinks+value_money+location+cleanliness, data=fill , reg.line="" , smoother="", col="purple", smoother.args=list(col="grey") , cex=0.5 , pch=c(16,16) , main="Scatter Plot of different site ratings",legend.plot= FALSE)
num_fill = Filter(is.numeric,fill)
Dataset with numeric columns
round(cor(num_fill),2)
## guest_recommendation hotel_star_rating image_count
## guest_recommendation 1.00 0.07 0.06
## hotel_star_rating 0.07 1.00 0.31
## image_count 0.06 0.31 1.00
## room_count 0.01 0.11 0.12
## site_review_count 0.08 0.25 0.22
## site_review_rating 0.54 0.21 0.17
## poi -0.02 0.13 0.10
## rev_positive 0.10 0.26 0.22
## rev_critical -0.06 0.14 0.13
## rev_images 0.10 0.25 0.24
## service_quality 0.54 0.21 0.16
## amenities 0.52 0.23 0.18
## food_drinks 0.43 0.20 0.15
## value_money 0.55 0.18 0.16
## location 0.49 0.21 0.17
## cleanliness 0.44 0.20 0.16
## room_count site_review_count site_review_rating poi
## guest_recommendation 0.01 0.08 0.54 -0.02
## hotel_star_rating 0.11 0.25 0.21 0.13
## image_count 0.12 0.22 0.17 0.10
## room_count 1.00 0.05 0.02 0.01
## site_review_count 0.05 1.00 0.15 0.08
## site_review_rating 0.02 0.15 1.00 0.00
## poi 0.01 0.08 0.00 1.00
## rev_positive 0.05 0.99 0.18 0.07
## rev_critical 0.03 0.82 -0.07 0.11
## rev_images 0.05 0.93 0.18 0.05
## service_quality 0.03 0.13 0.95 0.00
## amenities 0.03 0.14 0.95 0.02
## food_drinks 0.00 0.13 0.83 -0.02
## value_money 0.02 0.15 0.94 0.01
## location 0.03 0.15 0.89 0.04
## cleanliness 0.00 0.14 0.86 0.00
## rev_positive rev_critical rev_images service_quality
## guest_recommendation 0.10 -0.06 0.10 0.54
## hotel_star_rating 0.26 0.14 0.25 0.21
## image_count 0.22 0.13 0.24 0.16
## room_count 0.05 0.03 0.05 0.03
## site_review_count 0.99 0.82 0.93 0.13
## site_review_rating 0.18 -0.07 0.18 0.95
## poi 0.07 0.11 0.05 0.00
## rev_positive 1.00 0.76 0.94 0.17
## rev_critical 0.76 1.00 0.69 -0.08
## rev_images 0.94 0.69 1.00 0.16
## service_quality 0.17 -0.08 0.16 1.00
## amenities 0.18 -0.07 0.17 0.95
## food_drinks 0.16 -0.05 0.15 0.81
## value_money 0.18 -0.05 0.17 0.95
## location 0.18 -0.03 0.17 0.88
## cleanliness 0.17 -0.05 0.16 0.83
## amenities food_drinks value_money location
## guest_recommendation 0.52 0.43 0.55 0.49
## hotel_star_rating 0.23 0.20 0.18 0.21
## image_count 0.18 0.15 0.16 0.17
## room_count 0.03 0.00 0.02 0.03
## site_review_count 0.14 0.13 0.15 0.15
## site_review_rating 0.95 0.83 0.94 0.89
## poi 0.02 -0.02 0.01 0.04
## rev_positive 0.18 0.16 0.18 0.18
## rev_critical -0.07 -0.05 -0.05 -0.03
## rev_images 0.17 0.15 0.17 0.17
## service_quality 0.95 0.81 0.95 0.88
## amenities 1.00 0.81 0.95 0.88
## food_drinks 0.81 1.00 0.79 0.76
## value_money 0.95 0.79 1.00 0.89
## location 0.88 0.76 0.89 1.00
## cleanliness 0.83 0.93 0.81 0.78
## cleanliness
## guest_recommendation 0.44
## hotel_star_rating 0.20
## image_count 0.16
## room_count 0.00
## site_review_count 0.14
## site_review_rating 0.86
## poi 0.00
## rev_positive 0.17
## rev_critical -0.05
## rev_images 0.16
## service_quality 0.83
## amenities 0.83
## food_drinks 0.93
## value_money 0.81
## location 0.78
## cleanliness 1.00
corrgram(num_fill, upper.panel = panel.pie)
The category ratings (service_quality, amenities, food_drinks, value_money, location, cleanliness) are highly correlated with site_review_rating (0.83 to 0.95) and with each other (0.76 to 0.95).
site_review_count, rev_positive, rev_critical and rev_images are highly correlated with each other (0.69 to 0.99). guest_recommendation is moderately correlated with site_review_rating (0.54).
The remaining pairs of variables are only weakly correlated (at most about 0.3 in absolute value).
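With a 16 x 16 correlation matrix it is easy to miss a pair, so the strongly correlated pairs can also be listed programmatically. This is a sketch on synthetic data; the relationship between the toy columns and the 0.8 cut-off are assumptions chosen for illustration.

```r
# Synthetic data; the sum below is an assumed relationship, for illustration.
set.seed(3)
n <- 300
pos  <- rpois(n, 40)
crit <- rpois(n, 8)
toy <- data.frame(
  rev_positive      = pos,
  rev_critical      = crit,
  site_review_count = pos + crit,
  poi               = rpois(n, 8)
)

cm <- cor(toy)
# Keep each pair once by blanking the diagonal and the lower triangle.
cm[lower.tri(cm, diag = TRUE)] <- NA
high_pairs <- as.data.frame(as.table(cm), stringsAsFactors = FALSE)
names(high_pairs) <- c("var1", "var2", "r")
high_pairs <- high_pairs[!is.na(high_pairs$r) & abs(high_pairs$r) >= 0.8, ]
high_pairs[order(-abs(high_pairs$r)), ]
```

Applied to `num_fill`, the same steps would return the pairs worth checking before they are entered together in a regression.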
t.test(site_review_rating~room_service)
##
## Welch Two Sample t-test
##
## data: site_review_rating by room_service
## t = 0.18376, df = 224.86, p-value = 0.8544
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -0.1048790 0.1264509
## sample estimates:
## mean in group no mean in group yes
## 3.816304 3.805518
t.test(site_review_rating~doc)
##
## Welch Two Sample t-test
##
## data: site_review_rating by doc
## t = 2.026, df = 1238.2, p-value = 0.04297
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## 0.002369018 0.147232348
## sample estimates:
## mean in group no mean in group yes
## 3.858292 3.783492
t.test(site_review_rating~tour)
##
## Welch Two Sample t-test
##
## data: site_review_rating by tour
## t = -1.0853, df = 1974.6, p-value = 0.2779
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -0.10472890 0.03011029
## sample estimates:
## mean in group no mean in group yes
## 3.790036 3.827346
t.test(site_review_rating~AC)
##
## Welch Two Sample t-test
##
## data: site_review_rating by AC
## t = -3.8084, df = 1520.3, p-value = 0.0001455
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -0.2114926 -0.0676948
## sample estimates:
## mean in group no mean in group yes
## 3.724939 3.864533
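The t-tests give no evidence that site_review_rating differs with the availability of room service (p = 0.85) or tour (p = 0.28). The differences for doc (p = 0.043) and AC (p = 0.00015) are statistically significant at the 5% level, although small (about 0.07 and 0.14 rating points).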
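Four tests are run on the same response, so one may want to adjust the p-values for multiple comparisons. The sketch below applies Holm's method to the four p-values printed above; the choice of Holm is an assumption made here, not part of the original analysis.

```r
# p-values copied from the four Welch t-test outputs above.
p_raw <- c(room_service = 0.8544, doc = 0.04297,
           tour = 0.2779, AC = 0.0001455)

# Holm's step-down adjustment (an assumed choice of method).
p_holm <- p.adjust(p_raw, method = "holm")
round(p_holm, 4)
```

Under this adjustment the AC difference stays significant at the 5% level, while the doc difference (adjusted p of about 0.13) does not.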
chisq.test(table(hotel_star_rating,pool))
##
## Pearson's Chi-squared test
##
## data: table(hotel_star_rating, pool)
## X-squared = 509.03, df = 5, p-value < 2.2e-16
chisq.test(table(hotel_star_rating,gym))
##
## Pearson's Chi-squared test
##
## data: table(hotel_star_rating, gym)
## X-squared = 462.88, df = 5, p-value < 2.2e-16
chisq.test(table(hotel_star_rating,restaurant))
##
## Pearson's Chi-squared test
##
## data: table(hotel_star_rating, restaurant)
## X-squared = 369.18, df = 5, p-value < 2.2e-16
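Thus hotel star rating is significantly associated with the availability of a pool, gym and restaurant (p < 2.2e-16 in each test). The tests establish an association, not its direction or a causal dependence.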
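A chi-squared p-value says little about how strong an association is, especially with 1978 rows. One common effect-size measure is Cramér's V, V = sqrt(X² / (n (k - 1))), where k is the smaller of the number of rows and columns. The sketch below plugs in the statistics printed above; using Cramér's V here is an addition, not something the original analysis computed.

```r
# X-squared statistics copied from the three chisq.test() outputs above.
chi2 <- c(pool = 509.03, gym = 462.88, restaurant = 369.18)
n <- 1978          # number of hotels, from dim(fill)
k <- min(6, 2)     # 6 star levels (df = 5 implies a 6 x 2 table), 2 amenity levels

# Cramér's V: 0 means no association, 1 means perfect association.
v <- sqrt(chi2 / (n * (k - 1)))
round(v, 2)
```

The resulting values are roughly 0.51, 0.48 and 0.43, which supports describing these associations as fairly strong.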
detach(fill)
attach(num_fill)
model1 = hotel_star_rating~.
fit1 = lm(model1, data = num_fill)
summary(fit1)
##
## Call:
## lm(formula = model1, data = num_fill)
##
## Residuals:
## Min 1Q Median 3Q Max
## -5.2289 -1.2751 0.2619 1.0489 3.8225
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 0.3384935 0.1797488 1.883 0.05983 .
## guest_recommendation -0.0025758 0.0017748 -1.451 0.14683
## image_count 0.0246398 0.0024103 10.223 < 2e-16 ***
## room_count 0.0006943 0.0002308 3.008 0.00266 **
## site_review_count -1.2879715 1.0520393 -1.224 0.22100
## site_review_rating -0.0319941 0.1631926 -0.196 0.84459
## poi 0.0185037 0.0039796 4.650 3.55e-06 ***
## rev_positive 1.2914874 1.0520135 1.228 0.21973
## rev_critical 1.2830521 1.0521528 1.219 0.22282
## rev_images 0.0023266 0.0055500 0.419 0.67511
## service_quality 0.2367631 0.1523872 1.554 0.12042
## amenities 0.7564474 0.1592815 4.749 2.19e-06 ***
## food_drinks 0.0888872 0.0984196 0.903 0.36656
## value_money -0.9239230 0.1516589 -6.092 1.34e-09 ***
## location 0.1966440 0.0975099 2.017 0.04387 *
## cleanliness -0.0162286 0.1049361 -0.155 0.87711
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1.375 on 1962 degrees of freedom
## Multiple R-squared: 0.1903, Adjusted R-squared: 0.1841
## F-statistic: 30.73 on 15 and 1962 DF, p-value: < 2.2e-16
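In the fit above, site_review_count, rev_positive and rev_critical have coefficients of about -1.29, 1.29 and 1.28 with standard errors near 1.05, a pattern that often appears when predictors are nearly collinear. One way to check is the variance inflation factor, VIF = 1 / (1 - R²), where R² comes from regressing each predictor on the others. This is a sketch on synthetic data; the way `site_review_count` is built from the other columns is an assumption for illustration.

```r
# Synthetic predictors; the near-sum relationship is assumed, not taken
# from the hotel data.
set.seed(6)
n <- 300
pos  <- rpois(n, 40)
crit <- rpois(n, 8)
toy <- data.frame(
  rev_positive      = pos,
  rev_critical      = crit,
  site_review_count = pos + crit + rpois(n, 1),
  poi               = rpois(n, 8)
)

# VIF of each predictor: regress it on all the others.
vif <- sapply(names(toy), function(v) {
  f <- reformulate(setdiff(names(toy), v), response = v)
  1 / (1 - summary(lm(f, data = toy))$r.squared)
})
round(vif, 1)
```

A VIF far above 10 is usually taken as a sign that a predictor carries almost no information beyond the others, which would explain unstable coefficients and large standard errors.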
leap1 = regsubsets(model1, data = num_fill, nbest = 1)
#summary(leap1)
plot(leap1, scale="adjr2")
model2 = hotel_star_rating~image_count+room_count+poi+rev_positive+service_quality+amenities+value_money+location
fit2 = lm(model2, data = num_fill)
summary(fit2)
##
## Call:
## lm(formula = model2, data = num_fill)
##
## Residuals:
## Min 1Q Median 3Q Max
## -5.2218 -1.2668 0.2744 1.0552 3.7486
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 0.2187715 0.1663037 1.315 0.18850
## image_count 0.0251043 0.0023882 10.512 < 2e-16 ***
## room_count 0.0006800 0.0002304 2.951 0.00321 **
## poi 0.0174416 0.0039420 4.425 1.02e-05 ***
## rev_positive 0.0030391 0.0003683 8.252 2.82e-16 ***
## service_quality 0.2511091 0.1397814 1.796 0.07258 .
## amenities 0.8102837 0.1535247 5.278 1.45e-07 ***
## value_money -0.9687292 0.1467228 -6.602 5.19e-11 ***
## location 0.1947483 0.0943291 2.065 0.03910 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1.375 on 1969 degrees of freedom
## Multiple R-squared: 0.1871, Adjusted R-squared: 0.1838
## F-statistic: 56.65 on 8 and 1969 DF, p-value: < 2.2e-16
coefplot(fit2, intercept= FALSE, outerCI=1.96,coefficients=c("image_count","room_count","poi","rev_positive","service_quality","amenities","value_money","location"))
summary(fit1)$adj.r.squared
## [1] 0.1840776
summary(fit2)$adj.r.squared
## [1] 0.1837894
AIC(fit1)
## [1] 6889.977
AIC(fit2)
## [1] 6883.72
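The adjusted R-squared of Model 2 (0.1838) is marginally lower than that of Model 1 (0.1841).
The AIC of Model 2 (6883.72) is lower than that of Model 1 (6889.98).
Thus, Model 2 is the preferred ordinary least squares model: it has the lower AIC and almost the same adjusted R-squared with 8 predictors instead of 15. Model 2 predicts the hotel star rating as a function of the following explanatory variables: “image_count”, “room_count”, “poi”, “rev_positive”, “service_quality”, “amenities”, “value_money”, “location”. Both models explain only about 18% of the variance in star rating.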
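Because Model 2 uses a subset of Model 1's predictors, the two models are nested and can also be compared with a partial F-test alongside adjusted R-squared and AIC. The sketch below shows the three comparisons on synthetic data; `toy`, `full` and `reduced` are illustrative names, not objects from the analysis above.

```r
# Synthetic data: y depends on x1 and x2 only; noise1 and noise2 are irrelevant.
set.seed(5)
n <- 500
toy <- data.frame(x1 = rnorm(n), x2 = rnorm(n),
                  noise1 = rnorm(n), noise2 = rnorm(n))
toy$y <- 1 + 0.5 * toy$x1 - 0.3 * toy$x2 + rnorm(n)

full    <- lm(y ~ ., data = toy)        # all four predictors
reduced <- lm(y ~ x1 + x2, data = toy)  # nested subset

# Adjusted R-squared and AIC for both models.
c(adj_r2_full    = summary(full)$adj.r.squared,
  adj_r2_reduced = summary(reduced)$adj.r.squared)
AIC(full, reduced)

# Partial F-test: do the dropped terms add explanatory power?
cmp <- anova(reduced, full)
cmp
```

For the hotel models the equivalent call would be anova(fit2, fit1); a large p-value there would support keeping the smaller model.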