The dataset is "515K Hotel Reviews Data in Europe", taken from Kaggle.
Loading the required libraries
library(dplyr)
library(ggplot2); library(RColorBrewer)
Loading and viewing the data
reviews <- read.csv("Hotel_Reviews.csv", stringsAsFactors = FALSE)
Converting reviewers’ nationality to a factor
reviews$Reviewer_Nationality <- as.factor(reviews$Reviewer_Nationality)
Summarising the length of the negative review text
summary(reviews$Review_Total_Negative_Word_Counts)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
##    0.00    2.00    9.00   18.54   23.00  408.00
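The mean negative review length is 18.54 words, roughly twice the median of 9, so a minority of long reviews pulls the average up. Next we want to find out which nationalities write the longest negative reviews on average.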
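A mean well above the median, as in the summary above, usually points to a right-skewed distribution. The following is a minimal sketch of how that could be checked; `word_counts` is a made-up vector standing in for `reviews$Review_Total_Negative_Word_Counts`, and its values are not taken from the dataset.

```r
# Made-up word counts, used only for illustration (not real data)
word_counts <- c(0, 0, 2, 5, 9, 12, 23, 40, 120)

mean_len   <- mean(word_counts)
median_len <- median(word_counts)

# Upper quantiles show how far the long tail stretches
upper_tail <- quantile(word_counts, probs = c(0.5, 0.9, 0.99))

# Share of reviews that are longer than the mean
share_above_mean <- mean(word_counts > mean_len)

print(c(mean = mean_len, median = median_len))
print(upper_tail)
print(share_above_mean)
```

On the real data, the same calls would be applied to the `Review_Total_Negative_Word_Counts` column directly.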
Plotting the 15 nationalities with the longest average negative text
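# Average negative word count per nationality
reviews %>%
  group_by(Reviewer_Nationality) %>%
  summarise(avg = mean(Review_Total_Negative_Word_Counts)) %>%
  # With no wt argument, top_n() ranks by the last column (avg)
  top_n(15) %>%
  # Order the bars from the highest to the lowest average
  ggplot(aes(x = reorder(Reviewer_Nationality, -avg), y = avg, fill = Reviewer_Nationality)) +
  geom_bar(stat = "identity") +
  xlab("Nationalities") +
  ylab("Average length of negative text")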
## Selecting by avg
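One thing to keep in mind with the pipeline above is that a nationality with only a handful of reviews can reach the top of the ranking on the strength of one or two long reviews. The sketch below shows one possible way to guard against that on a toy data frame; the data values and the `min_reviews` threshold are assumptions made for illustration, not part of the original analysis. It also uses `slice_max()`, which in dplyr 1.0.0 and later supersedes `top_n()` and takes the ranking column explicitly.

```r
library(dplyr)

# Toy data standing in for Hotel_Reviews.csv (made-up values)
toy <- data.frame(
  Reviewer_Nationality = c("A", "A", "A", "B", "B", "C"),
  Review_Total_Negative_Word_Counts = c(10, 20, 30, 5, 15, 100)
)

# Per-nationality average together with the number of reviews behind it
toy_summary <- toy %>%
  group_by(Reviewer_Nationality) %>%
  summarise(
    avg = mean(Review_Total_Negative_Word_Counts),
    n_reviews = n()
  )

# Hypothetical threshold: ignore nationalities with fewer reviews than this
min_reviews <- 2

# Keep well-represented nationalities, then take the highest average
top_toy <- toy_summary %>%
  filter(n_reviews >= min_reviews) %>%
  slice_max(avg, n = 1)

print(toy_summary)
print(top_toy)
```

Here nationality "C" has the highest average but only one review, so it is dropped before ranking. A suitable threshold for the real data would depend on how the review counts are distributed across nationalities.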
