The dataset is "515K Hotel Reviews Data in Europe", taken from Kaggle.

Loading the required libraries

library(dplyr)
library(ggplot2)
library(RColorBrewer)

Loading and viewing the data

reviews <- read.csv("Hotel_Reviews.csv", stringsAsFactors = FALSE)
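A quick way to view the data after loading is to check its dimensions, column types, and first rows. The sketch below is only an illustration: it assumes a tiny in-memory stand-in (`toy_reviews`, with made-up values) so that it runs without the Kaggle file. The same three calls can be applied to `reviews`.

```r
# Toy stand-in for Hotel_Reviews.csv: only two of the real columns,
# with made-up values, so the snippet runs without the Kaggle file.
csv_text <- "Reviewer_Nationality,Review_Total_Negative_Word_Counts
United Kingdom,12
France,0
Germany,40"

toy_reviews <- read.csv(text = csv_text, stringsAsFactors = FALSE)

dim(toy_reviews)    # number of rows and columns
str(toy_reviews)    # column names and types
head(toy_reviews)   # first few rows
```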

Converting reviewers’ nationality into a factor

reviews$Reviewer_Nationality <- as.factor(reviews$Reviewer_Nationality)

Seeing the average negative text length

summary(reviews$Review_Total_Negative_Word_Counts)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##    0.00    2.00    9.00   18.54   23.00  408.00

The mean length of the negative text is 18.54 words, roughly double the median of 9, so a minority of long reviews pulls the average up. Next, we want to find out which nationalities write the most.

Seeing the 15 nationalities with the longest average negative text

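reviews %>%
  group_by(Reviewer_Nationality) %>%
  # mean negative word count per nationality
  summarise(avg = mean(Review_Total_Negative_Word_Counts)) %>%
  # no weighting column given, so top_n() ranks by the last column (avg)
  top_n(15) %>%
  # order the bars from the highest to the lowest average
  ggplot(aes(x = reorder(Reviewer_Nationality, -avg), y = avg, fill = Reviewer_Nationality)) +
  geom_bar(stat = "identity") +
  xlab("Nationalities") +
  ylab("Average length of negative text")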
## Selecting by avg
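One caveat with ranking by a per-group mean is that a nationality with only a handful of reviews can reach the top on the strength of one long review. The sketch below shows one possible way to guard against that. It is not part of the original analysis: `toy_reviews` holds made-up values, and `min_reviews` is a hypothetical threshold. It also uses `slice_max()` (available from dplyr 1.0.0), which takes the ranking column explicitly and so does not print the "Selecting by avg" message.

```r
library(dplyr)

# Toy data with made-up values; column names match the real dataset.
toy_reviews <- data.frame(
  Reviewer_Nationality = c("A", "A", "A", "B", "B", "B", "C"),
  Review_Total_Negative_Word_Counts = c(10, 20, 30, 5, 5, 5, 400)
)

min_reviews <- 3  # hypothetical threshold, tune for the real data

top_writers <- toy_reviews %>%
  group_by(Reviewer_Nationality) %>%
  summarise(
    avg = mean(Review_Total_Negative_Word_Counts),
    n_reviews = n()
  ) %>%
  filter(n_reviews >= min_reviews) %>%  # drop sparsely represented groups
  slice_max(avg, n = 2)                 # explicit ranking column

top_writers
```

Here nationality "C" has the highest mean but only one review, so it is dropped before ranking. On the real data, the filtered result could be piped into the same `ggplot()` call as above.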