Movie Theater Comparison

In this analysis I compare two movie theaters in the Cincinnati area: Cinemark Oakley Station and XD, and Esquire Theatre. The data come from Yelp reviews. I intend to answer three questions about what Yelp reviewers think of the theaters. The data set contains 147 reviews for both theaters.

Question 1: What were the most commonly used words for each theater?

This question is answered by splitting the text of each review into individual words and then comparing how often each word appears. Below is the code that separates the reviews into words.

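library(dplyr)
library(tidytext)

# Split each review into one word per row, then drop common stop words
tidy_cinemark <- 
  cinemark_reviews %>%
  unnest_tokens(word, review_content) %>%
  anti_join(stop_words, by = "word")

tidy_esquire <- 
  esquire_reviews %>%
  unnest_tokens(word, review_content) %>%
  anti_join(stop_words, by = "word")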

In the code above, cinemark_reviews and esquire_reviews hold the review data for each theater. unnest_tokens() splits the review_content column into individual words, one per row, and anti_join() removes common stop words such as "the" and "and". The code below then finds the top 10 words for each theater. A few words have been filtered out because they would obviously be among the most frequent for both theaters or because they are the names of the theaters: movie, theater, theaters, esquire and cinemark. I removed movie because it was by far the most common word at both places, followed by theater and its plural, theaters. I removed the theater names because reviewers naturally mention the name of the place they are reviewing.

# Exclude generic words and the theater names, then rank the remaining words
tidy_cinemark %>%
  filter(!word %in% c("movie", "theater", "theaters", "esquire", "cinemark")) %>%
  count(word, sort = TRUE) %>%   # one row per word, most frequent first
  head(10)

tidy_esquire %>%
  filter(!word %in% c("movie", "theater", "theaters", "esquire", "cinemark")) %>%
  count(word, sort = TRUE) %>%
  head(10)
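To make the pipeline above concrete, here is a minimal sketch that runs the same steps on two made-up reviews. The toy_reviews tibble is hypothetical and is not part of the Yelp data. The sketch shows three things: unnest_tokens() lowercases the text and strips punctuation, anti_join() removes stop words, and count(sort = TRUE) ranks the words that remain.

```r
library(dplyr)
library(tidytext)

# Two made-up reviews, used only to illustrate the steps (not Yelp data)
toy_reviews <- tibble(
  review_id = 1:2,
  review_content = c("The seats were clean and the popcorn was great.",
                     "Great popcorn, but parking at the theater was hard.")
)

# Same exclusion list as the analysis above
excluded_words <- c("movie", "theater", "theaters", "esquire", "cinemark")

toy_top_words <- toy_reviews %>%
  unnest_tokens(word, review_content) %>%   # one lowercase word per row
  anti_join(stop_words, by = "word") %>%    # drop "the", "and", "was", ...
  filter(!word %in% excluded_words) %>%     # drop generic words and theater names
  count(word, sort = TRUE)                  # most frequent words first

toy_top_words
```

In this toy case, "popcorn" appears in both reviews and so has a count of 2. "theater" is removed by the exclusion filter.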

Below are the results for each theater.

Cinemark Oakley Station and XD:

Rank  Word        Count
1     seats       93
2     clean       54
3     nice        44
4     food        39
5     popcorn     38
6     parking     37
7     love        36
8     tickets     36
9     concession  33
10    seat        33

Esquire Theatre:

Rank  Word         Count
1     movies       55
2     films        46
3     love         38
4     theatre      33
5     independent  29
6     film         26
7     time         25
8     experience   21
9     popcorn      21
10    parking      20

The takeaway from these tables is that reviews of Esquire Theatre seem to focus more on the movies themselves. The words movies and films are near the top of its list, and film appears as well. Reviews of Cinemark Oakley Station and XD seem to focus more on the concessions, the food and the seats, although popcorn appears in both top tens. It is interesting how often both theaters' reviews use the word parking. From word counts alone it cannot be known whether these mentions are positive or negative, but further analysis of the words that appear alongside parking could answer that.
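One way to follow up on parking is to look at bigrams (pairs of adjacent words) that contain it. The sketch below is only an assumption about how this could be done with tidytext's n-gram tokenizer. It runs on two hypothetical reviews, and the name parking_reviews is made up for illustration.

```r
library(dplyr)
library(tidyr)
library(tidytext)

# Hypothetical reviews for illustration only
parking_reviews <- tibble(
  review_content = c("Parking garage was full.",
                     "Street parking is terrible on weekends.")
)

parking_neighbors <- parking_reviews %>%
  unnest_tokens(bigram, review_content, token = "ngrams", n = 2) %>%  # adjacent word pairs
  separate(bigram, into = c("word1", "word2"), sep = " ") %>%
  filter(word1 == "parking" | word2 == "parking") %>%                 # keep pairs with "parking"
  mutate(neighbor = if_else(word1 == "parking", word2, word1)) %>%    # the other word in the pair
  anti_join(stop_words, by = c("neighbor" = "word")) %>%              # drop "is", "was", ...
  count(neighbor, sort = TRUE)

parking_neighbors
```

To apply this to the real data, replace parking_reviews with cinemark_reviews or esquire_reviews. Words such as "free" or "terrible" next to parking would hint at the tone of those mentions.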

Question 2: What percentage of each theater's words falls into each NRC sentiment?

For this question, each word is matched to the NRC emotions and sentiments associated with it. The code below loads the NRC lexicon and counts the words assigned to each sentiment. It then computes each sentiment's share of all scoreable words in the reviews for that theater.

library(dplyr)
library(tidytext)

# NRC lexicon: words mapped to eight emotions plus positive/negative
# (get_sentiments("nrc") needs the textdata package to download it once)
nrc <- 
  get_sentiments("nrc")

excluded_words <- c("movie", "theater", "theaters", "esquire", "cinemark")

cinemark_sentiment <- 
  tidy_cinemark %>% 
  filter(!word %in% excluded_words) %>% 
  # a word can carry several NRC sentiments, hence many-to-many
  inner_join(nrc, by = "word", relationship = "many-to-many") %>% 
  count(sentiment, name = "Count") %>% 
  # share of all scoreable word-sentiment matches for this theater
  mutate(`Percent of scoreable words` = Count / sum(Count)) %>% 
  arrange(-`Percent of scoreable words`)

esquire_sentiment <- 
  tidy_esquire %>% 
  filter(!word %in% excluded_words) %>% 
  inner_join(nrc, by = "word", relationship = "many-to-many") %>% 
  count(sentiment, name = "Count") %>% 
  mutate(`Percent of scoreable words` = Count / sum(Count)) %>% 
  arrange(-`Percent of scoreable words`)
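One word can map to several NRC sentiments, so the denominator here is the number of word-sentiment matches rather than the number of distinct words. The small sketch below shows the arithmetic. It uses a hypothetical mini-lexicon, which is not the real NRC entries.

```r
library(dplyr)

# Hypothetical mini-lexicon standing in for NRC (entries are made up)
toy_lexicon <- tibble(
  word      = c("love", "love", "dirty", "dirty", "clean"),
  sentiment = c("joy", "positive", "disgust", "negative", "positive")
)

# Words from a hypothetical tidy review set; "seats" has no lexicon entry
toy_words <- tibble(word = c("love", "clean", "clean", "dirty", "seats"))

toy_sentiment <- toy_words %>%
  inner_join(toy_lexicon, by = "word", relationship = "many-to-many") %>%  # 6 matches
  count(sentiment, name = "Count") %>%
  mutate(`Percent of scoreable words` = Count / sum(Count)) %>%             # divide by 6
  arrange(desc(`Percent of scoreable words`))

toy_sentiment
```

"seats" is dropped because it is not scoreable. The three positive matches (one "love" and two "clean") make up half of the six matches.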

The code below labels each summary with its theater and combines the two data frames into one.

cinemark_sentiment$book <- "Cinemark"
esquire_sentiment$book <- "Esquire"

combined_sentiment <- 
  bind_rows(cinemark_sentiment,esquire_sentiment)

The final step is to plot the results with the code below.

library(ggplot2)

combined_sentiment %>% 
  ggplot(aes(x = sentiment, y = `Percent of scoreable words`, fill = book)) +
  geom_col(position = "dodge") +   # side-by-side bars for each sentiment
  scale_fill_brewer(palette = "Set1") +
  scale_y_continuous(labels = scales::percent) +
  labs(title = "A comparison of the emotive sentiments found in Cinemark and Esquire movie theaters",
       subtitle = "Using the NRC Lexicon (Mohammad and Turney, 2013), shown as a percent of scoreable words",
       x = "Sentiment",
       fill = "Theater")

Looking first at the negative and positive sentiments, reviewers clearly express more positive opinions about Esquire and more negative opinions about Cinemark. The size of the gap is similar for both sentiments. It is also interesting that Cinemark scores higher than Esquire on both fear and trust, two emotions that seem to contradict each other. The biggest difference between the theaters is in surprise, which is much higher for Esquire. Overall, the individual emotions are harder to interpret than the simple positive and negative split, because some of them do not seem to fit together.
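The gap between the two theaters can be computed directly from combined_sentiment, which puts numbers on these comparisons. The helper sentiment_gap() below is hypothetical. It assumes the columns built above (sentiment, book and `Percent of scoreable words`), and it is demonstrated on made-up shares rather than on the real results.

```r
library(dplyr)
library(tidyr)

# Percentage-point gap (Esquire minus Cinemark) for each sentiment,
# ordered from the largest absolute difference to the smallest
sentiment_gap <- function(combined) {
  combined %>%
    select(sentiment, book, share = `Percent of scoreable words`) %>%
    pivot_wider(names_from = book, values_from = share) %>%   # one column per theater
    mutate(gap = Esquire - Cinemark) %>%
    arrange(desc(abs(gap)))
}

# Made-up shares for illustration only (not the real results)
toy_combined <- tibble(
  sentiment = rep(c("positive", "surprise", "fear"), times = 2),
  book = rep(c("Cinemark", "Esquire"), each = 3),
  `Percent of scoreable words` = c(0.30, 0.04, 0.05, 0.33, 0.08, 0.04)
)

toy_gap <- sentiment_gap(toy_combined)
toy_gap
```

Calling sentiment_gap(combined_sentiment) would list the real sentiments ordered by how much the two theaters differ. A negative gap means the sentiment is more common in Cinemark reviews.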

Question 3: What is the average rating of each movie theater over the period reviewed?

The first step is to label each review with its theater. The next step is to combine the two sets of reviews into one data frame so the theaters can be compared. Below is the code for this.

cinemark_reviews$theater <- "Cinemark"
esquire_reviews$theater <- "Esquire"
combined_reviews <-
  bind_rows(cinemark_reviews,esquire_reviews)

The next step is to plot the ratings on a scatter plot with the code below.

library(ggplot2)

# One point per review; assumes review_date is stored as a Date
combined_reviews %>% 
  ggplot(aes(x = review_date, y = review_rating, color = theater)) +
  geom_point(alpha = .75) +   # transparency shows where reviews overlap
  labs(title = "Review Ratings Over Time for Each Theater",
       x = "Date",
       y = "Review Rating",
       color = "Theater")

The graph shows that Cinemark opened later, because its reviews begin at a later date. Esquire's first reviews were all very positive, and it took a few years before lower ratings appeared. Cinemark's ratings are scattered across the whole scale, with no clear agreement on how it should be rated. Finally, Esquire has not received a single 1-star review.
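The scatter plot shows individual ratings, while the question asks about averages. Below is a minimal sketch of how the averages could be computed from combined_reviews. The helper names are hypothetical, and the sketch assumes review_rating is numeric and review_date is stored as a Date. It is shown on made-up ratings.

```r
library(dplyr)

# Mean rating and number of reviews per theater
average_ratings <- function(reviews) {
  reviews %>%
    group_by(theater) %>%
    summarize(mean_rating = mean(review_rating, na.rm = TRUE),
              n_reviews = n(), .groups = "drop")
}

# Same summary broken down by calendar year of the review
average_ratings_by_year <- function(reviews) {
  reviews %>%
    mutate(year = as.integer(format(review_date, "%Y"))) %>%
    group_by(theater, year) %>%
    summarize(mean_rating = mean(review_rating, na.rm = TRUE),
              n_reviews = n(), .groups = "drop")
}

# Made-up ratings for illustration only
toy_ratings <- tibble(
  theater = c("Cinemark", "Cinemark", "Esquire", "Esquire", "Esquire"),
  review_date = as.Date(c("2019-05-01", "2020-01-15", "2012-03-10",
                          "2012-08-20", "2018-11-02")),
  review_rating = c(1, 5, 5, 5, 3)
)

overall <- average_ratings(toy_ratings)
by_year <- average_ratings_by_year(toy_ratings)
```

On the real data, average_ratings(combined_reviews) would give one average per theater. average_ratings_by_year(combined_reviews) would show whether those averages drift over time.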