Set up

These are all of the libraries we need.

# Add your library below.
# read csv
library(readr)
# text mining
library(tm)

## Loading required package: NLP

# for cleaning text
library(tidytext)
# word cloud
library(wordcloud)

## Loading required package: RColorBrewer

# Graph colors
library(paletteer)

## Warning: package 'paletteer' was built under R version 4.2.3

library(RColorBrewer)
# shiny app
library(shiny)

## Warning: package 'shiny' was built under R version 4.2.3

# To write functions that can be accessed by shiny
library(memoise)
# For ggplot axis
library(scales)

## 
## Attaching package: 'scales'

## The following object is masked from 'package:readr':
## 
##     col_factor

#used for data manipulation
library(dplyr)

## 
## Attaching package: 'dplyr'

## The following objects are masked from 'package:stats':
## 
##     filter, lag

## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union

#used to drop na values
library(tidyr)
#used for the plot
library(ggplot2)

## 
## Attaching package: 'ggplot2'

## The following object is masked from 'package:NLP':
## 
##     annotate

#used to get the USA map
library(maps)
#used for reordering the count per timezone with fct_reorder
library(forcats)
#used for making the plot interactive
library(ggiraph)

## Warning: package 'ggiraph' was built under R version 4.2.3

# for ddply
library(plyr)

## ------------------------------------------------------------------------------

## You have loaded plyr after dplyr - this is likely to cause problems.
## If you need functions from both plyr and dplyr, please load plyr first, then dplyr:
## library(plyr); library(dplyr)

## ------------------------------------------------------------------------------

## 
## Attaching package: 'plyr'

## The following object is masked from 'package:maps':
## 
##     ozone

## The following objects are masked from 'package:dplyr':
## 
##     arrange, count, desc, failwith, id, mutate, rename, summarise,
##     summarize

# for grid.arrange
library(gridExtra)

## 
## Attaching package: 'gridExtra'

## The following object is masked from 'package:dplyr':
## 
##     combine

library(patchwork)
# to knit shinyapp
library(knitr)

Interactive Map Data Visualization by Cody Garrison

For this project I used the airline tweets data set and made a map that shows the negative reasons given about the airlines, grouped by timezone.

Get the data and filter it

Here I load the data to get it ready to use, then count how many times each negative reason was given in each timezone and sort the result by that count.

#read the file
tweet <- read.csv("Tweets-1.csv")

#get the negative tweets and count them
negative_reasons <- tweet %>% 
  filter(airline_sentiment == "negative") %>% 
  group_by(user_timezone, negativereason) %>% 
  dplyr::summarise(count = n()) %>% 
  arrange(desc(count))

## `summarise()` has grouped output by 'user_timezone'. You can override using the
## `.groups` argument.

Get/create the map data

I then create the four U.S. timezones from a latitude and longitude for each, so that I can place them on a map, and join them to the negative reasons data. After that I get the map data from the maps package so I can draw this information on a map of the USA.

#create timezones to use for the map and then add it to the negative reasons data
tz <- data.frame(
  user_timezone = c("Eastern Time (US & Canada)", "Central Time (US & Canada)", "Mountain Time (US & Canada)", "Pacific Time (US & Canada)"),
  lat = c(35.8333,39.8333, 45.6667, 37.7833),
  lon = c(-82.6167,-98.5833, -113.2000, -120.4167)
)
negative_reasons <- negative_reasons %>% left_join(tz, by = "user_timezone")

#get map data
map_data <- map_data("usa")

Sort the timezone data individually

I then use the forcats package to order the negative reasons within each timezone from the most to the least frequent so that they show up correctly on the map.

#orders negative reasons based on the count per timezone
negative_reasons <- negative_reasons %>% mutate(negativereason = fct_reorder(negativereason, count, .desc = TRUE))

negative_reasons <- drop_na(negative_reasons)

negative_count <- negative_reasons %>% group_by(user_timezone) %>% 
  dplyr::summarise(total_count = sum(count),
            .groups = 'drop') %>% as.data.frame()

# Now create another column with the total number of negative tweets in each timezone
negative_reasons <- negative_reasons %>% inner_join(negative_count, by= "user_timezone")

negative_reasons$ratio <- round((negative_reasons$count/negative_reasons$total_count)*100,3)
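
To make the ratio step concrete, here is a minimal sketch on a made-up table. The counts and the shortened timezone labels are invented for illustration and are not values from Tweets-1.csv. It uses base R's ave() in place of the summarise and inner_join pair above, only so that it runs without any packages.

```r
# Toy counts: invented numbers, not values from the airline data set
toy <- data.frame(
  user_timezone  = c("Eastern", "Eastern", "Pacific", "Pacific"),
  negativereason = c("Customer Service Issue", "Late Flight",
                     "Customer Service Issue", "Late Flight"),
  count          = c(30, 10, 6, 2)
)

# Total negative tweets per timezone, repeated on every row of that timezone
toy$total_count <- ave(toy$count, toy$user_timezone, FUN = sum)

# Share of each reason within its own timezone, in percent
toy$ratio <- round(toy$count / toy$total_count * 100, 3)

print(toy)
```

Both toy timezones end up with the same shares even though one has five times as many tweets, which is the motivation for drawing a second map sized by ratio rather than by raw count.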

Map making

Next I create the colors that I will be using for the map, then build the ggplot using geom_polygon, geom_text, and geom_point. For the points I use the ggiraph version so that they are interactive, which makes the map easier to read. I also set the size of the points and change their colors to the ones I picked. I then scale the map correctly, remove the unnecessary size guide, and remove the chart background behind the plot by using theme_void(). Finally I use girafe(ggobj = gg) to render the plot.

my_colors <- c("#993853", "#34495e", "#7f8c8d", "#bbcc00", "#c0392b", "#d35400", "#2980b9", "#8e44ad", "#16a085", "#27ae60")

#create the ggplot object with normal geom and ggiraph for the points
gg <- ggplot(negative_reasons) +
  geom_polygon(data = map_data, aes(x = long, y = lat, group = group, tooltip = region), fill = "white", color = "black", size = 0.2) +
  #only need points as ggiraph since only thing that needs to be interacted with
  ggiraph::geom_point_interactive(aes(x = lon, y = lat, size = count * 2.5, color = negativereason, tooltip = paste(negativereason, "<br>", count, " tweets")), alpha = 0.8, shape   = 21) +
  geom_text(data = tz, aes(x = lon + 5, y = lat + 5, label = user_timezone), size = 2.75, fontface = "bold") +
  scale_size(range = c(2, 45)) +
  scale_color_manual(values = my_colors) +
  scale_fill_manual(values = my_colors) +
  coord_map() +
  #removes size from legend
  guides(size = FALSE) +
  theme_void()

## Warning: Using `size` aesthetic for lines was deprecated in ggplot2 3.4.0.
## ℹ Please use `linewidth` instead.

## Warning in geom_polygon(data = map_data, aes(x = long, y = lat, group = group,
## : Ignoring unknown aesthetics: tooltip

## Warning: The `<scale>` argument of `guides()` cannot be `FALSE`. Use "none" instead as
## of ggplot2 3.3.4.

#uses girafe to render the object so it's interactive
girafe(ggobj = gg)

Create another map visualization but with percentage of negative reasons

#create the ggplot object with normal geom and ggiraph for the points
gg2 <- ggplot(negative_reasons) +
  geom_polygon(data = map_data, aes(x = long, y = lat, group = group, tooltip = region), fill = "white", color = "black", size = 0.2) +
  #only need points as ggiraph since only thing that needs to be interacted with
  ggiraph::geom_point_interactive(aes(x = lon, y = lat, size = ratio, color = negativereason, tooltip = paste(negativereason, "<br>", ratio, " %")), alpha = 0.8, shape   = 21) +
  geom_text(data = tz, aes(x = lon + 5, y = lat + 5, label = user_timezone), size = 2.75, fontface = "bold") +
  scale_size(range = c(2, 45)) +
  scale_color_manual(values = my_colors) +
  scale_fill_manual(values = my_colors) +
  coord_map() +
  #removes size from legend
  guides(size = FALSE) +
  theme_void()

## Warning in geom_polygon(data = map_data, aes(x = long, y = lat, group = group,
## : Ignoring unknown aesthetics: tooltip

#uses girafe to render the object so it's interactive
girafe(ggobj = gg2)

Rationale

Creating this map was fun, although I had a hard time figuring out how to make it easy to read, and I had to create the timezones by hand because I did not find anything in R that could do it easily. I settled on an interactive map with a point for each reason in each timezone, scaled by radius, since it looked the best of everything I had tried. It showed some interesting results: every timezone had customer service issues as its most common negative reason. One thing that stood out was that the rankings of negative reasons in the Eastern and Central timezones were the same. The Mountain timezone was all over the place, since it had far less data than the others, and it had cancelled flights at the very bottom, whereas the other timezones had it as the third reason. The Pacific timezone was close to the Eastern and Central timezones, but there was not much lost luggage: it was the third least reported negative reason there, compared to the fourth in the others.

Sentiment Analysis by Marc Flores

Clean text data

tweet = read.csv("Tweets-1.csv")

clean_columns = c("airline", "text")

clean_tweets = tweet[clean_columns]

# Clean data
clean_tweets$text = gsub('@','', clean_tweets$text)
clean_tweets$text = tolower(clean_tweets$text)
clean_tweets$text = gsub('virginamerica', "", clean_tweets$text)
clean_tweets$text = gsub('united', "", clean_tweets$text)
clean_tweets$text = gsub('southwestair', "", clean_tweets$text)
clean_tweets$text = gsub('jetblue', "", clean_tweets$text)
clean_tweets$text = gsub('usairways', "", clean_tweets$text)
# text is already lower case at this point, so the pattern must be lower case too
clean_tweets$text = gsub('americanair', "", clean_tweets$text)
clean_tweets$text = gsub('flight', "", clean_tweets$text)
clean_tweets$text <- gsub("@\\w+", "", clean_tweets$text)
clean_tweets$text = removeNumbers(clean_tweets$text)
clean_tweets$text <- gsub("https?://.+", "", clean_tweets$text)
clean_tweets$text <- gsub("\\d+\\w*\\d*", "", clean_tweets$text)
clean_tweets$text <- gsub("#\\w+", "", clean_tweets$text)
clean_tweets$text <- gsub("[^\x01-\x7F]", "", clean_tweets$text)
clean_tweets$text <- gsub("[[:punct:]]", " ", clean_tweets$text)
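
As a small illustration of how the order of these substitutions matters, the sketch below applies a similar sequence to one made-up tweet. It is a variant, not the chunk above: clean_tweet is a hypothetical helper, the sample tweet is invented, and mentions are removed before anything else touches the "@" sign. In the chunk above the "@" is stripped first, so the later "@\\w+" pattern has nothing left to match and the handles are instead removed by name.

```r
# Hypothetical helper for illustration; not part of the project code
clean_tweet <- function(x) {
  x <- tolower(x)
  x <- gsub("https?://[^[:space:]]+", "", x)  # drop URLs up to the next space
  x <- gsub("@\\w+", "", x)                   # drop whole @mentions
  x <- gsub("#\\w+", "", x)                   # drop whole hashtags
  x <- gsub("[0-9]+", "", x)                  # drop digits
  x <- gsub("[[:punct:]]", " ", x)            # punctuation becomes a space
  x <- gsub("[[:space:]]+", " ", x)           # collapse runs of whitespace
  trimws(x)
}

# Invented sample tweet
cleaned <- clean_tweet("@united Flight 123 delayed AGAIN!! #fail http://t.co/abc")
print(cleaned)
```

Removing the URL only up to the next whitespace keeps any words that follow a link, whereas a pattern ending in ".+" drops the rest of the tweet.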

Unnest tokens and create a word frequency data frame with sentiments.

Unnest tokens then join with sentiments from nrc.

# unnest tokens
tidy_twitter = clean_tweets %>%
  unnest_tokens(word, text) %>%
  anti_join(stop_words)

## Joining with `by = join_by(word)`

# Inner join with nrc sentiments
sentiment_twitter <- tidy_twitter %>% 
  inner_join(get_sentiments("nrc"))

## Joining with `by = join_by(word)`

## Warning in inner_join(., get_sentiments("nrc")): Each row in `x` is expected to match at most 1 row in `y`.
## ℹ Row 9 of `x` matches multiple rows.
## ℹ If multiple matches are expected, set `multiple = "all"` to silence this
##   warning.
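
The warning above appears because one word can carry several sentiments in the lexicon, so a single token can match more than one row. A minimal sketch of that effect follows, using base R's merge() and a toy lexicon. The words and sentiment labels are invented for illustration and are not claimed to be actual NRC entries.

```r
# Toy tokens, one row per word per tweet (invented data)
tokens <- data.frame(
  airline = c("Delta", "Delta", "United"),
  word    = c("delay", "happy", "delay")
)

# Toy lexicon in which every word has two sentiment labels (invented)
lexicon <- data.frame(
  word      = c("delay", "delay", "happy", "happy"),
  sentiment = c("negative", "sadness", "joy", "positive")
)

# Inner join on word: each token is repeated once per matching label
tagged <- merge(tokens, lexicon, by = "word")
print(tagged)
print(table(tagged$airline, tagged$sentiment))
```

Three tokens become six rows here, so the later counts are counts of word and sentiment pairs rather than counts of distinct words.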

Calculate percentage of each sentiment

# Group by Airline and count the number of words of each sentiment
word_counts <- sentiment_twitter %>%
  dplyr::count(sentiment, airline) %>%
  group_by(airline) %>%
  ungroup() %>%
  mutate(word = fct_reorder(sentiment, n))

# Get the total number of words per airline disregarding sentiments 
num_df <- word_counts %>% group_by(airline) %>% 
  dplyr::summarise(total_sentiment = sum(n),
            .groups = 'drop') %>% as.data.frame()

# Now create another column with the total number of words corresponding to each airline
word_counts2 <- word_counts %>% inner_join(num_df, by= "airline")

# Get the percentage by dividing the sentiment word count by the total word count
word_counts2$percentage <- round((word_counts2$n/word_counts2$total_sentiment)*100, 2)

Graph Sentiment Ratio

ggplot(word_counts2, aes(x = sentiment, y = percentage, fill = percentage)) +
  geom_col(show.legend = FALSE) +
  facet_wrap(~ airline, scales = "free_y") +
  coord_flip() +
  ggtitle("Sentiment Percentages\n") +
  geom_text(aes(label = percentage), hjust = -0.05, vjust = .4, size = 3.5) +
  theme_minimal() +
  labs(y = "Percentage of Sentiment Words", x = "NRC Lexicon", color = "Percentage") + 
  expand_limits(y = 30)

Rationale

I found US Airways to be the most interesting. It seemed to have the highest negative sentiment, and I believe this is due to its higher anticipation score as well as a higher sadness score compared to the rest of the airlines. US Airways also had the highest anger, which contributed to the higher negative sentiment. The best performing airline was Virgin America, and this may be because of its low surprise and anticipation scores, as people want to know their flight is on time. It can also be attributed to its low disgust and the lowest anger among all the airlines.

Distribution analysis for hourly tweets by Jae In Lee

Create a simple hourly data distribution with ggplot2.

Extract the hour from tweet_created.
Create a data frame that contains the density peaks of hour for each airline and airline_sentiment.
Plot using geom_density() and geom_vline() with facet_wrap grouped by airline.
Do the same with negativereason as the fill variable.
Use grid.arrange() to put the two graphs together.

# Get date and time from tweet_created
tweet$date <- parse_datetime(tweet$tweet_created, format = "%m/%d/%Y %H:%M")

# pull out only the hour and add it as a variable named hour
tweet$hour <- format(tweet$date, format = "%H")
tweet$hour <- as.numeric(tweet$hour)

# function that returns peak density
fun_peak <- function(x, adjust = 2) {
  d <- density(x, adjust = adjust)
  d$x[c(F, diff(diff(d$y) >= 0) < 0)]
}
# Create a dataframe of peak density
vline <- tweet %>%
  group_by(airline, airline_sentiment) %>%
  dplyr::summarise(peak = fun_peak(hour))

## Warning: Returning more (or less) than 1 row per `summarise()` group was deprecated in
## dplyr 1.1.0.
## ℹ Please use `reframe()` instead.
## ℹ When switching from `summarise()` to `reframe()`, remember that `reframe()`
##   always returns an ungrouped data frame and adjust accordingly.

## `summarise()` has grouped output by 'airline', 'airline_sentiment'. You can
## override using the `.groups` argument.
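
To show what fun_peak returns, here is a small sketch on simulated hours. The two clusters around 8 and 18 are made up and are not taken from the tweet data. The function is repeated so the sketch runs on its own: it looks for places where the estimated density stops rising and starts falling, and returns the x values at those points.

```r
# Same peak finder as above, repeated so this sketch is self-contained
fun_peak <- function(x, adjust = 2) {
  d <- density(x, adjust = adjust)
  # diff(d$y) >= 0 is TRUE while the curve rises; a drop from TRUE to FALSE
  # (a negative second diff) marks a local maximum
  d$x[c(FALSE, diff(diff(d$y) >= 0) < 0)]
}

set.seed(42)
# Simulated "hours": one cluster in the morning, one in the evening
toy_hours <- c(rnorm(200, mean = 8, sd = 1), rnorm(200, mean = 18, sd = 1))

peaks <- fun_peak(toy_hours)
print(round(peaks, 1))
```

Because a density can have more than one local maximum, the function may return several values for one group, which is why summarise() raises the warning above and suggests reframe() instead.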

# Plot airline sentiment by hour
p1 <- ggplot(tweet, aes(x = hour, fill = airline_sentiment)) + 
  geom_density(alpha = 0.3) + 
  facet_wrap(~airline) +
  geom_vline(data = vline, aes(xintercept=peak, color = airline_sentiment), show.legend = FALSE) +
  labs(title = "Tweets Sentiments Peaks", x = "", y = "", fill = "Sentiment")

# Drop blank negative reason categories
neg_air <- filter(tweet, negativereason != "")

vline2 <- neg_air %>%
  group_by(negativereason, airline) %>%
  dplyr::summarise(peak = fun_peak(hour))

## Warning: Returning more (or less) than 1 row per `summarise()` group was deprecated in
## dplyr 1.1.0.
## ℹ Please use `reframe()` instead.
## ℹ When switching from `summarise()` to `reframe()`, remember that `reframe()`
##   always returns an ungrouped data frame and adjust accordingly.

## `summarise()` has grouped output by 'negativereason', 'airline'. You can
## override using the `.groups` argument.

# Plot negative reason by hour
p2 <- ggplot(neg_air, aes(x = hour, fill = negativereason)) +
  geom_density(alpha = 0.3) +
  geom_vline(data = vline2, aes(xintercept = peak, color = negativereason), show.legend = FALSE) +
  # the `+` after facet_wrap() is needed so that labs() is added to the plot
  facet_wrap(~airline) +
  labs(title = "Reason for Negative Tweets Peaks", x = "Hour", y = "", fill = "NegativeReason")

grid.arrange(p1, p2)

Rationale

We see an almost bell-shaped distribution for all airlines: starting at midnight (12 AM), the number of tweets grows steadily until around 12 PM to 1 PM and then decreases until midnight again. This indicates that there is a peak time frame for airline tweets, which appears to be between noon and 1 PM for all airlines. Airline companies that pay attention to such patterns should be able to monitor tweets more effectively.

Step 1: Load the data.

airline <- read.csv("Tweets-1.csv")
sentiment <- read.csv("nrc.csv")

Step 1.1: We need to do some exploration of the negative reasons before looking at the word cloud.

We see that customer service and late flights are the most tweeted issues among airlines.

ggplot(neg_air, aes(x = negativereason, fill = negativereason)) + geom_bar() +
  scale_fill_paletteer_d("MetBrewer::Renoir") +
  theme(axis.text.x=element_blank(),
        axis.ticks.x=element_blank()) + 
  labs(title = "Reason for Negative Tweets", x = "", y = "", fill = "Reasons") +
  facet_wrap(~airline)

Step 2 Write the function that will create the shiny input variables by using memoise()

This function takes the variable “reason”, which will be the input value in the ui section.

# Write your code below.
# Negative reasons offered as choices; [-1] drops the first unique value,
# which is assumed here to be the blank reason

reasons <- unique(airline$negativereason)[-1]

getTermMatrix <- memoise(function(reason) {
  # Careful not to let just any name slip in here; a
  # malicious user could manipulate this value.
  if (!(reason %in% reasons))
    stop("Unknown Reason")
  
  text <- airline[airline$negativereason == reason,]$text
  
  myCorpus = Corpus(VectorSource(text))
  myCorpus = tm_map(myCorpus, content_transformer(tolower))
  myCorpus = tm_map(myCorpus, removePunctuation)
  myCorpus = tm_map(myCorpus, removeNumbers)
  myCorpus = tm_map(myCorpus, removeWords,
                    c("virginamerica", "jetblue", "united", "southwestair", "delta", "usairways", "americanair", "flight", stopwords("SMART")))
  
  
  myDTM = TermDocumentMatrix(myCorpus,
                             control = list(minWordLength = 1))

  m = as.matrix(myDTM)
  
  sort(rowSums(m), decreasing = TRUE)
  
})
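
For readers who want to see what the returned vector looks like without installing tm, the sketch below approximates the same steps in base R. It is not the tm implementation: term_freq is a hypothetical helper, the three tweets are invented, and a short hand-written stop list stands in for stopwords("SMART").

```r
# Hypothetical base R stand-in for the cleaning and counting steps above
term_freq <- function(text, drop_words = character(0)) {
  text  <- tolower(text)
  text  <- gsub("[[:punct:]]", "", text)           # remove punctuation
  text  <- gsub("[0-9]+", "", text)                # remove numbers
  words <- unlist(strsplit(text, "[[:space:]]+"))  # split into words
  words <- words[nzchar(words) & !(words %in% drop_words)]
  sort(table(words), decreasing = TRUE)            # most frequent first
}

# Invented tweets for one negative reason
toy_text <- c("@united my flight was delayed again",
              "Delayed 3 hours, no updates from @united",
              "bag lost and flight delayed")

freq <- term_freq(
  toy_text,
  drop_words = c("united", "flight", "my", "was", "and", "from", "no")
)
print(freq)
```

The result is a named, sorted frequency table, the same kind of object a word cloud function can consume as words (the names) and frequencies (the values).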

Link of the Shiny App Below

Check out the app here: https://jaeinrprogramming.shinyapps.io/AirlineWordcloud/

Step 3 Build the app now.

## 
## Listening on http://127.0.0.1:3472

## Warning in tm_map.SimpleCorpus(myCorpus, content_transformer(tolower)):
## transformation drops documents

## Warning in tm_map.SimpleCorpus(myCorpus, removePunctuation): transformation
## drops documents

## Warning in tm_map.SimpleCorpus(myCorpus, removeNumbers): transformation drops
## documents

## Warning in tm_map.SimpleCorpus(myCorpus, removeWords, c("virginamerica", :
## transformation drops documents

## Warning in strwidth(words[i], cex = size[i], ...): "size" is not a graphical
## parameter

## Warning in strheight(words[i], cex = size[i], ...): "size" is not a graphical
## parameter

## Warning in text.default(x1, y1, words[i], cex = size[i], offset = 0, srt =
## rotWord * : "size" is not a graphical parameter

## Joining with `by = join_by(word)`

## Warning in inner_join(., sentiment): Each row in `x` is expected to match at most 1 row in `y`.
## ℹ Row 21 of `x` matches multiple rows.
## ℹ If multiple matches are expected, set `multiple = "all"` to silence this
##   warning.

Rationale:

I really wanted to compare the word clouds for all of the negative reasons, since that could help detect words that are common across all types of customer issues. Knowing the frequent terms and the sentiments behind each type of negative tweet can help companies use automation to quickly identify what type of issue customers are experiencing so they can try to resolve it. The main problem was the idea of having 10 different word clouds on one page: it would look unprofessional and cluttered. Also, it is often difficult to detect patterns in a single word cloud, let alone 10 of them. I decided to build a Shiny app in which the user can select a reason for the negative tweets and see a bar plot of the sentiments most often associated with that reason. I came across a really cool way to adjust the minimum frequency and the maximum number of words in a word cloud on the official Shiny website.

The category that contains the most negative words is “Late Flight”, and positive words are used most frequently in the “Customer Service Issue” category. I found it interesting that some tweets addressing issues like customer service, booking, and flight attendant complaints actually contain more positive words than negative words, which makes an issue trickier to detect if you are only using positive and negative sentiments. So why does any of this matter? Using distribution patterns of emotions like this can help companies detect negative tweets and address and resolve the issue as quickly as possible.

Final Project

Jae In Lee, Cody Garrison, Marc Flores

2023-04-21
