Executive summary

Generative AI has become an indispensable part of our daily lives. According to statistics released by Adobe in April 2024, 53% of Americans have used generative models, with 81% applying them for personal tasks, 30% for work, and 17% for academic purposes. As the market has grown, the variety of generative AI services has diversified, and competition has intensified. As of January 2025, there are approximately 67,000 companies related to generative AI. In this vast generative AI market, it is crucial to first understand how different generative AI apps are perceived by users in terms of their key characteristics and satisfaction to deliver attractive services to consumers. Accordingly, the primary research question set for this study was: “How are three generative apps positioned among users in terms of satisfaction and use?” To address this, the following sub-questions were developed:

  1. What emotions do people feel toward each generative AI app, and how satisfied are they?
  2. What differentiating features and characteristics do users perceive in each generative AI app?
  3. Which features or functions of each generative AI app do users find satisfying or unsatisfying?

To answer these questions, sentiment analysis, tf-idf analysis, and bigram analysis were conducted. The results, based on reviews written between January and March 2024, allowed the three AI apps to be classified by satisfaction and use. Satisfaction was defined by how positive users’ emotions were toward each app. Use was determined by whether users saw the app’s primary purpose as generating new content or as serving other purposes, such as searching for information. Using the ‘afinn’ lexicon, average sentiment scores were calculated, ranking the apps in the order ChatGPT, Copilot, and BingAI. The apps were mapped according to the actual gaps in sentiment scores. For ChatGPT, tf-idf and bigram analyses confirmed active use for information retrieval, image generation, and essay writing. Copilot showed similar patterns to ChatGPT but with relatively more mentions of image generation. BingAI was mainly used for search, with noticeably more mentions of app-level features such as rewards and the Edge browser.

This analysis used reviews collected in early 2024, so its timeliness is limited. The uneven distribution of review counts across apps and platforms also makes generalization difficult. In addition, user reviews alone made it difficult to identify the unique characteristics of each app or to compare the apps with the same AI services on desktop platforms. Collecting more recent reviews, including those from communities like Reddit, and conducting positioning analyses across more than two dimensions would therefore be a meaningful direction for future research.

Data background

The review data for generative AI apps was downloaded from Kaggle, from a dataset extracted by Reham Alabduljabbar. The dataset consists of reviews from users of five generative AI applications: ChatGPT, Bing AI, Microsoft Co-Pilot, Gemini AI, and Da Vinci AI. User reviews were collected from the App Store and Google Play between January 2023 and March 2024, all written in English by users in the United States. The dataset used for analysis consists of a total of 7,736 reviews (rows) and 12 columns: ‘No.of.reviews’, ‘review_id’ (reviewer ID), ‘source’ (review platform), ‘review_title’, ‘review_description’, ‘rating’ and ‘thumbs_up’ (satisfaction ratings from the app installation platform), ‘review_date’, ‘appVersion’, ‘laguage_code’ (the language code, spelled this way in the source file), ‘country_code’, and ‘App_name’ (type of generative AI app).

The distribution of review periods and platforms varies by app, as shown in the accompanying table. Notably, no reviews for Gemini were collected from the App Store, and only one review for DaVinci was collected from Google Play. Additionally, Gemini reviews were collected only in March 2024, and Copilot reviews were collected starting from December 2023, unlike the other apps. Investigation of app release dates confirmed that Gemini was not available on the App Store until November 2024, making review collection before March 2024 impossible. On Google Play, Gemini was released in early February 2024, and review activity increased from March onward. To ensure the representativeness of the sample, this analysis includes only ChatGPT, Copilot, and Bing AI, which had a sufficient number of reviews on both the App Store and Google Play between January and March 2024.

Data loading, cleaning and preprocessing

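To extract only the necessary data, the ‘App_name’ column was renamed to ‘app’ and the dataset was filtered to the entries whose ‘app’ value was ‘chatgpt’, ‘copilot’, or ‘BingAI’ (the spellings used in the source file). The date column was processed with the floor_date function to limit the dataset to the period from January to March 2024. In the same step, all characters other than letters, digits, and spaces were removed from the review text, and reviews shorter than five characters after cleaning were dropped.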

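# Packages used throughout the analysis
library(dplyr)      # data manipulation and the %>% pipe
library(stringr)    # str_replace_all(), str_length()
library(lubridate)  # mdy_hm(), floor_date()
library(tidyr)      # separate()
library(tidytext)   # unnest_tokens(), stop_words, get_sentiments(), bind_tf_idf()
library(ggplot2)    # plotting
library(ggraph)     # bigram network graphs

ai_review <- read.csv("ai_app_review.csv", encoding = "UTF-8")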
head(ai_review)
##   No.of.reviews                            review_id      source review_title
## 1             0 d2676cae-da30-4e57-9a05-21ee5cd78495 Google Play             
## 2             1 b313673d-5446-4479-acc0-5c780181d482 Google Play             
## 3             2 9bce6943-4a9e-4fc1-a2f7-eef460fd1b1d Google Play             
## 4             3 eb57122a-2da6-4d25-8610-f211bf06fbe1 Google Play             
## 5             4 c70e4556-b060-4538-bde3-6b272a18153c Google Play             
## 6             5 0b91d0fb-d2e3-46fc-9916-8ec75e51e6fd Google Play             
##                                                                                                                                                                                                                                                                                                                                                     review_description
## 1 quite refreshing and impressive but not sure why the search function stuck halfway, failed to load any hit even after retrying, not sure if it's the issue with my phone or internet connection but when this happen and i try Chrome it works well and fast. keep up the good job... But why it keeps force closing when I'm streaming video as well as tab freeze.
## 2                                                                                                                                                                                                                                                                                                                                                               nice 🙂
## 3                                                                                                                                                                                                                                                                                                                                                                   ✌🏻
## 4                                                                                                                                                                                                                                                                                                                              very good and intresting .i lov rashiya
## 5                                                                                                                                                                                                     In Bing news, there is no way to translate it, in Google news, there is option to translate it just like I do on browser. Please add basic features it's a shame
## 6                                                                                                                                                                                                                                                                                      it's help me with all my mathematics assignment so for it's a solid five star 🌟
##   rating thumbs_up     review_date     appVersion laguage_code country_code
## 1      3         0 3/24/2024 18:05 28.0.420319014           en           us
## 2      1         0 3/24/2024 18:04 28.0.420319010           en           us
## 3      5         0 3/24/2024 17:42 27.9.420301046           en           us
## 4      5         0 3/24/2024 17:17 27.9.420301046           en           us
## 5      2         0 3/24/2024 16:58                          en           us
## 6      5         0 3/24/2024 16:52 27.9.420301046           en           us
##   App_name
## 1   BingAI
## 2   BingAI
## 3   BingAI
## 4   BingAI
## 5   BingAI
## 6   BingAI
review_clean <- ai_review %>%
  mutate(date = mdy_hm(review_date)) %>%  
  mutate(date = as.Date(date)) %>%
  mutate(rating = as.numeric(rating)) %>%
  rename(number = No.of.reviews, platform = source, app = App_name) %>%
  select(number, platform, review_description, rating, date, app) %>%
  mutate(review_description = str_replace_all(review_description, "[^A-Za-z0-9 ]", "")) %>%
  mutate(date = floor_date(date, "month"), 
         date = format(date, "%Y-%m")) %>%
  filter(app %in% c("chatgpt", "copilot", "BingAI"), 
         date %in% c("2024-01", "2024-02", "2024-03"), 
         str_length(review_description) >= 5)
head(review_clean)
##   number    platform
## 1      0 Google Play
## 2      1 Google Play
## 3      3 Google Play
## 4      4 Google Play
## 5      5 Google Play
## 6      6 Google Play
##                                                                                                                                                                                                                                                                                                                                            review_description
## 1 quite refreshing and impressive but not sure why the search function stuck halfway failed to load any hit even after retrying not sure if its the issue with my phone or internet connection but when this happen and i try Chrome it works well and fast keep up the good job But why it keeps force closing when Im streaming video as well as tab freeze
## 2                                                                                                                                                                                                                                                                                                                                                       nice 
## 3                                                                                                                                                                                                                                                                                                                      very good and intresting i lov rashiya
## 4                                                                                                                                                                                                 In Bing news there is no way to translate it in Google news there is option to translate it just like I do on browser Please add basic features its a shame
## 5                                                                                                                                                                                                                                                                                its help me with all my mathematics assignment so for its a solid five star 
## 6                                                                                                                                                                                                                                                                                                                                                      sahi h
##   rating    date    app
## 1      3 2024-03 BingAI
## 2      1 2024-03 BingAI
## 3      5 2024-03 BingAI
## 4      2 2024-03 BingAI
## 5      5 2024-03 BingAI
## 6      3 2024-03 BingAI
tidy_review <- review_clean %>%
  unnest_tokens(input = review_description, output = word, drop = F) %>%
  anti_join(stop_words) %>%
  select(number, app, word, rating, date, platform)
## Joining with `by = join_by(word)`
head(tidy_review)
##   number    app       word rating    date    platform
## 1      0 BingAI refreshing      3 2024-03 Google Play
## 2      0 BingAI impressive      3 2024-03 Google Play
## 3      0 BingAI     search      3 2024-03 Google Play
## 4      0 BingAI   function      3 2024-03 Google Play
## 5      0 BingAI      stuck      3 2024-03 Google Play
## 6      0 BingAI    halfway      3 2024-03 Google Play
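
One side effect of the cleaning step may be worth illustrating: removing every character other than letters, digits, and spaces also removes apostrophes, so contractions collapse into forms such as ‘im’ or ‘ive’, which a stop-word list keyed on the apostrophe forms would likely not match. The sketch below uses base R's gsub as a stand-in for str_replace_all, and the review strings are hypothetical, not taken from the dataset.

```r
# Hypothetical review strings (not from the dataset)
raw <- c("it's a solid five star!", "I'm impressed, but I've seen bugs.")

# Same character class as in the cleaning step: keep letters, digits, spaces
cleaned <- gsub("[^A-Za-z0-9 ]", "", raw)
print(cleaned)

# Split on spaces and lower-case to mimic a simple word tokenization
tokens <- tolower(unlist(strsplit(cleaned, " +")))
print(tokens)
```

This is consistent with forms such as ‘im’ and ‘ive’ being removed by hand later in the tf-idf step.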

Text data analysis

Analysis and Figure 1

To address the first research question, sentiment analysis was performed using the ‘afinn’ lexicon, which measures both the direction (positive/negative) and intensity of emotions, enabling more precise and intuitive sentiment evaluation.

1. Comparison of Sentiment Score and Rating Score

By comparing sentiment scores, I aimed to quantitatively assess the emotions users experienced while using each app. Additionally, by comparing the sentiment scores from app reviews with the rating scores provided on each platform, I sought to determine whether the sentiment expressed in written feedback was consistent with the users’ formal satisfaction ratings. For objectivity, the sentiment score was defined as the mean of the AFINN sentiment values, and the rating score as the mean of the rating values.
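
As a minimal sketch of how these two averages behave, the base R example below joins a toy token table to a three-word lexicon. The lexicon values, reviews, and ratings are all hypothetical and serve only to illustrate the arithmetic. It also shows one assumption worth keeping in mind: when the rating is averaged after the join, each matched word contributes one row, so reviews with more lexicon words carry more weight than they would in a per-review average.

```r
# Hypothetical mini-lexicon in the AFINN style (word, value); not the real lexicon
lexicon <- data.frame(word = c("love", "helpful", "wrong"),
                      value = c(3, 2, -2))

# Hypothetical tokenized reviews: one row per word, carrying the review's rating
tokens <- data.frame(
  review = c(1, 1, 1, 2),
  word   = c("love", "helpful", "app", "wrong"),
  rating = c(5, 5, 5, 1)
)

# Inner join: keeps only the words that appear in the lexicon
matched <- merge(tokens, lexicon, by = "word")

# Sentiment score: mean lexicon value over matched words
sentiment_score <- mean(matched$value)

# Rating averaged over matched words (review 1 counts twice, review 2 once)
rating_per_word <- mean(matched$rating)

# Rating averaged over reviews (each review counts once)
rating_per_review <- mean(tapply(tokens$rating, tokens$review, mean))

print(c(sentiment_score, rating_per_word, rating_per_review))
```

Whether the two rating averages differ materially for the actual dataset is not established here; the sketch only shows that they can differ.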

Graph layout

For visualization, a scatter plot was chosen, with the x-axis representing sentiment score and the y-axis representing rating score, so that the distribution between the two values could be easily observed. To enhance readability, the axis labels were revised, the ‘theme_minimal’ function was used to simplify the background, and the size of the points and app names was adjusted.

Result analysis

Although all sentiment scores were greater than zero, they were close to 1 on AFINN’s scale of -5 to +5, indicating that the overall sentiment was only mildly positive. Sentiment scores ranked, from highest to lowest: Copilot, ChatGPT, BingAI. Rating scores ranked: ChatGPT, Copilot, BingAI. To conduct a more detailed analysis, differences across platforms were examined. Unlike the results observed when both platforms were combined, ChatGPT had the highest sentiment and rating scores on both the App Store and Google Play. This suggests that users who used more positive language in their reviews also gave higher satisfaction ratings. The discrepancy between the platform-level and combined results appears to be due to the uneven number of reviews for ChatGPT and Copilot across the two platforms: a combined mean is weighted toward whichever platform contributes more reviews. To account for this imbalance, satisfaction levels were assessed based on platform-specific results, giving the following order: ChatGPT, Copilot, and BingAI.
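
The reversal between combined and platform-level rankings can be reproduced with a small worked example. The numbers below are hypothetical (apps "A" and "B", invented means and word counts) and are chosen only to show the mechanism: app A scores higher on each platform, yet lower once the platforms are pooled, because most of its words come from the lower-scoring platform.

```r
# Hypothetical platform-level means and matched-word counts (not the study's data)
scores <- data.frame(
  app            = c("A", "A", "B", "B"),
  platform       = c("App Store", "Google Play", "App Store", "Google Play"),
  mean_sentiment = c(0.9, 1.8, 0.8, 1.6),
  n_words        = c(900, 100, 100, 900)
)

# The pooled mean is the platform means weighted by how many words each contributes
pooled <- sapply(split(scores, scores$app),
                 function(d) weighted.mean(d$mean_sentiment, d$n_words))
print(pooled)
```

Here A leads on both platforms (0.9 vs 0.8 and 1.8 vs 1.6) but trails in the pooled figure, which is the pattern the platform-specific comparison is meant to guard against.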

1-1) Comparison of Average Sentiment Score and Rating by Generative AI App (combined Google Play and the App Store)

afinn <- get_sentiments("afinn")
afinn
## # A tibble: 2,477 × 2
##    word       value
##    <chr>      <dbl>
##  1 abandon       -2
##  2 abandoned     -2
##  3 abandons      -2
##  4 abducted      -2
##  5 abduction     -2
##  6 abductions    -2
##  7 abhor         -3
##  8 abhorred      -3
##  9 abhorrent     -3
## 10 abhors        -3
## # ℹ 2,467 more rows
sentiment_rating <- tidy_review %>%
  inner_join(afinn) %>%
  select(app, word, value, rating) %>%
  group_by(app) %>%
  summarise(sentiment_score = mean(value, na.rm = TRUE), 
            rating_score = mean(rating, na.rm = TRUE))
## Joining with `by = join_by(word)`
sentiment_rating 
## # A tibble: 3 × 3
##   app     sentiment_score rating_score
##   <chr>             <dbl>        <dbl>
## 1 BingAI            0.931         3.37
## 2 chatgpt           1.14          4.05
## 3 copilot           1.16          3.80
sentiment_rating_visualize <- ggplot(sentiment_rating, aes(x = sentiment_score, y = rating_score, label = app)) +
  geom_point(size = 3, color = "steelblue") +
  geom_text(vjust = -0.5, size = 3) +
  labs(x = "Sentiment Score",
    y = "Rating Score",
    title = "Comparison of Average Sentiment Score and Rating by Generative AI App") +
  theme_minimal() +
  theme(plot.title = element_text(face = "bold", hjust = 0.5, size = 12),
         plot.margin = margin(t = 20, r = 20, b = 20, l = 20))
sentiment_rating_visualize

ggsave(filename = "Comparison of Average Sentiment Score and Rating by Generative AI App.png", 
       plot = sentiment_rating_visualize, width = 8, height = 6, dpi = 300, units = "in", bg = "white")

1-2) Comparison of Average Sentiment Score and Rating in the App Store

ios_sentiment_rating <- tidy_review %>%
  filter(platform == "App Store") %>%   # keep App Store reviews only
  inner_join(afinn) %>%                 # keep words found in the AFINN lexicon
  group_by(app) %>%
  summarise(sentiment_score = mean(value, na.rm = TRUE), 
            rating_score = mean(rating, na.rm = TRUE))
## Joining with `by = join_by(word)`
ios_sentiment_rating 
## # A tibble: 3 × 3
##   app     sentiment_score rating_score
##   <chr>             <dbl>        <dbl>
## 1 BingAI            0.560         3.16
## 2 chatgpt           0.925         3.93
## 3 copilot           0.766         3.41
ios_visualize <- ggplot(ios_sentiment_rating, aes(x = sentiment_score, y = rating_score, label = app)) +
  geom_point(size = 3, color = "orange") +
  geom_text(vjust = -0.5, size = 3) +
  labs(x = "Sentiment Score",
    y = "Rating Score",
    title = "Comparison of Average Sentiment Score and Rating in the App Store") +
  theme_minimal() +
  theme(plot.title = element_text(face = "bold", hjust = 0.5, size = 12),
        plot.margin = margin(t = 20, r = 20, b = 20, l = 20))
ios_visualize

ggsave(filename = "Comparison of Average Sentiment Score and Rating in APP store.png", 
       plot = ios_visualize, width = 8, height = 6, dpi = 300, units = "in", bg = "white")

1-3) Comparison of Average Sentiment Score and Rating in Google Play

android_sentiment_rating <- tidy_review %>%
  filter(platform == "Google Play") %>% # keep Google Play reviews only
  inner_join(afinn) %>%                 # keep words found in the AFINN lexicon
  group_by(app) %>%
  summarise(sentiment_score = mean(value, na.rm = TRUE), 
            rating_score = mean(rating, na.rm = TRUE))
## Joining with `by = join_by(word)`
android_sentiment_rating 
## # A tibble: 3 × 3
##   app     sentiment_score rating_score
##   <chr>             <dbl>        <dbl>
## 1 BingAI             1.25         3.54
## 2 chatgpt            1.83         4.42
## 3 copilot            1.62         4.26
android_visualize <- ggplot(android_sentiment_rating, aes(x = sentiment_score, y = rating_score, label = app)) +
  geom_point(size = 3, color = "tomato") +
  geom_text(vjust = -0.5, size = 3) +
  labs(x = "Sentiment Score",
    y = "Rating Score",
    title = "Comparison of Average Sentiment Score and Rating in Google Play") +
  theme_minimal() +
  theme(plot.title = element_text(face = "bold", hjust = 0.5, size = 12),
        plot.margin = margin(t = 20, r = 20, b = 20, l = 20))
android_visualize

ggsave(filename = "Comparison of Average Sentiment Score and Rating in Google Play.png", 
       plot = android_visualize, width = 8, height = 6, dpi = 300, units = "in", bg = "white")

2. Comparison of Sentiment Words for Each Generative AI App

To gain a more detailed understanding of the specific emotions and evaluations users had for each Generative AI app, I compared the top 15 sentiment words used in the reviews of each app.

Graph layout

For visualization, since there were several apps, I used facet_wrap with ‘free_y’ scales to examine the distribution of sentiment words within each app. However, the x-axis scale was not set to “free” to compare absolute differences in sentiment word counts.
I adjusted the margins for readability, hid the legend, and sorted the bars by frequency so that the results could be easily interpreted. I also used distinctive brand colors for each app and arranged the graphs in order of rating score to avoid confusion.

Result analysis

According to the sentiment word analysis, the most frequently used word for ChatGPT was ‘love’, and the appearance of ‘helpful’ and ‘help’ among the top-ranked terms suggests that users received significant assistance in completing their tasks. On the other hand, ‘wrong’ was the 10th most frequently used sentiment word, indicating that the app also elicited negative emotional responses to some extent. In Copilot, ‘love’ was likewise the most frequently used word, and many other positive sentiment words were identified. However, ‘wrong’ also ranked as the 10th most frequently extracted word. For BingAI, ‘love’ again appeared most frequently, along with other positive sentiment words such as ‘nice’, ‘amazing’, ‘awesome’, ‘helpful’, and ‘excellent’, reflecting favorable user evaluations. Nonetheless, ‘bad’ was the 11th most frequently used sentiment word, suggesting some degree of negative feedback. The number of sentiment words was higher for ChatGPT and Copilot than for Bing, which is likely due to the greater total number of reviews for ChatGPT and Copilot. In all three apps, positive sentiment words appeared more frequently than negative ones, which may explain why the sentiment scores were greater than zero. Additionally, the gap between the number of positive and negative sentiment words was larger for ChatGPT and Copilot compared to Bing, which likely contributed to their higher sentiment scores.

ai_sentiment <- tidy_review %>%
  inner_join(afinn) %>%
  count(app, word, sort = TRUE) %>%
  group_by(app) %>%
  slice_max(n, n = 15, with_ties = FALSE) %>%
  ungroup()
## Joining with `by = join_by(word)`
ai_sentiment 
## # A tibble: 45 × 3
##    app    word          n
##    <chr>  <chr>     <int>
##  1 BingAI love         65
##  2 BingAI nice         54
##  3 BingAI rewards      45
##  4 BingAI amazing      30
##  5 BingAI easy         28
##  6 BingAI awesome      24
##  7 BingAI excellent    22
##  8 BingAI helpful      22
##  9 BingAI free         21
## 10 BingAI fun          21
## # ℹ 35 more rows
# Order the facets by rating score; only the three analyzed apps are present
ai_sentiment$app <- factor(ai_sentiment$app, levels = c("chatgpt", "copilot", "BingAI"))

ai_sentiment_visualize <- ggplot(ai_sentiment, aes(x = reorder_within(word, n, app), y = n, fill = app)) +
  geom_col(show.legend = FALSE) +
  facet_wrap(~ app, scales = "free_y", ncol = 3) +
  scale_x_reordered() +
  coord_flip() + 
  scale_fill_manual(values = c(
    "chatgpt" = "#ab68ff",
    "copilot" = "#09AA6C",
    "BingAI"  = "#ffb900")) +
  labs(title = "Comparison of top15 sentiment words by Generative AI apps", x = "Word", y = "Count") +
  theme(plot.title = element_text(face = "bold", hjust = 0.5, size = 14),
         plot.margin = margin(t = 20, r = 20, b = 20, l = 20))
ai_sentiment_visualize

ggsave(filename = "Comparison of top15 sentiment words by Generative AI apps.png", 
       plot = ai_sentiment_visualize, width = 10, height = 6, dpi = 300, units = "in", bg = "white")

Analysis and Figure 2

1. tf-idf Analysis

To address the second research question, a tf-idf analysis was conducted to identify words that are frequent within one app’s reviews but rare across the other apps’ reviews. Since the dataset included more than two documents (apps) and the log odds ratio compares two groups at a time, tf-idf was chosen over log odds ratio. Stop words and irrelevant terms (e.g., contractions like it’s, I’m) were filtered out before visualization.
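
To make the weighting concrete, the sketch below computes tf-idf by hand in base R, using the same definitions as tidytext's bind_tf_idf: tf is a word's count divided by the document's total count, and idf is the natural log of the number of documents divided by the number of documents containing the word. The apps ("a", "b", "c"), words, and counts are hypothetical. With three documents, idf can only take three values: ln(3/1) ≈ 1.10 for a word found in one app, ln(3/2) ≈ 0.405 for a word found in two, and 0 for a word found in all three.

```r
# Hypothetical word counts per app (one row per app-word pair)
counts <- data.frame(
  app  = c("a", "a", "b", "b", "c", "c", "c"),
  word = c("app", "essay", "app", "images", "app", "images", "rewards"),
  n    = c(8, 2, 6, 4, 4, 1, 5)
)

n_docs <- length(unique(counts$app))

# Term frequency: share of the app's words accounted for by this word
counts$tf <- counts$n / ave(counts$n, counts$app, FUN = sum)

# Inverse document frequency: ln(number of apps / apps containing the word)
docs_with_word <- ave(counts$n, counts$word, FUN = length)
counts$idf <- log(n_docs / docs_with_word)

counts$tf_idf <- counts$tf * counts$idf
print(counts)
```

A word used by reviewers of all three apps, like "app" here, gets a tf-idf of zero no matter how often it appears, which is why generic vocabulary does not show up among the top terms.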

Graph layout

Due to the number of apps, facet_wrap was used with free scales instead of shared axes to examine the distribution of tf-idf words within each app. To enhance readability, margins were adjusted, legends were hidden, and axis labels were angled to prevent overlap caused by long tf-idf values (up to three decimal places). Bars were sorted by tf-idf value for quick interpretation. Each app was assigned a distinct color based on its brand identity to improve differentiation. The graphs were ordered by the apps’ rating scores, consistent with the sentiment analysis, to minimize confusion.

Result analysis

App names themselves frequently appear as top words because reviewers mention the app they are reviewing. For BingAI, however, the term ‘copilot’ was also prominent, suggesting a strong association between the two. This is consistent with both BingAI and Copilot being Microsoft products, and with Microsoft’s rebranding of Bing Chat as “Microsoft Copilot” in November 2023. The presence of ‘microsoft’ among BingAI’s top tf-idf words further supports this.

For ChatGPT, words such as ‘question’, ‘answer’, ‘school’, ‘essay’, and ‘custom’ were frequently extracted. This suggests that users rely on ChatGPT in a question-and-answer format and often use it for essay writing. Because ‘custom’ and ‘log’ can carry several meanings, more detailed analysis is required.

For Copilot, as with BingAI, words related to its developer, such as ‘Microsoft’ and ‘365’, were extracted as top-ranking terms. Words like ‘chat’ and ‘gpt’ also appeared frequently, indicating that Copilot is often mentioned alongside ChatGPT. Indeed, many data and IT-related platforms, such as DataNorth, DataCamp, and Zapier, have published content comparing Copilot and ChatGPT. As with ChatGPT, the frequent appearance of ‘questions’ and ‘answer’ suggests that Copilot is also used in a Q&A format. In addition, the frequent use of the word ‘create’ implies that Copilot is also used for content generation.

For BingAI, words like ‘rewards’, ‘earn’, and ‘receipt’ were frequently extracted. This is likely due to the ‘Bing Deal’ feature, which allows users to scan receipts to receive paybacks. A significant number of videos about scanning receipts with Bing have also been uploaded to TikTok. Since terms like ‘receipts’ and ‘notifications’ can be used in various contexts, further in-depth analysis is needed.

ai_tf_idf <- tidy_review %>%
  count(app, word, sort = T) %>%
  bind_tf_idf(term = word,           
              document = app,  
              n = n) %>%
  mutate(word_clean = tolower(str_replace_all(word, "['’‘`]", "") %>%
                              str_replace_all("[[:punct:]]", "")
                              )) %>%
  filter(!word_clean %in% c("im", "its", "ive", "aaaaaaaaaaaaaaa")) %>%
  group_by(app) %>%
  arrange(desc(tf_idf)) %>%
  slice_max(tf_idf, n = 10, with_ties = F) 
ai_tf_idf
## # A tibble: 30 × 7
## # Groups:   app [3]
##    app    word              n       tf   idf   tf_idf word_clean   
##    <chr>  <chr>         <int>    <dbl> <dbl>    <dbl> <chr>        
##  1 BingAI rewards          45 0.00629  0.405 0.00255  rewards      
##  2 BingAI notifications    14 0.00196  1.10  0.00215  notifications
##  3 BingAI tabs             10 0.00140  1.10  0.00154  tabs         
##  4 BingAI receipt           9 0.00126  1.10  0.00138  receipt      
##  5 BingAI receipts          8 0.00112  1.10  0.00123  receipts     
##  6 BingAI redeem            7 0.000978 1.10  0.00107  redeem       
##  7 BingAI earn             18 0.00252  0.405 0.00102  earn         
##  8 BingAI duty              6 0.000839 1.10  0.000921 duty         
##  9 BingAI homepage          6 0.000839 1.10  0.000921 homepage     
## 10 BingAI robux             6 0.000839 1.10  0.000921 robux        
## # ℹ 20 more rows
# Order the facets by rating score; only the three analyzed apps are present
ai_tf_idf$app <- factor(ai_tf_idf$app, levels = c("chatgpt", "copilot", "BingAI"))

ai_tf_idf_visualize <- ggplot(ai_tf_idf, aes(x = reorder_within(word, tf_idf, app),
                  y = tf_idf,
                  fill = app)) +
  geom_col(width = 0.7, show.legend = F) +
  coord_flip() +
  facet_wrap(~ app, scales = "free", ncol = 3) +
  scale_x_reordered() +
  scale_fill_manual(values = c(
    "chatgpt" = "#ab68ff",
    "copilot" = "#09AA6C",
    "BingAI"  = "#ffb900")) +
  labs(title = "Top 10 words in tf-idf by Generative AI App",
       x = "Words", y = "tf-idf") +
  theme(axis.text.x = element_text(angle = 45, hjust = 1),
        plot.title = element_text(face = "bold", hjust = 0.5, size = 16),
        plot.margin = margin(t = 20, r = 20, b = 20, l = 20))
ai_tf_idf_visualize

ggsave("Top 10 words in tf-idf by Generative AI App.png", 
       plot = ai_tf_idf_visualize, width = 10, height = 6, dpi = 300, units = "in", bg = "white")

2. Bigram analysis

As discussed above, TF-IDF allowed for a simple identification of the characteristics of each app, but it was sometimes difficult to understand the precise context in which certain words were used. To analyze the perceived features of each app in more detail, bigram analysis was conducted. Since bigrams capture two words used consecutively, they are more useful in understanding contextual meaning.
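
As a minimal illustration of the idea, the base R sketch below builds bigrams from a single sentence by pairing each word with the one that follows it, then drops any pair that contains a stop word. The sentence and the three-word stop list are hypothetical; the analysis itself uses unnest_tokens with token = "ngrams" and the full stop_words list.

```r
# Hypothetical review text (not from the dataset)
review <- "custom instructions make the free version feel like the paid version"
words <- strsplit(tolower(review), " +")[[1]]

# Pair each word with its successor to form bigrams
word1 <- head(words, -1)
word2 <- tail(words, -1)
bigrams <- paste(word1, word2)

# Hypothetical, tiny stop-word list; drop bigrams where either word is a stop word
stops <- c("the", "make", "like")
kept <- bigrams[!(word1 %in% stops) & !(word2 %in% stops)]
print(kept)
```

Pairs such as "custom instructions" and "paid version" survive the filter, which is how a word that is ambiguous on its own gains context.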

Graph layout

Initially, each app’s name appeared so frequently that bigram networks were formed mainly around those names. Therefore, the single-token app names (for example, ‘chatgpt’) were removed prior to building the bigram networks; two-word spellings such as ‘chat gpt’ are not caught by this filter and still appear in the counts. To ensure readability and preserve sufficient information for each app, a minimum frequency filter was applied: 8 for ChatGPT, 5 for Copilot, and 4 for Bing. A similar number of bigrams was extracted for each app with these settings. Relatively more bigrams were observed in Copilot and ChatGPT, which is likely due to the larger number of total reviews for these two apps, as previously confirmed. To avoid confusion, brand colors were used for each app in the graphs, as applied above, and care was taken to prevent overlapping between words and nodes to improve readability. Additionally, the edges were darkened according to frequency, making the results easier to interpret.

Result analysis

ChatGPT

For ChatGPT, the context of the word ‘custom’, which was previously unclear, was clarified. The term ‘custom instructions’ was frequently used in reviews, suggesting that users recognize the feature that allows them to inform ChatGPT of their preferences, roles, and desired answer formats in advance, so that responses are tailored accordingly in future conversations. Also, frequent use of terms like ‘paid version’ and ‘free version’ indicates user awareness of differences between subscription plans. Moreover, phrases such as ‘real time’ and ‘generate images’ show that users perceive real-time responses and image generation features as notable aspects. Positive evaluations were also evident, with frequent bigrams like ‘game changer’ and ‘highly recommend’.

Copilot

For Copilot, bigrams such as ‘create images’, ‘image generation’, and ‘search results’ were widely used, pointing to image creation and information retrieval as core functionalities of Copilot. Compared to other generative AI apps such as ChatGPT, Gemini, or Bing, which are not primarily focused on image creation, Copilot had more image-related bigrams, suggesting that users actively use it for generating images. Furthermore, because Copilot is integrated with the Bing search engine and allows real-time information retrieval, it is reasonable that bigrams like ‘search engine’ appeared frequently. As with ChatGPT, positive expressions such as ‘user friendly’, ‘highly recommend’, and ‘game changer’ were also commonly found.

BingAI

For Bing, the frequent extraction of the term ‘search engine’ indicates that users mainly use Bing for information retrieval. The mention of ‘edge app’ also appeared often, reflecting Bing’s integration with the Edge browser. Bigrams such as ‘microsoft rewards’ and ‘gift card’ refer to app-based incentive features. Through the BingAI app, users can earn points from Microsoft by performing searches, which can be redeemed for various gift cards. This suggests that users perceive Bing not only as an AI app but also as a broader platform including reward-based utility features.

# ChatGPT: extract bigrams, drop stop words and the app name, keep pairs seen at least 8 times
chatgpt_bigram <- review_clean %>%
  filter(app == "chatgpt") %>%
  unnest_tokens(input = review_description,
                output = bigram,
                token = "ngrams",
                n = 2) %>%
  separate(bigram, c("word1", "word2"), sep = " ") %>%
  filter(!word1 %in% c("br", stop_words$word, "chatgpt"),
         !word2 %in% c("br", stop_words$word, "chatgpt")) %>%
  count(word1, word2, sort = T) %>%
  na.omit() %>%
  filter(n >= 8)
chatgpt_bigram
##         word1        word2   n
## 1        chat          gpt 114
## 2        nice          app  20
## 3       voice         chat  19
## 4           5        stars  17
## 5          ai          app  17
## 6      highly    recommend  17
## 7        free      version  16
## 8     amazing          app  13
## 9     helpful          app  13
## 10 absolutely         love  12
## 11        app          ive  12
## 12       love         chat  12
## 13     custom instructions  11
## 14       game      changer  11
## 15   generate       images  11
## 16       real         time  11
## 17  excellent          app  10
## 18       paid      version  10
## 19    awesome          app   9
## 20       dont   understand   9
## 21          5         star   8
## 22       chat          bot   8
## 23      daily         life   8
## 24        gpt            4   8
## 25       mind      blowing   8
## 26       real       person   8
## 27     search       engine   8
## 28     social        media   8
set.seed(2023)
a <- grid::arrow(type = "closed", length = unit(.15, "inches"))

chatgpt_bigram_visualize <- ggraph(chatgpt_bigram, layout = "fr") +
  geom_edge_link(aes(edge_alpha = n), show.legend = FALSE,
                 arrow = a, end_cap = circle(.07, 'inches')) +
  geom_node_point(color = "#ab68ff", size = 5) +
  geom_node_text(aes(label = name), vjust = 1, hjust = 1, repel = T) +
  theme_void() +
  labs(title = "Bigram analysis of ChatGPT") +
  theme(plot.title = element_text(face = "bold", hjust = 0.5, size = 16),
        plot.margin = margin(t = 20, r = 20, b = 20, l = 20))
chatgpt_bigram_visualize

ggsave("Bigram analysis of ChatGPT.png", 
       plot = chatgpt_bigram_visualize, width = 10, height = 6, dpi = 300, units = "in", bg = "white")

# Copilot: same pipeline with a lower frequency cutoff (n >= 5)
copilot_bigram <- review_clean %>%
  filter(app == "copilot") %>%
  unnest_tokens(input = review_description,
                output = bigram,
                token = "ngrams",
                n = 2) %>%
  separate(bigram, c("word1", "word2"), sep = " ") %>%
  filter(!word1 %in% c("br", stop_words$word, "copilot"),
         !word2 %in% c("br", stop_words$word, "copilot")) %>%
  count(word1, word2, sort = T) %>%
  na.omit() %>%
  filter(n >= 5)
copilot_bigram
##         word1        word2  n
## 1        chat          gpt 26
## 2          ai          app 23
## 3      create       images 18
## 4       image   generation  9
## 5           5        stars  8
## 6     amazing          app  8
## 7     chatgpt          app  8
## 8         gpt            4  8
## 9      highly    recommend  8
## 10       nice          app  8
## 11         ai         tool  7
## 12 artificial intelligence  7
## 13       chat      history  7
## 14      image     creation  7
## 15      image    generator  7
## 16     search       engine  7
## 17         24        hours  6
## 18 absolutely         love  6
## 19       bing         chat  6
## 20       user     friendly  6
## 21         ai    assistant  5
## 22        app        store  5
## 23       bing          app  5
## 24  excellent          app  5
## 25 generating       images  5
## 26   internet   connection  5
set.seed(2023)
a <- grid::arrow(type = "closed", length = unit(.15, "inches"))

copilot_bigram_visualize <- ggraph(copilot_bigram, layout = "fr") +
  geom_edge_link(aes(edge_alpha = n), show.legend = FALSE,
                 arrow = a, end_cap = circle(.07, 'inches')) +
  geom_node_point(color = "#09AA6C", size = 5) +
  geom_node_text(aes(label = name), vjust = 1, hjust = 1, repel = T) +
  theme_void() +
  labs(title = "Bigram analysis of Copilot") +
  theme(plot.title = element_text(face = "bold", hjust = 0.5, size = 16),
        plot.margin = margin(t = 20, r = 20, b = 20, l = 20))
copilot_bigram_visualize

ggsave("Bigram analysis of Copilot.png", 
       plot = copilot_bigram_visualize, width = 10, height = 6, dpi = 300, units = "in", bg = "white")

# BingAI: same pipeline with a lower frequency cutoff (n >= 4)
bing_bigram <- review_clean %>%
  filter(app == "BingAI") %>%
  unnest_tokens(input = review_description,
                output = bigram,
                token = "ngrams",
                n = 2) %>%
  separate(bigram, c("word1", "word2"), sep = " ") %>%
  # unnest_tokens() lowercases tokens, so the app name must be lowercase to match
  filter(!word1 %in% c("br", stop_words$word, "bingai"),
         !word2 %in% c("br", stop_words$word, "bingai")) %>%
  count(word1, word2, sort = T) %>%
  na.omit() %>%
  filter(n >= 4)
bing_bigram
##        word1     word2  n
## 1     search    engine 19
## 2       nice       app 18
## 3       bing       app 17
## 4       bing      chat 11
## 5          5     stars 10
## 6      image   creator  8
## 7         ai      chat  7
## 8       chat       gpt  7
## 9       love      bing  7
## 10      bing        ai  6
## 11     image generator  6
## 12    search   results  6
## 13        ai     image  5
## 14 excellent       app  5
## 15       gpt         4  5
## 16 microsoft   rewards  5
## 17    search   engines  5
## 18        ai       app  4
## 19       app      dont  4
## 20   copilot   feature  4
## 21      duty         4  4
## 22      edge       app  4
## 23      gift      card  4
## 24      gift     cards  4
## 25    search       box  4
set.seed(2023)
a <- grid::arrow(type = "closed", length = unit(.15, "inches"))

BingAI_bigram_visualize <- ggraph(bing_bigram, layout = "fr") +
  geom_edge_link(aes(edge_alpha = n), show.legend = FALSE,
                 arrow = a, end_cap = circle(.07, 'inches')) +
  geom_node_point(color = "#ffb900", size = 5) +
  geom_node_text(aes(label = name), vjust = 1, hjust = 1, repel = T) +
  theme_void() +
  labs(title = "Bigram analysis of BingAI") +
  theme(plot.title = element_text(face = "bold", hjust = 0.5, size = 16),
        plot.margin = margin(t = 20, r = 20, b = 20, l = 20))
BingAI_bigram_visualize

ggsave("Bigram analysis of BingAI.png", 
       plot = BingAI_bigram_visualize, width = 10, height = 6, dpi = 300, units = "in", bg = "white")

Analysis and Figure 3

The preceding sentiment, TF-IDF, and bigram analyses showed how satisfied users are with each app and which features they perceive as most significant. We now combine these analyses to investigate which aspects of each app users evaluate positively or negatively. To do this, we filtered each app’s bigram table for bigrams that begin with a positive sentiment word such as “helpful” or “nice,” or a negative one such as “bad,” and drew a network for each. Although the goal was to use “helpful” and “bad” as standard sentiment words across all apps for objective comparison, the same pair did not yield meaningful bigrams for every app because reviewers express themselves differently. We therefore conducted test runs with alternative words for each app and selected the most meaningful positive and negative sentiment words (for Bing, “nice” rather than “helpful”).
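The test runs with alternative sentiment words can be sketched as a small helper that returns the bigrams beginning with a candidate word for one app. This is a minimal sketch rather than the code used in the study: the function name `centered_bigrams` and the `toy_reviews` rows are hypothetical, and only the column names `app` and `review_description` are taken from `review_clean`.

```r
library(dplyr)
library(tidyr)
library(tidytext)
library(tibble)

# Toy stand-in for review_clean (hypothetical rows, not the study data)
toy_reviews <- tibble(
  app = c("chatgpt", "chatgpt", "copilot", "BingAI"),
  review_description = c(
    "helpful tool for homework",
    "helpful tool and helpful advice",
    "helpful answers every time",
    "nice search and nice art"
  )
)

# Count the bigrams that start with `center_word` in one app's reviews,
# dropping pairs whose second word is a stop word
centered_bigrams <- function(data, app_name, center_word) {
  data %>%
    filter(app == app_name) %>%
    unnest_tokens(output = bigram, input = review_description,
                  token = "ngrams", n = 2) %>%
    separate(bigram, c("word1", "word2"), sep = " ") %>%
    filter(word1 == center_word, !word2 %in% stop_words$word) %>%
    count(word1, word2, sort = TRUE)
}

chatgpt_try <- centered_bigrams(toy_reviews, "chatgpt", "helpful")
bing_try    <- centered_bigrams(toy_reviews, "BingAI", "nice")
chatgpt_try
```

Calling the helper with several candidate words per app makes it easy to see which word produces enough distinct pairs to be worth plotting.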

Graph layout

To ensure readability and facilitate comparative analysis, the bigram networks related to positive and negative sentiment words were placed side by side.

Result analysis

ChatGPT (helpful)

Words such as tool, information, fast, suggestions, and advice were used together with helpful. This suggests that users perceive ChatGPT as an app that provides convenient and useful information and functions.

ChatGPT (bad)

Words like gateway and rewriting were used with bad. The pair “bad gateway” most likely refers to the HTTP 502 Bad Gateway server error rather than to an opinion about the app, which points to connection or service-availability problems. The term rewriting may reflect limitations in the app’s ability to properly revise text or incorporate user feedback during content generation.

Copilot (helpful)

Words such as answers, resource, results, and tool appeared often with helpful. This implies that users recognize Copilot as an AI app that provides especially useful outputs.

Copilot (bad)

Words such as attitude, feels, and reskin were often used with bad. This suggests that users may not perceive significant improvements or functional differences from Microsoft’s previous AI products, resulting in negative emotional responses during use.

Bing (nice)

Words like search, art, pic, and easy appeared with nice. This indicates that Bing is performing well in terms of information delivery and some image generation.

Bing (bad)

Words such as experience and search were used with bad, revealing that some users also have negative evaluations of Bing’s search function. Additionally, the appearance of economy in negative contexts suggests that some users may feel dissatisfaction with Bing’s reward-related features.
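Because these interpretations rest on two-word pairs, it helps to read the reviews behind a pair before drawing conclusions from it. The following is a minimal sketch of such a check, assuming the same `app` and `review_description` columns as `review_clean`; the helper name `reviews_with_phrase` and the toy rows are hypothetical.

```r
library(dplyr)
library(stringr)
library(tibble)

# Toy stand-in for review_clean (hypothetical rows, not the study data)
toy_reviews <- tibble(
  app = c("chatgpt", "chatgpt", "copilot"),
  review_description = c(
    "Keeps showing Bad Gateway when I log in",
    "Helpful tool for rewriting emails",
    "Feels like a reskin of the old app"
  )
)

# Return the full text of one app's reviews that contain a phrase,
# ignoring upper and lower case
reviews_with_phrase <- function(data, app_name, phrase) {
  data %>%
    filter(app == app_name,
           str_detect(review_description,
                      fixed(phrase, ignore_case = TRUE))) %>%
    pull(review_description)
}

hits <- reviews_with_phrase(toy_reviews, "chatgpt", "bad gateway")
hits
```

The same call with a phrase such as "bad economy" on the real data would show whether the reward-related reading of that pair holds up.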

1) ChatGPT

#helpful 
chatgpt_helpful <- review_clean %>%
  filter(app == "chatgpt") %>%
  unnest_tokens(input = review_description,
                output = bigram,
                token = "ngrams",
                n = 2) %>%
  separate(bigram, c("word1", "word2"), sep = " ") %>%
  filter(!word1 %in% c("br", stop_words$word, "chatgpt"),
         !word2 %in% c("br", stop_words$word, "chatgpt")) %>%
  count(word1, word2, sort = T) %>%
  na.omit() %>%
  filter(word1 == "helpful") %>%
  graph_from_data_frame()
chatgpt_helpful
## IGRAPH a31c02a DN-- 19 19 -- 
## + attr: name (v/c), n (e/n)
## + edges from a31c02a (vertex names):
##  [1] helpful->app         helpful->tool        helpful->responses  
##  [4] helpful->application helpful->advice      helpful->ai         
##  [7] helpful->alot        helpful->apps        helpful->fast       
## [10] helpful->feature     helpful->helpful     helpful->information
## [13] helpful->life        helpful->lonely      helpful->love       
## [16] helpful->person      helpful->solve       helpful->suggestions
## [19] helpful->till
set.seed(2023)
a <- grid::arrow(type = "closed", length = unit(.15, "inches"))

chatgpt_helpful_graph <- ggraph(chatgpt_helpful, layout = "fr") +
  geom_edge_link(aes(edge_alpha = n), show.legend = FALSE,
                 arrow = a, end_cap = circle(.07, 'inches')) +
  geom_node_point(color = "#ab68ff", size = 5) +
  geom_node_text(aes(label = name), vjust = 1, hjust = 1, repel = T) +
  theme_void() +
  theme(plot.title = element_text(face = "bold", hjust = 0.5, size = 16),
        plot.margin = margin(t = 20, r = 20, b = 20, l = 20))



#bad
chatgpt_bad <- review_clean %>%
  filter(app == "chatgpt") %>%
  unnest_tokens(input = review_description,
                output = bigram,
                token = "ngrams",
                n = 2) %>%
  separate(bigram, c("word1", "word2"), sep = " ") %>%
  filter(!word1 %in% c("br", stop_words$word, "chatgpt"),
         !word2 %in% c("br", stop_words$word, "chatgpt")) %>%
  count(word1, word2, sort = T) %>%
  na.omit() %>%
  filter(word1 == "bad") %>%
  graph_from_data_frame()
chatgpt_bad
## IGRAPH a3d19d7 DN-- 16 15 -- 
## + attr: name (v/c), n (e/n)
## + edges from a3d19d7 (vertex names):
##  [1] bad->gateway    bad->reviews    bad->ad         bad->ai        
##  [5] bad->answer     bad->constantly bad->day        bad->experience
##  [9] bad->fast       bad->game       bad->giving     bad->platform  
## [13] bad->reasoning  bad->rewriting  bad->words
set.seed(2023)
a <- grid::arrow(type = "closed", length = unit(.15, "inches"))

chatgpt_bad_graph <- ggraph(chatgpt_bad, layout = "fr") +
  geom_edge_link(aes(edge_alpha = n), show.legend = FALSE,
                 arrow = a, end_cap = circle(.07, 'inches')) +
  geom_node_point(color = "#ab68ff", size = 5) +
  geom_node_text(aes(label = name), vjust = 1, hjust = 1, repel = T) +
  theme_void() +
  theme(plot.title = element_text(face = "bold", hjust = 0.5, size = 16),
        plot.margin = margin(t = 20, r = 20, b = 20, l = 20))



chatgpt_centered <- chatgpt_helpful_graph + chatgpt_bad_graph +
  plot_annotation(
    title = "Bigrams Centered on 'Helpful' and 'Bad' in ChatGPT Reviews",
    theme = theme(
      plot.title = element_text(face = "bold", hjust = 0.5, size = 16),
      plot.margin = margin(t = 20, r = 20, b = 20, l = 20)
    )
  )

chatgpt_centered

ggsave("Bigrams Centered on 'Helpful' and 'Bad' in ChatGPT Reviews.png", 
       plot = chatgpt_centered, width = 10, height = 6, dpi = 300, units = "in", bg = "white")

2) Copilot

#helpful 
copilot_helpful <- review_clean %>%
  filter(app == "copilot") %>%
  unnest_tokens(input = review_description,
                output = bigram,
                token = "ngrams",
                n = 2) %>%
  separate(bigram, c("word1", "word2"), sep = " ") %>%
  filter(!word1 %in% c("br", stop_words$word, "copilot"),
         !word2 %in% c("br", stop_words$word, "copilot")) %>%
  count(word1, word2, sort = T) %>%
  na.omit() %>%
  filter(word1 == "helpful") %>%
  graph_from_data_frame()
copilot_helpful
## IGRAPH a6a6251 DN-- 12 11 -- 
## + attr: name (v/c), n (e/n)
## + edges from a6a6251 (vertex names):
##  [1] helpful->application helpful->app         helpful->resource   
##  [4] helpful->answers     helpful->assistant   helpful->easy       
##  [7] helpful->images      helpful->love        helpful->results    
## [10] helpful->software    helpful->tool
set.seed(2023)
a <- grid::arrow(type = "closed", length = unit(.15, "inches"))

copilot_helpful_graph <- ggraph(copilot_helpful, layout = "fr") +
  geom_edge_link(aes(edge_alpha = n), show.legend = FALSE,
                 arrow = a, end_cap = circle(.07, 'inches')) +
  geom_node_point(color = "#09AA6C", size = 5) +
  geom_node_text(aes(label = name), vjust = 1, hjust = 1, repel = T) +
  theme_void() +
  theme(plot.title = element_text(face = "bold", hjust = 0.5, size = 16),
        plot.margin = margin(t = 20, r = 20, b = 20, l = 20))



#bad
copilot_bad <- review_clean %>%
  filter(app == "copilot") %>%
  unnest_tokens(input = review_description,
                output = bigram,
                token = "ngrams",
                n = 2) %>%
  separate(bigram, c("word1", "word2"), sep = " ") %>%
  filter(!word1 %in% c("br", stop_words$word, "copilot"),
         !word2 %in% c("br", stop_words$word, "copilot")) %>%
  count(word1, word2, sort = T) %>%
  na.omit() %>%
  filter(word1 == "bad") %>%
  graph_from_data_frame()
copilot_bad
## IGRAPH a7de982 DN-- 7 6 -- 
## + attr: name (v/c), n (e/n)
## + edges from a7de982 (vertex names):
## [1] bad->attitude  bad->business  bad->feature   bad->feels     bad->microsoft
## [6] bad->reskin
set.seed(2023)
a <- grid::arrow(type = "closed", length = unit(.15, "inches"))

copilot_bad_graph <- ggraph(copilot_bad, layout = "fr") +
  geom_edge_link(aes(edge_alpha = n), show.legend = FALSE,
                 arrow = a, end_cap = circle(.07, 'inches')) +
  geom_node_point(color = "#09AA6C", size = 5) +
  geom_node_text(aes(label = name), vjust = 1, hjust = 1, repel = T) +
  theme_void() +
  theme(plot.title = element_text(face = "bold", hjust = 0.5, size = 16),
        plot.margin = margin(t = 20, r = 20, b = 20, l = 20))



copilot_centered <- copilot_helpful_graph + copilot_bad_graph +
  plot_annotation(
    title = "Bigrams Centered on 'Helpful' and 'Bad' in Copilot Reviews",
    theme = theme(
      plot.title = element_text(face = "bold", hjust = 0.5, size = 16),
      plot.margin = margin(t = 20, r = 20, b = 20, l = 20)
    )
  )

copilot_centered

ggsave("Bigrams Centered on 'Helpful' and 'Bad' in copilot Reviews.png", 
       plot = copilot_centered, width = 10, height = 6, dpi = 300, units = "in", bg = "white")

3) BingAI

#nice 
BingAI_nice <- review_clean %>%
  filter(app == "BingAI") %>%
  unnest_tokens(input = review_description,
                output = bigram,
                token = "ngrams",
                n = 2) %>%
  separate(bigram, c("word1", "word2"), sep = " ") %>%
  # unnest_tokens() lowercases tokens, so the app name must be lowercase to match
  filter(!word1 %in% c("br", stop_words$word, "bingai"),
         !word2 %in% c("br", stop_words$word, "bingai")) %>%
  count(word1, word2, sort = T) %>%
  na.omit() %>%
  filter(word1 == "nice") %>%
  graph_from_data_frame()
BingAI_nice 
## IGRAPH aa649c1 DN-- 12 11 -- 
## + attr: name (v/c), n (e/n)
## + edges from aa649c1 (vertex names):
##  [1] nice->app         nice->ai          nice->amount      nice->application
##  [5] nice->art         nice->cool        nice->easy        nice->hai        
##  [9] nice->pic         nice->search      nice->set
set.seed(2023)
a <- grid::arrow(type = "closed", length = unit(.15, "inches"))

BingAI_nice_graph <- ggraph(BingAI_nice, layout = "fr") +
  geom_edge_link(aes(edge_alpha = n), show.legend = FALSE,
                 arrow = a, end_cap = circle(.07, 'inches')) +
  geom_node_point(color = "#ffb900", size = 5) +
  geom_node_text(aes(label = name), vjust = 1, hjust = 1, repel = T) +
  theme_void() +
  theme(plot.title = element_text(face = "bold", hjust = 0.5, size = 16),
        plot.margin = margin(t = 20, r = 20, b = 20, l = 20))



#bad
BingAI_bad <- review_clean %>%
  filter(app == "BingAI") %>%
  unnest_tokens(input = review_description,
                output = bigram,
                token = "ngrams",
                n = 2) %>%
  separate(bigram, c("word1", "word2"), sep = " ") %>%
  # unnest_tokens() lowercases tokens, so the app name must be lowercase to match
  filter(!word1 %in% c("br", stop_words$word, "bingai"),
         !word2 %in% c("br", stop_words$word, "bingai")) %>%
  count(word1, word2, sort = T) %>%
  na.omit() %>%
  filter(word1 == "bad") %>%
  graph_from_data_frame()
BingAI_bad
## IGRAPH aac4f96 DN-- 7 6 -- 
## + attr: name (v/c), n (e/n)
## + edges from aac4f96 (vertex names):
## [1] bad->app        bad->experience bad->search     bad->broken    
## [5] bad->economy    bad->people
set.seed(2023)
a <- grid::arrow(type = "closed", length = unit(.15, "inches"))

BingAI_bad_graph <- ggraph(BingAI_bad, layout = "fr") +
  geom_edge_link(aes(edge_alpha = n), show.legend = FALSE,
                 arrow = a, end_cap = circle(.07, 'inches')) +
  geom_node_point(color = "#ffb900", size = 5) +
  geom_node_text(aes(label = name), vjust = 1, hjust = 1, repel = T) +
  theme_void() +
  theme(plot.title = element_text(face = "bold", hjust = 0.5, size = 16),
        plot.margin = margin(t = 20, r = 20, b = 20, l = 20))


BingAI_centered <- BingAI_nice_graph + BingAI_bad_graph +
  plot_annotation(
    title = "Bigrams Centered on 'Nice' and 'Bad' in BingAI Reviews",
    theme = theme(
      plot.title = element_text(face = "bold", hjust = 0.5, size = 12),
      plot.margin = margin(t = 20, r = 20, b = 20, l = 20)
    )
  )

BingAI_centered

ggsave("Bigrams Centered on 'Nice' and 'Bad' in BingAI Reviews.png", 
       plot = BingAI_centered, width = 10, height = 6, dpi = 300, units = "in", bg = "white")
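The plotting code above repeats the same ggraph layers for every network and changes only the input graph and the node color. One possible way to reduce that repetition is a small wrapper, sketched below. The function name `plot_bigram_network` and the toy edge list are hypothetical; the layers themselves are the ones already used in this section.

```r
library(ggplot2)
library(ggraph)
library(igraph)

# Draw a bigram network with the same styling as the plots above.
# `graph` is an igraph object whose edges carry a count attribute `n`.
plot_bigram_network <- function(graph, node_color, title = NULL) {
  a <- grid::arrow(type = "closed", length = grid::unit(.15, "inches"))
  ggraph(graph, layout = "fr") +
    geom_edge_link(aes(edge_alpha = n), show.legend = FALSE,
                   arrow = a, end_cap = circle(.07, "inches")) +
    geom_node_point(color = node_color, size = 5) +
    geom_node_text(aes(label = name), vjust = 1, hjust = 1, repel = TRUE) +
    theme_void() +
    labs(title = title) +
    theme(plot.title = element_text(face = "bold", hjust = 0.5, size = 16),
          plot.margin = margin(t = 20, r = 20, b = 20, l = 20))
}

# Toy edge list (hypothetical counts) to show the call
toy_edges <- data.frame(word1 = c("helpful", "helpful"),
                        word2 = c("tool", "advice"),
                        n = c(2, 1))

set.seed(2023)  # the "fr" layout is randomized, so fix the seed
toy_plot <- plot_bigram_network(graph_from_data_frame(toy_edges),
                                node_color = "#ab68ff",
                                title = "Toy example")
```

With such a helper, each app's pair of panels could be built as `plot_bigram_network(chatgpt_helpful, "#ab68ff") + plot_bigram_network(chatgpt_bad, "#ab68ff")` and combined as before.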