Final project

Author

Nancy

Analyzing public sentiment toward ChatGPT on Twitter

Artificial intelligence (AI) has been increasingly adopted across various sectors, including manufacturing, finance, healthcare, and the arts, to drive innovation and enhance efficiency. Similarly, AI holds great potential for education, particularly in language learning, due to its ability to provide personalized learning experiences, immediate feedback, and interactive learning environments (Martin et al., 2024; Huang et al., 2023; Alharbi, 2023).

Given this potential, I am particularly interested in understanding public sentiment towards one of the most discussed and researched AI tools, ChatGPT, with a focus on its applications in language learning. The aim of this exploration is to provide insights that can be communicated to K-12 and other educators, offering them a foundational understanding of the potential benefits and concerns of integrating ChatGPT into their daily practices, and fostering ethical AI awareness. Existing literature suggests that teachers’ readiness to use AI, which involves digital competency, ethical awareness, and a broader vision, influences their capacity to effectively and ethically implement AI tools like ChatGPT in classrooms (Wang et al., 2023). Findings from this study could also serve as valuable evidence for policymakers seeking to understand public perceptions of AI tools.

Furthermore, I hope to bridge the gap between public discourse and practical classroom applications, enabling language teachers to better integrate ChatGPT into their teaching practices to enhance student engagement and learning outcomes.

The following specific questions will be addressed through this analysis:

Who tweets most often about this topic? Which tweets gain the most popularity or are retweeted the most?
How does the frequency of tweets vary over time in the given dataset?
What are the most frequently used words in tweets about ChatGPT?
What is the general public sentiment towards ChatGPT?
What are the most discussed topics related to ChatGPT usage?
What are the primary uses of ChatGPT in language learning contexts?
Is the sentiment towards ChatGPT in language learning contexts more positive compared to the overall sentiment?

The required packages are loaded first:

library(cld2)
library(dplyr)


Attaching package: 'dplyr'

The following objects are masked from 'package:stats':

    filter, lag

The following objects are masked from 'package:base':

    intersect, setdiff, setequal, union

library(vader)
library(tidyverse)

── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ forcats   1.0.0     ✔ readr     2.1.5
✔ ggplot2   3.5.1     ✔ stringr   1.5.1
✔ lubridate 1.9.3     ✔ tibble    3.2.1
✔ purrr     1.0.2     ✔ tidyr     1.3.1

── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors

library(wordcloud2)
library(tidytext)
library(stringr)
library(ggplot2)
library(tm)

Loading required package: NLP

Attaching package: 'NLP'

The following object is masked from 'package:ggplot2':

    annotate

Data Wrangling

a. Data retrieval

Due to constraints (both API limitations and costs) in accessing large-scale Twitter data, I downloaded a public dataset from Kaggle, which contains 43,004 tweets mentioning ‘ChatGPT,’ ‘GPT3,’ or ‘GPT4,’ spanning from April 3, 2023, to May 12, 2023. It is saved as chatgpt_daily_tweets.csv.

tweets_data <- read.csv("/cloud/project/chatgpt_daily_tweets.csv", stringsAsFactors = FALSE)

head (tweets_data)

                tweet_id             tweet_created            tweet_extracted
1  1.642889622681432e+18 2023-04-03 13:59:44+00:00 2023-04-08 01:07:02.538242
2 1.6428442314496123e+18 2023-04-03 10:59:22+00:00 2023-04-08 01:06:59.379927
3 1.6427385624866693e+18 2023-04-03 03:59:28+00:00 2023-04-08 01:06:52.504868
4 1.6429198880616448e+18 2023-04-03 15:59:59+00:00 2023-04-08 01:07:04.742617
5  1.642708351690711e+18 2023-04-03 01:59:25+00:00 2023-04-08 01:06:50.638068
6 1.6428593561893274e+18 2023-04-03 11:59:28+00:00 2023-04-08 01:07:00.375167
                                                                                                                                                                                                                                                                                                  text
1                                                                                                                                                                                   RT @jexep: เทคนิคฝึกภาษากับ ChatGPT ที่ผมลอง (ผมลองฝึก อังกฤษ - ญี่ปุ่น, อังกฤษ - เยอรมัน) ใช้วิธีเดียวกัน ได้ผลเป็นที่น่าพอใจครับ เหลือแค่…
2                                                                                                                                                                                                                                     ChatGPTをもっと活かせるChrome拡張機能4選 https://t.co/hfacFe570t
3                                                                                                                                                       RT @DarrellLerner: ChatGPT Plugins are the fastest way to get rich in 2023. \n\nI’ve created a step-by-step guide showing you how to earn $10…
4 Get an intelligent chatbot for your website in minutes with Chatbase AI. Train ChatGPT on your data and let it answer any question your users have. Simply upload a document or link and add the chat widget - it's that easy!\nMake Money using AI: https://t.co/yLHEqn4w9T https://t.co/ba54JvoRsM
5                                                      🔥Hey Guys, #ZenithSwap has launched at just $ 55,000 USD Marketcap. The ChatGPT of DEX - Reimagining DeFi with AI-Powered Yield Farming. Now at 4X. Lot of up potential at such low marketcap.🔥😇 $ARB $ZSP #Arbitrum https://t.co/9VWYtYzAJD
6                                                                                                                                         RT @sinsonetwork: Now! Join #SINSO DataLand^ChatGPT #Airdrop!\n📅3.23-4.6\n📌Tasks\n①Log in to&lt;https://t.co/Hlwqa7HG40&gt;\n②Try SINSO #ChatGPT&amp; twe…
  lang             user_id          user_name   user_username   user_location
1   th          4706577259       👷🏼 ♡ #GOT7     BPawarisa1a ในใจJacksonwang
2   ja          2264288640     ミミズクりんゆ    DRVO_Project       東京←岐阜
3   en          2383245894                 pk pradeep42329225           India
4   en 1633040597782081537           AR Leyva    ArrheniusLey  United Kingdom
5   en 1311403370670960640 Human Being 🇨🇳🇸🇬🇻🇳 KiarostamiBeing     Chicago, IL
6   en           462142717          MokoHaram        holymoko Zambia mufulira
                                                                                                                                                       user_description
1                                                                                              @JacksonWang852 ➖ รีวิว #รีวิวแบมพี #แบมพีอัพเดท ✨ #แบมพีสะสมแต้มบุญ #แบมอยากแจก
2 料理垢です。ｳｪｯﾌﾞ4年目 発言はファッキン個人の見解 TypeScript/Vue/React/広報/採用 ChatGPT plus プロンプトエンジニア組織や環境を作る人へのシフト 音楽活動：@DrvoProject
3                                                                                                                                             💐💐।।जय श्री महाकाल।।💐💐
4                                               Passionate about AI and its potential to transform the business landscape and shape the future. https://t.co/c3UpfbLbqH
5                                Freedom is the Recognition of Necessity • Market Socialism • Cheng Enfu Stan • Empiricism is the Science of Economics || Stay Humble ✌️
6                                                                                                                                                        Public faker🍂
               user_created user_followers_count user_following_count
1 2016-01-04 02:27:33+00:00                 1293                  445
2 2013-12-27 12:39:07+00:00                 7878                 4941
3 2014-03-11 06:04:10+00:00                  269                 4141
4 2023-03-07 09:43:36+00:00                  264                   24
5 2020-09-30 20:32:00+00:00                  447                  419
6 2012-01-12 16:25:12+00:00                 2286                  982
  user_tweet_count user_verified source retweet_count like_count reply_count
1            87051         False     NA         13640          0           0
2            76597         False     NA             0          0           0
3             3816         False     NA           628          0           0
4              198         False     NA             0          0           0
5            12949         False     NA             0          0           0
6            20438         False     NA           270          0           0
  impression_count
1                0
2              290
3                0
4               58
5                0
6                0

b. Data cleaning

After reviewing the first several rows of the dataset, the columns appear to be in usable data types, so no changes are made. However, the text column contains tweets in various languages, so we keep only the English tweets. Additionally, it is necessary to remove duplicates, URLs, emojis, and unusual symbols before proceeding with tokenization.
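The URL and symbol removal described above can be tried on a single string before it is applied to the whole dataset. The following is a minimal sketch, assuming the same pattern that the cleaning step uses; `toy_tweet` is an invented example, not a row from the dataset.

```r
library(stringr)

# Same pattern as the cleaning step: URLs, plus any character that is not
# an ASCII letter or digit, punctuation, or whitespace.
clean_pattern <- "http[s]?://\\S+|[^A-Za-z0-9[:punct:][:space:]]"

# Invented example containing an emoji (U+1F525) and a shortened URL.
toy_tweet <- "\U0001F525 ChatGPT is great! Read more: https://t.co/abc123"

cleaned <- str_remove_all(toy_tweet, clean_pattern)
cleaned
```

In the cleaned output shown further down, symbols such as $ and ^ disappear as well, which suggests that the punctuation class used by stringr does not cover every symbol; it is worth checking a few strings like this one if such characters matter for the analysis.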

tweets_english <- tweets_data |>
  filter(lang == "en")
head(tweets_english)

                tweet_id             tweet_created            tweet_extracted
1 1.6427385624866693e+18 2023-04-03 03:59:28+00:00 2023-04-08 01:06:52.504868
2 1.6429198880616448e+18 2023-04-03 15:59:59+00:00 2023-04-08 01:07:04.742617
3  1.642708351690711e+18 2023-04-03 01:59:25+00:00 2023-04-08 01:06:50.638068
4 1.6428593561893274e+18 2023-04-03 11:59:28+00:00 2023-04-08 01:07:00.375167
5 1.6428744495305933e+18 2023-04-03 12:59:26+00:00 2023-04-08 01:07:01.437161
6 1.6429952282095575e+18 2023-04-03 20:59:22+00:00 2023-04-08 01:07:10.575474
                                                                                                                                                                                                                                                                                                  text
1                                                                                                                                                       RT @DarrellLerner: ChatGPT Plugins are the fastest way to get rich in 2023. \n\nI’ve created a step-by-step guide showing you how to earn $10…
2 Get an intelligent chatbot for your website in minutes with Chatbase AI. Train ChatGPT on your data and let it answer any question your users have. Simply upload a document or link and add the chat widget - it's that easy!\nMake Money using AI: https://t.co/yLHEqn4w9T https://t.co/ba54JvoRsM
3                                                      🔥Hey Guys, #ZenithSwap has launched at just $ 55,000 USD Marketcap. The ChatGPT of DEX - Reimagining DeFi with AI-Powered Yield Farming. Now at 4X. Lot of up potential at such low marketcap.🔥😇 $ARB $ZSP #Arbitrum https://t.co/9VWYtYzAJD
4                                                                                                                                         RT @sinsonetwork: Now! Join #SINSO DataLand^ChatGPT #Airdrop!\n📅3.23-4.6\n📌Tasks\n①Log in to&lt;https://t.co/Hlwqa7HG40&gt;\n②Try SINSO #ChatGPT&amp; twe…
5        The plagiarism detector will introduce its #AI detection tool tomorrow, hoping to protect academic integrity in a post-#ChatGPT world. \n\nThe speedy launch and lack of an opt-out have #academics worried.\n\nAn important piece @liamhknox for @insidehighered \n\nhttps://t.co/pq7DB5r9An
6                                                                                                                                                              "Germany could follow in Italy's footsteps by blocking ChatGPT over data security concerns..." 👇 #readmorehere https://t.co/OS8DDsU2uD
  lang             user_id            user_name   user_username
1   en          2383245894                   pk pradeep42329225
2   en 1633040597782081537             AR Leyva    ArrheniusLey
3   en 1311403370670960640   Human Being 🇨🇳🇸🇬🇻🇳 KiarostamiBeing
4   en           462142717            MokoHaram        holymoko
5   en  896094324341039104 Dr. Susan D'Agostino susan_dagostino
6   en 1001039446555480064      Milli\U{01f968}      Milli19751
       user_location
1              India
2     United Kingdom
3        Chicago, IL
4    Zambia mufulira
5 New Hampshire, USA
6                   
                                                                                                                             user_description
1                                                                                                                   💐💐।।जय श्री महाकाल।।💐💐
2                     Passionate about AI and its potential to transform the business landscape and shape the future. https://t.co/c3UpfbLbqH
3      Freedom is the Recognition of Necessity • Market Socialism • Cheng Enfu Stan • Empiricism is the Science of Economics || Stay Humble ✌️
4                                                                                                                              Public faker🍂
5 Tech reporter @insidehighered. Mathematician. Bylines @WashingtonPost @TheAtlantic @QuantaMagazine @WIRED @BulletinAtomic @Nature @NPR @BBC
6                                                     👣NeverSaveTheBestForLater Nam myōhō renge kyō 🉐\n🍀firma qui: https://t.co/cE9wMNLF7s
               user_created user_followers_count user_following_count
1 2014-03-11 06:04:10+00:00                  269                 4141
2 2023-03-07 09:43:36+00:00                  264                   24
3 2020-09-30 20:32:00+00:00                  447                  419
4 2012-01-12 16:25:12+00:00                 2286                  982
5 2017-08-11 19:41:50+00:00                 4444                 1924
6 2018-05-28 09:56:17+00:00                  960                  743
  user_tweet_count user_verified source retweet_count like_count reply_count
1             3816         False     NA           628          0           0
2              198         False     NA             0          0           0
3            12949         False     NA             0          0           0
4            20438         False     NA           270          0           0
5             3320          True     NA             7         22           1
6            24651         False     NA             0          0           0
  impression_count
1                0
2               58
3                0
4                0
5             5769
6               36

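tweets_english_only <- tweets_english |>
  # Remove URLs and any character that is not an ASCII letter, digit, punctuation mark, or whitespace
  mutate(text = str_remove_all(text, "http[s]?://\\S+|[^A-Za-z0-9[:punct:][:space:]]")) |>
  # Keep one row per tweet ID
  distinct(tweet_id, .keep_all = TRUE)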

head (tweets_english_only)

                tweet_id             tweet_created            tweet_extracted
1 1.6427385624866693e+18 2023-04-03 03:59:28+00:00 2023-04-08 01:06:52.504868
2 1.6429198880616448e+18 2023-04-03 15:59:59+00:00 2023-04-08 01:07:04.742617
3  1.642708351690711e+18 2023-04-03 01:59:25+00:00 2023-04-08 01:06:50.638068
4 1.6428593561893274e+18 2023-04-03 11:59:28+00:00 2023-04-08 01:07:00.375167
5 1.6428744495305933e+18 2023-04-03 12:59:26+00:00 2023-04-08 01:07:01.437161
6 1.6429952282095575e+18 2023-04-03 20:59:22+00:00 2023-04-08 01:07:10.575474
                                                                                                                                                                                                                                                                    text
1                                                                                                                          RT @DarrellLerner: ChatGPT Plugins are the fastest way to get rich in 2023. \n\nI’ve created a step-by-step guide showing you how to earn 10…
2                 Get an intelligent chatbot for your website in minutes with Chatbase AI. Train ChatGPT on your data and let it answer any question your users have. Simply upload a document or link and add the chat widget - it's that easy!\nMake Money using AI:  
3                                                        Hey Guys, #ZenithSwap has launched at just  55,000 USD Marketcap. The ChatGPT of DEX - Reimagining DeFi with AI-Powered Yield Farming. Now at 4X. Lot of up potential at such low marketcap. ARB ZSP #Arbitrum 
4                                                                                                                                             RT @sinsonetwork: Now! Join #SINSO DataLandChatGPT #Airdrop!\n3.23-4.6\nTasks\nLog in to&lt;\nTry SINSO #ChatGPT&amp; twe…
5 The plagiarism detector will introduce its #AI detection tool tomorrow, hoping to protect academic integrity in a post-#ChatGPT world. \n\nThe speedy launch and lack of an opt-out have #academics worried.\n\nAn important piece @liamhknox for @insidehighered \n\n
6                                                                                                                                                         "Germany could follow in Italy's footsteps by blocking ChatGPT over data security concerns..."  #readmorehere 
  lang             user_id            user_name   user_username
1   en          2383245894                   pk pradeep42329225
2   en 1633040597782081537             AR Leyva    ArrheniusLey
3   en 1311403370670960640   Human Being 🇨🇳🇸🇬🇻🇳 KiarostamiBeing
4   en           462142717            MokoHaram        holymoko
5   en  896094324341039104 Dr. Susan D'Agostino susan_dagostino
6   en 1001039446555480064      Milli\U{01f968}      Milli19751
       user_location
1              India
2     United Kingdom
3        Chicago, IL
4    Zambia mufulira
5 New Hampshire, USA
6                   
                                                                                                                             user_description
1                                                                                                                   💐💐।।जय श्री महाकाल।।💐💐
2                     Passionate about AI and its potential to transform the business landscape and shape the future. https://t.co/c3UpfbLbqH
3      Freedom is the Recognition of Necessity • Market Socialism • Cheng Enfu Stan • Empiricism is the Science of Economics || Stay Humble ✌️
4                                                                                                                              Public faker🍂
5 Tech reporter @insidehighered. Mathematician. Bylines @WashingtonPost @TheAtlantic @QuantaMagazine @WIRED @BulletinAtomic @Nature @NPR @BBC
6                                                     👣NeverSaveTheBestForLater Nam myōhō renge kyō 🉐\n🍀firma qui: https://t.co/cE9wMNLF7s
               user_created user_followers_count user_following_count
1 2014-03-11 06:04:10+00:00                  269                 4141
2 2023-03-07 09:43:36+00:00                  264                   24
3 2020-09-30 20:32:00+00:00                  447                  419
4 2012-01-12 16:25:12+00:00                 2286                  982
5 2017-08-11 19:41:50+00:00                 4444                 1924
6 2018-05-28 09:56:17+00:00                  960                  743
  user_tweet_count user_verified source retweet_count like_count reply_count
1             3816         False     NA           628          0           0
2              198         False     NA             0          0           0
3            12949         False     NA             0          0           0
4            20438         False     NA           270          0           0
5             3320          True     NA             7         22           1
6            24651         False     NA             0          0           0
  impression_count
1                0
2               58
3                0
4                0
5             5769
6               36
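The cleaned text above still contains HTML entities such as &amp;amp; and &amp;lt;, which is likely why "amp" shows up among the frequent tokens later on. A minimal sketch of one way to decode the three entities visible in this output follows; `decode_entities()` is a hypothetical helper, not part of the original analysis, and it only handles these three entities.

```r
library(stringr)

# Hypothetical helper: decode the HTML entities seen in the tweet text.
# "&amp;" is replaced last so that a string like "&amp;lt;" is decoded
# only once (to "&lt;") rather than twice.
decode_entities <- function(text) {
  str_replace_all(
    text,
    c("&lt;" = "<", "&gt;" = ">", "&amp;" = "&")
  )
}

decode_entities("Try SINSO #ChatGPT&amp; tweet &lt;link&gt;")
```

If adopted, this step would go before the symbol removal, so that the decoded characters are handled by the same cleaning pattern as the rest of the text.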

c. Understanding this Twitter dataset

We will use some metadata components to gain an initial understanding of this Twitter dataset. We will identify the most active users, determine which tweets received the most retweets, and examine how tweet trends vary over the retrieval period. These findings can be compared to later sentiment analysis results for deeper insights.

user_tweet_most <- tweets_english_only|>
  group_by(user_username) |>
  summarise(count = n()) |>
  arrange(desc(count))

head(user_tweet_most)

# A tibble: 6 × 2
  user_username   count
  <chr>           <int>
1 SaveToNotion       20
2 bitone_great       20
3 AI_Dev_News        18
4 gdprAI             14
5 gyzkard            14
6 u6lCxJzMV1Y6YLT    12

influence_user <- tweets_english_only |>
  select(user_username, user_followers_count) |>
  # Sort by follower count, then keep one row per account
  arrange(desc(user_followers_count)) |>
  distinct(user_username, .keep_all = TRUE) |>
  mutate(rank = row_number()) |>
  # Keep only the 20 most-followed accounts
  head(20)

influence_user

     user_username user_followers_count rank
1     TheEconomist             27205851    1
2         business              9219642    2
3     elespectador              6689623    3
4       IndiaToday              6264283    4
5    IndianExpress              4265737    5
6         CNNChile              4026611    6
7    rapplerdotcom              3721922    7
8          Gizmodo              2754674    8
9         janboehm              2718988    9
10     TheAtlantic              2117429   10
11    globeandmail              2008375   11
12   BreitbartNews              1910120   12
13 ChannelNewsAsia              1242706   13
14         otvnews              1212344   14
15   EnsedeCiencia              1211588   15
16  business_today              1166909   16
17  TheSandboxGame              1091070   17
18       Expresoec              1058317   18
19         LePoint              1028388   19
20         BBCTech               913699   20

compare_result <- user_tweet_most |>
               left_join(influence_user, by="user_username")

head (compare_result )

# A tibble: 6 × 4
  user_username   count user_followers_count  rank
  <chr>           <int>                <dbl> <int>
1 SaveToNotion       20                   NA    NA
2 bitone_great       20                   NA    NA
3 AI_Dev_News        18                   NA    NA
4 gdprAI             14                   NA    NA
5 gyzkard            14                   NA    NA
6 u6lCxJzMV1Y6YLT    12                   NA    NA

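Based on the results, none of the six most active accounts (SaveToNotion, bitone_great, AI_Dev_News, gdprAI, gyzkard, and u6lCxJzMV1Y6YLT) appears among the 20 most-followed accounts: the join returns NA for their follower count and rank. In this dataset, the accounts that tweet most often about ChatGPT are therefore not the ones with the largest audiences, which are mainly news outlets such as TheEconomist and business.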
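Because influence_user keeps only the 20 most-followed accounts, the left join yields NA for every account outside that list. A minimal sketch of an alternative follows, assuming we want a follower rank for every account: it summarises activity and follower count in one pass. The data in `toy_tweets` are invented for illustration, and `activity_vs_reach` is a hypothetical name.

```r
library(dplyr)

# Invented data: one row per tweet, with the author's follower count.
toy_tweets <- tibble(
  user_username        = c("a", "a", "a", "b", "c", "c"),
  user_followers_count = c(10, 12, 12, 5000, 300, 300)
)

activity_vs_reach <- toy_tweets |>
  group_by(user_username) |>
  summarise(
    count     = n(),                        # number of tweets per account
    followers = max(user_followers_count),  # largest follower count observed
    .groups   = "drop"
  ) |>
  # Rank every account by followers (1 = most followed), not just the top 20
  mutate(follower_rank = min_rank(desc(followers))) |>
  arrange(desc(count))

activity_vs_reach
```

With every account ranked, the relationship between activity and reach can be inspected directly instead of being hidden behind missing values.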

rtwt <- tweets_english_only[, c("text", "tweet_created", "retweet_count")] |>
  arrange(desc(retweet_count)) |>
  relocate(text)

head(rtwt)

                                                                                                                                             text
1 RT @Zenith_Swap: We are listing on our own exchange and SushiSwap at 2:30 PM UTC\n\nZenithSwap - The ChatGPT of DEX\nReimagining Decentralized…
2 RT @Zenith_Swap: We are listing on our own exchange and SushiSwap at 2:30 PM UTC\n\nZenithSwap - The ChatGPT of DEX\nReimagining Decentralized…
3 RT @Zenith_Swap: We are listing on our own exchange and SushiSwap at 2:30 PM UTC\n\nZenithSwap - The ChatGPT of DEX\nReimagining Decentralized…
4 RT @Zenith_Swap: We are listing on our own exchange and SushiSwap at 2:30 PM UTC\n\nZenithSwap - The ChatGPT of DEX\nReimagining Decentralized…
5 RT @Zenith_Swap: We are listing on our own exchange and SushiSwap at 2:30 PM UTC\n\nZenithSwap - The ChatGPT of DEX\nReimagining Decentralized…
6 RT @Zenith_Swap: We are listing on our own exchange and SushiSwap at 2:30 PM UTC\n\nZenithSwap - The ChatGPT of DEX\nReimagining Decentralized…
              tweet_created retweet_count
1 2023-04-03 07:59:32+00:00         25944
2 2023-04-03 00:59:30+00:00         25944
3 2023-04-03 00:59:25+00:00         25944
4 2023-04-03 00:59:40+00:00         25944
5 2023-04-03 01:59:35+00:00         25944
6 2023-04-03 00:59:52+00:00         25944

rtwt_unique15 <- rtwt |>
  distinct(text, .keep_all = TRUE) |>
  head(15)
head(rtwt_unique15)

                                                                                                                                              text
1  RT @Zenith_Swap: We are listing on our own exchange and SushiSwap at 2:30 PM UTC\n\nZenithSwap - The ChatGPT of DEX\nReimagining Decentralized…
2                                                  RT @johnvianny: Best AI Tools You Need To Know\n#chatgpt #chatgpt3 #ArtificialIntelligence #ai 
3        RT @Visiitapp:  SIIT Token Giveaway\n We're excited to announce the 30000 #SIIT Token Airdrop!\n Prize Pool - 30000\n\nFollow @VisiitApp…
4  RT @itsPaulAi: ChatGPT has now a big problem.\n\nGoogle just updated its free competitor, Bard.\n\nHere are 8 things impossible on ChatGPT but…
5 RT @MushtaqBilalPhD: ChatGPT is everywhere and everyone is using it.\n\nBut most academics don't know how to use it *smartly.*\n\nHere's how to…
6  RT @heykahn: ChatGPT is just the tip of the iceberg.\n\n1,000 AI tools were released in March.\n\nHere are the 10 most valuable AI tools to bo…
              tweet_created retweet_count
1 2023-04-03 07:59:32+00:00         25944
2 2023-05-01 09:58:52+00:00         17045
3 2023-04-20 15:59:25+00:00         14213
4 2023-05-12 21:59:18+00:00         13521
5 2023-04-06 06:58:57+00:00         12797
6 2023-04-15 08:59:36+00:00         11688

Among the 15 most retweeted texts, the top entries are dominated by promotional content, such as the ZenithSwap listing announcement and a token airdrop, along with lists of AI tools. Other recurring topics include practical uses of ChatGPT for prompts, its application in academia, concerns about misinformation, the dangers of powerful AI, fake citations, and the potential negative impact on humanity.
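Since many rows in the dataset are retweets of the same original post, it can help to label retweets explicitly before ranking them. The sketch below assumes that retweets can be recognised by the "RT @" prefix seen in the output above; the data in `toy` are invented.

```r
library(dplyr)
library(stringr)

# Invented data: two copies of one retweet, one original tweet, one other retweet.
toy <- tibble(
  text = c(
    "RT @user1: ChatGPT tips",
    "RT @user1: ChatGPT tips",
    "My own take on ChatGPT",
    "RT @user2: AI tools list"
  ),
  retweet_count = c(120, 120, 3, 45)
)

# Flag rows whose text starts with the retweet prefix.
labelled <- toy |>
  mutate(is_retweet = str_detect(text, "^RT @"))

# Unique retweeted texts, most retweeted first.
top_retweeted <- labelled |>
  filter(is_retweet) |>
  distinct(text, .keep_all = TRUE) |>
  arrange(desc(retweet_count))

# Share of rows that are retweets.
retweet_share <- mean(labelled$is_retweet)
```

The retweet share gives a quick sense of how much of the later word counts and sentiment scores would be driven by repeated content if duplicates were kept.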

tweets_data_summary <- tweets_english_only %>%
  mutate(day = as.Date(tweet_created)) %>%
  group_by(day) %>%
  summarise(count = n())

glimpse(tweets_data_summary)

Rows: 40
Columns: 2
$ day   <date> 2023-04-03, 2023-04-04, 2023-04-05, 2023-04-06, 2023-04-07, 202…
$ count <int> 522, 457, 483, 876, 438, 406, 456, 443, 384, 449, 509, 568, 472,…

We will create a time plot to show how the number of tweets varies over time. Given the relatively short time span of the dataset (40 days), this may not provide extensive insights, as I was not able to retrieve real-time data.

plot <- ggplot(tweets_data_summary, aes(x = day, y = count)) +
  geom_line(color = "blue") +
  labs(title = "Number of Tweets Over Time",
       x = "Day",
       y = "Number of Tweets")

print(plot)

Except for April 6, when 876 tweets were posted, the daily number of tweets remained fairly consistent at around 500. To understand the cause of the peak on April 6, we would need to check it against any significant events or publications that may have driven increased discussion on that day.
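The peak day can also be located programmatically rather than read off the plot. The sketch below uses only the first five daily counts shown in the glimpse() output above; comparing the peak to the median is an illustrative choice, not part of the original analysis.

```r
library(dplyr)

# First five daily counts, copied from the glimpse() output above.
daily <- tibble(
  day   = as.Date("2023-04-03") + 0:4,
  count = c(522, 457, 483, 876, 438)
)

# Find the busiest day and express it relative to the median day.
peak <- daily |>
  mutate(ratio_to_median = count / median(count)) |>
  slice_max(count, n = 1)

peak
```

Run on the full tweets_data_summary table, the same two steps would show whether April 6 is the only day that stands out.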

d. Tidy data for Tokens

In this step, we will remove duplicated retweet texts to further clean the data for tokenization and sentiment analysis. We will use the unnest_tokens() function to convert the text into a one-token-per-row format. For the initial analysis, we will focus on unigrams, while bigrams will be used in later language learning explorations. We will then remove common stop words (e.g., “the,” “to,” “and,” “in”) that carry little meaning, as well as custom words specific to this dataset that do not contribute significant value.
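For reference, the bigram format mentioned above differs from unigram tokenization only in the token arguments passed to unnest_tokens(). A minimal sketch on an invented sentence:

```r
library(dplyr)
library(tidytext)

# Invented example tweet.
toy <- tibble(tweet_id = "1", text = "ChatGPT helps language learning")

# token = "ngrams" with n = 2 yields one row per pair of adjacent words;
# text is lowercased by default.
bigrams <- toy |>
  unnest_tokens(output = bigram, input = text, token = "ngrams", n = 2)

bigrams
```

A four-word sentence produces three overlapping bigrams, so bigram tables grow almost as fast as unigram tables.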

chatgpt_tweets <- tweets_english_only |>
  select(text, tweet_id, tweet_created, lang, user_username) |>
  relocate(text) |>
  distinct(text, .keep_all = TRUE)

head (chatgpt_tweets)

                                                                                                                                                                                                                                                                    text
1                                                                                                                          RT @DarrellLerner: ChatGPT Plugins are the fastest way to get rich in 2023. \n\nI’ve created a step-by-step guide showing you how to earn 10…
2                 Get an intelligent chatbot for your website in minutes with Chatbase AI. Train ChatGPT on your data and let it answer any question your users have. Simply upload a document or link and add the chat widget - it's that easy!\nMake Money using AI:  
3                                                        Hey Guys, #ZenithSwap has launched at just  55,000 USD Marketcap. The ChatGPT of DEX - Reimagining DeFi with AI-Powered Yield Farming. Now at 4X. Lot of up potential at such low marketcap. ARB ZSP #Arbitrum 
4                                                                                                                                             RT @sinsonetwork: Now! Join #SINSO DataLandChatGPT #Airdrop!\n3.23-4.6\nTasks\nLog in to&lt;\nTry SINSO #ChatGPT&amp; twe…
5 The plagiarism detector will introduce its #AI detection tool tomorrow, hoping to protect academic integrity in a post-#ChatGPT world. \n\nThe speedy launch and lack of an opt-out have #academics worried.\n\nAn important piece @liamhknox for @insidehighered \n\n
6                                                                                                                                                         "Germany could follow in Italy's footsteps by blocking ChatGPT over data security concerns..."  #readmorehere 
                tweet_id             tweet_created lang   user_username
1 1.6427385624866693e+18 2023-04-03 03:59:28+00:00   en pradeep42329225
2 1.6429198880616448e+18 2023-04-03 15:59:59+00:00   en    ArrheniusLey
3  1.642708351690711e+18 2023-04-03 01:59:25+00:00   en KiarostamiBeing
4 1.6428593561893274e+18 2023-04-03 11:59:28+00:00   en        holymoko
5 1.6428744495305933e+18 2023-04-03 12:59:26+00:00   en susan_dagostino
6 1.6429952282095575e+18 2023-04-03 20:59:22+00:00   en      Milli19751

token_tweets <- chatgpt_tweets |> 
  unnest_tokens(output = word, 
                input = text) |>
  relocate(word)
head(token_tweets)

           word               tweet_id             tweet_created lang
1            rt 1.6427385624866693e+18 2023-04-03 03:59:28+00:00   en
2 darrelllerner 1.6427385624866693e+18 2023-04-03 03:59:28+00:00   en
3       chatgpt 1.6427385624866693e+18 2023-04-03 03:59:28+00:00   en
4       plugins 1.6427385624866693e+18 2023-04-03 03:59:28+00:00   en
5           are 1.6427385624866693e+18 2023-04-03 03:59:28+00:00   en
6           the 1.6427385624866693e+18 2023-04-03 03:59:28+00:00   en
    user_username
1 pradeep42329225
2 pradeep42329225
3 pradeep42329225
4 pradeep42329225
5 pradeep42329225
6 pradeep42329225

tidy_tweets <- anti_join(token_tweets,
                         stop_words,
                         by = "word") 
head (tidy_tweets)

           word               tweet_id             tweet_created lang
1            rt 1.6427385624866693e+18 2023-04-03 03:59:28+00:00   en
2 darrelllerner 1.6427385624866693e+18 2023-04-03 03:59:28+00:00   en
3       chatgpt 1.6427385624866693e+18 2023-04-03 03:59:28+00:00   en
4       plugins 1.6427385624866693e+18 2023-04-03 03:59:28+00:00   en
5       fastest 1.6427385624866693e+18 2023-04-03 03:59:28+00:00   en
6          rich 1.6427385624866693e+18 2023-04-03 03:59:28+00:00   en
    user_username
1 pradeep42329225
2 pradeep42329225
3 pradeep42329225
4 pradeep42329225
5 pradeep42329225
6 pradeep42329225

token <- tidy_tweets |>
  count (word, sort=TRUE)

head (token)

     word     n
1 chatgpt 10891
2      rt  5464
3      ai  3522
4    gpt4   560
5     amp   557
6  openai   489

Next, a custom stop-word list, my_stopwords, is created based on the initial counts above to further tidy the tokens.

my_stopwords <- c("rt", "4", "1", "10", "gpt","2","amp","pm","gpt4","3","chat","pro","chatgpt","ph","it’s","viu","5")

my_token <- tidy_tweets |> filter(!word %in% my_stopwords)
head (my_token)

           word               tweet_id             tweet_created lang
1 darrelllerner 1.6427385624866693e+18 2023-04-03 03:59:28+00:00   en
2       plugins 1.6427385624866693e+18 2023-04-03 03:59:28+00:00   en
3       fastest 1.6427385624866693e+18 2023-04-03 03:59:28+00:00   en
4          rich 1.6427385624866693e+18 2023-04-03 03:59:28+00:00   en
5          2023 1.6427385624866693e+18 2023-04-03 03:59:28+00:00   en
6          i’ve 1.6427385624866693e+18 2023-04-03 03:59:28+00:00   en
    user_username
1 pradeep42329225
2 pradeep42329225
3 pradeep42329225
4 pradeep42329225
5 pradeep42329225
6 pradeep42329225

my_token_2 <-my_token |> count (word,sort=TRUE)
head (my_token_2)

    word    n
1     ai 3522
2 openai  489
3 people  473
4   code  456
5  tools  453
6   time  447
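
The token “i’ve” survives the stop-word removal above, most likely because it is written with a curly apostrophe while the stop-word lexicon is assumed to use straight apostrophes. The sketch below illustrates the mismatch in base R. The tokens, the short stop list (a stand-in for stop_words$word), and the helper normalize_apostrophes() are hypothetical and only serve the illustration.

```r
# Hypothetical tokens: the first two contain the curly apostrophe (U+2019)
tokens <- c("i\u2019ve", "it\u2019s", "plugins", "rich")

# Small stand-in for stop_words$word, assumed to use straight apostrophes
small_stop_list <- c("i've", "it's", "the", "are")

# Hypothetical helper: replace curly apostrophes with straight ones
normalize_apostrophes <- function(x) gsub("\u2019", "'", x, fixed = TRUE)

# Without normalization nothing matches the stop list
kept_before <- tokens[!tokens %in% small_stop_list]

# After normalization the two contractions are removed
kept_after <- tokens[!normalize_apostrophes(tokens) %in% small_stop_list]

print(kept_before)
print(kept_after)
```

Applying the same replacement to the word column before the anti_join() step would let the standard lexicon remove such contractions, so they would not need to be listed by hand in my_stopwords.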

Data Exploration

Top_tokens:

top_words <- my_token_2 |> 
  top_n(50)

Selecting by n

top_words

           word    n
1            ai 3522
2        openai  489
3        people  473
4          code  456
5         tools  453
6          time  447
7          free  432
8         write  417
9          data  366
10       google  365
11        canva  357
12      prompts  339
13       prompt  321
14      youtube  284
15      premium  282
16         copy  265
17     language  263
18       create  261
19      writing  252
20       search  242
21        world  242
22      netflix  232
23      spotify  228
24    grammarly  227
25        learn  227
26      powered  227
27      account  225
28     quillbot  225
29        human  208
30          app  207
31       disney  207
32         read  205
33 intelligence  202
34       future  201
35         tool  201
36   technology  196
37      content  195
38        check  194
39          day  192
40        model  188
41   coursehero  185
42         tech  180
43   artificial  179
44      picsart  178
45     business  177
46       answer  173
47   generative  173
48      vivamax  173
49         text  172
50        based  171

wordcloud2(top_words)

top_words <- my_token_2 |> 
  top_n(30)

Selecting by n

word_frequency <- top_words|> mutate(word=reorder(word,n))

ggplot(word_frequency, aes(x=n,y =word)) + geom_col()+ labs(y=NULL) + theme(axis.text.y = element_text(size = 8, hjust = 0.8))

After analyzing the most frequently used words in the tweets, we can see that the central theme is AI, with other prominent words including “people,” “code,” “tools,” and “time.” These words suggest that ChatGPT is often discussed as a tool for coding and for saving time. Additionally, words like “free,” “write,” and “language” connect to my research interest in language learning and the use of ChatGPT in language contexts. Other words such as “google,” “canva,” and “youtube” may indicate that AI tools are being integrated with, or compared to, other popular platforms. Later on, we will use topic modeling (LDA) to group these frequent words into topics, allowing a better understanding of what people are discussing most.

3. Modeling

Now we are ready to begin exploring the sentiments expressed about ChatGPT on Twitter. To conduct the sentiment analysis, we will use the vader package, whose vader_df() function scores each tweet as a whole. VADER (Valence Aware Dictionary and sEntiment Reasoner) is specifically optimized for social media text, making it well suited for analyzing tweets. It takes into account emoticons, capitalization, punctuation, and other nuances that are common in social media content.

tweets_sample <- chatgpt_tweets |>
  sample_n(1000)

vader_chatgpt <- vader_df(tweets_sample$text) |>
    select (text, compound, word_scores,pos,neu,neg,but_count)
    
head(vader_chatgpt)

                                                                                                                                                                                text
1                                                             @at0x_eth It got us all feeling like we're living in an animation  @StabilityAI \n\n#AIArt #NFTs #ChatGPT #midjourney 
2                                                                                                                                          Using #ChatGPT to rewrite blobs of text. 
3                                      RT @PalantirChad: PLTR I asked ChatGPT to explain to a 5 year old what @PalantirTech does, using Lego as a reference for data. \n\nFeel free…
4                         RT @mrgreen: R.I.P., Siri.\n\nHello, ChatGPT.\n\nNow you can replace Siri with ChatGPT.\n\nIt’s free &amp; takes 30 seconds.\n\nIt works flawlessy.\n\nYo…
5 First it sends a prompt to #ChatGPT asking it to role play as an educational Twitter account that knows about all civilizations, societies, tribes, etc. throughout history. (2/n)
6                                                                                  I just published ChatGPT for SEO — How to Boost @Google  Ranking of your Website \n\nClick here: 
  compound
1    0.459
2    0.000
3    0.511
4    0.511
5    0.340
6    0.402
                                                                             word_scores
1                                {0, 0, 0, 0, 0, 0.5, 1.5, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0}
2                                                                  {0, 0, 0, 0, 0, 0, 0}
3          {0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 2.3}
4                {0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 2.3, 0, 0, 0, 0, 0, 0, 0, 0}
5 {0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1.4, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0}
6                                        {0, 0, 0, 0, 0, 0, 0, 1.7, 0, 0, 0, 0, 0, 0, 0}
    pos   neu neg but_count
1 0.211 0.789   0         0
2 0.000 1.000   0         0
3 0.121 0.879   0         0
4 0.130 0.870   0         0
5 0.082 0.918   0         0
6 0.162 0.838   0         0

mean(vader_chatgpt$compound)

[1] 0.200714
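
Because the 1,000 tweets are drawn at random, the mean above will change slightly on every run unless the random seed is fixed first. The base R sketch below shows the idea with sample(); the seed value is an arbitrary, hypothetical choice, and 12,194 is the number of tweets reported for this dataset.

```r
# Row indices standing in for the 12,194 tweets in the dataset
tweet_rows <- 1:12194

# Fix the seed (hypothetical value; any integer works) before sampling
set.seed(2023)
first_draw <- sample(tweet_rows, 1000)

# Resetting the same seed reproduces exactly the same sample
set.seed(2023)
second_draw <- sample(tweet_rows, 1000)

print(identical(first_draw, second_draw))
```

Since sample_n() draws on the same random number generator, calling set.seed() immediately before it should make the sample, and every statistic derived from it, reproducible.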

vader_chatgpt_summary <- vader_chatgpt |> 
  mutate(sentiment = ifelse(compound >= 0.05, "positive",
                            ifelse(compound <= -0.05, "negative", "neutral"))) |>
  count(sentiment, sort = TRUE) |> 
  spread(sentiment, n) |> 
  relocate(positive) |>
  mutate(ratio = negative/positive)

vader_chatgpt_summary

  positive negative neutral     ratio
1      529      176     295 0.3327032

The overall result shows 529 positive tweets out of the 1,000 sampled, compared with 176 negative tweets, about one-third as many (ratio 0.33); the remaining 295 are neutral. This indicates a relatively high share of positive sentiment. The mean compound score of 0.20 also suggests generally positive sentiment. While ChatGPT is generally well received, a notable share of users still express concerns or criticisms. Therefore, we will look at the top positive and negative tweets to get a fuller understanding.
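
The labels rest on the ±0.05 cut-offs applied to the compound score in the code above. As a minimal sketch, the same rule can be written as a small function and checked on a few values. The function name classify_compound() and the example scores are hypothetical; the counts in the ratio are the ones from the summary table.

```r
# Hypothetical helper applying the same cut-offs as the summary code
classify_compound <- function(compound, threshold = 0.05) {
  ifelse(compound >= threshold, "positive",
         ifelse(compound <= -threshold, "negative", "neutral"))
}

# Illustrative compound scores, including values close to the cut-offs
example_scores <- c(0.459, 0.000, 0.511, -0.052, 0.049)
example_labels <- classify_compound(example_scores)
print(example_labels)

# Negative-to-positive ratio from the counts in the summary table
neg_pos_ratio <- 176 / 529
print(round(neg_pos_ratio, 4))
```

Scores strictly between -0.05 and 0.05 are labeled neutral, so a tweet at 0.049 counts as neutral while one at -0.052 counts as negative.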

top_positive <- vader_chatgpt |>
  arrange(desc(compound)) |> 
  head(10) 
head (top_positive)

                                                                                                                                                                                                                                                                                                           text
1                                                                        "Wishing the people of Sudan and our ChatGPT community a blessed Laylat al-Qadr that is better than a thousand months, as well as a happy Friday Eid, with hopes of prosperity and progress under the leadership of the Crown Prince."
2                                                     I see a lot of people on Twitter talking about ChatGPT &amp; how to get started etc. A kind stranger on Reddit has written something pretty awesome. \n\nThought I'd share it here for those interested. \n\nA collection of prompts and more. Enjoy.\n\n
3                                    Unlike ChatGPT, which is an AI language model, ShibaGPT leverages the power of GPT-3 to create witty and engaging memes that resonate with the crypto community. With ShibaGPT, you can enjoy the best of both worlds: the cuteness of Shiba and the intelligence of GPT. 
4                                                                                                                  @Uncle_Scwoop Hey there! The bundle you've been waiting for has arrived!\n\nIn appreciation of your support, I am excited to offer you a FREE copy of Ads 101 ChatGPT 2.0. \n\nMuch love\n\n
5                              ChatGPT was discussing the history of Poland with a friend when they stumbled upon the heroic stories of the Polish Army and the beautiful nature preserved by Lasy Pastwowe, as well as the impact of Jana Pawa II on the city of Gliwicach. They also shared their passion for
6 @Lycenae @Grady_Booch It’s absolutely wonderful as a companion to programming, providing substantial time saving. It’s also great as a component, but it’s being treated like magic, and GPT4, even inside clever solutions like BabyAGI, does not yet produce the complete products it’s being touted to do.
  compound
1    0.959
2    0.959
3    0.954
4    0.953
5    0.953
6    0.949
                                                                                                                                                       word_scores
1                           {0.9, 0, 0, 0, 0, 0, 0, 0, 0, 0, 2.9, 0, 0, 0, 0, 1.9, 0, 0, 0, 0, 0, 1.1, 0, 0, 2.7, 0, 0, 0, 1.8, 0, 0, 0, 1.8, 0, 0, 0, 0, 0, 0, 0}
2                     {0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 2.4, 0, 0, 0, 0, 0, 0, 2.2, 3.1, 0, 0, 1.2, 0, 0, 0, 0, 1.7, 0, 0, 0, 0, 0, 0, 2.493}
3              {0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1.1, 0, 0, 1.4, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 2.2, 0, 3.2, 0, 0, 0, 0, 2.3, 0, 0, 0, 0, 2.1, 0, 0}
4                                                     {0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 2.3, 0, 0, 1.7, 0, 0, 1.4, 0, 0, 0, 0, 3.033, 0, 0, 0, 0, 0, 0, 0, 3.2}
5    {0, 0, 0, 0, 0, 0, 0, 0, 0, 2.2, 0, 0, 0, 0, 0, 2.6, 0, 0, 0, 0, 0, 0, 0, 2.9, 0, 0, 0, 0, 0, 0, 1.1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1.4, 0, 2, 0}
6 {0, 0, 0, 0, 1.4965, 0, 0, 0, 0, 0, 0, 0.4, 0, 0, 0, 0, 1.55, 0, 0, 0, 0, 0, 0, 0, 2.25, 0, 0, 0, 0, 0, 3, 1.05, 2.25, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, -0.3, 0, 0}
    pos   neu   neg but_count
1 0.379 0.621 0.000         0
2 0.347 0.653 0.000         0
3 0.319 0.681 0.000         0
4 0.389 0.611 0.000         0
5 0.297 0.703 0.000         0
6 0.326 0.652 0.022         1

top_negative <- vader_chatgpt |>
  arrange(compound) |> 
  head(10) 
head (top_negative)

                                                                                                                                                                                                                                                                                                             text
1                                                          Huuuh... K, hear me out. If the untrimmed ChatGPT could do a fuck ton of nasty crazy shit... What stops random people to fine tune a decent LLM to do the same tho? '-'\n\nI think there will be some really weird developments in the near future '-'
2                                                                                                                                                                    RT @KanekoaTheGreat: #4 Microsoft Bing ChatGPT says the death of Gadaffi increased human trafficking and slavery, caused a migrant crisis a…
3                                                                                   The second Law of Stupid: stupids are so stupid that they think everyone is just as stupid as they are. And honestly, I think this explains everything from MAGA to teenagers who think teachers have never heard of ChatGPT.
4                           google made 225 billion dollars in ad rev in 2022\n\nbitcoin like a cancer is now eating at this \n\nand google can't stop it\n\ngoogles biggest threat for ad spend is bitcoin\n\nfor image search midjourney and others\n\nfor search chatgpt\n\ngoogle is fucked in these things  
5 @odannyboy I think Bard is worse, but do think there's possibility w both. Bard is so repetitive &amp; also flat. Gets a lot of stuff wrong &amp; is overly apologetic (&amp; *so many words* in the bolierplate apology). ChatGPT seems "polished" by comparison, though as you say, is def not w/o its flaws.
6                                                                                                                                                                                                                                      i hate bitches. why’d i just get told to have chatgpt write my song for me
  compound
1   -0.931
2   -0.914
3   -0.896
4   -0.869
5   -0.833
6   -0.822
                                                                                                                                                                      word_scores
1          {0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, -2.5, 0, 0, -2.6, -1.4, -2.6, 0, -0.6, 0, 0, 0, 0.8, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, -0.993, 0, 0, 0, 0, 0, 0}
2                                                                                                      {0, 0, 0, 0, 0, 0, 0, 0, -2.9, 0, 0, 1.1, 0, 0, 0, -3.8, 0, 0, 0, -3.1, 0}
3                                             {0, 0, 0, 0, -2.4, -2.3, 0, 0, -2.693, 0, 0, 0, 0, 0, 0, 0, -2.4, 0, 0, 0, 0, 2, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0}
4                    {0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1.5, 0, -3.4, 0, 0, 0, 0, 0, 0, 0, 0, 0.888, 0, 0, 0, -2.4, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, -3.4, 0, 0, 0}
5 {0, 0, 0, 0, 0, -1.05, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, -1.9395, 0, 0, 0, 0, 0, 0, 0, 0, -3.15, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0.3, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0}
6                                                                                                                          {0, -2.7, -2.9, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0}
    pos   neu   neg but_count
1 0.030 0.689 0.281         0
2 0.066 0.533 0.401         0
3 0.059 0.669 0.272         0
4 0.075 0.717 0.208         0
5 0.022 0.824 0.154         1
6 0.000 0.648 0.352         0

The general sentiment in these negative tweets highlights concerns about the ethical and emotional aspects of ChatGPT. Many users focus on ChatGPT’s potential to misinform or provide inaccurate responses, leading to a distrust in its reliability. The emotional weight of these discussions is significant, with mentions of the destructive nature of war and the negative impact of censorship and restrictions on using ChatGPT in certain locations.

2. Topic modeling - LDA

Finally, we will apply LDA topic modeling to check some of our earlier expectations about the topics, which were based on the top tokens. I experimented with different values for k (3, 4, and 5) and found that 3 topics provided the best fit. This is likely because the tweets are quite scattered, with a heavy concentration on AI. With more data or a longer timeframe, the topics might separate more clearly.

library(topicmodels)

dtm_tweets <- my_token %>% 
  count(tweet_id, word) %>%
  cast_dtm(tweet_id, word, n)

lda_model <- LDA(
  dtm_tweets,
  k = 3,
  method = "Gibbs",
  control = list(seed = 46))

lda_model

A LDA_Gibbs topic model with 3 topics.
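
The document-term matrix passed to LDA() has one row per tweet_id and one column per word. The base R sketch below builds such a matrix by hand with xtabs() to show the shape of the input; the IDs and counts are hypothetical. It also illustrates a caveat: if tweet_id is stored as a double, as the printed values such as 1.6427385624866693e+18 suggest, IDs above 2^53 cannot all be represented exactly, so distinct tweets could be merged into one document. Keeping the IDs as character strings avoids this.

```r
# Doubles cannot tell neighbouring integers apart at this magnitude
print(2^60 + 1 == 2^60)

# Two different (hypothetical) IDs collapse to one value when read as numbers
id_a <- "1152921504606846976"
id_b <- "1152921504606846977"
same_as_number <- as.numeric(id_a) == as.numeric(id_b)
same_as_string <- id_a == id_b

# Hypothetical per-tweet word counts with IDs kept as character strings
token_counts <- data.frame(
  tweet_id = c(id_a, id_a, id_b),
  word     = c("plugins", "guide", "plugins"),
  n        = c(1, 2, 1),
  stringsAsFactors = FALSE
)

# One row per tweet, one column per word, cells hold the counts
dtm <- xtabs(n ~ tweet_id + word, data = token_counts)
print(dtm)
```

This is only a dense illustration of the structure; cast_dtm() produces a sparse matrix, which is what makes the approach practical for thousands of tweets.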

library (reshape2)


Attaching package: 'reshape2'

The following object is masked from 'package:tidyr':

    smiths

word_probs <- tidy(lda_model, matrix = "beta")

word_probs2 <- word_probs %>% 
  group_by(topic) %>% 
  slice_max(beta, n = 10) %>% 
  ungroup() %>%
  mutate(term2 = fct_reorder(term, beta))

ggplot(
  word_probs2, 
  aes(term2, beta, fill = as.factor(topic))
) +
  geom_col(show.legend = FALSE) +
  facet_wrap(~ topic, scales = "free") +
  coord_flip()

The topics identified through topic modeling also align with what the top tokens suggested, highlighting themes such as AI tools, language and writing, and coding.

Stage 2: comments about language learning

Data Wrangle:

In the second stage, we will first filter the data to find tweets relevant to language learning. Since different stems might affect our research results, we will clean the data to normalize the text, making it easier to search for keywords. Next, we will tokenize the text into bigrams to identify the top discussed topics in language learning using ChatGPT.

library(tm)

tweet_text <- chatgpt_tweets$text
glimpse (tweet_text)

 chr [1:12194] "RT @DarrellLerner: ChatGPT Plugins are the fastest way to get rich in 2023. \n\nI’ve created a step-by-step gui"| __truncated__ ...

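To normalize the text, we select the text column, convert it into a corpus, and pass it through a stemming step. Note that wordStem() from SnowballC expects a character vector of individual words. Applied to whole tweets as below, it treats each tweet as a single token, so at most the ending of a tweet changes and the rest of the text keeps its full word forms, which is what the keyword filter further down relies on.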

corpus <- Corpus(VectorSource(tweet_text))
print(corpus)

<<SimpleCorpus>>
Metadata:  corpus specific: 1, document level (indexed): 0
Content:  documents: 12194

library(SnowballC)

corpus_stemmed <- tm_map(corpus, content_transformer(function(x) wordStem(x, language = "en")))

Warning in tm_map.SimpleCorpus(corpus, content_transformer(function(x)
wordStem(x, : transformation drops documents

print(corpus_stemmed)

<<SimpleCorpus>>
Metadata:  corpus specific: 1, document level (indexed): 0
Content:  documents: 12194

Convert the processed corpus back to a data frame.

text_vector <- sapply(corpus_stemmed, as.character)
tweets_df <- data.frame(text = text_vector, stringsAsFactors = FALSE)

head (tweets_df)

                                                                                                                                                                                                                                                                    text
1                                                                                                                          RT @DarrellLerner: ChatGPT Plugins are the fastest way to get rich in 2023. \n\nI’ve created a step-by-step guide showing you how to earn 10…
2                 Get an intelligent chatbot for your website in minutes with Chatbase AI. Train ChatGPT on your data and let it answer any question your users have. Simply upload a document or link and add the chat widget - it's that easy!\nMake Money using AI:  
3                                                        Hey Guys, #ZenithSwap has launched at just  55,000 USD Marketcap. The ChatGPT of DEX - Reimagining DeFi with AI-Powered yield Farming. Now at 4X. Lot of up potential at such low marketcap. ARB ZSP #Arbitrum 
4                                                                                                                                             RT @sinsonetwork: Now! Join #SINSO DataLandChatGPT #Airdrop!\n3.23-4.6\nTasks\nLog in to&lt;\nTry SINSO #ChatGPT&amp; twe…
5 The plagiarism detector will introduce its #AI detection tool tomorrow, hoping to protect academic integrity in a post-#ChatGPT world. \n\nThe speedy launch and lack of an opt-out have #academics worried.\n\nAn important piece @liamhknox for @insidehighered \n\n
6                                                                                                                                                         "Germany could follow in Italy's footsteps by blocking ChatGPT over data security concerns..."  #readmorehere

Now we will filter the dataset to select tweets relevant to language learning, including topics such as speaking, writing, reading, and listening. Typically, hashtags can be used to track real-time data, but since this is a processed public dataset, we need to filter by keyword to collect the desired tweets.

library(stringr)
library(dplyr)
language_tweets <- tweets_df  |>
  filter(str_detect (text, regex("language learn |english learn |read|speak|write|listen", ignore_case = TRUE)))

head(language_tweets)

                                                                                                                                                                                                                                                                                                    text
1                                                                                                                                                                                         "Germany could follow in Italy's footsteps by blocking ChatGPT over data security concerns..."  #readmorehere 
2            My first reminder that GPT4 (which is astonishingly powerful) was somewhat brittle came when I tried Midjourney-style emphasis in a thread. Here’s what I did:\n\nBy adding hyperbolic words (“a very extremely superbly greatly important”) before part of the prompt, I was able to get… 
3 @YesMachiavelli Take the first 100 words of Genesis in the Bible, strip it of "wokeness" and rewrite it, to show students the outcome for a media literacy class \n\nGPT4:\n"In the beginning, the universe came into existence. This vast expanse, with its celestial bodies and cosmic energy, was… 
4                                                                                                                                                           RT @astrobotic: We asked ChatGPT to write a post about Griffin ramp vibration tests: "Just witnessed a space structure vibration test and i…
5                                                                                                                                                                                                                          @botanch This thread is saved to your Notion database.\n\nTags: [Ia, Chatgpt]
6                                                                                                                                                    Unlock the full potential of #AI with the art of #PromptEngineering!  Learn how to write prompts for ChatGPT and more! #SmallBusiness #Innovation
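
The pattern above matches substrings, so “read” also matches “thread” and “#readmorehere”, while the trailing space in “language learn ” prevents a match on “language learning”. A stricter variant with word boundaries can be sketched in base R as follows. The pattern strict_pattern is an assumption for illustration, not the filter used for the results in this report, and the third example tweet is hypothetical.

```r
# Example texts: items 1, 2 and 4 are shortened from the output above,
# item 3 is a hypothetical tweet about language learning
example_tweets <- c(
  "This thread is saved to your Notion database.",
  "Learn how to write prompts for ChatGPT",
  "ChatGPT helps me with English learning every day",
  "#readmorehere"
)

# Pattern used in the filter above (substring matching)
loose_pattern <- "language learn |english learn |read|speak|write|listen"

# Assumed stricter alternative: whole words only, common inflections allowed
strict_pattern <- paste0(
  "\\b(language learn\\w*|english learn\\w*|read(s|ing)?|",
  "speak(s|ing)?|writ(e|es|ing)|listen(s|ing)?)\\b"
)

loose_hits  <- grepl(loose_pattern, example_tweets, ignore.case = TRUE)
strict_hits <- grepl(strict_pattern, example_tweets, ignore.case = TRUE, perl = TRUE)

print(data.frame(tweet = example_tweets, loose_hits, strict_hits))
```

With the stricter pattern, the Notion reply and the hashtag are no longer selected, and the tweet that mentions English learning is. Switching patterns would change the counts reported in the rest of this stage.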

The following steps tokenize the text into bigrams and then remove standard stop words and custom words, consistent with the first stage.

library(tidytext)

language_bigrams <- language_tweets |>
  unnest_tokens(bigram, text, token = "ngrams", n = 2)|>
  relocate (bigram) |>
  count (bigram, sort=TRUE)

head (language_bigrams)

      bigram   n
1   to write 213
2 chatgpt to 187
3    write a  92
4 chatgpt is  70
5     of the  68
6     in the  62

library(tidyr)

bigrams_separated <- language_bigrams %>%
  separate(bigram, into= c("word", "word2"), sep = " ")

head (bigrams_separated)

     word word2   n
1      to write 213
2 chatgpt    to 187
3   write     a  92
4 chatgpt    is  70
5      of   the  68
6      in   the  62

bigrams_filtered <- bigrams_separated |>
  filter(!word %in% stop_words$word) |>
  filter(!word2 %in% stop_words$word)

tidy_bigrams <- bigrams_filtered |>
  unite(bigram, word, word2, sep = " ")
head (tidy_bigrams)

           bigram  n
1   chatgpt write 49
2   database tags 22
3 notion database 22
4      ai chatgpt 19
5        ai tools 19
6      chatgpt ai 19

my_words <- c("4","chat","gpt","rt","moon","ai","chatgpt","1","gpt4","10")

bigrams_english <- bigrams_filtered |>
  filter(!word %in% my_words) |>
  filter(!word2 %in% my_words) |>
  
  unite(bigram, word, word2, sep = " ")
head (bigrams_english)

                   bigram  n
1           database tags 22
2         notion database 22
3 artificial intelligence  9
4         language models  9
5          writers strike  8
6            social media  7
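
To make the filtering logic explicit: a bigram is kept only when neither of its two words is a stop word, which is why frequent pairs such as “to write” disappear. The base R sketch below mimics this on a single sentence. The helper make_bigrams() and the short stop list are hypothetical, and the tokenizer is only a rough approximation of what unnest_tokens() does.

```r
# Hypothetical helper: lower-case the text, split into words, pair neighbours
make_bigrams <- function(text) {
  words <- tolower(unlist(strsplit(text, "[^[:alnum:]']+")))
  words <- words[nzchar(words)]
  if (length(words) < 2) return(character(0))
  paste(head(words, -1), tail(words, -1))
}

bigrams <- make_bigrams("We asked ChatGPT to write a post")
print(bigrams)

# Small stand-in for stop_words$word
small_stop_list <- c("we", "to", "a")

# Keep a bigram only if neither word is in the stop list
parts <- strsplit(bigrams, " ", fixed = TRUE)
keep <- vapply(parts, function(p) !any(p %in% small_stop_list), logical(1))
filtered_bigrams <- bigrams[keep]
print(filtered_bigrams)
```

Only “asked chatgpt” survives here. A further filter on custom words such as “chatgpt” would remove it too, mirroring the my_words step above.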

Data Exploration

We will follow a similar process as before to create a word cloud of the top bigrams related to language learning, providing insight into the most discussed topics.

top_words_2 <- bigrams_english |>
  slice_head(n = 50)
 
head (top_words_2)

                   bigram  n
1           database tags 22
2         notion database 22
3 artificial intelligence  9
4         language models  9
5          writers strike  8
6            social media  7

wordcloud2(top_words_2, size= 0.5)

The results point to several themes in the tweets matched by the language-related keywords. “Language models” is prominent, indicating a focus on the technical side of AI-driven tools. The two most frequent bigrams, “database tags” and “notion database,” appear to come from automated replies that save threads to Notion and say little about language use. Terms like “proper prompts” and “prompt engineering” reflect a strong interest in crafting effective inputs for AI. Additionally, phrases such as “short story” and “Hollywood writers” highlight the use of AI for different types of content creation. While most discussions are positive, the mention of “writers strike” introduces a note of concern about the impact of AI on creative fields.

Modeling

library(vader)
vader_language <- vader_df(language_tweets$text) |>
    select (text, compound, word_scores,pos,neu,neg,but_count)
    
head(vader_language)

                                                                                                                                                                                                                                                                                                    text
1                                                                                                                                                                                         "Germany could follow in Italy's footsteps by blocking ChatGPT over data security concerns..."  #readmorehere 
2            My first reminder that GPT4 (which is astonishingly powerful) was somewhat brittle came when I tried Midjourney-style emphasis in a thread. Here’s what I did:\n\nBy adding hyperbolic words (“a very extremely superbly greatly important”) before part of the prompt, I was able to get… 
3 @YesMachiavelli Take the first 100 words of Genesis in the Bible, strip it of "wokeness" and rewrite it, to show students the outcome for a media literacy class \n\nGPT4:\n"In the beginning, the universe came into existence. This vast expanse, with its celestial bodies and cosmic energy, was… 
4                                                                                                                                                           RT @astrobotic: We asked ChatGPT to write a post about Griffin ramp vibration tests: "Just witnessed a space structure vibration test and i…
5                                                                                                                                                                                                                          @botanch This thread is saved to your Notion database.\n\nTags: [Ia, Chatgpt]
6                                                                                                                                                    Unlock the full potential of #AI with the art of #PromptEngineering!  Learn how to write prompts for ChatGPT and more! #SmallBusiness #Innovation  
  compound
1   -0.052
2    0.632
3    0.273
4    0.000
5    0.421
6    0.537
                                                                                                                                         word_scores
1                                                                                                    {0, 0, 0, 0, 0, 0, 0, -1.6, 0, 0, 0, 1.4, 0, 0}
2     {0, 0, 0, 0, 0, 0, 0, 0, 1.8, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1.3567, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0}
3 {0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1.1, 0}
4                                                                              {0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0}
5                                                                                                             {0, 0, 0, 0, 1.8, 0, 0, 0, 0, 0, 0, 0}
6                                                                           {0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1.87835}
    pos   neu   neg but_count
1 0.141 0.706 0.153         0
2 0.107 0.893 0.000         0
3 0.043 0.957 0.000         0
4 0.000 1.000 0.000         0
5 0.203 0.797 0.000         0
6 0.142 0.858 0.000         0

mean(vader_language$compound)

[1] 0.2247255

vader_language_summary <- vader_language |> 
  mutate(sentiment = ifelse(compound >= 0.05, "positive",
                            ifelse(compound <= -0.05, "negative", "neutral"))) |>
  count(sentiment, sort = TRUE) |> 
  spread(sentiment, n) |> 
  relocate(positive) |>
  mutate(ratio = negative/positive)

vader_language_summary

  positive negative neutral    ratio
1      688      219     317 0.318314
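
To put these counts next to the earlier ones, they can be converted to percentage shares. The sketch below uses only the counts printed in the two summary tables; the row labels are hypothetical names. Note that the first row comes from the 1,000-tweet random sample, while the second covers all 1,224 keyword-matched tweets.

```r
# Counts taken from the two summary tables in this report
sentiment_counts <- rbind(
  all_sample = c(positive = 529, negative = 176, neutral = 295),
  language   = c(positive = 688, negative = 219, neutral = 317)
)

# Row-wise percentage shares, rounded to one decimal place
shares <- round(100 * prop.table(sentiment_counts, margin = 1), 1)
print(shares)
```

The language-related subset has a somewhat higher positive share (56.2% versus 52.9%) and an almost identical negative share (17.9% versus 17.6%). The two groups overlap and were selected differently, so this comparison is descriptive and not a formal test.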

language_negative <- vader_language |> arrange (compound) |> relocate (text)

head (language_negative,10 )

                                                                                                                                                                                                                                                                                       text
1  Spent a lot of time filtering out noise from my thread. Was beginning to become doom and gloom. Banks are failing, sky is falling, war is coming, lots of anger.\n\nMajor improvement to my mental health. According to chatgpt it is all out of my control so should not weigh on mind.
2                                                    "ChatGPT and all its AI  cousins are cold, heartless machines. They’ve never loved nor lost.  Never experienced the thrill of victory or the agony of defeat. Unlike  human writers, AI has not ...\n1/2 #AI #ChatGPT #copywritersunit
3                                                                                                                                              See what happens when the writers strike?\n\nVin Diesel is sitting there with nothing but free time and ChatGPT prompts terrorizing us smh. 
4                                                                                                                                            RT @jakebrodes: Write suicide note \n\nChatGPT: I am not permitted to fulfill this request. If you are struggling, please call suicide preven…
5      @perrymetzger No but realistic that people with bad intentions have already used ChatGPT esque software (maybe even ChatGPT itself) to lodge false allegations against people who they seek to control and steal from and that celebrities who yudkowsky rubs elbows with are involv
6                                             @senorhi3bs @jonrog1 I'm trying to make any sense of this and I have to conclude that you let Chatgpt write this tweet. How does a strike in another industry hurt you? Is it that painful to get no new TV episodes a month or two from now?
7                   @elonmusk NATO website says this (ex nazi) person named adolf husinger was appointed in the Nato Military Committee in the 60s , but Chat GPT when asked said NO. Basically ChatGPT is already hacked to hide inconvenient truths. They already got their hands on AI. 
8                                                                                                                                                                                            RT @TheSurvivalPodc: ChatGPT write me a 500 word essay that tells the twat to go fuck herself.
9                                                                                                                                                                                                                 i hate bitches. why’d i just get told to have chatgpt write my song for m
10                                                                    @cowspod @BilgeEbiri #ChatGPT can write a better #screenplay than Ghosted in 30 seconds. The real terror lies in this. This is not a judgement of Ghosted as a #writing project. It’s is purely a terrifying reality.
   compound
1    -0.938
2    -0.932
3    -0.922
4    -0.917
5    -0.892
6    -0.854
7    -0.836
8    -0.836
9    -0.822
10   -0.806
                                                                                                                                                                          word_scores
1  {0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, -1.7, 0, -2.6, 0, 0, -2.3, 0, 0, -0.6, -2.9, 0, 0, 0, 0, -2.7, 0, 2.293, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0}
2                                              {0, 0, 0, 0, 0, 0, 0, 0, -2.2, 0, 0, 0, -2.146, 0, -0.71188, 0, 0, 0, -1.11, 0, 0, 0, 0, -1.8, 0, -2, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0}
3                                                                                               {0, 0, 0, 0, 0, 0, -0.25, 0, 0, 0, 0, 0, 0, 0, 0, -2.553, 0, 0, 0, 0, -4.5, 0, -1.95}
4                                                                                                  {0, 0, 0, -3.5, 0, 0, 0, 0, 0, 0, 0, -1.406, 0, 0, 0, 0, 0, -1.8, 1.3, 0, -3.5, 0}
5                                         {0, -0.6, 0, 0, 0, 0, 0, -3.75, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, -3.3, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0}
6                           {0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, -0.5, 0, 0, 0, -2.4, 0, 0, 0, 0, -1.9, 0, 0, -1.2, 0, 0, 0, 0, 0, 0, 0, 0, 0}
7                          {0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, -2.8995, 0, 0, 0, 0, -2.55, 0, -1.05, -2.1, 2.7, 0, 0, 0, 0, 0, 0, 0}
8                                                                                                                           {0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, -3.4, 0, 0, -2.5, 0}
9                                                                                                                              {0, -2.7, -2.9, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0}
10                                                        {0, 0, 0, 0, 0, 0, 1.9, 0, 0, 0, 0, 0, 0, 0, 0, -2.4, -1.8, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, -2.97835, 0}
     pos   neu   neg but_count
1  0.049 0.671 0.280         0
2  0.000 0.660 0.340         0
3  0.000 0.589 0.411         1
4  0.069 0.507 0.424         0
5  0.000 0.790 0.210         1
6  0.000 0.806 0.194         0
7  0.066 0.710 0.224         1
8  0.000 0.655 0.345         0
9  0.000 0.648 0.352         0
10 0.064 0.710 0.226         0
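
The columns above match the output layout of a VADER sentiment scorer. As a minimal sketch, assuming the standard VADER normalization (the summed word scores x are mapped to x / sqrt(x^2 + alpha), with alpha = 15), the compound values can be reproduced from the word_scores column. The function name `compound_from_word_scores` is hypothetical, and the sketch ignores extra adjustments, such as punctuation emphasis, that a full implementation applies.

```python
import math

ALPHA = 15  # normalization constant used by the reference VADER implementation


def compound_from_word_scores(word_scores, alpha=ALPHA):
    """Map the sum of per-word valence scores into the open interval (-1, 1)."""
    total = sum(word_scores)
    return total / math.sqrt(total * total + alpha)


# Nonzero word scores copied from rows 8 and 9 of the table above
row_8 = [-3.4, -2.5]
row_9 = [-2.7, -2.9]

print(round(compound_from_word_scores(row_8), 3))  # -0.836
print(round(compound_from_word_scores(row_9), 3))  # -0.822
```

Both results agree with the compound column for rows 8 and 9, which suggests that a few strongly negative words are enough to push a short tweet close to -1.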

The most negative tweets about using ChatGPT for language learning focus on its nature as a cold, emotionless machine that is fundamentally different from humans. In some cases ChatGPT refuses to fulfill unethical requests, which I find reasonable. Other tweets raise concerns about misuse of ChatGPT, ethics, ghostwriting, and writers' resistance to AI's role in creative fields.

4. Communicate

Which tweets gain the most popularity or are retweeted the most?

Tweets that showcase practical uses of ChatGPT and other AI tools, or that address major concerns about AI, tend to receive the most retweets and gain the most popularity.

How does the frequency of tweets vary over time in the given dataset?

Tweet frequency is relatively stable at around 500 tweets per day, with pronounced spikes on certain days that likely coincide with notable events or announcements.

What are the most frequently used words in comments on tweets about ChatGPT?

The most common words include “AI,” “openai,” “people,” “tools,” “code,” and “time,” which reflect themes related to AI tools, coding applications, and user learning.

What is the general public sentiment towards ChatGPT?

Sentiment analysis shows that overall sentiment is largely positive: 538 tweets are classified as positive versus 166 as negative, indicating a generally favorable perception.
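
To put these counts in proportion, the short Python sketch below computes the share of positive tweets among the tweets classified as either positive or negative. It assumes the two counts reported above are the only inputs; neutral tweets are not reported here, so they are left out of the denominator.

```python
positive = 538  # positive tweets reported above
negative = 166  # negative tweets reported above

polarized = positive + negative          # tweets with a non-neutral label
positive_share = positive / polarized    # share among polarized tweets only
pos_to_neg_ratio = positive / negative   # positive tweets per negative tweet

print(f"Positive share: {positive_share:.1%}")                 # Positive share: 76.4%
print(f"Positive-to-negative ratio: {pos_to_neg_ratio:.2f}")   # Positive-to-negative ratio: 3.24
```

Under this assumption, roughly three out of four polarized tweets are positive.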

What are the primary uses of ChatGPT in language learning contexts?

The top uses include generating effective writing prompts, creating content such as “short stories,” and assisting with language models. The negative posts, however, highlight concerns related to the writers' strike.

Is the sentiment towards ChatGPT in language learning contexts more positive compared to the overall sentiment?

Sentiment towards ChatGPT in language learning contexts is slightly more positive than the overall sentiment, indicating a high level of enthusiasm for its language learning capabilities relative to other topics.

Key insights and limitations:

I did not collect this dataset myself, because of current limitations on accessing large-scale data. Throughout the analysis, I observed challenges in applying sentiment analysis to large datasets. The approach is well suited to capturing general trends and overall sentiment, but it makes it difficult to extract nuanced detail from individual positive or negative posts. This can be a limitation for social science or educational studies, where individual cases often reveal deeper insights. Data authenticity is also a concern, since some tweets may be amplified through retweets or posted for promotional purposes.