To have a baseline for exploring immigration-related tweets, I first produced a brief overview of the whole dataset.
The corpus covers tweets from 21 March 2020 to 14 September (the latest available at the moment). It contains 1,435,326 tweets posted by 95,858 users; 47,037 of these users (49%) posted more than one tweet.
On average, users posted 5 tweets (SD = 107.9), but, as expected, the distribution is heavily long-tailed, with a small core of very active users and many inactive ones.
The number of tweets per day on the COVID topic decreases over time (Fig. 1), with a slight increase in the last month. Bars show daily counts; the line is a 7-day moving average (the same applies to the plots below).
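A minimal sketch of how daily counts and a 7-day moving average like those described above could be computed with pandas. The column name `created_at`, the toy data, and the choice of a trailing (rather than centred) window are assumptions for illustration, not details of the actual analysis.

```python
import pandas as pd

# Hypothetical toy data: one row per tweet, with a timestamp column.
tweets = pd.DataFrame({
    "created_at": pd.to_datetime([
        "2020-03-21 10:00", "2020-03-21 12:30", "2020-03-22 09:15",
        "2020-03-24 18:00", "2020-03-24 19:00", "2020-03-24 20:00",
    ])
})

def daily_counts_with_moving_average(df, time_col="created_at", window=7):
    """Return daily tweet counts and their trailing moving average."""
    # Count tweets per calendar day; days without tweets get a count of 0.
    daily = df.set_index(time_col).resample("D").size().rename("n_tweets")
    out = daily.to_frame()
    # Trailing window; min_periods=1 avoids NaN at the start of the series.
    out["moving_avg"] = daily.rolling(window=window, min_periods=1).mean()
    return out

daily = daily_counts_with_moving_average(tweets)
print(daily)
```

With `min_periods=1` the first six days are averaged over fewer than 7 values, so the start of the line is noisier than the rest.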
Fig. 1
Fig. 2
What was the dynamic of new users joining the discussion? I calculated and visualized cumulative distinct counts for it. So, if at time T users A, B, C, D were participating in the discussion, and at time T+1 users A, B, D, E were posting tweets, we have 4 unique users at time point T and 4 at time point T+1, while the cumulative distinct count grows from 4 to 5, because only E is new.
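The A, B, C, D / A, B, D, E example can be reproduced with a short pandas sketch. The column names `day` and `user_id` and the dates are hypothetical; the approach (counting each user in the period of their first appearance and accumulating) is one possible way to get cumulative distinct counts, not necessarily the one used here.

```python
import pandas as pd

# Toy data mirroring the example: users A, B, C, D at T and A, B, D, E at T+1.
tweets = pd.DataFrame({
    "day": pd.to_datetime(["2020-03-21"] * 4 + ["2020-03-22"] * 4),
    "user_id": ["A", "B", "C", "D", "A", "B", "D", "E"],
})

def user_dynamics(df, time_col="day", user_col="user_id"):
    """Per-period unique users and cumulative distinct users."""
    # Unique users active in each period.
    active = df.groupby(time_col)[user_col].nunique().rename("active_users")
    # Period in which each user appears for the first time.
    first_seen = df.groupby(user_col)[time_col].min()
    # Number of new users per period, aligned to all observed periods.
    new_users = first_seen.value_counts().reindex(active.index, fill_value=0)
    out = active.to_frame()
    # Running total of new users = cumulative distinct users.
    out["cumulative_users"] = new_users.cumsum()
    return out

dynamics = user_dynamics(tweets)
print(dynamics)
```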
Fig. 3
I calculated the period between the first and last post of every user; users who posted only once were excluded. The histogram (Fig. 4) shows that for many users the period of ‘participation in the discussion’ is quite long. This enables us to look at changes in attitudes over time in a panel-like mode later.
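A sketch of the participation-period calculation, assuming hypothetical columns `user_id` and `created_at` and measuring the span in whole days (the unit is an assumption; the text does not state it).

```python
import pandas as pd

# Hypothetical toy data: user B posts only once and should be excluded.
tweets = pd.DataFrame({
    "user_id": ["A", "A", "B", "C", "C", "C"],
    "created_at": pd.to_datetime([
        "2020-03-21", "2020-04-20", "2020-04-01",
        "2020-05-01", "2020-05-03", "2020-05-02",
    ]),
})

def participation_span_days(df, time_col="created_at", user_col="user_id"):
    """Days between the first and last post of users with more than one tweet."""
    per_user = df.groupby(user_col)[time_col].agg(
        first_post="min", last_post="max", n_tweets="count"
    )
    # Exclude users who posted only once.
    per_user = per_user[per_user["n_tweets"] > 1]
    return (per_user["last_post"] - per_user["first_post"]).dt.days

spans = participation_span_days(tweets)
print(spans)
```

The resulting series can be passed directly to a histogram, for example `spans.plot.hist()`.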
Fig. 4
Next, I looked more closely at the tweets with the #migpol hashtag.
Out of all COVID tweets, 36,060 have the #migpol hashtag (2.5% on average), but both the number and the relative percentage of such tweets increase towards the end of data collection (see Fig. 5), despite the general decrease in the number of COVID-related posts.
4,324 unique users tweeted about this, posting 8 tweets each on average.
78% of these tweets are retweets, a higher proportion than among COVID tweets in general.
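The summary figures for a hashtag subset (number of tweets, share of the corpus, unique users, tweets per user, retweet share) could be derived roughly as follows. The columns `text`, `user_id` and `is_retweet`, as well as the case-insensitive matching of the hashtag in the tweet text, are assumptions; the actual data may flag hashtags and retweets differently.

```python
import re
import pandas as pd

# Hypothetical toy data.
tweets = pd.DataFrame({
    "user_id": ["A", "A", "B", "C", "C", "D"],
    "text": [
        "New rules announced #migpol",
        "Stay home",
        "RT @A: New rules announced #migpol",
        "Borders closed #MigPol #covid",
        "Vaccines update",
        "RT @C: Borders closed #MigPol #covid",
    ],
    "is_retweet": [False, False, True, False, False, True],
})

def hashtag_summary(df, hashtag="migpol"):
    """Basic descriptive statistics for tweets containing a given hashtag."""
    # \b prevents matching longer hashtags such as #migpolicy.
    pattern = rf"#{re.escape(hashtag)}\b"
    subset = df[df["text"].str.contains(pattern, case=False, regex=True)]
    n_users = subset["user_id"].nunique()
    return {
        "n_tweets": len(subset),
        "share_of_all": len(subset) / len(df),
        "n_users": n_users,
        "tweets_per_user": len(subset) / n_users,  # assumes a non-empty subset
        "retweet_share": subset["is_retweet"].mean(),
    }

summary = hashtag_summary(tweets)
print(summary)
```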
Fig. 5
Next, I attached the sentiment analysis from Guus and looked at the fluctuations. I aggregated sentiments at a daily level (see Fig. 6). The red line is a moving average showing the mean value over 7 days.
There is no clear trend in sentiment change, although fluctuations become steeper towards the end of the period. This might be partly explained by the increase in the number of tweets. The absence of a clear trend might also indicate polarisation among the users.
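A sketch of attaching sentiment scores to tweets and aggregating them by day. The join key `tweet_id`, the column `sentiment`, the score values and the trailing window are all hypothetical; the format of the actual sentiment output is not described here.

```python
import pandas as pd

# Hypothetical toy data: tweets and a separate table of sentiment scores.
tweets = pd.DataFrame({
    "tweet_id": [1, 2, 3, 4],
    "created_at": pd.to_datetime([
        "2020-03-21 08:00", "2020-03-21 21:00",
        "2020-03-22 10:00", "2020-03-23 10:00",
    ]),
})
sentiment = pd.DataFrame({
    "tweet_id": [1, 2, 3, 4],
    "sentiment": [0.5, -0.5, 1.0, -1.0],
})

def daily_sentiment(tweets_df, sentiment_df, window=7):
    """Daily mean sentiment, daily N, and a trailing moving average."""
    # Keep only tweets that have a sentiment score.
    merged = tweets_df.merge(sentiment_df, on="tweet_id", how="inner")
    daily = (
        merged.set_index("created_at")["sentiment"]
        .resample("D")
        .agg(["mean", "count"])
        .rename(columns={"mean": "mean_sentiment", "count": "n_tweets"})
    )
    daily["moving_avg"] = (
        daily["mean_sentiment"].rolling(window=window, min_periods=1).mean()
    )
    return daily

daily_sent = daily_sentiment(tweets, sentiment)
print(daily_sent)
```

Keeping the daily N next to the daily mean makes it easier to check whether steep fluctuations coincide with days that have few or many tweets.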
Fig. 6
There are many retweets in the data, which might be useful for studying polarisation. But if we exclude them for text analysis, N becomes quite low (for the #migpol dataset, for instance), especially at a daily level.
There are obvious differences between #migpol tweets and those detected with immigration keywords. What are these differences? Do we analyse them combined or separately?
There are some spikes in interest in the topic of #migpol and immigration. How can we explain them?
We might try alternative sentiment analysis algorithms.