COVID Twitter Corpus: general overview

To have a baseline for exploring immigration-related tweets, I first produced a brief overview of the whole dataset.

Fig. 1

Fig. 2

Twitter Users

New users

How did the number of new users joining the discussion change over time? To answer this, I calculated and visualized cumulative distinct user counts. For example, if users A, B, C, D were participating in the discussion at time T, and users A, B, D, E were posting tweets at time T+1, the cumulative count is 4 unique users at time point T and 5 at time point T+1, because E is the only user not seen before.
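A minimal sketch of how a cumulative distinct count could be computed with pandas. The column names `user_id` and `created_at` and the toy data are assumptions made for illustration, not the actual structure of the corpus. With users A, B, C, D posting on the first day and A, B, D, E on the second, the count grows from 4 to 5.

```python
import pandas as pd

# Hypothetical toy data; `user_id` and `created_at` are assumed column names.
tweets = pd.DataFrame({
    "user_id": ["A", "B", "C", "D", "A", "B", "D", "E"],
    "created_at": pd.to_datetime(["2020-03-01"] * 4 + ["2020-03-02"] * 4),
})


def cumulative_distinct_users(df, user_col="user_id", time_col="created_at", freq="D"):
    """Number of distinct users seen up to and including each period."""
    # The first post of each user marks the moment they join the discussion.
    first_seen = df.groupby(user_col)[time_col].min()
    # Count how many users joined in each period, then accumulate over time.
    new_per_period = first_seen.dt.floor(freq).value_counts().sort_index()
    return new_per_period.cumsum()


cumulative = cumulative_distinct_users(tweets)
print(cumulative)
```

Periods in which no new user appears are simply absent from the result. For plotting, reindexing to a full date range and forward-filling would give a continuous curve.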

Fig. 3

  • The figure shows that the number of users in the dataset grew gradually, without sharp inflows of new users after the start of data collection.
  • The COVID discussion attracts more and more interested users over time. Users probably stay in the conversation once they have engaged in it, as the curve does not flatten.

How long users stay in the discussion

I calculated the period between the first and last post of every user; users who posted only once were excluded. The histogram (Fig. 4) shows that for many users the period of ‘participation in the discussion’ is quite long. This enables us to look at changes in attitudes over time in a panel-like mode later.
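The calculation described above could look roughly like the sketch below. The column names `user_id` and `created_at` and the toy data are assumptions for illustration, and measuring the period in whole days is my choice rather than something stated in the analysis.

```python
import pandas as pd

# Hypothetical toy data; `user_id` and `created_at` are assumed column names.
tweets = pd.DataFrame({
    "user_id": ["A", "A", "B", "C", "C", "C"],
    "created_at": pd.to_datetime([
        "2020-03-01", "2020-03-11", "2020-03-05",
        "2020-03-02", "2020-03-20", "2020-04-01",
    ]),
})


def participation_days(df, user_col="user_id", time_col="created_at"):
    """Days between the first and last post of each user with more than one post."""
    stats = df.groupby(user_col)[time_col].agg(["min", "max", "count"])
    # Users who posted only once have no participation period and are dropped.
    stats = stats[stats["count"] > 1]
    return (stats["max"] - stats["min"]).dt.days


spans = participation_days(tweets)
print(spans)
```

A histogram of `spans` (for example via `spans.plot.hist()`, which requires matplotlib) would then give a picture comparable to Fig. 4.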

Fig. 4

#migpol hashtag

Next, I looked more closely at the tweets with the #migpol hashtag.

Fig. 5

Sentiments of #migpol tweets

Next, I attached the sentiment analysis from Guus and looked at the fluctuations. I aggregated sentiments at a daily level (see Fig. 6). The red line is a moving average showing the mean value over 7 days.
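As a rough illustration of the daily aggregation and the 7-day moving average, here is a sketch in pandas. The column names `created_at` and `sentiment`, the score scale, and the toy values are all assumptions. I also assume a trailing window (the current day and the six days before it) with `min_periods=1`, which may differ from the window used for Fig. 6.

```python
import pandas as pd

# Hypothetical toy data; column names and the score scale are assumptions.
tweets = pd.DataFrame({
    "created_at": pd.to_datetime([
        "2020-03-01", "2020-03-01", "2020-03-02", "2020-03-03", "2020-03-04",
        "2020-03-05", "2020-03-06", "2020-03-07", "2020-03-08",
    ]),
    "sentiment": [0.2, 0.4, 0.1, -0.1, 0.0, 0.5, -0.3, 0.2, 0.1],
})


def daily_sentiment(df, time_col="created_at", score_col="sentiment", window=7):
    """Mean sentiment per day plus a trailing moving average over `window` days."""
    # Average all tweet-level scores that fall on the same calendar day.
    daily = df.set_index(time_col)[score_col].resample("D").mean()
    # Trailing window; min_periods=1 keeps the first days instead of returning NaN.
    moving_avg = daily.rolling(window=window, min_periods=1).mean()
    return pd.DataFrame({"daily_mean": daily, "moving_avg": moving_avg})


result = daily_sentiment(tweets)
print(result)
```

Days without any tweets get a missing daily mean, and the rolling mean skips those missing values, so sparse days make the smoothed line depend on fewer observations.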

There is no clear trend in sentiment change, although fluctuations become steeper towards the end of the period. This might be partly explained by the increase in the number of tweets. The absence of a clear trend might also indicate polarisation among the users.

Fig. 6

Some conclusions and questions

  1. There are many retweets in the data, which might be useful for studying polarisation. But if we exclude them for text analysis, N becomes quite low (for the #migpol dataset, for instance), especially at a daily level.

  2. There are obvious differences between #migpol tweets and those detected with immigration keywords. What are these differences? Should we analyse them combined or separately?

  3. There are some spikes in interest in the topics of #migpol and immigration. How can we explain them?

  4. We might try alternative sentiment analysis algorithms.