1. PURPOSE

1a. Motivation and Focus

Studying social media in the digital age can illuminate themes within organizations, topics and even agents of change.

Data Feminism is a topic of particular interest to me. “Data Feminism offers strategies for data scientists seeking to learn how feminism can help them work toward justice, and for feminists who want to focus their efforts on the growing field of data science. But Data Feminism is about much more than gender. It is about power, about who has it and who doesn’t, and about how those differentials of power can be challenged and changed” (D’Ignazio and Klein, 2020).

Guiding Question:

Feminist Joni Seager asserts, “What gets counted counts!” My guiding question is:

- What do recent tweets say about “Data Feminism” since the recent SoLAR conference?

I pulled data over a week after the SoLAR conference to try to understand what themes emerged.

1b. Load Libraries

First, we load the libraries that we will use to wrangle and then visualize our data. We will not call the Twitter API here because the data set was saved previously.

library(tidyverse)
library(readxl)
library(tidytext)
library(textdata)
library(ggplot2)
library(kableExtra)
library(scales)
library(rtweet)
library(tidyr)
library(stringr)


library(vtree)
library(igraph)
library(ggraph)
library(tidygraph)
library(networkD3)

# Set parameters
# Define colors to use throughout
my_colors <- c("#E69F00", "#56B4E9", "#009E73", "#CC79A7", "#D55E00", "#D65E00")

theme_plot <- function(aticks = element_blank(),
                         pgminor = element_blank(),
                         lt = element_blank(),
                         lp = "none")
{
  theme(plot.title = element_text(hjust = 0.5), #center the title
        axis.ticks = aticks, #set axis ticks to on or off
        panel.grid.minor = pgminor, #turn on or off the minor grid lines
        legend.title = lt, #turn on or off the legend title
        legend.position = lp) #turn on or off the legend
}

2. METHOD

2a. Read in and Wrangle the tweet data

We will read in the previously pulled Data Feminism tweets, which were collected using the search terms:

- “#DataFeminism”
- “Data Feminism”
- “#AfrofeministDataFutures”

The steps are:

- Subset the rows and columns to keep only English-language tweets.
- Bind each separate data frame into one data frame named dataF_tweets.
- Create a unique identifying index column for later analysis.
- Finally, look at the head of our new dataF_tweets data frame.

We have 491 tweets and 91 variables. We do not need all of that, so we will wrangle our data in the next section.

dataF_tweets <- read_excel("data/dataF_tweets.xlsx")
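The binding and indexing steps listed above are not shown here, because the combined file is read in directly. A minimal sketch of how they might look, assuming three hypothetical per-query data frames (`hashtag_tweets`, `phrase_tweets`, `afro_tweets`) filled with toy rows rather than real tweets:

```r
library(dplyr)

# Hypothetical stand-ins for the three separate pulls (toy rows, not real tweets)
hashtag_tweets <- tibble(screen_name = "user_a", lang = "en", text = "toy tweet one")
phrase_tweets  <- tibble(screen_name = "user_b", lang = "es", text = "toy tweet two")
afro_tweets    <- tibble(screen_name = "user_c", lang = "en", text = "toy tweet three")

tweets_sketch <- bind_rows(hashtag_tweets, phrase_tweets, afro_tweets) %>%
  filter(lang == "en") %>%           # keep English-language tweets only
  mutate(tweet_id = row_number()) %>% # unique index for later analysis
  relocate(tweet_id)

head(tweets_sketch)
```

The `tweet_id` column name is an assumption for illustration; any unique key would serve the same purpose.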

2b. Tidy Data

dataF_tidy <- dataF_tweets %>%
  filter(lang == "en") %>%
  select(screen_name, created_at, text)%>%
  mutate(feminism = "data feminism") %>%
  relocate(feminism)

dataF_tidy %>%
  head()%>%
  kbl(caption = "Restructured Data - Data Feminism data frame") %>%
  kable_styling(bootstrap_options = c("striped", "hover"))

Restructured Data - Data Feminism data frame
feminism	screen_name	created_at	text
data feminism	nandaseth	2022-03-25 00:53:31	Data Feminism! Data. 0s and 1s Binary. Idhula enna Feminism? What karmam is this? https://t.co/qBLPu9hrqZ
data feminism	danbouk	2022-03-24 14:42:25	@lokendrachauhan @Keviriah @zephoria Thank you for reading! I most do history, but a great text aimed at thinking about building consensus and making data do good and useful things today is Data Feminism by @kanarinka & @laurenfklein https://t.co/4tIGU4sUx2
data feminism	14prinsp	2022-03-24 13:52:58	Congratulations for the organisers of #LAK22 for being courageous in opening spaces to look at #learninganalytics through lenses such as decolonisation and data feminism. Viva!
data feminism	14prinsp	2022-03-24 13:12:10	First point of data feminism to consider in #learninganalytics is “In today’s world, data is power” @kanarinka #LAK22 The power is distributed unequally and those who are already vulnerable, are made even more vulnerable
data feminism	MamtaShah	2022-03-24 13:52:09	Thank you for a powerful and insightful keynote on data feminism @kanarinka https://t.co/AEqFSMb1SR #LAK22 https://t.co/7CCt7BnacG
data feminism	OlgaOvi	2022-03-24 13:48:33	“Data feminism requires an expanded definition of data visualization and data science!” #LAK22 @kanarinka

Tokenize words

We will tidy our text using the tidytext and dplyr packages, splitting the text into tokens to create a table with one token per row. The token is stored in a column called word. Another tidying step is to remove the most common stop words, such as a, the, is, are, and and.

Before we break the text into two-word tokens (bigrams), let’s inspect the single-word tokens to see if there are any other nonsense words we need to eliminate.

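# Tokenize tweets into single words.
# Note: token = "tweets" keeps hashtags and mentions intact, but it appears to have
# been removed in tidytext 0.4.0, so this call assumes an earlier version.
tweet_tokens <- 
  dataF_tidy %>%
  unnest_tokens(output = word, 
                input = text, 
                token = "tweets")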

# Remove common stop words
dataFem_tweets <-
  tweet_tokens %>%
  anti_join(stop_words, by = "word")

dataFem_tweets %>%
  head() %>%
  kbl(caption = "tokenized dataFem data frame") %>%
  kable_styling(bootstrap_options = c("striped", "hover"))

tokenized dataFem data frame
feminism	screen_name	created_at	word
data feminism	nandaseth	2022-03-25 00:53:31	data
data feminism	nandaseth	2022-03-25 00:53:31	feminism
data feminism	nandaseth	2022-03-25 00:53:31	data
data feminism	nandaseth	2022-03-25 00:53:31	0s
data feminism	nandaseth	2022-03-25 00:53:31	1s
data feminism	nandaseth	2022-03-25 00:53:31	binary
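To make the one-token-per-row idea concrete, here is a minimal sketch on a single toy sentence. It uses the default word tokenizer of `unnest_tokens()` rather than the tweet tokenizer, so, unlike the output above, it would strip `#` and `@` symbols; the `toy` table is an assumption for illustration.

```r
library(dplyr)
library(tidytext)

# One toy "tweet" (not from the real data set)
toy <- tibble(screen_name = "user_a",
              text = "Data feminism is about power")

toy_tokens <- toy %>%
  unnest_tokens(output = word, input = text) %>%  # lowercased single words
  anti_join(stop_words, by = "word")              # drop common stop words

toy_tokens
```

Stop words such as "is" and "about" are dropped, while content words such as "feminism" remain, each on its own row alongside the `screen_name`.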

3. EXPLORE

3a. Tokenized single word count

Let’s look at the word count for dataFem_tweets to see if any themes emerge. We can see a high number of Data Science themes and hashtags. We can go ahead and create our bigrams in the next section.

dataFem_tweets %>%
  count(word, sort = TRUE)

## # A tibble: 2,793 x 2
##    word              n
##    <chr>         <int>
##  1 #femtech        398
##  2 100daysofcode   124
##  3 #womenwhocode   122
##  4 #womenintech    109
##  5 #ml             103
##  6 #iot             94
##  7 #ai              72
##  8 #iiot            66
##  9 #python          66
## 10 #flutter         63
## # ... with 2,783 more rows
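Because most of the top terms are hashtags, it can help to flag them separately from plain words. A minimal sketch on toy data, where the `toy_words` table and the `is_hashtag` column are illustrative assumptions and not part of the original analysis:

```r
library(dplyr)
library(stringr)

# Toy tokens standing in for the word column of dataFem_tweets
toy_words <- tibble(word = c("#femtech", "#femtech", "#ml",
                             "data", "feminism", "data"))

toy_counts <- toy_words %>%
  count(word, sort = TRUE) %>%
  mutate(is_hashtag = str_detect(word, "^#"))  # TRUE when the token starts with '#'

# Keep only the hashtag rows
hashtags_only <- filter(toy_counts, is_hashtag)
hashtags_only
```

Splitting the counts this way makes it easier to compare what people tag with what they actually write.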

3b. Create Bigram

Below, let’s create our bigrams after stripping URLs, HTML entities, and other nonsense tokens from the text. After counting the bigrams at the end of this section, we are left with 145 observations and two variables.

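# Regex for cleaning tweets: URLs, HTML entities, and the retweet marker "RT"
# (note the doubled backslash in \\bRT\\b; a single "\b" is a backspace in an R string)
replace_reg <- "https?://[^\\s]+|&amp;|&lt;|&gt;|&d2l;|&aristotlemrs;|&aleks;|\\bRT\\b"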
# split into word pairs
dataF_bigrams <- dataF_tidy %>% 
  mutate(text = str_replace_all(text, replace_reg, "")) %>%
  unnest_tokens(bigram, text, token = "ngrams", n = 2)

# remove stop words
dataF_bigrams <- dataF_bigrams %>%
  separate(bigram, into = c("first","second"), sep = " ", remove = FALSE) %>%
  anti_join(stop_words, by = c("first" = "word")) %>%
  anti_join(stop_words, by = c("second" = "word")) %>%
  filter(str_detect(first, "[a-z]") &
         str_detect(second, "[a-z]"))

bigrams_united <- dataF_bigrams%>%
  unite(bigram, first, second, sep = " ")

bigrams_united

## # A tibble: 5,805 x 4
##    feminism      screen_name created_at          bigram                  
##    <chr>         <chr>       <dttm>              <chr>                   
##  1 data feminism nandaseth   2022-03-25 00:53:31 data feminism           
##  2 data feminism nandaseth   2022-03-25 00:53:31 feminism data           
##  3 data feminism nandaseth   2022-03-25 00:53:31 data 0s                 
##  4 data feminism nandaseth   2022-03-25 00:53:31 1s binary               
##  5 data feminism nandaseth   2022-03-25 00:53:31 binary idhula           
##  6 data feminism nandaseth   2022-03-25 00:53:31 idhula enna             
##  7 data feminism nandaseth   2022-03-25 00:53:31 enna feminism           
##  8 data feminism danbouk     2022-03-24 14:42:25 lokendrachauhan keviriah
##  9 data feminism danbouk     2022-03-24 14:42:25 keviriah zephoria       
## 10 data feminism danbouk     2022-03-24 14:42:25 text aimed              
## # ... with 5,795 more rows
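The cleaning and bigram steps above can be traced on one toy sentence. This is a minimal sketch, assuming a shortened pattern (`toy_reg`) and a made-up tweet rather than the real data:

```r
library(dplyr)
library(stringr)
library(tidyr)
library(tidytext)

# Shortened, hypothetical version of the cleaning pattern
toy_reg <- "https?://[^\\s]+|&amp;|&lt;|&gt;|\\bRT\\b"

toy <- tibble(text = "RT Data feminism is about power &amp; justice https://t.co/abc")

toy_bigrams <- toy %>%
  mutate(text = str_replace_all(text, toy_reg, "")) %>%       # strip URL, entity, RT
  unnest_tokens(bigram, text, token = "ngrams", n = 2) %>%    # overlapping word pairs
  separate(bigram, into = c("first", "second"), sep = " ", remove = FALSE) %>%
  anti_join(stop_words, by = c("first" = "word")) %>%         # drop pairs with a stop word
  anti_join(stop_words, by = c("second" = "word"))

toy_bigrams
```

A pair such as "is about" is dropped because it contains a stop word, while "data feminism" survives. Setting `remove = FALSE` keeps the original `bigram` column next to its two halves.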

# Count bigrams per account in a new column n, keeping only those used at least 5 times
dataF_bigrams_count <- dataF_bigrams %>%
  group_by(screen_name, bigram, first, second)%>%
  summarise(n=n())%>%
  filter(n >= 5)%>%
  arrange(-n)%>%
  ungroup()


dataF_bigrams_count %>%
  count(bigram, sort = TRUE)

## # A tibble: 145 x 2
##    bigram                    n
##    <chr>                 <int>
##  1 iot iiot                  3
##  2 100daysofcode femtech     2
##  3 ai ml                     2
##  4 analytics rstats          2
##  5 ar ml                     2
##  6 cloud bigdata             2
##  7 femtech ar                2
##  8 flutter javascript        2
##  9 iiot nlp                  2
## 10 java 100daysofcode        2
## # ... with 135 more rows

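Again, we see a high prevalence of Data Science terms. Note that n in this last table counts the accounts that used each bigram at least five times, not the total number of times the bigram appeared.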
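The two-stage count can be easier to follow on a small example. A minimal sketch on toy data, with the threshold lowered to 2 and with explicit column names (`n_uses`, `n_accounts`) that are assumptions chosen for readability:

```r
library(dplyr)

# Toy bigram table: user_a uses "iot iiot" three times, user_b twice,
# and user_b uses "ai ml" once
toy_bigrams <- tibble(
  screen_name = c(rep("user_a", 3), rep("user_b", 2), "user_b"),
  bigram      = c(rep("iot iiot", 3), rep("iot iiot", 2), "ai ml")
)

# Stage 1: how often each account used each bigram
per_account <- toy_bigrams %>%
  count(screen_name, bigram, name = "n_uses") %>%
  filter(n_uses >= 2)

# Stage 2: how many accounts passed the threshold for each bigram
per_bigram <- per_account %>%
  count(bigram, name = "n_accounts", sort = TRUE)

per_bigram
```

Here "ai ml" drops out at stage 1, and "iot iiot" ends with a count of 2 because two accounts passed the threshold, even though the bigram appears five times in total.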

4. MODEL

4a. GGRAPH of Bigrams

Here we will plot our bigrams as a network graph using the ggraph package. We can see that there are some central bigrams with radiating nodes.

# Keep only the columns needed to build the graph: the two words and their count
dataF_bigram_tbl <- dataF_bigrams_count %>%
  dplyr::select(c('first','second', 'n'))


bigram_graph <- dataF_bigram_tbl %>%
  filter(n > 10) %>%
  graph_from_data_frame()


set.seed(123)


p <- ggraph(bigram_graph, layout = "fr") +
  geom_edge_link(aes(edge_alpha = n), show.legend = FALSE) +
  geom_node_point(color = "lightblue", size = 5) +
  geom_node_text(aes(label = name), vjust = 1, hjust = 1) +
  theme_graph() +
  theme(legend.position = "none")

p
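To see what `graph_from_data_frame()` builds before it is plotted, here is a minimal sketch with a toy edge list. The words and counts in `toy_edges` are made up for illustration:

```r
library(igraph)

# Toy edge list: the first two columns become the nodes of each edge,
# and any further columns become edge attributes
toy_edges <- data.frame(
  first  = c("data", "iot", "ai"),
  second = c("feminism", "iiot", "ml"),
  n      = c(12, 15, 11)
)

toy_graph <- graph_from_data_frame(toy_edges)  # directed by default

vcount(toy_graph)  # number of distinct words (nodes)
ecount(toy_graph)  # number of bigrams (edges)
E(toy_graph)$n     # edge attribute carried over from the n column
```

Each distinct word becomes a node and each bigram becomes an edge from the first word to the second, which is why the `n` column can be mapped to `edge_alpha` in the plot.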

5. COMMUNICATE

Purpose - The purpose of this case study is to look at the social network of bigrams from a tweet data set on Data Feminism, pulled previously during the SoLAR conference. Understanding how information is shared within the network helps identify topics or themes for future research.

Methods - For this independent analysis I explored tweet bigrams, a text mining technique that looks at pairs of adjacent words.

Findings - Several top cluster themes stood out:

- Women Who Code
- femtech
- IoT (Internet of Things)
- ML (Machine Learning)

The words are paired by co-occurrence.

Discussion - A bigram network can show the general idea of the content gathered from Twitter posts. Insights from a case study like this may be used to guide public and private organizations looking to monitor how information about research or a product launch spreads. A bigram analysis of collected tweets may also surface terms that differ from those found by other analyses.

References: D’Ignazio, C., & Klein, L. F. (2020). Data Feminism. MIT Press.

What can Tweets tell us about Data Feminism?

Jeanne McClure

March 28, 2022