Studying social media in the digital age can illuminate themes within organizations, topics and even agents of change.
Data Feminism is a specific topic of interest for me. “Data Feminism offers strategies for data scientists seeking to learn how feminism can help them work toward justice, and for feminists who want to focus their efforts on the growing field of data science. But Data Feminism is about much more than gender. It is about power, about who has it and who doesn’t, and about how those differentials of power can be challenged and changed” (D’Ignazio and Klein, 2020).
Guiding Questions:
Feminist Joni Seager asserts, “What gets counted, counts!” My guiding question is:
- What do recent tweets say about “Data Feminism” since the recent SoLAR conference?
I pulled data over a week after the SoLAR conference to try to understand what themes emerged.
First, we load the libraries that we will use to wrangle and then visualize our data. We will not call the Twitter API here because the data set was saved earlier.
library(tidyverse)
library(readxl)
library(tidytext)
library(textdata)
library(ggplot2)
library(kableExtra)
library(scales)
library(rtweet)
library(tidyr)
library(stringr)
library(vtree)
library(igraph)
library(ggraph)
library(tidygraph)
library(networkD3)
#SET PARAMETERS
#define colors to use throughout
my_colors <- c("#E69F00", "#56B4E9", "#009E73", "#CC79A7", "#D55E00", "#D65E00")
theme_plot <- function(aticks = element_blank(),
                       pgminor = element_blank(),
                       lt = element_blank(),
                       lp = "none")
{
  theme(plot.title = element_text(hjust = 0.5), #center the title
        axis.ticks = aticks, #set axis ticks to on or off
        panel.grid.minor = pgminor, #turn on or off the minor grid lines
        legend.title = lt, #turn on or off the legend title
        legend.position = lp) #turn on or off the legend
}
We will read in the previously pulled Data Feminism tweets, which were collected using these search terms:
- “#DataFeminism”
- “Data Feminism”
- “#AfrofeministDataFutures”

We have 491 tweets and 91 variables. We do not need all of them, so we will wrangle our data in the next section.
dataF_tweets <- read_excel("data/dataF_tweets.xlsx")
dataF_tidy <- dataF_tweets %>%
filter(lang == "en") %>%
select(screen_name, created_at, text)%>%
mutate(feminism = "data feminism") %>%
relocate(feminism)
dataF_tidy %>%
head()%>%
kbl(caption = "Restructured Data - Data Feminism data frame") %>%
kable_styling(bootstrap_options = c("striped", "hover"))
| feminism | screen_name | created_at | text |
|---|---|---|---|
| data feminism | nandaseth | 2022-03-25 00:53:31 | Data Feminism! Data. 0s and 1s Binary. Idhula enna Feminism? What karmam is this? https://t.co/qBLPu9hrqZ |
| data feminism | danbouk | 2022-03-24 14:42:25 | @lokendrachauhan @Keviriah @zephoria Thank you for reading! I most do history, but a great text aimed at thinking about building consensus and making data do good and useful things today is Data Feminism by @kanarinka & @laurenfklein https://t.co/4tIGU4sUx2 |
| data feminism | 14prinsp | 2022-03-24 13:52:58 | Congratulations for the organisers of #LAK22 for being courageous in opening spaces to look at #learninganalytics through lenses such as decolonisation and data feminism. Viva! |
| data feminism | 14prinsp | 2022-03-24 13:12:10 | First point of data feminism to consider in #learninganalytics is “In today’s world, data is power” @kanarinka #LAK22 The power is distributed unequally and those who are already vulnerable, are made even more vulnerable |
| data feminism | MamtaShah | 2022-03-24 13:52:09 | Thank you for a powerful and insightful keynote on data feminism @kanarinka https://t.co/AEqFSMb1SR #LAK22 https://t.co/7CCt7BnacG |
| data feminism | OlgaOvi | 2022-03-24 13:48:33 | “Data feminism requires an expanded definition of data visualization and data science!” #LAK22 @kanarinka |
We will tidy our text using the tidytext and dplyr packages, splitting it into tokens to create a table with one token per row. For single words, the token sits in a column called word; for bigrams, each token is a pair of adjacent words. Another step to tidy the text is to remove the most common stop words such as a, the, is, are, and, etc.
Before we break the text into bigrams, let’s inspect single-word tokens to see if there are any other nonsense words we need to eliminate.
#tokenize tweets
tweet_tokens <-
dataF_tidy %>%
unnest_tokens(output = word,
input = text,
token = "tweets")
#Tidy text by removing common stop words
dataFem_tweets <-
tweet_tokens %>%
anti_join(stop_words, by = "word")
dataFem_tweets%>%
head%>%
kbl(caption = "tokenized dataFem data frame") %>%
kable_styling(bootstrap_options = c("striped", "hover"))
| feminism | screen_name | created_at | word |
|---|---|---|---|
| data feminism | nandaseth | 2022-03-25 00:53:31 | data |
| data feminism | nandaseth | 2022-03-25 00:53:31 | feminism |
| data feminism | nandaseth | 2022-03-25 00:53:31 | data |
| data feminism | nandaseth | 2022-03-25 00:53:31 | 0s |
| data feminism | nandaseth | 2022-03-25 00:53:31 | 1s |
| data feminism | nandaseth | 2022-03-25 00:53:31 | binary |
Let’s look at the word counts for dataFem_tweets to see whether any themes emerge. We can see a high number of data science terms and hashtags. We can go ahead and create our bigrams in the next section.
dataFem_tweets %>%
count(word, sort = TRUE)
## # A tibble: 2,793 x 2
## word n
## <chr> <int>
## 1 #femtech 398
## 2 100daysofcode 124
## 3 #womenwhocode 122
## 4 #womenintech 109
## 5 #ml 103
## 6 #iot 94
## 7 #ai 72
## 8 #iiot 66
## 9 #python 66
## 10 #flutter 63
## # ... with 2,783 more rows
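To make the stop-word filter and the count easier to follow, here is a minimal sketch on two made-up tweets. The `toy_tweets` data frame and its text are invented for illustration, and the sketch uses the default word tokenizer rather than the tweet tokenizer above, so a hashtag would lose its `#` here.

```r
library(dplyr)
library(tidytext)

# Made-up tweets, for illustration only (not from the saved data set)
toy_tweets <- tibble(
  screen_name = c("user_a", "user_b"),
  text = c("Data is power and data is never neutral",
           "The keynote on data feminism was powerful")
)

toy_counts <- toy_tweets %>%
  unnest_tokens(output = word, input = text) %>%  # one lowercase word per row
  anti_join(stop_words, by = "word") %>%          # drop words such as "is", "and", "the"
  count(word, sort = TRUE)                        # most frequent words first

toy_counts
```

The word "data" appears three times across the two toy tweets, so it sorts to the top, while words like "is" and "the" never reach the count.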
Below, let’s create our bigrams by removing URLs, hashtag symbols, and other nonsense strings. Once the bigrams are counted and filtered further below, we have 145 observations and two variables.
# regex for stripping URLs, HTML entities, and a standalone RT
replace_reg <- "https?://[^\\s]+|&amp;|&lt;|&gt;|&d2l;|&aristotlemrs;|&aleks;|\\bRT\\b"
# split into word pairs
dataF_bigrams <- dataF_tidy %>%
mutate(text = str_replace_all(text, replace_reg, "")) %>%
unnest_tokens(bigram, text, token = "ngrams", n = 2)
# remove stop words
dataF_bigrams <- dataF_bigrams %>%
separate(bigram, into = c("first","second"), sep = " ", remove = FALSE) %>%
anti_join(stop_words, by = c("first" = "word")) %>%
anti_join(stop_words, by = c("second" = "word")) %>%
filter(str_detect(first, "[a-z]") &
str_detect(second, "[a-z]"))
bigrams_united <- dataF_bigrams%>%
unite(bigram, first, second, sep = " ")
bigrams_united
## # A tibble: 5,805 x 4
## feminism screen_name created_at bigram
## <chr> <chr> <dttm> <chr>
## 1 data feminism nandaseth 2022-03-25 00:53:31 data feminism
## 2 data feminism nandaseth 2022-03-25 00:53:31 feminism data
## 3 data feminism nandaseth 2022-03-25 00:53:31 data 0s
## 4 data feminism nandaseth 2022-03-25 00:53:31 1s binary
## 5 data feminism nandaseth 2022-03-25 00:53:31 binary idhula
## 6 data feminism nandaseth 2022-03-25 00:53:31 idhula enna
## 7 data feminism nandaseth 2022-03-25 00:53:31 enna feminism
## 8 data feminism danbouk 2022-03-24 14:42:25 lokendrachauhan keviriah
## 9 data feminism danbouk 2022-03-24 14:42:25 keviriah zephoria
## 10 data feminism danbouk 2022-03-24 14:42:25 text aimed
## # ... with 5,795 more rows
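The cleaning and stop-word steps above can be checked on a single made-up tweet. This is only a sketch: `toy_reg` is a simplified pattern assumed for the example (URLs and a standalone "RT"), and the tweet text and its URL are invented.

```r
library(dplyr)
library(tidyr)
library(stringr)
library(tidytext)

# Simplified cleaning pattern, assumed for this sketch only
toy_reg <- "https?://[^\\s]+|\\bRT\\b"

# Made-up tweet, for illustration only
toy <- tibble(text = "RT Data feminism is about power https://t.co/xxxx")

toy_bigrams <- toy %>%
  mutate(text = str_replace_all(text, toy_reg, "")) %>%        # strip URL and RT
  unnest_tokens(bigram, text, token = "ngrams", n = 2) %>%     # adjacent word pairs
  separate(bigram, into = c("first", "second"), sep = " ", remove = FALSE) %>%
  anti_join(stop_words, by = c("first" = "word")) %>%          # drop pairs starting with a stop word
  anti_join(stop_words, by = c("second" = "word"))             # drop pairs ending with a stop word

toy_bigrams
```

The cleaned text yields four bigrams ("data feminism", "feminism is", "is about", "about power"), and only "data feminism" survives because every other pair contains a stop word.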
#count bigrams per account in a new column called n; keep only those an account used at least 5 times
dataF_bigrams_count <- dataF_bigrams %>%
group_by(screen_name, bigram, first, second)%>%
summarise(n=n())%>%
filter(n >= 5)%>%
arrange(-n)%>%
ungroup()
dataF_bigrams_count %>%
count(bigram, sort = TRUE)
## # A tibble: 145 x 2
## bigram n
## <chr> <int>
## 1 iot iiot 3
## 2 100daysofcode femtech 2
## 3 ai ml 2
## 4 analytics rstats 2
## 5 ar ml 2
## 6 cloud bigdata 2
## 7 femtech ar 2
## 8 flutter javascript 2
## 9 iiot nlp 2
## 10 java 100daysofcode 2
## # ... with 135 more rows
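Again, we see a high prevalence of data science terms. Note that n in this last table is the number of accounts that used each bigram at least five times, not the total number of times the bigram appeared.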
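Because the counts are grouped by screen_name before the n >= 5 filter, a bigram repeated by one prolific account can pass the filter while a bigram used once each by several accounts does not. A small sketch with made-up rows (the account names and bigrams in `toy` are hypothetical) shows the difference between per-account and overall counts:

```r
library(dplyr)

# Made-up bigram rows, one row per occurrence (illustration only)
toy <- tibble(
  screen_name = c(rep("user_c", 6), "user_a", "user_b"),
  bigram      = c(rep("ai ml", 6), "data feminism", "data feminism")
)

# Per-account counts, keeping bigrams an account used at least 5 times
per_account <- toy %>%
  count(screen_name, bigram) %>%
  filter(n >= 5)

# Overall counts, regardless of who posted
overall <- toy %>%
  count(bigram, sort = TRUE)

per_account
overall
```

In this toy data only "ai ml" passes the per-account filter, even though "data feminism" was used by two different accounts. Whether that matters for the real tweets depends on how many high-volume accounts are in the data set.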
Here we will plot our bigrams in a network graph using the ggraph package. We can see that there are some central terms with nodes radiating from them.
# Rename and reorder columns (so we can make the graphs more easily)
dataF_bigram_tbl <- dataF_bigrams_count %>%
dplyr::select(c('first','second', 'n'))
bigram_graph <- dataF_bigram_tbl %>%
filter(n > 10) %>%
graph_from_data_frame()
set.seed(123)
p <- ggraph(bigram_graph, layout = "fr") +
geom_edge_link(aes(edge_alpha = n), show.legend = FALSE) +
geom_node_point(color = "lightblue", size = 5) +
geom_node_text(aes(label = name), vjust = 1, hjust = 1) +
theme_graph() +
theme(legend.position = "none")
p
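One way to put a number on which terms are "central" is node degree, the count of edges touching each node. Degree is an assumption of this sketch, not something computed in the analysis above, and the `toy_edges` values are made up; they only mirror the first/second/n shape of the edge table.

```r
library(igraph)

# Made-up edge list in the same first/second/n shape (illustration only)
toy_edges <- data.frame(
  first  = c("femtech", "femtech", "femtech", "iot"),
  second = c("ai", "ml", "python", "iiot"),
  n      = c(12, 15, 11, 20)
)

# The first two columns become edges; n is stored as an edge attribute
toy_graph <- graph_from_data_frame(toy_edges)

# Degree counts incoming plus outgoing edges for each node
node_degree <- sort(degree(toy_graph, mode = "all"), decreasing = TRUE)
node_degree
```

Here "femtech" has degree 3 and every other node has degree 1, which matches the hub-and-spoke shape a plot of this toy graph would show. The same call on bigram_graph would rank the real terms.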
Purpose - The purpose of this case study is to look at the **social network** of bigrams from a tweet data set on Data Feminism pulled previously during the SoLAR conference. Understanding how information is shared within the network is important for identifying topics or themes for future research.
Methods - For this independent analysis, I explored tweet bigrams, a text mining technique that pairs adjacent words.
Findings - Several top cluster themes stood out:
- Women who Code
- femtech
- IoT (Internet of Things)
- ML (Machine Learning)

The words are paired by co-occurrence.
Discussion - A bigram network can show the general idea of the content gathered from Twitter posts. Insights from a case study like this may guide public and private organizations looking to monitor how information about research or a product launch spreads. A bigram analysis of collected tweets may also surface terms that differ from those found by other analyses.
References: D’Ignazio, C., & Klein, L. F. (2020). Data Feminism. MIT Press.