Problem:

Filter a tibble down to the rows that contain the hashtag #caa

Background:

When we download Twitter data for analysis, it includes the hashtags associated with each tweet. The data packs those hashtags (zero or more per tweet) into a single comma-separated string.

Example: "#red, #white, #caa, #blue"

Note that this is one single string, not a vector of separate hashtags.

I need to filter my tibble to just those rows which contain ‘#caa’.

Here is a generic four-row tibble whose hashtags variable holds one long string per row, as described above. Splitting out each hashtag then turns hashtags into a list-column.

library(tidyverse)
tb <- tibble("numbers" = c(20, 33, 28, 23),
             "sex" = c("m", "f", "f", "m"),
             "hashtags"= c("#caa", "#red, #yellow","","#red, #caa")
             )

# Split each comma-separated string into a character vector of hashtags
tb$hashtags <- str_split(tb$hashtags, ", ")
view(tb)

Discussion

I could grep for #caa and call it a day. That would work, but I prefer to slice up the string into multiple hashtags for further analysis.
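For comparison, the grep route can be sketched with stringr's str_detect() on the unsplit string. This is an illustrative sketch, not the approach used in the rest of this note; the `\\b` word boundary is an assumption added so that a longer tag such as #caa2020 is not counted as #caa.

```r
library(tibble)
library(dplyr)
library(stringr)

tb_raw <- tibble(
  numbers  = c(20, 33, 28, 23),
  sex      = c("m", "f", "f", "m"),
  hashtags = c("#caa", "#red, #yellow", "", "#red, #caa")
)

# Regex match on the raw comma-separated string.
# Assumption: "\\b" is added so "#caa" does not match inside e.g. "#caa2020"
caa_rows <- tb_raw %>% filter(str_detect(hashtags, "#caa\\b"))
caa_rows$numbers
```

This keeps rows 1 and 4, but leaves the hashtags as one string, which is why splitting is preferred here.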

I str_split() each string into its individual hashtags and then iterate over the resulting list of character vectors to check whether #caa, or any other hashtag, is present.

The problem comes when str_split() is run: it returns a list with one element per row of the original tibble, each element being a character vector of that row's hashtags.
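A small sketch of that structure, using the same four hashtag strings as the example tibble:

```r
library(stringr)

hashtags <- c("#caa", "#red, #yellow", "", "#red, #caa")
split_tags <- str_split(hashtags, ", ")

# One list element per input string, i.e. per tibble row
length(split_tags)   # 4
# Each element is a character vector holding that row's hashtags
split_tags[[2]]      # c("#red", "#yellow")
split_tags[[4]]      # c("#red", "#caa")
```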

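Checking each list element for a value is surprisingly hard. filter("#caa" %in% tb$hashtags) returns the entire tibble: %in% tests "#caa" against the list-column as a whole and here yields a single TRUE rather than one value per row, and filter() recycles that single value to every row. So we need to look elsewhere.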
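A quick check of that pitfall: the whole-column test gives back one logical, while testing each list element separately gives one logical per row. vapply() is used here only because it fixes the result type to logical; it is a stand-in for the per-row loop, not the method from the linked answer.

```r
library(stringr)

hashtags <- str_split(c("#caa", "#red, #yellow", "", "#red, #caa"), ", ")

# Whole-column test: a single logical for the entire list, not one per row
whole <- "#caa" %in% hashtags
length(whole)   # 1

# Per-row test: one logical per list element (row)
per_row <- vapply(hashtags, function(tags) "#caa" %in% tags, logical(1))
per_row         # TRUE FALSE FALSE TRUE
```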

Solution

sapply()

src: (https://stackoverflow.com/a/53086319/4858518)

For each row it returns TRUE if #caa is among that row's hashtags and FALSE otherwise; multiplying by 1 turns that into a 1 or 0.

So we use mutate() with sapply() to create a new column, ‘hasCaa’, holding a 1 or 0, and then filter on it to get our two desired rows.

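library(tidyverse)

tb <- tibble("numbers" = c(20, 33, 28, 23),
             "sex" = c("m", "f", "f", "m"),
             "hashtags"= c("#caa", "#red, #yellow","","#red, #caa")
             )

# Split each string into a character vector of hashtags (list-column)
tb$hashtags <- str_split(tb$hashtags, ", ")

# sapply() runs "#caa" %in% <row's hashtags> for every row; 1 * turns TRUE/FALSE into 1/0
tb <- tb %>% mutate(hasCaa = 1 * sapply(hashtags, `%in%`, x = "#caa"))

# Keep only the rows that contain #caa
tb <- tb %>% filter(hasCaa == 1)
view(tb)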
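A tidyverse-flavoured variant of the same per-row idea, offered as a sketch rather than the method from the linked answer: purrr's map_lgl() (swapped in here for sapply()) always returns one TRUE/FALSE per row, so filter() can use it directly without the helper hasCaa column or the 1 * conversion.

```r
library(tibble)
library(dplyr)
library(purrr)
library(stringr)

tb <- tibble(
  numbers  = c(20, 33, 28, 23),
  sex      = c("m", "f", "f", "m"),
  # list-column: one character vector of hashtags per row
  hashtags = str_split(c("#caa", "#red, #yellow", "", "#red, #caa"), ", ")
)

# map_lgl() returns exactly one logical per row, so filter() can use it as-is
caa_tb <- tb %>% filter(map_lgl(hashtags, ~ "#caa" %in% .x))
caa_tb$numbers   # 20 23
```

Because map_lgl() errors if any element does not yield a single logical, it also guards against the silent length-1 recycling seen with the whole-column %in% test.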
