A simple format like this works well, with one row per utterance and columns for an ID, a speaker label, and the transcript text:

| ID | Speaker | Transcript |
|---|---|---|
| 1 | Patient | I have been feeling tired lately and sleeping poorly |
| 2 | Patient | The medication helped but I still feel anxious |
| 3 | Clinician | Patient reports improvement in mood |

The same table can be built as an R data frame:

df <- data.frame(
  ID = c(1, 2, 3),
  Speaker = c("Patient", "Patient", "Clinician"),
  Transcript = c(
    "I have been feeling tired lately and sleeping poorly",
    "The medication helped but I still feel anxious",
    "Patient reports improvement in mood"
  )
)
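library(dplyr)       # %>% and count()
library(tidytext)    # unnest_tokens()
library(wordcloud2)  # wordcloud2()
words <- df %>%
  unnest_tokens(word, Transcript) %>%  # one lowercase word per row
  count(word, sort = TRUE)             # columns: word, n
wordcloud2(words)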
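
The pipeline above counts every word from every speaker, so common words such as "the" and "I" will tend to dominate the cloud. As a minimal sketch, assuming you only want the patient's words and are happy with the general-purpose `stop_words` list that ships with tidytext (both are assumptions, not part of the original example), the counts could be narrowed like this. The name `patient_words` is hypothetical.

```r
library(dplyr)
library(tidytext)

# Same illustrative data frame as above
df <- data.frame(
  ID = c(1, 2, 3),
  Speaker = c("Patient", "Patient", "Clinician"),
  Transcript = c(
    "I have been feeling tired lately and sleeping poorly",
    "The medication helped but I still feel anxious",
    "Patient reports improvement in mood"
  ),
  stringsAsFactors = FALSE
)

patient_words <- df %>%
  filter(Speaker == "Patient") %>%          # keep patient utterances only (assumption)
  unnest_tokens(word, Transcript) %>%       # one lowercase word per row
  anti_join(stop_words, by = "word") %>%    # drop common English stop words
  count(word, sort = TRUE)                  # columns: word, n
```

Passing `patient_words` to `wordcloud2()` in place of `words` would then draw the cloud from the filtered counts. A general stop-word list may remove terms that matter in a clinical setting, so it is worth checking what gets dropped before relying on the result.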