library(readxl)
library(tidyverse)
library(sentimentr)
library(stringr)
library(knitr)
library(ggplot2)
library(tidytext)
library(lme4)
library(lmerTest)
library(rcompanion)
“I Love You Just as Much”: Emotional Valence and Composition in Owners’ Narratives About Dogs and Cats
Introduction
Understanding how people talk about their pets provides insight into the emotional structure of human–animal relationships. Although a growing body of research highlights the psychological importance of companion animals, less attention has been given to the emotional tone embedded in everyday narratives about different species. In particular, it remains unclear whether pet owners use distinct emotional language when discussing dogs versus cats, two of the most popular pet species.
The present study examines whether the overall emotional tone of language differs when participants describe their dogs versus their cats. Using interview transcripts from a broader study of multispecies families, pet-related narrative segments were extracted and analyzed using lexicon-based sentiment analysis tools (e.g., sentimentr, NRC Emotion Lexicon). Sentiment scores were compared across species to assess whether systematic differences in emotional expression emerge.
Required Packages
All data preparation, sentiment scoring, and statistical analyses were conducted in R. The following packages were used: readxl, tidyverse, sentimentr, stringr, knitr, ggplot2, tidytext, lme4, lmerTest, and rcompanion.
Data Preparation
Load and Setup Data
pet_chunks <- read_csv("sentiment_analysis_raw.csv")
Rows: 536 Columns: 4
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (2): species, raw_chunk
dbl (2): participant_id, chunk_id
ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
Basic Descriptives
First, we calculated descriptive statistics to characterize the analytic sample.
# count number of unique participants
pet_chunks %>%
  summarise(n = n_distinct(participant_id)) %>%
  pull(n)
[1] 20
# count total number of chunks
nrow(pet_chunks)
[1] 536
# create participant groups (Dog Only, Cat Only, Dog and Cat)
participant_groups <- pet_chunks %>%
  distinct(participant_id, species) %>% # keep one row per participant per species
  group_by(participant_id) %>%
  summarise(
    n_species = n_distinct(species), # count how many species each participant discussed
    group = case_when(
      n_species == 2 ~ "Dog and Cat", # participant discussed both species
      first(species) == "dog" ~ "Dog Only",
      first(species) == "cat" ~ "Cat Only",
      TRUE ~ NA_character_
    ),
    .groups = "drop"
  ) %>%
  # display group in this order
  mutate(group = factor(group, levels = c("Dog Only", "Cat Only", "Dog and Cat")))
# count number of participants in each group
participant_groups %>% count(group) %>% knitr::kable()
| group | n |
|---|---|
| Dog Only | 11 |
| Cat Only | 4 |
| Dog and Cat | 5 |
The analytic sample consisted of 20 unique participants contributing a total of 536 pet-related narrative chunks. Of these participants, 11 discussed only dogs, 4 discussed only cats, and 5 discussed both species. For participants who discussed both species, narrative segments were coded and analyzed at the chunk level, such that each segment was assigned to the species it referenced. Overall, most participants contributed narratives about a single species, with a smaller subset providing narratives about both dogs and cats.
Data Cleaning
Next, the narrative text was cleaned and standardized prior to sentiment analysis.
pet_chunks <- pet_chunks %>%
  mutate(clean_chunk = raw_chunk)
pet_chunks <- pet_chunks %>%
  mutate(
    clean_chunk = clean_chunk %>%
      str_remove_all("\\([^)]*\\)") %>% # remove parenthetical notes such as (laughs), (pause), etc.
      str_replace_all("\\s+", " ") %>% # replace extra spaces or line breaks with a single space
      str_trim() %>% # remove extra spaces at the beginning and end of each chunk
      str_to_lower() # convert text to lowercase to improve consistency for sentiment analysis
  )
# create a unique identifier for each chunk
pet_chunks <- pet_chunks %>%
  mutate(chunk_uid = paste(participant_id, chunk_id, sep = "_"))
During data cleaning, parenthetical transcription notes (e.g., “(laughs)”) were removed, extra spaces and line breaks were collapsed to single spaces, and all text was converted to lowercase. A unique identifier was also created for each chunk to facilitate tracking across analyses. The cleaned text was used in all subsequent sentiment analyses.
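To make the effect of these cleaning steps concrete, the sketch below applies the same four stringr calls to a single made-up string. The example text is hypothetical and not taken from the transcripts.

```r
library(stringr)

# hypothetical raw text with transcription notes, uneven spacing, and a line break
raw_example <- "Well (laughs)  she's my   BABY.\n(pause) "

step1 <- str_remove_all(raw_example, "\\([^)]*\\)") # drop anything inside parentheses
step2 <- str_replace_all(step1, "\\s+", " ")        # collapse runs of whitespace
step3 <- str_trim(step2)                            # trim leading/trailing spaces
clean_example <- str_to_lower(step3)                # lowercase

clean_example
# [1] "well she's my baby."
```

Note that the pattern removes every parenthetical, so a substantive aside that a transcriber placed in parentheses would also be dropped. Whether that matters depends on the transcription conventions used.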
Sentiment Analysis
1. Emotional Valence Analysis (sentimentr)
In this section, chunk-level sentiment scores were calculated using the sentimentr package.
# calculate the average sentiment score for each chunk, keyed by its unique ID
chunk_sent <- sentiment_by(
  pet_chunks$clean_chunk,
  by = pet_chunks$chunk_uid)
# check column names in the sentiment output
names(chunk_sent)
[1] "chunk_uid" "word_count" "sd" "ave_sentiment"
# join sentiment score to main by chunk_uid
pet_chunks <- pet_chunks %>%
left_join(chunk_sent, by = "chunk_uid")
# calculate and display descriptive statistics (mean, SD, and sample size) for chunk-level sentiment scores by species
pet_chunks %>%
  group_by(species) %>%
  summarise(
    mean_sentiment = round(mean(ave_sentiment), 3),
    sd_sentiment = round(sd(ave_sentiment), 3),
    n = n()
  ) %>%
  knitr::kable()
| species | mean_sentiment | sd_sentiment | n |
|---|---|---|---|
| cat | 0.091 | 0.162 | 167 |
| dog | 0.114 | 0.156 | 369 |
Descriptive statistics showed that the mean sentiment score for dog-related chunks was 0.114 (SD = 0.156, n = 369), whereas the mean for cat-related chunks was 0.091 (SD = 0.162, n = 167).
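For readers unfamiliar with how sentimentr arrives at these scores, the package adjusts word polarity for valence shifters such as negators rather than simply summing word values. The minimal sketch below uses two made-up sentences (not study data) to show that a negated sentence scores lower than its affirmative counterpart.

```r
library(sentimentr)

# two hypothetical sentences that differ only in negation
examples <- c("I love my dog.", "I do not love my dog.")

# get_sentences() splits the text first, which is the input sentiment_by() expects
scores <- as.data.frame(sentiment_by(get_sentences(examples)))

scores[, c("element_id", "word_count", "ave_sentiment")]
```

The exact values depend on the installed lexicon version, so no numeric output is shown here.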
2. Emotion Category Analysis (NRC Lexicon)
a. Tokenization
To examine discrete emotional categories, the cleaned text was tokenized and matched with the NRC emotion lexicon.
# 1. break each chunk into individual words while keeping the participant and chunk IDs
tokens <- pet_chunks %>%
  select(participant_id, chunk_uid, species, clean_chunk) %>%
  unnest_tokens(word, clean_chunk)
# 2. count the total number of words in each chunk (helps to adjust for chunk length in later analyses)
chunk_total_words <- tokens %>%
  count(participant_id, chunk_uid, species, name = "total_words")
# 3. load the NRC emotion lexicon
nrc <- get_sentiments("nrc")
# 4. match tokens to the lexicon (inner_join keeps only tokens that appear in the NRC lexicon)
nrc_tokens <- tokens %>%
  inner_join(nrc, by = "word", relationship = "many-to-many")
# 5. focus on discrete emotions only (exclude positive/negative)
nrc_discrete <- nrc_tokens %>%
  filter(!sentiment %in% c("positive", "negative"))
# 6. create one row per chunk per emotion category that occurs in the chunk
chunk_emotion_counts <- nrc_discrete %>%
  count(participant_id, chunk_uid, species, sentiment, name = "emotion_n") %>%
  left_join(chunk_total_words, by = c("participant_id", "chunk_uid", "species"))
b. Descriptive Emotion Distribution
To explore the distribution of discrete emotional categories across species, the proportion of NRC emotion words was calculated separately for dog and cat narratives. For each species, emotion counts were converted to proportions to account for differences in overall emotion word frequency. Figure 1 presents the descriptive distribution of discrete emotions across species.
# summarize the total number of emotion words for each emotion category within each species
emotion_summary <- chunk_emotion_counts %>%
  group_by(species, sentiment) %>%
  summarise(total_emotion_words = sum(emotion_n), .groups = "drop")
# convert raw counts to proportions within each species, which allows us to compare the relative distribution of emotions
emotion_summary_prop <- emotion_summary %>%
  group_by(species) %>%
  mutate(prop = total_emotion_words / sum(total_emotion_words))
# create a bar plot showing the proportion of each emotion category by species
ggplot(emotion_summary_prop,
       aes(x = sentiment, y = prop, fill = species)) +
  geom_col(position = "dodge") +
  labs(x = "Emotion Category",
       y = "Proportion of Emotion Words",
       fill = "Species") +
  theme_minimal()
Interpretation:
As shown in Figure 1, the overall pattern of discrete emotions was broadly similar across dog and cat narratives. Joy and trust emerged as the most frequent emotion categories for both species, whereas anger and disgust were relatively less common. Although minor visual differences were observed in certain categories (e.g., anticipation and surprise), the general distribution of emotions appeared comparable across species. Formal statistical tests were conducted to determine whether these observed differences were statistically meaningful.
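One property of the chunk-by-emotion counts is worth keeping in mind before modeling. Because `count()` returns only combinations that actually occur, a chunk with no words from a given emotion category contributes no row for that category, so zero counts are absent (536 chunks × 8 categories would give 4,288 rows if every combination were present). If zero counts are wanted, one possible approach is `tidyr::complete()`. The sketch below uses a small hypothetical table, not the study data.

```r
library(dplyr)
library(tidyr)

# hypothetical toy counts: chunk "1_2" has no anger words, so it has no anger row
toy_counts <- tibble(
  chunk_uid   = c("1_1", "1_1", "1_2"),
  sentiment   = c("joy", "anger", "joy"),
  emotion_n   = c(3L, 1L, 2L),
  total_words = c(50L, 50L, 40L)
)

# add the missing chunk-by-emotion rows and set their counts to zero;
# nesting() keeps each chunk paired with its own total_words
toy_complete <- toy_counts %>%
  complete(nesting(chunk_uid, total_words), sentiment,
           fill = list(emotion_n = 0L))

toy_complete
```

Chunks with no NRC matches at all would not appear in the counts table in the first place, so they would need to be brought in from the full list of chunks before completing. Whether to include zeros is an analytic decision, and this is only a sketch of one option.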
Statistical Testing
1. Emotional Valence Comparison (Chunk-Level Analysis)
Does the overall emotional tone differ when people talk about dogs versus cats?
Since participants contributed multiple narrative chunks, a linear mixed-effects model was fitted to examine whether overall emotional valence differed between dog and cat narratives while accounting for clustering within participants. A random intercept for participant was included to allow each individual to vary in their baseline level of emotional tone.
# fit linear mixed-effects model with participant as a random intercept
m_chunks <- lmer(ave_sentiment ~ species + (1 | participant_id),
                 data = pet_chunks)
# 1. ave_sentiment is the outcome variable (sentiment score for each chunk)
# 2. species is the predictor (dog vs cat)
# 3. (1 | participant_id) accounts for the fact that chunks are grouped within participants
# 4. data = pet_chunks tells the model which dataset to use
summary(m_chunks)
Linear mixed model fit by REML. t-tests use Satterthwaite's method [
lmerModLmerTest]
Formula: ave_sentiment ~ species + (1 | participant_id)
Data: pet_chunks
REML criterion at convergence: -443.4
Scaled residuals:
Min 1Q Median 3Q Max
-3.9931 -0.6453 -0.1090 0.6055 4.7474
Random effects:
Groups Name Variance Std.Dev.
participant_id (Intercept) 5.314e-05 0.00729
Residual 2.495e-02 0.15797
Number of obs: 536, groups: participant_id, 20
Fixed effects:
Estimate Std. Error df t value Pr(>|t|)
(Intercept) 0.09045 0.01251 27.21218 7.232 8.48e-08 ***
speciesdog 0.02401 0.01504 33.64186 1.596 0.12
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Correlation of Fixed Effects:
(Intr)
speciesdog -0.826
Interpretation:
A linear mixed-effects model was fitted to examine whether overall emotional valence differed by species while accounting for clustering of narrative chunks within participants. Species was not a significant predictor of valence, b = 0.024, SE = 0.015, t = 1.60, p = .12. Although dog-related chunks showed slightly higher average sentiment scores than cat-related chunks, this difference was small and not statistically significant after accounting for participant-level variation.
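The size of the participant-level clustering can be summarized with an intraclass correlation (ICC), the share of total variance attributable to differences between participants. The short calculation below plugs in the variance components printed in the model summary above. It is a supplementary illustration, not a result reported by the original analysis.

```r
# variance components copied from the Random effects table of summary(m_chunks)
var_participant <- 5.314e-05
var_residual    <- 2.495e-02

# ICC = between-participant variance / total variance
icc <- var_participant / (var_participant + var_residual)
round(icc, 4)
# [1] 0.0021

# with the fitted model available, the same components could be read from
# as.data.frame(VarCorr(m_chunks)) instead of being typed in by hand
```

An ICC of roughly 0.002 indicates that almost none of the variation in chunk-level sentiment lay between participants in this model.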
Robustness Check: Sentence-Level Analysis
As a robustness check, emotional valence was reanalyzed at the sentence level to determine whether the findings were sensitive to the choice of chunk segmentation. A linear mixed-effects model was again used to account for clustering of sentences within participants.
# split each cleaned chunk into individual sentences (this creates a new dataset where each row represents one sentence)
pet_sentences <- pet_chunks %>%
  unnest_tokens(sentence, clean_chunk, token = "sentences") %>%
  mutate(sentence_uid = row_number()) # assign a unique ID to each sentence
# calculate an average sentiment score for each sentence
sentence_sent <- sentiment_by(pet_sentences$sentence, by = pet_sentences$sentence_uid)
# merge sentence-level sentiment scores back into the dataset
# (the suffix keeps the existing chunk-level ave_sentiment column distinct)
pet_sentences <- pet_sentences %>%
  left_join(
    sentence_sent %>%
      select(sentence_uid, ave_sentiment),
    by = "sentence_uid",
    suffix = c("", "_sentence")
  )
# rename the sentiment variable for clarity
pet_sentences <- pet_sentences %>%
  rename(sentence_sentiment = ave_sentiment_sentence)
m_sentences <- lmer(
  sentence_sentiment ~ species + (1 | participant_id) + (1 | chunk_uid),
  data = pet_sentences
)
# Fixed effect: species (dog vs. cat)
# Random intercepts: participant_id (accounts for repeated participants)
#                    chunk_uid (accounts for sentences nested within chunks)
summary(m_sentences)
Linear mixed model fit by REML. t-tests use Satterthwaite's method [
lmerModLmerTest]
Formula: sentence_sentiment ~ species + (1 | participant_id) + (1 | chunk_uid)
Data: pet_sentences
REML criterion at convergence: -481.6
Scaled residuals:
Min 1Q Median 3Q Max
-4.2408 -0.5183 -0.1086 0.5286 3.9399
Random effects:
Groups Name Variance Std.Dev.
chunk_uid (Intercept) 0.0046509 0.06820
participant_id (Intercept) 0.0001295 0.01138
Residual 0.0277856 0.16669
Number of obs: 825, groups: chunk_uid, 536; participant_id, 20
Fixed effects:
Estimate Std. Error df t value Pr(>|t|)
(Intercept) 0.08681 0.01265 24.80038 6.861 3.58e-07 ***
speciesdog 0.01392 0.01522 32.14718 0.914 0.367
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Correlation of Fixed Effects:
(Intr)
speciesdog -0.820
# confidence intervals
confint(m_sentences)
Computing profile confidence intervals ...
2.5 % 97.5 %
.sig01 0.04373878 0.08884448
.sig02 0.00000000 0.03179067
.sigma 0.15605770 0.17784810
(Intercept) 0.06176241 0.11132778
speciesdog -0.01562290 0.04399845
Interpretation:
Consistent with the chunk-level analysis, a sentence-level linear mixed-effects model revealed no significant effect of species on sentiment, b = 0.014, SE = 0.015, t = 0.91, p = .37. The 95% profile confidence interval for the species effect included zero [−0.016, 0.044], indicating no reliable difference in emotional tone between dog and cat narratives. These findings suggest that the chunk-level result was not sensitive to the choice of segmentation.
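As a quick plausibility check on the interval, a Wald-type 95% confidence interval can be computed by hand from the reported estimate and standard error. This normal approximation is a different method from the profile likelihood used by `confint()`, so the two need not match exactly. The sketch below uses only the values printed in the model summary.

```r
# species estimate and standard error copied from summary(m_sentences)
est <- 0.01392
se  <- 0.01522

# Wald interval: estimate +/- z * SE, with z from the standard normal
wald_ci <- est + c(-1, 1) * qnorm(0.975) * se
round(wald_ci, 3)
# [1] -0.016  0.044
```

To three decimals the Wald interval agrees with the profile interval, which is typical for fixed effects when the sample is reasonably large.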
2. Emotion Category Comparison (Chunk-Level Analysis)
Extending the valence analysis to discrete emotion categories, a Poisson generalized linear mixed-effects model with a log link was fitted to test whether emotion rates differed by species.
Are certain emotions used more often when people talk about dogs compared to cats?
m_nrc <- glmer(
  emotion_n ~ species * sentiment + (1 | participant_id),
  offset = log(total_words),
  family = poisson,
  data = chunk_emotion_counts,
  control = glmerControl(optimizer = "bobyqa")
)
# 1. emotion_n is the outcome variable (number of emotion words in each chunk)
# 2. species and sentiment are the predictors (dog vs cat, and emotion category)
# 3. species * sentiment tests whether emotion categories differ by species
# 4. (1 | participant_id) accounts for chunks being grouped within participants
# 5. offset = log(total_words) adjusts for differences in chunk length
# 6. family = poisson is used because the outcome is a count variable
# 7. data = chunk_emotion_counts tells the model which dataset to use
summary(m_nrc)
Generalized linear mixed model fit by maximum likelihood (Laplace
Approximation) [glmerMod]
Family: poisson ( log )
Formula: emotion_n ~ species * sentiment + (1 | participant_id)
Data: chunk_emotion_counts
Offset: log(total_words)
Control: glmerControl(optimizer = "bobyqa")
AIC BIC logLik -2*log(L) df.resid
7615.0 7710.9 -3790.5 7581.0 2073
Scaled residuals:
Min 1Q Median 3Q Max
-2.4957 -0.5970 0.0124 0.7120 7.6370
Random effects:
Groups Name Variance Std.Dev.
participant_id (Intercept) 0.05446 0.2334
Number of obs: 2090, groups: participant_id, 20
Fixed effects:
Estimate Std. Error z value Pr(>|z|)
(Intercept) -4.46961 0.11498 -38.873 < 2e-16 ***
speciesdog 0.08145 0.12464 0.653 0.51346
sentimentanticipation 0.34363 0.11537 2.979 0.00290 **
sentimentdisgust 0.10006 0.13134 0.762 0.44617
sentimentfear 0.25492 0.12217 2.087 0.03692 *
sentimentjoy 0.58107 0.10959 5.302 1.15e-07 ***
sentimentsadness 0.37268 0.11801 3.158 0.00159 **
sentimentsurprise -0.11588 0.13435 -0.863 0.38840
sentimenttrust 0.60656 0.10894 5.568 2.58e-08 ***
speciesdog:sentimentanticipation 0.10650 0.13621 0.782 0.43428
speciesdog:sentimentdisgust -0.10090 0.15896 -0.635 0.52557
speciesdog:sentimentfear -0.12360 0.14638 -0.844 0.39846
speciesdog:sentimentjoy -0.01937 0.13118 -0.148 0.88262
speciesdog:sentimentsadness -0.07603 0.14084 -0.540 0.58932
speciesdog:sentimentsurprise 0.18989 0.15787 1.203 0.22905
speciesdog:sentimenttrust -0.00984 0.12994 -0.076 0.93964
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Correlation matrix not shown by default, as p = 16 > 12.
Use print(x, correlation=TRUE) or
vcov(x) if you need it
Interpretation:
A Poisson generalized linear mixed-effects model was fitted with an offset for chunk length and a random intercept for participant. Because species and emotion category were treatment-coded, the species coefficient reflects the dog versus cat difference in the rate of emotion words for the reference category (anger); this difference was not significant, b = 0.081, SE = 0.125, z = 0.65, p = .51. Although some emotion categories occurred more frequently than others overall, none of the individual species-by-emotion interaction terms was significant (all ps > .22), suggesting that the emotional profiles of language used to describe dogs and cats were similar.
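The interpretation above rests on the individual interaction coefficients. One common way to test the interaction as a whole is a likelihood ratio test comparing models with and without the species-by-emotion term. The sketch below illustrates the idea on simulated data: every value is hypothetical, only three emotion categories are used, and no true interaction is built in. It is not a reanalysis of the study data.

```r
library(lme4)

set.seed(42)

# hypothetical design: 12 participants x 8 chunks, crossed with 3 emotion categories
chunks <- expand.grid(participant_id = 1:12, chunk = 1:8)
chunks$species <- ifelse(chunks$participant_id %% 2 == 0, "dog", "cat")
chunks$total_words <- sample(40:120, nrow(chunks), replace = TRUE)
sim <- merge(chunks, data.frame(sentiment = c("anger", "joy", "trust"))) # cross join

# simulate counts with a participant random intercept and emotion main effects only
u <- rnorm(12, sd = 0.2)
log_rate <- -4 + u[sim$participant_id] + 0.5 * (sim$sentiment != "anger")
sim$emotion_n <- rpois(nrow(sim), lambda = sim$total_words * exp(log_rate))

# model without the interaction
m_main <- glmer(emotion_n ~ species + sentiment + (1 | participant_id),
                offset = log(total_words), family = poisson, data = sim)
# model with the interaction
m_int <- glmer(emotion_n ~ species * sentiment + (1 | participant_id),
               offset = log(total_words), family = poisson, data = sim)

# likelihood ratio test of the species-by-emotion interaction as a whole
lrt <- anova(m_main, m_int)
print(lrt)
```

With the study's own objects, the analogous comparison would refit `m_nrc` with `species + sentiment` in place of `species * sentiment` and pass both models to `anova()`.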
Discussion
Across both analytic approaches, species was not a reliable predictor of emotional language. The chunk-level and sentence-level analyses revealed no significant differences in overall emotional valence between dog and cat narratives. Similarly, although discrete emotion categories varied in overall frequency, the generalized linear mixed-effects model indicated that the distribution of specific emotion types did not meaningfully differ by species after accounting for participant-level clustering and chunk length. Together, these findings suggest that participants expressed comparable emotional tone and emotion profiles when discussing dogs and cats.