“I Love You Just as Much”: Emotional Valence and Composition in Owners’ Narratives About Dogs and Cats

Author

Xinru Wang

Published

March 2, 2026

Introduction

Understanding how people talk about their pets provides insight into the emotional structure of human–animal relationships. Although a growing body of research highlights the psychological importance of companion animals, less attention has been given to the emotional tone embedded in everyday narratives about different species. In particular, it remains unclear whether pet owners use distinct emotional language when discussing dogs compared to cats, which are among the most popular pet species.

The present study examines whether the overall emotional tone of language differs when participants describe their dogs versus their cats. Using interview transcripts from a broader study of multispecies families, pet-related narrative segments were extracted and analyzed with lexicon-based sentiment analysis tools (the sentimentr package and the NRC Emotion Lexicon). Sentiment scores were then compared across species to assess whether systematic differences in emotional expression emerge.

Required Packages

All data preparation, sentiment scoring, and statistical analyses were conducted in R. The following packages were used:

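library(readxl)      # read Excel files
library(tidyverse)   # dplyr, readr, tidyr, stringr, ggplot2, and related packages
library(sentimentr)  # sentiment scoring with valence shifters
library(stringr)     # string cleaning (also attached by tidyverse)
library(knitr)       # kable() tables
library(ggplot2)     # plotting (also attached by tidyverse)
library(tidytext)    # tokenization; get_sentiments("nrc") also requires the textdata package
library(lme4)        # mixed-effects models (lmer, glmer)
library(lmerTest)    # Satterthwaite df and p-values for lmer models
library(rcompanion)  # assorted statistical helper functions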

Data Preparation

Load and Setup Data

pet_chunks <- read_csv("sentiment_analysis_raw.csv")

Rows: 536 Columns: 4
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (2): species, raw_chunk
dbl (2): participant_id, chunk_id

ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

Basic Descriptives

First, we calculated descriptive statistics to characterize the analytic sample.

# count number of unique participants
pet_chunks %>%
  summarise(n = n_distinct(participant_id)) %>%
  pull(n)

[1] 20

# count total number of chunks
nrow(pet_chunks)

[1] 536

# create participant groups (Dog Only, Cat Only, Dog and Cat)
participant_groups <- pet_chunks %>%
  distinct(participant_id, species) %>%  # keep one row per participant per species
  group_by(participant_id) %>%
  summarise(
    n_species = n_distinct(species), # count how many species each participant discussed
    group = case_when(
      n_species == 2 ~ "Dog and Cat", # participants who discussed both species
      first(species) == "dog" ~ "Dog Only",
      first(species) == "cat" ~ "Cat Only",
      TRUE ~ NA_character_
    ),
    .groups = "drop"
  ) %>%
  # display group in this order
  mutate(group = factor(group, levels = c("Dog Only", "Cat Only", "Dog and Cat"))) 

# count number of participants in each group
participant_groups %>% count(group) %>% knitr::kable()

Table 1: Number of Participants by Pet Species Category

group	n
Dog Only	11
Cat Only	4
Dog and Cat	5

The analytic sample consisted of 20 unique participants contributing a total of 536 pet-related narrative chunks. Of these participants, 11 discussed only dogs, 4 discussed only cats, and 5 discussed both species. For participants who discussed both species, narrative segments were coded and analyzed at the chunk level, such that each segment was assigned to the species it referenced. Overall, most participants contributed narratives about a single species, with a smaller subset providing narratives about both dogs and cats.

Data Cleaning

Next, the narrative text was cleaned and standardized prior to sentiment analysis.

pet_chunks <- pet_chunks %>%
  mutate(clean_chunk = raw_chunk)

pet_chunks <- pet_chunks %>%
  mutate(
    clean_chunk = clean_chunk %>%
      str_remove_all("\\([^)]*\\)") %>%  # remove (laughs), (pause), etc.
      str_replace_all("\\s+", " ") %>%   # replace extra spaces or line breaks with a single space
      str_trim() %>% # remove extra spaces at the beginning and end of each chunk 
      str_to_lower()) # convert text to lowercase to improve consistency for sentiment analysis

# create a unique identifier for each chunk
pet_chunks <- pet_chunks %>%
  mutate(chunk_uid = paste(participant_id, chunk_id, sep = "_"))

During data cleaning, parenthetical notations (e.g., “(laughs)”) were removed, runs of spaces and line breaks were collapsed to a single space, and all text was converted to lowercase. A unique identifier (participant ID and chunk ID joined by an underscore) was also created for each chunk to facilitate tracking across analyses. The cleaned text was used in all subsequent sentiment analyses.
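To make the effect of these steps concrete, the sketch below applies the same stringr calls to two hypothetical strings (not taken from the transcripts). The second example shows that the regular expression \([^)]*\) removes any text in parentheses, not only nonverbal notations such as “(laughs)”, so parenthetical speech is dropped as well.

```r
library(stringr)

# Hypothetical example strings, not from the study data
examples <- c(
  "I love him (laughs)  so   much.\n He's GREAT.",
  "My dog (the older one) sleeps all day."
)

# Same cleaning steps as the pipeline above, applied one at a time
cleaned <- str_remove_all(examples, "\\([^)]*\\)")  # drop anything in parentheses
cleaned <- str_replace_all(cleaned, "\\s+", " ")    # collapse whitespace runs and line breaks
cleaned <- str_trim(cleaned)                        # trim leading/trailing spaces
cleaned <- str_to_lower(cleaned)                    # lowercase

print(cleaned)
```

The first string becomes "i love him so much. he's great." and the second becomes "my dog sleeps all day.", with the parenthetical phrase removed entirely.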

Sentiment Analysis

1. Emotional Valence Analysis (`sentimentr`)

In this section, chunk-level sentiment scores were calculated using the sentimentr package.
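Unlike a simple count of positive and negative words, sentimentr weights each polarized word using nearby valence shifters (negators such as "not", amplifiers, de-amplifiers, and adversative conjunctions), drawing on its default polarity lexicon. A minimal illustration on two hypothetical sentences, not drawn from the interview data, shows negation reversing the sign of the score:

```r
library(sentimentr)

# Hypothetical sentences; splitting with get_sentences() first avoids re-splitting inside sentiment_by()
demo <- get_sentences(c("i love my cat.", "i do not love my cat."))

# One row per element, with an averaged sentiment score in ave_sentiment
demo_scores <- sentiment_by(demo)
print(demo_scores$ave_sentiment)
```

The first sentence receives a positive score and the negated version a negative one, which is why chunk-level scores reflect phrasing and not just word choice.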

# calculating average sentiment score by unique id 
chunk_sent <- sentiment_by(
  pet_chunks$clean_chunk,
  by = pet_chunks$chunk_uid)

# check the column names in the sentiment output
names(chunk_sent)

[1] "chunk_uid"     "word_count"    "sd"            "ave_sentiment"

# join chunk-level sentiment scores to the main data frame by chunk_uid
pet_chunks <- pet_chunks %>%
  left_join(chunk_sent, by = "chunk_uid")

# calculate and display descriptive statistics (mean, SD, and sample size) for chunk-level sentiment scores by species.
pet_chunks %>%
  group_by(species) %>%
  summarise(
    mean_sentiment = round(mean(ave_sentiment), 3),
    sd_sentiment = round(sd(ave_sentiment), 3),
    n = n()
  ) %>%
  knitr::kable()

Table 2: Mean and Standard Deviation of Chunk-Level Sentiment Scores by Species

species	mean_sentiment	sd_sentiment	n
cat	0.091	0.162	167
dog	0.114	0.156	369

Descriptive statistics showed that the mean sentiment score for dog-related chunks was 0.114 (SD = 0.156, n = 369), whereas the mean for cat-related chunks was 0.091 (SD = 0.162, n = 167).
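A naive standardized effect size gives a sense of scale for this difference. The sketch below computes Cohen's d from the rounded values in Table 2 using a pooled standard deviation. It ignores the clustering of chunks within participants, so it is only a rough descriptive guide and not a substitute for the mixed-effects models reported below.

```r
# Descriptive values copied from Table 2 (rounded to three decimals)
m_dog <- 0.114; sd_dog <- 0.156; n_dog <- 369
m_cat <- 0.091; sd_cat <- 0.162; n_cat <- 167

# Pooled SD weights each group's variance by its degrees of freedom
pooled_sd <- sqrt(((n_dog - 1) * sd_dog^2 + (n_cat - 1) * sd_cat^2) /
                    (n_dog + n_cat - 2))

# Cohen's d: raw mean difference divided by the pooled SD
d <- (m_dog - m_cat) / pooled_sd
round(c(diff = m_dog - m_cat, pooled_sd = pooled_sd, d = d), 3)
```

The raw difference of 0.023 corresponds to d ≈ 0.15, below Cohen's conventional 0.2 benchmark for a small effect.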

2. Emotion Category Analysis (NRC Lexicon)

a. Tokenization

To examine discrete emotional categories, the cleaned text was tokenized and matched with the NRC emotion lexicon.

# 1. break each chunk into individual words while keeping the participant and chunk IDs
tokens <- pet_chunks %>%
  select(participant_id, chunk_uid, species, clean_chunk) %>%
  unnest_tokens(word, clean_chunk)

# 2. count the total number of words in each chunk (helps to adjust for chunk length in later analyses)
chunk_total_words <- tokens %>%
  count(participant_id, chunk_uid, species, name = "total_words")

# 3. load the NRC emotion lexicon
nrc <- get_sentiments("nrc")

# 4. keep only tokens that appear in the NRC lexicon (many-to-many: one word can map to several emotions)
nrc_tokens <- tokens %>%
  inner_join(nrc, by = "word", relationship = "many-to-many")

# 5. focus on discrete emotions only (exclude positive/negative)
nrc_discrete <- nrc_tokens %>%
  filter(!sentiment %in% c("positive", "negative"))

# 6. count emotion words per chunk and emotion category, then add each chunk's total word count
chunk_emotion_counts <- nrc_discrete %>%
  count(participant_id, chunk_uid, species, sentiment, name = "emotion_n") %>%
  left_join(chunk_total_words, by = c("participant_id", "chunk_uid", "species"))
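Because the NRC lexicon can assign one word to several categories, a single token may contribute to more than one emotion count, and a word repeated in a chunk is counted each time. The toy sketch below mirrors steps 4 to 6 on a hypothetical chunk; the lexicon entries are invented for illustration and are not the actual NRC assignments.

```r
library(dplyr)

# Hypothetical tokens for a single chunk ("love" appears twice)
toy_tokens <- tibble(
  chunk_uid = "1_1",
  word = c("love", "my", "dog", "love")
)

# Hypothetical lexicon entries (NOT the real NRC lexicon): "love" carries two labels
toy_lexicon <- tibble(
  word = c("love", "love", "dog"),
  sentiment = c("joy", "positive", "trust")
)

# Match tokens to the lexicon, then drop the positive/negative labels
toy_discrete <- toy_tokens |>
  inner_join(toy_lexicon, by = "word", relationship = "many-to-many") |>
  filter(!sentiment %in% c("positive", "negative"))

# One row per chunk per emotion category
toy_emotion_counts <- count(toy_discrete, chunk_uid, sentiment, name = "emotion_n")
print(toy_emotion_counts)
```

"my" has no lexicon entry and disappears in the join, the two occurrences of "love" yield joy = 2, and "dog" yields trust = 1.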

b. Descriptive Emotion Distribution

To explore the distribution of discrete emotional categories across species, the proportion of NRC emotion words was calculated separately for dog and cat narratives. For each species, emotion counts were converted to proportions to account for differences in overall emotion word frequency. Figure 1 presents the descriptive distribution of discrete emotions across species.

# summarize the total number of emotion words for each emotion category within each species
emotion_summary <- chunk_emotion_counts %>%
  group_by(species, sentiment) %>%
  summarise(total_emotion_words = sum(emotion_n), .groups = "drop")

# convert raw counts to proportions within each species, which allows us to compare the relative distribution of emotions 
emotion_summary_prop <- emotion_summary %>%
  group_by(species) %>%
  mutate(prop = total_emotion_words / sum(total_emotion_words))

# create a bar plot showing the proportion of each emotion category by species
ggplot(emotion_summary_prop,
       aes(x = sentiment, y = prop, fill = species)) +
  geom_col(position = "dodge") +
  labs(x = "Emotion Category",
       y = "Proportion of Emotion Words",
       fill = "Species") +
  theme_minimal()

Figure 1: Distribution of NRC Discrete Emotions by Species (Descriptive)

Interpretation:

As shown in Figure 1, the overall pattern of discrete emotions was broadly similar across dog and cat narratives. Joy and trust emerged as the most frequent emotion categories for both species, whereas anger and disgust were relatively less common. Although minor visual differences were observed in certain categories (e.g., anticipation and surprise), the general distribution of emotions appeared comparable across species. Formal statistical tests were conducted to determine whether these observed differences were statistically meaningful.

Statistical Testing

1. Emotional Valence Comparison (Chunk-Level Analysis)

Analytic Question

Does the overall emotional tone differ when people talk about dogs versus cats?

Since participants contributed multiple narrative chunks, a linear mixed-effects model was fitted to examine whether overall emotional valence differed between dog and cat narratives while accounting for clustering within participants. A random intercept for participant was included to allow each individual to vary in their baseline level of emotional tone.

# fit linear mixed-effects model with participant as a random intercept
m_chunks <- lmer(ave_sentiment ~ species + (1 | participant_id),
              data = pet_chunks)
# 1. ave_sentiment is the outcome variable (sentiment score for each chunk)
# 2. species is the predictor (dog vs cat)
# 3. (1 | participant_id) accounts for the fact that chunks are grouped within participants
# 4. data = pet_chunks tells the model which dataset to use

summary(m_chunks)

Linear mixed model fit by REML. t-tests use Satterthwaite's method [
lmerModLmerTest]
Formula: ave_sentiment ~ species + (1 | participant_id)
   Data: pet_chunks

REML criterion at convergence: -443.4

Scaled residuals: 
    Min      1Q  Median      3Q     Max 
-3.9931 -0.6453 -0.1090  0.6055  4.7474 

Random effects:
 Groups         Name        Variance  Std.Dev.
 participant_id (Intercept) 5.314e-05 0.00729 
 Residual                   2.495e-02 0.15797 
Number of obs: 536, groups:  participant_id, 20

Fixed effects:
            Estimate Std. Error       df t value Pr(>|t|)    
(Intercept)  0.09045    0.01251 27.21218   7.232 8.48e-08 ***
speciesdog   0.02401    0.01504 33.64186   1.596     0.12    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Correlation of Fixed Effects:
           (Intr)
speciesdog -0.826

Interpretation:

A linear mixed-effects model was fitted to examine whether overall emotional valence differed by species while accounting for clustering of narrative chunks within participants. Species was not a significant predictor of valence, b = 0.024, SE = 0.015, t = 1.60, p = .12. Although dog-related chunks showed slightly higher average sentiment scores than cat-related chunks, this difference was small and not statistically significant after accounting for participant-level variation.
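The participant random-intercept variance in this model is very small relative to the residual variance. One way to quantify this is the intraclass correlation (ICC), the share of total variance attributable to participants. The sketch below computes it from the variance components printed above; in a live session, as.data.frame(VarCorr(m_chunks)) returns the same components.

```r
# Variance components copied from summary(m_chunks)
var_participant <- 5.314e-05  # participant_id (Intercept) variance
var_residual    <- 2.495e-02  # residual variance

# Proportion of variance in chunk sentiment that lies between participants
icc_chunks <- var_participant / (var_participant + var_residual)
round(icc_chunks, 4)
```

An ICC of about 0.002 means roughly 0.2% of the variation in chunk sentiment is between participants, which helps explain why the model's species estimate (0.024) is close to the raw difference in means (0.114 − 0.091 = 0.023).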

Robustness Check: Sentence-Level Analysis

As a robustness check, emotional valence was reanalyzed at the sentence level to determine whether the findings were sensitive to the choice of chunk segmentation. A linear mixed-effects model was again used to account for clustering of sentences within participants.
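One detail worth checking in this step: to our understanding, tidytext's unnest_tokens(..., token = "sentences") relies on ICU sentence boundaries (via the tokenizers and stringi packages), and under Unicode sentence-boundary rules a period followed by a lowercase word is not treated as a break. Because clean_chunk is already lowercased, some sentence boundaries may be missed. A minimal check, assuming stringi is installed and using hypothetical text:

```r
library(stringi)

# Same hypothetical text, differing only in letter case
lower_case <- stri_split_boundaries("i love my dog. she is great.", type = "sentence")[[1]]
mixed_case <- stri_split_boundaries("I love my dog. She is great.", type = "sentence")[[1]]

# Unicode rule SB8: "." + space + lowercase letter is not a sentence break,
# so the lowercased text stays in one piece while the mixed-case text splits in two
print(lower_case)
print(mixed_case)
```

If this matters for the robustness check, one option is to split sentences on text that still has its original capitalization and lowercase afterwards; "?" and "!" still produce breaks regardless of case.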

# split each cleaned chunk into individual sentences (this creates a new dataset where each row represents one sentence)
pet_sentences <- pet_chunks %>%
  unnest_tokens(sentence, clean_chunk, token = "sentences") %>%
  mutate(sentence_uid = row_number()) # assign a unique ID to each sentence

# calculates an average sentiment score for each sentence
sentence_sent <- sentiment_by(pet_sentences$sentence, by = pet_sentences$sentence_uid)

# merge sentence-level sentiment scores back into the dataset
pet_sentences <- pet_sentences %>%
  left_join(
    sentence_sent %>%
      select(sentence_uid, ave_sentiment),
    by = "sentence_uid",
    suffix = c("", "_sentence")
  )

# rename the sentiment variable for clarity
pet_sentences <- pet_sentences %>%
  rename(sentence_sentiment = ave_sentiment_sentence)


m_sentences <- lmer(
  sentence_sentiment ~ species + (1 | participant_id) + (1 | chunk_uid),
  data = pet_sentences
)
# Fixed effect: species (dog vs. cat)
# Random intercepts: participant_id (accounts for repeated participants)
#                    chunk_uid (accounts for sentences nested within chunks)

summary(m_sentences)

Linear mixed model fit by REML. t-tests use Satterthwaite's method [
lmerModLmerTest]
Formula: sentence_sentiment ~ species + (1 | participant_id) + (1 | chunk_uid)
   Data: pet_sentences

REML criterion at convergence: -481.6

Scaled residuals: 
    Min      1Q  Median      3Q     Max 
-4.2408 -0.5183 -0.1086  0.5286  3.9399 

Random effects:
 Groups         Name        Variance  Std.Dev.
 chunk_uid      (Intercept) 0.0046509 0.06820 
 participant_id (Intercept) 0.0001295 0.01138 
 Residual                   0.0277856 0.16669 
Number of obs: 825, groups:  chunk_uid, 536; participant_id, 20

Fixed effects:
            Estimate Std. Error       df t value Pr(>|t|)    
(Intercept)  0.08681    0.01265 24.80038   6.861 3.58e-07 ***
speciesdog   0.01392    0.01522 32.14718   0.914    0.367    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Correlation of Fixed Effects:
           (Intr)
speciesdog -0.820

# confidence intervals
confint(m_sentences)

Computing profile confidence intervals ...

                  2.5 %     97.5 %
.sig01       0.04373878 0.08884448
.sig02       0.00000000 0.03179067
.sigma       0.15605770 0.17784810
(Intercept)  0.06176241 0.11132778
speciesdog  -0.01562290 0.04399845

Interpretation:

Consistent with the chunk-level analysis, a sentence-level linear mixed-effects model with random intercepts for participants and chunks revealed no significant effect of species on sentiment, b = 0.014, SE = 0.015, t = 0.91, p = .37. The 95% profile confidence interval for the species effect included zero [−0.016, 0.044], indicating no reliable difference in emotional tone between dog and cat narratives at the sentence level. Together with the chunk-level results, this suggests that the absence of a species difference in emotional valence did not depend on how the narratives were segmented.
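The p-values in the lmerTest summaries are two-sided tail probabilities of a t distribution with the reported (fractional) Satterthwaite degrees of freedom. As a check on the values reported in the text, they can be recomputed from the printed t statistics and df:

```r
# t statistics and Satterthwaite df copied from the two model summaries
t_vals <- c(chunk = 1.596, sentence = 0.914)
dfs    <- c(chunk = 33.64186, sentence = 32.14718)

# Two-sided p-value: twice the lower-tail probability of -|t|
p_vals <- setNames(2 * pt(-abs(t_vals), df = dfs), names(t_vals))
round(p_vals, 3)
```

The results come out at about .12 for the chunk-level model and about .37 for the sentence-level model, matching the Pr(>|t|) columns.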

2. Emotion Category Comparison (Chunk-Level Analysis)

Extending the valence analysis to discrete emotion categories, a generalized linear mixed-effects model with a Poisson distribution and log link was fitted to test whether the rates of emotion words differed by species.
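Written out in our own notation (not the author's), the model fitted below corresponds to:

$$
\log \mathbb{E}[y_{ij}] = \log w_{ij} + \beta_0 + \beta_1\,\mathrm{dog}_{ij} + \sum_{k=1}^{7} \gamma_k\,\mathrm{emo}_{k,ij} + \sum_{k=1}^{7} \delta_k\,(\mathrm{dog}_{ij} \cdot \mathrm{emo}_{k,ij}) + u_i, \qquad u_i \sim \mathcal{N}(0, \sigma_u^2)
$$

Here $y_{ij}$ is emotion_n for row $j$ of participant $i$, $w_{ij}$ is total_words, $\mathrm{dog}_{ij}$ indicates a dog-related chunk (cat is the reference level), and $\mathrm{emo}_{k,ij}$ indicates the $k$-th of the seven non-reference emotion categories (anger is the reference level). Because the offset enters with a fixed coefficient of 1, the model describes emotion words per word of text, and exponentiated coefficients are rate ratios: $e^{\beta_1}$ is the dog/cat rate ratio for anger, and $e^{\beta_1 + \delta_k}$ is the corresponding ratio for category $k$. For example, the reported $\hat\beta_1 = 0.08145$ gives $e^{0.08145} \approx 1.085$.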

Analytic Question

Are certain emotions used more often when people talk about dogs compared to cats?

m_nrc <- glmer(
  emotion_n ~ species * sentiment + (1 | participant_id),
  offset = log(total_words),
  family = poisson,
  data = chunk_emotion_counts,
  control = glmerControl(optimizer = "bobyqa")
)
# 1. emotion_n is the outcome variable (number of emotion words in each chunk)
# 2. species and sentiment are the predictors (dog vs cat, and emotion category)
# 3. species * sentiment includes both main effects and their interaction; the interaction tests whether species differences vary across emotion categories
# 4. (1 | participant_id) accounts for chunks being grouped within participants
# 5. offset = log(total_words) adjusts for differences in chunk length
# 6. family = poisson is used because the outcome is a count variable
# 7. data = chunk_emotion_counts tells the model which dataset to use

summary(m_nrc)

Generalized linear mixed model fit by maximum likelihood (Laplace
  Approximation) [glmerMod]
 Family: poisson  ( log )
Formula: emotion_n ~ species * sentiment + (1 | participant_id)
   Data: chunk_emotion_counts
 Offset: log(total_words)
Control: glmerControl(optimizer = "bobyqa")

      AIC       BIC    logLik -2*log(L)  df.resid 
   7615.0    7710.9   -3790.5    7581.0      2073 

Scaled residuals: 
    Min      1Q  Median      3Q     Max 
-2.4957 -0.5970  0.0124  0.7120  7.6370 

Random effects:
 Groups         Name        Variance Std.Dev.
 participant_id (Intercept) 0.05446  0.2334  
Number of obs: 2090, groups:  participant_id, 20

Fixed effects:
                                 Estimate Std. Error z value Pr(>|z|)    
(Intercept)                      -4.46961    0.11498 -38.873  < 2e-16 ***
speciesdog                        0.08145    0.12464   0.653  0.51346    
sentimentanticipation             0.34363    0.11537   2.979  0.00290 ** 
sentimentdisgust                  0.10006    0.13134   0.762  0.44617    
sentimentfear                     0.25492    0.12217   2.087  0.03692 *  
sentimentjoy                      0.58107    0.10959   5.302 1.15e-07 ***
sentimentsadness                  0.37268    0.11801   3.158  0.00159 ** 
sentimentsurprise                -0.11588    0.13435  -0.863  0.38840    
sentimenttrust                    0.60656    0.10894   5.568 2.58e-08 ***
speciesdog:sentimentanticipation  0.10650    0.13621   0.782  0.43428    
speciesdog:sentimentdisgust      -0.10090    0.15896  -0.635  0.52557    
speciesdog:sentimentfear         -0.12360    0.14638  -0.844  0.39846    
speciesdog:sentimentjoy          -0.01937    0.13118  -0.148  0.88262    
speciesdog:sentimentsadness      -0.07603    0.14084  -0.540  0.58932    
speciesdog:sentimentsurprise      0.18989    0.15787   1.203  0.22905    
speciesdog:sentimenttrust        -0.00984    0.12994  -0.076  0.93964    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1


Correlation matrix not shown by default, as p = 16 > 12.
Use print(x, correlation=TRUE)  or
    vcov(x)        if you need it
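Note that count() returns only chunk-emotion combinations that occur at least once, so chunk_emotion_counts contains no zero counts and omits chunks with no NRC emotion words at all. This is consistent with the 2,090 observations above, fewer than the 536 × 8 = 4,288 possible chunk-emotion rows. If zero counts are meant to be part of the Poisson outcome, one way to restore them is sketched below on hypothetical toy data; whether to include zeros is an analytic decision, and refitting would change the estimates shown above.

```r
library(dplyr)
library(tidyr)

# Hypothetical toy data, not the study data.
# All chunks with their total word counts (analogous to chunk_total_words)
toy_totals <- tibble(
  participant_id = 1,
  chunk_uid = c("1_1", "1_2", "1_3"),
  species = "dog",
  total_words = c(40L, 25L, 12L)
)

# Nonzero emotion counts only, as count() returns them
toy_counts <- tibble(
  participant_id = 1,
  chunk_uid = c("1_1", "1_1", "1_2"),
  species = "dog",
  sentiment = c("joy", "fear", "joy"),
  emotion_n = c(2L, 1L, 3L)
)

# In the real data this would be all eight NRC discrete emotion categories
emotions <- c("fear", "joy")

# Every chunk x emotion combination, with missing combinations filled as zero counts
toy_full <- crossing(toy_totals, sentiment = emotions) |>
  left_join(toy_counts, by = c("participant_id", "chunk_uid", "species", "sentiment")) |>
  mutate(emotion_n = coalesce(emotion_n, 0L))

print(toy_full)
```

Chunk "1_3", which has no emotion words, now contributes two zero-count rows, and chunk "1_2" gains a zero for fear.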

Interpretation:

A Poisson generalized linear mixed-effects model, adjusting for chunk length and participant-level clustering, showed that some emotion categories occurred more frequently than others: in cat narratives (the reference species level), trust (b = 0.607) and joy (b = 0.581) words occurred at higher rates than anger words, the reference category (both ps < .001). Because the model includes a species × emotion interaction, the species coefficient represents the dog–cat difference for anger only, and this difference was not significant, b = 0.081, SE = 0.125, z = 0.65, p = .51. None of the seven species × emotion interaction terms was significant (all ps > .22), suggesting that the emotional profiles of language used to describe dogs and cats were similar.
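Because the interaction is spread over seven coefficients, a single omnibus test of whether emotion profiles differ by species compares models with and without the species × sentiment term. The sketch below shows this likelihood ratio test on simulated toy data; base R glm() is swapped in for lme4::glmer() to keep it self-contained, so it has no participant random effect.

```r
# Simulated toy data (NOT the study data)
set.seed(1)
toy <- expand.grid(
  species = c("cat", "dog"),
  sentiment = c("anger", "joy", "trust"),
  rep = 1:50
)
toy$total_words <- sample(20:80, nrow(toy), replace = TRUE)
# Counts generated with one per-word rate everywhere, i.e. no true interaction
toy$emotion_n <- rpois(nrow(toy), lambda = 0.02 * toy$total_words)

# Full model: species, emotion category, and their interaction, with a length offset
m_full <- glm(emotion_n ~ species * sentiment + offset(log(total_words)),
              family = poisson, data = toy)

# Reduced model: drop only the interaction (update() keeps the offset term)
m_main <- update(m_full, . ~ . - species:sentiment)

# Likelihood ratio (deviance) test of all interaction terms at once
lrt <- anova(m_main, m_full, test = "Chisq")
print(lrt)
```

For the fitted model, the analogous comparison would presumably be m_nrc_main <- update(m_nrc, . ~ . - species:sentiment) followed by anova(m_nrc_main, m_nrc), which lme4 reports as a likelihood ratio test on 7 df. This is a suggested check, not part of the original analysis, and its result is not reported here.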

Discussion

Across both the valence and the emotion-category analyses, species was not a reliable predictor of emotional language. The chunk-level and sentence-level analyses revealed no significant differences in overall emotional valence between dog and cat narratives. Similarly, although discrete emotion categories varied in overall frequency, none of the species × emotion interaction terms in the generalized linear mixed-effects model was significant after accounting for participant-level clustering and chunk length, so there was no evidence that the distribution of specific emotion types differed by species. Together, these findings suggest that participants expressed comparable emotional tone and emotion profiles when discussing dogs and cats.
