Introduction

This analysis investigates how the emotional valence of TikTok #storytime videos influences their performance. Performance is measured using the ratio of shares to views. Our dataset consists of 3000 randomly sampled videos posted in 2024 from creators who used the #storytime hashtag. In addition to performance metrics, the dataset includes various user and video attributes such as follower count, likes, comments, video quality, arousal score, and valence score. Note that we are still waiting for the rest of the data to finish processing, which is why this analysis uses only 3000 of the 24000 total videos for now.

Dataset Description

User Metrics

  • authorMeta.digg: Number of videos/comments the user has liked.
  • authorMeta.fans: Number of followers.
  • authorMeta.following: Number of accounts the user follows.
  • authorMeta.friends: Number of mutual followers.
  • authorMeta.name: Name of the content creator.
  • authorMeta.verified: Account verified status (boolean).

Video Metrics

  • collectCount: Number of times the video was bookmarked/saved locally.
  • commentCount: Number of comments on the video.
  • createTime: Creation time (numeric timestamp).
  • diggCount: Number of likes on the video.
  • id: Unique post identifier.
  • isPinned: Whether the post was pinned (boolean).
  • playCount: Number of views.
  • shareCount: Number of shares.
  • text: Video caption (string).
  • videoMeta.definition: Definition quality of the video.
  • videoMeta.duration: Length of the video (seconds).
  • hasMention: Whether the post mentions someone else (boolean).
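
As a small illustration of working with createTime, the sketch below converts numeric timestamps to calendar dates. It assumes the values are Unix epoch seconds in UTC, which the dataset description does not state explicitly. The two inputs are the rounded minimum and maximum of createTime as printed in the summary statistics of this report.

```r
# Rounded minimum and maximum of createTime, copied from the summary output.
create_time <- c(min = 1.675062e9, max = 1.735689e9)

# Assumption: values are seconds since 1970-01-01 00:00:00 UTC.
posted_at <- as.POSIXct(create_time, origin = "1970-01-01", tz = "UTC")
posted_year <- format(posted_at, "%Y")

print(posted_at)
print(posted_year)
```

With these rounded inputs the earliest timestamp falls in early 2023, so it may be worth checking the sample against the intended 2024 window.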

Emotional Metrics

  • transcript.Arousal: Arousal score (0–10).
  • transcript.Valence: Valence score (–1 to 1).

Performance Metric

The performance metric we focus on is defined as:

\[ \text{Shares/View Ratio} = \frac{\text{shares}}{\text{views}} \]
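
A minimal sketch of how this ratio could be computed in R is shown here. The data frame `videos` and its values are illustrative only; the column names follow the dataset description, and the result column is named after the `shares_view_ratio` variable in the summary output.

```r
# Toy data frame with the two columns used by the metric
# (values are illustrative, not rows from the dataset).
videos <- data.frame(
  playCount  = c(182, 45800, 240425),
  shareCount = c(0, 23, 234)
)

# Shares per view. The smallest playCount in the sample is 182,
# so division by zero does not arise here.
videos$shares_view_ratio <- videos$shareCount / videos$playCount
print(videos)
```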

Summary Statistics

A quick look at the basic summary statistics for all variables:

##  authorMeta.digg authorMeta.fans    authorMeta.following authorMeta.friends
##  Min.   :    0   Min.   :   14500   Min.   :   0.0       Min.   :   0.0    
##  1st Qu.:    0   1st Qu.:  199600   1st Qu.:  69.0       1st Qu.:  23.0    
##  Median :    0   Median :  684400   Median : 216.0       Median :  77.0    
##  Mean   : 1263   Mean   : 1700593   Mean   : 504.2       Mean   : 157.4    
##  3rd Qu.:    0   3rd Qu.: 1800000   3rd Qu.: 671.0       3rd Qu.: 191.0    
##  Max.   :77300   Max.   :16700000   Max.   :4005.0       Max.   :1095.0    
##                                                                            
##  authorMeta.heart    authorMeta.id       authorMeta.name    authorMeta.verified
##  Min.   :    44700   Min.   :1.686e+06   Length:3000        Mode :logical      
##  1st Qu.:  7600000   1st Qu.:6.715e+18   Class :character   FALSE:2583         
##  Median : 22900000   Median :6.790e+18   Mode  :character   TRUE :417          
##  Mean   : 67481783   Mean   :5.850e+18                                         
##  3rd Qu.: 84900000   3rd Qu.:7.143e+18                                         
##  Max.   :549000000   Max.   :7.441e+18                                         
##                                                                                
##  authorMeta.video  collectCount     commentCount     createTime       
##  Min.   :  19     Min.   :     0   Min.   :    0   Min.   :1.675e+09  
##  1st Qu.: 356     1st Qu.:    20   1st Qu.:   10   1st Qu.:1.729e+09  
##  Median : 751     Median :   147   Median :   35   Median :1.733e+09  
##  Mean   :1022     Mean   :  3675   Mean   :  399   Mean   :1.730e+09  
##  3rd Qu.:1590     3rd Qu.:  1069   3rd Qu.:  167   3rd Qu.:1.734e+09  
##  Max.   :4503     Max.   :290400   Max.   :47100   Max.   :1.736e+09  
##                                                                       
##    diggCount             id             isPinned       apifyDownloadUrl  
##  Min.   :      2   Min.   :7.194e+18   Mode :logical   Length:3000       
##  1st Qu.:    330   1st Qu.:7.427e+18   FALSE:3000      Class :character  
##  Median :   2608   Median :7.443e+18                   Mode  :character  
##  Mean   :  49405   Mean   :7.431e+18                                     
##  3rd Qu.:  21025   3rd Qu.:7.449e+18                                     
##  Max.   :3300000   Max.   :7.455e+18                                     
##                                                                          
##  musicMeta.musicId   musicMeta.musicName   playCount          shareCount    
##  Min.   :0.000e+00   Length:3000         Min.   :     182   Min.   :     0  
##  1st Qu.:7.374e+18   Class :character    1st Qu.:    9232   1st Qu.:     4  
##  Median :7.431e+18   Mode  :character    Median :   45800   Median :    23  
##  Mean   :7.328e+18                       Mean   :  464261   Mean   :  1711  
##  3rd Qu.:7.446e+18                       3rd Qu.:  240425   3rd Qu.:   234  
##  Max.   :7.455e+18                       Max.   :20100000   Max.   :222800  
##                                                                             
##      text           videoMeta.definition videoMeta.duration videoMeta.height
##  Length:3000        Length:3000          Min.   :  2.0      Min.   : 480    
##  Class :character   Class :character     1st Qu.: 22.0      1st Qu.:1024    
##  Mode  :character   Mode  :character     Median : 66.0      Median :1024    
##                                          Mean   :102.1      Mean   :1013    
##                                          3rd Qu.:122.2      3rd Qu.:1024    
##                                          Max.   :592.0      Max.   :1280    
##                                                                             
##  videoMeta.width  webVideoUrl        hasMention      video_available
##  Min.   : 538.0   Length:3000        Mode :logical   Mode :logical  
##  1st Qu.: 576.0   Class :character   FALSE:2515      FALSE:14       
##  Median : 576.0   Mode  :character   TRUE :485       TRUE :2986     
##  Mean   : 582.4                                                     
##  3rd Qu.: 576.0                                                     
##  Max.   :1024.0                                                     
##                                                                     
##  transcripts        transcript.Valence transcript.Arousal  verified   
##  Length:3000        Min.   :-1.0000    Min.   : 0.0       FALSE:2583  
##  Class :character   1st Qu.:-0.5000    1st Qu.: 5.0       TRUE : 417  
##  Mode  :character   Median : 0.2000    Median : 6.0                   
##                     Mean   : 0.0542    Mean   : 6.2                   
##                     3rd Qu.: 0.5000    3rd Qu.: 7.0                   
##                     Max.   : 1.0000    Max.   :10.0                   
##                     NA's   :14         NA's   :14                     
##    pinned      mentions    shares_view_ratio  
##  FALSE:3000   FALSE:2515   Min.   :0.0000000  
##               TRUE : 485   1st Qu.:0.0002207  
##                            Median :0.0005751  
##                            Mean   :0.0017684  
##                            3rd Qu.:0.0015005  
##                            Max.   :0.0704280  
## 

For a more in-depth summary of the data, including missing values and distributions:

Data summary
Name data
Number of rows 3000
Number of columns 35
_______________________
Column type frequency:
character 7
factor 3
logical 4
numeric 21
________________________
Group variables None

Variable type: character

skim_variable n_missing complete_rate min max empty n_unique whitespace
authorMeta.name 0 1.00 6 19 0 66 0
apifyDownloadUrl 0 1.00 115 119 0 3000 0
musicMeta.musicName 0 1.00 2 69 0 446 0
text 114 0.96 1 1036 0 2558 0
videoMeta.definition 0 1.00 4 4 0 3 0
webVideoUrl 0 1.00 56 69 0 3000 0
transcripts 14 1.00 1 13149 0 2933 0

Variable type: factor

skim_variable n_missing complete_rate ordered n_unique top_counts
verified 0 1 FALSE 2 FAL: 2583, TRU: 417
pinned 0 1 FALSE 1 FAL: 3000
mentions 0 1 FALSE 2 FAL: 2515, TRU: 485

Variable type: logical

skim_variable n_missing complete_rate mean count
authorMeta.verified 0 1 0.14 FAL: 2583, TRU: 417
isPinned 0 1 0.00 FAL: 3000
hasMention 0 1 0.16 FAL: 2515, TRU: 485
video_available 0 1 1.00 TRU: 2986, FAL: 14

Variable type: numeric

skim_variable n_missing complete_rate mean sd p0 p25 p50 p75 p100 hist
authorMeta.digg 0 1 1.262570e+03 9.799710e+03 0.000000e+00 0.000000e+00 0.000000e+00 0.000000e+00 7.730000e+04 ▇▁▁▁▁
authorMeta.fans 0 1 1.700593e+06 2.577532e+06 1.450000e+04 1.996000e+05 6.844000e+05 1.800000e+06 1.670000e+07 ▇▁▁▁▁
authorMeta.following 0 1 5.041800e+02 7.172500e+02 0.000000e+00 6.900000e+01 2.160000e+02 6.710000e+02 4.005000e+03 ▇▁▁▁▁
authorMeta.friends 0 1 1.573600e+02 2.172200e+02 0.000000e+00 2.300000e+01 7.700000e+01 1.910000e+02 1.095000e+03 ▇▂▁▁▁
authorMeta.heart 0 1 6.748178e+07 9.750761e+07 4.470000e+04 7.600000e+06 2.290000e+07 8.490000e+07 5.490000e+08 ▇▁▁▁▁
authorMeta.id 0 1 5.850033e+18 2.542052e+18 1.686028e+06 6.715424e+18 6.789588e+18 7.143372e+18 7.440710e+18 ▂▁▁▁▇
authorMeta.video 0 1 1.021550e+03 8.378100e+02 1.900000e+01 3.560000e+02 7.510000e+02 1.590000e+03 4.503000e+03 ▇▃▂▁▁
collectCount 0 1 3.674940e+03 1.500483e+04 0.000000e+00 2.000000e+01 1.470000e+02 1.069250e+03 2.904000e+05 ▇▁▁▁▁
commentCount 0 1 3.990200e+02 1.657940e+03 0.000000e+00 1.000000e+01 3.500000e+01 1.670000e+02 4.710000e+04 ▇▁▁▁▁
createTime 0 1 1.730155e+09 7.311797e+06 1.675062e+09 1.729172e+09 1.732900e+09 1.734405e+09 1.735689e+09 ▁▁▁▁▇
diggCount 0 1 4.940526e+04 1.805710e+05 2.000000e+00 3.300000e+02 2.608500e+03 2.102500e+04 3.300000e+06 ▇▁▁▁▁
id 0 1 7.430946e+18 3.142727e+16 7.194338e+18 7.426739e+18 7.442734e+18 7.449215e+18 7.454726e+18 ▁▁▁▁▇
musicMeta.musicId 0 1 7.327568e+18 5.321731e+17 0.000000e+00 7.374449e+18 7.431010e+18 7.446180e+18 7.454725e+18 ▁▁▁▁▇
playCount 0 1 4.642608e+05 1.490896e+06 1.820000e+02 9.232000e+03 4.580000e+04 2.404250e+05 2.010000e+07 ▇▁▁▁▁
shareCount 0 1 1.711050e+03 1.088981e+04 0.000000e+00 4.000000e+00 2.300000e+01 2.340000e+02 2.228000e+05 ▇▁▁▁▁
videoMeta.duration 0 1 1.020500e+02 1.128600e+02 2.000000e+00 2.200000e+01 6.600000e+01 1.222500e+02 5.920000e+02 ▇▂▁▁▁
videoMeta.height 0 1 1.013430e+03 7.426000e+01 4.800000e+02 1.024000e+03 1.024000e+03 1.024000e+03 1.280000e+03 ▁▁▁▇▁
videoMeta.width 0 1 5.823800e+02 5.031000e+01 5.380000e+02 5.760000e+02 5.760000e+02 5.760000e+02 1.024000e+03 ▇▁▁▁▁
transcript.Valence 14 1 5.000000e-02 5.800000e-01 -1.000000e+00 -5.000000e-01 2.000000e-01 5.000000e-01 1.000000e+00 ▆▆▃▇▅
transcript.Arousal 14 1 6.200000e+00 1.490000e+00 0.000000e+00 5.000000e+00 6.000000e+00 7.000000e+00 1.000000e+01 ▁▂▇▇▁
shares_view_ratio 0 1 0.000000e+00 0.000000e+00 0.000000e+00 0.000000e+00 0.000000e+00 0.000000e+00 7.000000e-02 ▇▁▁▁▁

Exploratory Data Analysis

Distribution of Key Numeric Variables

We generate histograms for variables such as follower count, views, likes, shares, arousal, and valence.

Notes

Follower Count, Views, Likes, Shares

When we apply a log transform to Follower Count, Video Views, Video Likes, and Video Shares, we observe that the distributions become more balanced and closer to a bell shape. This suggests that these metrics are influenced by multiplicative changes rather than simple additive ones. For example, going from 1,000 to 10,000 followers may have a similar impact as moving from 10,000 to 100,000, because each is a tenfold increase. Consequently, if we decide to model these variables, a log transform is likely to be more appropriate than the raw counts.
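
The effect of the log transform can be sketched on simulated data. The vector `followers` below is drawn from a log-normal distribution as a stand-in for a heavy-tailed count; it is not the study data, and the `skewness` helper is defined here only for illustration.

```r
set.seed(1)
# Simulated heavy-tailed counts standing in for followers or views (not real data).
followers <- round(rlnorm(3000, meanlog = 13, sdlog = 1.5))

# Sample skewness: the third standardized moment (0 for a symmetric distribution).
skewness <- function(x) mean((x - mean(x))^3) / sd(x)^3

raw_skew <- skewness(followers)
log_skew <- skewness(log10(followers))
cat("skewness raw:", raw_skew, " log10:", log_skew, "\n")

# A tenfold increase is the same step on the log10 scale wherever it starts.
step_small <- log10(10000) - log10(1000)
step_large <- log10(100000) - log10(10000)
cat("log10 steps:", step_small, step_large, "\n")
```

Calling `hist(log10(followers))` on such data would draw the roughly bell-shaped histogram described in the note.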

Arousal and Valence

We also notice that Arousal and Valence are not as skewed as the engagement metrics and span a wide range of emotional states. Arousal tends to cluster in the mid-to-high range, whereas Valence ranges from negative to positive values, with relatively few values near zero. This variation indicates that the videos in our sample exhibit a diverse array of emotional tones, which is beneficial for studying how different emotional states might influence engagement.

Shares to Views Ratio

We calculate the performance metric (shares/view ratio) and examine its distribution.

Notes

Log Distribution of Shares/View Ratio

When we examine the Shares to Views Ratio on a log scale, the distribution appears more balanced than it would on a linear scale. This indicates that multiplicative changes play a key role in how frequently videos are shared relative to their total views. In other words, differences that look small in absolute terms (e.g., going from 0.001 to 0.002, which is a doubling) may be more meaningful than they would seem on a linear scale. By focusing on the log distribution, we can better identify patterns and outliers in how likely viewers are to share a video once they’ve watched it. It also suggests that percentage or proportional changes are central to engagement, and that any modeling we do on the shares/view ratio might benefit from a log transform.
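
One practical detail when taking logs is that the ratio can be exactly zero (the sample minimum is 0), and the log of zero is not finite. The sketch below shows one common workaround, adding a small constant before the transform. The constant `eps` is purely hypothetical; this report does not state how zeros were handled. The input values are the minimum, quartiles, and maximum from the summary statistics.

```r
# Min, quartiles, and max of shares_view_ratio from the summary output.
ratio <- c(0, 0.0002207, 0.0005751, 0.0015005, 0.0704280)

# log(0) is -Inf, which a linear model cannot use as an outcome.
print(log(ratio[1]))

# Hypothetical fix: add a small constant before taking logs.
# The value of eps is an assumption, not the choice made in the analysis.
eps <- 1e-6
ratio_log <- log(ratio + eps)
print(ratio_log)

# On the log scale a doubling is the same step anywhere on the axis.
print(log(0.002) - log(0.001))
print(log(0.02) - log(0.01))
```

The choice of constant affects where zero-share videos land on the log scale, so results should be checked for sensitivity to it.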

Emotional Metrics vs. Video Performance

We explore the relationship between the emotional scores and video performance using scatter plots with linear fits.

Notes

Valence vs. Log(Shares/View Ratio)

We notice that valence scores cluster both above and below zero, suggesting that videos can be emotionally positive or negative. Our initial linear trend is relatively flat, but the distribution hints that extreme valence—whether strongly positive or strongly negative—could be more influential on sharing behavior than moderate valence. To investigate this, we plan to explore nonlinear transformations (such as squaring valence or taking the absolute value) to see if extreme emotional tone correlates with higher shares.
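
The nonlinear transformations mentioned here can be sketched as follows. The data are simulated with a deliberately built-in U-shaped effect so that the comparison has something to detect; the variable names `valence` and `log_ratio` are placeholders, and none of the numbers are results from the study.

```r
set.seed(42)
n <- 500
# Simulated stand-ins for the real columns (not the study data).
valence <- runif(n, -1, 1)
# Build in a U-shaped effect: extreme valence raises the log ratio.
log_ratio <- -7.5 + 1.0 * valence^2 + rnorm(n, sd = 0.5)
sim <- data.frame(valence, log_ratio)

fit_linear <- lm(log_ratio ~ valence, data = sim)
fit_quad   <- lm(log_ratio ~ valence + I(valence^2), data = sim)
fit_abs    <- lm(log_ratio ~ abs(valence), data = sim)

# Lower AIC indicates a better trade-off between fit and complexity.
print(AIC(fit_linear, fit_quad, fit_abs))
print(coef(summary(fit_quad)))
```

A flat linear fit alongside a clearly positive coefficient on the squared term would be the pattern expected if extreme valence in either direction drives sharing.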

Arousal vs. Log(Shares/View Ratio)

Arousal tends to cluster above the midpoint of its 0–10 scale (median 6, interquartile range 5 to 7), and our scatter plot suggests a modest positive slope. This implies that videos with higher arousal levels may be shared slightly more often. Given that much of our data is already in the higher arousal range, we suspect that maintaining an elevated emotional intensity could contribute to better performance. We will investigate this further by refining our model, possibly incorporating interaction terms or more flexible functional forms to capture the nuances of how arousal impacts video sharing.

Boxplots for Categorical Variables

We can also examine differences in performance by categorical variables such as verified status and pinned posts.

Notes

Shares/View Ratio by Verified Status

Our boxplot shows that both verified and unverified accounts span a wide range of shares-to-views ratios. While verified accounts may exhibit a slightly higher median in the log-transformed ratio, the overlap is substantial. This suggests that being verified alone does not guarantee significantly higher sharing rates, although it might offer a modest advantage. Further statistical testing (e.g., a t-test or nonparametric test) would help determine whether the observed difference is meaningful or simply due to variability.
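
The follow-up tests suggested here could look like the sketch below. The data are simulated: the group sizes mirror the verified counts in the summary (2583 and 417), but the means and standard deviation are made up for illustration, so the printed p-values say nothing about the real sample.

```r
set.seed(7)
# Simulated log ratios; group sizes mirror the sample, values are invented.
sim <- data.frame(
  verified  = rep(c(FALSE, TRUE), times = c(2583, 417)),
  log_ratio = c(rnorm(2583, mean = -7.5, sd = 2.2),
                rnorm(417,  mean = -7.1, sd = 2.2))
)

# Welch two-sample t-test (does not assume equal variances).
tt <- t.test(log_ratio ~ verified, data = sim)
# Wilcoxon rank-sum test as a nonparametric alternative.
wt <- wilcox.test(log_ratio ~ verified, data = sim)

print(tt)
print(wt)
```

Because many videos share the same creator, observations are unlikely to be fully independent, which both tests assume; that caveat applies to any real run of this comparison.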

Shares/View Ratio by Pinned Status

This graph is not informative yet because none of the 3000 videos in the current sample is pinned, so no comparison between pinned and unpinned posts can be made. Once the rest of the data has finished processing, we can revisit whether pinned posts show a different shares-to-views ratio. As with verification, we may need additional modeling or tests to see whether pinning posts systematically influences share behavior or if other factors (such as content quality or audience size) play a more important role.

Correlation Matrix of Numeric Variables

We construct a correlation matrix to explore relationships among the numeric variables.
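
A correlation matrix of this kind can be sketched in base R as below. The data are simulated around a shared popularity factor, and the use of Spearman rank correlations with pairwise-complete observations is an assumption made for this example; the report does not state which method or missing-value rule was used.

```r
set.seed(3)
n <- 200
# Simulated engagement counts driven by a common popularity factor (not real data).
popularity <- rlnorm(n, meanlog = 8, sdlog = 1.5)
sim <- data.frame(
  diggCount    = rpois(n, popularity * 0.10),
  commentCount = rpois(n, popularity * 0.01),
  shareCount   = rpois(n, popularity * 0.005),
  arousal      = sample(0:10, n, replace = TRUE)
)
sim$arousal[1:5] <- NA  # mimic the missing transcript scores

# Rank-based correlations are less sensitive to heavy tails than Pearson's r.
corr <- cor(sim, use = "pairwise.complete.obs", method = "spearman")
print(round(corr, 2))
```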

Notes

Overall Patterns

Engagement metrics (e.g., diggCount, commentCount, shareCount) tend to cluster together with moderate to strong correlations, suggesting that videos receiving more likes typically also receive more comments and shares.

Shares/View Ratio

The shares_view_ratio column does not exhibit a strong linear correlation with most other variables, including emotional measures (transcript.Arousal and transcript.Valence). This implies that factors like arousal and valence alone may not directly drive shares in a simple linear fashion. Instead, other aspects—such as audience size, video quality, or more nuanced emotional expressions—may interact to influence how frequently viewers share a video.

Emotional Variables

We also notice that transcript.Arousal and transcript.Valence are not highly correlated with each other, suggesting that these two dimensions of emotion capture different aspects of a video’s emotional profile. Their low correlations with performance metrics may point to the need for more complex or non-linear modeling to fully understand how emotional factors affect engagement.

Regression Analysis

We will explore these models further once the rest of the data has finished processing.

We now examine whether the emotional valence of the videos predicts their performance (shares/view ratio) while controlling for other factors. Note that we are using only 3000 of the nearly 24000 videos in the complete dataset.

Simple Linear Regression

## 
## Call:
## lm(formula = shares_view_ratio ~ transcript.Valence + transcript.Arousal + 
##     authorMeta.fans + diggCount + playCount + commentCount + 
##     videoMeta.duration + videoMeta.definition + authorMeta.verified, 
##     data = data)
## 
## Residuals:
##       Min        1Q    Median        3Q       Max 
## -0.014444 -0.001443 -0.000996 -0.000144  0.068321 
## 
## Coefficients:
##                            Estimate Std. Error t value Pr(>|t|)    
## (Intercept)              -2.197e-04  4.080e-03  -0.054 0.957053    
## transcript.Valence        7.146e-05  1.359e-04   0.526 0.599093    
## transcript.Arousal        1.528e-04  5.515e-05   2.770 0.005637 ** 
## authorMeta.fans          -1.063e-10  3.495e-11  -3.042 0.002371 ** 
## diggCount                 6.727e-09  1.205e-09   5.583 2.57e-08 ***
## playCount                -5.295e-10  1.471e-10  -3.600 0.000323 ***
## commentCount              3.068e-07  6.694e-08   4.582 4.79e-06 ***
## videoMeta.duration       -2.508e-06  7.306e-07  -3.433 0.000606 ***
## videoMeta.definition540p  1.204e-03  4.068e-03   0.296 0.767366    
## videoMeta.definition720p  2.809e-03  4.175e-03   0.673 0.501041    
## authorMeta.verifiedTRUE   3.809e-04  2.461e-04   1.548 0.121765    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.004062 on 2975 degrees of freedom
##   (14 observations deleted due to missingness)
## Multiple R-squared:  0.05258,    Adjusted R-squared:  0.0494 
## F-statistic: 16.51 on 10 and 2975 DF,  p-value: < 2.2e-16

Log Adjusted Linear Regression

## 
## Call:
## lm(formula = shares_view_ratio_log ~ transcript.Valence + transcript.Arousal + 
##     authorMeta.fans + diggCount + playCount + commentCount + 
##     videoMeta.duration + videoMeta.definition + authorMeta.verified, 
##     data = data)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -6.3108 -0.5785  0.2781  1.2314  5.3577 
## 
## Coefficients:
##                            Estimate Std. Error t value Pr(>|t|)    
## (Intercept)              -9.126e+00  2.209e+00  -4.132 3.69e-05 ***
## transcript.Valence        1.237e-01  7.357e-02   1.681 0.092773 .  
## transcript.Arousal        1.076e-01  2.985e-02   3.603 0.000319 ***
## authorMeta.fans           4.995e-08  1.892e-08   2.640 0.008322 ** 
## diggCount                 1.682e-06  6.521e-07   2.579 0.009967 ** 
## playCount                -1.019e-08  7.962e-08  -0.128 0.898178    
## commentCount              6.589e-05  3.624e-05   1.818 0.069115 .  
## videoMeta.duration        1.499e-03  3.955e-04   3.791 0.000153 ***
## videoMeta.definition540p  2.449e-01  2.202e+00   0.111 0.911455    
## videoMeta.definition720p  1.329e+00  2.260e+00   0.588 0.556435    
## authorMeta.verifiedTRUE   4.167e-01  1.332e-01   3.129 0.001773 ** 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 2.199 on 2975 degrees of freedom
##   (14 observations deleted due to missingness)
## Multiple R-squared:  0.0661, Adjusted R-squared:  0.06296 
## F-statistic: 21.06 on 10 and 2975 DF,  p-value: < 2.2e-16
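
To read the log-model coefficients on the original scale, one option is to exponentiate them. The sketch below assumes that `shares_view_ratio_log` is a natural log of the ratio, which the output does not confirm; under that assumption, exp(beta) - 1 is the proportional change in the ratio for a one-unit increase in the predictor, holding the other predictors fixed.

```r
# Coefficients copied from the log-model output above.
coefs <- c(transcript.Arousal      = 1.076e-01,
           authorMeta.verifiedTRUE = 4.167e-01)

# Assumption: the outcome is a natural log, so exp(beta) - 1 is the
# proportional change in the shares/view ratio per unit of the predictor.
pct_change <- 100 * (exp(coefs) - 1)
print(round(pct_change, 1))
```

These are associations in a model with an adjusted R-squared of about 0.06, so they describe average tendencies rather than reliable predictions for individual videos.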

Negative Binomial Regression

## 
## Call:
## glm.nb(formula = shareCount ~ transcript.Valence + transcript.Arousal + 
##     authorMeta.fans + diggCount + commentCount + videoMeta.duration + 
##     videoMeta.definition + authorMeta.verified + offset(log(playCount)), 
##     data = data, init.theta = 0.6170287704, link = log)
## 
## Coefficients:
##                            Estimate Std. Error z value Pr(>|z|)    
## (Intercept)              -8.072e+00  1.302e+00  -6.201 5.62e-10 ***
## transcript.Valence        2.334e-02  4.373e-02   0.534  0.59356    
## transcript.Arousal        7.437e-02  1.787e-02   4.162 3.16e-05 ***
## authorMeta.fans          -6.598e-08  1.081e-08  -6.105 1.03e-09 ***
## diggCount                 9.860e-07  1.910e-07   5.163 2.43e-07 ***
## commentCount              2.355e-04  2.076e-05  11.343  < 2e-16 ***
## videoMeta.duration       -1.455e-03  2.323e-04  -6.266 3.71e-10 ***
## videoMeta.definition540p  1.291e+00  1.298e+00   0.994  0.31998    
## videoMeta.definition720p  1.965e+00  1.331e+00   1.476  0.13984    
## authorMeta.verifiedTRUE   2.328e-01  7.782e-02   2.991  0.00278 ** 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for Negative Binomial(0.617) family taken to be 1)
## 
##     Null deviance: 3770.9  on 2985  degrees of freedom
## Residual deviance: 3482.6  on 2976  degrees of freedom
##   (14 observations deleted due to missingness)
## AIC: 31907
## 
## Number of Fisher Scoring iterations: 1
## 
## 
##               Theta:  0.6170 
##           Std. Err.:  0.0149 
## 
##  2 x log-likelihood:  -31884.6640
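
As a reading aid for the output above: with a log link and offset(log(playCount)), the model can be written as below, where \(x_{ij}\) denotes predictor \(j\) for video \(i\) (notation introduced here for illustration). The offset moves views to the left-hand side, so the coefficients describe the expected shares per view.

\[ \log \mathbb{E}[\text{shares}_i] = \log(\text{views}_i) + \beta_0 + \sum_j \beta_j x_{ij} \quad\Longrightarrow\quad \frac{\mathbb{E}[\text{shares}_i]}{\text{views}_i} = \exp\Big(\beta_0 + \sum_j \beta_j x_{ij}\Big) \]

Exponentiating a coefficient therefore gives a rate ratio. For the arousal estimate reported above:

\[ \exp(0.07437) \approx 1.077 \]

That is, each additional arousal point is associated with roughly a 7.7% higher expected share rate, holding the other predictors fixed.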