This analysis investigates how the emotional valence of TikTok #storytime videos influences their performance, measured as the ratio of shares to views. Our dataset consists of 3000 randomly sampled videos posted in 2024 by creators who used the #storytime hashtag. In addition to performance metrics, the dataset includes user and video attributes such as follower count, likes, comments, video quality, arousal score, and valence score. Note that we are still waiting for the rest of the data to finish processing, which is why we have proceeded with just 3000 of the 24000 total videos for now.

- authorMeta.digg: Number of videos/comments the user has liked.
- authorMeta.fans: Number of followers.
- authorMeta.following: Number of accounts the user follows.
- authorMeta.friends: Number of mutual followers.
- authorMeta.name: Name of the content creator.
- authorMeta.verified: Account verified status (boolean).
- collectCount: Number of times the video was bookmarked/saved locally.
- commentCount: Number of comments on the video.
- createTime: Creation time (numeric timestamp).
- diggCount: Number of likes on the video.
- id: Unique post identifier.
- isPinned: Whether the post was pinned (boolean).
- playCount: Number of views.
- shareCount: Number of shares.
- text: Video caption (string).
- videoMeta.definition: Definition quality of the video.
- videoMeta.duration: Length of the video (seconds).
- hasMention: Whether the post mentions someone else (boolean).
- transcript.Arousal: Arousal score (0 to 10).
- transcript.Valence: Valence score (-1 to 1).

The performance metric we focus on is defined as:
\[ \text{Shares/View Ratio} = \frac{\text{shares}}{\text{views}} \]
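
As a minimal sketch, the ratio can be computed directly from the `shareCount` and `playCount` columns. The data frame name `data` is taken from the model calls later in this report; the three rows below are made-up values for illustration, not rows from the dataset.

```r
# Toy stand-in for the real data frame; values are illustrative only.
data <- data.frame(
  shareCount = c(0, 25, 300),
  playCount  = c(200, 50000, 250000)
)

# Shares per view: the performance metric defined above.
data$shares_view_ratio <- data$shareCount / data$playCount

print(data)
```

Because every video in the sample has at least 182 views, the division is always defined, but a ratio of exactly zero is possible whenever a video has no shares.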
A quick look at the basic summary statistics for all variables:
## authorMeta.digg authorMeta.fans authorMeta.following authorMeta.friends
## Min. : 0 Min. : 14500 Min. : 0.0 Min. : 0.0
## 1st Qu.: 0 1st Qu.: 199600 1st Qu.: 69.0 1st Qu.: 23.0
## Median : 0 Median : 684400 Median : 216.0 Median : 77.0
## Mean : 1263 Mean : 1700593 Mean : 504.2 Mean : 157.4
## 3rd Qu.: 0 3rd Qu.: 1800000 3rd Qu.: 671.0 3rd Qu.: 191.0
## Max. :77300 Max. :16700000 Max. :4005.0 Max. :1095.0
##
## authorMeta.heart authorMeta.id authorMeta.name authorMeta.verified
## Min. : 44700 Min. :1.686e+06 Length:3000 Mode :logical
## 1st Qu.: 7600000 1st Qu.:6.715e+18 Class :character FALSE:2583
## Median : 22900000 Median :6.790e+18 Mode :character TRUE :417
## Mean : 67481783 Mean :5.850e+18
## 3rd Qu.: 84900000 3rd Qu.:7.143e+18
## Max. :549000000 Max. :7.441e+18
##
## authorMeta.video collectCount commentCount createTime
## Min. : 19 Min. : 0 Min. : 0 Min. :1.675e+09
## 1st Qu.: 356 1st Qu.: 20 1st Qu.: 10 1st Qu.:1.729e+09
## Median : 751 Median : 147 Median : 35 Median :1.733e+09
## Mean :1022 Mean : 3675 Mean : 399 Mean :1.730e+09
## 3rd Qu.:1590 3rd Qu.: 1069 3rd Qu.: 167 3rd Qu.:1.734e+09
## Max. :4503 Max. :290400 Max. :47100 Max. :1.736e+09
##
## diggCount id isPinned apifyDownloadUrl
## Min. : 2 Min. :7.194e+18 Mode :logical Length:3000
## 1st Qu.: 330 1st Qu.:7.427e+18 FALSE:3000 Class :character
## Median : 2608 Median :7.443e+18 Mode :character
## Mean : 49405 Mean :7.431e+18
## 3rd Qu.: 21025 3rd Qu.:7.449e+18
## Max. :3300000 Max. :7.455e+18
##
## musicMeta.musicId musicMeta.musicName playCount shareCount
## Min. :0.000e+00 Length:3000 Min. : 182 Min. : 0
## 1st Qu.:7.374e+18 Class :character 1st Qu.: 9232 1st Qu.: 4
## Median :7.431e+18 Mode :character Median : 45800 Median : 23
## Mean :7.328e+18 Mean : 464261 Mean : 1711
## 3rd Qu.:7.446e+18 3rd Qu.: 240425 3rd Qu.: 234
## Max. :7.455e+18 Max. :20100000 Max. :222800
##
## text videoMeta.definition videoMeta.duration videoMeta.height
## Length:3000 Length:3000 Min. : 2.0 Min. : 480
## Class :character Class :character 1st Qu.: 22.0 1st Qu.:1024
## Mode :character Mode :character Median : 66.0 Median :1024
## Mean :102.1 Mean :1013
## 3rd Qu.:122.2 3rd Qu.:1024
## Max. :592.0 Max. :1280
##
## videoMeta.width webVideoUrl hasMention video_available
## Min. : 538.0 Length:3000 Mode :logical Mode :logical
## 1st Qu.: 576.0 Class :character FALSE:2515 FALSE:14
## Median : 576.0 Mode :character TRUE :485 TRUE :2986
## Mean : 582.4
## 3rd Qu.: 576.0
## Max. :1024.0
##
## transcripts transcript.Valence transcript.Arousal verified
## Length:3000 Min. :-1.0000 Min. : 0.0 FALSE:2583
## Class :character 1st Qu.:-0.5000 1st Qu.: 5.0 TRUE : 417
## Mode :character Median : 0.2000 Median : 6.0
## Mean : 0.0542 Mean : 6.2
## 3rd Qu.: 0.5000 3rd Qu.: 7.0
## Max. : 1.0000 Max. :10.0
## NA's :14 NA's :14
## pinned mentions shares_view_ratio
## FALSE:3000 FALSE:2515 Min. :0.0000000
## TRUE : 485 1st Qu.:0.0002207
## Median :0.0005751
## Mean :0.0017684
## 3rd Qu.:0.0015005
## Max. :0.0704280
##
For a more in-depth summary of the data, including missing values and distributions:
| Name | data |
| Number of rows | 3000 |
| Number of columns | 35 |
| _______________________ | |
| Column type frequency: | |
| character | 7 |
| factor | 3 |
| logical | 4 |
| numeric | 21 |
| ________________________ | |
| Group variables | None |
Variable type: character
| skim_variable | n_missing | complete_rate | min | max | empty | n_unique | whitespace |
|---|---|---|---|---|---|---|---|
| authorMeta.name | 0 | 1.00 | 6 | 19 | 0 | 66 | 0 |
| apifyDownloadUrl | 0 | 1.00 | 115 | 119 | 0 | 3000 | 0 |
| musicMeta.musicName | 0 | 1.00 | 2 | 69 | 0 | 446 | 0 |
| text | 114 | 0.96 | 1 | 1036 | 0 | 2558 | 0 |
| videoMeta.definition | 0 | 1.00 | 4 | 4 | 0 | 3 | 0 |
| webVideoUrl | 0 | 1.00 | 56 | 69 | 0 | 3000 | 0 |
| transcripts | 14 | 1.00 | 1 | 13149 | 0 | 2933 | 0 |
Variable type: factor
| skim_variable | n_missing | complete_rate | ordered | n_unique | top_counts |
|---|---|---|---|---|---|
| verified | 0 | 1 | FALSE | 2 | FAL: 2583, TRU: 417 |
| pinned | 0 | 1 | FALSE | 1 | FAL: 3000 |
| mentions | 0 | 1 | FALSE | 2 | FAL: 2515, TRU: 485 |
Variable type: logical
| skim_variable | n_missing | complete_rate | mean | count |
|---|---|---|---|---|
| authorMeta.verified | 0 | 1 | 0.14 | FAL: 2583, TRU: 417 |
| isPinned | 0 | 1 | 0.00 | FAL: 3000 |
| hasMention | 0 | 1 | 0.16 | FAL: 2515, TRU: 485 |
| video_available | 0 | 1 | 1.00 | TRU: 2986, FAL: 14 |
Variable type: numeric
| skim_variable | n_missing | complete_rate | mean | sd | p0 | p25 | p50 | p75 | p100 | hist |
|---|---|---|---|---|---|---|---|---|---|---|
| authorMeta.digg | 0 | 1 | 1.262570e+03 | 9.799710e+03 | 0.000000e+00 | 0.000000e+00 | 0.000000e+00 | 0.000000e+00 | 7.730000e+04 | ▇▁▁▁▁ |
| authorMeta.fans | 0 | 1 | 1.700593e+06 | 2.577532e+06 | 1.450000e+04 | 1.996000e+05 | 6.844000e+05 | 1.800000e+06 | 1.670000e+07 | ▇▁▁▁▁ |
| authorMeta.following | 0 | 1 | 5.041800e+02 | 7.172500e+02 | 0.000000e+00 | 6.900000e+01 | 2.160000e+02 | 6.710000e+02 | 4.005000e+03 | ▇▁▁▁▁ |
| authorMeta.friends | 0 | 1 | 1.573600e+02 | 2.172200e+02 | 0.000000e+00 | 2.300000e+01 | 7.700000e+01 | 1.910000e+02 | 1.095000e+03 | ▇▂▁▁▁ |
| authorMeta.heart | 0 | 1 | 6.748178e+07 | 9.750761e+07 | 4.470000e+04 | 7.600000e+06 | 2.290000e+07 | 8.490000e+07 | 5.490000e+08 | ▇▁▁▁▁ |
| authorMeta.id | 0 | 1 | 5.850033e+18 | 2.542052e+18 | 1.686028e+06 | 6.715424e+18 | 6.789588e+18 | 7.143372e+18 | 7.440710e+18 | ▂▁▁▁▇ |
| authorMeta.video | 0 | 1 | 1.021550e+03 | 8.378100e+02 | 1.900000e+01 | 3.560000e+02 | 7.510000e+02 | 1.590000e+03 | 4.503000e+03 | ▇▃▂▁▁ |
| collectCount | 0 | 1 | 3.674940e+03 | 1.500483e+04 | 0.000000e+00 | 2.000000e+01 | 1.470000e+02 | 1.069250e+03 | 2.904000e+05 | ▇▁▁▁▁ |
| commentCount | 0 | 1 | 3.990200e+02 | 1.657940e+03 | 0.000000e+00 | 1.000000e+01 | 3.500000e+01 | 1.670000e+02 | 4.710000e+04 | ▇▁▁▁▁ |
| createTime | 0 | 1 | 1.730155e+09 | 7.311797e+06 | 1.675062e+09 | 1.729172e+09 | 1.732900e+09 | 1.734405e+09 | 1.735689e+09 | ▁▁▁▁▇ |
| diggCount | 0 | 1 | 4.940526e+04 | 1.805710e+05 | 2.000000e+00 | 3.300000e+02 | 2.608500e+03 | 2.102500e+04 | 3.300000e+06 | ▇▁▁▁▁ |
| id | 0 | 1 | 7.430946e+18 | 3.142727e+16 | 7.194338e+18 | 7.426739e+18 | 7.442734e+18 | 7.449215e+18 | 7.454726e+18 | ▁▁▁▁▇ |
| musicMeta.musicId | 0 | 1 | 7.327568e+18 | 5.321731e+17 | 0.000000e+00 | 7.374449e+18 | 7.431010e+18 | 7.446180e+18 | 7.454725e+18 | ▁▁▁▁▇ |
| playCount | 0 | 1 | 4.642608e+05 | 1.490896e+06 | 1.820000e+02 | 9.232000e+03 | 4.580000e+04 | 2.404250e+05 | 2.010000e+07 | ▇▁▁▁▁ |
| shareCount | 0 | 1 | 1.711050e+03 | 1.088981e+04 | 0.000000e+00 | 4.000000e+00 | 2.300000e+01 | 2.340000e+02 | 2.228000e+05 | ▇▁▁▁▁ |
| videoMeta.duration | 0 | 1 | 1.020500e+02 | 1.128600e+02 | 2.000000e+00 | 2.200000e+01 | 6.600000e+01 | 1.222500e+02 | 5.920000e+02 | ▇▂▁▁▁ |
| videoMeta.height | 0 | 1 | 1.013430e+03 | 7.426000e+01 | 4.800000e+02 | 1.024000e+03 | 1.024000e+03 | 1.024000e+03 | 1.280000e+03 | ▁▁▁▇▁ |
| videoMeta.width | 0 | 1 | 5.823800e+02 | 5.031000e+01 | 5.380000e+02 | 5.760000e+02 | 5.760000e+02 | 5.760000e+02 | 1.024000e+03 | ▇▁▁▁▁ |
| transcript.Valence | 14 | 1 | 5.000000e-02 | 5.800000e-01 | -1.000000e+00 | -5.000000e-01 | 2.000000e-01 | 5.000000e-01 | 1.000000e+00 | ▆▆▃▇▅ |
| transcript.Arousal | 14 | 1 | 6.200000e+00 | 1.490000e+00 | 0.000000e+00 | 5.000000e+00 | 6.000000e+00 | 7.000000e+00 | 1.000000e+01 | ▁▂▇▇▁ |
| shares_view_ratio | 0 | 1 | 0.000000e+00 | 0.000000e+00 | 0.000000e+00 | 0.000000e+00 | 0.000000e+00 | 0.000000e+00 | 7.000000e-02 | ▇▁▁▁▁ |
We generate histograms for variables such as follower count, views, likes, shares, arousal, and valence.
Follower Count, Views, Likes, Shares
When we apply a log transform to Follower Count, Video Views, Video Likes, and Video Shares, the distributions become more balanced and closer to a bell shape. This suggests that these metrics are driven by multiplicative changes rather than additive ones. For example, going from 1,000 to 10,000 followers may have a similar impact as moving from 10,000 to 100,000, because each is a tenfold increase. Consequently, if we decide to model these variables, entering them on the log scale is likely the better choice.
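
The effect of the transform can be sketched on synthetic data. The block below is an illustration under stated assumptions, not the code used for the figures: the `views` vector is simulated from a log-normal distribution, and `log1p()` (log of one plus the value) is our choice here so that zero counts, which do occur for shares in this sample, stay finite.

```r
set.seed(1)
# Synthetic, right-skewed stand-in for a count such as playCount (not the real data).
views <- round(rlnorm(1000, meanlog = 10, sdlog = 2))

# log1p(x) = log(1 + x) stays finite when a count is zero (e.g. zero shares).
log_views <- log1p(views)

# Side-by-side histograms: raw scale vs. log scale.
op <- par(mfrow = c(1, 2))
hist(views, breaks = 50, main = "Raw", xlab = "views")
hist(log_views, breaks = 50, main = "log1p", xlab = "log(1 + views)")
par(op)

# Simple moment-based skewness to quantify the change in shape.
skewness <- function(x) mean((x - mean(x))^3) / sd(x)^3
c(raw = skewness(views), logged = skewness(log_views))
```

A skewness near zero after the transform is what "closer to a bell shape" means numerically.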
Arousal and Valence
We also notice that Arousal and Valence are not as skewed as the engagement metrics and span a wide range of emotional states. Arousal tends to cluster in the mid-to-high range, whereas Valence is spread from negative to positive values, with relatively few values near zero. This variation indicates that the videos in our sample exhibit a diverse array of emotional tones, which is beneficial for studying how different emotional states might influence engagement.
We explore the relationship between the emotional scores and video performance using scatter plots with linear fits.
Valence vs. Log(Shares/View Ratio)
We notice that valence scores cluster both above and below zero, suggesting that videos can be emotionally positive or negative. Our initial linear trend is relatively flat, but the distribution hints that extreme valence—whether strongly positive or strongly negative—could be more influential on sharing behavior than moderate valence. To investigate this, we plan to explore nonlinear transformations (such as squaring valence or taking the absolute value) to see if extreme emotional tone correlates with higher shares.
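
One way to check the "extreme valence" idea is to compare a linear specification against ones that use the absolute value or a squared term. The sketch below uses simulated data in which sharing rises with the magnitude of valence by construction; the variable names `valence` and `log_ratio` and all numeric values are hypothetical, so it shows the mechanics only and says nothing about the real sample.

```r
set.seed(42)
n <- 500
# Synthetic stand-in data (not the real sample): sharing rises with |valence|.
valence   <- runif(n, -1, 1)
log_ratio <- -7 + 1.0 * abs(valence) + rnorm(n, sd = 0.5)
toy <- data.frame(valence, log_ratio)

# Three candidate specifications for the valence effect.
fit_linear <- lm(log_ratio ~ valence, data = toy)
fit_abs    <- lm(log_ratio ~ abs(valence), data = toy)
fit_quad   <- lm(log_ratio ~ valence + I(valence^2), data = toy)

# Lower AIC indicates a better trade-off between fit and complexity.
AIC(fit_linear, fit_abs, fit_quad)
```

If extreme valence matters in the real data, the absolute-value or quadratic model should show a clearly lower AIC than the linear one; if the three are similar, the flat linear trend is probably the whole story.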
Arousal vs. Log(Shares/View Ratio)
Arousal tends to cluster in the upper half of its 0 to 10 scale (median 6), and our scatter plot suggests a modest positive slope. This implies that videos with higher arousal levels may be shared slightly more often. Given that much of our data is already in the higher arousal range, we suspect that maintaining an elevated emotional intensity could contribute to better performance. We will investigate this further by refining our model, possibly incorporating interaction terms or more flexible functional forms to capture the nuances of how arousal impacts video sharing.
We can also examine differences in performance by categorical variables such as verified status and pinned posts.
Shares/View Ratio by Verified Status
Our boxplot shows that both verified and unverified accounts span a wide range of shares-to-views ratios. While verified accounts may exhibit a slightly higher median in the log-transformed ratio, the overlap is substantial. This suggests that being verified alone does not guarantee significantly higher sharing rates, although it might offer a modest advantage. Further statistical testing (e.g., a t-test or nonparametric test) would help determine whether the observed difference is meaningful or simply due to variability.
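
The tests mentioned above could look like the following sketch. It runs on simulated log ratios with hypothetical group means and sizes, so the resulting p-values are not findings about the dataset.

```r
set.seed(7)
# Synthetic stand-in (not the real sample): log share ratios for two groups.
toy <- data.frame(
  verified  = rep(c(FALSE, TRUE), times = c(400, 100)),
  log_ratio = c(rnorm(400, mean = -7.5, sd = 2),
                rnorm(100, mean = -7.0, sd = 2))
)

# Welch two-sample t-test (does not assume equal variances).
t_res <- t.test(log_ratio ~ verified, data = toy)

# Wilcoxon rank-sum test: a nonparametric alternative based on ranks.
w_res <- wilcox.test(log_ratio ~ verified, data = toy)

c(t_test_p = t_res$p.value, wilcoxon_p = w_res$p.value)
```

One caveat for the real data: the sample contains only 66 unique creator names across 3000 videos, so videos from the same creator are unlikely to be independent, and p-values from either test should be read as rough guides.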
Shares/View Ratio by Pinned Status
This graph is not informative yet because none of the 3000 videos in the current sample are pinned, so no comparison between pinned and unpinned posts is possible. Once the rest of the data has finished processing, we will take another look. As with verification, we may then need additional modeling or tests to see whether pinning a post systematically influences share behavior or whether other factors (such as content quality or audience size) play a more important role.
We construct a correlation matrix to explore relationships among the numeric variables.
Overall Patterns
Engagement metrics (e.g., `diggCount`, `commentCount`, `shareCount`) tend to cluster together with moderate to strong correlations, suggesting that videos receiving more likes typically also receive more comments and shares.
Shares/View Ratio
The `shares_view_ratio` column does not exhibit a strong linear correlation with most other variables, including emotional measures (`transcript.Arousal` and `transcript.Valence`). This implies that factors like arousal and valence alone may not directly drive shares in a simple linear fashion. Instead, other aspects, such as audience size, video quality, or more nuanced emotional expressions, may interact to influence how frequently viewers share a video.
Emotional Variables
We also notice that `transcript.Arousal` and `transcript.Valence` are not highly correlated with each other, suggesting that these two dimensions of emotion capture different aspects of a video’s emotional profile. Their low correlations with performance metrics may point to the need for more complex or non-linear modeling to fully understand how emotional factors affect engagement.
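
A correlation matrix of this kind can be sketched as follows. The columns are simulated and only borrow their names from the dataset; the Spearman variant is our addition, included because rank correlations are less sensitive to the heavy right skew seen in the engagement counts.

```r
set.seed(3)
n <- 300
# Synthetic stand-in columns (not the real data); names mirror the dataset's.
likes <- rlnorm(n, meanlog = 8, sdlog = 2)
toy <- data.frame(
  diggCount          = likes,
  commentCount       = likes * rlnorm(n, meanlog = -4, sdlog = 0.5),
  transcript.Valence = runif(n, -1, 1),
  transcript.Arousal = sample(0:10, n, replace = TRUE)
)

# Pearson measures linear association; Spearman works on ranks.
# pairwise.complete.obs skips rows with missing scores pair by pair.
pearson  <- cor(toy, use = "pairwise.complete.obs", method = "pearson")
spearman <- cor(toy, use = "pairwise.complete.obs", method = "spearman")

round(spearman, 2)
```

The `use = "pairwise.complete.obs"` argument matters for the real data, where 14 videos have no valence or arousal score.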
We will explore these relationships further once the rest of the data has finished processing.
We now examine whether the emotional valence of the videos predicts their performance (shares/view ratio) while controlling for other factors. Note that we are using only 3000 of the nearly 24000 videos in the complete dataset.
##
## Call:
## lm(formula = shares_view_ratio ~ transcript.Valence + transcript.Arousal +
## authorMeta.fans + diggCount + playCount + commentCount +
## videoMeta.duration + videoMeta.definition + authorMeta.verified,
## data = data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.014444 -0.001443 -0.000996 -0.000144 0.068321
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -2.197e-04 4.080e-03 -0.054 0.957053
## transcript.Valence 7.146e-05 1.359e-04 0.526 0.599093
## transcript.Arousal 1.528e-04 5.515e-05 2.770 0.005637 **
## authorMeta.fans -1.063e-10 3.495e-11 -3.042 0.002371 **
## diggCount 6.727e-09 1.205e-09 5.583 2.57e-08 ***
## playCount -5.295e-10 1.471e-10 -3.600 0.000323 ***
## commentCount 3.068e-07 6.694e-08 4.582 4.79e-06 ***
## videoMeta.duration -2.508e-06 7.306e-07 -3.433 0.000606 ***
## videoMeta.definition540p 1.204e-03 4.068e-03 0.296 0.767366
## videoMeta.definition720p 2.809e-03 4.175e-03 0.673 0.501041
## authorMeta.verifiedTRUE 3.809e-04 2.461e-04 1.548 0.121765
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.004062 on 2975 degrees of freedom
## (14 observations deleted due to missingness)
## Multiple R-squared: 0.05258, Adjusted R-squared: 0.0494
## F-statistic: 16.51 on 10 and 2975 DF, p-value: < 2.2e-16
##
## Call:
## lm(formula = shares_view_ratio_log ~ transcript.Valence + transcript.Arousal +
## authorMeta.fans + diggCount + playCount + commentCount +
## videoMeta.duration + videoMeta.definition + authorMeta.verified,
## data = data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -6.3108 -0.5785 0.2781 1.2314 5.3577
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -9.126e+00 2.209e+00 -4.132 3.69e-05 ***
## transcript.Valence 1.237e-01 7.357e-02 1.681 0.092773 .
## transcript.Arousal 1.076e-01 2.985e-02 3.603 0.000319 ***
## authorMeta.fans 4.995e-08 1.892e-08 2.640 0.008322 **
## diggCount 1.682e-06 6.521e-07 2.579 0.009967 **
## playCount -1.019e-08 7.962e-08 -0.128 0.898178
## commentCount 6.589e-05 3.624e-05 1.818 0.069115 .
## videoMeta.duration 1.499e-03 3.955e-04 3.791 0.000153 ***
## videoMeta.definition540p 2.449e-01 2.202e+00 0.111 0.911455
## videoMeta.definition720p 1.329e+00 2.260e+00 0.588 0.556435
## authorMeta.verifiedTRUE 4.167e-01 1.332e-01 3.129 0.001773 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 2.199 on 2975 degrees of freedom
## (14 observations deleted due to missingness)
## Multiple R-squared: 0.0661, Adjusted R-squared: 0.06296
## F-statistic: 21.06 on 10 and 2975 DF, p-value: < 2.2e-16
##
## Call:
## glm.nb(formula = shareCount ~ transcript.Valence + transcript.Arousal +
## authorMeta.fans + diggCount + commentCount + videoMeta.duration +
## videoMeta.definition + authorMeta.verified + offset(log(playCount)),
## data = data, init.theta = 0.6170287704, link = log)
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -8.072e+00 1.302e+00 -6.201 5.62e-10 ***
## transcript.Valence 2.334e-02 4.373e-02 0.534 0.59356
## transcript.Arousal 7.437e-02 1.787e-02 4.162 3.16e-05 ***
## authorMeta.fans -6.598e-08 1.081e-08 -6.105 1.03e-09 ***
## diggCount 9.860e-07 1.910e-07 5.163 2.43e-07 ***
## commentCount 2.355e-04 2.076e-05 11.343 < 2e-16 ***
## videoMeta.duration -1.455e-03 2.323e-04 -6.266 3.71e-10 ***
## videoMeta.definition540p 1.291e+00 1.298e+00 0.994 0.31998
## videoMeta.definition720p 1.965e+00 1.331e+00 1.476 0.13984
## authorMeta.verifiedTRUE 2.328e-01 7.782e-02 2.991 0.00278 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for Negative Binomial(0.617) family taken to be 1)
##
## Null deviance: 3770.9 on 2985 degrees of freedom
## Residual deviance: 3482.6 on 2976 degrees of freedom
## (14 observations deleted due to missingness)
## AIC: 31907
##
## Number of Fisher Scoring iterations: 1
##
##
## Theta: 0.6170
## Std. Err.: 0.0149
##
## 2 x log-likelihood: -31884.6640
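
To make the negative binomial coefficients easier to read, they can be exponentiated. The sketch below copies three estimates and standard errors from the output above; the short names `valence`, `arousal`, and `verified` are labels chosen here, and the intervals are approximate Wald intervals rather than anything reported by the model summary.

```r
# Estimates and standard errors copied from the negative binomial output above.
est <- c(valence = 2.334e-02, arousal = 7.437e-02, verified = 2.328e-01)
se  <- c(valence = 4.373e-02, arousal = 1.787e-02, verified = 7.782e-02)

# With a log link and offset(log(playCount)), exp(coefficient) is a rate ratio:
# the multiplicative change in expected shares per view for a one-unit increase.
rate_ratio <- exp(est)

# Approximate 95% Wald intervals on the rate-ratio scale.
lower <- exp(est - qnorm(0.975) * se)
upper <- exp(est + qnorm(0.975) * se)

round(cbind(rate_ratio, lower, upper), 3)
```

Read this way, a one-point increase in arousal corresponds to roughly a 7 to 8 percent higher expected share rate with the other predictors held fixed, while the interval for valence includes 1, consistent with its non-significant coefficient.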