Data

## [1] "Number of creators with more than 20 videos: 977"

Mainstream New Plan 09/06

We look at two ways of defining mainstream:

  1. Defining mainstream using the mean of the top creators’ videos (across creators) ☑️
  2. Defining mainstream using the mean of the SD of each top creator’s videos ☑️
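The two definitions can be sketched in Python as follows. This is only an illustration under assumptions not stated in the notes: each video is reduced to a single scalar feature, `top_videos` is a hypothetical mapping from creator to that creator's video values, and "across creators" is read as averaging per-creator summaries.

```python
from statistics import mean, stdev

# Hypothetical data: creator id -> one scalar feature value per video.
top_videos = {
    "creator_a": [0.2, 0.4, 0.6],
    "creator_b": [0.5, 0.7, 0.9],
}

def mainstream_mean(videos_by_creator):
    """Definition 1 (assumed reading): mean of each creator's videos, averaged across creators."""
    return mean([mean(v) for v in videos_by_creator.values()])

def mainstream_sd(videos_by_creator):
    """Definition 2: sample SD of each creator's videos, averaged across creators."""
    return mean([stdev(v) for v in videos_by_creator.values()])

ms_mean = mainstream_mean(top_videos)
ms_sd = mainstream_sd(top_videos)
```

If the intended definition 1 pools all videos before averaging, the inner `mean` would be replaced by a single mean over the concatenated values; the two readings differ when creators have unequal numbers of videos.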

Since we use the top creators’ videos to define mainstream, there are three ways to choose which videos to use:

  I. Use all of the videos of all the top creators ☑️
  II. Use all of the videos after a creator reaches 250K ☑️
  III. Use the videos from the most recent three months ☑️
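A minimal Python sketch of the three video selections is shown here. The record layout, the `as_of` reference date, and the 90-day approximation of "three months" are assumptions for illustration; the scheme names follow the labels used later in these notes (`top_allvideos`, `top_250kvideos`, `top_recentvideos`).

```python
from datetime import date, timedelta

# Hypothetical record layout: (creator_id, upload_date, followers_at_upload).
videos = [
    ("a", date(2023, 1, 10), 120_000),
    ("a", date(2023, 5, 1), 260_000),
    ("a", date(2023, 8, 20), 300_000),
]

def select_videos(videos, scheme, as_of, threshold=250_000, window_days=90):
    """Return the subset of top-creator videos used to define mainstream."""
    if scheme == "top_allvideos":      # I. every video
        return list(videos)
    if scheme == "top_250kvideos":     # II. videos from the first 250K date onward
        first_hit = {}
        for creator, day, followers in sorted(videos, key=lambda v: v[1]):
            if followers >= threshold and creator not in first_hit:
                first_hit[creator] = day
        return [v for v in videos
                if v[0] in first_hit and v[1] >= first_hit[v[0]]]
    if scheme == "top_recentvideos":   # III. videos in the last ~three months
        cutoff = as_of - timedelta(days=window_days)
        return [v for v in videos if cutoff <= v[1] <= as_of]
    raise ValueError(f"unknown scheme: {scheme}")

recent = select_videos(videos, "top_recentvideos", as_of=date(2023, 9, 1))
```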

For each way, we define three distance measures:

  1. Absolute distance between the mainstream and the first 10 videos of each creator (nonmainstream / all), using the mean or SD according to mainstream definitions 1 and 2 above
  2. Absolute distance between the mainstream and the last 10 videos of each creator (nonmainstream / all), using the mean or SD according to mainstream definitions 1 and 2 above
  3. Difference between item 1 and item 2
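The three measures could be computed along these lines. This sketch assumes one scalar value per video, videos in chronological order, and a sign convention of "first minus last" for the difference; none of these details are fixed by the notes.

```python
from statistics import mean, stdev

def distance_measures(values, mainstream, stat=mean, k=10):
    """Return (first-k distance, last-k distance, first minus last).

    values     -- one creator's per-video values, oldest first (assumed layout)
    mainstream -- the mainstream reference value (mean- or SD-based)
    stat       -- mean for definition 1, stdev for definition 2
    """
    first = abs(stat(values[:k]) - mainstream)
    last = abs(stat(values[-k:]) - mainstream)
    return first, last, first - last

# Hypothetical creator with 25 videos that drift toward a mainstream value of 1.0.
values = [0.0] * 10 + [0.5] * 5 + [1.0] * 10
first, last, diff = distance_measures(values, mainstream=1.0)
sd_first, sd_last, sd_diff = distance_measures(values, mainstream=0.2, stat=stdev)
```

With this sign convention, a positive difference means the creator's last 10 videos are closer to the mainstream than the first 10.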

Note: We filter out all creators that do not have more than 20 videos in total.

We also test on two sets of outcomes:

  1. outcome: all creators with follower counts between 50k and 100k. If a creator ever reaches 100k, we use the first date they hit 100k as the period 2 date and take the follower count on that date as period 2 followers; the follower count 3 months earlier is used as period 1 followers.
  2. outcome_250k: all creators with follower counts between 50k and 250k. Same logic as above, except that the upper bound is extended from 100k to 250k.
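The period 1 / period 2 construction for creators who reach the upper bound might look like the sketch below. Several points are assumptions: "3 months" is approximated as 90 days, the period 1 value is the latest observation on or before that date, the percentage difference is inferred from the "perc diff" label in the output, and the handling of creators who never reach the bound is not described in the notes, so the function simply returns `None` for them.

```python
from datetime import date, timedelta

def build_outcome(history, upper=100_000, lookback_days=90):
    """history: chronologically sorted list of (date, followers) for one creator."""
    hit = next(((d, f) for d, f in history if f >= upper), None)
    if hit is None:
        return None  # never reached the bound; treatment not specified in the notes
    p2_date, p2 = hit
    target = p2_date - timedelta(days=lookback_days)
    earlier = [f for d, f in history if d <= target]
    if not earlier:
        return None  # no observation old enough to serve as period 1
    p1 = earlier[-1]
    return {"period1": p1, "period2": p2, "perc_diff": (p2 - p1) / p1}

# Hypothetical follower history.
history = [
    (date(2023, 1, 1), 60_000),
    (date(2023, 2, 1), 70_000),
    (date(2023, 4, 1), 80_000),
    (date(2023, 5, 15), 100_500),
]
result = build_outcome(history)                    # outcome (100k bound)
result_250k = build_outcome(history, upper=250_000)  # outcome_250k
```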

Note: One potential caveat is that, for creators who ever hit 100k/250k, the period 1 outcome is always lower than the period 2 outcome, since period 2 is defined as the first date the threshold is reached.

Goal: Investigate how the three measures correlate with the 6 constructed outcomes.

We have a combinatorial exercise:

  1. Top videos to use (top_allvideos, top_250kvideos, top_recentvideos)
  2. All creators vs Nontop creators
  3. Outcome vs Outcome 250k

We also make an 80/20 train/test split and use only the training set to compute the correlations.
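The grid of combinations and the 80/20 split can be sketched as follows. The group labels `"all"` and `"nontop"`, the fixed seed, and the use of integer creator ids are illustrative assumptions; the count of 977 is the number of creators with more than 20 videos reported above.

```python
import itertools
import random

top_video_sets = ["top_allvideos", "top_250kvideos", "top_recentvideos"]
creator_groups = ["all", "nontop"]         # hypothetical labels
outcomes = ["outcome", "outcome_250k"]

# Every (video set, creator group, outcome) combination: 3 x 2 x 2 = 12.
grid = list(itertools.product(top_video_sets, creator_groups, outcomes))

def train_test_split(ids, train_frac=0.8, seed=0):
    """Shuffle ids reproducibly and split them into train and test lists."""
    shuffled = list(ids)
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * train_frac)
    return shuffled[:cut], shuffled[cut:]

creator_ids = range(977)
train_ids, test_ids = train_test_split(creator_ids)
```

Correlations would then be computed once per grid entry, restricted to `train_ids`, so the held-out creators stay untouched.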

Investigate interesting results between nontop and all creators

Decomposing Correlation

Nontop creators

## [1] "Covariance: 0.023"
## [1] "SD outcome (perc diff): 0.2946"
## [1] "Distance sum: 3.4902"

Top creators

## [1] "Covariance: -1.6444"
## [1] "SD outcome (perc diff): 3.5668"
## [1] "Distance sum: 3.5564"

Overall

## [1] "Covariance: -0.9406"
## [1] "SD outcome (perc diff): 2.5999"
## [1] "Distance sum: 3.529"
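For reference, a Pearson correlation factors into the covariance divided by the product of the two standard deviations, which is why the covariance and the outcome SD are reported separately here. The sketch below shows that standard decomposition in Python on made-up data; how the reported "Distance sum" enters the calculation is not stated in these notes, so it is not modeled.

```python
from math import sqrt

def decompose_correlation(x, y):
    """Return sample covariance, both sample SDs, and the Pearson correlation."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / (n - 1)
    sd_x = sqrt(sum((a - mx) ** 2 for a in x) / (n - 1))
    sd_y = sqrt(sum((b - my) ** 2 for b in y) / (n - 1))
    return {"cov": cov, "sd_x": sd_x, "sd_y": sd_y, "corr": cov / (sd_x * sd_y)}

# Made-up distance and outcome values for four creators.
parts = decompose_correlation([1.0, 2.0, 3.0, 4.0], [2.0, 4.0, 6.0, 8.0])
```

Because the outcome SD differs by roughly an order of magnitude between groups (0.2946 for nontop versus 3.5668 for top creators), raw covariances are not directly comparable across groups without this normalization.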

Also note: one potential caveat about the construction of the outcome is that, for creators who ever hit 100k/250k, the period 1 outcome is always lower than the period 2 outcome. We do not expect this to make much difference, but we could try using their last hit date instead of their first hit date for the outcome.

Check Covariate Balance

Also check for SD

Nontop creators

## [1] "Covariance: 0.0151"
## [1] "SD outcome (perc diff): 0.2946"
## [1] "Distance sum: 2.6797"

Top creators

## [1] "Covariance: -0.5343"
## [1] "SD outcome (perc diff): 3.5668"
## [1] "Distance sum: 2.7082"

Overall

## [1] "Covariance: -0.2397"
## [1] "SD outcome (perc diff): 2.5999"
## [1] "Distance sum: 2.6928"
