Mainstream New Plan 09/06
We look at two ways of defining mainstream:
a. Defining mainstream using the mean of the top creators’ videos (across creators) ☑️
b. Defining mainstream using the mean of the SD of each top creator’s videos ☑️
Since we use the top creators’ videos to define mainstream, there are three ways of selecting those videos:
I. Use all of the videos of all the top creators ☑️
II. Use all of the videos after a creator reaches 250K ☑️
III. Use the videos from the most recent three months ☑️
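The two mainstream definitions can be sketched as follows. This is a minimal illustration, not the project's actual code: it assumes each video is already represented as a numeric feature vector, that the SD is the population SD, and that "across creators" means averaging one per-creator statistic over all top creators. The names `creator_mean`, `creator_sd`, `mainstream` and the toy data are hypothetical.

```python
from statistics import mean, pstdev

def creator_mean(videos):
    """Component-wise mean of one creator's video feature vectors."""
    return [mean(col) for col in zip(*videos)]

def creator_sd(videos):
    """Component-wise population SD of one creator's video feature vectors."""
    return [pstdev(col) for col in zip(*videos)]

def mainstream(top_creators, stat):
    """Average a per-creator statistic (mean or SD) across all top creators."""
    per_creator = [stat(videos) for videos in top_creators.values()]
    return [mean(col) for col in zip(*per_creator)]

# Toy data (assumption): two top creators, two videos each, two features per video.
top_creators = {
    "a": [[1.0, 2.0], [3.0, 4.0]],
    "b": [[5.0, 6.0], [7.0, 10.0]],
}
mainstream_mean = mainstream(top_creators, creator_mean)  # definition a
mainstream_sd = mainstream(top_creators, creator_sd)      # definition b
```

Options I to III would only change which videos are passed in as `top_creators`; the two definitions themselves stay the same.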
For each way, we define three distance measures:
1. Absolute distance between the mainstream and the first 10 videos of each creator (nonmainstream / all), using the mean / SD corresponding to items a and b above
2. Absolute distance between the mainstream and the last 10 videos of each creator (nonmainstream / all), using the mean / SD corresponding to items a and b above
3. Difference between item 1 and item 2
Note: We filter out all creators that do not have more than 20 videos in total.
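A rough sketch of the three distance measures and the video-count filter is given below. Several details are assumptions rather than something the plan specifies: videos are numeric feature vectors sorted oldest first, "absolute distance" is taken as the mean absolute gap per feature, and the difference is computed as first minus last. The function names are hypothetical.

```python
from statistics import mean, pstdev

MIN_VIDEOS = 20  # creators need more than this many videos to be kept
WINDOW = 10      # size of the "first" and "last" video windows

def abs_distance(videos, target, stat=mean):
    """Mean absolute gap between a window's per-feature statistic and the target."""
    summary = [stat(col) for col in zip(*videos)]
    return mean(abs(s - t) for s, t in zip(summary, target))

def distance_measures(videos, target, stat=mean):
    """Return (first-10 distance, last-10 distance, first minus last).

    Returns None for creators that the filter removes (20 videos or fewer).
    """
    if len(videos) <= MIN_VIDEOS:
        return None
    first = abs_distance(videos[:WINDOW], target, stat)
    last = abs_distance(videos[-WINDOW:], target, stat)
    return first, last, first - last

# Toy creator (assumption): 21 one-feature videos that drift toward a mainstream of 2.0.
videos = [[0.0]] * 10 + [[5.0]] + [[2.0]] * 10
measures = distance_measures(videos, [2.0])                 # mean variant (item a)
sd_measures = distance_measures(videos, [0.0], stat=pstdev)  # SD variant (item b)
too_few = distance_measures(videos[:20], [2.0])              # filtered out
```

With this sign convention, a positive third value means the creator's last 10 videos are closer to the mainstream than the first 10.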
We also test on two sets of outcomes:
- outcome: all creators with a follower count between 50k and 100k. If a creator ever reaches 100k, we use the first date they hit 100k as the period 2 date and take the follower count on that date as period 2 followers; the follower count 3 months earlier is used as period 1 followers.
- outcome_250k: all creators with a follower count between 50k and 250k. Same logic as above, except that the upper bound is extended from 100k to 250k.
Note: One potential caveat is that, for creators who ever hit 100k/250k, the period 1 outcome will always be lower than the period 2 outcome, because period 2 is defined as the first date the threshold is reached.
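The period construction for creators who reach the upper bound might look like the sketch below. It assumes a follower history stored as date-sorted `(date, followers)` pairs, approximates "3 months" as 90 days, and takes the latest observation on or before that cutoff as period 1. Creators who never reach the bound return `None`, since their treatment is not described here. The function name and the toy history are hypothetical.

```python
from datetime import date, timedelta

def period_followers(history, upper, lookback_days=90):
    """Find the first date `upper` is reached and the follower count ~3 months earlier.

    history: list of (date, followers) tuples sorted by date.
    """
    for day, followers in history:
        if followers >= upper:
            cutoff = day - timedelta(days=lookback_days)
            # Latest observation on or before the cutoff, if any exists.
            earlier = [f for d, f in history if d <= cutoff]
            period1 = earlier[-1] if earlier else None
            return {"period2_date": day, "period1": period1, "period2": followers}
    return None  # never reached the bound; handling is not specified in the plan

# Toy history (assumption) for one creator.
history = [
    (date(2023, 1, 1), 60_000),
    (date(2023, 2, 1), 75_000),
    (date(2023, 5, 15), 101_000),
    (date(2023, 6, 1), 120_000),
]
result_100k = period_followers(history, upper=100_000)
result_250k = period_followers(history, upper=250_000)
```

Because period 2 is the first date the bound is reached, `period1` is below `period2` by construction for these creators, which is the caveat noted above.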
Goal: Investigate how the three measures correlate with the 6 constructed outcomes.
We have a combinatorial exercise:
- Top videos to use (top_allvideos, top_250kvideos, top_recentvideos)
- All creators vs Nontop creators
- Outcome vs Outcome 250k
We also make an 80/20 train/test split and use only the training set to compute the correlations.
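The combinatorial grid and the 80/20 split can be sketched as below. The plan does not say which correlation coefficient is used, so Pearson correlation is an assumption here, as are the fixed seed, the split at the creator level, and the helper names `train_test_split` and `pearson`.

```python
import random
from itertools import product
from math import sqrt

TOP_VIDEO_SETS = ["top_allvideos", "top_250kvideos", "top_recentvideos"]
CREATOR_SETS = ["all", "nontop"]
OUTCOMES = ["outcome", "outcome_250k"]

# Every combination of video set, creator set and outcome definition.
grid = list(product(TOP_VIDEO_SETS, CREATOR_SETS, OUTCOMES))

def train_test_split(creator_ids, train_frac=0.8, seed=0):
    """Shuffle creator ids reproducibly and split them into train and test lists."""
    ids = sorted(creator_ids)
    random.Random(seed).shuffle(ids)
    cut = int(len(ids) * train_frac)
    return ids[:cut], ids[cut:]

def pearson(xs, ys):
    """Pearson correlation between two equal-length numeric sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(vx * vy)

train_ids, test_ids = train_test_split(range(10))
```

For each of the 12 entries in `grid`, the distance measures and outcomes would be computed for the creators in `train_ids` only and then passed to `pearson`; the test set is left untouched.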