Introduction

Our task has been assigned by Canal+ and its objective is, broadly speaking, to propose an optimal, cost-effective algorithm for scheduling TV programs. A few useful measures are utilized in the task and they are as follows.

a <- c("AMR", "total AMR","SHR")
b <- c("Average Minute Rating", "Total Average Minute Rating","Audience Share")
c <- c("% of target group watching the program", "% of target group watching TV", "% of people watching the program out of people watching TV at that time")
df <- data.frame(a,b,c)
colnames(df) <- c("Measure", "Full Name","Description")
df %>%
  kbl() %>%
  kable_styling()
Measure Full Name Description
AMR Average Minute Rating % of target group watching the program
total AMR Total Average Minute Rating % of target group watching TV
SHR Audience Share % of people watching the program out of people watching TV at that time

AMR is captured minute by minute and and average over the program duration is calculated.

The model given by Canal+ can be summarized as:

\[ \widehat{AMR}(c)=f(metadata(c),performance\_history(c),schedule\_slot(c)) \] and our objective is to maximize viewership, i.e. select \(schedule\_slot(c)\) that maximizes:

\[ \sum_{c=C} \widehat{AMR}(c) \] where \(C\) is a set of available contents \(c\). The problem is to find an algorithm which is faster than simple \(N!\) selection.

Solution

Our proposed solution is to use the ML Model to predict the SHR and put content with highest values of Predicted SHR into the best slot. It is important to state that this solution is a framework that does not specify all details, but provides a general idea on how to proceed. The details should be specified after consultations with SMEs and getting real-life data for validation. This framework consist of 4 steps:

  1. Find the best ML Model for Predicting SHR based on past content metdata

    1.1. First, all available content metadata should be gathered. Metadata should not contain missing observations in any column, since these are hard to be modelled - in case there are some missing observation, consider removing this column or cautiosly choose a data imputation method. One variable that should always be included in the model is time - the hour within the week when the content was presented.

    1.2. Then, ML models that will be validated should be chosen. We suggest to use a variety of models in order to be able to properly check different types of relationship between the dependent and independent variables (linear - e.g. Linear Regression, SVR with linear kernel, non-linear - e.g. SVR with sigmoid kernel, Random Forest). After choosing the model types, we should choose the set of parameters that will be tested for each model (depends on the choice of models). Last, but not least, we should agree on a performance metric that will be used to evalute the performance of models - we suggest using RMSE as a standard, but after consulations with SMEs, other metric may be chosen.

    1.3. Then, K-fold stratified validation should be conducted - we suggest to run it at least 10 times (but the more the better/safer) for each model with a 50/50 sample split stratified by Predicted SHR (stratification may be achieved by dividing the target variable into bins). Then, we should calculate the average of chosen performance metrics and see which model provided the best results. Depending on the amount of models variations and inital results, we may choose to select a few best performing variations and run the validation again, testing each variation more times than before. This should allow us to find a model and its paramteres that has the best fit for our metadata.

  2. Schedule content that has to be shown live

    2.1. Some content types are always shown live - for example news. If there is any such content - schedule it first to see which slots are still available for other content.

  3. Select slot(s) with the highest Total AMR, calculate Predicted SHR, and put content with highest values into that slot. Repeat this step until all content is schedulded.

    3.1. First, use historical data (consult with SMEs to select a meaningful sample - e.g. last year) to calculate the average Total AMR for each hour of the week. It should be calculated for different time windows, depending on the duration of content that you are presenting.

    3.2. Select slot(s) that have the highest Total AMR for each time window. Then combine content’s metdata with the time from the slot (in some cases the time might be different, depending on the content’s duration) and run the ML Model to obtain Predicted SHR for all content.

    3.3. Select content with highest Predicted SHR and put in into selected slot. This way, AMR is maximixed - content with the highest SHR is put into slots with highest Total AMR, resulting in maximization of our content’s AMR.

    3.4. Repeat steps 3.2 and 3.3. until all content is schedulded. The slots which are left should be controlled so that the content that is left can be schedulded (e.g. there might be a 2h content to schedule but only 1h slots left). In such case, longer content should be scheduled first even though its Predicted SHR might be lower.

Example

This examples illustrates how our algorithm would work in a simplified form - since real-world data was not available, we did not implement all features as the results was not meaningful.

Data

First, we had to set-up a data generating process. We generated 3 files - (i) content to schedule for next week and its metadata, (ii) historical total AMR & (iii) AMR and SHR of past content and its metadata,

Content to schedule for next week and its metadata

  1. There are 4 types of content - news, sport, movie and series
  2. Content of the same type always lasts the same - news and series for 1 hour; sport and movie for 2 hours.
  3. News are played 14 times a week (14h), sport 5 times (15h), movies 30 times (60h) and series 84 times (84h).
  4. News and sport are played live and not included in the ML model, and therefore most metadata is available only for movie and series.
    • News are played daily at 9am and 7pm
    • Sport starts randomly throughout the week, but never at the same time as news
  5. Movie and series have several attributes included in the metadata - below is their desciprion and distribution
    • character: uniform distrbution - action (20%), comedy (20%), documentary (15%), thriller (15%), romance(10%), drama (10%), horror (5%), fantasy (5%)
    • production decade: normal distrbution with u = 2005 and sd = 10 (bounded by range between 1950 and 2020)
    • oscar: uniform distrbution - 1 (5%), 0(95%) [available only for movies]
    • other awards: uniform distrbution - 1 (20%), 0 (80%)
    • imbd score: normal distrbution with u = 6 and sd = 1.5 (bounded by range between 0 and 100)

Code used to generate content to schedule data is below (hidden by default).

# NEWS
start_time_news <- sort(c(seq(9, 168, 24), seq(19, 168, 24)))

metadata_news <- data.frame(
  title = paste(rep('News', 14), seq(1, 14, 1)),
  content_type = rep('news', 14),
  duration = rep(1, 14),
  start_time = start_time_news,
  character = rep('news', 14),
  production_decade = rep('2020', 14),
  oscar = rep(NA, 14),
  other_awards = rep(NA, 14),
  imbd_score = rep(NA, 14),
  stringsAsFactors = F
)


# SPORT
set.seed(456100)
start_time_sport <- sort(round(runif(5, min = 0, max = 168)))
for (time in start_time_sport) {
  if (is.element(time, start_time_news)) {
    start_time_sport[start_time_sport==time] <- time + 1
  } else if (is.element(time+1, start_time_news)) {
    start_time_sport[start_time_sport==time] <- time + 2
  }
}

set.seed(456101)
metadata_sport <- data.frame(
  title = paste(rep('Sport', 5), seq(1, 5, 1)),
  content_type = rep('sport', 5),
  duration = rep(2, 5),
  start_time = start_time_sport,
  character = rep('sport', 5),
  production_decade = rep('2020', 5),
  oscar = rep(NA, 5),
  other_awards = rep(NA, 5),
  imbd_score = rep(NA, 5),
  stringsAsFactors = F
)


# MOVIE
set.seed(456102)
character_movie <- runif(30, 0, 100)
character_movie <- dplyr::case_when(
  character_movie 80 ~ 'action',
  character_movie 60 & character_movie <= 80 ~ 'comedy',
  character_movie 45 & character_movie <= 60 ~ 'documentary',
  character_movie 30 & character_movie <= 45 ~ 'thriller',
  character_movie 20 & character_movie <= 30 ~ 'romance',
  character_movie 10 & character_movie <= 20 ~ 'drama',
  character_movie 5 & character_movie <= 10 ~ 'horror',
  character_movie <= 5 ~ 'fantasy'
)

set.seed(456103)
production_decade_movie <- round(rnorm(30, 2000, 10)/10) * 10
production_decade_movie <- ifelse(production_decade_movie<1950, 1950, ifelse(production_decade_movie2020, 2020, production_decade_movie))

set.seed(456104)
oscar_movie <- runif(30, 0, 100)
oscar_movie <- ifelse(oscar_movie<=5, 1, 0)

set.seed(456105)
other_awards_movie <- runif(30, 0, 100)
other_awards_movie <- ifelse(other_awards_movie<=20, 1, 0)

set.seed(456106)
imbd_score_movie <- round(rnorm(30, 6, 2)*10)
imbd_score_movie <- ifelse(imbd_score_movie<0, 0, ifelse(imbd_score_movie100, 100, imbd_score_movie))

metadata_movie <- data.frame(
  title = paste(rep('Movie', 30), seq(1, 30, 1)),
  content_type = rep('movie', 30),
  duration = rep(2, 30),
  start_time = rep(NA, 30),
  character = character_movie,
  production_decade = production_decade_movie,
  oscar = oscar_movie,
  other_awards = other_awards_movie,
  imbd_score = imbd_score_movie,
  stringsAsFactors = F
)


# SERIES
set.seed(456107)
character_series <- runif(84, 0, 100)
character_series <- dplyr::case_when(
  character_series 80 ~ 'action',
  character_series 60 & character_series <= 80 ~ 'comedy',
  character_series 45 & character_series <= 60 ~ 'documentary',
  character_series 30 & character_series <= 45 ~ 'thriller',
  character_series 20 & character_series <= 30 ~ 'romance',
  character_series 10 & character_series <= 20 ~ 'drama',
  character_series 5 & character_series <= 10 ~ 'horror',
  character_series <= 5 ~ 'fantasy'
)

set.seed(456108)
production_decade_series <- round(rnorm(84, 2005, 10)/10) * 10
production_decade_series <- ifelse(production_decade_series<1950, 1950, ifelse(production_decade_series2020, 2020, production_decade_series))

set.seed(456109)
other_awards_series <- runif(84, 0, 100)
other_awards_series <- ifelse(other_awards_series<=20, 1, 0)

set.seed(456110)
imbd_score_series <- round(rnorm(84, 6, 2)*10)
imbd_score_series <- ifelse(imbd_score_series<0, 0, ifelse(imbd_score_series100, 100, imbd_score_series))

metadata_series <- data.frame(
  title = paste(rep('Series', 84), seq(1, 84, 1)),
  content_type = rep('series', 84),
  duration = rep(1, 84),
  start_time = rep(NA, 84),
  character = character_series,
  production_decade = production_decade_series,
  oscar = rep(0, 84),
  other_awards = other_awards_series,
  imbd_score = imbd_score_series,
  stringsAsFactors = F
)


# ALL
metadata <- rbind(
  metadata_news,
  metadata_sport,
  metadata_movie,
  metadata_series
)

Historical total AMR

  1. Total AMR was generated for 52 weeks for each hour of the week.
  2. Parameters of normal distribution were dependent on the time of the week
    • from 11pm to 5am - u = 5 and sd = 5
    • at 10pm and 6am - u = 10 and sd = 5
    • from 7am to 9am and from 7pm to 9pm at weekdays or from 7am to 9pm at weekends - u = 25 and sd = 10
    • from 10am to 6pm at weekdays - u = 15 and sd = 10
  3. Total AMR was averaged for each hour of the week and calculated in 1 and 2 hour window (2h was an average of both included 1h windows)

Code used to generate content to schedule data is below (hidden by default).

total_amr <- data.frame(
  week = rep(1:52, each = 168),
  time = rep(0:167, times = 52),
  total_amr = rep(0, times = 52 * 168)
)

for (week in 1:52) {
  for (time in 0:167){
    if (time %in% c(23, 0, 1, 2, 3, 4, 5)) {
      amr_vector <- rnorm(60, 5, 5)
    } else if (time %in% c(22, 6)) {
      amr_vector <- rnorm(60, 10, 5)
    } else if (time %in% c(19, 20, 21, 22, 7, 8, 9) | i 120) {
      amr_vector <- rnorm(60, 25, 10)
    } else {
      amr_vector <- rnorm(60, 15, 10)
    }
    
    amr_vector <- ifelse(mean(amr_vector)<1, 1, ifelse(mean(amr_vector)70, 70, mean(amr_vector)))
    
    total_amr[total_amr$week == week & total_amr$time == time, 'total_amr'] = amr_vector
  }
}

total_amr <- total_amr %%
  group_by(time) %%
  summarise(
    total_amr = mean(total_amr)
  ) %%
  mutate(
    total_amr_2h = coalesce((total_amr + lead(total_amr)) / 2, 0)
  ) %%
  as.data.frame()

AMR and SHR of past content and its metadata

  1. This data was generated only for movies & series since always this types will be included in the ML model.
  2. Available metadata and its distribution is same as for the content to schedule, additional attributes are:
    • start_time - uniform distribution between 0 and 167
    • amr - normal distribution with parameters dependent on the start_time
      • from 11pm to 5am - u = 1 and sd = 1
      • at 10pm and 6am - u = 2.5 and sd = 1.5
      • from 7am to 9am and from 7pm to 9pm at weekdays or from 7am to 9pm at weekends - u = 7.5 and sd = 5
      • from 10am to 6pm at weekdays - u = 5 and sd = 2.5
    • shr - calculated as amr/total_amr using data from historical total AMR
  3. 30,000 observations ere drawn in total - 10,000 for movies and 20,000 for series

Code used to generate content to schedule data is below (hidden by default).

# MOVIE
set.seed(456101)
character_movie <- runif(10000, 0, 100)
character_movie <- case_when(
  character_movie 80 ~ 'action',
  character_movie 60 & character_movie <= 80 ~ 'comedy',
  character_movie 45 & character_movie <= 60 ~ 'documentary',
  character_movie 30 & character_movie <= 45 ~ 'thriller',
  character_movie 20 & character_movie <= 30 ~ 'romance',
  character_movie 10 & character_movie <= 20 ~ 'drama',
  character_movie 5 & character_movie <= 10 ~ 'horror',
  character_movie <= 5 ~ 'fantasy'
)

set.seed(456102)
production_decade_movie <- round(rnorm(10000, 2000, 10)/10) * 10
production_decade_movie <- ifelse(production_decade_movie<1950, 1950, ifelse(production_decade_movie2020, 2020, production_decade_movie))

set.seed(456103)
oscar_movie <- runif(10000, 0, 100)
oscar_movie <- ifelse(oscar_movie<=5, 1, 0)

set.seed(456104)
other_awards_movie <- runif(10000, 0, 100)
other_awards_movie <- ifelse(other_awards_movie<=20, 1, 0)

set.seed(456105)
imbd_score_movie <- round(rnorm(10000, 6, 2)*10)
imbd_score_movie <- ifelse(imbd_score_movie<0, 0, ifelse(imbd_score_movie100, 100, imbd_score_movie))

set.seed(456106)
start_time_movie <- round(runif(10000, 0, 166))

set.seed(456107)
amr_movie_vector <- c()

for (time in start_time_movie){
  
  time <- time%%24
  
  if (time %in% c(23, 0, 1, 2, 3, 4, 5)) {
    amr_vector <- rnorm(120, 1, 1)
  } else if (time %in% c(22, 6)) {
    amr_vector <- rnorm(120, 2.5, 1.5)
  } else if (time %in% c(19, 20, 21, 22, 7, 8, 9) | i 120) {
    amr_vector <- rnorm(120, 7.5, 5)
  } else {
    amr_vector <- rnorm(120, 5, 2.5)
  }
  
  amr_vector <- mean(amr_vector)
  amr_vector <- ifelse(amr_vector < 0.1, 0.1, ifelse(amr_vector 10, 10, amr_vector))
  
  amr_movie_vector <- append(amr_movie_vector, amr_vector)
}

amr_movie <- data.frame(
  title = paste(rep('Movie S', 10000), seq(1, 10000, 1)),
  content_type = rep('movie', 10000),
  duration = rep(2, 10000),
  start_time = start_time_movie,
  character = character_movie,
  production_decade = production_decade_movie,
  oscar = oscar_movie,
  other_awards = other_awards_movie,
  imbd_score = imbd_score_movie,
  amr = amr_movie_vector,
  stringsAsFactors = F
)


# SERIES
set.seed(456107)
character_series <- runif(20000, 0, 100)
character_series <- case_when(
  character_movie 80 ~ 'action',
  character_movie 60 & character_movie <= 80 ~ 'comedy',
  character_movie 45 & character_movie <= 60 ~ 'documentary',
  character_movie 30 & character_movie <= 45 ~ 'thriller',
  character_movie 20 & character_movie <= 30 ~ 'romance',
  character_movie 10 & character_movie <= 20 ~ 'drama',
  character_movie 5 & character_movie <= 10 ~ 'horror',
  character_movie <= 5 ~ 'fantasy'
)

set.seed(456108)
production_decade_series <- round(rnorm(20000, 2005, 10)/10) * 10
production_decade_series <- ifelse(production_decade_series<1950, 1950, ifelse(production_decade_series2020, 2020, production_decade_series))

set.seed(456109)
other_awards_series <- runif(20000, 0, 100)
other_awards_series <- ifelse(other_awards_series<=20, 1, 0)

set.seed(456110)
imbd_score_series <- round(rnorm(20000, 6, 2)*10)
imbd_score_series <- ifelse(imbd_score_series<0, 0, ifelse(imbd_score_series100, 100, imbd_score_series))

set.seed(456111)
start_time_series <- round(runif(20000, 0, 167))

set.seed(456112)
amr_series_vector <- c()

for (time in start_time_series){
  
  time <- time%%24
  
  if (time %in% c(23, 0, 1, 2, 3, 4, 5)) {
    amr_vector <- rnorm(60, 1, 1)
  } else if (time %in% c(22, 6)) {
    amr_vector <- rnorm(60, 2.5, 1.5)
  } else if (time %in% c(19, 20, 21, 7, 8, 9) | i 120) {
    amr_vector <- rnorm(60, 7.5, 5)
  } else {
    amr_vector <- rnorm(60, 5, 2.5)
  }
  
  amr_vector <- mean(amr_vector)
  amr_vector <- ifelse(amr_vector < 0.1, 0.1, ifelse(amr_vector 10, 10, amr_vector))
  
  amr_series_vector <- append(amr_series_vector, amr_vector)
}

amr_series <- data.frame(
  title = paste(rep('Series S', 20000), seq(1, 20000, 1)),
  content_type = rep('series', 20000),
  duration = rep(1, 20000),
  start_time = start_time_series,
  character = character_series,
  production_decade = production_decade_series,
  oscar = rep(0, 20000),
  other_awards = other_awards_series,
  imbd_score = imbd_score_series,
  amr = amr_series_vector,
  stringsAsFactors = F
)

amr <- rbind(
  amr_movie,
  amr_series
)

for (col in c('production_decade', 'oscar', 'other_awards')) {
  amr[, col] <- as.character(amr[, col])
}

amr_movie %%
  arrange(start_time)

amr <- amr %%
  merge(total_amr, by.x='start_time', by.y = 'time') %%
  mutate(
    amr_vs_total = ifelse(content_type == 'movie', amr / total_amr_2h, amr / total_amr)
  ) %%
  select(
    title,
    content_type,
    duration,
    start_time,
    character,
    production_decade,
    oscar,
    other_awards,
    imbd_score,
    amr,
    amr_vs_total
  )

Algorithm

We did not implement the K Stratified Validation since the data we use is random, and therefore the model do not have any predictive power. For our example, we have chosen to use Random Forest with 100 trees, 6 variables used at each split, minimum 50 observation in a terminal node, and without replacement. We trained the model with all past metadata.

First, we scheduled all news and sports. Then, we scheduled all movies and series consecutively using the results of RF model. Please note that not all features described in the proposed framework were implemented in the exemplary algorithm since the data was random and the final result would not be meaningful anyway. Nevertheless, we have the know-how to implement all the rules, but that was not the main purpose of our task.

Code used to generate content to schedule data is below (hidden by default).

form <- amr_vs_total ~ content_type + character + production_decade + oscar + other_awards + imbd_score

set.seed(456100)
rf <- randomForest(form, data=amr, ntree = 100, mtry = 6, replace = F, nodesize = 50)

schedule <- data.frame(
  title = character(),
  content_type = character(),
  start_time = integer(),
  end_time = integer()
)

for (t in metadata[metadata$content_type %in% c('news', 'sport'), 'title']) {
  
  obs <- metadata %%
    filter(title == t) %%
    mutate(
      end_time = start_time + duration
    ) %%
    select(
      title,
      content_type,
      character,
      start_time,
      end_time
    )

  schedule <- rbind(schedule, obs)
}

free_slots <- data.frame(
  time = 0:167,
  slot_1h = rep(0, 168),
  slot_2h = rep(0, 168)
)

free_slots[168, 'slot_2h'] <- 1

free_slots <- free_slots %%
  merge(schedule, all.x=T, by.x='time', by.y='start_time') %%
  mutate(
    slot_1h = case_when(
      content_type %in% c('news', 'sport') ~ 1,
      lag(content_type, 1) == 'sport' ~ 1,
      T ~ 0
    ),
    slot_2h = case_when(
      slot_1h == 1 ~ 1,
      lead(slot_1h) == 1 ~ 1,
      T ~ 0
    )
  ) %%
  merge(total_amr, by='time')

metadata_schedule <- metadata[metadata$content_type %in% c('movie', 'series'),]

for (col in c('production_decade', 'oscar', 'other_awards')) {
  metadata_schedule[, col] <- as.character(metadata_schedule[, col])
}

metadata_schedule$amr_vs_total_pred <- predict(rf, metadata_schedule)

metadata_schedule$planned <- 0

metadata_schedule <- metadata_schedule %%
  arrange(desc(amr_vs_total_pred))


for (i in 1:nrow(metadata_schedule)) {
  
  obs <- metadata_schedule[i,]
  
  if (obs$duration == 1) {
    best_slot <- which(free_slots$total_amr == max(free_slots[free_slots$slot_1h == 0, 'total_amr']))
  } else if (obs$duration == 2) {
    best_slot <- which(free_slots$total_amr_2h == max(free_slots[free_slots$slot_2h == 0, 'total_amr_2h']))
  }
  
  print(i)
  print(best_slot)
  print('')
  
  free_slots[best_slot, 'title'] <- obs$title
  free_slots[best_slot, 'content_type'] <- obs$content_type
  free_slots[best_slot, 'character'] <- obs$character
  free_slots[best_slot, 'end_time'] <- free_slots[best_slot, 'time'] + obs$duration
  
  free_slots[best_slot, 'slot_1h'] <- 1
  free_slots[best_slot, 'slot_2h'] <- 1
  free_slots[best_slot - 1, 'slot_2h'] <- 1
  
  if (obs$duration == 2) {
    free_slots[best_slot + 1, 'slot_1h'] <- 1
    free_slots[best_slot + 1, 'slot_2h'] <- 1
  }
  
  metadata_schedule[i, 'planned'] <- 1
}

Plot

Out last step was preparation of a graphic schedule for the next week. We have tweaked the data to be fit into a ggplot figure.

Code used to generate content to schedule data is below (hidden by default).

# create 2 rows if show spans over multiple days
for (i in which(free_slots$content_type %in% c('sport', 'movie') & free_slots$time%%24 == 23)) {
  
  free_slots[i, 'end_time'] <- free_slots[i, 'end_time'] - 1
  
  free_slots[i+1, 'title'] <- paste(free_slots[i, 'title'], 'cont.')
  free_slots[i+1, 'content_type'] <- free_slots[i, 'content_type']
  free_slots[i+1, 'character'] <- free_slots[i, 'character']
  free_slots[i+1, 'end_time'] <- free_slots[i, 'end_time'] + 1
}

plot_data <- free_slots %%
  mutate(
    day_start = floor(time/24),
    day_end = day_start + 1,
    day = (day_start + day_end) / 2,
    hour_start = time%%24,
    hour_end = ifelse(end_time%%24==0, 24, end_time%%24),
    hour_title = ifelse(content_type %in% c('series', 'movie'), (hour_start + hour_end) / 2 - 0.2, (hour_start + hour_end) / 2),
    hour_character = (hour_start + hour_end) / 2 + 0.2
  ) %%
  filter(title != '<NA')
  

png('schedule.png', width = 3508, height = 2480, res=300)
ggplot(plot_data) +
  geom_rect(
    aes(xmin = day_start, xmax = day_end, ymin = hour_start, ymax = hour_end, fill = content_type),
    alpha = 0.5, color = 'black') +
  geom_text(
    aes(x = day, y = hour_title, label = title),
    size = 3, color = 'black', fontface = 'bold') +
  geom_text(
    data = plot_data[plot_data$content_type  %in% c('series', 'movie'),], 
    aes(x = day, y = hour_character, label = paste('(', character, ')', sep='')),
    size = 2, color = 'black') +
  scale_y_reverse(breaks = seq(0, 23, 2),
                  labels = paste(ifelse(nchar(seq(0, 23, 2)) == 1, paste(0, seq(0, 23, 2), sep=''), seq(0, 23, 2)), ':00', sep='')) +
  scale_x_continuous(breaks = 0:6 + 0.5,
                     labels = c('Monday', 'Tuesday', 'Wednesday', 'Thursday', 'Friday', 'Saturday', 'Sunday'),
                     position = "top") +
  scale_fill_viridis(discrete=T) +
  theme_classic() +
  theme(
    axis.line = element_blank(),
    axis.ticks = element_blank(),
    axis.title = element_blank(),
    axis.text.x = element_text(size = 14, face = 'bold', color = 'black'),
    axis.text.y = element_text(size = 11, face = 'bold', color = 'black'),
    plot.margin = unit(c(1,1,1,1), "cm"),
    panel.spacing = element_blank(),
    legend.position = 'none'
  )
dev.off()

Summary

Key Benefits

We argue that the proposed solution is optimal and efficient for several reasons.

  1. It is time- and cost-effective. The ML algorithm needs to be trained only once, and then, it can be applied to new data (e.g. each new week) quickly and seamlessly.

  2. It is easily adaptable to new data. While being trained on one TV channel, the algorithm can be applied to data from another channel without effort.

  3. Our solution captures viewing habits due to the inclusion of the time a person was watching a specific content.

  4. New features may be easily incorporated. Currently our metadata is limited to 11 features. However, new features can be added after consulting an SME.

Next Steps

  1. Work with SMEs to identify which features may be added to the metadata and establish a set of rules that the algorithm should follow (e.g. time frames for kids’ content).

  2. Adjust the algorithm to work with other possible content types (multiple ML models may be used).

  3. Use real-world data to run the algorithm and evaluate its performance.