# Load necessary library
library(lubridate)
## 
## Attaching package: 'lubridate'
## The following objects are masked from 'package:base':
## 
##     date, intersect, setdiff, union
library(dplyr)
## 
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union
library(tidyverse)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ forcats 1.0.0     ✔ stringr 1.5.1
## ✔ ggplot2 3.5.1     ✔ tibble  3.2.1
## ✔ purrr   1.0.2     ✔ tidyr   1.3.1
## ✔ readr   2.1.5
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(tsibble)
## Registered S3 method overwritten by 'tsibble':
##   method               from 
##   as_tibble.grouped_df dplyr
## 
## Attaching package: 'tsibble'
## 
## The following object is masked from 'package:lubridate':
## 
##     interval
## 
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, union
library(fable)
## Loading required package: fabletools
library(fabletools)
library(feasts)

data <- read.csv("~/Documents/STAT 2024/udemy_courses.csv")

data$published_date <- as.Date(data$published_timestamp, format = "%Y-%m-%dT%H:%M:%SZ")

head(data$published_date)
## [1] "2017-01-18" "2017-03-09" "2016-12-19" "2017-05-30" "2016-12-13"
## [6] "2014-05-02"

The published_timestamp column stores full timestamps of the form “YYYY-MM-DDTHH:MM:SSZ” (date and time). The code passes a matching format string to as.Date(), which keeps only the date part (YYYY-MM-DD) and makes the column more convenient for analysis.
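As a small check of this conversion, the sketch below parses a few made-up timestamps and counts how many fail to parse. The values are illustrative assumptions, not rows from udemy_courses.csv. The check is worth having because as.Date() returns NA, rather than an error, when a string does not match the format.

```r
# Hypothetical timestamps in the same layout as published_timestamp;
# the last value is deliberately malformed.
raw_ts <- c("2017-01-18T20:58:58Z", "2014-05-02T15:13:30Z", "not a timestamp")

# Parse with the same format string used above; the time of day is dropped.
parsed <- as.Date(raw_ts, format = "%Y-%m-%dT%H:%M:%SZ")

# Strings that do not match the format become NA instead of raising an error,
# so count them before relying on the dates.
n_failed <- sum(is.na(parsed))

print(parsed)
print(n_failed)
```

Running the same count on the real column would show whether any rows were silently lost in the conversion.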

time_series_data <- data %>%
  select(published_date = published_timestamp, num_subscribers)

time_series_data$published_date <- as.Date(time_series_data$published_date, format = "%Y-%m-%dT%H:%M:%SZ")

time_series_data <- time_series_data %>%
  arrange(published_date)

head(time_series_data)

I select two important columns from the dataset: published_timestamp (renamed as published_date) and num_subscribers, which are needed for time-series analysis. The published_date column is converted into a simple date format (YYYY-MM-DD) to make it consistent and easier to work with. Then, the data is sorted by the published_date column in order from earliest to latest, which is useful for time-series analysis. Finally, the first few rows of the updated data are shown using the head() function to check the results.

duplicates <- time_series_data %>%
  count(published_date) %>%
  filter(n > 1)

print(duplicates)
##     published_date  n
## 1       2012-06-18  2
## 2       2012-10-31  3
## 3       2012-11-26  2
## 4       2013-01-03  2
## 5       2013-02-16  2
## 6       2013-02-18  2
## 7       2013-02-24  2
## 8       2013-02-25  2
## 9       2013-02-27  2
## 10      2013-03-02  2
## 11      2013-03-05  2
## 12      2013-03-08  2
## 13      2013-04-17  2
## 14      2013-05-13  2
## 15      2013-05-29  5
## 16      2013-06-07  3
## 17      2013-06-09  2
## 18      2013-06-10  2
## 19      2013-07-09  2
## 20      2013-07-19  2
## 21      2013-07-23  6
## 22      2013-08-05  5
## 23      2013-08-12  9
## 24      2013-08-21  2
## 25      2013-09-03  2
## 26      2013-09-24  2
## 27      2013-09-25  2
## 28      2013-09-29  2
## 29      2013-10-02  2
## 30      2013-10-09  2
## 31      2013-10-10  2
## 32      2013-10-16  6
## 33      2013-10-20  2
## 34      2013-10-23  2
## 35      2013-11-01  2
## 36      2013-11-09  2
## 37      2013-11-18  2
## 38      2013-11-26  3
## 39      2013-11-28  2
## 40      2013-12-04  2
## 41      2013-12-12  2
## 42      2013-12-13  3
## 43      2014-01-14  2
## 44      2014-01-16  4
## 45      2014-01-21  2
## 46      2014-01-27  3
## 47      2014-01-28  2
## 48      2014-01-30  2
## 49      2014-01-31  2
## 50      2014-02-10  2
## 51      2014-02-11  3
## 52      2014-02-12  3
## 53      2014-02-13  2
## 54      2014-02-19  2
## 55      2014-02-25  2
## 56      2014-03-01  2
## 57      2014-03-03  2
## 58      2014-03-04  2
## 59      2014-03-06  2
## 60      2014-03-07  2
## 61      2014-03-10  2
## 62      2014-03-11  8
## 63      2014-03-12  4
## 64      2014-03-13  2
## 65      2014-03-16  2
## 66      2014-03-17  3
## 67      2014-03-21  3
## 68      2014-03-24  2
## 69      2014-03-26  2
## 70      2014-03-27  2
## 71      2014-03-31  3
## 72      2014-04-01  4
## 73      2014-04-04  4
## 74      2014-04-09  2
## 75      2014-04-14  3
## 76      2014-04-15  5
## 77      2014-04-16  2
## 78      2014-04-21  2
## 79      2014-04-23  2
## 80      2014-04-24  2
## 81      2014-04-27  2
## 82      2014-04-29  3
## 83      2014-04-30  3
## 84      2014-05-01  2
## 85      2014-05-05  3
## 86      2014-05-07  7
## 87      2014-05-08  2
## 88      2014-05-09  7
## 89      2014-05-16  5
## 90      2014-05-17  3
## 91      2014-05-19 11
## 92      2014-05-28  2
## 93      2014-06-02  2
## 94      2014-06-16  2
## 95      2014-06-17  5
## 96      2014-06-23  4
## 97      2014-06-25  2
## 98      2014-06-27  2
## 99      2014-06-30  5
## 100     2014-07-04  2
## 101     2014-07-05  2
## 102     2014-07-17  3
## 103     2014-07-19  3
## 104     2014-07-24  2
## 105     2014-08-07  2
## 106     2014-08-10  2
## 107     2014-08-13  2
## 108     2014-08-16  2
## 109     2014-08-18  4
## 110     2014-08-21  2
## 111     2014-08-23  3
## 112     2014-08-28  3
## 113     2014-08-29  2
## 114     2014-09-01  2
## 115     2014-09-02  2
## 116     2014-09-05  2
## 117     2014-09-09  2
## 118     2014-09-14  2
## 119     2014-09-19  4
## 120     2014-09-22  9
## 121     2014-10-01  4
## 122     2014-10-02  3
## 123     2014-10-03  2
## 124     2014-10-06  2
## 125     2014-10-08  4
## 126     2014-10-12  2
## 127     2014-10-16  3
## 128     2014-10-20  2
## 129     2014-10-21  3
## 130     2014-10-23  2
## 131     2014-10-24  3
## 132     2014-10-25  2
## 133     2014-10-27  2
## 134     2014-10-28  2
## 135     2014-10-29  5
## 136     2014-11-04  3
## 137     2014-11-05  3
## 138     2014-11-06  2
## 139     2014-11-08  4
## 140     2014-11-11  2
## 141     2014-11-12  2
## 142     2014-11-18  4
## 143     2014-11-19  2
## 144     2014-11-20  2
## 145     2014-11-21  2
## 146     2014-11-24  3
## 147     2014-11-25  4
## 148     2014-11-28  4
## 149     2014-11-30  3
## 150     2014-12-03  2
## 151     2014-12-06  3
## 152     2014-12-08  2
## 153     2014-12-09  2
## 154     2014-12-10  3
## 155     2014-12-11  3
## 156     2014-12-12  2
## 157     2014-12-17  6
## 158     2014-12-18  4
## 159     2014-12-19  7
## 160     2014-12-22  3
## 161     2014-12-24  2
## 162     2014-12-27  2
## 163     2014-12-28  2
## 164     2014-12-29  4
## 165     2014-12-31  2
## 166     2015-01-01  5
## 167     2015-01-03  3
## 168     2015-01-04  2
## 169     2015-01-05  2
## 170     2015-01-07  3
## 171     2015-01-08  4
## 172     2015-01-09  5
## 173     2015-01-12  2
## 174     2015-01-13  3
## 175     2015-01-14  2
## 176     2015-01-15  2
## 177     2015-01-16  2
## 178     2015-01-17  2
## 179     2015-01-20  3
## 180     2015-01-21  2
## 181     2015-01-22  6
## 182     2015-01-23  2
## 183     2015-01-24  2
## 184     2015-01-26  3
## 185     2015-01-27  5
## 186     2015-01-29  4
## 187     2015-01-30  5
## 188     2015-01-31  4
## 189     2015-02-02  3
## 190     2015-02-05  3
## 191     2015-02-06  5
## 192     2015-02-07  3
## 193     2015-02-09  4
## 194     2015-02-12  3
## 195     2015-02-13  6
## 196     2015-02-17  4
## 197     2015-02-18  2
## 198     2015-02-19  4
## 199     2015-02-20  2
## 200     2015-02-21  2
## 201     2015-02-23  5
## 202     2015-02-24  2
## 203     2015-02-25  4
## 204     2015-02-26  4
## 205     2015-03-02  5
## 206     2015-03-03  3
## 207     2015-03-04  8
## 208     2015-03-05  5
## 209     2015-03-06  2
## 210     2015-03-08  4
## 211     2015-03-09  2
## 212     2015-03-10  2
## 213     2015-03-12  2
## 214     2015-03-13  2
## 215     2015-03-15  2
## 216     2015-03-16  2
## 217     2015-03-19  6
## 218     2015-03-22  3
## 219     2015-03-23  4
## 220     2015-03-24  5
## 221     2015-03-26  4
## 222     2015-03-29  4
## 223     2015-03-31  3
## 224     2015-04-02  2
## 225     2015-04-06  4
## 226     2015-04-08  5
## 227     2015-04-09  2
## 228     2015-04-10  4
## 229     2015-04-12  5
## 230     2015-04-13  7
## 231     2015-04-15  9
## 232     2015-04-16  6
## 233     2015-04-17  3
## 234     2015-04-20  8
## 235     2015-04-21  3
## 236     2015-04-22  2
## 237     2015-04-24  4
## 238     2015-04-26  5
## 239     2015-04-27  3
## 240     2015-04-28  3
## 241     2015-05-01  3
## 242     2015-05-04  2
## 243     2015-05-06  2
## 244     2015-05-08  2
## 245     2015-05-12  2
## 246     2015-05-13  2
## 247     2015-05-14  3
## 248     2015-05-15  3
## 249     2015-05-18  3
## 250     2015-05-19  2
## 251     2015-05-20  3
## 252     2015-05-21  2
## 253     2015-05-22  3
## 254     2015-05-25  5
## 255     2015-05-26  5
## 256     2015-05-27  2
## 257     2015-05-28  4
## 258     2015-06-01  7
## 259     2015-06-02  3
## 260     2015-06-03  5
## 261     2015-06-05  5
## 262     2015-06-07  2
## 263     2015-06-08  6
## 264     2015-06-09  4
## 265     2015-06-11  3
## 266     2015-06-12  5
## 267     2015-06-14  3
## 268     2015-06-16  6
## 269     2015-06-17  5
## 270     2015-06-19  7
## 271     2015-06-22  4
## 272     2015-06-23  3
## 273     2015-06-25  2
## 274     2015-06-26  2
## 275     2015-07-01  6
## 276     2015-07-02  3
## 277     2015-07-05  2
## 278     2015-07-06  3
## 279     2015-07-07  9
## 280     2015-07-08  5
## 281     2015-07-09  2
## 282     2015-07-10  6
## 283     2015-07-12  2
## 284     2015-07-13  2
## 285     2015-07-14  9
## 286     2015-07-15  2
## 287     2015-07-16  8
## 288     2015-07-17  3
## 289     2015-07-19  3
## 290     2015-07-20  6
## 291     2015-07-21  3
## 292     2015-07-22  6
## 293     2015-07-23  4
## 294     2015-07-24  3
## 295     2015-07-26  4
## 296     2015-07-27  5
## 297     2015-07-28  6
## 298     2015-07-29  3
## 299     2015-07-30  5
## 300     2015-08-03  4
## 301     2015-08-04  7
## 302     2015-08-05  3
## 303     2015-08-06  5
## 304     2015-08-07  4
## 305     2015-08-09  4
## 306     2015-08-10  3
## 307     2015-08-11  7
## 308     2015-08-12  2
## 309     2015-08-13  7
## 310     2015-08-14  5
## 311     2015-08-16  5
## 312     2015-08-17  4
## 313     2015-08-18  7
## 314     2015-08-19  2
## 315     2015-08-20  4
## 316     2015-08-21  6
## 317     2015-08-24  4
## 318     2015-08-25  2
## 319     2015-08-26  2
## 320     2015-08-27  4
## 321     2015-08-28  3
## 322     2015-08-31  4
## 323     2015-09-01  3
## 324     2015-09-02  2
## 325     2015-09-04  3
## 326     2015-09-07  6
## 327     2015-09-08  3
## 328     2015-09-10  4
## 329     2015-09-12  3
## 330     2015-09-13  3
## 331     2015-09-14  3
## 332     2015-09-15  7
## 333     2015-09-16  2
## 334     2015-09-17  4
## 335     2015-09-20  7
## 336     2015-09-21  4
## 337     2015-09-22  4
## 338     2015-09-23  8
## 339     2015-09-24  3
## 340     2015-09-25  4
## 341     2015-09-28  4
## 342     2015-09-29  6
## 343     2015-09-30  2
## 344     2015-10-01  3
## 345     2015-10-02  4
## 346     2015-10-04  3
## 347     2015-10-05  2
## 348     2015-10-06  2
## 349     2015-10-07  2
## 350     2015-10-08  5
## 351     2015-10-09  4
## 352     2015-10-12  4
## 353     2015-10-13  4
## 354     2015-10-14  7
## 355     2015-10-15  4
## 356     2015-10-16  4
## 357     2015-10-17  3
## 358     2015-10-18  2
## 359     2015-10-19  5
## 360     2015-10-20  2
## 361     2015-10-21  3
## 362     2015-10-22  5
## 363     2015-10-23  2
## 364     2015-10-25  3
## 365     2015-10-26  7
## 366     2015-10-28  6
## 367     2015-10-29  5
## 368     2015-10-30  3
## 369     2015-11-01  4
## 370     2015-11-02  6
## 371     2015-11-03  6
## 372     2015-11-04  2
## 373     2015-11-05  4
## 374     2015-11-06  7
## 375     2015-11-08  2
## 376     2015-11-09  7
## 377     2015-11-10  8
## 378     2015-11-11 11
## 379     2015-11-12  9
## 380     2015-11-13  9
## 381     2015-11-16  2
## 382     2015-11-17  5
## 383     2015-11-18  5
## 384     2015-11-19  3
## 385     2015-11-20  2
## 386     2015-11-21  2
## 387     2015-11-22  3
## 388     2015-11-23  6
## 389     2015-11-24  3
## 390     2015-11-25  3
## 391     2015-11-26  6
## 392     2015-11-27  2
## 393     2015-11-29  3
## 394     2015-11-30  3
## 395     2015-12-01  3
## 396     2015-12-03  3
## 397     2015-12-04  2
## 398     2015-12-06  3
## 399     2015-12-07  3
## 400     2015-12-08  2
## 401     2015-12-09  3
## 402     2015-12-10  3
## 403     2015-12-14  3
## 404     2015-12-15  4
## 405     2015-12-16  2
## 406     2015-12-17  7
## 407     2015-12-18  2
## 408     2015-12-21  2
## 409     2015-12-22  2
## 410     2015-12-28  2
## 411     2015-12-29 10
## 412     2015-12-30  5
## 413     2016-01-01  2
## 414     2016-01-03  2
## 415     2016-01-04  4
## 416     2016-01-05  2
## 417     2016-01-06  4
## 418     2016-01-07  3
## 419     2016-01-08  4
## 420     2016-01-10  5
## 421     2016-01-11  3
## 422     2016-01-12  5
## 423     2016-01-13  3
## 424     2016-01-14  5
## 425     2016-01-15  2
## 426     2016-01-18  7
## 427     2016-01-19  2
## 428     2016-01-21  3
## 429     2016-01-22  6
## 430     2016-01-24  6
## 431     2016-01-25  6
## 432     2016-01-27  7
## 433     2016-01-28  4
## 434     2016-01-29  6
## 435     2016-02-01  6
## 436     2016-02-02  7
## 437     2016-02-03  6
## 438     2016-02-04  4
## 439     2016-02-05  5
## 440     2016-02-07  2
## 441     2016-02-08  7
## 442     2016-02-09  2
## 443     2016-02-10  4
## 444     2016-02-11  5
## 445     2016-02-12  4
## 446     2016-02-13  5
## 447     2016-02-14  5
## 448     2016-02-15  6
## 449     2016-02-16  5
## 450     2016-02-17  6
## 451     2016-02-18  4
## 452     2016-02-19  4
## 453     2016-02-21  5
## 454     2016-02-22  5
## 455     2016-02-23  4
## 456     2016-02-24  4
## 457     2016-02-25  4
## 458     2016-02-26  5
## 459     2016-02-29  6
## 460     2016-03-01  4
## 461     2016-03-02  4
## 462     2016-03-03  7
## 463     2016-03-04  4
## 464     2016-03-06  3
## 465     2016-03-07  2
## 466     2016-03-08  8
## 467     2016-03-09  2
## 468     2016-03-10  7
## 469     2016-03-11  4
## 470     2016-03-12  4
## 471     2016-03-14  5
## 472     2016-03-15  3
## 473     2016-03-16  3
## 474     2016-03-17  5
## 475     2016-03-18  6
## 476     2016-03-20  4
## 477     2016-03-21  4
## 478     2016-03-22  5
## 479     2016-03-23  2
## 480     2016-03-24  3
## 481     2016-03-25  2
## 482     2016-03-28  2
## 483     2016-03-29  9
## 484     2016-03-30  5
## 485     2016-03-31  5
## 486     2016-04-01  7
## 487     2016-04-02  2
## 488     2016-04-04  3
## 489     2016-04-05  3
## 490     2016-04-06  6
## 491     2016-04-07  7
## 492     2016-04-08  6
## 493     2016-04-11  4
## 494     2016-04-12  6
## 495     2016-04-13 18
## 496     2016-04-14  3
## 497     2016-04-15  3
## 498     2016-04-18  2
## 499     2016-04-19  2
## 500     2016-04-20  3
## 501     2016-04-21  4
## 502     2016-04-22  2
## 503     2016-04-23  3
## 504     2016-04-24  2
## 505     2016-04-25  4
## 506     2016-04-26  5
## 507     2016-04-27  7
## 508     2016-04-28  5
## 509     2016-04-29  3
## 510     2016-05-01  2
## 511     2016-05-02  5
## 512     2016-05-03  6
## 513     2016-05-04  5
## 514     2016-05-05  6
## 515     2016-05-06  2
## 516     2016-05-09  3
## 517     2016-05-11  6
## 518     2016-05-12  3
## 519     2016-05-13  3
## 520     2016-05-16 10
## 521     2016-05-17  7
## 522     2016-05-18  4
## 523     2016-05-23  8
## 524     2016-05-24  6
## 525     2016-05-25  2
## 526     2016-05-26  4
## 527     2016-05-27  5
## 528     2016-05-30  6
## 529     2016-05-31  2
## 530     2016-06-01  2
## 531     2016-06-02  3
## 532     2016-06-03  3
## 533     2016-06-05  2
## 534     2016-06-06  4
## 535     2016-06-07  4
## 536     2016-06-08  3
## 537     2016-06-09  6
## 538     2016-06-10  4
## 539     2016-06-13  5
## 540     2016-06-14  6
## 541     2016-06-15  5
## 542     2016-06-16  2
## 543     2016-06-19  2
## 544     2016-06-20 13
## 545     2016-06-21  6
## 546     2016-06-23  3
## 547     2016-06-26  3
## 548     2016-06-27  4
## 549     2016-06-28  5
## 550     2016-06-29  4
## 551     2016-06-30  3
## 552     2016-07-01  5
## 553     2016-07-02  2
## 554     2016-07-04  5
## 555     2016-07-05  4
## 556     2016-07-06  2
## 557     2016-07-07  4
## 558     2016-07-08  2
## 559     2016-07-10  2
## 560     2016-07-11  3
## 561     2016-07-12  5
## 562     2016-07-13  6
## 563     2016-07-14  3
## 564     2016-07-18  7
## 565     2016-07-20  2
## 566     2016-07-21  4
## 567     2016-07-22  2
## 568     2016-07-23  2
## 569     2016-07-24  2
## 570     2016-07-25  5
## 571     2016-07-26  2
## 572     2016-07-28  7
## 573     2016-07-29  5
## 574     2016-08-02  3
## 575     2016-08-04  3
## 576     2016-08-05  2
## 577     2016-08-06  4
## 578     2016-08-08  5
## 579     2016-08-09  4
## 580     2016-08-10  6
## 581     2016-08-11  6
## 582     2016-08-12  3
## 583     2016-08-14  2
## 584     2016-08-17  6
## 585     2016-08-18  7
## 586     2016-08-19  5
## 587     2016-08-22  5
## 588     2016-08-23  4
## 589     2016-08-25  3
## 590     2016-08-26  4
## 591     2016-08-27  3
## 592     2016-08-29 10
## 593     2016-08-30  5
## 594     2016-08-31  2
## 595     2016-09-01  3
## 596     2016-09-02  3
## 597     2016-09-03  2
## 598     2016-09-04  2
## 599     2016-09-06  4
## 600     2016-09-07  2
## 601     2016-09-09  3
## 602     2016-09-10  2
## 603     2016-09-12 10
## 604     2016-09-13  7
## 605     2016-09-14  3
## 606     2016-09-15  4
## 607     2016-09-18  2
## 608     2016-09-19  8
## 609     2016-09-20  2
## 610     2016-09-21  5
## 611     2016-09-22  2
## 612     2016-09-25  5
## 613     2016-09-26  9
## 614     2016-09-27  8
## 615     2016-09-28  3
## 616     2016-09-29  2
## 617     2016-10-03  8
## 618     2016-10-04  4
## 619     2016-10-05  3
## 620     2016-10-06  3
## 621     2016-10-07  3
## 622     2016-10-09  2
## 623     2016-10-10  5
## 624     2016-10-11  3
## 625     2016-10-12  8
## 626     2016-10-13  5
## 627     2016-10-16  3
## 628     2016-10-17  4
## 629     2016-10-18  7
## 630     2016-10-19  2
## 631     2016-10-21  2
## 632     2016-10-22  3
## 633     2016-10-24  4
## 634     2016-10-25  5
## 635     2016-10-26  2
## 636     2016-10-27  5
## 637     2016-10-28  3
## 638     2016-10-29  6
## 639     2016-10-30  3
## 640     2016-10-31  2
## 641     2016-11-01  7
## 642     2016-11-02  2
## 643     2016-11-03  5
## 644     2016-11-06  2
## 645     2016-11-07  5
## 646     2016-11-08  4
## 647     2016-11-10  4
## 648     2016-11-11  2
## 649     2016-11-12  3
## 650     2016-11-14  4
## 651     2016-11-15  6
## 652     2016-11-16  2
## 653     2016-11-17  4
## 654     2016-11-18  4
## 655     2016-11-20  3
## 656     2016-11-21  3
## 657     2016-11-22  5
## 658     2016-11-23  3
## 659     2016-11-24  7
## 660     2016-11-25  3
## 661     2016-11-27  3
## 662     2016-11-28  4
## 663     2016-11-29  5
## 664     2016-11-30  2
## 665     2016-12-01  5
## 666     2016-12-02  2
## 667     2016-12-03  2
## 668     2016-12-05  4
## 669     2016-12-06  3
## 670     2016-12-07  3
## 671     2016-12-08  3
## 672     2016-12-09  3
## 673     2016-12-11  3
## 674     2016-12-12  2
## 675     2016-12-13  5
## 676     2016-12-14  8
## 677     2016-12-15  7
## 678     2016-12-16  2
## 679     2016-12-19  6
## 680     2016-12-20  5
## 681     2016-12-21  4
## 682     2016-12-22 11
## 683     2016-12-26  4
## 684     2016-12-27  4
## 685     2016-12-28  6
## 686     2016-12-29  3
## 687     2016-12-31  2
## 688     2017-01-02  4
## 689     2017-01-03  9
## 690     2017-01-09  2
## 691     2017-01-10  3
## 692     2017-01-11  4
## 693     2017-01-12  2
## 694     2017-01-13  4
## 695     2017-01-16  4
## 696     2017-01-17  4
## 697     2017-01-18  6
## 698     2017-01-19  8
## 699     2017-01-20  4
## 700     2017-01-23  8
## 701     2017-01-24  4
## 702     2017-01-25  2
## 703     2017-01-26  4
## 704     2017-01-27  5
## 705     2017-02-01  3
## 706     2017-02-02  5
## 707     2017-02-03  8
## 708     2017-02-06  6
## 709     2017-02-07  8
## 710     2017-02-08  5
## 711     2017-02-09  5
## 712     2017-02-10  3
## 713     2017-02-12  2
## 714     2017-02-13  8
## 715     2017-02-14  4
## 716     2017-02-15  2
## 717     2017-02-16  2
## 718     2017-02-17  4
## 719     2017-02-18  4
## 720     2017-02-19  4
## 721     2017-02-20  5
## 722     2017-02-21  6
## 723     2017-02-22  6
## 724     2017-02-23  7
## 725     2017-02-24  4
## 726     2017-02-25  2
## 727     2017-02-26  4
## 728     2017-02-27  2
## 729     2017-02-28  3
## 730     2017-03-01  3
## 731     2017-03-02  4
## 732     2017-03-03  5
## 733     2017-03-06  3
## 734     2017-03-07  6
## 735     2017-03-08 11
## 736     2017-03-09  6
## 737     2017-03-10  5
## 738     2017-03-12  2
## 739     2017-03-13  5
## 740     2017-03-14  4
## 741     2017-03-15  2
## 742     2017-03-16  4
## 743     2017-03-17  2
## 744     2017-03-21  6
## 745     2017-03-22  3
## 746     2017-03-23  6
## 747     2017-03-24  3
## 748     2017-03-27  5
## 749     2017-03-28  3
## 750     2017-03-29  4
## 751     2017-03-30 11
## 752     2017-03-31  7
## 753     2017-04-03  2
## 754     2017-04-04  4
## 755     2017-04-05  3
## 756     2017-04-06  6
## 757     2017-04-11  6
## 758     2017-04-12  4
## 759     2017-04-13  3
## 760     2017-04-14  3
## 761     2017-04-15  3
## 762     2017-04-17  7
## 763     2017-04-18  9
## 764     2017-04-19  5
## 765     2017-04-20  4
## 766     2017-04-23  4
## 767     2017-04-24 12
## 768     2017-04-25  6
## 769     2017-04-26  4
## 770     2017-04-27  8
## 771     2017-04-28  7
## 772     2017-05-01 21
## 773     2017-05-02 17
## 774     2017-05-03  6
## 775     2017-05-04  7
## 776     2017-05-05  4
## 777     2017-05-07  2
## 778     2017-05-08  3
## 779     2017-05-09  7
## 780     2017-05-10  3
## 781     2017-05-11  2
## 782     2017-05-12  2
## 783     2017-05-14  2
## 784     2017-05-15  4
## 785     2017-05-16  3
## 786     2017-05-17  8
## 787     2017-05-18  4
## 788     2017-05-19  3
## 789     2017-05-22  5
## 790     2017-05-23  7
## 791     2017-05-24  8
## 792     2017-05-25  5
## 793     2017-05-26  2
## 794     2017-05-27  2
## 795     2017-05-28  6
## 796     2017-05-29  7
## 797     2017-05-30  6
## 798     2017-05-31  6
## 799     2017-06-02  5
## 800     2017-06-05  2
## 801     2017-06-06  6
## 802     2017-06-07  3
## 803     2017-06-08  4
## 804     2017-06-09  3
## 805     2017-06-11  4
## 806     2017-06-12  5
## 807     2017-06-13  5
## 808     2017-06-15  4
## 809     2017-06-16  2
## 810     2017-06-19  7
## 811     2017-06-20  6
## 812     2017-06-21  6
## 813     2017-06-22  5
## 814     2017-06-23  6
## 815     2017-06-26  2
## 816     2017-06-27  9
## 817     2017-06-28 11
## 818     2017-06-29 15
## 819     2017-06-30  4
## 820     2017-07-02  3
## 821     2017-07-03 12
## 822     2017-07-04  3
## 823     2017-07-05  4
## 824     2017-07-06  6
aggregated_data <- time_series_data %>%
  group_by(published_date) %>%
  summarize(num_subscribers = sum(num_subscribers, na.rm = TRUE), .groups = "drop")

tsibble_data <- aggregated_data %>%
  as_tsibble(index = published_date)

tsibble_data %>%
  ggplot(aes(x = published_date, y = num_subscribers)) +
  geom_line(color = "blue") +
  labs(
    title = "Number of Subscribers Over Time",
    x = "Published Date",
    y = "Number of Subscribers"
  ) +
  theme_minimal()

I first group the data by published_date so that each date has exactly one entry. When several courses share a publication date, I add up their num_subscribers, so each value is the combined subscriber count of all courses published on that day. After grouping and summarizing the data, I convert it into a tsibble, a time-series format, to make it easier to analyze trends over time. Finally, I create a line plot of these daily totals against the publication date.
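The aggregation step can be illustrated on a toy table. This is a minimal sketch in base R (aggregate() instead of the dplyr pipeline above), and the three rows are invented for illustration.

```r
# Toy data (assumed values): two courses share the same publication date.
toy <- data.frame(
  published_date = as.Date(c("2013-05-29", "2013-05-29", "2013-06-07")),
  num_subscribers = c(100, 250, 40)
)

# Sum subscribers per date so that every date appears exactly once.
daily <- aggregate(num_subscribers ~ published_date, data = toy, FUN = sum)

# anyDuplicated() returns 0 when no date is repeated, which is what a
# time-series index requires.
has_duplicates <- anyDuplicated(daily$published_date) > 0

print(daily)
print(has_duplicates)
```

Summing is one choice among several; a mean or a count of courses per date would answer a different question about the same data.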

I can see how the number of subscribers changes over time. When I plot the data as a line chart, it allows me to observe whether the subscriber count is growing, declining, or remaining steady across the dates. This helps me identify patterns, such as periods of growth, seasonal variations, or any unexpected spikes or drops in the numbers.

trend_model <- lm(num_subscribers ~ as.numeric(published_date), data = aggregated_data)

summary(trend_model)
## 
## Call:
## lm(formula = num_subscribers ~ as.numeric(published_date), data = aggregated_data)
## 
## Residuals:
##    Min     1Q Median     3Q    Max 
## -13090  -8152  -4750   1387 256803 
## 
## Coefficients:
##                             Estimate Std. Error t value Pr(>|t|)   
## (Intercept)                56677.555  17404.615   3.256  0.00116 **
## as.numeric(published_date)    -2.829      1.048  -2.699  0.00705 **
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 17450 on 1208 degrees of freedom
## Multiple R-squared:  0.005995,   Adjusted R-squared:  0.005172 
## F-statistic: 7.286 on 1 and 1208 DF,  p-value: 0.007047
aggregated_data %>%
  ggplot(aes(x = published_date, y = num_subscribers)) +
  geom_point(color = "blue") +
  geom_smooth(method = "lm", color = "red", se = FALSE) +
  labs(
    title = "Linear Trend in Number of Subscribers Over Time",
    x = "Published Date",
    y = "Number of Subscribers"
  ) +
  theme_minimal()
## `geom_smooth()` using formula = 'y ~ x'

aggregated_data %>%
  ggplot(aes(x = published_date, y = num_subscribers)) +
  geom_point(color = "blue") +
  geom_smooth(span = 0.3, color = "green", method = "loess") +
  labs(
    title = "Visualizing Trends to Assess Subsetting Needs",
    x = "Published Date",
    y = "Number of Subscribers"
  ) +
  theme_minimal()
## `geom_smooth()` using formula = 'y ~ x'

I start by creating a linear regression model (trend_model) to explore the overall trend in the number of subscribers over time. By using summary(trend_model), I examine how well the trend fits the data and whether there is a significant increase or decrease in subscribers over time.

Linear Trend in Number of Subscribers Over Time: The scatterplot shows the subscriber totals for each date (blue points) with a fitted red linear regression line. The red line appears almost flat at this scale. The estimated slope is slightly negative (about -2.8 subscribers per day) and statistically significant (p ≈ 0.007), but the model explains less than 1% of the variance (R-squared ≈ 0.006), so the linear trend says very little about subscriber numbers over the analyzed time frame. The blue points show substantial variability: most values cluster near zero, while a few outliers with much higher subscriber counts (the largest residual is about 257,000) could distort the linear trend model.
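The slope in the summary(trend_model) output is expressed per day, because as.numeric() on a Date counts days since 1970-01-01. The short calculation below rescales the reported coefficient to a yearly change and puts the R-squared in percent. The two numbers are copied from the output above, and the 365.25-day year is an approximation.

```r
# as.numeric() on a Date counts days since 1970-01-01,
# so the regression slope is in subscribers per day.
days_since_epoch <- as.numeric(as.Date("1970-01-02"))

# Values copied from the summary(trend_model) output above.
slope_per_day <- -2.829
r_squared <- 0.005995

# Rescale the slope to a yearly change (365.25 days per year is an approximation).
slope_per_year <- slope_per_day * 365.25

# Share of variance explained by the linear trend, in percent.
pct_explained <- r_squared * 100

print(round(slope_per_year))
print(round(pct_explained, 2))
```

That is roughly a thousand fewer subscribers per year of publication date, while the trend explains well under 1% of the variation.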

Next, I plot the data to visualize the trend. The first plot is a scatterplot of the number of subscribers against the published dates with a red trend line from the linear regression model, which shows the linear relationship between the two variables. The second plot replaces the straight line with a green LOESS curve (span = 0.3). In both plots the majority of points cluster near zero, indicating that most publication dates have relatively low subscriber totals. The green LOESS curve shows only very subtle upward and downward shifts in the number of subscribers over time and is mostly flat. The slight increases in some regions (e.g., around 2013–2014) may point to periods of growth, while other regions may reflect stable or declining subscriber counts. Overall, the LOESS line does not show a pronounced long-term upward or downward trend, which suggests that there is no strong time-dependent growth in subscribers and that the data is heavily influenced by outliers.
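The span argument controls how much of the data each local LOESS fit uses, and therefore how wiggly the curve is. The sketch below calls stats::loess() directly on synthetic data (a noisy sine wave, not the Udemy data) to compare a narrow and a wide span. The two spans and the data-generating formula are assumptions chosen for illustration.

```r
# Synthetic example: a slow wave plus noise.
set.seed(42)
x <- 1:200
y <- 10 * sin(x / 15) + rnorm(200, sd = 3)

# A small span uses a small neighbourhood around each point and follows
# local bumps; a large span averages over most of the data.
fit_narrow <- loess(y ~ x, span = 0.2)
fit_wide <- loess(y ~ x, span = 0.8)

# predict() without new data returns the smoothed value at each x.
smooth_narrow <- predict(fit_narrow)
smooth_wide <- predict(fit_wide)

# Residual sum of squares: how closely each curve tracks the points.
rss_narrow <- sum((y - smooth_narrow)^2)
rss_wide <- sum((y - smooth_wide)^2)

print(c(narrow = rss_narrow, wide = rss_wide))
```

A curve that tracks the points more closely is not automatically better: with heavy outliers, a small span can chase individual spikes.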

# Convert the date column to Date format
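# (time_series_data has no published_timestamp column after the rename above,
# so only the existing published_date column is passed to as.Date())
time_series_data$published_date <- as.Date(time_series_data$published_date)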



# Aggregate data to ensure unique dates
aggregated_data <- time_series_data %>%
  group_by(published_date) %>%
  summarize(num_subscribers = sum(num_subscribers, na.rm = TRUE), .groups = "drop")
# Create a tsibble object and fill gaps
tsibble_data <- aggregated_data %>%
  as_tsibble(index = published_date) %>%
  fill_gaps(num_subscribers = 0)  # Fill gaps with 0 subscribers

# Apply smoothing using LOESS to detect seasonality
tsibble_data %>%
  ggplot(aes(x = published_date, y = num_subscribers)) +
  geom_point(color = "blue") +
  geom_smooth(method = "loess", span = 0.2, color = "red") +
  labs(
    title = "Smoothing to Detect Seasonality in Number of Subscribers",
    x = "Published Date",
    y = "Number of Subscribers"
  ) +
  theme_minimal()
## `geom_smooth()` using formula = 'y ~ x'

I create a tsibble object from the aggregated data using published_date as the time index. I also fill any missing dates with a default value of 0 subscribers so that the daily series has no gaps. Then, I apply LOESS smoothing to the data and plot it. The plot shows the raw data points in blue and a red smoothed curve that makes underlying patterns in the number of subscribers easier to see.

The red line suggests that there is no strong seasonality or periodicity in the data. The increase in subscribers over time is minimal and mostly dominated by the outliers. While the overall trend is subtle, minor fluctuations in the LOESS line could indicate possible seasonal behavior. Confirming this would require further investigation by aggregating the data by month or quarter.

The significant outliers, where subscriber counts exceed 100,000, skew the perception of the overall trend. Identifying the dates of these outliers could help in understanding what factors contributed to their success (e.g., popular topics or promotions).
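To make the gap-filling step concrete, here is a minimal base R sketch of the same idea: build the full sequence of days, join the observed totals onto it, and replace the missing values with 0. The toy dates and values are invented, and merge() stands in for tsibble's fill_gaps() only for illustration.

```r
# Toy daily totals (assumed values) with one missing day, 2014-03-12.
daily <- data.frame(
  published_date = as.Date(c("2014-03-10", "2014-03-11", "2014-03-13")),
  num_subscribers = c(120, 45, 300)
)

# Every calendar day between the first and last observed date.
all_days <- data.frame(
  published_date = seq(min(daily$published_date),
                       max(daily$published_date),
                       by = "day")
)

# Left join: days without any published course get NA...
filled <- merge(all_days, daily, by = "published_date", all.x = TRUE)

# ...which is then replaced by 0, mirroring fill_gaps(num_subscribers = 0).
filled$num_subscribers[is.na(filled$num_subscribers)] <- 0

print(filled)
```

Filling with 0 treats a day without a published course as a day with zero subscribers, an assumption that carries through to every later step that uses the filled series.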

# Decompose the time series to extract seasonal components
decomposed <- tsibble_data %>%
  model(stl = STL(num_subscribers ~ season(window = "periodic"))) %>%
  components()

# Plot the decomposed components
autoplot(decomposed) +
  labs(
    title = "Decomposition of Time Series to Detect Seasonality",
    x = "Published Date",
    y = "Number of Subscribers"
  )

# Illustrate seasonality using ACF and PACF
# ACF plot
tsibble_data %>%
  ACF(num_subscribers) %>%
  autoplot() +
  labs(
    title = "ACF Plot to Detect Seasonality",
    x = "Lag",
    y = "ACF"
  )

# PACF plot
tsibble_data %>%
  PACF(num_subscribers) %>%
  autoplot() +
  labs(
    title = "PACF Plot to Detect Seasonality",
    x = "Lag",
    y = "PACF"
  )

Decomposition of Time Series to Detect Seasonality

1st panel: This panel shows the original time series data (number of subscribers over time). The raw data exhibits a high degree of variability with noticeable outliers, where some courses attracted significantly more subscribers than the majority.

2nd panel: The trend component captures the long-term progression of the data, smoothing out short-term fluctuations. It reveals a gradual increase in subscriber numbers until around 2015, followed by a slight decline. This could indicate a peak in subscriber growth during this period, possibly due to platform-wide factors such as promotions or market saturation.

3rd panel: This panel captures repeating patterns at a seasonal frequency (e.g., yearly or weekly). Clear peaks in the seasonal component suggest recurring periods of higher subscriber activity. This could align with annual or weekly patterns, such as holiday seasons or consistent spikes during specific times of the week.

Last panel: The remainder represents the variability in the data that cannot be explained by the trend or seasonal components. Large residuals in certain areas indicate irregularities or one-off events that significantly impacted subscriber counts (e.g., a highly successful course launch or promotional campaign).
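The relationship between the four panels can be checked numerically: in an additive STL decomposition the seasonal, trend, and remainder components sum back to the original series. The sketch below uses base R's stl() rather than the feasts STL() call above, on a synthetic series with an invented weekly pattern, so it illustrates the idea without reproducing the analysis.

```r
# Synthetic daily series (assumed values): a fixed weekly pattern,
# a slow upward drift, and some noise, over 20 weeks.
set.seed(1)
weekly_pattern <- c(5, 3, 2, 2, 3, 8, 10)
n_days <- 7 * 20
values <- rep(weekly_pattern, times = 20) +
  seq(0, 4, length.out = n_days) +
  rnorm(n_days, sd = 0.5)

# frequency = 7 tells stl() that one seasonal cycle is seven observations.
series <- ts(values, frequency = 7)

# s.window = "periodic" forces the seasonal pattern to be the same every week.
fit <- stl(series, s.window = "periodic")
parts <- fit$time.series  # columns: seasonal, trend, remainder

# The three components add back up to the original series.
recombined <- as.numeric(rowSums(parts))

# With a periodic window, week 1 and week 2 have identical seasonal values.
seasonal_week1 <- as.numeric(parts[1:7, "seasonal"])
seasonal_week2 <- as.numeric(parts[8:14, "seasonal"])

print(all.equal(recombined, as.numeric(series)))
print(round(seasonal_week1, 2))
```

Because the remainder is simply what is left after removing trend and season, large outliers that neither component absorbs end up in the last panel.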

ACF Plot to Detect Seasonality: The ACF measures how the values of the time series are correlated with their past values (lags). The blue dashed lines indicate the threshold for statistically significant autocorrelations. Bars exceeding this threshold suggest a meaningful pattern or seasonality. The ACF plot shows strong, regularly spaced peaks (e.g., at lags 7, 14, 21, and 28). This indicates a weekly seasonality in the time series, suggesting that subscriber behavior repeats at a weekly interval.
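The reading of the ACF plot can be reproduced numerically. The sketch below uses base R's acf() on a synthetic series with an invented weekly pattern and compares each autocorrelation with the usual approximate 95% bound of ±1.96/sqrt(n), which is what dashed significance lines on ACF plots typically represent. The data are assumptions, not the Udemy series.

```r
# Synthetic daily series (assumed values) with a repeating 7-day pattern.
set.seed(1)
weekly_pattern <- c(5, 3, 2, 2, 3, 8, 10)
series <- rep(weekly_pattern, times = 20) + rnorm(140, sd = 0.5)

# Autocorrelations for lags 0 to 28, without drawing the plot.
acf_result <- acf(series, lag.max = 28, plot = FALSE)
acf_by_lag <- setNames(as.numeric(acf_result$acf), as.numeric(acf_result$lag))

# Approximate 95% significance bound under a white-noise null hypothesis.
bound <- qnorm(0.975) / sqrt(length(series))

# Lags (excluding lag 0, which is always 1) whose bar crosses the bound.
is_significant <- abs(acf_by_lag) > bound & names(acf_by_lag) != "0"
significant_lags <- as.integer(names(acf_by_lag)[is_significant])

print(round(acf_by_lag[c("7", "14", "21", "28")], 2))
print(significant_lags)
```

In a series like this, the peaks at multiples of 7 stand out because the same weekday values line up at those lags.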

PACF Plot to Detect Seasonality: The PACF measures the direct relationship between a time series value and its lagged values, removing the effects of intermediate lags. This gives a clearer picture of the immediate influence of specific lags. Bars extending beyond the blue dashed lines indicate statistically significant partial autocorrelations at those lags. The PACF plot shows strong spikes at lags 7, 14, and 21. This suggests that there is a direct weekly pattern in the time series data, consistent with the findings from the ACF plot. These significant lags support a weekly seasonality, where subscriber behavior exhibits a strong repetition every 7 days.
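The partial autocorrelations can be extracted in the same way with base R's pacf(). This is a sketch on a synthetic weekly series (invented values, not the Udemy data). Unlike acf(), the result of pacf() starts at lag 1, so the indexing differs by one.

```r
# Synthetic daily series (assumed values) with a repeating 7-day pattern.
set.seed(1)
weekly_pattern <- c(5, 3, 2, 2, 3, 8, 10)
series <- rep(weekly_pattern, times = 20) + rnorm(140, sd = 0.5)

# Partial autocorrelations for lags 1 to 28 (there is no lag 0 entry).
pacf_result <- pacf(series, lag.max = 28, plot = FALSE)
pacf_by_lag <- setNames(as.numeric(pacf_result$acf), as.numeric(pacf_result$lag))

# Same approximate 95% bound as for the ACF.
bound <- qnorm(0.975) / sqrt(length(series))

# Lags whose partial autocorrelation crosses the bound.
significant_lags <- as.integer(names(pacf_by_lag)[abs(pacf_by_lag) > bound])

print(round(pacf_by_lag[c("7", "14", "21")], 2))
print(significant_lags)
```

Comparing this list with the significant ACF lags shows which lags still matter once the shorter lags have been accounted for.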