# Load necessary libraries
library(lubridate)
##
## Attaching package: 'lubridate'
## The following objects are masked from 'package:base':
##
## date, intersect, setdiff, union
library(dplyr)
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
library(tidyverse)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ forcats 1.0.0 ✔ stringr 1.5.1
## ✔ ggplot2 3.5.1 ✔ tibble 3.2.1
## ✔ purrr 1.0.2 ✔ tidyr 1.3.1
## ✔ readr 2.1.5
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag() masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(tsibble)
## Registered S3 method overwritten by 'tsibble':
## method from
## as_tibble.grouped_df dplyr
##
## Attaching package: 'tsibble'
##
## The following object is masked from 'package:lubridate':
##
## interval
##
## The following objects are masked from 'package:base':
##
## intersect, setdiff, union
library(fable)
## Loading required package: fabletools
library(fabletools)
library(feasts)
data <- read.csv("~/Documents/STAT 2024/udemy_courses.csv")
data$published_date <- as.Date(data$published_timestamp, format = "%Y-%m-%dT%H:%M:%SZ")
head(data$published_date)
## [1] "2017-01-18" "2017-03-09" "2016-12-19" "2017-05-30" "2016-12-13"
## [6] "2014-05-02"
The published_timestamp column stores full ISO 8601 timestamps of the form “YYYY-MM-DDTHH:MM:SSZ” (date and time). The code uses as.Date() with a matching format string to keep only the calendar date (YYYY-MM-DD), which is more convenient for analysis.
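A minimal sketch of the conversion described above, using base R only. The two timestamps are made-up examples in the same shape as published_timestamp, not values taken from the dataset.

```r
# Toy timestamps in the same ISO 8601 shape as published_timestamp
# (illustrative values, not taken from the dataset).
stamps <- c("2017-01-18T20:58:58Z", "2014-05-02T15:13:30Z")

# as.Date() with an explicit format string keeps only the calendar date.
dates <- as.Date(stamps, format = "%Y-%m-%dT%H:%M:%SZ")
print(dates)

# If the time of day is needed later, parse to POSIXct in UTC instead.
times <- as.POSIXct(stamps, format = "%Y-%m-%dT%H:%M:%SZ", tz = "UTC")
print(format(times, "%H:%M"))
```

Dropping the time of day is a deliberate loss of detail: it is fine for a daily series, but the POSIXct version would be the one to keep for any hour-level analysis.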
time_series_data <- data %>%
select(published_date = published_timestamp, num_subscribers)
time_series_data$published_date <- as.Date(time_series_data$published_date, format = "%Y-%m-%dT%H:%M:%SZ")
time_series_data <- time_series_data %>%
arrange(published_date)
head(time_series_data)
I select the two columns needed for time-series analysis: published_timestamp (renamed to published_date) and num_subscribers. The published_date column is converted to a plain date (YYYY-MM-DD) so that it is consistent and easier to work with. The data is then sorted by published_date from earliest to latest, which is the natural order for a time series. Finally, head() shows the first few rows of the result as a check.
duplicates <- time_series_data %>%
count(published_date) %>%
filter(n > 1)
print(duplicates)
## published_date n
## 1 2012-06-18 2
## 2 2012-10-31 3
## 3 2012-11-26 2
## 4 2013-01-03 2
## 5 2013-02-16 2
## 6 2013-02-18 2
## 7 2013-02-24 2
## 8 2013-02-25 2
## 9 2013-02-27 2
## 10 2013-03-02 2
## 11 2013-03-05 2
## 12 2013-03-08 2
## 13 2013-04-17 2
## 14 2013-05-13 2
## 15 2013-05-29 5
## 16 2013-06-07 3
## 17 2013-06-09 2
## 18 2013-06-10 2
## 19 2013-07-09 2
## 20 2013-07-19 2
## 21 2013-07-23 6
## 22 2013-08-05 5
## 23 2013-08-12 9
## 24 2013-08-21 2
## 25 2013-09-03 2
## 26 2013-09-24 2
## 27 2013-09-25 2
## 28 2013-09-29 2
## 29 2013-10-02 2
## 30 2013-10-09 2
## 31 2013-10-10 2
## 32 2013-10-16 6
## 33 2013-10-20 2
## 34 2013-10-23 2
## 35 2013-11-01 2
## 36 2013-11-09 2
## 37 2013-11-18 2
## 38 2013-11-26 3
## 39 2013-11-28 2
## 40 2013-12-04 2
## 41 2013-12-12 2
## 42 2013-12-13 3
## 43 2014-01-14 2
## 44 2014-01-16 4
## 45 2014-01-21 2
## 46 2014-01-27 3
## 47 2014-01-28 2
## 48 2014-01-30 2
## 49 2014-01-31 2
## 50 2014-02-10 2
## 51 2014-02-11 3
## 52 2014-02-12 3
## 53 2014-02-13 2
## 54 2014-02-19 2
## 55 2014-02-25 2
## 56 2014-03-01 2
## 57 2014-03-03 2
## 58 2014-03-04 2
## 59 2014-03-06 2
## 60 2014-03-07 2
## 61 2014-03-10 2
## 62 2014-03-11 8
## 63 2014-03-12 4
## 64 2014-03-13 2
## 65 2014-03-16 2
## 66 2014-03-17 3
## 67 2014-03-21 3
## 68 2014-03-24 2
## 69 2014-03-26 2
## 70 2014-03-27 2
## 71 2014-03-31 3
## 72 2014-04-01 4
## 73 2014-04-04 4
## 74 2014-04-09 2
## 75 2014-04-14 3
## 76 2014-04-15 5
## 77 2014-04-16 2
## 78 2014-04-21 2
## 79 2014-04-23 2
## 80 2014-04-24 2
## 81 2014-04-27 2
## 82 2014-04-29 3
## 83 2014-04-30 3
## 84 2014-05-01 2
## 85 2014-05-05 3
## 86 2014-05-07 7
## 87 2014-05-08 2
## 88 2014-05-09 7
## 89 2014-05-16 5
## 90 2014-05-17 3
## 91 2014-05-19 11
## 92 2014-05-28 2
## 93 2014-06-02 2
## 94 2014-06-16 2
## 95 2014-06-17 5
## 96 2014-06-23 4
## 97 2014-06-25 2
## 98 2014-06-27 2
## 99 2014-06-30 5
## 100 2014-07-04 2
## 101 2014-07-05 2
## 102 2014-07-17 3
## 103 2014-07-19 3
## 104 2014-07-24 2
## 105 2014-08-07 2
## 106 2014-08-10 2
## 107 2014-08-13 2
## 108 2014-08-16 2
## 109 2014-08-18 4
## 110 2014-08-21 2
## 111 2014-08-23 3
## 112 2014-08-28 3
## 113 2014-08-29 2
## 114 2014-09-01 2
## 115 2014-09-02 2
## 116 2014-09-05 2
## 117 2014-09-09 2
## 118 2014-09-14 2
## 119 2014-09-19 4
## 120 2014-09-22 9
## 121 2014-10-01 4
## 122 2014-10-02 3
## 123 2014-10-03 2
## 124 2014-10-06 2
## 125 2014-10-08 4
## 126 2014-10-12 2
## 127 2014-10-16 3
## 128 2014-10-20 2
## 129 2014-10-21 3
## 130 2014-10-23 2
## 131 2014-10-24 3
## 132 2014-10-25 2
## 133 2014-10-27 2
## 134 2014-10-28 2
## 135 2014-10-29 5
## 136 2014-11-04 3
## 137 2014-11-05 3
## 138 2014-11-06 2
## 139 2014-11-08 4
## 140 2014-11-11 2
## 141 2014-11-12 2
## 142 2014-11-18 4
## 143 2014-11-19 2
## 144 2014-11-20 2
## 145 2014-11-21 2
## 146 2014-11-24 3
## 147 2014-11-25 4
## 148 2014-11-28 4
## 149 2014-11-30 3
## 150 2014-12-03 2
## 151 2014-12-06 3
## 152 2014-12-08 2
## 153 2014-12-09 2
## 154 2014-12-10 3
## 155 2014-12-11 3
## 156 2014-12-12 2
## 157 2014-12-17 6
## 158 2014-12-18 4
## 159 2014-12-19 7
## 160 2014-12-22 3
## 161 2014-12-24 2
## 162 2014-12-27 2
## 163 2014-12-28 2
## 164 2014-12-29 4
## 165 2014-12-31 2
## 166 2015-01-01 5
## 167 2015-01-03 3
## 168 2015-01-04 2
## 169 2015-01-05 2
## 170 2015-01-07 3
## 171 2015-01-08 4
## 172 2015-01-09 5
## 173 2015-01-12 2
## 174 2015-01-13 3
## 175 2015-01-14 2
## 176 2015-01-15 2
## 177 2015-01-16 2
## 178 2015-01-17 2
## 179 2015-01-20 3
## 180 2015-01-21 2
## 181 2015-01-22 6
## 182 2015-01-23 2
## 183 2015-01-24 2
## 184 2015-01-26 3
## 185 2015-01-27 5
## 186 2015-01-29 4
## 187 2015-01-30 5
## 188 2015-01-31 4
## 189 2015-02-02 3
## 190 2015-02-05 3
## 191 2015-02-06 5
## 192 2015-02-07 3
## 193 2015-02-09 4
## 194 2015-02-12 3
## 195 2015-02-13 6
## 196 2015-02-17 4
## 197 2015-02-18 2
## 198 2015-02-19 4
## 199 2015-02-20 2
## 200 2015-02-21 2
## 201 2015-02-23 5
## 202 2015-02-24 2
## 203 2015-02-25 4
## 204 2015-02-26 4
## 205 2015-03-02 5
## 206 2015-03-03 3
## 207 2015-03-04 8
## 208 2015-03-05 5
## 209 2015-03-06 2
## 210 2015-03-08 4
## 211 2015-03-09 2
## 212 2015-03-10 2
## 213 2015-03-12 2
## 214 2015-03-13 2
## 215 2015-03-15 2
## 216 2015-03-16 2
## 217 2015-03-19 6
## 218 2015-03-22 3
## 219 2015-03-23 4
## 220 2015-03-24 5
## 221 2015-03-26 4
## 222 2015-03-29 4
## 223 2015-03-31 3
## 224 2015-04-02 2
## 225 2015-04-06 4
## 226 2015-04-08 5
## 227 2015-04-09 2
## 228 2015-04-10 4
## 229 2015-04-12 5
## 230 2015-04-13 7
## 231 2015-04-15 9
## 232 2015-04-16 6
## 233 2015-04-17 3
## 234 2015-04-20 8
## 235 2015-04-21 3
## 236 2015-04-22 2
## 237 2015-04-24 4
## 238 2015-04-26 5
## 239 2015-04-27 3
## 240 2015-04-28 3
## 241 2015-05-01 3
## 242 2015-05-04 2
## 243 2015-05-06 2
## 244 2015-05-08 2
## 245 2015-05-12 2
## 246 2015-05-13 2
## 247 2015-05-14 3
## 248 2015-05-15 3
## 249 2015-05-18 3
## 250 2015-05-19 2
## 251 2015-05-20 3
## 252 2015-05-21 2
## 253 2015-05-22 3
## 254 2015-05-25 5
## 255 2015-05-26 5
## 256 2015-05-27 2
## 257 2015-05-28 4
## 258 2015-06-01 7
## 259 2015-06-02 3
## 260 2015-06-03 5
## 261 2015-06-05 5
## 262 2015-06-07 2
## 263 2015-06-08 6
## 264 2015-06-09 4
## 265 2015-06-11 3
## 266 2015-06-12 5
## 267 2015-06-14 3
## 268 2015-06-16 6
## 269 2015-06-17 5
## 270 2015-06-19 7
## 271 2015-06-22 4
## 272 2015-06-23 3
## 273 2015-06-25 2
## 274 2015-06-26 2
## 275 2015-07-01 6
## 276 2015-07-02 3
## 277 2015-07-05 2
## 278 2015-07-06 3
## 279 2015-07-07 9
## 280 2015-07-08 5
## 281 2015-07-09 2
## 282 2015-07-10 6
## 283 2015-07-12 2
## 284 2015-07-13 2
## 285 2015-07-14 9
## 286 2015-07-15 2
## 287 2015-07-16 8
## 288 2015-07-17 3
## 289 2015-07-19 3
## 290 2015-07-20 6
## 291 2015-07-21 3
## 292 2015-07-22 6
## 293 2015-07-23 4
## 294 2015-07-24 3
## 295 2015-07-26 4
## 296 2015-07-27 5
## 297 2015-07-28 6
## 298 2015-07-29 3
## 299 2015-07-30 5
## 300 2015-08-03 4
## 301 2015-08-04 7
## 302 2015-08-05 3
## 303 2015-08-06 5
## 304 2015-08-07 4
## 305 2015-08-09 4
## 306 2015-08-10 3
## 307 2015-08-11 7
## 308 2015-08-12 2
## 309 2015-08-13 7
## 310 2015-08-14 5
## 311 2015-08-16 5
## 312 2015-08-17 4
## 313 2015-08-18 7
## 314 2015-08-19 2
## 315 2015-08-20 4
## 316 2015-08-21 6
## 317 2015-08-24 4
## 318 2015-08-25 2
## 319 2015-08-26 2
## 320 2015-08-27 4
## 321 2015-08-28 3
## 322 2015-08-31 4
## 323 2015-09-01 3
## 324 2015-09-02 2
## 325 2015-09-04 3
## 326 2015-09-07 6
## 327 2015-09-08 3
## 328 2015-09-10 4
## 329 2015-09-12 3
## 330 2015-09-13 3
## 331 2015-09-14 3
## 332 2015-09-15 7
## 333 2015-09-16 2
## 334 2015-09-17 4
## 335 2015-09-20 7
## 336 2015-09-21 4
## 337 2015-09-22 4
## 338 2015-09-23 8
## 339 2015-09-24 3
## 340 2015-09-25 4
## 341 2015-09-28 4
## 342 2015-09-29 6
## 343 2015-09-30 2
## 344 2015-10-01 3
## 345 2015-10-02 4
## 346 2015-10-04 3
## 347 2015-10-05 2
## 348 2015-10-06 2
## 349 2015-10-07 2
## 350 2015-10-08 5
## 351 2015-10-09 4
## 352 2015-10-12 4
## 353 2015-10-13 4
## 354 2015-10-14 7
## 355 2015-10-15 4
## 356 2015-10-16 4
## 357 2015-10-17 3
## 358 2015-10-18 2
## 359 2015-10-19 5
## 360 2015-10-20 2
## 361 2015-10-21 3
## 362 2015-10-22 5
## 363 2015-10-23 2
## 364 2015-10-25 3
## 365 2015-10-26 7
## 366 2015-10-28 6
## 367 2015-10-29 5
## 368 2015-10-30 3
## 369 2015-11-01 4
## 370 2015-11-02 6
## 371 2015-11-03 6
## 372 2015-11-04 2
## 373 2015-11-05 4
## 374 2015-11-06 7
## 375 2015-11-08 2
## 376 2015-11-09 7
## 377 2015-11-10 8
## 378 2015-11-11 11
## 379 2015-11-12 9
## 380 2015-11-13 9
## 381 2015-11-16 2
## 382 2015-11-17 5
## 383 2015-11-18 5
## 384 2015-11-19 3
## 385 2015-11-20 2
## 386 2015-11-21 2
## 387 2015-11-22 3
## 388 2015-11-23 6
## 389 2015-11-24 3
## 390 2015-11-25 3
## 391 2015-11-26 6
## 392 2015-11-27 2
## 393 2015-11-29 3
## 394 2015-11-30 3
## 395 2015-12-01 3
## 396 2015-12-03 3
## 397 2015-12-04 2
## 398 2015-12-06 3
## 399 2015-12-07 3
## 400 2015-12-08 2
## 401 2015-12-09 3
## 402 2015-12-10 3
## 403 2015-12-14 3
## 404 2015-12-15 4
## 405 2015-12-16 2
## 406 2015-12-17 7
## 407 2015-12-18 2
## 408 2015-12-21 2
## 409 2015-12-22 2
## 410 2015-12-28 2
## 411 2015-12-29 10
## 412 2015-12-30 5
## 413 2016-01-01 2
## 414 2016-01-03 2
## 415 2016-01-04 4
## 416 2016-01-05 2
## 417 2016-01-06 4
## 418 2016-01-07 3
## 419 2016-01-08 4
## 420 2016-01-10 5
## 421 2016-01-11 3
## 422 2016-01-12 5
## 423 2016-01-13 3
## 424 2016-01-14 5
## 425 2016-01-15 2
## 426 2016-01-18 7
## 427 2016-01-19 2
## 428 2016-01-21 3
## 429 2016-01-22 6
## 430 2016-01-24 6
## 431 2016-01-25 6
## 432 2016-01-27 7
## 433 2016-01-28 4
## 434 2016-01-29 6
## 435 2016-02-01 6
## 436 2016-02-02 7
## 437 2016-02-03 6
## 438 2016-02-04 4
## 439 2016-02-05 5
## 440 2016-02-07 2
## 441 2016-02-08 7
## 442 2016-02-09 2
## 443 2016-02-10 4
## 444 2016-02-11 5
## 445 2016-02-12 4
## 446 2016-02-13 5
## 447 2016-02-14 5
## 448 2016-02-15 6
## 449 2016-02-16 5
## 450 2016-02-17 6
## 451 2016-02-18 4
## 452 2016-02-19 4
## 453 2016-02-21 5
## 454 2016-02-22 5
## 455 2016-02-23 4
## 456 2016-02-24 4
## 457 2016-02-25 4
## 458 2016-02-26 5
## 459 2016-02-29 6
## 460 2016-03-01 4
## 461 2016-03-02 4
## 462 2016-03-03 7
## 463 2016-03-04 4
## 464 2016-03-06 3
## 465 2016-03-07 2
## 466 2016-03-08 8
## 467 2016-03-09 2
## 468 2016-03-10 7
## 469 2016-03-11 4
## 470 2016-03-12 4
## 471 2016-03-14 5
## 472 2016-03-15 3
## 473 2016-03-16 3
## 474 2016-03-17 5
## 475 2016-03-18 6
## 476 2016-03-20 4
## 477 2016-03-21 4
## 478 2016-03-22 5
## 479 2016-03-23 2
## 480 2016-03-24 3
## 481 2016-03-25 2
## 482 2016-03-28 2
## 483 2016-03-29 9
## 484 2016-03-30 5
## 485 2016-03-31 5
## 486 2016-04-01 7
## 487 2016-04-02 2
## 488 2016-04-04 3
## 489 2016-04-05 3
## 490 2016-04-06 6
## 491 2016-04-07 7
## 492 2016-04-08 6
## 493 2016-04-11 4
## 494 2016-04-12 6
## 495 2016-04-13 18
## 496 2016-04-14 3
## 497 2016-04-15 3
## 498 2016-04-18 2
## 499 2016-04-19 2
## 500 2016-04-20 3
## 501 2016-04-21 4
## 502 2016-04-22 2
## 503 2016-04-23 3
## 504 2016-04-24 2
## 505 2016-04-25 4
## 506 2016-04-26 5
## 507 2016-04-27 7
## 508 2016-04-28 5
## 509 2016-04-29 3
## 510 2016-05-01 2
## 511 2016-05-02 5
## 512 2016-05-03 6
## 513 2016-05-04 5
## 514 2016-05-05 6
## 515 2016-05-06 2
## 516 2016-05-09 3
## 517 2016-05-11 6
## 518 2016-05-12 3
## 519 2016-05-13 3
## 520 2016-05-16 10
## 521 2016-05-17 7
## 522 2016-05-18 4
## 523 2016-05-23 8
## 524 2016-05-24 6
## 525 2016-05-25 2
## 526 2016-05-26 4
## 527 2016-05-27 5
## 528 2016-05-30 6
## 529 2016-05-31 2
## 530 2016-06-01 2
## 531 2016-06-02 3
## 532 2016-06-03 3
## 533 2016-06-05 2
## 534 2016-06-06 4
## 535 2016-06-07 4
## 536 2016-06-08 3
## 537 2016-06-09 6
## 538 2016-06-10 4
## 539 2016-06-13 5
## 540 2016-06-14 6
## 541 2016-06-15 5
## 542 2016-06-16 2
## 543 2016-06-19 2
## 544 2016-06-20 13
## 545 2016-06-21 6
## 546 2016-06-23 3
## 547 2016-06-26 3
## 548 2016-06-27 4
## 549 2016-06-28 5
## 550 2016-06-29 4
## 551 2016-06-30 3
## 552 2016-07-01 5
## 553 2016-07-02 2
## 554 2016-07-04 5
## 555 2016-07-05 4
## 556 2016-07-06 2
## 557 2016-07-07 4
## 558 2016-07-08 2
## 559 2016-07-10 2
## 560 2016-07-11 3
## 561 2016-07-12 5
## 562 2016-07-13 6
## 563 2016-07-14 3
## 564 2016-07-18 7
## 565 2016-07-20 2
## 566 2016-07-21 4
## 567 2016-07-22 2
## 568 2016-07-23 2
## 569 2016-07-24 2
## 570 2016-07-25 5
## 571 2016-07-26 2
## 572 2016-07-28 7
## 573 2016-07-29 5
## 574 2016-08-02 3
## 575 2016-08-04 3
## 576 2016-08-05 2
## 577 2016-08-06 4
## 578 2016-08-08 5
## 579 2016-08-09 4
## 580 2016-08-10 6
## 581 2016-08-11 6
## 582 2016-08-12 3
## 583 2016-08-14 2
## 584 2016-08-17 6
## 585 2016-08-18 7
## 586 2016-08-19 5
## 587 2016-08-22 5
## 588 2016-08-23 4
## 589 2016-08-25 3
## 590 2016-08-26 4
## 591 2016-08-27 3
## 592 2016-08-29 10
## 593 2016-08-30 5
## 594 2016-08-31 2
## 595 2016-09-01 3
## 596 2016-09-02 3
## 597 2016-09-03 2
## 598 2016-09-04 2
## 599 2016-09-06 4
## 600 2016-09-07 2
## 601 2016-09-09 3
## 602 2016-09-10 2
## 603 2016-09-12 10
## 604 2016-09-13 7
## 605 2016-09-14 3
## 606 2016-09-15 4
## 607 2016-09-18 2
## 608 2016-09-19 8
## 609 2016-09-20 2
## 610 2016-09-21 5
## 611 2016-09-22 2
## 612 2016-09-25 5
## 613 2016-09-26 9
## 614 2016-09-27 8
## 615 2016-09-28 3
## 616 2016-09-29 2
## 617 2016-10-03 8
## 618 2016-10-04 4
## 619 2016-10-05 3
## 620 2016-10-06 3
## 621 2016-10-07 3
## 622 2016-10-09 2
## 623 2016-10-10 5
## 624 2016-10-11 3
## 625 2016-10-12 8
## 626 2016-10-13 5
## 627 2016-10-16 3
## 628 2016-10-17 4
## 629 2016-10-18 7
## 630 2016-10-19 2
## 631 2016-10-21 2
## 632 2016-10-22 3
## 633 2016-10-24 4
## 634 2016-10-25 5
## 635 2016-10-26 2
## 636 2016-10-27 5
## 637 2016-10-28 3
## 638 2016-10-29 6
## 639 2016-10-30 3
## 640 2016-10-31 2
## 641 2016-11-01 7
## 642 2016-11-02 2
## 643 2016-11-03 5
## 644 2016-11-06 2
## 645 2016-11-07 5
## 646 2016-11-08 4
## 647 2016-11-10 4
## 648 2016-11-11 2
## 649 2016-11-12 3
## 650 2016-11-14 4
## 651 2016-11-15 6
## 652 2016-11-16 2
## 653 2016-11-17 4
## 654 2016-11-18 4
## 655 2016-11-20 3
## 656 2016-11-21 3
## 657 2016-11-22 5
## 658 2016-11-23 3
## 659 2016-11-24 7
## 660 2016-11-25 3
## 661 2016-11-27 3
## 662 2016-11-28 4
## 663 2016-11-29 5
## 664 2016-11-30 2
## 665 2016-12-01 5
## 666 2016-12-02 2
## 667 2016-12-03 2
## 668 2016-12-05 4
## 669 2016-12-06 3
## 670 2016-12-07 3
## 671 2016-12-08 3
## 672 2016-12-09 3
## 673 2016-12-11 3
## 674 2016-12-12 2
## 675 2016-12-13 5
## 676 2016-12-14 8
## 677 2016-12-15 7
## 678 2016-12-16 2
## 679 2016-12-19 6
## 680 2016-12-20 5
## 681 2016-12-21 4
## 682 2016-12-22 11
## 683 2016-12-26 4
## 684 2016-12-27 4
## 685 2016-12-28 6
## 686 2016-12-29 3
## 687 2016-12-31 2
## 688 2017-01-02 4
## 689 2017-01-03 9
## 690 2017-01-09 2
## 691 2017-01-10 3
## 692 2017-01-11 4
## 693 2017-01-12 2
## 694 2017-01-13 4
## 695 2017-01-16 4
## 696 2017-01-17 4
## 697 2017-01-18 6
## 698 2017-01-19 8
## 699 2017-01-20 4
## 700 2017-01-23 8
## 701 2017-01-24 4
## 702 2017-01-25 2
## 703 2017-01-26 4
## 704 2017-01-27 5
## 705 2017-02-01 3
## 706 2017-02-02 5
## 707 2017-02-03 8
## 708 2017-02-06 6
## 709 2017-02-07 8
## 710 2017-02-08 5
## 711 2017-02-09 5
## 712 2017-02-10 3
## 713 2017-02-12 2
## 714 2017-02-13 8
## 715 2017-02-14 4
## 716 2017-02-15 2
## 717 2017-02-16 2
## 718 2017-02-17 4
## 719 2017-02-18 4
## 720 2017-02-19 4
## 721 2017-02-20 5
## 722 2017-02-21 6
## 723 2017-02-22 6
## 724 2017-02-23 7
## 725 2017-02-24 4
## 726 2017-02-25 2
## 727 2017-02-26 4
## 728 2017-02-27 2
## 729 2017-02-28 3
## 730 2017-03-01 3
## 731 2017-03-02 4
## 732 2017-03-03 5
## 733 2017-03-06 3
## 734 2017-03-07 6
## 735 2017-03-08 11
## 736 2017-03-09 6
## 737 2017-03-10 5
## 738 2017-03-12 2
## 739 2017-03-13 5
## 740 2017-03-14 4
## 741 2017-03-15 2
## 742 2017-03-16 4
## 743 2017-03-17 2
## 744 2017-03-21 6
## 745 2017-03-22 3
## 746 2017-03-23 6
## 747 2017-03-24 3
## 748 2017-03-27 5
## 749 2017-03-28 3
## 750 2017-03-29 4
## 751 2017-03-30 11
## 752 2017-03-31 7
## 753 2017-04-03 2
## 754 2017-04-04 4
## 755 2017-04-05 3
## 756 2017-04-06 6
## 757 2017-04-11 6
## 758 2017-04-12 4
## 759 2017-04-13 3
## 760 2017-04-14 3
## 761 2017-04-15 3
## 762 2017-04-17 7
## 763 2017-04-18 9
## 764 2017-04-19 5
## 765 2017-04-20 4
## 766 2017-04-23 4
## 767 2017-04-24 12
## 768 2017-04-25 6
## 769 2017-04-26 4
## 770 2017-04-27 8
## 771 2017-04-28 7
## 772 2017-05-01 21
## 773 2017-05-02 17
## 774 2017-05-03 6
## 775 2017-05-04 7
## 776 2017-05-05 4
## 777 2017-05-07 2
## 778 2017-05-08 3
## 779 2017-05-09 7
## 780 2017-05-10 3
## 781 2017-05-11 2
## 782 2017-05-12 2
## 783 2017-05-14 2
## 784 2017-05-15 4
## 785 2017-05-16 3
## 786 2017-05-17 8
## 787 2017-05-18 4
## 788 2017-05-19 3
## 789 2017-05-22 5
## 790 2017-05-23 7
## 791 2017-05-24 8
## 792 2017-05-25 5
## 793 2017-05-26 2
## 794 2017-05-27 2
## 795 2017-05-28 6
## 796 2017-05-29 7
## 797 2017-05-30 6
## 798 2017-05-31 6
## 799 2017-06-02 5
## 800 2017-06-05 2
## 801 2017-06-06 6
## 802 2017-06-07 3
## 803 2017-06-08 4
## 804 2017-06-09 3
## 805 2017-06-11 4
## 806 2017-06-12 5
## 807 2017-06-13 5
## 808 2017-06-15 4
## 809 2017-06-16 2
## 810 2017-06-19 7
## 811 2017-06-20 6
## 812 2017-06-21 6
## 813 2017-06-22 5
## 814 2017-06-23 6
## 815 2017-06-26 2
## 816 2017-06-27 9
## 817 2017-06-28 11
## 818 2017-06-29 15
## 819 2017-06-30 4
## 820 2017-07-02 3
## 821 2017-07-03 12
## 822 2017-07-04 3
## 823 2017-07-05 4
## 824 2017-07-06 6
aggregated_data <- time_series_data %>%
group_by(published_date) %>%
summarize(num_subscribers = sum(num_subscribers, na.rm = TRUE), .groups = "drop")
tsibble_data <- aggregated_data %>%
as_tsibble(index = published_date)
tsibble_data %>%
ggplot(aes(x = published_date, y = num_subscribers)) +
geom_line(color = "blue") +
labs(
title = "Number of Subscribers Over Time",
x = "Published Date",
y = "Number of Subscribers"
) +
theme_minimal()
I first group the data by published_date so that each date has exactly one row, which as_tsibble() requires of its index. Where several courses share a publication date, I sum their num_subscribers to get one total for that date. I then convert the result into a tsibble, a time-series data format, and draw a line plot of the daily totals.
The line chart shows the total subscriber count of the courses published on each date. It lets me judge whether those totals grow, decline, or stay steady across publication dates, and it helps me spot patterns such as periods of growth, seasonal variation, or unexpected spikes and drops.
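The reason for aggregating before the conversion can be sketched on a toy data frame. The values below are invented for illustration; is_duplicated() and has_gaps() are tsibble helpers, and the object names (toy, toy_daily, toy_ts) are hypothetical.

```r
library(dplyr)
library(tsibble)

# Toy data with one repeated date (illustrative values only).
toy <- data.frame(
  published_date  = as.Date(c("2017-01-01", "2017-01-01",
                              "2017-01-02", "2017-01-04")),
  num_subscribers = c(100, 50, 20, 30)
)

# TRUE means the index is not unique, so as_tsibble() would refuse the data.
has_dupes <- is_duplicated(toy, index = published_date)

# Collapse to one row per date; after that the conversion succeeds.
toy_daily <- toy %>%
  group_by(published_date) %>%
  summarize(num_subscribers = sum(num_subscribers), .groups = "drop")

toy_ts <- as_tsibble(toy_daily, index = published_date)

# 2017-01-03 is missing, so the daily series still has an implicit gap.
has_gaps(toy_ts)
```

Summing is only one possible choice. A mean or a count of courses per date would also give a unique index, but each answers a different question about the data.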
trend_model <- lm(num_subscribers ~ as.numeric(published_date), data = aggregated_data)
summary(trend_model)
##
## Call:
## lm(formula = num_subscribers ~ as.numeric(published_date), data = aggregated_data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -13090 -8152 -4750 1387 256803
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 56677.555 17404.615 3.256 0.00116 **
## as.numeric(published_date) -2.829 1.048 -2.699 0.00705 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 17450 on 1208 degrees of freedom
## Multiple R-squared: 0.005995, Adjusted R-squared: 0.005172
## F-statistic: 7.286 on 1 and 1208 DF, p-value: 0.007047
aggregated_data %>%
ggplot(aes(x = published_date, y = num_subscribers)) +
geom_point(color = "blue") +
geom_smooth(method = "lm", color = "red", se = FALSE) +
labs(
title = "Linear Trend in Number of Subscribers Over Time",
x = "Published Date",
y = "Number of Subscribers"
) +
theme_minimal()
## `geom_smooth()` using formula = 'y ~ x'
aggregated_data %>%
ggplot(aes(x = published_date, y = num_subscribers)) +
geom_point(color = "blue") +
geom_smooth(span = 0.3, color = "green", method = "loess") +
labs(
title = "Visualizing Trends to Assess Subsetting Needs",
x = "Published Date",
y = "Number of Subscribers"
) +
theme_minimal()
## `geom_smooth()` using formula = 'y ~ x'
I start by fitting a linear regression model (trend_model) to explore the overall trend in the number of subscribers over time. With summary(trend_model) I check how well the line fits the data and whether subscribers rise or fall significantly over time.
Linear Trend in Number of Subscribers Over Time: The scatterplot shows the subscriber total for each date (blue points) with the fitted linear regression line in red. The red line appears almost flat. The estimated slope is small and negative (about -2.8 subscribers per day, p ≈ 0.007), so the decline is statistically significant, but with an R-squared of about 0.006 the line explains less than 1% of the variation, and there is no practically meaningful linear trend over the analyzed time frame. The blue points show substantial variability. Most values cluster near zero, but a few outliers have far higher subscriber counts and can distort the linear fit.
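To make the units of the regression output concrete, here is a small sketch on simulated data; the series is made up and only stands in for aggregated_data. Because as.numeric() on a Date counts days, the slope is expressed per day. Applied to the reported estimate of -2.829, the same arithmetic gives roughly -1,033 subscribers per year.

```r
# Simulated stand-in for aggregated_data (values are made up for illustration).
set.seed(1)
toy <- data.frame(
  published_date = seq(as.Date("2012-01-01"), by = "day", length.out = 1000)
)
toy$num_subscribers <- 8000 - 3 * seq_len(1000) + rnorm(1000, sd = 500)

fit <- lm(num_subscribers ~ as.numeric(published_date), data = toy)

# as.numeric() on a Date counts days since 1970-01-01, so the slope reads as
# "subscribers per day"; multiplying by 365.25 gives a per-year figure.
slope_per_day  <- unname(coef(fit)[2])
slope_per_year <- slope_per_day * 365.25

# The p-value tests whether the slope differs from zero;
# R-squared says how much of the variance the line explains.
p_value   <- coef(summary(fit))[2, "Pr(>|t|)"]
r_squared <- summary(fit)$r.squared

c(per_day = slope_per_day, per_year = slope_per_year,
  p = p_value, r2 = r_squared)
```

The p-value and R-squared answer different questions, which is why a slope can be statistically significant and still describe almost none of the variation.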
Next, I plot the data to visualize the trend. First, I create a scatterplot of the number of subscribers against the published dates and add a red trend line from the linear regression, which shows the linear relationship between the two variables. I then redraw the scatterplot with a green LOESS curve (span = 0.3) to allow for non-linear changes. The majority of points cluster near zero, indicating that most publication dates have relatively low subscriber totals. The green LOESS curve suggests only subtle upward and downward shifts over time, and the trend appears mostly flat. The slight increases in some regions (e.g., around 2013–2014) may point to periods of growth, while other regions may reflect stable or declining subscriber counts. The LOESS line shows no pronounced long-term upward or downward trend. This suggests that there is no strong time-dependent growth in subscribers and that the fit is heavily influenced by outliers.
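How strongly a single extreme date can pull a least-squares line is easy to demonstrate on simulated data. The sketch below is not part of the original analysis: the series is invented, and the median comparison is added here only as a cross-check.

```r
# Flat toy series: 200 days at roughly 1000 subscribers (made-up values).
set.seed(42)
day  <- 1:200
subs <- 1000 + rnorm(200, sd = 50)

# Add one extreme value near the start, mimicking a single blockbuster date.
subs_outlier <- subs
subs_outlier[5] <- 250000

# Least-squares slope with and without the extreme value.
slope_clean   <- unname(coef(lm(subs ~ day))[2])
slope_outlier <- unname(coef(lm(subs_outlier ~ day))[2])

# The median barely moves, which makes it a useful robust cross-check.
c(slope_clean = slope_clean, slope_outlier = slope_outlier,
  median_clean = median(subs), median_outlier = median(subs_outlier))
```

One planted value is enough to turn a flat series into an apparently steep decline, so a slope estimated from data with outliers above 100,000 deserves the same caution.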
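# published_date was already converted to Date above; as.Date() here is only a safeguard.
# (time_series_data has no published_timestamp column, since it was renamed in select().)
time_series_data$published_date <- as.Date(time_series_data$published_date)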
# Aggregate data to ensure unique dates
aggregated_data <- time_series_data %>%
group_by(published_date) %>%
summarize(num_subscribers = sum(num_subscribers, na.rm = TRUE), .groups = "drop")
# Create a tsibble object and fill gaps
tsibble_data <- aggregated_data %>%
as_tsibble(index = published_date) %>%
fill_gaps(num_subscribers = 0) # Fill gaps with 0 subscribers
# Apply smoothing using LOESS to detect seasonality
tsibble_data %>%
ggplot(aes(x = published_date, y = num_subscribers)) +
geom_point(color = "blue") +
geom_smooth(method = "loess", span = 0.2, color = "red") +
labs(
title = "Smoothing to Detect Seasonality in Number of Subscribers",
x = "Published Date",
y = "Number of Subscribers"
) +
theme_minimal()
## `geom_smooth()` using formula = 'y ~ x'
I create a tsibble object from the aggregated data using published_date as the time index. I also fill any missing dates with a default value of 0 subscribers so that the daily series is complete. Then I apply LOESS smoothing and plot the result. The plot shows the raw data points in blue and a red smoothed curve that makes underlying patterns in the number of subscribers easier to see.
The red line suggests that there is no strong seasonality or periodicity in the data. Any change in subscribers over time is minimal and is dominated by the outliers. While the overall trend is subtle, minor fluctuations in the LOESS line could indicate possible seasonal behavior. Confirming this would require further investigation, for example by aggregating the data by month or quarter.
The significant outliers, where subscriber counts exceed 100,000, skew the perception of the overall trend. Identifying the dates of these outliers could help in understanding what contributed to their success (e.g., popular topics or promotions).
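The two follow-ups mentioned above, monthly aggregation and locating the outlier dates, could look roughly like this. The daily series is simulated and the object names (toy, toy_ts, monthly, outlier_rows) are hypothetical; index_by() and yearmonth() come from tsibble.

```r
library(dplyr)
library(tsibble)

# Simulated daily series standing in for tsibble_data (made-up values).
set.seed(7)
toy <- data.frame(
  published_date  = seq(as.Date("2016-01-01"), as.Date("2016-12-31"), by = "day"),
  num_subscribers = as.numeric(rpois(366, lambda = 2000))
)

# Plant one extreme day so the outlier filter has something to find.
toy$num_subscribers[100] <- 150000

toy_ts <- as_tsibble(toy, index = published_date)

# Monthly totals: index_by() regroups the daily index into year-months.
monthly <- toy_ts %>%
  index_by(month = yearmonth(published_date)) %>%
  summarise(num_subscribers = sum(num_subscribers))

# Dates whose total exceeds the 100,000 level.
outlier_rows  <- toy_ts %>% filter(num_subscribers > 100000)
outlier_dates <- outlier_rows$published_date

monthly
outlier_dates
```

Replacing yearmonth() with yearquarter() would give the quarterly version of the same summary.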
# Decompose the time series to extract seasonal components
decomposed <- tsibble_data %>%
model(stl = STL(num_subscribers ~ season(window = "periodic"))) %>%
components()
# Plot the decomposed components
autoplot(decomposed) +
labs(
title = "Decomposition of Time Series to Detect Seasonality",
x = "Published Date",
y = "Number of Subscribers"
)
# Illustrate seasonality using ACF and PACF
# ACF plot
tsibble_data %>%
ACF(num_subscribers) %>%
autoplot() +
labs(
title = "ACF Plot to Detect Seasonality",
x = "Lag",
y = "ACF"
)
# PACF plot
tsibble_data %>%
PACF(num_subscribers) %>%
autoplot() +
labs(
title = "PACF Plot to Detect Seasonality",
x = "Lag",
y = "PACF"
)
Decomposition of Time Series to Detect Seasonality
1st panel - This panel shows the original time series data (number of subscribers over time). The raw data exhibits a high degree of variability with noticeable outliers, where some courses attracted significantly more subscribers than the majority.
2nd panel - The trend component captures the long-term progression of the data, smoothing out short-term fluctuations. It reveals a gradual increase in subscriber numbers until around 2015, followed by a slight decline. This could indicate a peak in subscriber growth during this period, possibly due to platform-wide factors such as promotions or market saturation.
3rd panel - This panel captures repeating patterns at a seasonal frequency (e.g., yearly or weekly). Clear peaks in the seasonal component suggest recurring periods of higher subscriber activity. These could align with annual or weekly patterns, such as holiday seasons or consistent spikes at specific times of the week.
Last panel - The remainder represents the variability in the data that cannot be explained by the trend or seasonal components. Large residuals in certain areas indicate irregularities or one-off events that significantly affected subscriber counts (e.g., a highly successful course launch or promotional campaign).
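How the panels relate to one another can be checked numerically. The sketch below uses base R's stl() on a simulated series rather than the feasts STL() call from the report, purely so that it runs without extra packages. The data and the 7-day cycle are assumptions built into the simulation.

```r
# Simulated daily series with a built-in 7-day cycle (made-up values).
set.seed(123)
n <- 7 * 52
weekly_pattern <- rep(c(5, 8, 9, 9, 7, 2, 1), length.out = n)
y <- ts(100 + 10 * weekly_pattern + rnorm(n, sd = 5), frequency = 7)

# s.window = "periodic" forces the seasonal shape to be identical every week,
# the same choice the report makes with season(window = "periodic").
fit   <- stl(y, s.window = "periodic")
parts <- fit$time.series   # columns: seasonal, trend, remainder

# The three components add back up to the original series.
recon_error <- max(abs(rowSums(parts) - as.numeric(y)))

# Share of the variance attributable to the seasonal component.
seasonal_share <- var(parts[, "seasonal"]) / var(as.numeric(y))

c(recon_error = recon_error, seasonal_share = seasonal_share)
```

A seasonal share close to 1 means the cycle dominates the series. A share near 0 would mean the seasonal panel is mostly noise, whatever its shape looks like in the plot.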
ACF Plot to Detect Seasonality - The ACF measures how the values of the time series are correlated with their own past values (lags). The blue dashed lines mark the threshold for statistically significant autocorrelations, and bars exceeding this threshold suggest a meaningful pattern or seasonality. The ACF plot shows strong, regularly spaced peaks (e.g., at lags 7, 14, 21, and 28). This indicates weekly seasonality in the time series, suggesting that subscriber behavior repeats at a weekly interval.
PACF Plot to Detect Seasonality - The PACF measures the direct relationship between a time series value and its lagged values after removing the effects of intermediate lags. This gives a clearer picture of the immediate influence of specific lags. Bars extending beyond the blue dashed lines indicate statistically significant partial autocorrelations at those lags. The PACF plot shows strong spikes at lags 7, 14, and 21, which points to a direct weekly pattern and is consistent with the ACF plot. Together, these significant lags support weekly seasonality, with subscriber behavior repeating strongly every 7 days.
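The reading of the ACF described above can be reproduced on a simulated series with a known 7-day cycle. This is a base R sketch with made-up data. The bound of 1.96 / sqrt(n) is the usual large-sample approximation for the 95% significance lines.

```r
# Simulated daily series with a 7-day cycle (made-up values).
set.seed(99)
n <- 7 * 60
y <- rep(c(5, 8, 9, 9, 7, 2, 1), length.out = n) + rnorm(n, sd = 1)

# Autocorrelations up to lag 28, without drawing the plot.
r <- acf(y, lag.max = 28, plot = FALSE)$acf[-1]   # drop lag 0
names(r) <- 1:28

# Approximate 95% bound for the dashed lines: +/- 1.96 / sqrt(n).
bound <- 1.96 / sqrt(n)

# For a weekly pattern, the lags that are multiples of 7 stand out.
weekly_lags <- r[c(7, 14, 21, 28)]
round(weekly_lags, 2)

# All lags whose autocorrelation exceeds the bound in absolute value.
which(abs(r) > bound)
```

In a series with a genuine weekly cycle, the peaks at multiples of 7 are far above the bound, not just marginally over it, which is the pattern to look for in the real plot.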