1 Introduction : Instagram

Instagram is currently one of the most popular social media platforms among millennials. Here, I analyze and forecast the activity of my friend's Instagram account. The data comes from Instagram itself, downloaded via Settings -> Security -> Download Data, as shown in the following picture:

2 Basic Concept : Time Series

2.1 Time Series

A time series is a sequence of data whose values are affected by time, and time series analysis covers the methods for analyzing and processing such data. Predicting future values based on the values from previous periods is called forecasting. Data formatted as a time series (ts) object must have the following characteristics:
* no missing intervals
* no missing values
* data should be ordered by time
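The three conditions above can be checked before building a ts object. The following is a minimal sketch in base R on a small, made-up data frame; the column names `Date` and `total_likes` mirror the ones used later in this report, but the values are hypothetical.

```r
# Hypothetical daily counts: 2017-01-03 is absent and one value is NA.
likes <- data.frame(
  Date = as.Date(c("2017-01-04", "2017-01-01", "2017-01-02", "2017-01-05")),
  total_likes = c(5, 3, NA, 2)
)

# Condition 3: order the rows by time.
likes <- likes[order(likes$Date), ]

# Condition 1: find missing intervals by comparing against a full daily sequence.
all_days <- seq(min(likes$Date), max(likes$Date), by = "day")
missing_days <- all_days[!(format(all_days) %in% format(likes$Date))]

# Condition 2: count missing values.
n_missing_values <- sum(is.na(likes$total_likes))

print(missing_days)
print(n_missing_values)
```

If either check reports a problem, the gaps need to be padded or imputed before the data is converted with ts().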

2.2 Exploratory Data Analysis (EDA)

A ts object can be decomposed into 3 main components, which are then used for forecasting. These components are:
- trend (T): the global movement of the mean throughout an interval
- seasonal (S): the pattern that repeats in each seasonal interval
- error (E): the pattern or value that cannot be captured by either trend or seasonal.
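For an additive series the three components simply add up to the observed value. The sketch below illustrates this with decompose() from base R on a synthetic monthly series (an assumption made for illustration; it is not the Instagram data).

```r
# Synthetic monthly series: linear trend + yearly seasonal pattern + noise.
set.seed(1)
n <- 48
t_idx <- seq_len(n)
x <- ts(0.5 * t_idx + 10 * sin(2 * pi * t_idx / 12) + rnorm(n),
        start = c(2017, 1), frequency = 12)

# Additive decomposition: x = trend + seasonal + random (error).
parts <- decompose(x, type = "additive")
recomposed <- parts$trend + parts$seasonal + parts$random

# The moving-average trend is NA at both ends, so compare only where it is defined.
defined <- !is.na(parts$trend)
max_gap <- max(abs(recomposed[defined] - x[defined]))
print(max_gap)  # numerically zero
```

Because the trend is estimated with a centered moving average, half a period of observations at each end has no trend or error value.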

2.3 Forecast Model

There are several ways of forecasting a time series:
1. Naive method
2. Simple Moving Average (SMA)
3. Exponential Smoothing
- Simple exponential smoothing (SES): smoothing error
- Double exponential smoothing (Holt): smoothing error and trend
- Triple exponential smoothing (Holt-Winters): smoothing error, trend, and seasonal
4. ARIMA
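As a rough illustration of how the first three families differ, the sketch below uses only base R on a synthetic monthly series. The smoothing parameters (alpha, beta, gamma) are fixed at arbitrary values purely for the example; leaving them out lets HoltWinters() estimate them from the data.

```r
# Synthetic monthly series with trend and seasonality (hypothetical data).
set.seed(42)
n <- 72
t_idx <- seq_len(n)
y <- ts(20 + 0.2 * t_idx + 5 * sin(2 * pi * t_idx / 12) + rnorm(n),
        start = c(2014, 1), frequency = 12)

# 1. Naive: the next value equals the last observed value.
naive_fc <- y[n]

# 2. Simple moving average: the mean of the last k observations.
k <- 3
sma_fc <- mean(y[(n - k + 1):n])

# 3. Exponential smoothing with stats::HoltWinters().
ses  <- HoltWinters(y, alpha = 0.3, beta = FALSE, gamma = FALSE)  # no trend, no seasonal
holt <- HoltWinters(y, alpha = 0.3, beta = 0.1, gamma = FALSE)    # adds trend
hw   <- HoltWinters(y, alpha = 0.3, beta = 0.1, gamma = 0.1)      # adds trend and seasonal

# Forecast the next 12 months with the full Holt-Winters model.
hw_fc <- predict(hw, n.ahead = 12)
print(hw_fc)
```

Setting beta or gamma to FALSE switches the corresponding component off, which is how one function covers SES, Holt, and Holt-Winters.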

2.4 Forecasting & Evaluation

Forecasting can be done with the forecast() function from the forecast package. Models are evaluated by comparing their prediction errors.
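One common way to compare prediction errors is to hold out the last part of the series, forecast it, and compute error measures such as RMSE and MAE. The sketch below does this with predict() from base R rather than forecast(), so that it runs without extra packages; the data and the smoothing parameters are hypothetical.

```r
# Synthetic monthly series, 2012-01 to 2019-12 (hypothetical data).
set.seed(7)
n <- 96
y <- ts(30 + 4 * sin(2 * pi * seq_len(n) / 12) + rnorm(n),
        start = c(2012, 1), frequency = 12)

# Hold out the last 12 months as a test set.
train <- window(y, end = c(2018, 12))
test  <- window(y, start = c(2019, 1))

# Fit on the training data only, then forecast the test horizon.
fit  <- HoltWinters(train, alpha = 0.2, beta = 0.05, gamma = 0.2)
pred <- predict(fit, n.ahead = length(test))

# Error measures: lower is better when comparing models on the same test set.
err  <- as.numeric(test) - as.numeric(pred)
rmse <- sqrt(mean(err^2))
mae  <- mean(abs(err))
print(c(RMSE = rmse, MAE = mae))
```

The forecast package provides an accuracy() function that reports these and other measures for its forecast objects.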

2.5 Assumption Check

There are two assumptions to check in a time series analysis:
1. Normality: shapiro.test
- H0: residuals are normally distributed
- H1: residuals are not normally distributed
2. Autocorrelation: Box.test (Ljung-Box)
- H0: no autocorrelation in the forecast errors
- H1: there is autocorrelation in the forecast errors
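Both tests are available in base R. The sketch below applies them to simulated residuals (an assumption for illustration) and to a random walk, which is strongly autocorrelated by construction, to show how the p-values are read against a 0.05 threshold.

```r
set.seed(123)
res  <- rnorm(200)    # hypothetical, well-behaved residuals
walk <- cumsum(res)   # random walk: strongly autocorrelated by construction

# Normality: a small p-value rejects H0 (normally distributed residuals).
norm_test <- shapiro.test(res)

# Autocorrelation: a small p-value rejects H0 (no autocorrelation).
lb_res  <- Box.test(res,  lag = 1, type = "Ljung-Box")
lb_walk <- Box.test(walk, lag = 1, type = "Ljung-Box")

alpha <- 0.05
print(norm_test$p.value > alpha)  # TRUE means H0 (normality) is not rejected
print(lb_res$p.value > alpha)     # TRUE means H0 (no autocorrelation) is not rejected
print(lb_walk$p.value < alpha)    # the random walk rejects H0
```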

2.7 Read Data

The data was obtained from Instagram and contains two columns, Date and IG_account, listing the accounts liked by intan_ohana’s account from 2017-01-01 until 2020-03-11.

## 'data.frame':    16384 obs. of  2 variables:
##  $ Date      : Factor w/ 1128 levels "","1/1/2017",..: 498 498 498 498 494 494 494 494 494 494 ...
##  $ IG_account: Factor w/ 999 levels "","_citz","_katie_may",..: 881 680 448 448 121 834 216 797 262 900 ...

2.8 Data Preparation

2.8.1 Data Pre-processing

##       Date IG_account 
##        339          0

The intan_2017 data consists of 16045 observations and 2 variables (the 16384 raw rows minus the 339 rows with a missing Date). Each variable is described below:

  • Date : The date when Intan liked the IG_account.
  • IG_account : The IG account liked by Intan.
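A pre-processing step of this kind can be sketched as follows in base R. The rows are made up, and the date format month/day/year is an assumption: the raw levels such as "1/1/2017" do not show which order the export uses.

```r
# Hypothetical raw export: one row has an empty Date.
raw <- data.frame(
  Date = c("1/1/2017", "", "1/2/2017", "1/2/2017"),
  IG_account = c("acc_a", "acc_b", "acc_a", "acc_c"),
  stringsAsFactors = FALSE
)

# Treat empty strings as missing, then count missing values per column.
raw$Date[raw$Date == ""] <- NA
na_counts <- colSums(is.na(raw))
print(na_counts)

# Drop the rows without a date and parse the rest.
clean <- raw[!is.na(raw$Date), ]
clean$Date <- as.Date(clean$Date, format = "%m/%d/%Y")  # assumed month/day/year
str(clean)
```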

As a data scientist, I will develop a model that forecasts the number of likes Intan will give. Based on our data, we want to forecast the total number of likes Intan gives on each Date, where each IG_account row counts as one like. That’s why we need to create a new variable (total_likes).

2.8.3 Data Aggregation

##        Date total_likes 
##           0           0
## [1] "2017-01-01" "2020-03-11"
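The aggregation can be sketched in base R as below: count the rows per day and give days without any like a count of zero, so that the series has no missing intervals. The three input rows are hypothetical.

```r
# Hypothetical cleaned log: one row per like.
clean <- data.frame(
  Date = as.Date(c("2017-01-01", "2017-01-01", "2017-01-03")),
  IG_account = c("acc_a", "acc_b", "acc_a")
)

# Full daily sequence between the first and last like.
all_days <- seq(min(clean$Date), max(clean$Date), by = "day")

# Count likes per day; using every day as a factor level yields 0 for empty days.
counts <- table(factor(format(clean$Date), levels = format(all_days)))
daily <- data.frame(Date = all_days, total_likes = as.vector(counts))

print(colSums(is.na(daily)))
print(range(daily$Date))
```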

2.9 Time series Object & EDA

In this step I converted the data into ts format.

## [1] "ts"
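A conversion of this kind might look like the sketch below. The counts are simulated, and frequency = 365 (daily data with a yearly cycle) is an assumption that appears consistent with the time index printed in the forecast tables later on.

```r
# Simulated daily like counts (hypothetical stand-in for total_likes).
set.seed(2017)
total_likes <- rpois(800, lambda = 14)

# Daily observations starting on the first day of 2017, yearly seasonality.
demo_ts <- ts(total_likes, start = c(2017, 1), frequency = 365)

print(class(demo_ts))
print(frequency(demo_ts))
```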

Note:
1. There is no visible trend in the total likes given by Intan
2. There is no visible seasonal pattern in the total likes given by Intan

2.9.1 Decompose

After creating the time series object for the Intan data, I inspected the components of intan_ts. I want to look at the trend and seasonality patterns in order to choose an appropriate model for forecasting intan_ts. I used decompose() to extract the trend, seasonality, and error of the time series, and visualized them using autoplot().

There is a decreasing trend from the second semester of 2017 until the end of 2019.

2.11 Model Building

Based on the decomposition, the data has both trend and seasonal components, so I use Holt-Winters and seasonal ARIMA.
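To illustrate the seasonal ARIMA side, the sketch below fits a model with arima() from base R and derives 80% and 95% prediction intervals from the forecast standard errors. It uses the built-in AirPassengers data as a stand-in because the Instagram data is not included here, and the orders (0,1,1)(0,1,1)[12] are a textbook choice for that dataset, not the orders selected in this report.

```r
# Stand-in data: built-in monthly series, log-transformed to stabilize variance.
y <- log(AirPassengers)

# Seasonal ARIMA(0,1,1)(0,1,1)[12]; the orders are illustrative only.
fit <- arima(y, order = c(0, 1, 1),
             seasonal = list(order = c(0, 1, 1), period = 12))

# Point forecasts and their standard errors for the next 12 months.
fc <- predict(fit, n.ahead = 12)

# Normal-theory prediction intervals: point forecast +/- z * standard error.
z80 <- qnorm(0.90)
z95 <- qnorm(0.975)
intervals <- data.frame(
  point = as.numeric(fc$pred),
  lo80  = as.numeric(fc$pred - z80 * fc$se),
  hi80  = as.numeric(fc$pred + z80 * fc$se),
  lo95  = as.numeric(fc$pred - z95 * fc$se),
  hi95  = as.numeric(fc$pred + z95 * fc$se)
)
print(head(intervals))
```

The columns correspond to the Point Forecast, Lo 80, Hi 80, Lo 95, and Hi 95 columns in the forecast output of the next section, here on the log scale.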

2.12 Forecast and Evaluation

##           Point Forecast        Lo 80    Hi 80       Lo 95    Hi 95
## 2020.0301     18.3538481   6.80628258 29.90141   0.6933706 36.01433
## 2020.0329     20.8903506   9.20189057 32.57881   3.0143936 38.76631
## 2020.0356      4.5543624  -7.27331378 16.38204 -13.5345074 22.64323
## 2020.0384      8.4774078  -3.48786494 20.44268  -9.8218977 26.77671
## 2020.0411     11.4434941  -0.65781078 23.54480  -7.0638546 29.95084
## 2020.0438     14.0949944   1.85916966 26.33082  -4.6180847 32.80807
## 2020.0466     10.0694512  -2.29943046 22.43833  -8.8471209 28.98602
## 2020.0493     13.5987425   1.09822009 26.09926  -5.5191568 32.71664
## 2020.0521     25.1464404  12.51564915 37.77723   5.8293121 44.46357
## 2020.0548     14.9241512   2.16442100 27.68388  -4.5901722 34.43847
## 2020.0575     16.3495020   3.46212279 29.23688  -3.3600437 36.05905
## 2020.0603     13.6104022   0.59662604 26.62418  -6.2924509 33.51326
## 2020.0630     14.9306839   1.79172669 28.06964  -5.1636171 35.02499
## 2020.0658     11.5914627  -1.67149416 24.85442  -8.6924794 31.87540
## 2020.0685     17.4543673   4.06855943 30.84018  -3.0174592 37.92619
## 2020.0712     11.3251008  -2.18244079 24.83264  -9.3329014 31.98310
## 2020.0740     18.2159514   4.58776338 31.84414  -2.6265635 39.05847
## 2020.0767     16.2957701   2.54799446 30.04355  -4.7296383 37.32118
## 2020.0795     14.7003834   0.83405139 28.56672  -6.5063413 35.90711
## 2020.0822     15.6141936   1.63031037 29.59808  -5.7723102 37.00070
## 2020.0849     13.8150540  -0.28540050 27.91551  -7.7497302 35.37984
## 2020.0877     15.8297549   1.61368498 30.04582  -5.9118479 37.57136
## 2020.0904     27.5932206  13.26246792 41.92397   5.6762257 49.51022
## 2020.0932     28.9111473  14.46662245 43.35567   6.8201529 51.00214
## 2020.0959     18.1199078   3.56249980 32.67732  -4.1437265 40.38354
## 2020.0986     20.4922731   5.82285073 35.16170  -1.9426724 42.92722
## 2020.1014     21.0108651   6.23027716 35.79145  -1.5940935 43.61582
## 2020.1041     19.8801946   4.98927089 34.77112  -2.8935080 42.65390
## 2020.1068     19.6723261   4.67187825 34.67277  -3.2688792 42.61353
## 2020.1096     15.2549130   0.14573486 30.36409  -7.8525809 38.36241
## 2020.1123     16.1855250   0.96839355 31.40266  -7.0870693 39.45812
## 2020.1151     18.7686495   3.44432513 34.09297  -4.6678822 42.20518
## 2020.1178     16.0599305   0.62915778 31.49070  -7.5393998 39.65926
## 2020.1205     17.8285530   2.29206137 33.36504  -5.9324605 41.58957
## 2020.1233     20.1348467   4.49335065 35.77634  -3.7867572 44.05645
## 2020.1260     14.7980372  -0.94776311 30.54384  -9.2830863 38.87916
## 2020.1288     16.6855192   0.83610103 32.53494  -7.5540741 40.92511
## 2020.1315     11.3768237  -4.57553921 27.32919 -13.0202100 35.77386
## 2020.1342     11.2000560  -4.85459164 27.25470 -13.3534087 35.75352
## 2020.1370     15.9503396  -0.20594510 32.10662  -8.7585656 40.65924
## 2020.1397     13.6019255  -2.65536104 29.85921 -11.2614487 38.46530
## 2020.1425     11.6221228  -4.73554177 27.97979 -13.3947663 36.63901
## 2020.1452     13.6438002  -2.81363035 30.10123 -11.5256678 38.81327
## 2020.1479     12.9782602  -3.57833509 29.53486 -12.3428672 38.29939
## 2020.1507     16.7409986   0.08582906 33.39617  -8.7308851 42.21288
## 2020.1534     12.9225023  -3.83066158 29.67567 -12.6992508 38.54426
## 2020.1562     10.8806735  -5.96991488 27.73126 -14.8900775 36.65142
## 2020.1589      9.0189101  -7.92854267 25.96636 -16.8999822 34.93780
## 2020.1616      7.6307527  -9.41301401 24.67452 -18.4354391 33.69694
## 2020.1644     -0.8797473 -18.01928667 16.25979 -27.0924108 25.33292
## 2020.1671     12.4961543  -4.73862554 29.73093 -13.8621669 38.85448
## 2020.1699     23.5444862   6.21498919 40.87398  -2.9586924 50.04766
## 2020.1726     15.3563970  -2.06730219 32.78010 -11.2908514 42.00365
## 2020.1753     10.8189143  -6.69848049 28.33631 -15.9716291 37.60946
## 2020.1781     16.9013293  -0.70926264 34.51192 -10.0317469 43.83441
## 2020.1808     13.1141803  -4.58911810 30.81748 -13.9606782 40.18904
## 2020.1836      7.3246893 -10.47083268 25.12021 -19.8912130 34.54059
## 2020.1863     16.6946197  -1.19265038 34.58189 -10.6615992 44.05084
## 2020.1890     15.2560938  -2.72245608 33.23464 -12.2397255 42.75191
## 2020.1918     10.4762305  -7.59313822 28.54560 -17.1584842 38.11095
##           Point Forecast    Lo 80    Hi 80       Lo 95    Hi 95
## 2020.0301       14.54628 4.925006 24.16755 -0.16818884 29.26075
## 2020.0329       15.03610 5.213510 24.85869  0.01374448 30.05846
## 2020.0356       15.09772 5.206607 24.98884 -0.02943367 30.22488
## 2020.0384       15.04768 5.113208 24.98215 -0.14578520 30.24115
## 2020.0411       14.98231 5.011551 24.95307 -0.26664842 30.23126
## 2020.0438       14.91680 4.913955 24.91964 -0.38122912 30.21482
## 2020.0466       14.85400 4.822452 24.88556 -0.48793106 30.19594
## 2020.0493       14.79431 4.737001 24.85162 -0.58701741 30.17564
## 2020.0521       14.73766 4.657221 24.81809 -0.67903970 30.15435
## 2020.0548       14.68390 4.582698 24.78511 -0.76455782 30.13237
## 2020.0575       14.63291 4.513041 24.75277 -0.84409120 30.10990
## 2020.0603       14.58452 4.447893 24.72115 -0.91811407 30.08716
## 2020.0630       14.53862 4.386924 24.69031 -0.98705843 30.06429
## 2020.0658       14.49507 4.329833 24.66030 -1.05131801 30.04145
## 2020.0685       14.45375 4.276343 24.63116 -1.11125204 30.01875
## 2020.0712       14.41455 4.226200 24.60291 -1.16718863 29.99629
## 2020.0740       14.37736 4.179170 24.57556 -1.21942778 29.97416
## 2020.0767       14.34208 4.135038 24.54913 -1.26824412 29.95241
## 2020.0795       14.30861 4.093606 24.52361 -1.31388926 29.93111
## 2020.0822       14.27685 4.054691 24.49901 -1.35659398 29.91030
## 2020.0849       14.24672 4.018123 24.47532 -1.39657013 29.89002
## 2020.0877       14.21814 3.983747 24.45253 -1.43401240 29.87029
## 2020.0904       14.19102 3.951418 24.43062 -1.46909981 29.85114
## 2020.0932       14.16529 3.921002 24.40958 -1.50199719 29.83258
## 2020.0959       14.14088 3.892376 24.38939 -1.53285637 29.81462
## 2020.0986       14.11772 3.865423 24.37003 -1.56181737 29.79727
## 2020.1014       14.09575 3.840039 24.35147 -1.58900940 29.78052
## 2020.1041       14.07491 3.816123 24.33370 -1.61455180 29.76437
## 2020.1068       14.05514 3.793583 24.31669 -1.63855488 29.74883
## 2020.1096       14.03637 3.772334 24.30041 -1.66112068 29.73387
## 2020.1123       14.01857 3.752296 24.28485 -1.68234367 29.71949
## 2020.1151       14.00169 3.733395 24.26998 -1.70231138 29.70569
## 2020.1178       13.98567 3.715561 24.25577 -1.72110496 29.69244
## 2020.1205       13.97047 3.698730 24.24220 -1.73879970 29.67973
## 2020.1233       13.95605 3.682841 24.22925 -1.75546552 29.66756
## 2020.1260       13.94237 3.667839 24.21689 -1.77116737 29.65590
## 2020.1288       13.92939 3.653670 24.20510 -1.78596565 29.64474
## 2020.1315       13.91707 3.640286 24.19386 -1.79991658 29.63406
## 2020.1342       13.90539 3.627640 24.18314 -1.81307247 29.62385
## 2020.1370       13.89431 3.615689 24.17292 -1.82548207 29.61409
## 2020.1397       13.88379 3.604393 24.16319 -1.83719084 29.60477
## 2020.1425       13.87381 3.593715 24.15391 -1.84824117 29.59587
## 2020.1452       13.86435 3.583618 24.14508 -1.85867264 29.58737
## 2020.1479       13.85537 3.574069 24.13667 -1.86852221 29.57926
## 2020.1507       13.84685 3.565038 24.12866 -1.87782444 29.57152
## 2020.1534       13.83877 3.556495 24.12104 -1.88661161 29.56415
## 2020.1562       13.83110 3.548412 24.11379 -1.89491395 29.55711
## 2020.1589       13.82382 3.540764 24.10689 -1.90275975 29.55041
## 2020.1616       13.81692 3.533526 24.10032 -1.91017550 29.54402
## 2020.1644       13.81037 3.526676 24.09407 -1.91718603 29.53794
## 2020.1671       13.80416 3.520191 24.08813 -1.92381462 29.53214
## 2020.1699       13.79827 3.514052 24.08249 -1.93008310 29.52662
## 2020.1726       13.79268 3.508240 24.07711 -1.93601196 29.52137
## 2020.1753       13.78737 3.502737 24.07201 -1.94162044 29.51637
## 2020.1781       13.78234 3.497525 24.06715 -1.94692659 29.51161
## 2020.1808       13.77756 3.492589 24.06254 -1.95194740 29.50708
## 2020.1836       13.77303 3.487915 24.05815 -1.95669883 29.50277
## 2020.1863       13.76874 3.483487 24.05399 -1.96119588 29.49867
## 2020.1890       13.76466 3.479292 24.05003 -1.96545267 29.49477
## 2020.1918       13.76079 3.475318 24.04626 -1.96948249 29.49106

Holt-Winters is a better model than auto-ARIMA for this forecast.

2.13 Assumption Check

1. Normality : shapiro.test
- H0 : residuals are normally distributed
- H1 : residuals are not normally distributed

## 
##  Shapiro-Wilk normality test
## 
## data:  intan_holt_f$residuals
## W = 0.95594, p-value = 0.00000000000004202

2. Autocorrelation : Box.test - Ljung-Box
- H0: No autocorrelation in the forecast errors
- H1: there is autocorrelation in the forecast errors

## 
##  Box-Ljung test
## 
## data:  intan_holt_f$residuals
## X-squared = 4.1319, df = 1, p-value = 0.04208
## 
##  Box-Ljung test
## 
## data:  intan_arima_f$residuals
## X-squared = 0.00033281, df = 1, p-value = 0.9854

Based on the assumption check, there is no autocorrelation in the forecast residuals of the ARIMA model (p-value > 0.05), while the Holt-Winters residuals do show autocorrelation at the 5% level (p-value = 0.04208). The Holt-Winters residuals are also not normally distributed (p-value < 0.05), so they may not be spread evenly around their mean, as seen in the histogram.

In a time series, such errors might emerge from various unpredictable events and are largely unavoidable. One strategy for dealing with them is to analyze which kinds of unpredictable events occur, and how frequently. This can be done through time series analysis with seasonal adjustment. From that insight, one can develop standard operating procedures and smart strategies for dealing with such events.