DATA 621: Final Project

0.1 Overview

In this analysis we examine stock data and assess how Covid-19 has affected it since the beginning of 2020. We collected five years of data from the exchange, from April 28, 2015 to April 27, 2020. We use a Moving Average (MA) model and an Auto Regressive (AR) model to analyze the time series data.

0.2 Time Series analysis of Stocks data

0.2.1 Data Preparation

Load the required libraries

Load required Datasets

## Warning: Missing column names filled in: 'X1' [1]
## Parsed with column specification:
## cols(
##   X1 = col_double(),
##   begins_at = col_datetime(format = ""),
##   open_price = col_double(),
##   close_price = col_double(),
##   high_price = col_double(),
##   low_price = col_double(),
##   volume = col_double(),
##   session = col_character(),
##   interpolated = col_logical(),
##   sname = col_character()
## )
## Classes 'spec_tbl_df', 'tbl_df', 'tbl' and 'data.frame': 370440 obs. of  10 variables:
##  $ X1          : num  0 1 2 3 4 5 6 7 8 9 ...
##  $ begins_at   : POSIXct, format: "2015-04-28" "2015-04-29" ...
##  $ open_price  : num  24.9 24.9 24.9 24.9 24.9 ...
##  $ close_price : num  24.9 24.9 24.9 24.9 24.9 ...
##  $ high_price  : num  24.9 24.9 24.9 24.9 24.9 ...
##  $ low_price   : num  24.9 24.9 24.9 24.9 24.9 ...
##  $ volume      : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ session     : chr  "reg" "reg" "reg" "reg" ...
##  $ interpolated: logi  TRUE TRUE TRUE TRUE TRUE TRUE ...
##  $ sname       : chr  "AA" "AA" "AA" "AA" ...
##  - attr(*, "spec")=
##   .. cols(
##   ..   X1 = col_double(),
##   ..   begins_at = col_datetime(format = ""),
##   ..   open_price = col_double(),
##   ..   close_price = col_double(),
##   ..   high_price = col_double(),
##   ..   low_price = col_double(),
##   ..   volume = col_double(),
##   ..   session = col_character(),
##   ..   interpolated = col_logical(),
##   ..   sname = col_character()
##   .. )

Creating a character object called months.abb

0.2.2 Data Exploration

Now, let’s use the mutate() function in the dplyr package to calculate the average price for the stocks

## Warning in mean.default(data_pro[]): argument is not numeric or logical:
## returning NA
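
A warning like the one above typically appears when mean() is called on a whole data frame rather than on a numeric vector. As a minimal sketch, assuming the intended result is a row-wise average of the four price columns, the calculation could look like this. The column name avg_price and the toy values are hypothetical and not taken from the original code.

```r
library(dplyr)

# Toy rows using the column names from the dataset above; values are illustrative only.
prices <- data.frame(
  sname       = c("AA", "AA", "BA"),
  open_price  = c(24.0, 25.0, 300.0),
  close_price = c(25.0, 26.0, 302.0),
  high_price  = c(26.0, 27.0, 305.0),
  low_price   = c(23.0, 24.0, 297.0)
)

# Row-wise average of the four price columns, written as plain arithmetic
# so that every row gets its own value (avg_price is a hypothetical name).
prices <- prices %>%
  mutate(avg_price = (open_price + close_price + high_price + low_price) / 4)

print(prices)
```

Writing the average as explicit arithmetic avoids passing a data frame to mean(), which is what triggers the "argument is not numeric or logical" warning.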

Yearly average price for the stocks

Monthly average price for the stocks
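
The yearly and monthly averages can be sketched in base R as follows. This is an illustration under assumptions, not the original code: the toy values are made up, and the helper columns year and month are hypothetical names derived from begins_at.

```r
# Toy daily series; the dates fall inside the dataset's range, the prices are illustrative.
prices <- data.frame(
  begins_at   = as.Date(c("2019-12-30", "2019-12-31", "2020-01-02", "2020-01-03")),
  close_price = c(100, 102, 104, 108)
)

# Derive grouping keys from the date column (hypothetical helper columns).
prices$year  <- format(prices$begins_at, "%Y")
prices$month <- format(prices$begins_at, "%Y-%m")

# Mean close price per year and per calendar month.
yearly  <- aggregate(close_price ~ year,  data = prices, FUN = mean)
monthly <- aggregate(close_price ~ month, data = prices, FUN = mean)

print(yearly)
print(monthly)
```

The same format() calls also work on a POSIXct column such as begins_at in the loaded data.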

Let’s look at our data structure

## Observations: 370,440
## Variables: 10
## $ X1           <dbl> 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, ...
## $ begins_at    <dttm> 2015-04-28, 2015-04-29, 2015-04-30, 2015-05-01, 2015-...
## $ open_price   <dbl> 24.8871, 24.8871, 24.8871, 24.8871, 24.8871, 24.8871, ...
## $ close_price  <dbl> 24.8871, 24.8871, 24.8871, 24.8871, 24.8871, 24.8871, ...
## $ high_price   <dbl> 24.8871, 24.8871, 24.8871, 24.8871, 24.8871, 24.8871, ...
## $ low_price    <dbl> 24.8871, 24.8871, 24.8871, 24.8871, 24.8871, 24.8871, ...
## $ volume       <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ...
## $ session      <chr> "reg", "reg", "reg", "reg", "reg", "reg", "reg", "reg"...
## $ interpolated <lgl> TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, ...
## $ sname        <chr> "AA", "AA", "AA", "AA", "AA", "AA", "AA", "AA", "AA", ...
##        X1           begins_at                     open_price     
##  Min.   :   0.0   Min.   :2015-04-28 00:00:00   Min.   :   0.00  
##  1st Qu.: 314.8   1st Qu.:2016-07-26 18:00:00   1st Qu.:  13.61  
##  Median : 629.5   Median :2017-10-24 12:00:00   Median :  26.33  
##  Mean   : 629.5   Mean   :2017-10-25 14:51:25   Mean   :  48.92  
##  3rd Qu.: 944.2   3rd Qu.:2019-01-25 18:00:00   3rd Qu.:  56.09  
##  Max.   :1259.0   Max.   :2020-04-27 00:00:00   Max.   :1266.56  
##   close_price          high_price        low_price          volume         
##  Min.   :   0.0052   Min.   :   0.00   Min.   :   0.0   Min.   :        0  
##  1st Qu.:  13.6100   1st Qu.:  13.72   1st Qu.:  13.5   1st Qu.:   103282  
##  Median :  26.3300   Median :  26.67   Median :  26.0   Median :   421838  
##  Mean   :  48.9221   Mean   :  49.43   Mean   :  48.4   Mean   :  1650980  
##  3rd Qu.:  56.1200   3rd Qu.:  56.68   3rd Qu.:  55.5   3rd Qu.:  1332573  
##  Max.   :1250.0000   Max.   :1274.41   Max.   :1232.0   Max.   :375088650  
##    session          interpolated       sname          
##  Length:370440      Mode :logical   Length:370440     
##  Class :character   FALSE:362193    Class :character  
##  Mode  :character   TRUE :8247      Mode  :character  
##                                                       
##                                                       
## 

Count the number of zeros in each variable in the dataset

## Warning: attributes are not identical across measure variables;
## they will be dropped
variable      n      percent
volume        8625   2.3%
X1            294    0.1%
high_price    1      0%
low_price     1      0%
open_price    1      0%
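
A zero count like the one above can be sketched in base R as shown here. The toy data frame and the name zero_summary are hypothetical; only the column names open_price, volume and sname come from the dataset.

```r
# Toy data frame; values are illustrative only.
df <- data.frame(
  open_price = c(0, 24.9, 25.1, 26.0),
  volume     = c(0, 0, 1200, 3400),
  sname      = c("AA", "AA", "AA", "AA")
)

# Keep numeric columns only, then count the entries equal to zero in each.
num_cols <- df[sapply(df, is.numeric)]
n_zero   <- colSums(num_cols == 0, na.rm = TRUE)

# Summarise as a count and a percentage of all rows, largest count first.
zero_summary <- data.frame(
  variable = names(n_zero),
  n        = as.integer(n_zero),
  percent  = unname(round(100 * n_zero / nrow(df), 1))
)
zero_summary <- zero_summary[order(-zero_summary$n), ]

print(zero_summary)
```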

Now, let’s only target shares whose open prices are between 100 and 200

##   [1] "AA"    "AAN"   "AAP"   "AAT"   "AB"    "ABB"   "ABBV"  "ABC"   "ABEV" 
##  [10] "ABG"   "ABM"   "ABR"   "ABT"   "ACC"   "ACCO"  "ACH"   "ACM"   "ACN"  
##  [19] "ACP"   "ACRE"  "ACT"   "ADC"   "ADM"   "ADPT"  "ADS"   "ADT"   "ADX"  
##  [28] "AEB"   "AEE"   "AEG"   "AEL"   "AEM"   "AEO"   "AEP"   "AER"   "AES"  
##  [37] "AFC"   "AFB"   "AFG"   "AFL"   "AFSIA" "AFSIB" "AFSIC" "AFT"   "AG"   
##  [46] "AGCO"  "AGD"   "AGI"   "AGM"   "AGN"   "AGO"   "AGRO"  "AGX"   "AHC"  
##  [55] "AHH"   "AHT"   "AI"    "AIF"   "AIG"   "AIN"   "AIR"   "AIT"   "AIV"  
##  [64] "AIW"   "AIZ"   "AJG"   "AKR"   "AL"    "ALB"   "ALE"   "ALEX"  "ALG"  
##  [73] "ALK"   "ALL"   "ALLE"  "ALLY"  "ALPN"  "ALSN"  "ALV"   "ALX"   "AM"   
##  [82] "AMC"   "AME"   "AMG"   "AMH"   "AMHC"  "AMP"   "AMRC"  "AMT"   "AMTD" 
##  [91] "AMX"   "AN"    "ANET"  "ANF"   "ANH"   "ANTM"  "AOD"   "AON"   "AOS"  
## [100] "AP"    "APA"   "APAM"  "APD"   "APH"   "APLE"  "APO"   "AR"    "ARC"  
## [109] "ARCO"  "ARDC"  "ARE"   "ARES"  "ARI"   "ARL"   "ARMK"  "ARR"   "ARW"  
## [118] "ASA"   "ASB"   "ASC"   "ASG"   "ASGN"  "ASH"   "ASPN"  "ASR"   "ASX"  
## [127] "AT"    "ATEN"  "ATHM"  "ATI"   "ATLS"  "ATO"   "ATR"   "ATTO"  "ATV"  
## [136] "AU"    "AUY"   "AVA"   "AVAL"  "AVB"   "AVD"   "AVH"   "AVK"   "AVT"  
## [145] "AVY"   "AWF"   "AWI"   "AWK"   "AWP"   "AWR"   "AXE"   "AXL"   "AXP"  
## [154] "AXR"   "AXS"   "AXTA"  "AYI"   "AZN"   "AZO"   "AZZ"   "B"     "BA"   
## [163] "BABA"  "BAC"   "BAF"   "BAH"   "BAK"   "BAM"   "BANC"  "BAP"   "BAX"  
## [172] "BBD"   "BBDO"  "BBF"   "BBK"   "BBL"   "BBN"   "BBVA"  "BBW"   "BBX"  
## [181] "BBY"   "BC"    "BCC"   "BCE"   "BCEI"  "BCH"   "BCO"   "BCS"   "BCX"  
## [190] "BDC"   "BDJ"   "BDN"   "BDX"   "BEN"   "BEP"   "BERY"  "BFAM"  "BFK"  
## [199] "BFO"   "BFS"   "BFZ"   "BG"    "BGB"   "BGG"   "BGH"   "BGR"   "BGS"  
## [208] "BGT"   "BGX"   "BGY"   "BH"    "BHE"   "BHK"   "BHLB"  "BHP"   "BIF"  
## [217] "BIG"   "BIO"   "BIP"   "BIT"   "BITA"  "BK"    "BKD"   "BKE"   "BKH"  
## [226] "BKK"   "BKN"   "BKT"   "BKU"   "BLK"   "BLL"   "BLW"   "BLX"   "BMA"  
## [235] "BME"   "BMI"   "BMO"   "BMY"   "BNS"   "BNY"   "BOE"   "BOH"   "BOOT" 
## [244] "BP"    "BPT"   "BPY"   "BQH"   "BR"    "BRC"   "BRFS"  "BRO"   "BRP"  
## [253] "BRT"   "BRX"   "BSAC"  "BSBR"  "BSD"   "BSE"   "BSL"   "BSMX"  "BST"  
## [262] "BSX"   "BTA"   "BTE"   "BTO"   "BTT"   "BTU"   "BTZ"   "BUD"   "BUI"  
## [271] "BURL"  "BVN"   "BWA"   "BWG"   "BX"    "BXC"   "BXMT"  "BXMX"  "BXP"  
## [280] "BXS"   "BYD"   "BYM"   "BZH"   "C"     "CC"    "CL"    "CN"    "CP"   
## [289] "CB"    "CACI"  "CAE"
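
One possible way to write a price-range filter like the one described above is sketched below. It assumes the bounds 100 and 200 are inclusive, which the report does not state; the toy rows and the name target_symbols are hypothetical.

```r
# Toy rows; values are illustrative only.
df <- data.frame(
  sname      = c("AA", "AA", "BA", "BA", "ABT"),
  open_price = c(24.9, 25.3, 150.2, 320.0, 99.5)
)

# Logical mask for open prices in [100, 200] (inclusive bounds are an assumption).
in_range <- df$open_price >= 100 & df$open_price <= 200

# Distinct tickers that have at least one row inside the range.
target_symbols <- unique(df$sname[in_range])

print(target_symbols)
```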

The graph above shows the average stock price for each sector.

Checking only 500 stocks, we analyze the distribution of data in each sector

The first graph shows the increase in count, broken down by sector

The second graph shows the yearly increase in count by sector

The third graph shows the yearly increase in average price by sector

0.2.3 Build Models

Below is a graph of the sectors by month and year, which shows some patterns

We will analyze how stocks from a few of these industries fit AR (Auto Regressive) and MA (Moving Average) models.

Analyzing average price of stocks yearly for each sector in the dataset

Analyzing average price of stocks monthly for each sector in the dataset

## Warning: Removed 2904 rows containing missing values (geom_col).

Box plot for year

## Warning: Removed 3518 rows containing non-finite values (stat_boxplot).

The box plot above shows that almost all sectors in our dataset have some outliers over the five years, except two: Technology and Utilities.

Analyzing top 3 stocks in each Sector

##       [,1]                                                [,2]  
##  [1,] "AutoZone, Inc"                                     "AZO" 
##  [2,] "AutoZone, Inc"                                     "AZO" 
##  [3,] "BlackRock, Inc"                                    "BLK" 
##  [4,] "Biglari Holdings Inc"                              "BH"  
##  [5,] "Biglari Holdings Inc"                              "BH"  
##  [6,] "The Boeing Company"                                "BA"  
##  [7,] "Bio-Rad Laboratories, Inc"                         "BIO" 
##  [8,] "Allergan plc"                                      "AGN" 
##  [9,] "Alliance Data Systems Corporation"                 "ADS" 
## [10,] "Anthem, Inc"                                       "ANTM"
## [11,] "Becton, Dickinson and Company"                     "BDX" 
## [12,] "Credicorp Ltd"                                     "BAP" 
## [13,] "Canadian Pacific Railway Limited"                  "CP"  
## [14,] "Acuity Brands, Inc"                                "AYI" 
## [15,] "Arista Networks, Inc"                              "ANET"
## [16,] "Grupo Aeroportuario del Sureste, S. A. B. de C. V" "ASR" 
## [17,] "Air Products and Chemicals, Inc"                   "APD" 
## [18,] "CACI International Inc"                            "CACI"
## [19,] "Advance Auto Parts, Inc"                           "AAP" 
## [20,] "Advance Auto Parts, Inc"                           "AAP" 
##       [,3]                       
##  [1,] "Consumer Cyclical Sector" 
##  [2,] "Consumer Defensive Sector"
##  [3,] "Financial Services Sector"
##  [4,] "Consumer Cyclical Sector" 
##  [5,] "Consumer Defensive Sector"
##  [6,] "Industrials Sector"       
##  [7,] "Healthcare Sector"        
##  [8,] "Healthcare Sector"        
##  [9,] "Financial Services Sector"
## [10,] "Healthcare Sector"        
## [11,] "Healthcare Sector"        
## [12,] "Financial Services Sector"
## [13,] "Industrials Sector"       
## [14,] "Industrials Sector"       
## [15,] "Technology Sector"        
## [16,] "Industrials Sector"       
## [17,] "Basic Materials Sector"   
## [18,] "Technology Sector"        
## [19,] "Consumer Cyclical Sector" 
## [20,] "Consumer Defensive Sector"

We will study the price movement of some stocks from the Healthcare, Technology, and Industrials sectors:

ANTM Anthem, Inc

ANET Arista Networks, Inc

BA The Boeing Company


Converting the stock data to wide format

Fit an AR model to the following data:
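
As a minimal sketch of a long-to-wide conversion, base R's reshape() can turn one row per date and ticker into one column per ticker. The two ANTM values are taken from the output printed later in this report; the ANET and BA values are placeholders, and the object names long and wide are hypothetical.

```r
# Long format: one row per (date, ticker).
# ANTM values appear in this report; ANET and BA values are placeholders.
long <- data.frame(
  begins_at   = rep(as.Date(c("2020-01-02", "2020-01-03")), times = 3),
  sname       = rep(c("ANTM", "ANET", "BA"), each = 2),
  close_price = c(302.67, 293.68, 10, 11, 20, 21)
)

# Wide format: one row per date, one column per ticker.
wide <- reshape(long, idvar = "begins_at", timevar = "sname", direction = "wide")

# reshape() prefixes the value column name; strip it so columns are plain tickers.
names(wide) <- sub("^close_price\\.", "", names(wide))

print(wide)
```

With one column per ticker, a single series can then be selected by name, for example wide$ANTM.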

ANTM Anthem, Inc

ANET Arista Networks, Inc

BA The Boeing Company

## Loading required package: zoo
## 
## Attaching package: 'zoo'
## The following objects are masked from 'package:base':
## 
##     as.Date, as.Date.numeric
## 
## Attaching package: 'xts'
## The following objects are masked from 'package:dplyr':
## 
##     first, last
##  [1] "2020-01-02" "2020-01-03" "2020-01-06" "2020-01-07" "2020-01-08"
##  [6] "2020-01-09" "2020-01-10" "2020-01-13" "2020-01-14" "2020-01-15"
## [11] "2020-01-16" "2020-01-17" "2020-01-21" "2020-01-22" "2020-01-23"
## [16] "2020-01-24" "2020-01-27" "2020-01-28" "2020-01-29" "2020-01-30"
## [21] "2020-01-31" "2020-02-03" "2020-02-04" "2020-02-05" "2020-02-06"
## [26] "2020-02-07" "2020-02-10" "2020-02-11" "2020-02-12" "2020-02-13"
## [31] "2020-02-14" "2020-02-18" "2020-02-19" "2020-02-20" "2020-02-21"
## [36] "2020-02-24" "2020-02-25" "2020-02-26" "2020-02-27" "2020-02-28"
## [41] "2020-03-02" "2020-03-03" "2020-03-04" "2020-03-05" "2020-03-06"
## [46] "2020-03-09" "2020-03-10" "2020-03-11" "2020-03-12" "2020-03-13"
## [51] "2020-03-16" "2020-03-17" "2020-03-18" "2020-03-19" "2020-03-20"
## [56] "2020-03-23" "2020-03-24" "2020-03-25" "2020-03-26" "2020-03-27"
## [61] "2020-03-30" "2020-03-31" "2020-04-01" "2020-04-02" "2020-04-03"
## [66] "2020-04-06" "2020-04-07" "2020-04-08" "2020-04-09" "2020-04-13"
## [71] "2020-04-14" "2020-04-15" "2020-04-16" "2020-04-17" "2020-04-20"
## [76] "2020-04-21" "2020-04-22" "2020-04-23" "2020-04-24" "2020-04-27"
##         [,1]
##  [1,] 302.67
##  [2,] 293.68
##  [3,] 295.75
##  [4,] 299.20
##  [5,] 300.89
##  [6,] 307.83
##  [7,] 307.94
##  [8,] 306.88
##  [9,] 296.04
## [10,] 297.26
## [11,] 302.92
## [12,] 305.07
## [13,] 303.94
## [14,] 306.68
## [15,] 303.46
## [16,] 304.84
## [17,] 293.63
## [18,] 285.00
## [19,] 279.50
## [20,] 269.26
## [21,] 265.68
## [22,] 266.11
## [23,] 269.90
## [24,] 275.36
## [25,] 288.78
## [26,] 279.70
## [27,] 275.88
## [28,] 277.04
## [29,] 286.27
## [30,] 294.73
## [31,] 299.29
## [32,] 297.45
## [33,] 302.01
## [34,] 301.38
## [35,] 292.03
## [36,] 283.49
## [37,] 279.21
## [38,] 269.84
## [39,] 263.06
## [40,] 249.94
## [41,] 259.60
## [42,] 270.13
## [43,] 286.55
## [44,] 287.65
## [45,] 278.18
## [46,] 264.00
## [47,] 278.33
## [48,] 278.60
## [49,] 262.03
## [50,] 267.50
## [51,] 226.50
## [52,] 229.76
## [53,] 224.45
## [54,] 206.22
## [55,] 204.29
## [56,] 188.54
## [57,] 183.98
## [58,] 190.09
## [59,] 221.28
## [60,] 217.61
## [61,] 224.66
## [62,] 235.28
## [63,] 217.01
## [64,] 210.53
## [65,] 208.89
## [66,] 215.51
## [67,] 235.70
## [68,] 227.15
## [69,] 240.94
## [70,] 240.78
## [71,] 245.76
## [72,] 249.15
## [73,] 254.51
## [74,] 279.01
## [75,] 262.48
## [76,] 255.00
## [77,] 255.09
## [78,] 264.71
## [79,] 265.48
## [80,] 267.56
##              [,1]
## 2020-01-02 302.67
##              [,1]
## 2020-01-02 302.67
## 2020-01-03 293.68
## 2020-01-06 295.75
## 2020-01-07 299.20
## 2020-01-08 300.89
## 2020-01-09 307.83
## 2020-01-10 307.94
## 2020-01-13 306.88
## 2020-01-14 296.04
## 2020-01-15 297.26
## 2020-01-16 302.92
## 2020-01-17 305.07
## 2020-01-21 303.94
## 2020-01-22 306.68
## 2020-01-23 303.46
## 2020-01-24 304.84
## 2020-01-27 293.63
## 2020-01-28 285.00
## 2020-01-29 279.50
## 2020-01-30 269.26
## 2020-01-31 265.68
## 2020-02-03 266.11
## 2020-02-04 269.90
## 2020-02-05 275.36
## 2020-02-06 288.78
## 2020-02-07 279.70
## 2020-02-10 275.88
## 2020-02-11 277.04
## 2020-02-12 286.27
## 2020-02-13 294.73
## 2020-02-14 299.29
## 2020-02-18 297.45
## 2020-02-19 302.01
## 2020-02-20 301.38
## 2020-02-21 292.03
## 2020-02-24 283.49
## 2020-02-25 279.21
## 2020-02-26 269.84
## 2020-02-27 263.06
## 2020-02-28 249.94
## 2020-03-02 259.60
## 2020-03-03 270.13
## 2020-03-04 286.55
## 2020-03-05 287.65
## 2020-03-06 278.18
## 2020-03-09 264.00
## 2020-03-10 278.33
## 2020-03-11 278.60
## 2020-03-12 262.03
## 2020-03-13 267.50
## 2020-03-16 226.50
## 2020-03-17 229.76
## 2020-03-18 224.45
## 2020-03-19 206.22
## 2020-03-20 204.29
## 2020-03-23 188.54
## 2020-03-24 183.98
## 2020-03-25 190.09
## 2020-03-26 221.28
## 2020-03-27 217.61
## 2020-03-30 224.66
## 2020-03-31 235.28
## 2020-04-01 217.01
## 2020-04-02 210.53
## 2020-04-03 208.89
## 2020-04-06 215.51
## 2020-04-07 235.70
## 2020-04-08 227.15
## 2020-04-09 240.94
## 2020-04-13 240.78
## 2020-04-14 245.76
## 2020-04-15 249.15
## 2020-04-16 254.51
## 2020-04-17 279.01
## 2020-04-20 262.48
## 2020-04-21 255.00
## 2020-04-22 255.09
## 2020-04-23 264.71
## 2020-04-24 265.48
## 2020-04-27 267.56
##              [,1]
## 2020-04-14 245.76
## 2020-04-15 249.15
## 2020-04-16 254.51
## 2020-04-17 279.01
## 2020-04-20 262.48
## 2020-04-21 255.00
## 2020-04-22 255.09
## 2020-04-23 264.71
## 2020-04-24 265.48
## 2020-04-27 267.56
## [1]  0 21 40 62 80
##              [,1]
## 2020-04-14 245.76
##  [1] "Jan" "Jan" "Jan" "Jan" "Jan" "Jan" "Jan" "Jan" "Jan" "Jan" "Jan" "Jan"
## [13] "Jan" "Jan" "Jan" "Jan" "Jan" "Jan" "Jan" "Jan" "Jan" "Feb" "Feb" "Feb"
## [25] "Feb" "Feb" "Feb" "Feb" "Feb" "Feb" "Feb" "Feb" "Feb" "Feb" "Feb" "Feb"
## [37] "Feb" "Feb" "Feb" "Feb" "Mar" "Mar" "Mar" "Mar" "Mar" "Mar" "Mar" "Mar"
## [49] "Mar" "Mar" "Mar" "Mar" "Mar" "Mar" "Mar" "Mar" "Mar" "Mar" "Mar" "Mar"
## [61] "Mar" "Mar" "Apr" "Apr" "Apr" "Apr" "Apr" "Apr" "Apr" "Apr" "Apr" "Apr"
## [73] "Apr" "Apr" "Apr" "Apr" "Apr" "Apr" "Apr" "Apr"
## [1] 4
##              [,1]
## 2020-01-02 302.67
## 2020-01-03 293.68
## 2020-01-06 295.75
## 2020-01-07 299.20
## 2020-01-08 300.89
## 2020-01-09 307.83
## 2020-01-10 307.94
## 2020-01-13 306.88
## 2020-01-14 296.04
## 2020-01-15 297.26

The graph above shows the weekly change in Anthem stock data, i.e. open price, close price, high price, and low price.

## Don't know how to automatically pick scale for object of type yearmon. Defaulting to continuous.

The graph above shows the monthly change in Anthem stock data, i.e. open price, close price, high price, and low price.

Periodicity of Anthem Stocks data

## Daily periodicity from 2020-01-02 to 2020-04-27
## An 'xts' object on 2020-01-02/2020-04-27 containing:
##   Data: num [1:80, 1] 303 294 296 299 301 ...
##   Indexed by objects of class: [Date] TZ: UTC
##   xts Attributes:  
##  NULL
##  [1]  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 17
##              [,1]
## 2020-01-02 302.67
## 2020-01-03 293.68
##              [,1]
## 2020-01-03 293.68
## 2020-02-03 266.11
## 2020-03-03 270.13
## 2020-04-03 208.89

With the commands head() and tail() we can see the first and last 6 rows of the data. The full dataset holds opening, high, low, and closing prices and the transaction volume; the series examined here holds a single price column for ANTM. Using the command summary() we verify the descriptive statistics of the series. The command str() returns the object structure. In this case, it is an xts object, a time series.

0.2.4 Time Series Forecasting

##              [,1]
## 2015-04-28 150.15
## 2015-04-29 155.96
## 2015-04-30 151.83
## 2015-05-01 151.92
## 2015-05-04 153.45
## 2015-05-05 154.67
##              [,1]
## 2020-04-20 262.48
## 2020-04-21 255.00
## 2020-04-22 255.09
## 2020-04-23 264.71
## 2020-04-24 265.48
## 2020-04-27 267.56
##      Index             stocks_ANTM   
##  Min.   :2015-04-28   Min.   :117.0  
##  1st Qu.:2016-07-26   1st Qu.:145.2  
##  Median :2017-10-24   Median :195.9  
##  Mean   :2017-10-25   Mean   :206.5  
##  3rd Qu.:2019-01-26   3rd Qu.:262.8  
##  Max.   :2020-04-27   Max.   :317.6
## An 'xts' object on 2015-04-28/2020-04-27 containing:
##   Data: num [1:1259, 1] 150 156 152 152 153 ...
##   Indexed by objects of class: [Date] TZ: UTC
##   xts Attributes:  
##  NULL

## [1] 1

## 
## Autocorrelations of series 'stocks_BA', by lag
## 
##     0     1     2     3     4     5     6     7     8     9    10    11    12 
## 1.000 0.997 0.995 0.992 0.989 0.986 0.983 0.980 0.977 0.973 0.970 0.967 0.964 
##    13    14    15    16    17    18    19    20    21    22    23    24    25 
## 0.961 0.958 0.955 0.952 0.948 0.945 0.942 0.939 0.936 0.933 0.930 0.927 0.923 
##    26    27    28    29    30 
## 0.920 0.916 0.913 0.910 0.907

Plot for 2020 Data only

##              [,1]
## 2020-01-02 302.67
## 2020-01-03 293.68
## 2020-01-06 295.75
## 2020-01-07 299.20
## 2020-01-08 300.89
## 2020-01-09 307.83
## [1] 1

Plot for the rest of the data, before 2020

Plot for 2020 Data only

##              [,1]
## 2015-04-28 150.15
## 2015-04-29 155.96
## 2015-04-30 151.83
## 2015-05-01 151.92
## 2015-05-04 153.45
## 2015-05-05 154.67
## [1] 1

The ACF plots test whether an individual lag autocorrelation is different from zero. An alternative approach is to use the Ljung-Box test, which tests whether any of a group of autocorrelations of a time series are different from zero.

In essence it tests the “overall randomness” based on a number of lags. If the result is a small p-value, then it indicates the data are probably not white noise.
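
To illustrate how the test behaves, the sketch below runs Box.test() from base R on two simulated series: pure white noise and a random walk. The series, the seed, and the choice of 10 lags are assumptions made for this illustration and are not taken from the stock data.

```r
set.seed(621)

# Two simulated series: independent noise, and its cumulative sum (a random walk).
white_noise <- rnorm(200)
random_walk <- cumsum(rnorm(200))

# Ljung-Box test on the first 10 lag autocorrelations of each series.
lb_noise <- Box.test(white_noise, lag = 10, type = "Ljung-Box")
lb_walk  <- Box.test(random_walk, lag = 10, type = "Ljung-Box")

# A very small p-value means the autocorrelations are jointly non-zero,
# so the series is unlikely to be white noise.
print(lb_noise$p.value)
print(lb_walk$p.value)
```

The random walk should give a p-value close to zero, which is the same pattern the stock series shows in the test output that follows.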

For 2020 Data

## 
##  Box-Ljung test
## 
## data:  wide_data_Main_20$ANTM
## X-squared = 480.01, df = 30, p-value < 2.2e-16
## 
##  Box-Ljung test
## 
## data:  wide_data_Main$ANTM
## X-squared = 4976, df = 4, p-value < 2.2e-16

Here, we perform Ljung-Box tests on the 2020 data (30 lags) and on the full series (4 lags). The resulting p-values are significant at p < .001, which supports our reading of the ACF plots above: it is likely that this is not purely white noise and that some time series information exists in this data.

These plots suggest that the stock improved its position from mid-2016 through 2018, and then remained roughly constant until late 2019 and early 2020.

The trend is the long-term increase or decrease in the data. There is an increasing trend in the stocks_ANTM data. A seasonal pattern occurs when a time series is affected by seasonal factors such as the time of the year or the day of the week. The daily stocks_ANTM data doesn’t show any seasonality in the graph.

A cycle occurs when the data exhibit rises and falls that are not of a fixed period. These fluctuations are usually due to economic conditions and are often related to the “business cycle”. We can see a few cycles in our stocks_ANTM data from 2015 to 2018, and then in 2020 we have a sudden drop due to Covid-19. Source: https://afit-r.github.io/ts_exploration

0.2.5 Autocorrelation of Time Series

Another way to look at time series data is to plot each observation against another observation that occurred some time previously. For example, we could plot y_t against y_{t-1}. This is called a lag plot because you are plotting the time series against lags of itself.
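
The idea can be sketched with the last ten ANTM values printed earlier in this report (2020-04-14 to 2020-04-27). The variable names are hypothetical, and this is not necessarily how the correlations in the output that follows were computed.

```r
# Last ten ANTM values from the 2020 output shown earlier in this report.
y <- c(245.76, 249.15, 254.51, 279.01, 262.48,
       255.00, 255.09, 264.71, 265.48, 267.56)
n <- length(y)

# Pair each observation y_t with the one before it, y_{t-1}.
y_t    <- y[-1]   # observations 2..n
y_lag1 <- y[-n]   # observations 1..(n-1)

# Pearson correlation between the series and its first lag.
lag1_cor <- cor(y_t, y_lag1)
print(lag1_cor)

# acf() reports a closely related quantity; it divides by the overall
# variance of the series, so the two numbers differ slightly.
lag1_acf <- acf(y, lag.max = 1, plot = FALSE)$acf[2]
print(lag1_acf)

# A lag plot is the scatter plot of these pairs:
# plot(y_lag1, y_t, xlab = "y_{t-1}", ylab = "y_t")
```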

##           [,1]
## [1,] 0.9344986
## [1] 0.8950863
## [1] 0.6139195
## [1] 0.9075179
##           [,1]
## [1,] 0.9196915
## [1] 0.9196915
## [1] 0.8360629
## [1] 0.9595244

## [1] "xts" "zoo"

##                  
## 2015-04-28 150.15
## 2015-04-29 155.96
## 2015-04-30 151.83
## 2015-05-01 151.92
## 2015-05-04 153.45
## 2015-05-05 154.67

White noise: time series that show no autocorrelation are called “white noise”. The plots above show strong autocorrelation, so the series is not white noise; it resembles a random walk. We fit both an AR(1) and an MA(1) model to it; in the output below, the AR(1) fit has the lower AIC (7142.25 versus 12245.19).

## 
## Call:
## arima(x = stocks_ANTM, order = c(1, 0, 0))
## 
## Coefficients:
##          ar1  intercept
##       0.9978   222.5894
## s.e.  0.0018    45.1308
## 
## sigma^2 estimated as 16.88:  log likelihood = -3568.12,  aic = 7142.25
## 
## Training set error measures:
##                     ME     RMSE      MAE         MPE    MAPE     MASE
## Training set 0.0542719 4.108244 2.529343 -0.01210249 1.19899 1.002039
##                    ACF1
## Training set -0.0209907
## 
## Call:
## arima(x = stocks_ANTM, order = c(0, 0, 1))
## 
## Coefficients:
##          ma1  intercept
##       0.9678   206.4969
## s.e.  0.0058     1.7300
## 
## sigma^2 estimated as 973.9:  log likelihood = -6119.59,  aic = 12245.19
## 
## Training set error measures:
##                      ME     RMSE      MAE       MPE     MAPE     MASE      ACF1
## Training set 0.01566924 31.20706 27.96412 -4.741534 15.01558 11.07842 0.9108069
## Don't know how to automatically pick scale for object of type ts. Defaulting to continuous.
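
The two calls shown in the output above, arima() with order c(1, 0, 0) and c(0, 0, 1), can be tried on simulated data as a minimal sketch. The simulated AR(1) series, the seed, the level of 200, and method = "ML" are assumptions chosen so the example runs on its own; they are not the report's settings.

```r
set.seed(621)

# Simulated, strongly autocorrelated series around a level of 200
# (a stand-in for a price series, not real stock data).
x <- 200 + arima.sim(model = list(ar = 0.9), n = 500)

# AR(1): today's value depends on yesterday's value.
fit_ar <- arima(x, order = c(1, 0, 0), method = "ML")

# MA(1): today's value depends on yesterday's forecast error.
fit_ma <- arima(x, order = c(0, 0, 1), method = "ML")

# Lower AIC indicates the better trade-off between fit and complexity.
print(c(AR1 = fit_ar$aic, MA1 = fit_ma$aic))
```

For a series with slowly decaying autocorrelation, the AR(1) fit is expected to have the lower AIC, which matches the comparison reported above for stocks_ANTM.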

Fit with only the data from 2020 onward

0.2.6 Predicting Time Series data

We will evaluate the fitted models and compare the predictions from both models against the current year's data.

## $pred
## Time Series:
## Start = 1260 
## End = 1269 
## Frequency = 1 
##  [1] 267.4619 267.3640 267.2663 267.1688 267.0716 266.9745 266.8777 266.7811
##  [9] 266.6846 266.5884
## 
## $se
## Time Series:
## Start = 1260 
## End = 1269 
## Frequency = 1 
##  [1]  4.108244  5.803600  7.100186  8.189663  9.146360 10.008444 10.798612
##  [8] 11.531670 12.217916 12.864855
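
Output with $pred and $se components, like the one above, is what predict() returns for a fitted arima model. The sketch below shows the call on a simulated series; the data, the seed, and the 1.96 multiplier for an approximate 95% interval are assumptions for illustration.

```r
set.seed(621)

# Simulated stand-in series (not real stock data) and an AR(1) fit.
x <- 200 + arima.sim(model = list(ar = 0.9), n = 500)
fit_ar <- arima(x, order = c(1, 0, 0), method = "ML")

# Forecast 10 steps ahead: $pred holds point forecasts, $se their standard errors.
fc <- predict(fit_ar, n.ahead = 10)

# Approximate 95% interval, assuming normally distributed forecast errors.
lower <- fc$pred - 1.96 * fc$se
upper <- fc$pred + 1.96 * fc$se

print(cbind(pred = fc$pred, lower = lower, upper = upper))
```

The standard errors grow with the horizon, as they do in the output above, so forecasts further ahead carry wider intervals.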

## Time Series:
## Start = 1180 
## End = 1182 
## Frequency = 1 
## [1] 305.1904 305.0809 304.9715
## Time Series:
## Start = 1180 
## End = 1182 
## Frequency = 1 
## [1] 305.2679 305.1673 305.0706

## [1] 80  1
## [1] 5

Rajwant Mishra, Priya Shaji, Debabrata Kabiraj, Isabel Ramesar, Sin Ying Wong and Fan Xu

May 1, 2020