######A continuación se presentará el proceso de creación de la serie USgas para posteriormente crear las entradas de regresión que representan la tendencia de la serie y los componentes estacionales. ######En primer lugar, se carga la serie desde el paquete TSstudio y se traza con la función ts_plot, el resultado es la siguiente gráfica:
library(TSstudio)
## Warning: package 'TSstudio' was built under R version 4.1.1
data(USgas)
ts_plot(USgas,
title = "US Monthly Natural Gas consumption",
Ytitle = "Billion Cubic Feet",
Xtitle = "Year")
######Además, se puede saber las principales características de la serie utilizando la función ts_info:
ts_info(USgas)
## The USgas series is a ts object with 1 variable and 238 observations
## Frequency: 12
## Start time: 2000 1
## End time: 2019 10
######Como se observa en el gráfico de la serie, USgas es una serie mensual con un fuerte componente estacional mensual y una línea de tendencia bastante estable. Con la función ts_decompose, se puede descomponer la serie original en su tendencia, estacionalidad y el error asociado (residual):
ts_decompose(USgas)
######Como se puede en la siguiente la tabla, por cada año se tiene una fila:
USgas
## Jan Feb Mar Apr May Jun Jul Aug Sep Oct
## 2000 2510.5 2330.7 2050.6 1783.3 1632.9 1513.1 1525.6 1653.1 1475.0 1567.8
## 2001 2677.0 2309.5 2246.6 1807.2 1522.4 1444.4 1598.1 1669.2 1494.1 1649.1
## 2002 2487.6 2242.4 2258.4 1881.0 1611.5 1591.4 1748.4 1725.7 1542.2 1645.9
## 2003 2700.5 2500.3 2197.9 1743.5 1514.7 1368.4 1600.5 1651.6 1428.6 1553.2
## 2004 2675.8 2511.1 2100.9 1745.2 1573.0 1483.7 1584.9 1578.0 1482.2 1557.2
## 2005 2561.9 2243.0 2205.8 1724.9 1522.6 1534.1 1686.6 1695.1 1422.5 1428.2
## 2006 2165.3 2144.4 2126.4 1681.0 1526.3 1550.9 1758.7 1751.7 1462.1 1644.2
## 2007 2475.6 2567.0 2128.8 1810.1 1559.1 1555.2 1659.9 1896.1 1590.5 1627.8
## 2008 2734.0 2503.4 2278.2 1823.9 1576.4 1604.2 1708.6 1682.9 1460.9 1635.8
## 2009 2729.7 2332.5 2170.7 1741.3 1504.0 1527.8 1658.0 1736.5 1575.0 1666.5
## 2010 2809.8 2481.0 2142.9 1691.8 1617.3 1649.5 1825.8 1878.9 1637.5 1664.9
## 2011 2888.6 2452.4 2230.5 1825.0 1667.4 1657.3 1890.5 1891.8 1655.6 1744.5
## 2012 2756.2 2500.7 2127.8 1953.1 1873.8 1868.4 2069.8 2008.8 1807.2 1901.1
## 2013 2878.8 2567.2 2521.1 1967.5 1752.5 1742.9 1926.3 1927.4 1767.0 1866.8
## 2014 3204.1 2741.2 2557.9 1961.7 1810.2 1745.4 1881.0 1933.1 1809.3 1912.8
## 2015 3115.0 2925.2 2591.3 2007.9 1858.1 1899.9 2067.7 2052.7 1901.3 1987.3
## 2016 3091.7 2652.3 2356.3 2083.8 1965.8 2000.7 2186.6 2208.4 1947.8 1925.2
## 2017 2914.2 2340.6 2523.7 1932.5 1892.5 1910.9 2142.1 2094.3 1920.9 2032.0
## 2018 3335.0 2705.9 2792.6 2346.3 2050.9 2058.7 2344.6 2307.7 2151.5 2279.1
## 2019 3399.9 2999.2 2899.9 2201.1 2121.0 2115.2 2407.5 2437.2 2215.6 2472.3
## Nov Dec
## 2000 1908.5 2587.5
## 2001 1701.0 2120.2
## 2002 1913.6 2378.9
## 2003 1753.6 2263.7
## 2004 1782.8 2327.7
## 2005 1663.4 2326.4
## 2006 1765.4 2122.8
## 2007 1834.5 2399.2
## 2008 1868.9 2399.7
## 2009 1776.2 2491.9
## 2010 1973.3 2714.1
## 2011 2031.9 2541.9
## 2012 2167.8 2503.9
## 2013 2316.9 2920.8
## 2014 2357.5 2679.2
## 2015 2249.1 2588.2
## 2016 2159.4 2866.3
## 2017 2357.7 3084.5
## 2018 2709.9 2993.1
## 2019
######En la gráfica anterior se puede obsevar una uniformidad en la tendencia de la serie entre 2000 y 2010; sin embargo, en el futuro se comienza a apreciar un crecimiento lineal. Por lo tanto, no se puede considerar que la tendencia de la serie es estrictamente lineal entre 2000 y 2018. Esta es una idea importante que ayudará a definir la entrada de tendencia para el modelo de regresión. ######Antes del uso de la función de regresión lineal lm incorporada del paquete stats, es necesario convertir la serie de un objeto ts a un objeto data.frame. Para esto, se utilizará la función ts_to_prophet del paquete TSstudio:
USgas_df <- ts_to_prophet(USgas)
head(USgas_df)
## ds y
## 1 2000-01-01 2510.5
## 2 2000-02-01 2330.7
## 3 2000-03-01 2050.6
## 4 2000-04-01 1783.3
## 5 2000-05-01 1632.9
## 6 2000-06-01 1513.1
USgas_df$trend <- 1:nrow(USgas_df)
USgas_df
## ds y trend
## 1 2000-01-01 2510.5 1
## 2 2000-02-01 2330.7 2
## 3 2000-03-01 2050.6 3
## 4 2000-04-01 1783.3 4
## 5 2000-05-01 1632.9 5
## 6 2000-06-01 1513.1 6
## 7 2000-07-01 1525.6 7
## 8 2000-08-01 1653.1 8
## 9 2000-09-01 1475.0 9
## 10 2000-10-01 1567.8 10
## 11 2000-11-01 1908.5 11
## 12 2000-12-01 2587.5 12
## 13 2001-01-01 2677.0 13
## 14 2001-02-01 2309.5 14
## 15 2001-03-01 2246.6 15
## 16 2001-04-01 1807.2 16
## 17 2001-05-01 1522.4 17
## 18 2001-06-01 1444.4 18
## 19 2001-07-01 1598.1 19
## 20 2001-08-01 1669.2 20
## 21 2001-09-01 1494.1 21
## 22 2001-10-01 1649.1 22
## 23 2001-11-01 1701.0 23
## 24 2001-12-01 2120.2 24
## 25 2002-01-01 2487.6 25
## 26 2002-02-01 2242.4 26
## 27 2002-03-01 2258.4 27
## 28 2002-04-01 1881.0 28
## 29 2002-05-01 1611.5 29
## 30 2002-06-01 1591.4 30
## 31 2002-07-01 1748.4 31
## 32 2002-08-01 1725.7 32
## 33 2002-09-01 1542.2 33
## 34 2002-10-01 1645.9 34
## 35 2002-11-01 1913.6 35
## 36 2002-12-01 2378.9 36
## 37 2003-01-01 2700.5 37
## 38 2003-02-01 2500.3 38
## 39 2003-03-01 2197.9 39
## 40 2003-04-01 1743.5 40
## 41 2003-05-01 1514.7 41
## 42 2003-06-01 1368.4 42
## 43 2003-07-01 1600.5 43
## 44 2003-08-01 1651.6 44
## 45 2003-09-01 1428.6 45
## 46 2003-10-01 1553.2 46
## 47 2003-11-01 1753.6 47
## 48 2003-12-01 2263.7 48
## 49 2004-01-01 2675.8 49
## 50 2004-02-01 2511.1 50
## 51 2004-03-01 2100.9 51
## 52 2004-04-01 1745.2 52
## 53 2004-05-01 1573.0 53
## 54 2004-06-01 1483.7 54
## 55 2004-07-01 1584.9 55
## 56 2004-08-01 1578.0 56
## 57 2004-09-01 1482.2 57
## 58 2004-10-01 1557.2 58
## 59 2004-11-01 1782.8 59
## 60 2004-12-01 2327.7 60
## 61 2005-01-01 2561.9 61
## 62 2005-02-01 2243.0 62
## 63 2005-03-01 2205.8 63
## 64 2005-04-01 1724.9 64
## 65 2005-05-01 1522.6 65
## 66 2005-06-01 1534.1 66
## 67 2005-07-01 1686.6 67
## 68 2005-08-01 1695.1 68
## 69 2005-09-01 1422.5 69
## 70 2005-10-01 1428.2 70
## 71 2005-11-01 1663.4 71
## 72 2005-12-01 2326.4 72
## 73 2006-01-01 2165.3 73
## 74 2006-02-01 2144.4 74
## 75 2006-03-01 2126.4 75
## 76 2006-04-01 1681.0 76
## 77 2006-05-01 1526.3 77
## 78 2006-06-01 1550.9 78
## 79 2006-07-01 1758.7 79
## 80 2006-08-01 1751.7 80
## 81 2006-09-01 1462.1 81
## 82 2006-10-01 1644.2 82
## 83 2006-11-01 1765.4 83
## 84 2006-12-01 2122.8 84
## 85 2007-01-01 2475.6 85
## 86 2007-02-01 2567.0 86
## 87 2007-03-01 2128.8 87
## 88 2007-04-01 1810.1 88
## 89 2007-05-01 1559.1 89
## 90 2007-06-01 1555.2 90
## 91 2007-07-01 1659.9 91
## 92 2007-08-01 1896.1 92
## 93 2007-09-01 1590.5 93
## 94 2007-10-01 1627.8 94
## 95 2007-11-01 1834.5 95
## 96 2007-12-01 2399.2 96
## 97 2008-01-01 2734.0 97
## 98 2008-02-01 2503.4 98
## 99 2008-03-01 2278.2 99
## 100 2008-04-01 1823.9 100
## 101 2008-05-01 1576.4 101
## 102 2008-06-01 1604.2 102
## 103 2008-07-01 1708.6 103
## 104 2008-08-01 1682.9 104
## 105 2008-09-01 1460.9 105
## 106 2008-10-01 1635.8 106
## 107 2008-11-01 1868.9 107
## 108 2008-12-01 2399.7 108
## 109 2009-01-01 2729.7 109
## 110 2009-02-01 2332.5 110
## 111 2009-03-01 2170.7 111
## 112 2009-04-01 1741.3 112
## 113 2009-05-01 1504.0 113
## 114 2009-06-01 1527.8 114
## 115 2009-07-01 1658.0 115
## 116 2009-08-01 1736.5 116
## 117 2009-09-01 1575.0 117
## 118 2009-10-01 1666.5 118
## 119 2009-11-01 1776.2 119
## 120 2009-12-01 2491.9 120
## 121 2010-01-01 2809.8 121
## 122 2010-02-01 2481.0 122
## 123 2010-03-01 2142.9 123
## 124 2010-04-01 1691.8 124
## 125 2010-05-01 1617.3 125
## 126 2010-06-01 1649.5 126
## 127 2010-07-01 1825.8 127
## 128 2010-08-01 1878.9 128
## 129 2010-09-01 1637.5 129
## 130 2010-10-01 1664.9 130
## 131 2010-11-01 1973.3 131
## 132 2010-12-01 2714.1 132
## 133 2011-01-01 2888.6 133
## 134 2011-02-01 2452.4 134
## 135 2011-03-01 2230.5 135
## 136 2011-04-01 1825.0 136
## 137 2011-05-01 1667.4 137
## 138 2011-06-01 1657.3 138
## 139 2011-07-01 1890.5 139
## 140 2011-08-01 1891.8 140
## 141 2011-09-01 1655.6 141
## 142 2011-10-01 1744.5 142
## 143 2011-11-01 2031.9 143
## 144 2011-12-01 2541.9 144
## 145 2012-01-01 2756.2 145
## 146 2012-02-01 2500.7 146
## 147 2012-03-01 2127.8 147
## 148 2012-04-01 1953.1 148
## 149 2012-05-01 1873.8 149
## 150 2012-06-01 1868.4 150
## 151 2012-07-01 2069.8 151
## 152 2012-08-01 2008.8 152
## 153 2012-09-01 1807.2 153
## 154 2012-10-01 1901.1 154
## 155 2012-11-01 2167.8 155
## 156 2012-12-01 2503.9 156
## 157 2013-01-01 2878.8 157
## 158 2013-02-01 2567.2 158
## 159 2013-03-01 2521.1 159
## 160 2013-04-01 1967.5 160
## 161 2013-05-01 1752.5 161
## 162 2013-06-01 1742.9 162
## 163 2013-07-01 1926.3 163
## 164 2013-08-01 1927.4 164
## 165 2013-09-01 1767.0 165
## 166 2013-10-01 1866.8 166
## 167 2013-11-01 2316.9 167
## 168 2013-12-01 2920.8 168
## 169 2014-01-01 3204.1 169
## 170 2014-02-01 2741.2 170
## 171 2014-03-01 2557.9 171
## 172 2014-04-01 1961.7 172
## 173 2014-05-01 1810.2 173
## 174 2014-06-01 1745.4 174
## 175 2014-07-01 1881.0 175
## 176 2014-08-01 1933.1 176
## 177 2014-09-01 1809.3 177
## 178 2014-10-01 1912.8 178
## 179 2014-11-01 2357.5 179
## 180 2014-12-01 2679.2 180
## 181 2015-01-01 3115.0 181
## 182 2015-02-01 2925.2 182
## 183 2015-03-01 2591.3 183
## 184 2015-04-01 2007.9 184
## 185 2015-05-01 1858.1 185
## 186 2015-06-01 1899.9 186
## 187 2015-07-01 2067.7 187
## 188 2015-08-01 2052.7 188
## 189 2015-09-01 1901.3 189
## 190 2015-10-01 1987.3 190
## 191 2015-11-01 2249.1 191
## 192 2015-12-01 2588.2 192
## 193 2016-01-01 3091.7 193
## 194 2016-02-01 2652.3 194
## 195 2016-03-01 2356.3 195
## 196 2016-04-01 2083.8 196
## 197 2016-05-01 1965.8 197
## 198 2016-06-01 2000.7 198
## 199 2016-07-01 2186.6 199
## 200 2016-08-01 2208.4 200
## 201 2016-09-01 1947.8 201
## 202 2016-10-01 1925.2 202
## 203 2016-11-01 2159.4 203
## 204 2016-12-01 2866.3 204
## 205 2017-01-01 2914.2 205
## 206 2017-02-01 2340.6 206
## 207 2017-03-01 2523.7 207
## 208 2017-04-01 1932.5 208
## 209 2017-05-01 1892.5 209
## 210 2017-06-01 1910.9 210
## 211 2017-07-01 2142.1 211
## 212 2017-08-01 2094.3 212
## 213 2017-09-01 1920.9 213
## 214 2017-10-01 2032.0 214
## 215 2017-11-01 2357.7 215
## 216 2017-12-01 3084.5 216
## 217 2018-01-01 3335.0 217
## 218 2018-02-01 2705.9 218
## 219 2018-03-01 2792.6 219
## 220 2018-04-01 2346.3 220
## 221 2018-05-01 2050.9 221
## 222 2018-06-01 2058.7 222
## 223 2018-07-01 2344.6 223
## 224 2018-08-01 2307.7 224
## 225 2018-09-01 2151.5 225
## 226 2018-10-01 2279.1 226
## 227 2018-11-01 2709.9 227
## 228 2018-12-01 2993.1 228
## 229 2019-01-01 3399.9 229
## 230 2019-02-01 2999.2 230
## 231 2019-03-01 2899.9 231
## 232 2019-04-01 2201.1 232
## 233 2019-05-01 2121.0 233
## 234 2019-06-01 2115.2 234
## 235 2019-07-01 2407.5 235
## 236 2019-08-01 2437.2 236
## 237 2019-09-01 2215.6 237
## 238 2019-10-01 2472.3 238
library(lubridate)
##
## Attaching package: 'lubridate'
## The following objects are masked from 'package:base':
##
## date, intersect, setdiff, union
USgas_df$seasonal <- factor(month(USgas_df$ds, label = T), ordered = FALSE)
USgas_df
## ds y trend seasonal
## 1 2000-01-01 2510.5 1 ene
## 2 2000-02-01 2330.7 2 feb
## 3 2000-03-01 2050.6 3 mar
## 4 2000-04-01 1783.3 4 abr
## 5 2000-05-01 1632.9 5 may
## 6 2000-06-01 1513.1 6 jun
## 7 2000-07-01 1525.6 7 jul
## 8 2000-08-01 1653.1 8 ago
## 9 2000-09-01 1475.0 9 sept
## 10 2000-10-01 1567.8 10 oct
## 11 2000-11-01 1908.5 11 nov
## 12 2000-12-01 2587.5 12 dic
## 13 2001-01-01 2677.0 13 ene
## 14 2001-02-01 2309.5 14 feb
## 15 2001-03-01 2246.6 15 mar
## 16 2001-04-01 1807.2 16 abr
## 17 2001-05-01 1522.4 17 may
## 18 2001-06-01 1444.4 18 jun
## 19 2001-07-01 1598.1 19 jul
## 20 2001-08-01 1669.2 20 ago
## 21 2001-09-01 1494.1 21 sept
## 22 2001-10-01 1649.1 22 oct
## 23 2001-11-01 1701.0 23 nov
## 24 2001-12-01 2120.2 24 dic
## 25 2002-01-01 2487.6 25 ene
## 26 2002-02-01 2242.4 26 feb
## 27 2002-03-01 2258.4 27 mar
## 28 2002-04-01 1881.0 28 abr
## 29 2002-05-01 1611.5 29 may
## 30 2002-06-01 1591.4 30 jun
## 31 2002-07-01 1748.4 31 jul
## 32 2002-08-01 1725.7 32 ago
## 33 2002-09-01 1542.2 33 sept
## 34 2002-10-01 1645.9 34 oct
## 35 2002-11-01 1913.6 35 nov
## 36 2002-12-01 2378.9 36 dic
## 37 2003-01-01 2700.5 37 ene
## 38 2003-02-01 2500.3 38 feb
## 39 2003-03-01 2197.9 39 mar
## 40 2003-04-01 1743.5 40 abr
## 41 2003-05-01 1514.7 41 may
## 42 2003-06-01 1368.4 42 jun
## 43 2003-07-01 1600.5 43 jul
## 44 2003-08-01 1651.6 44 ago
## 45 2003-09-01 1428.6 45 sept
## 46 2003-10-01 1553.2 46 oct
## 47 2003-11-01 1753.6 47 nov
## 48 2003-12-01 2263.7 48 dic
## 49 2004-01-01 2675.8 49 ene
## 50 2004-02-01 2511.1 50 feb
## 51 2004-03-01 2100.9 51 mar
## 52 2004-04-01 1745.2 52 abr
## 53 2004-05-01 1573.0 53 may
## 54 2004-06-01 1483.7 54 jun
## 55 2004-07-01 1584.9 55 jul
## 56 2004-08-01 1578.0 56 ago
## 57 2004-09-01 1482.2 57 sept
## 58 2004-10-01 1557.2 58 oct
## 59 2004-11-01 1782.8 59 nov
## 60 2004-12-01 2327.7 60 dic
## 61 2005-01-01 2561.9 61 ene
## 62 2005-02-01 2243.0 62 feb
## 63 2005-03-01 2205.8 63 mar
## 64 2005-04-01 1724.9 64 abr
## 65 2005-05-01 1522.6 65 may
## 66 2005-06-01 1534.1 66 jun
## 67 2005-07-01 1686.6 67 jul
## 68 2005-08-01 1695.1 68 ago
## 69 2005-09-01 1422.5 69 sept
## 70 2005-10-01 1428.2 70 oct
## 71 2005-11-01 1663.4 71 nov
## 72 2005-12-01 2326.4 72 dic
## 73 2006-01-01 2165.3 73 ene
## 74 2006-02-01 2144.4 74 feb
## 75 2006-03-01 2126.4 75 mar
## 76 2006-04-01 1681.0 76 abr
## 77 2006-05-01 1526.3 77 may
## 78 2006-06-01 1550.9 78 jun
## 79 2006-07-01 1758.7 79 jul
## 80 2006-08-01 1751.7 80 ago
## 81 2006-09-01 1462.1 81 sept
## 82 2006-10-01 1644.2 82 oct
## 83 2006-11-01 1765.4 83 nov
## 84 2006-12-01 2122.8 84 dic
## 85 2007-01-01 2475.6 85 ene
## 86 2007-02-01 2567.0 86 feb
## 87 2007-03-01 2128.8 87 mar
## 88 2007-04-01 1810.1 88 abr
## 89 2007-05-01 1559.1 89 may
## 90 2007-06-01 1555.2 90 jun
## 91 2007-07-01 1659.9 91 jul
## 92 2007-08-01 1896.1 92 ago
## 93 2007-09-01 1590.5 93 sept
## 94 2007-10-01 1627.8 94 oct
## 95 2007-11-01 1834.5 95 nov
## 96 2007-12-01 2399.2 96 dic
## 97 2008-01-01 2734.0 97 ene
## 98 2008-02-01 2503.4 98 feb
## 99 2008-03-01 2278.2 99 mar
## 100 2008-04-01 1823.9 100 abr
## 101 2008-05-01 1576.4 101 may
## 102 2008-06-01 1604.2 102 jun
## 103 2008-07-01 1708.6 103 jul
## 104 2008-08-01 1682.9 104 ago
## 105 2008-09-01 1460.9 105 sept
## 106 2008-10-01 1635.8 106 oct
## 107 2008-11-01 1868.9 107 nov
## 108 2008-12-01 2399.7 108 dic
## 109 2009-01-01 2729.7 109 ene
## 110 2009-02-01 2332.5 110 feb
## 111 2009-03-01 2170.7 111 mar
## 112 2009-04-01 1741.3 112 abr
## 113 2009-05-01 1504.0 113 may
## 114 2009-06-01 1527.8 114 jun
## 115 2009-07-01 1658.0 115 jul
## 116 2009-08-01 1736.5 116 ago
## 117 2009-09-01 1575.0 117 sept
## 118 2009-10-01 1666.5 118 oct
## 119 2009-11-01 1776.2 119 nov
## 120 2009-12-01 2491.9 120 dic
## 121 2010-01-01 2809.8 121 ene
## 122 2010-02-01 2481.0 122 feb
## 123 2010-03-01 2142.9 123 mar
## 124 2010-04-01 1691.8 124 abr
## 125 2010-05-01 1617.3 125 may
## 126 2010-06-01 1649.5 126 jun
## 127 2010-07-01 1825.8 127 jul
## 128 2010-08-01 1878.9 128 ago
## 129 2010-09-01 1637.5 129 sept
## 130 2010-10-01 1664.9 130 oct
## 131 2010-11-01 1973.3 131 nov
## 132 2010-12-01 2714.1 132 dic
## 133 2011-01-01 2888.6 133 ene
## 134 2011-02-01 2452.4 134 feb
## 135 2011-03-01 2230.5 135 mar
## 136 2011-04-01 1825.0 136 abr
## 137 2011-05-01 1667.4 137 may
## 138 2011-06-01 1657.3 138 jun
## 139 2011-07-01 1890.5 139 jul
## 140 2011-08-01 1891.8 140 ago
## 141 2011-09-01 1655.6 141 sept
## 142 2011-10-01 1744.5 142 oct
## 143 2011-11-01 2031.9 143 nov
## 144 2011-12-01 2541.9 144 dic
## 145 2012-01-01 2756.2 145 ene
## 146 2012-02-01 2500.7 146 feb
## 147 2012-03-01 2127.8 147 mar
## 148 2012-04-01 1953.1 148 abr
## 149 2012-05-01 1873.8 149 may
## 150 2012-06-01 1868.4 150 jun
## 151 2012-07-01 2069.8 151 jul
## 152 2012-08-01 2008.8 152 ago
## 153 2012-09-01 1807.2 153 sept
## 154 2012-10-01 1901.1 154 oct
## 155 2012-11-01 2167.8 155 nov
## 156 2012-12-01 2503.9 156 dic
## 157 2013-01-01 2878.8 157 ene
## 158 2013-02-01 2567.2 158 feb
## 159 2013-03-01 2521.1 159 mar
## 160 2013-04-01 1967.5 160 abr
## 161 2013-05-01 1752.5 161 may
## 162 2013-06-01 1742.9 162 jun
## 163 2013-07-01 1926.3 163 jul
## 164 2013-08-01 1927.4 164 ago
## 165 2013-09-01 1767.0 165 sept
## 166 2013-10-01 1866.8 166 oct
## 167 2013-11-01 2316.9 167 nov
## 168 2013-12-01 2920.8 168 dic
## 169 2014-01-01 3204.1 169 ene
## 170 2014-02-01 2741.2 170 feb
## 171 2014-03-01 2557.9 171 mar
## 172 2014-04-01 1961.7 172 abr
## 173 2014-05-01 1810.2 173 may
## 174 2014-06-01 1745.4 174 jun
## 175 2014-07-01 1881.0 175 jul
## 176 2014-08-01 1933.1 176 ago
## 177 2014-09-01 1809.3 177 sept
## 178 2014-10-01 1912.8 178 oct
## 179 2014-11-01 2357.5 179 nov
## 180 2014-12-01 2679.2 180 dic
## 181 2015-01-01 3115.0 181 ene
## 182 2015-02-01 2925.2 182 feb
## 183 2015-03-01 2591.3 183 mar
## 184 2015-04-01 2007.9 184 abr
## 185 2015-05-01 1858.1 185 may
## 186 2015-06-01 1899.9 186 jun
## 187 2015-07-01 2067.7 187 jul
## 188 2015-08-01 2052.7 188 ago
## 189 2015-09-01 1901.3 189 sept
## 190 2015-10-01 1987.3 190 oct
## 191 2015-11-01 2249.1 191 nov
## 192 2015-12-01 2588.2 192 dic
## 193 2016-01-01 3091.7 193 ene
## 194 2016-02-01 2652.3 194 feb
## 195 2016-03-01 2356.3 195 mar
## 196 2016-04-01 2083.8 196 abr
## 197 2016-05-01 1965.8 197 may
## 198 2016-06-01 2000.7 198 jun
## 199 2016-07-01 2186.6 199 jul
## 200 2016-08-01 2208.4 200 ago
## 201 2016-09-01 1947.8 201 sept
## 202 2016-10-01 1925.2 202 oct
## 203 2016-11-01 2159.4 203 nov
## 204 2016-12-01 2866.3 204 dic
## 205 2017-01-01 2914.2 205 ene
## 206 2017-02-01 2340.6 206 feb
## 207 2017-03-01 2523.7 207 mar
## 208 2017-04-01 1932.5 208 abr
## 209 2017-05-01 1892.5 209 may
## 210 2017-06-01 1910.9 210 jun
## 211 2017-07-01 2142.1 211 jul
## 212 2017-08-01 2094.3 212 ago
## 213 2017-09-01 1920.9 213 sept
## 214 2017-10-01 2032.0 214 oct
## 215 2017-11-01 2357.7 215 nov
## 216 2017-12-01 3084.5 216 dic
## 217 2018-01-01 3335.0 217 ene
## 218 2018-02-01 2705.9 218 feb
## 219 2018-03-01 2792.6 219 mar
## 220 2018-04-01 2346.3 220 abr
## 221 2018-05-01 2050.9 221 may
## 222 2018-06-01 2058.7 222 jun
## 223 2018-07-01 2344.6 223 jul
## 224 2018-08-01 2307.7 224 ago
## 225 2018-09-01 2151.5 225 sept
## 226 2018-10-01 2279.1 226 oct
## 227 2018-11-01 2709.9 227 nov
## 228 2018-12-01 2993.1 228 dic
## 229 2019-01-01 3399.9 229 ene
## 230 2019-02-01 2999.2 230 feb
## 231 2019-03-01 2899.9 231 mar
## 232 2019-04-01 2201.1 232 abr
## 233 2019-05-01 2121.0 233 may
## 234 2019-06-01 2115.2 234 jun
## 235 2019-07-01 2407.5 235 jul
## 236 2019-08-01 2437.2 236 ago
## 237 2019-09-01 2215.6 237 sept
## 238 2019-10-01 2472.3 238 oct
h <- 12
train <- USgas_df[1:(nrow(USgas_df) - h), ]
test <- USgas_df[(nrow(USgas_df) - h + 1):nrow(USgas_df), ]
train
## ds y trend seasonal
## 1 2000-01-01 2510.5 1 ene
## 2 2000-02-01 2330.7 2 feb
## 3 2000-03-01 2050.6 3 mar
## 4 2000-04-01 1783.3 4 abr
## 5 2000-05-01 1632.9 5 may
## 6 2000-06-01 1513.1 6 jun
## 7 2000-07-01 1525.6 7 jul
## 8 2000-08-01 1653.1 8 ago
## 9 2000-09-01 1475.0 9 sept
## 10 2000-10-01 1567.8 10 oct
## 11 2000-11-01 1908.5 11 nov
## 12 2000-12-01 2587.5 12 dic
## 13 2001-01-01 2677.0 13 ene
## 14 2001-02-01 2309.5 14 feb
## 15 2001-03-01 2246.6 15 mar
## 16 2001-04-01 1807.2 16 abr
## 17 2001-05-01 1522.4 17 may
## 18 2001-06-01 1444.4 18 jun
## 19 2001-07-01 1598.1 19 jul
## 20 2001-08-01 1669.2 20 ago
## 21 2001-09-01 1494.1 21 sept
## 22 2001-10-01 1649.1 22 oct
## 23 2001-11-01 1701.0 23 nov
## 24 2001-12-01 2120.2 24 dic
## 25 2002-01-01 2487.6 25 ene
## 26 2002-02-01 2242.4 26 feb
## 27 2002-03-01 2258.4 27 mar
## 28 2002-04-01 1881.0 28 abr
## 29 2002-05-01 1611.5 29 may
## 30 2002-06-01 1591.4 30 jun
## 31 2002-07-01 1748.4 31 jul
## 32 2002-08-01 1725.7 32 ago
## 33 2002-09-01 1542.2 33 sept
## 34 2002-10-01 1645.9 34 oct
## 35 2002-11-01 1913.6 35 nov
## 36 2002-12-01 2378.9 36 dic
## 37 2003-01-01 2700.5 37 ene
## 38 2003-02-01 2500.3 38 feb
## 39 2003-03-01 2197.9 39 mar
## 40 2003-04-01 1743.5 40 abr
## 41 2003-05-01 1514.7 41 may
## 42 2003-06-01 1368.4 42 jun
## 43 2003-07-01 1600.5 43 jul
## 44 2003-08-01 1651.6 44 ago
## 45 2003-09-01 1428.6 45 sept
## 46 2003-10-01 1553.2 46 oct
## 47 2003-11-01 1753.6 47 nov
## 48 2003-12-01 2263.7 48 dic
## 49 2004-01-01 2675.8 49 ene
## 50 2004-02-01 2511.1 50 feb
## 51 2004-03-01 2100.9 51 mar
## 52 2004-04-01 1745.2 52 abr
## 53 2004-05-01 1573.0 53 may
## 54 2004-06-01 1483.7 54 jun
## 55 2004-07-01 1584.9 55 jul
## 56 2004-08-01 1578.0 56 ago
## 57 2004-09-01 1482.2 57 sept
## 58 2004-10-01 1557.2 58 oct
## 59 2004-11-01 1782.8 59 nov
## 60 2004-12-01 2327.7 60 dic
## 61 2005-01-01 2561.9 61 ene
## 62 2005-02-01 2243.0 62 feb
## 63 2005-03-01 2205.8 63 mar
## 64 2005-04-01 1724.9 64 abr
## 65 2005-05-01 1522.6 65 may
## 66 2005-06-01 1534.1 66 jun
## 67 2005-07-01 1686.6 67 jul
## 68 2005-08-01 1695.1 68 ago
## 69 2005-09-01 1422.5 69 sept
## 70 2005-10-01 1428.2 70 oct
## 71 2005-11-01 1663.4 71 nov
## 72 2005-12-01 2326.4 72 dic
## 73 2006-01-01 2165.3 73 ene
## 74 2006-02-01 2144.4 74 feb
## 75 2006-03-01 2126.4 75 mar
## 76 2006-04-01 1681.0 76 abr
## 77 2006-05-01 1526.3 77 may
## 78 2006-06-01 1550.9 78 jun
## 79 2006-07-01 1758.7 79 jul
## 80 2006-08-01 1751.7 80 ago
## 81 2006-09-01 1462.1 81 sept
## 82 2006-10-01 1644.2 82 oct
## 83 2006-11-01 1765.4 83 nov
## 84 2006-12-01 2122.8 84 dic
## 85 2007-01-01 2475.6 85 ene
## 86 2007-02-01 2567.0 86 feb
## 87 2007-03-01 2128.8 87 mar
## 88 2007-04-01 1810.1 88 abr
## 89 2007-05-01 1559.1 89 may
## 90 2007-06-01 1555.2 90 jun
## 91 2007-07-01 1659.9 91 jul
## 92 2007-08-01 1896.1 92 ago
## 93 2007-09-01 1590.5 93 sept
## 94 2007-10-01 1627.8 94 oct
## 95 2007-11-01 1834.5 95 nov
## 96 2007-12-01 2399.2 96 dic
## 97 2008-01-01 2734.0 97 ene
## 98 2008-02-01 2503.4 98 feb
## 99 2008-03-01 2278.2 99 mar
## 100 2008-04-01 1823.9 100 abr
## 101 2008-05-01 1576.4 101 may
## 102 2008-06-01 1604.2 102 jun
## 103 2008-07-01 1708.6 103 jul
## 104 2008-08-01 1682.9 104 ago
## 105 2008-09-01 1460.9 105 sept
## 106 2008-10-01 1635.8 106 oct
## 107 2008-11-01 1868.9 107 nov
## 108 2008-12-01 2399.7 108 dic
## 109 2009-01-01 2729.7 109 ene
## 110 2009-02-01 2332.5 110 feb
## 111 2009-03-01 2170.7 111 mar
## 112 2009-04-01 1741.3 112 abr
## 113 2009-05-01 1504.0 113 may
## 114 2009-06-01 1527.8 114 jun
## 115 2009-07-01 1658.0 115 jul
## 116 2009-08-01 1736.5 116 ago
## 117 2009-09-01 1575.0 117 sept
## 118 2009-10-01 1666.5 118 oct
## 119 2009-11-01 1776.2 119 nov
## 120 2009-12-01 2491.9 120 dic
## 121 2010-01-01 2809.8 121 ene
## 122 2010-02-01 2481.0 122 feb
## 123 2010-03-01 2142.9 123 mar
## 124 2010-04-01 1691.8 124 abr
## 125 2010-05-01 1617.3 125 may
## 126 2010-06-01 1649.5 126 jun
## 127 2010-07-01 1825.8 127 jul
## 128 2010-08-01 1878.9 128 ago
## 129 2010-09-01 1637.5 129 sept
## 130 2010-10-01 1664.9 130 oct
## 131 2010-11-01 1973.3 131 nov
## 132 2010-12-01 2714.1 132 dic
## 133 2011-01-01 2888.6 133 ene
## 134 2011-02-01 2452.4 134 feb
## 135 2011-03-01 2230.5 135 mar
## 136 2011-04-01 1825.0 136 abr
## 137 2011-05-01 1667.4 137 may
## 138 2011-06-01 1657.3 138 jun
## 139 2011-07-01 1890.5 139 jul
## 140 2011-08-01 1891.8 140 ago
## 141 2011-09-01 1655.6 141 sept
## 142 2011-10-01 1744.5 142 oct
## 143 2011-11-01 2031.9 143 nov
## 144 2011-12-01 2541.9 144 dic
## 145 2012-01-01 2756.2 145 ene
## 146 2012-02-01 2500.7 146 feb
## 147 2012-03-01 2127.8 147 mar
## 148 2012-04-01 1953.1 148 abr
## 149 2012-05-01 1873.8 149 may
## 150 2012-06-01 1868.4 150 jun
## 151 2012-07-01 2069.8 151 jul
## 152 2012-08-01 2008.8 152 ago
## 153 2012-09-01 1807.2 153 sept
## 154 2012-10-01 1901.1 154 oct
## 155 2012-11-01 2167.8 155 nov
## 156 2012-12-01 2503.9 156 dic
## 157 2013-01-01 2878.8 157 ene
## 158 2013-02-01 2567.2 158 feb
## 159 2013-03-01 2521.1 159 mar
## 160 2013-04-01 1967.5 160 abr
## 161 2013-05-01 1752.5 161 may
## 162 2013-06-01 1742.9 162 jun
## 163 2013-07-01 1926.3 163 jul
## 164 2013-08-01 1927.4 164 ago
## 165 2013-09-01 1767.0 165 sept
## 166 2013-10-01 1866.8 166 oct
## 167 2013-11-01 2316.9 167 nov
## 168 2013-12-01 2920.8 168 dic
## 169 2014-01-01 3204.1 169 ene
## 170 2014-02-01 2741.2 170 feb
## 171 2014-03-01 2557.9 171 mar
## 172 2014-04-01 1961.7 172 abr
## 173 2014-05-01 1810.2 173 may
## 174 2014-06-01 1745.4 174 jun
## 175 2014-07-01 1881.0 175 jul
## 176 2014-08-01 1933.1 176 ago
## 177 2014-09-01 1809.3 177 sept
## 178 2014-10-01 1912.8 178 oct
## 179 2014-11-01 2357.5 179 nov
## 180 2014-12-01 2679.2 180 dic
## 181 2015-01-01 3115.0 181 ene
## 182 2015-02-01 2925.2 182 feb
## 183 2015-03-01 2591.3 183 mar
## 184 2015-04-01 2007.9 184 abr
## 185 2015-05-01 1858.1 185 may
## 186 2015-06-01 1899.9 186 jun
## 187 2015-07-01 2067.7 187 jul
## 188 2015-08-01 2052.7 188 ago
## 189 2015-09-01 1901.3 189 sept
## 190 2015-10-01 1987.3 190 oct
## 191 2015-11-01 2249.1 191 nov
## 192 2015-12-01 2588.2 192 dic
## 193 2016-01-01 3091.7 193 ene
## 194 2016-02-01 2652.3 194 feb
## 195 2016-03-01 2356.3 195 mar
## 196 2016-04-01 2083.8 196 abr
## 197 2016-05-01 1965.8 197 may
## 198 2016-06-01 2000.7 198 jun
## 199 2016-07-01 2186.6 199 jul
## 200 2016-08-01 2208.4 200 ago
## 201 2016-09-01 1947.8 201 sept
## 202 2016-10-01 1925.2 202 oct
## 203 2016-11-01 2159.4 203 nov
## 204 2016-12-01 2866.3 204 dic
## 205 2017-01-01 2914.2 205 ene
## 206 2017-02-01 2340.6 206 feb
## 207 2017-03-01 2523.7 207 mar
## 208 2017-04-01 1932.5 208 abr
## 209 2017-05-01 1892.5 209 may
## 210 2017-06-01 1910.9 210 jun
## 211 2017-07-01 2142.1 211 jul
## 212 2017-08-01 2094.3 212 ago
## 213 2017-09-01 1920.9 213 sept
## 214 2017-10-01 2032.0 214 oct
## 215 2017-11-01 2357.7 215 nov
## 216 2017-12-01 3084.5 216 dic
## 217 2018-01-01 3335.0 217 ene
## 218 2018-02-01 2705.9 218 feb
## 219 2018-03-01 2792.6 219 mar
## 220 2018-04-01 2346.3 220 abr
## 221 2018-05-01 2050.9 221 may
## 222 2018-06-01 2058.7 222 jun
## 223 2018-07-01 2344.6 223 jul
## 224 2018-08-01 2307.7 224 ago
## 225 2018-09-01 2151.5 225 sept
## 226 2018-10-01 2279.1 226 oct
test
## ds y trend seasonal
## 227 2018-11-01 2709.9 227 nov
## 228 2018-12-01 2993.1 228 dic
## 229 2019-01-01 3399.9 229 ene
## 230 2019-02-01 2999.2 230 feb
## 231 2019-03-01 2899.9 231 mar
## 232 2019-04-01 2201.1 232 abr
## 233 2019-05-01 2121.0 233 may
## 234 2019-06-01 2115.2 234 jun
## 235 2019-07-01 2407.5 235 jul
## 236 2019-08-01 2437.2 236 ago
## 237 2019-09-01 2215.6 237 sept
## 238 2019-10-01 2472.3 238 oct
md_trend <- lm(y ~ trend, data = train)
summary(md_trend)
##
## Call:
## lm(formula = y ~ trend, data = train)
##
## Residuals:
## Min 1Q Median 3Q Max
## -547.2 -307.4 -153.2 333.1 1052.6
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 1751.0074 52.6435 33.26 < 2e-16 ***
## trend 2.4489 0.4021 6.09 4.86e-09 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 394.4 on 224 degrees of freedom
## Multiple R-squared: 0.1421, Adjusted R-squared: 0.1382
## F-statistic: 37.09 on 1 and 224 DF, p-value: 4.861e-09
train$yhat <- predict(md_trend, newdata = train)
test$yhat <- predict(md_trend, newdata = test)
library(plotly)
## Warning: package 'plotly' was built under R version 4.1.1
## Loading required package: ggplot2
##
## Attaching package: 'plotly'
## The following object is masked from 'package:ggplot2':
##
## last_plot
## The following object is masked from 'package:stats':
##
## filter
## The following object is masked from 'package:graphics':
##
## layout
plot_lm <- function(data, train, test, title = NULL){
p <- plot_ly(data = data,
x = ~ ds,
y = ~ y,
type = "scatter",
mode = "line",
name = "Actual") %>%
add_lines(x = ~ train$ds,
y = ~ train$yhat,
line = list(color = "red"),
name = "Fitted") %>%
add_lines(x = ~ test$ds,
y = ~ test$yhat,
line = list(color = "green", dash = "dot", width = 3),
name = "Forecasted") %>%
layout(title = title,
xaxis = list(title = "Year"),
yaxis = list(title = "Billion Cubic Feet"),
legend = list(x = 0.05, y = 0.95))
return(p)
}
plot_lm(data = USgas_df,
train = train,
test = test,
title = "Predicting the Trend Component of the Series")
mape_trend <- c(mean(abs(train$y - train$yhat) / train$y),
mean(abs(test$y - test$yhat) / test$y))
mape_trend
## [1] 0.1644088 0.1299951
md_seasonal <- lm(y ~ seasonal, data = train)
summary(md_seasonal)
##
## Call:
## lm(formula = y ~ seasonal, data = train)
##
## Residuals:
## Min 1Q Median 3Q Max
## -608.98 -162.34 -50.77 148.40 566.89
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 2774.28 49.75 55.759 < 2e-16 ***
## seasonalfeb -297.92 70.36 -4.234 3.41e-05 ***
## seasonalmar -479.10 70.36 -6.809 9.77e-11 ***
## seasonalabr -905.28 70.36 -12.866 < 2e-16 ***
## seasonalmay -1088.42 70.36 -15.468 < 2e-16 ***
## seasonaljun -1105.49 70.36 -15.711 < 2e-16 ***
## seasonaljul -939.35 70.36 -13.350 < 2e-16 ***
## seasonalago -914.12 70.36 -12.991 < 2e-16 ***
## seasonalsept -1114.74 70.36 -15.843 < 2e-16 ***
## seasonaloct -1022.21 70.36 -14.527 < 2e-16 ***
## seasonalnov -797.53 71.33 -11.180 < 2e-16 ***
## seasonaldic -256.67 71.33 -3.598 0.000398 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 216.9 on 214 degrees of freedom
## Multiple R-squared: 0.7521, Adjusted R-squared: 0.7394
## F-statistic: 59.04 on 11 and 214 DF, p-value: < 2.2e-16
train$yhat <- predict(md_seasonal, newdata = train)
test$yhat <- predict(md_seasonal, newdata = test)
######Ahora se puede usar la función plot_lm para visualizar el modelo ajustado y los valores de pronóstico:
plot_lm(data = USgas_df,
train = train,
test = test,
title = "Predicting the Seasonal Component of the Series")
mape_seasonal <- c(mean(abs(train$y - train$yhat) / train$y),
mean(abs(test$y - test$yhat) / test$y))
mape_seasonal
## [1] 0.08628973 0.21502100
md1 <- lm(y ~ seasonal + trend, data = train)
summary(md1)
##
## Call:
## lm(formula = y ~ seasonal + trend, data = train)
##
## Residuals:
## Min 1Q Median 3Q Max
## -514.73 -77.17 -17.70 85.80 336.95
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 2488.8994 32.6011 76.344 < 2e-16 ***
## seasonalfeb -300.5392 41.4864 -7.244 7.84e-12 ***
## seasonalmar -484.3363 41.4870 -11.674 < 2e-16 ***
## seasonalabr -913.1334 41.4880 -22.010 < 2e-16 ***
## seasonalmay -1098.8884 41.4895 -26.486 < 2e-16 ***
## seasonaljun -1118.5855 41.4913 -26.960 < 2e-16 ***
## seasonaljul -955.0563 41.4936 -23.017 < 2e-16 ***
## seasonalago -932.4482 41.4962 -22.471 < 2e-16 ***
## seasonalsept -1135.6874 41.4993 -27.366 < 2e-16 ***
## seasonaloct -1045.7687 41.5028 -25.198 < 2e-16 ***
## seasonalnov -808.0016 42.0617 -19.210 < 2e-16 ***
## seasonaldic -269.7642 42.0635 -6.413 9.05e-10 ***
## trend 2.6182 0.1305 20.065 < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 127.9 on 213 degrees of freedom
## Multiple R-squared: 0.9142, Adjusted R-squared: 0.9094
## F-statistic: 189.2 on 12 and 213 DF, p-value: < 2.2e-16
train$yhat <- predict(md1, newdata = train)
test$yhat <- predict(md1, newdata = test)
plot_lm(data = USgas_df,
train = train,
test = test,
title = "Predicting the Trend and Seasonal Components of the
Series")
mape_md1 <- c(mean(abs(train$y - train$yhat) / train$y),
mean(abs(test$y - test$yhat) / test$y))
mape_md1
## [1] 0.04774945 0.09143290
md2 <- lm(y ~ seasonal + trend + I(trend^2), data = train)
summary(md2)
##
## Call:
## lm(formula = y ~ seasonal + trend + I(trend^2), data = train)
##
## Residuals:
## Min 1Q Median 3Q Max
## -468.47 -54.66 -2.21 63.11 294.32
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 2.635e+03 3.224e+01 81.738 < 2e-16 ***
## seasonalfeb -3.004e+02 3.540e+01 -8.487 3.69e-15 ***
## seasonalmar -4.841e+02 3.540e+01 -13.676 < 2e-16 ***
## seasonalabr -9.128e+02 3.540e+01 -25.787 < 2e-16 ***
## seasonalmay -1.099e+03 3.540e+01 -31.033 < 2e-16 ***
## seasonaljun -1.118e+03 3.540e+01 -31.588 < 2e-16 ***
## seasonaljul -9.547e+02 3.540e+01 -26.968 < 2e-16 ***
## seasonalago -9.322e+02 3.541e+01 -26.329 < 2e-16 ***
## seasonalsept -1.136e+03 3.541e+01 -32.070 < 2e-16 ***
## seasonaloct -1.046e+03 3.541e+01 -29.532 < 2e-16 ***
## seasonalnov -8.001e+02 3.590e+01 -22.286 < 2e-16 ***
## seasonaldic -2.618e+02 3.590e+01 -7.293 5.95e-12 ***
## trend -1.270e+00 4.472e-01 -2.840 0.00494 **
## I(trend^2) 1.713e-02 1.908e-03 8.977 < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 109.1 on 212 degrees of freedom
## Multiple R-squared: 0.9379, Adjusted R-squared: 0.9341
## F-statistic: 246.1 on 13 and 212 DF, p-value: < 2.2e-16
train$yhat <- predict(md2, newdata = train)
test$yhat <- predict(md2, newdata = test)
plot_lm(data = USgas_df,
train = train,
test = test,
title = "Predicting the Trend (Polynomial) and Seasonal Components
of the Series")
mape_md2 <- c(mean(abs(train$y - train$yhat) / train$y),
mean(abs(test$y - test$yhat) / test$y))
mape_md2
## [1] 0.03688770 0.04212618
USgas_split <- ts_split(USgas, sample.out = h)
train.ts <- USgas_split$train
test.ts <- USgas_split$test
library(forecast)
## Warning: package 'forecast' was built under R version 4.1.1
## Registered S3 method overwritten by 'quantmod':
## method from
## as.zoo.data.frame zoo
md3 <- tslm(train.ts ~ season + trend + I(trend^2))
summary(md3)
##
## Call:
## tslm(formula = train.ts ~ season + trend + I(trend^2))
##
## Residuals:
## Min 1Q Median 3Q Max
## -468.47 -54.66 -2.21 63.11 294.32
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 2.635e+03 3.224e+01 81.738 < 2e-16 ***
## season2 -3.004e+02 3.540e+01 -8.487 3.69e-15 ***
## season3 -4.841e+02 3.540e+01 -13.676 < 2e-16 ***
## season4 -9.128e+02 3.540e+01 -25.787 < 2e-16 ***
## season5 -1.099e+03 3.540e+01 -31.033 < 2e-16 ***
## season6 -1.118e+03 3.540e+01 -31.588 < 2e-16 ***
## season7 -9.547e+02 3.540e+01 -26.968 < 2e-16 ***
## season8 -9.322e+02 3.541e+01 -26.329 < 2e-16 ***
## season9 -1.136e+03 3.541e+01 -32.070 < 2e-16 ***
## season10 -1.046e+03 3.541e+01 -29.532 < 2e-16 ***
## season11 -8.001e+02 3.590e+01 -22.286 < 2e-16 ***
## season12 -2.618e+02 3.590e+01 -7.293 5.95e-12 ***
## trend -1.270e+00 4.472e-01 -2.840 0.00494 **
## I(trend^2) 1.713e-02 1.908e-03 8.977 < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 109.1 on 212 degrees of freedom
## Multiple R-squared: 0.9379, Adjusted R-squared: 0.9341
## F-statistic: 246.1 on 13 and 212 DF, p-value: < 2.2e-16
test_forecast y plot_forecast).
r <- which(USgas_df$ds == as.Date("2014-01-01"))
USgas_df$s_break <- ifelse(year(USgas_df$ds) >= 2010, 1, 0)
USgas_df$s_break[r] <- 1
md3 <- tslm(USgas ~ season + trend + I(trend^2) + s_break, data = USgas_df)
summary(md3)
##
## Call:
## tslm(formula = USgas ~ season + trend + I(trend^2) + s_break,
## data = USgas_df)
##
## Residuals:
## Min 1Q Median 3Q Max
## -469.25 -50.68 -2.66 63.63 275.89
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 2.661e+03 3.200e+01 83.164 < 2e-16 ***
## season2 -3.054e+02 3.448e+01 -8.858 2.61e-16 ***
## season3 -4.849e+02 3.448e+01 -14.062 < 2e-16 ***
## season4 -9.272e+02 3.449e+01 -26.885 < 2e-16 ***
## season5 -1.108e+03 3.449e+01 -32.114 < 2e-16 ***
## season6 -1.127e+03 3.450e+01 -32.660 < 2e-16 ***
## season7 -9.568e+02 3.450e+01 -27.730 < 2e-16 ***
## season8 -9.340e+02 3.451e+01 -27.061 < 2e-16 ***
## season9 -1.138e+03 3.452e+01 -32.972 < 2e-16 ***
## season10 -1.040e+03 3.453e+01 -30.122 < 2e-16 ***
## season11 -7.896e+02 3.497e+01 -22.577 < 2e-16 ***
## season12 -2.649e+02 3.498e+01 -7.571 9.72e-13 ***
## trend -1.928e+00 4.479e-01 -4.304 2.51e-05 ***
## I(trend^2) 1.862e-02 1.676e-03 11.113 < 2e-16 ***
## s_break 6.060e+01 2.836e+01 2.137 0.0337 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 109 on 223 degrees of freedom
## Multiple R-squared: 0.9423, Adjusted R-squared: 0.9387
## F-statistic: 260.3 on 14 and 223 DF, p-value: < 2.2e-16
library(UKgrid)
## Warning: package 'UKgrid' was built under R version 4.1.1
UKdaily <- extract_grid(type = "data.frame",
columns = "ND",
aggregate = "daily")
head(UKdaily)
## TIMESTAMP ND
## 1 2005-04-01 1920069
## 2 2005-04-02 1674699
## 3 2005-04-03 1631352
## 4 2005-04-04 1916693
## 5 2005-04-05 1952082
## 6 2005-04-06 1964584
ts_plot(UKdaily,
title = "The UK National Demand for Electricity",
Ytitle = "MW",
Xtitle = "Year")
ts_heatmap(UKdaily[which(year(UKdaily$TIMESTAMP) >= 2016),],
title = "UK the Daily National Grid Demand Heatmap")
library(dplyr)
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
UKdaily <- UKdaily %>%
mutate(wday = wday(TIMESTAMP, label = TRUE),
month = month(TIMESTAMP, label = TRUE),
lag365 = dplyr::lag(ND, 365)) %>%
filter(!is.na(lag365)) %>%
arrange(TIMESTAMP)
str(UKdaily)
## 'data.frame': 4939 obs. of 5 variables:
## $ TIMESTAMP: Date, format: "2006-04-01" "2006-04-02" ...
## $ ND : int 1718405 1691341 1960325 2023886 2026204 2008422 1981175 1770440 1749715 2012865 ...
## $ wday : Ord.factor w/ 7 levels "dom\\."<"lun\\."<..: 7 1 2 3 4 5 6 7 1 2 ...
## $ month : Ord.factor w/ 12 levels "ene"<"feb"<"mar"<..: 4 4 4 4 4 4 4 4 4 4 ...
## $ lag365 : int 1920069 1674699 1631352 1916693 1952082 1964584 1990895 2003982 1811436 1684720 ...
start_date <- min(UKdaily$TIMESTAMP)
start <- c(year(start_date), yday(start_date))
UK_ts <- ts(UKdaily$ND,
start = start,
frequency = 365)
acf(UK_ts, lag.max = 365*4)
h <- 365
ts_split:
UKpartitions <- ts_split(UK_ts, sample.out = h)
train_ts <- UKpartitions$train
test_ts <- UKpartitions$test
train_df <- UKdaily[1:(nrow(UKdaily) - h), ]
test_df <- UKdaily[(nrow(UKdaily) - h + 1):nrow(UKdaily), ]
md_tslm1 <- tslm(train_ts ~ season + trend)
fc_tslm1 <- forecast(md_tslm1, h = h)
test_forecast(actual = UK_ts,
forecast.obj = fc_tslm1,
test = test_ts)
accuracy(fc_tslm1, test_ts)
## ME RMSE MAE MPE MAPE MASE
## Training set -9.030002e-12 121551.4 100439.64 -0.5721337 6.294111 0.8418677
## Test set -1.740215e+04 123156.6 96785.27 -1.8735336 7.160573 0.8112374
## ACF1 Theil's U
## Training set 0.5277884 NA
## Test set 0.5106681 1.071899
md_tslm2 <- tslm(train_ts ~ season + trend + wday + month,
data = train_df)
fc_tslm2 <- forecast(md_tslm2, h = h, newdata = test_df)
accuracy(fc_tslm2, test_ts)
## ME RMSE MAE MPE MAPE MASE
## Training set -1.589161e-12 70032.14 51999.67 -0.1725754 3.158853 0.4358522
## Test set -1.765395e+04 81683.91 65842.88 -1.3694262 4.703788 0.5518836
## ACF1 Theil's U
## Training set 0.7508301 NA
## Test set 0.6120506 0.6898049
md_tslm3 <- tslm(train_ts ~ season + trend + wday + month + lag365,
data = train_df)
fc_tslm3 <- forecast(md_tslm3, h = h, newdata = test_df)
test_forecast(actual = UK_ts,
forecast.obj = fc_tslm3,
test = test_ts)
accuracy(fc_tslm3, test_ts)
## ME RMSE MAE MPE MAPE MASE
## Training set -1.175109e-11 69904.92 52006.75 -0.1717563 3.163385 0.4359116
## Test set -1.754061e+04 81783.55 65957.66 -1.3613252 4.722083 0.5528457
## ACF1 Theil's U
## Training set 0.7500146 NA
## Test set 0.6094666 0.6925767
lag365:
summary(md_tslm3)$coefficients %>% tail(1)
## Estimate Std. Error t value Pr(>|t|)
## lag365 -0.06038328 0.01545495 -3.907051 9.490321e-05
anova(md_tslm3)
## Analysis of Variance Table
##
## Response: train_ts
## Df Sum Sq Mean Sq F value Pr(>F)
## season 364 1.4279e+14 3.9227e+11 73.5348 < 2.2e-16 ***
## trend 1 7.2634e+13 7.2634e+13 13615.7078 < 2.2e-16 ***
## wday 6 4.5009e+13 7.5016e+12 1406.2214 < 2.2e-16 ***
## month 11 1.3721e+11 1.2473e+10 2.3382 0.007266 **
## lag365 1 8.1432e+10 8.1432e+10 15.2650 9.49e-05 ***
## Residuals 4190 2.2352e+13 5.3345e+09
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
final_md <- tslm(UK_ts ~ season + trend + wday + month + lag365,
data = UKdaily)
checkresiduals(final_md)
##
## Breusch-Godfrey test for serial correlation of order up to 730
##
## data: Residuals from Linear regression model
## LM test = 3301.1, df = 730, p-value < 2.2e-16
UK_fc_df <- data.frame(date = seq.Date(from = max(UKdaily$TIMESTAMP) +
days(1),
by = "day",
length.out = h))
UK_fc_df$wday <- factor(wday(UK_fc_df$date, label = TRUE), ordered = FALSE)
UK_fc_df$month <- factor(month(UK_fc_df$date, label = TRUE), ordered =
FALSE)
UK_fc_df$lag365 <- tail(UKdaily$ND, h)
UKgrid_fc <- forecast(final_md, h = h, newdata = UK_fc_df)
plot_forecast(UKgrid_fc,
title = "The UK National Demand for Electricity Forecast",
Ytitle = "MW",
Xtitle = "Year")
library(readxl)
## Warning: package 'readxl' was built under R version 4.1.1
file.choose()
## [1] "C:\\Users\\Steven\\Desktop\\Analisis estadistico\\informe-3.Rmd"
library(readr)
library(knitr)
info3<- read.csv("C:\\Users\\Steven\\Desktop\\database.csv")
library(dplyr)
info3 <- info3 %>%
arrange(Date)
info3 <- ts(info3$Magnitude, start = c(1965, 1), end = c(2016, 12), frequency = 12)
class(info3)
## [1] "ts"
library(TSstudio)
data(info3)
## Warning in data(info3): data set 'info3' not found
ts_plot(info3,
title = "Terremotos ocurridos entre 1965-2016",
Ytitle = "Magnitud",
Xtitle = "Fecha")
ts_info(info3)
## The info3 series is a ts object with 1 variable and 624 observations
## Frequency: 12
## Start time: 1965 1
## End time: 2016 12
ts_decompose(info3)
info3_df <- ts_to_prophet(info3)
head(info3_df)
## ds y
## 1 1965-01-01 6.5
## 2 1965-02-01 5.9
## 3 1965-03-01 5.6
## 4 1965-04-01 5.6
## 5 1965-05-01 6.0
## 6 1965-06-01 6.8
info3_df$trend <- 1:nrow(info3_df)
library(lubridate)
info3_df$seasonal <- factor(month(info3_df$ds, label = T), ordered = FALSE)
head(info3_df)
## ds y trend seasonal
## 1 1965-01-01 6.5 1 ene
## 2 1965-02-01 5.9 2 feb
## 3 1965-03-01 5.6 3 mar
## 4 1965-04-01 5.6 4 abr
## 5 1965-05-01 6.0 5 may
## 6 1965-06-01 6.8 6 jun
h <- 12
train <- info3_df[1:(nrow(info3_df) - h), ]
test <- info3_df[(nrow(info3_df) - h + 1):nrow(info3_df), ]
md_trend <- lm(y ~ trend, data = train)
summary(md_trend)
##
## Call:
## lm(formula = y ~ trend, data = train)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.4523 -0.3027 -0.1563 0.1497 1.9512
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 5.9541963 0.0357910 166.360 <2e-16 ***
## trend -0.0001675 0.0001012 -1.656 0.0982 .
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.4422 on 610 degrees of freedom
## Multiple R-squared: 0.004476, Adjusted R-squared: 0.002844
## F-statistic: 2.743 on 1 and 610 DF, p-value: 0.09822
train$yhat <- predict(md_trend, newdata = train)
test$yhat <- predict(md_trend, newdata = test)
library(plotly)
plot_lm <- function(data, train, test, title = NULL){
p <- plot_ly(data = data,
x = ~ ds,
y = ~ y,
type = "scatter",
mode = "line",
name = "Actual") %>%
add_lines(x = ~ train$ds,
y = ~ train$yhat,
line = list(color = "red"),
name = "Fitted") %>%
add_lines(x = ~ test$ds,
y = ~ test$yhat,
line = list(color = "green", dash = "dot", width = 3),
name = "Forecasted") %>%
layout(title = title,
xaxis = list(title = "Fecha"),
yaxis = list(title = "Magnitud"),
legend = list(x = 0.05, y = 0.95))
return(p)}
datos: Los datos de entrada, un objeto data.frame que sigue la misma estructura que el uno de los USgas_df (incluida la variable yhat) train: el conjunto de entrenamiento correspondiente que se utilizó para entrenar el modelo prueba: Del mismo modo, el conjunto de pruebas correspondiente que se utilizó para evaluar la modelo de previsión title: el título de la trama, por defecto, establecido en NULL
plot_lm(data = info3_df,
train = train,
test = test,
title = "Prediction of the trend component of the series")
mape_trend <- c(mean(abs(train$y - train$yhat) / train$y),
mean(abs(test$y - test$yhat) / test$y))
mape_trend
## [1] 0.05435221 0.04106426
md_seasonal <- lm(y ~ seasonal, data = train)
summary(md_seasonal)
##
## Call:
## lm(formula = y ~ seasonal, data = train)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.4980 -0.3054 -0.1569 0.1657 1.9588
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 5.87255 0.06184 94.959 <2e-16 ***
## seasonalfeb -0.01569 0.08746 -0.179 0.858
## seasonalmar 0.09216 0.08746 1.054 0.292
## seasonalabr 0.03922 0.08746 0.448 0.654
## seasonalmay 0.08902 0.08746 1.018 0.309
## seasonaljun 0.12549 0.08746 1.435 0.152
## seasonaljul 0.03725 0.08746 0.426 0.670
## seasonalago 0.06863 0.08746 0.785 0.433
## seasonalsept -0.05882 0.08746 -0.673 0.501
## seasonaloct 0.11569 0.08746 1.323 0.186
## seasonalnov -0.06863 0.08746 -0.785 0.433
## seasonaldic -0.06078 0.08746 -0.695 0.487
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.4416 on 600 degrees of freedom
## Multiple R-squared: 0.02311, Adjusted R-squared: 0.005199
## F-statistic: 1.29 on 11 and 600 DF, p-value: 0.2257
train$yhat <- predict(md_seasonal, newdata = train)
test$yhat <- predict(md_seasonal, newdata = test)
plot_lm(data = info3_df,
train = train,
test = test,
title = "Predicting the Seasonal Component of the Series")
mape_seasonal <- c(mean(abs(train$y - train$yhat) / train$y),
mean(abs(test$y - test$yhat) / test$y))
mape_seasonal
## [1] 0.05383805 0.04855883
md1 <- lm(y ~ seasonal + trend, data = train)
summary(md1)
##
## Call:
## lm(formula = y ~ seasonal + trend, data = train)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.5457 -0.2986 -0.1457 0.1629 1.9132
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 5.9223487 0.0688236 86.051 <2e-16 ***
## seasonalfeb -0.0155208 0.0873361 -0.178 0.859
## seasonalmar 0.0924878 0.0873363 1.059 0.290
## seasonalabr 0.0397120 0.0873366 0.455 0.649
## seasonalmay 0.0896814 0.0873370 1.027 0.305
## seasonaljun 0.1263174 0.0873375 1.446 0.149
## seasonaljul 0.0382476 0.0873382 0.438 0.662
## seasonalago 0.0697856 0.0873389 0.799 0.425
## seasonalsept -0.0574999 0.0873398 -0.658 0.511
## seasonaloct 0.1171753 0.0873408 1.342 0.180
## seasonalnov -0.0669730 0.0873419 -0.767 0.444
## seasonaldic -0.0589644 0.0873431 -0.675 0.500
## trend -0.0001654 0.0001009 -1.639 0.102
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.441 on 599 degrees of freedom
## Multiple R-squared: 0.02747, Adjusted R-squared: 0.007988
## F-statistic: 1.41 on 12 and 599 DF, p-value: 0.1563
train$yhat <- predict(md1, newdata = train)
test$yhat <- predict(md1, newdata = test)
plot_lm(data = info3_df,
train = train,
test = test,
title = "Predicting the Trend and Seasonal Components of the
Series")
mape_md1 <- c(mean(abs(train$y - train$yhat) / train$y),
mean(abs(test$y - test$yhat) / test$y))
mape_md1
## [1] 0.05375073 0.04350817
md2 <- lm(y ~ seasonal + trend + I(trend^2), data = train)
summary(md2)
##
## Call:
## lm(formula = y ~ seasonal + trend + I(trend^2), data = train)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.6391 -0.2895 -0.1362 0.1768 1.8339
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 6.035e+00 7.917e-02 76.231 < 2e-16 ***
## seasonalfeb -1.550e-02 8.683e-02 -0.179 0.85836
## seasonalmar 9.252e-02 8.683e-02 1.066 0.28707
## seasonalabr 3.976e-02 8.683e-02 0.458 0.64723
## seasonalmay 8.973e-02 8.683e-02 1.033 0.30184
## seasonaljun 1.264e-01 8.683e-02 1.455 0.14610
## seasonaljul 3.830e-02 8.683e-02 0.441 0.65930
## seasonalago 6.984e-02 8.683e-02 0.804 0.42157
## seasonalsept -5.746e-02 8.683e-02 -0.662 0.50843
## seasonaloct 1.172e-01 8.684e-02 1.350 0.17760
## seasonalnov -6.696e-02 8.684e-02 -0.771 0.44098
## seasonaldic -5.896e-02 8.684e-02 -0.679 0.49739
## trend -1.266e-03 4.019e-04 -3.149 0.00172 **
## I(trend^2) 1.795e-06 6.349e-07 2.827 0.00486 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.4385 on 598 degrees of freedom
## Multiple R-squared: 0.0403, Adjusted R-squared: 0.01943
## F-statistic: 1.932 on 13 and 598 DF, p-value: 0.02436
train$yhat <- predict(md2, newdata = train)
test$yhat <- predict(md2, newdata = test)
plot_lm(data = info3_df,
train = train,
test = test,
title = "Predicting the Trend (Polynomial) and Seasonal Components
of the Series")
mape_md2 <- c(mean(abs(train$y - train$yhat) / train$y),
mean(abs(test$y - test$yhat) / test$y))
mape_md2
## [1] 0.05325876 0.05659700
info3_split <- ts_split(USgas, sample.out = h)
train.ts <- info3_split$train
test.ts <- info3_split$test
library(forecast)
md3 <- tslm(train.ts ~ season + trend + I(trend^2))
summary(md3)
##
## Call:
## tslm(formula = train.ts ~ season + trend + I(trend^2))
##
## Residuals:
## Min 1Q Median 3Q Max
## -468.47 -54.66 -2.21 63.11 294.32
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 2.635e+03 3.224e+01 81.738 < 2e-16 ***
## season2 -3.004e+02 3.540e+01 -8.487 3.69e-15 ***
## season3 -4.841e+02 3.540e+01 -13.676 < 2e-16 ***
## season4 -9.128e+02 3.540e+01 -25.787 < 2e-16 ***
## season5 -1.099e+03 3.540e+01 -31.033 < 2e-16 ***
## season6 -1.118e+03 3.540e+01 -31.588 < 2e-16 ***
## season7 -9.547e+02 3.540e+01 -26.968 < 2e-16 ***
## season8 -9.322e+02 3.541e+01 -26.329 < 2e-16 ***
## season9 -1.136e+03 3.541e+01 -32.070 < 2e-16 ***
## season10 -1.046e+03 3.541e+01 -29.532 < 2e-16 ***
## season11 -8.001e+02 3.590e+01 -22.286 < 2e-16 ***
## season12 -2.618e+02 3.590e+01 -7.293 5.95e-12 ***
## trend -1.270e+00 4.472e-01 -2.840 0.00494 **
## I(trend^2) 1.713e-02 1.908e-03 8.977 < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 109.1 on 212 degrees of freedom
## Multiple R-squared: 0.9379, Adjusted R-squared: 0.9341
## F-statistic: 246.1 on 13 and 212 DF, p-value: < 2.2e-16
r <- which(info3_df$ds == as.Date("2014-01-01"))
info3_df$s_break <- ifelse(year(info3_df$ds) >= 2010, 1, 0)
info3_df$s_break[r] <- 1
md3 <- tslm(info3 ~ season + trend + I(trend^2) + s_break, data = info3_df)
summary(md3)
##
## Call:
## tslm(formula = info3 ~ season + trend + I(trend^2) + s_break,
## data = info3_df)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.6397 -0.2919 -0.1320 0.1736 1.8357
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 6.040e+00 8.040e-02 75.128 < 2e-16 ***
## season2 -1.143e-02 8.563e-02 -0.134 0.89383
## season3 9.059e-02 8.563e-02 1.058 0.29052
## season4 5.030e-02 8.563e-02 0.587 0.55714
## season5 8.578e-02 8.563e-02 1.002 0.31688
## season6 1.236e-01 8.563e-02 1.443 0.14957
## season7 4.288e-02 8.564e-02 0.501 0.61678
## season8 6.603e-02 8.564e-02 0.771 0.44096
## season9 -5.505e-02 8.564e-02 -0.643 0.52062
## season10 1.162e-01 8.564e-02 1.357 0.17541
## season11 -5.683e-02 8.564e-02 -0.664 0.50723
## season12 -6.061e-02 8.565e-02 -0.708 0.47940
## trend -1.373e-03 4.687e-04 -2.930 0.00352 **
## I(trend^2) 2.062e-06 8.338e-07 2.473 0.01365 *
## s_break -9.285e-02 8.792e-02 -1.056 0.29135
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.4366 on 609 degrees of freedom
## Multiple R-squared: 0.03863, Adjusted R-squared: 0.01653
## F-statistic: 1.748 on 14 and 609 DF, p-value: 0.04302
library(UKgrid)
UKdaily <- extract_grid(type = "data.frame",
columns = "ND",
aggregate = "daily")
head(UKdaily)
## TIMESTAMP ND
## 1 2005-04-01 1920069
## 2 2005-04-02 1674699
## 3 2005-04-03 1631352
## 4 2005-04-04 1916693
## 5 2005-04-05 1952082
## 6 2005-04-06 1964584
ts_plot(UKdaily,
title = "Terremotos ocurridos entre 1965-2016",
Ytitle = "Magnitud",
Xtitle = "Fecha")
ts_heatmap(UKdaily[which(year(UKdaily$TIMESTAMP) >= 2012),],
title = "UK Terremos ocurridos entre 1965-2016")
library(dplyr)
UKdaily <- UKdaily %>%
mutate(wday = wday(TIMESTAMP, label = TRUE),
month = month(TIMESTAMP, label = TRUE),
lag365 = dplyr::lag(ND, 365)) %>%
filter(!is.na(lag365)) %>%
arrange(TIMESTAMP)
str(UKdaily)
## 'data.frame': 4939 obs. of 5 variables:
## $ TIMESTAMP: Date, format: "2006-04-01" "2006-04-02" ...
## $ ND : int 1718405 1691341 1960325 2023886 2026204 2008422 1981175 1770440 1749715 2012865 ...
## $ wday : Ord.factor w/ 7 levels "dom\\."<"lun\\."<..: 7 1 2 3 4 5 6 7 1 2 ...
## $ month : Ord.factor w/ 12 levels "ene"<"feb"<"mar"<..: 4 4 4 4 4 4 4 4 4 4 ...
## $ lag365 : int 1920069 1674699 1631352 1916693 1952082 1964584 1990895 2003982 1811436 1684720 ...
start_date <- min(UKdaily$TIMESTAMP)
start <- c(year(start_date), yday(start_date))
UK_ts <- ts(UKdaily$ND,
start = start,
frequency = 365)
acf(UK_ts, lag.max = 365 * 4)
h <- 365
UKpartitions <- ts_split(UK_ts, sample.out = h)
train_ts <- UKpartitions$train
test_ts <- UKpartitions$test
train_df <- UKdaily[1:(nrow(UKdaily) - h), ]
test_df <- UKdaily[(nrow(UKdaily) - h + 1):nrow(UKdaily), ]
md_tslm1 <- tslm(train_ts ~ season + trend)
fc_tslm1 <- forecast(md_tslm1, h = h)
test_forecast(actual = UK_ts,
forecast.obj = fc_tslm1,
test = test_ts)
accuracy(fc_tslm1, test_ts)
## ME RMSE MAE MPE MAPE MASE
## Training set -9.030002e-12 121551.4 100439.64 -0.5721337 6.294111 0.8418677
## Test set -1.740215e+04 123156.6 96785.27 -1.8735336 7.160573 0.8112374
## ACF1 Theil's U
## Training set 0.5277884 NA
## Test set 0.5106681 1.071899
md_tslm3 <- tslm(train_ts ~ season + trend + wday + month + lag365,
data = train_df)
fc_tslm3 <- forecast(md_tslm3, h = h, newdata = test_df)
test_forecast(actual = UK_ts,
forecast.obj = fc_tslm3,
test = test_ts)
accuracy(fc_tslm3, test_ts)
## ME RMSE MAE MPE MAPE MASE
## Training set -1.175109e-11 69904.92 52006.75 -0.1717563 3.163385 0.4359116
## Test set -1.754061e+04 81783.55 65957.66 -1.3613252 4.722083 0.5528457
## ACF1 Theil's U
## Training set 0.7500146 NA
## Test set 0.6094666 0.6925767
summary(md_tslm3)$coefficients %>% tail(1)
## Estimate Std. Error t value Pr(>|t|)
## lag365 -0.06038328 0.01545495 -3.907051 9.490321e-05
anova(md_tslm3)
## Analysis of Variance Table
##
## Response: train_ts
## Df Sum Sq Mean Sq F value Pr(>F)
## season 364 1.4279e+14 3.9227e+11 73.5348 < 2.2e-16 ***
## trend 1 7.2634e+13 7.2634e+13 13615.7078 < 2.2e-16 ***
## wday 6 4.5009e+13 7.5016e+12 1406.2214 < 2.2e-16 ***
## month 11 1.3721e+11 1.2473e+10 2.3382 0.007266 **
## lag365 1 8.1432e+10 8.1432e+10 15.2650 9.49e-05 ***
## Residuals 4190 2.2352e+13 5.3345e+09
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
final_md <- tslm(UK_ts ~ season + trend + wday + month + lag365,
data = UKdaily)
checkresiduals(final_md)
##
## Breusch-Godfrey test for serial correlation of order up to 730
##
## data: Residuals from Linear regression model
## LM test = 3301.1, df = 730, p-value < 2.2e-16
UK_fc_df <- data.frame(date = seq.Date(from = max(UKdaily$TIMESTAMP) +
days(1),
by = "day",
length.out = h))
UK_fc_df$wday <- factor(wday(UK_fc_df$date, label = TRUE), ordered = FALSE)
UK_fc_df$month <- factor(month(UK_fc_df$date, label = TRUE), ordered =
FALSE)
UK_fc_df$lag365 <- tail(UKdaily$ND, h)
UKgrid_fc <- forecast(final_md, h = h, newdata = UK_fc_df)
plot_forecast(UKgrid_fc,
title = "Terremotos ocurridos entre 1965-2016",
Ytitle = "Magnitud",
Xtitle = "Fecha")