Flu epidemics constitute a major public health concern, causing respiratory illnesses, hospitalizations, and deaths. According to the National Vital Statistics Reports published in October 2012, influenza ranked as the eighth leading cause of death in 2011 in the United States. Each year, 250,000 to 500,000 deaths are attributed to influenza-related diseases throughout the world.
The U.S. Centers for Disease Control and Prevention (CDC) and the European Influenza Surveillance Scheme (EISS) detect influenza activity through virologic and clinical data, including Influenza-like Illness (ILI) physician visits. National and regional reports, however, are published with a 1-2 week lag.
The CSV file FluTrain.csv aggregates these data, together with ILI-related search query data, by week from January 1, 2004 until December 31, 2011 as follows:
“Week” - The range of dates represented by this observation, in year-month-day format.
“ILI” - This column lists the percentage of ILI-related physician visits for the corresponding week.
“Queries” - This column lists the fraction of queries that are ILI-related for the corresponding week, adjusted to be between 0 and 1 (higher values correspond to more ILI-related search queries).
FluTrain = read.csv("FluTrain.csv")
FluTrain
## Week ILI Queries
## 1 2004-01-04 - 2004-01-10 2.4183312 0.23771580
## 2 2004-01-11 - 2004-01-17 1.8090560 0.22045153
## 3 2004-01-18 - 2004-01-24 1.7120239 0.22576361
## 4 2004-01-25 - 2004-01-31 1.5424951 0.23771580
## 5 2004-02-01 - 2004-02-07 1.4378683 0.22443559
## 6 2004-02-08 - 2004-02-14 1.3242740 0.20717131
## 7 2004-02-15 - 2004-02-21 1.3072567 0.24169987
## 8 2004-02-22 - 2004-02-28 1.0369770 0.21646746
## 9 2004-02-29 - 2004-03-06 1.0103204 0.22576361
## 10 2004-03-07 - 2004-03-13 1.0524925 0.19920319
## 11 2004-03-14 - 2004-03-20 1.0200901 0.13413015
## 12 2004-03-21 - 2004-03-27 0.9244187 0.13811421
## 13 2004-03-28 - 2004-04-03 0.7906450 0.14077025
## 14 2004-04-04 - 2004-04-10 0.8026098 0.08233732
## 15 2004-04-11 - 2004-04-17 0.8361300 0.07569721
## 16 2004-04-18 - 2004-04-24 0.7924358 0.07171315
## 17 2004-04-25 - 2004-05-01 0.6835877 0.06640106
## 18 2004-05-02 - 2004-05-08 0.7574523 0.06108898
## 19 2004-05-09 - 2004-05-15 0.7885854 0.04648074
## 20 2004-05-16 - 2004-05-22 0.8121710 0.04648074
## 21 2004-05-23 - 2004-05-29 0.8044629 0.04913679
## 22 2004-05-30 - 2004-06-05 0.8777009 0.05179283
## 23 2004-06-06 - 2004-06-12 0.7414530 0.05577689
## 24 2004-06-13 - 2004-06-19 0.6610222 0.06108898
## 25 2004-06-20 - 2004-06-26 0.7151092 0.06772908
## 26 2004-06-27 - 2004-07-03 0.5622412 0.04780877
## 27 2004-07-04 - 2004-07-10 0.7868082 0.05179283
## 28 2004-07-11 - 2004-07-17 0.8606578 0.04249668
## 29 2004-07-18 - 2004-07-24 0.6899440 0.04648074
## 30 2004-07-25 - 2004-07-31 0.7796912 0.05046481
## 31 2004-08-01 - 2004-08-07 0.6281439 0.05444887
## 32 2004-08-08 - 2004-08-14 0.9024586 0.05312085
## 33 2004-08-15 - 2004-08-21 0.8064432 0.04116866
## 34 2004-08-22 - 2004-08-28 0.8748878 0.04913679
## 35 2004-08-29 - 2004-09-04 0.9932130 0.09827357
## 36 2004-09-05 - 2004-09-11 0.8761408 0.11155379
## 37 2004-09-12 - 2004-09-18 0.9480916 0.15405046
## 38 2004-09-19 - 2004-09-25 0.9269426 0.19787517
## 39 2004-09-26 - 2004-10-02 0.9716430 0.26029217
## 40 2004-10-03 - 2004-10-09 0.8971591 0.32270916
## 41 2004-10-10 - 2004-10-16 1.0224828 0.37715803
## 42 2004-10-17 - 2004-10-23 1.0629632 0.37848606
## 43 2004-10-24 - 2004-10-30 1.1469570 0.31208499
## 44 2004-10-31 - 2004-11-06 1.2049501 0.28286853
## 45 2004-11-07 - 2004-11-13 1.3051655 0.28552457
## 46 2004-11-14 - 2004-11-20 1.2869916 0.30544489
## 47 2004-11-21 - 2004-11-27 1.5946756 0.25232404
## 48 2004-11-28 - 2004-12-04 1.3971432 0.34130146
## 49 2004-12-05 - 2004-12-11 1.4499567 0.32403718
## 50 2004-12-12 - 2004-12-18 1.6174545 0.35590970
## 51 2004-12-19 - 2004-12-25 2.1911192 0.42629482
## 52 2004-12-26 - 2005-01-01 2.5664893 0.54581673
## 53 2005-01-02 - 2005-01-08 2.1764491 0.45418327
## 54 2005-01-09 - 2005-01-15 2.2017121 0.35458167
## 55 2005-01-16 - 2005-01-22 2.5301211 0.38911023
## 56 2005-01-23 - 2005-01-29 3.0652381 0.42363878
## 57 2005-01-30 - 2005-02-05 3.9806083 0.51527225
## 58 2005-02-06 - 2005-02-12 4.5956803 0.59893758
## 59 2005-02-13 - 2005-02-19 4.7519706 0.42762284
## 60 2005-02-20 - 2005-02-26 4.1796206 0.42231076
## 61 2005-02-27 - 2005-03-05 3.4535851 0.36122178
## 62 2005-03-06 - 2005-03-12 3.1585224 0.37184595
## 63 2005-03-13 - 2005-03-19 2.6732010 0.26560425
## 64 2005-03-20 - 2005-03-26 2.3516104 0.33067729
## 65 2005-03-27 - 2005-04-02 1.8924285 0.21381142
## 66 2005-04-03 - 2005-04-09 1.5249048 0.20717131
## 67 2005-04-10 - 2005-04-16 1.4113441 0.15936255
## 68 2005-04-17 - 2005-04-23 1.2506826 0.15670651
## 69 2005-04-24 - 2005-04-30 1.2070250 0.12217795
## 70 2005-05-01 - 2005-05-07 1.0789550 0.14608234
## 71 2005-05-08 - 2005-05-14 1.1452080 0.10491368
## 72 2005-05-15 - 2005-05-21 1.0612426 0.11686587
## 73 2005-05-22 - 2005-05-28 1.0567977 0.09827357
## 74 2005-05-29 - 2005-06-04 1.2519310 0.07835325
## 75 2005-06-05 - 2005-06-11 1.0141893 0.07304117
## 76 2005-06-12 - 2005-06-18 1.0419693 0.08366534
## 77 2005-06-19 - 2005-06-25 0.9540274 0.06507304
## 78 2005-06-26 - 2005-07-02 0.8482299 0.06640106
## 79 2005-07-03 - 2005-07-09 0.8418715 0.07304117
## 80 2005-07-10 - 2005-07-16 0.7308936 0.08499336
## 81 2005-07-17 - 2005-07-23 0.7134316 0.09030544
## 82 2005-07-24 - 2005-07-30 0.6706772 0.09694555
## 83 2005-07-31 - 2005-08-06 0.6892776 0.11553785
## 84 2005-08-07 - 2005-08-13 0.7049290 0.13280212
## 85 2005-08-14 - 2005-08-20 0.6159033 0.13545817
## 86 2005-08-21 - 2005-08-27 0.6094256 0.13014608
## 87 2005-08-28 - 2005-09-03 0.6802587 0.14077025
## 88 2005-09-04 - 2005-09-10 0.7754884 0.13811421
## 89 2005-09-11 - 2005-09-17 0.6834214 0.15006640
## 90 2005-09-18 - 2005-09-24 0.7810748 0.17795485
## 91 2005-09-25 - 2005-10-01 0.8069435 0.30146082
## 92 2005-10-02 - 2005-10-08 1.0763468 0.29614874
## 93 2005-10-09 - 2005-10-15 1.0586890 0.30544489
## 94 2005-10-16 - 2005-10-22 1.1152326 0.25365206
## 95 2005-10-23 - 2005-10-29 1.1238125 0.28286853
## 96 2005-10-30 - 2005-11-05 1.2548892 0.30013280
## 97 2005-11-06 - 2005-11-12 1.3366090 0.22709163
## 98 2005-11-13 - 2005-11-19 1.3786364 0.32270916
## 99 2005-11-20 - 2005-11-26 1.6082900 0.28950863
## 100 2005-11-27 - 2005-12-03 1.4831056 0.31474104
## 101 2005-12-04 - 2005-12-10 1.6537399 0.35325365
## 102 2005-12-11 - 2005-12-17 2.0067892 0.39973440
## 103 2005-12-18 - 2005-12-24 2.5685716 0.39575033
## 104 2005-12-25 - 2005-12-31 3.0527762 0.48738380
## 105 2006-01-01 - 2006-01-07 2.4250373 0.48738380
## 106 2006-01-08 - 2006-01-14 2.0019506 0.38645418
## 107 2006-01-15 - 2006-01-21 2.0586902 0.34794157
## 108 2006-01-22 - 2006-01-28 2.2127697 0.36387782
## 109 2006-01-29 - 2006-02-04 2.3222001 0.40106242
## 110 2006-02-05 - 2006-02-11 2.4927920 0.36786188
## 111 2006-02-12 - 2006-02-18 2.7948942 0.36918991
## 112 2006-02-19 - 2006-02-25 2.9691114 0.38778220
## 113 2006-02-26 - 2006-03-04 2.8395905 0.30146082
## 114 2006-03-05 - 2006-03-11 2.7779902 0.28552457
## 115 2006-03-12 - 2006-03-18 2.4728693 0.28818061
## 116 2006-03-19 - 2006-03-25 2.1806146 0.29083665
## 117 2006-03-26 - 2006-04-01 2.0167951 0.23505976
## 118 2006-04-02 - 2006-04-08 1.6410133 0.18459495
## 119 2006-04-09 - 2006-04-15 1.3582865 0.17928287
## 120 2006-04-16 - 2006-04-22 1.1427983 0.18725100
## 121 2006-04-23 - 2006-04-29 1.0403125 0.14077025
## 122 2006-04-30 - 2006-05-06 0.9643469 0.16334661
## 123 2006-05-07 - 2006-05-13 0.9379817 0.12749004
## 124 2006-05-14 - 2006-05-20 0.9474493 0.13147410
## 125 2006-05-21 - 2006-05-27 0.8919182 0.13545817
## 126 2006-05-28 - 2006-06-03 0.8646427 0.14475432
## 127 2006-06-04 - 2006-06-10 0.9703199 0.13280212
## 128 2006-06-11 - 2006-06-17 0.8443901 0.10756972
## 129 2006-06-18 - 2006-06-24 0.7748704 0.11022576
## 130 2006-06-25 - 2006-07-01 0.8213725 0.10756972
## 131 2006-07-02 - 2006-07-08 0.8727445 0.10225764
## 132 2006-07-09 - 2006-07-15 0.9226345 0.10358566
## 133 2006-07-16 - 2006-07-22 0.8994868 0.08233732
## 134 2006-07-23 - 2006-07-29 0.8430824 0.08897742
## 135 2006-07-30 - 2006-08-05 0.8818244 0.06507304
## 136 2006-08-06 - 2006-08-12 0.8171452 0.07835325
## 137 2006-08-13 - 2006-08-19 0.8715001 0.08897742
## 138 2006-08-20 - 2006-08-26 0.7386205 0.11288181
## 139 2006-08-27 - 2006-09-02 0.7979660 0.10889774
## 140 2006-09-03 - 2006-09-09 1.0139373 0.12881806
## 141 2006-09-10 - 2006-09-16 0.8809358 0.13280212
## 142 2006-09-17 - 2006-09-23 0.9433663 0.21912351
## 143 2006-09-24 - 2006-09-30 0.8915462 0.20849934
## 144 2006-10-01 - 2006-10-07 1.2032228 0.22310757
## 145 2006-10-08 - 2006-10-14 1.0578822 0.23505976
## 146 2006-10-15 - 2006-10-21 1.1305354 0.20584329
## 147 2006-10-22 - 2006-10-28 1.1255230 0.25896414
## 148 2006-10-29 - 2006-11-04 1.2080820 0.26162019
## 149 2006-11-05 - 2006-11-11 1.3495244 0.30544489
## 150 2006-11-12 - 2006-11-18 1.4689004 0.34661355
## 151 2006-11-19 - 2006-11-25 1.8276716 0.28818061
## 152 2006-11-26 - 2006-12-02 1.6656012 0.33997344
## 153 2006-12-03 - 2006-12-09 1.8596834 0.35590970
## 154 2006-12-10 - 2006-12-16 2.3889130 0.36387782
## 155 2006-12-17 - 2006-12-23 2.7897759 0.40239044
## 156 2006-12-24 - 2006-12-30 3.1154858 0.49402390
## 157 2006-12-31 - 2007-01-06 2.2694245 0.54183267
## 158 2007-01-07 - 2007-01-13 1.8635464 0.43160691
## 159 2007-01-14 - 2007-01-20 1.9998635 0.40371846
## 160 2007-01-21 - 2007-01-27 2.4406044 0.38778220
## 161 2007-01-28 - 2007-02-03 2.8301821 0.45683931
## 162 2007-02-04 - 2007-02-10 3.1234256 0.40504648
## 163 2007-02-11 - 2007-02-17 3.2701949 0.44090305
## 164 2007-02-18 - 2007-02-24 3.1775688 0.39840637
## 165 2007-02-25 - 2007-03-03 2.7236366 0.33598938
## 166 2007-03-04 - 2007-03-10 2.5020140 0.32403718
## 167 2007-03-11 - 2007-03-17 2.4271992 0.28154050
## 168 2007-03-18 - 2007-03-24 1.9604132 0.26427623
## 169 2007-03-25 - 2007-03-31 1.5913980 0.25896414
## 170 2007-04-01 - 2007-04-07 1.3697835 0.22709163
## 171 2007-04-08 - 2007-04-14 1.3631668 0.24037185
## 172 2007-04-15 - 2007-04-21 1.1736951 0.21912351
## 173 2007-04-22 - 2007-04-28 1.0635756 0.18725100
## 174 2007-04-29 - 2007-05-05 0.9697111 0.17529881
## 175 2007-05-06 - 2007-05-12 0.9653617 0.13413015
## 176 2007-05-13 - 2007-05-19 0.8567489 0.14608234
## 177 2007-05-20 - 2007-05-26 0.8633465 0.12084993
## 178 2007-05-27 - 2007-06-02 0.9353695 0.12483400
## 179 2007-06-03 - 2007-06-09 0.7455694 0.12616202
## 180 2007-06-10 - 2007-06-16 0.7404281 0.09296149
## 181 2007-06-17 - 2007-06-23 0.6728965 0.10358566
## 182 2007-06-24 - 2007-06-30 0.6662820 0.08897742
## 183 2007-07-01 - 2007-07-07 0.6627473 0.08897742
## 184 2007-07-08 - 2007-07-14 0.5456190 0.09960159
## 185 2007-07-15 - 2007-07-21 0.5862306 0.09694555
## 186 2007-07-22 - 2007-07-28 0.6606867 0.09030544
## 187 2007-07-29 - 2007-08-04 0.5340928 0.10624170
## 188 2007-08-05 - 2007-08-11 0.5855491 0.09827357
## 189 2007-08-12 - 2007-08-18 0.6180750 0.12217795
## 190 2007-08-19 - 2007-08-25 0.6874647 0.11155379
## 191 2007-08-26 - 2007-09-01 0.7156961 0.12217795
## 192 2007-09-02 - 2007-09-08 0.8293131 0.15537849
## 193 2007-09-09 - 2007-09-15 0.8009115 0.14209827
## 194 2007-09-16 - 2007-09-22 0.9184839 0.18990704
## 195 2007-09-23 - 2007-09-29 0.8142590 0.18061089
## 196 2007-09-30 - 2007-10-06 1.0719708 0.25099602
## 197 2007-10-07 - 2007-10-13 1.2178574 0.29083665
## 198 2007-10-14 - 2007-10-20 1.2457554 0.30013280
## 199 2007-10-21 - 2007-10-27 1.3598449 0.28286853
## 200 2007-10-28 - 2007-11-03 1.4467085 0.26560425
## 201 2007-11-04 - 2007-11-10 1.5328638 0.31606906
## 202 2007-11-11 - 2007-11-17 1.6665324 0.30677291
## 203 2007-11-18 - 2007-11-24 1.9748773 0.29747676
## 204 2007-11-25 - 2007-12-01 1.6730547 0.31341302
## 205 2007-12-02 - 2007-12-08 1.6340509 0.34528552
## 206 2007-12-09 - 2007-12-15 1.7459475 0.35059761
## 207 2007-12-16 - 2007-12-22 1.9364319 0.37583001
## 208 2007-12-23 - 2007-12-29 2.4890534 0.44355910
## 209 2007-12-30 - 2008-01-05 2.2540484 0.50464807
## 210 2008-01-06 - 2008-01-12 2.0914715 0.38911023
## 211 2008-01-13 - 2008-01-19 2.3593428 0.35856574
## 212 2008-01-20 - 2008-01-26 3.3233143 0.37848606
## 213 2008-01-27 - 2008-02-02 4.4338100 0.41434263
## 214 2008-02-03 - 2008-02-09 5.3454714 0.52456839
## 215 2008-02-10 - 2008-02-16 5.4225751 0.54847278
## 216 2008-02-17 - 2008-02-23 5.3030330 0.57370518
## 217 2008-02-24 - 2008-03-01 4.2445550 0.48339973
## 218 2008-03-02 - 2008-03-08 3.6280001 0.40371846
## 219 2008-03-09 - 2008-03-15 3.0346275 0.33731740
## 220 2008-03-16 - 2008-03-22 2.5359536 0.30942895
## 221 2008-03-23 - 2008-03-29 2.0573015 0.31739708
## 222 2008-03-30 - 2008-04-05 1.7415035 0.25232404
## 223 2008-04-06 - 2008-04-12 1.4065217 0.22576361
## 224 2008-04-13 - 2008-04-19 1.2686070 0.19521912
## 225 2008-04-20 - 2008-04-26 1.0771887 0.18193891
## 226 2008-04-27 - 2008-05-03 0.9934452 0.17662683
## 227 2008-05-04 - 2008-05-10 0.9112119 0.14608234
## 228 2008-05-11 - 2008-05-17 0.9721091 0.15006640
## 229 2008-05-18 - 2008-05-24 0.9932575 0.14342630
## 230 2008-05-25 - 2008-05-31 1.0913202 0.12881806
## 231 2008-06-01 - 2008-06-07 0.8884460 0.12084993
## 232 2008-06-08 - 2008-06-14 0.8876915 0.12350598
## 233 2008-06-15 - 2008-06-21 0.8831874 0.12483400
## 234 2008-06-22 - 2008-06-28 0.8267564 0.13014608
## 235 2008-06-29 - 2008-07-05 0.7832014 0.10889774
## 236 2008-07-06 - 2008-07-12 0.7806103 0.12483400
## 237 2008-07-13 - 2008-07-19 0.7690726 0.10624170
## 238 2008-07-20 - 2008-07-26 0.7212979 0.09296149
## 239 2008-07-27 - 2008-08-02 0.7525273 0.10624170
## 240 2008-08-03 - 2008-08-09 0.7527210 0.10225764
## 241 2008-08-10 - 2008-08-16 0.7927660 0.13147410
## 242 2008-08-17 - 2008-08-23 0.7438962 0.11553785
## 243 2008-08-24 - 2008-08-30 0.8141663 0.14741036
## 244 2008-08-31 - 2008-09-06 0.8384009 0.16865870
## 245 2008-09-07 - 2008-09-13 0.8511236 0.17529881
## 246 2008-09-14 - 2008-09-20 1.1097575 0.24037185
## 247 2008-09-21 - 2008-09-27 1.0311436 0.24966799
## 248 2008-09-28 - 2008-10-04 1.0228436 0.25099602
## 249 2008-10-05 - 2008-10-11 1.0301739 0.27622842
## 250 2008-10-12 - 2008-10-18 1.0124478 0.26162019
## 251 2008-10-19 - 2008-10-25 1.0835911 0.28286853
## 252 2008-10-26 - 2008-11-01 1.1657765 0.30942895
## 253 2008-11-02 - 2008-11-08 1.1912964 0.27622842
## 254 2008-11-09 - 2008-11-15 1.2807470 0.31474104
## 255 2008-11-16 - 2008-11-22 1.2705251 0.33864542
## 256 2008-11-23 - 2008-11-29 1.5957825 0.32138114
## 257 2008-11-30 - 2008-12-06 1.4584994 0.40903054
## 258 2008-12-07 - 2008-12-13 1.4992072 0.36387782
## 259 2008-12-14 - 2008-12-20 1.6298157 0.36786188
## 260 2008-12-21 - 2008-12-27 2.1556121 0.42231076
## 261 2008-12-28 - 2009-01-03 2.0205270 0.48738380
## 262 2009-01-04 - 2009-01-10 1.5456623 0.38645418
## 263 2009-01-11 - 2009-01-17 1.6422367 0.36387782
## 264 2009-01-18 - 2009-01-24 1.9652378 0.32536521
## 265 2009-01-25 - 2009-01-31 2.3436784 0.39043825
## 266 2009-02-01 - 2009-02-07 2.8605744 0.38247012
## 267 2009-02-08 - 2009-02-14 3.3421049 0.41434263
## 268 2009-02-15 - 2009-02-21 3.2056588 0.40106242
## 269 2009-02-22 - 2009-02-28 3.1004908 0.40903054
## 270 2009-03-01 - 2009-03-07 2.9581850 0.36520584
## 271 2009-03-08 - 2009-03-14 2.4638058 0.32669323
## 272 2009-03-15 - 2009-03-21 2.1927224 0.30146082
## 273 2009-03-22 - 2009-03-28 1.8739459 0.27357238
## 274 2009-03-29 - 2009-04-04 1.6481690 0.24701195
## 275 2009-04-05 - 2009-04-11 1.4987776 0.24435591
## 276 2009-04-12 - 2009-04-18 1.2923267 0.23904382
## 277 2009-04-19 - 2009-04-25 1.2716411 0.29880478
## 278 2009-04-26 - 2009-05-02 2.9815890 0.55538098
## 279 2009-05-03 - 2009-05-09 2.4370224 0.55511288
## 280 2009-05-10 - 2009-05-16 2.2813011 0.34794157
## 281 2009-05-17 - 2009-05-23 3.8157199 0.32536521
## 282 2009-05-24 - 2009-05-30 4.2131523 0.29482072
## 283 2009-05-31 - 2009-06-06 3.1783224 0.28685259
## 284 2009-06-07 - 2009-06-13 2.5097162 0.28685259
## 285 2009-06-14 - 2009-06-20 2.0663177 0.35590970
## 286 2009-06-21 - 2009-06-27 1.7180460 0.26560425
## 287 2009-06-28 - 2009-07-04 1.5596467 0.25763612
## 288 2009-07-05 - 2009-07-11 1.3085629 0.28419655
## 289 2009-07-12 - 2009-07-18 1.1869460 0.37317397
## 290 2009-07-19 - 2009-07-25 1.1379623 0.35192563
## 291 2009-07-26 - 2009-08-01 1.1500523 0.32536521
## 292 2009-08-02 - 2009-08-08 1.1126189 0.30411687
## 293 2009-08-09 - 2009-08-15 1.1614188 0.38114210
## 294 2009-08-16 - 2009-08-22 1.6410714 0.30810093
## 295 2009-08-23 - 2009-08-29 2.4716598 0.34661355
## 296 2009-08-30 - 2009-09-05 3.7196936 0.41168659
## 297 2009-09-06 - 2009-09-12 3.9497480 0.46082337
## 298 2009-09-13 - 2009-09-19 4.0875636 0.51925631
## 299 2009-09-20 - 2009-09-26 4.0189724 0.67994688
## 300 2009-09-27 - 2009-10-03 4.6036164 0.67861886
## 301 2009-10-04 - 2009-10-10 5.6608671 0.74369190
## 302 2009-10-11 - 2009-10-17 6.8152222 0.77689243
## 303 2009-10-18 - 2009-10-24 7.6188921 1.00000000
## 304 2009-10-25 - 2009-10-31 7.3883586 0.92695883
## 305 2009-11-01 - 2009-11-07 6.3392723 0.86055777
## 306 2009-11-08 - 2009-11-14 4.9434950 0.67065073
## 307 2009-11-15 - 2009-11-21 3.8099612 0.56972112
## 308 2009-11-22 - 2009-11-28 3.4410588 0.44887118
## 309 2009-11-29 - 2009-12-05 2.6677306 0.50464807
## 310 2009-12-06 - 2009-12-12 2.4718250 0.43027888
## 311 2009-12-13 - 2009-12-19 2.3449995 0.43824701
## 312 2009-12-20 - 2009-12-26 2.7143498 0.50199203
## 313 2009-12-27 - 2010-01-02 2.6766718 0.56175299
## 314 2010-01-03 - 2010-01-09 1.9828382 0.52589641
## 315 2010-01-10 - 2010-01-16 1.8274862 0.38778220
## 316 2010-01-17 - 2010-01-23 1.9260563 0.37848606
## 317 2010-01-24 - 2010-01-30 1.9249472 0.39309429
## 318 2010-01-31 - 2010-02-06 2.0887684 0.41965471
## 319 2010-02-07 - 2010-02-13 2.0343408 0.41035857
## 320 2010-02-14 - 2010-02-20 1.9764946 0.38778220
## 321 2010-02-21 - 2010-02-27 1.9936177 0.39176627
## 322 2010-02-28 - 2010-03-06 1.8538260 0.37715803
## 323 2010-03-07 - 2010-03-13 1.8673036 0.35192563
## 324 2010-03-14 - 2010-03-20 1.6998677 0.34130146
## 325 2010-03-21 - 2010-03-27 1.4974082 0.32802125
## 326 2010-03-28 - 2010-04-03 1.4511188 0.28154050
## 327 2010-04-04 - 2010-04-10 1.2071478 0.26560425
## 328 2010-04-11 - 2010-04-17 1.1741508 0.25365206
## 329 2010-04-18 - 2010-04-24 1.1620668 0.26029217
## 330 2010-04-25 - 2010-05-01 1.1721343 0.23107570
## 331 2010-05-02 - 2010-05-08 1.1216765 0.21912351
## 332 2010-05-09 - 2010-05-15 1.1498116 0.24568393
## 333 2010-05-16 - 2010-05-22 1.1332758 0.21646746
## 334 2010-05-23 - 2010-05-29 1.0817133 0.21248340
## 335 2010-05-30 - 2010-06-05 1.1995860 0.22045153
## 336 2010-06-06 - 2010-06-12 0.9528083 0.20053121
## 337 2010-06-13 - 2010-06-19 0.9160321 0.19389110
## 338 2010-06-20 - 2010-06-26 0.9265822 0.16733068
## 339 2010-06-27 - 2010-07-03 0.8696197 0.17795485
## 340 2010-07-04 - 2010-07-10 0.9031331 0.18459495
## 341 2010-07-11 - 2010-07-17 0.7737757 0.17795485
## 342 2010-07-18 - 2010-07-24 0.7427744 0.17264276
## 343 2010-07-25 - 2010-07-31 0.7309345 0.16733068
## 344 2010-08-01 - 2010-08-07 0.7868818 0.18326693
## 345 2010-08-08 - 2010-08-14 0.7630507 0.20185923
## 346 2010-08-15 - 2010-08-21 0.8410432 0.19920319
## 347 2010-08-22 - 2010-08-28 0.7915728 0.20053121
## 348 2010-08-29 - 2010-09-04 0.9127318 0.22443559
## 349 2010-09-05 - 2010-09-11 1.0339765 0.28286853
## 350 2010-09-12 - 2010-09-18 0.9340091 0.34130146
## 351 2010-09-19 - 2010-09-25 1.0818888 0.36786188
## 352 2010-09-26 - 2010-10-02 1.0656260 0.37583001
## 353 2010-10-03 - 2010-10-09 1.1350529 0.37450199
## 354 2010-10-10 - 2010-10-16 1.2525629 0.35192563
## 355 2010-10-17 - 2010-10-23 1.2456956 0.38645418
## 356 2010-10-24 - 2010-10-30 1.2677380 0.35458167
## 357 2010-10-31 - 2010-11-06 1.4372295 0.37051793
## 358 2010-11-07 - 2010-11-13 1.5334125 0.41434263
## 359 2010-11-14 - 2010-11-20 1.6944544 0.42895086
## 360 2010-11-21 - 2010-11-27 1.9915024 0.41301461
## 361 2010-11-28 - 2010-12-04 1.8130453 0.47144754
## 362 2010-12-05 - 2010-12-11 2.0142579 0.50464807
## 363 2010-12-12 - 2010-12-18 2.5565913 0.54050465
## 364 2010-12-19 - 2010-12-25 3.3818486 0.67463479
## 365 2010-12-26 - 2011-01-01 3.4317231 0.80610890
## 366 2011-01-02 - 2011-01-08 2.6915111 0.75962815
## 367 2011-01-09 - 2011-01-15 2.9106289 0.56706507
## 368 2011-01-16 - 2011-01-22 3.4923189 0.51925631
## 369 2011-01-23 - 2011-01-29 4.0036963 0.57237716
## 370 2011-01-30 - 2011-02-05 4.4353368 0.53652058
## 371 2011-02-06 - 2011-02-12 4.2421482 0.57104914
## 372 2011-02-13 - 2011-02-19 4.3971861 0.56308101
## 373 2011-02-20 - 2011-02-26 3.9025565 0.50597610
## 374 2011-02-27 - 2011-03-05 3.1507275 0.47941567
## 375 2011-03-06 - 2011-03-12 2.7242234 0.42363878
## 376 2011-03-13 - 2011-03-19 2.3333563 0.36122178
## 377 2011-03-20 - 2011-03-26 1.9250003 0.35192563
## 378 2011-03-27 - 2011-04-02 1.7524260 0.32934927
## 379 2011-04-03 - 2011-04-09 1.5770365 0.30013280
## 380 2011-04-10 - 2011-04-16 1.3576558 0.29216467
## 381 2011-04-17 - 2011-04-23 1.3122310 0.28552457
## 382 2011-04-24 - 2011-04-30 1.1493747 0.27490040
## 383 2011-05-01 - 2011-05-07 1.1145057 0.25232404
## 384 2011-05-08 - 2011-05-14 1.1098449 0.25498008
## 385 2011-05-15 - 2011-05-21 1.0524026 0.26427623
## 386 2011-05-22 - 2011-05-28 1.0353647 0.26294821
## 387 2011-05-29 - 2011-06-04 1.1177658 0.23373174
## 388 2011-06-05 - 2011-06-11 0.9829495 0.22974768
## 389 2011-06-12 - 2011-06-18 0.9251944 0.21248340
## 390 2011-06-19 - 2011-06-25 0.8355311 0.20185923
## 391 2011-06-26 - 2011-07-02 0.8323927 0.18459495
## 392 2011-07-03 - 2011-07-09 0.8555910 0.18990704
## 393 2011-07-10 - 2011-07-16 0.7069494 0.19389110
## 394 2011-07-17 - 2011-07-23 0.6943868 0.21513944
## 395 2011-07-24 - 2011-07-30 0.6879762 0.21381142
## 396 2011-07-31 - 2011-08-06 0.6447430 0.20584329
## 397 2011-08-07 - 2011-08-13 0.6753299 0.21646746
## 398 2011-08-14 - 2011-08-20 0.7282297 0.22177955
## 399 2011-08-21 - 2011-08-27 0.8065263 0.21513944
## 400 2011-08-28 - 2011-09-03 0.8604084 0.24966799
## 401 2011-09-04 - 2011-09-10 0.9360754 0.28154050
## 402 2011-09-11 - 2011-09-17 0.9666827 0.30942895
## 403 2011-09-18 - 2011-09-24 0.9960071 0.37051793
## 404 2011-09-25 - 2011-10-01 1.1084635 0.41035857
## 405 2011-10-02 - 2011-10-08 1.2030858 0.39442231
## 406 2011-10-09 - 2011-10-15 1.2369566 0.37715803
## 407 2011-10-16 - 2011-10-22 1.2525865 0.39176627
## 408 2011-10-23 - 2011-10-29 1.3054612 0.39840637
## 409 2011-10-30 - 2011-11-05 1.4528432 0.40504648
## 410 2011-11-06 - 2011-11-12 1.4408922 0.42496680
## 411 2011-11-13 - 2011-11-19 1.4622115 0.45551129
## 412 2011-11-20 - 2011-11-26 1.6554147 0.41301461
## 413 2011-11-27 - 2011-12-03 1.4657230 0.47808765
## 414 2011-12-04 - 2011-12-10 1.5181061 0.46480744
## 415 2011-12-11 - 2011-12-17 1.6639544 0.47941567
## 416 2011-12-18 - 2011-12-24 1.8527356 0.53784860
## 417 2011-12-25 - 2011-12-31 2.1241299 0.61885790
summary(FluTrain$ILI)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.5341 0.9025 1.2530 1.6770 2.0590 7.6190
str(FluTrain)
## 'data.frame': 417 obs. of 3 variables:
## $ Week : Factor w/ 417 levels "2004-01-04 - 2004-01-10",..: 1 2 3 4 5 6 7 8 9 10 ...
## $ ILI : num 2.42 1.81 1.71 1.54 1.44 ...
## $ Queries: num 0.238 0.22 0.226 0.238 0.224 ...
FluTrain[which.max(FluTrain$ILI),]
## Week ILI Queries
## 303 2009-10-18 - 2009-10-24 7.618892 1
Looking at the time period 2004-2011, which week corresponds to the highest percentage of ILI-related physician visits?
The week of 2009-10-18 - 2009-10-24 (row 303), with an ILI value of about 7.62.
Which week corresponds to the highest fraction of ILI-related queries?
FluTrain[which.max(FluTrain$Queries),]
## Week ILI Queries
## 303 2009-10-18 - 2009-10-24 7.618892 1
hist(FluTrain$ILI)
Most of the ILI values are small, with a relatively small number of much larger values (in statistics, this sort of data is called “skew right”).
When handling a skewed dependent variable, it is often useful to predict the logarithm of the dependent variable instead of the dependent variable itself – this prevents the small number of unusually large or small observations from having an undue influence on the sum of squared errors of predictive models. In this problem, we will predict the natural log of the ILI variable, which can be computed in R using the log() function.
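As a rough illustration of this point, the sketch below compares the sample skewness of a few ILI values before and after taking logs. The twelve values are copied by hand from the FluTrain listing above (the selection is ours, chosen to include some peak weeks), and the helper skewness() is a hypothetical function defined here, not part of base R.

```r
# Hand-picked ILI values from the FluTrain listing (illustrative subset only).
ili <- c(0.5622412, 0.6610222, 0.7414530, 0.8026098, 0.9244187,
         1.0369770, 1.3242740, 1.8090560, 2.4183312, 4.7519706,
         6.8152222, 7.6188921)

# Hypothetical helper: moment-based sample skewness (third central moment
# divided by the 1.5 power of the second central moment).
skewness <- function(x) {
  d <- x - mean(x)
  mean(d^3) / mean(d^2)^1.5
}

skew_raw <- skewness(ili)       # skewness on the original scale
skew_log <- skewness(log(ili))  # skewness after the log transform

# The log-transformed values are less right-skewed than the raw values.
c(raw = skew_raw, log = skew_log)
```

On this subset the log transform pulls the few large values closer to the rest, which is the behavior the paragraph above relies on.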
plot(log(FluTrain$ILI),FluTrain$Queries)
There is a positive, linear relationship between log(ILI) and Queries.
A plausible linear regression equation is therefore log(ILI) = intercept + coefficient x Queries, where the coefficient is positive.
RegModel1 = lm(log(ILI) ~ Queries, data = FluTrain)
summary(RegModel1)
##
## Call:
## lm(formula = log(ILI) ~ Queries, data = FluTrain)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.76003 -0.19696 -0.01657 0.18685 1.06450
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -0.49934 0.03041 -16.42 <2e-16 ***
## Queries 2.96129 0.09312 31.80 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.2995 on 415 degrees of freedom
## Multiple R-squared: 0.709, Adjusted R-squared: 0.7083
## F-statistic: 1011 on 1 and 415 DF, p-value: < 2.2e-16
From summary(RegModel1), we read that the R-squared value is 0.709.
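cor(log(FluTrain$ILI), FluTrain$Queries)
## [1] 0.8420333
Here Correlation is the correlation between the dependent variable log(ILI) and Queries. The values of the three expressions are then:
Correlation^2 = 0.7090201
log(1/Correlation) = 0.1719357
exp(-0.5*Correlation) = 0.6563792
It appears that Correlation^2 is equal to the R-squared value. It can be proved that this is always the case for a linear regression with a single independent variable.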
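The link between R-squared and the squared correlation can be checked numerically on a small example. The sketch below uses only the first ten rows of FluTrain, copied from the listing above, so the numbers differ from the full-data model; the variable names are ours.

```r
# First ten weeks of FluTrain, copied from the listing above.
flu <- data.frame(
  ILI = c(2.4183312, 1.8090560, 1.7120239, 1.5424951, 1.4378683,
          1.3242740, 1.3072567, 1.0369770, 1.0103204, 1.0524925),
  Queries = c(0.23771580, 0.22045153, 0.22576361, 0.23771580, 0.22443559,
              0.20717131, 0.24169987, 0.21646746, 0.22576361, 0.19920319)
)

# Single-predictor regression of log(ILI) on Queries.
fit <- lm(log(ILI) ~ Queries, data = flu)
r_squared <- summary(fit)$r.squared

# Squared correlation between the dependent variable, log(ILI), and Queries.
cor_squared <- cor(log(flu$ILI), flu$Queries)^2

# The two quantities agree up to floating-point error.
all.equal(r_squared, cor_squared)
```

Note that the correlation has to be taken with log(ILI), the variable the model actually predicts, rather than with the raw ILI values.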
The csv file FluTest.csv provides the 2012 weekly data of the ILI-related search queries and the observed weekly percentage of ILI-related physician visits.
Normally, we would obtain test-set predictions from the model RegModel1 using the code PredTest1 = predict(RegModel1, newdata=FluTest). However, the dependent variable in our model is log(ILI), so PredTest1 would contain predictions of the log(ILI) value. We are instead interested in predictions of the ILI value, which we can obtain by exponentiating the log(ILI) predictions with the exp() function. The new code, which predicts the ILI value, is:
FluTest = read.csv("FluTest.csv")
PredTest1 = exp(predict(RegModel1, newdata = FluTest))
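To make the back-transformation concrete, here is a minimal sketch that does not need the CSV files. The training rows are weeks 53-60 of the FluTrain listing; the two Queries values in new_weeks are hypothetical and only serve to show the mechanics.

```r
# Weeks 53-60 of FluTrain, copied from the listing above.
train <- data.frame(
  ILI = c(2.1764491, 2.2017121, 2.5301211, 3.0652381,
          3.9806083, 4.5956803, 4.7519706, 4.1796206),
  Queries = c(0.45418327, 0.35458167, 0.38911023, 0.42363878,
              0.51527225, 0.59893758, 0.42762284, 0.42231076)
)

fit <- lm(log(ILI) ~ Queries, data = train)

# Hypothetical new observations (not taken from FluTest).
new_weeks <- data.frame(Queries = c(0.40, 0.55))

# predict() returns values on the scale of the response, here log(ILI).
pred_log <- predict(fit, newdata = new_weeks)

# exp() maps the predictions back to the ILI scale.
pred_ili <- exp(pred_log)
pred_ili
```

Because exp() is always positive, the back-transformed predictions can never be negative, which is a convenient property for a percentage such as ILI.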
What is our estimate for the percentage of ILI-related physician visits for the week of March 11, 2012?
which(FluTest$Week == "2012-03-11 - 2012-03-17")
## [1] 11
PredTest1[11]
## 11
## 2.187378
What is the relative error between the estimate (our prediction) and the observed value for the week of March 11, 2012?
FluTest$ILI[11]
## [1] 2.293422
(Observed ILI - Estimated ILI)/Observed ILI = (2.293422 - 2.187378)/2.293422 = 0.04624
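The same arithmetic can be done in R. The sketch below hard-codes the two values printed above so that it runs on its own; with the data loaded, one would use FluTest$ILI[11] and PredTest1[11] instead.

```r
observed  <- 2.293422  # FluTest$ILI[11], as printed above
estimated <- 2.187378  # PredTest1[11], as printed above

# Relative error of the estimate with respect to the observed value.
relative_error <- (observed - estimated) / observed
round(relative_error, 5)
```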
What is the Root Mean Square Error (RMSE) between our estimates and the actual observations for the percentage of ILI-related physician visits, on the test set?
SSE = sum((PredTest1 - FluTest$ILI)^2)
SSE
## [1] 29.17708
RMSE = sqrt(SSE/nrow(FluTest))
RMSE
## [1] 0.7490645
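Since the RMSE is computed again later for the second model, it can be convenient to wrap the calculation in a small function. The helper rmse() below is a hypothetical name of ours, and the input vectors are toy values chosen only to exercise it.

```r
# Hypothetical helper: root mean squared error between predictions and observations.
# sqrt(mean(err^2)) is the same quantity as sqrt(SSE / n).
rmse <- function(predicted, observed) {
  stopifnot(length(predicted) == length(observed))
  sqrt(mean((predicted - observed)^2))
}

# Toy example: one prediction is off by 2, the others are exact.
toy_rmse <- rmse(c(1, 2, 3), c(1, 2, 5))
toy_rmse
```

With the data loaded, rmse(PredTest1, FluTest$ILI) would give the same number as the SSE-based code above.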
The observations in this dataset are consecutive weekly measurements of the dependent and independent variables. This sort of dataset is called a “time series.” Often, statistical models can be improved by predicting the current value of the dependent variable using the value of the dependent variable from earlier weeks. In our models, this means we will predict the ILI variable in the current week using values of the ILI variable from previous weeks.
First, we need to decide the amount of time to lag the observations. Because the ILI variable is reported with a 1- or 2-week lag, a decision maker cannot rely on the previous week’s ILI value to predict the current week’s value. Instead, the decision maker will only have data available from 2 or more weeks ago. We will build a variable called ILILag2 that contains the ILI value from 2 weeks before the current observation.
install.packages("zoo", repos = "http://cran.us.r-project.org")
## Installing package into 'C:/Users/aaror/Documents/R/win-library/3.3'
## (as 'lib' is unspecified)
## package 'zoo' successfully unpacked and MD5 sums checked
##
## The downloaded binary packages are in
## C:\Users\aaror\AppData\Local\Temp\RtmpGm8PCl\downloaded_packages
library(zoo)
## Warning: package 'zoo' was built under R version 3.3.3
##
## Attaching package: 'zoo'
## The following objects are masked from 'package:base':
##
## as.Date, as.Date.numeric
In the commands below, the value of -2 passed to lag means to return the observation from 2 periods before the current one; a positive value would have returned future observations. The parameter na.pad=TRUE means to add missing values for the first two weeks of our dataset, where we can’t compute the data from 2 weeks earlier.
ILILag2 = lag(zoo(FluTrain$ILI),-2, na.pad = TRUE)
FluTrain$ILILag2 = coredata(ILILag2)
summary(FluTrain$ILILag2)
## Min. 1st Qu. Median Mean 3rd Qu. Max. NA's
## 0.5341 0.9010 1.2520 1.6750 2.0580 7.6190 2
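For readers who prefer not to depend on the zoo package, the two-week lag can also be sketched in base R. The function lag2() below is a hypothetical helper of ours; it should produce the same values as the zoo-based code above, namely two leading NAs followed by the series shifted by two positions.

```r
# Hypothetical helper: shift a numeric vector back by two positions, padding with NA.
lag2 <- function(x) {
  n <- length(x)
  if (n <= 2) return(rep(NA_real_, n))
  c(NA, NA, x[seq_len(n - 2)])
}

# First five ILI values of FluTrain, copied from the listing above.
ili <- c(2.4183312, 1.8090560, 1.7120239, 1.5424951, 1.4378683)

ili_lag2 <- lag2(ili)
ili_lag2
```

The third element of ili_lag2 is the first ILI value, the fourth is the second, and so on, which is what the ILILag2 variable is meant to hold.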
plot(FluTrain$ILI,FluTrain$ILILag2)
We now train a linear regression model on the FluTrain dataset to predict the log of the ILI variable using the Queries variable as well as the log of the ILILag2 variable. We call this model FluTrend2.
FluTrend2 = lm(log(ILI) ~ log(ILILag2) + Queries, data = FluTrain )
summary(FluTrend2)
##
## Call:
## lm(formula = log(ILI) ~ log(ILILag2) + Queries, data = FluTrain)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.52209 -0.11082 -0.01819 0.08143 0.76785
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -0.24064 0.01953 -12.32 <2e-16 ***
## log(ILILag2) 0.65569 0.02251 29.14 <2e-16 ***
## Queries 1.25578 0.07910 15.88 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.1703 on 412 degrees of freedom
## (2 observations deleted due to missingness)
## Multiple R-squared: 0.9063, Adjusted R-squared: 0.9059
## F-statistic: 1993 on 2 and 412 DF, p-value: < 2.2e-16
Moving from RegModel1 to FluTrend2, in-sample R^2 improved from 0.709 to 0.9063, and the new variable is highly significant. There is therefore no sign of overfitting, and FluTrend2 is superior to RegModel1 on the training set.
ILILag2 = lag(zoo(FluTest$ILI),-2, na.pad = TRUE)
FluTest$ILILag2 = coredata(ILILag2)
summary(FluTest$ILILag2)
## Min. 1st Qu. Median Mean 3rd Qu. Max. NA's
## 0.9018 1.1360 1.3410 1.5190 1.7610 3.6000 2
In this problem, the training and testing sets are split sequentially – the training set contains all observations from 2004-2011 and the testing set contains all observations from 2012. There is no time gap between the two datasets, meaning the first observation in FluTest was recorded one week after the last observation in FluTrain. From this, we can identify how to fill in the missing values for the ILILag2 variable in FluTest.
Which value should be used to fill in the ILILag2 variable for the first observation in FluTest? Answer: the ILI value of the second-to-last observation in the FluTrain data frame.
Which value should be used to fill in the ILILag2 variable for the second observation in FluTest? Answer: the ILI value of the last observation in the FluTrain data frame.
FluTest$ILILag2[1] = FluTrain$ILI[416]
FluTest$ILILag2[2] = FluTrain$ILI[417]
FluTest$ILILag2[1]
## [1] 1.852736
FluTest$ILILag2[2]
## [1] 2.12413
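The hard-coded indices 416 and 417 work because FluTrain has 417 rows. A variant that does not depend on the row count is sketched below using tail(). The training values are the last four ILI values from the FluTrain listing; the test values are hypothetical placeholders, not taken from FluTest.

```r
# Last four ILI values of FluTrain, copied from the listing above.
train_ili <- c(1.5181061, 1.6639544, 1.8527356, 2.1241299)

# Hypothetical test-set ILI values (placeholders for illustration only).
test_ili <- c(1.8, 1.6, 1.5, 1.4)

# Two-week lag within the test set: the first two entries are unknown.
test_lag2 <- c(NA, NA, test_ili[seq_len(length(test_ili) - 2)])

# Fill the gap with the last two training observations, in order.
test_lag2[1:2] <- tail(train_ili, 2)
test_lag2
```

This only makes sense because the two sets are consecutive in time, as noted above.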
PredTest2 = exp(predict(FluTrend2, newdata = FluTest))
SSE = sum((PredTest2 - FluTest$ILI)^2)
SSE
## [1] 4.500877
RMSE = sqrt(SSE/nrow(FluTest))
RMSE
## [1] 0.2942029
The test-set RMSE of FluTrend2 is 0.294, as opposed to the 0.749 value obtained by the RegModel1 model. In this problem, we used a simple time series model with a single lag term. ARIMA models are a more general form of the model we built, which can include multiple lag terms as well as more complicated combinations of previous values of the dependent variable.
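As a pointer for further exploration, the sketch below fits a small autoregressive model with the arima() function from R's built-in stats package. The series is simulated with arima.sim() and merely stands in for a variable such as log(ILI); it is not flu data, and the order c(2, 0, 0) is an arbitrary choice for illustration rather than a recommendation for this problem.

```r
set.seed(1)

# Simulated stationary AR(2) series (a stand-in, not real flu data).
y <- arima.sim(model = list(ar = c(0.6, 0.2)), n = 200)

# Fit an ARIMA(2, 0, 0) model: two lag terms, no differencing, no moving-average terms.
fit <- arima(y, order = c(2, 0, 0))
coef(fit)

# Forecast the next two periods; $pred holds the point forecasts.
fc <- predict(fit, n.ahead = 2)
fc$pred
```

Unlike FluTrend2, this model uses only past values of the series itself and no external predictor such as Queries. In addition, it uses the value from one period back, which for ILI would not be available in time because of the reporting lag.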