Part C consists of two data sets. These are simple 2 columns sets, however they have different time stamps. Your optional assignment is to time-base sequence the data and aggregate based on hour (example of what this looks like, follows). Note for multiple recordings within an hour, take the mean. Then to determine if the data is stationary and can it be forecast. If so, provide a week forward forecast and present results via Rpubs and .rmd and the forecast in an Excel readable file.

Observe The Data

wfp1<-readxl::read_excel("Waterflow_Pipe1.xlsx")
wfp2<-readxl::read_excel("Waterflow_Pipe2.xlsx")
#conver Date Time to correct format
wfp1 <- wfp1 %>% mutate(`Date Time` = as.POSIXct(`Date Time`*60*60*24, origin='1899-12-30'))
wfp2 <- wfp2 %>% mutate(`Date Time` = as.POSIXct(`Date Time`*60*60*24, origin='1899-12-30'))
## [1] "Summary of waterflow pipe 1 dataset"
##    Date Time                     WaterFlow     
##  Min.   :2015-10-22 17:24:06   Min.   : 1.067  
##  1st Qu.:2015-10-25 04:21:35   1st Qu.:13.683  
##  Median :2015-10-27 13:07:30   Median :19.880  
##  Mean   :2015-10-27 13:49:15   Mean   :19.897  
##  3rd Qu.:2015-10-30 01:24:51   3rd Qu.:26.159  
##  Max.   :2015-11-01 15:35:43   Max.   :38.913
## [1] "Structure of waterflow pipe 1 dataset"
## Classes 'tbl_df', 'tbl' and 'data.frame':    1000 obs. of  2 variables:
##  $ Date Time: POSIXct, format: "2015-10-22 17:24:06" "2015-10-22 17:40:02" ...
##  $ WaterFlow: num  23.4 28 23.1 30 6 ...
## [1] ""
paste('Summary of waterflow pipe 2 dataset')
## [1] "Summary of waterflow pipe 2 dataset"
summary(wfp2)
##    Date Time                     WaterFlow     
##  Min.   :2015-10-22 18:00:00   Min.   : 1.885  
##  1st Qu.:2015-11-02 02:45:00   1st Qu.:28.140  
##  Median :2015-11-12 12:30:00   Median :39.682  
##  Mean   :2015-11-12 12:30:00   Mean   :39.556  
##  3rd Qu.:2015-11-22 22:15:00   3rd Qu.:50.782  
##  Max.   :2015-12-03 08:00:00   Max.   :78.303
paste(' Structure of waterflow pipe 2 dataset')
## [1] " Structure of waterflow pipe 2 dataset"
str(wfp2)
## Classes 'tbl_df', 'tbl' and 'data.frame':    1000 obs. of  2 variables:
##  $ Date Time: POSIXct, format: "2015-10-22 18:00:00" "2015-10-22 18:59:59" ...
##  $ WaterFlow: num  18.8 43.1 38 36.1 31.9 ...

It looks both data sets cover ther same time period. There appears to be a lot of missing data and requires a lot of data cleaning. We will combine the two data sets and take the mean for each hour.

#combine the datasets
wfp1<-readxl::read_excel("Waterflow_Pipe1.xlsx")
wfp2<-readxl::read_excel("Waterflow_Pipe2.xlsx")

wfp <-rbind(wfp1,wfp2)
summary(wfp)
##    Date Time       WaterFlow     
##  Min.   :42300   Min.   : 1.067  
##  1st Qu.:42304   1st Qu.:17.636  
##  Median :42308   Median :26.730  
##  Mean   :42313   Mean   :29.726  
##  3rd Qu.:42321   3rd Qu.:39.665  
##  Max.   :42342   Max.   :78.303

Now that we have combined our dataset, its time to take the mean of the same hours.

wfp <- wfp %>% mutate(`Date Time` = as.POSIXct(`Date Time`*60*60*24, origin='1899-12-30'))
wfp <- wfp %>% mutate(Date = lubridate::date(`Date Time`),Hour = lubridate::hour(`Date Time`)) %>% select(Date, Hour, WaterFlow) %>% arrange(Date, Hour)
wfp <- wfp %>%group_by(Date, Hour) %>% summarize(WaterFlow = mean(WaterFlow))


head(wfp)
## # A tibble: 6 x 3
## # Groups:   Date [1]
##   Date        Hour WaterFlow
##   <date>     <int>     <dbl>
## 1 2015-10-22    17      26.1
## 2 2015-10-22    18      22.3
## 3 2015-10-22    19      15.2
## 4 2015-10-22    20      25.6
## 5 2015-10-22    21      24.7
## 6 2015-10-22    22      22.7
wfp %>%
  ggplot(aes(Hour, WaterFlow, color=as.factor(Date))) +
  geom_point() +
  geom_line(aes(group=Date)) +
  labs(color='Date')

wfp %>% group_by(Date) %>% count(Hour)
## # A tibble: 745 x 3
## # Groups:   Date [43]
##    Date        Hour     n
##    <date>     <int> <int>
##  1 2015-10-22    17     1
##  2 2015-10-22    18     1
##  3 2015-10-22    19     1
##  4 2015-10-22    20     1
##  5 2015-10-22    21     1
##  6 2015-10-22    22     1
##  7 2015-10-22    23     1
##  8 2015-10-23     0     1
##  9 2015-10-23     1     1
## 10 2015-10-23     2     1
## # ... with 735 more rows