1) Program Overview

The purpose of this program is to append new temperature files to existing temperature time series files. The program is designed to handle a variety of different file naming conventions, file types, column/row formats, data types, and column names.

1.1) R Project File Contents

Inputs: Folder where the program creates and stores copies of the new data files and combined data files that you select in the dialogue box. Working with copies safeguards against corrupting or losing the older versions of those data archived in the Data Base.

Module Compendiums: Folder housing compilations of scripts that contain the functions used by the MAIN.R script. The compendium used by the current MAIN.R script is listed at the start of the script; earlier compendiums were used by older versions of MAIN.R.

Old Scripts: Archive folder for old versions of MAIN.R and the module scripts. These scripts are not used by the program.

Operational Files: Folder containing any files used by the MAIN.R script or any of the modules. These include (but are not limited to) the .xlsx files containing the metadata about each logger in the PGR network.

Outputs: Folder that can optionally be used to output the new combined files so they can be checked for accuracy before they are put into the Data Base.

MAIN.R: The main R script that appends the new files to the old combined data files.

Figure 1.


2) Guide Book

This section presents a step-by-step guide for users to use the program.

2.1) What file formats can be used?

Table 1 below shows the logger type/file type combinations that are supported by the program. The special pre-formatting requirements are settings that the user should double check before running the script, although they are usually already set this way by default. This table may not cover every format that comes off of the stations. If any settings are missing, let Casey know so he can update the script.

Table 1. Types of files, by logger type, that can be read by the program.

| Logger Type | File Type | Special Pre-Formatting Requirements |
|---|---|---|
| HOBO | .xlsx | There must be a single DateTime column, not divided into separate Date and Time columns. |
| HOBO bluetooth | .csv | |
| Pendant | .xlsx, .csv | There must be a single DateTime column, not divided into separate Date and Time columns. |
| LOGR | .xlsx, .csv | Of the 4 files that the LOGR outputs, use the one that contains “temp” in the name. The columns containing ground temperature data need to have “CH” somewhere in their column names (this seems to be a default). The datetimes should be text or numeric unix values; text datetimes should be in one of the following formats: mdY HM or Ymd HMS. |
| Met Station | .dat | |
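
As a rough illustration of the LOGR datetime requirement, the sketch below parses both accepted text layouts as well as numeric unix values. It assumes that the format codes in Table 1 follow lubridate's `orders` notation and that the timestamps are in UTC; `parse_logr_datetime()` is a hypothetical helper and not one of the program's own functions.

```r
library(lubridate)

# Hypothetical helper: convert LOGR datetimes, given either as text or as
# unix seconds, into POSIXct values. The UTC time zone is an assumption.
parse_logr_datetime <- function(x, tz = "UTC") {
  if (is.numeric(x)) {
    # Numeric unix values: seconds since 1970-01-01
    return(as.POSIXct(x, origin = "1970-01-01", tz = tz))
  }
  # Text values: try the two layouts listed in Table 1
  parse_date_time(x, orders = c("mdY HM", "Ymd HMS"), tz = tz)
}

parse_logr_datetime("09/15/2015 13:30")
parse_logr_datetime("2020-09-01 00:30:00")
parse_logr_datetime(1600000000)
```

If a text value matches neither layout, `parse_date_time()` returns `NA` with a warning, which is a convenient signal that the file needs reformatting before it is run through the program.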


2.2) Where should the new and combined files be before running the script?

The new data and combined data files should already be inside their appropriate Site folder in the Data Base (Fig 2). The program will make copies of the selected new and combined data files and place them in the Inputs folder of the R Project folder (Fig. 1).

Figure X.X.


2.3) How should my new data and combined data files be named before running the program?

The new files can keep the original names that the logger gave them. Table 2 summarizes the possible formats for the combined file names:

Table 2. Combined file name formats.

| Data Type | Format | Example |
|---|---|---|
| Air | aircombined_{borehole}_{monthYear} | aircombined_BH11a_Sept2015 |
| Internal Surface | surfaceIntcombined_{borehole}_{monthYear} | surfaceIntcombined_BH24_Apr2019 |
| External Surface | surfaceExtcombined_{borehole}_{monthYear} | surfaceExtcombined_BH05_Jun2021 |
| Ground | groundcombined_{borehole}_{monthYear} | groundcombined_BH13_Sept2020 |

The month and year at the end of the combined file names only help us, as users, know when the file was last updated. They are not required for the program to function.
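
The naming scheme in Table 2 can be sketched in a few lines of base R. Both helper functions below are hypothetical and only illustrate how a name is assembled and how the optional month-year tag can be dropped again; they are not the program's own code.

```r
# Hypothetical helper: build a combined file name (without extension)
# following the pattern {type}combined_{borehole}_{monthYear}.
make_combined_name <- function(type, borehole, month_year) {
  valid_types <- c("air", "surfaceInt", "surfaceExt", "ground")
  stopifnot(type %in% valid_types)
  paste0(type, "combined_", borehole, "_", month_year)
}

# Hypothetical helper: drop the trailing _{monthYear} tag, leaving only the
# part of the name that identifies the data type and borehole.
strip_month_year <- function(combined_name) {
  sub("_[A-Za-z]+[0-9]{4}$", "", combined_name)
}

make_combined_name("air", "BH11a", "Sept2015")
strip_month_year("groundcombined_BH13_Sept2020")
```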

2.4) What if the site is brand new and does not have any combined data yet?

The current version of the program does not create new combined files. You will need to make new, empty combined files from scratch and fill them in with the data the first time. The next version is planned to prompt the user on whether to create a new combined file for a new site (see Section 4, Upcoming Updates, for more details). For now, follow the instructions below to prepare the files for a new site so they can be used with the script in the future:

Preparing a New Site for Program Use

  1. Create new .xlsx combined files for the site.

    Check what kinds of data you have (air, internal surface, external surface, ground). For each data type, make a new .xlsx file with the appropriate combined file name (see Table 2 in Section 2.3). The column names of each of these new combined files are shown in Fig 3. You don’t need all of them, only the ones relevant to the data coming off the logger:

    Figure 3.


  2. Update the appropriate Logger_Information.xlsx file.

      The Logger_Information.xlsx files contain all of the metadata about each of the loggers for a particular region. They are found in the R project subdirectory (Fig 4):

      Data Base > R > Temperature Combining Program > Operational Files

      Figure 4.


      Make a new row in the appropriate .xlsx file for each new logger. Table 3 below summarizes what to enter into each column of a logger information sheet:

      Table 3. Information to fill into a logger information sheet.

      | Column | To Fill In |
      |---|---|
      | Raw_File_Name | The name of the file that comes off of the logger, WITHOUT the file extension (e.g. .xlsx). |
      | Logger | The type of logger. One of: HOBO, HOBObluetooth, LOGR, pendant, Met (Met basically implies CR1000x). |
      | Borehole | The simple borehole designator. |
      | Type | Type of data output by the logger. One of: air, surfaceInt, surfaceExt, ground. If there are multiple types, separate the types by a comma and NO spaces. |
      | Combined_File_Name | The name of the Excel file that contains the compiled time series of data for the logger, WITHOUT the file extension (e.g. .xlsx). See Table 2 for combined file name formats. |
      | Tandem_Loggers | Tandem loggers are loggers that are part of the same borehole or station and record the same data type (e.g. two HOBO loggers recording ground temperature at different sets of depths). If a borehole/station has tandem loggers, ENTER THE DATA TYPE (air, surfaceInt, surfaceExt, or ground); otherwise LEAVE BLANK. |
      | Column_Names | A comma-delimited list of the names of the columns, separated by a comma and NO SPACES. See Figure 3 for column name formats. |
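
To make the column rules above concrete, here is a minimal sketch of one logger information row together with a small consistency check. Everything in the example row is made up for illustration (the file name, the borehole `BH99`, and the column name `Depth_1`), and `check_logger_row()` is a hypothetical helper rather than part of the program.

```r
# Illustrative row only: the file, borehole, and column names are invented.
logger_row <- data.frame(
  Raw_File_Name      = "BH99_example_logger",
  Logger             = "HOBO",
  Borehole           = "BH99",
  Type               = "air,ground",
  Combined_File_Name = "aircombined_BH99,groundcombined_BH99",
  Tandem_Loggers     = NA_character_,
  Column_Names       = "DateTime,Air,Depth_1",
  stringsAsFactors   = FALSE
)

# Hypothetical check of the rules listed in Table 3.
check_logger_row <- function(row) {
  types <- strsplit(row$Type, ",", fixed = TRUE)[[1]]
  c(
    logger_ok = row$Logger %in% c("HOBO", "HOBObluetooth", "LOGR", "pendant", "Met"),
    types_ok  = all(types %in% c("air", "surfaceInt", "surfaceExt", "ground")),
    no_spaces = !grepl(" ", row$Type) && !grepl(" ", row$Column_Names)
  )
}

check_logger_row(logger_row)
```

A value such as `"air, ground"` (with a space after the comma) fails both the `types_ok` and `no_spaces` checks, which is the kind of entry the "NO spaces" rule is meant to prevent.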

3) MAIN.R Script

3.1) Package Installation and Importing

Imports external packages of code used by the program.

```r
# Load pacman, then use it to install (if needed) and attach every package the script uses.
library(pacman)
pacman::p_load(dplyr,writexl,clock,ggplot2,plotly,reshape2,reshape,stats,tidyverse,ggsci,
               openxlsx,tidyselect,stringr,readxl,lubridate,DT,tibble,haven,DescTools,
               imputeTS,berryFunctions,gdata,extrafont,ggpmisc,ggh4x,viridis,rlist,
               zoom,stringi,ggrepel,fabR,devtools,tcltk)
```

3.2) Module Importing

Runs the scripts within the separate module files in the **Module Compendiums** folder.
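
A minimal sketch of how a single compendium could be loaded with base R's `source()` is shown below. The helper name `load_compendium()` and its arguments are assumptions made for illustration; the actual compendium file name is the one listed at the start of MAIN.R.

```r
# Hypothetical helper: run one compendium script so that the functions it
# defines become available to the rest of the session.
load_compendium <- function(compendium_file, folder = "Module Compendiums") {
  path <- file.path(folder, compendium_file)
  if (!file.exists(path)) {
    stop("Compendium not found: ", path)
  }
  source(path)      # evaluates the script in the global environment
  invisible(path)
}

# Example call (the file name is a placeholder, not a real compendium):
# load_compendium("my_compendium.R")
```

Loading one named file, rather than every script in the folder, avoids accidentally picking up the older compendiums that are kept alongside the current one.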


3.3) Create Session Info File

Creates a table containing metadata about the current R session. This includes the directories of the combined and new data files that the user last selected, so that the next time the script is run the dialogue windows open in those directories rather than in a remote, high-level folder somewhere on the computer (this saves the user a bit of time).
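
The idea can be sketched as follows. The storage location (`session_info.rds` in a temporary folder) and both helper functions are assumptions made for illustration; only the column names `lastNewDirectory`, `lastCombDirectory`, and `lastOutputDirectory` come from the script itself.

```r
# Hypothetical location for the saved session information.
sesh_file <- file.path(tempdir(), "session_info.rds")

# Read the saved table if it exists, otherwise start with a single empty row.
load_session_info <- function(path) {
  if (file.exists(path)) {
    return(readRDS(path))
  }
  data.frame(
    lastNewDirectory    = NA_character_,
    lastCombDirectory   = NA_character_,
    lastOutputDirectory = NA_character_,
    stringsAsFactors    = FALSE
  )
}

# Pick the folder a dialogue window should open in: the most recent entry
# if it is usable, otherwise the current working directory.
start_directory <- function(sesh_info, column) {
  last_dir <- tail(sesh_info[[column]], 1)
  if (is.na(last_dir) || !dir.exists(last_dir)) getwd() else last_dir
}

sesh_info <- load_session_info(sesh_file)
start_directory(sesh_info, "lastNewDirectory")
```

After the user has chosen files, a new row can be appended to the table and written back with `saveRDS()` so the choice is remembered for the next run.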


3.4) Import Logger Information .xlsx

The user selects the region that the new logger data has come from. This imports the relevant logger information Excel file, which contains metadata about the loggers. The rest of the program uses this metadata to direct how new data are reformatted and how new and old data are combined.

3.5) Choose New Data Files

This section opens file explorer windows where the user navigates to and selects the new data and combined data files they want to combine. The program creates copies of these data files and places them in the Inputs/New Data and Inputs/Combined Data subdirectories.

Note: The dialogue window might open behind the other windows on your computer!

```r
# Open a file explorer window to choose the new data files
new_filepaths <- tk_choose.files(last(sesh_info$lastNewDirectory), caption = "Select New Files")


# If no files were selected, stop the script and tell the user they need to
# select new data files. (stop() is used because break is only valid inside a loop.)
if(length(new_filepaths) == 0) {
  stop("No new data files were selected. Exiting...")
}

# A tibble containing information about the file pathways, file names, extensions,
# roots, whether or not file name matches were found in the logger info sheet, and
# what name, if any, can replace those file names that don't have matches.
new_file_info <- tibble(
  
  new_filepaths = new_filepaths,
  
  new_filepath_roots = sapply(new_filepaths, function(x) {
    str_split(x,"/") |> 
      unlist() |> 
      head(-1) |> 
      paste0(collapse = "/")
  }) |> paste0("/"),
  
  new_filenames = new_filepaths |> basename()
  
)

# Add a "new_filename_extensions" column to "new_file_info"
new_file_info <- new_file_info |> 
  mutate(
    new_filename_extensions = sapply(new_filenames, function(x) {
      str_split(x, "\\.") |> 
        unlist() |> 
        last() 
    }),
    .after = "new_filepath_roots"
  )


# Add a "new_filenames_trunc" column to "new_file_info" that is just the "new_filenames"
# column without the .xlsx or .csv file extension. The dot is escaped and the pattern is
# anchored to the end of the name so that only a real extension is removed.
new_file_info <- new_file_info |> 
  mutate(new_filenames_trunc = new_filenames |> 
           str_remove(pattern = "\\.(xlsx|csv)$"),
         .after = "new_filename_extensions"
         )


message("\n\nProgress:  5.1) New data files chosen.")



## 5.2) SELECTING COMBINED DATA FILES----


comb_filepaths <- tk_choose.files(paste0(ifelse(is.na(last(sesh_info$lastCombDirectory)), getwd(), last(sesh_info$lastCombDirectory)),"/Outputs"), caption = "Select Combined Files")


# If no files were selected, stop the script and tell the user they need to
# select combined data files.
if(length(comb_filepaths) == 0) {
  stop("No combined data files were selected. Exiting...")
}


# Get just the name of the file
comb_filenames <- comb_filepaths |> basename()


# Remove the month and year at the end of the combined file names held in
# "comb_filenames" (the files on disk are not renamed).
# (These dates are only for our benefit when we are manually looking through the data base,
# but actually make writing this script a bit more difficult.)
comb_filenames <- gsub("_[A-Za-z]+\\d{4}(?=\\.xlsx)", "", comb_filenames, perl = TRUE)


message("\n\nProgress:  5.2) Combined data files chosen.")



## 5.3) SELECTING OUTPUT DIRECTORY----


# Open a file explorer window to choose the output directory.
# tk_choose.dir() returns NA when the dialogue is cancelled, so check for that as well.
chosen_output_directory <- tk_choose.dir(last(sesh_info$lastOutputDirectory), caption = "Select Output Directory")
if(length(chosen_output_directory) == 0 || is.na(chosen_output_directory)) {
  stop("No output directory was selected. Exiting...")
}


# Update session info
sesh_info <- sesh_info |> add_row(lastNewDirectory = new_filepaths[1] |> 
                                    str_split("/") |> 
                                    unlist() |> 
                                    head(-1) |> 
                                    paste0(collapse = "/"),
                                  lastCombDirectory = comb_filepaths[1] |> 
                                    str_split("/") |> 
                                    unlist() |> 
                                    head(-1) |> 
                                    paste0(collapse = "/"),
                                  lastOutputDirectory = chosen_output_directory)


message("\n\nProgress:  5.3) Output directory chosen.")
```

3.6) Discover New File Name Matches

This section tells the user whether matches were found between the selected new data files and the corresponding rows of metadata in the imported Logger_Information.xlsx file. Without a match, the program cannot properly manipulate the data. If any file has no match, the user is asked whether to continue processing the files for which matches were found (new data files without matches won’t be used for anything).

```r
# Flag matches between new_filenames_trunc and the Raw_File_Name column of the Logger_Information file.
new_file_info <- new_file_info |> 
  mutate(filenameMatches = ifelse(new_filenames_trunc %in% loggerInfo$Raw_File_Name, 
                                  new_filenames_trunc, 
                                  NA))

# Flag non-matches
new_file_info <- new_file_info |> 
  mutate(filenameNoMatch = ifelse(!(new_filenames_trunc %in% loggerInfo$Raw_File_Name),
                                  new_filenames_trunc,
                                  NA))


# By default, import every file whose name matched a "Raw_File_Name".
import_pathways <- new_file_info |> 
  filter(!is.na(filenameMatches)) |> 
  pull(new_filepaths)


# Count the real non-matches. The "filenameNoMatch" column has one entry per file
# (NA for matched files), so its length cannot be used to detect non-matches.
n_nomatch <- sum(!is.na(new_file_info$filenameNoMatch))


# If non-matches ("new_filenames_trunc" that don't have a corresponding "Raw_File_Name" in "loggerInfo") 
# were found, notify user and ask whether to continue or quit.
# If the user selects "y", non-matches that cannot be renamed are left out of "import_pathways".
if(n_nomatch > 0) {
  
  # Tell the user which new data file names have no match in the Raw_File_Name column of the logger information sheet
  message(paste("There are", n_nomatch,
                "new data files whose names did not match any of those listed in the 'Raw_File_Name' column in the 'Logger_Information.xlsx':"))
  print(data.frame(No_Matches = new_file_info$filenameNoMatch) |> filter(!is.na(No_Matches)))
  
  
  # Ask the user if they would like to continue with the program even though there were instances of no file name matches
  answer_nomatch <- readline("Would you like to continue? [y/n] ")
  
  
  # If the user's answer (answer_nomatch) is "n", stops the program
  if(answer_nomatch == "n") {
    
    stop("Please make appropriate file name changes to the new files or add new rows to the 'Logger_Information.xlsx' sheet before retrying.")
    
  # If the user's answer (answer_nomatch) is "y"...
  } else if (answer_nomatch == "y") {
    
    message("\n\nProgram will attempt to rename new files to something that can be used.\nIf it cannot, the file names of new data without matches will not be used for processing.\n\n")
    Sys.sleep(5)
    
    
    new_file_info <- new_file_info |> 
      mutate(replaced_names = TREAT_VAGRANT_LOGGER_NAMES(filenameNoMatch))
    
    
    # Replace the relevant "Raw_File_Name" values in "loggerInfo" with the names of the new files
    y <- sapply(filter(new_file_info, !is.na(replaced_names)) |> pull(replaced_names) , str_which, pattern = loggerInfo$Raw_File_Name)
    loggerInfo[y, "Raw_File_Name"] <- new_file_info |> filter(!is.na(replaced_names)) |> pull(new_filenames_trunc)
    rm(y)
    
    # Create new vector of pathways for use during importing. This includes any "new_filepaths"
    # associated with existing "filenameMatches" and with "replaced_names".
    import_pathways <- new_file_info |> 
      filter(!is.na(filenameMatches) | !is.na(replaced_names)) |> 
      pull(new_filepaths)
    
    
    message("Non-matched filenames and filenames that couldn't be renamed have been removed from new filenames list...")
    Sys.sleep(3)
    
    
  } else {
    
    stop("Please provide a 'y' or 'n' answer.")
    
  }
}


message("\n\nProgress:  6) Parsing filename matches complete.")
```
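
The matching logic can be tried out in isolation on a toy example. The file names below are invented, and the snippet uses only base R; it is a sketch of the idea, not a substitute for the code in MAIN.R.

```r
# Invented names standing in for the selected files and the logger sheet.
new_names   <- c("BH01_air", "BH02_ground", "mystery_file")
known_names <- c("BH01_air", "BH02_ground", "BH03_air")

# Names found in the logger information sheet, and names that are not.
matched   <- new_names[new_names %in% known_names]
unmatched <- new_names[!(new_names %in% known_names)]

# A column padded with NA for matched files (as in new_file_info) still has
# one entry per file, so non-matches are counted with sum(!is.na(...)).
no_match_col <- ifelse(new_names %in% known_names, NA, new_names)
n_unmatched  <- sum(!is.na(no_match_col))

matched
unmatched
n_unmatched
```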

3.7) Import New and Combined Data

Imports the data that the user selected.

```r
# Create a subset of the loggerInfo tibble that only contains rows corresponding to file name matches
# and create a new "Import_Pathways" column.
information <- loggerInfo |> 
  filter(Raw_File_Name %in% new_file_info$new_filenames_trunc) |> 
  mutate(Import_Pathways = import_pathways, .after = Raw_File_Name)


# Import all new data as tibbles and organize them into a list.
# The list is named after the files that were actually imported, so that the
# names still line up if some of the selected files were dropped earlier.
new_data <- lapply(information$Import_Pathways, function(filepath) READ_NEW_TEMPER_FILE(filepath) )
names(new_data) <- new_file_info$new_filenames_trunc[new_file_info$new_filepaths %in% import_pathways]


# Import all combined data as tibbles and organize them into a list
comb_data <- lapply(comb_filepaths, function(x) READ_TEMPER_FILE(x) )
names(comb_data) <- comb_filenames


message("\n\nProgress:  7) Importation complete.")


# Rearrange the rows in "information" to match the order of data files in "new_data" 
# (because for some reason it didn't do that automatically during importation)
y <- NULL
if(length(new_data) > 1) {
  for(i in new_file_info$new_filenames_trunc) {
    x <- information |> filter(Raw_File_Name == i)
    y <- bind_rows(y,x)
  }
  information <- y
  rm(x,y)
}
```

3.8) Set User-Controlled Parameters

Prompts the user to enter the maximum time gap (in hours) that should be filled with interpolated time and temperature values.

```r
crit_gap_hr <- readline("Set time gap threshold for imputation (in hours):  ") |> as.numeric()
```
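
Because `as.numeric()` silently turns anything that is not a number into `NA`, it can help to validate the answer before it is used. The helper below is a hypothetical sketch of such a check and is not part of MAIN.R.

```r
# Hypothetical helper: convert the user's answer to hours, stopping with a
# clear message if it is not a positive number.
parse_gap_hours <- function(answer) {
  hours <- suppressWarnings(as.numeric(trimws(answer)))
  if (is.na(hours) || hours <= 0) {
    stop("Please enter the gap threshold as a positive number of hours, e.g. 6.")
  }
  hours
}

parse_gap_hours(" 6 ")
```

In the script this could wrap the existing prompt, for example `crit_gap_hr <- parse_gap_hours(readline("..."))`.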

3.9) Data Cleaning

Removes outlier points in the time series. Because automated processes for identifying outliers are very complex, this version of the program only removes values > 50ºC and < -50ºC, which are too high or too low to be considered real.

```r
# For each column with temperature data (air, surface, or ground), replaces temperatures
# that are greater than 50C or less than -50C with NA.

for(i in seq_along(new_data)) {

 workDat <- new_data[[i]]

 # Subsets the names of columns in workDat containing temperature data.
 temperCols <- colnames(workDat) |> PPVO(patterns = c("Air","Surface_Ext","Surface_Int","Depth"))
 temperCols <- colnames(workDat)[temperCols]

 # For each column in "workDat" containing temperature data...
 for(ii in temperCols) {

   z <- workDat[[ii]]
   z[z > 50 | z < -50] <- NA
   workDat[[ii]] <- z

 }

 # Write the cleaned tibble back into the list; without this step the
 # cleaning would only affect the temporary copy "workDat".
 new_data[[i]] <- workDat

 print(paste0("CLEANING COMPLETE:  ",names(new_data)[i]))

}
rm(i)
```
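
The threshold rule itself can be expressed as a small vectorised function. This is a sketch in base R; `clean_temps()` is a hypothetical name and the 50 degree limit is taken from the description above.

```r
# Hypothetical helper: set readings outside [-limit, limit] to NA.
# Values exactly equal to the limit are kept, and existing NAs stay NA.
clean_temps <- function(x, limit = 50) {
  x[!is.na(x) & abs(x) > limit] <- NA
  x
}

clean_temps(c(-12.4, 85.0, NA, -50.0, -273.15))
```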

3.10) Impute Time Gaps

Fills in gaps in the DateTime and temperature data columns that were either in the data set already or were produced by the cleaning process.

```r
# HEADS UP!... the datetime columns in the "new_data" tibbles are now in UTC again.

for(i in seq_along(new_data)) {
  workDat_name <- names(new_data)[i]
  workDat <- new_data[[i]] |>
    OPEN_ROWGAPS() |> 
    DATETIME_IMPUTE() |> 
    select(!c(hr_diff,Index))
  new_data[[workDat_name]] <- workDat
  
  print(paste0("IMPUTATION COMPLETE:  ", workDat_name))
  
} 
rm(workDat, workDat_name)
```
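
The internals of `OPEN_ROWGAPS()` and `DATETIME_IMPUTE()` are not shown in this section, so the following is only a sketch of one way that gap filling with a maximum gap length could work. It assumes the series is sorted in time, already has a row (with `NA` temperature) for every missing timestamp, and contains at least two real readings; it uses linear interpolation from base R's `approx()`. The function name `impute_short_gaps()` is hypothetical.

```r
# Hypothetical sketch: linearly interpolate missing temperatures, but only
# inside gaps whose width (time between the surrounding real readings) does
# not exceed crit_gap_hr hours.
impute_short_gaps <- function(time, temp, crit_gap_hr) {
  t_num <- as.numeric(time)          # seconds since 1970-01-01
  ok    <- !is.na(temp)
  obs_t <- t_num[ok]

  # Interpolated value at every timestamp (NA outside the observed range)
  filled <- approx(x = obs_t, y = temp[ok], xout = t_num, rule = 1)$y

  # Width, in hours, of the gap surrounding each timestamp
  idx    <- findInterval(t_num, obs_t)
  prev_t <- obs_t[pmax(idx, 1)]
  next_t <- obs_t[pmin(idx + 1, length(obs_t))]
  gap_hr <- (next_t - prev_t) / 3600

  fill <- !ok & !is.na(filled) & gap_hr <= crit_gap_hr
  temp[fill] <- filled[fill]
  temp
}

time <- as.POSIXct("2020-01-01 00:00:00", tz = "UTC") + 3600 * (0:9)
temp <- c(0, NA, 2, NA, NA, NA, NA, 7, 8, NA)
impute_short_gaps(time, temp, crit_gap_hr = 3)
```

In this example the single missing hour between the first and third readings is filled, while the longer gap in the middle and the trailing missing value are left as `NA`.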

3.11) Append New Data to Combined Data Files

Appends new data tables available for each site to the relevant combined temperature data tables. For a site or borehole that has tandem (x2) ground temperature loggers, the script combines the new tandem ground temperature data together first, and then appends this combined new data to the old compiled data.

Warning! The treatment of tandem loggers is currently only valid for tandem GROUND TEMPERATURE loggers!
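
For a single data type, the appending step comes down to a full join followed by the removal of duplicated timestamps. The toy example below sketches this with dplyr on invented air temperature values; the data and object names are made up for illustration.

```r
library(dplyr)

# Invented example data: the two tables overlap at 01:00.
old_air <- data.frame(
  DateTime = as.POSIXct(c("2021-06-01 00:00:00", "2021-06-01 01:00:00"), tz = "UTC"),
  Air      = c(-5.1, -4.8)
)
new_air <- data.frame(
  DateTime = as.POSIXct(c("2021-06-01 01:00:00", "2021-06-01 02:00:00"), tz = "UTC"),
  Air      = c(-4.8, -4.2)
)

# Join on the shared columns, then keep one row per timestamp.
appended <- full_join(old_air, new_air, by = c("DateTime", "Air")) |>
  distinct(DateTime, .keep_all = TRUE) |>
  arrange(DateTime)

appended
```

Because `full_join()` lists the rows of the existing combined table first and `distinct()` keeps the first row it sees for each timestamp, an overlapping timestamp with conflicting values keeps the value that was already in the combined file.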

# Creates a copy of the list "comb_data" that new versions of the combined data tibbles
# will be assigned to.
new_comb_data <- comb_data



# Get the levels of the "Borehole" column in "information"
boreholes <- information$Borehole |> 
  as.factor() |> 
  levels() |> 
  as.character()


# For each borehole in "information"...
for(b in boreholes) {
  
  
  # Isolate rows in "information" pertaining to the site
  workInfoBorehole <- information |> filter(Borehole == b)
  
  
  # If the borehole doesn't have tandem ground temperature loggers...
  if (sum(!is.na(workInfoBorehole$Tandem_Loggers)) == 0) {
    
    
    message(paste0("No tandem ground temperature data for ", b, ". Using nominal appending method."))
    
    
    # For each logger at the borehole...
    for(rawname in workInfoBorehole$Raw_File_Name) {
      
      
      # Isolate the data types collected by that logger
      workTypes <- workInfoBorehole |> 
        filter(Raw_File_Name == rawname) |> 
        select(Type) |> 
        unlist() |> 
        lapply(function(x) strsplit(x, ",")) |> 
        unlist() |> 
        unique()
      
      
      # Isolate the working new data tibble
      workNew <- new_data[[rawname]]
      
      
      # For each working data type...
      for(t in workTypes) {
        
        
        
        
        # Subset the new data tibble with respect to the relevant columns to the data type
        if(t == "air") {
          workNew_subset <- workNew |> select(DateTime, Air)
        } else if(t == "surfaceInt") {
          workNew_subset <- workNew |> select(DateTime, Surface_Int)
        } else if(t == "surfaceExt") {
          workNew_subset <- workNew |> select(DateTime, Surface_Ext)
        } else if(t == "ground") {
          workNew_subset <- workNew |> select(DateTime, contains("Depth"))
        }
        
        
       
        
        # Gets the working combined file name
        workComb_filename <- workInfoBorehole |> 
          filter(Raw_File_Name == rawname) |> 
          select(Combined_File_Name) |>
          as.character() |> 
          lapply(function(x) strsplit(x, ",") |> unlist()) |> 
          unlist() |> 
          str_subset(t) |> 
          paste0(".xlsx")
          
        
        
        
        # Isolates the working combined data tibble
        workComb <- comb_data[[workComb_filename]]
        
        
        
        
        # Join the subset new data tibble to the isolated combined tibble
        comb_data[[workComb_filename]] <- workComb |> full_join(workNew_subset)
        comb_data[[workComb_filename]] <- comb_data[[workComb_filename]] |>
          distinct(DateTime, .keep_all = TRUE)
        
        
        
      }
      
      
    }
    
    
    
    
    
    
  } else if(sum(!is.na(workInfoBorehole$Tandem_Loggers)) > 0) {
    
    
    message(paste0("Tandem ground temperature data appending method initiated for ", b, "."))
    
    
    # The raw file names for loggers containing tandem data
    tandem_raw_filenames <- workInfoBorehole |> 
      filter(!is.na(Tandem_Loggers)) |> 
      select(Raw_File_Name) |> 
      unlist()
    
    if(length(tandem_raw_filenames) < 2) {
      stop(paste0("Only one tandem file was detected for ", b, ". Two tandem files are needed to ",
                  "finish appending. Please manually append these data instead."))
    }
    
    
    
    # Create a new temporary tibble by adding the columns of both tandem files together
    to_append <- full_join(new_data[[ tandem_raw_filenames[1] ]], new_data[[ tandem_raw_filenames[2] ]])
    
    
    
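    # Append the temporary tibble "to_append" to the combined ground file.
    # Empty combined tibbles are first given a POSIXct DateTime column so the
    # join keys have matching types. The list is indexed by name because
    # assigning to the loop variable alone would not modify comb_data.
    for(nm in names(comb_data)) {
      if(nrow(comb_data[[nm]]) == 0) {
        comb_data[[nm]]$DateTime <- as.POSIXct(comb_data[[nm]]$DateTime)
      }
    }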
    
    appended_ground <- comb_data[[paste0("groundcombined_", b, ".xlsx")]] |>
      full_join(y = to_append) |> 
      select(starts_with("DateTime") | starts_with("Depth"))
    comb_data[[paste0("groundcombined_", b, ".xlsx")]] <- appended_ground
    comb_data[[paste0("groundcombined_", b, ".xlsx")]] <- comb_data[[paste0("groundcombined_", b, ".xlsx")]] |> 
      distinct(DateTime, .keep_all = TRUE)
    
    # If a "aircombined" file exists, append the temporary tibble "to_append" to 
    # the "aircombined" data
    if(sum(str_starts(names(comb_data), "air")) > 0) {
      appended_air <- comb_data[[paste0("aircombined_", b, ".xlsx")]] |> 
        full_join(y = to_append) |> 
        select(starts_with("DateTime") | starts_with("Air")) 
      comb_data[[paste0("aircombined_", b, ".xlsx")]] <- appended_air
      comb_data[[paste0("aircombined_", b, ".xlsx")]] <- comb_data[[paste0("aircombined_", b, ".xlsx")]] |> 
        distinct(DateTime, .keep_all = TRUE)
    }
    
    # If a "surfaceInt" file exists, append the temporary tibble "to_append" to 
    # the "surfaceInt" data
    if(sum(str_starts(names(comb_data), "surfaceInt")) > 0) {
      appended_surfaceInt <- comb_data[[paste0("surfaceIntcombined_", b, ".xlsx")]] |> 
        full_join(y = to_append) |> 
        select(starts_with("DateTime") | starts_with("Surface_Int"))
      comb_data[[paste0("surfaceIntcombined_", b, ".xlsx")]] <- appended_surfaceInt
      comb_data[[paste0("surfaceIntcombined_", b, ".xlsx")]] <- comb_data[[paste0("surfaceIntcombined_", b, ".xlsx")]] |> 
        distinct(DateTime, .keep_all = TRUE)
    }
    
    # If a "surfaceExt" file exists, append the temporary tibble "to_append" to 
    # the "surfaceExt" data
    if(sum(str_starts(names(comb_data), "surfaceExt")) > 0) {
      appended_surfaceExt <- comb_data[[paste0("surfaceExtcombined_", b, ".xlsx")]] |> 
        full_join(y = to_append) |> 
        select(starts_with("DateTime") | starts_with("Surface_Ext"))
      comb_data[[paste0("surfaceExtcombined_", b, ".xlsx")]] <- appended_surfaceExt
      comb_data[[paste0("surfaceExtcombined_", b, ".xlsx")]] <- comb_data[[paste0("surfaceExtcombined_", b, ".xlsx")]] |> 
        distinct(DateTime, .keep_all = TRUE)
    }
    
    
    
    
    
    
  }
  
  print(paste0("APPENDING COMPLETE:  ", b))
  
}
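The tandem branch above amounts to a full join of the two loggers' tables on DateTime, followed by dropping repeated timestamps. The sketch below illustrates that idea in base R, with `merge()` standing in for dplyr's `full_join()` and `duplicated()` standing in for `distinct()`. The logger tables, depth column names, and readings are made up for illustration and are not taken from any logger information sheet.

```r
# Two hypothetical tandem loggers at one borehole, recording different depths
logger_a <- data.frame(
  DateTime  = as.POSIXct(c("2021-06-01 00:00", "2021-06-01 01:00"), tz = "UTC"),
  Depth_0.5 = c(-1.2, -1.1)
)
logger_b <- data.frame(
  DateTime  = as.POSIXct(c("2021-06-01 01:00", "2021-06-01 02:00"), tz = "UTC"),
  Depth_2.0 = c(-3.4, -3.3)
)

# Full outer join on DateTime: keep every timestamp seen by either logger
to_append <- merge(logger_a, logger_b, by = "DateTime", all = TRUE)

# Keep the first row for each timestamp (base-R stand-in for distinct())
to_append <- to_append[!duplicated(to_append$DateTime), ]
```

Timestamps recorded by only one logger end up with NA in the other logger's depth columns, which is the expected result of a full join.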

3.12) Data Exporting

Exports the data to the directory selected by the user in the dialogue window. The names of the exported combined data files do NOT include the month and year of the update; the user must add these manually. Doing this automatically is an aim for a future update.

for(i in seq_along(comb_data)) {
  
  forexport <- comb_data[[i]]
  name <- names(comb_data)[i] |> 
    strsplit("\\.") |> 
    unlist() %>%
    .[1]
  write_xlsx(forexport, paste0(chosen_output_directory,"/", name, ".xlsx"))
  
  
  print(paste0("EXPORTATION COMPLETE:  ", name))

}
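Until the month and year are added automatically, a small helper could build the suffix from a date. This is only a sketch of one possible approach, not the planned implementation. The function name `add_month_year` is hypothetical, and it uses base R's `month.abb`, which gives "Sep" rather than the "Sept" spelling seen in some existing file names.

```r
# Hypothetical helper: append a {monthYear} suffix to a combined file name.
# month.abb holds base R's English month abbreviations ("Sep", not "Sept").
add_month_year <- function(name, date = Sys.Date()) {
  lt <- as.POSIXlt(date)
  paste0(name, "_", month.abb[lt$mon + 1], lt$year + 1900)
}

add_month_year("groundcombined_BH13", as.Date("2020-09-15"))
```

Because the month names come from `month.abb`, the result does not depend on the computer's locale settings.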

4) Upcoming Updates

---
title: "TEMPERATURE DATA COMBINING PROGRAM"
output:
  html_notebook: default
  word_document: default
---

# Table of Contents

> -   [1) Program Overview]
>     -   [1.1) R Project File Contents]
> -   [2) Guide Book]
>     -   [2.1) What file formats can be used?]
>     -   [2.2) Where should the new and combined files be before running the script?]
>     -   [2.3) How should my new data and combined data files be named before running the program?]
>     -   [2.4) What if the site is brand new and does not have any combined data yet?]
>         -   [Preparing a New Site for Program Use]
> -   [3) MAIN.R Script]
>     -   [3.1) Package Installation and Importing]
>     -   [3.2) Module Importing]
>     -   [3.3) Create Session Info File]
>     -   [3.4) Import Logger Information .xlsx]
>     -   [3.5) Choose New Data Files]
>     -   [3.6) Discover New File Name Matches]
>     -   [3.7) Import New and Combined Data]
>     -   [3.8) Set User-Controlled Parameters]
>     -   [3.9) Data Cleaning]
>     -   [3.10) Impute Time Gaps]
>     -   [3.11) Append New Data to Combined Data Files]
>     -   [3.12) Data Exporting]
> -   [4) Upcoming Updates]

# 1) Program Overview

The purpose of this program is to append new temperature files to existing temperature time series files. The program is designed to handle a variety of different file naming conventions, file types, column/row formats, data types, and column names.

## 1.1) R Project File Contents

[***Inputs***]{.underline} ***—*** Folder where the program will create and store copies of the new data files and combined data files that you select with the dialogue box. Working with copies of the new and combined data safeguards against corrupting or losing older versions of those data archived in the Data Base.

[***Module Compendiums***]{.underline} — Folder where compilations of other scripts that contain functions used in the MAIN.R script are housed. The compendium used by the MAIN.R script is listed at the start of the script. Previous compendiums were used in older versions of MAIN.R.

[***Old Scripts***]{.underline} — Folder that's just an archive of old versions of the MAIN.R and module scripts. These scripts aren't used for anything.

[***Operational Files***]{.underline} — Folder containing any files that are used by the MAIN.R script or any of the modules. These include (but are not limited to) the .xlsx files containing all the metadata about each logger in the PGR network.

[***Outputs***]{.underline} — A folder that can (optionally) be used by the user to output the new combined files so they can be checked for accuracy before putting them into the Data Base.

[***MAIN.R***]{.underline} — The main R script that appends the new files to the old combined data files.

![Figure 1.](Operational%20Files/RMD%20Document%20Images/Program%20Overview/R%20Project%20Folder.png)

<br>

# 2) Guide Book

This section presents a step-by-step guide for users to use the program.

## 2.1) What file formats can be used?

Table 1 below shows the logger/file type combinations that are supported by the program. The special pre-formatting requirements are settings that the user should double-check before running the script, although they are usually already set this way by default. This table may not cover all of the possible formats that come off of the stations. If any settings are missing, let Casey know so he can update the script.

+----------------+-----------+------------------------------------------------------------------------------------------------------------------------------------+
| Logger Type    | File Type | Special Pre-Formatting Requirements                                                                                                |
+================+===========+====================================================================================================================================+
| HOBO           | -   .xlsx | -   There must be a single DateTime column, not divided into separate Date and Time columns.                                       |
+----------------+-----------+------------------------------------------------------------------------------------------------------------------------------------+
| HOBO bluetooth | -   .csv  |                                                                                                                                    |
+----------------+-----------+------------------------------------------------------------------------------------------------------------------------------------+
| Pendant        | -   .xlsx | -   There must be a single DateTime column, not divided into separate Date and Time columns.                                       |
|                |           |                                                                                                                                    |
|                | -   .csv  |                                                                                                                                    |
+----------------+-----------+------------------------------------------------------------------------------------------------------------------------------------+
| LOGR           | -   .xlsx | -   Of the 4 files that the LOGR outputs, you should use the one that contains "temp" in the name.                                 |
|                |           | -   The columns containing ground temperature data need to have "CH" somewhere in their column names (this seems to be a default). |
|                | -   .csv  | -   The datetimes should be text or numeric unix values.                                                                           |
|                |           |     -   If the datetimes values are text, they should be in one of the following formats:                                          |
|                |           |                                                                                                                                    |
|                |           |         -   mdY HM                                                                                                                 |
|                |           |                                                                                                                                    |
|                |           |         -   Ymd HMS                                                                                                                |
+----------------+-----------+------------------------------------------------------------------------------------------------------------------------------------+
| Met Station    | -   .dat  |                                                                                                                                    |
+----------------+-----------+------------------------------------------------------------------------------------------------------------------------------------+

: Table 1. Types of files, by logger type, that can be read by the program.
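As an illustration of the LOGR datetime requirement in Table 1, the sketch below converts numeric unix values or text in either accepted order into POSIXct. It is a simplified example, not the parser used by the program. The function name `parse_logr_datetime`, the "/" and "-" separators, and the UTC time zone are all assumptions made for the example.

```r
# Hypothetical parser for LOGR datetimes (assumed separators and time zone)
parse_logr_datetime <- function(x, tz = "UTC") {
  # Numeric values are treated as unix seconds since 1970-01-01
  if (is.numeric(x)) {
    return(as.POSIXct(x, origin = "1970-01-01", tz = tz))
  }
  # Text values: try "mdY HM" first, then "Ymd HMS"
  formats <- c("%m/%d/%Y %H:%M", "%Y-%m-%d %H:%M:%S")
  out <- as.POSIXct(rep(NA_real_, length(x)), origin = "1970-01-01", tz = tz)
  for (fmt in formats) {
    todo <- is.na(out)  # only re-try entries that have not parsed yet
    out[todo] <- as.POSIXct(strptime(x[todo], fmt, tz = tz))
  }
  out
}

parse_logr_datetime(c("09/15/2015 14:30", "2015-09-15 14:30:00"))
```

Entries that match neither format are left as NA, which makes malformed rows easy to find before appending.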

<br>

## 2.2) Where should the new and combined files be before running the script?

The new data and combined data files should already be inside their appropriate Site folder in the Data Base (Fig. 2). The program will make copies of the selected new and combined data and place them in the Inputs folder of the R Project folder (Fig. 1).

![Figure 2.](Operational%20Files/RMD%20Document%20Images/Guide%20Book/Data%20Base%20Organization.png)

<br>

## 2.3) How should my new data and combined data files be named before running the program?

The new files can keep the original names that the logger gave them. Table 2 summarizes the possible formats for the combined file names:

| Data Type | Format | Example |
|----|----|----|
| Air | aircombined\_{borehole}\_{monthYear} | aircombined_BH11a_Sept2015 |
| Internal Surface | surfaceIntcombined\_{borehole}\_{monthYear} | surfaceIntcombined_BH24_Apr2019 |
| External Surface | surfaceExtcombined\_{borehole}\_{monthYear} | surfaceExtcombined_BH05_Jun2021 |
| Ground | groundcombined\_{borehole}\_{monthYear} | groundcombined_BH13_Sept2020 |

: Table 2. Combined file name formats.

The month and year at the end of a combined file name only help us as users know when that file was last updated; they are not required for the program to function.
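To make the naming convention concrete, the sketch below strips the optional month and year from a combined file name and then splits the remainder into its data type and borehole. The regular expression for the date suffix is the one MAIN.R uses when combined files are selected. The helper names `strip_month_year` and `parse_combined_name`, and the ".xlsx" extension on the example name, are assumptions made for this example.

```r
# Strip an optional "_{monthYear}" suffix, e.g. "_Sept2015", before ".xlsx"
strip_month_year <- function(filename) {
  gsub("_[A-Za-z]+\\d{4}(?=\\.xlsx)", "", filename, perl = TRUE)
}

# Split a date-free name such as "aircombined_BH11a.xlsx" into its parts
parse_combined_name <- function(filename) {
  pattern <- "^(air|surfaceInt|surfaceExt|ground)combined_(.+)\\.xlsx$"
  m <- regmatches(filename, regexec(pattern, filename))[[1]]
  if (length(m) == 0) return(NULL)  # name does not follow Table 2
  list(type = m[2], borehole = m[3])
}

parse_combined_name(strip_month_year("aircombined_BH11a_Sept2015.xlsx"))
```

A name that does not follow Table 2 returns NULL, which can be used to warn the user before any data are combined.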

## 2.4) What if the site is brand new and does not have any combined data yet?

The current version of the program does not create new combined files. You will need to make new, empty combined files from scratch and fill them in with the data the first time. The next version will prompt the user whether to create a new combined file for a new site (see *Section 4) Upcoming Updates* for more details). For now, follow the instructions below to prepare the files for a new site so they can be used with the script in the future:

> ### Preparing a New Site for Program Use
>
> 1.  ***Create new .xlsx combined files for the site.***
>
>     Check what kinds of data you have (air, internal surface, external surface, ground). For each data type, make a new .xlsx file with the appropriate combined data name (see Table 2 in Section 2.3). The column names of each of these new combined files are shown in Fig. 3. You don't need all of them, only the ones relevant to the data coming off the logger:
>
>     ![Figure 3.](Operational%20Files/RMD%20Document%20Images/Guide%20Book/combined%20file%20table%20formats.png)
>
>     <br>
>
>     1.  ***Update the appropriate Logger_Information.xlsx file***
>
>         The Logger_Information.xlsx files contain all of the metadata about each of the loggers for a particular region. They are found in the R project subdirectory (Fig. 4):
>
>         **Data Base \> R \> Temperature Combining Program \> Operational Files**
>
>         ![Figure 4.](Operational%20Files/RMD%20Document%20Images/Guide%20Book/Updating%20Logger_Info%20(3).png)
>
>         <br>
>
>         Make a new row in the appropriate .xlsx file for each new logger. Table 3 below summarizes what to enter into each column in a logger information sheet:
>
>         +--------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
>         | Column             | To Fill In                                                                                                                                                                                         |
>         +====================+====================================================================================================================================================================================================+
>         | Raw_File_Name      | -   The name of the file that comes off of the logger.                                                                                                                                             |
>         |                    |                                                                                                                                                                                                    |
>         |                    | -   WITHOUT the file extension (e.g. .xlsx)                                                                                                                                                        |
>         +--------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
>         | Logger             | -   The type of logger. One of:                                                                                                                                                                    |
>         |                    |                                                                                                                                                                                                    |
>         |                    |     -   HOBO                                                                                                                                                                                       |
>         |                    |                                                                                                                                                                                                    |
>         |                    |     -   HOBObluetooth                                                                                                                                                                              |
>         |                    |                                                                                                                                                                                                    |
>         |                    |     -   LOGR                                                                                                                                                                                       |
>         |                    |                                                                                                                                                                                                    |
>         |                    |     -   pendant                                                                                                                                                                                    |
>         |                    |                                                                                                                                                                                                    |
>         |                    |     -   Met                                                                                                                                                                                        |
>         |                    |                                                                                                                                                                                                    |
>         |                    | -   (Met basically implies CR1000x)                                                                                                                                                                |
>         +--------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
>         | Borehole           | -   The simple borehole designator                                                                                                                                                                 |
>         +--------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
>         | Type               | -   Type of data output by the logger                                                                                                                                                              |
>         |                    |                                                                                                                                                                                                    |
>         |                    | -   One of:                                                                                                                                                                                        |
>         |                    |                                                                                                                                                                                                    |
>         |                    |     -   air                                                                                                                                                                                        |
>         |                    |                                                                                                                                                                                                    |
>         |                    |     -   surfaceInt                                                                                                                                                                                 |
>         |                    |                                                                                                                                                                                                    |
>         |                    |     -   surfaceExt                                                                                                                                                                                 |
>         |                    |                                                                                                                                                                                                    |
>         |                    |     -   ground                                                                                                                                                                                     |
>         |                    |                                                                                                                                                                                                    |
>         |                    | -   If there are multiple types, separate the types by a comma and NO spaces                                                                                                                       |
>         +--------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
>         | Combined_File_Name | -   The name of the excel file that contains the compiled time series of data for the logger                                                                                                       |
>         |                    |                                                                                                                                                                                                    |
>         |                    | -   WITHOUT the file extension (e.g. .xlsx)                                                                                                                                                        |
>         |                    |                                                                                                                                                                                                    |
>         |                    | -   *See **Table 2** for combined file name formats*                                                                                                                                               |
>         +--------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
>         | Tandem_Loggers     | -   Tandem loggers are loggers that are part of the same borehole or station that are recording the same data type (e.g. two HOBO loggers recording ground temperature at different sets of depth) |
>         |                    |                                                                                                                                                                                                    |
>         |                    | -   If a borehole/station has tandem loggers, ENTER THE DATA TYPE (air, surfaceInt, surfaceExt, or ground)                                                                                         |
>         |                    |                                                                                                                                                                                                    |
>         |                    | -   ELSE, LEAVE BLANK                                                                                                                                                                              |
>         +--------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
>         | Column_Names       | -   A comma-delineated list of the names of the columns.                                                                                                                                           |
>         |                    |                                                                                                                                                                                                    |
>         |                    | -   Column names are separated by a comma and NO SPACES                                                                                                                                            |
>         |                    |                                                                                                                                                                                                    |
>         |                    | -   *See Figure 3 for column name formats*                                                                                                                                                         |
>         +--------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
>
>         : Table 3. Information to fill into a logger information sheet.
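As a rough illustration of Table 3, the sketch below builds one made-up logger information row and shows why the comma-separated fields must not contain spaces. Every value in the row (file name, borehole, column names) is hypothetical. The matching step uses base R's `grep()` as a stand-in for the `str_subset()` call in MAIN.R.

```r
# A made-up logger information row following the columns in Table 3
logger_row <- data.frame(
  Raw_File_Name      = "BH99_logger1",
  Logger             = "HOBO",
  Borehole           = "BH99",
  Type               = "air,ground",
  Combined_File_Name = "aircombined_BH99,groundcombined_BH99",
  Tandem_Loggers     = NA_character_,
  Column_Names       = "DateTime,Air,Depth_0.5,Depth_1.0",
  stringsAsFactors   = FALSE
)

# Comma-separated fields split cleanly only when they contain no spaces
types      <- strsplit(logger_row$Type, ",")[[1]]
comb_files <- strsplit(logger_row$Combined_File_Name, ",")[[1]]

# Pick the combined file whose name contains each data type
comb_for_type <- vapply(
  types,
  function(t) grep(t, comb_files, value = TRUE, fixed = TRUE),
  character(1)
)
```

If `Type` were entered as "air, ground", the second type would become " ground" with a leading space, which would not equal any of the type names the program tests for.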

# 3) MAIN.R Script

## 3.1) Package Installation and Importing

Imports external packages of code used by the program.

```{r}
# Install pacman if it is missing, then use it to install/load everything else
if (!requireNamespace("pacman", quietly = TRUE)) install.packages("pacman")
pacman::p_load(dplyr,writexl,clock,ggplot2,plotly,reshape2,reshape,stats,tidyverse,ggsci,
               openxlsx,tidyselect,stringr,readxl,lubridate,DT,tibble,haven,DescTools,
               imputeTS,berryFunctions,gdata,extrafont,ggpmisc,ggh4x,viridis,rlist,
               zoom,stringi,ggrepel,fabR,devtools,tcltk)
```

## 3.2) Module Importing

Runs the scripts within the separate module files in the **Module Compendiums** folder.

```{r}
source("Module Compendiums/Compendium 2/CUSTOM FUNCTIONS.R")
source("Module Compendiums/Compendium 2/READ_NEW_TEMPER_FILE.R")
```

## 3.3) Create Session Info File

Creates a table containing metadata about the current R session. This includes the directories of the last combined and new data files selected by the user, so that when the script is run again, the dialogue windows open to those directories rather than to a remote, high-level folder somewhere on the computer (this saves the user a bit of time).

```{r}
if(!exists("sesh_info")) {
  sesh_info <- tibble(lastNewDirectory = getwd(),
                         lastCombDirectory = NA,
                         lastOutputDirectory = NA)
}
```
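The directories stored in `sesh_info` start out as NA until the user has picked something, so any code that reads them needs a fallback. The sketch below is one way to do that. The helper name `last_known_dir` is hypothetical, and unlike `last()` in MAIN.R it skips missing entries instead of returning the final value as-is.

```r
# Hypothetical helper: most recent non-missing directory, else a fallback
last_known_dir <- function(dirs, fallback = getwd()) {
  dirs <- dirs[!is.na(dirs)]
  if (length(dirs) > 0) dirs[length(dirs)] else fallback
}

last_known_dir(c(NA, "C:/Data Base/Site A", NA))  # most recent real entry
last_known_dir(NA)                                # falls back to getwd()
```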

## 3.4) Import Logger Information .xlsx

The user selects the region that the new logger data has come from. This imports the relevant logger information Excel file, which contains metadata about the loggers. The rest of the program uses this metadata to direct how new data are reformatted and how new and old data are combined.

```{r}
# User selects the region
regions <- c("Alaska Highway", "Dempster Highway", "Klondike Highway","Region Not Listed")
regions_tib <- tibble(Index = seq_along(regions), Region = regions)
print(regions_tib)
selectedRegion <- readline("Select region (choose index number):  ") |> as.numeric()
selectedRegion <- regions_tib[[selectedRegion, 2]]

# Import the logger information .xlsx sheet for the selected region
if(selectedRegion == "Alaska Highway") {
  loggerInfo <- read_xlsx("Operational Files/AH_Logger_Information.xlsx")
} else if (selectedRegion == "Dempster Highway") {
  loggerInfo <- read_xlsx("Operational Files/DH_Logger_Information.xlsx")
} else if (selectedRegion == "Klondike Highway") {
  loggerInfo <- read_xlsx("Operational Files/KH_Logger_Information.xlsx")
} else {
  # "Region Not Listed" has no logger information sheet to import
  stop("No logger information file is available for the selected region.")
}
```
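As more regions are added, the chain of `if`/`else if` tests could be replaced by a named lookup. The sketch below shows one way this might look, using the three file paths from the chunk above. The names `logger_info_files` and `logger_info_path` are hypothetical and are not part of MAIN.R.

```r
# Named lookup from region to its logger information file
logger_info_files <- c(
  "Alaska Highway"   = "Operational Files/AH_Logger_Information.xlsx",
  "Dempster Highway" = "Operational Files/DH_Logger_Information.xlsx",
  "Klondike Highway" = "Operational Files/KH_Logger_Information.xlsx"
)

# Hypothetical helper: return the file path, or stop with a clear message
logger_info_path <- function(region) {
  if (!region %in% names(logger_info_files)) {
    stop("No logger information file is defined for region: ", region)
  }
  unname(logger_info_files[region])
}

logger_info_path("Dempster Highway")
```

Adding a region then means adding one line to the lookup, and an unlisted region fails straight away with a readable message.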

## 3.5) Choose New Data Files

This section opens file explorer windows in which the user navigates to and selects the new data and combined data files they want to combine. The program creates copies of these data files and places them in the **Inputs/New Data** and **Inputs/Combined Data** subdirectories.

[**Note**]{.underline} **—** The dialogue window might pop up behind all the other open windows on your computer!
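The copying step described above can be pictured with the sketch below, which copies a set of selected files into a subdirectory of an Inputs folder. It is an illustration only, not the program's own copying code. The helper name `copy_to_inputs` is hypothetical, and the demonstration works in a temporary directory so that no real data are touched.

```r
# Hypothetical helper: copy selected files into a subdirectory of Inputs
copy_to_inputs <- function(filepaths, subdir, inputs_root = "Inputs") {
  dest <- file.path(inputs_root, subdir)
  dir.create(dest, recursive = TRUE, showWarnings = FALSE)
  copied <- file.copy(filepaths, dest, overwrite = TRUE)
  if (!all(copied)) {
    warning("Some files could not be copied: ",
            paste(basename(filepaths[!copied]), collapse = ", "))
  }
  # Return the paths of the copies so later steps read those, not the originals
  file.path(dest, basename(filepaths))
}

# Demonstration with a temporary file standing in for a logger download
demo_root <- file.path(tempdir(), "Inputs")
demo_file <- tempfile(fileext = ".csv")
writeLines("DateTime,Air", demo_file)
copies <- copy_to_inputs(demo_file, "New Data", inputs_root = demo_root)
```

Working only on the returned copies is what keeps the originals in the Data Base unchanged.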

```{r}
# Open a windows explorer window to choose the new data files
new_filepaths <- tk_choose.files(last(sesh_info$lastNewDirectory), caption = "Select New Files")


# If there aren't any character strings in "new_filepaths", stops the script and
# tells the user they need to select new data files.
if(length(new_filepaths) == 0) {
  # stop() halts the script; break is only valid inside a loop
  stop("No new data files were selected. Exiting...")
}

# A tibble containing information about the file pathways, file names, extensions,
# roots, whether or not file name matches were found in the logger info sheet, and
# what name, if any, can replace those file names that don't have matches.
new_file_info <- tibble(
  
  new_filepaths = new_filepaths,
  
  new_filepath_roots = sapply(new_filepaths, function(x) {
    str_split(x,"/") |> 
      unlist() |> 
      head(-1) |> 
      paste0(collapse = "/")
  }) |> paste0("/"),
  
  new_filenames = new_filepaths |> basename(),
  
)

# Add a "new_filename_extensions" column to "new_file_info"
new_file_info <- new_file_info |> 
  mutate(
    new_filename_extensions = sapply(new_filenames, function(x) {
      str_split(x, "\\.") |> 
        unlist() |> 
        last() 
    }),
    .after = "new_filepath_roots"
  )


# Add a "new_filenames_trunc" column to "new_file_info" that is just the "new_filenames"
# column without the file extensions
new_file_info <- new_file_info |> 
  mutate(new_filenames_trunc = new_filenames |> 
           str_replace_all(pattern = "\\.(xlsx|csv)$", replacement = ""),
         .after = "new_filename_extensions"
         )
  


message("\n\nProgress:  5.1) New data files chosen.")



## 5.2) SELECTING COMBINED DATA FILES----



# comb_filepaths <- tk_choose.files(paste0(getwd(),"/Outputs"), caption = "Select Combined Files")
comb_filepaths <- tk_choose.files(paste0(ifelse(is.na(last(sesh_info$lastCombDirectory)), getwd(), last(sesh_info$lastCombDirectory)),"/Outputs"), caption = "Select Combined Files")


# If there aren't any character strings in "comb_filepaths", stops the script and
# tells the user they need to select combined data files.
# (stop() is used because break is only valid inside a loop.)
if(length(comb_filepaths) == 0) {
  stop("No combined data files were selected. Exiting...", call. = FALSE)
}


# Get just the name of the file
comb_filenames <- comb_filepaths |> basename()


# Removes the month and year at the end of the combined file names.
# Only the names stored in "comb_filenames" are changed; the files on disk keep their names.
# (These dates are only for our benefit when we are manually looking through the data base,
# but actually make writing this script a bit more difficult.)
# gsub() is vectorized, so no loop is needed.
comb_filenames <- gsub("_[A-Za-z]+\\d{4}(?=\\.xlsx)", "", comb_filenames, perl = TRUE)



message("\n\nProgress:  5.2) Combined data files chosen.")



## 5.3) SELECTING OUTPUT DIRECTORY----


# Open windows explorer window to choose the output directory
chosen_output_directory <- tk_choose.dir(last(sesh_info$lastOutputDirectory), caption = "Select Output Directory")
# tk_choose.dir() returns NA (not a zero-length vector) when the dialogue is cancelled
if(length(chosen_output_directory) == 0 || is.na(chosen_output_directory)) {
  stop("No output directory was selected. Exiting...", call. = FALSE)
}


# Update session info
sesh_info <- sesh_info |> add_row(lastNewDirectory = new_filepaths[1] |> 
                                    str_split("/") |> 
                                    unlist() |> 
                                    head(-1) |> 
                                    paste0(collapse = "/"),
                                  lastCombDirectory = comb_filepaths[1] |> 
                                    str_split("/") |> 
                                    unlist() |> 
                                    head(-1) |> 
                                    paste0(collapse = "/"),
                                  lastOutputDirectory = chosen_output_directory)




message("\n\nProgress:  5.3) Output directory chosen.")
```
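
The file-name handling in this section can also be sketched with base R helpers. This is a minimal illustration, not the program's own code: the paths and file names below are hypothetical, and the `path_info` and `comb_clean` objects exist only for this example.

```r
# Hypothetical paths, for illustration only
paths <- c("Inputs/New Data/BH01_ground.csv", "Inputs/New Data/BH02_air.xlsx")

path_info <- data.frame(
  filepath  = paths,
  root      = paste0(dirname(paths), "/"),                 # folder, with trailing slash
  filename  = basename(paths),                             # file name with extension
  extension = tools::file_ext(paths),                      # extension without the dot
  trunc     = tools::file_path_sans_ext(basename(paths)),  # file name without extension
  stringsAsFactors = FALSE
)

# Strip a trailing "_<Month><Year>" tag that sits directly before ".xlsx".
# Names without such a tag are left unchanged.
comb_names <- c("groundcombined_BH01_Jan2024.xlsx", "aircombined_BH01.xlsx")
comb_clean <- gsub("_[A-Za-z]+\\d{4}(?=\\.xlsx)", "", comb_names, perl = TRUE)
```

`tools::file_path_sans_ext()` removes any extension, whereas the chunk above removes only `.xlsx` and `.csv`, so the two approaches differ for other file types.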

## 3.6) Discover New File Name Matches

This section tells the user whether matches were found between the selected new data files and the corresponding rows of metadata in the imported Logger_Information.xlsx file. Without a match, the program cannot properly manipulate the data. If any file has no match, the user is asked whether to continue processing the files for which matches were found. If the user continues, the program tries to rename the unmatched files to something usable, and new data files that still have no match are not used for anything.

```{r}
# Gets matches between new_filenames_trunc and the Raw File Names in the Logger_Information file.
new_file_info <- new_file_info |> 
  mutate(filenameMatches = ifelse(new_filenames_trunc %in% loggerInfo$Raw_File_Name, 
                                  new_filenames_trunc, 
                                  NA))

# Get non-matches
new_file_info <- new_file_info |> 
  mutate(filenameNoMatch = ifelse(!(new_filenames_trunc %in% loggerInfo$Raw_File_Name),
                                  new_filenames_trunc,
                                  NA))


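# Default import pathways: every new data file whose name already has a match.
# (Overwritten below if the user chooses to continue with unmatched files.)
import_pathways <- new_file_info |> 
  filter(!is.na(filenameMatches)) |> 
  pull(new_filepaths)

# If non-matches ("new_filenames_trunc" that don't have a corresponding "Raw_File_Name" in "loggerInfo") 
# were found, notify user and ask whether to continue or quit.
# If the user selects "y", non-matches are removed from both new_filenames and new_filenames_trunc
# "filenameNoMatch" is NA for matched files, so only the non-NA entries are counted.
if(sum(!is.na(new_file_info$filenameNoMatch)) > 0) {
  
  # Tell the user there are no matches between the new data file names and the Raw_File_Name column in the logger information sheet
  message(paste("There are",sum(!is.na(new_file_info$filenameNoMatch)),
                "new data files whose names did not match any of those listed in the 'Raw_File_Name' column in the 'Logger_Information.xlsx':"))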
  print(data.frame(No_Matches = new_file_info$filenameNoMatch) |> filter(!is.na(No_Matches)))
  
  
  # Ask the user if they would like to continue with the program even though there were instances of no file name matches
  answer_nomatch <- readline("Would you like to continue?[y/n]")
  
  
  # If the user's answer (answer_nomatch) is "n", stops the program
  if(answer_nomatch == "n") {
    
    
    stop("Please make appropriate file name changes to the new files or add new rows to the 'Logger_Information.xlsx' sheet before retrying.")
    
    
  # If the user's answer (answer_nomatch) is "y"...
  } else if (answer_nomatch == "y") {
    
    
    message("\n\nProgram will attempt to rename new files to something that can be used.\nIf it cannot, the new data files without matches will not be used for processing.\n\n")
    Sys.sleep(5)
    
    
    new_file_info <- new_file_info |> 
      mutate(replaced_names = TREAT_VAGRANT_LOGGER_NAMES(filenameNoMatch))
    
    
    
    # Replace the relevant "Raw_File_Name" values in "loggerInfo" to their replacement names
    y = sapply(filter(new_file_info, !is.na(replaced_names)) |> pull(replaced_names) , str_which, pattern = loggerInfo$Raw_File_Name)
    loggerInfo[y, "Raw_File_Name"] <- new_file_info |> filter(!is.na(replaced_names)) |> pull(new_filenames_trunc)
    rm(y)
    
    # Create new vector of pathways for use during importing. This includes any "new_pathways"
    # associated with existing "filenameMatches" and with "replaced_names".
    import_pathways <<- new_file_info |> 
      filter(!is.na(filenameMatches) | !is.na(replaced_names)) |> 
      pull(new_filepaths)
    
    
    message("Non-matched filenames and filenames that couldn't be renamed have been removed from new filenames list...")
    Sys.sleep(3)
    
    
  } else {
    
    stop("Please provide a 'y' or 'n' answer.")
    
  }
}



message("\n\nProgress:  6) Parsing filename matches complete.")
```
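
The matching logic can be illustrated on a small example. This is a sketch with made-up file names and a made-up metadata table; only the `Raw_File_Name` column name comes from the program.

```r
# Hypothetical file names and metadata, for illustration only
new_names   <- c("BH01_ground", "BH02_air", "mystery_logger")
logger_info <- data.frame(
  Raw_File_Name = c("BH02_air", "BH01_ground", "BH03_ground"),
  stringsAsFactors = FALSE
)

# TRUE where a new file name appears in the metadata sheet
is_match  <- new_names %in% logger_info$Raw_File_Name
matched   <- new_names[is_match]
unmatched <- new_names[!is_match]

# Count only the names that really have no match
n_unmatched <- length(unmatched)

if (n_unmatched > 0) {
  message(n_unmatched, " new data file name(s) had no match: ",
          paste(unmatched, collapse = ", "))
}
```

Keeping matched and unmatched names in separate vectors avoids counting `NA` placeholders as non-matches.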

## 3.7) Import New and Combined Data

Imports the data that the user selected.

```{r}
# Create a subset of the loggerInfo tibble that only contains rows corresponding to file name matches
# and create a new "Import_Pathways" column.
information <- loggerInfo |> 
  filter(Raw_File_Name %in% new_file_info$new_filenames_trunc) |> 
  mutate(Import_Pathways = import_pathways, .after = Raw_File_Name)


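# Import all new data as tibbles and organize them into a list
new_data <- lapply(information$Import_Pathways, function(filepath) READ_NEW_TEMPER_FILE(filepath) )
# Name each tibble after the file it was read from. Matching on the file path keeps the
# names aligned with the data even when unmatched files were dropped from "import_pathways".
names(new_data) <- new_file_info$new_filenames_trunc[match(information$Import_Pathways, new_file_info$new_filepaths)]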


#Import all combined data as tibbles and organize them into a list
comb_data <- lapply(comb_filepaths, function(x) READ_TEMPER_FILE(x) )
names(comb_data) <- comb_filenames



message("\n\nProgress:  7) Importation complete.")




# Rearrange the rows in "information" to match the order of data files in "new_data".
# (filter() keeps the row order of "loggerInfo", not the order in which the files were selected.)
information <- information |> 
  arrange(match(Raw_File_Name, new_file_info$new_filenames_trunc))
```
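
Reordering a metadata table so that its rows follow the order of the imported files can be done with `match()`. The sketch below uses a made-up table and a made-up import order; `info`, `import_order`, and `info_aligned` are names used only here.

```r
# Hypothetical metadata and import order, for illustration only
info <- data.frame(
  Raw_File_Name = c("BH01_ground", "BH02_air", "BH03_ground"),
  Borehole      = c("BH01", "BH02", "BH03"),
  stringsAsFactors = FALSE
)
import_order <- c("BH03_ground", "BH01_ground", "BH02_air")

# match() returns, for each imported file, the row of "info" that describes it
info_aligned <- info[match(import_order, info$Raw_File_Name), ]
rownames(info_aligned) <- NULL
```

This assumes every imported file has exactly one row in the metadata table; a file with no row would produce a row of `NA` values.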

## 3.8) Set User-Controlled Parameters

Prompts the user to enter the maximum time gap, in hours, that will be filled with interpolated time and temperature values.

```{r}
crit_gap_hr <- readline("Set time gap threshold for imputation (in hours):  ") |> as.numeric()
```
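
The prompt above accepts any text, so a typo produces `NA`. One way to guard against that is sketched below. `parse_gap_hours()` is a hypothetical helper that is not part of the program, and the rule that the threshold must be a positive number is an assumption.

```r
# Hypothetical helper: converts the text typed at the prompt into a positive number
parse_gap_hours <- function(answer) {
  value <- suppressWarnings(as.numeric(trimws(answer)))
  # Assumption: the threshold must be a positive number of hours
  if (is.na(value) || value <= 0) {
    stop("Please enter the gap threshold as a positive number of hours.", call. = FALSE)
  }
  value
}

# Possible use in an interactive session:
# crit_gap_hr <- parse_gap_hours(readline("Set time gap threshold for imputation (in hours):  "))
```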

## 3.9) Data Cleaning

Removes outlier points from the time series. Because automated outlier detection is very complex, this version of the program only replaces values \> 50ºC and \< -50ºC with NA, since these are too high or too low to be considered real.

```{r}
# For each column with temperature data (air,surface,or ground), replaces temperatures
# that are greater than 50C or less than -50C with NA.


for(i in seq_along(new_data)) {

 workDat <- new_data[[i]]

 # Subsets the names of columns in workDat containing temperature data.
 temperCols <- colnames(workDat) |> PPVO(patterns = c("Air","Surface_Ext","Surface_Int","Depth"))
 temperCols <- colnames(workDat)[temperCols]

 # For each column in "workDat" containing temperature data...
 for(ii in temperCols) {

   z <- workDat[[ii]]
   z[which(z > 50 | z < -50)] <- NA
   workDat[[ii]] <- z

 }

 # Write the cleaned tibble back into the list; otherwise the cleaning is lost
 new_data[[i]] <- workDat
 
 print(paste0("CLEANING COMPLETE:  ",names(new_data)[i]))

}
rm(i)
```
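
The threshold rule can be tried on a small table. This sketch uses made-up values and base R's `grep()` to find the temperature columns, in place of the program's own `PPVO()` helper.

```r
# Hypothetical data, for illustration only
toy <- data.frame(
  DateTime  = as.POSIXct("2024-01-01 00:00", tz = "UTC") + 3600 * 0:3,
  Air       = c(-12.5, 999, -60, 3.1),
  Depth_0.5 = c(-1.2, -1.1, 85, NA)
)

# Columns whose names mark them as temperature data
temper_cols <- grep("Air|Surface_Ext|Surface_Int|Depth", names(toy), value = TRUE)

for (col in temper_cols) {
  values <- toy[[col]]
  # which() skips values that are already NA
  values[which(values > 50 | values < -50)] <- NA
  toy[[col]] <- values
}
```

After the loop, the 999 and -60 readings in `Air` and the 85 reading in `Depth_0.5` are `NA`, and all other values are unchanged.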

## 3.10) Impute Time Gaps

Fills in gaps in the DateTime and temperature data columns that were either in the data set already or were produced by the cleaning process.

```{r}
# HEADS UP!... the datetime columns in the "new_data" tibbles are now in UTC again.
  
for(i in 1:length(new_data)) {
  workDat_name <- names(new_data)[i]
  workDat <- new_data[[i]] |>
    OPEN_ROWGAPS() |> 
    DATETIME_IMPUTE() |> 
    select(!c(hr_diff,Index))
  new_data[[workDat_name]] <- workDat
  
  print(paste0("IMPUTATION COMPLETE:  ", workDat_name))
  
} 
rm(workDat, workDat_name)
```
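
The internals of `OPEN_ROWGAPS()` and `DATETIME_IMPUTE()` are not shown in this section. As a rough sketch of the general idea only, the example below opens hourly gaps in a made-up series and fills them by linear interpolation when the gap is no longer than a threshold. The hourly time step, the use of `approx()`, and the "less than or equal to" rule are all assumptions.

```r
crit_gap_hr <- 3  # example threshold, in hours

# Hypothetical hourly series with records missing at hours 2-3 and 6-11
t0  <- as.POSIXct("2024-01-01 00:00", tz = "UTC")
obs <- data.frame(DateTime = t0 + 3600 * c(0, 1, 4, 5, 12),
                  Air      = c(-10, -9, -6, -5, 2))

# 1) Open the gaps: one row per hour between the first and last record
full <- data.frame(DateTime = seq(min(obs$DateTime), max(obs$DateTime), by = "hour"))
full$Air <- obs$Air[match(as.numeric(full$DateTime), as.numeric(obs$DateTime))]

# 2) Interpolate linearly between the surrounding records
obs_time <- as.numeric(obs$DateTime)
interp   <- approx(x = obs_time, y = obs$Air, xout = as.numeric(full$DateTime))$y

# 3) Keep interpolated values only where the gap between the surrounding
#    records is no longer than the threshold
idx       <- findInterval(as.numeric(full$DateTime), obs_time, rightmost.closed = TRUE)
gap_hours <- diff(obs_time)[idx] / 3600
fill      <- is.na(full$Air) & gap_hours <= crit_gap_hr
full$Air[fill] <- interp[fill]
```

In this example the 3-hour gap between hours 1 and 4 is filled, while the 7-hour gap between hours 5 and 12 stays `NA`.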

## 3.11) Append New Data to Combined Data Files

Appends new data tables available for each site to the relevant combined temperature data tables. For a site or borehole that has tandem (x2) ground temperature loggers, the script combines the new tandem ground temperature data together first, and then appends this combined new data to the old compiled data.

[**Warning!**]{.underline} **—** The treatment of tandem loggers is currently only valid for tandem GROUND TEMPERATURE loggers!!

```{r}
# Creates a copy of the list "comb_data" that new versions of the combined data tibbles
# will be assigned to.
new_comb_data <- comb_data



# Get the levels of the "Borehole" column in "information"
boreholes <- information$Borehole |> 
  as.factor() |> 
  levels() |> 
  as.character()


# For each borehole in "information"...
for(b in boreholes) {
  
  
  # Isolate rows in "information" pertaining to the site
  workInfoBorehole <- information |> filter(Borehole == b)
  
  
  # If the borehole doesn't have tandem ground temperature loggers...
  if (sum(!is.na(workInfoBorehole$Tandem_Loggers)) == 0) {
    
    
    message(paste0("No tandem ground temperature data for ", b, ". Using nominal appending method."))
    
    
    # For each logger at the borehole...
    for(rawname in workInfoBorehole$Raw_File_Name) {
      
      
      # Isolate the data types collected by that logger
      workTypes <- workInfoBorehole |> 
        filter(Raw_File_Name == rawname) |> 
        select(Type) |> 
        unlist() |> 
        lapply(function(x) strsplit(x, ",")) |> 
        unlist() |> 
        unique()
      
      
      # Isolate the working new data tibble
      workNew <- new_data[[rawname]]
      
      
      # For each working data type...
      for(t in workTypes) {
        
        
        
        
        # Subset the new data tibble with respect to the relevant columns to the data type
        if(t == "air") {
          workNew_subset <- workNew |> select(DateTime, Air)
        } else if(t == "surfaceInt") {
          workNew_subset <- workNew |> select(DateTime, Surface_Int)
        } else if(t == "surfaceExt") {
          workNew_subset <- workNew |> select(DateTime, Surface_Ext)
        } else if(t == "ground") {
          workNew_subset <- workNew |> select(DateTime, contains("Depth"))
        }
        
        
       
        
        # Gets the working combined file name
        workComb_filename <- workInfoBorehole |> 
          filter(Raw_File_Name == rawname) |> 
          select(Combined_File_Name) |>
          as.character() |> 
          lapply(function(x) strsplit(x, ",") |> unlist()) |> 
          unlist() |> 
          str_subset(t) |> 
          paste0(".xlsx")
          
        
        
        
        # Isolates the working combined data tibble
        workComb <- comb_data[[workComb_filename]]
        
        
        
        
        # Join the subset new data tibble to the isolated combined tibble
        comb_data[[workComb_filename]] <- workComb |> full_join(workNew_subset)
        comb_data[[workComb_filename]] <- comb_data[[workComb_filename]] |>
          distinct(DateTime, .keep_all = TRUE)
        
        
        
      }
      
      
    }
    
    
    
    
    
    
  } else if(sum(!is.na(workInfoBorehole$Tandem_Loggers)) > 0) {
    
    
    message(paste0("Tandem ground temperature data appending method initiated for ", b, "."))
    
    
    # The raw file names for loggers containing tandem data
    tandem_raw_filenames <- workInfoBorehole |> 
      filter(!is.na(Tandem_Loggers)) |> 
      select(Raw_File_Name) |> 
      unlist()
    
    if(length(tandem_raw_filenames) < 2) {
      stop(paste0("Only one tandem file was detected for ", b, ". Two tandem files are needed to ",
                  "finish appending. Please manually append these data instead."))
    }
    
    
    
    # Create a new temporary tibble by adding the columns of both tandem files together
    to_append <- full_join(new_data[[ tandem_raw_filenames[1] ]], new_data[[ tandem_raw_filenames[2] ]])
    
    
    
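    # Append the temporary tibble "to_append" to the combined ground file
    
    # Empty combined tibbles need a date-time "DateTime" column for the joins below.
    # The result is written back by name; changing the loop variable alone would not alter "comb_data".
    for(nm in names(comb_data)) {
      if(nrow(comb_data[[nm]]) == 0) {
        comb_data[[nm]]$DateTime <- as.POSIXct(comb_data[[nm]]$DateTime)
      }
    }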
    
    appended_ground <- comb_data[[paste0("groundcombined_", b, ".xlsx")]] |>
      full_join(y = to_append) |> 
      select(starts_with("DateTime") | starts_with("Depth"))
    comb_data[[paste0("groundcombined_", b, ".xlsx")]] <- appended_ground
    comb_data[[paste0("groundcombined_", b, ".xlsx")]] <- comb_data[[paste0("groundcombined_", b, ".xlsx")]] |> 
      distinct(DateTime, .keep_all = TRUE)
    
    # If an "aircombined" file exists for this borehole, append the temporary tibble
    # "to_append" to the "aircombined" data and drop duplicated time stamps
    air_name <- paste0("aircombined_", b, ".xlsx")
    if(air_name %in% names(comb_data)) {
      comb_data[[air_name]] <- comb_data[[air_name]] |> 
        full_join(y = to_append) |> 
        select(starts_with("DateTime") | starts_with("Air")) |> 
        distinct(DateTime, .keep_all = TRUE)
    }
    
    # If a "surfaceInt" file exists for this borehole, append the temporary tibble
    # "to_append" to the "surfaceInt" data and drop duplicated time stamps
    surfaceInt_name <- paste0("surfaceIntcombined_", b, ".xlsx")
    if(surfaceInt_name %in% names(comb_data)) {
      comb_data[[surfaceInt_name]] <- comb_data[[surfaceInt_name]] |> 
        full_join(y = to_append) |> 
        select(starts_with("DateTime") | starts_with("Surface_Int")) |> 
        distinct(DateTime, .keep_all = TRUE)
    }
    
    # If a "surfaceExt" file exists for this borehole, append the temporary tibble
    # "to_append" to the "surfaceExt" data and drop duplicated time stamps
    surfaceExt_name <- paste0("surfaceExtcombined_", b, ".xlsx")
    if(surfaceExt_name %in% names(comb_data)) {
      comb_data[[surfaceExt_name]] <- comb_data[[surfaceExt_name]] |> 
        full_join(y = to_append) |> 
        select(starts_with("DateTime") | starts_with("Surface_Ext")) |> 
        distinct(DateTime, .keep_all = TRUE)
    }
    
    
    
    
    
    
  }
  
  print(paste0("APPENDING COMPLETE:  ", b))
  
}
```
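
The two appending methods can be illustrated on small tables. This is a base R sketch with made-up data, using `merge()`, `rbind()`, and `duplicated()` in place of the `full_join()` and `distinct()` calls in the chunk above. The column names `Depth_0.5` and `Depth_5` are hypothetical.

```r
t0 <- as.POSIXct("2024-01-01 00:00", tz = "UTC")

# Nominal method: stack old and new records, then keep the first row per time stamp
combined <- data.frame(DateTime = t0 + 3600 * 0:2, Air = c(-10, -9, -8))
new      <- data.frame(DateTime = t0 + 3600 * 2:4, Air = c(-8, -7, -6))

appended <- rbind(combined, new)
appended <- appended[order(appended$DateTime), ]
appended <- appended[!duplicated(appended$DateTime), ]
rownames(appended) <- NULL

# Tandem method: two loggers at one borehole record different depths, so their
# columns are first merged side by side on DateTime
shallow <- data.frame(DateTime = t0 + 3600 * 0:1, Depth_0.5 = c(-1.0, -1.1))
deep    <- data.frame(DateTime = t0 + 3600 * 1:2, Depth_5   = c(-2.0, -2.1))
tandem  <- merge(shallow, deep, by = "DateTime", all = TRUE)
```

`appended` holds one row per hour from hour 0 to hour 4. `tandem` holds three rows, with `NA` where only one of the two loggers has a record.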

## 3.12) Data Exporting

Exports the data to the directory that the user selected in the dialogue window. The names of the output combined data files DO NOT include the month and year of the update; the user needs to add these manually. Adding them automatically is an aim for a future update.

```{r}
for(i in 1:length(comb_data)) {
  
  forexport <- comb_data[[i]]
  name <- names(comb_data)[i] |> 
    strsplit("\\.") |> 
    unlist() %>%
    .[1]
  write_xlsx(forexport, paste0(chosen_output_directory,"/", name, ".xlsx"))
  
  
  print(paste0("EXPORTATION COMPLETE:  ", name))

}
```
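
One possible way to add the month and year automatically is sketched below. `add_date_suffix()` is a hypothetical helper that is not part of the program, and the `_<Mon><YYYY>` form (for example `_Mar2024`) is an assumption chosen to fit the pattern that the import step strips from combined file names.

```r
# Hypothetical helper: inserts "_<Mon><YYYY>" before the ".xlsx" extension
add_date_suffix <- function(filename, date = Sys.Date()) {
  # month.abb gives English abbreviations regardless of the system locale
  month_tag <- month.abb[as.integer(format(date, "%m"))]
  year_tag  <- format(date, "%Y")
  sub("\\.xlsx$", paste0("_", month_tag, year_tag, ".xlsx"), filename)
}

dated_name <- add_date_suffix("groundcombined_BH01.xlsx", as.Date("2024-03-15"))
```

Here `dated_name` is `"groundcombined_BH01_Mar2024.xlsx"`. A file name that does not end in `.xlsx` is returned unchanged.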

# 4) Upcoming Updates
