The purpose of this code is to re-format the environmental upcasts (bottle measurements) and downcasts (continuous CTD measurements) from the WOAC cruises so that they can be added to the zooplankton abundance data. It produces a file with upcast data named WOAC_upcasts and a file with downcast data named WOAC_downcasts.

load packages

library(openxlsx)
library(dplyr)
library(lubridate)
library(readxl)

Download data from the NANOOS website: I first went to https://nvs.nanoos.org/CruiseSalish and downloaded all relevant files. I moved them to the NANOOS_files folder and unzipped them manually. I then created folders named upcasts and downcasts to hold the appropriate files.

Upcast Bottle Measurements

This code produces a dataframe named WOAC_upcasts which has all of the upcast files merged vertically with common columns.

#find current directory
pwd
## /Users/hailaschultz/Dropbox/Schultz_Dissertation/Data_Analysis/Schultz_dissertation-2/code

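move all files labeled upcast to the upcasts folder

#change to the NANOOS_files directory
cd /Users/hailaschultz/Dropbox/Schultz_Dissertation/Data_Analysis/Schultz_dissertation-2/data/NANOOS_files

#move every upcast file into upcasts/
#find also matches files that are already in upcasts/, so mv reports those as "identical" and leaves them in place
find . -name '*labupcast.xlsx' -exec mv {} ../NANOOS_files/upcasts/ \;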
## mv: ./upcasts/SalishCruise_September2021_labupcast.xlsx and ../NANOOS_files/upcasts/SalishCruise_September2021_labupcast.xlsx are identical
## mv: ./upcasts/SalishCruise_July2020_labupcast.xlsx and ../NANOOS_files/upcasts/SalishCruise_July2020_labupcast.xlsx are identical
## mv: ./upcasts/SalishCruise_April2015_labupcast.xlsx and ../NANOOS_files/upcasts/SalishCruise_April2015_labupcast.xlsx are identical
## mv: ./upcasts/SalishCruise_July2021_labupcast.xlsx and ../NANOOS_files/upcasts/SalishCruise_July2021_labupcast.xlsx are identical
## mv: ./upcasts/SalishCruise_September2020_labupcast.xlsx and ../NANOOS_files/upcasts/SalishCruise_September2020_labupcast.xlsx are identical
## mv: ./upcasts/SalishCruise_April2019_labupcast.xlsx and ../NANOOS_files/upcasts/SalishCruise_April2019_labupcast.xlsx are identical
## mv: ./upcasts/SalishCruise_September2022_labupcast.xlsx and ../NANOOS_files/upcasts/SalishCruise_September2022_labupcast.xlsx are identical
## mv: ./upcasts/SalishCruise_April2016_labupcast.xlsx and ../NANOOS_files/upcasts/SalishCruise_April2016_labupcast.xlsx are identical
## mv: ./upcasts/SalishCruise_April2017_labupcast.xlsx and ../NANOOS_files/upcasts/SalishCruise_April2017_labupcast.xlsx are identical
## mv: ./upcasts/SalishCruise_July2022_labupcast.xlsx and ../NANOOS_files/upcasts/SalishCruise_July2022_labupcast.xlsx are identical
## mv: ./upcasts/SalishCruise_April2018_labupcast.xlsx and ../NANOOS_files/upcasts/SalishCruise_April2018_labupcast.xlsx are identical
## mv: ./upcasts/SalishCruise_September2017_labupcast.xlsx and ../NANOOS_files/upcasts/SalishCruise_September2017_labupcast.xlsx are identical
## mv: ./upcasts/SalishCruise_July2019_labupcast.xlsx and ../NANOOS_files/upcasts/SalishCruise_July2019_labupcast.xlsx are identical
## mv: ./upcasts/SalishCruise_July2016_labupcast.xlsx and ../NANOOS_files/upcasts/SalishCruise_July2016_labupcast.xlsx are identical
## mv: ./upcasts/SalishCruise_April2023_labupcast.xlsx and ../NANOOS_files/upcasts/SalishCruise_April2023_labupcast.xlsx are identical
## mv: ./upcasts/SalishCruise_September2018_labupcast.xlsx and ../NANOOS_files/upcasts/SalishCruise_September2018_labupcast.xlsx are identical
## mv: ./upcasts/SalishCruise_September2019_labupcast.xlsx and ../NANOOS_files/upcasts/SalishCruise_September2019_labupcast.xlsx are identical
## mv: ./upcasts/SalishCruise_April2022_labupcast.xlsx and ../NANOOS_files/upcasts/SalishCruise_April2022_labupcast.xlsx are identical
## mv: ./upcasts/SalishCruise_July2017_labupcast.xlsx and ../NANOOS_files/upcasts/SalishCruise_July2017_labupcast.xlsx are identical
## mv: ./upcasts/SalishCruise_July2018_labupcast.xlsx and ../NANOOS_files/upcasts/SalishCruise_July2018_labupcast.xlsx are identical
## mv: ./upcasts/SalishCruise_September2016_labupcast.xlsx and ../NANOOS_files/upcasts/SalishCruise_September2016_labupcast.xlsx are identical
## mv: ./upcasts/SalishCruise_September2014_labupcast.xlsx and ../NANOOS_files/upcasts/SalishCruise_September2014_labupcast.xlsx are identical
## mv: ./upcasts/SalishCruise_July2015_labupcast.xlsx and ../NANOOS_files/upcasts/SalishCruise_July2015_labupcast.xlsx are identical
## mv: ./upcasts/SalishCruise_April2021_labupcast.xlsx and ../NANOOS_files/upcasts/SalishCruise_April2021_labupcast.xlsx are identical
## mv: ./upcasts/SalishCruise_July2014_labupcast.xlsx and ../NANOOS_files/upcasts/SalishCruise_July2014_labupcast.xlsx are identical
## mv: ./upcasts/SalishCruise_September2015_labupcast.xlsx and ../NANOOS_files/upcasts/SalishCruise_September2015_labupcast.xlsx are identical
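
The same move can be sketched in R so that files already sitting in upcasts are skipped and nothing is reported as identical. This is only a minimal sketch run against a throwaway folder in tempdir(); demo_root, the SalishCruise_demo subfolder, and the demo file names are hypothetical stand-ins, not the real cruise files.

```r
# Build a throwaway folder tree (hypothetical names, not the real NANOOS files)
demo_root <- file.path(tempdir(), "demo_nanoos")
target <- file.path(demo_root, "upcasts")
dir.create(file.path(demo_root, "SalishCruise_demo"), recursive = TRUE, showWarnings = FALSE)
dir.create(target, showWarnings = FALSE)
file.create(file.path(demo_root, "SalishCruise_demo", "demo1_labupcast.xlsx"))
file.create(file.path(target, "demo2_labupcast.xlsx"))

# Find every upcast file below demo_root
found <- list.files(demo_root, pattern = "labupcast\\.xlsx$",
                    recursive = TRUE, full.names = TRUE)

# Keep only the files that are not already in the target folder
to_move <- found[normalizePath(dirname(found)) != normalizePath(target)]

# Move them; file.rename returns TRUE for each file that was moved
moved <- file.rename(to_move, file.path(target, basename(to_move)))
```

Filtering on the parent folder before moving is the design choice here: it makes the step safe to re-run after new cruises are downloaded.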

The 2014 and 2015 upcasts were not yet in NANOOS, so I had to download them from NCEI: https://www.ncei.noaa.gov/access/ocean-carbon-acidification-data-system/oceans/SalishCruise_DataPackage.html. After downloading, I moved them directly to the upcasts folder and manually converted them from CSV to Excel.

import the excel files into R and merge into one large list

#name the directory
excel_dir<-"/Users/hailaschultz/Dropbox/Schultz_Dissertation/Data_Analysis/Schultz_dissertation-2/data/NANOOS_files/upcasts"

#get a list of file names
excel_files <- list.files(path = excel_dir, pattern = "\\.xlsx$", full.names = TRUE)

# Initialize an empty list to store the data frames
dfs <- list()

# Loop through each Excel file and read it into a data frame
for (file in excel_files) {
  # Read the Excel file into a data frame
  df <- read_excel(file)
  
  # Store the data frame in the list
  dfs[[length(dfs) + 1]] <- df
}

example of how to access dataframes

dfs[[1]]
## # A tibble: 235 × 39
##    CRUISE_ID DATE_UTC            TIME_UTC            DATE_LOCAL         
##    <chr>     <dttm>              <dttm>              <dttm>             
##  1 CAB1028   2015-04-05 00:00:00 1899-12-31 16:35:32 2015-04-05 00:00:00
##  2 CAB1028   2015-04-05 00:00:00 1899-12-31 16:36:59 2015-04-05 00:00:00
##  3 CAB1028   2015-04-05 00:00:00 1899-12-31 16:38:36 2015-04-05 00:00:00
##  4 CAB1028   2015-04-05 00:00:00 1899-12-31 16:40:28 2015-04-05 00:00:00
##  5 CAB1028   2015-04-05 00:00:00 1899-12-31 16:42:15 2015-04-05 00:00:00
##  6 CAB1028   2015-04-05 00:00:00 1899-12-31 16:43:16 2015-04-05 00:00:00
##  7 CAB1028   2015-04-05 00:00:00 1899-12-31 16:44:02 2015-04-05 00:00:00
##  8 CAB1028   2015-04-05 00:00:00 1899-12-31 16:44:47 2015-04-05 00:00:00
##  9 CAB1028   2015-04-05 00:00:00 1899-12-31 16:45:28 2015-04-05 00:00:00
## 10 CAB1028   2015-04-05 00:00:00 1899-12-31 16:46:11 2015-04-05 00:00:00
## # ℹ 225 more rows
## # ℹ 35 more variables: TIME_LOCAL <dttm>, LONGITUDE_DEC <dbl>,
## #   LATITUDE_DEC <dbl>, STATION_NO <dbl>, NISKIN_NO <dbl>, CTDPRS_DBAR <dbl>,
## #   CTDTMP_DEG_C_ITS90 <dbl>, CTDTMP_FLAG_W <dbl>, CTDSAL_PSS78 <dbl>,
## #   CTDSAL_FLAG_W <dbl>, SIGMATHETA_KG_M3 <dbl>, CTDOXY_UMOL_KG_ADJ <dbl>,
## #   CTDOXY_MG_L <dbl>, CTDOXY_FLAG_W <dbl>, OXYGEN_UMOL_KG <dbl>,
## #   OXYGEN_MG_L_1 <dbl>, OXYGEN_MG_L_2 <dbl>, OXYGEN_MG_L_3 <dbl>, …

See which columns all of the dataframes have in common

# Get column names of the first data frame
common_columns <- names(dfs[[1]])

# Loop through the remaining data frames and find common column names
for (i in 2:length(dfs)) {
  # Get column names of the current data frame
  current_columns <- names(dfs[[i]])
  
  # Find common column names with previous data frames
  common_columns <- intersect(common_columns, current_columns)
}

# 'common_columns' now contains the column names that are common across all data frames
print(common_columns)
##  [1] "CRUISE_ID"          "DATE_UTC"           "TIME_UTC"          
##  [4] "DATE_LOCAL"         "TIME_LOCAL"         "LONGITUDE_DEC"     
##  [7] "LATITUDE_DEC"       "STATION_NO"         "NISKIN_NO"         
## [10] "CTDPRS_DBAR"        "CTDTMP_DEG_C_ITS90" "CTDTMP_FLAG_W"     
## [13] "CTDSAL_PSS78"       "CTDSAL_FLAG_W"      "SIGMATHETA_KG_M3"  
## [16] "CTDOXY_FLAG_W"      "OXYGEN_UMOL_KG"     "OXYGEN_MG_L_1"     
## [19] "OXYGEN_MG_L_2"      "OXYGEN_MG_L_3"      "OXYGEN_FLAG_W"     
## [22] "TA_UMOL_KG"         "DIC_UMOL_KG"        "TA_FLAG_W"         
## [25] "DIC_FLAG_W"         "NITRATE_UMOL_L"     "NITRITE_UMOL_L"    
## [28] "AMMONIUM_UMOL_L"    "PHOSPHATE_UMOL_L"   "SILICATE_UMOL_L"   
## [31] "NUTRIENTS_FLAG_W"   "CHLA (ug/l)"
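
The loop above can also be written as a single Reduce() call, which has the side benefit of behaving sensibly when the list holds only one data frame (2:length(dfs) would count backwards in that case). A minimal sketch on made-up data frames; demo_dfs and its columns are hypothetical.

```r
# Three toy data frames with partly overlapping columns (hypothetical data)
demo_dfs <- list(
  data.frame(a = 1, b = 2, c = 3),
  data.frame(a = 4, b = 5),
  data.frame(b = 6, a = 7, d = 8)
)

# Intersect the column names pairwise, left to right, across the whole list
demo_common <- Reduce(intersect, lapply(demo_dfs, names))
print(demo_common)
```

The result keeps the column order of the first data frame, which is the same order the loop produces.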

convert the date and time columns to character

# Columns that read_excel imports as date-times
datetime_columns <- c("DATE_LOCAL", "DATE_UTC", "TIME_UTC", "TIME_LOCAL")

# Loop through each dataframe in the list
for (i in seq_along(dfs)) {
  # Convert each date or time column that is present to character
  for (col in intersect(datetime_columns, names(dfs[[i]]))) {
    dfs[[i]][[col]] <- as.character(dfs[[i]][[col]])
  }
}

subset each dataframe to the common columns

# Loop through each data frame in the list
for (i in seq_along(dfs)) {
  # Subset the data frame to only the common columns
  dfs[[i]] <- dfs[[i]][, common_columns, drop = FALSE]
}

combine the dataframes vertically

combined_df <- do.call(rbind, dfs)

convert date from character to date

combined_df$Date <- ymd(combined_df$DATE_LOCAL)

extract month and year

combined_df$Month <- month(combined_df$Date)
combined_df$Year <- year(combined_df$Date)
unique(combined_df$Month)
## [1]  4  3 NA  7  6  9 10

recode months

combined_df$Month <- recode_factor(combined_df$Month, 
                              '4' = "APR",'3'="APR", '7' = "JUL", 
                              '6' = "JUL", '9' = "SEP", '10' = "SEP")
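
The recode maps each calendar month onto the cruise season it belongs to (April, July, or September). As an alternative sketch in base R, the same mapping can be held in a named lookup vector, which keeps the month-to-season table in one place; season_lookup and demo_months are hypothetical names used only for illustration.

```r
# Month number (as text) -> cruise season label
season_lookup <- c("3" = "APR", "4" = "APR",
                   "6" = "JUL", "7" = "JUL",
                   "9" = "SEP", "10" = "SEP")

# The month values seen in the upcast data, including a missing date
demo_months <- c(4, 3, NA, 7, 6, 9, 10)

# Look up each month; months without an entry (or NA) stay NA
demo_season <- unname(season_lookup[as.character(demo_months)])

# Store as a factor with the seasons in calendar order
demo_season <- factor(demo_season, levels = c("APR", "JUL", "SEP"))
```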

Subset stations to WOAC biology stations

unique(combined_df$STATION_NO)
##  [1] "28"  "5"   "1"   "3"   "4"   "26"  "22"  "21"  "20"  "7"   "8"   "10" 
## [13] "17"  "15"  "14"  "13"  "401" "12"  "11"  "402" "29"  "30"  "31"  "33" 
## [25] "35"  "36"  "38"  "28b" NA    "27"  "19"  "18"  "9"   "16"  "32"  "37"
# Define the seven named stations
named_stations <- c("4", "8", "12", "28", "38", "402", "22")

# Subset the dataframe based on the named stations
subset_df <- combined_df[combined_df$STATION_NO %in% named_stations, ]

#add P to station numbers
subset_df$STATION_NO <- paste0("P", subset_df$STATION_NO)

Rename

WOAC_upcasts<-subset_df

Downcast CTD data

This code creates a dataframe named WOAC_downcasts which has all of the downcast files merged vertically with common columns. 2014-2015 are processed separately from 2016-2023 at first because they are different file types and have different column names for the same variables.

import and merge downcast excel files (2016-2023)

# name downcast folder
downcast_dir<-"/Users/hailaschultz/Dropbox/Schultz_Dissertation/Data_Analysis/Schultz_dissertation-2/data/NANOOS_files/woac_downcasts"

downcast_excel_files <- list.files(path = downcast_dir, pattern = "\\.xlsx$", full.names = TRUE)

# Initialize an empty list to store the data frames
dfs <- list()

# Loop through each Excel file and read it into a data frame
for (file in downcast_excel_files) {
  # Read the Excel file into a data frame
  df <- read_excel(file)
  
  # Store the data frame in the list
  dfs[[length(dfs) + 1]] <- df
}

See which columns all of the dataframes have in common

# Get column names of the first data frame
common_columns <- names(dfs[[1]])

# Loop through the remaining data frames and find common column names
for (i in 2:length(dfs)) {
  # Get column names of the current data frame
  current_columns <- names(dfs[[i]])
  
  # Find common column names with previous data frames
  common_columns <- intersect(common_columns, current_columns)
}

# 'common_columns' now contains the column names that are common across all data frames
print(common_columns)
##  [1] "Uploadtime"                                   
##  [2] "NMEAtimeUTC"                                  
##  [3] "CruiseID"                                     
##  [4] "Station"                                      
##  [5] "Waypoint"                                     
##  [6] "Cast"                                         
##  [7] "prDM: Pressure  Digiquartz"                   
##  [8] "depSM: Depth"                                 
##  [9] "Temperature"                                  
## [10] "potemp090C: Potential Temperature"            
## [11] "potemp190C: Potential Temperature  2"         
## [12] "c0S/m: Conductivity"                          
## [13] "sal00: Salinity  Practical"                   
## [14] "sal11: Salinity  Practical  2"                
## [15] "density00: Density"                           
## [16] "sigma-t00: Density"                           
## [17] "density11: Density  2"                        
## [18] "sigma-È11: Density  2"                        
## [19] "sigma-t11: Density  2"                        
## [20] "sbeox1V: Oxygen raw  SBE 43  2"               
## [21] "sbeox0ML/L: Oxygen  SBE 43"                   
## [22] "sbeox0Mg/L: Oxygen  SBE 43"                   
## [23] "sbeox1ML/L: Oxygen  SBE 43  2"                
## [24] "sbeox1Mg/L: Oxygen  SBE 43  2"                
## [25] "sbox0Mm/Kg: Oxygen  SBE 43"                   
## [26] "sbox1Mm/Kg: Oxygen  SBE 43  2"                
## [27] "sbeox0PS: Oxygen  SBE 43"                     
## [28] "sbeox1PS: Oxygen  SBE 43  2"                  
## [29] "oxsolMg/L: Oxygen Saturation  Garcia & Gordon"
## [30] "flECO-AFL: Fluorescence  WET Labs ECO-AFL/FL" 
## [31] "PAR"                                          
## [32] "CStarTr0: Beam Transmission  WET Labs C-Star" 
## [33] "CStarAt0: Beam Attenuation  WET Labs C-Star"  
## [34] "turbWETntu0: Turbidity  WET Labs ECO"         
## [35] "timeS: Time  Elapsed"                         
## [36] "scan: Scan Count"

subset to common columns

# Loop through each data frame in the list
for (i in seq_along(dfs)) {
  # Subset the data frame to only the common columns
  dfs[[i]] <- dfs[[i]][, common_columns, drop = FALSE]
}

combine datasheets vertically

combined_df <- do.call(rbind, dfs)

remove units rows

combined_df<-subset(combined_df,NMEAtimeUTC!="[]")

convert all dates to the correct format

library(dplyr)

# Define a helper function to process each entry
convert_NMEAtime <- function(x) {
  if (grepl("^[0-9]+\\.[0-9]+$", x)) {
    # Convert Excel serial date (days since 1899-12-30) to POSIXct
    as.POSIXct(as.numeric(x) * 86400, origin = "1899-12-30", tz = "UTC")
  } else {
    # Parse human-readable datetime
    as.POSIXct(x, format = "%b %d %Y %H:%M:%S", tz = "UTC")
  }
}

# Apply the conversion function to standardize the column
combined_df <- combined_df %>%
  mutate(
    # sapply drops the POSIXct class, so this yields seconds since 1970-01-01 UTC
    NMEAtimeUTC = sapply(NMEAtimeUTC, convert_NMEAtime, USE.NAMES = FALSE)
  ) %>%
  # Rebuild POSIXct in UTC (not the local time zone) and format as character
  mutate(
    NMEAtimeUTC = format(as.POSIXct(NMEAtimeUTC, origin = "1970-01-01", tz = "UTC"),
                         "%b %d %Y %H:%M:%S")
  )
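
To make the Excel branch of the helper concrete, here is a small worked example on a single made-up serial value. Excel stores date-times as days since 1899-12-30, so multiplying by 86400 gives seconds from that origin. The value 42370.5 is a hypothetical input chosen because it lands on noon of 1 January 2016; the example also shows that sapply() hands back plain seconds, which is why the time zone has to be restated when the POSIXct is rebuilt.

```r
# A made-up Excel serial date-time stored as text, as it appears after import
excel_serial <- "42370.5"

# sapply() returns plain numbers (seconds since 1970-01-01 UTC), not POSIXct
as_seconds <- sapply(
  excel_serial,
  function(x) as.POSIXct(as.numeric(x) * 86400, origin = "1899-12-30", tz = "UTC"),
  USE.NAMES = FALSE
)

# Rebuild the date-time, stating UTC explicitly so it is not shifted to local time
restored <- as.POSIXct(as_seconds, origin = "1970-01-01", tz = "UTC")
format(restored, "%Y-%m-%d %H:%M:%S")
```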

extract month

combined_df$Month <- substr(combined_df$NMEAtimeUTC, 1, 3)
unique(combined_df$Month)
## [1] "Apr" "Jul" "Jun" "Sep"
combined_df$Month <- recode_factor(combined_df$Month, 
                              'Apr' = "APR", 'Jul' = "JUL", 'Jun' = "JUL",
                              'Sep' = "SEP")

extract year

combined_df$Year <- substr(combined_df$NMEAtimeUTC, 8, 11)
unique(combined_df$Year)
## [1] "2016" "2017" "2018" "2019" "2021" "2022" "2023" "2020"

Subset stations

unique(combined_df$Station)
##  [1] "P28"      "P5"       "P1"       "P3"       "P4"       "P26"     
##  [7] "P22"      "P21"      "P20"      "P7"       "P8"       "P10"     
## [13] "P17"      "P15"      "P14"      "P13"      "P401"     "P12"     
## [19] "P11"      "P402"     "P29"      "P30"      "P31"      "P33"     
## [25] "P35"      "P36"      "P38"      "RC001 08" "RC001 07" "RC001 06"
## [31] "RC001 09" "P27"      "P07"      "P08"      "P01"      "P03"     
## [37] "P04"      "P05"      "P04b"     "p1"       "p27"      "p28"     
## [43] "p3"       "p4"       "p5"       "p13"      "p21"      "p22"     
## [49] "p26"      "p20"      "p7"       "p8"       "p10"      "p11"     
## [55] "p12"      "p14"      "p15"      "p17"      "p401"     "p402"    
## [61] "p29"      "p30"      "p31"      "p33"      "p35"      "p36"     
## [67] "p38"
# Define the seven named stations
named_stations <- c("P4", "P8", "P12", "P28", "P38", "P402", "P22")

# Subset the dataframe based on the named stations
subset_df <- combined_df[combined_df$Station %in% named_stations, ]
unique(subset_df$Station)
## [1] "P28"  "P4"   "P22"  "P8"   "P12"  "P402" "P38"
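
Note that the station list printed above contains lowercase and zero-padded spellings of the same stations ("p4", "P04", "P08", and so on), and an exact %in% match leaves those rows out. If those rows should be kept, one possible approach is to standardize the labels before subsetting. This is a sketch on a made-up vector, under the assumption that "p4" and "P04" refer to the same station as "P4"; demo_station and normalize_station are hypothetical names.

```r
# A few label spellings of the kind seen in the Station column (toy vector)
demo_station <- c("P4", "p4", "P04", "P04b", "RC001 08", "p402", "P08")

# Upper-case the label, trim stray spaces, and drop zero padding after the P
normalize_station <- function(x) {
  sub("^P0+", "P", toupper(trimws(x)))
}

demo_clean <- normalize_station(demo_station)

# The seven biology stations, as defined in the code above
named_stations <- c("P4", "P8", "P12", "P28", "P38", "P402", "P22")

# TRUE for every spelling that resolves to one of the seven stations
keep <- demo_clean %in% named_stations
```

Labels such as "P04b" and "RC001 08" still fall outside the seven stations after cleaning, so they are excluded as before.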

import and merge csv files (2014-2015)

# Directory containing the CSV files
downcast_dir <- "/Users/hailaschultz/Dropbox/Schultz_Dissertation/Data_Analysis/Schultz_dissertation-2/data/NANOOS_files/CSV_files"

# List all CSV files in the directory
downcast_csv_files <- list.files(path = downcast_dir, pattern = "\\.csv$", full.names = TRUE)

# Initialize an empty list to store the data frames
dfs <- list()

# Loop through each CSV file and read it into a data frame
for (file in downcast_csv_files) {
  # Read the CSV file into a data frame
  df <- read.csv(file, stringsAsFactors = FALSE)
  
  # Store the data frame in the list
  dfs[[length(dfs) + 1]] <- df
}
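
The read loop can also be written with lapply(), and naming each list element after its file makes it easier to trace a data frame back to its cruise. A minimal sketch that writes two tiny CSV files to a temporary folder first, so it runs on its own; demo_dir, the file names, and their contents are hypothetical.

```r
# Write two tiny demo CSV files to a temporary folder (not the real cruise data)
demo_dir <- file.path(tempdir(), "demo_downcasts")
dir.create(demo_dir, showWarnings = FALSE)
write.csv(data.frame(Station = "P4", Depth = 1),
          file.path(demo_dir, "cruise_a.csv"), row.names = FALSE)
write.csv(data.frame(Station = "P8", Depth = 2),
          file.path(demo_dir, "cruise_b.csv"), row.names = FALSE)

# List the files, read each one, and name the list elements by file name
demo_files <- list.files(path = demo_dir, pattern = "\\.csv$", full.names = TRUE)
demo_dfs <- lapply(demo_files, read.csv, stringsAsFactors = FALSE)
names(demo_dfs) <- basename(demo_files)

# Access a data frame by the file it came from
demo_dfs[["cruise_b.csv"]]
```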

See which columns all of the dataframes have in common

# Get column names of the first data frame
common_columns <- names(dfs[[1]])

# Loop through the remaining data frames and find common column names
for (i in 2:length(dfs)) {
  # Get column names of the current data frame
  current_columns <- names(dfs[[i]])
  
  # Find common column names with previous data frames
  common_columns <- intersect(common_columns, current_columns)
}

# 'common_columns' now contains the column names that are common across all data frames
print(common_columns)
##  [1] "Cruise.ID"                "UTC.Time"                
##  [3] "Latitude.DegMin"          "Longitude.DegMin"        
##  [5] "Latitude.Deg"             "Longitude.Deg"           
##  [7] "Station"                  "Pressure"                
##  [9] "Depth"                    "Temperature"             
## [11] "Potential.Temperature"    "Salinity"                
## [13] "Sigma.t"                  "Sigma.theta"             
## [15] "Oxygen.Concentration.MG"  "Oxygen.Concentration.MOL"
## [17] "Oxygen.Saturation"        "Chlorophyll.Fluorescence"
## [19] "Beam.Transmission"        "Beam.Attenuation"

subset to common columns

# Loop through each data frame in the list
for (i in seq_along(dfs)) {
  # Subset the data frame to only the common columns
  dfs[[i]] <- dfs[[i]][, common_columns, drop = FALSE]
}

combine datasheets vertically

csv_combined_df <- do.call(rbind, dfs)

remove units rows

csv_combined_df<-subset(csv_combined_df,Pressure!="CTD")
csv_combined_df<-subset(csv_combined_df,Pressure!="[db]")

extract month

csv_combined_df$Month <- substr(csv_combined_df$UTC.Time, 1, 3)
unique(csv_combined_df$Month)
## [1] "Apr" "Jul" "Jun" "Oct" "Sep"
# October casts are assigned to the September cruise, matching the upcast recode
csv_combined_df$Month <- recode_factor(csv_combined_df$Month, 
                              'Apr' = "APR", 'Jul' = "JUL", 'Jun' = "JUL",
                              'Sep' = "SEP", 'Oct' = "SEP")

extract year

csv_combined_df$Year <- substr(csv_combined_df$UTC.Time, 8, 11)
unique(csv_combined_df$Year)
## [1] "2015" "2014"

Subset stations

unique(csv_combined_df$Station)
##  [1] "P28"  "P5"   "P1"   "P3"   "P4"   "P26"  "P22"  "P21"  "P20"  "P7"  
## [11] "P8"   "P10"  "P17"  "P15"  "P14"  "P13"  "P401" "P12"  "P11"  "P402"
## [21] "P29"  "P30"  "P31"  "P33"  "P36"  "P38"  "P27"  "P19"  "P18"  "P9"  
## [31] "P16"  "P32"  "P35"  "P37"  "P381" "P122" "P128" "P132" "P136" "P6"  
## [41] "P105" "P120" "P123"
# Define the seven named stations
named_stations <- c("P4", "P8", "P12", "P28", "P38", "P402", "P22")

# Subset the dataframe based on the named stations
csv_subset_df <- csv_combined_df[csv_combined_df$Station %in% named_stations, ]
unique(csv_subset_df$Station)
## [1] "P28"  "P4"   "P22"  "P8"   "P12"  "P402" "P38"

merge two datasets

colnames(subset_df)
##  [1] "Uploadtime"                                   
##  [2] "NMEAtimeUTC"                                  
##  [3] "CruiseID"                                     
##  [4] "Station"                                      
##  [5] "Waypoint"                                     
##  [6] "Cast"                                         
##  [7] "prDM: Pressure  Digiquartz"                   
##  [8] "depSM: Depth"                                 
##  [9] "Temperature"                                  
## [10] "potemp090C: Potential Temperature"            
## [11] "potemp190C: Potential Temperature  2"         
## [12] "c0S/m: Conductivity"                          
## [13] "sal00: Salinity  Practical"                   
## [14] "sal11: Salinity  Practical  2"                
## [15] "density00: Density"                           
## [16] "sigma-t00: Density"                           
## [17] "density11: Density  2"                        
## [18] "sigma-È11: Density  2"                        
## [19] "sigma-t11: Density  2"                        
## [20] "sbeox1V: Oxygen raw  SBE 43  2"               
## [21] "sbeox0ML/L: Oxygen  SBE 43"                   
## [22] "sbeox0Mg/L: Oxygen  SBE 43"                   
## [23] "sbeox1ML/L: Oxygen  SBE 43  2"                
## [24] "sbeox1Mg/L: Oxygen  SBE 43  2"                
## [25] "sbox0Mm/Kg: Oxygen  SBE 43"                   
## [26] "sbox1Mm/Kg: Oxygen  SBE 43  2"                
## [27] "sbeox0PS: Oxygen  SBE 43"                     
## [28] "sbeox1PS: Oxygen  SBE 43  2"                  
## [29] "oxsolMg/L: Oxygen Saturation  Garcia & Gordon"
## [30] "flECO-AFL: Fluorescence  WET Labs ECO-AFL/FL" 
## [31] "PAR"                                          
## [32] "CStarTr0: Beam Transmission  WET Labs C-Star" 
## [33] "CStarAt0: Beam Attenuation  WET Labs C-Star"  
## [34] "turbWETntu0: Turbidity  WET Labs ECO"         
## [35] "timeS: Time  Elapsed"                         
## [36] "scan: Scan Count"                             
## [37] "Month"                                        
## [38] "Year"
colnames(csv_subset_df)
##  [1] "Cruise.ID"                "UTC.Time"                
##  [3] "Latitude.DegMin"          "Longitude.DegMin"        
##  [5] "Latitude.Deg"             "Longitude.Deg"           
##  [7] "Station"                  "Pressure"                
##  [9] "Depth"                    "Temperature"             
## [11] "Potential.Temperature"    "Salinity"                
## [13] "Sigma.t"                  "Sigma.theta"             
## [15] "Oxygen.Concentration.MG"  "Oxygen.Concentration.MOL"
## [17] "Oxygen.Saturation"        "Chlorophyll.Fluorescence"
## [19] "Beam.Transmission"        "Beam.Attenuation"        
## [21] "Month"                    "Year"

rename columns from first dataset

subset_df<-subset_df %>% rename(Cruise.ID = CruiseID, 
                                Depth = "depSM: Depth",
                                Potential.Temperature="potemp090C: Potential Temperature",
                                Salinity="sal00: Salinity  Practical",
                                Oxygen.Concentration.MG="sbeox0Mg/L: Oxygen  SBE 43",
                                Chlorophyll.Fluorescence="flECO-AFL: Fluorescence  WET Labs ECO-AFL/FL")
# Find common columns
common_columns <- intersect(names(subset_df), names(csv_subset_df))

# Subset each data frame to only the common columns
subset_df_common <- subset_df[, common_columns, drop = FALSE]
csv_subset_df_common <- csv_subset_df[, common_columns, drop = FALSE]

# Combine the data frames vertically
WOAC_downcasts <- rbind(subset_df_common, csv_subset_df_common)
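
Only columns whose names match after renaming survive the merge, so it can help to list what each dataset loses before calling rbind(). The sketch below repeats the rename, intersect, and rbind steps on two toy data frames in base R and uses setdiff() to report the dropped columns; new_fmt, old_fmt, rename_map, and their values are hypothetical.

```r
# Toy versions of the two formats (hypothetical values)
new_fmt <- data.frame(CruiseID = "A", `depSM: Depth` = 1, Station = "P4", PAR = 5,
                      check.names = FALSE)
old_fmt <- data.frame(Cruise.ID = "B", Depth = 2, Station = "P8", Pressure = 2.1)

# Old name -> new name for the columns that describe the same variable
rename_map <- c("CruiseID" = "Cruise.ID", "depSM: Depth" = "Depth")
hits <- names(new_fmt) %in% names(rename_map)
names(new_fmt)[hits] <- unname(rename_map[names(new_fmt)[hits]])

# Columns shared by both data frames, and the ones each would lose
shared <- intersect(names(new_fmt), names(old_fmt))
dropped_new <- setdiff(names(new_fmt), shared)
dropped_old <- setdiff(names(old_fmt), shared)

# Stack the two data frames on the shared columns
merged <- rbind(new_fmt[, shared, drop = FALSE], old_fmt[, shared, drop = FALSE])
```

In this toy case PAR and Pressure are reported as dropped; running the same check on the real data frames would show which variables do not carry over into WOAC_downcasts.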