Check SOI GDL Access Dataset

library(GeoLocatoR)
library(tidyverse)
library(lubridate)
library(DT)
# Root folder of the dataset (typically the Z-drive)
soi_data_directory <- "/Users/rafnuss/Library/CloudStorage/OneDrive-Vogelwarte/2-geolocator_data/UNIT_Vogelzug"

# Load the geolocator (GDL) database from the access file
gdl <- read_gdl(access_file = file.path(soi_data_directory, "database/GDL_Data.accdb"), filter_col = FALSE)
Warning: One or more parsing issues, call `problems()` on your data frame for details,
e.g.:
  dat <- vroom(...)
  problems(dat)
data_order_file <- read_gdl_access(access_file = file.path(soi_data_directory, "database/GDL_Data.accdb"))

d <- read_gdl_data(data_order_file[1])
Warning: One or more parsing issues, call `problems()` on your data frame for details,
e.g.:
  dat <- vroom(...)
  problems(dat)
o <- read_gdl_orders(data_order_file[2])

GDL_Orders

Read the table GDL_Orders and find inconsistencies:

problems(o) %>% datatable()

Summarize data per Responsible

o %>%
  group_by(ResponsibleP3) %>%
  summarize(
    OrderCount = n(),
    TotalNumberOrdered = sum(NumberOrdered)
  ) %>%
  datatable()

GDL_Data

OrderName

No OrderName:

d %>%
  filter(is.na(OrderName)) %>%
  datatable()

OrderName not present in GDL_Orders:

d %>%
  group_by(OrderName) %>%
  summarize(
    GDLCount = n()
  ) %>%
  filter(!(OrderName %in% o$OrderName)) %>%
  datatable()

GDL_ID

No GDL_ID:

d %>%
  filter(is.na(GDL_ID)) %>%
  datatable()

Duplicate GDL_ID AND OrderName:

d %>%
  group_by(GDL_ID, OrderName) %>%
  filter(n() > 1) %>%
  ungroup() %>%
  arrange(GDL_ID) %>%
  datatable()

Duplicate GDL_ID (only):

d %>%
  group_by(GDL_ID) %>%
  filter(n() > 1) %>%
  ungroup() %>%
  arrange(GDL_ID) %>%
  datatable()

GDL_Type, hardware, firmware

d %>%
  group_by(GDL_Type, FirmwareVersion, HardwareVersion) %>%
  summarize(.groups = "keep", countTag = n()) %>%
  datatable()
d %>%
  filter(is.na(GDL_Type) | is.na(FirmwareVersion) | is.na(HardwareVersion)) %>%
  datatable()

Species

d %>%
  group_by(Species) %>%
  summarize(countTag = n()) %>%
  arrange(countTag) %>%
  datatable()

Actual data

-> folder and subfolder structure?

All file extensions

tibble(filename = list.files("/Users/rafnuss/Library/CloudStorage/Box-Box/geolocator_data/UNIT_Vogelzug/data/", recursive = TRUE, full.names = TRUE)) %>%
  mutate(ext = tools::file_ext(filename)) %>%
  group_by(ext) %>%
  summarise(count = n()) %>%
  arrange(count) %>%
  datatable()
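# Record, for each tag, the name of the matching entry in its order folder
d$gdl_id_version <- NA_character_
for (i in seq_len(nrow(d))) {
  if (is.na(d$GDL_ID[i])) {
    next
  }
  folder <- list.files(glue::glue("./data/{d$OrderName[i]}/"), pattern = d$GDL_ID[i])

  if (length(folder) == 1) {
    # Assign to row i only (assigning to d$gdl_id_version overwrites the whole column)
    d$gdl_id_version[i] <- folder
  } else if (any(grepl(".glf", folder, fixed = TRUE))) {
    # fixed = TRUE matches a literal ".glf" instead of treating "." as a wildcard
    d$gdl_id_version[i] <- folder[1]
  } else if (length(folder) > 1) {
    # Several candidates and no .glf file: print them for manual review
    print(folder)
  }
}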

Production database

Order model

Order Information

| Variable Name | Description | Comments |
|---|---|---|
| OrderID | Unique identifier for the order | Integer |
| OrderName | Name or identifier for the specific order | Code, usually 6 characters for the species scientific name, 2-letter country code, year?? |
| Species | Main? species of the bird associated with the order | |
| Client | Client associated with the order | |
| ResponsibleP3 | Person responsible for the order | |
| ClientCategory | Category of the client | cooperation, own, purchase |
| GDL_Type | Type of geolocator tag ordered | Duplicated in the GDL data |
| NumberOrdered | Total number of tags ordered | |
| OrderStatus | Current status of the order | |
| NumberDelivered | Total number of tags delivered | |
| DeliveryDate | Expected delivery date for the order | |
| DateDelivered | Actual date when the tags were delivered | |
| UnitPrice | Price per unit of the tag | |
| Currency | Currency used for the order | |
| InvoiceDate | Date when the invoice was issued | |
| RemarksProduction | Additional remarks related to the production | |
| RemarksOrder | Additional remarks related to the order | |
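
The OrderName convention noted in the table can be checked mechanically. The following is a minimal sketch in base R, assuming the code is six letters for the species, two letters for the country, and a two-digit year with no separators. The table itself marks the layout as uncertain, so the pattern, the helper `check_order_name()`, and the example names are all hypothetical.

```r
# Hypothetical pattern: 6 letters (species) + 2 letters (country) + 2 digits (year).
# The real convention is not confirmed; adjust the regular expression as needed.
order_name_pattern <- "^[A-Za-z]{6}[A-Za-z]{2}[0-9]{2}$"

check_order_name <- function(order_name) {
  # Missing names are reported as invalid rather than propagated as NA
  !is.na(order_name) & grepl(order_name_pattern, order_name)
}

# Made-up order names, for illustration only
example_names <- c("ApuApuCH18", "UpuEpoCH2019", NA, "bad name")
name_check <- data.frame(
  OrderName = example_names,
  valid = check_order_name(example_names),
  stringsAsFactors = FALSE
)
print(name_check)
```

With the real table, the same function could be applied to `o$OrderName` to list the orders that deviate from whichever convention is finally agreed on.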

Software Settings

| Variable Name | Description | Comments |
|---|---|---|
| IntervalBlue | Interval setting for blue light recording | |
| LoggingPeriodEnabled | Indicates if logging period is enabled | |
| StartPeriod1 | Start time for the first logging period | |
| IntervalRGC | Interval for recording green channel data | |
| GreenEnabled | Indicates if green channel is enabled | |
| RedEnabled | Indicates if red channel is enabled | |
| ClearEnabled | Indicates if clear channel is enabled | |
| LightLevelThreshold | Threshold level for light detection | |
| BaseLoggingInterval | Base interval for logging data | |
| IntervalFactorLight | Interval factor for light level recording | |
| IntervalFactorAirTemperature | Interval factor for air temperature recording | |
| IntervalFactorBodyTemperature | Interval factor for body temperature recording | |
| IntervalFactorPressure | Interval factor for pressure recording | |
| IntervalFactorAcceleration | Interval factor for acceleration recording | |
| IntervalFactorMagnetic | Interval factor for magnetic field recording | |
| AccSamplingRate | Sampling rate for accelerometer data | |
| AccXaxisEnabled | Indicates if X-axis accelerometer is enabled | |
| AccYaxisEnabled | Indicates if Y-axis accelerometer is enabled | |
| AccZaxisEnabled | Indicates if Z-axis accelerometer is enabled | |
| DataloggerEnabled | Indicates if datalogger is enabled | |
| RadioEnabled | Indicates if radio transmission is enabled | |
| Frequencies | Frequencies used for data transmission | |
| TransmitPower | Power level for data transmission | |
| WakeUpTrigger | Trigger for waking up the device | |
| PatternID | Identifier for the data transmission pattern | |
| PulseWidth | Width of the transmission pulse | |
| PulseInterval | Interval between transmission pulses | |
| PatternInterval | Interval for the transmission pattern | |
| PulsePattern | Pattern of the transmission pulses | |
| ActiveAfterWakeUp | Indicates if the device is active after wake up | |
| TRSchedule | Transmission schedule | |
| LoggingSchedule | Logging schedule | |

Hardware Characteristics

| Variable Name | Description | Comments |
|---|---|---|
| LWL | Unknown | |
| LWL_Length | Length of the unknown item | |
| LWL_Diameter | Diameter of the unknown item | |
| Harness | Type of harness used | |
| HarnessMaterial | Material of the harness | |
| HarnessAttachement | Method of attaching the harness | |
| HarnessThickness | Thickness of the harness | |
| LegHarnessDiameter | Diameter of the leg harness | |
| BreastHarnessDiameterHead | Diameter of the breast harness at the head | |
| BreastHarnessDiameterTail | Diameter of the breast harness at the tail | |
| HarnessDescription | Description of the harness | |

Priority Settings

| Variable Name | Description | Comments |
|---|---|---|
| PrioritySensor | Priority level for sensor data | |
| PriorityMemory | Priority level for memory usage | |

Data model

Tag

| Variable Name | Description | Comments |
|---|---|---|
| OrderName * | Identifier for the specific order of the tags | Could we make this standard (e.g., year-countrycode-site?-tagtype?) |
| GDL_ID * | Unique identifier for the geolocator tag | This should absolutely be unique and standard (e.g., ordername-integer) |
| GDL_Type * | Type of geolocator tag | GDL1, GDL2, GDL3a, GDL3pam, uTag |
| HardwareVersion * | Hardware version of the tag (e.g., v1.0) | This is currently not standardized. It is also present in the *.setting file |
| FirmwareVersion * | Firmware version of the tag | This is currently not standardized. It is also present in the *.setting file |
| PrintManufacturer | Manufacturer of the tag | Usually empty (Teltronic AG or Hybrid SA) |
| FinalAssembly | Final assembly details of the tag | Usually empty (own or teltronic) |
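
The constraints stated in the comments (a unique GDL_ID, a fixed set of GDL_Type values) lend themselves to an automated check. Below is a minimal base R sketch on made-up records. The helper `validate_tags()` and the example data are hypothetical; the allowed types are the ones listed in the table.

```r
# Allowed tag types, taken from the GDL_Type row of the table
allowed_types <- c("GDL1", "GDL2", "GDL3a", "GDL3pam", "uTag")

# Made-up tag records, for illustration only
tags <- data.frame(
  OrderName = c("OrderA", "OrderA", "OrderB", "OrderB"),
  GDL_ID = c("A-1", "A-2", "A-2", NA),
  GDL_Type = c("GDL2", "GDL3pam", "gdl3", "uTag"),
  stringsAsFactors = FALSE
)

validate_tags <- function(tags) {
  data.frame(
    GDL_ID = tags$GDL_ID,
    missing_id = is.na(tags$GDL_ID),
    # Flag every row that shares its GDL_ID with another row (missing IDs excluded)
    duplicated_id = !is.na(tags$GDL_ID) &
      (duplicated(tags$GDL_ID) | duplicated(tags$GDL_ID, fromLast = TRUE)),
    unknown_type = !(tags$GDL_Type %in% allowed_types),
    stringsAsFactors = FALSE
  )
}

report <- validate_tags(tags)
# Keep only the rows with at least one problem
print(report[report$missing_id | report$duplicated_id | report$unknown_type, ])
```

Unlike the `group_by()` checks earlier in this report, this returns one flag per rule and per row, which makes it easier to count how many tags fail each rule.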

Software Settings

| Variable Name | Description | Comments |
|---|---|---|
| LoggingPeriodEnabled | Indicates if logging period is enabled | |
| UTC_StartLog1 | UTC start time for the first logging period | |
| LightLevelThreshold | Threshold level for light detection | |
| BaseLoggingInterval | Base interval for logging data | |
| IntervalFactorLight | Interval factor for light level recording | |
| IntervalFactorAirTemperature | Interval factor for air temperature recording | |
| IntervalFactorBodyTemperature | Interval factor for body temperature recording | |
| IntervalFactorPressure | Interval factor for pressure recording | |
| IntervalFactorAcceleration | Interval factor for acceleration recording | |
| IntervalFactorMagnetic | Interval factor for magnetic field recording | |
| Frequency | Recording frequency of the tag | |

Hardware Characteristics

| Variable Name | Description | Comments |
|---|---|---|
| TotalWeight | Total weight of the tag | |
| LWL | ?? | |
| LWL_Length | Length of the ??? | |
| LWL_Diameter | Diameter of the ??? | |
| DateBatteryConnected | Date when the battery was connected to the tag | |
| VoltageReady | Voltage level when the tag was ready | |
| SleepCurrent | Current draw in sleep mode | |
| Harness | Type of harness used to attach the tag | |
| HarnessMaterial | Material of the harness | |
| HarnessAttachment | Attachment method of the harness | |
| HarnessThickness | Thickness of the harness | |
| LegHarnessDiameter | Diameter of the leg harness | |
| BreastHarnessDiameterHead | Diameter of the breast harness at the head | |
| BreastHarnessDiameterTail | Diameter of the breast harness at the tail | |

Species

| Variable Name | Description | Comments |
|---|---|---|
| species | Species of the bird equipped with the tag | |
| Species_origin | Origin of the bird species | |

Equipment

| Variable Name | Description | Comments |
|---|---|---|
| LongitudeAttached | Longitude of the attachment site | |
| LatitudeAttached | Latitude of the attachment site | |
| UTC_Attached | UTC time when the tag was attached | |
| DateDeparture | Date of bird’s departure from the attachment site | Useful for light calibration and GeoPressure |

Retrieval

| Variable Name | Description | Comments |
|---|---|---|
| LongitudeRemoved | Longitude where the tag was removed | |
| LatitudeRemoved | Latitude where the tag was removed | |
| UTC_Removed | UTC time when the tag was removed | |
| DateArrival | Date of bird’s arrival at the destination | |

Technical Retrieval Information

| Variable Name | Description | Comments |
|---|---|---|
| UTC_StartRTC | Start time of the Real-Time Clock on the tag | |
| UTC_StopRTC | Stop time of the Real-Time Clock on the tag | |
| UTC_StopReference | Reference stop time for data analysis | |
| UTC_FirstData | UTC time of the first data recorded by the tag | |
| UTC_LastData | UTC time of the last data recorded by the tag | |
| VoltageEnd | Voltage of the tag at the end of deployment | |
| MemoryUsed | Memory used by the tag | |
| MagCalib | Magnetic calibration data | |

Miscellaneous

| Variable Name | Description | Comments |
|---|---|---|
| Remarks | Additional remarks or comments | |

Folder structure

data/*OrderName*/*GDLID*_*date*

| Filename | Description | Comments |
|---|---|---|
| GDLID_*date*.glf | light data | |
| GDLID_*date*.acceleration | acceleration data | |
| GDLID_*date*.pressure | pressure data | |
| GDLID_*date*.AirTemperature | air temperature data | |
| GDLID_*date*.data | ?? | |
| GDLID_*date*.settings | ?? | |
| GDLID_*date*.log | ?? | |
| GDLID_*date*.rep | ?? | |
| GDLID_*date*.settings.bin | ?? | |
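
The naming scheme above can be split into its parts to audit a data folder. This is a minimal base R sketch, assuming the date is the last underscore-separated token before the first dot and that everything after that dot is the extension (so `settings.bin` stays together). The helper `parse_gdl_filename()` and the example file names are hypothetical.

```r
parse_gdl_filename <- function(filename) {
  # Capture groups: (GDL ID) _ (date) . (extension, possibly with several dots)
  m <- regmatches(filename, regexec("^(.+)_([^_.]+)\\.(.+)$", filename))
  parsed <- t(vapply(m, function(x) {
    # A successful match has 4 elements: full match plus the 3 groups
    if (length(x) == 4) x[2:4] else rep(NA_character_, 3)
  }, character(3)))
  data.frame(
    filename = filename,
    gdl_id = parsed[, 1],
    date = parsed[, 2],
    ext = parsed[, 3],
    row.names = NULL,
    stringsAsFactors = FALSE
  )
}

# Made-up file names, for illustration only
files <- c(
  "22BT_20190801.glf",
  "22BT_20190801.pressure",
  "22BT_20190801.settings.bin",
  "notes.pdf"
)
parsed_files <- parse_gdl_filename(files)
print(parsed_files)
```

Files that do not follow the scheme, such as the pdf here, come back with `NA` in every parsed column, which gives a quick list of candidates to move out of the data folder.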

Notes

Suggestions for improvement

  1. Metadata as Markdown, Word, or Excel?
  2. Cleaning up the data folder:
    1. data/*OrderName*/*GDL_ID*
    2. data/*OrderName*/*GDL_ID*_*date* for old versions only
    3. No zip, pdf, etc. files -> move to another folder? e.g., data/note
  3. Data validation/test: rmarkdown -> html
  4. Access is dying: move to Excel?

Limitations

  • More flexible contributors list with role, contact details, etc. (e.g. https://r-pkgs.org/description.html#sec-description-authors-at-r)
  • Licenses of data (https://r-pkgs.org/description.html#the-license-field)
  • More context on the project: description; temporal, spatial, and taxonomic extent; websites?; project funding; publications?
  • We need a unique identifier. So far, I’ve assumed that GDL_ID is unique, but it’s not! Also, putting the date in the folder name is not great; the settings file could instead carry a data version corresponding to the date of creation.
  • We should be able to store more information on the animal (ring number, sex, age, ssp., size, etc.)
  • How to record information on resightings: birds captured without the tag (tag lost), birds seen but not captured, or tags damaged?
  • We need an easy way to know how much data is available for each tag and the quality of the data (e.g. pressure); the date extent of the data can differ for each tag.
  • Assessing tag effects requires a control group. Do we want to store this information too?
  • Data formats are not standard and not explained
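
One way to approach the data-availability point is a presence matrix of sensor files per tag. The sketch below uses base R on a made-up file inventory; the `inventory` data frame and its column names are hypothetical. It only reports whether a file exists, not its quality or date extent.

```r
# Made-up file inventory: one row per sensor file found for a tag
inventory <- data.frame(
  gdl_id = c("T1", "T1", "T1", "T2", "T2", "T3"),
  ext = c("glf", "pressure", "acceleration", "glf", "pressure", "glf"),
  stringsAsFactors = FALSE
)

# Sensor file extensions listed in the folder structure section
sensor_ext <- c("glf", "pressure", "acceleration", "AirTemperature")

# Count files per tag and extension; fixing the factor levels keeps a column
# for every sensor, even when no tag has that file
counts <- table(
  gdl_id = inventory$gdl_id,
  ext = factor(inventory$ext, levels = sensor_ext)
)
availability <- unclass(counts) > 0
print(availability)
```

Each row is a tag and each column a sensor, so `rowSums(availability)` gives the number of sensor files per tag and `colSums(availability)` the number of tags with a given sensor.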

Processing pipeline

Order -> production -> sending -> equipment -> retrieval -> sending back -> extraction -> bundling -> send data.

  • Which information is stored where: (1) the Order table, (2) the GDL table, or (3) the settings files?
  • Also, which table stores the information that may differ between order, delivery, and retrieval?
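
A first step towards answering these questions is to list the columns that exist in both tables and to find the tags whose value differs from the one recorded on their order. This is a minimal base R sketch on made-up data. The two small data frames are hypothetical stand-ins for `o` and `d`, and GDL_Type is used because it appears in both models.

```r
# Made-up stand-ins for the order table (o) and the tag table (d)
orders <- data.frame(
  OrderName = c("OrderA", "OrderB"),
  GDL_Type = c("GDL2", "GDL3pam"),
  stringsAsFactors = FALSE
)
tags <- data.frame(
  OrderName = c("OrderA", "OrderA", "OrderB"),
  GDL_ID = c("A-1", "A-2", "B-1"),
  GDL_Type = c("GDL2", "GDL3pam", "GDL3pam"),
  stringsAsFactors = FALSE
)

# Columns stored in both tables, apart from the join key
shared <- setdiff(intersect(names(orders), names(tags)), "OrderName")

# Join tag and order values side by side, then keep the rows where they disagree
both <- merge(tags, orders, by = "OrderName", suffixes = c("_tag", "_order"))
mismatch <- both[both$GDL_Type_tag != both$GDL_Type_order, ]
print(shared)
print(mismatch)
```

With the real tables, `intersect(names(o), names(d))` would give the full list of duplicated fields, and the same comparison could be repeated for each of them.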