library(GeoLocatoR)
library(tidyverse)
library(lubridate)
library(DT)
# Root folder of the dataset (typically the Z-drive)
soi_data_directory <- "/Users/rafnuss/Library/CloudStorage/OneDrive-Vogelwarte/2-geolocator_data/UNIT_Vogelzug"
# Load the geolocator (GDL) database from the access file
gdl <- read_gdl(access_file = file.path(soi_data_directory, "database/GDL_Data.accdb"), filter_col = FALSE)
Warning: One or more parsing issues, call `problems()` on your data frame for details,
e.g.:
dat <- vroom(...)
problems(dat)
data_order_file <- read_gdl_access(access_file = file.path(soi_data_directory, "database/GDL_Data.accdb"))
d <- read_gdl_data(data_order_file[1])
Warning: One or more parsing issues, call `problems()` on your data frame for details,
e.g.:
dat <- vroom(...)
problems(dat)
o <- read_gdl_orders(data_order_file[2])
GDL_Orders
Read the table GDL_Orders and find inconsistencies:
problems(o) %>% datatable()
Summarize orders per responsible person (ResponsibleP3):
o %>%
  group_by(ResponsibleP3) %>%
  summarize(
    OrderCount = n(),
    TotalNumberOrdered = sum(NumberOrdered)
  ) %>%
  datatable()
GDL_Data
OrderName
Missing OrderName:
d %>%
  filter(is.na(OrderName)) %>%
  datatable()
OrderName not present in GDL_Orders:
d %>%
  group_by(OrderName) %>%
  summarize(
    GDLCount = n()
  ) %>%
  filter(!(OrderName %in% o$OrderName)) %>%
  datatable()
GDL_ID
Missing GDL_ID:
d %>%
  filter(is.na(GDL_ID)) %>%
  datatable()
Duplicate GDL_ID AND OrderName:
d %>%
  group_by(GDL_ID, OrderName) %>%
  filter(n() > 1) %>%
  ungroup() %>%
  arrange(GDL_ID) %>%
  datatable()
Duplicate GDL_ID (only):
d %>%
  group_by(GDL_ID) %>%
  filter(n() > 1) %>%
  ungroup() %>%
  arrange(GDL_ID) %>%
  datatable()
GDL_Type, hardware, firmware
d %>%
  group_by(GDL_Type, FirmwareVersion, HardwareVersion) %>%
  summarize(countTag = n(), .groups = "keep") %>%
  datatable()
d %>%
  filter(is.na(GDL_Type) | is.na(FirmwareVersion) | is.na(HardwareVersion)) %>%
  datatable()
Species
d %>%
  group_by(Species) %>%
  summarize(countTag = n()) %>%
  arrange(countTag) %>%
  datatable()
Actual data
-> folder and subfolder structure?
All file extensions
tibble(filename = list.files("/Users/rafnuss/Library/CloudStorage/Box-Box/geolocator_data/UNIT_Vogelzug/data/", recursive = TRUE, full.names = TRUE)) %>%
  mutate(ext = tools::file_ext(filename)) %>%
  group_by(ext) %>%
  summarise(count = n()) %>%
  arrange(count) %>%
  datatable()
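# Initialise the column so that each row can be filled individually
d$gdl_id_version <- NA_character_
for (i in seq_len(nrow(d))) {
  if (is.na(d$GDL_ID[i])) {
    next
  }
  # Entries under ./data/<OrderName>/ whose name contains the GDL_ID
  folder <- list.files(glue::glue("./data/{d$OrderName[i]}/"), pattern = d$GDL_ID[i])
  if (length(folder) == 1) {
    d$gdl_id_version[i] <- folder
  } else if (any(grepl(".glf", folder, fixed = TRUE))) {
    d$gdl_id_version[i] <- folder[1]
  } else if (length(folder) > 1) {
    # Ambiguous match: print the candidates for manual inspection
    print(folder)
  }
}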
Production database
Order model
Software Settings
IntervalBlue
Interval setting for blue light recording
LoggingPeriodEnabled
Indicates if logging period is enabled
StartPeriod1
Start time for the first logging period
IntervalRGC
Interval for recording green channel data
GreenEnabled
Indicates if green channel is enabled
RedEnabled
Indicates if red channel is enabled
ClearEnabled
Indicates if clear channel is enabled
LightLevelThreshold
Threshold level for light detection
BaseLoggingInterval
Base interval for logging data
IntervalFactorLight
Interval factor for light level recording
IntervalFactorAirTemperature
Interval factor for air temperature recording
IntervalFactorBodyTemperature
Interval factor for body temperature recording
IntervalFactorPressure
Interval factor for pressure recording
IntervalFactorAcceleration
Interval factor for acceleration recording
IntervalFactorMagnetic
Interval factor for magnetic field recording
AccSamplingRate
Sampling rate for accelerometer data
AccXaxisEnabled
Indicates if X-axis accelerometer is enabled
AccYaxisEnabled
Indicates if Y-axis accelerometer is enabled
AccZaxisEnabled
Indicates if Z-axis accelerometer is enabled
DataloggerEnabled
Indicates if datalogger is enabled
RadioEnabled
Indicates if radio transmission is enabled
Frequencies
Frequencies used for data transmission
TransmitPower
Power level for data transmission
WakeUpTrigger
Trigger for waking up the device
PatternID
Identifier for the data transmission pattern
PulseWidth
Width of the transmission pulse
PulseInterval
Interval between transmission pulses
PatternInterval
Interval for the transmission pattern
PulsePattern
Pattern of the transmission pulses
ActiveAfterWakeUp
Indicates if the device is active after wake up
TRSchedule
Transmission schedule
LoggingSchedule
Logging schedule
Hardware Characteristics
LWL
Unknown
LWL_Length
Length of the unknown item
LWL_Diameter
Diameter of the unknown item
Harness
Type of harness used
HarnessMaterial
Material of the harness
HarnessAttachement
Method of attaching the harness
HarnessThickness
Thickness of the harness
LegHarnessDiameter
Diameter of the leg harness
BreastHarnessDiameterHead
Diameter of the breast harness at the head
BreastHarnessDiameterTail
Diameter of the breast harness at the tail
HarnessDescription
Description of the harness
Priority Settings
PrioritySensor
Priority level for sensor data
PriorityMemory
Priority level for memory usage
Data model
Tag
OrderName *
Identifier for the specific order of the tags
Could we make this standard (e.g., year-countrycode-site?-tagtype?)
GDL_ID *
Unique identifier for the geolocator tag
This should absolutely be unique and standard (e.g., ordername-integer)
GDL_Type *
Type of geolocator tag (GDL1, GDL2, GDL3a, GDL3pam, uTag)
HardwareVersion *
Hardware version of the tag (e.g., v1.0)
This is currently not standardized. It is also present in the *.settings file
FirmwareVersion *
Firmware version of the tag
This is currently not standardized. It is also present in the *.settings file
PrintManufacturer
Manufacturer of the tag
Usually empty (Teltronic AG or Hybrid SA)
FinalAssembly
Final assembly details of the tag
Usually empty (own or teltronic)
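The note on GDL_ID above suggests a standard form such as ordername-integer. A minimal sketch of how such a convention could be checked, assuming the identifier is exactly the OrderName, a hyphen, and a run of digits. The function name `is_standard_gdl_id` and the example values are hypothetical; nothing here is part of GeoLocatoR or of the current database.

```r
# Hypothetical convention (an assumption, not the current practice):
# GDL_ID = "<OrderName>-<integer>", e.g. "OrderA-12"
is_standard_gdl_id <- function(gdl_id, order_name) {
  # Expected prefix is the order name followed by a hyphen
  prefix <- paste0(order_name, "-")
  has_prefix <- !is.na(gdl_id) & !is.na(order_name) & startsWith(gdl_id, prefix)
  # Whatever follows the prefix must consist of digits only
  suffix <- substring(gdl_id, nchar(prefix) + 1)
  has_prefix & grepl("^[0-9]+$", suffix)
}

# Made-up example values, for illustration only
ids <- c("OrderA-12", "OrderA-x", NA, "12")
orders <- c("OrderA", "OrderA", "OrderA", "OrderA")
ok <- is_standard_gdl_id(ids, orders)
```

Applied to the table, something like `d %>% filter(!is_standard_gdl_id(GDL_ID, OrderName))` would list the tags that do not follow the chosen convention.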
Software Settings
LoggingPeriodEnabled
Indicates if logging period is enabled
UTC_StartLog1
UTC start time for the first logging period
LightLevelThreshold
Threshold level for light detection
BaseLoggingInterval
Base interval for logging data
IntervalFactorLight
Interval factor for light level recording
IntervalFactorAirTemperature
Interval factor for air temperature recording
IntervalFactorBodyTemperature
Interval factor for body temperature recording
IntervalFactorPressure
Interval factor for pressure recording
IntervalFactorAcceleration
Interval factor for acceleration recording
IntervalFactorMagnetic
Interval factor for magnetic field recording
Frequency
Recording frequency of the tag
Hardware Characteristics
TotalWeight
Total weight of the tag
LWL
Unknown
LWL_Length
Length of the unknown item
LWL_Diameter
Diameter of the unknown item
DateBatteryConnected
Date when the battery was connected to the tag
VoltageReady
Voltage level when the tag was ready
SleepCurrent
Current draw in sleep mode
Harness
Type of harness used to attach the tag
HarnessMaterial
Material of the harness
HarnessAttachment
Attachment method of the harness
HarnessThickness
Thickness of the harness
LegHarnessDiameter
Diameter of the leg harness
BreastHarnessDiameterHead
Diameter of the breast harness at the head
BreastHarnessDiameterTail
Diameter of the breast harness at the tail
Species
species
Species of the bird equipped with the tag
Species_origin
Origin of the bird species
Equipment
LongitudeAttached
Longitude of the attachment site
LatitudeAttached
Latitude of the attachment site
UTC_Attached
UTC time when the tag was attached
DateDeparture
Date of bird’s departure from the attachment site
Useful for light calibration and GeoPressure.
Retrieval
LongitudeRemoved
Longitude where the tag was removed
LatitudeRemoved
Latitude where the tag was removed
UTC_Removed
UTC time when the tag was removed
DateArrival
Date of bird’s arrival at the destination
Miscellaneous
Remarks
Additional remarks or comments
Folder structure
data/*OrderName*/*GDLID*_*date*
GDLID_*date*.glf
light data
GDLID_*date*.acceleration
acceleration data
GDLID_*date*.pressure
pressure data
GDLID_*date*.AirTemperature
air temperature data
GDLID_*date*.data
??
GDLID_*date*.settings
??
GDLID_*date*.log
??
GDLID_*date*.rep
??
GDLID_*date*.settings.bin
??
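The naming pattern listed above (GDLID_date.extension) can be split back into its parts. A minimal sketch in base R, assuming that the GDL ID contains no underscore and that the date part contains no dot; the function name `parse_gdl_filename` and the example paths are hypothetical. The date is kept as a plain string because its format is not specified here.

```r
# Split "<GDLID>_<date>.<ext>" into its three components.
# Assumptions: no underscore in the GDL ID, no dot in the date part.
parse_gdl_filename <- function(path) {
  fname <- basename(path)
  # Everything before the first underscore is taken as the GDL ID
  gdl_id <- sub("_.*$", "", fname)
  # The remainder is "<date>.<ext>"
  rest <- sub("^[^_]*_", "", fname)
  # Split at the first dot so that "settings.bin" stays in one piece
  date <- sub("\\..*$", "", rest)
  ext <- sub("^[^.]*\\.", "", rest)
  data.frame(gdl_id = gdl_id, date = date, ext = ext, stringsAsFactors = FALSE)
}

# Made-up example paths, for illustration only
parsed <- parse_gdl_filename(c(
  "data/OrderA/24AB_20200101/24AB_20200101.glf",
  "data/OrderA/24AB_20200101/24AB_20200101.settings.bin"
))
```

File names that do not follow the pattern (for instance, without an underscore) are not detected by this sketch and would need an explicit check.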
Notes
Suggestions for improvement
Metadata as Markdown, Word, or Excel?
Cleaning up the data folder:
data/*OrderName*/*GDL_ID*
data/*OrderName*/*GDL_ID*_*date* for old versions only
No zip, pdf, etc. files: move them to another folder (e.g., data/note)?
Data validation/tests: R Markdown report rendered to HTML
Access is dying: move to Excel?
Limitations
A more flexible contributor list with roles, contact details, etc. (e.g. https://r-pkgs.org/description.html#sec-description-authors-at-r)
License of the data (https://r-pkgs.org/description.html#the-license-field)
More context on the project: description, temporal, spatial, and taxonomic extent, websites?, project funding, publications?
We need a unique identifier. So far, I've assumed that GDL_ID is unique, but it's not! Also, putting the date in the folder name is not ideal; instead, a data version could be recorded in the settings file, corresponding to the date of creation.
We should be able to store more information on the animal (ring number, sex, age, ssp., size, etc.)
How do we record information on resightings: birds captured without the tag (tag lost), birds seen but not captured, or tags that were damaged?
We need an easy way to know how much data is available for each tag and the quality of the data (e.g. pressure); the date extent of the data can be different for each tag.
Assessing tag effects requires a control group. Do we want to store this information too?
Data formats are not standardized and not documented
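One of the limitations above asks for an easy way to see how much data is available for each tag. As a first, rough step, the presence of each sensor file per tag folder can be tabulated from the file paths alone. This is a minimal sketch: the function name `sensor_availability`, the default list of extensions (taken from the folder structure described earlier), and the example paths are assumptions, and it only reports whether a file exists, not its date extent or quality.

```r
# Presence/absence of sensor files per tag folder, derived from paths only.
# Assumption: each file sits directly inside its tag folder.
sensor_availability <- function(paths,
                                sensors = c("glf", "acceleration", "pressure", "AirTemperature")) {
  tag_folder <- basename(dirname(paths))
  ext <- tools::file_ext(paths)
  folders <- sort(unique(tag_folder))
  # Logical matrix: one row per tag folder, one column per sensor extension
  avail <- matrix(FALSE,
    nrow = length(folders), ncol = length(sensors),
    dimnames = list(folders, sensors)
  )
  for (s in sensors) {
    avail[unique(tag_folder[ext == s]), s] <- TRUE
  }
  avail
}

# Made-up example paths, for illustration only
paths <- c(
  "data/O1/T1_2020/T1_2020.glf",
  "data/O1/T1_2020/T1_2020.pressure",
  "data/O1/T2_2020/T2_2020.glf"
)
avail <- sensor_availability(paths)
```

In practice, `paths` could come from `list.files(..., recursive = TRUE, full.names = TRUE)` on the data folder; reading the first and last timestamps of each file would be needed to report the actual date extent.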
Processing pipeline
Order -> production -> sending -> equipment -> retrieval -> sending back -> extraction -> bundling -> sending data.
Which information is stored where: in (1) the Order table, (2) the GDL table, or (3) the settings files?
Also, which table stores the information that may differ between what was ordered, delivered, and retrieved?