GPS Tag Processing

Overview

This is a modified version of the original QMD to display the cleaning process to the supervisory team.

The following script is designed to process the raw GPS data collected from the Biotrack PinPoint tags and prepare it for analysis. The script includes the removal of invalid and inaccurate GPS fixes, the creation of relevant columns (e.g. time stamp, survey night) and the removal of irrelevant columns.

Set up

GPS data is collected from the tags, downloaded onto a computer and processed by PinPoint Host. This produces a text document or CSV, which acts as our raw data and is imported in the script below. Ensure that the file path is changed for each hedgehog.

For the purpose of producing maps at an early stage, the original order of code has been changed so that invalid fixes containing NA values are removed immediately. Similarly, the time stamp value and the conversion to sf are done here to allow for map creation.

The data from a female hedgehog from West Kilbride is used for this example.

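#Load the packages used throughout this script
library(dplyr)      # %>%, filter(), mutate(), select(), arrange(), between()
library(tidyr)      # separate()
library(lubridate)  # with_tz(), ymd_hms(), ymd(), year(), hours()
library(sf)         # st_as_sf(), st_coordinates(), st_transform()

#Import data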
raw_data <- read.delim("45441_WK_YB_F_V8_99_0.txt",
                       header = TRUE,
                       sep = "",
                       fill = TRUE,
                       strip.white = TRUE,
                       check.names = FALSE)

#Keep only fixes with "Valid" status; this drops "NotEnoughSats" fixes, which contain the N/A entries
data <- raw_data %>% filter(Status == "Valid")

##Create DateTime value. Involves conversion from UTC to BST
# Clean raw text
data$`FIX-date` <- as.character(trimws(gsub("[^[:print:]]", "", data$`FIX-date`)))
data$`FIX-time` <- as.character(trimws(gsub("[^[:print:]]", "", data$`FIX-time`)))

# Step 1: Create a timestamp string
datetime_str <- paste(data$`FIX-date`, data$`FIX-time`)

# Step 2: Parse as if it were in GMT (UTC)
datetime_gmt <- as.POSIXct(datetime_str, format = "%y/%m/%d %H:%M:%OS", tz = "UTC")

# Step 3: Convert to Europe/London (adds +1 hour for BST when applicable)
data$DateTime <- with_tz(datetime_gmt, tzone = "Europe/London")



#Convert data to sf object
data_sf <- st_as_sf(data, coords = c("Longitude", "Latitude"), crs = 4326)
#Retain Lat and Long values
coords <- st_coordinates(data_sf)
data_sf <- data_sf %>%
  mutate(Longitude = coords[, 1],
         Latitude = coords[, 2])
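
As a quick check of the time zone conversion used above, the sketch below converts two single values. The date strings are hypothetical and assume the two-digit year/month/day layout implied by the format string in the script; they are not taken from the tag data.

```r
library(lubridate)

# Hypothetical summer fix, recorded by the tag in UTC
utc_time <- as.POSIXct("25/06/18 00:00:09",
                       format = "%y/%m/%d %H:%M:%OS", tz = "UTC")

# with_tz() changes only the displayed time zone, not the instant itself
local_time <- with_tz(utc_time, tzone = "Europe/London")
format(local_time, "%Y-%m-%d %H:%M:%S")
# [1] "2025-06-18 01:00:09"

# Hypothetical winter fix: the UK is on GMT, so no hour is added
winter_utc <- as.POSIXct("25/01/18 00:00:09",
                         format = "%y/%m/%d %H:%M:%OS", tz = "UTC")
winter_local <- with_tz(winter_utc, tzone = "Europe/London")
format(winter_local, "%Y-%m-%d %H:%M:%S")
# [1] "2025-01-18 00:00:09"
```

Because only the display changes, any time differences calculated between fixes are unaffected by the conversion.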

The raw data is mapped below. The first map is not of much use due to major outliers caused by low-accuracy GPS fixes. The second (zoomed) map better demonstrates the spread of GPS fixes and gives a comparison for the maps presented later in the process.

I struggled to get the zoom and frame right for these. Fixing one problem caused another, so I hope this is sufficient to get an idea of how the data looks.

Filtering Invalid Fixes

Invalid and inaccurate fixes need to be removed to ensure that we are only left with fixes suitable for analysis. Previous researchers at NTU have used the following criteria to filter fixes based on manufacturer recommendations:

  • Status = Valid

  • Number of satellites used to take fix ≥ 4

  • Horizontal Dilution of Precision (HDOP) ≤ 5

  • Location error (eRes) ≤ 10

The following code filters out unsuitable fixes as per the above criteria. It first alters the ‘Sats’ field, which holds two numbers in a fraction format: the number of GPS satellites used / the number of GPS satellites available at the time of the fix. These are split into two separate fields so that fixes can be filtered on the number of satellites used.

#Separate the Sats field prior to filtering
data_sf$Sats <- as.character(data_sf$Sats)
#Split on "/" into the two named columns (no placeholder third column needed)
data_sf <- data_sf %>%
  separate(Sats, into = c("Sats_Used", "Sats_Available"), sep = "/")
data_sf$Sats_Used <- as.numeric(data_sf$Sats_Used)
data_sf$Sats_Available <- as.numeric(data_sf$Sats_Available)

#Filter entries that do not meet quality requirements
#Thresholds are inclusive, matching the criteria listed above
data_valid <- data_sf %>% filter(Status == "Valid", Sats_Used >= 4,
                              HDOP <= 5, eRes <= 10, eRes >= 0)
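
The splitting and filtering steps can be illustrated on a small table. This is a minimal sketch, not the tag data: the four rows are invented, a plain data frame is used instead of an sf object, and the thresholds follow the inclusive criteria listed above.

```r
library(dplyr)
library(tidyr)

# Hypothetical fixes; all values are invented for illustration only
toy <- data.frame(
  Index  = 1:4,
  Status = c("Valid", "Valid", "Valid", "NotEnoughSats"),
  Sats   = c("6/9", "3/8", "5/7", "0/4"),
  HDOP   = c(1.2, 2.0, 7.5, NA),
  eRes   = c(3.1, 4.0, 2.2, NA)
)

toy_valid <- toy %>%
  # "6/9" becomes Sats_Used = 6 and Sats_Available = 9
  separate(Sats, into = c("Sats_Used", "Sats_Available"),
           sep = "/", convert = TRUE) %>%
  filter(Status == "Valid", Sats_Used >= 4,
         HDOP <= 5, eRes <= 10, eRes >= 0)

toy_valid$Index
# [1] 1
```

Only the first row survives: row 2 used too few satellites, row 3 has an HDOP above 5, and row 4 is not a valid fix.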

Define Deployment Period

The raw data contains fixes from before and after the deployment period (i.e. when the tag was not attached to a hedgehog). Below, the capture and recapture times are used to define the deployment period and to filter out fixes collected outside it (requires manual editing).

#Define the deployment period 
time_attached <- ymd_hms("2025-06-18 01:00:00", tz = "Europe/London") 
time_removed <- ymd_hms("2025-07-02 02:40:00", tz = "Europe/London")  

#Filter fixes outside deployment period 
data_deploy <- data_valid %>%  
  filter(between(DateTime, time_attached, time_removed)) 

Data Preparation

Ahead of the data analysis, the following fields are added into the data set:

  • X and Y coordinate values - For ease in plotting points in GIS. Converted from WGS84 to British National Grid.

  • Year - The year the data was collected

  • Night - An identifier for each individual night the hedgehog was tracked for

  • Site - The study site the data was collected from. Value will need to be entered manually

  • Sex - The sex of the tracked hedgehog. Value will need to be entered manually

  • HHID - A unique identifier for each tracked hedgehog (Site code_Year_Sex_sequential letter of hedgehog from that site’s trapping season e.g. BRK_2025_M_A)

  • Burst - Groups fixes based on whether they were recorded sequentially. Fix filtration leaves gaps in the sequence, and grouping the remaining fixes into bursts should (presumably) prevent those gaps from causing issues in later analyses.

  • FID - An individual ID for each of the fixes.

The next code block adds each of these new fields into the data set.

#Create X and Y coordinate values
gps_projected <- st_transform(data_deploy, crs = 27700)
gps_coords <- st_coordinates(gps_projected)
data_deploy$x <- gps_coords[, 1]
data_deploy$y <- gps_coords[, 2]

#Create Year value
data_prep <- data_deploy %>%
  mutate(Date = ymd(`FIX-date`),  
         Year = year(Date))

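#Create Night Values
#Shift by 12 hours so fixes either side of midnight share one survey date.
#tz is set explicitly because as.Date() otherwise takes the date in UTC
data_prep <- data_prep %>%
  mutate(
   SurveyDate = as.Date(DateTime - hours(12), tz = "Europe/London")
  )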

first_night <- min(data_prep$SurveyDate, na.rm = TRUE)

data_prep <- data_prep %>%
  mutate(
    SurveyNight = as.integer(difftime(SurveyDate, first_night, units = "days"))
  )
 
#Create Site, Sex and HHID
#Change value depending on the source of data
data_prep <- data_prep %>%
  mutate(Site = "West Kilbride",
         Sex = "F",
         Site_code = "WSK",
         HHC = "B")
data_prep$HHID <- paste(data_prep$Site_code, data_prep$Year,
                        data_prep$Sex, data_prep$HHC, sep="_")

#Create Burst
data_prep <- data_prep %>%
  arrange(Index) %>%
  mutate(
    Break = c(0, diff(Index) != 1),
    Burst = cumsum(Break) + 1
  )%>%
  select(-Break)
data_prep$Burst <- paste(data_prep$HHID, data_prep$Burst, sep=".")

#Create FID
data_prep <- data_prep %>%
  mutate(FID = paste(HHID, Index, sep="."))
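
To show how the survey night and burst values behave, the sketch below applies the same logic to six invented fixes. The Index values and times are hypothetical, and a plain data frame stands in for the sf object.

```r
library(dplyr)
library(lubridate)

# Hypothetical fixes: Index 16-17 and 20-149 have been filtered out
toy <- data.frame(
  Index = c(13, 14, 15, 18, 19, 150),
  DateTime = ymd_hms(c("2025-06-18 01:00:09", "2025-06-18 01:10:08",
                       "2025-06-18 01:22:23", "2025-06-18 01:50:11",
                       "2025-06-18 23:30:00", "2025-06-19 02:00:00"),
                     tz = "Europe/London")
)

toy <- toy %>%
  arrange(Index) %>%
  mutate(
    # Shifting by 12 hours puts fixes either side of midnight on one date
    SurveyDate  = as.Date(DateTime - hours(12), tz = "Europe/London"),
    SurveyNight = as.integer(difftime(SurveyDate, min(SurveyDate),
                                      units = "days")),
    # A new burst starts wherever Index does not follow the previous fix
    Burst = cumsum(c(0, diff(Index) != 1)) + 1
  )

toy$SurveyNight
# [1] 0 0 0 0 1 1
toy$Burst
# [1] 1 1 1 2 2 3
```

The fix at 23:30 on 18 June and the fix at 02:00 on 19 June fall on the same survey night, while each gap in Index starts a new burst.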

The next code block removes the fields that are not required for the analysis, which helps to tidy the data set.

#Filter out unneeded columns
data_processed <- data_prep %>% 
  select(HHID, FID, x, y, Latitude, Longitude, Year, DateTime, Site,
          Sex, SurveyNight, Burst) 

#Display subset of final dataset
head(data_processed)
Simple feature collection with 6 features and 12 fields
Geometry type: POINT
Dimension:     XY
Bounding box:  xmin: -4.86248 ymin: 55.69263 xmax: -4.86168 ymax: 55.6934
Geodetic CRS:  WGS 84
          HHID             FID        x        y Latitude Longitude Year
1 WSK_2025_F_B WSK_2025_F_B.13 220209.1 648070.2 55.69268  -4.86201 2025
2 WSK_2025_F_B WSK_2025_F_B.14 220229.6 648063.8 55.69263  -4.86168 2025
3 WSK_2025_F_B WSK_2025_F_B.15 220221.2 648071.9 55.69270  -4.86182 2025
4 WSK_2025_F_B WSK_2025_F_B.16 220218.3 648064.2 55.69263  -4.86186 2025
5 WSK_2025_F_B WSK_2025_F_B.17 220183.2 648098.0 55.69292  -4.86244 2025
6 WSK_2025_F_B WSK_2025_F_B.18 220182.9 648151.5 55.69340  -4.86248 2025
             DateTime          Site Sex SurveyNight          Burst
1 2025-06-18 01:00:09 West Kilbride   F           0 WSK_2025_F_B.1
2 2025-06-18 01:10:08 West Kilbride   F           0 WSK_2025_F_B.1
3 2025-06-18 01:22:23 West Kilbride   F           0 WSK_2025_F_B.1
4 2025-06-18 01:30:14 West Kilbride   F           0 WSK_2025_F_B.1
5 2025-06-18 01:40:07 West Kilbride   F           0 WSK_2025_F_B.1
6 2025-06-18 01:50:11 West Kilbride   F           0 WSK_2025_F_B.1
                   geometry
1 POINT (-4.86201 55.69268)
2 POINT (-4.86168 55.69263)
3  POINT (-4.86182 55.6927)
4 POINT (-4.86186 55.69263)
5 POINT (-4.86244 55.69292)
6  POINT (-4.86248 55.6934)

For visual purposes, GPS fixes are sorted by survey night and by burst in the below maps:

[Map: GPS fixes by survey night]

[Map: GPS fixes by burst]

Export Data

The data is now ready to be exported as a CSV for analysis and for exploration in GIS. Ensure that the file path and file name are changed.

#Export as CSV with relevant name and file path
#write.csv(data_processed, "C:/Users/jones/OneDrive/Documents/PhD/Hedgehog_data/Telemetry/Site_Data/Cleaned_data/Scotland_2025/West_Kilbride/WSK_2025_F_B.csv")
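
One option when exporting, sketched below, is to drop the sf geometry column first so that the CSV contains only plain columns. Whether the geometry column should be kept is a choice the script above does not specify, so this is an assumption. The two-row table is a stand-in for data_processed, and the temporary file is a placeholder for the real file path.

```r
library(sf)

# Two-row stand-in for data_processed (coordinates from the output above)
toy_sf <- st_as_sf(
  data.frame(FID = c("WSK_2025_F_B.13", "WSK_2025_F_B.14"),
             Longitude = c(-4.86201, -4.86168),
             Latitude = c(55.69268, 55.69263)),
  coords = c("Longitude", "Latitude"), crs = 4326, remove = FALSE
)

# Drop the geometry list-column; Longitude and Latitude remain as columns
toy_table <- st_drop_geometry(toy_sf)

# Placeholder path; replace with the relevant file path and name
out_file <- tempfile(fileext = ".csv")
write.csv(toy_table, out_file, row.names = FALSE)

names(read.csv(out_file))
# [1] "FID"       "Longitude" "Latitude"
```

Setting row.names = FALSE avoids an extra unnamed column of row numbers in the exported file.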