This document walks you through preprocessing electrodermal activity (EDA) data collected with the Empatica E4 from one student in the EDA lab practical. We use the signal package to clean the raw data, ggplot2 to visualize the signal and assess its quality, and dplyr for easy data manipulation.

Setup

Import packages

library(signal)
library(dplyr)
library(ggplot2)

Load your EDA data

Begin by setting the working directory from which your data will be loaded.

setwd('~/Documents/Northeastern/Teaching/CS 4910/Week 6/')
f <- 'E4_Student1/'  # set to the decompressed Zip folder

Read in the relevant CSV file from the Empatica E4.

csv <- read.csv(paste0(f, 'EDA.csv'), header = FALSE)
head(csv)
##             V1
## 1 1.645213e+09
## 2 4.000000e+00
## 3 0.000000e+00
## 4 8.389180e-01
## 5 8.635200e-01
## 6 8.238160e-01
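
To make the file layout concrete, here is a minimal sketch that reads a few E4-style lines from an in-memory string instead of a file. The numbers are made-up toy values, not Student 1's data; only the layout (start time, sampling rate, then samples) mirrors the preview above. All toy_ names are assumptions for illustration.

```r
# Toy stand-in for EDA.csv: row 1 = start time (Unix), row 2 = sampling
# rate (Hz), remaining rows = samples. All values here are made up.
toy_text <- '1600000000.0\n4.0\n0.0\n0.51\n0.52\n0.53'
toy_csv <- read.csv(text = toy_text, header = FALSE)

toy_start <- toy_csv[1, 1]        # recording start time
toy_fs <- toy_csv[2, 1]           # sampling rate
toy_values <- toy_csv$V1[-(1:2)]  # everything after the two metadata rows

str(toy_csv)
```

Because the file has a single unnamed column, read.csv() names it V1, which is why the metadata is accessed by row position.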

Data Manipulation

Set up the data frame

First, create a new data frame to store our EDA and timing data. To do this, we need to extract only the actual EDA values from the CSV file. The first two rows hold the recording start time and the sampling rate, and the third row is 0 in this file, so we keep the values after the third row. One way to do this is with the filter() function from the dplyr package.

Note: We use the double colon :: operator to tell R that we are accessing the filtering function from dplyr because the signal package also contains a filter() function.

eda <- csv %>%
  dplyr::filter(!row_number() %in% 1:3)
names(eda) <- 'SCL'  # rename the 'V1' column
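
If you would rather sidestep the filter() name clash altogether, the same rows can be dropped with base R negative indexing. The sketch below uses a toy data frame (hypothetical values and names) in place of csv.

```r
# Toy stand-in for the csv object: 3 leading rows to drop, then 3 samples.
toy_csv <- data.frame(V1 = c(1600000000, 4, 0, 0.51, 0.52, 0.53))

# Negative indices drop rows; drop = FALSE keeps the result a data frame.
toy_eda <- toy_csv[-(1:3), , drop = FALSE]
names(toy_eda) <- 'SCL'     # rename the 'V1' column
rownames(toy_eda) <- NULL   # renumber the rows from 1

toy_eda
```

Resetting the row names is optional; it only makes the printed row numbers start at 1, as they do after dplyr::filter().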

Create the timing data

To create a time variable, we need to know the start and end times of the recording. The recording start time (in Unix time, i.e., seconds since 1970-01-01 UTC) and the sampling rate (in Hz) are given in the first two rows of the EDA.csv file.

start <- csv[1, 1]          # set the start time
fs <- csv[2, 1]             # set the sampling rate

Since we know that the sampling rate (\(F_s\)) is the number of samples recorded from the device per second, we use this value to figure out the duration of the recording.

duration <- nrow(eda) / fs  # get the recording duration in seconds
end <- start + duration     # calculate the recording end time

Now that we have the start and end times of the recording, we can create our Time variable as a sequence of values from start to end. We increment the sequence by 0.25 s because our sampling rate is 4 Hz (4 samples per second), so consecutive samples are 1/4 s apart.

Note: We use the head() function on our sequence to remove the last timestamp from our data. This is because end is when the E4 stopped recording, so there was no data at this time point.

eda$Time <- head(seq(from = start, to = end, by = 0.25), -1)
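
As a cross-check, the same timestamps can be built directly from the sample index, which guarantees exactly one timestamp per sample. This is a sketch with toy values (toy_start, toy_fs, and n are assumptions for illustration).

```r
toy_start <- 1600000000   # hypothetical Unix start time
toy_fs <- 4               # sampling rate in Hz
n <- 8                    # number of samples

# Sample k (counting from 0) is recorded k / fs seconds after the start.
time_index <- toy_start + (seq_len(n) - 1) / toy_fs

# The seq()/head() approach: build start..end, then drop the end point.
toy_end <- toy_start + n / toy_fs
time_seq <- head(seq(from = toy_start, to = toy_end, by = 1 / toy_fs), -1)

all.equal(time_index, time_seq)   # TRUE
length(time_index) == n           # TRUE
```

Writing the step as 1 / toy_fs rather than a literal 0.25 keeps the code correct if the sampling rate ever differs.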

Convert the Time variable from Unix time to a date-time format. Unless you set tz, as.POSIXct() displays times in your computer's local time zone.

eda$Time <- as.POSIXct(eda$Time, origin = '1970-01-01')
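
The displayed clock time depends on the time zone, even though the underlying instant does not. The sketch below converts one Unix timestamp (the first value in the tags.csv output shown in this document) in two time zones. 'America/New_York' is an assumption about where the recording was made, chosen because it reproduces the local times printed here.

```r
ts <- 1645212971   # Unix time in seconds

# Same instant, displayed in two time zones.
utc <- as.POSIXct(ts, origin = '1970-01-01', tz = 'UTC')
eastern <- as.POSIXct(ts, origin = '1970-01-01', tz = 'America/New_York')

format(utc, '%Y-%m-%d %H:%M:%S')       # "2022-02-18 19:36:11"
format(eastern, '%Y-%m-%d %H:%M:%S')   # "2022-02-18 14:36:11"

# Both objects store the same number of seconds since the epoch.
as.numeric(utc) == as.numeric(eastern)   # TRUE
```

Setting tz explicitly makes the output reproducible on computers configured for a different time zone.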

Preview your new data frame.

head(eda)
##        SCL                Time
## 1 0.838918 2022-02-18 14:35:54
## 2 0.863520 2022-02-18 14:35:54
## 3 0.823816 2022-02-18 14:35:54
## 4 0.828939 2022-02-18 14:35:54
## 5 0.835343 2022-02-18 14:35:55
## 6 0.837904 2022-02-18 14:35:55

Data Quality Assessment

Plot the raw signal

Using ggplot2 and the data from our EDA data frame, we start by plotting the raw signal of Student 1’s EDA with a line chart. Give the chart properly labeled axes and an appropriate title.

ggplot(eda) + 
  # linewidth replaces size for lines in ggplot2 >= 3.4.0 (use size in older versions)
  geom_line(aes(x = Time, y = SCL), linewidth = 0.5) + 
  labs(x = 'Time (24-Hour)', 
       y = 'Skin Conductance Level (\u00B5S)', 
       title = 'Student 1\'s EDA')

Clean the signal

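Based on this visualization, we can see that the raw signal contains movement artifacts that can be cleaned up. We will use a bidirectional (zero-phase) 5th-order low-pass Butterworth filter. Note that butter() expects the cut-off W as a fraction of the Nyquist frequency (half the sampling rate), not in Hz. With a 4 Hz sampling rate the Nyquist frequency is 2 Hz, so W = 0.05 corresponds to a cut-off of 0.05 × 2 Hz = 0.1 Hz. A 0.05 Hz cut-off would instead require W = 0.05 / 2 = 0.025.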

bf <- butter(n = 5, W = 0.05, type = 'low')
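
Because signal::butter() takes its cut-off as a fraction of the Nyquist frequency rather than in Hz, it helps to make the conversion explicit. The helper below is a minimal sketch; normalized_cutoff is a hypothetical name, not part of the signal package.

```r
# Convert a cut-off in Hz to the normalized value butter() expects.
# normalized_cutoff() is an illustrative helper, not a signal function.
normalized_cutoff <- function(fc_hz, fs_hz) {
  nyquist <- fs_hz / 2
  stopifnot(fc_hz > 0, fc_hz < nyquist)
  fc_hz / nyquist
}

fs_hz <- 4                        # E4 EDA sampling rate in Hz
normalized_cutoff(0.05, fs_hz)    # 0.025: W for a 0.05 Hz cut-off
0.05 * (fs_hz / 2)                # 0.1: cut-off in Hz implied by W = 0.05

# Example use (requires the signal package):
# bf <- signal::butter(n = 5, W = normalized_cutoff(0.05, fs_hz), type = 'low')
```

Deriving W from the sampling rate also keeps the filter meaningful if the data are ever resampled.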

Create the cleaned signal by applying your Butterworth filter forward and backward with filtfilt().

eda$Clean <- filtfilt(bf, eda$SCL)
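
Forward-backward filtering can distort the first and last samples, because the filter starts from rest at each edge. One common workaround is to mirror the signal at both ends before filtering and trim the padding afterward. The sketch below assumes the signal package is installed. pad_filtfilt and n_pad are hypothetical names, and the padding length is a judgment call rather than a recommended value.

```r
library(signal)

# Mirror-pad x by n_pad samples on each side, filter, then trim the padding.
pad_filtfilt <- function(filt, x, n_pad) {
  n <- length(x)
  stopifnot(n_pad >= 1, n_pad < n)
  left <- rev(x[2:(n_pad + 1)])          # reflection about the first sample
  right <- rev(x[(n - n_pad):(n - 1)])   # reflection about the last sample
  y <- filtfilt(filt, c(left, x, right))
  y[(n_pad + 1):(n_pad + n)]             # keep only the original span
}

bf_toy <- butter(n = 5, W = 0.05, type = 'low')
x_toy <- rep(1, 1000)                    # constant toy signal

plain <- filtfilt(bf_toy, x_toy)
padded <- pad_filtfilt(bf_toy, x_toy, n_pad = 400)

# For a constant input, an ideal low-pass output stays at 1 everywhere.
c(plain = plain[1], padded = padded[1])
```

With padding, any start-up transient falls inside the mirrored samples that are thrown away, so the retained output is much closer to the input level at the edges.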

Compare the signals

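Preview your data to compare the raw and cleaned signal values. Notice the smoothing effect of your filter on the raw values. The first cleaned values are well below the raw ones and climb steadily. This looks like a start-up transient of the filter at the edge of the recording rather than a real change in skin conductance, so treat the first few seconds of the cleaned signal with caution.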

head(eda[c('SCL', 'Clean')], 10)
##         SCL     Clean
## 1  0.838918 0.4346005
## 2  0.863520 0.4757751
## 3  0.823816 0.5163025
## 4  0.828939 0.5558108
## 5  0.835343 0.5939481
## 6  0.837904 0.6303892
## 7  0.830219 0.6648415
## 8  0.836623 0.6970501
## 9  0.834062 0.7268013
## 10 0.826377 0.7539262

Now you can visualize the comparison by overlaying your cleaned signal data on top of the raw signal.

combined <- ggplot(eda) + 
  # linewidth replaces size for lines in ggplot2 >= 3.4.0 (use size in older versions)
  geom_line(aes(x = Time, y = SCL, col = 'Raw'), linewidth = 0.5) +
  geom_line(aes(x = Time, y = Clean, col = 'Cleaned'), linewidth = 0.5) + 
  labs(x = 'Time (24-Hour)', 
       y = 'Skin Conductance Level (\u00B5S)', 
       color = '',                                # remove the legend title
       title = 'Student 1\'s EDA') +
  # name each color so the mapping does not depend on alphabetical order
  scale_color_manual(values = c('Cleaned' = 'red', 'Raw' = 'black'))
combined

Analyze by condition

Depending on the experiment, your EDA data may be collected during different conditions. The tags.csv file contains timestamps of button presses on the E4 device (i.e., event markings). Each button press marks the start or end of a condition.

Each event marking on the Empatica E4 is output as a Unix timestamp, which must be converted to a date-time format before it can be used in our visualization.

events <- read.csv(paste0(f, 'tags.csv'), header = FALSE)
events$Time <- as.POSIXct(events$V1, origin = '1970-01-01')
events
##           V1                Time
## 1 1645212971 2022-02-18 14:36:11
## 2 1645213015 2022-02-18 14:36:55
## 3 1645213140 2022-02-18 14:38:59
## 4 1645213186 2022-02-18 14:39:46

Now we can overlay these event markings as vertical reference lines whose x-intercepts equal the timestamps.

final <- combined + 
  geom_vline(
    data = events, 
    aes(xintercept = Time), color = 'blue', linetype = 'dashed')
final

Based on this visualization, we can observe changes in the EDA data of Student 1 during various conditions. The skin conductance signal during the baseline rest period appears relatively flat compared to the SCL during the two activity periods around it.
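
To go beyond visual inspection, each sample can be assigned to the interval between consecutive tags and summarized per interval. This is a sketch on toy data (the values, the toy_ names, and the Segment column are assumptions for illustration). Base R's findInterval() returns, for each sample time, how many tag times fall at or before it.

```r
# Toy recording: 12 samples at 1-second spacing, with two event tags.
toy <- data.frame(
  Time = as.POSIXct(0:11, origin = '1970-01-01', tz = 'UTC'),
  Clean = c(1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3)
)
toy_tags <- as.POSIXct(c(4, 8), origin = '1970-01-01', tz = 'UTC')

# Segment 0 = before the first tag, 1 = between tags 1 and 2, and so on.
toy$Segment <- findInterval(as.numeric(toy$Time), as.numeric(toy_tags))

# Mean cleaned SCL within each segment.
segment_means <- tapply(toy$Clean, toy$Segment, mean)
segment_means
```

Applied to the real data, the same idea would use eda$Time and events$Time; which segment numbers correspond to rest or activity depends on how the button presses were used in the experiment.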