load packages

library(tidyverse)
library(formattable) # useful for controlling decimal places

options(scipen = 999) #set no scientific notation

The process

first, make data wide using pivot_wider so that brow and cheek are in columns
use scale() to transform RMS scores into z scores for each participant.
then make bin wide so that each row represents a single trial and bins 1-10 are in different columns
make new column for each bin, calculating the difference between that binZ and baselineZ.

A. How to calculate Z scores from EMG data

Notes re the scale() function.

The scale() function makes use of the following arguments.

x: a numeric object
center: if TRUE, the objects’ column means are subtracted from the values in those columns (ignoring NAs); if FALSE, centering is not performed
scale: if TRUE, the centered column values are divided by the column’s standard deviation (when center is also TRUE; otherwise, the root mean square is used); if FALSE, scaling is not performed
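
A quick sketch of the less obvious case, scale = TRUE with center = FALSE, using a made-up vector x (not the EMG data). The root mean square here uses n - 1 in the denominator, which is how the scale() help page defines it.

```r
# made-up values, just to see what scale = TRUE does without centering
x <- c(2, 4, 6, 8)

# center = FALSE, scale = TRUE: divide by the root mean square, not the sd
rms_scaled <- scale(x, center = FALSE, scale = TRUE)

# root mean square as described in ?scale (note the n - 1)
rms <- sqrt(sum(x^2) / (length(x) - 1))

# as.numeric() drops the matrix wrapper so the two can be compared
all.equal(as.numeric(rms_scaled), x / rms)
```

For z scores you want both center = TRUE and scale = TRUE, so this case is only worth knowing about so you don't use it by accident.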

1. Centering Variables

Normally, to center a variable, you would subtract the mean of all data points from each individual data point. With scale(), this can be accomplished in one simple call.

center variable A using the scale() function

scale(A, center = TRUE, scale = FALSE)

2. Generating Z-Scores

Normally, to create z-scores (standardized scores) from a variable, you would subtract the mean of all data points from each individual data point, then divide those points by the standard deviation of all points. Again, this can be accomplished in one call using scale().

generate z-scores for variable A using the scale() function

scale(A, center = TRUE, scale = TRUE)
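
To convince yourself that scale() really is doing the subtract-the-mean, divide-by-the-sd calculation, here is a small sketch with made-up values for A (the numbers are invented for illustration).

```r
# made-up scores for one variable
A <- c(10, 12, 15, 11, 30)

# z-scores by hand: subtract the mean, then divide by the standard deviation
z_manual <- (A - mean(A)) / sd(A)

# the same thing in one call; as.numeric() drops the matrix wrapper
z_scale <- as.numeric(scale(A, center = TRUE, scale = TRUE))

all.equal(z_manual, z_scale)  # the two approaches agree
mean(z_scale)                 # zero, up to floating point error
sd(z_scale)                   # one
```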

Let’s try it

read in clean data

Sample data from N=5 participants

data <- read_csv("sampleEMG.csv") %>%
  select(-X1)

## Warning: Missing column names filled in: 'X1' [1]

## Parsed with column specification:
## cols(
##   X1 = col_double(),
##   pp_no = col_character(),
##   condition = col_character(),
##   bin = col_character(),
##   bin_no = col_double(),
##   trial = col_character(),
##   muscle = col_character(),
##   rms = col_double()
## )

Test it on a single participant

Filter the df to include only data from 1 participant.

justpp1 <- data %>%
  filter(pp_no == "pp1")

Then use pivot_wider to put the data for brow and cheek back into separate columns.

widepp1 <- justpp1 %>%
  pivot_wider(names_from = muscle, values_from = rms)
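
If it isn't obvious what pivot_wider is doing here, this toy version might help. The data frame long_toy and its values are made up; only the column names mirror the real data.

```r
library(tidyr)

# made-up long data: one row per bin per muscle
long_toy <- data.frame(
  pp_no  = "pp1",
  bin    = rep(c("bin_0", "bin_1"), each = 2),
  muscle = rep(c("brow", "cheek"), times = 2),
  rms    = c(1.2, 5.0, 1.5, 6.1)
)

# one column per muscle, so each row is now a single bin
wide_toy <- pivot_wider(long_toy, names_from = muscle, values_from = rms)
wide_toy
```

The four long rows collapse into two wide rows, one per bin, with the RMS values sitting in the brow and cheek columns.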

This is the base way to create new variables called zbrow and zcheek using the scale function and specifying center = TRUE and scale = TRUE

widepp1$zbrow = scale(widepp1$brow, center = TRUE, scale = TRUE)

widepp1$zcheek = scale(widepp1$cheek, center = TRUE, scale = TRUE)
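
One thing to be aware of: scale() returns a one-column matrix rather than a plain vector, so the new columns are stored as matrix columns (which is why summary() labels them zbrow.V1 and zcheek.V1). A sketch of how to avoid that, using a made-up data frame called toy in place of widepp1:

```r
# made-up data frame standing in for widepp1
toy <- data.frame(brow = c(1.0, 1.7, 2.3, 3.0, 22.2))

# scale() returns a one-column matrix, so this stores a matrix column
toy$zbrow_matrix <- scale(toy$brow, center = TRUE, scale = TRUE)
is.matrix(toy$zbrow_matrix)

# as.numeric() keeps the values and drops the matrix attributes
toy$zbrow <- as.numeric(scale(toy$brow, center = TRUE, scale = TRUE))
is.matrix(toy$zbrow)
```

The z scores are identical either way; wrapping in as.numeric() is optional and just gives a tidier column.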

It would be cool to be able to do this in a tidyverse mutate way… will try that below.

Check that the mean of the new zbrow and zcheek columns is 0. YES!

summary(widepp1)

##     pp_no            condition             bin                bin_no  
##  Length:176         Length:176         Length:176         Min.   : 0  
##  Class :character   Class :character   Class :character   1st Qu.: 2  
##  Mode  :character   Mode  :character   Mode  :character   Median : 5  
##                                                           Mean   : 5  
##                                                           3rd Qu.: 8  
##                                                           Max.   :10  
##     trial                brow             cheek        
##  Length:176         Min.   : 0.9923   Min.   :  4.389  
##  Class :character   1st Qu.: 1.6991   1st Qu.:  6.171  
##  Mode  :character   Median : 2.2721   Median :  7.627  
##                     Mean   : 2.9080   Mean   : 13.934  
##                     3rd Qu.: 3.0276   3rd Qu.: 10.047  
##                     Max.   :22.2281   Max.   :168.999  
##       zbrow.V1            zcheek.V1     
##  Min.   :-0.844109   Min.   :-0.420465  
##  1st Qu.:-0.532668   1st Qu.:-0.342000  
##  Median :-0.280227   Median :-0.277856  
##  Mean   : 0.000000   Mean   : 0.000000  
##  3rd Qu.: 0.052660   3rd Qu.:-0.171246  
##  Max.   : 8.512778   Max.   : 6.830998

Okay, great. See how the scores are centred around a mean of 0… that’s what we want.

Extend to all participants

Now… you want to zscore all your RMS scores:

first, pivot_wider() on muscle and rms; this will create two new columns, brow and cheek, where the values are RMS scores
then, calculate new zscores columns for each muscle
then pivot_wider bin
then calculate difference score between BL and STIM for each muscle, for each trial

1. make muscle wide

data_wide <- data %>%
  pivot_wider(names_from = "muscle", values_from = "rms")

2. now calculate z scores

You want them to be centred for each participant, so group_by(pp_no) first.

Option 1: mutate z scores manually

mutate(z_score = (value - mean(value)) / sd(value))

data_z_manual <- data_wide %>%
  group_by(pp_no) %>%
  mutate(Zbrow = (brow - mean(brow))/sd(brow)) %>%
  mutate(Zcheek = (cheek - mean(cheek))/sd(cheek))

Option 2: mutate z scores using scale()

data_z_scale <- data_wide %>%
  group_by(pp_no) %>%
  mutate(Zbrow = scale(brow, center = TRUE, scale = TRUE)) %>%
  mutate(Zcheek = scale(cheek, center = TRUE, scale = TRUE))
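
A small sketch to check that Option 1 and Option 2 give the same answer, and that the grouping is doing its job. The data frame toy is made up: two participants whose raw scores differ by a factor of ten.

```r
library(dplyr)

# made-up data: two participants with very different raw RMS levels
toy <- data.frame(
  pp_no = rep(c("pp1", "pp2"), each = 4),
  brow  = c(1, 2, 3, 4, 10, 20, 30, 40)
)

toy_z <- toy %>%
  group_by(pp_no) %>%
  mutate(
    Zbrow_manual = (brow - mean(brow)) / sd(brow),
    # as.numeric() is optional; it stores a plain vector instead of a matrix
    Zbrow_scale  = as.numeric(scale(brow, center = TRUE, scale = TRUE))
  ) %>%
  ungroup()

# both options give the same z scores
all.equal(toy_z$Zbrow_manual, toy_z$Zbrow_scale)
```

Because the z scores are calculated within each participant, pp1 and pp2 end up with the same z scores here even though pp2's raw values are ten times larger. Without group_by(pp_no) the mean and sd would be taken over everyone at once.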

Check 1: check one participant

Use a filter then summary to check that it gives you the same values as before for a single participant.

data_z_scale %>%
  filter(pp_no == "pp1") %>%
  summary()

##     pp_no            condition             bin                bin_no  
##  Length:176         Length:176         Length:176         Min.   : 0  
##  Class :character   Class :character   Class :character   1st Qu.: 2  
##  Mode  :character   Mode  :character   Mode  :character   Median : 5  
##                                                           Mean   : 5  
##                                                           3rd Qu.: 8  
##                                                           Max.   :10  
##     trial                brow             cheek             Zbrow         
##  Length:176         Min.   : 0.9923   Min.   :  4.389   Min.   :-0.84411  
##  Class :character   1st Qu.: 1.6991   1st Qu.:  6.171   1st Qu.:-0.53267  
##  Mode  :character   Median : 2.2721   Median :  7.627   Median :-0.28023  
##                     Mean   : 2.9080   Mean   : 13.934   Mean   : 0.00000  
##                     3rd Qu.: 3.0276   3rd Qu.: 10.047   3rd Qu.: 0.05266  
##                     Max.   :22.2281   Max.   :168.999   Max.   : 8.51278  
##      Zcheek       
##  Min.   :-0.4205  
##  1st Qu.:-0.3420  
##  Median :-0.2779  
##  Mean   : 0.0000  
##  3rd Qu.:-0.1712  
##  Max.   : 6.8310

Check 2: group_by() summarise()

One more check: do a group_by(pp_no) and summarise() the mean of Zbrow and Zcheek to make sure that the mean is 0 for each participant.

check_z <- data_z_scale %>%
  group_by(pp_no) %>%
  summarise(meanZbrow = mean(Zbrow), meanZcheek= mean(Zcheek)) 

glimpse(check_z)

## Rows: 5
## Columns: 3
## $ pp_no      <chr> "pp1", "pp2", "pp3", "pp4", "pp5"
## $ meanZbrow  <dbl> 0.00000000000000009106744, -0.00000000000000003789718…
## $ meanZcheek <dbl> 0.00000000000000003683964, 0.00000000000000008911649,…

The global options(scipen = 999) is working (no scientific notation), but how to get fewer decimal places? Calling round() without a digits argument rounds to the nearest whole number, which makes everything 0 (round(x, digits = 4) would keep 4 decimal places). The formattable package has an option to set digits and format = "f" (fixed notation, meaning no scientific notation). But on glimpse() it leaves the data in a weird formttbl format. Problem? Probably not.

check_z <- data_z_scale %>%
  group_by(pp_no) %>%
  summarise(meanZbrow = mean(Zbrow), meanZcheek= mean(Zcheek)) 
                                                    
# set 4 decimal places
check_z$meanZbrow <- formattable(check_z$meanZbrow, digits = 4, format = "f")
check_z$meanZcheek <- formattable(check_z$meanZcheek, digits = 4, format = "f")

glimpse(check_z)

## Rows: 5
## Columns: 3
## $ pp_no      <chr> "pp1", "pp2", "pp3", "pp4", "pp5"
## $ meanZbrow  <formttbl> 0.0000, -0.0000, 0.0000, -0.0000, 0.0000
## $ meanZcheek <formttbl> 0.0000, 0.0000, 0.0000, -0.0000, 0.0000
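
If the formttbl column type turns out to be a nuisance, base round() with a digits argument is another way to go. A sketch using the first two meanZbrow values printed earlier (the vector name means is made up):

```r
# two of the near-zero means from the check above
means <- c(0.00000000000000009106744, -0.00000000000000003789718)

# default is digits = 0, so everything is rounded to a whole number
round(means)

# digits = 4 keeps four decimal places and the column stays a plain numeric
rounded <- round(means, digits = 4)
is.numeric(rounded)
```

The rounded values are still ordinary doubles, so they behave normally in later calculations.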

B. How to calculate difference scores

Okay, now that we have z scores for each muscle/participant, we need to calculate difference scores from baseline. We need to make bin wide to allow for calculations across columns. It’s a bit difficult to do that for both muscles at the same time, so let’s separate them and work out the difference scores for brow and cheek separately.

BROW FIRST

1. select just the variables you need

add a muscle column to make things easier to join back together later

brow_z <- data_z_scale %>%
  mutate(muscle = "brow") %>%
  select(pp_no, condition, bin, trial, muscle, Zbrow)
 

glimpse(brow_z)

## Rows: 880
## Columns: 6
## Groups: pp_no [5]
## $ pp_no     <chr> "pp1", "pp1", "pp1", "pp1", "pp1", "pp1", "pp1", "pp1"…
## $ condition <chr> "stimtype1", "stimtype1", "stimtype1", "stimtype1", "s…
## $ bin       <chr> "bin_0", "bin_1", "bin_2", "bin_3", "bin_4", "bin_5", …
## $ trial     <chr> "trial1", "trial1", "trial1", "trial1", "trial1", "tri…
## $ muscle    <chr> "brow", "brow", "brow", "brow", "brow", "brow", "brow"…
## $ Zbrow     <dbl> -0.31206180, -0.33616363, -0.09166442, -0.76475325, -0…

2. pivot_wider on bin

Make the bin column wide and rename bin_0 as BL (i.e. baseline)

brow_z_wide <- brow_z %>%
  pivot_wider(names_from = "bin", values_from = "Zbrow") %>%
  rename(BL = bin_0)

3. mutate() diff scores

Use the wide columns to calculate the difference between each bin column and BL, creating a new set of columns starting with “diff”, then drop the BL column and all columns starting with bin (i.e. the raw z scores).

brow_z_diff <- brow_z_wide %>%
  mutate(diff_bin1 = bin_1 - BL, diff_bin2 = bin_2 - BL,
         diff_bin3 = bin_3 - BL, diff_bin4 = bin_4 - BL,
         diff_bin5 = bin_5 - BL, diff_bin6 = bin_6 - BL,
         diff_bin7 = bin_7 - BL, diff_bin8 = bin_8 - BL,
         diff_bin9 = bin_9 - BL, diff_bin10 = bin_10 - BL) %>%
  select(-BL, -starts_with("bin"))
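
Typing out ten subtractions works, but it is easy to make a typo. A possible shortcut, sketched on made-up data (toy_wide is hypothetical and has only three bins), is to let across() do the subtraction for every bin column. This assumes dplyr 1.0 or later. Note that the new columns come out named diff_bin_1, diff_bin_2 and so on, with an extra underscore compared to the names used above.

```r
library(dplyr)

# made-up wide data: one row per trial, baseline plus three bins
toy_wide <- data.frame(
  trial = c("trial1", "trial2"),
  BL    = c(0.5, -0.2),
  bin_1 = c(0.7, -0.2),
  bin_2 = c(1.5,  0.3),
  bin_3 = c(0.1, -1.0)
)

toy_diff <- toy_wide %>%
  # subtract BL from every bin column; .names controls the new column names
  mutate(across(starts_with("bin"), ~ .x - BL, .names = "diff_{.col}")) %>%
  # drop the baseline and the raw bin columns, keeping only the differences
  select(-BL, -starts_with("bin"))

toy_diff
```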

This brow_z_diff df contains, for each bin, the difference between stimulus and baseline, so POSITIVE difference scores = greater activity during STIM than BL and NEGATIVE difference scores = greater activity during BL than STIM.

4. pivot_longer() for easy plotting

brow_z_diff_long <- brow_z_diff %>%
  pivot_longer(cols = diff_bin1:diff_bin10, names_to = "bin", values_to = "Zdiff")

brow_z_diff_long$bin <- as_factor(brow_z_diff_long$bin)
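
Why bother turning bin into a factor with as_factor()? A quick sketch with made-up bin labels: as_factor() (from forcats, loaded with the tidyverse) keeps the levels in the order they first appear, whereas base factor() sorts them alphabetically, which would put diff_bin10 straight after diff_bin1 on the x axis of a plot.

```r
library(forcats)

# made-up bin labels in the order they appear after pivot_longer()
bins <- c("diff_bin1", "diff_bin2", "diff_bin10")

# base factor() sorts the levels alphabetically
levels(factor(bins))

# as_factor() keeps the levels in order of first appearance
levels(as_factor(bins))
```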

5. plot- do these scores make sense?

brow_z_diff_long %>%
  group_by(condition, bin) %>%
  summarise(meanBROWdiff = mean(Zdiff, na.rm = TRUE)) %>%
  ggplot(aes(x = bin, y = meanBROWdiff, colour = condition, group = condition)) +
  geom_point() + 
  geom_line() +
  labs(title = "brow activity difference from baseline")

NOW CHEEK

1. select just the variables you need

cheek_z <- data_z_scale %>%
  mutate(muscle = "cheek") %>%
  select(pp_no, condition, bin, trial, muscle, Zcheek) 

glimpse(cheek_z)

## Rows: 880
## Columns: 6
## Groups: pp_no [5]
## $ pp_no     <chr> "pp1", "pp1", "pp1", "pp1", "pp1", "pp1", "pp1", "pp1"…
## $ condition <chr> "stimtype1", "stimtype1", "stimtype1", "stimtype1", "s…
## $ bin       <chr> "bin_0", "bin_1", "bin_2", "bin_3", "bin_4", "bin_5", …
## $ trial     <chr> "trial1", "trial1", "trial1", "trial1", "trial1", "tri…
## $ muscle    <chr> "cheek", "cheek", "cheek", "cheek", "cheek", "cheek", …
## $ Zcheek    <dbl> -0.2100443, -0.3602717, -0.1314636, -0.2858276, -0.340…

2. pivot_wider on bin

Make the bin column wide and rename bin_0 as BL (i.e. baseline)

cheek_z_wide <- cheek_z %>%
  pivot_wider(names_from = "bin", values_from = "Zcheek") %>%
  rename(BL = bin_0)

3. mutate() diff scores

cheek_z_diff <- cheek_z_wide %>%
  mutate(diff_bin1 = bin_1 - BL, diff_bin2 = bin_2 - BL,
         diff_bin3 = bin_3 - BL, diff_bin4 = bin_4 - BL,
         diff_bin5 = bin_5 - BL, diff_bin6 = bin_6 - BL,
         diff_bin7 = bin_7 - BL, diff_bin8 = bin_8 - BL,
         diff_bin9 = bin_9 - BL, diff_bin10 = bin_10 - BL) %>%
  select(-BL, -starts_with("bin"))

This cheek_z_diff df contains, for each bin, the difference between stimulus and baseline, so POSITIVE difference scores = greater activity during STIM than BL and NEGATIVE difference scores = greater activity during BL than STIM.

4. pivot_longer() for easy plotting

cheek_z_diff_long <- cheek_z_diff %>%
  pivot_longer(cols = diff_bin1:diff_bin10, names_to = "bin", values_to = "Zdiff")

cheek_z_diff_long$bin <- as_factor(cheek_z_diff_long$bin)

5. plot- do these scores make sense?

cheek_z_diff_long %>%
  group_by(condition, bin) %>%
  summarise(meanCHEEKdiff = mean(Zdiff, na.rm = TRUE)) %>%
  ggplot(aes(x = bin, y = meanCHEEKdiff, colour = condition, group = condition)) +
  geom_point() + 
  geom_line() +
  labs(title = "cheek activity difference from baseline")
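
The brow and cheek steps are identical apart from the column names, so it might be possible to handle both muscles in one pass by keeping the data long and subtracting the baseline within each group. This is only a sketch on made-up data, not what was done above: toy_long and the z column are hypothetical, and it assumes exactly one bin_0 row per participant, trial and muscle. With the real data you would probably want condition in the group_by() as well.

```r
library(dplyr)

# made-up long data: one trial, two muscles, baseline (bin_0) plus two bins
toy_long <- data.frame(
  pp_no  = "pp1",
  trial  = "trial1",
  muscle = rep(c("brow", "cheek"), each = 3),
  bin    = rep(c("bin_0", "bin_1", "bin_2"), times = 2),
  z      = c(0.1, 0.4, -0.2, 1.0, 1.5, 0.5)
)

toy_both <- toy_long %>%
  group_by(pp_no, trial, muscle) %>%
  # subtract that trial's baseline z score from every bin in the group
  mutate(Zdiff = z - z[bin == "bin_0"]) %>%
  ungroup() %>%
  # drop the baseline rows, as the wide version does when it removes BL
  filter(bin != "bin_0")

toy_both
```

The result is already long, with a muscle column, so it could go straight into ggplot with facet_wrap(~ muscle).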

centering Z scores

Jenny

10 June 2020
