library(tidyverse)
library(formattable) # useful for controlling decimal places
options(scipen = 999) #set no scientific notation
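The scale() function takes three arguments: x (the numeric data), center (whether to subtract the mean) and scale (whether to divide by the standard deviation).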
Normally, to center a variable, you would subtract the mean of all data points from each individual data point. With scale(), this can be accomplished in one simple call.
scale(A, center = TRUE, scale = FALSE)
Normally, to create z-scores (standardized scores) from a variable, you would subtract the mean of all data points from each individual data point, then divide each difference by the standard deviation of all points. Again, this can be accomplished in one call using scale().
scale(A, center = TRUE, scale = TRUE)
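To make the two calls above concrete, here is a minimal sketch that compares scale() with the by-hand calculation. The vector A holds made-up values (not taken from sampleEMG.csv), and as.numeric() is only used to drop the matrix wrapper so the results can be compared directly.

```r
# Toy vector: hypothetical values, purely for illustration
A <- c(2, 4, 4, 6, 9)

# Centring by hand vs. scale(..., scale = FALSE)
centred_manual <- A - mean(A)
centred_scale  <- as.numeric(scale(A, center = TRUE, scale = FALSE))

# z-scores by hand vs. scale(); sd() uses the n - 1 denominator, as scale() does
z_manual <- (A - mean(A)) / sd(A)
z_scale  <- as.numeric(scale(A, center = TRUE, scale = TRUE))

# Both comparisons should report that the two approaches agree
all.equal(centred_manual, centred_scale)
all.equal(z_manual, z_scale)
```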
Sample data from N=5 participants
data <- read_csv("sampleEMG.csv") %>%
  select(-X1)
## Warning: Missing column names filled in: 'X1' [1]
## Parsed with column specification:
## cols(
## X1 = col_double(),
## pp_no = col_character(),
## condition = col_character(),
## bin = col_character(),
## bin_no = col_double(),
## trial = col_character(),
## muscle = col_character(),
## rms = col_double()
## )
Filter the df to include only data from 1 participant.
justpp1 <- data %>%
  filter(pp_no == "pp1")
Then use pivot_wider to put the data for brow and cheek back into separate columns.
widepp1 <- justpp1 %>%
  pivot_wider(names_from = muscle, values_from = rms)
This is the base R way to create new variables called zbrow and zcheek, using the scale() function with center = TRUE and scale = TRUE.
widepp1$zbrow = scale(widepp1$brow, center = TRUE, scale = TRUE)
widepp1$zcheek = scale(widepp1$cheek, center = TRUE, scale = TRUE)
It would be cool to be able to do this in a tidyverse mutate way… will try that below.
Check that the mean of the new zbrow and zcheek columns is 0: YES!
summary(widepp1)
## pp_no condition bin bin_no
## Length:176 Length:176 Length:176 Min. : 0
## Class :character Class :character Class :character 1st Qu.: 2
## Mode :character Mode :character Mode :character Median : 5
## Mean : 5
## 3rd Qu.: 8
## Max. :10
## trial brow cheek
## Length:176 Min. : 0.9923 Min. : 4.389
## Class :character 1st Qu.: 1.6991 1st Qu.: 6.171
## Mode :character Median : 2.2721 Median : 7.627
## Mean : 2.9080 Mean : 13.934
## 3rd Qu.: 3.0276 3rd Qu.: 10.047
## Max. :22.2281 Max. :168.999
## zbrow.V1 zcheek.V1
## Min. :-0.844109 Min. :-0.420465
## 1st Qu.:-0.532668 1st Qu.:-0.342000
## Median :-0.280227 Median :-0.277856
## Mean : 0.000000 Mean : 0.000000
## 3rd Qu.: 0.052660 3rd Qu.:-0.171246
## Max. : 8.512778 Max. : 6.830998
Okay, great. See how scores are centred around a mean of 0… that’s what we want.
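The zbrow.V1 and zcheek.V1 labels in the summary output most likely appear because scale() returns a one-column matrix rather than a plain vector. A small sketch of that behaviour, using made-up values; wrapping the result in as.numeric() is one way (an optional extra, not part of the workflow above) to get an ordinary numeric vector.

```r
# Hypothetical values, purely for illustration
x <- c(1.2, 3.4, 2.2, 5.9)

z_matrix <- scale(x, center = TRUE, scale = TRUE)
is.matrix(z_matrix)      # scale() hands back a matrix
attributes(z_matrix)     # dim, plus the centre and scale that were used

# Dropping the matrix wrapper leaves the same values as a plain vector
z_vector <- as.numeric(z_matrix)
is.matrix(z_vector)
```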
Now… you want to zscore all your RMS scores:
data_wide <- data %>%
  pivot_wider(names_from = "muscle", values_from = "rms")
You want them to be centred for each participant, so group_by(pp_no) first.
# General pattern (value is a placeholder for the column being standardised):
# mutate(z_score = (value - mean(value)) / sd(value))
# z-scores by hand, computed within each participant
data_z_manual <- data_wide %>%
  group_by(pp_no) %>%
  mutate(Zbrow = (brow - mean(brow)) / sd(brow)) %>%
  mutate(Zcheek = (cheek - mean(cheek)) / sd(cheek))
# the same z-scores using scale(), again within each participant
data_z_scale <- data_wide %>%
  group_by(pp_no) %>%
  mutate(Zbrow = scale(brow, center = TRUE, scale = TRUE)) %>%
  mutate(Zcheek = scale(cheek, center = TRUE, scale = TRUE))
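The manual and scale() versions above should give the same numbers. Here is a minimal sketch of that check on a toy tibble; the pp_no and brow values are invented for illustration, and the as.numeric() wrapper is an optional addition that keeps the new column a plain numeric vector.

```r
library(dplyr)

# Toy data: two hypothetical participants with four brow values each
toy <- tibble(
  pp_no = rep(c("pp1", "pp2"), each = 4),
  brow  = c(1.0, 2.0, 4.0, 7.0, 10.0, 12.0, 15.0, 11.0)
)

toy_z <- toy %>%
  group_by(pp_no) %>%
  mutate(
    Zbrow_manual = (brow - mean(brow)) / sd(brow),
    Zbrow_scale  = as.numeric(scale(brow, center = TRUE, scale = TRUE))
  ) %>%
  ungroup()

# The two columns should agree within floating-point tolerance
all.equal(toy_z$Zbrow_manual, toy_z$Zbrow_scale)
```

Because the data are grouped before mutate(), the mean and standard deviation are those of each participant, not of the whole data set.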
Use a filter then summary to check that it gives you the same values as before for a single participant.
data_z_scale %>%
  filter(pp_no == "pp1") %>%
  summary()
## pp_no condition bin bin_no
## Length:176 Length:176 Length:176 Min. : 0
## Class :character Class :character Class :character 1st Qu.: 2
## Mode :character Mode :character Mode :character Median : 5
## Mean : 5
## 3rd Qu.: 8
## Max. :10
## trial brow cheek Zbrow
## Length:176 Min. : 0.9923 Min. : 4.389 Min. :-0.84411
## Class :character 1st Qu.: 1.6991 1st Qu.: 6.171 1st Qu.:-0.53267
## Mode :character Median : 2.2721 Median : 7.627 Median :-0.28023
## Mean : 2.9080 Mean : 13.934 Mean : 0.00000
## 3rd Qu.: 3.0276 3rd Qu.: 10.047 3rd Qu.: 0.05266
## Max. :22.2281 Max. :168.999 Max. : 8.51278
## Zcheek
## Min. :-0.4205
## 1st Qu.:-0.3420
## Median :-0.2779
## Mean : 0.0000
## 3rd Qu.:-0.1712
## Max. : 6.8310
One more check: do a group_by(pp_no) and summarise() the mean of Zbrow and Zcheek to make sure that for each participant the mean is 0.
check_z <- data_z_scale %>%
  group_by(pp_no) %>%
  summarise(meanZbrow = mean(Zbrow), meanZcheek = mean(Zcheek))
glimpse(check_z)
## Rows: 5
## Columns: 3
## $ pp_no <chr> "pp1", "pp2", "pp3", "pp4", "pp5"
## $ meanZbrow <dbl> 0.00000000000000009106744, -0.00000000000000003789718…
## $ meanZcheek <dbl> 0.00000000000000003683964, 0.00000000000000008911649,…
The global options(scipen = 999) is working (no scientific notation), but how to get fewer decimal places? round() with its default digits = 0 just rounds to the nearest whole number, which makes everything 0. The formattable package has an option to set digits and format = "f" (fixed notation, meaning no scientific notation). But on glimpse() it leaves the data in a weird formttbl format. Problem? Probably not.
check_z <- data_z_scale %>%
  group_by(pp_no) %>%
  summarise(meanZbrow = mean(Zbrow), meanZcheek = mean(Zcheek))
# set 4 decimal places
check_z$meanZbrow <- formattable(check_z$meanZbrow, digits = 4, format = "f")
check_z$meanZcheek <- formattable(check_z$meanZcheek, digits = 4, format = "f")
glimpse(check_z)
## Rows: 5
## Columns: 3
## $ pp_no <chr> "pp1", "pp2", "pp3", "pp4", "pp5"
## $ meanZbrow <formttbl> 0.0000, -0.0000, 0.0000, -0.0000, 0.0000
## $ meanZcheek <formttbl> 0.0000, 0.0000, 0.0000, -0.0000, 0.0000
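If keeping plain numeric columns matters, one possible alternative is round() with an explicit digits argument. A minimal sketch on a toy tibble follows; check_toy and its values are invented, chosen to be of similar size to the near-zero means above.

```r
library(dplyr)

# Toy summary table: hypothetical near-zero means
check_toy <- tibble(
  pp_no      = c("pp1", "pp2"),
  meanZbrow  = c(9.1e-17, -3.8e-17),
  meanZcheek = c(3.7e-17, 8.9e-17)
)

# Round every column whose name starts with "meanZ" to 4 decimal places
check_rounded <- check_toy %>%
  mutate(across(starts_with("meanZ"), ~ round(.x, digits = 4)))

glimpse(check_rounded)
```

Unlike formattable(), this changes the stored values rather than only how they are displayed, so it is best kept for summary tables and not applied to data that feed later calculations.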
Okay, now that we have z scores for each muscle/participant, we need to calculate difference scores from baseline. We need to make bin wide to allow for calculations across columns. It's a bit difficult to do that for both muscles at the same time, so let's separate and work out the difference scores for brow and cheek separately.
Add a muscle column to make things easier to join back together later.
brow_z <- data_z_scale %>%
  mutate(muscle = "brow") %>%
  select(pp_no, condition, bin, trial, muscle, Zbrow)
glimpse(brow_z)
## Rows: 880
## Columns: 6
## Groups: pp_no [5]
## $ pp_no <chr> "pp1", "pp1", "pp1", "pp1", "pp1", "pp1", "pp1", "pp1"…
## $ condition <chr> "stimtype1", "stimtype1", "stimtype1", "stimtype1", "s…
## $ bin <chr> "bin_0", "bin_1", "bin_2", "bin_3", "bin_4", "bin_5", …
## $ trial <chr> "trial1", "trial1", "trial1", "trial1", "trial1", "tri…
## $ muscle <chr> "brow", "brow", "brow", "brow", "brow", "brow", "brow"…
## $ Zbrow <dbl> -0.31206180, -0.33616363, -0.09166442, -0.76475325, -0…
Make the bin column wide and rename bin_0 as BL (i.e. baseline)
brow_z_wide <- brow_z %>%
  pivot_wider(names_from = "bin", values_from = "Zbrow") %>%
  rename(BL = bin_0)
Use the wide columns to calculate the difference between each bin column and BL, creating a new set of columns starting with “diff”, then drop the BL column and all columns starting with bin (i.e. raw z scores).
brow_z_diff <- brow_z_wide %>%
  mutate(diff_bin1 = bin_1 - BL, diff_bin2 = bin_2 - BL,
         diff_bin3 = bin_3 - BL, diff_bin4 = bin_4 - BL,
         diff_bin5 = bin_5 - BL, diff_bin6 = bin_6 - BL,
         diff_bin7 = bin_7 - BL, diff_bin8 = bin_8 - BL,
         diff_bin9 = bin_9 - BL, diff_bin10 = bin_10 - BL) %>%
  select(-BL, -starts_with("bin"))
This brow_z_diff df contains, for each bin, the difference between stimulus and baseline, so POSITIVE difference scores = greater activity during STIM than BL, and NEGATIVE difference scores = greater activity during BL than STIM.
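The ten difference columns can also be produced without typing each one, using across(). This is a sketch on a toy wide tibble with made-up values and only two bins; note that this naming pattern yields diff_bin_1, diff_bin_2 (with an extra underscore) rather than the diff_bin1 names used in this walkthrough.

```r
library(dplyr)

# Toy wide data: hypothetical z scores for a baseline and two bins
toy_wide <- tibble(
  pp_no = c("pp1", "pp1"),
  trial = c("trial1", "trial2"),
  BL    = c(0.10, -0.20),
  bin_1 = c(0.30, -0.10),
  bin_2 = c(0.05,  0.40)
)

toy_diff <- toy_wide %>%
  # subtract BL from every bin_ column, writing results to new diff_ columns
  mutate(across(starts_with("bin_"), ~ .x - BL, .names = "diff_{.col}")) %>%
  # drop the baseline and the raw z-score columns
  select(-BL, -starts_with("bin_"))

toy_diff
```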
brow_z_diff_long <- brow_z_diff %>%
  pivot_longer(cols = diff_bin1:diff_bin10, names_to = "bin", values_to = "Zdiff")
brow_z_diff_long$bin <- as_factor(brow_z_diff_long$bin)
brow_z_diff_long %>%
  group_by(condition, bin) %>%
  summarise(meanBROWdiff = mean(Zdiff, na.rm = TRUE)) %>%
  ggplot(aes(x = bin, y = meanBROWdiff, colour = condition, group = condition)) +
  geom_point() +
  geom_line() +
  labs(title = "brow activity difference from baseline")
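A note on why the as_factor() step matters for the plot: as_factor() from forcats keeps levels in the order they first appear, whereas base factor() sorts them alphabetically, which would put diff_bin10 straight after diff_bin1 on the x axis. A small sketch with a shortened, illustrative set of bin names:

```r
library(forcats)

# A shortened set of bin labels, in the order they appear after pivot_longer()
bins <- c("diff_bin1", "diff_bin2", "diff_bin9", "diff_bin10")

levels(as_factor(bins))  # order of first appearance
levels(factor(bins))     # alphabetical, so diff_bin10 jumps ahead of diff_bin2
```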
cheek_z <- data_z_scale %>%
  mutate(muscle = "cheek") %>%
  select(pp_no, condition, bin, trial, muscle, Zcheek)
glimpse(cheek_z)
## Rows: 880
## Columns: 6
## Groups: pp_no [5]
## $ pp_no <chr> "pp1", "pp1", "pp1", "pp1", "pp1", "pp1", "pp1", "pp1"…
## $ condition <chr> "stimtype1", "stimtype1", "stimtype1", "stimtype1", "s…
## $ bin <chr> "bin_0", "bin_1", "bin_2", "bin_3", "bin_4", "bin_5", …
## $ trial <chr> "trial1", "trial1", "trial1", "trial1", "trial1", "tri…
## $ muscle <chr> "cheek", "cheek", "cheek", "cheek", "cheek", "cheek", …
## $ Zcheek <dbl> -0.2100443, -0.3602717, -0.1314636, -0.2858276, -0.340…
Make the bin column wide and rename bin_0 as BL (i.e. baseline)
cheek_z_wide <- cheek_z %>%
  pivot_wider(names_from = "bin", values_from = "Zcheek") %>%
  rename(BL = bin_0)
Use the wide columns to calculate the difference between each bin column and BL, creating a new set of columns starting with “diff”, then drop the BL column and all columns starting with bin (i.e. raw z scores).
cheek_z_diff <- cheek_z_wide %>%
  mutate(diff_bin1 = bin_1 - BL, diff_bin2 = bin_2 - BL,
         diff_bin3 = bin_3 - BL, diff_bin4 = bin_4 - BL,
         diff_bin5 = bin_5 - BL, diff_bin6 = bin_6 - BL,
         diff_bin7 = bin_7 - BL, diff_bin8 = bin_8 - BL,
         diff_bin9 = bin_9 - BL, diff_bin10 = bin_10 - BL) %>%
  select(-BL, -starts_with("bin"))
This cheek_z_diff df contains, for each bin, the difference between stimulus and baseline, so POSITIVE difference scores = greater activity during STIM than BL, and NEGATIVE difference scores = greater activity during BL than STIM.
cheek_z_diff_long <- cheek_z_diff %>%
  pivot_longer(cols = diff_bin1:diff_bin10, names_to = "bin", values_to = "Zdiff")
cheek_z_diff_long$bin <- as_factor(cheek_z_diff_long$bin)
cheek_z_diff_long %>%
  group_by(condition, bin) %>%
  summarise(meanCHEEKdiff = mean(Zdiff, na.rm = TRUE)) %>%
  ggplot(aes(x = bin, y = meanCHEEKdiff, colour = condition, group = condition)) +
  geom_point() +
  geom_line() +
  labs(title = "cheek activity difference from baseline")
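The brow and cheek steps above are identical apart from the column name, so it may be possible to handle both muscles in one pass by staying in long format. The sketch below is one way that could look, not the approach used above: it uses a toy tibble with invented values and assumes exactly one bin_0 row per participant, trial and muscle. With the real data, condition would presumably need to be part of the grouping as well.

```r
library(dplyr)
library(tidyr)

# Toy z scores: one hypothetical participant and trial, baseline plus two bins
toy_z <- tibble(
  pp_no  = "pp1",
  trial  = "trial1",
  bin    = c("bin_0", "bin_1", "bin_2"),
  Zbrow  = c(-0.3, 0.1, 0.4),
  Zcheek = c(0.2, 0.2, -0.1)
)

toy_diff_long <- toy_z %>%
  # stack the two muscle columns; names_prefix strips the leading "Z"
  pivot_longer(cols = c(Zbrow, Zcheek), names_to = "muscle",
               names_prefix = "Z", values_to = "z") %>%
  group_by(pp_no, trial, muscle) %>%
  # subtract the baseline (bin_0) value within each group
  mutate(Zdiff = z - z[bin == "bin_0"]) %>%
  ungroup() %>%
  # the baseline rows themselves are no longer needed
  filter(bin != "bin_0")

toy_diff_long
```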