Dysarthria is a speech disorder where the muscles used to produce speech are damaged, paralyzed, or weakened. Close to 90% of people with Parkinson’s Disease (PD) present with hypokinetic dysarthria, evidenced by reduced vocal loudness, monotone, reduced fundamental frequency range, consonant and vowel imprecision, breathiness and irregular pauses.1 Evaluation of dysarthia in disorders such as PD has been limited to perceptual tests or user-controlled laboratory analysis based upon rather small samples of human vocalizations. The development of a fully-automated method to detect early, distinctive patterns of neuro-degeneration using only acoustic data from connected speech such as reading a short passage or monologue would have the potential to revolutionize the diagnostic process for diseases manifesting motor speech disorders.2
The dataset associated with the article “Automated analysis of connected speech reveals early biomarkers of Parkinson’s disease in patients with rapid eye movement sleep behaviour disorder”2 is available as an excel file via the article on Nature.com. It’s also on Kaggle as a csv file.3
The data include 30 patients with early untreated Parkinson’s, 50 patients with REM sleep behavior disorder (RBD), which are at high risk developing Parkinson’s disease, and 50 healthy controls (HC). All patients were scored clinically by a neurologist with experience in movement disorders. All subjects were examined during a single session with a speech specialist. All subjects were recorded reading a standardized, phonetically-balanced text of 80 words and a monologue about their interests, job, family or current activities for approximately 90 seconds. Speech features were automatically analyzed by Jan Hlavnička et al. (2017).3
Both my father and my father-in-law had Parkinson’s disease, so I found this dataset and paper about developing automated diagnosis from speech recordings interesting. The diagrams in the article have plots that are simple and clear.
I’ll try to approximate a couple of the plots, but first I’ll get the data, clean it up and do a bit of analysis.
setwd("C:/Users/Owner/Desktop/My Documents/School/DataFiles/Parkinsons")
peb <- tibble(read_csv("parkinsonsEarlyBiomarkers.csv")) # read Parkinsons Early Biomarkers dataset
spec(peb) # look at the peb tibble
## cols(
## `Participant code` = col_character(),
## `Age (years)` = col_double(),
## Gender = col_character(),
## `Positive history of Parkinson disease in family` = col_character(),
## `Age of disease onset (years)` = col_character(),
## `Duration of disease from first symptoms (years)` = col_character(),
## `Antidepressant therapy` = col_character(),
## `Antiparkinsonian medication` = col_character(),
## `Antipsychotic medication` = col_character(),
## `Benzodiazepine medication` = col_character(),
## `Levodopa equivalent (mg/day)` = col_double(),
## `Clonazepam (mg/day)` = col_double(),
## `Overview of motor examination: Hoehn & Yahr scale (-)` = col_character(),
## `Overview of motor examination: UPDRS III total (-)` = col_character(),
## `18. Speech` = col_character(),
## `19. Facial Expression` = col_character(),
## `20. Tremor at Rest - head` = col_character(),
## `20. Tremor at Rest - RUE` = col_character(),
## `20. Tremor at Rest - LUE` = col_character(),
## `20. Tremor at Rest - RLE` = col_character(),
## `20. Tremor at Rest - LLE` = col_character(),
## `21. Action or Postural Tremor - RUE` = col_character(),
## `21. Action or Postural Tremor - LUE` = col_character(),
## `22. Rigidity - neck` = col_character(),
## `22. Rigidity - RUE` = col_character(),
## `22. Rigidity - LUE` = col_character(),
## `22. Rigidity - RLE` = col_character(),
## `22. Rigidity - LLE` = col_character(),
## `23.Finger Taps - RUE` = col_character(),
## `23.Finger Taps - LUE` = col_character(),
## `24. Hand Movements - RUE` = col_character(),
## `24. Hand Movements - LUE` = col_character(),
## `25. Rapid Alternating Movements - RUE` = col_character(),
## `25. Rapid Alternating Movements - LUE` = col_character(),
## `26. Leg Agility - RLE` = col_character(),
## `26. Leg Agility - LLE` = col_character(),
## `27. Arising from Chair` = col_character(),
## `28. Posture` = col_character(),
## `29. Gait` = col_character(),
## `30. Postural Stability` = col_character(),
## `31. Body Bradykinesia and Hypokinesia` = col_character(),
## `Entropy of speech timing (-)` = col_double(),
## `Rate of speech timing (-/min)` = col_double(),
## `Acceleration of speech timing (-/min2)` = col_double(),
## `Duration of pause intervals (ms)` = col_double(),
## `Duration of voiced intervals (ms)` = col_double(),
## `Gaping in-between voiced intervals (-/min)` = col_double(),
## `Duration of unvoiced stops (ms)` = col_double(),
## `Decay of unvoiced fricatives (‰/min)` = col_double(),
## `Relative loudness of respiration (dB)` = col_double(),
## `Pause intervals per respiration (-)` = col_double(),
## `Rate of speech respiration (-/min)` = col_double(),
## `Latency of respiratory exchange (ms)` = col_double(),
## `Entropy of speech timing (-)_1` = col_double(),
## `Rate of speech timing (-/min)_1` = col_double(),
## `Acceleration of speech timing (-/min2)_1` = col_double(),
## `Duration of pause intervals (ms)_1` = col_double(),
## `Duration of voiced intervals (ms)_1` = col_double(),
## `Gaping in-between voiced intervals (-/min)_1` = col_double(),
## `Duration of unvoiced stops (ms)_1` = col_double(),
## `Decay of unvoiced fricatives (‰/min)_1` = col_double(),
## `Relative loudness of respiration (dB)_1` = col_double(),
## `Pause intervals per respiration (-)_1` = col_double(),
## `Rate of speech respiration (-/min)_1` = col_double(),
## `Latency of respiratory exchange (ms)_1` = col_double()
## )
For my analysis of the data, I’ll deselect the medication and the 26 motor skill scores (numbered 19. through 31. in the field labels). Then I’ll give the variables concise names.
units <- names(peb[42:53]) %>% str_extract("(\\(.*\\))") # pull the exam measurement units off of the names
names(peb) <- gsub("(of) | (per) | (in-between) | (\\(.*\\))"," ", names(peb)) # clean out extraneous words
names(peb) <- str_to_title(names(peb)) # capitalize each word
names(peb) <- gsub(" ","",names(peb)) # eliminate spaces
names(peb) # look at the names
## [1] "ParticipantCode"
## [2] "Age"
## [3] "Gender"
## [4] "PositiveHistoryParkinsonDiseaseInFamily"
## [5] "AgeDiseaseOnset"
## [6] "DurationDiseaseFromFirstSymptoms"
## [7] "AntidepressantTherapy"
## [8] "AntiparkinsonianMedication"
## [9] "AntipsychoticMedication"
## [10] "BenzodiazepineMedication"
## [11] "LevodopaEquivalent"
## [12] "Clonazepam"
## [13] "OverviewMotorExamination:Hoehn&YahrScale"
## [14] "OverviewMotorExamination:UpdrsIiiTotal"
## [15] "18.Speech"
## [16] "19.FacialExpression"
## [17] "20.TremorAtRest-Head"
## [18] "20.TremorAtRest-Rue"
## [19] "20.TremorAtRest-Lue"
## [20] "20.TremorAtRest-Rle"
## [21] "20.TremorAtRest-Lle"
## [22] "21.ActionOrPosturalTremor-Rue"
## [23] "21.ActionOrPosturalTremor-Lue"
## [24] "22.Rigidity-Neck"
## [25] "22.Rigidity-Rue"
## [26] "22.Rigidity-Lue"
## [27] "22.Rigidity-Rle"
## [28] "22.Rigidity-Lle"
## [29] "23.FingerTaps-Rue"
## [30] "23.FingerTaps-Lue"
## [31] "24.HandMovements-Rue"
## [32] "24.HandMovements-Lue"
## [33] "25.RapidAlternatingMovements-Rue"
## [34] "25.RapidAlternatingMovements-Lue"
## [35] "26.LegAgility-Rle"
## [36] "26.LegAgility-Lle"
## [37] "27.ArisingFromChair"
## [38] "28.Posture"
## [39] "29.Gait"
## [40] "30.PosturalStability"
## [41] "31.BodyBradykinesiaAndHypokinesia"
## [42] "EntropySpeechTiming"
## [43] "RateSpeechTiming"
## [44] "AccelerationSpeechTiming"
## [45] "DurationPauseIntervals"
## [46] "DurationVoicedIntervals"
## [47] "GapingVoicedIntervals"
## [48] "DurationUnvoicedStops"
## [49] "DecayUnvoicedFricatives"
## [50] "RelativeLoudnessRespiration"
## [51] "PauseIntervalsRespiration"
## [52] "RateSpeechRespiration"
## [53] "LatencyRespiratoryExchange"
## [54] "EntropySpeechTiming_1"
## [55] "RateSpeechTiming_1"
## [56] "AccelerationSpeechTiming_1"
## [57] "DurationPauseIntervals_1"
## [58] "DurationVoicedIntervals_1"
## [59] "GapingVoicedIntervals_1"
## [60] "DurationUnvoicedStops_1"
## [61] "DecayUnvoicedFricatives_1"
## [62] "RelativeLoudnessRespiration_1"
## [63] "PauseIntervalsRespiration_1"
## [64] "RateSpeechRespiration_1"
## [65] "LatencyRespiratoryExchange_1"
Create a set of acronym labels for the exam measurements.
| Label | Exam measurement |
|---|---|
| RST | rate of speech timing |
| AST | acceleration of speech timing |
| DPI | duration of pause intervals |
| EST | entropy of speech timing |
| DUS | duration of unvoiced stops |
| DUF | decay of unvoiced fricatives |
| DVI | duration of voiced intervals |
| GVI | gaping in-between voiced intervals |
| RSR | rate of speech respiration |
| PIR | pause intervals per respiration |
| RLR | relative loudness of respiration |
| LRE | latency of respiratory exchange |
tLabels <- gsub("[a-z]", "", names(peb[42:53])) # grab the capital letters to create acronyms
for (i in 1:12) {
tLabels[i] <- str_glue(tLabels[i], " ", units[i]) # concatenate the acronyms with units
}
tLabels # look at the labels
## [1] "EST (-)" "RST (-/min)" "AST (-/min2)" "DPI (ms)" "DVI (ms)"
## [6] "GVI (-/min)" "DUS (ms)" "DUF (‰/min)" "RLR (dB)" "PIR (-)"
## [11] "RSR (-/min)" "LRE (ms)"
pdata <- select(peb, id = ParticipantCode, # create a tibble with the needed variables
age = Age,
dysarthia = `18.Speech`,
est_r = EntropySpeechTiming,
rst_r = RateSpeechTiming,
ast_r = AccelerationSpeechTiming,
dpi_r = DurationPauseIntervals,
dvi_r = DurationVoicedIntervals,
gvi_r = GapingVoicedIntervals,
dus_r = DurationUnvoicedStops,
duf_r = DecayUnvoicedFricatives,
rlr_r = RelativeLoudnessRespiration,
pir_r = PauseIntervalsRespiration,
rsr_r = RateSpeechRespiration,
lre_r = LatencyRespiratoryExchange,
est_m = EntropySpeechTiming_1,
rst_m = RateSpeechTiming_1,
ast_m = AccelerationSpeechTiming_1,
dpi_m = DurationPauseIntervals_1,
dvi_m = DurationVoicedIntervals_1,
gvi_m = GapingVoicedIntervals_1,
dus_m = DurationUnvoicedStops_1,
duf_m = DecayUnvoicedFricatives_1,
rlr_m = RelativeLoudnessRespiration_1,
pir_m = PauseIntervalsRespiration_1,
rsr_m = RateSpeechRespiration_1,
lre_m = LatencyRespiratoryExchange_1
)
pdata$group <- str_extract(peb$ParticipantCode, "(PD|RBD|HC)") # pull out the group categories from the ids
pdata$group <- pdata$group %>% str_replace("HC", "controls") %>% # factor group to order and label like the article diagrams
factor(levels = c("PD", "RBD", "controls"))
# assuming control group does not have dysarthia, change hyphen to 0 to prevent NA
pdata$dysarthia <- pdata$dysarthia %>% str_replace('-','0') %>% as.numeric()
head(pdata) %>% kable() %>% kable_minimal()
| id | age | dysarthia | est_r | rst_r | ast_r | dpi_r | dvi_r | gvi_r | dus_r | duf_r | rlr_r | pir_r | rsr_r | lre_r | est_m | rst_m | ast_m | dpi_m | dvi_m | gvi_m | dus_m | duf_m | rlr_m | pir_m | rsr_m | lre_m | group |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| PD01 | 58 | 0 | 1.564 | 354 | 6.05 | 146 | 264 | 58.65 | 31.38 | -2.101 | -22.47 | 4.50 | 21.14 | 167 | 1.564 | 333 | -2.82 | 158 | 318 | 49.01 | 22.37 | 0.588 | -19.77 | 6 | 13.81 | 127 | PD |
| PD02 | 68 | 1 | 1.564 | 340 | 27.52 | 173 | 253 | 48.26 | 22.38 | -1.745 | -24.59 | 7.00 | 15.28 | 163 | 1.569 | 285 | 8.20 | 295 | 264 | 40.56 | 26.88 | -0.825 | -23.26 | 4 | 21.77 | 313 | PD |
| PD03 | 68 | 0 | 1.550 | 211 | 11.97 | 377 | 322 | 47.54 | 38.12 | 2.657 | -16.89 | 3.00 | 20.76 | 372 | 1.550 | 247 | 4.71 | 280 | 317 | 48.97 | 22.37 | -0.955 | -13.29 | 4 | 22.52 | 201 | PD |
| PD04 | 75 | 0 | 1.519 | 140 | -2.49 | 360 | 663 | 13.72 | 44.88 | -0.934 | -25.54 | 1.00 | 18.71 | 119 | 1.539 | 112 | -9.09 | 397 | 800 | 18.69 | 49.37 | 0.791 | -25.08 | 2 | 14.37 | 151 | PD |
| PD05 | 61 | 1 | 1.543 | 269 | 6.72 | 211 | 328 | 42.90 | 47.12 | -0.973 | -22.61 | 5.00 | 16.26 | 78 | 1.560 | 230 | 11.77 | 206 | 480 | 33.54 | 26.87 | 0.075 | -22.32 | 5 | 14.61 | 151 | PD |
| PD06 | 58 | 1 | 1.553 | 317 | 24.19 | 186 | 286 | 43.83 | 33.63 | 0.921 | -25.00 | 2.75 | 27.07 | 124 | 1.518 | 181 | 13.38 | 611 | 398 | 18.18 | 49.37 | 1.488 | -25.08 | 2 | 18.21 | 593 | PD |
What percentage of each group is tagged as having dysarthia by perceptual tests?
dpct <- group_by(pdata, group) %>% # create a table of percentages by group
summarize(num = n(), dysarthia = sum(dysarthia), dysarthiaPct = (sum(dysarthia) / n()) * 100)
kable(dpct, caption = "Percentage of participants tagged with dysarthia", digits = 2, # print the table
col.names = c("Group", "#Participants", "Dysarthia", "%")) %>% kable_material()
| Group | #Participants | Dysarthia | % |
|---|---|---|---|
| PD | 30 | 13 | 43.33 |
| RBD | 50 | 3 | 6.00 |
| controls | 50 | 0 | 0.00 |
Only 6% of the RBD group are tagged with dysarthia by the perceptual evaluation, while 43% of the group with untreated Parkinson’s disease are tagged.
Age itself can have an impact on speech and vocal quality. I’ll look at the distribution of participant ages to see how they compare between the test groups.
# create a quick plot of the participant ages in each group
qplot(age, data = pdata, fill=group, geom = "dotplot", facets = "group")
It’s not common to see Parkinson’s disease in people younger than 50, but for a small subset the disease strikes early. People are diagnosed with Parkinson’s at an average age of 60. My father was diagnosed at 52, and my father-in-law in his seventies. Anything younger than 50 is considered young-onset Parkinson’s, or YOPD. Rarely, PD may be diagnosed in people younger than 40 — about 2 percent of the 1 million people with Parkinson’s were diagnosed earlier than age 40.4 One of the most recognizable is Michael J. Fox, who was diagnosed at age 29.5
The samples show three YOPD subjects and a reasonable balance of subjects in the three groups across the age ranges.
Now I’ll look at the correlations of the exam measurements for the two test groups against the controls. First I’ll compare the Parkinson’s group with the control group.
pacman::p_load(GGally)
pcor <- pdata %>% filter(group != "RBD") %>% select(est_r:lre_m) # filter for the PD and controls groups
# create a correlation plot
pdcorr <- ggcorr(pcor, nbreaks = 5, palette = "RdGy", # set the number of levels and the colors
hjust = .65, size = 4, # nudge and format the axis labels
label = TRUE, label_size = 4, label_color = "white", # format the coefficient
name = "Significance") + # legend title
labs(title = "PD / Control Correlation", # plot title
caption = "source: https://www.nature.com/articles/s41598-017-00047-5") +
theme(plot.title.position = "plot",
plot.title = element_text(hjust = .5, vjust = 0, size = rel(2)), # position and format the title and caption
plot.caption.position = "plot")
pdcorr
Next I’ll look at the correlations for the REM sleep disorder group and the control group.
pcor <- pdata %>% filter(group != "PD") %>% select(est_r:lre_m) # filter for the RBD and controls groups
rbdcorr <- ggcorr(pcor, nbreaks = 5, palette = "RdGy", # set the number of levels and the colors
hjust = .65, size = 4, # nudge and format the axis labels
label = TRUE, label_size = 4, label_color = "white", # format the coefficient
name = "Significance") + # legend title
labs(title = "RBD / Control Correlation", # plot title
caption = "source: https://www.nature.com/articles/s41598-017-00047-5") +
theme(plot.title.position = "plot",
plot.title = element_text(hjust = .5, vjust = 0, size = rel(2)), # position and format the title and caption
plot.caption.position = "plot")
rbdcorr
# create another version of the plot with just the significant coefficients highlighted
ggcorr(pcor, geom = "blank", label = TRUE, hjust = 0.65, size = 4) +
geom_point(size = 10, aes(color = coefficient > 0, alpha = abs(coefficient) > 0.5)) +
scale_alpha_manual(values = c("TRUE" = 0.25, "FALSE" = 0)) +
guides(alpha = FALSE) +
scale_color_discrete(name = "Significance", labels = c("<= -0.05", ">= 0.5")) +
labs(title = "RBD / Control Correlation",
subtitle = "Another view of the correlation table",
caption = "source: https://www.nature.com/articles/s41598-017-00047-5") +
theme(plot.title.position = "plot", plot.title = element_text(hjust = .5, vjust = 0, size = rel(2)),
plot.subtitle = element_text(hjust = .5), plot.caption.position = "plot")
The correlations for PD and RBD are similar. There are a few significant coefficients that are different, e.g. dpi_m to lre_m and rst_r to dvi_m. Looking at tables of the significant correlations shows that the RBD group has five coefficients in common with the PD group.
# create and print tables of the coefficients for the PD and controls groups
pdcorr$data %>% filter(abs(coefficient) > 0.7 ) %>% # filter for significant values
select(1:3) %>% # select the columns to print
arrange(coefficient) %>%
kable(caption = "Significant correlation of Parkinson's group to control group", digits = 2, # format and print the table
col.names = c("PD", "Control", "Coefficient")) %>% kable_minimal() # use a kable theme
| PD | Control | Coefficient |
|---|---|---|
| dvi_r | rst_r | -0.85 |
| dpi_r | rst_r | -0.80 |
| dvi_m | rst_m | -0.79 |
| dpi_m | rst_m | -0.77 |
| gvi_m | est_m | 0.72 |
| rlr_m | rlr_r | 0.74 |
| dvi_m | dvi_r | 0.84 |
# do the same thing for the RBD to controls groups
rbdcorr$data %>% filter(abs(coefficient) > 0.7 ) %>%
select(1:3) %>%
arrange(coefficient) %>%
kable(caption = "Significant correlation of RBD group to control group", digits = 2,
col.names = c("RBD", "Control", "Coefficient")) %>% kable_minimal()
| RBD | Control | Coefficient |
|---|---|---|
| dvi_r | rst_r | -0.88 |
| dvi_m | rst_m | -0.78 |
| dpi_m | rst_m | -0.78 |
| dpi_r | rst_r | -0.75 |
| gvi_m | est_m | 0.73 |
A blinded experiment was executed by quadratic discriminant function analysis that showed the most distinctive disturbed speech patterns between the PD and control groups were found for a combination of
Figure 5 from the paper shows plots of two pairings of exam measurements:
Figure 5. Hlavnička, J., Čmejla, R., Tykalová, T. et al. Automated analysis of connected speech reveals early biomarkers of Parkinson’s disease in patients with rapid eye movement sleep behaviour disorder.2
The solid gray line represents the border of discrimination between speech negatives (RBD subjects with asymptomatic speech performance) and speech positives (subjects with speech performance closer to PD speakers). Speech negatives and speech positives were determined in the Index Test during the analysis of the experiment. That data isn’t available and I need more experience with statistics to reproduce the Index Test. Such as:
So I’ll create the plots with a regression line.
cols <- c('red', 'powderblue', 'darkgreen') # colors for the points
shps <- c(20, 16, 4) # shapes for the points
g <- guide_legend("Group") # initalize the legend
# Graph A - x = RST reading task, y = DPI monologue task
gA <- pdata %>% ggplot() +
geom_smooth(aes(rst_r, dpi_m), method = lm, formula=y~x, se = FALSE, color = "grey", size = 2) + # regression line
geom_point(aes(rst_r, dpi_m, color = group, shape = group, size = group)) + # point plot on top
scale_fill_discrete(name = "Group") +
coord_cartesian(xlim = c(100, 500), ylim = c(100, 700)) + # expand the axes scales
labs(title = "Graph A - Acoustic features of distinctive speech patterns", # titles and labels
x = "Reading RST (-/min)",
y = "Monologue DPI (ms)") +
theme_light() +
scale_shape_manual(values = shps, guide = g) + # use the specified shapes
scale_color_manual(values=cols, guide = g) + # colors for each shape
scale_size_discrete(range = c(3,5), guide = g) # sizes for each shape
gA
gB <- pdata %>% ggplot() +
geom_smooth(aes(dpi_r, dus_m), method = lm, formula=y~x, se = FALSE, color = "grey", size = 2) + # regression line
geom_point(aes(dpi_r, dus_m, color = group, shape = group, size = group)) + # point plot on top
scale_fill_discrete(name = "Group") + # name the legend
labs(title = "Graph B - Acoustic features of distinctive speech patterns", # add titles and labels
x = "Reading DPI (ms)",
y = "Monologue DUS (ms)") +
theme_light() +
scale_shape_manual(values = shps, guide = g) + # use the specified shapes
scale_color_manual(values=cols, guide = g) + # colors for each shape
scale_size_discrete(range = c(3,5), guide = g) # sizes for each shape
gB
Here is an interactive version of Graph A built using plot_ly:
shps <- c("circle", "circle", "x") # shapes for the scatter plot markers
cols <- c('red', 'lightblue', 'darkgreen') # colors for the scatter plot markers
szs <- pdata$group %>% str_replace("PD", "7") %>% # sizes for the scatter plot markers
str_replace("RBD", "20") %>%
str_replace("controls", "14") %>% as.numeric()
fit <- lm(dpi_m ~ rst_r, data = pdata) # formula for the regression line
p2 <- plot_ly() %>% # start an interactive plot
add_trace(data = pdata, # add a scatter plot layer
x=~rst_r,
y=~dpi_m,
type="scatter",
mode= "markers",
marker = list(size = szs), # specify the marker attributes
color = pdata$group,
colors = cols,
symbol = pdata$group,
symbols = shps,
hoverinfo= "text", # create the popup text
text = paste("Participant: ", pdata$id,
"<br>",
"Group: ", pdata$group,
"<br>",
"Rate of speech intervals: ", pdata$rst_r,
"<br>",
"Duration of pause intervals: ", pdata$dpi_m
)
) %>%
add_trace(data = pdata, x = ~rst_r, y = fitted(fit),
type = "scatter", # add the regression line
mode = "lines",
line = list(color = 'grey', width = 4),
opacity = 0.5,
showlegend = FALSE, # don't show line on legend
inherit = FALSE) # don't inherit popup or aesthetics from scatter plot layer
p2 <- p2 %>% layout(title = "Graph A - Acoustic features of distinctive speech patterns", # add titles
xaxis = list(title = "Reading RST (-/min)"),
yaxis = list(title = "Monologue DPI (ms)" ),
legend = list(title = list(text="Group")),
margin = list(t = 90))
p2
The following figure from the paper shows the results of the acoustic speech analysis. The 12 exam measurements are grouped into four categories and plotted to show the ranges.
Figure 3. Hlavnička, J., Čmejla, R., Tykalová, T. et al. Automated analysis of connected speech reveals early biomarkers of Parkinson’s disease in patients with rapid eye movement sleep behaviour disorder.2
To replicate these plots I’ll rearrange the data to combine the measurements for the two exam recordings for each participant into a single column for variable type and a column for it’s associated value.
pd1 <- pdata %>% pivot_longer(est_r:lre_m,
names_to = c("meastyp", "examtyp"),
names_pattern = "(.*)_(.)",
values_to = "amt"
)
getMlabel <- sort(tLabels) # create a lookup table of the labels
nmes <- unique(pd1$meastyp) %>% sort # create the lookup names
names(getMlabel) <- sort(nmes)
pd1$mlabel <- getMlabel[pd1$meastyp] # add labels to pd1
pd1$meastyp <-factor(pd1$meastyp, levels=c("est", # order the types as they are in the dataset
"rst",
"ast",
"dpi",
"dvi",
"gvi",
"dus",
"duf",
"rlr",
"pir",
"rsr",
"lre")
)
Create a faceted boxplot to look at the spread for each of the variables within each group.
ggplot(pd1, aes(meastyp, amt, fill = meastyp)) + # create a faceted boxplot
geom_boxplot() +
coord_cartesian(ylim = c(-35, 615)) + # expand the y-axis scale
facet_grid(cols = vars(group), scales = "free") + # facet by the group
theme_bw() + scale_x_discrete(breaks=NULL) + # don't show labels on the x-axis
scale_fill_discrete(name = "Speech measurement",
labels = tLabels) +
labs(x = NULL, y = NULL)
The units as shown in the legend for each type are quite different. High outliers for PD subjects are evident, speech rate and voiced duration are similar for PD and RBD but differ from the control group. Relative loudness of respiration has all negative values.
I’ll split the data by the four categories in figure 3, and plot each category.
# Add speech categories
pd1 <- pd1 %>% mutate(spchcat = if_else(meastyp %in% c("rlr", "lre", "pir", "rsr"), "Respiration",
if_else(meastyp %in% c("rst", "ast", "dpi", "est"), "Timing",
if_else(meastyp %in% c("gvi", "dvi"), "Phonation", "Articulation")
))
)
gcat <- "Respiration" # initialize the category variable
pcolor <- "green" # initialize the color variable
pd1$group <- factor(pd1$group, levels = c("PD", "RBD", "controls")) # factor group to display in specified order
# create a function with the plot template
pt <- function (gcat, pcolor) {
dcat <- pd1 %>% filter(spchcat == gcat) # create a faceted box plot
ggplot(dcat) +
geom_boxplot(aes(group, amt, fill = examtyp), outlier.shape = NA) +
facet_wrap(vars(mlabel), scales = "free") +
theme_minimal() +
scale_fill_grey(name = "Legend:", labels = c("Reading passage", "Monologue")) +
labs(title = gcat, x = NULL, y = NULL) +
theme(plot.title.position = "plot", plot.title = element_text(hjust = .5, vjust = 0, size = rel(2)), # add and position title
legend.position = "bottom", legend.direction = "horizontal", # put the legend on the bottom
plot.background = element_rect(color = pcolor, size = 6), # put a rectangle around it
plot.margin = margin(.5, .5, .5, .5, "cm"),
strip.text.x = element_text(face = "bold", size = rel(1.5) )) # bold the facet labels
}
# execute the template for each category
pt("Respiration", "dodgerblue")
pt("Phonation", "firebrick2")
pt("Timing", "seagreen3")
pt("Articulation", "yellow3")
I think these are a reasonably close approximation to the plots in the diagram. I could probably get closer using layers or different functions.
This project was a challenge, I did a lot of research to find commands and methods to get the results I wanted. I would find new ways to do things I’d already coded, so I re-factored my code quite a bit. I tried three different ways to create the correlation plots and settled on using ggcorr as the most legible for the large number variables. And I feel that dropping out the insignificant coefficients or highlighting the significant coefficients helps to focus on the areas of interest.
Moya-Galé G, Levy ES (2018). Parkinson’s disease-associated dysarthria: prevalence, impact and management strategies. https://www.dovepress.com/parkinsons-disease-associated-dysarthria-prevalence-impact-and-managem-peer-reviewed-article-JPRLS
Hlavnička, J., Čmejla, R., Tykalová, T. et al. (2017). Automated analysis of connected speech reveals early biomarkers of Parkinson’s disease in patients with rapid eye movement sleep behaviour disorder. https://www.nature.com/articles/s41598-017-00047-5
Early biomarkers of parkinsons disease dataset. https://www.kaggle.com/ruslankl/early-biomarkers-of-parkinsons-disease
Young-Onset Parkinson’s Disease. https://www.hopkinsmedicine.org/health/conditions-and-diseases/parkinsons-disease/youngonset-parkinsons-disease
Michael J. Fox interview on NBC’s Today Sunday Sitdown, Nov. 22, 2020. https://www.today.com/health/michael-j-fox-recalls-wife-tracy-pollan-s-reaction-parkinson-t199866