---
title: "Recognition Accuracy Healthy Individuals vs. People With Aphasia"
output:
  html_notebook:
    code_folding: hide
  html_document:
    df_print: paged
---

Leah Kelly

## [**Introduction**]{.underline}

This project examines recognition accuracy across repeated learning cycles in two groups: Healthy Controls (HC) and People with Aphasia (PWA). Learning and recognition performance are important topics in psycholinguistics because they show how language processing changes over time and how neurological differences shape learning outcomes. By comparing performance across cycles and groups, this analysis highlights both group-level patterns and individual variability.

Using quantitative methods introduced in LNGN 320/APLN 536, this project relies on data visualization to explore trends, variation, and group differences. Rather than focusing on raw data alone, the analysis uses summary measures and multiple visualization types to make patterns in learning and accuracy easier to interpret.

## [**Dataset Description**]{.underline}

The dataset consists of recognition accuracy scores collected over several learning cycles. The researchers who conducted the study presented two groups of participants, Healthy Controls and People with Aphasia, with a set of made-up words.

The analysis treats recognition accuracy across the repeated learning cycles as the measure of novel word learning in the two groups (HC and PWA). Recognition-based measures are especially useful for people with aphasia, whose speech production is typically limited or impaired. They allow the researchers to examine lexical learning without relying heavily on phonological production.

Group and cycle variables were converted to factors, and missing or invalid recognition values were excluded from visualizations where necessary. No transformations were applied beyond these basic cleaning steps, allowing the patterns observed in the figures to closely reflect the original data structure.
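
The exclusion step described above can be checked by counting how many rows would be dropped before any filtering happens. The following is a minimal sketch, assuming the column names used in the analysis code (`group`, `cycle`, `recognition`) and that valid scores lie between 0 and 1; the `toy` data frame is hypothetical and stands in for the real file.

```r
library(dplyr)

# Hypothetical toy data standing in for the real file; the values are made up.
toy <- data.frame(
  group       = c("HC", "HC", "PWA", "PWA", "PWA"),
  cycle       = c(1, 2, 1, 2, 2),
  recognition = c(0.8, NA, 0.5, 1.2, 0.6)
)

# Flag each row as usable or not (assumption: valid scores are in [0, 1]),
# then tally the exclusions per group.
exclusions <- toy %>%
  mutate(valid = !is.na(recognition) & recognition >= 0 & recognition <= 1) %>%
  group_by(group) %>%
  summarise(
    n_total    = n(),
    n_excluded = sum(!valid),
    .groups    = "drop"
  )

exclusions
```

Reporting these counts alongside the figures makes it clear how much data each plot is based on.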

## [**Research Questions**]{.underline}

This analysis addresses the following research questions:

1.  How does recognition accuracy change across learning cycles?

2.  Do Healthy Controls and People with Aphasia differ in overall recognition accuracy?

3.  How much variability exists within each group across cycles?

These questions are designed to be answered visually and quantitatively using summary statistics and comparative plots.

```{r}
library(tidyverse)

# Load data
data <- read_csv("/Users/leahkelly/Downloads/DataR (2).csv")

# Clean data
data_clean <- data %>%
  mutate(
    group = factor(group, levels = c("HC", "PWA")),
    cycle = factor(cycle)
  )

# Summary stats
rec_summary <- data_clean %>%
  group_by(group, cycle) %>%
  summarise(
    n = sum(!is.na(recognition)),
    mean_recognition = mean(recognition, na.rm = TRUE),
    sd_recognition = sd(recognition, na.rm = TRUE),
    se_recognition = sd_recognition / sqrt(n),
    .groups = "drop"
  )

# Plot
ggplot(rec_summary, aes(x = cycle, y = mean_recognition, group = group,
                        linetype = group, shape = group)) +
  geom_line() +
  geom_point(size = 3) +
  geom_errorbar(
    aes(ymin = mean_recognition - se_recognition,
        ymax = mean_recognition + se_recognition),
    width = 0.1
  ) +
  scale_y_continuous(
    labels = scales::percent_format(accuracy = 1),
    limits = c(0, 1)
  ) +
  labs(
    title = "Recognition Accuracy Across Learning Cycles",
    subtitle = "Comparing Healthy Controls (HC) and People with Aphasia (PWA)",
    x = "Cycle",
    y = "Mean recognition accuracy",
    linetype = "Group",
    shape = "Group"
  ) +
  theme_minimal(base_size = 13)

```

## [**2.1 Visualization Type 1: Mean Accuracy Across Learning Cycles**]{.underline}

The first visualization shows mean recognition accuracy across learning cycles for both groups, with error bars indicating the standard error of each mean. This type of plot is well suited for identifying overall trends and comparing group performance over time.

Figure 1 shows that mean recognition accuracy increases across cycles for both groups, indicating learning over time. Healthy Controls show higher mean accuracy than People with Aphasia at every cycle, but the upward trend in both groups suggests that repeated exposure supports improvement regardless of group. The error bars mark ±1 standard error around each mean and show that, although the groups differ, their performance ranges partially overlap.
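
The error bars in Figure 1 come from the `se_recognition` column, which the code computes as the standard deviation divided by the square root of the number of non-missing scores in each group and cycle. Written as a formula, with a worked example whose numbers are purely illustrative and not taken from the dataset:

$$
\mathrm{SE} = \frac{s}{\sqrt{n}}, \qquad \text{e.g. } s = 0.20,\; n = 16 \;\Rightarrow\; \mathrm{SE} = \frac{0.20}{\sqrt{16}} = 0.05
$$

Here $s$ is the sample standard deviation of the recognition scores in one group and cycle, and $n$ is the number of scores in that cell. Because $n$ sits under a square root, quadrupling the number of scores halves the standard error.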

```{r}
library(tidyverse)

# Load data
data <- read_csv("/Users/leahkelly/Downloads/DataR (2).csv")

# Clean + optional filter to avoid missing/out-of-range warnings
data_clean <- data %>%
  mutate(
    group = factor(group, levels = c("HC", "PWA")),
    cycle = factor(cycle)
  ) %>%
  filter(!is.na(recognition), recognition >= 0, recognition <= 1)

# Summary stats (mean ± SE)
rec_summary <- data_clean %>%
  group_by(group, cycle) %>%
  summarise(
    n = n(),
    mean_recognition = mean(recognition),
    sd_recognition = sd(recognition),
    se_recognition = sd_recognition / sqrt(n),
    .groups = "drop"
  ) %>%
  mutate(
    ymin = pmax(0, mean_recognition - se_recognition),
    ymax = pmin(1, mean_recognition + se_recognition)
  )

# Plot
ggplot(rec_summary, aes(x = cycle, y = mean_recognition, group = group)) +
  geom_ribbon(
    aes(ymin = ymin, ymax = ymax, fill = group),
    alpha = 0.18,
    color = NA
  ) +
  geom_jitter(
    data = data_clean,
    aes(x = cycle, y = recognition, color = group),
    width = 0.12,
    height = 0,  # jitter horizontally only so each dot keeps its true score
    alpha = 0.35,
    size = 2,
    inherit.aes = FALSE
  ) +
  geom_line(aes(color = group), linewidth = 1.2) +
  geom_point(aes(color = group), size = 3) +
  scale_y_continuous(
    labels = scales::percent_format(accuracy = 1),
    limits = c(0, 1)
  ) +
  scale_color_brewer(palette = "Set2") +
  scale_fill_brewer(palette = "Set2") +
  labs(
    title = "Recognition Accuracy Across Learning Cycles (Raw + Mean)",
    subtitle = "Dots = individual scores. Line = group mean. Band = mean ± 1 SE.",
    x = "Cycle",
    y = "Recognition accuracy",
    color = "Group",
    fill = "Group"
  ) +
  theme_minimal(base_size = 14) +
  theme(
    plot.title = element_text(face = "bold"),
    panel.grid.minor = element_blank(),
    legend.position = "top"
  )

```

## [**2.2 Visualization Type 2: Distribution of Accuracy Scores**]{.underline}

To complement the mean-based analysis, the second visualization focuses on the distribution of recognition accuracy scores, plotting each individual score as a jittered point alongside the group mean and a band of ±1 standard error. This visualization highlights the spread and shape of the data within each cycle and group.

Figure 2 reveals substantial variability within both groups, mainly in earlier cycles. The scores for People with Aphasia are more widely spread, suggesting greater individual differences in performance. Healthy Controls show more tightly clustered scores, especially in later cycles, suggesting more consistent recognition accuracy. This visualization demonstrates why relying solely on means can be misleading: the individual scores are an important part of the bigger picture.
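
The visual impression of spread can be backed up with numbers, which also speaks to the third research question. The sketch below computes the standard deviation, interquartile range, and range of scores for each group and cycle. The `toy` data frame and its values are hypothetical and only illustrate the calculation; in the real analysis the same pipeline would be applied to `data_clean`.

```r
library(dplyr)

# Hypothetical toy scores; these values are made up and are not results.
toy <- data.frame(
  group       = rep(c("HC", "PWA"), each = 6),
  cycle       = rep(rep(c(1, 2), each = 3), times = 2),
  recognition = c(0.6, 0.7, 0.8, 0.8, 0.9, 1.0,
                  0.2, 0.5, 0.8, 0.3, 0.6, 0.9)
)

# One row per group and cycle, with several measures of spread.
spread_summary <- toy %>%
  group_by(group, cycle) %>%
  summarise(
    sd_recognition  = sd(recognition),
    iqr_recognition = IQR(recognition),
    min_recognition = min(recognition),
    max_recognition = max(recognition),
    .groups = "drop"
  )

spread_summary
```

Comparing `sd_recognition` between groups within the same cycle gives a direct, if rough, measure of which group is more variable at each stage.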

```{r}
library(tidyverse)

# Load data
data <- read_csv("/Users/leahkelly/Downloads/DataR (2).csv")

# Clean data
data_clean <- data %>%
  mutate(
    group = factor(group, levels = c("HC", "PWA")),
    cycle = factor(cycle)
  )

# Create Correct vs Not Correct (assumes recognition is 0/1)
data_binary <- data_clean %>%
  filter(!is.na(recognition)) %>%
  mutate(outcome = if_else(recognition == 1, "Correct", "Not Correct"))

# Proportions per group/cycle
prop_data <- data_binary %>%
  count(group, cycle, outcome) %>%
  group_by(group, cycle) %>%
  mutate(prop = n / sum(n)) %>%
  ungroup()

# Plot
ggplot(prop_data, aes(x = cycle, y = prop, fill = outcome)) +
  geom_col(width = 0.75) +
  facet_wrap(~ group) +
  scale_y_continuous(labels = scales::percent_format(accuracy = 1)) +
  scale_fill_brewer(palette = "Set2") +
  labs(
    title = "Correct vs Not Correct Responses Across Learning Cycles",
    subtitle = "Each bar totals 100% within each group and cycle",
    x = "Cycle",
    y = "Percentage of responses",
    fill = NULL  # drop the legend title instead of leaving an empty one
  ) +
  theme_minimal(base_size = 14) +
  theme(
    plot.title = element_text(face = "bold"),
    panel.grid.minor = element_blank(),
    legend.position = "top"
  )

```

## [**2.3 Visualization Type 3: Correct vs. Not Correct Responses**]{.underline}

The third visualization displays the proportion of correct versus not correct responses across learning cycles using a stacked bar chart. This format emphasizes relative proportions and makes group comparisons intuitive.

Figure 3 shows an increasing proportion of correct responses across cycles for both groups. Healthy Controls reach a higher proportion of correct responses earlier, while People with Aphasia show more gradual improvement. Because each bar sums to 100%, this visualization makes it easy to compare accuracy patterns across cycles without being influenced by differences in raw counts.

## [**Conclusion and Discussion**]{.underline}

Across all visualizations, a consistent pattern emerges: recognition accuracy improves across learning cycles for both groups, but Healthy Controls outperform People with Aphasia at each stage. The combination of visualizations reveals complementary insights. Mean-based plots highlight overall trends, distributional plots expose individual variability, and proportional charts clarify group differences in correctness.

These data support the idea that repeated exposure to a word makes it easier to retain, not only for healthy individuals but also for people with neurological conditions.

These visualizations are particularly useful because they make abstract numerical patterns visible and easy to interpret. Overall, this project demonstrates how quantitative visualization techniques can be used to explore learning and recognition patterns in linguistic data in a clear and meaningful way.

These visualizations are consistent with the study's conclusion (Navarrete-Orejudo et al., 2023) that recognition-based measures provide an informative assessment of novel word learning in individuals with aphasia. Although no information on lesion location or aphasia type was available, the data as a whole show a meaningful pattern in learning performance.

###### Navarrete-Orejudo, L., Cerda-Company, X., Olivé, G., Martin, N., Laine, M., Rodríguez-Fornells, A., & Peñaloza, C. (2023). Expressive recall and recognition as complementary measures to assess novel word learning ability in aphasia. *Brain and Language*, *243*, Article 105303. <https://doi.org/10.1016/j.bandl.2023.105303>
