---
title: "Recognition Accuracy: Healthy Individuals vs. People With Aphasia"
output:
  html_notebook:
    code_folding: hide
  html_document:
    df_print: paged
---

Leah Kelly

## [**Introduction**]{.underline}

This project examines recognition accuracy across repeated learning cycles in two groups: Healthy Controls (HC) and People with Aphasia (PWA). Learning and recognition performance are important topics in psycholinguistics because they show how language processing changes over time and how neurological differences shape learning outcomes. By comparing performance across cycles and groups, this analysis highlights both group-level patterns and individual variability.

Using quantitative methods introduced in LNGN 320/APLN 536, this project relies on data visualization to explore trends, variation, and group differences. Rather than focusing on raw data alone, the analysis uses summary measures and multiple visualization types to make patterns in learning and accuracy easier to interpret.

## [**Dataset Description**]{.underline}

The dataset consists of recognition accuracy scores collected over several learning cycles. The researchers who conducted the study presented two groups of participants, Healthy Controls and People with Aphasia, with a set of made-up words.

The analysis treats recognition accuracy across the repeated learning cycles as a measure of novel word learning in the two groups (HC and PWA). Recognition-based measures are especially useful for people with aphasia because their speech production is typically limited or impaired. This way, the researchers can examine lexical learning without depending too heavily on phonological production.

Group and cycle variables were converted to factors, and missing or invalid recognition values were excluded from visualizations where necessary. No transformations were applied beyond these basic cleaning steps, allowing the patterns observed in the figures to closely reflect the original data structure.
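
The cleaning steps described above can be sketched as a small helper. This is a minimal illustration under stated assumptions: `clean_recognition()` is a hypothetical name that does not appear in the analysis code, the valid range of 0 to 1 is assumed from the percent-scaled axes used in the figures, and the `toy_raw` rows are made up for demonstration rather than taken from the study.

```{r}
library(dplyr)

# Hypothetical helper (not part of the original analysis): converts group and
# cycle to factors and keeps only rows whose recognition score lies in [0, 1].
clean_recognition <- function(df) {
  df %>%
    mutate(
      group = factor(group, levels = c("HC", "PWA")),
      cycle = factor(cycle)
    ) %>%
    filter(!is.na(recognition), recognition >= 0, recognition <= 1)
}

# Made-up rows for illustration only; these are not values from the study.
toy_raw <- data.frame(
  group = c("HC", "HC", "PWA", "PWA", "PWA"),
  cycle = c(1, 2, 1, 2, 2),
  recognition = c(0.8, NA, 0.5, 1.2, 0.6)
)

# The missing score and the out-of-range score (1.2) are dropped.
toy_clean <- clean_recognition(toy_raw)
toy_clean
```

Wrapping the steps in one function would make it easier to apply the same cleaning before every figure, so that all plots are built from the same set of rows.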

## [**Research Questions**]{.underline}

This analysis addresses the following research questions:

1.  How does recognition accuracy change across learning cycles?

2.  Do Healthy Controls and People with Aphasia differ in overall recognition accuracy?

3.  How much variability exists within each group across cycles?

These questions are designed to be answered visually and quantitatively using summary statistics and comparative plots.

```{r}
library(tidyverse)

# Load data
data <- read_csv("/Users/leahkelly/Downloads/DataR (2).csv")

# Clean data
data_clean <- data %>%
  mutate(
    group = factor(group, levels = c("HC", "PWA")),
    cycle = factor(cycle)
  )

# Summary stats
rec_summary <- data_clean %>%
  group_by(group, cycle) %>%
  summarise(
    n = sum(!is.na(recognition)),
    mean_recognition = mean(recognition, na.rm = TRUE),
    sd_recognition = sd(recognition, na.rm = TRUE),
    se_recognition = sd_recognition / sqrt(n),
    .groups = "drop"
  )

# Plot
ggplot(rec_summary, aes(x = cycle, y = mean_recognition, group = group,
                        linetype = group, shape = group)) +
  geom_line() +
  geom_point(size = 3) +
  geom_errorbar(
    aes(ymin = mean_recognition - se_recognition,
        ymax = mean_recognition + se_recognition),
    width = 0.1
  ) +
  scale_y_continuous(
    labels = scales::percent_format(accuracy = 1),
    limits = c(0, 1)
  ) +
  labs(
    title = "Recognition Accuracy Across Learning Cycles",
    subtitle = "Comparing Healthy Controls (HC) and People with Aphasia (PWA)",
    x = "Cycle",
    y = "Mean recognition accuracy",
    linetype = "Group",
    shape = "Group"
  ) +
  theme_minimal(base_size = 13)

```

## [**2.1 Visualization Type 1: Mean Accuracy Across Learning Cycles**]{.underline}

The first visualization shows mean recognition accuracy across learning cycles for both groups, with error bars of one standard error around each mean. This type of plot is well suited for identifying overall trends and comparing group performance over time.

Figure 1 shows that recognition accuracy increases across cycles for both groups, indicating learning over time. Healthy Controls consistently demonstrate higher mean accuracy than People with Aphasia at each cycle. However, the upward trend for both groups suggests that repeated exposure supports improvement regardless of group. The error bars indicate that, while the groups differ, their performance ranges partially overlap.
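
For reference, the error bars follow the standard error computed in the summary code (`sd_recognition / sqrt(n)`). The symbols here are introduced only for this note: $\bar{x}$ is the mean recognition score in one group and cycle, $s$ is the sample standard deviation of those scores, and $n$ is the number of non-missing scores.

$$
\mathrm{SE} = \frac{s}{\sqrt{n}}, \qquad \text{error bar} = \left[\, \bar{x} - \mathrm{SE},\ \bar{x} + \mathrm{SE} \,\right]
$$

Because the standard error shrinks as $n$ grows, it describes uncertainty about the group mean rather than the spread of individual scores.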

```{r}
library(tidyverse)

# Load data
data <- read_csv("/Users/leahkelly/Downloads/DataR (2).csv")

# Clean + optional filter to avoid missing/out-of-range warnings
data_clean <- data %>%
  mutate(
    group = factor(group, levels = c("HC", "PWA")),
    cycle = factor(cycle)
  ) %>%
  filter(!is.na(recognition), recognition >= 0, recognition <= 1)

# Summary stats (mean ± SE)
rec_summary <- data_clean %>%
  group_by(group, cycle) %>%
  summarise(
    n = n(),
    mean_recognition = mean(recognition),
    sd_recognition = sd(recognition),
    se_recognition = sd_recognition / sqrt(n),
    .groups = "drop"
  ) %>%
  mutate(
    ymin = pmax(0, mean_recognition - se_recognition),
    ymax = pmin(1, mean_recognition + se_recognition)
  )

# Plot
ggplot(rec_summary, aes(x = cycle, y = mean_recognition, group = group)) +
  geom_ribbon(
    aes(ymin = ymin, ymax = ymax, fill = group),
    alpha = 0.18,
    color = NA
  ) +
  geom_jitter(
    data = data_clean,
    aes(x = cycle, y = recognition, color = group),
    width = 0.12,
    alpha = 0.35,
    size = 2,
    inherit.aes = FALSE
  ) +
  geom_line(aes(color = group), linewidth = 1.2) +
  geom_point(aes(color = group), size = 3) +
  scale_y_continuous(
    labels = scales::percent_format(accuracy = 1),
    limits = c(0, 1)
  ) +
  scale_color_brewer(palette = "Set2") +
  scale_fill_brewer(palette = "Set2") +
  labs(
    title = "Recognition Accuracy Across Learning Cycles (Raw + Mean)",
    subtitle = "Dots = individual scores. Line = group mean. Band = mean ± 1 SE.",
    x = "Cycle",
    y = "Recognition accuracy",
    color = "Group",
    fill = "Group"
  ) +
  theme_minimal(base_size = 14) +
  theme(
    plot.title = element_text(face = "bold"),
    panel.grid.minor = element_blank(),
    legend.position = "top"
  )

```

## [**2.2 Visualization Type 2: Distribution of Accuracy Scores**]{.underline}

To complement the mean-based analysis, the second visualization overlays jittered individual recognition scores on each group's mean line and a band of ±1 standard error. This visualization highlights the spread and shape of the data within each cycle and group.

Figure 2 reveals substantial variability within both groups, mainly in earlier cycles. The distribution for People with Aphasia is wider, suggesting greater individual differences in performance. Healthy Controls show more tightly clustered scores, especially in later cycles, suggesting more consistent recognition accuracy. This visualization demonstrates why relying solely on means can be misleading: the individual scores are an important part of the bigger picture.
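
A violin plot with an embedded boxplot is another way to show the spread and shape of scores within each group and cycle. The sketch below is a minimal illustration and not the code behind Figure 2. The `toy_scores` data frame is simulated purely for demonstration, and the column names `group`, `cycle`, and `recognition` are assumed to match the dataset.

```{r}
library(ggplot2)

# Simulated scores for illustration only; these are not values from the study.
set.seed(1)
toy_scores <- expand.grid(
  id = 1:10,
  group = c("HC", "PWA"),
  cycle = factor(1:2)
)
# Draw scores around 0.6 and clamp them to the valid 0-1 range.
toy_scores$recognition <- pmin(
  1, pmax(0, rnorm(nrow(toy_scores), mean = 0.6, sd = 0.2))
)

# Use the same dodge for both layers so each boxplot sits inside its violin.
dodge <- position_dodge(width = 0.9)

toy_violin <- ggplot(toy_scores, aes(x = cycle, y = recognition, fill = group)) +
  geom_violin(position = dodge, alpha = 0.4) +
  geom_boxplot(position = dodge, width = 0.15, outlier.shape = NA) +
  scale_y_continuous(labels = scales::percent_format(accuracy = 1)) +
  labs(x = "Cycle", y = "Recognition accuracy", fill = "Group") +
  theme_minimal()

toy_violin
```

The violin outlines the shape of each distribution, while the narrow boxplot marks the median and interquartile range, so the two layers together show both shape and summary.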

```{r}
library(tidyverse)

# Load data
data <- read_csv("/Users/leahkelly/Downloads/DataR (2).csv")

# Clean data
data_clean <- data %>%
  mutate(
    group = factor(group, levels = c("HC", "PWA")),
    cycle = factor(cycle)
  )

# Create Correct vs Not Correct (assumes recognition is 0/1)
data_binary <- data_clean %>%
  filter(!is.na(recognition)) %>%
  mutate(outcome = if_else(recognition == 1, "Correct", "Not Correct"))

# Proportions per group/cycle
prop_data <- data_binary %>%
  count(group, cycle, outcome) %>%
  group_by(group, cycle) %>%
  mutate(prop = n / sum(n)) %>%
  ungroup()

# Plot
ggplot(prop_data, aes(x = cycle, y = prop, fill = outcome)) +
  geom_col(width = 0.75) +
  facet_wrap(~ group) +
  scale_y_continuous(labels = scales::percent_format(accuracy = 1)) +
  scale_fill_brewer(palette = "Set2") +
  labs(
    title = "Correct vs Not Correct Responses Across Learning Cycles",
    subtitle = "Each bar totals 100% within each group and cycle",
    x = "Cycle",
    y = "Percentage of responses",
    fill = NULL
  ) +
  theme_minimal(base_size = 14) +
  theme(
    plot.title = element_text(face = "bold"),
    panel.grid.minor = element_blank(),
    legend.position = "top"
  )

```

## [**2.3 Visualization Type 3: Correct vs. Not Correct Responses**]{.underline}

The third visualization displays the proportion of correct versus not correct responses across learning cycles using a stacked bar chart. This format emphasizes relative proportions and makes group comparisons intuitive.

Figure 3 shows an increasing proportion of correct responses across cycles for both groups. Healthy Controls reach a higher proportion of correct responses earlier, while People with Aphasia show more gradual improvement. Because each bar sums to 100%, this visualization makes it easy to compare accuracy patterns across cycles without being influenced by differences in raw counts.
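
The claim that each bar sums to 100% can be checked directly. The sketch below repeats the proportion calculation on a handful of made-up 0/1 responses (`toy_binary` is illustrative only and does not come from the study) and then totals the proportions within each group and cycle.

```{r}
library(dplyr)

# Made-up 0/1 responses for illustration only; not values from the study.
toy_binary <- data.frame(
  group = c("HC", "HC", "HC", "HC", "PWA", "PWA", "PWA", "PWA"),
  cycle = c(1, 1, 2, 2, 1, 1, 2, 2),
  recognition = c(1, 0, 1, 1, 0, 0, 1, 0)
)

# Same steps as the stacked bar chart: label outcomes, count, then convert
# counts to proportions within each group/cycle cell.
toy_props <- toy_binary %>%
  mutate(outcome = if_else(recognition == 1, "Correct", "Not Correct")) %>%
  count(group, cycle, outcome) %>%
  group_by(group, cycle) %>%
  mutate(prop = n / sum(n)) %>%
  ungroup()

# Each group/cycle cell should total 1 (that is, 100%).
toy_totals <- toy_props %>%
  group_by(group, cycle) %>%
  summarise(total = sum(prop), .groups = "drop")

toy_totals
```

Running the same check on the real `prop_data` would be a quick way to confirm that no group/cycle cell was dropped or double-counted before plotting.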

## [**Conclusion and Discussion**]{.underline}

Across all visualizations, a consistent pattern emerges: recognition accuracy improves across learning cycles for both groups, but Healthy Controls outperform People with Aphasia at each stage. The combination of visualizations reveals complementary insights. Mean-based plots highlight overall trends, distributional plots expose individual variability, and proportional charts clarify group differences in correctness.

These data support the idea that repeated exposure to a word makes it easier to retain, not only for healthy individuals but also for people with neurological conditions.

These visualizations are particularly useful because they make abstract numerical patterns visible and easy to interpret. Overall, this project demonstrates how quantitative visualization techniques can be used to explore learning and recognition patterns in linguistic data in a clear and meaningful way.

These visualizations support the study's conclusion that recognition-based measures provide a more informative assessment of novel word learning in individuals with aphasia. Although no further information on lesion location or type of aphasia was provided, the data overall reflect a meaningful pattern in learning performance.
