Study Design:
We conducted a survey study to understand how a usable consumer-facing transparency “nutrition” label for AI chatbots impacts users’ trust toward the chatbot.
Variables Collected:
- participant_id: Unique identifier for each participant
- chatbot: Which chatbot the participant was assigned to (ChatGPT or Gemini)
- ai_trust_level: General trust toward AI (ordinal scale)
- pre_trust_score: Baseline level of trust toward the assigned chatbot before viewing our label (continuous scale from 1-5)
- post_trust_score: Level of trust toward the assigned chatbot after viewing our label (continuous scale from 1-5)
- iuipc_score: Internet Users’ Information Privacy Concerns (IUIPC) score (continuous scale)
- usage_frequency: Frequency of chatbot usage (Rarely, Sometimes, Frequently)
- age: Participant’s age in years
- technical_background: Whether the participant has a technical background (Yes/No)
Quantitative Data Analysis:
For each hypothesis below, conduct the appropriate statistical test and report your results:
Hypothesis 1: Our sample’s mean post-trust score toward chatbots is significantly different from the national mean score of trust toward AI (3.2).
Hypothesis 2: The mean IUIPC score of participants who have a technical background is significantly higher than those with no technical background.
Hypothesis 3: Our transparency label significantly increases users’ level of trust toward the chatbot they use.
Hypothesis 4: For participants 1 through 20 only, our transparency label significantly increases users’ level of trust toward the chatbot they use.
Hypothesis 5: There is a significant difference in participants’ IUIPC scores based on their usage frequency of AI chatbots. If the main test shows significant differences, conduct post-hoc tests and explain which specific pairs of usage frequency groups have significant differences in mean IUIPC scores.
Qualitative Data Analysis: For the qualitative analysis, go to the qualitative dataset posted on Ed, where you can find open-ended responses from 50 participants about the impact of our developed label on their level of trust toward AI chatbots. Qualitatively analyze the reasons behind either low or high trust. Identify 3-5 categories for your codebook (e.g., “Privacy Concerns,” “Lack of trust toward companies”). Take 10-20 responses, create a codebook, and calculate Cohen’s Kappa between two coders. Each response should receive only ONE code from each coder; this is required for the Cohen’s Kappa calculation.
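If you want to compute Cohen’s Kappa in R (using the DescTools package loaded later in this notebook), a minimal sketch is shown below. The coder1 and coder2 vectors are hypothetical placeholders for the codes your two coders assign to the same set of responses.
# Hypothetical example: one code per response from each of two coders
codes <- c("Privacy Concerns", "Lack of trust toward companies", "Transparency")
coder1 <- factor(c("Privacy Concerns", "Transparency", "Privacy Concerns",
                   "Lack of trust toward companies"), levels = codes)
coder2 <- factor(c("Privacy Concerns", "Transparency",
                   "Lack of trust toward companies",
                   "Lack of trust toward companies"), levels = codes)
# Cohen's Kappa computed from the coders' confusion table
CohenKappa(table(coder1, coder2))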
This notebook provides a comprehensive guide to conducting quantitative analysis in Human-Computer Interaction (HCI) research using the R programming language.
Before beginning any analysis, you need to install the required software on your computer: R and RStudio. Both programs can be downloaded from their official websites.
R packages extend the basic functionality of R. We need to load several packages for our quantitative analysis work:
# Install packages (only need to do this once)
install.packages("ordinal")
install.packages("simr")
install.packages("lme4")
install.packages("readr")
install.packages("knitr")
install.packages("ggplot2")
install.packages("ggpubr")
install.packages("dplyr")
install.packages("agricolae")
install.packages("pwrss")
install.packages("car")
install.packages("olsrr")
install.packages("DescTools")
# Load installed packages (do this every time you start R)
library(ordinal)
library(simr)
library(lme4)
library(readr)
library(knitr)
library(ggplot2)
library(ggpubr)
library(dplyr)
library(agricolae)
library(pwrss)
library(car)
library(olsrr)
library(DescTools)
After you load the libraries once, do not install or load the packages again in the same coding session. Run the code that comes after this setup.
Step 1: Place your dataset file in the same folder as this R Script
Step 2: Set your working directory to the source file location:
- Click on Session in the RStudio menu bar
- Select Set Working Directory
- Choose To Source File Location
Step 3: Load your data using just the filename:
# Load the dataset from the same directory
dataset <- read.csv("Quantitative_CPS226_Fall25_data.csv")
# Two-step process
file_path <- file.choose()
print(paste("Path of the dataset is:", file_path))
dataset <- read.csv(file_path)
# One-step process
dataset <- read.csv(file.choose())
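After loading the data, it is good practice to confirm that it was read correctly. A quick inspection sketch, using the dataset object created above:
# Quick checks that the dataset loaded as expected
head(dataset)      # first few rows
str(dataset)       # variable names and types
summary(dataset)   # basic descriptive statistics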
# Boxplot for visual inspection
ggplot(data.frame(scores = dataset$variable_name, group = "Group"),
aes(x = group, y = scores)) +
geom_boxplot(alpha = 0.7) +
geom_jitter(width = 0.2, alpha = 0.6, color = "red") +
labs(title = "Boxplot with Individual Points", y = "Score", x = "")
# Numerical identification of outliers
Q1 <- quantile(dataset$variable_name, 0.25)
Q3 <- quantile(dataset$variable_name, 0.75)
IQR <- Q3 - Q1
lower_bound <- Q1 - 1.5 * IQR
upper_bound <- Q3 + 1.5 * IQR
outliers <- dataset$variable_name[dataset$variable_name < lower_bound |
dataset$variable_name > upper_bound]
print(paste("Number of outliers:", length(outliers)))
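If you decide to exclude outliers (a judgment call you should justify in your write-up), a minimal sketch using the bounds computed above:
# Keep only observations within the IQR-based bounds
dataset_no_outliers <- dataset[dataset$variable_name >= lower_bound &
                               dataset$variable_name <= upper_bound, ]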
# Shapiro-Wilk test
shapiro.test(dataset$variable_name)
# Histogram
ggplot(data.frame(scores = dataset$variable_name), aes(x = scores)) +
geom_histogram(fill = "lightblue", color = "black", binwidth = 0.1) +
labs(title = "Distribution", x = "Score", y = "Frequency") +
theme_minimal()
# Q-Q Plot
ggplot(data.frame(scores = dataset$variable_name), aes(sample = scores)) +
stat_qq() +
stat_qq_line(color = "red") +
labs(title = "Q-Q Plot", x = "Theoretical Quantiles", y = "Sample Quantiles") +
theme_minimal()
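As a concrete example from this study (a sketch, assuming the column names match the Variables Collected list), the paired pre/post comparison in Hypothesis 3 depends on the normality of the difference scores:
# Normality check on the pre/post trust difference scores
trust_diff <- dataset$post_trust_score - dataset$pre_trust_score
shapiro.test(trust_diff)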
# Levene's Test
leveneTest(dependent_variable ~ grouping_variable, data = dataset)
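For example, before comparing IUIPC scores across usage frequency groups (Hypothesis 5), one might check homogeneity of variances. A sketch, assuming the column names listed above:
# Levene's test: IUIPC scores across usage frequency groups
leveneTest(iuipc_score ~ as.factor(usage_frequency), data = dataset)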
# General form
t.test(dataset$variable_name, mu = hypothesized_mean)
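A concrete sketch for Hypothesis 1, comparing the sample’s mean post-trust score against the national mean of 3.2:
# One-sample t-test: mean post-trust score vs. the national mean of 3.2
t.test(dataset$post_trust_score, mu = 3.2)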
# General form
t.test(dependent_variable ~ grouping_variable, data = dataset)
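A sketch for Hypothesis 2, comparing IUIPC scores between participants with and without a technical background:
# Independent-samples t-test: IUIPC score by technical background (Yes/No)
# (add alternative = "less" or "greater" for a one-sided test, depending on
#  which group R treats as the first factor level)
t.test(iuipc_score ~ technical_background, data = dataset)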
# General form
t.test(dataset$variable1, dataset$variable2, paired = TRUE)
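A sketch for Hypothesis 3, testing whether trust changes after viewing the label (paired, because each participant provides both a pre and a post score):
# Paired t-test: post- vs. pre-label trust scores for the same participants
# (add alternative = "greater" to test specifically for an increase)
t.test(dataset$post_trust_score, dataset$pre_trust_score, paired = TRUE)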
# General form
anova_model <- aov(dependent_variable ~ independent_variable, data = dataset)
summary(anova_model)
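A sketch for Hypothesis 5, comparing IUIPC scores across the three usage frequency groups (assuming the column names listed above):
# One-way ANOVA: IUIPC score by usage frequency (Rarely / Sometimes / Frequently)
dataset$usage_frequency <- as.factor(dataset$usage_frequency)
iuipc_anova <- aov(iuipc_score ~ usage_frequency, data = dataset)
summary(iuipc_anova)
# If the ANOVA is significant, follow up with a post-hoc test such as
# Tukey's HSD (see below)
TukeyHSD(iuipc_anova)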
# Tukey's HSD Test
TukeyHSD(anova_model)
# Dunnett's Test
DunnettTest(dependent_variable ~ grouping_variable, data = dataset, control = "control_group")
# Pairwise comparisons with corrections
pairwise.t.test(dataset$dependent_variable, dataset$grouping_variable,
p.adjust.method = "bonferroni")
pairwise.t.test(dataset$dependent_variable, dataset$grouping_variable,
p.adjust.method = "holm")
# Mann-Whitney U test (Independent samples)
wilcox.test(dependent_variable ~ grouping_variable, data = dataset)
# Wilcoxon signed-rank test (Paired samples)
wilcox.test(dataset$variable1, dataset$variable2, paired = TRUE)
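If the pre/post difference scores are not normally distributed, a non-parametric sketch of the Hypothesis 3 comparison:
# Wilcoxon signed-rank test: post- vs. pre-label trust scores
wilcox.test(dataset$post_trust_score, dataset$pre_trust_score, paired = TRUE)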
# Kruskal-Wallis test (3+ independent groups)
kruskal.test(dependent_variable ~ grouping_variable, data = dataset)
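Similarly, if the ANOVA assumptions are not met for Hypothesis 5, a non-parametric sketch:
# Kruskal-Wallis test: IUIPC score by usage frequency
kruskal.test(iuipc_score ~ usage_frequency, data = dataset)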
# Friedman test (Repeated measures)
friedman.test(as.matrix(dataset[, c("measure1", "measure2", "measure3")]))
# Filter dataset by condition
subset_data <- dataset[dataset$variable_name == "condition", ]
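For Hypothesis 4, which restricts the analysis to participants 1 through 20, a sketch assuming participant_id is numeric and starts at 1:
# Keep only participants 1 through 20 for Hypothesis 4
subset_data <- dataset[dataset$participant_id %in% 1:20, ]
# Then run the paired comparison on the subset, e.g.:
t.test(subset_data$post_trust_score, subset_data$pre_trust_score, paired = TRUE)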