We conducted a survey study to understand how a usable consumer-facing transparency “nutrition” label for AI chatbots impacts users’ trust toward the chatbot.
Study Design:
Variables Collected:
- participant_id: Unique identifier for each participant
- chatbot: Which chatbot the participant was assigned to (ChatGPT or Gemini)
- ai_trust_level: General trust toward AI (ordinal scale)
- pre_trust_score: Baseline level of trust toward the assigned chatbot before viewing our label (continuous scale from 1-5)
- post_trust_score: Level of trust toward the assigned chatbot after viewing our label (continuous scale from 1-5)
- iuipc_score: Internet Users’ Information Privacy Concerns score (continuous scale)
- usage_frequency: Frequency of chatbot usage (Rarely, Sometimes, Frequently)
- age: Participant’s age in years
- technical_background: Whether the participant has a technical background (Yes/No)

For each hypothesis below, you must:
Hypothesis 1: Our sample’s mean post-trust score toward chatbots is significantly different from the national mean score of trust toward AI (3.2).
Hypothesis 2: The mean IUIPC score of participants who have a technical background is significantly higher than those with no technical background.
Hypothesis 3: Our transparency label significantly increases users’ level of trust toward the chatbot they use.
Hypothesis 4: For participants 1 through 20 only, our transparency label significantly increases users’ level of trust toward the chatbot they use.
Hypothesis 5: There is a significant difference in participants’ IUIPC scores based on their usage frequency of AI chatbots. If the main test shows significant differences, conduct post-hoc tests and explain which specific pairs of usage frequency groups have significant differences in mean IUIPC scores.
This notebook provides a comprehensive guide to conducting quantitative analysis in Human-Computer Interaction (HCI) research using the R programming language.
Before beginning any analysis, you need to install the required software on your computer:
To install both programs:
R packages extend the basic functionality of R. We need to load several packages for our quantitative analysis work:
# Install packages (only need to do this once)
install.packages("ordinal") # For ordinal logistic regression
install.packages("simr") # For power analysis and simulation
install.packages("lme4") # For linear and generalized linear mixed-effects models
install.packages("readr") # For reading various data file formats
install.packages("knitr") # For creating formatted tables and reports
install.packages("ggplot2") # For creating high-quality data visualizations
install.packages("ggpubr") # For publication-ready statistical plots
install.packages("dplyr") # For data manipulation and transformation
install.packages("agricolae") # For agricultural and experimental design statistics
install.packages("pwrss") # For statistical power analysis
install.packages("car") # For various statistical tests
install.packages("olsrr") # For regression model diagnostics and validation
install.packages("DescTools") # For statistical tests, including post hoc tests
# Load installed packages (do this every time you start R)
library(ordinal)
library(simr)
library(lme4)
library(readr)
library(knitr)
library(ggplot2)
library(ggpubr)
library(dplyr)
library(agricolae)
library(pwrss)
library(car)
library(olsrr)
library(DescTools)
Note: If you encounter errors about missing packages, install them first using:
install.packages("package_name")
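To avoid reinstalling packages that are already present, one option is to check first which ones are missing. The sketch below is one possible approach, not a required step; `find_missing_packages` is a hypothetical helper name introduced here for illustration, and the install call is left commented out so that nothing is installed by accident.

```r
# Hypothetical helper: return the names of packages that are not yet installed
find_missing_packages <- function(pkgs) {
  is_installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!is_installed]
}

# The packages used in this notebook
required_packages <- c("ordinal", "simr", "lme4", "readr", "knitr", "ggplot2",
                       "ggpubr", "dplyr", "agricolae", "pwrss", "car", "olsrr",
                       "DescTools")

missing_packages <- find_missing_packages(required_packages)
print(missing_packages)

# Uncomment to install only what is missing:
# if (length(missing_packages) > 0) install.packages(missing_packages)
```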
There are two main approaches to loading your dataset into R:
Step 1: Place your dataset file in the same folder as this R Script
Step 2: Set your working directory to the source file location:
- Click on Session in the RStudio menu bar
- Select Set Working Directory
- Choose To Source File Location
Step 3: Load your data using just the filename:
# Load the dataset from the same directory
dataset <- read.csv("Quantitative_CPS226_Fall25_data.csv")
If your dataset is located elsewhere, use the interactive file chooser:
# Method 2a: Two-step process
file_path <- file.choose()
print(paste("Path of the dataset is:", file_path))
dataset <- read.csv(file_path)
# Method 2b: One-step process
dataset <- read.csv(file.choose())
Note: The file.choose() function opens a dialog box where you can navigate to and select your data file.
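A common stumbling block with either method is a wrong path or working directory. As a rough sketch, a small wrapper can fail early with a readable message; `load_dataset` is a hypothetical helper name, and the demonstration writes a temporary file with made-up rows rather than using the course dataset.

```r
# Hypothetical helper: stop with a clear message if the file is not found
load_dataset <- function(path) {
  if (!file.exists(path)) {
    stop("File not found: ", path, " (working directory: ", getwd(), ")")
  }
  read.csv(path, stringsAsFactors = FALSE)
}

# Demonstration with a temporary CSV file containing made-up rows
demo_path <- tempfile(fileext = ".csv")
write.csv(data.frame(participant_id = 1:3, age = c(32, 28, 45)),
          demo_path, row.names = FALSE)

demo_data <- load_dataset(demo_path)
str(demo_data)
```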
Before conducting any statistical analysis, it is essential to examine your data structure and variable types. This ensures your data is properly formatted and ready for analysis.
The most comprehensive way to inspect your dataset structure:
# Display complete structure of the dataset
str(dataset)
## 'data.frame': 100 obs. of 9 variables:
## $ participant_id : int 1 2 3 4 5 6 7 8 9 10 ...
## $ age : int 32 28 45 34 52 29 38 31 46 35 ...
## $ technical_background: chr "Yes" "No" "Yes" "No" ...
## $ chatbot_type : chr "ChatGPT" "Gemini" "ChatGPT" "ChatGPT" ...
## $ iuipc_score : num 2.1 3.8 2.9 3.2 3.5 2.4 3.9 2.6 3.1 3.3 ...
## $ ai_trust_level : chr "High trust" "Little trust" "Moderate trust" "Moderate trust" ...
## $ pre_trust_score : num 2.8 4.2 3.1 3.5 3.8 2.9 4.1 3 3.4 3.6 ...
## $ post_trust_score : num 3.4 3.6 3.2 3.1 3.3 3.6 3.7 3.8 3 3.1 ...
## $ usage_frequency : chr "Daily" "Weekly" "Daily" "Monthly" ...
# Alternative view (if dplyr is loaded)
glimpse(dataset)
## Rows: 100
## Columns: 9
## $ participant_id <int> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15…
## $ age <int> 32, 28, 45, 34, 52, 29, 38, 31, 46, 35, 27, 41, 3…
## $ technical_background <chr> "Yes", "No", "Yes", "No", "Yes", "Yes", "No", "Ye…
## $ chatbot_type <chr> "ChatGPT", "Gemini", "ChatGPT", "ChatGPT", "Gemin…
## $ iuipc_score <dbl> 2.1, 3.8, 2.9, 3.2, 3.5, 2.4, 3.9, 2.6, 3.1, 3.3,…
## $ ai_trust_level <chr> "High trust", "Little trust", "Moderate trust", "…
## $ pre_trust_score <dbl> 2.8, 4.2, 3.1, 3.5, 3.8, 2.9, 4.1, 3.0, 3.4, 3.6,…
## $ post_trust_score <dbl> 3.4, 3.6, 3.2, 3.1, 3.3, 3.6, 3.7, 3.8, 3.0, 3.1,…
## $ usage_frequency <chr> "Daily", "Weekly", "Daily", "Monthly", "Weekly", …
For detailed examination of individual variables:
# Check data types for key variables
cat("Type of 'age' variable:", typeof(dataset$age), "\n")
## Type of 'age' variable: integer
cat("Type of 'ai_trust_level' variable:", typeof(dataset$ai_trust_level), "\n")
## Type of 'ai_trust_level' variable: character
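Categorical variables stored as character are usually easier to analyze once converted to factors, and an ordinal variable such as ai_trust_level can be stored as an ordered factor. The sketch below uses a few made-up rows that mirror the values shown above. The level order (Little < Moderate < High) is an assumption, and the real data may contain additional levels, so check `unique()` on your own column first.

```r
# Toy rows mirroring the first few values shown in the dataset
toy <- data.frame(
  ai_trust_level = c("High trust", "Little trust", "Moderate trust", "Moderate trust"),
  technical_background = c("Yes", "No", "Yes", "No"),
  stringsAsFactors = FALSE
)

# Inspect which labels actually occur before fixing an order
unique(toy$ai_trust_level)

# Assumed ordering from lowest to highest trust
trust_order <- c("Little trust", "Moderate trust", "High trust")
toy$ai_trust_level <- factor(toy$ai_trust_level, levels = trust_order, ordered = TRUE)

# A nominal (unordered) variable becomes a plain factor
toy$technical_background <- factor(toy$technical_background)

str(toy)

# Ordered factors support comparisons between levels
toy$ai_trust_level[1] > toy$ai_trust_level[2]
```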
Questions to Consider:
Get an overview of your data distribution:
# Basic summary statistics for all variables
summary(dataset)
## participant_id age technical_background chatbot_type
## Min. : 1.00 Min. :26.00 Length:100 Length:100
## 1st Qu.: 25.75 1st Qu.:32.00 Class :character Class :character
## Median : 50.50 Median :37.00 Mode :character Mode :character
## Mean : 50.50 Mean :38.21
## 3rd Qu.: 75.25 3rd Qu.:44.25
## Max. :100.00 Max. :54.00
## iuipc_score ai_trust_level pre_trust_score post_trust_score
## Min. :2.100 Length:100 Min. :2.600 Min. :2.900
## 1st Qu.:2.700 Class :character 1st Qu.:3.000 1st Qu.:3.300
## Median :3.100 Mode :character Median :3.400 Median :3.600
## Mean :3.146 Mean :3.484 Mean :3.593
## 3rd Qu.:3.600 3rd Qu.:3.900 3rd Qu.:3.900
## Max. :4.300 Max. :4.600 Max. :4.300
## usage_frequency
## Length:100
## Class :character
## Mode :character
##
##
##
# Display the first 10 rows
head(dataset, 10)
## participant_id age technical_background chatbot_type iuipc_score
## 1 1 32 Yes ChatGPT 2.1
## 2 2 28 No Gemini 3.8
## 3 3 45 Yes ChatGPT 2.9
## 4 4 34 No ChatGPT 3.2
## 5 5 52 Yes Gemini 3.5
## 6 6 29 Yes ChatGPT 2.4
## 7 7 38 No Gemini 3.9
## 8 8 31 Yes ChatGPT 2.6
## 9 9 46 No ChatGPT 3.1
## 10 10 35 Yes Gemini 3.3
## ai_trust_level pre_trust_score post_trust_score usage_frequency
## 1 High trust 2.8 3.4 Daily
## 2 Little trust 4.2 3.6 Weekly
## 3 Moderate trust 3.1 3.2 Daily
## 4 Moderate trust 3.5 3.1 Monthly
## 5 Little trust 3.8 3.3 Weekly
## 6 High trust 2.9 3.6 Daily
## 7 Little trust 4.1 3.7 Rarely
## 8 High trust 3.0 3.8 Daily
## 9 Moderate trust 3.4 3.0 Weekly
## 10 Moderate trust 3.6 3.1 Monthly
# Check for missing values
colSums(is.na(dataset))
## participant_id age technical_background
## 0 0 0
## chatbot_type iuipc_score ai_trust_level
## 0 0 0
## pre_trust_score post_trust_score usage_frequency
## 0 0 0
This shows the number of missing values in each column. Every count is 0, so this dataset has no missing values.
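If a dataset does contain missing values, many summary functions return NA unless told to skip them. A minimal sketch with made-up scores shows two common ways to handle this: `na.rm = TRUE` for a single computation, and `complete.cases()` to keep only fully observed rows.

```r
# Made-up scores with two missing values
toy_scores <- c(3.4, NA, 3.2, 3.1, NA, 3.6)

sum(is.na(toy_scores))            # number of missing values
mean(toy_scores)                  # returns NA because of the missing values
mean(toy_scores, na.rm = TRUE)    # mean of the observed values only

# Keep only rows with no missing values in any column
toy <- data.frame(id = 1:6, score = toy_scores)
complete_rows <- toy[complete.cases(toy), ]
nrow(complete_rows)
```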
We can use a variety of methods to understand what is going on in our dataset:
# Measures of Central Tendency
mean(dataset$age) # Average age
## [1] 38.21
median(dataset$age) # Middle value
## [1] 37
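# Function to find the mode (the most frequent value)
# Note: tabulate() counts positive integers only, so this works for whole-number
# variables such as age; with ties, the smallest of the tied values is returned
age_mode <- function(x) {
  frequency_of_age <- tabulate(x)               # count of each integer value 1, 2, 3, ...
  mode_position <- which.max(frequency_of_age)  # position equals the most frequent value
  return(mode_position)
}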
cat("Mode of age is:", age_mode(dataset$age), "\n")
## Mode of age is: 29
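The function above relies on the values being positive integers. For decimal scores or text categories, a more general version may be useful. The sketch below is one possible alternative; `find_modes` is a hypothetical name, and it returns every value tied for the highest count.

```r
# Hypothetical helper: return all values tied for the highest frequency
find_modes <- function(x) {
  values <- unique(x)                     # distinct values, in order of appearance
  counts <- tabulate(match(x, values))    # how often each distinct value occurs
  values[counts == max(counts)]
}

find_modes(c(29, 31, 29, 40, 31))          # two values are tied here
find_modes(c(3.4, 3.6, 3.6, 3.1))          # works for decimals
find_modes(c("Daily", "Weekly", "Daily"))  # works for categories
```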
# Measures of Spread
sd(dataset$age) # Standard deviation
## [1] 7.767187
var(dataset$age) # Variance
## [1] 60.32919
range(dataset$age) # Min and max
## [1] 26 54
IQR(dataset$age) # Interquartile range
## [1] 12.25
We can use various statistical approaches to test whether observed differences between groups are statistically significant.
A one-sample t-test tests whether the sample mean differs significantly from a known population value.
# Check for outliers: Create boxplot
df <- data.frame(scores = dataset$post_trust_score, group = "Trust")
ggplot(df, aes(x = group, y = scores)) +
  geom_boxplot(alpha = 0.7) +
  geom_jitter(width = 0.2, alpha = 0.6, color = "red") +
  labs(title = "Post-Trust Scores: Boxplot with Individual Points",
       y = "Trust Score", x = "")
# Identify outliers numerically (1.5 × IQR rule)
Q1 <- quantile(dataset$post_trust_score, 0.25)
Q3 <- quantile(dataset$post_trust_score, 0.75)
iqr_value <- Q3 - Q1  # avoids reusing the name of the built-in IQR() function
lower_bound <- Q1 - 1.5 * iqr_value
upper_bound <- Q3 + 1.5 * iqr_value
outliers <- dataset$post_trust_score[dataset$post_trust_score < lower_bound |
                                     dataset$post_trust_score > upper_bound]
print(paste("Number of outliers:", length(outliers)))
## [1] "Number of outliers: 0"
if (length(outliers) > 0) {
  print(paste("Outlier values:", paste(outliers, collapse = ", ")))
}
# Check for normality: Shapiro-Wilk test
shapiro.test(dataset$post_trust_score)
##
## Shapiro-Wilk normality test
##
## data: dataset$post_trust_score
## W = 0.9697, p-value = 0.02097
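The Shapiro-Wilk test checks whether the data deviate from normality; a p-value below 0.05 indicates a significant deviation. Here p = 0.021, so the post-trust scores deviate from a normal distribution. However, with a sample size well above 30 (n = 100 here), the t-test is generally robust to moderate departures from normality: by the Central Limit Theorem, the sampling distribution of the mean is approximately normal for most underlying data distributions. The histogram and Q-Q plot below help judge how severe the deviation is.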
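The Central Limit Theorem argument can be illustrated with a small simulation. The sketch below draws repeated samples from a strongly skewed distribution (exponential, chosen purely for illustration and unrelated to the study data) and shows that the sample means cluster around the true mean with a spread close to the theoretical standard error.

```r
set.seed(42)            # for reproducibility

n_per_sample <- 100     # same size as the study sample
n_samples <- 2000       # number of simulated studies

# Each replicate: draw 100 skewed values and record their mean
sample_means <- replicate(n_samples, mean(rexp(n_per_sample, rate = 1)))

# The exponential(1) distribution has mean 1 and standard deviation 1,
# so the means should center near 1 with spread near 1 / sqrt(100) = 0.1
mean(sample_means)
sd(sample_means)

# Optional: the histogram of means looks roughly bell-shaped
# hist(sample_means, breaks = 30, main = "Simulated sample means", xlab = "Mean")
```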
# Check for normality: Histogram
ggplot(df, aes(x = scores)) +
  geom_histogram(fill = "lightblue", color = "black", binwidth = 0.1) +
  labs(title = "Distribution of Post-Trust Scores",
       x = "Post-Trust Score",
       y = "Frequency") +
  theme_minimal()
# Check for normality: Q-Q Plot
ggplot(df, aes(sample = scores)) +
  stat_qq() +
  stat_qq_line(color = "red") +
  labs(title = "Q-Q Plot: Post-Trust Scores",
       x = "Theoretical Quantiles",
       y = "Sample Quantiles") +
  theme_minimal()
# Perform one-sample t-test
# General form: t.test(variable, mu = hypothesized_mean)
# Example: Test if post-trust score differs from neutral value of 3
t.test(dataset$post_trust_score, mu = 3)
##
## One Sample t-test
##
## data: dataset$post_trust_score
## t = 16.139, df = 99, p-value < 2.2e-16
## alternative hypothesis: true mean is not equal to 3
## 95 percent confidence interval:
## 3.520095 3.665905
## sample estimates:
## mean of x
## 3.593
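To see what t.test() computes, the statistic can be reproduced by hand as t = (sample mean − hypothesized mean) / (sd / sqrt(n)). The sketch below uses only the first ten post_trust_score values shown earlier by head(), so its numbers will differ from the full-sample output above. The hypothesized mean of 3.2 is the national mean from Hypothesis 1.

```r
# First ten post_trust_score values from the head() output (not the full sample)
scores <- c(3.4, 3.6, 3.2, 3.1, 3.3, 3.6, 3.7, 3.8, 3.0, 3.1)
mu0 <- 3.2                      # hypothesized population mean

n <- length(scores)
standard_error <- sd(scores) / sqrt(n)
t_manual <- (mean(scores) - mu0) / standard_error

# Two-sided p-value from the t distribution with n - 1 degrees of freedom
p_manual <- 2 * pt(-abs(t_manual), df = n - 1)

# Compare with the built-in function
result <- t.test(scores, mu = mu0)
all.equal(t_manual, unname(result$statistic))
all.equal(p_manual, result$p.value)
```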
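An independent-sample t-test compares the means of two independent groups. The two methods below give the same result; only the sign of t flips, because Method 1 computes Yes minus No while the formula notation orders the groups alphabetically (No minus Yes).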
# General form: t.test(variable ~ group, data = dataset)
# Method 1: Using subsetting
t.test(dataset[which(dataset$technical_background == "Yes"), ]$iuipc_score,
dataset[which(dataset$technical_background == "No"), ]$iuipc_score)
##
## Welch Two Sample t-test
##
## data: dataset[which(dataset$technical_background == "Yes"), ]$iuipc_score and dataset[which(dataset$technical_background == "No"), ]$iuipc_score
## t = -4.1429, df = 97.782, p-value = 7.291e-05
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -0.6596687 -0.2323681
## sample estimates:
## mean of x mean of y
## 2.927451 3.373469
# Method 2: Using formula notation (recommended)
t.test(iuipc_score ~ technical_background, data = dataset)
##
## Welch Two Sample t-test
##
## data: iuipc_score by technical_background
## t = 4.1429, df = 97.782, p-value = 7.291e-05
## alternative hypothesis: true difference in means between group No and group Yes is not equal to 0
## 95 percent confidence interval:
## 0.2323681 0.6596687
## sample estimates:
## mean in group No mean in group Yes
## 3.373469 2.927451
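The tests above are two-sided. A directional hypothesis such as Hypothesis 2 (technical background scores higher) calls for a one-sided test, and the direction depends on the group order. With formula notation the difference is No minus Yes, so "Yes is higher" corresponds to `alternative = "less"`. The sketch below uses only the first ten rows shown by head(), so it illustrates the syntax rather than the full-sample result.

```r
# First ten rows from the head() output (not the full sample)
toy <- data.frame(
  iuipc_score = c(2.1, 3.8, 2.9, 3.2, 3.5, 2.4, 3.9, 2.6, 3.1, 3.3),
  technical_background = c("Yes", "No", "Yes", "No", "Yes",
                           "Yes", "No", "Yes", "No", "Yes")
)

# Two-sided test: is there any difference?
two_sided <- t.test(iuipc_score ~ technical_background, data = toy)

# One-sided test: is mean(No) - mean(Yes) less than 0, i.e. is Yes higher?
one_sided <- t.test(iuipc_score ~ technical_background, data = toy,
                    alternative = "less")

one_sided$estimate   # group means, in the order No then Yes
one_sided$p.value
```

In these ten rows the "No" group has the higher mean, so the one-sided p-value is large: the data point in the opposite direction from the hypothesis.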
Assumptions:
We have already covered the code to check assumptions 1 and 2. Now let’s look at how to evaluate assumption 3.
# Levene's Test for Homogeneity of Variance
# General form: leveneTest(variable ~ group, data = dataset)
leveneTest(iuipc_score ~ technical_background, data = dataset)
## Warning in leveneTest.default(y = y, group = group, ...): group coerced to
## factor.
## Levene's Test for Homogeneity of Variance (center = median)
## Df F value Pr(>F)
## group 1 0.1948 0.6599
## 98
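Levene’s test has the null hypothesis that the group variances are equal. Here p = 0.66, which is greater than 0.05, so we do not have enough evidence to reject the null hypothesis and can treat the equal variance assumption as met. Failing to reject does not prove that the variances are identical; it only means no difference was detected. Note also that t.test() runs Welch’s test by default, which does not require equal variances; pass var.equal = TRUE to run the classic Student’s t-test.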
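The outcome of the variance check determines which version of the t-test applies. By default, t.test() runs Welch's test, which does not assume equal variances; setting `var.equal = TRUE` runs the pooled-variance Student's t-test. A minimal sketch on the first ten rows shown by head() (not the full sample) contrasts the two:

```r
# First ten rows from the head() output (not the full sample)
toy <- data.frame(
  iuipc_score = c(2.1, 3.8, 2.9, 3.2, 3.5, 2.4, 3.9, 2.6, 3.1, 3.3),
  technical_background = c("Yes", "No", "Yes", "No", "Yes",
                           "Yes", "No", "Yes", "No", "Yes")
)

# Default: Welch's t-test (no equal-variance assumption)
welch <- t.test(iuipc_score ~ technical_background, data = toy)

# Student's t-test (assumes equal variances, uses pooled variance)
student <- t.test(iuipc_score ~ technical_background, data = toy,
                  var.equal = TRUE)

welch$method
student$method
welch$parameter     # Welch degrees of freedom are usually not a whole number
student$parameter   # Student degrees of freedom are n1 + n2 - 2
```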
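A paired-sample t-test compares two related measurements from the same participants (e.g., pre-test and post-test scores). R computes the difference as the first argument minus the second, so in the output below a negative mean difference means post-trust scores are higher on average.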
# General form: t.test(variable1, variable2, paired = TRUE)
t.test(dataset$pre_trust_score, dataset$post_trust_score, paired = TRUE)
##
## Paired t-test
##
## data: dataset$pre_trust_score and dataset$post_trust_score
## t = -1.4693, df = 99, p-value = 0.1449
## alternative hypothesis: true mean difference is not equal to 0
## 95 percent confidence interval:
## -0.25619657 0.03819657
## sample estimates:
## mean difference
## -0.109
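A paired t-test is equivalent to a one-sample t-test on the within-participant differences, which also makes it clear that the normality assumption applies to those differences. For a directional hypothesis such as Hypothesis 3 (the label increases trust), a one-sided alternative can be used. The sketch below uses the first ten pre/post pairs shown by head(), not the full sample.

```r
# First ten pre/post pairs from the head() output (not the full sample)
pre  <- c(2.8, 4.2, 3.1, 3.5, 3.8, 2.9, 4.1, 3.0, 3.4, 3.6)
post <- c(3.4, 3.6, 3.2, 3.1, 3.3, 3.6, 3.7, 3.8, 3.0, 3.1)

# Differences are post minus pre, so positive values mean trust increased
differences <- post - pre

# Normality check applies to the differences, not to each score separately
shapiro.test(differences)

# One-sided paired test: is the mean of (post - pre) greater than 0?
paired_result <- t.test(post, pre, paired = TRUE, alternative = "greater")

# Equivalent one-sample test on the differences
one_sample_result <- t.test(differences, mu = 0, alternative = "greater")

all.equal(paired_result$p.value, one_sample_result$p.value)
```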
T-tests can only be applied when we compare two conditions. If we have three or more conditions to compare, we need to use ANOVA.
Let’s test whether there is a difference in participants’ IUIPC scores based on their usage frequency of AI.
# General form: aov(dependent_variable ~ independent_variable, data = dataset)
anova_model <- aov(iuipc_score ~ usage_frequency, data = dataset)
summary(anova_model)
## Df Sum Sq Mean Sq F value Pr(>F)
## usage_frequency 3 26.046 8.682 114.1 <2e-16 ***
## Residuals 96 7.302 0.076
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
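The F test shows that a difference exists, not how large it is. One common effect size for a one-way ANOVA is eta squared, the share of total variance explained by the grouping variable. A minimal sketch using the sums of squares printed in the table above:

```r
# Sums of squares copied from the ANOVA table above
ss_between  <- 26.046   # usage_frequency
ss_residual <- 7.302    # Residuals

# Eta squared = between-group sum of squares / total sum of squares
eta_squared <- ss_between / (ss_between + ss_residual)
round(eta_squared, 3)

# With a fitted model, the same values can be pulled from the summary table:
# ss <- summary(anova_model)[[1]][["Sum Sq"]]
# ss[1] / sum(ss)
```

Roughly 78% of the variance in IUIPC scores is associated with usage frequency in this dataset.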
The ANOVA tells us whether there is a significant difference among the groups, but it does not tell us which specific groups differ from each other. To identify these differences, we need to conduct post-hoc analyses.
We have three main options for post-hoc analysis:
# Tukey's HSD Test
TukeyHSD(anova_model)
## Tukey multiple comparisons of means
## 95% family-wise confidence level
##
## Fit: aov(formula = iuipc_score ~ usage_frequency, data = dataset)
##
## $usage_frequency
## diff lwr upr p adj
## Monthly-Daily 0.9237548 0.7073760 1.1401336 0.00e+00
## Rarely-Daily 1.4031199 1.1964975 1.6097422 0.00e+00
## Weekly-Daily 0.5136853 0.3288045 0.6985662 0.00e+00
## Rarely-Monthly 0.4793651 0.2477390 0.7109911 2.70e-06
## Weekly-Monthly -0.4100694 -0.6225283 -0.1976106 1.26e-05
## Weekly-Rarely -0.8894345 -1.0919482 -0.6869209 0.00e+00
# Dunnett's Test
# General form: DunnettTest(dependent_variable ~ group_variable, data = dataset, control = "control_group")
DunnettTest(iuipc_score ~ usage_frequency, data = dataset, control = "Rarely")
##
## Dunnett's test for comparing several treatments with a control :
## 95% family-wise confidence level
##
## $Rarely
## diff lwr.ci upr.ci pval
## Daily-Rarely -1.4031199 -1.5909608 -1.2152789 < 2e-16 ***
## Monthly-Rarely -0.4793651 -0.6899369 -0.2687932 8.3e-07 ***
## Weekly-Rarely -0.8894345 -1.0735402 -0.7053288 < 2e-16 ***
##
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
# Multiple Comparisons with Different Corrections
# 1. No correction
pairwise.t.test(dataset$iuipc_score, dataset$usage_frequency,
p.adjust.method = "none")
##
## Pairwise comparisons using t tests with pooled SD
##
## data: dataset$iuipc_score and dataset$usage_frequency
##
## Daily Monthly Rarely
## Monthly < 2e-16 - -
## Rarely < 2e-16 4.6e-07 -
## Weekly 9.9e-11 2.1e-06 < 2e-16
##
## P value adjustment method: none
# 2. Bonferroni correction (most conservative)
# Formula: p_adjusted = min(1, p_raw × number_of_comparisons)
pairwise.t.test(dataset$iuipc_score, dataset$usage_frequency,
p.adjust.method = "bonferroni")
##
## Pairwise comparisons using t tests with pooled SD
##
## data: dataset$iuipc_score and dataset$usage_frequency
##
## Daily Monthly Rarely
## Monthly < 2e-16 - -
## Rarely < 2e-16 2.8e-06 -
## Weekly 5.9e-10 1.3e-05 < 2e-16
##
## P value adjustment method: bonferroni
# 3. Holm correction (less conservative than Bonferroni)
# Adjusts p-values sequentially based on ranking
pairwise.t.test(dataset$iuipc_score, dataset$usage_frequency,
p.adjust.method = "holm")
##
## Pairwise comparisons using t tests with pooled SD
##
## data: dataset$iuipc_score and dataset$usage_frequency
##
## Daily Monthly Rarely
## Monthly < 2e-16 - -
## Rarely < 2e-16 9.2e-07 -
## Weekly 3.0e-10 2.1e-06 < 2e-16
##
## P value adjustment method: holm
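The corrections can also be applied directly to a vector of raw p-values with p.adjust(), which makes the arithmetic visible. In the sketch below, the last three values are the uncorrected p-values printed above. The three comparisons reported only as "< 2e-16" are not known exactly, so 1e-17 is used as a placeholder for each (an assumption for illustration only).

```r
# Raw p-values for the six pairwise comparisons
# The first three are placeholders for results reported only as "< 2e-16"
raw_p <- c(1e-17, 1e-17, 1e-17, 9.9e-11, 4.6e-07, 2.1e-06)
n_comparisons <- length(raw_p)

# Bonferroni: multiply every p-value by the number of comparisons (capped at 1)
bonferroni_p <- p.adjust(raw_p, method = "bonferroni")
all.equal(bonferroni_p, pmin(1, raw_p * n_comparisons))

# Holm: the smallest p-value is multiplied by 6, the next by 5, and so on,
# so later comparisons are penalized less than under Bonferroni
holm_p <- p.adjust(raw_p, method = "holm")

data.frame(raw_p, bonferroni_p, holm_p)
```

For the three known values, the results match the pairwise.t.test() tables above after rounding.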
Non-parametric tests are used when the assumptions of parametric tests (e.g., normality) are violated. Here are the non-parametric alternatives to common parametric tests:
| Parametric Test | Non-Parametric Alternative |
|---|---|
| Independent sample t-test | Mann-Whitney U test |
| Paired t-test | Wilcoxon signed-rank test |
| One-way ANOVA (independent groups) | Kruskal-Wallis test |
| Repeated measures ANOVA | Friedman test |
# (1) Independent sample t-test → Mann-Whitney U test
# General form: wilcox.test(dependent_var ~ grouping_var, data = dataset)
wilcox.test(iuipc_score ~ technical_background, data = dataset)
# (2) Paired t-test → Wilcoxon signed-rank test
# General form: wilcox.test(measure1, measure2, paired = TRUE)
first_20 <- dataset[dataset$participant_id < 21, ]  # participants 1 through 20
wilcox.test(first_20$pre_trust_score, first_20$post_trust_score, paired = TRUE)
# (3) Repeated measures one-way ANOVA → Friedman test
# General form: friedman.test(as.matrix(dataset[, c("condition1", "condition2", "condition3")]))
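# Example with three repeated measures
# Note: measure1, measure2, and measure3 are placeholder column names; this dataset
# has no such columns, so substitute your own repeated-measures columns before running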
friedman.test(as.matrix(dataset[, c("measure1", "measure2", "measure3")]))
# (4) One-way ANOVA on 3+ independent groups → Kruskal-Wallis test
# General form: kruskal.test(dependent_var ~ grouping_var, data = dataset)
kruskal.test(iuipc_score ~ usage_frequency, data = dataset)
##
## Kruskal-Wallis rank sum test
##
## data: iuipc_score by usage_frequency
## Kruskal-Wallis chi-squared = 76.824, df = 3, p-value < 2.2e-16
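Like ANOVA, the Kruskal-Wallis test only says that at least one group differs. One possible non-parametric follow-up is pairwise Wilcoxon rank-sum tests with a multiple-comparison correction, available in base R as pairwise.wilcox.test(). The sketch below runs on simulated data; the group means and spread are made up for illustration and are not estimates from the study.

```r
set.seed(1)  # for reproducibility

# Simulated data: four usage groups with made-up means (illustration only)
toy <- data.frame(
  usage_frequency = rep(c("Daily", "Weekly", "Monthly", "Rarely"), each = 10),
  iuipc_score = c(rnorm(10, mean = 2.5, sd = 0.3),
                  rnorm(10, mean = 3.0, sd = 0.3),
                  rnorm(10, mean = 3.4, sd = 0.3),
                  rnorm(10, mean = 3.9, sd = 0.3))
)

# Omnibus test across all groups
kruskal.test(iuipc_score ~ usage_frequency, data = toy)

# Post-hoc: pairwise Wilcoxon rank-sum tests with Holm correction
posthoc <- pairwise.wilcox.test(toy$iuipc_score, toy$usage_frequency,
                                p.adjust.method = "holm")
posthoc
```

The printed matrix has the same layout as the pairwise.t.test() output, with one adjusted p-value per pair of groups.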