Lab Activity

We conducted a survey study to understand how a usable consumer-facing transparency “nutrition” label for AI chatbots impacts users’ trust toward the chatbot.

Study Design:

  • We recruited 100 participants from Prolific.
  • Participants were users of both ChatGPT and Gemini.
  • Each participant was randomly assigned to either the ChatGPT condition or the Gemini condition.
  • Participants viewed our developed transparency label for their assigned chatbot.

Variables Collected:

  • participant_id: Unique identifier for each participant
  • chatbot_type: Which chatbot the participant was assigned to (ChatGPT or Gemini)
  • ai_trust_level: General trust toward AI (ordinal scale)
  • pre_trust_score: Baseline trust toward the assigned chatbot before viewing our label (continuous scale from 1 to 5)
  • post_trust_score: Trust toward the assigned chatbot after viewing our label (continuous scale from 1 to 5)
  • iuipc_score: Internet Users’ Information Privacy Concerns score (continuous scale)
  • usage_frequency: Frequency of chatbot usage (Rarely, Monthly, Weekly, Daily)
  • age: Participant’s age in years
  • technical_background: Whether the participant has a technical background (Yes/No)

Lab Activity Instructions

For each hypothesis below, you must:

  1. Specify the statistical test you are running and justify why it is appropriate.
  2. Test all assumptions of your chosen test and include screenshots of your assumption checks.
  3. Run the statistical test and include a screenshot of your R output.
  4. Interpret the results in plain language, addressing:
    • Whether the null hypothesis is rejected or not
    • What this means in the context of our study

Hypothesis 1: Our sample’s mean post-trust score toward chatbots is significantly different from the national mean score of trust toward AI (3.2).

Hypothesis 2: The mean IUIPC score of participants who have a technical background is significantly higher than those with no technical background.

Hypothesis 3: Our transparency label significantly increases users’ level of trust toward the chatbot they use.

Hypothesis 4: For participants 1 through 20 only, our transparency label significantly increases users’ level of trust toward the chatbot they use.

Hypothesis 5: There is a significant difference in participants’ IUIPC scores based on their usage frequency of AI chatbots. If the main test shows significant differences, conduct post-hoc tests and explain which specific pairs of usage frequency groups have significant differences in mean IUIPC scores.


Introduction to Quantitative Analysis in HCI Using R

This notebook provides a comprehensive guide to conducting quantitative analysis in Human-Computer Interaction (HCI) research using the R programming language.

Download R and RStudio

Before beginning any analysis, you need to install the required software on your computer:

  • R: The statistical computing language and environment
  • RStudio: An integrated development environment (IDE) that makes R easier to use

To install both programs:

  1. Visit the Posit download page at https://posit.co/download/rstudio-desktop/
  2. First, download and install R for your operating system
  3. Then, download and install RStudio Desktop (free version)
  4. Launch RStudio to begin working with R

Installing Required Packages

R packages extend the basic functionality of R. We need to load several packages for our quantitative analysis work:

# Install packages (only need to do this once)
install.packages("ordinal")    # For ordinal logistic regression
install.packages("simr")       # For power analysis and simulation
install.packages("lme4")       # For linear and generalized linear mixed-effects models
install.packages("readr")      # For reading various data file formats
install.packages("knitr")      # For creating formatted tables and reports
install.packages("ggplot2")    # For creating high-quality data visualizations
install.packages("ggpubr")     # For publication-ready statistical plots
install.packages("dplyr")      # For data manipulation and transformation
install.packages("agricolae")  # For agricultural and experimental design statistics
install.packages("pwrss")      # For statistical power analysis
install.packages("car")        # For various statistical tests
install.packages("olsrr")      # For regression model diagnostics and validation
install.packages("DescTools")  # For statistical tests, including post hoc tests
# Load installed packages (do this every time you start R)
library(ordinal)
library(simr)
library(lme4)
library(readr)
library(knitr)
library(ggplot2)
library(ggpubr)
library(dplyr)
library(agricolae)
library(pwrss)
library(car)
library(olsrr)
library(DescTools)

Note: If you encounter errors about missing packages, install them first using: install.packages("package_name")
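
As an alternative to typing each install.packages() call by hand, a minimal sketch like the following installs only the packages that are still missing (the package list mirrors the one above):

# A minimal sketch: install only the packages that are not yet installed
required_packages <- c("ordinal", "simr", "lme4", "readr", "knitr", "ggplot2",
                       "ggpubr", "dplyr", "agricolae", "pwrss", "car", "olsrr",
                       "DescTools")
missing_packages <- setdiff(required_packages, rownames(installed.packages()))
if (length(missing_packages) > 0) install.packages(missing_packages)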

Opening the Dataset

There are two main approaches to loading your dataset into R:
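
Method 1: Direct File Path

If the data file is in your R working directory, or you know its full path, pass the path directly to read.csv(). (The file name below is a placeholder; substitute your own.)

# Method 1: Read the file from a known path
dataset <- read.csv("dataset.csv")  # placeholder file name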

Method 2: Interactive File Selection

If your dataset is located elsewhere, use the interactive file chooser:

# Method 2a: Two-step process
file_path <- file.choose()
print(paste("Path of the dataset is:", file_path))
dataset <- read.csv(file_path)

# Method 2b: One-step process
dataset <- read.csv(file.choose())

Note: The file.choose() function opens a dialog box where you can navigate to and select your data file.

Inspecting Your Data

Before conducting any statistical analysis, it is essential to examine your data structure and variable types. This ensures your data is properly formatted and ready for analysis.

Check All Variable Types

The most comprehensive way to inspect your dataset structure:

# Display complete structure of the dataset
str(dataset)
## 'data.frame':    100 obs. of  9 variables:
##  $ participant_id      : int  1 2 3 4 5 6 7 8 9 10 ...
##  $ age                 : int  32 28 45 34 52 29 38 31 46 35 ...
##  $ technical_background: chr  "Yes" "No" "Yes" "No" ...
##  $ chatbot_type        : chr  "ChatGPT" "Gemini" "ChatGPT" "ChatGPT" ...
##  $ iuipc_score         : num  2.1 3.8 2.9 3.2 3.5 2.4 3.9 2.6 3.1 3.3 ...
##  $ ai_trust_level      : chr  "High trust" "Little trust" "Moderate trust" "Moderate trust" ...
##  $ pre_trust_score     : num  2.8 4.2 3.1 3.5 3.8 2.9 4.1 3 3.4 3.6 ...
##  $ post_trust_score    : num  3.4 3.6 3.2 3.1 3.3 3.6 3.7 3.8 3 3.1 ...
##  $ usage_frequency     : chr  "Daily" "Weekly" "Daily" "Monthly" ...
# Alternative view (if dplyr is loaded)
glimpse(dataset)
## Rows: 100
## Columns: 9
## $ participant_id       <int> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15…
## $ age                  <int> 32, 28, 45, 34, 52, 29, 38, 31, 46, 35, 27, 41, 3…
## $ technical_background <chr> "Yes", "No", "Yes", "No", "Yes", "Yes", "No", "Ye…
## $ chatbot_type         <chr> "ChatGPT", "Gemini", "ChatGPT", "ChatGPT", "Gemin…
## $ iuipc_score          <dbl> 2.1, 3.8, 2.9, 3.2, 3.5, 2.4, 3.9, 2.6, 3.1, 3.3,…
## $ ai_trust_level       <chr> "High trust", "Little trust", "Moderate trust", "…
## $ pre_trust_score      <dbl> 2.8, 4.2, 3.1, 3.5, 3.8, 2.9, 4.1, 3.0, 3.4, 3.6,…
## $ post_trust_score     <dbl> 3.4, 3.6, 3.2, 3.1, 3.3, 3.6, 3.7, 3.8, 3.0, 3.1,…
## $ usage_frequency      <chr> "Daily", "Weekly", "Daily", "Monthly", "Weekly", …

Check Specific Variable Types

For detailed examination of individual variables:

# Check data types for key variables
cat("Type of 'age' variable:", typeof(dataset$age), "\n")
## Type of 'age' variable: integer
cat("Type of 'ai_trust_level' variable:", typeof(dataset$ai_trust_level), "\n")
## Type of 'ai_trust_level' variable: character

Questions to Consider:

  • In class, we talked about two types of quantitative data: interval and ratio.
    • What is the type of age?
    • What is the type of IUIPC?
  • In class, we talked about two types of qualitative data: nominal and ordinal (see the factor-encoding sketch after this list).
    • What is the type of trust in AI?
    • What is the type of chatbot?
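
R reads these qualitative variables in as plain character columns. The sketch below (the ordering of trust levels is an assumption) shows how you might encode a nominal and an ordinal variable as factors, without modifying the original columns:

# Encode qualitative variables as factors (non-mutating sketch)
chatbot_factor <- factor(dataset$chatbot_type)  # nominal: no inherent order
trust_factor <- factor(dataset$ai_trust_level,
                       levels = c("Little trust", "Moderate trust", "High trust"),
                       ordered = TRUE)          # ordinal: assumed level order
str(trust_factor)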

Summary Statistics

Get an overview of your data distribution:

# Basic summary statistics for all variables
summary(dataset)
##  participant_id        age        technical_background chatbot_type      
##  Min.   :  1.00   Min.   :26.00   Length:100           Length:100        
##  1st Qu.: 25.75   1st Qu.:32.00   Class :character     Class :character  
##  Median : 50.50   Median :37.00   Mode  :character     Mode  :character  
##  Mean   : 50.50   Mean   :38.21                                          
##  3rd Qu.: 75.25   3rd Qu.:44.25                                          
##  Max.   :100.00   Max.   :54.00                                          
##   iuipc_score    ai_trust_level     pre_trust_score post_trust_score
##  Min.   :2.100   Length:100         Min.   :2.600   Min.   :2.900   
##  1st Qu.:2.700   Class :character   1st Qu.:3.000   1st Qu.:3.300   
##  Median :3.100   Mode  :character   Median :3.400   Median :3.600   
##  Mean   :3.146                      Mean   :3.484   Mean   :3.593   
##  3rd Qu.:3.600                      3rd Qu.:3.900   3rd Qu.:3.900   
##  Max.   :4.300                      Max.   :4.600   Max.   :4.300   
##  usage_frequency   
##  Length:100        
##  Class :character  
##  Mode  :character  
##                    
##                    
## 
# Display the first 10 rows
head(dataset, 10)
##    participant_id age technical_background chatbot_type iuipc_score
## 1               1  32                  Yes      ChatGPT         2.1
## 2               2  28                   No       Gemini         3.8
## 3               3  45                  Yes      ChatGPT         2.9
## 4               4  34                   No      ChatGPT         3.2
## 5               5  52                  Yes       Gemini         3.5
## 6               6  29                  Yes      ChatGPT         2.4
## 7               7  38                   No       Gemini         3.9
## 8               8  31                  Yes      ChatGPT         2.6
## 9               9  46                   No      ChatGPT         3.1
## 10             10  35                  Yes       Gemini         3.3
##    ai_trust_level pre_trust_score post_trust_score usage_frequency
## 1      High trust             2.8              3.4           Daily
## 2    Little trust             4.2              3.6          Weekly
## 3  Moderate trust             3.1              3.2           Daily
## 4  Moderate trust             3.5              3.1         Monthly
## 5    Little trust             3.8              3.3          Weekly
## 6      High trust             2.9              3.6           Daily
## 7    Little trust             4.1              3.7          Rarely
## 8      High trust             3.0              3.8           Daily
## 9  Moderate trust             3.4              3.0          Weekly
## 10 Moderate trust             3.6              3.1         Monthly

Check for Data Quality Issues

# Check for missing values
colSums(is.na(dataset))
##       participant_id                  age technical_background 
##                    0                    0                    0 
##         chatbot_type          iuipc_score       ai_trust_level 
##                    0                    0                    0 
##      pre_trust_score     post_trust_score      usage_frequency 
##                    0                    0                    0

This shows whether we have any missing values in our dataset.
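
No values are missing here. If some were, one simple option (a sketch; imputation is another route) is listwise deletion, keeping only complete rows:

# Keep only rows with no missing values (listwise deletion)
dataset_complete <- dataset[complete.cases(dataset), ]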

Descriptive Statistics

We can use a variety of methods to understand what is going on in our dataset:

# Measures of Central Tendency
mean(dataset$age)        # Average age
## [1] 38.21
median(dataset$age)      # Middle value
## [1] 37
# Function to find the mode (valid for positive-integer data such as age)
age_mode <- function(x) { 
  value_counts <- tabulate(x)      # value_counts[i] = how many times the value i occurs
  return(which.max(value_counts))  # the value with the highest count
}

cat("Mode of age is:", age_mode(dataset$age), "\n")
## Mode of age is: 29
# Measures of Spread  
sd(dataset$age)          # Standard deviation
## [1] 7.767187
var(dataset$age)         # Variance
## [1] 60.32919
range(dataset$age)       # Min and max
## [1] 26 54
IQR(dataset$age)         # Interquartile range
## [1] 12.25
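
With dplyr loaded, the same descriptive statistics can be computed per group, which is often more informative. For example, IUIPC scores by usage frequency:

# Group-wise descriptive statistics using dplyr
dataset %>%
  group_by(usage_frequency) %>%
  summarise(mean_iuipc   = mean(iuipc_score),
            sd_iuipc     = sd(iuipc_score),
            median_iuipc = median(iuipc_score),
            n            = n())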

Tests of Differences

We can use various statistical approaches to test whether observed differences between groups are statistically significant.

T-Test

One-Sample T-Test

A one-sample t-test tests whether the sample mean differs significantly from a known population value.

# Check for outliers: Create boxplot
df <- data.frame(scores = dataset$post_trust_score, group = "Trust")

ggplot(df, aes(x = group, y = scores)) +
  geom_boxplot(alpha = 0.7) +
  geom_jitter(width = 0.2, alpha = 0.6, color = "red") +
  labs(title = "Post-Trust Scores: Boxplot with Individual Points",
       y = "Trust Score", x = "")

# Identify outliers numerically using the 1.5 * IQR rule
Q1 <- quantile(dataset$post_trust_score, 0.25)
Q3 <- quantile(dataset$post_trust_score, 0.75)
iqr <- Q3 - Q1   # avoid naming this IQR, which would mask the built-in IQR() function

lower_bound <- Q1 - 1.5 * iqr
upper_bound <- Q3 + 1.5 * iqr

outliers <- dataset$post_trust_score[dataset$post_trust_score < lower_bound | 
                                     dataset$post_trust_score > upper_bound]
print(paste("Number of outliers:", length(outliers)))
## [1] "Number of outliers: 0"
if(length(outliers) > 0) {
  print(paste("Outlier values:", paste(outliers, collapse = ", ")))
}

# Check for normality: Shapiro-Wilk test
shapiro.test(dataset$post_trust_score)
## 
##  Shapiro-Wilk normality test
## 
## data:  dataset$post_trust_score
## W = 0.9697, p-value = 0.02097

The Shapiro-Wilk test indicates whether the data deviate significantly from normality; here p = 0.021, so normality is formally violated. However, with a sample of 100, the t-test remains reasonably robust: by the Central Limit Theorem, the sampling distribution of the mean is approximately normal for samples this large, even when the underlying data are not. The histogram and Q-Q plot below help you judge how severe the deviation is.

# Check for normality: Histogram
ggplot(df, aes(x = scores)) +
  geom_histogram(fill = "lightblue", color = "black", binwidth = 0.1) +
  labs(title = "Distribution of Post-Trust Scores",
       x = "Post-Trust Score",
       y = "Frequency") +
  theme_minimal()

# Check for normality: Q-Q Plot
ggplot(df, aes(sample = scores)) +
  stat_qq() +
  stat_qq_line(color = "red") +
  labs(title = "Q-Q Plot: Post-Trust Scores",
       x = "Theoretical Quantiles",
       y = "Sample Quantiles") +
  theme_minimal()

# Perform one-sample t-test
# General form: t.test(variable, mu = hypothesized_mean)
# Example: Test if post-trust score differs from neutral value of 3
t.test(dataset$post_trust_score, mu = 3)
## 
##  One Sample t-test
## 
## data:  dataset$post_trust_score
## t = 16.139, df = 99, p-value < 2.2e-16
## alternative hypothesis: true mean is not equal to 3
## 95 percent confidence interval:
##  3.520095 3.665905
## sample estimates:
## mean of x 
##     3.593
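
If the Shapiro-Wilk result leaves you uncomfortable relying on the Central Limit Theorem, a non-parametric fallback for the one-sample case is the Wilcoxon signed-rank test against the same hypothesized value:

# Non-parametric alternative to the one-sample t-test
wilcox.test(dataset$post_trust_score, mu = 3)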

Independent-Sample T-Test

An independent-sample t-test compares the means of two independent groups.

# General form: t.test(variable ~ group, data = dataset)

# Method 1: Using subsetting
t.test(dataset[which(dataset$technical_background == "Yes"), ]$iuipc_score,
       dataset[which(dataset$technical_background == "No"), ]$iuipc_score)
## 
##  Welch Two Sample t-test
## 
## data:  dataset[which(dataset$technical_background == "Yes"), ]$iuipc_score and dataset[which(dataset$technical_background == "No"), ]$iuipc_score
## t = -4.1429, df = 97.782, p-value = 7.291e-05
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -0.6596687 -0.2323681
## sample estimates:
## mean of x mean of y 
##  2.927451  3.373469
# Method 2: Using formula notation (recommended)
t.test(iuipc_score ~ technical_background, data = dataset)
## 
##  Welch Two Sample t-test
## 
## data:  iuipc_score by technical_background
## t = 4.1429, df = 97.782, p-value = 7.291e-05
## alternative hypothesis: true difference in means between group No and group Yes is not equal to 0
## 95 percent confidence interval:
##  0.2323681 0.6596687
## sample estimates:
##  mean in group No mean in group Yes 
##          3.373469          2.927451

Assumptions:

  1. Normality of distributions in each group
  2. No significant outliers in each group
  3. Similar variances (homogeneity of variance)

We have already covered the code to check assumptions 1 and 2. Now let’s look at how to evaluate assumption 3.

# Levene's Test for Homogeneity of Variance
# General form: leveneTest(variable ~ group, data = dataset)
leveneTest(iuipc_score ~ technical_background, data = dataset)
## Warning in leveneTest.default(y = y, group = group, ...): group coerced to
## factor.
## Levene's Test for Homogeneity of Variance (center = median)
##       Df F value Pr(>F)
## group  1  0.1948 0.6599
##       98

Because the p-value (0.6599) is greater than 0.05, we do not have enough evidence to reject the null hypothesis of equal variances. We can therefore treat the two groups' variances as similar, and the homogeneity-of-variance assumption is met.
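
Note that t.test() runs Welch's t-test by default, which does not assume equal variances. Because Levene's test is non-significant here, you could also run the classic pooled-variance (Student's) t-test by setting var.equal = TRUE:

# Pooled-variance (Student's) t-test, justified by the non-significant Levene's test
t.test(iuipc_score ~ technical_background, data = dataset, var.equal = TRUE)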

Paired-Sample T-Test

A paired-sample t-test compares two related measurements (e.g., pre-test and post-test scores).

# General form: t.test(variable1, variable2, paired = TRUE)
t.test(dataset$pre_trust_score, dataset$post_trust_score, paired = TRUE)
## 
##  Paired t-test
## 
## data:  dataset$pre_trust_score and dataset$post_trust_score
## t = -1.4693, df = 99, p-value = 0.1449
## alternative hypothesis: true mean difference is not equal to 0
## 95 percent confidence interval:
##  -0.25619657  0.03819657
## sample estimates:
## mean difference 
##          -0.109
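
The key normality assumption for the paired t-test concerns the paired differences, not the raw scores, so check the differences directly:

# Assumption check: normality of the paired differences
trust_diff <- dataset$post_trust_score - dataset$pre_trust_score
shapiro.test(trust_diff)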

ANOVA: Analysis of Variance

T-tests can only be applied when we compare two conditions. If we have three or more conditions to compare, we need to use ANOVA.

Let’s test whether there is a difference in participants’ IUIPC scores based on their usage frequency of AI.

# General form: aov(dependent_variable ~ independent_variable, data = dataset)
anova_model <- aov(iuipc_score ~ usage_frequency, data = dataset)
summary(anova_model)
##                 Df Sum Sq Mean Sq F value Pr(>F)    
## usage_frequency  3 26.046   8.682   114.1 <2e-16 ***
## Residuals       96  7.302   0.076                   
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
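
Before interpreting the ANOVA, check its assumptions: normality of the model residuals and homogeneity of variance across groups (a quick sketch):

# ANOVA assumption checks
shapiro.test(residuals(anova_model))                       # normality of residuals
leveneTest(iuipc_score ~ usage_frequency, data = dataset)  # homogeneity of variance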

The ANOVA tells us whether there is a significant difference among the groups, but it does not tell us which specific groups differ from each other. To identify these differences, we need to conduct post-hoc analyses.

Post-Hoc Tests

We have three main options for post-hoc analysis:

  1. Tukey’s Honestly Significant Difference (HSD): Comparing all groups to each other
  2. Dunnett’s Test: Comparing all groups to a reference group (control condition)
  3. Multiple Comparisons with Corrections: Pairwise comparisons with adjusted p-values

# Tukey's HSD Test
TukeyHSD(anova_model)
##   Tukey multiple comparisons of means
##     95% family-wise confidence level
## 
## Fit: aov(formula = iuipc_score ~ usage_frequency, data = dataset)
## 
## $usage_frequency
##                      diff        lwr        upr    p adj
## Monthly-Daily   0.9237548  0.7073760  1.1401336 0.00e+00
## Rarely-Daily    1.4031199  1.1964975  1.6097422 0.00e+00
## Weekly-Daily    0.5136853  0.3288045  0.6985662 0.00e+00
## Rarely-Monthly  0.4793651  0.2477390  0.7109911 2.70e-06
## Weekly-Monthly -0.4100694 -0.6225283 -0.1976106 1.26e-05
## Weekly-Rarely  -0.8894345 -1.0919482 -0.6869209 0.00e+00
# Dunnett's Test
# General form: DunnettTest(dependent_variable ~ group_variable, data = dataset, control = "control_group")
DunnettTest(iuipc_score ~ usage_frequency, data = dataset, control = "Rarely")
## 
##   Dunnett's test for comparing several treatments with a control :  
##     95% family-wise confidence level
## 
## $Rarely
##                      diff     lwr.ci     upr.ci    pval    
## Daily-Rarely   -1.4031199 -1.5909608 -1.2152789 < 2e-16 ***
## Monthly-Rarely -0.4793651 -0.6899369 -0.2687932 8.3e-07 ***
## Weekly-Rarely  -0.8894345 -1.0735402 -0.7053288 < 2e-16 ***
## 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
# Multiple Comparisons with Different Corrections

# 1. No correction
pairwise.t.test(dataset$iuipc_score, dataset$usage_frequency, 
                p.adjust.method = "none")
## 
##  Pairwise comparisons using t tests with pooled SD 
## 
## data:  dataset$iuipc_score and dataset$usage_frequency 
## 
##         Daily   Monthly Rarely 
## Monthly < 2e-16 -       -      
## Rarely  < 2e-16 4.6e-07 -      
## Weekly  9.9e-11 2.1e-06 < 2e-16
## 
## P value adjustment method: none
# 2. Bonferroni correction (most conservative)
# Formula: p_adjusted = p_raw × number_of_comparisons
pairwise.t.test(dataset$iuipc_score, dataset$usage_frequency, 
                p.adjust.method = "bonferroni")
## 
##  Pairwise comparisons using t tests with pooled SD 
## 
## data:  dataset$iuipc_score and dataset$usage_frequency 
## 
##         Daily   Monthly Rarely 
## Monthly < 2e-16 -       -      
## Rarely  < 2e-16 2.8e-06 -      
## Weekly  5.9e-10 1.3e-05 < 2e-16
## 
## P value adjustment method: bonferroni
# 3. Holm correction (less conservative than Bonferroni)
# Adjusts p-values sequentially based on ranking
pairwise.t.test(dataset$iuipc_score, dataset$usage_frequency, 
                p.adjust.method = "holm")
## 
##  Pairwise comparisons using t tests with pooled SD 
## 
## data:  dataset$iuipc_score and dataset$usage_frequency 
## 
##         Daily   Monthly Rarely 
## Monthly < 2e-16 -       -      
## Rarely  < 2e-16 9.2e-07 -      
## Weekly  3.0e-10 2.1e-06 < 2e-16
## 
## P value adjustment method: holm

Non-Parametric Tests

Non-parametric tests are used when the assumptions of parametric tests (e.g., normality) are violated. Here are the non-parametric alternatives to common parametric tests:

Parametric Test                        Non-Parametric Alternative
Independent-sample t-test              Mann-Whitney U test
Paired t-test                          Wilcoxon signed-rank test
One-way ANOVA (independent groups)     Kruskal-Wallis test
Repeated measures ANOVA                Friedman test

# (1) Independent sample t-test → Mann-Whitney U test
# General form: wilcox.test(dependent_var ~ grouping_var, data = dataset)
wilcox.test(iuipc_score ~ technical_background, data = dataset)

# (2) Paired t-test → Wilcoxon signed-rank test
# General form: wilcox.test(measure1, measure2, paired = TRUE)
wilcox.test(dataset[dataset$participant_id < 21, ]$pre_trust_score,
            dataset[dataset$participant_id < 21, ]$post_trust_score,
            paired = TRUE)

# (3) Repeated measures one-way ANOVA → Friedman test
# General form: friedman.test(as.matrix(dataset[, c("condition1", "condition2", "condition3")]))
# Illustrative only: measure1-measure3 are placeholder columns that do not exist
# in this dataset, so this call will error if run as-is
friedman.test(as.matrix(dataset[, c("measure1", "measure2", "measure3")]))
# (4) One-way ANOVA on 3+ independent groups → Kruskal-Wallis test
# General form: kruskal.test(dependent_var ~ grouping_var, data = dataset)
kruskal.test(iuipc_score ~ usage_frequency, data = dataset)
## 
##  Kruskal-Wallis rank sum test
## 
## data:  iuipc_score by usage_frequency
## Kruskal-Wallis chi-squared = 76.824, df = 3, p-value < 2.2e-16
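
Like the ANOVA, the Kruskal-Wallis test is an omnibus test: it tells us that at least one group differs, but not which. One non-parametric post-hoc option (a sketch using base R) is pairwise Wilcoxon rank-sum tests with a Holm correction:

# Post-hoc pairwise comparisons after a significant Kruskal-Wallis test
pairwise.wilcox.test(dataset$iuipc_score, dataset$usage_frequency,
                     p.adjust.method = "holm")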