Rmarkdown: Repeated-Measures MANOVA (Profile Analysis)

Introduction

Welcome to this R demo session! Here, I will demonstrate how to use R to conduct profile analysis.

Data description

Below is a hypothetical data set appropriate for using profile analysis as an alternative to repeated-measures ANOVA.

We’re exploring a hypothetical scenario: How much do Shakira, Donald Trump, and Dr. Phil value power, money, appearance, and intelligence? It’s important to note that the data set we’re examining, as well as the research question posed, are entirely fictional and created for illustrative purposes.

To answer this question, we have created five clones of each of these three people. Each clone has a chip implanted in its brain to measure its attitude toward power, money, appearance, and intelligence. Each clone was also placed in a simulation that resembled the real life of the original person, and its interest in power, money, appearance, and intelligence was recorded on a scale from 1 to 10.

power = c(3,4,4,2,2,8,9,9,8,10,6,7,5,6,6)
money = c(5,7,6,6,5,9,9,9,10,9,8,9,8,9,9)
appearance = c(7,8,8,7,8,2,2,3,2,1,6,7,7,8,7)
intelligence = c(8,7,8,7,7,1,2,2,2,1,8,8,7,9,8)

# For each person, we have five clones
names = rep(c("Shakira", "Donald Trump", "Dr. Phil"), c(5, 5, 5))

profile_data <- data.frame(names, power, money, appearance, intelligence)

Let’s visualize the data using a profile plot, which displays the mean scores by each person (IV) and by each attribute (DV) we are interested in.

Data visualization

library(ggplot2)
library(tidyverse)

## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr     1.1.4     ✔ readr     2.1.5
## ✔ forcats   1.0.0     ✔ stringr   1.5.1
## ✔ lubridate 1.9.3     ✔ tibble    3.2.1
## ✔ purrr     1.0.2     ✔ tidyr     1.3.1
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors

library(dplyr)

profile_data_long <- profile_data %>%
  pivot_longer(cols = c(power, money, appearance, intelligence), # Specify the columns to be pivoted into long format
               names_to = "Variable", # Rename the new column holding the original column names as "Variable"
               values_to = "Value") # Rename the new column holding the values from the original columns as "Value" 

ggplot(data = profile_data_long, aes(x = as.factor(Variable), 
                                     y = Value, group = names)) +
  geom_point(aes(color = names),position = position_jitter(width = .1), alpha = .08, size = 2) + # add individual data points
  stat_summary(fun = mean, 
               geom = "point", 
               aes(color = names)) + # add the mean as a point
  stat_summary(fun = mean, 
               geom = "line", 
               aes(color = names)) + # add the line between groups
  stat_summary(fun.data = mean_cl_boot, 
               geom = "errorbar", 
               width = 0.3, 
               aes(color = names)) +  # add error bars 
  labs(x = "Attribute",
       y = "Mean Scores") + # rename x- and y-axis 
  scale_color_brewer(palette = "Set1")

From the graph, we can easily see that the groups being compared seem to have distinct profiles. For example, money is the attribute rated highest by the Donald Trump clones and also by the Dr. Phil clones, whereas the Shakira clones score highest on appearance. Power, by contrast, is rated high by the Donald Trump clones but is the lowest-rated attribute for both the Shakira and the Dr. Phil clones (again, this is a totally hypothetical dataset).

From the graph, you can also see that the lines tend to cross each other. Even by eye, the profiles are clearly not parallel, and the lines are neither at the same level nor flat. In addition, within each group, the four attributes do not seem to be valued equally.

What the graph suggests is that the groups do not share the same level of interest in power, money, appearance, and intelligence, and that their interests do not follow the same pattern.
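The visual impression can be checked against the group means that the profile plot draws. The following is a minimal sketch in base R; the data frame name `scores` and the object `group_means` are hypothetical names introduced only for this illustration, and the data are retyped from the data description above.

```r
# Rebuild the data from the data description (same values as above)
power        <- c(3,4,4,2,2, 8,9,9,8,10, 6,7,5,6,6)
money        <- c(5,7,6,6,5, 9,9,9,10,9, 8,9,8,9,9)
appearance   <- c(7,8,8,7,8, 2,2,3,2,1, 6,7,7,8,7)
intelligence <- c(8,7,8,7,7, 1,2,2,2,1, 8,8,7,9,8)
names <- rep(c("Shakira", "Donald Trump", "Dr. Phil"), c(5, 5, 5))

# 'scores' is a hypothetical name, used so that profile_data is left untouched
scores <- data.frame(names, power, money, appearance, intelligence)

# Mean of every attribute within each group of clones
# (these are the points connected by the lines in the profile plot)
group_means <- aggregate(cbind(power, money, appearance, intelligence) ~ names,
                         data = scores, FUN = mean)
group_means
```

Reading across a row of `group_means` gives one group's profile; reading down a column compares the groups on a single attribute.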

Profile analysis

Now, you might say that we cannot just use a graph to judge whether the profiles are similar or not; we need statistics to do that. You are right, and this is where profile analysis comes into play.

Using the pbg() function from the profileR package, we can easily perform the profile analysis and test for parallelism, coincident profiles, and flatness.

#install.packages("profileR") # please install the package if you have not done so.
library(profileR)

## Loading required package: RColorBrewer

## Loading required package: reshape

## 
## Attaching package: 'reshape'

## The following object is masked from 'package:lubridate':
## 
##     stamp

## The following object is masked from 'package:dplyr':
## 
##     rename

## The following objects are masked from 'package:tidyr':
## 
##     expand, smiths

## Loading required package: lavaan

## This is lavaan 0.6-19
## lavaan is FREE software! Please report any bugs.

# Create a dataset without names as this will be used for the pbg() function.
profile_NoName <- profile_data %>%
  select(-names) 

model = pbg(profile_NoName, profile_data$names, original.names = T, profile.plot = T) # specifying profile.plot = T displays the profile plot

The profile plot generated by the pbg() function, while informative, is not ideally presented: the legend occupies too much space, which detracts from the overall clarity of the visualization. A plot produced with ggplot2, in contrast, offers a more refined presentation, making it a better option for displaying the data graphically.

Using summary(model) gives us more detailed results for the profile analysis.

summary(model)

## Call:
## pbg(data = profile_NoName, group = profile_data$names, original.names = T, 
##     profile.plot = T)
## 
## Hypothesis Tests:
## $`Ho: Profiles are parallel`
##   Multivariate.Test    Statistic  Approx.F num.df den.df      p.value
## 1             Wilks  0.008867296  32.06503      6     20 3.025510e-09
## 2            Pillai  1.521533407  11.66007      6     22 6.987849e-06
## 3  Hotelling-Lawley 51.958568149  77.93785      6     18 6.786168e-12
## 4               Roy 50.780651335 186.19572      3     11 1.044885e-09
## 
## $`Ho: Profiles have equal levels`
##             Df Sum Sq Mean Sq F value   Pr(>F)    
## group        2  10.68   5.338   26.14 4.23e-05 ***
## Residuals   12   2.45   0.204                     
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## $`Ho: Profiles are flat`
##          F df1 df2      p-value
## 1 54.87772   3  10 1.626441e-06

Parallelism

Parallelism is the main test of interest in a profile analysis by group because it examines whether each segment of the profile is the same in every group. A segment is the difference between the values of the same variable at two adjacent time points, or between two adjacent variables at a single time point. In other words, a segment is simply the slope of the line between the means of two adjacent variables. Parallelism is assessed using a one-way MANOVA on these segments. If the null hypothesis of parallelism is rejected, there is a significant interaction between group membership (groups of clones) and the variables (the four attributes), or between group membership and the time points (e.g., if a test is administered repeatedly). That is, the amount of increase or decrease between successive measurements differs for at least one of the groups.

In this example, all four test statistics for the test of parallelism (Wilks, Pillai, Hotelling-Lawley, and Roy) have p-values below the .05 level. We can therefore reject the null hypothesis, meaning that the profiles are not parallel. In other words, the three groups of clones do not value power, money, appearance, and intelligence in the same pattern.
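To make the idea of a segment concrete, the sketch below computes the mean segments for each group with a contrast matrix. It is an illustration in base R only, not the internal code of pbg(); the names `dv`, `C`, `segments`, and `segment_means` are hypothetical, and the data are retyped from the data description.

```r
# Data from the data description
power        <- c(3,4,4,2,2, 8,9,9,8,10, 6,7,5,6,6)
money        <- c(5,7,6,6,5, 9,9,9,10,9, 8,9,8,9,9)
appearance   <- c(7,8,8,7,8, 2,2,3,2,1, 6,7,7,8,7)
intelligence <- c(8,7,8,7,7, 1,2,2,2,1, 8,8,7,9,8)
names <- rep(c("Shakira", "Donald Trump", "Dr. Phil"), c(5, 5, 5))

dv <- cbind(power, money, appearance, intelligence)   # 15 x 4 matrix of scores

# Contrast matrix: each column defines one segment
# (difference between two adjacent attributes)
C <- cbind(PM = c(1, -1,  0,  0),    # power - money
           MA = c(0,  1, -1,  0),    # money - appearance
           AI = c(0,  0,  1, -1))    # appearance - intelligence

segments <- dv %*% C                 # 15 x 3 matrix of difference scores

# Mean segment per group: group sums divided by group sizes
# (rowsum() and table() both order the groups alphabetically)
segment_means <- rowsum(segments, names) / as.vector(table(names))
segment_means
```

Under parallelism, the three rows of `segment_means` would be equal up to sampling error. Here the MA segment alone ranges from -1.8 (Shakira) to 7.2 (Donald Trump), which is the kind of discrepancy the MANOVA picks up.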

Coincidental profiles

If the profiles are parallel, one typically tests equality of the levels, which examines whether the profiles coincide (i.e., there are no group differences). This test determines whether at least one group scored higher than the other groups, on average, across all the variables or time points. To evaluate this, the mean across all time points or variables is calculated for each subject, and these means are then compared across groups. Since all of the time points or variables are collapsed into a single mean, the procedure becomes a univariate test of the between-groups main effect in a one-way ANOVA. The test simply compares the between-group variation with the within-group variation, which together make up the total sum of squares. If the group levels are significantly different from one another, the null hypothesis of equal levels is rejected. That is, at least one group scores significantly higher or lower than the other groups on the average of the p variables.

From the results, the p-value for the one-way ANOVA is below the significance level, suggesting that the mean score, averaged over the four attributes, differs across the three groups of clones.
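The sums of squares in the equal-levels table can be reproduced by hand, which shows how the between-group and within-group variation enter the F ratio. This is a minimal sketch in base R; all object names (`person_mean`, `ss_between`, and so on) are hypothetical and the data are retyped from the data description.

```r
# Data from the data description
power        <- c(3,4,4,2,2, 8,9,9,8,10, 6,7,5,6,6)
money        <- c(5,7,6,6,5, 9,9,9,10,9, 8,9,8,9,9)
appearance   <- c(7,8,8,7,8, 2,2,3,2,1, 6,7,7,8,7)
intelligence <- c(8,7,8,7,7, 1,2,2,2,1, 8,8,7,9,8)
names <- rep(c("Shakira", "Donald Trump", "Dr. Phil"), c(5, 5, 5))

# Collapse the four attributes into one mean per clone
person_mean <- rowMeans(cbind(power, money, appearance, intelligence))

group_mean <- tapply(person_mean, names, mean)     # level of each profile
group_n    <- tapply(person_mean, names, length)   # clones per group
grand_mean <- mean(person_mean)

# Between-group and within-group sums of squares
ss_between <- sum(group_n * (group_mean - grand_mean)^2)
ss_within  <- sum((person_mean - group_mean[names])^2)

df_between <- length(group_mean) - 1                    # 3 groups - 1
df_within  <- length(person_mean) - length(group_mean)  # 15 clones - 3 groups

f_value <- (ss_between / df_between) / (ss_within / df_within)
p_value <- pf(f_value, df_between, df_within, lower.tail = FALSE)

round(c(SS_between = ss_between, SS_within = ss_within, F = f_value), 3)
p_value
```

The values agree with the `group` and `Residuals` rows of the equal-levels table in the summary() output (10.68, 2.45, F = 26.14).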

Note that if the variables are not measured on a comparable scale, it’s important to standardize them to z-scores before advancing with the analysis.
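As an illustration of that standardization step, base R's scale() converts each column to z-scores. In this data set all four attributes already share the same 1 to 10 scale, so the step is shown only as a sketch; `z_scores` is a hypothetical name.

```r
# Data from the data description
power        <- c(3,4,4,2,2, 8,9,9,8,10, 6,7,5,6,6)
money        <- c(5,7,6,6,5, 9,9,9,10,9, 8,9,8,9,9)
appearance   <- c(7,8,8,7,8, 2,2,3,2,1, 6,7,7,8,7)
intelligence <- c(8,7,8,7,7, 1,2,2,2,1, 8,8,7,9,8)

# scale() centers each column at 0 and divides by its standard deviation
z_scores <- scale(cbind(power, money, appearance, intelligence))

round(colMeans(z_scores), 10)   # every column mean is 0
apply(z_scores, 2, sd)          # every column standard deviation is 1
```

The standardized columns could then be passed to the analysis in place of the raw scores.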

Test of flatness

Flatness is a measure of the extent to which the profiles are flat within any group (i.e., there are no differences in the average values of the variables or the average value of a single variable measured across multiple time points), given that the profiles are parallel. The null hypothesis of flatness is that the segments are 0. This is the analog to the profile analysis for one sample except for multiple groups or repeated measurements.

Note that this question is typically relevant only if the profiles are parallel. If the profiles are not parallel, then at least one of them is necessarily not flat. Although it is conceivable that nonflat profiles from two or more groups could cancel each other out to produce, on average, a flat profile, this result is often not of research interest.

In our example, the null hypothesis for the test of flatness is that the Shakira clones, Donald Trump clones, and Dr. Phil clones each value the four attributes about equally. Given the p-value of the test (p < .001), the null hypothesis is rejected.

Profile analysis from scratch

The pbg() function is quite useful for those with a grasp of profile analysis, as it automates the process of testing for parallelism, coincident profiles, and flatness. However, to deepen your understanding, it’s beneficial to know what occurs statistically during these tests. For this reason, I will guide you through the profile analysis process using fundamental R code, explaining each step to ensure clarity and enhance your comprehension of the underlying methodology.

Parallelism test: using difference scores

To perform the parallelism test from scratch, we need to compute the difference scores between adjacent DVs, because the test of parallelism is essentially a one-way MANOVA on these difference scores.

attach(profile_data)

## The following objects are masked _by_ .GlobalEnv:
## 
##     appearance, intelligence, money, names, power

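# attach(profile_data) lets us reference the columns directly, without prefixing each one with the dataset name.
# As the message above shows, vectors with the same names already exist in the global environment and take precedence, so attach() is a convenience here rather than a necessity.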

PM = power - money
MA = money - appearance
AI = appearance - intelligence
diff = data.frame(names, PM, MA, AI)
diff

##           names PM MA AI
## 1       Shakira -2 -2 -1
## 2       Shakira -3 -1  1
## 3       Shakira -2 -2  0
## 4       Shakira -4 -1  0
## 5       Shakira -3 -3  1
## 6  Donald Trump -1  7  1
## 7  Donald Trump  0  7  0
## 8  Donald Trump  0  6  1
## 9  Donald Trump -2  8  0
## 10 Donald Trump  1  8  0
## 11     Dr. Phil -2  2 -2
## 12     Dr. Phil -2  2 -1
## 13     Dr. Phil -3  1  0
## 14     Dr. Phil -3  1 -1
## 15     Dr. Phil -3  2 -1

detach(profile_data)

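# One-way MANOVA on the difference scores of the DVs
# (manova() is part of base R's stats package; loading MASS masks dplyr::select(), as the message below shows)
library(MASS)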

## 
## Attaching package: 'MASS'

## The following object is masked from 'package:dplyr':
## 
##     select

options(scipen=999) # control the penalty for displaying numbers in scientific notation (R will favor regular fixed notation when printing numbers)

model = manova(cbind(PM, MA, AI) ~ factor(names))

summary(model, test="Wilks")

##               Df     Wilks approx F num Df den Df         Pr(>F)    
## factor(names)  2 0.0088673   32.065      6     20 0.000000003026 ***
## Residuals     12                                                    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

summary(model, test="Pillai")

##               Df Pillai approx F num Df den Df      Pr(>F)    
## factor(names)  2 1.5215    11.66      6     22 0.000006988 ***
## Residuals     12                                              
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

summary(model, test="Hotelling")

##               Df Hotelling-Lawley approx F num Df den Df            Pr(>F)    
## factor(names)  2           51.959   77.938      6     18 0.000000000006786 ***
## Residuals     12                                                              
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

summary(model, test="Roy")

##               Df    Roy approx F num Df den Df         Pr(>F)    
## factor(names)  2 50.781    186.2      3     11 0.000000001045 ***
## Residuals     12                                                 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
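To see where the Wilks statistic comes from, it can be rebuilt from the within-group and total sums-of-squares-and-cross-products (SSCP) matrices of the difference scores, using the standard definition of Wilks' lambda as det(W) / det(W + B). This is a sketch in base R, not the internal code of manova(); the names `diffs`, `W`, `T_total`, and `wilks` are hypothetical.

```r
# Data from the data description
power        <- c(3,4,4,2,2, 8,9,9,8,10, 6,7,5,6,6)
money        <- c(5,7,6,6,5, 9,9,9,10,9, 8,9,8,9,9)
appearance   <- c(7,8,8,7,8, 2,2,3,2,1, 6,7,7,8,7)
intelligence <- c(8,7,8,7,7, 1,2,2,2,1, 8,8,7,9,8)
group <- factor(rep(c("Shakira", "Donald Trump", "Dr. Phil"), c(5, 5, 5)))

# Difference scores between adjacent attributes (the segments)
diffs <- cbind(PM = power - money,
               MA = money - appearance,
               AI = appearance - intelligence)

# Replace every score by the mean of its own group, column by column
fitted_means <- apply(diffs, 2, function(x) ave(x, group))

# W: within-group (error) SSCP, deviations from the group means
W <- crossprod(diffs - fitted_means)

# T: total SSCP, deviations from the grand means (T = W + B)
T_total <- crossprod(sweep(diffs, 2, colMeans(diffs)))

# Wilks' lambda: share of generalized variance NOT explained by group
wilks <- det(W) / det(T_total)
wilks
```

A value this close to 0 means that almost all of the variation in the segments lies between the groups, consistent with the very small p-value reported for the Wilks test.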

Test of coincidental profiles: one-way ANOVA

From our lecture, we understand that to execute the test of coincidental profiles independently, we should carry out a one-way ANOVA. This involves using the average scores across the four focal attributes for each individual as the DV, and the categorization variable—the three clone groups—as the IV.

profile_data <- profile_data %>%
  rowwise() %>%
  # Create a new column 'mean' to store the row-wise mean
  mutate(mean = mean(c_across(c(power, money, appearance, intelligence)), na.rm = TRUE))

model <- aov(mean ~ names, data = profile_data)
summary(model)

##             Df Sum Sq Mean Sq F value    Pr(>F)    
## names        2  10.68   5.338   26.14 0.0000423 ***
## Residuals   12   2.45   0.204                      
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Test of flatness: one-sample Hotelling test

When we test the flatness of the profiles, we are interested in whether adjacent DVs differ from each other. We can restate this question as whether the mean difference score for each pair of adjacent DVs equals zero.

Again, this test is usually irrelevant if the test of parallelism is significant (showing an interaction between IV groups and DVs). However, we are still performing the test for demonstration purposes.

We use Hotelling's T² test to test flatness, applied separately to each group of clones. The null hypothesis for this test is that the mean vector of the difference scores (PM, MA, AI) is equal to the vector c(0, 0, 0), which represents no difference between adjacent dependent variables (i.e., the profile is flat).
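The one-sample statistic can also be worked out by hand, which clarifies how the degrees of freedom arise. The sketch below does this in base R for the five Shakira clones, using the usual F transformation of T²; it is an illustration rather than the internal code of any package, and the object names are hypothetical.

```r
# Scores of the five Shakira clones (first five values of each attribute)
power        <- c(3, 4, 4, 2, 2)
money        <- c(5, 7, 6, 6, 5)
appearance   <- c(7, 8, 8, 7, 8)
intelligence <- c(8, 7, 8, 7, 7)

# Difference scores between adjacent attributes
x <- cbind(PM = power - money,
           MA = money - appearance,
           AI = appearance - intelligence)

n <- nrow(x)          # 5 clones
p <- ncol(x)          # 3 difference scores
ybar <- colMeans(x)   # mean difference vector
S <- cov(x)           # sample covariance matrix

# Hotelling's T^2 for H0: mean vector = (0, 0, 0)
t2 <- n * sum(ybar * solve(S, ybar))

# F transformation with p and n - p degrees of freedom
f_stat  <- (n - p) / (p * (n - 1)) * t2
p_value <- pf(f_stat, p, n - p, lower.tail = FALSE)

c(T2 = t2, F = f_stat, df1 = p, df2 = n - p, p = p_value)
```

With only five clones per group, the second degrees of freedom are n - p = 2, so each per-group test has little power even though the mean differences are large.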

# install.packages("ICSNP")
library(ICSNP)

## Loading required package: mvtnorm

## Loading required package: ICS

Shakira = HotellingsT2(diff[1:5, 2:4], mu=c(0, 0, 0))
Shakira

## 
##  Hotelling's one sample T2-test
## 
## data:  diff[1:5, 2:4]
## T.2 = 28.525, df1 = 3, df2 = 2, p-value = 0.03406
## alternative hypothesis: true location is not equal to c(0,0,0)

Trump = HotellingsT2(diff[6:10, 2:4], mu=c(0, 0, 0))
Trump

## 
##  Hotelling's one sample T2-test
## 
## data:  diff[6:10, 2:4]
## T.2 = 187.67, df1 = 3, df2 = 2, p-value = 0.005305
## alternative hypothesis: true location is not equal to c(0,0,0)

Phil = HotellingsT2(diff[11:15, 2:4], mu=c(0, 0, 0))
Phil

## 
##  Hotelling's one sample T2-test
## 
## data:  diff[11:15, 2:4]
## T.2 = 81.833, df1 = 3, df2 = 2, p-value = 0.0121
## alternative hypothesis: true location is not equal to c(0,0,0)

From the results, the null hypothesis is rejected in all three groups (all p-values are below .05), meaning that the Shakira, Donald Trump, and Dr. Phil clones do not have flat profiles. In other words, none of the three groups values the four attributes equally.
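Note that the three per-group tests are not the same computation as the single flatness F (54.88 with 3 and 10 degrees of freedom) shown earlier in the summary() output. That value can be reproduced by testing whether the grand mean of the difference scores is zero, using the covariance matrix pooled within groups. The sketch below is a reconstruction in base R under the assumption of equal group sizes, not the package's own code; all object names are hypothetical.

```r
# Data from the data description
power        <- c(3,4,4,2,2, 8,9,9,8,10, 6,7,5,6,6)
money        <- c(5,7,6,6,5, 9,9,9,10,9, 8,9,8,9,9)
appearance   <- c(7,8,8,7,8, 2,2,3,2,1, 6,7,7,8,7)
intelligence <- c(8,7,8,7,7, 1,2,2,2,1, 8,8,7,9,8)
group <- factor(rep(c("Shakira", "Donald Trump", "Dr. Phil"), c(5, 5, 5)))

diffs <- cbind(PM = power - money,
               MA = money - appearance,
               AI = appearance - intelligence)

N <- nrow(diffs)      # 15 clones in total
g <- nlevels(group)   # 3 groups
q <- ncol(diffs)      # 3 segments (number of attributes minus 1)

# Pooled within-group covariance matrix of the difference scores
fitted_means <- apply(diffs, 2, function(x) ave(x, group))
S_pooled <- crossprod(diffs - fitted_means) / (N - g)

# Grand mean of the segments (equals the mean of the group means
# here because the groups are the same size)
grand_mean <- colMeans(diffs)

# T^2 for H0: all mean segments are zero, then its F transformation
t2   <- N * sum(grand_mean * solve(S_pooled, grand_mean))
df1  <- q
df2  <- N - g - q + 1
f_stat  <- df2 / ((N - g) * q) * t2
p_value <- pf(f_stat, df1, df2, lower.tail = FALSE)

c(F = f_stat, df1 = df1, df2 = df2, p = p_value)
```

The pooled test asks whether the profile averaged over the groups is flat, whereas the per-group tests ask the question for each group separately. Both lead to the same conclusion in this example.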

Closing remarks:

You may decide whether to conduct a post-hoc analysis depending on the research question and/or the MANOVA and ANOVA results obtained from the profile analysis to identify the differences. For more details on post-hoc analysis, please refer back to the relevant sections in previous lectures.