Figure 1. Formative and Reflective Model

Introduction

We learned that there are two different “measurement models”: items either reflect the construct (reflective model) or form the construct (formative model). Although in reality the distinction is not always clear-cut, it is useful for some practical applications.

For example, if we have a unidimensional reflective model, we can build item pools, in which a large set of items is available to test students or patients.

The result of the test is still valid even if not all items are given to a student or a patient. This is used in schools, where a random sample of exam questions is drawn from the item pool every year, or, if a student needs to repeat an exam, a new sample can be drawn. Another application is computerized adaptive testing, where this property allows for more precise estimates of the true value.

In the reflective model, the items are, or should be, highly correlated.

In the formative model, the items do not need to be correlated, and items that are not highly correlated with the others cannot be dropped. Every item might be important, even when it is not correlated with the other items.

We will now do an exercise in R to get a better feeling for the difference between the reflective and the formative model.

Reflective model

Data

Exercise

First, we will simulate data that correspond to a data-structure from a reflective model. Copy the code below to create the data frame.

### first create the empty data frame ####
set.seed(1234) # With set.seed() we will have reproducible results 
number_of_items <- 10
number_of_observations <- 100
number_of_items_plus_2 <- number_of_items+2

empty.df <- as.data.frame(matrix(0, ncol=number_of_items, nrow=number_of_observations, dimnames=list(NULL, 
                                                        paste0('item_',1:number_of_items))))

# generate the true value of the construct ####
library(tidyverse)
data<-empty.df %>% 
  mutate(true_value=rnorm(nrow(.), 50, 15)) %>% 
  select(true_value, everything())

# hist(data$true_value)
# psych::describe(data$true_value)

# fill the empty data frame ####
data<-data %>% 
  mutate(across(starts_with("item_"),~ true_value+rnorm(number_of_observations, 0, 8))) %>% 
  mutate(id=1:number_of_observations) %>% 
  select(id, everything())

  1. How many rows (observations) and columns (variables) does the data frame have?
  2. Get some descriptives (e.g. via the summary() function or the describe() function from the psych package).
  3. Create a histogram of the variable true_value.

Solution

  1. How many rows (observations) and columns (variables) does the data frame have?
dim(data)
## [1] 100  12

There are 100 observations and 12 variables (the 10 items, the true values and the id).


  2. Get some descriptives (e.g. via the summary() function or the describe() function from the psych package).
library(psych)
describe(data)
##            vars   n  mean    sd median trimmed   mad   min    max  range skew
## id            1 100 50.50 29.01  50.50   50.50 37.06  1.00 100.00  99.00 0.00
## true_value    2 100 47.65 15.07  44.23   46.55 14.22 14.81  88.23  73.42 0.59
## item_1        3 100 47.98 17.00  45.75   47.77 16.36 10.79  92.34  81.55 0.19
## item_2        4 100 48.89 17.46  47.78   48.19 17.69 -8.57  97.45 106.02 0.19
## item_3        5 100 47.58 18.26  45.50   46.93 18.69  4.38 103.54  99.16 0.39
## item_4        6 100 47.47 16.96  45.51   46.46 16.05  7.62  95.64  88.02 0.52
## item_5        7 100 46.55 17.76  45.47   45.85 16.84 11.79  96.71  84.92 0.41
## item_6        8 100 46.95 16.84  45.38   45.76 15.65 14.21  92.59  78.38 0.57
## item_7        9 100 47.64 17.43  46.28   47.06 17.40  4.96  99.38  94.41 0.33
## item_8       10 100 47.79 16.13  43.86   46.47 15.03 20.39  92.29  71.89 0.68
## item_9       11 100 47.11 17.82  45.16   46.81 18.35  7.49  88.97  81.48 0.18
## item_10      12 100 46.97 16.96  44.44   46.17 18.71 17.60  90.24  72.64 0.43
##            kurtosis   se
## id            -1.24 2.90
## true_value    -0.02 1.51
## item_1        -0.27 1.70
## item_2         0.59 1.75
## item_3        -0.10 1.83
## item_4         0.27 1.70
## item_5        -0.25 1.78
## item_6        -0.04 1.68
## item_7        -0.03 1.74
## item_8        -0.20 1.61
## item_9        -0.41 1.78
## item_10       -0.70 1.70

  3. Create a histogram of the variable true_value.
hist(data$true_value)


Correlations

Exercise

Have a look at how the 10 items in the data frame are correlated. You can use the pairs.panels() function from the psych package to check the pairwise correlations for all 10 items. What do you see from the correlation matrix?


Solution

pairs.panels(data[3:12])

The 10 items are highly correlated. For example, the correlation between Item 2 and Item 1 is 0.81, and the correlation between Item 4 and Item 6 is 0.76.
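The size of these correlations follows from how the data were simulated: each item is the true value (SD 15) plus independent noise (SD 8), so the expected correlation between any two items is 15² / (15² + 8²) ≈ 0.78. A minimal sketch in base R that checks this; the variable names, the seed, and the larger sample size of 10,000 are assumptions chosen only for illustration and are not part of the exercise.

```r
# Expected correlation between two items that share the same true value
sd_true  <- 15  # SD of the true value, as in the simulation above
sd_error <- 8   # SD of the item-specific noise, as in the simulation above
expected_r <- sd_true^2 / (sd_true^2 + sd_error^2)
round(expected_r, 2)

# Empirical check with a larger sample (n and the seed are assumptions)
set.seed(1)
n <- 10000
true_value <- rnorm(n, 50, sd_true)
item_a <- true_value + rnorm(n, 0, sd_error)
item_b <- true_value + rnorm(n, 0, sd_error)
observed_r <- cor(item_a, item_b)
round(observed_r, 2)
```

With only 100 observations, as in the exercise, the observed correlations scatter around this expected value.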


Mean scores of randomly sampled items in a reflective model

Copy and run the code below. This will create a new data frame called sampled_items_1 with a random selection of five out of the ten items and the mean of these five items.

sampled_items_1 <- data[,sample(3:number_of_items_plus_2, 5)]
sampled_items_1 <- sampled_items_1 %>% 
  mutate(mean=rowMeans(.)) %>% 
  mutate(id=1:number_of_observations)

Caution: Because sample() draws the items at random, you will get different results each time you run this code.
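If you want to obtain the same selection on every run, one option is to set a seed immediately before the call to sample(). A minimal sketch; the seed value 99 and the object names are arbitrary choices for illustration and are not part of the exercise.

```r
# Setting the same seed before each draw reproduces the same selection
set.seed(99)
first_draw <- sample(3:12, 5)   # column positions of the 10 items

set.seed(99)
second_draw <- sample(3:12, 5)

identical(first_draw, second_draw)  # TRUE
```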

Exercise

  1. Which 5 items were selected for the data frame sampled_items_1?
  2. Create another data frame with 5 sampled items (sampled_items_2) and calculate the mean of these 5 items for each person. (Just copy the code above but replace sampled_items_1 with sampled_items_2.) Which 5 items were selected for the data frame sampled_items_2?
  3. Create a scatterplot of the mean values from sampled_items_1 and sampled_items_2. What do you see?
  4. Confirm your impression from the scatterplot by calculating the correlation between the mean values from sampled_items_1 and sampled_items_2. What is your conclusion?

Solution

Caution: your results may differ slightly because of the sample() function.

  1. Which five items were selected for the data frame sampled_items_1?
names(sampled_items_1)
## [1] "item_8"  "item_10" "item_4"  "item_7"  "item_1"  "mean"    "id"
  2. Create another data frame with 5 sampled items (sampled_items_2) and calculate the mean of these 5 items for each person. (Just copy the code above but replace sampled_items_1 with sampled_items_2.) Which 5 items were selected for the data frame sampled_items_2?
sampled_items_2 <- data[,sample(3:number_of_items_plus_2, 5)]
sampled_items_2 <- sampled_items_2 %>% 
  mutate(mean=rowMeans(.)) %>% 
  mutate(id=1:number_of_observations)
names(sampled_items_2)
## [1] "item_2" "item_8" "item_4" "item_3" "item_7" "mean"   "id"

  3. Create a scatterplot of the mean values from sampled_items_1 and sampled_items_2. What do you see?
plot(sampled_items_1$mean, sampled_items_2$mean)

It seems that the mean scores for sampled_items_1 and sampled_items_2 are almost the same…


  4. Confirm your impression from the scatterplot by calculating the correlation between the mean values from sampled_items_1 and sampled_items_2. What is your conclusion?
cor.test(sampled_items_1$mean, sampled_items_2$mean)
## 
##  Pearson's product-moment correlation
## 
## data:  sampled_items_1$mean and sampled_items_2$mean
## t = 53.389, df = 98, p-value < 2.2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  0.9751494 0.9887124
## sample estimates:
##       cor 
## 0.9832406

We can see that there is a high correlation (almost 1) between the scores estimated from item set 1 and item set 2. The items are almost exchangeable. Therefore, it is permissible to take a random sample of the items.

One could also look at the differences between the means from sampled_items_1 and sampled_items_2:

boxplot(sampled_items_1$mean - sampled_items_2$mean, main = "Differences between the mean scores (0 to 100)")

The differences are close to zero.

To be fair: in high-stakes settings, a larger number of items should be selected. In education, for example, up to 100 items would be selected to “guarantee” a fair comparison between the item sets.
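Why more items help can be sketched with a small calculation. Assuming the simulation settings used above (true value with SD 15, item noise with SD 8) and two item sets that do not overlap, the mean of k items carries noise with variance 8² / k, so the expected correlation between the two mean scores is 15² / (15² + 8² / k). The numbers of items per set below are arbitrary illustration values.

```r
# Expected correlation between the mean scores of two non-overlapping item sets
sd_true  <- 15                      # SD of the true value (from the simulation)
sd_error <- 8                       # SD of the item noise (from the simulation)
items_per_set <- c(1, 5, 20, 100)   # illustrative set sizes (assumption)

expected_r <- sd_true^2 / (sd_true^2 + sd_error^2 / items_per_set)
data.frame(items_per_set, expected_r = round(expected_r, 3))
```

Under these assumptions, the expected correlation is about 0.95 for five items per set and approaches 1 as the sets grow. Items shared between the two sets raise the correlation further.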



Formative model

Data

Exercise

  1. To illustrate a formative model, we simulate items that form a quality-of-life score. Copy the code below to create the data. How many observations and variables does the data frame contain?
response_options<-c(1,2,3,4,5)
number_of_observation=100
number_of_items=10

set.seed(12345) # with set.seed() we will have reproducible results. 
satisfied_health<-sample(response_options, number_of_observation, replace=TRUE)

pain_prevents_doing<-sample(response_options, number_of_observation, replace=TRUE)

get_support<-sample(response_options, number_of_observation, replace=TRUE)

feel_safe<-sample(response_options, number_of_observation, replace=TRUE)

enough_energy<-sample(response_options, number_of_observation, replace=TRUE)

accept_bodily_appearance<-sample(response_options, number_of_observation, replace=TRUE)

enough_money<-sample(response_options, number_of_observation, replace=TRUE)

satisfied_sleep<-sample(response_options, number_of_observation, replace=TRUE)

satisfied_work<-sample(response_options, number_of_observation, replace=TRUE)
 
satisfied_partnership<-sample(response_options, number_of_observation, replace=TRUE)


data_formative<-data.frame(satisfied_health,pain_prevents_doing,get_support,
                           enough_energy,accept_bodily_appearance,enough_money,
                           satisfied_sleep,satisfied_work,satisfied_partnership,
                           feel_safe)

Solution

How many observations and variables does the data frame contain?

dim(data_formative)
## [1] 100  10

The data frame data_formative contains 100 observations and 10 variables (the ten items).


Correlations

Exercise

Create a correlation matrix to evaluate the correlations between the items. What do you see?


Solution

psych::pairs.panels(data_formative)

All pairwise correlations are close to zero.


Mean scores of randomly sampled items in a formative model

Copy and run the code below. This will create a new data frame called random_items_1 with a random selection of five out of the ten items and the mean of these five items (stored in the column score).

random_items_1 <- data_formative[,sample(1:number_of_items, 5)]
random_items_1 <- random_items_1 %>% 
  mutate(score=rowMeans(.[1:5])) %>% 
  mutate(id=1:number_of_observation)

Caution: Because sample() draws the items at random, you will get different results each time you run this code.

Exercise

  1. Which 5 items were selected for the data frame random_items_1?
  2. Create another data frame random_items_2 with 5 sampled items and calculate the mean of these 5 items for each person. (Just copy the code above but replace random_items_1 with random_items_2.) Which 5 items were selected for the data frame random_items_2?
  3. Create a scatterplot of the mean values from random_items_1 and random_items_2. What do you see?
  4. Confirm your impression from the scatterplot by calculating the correlation between the mean values from random_items_1 and random_items_2. What is your conclusion?

Solution

  1. Which 5 items were selected for the data frame random_items_1?
names(random_items_1)
## [1] "get_support"              "accept_bodily_appearance"
## [3] "satisfied_partnership"    "enough_money"            
## [5] "satisfied_health"         "score"                   
## [7] "id"

  2. Create another data frame random_items_2 with 5 sampled items and calculate the mean of these 5 items for each person. (Just copy the code above but replace random_items_1 with random_items_2.) Which 5 items were selected for the data frame random_items_2?
random_items_2 <- data_formative[,sample(1:number_of_items, 5)]
random_items_2 <- random_items_2 %>% 
  mutate(score=rowMeans(.[1:5])) %>% 
  mutate(id=1:number_of_observation)
names(random_items_2)
## [1] "get_support"     "satisfied_work"  "satisfied_sleep" "enough_money"   
## [5] "feel_safe"       "score"           "id"

  3. Create a scatterplot of the mean values from random_items_1 and random_items_2. What do you see?
plot(random_items_1$score, random_items_2$score)

It seems that the correlation between the scores of the two item sets is low. This means that the scores from item set 1 differ from those from item set 2.


  4. Confirm your impression from the scatterplot by calculating the correlation between the mean values from random_items_1 and random_items_2. What is your conclusion?
cor.test(random_items_1$score, random_items_2$score)
## 
##  Pearson's product-moment correlation
## 
## data:  random_items_1$score and random_items_2$score
## t = 4.7621, df = 98, p-value = 6.63e-06
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  0.2591435 0.5804880
## sample estimates:
##       cor 
## 0.4334964

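The correlation is only moderate, and much of it arises because the two sets happen to share two items (get_support and enough_money). This shows that we are not allowed to randomly select items from a pool of items in a formative model.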
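How much two such scores correlate depends on how many items the two sets have in common. With mutually independent items of equal variance, the expected correlation between the mean scores of two 5-item sets is simply the number of shared items divided by 5. The sketch below illustrates this under stated assumptions: the helper cor_with_overlap() is hypothetical, and the seed and the sample size of 10,000 are chosen only to reduce sampling noise.

```r
set.seed(2024)   # arbitrary seed, chosen for illustration
n_obs   <- 10000 # larger than in the exercise to reduce sampling noise
n_items <- 10

# Ten mutually independent items with response options 1 to 5
items <- as.data.frame(
  replicate(n_items, sample(1:5, n_obs, replace = TRUE))
)

# Correlation between the mean scores of two 5-item sets that share
# `n_shared` items (hypothetical helper for this sketch)
cor_with_overlap <- function(n_shared) {
  set_1 <- 1:5
  set_2 <- (6 - n_shared):(10 - n_shared)
  cor(rowMeans(items[, set_1]), rowMeans(items[, set_2]))
}

overlap    <- 0:5
observed_r <- sapply(overlap, cor_with_overlap)
data.frame(overlap, expected_r = overlap / 5,
           observed_r = round(observed_r, 2))
```

Without any shared items the expected correlation is zero, so whatever agreement two random item sets show in a formative model comes from the overlap and not from a common underlying construct.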

One could also look at the differences between the means from random_items_1 and random_items_2:

boxplot(random_items_1$score - random_items_2$score, main = "Differences between the mean scores (1 to 5)")



Take home

In the reflective model, we are allowed to randomly sample from the item pool, as all items are highly intercorrelated. In the formative model, we are not allowed to do this, because the items are not highly correlated with each other: they contain information on different aspects of quality of life, and one person can have problems in one area (e.g. with money) but be completely satisfied in another. In the reflective model, the items co-vary because the construct is the “motor” that drives the responses to the items. Therefore, if you know the answers to some of the items, you can make good guesses about the responses to the other items.


  • In a reflective model, the items need to be highly correlated.
  • This does not need to be the case in a formative model.
  • Therefore, Cronbach’s alpha is not informative in a formative model (but should be high in a reflective model).
  • If you have a large item pool from a reflective model, you can take a random sample of items to generate a score.
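The point about Cronbach’s alpha can be illustrated with a short sketch. It computes alpha from the standard formula, alpha = k / (k - 1) × (1 - sum of item variances / variance of the sum score), for data simulated in the same way as in the two exercises. The helper cronbach_alpha(), the seed, and the object names are assumptions made for this illustration; in practice a packaged function such as psych::alpha() could be used instead.

```r
# Cronbach's alpha from its standard formula (hypothetical helper)
cronbach_alpha <- function(items) {
  k <- ncol(items)
  item_variances <- apply(items, 2, var)
  total_variance <- var(rowSums(items))
  k / (k - 1) * (1 - sum(item_variances) / total_variance)
}

set.seed(42)  # arbitrary seed, chosen for illustration
n_obs   <- 100
n_items <- 10

# Reflective structure: every item = true value + item-specific noise
true_value <- rnorm(n_obs, 50, 15)
reflective_items <- sapply(1:n_items,
                           function(i) true_value + rnorm(n_obs, 0, 8))

# Formative structure: independent items with response options 1 to 5
formative_items <- replicate(n_items, sample(1:5, n_obs, replace = TRUE))

alpha_reflective <- cronbach_alpha(reflective_items)
alpha_formative  <- cronbach_alpha(formative_items)
round(c(reflective = alpha_reflective, formative = alpha_formative), 2)
```

Under these assumptions alpha should be close to 1 for the reflective items and close to 0 (possibly slightly negative) for the formative items, although every formative item may still be indispensable for the score.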