LikertMakeR

Hume Winzar

December 2022

LikertMakeR

LikertMakeR synthesises and correlates Likert-scale and related rating-scale data. You decide the mean and standard deviation, and (optionally) the correlations among vectors, and the package will generate data with those predefined properties.

The package generates a column of values that simulates the properties of a rating scale. If multiple columns are generated, then you can use LikertMakeR to rearrange the values so that the new variables are correlated in accord with a user-predefined correlation matrix.

Purpose

The package should be useful for teaching in the Social Sciences, and for scholars who wish to “replicate” rating-scale data for further analysis and visualisation when only summary statistics have been reported.

I was prompted to write the functions in LikertMakeR after reviewing too many journal-article submissions in which authors presented questionnaire results with only means and standard deviations (often only the means), with no apparent understanding of the real distributions. Hopefully, this tool will help researchers, teachers, and reviewers think more carefully about rating-scale distributions, and about the effects of variance, boundaries, and the number of items in a scale.

Rating scale properties

A Likert scale is the mean, or sum, of several ordinal rating items. The items are bipolar (usually “agree-disagree”) responses to propositions that are determined to be moderately-to-highly correlated and that capture various facets of a construct.

Rating scales, such as Likert scales, are not continuous or unbounded.

For example, a 5-point Likert scale that is constructed with, say, five items (questions) will have a summed range of between 5 (all rated ‘1’) and 25 (all rated ‘5’) with all integers in between, and the mean range will be ‘1’ to ‘5’ with intervals of 1/5=0.20. A 7-point Likert scale constructed from eight items will have a summed range between 8 (all rated ‘1’) and 56 (all rated ‘7’) with all integers in between, and the mean range will be ‘1’ to ‘7’ with intervals of 1/8=0.125.
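For illustration, every possible mean score for the five-item, 5-point case can be listed with base R:

## all possible mean scores of a five-item, 5-point scale
seq(from = 1, to = 5, by = 1 / 5)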

Rating-scale boundaries define minima and maxima for any scale values. If the mean is close to one boundary then data points will cluster against that boundary, and the data will always be skewed.
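A small base-R illustration of this (not part of the package): a bounded 1-5 scale whose mean sits near the lower bound must produce a positively-skewed sample.

## responses on a 1-5 scale with the mean pushed toward the lower bound
x <- sample(1:5, 1000,
  replace = TRUE,
  prob = c(0.55, 0.25, 0.12, 0.05, 0.03)
)
mean(x) ## close to the lower bound of 1

## approximate moment skewness: positive, because the long tail
## points away from the nearby boundary
sum((x - mean(x))^3) / (length(x) * sd(x)^3)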

Alternative methods & packages

LikertMakeR is intended for synthesising and correlating rating-scale data with means, standard deviations, and correlations as close as possible to predefined parameters. If you don’t need your data to match those parameters closely, then other options may be faster or more flexible.

Different approaches include sampling with a predetermined probability distribution, as in the following base-R example, which produces roughly the expected proportions but makes no guarantee about the resulting mean or standard deviation:

     n <- 128
     sample(1:5, n, replace = TRUE,
       prob = c(0.1, 0.2, 0.4, 0.2, 0.1)
     )
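Another common approach, again approximate, is to discretise a latent normal variable. A minimal sketch, with cut-points assumed for illustration:

     ## discretise a latent normal variable into five ordered categories
     n <- 128
     latent <- rnorm(n, mean = 0, sd = 1)
     ## assumed cut-points; in practice, choose them to match the
     ## desired response proportions
     breaks <- c(-Inf, -1.5, -0.5, 0.5, 1.5, Inf)
     x <- as.integer(cut(latent, breaks)) ## values in 1..5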

Using LikertMakeR

Download and install LikertMakeR from GitHub:


library(devtools)

install_github("WinzarH/LikertMakeR")

# load the package
library(LikertMakeR)

Generate synthetic rating-scale data

To synthesise a rating scale with LikertMakeR, the user must input the following parameters:

  • n: sample size

  • mean: desired mean

  • sd: desired standard deviation

  • lowerbound: lower bound of the scale (e.g. ‘1’ for a 1-7 scale)

  • upperbound: upper bound of the scale (e.g. ‘7’ for a 1-7 scale)

  • items: number of items making up the scale (default = 1)

LikertMakeR offers two different functions for synthesising a rating scale: lfast() and lexact()

lfast()

  • lfast() draws a random sample from a scaled Beta distribution. It is very fast but does not guarantee exact mean and standard deviation. Recommended for relatively large sample sizes.
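A rough sketch of the idea (a method-of-moments illustration, not the package’s actual code): match the target mean and standard deviation to a Beta distribution, draw the sample, then rescale to the scale bounds and round to the scale’s granularity.

## target parameters, as in the example below
n <- 512
target_mean <- 4
target_sd <- 1
lowerbound <- 1
upperbound <- 7
items <- 5

## rescale target moments to the unit interval
mu <- (target_mean - lowerbound) / (upperbound - lowerbound)
sig <- target_sd / (upperbound - lowerbound)

## Beta shape parameters by the method of moments
## (requires sig^2 < mu * (1 - mu) for valid shapes)
a <- mu * (mu * (1 - mu) / sig^2 - 1)
b <- (1 - mu) * (mu * (1 - mu) / sig^2 - 1)

## draw, rescale to the bounds, and round to 1/items granularity
x <- rbeta(n, a, b)
x <- lowerbound + (upperbound - lowerbound) * x
x <- round(x * items) / items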

lfast() example

a five-item, seven-point Likert scale

## a five-item, seven-point Likert scale

x <- lfast(
  n = 512,
  mean = 4.0,
  sd = 1.0,
  lowerbound = 1,
  upperbound = 7,
  items = 5
)
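Because lfast() draws a random sample, the achieved moments will be close to, but usually not exactly, the requested values. A quick base-R check:

## confirm the achieved moments
mean(x)
sd(x)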

an 11-point likelihood-of-purchase scale

## an 11-point likelihood-of-purchase scale

x <- lfast(256, 2, 2, 0, 10, seed = 42)

lexact()

lexact() attempts to produce a vector with exact first and second moments. It uses the Differential Evolution algorithm in the DEoptim package to find appropriate values within the desired constraints.

If feasible, lexact() should produce data with moments that are correct to two decimal places. Infeasible cases occur when the requested standard deviation is too large for the combination of mean, number of items, and scale boundaries.
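As a back-of-envelope feasibility check (an illustration, not a package function): on a bounded scale, the standard deviation is largest when every response sits on one of the two boundaries.

## largest possible (population) sd for a given mean on a bounded
## scale, ignoring the 1/items granularity
max_sd <- function(mean, lowerbound, upperbound) {
  sqrt((mean - lowerbound) * (upperbound - mean))
}

max_sd(5.0, 1, 7) ## about 2.83, so sd = 1.0 in the example below is feasible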

lexact() example #1

a five-item, seven-point Likert scale

x <- lexact(
  n = 64,
  mean = 5.0,
  sd = 1.0,
  lowerbound = 1,
  upperbound = 7,
  items = 5
)
#> 
#> ***** summary of DEoptim object ***** 
#> best member   :  28 21 20 23 30 21 26 22 27 21 26 27 26 22 28 34 26 27 30 27 22 33 21 28 23 27 29 24 15 29 33 25 20 25 17 33 21 13 31 19 21 26 35 31 24 27 32 25 17 22 28 29 34 26 25 25 19 31 24 24 18 18 21 18 
#> best value    :  0.02519 
#> after         :  134 generations 
#> fn evaluated  :  86400 times 
#> *************************************

lexact() can take time to complete the optimisation task. For example, lexact() ran the above example with the following sample-size and time combinations on the author’s laptop (Windows 11, Intel i7-12700H, 2.30 GHz):

   n   seconds
  16      0.20
  32      0.37
  64      1.55
 128      7.03
 256     45.54

lexact() example #2

an 11-point likelihood-of-purchase scale, with a seed for reproducibility


x <- lexact(64, 2, 1.8, 0, 10, seed = 42)
#> 
#> ***** summary of DEoptim object ***** 
#> best member   :  3 1 0 5 1 1 0 1 4 2 0 2 2 4 1 2 2 6 0 5 3 4 4 1 4 1 0 4 1 0 2 2 2 8 1 1 4 0 2 2 0 2 3 2 1 3 2 0 1 3 0 1 0 1 4 0 7 0 1 2 1 1 3 2 
#> best value    :  0.87831 
#> after         :  110 generations 
#> fn evaluated  :  71040 times 
#> *************************************

lexact() example #3

a 7-point negative-to-positive scale with 4 items

x <- lexact(
  n = 64,
  mean = 1.2,
  sd = 1.00,
  lowerbound = -3,
  upperbound = 3,
  items = 4
)
#> 
#> ***** summary of DEoptim object ***** 
#> best member   :  10 8 0 8 12 4 7 7 3 7 6 9 1 1 6 6 2 12 0 8 2 7 7 7 9 -2 2 5 8 7 4 8 -3 -3 8 6 7 6 4 -1 8 8 2 2 4 11 -2 8 -5 5 1 5 8 4 7 10 5 9 0 1 10 0 2 -1 
#> best value    :  0.60951 
#> after         :  93 generations 
#> fn evaluated  :  60160 times 
#> *************************************

Correlating vectors of synthetic rating scales

LikertMakeR offers another function, lcor(), which rearranges the values in the columns of a data-frame so that they are correlated at a specified level. It does not change the values; it swaps their positions within each column, so that univariate statistics do not change but correlations with other vectors do.

lcor() systematically selects pairs of values in a column and swaps their places, and checks to see if this swap improves the correlation matrix. If the revised data-frame produces a correlation matrix closer to the target correlation matrix, then the swap is retained. Otherwise, the values are returned to their original places. This process is iterated across each column.
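A bare-bones sketch of this swap-and-keep idea (illustrative only; lcor()’s actual implementation differs), where dat is a data-frame of rating-scale values and tgt is the target correlation matrix:

## minimal sketch of the pairwise-swap idea behind lcor()
swap_correlate <- function(dat, tgt, passes = 10) {
  distance <- function(d) sum((cor(d) - tgt)^2)
  best <- distance(dat)
  n <- nrow(dat)
  for (pass in seq_len(passes)) {
    for (col in seq_len(ncol(dat))) {
      ## try swapping a random pair of values within this column
      ij <- sample(n, 2)
      dat[ij, col] <- dat[rev(ij), col]
      trial <- distance(dat)
      if (trial < best) {
        best <- trial ## keep the improving swap
      } else {
        dat[ij, col] <- dat[rev(ij), col] ## undo the swap
      }
    }
  }
  dat
}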

To create the desired correlated data, the user must define two objects:

  • a data-frame of synthetic rating-scale values, such as those generated above

  • a target correlation matrix

lcor() example #1

Let’s generate some data: three 5-point Likert scales, each made with five items.

generate uncorrelated synthetic data

n <- 32

# set.seed(42)

x1 <- lexact(n, 2.5, 0.75, 1, 5, 5)
#> 
#> ***** summary of DEoptim object ***** 
#> best member   :  12 8 13 9 10 14 9 13 13 8 12 14 6 22 15 11 12 10 8 15 20 14 14 19 16 14 16 11 12 6 14 10 
#> best value    :  0.69599 
#> after         :  55 generations 
#> fn evaluated  :  17920 times 
#> *************************************
x2 <- lexact(n, 3.0, 1.50, 1, 5, 5)
#> 
#> ***** summary of DEoptim object ***** 
#> best member   :  24 21 11 24 22 11 23 5 24 14 8 25 24 7 20 6 9 23 6 8 15 19 5 12 6 5 24 22 7 24 14 12 
#> best value    :  0.23398 
#> after         :  45 generations 
#> fn evaluated  :  14720 times 
#> *************************************
x3 <- lexact(n, 3.5, 1.00, 1, 5, 5)
#> 
#> ***** summary of DEoptim object ***** 
#> best member   :  21 22 17 21 20 16 11 22 13 20 15 21 15 22 21 6 22 18 16 12 24 11 21 14 16 22 20 11 6 25 22 17 
#> best value    :  0.10413 
#> after         :  5 generations 
#> fn evaluated  :  1920 times 
#> *************************************

mydat3 <- cbind(x1, x2, x3) |> data.frame()

The first ten observations from this data-frame are:

#>        x1  x2  x3
#> par1  2.4 4.8 4.2
#> par2  1.6 4.2 4.4
#> par3  2.6 2.2 3.4
#> par4  1.8 4.8 4.2
#> par5  2.0 4.4 4.0
#> par6  2.8 2.2 3.2
#> par7  1.8 4.6 2.2
#> par8  2.6 1.0 4.4
#> par9  2.6 4.8 2.6
#> par10 1.6 2.8 4.0

Mean values:

#>  x1  x2  x3 
#> 2.5 3.0 3.5

Standard deviations:

#>    x1    x2    x3 
#> 0.748 1.501 0.999

We can see that the data are close to what is expected. The synthetic data have low correlations:

#>       x1    x2   x3
#> x1  1.00 -0.34 0.10
#> x2 -0.34  1.00 0.15
#> x3  0.10  0.15 1.00

a target correlation matrix

## describe a target correlation matrix

tgt3 <- matrix(
  c(
    1.00, 0.80, 0.75,
    0.80, 1.00, 0.90,
    0.75, 0.90, 1.00
  ),
  nrow = 3
)
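The target should be a valid (symmetric, positive semi-definite) correlation matrix. A quick base-R sanity check, offered here as good practice rather than a package requirement:

## no negative eigenvalues means the target is positive semi-definite
min(eigen(tgt3, symmetric = TRUE, only.values = TRUE)$values) >= 0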

So now we have a data-frame with desired first and second moments, and a target correlation matrix.

applying the lcor() function

## apply lcor function

new3 <- lcor(mydat3, tgt3)

A new data-frame with correlations close to our desired correlation matrix:

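For example, checked with base R’s cor() (rounded to two decimal places, as assumed for the display below):

round(cor(new3), 2)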
#>      x1  x2   x3
#> x1 1.00 0.8 0.75
#> x2 0.80 1.0 0.90
#> x3 0.75 0.9 1.00

And the means and standard deviations have not changed from the original data-frame.

Mean values:

#>  x1  x2  x3 
#> 2.5 3.0 3.5

Standard deviations:

#>    x1    x2    x3 
#> 0.748 1.501 0.999

A large data-frame can take some time. Execution time for lcor() depends on:

  • number of observations,

  • number of columns,

  • number of intervals within each scale,

  • speed of your computer.

The lcor() function in the example above had the following sample-size/time combinations on the author’s laptop:

   n   seconds
  16      0.16
  32      0.45
  64      1.52
 128      5.62
 256     23.30

Value pairs are evaluated one pair at a time, so a vectorised process is infeasible at present.

lcor() example #2

Dummy data for scale development

Let’s generate some data: five 7-point items to help students decide what items should be included in a Likert scale.

generate uncorrelated synthetic data with lfast()


n <- 128

x1 <- lfast(n, 2.50, 1.50, 1, 7, 1)
x2 <- lfast(n, 4.50, 1.00, 1, 7, 1)
x3 <- lfast(n, 4.50, 1.50, 1, 7, 1)
x4 <- lfast(n, 5.25, 0.70, 1, 7, 1)
x5 <- lfast(n, 5.25, 1.25, 1, 7, 1)

mydat5 <- cbind(x1, x2, x3, x4, x5) |> data.frame()

Mean values:

#>    x1    x2    x3    x4    x5 
#> 2.406 4.469 4.469 5.258 5.258

Standard deviations:

#>    x1    x2    x3    x4    x5 
#> 1.498 1.011 1.474 0.713 1.365

The synthetic data have low correlations:

#>       x1    x2    x3    x4    x5
#> x1  1.00  0.11 -0.03  0.12 -0.02
#> x2  0.11  1.00 -0.07 -0.02  0.09
#> x3 -0.03 -0.07  1.00  0.02 -0.04
#> x4  0.12 -0.02  0.02  1.00  0.15
#> x5 -0.02  0.09 -0.04  0.15  1.00

a target correlation matrix

## describe a target correlation matrix

tgt5 <- matrix(
  c(
    1.00, 0.80, 0.75, 0.25, -0.75,
    0.80, 1.00, 0.70, 0.25, -0.70,
    0.75, 0.70, 1.00, 0.25, -0.50,
    0.25, 0.25, 0.25, 1.00, -0.40,
    -0.75, -0.70, -0.50, -0.40, 1.00
  ),
  nrow = 5
)

So now we have a data-frame with desired first and second moments, and a target correlation matrix.

applying the lcor() function

## apply lcor function

new5 <- lcor(mydat5, tgt5)

A new data-frame with correlations close to our desired correlation matrix:

#>       x1    x2    x3    x4    x5
#> x1  1.00  0.80  0.73  0.25 -0.62
#> x2  0.80  1.00  0.70  0.25 -0.70
#> x3  0.73  0.70  1.00  0.25 -0.50
#> x4  0.25  0.25  0.25  1.00 -0.40
#> x5 -0.62 -0.70 -0.50 -0.40  1.00