A researcher measures personality in their study using the Big Five Inventory-2 (BFI-2), which includes five subscales capturing each of the following personality traits: 1) Extraversion, 2) Agreeableness, 3) Conscientiousness, 4) Neuroticism (or negative emotionality), and 5) Openness (or open-mindedness).
We need to prepare the data collected with this survey for analysis. You can see each of the items, as well as which items belong to which of the subscales listed above, in the bfi2-form.pdf document.
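A note on packages: import() comes from the {rio} package; the pipe (%>%) and mutate(), across(), and select() come from {dplyr}; and later steps use {psych}, {corrplot}, and {apaTables}. If these aren't already loaded earlier in your script, a setup chunk along these lines (a sketch; install any packages you're missing first) covers everything used below.
library(rio)       # import()
library(dplyr)     # %>%, mutate(), across(), select()
library(psych)     # alpha(), pairs.panels(), corr.test(), describe()
library(corrplot)  # corrplot()
library(apaTables) # apa.cor.table()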
First, let’s import the data.
data <- import("bfi2.csv")
head(data)
## big_2_1 big_2_2 big_2_3 big_2_4 big_2_5 big_2_6 big_2_7 big_2_8 big_2_9
## 1 4 5 2 3 4 3 5 2 4
## 2 5 4 2 2 4 4 5 4 2
## 3 3 5 3 2 1 3 4 4 2
## 4 4 5 1 4 5 4 5 1 4
## 5 4 4 4 2 4 2 5 4 4
## 6 3 2 2 3 NA 3 3 3 3
## big_2_10 big_2_11 big_2_12 big_2_13 big_2_14 big_2_15 big_2_16 big_2_17
## 1 4 1 2 5 4 2 5 1
## 2 4 3 4 4 4 3 2 1
## 3 5 2 2 4 4 3 3 2
## 4 3 1 1 5 3 4 3 4
## 5 4 2 2 4 4 3 4 1
## 6 4 3 3 2 4 2 2 3
## big_2_18 big_2_19 big_2_20 big_2_21 big_2_22 big_2_23 big_2_24 big_2_25
## 1 4 4 4 3 2 2 4 2
## 2 5 5 2 5 2 5 2 2
## 3 5 4 4 4 2 4 2 2
## 4 3 1 4 3 1 3 4 3
## 5 3 3 2 2 1 4 5 2
## 6 2 2 3 2 2 2 4 3
## big_2_26 big_2_27 big_2_28 big_2_29 big_2_30 big_2_31 big_2_32 big_2_33
## 1 2 4 2 4 2 4 5 5
## 2 2 5 2 4 4 4 4 4
## 3 1 3 3 4 1 4 1 5
## 4 2 2 NA 5 5 3 5 5
## 5 2 4 2 4 2 4 4 2
## 6 2 4 4 2 4 3 3 2
## big_2_34 big_2_35 big_2_36 big_2_37 big_2_38 big_2_39 big_2_40 big_2_41
## 1 4 4 2 2 5 2 4 4
## 2 5 3 1 2 4 4 5 3
## 3 5 5 2 4 4 4 3 3
## 4 3 3 3 1 5 1 3 4
## 5 3 4 2 2 4 2 4 4
## 6 4 5 4 3 3 4 3 3
## big_2_42 big_2_43 big_2_44 big_2_45 big_2_46 big_2_47 big_2_48 big_2_49
## 1 2 5 4 2 2 1 1 2
## 2 4 4 5 3 5 2 3 1
## 3 4 4 2 2 3 2 1 1
## 4 3 5 5 2 3 1 1 2
## 5 2 4 4 2 4 2 3 2
## 6 4 2 4 4 3 3 2 4
## big_2_50 big_2_51 big_2_52 big_2_53 big_2_54 big_2_55 big_2_56 big_2_57
## 1 4 5 5 4 3 3 4 4
## 2 4 3 5 4 4 2 3 4
## 3 2 2 4 3 4 2 4 1
## 4 3 2 4 5 1 3 3 3
## 5 4 4 5 4 1 4 4 4
## 6 NA 2 4 4 4 2 3 2
## big_2_58 big_2_59 big_2_60 big_2_61 big_2_62 big_2_63 big_2_64 big_2_65
## 1 2 2 3 5 4 1 2 4
## 2 4 2 4 5 4 1 1 4
## 3 4 4 4 5 1 1 4 2
## 4 1 2 4 5 4 1 3 4
## 5 3 2 4 5 5 1 5 3
## 6 3 4 4 3 3 2 3 4
Before aggregating items that belong to each personality trait subscale, we need to reverse-code items as needed.
In the scoring key section of the bfi2-form.pdf document, items with an R next to them should be reverse-coded.
Remember that when reverse-coding items, it’s wise to store the results in a new object rather than overwrite the original, raw data. This way, if any errors occur in the reverse-coding process, they can be more easily identified and fixed.
We’ll use the same method for reverse coding that we discussed in class:
\[(Max - X) + Min\]
For this example, responses on the BFI-2 were given on a scale from 1 (strongly disagree) to 5 (strongly agree). The maximum, then, is 5, and the minimum is 1.
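As a quick sanity check, applying (5 - X) + 1 to every possible response on the 1-5 scale should simply flip the scale:
# (Max - X) + Min with Max = 5 and Min = 1 reverses the response scale:
# 1 -> 5, 2 -> 4, 3 -> 3, 4 -> 2, 5 -> 1
(5 - c(1, 2, 3, 4, 5)) + 1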
Additionally, in the code below:
The mutate() function allows us to create new variables that are functions of existing variables in the data set.
The across() function allows us to perform the same computation across multiple columns in the data set.
Inside the formula passed to across(), the period (.) stands in for each participant's raw score on each of the items listed.
data2 <- data %>%
  mutate(across(c(big_2_11, big_2_16, big_2_26, big_2_31, big_2_36, big_2_51, # Extraversion
                  big_2_12, big_2_17, big_2_22, big_2_37, big_2_42, big_2_47, # Agreeableness
                  big_2_3, big_2_8, big_2_23, big_2_28, big_2_48, big_2_58,   # Conscientiousness
                  big_2_4, big_2_9, big_2_24, big_2_29, big_2_44, big_2_49,   # Neuroticism
                  big_2_5, big_2_25, big_2_30, big_2_45, big_2_50, big_2_55,  # Openness
                  big_2_63),
                ~ (5 - .) + 1)) # (Max - X) + Min, with Max = 5 and Min = 1
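One quick way to check that the recoding behaved as expected is to cross-tabulate the raw and recoded versions of a reverse-keyed item (here big_2_11, as an example); every raw 1 should now be a 5, every 2 a 4, and so on.
# Compare raw vs. reverse-coded responses for one reverse-keyed item
table(raw = data$big_2_11, recoded = data2$big_2_11)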
Next, let’s calculate Cronbach’s alpha to measure the internal consistency of each subscale on this personality measure. Cronbach’s alpha should be calculated separately for each subscale since they are each measuring a different construct.
First, let’s calculate Cronbach’s alpha for the items from the BFI-2 that are meant to be assessing extraversion.
alpha_extraversion <- data2 %>%
select(big_2_1, big_2_6, big_2_11, big_2_16, big_2_21, big_2_26, big_2_31, big_2_36, big_2_41, big_2_46, big_2_51, big_2_56) %>%
psych::alpha()
alpha_extraversion
##
## Reliability analysis
## Call: psych::alpha(x = .)
##
## raw_alpha std.alpha G6(smc) average_r S/N ase mean sd median_r
## 0.83 0.83 0.85 0.29 4.9 0.015 3.4 0.61 0.27
##
## 95% confidence boundaries
## lower alpha upper
## Feldt 0.8 0.83 0.86
## Duhachek 0.8 0.83 0.86
##
## Reliability if an item is dropped:
## raw_alpha std.alpha G6(smc) average_r S/N alpha se var.r med.r
## big_2_1 0.80 0.80 0.82 0.27 4.0 0.018 0.018 0.25
## big_2_6 0.82 0.82 0.84 0.29 4.5 0.017 0.023 0.27
## big_2_11 0.83 0.84 0.85 0.32 5.1 0.015 0.019 0.28
## big_2_16 0.80 0.81 0.83 0.27 4.1 0.018 0.021 0.24
## big_2_21 0.82 0.81 0.84 0.29 4.4 0.017 0.023 0.25
## big_2_26 0.83 0.83 0.85 0.31 4.9 0.015 0.023 0.31
## big_2_31 0.82 0.82 0.84 0.30 4.7 0.016 0.022 0.27
## big_2_36 0.82 0.82 0.85 0.30 4.6 0.016 0.025 0.27
## big_2_41 0.81 0.81 0.83 0.28 4.3 0.017 0.021 0.27
## big_2_46 0.80 0.80 0.82 0.27 4.1 0.018 0.018 0.25
## big_2_51 0.83 0.83 0.85 0.30 4.8 0.016 0.023 0.27
## big_2_56 0.82 0.81 0.83 0.28 4.4 0.017 0.023 0.27
##
## Item statistics
## n raw.r std.r r.cor r.drop mean sd
## big_2_1 256 0.78 0.78 0.78 0.71 3.6 1.05
## big_2_6 255 0.60 0.60 0.54 0.49 3.2 1.05
## big_2_11 255 0.36 0.37 0.28 0.24 4.0 0.89
## big_2_16 257 0.73 0.72 0.70 0.64 3.0 1.13
## big_2_21 257 0.61 0.62 0.58 0.52 3.5 0.93
## big_2_26 257 0.44 0.44 0.35 0.31 3.7 1.14
## big_2_31 257 0.54 0.53 0.47 0.42 2.6 1.11
## big_2_36 257 0.54 0.54 0.47 0.43 3.5 0.99
## big_2_41 257 0.64 0.65 0.62 0.55 3.4 1.00
## big_2_46 255 0.73 0.73 0.73 0.65 3.5 1.04
## big_2_51 256 0.50 0.49 0.43 0.38 3.2 1.06
## big_2_56 257 0.61 0.63 0.59 0.52 3.7 0.92
##
## Non missing response frequency for each item
## 1 2 3 4 5 miss
## big_2_1 0.03 0.13 0.23 0.39 0.21 0.00
## big_2_6 0.04 0.24 0.28 0.34 0.10 0.01
## big_2_11 0.01 0.07 0.13 0.48 0.31 0.01
## big_2_16 0.09 0.29 0.23 0.31 0.07 0.00
## big_2_21 0.02 0.13 0.32 0.41 0.12 0.00
## big_2_26 0.03 0.17 0.19 0.33 0.28 0.00
## big_2_31 0.15 0.43 0.18 0.18 0.05 0.00
## big_2_36 0.04 0.12 0.25 0.47 0.12 0.00
## big_2_41 0.02 0.16 0.31 0.36 0.14 0.00
## big_2_46 0.04 0.16 0.27 0.38 0.15 0.01
## big_2_51 0.04 0.23 0.28 0.34 0.11 0.00
## big_2_56 0.02 0.10 0.27 0.45 0.17 0.00
Q: Do the items on the extraversion subscale have good internal consistency?
Yes, a Cronbach’s alpha between .8 and .9 indicates good internal consistency.
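If you only need the headline coefficient rather than the full report, the object returned by psych::alpha() stores it in its $total element:
# Pull out just the raw Cronbach's alpha for the extraversion subscale
alpha_extraversion$total$raw_alpha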
Go ahead and calculate Cronbach’s alpha for the other four subscales (agreeableness, conscientiousness, neuroticism, and openness) by referring to the bfi2-form.pdf document to see which items belong to which subscale.
# Agreeableness items
alpha_agreeableness <- data2 %>%
dplyr::select(big_2_2, big_2_7, big_2_12, big_2_17, big_2_22, big_2_27, big_2_32, big_2_37, big_2_42, big_2_47, big_2_52, big_2_57) %>%
psych::alpha()
alpha_agreeableness
##
## Reliability analysis
## Call: psych::alpha(x = .)
##
## raw_alpha std.alpha G6(smc) average_r S/N ase mean sd median_r
## 0.77 0.79 0.81 0.23 3.7 0.021 3.7 0.55 0.23
##
## 95% confidence boundaries
## lower alpha upper
## Feldt 0.73 0.77 0.81
## Duhachek 0.73 0.77 0.81
##
## Reliability if an item is dropped:
## raw_alpha std.alpha G6(smc) average_r S/N alpha se var.r med.r
## big_2_2 0.75 0.76 0.78 0.22 3.2 0.023 0.018 0.21
## big_2_7 0.76 0.77 0.78 0.23 3.3 0.023 0.016 0.23
## big_2_12 0.75 0.77 0.79 0.24 3.4 0.023 0.018 0.23
## big_2_17 0.79 0.79 0.81 0.26 3.9 0.019 0.015 0.25
## big_2_22 0.75 0.77 0.79 0.23 3.3 0.023 0.019 0.24
## big_2_27 0.75 0.77 0.79 0.23 3.3 0.023 0.019 0.22
## big_2_32 0.75 0.77 0.79 0.23 3.3 0.023 0.019 0.23
## big_2_37 0.74 0.76 0.78 0.22 3.1 0.025 0.018 0.20
## big_2_42 0.77 0.79 0.81 0.25 3.8 0.021 0.016 0.24
## big_2_47 0.75 0.77 0.79 0.23 3.3 0.024 0.019 0.23
## big_2_52 0.75 0.76 0.77 0.22 3.1 0.023 0.014 0.23
## big_2_57 0.75 0.77 0.79 0.23 3.3 0.023 0.019 0.20
##
## Item statistics
## n raw.r std.r r.cor r.drop mean sd
## big_2_2 255 0.59 0.63 0.59 0.50 4.1 0.90
## big_2_7 256 0.52 0.57 0.53 0.42 4.5 0.76
## big_2_12 256 0.54 0.52 0.45 0.42 3.2 1.00
## big_2_17 256 0.37 0.32 0.21 0.17 3.6 1.32
## big_2_22 256 0.58 0.56 0.50 0.45 3.7 1.07
## big_2_27 256 0.56 0.57 0.51 0.44 3.8 1.03
## big_2_32 257 0.53 0.55 0.49 0.41 3.9 0.97
## big_2_37 256 0.68 0.65 0.62 0.56 3.4 1.15
## big_2_42 257 0.38 0.37 0.28 0.24 2.8 1.05
## big_2_47 256 0.62 0.58 0.53 0.48 3.7 1.15
## big_2_52 255 0.59 0.64 0.64 0.51 4.4 0.69
## big_2_57 257 0.59 0.59 0.54 0.46 3.5 1.08
##
## Non missing response frequency for each item
## 1 2 3 4 5 miss
## big_2_2 0.01 0.04 0.17 0.40 0.37 0.01
## big_2_7 0.00 0.02 0.07 0.32 0.58 0.00
## big_2_12 0.02 0.25 0.29 0.35 0.09 0.00
## big_2_17 0.09 0.14 0.17 0.26 0.33 0.00
## big_2_22 0.02 0.14 0.23 0.35 0.26 0.00
## big_2_27 0.03 0.11 0.16 0.45 0.26 0.00
## big_2_32 0.04 0.05 0.14 0.49 0.28 0.00
## big_2_37 0.04 0.20 0.23 0.32 0.21 0.00
## big_2_42 0.07 0.40 0.28 0.19 0.07 0.00
## big_2_47 0.03 0.18 0.18 0.32 0.29 0.00
## big_2_52 0.00 0.00 0.08 0.42 0.49 0.01
## big_2_57 0.04 0.19 0.24 0.37 0.17 0.00
# Conscientiousness items
alpha_conscientiousness <- data2 %>%
dplyr::select(big_2_3, big_2_8, big_2_13, big_2_18, big_2_23, big_2_28, big_2_33, big_2_38, big_2_43, big_2_48, big_2_53, big_2_58) %>%
psych::alpha()
alpha_conscientiousness
##
## Reliability analysis
## Call: psych::alpha(x = .)
##
## raw_alpha std.alpha G6(smc) average_r S/N ase mean sd median_r
## 0.84 0.84 0.86 0.31 5.3 0.015 3.5 0.61 0.3
##
## 95% confidence boundaries
## lower alpha upper
## Feldt 0.80 0.84 0.86
## Duhachek 0.81 0.84 0.87
##
## Reliability if an item is dropped:
## raw_alpha std.alpha G6(smc) average_r S/N alpha se var.r med.r
## big_2_3 0.82 0.82 0.84 0.30 4.7 0.017 0.014 0.29
## big_2_8 0.82 0.83 0.84 0.31 4.9 0.016 0.015 0.30
## big_2_13 0.82 0.83 0.84 0.30 4.7 0.016 0.012 0.29
## big_2_18 0.82 0.83 0.84 0.30 4.8 0.016 0.013 0.29
## big_2_23 0.84 0.84 0.85 0.32 5.2 0.015 0.012 0.30
## big_2_28 0.83 0.83 0.85 0.31 5.0 0.016 0.016 0.30
## big_2_33 0.82 0.82 0.83 0.29 4.6 0.017 0.012 0.29
## big_2_38 0.82 0.83 0.84 0.30 4.8 0.016 0.014 0.29
## big_2_43 0.82 0.83 0.84 0.30 4.8 0.016 0.013 0.30
## big_2_48 0.82 0.82 0.84 0.30 4.7 0.017 0.014 0.28
## big_2_53 0.82 0.82 0.84 0.30 4.7 0.017 0.014 0.29
## big_2_58 0.83 0.84 0.85 0.32 5.2 0.015 0.013 0.30
##
## Item statistics
## n raw.r std.r r.cor r.drop mean sd
## big_2_3 257 0.69 0.67 0.64 0.59 3.3 1.16
## big_2_8 256 0.60 0.58 0.53 0.49 3.1 1.13
## big_2_13 257 0.61 0.64 0.61 0.52 4.0 0.90
## big_2_18 257 0.59 0.61 0.56 0.50 3.8 0.93
## big_2_23 256 0.49 0.47 0.40 0.36 2.7 1.12
## big_2_28 255 0.57 0.56 0.49 0.46 3.2 1.10
## big_2_33 256 0.69 0.68 0.66 0.59 3.5 1.10
## big_2_38 256 0.61 0.63 0.58 0.52 3.9 0.87
## big_2_43 256 0.58 0.61 0.58 0.50 4.1 0.82
## big_2_48 257 0.65 0.65 0.60 0.56 4.0 1.01
## big_2_53 256 0.64 0.66 0.62 0.55 3.9 0.90
## big_2_58 257 0.51 0.49 0.41 0.39 2.8 1.11
##
## Non missing response frequency for each item
## 1 2 3 4 5 miss
## big_2_3 0.05 0.25 0.22 0.31 0.18 0.00
## big_2_8 0.05 0.32 0.27 0.23 0.13 0.00
## big_2_13 0.01 0.04 0.21 0.40 0.33 0.00
## big_2_18 0.02 0.08 0.23 0.45 0.23 0.00
## big_2_23 0.13 0.39 0.21 0.21 0.06 0.00
## big_2_28 0.03 0.29 0.25 0.29 0.14 0.01
## big_2_33 0.03 0.18 0.25 0.34 0.20 0.00
## big_2_38 0.01 0.05 0.20 0.48 0.26 0.00
## big_2_43 0.01 0.04 0.12 0.49 0.35 0.00
## big_2_48 0.02 0.08 0.18 0.35 0.37 0.00
## big_2_53 0.00 0.09 0.18 0.48 0.25 0.00
## big_2_58 0.07 0.42 0.21 0.21 0.09 0.00
# Neuroticism items
alpha_neuroticism <- data2 %>%
dplyr::select(big_2_4, big_2_9, big_2_14, big_2_19, big_2_24, big_2_29, big_2_34, big_2_39, big_2_44, big_2_49, big_2_54, big_2_59) %>%
psych::alpha()
alpha_neuroticism
##
## Reliability analysis
## Call: psych::alpha(x = .)
##
## raw_alpha std.alpha G6(smc) average_r S/N ase mean sd median_r
## 0.88 0.87 0.89 0.37 7 0.011 2.7 0.72 0.37
##
## 95% confidence boundaries
## lower alpha upper
## Feldt 0.85 0.88 0.9
## Duhachek 0.85 0.88 0.9
##
## Reliability if an item is dropped:
## raw_alpha std.alpha G6(smc) average_r S/N alpha se var.r med.r
## big_2_4 0.86 0.86 0.87 0.36 6.2 0.013 0.0106 0.36
## big_2_9 0.87 0.87 0.88 0.38 6.8 0.012 0.0097 0.39
## big_2_14 0.86 0.86 0.87 0.37 6.4 0.012 0.0104 0.37
## big_2_19 0.87 0.87 0.88 0.39 6.9 0.012 0.0084 0.39
## big_2_24 0.87 0.87 0.88 0.38 6.6 0.012 0.0101 0.37
## big_2_29 0.86 0.86 0.87 0.36 6.2 0.013 0.0101 0.35
## big_2_34 0.86 0.86 0.87 0.36 6.2 0.013 0.0107 0.36
## big_2_39 0.86 0.86 0.87 0.36 6.2 0.013 0.0091 0.35
## big_2_44 0.87 0.86 0.87 0.37 6.4 0.012 0.0101 0.37
## big_2_49 0.87 0.87 0.88 0.37 6.4 0.012 0.0112 0.37
## big_2_54 0.86 0.86 0.87 0.36 6.1 0.013 0.0090 0.35
## big_2_59 0.87 0.86 0.87 0.37 6.4 0.012 0.0100 0.37
##
## Item statistics
## n raw.r std.r r.cor r.drop mean sd
## big_2_4 257 0.69 0.70 0.67 0.62 2.7 1.11
## big_2_9 257 0.53 0.55 0.48 0.45 2.3 0.96
## big_2_14 257 0.67 0.66 0.62 0.58 2.8 1.15
## big_2_19 257 0.52 0.52 0.45 0.43 3.2 1.03
## big_2_24 257 0.58 0.59 0.53 0.49 2.3 1.01
## big_2_29 256 0.69 0.70 0.67 0.62 2.5 1.11
## big_2_34 257 0.71 0.69 0.67 0.62 3.3 1.24
## big_2_39 256 0.70 0.70 0.69 0.63 2.6 1.10
## big_2_44 256 0.65 0.66 0.62 0.58 2.3 0.96
## big_2_49 256 0.65 0.65 0.60 0.56 3.5 1.18
## big_2_54 257 0.72 0.72 0.71 0.65 2.6 1.17
## big_2_59 257 0.66 0.65 0.62 0.57 2.7 1.17
##
## Non missing response frequency for each item
## 1 2 3 4 5 miss
## big_2_4 0.13 0.33 0.27 0.21 0.06 0
## big_2_9 0.19 0.43 0.25 0.12 0.01 0
## big_2_14 0.14 0.30 0.23 0.28 0.05 0
## big_2_19 0.05 0.20 0.26 0.42 0.07 0
## big_2_24 0.22 0.40 0.26 0.10 0.03 0
## big_2_29 0.21 0.35 0.23 0.18 0.03 0
## big_2_34 0.08 0.23 0.17 0.32 0.19 0
## big_2_39 0.17 0.34 0.25 0.20 0.04 0
## big_2_44 0.21 0.43 0.23 0.12 0.01 0
## big_2_49 0.07 0.14 0.20 0.38 0.20 0
## big_2_54 0.21 0.30 0.22 0.23 0.04 0
## big_2_59 0.18 0.32 0.21 0.24 0.05 0
# Openness items
alpha_openness <- data2 %>%
dplyr::select(big_2_5, big_2_10, big_2_15, big_2_20, big_2_25, big_2_30, big_2_35, big_2_40, big_2_45, big_2_50, big_2_55, big_2_60) %>%
psych::alpha()
alpha_openness
##
## Reliability analysis
## Call: psych::alpha(x = .)
##
## raw_alpha std.alpha G6(smc) average_r S/N ase mean sd median_r
## 0.83 0.84 0.85 0.3 5.1 0.015 3.7 0.63 0.31
##
## 95% confidence boundaries
## lower alpha upper
## Feldt 0.8 0.83 0.86
## Duhachek 0.8 0.83 0.86
##
## Reliability if an item is dropped:
## raw_alpha std.alpha G6(smc) average_r S/N alpha se var.r med.r
## big_2_5 0.82 0.82 0.84 0.30 4.7 0.017 0.0100 0.31
## big_2_10 0.82 0.82 0.84 0.30 4.7 0.017 0.0099 0.31
## big_2_15 0.82 0.83 0.84 0.30 4.8 0.016 0.0094 0.31
## big_2_20 0.82 0.82 0.83 0.30 4.6 0.017 0.0078 0.31
## big_2_25 0.82 0.82 0.83 0.29 4.6 0.017 0.0102 0.30
## big_2_30 0.82 0.82 0.83 0.30 4.6 0.017 0.0096 0.31
## big_2_35 0.81 0.81 0.83 0.29 4.4 0.017 0.0088 0.30
## big_2_40 0.82 0.82 0.84 0.30 4.7 0.017 0.0104 0.31
## big_2_45 0.82 0.82 0.84 0.30 4.7 0.017 0.0100 0.30
## big_2_50 0.83 0.83 0.84 0.31 4.8 0.016 0.0093 0.31
## big_2_55 0.82 0.83 0.84 0.30 4.8 0.016 0.0101 0.31
## big_2_60 0.83 0.83 0.84 0.30 4.8 0.016 0.0093 0.31
##
## Item statistics
## n raw.r std.r r.cor r.drop mean sd
## big_2_5 256 0.61 0.59 0.54 0.49 3.4 1.19
## big_2_10 256 0.58 0.60 0.56 0.49 4.1 0.84
## big_2_15 257 0.53 0.56 0.51 0.44 3.7 0.90
## big_2_20 256 0.62 0.61 0.59 0.52 3.7 1.11
## big_2_25 257 0.64 0.62 0.58 0.53 3.7 1.16
## big_2_30 256 0.62 0.62 0.58 0.52 3.6 1.10
## big_2_35 257 0.69 0.69 0.68 0.62 3.9 1.00
## big_2_40 257 0.59 0.60 0.55 0.49 3.8 1.02
## big_2_45 255 0.60 0.60 0.55 0.50 3.8 1.02
## big_2_50 256 0.58 0.55 0.49 0.45 3.0 1.33
## big_2_55 255 0.58 0.57 0.52 0.48 3.6 1.02
## big_2_60 255 0.52 0.55 0.49 0.43 3.7 0.91
##
## Non missing response frequency for each item
## 1 2 3 4 5 miss
## big_2_5 0.05 0.23 0.22 0.29 0.20 0.00
## big_2_10 0.01 0.03 0.17 0.45 0.34 0.00
## big_2_15 0.02 0.09 0.21 0.53 0.14 0.00
## big_2_20 0.04 0.12 0.25 0.32 0.27 0.00
## big_2_25 0.05 0.12 0.21 0.32 0.30 0.00
## big_2_30 0.03 0.16 0.21 0.35 0.25 0.00
## big_2_35 0.02 0.07 0.22 0.37 0.32 0.00
## big_2_40 0.02 0.10 0.24 0.39 0.26 0.00
## big_2_45 0.02 0.10 0.21 0.37 0.30 0.01
## big_2_50 0.16 0.28 0.17 0.24 0.16 0.00
## big_2_55 0.03 0.14 0.24 0.42 0.17 0.01
## big_2_60 0.01 0.11 0.23 0.49 0.15 0.01
Q: How would you judge the internal consistency of the items on each of the other four personality trait subscales?
The agreeableness subscale has acceptable internal consistency (raw alpha = .77). The conscientiousness (.84), neuroticism (.88), and openness (.83) subscales all have good internal consistency (between .8 and .9).
Now that appropriate items have been reverse-coded, we can create a single aggregate score for each subscale that represents how participants scored on that personality trait overall.
A common method of creating a single composite score for a variable measured using multiple items is by calculating the average score across all of the corresponding items. Make sure you use the correct data set containing the reverse-coded items (not the raw data set)!
First, let’s create an aggregate score for the extraversion subscale.
We can do this by selecting the columns from the data frame corresponding to our extraversion items and then passing them to the rowMeans() function.
A note about how missing data is handled: by default, rowMeans() returns NA for anyone who is missing a response on any of the items. We can instead supply the na.rm = TRUE argument, in which case the row mean will be calculated for everyone by simply excluding any missing entries from the calculation. This might be a reasonable approach to handling missingness if the amount of missing data per participant is minimal. Otherwise, we will talk about more methods for handling missing data later on in the course.
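To make the difference concrete, here is a tiny toy example (hypothetical values, not from the BFI-2 data) showing how na.rm changes the result for a row with a missing response.
# Two hypothetical participants; the second skipped item2
toy <- data.frame(item1 = c(4, 2), item2 = c(5, NA))
rowMeans(toy)               # 4.5 and NA
rowMeans(toy, na.rm = TRUE) # 4.5 and 2 (mean of the non-missing item)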
data2$extraversion <- data2 %>%
select(big_2_1, big_2_6, big_2_11, big_2_16, big_2_21, big_2_26, big_2_31, big_2_36, big_2_41, big_2_46, big_2_51, big_2_56) %>%
rowMeans(na.rm = TRUE)
Let’s take a look at the data frame to make sure the new aggregated variable was added at the end of it as expected.
View(data2)
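View() opens the interactive spreadsheet viewer in RStudio. For a quick non-interactive check that the new column was created and stays within the 1-5 response range, a summary works as well:
# The composite should fall between 1 and 5
summary(data2$extraversion)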
Go ahead and construct an aggregate score for the other four subscales (agreeableness, conscientiousness, neuroticism, and openness) by referring to the bfi2-form.pdf document to see which items belong to which subscale.
data2$agreeableness <- data2 %>%
dplyr::select(big_2_2, big_2_7, big_2_12, big_2_17, big_2_22, big_2_27, big_2_32, big_2_37, big_2_42, big_2_47, big_2_52, big_2_57) %>%
rowMeans(na.rm = TRUE)
data2$conscientiousness <- data2 %>%
dplyr::select(big_2_3, big_2_8, big_2_13, big_2_18, big_2_23, big_2_28, big_2_33, big_2_38, big_2_43, big_2_48, big_2_53, big_2_58) %>%
rowMeans(na.rm = TRUE)
data2$neuroticism <- data2 %>%
dplyr::select(big_2_4, big_2_9, big_2_14, big_2_19, big_2_24, big_2_29, big_2_34, big_2_39, big_2_44, big_2_49, big_2_54, big_2_59) %>%
rowMeans(na.rm = TRUE)
data2$openness <- data2 %>%
dplyr::select(big_2_5, big_2_10, big_2_15, big_2_20, big_2_25, big_2_30, big_2_35, big_2_40, big_2_45, big_2_50, big_2_55, big_2_60) %>%
rowMeans(na.rm = TRUE)
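Before moving on, it can be helpful to glance at descriptive statistics for all five composites at once; one option is psych::describe() (base summary() would work just as well):
data2 %>%
  dplyr::select(extraversion, agreeableness, conscientiousness, neuroticism, openness) %>%
  psych::describe()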
For reasons we will discuss more later in the course, it's important to understand the relationships between continuous variables that you intend to use as predictors in your regression models. Next, let's move on to examining the correlations among the five personality traits for which we just constructed aggregate variables.
Covariance captures how two variables vary together, i.e., how they co-vary. If higher values of one variable tend to correspond with higher values of the other variable, and lower values of one variable tend to correspond with lower values of the other variable, then the covariance will be positive. However, if the two variables are inversely related (i.e., higher values on one variable correspond with lower values on the other), then the covariance will be negative.
\[\large cov_{xy} = {\frac{\sum{(x-\bar{x})(y-\bar{y})}}{N-1}}\]
To calculate covariance, use the function cov() from the
{stats} package. The cov() function takes two
arguments: the first variable “x” and the second variable “y”.
cov(data2$extraversion, data2$agreeableness)
## [1] 0.06302961
Passing cov() a data frame, or multiple columns from a data frame, will generate a covariance matrix. Let's calculate a covariance matrix that shows the covariance for every pair of personality traits, rounded to two decimal places.
data2 %>%
select(extraversion, agreeableness, conscientiousness, neuroticism, openness) %>%
cov() %>%
round(2)
## extraversion agreeableness conscientiousness neuroticism
## extraversion 0.37 0.06 0.12 -0.14
## agreeableness 0.06 0.30 0.15 -0.14
## conscientiousness 0.12 0.15 0.37 -0.14
## neuroticism -0.14 -0.14 -0.14 0.51
## openness 0.14 0.11 0.11 -0.08
## openness
## extraversion 0.14
## agreeableness 0.11
## conscientiousness 0.11
## neuroticism -0.08
## openness 0.39
\[\large r_{xy} = {\frac{cov(X,Y)}{\hat\sigma_{x}\hat\sigma_{y}}}\]
To calculate the correlation between two variables, use the cor() function from the {stats} package.
cor(data2$extraversion, data2$agreeableness)
## [1] 0.189135
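As a check on the formula above, dividing the covariance by the product of the two standard deviations reproduces the same value:
# r = cov(X, Y) / (sd(X) * sd(Y)); this should match the cor() result above
cov(data2$extraversion, data2$agreeableness) /
  (sd(data2$extraversion) * sd(data2$agreeableness))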
As with cov(), passing a data frame to cor() generates a correlation matrix. Let's calculate a correlation matrix that shows the correlation for every pair of personality traits, rounded to two decimal places.
cor_matrix <- data2 %>%
select(extraversion, agreeableness, conscientiousness, neuroticism, openness) %>%
cor() %>%
round(2)
cor_matrix
## extraversion agreeableness conscientiousness neuroticism
## extraversion 1.00 0.19 0.33 -0.32
## agreeableness 0.19 1.00 0.46 -0.35
## conscientiousness 0.33 0.46 1.00 -0.33
## neuroticism -0.32 -0.35 -0.33 1.00
## openness 0.37 0.32 0.28 -0.17
## openness
## extraversion 0.37
## agreeableness 0.32
## conscientiousness 0.28
## neuroticism -0.17
## openness 1.00
Q: What do you notice about the relationships between different pairs of personality traits based on this correlation matrix?
None of the correlations are above 0.50. The strongest relationship appears to be between conscientiousness and agreeableness, r = 0.46. The relationship is positive (as people increase on conscientiousness, they tend to increase on agreeableness). The weakest relationship appears to be between neuroticism and openness, r = -0.17. The relationship is negative (as people increase on neuroticism, they tend to decrease on openness), but it is weak.
There are many other things you could notice!
You all are already familiar with scatterplots which can be used to
visualize the relationship between two continuous variables. For
example, let’s use ggplot to visualize the relationship
between extraversion and agreeableness. You can add a line of best fit
by adding a geom_smooth() layer to the plot.
ggplot(data = data2, aes(x = extraversion, y = agreeableness)) +
geom_point() +
geom_smooth(method = "lm")
## `geom_smooth()` using formula = 'y ~ x'
There are also ways of visualizing the correlations among
all of the continuous variables in your dataframe that are of
interest in your study. “SPLOM” stands for scatter plot matrix. The
pairs.panels() function from the {psych}
package allows a quick way to visualize relationships among all the
continuous variables in your data frame. The panels below the diagonal contain
scatter plots showing bivariate relationships between pairs of
variables, and the panels above the diagonal show the corresponding correlation
coefficients. Histograms for each variable are shown along the
diagonal.
data2 %>%
select(extraversion, agreeableness, conscientiousness, neuroticism, openness) %>%
pairs.panels(lm = TRUE)
Heat maps are a great way to get a high-level visualization of a
correlation matrix. They are particularly useful for visualizing the
number of “clusters” in your data if that’s something you’re looking
for. We can plot a heatmap of a correlation matrix using the
corrplot() function from the {corrplot}
package. Note: make sure that you are feeding the function a correlation
matrix (not the data set). We’ll use the correlation matrix that we
constructed earlier.
corrplot(corr = cor_matrix, method = "square")
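If you'd also like the numeric coefficients printed on top of the colored squares, corrplot() has an addCoef.col argument for that (a small variation on the call above):
corrplot(corr = cor_matrix, method = "square", addCoef.col = "black")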
The {apaTables} package has a very useful function, apa.cor.table(), that creates nicely formatted correlation tables in APA format. Note that apa.cor.table() expects the raw data (a data frame containing the variables), rather than an already-computed correlation matrix. The code below writes the table to a Word document called "cor_matrix.doc", which will show up in the folder set as your current working directory.
data2 %>%
  dplyr::select(extraversion, agreeableness, conscientiousness, neuroticism, openness) %>%
  apa.cor.table(filename = "cor_matrix.doc",
                table.number = 1)
The corr.test() function from the {psych}
package can be used to test whether the correlation between two
variables is significantly different from zero.
Let's test whether the correlation between extraversion and agreeableness (r = 0.19) is significantly different from zero.
data2 %>%
select(extraversion, agreeableness) %>%
corr.test()
## Call:corr.test(x = .)
## Correlation matrix
## extraversion agreeableness
## extraversion 1.00 0.19
## agreeableness 0.19 1.00
## Sample Size
## [1] 257
## Probability values (Entries above the diagonal are adjusted for multiple tests.)
## extraversion agreeableness
## extraversion 0 0
## agreeableness 0 0
##
## To see confidence intervals of the correlations, print with the short=FALSE option
The p-values are not printed to enough decimal places, so it is difficult to tell exactly how small they are. To work around this, let's store the output of corr.test() in an object and then look at the p-value stored within that object. Additionally, we can pull out the confidence interval.
r_ext_agr <- data2 %>%
select(extraversion, agreeableness) %>%
corr.test()
r_ext_agr$p
## extraversion agreeableness
## extraversion 0.000000000 0.002328257
## agreeableness 0.002328257 0.000000000
r_ext_agr$ci
## lower r upper p
## extrv-agrbl 0.06835406 0.189135 0.3044518 0.002328257
Q: Is the correlation between extraversion and agreeableness significant?
Yes, the p-value is .002, which is less than an alpha of .05.