A researcher measures personality in their study using the Big Five Inventory-2 (BFI-2), which includes five subscales capturing each of the following personality traits: 1) Extraversion, 2) Agreeableness, 3) Conscientiousness, 4) Neuroticism (or negative emotionality), and 5) Openness (or open-mindedness).
We need to prepare the data collected with this survey for analysis. You can see each of the items, as well as which items belong to which of the subscales listed above, in the bfi2-form.pdf document.
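A note on packages: import() comes from the {rio} package; the pipe (%>%) and mutate(), across(), and select() come from {dplyr}; and later steps use {psych}, {corrplot}, and {apaTables}. If these aren't already loaded earlier in your script, a setup chunk along these lines (a sketch; install any packages you're missing first) covers everything used below.
library(rio)       # import()
library(dplyr)     # %>%, mutate(), across(), select()
library(psych)     # alpha(), pairs.panels(), corr.test(), describe()
library(corrplot)  # corrplot()
library(apaTables) # apa.cor.table()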
First, let’s import the data.
data <- import("bfi2.csv")
head(data)
## big_2_1 big_2_2 big_2_3 big_2_4 big_2_5 big_2_6 big_2_7 big_2_8 big_2_9
## 1 4 5 2 3 4 3 5 2 4
## 2 5 4 2 2 4 4 5 4 2
## 3 3 5 3 2 1 3 4 4 2
## 4 4 5 1 4 5 4 5 1 4
## 5 4 4 4 2 4 2 5 4 4
## 6 3 2 2 3 NA 3 3 3 3
## big_2_10 big_2_11 big_2_12 big_2_13 big_2_14 big_2_15 big_2_16 big_2_17
## 1 4 1 2 5 4 2 5 1
## 2 4 3 4 4 4 3 2 1
## 3 5 2 2 4 4 3 3 2
## 4 3 1 1 5 3 4 3 4
## 5 4 2 2 4 4 3 4 1
## 6 4 3 3 2 4 2 2 3
## big_2_18 big_2_19 big_2_20 big_2_21 big_2_22 big_2_23 big_2_24 big_2_25
## 1 4 4 4 3 2 2 4 2
## 2 5 5 2 5 2 5 2 2
## 3 5 4 4 4 2 4 2 2
## 4 3 1 4 3 1 3 4 3
## 5 3 3 2 2 1 4 5 2
## 6 2 2 3 2 2 2 4 3
## big_2_26 big_2_27 big_2_28 big_2_29 big_2_30 big_2_31 big_2_32 big_2_33
## 1 2 4 2 4 2 4 5 5
## 2 2 5 2 4 4 4 4 4
## 3 1 3 3 4 1 4 1 5
## 4 2 2 NA 5 5 3 5 5
## 5 2 4 2 4 2 4 4 2
## 6 2 4 4 2 4 3 3 2
## big_2_34 big_2_35 big_2_36 big_2_37 big_2_38 big_2_39 big_2_40 big_2_41
## 1 4 4 2 2 5 2 4 4
## 2 5 3 1 2 4 4 5 3
## 3 5 5 2 4 4 4 3 3
## 4 3 3 3 1 5 1 3 4
## 5 3 4 2 2 4 2 4 4
## 6 4 5 4 3 3 4 3 3
## big_2_42 big_2_43 big_2_44 big_2_45 big_2_46 big_2_47 big_2_48 big_2_49
## 1 2 5 4 2 2 1 1 2
## 2 4 4 5 3 5 2 3 1
## 3 4 4 2 2 3 2 1 1
## 4 3 5 5 2 3 1 1 2
## 5 2 4 4 2 4 2 3 2
## 6 4 2 4 4 3 3 2 4
## big_2_50 big_2_51 big_2_52 big_2_53 big_2_54 big_2_55 big_2_56 big_2_57
## 1 4 5 5 4 3 3 4 4
## 2 4 3 5 4 4 2 3 4
## 3 2 2 4 3 4 2 4 1
## 4 3 2 4 5 1 3 3 3
## 5 4 4 5 4 1 4 4 4
## 6 NA 2 4 4 4 2 3 2
## big_2_58 big_2_59 big_2_60 big_2_61 big_2_62 big_2_63 big_2_64 big_2_65
## 1 2 2 3 5 4 1 2 4
## 2 4 2 4 5 4 1 1 4
## 3 4 4 4 5 1 1 4 2
## 4 1 2 4 5 4 1 3 4
## 5 3 2 4 5 5 1 5 3
## 6 3 4 4 3 3 2 3 4
Before aggregating items that belong to each personality trait subscale, we need to reverse-code items as needed.
In the scoring key section of the bfi2-form.pdf document, items with an R next to them should be reverse-coded.
Remember that when reverse-coding items, it’s wise to store the results in a new object rather than overwrite the original, raw data. This way, if any errors occur in the reverse-coding process, they can be more easily identified and fixed.
We’ll use the same method for reverse coding that we discussed in class:
\[(Max - X) + Min\]
For this example, responses on the BFI-2 were given on a scale from 1 (strongly disagree) to 5 (strongly agree). The maximum, then, is 5, and the minimum is 1.
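As a quick sanity check, applying (5 - X) + 1 to every possible response on the 1-5 scale should simply flip the scale:
# (Max - X) + Min with Max = 5 and Min = 1 reverses the response scale:
# 1 -> 5, 2 -> 4, 3 -> 3, 4 -> 2, 5 -> 1
(5 - c(1, 2, 3, 4, 5)) + 1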
Additionally, in the code below:
The mutate() function allows us to create new variables that are functions of existing variables in the data set.
The across() function allows us to perform the same computation across multiple columns in the data set.
Inside the formula passed to across(), the period (.) stands in for each participant's raw score on each of the items listed.
data2 <- data %>%
  mutate(across(c(big_2_11, big_2_16, big_2_26, big_2_31, big_2_36, big_2_51, # Extraversion
                  big_2_12, big_2_17, big_2_22, big_2_37, big_2_42, big_2_47, # Agreeableness
                  big_2_3, big_2_8, big_2_23, big_2_28, big_2_48, big_2_58,   # Conscientiousness
                  big_2_4, big_2_9, big_2_24, big_2_29, big_2_44, big_2_49,   # Neuroticism
                  big_2_5, big_2_25, big_2_30, big_2_45, big_2_50, big_2_55,  # Openness
                  big_2_63),
                ~ (5 - .) + 1)) # (Max - X) + Min, with Max = 5 and Min = 1
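One quick way to check that the recoding behaved as expected is to cross-tabulate the raw and recoded versions of a reverse-keyed item (here big_2_11, as an example); every raw 1 should now be a 5, every 2 a 4, and so on.
# Compare raw vs. reverse-coded responses for one reverse-keyed item
table(raw = data$big_2_11, recoded = data2$big_2_11)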
Next, let’s calculate Cronbach’s alpha to measure the internal consistency of each subscale on this personality measure. Cronbach’s alpha should be calculated separately for each subscale since they are each measuring a different construct.
First, let’s calculate Cronbach’s alpha for the items from the BFI-2 that are meant to be assessing extraversion.
alpha_extraversion <- data2 %>%
select(big_2_1, big_2_6, big_2_11, big_2_16, big_2_21, big_2_26, big_2_31, big_2_36, big_2_41, big_2_46, big_2_51, big_2_56) %>%
psych::alpha()
alpha_extraversion
##
## Reliability analysis
## Call: psych::alpha(x = .)
##
## raw_alpha std.alpha G6(smc) average_r S/N ase mean sd median_r
## 0.83 0.83 0.85 0.29 4.9 0.015 3.4 0.61 0.27
##
## 95% confidence boundaries
## lower alpha upper
## Feldt 0.8 0.83 0.86
## Duhachek 0.8 0.83 0.86
##
## Reliability if an item is dropped:
## raw_alpha std.alpha G6(smc) average_r S/N alpha se var.r med.r
## big_2_1 0.80 0.80 0.82 0.27 4.0 0.018 0.018 0.25
## big_2_6 0.82 0.82 0.84 0.29 4.5 0.017 0.023 0.27
## big_2_11 0.83 0.84 0.85 0.32 5.1 0.015 0.019 0.28
## big_2_16 0.80 0.81 0.83 0.27 4.1 0.018 0.021 0.24
## big_2_21 0.82 0.81 0.84 0.29 4.4 0.017 0.023 0.25
## big_2_26 0.83 0.83 0.85 0.31 4.9 0.015 0.023 0.31
## big_2_31 0.82 0.82 0.84 0.30 4.7 0.016 0.022 0.27
## big_2_36 0.82 0.82 0.85 0.30 4.6 0.016 0.025 0.27
## big_2_41 0.81 0.81 0.83 0.28 4.3 0.017 0.021 0.27
## big_2_46 0.80 0.80 0.82 0.27 4.1 0.018 0.018 0.25
## big_2_51 0.83 0.83 0.85 0.30 4.8 0.016 0.023 0.27
## big_2_56 0.82 0.81 0.83 0.28 4.4 0.017 0.023 0.27
##
## Item statistics
## n raw.r std.r r.cor r.drop mean sd
## big_2_1 256 0.78 0.78 0.78 0.71 3.6 1.05
## big_2_6 255 0.60 0.60 0.54 0.49 3.2 1.05
## big_2_11 255 0.36 0.37 0.28 0.24 4.0 0.89
## big_2_16 257 0.73 0.72 0.70 0.64 3.0 1.13
## big_2_21 257 0.61 0.62 0.58 0.52 3.5 0.93
## big_2_26 257 0.44 0.44 0.35 0.31 3.7 1.14
## big_2_31 257 0.54 0.53 0.47 0.42 2.6 1.11
## big_2_36 257 0.54 0.54 0.47 0.43 3.5 0.99
## big_2_41 257 0.64 0.65 0.62 0.55 3.4 1.00
## big_2_46 255 0.73 0.73 0.73 0.65 3.5 1.04
## big_2_51 256 0.50 0.49 0.43 0.38 3.2 1.06
## big_2_56 257 0.61 0.63 0.59 0.52 3.7 0.92
##
## Non missing response frequency for each item
## 1 2 3 4 5 miss
## big_2_1 0.03 0.13 0.23 0.39 0.21 0.00
## big_2_6 0.04 0.24 0.28 0.34 0.10 0.01
## big_2_11 0.01 0.07 0.13 0.48 0.31 0.01
## big_2_16 0.09 0.29 0.23 0.31 0.07 0.00
## big_2_21 0.02 0.13 0.32 0.41 0.12 0.00
## big_2_26 0.03 0.17 0.19 0.33 0.28 0.00
## big_2_31 0.15 0.43 0.18 0.18 0.05 0.00
## big_2_36 0.04 0.12 0.25 0.47 0.12 0.00
## big_2_41 0.02 0.16 0.31 0.36 0.14 0.00
## big_2_46 0.04 0.16 0.27 0.38 0.15 0.01
## big_2_51 0.04 0.23 0.28 0.34 0.11 0.00
## big_2_56 0.02 0.10 0.27 0.45 0.17 0.00
Q: Do the items on the extraversion subscale have good internal consistency?
Yes, a Cronbach’s alpha between .8 and .9 indicates good internal consistency.
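If you only need the headline coefficient rather than the full report, the object returned by psych::alpha() stores it in its $total element:
# Pull out just the raw Cronbach's alpha for the extraversion subscale
alpha_extraversion$total$raw_alpha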
Go ahead and calculate Cronbach’s alpha for the other four subscales (agreeableness, conscientiousness, neuroticism, and openness) by referring to the bfi2-form.pdf document to see which items belong to which subscale.
# Agreeableness items
alpha_agreeableness <- data2 %>%
dplyr::select(big_2_2, big_2_7, big_2_12, big_2_17, big_2_22, big_2_27, big_2_32, big_2_37, big_2_42, big_2_47, big_2_52, big_2_57) %>%
psych::alpha()
alpha_agreeableness
##
## Reliability analysis
## Call: psych::alpha(x = .)
##
## raw_alpha std.alpha G6(smc) average_r S/N ase mean sd median_r
## 0.77 0.79 0.81 0.23 3.7 0.021 3.7 0.55 0.23
##
## 95% confidence boundaries
## lower alpha upper
## Feldt 0.73 0.77 0.81
## Duhachek 0.73 0.77 0.81
##
## Reliability if an item is dropped:
## raw_alpha std.alpha G6(smc) average_r S/N alpha se var.r med.r
## big_2_2 0.75 0.76 0.78 0.22 3.2 0.023 0.018 0.21
## big_2_7 0.76 0.77 0.78 0.23 3.3 0.023 0.016 0.23
## big_2_12 0.75 0.77 0.79 0.24 3.4 0.023 0.018 0.23
## big_2_17 0.79 0.79 0.81 0.26 3.9 0.019 0.015 0.25
## big_2_22 0.75 0.77 0.79 0.23 3.3 0.023 0.019 0.24
## big_2_27 0.75 0.77 0.79 0.23 3.3 0.023 0.019 0.22
## big_2_32 0.75 0.77 0.79 0.23 3.3 0.023 0.019 0.23
## big_2_37 0.74 0.76 0.78 0.22 3.1 0.025 0.018 0.20
## big_2_42 0.77 0.79 0.81 0.25 3.8 0.021 0.016 0.24
## big_2_47 0.75 0.77 0.79 0.23 3.3 0.024 0.019 0.23
## big_2_52 0.75 0.76 0.77 0.22 3.1 0.023 0.014 0.23
## big_2_57 0.75 0.77 0.79 0.23 3.3 0.023 0.019 0.20
##
## Item statistics
## n raw.r std.r r.cor r.drop mean sd
## big_2_2 255 0.59 0.63 0.59 0.50 4.1 0.90
## big_2_7 256 0.52 0.57 0.53 0.42 4.5 0.76
## big_2_12 256 0.54 0.52 0.45 0.42 3.2 1.00
## big_2_17 256 0.37 0.32 0.21 0.17 3.6 1.32
## big_2_22 256 0.58 0.56 0.50 0.45 3.7 1.07
## big_2_27 256 0.56 0.57 0.51 0.44 3.8 1.03
## big_2_32 257 0.53 0.55 0.49 0.41 3.9 0.97
## big_2_37 256 0.68 0.65 0.62 0.56 3.4 1.15
## big_2_42 257 0.38 0.37 0.28 0.24 2.8 1.05
## big_2_47 256 0.62 0.58 0.53 0.48 3.7 1.15
## big_2_52 255 0.59 0.64 0.64 0.51 4.4 0.69
## big_2_57 257 0.59 0.59 0.54 0.46 3.5 1.08
##
## Non missing response frequency for each item
## 1 2 3 4 5 miss
## big_2_2 0.01 0.04 0.17 0.40 0.37 0.01
## big_2_7 0.00 0.02 0.07 0.32 0.58 0.00
## big_2_12 0.02 0.25 0.29 0.35 0.09 0.00
## big_2_17 0.09 0.14 0.17 0.26 0.33 0.00
## big_2_22 0.02 0.14 0.23 0.35 0.26 0.00
## big_2_27 0.03 0.11 0.16 0.45 0.26 0.00
## big_2_32 0.04 0.05 0.14 0.49 0.28 0.00
## big_2_37 0.04 0.20 0.23 0.32 0.21 0.00
## big_2_42 0.07 0.40 0.28 0.19 0.07 0.00
## big_2_47 0.03 0.18 0.18 0.32 0.29 0.00
## big_2_52 0.00 0.00 0.08 0.42 0.49 0.01
## big_2_57 0.04 0.19 0.24 0.37 0.17 0.00
# Conscientiousness items
alpha_conscientiousness <- data2 %>%
dplyr::select(big_2_3, big_2_8, big_2_13, big_2_18, big_2_23, big_2_28, big_2_33, big_2_38, big_2_43, big_2_48, big_2_53, big_2_58) %>%
psych::alpha()
alpha_conscientiousness
##
## Reliability analysis
## Call: psych::alpha(x = .)
##
## raw_alpha std.alpha G6(smc) average_r S/N ase mean sd median_r
## 0.84 0.84 0.86 0.31 5.3 0.015 3.5 0.61 0.3
##
## 95% confidence boundaries
## lower alpha upper
## Feldt 0.80 0.84 0.86
## Duhachek 0.81 0.84 0.87
##
## Reliability if an item is dropped:
## raw_alpha std.alpha G6(smc) average_r S/N alpha se var.r med.r
## big_2_3 0.82 0.82 0.84 0.30 4.7 0.017 0.014 0.29
## big_2_8 0.82 0.83 0.84 0.31 4.9 0.016 0.015 0.30
## big_2_13 0.82 0.83 0.84 0.30 4.7 0.016 0.012 0.29
## big_2_18 0.82 0.83 0.84 0.30 4.8 0.016 0.013 0.29
## big_2_23 0.84 0.84 0.85 0.32 5.2 0.015 0.012 0.30
## big_2_28 0.83 0.83 0.85 0.31 5.0 0.016 0.016 0.30
## big_2_33 0.82 0.82 0.83 0.29 4.6 0.017 0.012 0.29
## big_2_38 0.82 0.83 0.84 0.30 4.8 0.016 0.014 0.29
## big_2_43 0.82 0.83 0.84 0.30 4.8 0.016 0.013 0.30
## big_2_48 0.82 0.82 0.84 0.30 4.7 0.017 0.014 0.28
## big_2_53 0.82 0.82 0.84 0.30 4.7 0.017 0.014 0.29
## big_2_58 0.83 0.84 0.85 0.32 5.2 0.015 0.013 0.30
##
## Item statistics
## n raw.r std.r r.cor r.drop mean sd
## big_2_3 257 0.69 0.67 0.64 0.59 3.3 1.16
## big_2_8 256 0.60 0.58 0.53 0.49 3.1 1.13
## big_2_13 257 0.61 0.64 0.61 0.52 4.0 0.90
## big_2_18 257 0.59 0.61 0.56 0.50 3.8 0.93
## big_2_23 256 0.49 0.47 0.40 0.36 2.7 1.12
## big_2_28 255 0.57 0.56 0.49 0.46 3.2 1.10
## big_2_33 256 0.69 0.68 0.66 0.59 3.5 1.10
## big_2_38 256 0.61 0.63 0.58 0.52 3.9 0.87
## big_2_43 256 0.58 0.61 0.58 0.50 4.1 0.82
## big_2_48 257 0.65 0.65 0.60 0.56 4.0 1.01
## big_2_53 256 0.64 0.66 0.62 0.55 3.9 0.90
## big_2_58 257 0.51 0.49 0.41 0.39 2.8 1.11
##
## Non missing response frequency for each item
## 1 2 3 4 5 miss
## big_2_3 0.05 0.25 0.22 0.31 0.18 0.00
## big_2_8 0.05 0.32 0.27 0.23 0.13 0.00
## big_2_13 0.01 0.04 0.21 0.40 0.33 0.00
## big_2_18 0.02 0.08 0.23 0.45 0.23 0.00
## big_2_23 0.13 0.39 0.21 0.21 0.06 0.00
## big_2_28 0.03 0.29 0.25 0.29 0.14 0.01
## big_2_33 0.03 0.18 0.25 0.34 0.20 0.00
## big_2_38 0.01 0.05 0.20 0.48 0.26 0.00
## big_2_43 0.01 0.04 0.12 0.49 0.35 0.00
## big_2_48 0.02 0.08 0.18 0.35 0.37 0.00
## big_2_53 0.00 0.09 0.18 0.48 0.25 0.00
## big_2_58 0.07 0.42 0.21 0.21 0.09 0.00
# Neuroticism items
alpha_neuroticism <- data2 %>%
dplyr::select(big_2_4, big_2_9, big_2_14, big_2_19, big_2_24, big_2_29, big_2_34, big_2_39, big_2_44, big_2_49, big_2_54, big_2_59) %>%
psych::alpha()
alpha_neuroticism
##
## Reliability analysis
## Call: psych::alpha(x = .)
##
## raw_alpha std.alpha G6(smc) average_r S/N ase mean sd median_r
## 0.88 0.87 0.89 0.37 7 0.011 2.7 0.72 0.37
##
## 95% confidence boundaries
## lower alpha upper
## Feldt 0.85 0.88 0.9
## Duhachek 0.85 0.88 0.9
##
## Reliability if an item is dropped:
## raw_alpha std.alpha G6(smc) average_r S/N alpha se var.r med.r
## big_2_4 0.86 0.86 0.87 0.36 6.2 0.013 0.0106 0.36
## big_2_9 0.87 0.87 0.88 0.38 6.8 0.012 0.0097 0.39
## big_2_14 0.86 0.86 0.87 0.37 6.4 0.012 0.0104 0.37
## big_2_19 0.87 0.87 0.88 0.39 6.9 0.012 0.0084 0.39
## big_2_24 0.87 0.87 0.88 0.38 6.6 0.012 0.0101 0.37
## big_2_29 0.86 0.86 0.87 0.36 6.2 0.013 0.0101 0.35
## big_2_34 0.86 0.86 0.87 0.36 6.2 0.013 0.0107 0.36
## big_2_39 0.86 0.86 0.87 0.36 6.2 0.013 0.0091 0.35
## big_2_44 0.87 0.86 0.87 0.37 6.4 0.012 0.0101 0.37
## big_2_49 0.87 0.87 0.88 0.37 6.4 0.012 0.0112 0.37
## big_2_54 0.86 0.86 0.87 0.36 6.1 0.013 0.0090 0.35
## big_2_59 0.87 0.86 0.87 0.37 6.4 0.012 0.0100 0.37
##
## Item statistics
## n raw.r std.r r.cor r.drop mean sd
## big_2_4 257 0.69 0.70 0.67 0.62 2.7 1.11
## big_2_9 257 0.53 0.55 0.48 0.45 2.3 0.96
## big_2_14 257 0.67 0.66 0.62 0.58 2.8 1.15
## big_2_19 257 0.52 0.52 0.45 0.43 3.2 1.03
## big_2_24 257 0.58 0.59 0.53 0.49 2.3 1.01
## big_2_29 256 0.69 0.70 0.67 0.62 2.5 1.11
## big_2_34 257 0.71 0.69 0.67 0.62 3.3 1.24
## big_2_39 256 0.70 0.70 0.69 0.63 2.6 1.10
## big_2_44 256 0.65 0.66 0.62 0.58 2.3 0.96
## big_2_49 256 0.65 0.65 0.60 0.56 3.5 1.18
## big_2_54 257 0.72 0.72 0.71 0.65 2.6 1.17
## big_2_59 257 0.66 0.65 0.62 0.57 2.7 1.17
##
## Non missing response frequency for each item
## 1 2 3 4 5 miss
## big_2_4 0.13 0.33 0.27 0.21 0.06 0
## big_2_9 0.19 0.43 0.25 0.12 0.01 0
## big_2_14 0.14 0.30 0.23 0.28 0.05 0
## big_2_19 0.05 0.20 0.26 0.42 0.07 0
## big_2_24 0.22 0.40 0.26 0.10 0.03 0
## big_2_29 0.21 0.35 0.23 0.18 0.03 0
## big_2_34 0.08 0.23 0.17 0.32 0.19 0
## big_2_39 0.17 0.34 0.25 0.20 0.04 0
## big_2_44 0.21 0.43 0.23 0.12 0.01 0
## big_2_49 0.07 0.14 0.20 0.38 0.20 0
## big_2_54 0.21 0.30 0.22 0.23 0.04 0
## big_2_59 0.18 0.32 0.21 0.24 0.05 0
# Openness items
alpha_openness <- data2 %>%
dplyr::select(big_2_5, big_2_10, big_2_15, big_2_20, big_2_25, big_2_30, big_2_35, big_2_40, big_2_45, big_2_50, big_2_55, big_2_60) %>%
psych::alpha()
alpha_openness
##
## Reliability analysis
## Call: psych::alpha(x = .)
##
## raw_alpha std.alpha G6(smc) average_r S/N ase mean sd median_r
## 0.83 0.84 0.85 0.3 5.1 0.015 3.7 0.63 0.31
##
## 95% confidence boundaries
## lower alpha upper
## Feldt 0.8 0.83 0.86
## Duhachek 0.8 0.83 0.86
##
## Reliability if an item is dropped:
## raw_alpha std.alpha G6(smc) average_r S/N alpha se var.r med.r
## big_2_5 0.82 0.82 0.84 0.30 4.7 0.017 0.0100 0.31
## big_2_10 0.82 0.82 0.84 0.30 4.7 0.017 0.0099 0.31
## big_2_15 0.82 0.83 0.84 0.30 4.8 0.016 0.0094 0.31
## big_2_20 0.82 0.82 0.83 0.30 4.6 0.017 0.0078 0.31
## big_2_25 0.82 0.82 0.83 0.29 4.6 0.017 0.0102 0.30
## big_2_30 0.82 0.82 0.83 0.30 4.6 0.017 0.0096 0.31
## big_2_35 0.81 0.81 0.83 0.29 4.4 0.017 0.0088 0.30
## big_2_40 0.82 0.82 0.84 0.30 4.7 0.017 0.0104 0.31
## big_2_45 0.82 0.82 0.84 0.30 4.7 0.017 0.0100 0.30
## big_2_50 0.83 0.83 0.84 0.31 4.8 0.016 0.0093 0.31
## big_2_55 0.82 0.83 0.84 0.30 4.8 0.016 0.0101 0.31
## big_2_60 0.83 0.83 0.84 0.30 4.8 0.016 0.0093 0.31
##
## Item statistics
## n raw.r std.r r.cor r.drop mean sd
## big_2_5 256 0.61 0.59 0.54 0.49 3.4 1.19
## big_2_10 256 0.58 0.60 0.56 0.49 4.1 0.84
## big_2_15 257 0.53 0.56 0.51 0.44 3.7 0.90
## big_2_20 256 0.62 0.61 0.59 0.52 3.7 1.11
## big_2_25 257 0.64 0.62 0.58 0.53 3.7 1.16
## big_2_30 256 0.62 0.62 0.58 0.52 3.6 1.10
## big_2_35 257 0.69 0.69 0.68 0.62 3.9 1.00
## big_2_40 257 0.59 0.60 0.55 0.49 3.8 1.02
## big_2_45 255 0.60 0.60 0.55 0.50 3.8 1.02
## big_2_50 256 0.58 0.55 0.49 0.45 3.0 1.33
## big_2_55 255 0.58 0.57 0.52 0.48 3.6 1.02
## big_2_60 255 0.52 0.55 0.49 0.43 3.7 0.91
##
## Non missing response frequency for each item
## 1 2 3 4 5 miss
## big_2_5 0.05 0.23 0.22 0.29 0.20 0.00
## big_2_10 0.01 0.03 0.17 0.45 0.34 0.00
## big_2_15 0.02 0.09 0.21 0.53 0.14 0.00
## big_2_20 0.04 0.12 0.25 0.32 0.27 0.00
## big_2_25 0.05 0.12 0.21 0.32 0.30 0.00
## big_2_30 0.03 0.16 0.21 0.35 0.25 0.00
## big_2_35 0.02 0.07 0.22 0.37 0.32 0.00
## big_2_40 0.02 0.10 0.24 0.39 0.26 0.00
## big_2_45 0.02 0.10 0.21 0.37 0.30 0.01
## big_2_50 0.16 0.28 0.17 0.24 0.16 0.00
## big_2_55 0.03 0.14 0.24 0.42 0.17 0.01
## big_2_60 0.01 0.11 0.23 0.49 0.15 0.01
Q: How would you judge the internal consistency of the items on each of the other four personality trait subscales?
The agreeableness subscale has acceptable internal consistency (raw alpha = .77). The conscientiousness (.84), neuroticism (.88), and openness (.83) subscales all have good internal consistency (between .8 and .9).
Now that appropriate items have been reverse-coded, we can create a single aggregate score for each subscale that represents how participants scored on that personality trait overall.
A common method of creating a single composite score for a variable measured using multiple items is by calculating the average score across all of the corresponding items. Make sure you use the correct data set containing the reverse-coded items (not the raw data set)!
First, let’s create an aggregate score for the extraversion subscale.
We can do this by selecting the columns from the data frame corresponding to our extraversion items and then passing them to the rowMeans() function.
A note about how missing data is handled: by default, rowMeans() returns NA for anyone who is missing a response on any of the items. We can instead supply the na.rm = TRUE argument, in which case the row mean will be calculated for everyone by simply excluding any missing entries from the calculation. This might be a reasonable approach to handling missingness if the amount of missing data per participant is minimal. Otherwise, we will talk about more methods for handling missing data later on in the course.
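To make the difference concrete, here is a tiny toy example (hypothetical values, not from the BFI-2 data) showing how na.rm changes the result for a row with a missing response.
# Two hypothetical participants; the second skipped item2
toy <- data.frame(item1 = c(4, 2), item2 = c(5, NA))
rowMeans(toy)               # 4.5 and NA
rowMeans(toy, na.rm = TRUE) # 4.5 and 2 (mean of the non-missing item)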
data2$extraversion <- data2 %>%
select(big_2_1, big_2_6, big_2_11, big_2_16, big_2_21, big_2_26, big_2_31, big_2_36, big_2_41, big_2_46, big_2_51, big_2_56) %>%
rowMeans(na.rm = TRUE)
Let’s take a look at the data frame to make sure the new aggregated variable was added at the end of it as expected.
View(data2)
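View() opens the interactive spreadsheet viewer in RStudio. For a quick non-interactive check that the new column was created and stays within the 1-5 response range, a summary works as well:
# The composite should fall between 1 and 5
summary(data2$extraversion)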
Go ahead and construct an aggregate score for the other four subscales (agreeableness, conscientiousness, neuroticism, and openness) by referring to the bfi2-form.pdf document to see which items belong to which subscale.
data2$agreeableness <- data2 %>%
dplyr::select(big_2_2, big_2_7, big_2_12, big_2_17, big_2_22, big_2_27, big_2_32, big_2_37, big_2_42, big_2_47, big_2_52, big_2_57) %>%
rowMeans(na.rm = TRUE)
data2$conscientiousness <- data2 %>%
dplyr::select(big_2_3, big_2_8, big_2_13, big_2_18, big_2_23, big_2_28, big_2_33, big_2_38, big_2_43, big_2_48, big_2_53, big_2_58) %>%
rowMeans(na.rm = TRUE)
data2$neuroticism <- data2 %>%
dplyr::select(big_2_4, big_2_9, big_2_14, big_2_19, big_2_24, big_2_29, big_2_34, big_2_39, big_2_44, big_2_49, big_2_54, big_2_59) %>%
rowMeans(na.rm = TRUE)
data2$openness <- data2 %>%
dplyr::select(big_2_5, big_2_10, big_2_15, big_2_20, big_2_25, big_2_30, big_2_35, big_2_40, big_2_45, big_2_50, big_2_55, big_2_60) %>%
rowMeans(na.rm = TRUE)
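Before moving on, it can be helpful to glance at descriptive statistics for all five composites at once; one option is psych::describe() (base summary() would work just as well):
data2 %>%
  dplyr::select(extraversion, agreeableness, conscientiousness, neuroticism, openness) %>%
  psych::describe()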
For reasons we will discuss more later in the course, it's important to understand the relationships between continuous variables that you intend to use as predictors in your regression models. Next, let's move on to examining the correlations among the five personality traits for which we just constructed aggregate variables.
Covariance captures how two variables vary together, i.e., how they co-vary. If higher values of one variable tend to correspond with higher values of the other variable, and lower values of one variable tend to correspond with lower values of the other variable, then the covariance will be positive. However, if the two variables are inversely related (i.e., higher values on one variable correspond with lower values on the other), then the covariance will be negative.
\[\large cov_{xy} = {\frac{\sum{(x-\bar{x})(y-\bar{y})}}{N-1}}\]
To calculate covariance, use the function cov() from the
{stats} package. The cov() function takes two
arguments: the first variable “x” and the second variable “y”.
cov(data2$extraversion, data2$agreeableness)
## [1] 0.06302961
Passing cov() a data frame, or multiple columns from a data frame, will generate a covariance matrix. Let's calculate a covariance matrix that shows the covariance for every pair of personality traits, rounded to two decimal places.
data2 %>%
select(extraversion, agreeableness, conscientiousness, neuroticism, openness) %>%
cov() %>%
round(2)
## extraversion agreeableness conscientiousness neuroticism
## extraversion 0.37 0.06 0.12 -0.14
## agreeableness 0.06 0.30 0.15 -0.14
## conscientiousness 0.12 0.15 0.37 -0.14
## neuroticism -0.14 -0.14 -0.14 0.51
## openness 0.14 0.11 0.11 -0.08
## openness
## extraversion 0.14
## agreeableness 0.11
## conscientiousness 0.11
## neuroticism -0.08
## openness 0.39
\[\large r_{xy} = {\frac{cov(X,Y)}{\hat\sigma_{x}\hat\sigma_{y}}}\]
To calculate the correlation between two variables, use the cor() function from the {stats} package.
cor(data2$extraversion, data2$agreeableness)
## [1] 0.189135
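As a check on the formula above, dividing the covariance by the product of the two standard deviations reproduces the same value:
# r = cov(X, Y) / (sd(X) * sd(Y)); this should match the cor() result above
cov(data2$extraversion, data2$agreeableness) /
  (sd(data2$extraversion) * sd(data2$agreeableness))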
As with cov(), passing a data frame to cor() generates a correlation matrix. Let's calculate a correlation matrix that shows the correlation for every pair of personality traits, rounded to two decimal places.
cor_matrix <- data2 %>%
select(extraversion, agreeableness, conscientiousness, neuroticism, openness) %>%
cor() %>%
round(2)
cor_matrix
## extraversion agreeableness conscientiousness neuroticism
## extraversion 1.00 0.19 0.33 -0.32
## agreeableness 0.19 1.00 0.46 -0.35
## conscientiousness 0.33 0.46 1.00 -0.33
## neuroticism -0.32 -0.35 -0.33 1.00
## openness 0.37 0.32 0.28 -0.17
## openness
## extraversion 0.37
## agreeableness 0.32
## conscientiousness 0.28
## neuroticism -0.17
## openness 1.00
Q: What do you notice about the relationships between different pairs of personality traits based on this correlation matrix?
None of the correlations are above 0.50. The strongest relationship appears to be between conscientiousness and agreeableness, r = 0.46. The relationship is positive (as people increase on conscientiousness, they tend to increase on agreeableness). The weakest relationship appears to be between neuroticism and openness, r = -0.17. The relationship is negative (as people increase on neuroticism, they tend to decrease on openness), but it is weak.
There are many other things you could notice!
You all are already familiar with scatterplots which can be used to
visualize the relationship between two continuous variables. For
example, let’s use ggplot to visualize the relationship
between extraversion and agreeableness. You can add a line of best fit
by adding a geom_smooth() layer to the plot.
ggplot(data = data2, aes(x = extraversion, y = agreeableness)) +
geom_point() +
geom_smooth(method = "lm")
## `geom_smooth()` using formula = 'y ~ x'
There are also ways of visualizing the correlations among
all of the continuous variables in your dataframe that are of
interest in your study. “SPLOM” stands for scatter plot matrix. The
pairs.panels() function from the {psych}
package allows a quick way to visualize relationships among all the
continuous variables in your data frame. The panels below the diagonal contain
scatter plots showing bivariate relationships between pairs of
variables, and the panels above the diagonal show the corresponding correlation
coefficients. Histograms for each variable are shown along the
diagonal.
data2 %>%
select(extraversion, agreeableness, conscientiousness, neuroticism, openness) %>%
pairs.panels(lm = TRUE)
Heat maps are a great way to get a high-level visualization of a
correlation matrix. They are particularly useful for visualizing the
number of “clusters” in your data if that’s something you’re looking
for. We can plot a heatmap of a correlation matrix using the
corrplot() function from the {corrplot}
package. Note: make sure that you are feeding the function a correlation
matrix (not the data set). We’ll use the correlation matrix that we
constructed earlier.
corrplot(corr = cor_matrix, method = "square")
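If you'd also like the numeric coefficients printed on top of the colored squares, corrplot() has an addCoef.col argument for that (a small variation on the call above):
corrplot(corr = cor_matrix, method = "square", addCoef.col = "black")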
The {apaTables} package has a very useful function, apa.cor.table(), that creates nicely formatted correlation tables in APA format. Note that apa.cor.table() expects the raw data (a data frame containing the variables), rather than an already-computed correlation matrix. The code below writes the table to a Word document called "cor_matrix.doc", which will show up in the folder set as your current working directory.
data2 %>%
  dplyr::select(extraversion, agreeableness, conscientiousness, neuroticism, openness) %>%
  apa.cor.table(filename = "cor_matrix.doc",
                table.number = 1)
The corr.test() function from the {psych}
package can be used to test whether the correlation between two
variables is significantly different from zero.
Let's test whether the correlation between extraversion and agreeableness (r = 0.19) is significantly different from zero.
data2 %>%
select(extraversion, agreeableness) %>%
corr.test()
## Call:corr.test(x = .)
## Correlation matrix
## extraversion agreeableness
## extraversion 1.00 0.19
## agreeableness 0.19 1.00
## Sample Size
## [1] 257
## Probability values (Entries above the diagonal are adjusted for multiple tests.)
## extraversion agreeableness
## extraversion 0 0
## agreeableness 0 0
##
## To see confidence intervals of the correlations, print with the short=FALSE option
The p-values are not printed to enough decimal places, so it is difficult to tell exactly how small they are. To work around this, let's store the output of corr.test() in an object and then look at the p-value stored within that object. Additionally, we can pull out the confidence interval.
r_ext_agr <- data2 %>%
select(extraversion, agreeableness) %>%
corr.test()
r_ext_agr$p
## extraversion agreeableness
## extraversion 0.000000000 0.002328257
## agreeableness 0.002328257 0.000000000
r_ext_agr$ci
## lower r upper p
## extrv-agrbl 0.06835406 0.189135 0.3044518 0.002328257
Q: Is the correlation between extraversion and agreeableness significant?
Yes, the p-value is .002, which is less than an alpha of .05.