R Package of the Day: psych

Pyo

Why I chose psych

  • I chose psych because it is widely used in psychology and survey research.
  • It is useful for describing survey data and for assessing reliability and validity.
  • It lets us explore and visualize complex personality structures using real data.

What is psych?

The psych package is a general-purpose toolbox for psychological, psychometric, and personality research.

Main functions I will show:

  • describe()
  • alpha()
  • fa()

The dataset: Big Five Inventory (bfi)

  • Source: psych package built-in dataset.
  • Size: 2,800 cases and 25 personality items.
  • Five factors: Agreeableness, Conscientiousness, Extraversion, Neuroticism, and Openness.

bfi is one of the demonstration data sets that psych provides for scale construction and factor analysis. Others include mood scales such as msq and ability test scores such as sat.act.

Load the data

library(psych)

data(bfi)
dim(bfi)
[1] 2800   28

Besides the 25 personality items, bfi contains three demographic variables (gender, education, and age), which is why dim() reports 28 columns. The analyses below use only the personality items.
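Selecting the item columns by name rather than by position makes that choice explicit. This is a minimal sketch, assuming every item name is one of the trait letters A, C, E, N, O followed by a digit from 1 to 5 (the naming used throughout bfi); the regular expression and the object names is_item and demographics are my own, not part of psych.

```r
library(psych)
data(bfi)

# Assumed naming: trait letter + item number, e.g. "A1" or "O5"
is_item <- grepl("^[ACENO][1-5]$", names(bfi))

# Keep only the 25 personality items
items25 <- bfi[, is_item]

# Whatever is left over is demographic information
demographics <- names(bfi)[!is_item]
print(demographics)
dim(items25)
```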

Example 1: Descriptive statistics

a_items <- bfi[, c("A1", "A2", "A3", "A4", "A5")]
psych::describe(a_items)
   vars    n mean   sd median trimmed  mad min max range  skew kurtosis   se
A1    1 2784 2.41 1.41      2    2.23 1.48   1   6     5  0.83    -0.31 0.03
A2    2 2773 4.80 1.17      5    4.98 1.48   1   6     5 -1.12     1.05 0.02
A3    3 2774 4.60 1.30      5    4.79 1.48   1   6     5 -1.00     0.44 0.02
A4    4 2781 4.70 1.48      5    4.93 1.48   1   6     5 -1.03     0.04 0.03
A5    5 2784 4.56 1.26      5    4.71 1.48   1   6     5 -0.85     0.16 0.02

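Using describe(), we can quickly obtain descriptive statistics for several personality items at once, which makes it easy to explore the basic properties of the Big Five scales in the bfi dataset. For example, A2 through A5 have means between 4.56 and 4.80 on the 1 to 6 response scale and are negatively skewed, while A1 averages only 2.41 and is positively skewed (0.83), a first hint that A1 is worded in the opposite direction.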
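To see what some of the describe() columns mean, they can be recomputed with base R. This sketch assumes describe()'s defaults (missing values dropped item by item, a 10% trimmed mean) and that se is the standard deviation divided by the square root of n; the names n_obs, se_hand, and trim_hand are illustrative. If those assumptions hold, each all.equal() call returns TRUE.

```r
library(psych)
data(bfi)
a_items <- bfi[, c("A1", "A2", "A3", "A4", "A5")]
d <- psych::describe(a_items)

# Number of non-missing responses per item
n_obs <- colSums(!is.na(a_items))

# Standard error of the mean: sd / sqrt(n), ignoring missing responses
se_hand <- sapply(a_items, sd, na.rm = TRUE) / sqrt(n_obs)

# 10% trimmed mean (assumed describe() default, trim = 0.1)
trim_hand <- sapply(a_items, mean, trim = 0.1, na.rm = TRUE)

all.equal(as.numeric(d$n), as.numeric(n_obs))
all.equal(as.numeric(d$se), as.numeric(se_hand))
all.equal(as.numeric(d$trimmed), as.numeric(trim_hand))
```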

Example 2: Reliability analysis

a_alpha <- psych::alpha(a_items)
Some items ( A1 ) were negatively correlated with the first principal component and 
probably should be reversed.  
To do this, run the function again with the 'check.keys=TRUE' option
a_alpha$total
 raw_alpha std.alpha   G6(smc) average_r       S/N        ase     mean
 0.4314561 0.4586749 0.5321028 0.1449072 0.8473189 0.01628817 4.216732
        sd  median_r
 0.7368514 0.3213122

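alpha() reports Cronbach's alpha, one of the most common measures of internal consistency reliability, which helps us check whether the Agreeableness items behave like one coherent scale.
Here raw_alpha is only about 0.43, well below the commonly cited 0.70 benchmark, largely because A1 ("Am indifferent to the feelings of others") is reverse-keyed. As the warning says, A1 should be reversed (for example with check.keys = TRUE) before the five items are treated as a single scale.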
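Cronbach's alpha can also be computed directly from the item covariance matrix as alpha = k / (k - 1) * (1 - sum of item variances / sum of all covariances). The function cronbach_alpha() below is a hypothetical helper written for illustration, not part of psych. It assumes pairwise-complete covariances, which is my understanding of how alpha() handles missing values, so its first result should be close to raw_alpha above. It then reverses A1 on the 1 to 6 scale (7 - A1) and compares the result with alpha() run with check.keys = TRUE.

```r
library(psych)
data(bfi)
a_items <- bfi[, c("A1", "A2", "A3", "A4", "A5")]

# Hypothetical helper: Cronbach's alpha from the item covariance matrix.
# sum(diag(C)) is the sum of item variances; sum(C), the sum of all
# covariances, equals the variance of the summed score for complete data.
cronbach_alpha <- function(items) {
  C <- cov(items, use = "pairwise.complete.obs")
  k <- ncol(C)
  (k / (k - 1)) * (1 - sum(diag(C)) / sum(C))
}

cronbach_alpha(a_items)  # compare with a_alpha$total$raw_alpha

# Reverse the negatively worded item on the 1-6 response scale
a_rev <- a_items
a_rev$A1 <- 7 - a_rev$A1
cronbach_alpha(a_rev)

# Let psych find and reverse the item itself, as its warning suggests
a_alpha_keyed <- psych::alpha(a_items, check.keys = TRUE)
a_alpha_keyed$total$raw_alpha
```

Reversing an item only flips the sign of its covariances with the other items, so the hand-reversed and check.keys results should agree.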

Example 3: Factor analysis

items25 <- bfi[, 1:25]
fa_result <- psych::fa(items25, nfactors = 5, rotate = "varimax")
print(fa_result$loadings, cutoff = 0.40)

Loadings:
   MR2    MR1    MR3    MR5    MR4   
A1                      -0.410       
A2                       0.615       
A3                       0.637       
A4                       0.423       
A5                       0.533       
C1                0.536              
C2                0.647              
C3                0.550              
C4               -0.607              
C5               -0.553              
E1        -0.574                     
E2        -0.680                     
E3         0.540                     
E4         0.645                     
E5         0.501                     
N1  0.769                            
N2  0.748                            
N3  0.734                            
N4  0.585                            
N5  0.541                            
O1                              0.496
O2                             -0.453
O3                              0.590
O4                                   
O5                             -0.536

                 MR2   MR1   MR3   MR5   MR4
SS loadings    2.690 2.442 1.975 1.783 1.479
Proportion Var 0.108 0.098 0.079 0.071 0.059
Cumulative Var 0.108 0.205 0.284 0.356 0.415

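Using psych::fa(), we can see that the observed items do cluster into the expected Big Five structure in this real dataset: each set of five items loads on its own factor (N on MR2, E on MR1, C on MR3, A on MR5, O on MR4), and only O4 has no loading above the 0.40 cutoff. The negative loadings (A1, C4, C5, E1, E2, O2, O5) mark reverse-keyed items, consistent with the A1 warning from alpha().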
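Rather than reading the cutoff table by eye, the same pattern can be summarized programmatically. This is a minimal sketch, assuming each item's intended trait is the first letter of its name; L, best, trait, and dominant are illustrative names. The table cross-tabulates each item's intended trait against the factor with its largest absolute loading, and the last line lists items whose dominant loading is negative.

```r
library(psych)
data(bfi)
items25 <- bfi[, 1:25]
fa_result <- psych::fa(items25, nfactors = 5, rotate = "varimax")

# Loadings as a plain 25 x 5 numeric matrix
L <- unclass(fa_result$loadings)

# Column index of the largest absolute loading for each item
best_idx <- apply(abs(L), 1, which.max)
best <- colnames(L)[best_idx]

# Intended trait = first letter of the item name (assumption)
trait <- substr(rownames(L), 1, 1)
table(trait, best)

# Signed value of each item's dominant loading
dominant <- L[cbind(seq_len(nrow(L)), best_idx)]

# Items whose dominant loading is negative (candidate reverse-keyed items)
rownames(L)[dominant < 0]
```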

Why this package is useful

  • It combines several important psychometric tools in one package.
  • It works well for survey scales and personality data.
  • It is especially useful when researchers want to move from item summaries to scale reliability and latent structure.

Strengths

  • Easy access to descriptive statistics.
  • Convenient reliability analysis with detailed output.
  • Strong support for factor analysis and scale construction.

Limitations

  • Output can be dense for beginners.
  • Correct interpretation still requires statistical judgment.
  • Some results are easier to visualize with additional packages.