Data

Load the data from the two years and bind them to create a single data frame:

# load data
eth_2018 <- read_csv("data/2018/all.csv")
eth_2017 <- read_csv("data/2017/all.csv")

# add year as the identifier column to each data frame
eth_2018 <- eth_2018 %>% 
  mutate(year = 2018)
eth_2017 <- eth_2017 %>% 
  mutate(year = 2017)

# bind the two data frames together
eth <- bind_rows(eth_2017, eth_2018) %>%
  relocate(year)

Skim an overview of the data:

skim(eth) %>% kable()
skim_type skim_variable n_missing complete_rate numeric.mean numeric.sd numeric.p0 numeric.p25 numeric.p50 numeric.p75 numeric.p100 numeric.hist
numeric year 0 1 2017.5086232 0.4999987 2017 2017 2018 2018 2018 ▇▁▁▁▇
numeric A1 0 1 0.9517685 0.2142864 0 1 1 1 1 ▁▁▁▁▇
numeric A2 0 1 0.2934814 0.4554237 0 0 0 1 1 ▇▁▁▁▃
numeric A3 0 1 0.6331482 0.4820160 0 0 1 1 1 ▅▁▁▁▇
numeric A4 0 1 0.6337328 0.4818541 0 0 1 1 1 ▅▁▁▁▇
numeric A5 0 1 0.4793920 0.4996482 0 0 0 1 1 ▇▁▁▁▇
numeric A6 0 1 0.7202572 0.4489384 0 0 1 1 1 ▃▁▁▁▇
numeric A7 0 1 0.7065186 0.4554237 0 0 1 1 1 ▃▁▁▁▇
numeric A8 0 1 0.4849459 0.4998464 0 0 0 1 1 ▇▁▁▁▇
numeric A9 0 1 0.4612686 0.4985705 0 0 0 1 1 ▇▁▁▁▇
numeric A10 0 1 0.5422391 0.4982855 0 0 1 1 1 ▇▁▁▁▇
numeric A11 0 1 0.5784858 0.4938737 0 0 1 1 1 ▆▁▁▁▇
numeric A12 0 1 0.6968723 0.4596771 0 0 1 1 1 ▃▁▁▁▇
numeric A13 0 1 0.3656825 0.4816914 0 0 0 1 1 ▇▁▁▁▅
numeric A14 0 1 0.5667933 0.4955910 0 0 1 1 1 ▆▁▁▁▇
numeric A15 0 1 0.4615609 0.4985931 0 0 0 1 1 ▇▁▁▁▇
numeric A16 0 1 0.1935107 0.3951075 0 0 0 0 1 ▇▁▁▁▂
numeric A17 0 1 0.5960246 0.4907644 0 0 1 1 1 ▆▁▁▁▇
numeric A18 0 1 0.3724057 0.4835163 0 0 0 1 1 ▇▁▁▁▅
numeric A19 0 1 0.3469746 0.4760772 0 0 0 1 1 ▇▁▁▁▅
numeric A20 0 1 0.7240573 0.4470534 0 0 1 1 1 ▃▁▁▁▇
numeric A21 0 1 0.5150541 0.4998464 0 0 1 1 1 ▇▁▁▁▇
numeric A22 0 1 0.5171003 0.4997805 0 0 1 1 1 ▇▁▁▁▇
numeric A23 0 1 0.4223911 0.4940123 0 0 0 1 1 ▇▁▁▁▆
numeric A24 0 1 0.5822859 0.4932547 0 0 1 1 1 ▆▁▁▁▇
numeric A25 0 1 0.6381175 0.4806153 0 0 1 1 1 ▅▁▁▁▇
numeric A26 0 1 0.8018123 0.3986926 0 1 1 1 1 ▂▁▁▁▇
numeric A27 0 1 0.3957907 0.4890913 0 0 0 1 1 ▇▁▁▁▅
numeric A28 0 1 0.4387606 0.4963081 0 0 0 1 1 ▇▁▁▁▆
numeric A29 0 1 0.5007308 0.5000726 0 0 1 1 1 ▇▁▁▁▇
numeric A30 0 1 0.7430576 0.4370113 0 0 1 1 1 ▃▁▁▁▇
numeric A31 0 1 0.3735750 0.4838235 0 0 0 1 1 ▇▁▁▁▅
numeric A32 0 1 0.2151418 0.4109807 0 0 0 0 1 ▇▁▁▁▂
numeric A33 0 1 0.7699503 0.4209259 0 1 1 1 1 ▂▁▁▁▇
numeric A34 0 1 0.5983631 0.4903009 0 0 1 1 1 ▆▁▁▁▇
numeric A35 0 1 0.3285589 0.4697579 0 0 0 1 1 ▇▁▁▁▃
numeric A36 0 1 0.2244958 0.4173108 0 0 0 0 1 ▇▁▁▁▂
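
The summary shows no missing values, and the mean of year (about 2017.5) indicates the two cohorts are close in size; a quick count makes this explicit:

# number of students in each cohort
eth %>% count(year)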

Multidimensional Item Response Theory

MIRT in R

We can fit a multidimensional item response theory model using the mirt package. From the function’s documentation:

mirt fits a maximum likelihood (or maximum a posteriori) factor analysis model to any mixture of dichotomous and polytomous data under the item response theory paradigm using either Cai’s (2010) Metropolis-Hastings Robbins-Monro (MHRM) algorithm or an EM algorithm approach outlined by Bock and Aitkin (1981).

The process is to first fit the model and save the result as a model object, which we can then parse into whatever tabular or visual summaries we want. When fitting the model we can specify a few arguments, which control how the model is specified and estimated.

fit_2pl <- mirt(
  data = eth[2:37],  # all but the year column
  model = 1,         # number of factors to extract
  itemtype = "2PL",  # 2 parameter logistic model
  SE = TRUE          # estimate standard errors
  )
## 
Iteration: 1, Log-Lik: -66428.235, Max-Change: 0.72929
Iteration: 2, Log-Lik: -65241.884, Max-Change: 0.24383
Iteration: 3, Log-Lik: -65081.725, Max-Change: 0.14895
Iteration: 4, Log-Lik: -65014.543, Max-Change: 0.11122
Iteration: 5, Log-Lik: -64977.384, Max-Change: 0.09454
Iteration: 6, Log-Lik: -64955.545, Max-Change: 0.06530
Iteration: 7, Log-Lik: -64942.861, Max-Change: 0.06130
Iteration: 8, Log-Lik: -64935.381, Max-Change: 0.04421
Iteration: 9, Log-Lik: -64930.611, Max-Change: 0.03616
Iteration: 10, Log-Lik: -64927.669, Max-Change: 0.02949
Iteration: 11, Log-Lik: -64925.871, Max-Change: 0.01790
Iteration: 12, Log-Lik: -64924.788, Max-Change: 0.01302
Iteration: 13, Log-Lik: -64922.989, Max-Change: 0.00234
Iteration: 14, Log-Lik: -64922.963, Max-Change: 0.00196
Iteration: 15, Log-Lik: -64922.949, Max-Change: 0.00172
Iteration: 16, Log-Lik: -64922.930, Max-Change: 0.00086
Iteration: 17, Log-Lik: -64922.925, Max-Change: 0.00094
Iteration: 18, Log-Lik: -64922.922, Max-Change: 0.00077
Iteration: 19, Log-Lik: -64922.912, Max-Change: 0.00018
Iteration: 20, Log-Lik: -64922.912, Max-Change: 0.00019
Iteration: 21, Log-Lik: -64922.912, Max-Change: 0.00018
Iteration: 22, Log-Lik: -64922.910, Max-Change: 0.00011
Iteration: 23, Log-Lik: -64922.910, Max-Change: 0.00010
Iteration: 24, Log-Lik: -64922.910, Max-Change: 0.00010
## 
## Calculating information matrix...

We then compute factor score estimates and add them to the existing data frame, to keep everything in one place. To do the estimation, we use the fscores() function from the mirt package, which takes a fitted model object and computes factor score estimates according to the method specified. We will use the EAP (“expected a posteriori”) method. We specify it explicitly below, but the results would be the same if we omitted the method argument, since EAP is the default.

eth <- eth %>%
  mutate(F1 = fscores(fit_2pl, method = "EAP"))
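
If we also want the uncertainty of each ability estimate, fscores() can return standard errors alongside the scores. A small sketch, assuming a version of mirt that supports the full.scores.SE argument:

# factor scores plus their standard errors (columns F1 and SE_F1)
scores_with_se <- fscores(fit_2pl, method = "EAP", full.scores.SE = TRUE)
head(scores_with_se)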

We can also extract the model coefficient estimates with the generic coef() function, which extracts coefficients from objects returned by modelling functions. We set the IRTpars argument to TRUE so that the slope-intercept parameters are converted into traditional IRT parameters.

coefs_2pl <- coef(fit_2pl, IRTpars = TRUE)

The resulting object coefs_2pl is a list, with one element for each question and an additional GroupPars element that we won’t be using. The output is a bit long, so we’re only showing a few of the elements here:

coefs_2pl[1:3]
## $A1
##                a         b  g  u
## par     1.200868 -2.990220  0  1
## CI_2.5  1.005934 -3.356706 NA NA
## CI_97.5 1.395802 -2.623733 NA NA
## 
## $A2
##                 a        b  g  u
## par     0.5617763 1.672543  0  1
## CI_2.5  0.4755542 1.407243 NA NA
## CI_97.5 0.6479984 1.937843 NA NA
## 
## $A3
##                a          b  g  u
## par     1.348138 -0.5426261  0  1
## CI_2.5  1.229726 -0.6171188 NA NA
## CI_97.5 1.466549 -0.4681333 NA NA
coefs_2pl[35:37]
## $A35
##                a         b  g  u
## par     1.276322 0.7324404  0  1
## CI_2.5  1.160025 0.6487785 NA NA
## CI_97.5 1.392619 0.8161023 NA NA
## 
## $A36
##                a        b  g  u
## par     1.245564 1.275905  0  1
## CI_2.5  1.121704 1.161266 NA NA
## CI_97.5 1.369425 1.390544 NA NA
## 
## $GroupPars
##         MEAN_1 COV_11
## par          0      1
## CI_2.5      NA     NA
## CI_97.5     NA     NA

Let’s take a closer look at the first element:

coefs_2pl[1]
## $A1
##                a         b  g  u
## par     1.200868 -2.990220  0  1
## CI_2.5  1.005934 -3.356706 NA NA
## CI_97.5 1.395802 -2.623733 NA NA

In this output:

  • a is the discrimination parameter
  • b is the difficulty parameter
  • g and u are the lower (guessing) and upper asymptotes, which the 2PL model fixes at 0 and 1
  • the endpoints of the 95% confidence intervals are also shown

A short sketch below makes a and b concrete.
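
Under the 2PL model, the probability that a student with ability theta answers an item correctly is 1 / (1 + exp(-a * (theta - b))). Here is that sketch, using the A1 estimates above (p_2pl is an illustrative helper, not part of mirt):

# item characteristic curve of the 2PL model:
# P(correct | theta) = 1 / (1 + exp(-a * (theta - b)))
p_2pl <- function(theta, a, b) {
  1 / (1 + exp(-a * (theta - b)))
}

# A1 has a of about 1.20 and b of about -2.99: an average student (theta = 0)
# almost always answers it correctly, and a student whose ability equals the
# difficulty has exactly a 50% chance
p_2pl(theta = 0, a = 1.20, b = -2.99)      # roughly 0.97
p_2pl(theta = -2.99, a = 1.20, b = -2.99)  # exactly 0.5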

To make this output a little more user-friendly, we should tidy it so that we have one row per question. We’ll do this in two steps. First, write a function that tidies the output for one question, i.e. one list element. Then, map this function over the list of all questions, resulting in a data frame.

tidy_mirt_coefs <- function(x){
  x %>%
    # melt the list element
    melt() %>%
    # convert to a tibble
    as_tibble() %>%
    # convert factors to characters
    mutate(across(where(is.factor), as.character)) %>%
    # only focus on rows where X2 is a or b (discrimination or difficulty)
    filter(X2 %in% c("a", "b")) %>%
    # in X1, relabel par (parameter) as est (estimate)
    mutate(X1 = if_else(X1 == "par", "est", X1)) %>%
    # unite columns X2 and X1 into a new column called var separated by _
    unite(X2, X1, col = "var", sep = "_") %>%
    # turn into a wider data frame
    pivot_wider(names_from = var, values_from = value)
}
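
If you are curious what the intermediate melted form looks like before the filtering and pivoting, you can run just the first two steps on one element (the X1, X2, value and L1 column names come from reshape’s melt()):

# long format: one row per (statistic, parameter) pair for question A1
melt(coefs_2pl[1]) %>%
  as_tibble()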

Let’s see what this does to a single element of coefs_2pl:

tidy_mirt_coefs(coefs_2pl[1])
## # A tibble: 1 x 7
##   L1    a_est a_CI_2.5 a_CI_97.5 b_est b_CI_2.5 b_CI_97.5
##   <chr> <dbl>    <dbl>     <dbl> <dbl>    <dbl>     <dbl>
## 1 A1     1.20     1.01      1.40 -2.99    -3.36     -2.62

And now let’s map it over the 36 question elements of coefs_2pl:

tidy_2pl <- map_dfr(coefs_2pl[1:36], tidy_mirt_coefs, .id = "Question")

A quick peek at the result:

tidy_2pl
## # A tibble: 36 x 7
##    Question a_est a_CI_2.5 a_CI_97.5   b_est b_CI_2.5 b_CI_97.5
##    <chr>    <dbl>    <dbl>     <dbl>   <dbl>    <dbl>     <dbl>
##  1 A1       1.20     1.01      1.40  -2.99   -3.36      -2.62  
##  2 A2       0.562    0.476     0.648  1.67    1.41       1.94  
##  3 A3       1.35     1.23      1.47  -0.543  -0.617     -0.468 
##  4 A4       1.18     1.07      1.29  -0.592  -0.675     -0.509 
##  5 A5       1.46     1.34      1.58   0.0787  0.0147     0.143 
##  6 A6       1.55     1.42      1.69  -0.864  -0.943     -0.785 
##  7 A7       1.28     1.16      1.40  -0.896  -0.986     -0.805 
##  8 A8       1.16     1.05      1.27   0.0659 -0.00750    0.139 
##  9 A9       2.02     1.86      2.18   0.127   0.0723     0.182 
## 10 A10      1.76     1.62      1.90  -0.147  -0.206     -0.0886
## # … with 26 more rows

And a nicely formatted table of the result:

gt(tidy_2pl) %>%
  fmt_number(columns = contains("_"), decimals = 3) %>%
  data_color(
    columns = contains("a_"),
    colors = scales::col_numeric(palette = c("Greens"), domain = NULL)
  ) %>%
  data_color(
    columns = contains("b_"),
    colors = scales::col_numeric(palette = c("Blues"), domain = NULL)
  ) %>%
  tab_spanner(label = "Discrimination", columns = contains("a_")) %>%
  tab_spanner(label = "Difficulty", columns = contains("b_")) %>%
  cols_label(
    a_est = "Est.",
    b_est = "Est.",
    a_CI_2.5 = "2.5%",
    b_CI_2.5 = "2.5%",
    a_CI_97.5 = "97.5%",
    b_CI_97.5 = "97.5%"
  )
Question  Discrimination (Est., 2.5%, 97.5%)  Difficulty (Est., 2.5%, 97.5%)
A1 1.201 1.006 1.396 −2.990 −3.357 −2.624
A2 0.562 0.476 0.648 1.673 1.407 1.938
A3 1.348 1.230 1.467 −0.543 −0.617 −0.468
A4 1.177 1.068 1.286 −0.592 −0.675 −0.509
A5 1.462 1.340 1.584 0.079 0.015 0.143
A6 1.553 1.416 1.690 −0.864 −0.943 −0.785
A7 1.281 1.162 1.400 −0.896 −0.986 −0.805
A8 1.160 1.055 1.265 0.066 −0.007 0.139
A9 2.017 1.856 2.178 0.127 0.072 0.182
A10 1.762 1.620 1.904 −0.147 −0.206 −0.089
A11 1.678 1.540 1.815 −0.281 −0.342 −0.220
A12 0.886 0.789 0.984 −1.092 −1.223 −0.960
A13 0.716 0.628 0.803 0.856 0.718 0.994
A14 1.433 1.312 1.554 −0.259 −0.325 −0.193
A15 0.848 0.757 0.938 0.210 0.117 0.303
A16 1.544 1.396 1.693 1.291 1.189 1.392
A17 1.221 1.111 1.331 −0.411 −0.487 −0.335
A18 1.234 1.122 1.347 0.547 0.468 0.626
A19 1.309 1.191 1.427 0.639 0.560 0.718
A20 2.318 2.119 2.518 −0.741 −0.802 −0.679
A21 2.143 1.973 2.313 −0.048 −0.102 0.005
A22 1.996 1.837 2.154 −0.056 −0.111 −0.001
A23 1.874 1.723 2.025 0.264 0.206 0.322
A24 1.433 1.311 1.554 −0.320 −0.387 −0.253
A25 0.530 0.449 0.612 −1.139 −1.347 −0.930
A26 2.418 2.193 2.642 −1.042 −1.112 −0.972
A27 2.000 1.838 2.162 0.347 0.290 0.404
A28 1.859 1.709 2.008 0.209 0.151 0.266
A29 0.912 0.819 1.005 −0.004 −0.090 0.083
A30 1.258 1.137 1.379 −1.091 −1.193 −0.988
A31 1.255 1.142 1.368 0.537 0.459 0.614
A32 1.607 1.457 1.757 1.151 1.060 1.242
A33 0.932 0.825 1.038 −1.518 −1.680 −1.355
A34 0.906 0.811 1.000 −0.517 −0.614 −0.420
A35 1.276 1.160 1.393 0.732 0.649 0.816
A36 1.246 1.122 1.369 1.276 1.161 1.391
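
As a cross-check on the tidying pipeline above, coef() can also return the item parameters directly in matrix form through its simplify argument (available in recent versions of mirt), although this drops the confidence intervals:

# point estimates only: one row per question, columns a, b, g, u
coefs_simple <- coef(fit_2pl, IRTpars = TRUE, simplify = TRUE)
head(coefs_simple$items)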

Comparing years and classes

Do students from different programmes of study have different distributions of ability?

# produce list of all the relevant file names
# (match only the courses beginning 401, i.e. exclude "all.csv")
files <- dir_ls("data", recurse = TRUE, regexp = "401.*.csv")

# read all files and add a column called file_path to identify them
eth_entry_test <- vroom(files, id = "file_path")

# parse file_path column into year and class
eth_entry_test <- eth_entry_test %>%
  mutate(
    file_path = str_remove(file_path, "data/"),
    file_path = str_remove(file_path, ".csv"),
    ) %>%
  separate(file_path, c("year", "class"), sep = "/")

# fit IRT model
fit_2pl <- mirt(
  eth_entry_test[3:38],  # all but the year and class columns
  model = 1,             # number of factors to extract
  itemtype = "2PL",      # 2 parameter logistic model
  SE = TRUE              # estimate standard errors
  )
## 
Iteration: 1, Log-Lik: -85250.363, Max-Change: 0.82374
Iteration: 2, Log-Lik: -83714.765, Max-Change: 0.37956
Iteration: 3, Log-Lik: -83535.988, Max-Change: 0.17218
Iteration: 4, Log-Lik: -83473.241, Max-Change: 0.09242
Iteration: 5, Log-Lik: -83442.310, Max-Change: 0.06845
Iteration: 6, Log-Lik: -83425.490, Max-Change: 0.05833
Iteration: 7, Log-Lik: -83416.100, Max-Change: 0.03700
Iteration: 8, Log-Lik: -83410.552, Max-Change: 0.02708
Iteration: 9, Log-Lik: -83407.196, Max-Change: 0.02090
Iteration: 10, Log-Lik: -83402.119, Max-Change: 0.00403
Iteration: 11, Log-Lik: -83401.998, Max-Change: 0.00347
Iteration: 12, Log-Lik: -83401.919, Max-Change: 0.00292
Iteration: 13, Log-Lik: -83401.792, Max-Change: 0.00198
Iteration: 14, Log-Lik: -83401.775, Max-Change: 0.00117
Iteration: 15, Log-Lik: -83401.765, Max-Change: 0.00104
Iteration: 16, Log-Lik: -83401.737, Max-Change: 0.00033
Iteration: 17, Log-Lik: -83401.736, Max-Change: 0.00037
Iteration: 18, Log-Lik: -83401.735, Max-Change: 0.00031
Iteration: 19, Log-Lik: -83401.731, Max-Change: 0.00023
Iteration: 20, Log-Lik: -83401.731, Max-Change: 0.00018
Iteration: 21, Log-Lik: -83401.730, Max-Change: 0.00018
Iteration: 22, Log-Lik: -83401.730, Max-Change: 0.00016
Iteration: 23, Log-Lik: -83401.730, Max-Change: 0.00015
Iteration: 24, Log-Lik: -83401.730, Max-Change: 0.00014
Iteration: 25, Log-Lik: -83401.729, Max-Change: 0.00012
Iteration: 26, Log-Lik: -83401.729, Max-Change: 0.00011
Iteration: 27, Log-Lik: -83401.729, Max-Change: 0.00010
Iteration: 28, Log-Lik: -83401.729, Max-Change: 0.00010
## 
## Calculating information matrix...
# augment data with factor scores from model fit
eth_entry_test <- eth_entry_test %>%
  mutate(F1 = fscores(fit_2pl, method = "EAP"))
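
A quick optional check that the file-path parsing produced the expected year and class combinations:

# one row per year/class combination, with the number of students in each
eth_entry_test %>% count(year, class)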

Differences between years

Compare the distribution of abilities in the two years.

ggplot(eth_entry_test, aes(F1, fill = as.factor(year), colour = as.factor(year))) +
  geom_density(alpha=0.5) + 
  scale_x_continuous(limits = c(-3.5,3.5)) +
  labs(title = "Density plot", 
       subtitle = "Ability grouped by year of taking the test", 
       x = "Ability", y = "Density",
       fill = "Year", colour = "Year") +
  theme_minimal()

There does not seem to be a big difference between the two year groups, so we combine them in the following analysis.
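
To back up the visual impression with numbers, we can compare simple summaries of ability by year (fscores() returns a one-column matrix, so we coerce F1 to a plain vector first):

eth_entry_test %>%
  mutate(F1 = as.numeric(F1)) %>%
  group_by(year) %>%
  summarise(mean_ability = mean(F1), sd_ability = sd(F1), n = n())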

Differences between classes

Compare the distribution of abilities in the various classes.

ggplot(eth_entry_test, aes(x = F1, y = class, colour = class, fill = class)) +
  geom_density_ridges(alpha = 0.5) + 
  scale_x_continuous(limits = c(-3.5,3.5)) +
  guides(fill = FALSE, colour = FALSE) +
  labs(title = "Density plot", 
       subtitle = "Ability grouped by class of taking the test", 
       x = "Ability", y = "Class") +
  theme_minimal()
## Picking joint bandwidth of 0.256

Factor analysis

As an alternative way of looking at the same items, we can run a classical exploratory factor analysis. A parallel analysis first suggests how many factors to retain:

fa.parallel(eth[2:37], fa = "fa")

## Parallel analysis suggests that the number of factors =  9  and the number of components =  NA

1 Factor

fitfact <- factanal(eth[2:37], 1, rotation = "varimax")
print(fitfact, digits = 2, cutoff = 0.3, sort = TRUE)
## 
## Call:
## factanal(x = eth[2:37], factors = 1, rotation = "varimax")
## 
## Uniquenesses:
##   A1   A2   A3   A4   A5   A6   A7   A8   A9  A10  A11  A12  A13  A14  A15  A16 
## 0.95 0.95 0.75 0.79 0.72 0.74 0.79 0.79 0.61 0.66 0.67 0.88 0.91 0.73 0.87 0.81 
##  A17  A18  A19  A20  A21  A22  A23  A24  A25  A26  A27  A28  A29  A30  A31  A32 
## 0.78 0.78 0.78 0.63 0.59 0.61 0.64 0.73 0.94 0.69 0.63 0.64 0.85 0.81 0.78 0.78 
##  A33  A34  A35  A36 
## 0.88 0.86 0.79 0.84 
## 
## Loadings:
##  [1] 0.53 0.51 0.62 0.59 0.57 0.52 0.61 0.64 0.62 0.60 0.52 0.56 0.61 0.60     
## [16]      0.50 0.45 0.46 0.46 0.35 0.31 0.36 0.44 0.47 0.46 0.47      0.39 0.44
## [31] 0.47 0.47 0.34 0.38 0.46 0.40
## 
##                Factor1
## SS loadings       8.35
## Proportion Var    0.23
## 
## Test of the hypothesis that 1 factor is sufficient.
## The chi square statistic is 5428.36 on 594 degrees of freedom.
## The p-value is 0
load <- tidy(fitfact)

ggplot(load, aes(x = fl1, y = 0)) + 
  geom_point() + 
  geom_label_repel(aes(label = paste0("A", rownames(load))), show.legend = FALSE) +
  labs(x = "Factor 1", y = NULL,
       title = "Standardised Loadings", 
       subtitle = "Based upon correlation matrix") +
  theme_minimal()

2 Factors

fitfact <- factanal(eth[2:37], 2, rotation = "varimax")
print(fitfact, digits = 2, cutoff = 0.3, sort = TRUE)
## 
## Call:
## factanal(x = eth[2:37], factors = 2, rotation = "varimax")
## 
## Uniquenesses:
##   A1   A2   A3   A4   A5   A6   A7   A8   A9  A10  A11  A12  A13  A14  A15  A16 
## 0.95 0.94 0.75 0.79 0.72 0.74 0.78 0.79 0.61 0.66 0.67 0.88 0.90 0.73 0.87 0.81 
##  A17  A18  A19  A20  A21  A22  A23  A24  A25  A26  A27  A28  A29  A30  A31  A32 
## 0.77 0.40 0.04 0.63 0.59 0.61 0.64 0.72 0.94 0.69 0.63 0.64 0.85 0.80 0.78 0.78 
##  A33  A34  A35  A36 
## 0.88 0.86 0.79 0.84 
## 
## Loadings:
##     Factor1 Factor2
## A9  0.60           
## A10 0.55           
## A11 0.56           
## A20 0.58           
## A21 0.59           
## A22 0.59           
## A23 0.58           
## A24 0.51           
## A26 0.53           
## A27 0.58           
## A28 0.57           
## A18         0.75   
## A19         0.97   
## A1                 
## A2                 
## A3  0.47           
## A4  0.44           
## A5  0.50           
## A6  0.49           
## A7  0.46           
## A8  0.42           
## A12 0.33           
## A13 0.30           
## A14 0.48           
## A15 0.36           
## A16 0.38           
## A17 0.46           
## A25                
## A29 0.36           
## A30 0.44           
## A31 0.45           
## A32 0.43           
## A33 0.34           
## A34 0.36           
## A35 0.42           
## A36 0.38           
## 
##                Factor1 Factor2
## SS loadings       7.25    2.29
## Proportion Var    0.20    0.06
## Cumulative Var    0.20    0.27
## 
## Test of the hypothesis that 2 factors are sufficient.
## The chi square statistic is 3050.55 on 559 degrees of freedom.
## The p-value is 0
load <- tidy(fitfact)

ggplot(load, aes(x = fl1, y = fl2)) + 
  geom_point() + 
  geom_label_repel(aes(label = paste0("A", rownames(load))), show.legend = FALSE) +
  labs(x = "Factor 1", y = "Factor 2",
       title = "Standardised Loadings", 
       subtitle = "Based upon correlation matrix") +
  theme_minimal()

9 Factors (showing only first 2)

fitfact <- factanal(eth[2:37], 9, rotation = "varimax")
print(fitfact, digits = 2, cutoff = 0.3, sort = TRUE)
## 
## Call:
## factanal(x = eth[2:37], factors = 9, rotation = "varimax")
## 
## Uniquenesses:
##   A1   A2   A3   A4   A5   A6   A7   A8   A9  A10  A11  A12  A13  A14  A15  A16 
## 0.88 0.64 0.71 0.74 0.64 0.66 0.72 0.74 0.46 0.64 0.64 0.86 0.79 0.69 0.82 0.56 
##  A17  A18  A19  A20  A21  A22  A23  A24  A25  A26  A27  A28  A29  A30  A31  A32 
## 0.66 0.41 0.00 0.47 0.51 0.57 0.60 0.62 0.91 0.51 0.59 0.61 0.80 0.71 0.75 0.74 
##  A33  A34  A35  A36 
## 0.83 0.69 0.55 0.70 
## 
## Loadings:
##     Factor1 Factor2 Factor3 Factor4 Factor5 Factor6 Factor7 Factor8 Factor9
## A24  0.51                                                                  
## A20          0.62                                                          
## A26  0.31    0.56                                                          
## A9   0.31            0.59                                                  
## A18                          0.70                                          
## A19                          0.96                                          
## A35                                  0.61                                  
## A16                                          0.58                          
## A2                                                   0.59                  
## A1                                                                         
## A3                                                           0.33          
## A4   0.37                                                                  
## A5                                                           0.38          
## A6                                                           0.36          
## A7   0.44                                                                  
## A8                   0.31                                                  
## A10                  0.35                                                  
## A11  0.32            0.39                                                  
## A12                                                                        
## A13                                                  0.34                  
## A14                                                                        
## A15                                                                        
## A17  0.50                                                                  
## A21          0.47                                                          
## A22          0.42                                                          
## A23          0.30    0.41                                                  
## A25                                                                        
## A27          0.38                                                          
## A28                  0.43                                                  
## A29          0.31                                                          
## A30  0.48                                                                  
## A31                                                                        
## A32                                                                        
## A33  0.32                                                                  
## A34                                  0.47                                  
## A36                                  0.44                                  
## 
##                Factor1 Factor2 Factor3 Factor4 Factor5 Factor6 Factor7 Factor8
## SS loadings       2.61    2.22    1.93    1.69    1.37    0.96    0.81    0.76
## Proportion Var    0.07    0.06    0.05    0.05    0.04    0.03    0.02    0.02
## Cumulative Var    0.07    0.13    0.19    0.23    0.27    0.30    0.32    0.34
##                Factor9
## SS loadings       0.24
## Proportion Var    0.01
## Cumulative Var    0.35
## 
## Test of the hypothesis that 9 factors are sufficient.
## The chi square statistic is 528.49 on 342 degrees of freedom.
## The p-value is 3.63e-10
load <- tidy(fitfact)

ggplot(load, aes(x = fl1, y = fl2)) + 
  geom_point() + 
  geom_label_repel(aes(label = paste0("A", rownames(load))), show.legend = FALSE) +
  labs(x = "Factor 1", y = "Factor 2",
       title = "Standardised Loadings", 
       subtitle = "Based upon correlation matrix") +
  theme_minimal()

Packages

In this analysis we used the following packages. You can learn more about each one from its documentation; a minimal setup chunk loading them follows the list.

  • mirt: For IRT analysis
  • psych: For factor analysis
  • tidyverse: For data wrangling and visualisation
  • reshape: For reshaping nested lists
  • vroom: For reading in many files at once
  • broom: For tidying model output
  • fs: For file system operations
  • gt: For formatting tables
  • knitr: For markdown tables
  • ggrepel: For labelling points without overlap
  • skimr: For data frame level summary
  • ggridges: For ridge plots
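
For completeness, a minimal setup chunk loading these packages might look like the following (the order shown is an assumption; adjust to your own workflow):

# packages used throughout this analysis
library(tidyverse)  # data wrangling and visualisation
library(mirt)       # IRT models
library(psych)      # fa.parallel() for parallel analysis
library(reshape)    # melt() for reshaping nested lists
library(vroom)      # reading many files at once
library(broom)      # tidy() for model output
library(fs)         # dir_ls() for file listing
library(gt)         # formatted tables
library(knitr)      # kable() for markdown tables
library(ggrepel)    # geom_label_repel() for non-overlapping labels
library(skimr)      # skim() for data frame summaries
library(ggridges)   # geom_density_ridges() for ridge plots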