Load the data from the two years and bind them to create a single data frame:
# load data
eth_2018 <- read_csv("data/2018/all.csv")
eth_2017 <- read_csv("data/2017/all.csv")
# add year as the identifier column to each data frame
eth_2018 <- eth_2018 %>%
mutate(year = 2018)
eth_2017 <- eth_2017 %>%
mutate(year = 2017)
# bind the two data frames together
eth <- bind_rows(eth_2017, eth_2018) %>%
relocate(year)
Skim an overview of the data:
skim(eth) %>% kable()
| skim_type | skim_variable | n_missing | complete_rate | numeric.mean | numeric.sd | numeric.p0 | numeric.p25 | numeric.p50 | numeric.p75 | numeric.p100 | numeric.hist |
|---|---|---|---|---|---|---|---|---|---|---|---|
| numeric | year | 0 | 1 | 2017.5086232 | 0.4999987 | 2017 | 2017 | 2018 | 2018 | 2018 | ▇▁▁▁▇ |
| numeric | A1 | 0 | 1 | 0.9517685 | 0.2142864 | 0 | 1 | 1 | 1 | 1 | ▁▁▁▁▇ |
| numeric | A2 | 0 | 1 | 0.2934814 | 0.4554237 | 0 | 0 | 0 | 1 | 1 | ▇▁▁▁▃ |
| numeric | A3 | 0 | 1 | 0.6331482 | 0.4820160 | 0 | 0 | 1 | 1 | 1 | ▅▁▁▁▇ |
| numeric | A4 | 0 | 1 | 0.6337328 | 0.4818541 | 0 | 0 | 1 | 1 | 1 | ▅▁▁▁▇ |
| numeric | A5 | 0 | 1 | 0.4793920 | 0.4996482 | 0 | 0 | 0 | 1 | 1 | ▇▁▁▁▇ |
| numeric | A6 | 0 | 1 | 0.7202572 | 0.4489384 | 0 | 0 | 1 | 1 | 1 | ▃▁▁▁▇ |
| numeric | A7 | 0 | 1 | 0.7065186 | 0.4554237 | 0 | 0 | 1 | 1 | 1 | ▃▁▁▁▇ |
| numeric | A8 | 0 | 1 | 0.4849459 | 0.4998464 | 0 | 0 | 0 | 1 | 1 | ▇▁▁▁▇ |
| numeric | A9 | 0 | 1 | 0.4612686 | 0.4985705 | 0 | 0 | 0 | 1 | 1 | ▇▁▁▁▇ |
| numeric | A10 | 0 | 1 | 0.5422391 | 0.4982855 | 0 | 0 | 1 | 1 | 1 | ▇▁▁▁▇ |
| numeric | A11 | 0 | 1 | 0.5784858 | 0.4938737 | 0 | 0 | 1 | 1 | 1 | ▆▁▁▁▇ |
| numeric | A12 | 0 | 1 | 0.6968723 | 0.4596771 | 0 | 0 | 1 | 1 | 1 | ▃▁▁▁▇ |
| numeric | A13 | 0 | 1 | 0.3656825 | 0.4816914 | 0 | 0 | 0 | 1 | 1 | ▇▁▁▁▅ |
| numeric | A14 | 0 | 1 | 0.5667933 | 0.4955910 | 0 | 0 | 1 | 1 | 1 | ▆▁▁▁▇ |
| numeric | A15 | 0 | 1 | 0.4615609 | 0.4985931 | 0 | 0 | 0 | 1 | 1 | ▇▁▁▁▇ |
| numeric | A16 | 0 | 1 | 0.1935107 | 0.3951075 | 0 | 0 | 0 | 0 | 1 | ▇▁▁▁▂ |
| numeric | A17 | 0 | 1 | 0.5960246 | 0.4907644 | 0 | 0 | 1 | 1 | 1 | ▆▁▁▁▇ |
| numeric | A18 | 0 | 1 | 0.3724057 | 0.4835163 | 0 | 0 | 0 | 1 | 1 | ▇▁▁▁▅ |
| numeric | A19 | 0 | 1 | 0.3469746 | 0.4760772 | 0 | 0 | 0 | 1 | 1 | ▇▁▁▁▅ |
| numeric | A20 | 0 | 1 | 0.7240573 | 0.4470534 | 0 | 0 | 1 | 1 | 1 | ▃▁▁▁▇ |
| numeric | A21 | 0 | 1 | 0.5150541 | 0.4998464 | 0 | 0 | 1 | 1 | 1 | ▇▁▁▁▇ |
| numeric | A22 | 0 | 1 | 0.5171003 | 0.4997805 | 0 | 0 | 1 | 1 | 1 | ▇▁▁▁▇ |
| numeric | A23 | 0 | 1 | 0.4223911 | 0.4940123 | 0 | 0 | 0 | 1 | 1 | ▇▁▁▁▆ |
| numeric | A24 | 0 | 1 | 0.5822859 | 0.4932547 | 0 | 0 | 1 | 1 | 1 | ▆▁▁▁▇ |
| numeric | A25 | 0 | 1 | 0.6381175 | 0.4806153 | 0 | 0 | 1 | 1 | 1 | ▅▁▁▁▇ |
| numeric | A26 | 0 | 1 | 0.8018123 | 0.3986926 | 0 | 1 | 1 | 1 | 1 | ▂▁▁▁▇ |
| numeric | A27 | 0 | 1 | 0.3957907 | 0.4890913 | 0 | 0 | 0 | 1 | 1 | ▇▁▁▁▅ |
| numeric | A28 | 0 | 1 | 0.4387606 | 0.4963081 | 0 | 0 | 0 | 1 | 1 | ▇▁▁▁▆ |
| numeric | A29 | 0 | 1 | 0.5007308 | 0.5000726 | 0 | 0 | 1 | 1 | 1 | ▇▁▁▁▇ |
| numeric | A30 | 0 | 1 | 0.7430576 | 0.4370113 | 0 | 0 | 1 | 1 | 1 | ▃▁▁▁▇ |
| numeric | A31 | 0 | 1 | 0.3735750 | 0.4838235 | 0 | 0 | 0 | 1 | 1 | ▇▁▁▁▅ |
| numeric | A32 | 0 | 1 | 0.2151418 | 0.4109807 | 0 | 0 | 0 | 0 | 1 | ▇▁▁▁▂ |
| numeric | A33 | 0 | 1 | 0.7699503 | 0.4209259 | 0 | 1 | 1 | 1 | 1 | ▂▁▁▁▇ |
| numeric | A34 | 0 | 1 | 0.5983631 | 0.4903009 | 0 | 0 | 1 | 1 | 1 | ▆▁▁▁▇ |
| numeric | A35 | 0 | 1 | 0.3285589 | 0.4697579 | 0 | 0 | 0 | 1 | 1 | ▇▁▁▁▃ |
| numeric | A36 | 0 | 1 | 0.2244958 | 0.4173108 | 0 | 0 | 0 | 0 | 1 | ▇▁▁▁▂ |
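Every item is a 0/1 indicator with no missing values, so the per-item means above are simply the proportions of correct responses. As a quick aside (not part of the original analysis), a classical total score per respondent is a one-liner and gives a useful baseline before any IRT modelling:
# classical sum score per respondent, as a simple baseline before the IRT model
eth %>%
  mutate(total = rowSums(across(A1:A36))) %>%
  count(total)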
We can fit an item response theory model with the mirt (Multidimensional Item Response Theory) package. Per the function documentation, mirt fits a maximum likelihood (or maximum a posteriori) factor analysis model to any mixture of dichotomous and polytomous data under the item response theory paradigm, using estimation methods such as Cai's (2010) Metropolis-Hastings Robbins-Monro (MHRM) algorithm.
The process is to fit the model first and save the result as a model object, which we can then parse to get whatever tabular or visual summaries we want. When fitting the model, we can specify a few arguments that control how the model is estimated.
fit_2pl <- mirt(
data = eth[2:37], # all but the year column
model = 1, # number of factors to extract
itemtype = "2PL", # 2 parameter logistic model
SE = TRUE # estimate standard errors
)
##
Iteration: 1, Log-Lik: -66428.235, Max-Change: 0.72929
Iteration: 2, Log-Lik: -65241.884, Max-Change: 0.24383
Iteration: 3, Log-Lik: -65081.725, Max-Change: 0.14895
Iteration: 4, Log-Lik: -65014.543, Max-Change: 0.11122
Iteration: 5, Log-Lik: -64977.384, Max-Change: 0.09454
Iteration: 6, Log-Lik: -64955.545, Max-Change: 0.06530
Iteration: 7, Log-Lik: -64942.861, Max-Change: 0.06130
Iteration: 8, Log-Lik: -64935.381, Max-Change: 0.04421
Iteration: 9, Log-Lik: -64930.611, Max-Change: 0.03616
Iteration: 10, Log-Lik: -64927.669, Max-Change: 0.02949
Iteration: 11, Log-Lik: -64925.871, Max-Change: 0.01790
Iteration: 12, Log-Lik: -64924.788, Max-Change: 0.01302
Iteration: 13, Log-Lik: -64922.989, Max-Change: 0.00234
Iteration: 14, Log-Lik: -64922.963, Max-Change: 0.00196
Iteration: 15, Log-Lik: -64922.949, Max-Change: 0.00172
Iteration: 16, Log-Lik: -64922.930, Max-Change: 0.00086
Iteration: 17, Log-Lik: -64922.925, Max-Change: 0.00094
Iteration: 18, Log-Lik: -64922.922, Max-Change: 0.00077
Iteration: 19, Log-Lik: -64922.912, Max-Change: 0.00018
Iteration: 20, Log-Lik: -64922.912, Max-Change: 0.00019
Iteration: 21, Log-Lik: -64922.912, Max-Change: 0.00018
Iteration: 22, Log-Lik: -64922.910, Max-Change: 0.00011
Iteration: 23, Log-Lik: -64922.910, Max-Change: 0.00010
Iteration: 24, Log-Lik: -64922.910, Max-Change: 0.00010
##
## Calculating information matrix...
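Before extracting scores, it can also be worth inspecting the fitted object directly. For example (a brief aside; output not shown here), summary() prints the standardised factor loadings and plot() can draw the item trace curves:
# optional quick looks at the fitted model (output omitted)
summary(fit_2pl)               # standardised loadings and communalities
plot(fit_2pl, type = "trace")  # item characteristic curves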
We then compute factor score estimates and add them to the existing data frame, to keep everything in one place. For the estimation we use the fscores() function from the mirt package, which takes a fitted model object and computes factor score estimates according to the specified method. We will use the EAP (“expected a posteriori”) method, which is the default. We specify it explicitly below, but the results would be the same if we omitted the method argument, since it is the method the function uses by default.
eth <- eth %>%
mutate(F1 = fscores(fit_2pl, method = "EAP"))
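Note that fscores() returns a matrix with one column per factor, so the mutate() call above stores F1 as a matrix column. If you prefer a plain numeric column, an equivalent sketch is to take the first column explicitly:
# equivalent, but keeps F1 as an ordinary numeric vector rather than a matrix column
eth <- eth %>%
  mutate(F1 = fscores(fit_2pl, method = "EAP")[, 1])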
We can also extract the model coefficient estimates with the generic coef() function, which extracts model coefficients from objects returned by modelling functions. We will set the IRTpars argument to TRUE, so that the slope-intercept parameters are converted into traditional IRT parameters.
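For reference, in the traditional parameterisation the two-parameter logistic (2PL) model gives the probability that a respondent with ability $\theta$ answers item $j$ correctly as

$$P(x_j = 1 \mid \theta) = \frac{1}{1 + e^{-a_j(\theta - b_j)}},$$

where $a_j$ is the item's discrimination and $b_j$ its difficulty.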
coefs_2pl <- coef(fit_2pl, IRTpars = TRUE)
The resulting object coefs_2pl is a list with one element for each question, plus an additional GroupPars element that we won't be using. The output is long, so we only show a few elements here:
coefs_2pl[1:3]
## $A1
## a b g u
## par 1.200868 -2.990220 0 1
## CI_2.5 1.005934 -3.356706 NA NA
## CI_97.5 1.395802 -2.623733 NA NA
##
## $A2
## a b g u
## par 0.5617763 1.672543 0 1
## CI_2.5 0.4755542 1.407243 NA NA
## CI_97.5 0.6479984 1.937843 NA NA
##
## $A3
## a b g u
## par 1.348138 -0.5426261 0 1
## CI_2.5 1.229726 -0.6171188 NA NA
## CI_97.5 1.466549 -0.4681333 NA NA
coefs_2pl[35:37]
## $A35
## a b g u
## par 1.276322 0.7324404 0 1
## CI_2.5 1.160025 0.6487785 NA NA
## CI_97.5 1.392619 0.8161023 NA NA
##
## $A36
## a b g u
## par 1.245564 1.275905 0 1
## CI_2.5 1.121704 1.161266 NA NA
## CI_97.5 1.369425 1.390544 NA NA
##
## $GroupPars
## MEAN_1 COV_11
## par 0 1
## CI_2.5 NA NA
## CI_97.5 NA NA
Let’s take a closer look at the first element:
coefs_2pl[1]
## $A1
## a b g u
## par 1.200868 -2.990220 0 1
## CI_2.5 1.005934 -3.356706 NA NA
## CI_97.5 1.395802 -2.623733 NA NA
In this output:
- a is discrimination
- b is difficulty

To make this output a little more user-friendly, we should tidy it so that we have one row per question. We'll do this in two steps. First, write a function that tidies the output for one question, i.e. one list element. Then, map this function over the list of all questions, resulting in a data frame.
tidy_mirt_coefs <- function(x){
x %>%
# melt the list element
melt() %>%
# convert to a tibble
as_tibble() %>%
# convert factors to characters
mutate(across(where(is.factor), as.character)) %>%
# only focus on rows where X2 is a or b (discrimination or difficulty)
filter(X2 %in% c("a", "b")) %>%
# in X1, relabel par (parameter) as est (estimate)
mutate(X1 = if_else(X1 == "par", "est", X1)) %>%
# unite columns X2 and X1 into a new column called var separated by _
unite(X2, X1, col = "var", sep = "_") %>%
# turn into a wider data frame
pivot_wider(names_from = var, values_from = value)
}
Let's see what this does to a single element of coefs_2pl:
tidy_mirt_coefs(coefs_2pl[1])
## # A tibble: 1 x 7
## L1 a_est a_CI_2.5 a_CI_97.5 b_est b_CI_2.5 b_CI_97.5
## <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 A1 1.20 1.01 1.40 -2.99 -3.36 -2.62
And now let's map it over the 36 question elements of coefs_2pl (all elements except GroupPars):
tidy_2pl <- map_dfr(coefs_2pl[1:36], tidy_mirt_coefs, .id = "Question")
A quick peek at the result:
tidy_2pl
## # A tibble: 36 x 7
## Question a_est a_CI_2.5 a_CI_97.5 b_est b_CI_2.5 b_CI_97.5
## <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 A1 1.20 1.01 1.40 -2.99 -3.36 -2.62
## 2 A2 0.562 0.476 0.648 1.67 1.41 1.94
## 3 A3 1.35 1.23 1.47 -0.543 -0.617 -0.468
## 4 A4 1.18 1.07 1.29 -0.592 -0.675 -0.509
## 5 A5 1.46 1.34 1.58 0.0787 0.0147 0.143
## 6 A6 1.55 1.42 1.69 -0.864 -0.943 -0.785
## 7 A7 1.28 1.16 1.40 -0.896 -0.986 -0.805
## 8 A8 1.16 1.05 1.27 0.0659 -0.00750 0.139
## 9 A9 2.02 1.86 2.18 0.127 0.0723 0.182
## 10 A10 1.76 1.62 1.90 -0.147 -0.206 -0.0886
## # … with 26 more rows
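With the coefficients in a tidy data frame, further summaries are straightforward. For instance (an illustrative aside, not in the original analysis), we could pull out the hardest or the most discriminating items:
# illustrative follow-ups on the tidied coefficients
tidy_2pl %>% arrange(desc(b_est)) %>% slice_head(n = 5)  # hardest items
tidy_2pl %>% arrange(desc(a_est)) %>% slice_head(n = 5)  # most discriminating items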
And a nicely formatted table of the result:
gt(tidy_2pl) %>%
fmt_number(columns = contains("_"), decimals = 3) %>%
data_color(
columns = contains("a_"),
colors = scales::col_numeric(palette = c("Greens"), domain = NULL)
) %>%
data_color(
columns = contains("b_"),
colors = scales::col_numeric(palette = c("Blues"), domain = NULL)
) %>%
tab_spanner(label = "Discrimination", columns = contains("a_")) %>%
tab_spanner(label = "Difficulty", columns = contains("b_")) %>%
cols_label(
a_est = "Est.",
b_est = "Est.",
a_CI_2.5 = "2.5%",
b_CI_2.5 = "2.5%",
a_CI_97.5 = "97.5%",
b_CI_97.5 = "97.5%"
)
| Question | Discrimination Est. | Discrimination 2.5% | Discrimination 97.5% | Difficulty Est. | Difficulty 2.5% | Difficulty 97.5% |
|---|---|---|---|---|---|---|
| A1 | 1.201 | 1.006 | 1.396 | −2.990 | −3.357 | −2.624 |
| A2 | 0.562 | 0.476 | 0.648 | 1.673 | 1.407 | 1.938 |
| A3 | 1.348 | 1.230 | 1.467 | −0.543 | −0.617 | −0.468 |
| A4 | 1.177 | 1.068 | 1.286 | −0.592 | −0.675 | −0.509 |
| A5 | 1.462 | 1.340 | 1.584 | 0.079 | 0.015 | 0.143 |
| A6 | 1.553 | 1.416 | 1.690 | −0.864 | −0.943 | −0.785 |
| A7 | 1.281 | 1.162 | 1.400 | −0.896 | −0.986 | −0.805 |
| A8 | 1.160 | 1.055 | 1.265 | 0.066 | −0.007 | 0.139 |
| A9 | 2.017 | 1.856 | 2.178 | 0.127 | 0.072 | 0.182 |
| A10 | 1.762 | 1.620 | 1.904 | −0.147 | −0.206 | −0.089 |
| A11 | 1.678 | 1.540 | 1.815 | −0.281 | −0.342 | −0.220 |
| A12 | 0.886 | 0.789 | 0.984 | −1.092 | −1.223 | −0.960 |
| A13 | 0.716 | 0.628 | 0.803 | 0.856 | 0.718 | 0.994 |
| A14 | 1.433 | 1.312 | 1.554 | −0.259 | −0.325 | −0.193 |
| A15 | 0.848 | 0.757 | 0.938 | 0.210 | 0.117 | 0.303 |
| A16 | 1.544 | 1.396 | 1.693 | 1.291 | 1.189 | 1.392 |
| A17 | 1.221 | 1.111 | 1.331 | −0.411 | −0.487 | −0.335 |
| A18 | 1.234 | 1.122 | 1.347 | 0.547 | 0.468 | 0.626 |
| A19 | 1.309 | 1.191 | 1.427 | 0.639 | 0.560 | 0.718 |
| A20 | 2.318 | 2.119 | 2.518 | −0.741 | −0.802 | −0.679 |
| A21 | 2.143 | 1.973 | 2.313 | −0.048 | −0.102 | 0.005 |
| A22 | 1.996 | 1.837 | 2.154 | −0.056 | −0.111 | −0.001 |
| A23 | 1.874 | 1.723 | 2.025 | 0.264 | 0.206 | 0.322 |
| A24 | 1.433 | 1.311 | 1.554 | −0.320 | −0.387 | −0.253 |
| A25 | 0.530 | 0.449 | 0.612 | −1.139 | −1.347 | −0.930 |
| A26 | 2.418 | 2.193 | 2.642 | −1.042 | −1.112 | −0.972 |
| A27 | 2.000 | 1.838 | 2.162 | 0.347 | 0.290 | 0.404 |
| A28 | 1.859 | 1.709 | 2.008 | 0.209 | 0.151 | 0.266 |
| A29 | 0.912 | 0.819 | 1.005 | −0.004 | −0.090 | 0.083 |
| A30 | 1.258 | 1.137 | 1.379 | −1.091 | −1.193 | −0.988 |
| A31 | 1.255 | 1.142 | 1.368 | 0.537 | 0.459 | 0.614 |
| A32 | 1.607 | 1.457 | 1.757 | 1.151 | 1.060 | 1.242 |
| A33 | 0.932 | 0.825 | 1.038 | −1.518 | −1.680 | −1.355 |
| A34 | 0.906 | 0.811 | 1.000 | −0.517 | −0.614 | −0.420 |
| A35 | 1.276 | 1.160 | 1.393 | 0.732 | 0.649 | 0.816 |
| A36 | 1.246 | 1.122 | 1.369 | 1.276 | 1.161 | 1.391 |
Do students from different programmes of study have different distributions of ability?
# produce list of all the relevant file names
# (match only the courses beginning 401, i.e. exclude "all.csv")
files <- dir_ls("data", recurse = TRUE, regexp = "401.*.csv")
# read all files and add a column called file_path to identify them
eth_entry_test <- vroom(files, id = "file_path")
# parse file_path column into year and class
eth_entry_test <- eth_entry_test %>%
mutate(
file_path = str_remove(file_path, "data/"),
file_path = str_remove(file_path, ".csv"),
) %>%
separate(file_path, c("year", "class"), sep = "/")
# fit IRT model
fit_2pl <- mirt(
eth_entry_test[3:38], # all but the year and class columns
model = 1, # number of factors to extract
itemtype = "2PL", # 2 parameter logistic model
SE = TRUE # estimate standard errors
)
##
Iteration: 1, Log-Lik: -85250.363, Max-Change: 0.82374
Iteration: 2, Log-Lik: -83714.765, Max-Change: 0.37956
Iteration: 3, Log-Lik: -83535.988, Max-Change: 0.17218
Iteration: 4, Log-Lik: -83473.241, Max-Change: 0.09242
Iteration: 5, Log-Lik: -83442.310, Max-Change: 0.06845
Iteration: 6, Log-Lik: -83425.490, Max-Change: 0.05833
Iteration: 7, Log-Lik: -83416.100, Max-Change: 0.03700
Iteration: 8, Log-Lik: -83410.552, Max-Change: 0.02708
Iteration: 9, Log-Lik: -83407.196, Max-Change: 0.02090
Iteration: 10, Log-Lik: -83402.119, Max-Change: 0.00403
Iteration: 11, Log-Lik: -83401.998, Max-Change: 0.00347
Iteration: 12, Log-Lik: -83401.919, Max-Change: 0.00292
Iteration: 13, Log-Lik: -83401.792, Max-Change: 0.00198
Iteration: 14, Log-Lik: -83401.775, Max-Change: 0.00117
Iteration: 15, Log-Lik: -83401.765, Max-Change: 0.00104
Iteration: 16, Log-Lik: -83401.737, Max-Change: 0.00033
Iteration: 17, Log-Lik: -83401.736, Max-Change: 0.00037
Iteration: 18, Log-Lik: -83401.735, Max-Change: 0.00031
Iteration: 19, Log-Lik: -83401.731, Max-Change: 0.00023
Iteration: 20, Log-Lik: -83401.731, Max-Change: 0.00018
Iteration: 21, Log-Lik: -83401.730, Max-Change: 0.00018
Iteration: 22, Log-Lik: -83401.730, Max-Change: 0.00016
Iteration: 23, Log-Lik: -83401.730, Max-Change: 0.00015
Iteration: 24, Log-Lik: -83401.730, Max-Change: 0.00014
Iteration: 25, Log-Lik: -83401.729, Max-Change: 0.00012
Iteration: 26, Log-Lik: -83401.729, Max-Change: 0.00011
Iteration: 27, Log-Lik: -83401.729, Max-Change: 0.00010
Iteration: 28, Log-Lik: -83401.729, Max-Change: 0.00010
##
## Calculating information matrix...
# augment data with factor scores from model fit
eth_entry_test <- eth_entry_test %>%
mutate(F1 = fscores(fit_2pl, method = "EAP"))
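A quick sanity check on the combined data can help here (a hedged aside; the exact counts depend on the files read in): count() confirms the year/class identifiers parsed from the file paths, and summary() shows the range of the estimated abilities.
# check the parsed identifiers and the new score column
eth_entry_test %>% count(year, class)
summary(eth_entry_test$F1)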
Compare the distribution of abilities in the two years.
ggplot(eth_entry_test, aes(F1, fill = as.factor(year), colour = as.factor(year))) +
geom_density(alpha=0.5) +
scale_x_continuous(limits = c(-3.5,3.5)) +
labs(title = "Density plot",
subtitle = "Ability grouped by year of taking the test",
x = "Ability", y = "Density",
fill = "Year", colour = "Year") +
theme_minimal()
There does not seem to be a big difference between the two year groups, so we combine them in the following analysis.
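If we wanted a numerical check to go with the visual one, one simple option (a hedged aside, not part of the original analysis) is a two-sample Kolmogorov-Smirnov test comparing the factor scores of the two cohorts:
# hypothetical follow-up: compare the 2017 and 2018 ability distributions
ks.test(
  eth_entry_test$F1[eth_entry_test$year == "2017"],
  eth_entry_test$F1[eth_entry_test$year == "2018"]
)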
Compare the distribution of abilities in the various classes.
ggplot(eth_entry_test, aes(x = F1, y = class, colour = class, fill = class)) +
geom_density_ridges(alpha = 0.5) +
scale_x_continuous(limits = c(-3.5,3.5)) +
guides(fill = FALSE, colour = FALSE) +
labs(title = "Density plot",
subtitle = "Ability grouped by class",
x = "Ability", y = "Class") +
theme_minimal()
## Picking joint bandwidth of 0.256
To check how many factors might plausibly underlie the items, we can run a parallel analysis with fa.parallel() from the psych package:
fa.parallel(eth[2:37], fa = "fa")
## Parallel analysis suggests that the number of factors = 9 and the number of components = NA
We can then compare exploratory factor analysis solutions from factanal() with different numbers of factors, starting with a single factor:
fitfact <- factanal(eth[2:37], 1, rotation = "varimax")
print(fitfact, digits = 2, cutoff = 0.3, sort = TRUE)
##
## Call:
## factanal(x = eth[2:37], factors = 1, rotation = "varimax")
##
## Uniquenesses:
## A1 A2 A3 A4 A5 A6 A7 A8 A9 A10 A11 A12 A13 A14 A15 A16
## 0.95 0.95 0.75 0.79 0.72 0.74 0.79 0.79 0.61 0.66 0.67 0.88 0.91 0.73 0.87 0.81
## A17 A18 A19 A20 A21 A22 A23 A24 A25 A26 A27 A28 A29 A30 A31 A32
## 0.78 0.78 0.78 0.63 0.59 0.61 0.64 0.73 0.94 0.69 0.63 0.64 0.85 0.81 0.78 0.78
## A33 A34 A35 A36
## 0.88 0.86 0.79 0.84
##
## Loadings:
## [1] 0.53 0.51 0.62 0.59 0.57 0.52 0.61 0.64 0.62 0.60 0.52 0.56 0.61 0.60
## [16] 0.50 0.45 0.46 0.46 0.35 0.31 0.36 0.44 0.47 0.46 0.47 0.39 0.44
## [31] 0.47 0.47 0.34 0.38 0.46 0.40
##
## Factor1
## SS loadings 8.35
## Proportion Var 0.23
##
## Test of the hypothesis that 1 factor is sufficient.
## The chi square statistic is 5428.36 on 594 degrees of freedom.
## The p-value is 0
load <- tidy(fitfact)
ggplot(load, aes(x = fl1, y = 0)) +
geom_point() +
geom_label_repel(aes(label = paste0("A", rownames(load))), show.legend = FALSE) +
labs(x = "Factor 1", y = NULL,
title = "Standardised Loadings",
subtitle = "Based upon correlation matrix") +
theme_minimal()
Next, a two-factor solution:
fitfact <- factanal(eth[2:37], 2, rotation = "varimax")
print(fitfact, digits = 2, cutoff = 0.3, sort = TRUE)
##
## Call:
## factanal(x = eth[2:37], factors = 2, rotation = "varimax")
##
## Uniquenesses:
## A1 A2 A3 A4 A5 A6 A7 A8 A9 A10 A11 A12 A13 A14 A15 A16
## 0.95 0.94 0.75 0.79 0.72 0.74 0.78 0.79 0.61 0.66 0.67 0.88 0.90 0.73 0.87 0.81
## A17 A18 A19 A20 A21 A22 A23 A24 A25 A26 A27 A28 A29 A30 A31 A32
## 0.77 0.40 0.04 0.63 0.59 0.61 0.64 0.72 0.94 0.69 0.63 0.64 0.85 0.80 0.78 0.78
## A33 A34 A35 A36
## 0.88 0.86 0.79 0.84
##
## Loadings:
## Factor1 Factor2
## A9 0.60
## A10 0.55
## A11 0.56
## A20 0.58
## A21 0.59
## A22 0.59
## A23 0.58
## A24 0.51
## A26 0.53
## A27 0.58
## A28 0.57
## A18 0.75
## A19 0.97
## A1
## A2
## A3 0.47
## A4 0.44
## A5 0.50
## A6 0.49
## A7 0.46
## A8 0.42
## A12 0.33
## A13 0.30
## A14 0.48
## A15 0.36
## A16 0.38
## A17 0.46
## A25
## A29 0.36
## A30 0.44
## A31 0.45
## A32 0.43
## A33 0.34
## A34 0.36
## A35 0.42
## A36 0.38
##
## Factor1 Factor2
## SS loadings 7.25 2.29
## Proportion Var 0.20 0.06
## Cumulative Var 0.20 0.27
##
## Test of the hypothesis that 2 factors are sufficient.
## The chi square statistic is 3050.55 on 559 degrees of freedom.
## The p-value is 0
load <- tidy(fitfact)
ggplot(load, aes(x = fl1, y = fl2)) +
geom_point() +
geom_label_repel(aes(label = paste0("A", rownames(load))), show.legend = FALSE) +
labs(x = "Factor 1", y = "Factor 2",
title = "Standardised Loadings",
subtitle = "Based upon correlation matrix") +
theme_minimal()
Finally, the nine-factor solution suggested by the parallel analysis:
fitfact <- factanal(eth[2:37], 9, rotation = "varimax")
print(fitfact, digits = 2, cutoff = 0.3, sort = TRUE)
##
## Call:
## factanal(x = eth[2:37], factors = 9, rotation = "varimax")
##
## Uniquenesses:
## A1 A2 A3 A4 A5 A6 A7 A8 A9 A10 A11 A12 A13 A14 A15 A16
## 0.88 0.64 0.71 0.74 0.64 0.66 0.72 0.74 0.46 0.64 0.64 0.86 0.79 0.69 0.82 0.56
## A17 A18 A19 A20 A21 A22 A23 A24 A25 A26 A27 A28 A29 A30 A31 A32
## 0.66 0.41 0.00 0.47 0.51 0.57 0.60 0.62 0.91 0.51 0.59 0.61 0.80 0.71 0.75 0.74
## A33 A34 A35 A36
## 0.83 0.69 0.55 0.70
##
## Loadings:
## Factor1 Factor2 Factor3 Factor4 Factor5 Factor6 Factor7 Factor8 Factor9
## A24 0.51
## A20 0.62
## A26 0.31 0.56
## A9 0.31 0.59
## A18 0.70
## A19 0.96
## A35 0.61
## A16 0.58
## A2 0.59
## A1
## A3 0.33
## A4 0.37
## A5 0.38
## A6 0.36
## A7 0.44
## A8 0.31
## A10 0.35
## A11 0.32 0.39
## A12
## A13 0.34
## A14
## A15
## A17 0.50
## A21 0.47
## A22 0.42
## A23 0.30 0.41
## A25
## A27 0.38
## A28 0.43
## A29 0.31
## A30 0.48
## A31
## A32
## A33 0.32
## A34 0.47
## A36 0.44
##
## Factor1 Factor2 Factor3 Factor4 Factor5 Factor6 Factor7 Factor8
## SS loadings 2.61 2.22 1.93 1.69 1.37 0.96 0.81 0.76
## Proportion Var 0.07 0.06 0.05 0.05 0.04 0.03 0.02 0.02
## Cumulative Var 0.07 0.13 0.19 0.23 0.27 0.30 0.32 0.34
## Factor9
## SS loadings 0.24
## Proportion Var 0.01
## Cumulative Var 0.35
##
## Test of the hypothesis that 9 factors are sufficient.
## The chi square statistic is 528.49 on 342 degrees of freedom.
## The p-value is 3.63e-10
load <- tidy(fitfact)
ggplot(load, aes(x = fl1, y = fl2)) +
geom_point() +
geom_label_repel(aes(label = paste0("A", rownames(load))), show.legend = FALSE) +
labs(x = "Factor 1", y = "Factor 2",
title = "Standardised Loadings",
subtitle = "Based upon correlation matrix") +
theme_minimal()
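Since a two-dimensional scatter only shows the first two of the nine factors, an alternative view (a hedged sketch, not in the original analysis) is to reshape the loadings and draw a heatmap across all factors:
# alternative view: heatmap of loadings across all nine factors
load %>%
  mutate(item = paste0("A", row_number())) %>%
  pivot_longer(starts_with("fl"), names_to = "factor", values_to = "loading") %>%
  ggplot(aes(x = factor, y = item, fill = loading)) +
  geom_tile() +
  scale_fill_gradient2() +
  labs(x = "Factor", y = "Item", fill = "Loading",
       title = "Standardised Loadings",
       subtitle = "Nine-factor solution") +
  theme_minimal()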
In this analysis we used the following packages. You can learn more about each one by clicking on the links below.