Flipped Test 2: MDS & Unfolding

Data

To collect the data for this project, I asked 20 third-year students to rate their first- and second-year classes in Sociology and Social Informatics on a 10-point scale. Each of the 19 classes was rated on the following measures:
- Practical value for their future career;
- Value for personal development and broadening horizons;
- Interestingness of the class;
- Difficulty of completing the class.

In effect, I wanted to run my own independent course evaluation. After collecting all the answers, I averaged the ratings for each class on each measure and compiled them into the following table:

library(readr)
library(rmarkdown)
library(smacof)
library(ggrepel)
library(ggplot2)
library(dplyr)

data <- read.csv("D:/downl/subjects - Sheet3.csv",
row.names = 1)

paged_table(data)
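
The averaging step is not shown in the report, so here is a minimal sketch of how it could be done in base R. It assumes the raw answers are stored in long format, one row per student and class. The data frame `responses` and all of its values are made up for illustration; only the four measure names follow the table above.

```r
# Hypothetical raw answers: one row per student x class (values are invented).
responses <- data.frame(
  student = c(1, 2, 1, 2),
  class = c("Algebra", "Algebra", "Law", "Law"),
  Future.career = c(6, 8, 3, 5),
  Personal.development = c(5, 7, 6, 8),
  Interestingness = c(4, 6, 7, 7),
  Difficulty = c(9, 7, 2, 4)
)

measures <- c("Future.career", "Personal.development",
              "Interestingness", "Difficulty")

# Average every measure within each class.
class_means <- aggregate(responses[measures],
                         by = list(class = responses$class),
                         FUN = mean)

# Move the class names into row names, matching the layout read by read.csv().
rownames(class_means) <- class_means$class
class_means$class <- NULL

class_means
```

The result has one row per class and one column per measure, which is the rectangular layout the unfolding model expects.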

Next, I converted the ratings (similarities) into dissimilarities by reversing the scale, so that a high rating becomes a small dissimilarity.

sd2 <- sim2diss(data, method = "reverse")
paged_table(as.data.frame(sd2))
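
To make the reversal concrete, here is a small base-R sketch. It assumes that the "reverse" method replaces each value s with max(s) + min(s) - s, taken over the whole matrix, which is how I understand the smacof documentation. The helper `reverse_sim` and the toy matrix are hypothetical and not part of the package.

```r
# Toy ratings matrix (invented values), classes in rows and measures in columns.
ratings <- matrix(
  c(7, 6, 5, 8,
    4, 7, 7, 3),
  nrow = 2, byrow = TRUE,
  dimnames = list(
    c("Algebra", "Law"),
    c("Future.career", "Personal.development", "Interestingness", "Difficulty")
  )
)

# Hypothetical helper: flip the scale so that high ratings become small values.
reverse_sim <- function(s) {
  max(s, na.rm = TRUE) + min(s, na.rm = TRUE) - s
}

diss <- reverse_sim(ratings)
diss
```

Under this assumption the overall range of the values is unchanged, and only their order is flipped: the highest rating in the table maps to the smallest dissimilarity.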

Unfolding

Model

Because my data form a rectangular matrix (classes in rows, rating measures in columns), I am going to use an unfolding model.

sd_unf <- unfolding(sd2)
sd_unf

## 
## Call: unfolding(delta = sd2)
## 
## Model:               Rectangular smacof 
## Number of subjects:  19 
## Number of objects:   4 
## Transformation:      none 
## Conditionality:      matrix 
## 
## Stress-1 value:    0.097975 
## Penalized Stress:  1.55331 
## Number of iterations: 118

The Stress-1 value is quite low (about 0.098), which suggests a good fit, but it is not sufficient on its own to judge the model, so we are going to compute some other diagnostics.
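
For intuition about what Stress-1 measures, the sketch below computes Kruskal's Stress-1 for a rectangular matrix: the square root of the summed squared differences between dissimilarities and fitted row-to-column distances, divided by the summed squared distances. The function `stress1_rect` and the toy coordinates are hypothetical. smacof may normalise the dissimilarities differently internally, so this is not guaranteed to reproduce the value printed above.

```r
# Hypothetical helper: Kruskal's Stress-1 for a rows x columns dissimilarity matrix.
stress1_rect <- function(delta, conf_row, conf_col) {
  n <- nrow(conf_row)
  m <- nrow(conf_col)
  # Euclidean distances among all points, then keep only the row-to-column block.
  d_all <- as.matrix(dist(rbind(conf_row, conf_col)))
  d <- d_all[seq_len(n), n + seq_len(m), drop = FALSE]
  sqrt(sum((delta - d)^2) / sum(d^2))
}

# Toy configuration: two row points and two column points in two dimensions.
conf_row <- rbind(c(0, 0), c(3, 0))
conf_col <- rbind(c(0, 4), c(3, 4))

# Row-to-column distances for this configuration are 4, 5, 5, 4.
perfect <- matrix(c(4, 5, 5, 4), nrow = 2)

stress_perfect <- stress1_rect(perfect, conf_row, conf_col)      # exact fit
stress_shifted <- stress1_rect(perfect + 1, conf_row, conf_col)  # every value off by 1
```

A configuration that reproduces the dissimilarities exactly gives a stress of zero, and the value grows as the fitted distances drift away from the data.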

Goodness of fit

Firstly, let’s observe stress per point (SPP) and see if we can improve the model.

summary(sd_unf)

## 
## Subject configuration (rows):
##                            D1      D2
## Academic Writing       0.3507 -0.7273
## Algebra                0.1861 -0.0257
## Safe Living Basics     0.8029 -1.5242
## Information Systems    0.1811 -0.2877
## Methodology            0.4569 -0.5453
## General Sociology     -0.1931  0.1478
## Law                   -0.3176  1.1316
## Psychology            -0.2278 -0.5977
## Statistics             0.2784 -0.0149
## Physical Training     -0.6253  1.7468
## Philosophy            -0.5863  0.2818
## Economics              0.6308 -0.4273
## Data Analysis          0.4321 -0.2200
## Information Systems 2  0.7311 -0.7827
## Methodology 2          0.6107 -0.6925
## Anthropology          -0.4855  0.4899
## Social Stratification  0.3225 -0.7886
## Sociological Theory    0.0885 -0.6149
## Economic Theory        0.4012 -0.4741
## 
## Object configuration (columns):
##                           D1      D2
## Future.career         0.6304  0.1754
## Personal.development -0.3267 -0.2194
## Interestingness      -0.4334 -0.1820
## Difficulty            0.1297  0.2260
## 
## 
## Stress per point rows:
##                           SPP  SPP(%)
## General Sociology      0.3691  0.3691
## Social Stratification  0.3904  0.3904
## Physical Training      0.6961  0.6961
## Sociological Theory    0.7146  0.7146
## Economics              0.8803  0.8803
## Psychology             1.3083  1.3083
## Methodology            1.4232  1.4232
## Safe Living Basics     1.6860  1.6860
## Statistics             1.7389  1.7389
## Philosophy             3.2984  3.2984
## Methodology 2          3.3697  3.3697
## Law                    3.4758  3.4758
## Academic Writing       5.0906  5.0906
## Data Analysis          6.0530  6.0530
## Economic Theory        6.8187  6.8187
## Information Systems 2  8.2719  8.2719
## Algebra                8.4471  8.4471
## Anthropology          11.6889 11.6889
## Information Systems   34.2790 34.2790
## 
## Stress per point columns:
##                           SPP  SPP(%)
## General Sociology      0.3691  0.3691
## Social Stratification  0.3904  0.3904
## Physical Training      0.6961  0.6961
## Sociological Theory    0.7146  0.7146
## Economics              0.8803  0.8803
## Psychology             1.3083  1.3083
## Methodology            1.4232  1.4232
## Safe Living Basics     1.6860  1.6860
## Statistics             1.7389  1.7389
## Philosophy             3.2984  3.2984
## Methodology 2          3.3697  3.3697
## Law                    3.4758  3.4758
## Academic Writing       5.0906  5.0906
## Data Analysis          6.0530  6.0530
## Economic Theory        6.8187  6.8187
## Information Systems 2  8.2719  8.2719
## Algebra                8.4471  8.4471
## Anthropology          11.6889 11.6889
## Information Systems   34.2790 34.2790

Information Systems and Anthropology have by far the largest SPP values (about 34% and 12% of the total stress, respectively). I will remove them and check whether the fit improves.
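
As a quick check on how much these two classes dominate, the sketch below copies the row SPP(%) values from the summary output and adds up their share. The vector name `spp_pct` and the comparison with an even share per class are illustrative choices, not part of the smacof output.

```r
# Row SPP(%) values copied from the summary output above.
spp_pct <- c(
  "General Sociology" = 0.3691, "Social Stratification" = 0.3904,
  "Physical Training" = 0.6961, "Sociological Theory" = 0.7146,
  "Economics" = 0.8803, "Psychology" = 1.3083,
  "Methodology" = 1.4232, "Safe Living Basics" = 1.6860,
  "Statistics" = 1.7389, "Philosophy" = 3.2984,
  "Methodology 2" = 3.3697, "Law" = 3.4758,
  "Academic Writing" = 5.0906, "Data Analysis" = 6.0530,
  "Economic Theory" = 6.8187, "Information Systems 2" = 8.2719,
  "Algebra" = 8.4471, "Anthropology" = 11.6889,
  "Information Systems" = 34.2790
)

total <- sum(spp_pct)                        # shares should add up to 100
flagged <- c("Information Systems", "Anthropology")
flagged_share <- sum(spp_pct[flagged])       # combined share of the two classes
even_share <- 100 / length(spp_pct)          # share if stress were spread evenly

round(c(total = total, flagged = flagged_share, even = even_share), 2)
# total 100.00, flagged 45.97, even 5.26
```

Together the two classes carry about 46% of the stress, against roughly 5% each under an even split. Note that the "columns" table in the printed summary repeats the row values. Assuming the fitted object exposes the components listed in the smacof documentation for `unfolding()`, the SPP of the four measures can be read directly with `sd_unf$spp.col`.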

data_new <- data %>% filter(!rownames(data) %in% c('Information Systems', 'Anthropology'))
sd3 <- sim2diss(data_new, method = "reverse")
sd_unf2 <- unfolding(sd3)

sd_unf2

## 
## Call: unfolding(delta = sd3)
## 
## Model:               Rectangular smacof 
## Number of subjects:  17 
## Number of objects:   4 
## Transformation:      none 
## Conditionality:      matrix 
## 
## Stress-1 value:    0.070612 
## Penalized Stress:  1.365903 
## Number of iterations: 118

The Stress-1 value decreased from 0.098 to 0.071, so I will stick with this version.

I will run a permutation test on the new model to check whether this configuration could have arisen from random data.

permtest(sd_unf2, data_new, nrep = 100, verbose = F)

## 
## Call: permtest.smacofR(object = sd_unf2, data = data_new, nrep = 100, 
##     verbose = F)
## 
## SMACOF Permutation Test
## Number of objects: 17 
## Number of replications (permutations): 100 
## 
## Observed stress value: 0.071 
## p-value: <0.001

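The p-value is low (below 0.001), so we can reject the null hypothesis that the observed stress is no better than the stress obtained from randomly permuted data, and say that our configuration has inherent structure.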

Additionally, I will examine the Shepard diagram:

plot(sd_unf2, plot.type = "Shepard")

The points lie close to a straight line, which means that not much information was lost during dimensionality reduction, so I am satisfied with this model.

Plot

Finally, let us plot the results:

# Column points: the four rating measures
conf_items <- as.data.frame(sd_unf2$conf.col)
# Row points: the 17 classes kept in the final model
conf_perc <- as.data.frame(sd_unf2$conf.row)
p <- ggplot(conf_perc, aes(x = D1, y = D2))
p + geom_point(size = 1, colour = "red", alpha = 0.5) + 
  xlab("") +
  ylab("") +
  geom_point(aes(x = D1, y = D2), conf_items, colour = "cadetblue") + 
  geom_text_repel(aes(x = D1, y = D2, label = rownames(conf_perc)), 
            conf_perc, colour = "red") + 
  geom_text_repel(aes(x = D1, y = D2, label = rownames(conf_items)), 
            conf_items, colour = "cadetblue") +
  ggtitle("Unfolding for S&SI Classes") +
  theme_bw()

It seems that the classes rated best for personal development are also the most interesting ones, while difficulty and usefulness for a future career also lie close together.
There is a big cluster of classes that are not very difficult, moderately useful, and not that interesting.
Data Analysis and Economics are considered the most useful classes, Algebra and Statistics are, as expected, the most difficult for sociologists, and Philosophy, Psychology and General Sociology are the most interesting and the best for personal development.