Sociology students’ preferences of Harry Potter characters

thematic::thematic_rmd( 
  bg = "#FFFFFF",
  fg = "#588650FF",
  accent = "#1F5D25FF",
  font = font_spec("Roboto Condensed", scale = 0.8),
  qualitative = hp_palettes$slytherin,
  sequential = sequential_gradient(0.5, 0.75))
pal <- c("#A0AF95FF", "#367135FF")

I was and still am a huge Harry Potter fan. I believe deeply that one’s favorite characters in these books and movies can tell about the one more than his/her major, zodiac sign, or personality type (Yes, I’ve studied sociology for years now).

I will work with obtained data of my classmates Harry Potter characters preferences. The survey can be found here.

The goals of this project are pretty simple:

  1. To know more about my classmates;

  2. To detect patterns of how friendship ties (only mine) reflect on (dis)like for certain characters;

  3. To find clusters of classmates with similar preferences.

classmates <- read.csv("C:/Users/Alina/Desktop/Adv_da/hp.csv",
                    row.names = 1,
                    sep = ";",
                    encoding = 'UTF-8')
hp <- classmates[,-(10:11)]

Unfolding model

hp_un <- unfolding(hp, itmax = 10000, eps = 1e-8)
hp_un
## 
## Call: unfolding(delta = hp, itmax = 10000, eps = 1e-08)
## 
## Model:               Rectangular smacof 
## Number of subjects:  30 
## Number of objects:   9 
## Transformation:      none 
## Conditionality:      matrix 
## 
## Stress-1 value:    0.320597 
## Penalized Stress:  2.927367 
## Number of iterations: 115
hp_un1 <- unfolding(hp, type = "mspline", itmax = 10000, eps = 1e-8)
hp_un1
## 
## Call: unfolding(delta = hp, type = "mspline", itmax = 10000, eps = 1e-08)
## 
## Model:               Rectangular smacof 
## Number of subjects:  30 
## Number of objects:   9 
## Transformation:      mspline 
## Conditionality:      matrix 
## 
## Stress-1 value:    0.380042 
## Penalized Stress:  2.670587 
## Number of iterations: 109

So I have a matrix with different values in rows and columns. There are 30 observations (my classmates) and 9 variables (characters). The stress-1 value is not at all good, but I know better than looking only at it. But I tried to create a model with all other MDS types, this, even on the level of stress-1 value is still better, so I will continue to work with it.

Let’s look at my first “working version” graph!

plot(hp_un, pch = 1, colors = pal)

At this stage, I see how I can interpret the dimensions. The first one is the “headship of character” (Harry, Hermione – most popular here; Luna, Neville – secondary characters). The second dimension is somehow combining friendly and guileless characters with cunning and smart ones.

Visualization

Plotly graph

So one can easily find oneself and friends on it

pal <- c("#367135FF", "#A0AF95FF")
class <- as.data.frame(hp_un$conf.row)
class$who <- "students"
char <- hp_un$conf.col %>% as.data.frame()
char$who <- "characters"
prefer <- rbind(class, char)
prefer %>% plot_ly(x = ~D1, y = ~D2, type = 'scatter',
        mode = 'text', text = ~rownames(prefer), color = ~who, colors = pal, textposition = 'middle right',
        textfont = list(size = 12))

I am pretty close to my friends here. And I’m ready to hear about the situation of others!

Let’s add dimensions

classplot <- ggplot(class, aes(x = D1, y = D2)) 
classplot + geom_point(size = 1, alpha = 0.5, colour = "#A0AF95FF") + 
  coord_fixed(xlim = c(-1.5, 1.5), ylim = c(-1.5, 1.5)) + 
  xlab("") +
  ylab("") +
  geom_point(aes(x = D1, y = D2), char, colour = "#367135FF") + 
  geom_text_repel(aes(x = D1, y = D2, label = rownames(char)), 
            char, vjust = -0.8) + 
  ggtitle("Unfolding Configuration Harry Potter characters")  +
  theme_minimal() +
  annotate("text", x = -1.5, y = 0, label = "Main Characters", angle = 90, color = "#1F5D25") +
  annotate("text", x = 1.5, y = 0, label = "Secondary Characters", angle = 270, color = "#1F5D25") +
  annotate("text", x = 0, y = -1.5, label = "Cunning & Smart", color = "#1F5D25") +
  annotate("text", x = 0, y = 1.5, label = "Kind & Friendly", color = "#1F5D25") 

classplot <- ggplot(class, aes(x = D1, y = D2)) 
classplot + geom_point(size = 1, alpha = 0.5, colour = "#A0AF95FF") + 
  coord_fixed(xlim = c(-1.5, 1.5), ylim = c(-1.5, 1.5)) + 
  geom_text_repel(aes(x = D1, y = D2, label = rownames(class)), 
            class, vjust = -0.8) +
  xlab("") +
  ylab("") + 
  geom_label(aes(x = D1, y = D2, label = rownames(char)), 
            char, vjust = -0.8, colour = "#367135FF") + 
  ggtitle("Unfolding Configuration Harry Potter characters")  +
  theme_minimal() +
  annotate("text", x = -1.5, y = 0, label = "Main Characters", angle = 90, color = "#1F5D25") +
  annotate("text", x = 1.5, y = 0, label = "Secondary Characters", angle = 270, color = "#1F5D25") +
  annotate("text", x = 0, y = -1.5, label = "Cunning & Smart", color = "#1F5D25") +
  annotate("text", x = 0, y = 1.5, label = "Kind & Friendly", color = "#1F5D25") 

Goodness-of-fit assessment

hp_un[["spp.row"]]
##     Алина Чистякова     Оксана Карачева         Лера Львова      Макар Минченко 
##           4.1601252           3.0522769           1.5935627           4.2039511 
##      Катя Тетерина     Тимур Шарифуллин       Маша Миронова       Юля Карпикова 
##           3.0604837           1.1734863           1.2895361           3.2696159 
##        Яна Одинцова   Должин Токтонова  Камилла Гарифуллина     Ксения Горочная 
##           1.7196392           0.8840623           2.8480586           2.2861618 
##    Полина Молчанова                 F14  Маша Лисконоженко                  M16 
##           6.5403739           4.3421101           5.1231325           3.1183105 
##                 F17     Лера Иванникова      Саша Мартынова                 F20 
##           6.4705118           2.5917722           2.7282698           4.2830707 
##              Ариана        Егор Сакулин         Настя Лиски                 F24 
##           5.5655917           2.3855335           1.2635413           0.3514271 
##          Даня Репин  Анастасия Муратова                 F27       Катя Рябинина 
##           2.8136227           3.1891170           2.6593871           5.3824132 
##                 F29    Елизавета Белова 
##           6.2632698           5.3875854

Fun fact: being in Culture & Inequality seminar increases your chances of liking purebloods (especially Draco Malfoy and Luna Lovegood) and having the highest Stress Per Point values. And I do not believe that this is a coincidence.

I understand that there are people whose exclusion may increase the goodness-of-fit. But it is principal for me to not do it. My classmates helped me, I do not want to be selfish + I’m interested in analyzing data as it is!

plot(hp_un, plot.type = "stressplot")

plot(hp_un, plot.type = "Shepard")

As I mentioned earlier, I get that in other cases some deleting observations can be a good idea, but for me - it is not. I’ll look at the stress-1 value difference but won’t work with the new model further.

hp_un3 <- unfolding(breakfast[c(-13, -17, -29, -21), ])
hp_un3$stress
## [1] 0.3007895
hp_un$stress - hp_un3$stress
## [1] 0.01980756

Yes, there’s a slight difference in values for 2 models. Still, it’s so small, so it doesn’t worth it to exclude my colleagues from the analysis.

permtest(hp_un, hp, nrep = 1000, verbose = F)
## 
## Call: permtest.smacofR(object = hp_un, data = hp, nrep = 1000, verbose = F)
## 
## SMACOF Permutation Test
## Number of objects: 30 
## Number of replications (permutations): 1000 
## 
## Observed stress value: 0.321 
## p-value: 0.003

I can reject H0 “stress/configuration are obtained from a random permutation of dissimilarities”. The model, at least, is better than the random one. There’s some structure inside the configuration.

K-means

Just to highlight the clusters here

# K-means clustering
clust <- kmeans(prefer[, 1:2], 3)$cluster %>%
  as.factor()
prefer1 <- prefer %>%
  mutate(groups = clust)
colnames(prefer1[1:2]) <- c("Dim.1", "Dim.2")
# Plot and color by groups
ggscatter(prefer1, x = "D1", y = "D2",
          color = "groups",
          size = 3, 
          ellipse = TRUE,
          shape = "who",
          ellipse.type = "convex",
          repel = TRUE, label = rownames(prefer1),
          xlab = "Main/Secondary Characters",
          ylab = "Kind & Friendly/Cunning & Smart")

Conclusion

  1. For me, it is also possible to interpret the dimensions. The 1st is “Main/Secondary Characters”, and the 2nd is “Kind & Friendly/Cunning & Smart”.

  2. Ron is highly unpopular;

  3. I would say that there are 3 groups (for me, k-means is more interpretable with 3 clusters) of my classmates with somehow similar preferences.

  4. I have quite similar tastes with my friends and with people I communicate with the most, so I’m interested in how it works for others!