Sociology students’ preferences of Harry Potter characters
thematic::thematic_rmd(
bg = "#FFFFFF",
fg = "#588650FF",
accent = "#1F5D25FF",
font = font_spec("Roboto Condensed", scale = 0.8),
qualitative = hp_palettes$slytherin,
sequential = sequential_gradient(0.5, 0.75))
pal <- c("#A0AF95FF", "#367135FF")I was and still am a huge Harry Potter fan. I believe deeply that one’s favorite characters in these books and movies can tell about the one more than his/her major, zodiac sign, or personality type (Yes, I’ve studied sociology for years now).
I will work with obtained data of my classmates Harry Potter characters preferences. The survey can be found here.
The goals of this project are pretty simple:
To know more about my classmates;
To detect patterns of how friendship ties (only mine) reflect on (dis)like for certain characters;
To find clusters of classmates with similar preferences.
classmates <- read.csv("C:/Users/Alina/Desktop/Adv_da/hp.csv",
row.names = 1,
sep = ";",
encoding = 'UTF-8')
hp <- classmates[,-(10:11)]Unfolding model
hp_un <- unfolding(hp, itmax = 10000, eps = 1e-8)
hp_un##
## Call: unfolding(delta = hp, itmax = 10000, eps = 1e-08)
##
## Model: Rectangular smacof
## Number of subjects: 30
## Number of objects: 9
## Transformation: none
## Conditionality: matrix
##
## Stress-1 value: 0.320597
## Penalized Stress: 2.927367
## Number of iterations: 115
hp_un1 <- unfolding(hp, type = "mspline", itmax = 10000, eps = 1e-8)
hp_un1##
## Call: unfolding(delta = hp, type = "mspline", itmax = 10000, eps = 1e-08)
##
## Model: Rectangular smacof
## Number of subjects: 30
## Number of objects: 9
## Transformation: mspline
## Conditionality: matrix
##
## Stress-1 value: 0.380042
## Penalized Stress: 2.670587
## Number of iterations: 109
So I have a matrix with different values in rows and columns. There are 30 observations (my classmates) and 9 variables (characters). The stress-1 value is not at all good, but I know better than looking only at it. But I tried to create a model with all other MDS types, this, even on the level of stress-1 value is still better, so I will continue to work with it.
Let’s look at my first “working version” graph!
plot(hp_un, pch = 1, colors = pal)At this stage, I see how I can interpret the dimensions. The first one is the “headship of character” (Harry, Hermione – most popular here; Luna, Neville – secondary characters). The second dimension is somehow combining friendly and guileless characters with cunning and smart ones.
Visualization
Plotly graph
So one can easily find oneself and friends on it
pal <- c("#367135FF", "#A0AF95FF")
class <- as.data.frame(hp_un$conf.row)
class$who <- "students"
char <- hp_un$conf.col %>% as.data.frame()
char$who <- "characters"
prefer <- rbind(class, char)
prefer %>% plot_ly(x = ~D1, y = ~D2, type = 'scatter',
mode = 'text', text = ~rownames(prefer), color = ~who, colors = pal, textposition = 'middle right',
textfont = list(size = 12))I am pretty close to my friends here. And I’m ready to hear about the situation of others!
Let’s add dimensions
classplot <- ggplot(class, aes(x = D1, y = D2))
classplot + geom_point(size = 1, alpha = 0.5, colour = "#A0AF95FF") +
coord_fixed(xlim = c(-1.5, 1.5), ylim = c(-1.5, 1.5)) +
xlab("") +
ylab("") +
geom_point(aes(x = D1, y = D2), char, colour = "#367135FF") +
geom_text_repel(aes(x = D1, y = D2, label = rownames(char)),
char, vjust = -0.8) +
ggtitle("Unfolding Configuration Harry Potter characters") +
theme_minimal() +
annotate("text", x = -1.5, y = 0, label = "Main Characters", angle = 90, color = "#1F5D25") +
annotate("text", x = 1.5, y = 0, label = "Secondary Characters", angle = 270, color = "#1F5D25") +
annotate("text", x = 0, y = -1.5, label = "Cunning & Smart", color = "#1F5D25") +
annotate("text", x = 0, y = 1.5, label = "Kind & Friendly", color = "#1F5D25") classplot <- ggplot(class, aes(x = D1, y = D2))
classplot + geom_point(size = 1, alpha = 0.5, colour = "#A0AF95FF") +
coord_fixed(xlim = c(-1.5, 1.5), ylim = c(-1.5, 1.5)) +
geom_text_repel(aes(x = D1, y = D2, label = rownames(class)),
class, vjust = -0.8) +
xlab("") +
ylab("") +
geom_label(aes(x = D1, y = D2, label = rownames(char)),
char, vjust = -0.8, colour = "#367135FF") +
ggtitle("Unfolding Configuration Harry Potter characters") +
theme_minimal() +
annotate("text", x = -1.5, y = 0, label = "Main Characters", angle = 90, color = "#1F5D25") +
annotate("text", x = 1.5, y = 0, label = "Secondary Characters", angle = 270, color = "#1F5D25") +
annotate("text", x = 0, y = -1.5, label = "Cunning & Smart", color = "#1F5D25") +
annotate("text", x = 0, y = 1.5, label = "Kind & Friendly", color = "#1F5D25") Goodness-of-fit assessment
hp_un[["spp.row"]]## Алина Чистякова Оксана Карачева Лера Львова Макар Минченко
## 4.1601252 3.0522769 1.5935627 4.2039511
## Катя Тетерина Тимур Шарифуллин Маша Миронова Юля Карпикова
## 3.0604837 1.1734863 1.2895361 3.2696159
## Яна Одинцова Должин Токтонова Камилла Гарифуллина Ксения Горочная
## 1.7196392 0.8840623 2.8480586 2.2861618
## Полина Молчанова F14 Маша Лисконоженко M16
## 6.5403739 4.3421101 5.1231325 3.1183105
## F17 Лера Иванникова Саша Мартынова F20
## 6.4705118 2.5917722 2.7282698 4.2830707
## Ариана Егор Сакулин Настя Лиски F24
## 5.5655917 2.3855335 1.2635413 0.3514271
## Даня Репин Анастасия Муратова F27 Катя Рябинина
## 2.8136227 3.1891170 2.6593871 5.3824132
## F29 Елизавета Белова
## 6.2632698 5.3875854
Fun fact: being in Culture & Inequality seminar increases your chances of liking purebloods (especially Draco Malfoy and Luna Lovegood) and having the highest Stress Per Point values. And I do not believe that this is a coincidence.
I understand that there are people whose exclusion may increase the goodness-of-fit. But it is principal for me to not do it. My classmates helped me, I do not want to be selfish + I’m interested in analyzing data as it is!
plot(hp_un, plot.type = "stressplot")plot(hp_un, plot.type = "Shepard")As I mentioned earlier, I get that in other cases some deleting observations can be a good idea, but for me - it is not. I’ll look at the stress-1 value difference but won’t work with the new model further.
hp_un3 <- unfolding(breakfast[c(-13, -17, -29, -21), ])
hp_un3$stress## [1] 0.3007895
hp_un$stress - hp_un3$stress## [1] 0.01980756
Yes, there’s a slight difference in values for 2 models. Still, it’s so small, so it doesn’t worth it to exclude my colleagues from the analysis.
permtest(hp_un, hp, nrep = 1000, verbose = F)##
## Call: permtest.smacofR(object = hp_un, data = hp, nrep = 1000, verbose = F)
##
## SMACOF Permutation Test
## Number of objects: 30
## Number of replications (permutations): 1000
##
## Observed stress value: 0.321
## p-value: 0.003
I can reject H0 “stress/configuration are obtained from a random permutation of dissimilarities”. The model, at least, is better than the random one. There’s some structure inside the configuration.
K-means
Just to highlight the clusters here
# K-means clustering
clust <- kmeans(prefer[, 1:2], 3)$cluster %>%
as.factor()
prefer1 <- prefer %>%
mutate(groups = clust)
colnames(prefer1[1:2]) <- c("Dim.1", "Dim.2")
# Plot and color by groups
ggscatter(prefer1, x = "D1", y = "D2",
color = "groups",
size = 3,
ellipse = TRUE,
shape = "who",
ellipse.type = "convex",
repel = TRUE, label = rownames(prefer1),
xlab = "Main/Secondary Characters",
ylab = "Kind & Friendly/Cunning & Smart")Conclusion
For me, it is also possible to interpret the dimensions. The 1st is “Main/Secondary Characters”, and the 2nd is “Kind & Friendly/Cunning & Smart”.
Ron is highly unpopular;
I would say that there are 3 groups (for me, k-means is more interpretable with 3 clusters) of my classmates with somehow similar preferences.
I have quite similar tastes with my friends and with people I communicate with the most, so I’m interested in how it works for others!