19/07/2021

library(psych)
library(tidyverse)
library(ggplot2)
library(GPArotation)
library(umx)
library(corrplot)
library(readr)
library(formattable)
library(MVN)
We will perform a factor analysis on a dataset of responses to the online interactive version of the Humor Styles Questionnaire (HSQ), developed in 2003 by Martin, R. A., Puhlik-Doris, P., Larsen, G., Gray, J., & Weir, K. We will then check the results and compare them across the three gender identities recorded in the questionnaire.
The HSQ was developed as a comprehensive self-report measure of the everyday functions of humor, particularly those relevant to psychosocial well-being. It postulates two goals of humor use, improving oneself vs. improving one's relationships with others, and two ways of pursuing these goals, positively or negatively.
The following items were rated on a 5-point scale where:
1 = Strongly disagree,
2 = Disagree,
3 = Neither agree nor disagree,
4 = Agree,
5 = Strongly agree.
The last page of the test also collected the respondents' age, gender, and a self-rated accuracy of their answers.
First we select the 32 items and keep only the responses with a self-rated accuracy greater than 90. This reduces the amount of data to analyze but increases its quality, so that the analysis is not biased by careless or false answers.
In addition, we remove the respondents who declared ages of 151, 242, 2670, and 44849 years, as well as the missing values (coded -1).
data <- read_csv("data.csv")
View(data)

Data <- data.frame(data)

# Keep only reliable responses: self-rated accuracy above 90 and realistic ages
Data <- Data %>% dplyr::filter(accuracy > 90, age <= 70)

# Drop rows containing missing values (coded 0 or -1)
Data <- Data[apply(Data, 1, function(row) all(row != 0, row != -1)), ]

# Keep the 32 questionnaire items plus age, gender and accuracy
# (drop the precomputed scale scores included in the dataset)
Data1 <- Data %>%
  dplyr::select(-affiliative, -selfenhancing, -agressive, -selfdefeating)
head(Data1)
After filtering, we obtain the responses of 361 participants to the 32 questions plus information on age, gender and accuracy.
brotools::describe(Data1)

# Keep only the 32 items for the factor analysis
Data1 <- Data1 %>%
  select(-age, -gender, -accuracy)
From the table we can see that there are no missing values and that the minimum and maximum cover the entire range of response options [1; 5].
Furthermore, the accuracy is indeed greater than 90, the age no longer contains unrealistic values, and the gender variable is almost entirely 1 = male or 2 = female (only two respondents chose 3 = other, as discussed in the gender comparison below).
All the variables are Likert items.
Let's check the normality assumption for these variables.
# Bar plot of the response distribution for each item
plot <- Data1 %>% pivot_longer(cols = everything(), names_to = "VAR", values_to = "vals")
plot1 <- ggplot(plot, aes(x = vals)) +
  geom_bar(fill = "blue") +
  facet_wrap(~VAR) +
  theme_void()
plot1
Looking at the plots, none of the variables appears to be normally distributed. To test this formally, we run a Shapiro-Wilk normality test on each item.
# Shapiro-Wilk test for each of the 32 items, collecting the p-values
df_shapiro <- apply(Data1, 2, shapiro.test)
p.value <- unlist(lapply(df_shapiro, function(x) x$p.value))
shap <- as.data.frame(p.value)
formattable(shap)
| | p.value |
|---|---|
| Q1 | 5.544107e-22 |
| Q2 | 4.815557e-15 |
| Q3 | 9.912938e-14 |
| Q4 | 3.969369e-14 |
| Q5 | 3.259659e-17 |
| Q6 | 1.047160e-23 |
| Q7 | 2.445749e-14 |
| Q8 | 7.849488e-17 |
| Q9 | 9.304428e-17 |
| Q10 | 5.435999e-14 |
| Q11 | 2.543433e-15 |
| Q12 | 3.130462e-14 |
| Q13 | 8.211494e-28 |
| Q14 | 1.285223e-15 |
| Q15 | 1.059389e-17 |
| Q16 | 8.271325e-15 |
| Q17 | 1.484405e-23 |
| Q18 | 1.257181e-13 |
| Q19 | 3.069519e-15 |
| Q20 | 9.642760e-21 |
| Q21 | 1.935907e-26 |
| Q22 | 1.541646e-13 |
| Q23 | 6.628741e-14 |
| Q24 | 3.495105e-15 |
| Q25 | 4.956617e-28 |
| Q26 | 1.272132e-16 |
| Q27 | 5.683903e-19 |
| Q28 | 3.764479e-15 |
| Q29 | 2.318990e-19 |
| Q30 | 2.587740e-21 |
| Q31 | 2.225144e-15 |
| Q32 | 5.819178e-15 |
All items have p-values effectively equal to zero, well below any conventional significance level, so we reject the null hypothesis of normality and conclude that none of the 32 items is normally distributed.
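Although the analysis proceeds with these univariate results, the MVN package loaded at the start could also be used for a multivariate normality check. A minimal sketch, assuming the mvn() interface with an mvnTest argument (this step is not part of the original output):
# Mardia's multivariate skewness and kurtosis test on the 32 items
mvn_result <- mvn(Data1, mvnTest = "mardia")
mvn_result$multivariateNormality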
The KMO test indicates whether the data contain a factorial structure, i.e. whether groups of variables are correlated strongly enough to form factors.
We first inspect the correlation matrix.
A visual analysis of the correlation matrix can reveal similarities or correlations between groups of questions.
R<-cor(Data1)
corrplot(R,type="upper",order="hclust")
The blue blocks are groups of positively correlated items, while the red block is a group of negatively correlated items that emerges in the data.
psych::KMO(R)
## Kaiser-Meyer-Olkin factor adequacy
## Call: psych::KMO(r = R)
## Overall MSA = 0.86
## MSA for each item =
## Q1 Q2 Q3 Q4 Q5 Q6 Q7 Q8 Q9 Q10 Q11 Q12 Q13 Q14 Q15 Q16
## 0.91 0.90 0.89 0.87 0.87 0.79 0.79 0.82 0.91 0.85 0.86 0.87 0.89 0.90 0.82 0.86
## Q17 Q18 Q19 Q20 Q21 Q22 Q23 Q24 Q25 Q26 Q27 Q28 Q29 Q30 Q31 Q32
## 0.88 0.83 0.91 0.82 0.89 0.84 0.81 0.83 0.83 0.88 0.83 0.87 0.87 0.74 0.81 0.89
The overall Kaiser-Meyer-Olkin (KMO) measure of sampling adequacy is 0.86, which is commonly considered a good value (values above 0.8 are regarded as meritorious, and every per-item MSA here is above 0.7). This suggests that the correlation matrix is suitable for factor analysis and that there is a factor-based structure in the data.
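Alongside the KMO, Bartlett's test of sphericity is another common adequacy check: it tests whether the correlation matrix differs from an identity matrix. A short sketch using psych (not part of the original output):
# Bartlett's test of sphericity: H0 is that R is an identity matrix
psych::cortest.bartlett(R, n = nrow(Data1))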
Next we use a scree plot with parallel analysis to decide how many factors to extract.
fa.parallel(Data1, fa="fa",
show.legend=FALSE, main="Scree plot with parallel analysis")
## Parallel analysis suggests that the number of factors = 4 and the number of components = NA
Considering the parallel analysis method and the eigenvalue-1 rule, we select 4 factors for the analysis. This choice is also consistent with the literature.
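For reference, the eigenvalues underlying the scree plot can also be inspected directly; a minimal sketch (the parallel analysis above remains the basis for choosing four factors):
# Eigenvalues of the observed correlation matrix (the basis of the scree plot)
round(eigen(R)$values, 2)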
# Unrotated factor analysis with 4 factors, principal axis (pa) extraction
res_fa = fa(Data1, nfactors = 4, fm = "pa", rotate = "none")
customGreen = "#71CA97"
customRed = "#ff7f7f"

# Highlight loadings greater than 0.3 in absolute value in green, the rest in red
loading_formatter <-
  formatter("span",
            style = x ~ style(
              font.weight = "bold",
              color = ifelse(x > 0.3 | x < -0.3, customGreen, customRed)))

load = as.data.frame(unclass(res_fa$loadings))
formattable(load, list(
  PA1 = loading_formatter,
  PA2 = loading_formatter,
  PA3 = loading_formatter,
  PA4 = loading_formatter))
| | PA1 | PA2 | PA3 | PA4 |
|---|---|---|---|---|
| Q1 | -0.5728442 | 0.248471782 | -0.01859520 | 0.303835849 |
| Q2 | 0.5012124 | -0.158987035 | 0.16243860 | 0.298447914 |
| Q3 | 0.4663939 | 0.042766516 | -0.40683747 | 0.037597061 |
| Q4 | 0.3509635 | 0.463059917 | 0.13790131 | -0.090025248 |
| Q5 | 0.5001507 | -0.238101259 | 0.02850684 | -0.222036010 |
| Q6 | 0.4140986 | -0.233697893 | 0.12083764 | 0.217624216 |
| Q7 | -0.2128610 | -0.137876706 | 0.53175472 | -0.209070725 |
| Q8 | 0.4445715 | 0.623968871 | 0.17011407 | -0.112722833 |
| Q9 | -0.4431618 | 0.096036448 | -0.03836657 | 0.252597489 |
| Q10 | 0.5154251 | -0.037220634 | 0.25012711 | 0.437310646 |
| Q11 | 0.2380876 | 0.050607412 | -0.47689473 | 0.155080047 |
| Q12 | 0.3617808 | 0.444380736 | 0.20923101 | -0.166032419 |
| Q13 | 0.5522058 | -0.257164223 | 0.03879829 | -0.231953739 |
| Q14 | 0.5560420 | -0.179641223 | 0.23783242 | 0.287456251 |
| Q15 | -0.3729314 | -0.115922443 | 0.64575453 | -0.101980043 |
| Q16 | -0.3547675 | -0.523663024 | 0.01120780 | 0.106116902 |
| Q17 | -0.6392026 | 0.267697508 | -0.04272231 | 0.374943159 |
| Q18 | 0.5085105 | -0.070623603 | 0.30829088 | 0.501564120 |
| Q19 | 0.5511580 | -0.032316041 | -0.24870777 | 0.004631131 |
| Q20 | 0.3913583 | 0.653464053 | 0.08457721 | -0.036273552 |
| Q21 | 0.5440290 | -0.229455642 | 0.22514016 | -0.330124246 |
| Q22 | -0.4064303 | 0.007282421 | -0.02990634 | -0.086212240 |
| Q23 | -0.3405882 | -0.006351837 | 0.39234821 | -0.046434149 |
| Q24 | 0.1135508 | 0.407601128 | 0.13251194 | 0.055569254 |
| Q25 | -0.4570719 | 0.288461305 | -0.10897220 | 0.248833709 |
| Q26 | 0.5331993 | -0.156688095 | 0.28897994 | 0.306818838 |
| Q27 | 0.2842206 | 0.088276635 | -0.46384765 | 0.130555298 |
| Q28 | 0.4173245 | 0.102765828 | 0.01194820 | 0.107142841 |
| Q29 | -0.3829953 | 0.339726572 | 0.20667678 | 0.246241808 |
| Q30 | 0.2421815 | -0.201599723 | 0.13659153 | 0.293096457 |
| Q31 | -0.3290608 | 0.026582451 | 0.60666460 | -0.023107605 |
| Q32 | 0.4465693 | 0.524057149 | 0.17489787 | -0.122374206 |
Var = as.data.frame(unclass(res_fa$Vaccounted))
formattable(Var)
| | PA1 | PA2 | PA3 | PA4 |
|---|---|---|---|---|
| SS loadings | 6.0886697 | 2.70073670 | 2.50845952 | 1.67051485 |
| Proportion Var | 0.1902709 | 0.08439802 | 0.07838936 | 0.05220359 |
| Cumulative Var | 0.1902709 | 0.27466895 | 0.35305831 | 0.40526190 |
| Proportion Explained | 0.4695012 | 0.20825551 | 0.19342889 | 0.12881445 |
| Cumulative Proportion | 0.4695012 | 0.67775666 | 0.87118555 | 1.00000000 |
# Communality, uniqueness and complexity for each item
CUC <- data.frame(Communality = res_fa$communality,
                  Uniqueness = res_fa$uniquenesses,
                  Complexity = res_fa$complexity)
CUC
We can see that (each quantity can also be recomputed directly from the loading matrix, as in the sketch after this list):
Loadings: the strength of the relationship between each item and each factor.
h2 = Communality: the proportion of an item's variance explained by the factors; it is the sum of the item's squared loadings, and values of about 0.2 and above are usually considered acceptable.
u2 = Uniqueness: 1 minus the communality, i.e. the residual variance not explained by the factors.
Com = Complexity: Hoffman's complexity index indicates how many factors an item loads on; a value close to 1 means the item reflects a single construct. Here the mean item complexity is 2.1, far from 1, so the solution is not simple.
SS loadings: the sum of squared loadings for each factor, analogous to an eigenvalue.
Proportion Var: the proportion of total variance explained by each factor.
Cumulative Var: the cumulative proportion of variance explained by the factors considered so far.
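As a quick sketch (object names below are ours, for illustration only), these quantities can be verified from the unrotated loading matrix:
# Recompute communality, uniqueness, complexity and SS loadings from the loadings
L <- unclass(res_fa$loadings)         # 32 x 4 matrix of loadings
h2 <- rowSums(L^2)                    # communality: sum of squared loadings per item
u2 <- 1 - h2                          # uniqueness: residual variance
com <- rowSums(L^2)^2 / rowSums(L^4)  # Hoffman's complexity index per item
ss <- colSums(L^2)                    # SS loadings per factor
head(round(data.frame(h2, u2, com), 2))
round(ss, 2)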
fa.diagram(res_fa)
From this diagram we can see that most of our items are explained only by the first factor and that the fourth factor remains completely empty.
In factor analysis we aim for a simple structure (each variable explained mainly by one factor), so we would like each item's complexity to be as close to 1 as possible.
In this case the mean item complexity is 2.1, which is particularly high, so we will try to reduce it by rotating the solution.
# Bar chart of communalities by item
Cum <- CUC %>% mutate(Vnames = rownames(.))
Cum$Vnames <- as.factor(Cum$Vnames)
Cum <- Cum %>%
  ggplot(aes(x = reorder(Vnames, Communality), y = Communality)) +
  geom_bar(stat = "identity", fill = "darkblue", alpha = 1) +
  xlab("Item") +
  ggtitle("Communality") +
  theme_minimal()

# Bar chart of complexities by item
Com <- CUC %>% mutate(Vnames = rownames(.))
Com$Vnames <- as.factor(Com$Vnames)
Com <- Com %>%
  ggplot(aes(x = reorder(Vnames, Complexity), y = Complexity)) +
  geom_bar(stat = "identity", fill = "darkblue", alpha = 1) +
  xlab("Item") +
  ggtitle("Complexity") +
  theme_minimal()

library(scater)   # provides multiplot() for arranging the two charts
multiplot(Cum, Com)
From the charts we can compare the communalities and complexities of the items.
Findings:
To obtain a simple solution, we rotate the result. Rotation increases the loadings on one factor and decreases the loadings on the other factors, which makes the structure easier to interpret.
After this preliminary analysis, two items, Q22 and Q28, do not seem to be associated with any factor, so they need to be removed from the item pool:
Q22, "If I feel sad or upset, I usually lose my sense of humour", has a very low communality of 0.17 and a complexity of 2.1;
Q28, "If I have problems or feel unhappy, I often cover it up by joking, so that even my closest friends don't know how I really feel", has a low communality of 0.19 and a complexity of 2.6.
We therefore remove these two items and repeat the analysis, this time with an oblique (promax) rotation, to obtain a simple solution.
# Repeat the factor analysis without Q22 and Q28, with an oblique promax rotation
res_PA_obl = fa(Data1[-c(22, 28)], nfactors = 4, rotate = "promax", fm = "pa")
load = as.data.frame(unclass(res_PA_obl$loadings))
formattable(load, list(
  PA1 = loading_formatter,
  PA2 = loading_formatter,
  PA3 = loading_formatter,
  PA4 = loading_formatter))
| | PA1 | PA2 | PA3 | PA4 |
|---|---|---|---|---|
| Q1 | 0.69284016 | -0.0175328641 | 0.0086904381 | 0.013122254 |
| Q2 | -0.07079602 | -0.0005494547 | -0.0464550253 | 0.568952668 |
| Q3 | -0.13884025 | 0.0742259664 | -0.5341641094 | 0.032905302 |
| Q4 | -0.05318185 | 0.5904482221 | 0.0093347477 | -0.005930384 |
| Q5 | -0.57479925 | -0.0081647905 | -0.0075785903 | 0.043381792 |
| Q6 | -0.12744433 | -0.0906494086 | -0.0351729444 | 0.485516548 |
| Q7 | -0.20550700 | -0.0139001024 | 0.6502435629 | -0.024258771 |
| Q8 | -0.04607450 | 0.7838344287 | 0.0003143848 | -0.011951218 |
| Q9 | 0.50770822 | -0.1130313798 | -0.0081745117 | 0.034310067 |
| Q10 | 0.09986589 | 0.1123599780 | -0.0251257275 | 0.723847810 |
| Q11 | 0.09874898 | -0.0304099140 | -0.5803802748 | 0.030546441 |
| Q12 | -0.14476080 | 0.6130981385 | 0.0990125063 | -0.032733127 |
| Q13 | -0.62173774 | -0.0048013608 | -0.0065325099 | 0.066426336 |
| Q14 | -0.12069747 | 0.0244723551 | 0.0133711801 | 0.627754138 |
| Q15 | -0.03364667 | -0.0418442018 | 0.7756850408 | 0.050989954 |
| Q16 | 0.02085328 | -0.6193754805 | 0.1423988102 | 0.091287900 |
| Q17 | 0.80496569 | -0.0403963572 | -0.0216787781 | 0.041532351 |
| Q18 | 0.14430083 | 0.0838554373 | 0.0160305307 | 0.819075574 |
| Q19 | -0.26221647 | 0.0830144175 | -0.3881664132 | 0.124169049 |
| Q20 | 0.07809282 | 0.7627917419 | -0.0978449308 | 0.002223969 |
| Q21 | -0.71462582 | 0.0864619797 | 0.2029396751 | 0.037928261 |
| Q23 | 0.08263528 | -0.0058479938 | 0.4878321063 | -0.011736176 |
| Q24 | 0.17436808 | 0.4342877870 | 0.0294745708 | 0.056974769 |
| Q25 | 0.61047628 | 0.0412556603 | -0.0952824094 | -0.051265668 |
| Q26 | -0.09415520 | 0.0403948402 | 0.0652441016 | 0.641335774 |
| Q27 | 0.06494080 | 0.0232072832 | -0.5740987299 | 0.019163169 |
| Q29 | 0.56402640 | 0.1971541303 | 0.1809038570 | 0.106774710 |
| Q30 | 0.04027884 | -0.1293272551 | 0.0023741911 | 0.478908713 |
| Q31 | 0.09314381 | 0.0805246692 | 0.6778466714 | 0.103987773 |
| Q32 | -0.10381251 | 0.6987012872 | 0.0188514980 | 0.007789612 |
Var = as.data.frame(unclass(res_PA_obl$Vaccounted))
formattable(Var)
| | PA1 | PA2 | PA3 | PA4 |
|---|---|---|---|---|
| SS loadings | 3.6233560 | 3.1584154 | 2.93083251 | 2.88137893 |
| Proportion Var | 0.1207785 | 0.1052805 | 0.09769442 | 0.09604596 |
| Cumulative Var | 0.1207785 | 0.2260590 | 0.32375346 | 0.41979943 |
| Proportion Explained | 0.2877053 | 0.2507877 | 0.23271689 | 0.22879013 |
| Cumulative Proportion | 0.2877053 | 0.5384930 | 0.77120987 | 1.00000000 |
CUC <- data.frame(Communality = res_PA_obl$communality,
                  Uniqueness = res_PA_obl$uniquenesses,
                  Complexity = res_PA_obl$complexity)
CUC
fa.diagram(res_PA_obl)
Findings:
Looking at the item complexities, each item is now represented by essentially one factor; this is also visible in the loadings table, where each item has exactly one loading above 0.30 in absolute value (shown in green);
The communality of each item is at least 0.3;
The mean item complexity is 1.1.
We can therefore conclude that we have reached a simple solution.
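To see at a glance which items define each rotated factor, the loadings can also be printed sorted, with small values suppressed; a short sketch (the 0.3 cutoff mirrors the colour threshold used above):
# Sorted loadings with values below |0.30| suppressed
print(res_PA_obl$loadings, cutoff = 0.3, sort = TRUE)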
This analysis leads to four different humour styles, namely affiliative (improving relationships, used positively), self-enhancing (improving oneself, used positively), aggressive (improving oneself, used negatively at the expense of others), and self-defeating (improving relationships, used negatively at one's own expense). The affiliative and self-enhancing styles are similar, and both imply a propensity to laugh and to be amused or ironic. However, the affiliative style is expressed mainly when with others, while the self-enhancing style is expressed mainly when alone. The aggressive and self-defeating styles occur in social situations and share an impulsive element. The aggressive style is mocking, critical and offensive and includes a lack of respect for the feelings of others. The self-defeating style refers mainly to making fun of oneself in order to make others laugh.
Q17. I don’t usually enjoy telling jokes or entertaining people.
Q21. I like to make people laugh. (REV)
Q1. I don’t usually laugh or joke much with other people.
Q13. I laugh and joke a lot with my closest friends. (REV)
Q25. I don’t often joke with my friends.
Q5. I don’t have to work very hard to make others laugh - I seem to be a humorous person by nature. (REV)
Q29. I can’t usually think of witty things to say when I’m with other people.
Q9. I rarely make others laugh by telling funny stories about myself.
Q8. I often get carried away in putting myself down if it makes my family or friends laugh.
Q20. I often overreact to putting myself down when making jokes or trying to be funny.
Q32. Letting others laugh at me is my way of keeping my friends and family in a good mood.
Q16. I don’t often say funny things to put myself down. (REV)
Q12. I often try to get people to like me or accept me more by saying something funny about my weaknesses, mistakes or flaws.
Q4. I let people laugh at me or make fun of me more than I should.
Q24. When I’m with friends or family, I often seem to be the one others make fun of or joke about.
Q15. I don’t like it when people use humor as a way to criticize or put someone down.
Q31. Even if something is really funny to me, I won’t laugh or joke about it if someone is offended by it.
Q7. People are never offended or hurt by my sense of humor.
Q11. When I tell jokes or say funny things, I’m usually not very concerned about how other people take it. (REV)
Q27. If I don’t like someone, I often use humor or teasing to put them down.(REV)
Q3. If someone makes a mistake, I often make fun of them. (REV)
Q23. I never participate in laughing at others even though all my friends do.
Q19. Sometimes I think of something that is so funny that I can’t stop myself from saying it, even if it’s not appropriate for the situation. (REV)
Q18. If I’m alone and feeling miserable, I make an effort to think of something funny to cheer myself up.
Q10. If I am feeling upset or unhappy I usually try to think of something funny about the situation to make myself feel better.
Q26. In my experience, thinking about some funny aspect of a situation is often a very effective way to deal with problems.
Q14. My humorous outlook on life keeps me from getting overly angry or depressed about things.
Q2. If I’m feeling depressed, I can usually cheer myself up with humor.
Q6. Even when I’m alone, I’m often amused by the absurdities of life.
Q30. I don’t need to be with other people to feel amused - I can usually find things to laugh about even when I’m alone.
In the factor diagram above, the red lines indicate the items that load negatively on their factor. We need to reverse-score these items before checking internal consistency: Q21, Q13, Q5, Q16, Q27, Q11, Q3 and Q19.
# Reverse-score the negatively keyed items (5-point scale: reversed value = 6 - x)
rev_items <- c("Q3", "Q5", "Q11", "Q13", "Q16", "Q19", "Q21", "Q27")
new <- Data1 %>%
  mutate(across(all_of(rev_items), ~ 6 - .x, .names = "{.col}R"))
re_data_rev <- new %>% select(-all_of(rev_items))
head(re_data_rev)
Reliability is the degree of agreement/coherence between independent measures of the same construct.
To check reliability we compute Cronbach's alpha, which measures internal consistency: the closer it is to 1, the more consistent the scale, and values above 0.7 are usually considered acceptable.
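For reference, Cronbach's alpha can be computed directly from the item covariance matrix; the short sketch below (the helper function is ours, not part of the original code) makes the formula explicit:
# Cronbach's alpha = k/(k-1) * (1 - sum of item variances / variance of the total score)
cronbach_alpha <- function(items) {
  S <- cov(items)
  k <- ncol(items)
  k / (k - 1) * (1 - sum(diag(S)) / sum(S))
}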
# Affiliative scale: Cronbach's alpha from the item covariance matrix
PC1 = c("Q17", "Q21R", "Q1", "Q13R", "Q25", "Q5R", "Q29", "Q9")
PC1_data <- re_data_rev[, PC1]
PC1_rel <- reliability(cov(PC1_data))
print(PC1_rel, digits = 3)
## Alpha reliability = 0.832
## Standardized alpha = 0.84
##
## Reliability deleting each item in turn:
## Alpha Std.Alpha r(item, total)
## Q17 0.791 0.802 0.714
## Q21R 0.808 0.816 0.604
## Q1 0.804 0.814 0.626
## Q13R 0.813 0.819 0.577
## Q25 0.818 0.826 0.522
## Q5R 0.814 0.825 0.553
## Q29 0.825 0.832 0.486
## Q9 0.829 0.834 0.469
Cronbach’s alpha = 0.83
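As a cross-check, psych::alpha computes the same coefficient along with item-total statistics; a short sketch for the first scale (its output is not reproduced here):
# Cross-check of the Affiliative scale's alpha with psych::alpha
psych::alpha(re_data_rev[, PC1])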
# Self-Defeating scale
PC2 = c("Q8", "Q20", "Q32", "Q16R", "Q4", "Q12", "Q24")
PC2_data <- re_data_rev[, PC2]
PC2_rel <- reliability(cov(PC2_data))
print(PC2_rel, digits = 3)
## Alpha reliability = 0.832
## Standardized alpha = 0.832
##
## Reliability deleting each item in turn:
## Alpha Std.Alpha r(item, total)
## Q8 0.788 0.788 0.704
## Q20 0.796 0.794 0.673
## Q32 0.797 0.797 0.652
## Q16R 0.811 0.811 0.567
## Q4 0.812 0.812 0.561
## Q12 0.815 0.815 0.542
## Q24 0.840 0.841 0.369
Cronbach’s alpha = 0.83
As we can see from the table, removing item Q24 increases Cronbach's alpha from 0.832 to about 0.84, so we remove it and repeat the analysis in order to improve reliability.
# Refit the factor analysis without Q22, Q28 and Q24 (promax rotation)
res_PC_ort = fa(Data1[-c(22, 28, 24)], nfactors = 4, n.obs = nrow(Data1), rotate = "promax", fm = "pa")

# Self-Defeating scale without Q24
PC2 = c("Q8", "Q20", "Q32", "Q16R", "Q4", "Q12")
PC2_data <- re_data_rev[, PC2]
PC2_rel <- reliability(cov(PC2_data))
print(PC2_rel, digits = 3)
## Alpha reliability = 0.84
## Standardized alpha = 0.841
##
## Reliability deleting each item in turn:
## Alpha Std.Alpha r(item, total)
## Q8 0.793 0.795 0.715
## Q20 0.802 0.801 0.690
## Q32 0.810 0.812 0.635
## Q16R 0.820 0.821 0.588
## Q4 0.829 0.831 0.538
## Q12 0.827 0.829 0.548
Cronbach’s alpha = 0.84
After removing question Q24 we obtain Cronbach’s alpha equal to 0.84.
In fact, checking the values for Q24 we see that it has a very low communality of 0.20 and a complexity of about 1.4.
# Aggressive scale
PC3 = c("Q15", "Q31", "Q7", "Q27R", "Q11R", "Q3R", "Q23", "Q19R")
PC3_data <- re_data_rev[, PC3]
PC3_rel <- reliability(cov(PC3_data))
print(PC3_rel, digits = 3)
## Alpha reliability = 0.813
## Standardized alpha = 0.812
##
## Reliability deleting each item in turn:
## Alpha Std.Alpha r(item, total)
## Q15 0.771 0.771 0.662
## Q31 0.782 0.781 0.596
## Q7 0.795 0.794 0.511
## Q27R 0.795 0.794 0.507
## Q11R 0.798 0.797 0.486
## Q3R 0.790 0.789 0.545
## Q23 0.800 0.800 0.471
## Q19R 0.803 0.802 0.454
Cronbach’s alpha = 0.81
# Self-Enhancing scale
PC4 = c("Q18", "Q10", "Q26", "Q2", "Q14", "Q30", "Q6")
PC4_data <- re_data_rev[, PC4]
PC4_rel <- reliability(cov(PC4_data))
print(PC4_rel, digits = 3)
## Alpha reliability = 0.823
## Standardized alpha = 0.82
##
## Reliability deleting each item in turn:
## Alpha Std.Alpha r(item, total)
## Q18 0.780 0.779 0.673
## Q10 0.789 0.787 0.624
## Q26 0.790 0.788 0.616
## Q2 0.802 0.800 0.547
## Q14 0.789 0.786 0.623
## Q30 0.824 0.822 0.392
## Q6 0.812 0.809 0.481
Cronbach’s alpha = 0.82
Given Cronbach's alpha and the reliability obtained by deleting each item in turn, we can say that all four scales are reliable (once item Q24 is removed), since all the values are above 0.7.
If we want to analyse the correlation between our factors, we have a problem due to the keying of the items. As we have seen, the Affiliative and Aggressive items take high values (4-5) when the respondent states that they are not affiliative or aggressive, and low values (1-2) when they state that they are. The opposite holds for the Self-Defeating and Self-Enhancing factors, where high values indicate that the respondent considers themselves self-defeating or self-enhancing. To solve this problem, we invert all item values for the Affiliative and Aggressive factors.
# Reverse all items belonging to the Affiliative and Aggressive factors (6 - x)
flip_items <- c("Q17", "Q21R", "Q1", "Q13R", "Q25", "Q5R", "Q29", "Q9",
                "Q15", "Q31", "Q7", "Q11R", "Q27R", "Q3R", "Q23", "Q19R")
new2 <- re_data_rev %>%
  mutate(across(all_of(flip_items), ~ 6 - .x))
# Refit with the re-keyed items and inspect the correlations between factor scores
res_PA_obl2 = fa(new2, nfactors = 4, n.obs = nrow(new2), rotate = "promax", fm = "pa")
cor(res_PA_obl2$scores)
## PA1 PA2 PA4 PA3
## PA1 1.0000000 0.2565970 0.5564610 0.3854067
## PA2 0.2565970 1.0000000 0.3306995 0.2450561
## PA4 0.5564610 0.3306995 1.0000000 0.2882763
## PA3 0.3854067 0.2450561 0.2882763 1.0000000
Scores <- as.data.frame(res_PA_obl2$scores)
colnames(Scores)<- c("Affiliative","Self_Defeating", "Aggressive", "Self_Enhancing")
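Before looking at individual pairs, the full matrix of factor-score correlations can be visualized with corrplot, which is already loaded; a minimal sketch:
# Correlation matrix of the four factor scores, with the coefficients displayed
corrplot(cor(Scores), method = "number", type = "upper")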
As we have seen, there are correlations between our factors.
When measuring psychological traits, a correlation of about 0.30 is usually taken as the reference value for a moderate association.
That said, the only associations worth examining are Affiliative-Aggressive and Affiliative-Self-Enhancing.
The analysis reports a positive correlation of 0.38 between Affiliative and Aggressive Humor.
We can therefore expect that participants who reported a high score for Affiliative Humor also reported a higher score for Aggressive Humor, and vice versa.
This is confirmed by the scatterplot, where higher Affiliative scores go together with higher Aggressive scores.
Scores %>% ggplot(aes(x=Affiliative ,y= Aggressive ))+
geom_point()+
labs(title="Correlation between Affiliative and Aggressive")
The analysis also reports a positive correlation of 0.55 between Affiliative and Self-Enhancing Humor.
Therefore, we can expect that participants with a high score for Affiliative Humor also have a high score for Self-Enhancing Humor, and vice versa.
This is confirmed by the scatterplot, where higher Affiliative scores go together with higher Self-Enhancing scores.
Scores %>% ggplot(aes(x=Affiliative ,y= Self_Enhancing ))+
geom_point()+
labs(title="Correlation between Affiliative and Self-Enhancing")
The aim of this section is to find out whether the scores for Affiliative, Self-Defeating, Aggressive and Self-Enhancing Humour differ with respect to the gender of the respondents.
Specifically, the questionnaire allows us to obtain data for three levels of the Gender category:
1=male ; 2=female ; 3=other .
Since, after the initial filtering, only 2 data points appear for the class “other”, we will only analyse the genders “male” (160 individuals) and “female” (199 individuals). Then we will compare the scores to see if there is any difference between the genders.
Gender <- Data %>%
  select(gender)

# Min-max rescaling to [0, 1]
normalize <- function(x){
  return((x - min(x)) / (max(x) - min(x)))
}

# Rescale the factor scores (min and max taken over the whole score matrix)
Scores <- normalize(res_PA_obl2$scores)
colnames(Scores) <- c("Affiliative", "Self_Defeating", "Aggressive", "Self_Enhancing")

# Attach gender and keep only the "male" (1) and "female" (2) groups
GenderScores <- cbind(Scores, Gender)
GenderScores <- GenderScores %>%
  filter(gender == "1" | gender == "2")
GenderScores$gender[GenderScores$gender == 1] <- "Male"
GenderScores$gender[GenderScores$gender == 2] <- "Female"
head(GenderScores)

Females <- GenderScores %>% dplyr::filter(gender == "Female")
Males <- GenderScores %>% dplyr::filter(gender == "Male")
To do this, we first check graphically the distribution of each factor by gender and then use the Wilcoxon rank-sum (Mann-Whitney) test for statistical confirmation of any difference.
The Wilcoxon rank-sum test is a non-parametric test that assesses whether two samples of measures come from the same distribution.
It is an alternative to the unpaired two-sample t-test and focuses on the medians: the more symmetrically the data are distributed around the median (roughly the same number of values above and below it), the closer a comparison of medians comes to a comparison of group means.
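Before running the tests, the group medians for all four factors can be summarized in one step; a short sketch, assuming GenderScores as constructed above:
# Median factor score by gender for all four humour styles
GenderScores %>%
  group_by(gender) %>%
  summarize(across(c(Affiliative, Self_Defeating, Aggressive, Self_Enhancing), median))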
median <- GenderScores %>%
group_by(gender) %>%
summarize(median=median(Affiliative ))
ggplot(GenderScores, aes(x= Affiliative , fill=gender))+
geom_density(alpha=0.3,size=1)+
scale_fill_brewer(palette = "Set1")+
geom_vline(data = median, aes(xintercept = median,
color = gender), size=1.5)+
labs(x= "Affiliative",
subtitle="Affiliative Distribution respect to the Gender")+
theme_minimal()
wilcox.test(Females$Affiliative, Males$Affiliative, alternative = "two.sided")
##
## Wilcoxon rank sum test with continuity correction
##
## data: Females$Affiliative and Males$Affiliative
## W = 16850, p-value = 0.3416
## alternative hypothesis: true location shift is not equal to 0
At the 0.05 significance level we fail to reject the null hypothesis: the difference in medians is not statistically significant for Affiliative Humor with respect to gender, so we cannot conclude that the two groups differ.
median <- GenderScores %>%
group_by(gender) %>%
summarize(median=median(Self_Defeating ))
ggplot(GenderScores, aes(x= Self_Defeating , fill=gender))+
geom_density(alpha=0.3,size=1)+
scale_fill_brewer(palette = "Set1")+
geom_vline(data = median, aes(xintercept = median,
color = gender), size=1.5)+
labs(x= "Self_Defeating",
subtitle="Self_Defeating Distribution respect to the Gender")+
theme_minimal()
wilcox.test(Females$Self_Defeating, Males$Self_Defeating, alternative = "two.sided")
##
## Wilcoxon rank sum test with continuity correction
##
## data: Females$Self_Defeating and Males$Self_Defeating
## W = 18134, p-value = 0.02352
## alternative hypothesis: true location shift is not equal to 0
At the 0.05 significance level we reject the null hypothesis of the Wilcoxon rank-sum test: the difference in medians is statistically significant for Self-Defeating Humor with respect to gender, so the two groups differ.
median <- GenderScores %>%
group_by(gender) %>%
summarize(median=median(Aggressive ))
ggplot(GenderScores, aes(x= Aggressive , fill=gender))+
geom_density(alpha=0.3,size=1)+
scale_fill_brewer(palette = "Set1")+
geom_vline(data = median, aes(xintercept = median,
color = gender), size=1.5)+
labs(x= "Aggressive",
subtitle="Aggressive Distribution respect to the Gender")+
theme_minimal()
wilcox.test(Females$Aggressive, Males$Aggressive, alternative = "two.sided")
##
## Wilcoxon rank sum test with continuity correction
##
## data: Females$Aggressive and Males$Aggressive
## W = 14831, p-value = 0.2654
## alternative hypothesis: true location shift is not equal to 0
Again, we fail to reject the null hypothesis: the difference in medians is not statistically significant for Aggressive Humor with respect to gender, so we cannot conclude that the two groups differ.
median <- GenderScores %>%
group_by(gender) %>%
summarize(median=median(Self_Enhancing ))
ggplot(GenderScores, aes(x= Self_Enhancing , fill=gender))+
geom_density(alpha=0.3,size=1)+
scale_fill_brewer(palette = "Set1")+
geom_vline(data = median, aes(xintercept = median,
color = gender), size=1.5)+
labs(x= "Self_Enhancing",
subtitle="Self_Enhancing Distribution respect to the Gender")+
theme_minimal()
wilcox.test(Females$Self_Enhancing, Males$Self_Enhancing, alternative = "two.sided")
##
## Wilcoxon rank sum test with continuity correction
##
## data: Females$Self_Enhancing and Males$Self_Enhancing
## W = 20695, p-value = 1.033e-06
## alternative hypothesis: true location shift is not equal to 0
At the 0.05 significance level we reject the null hypothesis of the Wilcoxon rank-sum test: the difference in medians is statistically significant for Self-Enhancing Humor with respect to gender, so the two groups differ.
Referring to the research conducted by Martin, R. A., Puhlik-Doris, P., Larsen, G., Gray, J., & Weir, K. (2003), we recover the same four factors: Affiliative, Self-Enhancing, Aggressive and Self-Defeating. In their analysis, males scored higher than females on Aggressive and Self-Defeating humour; in our case males and females have fairly similar scores on Affiliative and Aggressive Humour, while on average there are differences in the scores for Self-Enhancing and Self-Defeating Humour.