library(psych)
library(tidyverse)
library(ggplot2)
library(GPArotation)
library(umx)
library(corrplot)
library(readr)
library(formattable)
library(MVN)

We will perform a factor analysis on a dataset of responses to the online interactive version of the Humor Styles Questionnaire (HSQ), developed by Martin, R. A., Puhlik-Doris, P., Larsen, G., Gray, J., & Weir, K. (2003). We will then examine the results and compare them across the three gender identities recorded in the questionnaire.

The HSQ was developed as a comprehensive self-report measure of the everyday functions of humor, particularly those relevant to psychosocial well-being. It postulates two goals of humor use, improving oneself versus improving one's relationships with others, and two ways of pursuing these goals, a positive one and a negative one.

The following items were rated on a 5 point scale where:

1=Strongly disagree,

2=Disagree,

3=Neither agree nor disagree,

4=Agree,

5=Strongly agree.

The last page of the test also collected information about the respondent: age, gender, and a self-rated accuracy of the answers.

Extract data

First we select the 32 items and keep only respondents with an accuracy greater than 90; this reduces the amount of data to analyze but improves its quality, so that the analysis is not biased by careless responses.

In addition, we remove respondents with implausible reported ages (151, 242, 2670 and 44849 years; we keep ages up to 70), as well as any rows containing missing or invalid answers (coded 0 or -1).

library(readr)
data <- read_csv("data.csv")
# View(data)  # optional: inspect the raw data interactively
Data <- data.frame(data)

# Keep only respondents with accuracy > 90 and a plausible age,
# and drop rows containing missing or invalid answers (0 or -1)
Data <- Data %>% dplyr::filter(accuracy > 90, age <= 70)
Data <- Data[apply(Data, 1, function(row) all(row != 0, row != -1)), ]

# Drop the precomputed scale scores, keeping the 32 items plus
# age, gender and accuracy
Data1 <- Data %>%
  dplyr::select(-affiliative, -selfenhancing, -agressive, -selfdefeating)
head(Data1)

After filtering, we obtain the responses of 361 participants to the 32 questions plus information on age, gender and accuracy.

Descriptive Statistics

brotools::describe(Data1)

# Keep only the 32 questionnaire items for the factor analysis
Data1 <- Data1 %>%
  dplyr::select(-age, -gender, -accuracy)

From the table we can see that there are no missing values and that the minimum and maximum cover the entire range of response options [1; 5].

Furthermore, accuracy is, as required, greater than 90, age shows no unrealistic values, and after filtering gender is almost entirely limited to two options, 1=male and 2=female (only two respondents selected 3=other).

Checks for the factor analysis

Nature of variables

All of the variables are Likert items (ordinal responses on a 1-5 scale), which we treat as numeric for the analysis.

Check Normality

Let's check the normality assumption for the variables.

Distribution

# Reshape to long format and plot the response distribution of each item
plot_data <- Data1 %>% pivot_longer(cols = everything(), names_to = "VAR", values_to = "vals")
plot1 <- ggplot(plot_data, aes(x = vals)) + geom_bar(fill = "blue") + facet_wrap(~VAR) + theme_void()
plot1

Looking at the plots, none of the variables appears to be normally distributed. To test this formally, we run a normality test on each item.
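As a complementary numeric check (a sketch, not part of the original analysis), the psych package loaded above can report skewness and kurtosis for each item; values clearly different from zero already hint at non-normality.

# Sketch: per-item skewness and kurtosis via psych::describe()
psych::describe(Data1)[, c("mean", "sd", "skew", "kurtosis")]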

Shapiro Test

df_shapiro=apply(Data1, 2, shapiro.test)
p.value=unlist(lapply(df_shapiro, function(x) x$p.value))

shap = as.data.frame(p.value)
formattable(shap)
p.value
Q1 5.544107e-22
Q2 4.815557e-15
Q3 9.912938e-14
Q4 3.969369e-14
Q5 3.259659e-17
Q6 1.047160e-23
Q7 2.445749e-14
Q8 7.849488e-17
Q9 9.304428e-17
Q10 5.435999e-14
Q11 2.543433e-15
Q12 3.130462e-14
Q13 8.211494e-28
Q14 1.285223e-15
Q15 1.059389e-17
Q16 8.271325e-15
Q17 1.484405e-23
Q18 1.257181e-13
Q19 3.069519e-15
Q20 9.642760e-21
Q21 1.935907e-26
Q22 1.541646e-13
Q23 6.628741e-14
Q24 3.495105e-15
Q25 4.956617e-28
Q26 1.272132e-16
Q27 5.683903e-19
Q28 3.764479e-15
Q29 2.318990e-19
Q30 2.587740e-21
Q31 2.225144e-15
Q32 5.819178e-15

The test confirms this: every item has a p-value close to zero, so for each item we reject the null hypothesis that the data are normally distributed and conclude that none of the 32 items is normally distributed.
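Since the MVN package is loaded at the top, multivariate normality could also be assessed; the sketch below uses Mardia's test and assumes a recent version of MVN. This is only an optional aside, as principal axis factoring does not strictly require multivariate normality.

# Sketch: Mardia's test of multivariate normality (MVN package)
mvn_res <- mvn(Data1, mvnTest = "mardia")
mvn_res$multivariateNormality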

KMO test

The KMO test checks whether the data have a factorial structure, i.e., whether groups of variables are correlated strongly enough to support a factor analysis.

We first inspect the correlation matrix. A visual analysis of the correlation matrix can reveal groups of questions that are correlated with each other.

# Correlation matrix of the 32 items, ordered by hierarchical clustering
R <- cor(Data1)
corrplot(R, type = "upper", order = "hclust")

In the plot, blue blocks correspond to groups of positively correlated items, while red blocks correspond to negatively correlated groups.

psych::KMO(R)
## Kaiser-Meyer-Olkin factor adequacy
## Call: psych::KMO(r = R)
## Overall MSA =  0.86
## MSA for each item = 
##   Q1   Q2   Q3   Q4   Q5   Q6   Q7   Q8   Q9  Q10  Q11  Q12  Q13  Q14  Q15  Q16 
## 0.91 0.90 0.89 0.87 0.87 0.79 0.79 0.82 0.91 0.85 0.86 0.87 0.89 0.90 0.82 0.86 
##  Q17  Q18  Q19  Q20  Q21  Q22  Q23  Q24  Q25  Q26  Q27  Q28  Q29  Q30  Q31  Q32 
## 0.88 0.83 0.91 0.82 0.89 0.84 0.81 0.83 0.83 0.88 0.83 0.87 0.87 0.74 0.81 0.89

The overall Kaiser-Meyer-Olkin (KMO) measure of sampling adequacy is 0.86, a good value (values above 0.8 are generally considered meritorious), and every per-item MSA is at least 0.74. This suggests that the correlation matrix is suitable for factor analysis and that a factor structure is present in the data.
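A complementary check, sketched below with the psych package already loaded, is Bartlett's test of sphericity, which tests whether the correlation matrix differs significantly from an identity matrix (it should, if a factor structure exists).

# Sketch: Bartlett's test of sphericity on the correlation matrix
psych::cortest.bartlett(R, n = nrow(Data1))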

FACTOR ANALYSIS

Selecting the number of factors

Now we use the scree plot together with parallel analysis to decide how many factors to retain.

fa.parallel(Data1,  fa="fa", 
            show.legend=FALSE, main="Scree plot with parallel analysis")

## Parallel analysis suggests that the number of factors =  4  and the number of components =  NA

Based on the parallel analysis and on the eigenvalue-greater-than-1 rule, we retain 4 factors. This choice is also consistent with the literature, which describes four humor styles.
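As a side note, the suggestion of the parallel analysis can also be retrieved programmatically, which is convenient for reproducibility; a minimal sketch:

# Sketch: run the parallel analysis without plotting and read off its suggestion
pa_res <- fa.parallel(Data1, fa = "fa", show.legend = FALSE, plot = FALSE)
pa_res$nfact  # number of factors suggested by parallel analysis (4 here)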

Principal Axis analysis - Without Rotation

# Principal axis factoring with four factors, no rotation
res_fa <- fa(Data1, nfactors = 4, fm = "pa", rotate = "none")

customGreen = "#71CA97"

customRed = "#ff7f7f"

# A single formatter is enough for all four columns: loadings with absolute
# value above 0.30 are highlighted in green, all others in red
loading_formatter <-
  formatter("span",
            style = x ~ style(
              font.weight = "bold",
              color = ifelse(abs(x) > 0.3, customGreen, customRed)))

load = as.data.frame(unclass(res_fa$loadings))
formattable(load,  list(
  PA1 = loading_formatter,
  PA2 = loading_formatter,
  PA3 = loading_formatter,
  PA4 = loading_formatter))
PA1 PA2 PA3 PA4
Q1 -0.5728442 0.248471782 -0.01859520 0.303835849
Q2 0.5012124 -0.158987035 0.16243860 0.298447914
Q3 0.4663939 0.042766516 -0.40683747 0.037597061
Q4 0.3509635 0.463059917 0.13790131 -0.090025248
Q5 0.5001507 -0.238101259 0.02850684 -0.222036010
Q6 0.4140986 -0.233697893 0.12083764 0.217624216
Q7 -0.2128610 -0.137876706 0.53175472 -0.209070725
Q8 0.4445715 0.623968871 0.17011407 -0.112722833
Q9 -0.4431618 0.096036448 -0.03836657 0.252597489
Q10 0.5154251 -0.037220634 0.25012711 0.437310646
Q11 0.2380876 0.050607412 -0.47689473 0.155080047
Q12 0.3617808 0.444380736 0.20923101 -0.166032419
Q13 0.5522058 -0.257164223 0.03879829 -0.231953739
Q14 0.5560420 -0.179641223 0.23783242 0.287456251
Q15 -0.3729314 -0.115922443 0.64575453 -0.101980043
Q16 -0.3547675 -0.523663024 0.01120780 0.106116902
Q17 -0.6392026 0.267697508 -0.04272231 0.374943159
Q18 0.5085105 -0.070623603 0.30829088 0.501564120
Q19 0.5511580 -0.032316041 -0.24870777 0.004631131
Q20 0.3913583 0.653464053 0.08457721 -0.036273552
Q21 0.5440290 -0.229455642 0.22514016 -0.330124246
Q22 -0.4064303 0.007282421 -0.02990634 -0.086212240
Q23 -0.3405882 -0.006351837 0.39234821 -0.046434149
Q24 0.1135508 0.407601128 0.13251194 0.055569254
Q25 -0.4570719 0.288461305 -0.10897220 0.248833709
Q26 0.5331993 -0.156688095 0.28897994 0.306818838
Q27 0.2842206 0.088276635 -0.46384765 0.130555298
Q28 0.4173245 0.102765828 0.01194820 0.107142841
Q29 -0.3829953 0.339726572 0.20667678 0.246241808
Q30 0.2421815 -0.201599723 0.13659153 0.293096457
Q31 -0.3290608 0.026582451 0.60666460 -0.023107605
Q32 0.4465693 0.524057149 0.17489787 -0.122374206
Var = as.data.frame(unclass(res_fa$Vaccounted))
formattable(Var)
PA1 PA2 PA3 PA4
SS loadings 6.0886697 2.70073670 2.50845952 1.67051485
Proportion Var 0.1902709 0.08439802 0.07838936 0.05220359
Cumulative Var 0.1902709 0.27466895 0.35305831 0.40526190
Proportion Explained 0.4695012 0.20825551 0.19342889 0.12881445
Cumulative Proportion 0.4695012 0.67775666 0.87118555 1.00000000
CUC<-data.frame(Communality=res_fa$communality,Uniqueness=res_fa$uniquenesses,Complexity=res_fa$complexity)


CUC

Pattern Matrix

We can see that:

  • Loadings: the coefficients relating each item to the factors (for an unrotated solution these are simply the item-factor correlations).

  • h2 = Communality: the proportion of an item's variance explained by the factors. It is the sum of the item's squared loadings (see the sketch after this list); values of about 0.2 and above can be considered acceptable.

  • u2 = Uniqueness: 1 - communality, i.e. the residual variance of the item.

  • Com = Complexity: the Hofmann complexity index, which indicates how many factors an item loads on; a value close to 1 means the item reflects a single construct. Here the mean item complexity is 2.1, which is too high: since this value is far from 1, the solution is not simple.

  • SS loadings: the sum of squared loadings for each factor (its eigenvalue-like contribution).

  • Proportion Var: the proportion of total variance explained by each factor.

  • Cumulative Var: the cumulative proportion of total variance explained by the factors considered so far.
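To make these definitions concrete, here is a small sketch (using only objects already created) checking that, for this solution, each item's communality equals the sum of its squared loadings, uniqueness is its complement, and SS loadings are the column sums of squared loadings:

L <- unclass(res_fa$loadings)                                        # 32 x 4 matrix of loadings
all.equal(unname(rowSums(L^2)), unname(res_fa$communality))          # should be TRUE
all.equal(unname(1 - rowSums(L^2)), unname(res_fa$uniquenesses))     # should be TRUE
colSums(L^2)                                                         # SS loadings per factor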

Path Diagram

fa.diagram(res_fa)

From this diagram we can see that most of the items are explained only by the first factor and that the fourth factor remains completely empty.

In factor analysis we aim for a simple structure, in which each variable is explained by a single factor; accordingly, we would like the complexity of each item to be as close to 1 as possible.

Here the mean item complexity is 2.1, which is particularly high, so we will try to reduce it by rotating the solution.

Plot Complexity & Communality

Cum = CUC %>% mutate(Vnames=rownames(.))
Cum$Vnames = as.factor(Cum$Vnames)

Cum = Cum  %>%
  ggplot( aes(x=reorder(Vnames,Communality), y=Communality)) +
    geom_bar(stat="identity", fill="darkblue", alpha=1) +
    xlab("Item")+
  ggtitle("Communality")+
    theme_minimal()


Com = CUC %>% mutate(Vnames=rownames(.))
Com$Vnames = as.factor(Com$Vnames)

Com = Com  %>%
  ggplot( aes(x=reorder(Vnames,Complexity), y=Complexity)) +
    geom_bar(stat="identity", fill="darkblue", alpha=1) +
    xlab("Item")+
  ggtitle("Complexity")+
    theme_minimal()

# multiplot() from older releases of the scater package arranges the two ggplots;
# gridExtra::grid.arrange(Cum, Com) would be an alternative
library(scater)

multiplot(Cum, Com)

From the plots we can compare the communality and complexity of the individual items.

Findings:

  • Most of the items load only on the first factor, and the fourth factor remains empty.

  • The mean item complexity (2.1) is far from 1, so the unrotated solution is not simple; we address this with a rotation in the next section.

Principal Axis analysis - Oblique Rotation

To obtain a simple structure, we rotate the solution: rotation increases each item's loading on one factor and decreases its loadings on the others.
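For comparison, an orthogonal rotation such as varimax could be tried as well; the sketch below is not part of the original analysis. We proceed with an oblique (promax) rotation because the humor-style factors are expected to correlate, as the factor correlations later in the report confirm.

# Sketch: the same model with an orthogonal (varimax) rotation, for comparison
res_PA_varimax <- fa(Data1, nfactors = 4, fm = "pa", rotate = "varimax")
res_PA_varimax$loadings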

A preliminary analysis shows that two items, Q22 and Q28, are not clearly associated with any factor, so we remove them from the item pool and repeat the analysis to obtain a simple solution.

Final model (removing Q22 and Q28)

# Principal axis factoring with an oblique (promax) rotation, dropping columns Q22 and Q28
res_PA_obl <- fa(Data1[-c(22, 28)], nfactors = 4, rotate = "promax", fm = "pa")



load = as.data.frame(unclass(res_PA_obl$loadings))
formattable(load,  list(
  PA1 = loading_formatter,
  PA2 = loading_formatter,
  PA3 = loading_formatter,
  PA4 = loading_formatter))
PA1 PA2 PA3 PA4
Q1 0.69284016 -0.0175328641 0.0086904381 0.013122254
Q2 -0.07079602 -0.0005494547 -0.0464550253 0.568952668
Q3 -0.13884025 0.0742259664 -0.5341641094 0.032905302
Q4 -0.05318185 0.5904482221 0.0093347477 -0.005930384
Q5 -0.57479925 -0.0081647905 -0.0075785903 0.043381792
Q6 -0.12744433 -0.0906494086 -0.0351729444 0.485516548
Q7 -0.20550700 -0.0139001024 0.6502435629 -0.024258771
Q8 -0.04607450 0.7838344287 0.0003143848 -0.011951218
Q9 0.50770822 -0.1130313798 -0.0081745117 0.034310067
Q10 0.09986589 0.1123599780 -0.0251257275 0.723847810
Q11 0.09874898 -0.0304099140 -0.5803802748 0.030546441
Q12 -0.14476080 0.6130981385 0.0990125063 -0.032733127
Q13 -0.62173774 -0.0048013608 -0.0065325099 0.066426336
Q14 -0.12069747 0.0244723551 0.0133711801 0.627754138
Q15 -0.03364667 -0.0418442018 0.7756850408 0.050989954
Q16 0.02085328 -0.6193754805 0.1423988102 0.091287900
Q17 0.80496569 -0.0403963572 -0.0216787781 0.041532351
Q18 0.14430083 0.0838554373 0.0160305307 0.819075574
Q19 -0.26221647 0.0830144175 -0.3881664132 0.124169049
Q20 0.07809282 0.7627917419 -0.0978449308 0.002223969
Q21 -0.71462582 0.0864619797 0.2029396751 0.037928261
Q23 0.08263528 -0.0058479938 0.4878321063 -0.011736176
Q24 0.17436808 0.4342877870 0.0294745708 0.056974769
Q25 0.61047628 0.0412556603 -0.0952824094 -0.051265668
Q26 -0.09415520 0.0403948402 0.0652441016 0.641335774
Q27 0.06494080 0.0232072832 -0.5740987299 0.019163169
Q29 0.56402640 0.1971541303 0.1809038570 0.106774710
Q30 0.04027884 -0.1293272551 0.0023741911 0.478908713
Q31 0.09314381 0.0805246692 0.6778466714 0.103987773
Q32 -0.10381251 0.6987012872 0.0188514980 0.007789612
Var = as.data.frame(unclass(res_PA_obl$Vaccounted))
formattable(Var)
PA1 PA2 PA3 PA4
SS loadings 3.6233560 3.1584154 2.93083251 2.88137893
Proportion Var 0.1207785 0.1052805 0.09769442 0.09604596
Cumulative Var 0.1207785 0.2260590 0.32375346 0.41979943
Proportion Explained 0.2877053 0.2507877 0.23271689 0.22879013
Cumulative Proportion 0.2877053 0.5384930 0.77120987 1.00000000
CUC<-data.frame(Communality=res_PA_obl$communality,Uniqueness=res_PA_obl$uniquenesses,Complexity=res_PA_obl$complexity)


CUC
fa.diagram(res_PA_obl)

Findings:

  • Considering the complexity of the items, each item is now represented by a single factor; this can also be seen in the loadings table, where each item has exactly one loading beyond ±0.30 (highlighted in green).

  • The communality of the items is acceptable: most are above 0.3, with only Q24 falling to about 0.2 (we return to Q24 in the reliability analysis below).

  • The mean item complexity is 1.1.

We can therefore conclude that we have reached a simple solution.
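A quick sketch to check these figures against the fitted object (the exact values may differ slightly from the rounded ones quoted above):

mean(res_PA_obl$complexity)        # mean item complexity (reported above as about 1.1)
sort(res_PA_obl$communality)[1:3]  # the lowest communalities (Q24 should appear here)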

Interpretation

This classification leads to four different styles of humour: affiliative (humour used to enhance one's relationships with others, in a benign way), self-enhancing (humour used to enhance oneself, in a benign way), aggressive (humour used to enhance oneself at the expense of others), and self-defeating (humour used to enhance relationships at one's own expense). The affiliative and self-enhancing styles are similar, and both imply a propensity to laugh and to be amused; however, affiliative humour is shown mainly when with others, while self-enhancing humour is shown mainly when alone. The aggressive and self-defeating styles occur in social situations and share an impulsive element. Aggressive humour is mocking, critical and offensive and involves a lack of respect for the feelings of others, while self-defeating humour refers mainly to making fun of oneself in order to make others laugh.

First factor (PA1): Affiliative Humor

Q17. I don't usually enjoy telling jokes or entertaining people.
Q21. I like to make people laugh. (REV)
Q1. I don’t usually laugh or joke much with other people.
Q13. I laugh and joke a lot with my closest friends. (REV)
Q25. I don’t often joke with my friends.
Q5. I don’t have to work very hard to make others laugh - I seem to be a humorous person by nature. (REV)
Q29. I can’t usually think of witty things to say when I’m with other people.
Q9. I rarely make others laugh by telling funny stories about myself.

Second factor (PA2): Self-Defeating Humor

Q8. I often get carried away in putting myself down if it makes my family or friends laugh.
Q20. I often go overboard in putting myself down when I am making jokes or trying to be funny.
Q32. Letting others laugh at me is my way of keeping my friends and family in a good mood.
Q16. I don't often say funny things to put myself down. (REV)
Q12. I often try to get people to like me or accept me more by saying something funny about my weaknesses, mistakes or flaws.
Q4. I let people laugh at me or make fun of me more than I should.
Q24. When I’m with friends or family, I often seem to be the one others make fun of or joke about.

Third factor (PA3): Aggressive Humor

Q15. I don't like it when people use humor as a way to criticize or put someone down.
Q31. Even if something is really funny to me, I won’t laugh or joke about it if someone is offended by it.
Q7. People are never offended or hurt by my sense of humor.
Q11. When I tell jokes or say funny things, I’m usually not very concerned about how other people take it. (REV)
Q27. If I don’t like someone, I often use humor or teasing to put them down.(REV)
Q3. If someone makes a mistake, I often make fun of them. (REV)
Q23. I never participate in laughing at others even though all my friends do.
Q19. Sometimes I think of something that is so funny that I can’t stop myself from saying it, even if it’s not appropriate for the situation. (REV)

Fourth factor (PA4): Self-Enhancing Humor

Q18. If I'm alone and feeling miserable, I make an effort to think of something funny to cheer myself up.
Q10. If I am feeling upset or unhappy I usually try to think of something funny about the situation to make myself feel better.
Q26. In my experience, thinking about some funny aspect of a situation is often a very effective way to deal with problems.
Q14. My humorous outlook on life keeps me from getting overly angry or depressed about things.
Q2. If I’m feeling depressed, I can usually cheer myself up with humor.
Q6. Even when I’m alone, I’m often amused by the absurdities of life.
Q30. I don’t need to be with other people to feel amused - I can usually find things to laugh about even when I’m alone.

In the path diagram above, red lines indicate items that load negatively on their factor. We need to reverse-score these items before checking internal consistency: Q21, Q13, Q5, Q16, Q27, Q11, Q3 and Q19.

Reverse scale

# Reverse-score the items that load negatively on their factor (1 <-> 5, 2 <-> 4)
rev_items <- c("Q21", "Q13", "Q5", "Q16", "Q11", "Q27", "Q3", "Q19")
new <- Data1 %>%
  mutate(across(all_of(rev_items), ~ 6 - .x, .names = "{.col}R"))
# Keep the reversed versions and drop the original columns
re_data_rev <- new %>% dplyr::select(-all_of(rev_items))
head(re_data_rev)

Reliability

Reliability is the degree of agreement/coherence between independent measures of the same construct.

To check reliability we compute Cronbach's alpha, which measures internal consistency; the closer it is to 1, the more internally consistent the scale.

Reliability for the first factor

PC1=c("Q17", "Q21R", "Q1", "Q13R", "Q25", "Q5R","Q29", "Q9")
PC1_data<-(re_data_rev[,PC1]) 
PC1_rel<-reliability(cov(PC1_data))
print(PC1_rel,digit=3)
## Alpha reliability =  0.832 
## Standardized alpha =  0.84 
## 
## Reliability deleting each item in turn:
##      Alpha Std.Alpha r(item, total)
## Q17  0.791     0.802          0.714
## Q21R 0.808     0.816          0.604
## Q1   0.804     0.814          0.626
## Q13R 0.813     0.819          0.577
## Q25  0.818     0.826          0.522
## Q5R  0.814     0.825          0.553
## Q29  0.825     0.832          0.486
## Q9   0.829     0.834          0.469

Cronbach’s alpha = 0.83
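The reliability() function used here presumably comes from the umx package loaded at the top and works on a covariance matrix; as a cross-check, Cronbach's alpha can also be computed from the raw responses with psych::alpha(), sketched below (the raw alpha should agree with the value above).

# Sketch: alpha for the first factor computed directly from the item responses
psych::alpha(PC1_data)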

Reliability for the second factor

Checking reliability

PC2=c("Q8", "Q20", "Q32", "Q16R","Q4","Q12","Q24")
PC2_data<-(re_data_rev[,PC2])
PC2_rel<-reliability(cov(PC2_data))
print(PC2_rel,digit=3)
## Alpha reliability =  0.832 
## Standardized alpha =  0.832 
## 
## Reliability deleting each item in turn:
##      Alpha Std.Alpha r(item, total)
## Q8   0.788     0.788          0.704
## Q20  0.796     0.794          0.673
## Q32  0.797     0.797          0.652
## Q16R 0.811     0.811          0.567
## Q4   0.812     0.812          0.561
## Q12  0.815     0.815          0.542
## Q24  0.840     0.841          0.369

Cronbach’s alpha = 0.83

As we can see from the table, removing item Q24 increases Cronbach's alpha from 0.83 to 0.84, so we remove it and repeat the analysis in order to improve reliability.

  • Q24. When I am with friends or family, I often seem to be the one that other people make fun of or joke about.

Reliability without Q24

# Refit the factor model without items Q22, Q28 and Q24
res_PC_ort <- fa(Data1[-c(22, 28, 24)], nfactors = 4, n.obs = nrow(Data1), rotate = "promax", fm = "pa")

PC2=c("Q8", "Q20", "Q32", "Q16R","Q4","Q12")
PC2_data<-(re_data_rev[,PC2])
PC2_rel<-reliability(cov(PC2_data))
print(PC2_rel,digit=3)
## Alpha reliability =  0.84 
## Standardized alpha =  0.841 
## 
## Reliability deleting each item in turn:
##      Alpha Std.Alpha r(item, total)
## Q8   0.793     0.795          0.715
## Q20  0.802     0.801          0.690
## Q32  0.810     0.812          0.635
## Q16R 0.820     0.821          0.588
## Q4   0.829     0.831          0.538
## Q12  0.827     0.829          0.548

Cronbach’s alpha = 0.84

After removing question Q24 we obtain a Cronbach's alpha of 0.84.

Indeed, looking at Q24 in the factor solution, it has a very low communality of about 0.20 and its largest loading is only 0.43 (on PA2), which supports dropping it.
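These figures can be read directly off the fitted oblique solution; a small sketch:

res_PA_obl$communality["Q24"]  # communality of Q24 (about 0.20)
res_PA_obl$loadings["Q24", ]   # its loadings on PA1-PA4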

Reliability for the third factor

PC3=c("Q15", "Q31", "Q7", "Q27R", "Q11R", "Q3R", "Q23","Q19R")
PC3_data<-(re_data_rev[,PC3])
PC3_rel<-reliability(cov(PC3_data))
print(PC3_rel,digit=3)
## Alpha reliability =  0.813 
## Standardized alpha =  0.812 
## 
## Reliability deleting each item in turn:
##      Alpha Std.Alpha r(item, total)
## Q15  0.771     0.771          0.662
## Q31  0.782     0.781          0.596
## Q7   0.795     0.794          0.511
## Q27R 0.795     0.794          0.507
## Q11R 0.798     0.797          0.486
## Q3R  0.790     0.789          0.545
## Q23  0.800     0.800          0.471
## Q19R 0.803     0.802          0.454

Cronbach’s alpha = 0.81

Reliability for the fourth factor

PC4=c("Q18", "Q10", "Q26", "Q2", "Q14", "Q30", "Q6")
PC4_data<-(re_data_rev[,PC4])
PC4_rel<-reliability(cov(PC4_data))
print(PC4_rel,digit=3)
## Alpha reliability =  0.823 
## Standardized alpha =  0.82 
## 
## Reliability deleting each item in turn:
##     Alpha Std.Alpha r(item, total)
## Q18 0.780     0.779          0.673
## Q10 0.789     0.787          0.624
## Q26 0.790     0.788          0.616
## Q2  0.802     0.800          0.547
## Q14 0.789     0.786          0.623
## Q30 0.824     0.822          0.392
## Q6  0.812     0.809          0.481

Cronbach’s alpha = 0.82

Given Cronbach's alpha and the item-deleted reliabilities, we can say that all four factors are reliable, with alpha above 0.7 in every case, once item Q24 is removed from the second factor.

Correlations

If we want to analyse the correlations between our factors, we have a problem due to the way the items are keyed. As we have seen, the Affiliative and Aggressive items take high values (4-5) when the respondent states that they are not affiliative or aggressive, and low values (1-2) when they state that they are. The opposite holds for the other two factors, Self-Defeating and Self-Enhancing, where high values indicate being self-defeating or self-enhancing. To solve this problem, we reverse all item values for the Affiliative and Aggressive factors.

# Reverse all items belonging to the Affiliative and Aggressive factors
aff_agg_items <- c("Q17", "Q21R", "Q1", "Q13R", "Q25", "Q5R", "Q29", "Q9",
                   "Q15", "Q31", "Q7", "Q11R", "Q27R", "Q3R", "Q23", "Q19R")
new2 <- re_data_rev %>%
  mutate(across(all_of(aff_agg_items), ~ 6 - .x))


res_PA_obl2 = fa(new2, nfactors = 4, n.obs=nrow(new2), rotate="promax",fm="pa")
cor(res_PA_obl2$scores)
##           PA1       PA2       PA4       PA3
## PA1 1.0000000 0.2565970 0.5564610 0.3854067
## PA2 0.2565970 1.0000000 0.3306995 0.2450561
## PA4 0.5564610 0.3306995 1.0000000 0.2882763
## PA3 0.3854067 0.2450561 0.2882763 1.0000000
Scores <- as.data.frame(res_PA_obl2$scores)
colnames(Scores)<- c("Affiliative","Self_Defeating", "Aggressive", "Self_Enhancing")

As we have seen, the factors are correlated with one another.
When measuring psychological traits, a correlation of about 0.30 is conventionally taken to indicate a moderate association.
With that in mind, the two strongest associations, Affiliative-Aggressive and Affiliative-Self-Enhancing, are worth a closer look.
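As an aside, because promax is an oblique rotation, the model's own estimate of the factor intercorrelations is available directly as the Phi matrix; it may differ somewhat from the correlations of the estimated factor scores shown above. A sketch:

round(res_PA_obl2$Phi, 2)  # factor correlation matrix from the oblique solution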

Correlation between Affiliative and Aggressive

The analysis reports a positive correlation of 0.38 between Affiliative and Aggressive Humor.
We can therefore expect participants who report a high score on Affiliative Humor to also report a somewhat higher score on Aggressive Humor, and vice versa.

This is consistent with the scatterplot, where higher Affiliative values tend to go together with higher Aggressive values.

Scores %>% ggplot(aes(x=Affiliative ,y= Aggressive ))+
  geom_point()+
  labs(title="Correlation between Affiliative and Aggressive")

Correlation between Affiliative and Self-Enhancing

The analysis also reports a positive correlation of 0.55 between Affiliative and Self-Enhancing Humor.
We can therefore expect participants who report a high score on Affiliative Humor to also report a high score on Self-Enhancing Humor, and vice versa.

This is consistent with the scatterplot, where higher Affiliative values tend to go together with higher Self-Enhancing values.

Scores %>% ggplot(aes(x=Affiliative ,y= Self_Enhancing ))+
  geom_point()+
  labs(title="Correlation between Affiliative and Self-Enhancing")

Differences between Genders

The aim of this section is to find out whether the scores for Affiliative, Self-Defeating, Aggressive and Self-Enhancing Humour differ between the genders of the respondents.

Specifically, the questionnaire allows us to obtain data for three levels of the Gender category:

1=male ; 2=female ; 3=other .

Since only 2 observations remain for the class “other” after the initial filtering, we will only analyse the genders “male” (160 individuals) and “female” (199 individuals), and compare their scores to see whether there is any difference between the genders.
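A quick sketch to verify these group sizes from the filtered data (1 = male, 2 = female, 3 = other):

table(Data$gender)  # counts per gender code after filtering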

Gender <- Data%>%
  select(gender)

# Min-max rescaling; note that min() and max() are taken over the whole score
# matrix here, so all four factor-score columns are rescaled to a common range
normalize <- function(x){
  return((x - min(x)) / (max(x) - min(x)))
}

Scores <- normalize(res_PA_obl2$scores)

colnames(Scores)<- c("Affiliative","Self_Defeating", "Aggressive", "Self_Enhancing")

GenderScores <- cbind(Scores, Gender)

GenderScores <- GenderScores %>%
  filter(gender=="1"|gender=="2")

GenderScores$gender[GenderScores$gender==1] <- "Male"
GenderScores$gender[GenderScores$gender==2] <- "Female"

head(GenderScores)
Females <- GenderScores %>% dplyr::filter(gender == "Female")

Males <- GenderScores %>% dplyr::filter(gender == "Male")

To do this, we first check the distribution of each factor score by gender graphically, and then use the Wilcoxon (Mann-Whitney) rank-sum test for statistical confirmation of whether the groups differ.

The Wilcoxon rank-sum test is a non-parametric test of whether two samples of measures come from the same distribution:

It is an alternative to the unpaired two-sample t-test; when the two groups have distributions of similar shape, it can be interpreted as a test of whether the medians differ.

Affiliative with respect to gender

median <- GenderScores %>%
  group_by(gender) %>%
  summarize(median=median(Affiliative ))

ggplot(GenderScores, aes(x= Affiliative , fill=gender))+
  geom_density(alpha=0.3,size=1)+ 
  scale_fill_brewer(palette  = "Set1")+
  geom_vline(data = median, aes(xintercept = median, 
                                       color = gender), size=1.5)+ 
  labs(x= "Affiliative",
       subtitle="Affiliative Distribution respect to the Gender")+
  theme_minimal()

wilcox.test(Females$Affiliative,Males$Affiliative,alternative = "two.sided", var.equal = FALSE)
## 
##  Wilcoxon rank sum test with continuity correction
## 
## data:  Females$Affiliative and Males$Affiliative
## W = 16850, p-value = 0.3416
## alternative hypothesis: true location shift is not equal to 0

Given a significance level of 0.05, we fail to reject the null hypothesis: the difference in medians is not statistically significant for Affiliative Humor with respect to gender, so the two groups can be considered equivalent.

Self-Defeating with respect to gender

median <- GenderScores %>%
  group_by(gender) %>%
  summarize(median=median(Self_Defeating ))

ggplot(GenderScores, aes(x= Self_Defeating , fill=gender))+
  geom_density(alpha=0.3,size=1)+ 
  scale_fill_brewer(palette  = "Set1")+
  geom_vline(data = median, aes(xintercept = median, 
                                       color = gender), size=1.5)+ 
  labs(x= "Self_Defeating",
       subtitle="Self_Defeating Distribution respect to the Gender")+
  theme_minimal()

wilcox.test(Females$Self_Defeating,Males$Self_Defeating,alternative = "two.sided", var.equal = FALSE)
## 
##  Wilcoxon rank sum test with continuity correction
## 
## data:  Females$Self_Defeating and Males$Self_Defeating
## W = 18134, p-value = 0.02352
## alternative hypothesis: true location shift is not equal to 0

Given a significance level of 0.05, we reject the null hypothesis of the Wilcoxon rank-sum test: the difference in medians is statistically significant for Self-Defeating Humor with respect to gender, so the two groups differ.

Aggressive with respect to gender

median <- GenderScores %>%
  group_by(gender) %>%
  summarize(median=median(Aggressive ))

ggplot(GenderScores, aes(x= Aggressive , fill=gender))+
  geom_density(alpha=0.3,size=1)+ 
  scale_fill_brewer(palette  = "Set1")+
  geom_vline(data = median, aes(xintercept = median, 
                                       color = gender), size=1.5)+ 
  labs(x= "Aggressive",
       subtitle="Aggressive Distribution respect to the Gender")+
  theme_minimal()

wilcox.test(Females$Aggressive,Males$Aggressive,alternative = "two.sided", var.equal = FALSE)
## 
##  Wilcoxon rank sum test with continuity correction
## 
## data:  Females$Aggressive and Males$Aggressive
## W = 14831, p-value = 0.2654
## alternative hypothesis: true location shift is not equal to 0

Again, we fail to reject the null hypothesis: the difference in medians is not statistically significant for Aggressive Humor with respect to gender, so the two groups can be considered equivalent.

Self-Enhancing with respect to gender

median <- GenderScores %>%
  group_by(gender) %>%
  summarize(median=median(Self_Enhancing ))

ggplot(GenderScores, aes(x= Self_Enhancing , fill=gender))+
  geom_density(alpha=0.3,size=1)+ 
  scale_fill_brewer(palette  = "Set1")+
  geom_vline(data = median, aes(xintercept = median, 
                                       color = gender), size=1.5)+ 
  labs(x= "Self_Enhancing",
       subtitle="Self_Enhancing Distribution respect to the Gender")+
  theme_minimal()

wilcox.test(Females$Self_Enhancing,Males$Self_Enhancing,alternative = "two.sided", var.equal = FALSE)
## 
##  Wilcoxon rank sum test with continuity correction
## 
## data:  Females$Self_Enhancing and Males$Self_Enhancing
## W = 20695, p-value = 1.033e-06
## alternative hypothesis: true location shift is not equal to 0

Given a significance level of 0.05, we reject the null hypothesis of the Wilcoxon rank-sum test: the difference in medians is statistically significant for Self-Enhancing Humor with respect to gender, so the two groups differ.

Conclusion

Referring to the research conducted by Martin, R. A., Puhlik-Doris, P., Larsen, G., Gray, J., & Weir, K. (2003), we recovered the same four factors: Affiliative, Self-Enhancing, Aggressive and Self-Defeating. In their analysis, males scored higher than females on Aggressive and Self-Defeating humour; in our data, males and females have fairly similar scores on Affiliative and Aggressive Humour, while there are differences between the genders for Self-Enhancing and Self-Defeating Humour.