Multiple Correspondence Analysis (MCA) is a multivariate statistical technique that is used to analyze the relationships between categorical variables. It is a generalization of correspondence analysis (CA), which is used to analyze the relationships between two categorical variables. MCA can be used to explore the associations between multiple categorical variables simultaneously.
MCA works by creating a map of the categorical variables. The map is created by calculating the distances between the different categories of the variables. The closer two categories are on the map, the more similar they are. The further apart two categories are on the map, the less similar they are.
MCA can be used to explore a variety of research questions. For example, MCA can be used to:
MCA is a powerful tool that can be used to gain insights into the relationships between categorical variables. It is a versatile technique that can be used to explore a variety of research questions.
MCA is particularly well-suited for analyzing educational data, as it can be used to explore a wide range of topics, such as student achievement, teacher effectiveness, and school climate.
One of the key advantages of MCA is its capacity for making comparisons between groups. This feature is beneficial in examining differences in student achievement, teacher effectiveness, or school climate across various groups, such as different genders, races, or socioeconomic backgrounds. By utilizing MCA, educational researchers and policymakers can gain valuable insights into these variations and use them to inform decision-making.
MCA also aids in the identification of underlying dimensions or constructs that contribute to educational outcomes. It uncovers latent variables that may not be directly measured but are critical in understanding educational success. For instance, MCA can reveal associations between variables such as student motivation, parental involvement, and academic achievement. By recognizing these underlying dimensions, educators can design targeted interventions to enhance student engagement and academic performance.
Here are some specific examples of how MCA can be used to analyze educational data:
MCA is a valuable tool for educational researchers and policymakers who are interested in understanding how to improve student achievement. By using MCA, they can identify patterns and relationships in data, and make comparisons between groups. This information can be used to develop and implement effective educational policies and interventions.
# Question 6
read_data$q6 <- factor(
ifelse(read_data$q6 %in% c("0 teaching experience", "1-5 years"), 0,
ifelse(read_data$q6 %in% c("11-15 years", "16 - 20", "20+"), 2,
ifelse(read_data$q6 == "6-10 years", 1, read_data$q6)
)
),
levels = c(0, 1, 2),
labels = c("0-5 years", "6-10 years", "11-years and more")
)
# summary(read_data$q6)
# Question 7
read_data$q7 <- factor(
ifelse(read_data$q7 == "Click to write Choice 4", NA,
ifelse(read_data$q7 %in% c("Urban", "Suburban"), 1,
ifelse(read_data$q7 == "Rural", 0, read_data$q7)
)
),
levels = c(0, 1),
labels = c("Rural", "Urban/Suburban")
)
# summary(read_data$q7)
# Question 8
read_data$q8 <- factor(ifelse(
read_data$q8 == "I am a pre-service teacher (i.e., I am currently in training at Centre régional des métiers de l’éducation et de la formation (CRMEF) or Ecole Normale Supérieure (ENS)",
0,
ifelse(
read_data$q8 == "I am an-service teacher (i.e., I am currently working full-time as a teacher)",
1,
NA
)
), levels = c(0, 1), labels = c("pre-service", "inservice"))
# summary(read_data$q8)
# Question 14
# Create a function to extract the first selection
read_data$q14 <- as.character(read_data$q14)
extract_first_selection <- function(text) {
selections <- strsplit(text, ",")[[1]]
first_selection <- trimws(selections[1])
return(first_selection)
}
# Apply the function to create a new variable with the first selection
read_data$q14_first_selection <- sapply(read_data$q14, extract_first_selection)
# Recode the first selection into the desired categories
read_data$q14_recode <- factor(
read_data$q14_first_selection,
levels = c("News stories online", "Social media threads (e.g., Face book, Instagram, Twitter, etc.)", "Books online", "Magazines online", "Newspapers-In print", "Magazines-In print", "Other Materials in print (please specify)"),
labels = c("News stories online", "Social media threads", "Books online", "Magazines online", "Newspapers-In print", "Magazines-In print", "Other Materials in print (please specify)")
)
# Remove the intermediate variable
read_data$q14_first_selection <- NULL
# Change the class of the Variable
read_data$q14_recode <- as.factor(read_data$q14_recode)
# summary(read_data$q14_recode)
# Question 16
read_data$q16 <- factor(recode(read_data$q16,
"0 minutes" = 0,
"15 minutes" = 1,
"30 minutes" = 2,
"45 minutes" = 3,
"1 hour" = 4,
"1.5 hours" = 5,
"2 hours" = 6,
"3 hours or more" = 7
))
# Convert to factor with the specified levels
levels(read_data$q16) <- c("0 minutes", "15 minutes", "30 minutes", "45 minutes", "60 minutes", "90 minutes", "120 minutes", "180 minutes or more")
# summary(read_data$q16)
# summary(as.numeric(read_data$q16))
# Re-coding Responses in Question 17(q17_2, q17_3, q17_4, q17_5, q17_6, q17_7)
# Function
recodelikert <- function(data, variables, categories) {
for (variable in variables) {
data[[variable]] <- recode(data[[variable]], !!!categories)
data[[variable]] <- as.factor(data[[variable]])
}
return(data)
}
# Defining Categories
categories <- list(
"Never" = 0,
"Alittle of the Time" = 1,
"Some of the Time" = 2,
"Most of the Time" = 3
)
read_data <- recodelikert(read_data, c("q17_1", "q17_2", "q17_3", "q17_4", "q17_5", "q17_6", "q17_7"), categories)
# Questions 23 (q23_1, q23_2, q23_3, q23_4, q23_5, q23_6, q23_7)
# Defining Categories
categories_1 <- list(
"Never" = 0,
"A little of the time" = 1,
"Some of the time" = 2,
"Most of the time" = 3
)
read_data <- recodelikert(read_data, c("q23_1", "q23_2", "q23_3", "q23_4", "q23_5", "q23_6", "q23_7"), categories_1)
# Question 19, and saving as q19_recode
read_data$q19_recode <- factor(
ifelse(read_data$q19 %in% c("Did not multi-task", "Not sure"), NA,
ifelse(read_data$q19 == "No, not at all", 0,
ifelse(read_data$q19 == "Yes, some", 1,
ifelse(read_data$q19 == "Yes, a lot", 2, read_data$q19)
)
)
),
levels = c(0, 1, 2),
labels = c("No, not at all", "Yes, some", "Yes, a lot"),
exclude = NULL
)
# Question 20
read_data$q20 <- as.character(read_data$q20)
# Apply the function to create a new variable with the first selection
read_data$q20_first_selection <- sapply(read_data$q20, extract_first_selection)
# Recode the first selection into the desired categories
read_data$q20_recode <- factor(
read_data$q20_first_selection,
levels = c("Textbook Chapters-Online", "Journal articles-Online", "Reports-Online", "Novels-Online
", "Textbook Chapters-In print", "Reports-In print", "Novels-In print", "Other materials-Please specify"),
labels = c("Textbook Chapters-Online", "Journal articles-Online", "Reports-Online", "Novels-Online", "Textbook Chapters-In print", "Reports-In print", "Novels-In print", "Other materials-Please specify")
)
# Remove the intermediate variable
read_data$q20_first_selection <- NULL
# Question 21
read_data$q21 <- as.character(read_data$q21)
# Apply the function to create a new variable with the first selection
read_data$q21_first_selection <- sapply(read_data$q21, extract_first_selection)
# Recode the first selection into the desired categories
read_data$q21_recode <- factor(
read_data$q21_first_selection,
levels = c("6:00 a.m.-11:59 a.m.", "Noon-6:00 p.m.", "6:00 p.m.-11:59 p.m.", "Midnight-5:59 a.m."),
labels = c("6:00 a.m.-11:59 a.m.", "Noon-6:00 p.m.", "6:00 p.m.-11:59 p.m.", "Midnight-5:59 a.m.")
)
# Remove the intermediate variable
read_data$q21_first_selection <- NULL
# Question 22
read_data$q22 <- factor(recode(read_data$q22,
"0 minutes" = 0,
"15 minutes" = 1,
"30 minutes" = 2,
"45 minutes" = 3,
"1 hour" = 4,
"1.5 hours" = 5,
"2 hours" = 6,
"3 hours or more" = 7
))
# Convert to factor with the specified levels
levels(read_data$q22) <- c("0 minutes", "15 minutes", "30 minutes", "45 minutes", "60 minutes", "90 minutes", "120 minutes", "180 minutes or more")
# Question 25 and Saving as q25_recode
read_data$q25_recode <- factor(
ifelse(read_data$q25 %in% c("Did not multi-task", "Not sure"), NA,
ifelse(read_data$q25 == "No, not at all", 0,
ifelse(read_data$q25 == "Yes, some", 1,
ifelse(read_data$q25 == "Yes, a lot", 2, read_data$q25)
)
)
),
levels = c(0, 1, 2),
labels = c("No, not at all", "Yes, some", "Yes, a lot"),
exclude = NULL
)
# summary(read_data[,-2])
# str(read_data)
read_data <- read_data %>%
rename(
gender = q4,
experience = q6,
sch_type = q7,
tchr_type = q8,
rf_text = q14_recode,
rf_time = q15,
rf_length = q16,
rf_tv = q17_1,
rf_music = q17_2,
rf_pd = q17_3,
rf_write = q17_4,
rf_talk_phone = q17_5,
rf_onl_game = q17_6,
rf_soc_network = q17_7,
rf_disp = q19_recode,
ra_text = q20_recode,
ra_time = q21_recode,
ra_length = q22,
ra_tv = q23_1,
ra_music = q23_2,
ra_write = q23_3,
ra_talk_phone = q23_4,
ra_video_game = q23_5,
ra_soc_network = q23_6,
ra_other = q23_7,
ra_disp = q25_recode
)
# summary(read_data[,-2])
# str(read_data)
## 1. Investigating the Relationship Between Displacement and Demographic Factors:
# Cross-tabulation of rf_disp and gender
rf_disp_gender_xtab <- xtabs(~ rf_disp + gender, data = read_data)
# Calculate percentage values with two decimal places
rf_disp_gender_xtab_percentage <- round(prop.table(rf_disp_gender_xtab, margin = 2) * 100, 2)
# rf_disp_gender_xtab_percentage
# Cross-tabulation of ra_disp and gender
ra_disp_gender_xtab <- xtabs(~ ra_disp + gender, data = read_data)
# Calculate percentage values with two decimal places
ra_disp_gender_xtab_percentage <- round(prop.table(ra_disp_gender_xtab, margin = 2) * 100, 2)
# ra_disp_gender_xtab_percentage
# Cross-tabulation of rf_disp and experience
rf_disp_experience_xtab <- xtabs(~ rf_disp + experience, data = read_data)
# Calculate percentage values with two decimal digits
rf_disp_experience_xtab_percentage <- round(prop.table(rf_disp_experience_xtab, margin = 2) * 100, 2)
# rf_disp_experience_xtab_percentage
# Cross-tabulation of rf_disp and experience
ra_disp_experience_xtab <- xtabs(~ ra_disp + experience, data = read_data)
# Calculate percentage values with two decimal places
ra_disp_experience_xtab_percentage <- round(prop.table(ra_disp_experience_xtab, margin = 2) * 100, 2)
# ra_disp_experience_xtab_percentage
# Cross-tabulation of rf_disp and school type
rf_disp_sch_type_xtab <- xtabs(~ rf_disp + sch_type, data = read_data)
# Calculate percentage values with two decimal places
rf_disp_sch_type_xtab_percentage <- round(prop.table(rf_disp_sch_type_xtab, margin = 2) * 100, 2)
# rf_disp_sch_type_xtab_percentage
# Cross-tabulation of ra_disp and school type
ra_disp_sch_type_xtab <- xtabs(~ ra_disp + sch_type, data = read_data)
# Calculate percentage values with two decimal places
ra_disp_sch_type_xtab_percentage <- round(prop.table(ra_disp_sch_type_xtab, margin = 2) * 100, 2)
# ra_disp_sch_type_xtab_percentage
# Cross-tabulation of rf_disp and teacher type
rf_disp_tchr_type_xtab <- xtabs(~ rf_disp + tchr_type, data = read_data)
rf_disp_tchr_type_xtab_percentage <- round(prop.table(rf_disp_tchr_type_xtab, margin = 2) * 100, 2)
# rf_disp_tchr_type_xtab_percentage
# Cross-tabulation of rf_disp and teacher type
ra_disp_tchr_type_xtab <- xtabs(~ ra_disp + tchr_type, data = read_data)
# ra_disp_tchr_type_xtab
# Calculate percentage values with two decimal places
ra_disp_tchr_type_xtab_percentage <- round(prop.table(ra_disp_tchr_type_xtab, margin = 2) * 100, 2)
# ra_disp_tchr_type_xtab_percentage
# Filter out NAs from the dataset
filtered_data <- read_data[complete.cases(read_data), ]
# Create the grouped bar plots
gender_rf_disp <- ggplot(filtered_data, aes(x = rf_disp, fill = gender)) +
geom_bar(position = "dodge", color = "black") +
labs(x = "Reading for Fun Displacement", y = "Frequency") +
scale_y_continuous(breaks = NULL) + # Remove y-axis ticks
theme_minimal() +
theme(
panel.grid.major = element_blank(),
panel.grid.minor = element_blank(),
axis.line = element_line(colour = "black"),
axis.ticks.y = element_blank(), # Remove y-axis ticks
legend.position = "bottom",
legend.title = element_blank(),
legend.text = element_text(size = 8),
axis.text = element_text(size = 8),
axis.title = element_text(size = 10),
plot.title = element_text(size = 12, face = "bold")
) +
scale_fill_manual(
values = c("#0072B2", "#E69F00"),
labels = c("Female", "Male")
)
gender_ra_disp <- ggplot(filtered_data, aes(x = ra_disp, fill = gender)) +
geom_bar(position = "dodge", color = "black") +
labs(x = "Reading for Academic Purposes", y = "") +
scale_y_continuous(breaks = NULL) + # Remove y-axis ticks
theme_minimal() +
theme(
panel.grid.major = element_blank(),
panel.grid.minor = element_blank(),
axis.line = element_line(colour = "black"),
axis.ticks.y = element_blank(), # Remove y-axis ticks
legend.position = "bottom",
legend.title = element_blank(),
legend.text = element_text(size = 8),
axis.text = element_text(size = 8),
axis.title = element_text(size = 10),
plot.title = element_text(size = 12, face = "bold")
) +
scale_fill_manual(
values = c("#0072B2", "#E69F00"),
labels = c("Female", "Male")
)
# Arrange the plots side by side
grid.arrange(gender_rf_disp, gender_ra_disp, ncol = 2)
Note: Please, do not take away anything from the plot. I created it just for fun. It doesn’t show anything meaningful.
final_data <- read_data |>
dplyr::select(
gender, experience, sch_type,
tchr_type, rf_time, rf_length,
rf_tv, rf_music, rf_pd, rf_write,
rf_talk_phone, rf_onl_game, rf_soc_network,
ra_length, ra_tv, ra_music, ra_write,
ra_talk_phone, ra_video_game, ra_soc_network,
ra_other, rf_text, rf_disp, ra_text,
ra_time, ra_disp
)
# dim(final_data)
# str(final_data)
# names(final_data)
summary(read_data)
duration_in_seconds ip_address gender
Min. : 74.0 Length:700 Female:182
1st Qu.: 418.8 Class :character Male :429
Median : 812.5 Mode :character NA's : 89
Mean : 2509.7
3rd Qu.: 1422.5
Max. :210923.0
experience sch_type tchr_type
0-5 years :313 Rural :281 pre-service:362
6-10 years :229 Urban/Suburban:412 inservice :317
11-years and more:158 NA's : 7 NA's : 21
q14 rf_time rf_length
Length:700 6:00 a.m.-11:59 a.m.:165 30 minutes :184
Class :character 6:00 p.m.-11:59 p.m.:226 45 minutes :156
Mode :character Midnight-5:59 a.m. : 35 60 minutes :155
Noon-6:00 p.m. :274 15 minutes : 75
90 minutes : 53
120 minutes: 51
(Other) : 26
rf_tv rf_music rf_pd rf_write rf_talk_phone rf_onl_game
0 : 64 0 : 36 0 : 30 0 : 67 0 : 49 0 :118
1 :124 1 :236 1 :182 1 :235 1 :239 1 :209
2 :370 2 :314 2 :360 2 :298 2 :319 2 :296
3 :132 3 :103 3 :119 3 : 89 3 : 87 3 : 60
NA's: 10 NA's: 11 NA's: 9 NA's: 11 NA's: 6 NA's: 17
rf_soc_network q19 q20
0 : 44 Did not multi-task: 11 Length:700
1 :224 No, not at all :197 Class :character
2 :311 Not sure : 48 Mode :character
3 :113 Yes, a lot :118
NA's: 8 Yes, some :317
NA's : 9
q21 ra_length ra_tv ra_music ra_write
Length:700 30 minutes :169 0 : 77 0 : 53 0 : 63
Class :character 60 minutes :162 1 :119 1 :230 1 :226
Mode :character 45 minutes :157 2 :342 2 :286 2 :300
15 minutes : 88 3 :154 3 :120 3 :103
90 minutes : 66 NA's: 8 NA's: 11 NA's: 8
120 minutes: 33
(Other) : 25
ra_talk_phone ra_video_game ra_soc_network ra_other
0 : 68 0 :125 0 : 44 0 : 64
1 :224 1 :208 1 :255 1 :149
2 :300 2 :273 2 :286 2 :188
3 : 95 3 : 81 3 :103 3 : 70
NA's: 13 NA's: 13 NA's: 12 NA's:229
q25 rf_text rf_disp
Did not multi-task: 9 News stories online:232 No, not at all:197
No, not at all :185 Books online :216 Yes, some :317
Not sure : 52 Magazines online :143 Yes, a lot :118
Yes, a lot :136 Newspapers-In print: 15 NA's : 68
Yes, some :318 Magazines-In print : 6
(Other) : 0
NA's : 88
ra_text ra_time
Textbook Chapters-Online :201 6:00 a.m.-11:59 a.m.:201
Journal articles-Online :182 Noon-6:00 p.m. :286
Reports-Online :143 6:00 p.m.-11:59 p.m.:194
Textbook Chapters-In print : 49 Midnight-5:59 a.m. : 19
Other materials-Please specify: 11
(Other) : 9
NA's :105
ra_disp
No, not at all:185
Yes, some :318
Yes, a lot :136
NA's : 61
mca_result <- MCA(ra_data, graph = TRUE)
Create a dimension plot to visualize the positions of categories and variables in the MCA solution space. This plot helps understand the relationships between variables and identify patterns or clusters. The categories and variables that are closer together on the plot are more strongly associated.
print(mca_result)
**Results of the Multiple Correspondence Analysis (MCA)**
The analysis was performed on 606 individuals, described by 10 variables
*The results are available in the following objects:
name description
1 "$eig" "eigenvalues"
2 "$var" "results for the variables"
3 "$var$coord" "coord. of the categories"
4 "$var$cos2" "cos2 for the categories"
5 "$var$contrib" "contributions of the categories"
6 "$var$v.test" "v-test for the categories"
7 "$ind" "results for the individuals"
8 "$ind$coord" "coord. for the individuals"
9 "$ind$cos2" "cos2 for the individuals"
10 "$ind$contrib" "contributions of the individuals"
11 "$call" "intermediate results"
12 "$call$marge.col" "weights of columns"
13 "$call$marge.li" "weights of rows"
fviz_mca_var(mca_result, axes = c(1, 2), col.var = "contrib", gradient.cols = c("blue", "white", "red"))
These weights reflect the relative importance of each variable in the MCA analysis. Variables with higher weights have a stronger influence on the analysis, while variables with lower weights have less impact.
weights <- mca_result$call$marge.col
### iv. Create a data frame to store the weights and variable names
weights_df <- data.frame(variable = names(weights), weight = weights)
# Sort the data frame in descending order of weights
weights_df <- weights_df[order(weights_df$weight, decreasing = TRUE), ]
# Print the sorted data frame
print(weights_df)
variable weight
Urban/Suburban Urban/Suburban 0.060561056
ra_tv_2 ra_tv_2 0.050825083
Yes, some Yes, some 0.050825083
0-5 years 0-5 years 0.045379538
ra_talk_phone_2 ra_talk_phone_2 0.044389439
Noon-6:00 p.m. Noon-6:00 p.m. 0.043564356
ra_soc_network_2 ra_soc_network_2 0.042409241
ra_music_2 ra_music_2 0.042244224
ra_video_game_2 ra_video_game_2 0.040594059
Rural Rural 0.039438944
ra_soc_network_1 ra_soc_network_1 0.036633663
ra_music_1 ra_music_1 0.033828383
ra_talk_phone_1 ra_talk_phone_1 0.033498350
6-10 years 6-10 years 0.033168317
ra_video_game_1 ra_video_game_1 0.030033003
No, not at all No, not at all 0.028547855
6:00 a.m.-11:59 a.m. 6:00 a.m.-11:59 a.m. 0.027722772
6:00 p.m.-11:59 p.m. 6:00 p.m.-11:59 p.m. 0.026897690
30 minutes 30 minutes 0.026072607
45 minutes 45 minutes 0.022277228
60 minutes 60 minutes 0.022112211
ra_tv_3 ra_tv_3 0.022112211
11-years and more 11-years and more 0.021452145
Yes, a lot Yes, a lot 0.020627063
ra_video_game_0 ra_video_game_0 0.017986799
ra_music_3 ra_music_3 0.016996700
ra_tv_1 ra_tv_1 0.016501650
ra_soc_network_3 ra_soc_network_3 0.014851485
15 minutes 15 minutes 0.013531353
ra_talk_phone_3 ra_talk_phone_3 0.012706271
ra_video_game_3 ra_video_game_3 0.011386139
ra_tv_0 ra_tv_0 0.010561056
ra_talk_phone_0 ra_talk_phone_0 0.009405941
90 minutes 90 minutes 0.008085809
ra_music_0 ra_music_0 0.006930693
ra_soc_network_0 ra_soc_network_0 0.006105611
120 minutes 120 minutes 0.004950495
Midnight-5:59 a.m. Midnight-5:59 a.m. 0.001815182
0 minutes 0 minutes 0.001815182
180 minutes or more 180 minutes or more 0.001155116
eigenvalue percentage of variance cumulative percentage of variance
dim 1 0.319 10.625 10.625
dim 2 0.262 8.738 19.363
dim 3 0.160 5.323 24.686
dim 4 0.153 5.084 29.771
dim 5 0.141 4.703 34.474
dim 6 0.126 4.199 38.672
dim 7 0.121 4.023 42.695
dim 8 0.114 3.786 46.481
dim 9 0.111 3.688 50.169
dim 10 0.107 3.552 53.721
dim 11 0.102 3.390 57.112
dim 12 0.100 3.336 60.448
dim 13 0.095 3.169 63.617
dim 14 0.094 3.132 66.749
dim 15 0.088 2.929 69.678
dim 16 0.086 2.858 72.536
dim 17 0.082 2.739 75.274
dim 18 0.079 2.642 77.916
dim 19 0.075 2.511 80.427
dim 20 0.071 2.380 82.807
dim 21 0.068 2.252 85.059
dim 22 0.062 2.067 87.126
dim 23 0.061 2.047 89.174
dim 24 0.057 1.910 91.084
dim 25 0.056 1.868 92.952
dim 26 0.053 1.759 94.711
dim 27 0.045 1.494 96.205
dim 28 0.045 1.487 97.692
dim 29 0.041 1.372 99.064
dim 30 0.028 0.936 100.000
Assess the relationships between variables and dimensions. Variables that have higher associations (higher cosine similarity) with specific dimensions are more strongly related to those dimensions. This information can help identify the variables that contribute most to each dimension and understand the underlying patterns in the data.
var_contributions <- mca_result$var$contrib
rounded_contributions <- round(mca_result$var$contrib, 3)
# Sort the rounded contributions in descending order
# sorted_contributions <- sort(rounded_contributions, decreasing = TRUE)
# Print the sorted contributions
# print(sorted_contributions)
print(rounded_contributions)
Dim 1 Dim 2 Dim 3 Dim 4 Dim 5
0-5 years 0.360 0.144 0.180 1.551 6.684
6-10 years 0.144 0.390 0.003 0.285 12.151
11-years and more 1.807 0.050 0.299 1.316 0.330
Rural 1.127 0.839 0.005 0.005 6.418
Urban/Suburban 0.734 0.546 0.003 0.004 4.179
6:00 a.m.-11:59 a.m. 0.010 0.838 4.134 0.018 5.508
Noon-6:00 p.m. 0.329 0.574 0.069 0.823 0.602
6:00 p.m.-11:59 p.m. 0.153 0.007 3.678 0.024 0.569
Midnight-5:59 a.m. 2.894 0.199 0.518 10.999 6.079
0 minutes 0.000 0.001 0.130 6.600 2.183
15 minutes 0.559 0.459 3.736 7.375 5.229
30 minutes 0.416 0.289 0.007 0.000 1.494
45 minutes 0.378 1.132 0.010 2.840 4.625
60 minutes 1.266 0.031 1.470 3.321 5.461
90 minutes 0.141 0.299 2.250 0.018 0.022
120 minutes 2.458 0.513 1.480 0.030 0.198
180 minutes or more 1.308 0.337 2.067 8.371 2.166
ra_tv_0 13.996 2.479 0.052 1.839 0.400
ra_tv_1 0.692 0.458 0.003 17.703 0.179
ra_tv_2 0.642 4.643 0.559 4.963 0.003
ra_tv_3 4.367 7.637 1.544 0.462 0.026
ra_music_0 12.763 3.897 0.242 1.691 0.246
ra_music_1 0.576 3.633 11.286 0.615 1.013
ra_music_2 0.716 1.093 14.938 0.040 0.048
ra_music_3 4.071 9.466 2.783 0.347 0.573
ra_talk_phone_0 8.663 1.311 1.024 1.884 5.228
ra_talk_phone_1 0.746 1.202 6.426 3.788 1.793
ra_talk_phone_2 1.083 2.304 6.921 1.481 0.006
ra_talk_phone_3 3.959 13.191 2.796 0.087 0.124
ra_video_game_0 15.227 3.177 0.004 0.726 0.264
ra_video_game_1 0.009 3.045 0.451 7.162 1.943
ra_video_game_2 1.805 2.320 0.006 2.943 1.456
ra_video_game_3 4.911 12.040 0.739 0.001 0.436
ra_soc_network_0 6.100 1.147 0.398 0.239 6.969
ra_soc_network_1 1.195 1.149 4.947 4.377 5.435
ra_soc_network_2 0.694 2.441 7.214 3.315 0.686
ra_soc_network_3 3.582 13.228 2.103 0.011 0.324
No, not at all 0.082 0.895 1.181 1.185 3.559
Yes, some 0.035 0.090 2.021 1.301 0.004
Yes, a lot 0.002 2.505 12.321 0.259 5.387
A factor analysis was conducted to examine the underlying dimensions of the variables. The analysis revealed five distinct dimensions (Dim 1 to Dim 5) that accounted for the variability in the data. Each dimension appears to represent a unique pattern of variables. Notably, Dim 1 showed moderate positive loadings for variables related to 11-years and more, indicating a potential relationship between this age group and other factors. Dim 2 showed a mix of positive and negative loadings for various variables, suggesting a complex relationship between them. Dim 3 exhibited higher loadings for variables related to specific time intervals, indicating a potential temporal association. Dim 4 showed a mix of loadings for different variables, suggesting a diverse pattern of associations. Dim 5 demonstrated higher loadings for variables related to ‘Yes, a lot,’ potentially indicating a strong association with this response category. Further analysis and contextual information are required to provide a comprehensive interpretation of these dimensions.
Calculate the category contributions to each dimension. Category contributions indicate the extent to which a particular category contributes to the overall structure of each dimension. Higher contributions suggest that the category has a stronger association with that dimension.
categories_coordinates <- mca_result$var$coord
round(categories_coordinates, 3)
Dim 1 Dim 2 Dim 3 Dim 4 Dim 5
0-5 years -0.159 0.091 0.080 0.228 -0.456
6-10 years -0.118 -0.176 -0.012 -0.115 0.719
11-years and more 0.518 0.078 -0.149 -0.306 -0.147
Rural -0.302 -0.236 -0.014 -0.015 0.479
Urban/Suburban 0.197 0.154 0.009 0.009 -0.312
6:00 a.m.-11:59 a.m. -0.034 0.282 0.488 0.032 -0.529
Noon-6:00 p.m. -0.155 -0.186 -0.050 -0.170 0.140
6:00 p.m.-11:59 p.m. 0.134 -0.025 -0.467 0.037 0.173
Midnight-5:59 a.m. 2.254 0.536 0.675 3.040 2.174
0 minutes 0.004 -0.035 -0.338 2.355 1.303
15 minutes -0.363 0.298 0.664 0.912 -0.738
30 minutes -0.225 0.170 -0.021 -0.002 0.284
45 minutes -0.233 -0.365 -0.027 -0.441 0.541
60 minutes 0.427 -0.061 -0.326 -0.479 -0.590
90 minutes -0.236 -0.312 -0.667 0.058 0.062
120 minutes 1.258 0.521 0.691 -0.096 -0.237
180 minutes or more 1.900 0.874 1.691 3.325 1.627
ra_tv_0 2.055 0.784 -0.089 -0.515 -0.231
ra_tv_1 0.366 -0.270 0.018 1.279 0.124
ra_tv_2 -0.201 -0.489 -0.132 -0.386 -0.010
ra_tv_3 -0.793 0.951 0.334 0.179 0.040
ra_music_0 2.423 1.214 0.236 -0.610 0.224
ra_music_1 0.233 -0.531 -0.730 0.167 -0.206
ra_music_2 -0.232 -0.260 0.751 0.038 0.040
ra_music_3 -0.874 1.208 -0.511 -0.177 0.218
ra_talk_phone_0 1.713 0.604 0.417 -0.553 0.886
ra_talk_phone_1 0.266 -0.307 -0.553 0.415 -0.275
ra_talk_phone_2 -0.279 -0.369 0.499 -0.226 -0.014
ra_talk_phone_3 -0.997 1.650 -0.593 0.102 0.117
ra_video_game_0 1.643 0.680 -0.020 -0.248 -0.144
ra_video_game_1 -0.030 -0.516 0.155 0.603 0.302
ra_video_game_2 -0.376 -0.387 -0.016 -0.333 -0.225
ra_video_game_3 -1.173 1.665 -0.322 -0.013 0.232
ra_soc_network_0 1.785 0.702 0.323 -0.244 1.269
ra_soc_network_1 0.322 -0.287 -0.464 0.427 -0.458
ra_soc_network_2 -0.228 -0.388 0.521 -0.345 0.151
ra_soc_network_3 -0.877 1.528 -0.476 0.033 0.175
No, not at all 0.096 -0.287 -0.257 0.252 0.419
Yes, some -0.047 -0.068 -0.252 -0.198 0.011
Yes, a lot -0.017 0.564 0.977 0.139 -0.607
A biplot, which combines the dimension plot with the variable plot. This plot allows you to visualize both the relationships between categories and the relationships between variables in the same plot. It helps interpret the associations between categories, variables, and dimensions simultaneously. This can lead to a better understanding of the relationships and provide valuable information for further analysis or decision-making.
plot.MCA(mca_result, cex = 0.8)
eigenvalues_m <- mca_result$eig
chi_square_stats <- eigenvalues_m^2
df <- 1 # Degrees of freedom for each dimension
p_values <- 1 - pchisq(chi_square_stats, df)
for (i in seq_along(chi_square_stats)) {
cat("Dimension", i, "Chi-square:", chi_square_stats[i], "df:", df, "p-value:", p_values[i], "\n")
}
Dimension 1 Chi-square: 0.1016094 df: 1 p-value: 0.7499068
Dimension 2 Chi-square: 0.06871698 df: 1 p-value: 0.7932141
Dimension 3 Chi-square: 0.02550076 df: 1 p-value: 0.8731256
Dimension 4 Chi-square: 0.02326633 df: 1 p-value: 0.8787665
Dimension 5 Chi-square: 0.01990572 df: 1 p-value: 0.8878007
Dimension 6 Chi-square: 0.01586628 df: 1 p-value: 0.8997625
Dimension 7 Chi-square: 0.01456391 df: 1 p-value: 0.9039437
Dimension 8 Chi-square: 0.01290278 df: 1 p-value: 0.9095625
Dimension 9 Chi-square: 0.01223808 df: 1 p-value: 0.911913
Dimension 10 Chi-square: 0.0113574 df: 1 p-value: 0.9151292
Dimension 11 Chi-square: 0.01034544 df: 1 p-value: 0.9189848
Dimension 12 Chi-square: 0.01001456 df: 1 p-value: 0.9202866
Dimension 13 Chi-square: 0.009039286 df: 1 p-value: 0.9242551
Dimension 14 Chi-square: 0.008827199 df: 1 p-value: 0.9251464
Dimension 15 Chi-square: 0.007721398 df: 1 p-value: 0.9299788
Dimension 16 Chi-square: 0.007351747 df: 1 p-value: 0.9316713
Dimension 17 Chi-square: 0.006750329 df: 1 p-value: 0.9345192
Dimension 18 Chi-square: 0.006280084 df: 1 p-value: 0.9368362
Dimension 19 Chi-square: 0.005676406 df: 1 p-value: 0.9399427
Dimension 20 Chi-square: 0.005097895 df: 1 p-value: 0.9430798
Dimension 21 Chi-square: 0.004563512 df: 1 p-value: 0.9461409
Dimension 22 Chi-square: 0.003845377 df: 1 p-value: 0.950554
Dimension 23 Chi-square: 0.00377282 df: 1 p-value: 0.9510221
Dimension 24 Chi-square: 0.003284831 df: 1 p-value: 0.9542955
Dimension 25 Chi-square: 0.003140405 df: 1 p-value: 0.9553105
Dimension 26 Chi-square: 0.002784244 df: 1 p-value: 0.9579184
Dimension 27 Chi-square: 0.002008099 df: 1 p-value: 0.9642573
Dimension 28 Chi-square: 0.001990375 df: 1 p-value: 0.9644153
Dimension 29 Chi-square: 0.001695185 df: 1 p-value: 0.9671583
Dimension 30 Chi-square: 0.0007883342 df: 1 p-value: 0.9776005
Dimension 31 Chi-square: 112.8993 df: 1 p-value: 0
Dimension 32 Chi-square: 76.3522 df: 1 p-value: 0
Dimension 33 Chi-square: 28.33418 df: 1 p-value: 1.020777e-07
Dimension 34 Chi-square: 25.85147 df: 1 p-value: 3.687233e-07
Dimension 35 Chi-square: 22.11747 df: 1 p-value: 2.564658e-06
Dimension 36 Chi-square: 17.6292 df: 1 p-value: 2.684357e-05
Dimension 37 Chi-square: 16.18212 df: 1 p-value: 5.753452e-05
Dimension 38 Chi-square: 14.33642 df: 1 p-value: 0.0001528783
Dimension 39 Chi-square: 13.59787 df: 1 p-value: 0.0002264426
Dimension 40 Chi-square: 12.61933 df: 1 p-value: 0.0003817777
Dimension 41 Chi-square: 11.49494 df: 1 p-value: 0.0006978603
Dimension 42 Chi-square: 11.12729 df: 1 p-value: 0.0008506695
Dimension 43 Chi-square: 10.04365 df: 1 p-value: 0.001528739
Dimension 44 Chi-square: 9.807999 df: 1 p-value: 0.001737544
Dimension 45 Chi-square: 8.579331 df: 1 p-value: 0.003400003
Dimension 46 Chi-square: 8.168607 df: 1 p-value: 0.004262162
Dimension 47 Chi-square: 7.500366 df: 1 p-value: 0.006168646
Dimension 48 Chi-square: 6.977871 df: 1 p-value: 0.008252371
Dimension 49 Chi-square: 6.307118 df: 1 p-value: 0.01202541
Dimension 50 Chi-square: 5.664328 df: 1 p-value: 0.01731335
Dimension 51 Chi-square: 5.070568 df: 1 p-value: 0.0243354
Dimension 52 Chi-square: 4.272641 df: 1 p-value: 0.03873068
Dimension 53 Chi-square: 4.192022 df: 1 p-value: 0.04061463
Dimension 54 Chi-square: 3.649812 df: 1 p-value: 0.05607554
Dimension 55 Chi-square: 3.489339 df: 1 p-value: 0.06176523
Dimension 56 Chi-square: 3.093604 df: 1 p-value: 0.07860054
Dimension 57 Chi-square: 2.231221 df: 1 p-value: 0.1352469
Dimension 58 Chi-square: 2.211528 df: 1 p-value: 0.1369829
Dimension 59 Chi-square: 1.883539 df: 1 p-value: 0.1699325
Dimension 60 Chi-square: 0.8759269 df: 1 p-value: 0.3493197
Dimension 61 Chi-square: 112.8993 df: 1 p-value: 0
Dimension 62 Chi-square: 374.9406 df: 1 p-value: 0
Dimension 63 Chi-square: 609.4168 df: 1 p-value: 0
Dimension 64 Chi-square: 886.3007 df: 1 p-value: 0
Dimension 65 Chi-square: 1188.438 df: 1 p-value: 0
Dimension 66 Chi-square: 1495.557 df: 1 p-value: 0
Dimension 67 Chi-square: 1822.875 df: 1 p-value: 0
Dimension 68 Chi-square: 2160.528 df: 1 p-value: 0
Dimension 69 Chi-square: 2516.93 df: 1 p-value: 0
Dimension 70 Chi-square: 2885.987 df: 1 p-value: 0
Dimension 71 Chi-square: 3261.758 df: 1 p-value: 0
Dimension 72 Chi-square: 3653.907 df: 1 p-value: 0
Dimension 73 Chi-square: 4047.089 df: 1 p-value: 0
Dimension 74 Chi-square: 4455.363 df: 1 p-value: 0
Dimension 75 Chi-square: 4854.962 df: 1 p-value: 0
Dimension 76 Chi-square: 5261.418 df: 1 p-value: 0
Dimension 77 Chi-square: 5666.222 df: 1 p-value: 0
Dimension 78 Chi-square: 6070.884 df: 1 p-value: 0
Dimension 79 Chi-square: 6468.547 df: 1 p-value: 0
Dimension 80 Chi-square: 6857.042 df: 1 p-value: 0
Dimension 81 Chi-square: 7235.042 df: 1 p-value: 0
Dimension 82 Chi-square: 7590.955 df: 1 p-value: 0
Dimension 83 Chi-square: 7951.919 df: 1 p-value: 0
Dimension 84 Chi-square: 8296.291 df: 1 p-value: 0
Dimension 85 Chi-square: 8640.066 df: 1 p-value: 0
Dimension 86 Chi-square: 8970.14 df: 1 p-value: 0
Dimension 87 Chi-square: 9255.315 df: 1 p-value: 0
Dimension 88 Chi-square: 9543.662 df: 1 p-value: 0
Dimension 89 Chi-square: 9813.694 df: 1 p-value: 0
Dimension 90 Chi-square: 10000 df: 1 p-value: 0
The red dotted line provides a visual reference point for interpreting the significance of variable contributions in relation to the expected distribution. Variables with contributions significantly above or below this line are the ones that are contributing the most to the variation along the specific dimension.
library(factoextra)
# Extract the variable contributions for each dimension
var_contributions <- mca_result$var$contrib
# Create separate plots for each dimension
for (dim in 1:5) {
plot_title <- paste("Contribution of Variables to Dimension", dim)
# Create the contribution plot for the current dimension
p <- fviz_contrib(mca_result, choice = "var", axes = c(dim), gradient.cols = c("blue", "white", "red")) +
ggtitle(plot_title)
# Display the plot
print(p)
}