Multiple Correspondence Analysis (MCA) in Educational Data

Introduction

Multiple Correspondence Analysis (MCA) is a multivariate statistical technique that is used to analyze the relationships between categorical variables. It is a generalization of correspondence analysis (CA), which is used to analyze the relationships between two categorical variables. MCA can be used to explore the associations between multiple categorical variables simultaneously.

MCA works by creating a map of the categorical variables. The map is created by calculating the distances between the different categories of the variables. The closer two categories are on the map, the more similar they are. The further apart two categories are on the map, the less similar they are.

MCA can be used to explore a variety of research questions. For example, MCA can be used to:

Explore the relationships between different demographic variables, such as age, gender, and education level.
Explore the relationships between different product features, such as price, color, and size.
Explore the relationships between different customer segments, such as brand loyalists, price-sensitive shoppers, and impulse buyers.

MCA is a powerful tool that can be used to gain insights into the relationships between categorical variables. It is a versatile technique that can be used to explore a variety of research questions.

MCA for Analyzing Educational Data:

MCA is particularly well-suited for analyzing educational data, as it can be used to explore a wide range of topics, such as student achievement, teacher effectiveness, and school climate.

One of the key advantages of MCA is its capacity for making comparisons between groups. This feature is beneficial in examining differences in student achievement, teacher effectiveness, or school climate across various groups, such as different genders, races, or socioeconomic backgrounds. By utilizing MCA, educational researchers and policymakers can gain valuable insights into these variations and use them to inform decision-making.

MCA also aids in the identification of underlying dimensions or constructs that contribute to educational outcomes. It uncovers latent variables that may not be directly measured but are critical in understanding educational success. For instance, MCA can reveal associations between variables such as student motivation, parental involvement, and academic achievement. By recognizing these underlying dimensions, educators can design targeted interventions to enhance student engagement and academic performance.

Here are some specific examples of how MCA can be used to analyze educational data:

MCA can be used to explore the relationship between student achievement and student background characteristics, such as gender, race, or socioeconomic status.
MCA can be used to identify factors that contribute to teacher effectiveness.
MCA can be used to compare school climates between different schools.
MCA can be used to assess the impact of educational interventions.

MCA is a valuable tool for educational researchers and policymakers who are interested in understanding how to improve student achievement. By using MCA, they can identify patterns and relationships in data, and make comparisons between groups. This information can be used to develop and implement effective educational policies and interventions.

Show code

# Looking at the summary, the variables having `_text` at the end is not useful. Getting rid of them:
new_data <- reading_data %>%
  select(-ends_with("_text"))
# str(new_data)

An Example Workout

a. Raw Data, Variable Information and Summary

Show code

read_data <- reading_data %>%
  select(
    duration_in_seconds, ip_address,
    q4, q6, q7, q8, q14, q15, q16,
    q17_1, q17_2, q17_3, q17_4, q17_5, q17_6, q17_7,
    q19, q20, q21, q22,
    q23_1, q23_2, q23_3, q23_4, q23_5, q23_6, q23_7, q25
  )
# summary(read_data[ ,-(1:2)])

Question 4 [q4 - gender]: Response Codes Female, Male
Question 6 [q6 - experience]: Response Codes 0-5 years = 0, 6-10 years = 1, 11-years and more = 2
Question 7 [q7 - sch_type]: Response Codes Rural = 0, Urban/Suburban = 1
Question 8 [q8 - tchr_type]: Response Codes pre-service = 0, inservice = 1
Question 15 [q15 - rf_time]: Response Codes
Question 16 [q16 - rf_length]: Response Codes 0 minutes = 0, 15 minutes = 1, 30 minutes = 2, 45 minutes = 3, 1 hour = 4, 1.5 hours = 5, 2 hours = 6, 3 hours or more = 7
Question 17 [q17_1 - rf_tv, q17_2 - rf_music, q17_3 - rf_pd, q17_4 - rf_write, q17_5 - rf_talk_phone, q17_6 - rf_onl_game, q17_7 - rf_soc_network]: Response Codes Never = 0, Alittle of the Time = 1, Some of the Time = 2, Most of the Time = 3
Question 19 [q19 - rf_disp]: Response Codes No, not at all = 0, Yes, some = 1, Yes, a lot = 2
Question 20 [q20 - ra_text]: Response Codes Textbook Chapters-Online, Journal articles-Online, Reports-Online, Novels-Online, Textbook Chapters-In print, Reports-In print, Novels-In print, Other materials-Please specify
Question 21 [q21 - ra_time]: Response Codes 6:00 a.m.-11:59 a.m., Noon-6:00 p.m., 6:00 p.m.-11:59 p.m., Midnight-5:59 a.m.
Question 22 [q22 - ra_length]: Response Codes 0 minutes = 0, 15 minutes = 1, 30 minutes = 2, 45 minutes = 3, 1 hour = 4, 1.5 hours = 5, 2 hours = 6, 3 hours or more = 7
Question 23 [q23_1 - ra_tv, q23_2 - ra_music, q23_3 - ra_write, q23_4 - ra_talk_phone, q23_5 - ra_video_game, q23_6 - ra_soc_network, q23_7 - ra_other]: Response Codes Never = 0, A little of the time = 1, Some of the time = 2, Most of the time = 3
Question 25 [q25 - ra_disp]: Response Codes No, not at all = 0, Yes, some = 1, Yes, a lot = 2

Show code

# Question 6
read_data$q6 <- factor(
  ifelse(read_data$q6 %in% c("0 teaching experience", "1-5 years"), 0,
    ifelse(read_data$q6 %in% c("11-15 years", "16 - 20", "20+"), 2,
      ifelse(read_data$q6 == "6-10 years", 1, read_data$q6)
    )
  ),
  levels = c(0, 1, 2),
  labels = c("0-5 years", "6-10 years", "11-years and more")
)
# summary(read_data$q6)

# Question 7
read_data$q7 <- factor(
  ifelse(read_data$q7 == "Click to write Choice 4", NA,
    ifelse(read_data$q7 %in% c("Urban", "Suburban"), 1,
      ifelse(read_data$q7 == "Rural", 0, read_data$q7)
    )
  ),
  levels = c(0, 1),
  labels = c("Rural", "Urban/Suburban")
)
# summary(read_data$q7)

# Question 8
read_data$q8 <- factor(ifelse(
  read_data$q8 == "I am a pre-service teacher (i.e., I am currently in training at Centre régional des métiers de l’éducation et de la formation (CRMEF) or Ecole Normale Supérieure (ENS)",
  0,
  ifelse(
    read_data$q8 == "I am an-service teacher (i.e., I am currently working full-time as a teacher)",
    1,
    NA
  )
), levels = c(0, 1), labels = c("pre-service", "inservice"))
# summary(read_data$q8)

# Question 14
# Create a function to extract the first selection
read_data$q14 <- as.character(read_data$q14)
extract_first_selection <- function(text) {
  selections <- strsplit(text, ",")[[1]]
  first_selection <- trimws(selections[1])
  return(first_selection)
}
# Apply the function to create a new variable with the first selection
read_data$q14_first_selection <- sapply(read_data$q14, extract_first_selection)
# Recode the first selection into the desired categories
read_data$q14_recode <- factor(
  read_data$q14_first_selection,
  levels = c("News stories online", "Social media threads (e.g., Face book, Instagram, Twitter, etc.)", "Books online", "Magazines online", "Newspapers-In print", "Magazines-In print", "Other Materials in print (please specify)"),
  labels = c("News stories online", "Social media threads", "Books online", "Magazines online", "Newspapers-In print", "Magazines-In print", "Other Materials in print (please specify)")
)
# Remove the intermediate variable
read_data$q14_first_selection <- NULL
# Change the class of the Variable
read_data$q14_recode <- as.factor(read_data$q14_recode)
# summary(read_data$q14_recode)

# Question 16
read_data$q16 <- factor(recode(read_data$q16,
  "0 minutes" = 0,
  "15  minutes" = 1,
  "30  minutes" = 2,
  "45  minutes" = 3,
  "1 hour" = 4,
  "1.5 hours" = 5,
  "2 hours" = 6,
  "3 hours or more" = 7
))
# Convert to factor with the specified levels
levels(read_data$q16) <- c("0 minutes", "15 minutes", "30 minutes", "45 minutes", "60 minutes", "90 minutes", "120 minutes", "180 minutes or more")
# summary(read_data$q16)
# summary(as.numeric(read_data$q16))

# Re-coding Responses in Question 17(q17_2, q17_3, q17_4, q17_5, q17_6, q17_7)
# Function
recodelikert <- function(data, variables, categories) {
  for (variable in variables) {
    data[[variable]] <- recode(data[[variable]], !!!categories)
    data[[variable]] <- as.factor(data[[variable]])
  }
  return(data)
}
# Defining Categories
categories <- list(
  "Never" = 0,
  "Alittle of the Time" = 1,
  "Some of the Time" = 2,
  "Most of the Time" = 3
)
read_data <- recodelikert(read_data, c("q17_1", "q17_2", "q17_3", "q17_4", "q17_5", "q17_6", "q17_7"), categories)

# Questions 23 (q23_1, q23_2, q23_3, q23_4, q23_5, q23_6, q23_7)
# Defining Categories
categories_1 <- list(
  "Never" = 0,
  "A little of the time" = 1,
  "Some of the time" = 2,
  "Most of the time" = 3
)
read_data <- recodelikert(read_data, c("q23_1", "q23_2", "q23_3", "q23_4", "q23_5", "q23_6", "q23_7"), categories_1)

# Question 19, and saving as q19_recode
read_data$q19_recode <- factor(
  ifelse(read_data$q19 %in% c("Did not multi-task", "Not sure"), NA,
    ifelse(read_data$q19 == "No, not at all", 0,
      ifelse(read_data$q19 == "Yes, some", 1,
        ifelse(read_data$q19 == "Yes, a lot", 2, read_data$q19)
      )
    )
  ),
  levels = c(0, 1, 2),
  labels = c("No, not at all", "Yes, some", "Yes, a lot"),
  exclude = NULL
)

# Question 20
read_data$q20 <- as.character(read_data$q20)
# Apply the function to create a new variable with the first selection
read_data$q20_first_selection <- sapply(read_data$q20, extract_first_selection)
# Recode the first selection into the desired categories
read_data$q20_recode <- factor(
  read_data$q20_first_selection,
  levels = c("Textbook Chapters-Online", "Journal articles-Online", "Reports-Online", "Novels-Online
", "Textbook Chapters-In print", "Reports-In print", "Novels-In print", "Other materials-Please specify"),
  labels = c("Textbook Chapters-Online", "Journal articles-Online", "Reports-Online", "Novels-Online", "Textbook Chapters-In print", "Reports-In print", "Novels-In print", "Other materials-Please specify")
)
# Remove the intermediate variable
read_data$q20_first_selection <- NULL

# Question 21
read_data$q21 <- as.character(read_data$q21)
# Apply the function to create a new variable with the first selection
read_data$q21_first_selection <- sapply(read_data$q21, extract_first_selection)
# Recode the first selection into the desired categories
read_data$q21_recode <- factor(
  read_data$q21_first_selection,
  levels = c("6:00 a.m.-11:59 a.m.", "Noon-6:00 p.m.", "6:00 p.m.-11:59 p.m.", "Midnight-5:59 a.m."),
  labels = c("6:00 a.m.-11:59 a.m.", "Noon-6:00 p.m.", "6:00 p.m.-11:59 p.m.", "Midnight-5:59 a.m.")
)
# Remove the intermediate variable
read_data$q21_first_selection <- NULL

# Question 22
read_data$q22 <- factor(recode(read_data$q22,
  "0 minutes" = 0,
  "15 minutes" = 1,
  "30 minutes" = 2,
  "45 minutes" = 3,
  "1 hour" = 4,
  "1.5 hours" = 5,
  "2 hours" = 6,
  "3 hours or more" = 7
))
# Convert to factor with the specified levels
levels(read_data$q22) <- c("0 minutes", "15 minutes", "30 minutes", "45 minutes", "60 minutes", "90 minutes", "120 minutes", "180 minutes or more")

# Question 25 and Saving as q25_recode
read_data$q25_recode <- factor(
  ifelse(read_data$q25 %in% c("Did not multi-task", "Not sure"), NA,
    ifelse(read_data$q25 == "No, not at all", 0,
      ifelse(read_data$q25 == "Yes, some", 1,
        ifelse(read_data$q25 == "Yes, a lot", 2, read_data$q25)
      )
    )
  ),
  levels = c(0, 1, 2),
  labels = c("No, not at all", "Yes, some", "Yes, a lot"),
  exclude = NULL
)

# summary(read_data[,-2])
# str(read_data)

Show code

read_data <- read_data %>%
  rename(
    gender = q4,
    experience = q6,
    sch_type = q7,
    tchr_type = q8,
    rf_text = q14_recode,
    rf_time = q15,
    rf_length = q16,
    rf_tv = q17_1,
    rf_music = q17_2,
    rf_pd = q17_3,
    rf_write = q17_4,
    rf_talk_phone = q17_5,
    rf_onl_game = q17_6,
    rf_soc_network = q17_7,
    rf_disp = q19_recode,
    ra_text = q20_recode,
    ra_time = q21_recode,
    ra_length = q22,
    ra_tv = q23_1,
    ra_music = q23_2,
    ra_write = q23_3,
    ra_talk_phone = q23_4,
    ra_video_game = q23_5,
    ra_soc_network = q23_6,
    ra_other = q23_7,
    ra_disp = q25_recode
  )
# summary(read_data[,-2])
# str(read_data)

Show code

## 1. Investigating the Relationship Between Displacement and Demographic Factors:
# Cross-tabulation of rf_disp and gender
rf_disp_gender_xtab <- xtabs(~ rf_disp + gender, data = read_data)
# Calculate percentage values with two decimal places
rf_disp_gender_xtab_percentage <- round(prop.table(rf_disp_gender_xtab, margin = 2) * 100, 2)
# rf_disp_gender_xtab_percentage

# Cross-tabulation of ra_disp and gender
ra_disp_gender_xtab <- xtabs(~ ra_disp + gender, data = read_data)
# Calculate percentage values with two decimal places
ra_disp_gender_xtab_percentage <- round(prop.table(ra_disp_gender_xtab, margin = 2) * 100, 2)
# ra_disp_gender_xtab_percentage

# Cross-tabulation of rf_disp and experience
rf_disp_experience_xtab <- xtabs(~ rf_disp + experience, data = read_data)
# Calculate percentage values with two decimal digits
rf_disp_experience_xtab_percentage <- round(prop.table(rf_disp_experience_xtab, margin = 2) * 100, 2)
# rf_disp_experience_xtab_percentage

# Cross-tabulation of rf_disp and experience
ra_disp_experience_xtab <- xtabs(~ ra_disp + experience, data = read_data)
# Calculate percentage values with two decimal places
ra_disp_experience_xtab_percentage <- round(prop.table(ra_disp_experience_xtab, margin = 2) * 100, 2)
# ra_disp_experience_xtab_percentage

# Cross-tabulation of rf_disp and school type
rf_disp_sch_type_xtab <- xtabs(~ rf_disp + sch_type, data = read_data)
# Calculate percentage values with two decimal places
rf_disp_sch_type_xtab_percentage <- round(prop.table(rf_disp_sch_type_xtab, margin = 2) * 100, 2)
# rf_disp_sch_type_xtab_percentage

# Cross-tabulation of ra_disp and school type
ra_disp_sch_type_xtab <- xtabs(~ ra_disp + sch_type, data = read_data)
# Calculate percentage values with two decimal places
ra_disp_sch_type_xtab_percentage <- round(prop.table(ra_disp_sch_type_xtab, margin = 2) * 100, 2)
# ra_disp_sch_type_xtab_percentage

# Cross-tabulation of rf_disp and teacher type
rf_disp_tchr_type_xtab <- xtabs(~ rf_disp + tchr_type, data = read_data)
rf_disp_tchr_type_xtab_percentage <- round(prop.table(rf_disp_tchr_type_xtab, margin = 2) * 100, 2)
# rf_disp_tchr_type_xtab_percentage

# Cross-tabulation of rf_disp and teacher type
ra_disp_tchr_type_xtab <- xtabs(~ ra_disp + tchr_type, data = read_data)
# ra_disp_tchr_type_xtab
# Calculate percentage values with two decimal places
ra_disp_tchr_type_xtab_percentage <- round(prop.table(ra_disp_tchr_type_xtab, margin = 2) * 100, 2)
# ra_disp_tchr_type_xtab_percentage

Show code

# Filter out NAs from the dataset
filtered_data <- read_data[complete.cases(read_data), ]

# Create the grouped bar plots
gender_rf_disp <- ggplot(filtered_data, aes(x = rf_disp, fill = gender)) +
  geom_bar(position = "dodge", color = "black") +
  labs(x = "Reading for Fun Displacement", y = "Frequency") +
  scale_y_continuous(breaks = NULL) + # Remove y-axis ticks
  theme_minimal() +
  theme(
    panel.grid.major = element_blank(),
    panel.grid.minor = element_blank(),
    axis.line = element_line(colour = "black"),
    axis.ticks.y = element_blank(), # Remove y-axis ticks
    legend.position = "bottom",
    legend.title = element_blank(),
    legend.text = element_text(size = 8),
    axis.text = element_text(size = 8),
    axis.title = element_text(size = 10),
    plot.title = element_text(size = 12, face = "bold")
  ) +
  scale_fill_manual(
    values = c("#0072B2", "#E69F00"),
    labels = c("Female", "Male")
  )

gender_ra_disp <- ggplot(filtered_data, aes(x = ra_disp, fill = gender)) +
  geom_bar(position = "dodge", color = "black") +
  labs(x = "Reading for Academic Purposes", y = "") +
  scale_y_continuous(breaks = NULL) + # Remove y-axis ticks
  theme_minimal() +
  theme(
    panel.grid.major = element_blank(),
    panel.grid.minor = element_blank(),
    axis.line = element_line(colour = "black"),
    axis.ticks.y = element_blank(), # Remove y-axis ticks
    legend.position = "bottom",
    legend.title = element_blank(),
    legend.text = element_text(size = 8),
    axis.text = element_text(size = 8),
    axis.title = element_text(size = 10),
    plot.title = element_text(size = 12, face = "bold")
  ) +
  scale_fill_manual(
    values = c("#0072B2", "#E69F00"),
    labels = c("Female", "Male")
  )

# Arrange the plots side by side
grid.arrange(gender_rf_disp, gender_ra_disp, ncol = 2)

Note: Please, do not take away anything from the plot. I created it just for fun. It doesn’t show anything meaningful.

Show code

final_data <- read_data |>
  dplyr::select(
    gender, experience, sch_type,
    tchr_type, rf_time, rf_length,
    rf_tv, rf_music, rf_pd, rf_write,
    rf_talk_phone, rf_onl_game, rf_soc_network,
    ra_length, ra_tv, ra_music, ra_write,
    ra_talk_phone, ra_video_game, ra_soc_network,
    ra_other, rf_text, rf_disp, ra_text,
    ra_time, ra_disp
  )
# dim(final_data)
# str(final_data)
# names(final_data)
summary(read_data)

 duration_in_seconds  ip_address           gender   
 Min.   :    74.0    Length:700         Female:182  
 1st Qu.:   418.8    Class :character   Male  :429  
 Median :   812.5    Mode  :character   NA's  : 89  
 Mean   :  2509.7                                   
 3rd Qu.:  1422.5                                   
 Max.   :210923.0                                   
                                                    
             experience            sch_type         tchr_type  
 0-5 years        :313   Rural         :281   pre-service:362  
 6-10 years       :229   Urban/Suburban:412   inservice  :317  
 11-years and more:158   NA's          :  7   NA's       : 21  
                                                               
                                                               
                                                               
                                                               
     q14                            rf_time          rf_length  
 Length:700         6:00 a.m.-11:59 a.m.:165   30 minutes :184  
 Class :character   6:00 p.m.-11:59 p.m.:226   45 minutes :156  
 Mode  :character   Midnight-5:59 a.m.  : 35   60 minutes :155  
                    Noon-6:00 p.m.      :274   15 minutes : 75  
                                               90 minutes : 53  
                                               120 minutes: 51  
                                               (Other)    : 26  
  rf_tv     rf_music    rf_pd     rf_write   rf_talk_phone rf_onl_game
 0   : 64   0   : 36   0   : 30   0   : 67   0   : 49      0   :118   
 1   :124   1   :236   1   :182   1   :235   1   :239      1   :209   
 2   :370   2   :314   2   :360   2   :298   2   :319      2   :296   
 3   :132   3   :103   3   :119   3   : 89   3   : 87      3   : 60   
 NA's: 10   NA's: 11   NA's:  9   NA's: 11   NA's:  6      NA's: 17   
                                                                      
                                                                      
 rf_soc_network                 q19          q20           
 0   : 44       Did not multi-task: 11   Length:700        
 1   :224       No, not at all    :197   Class :character  
 2   :311       Not sure          : 48   Mode  :character  
 3   :113       Yes, a lot        :118                     
 NA's:  8       Yes, some         :317                     
                NA's              :  9                     
                                                           
     q21                  ra_length    ra_tv     ra_music   ra_write  
 Length:700         30 minutes :169   0   : 77   0   : 53   0   : 63  
 Class :character   60 minutes :162   1   :119   1   :230   1   :226  
 Mode  :character   45 minutes :157   2   :342   2   :286   2   :300  
                    15 minutes : 88   3   :154   3   :120   3   :103  
                    90 minutes : 66   NA's:  8   NA's: 11   NA's:  8  
                    120 minutes: 33                                   
                    (Other)    : 25                                   
 ra_talk_phone ra_video_game ra_soc_network ra_other  
 0   : 68      0   :125      0   : 44       0   : 64  
 1   :224      1   :208      1   :255       1   :149  
 2   :300      2   :273      2   :286       2   :188  
 3   : 95      3   : 81      3   :103       3   : 70  
 NA's: 13      NA's: 13      NA's: 12       NA's:229  
                                                      
                                                      
                 q25                     rf_text              rf_disp   
 Did not multi-task:  9   News stories online:232   No, not at all:197  
 No, not at all    :185   Books online       :216   Yes, some     :317  
 Not sure          : 52   Magazines online   :143   Yes, a lot    :118  
 Yes, a lot        :136   Newspapers-In print: 15   NA's          : 68  
 Yes, some         :318   Magazines-In print :  6                       
                          (Other)            :  0                       
                          NA's               : 88                       
                           ra_text                    ra_time   
 Textbook Chapters-Online      :201   6:00 a.m.-11:59 a.m.:201  
 Journal articles-Online       :182   Noon-6:00 p.m.      :286  
 Reports-Online                :143   6:00 p.m.-11:59 p.m.:194  
 Textbook Chapters-In print    : 49   Midnight-5:59 a.m.  : 19  
 Other materials-Please specify: 11                             
 (Other)                       :  9                             
 NA's                          :105                             
           ra_disp   
 No, not at all:185  
 Yes, some     :318  
 Yes, a lot    :136  
 NA's          : 61

A. Reading for Academic Purposes

Show code

ra_data <- final_data[, c("experience", "sch_type", "ra_time", "ra_length", "ra_tv", "ra_music", "ra_talk_phone", "ra_video_game", "ra_soc_network", "ra_disp")]
ra_data <- na.omit(ra_data)

i. Perform the MCA

Show code

mca_result <- MCA(ra_data, graph = TRUE)

ii. Print the results

Create a dimension plot to visualize the positions of categories and variables in the MCA solution space. This plot helps understand the relationships between variables and identify patterns or clusters. The categories and variables that are closer together on the plot are more strongly associated.

Show code

print(mca_result)

**Results of the Multiple Correspondence Analysis (MCA)**
The analysis was performed on 606 individuals, described by 10 variables
*The results are available in the following objects:

   name              description                       
1  "$eig"            "eigenvalues"                     
2  "$var"            "results for the variables"       
3  "$var$coord"      "coord. of the categories"        
4  "$var$cos2"       "cos2 for the categories"         
5  "$var$contrib"    "contributions of the categories" 
6  "$var$v.test"     "v-test for the categories"       
7  "$ind"            "results for the individuals"     
8  "$ind$coord"      "coord. for the individuals"      
9  "$ind$cos2"       "cos2 for the individuals"        
10 "$ind$contrib"    "contributions of the individuals"
11 "$call"           "intermediate results"            
12 "$call$marge.col" "weights of columns"              
13 "$call$marge.li"  "weights of rows"

Show code

fviz_mca_var(mca_result, axes = c(1, 2), col.var = "contrib", gradient.cols = c("blue", "white", "red"))

iii. Extract the weights of the columns

These weights reflect the relative importance of each variable in the MCA analysis. Variables with higher weights have a stronger influence on the analysis, while variables with lower weights have less impact.

Show code

weights <- mca_result$call$marge.col

### iv. Create a data frame to store the weights and variable names
weights_df <- data.frame(variable = names(weights), weight = weights)

# Sort the data frame in descending order of weights
weights_df <- weights_df[order(weights_df$weight, decreasing = TRUE), ]

# Print the sorted data frame
print(weights_df)

                                 variable      weight
Urban/Suburban             Urban/Suburban 0.060561056
ra_tv_2                           ra_tv_2 0.050825083
Yes, some                       Yes, some 0.050825083
0-5 years                       0-5 years 0.045379538
ra_talk_phone_2           ra_talk_phone_2 0.044389439
Noon-6:00 p.m.             Noon-6:00 p.m. 0.043564356
ra_soc_network_2         ra_soc_network_2 0.042409241
ra_music_2                     ra_music_2 0.042244224
ra_video_game_2           ra_video_game_2 0.040594059
Rural                               Rural 0.039438944
ra_soc_network_1         ra_soc_network_1 0.036633663
ra_music_1                     ra_music_1 0.033828383
ra_talk_phone_1           ra_talk_phone_1 0.033498350
6-10 years                     6-10 years 0.033168317
ra_video_game_1           ra_video_game_1 0.030033003
No, not at all             No, not at all 0.028547855
6:00 a.m.-11:59 a.m. 6:00 a.m.-11:59 a.m. 0.027722772
6:00 p.m.-11:59 p.m. 6:00 p.m.-11:59 p.m. 0.026897690
30 minutes                     30 minutes 0.026072607
45 minutes                     45 minutes 0.022277228
60 minutes                     60 minutes 0.022112211
ra_tv_3                           ra_tv_3 0.022112211
11-years and more       11-years and more 0.021452145
Yes, a lot                     Yes, a lot 0.020627063
ra_video_game_0           ra_video_game_0 0.017986799
ra_music_3                     ra_music_3 0.016996700
ra_tv_1                           ra_tv_1 0.016501650
ra_soc_network_3         ra_soc_network_3 0.014851485
15 minutes                     15 minutes 0.013531353
ra_talk_phone_3           ra_talk_phone_3 0.012706271
ra_video_game_3           ra_video_game_3 0.011386139
ra_tv_0                           ra_tv_0 0.010561056
ra_talk_phone_0           ra_talk_phone_0 0.009405941
90 minutes                     90 minutes 0.008085809
ra_music_0                     ra_music_0 0.006930693
ra_soc_network_0         ra_soc_network_0 0.006105611
120 minutes                   120 minutes 0.004950495
Midnight-5:59 a.m.     Midnight-5:59 a.m. 0.001815182
0 minutes                       0 minutes 0.001815182
180 minutes or more   180 minutes or more 0.001155116

iv. Eigenvalues

Show code

# Extract the eigenvalues
eigenvalues <- mca_result$eig

# Round the eigenvalues to two decimal places
rounded_eigenvalues <- round(eigenvalues, 3)

# Print the rounded eigenvalues
print(rounded_eigenvalues)

       eigenvalue percentage of variance cumulative percentage of variance
dim 1       0.319                 10.625                            10.625
dim 2       0.262                  8.738                            19.363
dim 3       0.160                  5.323                            24.686
dim 4       0.153                  5.084                            29.771
dim 5       0.141                  4.703                            34.474
dim 6       0.126                  4.199                            38.672
dim 7       0.121                  4.023                            42.695
dim 8       0.114                  3.786                            46.481
dim 9       0.111                  3.688                            50.169
dim 10      0.107                  3.552                            53.721
dim 11      0.102                  3.390                            57.112
dim 12      0.100                  3.336                            60.448
dim 13      0.095                  3.169                            63.617
dim 14      0.094                  3.132                            66.749
dim 15      0.088                  2.929                            69.678
dim 16      0.086                  2.858                            72.536
dim 17      0.082                  2.739                            75.274
dim 18      0.079                  2.642                            77.916
dim 19      0.075                  2.511                            80.427
dim 20      0.071                  2.380                            82.807
dim 21      0.068                  2.252                            85.059
dim 22      0.062                  2.067                            87.126
dim 23      0.061                  2.047                            89.174
dim 24      0.057                  1.910                            91.084
dim 25      0.056                  1.868                            92.952
dim 26      0.053                  1.759                            94.711
dim 27      0.045                  1.494                            96.205
dim 28      0.045                  1.487                            97.692
dim 29      0.041                  1.372                            99.064
dim 30      0.028                  0.936                           100.000

v. Contributions of variables to dimensions (Inertia)

Assess the relationships between variables and dimensions. Variables that have higher associations (higher cosine similarity) with specific dimensions are more strongly related to those dimensions. This information can help identify the variables that contribute most to each dimension and understand the underlying patterns in the data.

Show code

var_contributions <- mca_result$var$contrib
rounded_contributions <- round(mca_result$var$contrib, 3)
# Sort the rounded contributions in descending order
# sorted_contributions <- sort(rounded_contributions, decreasing = TRUE)
# Print the sorted contributions
# print(sorted_contributions)
print(rounded_contributions)

                      Dim 1  Dim 2  Dim 3  Dim 4  Dim 5
0-5 years             0.360  0.144  0.180  1.551  6.684
6-10 years            0.144  0.390  0.003  0.285 12.151
11-years and more     1.807  0.050  0.299  1.316  0.330
Rural                 1.127  0.839  0.005  0.005  6.418
Urban/Suburban        0.734  0.546  0.003  0.004  4.179
6:00 a.m.-11:59 a.m.  0.010  0.838  4.134  0.018  5.508
Noon-6:00 p.m.        0.329  0.574  0.069  0.823  0.602
6:00 p.m.-11:59 p.m.  0.153  0.007  3.678  0.024  0.569
Midnight-5:59 a.m.    2.894  0.199  0.518 10.999  6.079
0 minutes             0.000  0.001  0.130  6.600  2.183
15 minutes            0.559  0.459  3.736  7.375  5.229
30 minutes            0.416  0.289  0.007  0.000  1.494
45 minutes            0.378  1.132  0.010  2.840  4.625
60 minutes            1.266  0.031  1.470  3.321  5.461
90 minutes            0.141  0.299  2.250  0.018  0.022
120 minutes           2.458  0.513  1.480  0.030  0.198
180 minutes or more   1.308  0.337  2.067  8.371  2.166
ra_tv_0              13.996  2.479  0.052  1.839  0.400
ra_tv_1               0.692  0.458  0.003 17.703  0.179
ra_tv_2               0.642  4.643  0.559  4.963  0.003
ra_tv_3               4.367  7.637  1.544  0.462  0.026
ra_music_0           12.763  3.897  0.242  1.691  0.246
ra_music_1            0.576  3.633 11.286  0.615  1.013
ra_music_2            0.716  1.093 14.938  0.040  0.048
ra_music_3            4.071  9.466  2.783  0.347  0.573
ra_talk_phone_0       8.663  1.311  1.024  1.884  5.228
ra_talk_phone_1       0.746  1.202  6.426  3.788  1.793
ra_talk_phone_2       1.083  2.304  6.921  1.481  0.006
ra_talk_phone_3       3.959 13.191  2.796  0.087  0.124
ra_video_game_0      15.227  3.177  0.004  0.726  0.264
ra_video_game_1       0.009  3.045  0.451  7.162  1.943
ra_video_game_2       1.805  2.320  0.006  2.943  1.456
ra_video_game_3       4.911 12.040  0.739  0.001  0.436
ra_soc_network_0      6.100  1.147  0.398  0.239  6.969
ra_soc_network_1      1.195  1.149  4.947  4.377  5.435
ra_soc_network_2      0.694  2.441  7.214  3.315  0.686
ra_soc_network_3      3.582 13.228  2.103  0.011  0.324
No, not at all        0.082  0.895  1.181  1.185  3.559
Yes, some             0.035  0.090  2.021  1.301  0.004
Yes, a lot            0.002  2.505 12.321  0.259  5.387

A factor analysis was conducted to examine the underlying dimensions of the variables. The analysis revealed five distinct dimensions (Dim 1 to Dim 5) that accounted for the variability in the data. Each dimension appears to represent a unique pattern of variables. Notably, Dim 1 showed moderate positive loadings for variables related to 11-years and more, indicating a potential relationship between this age group and other factors. Dim 2 showed a mix of positive and negative loadings for various variables, suggesting a complex relationship between them. Dim 3 exhibited higher loadings for variables related to specific time intervals, indicating a potential temporal association. Dim 4 showed a mix of loadings for different variables, suggesting a diverse pattern of associations. Dim 5 demonstrated higher loadings for variables related to ‘Yes, a lot,’ potentially indicating a strong association with this response category. Further analysis and contextual information are required to provide a comprehensive interpretation of these dimensions.

vi. Coordinates of categories

Calculate the category contributions to each dimension. Category contributions indicate the extent to which a particular category contributes to the overall structure of each dimension. Higher contributions suggest that the category has a stronger association with that dimension.

Show code

categories_coordinates <- mca_result$var$coord
round(categories_coordinates, 3)

                      Dim 1  Dim 2  Dim 3  Dim 4  Dim 5
0-5 years            -0.159  0.091  0.080  0.228 -0.456
6-10 years           -0.118 -0.176 -0.012 -0.115  0.719
11-years and more     0.518  0.078 -0.149 -0.306 -0.147
Rural                -0.302 -0.236 -0.014 -0.015  0.479
Urban/Suburban        0.197  0.154  0.009  0.009 -0.312
6:00 a.m.-11:59 a.m. -0.034  0.282  0.488  0.032 -0.529
Noon-6:00 p.m.       -0.155 -0.186 -0.050 -0.170  0.140
6:00 p.m.-11:59 p.m.  0.134 -0.025 -0.467  0.037  0.173
Midnight-5:59 a.m.    2.254  0.536  0.675  3.040  2.174
0 minutes             0.004 -0.035 -0.338  2.355  1.303
15 minutes           -0.363  0.298  0.664  0.912 -0.738
30 minutes           -0.225  0.170 -0.021 -0.002  0.284
45 minutes           -0.233 -0.365 -0.027 -0.441  0.541
60 minutes            0.427 -0.061 -0.326 -0.479 -0.590
90 minutes           -0.236 -0.312 -0.667  0.058  0.062
120 minutes           1.258  0.521  0.691 -0.096 -0.237
180 minutes or more   1.900  0.874  1.691  3.325  1.627
ra_tv_0               2.055  0.784 -0.089 -0.515 -0.231
ra_tv_1               0.366 -0.270  0.018  1.279  0.124
ra_tv_2              -0.201 -0.489 -0.132 -0.386 -0.010
ra_tv_3              -0.793  0.951  0.334  0.179  0.040
ra_music_0            2.423  1.214  0.236 -0.610  0.224
ra_music_1            0.233 -0.531 -0.730  0.167 -0.206
ra_music_2           -0.232 -0.260  0.751  0.038  0.040
ra_music_3           -0.874  1.208 -0.511 -0.177  0.218
ra_talk_phone_0       1.713  0.604  0.417 -0.553  0.886
ra_talk_phone_1       0.266 -0.307 -0.553  0.415 -0.275
ra_talk_phone_2      -0.279 -0.369  0.499 -0.226 -0.014
ra_talk_phone_3      -0.997  1.650 -0.593  0.102  0.117
ra_video_game_0       1.643  0.680 -0.020 -0.248 -0.144
ra_video_game_1      -0.030 -0.516  0.155  0.603  0.302
ra_video_game_2      -0.376 -0.387 -0.016 -0.333 -0.225
ra_video_game_3      -1.173  1.665 -0.322 -0.013  0.232
ra_soc_network_0      1.785  0.702  0.323 -0.244  1.269
ra_soc_network_1      0.322 -0.287 -0.464  0.427 -0.458
ra_soc_network_2     -0.228 -0.388  0.521 -0.345  0.151
ra_soc_network_3     -0.877  1.528 -0.476  0.033  0.175
No, not at all        0.096 -0.287 -0.257  0.252  0.419
Yes, some            -0.047 -0.068 -0.252 -0.198  0.011
Yes, a lot           -0.017  0.564  0.977  0.139 -0.607

vii. Plotting the results as a biplot

A biplot, which combines the dimension plot with the variable plot. This plot allows you to visualize both the relationships between categories and the relationships between variables in the same plot. It helps interpret the associations between categories, variables, and dimensions simultaneously. This can lead to a better understanding of the relationships and provide valuable information for further analysis or decision-making.

Show code

plot.MCA(mca_result, cex = 0.8)

ix. Calculate Chi-square Statistics

Show code

eigenvalues_m <- mca_result$eig
chi_square_stats <- eigenvalues_m^2
df <- 1 # Degrees of freedom for each dimension
p_values <- 1 - pchisq(chi_square_stats, df)


for (i in seq_along(chi_square_stats)) {
  cat("Dimension", i, "Chi-square:", chi_square_stats[i], "df:", df, "p-value:", p_values[i], "\n")
}

Dimension 1 Chi-square: 0.1016094 df: 1 p-value: 0.7499068 
Dimension 2 Chi-square: 0.06871698 df: 1 p-value: 0.7932141 
Dimension 3 Chi-square: 0.02550076 df: 1 p-value: 0.8731256 
Dimension 4 Chi-square: 0.02326633 df: 1 p-value: 0.8787665 
Dimension 5 Chi-square: 0.01990572 df: 1 p-value: 0.8878007 
Dimension 6 Chi-square: 0.01586628 df: 1 p-value: 0.8997625 
Dimension 7 Chi-square: 0.01456391 df: 1 p-value: 0.9039437 
Dimension 8 Chi-square: 0.01290278 df: 1 p-value: 0.9095625 
Dimension 9 Chi-square: 0.01223808 df: 1 p-value: 0.911913 
Dimension 10 Chi-square: 0.0113574 df: 1 p-value: 0.9151292 
Dimension 11 Chi-square: 0.01034544 df: 1 p-value: 0.9189848 
Dimension 12 Chi-square: 0.01001456 df: 1 p-value: 0.9202866 
Dimension 13 Chi-square: 0.009039286 df: 1 p-value: 0.9242551 
Dimension 14 Chi-square: 0.008827199 df: 1 p-value: 0.9251464 
Dimension 15 Chi-square: 0.007721398 df: 1 p-value: 0.9299788 
Dimension 16 Chi-square: 0.007351747 df: 1 p-value: 0.9316713 
Dimension 17 Chi-square: 0.006750329 df: 1 p-value: 0.9345192 
Dimension 18 Chi-square: 0.006280084 df: 1 p-value: 0.9368362 
Dimension 19 Chi-square: 0.005676406 df: 1 p-value: 0.9399427 
Dimension 20 Chi-square: 0.005097895 df: 1 p-value: 0.9430798 
Dimension 21 Chi-square: 0.004563512 df: 1 p-value: 0.9461409 
Dimension 22 Chi-square: 0.003845377 df: 1 p-value: 0.950554 
Dimension 23 Chi-square: 0.00377282 df: 1 p-value: 0.9510221 
Dimension 24 Chi-square: 0.003284831 df: 1 p-value: 0.9542955 
Dimension 25 Chi-square: 0.003140405 df: 1 p-value: 0.9553105 
Dimension 26 Chi-square: 0.002784244 df: 1 p-value: 0.9579184 
Dimension 27 Chi-square: 0.002008099 df: 1 p-value: 0.9642573 
Dimension 28 Chi-square: 0.001990375 df: 1 p-value: 0.9644153 
Dimension 29 Chi-square: 0.001695185 df: 1 p-value: 0.9671583 
Dimension 30 Chi-square: 0.0007883342 df: 1 p-value: 0.9776005 
Dimension 31 Chi-square: 112.8993 df: 1 p-value: 0 
Dimension 32 Chi-square: 76.3522 df: 1 p-value: 0 
Dimension 33 Chi-square: 28.33418 df: 1 p-value: 1.020777e-07 
Dimension 34 Chi-square: 25.85147 df: 1 p-value: 3.687233e-07 
Dimension 35 Chi-square: 22.11747 df: 1 p-value: 2.564658e-06 
Dimension 36 Chi-square: 17.6292 df: 1 p-value: 2.684357e-05 
Dimension 37 Chi-square: 16.18212 df: 1 p-value: 5.753452e-05 
Dimension 38 Chi-square: 14.33642 df: 1 p-value: 0.0001528783 
Dimension 39 Chi-square: 13.59787 df: 1 p-value: 0.0002264426 
Dimension 40 Chi-square: 12.61933 df: 1 p-value: 0.0003817777 
Dimension 41 Chi-square: 11.49494 df: 1 p-value: 0.0006978603 
Dimension 42 Chi-square: 11.12729 df: 1 p-value: 0.0008506695 
Dimension 43 Chi-square: 10.04365 df: 1 p-value: 0.001528739 
Dimension 44 Chi-square: 9.807999 df: 1 p-value: 0.001737544 
Dimension 45 Chi-square: 8.579331 df: 1 p-value: 0.003400003 
Dimension 46 Chi-square: 8.168607 df: 1 p-value: 0.004262162 
Dimension 47 Chi-square: 7.500366 df: 1 p-value: 0.006168646 
Dimension 48 Chi-square: 6.977871 df: 1 p-value: 0.008252371 
Dimension 49 Chi-square: 6.307118 df: 1 p-value: 0.01202541 
Dimension 50 Chi-square: 5.664328 df: 1 p-value: 0.01731335 
Dimension 51 Chi-square: 5.070568 df: 1 p-value: 0.0243354 
Dimension 52 Chi-square: 4.272641 df: 1 p-value: 0.03873068 
Dimension 53 Chi-square: 4.192022 df: 1 p-value: 0.04061463 
Dimension 54 Chi-square: 3.649812 df: 1 p-value: 0.05607554 
Dimension 55 Chi-square: 3.489339 df: 1 p-value: 0.06176523 
Dimension 56 Chi-square: 3.093604 df: 1 p-value: 0.07860054 
Dimension 57 Chi-square: 2.231221 df: 1 p-value: 0.1352469 
Dimension 58 Chi-square: 2.211528 df: 1 p-value: 0.1369829 
Dimension 59 Chi-square: 1.883539 df: 1 p-value: 0.1699325 
Dimension 60 Chi-square: 0.8759269 df: 1 p-value: 0.3493197 
Dimension 61 Chi-square: 112.8993 df: 1 p-value: 0 
Dimension 62 Chi-square: 374.9406 df: 1 p-value: 0 
Dimension 63 Chi-square: 609.4168 df: 1 p-value: 0 
Dimension 64 Chi-square: 886.3007 df: 1 p-value: 0 
Dimension 65 Chi-square: 1188.438 df: 1 p-value: 0 
Dimension 66 Chi-square: 1495.557 df: 1 p-value: 0 
Dimension 67 Chi-square: 1822.875 df: 1 p-value: 0 
Dimension 68 Chi-square: 2160.528 df: 1 p-value: 0 
Dimension 69 Chi-square: 2516.93 df: 1 p-value: 0 
Dimension 70 Chi-square: 2885.987 df: 1 p-value: 0 
Dimension 71 Chi-square: 3261.758 df: 1 p-value: 0 
Dimension 72 Chi-square: 3653.907 df: 1 p-value: 0 
Dimension 73 Chi-square: 4047.089 df: 1 p-value: 0 
Dimension 74 Chi-square: 4455.363 df: 1 p-value: 0 
Dimension 75 Chi-square: 4854.962 df: 1 p-value: 0 
Dimension 76 Chi-square: 5261.418 df: 1 p-value: 0 
Dimension 77 Chi-square: 5666.222 df: 1 p-value: 0 
Dimension 78 Chi-square: 6070.884 df: 1 p-value: 0 
Dimension 79 Chi-square: 6468.547 df: 1 p-value: 0 
Dimension 80 Chi-square: 6857.042 df: 1 p-value: 0 
Dimension 81 Chi-square: 7235.042 df: 1 p-value: 0 
Dimension 82 Chi-square: 7590.955 df: 1 p-value: 0 
Dimension 83 Chi-square: 7951.919 df: 1 p-value: 0 
Dimension 84 Chi-square: 8296.291 df: 1 p-value: 0 
Dimension 85 Chi-square: 8640.066 df: 1 p-value: 0 
Dimension 86 Chi-square: 8970.14 df: 1 p-value: 0 
Dimension 87 Chi-square: 9255.315 df: 1 p-value: 0 
Dimension 88 Chi-square: 9543.662 df: 1 p-value: 0 
Dimension 89 Chi-square: 9813.694 df: 1 p-value: 0 
Dimension 90 Chi-square: 10000 df: 1 p-value: 0

viii. Single Dimension Plot and Corresponding Variable Contribution

The red dotted line provides a visual reference point for interpreting the significance of variable contributions in relation to the expected distribution. Variables with contributions significantly above or below this line are the ones that are contributing the most to the variation along the specific dimension.

Above the Red Dotted Line: If a variable’s contribution bar is above the red dotted line, it indicates that the observed distribution of that variable’s categories is contributing more than expected to the variation in the specific dimension. In other words, the variable’s category distribution is not uniform across levels of the dimension and is contributing disproportionately to the dimension.
Below the Red Dotted Line: Conversely, if a variable’s contribution bar is below the red dotted line, it suggests that the variable’s category distribution is contributing less than expected to the variation in the dimension. This indicates that the variable’s category distribution is more uniform across levels of the dimension.

Show code

library(factoextra)

# Extract the variable contributions for each dimension
var_contributions <- mca_result$var$contrib

# Create separate plots for each dimension
for (dim in 1:5) {
  plot_title <- paste("Contribution of Variables to Dimension", dim)

  # Create the contribution plot for the current dimension
  p <- fviz_contrib(mca_result, choice = "var", axes = c(dim), gradient.cols = c("blue", "white", "red")) +
    ggtitle(plot_title)

  # Display the plot
  print(p)
}