Biostatistics Project

Abstract

Introduction & Background

Methods

Data Presentation

Demographic and Socioeconomic Characteristics of the Study Population

The table summarizes the demographic and socioeconomic characteristics of the study population, categorized into two groups: Group 1 (Infants, N = 348) and Group 2 (Children, N = 1,341). Key variables include child and parent-related information such as age, sex, education level, and residence.

Notable observations include:

  1. Group 1 has little missing data: missing percentages are below 11% for every variable except “Religiosity” (22.4%).
  2. Group 2 has higher percentages of missing data, particularly for “Parent’s Education” (15.1%) and “Religiosity” (28.1%).
  3. Overall, the data provide a clear socio-demographic profile of the two groups, with higher data completeness in the infant group.

This section highlights the completeness and differences in key socio-demographic variables for robust subgroup comparisons in subsequent analyses.

Measure | Group 1: Infants (N = 348) | Group 2: Children (N = 1,341)
Measure rows show Mean (Min - Max); Missing rows show Count (%).
Child’s Sex 0.56 (0.00 - 1.00) 0.50 (0.00 - 2.00)
Missing 0 (0%) 23 (1.7%)
Child’s Age (Years) 0.10 (0.00 - 3.00) 7.88 (1.00 - 18.00)
Missing 0 (0%) 31 (2.3%)
Birth Order 1.63 (1.00 - 2.00) 1.51 (1.00 - 2.00)
Missing 1 (0.3%) 23 (1.7%)
Child’s Diagnosis 0.05 (0.00 - 1.00) 0.15 (0.00 - 1.00)
Missing 3 (0.9%) 30 (2.2%)
Parent’s Education 16.14 (4.00 - 25.00) 16.82 (10.00 - 30.00)
Missing 38 (10.9%) 203 (15.1%)
Parent’s Age 33.73 (21.00 - 50.00) 39.70 (22.00 - 64.00)
Missing 34 (9.8%) 200 (14.9%)
Parent’s Sex 1.00 (1.00 - 1.00) 0.92 (0.00 - 2.00)
Missing 0 (0%) 195 (14.5%)
Parent’s Chronic Illness 0.09 (0.00 - 1.00) 0.12 (0.00 - 1.00)
Missing 36 (10.3%) 194 (14.5%)
Residence 2.64 (1.00 - 6.00) 2.79 (1.00 - 6.00)
Missing 37 (10.6%) 200 (14.9%)
Marital Status 0.97 (0.00 - 1.00) 0.94 (0.00 - 1.00)
Missing 34 (9.8%) 195 (14.5%)
Religion 1.28 (1.00 - 5.00) 1.23 (1.00 - 5.00)
Missing 36 (10.3%) 192 (14.3%)
Jewish/Arab Identity 0.15 (0.00 - 1.00) 0.17 (0.00 - 1.00)
Missing 31 (8.9%) 192 (14.3%)
Religiosity 1.63 (1.00 - 4.00) 1.66 (1.00 - 4.00)
Missing 78 (22.4%) 377 (28.1%)

Overview of Questionnaire Results for Group 1: Infants

The table provides a detailed summary of questionnaire results for Group 1 (Infants, N = 348). Each measure is represented by its mean value along with the minimum and maximum range, as well as the number and percentage of missing values.

Key insights include:

  1. War Exposure (Sum) and Mean Danger indicate moderate exposure levels, with means of 1.98 and 2.91, respectively, and 7.76% missing data each.
  2. Burnout, Depression, and Anxiety reflect emotional and psychological conditions, with moderate mean values of 2.06, 4.47, and 3.02, respectively, and low missing percentages (all below 7.5%).
  3. Co-Parenting Support has a high mean score of 4.37 on its 1-5 range, suggesting positive perceptions of co-parenting, whereas Support has a mid-range mean of 2.55. Both have missing data percentages below 7.5%.

This summary highlights the completeness and variability of questionnaire responses for Group 1, enabling meaningful interpretations of relevant measurements in the context of the study objectives.

Overall (N = 348)
Measure Mean (Min - Max) Missing (Count, %)
Total ACE 0.84 (0.00 - 8.00) 32 (9.20%)
War Exposure (Sum) 1.98 (0.00 - 8.00) 27 (7.76%)
Total Violence 0.51 (0.00 - 3.00) 29 (8.33%)
Mean Danger 2.91 (1.00 - 5.00) 27 (7.76%)
Burnout 2.06 (1.00 - 4.80) 26 (7.47%)
Depression 4.47 (0.00 - 19.00) 24 (6.90%)
Anxiety 3.02 (0.00 - 20.00) 24 (6.90%)
Co-Parenting Support 4.37 (1.00 - 5.00) 25 (7.18%)
Support 2.55 (1.00 - 5.00) 26 (7.47%)
CSE 3.86 (1.33 - 5.00) 31 (8.91%)
Severe Exposure (Binary) 0.38 (0.00 - 1.00) 27 (7.76%)
Mefunim Score 2.81 (1.00 - 3.00) 34 (9.77%)
Child Current War 1.76 (1.00 - 4.00) 30 (8.62%)
EA INT 3.24 (1.00 - 4.67) 0 (0.00%)
EA Affect 4.61 (1.60 - 5.00) 2 (0.57%)
BPSC (Sum) 1.02 (0.00 - 3.00) 0 (0.00%)
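
Each row of the table above reports a mean, a min-max range, and a missing count with its percentage. As a minimal sketch of how such a row could be computed, the following uses plain Python on toy values; the function name `summarize` and the sample data are illustrative assumptions, not the code or data used in the study.

```python
from typing import Optional, Sequence


def summarize(values: Sequence[Optional[float]]) -> dict:
    """Mean, min, max, and missing count/percent for one measure.

    None marks a missing response; statistics use present values only.
    """
    present = [v for v in values if v is not None]
    n_missing = len(values) - len(present)
    return {
        "mean": round(sum(present) / len(present), 2) if present else None,
        "min": min(present) if present else None,
        "max": max(present) if present else None,
        "missing": n_missing,
        "missing_pct": round(100 * n_missing / len(values), 2) if values else None,
    }


# Toy data (not from the study): eight responses, two of them missing.
burnout = [1.0, 2.5, None, 3.0, 1.5, None, 2.0, 4.0]
summary = summarize(burnout)
print(summary)

# The percentage column is the missing count divided by N,
# e.g. the Burnout row: 26 missing out of 348 records.
print(round(100 * 26 / 348, 2))  # 7.47
```

The percentage uses the full group size as the denominator, which matches the reported rows (for example, 26 of 348 gives 7.47%).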

Overview of Questionnaire Results for Group 2: Children

The table provides a detailed summary of questionnaire results for Group 2 (Children, N = 1341). Each measure is represented by its mean value along with the minimum and maximum range, as well as the number and percentage of missing values.

Key insights include:

  1. War Exposure (Sum) and Mean Danger show moderate levels, with means of 2.04 and 2.82, respectively, and relatively low missing data (11.11% each).
  2. Co-Parenting Support has a moderately high mean score of 4.13, with 11.93% missing data.
  3. The communication measures (Avoidant Communications, Active Communications, and Overwhelmed Communications) have mean scores of 3.02, 2.40, and 1.98, respectively, with low missing percentages (3.65% to 4.03%).
  4. Total ACE and Total Violence show low mean scores of 0.94 and 0.53, with missing data percentages of 13.87% and 13.05%, respectively.

This table provides a comprehensive overview of questionnaire results for Group 2, highlighting the variability in responses and the minimal data loss for most measures, enabling reliable analysis of children’s responses within the study context.

Overall (N = 1341)
Measure Mean (Min - Max) Missing (Count, %)
Total ACE 0.94 (0.00 - 10.00) 186 (13.87%)
War Exposure (Sum) 2.04 (0.00 - 10.00) 149 (11.11%)
Total Violence 0.53 (0.00 - 4.00) 175 (13.05%)
Mean Danger 2.82 (1.00 - 5.00) 149 (11.11%)
Burnout 2.22 (1.00 - 5.00) 139 (10.37%)
Depression 6.00 (0.00 - 21.00) 135 (10.07%)
Anxiety 3.79 (0.00 - 21.00) 136 (10.14%)
Co-Parenting Support 4.13 (1.00 - 5.00) 160 (11.93%)
Avoidant Communications 3.02 (1.00 - 5.00) 49 (3.65%)
Active Communications 2.40 (1.00 - 5.00) 54 (4.03%)
Overwhelmed Communications 1.98 (1.00 - 5.00) 52 (3.88%)

Distribution of Missing Data Across Observations

This figure shows the pattern of missing data across observations and measures. Each row represents an observation, and each column corresponds to a measure, with missing values denoted in black and present values in gray. Overall, 37.4% of the data is missing, while 62.6% is present. The visualization highlights patterns of missingness, showing that some variables have a consistent proportion of missing data, whereas others exhibit sporadic gaps. Additionally, certain observations show substantially higher missingness than the rest, which could have implications for subsequent analyses. These patterns are addressed before model generation to support the validity and robustness of the results.

Distribution of Missing Values Per Record

This histogram illustrates the distribution of the number of missing values per record across the dataset. The x-axis represents the number of missing values per record, while the y-axis displays the frequency of records within each bin. The distribution shows a distinct peak around 100 missing values, indicating that a large proportion of records have similar levels of missingness. However, there are notable clusters at higher numbers of missing values, suggesting that certain records have a significantly higher degree of incompleteness. These variations in missingness highlight the need for careful handling during preprocessing to mitigate potential biases and ensure robust modeling.

Descriptive Statistics

Handling Missing Data and Configuring Measurements

To improve the reliability and usability of the dataset, a robust missing data handling process was implemented. Records with more than 100 missing values were excluded, reducing the dataset from 1,689 to 1,480 observations. This filtering step ensures that analyses are performed on a dataset with fewer missing values, minimizing biases and improving the validity of statistical models. By focusing on observations with acceptable levels of completeness, the dataset becomes more representative and reliable for subsequent analyses.
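
The exclusion rule described above (drop any record with more than a fixed number of missing values) can be sketched as follows. This is an illustrative sketch on toy records, not the code used in the study; the names `count_missing` and `drop_incomplete` are assumptions, and the toy threshold of 1 stands in for the report's threshold of 100.

```python
from typing import Optional

Record = dict[str, Optional[float]]


def count_missing(record: Record) -> int:
    """Number of fields in a record whose value is missing (None)."""
    return sum(1 for value in record.values() if value is None)


def drop_incomplete(records: list[Record], max_missing: int) -> list[Record]:
    """Keep records with at most `max_missing` missing values."""
    return [r for r in records if count_missing(r) <= max_missing]


# Toy records (not study data) with 0, 1, and 3 missing fields.
records = [
    {"Dep": 4.0, "Anx": 3.0, "Burnout": 2.0},
    {"Dep": None, "Anx": 3.0, "Burnout": 2.0},
    {"Dep": None, "Anx": None, "Burnout": None},
]

# The report's rule corresponds to max_missing=100; 1 is used for the toy data.
kept = drop_incomplete(records, max_missing=1)
print(len(records), "->", len(kept))  # 3 -> 2
```

Note that a record with exactly the threshold number of missing values is kept, since the report excludes only records with more than 100 missing values.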

Additionally, new measurements and configurations were introduced to enhance the interpretability and alignment of the dataset with the study’s objectives. Key variables, such as “Dep” (depression severity), “Anx” (anxiety severity), and “sdq_problems” (behavioral problems), were categorized into meaningful labels. For example, “Dep” was segmented into five categories: Normal, Mild, Moderate, Severe, and Extremely Severe. Similarly, binary variables were created for depression and anxiety to identify severe cases with specific cutoff thresholds, such as dep_binary (severe depression) and anx_binary (severe anxiety).

In addition, behavioral problem metrics were refined, including the creation of SDQ_child_behavior_problem to classify child behavior based on severity and infant_behavior_problems to identify high-risk infants. These configurations enhance the granularity of the dataset, allowing for more nuanced analyses and facilitating targeted insights. By addressing missing data and configuring key variables, the dataset is now well-prepared for robust statistical modeling and hypothesis testing.
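
The severity labels and binary flags described above can be illustrated with a small sketch. The report does not list its cutoff values, so the numbers in `DEP_CUTOFFS` and the `threshold` default below are hypothetical placeholders, and the function names are illustrative rather than taken from the study code.

```python
from bisect import bisect_right
from typing import Optional

LABELS = ["Normal", "Mild", "Moderate", "Severe", "Extremely Severe"]

# HYPOTHETICAL cutoffs: lowest score that falls into Mild, Moderate,
# Severe, and Extremely Severe. Replace with the study's actual values.
DEP_CUTOFFS = [5, 7, 11, 14]


def severity_label(score: Optional[float], cutoffs=DEP_CUTOFFS) -> Optional[str]:
    """Map a raw score to one of five ordered severity labels."""
    if score is None:
        return None  # keep missing scores missing
    return LABELS[bisect_right(cutoffs, score)]


def severe_flag(score: Optional[float], threshold: float = 11) -> Optional[int]:
    """Binary indicator: 1 if the score reaches the (hypothetical) severe cutoff."""
    if score is None:
        return None
    return int(score >= threshold)


for s in [4, 5, 11, 19, None]:
    print(s, severity_label(s), severe_flag(s))
```

Missing scores are passed through as missing rather than being assigned to the lowest category, so that later analyses can decide how to handle them.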

Correlation Analysis - Infant Group

The correlation matrix for the infant group highlights relationships among various socio-demographic, behavioral, and psychological variables. Strong positive correlations are observed between parental education (p_edu) and coparenting support, indicating that higher educational attainment is associated with better perceived co-parenting dynamics. Similarly, there is a notable positive correlation between infant behavior problems and parental stress measures (e.g., Burnout and Depression (Dep)), suggesting that caregiver stress may be linked to behavioral challenges in infants.

Negative correlations are evident between child exposure to adverse events (ACE_TOTAL) and coparenting support, reflecting that adverse experiences might be mitigated in environments with stronger parental collaboration. Additionally, variables such as religion and Jewish/Arab identity (JewishArab) display weak correlations with other socio-demographic measures, indicating these factors might play a limited role in influencing key behavioral or stress-related outcomes within this group.

This analysis underscores the interplay between caregiver well-being, educational background, and infant behavioral measures, providing insights that may guide targeted interventions for supporting infant development in high-stress environments.
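
Because many variables contain missing values, each cell of a correlation matrix has to be computed from the records where both variables are present. The sketch below assumes Pearson coefficients with pairwise deletion, which the report does not state explicitly; the function name and the toy values are illustrative only.

```python
from math import sqrt
from typing import Optional, Sequence


def pearson_pairwise(x: Sequence[Optional[float]],
                     y: Sequence[Optional[float]]) -> Optional[float]:
    """Pearson r using only positions where both values are present."""
    pairs = [(a, b) for a, b in zip(x, y) if a is not None and b is not None]
    n = len(pairs)
    if n < 2:
        return None  # not enough complete pairs
    mean_x = sum(a for a, _ in pairs) / n
    mean_y = sum(b for _, b in pairs) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in pairs)
    sxx = sum((a - mean_x) ** 2 for a, _ in pairs)
    syy = sum((b - mean_y) ** 2 for _, b in pairs)
    if sxx == 0 or syy == 0:
        return None  # a constant variable has no defined correlation
    return sxy / sqrt(sxx * syy)


# Toy values (not study data); None marks a missing response.
p_edu = [12, 16, None, 18, 20, 14]
coparenting_support = [3.0, 4.0, 4.5, None, 4.5, 3.5]

r = pearson_pairwise(p_edu, coparenting_support)
print(round(r, 3))
```

With pairwise deletion, different cells of the matrix can be based on different numbers of records, which is worth keeping in mind when comparing coefficients.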

Correlation Analysis - Child Group

The correlation matrix for the child group provides a detailed view of the relationships between socio-demographic, psychological, and behavioral variables. A strong positive correlation is observed between parental education (p_edu) and coparenting support, consistent with findings in the infant group. This highlights the importance of educational background in fostering positive parental dynamics. Additionally, there is a noticeable correlation between child behavioral problems (sdq_problem_binary) and both parental stress measures (e.g., Depression (Dep) and Anxiety (Anx)), suggesting that child behavior challenges are closely linked to caregiver stress levels.

Notable correlations also exist between child exposure to adverse events (ACE_TOTAL) and measures such as war exposure (war_exposure_sum) and violence exposure (Violence_TOTAL), reflecting the cumulative impact of these factors on the child’s environment. Weak negative correlations are evident between coparenting support and stress-related measures, indicating that stronger parental collaboration may act as a buffer against stress.

Other weak correlations between socio-demographic variables, such as religion and Jewish/Arab identity (JewishArab), suggest limited influence on behavioral or psychological outcomes. This matrix underscores the intricate interplay of environmental, parental, and behavioral factors in shaping the outcomes for children in this group, offering insights for targeted interventions aimed at improving child well-being.

1. Infant Behavior Problems Model

The Random Forest model was applied to predict infant behavior problems using a comprehensive set of predictors related to the infant’s socio-demographic, health, and environmental factors. The results, visualizations, and evaluation metrics provide valuable insights into the model’s performance and the importance of predictors.

Key Evaluation Metrics:

  1. Confusion Matrix: The model correctly identified 2 true positives (infants with behavior problems) and 54 true negatives (infants without behavior problems). However, it produced 1 false positive and 2 false negatives.
  2. Sensitivity: The model achieved a sensitivity of 0.5, reflecting its ability to correctly identify 50% of the true positives (infants with behavior problems).
  3. Specificity: A specificity of 0.98 demonstrates that 98% of infants without behavior problems were correctly classified.
  4. AUC (Area Under the Curve): The AUC score of 0.92 signifies an excellent level of discrimination between infants with and without behavior problems.
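
Sensitivity and specificity follow directly from the confusion matrix counts. As a small worked sketch, the code below takes the counts from the R output printed for this model (rows are predicted classes, columns are actual classes); the function names are illustrative.

```python
def sensitivity(tp: int, fn: int) -> float:
    """Share of actual positives that the model predicted as positive."""
    return tp / (tp + fn)


def specificity(tn: int, fp: int) -> float:
    """Share of actual negatives that the model predicted as negative."""
    return tn / (tn + fp)


# Counts read from the printed matrix for infant_behavior_problems.
tn, fn = 54, 2  # row "Predicted 0": actual 0, actual 1
fp, tp = 1, 2   # row "Predicted 1": actual 0, actual 1

print(round(sensitivity(tp, fn), 2))  # 0.5
print(round(specificity(tn, fp), 2))  # 0.98
```

The sensitivity here rests on only four actual positive cases in the test set, so a single additional correct or incorrect prediction would shift it by 0.25.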

Visualization Insights:

ROC Curve: The ROC curve illustrates the trade-off between sensitivity and specificity across various classification thresholds. The AUC score of 0.92 highlights the model’s strong ability to distinguish between classes, demonstrating significant reliability.
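
The AUC can be read as the probability that a randomly chosen positive case receives a higher predicted score than a randomly chosen negative case. The following minimal sketch computes it that way on toy scores; it is not the routine used to produce the reported values, and the function name and data are assumptions for illustration.

```python
from typing import Sequence


def auc_from_scores(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the share of positive/negative pairs ranked correctly.

    A pair counts 1 if the positive has the higher score and 0.5 if tied.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy predicted probabilities (not study data).
labels = [0, 0, 1, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.2, 0.8, 0.7]

auc = auc_from_scores(labels, scores)
print(round(auc, 3))  # 0.889
```

Because the AUC is computed across all thresholds, a model can have a high AUC and still show low sensitivity at the single threshold used for the confusion matrix.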

Variable Importance:

  1. Violence_TOTAL emerges as the most important predictor, indicating a strong association between exposure to violence and infant behavior problems.
  2. Socio-demographic factors, such as JewishArab and residence, play critical roles in the model’s predictions.
  3. Psychological measures, such as Depression (Dep) and EA Affect Quality (EA_affect_q), also show high importance.
  4. Parental and environmental factors, including coparenting support and ACE_TOTAL, contribute substantially to the model’s performance.
  5. Additional measures like danger mean and war exposure are moderately significant, underscoring their influence on infant behavior.

Interpretation:

The results suggest that socio-demographic factors, psychological indicators, and environmental exposures are key determinants of infant behavior problems. The model’s high specificity demonstrates its reliability in identifying infants without behavior problems, while the sensitivity indicates room for improvement in identifying infants with behavior problems.

Conclusion:

The Random Forest model provides meaningful insights into the predictors of infant behavior problems, with strong performance metrics and visualizations highlighting critical factors. Future work may focus on balancing sensitivity and specificity while further exploring additional predictors to enhance its classification capabilities.

##          Actual
## Predicted  0  1
##         0 54  2
##         1  1  2
## Model for infant_behavior_problems 
## Sensitivity: 0.5 
## Specificity: 0.98
## AUC: 0.92

2. Infant Depression Problems Model

The Random Forest model was applied to predict infant depression problems using a wide range of predictors encompassing socio-demographic, health, and environmental factors. The following evaluation metrics and visualizations provide key insights into the model’s performance and the contributing predictors.

Key Evaluation Metrics:

  1. Confusion Matrix: The confusion matrix reveals that the model correctly identified 53 true negatives (infants without depression problems) and 1 true positive (infants with depression problems). However, it resulted in 4 false negatives and 0 false positives.
  2. Sensitivity: The model achieved a sensitivity of 0.2, indicating its ability to correctly identify 20% of infants with depression problems.
  3. Specificity: A perfect specificity of 1 demonstrates the model’s capability to classify all true negatives accurately.
  4. AUC (Area Under the Curve): The AUC score of 0.79 reflects moderate discrimination between infants with and without depression problems.

Visualization Insights:

ROC Curve: The ROC curve illustrates the trade-off between sensitivity and specificity across classification thresholds. The AUC score of 0.79 highlights a reasonable level of model performance. However, the sensitivity remains limited, requiring further improvements.

Variable Importance: The variable importance plots identify the predictors significantly influencing the model’s performance. Anxiety (“Anx”) emerges as the most critical predictor, underlining its association with depression problems in infants. Other notable predictors include coparenting support, ACE_TOTAL, EA Affect Quality (“EA_affect_q”), and child’s age (“c_age”), emphasizing the role of familial, psychological, and developmental factors. Socio-demographic measures, such as parent’s education (“p_edu”), and environmental exposures, including violence and danger mean, also contribute to the predictions, reflecting the multifaceted influences on infant depression problems.

Interpretation:

The model underscores the complexity of factors contributing to infant depression problems, including psychological, environmental, and socio-demographic determinants. Despite its strong specificity, the model’s sensitivity indicates challenges in identifying true positives. This limitation calls for further refinement and potential inclusion of additional predictors to improve its classification capabilities.

Conclusion:

The Random Forest model provides valuable insights into the predictors of infant depression problems. While its specificity is robust and the identified predictors are insightful, enhancing sensitivity remains critical to improving its overall performance. Future adjustments should focus on fine-tuning the model and leveraging additional data sources to optimize accuracy.

##          Actual
## Predicted  0  1
##         0 53  4
##         1  0  1
## Model for dep_binary 
## Sensitivity: 0.2 
## Specificity: 1
## AUC (Area Under Curve): 0.79

3. Infant Anxiety Problems Model

The Random Forest model was developed to predict anxiety problems in infants using an extensive set of socio-demographic, behavioral, and environmental predictors. The evaluation metrics, visualization results, and feature importance provide meaningful insights into the model’s behavior and predictive power.

Key Evaluation Metrics:

  1. Confusion Matrix: The model correctly identified 2 true positives (infants with anxiety problems) and 53 true negatives (infants without anxiety problems). It misclassified 1 instance as a false positive and 3 instances as false negatives.
  2. Sensitivity: The sensitivity score is 0.4, indicating that the model identified 40% of true positives. This is higher than the sensitivity of the infant depression model (0.2) but still leaves room for better recognition of anxiety problems in infants.
  3. Specificity: A specificity of 0.98 demonstrates the model’s strong reliability in correctly classifying infants without anxiety problems.
  4. AUC (Area Under the Curve): The AUC value of 0.74 reflects a moderate discriminative ability between infants with and without anxiety problems, showcasing a balanced but not optimal model performance.

Visualization Insights:

ROC Curve: The ROC curve depicts the trade-off between sensitivity and specificity across different classification thresholds. The AUC score of 0.74 indicates moderate performance, with potential for improvement in better balancing sensitivity and specificity.

Variable Importance:

  1. Depression (“Dep”) emerges as the most influential predictor for infant anxiety problems, underlining the interconnection between depression and anxiety symptoms in early life.
  2. Adverse Childhood Experiences (“ACE_TOTAL”) and co-parenting support are also key contributors, highlighting the significance of early life stressors and parental dynamics.
  3. Other important predictors include pregnancy-related factors, environmental exposures (such as child current war exposure), and socio-demographic variables like parental age (“p_age”).
  4. Exposure-related predictors, including Danger Mean (“Danger__mean”) and Violence Total (“Violence_TOTAL”), play a moderate role, reflecting the impact of external stressors on anxiety development.

Interpretation:

The model emphasizes the complex interplay of psychological, environmental, and socio-demographic factors in shaping anxiety outcomes in infants. While the high specificity ensures reliable classification of infants without anxiety problems, the moderate sensitivity suggests a need for better identification of positive cases. The importance of depression, adverse childhood experiences, and co-parenting support underscores the significance of addressing both familial and environmental factors.

Conclusion:

The Random Forest model provides valuable insights into the determinants of anxiety problems in infants, demonstrating moderate predictive capability. To enhance the model’s performance, future work could focus on improving sensitivity through feature engineering, exploring additional data sources, or testing alternative machine learning approaches. These steps would ensure a more comprehensive understanding of the predictors influencing infant anxiety problems.

##          Actual
## Predicted  0  1
##         0 53  3
##         1  1  2
## Model for anx_binary 
## Sensitivity: 0.4 
## Specificity: 0.98
## AUC (Area Under Curve): 0.74

4. Child Behavior Problems Model

The Random Forest model was applied to predict child behavior problems using socio-demographic, psychological, and environmental predictors. The model’s evaluation metrics and visualizations provide critical insights into its performance.

Key Evaluation Metrics

  1. Confusion Matrix: The model correctly classified 122 true negatives (children without behavior problems) and 23 true positives (children with behavior problems). However, it produced 31 false negatives and 21 false positives.
  2. Sensitivity: The model achieved a sensitivity of 0.43, indicating that it correctly identified 43% of the children with behavior problems.
  3. Specificity: A specificity of 0.85 demonstrates the model’s ability to correctly classify 85% of children without behavior problems.
  4. AUC (Area Under the Curve): The AUC score of 0.69 reflects a moderate level of discrimination between children with and without behavior problems.

Visualization Insights

ROC Curve: The ROC curve illustrates the trade-off between sensitivity and specificity across various classification thresholds. The AUC score of 0.69 signifies moderate model performance in distinguishing between children with and without behavior problems.

Variable Importance: The importance plots highlight the predictors that significantly contributed to the model’s performance. Depression (Dep) and Anxiety (Anx) are the most critical predictors, underlining the impact of psychological factors on behavior problems. Other significant contributors include war exposure sum, coparenting support, and overwhelmed communication, emphasizing the role of environmental stressors and parental support. Socio-demographic factors, such as child’s age (c_age_years), parental age (p_age), and educational setting, also play moderate roles in predicting outcomes.

Interpretation

The results reveal that psychological factors, such as depression and anxiety, along with environmental and parental influences, are key determinants of child behavior problems. The model’s strong specificity indicates its reliability in identifying children without behavior problems, while the moderate sensitivity highlights areas for improvement in detecting behavior issues.

Conclusion

The Random Forest model offers valuable insights into the predictors of child behavior problems. Although the model demonstrates robust specificity, its moderate sensitivity suggests the need for refinement to enhance its ability to identify children with behavior issues. Future efforts could focus on data augmentation, feature selection, and exploring alternative modeling approaches to improve the overall classification performance.

##          Actual
## Predicted   0   1
##         0 122  31
##         1  21  23
## Model for sdq_problem_binary 
## Sensitivity: 0.43 
## Specificity: 0.85
## AUC (Area Under Curve): 0.69

5. Child Depression Problems Model

The Random Forest model was applied to predict child depression problems using a range of socio-demographic, behavioral, and environmental predictors. The model’s performance was evaluated using various metrics, and key predictors were identified.

Key Evaluation Metrics:

  1. Confusion Matrix:
    True Positives: 18 (children with depression problems correctly identified)
    True Negatives: 152 (children without depression problems correctly identified)
    False Positives: 14 (children without depression problems incorrectly flagged)
    False Negatives: 18 (children with depression problems missed)
  2. Sensitivity: The model achieved a sensitivity of 0.5, indicating it correctly identified 50% of children with depression problems.
  3. Specificity: The specificity was 0.92, showing 92% of children without depression problems were accurately identified.
  4. AUC (Area Under the Curve): The AUC score of 0.78 reflects moderate discriminatory power between children with and without depression problems.

Visualization Insights:

ROC Curve: The ROC curve demonstrates the trade-off between sensitivity and specificity across various classification thresholds. The AUC of 0.78 indicates the model’s capability to effectively distinguish between the two classes.

Variable Importance: The variable importance plots highlight the most significant predictors influencing the model’s performance:

  1. Anxiety (Anx) was the most influential predictor, emphasizing its strong association with depression problems.
  2. Danger Mean (Danger_mean) and Overwhelmed Communication (Overwhelmed_comm) also emerged as critical predictors, highlighting the impact of environmental and communication factors.
  3. Socio-demographic factors such as coparenting support, CSE, and parental age (p_age) were moderately important.
  4. Exposure-related variables like violence exposure (Violence_TOTAL) and ACE_TOTAL contributed to the predictions but with less weight.

Interpretation:

The model’s high specificity underscores its reliability in identifying children without depression problems, while its sensitivity highlights the need for improvement in detecting affected cases. The findings underline the importance of anxiety, communication factors, and environmental stressors as critical predictors of depression in children, providing actionable insights for targeted interventions.

Conclusion:

The Random Forest model provides meaningful insights into the determinants of child depression problems, with moderate performance metrics and significant variable importance findings. Further enhancements, such as optimizing sensitivity and exploring additional predictors, could refine the model’s accuracy and applicability.

##          Actual
## Predicted   0   1
##         0 152  18
##         1  14  18
## Model for dep_binary 
## Sensitivity: 0.5 
## Specificity: 0.92
## AUC (Area Under Curve): 0.78

6. Child Anxiety Problems Model

The Random Forest model aimed to predict child anxiety problems using a diverse set of predictors encompassing socio-demographic, health, and environmental factors. The results, visualizations, and evaluation metrics provide insight into the model’s performance and the influence of various predictors.

Key Evaluation Metrics

  1. Confusion Matrix:
    True Positives: 15 (children with anxiety problems correctly identified)
    True Negatives: 153 (children without anxiety problems correctly identified)
    False Negatives: 16
    False Positives: 17
  2. Sensitivity: The model achieved a sensitivity of 0.48, indicating it correctly identified 48% of the true positive cases.
  3. Specificity: Specificity was 0.90, meaning the model correctly classified 90% of children without anxiety problems.
  4. AUC (Area Under the Curve): The AUC score of 0.80 suggests the model has a strong ability to distinguish between children with and without anxiety problems.

Visualization Insights

ROC Curve:
The ROC curve demonstrates the trade-off between sensitivity and specificity at different classification thresholds. An AUC of 0.80 indicates the model performs well in distinguishing between the two classes.

Variable Importance:
The variable importance plots reveal that key predictors include:
Depression (Dep): A significant contributor to the prediction of child anxiety problems.
Danger Mean (Danger_mean): Reflects the perceived danger, strongly associated with anxiety.
Overwhelmed Communication (Overwhelmed_comm) and Co-parenting Support (coparenting_support): Indicators of emotional and social support.
ACE_TOTAL (Adverse Childhood Experiences): Highlights the influence of cumulative childhood adversity.
Other noteworthy predictors include Active_comm, War Exposure (war_exposure__sum), and CSE, showcasing the role of environmental and socio-demographic factors.

Interpretation

The model identifies key contributors to child anxiety, with significant predictors spanning emotional, social, and environmental domains. The high specificity underscores the model’s reliability in identifying children without anxiety problems, while moderate sensitivity suggests room for improvement in detecting cases of anxiety.

Conclusion

This Random Forest model highlights the multifaceted nature of child anxiety problems, emphasizing the interplay of adverse experiences, socio-demographic factors, and support systems. Further refinement of the model and inclusion of additional predictors could enhance its sensitivity and overall performance.

##          Actual
## Predicted   0   1
##         0 153  16
##         1  17  15
## Model for anx_binary 
## Sensitivity: 0.48 
## Specificity: 0.9
## AUC (Area Under Curve): 0.8

Conclusions

This study evaluated multiple Random Forest models to investigate the relationships between various socio-demographic, psychological, behavioral, and environmental predictors and behavioral and emotional problems in infants and children. The models were rigorously developed, with missing data and imbalances in the dataset addressed using techniques such as SMOTE to enhance the reliability and robustness of the analysis.
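
The report names SMOTE as one of the techniques used for class imbalance but does not show how it was applied. The sketch below illustrates only the core idea, creating synthetic minority cases by interpolating between a minority point and one of its nearest minority neighbours. It is a simplified stand-in, not the implementation used in the study; `smote_like`, `k`, `seed`, and the toy points are all assumptions.

```python
import random
from math import dist

Point = tuple[float, ...]


def smote_like(minority: list[Point], n_new: int,
               k: int = 2, seed: int = 0) -> list[Point]:
    """Create n_new synthetic points between minority points and their neighbours."""
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Candidate neighbours: the k closest other minority points.
        others = [p for j, p in enumerate(minority) if j != i]
        neighbours = sorted(others, key=lambda p: dist(base, p))[:k]
        other = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> other
        synthetic.append(tuple(b + gap * (o - b) for b, o in zip(base, other)))
    return synthetic


# Toy minority-class points in two dimensions (not study data).
minority = [(1.0, 2.0), (2.0, 3.0), (3.0, 1.0), (2.5, 2.5)]
new_points = smote_like(minority, n_new=5)
print(len(new_points))  # 5
```

Oversampling of this kind is usually applied to the training split only, so that test-set sensitivity and specificity reflect the original class balance.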

Key Findings

Infant Behavior and Emotional Problems

  1. Infant Behavior Problems: The model demonstrated high specificity (0.98), indicating strong performance in identifying infants without behavior problems. Key predictors included violence exposure, identity (JewishArab), and depression, highlighting the role of environmental stressors and socio-demographic factors.
  2. Infant Depression Problems: The model showed moderate discriminative power (AUC: 0.79) but struggled with sensitivity (0.20), limiting its ability to detect true cases. Anxiety, co-parenting support, and cumulative adverse experiences (ACE_TOTAL) emerged as significant predictors.
  3. Infant Anxiety Problems: The model achieved a moderate AUC of 0.74, with depression, adverse childhood experiences (ACE_TOTAL), and co-parenting support identified as the strongest predictors. Environmental exposures, such as danger perception and war exposure, also played important roles.

Child Behavior and Emotional Problems

  1. Child Behavior Problems: The model achieved a moderate AUC of 0.69, with depression, anxiety, co-parenting support, and communication factors contributing significantly to predictions. While specificity was relatively high (0.85), sensitivity was lower (0.43), indicating some challenges in identifying children with behavior problems.
  2. Child Depression Problems: The model showed strong specificity (0.92) and an AUC of 0.78, with anxiety, danger perception, and overwhelmed communication emerging as key predictors. Environmental and socio-demographic factors also moderately influenced predictions.
  3. Child Anxiety Problems: This model achieved an AUC of 0.80, with high specificity (0.90) and moderate sensitivity (0.48). Depression, danger perception, and adverse childhood experiences were the most influential predictors, emphasizing the interplay of psychological and environmental factors.

Overall Interpretation

The findings underscore the multifaceted nature of behavioral and emotional problems in infants and children, with psychological, socio-demographic, and environmental factors playing pivotal roles across all models. While the models performed well in identifying individuals without problems (high specificity), sensitivity was generally lower, reflecting challenges in detecting true positive cases. This imbalance suggests the need for further refinement in predictive modeling approaches.

Predictive Insights

Key Predictors: Across the models, depression, anxiety, co-parenting support, danger perception, and adverse childhood experiences repeatedly emerged as significant contributors, highlighting their importance in predicting both behavioral and emotional outcomes.

Model Performance: The models demonstrated moderate to strong discriminative power (AUC ranging from 0.69 to 0.92), with specificity consistently higher than sensitivity.

Complex Interactions: The interplay between psychological, environmental, and socio-demographic factors underscores the complexity of predicting behavioral and emotional problems, emphasizing the need for holistic approaches.

Implications and Future Directions

These results provide meaningful insights into the determinants of behavioral and emotional problems in children and infants. The strong specificity means the models rarely flag individuals who do not have problems, which could make them useful as one component of large-scale screening. However, sensitivity between 0.2 and 0.5 means that at least half of the affected individuals in the test sets were missed, so the models would need further enhancement, such as feature engineering, alternative algorithms, or the inclusion of additional predictors, before being relied on to detect positive cases. Future research could also explore tailored interventions targeting the key predictors identified in this study to address the root causes of behavioral and emotional issues.