Association Rule Mining for Bullying Risk Patterns
Author
Sebastian Chmielewski
Introduction
Bullying is a widespread phenomenon affecting adolescents across different social and cultural contexts. It is commonly understood as intentional and repetitive aggressive behavior occurring within relationships characterized by an imbalance of power, which distinguishes bullying from isolated peer conflicts. Due to its high prevalence and potential for long-term harm, bullying remains a major concern in adolescent health research.
Research consistently shows that bullying involvement, especially victimization, is associated with negative mental health outcomes such as loneliness, emotional distress, and depressive symptoms. Emotional vulnerability, including persistent feelings of loneliness, may both result from bullying and increase the risk of being victimized, suggesting a bidirectional relationship between emotional well-being and bullying.
In addition to psychosocial factors, individual characteristics such as weight status have been linked to bullying experiences. Adolescents who are underweight, overweight, or obese appear to be at higher risk of peer victimization, possibly due to weight-based stigma and appearance-related norms.
Despite extensive evidence on individual risk factors, less is known about how psychosocial, behavioral, and individual characteristics co-occur within adolescents’ everyday lives. This study addresses this gap by applying association rule mining to data from a large, nationally representative survey of secondary school students in Argentina. The analysis examines how loneliness, social support, school absence, and weight status combine into profiles associated with bullying involvement, with separate analyses for girls and boys to identify potential gender-specific patterns.
The dataset is derived from the Global School-Based Student Health Survey (GSHS), an international school-based survey designed to collect information on health-related behaviors and protective factors among adolescents. The survey is conducted using a self-administered questionnaire completed by students during regular school hours.
In this study, we use data collected in Argentina in 2018. The survey covers a large nationally representative sample of secondary school students. Nearly 57,000 students participated in the study, with satisfactory school-level and student-level response rates, which ensures good coverage of the target population and reduces the risk of systematic non-response bias.
The GSHS questionnaire includes multiple thematic modules addressing physical health, mental well-being, social relationships, and risk behaviors. For the purpose of this project, we focus on a subset of variables related to bullying experiences and selected psychosocial and behavioral factors that have been linked in previous research to bullying involvement and victimization.
The selected variables describe:
different forms of bullying, including bullying on school property, bullying outside school, and cyberbullying;
experiences of physical aggression, such as physical attacks and participation in physical fights;
indicators of emotional well-being, including feelings of loneliness and sadness;
social support and peer relationships, such as having close friends and perceiving other students as kind and helpful;
school-related behaviors, including skipping classes without permission;
selected individual characteristics, such as sex and weight status (underweight, overweight, obese).
Loading the data and initial inspection
We begin by loading the data and performing an initial inspection.
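import numpy as np
import pandas as pd

df = pd.read_csv(r"Bullying_2018.csv", sep=';')
# Treat empty or whitespace-only cells as missing values
df = df.replace(r'^\s*$', np.nan, regex=True)
print(df.shape)

# List the distinct values of every column
for i in df.columns:
    print(i)
    print(df[i].unique())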
record
[ 1 2 3 ... 57093 57094 57095]
Bullied_on_school_property_in_past_12_months
['Yes' 'No' nan]
Bullied_not_on_school_property_in_past_12_months
['Yes' 'No' nan]
Cyber_bullied_in_past_12_months
[nan 'No' 'Yes']
Custom_Age
['13 years old' '14 years old' '16 years old' '12 years old'
'15 years old' '11 years old or younger' '17 years old' nan
'18 years old or older']
Sex
['Female' 'Male' nan]
Physically_attacked
['0 times' '1 time' '12 or more times' '4 or 5 times' '2 or 3 times'
'10 or 11 times' '8 or 9 times' '6 or 7 times' nan]
Physical_fighting
['0 times' '2 or 3 times' '1 time' '4 or 5 times' '6 or 7 times'
'8 or 9 times' '10 or 11 times' nan '12 or more times']
Felt_lonely
['Always' 'Never' 'Rarely' 'Sometimes' 'Most of the time' nan]
Close_friends
['2' '3 or more' '0' nan '1']
Miss_school_no_permission
['10 or more days' '0 days' '6 to 9 days' '3 to 5 days' nan '1 or 2 days']
Other_students_kind_and_helpful
['Never' 'Sometimes' 'Most of the time' nan 'Always' 'Rarely']
Parents_understand_problems
['Always' nan 'Most of the time' 'Never' 'Sometimes' 'Rarely']
Most_of_the_time_or_always_felt_lonely
['Yes' 'No' nan]
Missed_classes_or_school_without_permission
['Yes' 'No' nan]
Were_underweight
[nan 'No' 'Yes']
Were_overweight
[nan 'No' 'Yes']
Were_obese
[nan 'No' 'Yes']
The dataset contains 56,981 observations and 18 variables.
df.head()
Variable \ row index | 0 | 1 | 2 | 3 | 4
record | 1 | 2 | 3 | 4 | 5
Bullied_on_school_property_in_past_12_months | Yes | No | No | No | No
Bullied_not_on_school_property_in_past_12_months | Yes | No | No | No | No
Cyber_bullied_in_past_12_months | NaN | No | No | No | No
Custom_Age | 13 years old | 13 years old | 14 years old | 16 years old | 13 years old
Sex | Female | Female | Male | Male | Female
Physically_attacked | 0 times | 0 times | 0 times | 0 times | 0 times
Physical_fighting | 0 times | 0 times | 0 times | 2 or 3 times | 0 times
Felt_lonely | Always | Never | Never | Never | Rarely
Close_friends | 2 | 3 or more | 3 or more | 3 or more | 3 or more
Miss_school_no_permission | 10 or more days | 0 days | 0 days | 0 days | 0 days
Other_students_kind_and_helpful | Never | Sometimes | Sometimes | Sometimes | Most of the time
Parents_understand_problems | Always | Always | Always | NaN | Most of the time
Most_of_the_time_or_always_felt_lonely | Yes | No | No | No | No
Missed_classes_or_school_without_permission | Yes | No | No | No | No
Were_underweight | NaN | NaN | No | No | NaN
Were_overweight | NaN | NaN | No | No | NaN
Were_obese | NaN | NaN | No | No | NaN
An overview of variable types and the number of non-missing observations is obtained below.
Three variables related to weight status (Were_underweight, Were_overweight, Were_obese) contain missing values for approximately 40% of all observations. Although such a level of missingness poses challenges for association rule mining, these variables are considered theoretically important, as weight status is linked to peer victimization, stigma, and psychosocial vulnerability.
Rather than excluding these variables entirely, we adopt an alternative strategy and perform the main analysis on a reduced dataset consisting of observations with available weight status information. This choice represents a deliberate trade-off between sample size and conceptual richness: while the number of transactions is reduced, the resulting dataset allows the inclusion of potentially important individual characteristics that may contribute to bullying involvement.
For the remaining variables, which contain relatively small proportions of missing values, observations with missing entries are removed in later preprocessing steps. Since association rule mining relies on the presence or absence of items within transactions, retaining only complete cases ensures a consistent transactional representation and avoids introducing artificial co-occurrence patterns through imputation.
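The missingness pattern that motivates this complete-case strategy can be checked in a couple of lines. The sketch below is illustrative only: `toy` is a small invented frame standing in for the survey data, and only the column names are taken from the dataset.

```python
import numpy as np
import pandas as pd

# Toy stand-in for the survey frame (values are invented for illustration).
toy = pd.DataFrame({
    "Sex": ["Female", "Male", "Female", np.nan, "Male"],
    "Were_obese": [np.nan, "No", np.nan, "Yes", "No"],
})

# Fraction of missing entries per column, largest first.
missing_share = toy.isna().mean().sort_values(ascending=False)
print(missing_share)

# Number of rows that would survive complete-case filtering.
n_complete = len(toy.dropna())
print(n_complete)
```

Applied to the real frame, the same two expressions would show which variables drive the loss of observations and how many rows remain after `dropna()`.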
Binary encoding and transaction creation
In association rule mining, each observation must be represented as a transaction containing a set of items. Therefore, categorical survey responses are transformed into binary indicators (0/1), where value 1 denotes the presence of a given attribute for a student. Only complete observations are retained to ensure consistent transactional representation.
df = df.dropna()
df.shape
(32938, 18)
Converting responses to binary format
Survey variables differ in their measurement scales (binary, ordinal, and frequency-based). Therefore, instead of applying a single uniform transformation, variable-specific recoding rules are used. The objective is to preserve meaningful distinctions in response intensity while avoiding excessive sparsity that could negatively affect support in association rule mining.
Variables describing physical attacks and participation in physical fights are not used as antecedent features. These behaviors conceptually overlap with bullying and can be interpreted as manifestations or direct consequences of bullying rather than independent risk factors. Excluding them allows the analysis to focus on psychosocial, behavioral, and individual vulnerability factors that may precede or co-occur with bullying involvement.
Binary Yes/No variables are mapped directly to 0/1 indicators. Selected ordinal variables are collapsed into low and high categories, representing low versus elevated levels of the underlying construct. Frequency-based variables describing school absence and number of close friends are discretized into a small number of ordered categories capturing none, occasional, and frequent occurrences.
Age is extracted from the original textual format and discretized into four groups (11–13, 14–15, 16–17, and 18 or older), spanning early to late adolescence. Finally, the remaining categorical variables are converted into binary indicators using one-hot encoding.
This preprocessing strategy represents a compromise between full granularity of the original survey scales and overly simplistic binary encoding, and is intended to produce stable and interpretable association rules.
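Because the age labels are free text and the bin edges are easy to get wrong by one year, it may help to verify the discretization on a few raw labels. The following is a minimal sketch, assuming the same bin edges and label names as in the encoding step; the sample labels are taken from the values listed for Custom_Age.

```python
import pandas as pd

# A few raw age labels as they appear in Custom_Age.
ages = pd.Series([
    "11 years old or younger", "13 years old", "14 years old",
    "15 years old", "16 years old", "18 years old or older",
])

# The first run of digits becomes the numeric age.
age_num = ages.str.extract(r"(\d+)", expand=False).astype(float)

# pd.cut uses right-closed intervals by default: (10, 13], (13, 15], ...
age_group = pd.cut(
    age_num,
    bins=[10, 13, 15, 17, 20],
    labels=["Age_11_13", "Age_14_15", "Age_16_17", "Age_18_plus"],
)
result = dict(zip(ages, age_group.astype(str)))
print(result)
```

Since the intervals are closed on the right, age 13 falls into the first group and age 15 into the second, which is the intended grouping.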
df_enc = df.copy()

# ============================================================
# NOTE: Physical violence variables are intentionally excluded
# ============================================================

# ========== School absence ==========
def recode_absence(x):
    if x == "0 days":
        return "None"
    elif x in ["1 or 2 days", "3 to 5 days"]:
        return "Occasional"
    elif x in ["6 to 9 days", "10 or more days"]:
        return "Frequent"
    else:
        return np.nan

df_enc["School_absence_level"] = df_enc["Miss_school_no_permission"].apply(recode_absence)

# ========== Friends ==========
def recode_friends(x):
    if x == "0":
        return "None"
    elif x in ["1", "2"]:
        return "Few"
    elif x == "3 or more":
        return "Many"
    else:
        return np.nan

df_enc["Friends_level"] = df_enc["Close_friends"].apply(recode_friends)

# ========== Any bullying (target) ==========
df_enc["Any_bullying"] = (
    (df_enc["Bullied_on_school_property_in_past_12_months"] == "Yes")
    | (df_enc["Bullied_not_on_school_property_in_past_12_months"] == "Yes")
    | (df_enc["Cyber_bullied_in_past_12_months"] == "Yes")
).astype(int)

# ========== Loneliness ==========
df_enc["Lonely"] = (
    df_enc["Most_of_the_time_or_always_felt_lonely"] == "Yes"
).astype(int)

# ========== Sex ==========
df_enc["Sex_Female"] = (df_enc["Sex"] == "Female").astype(int)

# ========== Age ==========
df_enc["Age_num"] = df_enc["Custom_Age"].str.extract(r"(\d+)").astype(float)
df_enc["Age_group"] = pd.cut(
    df_enc["Age_num"],
    bins=[10, 13, 15, 17, 20],
    labels=["Age_11_13", "Age_14_15", "Age_16_17", "Age_18_plus"]
)
df_enc = pd.get_dummies(df_enc, columns=["Age_group"], dtype=int)
df_enc = df_enc.drop(columns=["Custom_Age", "Age_num"])

# ========== Support ==========
support_map = {
    "Never": 0, "Rarely": 0,
    "Sometimes": 1, "Most of the time": 1, "Always": 1
}
df_enc["Peers_support"] = df_enc["Other_students_kind_and_helpful"].map(support_map)
df_enc["Parents_support"] = df_enc["Parents_understand_problems"].map(support_map)
df_enc = df_enc.drop(columns=["Other_students_kind_and_helpful", "Parents_understand_problems"])

# ========== Weight status (binary) ==========
df_enc["Excess_weight"] = (
    (df_enc["Were_overweight"] == "Yes") | (df_enc["Were_obese"] == "Yes")
).astype(int)
df_enc["Underweight"] = (
    df_enc["Were_underweight"] == "Yes"
).astype(int)

# ========== Drop raw columns ==========
df_enc = df_enc.drop(columns=[
    "Physically_attacked", "Physical_fighting",
    "Miss_school_no_permission", "Close_friends",
    "Bullied_on_school_property_in_past_12_months",
    "Bullied_not_on_school_property_in_past_12_months",
    "Cyber_bullied_in_past_12_months",
    "Most_of_the_time_or_always_felt_lonely",
    "Sex", "Felt_lonely",
    "Missed_classes_or_school_without_permission",
    "record",
    "Were_underweight", "Were_overweight", "Were_obese"
], errors="ignore")

# ========== One-hot encoding ==========
df_enc = pd.get_dummies(
    df_enc,
    columns=["School_absence_level", "Friends_level"],
    dtype=int
)
df_enc = df_enc.dropna()
df_enc.head()
Item \ row index | 2 | 5 | 10 | 22 | 23
Any_bullying | 0 | 0 | 0 | 1 | 0
Lonely | 0 | 0 | 0 | 1 | 1
Sex_Female | 0 | 0 | 0 | 0 | 0
Age_group_Age_11_13 | 0 | 1 | 0 | 1 | 0
Age_group_Age_14_15 | 1 | 0 | 1 | 0 | 1
Age_group_Age_16_17 | 0 | 0 | 0 | 0 | 0
Age_group_Age_18_plus | 0 | 0 | 0 | 0 | 0
Peers_support | 1 | 1 | 1 | 0 | 1
Parents_support | 1 | 1 | 1 | 1 | 1
Excess_weight | 0 | 0 | 0 | 0 | 1
Underweight | 0 | 0 | 0 | 0 | 0
School_absence_level_Frequent | 0 | 0 | 0 | 0 | 0
School_absence_level_None | 1 | 1 | 0 | 1 | 1
School_absence_level_Occasional | 0 | 0 | 1 | 0 | 0
Friends_level_Few | 0 | 0 | 0 | 0 | 1
Friends_level_Many | 1 | 1 | 1 | 1 | 0
Friends_level_None | 0 | 0 | 0 | 0 | 0
Creating transactions
To avoid generating rules dominated by the sex variable, the dataset was stratified by sex and transactions were created separately for girls and boys. This allows identification of gender-specific association patterns.
df_girls = df_enc[df_enc["Sex_Female"] == 1].drop(columns=["Sex_Female"])
df_boys = df_enc[df_enc["Sex_Female"] == 0].drop(columns=["Sex_Female"])

def make_transactions(df):
    # Each transaction lists the column names whose indicator equals 1
    return [list(row[row == 1].index) for _, row in df.iterrows()]

trans_girls = make_transactions(df_girls)
trans_boys = make_transactions(df_boys)
Item frequency analysis
Before mining frequent itemsets and association rules, we examine how often individual items occur in the dataset. This step helps to understand the prevalence of different behaviors and to select appropriate support thresholds.
For girls, the most frequent single items indicate generally favorable social environments and low levels of school absence. The highest supports were observed for School_absence_level_None (0.73), Peers_support (0.72), and Friends_level_Many (0.66). At the same time, approximately 46% of girls reported experiencing at least one form of bullying (Any_bullying), while around 22% reported frequent loneliness. Excess body weight was present in about 26% of girls, whereas underweight status was relatively rare (1.7%). These results suggest substantial heterogeneity in psychosocial and individual vulnerability factors within the female subgroup.
from collections import Counter

# Count how many transactions contain each item
item_counts_girls = Counter()
for t in trans_girls:
    item_counts_girls.update(t)

item_freq_girls = pd.DataFrame.from_dict(
    item_counts_girls, orient="index", columns=["count"]
)
item_freq_girls["support"] = item_freq_girls["count"] / len(trans_girls)
item_freq_girls.sort_values("support", ascending=False).head(15)
Item | count | support
School_absence_level_None | 12910 | 0.730245
Peers_support | 12744 | 0.720855
Friends_level_Many | 11680 | 0.660671
Parents_support | 10323 | 0.583913
Any_bullying | 8053 | 0.455512
Age_group_Age_14_15 | 7708 | 0.435998
Age_group_Age_16_17 | 6917 | 0.391255
Friends_level_Few | 5041 | 0.285141
Excess_weight | 4527 | 0.256067
Lonely | 3971 | 0.224617
School_absence_level_Occasional | 3943 | 0.223033
Age_group_Age_11_13 | 2946 | 0.166638
Friends_level_None | 958 | 0.054189
School_absence_level_Frequent | 826 | 0.046722
Underweight | 292 | 0.016517
Among boys, a similar pattern emerges for the most prevalent attributes. The most frequent items include Peers_support (0.76), Friends_level_Many (0.74), and School_absence_level_None (0.69). Bullying involvement is reported by approximately 35% of boys, which is lower than among girls. Excess body weight is more common in boys (34%) than in girls, while underweight status remains infrequent (2.4%). Notably, loneliness is less prevalent among boys (9%) than among girls (22%), suggesting potential sex differences in emotional distress.
# Count how many transactions contain each item
item_counts_boys = Counter()
for t in trans_boys:
    item_counts_boys.update(t)

item_freq_boys = pd.DataFrame.from_dict(
    item_counts_boys, orient="index", columns=["count"]
)
item_freq_boys["support"] = item_freq_boys["count"] / len(trans_boys)
item_freq_boys.sort_values("support", ascending=False).head(15)
Item | count | support
Peers_support | 11618 | 0.761387
Friends_level_Many | 11258 | 0.737794
School_absence_level_None | 10556 | 0.691788
Parents_support | 9594 | 0.628744
Age_group_Age_14_15 | 6608 | 0.433056
Age_group_Age_16_17 | 6169 | 0.404286
Any_bullying | 5332 | 0.349433
Excess_weight | 5182 | 0.339603
School_absence_level_Occasional | 3951 | 0.258929
Friends_level_Few | 3206 | 0.210106
Age_group_Age_11_13 | 2359 | 0.154597
Lonely | 1419 | 0.092994
Friends_level_None | 795 | 0.052100
School_absence_level_Frequent | 752 | 0.049282
Underweight | 362 | 0.023724
In addition to single-item frequencies, frequent 2-itemsets were analyzed to identify commonly co-occurring attributes. For both girls and boys, the most frequent pairs involve combinations of positive social indicators, such as (Peers_support, School_absence_level_None) and (Friends_level_Many, Peers_support), with supports exceeding 0.50 in both subgroups. These results indicate that supportive peer environments and regular school attendance tend to co-occur. Pairs involving Any_bullying appear with substantially lower support than pairs reflecting positive social conditions, but they are nonetheless present among the most frequent combinations. For girls, notable examples include (Any_bullying, School_absence_level_None) and (Any_bullying, Peers_support), each with support around 0.30. For boys, bullying-related pairs occur less frequently, consistent with the lower overall prevalence of bullying in this subgroup.
from itertools import combinations

# Count co-occurrences of every unordered item pair within a transaction
pair_counts_girls = Counter()
for t in trans_girls:
    for pair in combinations(sorted(t), 2):
        pair_counts_girls.update([pair])

pair_freq_girls = pd.DataFrame.from_dict(
    pair_counts_girls, orient="index", columns=["count"]
)
pair_freq_girls["support"] = pair_freq_girls["count"] / len(trans_girls)
pair_freq_girls.sort_values("support", ascending=False).head(15)
Pair | count | support
(Peers_support, School_absence_level_None) | 9561 | 0.540811
(Friends_level_Many, Peers_support) | 8922 | 0.504667
(Friends_level_Many, School_absence_level_None) | 8660 | 0.489847
(Parents_support, Peers_support) | 8114 | 0.458963
(Parents_support, School_absence_level_None) | 7907 | 0.447254
(Friends_level_Many, Parents_support) | 7248 | 0.409978
(Age_group_Age_14_15, School_absence_level_None) | 5851 | 0.330958
(Any_bullying, School_absence_level_None) | 5522 | 0.312348
(Age_group_Age_14_15, Peers_support) | 5479 | 0.309916
(Any_bullying, Peers_support) | 5349 | 0.302562
(Age_group_Age_14_15, Friends_level_Many) | 5320 | 0.300922
(Any_bullying, Friends_level_Many) | 5074 | 0.287007
(Age_group_Age_16_17, Peers_support) | 5010 | 0.283387
(Age_group_Age_16_17, School_absence_level_None) | 4615 | 0.261044
(Age_group_Age_14_15, Parents_support) | 4466 | 0.252616
# Count co-occurrences of every unordered item pair within a transaction
pair_counts_boys = Counter()
for t in trans_boys:
    for pair in combinations(sorted(t), 2):
        pair_counts_boys.update([pair])

pair_freq_boys = pd.DataFrame.from_dict(
    pair_counts_boys, orient="index", columns=["count"]
)
pair_freq_boys["support"] = pair_freq_boys["count"] / len(trans_boys)
pair_freq_boys.sort_values("support", ascending=False).head(15)
Pair | count | support
(Friends_level_Many, Peers_support) | 8917 | 0.584376
(Peers_support, School_absence_level_None) | 8217 | 0.538502
(Parents_support, Peers_support) | 7890 | 0.517072
(Friends_level_Many, School_absence_level_None) | 7882 | 0.516548
(Friends_level_Many, Parents_support) | 7418 | 0.486139
(Parents_support, School_absence_level_None) | 6923 | 0.453699
(Age_group_Age_14_15, Peers_support) | 4974 | 0.325972
(Age_group_Age_14_15, Friends_level_Many) | 4952 | 0.324530
(Age_group_Age_16_17, Peers_support) | 4786 | 0.313651
(Age_group_Age_14_15, School_absence_level_None) | 4777 | 0.313061
(Age_group_Age_16_17, Friends_level_Many) | 4412 | 0.289141
(Age_group_Age_14_15, Parents_support) | 4208 | 0.275772
(Excess_weight, Peers_support) | 3915 | 0.256570
(Age_group_Age_16_17, School_absence_level_None) | 3911 | 0.256308
(Excess_weight, Friends_level_Many) | 3793 | 0.248575
Overall, the single-item and pair frequency analyses reveal broadly similar structural patterns across sexes, while also highlighting differences in the prevalence of loneliness and weight-related characteristics. These findings motivate the use of a relatively low minimum support threshold in subsequent association rule mining and provide an empirical foundation for interpreting multi-item rules.
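A high pair support can simply reflect two items that are each very common. One way to separate this from genuine co-occurrence is to divide the observed pair support by the product of the single-item supports, which is the support expected under independence. The sketch below does this for the three bullying-related pairs in the girls' table; all numbers are copied from the tables above, and the variable names are illustrative.

```python
# Supports copied from the girls' single-item and pair tables.
item_support = {
    "Any_bullying": 0.455512,
    "School_absence_level_None": 0.730245,
    "Peers_support": 0.720855,
    "Friends_level_Many": 0.660671,
}
pair_support = {
    ("Any_bullying", "School_absence_level_None"): 0.312348,
    ("Any_bullying", "Peers_support"): 0.302562,
    ("Any_bullying", "Friends_level_Many"): 0.287007,
}

# Lift of a pair = observed support / support expected under independence.
pair_lift = {
    pair: s / (item_support[pair[0]] * item_support[pair[1]])
    for pair, s in pair_support.items()
}
for pair, value in pair_lift.items():
    print(pair, round(value, 3))
```

All three ratios come out slightly below 1 (roughly 0.92 to 0.95), so these pairs rank highly because their components are frequent, not because bullying is over-represented among girls with peer support, many friends, or regular attendance.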
Association rule mining configuration
Association rules were generated using the FP-Growth algorithm. The following parameter settings were applied:
Minimum support = 0.01 – to exclude extremely rare patterns while preserving sufficient coverage of the population.
Minimum confidence = 0.6 – to retain rules with reasonable predictive strength.
Minimum lift = 1.3 – to focus on associations that exceed chance co-occurrence.
Maximum antecedent length = 3 and single-item consequents – to ensure interpretability of the resulting rules.
FP-Growth and association rule mining were performed separately for girls and boys using identical parameter settings to ensure comparability of results.
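The rule-mining helper used later in the analysis (`mine_rules`) is not defined in this section. As a rough stand-in, the sketch below applies the same thresholds by brute-force enumeration instead of FP-Growth: for a fixed consequent it counts every antecedent of up to three items and keeps the rules that pass the support, confidence, and lift filters. For a single fixed consequent this should select the same rules as an FP-Growth-based pipeline, only less efficiently. The function name `mine_rules_bruteforce`, its signature, the restriction to one consequent, and the toy transactions are all assumptions made for illustration, not the author's implementation.

```python
from collections import Counter
from itertools import combinations


def mine_rules_bruteforce(transactions, consequent="Any_bullying",
                          min_support=0.01, min_confidence=0.6,
                          min_lift=1.3, max_antecedent_len=3):
    """Enumerate rules `antecedent -> consequent` and filter by thresholds."""
    n = len(transactions)
    ante_counts = Counter()   # transactions containing the antecedent
    joint_counts = Counter()  # ... that also contain the consequent
    n_consequent = 0

    for t in transactions:
        has_consequent = consequent in t
        n_consequent += has_consequent
        items = sorted(set(t) - {consequent})
        for k in range(1, max_antecedent_len + 1):
            for ante in combinations(items, k):
                ante_counts[ante] += 1
                if has_consequent:
                    joint_counts[ante] += 1

    base_rate = n_consequent / n
    rules = []
    for ante, joint in joint_counts.items():
        support = joint / n                     # supp(antecedent + consequent)
        confidence = joint / ante_counts[ante]  # P(consequent | antecedent)
        lift = confidence / base_rate
        if (support >= min_support and confidence >= min_confidence
                and lift >= min_lift):
            rules.append({"antecedent": ante, "support": support,
                          "confidence": confidence, "lift": lift})
    return sorted(rules, key=lambda r: r["lift"], reverse=True)


# Invented toy transactions, only to exercise the function.
toy = (
    [["Lonely", "Any_bullying"]] * 3
    + [["Lonely"]] * 1
    + [["Peers_support", "Any_bullying"]] * 1
    + [["Peers_support"]] * 5
)
rules = mine_rules_bruteforce(toy, min_support=0.1)
for r in rules:
    print(r)
```

Enumerating all antecedents is feasible here because each transaction holds only a handful of items; on wider data, FP-Growth avoids the combinatorial blow-up.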
Sensitivity analysis of minimum support threshold
To assess the impact of the minimum support threshold on the number of discovered rules, a sensitivity analysis was performed for selected support values (0.01, 0.02, 0.03, and 0.05) separately for girls and boys.
# mine_rules is the rule-mining helper (its definition is not shown in this section)
for i in [trans_girls, trans_boys]:
    for s in [0.01, 0.02, 0.03, 0.05]:
        r = mine_rules(i, min_support=s)
        print(s, r.shape[0])
    print('\n')
For girls, decreasing the minimum support threshold leads to a substantial increase in the number of discovered rules (53 rules for support = 0.01, 35 for 0.02, 22 for 0.03, and 10 for 0.05). This monotonic pattern indicates a clear trade-off between rule richness and rule strictness.
For boys, bullying-related rules emerge only at the lowest support threshold (0.01), while higher thresholds result in no rules. This suggests that associations related to bullying among boys are weaker and less frequent, and therefore require more permissive support settings to be detected.
Based on these results, a minimum support threshold of 0.01 was selected for the main analysis as a compromise between retaining a sufficient number of rules and avoiding extremely rare patterns.
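When comparing rule counts across sexes, it is worth noting how the fixed confidence threshold interacts with the different base rates of bullying. For a rule with Any_bullying as the consequent, lift equals confidence divided by the base rate, so a minimum confidence implies a minimum lift. The short calculation below is a sketch under the assumption that the counted rules all have Any_bullying as the consequent; the base rates are copied from the item-frequency tables.

```python
# Base rates of Any_bullying, copied from the item-frequency tables.
base_rate = {"girls": 0.455512, "boys": 0.349433}
min_confidence = 0.6
min_lift = 1.3

# For a rule A -> Any_bullying, lift = confidence / base_rate, so the
# confidence threshold alone implies a minimum lift for each subgroup.
implied_min_lift = {g: min_confidence / p for g, p in base_rate.items()}
for g, v in implied_min_lift.items():
    print(g, round(v, 2), "binding threshold:",
          "confidence" if v >= min_lift else "lift")
```

Under this assumption, the same confidence cut-off demands a lift of about 1.72 for boys but only about 1.32 for girls, which may partly explain why fewer rules survive for boys at every support level.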
Bullying-related association rules
Rules with Any_bullying as the consequent were extracted in order to identify combinations of factors associated with bullying involvement. The analysis was performed separately for girls and boys.
Among girls, the strongest rules consistently include loneliness as a central component of the antecedent. The highest-lift rules involve combinations of loneliness with school absence and additional psychosocial or individual characteristics. For example, the combination (Parents_support, School_absence_level_Occasional, Lonely) is associated with bullying with a confidence of approximately 0.74 and a lift of 1.62. Similarly, (Age_group_Age_14_15, School_absence_level_Occasional, Lonely) and (School_absence_level_Frequent, Lonely) yield confidence values above 0.73 and lift values exceeding 1.6.
Rules with two-item antecedents also highlight the importance of loneliness. The rule (School_absence_level_Occasional, Lonely) → Any_bullying has a confidence of 0.72 and a lift of 1.57, meaning that girls who feel lonely and occasionally miss school report bullying about 1.57 times as often as girls in the sample overall.
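For reference, the three measures quoted for each rule can be written as follows, using the standard definitions with N transactions, antecedent A, and consequent B:

```latex
\mathrm{supp}(A) = \frac{|\{\, t : A \subseteq t \,\}|}{N}, \qquad
\mathrm{conf}(A \to B) = \frac{\mathrm{supp}(A \cup B)}{\mathrm{supp}(A)}, \qquad
\mathrm{lift}(A \to B) = \frac{\mathrm{conf}(A \to B)}{\mathrm{supp}(B)}
```

As a rough check, with B = Any_bullying and the girls' base rate supp(B) ≈ 0.4555 from the item-frequency table, a confidence of 0.72 gives a lift of about 0.72 / 0.4555 ≈ 1.58, in line with the reported 1.57 once rounding of the confidence is taken into account.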
Weight-related characteristics appear in several high-ranking rules. For instance, combinations such as (Parents_support, Excess_weight, Lonely) and (Age_group_Age_14_15, Excess_weight, Lonely) are associated with bullying with confidence values around 0.70 and lift values close to 1.55. This suggests that excess body weight may amplify the association between emotional distress and bullying involvement.
Overall, the results indicate that bullying among girls is most strongly associated with profiles characterized by emotional vulnerability (loneliness), combined with school disengagement and, in some cases, excess body weight.
For boys, considerably fewer bullying-related rules were discovered. Nevertheless, the identified rules reveal a pattern similar in structure to that observed among girls. The strongest rule is (School_absence_level_Occasional, Lonely, Age_group_Age_16_17) → Any_bullying, with a confidence of 0.64 and a lift of 1.83. Two-item rules such as (School_absence_level_Occasional, Lonely) and (Friends_level_Few, Lonely) also show elevated confidence (above 0.60) and lift values exceeding 1.7.
These findings suggest that, among boys, bullying involvement is primarily associated with loneliness combined with limited peer relationships or occasional school absence. However, the smaller number of detected rules indicates weaker and less stable association structures compared to girls.
Visualization of bullying-related association rules (girls)
To support the interpretation of the discovered association rules for girls, two complementary visualizations were produced.
The first visualization presents a scatter plot of support versus confidence, with point size proportional to lift. Each point corresponds to one association rule. The plot indicates that most bullying-related rules are characterized by relatively low to moderate support (approximately 0.01–0.08) and high confidence (around 0.60–0.74). This pattern suggests that although the identified rules apply to specific subgroups of students, they exhibit substantial predictive strength. Larger points, corresponding to higher lift values, are mainly concentrated in this region, indicating that the strongest rules represent meaningful associations rather than chance co-occurrences.
import matplotlib.pyplot as plt

plt.figure()
plt.scatter(
    rules_girls["support"],
    rules_girls["confidence"],
    s=rules_girls["lift"] * 20  # point size proportional to lift
)
plt.xlabel("Support")
plt.ylabel("Confidence")
plt.title("Girls: Support vs Confidence (size = Lift)")
plt.show()
The second visualization shows a bar chart of the most frequent antecedent items appearing in bullying-related rules. The results clearly demonstrate that Lonely is by far the most common antecedent, appearing much more frequently than any other item. Other frequently occurring antecedents include Peers_support, Parents_support, Friends_level_Few, Friends_level_Many, School_absence_level_Occasional, and age groups 14–15 and 16–17, as well as Excess_weight. This distribution confirms that emotional distress, aspects of peer relationships, school attendance patterns, and body weight status constitute the dominant components of high-risk profiles associated with bullying among girls.
# Count how often each item appears in the antecedents of bullying-related rules
cnt = Counter()
for ants in girls_bully["antecedents"]:
    cnt.update(list(ants))

pd.Series(cnt).sort_values(ascending=False).head(10).plot(kind="bar")
plt.title("Most frequent antecedent items (girls)")
plt.show()
Together, these visualizations provide a concise summary of rule quality and structure, reinforcing the central role of loneliness and psychosocial factors in bullying-related association patterns.
Conclusion
This study applied association rule mining to adolescent health survey data in order to identify combinations of psychosocial, behavioral, and individual characteristics associated with bullying involvement. By transforming survey responses into transactional form and analyzing girls and boys separately, the analysis revealed clear and interpretable patterns.
Across both sexes, loneliness emerged as the most central factor appearing in bullying-related rules. Bullying involvement is most strongly associated with profiles that combine emotional distress with school disengagement and limited peer relationships. Among girls, excess body weight additionally appears in several high-lift rules, suggesting that weight-related vulnerability may intensify the association between loneliness and bullying.
The results demonstrate that association rule mining can uncover meaningful multi-factor profiles that go beyond simple bivariate relationships. Rather than identifying single predictors, this approach highlights how combinations of characteristics jointly characterize high-risk groups.
Overall, the findings emphasize the importance of addressing emotional well-being and school connectedness as key components of bullying prevention strategies.
Limitations and future work
Several limitations of this study should be acknowledged.
First, the analysis is based on cross-sectional self-reported data, which prevents any causal interpretation. The discovered rules describe co-occurrence patterns rather than directional effects.
Second, a substantial proportion of missing values in weight-related variables required restricting the analysis to complete cases, which reduces sample size and may introduce selection bias.
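One simple way to probe this selection-bias concern is to compare the prevalence of bullying between respondents with and without weight information. The sketch below illustrates the idea on invented toy data; the column names follow the survey frame, while `toy`, `group`, and `prevalence` are hypothetical names.

```python
import numpy as np
import pandas as pd

# Invented toy data; column names follow the survey frame.
toy = pd.DataFrame({
    "Bullied_on_school_property_in_past_12_months":
        ["Yes", "No", "No", "Yes", "No", "No", "Yes", "No"],
    "Were_overweight":
        [np.nan, "No", "Yes", np.nan, "No", np.nan, "No", np.nan],
})

# Label rows by whether complete-case filtering would drop them.
group = toy["Were_overweight"].isna().map({True: "dropped", False: "retained"})
bullied = toy["Bullied_on_school_property_in_past_12_months"].eq("Yes")

# Bullying prevalence among retained vs. dropped rows.
prevalence = bullied.groupby(group).mean()
print(prevalence)
```

A large gap between the two groups in the real data would suggest that the complete-case subset is not representative with respect to bullying, and that the mined rules should be generalized with corresponding caution.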
Third, association rule mining is sensitive to parameter choices such as minimum support and confidence. Although a sensitivity analysis was performed, different thresholds may yield alternative rule sets.
Fourth, the study focused on a limited subset of available survey variables. Incorporating additional contextual factors (e.g., family environment, mental health indicators, or school climate variables) could reveal richer patterns.
Future work could extend this analysis by:
incorporating additional psychosocial and family-level variables;
conducting cross-country and cross-cultural analyses.