Association Rule Mining for Bullying Risk Patterns

Author

Sebastian Chmielewski

Introduction

Bullying is a widespread phenomenon affecting adolescents across different social and cultural contexts. It is commonly understood as intentional and repetitive aggressive behavior occurring within relationships characterized by an imbalance of power, which distinguishes bullying from isolated peer conflicts. Due to its high prevalence and potential for long-term harm, bullying remains a major concern in adolescent health research.

Research consistently shows that bullying involvement, especially victimization, is associated with negative mental health outcomes such as loneliness, emotional distress, and depressive symptoms. Emotional vulnerability, including persistent feelings of loneliness, may both result from bullying and increase the risk of being victimized, suggesting a bidirectional relationship between emotional well-being and bullying.

In addition to psychosocial factors, individual characteristics such as weight status have been linked to bullying experiences. Adolescents who are underweight, overweight, or obese appear to be at higher risk of peer victimization, possibly due to weight-based stigma and appearance-related norms.

Despite extensive evidence on individual risk factors, less is known about how psychosocial, behavioral, and individual characteristics co-occur within adolescents’ everyday lives. This study addresses this gap by applying association rule mining to data from a large, nationally representative survey of secondary school students in Argentina. The analysis examines how loneliness, social support, school absence, and weight status combine into profiles associated with bullying involvement, with separate analyses for girls and boys to identify potential gender-specific patterns.

Data and preprocessing

Bullying in Schools – Kaggle dataset

The dataset is derived from the Global School-Based Student Health Survey (GSHS), an international school-based survey designed to collect information on health-related behaviors and protective factors among adolescents. The survey is conducted using a self-administered questionnaire completed by students during regular school hours.

In this study, we use data collected in Argentina in 2018. The survey covers a large nationally representative sample of secondary school students. Nearly 57,000 students participated in the study, with satisfactory school-level and student-level response rates, which supports good coverage of the target population and reduces the risk of systematic non-response bias.

The GSHS questionnaire includes multiple thematic modules addressing physical health, mental well-being, social relationships, and risk behaviors. For the purpose of this project, we focus on a subset of variables related to bullying experiences and selected psychosocial and behavioral factors that have been linked in previous research to bullying involvement and victimization.

The selected variables describe:

  • different forms of bullying, including bullying on school property, bullying outside school, and cyberbullying;

  • experiences of physical aggression, such as physical attacks and participation in physical fights;

  • indicators of emotional well-being, including feelings of loneliness and sadness;

  • social support and peer relationships, such as having close friends and perceiving other students as kind and helpful;

  • school-related behaviors, including skipping classes without permission;

  • selected individual characteristics, such as sex and weight status (underweight, overweight, obese).

Loading the data and initial inspection

We begin by loading the data and performing an initial inspection.

import numpy as np
import pandas as pd

df = pd.read_csv("Bullying_2018.csv", sep=";")
df = df.replace(r"^\s*$", np.nan, regex=True)  # blank strings -> NaN
df.shape

for i in df.columns:
    print(i)
    print(df[i].unique())
record
[    1     2     3 ... 57093 57094 57095]
Bullied_on_school_property_in_past_12_months
['Yes' 'No' nan]
Bullied_not_on_school_property_in_past_12_months
['Yes' 'No' nan]
Cyber_bullied_in_past_12_months
[nan 'No' 'Yes']
Custom_Age
['13 years old' '14 years old' '16 years old' '12 years old'
 '15 years old' '11 years old or younger' '17 years old' nan
 '18 years old or older']
Sex
['Female' 'Male' nan]
Physically_attacked
['0 times' '1 time' '12 or more times' '4 or 5 times' '2 or 3 times'
 '10 or 11 times' '8 or 9 times' '6 or 7 times' nan]
Physical_fighting
['0 times' '2 or 3 times' '1 time' '4 or 5 times' '6 or 7 times'
 '8 or 9 times' '10 or 11 times' nan '12 or more times']
Felt_lonely
['Always' 'Never' 'Rarely' 'Sometimes' 'Most of the time' nan]
Close_friends
['2' '3 or more' '0' nan '1']
Miss_school_no_permission
['10 or more days' '0 days' '6 to 9 days' '3 to 5 days' nan '1 or 2 days']
Other_students_kind_and_helpful
['Never' 'Sometimes' 'Most of the time' nan 'Always' 'Rarely']
Parents_understand_problems
['Always' nan 'Most of the time' 'Never' 'Sometimes' 'Rarely']
Most_of_the_time_or_always_felt_lonely
['Yes' 'No' nan]
Missed_classes_or_school_without_permission
['Yes' 'No' nan]
Were_underweight
[nan 'No' 'Yes']
Were_overweight
[nan 'No' 'Yes']
Were_obese
[nan 'No' 'Yes']

The dataset contains 56,981 observations and 18 variables.

df.head()
record Bullied_on_school_property_in_past_12_months Bullied_not_on_school_property_in_past_12_months Cyber_bullied_in_past_12_months Custom_Age Sex Physically_attacked Physical_fighting Felt_lonely Close_friends Miss_school_no_permission Other_students_kind_and_helpful Parents_understand_problems Most_of_the_time_or_always_felt_lonely Missed_classes_or_school_without_permission Were_underweight Were_overweight Were_obese
0 1 Yes Yes NaN 13 years old Female 0 times 0 times Always 2 10 or more days Never Always Yes Yes NaN NaN NaN
1 2 No No No 13 years old Female 0 times 0 times Never 3 or more 0 days Sometimes Always No No NaN NaN NaN
2 3 No No No 14 years old Male 0 times 0 times Never 3 or more 0 days Sometimes Always No No No No No
3 4 No No No 16 years old Male 0 times 2 or 3 times Never 3 or more 0 days Sometimes NaN No No No No No
4 5 No No No 13 years old Female 0 times 0 times Rarely 3 or more 0 days Most of the time Most of the time No No NaN NaN NaN

An overview of variable types and the number of non-missing observations is obtained below.

df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 56981 entries, 0 to 56980
Data columns (total 18 columns):
 #   Column                                            Non-Null Count  Dtype 
---  ------                                            --------------  ----- 
 0   record                                            56981 non-null  int64 
 1   Bullied_on_school_property_in_past_12_months      55742 non-null  object
 2   Bullied_not_on_school_property_in_past_12_months  56492 non-null  object
 3   Cyber_bullied_in_past_12_months                   56410 non-null  object
 4   Custom_Age                                        56873 non-null  object
 5   Sex                                               56445 non-null  object
 6   Physically_attacked                               56741 non-null  object
 7   Physical_fighting                                 56713 non-null  object
 8   Felt_lonely                                       56615 non-null  object
 9   Close_friends                                     55905 non-null  object
 10  Miss_school_no_permission                         55117 non-null  object
 11  Other_students_kind_and_helpful                   55422 non-null  object
 12  Parents_understand_problems                       54608 non-null  object
 13  Most_of_the_time_or_always_felt_lonely            56615 non-null  object
 14  Missed_classes_or_school_without_permission       55117 non-null  object
 15  Were_underweight                                  36052 non-null  object
 16  Were_overweight                                   36052 non-null  object
 17  Were_obese                                        36052 non-null  object
dtypes: int64(1), object(17)
memory usage: 7.8+ MB

All variables except the numeric record identifier are categorical and stored as strings (object dtype), which is convenient for the subsequent transformation into binary indicators.

Missing values analysis

Next, we examine the amount of missing data in each variable.

df.isnull().sum().sort_values(ascending=False)
Were_underweight                                    20929
Were_obese                                          20929
Were_overweight                                     20929
Parents_understand_problems                          2373
Missed_classes_or_school_without_permission          1864
Miss_school_no_permission                            1864
Other_students_kind_and_helpful                      1559
Bullied_on_school_property_in_past_12_months         1239
Close_friends                                        1076
Cyber_bullied_in_past_12_months                       571
Sex                                                   536
Bullied_not_on_school_property_in_past_12_months      489
Most_of_the_time_or_always_felt_lonely                366
Felt_lonely                                           366
Physical_fighting                                     268
Physically_attacked                                   240
Custom_Age                                            108
record                                                  0
dtype: int64

The three weight-status variables (Were_underweight, Were_overweight, Were_obese) are missing for 20,929 observations, approximately 37% of the sample. Although this level of missingness poses challenges for association rule mining, these variables are considered theoretically important, as weight status has been linked to peer victimization, stigma, and psychosocial vulnerability.

Rather than excluding these variables entirely, we adopt an alternative strategy and perform the main analysis on a reduced dataset consisting of observations with available weight status information. This choice represents a deliberate trade-off between sample size and conceptual richness: while the number of transactions is reduced, the resulting dataset allows the inclusion of potentially important individual characteristics that may contribute to bullying involvement.

For the remaining variables, which contain relatively small proportions of missing values, observations with missing entries are removed in later preprocessing steps. Since association rule mining relies on the presence or absence of items within transactions, retaining only complete cases ensures a consistent transactional representation and avoids introducing artificial co-occurrence patterns through imputation.
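The missing-value shares discussed above can be computed directly from a boolean mask. The following minimal sketch uses a small hypothetical frame (`toy`), not the survey data, to show `isna().mean()` and the complete-case filter; the last lines reproduce the weight-status share from the counts reported above.

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame standing in for the survey data
toy = pd.DataFrame({
    "Sex": ["Female", "Male", "Female", "Male", np.nan],
    "Were_obese": [np.nan, "No", "Yes", np.nan, "No"],
    "Felt_lonely": ["Never", "Always", "Rarely", "Sometimes", "Never"],
})

# Proportion of missing entries per column (mean of the boolean NaN mask)
missing_share = toy.isna().mean().sort_values(ascending=False)
print(missing_share)

# Complete-case analysis: keep only rows without any missing value
complete = toy.dropna()
print(f"kept {len(complete)} of {len(toy)} rows")

# Share of missing weight-status values in the full survey (counts from the table above)
weight_missing_share = 20929 / 56981
print(round(weight_missing_share, 3))
```

Complete-case filtering is simple and keeps every transaction fully observed, at the cost of discarding any row with a single gap.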

Binary encoding and transaction creation

In association rule mining, each observation must be represented as a transaction, i.e., a set of items. Therefore, categorical survey responses are transformed into binary indicators (0/1), where a value of 1 denotes the presence of a given attribute for a student. Only complete observations are retained to ensure a consistent transactional representation, which reduces the sample from 56,981 to 32,938 students.

df = df.dropna()
df.shape
(32938, 18)

Converting responses to binary format

Survey variables differ in their measurement scales (binary, ordinal, and frequency-based). Therefore, instead of applying a single uniform transformation, variable-specific recoding rules are used. The objective is to preserve meaningful distinctions in response intensity while avoiding excessive sparsity that could negatively affect support in association rule mining.

Variables describing physical attacks and participation in physical fights are not used as antecedent features. These behaviors conceptually overlap with bullying and can be interpreted as manifestations or direct consequences of bullying rather than independent risk factors. Excluding them allows the analysis to focus on psychosocial, behavioral, and individual vulnerability factors that may precede or co-occur with bullying involvement.

Binary Yes/No variables are mapped directly to 0/1 indicators. Selected ordinal variables (perceived support from other students and from parents) are collapsed into low (Never, Rarely) and high (Sometimes, Most of the time, Always) categories. Frequency-based variables are discretized into a small number of ordered categories: school absence into none, occasional (1–5 days), and frequent (6 or more days), and the number of close friends into none, few (1–2), and many (3 or more).

Age is extracted from the original textual format and discretized into four groups (11–13, 14–15, 16–17, and 18 or older), roughly representing early, middle, and late adolescence. Finally, the remaining categorical variables are converted into binary indicators using one-hot encoding.

This preprocessing strategy represents a compromise between full granularity of the original survey scales and overly simplistic binary encoding, and is intended to produce stable and interpretable association rules.

df_enc = df.copy()

# ============================================================
# NOTE: Physical violence variables are intentionally excluded
# ============================================================


# ========== School absence ==========
def recode_absence(x):
    if x == "0 days":
        return "None"
    elif x in ["1 or 2 days","3 to 5 days"]:
        return "Occasional"
    elif x in ["6 to 9 days","10 or more days"]:
        return "Frequent"
    else:
        return np.nan

df_enc["School_absence_level"] = df_enc["Miss_school_no_permission"].apply(recode_absence)


# ========== Friends ==========
def recode_friends(x):
    if x == "0":
        return "None"
    elif x in ["1","2"]:
        return "Few"
    elif x == "3 or more":
        return "Many"
    else:
        return np.nan

df_enc["Friends_level"] = df_enc["Close_friends"].apply(recode_friends)


# ========== Any bullying (target) ==========
df_enc["Any_bullying"] = (
    (df_enc["Bullied_on_school_property_in_past_12_months"]=="Yes") |
    (df_enc["Bullied_not_on_school_property_in_past_12_months"]=="Yes") |
    (df_enc["Cyber_bullied_in_past_12_months"]=="Yes")
).astype(int)


# ========== Loneliness ==========
df_enc["Lonely"] = (
    df_enc["Most_of_the_time_or_always_felt_lonely"]=="Yes"
).astype(int)


# ========== Sex ==========
df_enc["Sex_Female"] = (df_enc["Sex"]=="Female").astype(int)


# ========== Age ==========
# Leading number of e.g. "13 years old"; open-ended labels map to 11 and 18
df_enc["Age_num"] = df_enc["Custom_Age"].str.extract(r"(\d+)", expand=False).astype(float)

df_enc["Age_group"] = pd.cut(
    df_enc["Age_num"],
    bins=[10,13,15,17,20],
    labels=["Age_11_13","Age_14_15","Age_16_17","Age_18_plus"]
)

df_enc = pd.get_dummies(df_enc, columns=["Age_group"], dtype=int)
df_enc = df_enc.drop(columns=["Custom_Age","Age_num"])


# ========== Support ==========
support_map = {
    "Never":0,
    "Rarely":0,
    "Sometimes":1,
    "Most of the time":1,
    "Always":1
}

df_enc["Peers_support"] = df_enc["Other_students_kind_and_helpful"].map(support_map)
df_enc["Parents_support"] = df_enc["Parents_understand_problems"].map(support_map)

df_enc = df_enc.drop(columns=[
    "Other_students_kind_and_helpful",
    "Parents_understand_problems"
])

# ========== Weight status (binary) ==========
df_enc["Excess_weight"] = (
    (df_enc["Were_overweight"]=="Yes") | 
    (df_enc["Were_obese"]=="Yes")
).astype(int)

df_enc["Underweight"] = (
    df_enc["Were_underweight"]=="Yes"
).astype(int)


# ========== Drop raw columns ==========
df_enc = df_enc.drop(columns=[
    "Physically_attacked",
    "Physical_fighting",
    "Miss_school_no_permission",
    "Close_friends",
    "Bullied_on_school_property_in_past_12_months",
    "Bullied_not_on_school_property_in_past_12_months",
    "Cyber_bullied_in_past_12_months",
    "Most_of_the_time_or_always_felt_lonely",
    "Sex",
    "Felt_lonely",
    "Missed_classes_or_school_without_permission",
    "record",
    "Were_underweight",
    "Were_overweight",
    "Were_obese"
], errors="ignore")



# ========== One-hot encoding ==========
df_enc = pd.get_dummies(
    df_enc,
    columns=["School_absence_level", "Friends_level"],
    dtype=int,
)


df_enc = df_enc.dropna()
df_enc.head()
Any_bullying Lonely Sex_Female Age_group_Age_11_13 Age_group_Age_14_15 Age_group_Age_16_17 Age_group_Age_18_plus Peers_support Parents_support Excess_weight Underweight School_absence_level_Frequent School_absence_level_None School_absence_level_Occasional Friends_level_Few Friends_level_Many Friends_level_None
2 0 0 0 0 1 0 0 1 1 0 0 0 1 0 0 1 0
5 0 0 0 1 0 0 0 1 1 0 0 0 1 0 0 1 0
10 0 0 0 0 1 0 0 1 1 0 0 0 0 1 0 1 0
22 1 1 0 1 0 0 0 0 1 0 0 0 1 0 0 1 0
23 0 1 0 0 1 0 0 1 1 1 0 0 1 0 1 0 0
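To make the age binning used above concrete, the sketch below applies the same extraction and `pd.cut` call to a hypothetical series of age labels. Because `pd.cut` uses right-closed intervals by default, age 13 falls into Age_11_13 and age 18 into Age_18_plus; the open-ended labels "11 years old or younger" and "18 years old or older" collapse onto 11 and 18 when only the first number is extracted.

```python
import pandas as pd

# Age labels in the survey's textual format (hypothetical subset)
ages = pd.Series([
    "11 years old or younger", "13 years old", "14 years old",
    "15 years old", "16 years old", "17 years old", "18 years old or older",
])

# Extract the leading number as a Series (expand=False avoids a one-column DataFrame)
age_num = ages.str.extract(r"(\d+)", expand=False).astype(float)

# Right-closed intervals: (10, 13], (13, 15], (15, 17], (17, 20]
age_group = pd.cut(
    age_num,
    bins=[10, 13, 15, 17, 20],
    labels=["Age_11_13", "Age_14_15", "Age_16_17", "Age_18_plus"],
)
print(pd.DataFrame({"raw": ages, "group": age_group}))
```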

Creating transactions

To avoid generating rules dominated by the sex variable, the dataset was stratified by sex and transactions were created separately for girls and boys. This allows identification of gender-specific association patterns.

df_girls = df_enc[df_enc["Sex_Female"] == 1].drop(columns=["Sex_Female"])
df_boys  = df_enc[df_enc["Sex_Female"] == 0].drop(columns=["Sex_Female"])

def make_transactions(df):
    return [list(row[row == 1].index) for _, row in df.iterrows()]

trans_girls = make_transactions(df_girls)
trans_boys  = make_transactions(df_boys)
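The row-wise `iterrows()` loop in `make_transactions` is easy to read but slow on tens of thousands of rows. A vectorized sketch that should produce the same lists (items in column order) is shown below; the function name `make_transactions_fast` and the toy frame are hypothetical, and the sketch assumes every column is a 0/1 indicator.

```python
import pandas as pd


def make_transactions_fast(df: pd.DataFrame) -> list[list[str]]:
    """Return, for each row, the names of the columns whose value equals 1."""
    cols = df.columns.to_numpy()
    mask = df.to_numpy() == 1  # boolean matrix, one row per student
    return [list(cols[row]) for row in mask]


# Hypothetical toy one-hot frame
toy = pd.DataFrame({
    "Any_bullying": [1, 0, 1],
    "Lonely": [1, 0, 0],
    "Peers_support": [0, 1, 1],
})
print(make_transactions_fast(toy))
```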

Item frequency analysis

Before mining frequent itemsets and association rules, we examine how often individual items occur in the dataset. This step helps to understand the prevalence of different behaviors and to select appropriate support thresholds.

For girls, the most frequent single items indicate generally favorable social environments and low levels of school absence. The highest supports were observed for School_absence_level_None (0.73), Peers_support (0.72), and Friends_level_Many (0.66). At the same time, approximately 46% of girls reported experiencing at least one form of bullying (Any_bullying), while around 22% reported frequent loneliness. Excess body weight was present in about 26% of girls, whereas underweight status was relatively rare (1.7%). These results suggest substantial heterogeneity in psychosocial and individual vulnerability factors within the female subgroup.

from collections import Counter

# Count how many transactions contain each item
item_counts_girls = Counter()
for t in trans_girls:
    item_counts_girls.update(t)

item_freq_girls = pd.DataFrame.from_dict(
    item_counts_girls, orient="index", columns=["count"]
)
item_freq_girls["support"] = item_freq_girls["count"] / len(trans_girls)

item_freq_girls.sort_values("support", ascending=False).head(15)
count support
School_absence_level_None 12910 0.730245
Peers_support 12744 0.720855
Friends_level_Many 11680 0.660671
Parents_support 10323 0.583913
Any_bullying 8053 0.455512
Age_group_Age_14_15 7708 0.435998
Age_group_Age_16_17 6917 0.391255
Friends_level_Few 5041 0.285141
Excess_weight 4527 0.256067
Lonely 3971 0.224617
School_absence_level_Occasional 3943 0.223033
Age_group_Age_11_13 2946 0.166638
Friends_level_None 958 0.054189
School_absence_level_Frequent 826 0.046722
Underweight 292 0.016517

Among boys, a similar pattern emerges for the most prevalent attributes. The most frequent items include Peers_support (0.76), Friends_level_Many (0.74), and School_absence_level_None (0.69). Bullying involvement is reported by approximately 35% of boys, which is lower than among girls. Excess body weight is more common in boys (34%) than in girls, while underweight status remains infrequent (2.4%). Notably, loneliness is less prevalent among boys (9%) than among girls (22%), suggesting potential sex differences in emotional distress.

item_counts_boys = Counter()
for t in trans_boys:
    item_counts_boys.update(t)

item_freq_boys = pd.DataFrame.from_dict(
    item_counts_boys, orient="index", columns=["count"]
)
item_freq_boys["support"] = item_freq_boys["count"] / len(trans_boys)

item_freq_boys.sort_values("support", ascending=False).head(15)
count support
Peers_support 11618 0.761387
Friends_level_Many 11258 0.737794
School_absence_level_None 10556 0.691788
Parents_support 9594 0.628744
Age_group_Age_14_15 6608 0.433056
Age_group_Age_16_17 6169 0.404286
Any_bullying 5332 0.349433
Excess_weight 5182 0.339603
School_absence_level_Occasional 3951 0.258929
Friends_level_Few 3206 0.210106
Age_group_Age_11_13 2359 0.154597
Lonely 1419 0.092994
Friends_level_None 795 0.052100
School_absence_level_Frequent 752 0.049282
Underweight 362 0.023724

In addition to single-item frequencies, frequent 2-itemsets were analyzed to identify commonly co-occurring attributes. For both girls and boys, the most frequent pairs combine positive social indicators, such as (Peers_support, School_absence_level_None) and (Friends_level_Many, Peers_support), with supports exceeding 0.50 in both subgroups. This indicates that supportive peer environments and regular school attendance tend to co-occur. Pairs involving Any_bullying have substantially lower support than pairs reflecting positive social conditions, but among girls they still rank among the most frequent combinations: (Any_bullying, School_absence_level_None) and (Any_bullying, Peers_support) each reach a support of about 0.30. Among boys, no pair involving Any_bullying appears among the 15 most frequent pairs, consistent with the lower overall prevalence of bullying in this subgroup.

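from collections import Counter
from itertools import combinations

pair_counts_girls = Counter()

# Count each unordered pair once per transaction; sorting gives a canonical order
for t in trans_girls:
    pair_counts_girls.update(combinations(sorted(t), 2))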

pair_freq_girls = pd.DataFrame.from_dict(
    pair_counts_girls, orient="index", columns=["count"]
)

pair_freq_girls["support"] = pair_freq_girls["count"] / len(trans_girls)

pair_freq_girls.sort_values("support", ascending=False).head(15)
count support
(Peers_support, School_absence_level_None) 9561 0.540811
(Friends_level_Many, Peers_support) 8922 0.504667
(Friends_level_Many, School_absence_level_None) 8660 0.489847
(Parents_support, Peers_support) 8114 0.458963
(Parents_support, School_absence_level_None) 7907 0.447254
(Friends_level_Many, Parents_support) 7248 0.409978
(Age_group_Age_14_15, School_absence_level_None) 5851 0.330958
(Any_bullying, School_absence_level_None) 5522 0.312348
(Age_group_Age_14_15, Peers_support) 5479 0.309916
(Any_bullying, Peers_support) 5349 0.302562
(Age_group_Age_14_15, Friends_level_Many) 5320 0.300922
(Any_bullying, Friends_level_Many) 5074 0.287007
(Age_group_Age_16_17, Peers_support) 5010 0.283387
(Age_group_Age_16_17, School_absence_level_None) 4615 0.261044
(Age_group_Age_14_15, Parents_support) 4466 0.252616

pair_counts_boys = Counter()

for t in trans_boys:
    pair_counts_boys.update(combinations(sorted(t), 2))

pair_freq_boys = pd.DataFrame.from_dict(
    pair_counts_boys, orient="index", columns=["count"]
)

pair_freq_boys["support"] = pair_freq_boys["count"] / len(trans_boys)

pair_freq_boys.sort_values("support", ascending=False).head(15)
count support
(Friends_level_Many, Peers_support) 8917 0.584376
(Peers_support, School_absence_level_None) 8217 0.538502
(Parents_support, Peers_support) 7890 0.517072
(Friends_level_Many, School_absence_level_None) 7882 0.516548
(Friends_level_Many, Parents_support) 7418 0.486139
(Parents_support, School_absence_level_None) 6923 0.453699
(Age_group_Age_14_15, Peers_support) 4974 0.325972
(Age_group_Age_14_15, Friends_level_Many) 4952 0.324530
(Age_group_Age_16_17, Peers_support) 4786 0.313651
(Age_group_Age_14_15, School_absence_level_None) 4777 0.313061
(Age_group_Age_16_17, Friends_level_Many) 4412 0.289141
(Age_group_Age_14_15, Parents_support) 4208 0.275772
(Excess_weight, Peers_support) 3915 0.256570
(Age_group_Age_16_17, School_absence_level_None) 3911 0.256308
(Excess_weight, Friends_level_Many) 3793 0.248575

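Overall, the single-item and pair frequency analyses reveal broadly similar structural patterns across sexes, while also highlighting differences in the prevalence of loneliness and weight-related characteristics. Because several potentially relevant items are rare (Underweight at about 2%, and frequent school absence and having no close friends at about 5% in both subgroups), any itemset combining them with bullying is rarer still. These findings therefore motivate a relatively low minimum support threshold in the subsequent association rule mining and provide an empirical foundation for interpreting multi-item rules.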
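The single-item and pair tables above are produced by two near-identical loops per sex. A hypothetical helper, `itemset_support`, that generalizes both to any itemset size k is sketched below; it returns a plain dictionary sorted by support and is illustrated on toy transactions rather than the survey data.

```python
from collections import Counter
from itertools import combinations


def itemset_support(transactions, k):
    """Support of every k-itemset that occurs in at least one transaction."""
    counts = Counter()
    for t in transactions:
        # sorted() gives each itemset a canonical order, so (A, B) and (B, A) merge
        counts.update(combinations(sorted(t), k))
    n = len(transactions)
    # most_common() yields itemsets by descending count
    return {itemset: c / n for itemset, c in counts.most_common()}


toy = [["A", "B", "C"], ["A", "B"], ["B", "C"], ["A"]]
print(itemset_support(toy, 1))
print(itemset_support(toy, 2))
```

Enumerating all k-combinations per transaction is fine for short transactions like these (at most about a dozen items each), but grows quickly with k, which is one reason dedicated algorithms such as FP-Growth are used for the full itemset search.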

Association rule mining configuration

Association rules were generated using the FP-Growth algorithm. The following parameter settings were applied:

  • Minimum support = 0.01 – to exclude extremely rare patterns while preserving sufficient coverage of the population.

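  • Minimum confidence = 0.6 – to retain rules in which at least 60% of the transactions containing the antecedent also contain the consequent.

  • Minimum lift = 1.3 – to focus on associations at least 30% stronger than expected if antecedent and consequent were independent (lift = 1).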

  • Maximum antecedent length = 3 and single-item consequents – to ensure interpretability of the resulting rules.
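As a worked example of these metrics, the sketch below computes support, confidence, and lift for the rule Peers_support → School_absence_level_None from the girls' counts reported in the frequency tables. The number of girls (17,679) is not printed directly; it is inferred by dividing a count by its support (12,910 / 0.730245). The function name `rule_metrics` is hypothetical.

```python
def rule_metrics(count_antecedent, count_consequent, count_both, n):
    """Support, confidence and lift of antecedent -> consequent from raw counts."""
    support = count_both / n
    confidence = count_both / count_antecedent
    lift = confidence / (count_consequent / n)  # confidence / support(consequent)
    return support, confidence, lift


# Girls: Peers_support -> School_absence_level_None (counts from the tables above)
N_GIRLS = 17679  # inferred as count / support, e.g. 12910 / 0.730245
support, confidence, lift = rule_metrics(
    count_antecedent=12744,  # Peers_support
    count_consequent=12910,  # School_absence_level_None
    count_both=9561,         # (Peers_support, School_absence_level_None)
    n=N_GIRLS,
)
print(f"support={support:.3f}, confidence={confidence:.3f}, lift={lift:.3f}")
```

Even the most frequent pair in the girls' data reaches a lift of only about 1.03, i.e., it is close to statistical independence. Despite a confidence of about 0.75, it would not pass the 1.3 lift filter, which illustrates why high support alone does not make a pattern informative.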

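from mlxtend.frequent_patterns import association_rules, fpgrowth
from mlxtend.preprocessing import TransactionEncoder


def mine_rules(transactions, min_support=0.01):
    # One-hot encode list-of-items transactions into a boolean DataFrame
    te = TransactionEncoder()
    arr = te.fit(transactions).transform(transactions)
    df_tf = pd.DataFrame(arr, columns=te.columns_)

    # Frequent itemsets via FP-Growth
    frequent_itemsets = fpgrowth(
        df_tf,
        min_support=min_support,
        use_colnames=True
    )

    # Rules with confidence >= 0.6
    rules = association_rules(
        frequent_itemsets,
        metric="confidence",
        min_threshold=0.6
    )

    # Keep rules with lift >= 1.3
    rules = rules[rules["lift"] >= 1.3].copy()

    # Restrict to at most 3 antecedent items and a single consequent
    rules["antecedent_len"] = rules["antecedents"].apply(len)
    rules["consequent_len"] = rules["consequents"].apply(len)

    rules = rules[
        (rules["antecedent_len"] <= 3) &
        (rules["consequent_len"] == 1)
    ]

    return rules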

rules_girls = mine_rules(trans_girls)
rules_boys  = mine_rules(trans_boys)

FP-Growth and association rule mining were performed separately for girls and boys using identical parameter settings to ensure comparability of results.

Sensitivity analysis of minimum support threshold

To assess the impact of the minimum support threshold on the number of discovered rules, a sensitivity analysis was performed for selected support values (0.01, 0.02, 0.03, and 0.05) separately for girls and boys.

for i in [trans_girls, trans_boys]:
    for s in [0.01, 0.02, 0.03, 0.05]:
        r = mine_rules(i, min_support=s)
        print(s, r.shape[0])
    print('\n')
0.01 53
0.02 35
0.03 22
0.05 10


0.01 3
0.02 0
0.03 0
0.05 0

For girls, decreasing the minimum support threshold leads to a substantial increase in the number of discovered rules (53 rules for support = 0.01, 35 for 0.02, 22 for 0.03, and 10 for 0.05). This monotonic pattern indicates a clear trade-off between rule richness and rule strictness.

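For boys, rules satisfying the confidence and lift criteria appear only at the lowest support threshold (3 rules at 0.01), while higher thresholds yield none. These counts include all rules that pass the filters, not only those with Any_bullying as the consequent. The result suggests that associations among the selected attributes rarely reach the confidence and lift thresholds among boys, so more permissive support settings are required to detect them.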

Based on these results, a minimum support threshold of 0.01 was selected for the main analysis as a compromise between retaining a sufficient number of rules and avoiding extremely rare patterns.

Rules with Any_bullying as the consequent were extracted to identify combinations of factors associated with bullying involvement separately for girls and boys.
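The extraction step can be sketched as a filter on the `consequents` column of the mlxtend rules table. The helper `rules_for_target` below is hypothetical, an assumption about how the step could be implemented rather than the study's own code, and the toy table uses placeholder item names with illustrative values that are not study results.

```python
import pandas as pd


def rules_for_target(rules: pd.DataFrame, target: str = "Any_bullying") -> pd.DataFrame:
    """Keep rules whose single consequent is `target`, strongest lift first.

    Assumes mlxtend-style columns: frozenset 'antecedents'/'consequents'
    plus numeric 'support', 'confidence' and 'lift'.
    """
    mask = rules["consequents"].apply(lambda c: c == frozenset({target}))
    out = rules.loc[mask].copy()
    # Readable antecedent strings for reporting
    out["antecedents"] = out["antecedents"].apply(lambda a: ", ".join(sorted(a)))
    return out.sort_values("lift", ascending=False)[
        ["antecedents", "support", "confidence", "lift"]
    ]


# Hypothetical toy rules table (placeholder items, illustrative values only)
toy_rules = pd.DataFrame({
    "antecedents": [frozenset({"Item_A"}), frozenset({"Item_A", "Item_B"}),
                    frozenset({"Item_C"})],
    "consequents": [frozenset({"Any_bullying"}), frozenset({"Any_bullying"}),
                    frozenset({"Item_D"})],
    "support": [0.15, 0.05, 0.50],
    "confidence": [0.65, 0.70, 0.76],
    "lift": [1.43, 1.54, 1.05],
})
print(rules_for_target(toy_rules))
```

Applied to the notebook's objects, the call would be, for example, `rules_for_target(rules_girls)`. Because `mine_rules` already restricts consequents to a single item, an equality test against `frozenset({target})` is sufficient.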

Conclusion

This study applied association rule mining to adolescent health survey data in order to identify combinations of psychosocial, behavioral, and individual characteristics associated with bullying involvement. By transforming survey responses into transactional form and analyzing girls and boys separately, the analysis revealed clear and interpretable patterns.

Across both sexes, loneliness emerged as the most central factor appearing in bullying-related rules. Bullying involvement is most strongly associated with profiles that combine emotional distress with school disengagement and limited peer relationships. Among girls, excess body weight additionally appears in several high-lift rules, suggesting that weight-related vulnerability may intensify the association between loneliness and bullying.

The results demonstrate that association rule mining can uncover meaningful multi-factor profiles that go beyond simple bivariate relationships. Rather than identifying single predictors, this approach highlights how combinations of characteristics jointly characterize high-risk groups.

Overall, the findings emphasize the importance of addressing emotional well-being and school connectedness as key components of bullying prevention strategies.

Limitations and future work

Several limitations of this study should be acknowledged.

First, the analysis is based on cross-sectional self-reported data, which prevents any causal interpretation. The discovered rules describe co-occurrence patterns rather than directional effects.

Second, the high proportion of missing values in the weight-related variables (about 37%) required restricting the analysis to complete cases, which reduced the sample from 56,981 to 32,938 students and may introduce selection bias.

Third, association rule mining is sensitive to parameter choices such as minimum support, confidence, and lift. Although a sensitivity analysis was performed for the support threshold, different confidence or lift thresholds may yield alternative rule sets.

Fourth, the study focused on a limited subset of available survey variables. Incorporating additional contextual factors (e.g., family environment, mental health indicators, or school climate variables) could reveal richer patterns.

Future work could extend this analysis by:

  • incorporating additional psychosocial and family-level variables;

  • conducting cross-country and cross-cultural comparisons.