Harassment & Bullying: Disciplinary Data Analysis

Data Dictionary

Variable Name                        Description
state                                U.S. state or territory
total_students                       Total number of students in the state
category                             Student group (e.g., race/ethnicity, disability, English learner)
number                               Number of students disciplined for race-based harassment
percent                              Percent of total students in that category who were disciplined
Black or African American - Percent  % of Black students disciplined
White - Percent                      % of White students disciplined
Asian - Percent                      % of Asian students disciplined
(similar for other races / identities)
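
The dictionary lists one "- Percent" column per group, while the regression later in this report is fit on a long table (df_long) with a single category column and a single percent column. The report does not show how that reshaping was done, so the following is only a minimal sketch of the idea, written in Python rather than the R used for the outputs. The rows and their numbers are hypothetical placeholders, not values from the data set.

```python
# Hypothetical wide-format rows; the numbers are placeholders, not real data.
wide_rows = [
    {"state": "State A", "Black or African American - Percent": 10.0,
     "White - Percent": 50.0, "Asian - Percent": 2.0},
    {"state": "State B", "Black or African American - Percent": 20.0,
     "White - Percent": 40.0, "Asian - Percent": None},  # None marks a missing value
]

SUFFIX = " - Percent"


def wide_to_long(rows):
    """Turn each '<group> - Percent' column into its own (state, category, percent) row."""
    long_rows = []
    for row in rows:
        for column, value in row.items():
            if column.endswith(SUFFIX):
                long_rows.append({
                    "state": row["state"],
                    "category": column[: -len(SUFFIX)],  # strip the suffix to get the group name
                    "percent": value,
                })
    return long_rows


long_rows = wide_to_long(wide_rows)
for long_row in long_rows:
    print(long_row)
```

Each state contributes one row per group in the long layout, which is the shape a formula such as percent ~ category + state expects.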

Why I Chose this Data

I chose this data set because it highlights possible disparities in how students are disciplined for bullying or harassment based on race. While it’s important to support and protect victims, we also need to understand the students who are doing the harm. By looking at patterns in who gets flagged, we can start to uncover underlying issues and work toward solutions that prevent bullying before it happens. This kind of insight can help schools create fairer, more supportive environments for everyone.

Bar Plot – Percent Disciplined in California

Interpretation:

This bar plot displays the percentage of students disciplined for race-based harassment in California, broken down by demographic group. The data reveals that Latino and Black students face the highest discipline rates, with both groups showing much higher rates than Asian or Pacific Islander students.

Given that Latino students make up a large share of California’s school population, part of the higher count could stem from population size. However, national studies have found that Latino and Black students are often disciplined more harshly and more frequently than their counterparts, even when engaging in similar behaviors. Factors contributing to this include:

  • Implicit biases from teachers or administrators

  • Language and cultural barriers leading to misunderstanding or unfair assumptions

  • Zero-tolerance policies that penalize minor behaviors, disproportionately affecting students of color

This graph raises concern about systemic discipline inequities in California schools and highlights the need for culturally responsive practices and restorative alternatives.

Map – Discipline Rate for Black Students by State

This map shows how discipline rates for Black students vary across states. The darkest areas indicate states where Black students are disciplined most frequently for race-based harassment. This geographic pattern suggests the issue isn’t just local but national—rooted in broader systemic factors. The rates raise concern about bias in reporting, inconsistent school policies, and uneven enforcement across state lines.

T-Test: Black vs. White Students


    Paired t-test

data:  t_data$`Black or African American` and t_data$White
t = -7.5898, df = 52, p-value = 5.689e-10
alternative hypothesis: true mean difference is not equal to 0
95 percent confidence interval:
 -40.92323 -23.80885
sample estimates:
mean difference 
      -32.36604 

I used a paired t-test because I’m comparing discipline rates in the same states between two groups.
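
To make the pairing concrete, here is a minimal sketch of what a paired t-test computes, written in Python rather than the R used for the output above. The function name paired_t and the three-state example values are hypothetical. The key point is that the test works on one difference per state (first group minus second group), so the reported df = 52 corresponds to 53 matched pairs.

```python
import math
import statistics


def paired_t(x, y):
    """Paired t statistic for two equal-length samples matched element by element."""
    if len(x) != len(y):
        raise ValueError("paired samples must have the same length")
    diffs = [a - b for a, b in zip(x, y)]        # one difference per pair, x minus y
    n = len(diffs)
    mean_diff = statistics.mean(diffs)
    se = statistics.stdev(diffs) / math.sqrt(n)  # standard error of the mean difference
    return {"mean_diff": mean_diff, "t": mean_diff / se, "df": n - 1}


# Hypothetical percentages for three states (placeholders, not real data).
group_a = [10.0, 20.0, 30.0]
group_b = [20.0, 40.0, 60.0]
result = paired_t(group_a, group_b)
print(result)
```

Because the difference is taken as the first sample minus the second, a negative mean difference and a negative t value mean the first group's values are lower on average.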

This test tells us whether the difference in average percent disciplined between Black and White students is statistically significant. These were the t-test results:

  • Test type: Paired t-test
  • Variable compared: Percent of students disciplined for race-based harassment
  • Groups compared: Black vs. White students, matched by state
  • Mean difference (Black minus White): -32.4 percentage points
  • 95% CI: [-40.92, -23.81] percentage points
  • p-value: 5.689e-10
  • Conclusion: The percent values for Black and White students differ significantly. Because the difference is computed as Black minus White, the negative sign means the Black value is lower than the White value by about 32.4 percentage points on average. A gap this large is very unlikely to be due to random chance.
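
The figures in the R output can be cross-checked against each other. The sketch below, in Python rather than R, recovers the standard error from the reported mean difference and t statistic, then rebuilds the 95% confidence interval. The critical value 2.0066 is an assumption taken from a standard t table for df = 52; it does not appear in the output.

```python
# Values copied from the R paired t-test output above.
mean_diff = -32.36604
t_stat = -7.5898
df = 52

# Assumption: two-sided 95% critical value of Student's t for df = 52,
# taken from a standard t table (about 2.0066); it is not part of the R output.
T_CRIT_95_DF52 = 2.0066

std_error = mean_diff / t_stat          # because t = mean_diff / std_error
half_width = T_CRIT_95_DF52 * std_error
ci_low = mean_diff - half_width
ci_high = mean_diff + half_width

print(f"SE = {std_error:.4f}, 95% CI = [{ci_low:.2f}, {ci_high:.2f}]")
```

The rebuilt interval agrees with the one R printed (-40.92323 to -23.80885) to within rounding of the inputs.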

This t-test shows that across states, the average percent for Black students is 32.4 percentage points lower than that for White students (the mean difference, Black minus White, is -32.4). The gap is statistically significant, with a p-value of about 5.7e-10. This reveals a large difference in how racial groups are represented in disciplinary data, and one that is very unlikely to be due to chance. The test itself does not identify a cause. The gap may reflect institutional or systemic factors, possibly related to how behavior is interpreted, who gets reported, or how school personnel enforce discipline differently depending on race.

Boxplot – Black vs. White Students Disciplined by State

Interpretation

This boxplot compares the distribution of state-level percent values for Black and White students. Its direction should match the paired t-test, whose mean difference (Black minus White) is negative: on average the Black values sit about 32 percentage points below the White values. The plot adds what the test cannot show, namely how widely each group's values spread across states and whether the gap holds nationwide or is driven by a few states.
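
A boxplot is drawn from a handful of summary numbers per group. The sketch below computes them in Python for a hypothetical list of values; the function name five_number_summary is illustrative. Quartile conventions differ between tools, so these results would not necessarily match the hinges R uses for its boxplots.

```python
import statistics


def five_number_summary(values):
    """Minimum, quartiles, and maximum: the numbers a boxplot is drawn from."""
    q1, median, q3 = statistics.quantiles(values, n=4)  # default 'exclusive' method
    return {
        "min": min(values),
        "q1": q1,
        "median": median,
        "q3": q3,
        "max": max(values),
        "iqr": q3 - q1,  # interquartile range, the height of the box
    }


# Hypothetical state-level percentages (placeholders, not real data).
example = [1, 2, 3, 4, 5, 6, 7]
summary = five_number_summary(example)
print(summary)
```

Comparing the median and the interquartile range of two groups gives a numeric version of the "higher or lower" and "more or less spread" readings of the plot.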

Linear Regression Model


Call:
lm(formula = percent ~ category + state, data = df_long)

Residuals:
    Min      1Q  Median      3Q     Max 
-52.719  -4.210  -1.883   3.119  55.056 

Coefficients:
                                                    Estimate Std. Error t value
(Intercept)                                        4.281e+00  6.072e+00   0.705
categoryAsian                                     -2.109e+00  2.762e+00  -0.764
categoryBlack or African American                  1.537e+01  2.762e+00   5.566
categoryHispanic or Latino of any race             1.140e+01  2.762e+00   4.128
categoryNative Hawaiian or Other Pacific Islander -2.287e+00  2.762e+00  -0.828
categoryWhite                                      4.774e+01  2.762e+00  17.285
stateAlabama                                       5.167e-01  8.208e+00   0.063
stateAlaska                                       -8.667e-01  8.208e+00  -0.106
stateArizona                                       1.667e-01  8.208e+00   0.020
stateArkansas                                     -2.333e-01  8.208e+00  -0.028
stateCalifornia                                   -1.000e-01  8.208e+00  -0.012
stateColorado                                     -8.333e-02  8.208e+00  -0.010
stateConnecticut                                  -6.167e-01  8.208e+00  -0.075
stateDelaware                                      7.000e-01  8.208e+00   0.085
stateDistrict of Columbia                          7.000e-01  8.208e+00   0.085
stateFlorida                                       2.000e-01  8.208e+00   0.024
stateGeorgia                                       4.500e-01  8.208e+00   0.055
stateHawaii                                       -1.050e+00  8.208e+00  -0.128
stateIdaho                                         6.000e-01  8.208e+00   0.073
stateIllinois                                     -3.333e-02  8.208e+00  -0.004
stateIndiana                                       3.333e-02  8.208e+00   0.004
stateIowa                                         -1.333e-01  8.208e+00  -0.016
stateKansas                                       -1.000e-01  8.208e+00  -0.012
stateKentucky                                     -4.833e-01  8.208e+00  -0.059
stateLouisiana                                     7.167e-01  8.208e+00   0.087
stateMaine                                         1.167e-01  8.208e+00   0.014
stateMaryland                                     -2.333e-01  8.208e+00  -0.028
stateMassachusetts                                 1.000e-01  8.208e+00   0.012
stateMichigan                                      1.500e-01  8.208e+00   0.018
stateMinnesota                                    -1.667e-01  8.208e+00  -0.020
stateMississippi                                   5.500e-01  8.208e+00   0.067
stateMissouri                                      8.972e-16  8.208e+00   0.000
stateMontana                                       2.000e-01  8.208e+00   0.024
stateNebraska                                     -1.000e-01  8.208e+00  -0.012
stateNevada                                       -6.667e-02  8.208e+00  -0.008
stateNew Hampshire                                 5.667e-01  8.208e+00   0.069
stateNew Jersey                                    3.333e-01  8.208e+00   0.041
stateNew Mexico                                    7.000e-01  8.208e+00   0.085
stateNew York                                      2.667e-01  8.208e+00   0.032
stateNorth Carolina                                2.333e-01  8.208e+00   0.028
stateNorth Dakota                                  7.000e-01  8.208e+00   0.085
stateOhio                                         -1.167e-01  8.208e+00  -0.014
stateOklahoma                                     -4.500e-01  8.208e+00  -0.055
stateOregon                                       -6.833e-01  8.208e+00  -0.083
statePennsylvania                                  5.000e-02  8.208e+00   0.006
statePuerto Rico                                  -1.597e+01  8.208e+00  -1.945
stateRhode Island                                 -4.000e-01  8.208e+00  -0.049
stateSouth Carolina                                1.833e-01  8.208e+00   0.022
stateSouth Dakota                                 -8.500e-01  8.208e+00  -0.104
stateTennessee                                     2.000e-01  8.208e+00   0.024
stateTexas                                         2.500e-01  8.208e+00   0.030
stateUtah                                          4.000e-01  8.208e+00   0.049
stateVermont                                       8.333e-02  8.208e+00   0.010
stateVirginia                                      1.465e-15  8.208e+00   0.000
stateWashington                                   -7.833e-01  8.208e+00  -0.095
stateWest Virginia                                 1.000e-01  8.208e+00   0.012
stateWisconsin                                     6.667e-02  8.208e+00   0.008
stateWyoming                                       3.000e-01  8.208e+00   0.037
                                                  Pr(>|t|)    
(Intercept)                                         0.4814    
categoryAsian                                       0.4457    
categoryBlack or African American                 6.49e-08 ***
categoryHispanic or Latino of any race            4.92e-05 ***
categoryNative Hawaiian or Other Pacific Islander   0.4084    
categoryWhite                                      < 2e-16 ***
stateAlabama                                        0.9499    
stateAlaska                                         0.9160    
stateArizona                                        0.9838    
stateArkansas                                       0.9773    
stateCalifornia                                     0.9903    
stateColorado                                       0.9919    
stateConnecticut                                    0.9402    
stateDelaware                                       0.9321    
stateDistrict of Columbia                           0.9321    
stateFlorida                                        0.9806    
stateGeorgia                                        0.9563    
stateHawaii                                         0.8983    
stateIdaho                                          0.9418    
stateIllinois                                       0.9968    
stateIndiana                                        0.9968    
stateIowa                                           0.9871    
stateKansas                                         0.9903    
stateKentucky                                       0.9531    
stateLouisiana                                      0.9305    
stateMaine                                          0.9887    
stateMaryland                                       0.9773    
stateMassachusetts                                  0.9903    
stateMichigan                                       0.9854    
stateMinnesota                                      0.9838    
stateMississippi                                    0.9466    
stateMissouri                                       1.0000    
stateMontana                                        0.9806    
stateNebraska                                       0.9903    
stateNevada                                         0.9935    
stateNew Hampshire                                  0.9450    
stateNew Jersey                                     0.9676    
stateNew Mexico                                     0.9321    
stateNew York                                       0.9741    
stateNorth Carolina                                 0.9773    
stateNorth Dakota                                   0.9321    
stateOhio                                           0.9887    
stateOklahoma                                       0.9563    
stateOregon                                         0.9337    
statePennsylvania                                   0.9951    
statePuerto Rico                                    0.0528 .  
stateRhode Island                                   0.9612    
stateSouth Carolina                                 0.9822    
stateSouth Dakota                                   0.9176    
stateTennessee                                      0.9806    
stateTexas                                          0.9757    
stateUtah                                           0.9612    
stateVermont                                        0.9919    
stateVirginia                                       1.0000    
stateWashington                                     0.9240    
stateWest Virginia                                  0.9903    
stateWisconsin                                      0.9935    
stateWyoming                                        0.9709    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 14.22 on 260 degrees of freedom
  (159 observations deleted due to missingness)
Multiple R-squared:  0.6529,    Adjusted R-squared:  0.5768 
F-statistic:  8.58 on 57 and 260 DF,  p-value: < 2.2e-16

I ran a linear regression modeling the percent of students disciplined for harassment/bullying, using category (demographic group) and state as predictors.

  • Response variable: Percent of students disciplined
  • Predictors: Student category (race/ethnicity/disability/etc.) and state
  • R²: 0.653 (adjusted: 0.577)
    → The model explains ~65% of the variance in the discipline percentage
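  • Significant predictors (coefficients are relative to the omitted reference category):
    • categoryBlack or African American → +15.37 percentage points
    • categoryHispanic or Latino → +11.40 percentage points
    • categoryWhite → +47.74 percentage points (!)
    • Others not significant (p > 0.05)
  • State effects were not significant (p > 0.05); Puerto Rico came closest (p = 0.053)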
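
The summary statistics in the regression output are linked by standard formulas, which gives a quick consistency check. The sketch below, in Python rather than R, rebuilds the adjusted R-squared and the F-statistic from the reported R-squared and degrees of freedom. The variable names are illustrative.

```python
# Values copied from the R summary output above.
r_squared = 0.6529
num_predictors = 57   # numerator degrees of freedom of the F-statistic
residual_df = 260     # residual degrees of freedom

# Observations used in the fit = residual df + predictors + 1 (the intercept).
n_obs = residual_df + num_predictors + 1

# Adjusted R-squared penalizes R-squared for the number of predictors.
adjusted_r2 = 1 - (1 - r_squared) * (n_obs - 1) / residual_df

# Overall F-statistic: explained variance per predictor over residual variance per df.
f_stat = (r_squared / num_predictors) / ((1 - r_squared) / residual_df)

print(f"n = {n_obs}, adjusted R-squared = {adjusted_r2:.4f}, F = {f_stat:.2f}")
```

Both values agree with the output (adjusted R-squared 0.5768, F-statistic 8.58). The gap between R-squared and adjusted R-squared reflects the 52 state terms, which add parameters without adding much explanatory power.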

Interpretation: The linear regression model shows that demographic category strongly predicts the percent disciplined. Relative to the omitted reference category, being in the “Black” or “Hispanic or Latino” category raises the expected percent by about 15 and 11 percentage points respectively, even after accounting for state. The coefficient for “White” is the largest of all (+47.74), which may reflect how data is recorded or different reporting practices by race. State was not a significant factor, suggesting race has more predictive power than geography in this case.
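
To see how the coefficients combine, here is a minimal Python sketch of a fitted value from this kind of dummy-coded model. The coefficients are copied from the output above, but only two state terms are included for brevity. The function name predicted_percent is hypothetical, and any level missing from the dictionaries is treated as a reference level with effect 0, which is only correct for the true reference category and reference state.

```python
# Coefficients copied (rounded as printed) from the R output above.
INTERCEPT = 4.281
CATEGORY_COEF = {
    "Asian": -2.109,
    "Black or African American": 15.37,
    "Hispanic or Latino of any race": 11.40,
    "Native Hawaiian or Other Pacific Islander": -2.287,
    "White": 47.74,
}
# Only two of the state terms are copied here, to keep the sketch short.
STATE_COEF = {"California": -0.1, "Puerto Rico": -15.97}


def predicted_percent(category, state):
    """Fitted value = intercept + category effect + state effect.

    Levels absent from the dictionaries are treated as reference levels (effect 0).
    """
    return INTERCEPT + CATEGORY_COEF.get(category, 0.0) + STATE_COEF.get(state, 0.0)


for group in ("Black or African American", "White"):
    print(f"{group}, California: {predicted_percent(group, 'California'):.2f}")
```

A category coefficient is therefore the expected gap in percent between that category and the reference category within the same state, not a rate on its own.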

Logistic Regression Model (Predicting)

This logistic regression model predicts the likelihood of a student being disciplined for bullying or harassing someone based on race. It uses race/ethnicity and state of residence as predictors. The model outputs odds ratios (ORs), which show how much more (or less) likely a group is to be disciplined compared to a baseline group (usually the first or omitted category, like “American Indian or Alaska Native”).

Key Findings from Odds Ratios

Race/Ethnicity:

  • Black or African American: ~88× higher odds (significant)
  • Hispanic or Latino: ~107× higher odds (significant)
  • White: ~1068× higher odds (significant)
  • Asian & Native Hawaiian: No significant difference from baseline

State: No state showed a statistically significant difference in odds after adjusting for race/ethnicity
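
An odds ratio multiplies odds, not probabilities, so it helps to see the conversion. The sketch below, in Python rather than R, turns the reported odds ratios back into log-odds coefficients and shows the probability each one implies. The baseline probability of 0.001 is a hypothetical placeholder, not a value from the data, and the function names are illustrative.

```python
import math

# Odds ratios as reported in the findings above (approximate values).
ODDS_RATIOS = {
    "Black or African American": 88.0,
    "Hispanic or Latino": 107.0,
    "White": 1068.0,
}


def log_odds_coefficient(odds_ratio):
    """Logistic regression coefficient that corresponds to an odds ratio."""
    return math.log(odds_ratio)


def apply_odds_ratio(baseline_probability, odds_ratio):
    """Probability implied by multiplying the baseline odds by an odds ratio."""
    baseline_odds = baseline_probability / (1 - baseline_probability)
    new_odds = baseline_odds * odds_ratio
    return new_odds / (1 + new_odds)


BASELINE_P = 0.001  # hypothetical baseline probability, for illustration only

for group, ratio in ODDS_RATIOS.items():
    coef = log_odds_coefficient(ratio)
    prob = apply_odds_ratio(BASELINE_P, ratio)
    print(f"{group}: coefficient = {coef:.2f}, implied probability = {prob:.3f}")
```

Under that hypothetical baseline, an odds ratio of 88 implies a probability of roughly 8%, not 88 times 0.1%. The same odds ratio gives very different probabilities at different baselines.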

Interpretation: The logistic model shows the odds of being disciplined are dramatically higher for certain groups, especially White, Latino, and Black students, compared to the baseline group. For example, Latino students had odds over 100× higher than the reference category. Disparities this large are unlikely to be explained by behavior alone. They may reflect systemic issues in how discipline is assigned, possibly due to teacher bias, policy frameworks, and under-resourced support systems. Odds ratios of this size can also arise when the reference group has very few disciplined students, so they are best read as rough indicators rather than precise multipliers.

Conclusion

What This Data Tells Us

  • Black and Latino students are disciplined for race-based bullying/harassment at significantly higher rates than the regression reference group across U.S. states.
  • These disparities are not random: they’re statistically significant and align with national research on discipline bias.
  • Latino students, often overlooked in disciplinary research, face steep penalties despite limited evidence of higher behavioral incidents.
  • State effects were not statistically significant in the models, and race remains the most powerful predictor.

Call to Action: What Schools & Policymakers Can Do

Adopt Restorative Justice Practices:
Shift from zero-tolerance toward approaches that prioritize reflection, accountability, and conflict resolution.

Train Educators on Implicit Bias:
Ensure school staff understand how race/ethnicity can unconsciously influence discipline decisions.

Standardize Reporting:
Implement clearer definitions and reporting protocols to reduce subjectivity and inconsistency.

Support Over Punishment:
Increase funding for counselors, behavior specialists, and mental health professionals—especially in high-need districts.

Include Student Voice:
Engage students—especially from overdisciplined groups—in shaping discipline policies that are just and effective.

When implemented thoughtfully, these changes can help create school environments that are fairer, more supportive, and more inclusive for all students.