Student Performance

Introduction

Student test performance is shaped by a mix of academic preparation, learning environment, and personal circumstances that influence both how well students learn over time and how they perform on the day of an exam. Analyzing several of these variables together gives a holistic view of the factors associated with exam scores and helps identify which ones appear most beneficial or most detrimental to student performance.

Dataset

The data set covers test performance for 6,607 students. It is almost entirely complete; the only missing values are in teacher quality, parental education level, and the school’s distance from home. In this analysis, we examine how study behaviors and engagement, home and socioeconomic context, and school and social factors relate to outcomes. We also consider well-being and individual differences that can affect concentration, stamina, and test-taking conditions.
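The completeness check described above can be sketched as follows. This is a minimal illustration on a made-up four-row frame, not the real data; the column names `Teacher_Quality`, `Parental_Education_Level`, and `Distance_from_Home` are assumptions about how the three incomplete fields are named.

```python
import numpy as np
import pandas as pd

# Toy stand-in for the real data; column names and values are assumed for illustration
toy = pd.DataFrame({
    'Teacher_Quality': ['High', np.nan, 'Medium', 'Low'],
    'Parental_Education_Level': ['College', 'High School', np.nan, np.nan],
    'Distance_from_Home': ['Near', 'Far', 'Moderate', np.nan],
    'Exam_Score': [67, 72, 70, 65],
})

# Count missing values per column and keep only the columns that have any
missing_counts = toy.isna().sum()
missing_counts = missing_counts[missing_counts > 0]

# Express the gaps as a share of all rows to judge how complete the data is
missing_share = (missing_counts / len(toy)).round(2)

print(missing_counts)
print(missing_share)
```

Running the same two steps on the full data frame would show which columns are incomplete and how small the gaps are relative to the number of rows.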

Findings

The data reveal several noteworthy patterns in the factors associated with student exam performance. The mean score change across the entire sample was +7.83 points, indicating a generally positive trend; however, the distribution is wide, spanning approximately −40 to +35 points. Extreme values occur across all variables, and these outliers tend to coincide with extreme score changes, a pattern likely attributable to the small sample sizes in those peripheral categories. Most students cluster around moderate levels of the measured variables and show score changes close to the mean. As an informal point of comparison, a supplementary query to Google Gemini’s AI model estimated that an average individual would improve by approximately 7 points on a second test attempt, which is close to the observed mean; that figure is a model estimate rather than an empirical benchmark.

Data Setup

Loading the data frame and calculating the score change variable from exam 1 to exam 2

import pandas as pd
import matplotlib.pyplot as plt
import matplotlib.patches as mpatches
import numpy as np
import seaborn as sns
from matplotlib.ticker import FuncFormatter


# Load in student test performance data
df = pd.read_excel(r"C:\Users\davislia\Downloads\StudentPerformanceFactors.csv.xlsx")

# Calculate difference between first test score and second test score
df['Score Change'] = df['Exam_Score'] - df['Previous_Scores']
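
As a small worked example of the score change calculation, the sketch below applies the same subtraction to four made-up students and summarizes the result. The values are invented for illustration only.

```python
import pandas as pd

# Four invented students; only the two score columns used above are needed
toy = pd.DataFrame({
    'Previous_Scores': [60, 75, 90, 82],
    'Exam_Score': [68, 74, 93, 90],
})

# Positive values mean the second score was higher than the first
toy['Score Change'] = toy['Exam_Score'] - toy['Previous_Scores']

# Summarize the center and range of the changes
summary = toy['Score Change'].agg(['mean', 'median', 'min', 'max'])
print(toy)
print(summary)
```

Here the changes are 8, -1, 3, and 8 points, so the mean is 4.5 and the median is 5.5.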

Overall Performance

This bar chart illustrates the overall distribution of score changes across the student population. The mean score increase was 7.83 points and the median was 8 points. The near equality of the two indicates little skew, so the distribution is roughly symmetric about its mean and resembles a normal curve, although agreement between mean and median does not by itself establish normality. Because the center sits well above zero, this symmetry means comparable proportions of students fall above and below the average gain, not that gains and declines are equally common. The subsequent analyses examine specific variables that may have contributed to the observed variation in improvement.

# Create list of bins that will be used for test improvement
bins = [i for i in range(-45, 40, 5)]

# Create column categorizing test improvement into a bin category
df['Score Change Bins'] = pd.cut(df['Score Change'], bins = bins)

# Create data frame calculating the counts for each bin
df_scorechange = df.groupby('Score Change Bins')['Score Change Bins'].count().reset_index(name = 'Count')

# Set average improvement score
avg_improvement = df['Score Change'].mean()

# Get the left edge of the improvement range for each bin, e.g. (-40, -35] becomes -40
df_scorechange['Left'] = [int(interval.left) for interval in df_scorechange['Score Change Bins']]

# Set colors by test scores that were worse than previous (red), less than average improvement (yellow), greater than average improvement (green)
df_scorechange['Color'] = ['red' if left < 0 else 'yellow' if left < avg_improvement else 'green'
                           for left in df_scorechange['Left']]

# Create variables to use for color legend
worse = mpatches.Patch(color = 'red', label = 'Worse than previous')
less_than_average = mpatches.Patch(color = 'yellow', label = 'Just below average improvement')
better = mpatches.Patch(color = 'green', label = 'Better than average improvement')

# Create bar chart of improvement ranges and the count of students within each
fig, ax = plt.subplots(figsize = (18, 10))
plt.bar(df_scorechange['Left'], df_scorechange['Count'], width = 5, color = df_scorechange['Color'], edgecolor = 'black', align = 'edge')

# Add vertical dashed line where the mean is and add label
plt.axvline(df['Score Change'].mean(), color = 'blue', linestyle = 'dashed')
ax.text(avg_improvement, 750, f'Mean = {round(avg_improvement, 2)}')

# Add axis labels and title to plot
plt.xlabel('Score Change', size = 14)
plt.ylabel('Count', size = 14)
plt.title('Student Exam Performance', size = 18)

# Create legend and remove top and right spines
ax.legend(handles = [better, less_than_average, worse])
ax.spines['right'].set_visible(False)
ax.spines['top'].set_visible(False)

plt.show()
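
The symmetry claim in this section can be checked numerically rather than by eye. The sketch below is a minimal illustration on twelve invented score changes; it compares the mean and median, computes the sample skewness, and measures the share of students who gained versus declined.

```python
import pandas as pd

# Invented score changes, used only to illustrate the checks
score_change = pd.Series([-12, -3, 2, 5, 7, 8, 8, 9, 11, 14, 18, 25], name='Score Change')

mean_change = score_change.mean()
median_change = score_change.median()

# Skewness near zero suggests a roughly symmetric distribution
skewness = score_change.skew()

# Shares of students who improved, declined, or beat the average gain
share_gain = (score_change > 0).mean()
share_decline = (score_change < 0).mean()
share_above_mean = (score_change > mean_change).mean()

print(f'mean={mean_change:.2f}, median={median_change:.2f}, skew={skewness:.2f}')
print(f'gained={share_gain:.0%}, declined={share_decline:.0%}, above mean={share_above_mean:.0%}')
```

In this toy sample most students gain even though the values are spread fairly evenly around the mean, which is the distinction between symmetry about the mean and symmetry about zero.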

Involvement

This nested pie chart shows the composition of students by parental involvement (outer ring) and student motivation level (inner ring), two factors frequently cited in educational research as influences on academic outcomes. The majority of parents show a medium level of involvement, which may reflect an effort to support their children’s academic development without imposing undue pressure. Although moderate parental engagement can steer students toward productive study behaviors, most students report medium motivation regardless of the level of parental involvement. Low motivation is the next largest category, and high motivation is the smallest across all parental involvement tiers. Approximately 50% of students report medium motivation and roughly 30% report low motivation, even within the medium parental involvement group, so a modest overall mean improvement is consistent with a sample that is not highly driven to maximize performance gains.

# Create a df for total students and parental involvement/student motivation
mot_par_df = df.groupby(['Parental_Involvement', 'Motivation_Level'])['Score Change'].count().reset_index(name = 'Total Students')

# Sort the parental involvement and motivation level from low to high
sort_order = ['Low', 'Medium', 'High']
mot_par_df['Parental_Involvement'] = pd.Categorical(mot_par_df['Parental_Involvement'], categories = sort_order, ordered = True)
mot_par_df['Motivation_Level'] = pd.Categorical(mot_par_df['Motivation_Level'], categories = sort_order, ordered = True)
mot_par_df = mot_par_df.sort_values(by = ['Parental_Involvement', 'Motivation_Level']).reset_index(drop=True)


# Set outside color scheme
outside_color_ref = [4, 0, 8]


inside_color_ref = []

# Set inside color scheme
for ref in outside_color_ref:
    for i in range(ref, ref + 4):
        if i not in outside_color_ref:
            inside_color_ref.append(i)

fig = plt.figure(figsize = (10,10))
ax = fig.add_subplot(1,1,1)

colormap = plt.get_cmap('tab20c')
outer_colors = colormap(outside_color_ref)

total_students = len(df)

# Set outside part of pie chart
mot_par_df.groupby(['Parental_Involvement'])['Total Students'].sum().plot(
    kind = 'pie', radius = 1, colors = outer_colors, pctdistance = .83, labeldistance = 1.1,
    wedgeprops = dict(edgecolor = 'w'), textprops = {'fontsize': 14},
    autopct = lambda p: '{:.2f}%\n{:,.0f}'.format(p, (p/100) * total_students),
    startangle = 90)

# Set inside part of pie chart
inner_colors = colormap(inside_color_ref)
mot_par_df['Total Students'].plot(
    kind = 'pie', radius = 0.7, colors = inner_colors, pctdistance = 0.65, labeldistance = 0.8,
    wedgeprops = dict(edgecolor = 'w'), textprops = {'fontsize': 8},
    labels = mot_par_df['Motivation_Level'],
    autopct = lambda p: '{:.2f}%'.format(p),
    startangle = 90)

# Add the hole in the pie chart and all labels
hole = plt.Circle((0,0), 0.3, fc = 'white')
fig1 = plt.gcf()
fig1.gca().add_artist(hole)

plt.title('Number of Students by:\nParental Involvement (Outside)\nStudent Motivation Levels (Inside)', fontsize = 18)
ax.text(0, 0, 'Total Students\n{:,.0f}'.format(total_students), ha = 'center', va = 'center', fontsize = 14)

plt.show()
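
The motivation shares within each parental involvement tier, which the inner ring shows as slices of the whole, can also be tabulated directly. The sketch below is one possible approach using `pd.crosstab` with row normalization on eight invented students; it is not the method used to build the chart.

```python
import pandas as pd

# Invented students, used only to illustrate the table layout
toy = pd.DataFrame({
    'Parental_Involvement': ['Low', 'Low', 'Medium', 'Medium', 'Medium', 'Medium', 'High', 'High'],
    'Motivation_Level': ['Low', 'Medium', 'Low', 'Medium', 'Medium', 'High', 'Medium', 'High'],
})

levels = ['Low', 'Medium', 'High']

# normalize='index' turns each row into shares that sum to 1 within an involvement tier
shares = pd.crosstab(toy['Parental_Involvement'], toy['Motivation_Level'], normalize='index')

# Order rows and columns from low to high instead of alphabetically
shares = shares.reindex(index=levels, columns=levels)

print(shares.round(2))
```

Reading across a row answers the question the paragraph raises: within a given involvement tier, what fraction of students report each motivation level.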

Sleep Influence

This scatter plot examines the relationship between hours of sleep and score change. The majority of students fall within the 6–8 hour sleep range, with the highest concentration at 7 hours. This is encouraging, as research consistently associates insufficient sleep with diminished cognitive function and academic performance. The distribution of score changes within each sleep category follows a similar pattern, with few students at the extremes and larger concentrations near the center of the score change range. Because sample sizes are much smaller in the peripheral sleep categories (e.g., 4 and 10 hours), it is difficult to draw firm conclusions about the impact of extreme sleep durations. Overall, the data do not reveal a clear linear relationship between increased sleep and improved test scores, and most students show score changes near the overall mean.

# Create data frame that shows influence of hours of sleep on test score improvement
df_sleepinfluence = df.groupby(['Score Change Bins', 'Sleep_Hours'])['Score Change Bins'].count().reset_index(name = 'Count')
# Get the left edge of the improvement range for each bin, e.g. (-40, -35] becomes -40
df_sleepinfluence['Left'] = [int(interval.left) for interval in df_sleepinfluence['Score Change Bins']]

plt.figure(figsize = (18,10))

# Build scatter plot where color and size of each point is based on count of students within the category
plt.scatter(df_sleepinfluence['Left'], df_sleepinfluence['Sleep_Hours'], marker = '8', cmap = 'viridis',
            c = df_sleepinfluence['Count'], s = df_sleepinfluence['Count'], edgecolors = 'black')

# Add title, axis labels, and a color bar legend
plt.title('Sleep Influence on Test Performance', fontsize = 18)
plt.xlabel('Change in Test Score', fontsize = 14)
plt.ylabel('Hours of Sleep', fontsize = 14)
cbar = plt.colorbar()
cbar.set_label('Count of Students', rotation = 270, fontsize = 14, color = 'black', labelpad = 25)

# Create lists of improvement score ranges and their position on the x-axis
positions = []
labels = []
for i in range(int(df_sleepinfluence['Left'].min()), int(df_sleepinfluence['Left'].max()) + 1, 5):
    if i % 10 == 0:
        positions.append(i)
        labels.append(f'({i}, {i + 5}]')

# Change the x axis ticks to be improvement ranges rather than a single number
plt.xticks(positions, labels)

plt.show()
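
The statement that sleep and score change show no clear linear relationship can be backed by a correlation coefficient. The sketch below is a minimal illustration on invented values; it computes the Pearson correlation, a rank-based (Spearman-style) correlation obtained by correlating the ranks, and the mean change with group size for each sleep duration.

```python
import pandas as pd

# Invented students, used only to illustrate the calculation
toy = pd.DataFrame({
    'Sleep_Hours': [4, 5, 6, 6, 7, 7, 7, 8, 8, 9, 10],
    'Score Change': [9, 3, 12, 5, 8, 10, 6, 4, 11, 7, 8],
})

# Pearson correlation measures the strength of a linear relationship (-1 to 1)
pearson = toy['Sleep_Hours'].corr(toy['Score Change'])

# Correlating the ranks gives a Spearman-style coefficient that captures monotonic trends
rank_corr = toy['Sleep_Hours'].rank().corr(toy['Score Change'].rank())

# Mean change per sleep duration, with the group size alongside to flag thin groups
by_sleep = toy.groupby('Sleep_Hours')['Score Change'].agg(['mean', 'count'])

print(f'Pearson: {pearson:.3f}, rank correlation: {rank_corr:.3f}')
print(by_sleep)
```

A coefficient close to zero on the real data would support the reading above, and the `count` column makes it easy to see which sleep durations have too few students to interpret.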

Study Habits (Number of Students)

This line chart shows the number of students by hours studied, broken out by attendance bucket. The distribution follows an approximately normal pattern across all four attendance groups, with peaks at approximately 20 hours of study. The (70, 80] group (yellow line) displays a marginally higher peak.

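# Work on a copy so that adding a column does not trigger pandas' SettingWithCopyWarning
df_study_attendance = df[['Score Change', 'Hours_Studied', 'Attendance']].copy()

# Set attendance bins and create an attendance bin column
# (pd.cut intervals are right-closed by default, so an attendance of exactly 60 would fall outside (60, 70] and be left unbinned)
bins = [i for i in range(60, 101, 10)]
df_study_attendance['Attendance Bin'] = pd.cut(df_study_attendance['Attendance'], bins = bins)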

# Create a dataframe showing average improvement by hours studied and attendance
df_study_habits = df_study_attendance.groupby(['Hours_Studied', 'Attendance Bin'])['Score Change'].mean().reset_index(name = 'Average Test Improvement')

# Create a dataframe showing number of students by hours studied and attendance
df_study_habits_count = df_study_attendance.groupby(['Hours_Studied', 'Attendance Bin'])['Score Change'].count().reset_index(name = 'Number of Students')

fig = plt.figure(figsize = (18,10))
ax = fig.add_subplot(1, 1, 1)

my_colors = {'(60, 70]': 'red',
            '(70, 80]': 'yellow',
            '(80, 90]': 'blue',
            '(90, 100]': 'green',}

# Create line chart of number of students by hours studied, each line separated by attendance
for key, grp in df_study_habits_count.groupby(['Attendance Bin']):
    grp.plot(ax = ax, kind = 'line', x = 'Hours_Studied', y = 'Number of Students', color = my_colors[str(key[0])], label = key[0], marker = '8')

# Add labels to chart and set the x axis
plt.title('Number of Students by Study Habits', fontsize = 20)
ax.set_xlabel('Hours Studied', fontsize = 14)
ax.set_ylabel('Number of Students', fontsize = 14)
ax.set_xticks([i for i in range(0, 46, 5)])

plt.show()
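
The attendance buckets above come from `pd.cut`, whose intervals are closed on the right by default. The sketch below shows, on five invented attendance values, how a value sitting exactly on the lowest edge is handled with and without `include_lowest=True`.

```python
import pandas as pd

# Invented attendance percentages, chosen to sit on and around the bin edges
attendance = pd.Series([60, 61, 70, 71, 100], name='Attendance')
edges = list(range(60, 101, 10))

# Default behavior: intervals are (60, 70], (70, 80], ... so 60 itself is not binned
default_bins = pd.cut(attendance, bins=edges)

# include_lowest=True widens the first interval so the lowest edge is included
inclusive_bins = pd.cut(attendance, bins=edges, include_lowest=True)

print(pd.DataFrame({'Attendance': attendance,
                    'default': default_bins.astype(str),
                    'include_lowest': inclusive_bins.astype(str)}))
print('Unbinned with defaults:', default_bins.isna().sum())
```

With `include_lowest=True` the label of the first interval changes, so any lookup keyed on the string `'(60, 70]'` would need to be adjusted if that option were adopted.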

Study Habits (Performance)

This line chart shows the average test improvement by number of hours studied, broken out by attendance bucket. The data reveal a general pattern of diminishing returns beyond approximately 5–10 hours of study, after which additional study time does not appear to yield proportionally greater improvements. Students in the (60, 70] attendance bracket (red line) show the highest initial improvement at lower study hours, excluding the 1–4 hour range, where limited sample sizes make the averages unreliable. The chart also suggests an inverse relationship between attendance and test improvement: students in the (60, 70] group tend to show the greatest gains, while those in the (90, 100] group show comparatively lower improvement. One possible explanation is regression to the mean combined with a ceiling effect: students who started with lower scores have more room for measurable improvement, while students who started near the top of the scale have little room to gain. Because these groups are defined by attendance rather than by prior score, that explanation holds only if lower attendance coincides with lower previous scores, which this chart does not show.

fig = plt.figure(figsize = (18,10))
ax = fig.add_subplot(1, 1, 1)

my_colors = {'(60, 70]': 'red',
            '(70, 80]': 'yellow',
            '(80, 90]': 'blue',
            '(90, 100]': 'green',}

# Create line chart of test performance by hours studied, each line separated by attendance
for key, grp in df_study_habits.groupby(['Attendance Bin']):
    grp.plot(ax = ax, kind = 'line', x = 'Hours_Studied', y = 'Average Test Improvement', color = my_colors[str(key[0])], label = key[0], marker = '8')

# Add labels to chart and set the x axis
plt.title('Test Improvement by Study Habits', fontsize = 18)
ax.set_xlabel('Hours Studied', fontsize = 14)
ax.set_ylabel('Average Test Improvement', fontsize = 14)
ax.set_xticks([i for i in range(0, 46, 5)])

plt.show()
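
A regression-to-the-mean or ceiling-effect reading is easier to evaluate when students are grouped by their prior score rather than by attendance. The sketch below is a minimal illustration on ten invented students; the bin edges are hypothetical and chosen only for the example.

```python
import pandas as pd

# Invented students, used only to illustrate the check
toy = pd.DataFrame({
    'Previous_Scores': [52, 58, 63, 67, 74, 78, 85, 88, 93, 97],
    'Exam_Score': [66, 65, 70, 72, 75, 80, 84, 86, 90, 92],
})
toy['Score Change'] = toy['Exam_Score'] - toy['Previous_Scores']

# Group students by where they started (hypothetical 10-point brackets)
toy['Prior Bin'] = pd.cut(toy['Previous_Scores'], bins=[50, 60, 70, 80, 90, 100])

# Average change and group size per starting bracket
by_prior = toy.groupby('Prior Bin', observed=True)['Score Change'].agg(['mean', 'count'])

# A negative correlation means lower starting scores go with larger gains
corr_prior_change = toy['Previous_Scores'].corr(toy['Score Change'])

print(by_prior)
print(f'Correlation between prior score and change: {corr_prior_change:.3f}')
```

If the real data showed the same falling pattern across prior-score brackets, that would support the explanation; whether attendance brackets line up with prior scores is a separate question that would need its own check.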

Activity & Tutoring (Number of Students)

This heatmap displays the distribution of students across physical activity levels and tutoring session frequencies. Students are heavily concentrated in the lower tutoring range (0–3 sessions) and at light to vigorous physical activity levels. Student counts fall off sharply as the number of tutoring sessions increases; cells for six or more sessions contain negligible counts, frequently in the single digits or zero. Similarly, the extremes of the physical activity scale, None and Maximal, contain markedly fewer students across all tutoring levels. The vast majority of students therefore combine a small number of tutoring sessions with a moderate level of physical activity.

# Create dataframe of number of students by physical activity and tutoring sessions
hm_df_count = df.groupby(['Physical_Activity', 'Tutoring_Sessions'])['Tutoring_Sessions'].count().reset_index(name = 'Student Count')

# Set a categorical value for physical activity numeric value
activity_dict = {0:'None', 1:'Very Light', 2:'Light', 3:'Moderate', 4:'Vigorous', 5:'Very Vigorous', 6:'Maximal'}

# Map numeric physical activity to categorical value
hm_df_count['Physical_Activity'] = hm_df_count['Physical_Activity'].map(activity_dict)

# Convert data frame to pivot table
hm_count_pivot = pd.pivot_table(hm_df_count, index = 'Tutoring_Sessions', columns = 'Physical_Activity', values = 'Student Count')
hm_count_pivot = hm_count_pivot.fillna(0)

# Pull in only necessary columns
hm_count_pivot = hm_count_pivot[['None', 'Very Light', 'Light', 'Moderate', 'Vigorous', 'Very Vigorous', 'Maximal']]


fig = plt.figure(figsize = (18,10))
ax = fig.add_subplot(1,1,1)

count_fmt = FuncFormatter(lambda x, p: format(int(x), ','))

# Create heatmap
ax = sns.heatmap(hm_count_pivot, linewidth = 0.2, annot = True, cmap = 'coolwarm', fmt ='.0f',
                 square = False, annot_kws = {'size': 14},
                 cbar_kws = {'format': count_fmt, 'orientation': 'vertical'})

# Add labels and fix x and y ticks
plt.title('Number of Students By Physical Activity and Tutoring Sessions', fontsize = 18, pad = 15)
plt.xlabel('Physical Activity Level', fontsize = 16, labelpad = 10)
plt.ylabel('Number of Tutoring Sessions', fontsize = 16, labelpad = 10)
plt.yticks(rotation = 0, size = 14)

plt.xticks(size = 14)

# Invert y axis
ax.invert_yaxis()

plt.show()
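
The count table behind this heatmap can also be produced in one step with `pd.crosstab`, which fills empty combinations with zero. The sketch below is an alternative shown on eight invented students, not the approach used above; the 0 to 6 activity scale follows the mapping in `activity_dict`.

```python
import pandas as pd

# Invented students, used only to illustrate the table layout
toy = pd.DataFrame({
    'Physical_Activity': [2, 3, 3, 4, 3, 2, 0, 6],
    'Tutoring_Sessions': [0, 1, 1, 2, 0, 1, 6, 6],
})

# Rows are tutoring sessions, columns are activity levels, cells are student counts
counts = pd.crosstab(toy['Tutoring_Sessions'], toy['Physical_Activity'])

# Make sure every activity level from 0 to 6 appears, even if no student has it
counts = counts.reindex(columns=range(0, 7), fill_value=0)

print(counts)
```

Because missing combinations are already zero, no separate `fillna(0)` step is needed before plotting.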

Activity & Tutoring Sessions (Performance)

This heatmap presents the average test improvement for each combination of physical activity level and tutoring session frequency. Most cells show positive average improvements of approximately 4 to 13 points, but several outliers appear in cells with extremely small sample sizes. For instance, the cell for 6 tutoring sessions and Maximal physical activity reports an average improvement of +34.00 points, while 6 tutoring sessions and no physical activity yields -6.00 points; both are likely artifacts of minimal observations rather than meaningful trends. Among the more densely populated cells, any association between physical activity level and test improvement is modest at most: students engaging in moderate to vigorous activity with 0–3 tutoring sessions tend to show average improvements in the 7–10 point range, close to the overall dataset mean.

# Create dataframe of test performance by physical activity and tutoring sessions
hm_df_avg = df.groupby(['Physical_Activity', 'Tutoring_Sessions'])['Score Change'].mean().reset_index(name = 'Test Improvement')

# Map numeric physical activity to categorical value
hm_df_avg['Physical_Activity'] = hm_df_avg['Physical_Activity'].map(activity_dict)

# Convert data frame to pivot table
hm_avg_pivot = pd.pivot_table(hm_df_avg, index = 'Tutoring_Sessions', columns = 'Physical_Activity', values = 'Test Improvement')

# Pull in only necessary columns
hm_avg_pivot = hm_avg_pivot[['None', 'Very Light', 'Light', 'Moderate', 'Vigorous', 'Very Vigorous', 'Maximal']]

fig = plt.figure(figsize = (25,18))
ax = fig.add_subplot(1,1,1)

count_fmt = FuncFormatter(lambda x, p: format(int(x), ','))

# Create heatmap
ax = sns.heatmap(hm_avg_pivot, linewidth = 0.2, annot = True, cmap = 'coolwarm', fmt ='.2f',
                 square = False, annot_kws = {'size': 14},
                 cbar_kws = {'format': count_fmt, 'orientation': 'vertical'})

# Add labels and fix x and y ticks
plt.title('Test Improvement Averages By Physical Activity and Tutoring Sessions', fontsize = 18, pad = 15)
plt.xlabel('Physical Activity Level', fontsize = 16, labelpad = 10)
plt.ylabel('Number of Tutoring Sessions', fontsize = 16, labelpad = 10)
plt.yticks(rotation = 0, size = 14)

plt.xticks(size = 14)

# Invert y axis
ax.invert_yaxis()

plt.show()
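
One way to keep tiny cells from dominating the heatmap is to blank out averages that rest on too few students. The sketch below is a minimal illustration on invented data; `MIN_STUDENTS` is a hypothetical threshold, not a value taken from the analysis.

```python
import pandas as pd

# Invented students; the last row mimics a single-student cell with an extreme change
toy = pd.DataFrame({
    'Physical_Activity': [3, 3, 3, 3, 4, 4, 4, 6],
    'Tutoring_Sessions': [1, 1, 1, 1, 2, 2, 2, 6],
    'Score Change': [6, 8, 9, 7, 10, 5, 9, 34],
})

MIN_STUDENTS = 3  # hypothetical cutoff for a cell to be considered reliable

# Compute the mean and the number of students for every cell in one pass
cell_stats = toy.pivot_table(index='Tutoring_Sessions', columns='Physical_Activity',
                             values='Score Change', aggfunc=['mean', 'count'])
means = cell_stats['mean']
counts = cell_stats['count']

# Keep an average only where enough students back it; everything else becomes NaN
reliable_means = means.where(counts >= MIN_STUDENTS)

print(reliable_means)
```

Passing a table like `reliable_means` to `sns.heatmap` would leave the masked cells blank, since seaborn does not draw cells with missing values.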

Conclusion

The analyses in this study examine a range of factors associated with student exam performance, including sleep duration, parental involvement, student motivation, study habits, physical activity, and tutoring frequency. Across all variables, the overarching finding is one of moderation: the majority of students cluster around moderate levels of each factor, and the mean test improvement of +7.83 points reflects a modest but positive gain, in line with the roughly 7-point retest improvement estimated earlier. No single variable emerged as a dominant predictor of performance gains; rather, the data suggest that student outcomes are the product of a complex interplay among behavioral, environmental, and motivational factors.

Many of the most extreme data points correspond to cells with negligible sample sizes, which makes them statistically unreliable. In addition, the variables examined are largely quantitative and do not capture qualitative dimensions such as study strategy effectiveness or the nature of parental engagement. In light of these findings, it is recommended that educators, parents, and institutional stakeholders adopt a holistic and balanced approach to supporting student performance.

Student Performance - Python

Liam Davis

4/3/2026