Objective:

Determine whether a performance gap exists between male and female students on the TCAP ELA quick score measure.

Data Import and Preparation: This section imports the TCAP ELA dataset, standardizes column names, and confirms that the `gender` variable is available for demographic analysis.

library(dplyr)

Attaching package: 'dplyr'
The following objects are masked from 'package:stats':

    filter, lag
The following objects are masked from 'package:base':

    intersect, setdiff, setequal, union
library(readr)
library(janitor)

Attaching package: 'janitor'
The following objects are masked from 'package:stats':

    chisq.test, fisher.test
# Import dataset
ela <- read_csv("tcap_rutherford.csv")
Rows: 15 Columns: 8
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (3): Teacher Last Name, Gender, Content Area
dbl (5): EnrolledGrade, Points Earned, Possible Points, Scaling Factor, Stud...

ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
# Clean and standardize column names
ela <- ela %>%
  clean_names() %>%
  # Only columns whose names actually change need to be listed
  rename(
    teacher_last    = teacher_last_name,
    points_possible = possible_points,
    quick_score     = student_quick_score
  )

# Confirm dataset structure and column names
head(ela)
# A tibble: 6 × 8
  teacher_last gender content_area enrolled_grade points_earned points_possible
  <chr>        <chr>  <chr>                 <dbl>         <dbl>           <dbl>
1 RUTHERFORD   MALE   ENG                       4            46              52
2 RUTHERFORD   FEMALE ENG                       4            45              52
3 RUTHERFORD   MALE   ENG                       5            29              54
4 RUTHERFORD   FEMALE ENG                       5            46              54
5 RUTHERFORD   FEMALE ENG                       5            36              54
6 RUTHERFORD   FEMALE ENG                       5            30              54
# ℹ 2 more variables: scaling_factor <dbl>, quick_score <dbl>
names(ela)
[1] "teacher_last"    "gender"          "content_area"    "enrolled_grade" 
[5] "points_earned"   "points_possible" "scaling_factor"  "quick_score"    

The dataset now includes gender along with key performance measures such as points earned and quick scores.

Descriptive Summary: This section summarizes TCAP ELA Quick Scores by gender to identify basic differences in mean, median, and variability in performance.

ela %>%
  group_by(gender) %>%
  summarise(
    mean_quick_score = mean(quick_score, na.rm = TRUE),
    median_quick_score = median(quick_score, na.rm = TRUE),
    sd_quick_score = sd(quick_score, na.rm = TRUE),
    count = n()
  )
# A tibble: 2 × 5
  gender mean_quick_score median_quick_score sd_quick_score count
  <chr>             <dbl>              <dbl>          <dbl> <int>
1 FEMALE             85.1               83.3           6.72     9
2 MALE               82.5               82.3           6.36     6
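In this sample, female students (n = 9) have a mean quick score of 85.1 and male students (n = 6) a mean of 82.5, a gap of about 2.6 points. The medians differ by about 1 point (83.3 vs 82.3), and the standard deviations are similar (6.72 vs 6.36). The gap is small relative to the spread within each group, so it is only an initial indication of possible performance variation.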
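To put the size of the gap in context, the reported summary values can be turned into a standardized effect size and a rough test statistic. The sketch below is not part of the original analysis: it assumes Cohen's d (pooled standard deviation) and a Welch two-sample t statistic are reasonable choices here, and it works only from the rounded values in the table above, so the results are approximate.

```r
# Summary values copied from the grouped table above (already rounded).
mean_f <- 85.1; sd_f <- 6.72; n_f <- 9   # FEMALE
mean_m <- 82.5; sd_m <- 6.36; n_m <- 6   # MALE

# Raw difference in mean quick score (female minus male).
gap <- mean_f - mean_m

# Cohen's d: the gap expressed in units of the pooled standard deviation.
pooled_sd <- sqrt(((n_f - 1) * sd_f^2 + (n_m - 1) * sd_m^2) / (n_f + n_m - 2))
cohens_d  <- gap / pooled_sd

# Welch t statistic, Welch-Satterthwaite degrees of freedom, two-sided p-value.
var_f_mean <- sd_f^2 / n_f
var_m_mean <- sd_m^2 / n_m
se_gap     <- sqrt(var_f_mean + var_m_mean)
t_welch    <- gap / se_gap
df_welch   <- (var_f_mean + var_m_mean)^2 /
  (var_f_mean^2 / (n_f - 1) + var_m_mean^2 / (n_m - 1))
p_value    <- 2 * pt(-abs(t_welch), df = df_welch)

print(round(c(gap = gap, cohens_d = cohens_d, t = t_welch,
              df = df_welch, p = p_value), 3))
```

With the full data available, `t.test(quick_score ~ gender, data = ela)` would run the same Welch test on the unrounded scores. With 15 students, either version has little power, so a non-significant result would not rule out a real gap.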

Visualization 1: ELA Quick Score Distribution by Gender

This visualization compares the distribution of TCAP ELA Quick Scores for male and female students. Differences in medians or score spread may suggest areas where instructional support or engagement patterns differ.

library(ggplot2)

ggplot(ela, aes(x = gender, y = quick_score, fill = gender)) +
  geom_boxplot(alpha = 0.7) +
  labs(
    title = "Distribution of TCAP ELA Quick Scores by Gender",
    x = "Gender",
    y = "Quick Score"
  ) +
  theme_minimal() +
  scale_fill_brewer(palette = "Set2")

A higher median line for one gender indicates a higher typical score for that group. Overlapping boxes suggest minimal difference. Wider boxes or longer whiskers indicate more variation within that group's scores.


Visualization 2: Relationship Between Points Earned and Quick Score, by Gender

This scatterplot examines how total points earned relate to TCAP Quick Scores, with color indicating gender. Any visible separation of trend lines may hint at scoring differences across gender groups.

ggplot(ela, aes(x = points_earned, y = quick_score, color = gender)) +
  geom_point(alpha = 0.7, size = 3) +
  # Inherits color = gender, so one fitted line is drawn per group
  geom_smooth(method = "lm", se = FALSE) +
  labs(
    title = "Relationship Between Points Earned and TCAP Quick Score by Gender",
    x = "Points Earned",
    y = "Quick Score",
    color = "Gender"
  ) +
  theme_minimal()
`geom_smooth()` using formula = 'y ~ x'

If the regression lines for male and female students are similar, points earned translate into quick scores in the same way for both groups. Divergent lines may indicate that scoring differences exist even when points earned are similar.
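Whether the lines diverge can also be checked numerically by fitting a model whose slope is allowed to differ by gender. The sketch below is illustrative only: `toy_scores` is a hypothetical data frame with invented values, not the TCAP data, and the interaction model is an assumed approach rather than part of the original analysis.

```r
# Toy data: invented values for illustration only, NOT the TCAP dataset.
toy_scores <- data.frame(
  gender        = rep(c("FEMALE", "MALE"), each = 4),
  points_earned = c(30, 36, 42, 48, 30, 36, 42, 48),
  quick_score   = c(70, 76, 82, 88, 69, 76, 83, 90)
)

# The interaction term lets the points_earned slope differ by gender.
fit <- lm(quick_score ~ points_earned * gender, data = toy_scores)

# FEMALE is the reference level (alphabetical order), so its slope is the
# points_earned coefficient; the MALE slope adds the interaction coefficient.
coefs        <- coef(fit)
slope_female <- coefs[["points_earned"]]
slope_male   <- slope_female + coefs[["points_earned:genderMALE"]]

print(c(female = slope_female, male = slope_male))
```

Replacing `toy_scores` with `ela` would apply the same comparison to the real scores. Because `ela` mixes enrolled grades with different possible points, any slope difference there should be read with that in mind.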

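Findings Summary: The analysis explored potential differences in TCAP ELA Quick Scores between male and female students. Descriptively, female students scored slightly higher on average (mean 85.1 vs 82.5; median 83.3 vs 82.3), with similar spread in both groups. Because the sample is small (9 female and 6 male students) and the comparison is descriptive, this gap should not be read as a statistically significant difference. If one group consistently shows slightly higher scores, this could reflect differences in engagement, test-taking strategies, or instructional access rather than ability.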

Because gender analysis involves demographic characteristics, results should be interpreted cautiously. Observed differences are descriptive and may reflect contextual or instructional factors beyond student control.