Splash


Presenter

Derek Sollberger

MS, UC Merced (2011)

  • Capstone project: Hidden Markov Models

Online Coursework

  • completed hundreds of hours of online courses in data science
    • Coursera
    • DataCamp

Continuing Lecturer

  • instructor of Data Science
    • Bio 18, Bio 175, Bio 184
    • Math 32, Math 181

Temperance

(to temper expectations)

  • “Jack of all trades; master of none”
  • usually teach introductory material
  • today’s slides and work were prepared yesterday

Outline

  1. Overview of Effect Size
    • correlation
    • p-values
    • Cohen’s d
    • Hedges g
  2. Starting Framework to Detect Gender Bias
    • Does a course favor female or male students?
    • automating classification
    • example of effect size
  3. Automating Assessment of Qualitative Responses
    • grading short responses
    • clustering
    • ranking

Effect Size


Correlation

Pearson’s correlation measures how related two vectors are to each other on a scale from \(r = -1\) to \(r = 1\)

  • \(-1 \leq r \leq -0.7\): strongly and negatively correlated
  • \(-0.7 < r < -0.4\): weakly and negatively correlated
  • \(-0.4 \leq r \leq 0.4\): virtually uncorrelated
  • \(0.4 < r < 0.7\): weakly and positively correlated
  • \(0.7 \leq r \leq 1.0\): strongly and positively correlated
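A minimal R sketch of how these values behave on simulated data (the variable names and seed are illustrative):

```r
# Pearson correlation on simulated data
set.seed(32)
x        <- rnorm(100)
y_strong <- 2 * x + rnorm(100, sd = 0.5)   # strongly related to x
y_none   <- rnorm(100)                     # unrelated to x

cor(x, y_strong)   # near +1: strongly and positively correlated
cor(x, y_none)     # near  0: virtually uncorrelated
```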

Correlation plot

p-values

“In statistical hypothesis testing, the p-value or probability value is the probability of obtaining test results at least as extreme as the results actually observed, assuming that the null hypothesis is correct” (Wikipedia). Many scientists are taught to seek p-values < 0.05, and many research journals expect them.

p-values plot

(image credit: “It’s the Effect Size, Stupid!”)

Example 1

We are going to simulate a comparison of garbanzo beans and chickpeas.

  • null hypothesis: garbanzo beans and chickpeas have the same weight
  • alternative hypothesis: garbanzo beans and chickpeas have different weights

Suppose we compare the two samples with a t-test. What happens if we repeatedly re-test those same two samples?
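One way to read that question, as a hedged R sketch: both samples come from the same population (the bean parameters quoted on the next slide), yet repeatedly re-testing random subsamples of those same two samples will occasionally produce p-values below 0.05. The sample sizes and seed are illustrative.

```r
# p-hacking by repeated testing of the same two samples
set.seed(18)
chickpeas <- rnorm(500, mean = 0.663, sd = 0.04)   # same population ...
garbanzos <- rnorm(500, mean = 0.663, sd = 0.04)   # ... as this one

p_values <- replicate(1000, {
  t.test(sample(chickpeas, 30), sample(garbanzos, 30))$p.value
})

mean(p_values < 0.05)   # roughly 5% of the tests "find" a difference anyway
```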

p-Hacking

Discussion

But chickpeas and garbanzo beans are the same thing! How did we “prove” that they have different weights?

  • chickpeas: \(\mu_{C} = 0.663\) grams, \(\sigma_{C} = 0.04\) grams
  • garbanzo beans: \(\mu_{G} = 0.663\) grams, \(\sigma_{G} = 0.04\) grams

p-hacking plot

Cohen’s d

In 1977, Jacob Cohen suggested the following measure, which does not depend on sample size, to compare means:

\[d = \frac{|\mu_{1} - \mu_{2}|}{s_{p}}\]

where \(s_{p}\) is the pooled standard deviation.

  • \(0 \leq d < 0.2\): very small effect
  • \(0.2 \leq d < 0.5\): small effect
  • \(0.5 \leq d < 0.8\): medium effect
  • \(0.8 \leq d\): large effect
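A short R sketch of the formula above, computing the pooled standard deviation explicitly; the data are simulated with the bean parameters used in Example 2 below, and the sample sizes are illustrative.

```r
# Cohen's d for two simulated samples
set.seed(175)
x1 <- rnorm(40, mean = 0.663, sd = 0.040)   # e.g. garbanzo beans
x2 <- rnorm(40, mean = 0.638, sd = 0.023)   # e.g. kidney beans

n1 <- length(x1); n2 <- length(x2)
s_pooled <- sqrt(((n1 - 1) * var(x1) + (n2 - 1) * var(x2)) / (n1 + n2 - 2))

d <- abs(mean(x1) - mean(x2)) / s_pooled
d   # compare against the 0.2 / 0.5 / 0.8 thresholds above
```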

Cohen’s d plot

Example 2

  • garbanzo beans: \(\mu_{G} = 0.663\) grams, \(\sigma_{G} = 0.0400\) grams
  • kidney beans: \(\mu_{K} = 0.638\) grams, \(\sigma_{K} = 0.023\) grams

The p-value from a t-test will be less than 0.05, but it is still concerning that the p-value tends to decrease as sample size increases.

p-value simulation

  • \(M = 50\) t-tests (R’s t.test) were performed and the p-values collected at each sample size \(N\); a sketch of this simulation appears below
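A hedged sketch of that simulation (the grid of sample sizes is an assumption):

```r
# p-values from M = 50 t-tests at each sample size N
set.seed(184)
sample_sizes <- c(10, 20, 50, 100, 200, 500)
M <- 50

p_matrix <- sapply(sample_sizes, function(N) {
  replicate(M, {
    garbanzo <- rnorm(N, mean = 0.663, sd = 0.040)
    kidney   <- rnorm(N, mean = 0.638, sd = 0.023)
    t.test(garbanzo, kidney)$p.value
  })
})
colnames(p_matrix) <- sample_sizes

# the typical p-value shrinks toward zero as N grows,
# even though the difference in population means never changes
apply(p_matrix, 2, median)
```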

Cohen’s d Simulation

Hedges g

In 1981, Hedges suggested the following updated measure to compare means:

\[g = \frac{|\mu_{1} - \mu_{2}|}{s_{p}} \cdot \frac{N - 3}{N - 2.25} \cdot \sqrt{\frac{N-2}{N}}\]

where \(s_{p}\) is the pooled standard deviation. This correction is said to perform better for sample sizes < 20.

  • \(0 \leq g < 0.2\): very small effect
  • \(0.2 \leq g < 0.5\): small effect
  • \(0.5 \leq g < 0.8\): medium effect
  • \(0.8 \leq g\): large effect
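Continuing the Cohen’s d sketch from earlier (reusing d, n1, and n2), a minimal application of the correction, taking \(N\) to be the combined sample size:

```r
# Hedges' g: small-sample correction applied to Cohen's d
N <- n1 + n2
g <- d * ((N - 3) / (N - 2.25)) * sqrt((N - 2) / N)
g   # interpret with the same 0.2 / 0.5 / 0.8 thresholds
```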

Hedges g plot

Hedges g simulation

References

Gender Bias


Motivation

In Spring 2020,

  • In my life science class (Bio 18), the top 10 students were evenly divided between women and men
  • In my physical science class (Math 32), 8 of the top 10 students were men

Goal: detect gender bias in final semester grades (if any)

Idea: Use the R gender package to automate gender classification (a sketch follows the disclaimers below).

Disclaimers

  • Using binary “female” and “male” labels
  • Comparing first names against USA Social Security Administration records
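A hedged sketch of that classification step with the gender package; the names and year range here are illustrative, and the SSA name tables ship in the companion genderdata package.

```r
# classify first names via Social Security Administration records
library(gender)

first_names <- c("Maria", "Derek", "Alex")            # illustrative names
gender(first_names, years = c(1995, 2002), method = "ssa")
# one row per matched name, with proportion_female, proportion_male,
# and a binary "gender" label used for the grouping below
```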

Setup:

  • null hypothesis: female and male students achieve the same grades
  • alternative hypothesis (two-sided): female and male students achieve different grades
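Once each student has a gender label and a final grade, the comparison itself can be run as below; the grade vectors are simulated stand-ins, and effsize is one package (an assumption, not necessarily the one used here) that provides both Cohen’s d and the Hedges correction.

```r
library(effsize)

# illustrative final grades for the two groups
set.seed(2020)
grades_f <- rnorm(28, mean = 80, sd = 10)
grades_m <- rnorm(29, mean = 78, sd = 10)

t.test(grades_f, grades_m)                               # two-sided p-value
cohen.d(grades_f, grades_m)                              # Cohen's d
cohen.d(grades_f, grades_m, hedges.correction = TRUE)    # Hedges' g
```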

Example

Setting

Bio 18

  • “life science course”
  • majors: biological sciences
  • 28 female students, 29 male students

Math 32

  • “physical science course”
  • majors: applied math, bioengineering, chemical sciences, computer science and engineering, environmental engineering, materials science and engineering, mechanical engineering, physics
  • 61 female students, 448 male students

and

  • Spring 2020 semester
  • emergency remote instruction (Covid-19)

Bio 18

  • p-value = 0.2849: fail to reject null hypothesis
  • Cohen’s d = 0.28: small effect
  • Hedges’ g = 0.28: small effect

Density Plot


Math 32

  • p-value = 0.6111: fail to reject null hypothesis
  • Cohen’s d = 0.06: very small effect
  • Hedges’ g = 0.06: very small effect

Density Plot


Automate Grading


Setting

Students in Math 32 did a homework assignment where one task was to perform a hypothesis test on responses to the survey question, “On a scale from 0 = Democrat to 100 = Republican, where are your political leanings?”.

  • null hypothesis: \(\mu = 50\)
  • alternative hypothesis: \(\mu \neq 50\)

Instructor’s solution: “Since the p-value < 0.05, we reject the claim of an unbiased student population at the alpha = 0.05 significance level.”
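A minimal sketch of the test the students were asked to run; survey_scores is an illustrative stand-in for the 0-to-100 survey responses.

```r
# one-sample t-test of H0: mu = 50, two-sided alternative
set.seed(32)
survey_scores <- pmin(100, pmax(0, rnorm(50, mean = 42, sd = 20)))  # fake data

t.test(survey_scores, mu = 50, alternative = "two.sided")
# if the p-value < 0.05, reject the claim of an unbiased
# (mu = 50) student population at the alpha = 0.05 level
```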

Task: measure how “similar” each student’s response was to the instructor’s solution, and then assign grades.

  • assume 3 clusters of responses, and assign the labels “A”, “B”, and “C”

Cosine Distance

Let \(x\) and \(y\) be two “sentence vectors” in a “sentence vector space”. The cosine distance is

\[\cos(x,y) = \frac{x \cdot y}{\|x\| \, \|y\|}\]

  • if two sentences align perfectly (i.e. are exactly the same), then \(\cos(x,y) = 1\)
  • if two sentences are perpendicular, then \(\cos(x,y) = 0\)

Disclaimers:

  • computer scientists use a distance of zero to signify perfect similarity and one for perpendicular vectors. Here I am using the complement (opposite) to align with grading schemes
  • character strings are converted into ASCII code before numerical calculations
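A hedged sketch of that computation: strings are turned into integer character codes (one reading of “converted into ASCII code”), zero-padded to a common length, and compared with the formula above; the actual preprocessing may have differed.

```r
# cosine similarity between two responses, after converting characters
# to integer codes and zero-padding the shorter vector
cosine_similarity <- function(a, b) {
  x <- utf8ToInt(a)
  y <- utf8ToInt(b)
  n <- max(length(x), length(y))
  x <- c(x, rep(0, n - length(x)))
  y <- c(y, rep(0, n - length(y)))
  sum(x * y) / (sqrt(sum(x^2)) * sqrt(sum(y^2)))
}

cosine_similarity("reject the null hypothesis",
                  "reject the null hypothesis")   # 1: identical sentences
cosine_similarity("reject the null hypothesis",
                  "fail to reject")               # lower similarity
```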

Clustering
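A hedged sketch of this step, reusing cosine_similarity from above: each response is reduced to its similarity score against the instructor’s solution, kmeans groups the scores into 3 clusters, and the clusters are ranked into letter grades. The responses and the single-feature clustering are simplifying assumptions.

```r
# cluster responses by similarity to the instructor's solution
set.seed(50)
instructor <- "Since the p-value < 0.05, we reject the claim of an unbiased student population"
responses  <- c("we reject the claim of an unbiased population",
                "the p-value is less than 0.05 so we reject",
                "p-value is small so reject the null",
                "fail to reject, the students are unbiased")   # illustrative

scores <- sapply(responses, cosine_similarity, b = instructor)

km <- kmeans(scores, centers = 3)

# rank the clusters by mean similarity and assign letter grades
cluster_rank <- order(tapply(scores, km$cluster, mean), decreasing = TRUE)
grades <- c("A", "B", "C")[match(km$cluster, cluster_rank)]
data.frame(similarity = round(scores, 3), grade = grades)
```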

Future Considerations

In this example, the computer algorithm matched my grading on 76% (38/50) of the observations.

Future Augmentation

  • hard filter
    • e.g., if the task is about mitosis, then “interphase”, “prophase”, “metaphase”, “anaphase”, and “telophase” are required
  • word count
    • cut off responses at, say, 50 words

Known Issues

  • misspellings? issues of equity?
  • too systematic? formulaic phrasing (Mad Libs)?
  • how much verification is needed?
  • multiple acceptable answers?

End


Thanks!