Homework 5

Methodology

Step 1: Define High-Level Concepts with Sets of Examples

  • Purpose: Help the model interpret human-friendly concepts.
  • Process: Collect examples that represent each concept, like “striped” patterns.
  • Example: Use images of zebras, tigers, etc., to illustrate “striped.”
  • Key Point: Concept examples can be from outside the model’s training data.

Concept Example

Step 2: Create the Concept Activation Vector (CAV)

  • Purpose: Represent the concept as a vector the model can work with.
  • Process:
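    1. Extract activations from a chosen layer of the model for both concept and non-concept (e.g., random) images.
    2. Train a linear classifier to separate concept activations from non-concept activations.
  • Result: The CAV, the vector normal to the classifier’s decision boundary, which points toward the concept in the model’s activation space.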

CAV Example
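
The two sub-steps of Step 2 can be sketched in a few lines. This is only a minimal illustration, not a reference implementation: the activations are synthetic toy data, the function name `train_cav` is hypothetical, and plain gradient-descent logistic regression stands in for whatever linear classifier is actually used.

```python
import numpy as np

def train_cav(concept_acts, random_acts, lr=0.1, steps=500):
    """Fit a logistic-regression classifier on layer activations and
    return its unit-norm weight vector, used here as the CAV."""
    X = np.vstack([concept_acts, random_acts])
    y = np.concatenate([np.ones(len(concept_acts)), np.zeros(len(random_acts))])
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-(X @ w + b)))   # predicted P(concept)
        w -= lr * (X.T @ (p - y)) / len(y)       # gradient step on weights
        b -= lr * np.mean(p - y)                 # gradient step on bias
    return w / np.linalg.norm(w)                 # normal to the decision boundary

# Toy stand-in for real layer activations (assumption: 8-dimensional layer,
# concept examples shifted along the first axis).
rng = np.random.default_rng(0)
shift = np.array([3.0] + [0.0] * 7)
concept_acts = rng.normal(size=(100, 8)) + shift
random_acts = rng.normal(size=(100, 8))

cav = train_cav(concept_acts, random_acts)
print("CAV:", np.round(cav, 2))
```

In a real setting, `concept_acts` and `random_acts` would come from running concept and non-concept images through the model and reading out one layer.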

Steps 3 & 4: Model Sensitivity and TCAV Score

  • Step 3: Sensitivity:
    • Purpose: Measure concept impact.
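    • Process: Take the directional derivative of the class prediction with respect to the layer activations, along the CAV. A positive value means moving toward the concept raises the class score; a larger magnitude = stronger influence.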
  • Step 4: TCAV Score:
    • Purpose: Quantify concept influence.
    • Process: Calculate the % of a class’s inputs whose directional derivative is positive.
    • Example: 80% score = the concept pushes the prediction toward the class for 80% of that class’s inputs.

Sensitivity and TCAV Example
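
Steps 3 and 4 reduce to a dot product and a count. The sketch below assumes a toy "rest of the network" of the form logit(a) = u · tanh(W a), chosen only because its gradient can be written by hand; `logit_gradient`, `tcav_score`, `W`, and `u` are hypothetical names, and a real model would obtain the gradients through automatic differentiation.

```python
import numpy as np

def logit_gradient(acts, W, u):
    """Gradient of logit(a) = u . tanh(W a) with respect to a, one row per input."""
    hidden = np.tanh(acts @ W.T)             # shape (n_inputs, n_hidden)
    return ((1.0 - hidden ** 2) * u) @ W     # shape (n_inputs, n_dims)

def tcav_score(grads, cav):
    """Return (fraction of inputs with positive sensitivity, sensitivities)."""
    sensitivities = grads @ cav              # directional derivative along the CAV
    return float(np.mean(sensitivities > 0)), sensitivities

# Toy data (assumptions): 50 class inputs, 4-dim activations, 3 hidden units.
rng = np.random.default_rng(1)
acts = rng.normal(size=(50, 4))
W = rng.normal(size=(3, 4))
u = rng.normal(size=3)
cav = np.array([1.0, 0.0, 0.0, 0.0])         # placeholder unit-norm CAV

score, sens = tcav_score(logit_gradient(acts, W, u), cav)
print(f"TCAV score: {score:.2f}")
```

The score only counts the sign of each sensitivity, so it says how often the concept pushes toward the class, not how strongly.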

Steps 5 & 6: Validating CAVs and Comparing Concepts

  • Step 5: Statistical Validation:
    • Purpose: Confirm CAVs and TCAV scores aren’t random.
    • Process: Train CAVs over multiple runs, generate random CAVs as a control, then use a two-sided t-test to compare the two sets of TCAV scores.
    • Key Point: Ensures TCAV results are statistically meaningful.
  • Step 6: Relative TCAV:
    • Purpose: Compare sensitivity to similar concepts.
    • Process: Create a relative CAV by training the classifier on one concept’s examples against the other’s (e.g., “black hair” vs. “brown hair”) to see which is more influential.
    • Example: Shows whether the model is more sensitive to “black hair” or “brown hair” in a “celebrity” class.

Validation and Comparison Example
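
The comparison in Step 5 can be sketched with the standard library alone. This is one possible version under stated assumptions: it uses Welch's unequal-variance form of the t-test, the helper `welch_t` is hypothetical, and the score lists are made-up placeholder values, not results from any model.

```python
import math
import statistics

def welch_t(sample_a, sample_b):
    """Welch's t statistic and approximate degrees of freedom for two samples."""
    na, nb = len(sample_a), len(sample_b)
    va = statistics.variance(sample_a)       # sample variance (n - 1 denominator)
    vb = statistics.variance(sample_b)
    se2 = va / na + vb / nb                   # squared standard error of the difference
    t = (statistics.mean(sample_a) - statistics.mean(sample_b)) / math.sqrt(se2)
    df = se2 ** 2 / ((va / na) ** 2 / (na - 1) + (vb / nb) ** 2 / (nb - 1))
    return t, df

# Placeholder TCAV scores from repeated runs (illustrative numbers only).
concept_scores = [0.82, 0.79, 0.85, 0.80, 0.83]   # CAVs trained on the concept
random_scores = [0.52, 0.47, 0.55, 0.49, 0.50]    # CAVs trained on random sets

t_stat, df = welch_t(concept_scores, random_scores)
print(f"t = {t_stat:.2f}, df = {df:.1f}")
```

Turning the statistic into a p-value requires the t distribution with `df` degrees of freedom; if SciPy is available, `scipy.stats.ttest_ind(concept_scores, random_scores, equal_var=False)` performs the same test and returns the p-value directly.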