Homework 5

Methodology: TCAV (Testing with Concept Activation Vectors)

Step 1: Defining Concepts:
- Purpose: Interpret the model in terms of human-friendly concepts.
- Process: Collect a set of example images that represent each concept, such as a “striped” pattern, plus a set of random (non-concept) images for contrast.
- Example: Use images of zebras, tigers, etc., to illustrate “striped.”
- Key Point: Concept examples can come from outside the model’s training data.

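Step 2: Concept Activation Vectors (CAVs):
- Purpose: Translate the concept into a form the model can work with: a vector in its activation space.
- Process:
1. Extract activations at a chosen layer of the model for both the concept images and the non-concept images.
2. Train a linear classifier to separate concept activations from non-concept activations.
- Result: The CAV, a vector normal to the classifier’s decision boundary and pointing toward the concept examples, which represents the concept in the model’s activation space.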
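A minimal sketch of Step 2, assuming the layer activations have already been extracted as plain lists of floats. The linear classifier here is logistic regression trained with plain SGD, written by hand so the example needs only the standard library; in practice any linear classifier could be used. The names `train_cav` and `_sigmoid`, the hyperparameters, and the 2-D toy activations are all hypothetical. Normalizing the CAV to unit length is an assumption that keeps sensitivities comparable across concepts.

```python
import math
import random


def _sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_cav(concept_acts, non_concept_acts, lr=0.1, epochs=500, seed=0):
    """Fit a logistic-regression boundary between concept activations
    (label 1) and non-concept activations (label 0) with plain SGD, and
    return the boundary's unit normal, which points toward the concept."""
    rng = random.Random(seed)
    data = [(x, 1.0) for x in concept_acts] + [(x, 0.0) for x in non_concept_acts]
    w = [0.0] * len(data[0][0])
    b = 0.0
    for _ in range(epochs):
        rng.shuffle(data)
        for x, y in data:
            z = sum(wi * xi for wi, xi in zip(w, x)) + b
            g = _sigmoid(z) - y  # derivative of the log-loss w.r.t. z
            w = [wi - lr * g * xi for wi, xi in zip(w, x)]
            b -= lr * g
    norm = math.sqrt(sum(wi * wi for wi in w))
    return [wi / norm for wi in w]


# Hypothetical 2-D "layer activations"; real ones come from the network.
concept = [[2.0, 0.5], [2.5, -0.3], [1.8, 0.1], [2.2, 0.4]]
non_concept = [[-2.0, 0.2], [-1.5, -0.4], [-2.3, 0.3], [-1.9, -0.1]]
cav = train_cav(concept, non_concept)
print(cav)  # unit vector dominated by the first axis
```

Because the toy concept points sit on the positive side of the first axis, the CAV points mostly along that axis.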

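Step 3: Sensitivity:
- Purpose: Measure how much the concept affects a class prediction for a given input.
- Process: Take the directional derivative of the class logit with respect to the layer’s activations, in the direction of the CAV. A positive value means moving the activations toward the concept increases the class score; a larger magnitude means a stronger influence.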
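Step 3 can be sketched as follows. The sensitivity is the gradient of the class logit with respect to the layer activations, dotted with the CAV. A real implementation would get the gradient by backpropagation; this sketch swaps in a central finite difference along the CAV, which estimates the same directional derivative. `toy_logit` is a hypothetical stand-in for the rest of the network after the chosen layer, and the activations and CAV are made-up values.

```python
import math


def toy_logit(acts):
    # Hypothetical stand-in for the part of the network that maps
    # layer activations to the class logit.
    return 3.0 * acts[0] - 1.0 * acts[1] + 0.5 * math.tanh(acts[0] * acts[1])


def directional_derivative(h, acts, cav, eps=1e-5):
    """Central finite-difference estimate of grad h(acts) . cav."""
    plus = [a + eps * v for a, v in zip(acts, cav)]
    minus = [a - eps * v for a, v in zip(acts, cav)]
    return (h(plus) - h(minus)) / (2.0 * eps)


acts = [1.0, 2.0]  # hypothetical layer activations for one input
cav = [1.0, 0.0]   # hypothetical unit CAV
s = directional_derivative(toy_logit, acts, cav)
print(s)  # about 3.07; positive, so the concept raises the class logit here
```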
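
Step 4: TCAV Score:
- Purpose: Quantify the concept’s influence across a whole class.
- Process: Over all inputs of the class, calculate the percentage whose directional derivative is positive.
- Example: A score of 80% means the concept positively influenced the prediction for 80% of the class’s inputs.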
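The TCAV score from Step 4 is a simple count, sketched below. It assumes the per-input gradients of the class logit with respect to the layer activations are already available; the gradient values and the CAV are hypothetical. The sensitivity for each input is its gradient dotted with the CAV, and the score is the fraction of those sensitivities that are positive.

```python
def sensitivities(grads, cav):
    # One directional derivative per input: gradient . CAV.
    return [sum(g * v for g, v in zip(grad, cav)) for grad in grads]


def tcav_score(sens):
    """Fraction of inputs of the class whose sensitivity is positive."""
    if not sens:
        raise ValueError("need at least one input")
    return sum(1 for s in sens if s > 0) / len(sens)


# Hypothetical logit gradients for five inputs of one class.
grads = [[0.8, 0.1], [1.2, -0.5], [-0.3, 0.9], [0.05, 0.0], [2.0, 1.0]]
cav = [1.0, 0.0]  # hypothetical unit CAV
print(tcav_score(sensitivities(grads, cav)))  # → 0.8
```

Only the sign of each sensitivity counts, so the score reflects how consistently the concept pushes the prediction up, not how strongly.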

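Step 5: Statistical Validation:
- Purpose: Confirm that CAVs and TCAV scores are meaningful rather than products of random chance.
- Process: Train many CAVs, each against a different random set of non-concept images, and train random CAVs (random images vs. random images) as a control; then compare the two distributions of TCAV scores with a two-sided t-test.
- Key Point: Only concepts whose TCAV scores differ significantly from the random control are treated as meaningful.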
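A sketch of the comparison in Step 5, using only the standard library. The notes specify a t-test but not which variant; this example assumes Welch’s t-test (no equal-variance assumption) and computes the t statistic and its Welch-Satterthwaite degrees of freedom. The score lists are hypothetical, not real results.

```python
import math
import statistics


def welch_t(sample_a, sample_b):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom for
    two independent samples with possibly unequal variances."""
    na, nb = len(sample_a), len(sample_b)
    va = statistics.variance(sample_a) / na  # squared standard error of mean A
    vb = statistics.variance(sample_b) / nb  # squared standard error of mean B
    t = (statistics.mean(sample_a) - statistics.mean(sample_b)) / math.sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (na - 1) + vb ** 2 / (nb - 1))
    return t, df


# Hypothetical TCAV scores from repeated runs (not real results).
concept_scores = [0.82, 0.79, 0.85, 0.80, 0.84]  # concept CAVs vs. random sets
random_scores = [0.48, 0.55, 0.51, 0.46, 0.50]   # random-vs-random control CAVs
t, df = welch_t(concept_scores, random_scores)
print(t, df)  # t is roughly 16.9 with about 7.4 degrees of freedom
```

To turn t into a two-sided p-value, evaluate the Student t distribution with df degrees of freedom; with SciPy available, `scipy.stats.ttest_ind(concept_scores, random_scores, equal_var=False)` performs the same Welch test and returns the p-value directly.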

Step 6: Relative TCAV:
- Purpose: Compare the model’s sensitivity to semantically similar concepts.
- Process: Train a relative CAV by classifying one concept’s activations directly against the other’s (e.g., “black hair” vs. “brown hair”) instead of against random images.
- Example: The resulting TCAV score shows whether the model’s “celebrity” prediction is more sensitive to “black hair” or to “brown hair.”
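A sketch of Step 6 under the same assumptions as the earlier examples: activations and per-input logit gradients are given as lists of floats, and the linear classifier is hand-written logistic regression. The only change from an ordinary CAV is that concept A (label 1) is trained against concept B (label 0), so the relative CAV points from B toward A. All data, gradient values, and function names are hypothetical.

```python
import math
import random


def _sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def relative_cav(acts_a, acts_b, lr=0.1, epochs=500, seed=0):
    """Unit normal of a logistic-regression boundary separating concept A
    (label 1) from concept B (label 0); it points from B toward A."""
    rng = random.Random(seed)
    data = [(x, 1.0) for x in acts_a] + [(x, 0.0) for x in acts_b]
    w, b = [0.0] * len(data[0][0]), 0.0
    for _ in range(epochs):
        rng.shuffle(data)
        for x, y in data:
            g = _sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b) - y
            w = [wi - lr * g * xi for wi, xi in zip(w, x)]
            b -= lr * g
    norm = math.sqrt(sum(wi * wi for wi in w))
    return [wi / norm for wi in w]


def relative_tcav(grads, cav):
    """Fraction of class inputs whose logit gradient has a positive dot
    product with the relative CAV, i.e. inputs where A outweighs B."""
    pos = sum(1 for g in grads if sum(gi * vi for gi, vi in zip(g, cav)) > 0)
    return pos / len(grads)


# Hypothetical activations for "black hair" (A) and "brown hair" (B).
black_hair = [[0.1, 2.0], [-0.2, 1.8], [0.0, 2.3]]
brown_hair = [[0.2, -2.1], [-0.1, -1.9], [0.0, -2.2]]
cav = relative_cav(black_hair, brown_hair)
# Hypothetical "celebrity" logit gradients for four inputs.
grads = [[0.1, 1.0], [0.3, 0.8], [-0.2, 0.5], [0.4, -0.1]]
print(relative_tcav(grads, cav))  # → 0.75
```

A score above 0.5 in this toy setup would suggest the “celebrity” prediction leans more on “black hair” than on “brown hair”; whether that difference is meaningful still calls for the statistical check from Step 5.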