Group-6
With the widespread use of Machine Learning (ML) in various fields, understanding the behavior of these complex models is particularly important. Traditional interpretability methods explain model predictions in terms of input features, but this faces challenges in deep learning: the features the model operates on (e.g., pixel values) do not correspond to high-level concepts that humans understand, and the model’s internal state is difficult to interpret.
To address these issues, the authors propose Concept Activation Vectors (CAVs). This approach maps the internal state of the model to human-understandable high-level concepts, defined by the user through example sets, as directions in the model’s activation space. The TCAV (Testing with Concept Activation Vectors) method further quantifies the extent to which a particular concept affects model predictions. The approach does not require retraining the model, allowing flexible concept customization and global model interpretation.
In sum, TCAV combines the strengths of existing interpretability approaches to assess how much a user-defined concept influences model predictions.
User-defined Concepts: TCAV allows users to define concepts using example sets (e.g., “striped” or “curly” patterns), enabling flexible interpretation beyond existing features.
Concept Activation Vectors (CAVs): A CAV represents a concept as a direction in a layer’s activation space, obtained by training a linear classifier to separate activations of concept examples from activations of random examples; the vector orthogonal to the classifier’s decision boundary is the CAV, and it quantifies the model’s sensitivity to that concept.
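A minimal sketch of how a CAV could be learned, assuming layer-l activations for the concept examples and for random counterexamples have already been extracted (the array names and the choice of scikit-learn’s LogisticRegression are illustrative assumptions, not the paper’s exact implementation):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def learn_cav(concept_acts, random_acts):
    """Learn a CAV for one concept at one layer.

    concept_acts: (n_concept, m) activations of concept example inputs
    random_acts:  (n_random, m)  activations of random counterexamples
    Returns a unit vector in R^m pointing toward the concept activations.
    """
    X = np.vstack([concept_acts, random_acts])
    y = np.concatenate([np.ones(len(concept_acts)),
                        np.zeros(len(random_acts))])

    # Linear classifier separating concept vs. random activations;
    # its weight vector is orthogonal to the decision boundary.
    clf = LogisticRegression(max_iter=1000).fit(X, y)
    cav = clf.coef_.ravel()
    return cav / np.linalg.norm(cav)   # unit-length CAV v_C^l

# Toy usage with m = 512 activation dimensions and fake activations.
rng = np.random.default_rng(0)
v_cav = learn_cav(rng.normal(1.0, 1.0, (100, 512)),
                  rng.normal(0.0, 1.0, (100, 512)))
```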
Conceptual sensitivity with CAVs: The sensitivity S of class k to a concept C at layer l is the directional derivative of the class-k logit along the CAV:

\[
S_{C,k,l}(x) = \lim_{\epsilon \to 0} \frac{h_{l,k}\bigl(f_l(x) + \epsilon\, v_C^l\bigr) - h_{l,k}\bigl(f_l(x)\bigr)}{\epsilon} = \nabla h_{l,k}\bigl(f_l(x)\bigr) \cdot v_C^l
\]

where h_{l,k} : ℝ^m → ℝ maps layer-l activations to the logit of class k, v_C^l ∈ ℝ^m is the unit CAV for concept C, and f_l(x) denotes the activations of input x at layer l.
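A small numerical sketch of this directional derivative, using a finite-difference approximation and a toy linear logit head (the function and variable names are illustrative assumptions):

```python
import numpy as np

def concept_sensitivity(h_lk, act, cav, eps=1e-3):
    """Directional derivative of the class-k logit h_{l,k}
    along the unit CAV, evaluated at the layer-l activation of x.

    h_lk: callable R^m -> R (activations -> logit for class k)
    act:  f_l(x), the layer-l activation vector of input x
    cav:  unit CAV v_C^l
    """
    # Finite-difference form of
    # S_{C,k,l}(x) = lim [h_{l,k}(f_l(x) + eps*v_C^l) - h_{l,k}(f_l(x))] / eps
    return (h_lk(act + eps * cav) - h_lk(act)) / eps

# Toy example: a linear logit head, so the sensitivity equals w . v_C^l.
m = 512
rng = np.random.default_rng(1)
w = rng.normal(size=m)
h_lk = lambda a: float(w @ a)
act = rng.normal(size=m)
cav = rng.normal(size=m); cav /= np.linalg.norm(cav)
print(concept_sensitivity(h_lk, act, cav))
```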
Testing with CAVs: For a class label k with inputs X_k in a supervised learning model, the TCAV score

\[
\mathrm{TCAV}_{Q_{C,k,l}} = \frac{\bigl|\{\, x \in X_k : S_{C,k,l}(x) > 0 \,\}\bigr|}{|X_k|} \in [0, 1]
\]

is the fraction of class-k inputs whose layer-l activations are positively influenced by concept C.
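A minimal sketch of computing this score from a batch of class-k activations, again using a finite-difference sensitivity and illustrative names:

```python
import numpy as np

def tcav_score(acts_k, h_lk, cav, eps=1e-3):
    """TCAV_Q(C, k, l): fraction of class-k inputs in X_k whose layer-l
    activations have a positive directional derivative along the CAV.

    acts_k: (n_k, m) layer-l activations of all class-k examples X_k
    h_lk:   callable R^m -> R, activations -> class-k logit
    cav:    unit CAV v_C^l
    """
    sens = np.array([(h_lk(a + eps * cav) - h_lk(a)) / eps for a in acts_k])
    return float(np.mean(sens > 0))   # TCAV_Q value in [0, 1]

# Toy usage with a linear logit head and fake class-k activations.
rng = np.random.default_rng(2)
m, n_k = 512, 200
w = rng.normal(size=m)
cav = rng.normal(size=m); cav /= np.linalg.norm(cav)
print(tcav_score(rng.normal(size=(n_k, m)), lambda a: float(w @ a), cav))
```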
Statistical Significance Testing: To guard against spurious CAVs, the test is repeated with CAVs trained against many different random example sets, and a two-sided t-test checks whether the concept’s TCAV scores differ significantly from those of random CAVs.
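A sketch of one way this test could be run, assuming TCAV scores have already been collected from many CAV trainings (the two-sided t-test follows the paper; the score arrays and threshold are illustrative assumptions):

```python
import numpy as np
from scipy.stats import ttest_ind

def is_significant(concept_scores, random_scores, alpha=0.05):
    """Two-sided t-test: are TCAV scores from concept CAVs
    distinguishable from TCAV scores of CAVs trained on random sets?

    concept_scores: TCAV_Q values from CAVs trained with different
                    random counterexample sets for the real concept
    random_scores:  TCAV_Q values from CAVs trained on random vs. random
    """
    _, p_value = ttest_ind(concept_scores, random_scores)
    return p_value < alpha, p_value

# Toy example: concept scores cluster near 0.9, random scores near 0.5.
rng = np.random.default_rng(3)
sig, p = is_significant(rng.normal(0.9, 0.05, 50).clip(0, 1),
                        rng.normal(0.5, 0.10, 50).clip(0, 1))
print(sig, p)
```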
TCAV extensions: Relative TCAV
Semantically related concepts can produce CAVs that are far from orthogonal.
By selecting two concepts C and D and training a linear classifier to separate their activations, we obtain a relative CAV v_{C,D} at layer l.
Intuitively, it defines a direction in the activation space f_l(x) along which we can measure whether an input x is more related to C or to D.
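A minimal sketch of a relative CAV, assuming activations for two concept sets (e.g., “striped” vs. “dotted”) are available; the classifier choice and names are illustrative assumptions:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def learn_relative_cav(acts_C, acts_D):
    """Relative CAV v_{C,D}^l: separates concept C activations from
    concept D activations (instead of from random counterexamples),
    so the direction measures "more C-like vs. more D-like".
    """
    X = np.vstack([acts_C, acts_D])
    y = np.concatenate([np.ones(len(acts_C)), np.zeros(len(acts_D))])
    clf = LogisticRegression(max_iter=1000).fit(X, y)
    v = clf.coef_.ravel()
    return v / np.linalg.norm(v)

# Projecting f_l(x) onto v_{C,D}^l indicates whether x is more
# related to C (positive projection) or to D (negative projection).
```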
Further experiments confirm TCAV’s utility: they reveal biases and show where in the network the concepts are learned.
Some results confirmed commonsense intuition.
The networks were sensitive to gender and race concepts, even though these were not explicit training labels.
Statistical significance testing successfully filtered out spurious results.
Quantitative Evaluation of TCAV
Evaluation of Saliency Maps: In a human-subject study, saliency maps did not reliably communicate which concept mattered to the model, motivating TCAV’s quantitative concept-based scores.
TCAV was applied to a model predicting diabetic retinopathy (DR) level (0 to 4) from retinal images.
Diagnostic concepts relevant to each DR level were identified with domain experts.
High TCAV scores were observed for “microaneurysms” at DR level 4.
At lower DR levels, TCAV revealed inconsistencies between the model and expert knowledge.