Explainer Divergence Scores (EDS): Some Post-Hoc Explanations May be Effective for Detecting Unknown Spurious Correlations

We propose Explainer Divergence Scores (EDS), a new general methodology for evaluating post-hoc explanations by how well they identify spurious correlations in neural networks. We use this methodology to compare the detection performance of three types of explainers (feature attribution methods, influential examples, and concept extraction) on two image datasets.

Shea Cardozo, Gabriel Islas Montero, Dmitry Kazhdan, Botty Dimanov, Maleakhi Wijaya, Mateja Jamnik, Pietro Lio
