Hi! I’m a PhD student in the Waterloo Intelligent Systems Lab (WISELab) at the University of Waterloo. I am broadly interested in creating generalizable autonomous agents and systems. This spans a range of approaches, including model-based reinforcement learning for planning and reasoning, causal learning and understanding, and behaviour prediction in multi-agent scenarios.
We propose a new general methodology, Explainer Divergence Scores (EDS), to evaluate post-hoc explanations for the purpose of identifying spurious correlations in neural networks. We use our methodology to compare the detection performance of three types of explainers (feature attribution methods, influential examples, and concept extraction) on two image datasets.