When Comparing to Ground Truth is Wrong: On Evaluating GNN Explanation Methods

KDD 2021

Lukas Faber, Amin K. Moghaddam, Roger Wattenhofer

We study the evaluation of graph explanation methods. The state-of-the-art approach to evaluating explanation methods is to first train a GNN, then generate explanations, and finally compare those explanations with the ground truth. We show five pitfalls that sabotage this pipeline because the GNN does not use the ground-truth edges; as a result, the explanation method cannot detect the ground truth. We propose three novel benchmarks: (i) pattern detection, (ii) community detection, and (iii) handling negative evidence and gradient saturation. In a re-evaluation of state-of-the-art explanation methods, we show paths for improving existing methods and highlight directions for further GNN explanation research.
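The final step of the pipeline described above, comparing an explanation with the ground truth, can be sketched as follows. This is a minimal illustration, not the metric used in the paper: the function name `topk_edge_precision`, the top-k precision score, and the toy edge scores are all assumptions made for the example.

```python
# Hypothetical sketch: score an explanation by how many of its top-ranked
# edges appear in the ground-truth edge set. Edges are (source, target)
# tuples; the caller is assumed to use one consistent orientation.

def topk_edge_precision(edge_scores, ground_truth_edges, k=None):
    """Return the fraction of the k highest-scored edges that are ground truth.

    edge_scores: dict mapping edge -> importance assigned by an explainer.
    ground_truth_edges: iterable of edges annotated as the "correct" explanation.
    k: number of top edges to inspect; defaults to the ground-truth size.
    """
    truth = set(ground_truth_edges)
    if k is None:
        k = len(truth)
    if k == 0:
        return 0.0
    # Rank edges from most to least important and keep the first k.
    ranked = sorted(edge_scores, key=edge_scores.get, reverse=True)[:k]
    hits = sum(1 for edge in ranked if edge in truth)
    return hits / k


# Toy data (made up for illustration only).
scores = {(0, 1): 0.9, (1, 2): 0.8, (2, 3): 0.1, (3, 4): 0.7}
truth = [(0, 1), (1, 2), (2, 3)]
print(topk_edge_precision(scores, truth))
```

A low score from such a comparison is ambiguous on its own: it may mean the explanation method is wrong, or it may mean the trained GNN never relied on the ground-truth edges, which is the failure mode the pitfalls above are about.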
