Abstract
3D object tracking has become a popular research topic because of its broad application prospects. However, advancing the trustworthiness of deep trackers remains challenging owing to the complex, black-box structure of their networks. In this paper, a local explanation method for 3D object tracking is proposed, which trains an interpretable surrogate model to reveal the contribution of superpoints in the search area. Specifically, local points of the search area with comparable geometric features are aggregated into superpoints, which serve as the basic units of the explanation. In contrast to the commonly used voxels, superpoints capture semantic information of the search area and facilitate an intuitive understanding of the predictions. In addition, a distance-aware masking strategy is proposed for generating the sample set used to train the surrogate model; the strategy reflects the latent contributions of superpoints to the predictions and improves the efficacy of the explanation. Experiments demonstrate that the proposed approach effectively explains the predictions of deep tracking models.
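To make the pipeline described above concrete, the following is a minimal sketch of a surrogate-based explanation of this kind. It is not the authors' implementation: the superpoint segmentation (k-means on coordinates as a crude proxy for geometry-aware segmentation), the black-box tracker stub, the Gaussian distance kernel, and all names and parameters are illustrative assumptions.

```python
# Hypothetical sketch: explain a black-box tracker's prediction by masking
# superpoints in the search area and fitting an interpretable surrogate.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.linear_model import Ridge

def segment_superpoints(points, n_superpoints=16, seed=0):
    """Group search-area points into superpoints. K-means on xyz is a crude
    stand-in for segmentation by comparable geometric features."""
    return KMeans(n_clusters=n_superpoints, n_init=10,
                  random_state=seed).fit_predict(points)

def tracker_score(points):
    """Placeholder for the black-box tracker: returns a scalar confidence
    for the target given the (possibly masked) search-area points."""
    return float(points.shape[0])  # stub: score grows with retained points

def explain(points, target_center, n_superpoints=16, n_samples=256,
            sigma=2.0, seed=0):
    rng = np.random.default_rng(seed)
    labels = segment_superpoints(points, n_superpoints, seed)

    # Distance-aware masking (assumed form): superpoints near the predicted
    # target center are dropped less often, since they are more likely to
    # carry latent contributions to the prediction.
    centers = np.stack([points[labels == k].mean(axis=0)
                        for k in range(n_superpoints)])
    dist = np.linalg.norm(centers - target_center, axis=1)
    keep_prob = np.exp(-dist ** 2 / (2 * sigma ** 2))  # assumed Gaussian kernel
    keep_prob = 0.5 + 0.5 * keep_prob                  # keep rate in [0.5, 1]

    # Sample binary superpoint masks and query the tracker on each sample.
    masks = rng.random((n_samples, n_superpoints)) < keep_prob
    scores = np.empty(n_samples)
    for i, m in enumerate(masks):
        kept = points[m[labels]]  # remove points of masked-out superpoints
        scores[i] = tracker_score(kept)

    # Fit the interpretable surrogate; its coefficients estimate the
    # per-superpoint contributions to the tracker's prediction.
    surrogate = Ridge(alpha=1.0).fit(masks.astype(float), scores)
    return surrogate.coef_

if __name__ == "__main__":
    pts = np.random.default_rng(1).normal(size=(2048, 3))
    contributions = explain(pts, target_center=np.zeros(3))
    print(contributions)  # one attribution score per superpoint
```

Under these assumptions, a positive coefficient marks a superpoint whose presence raises the tracker's score, giving the kind of local, per-superpoint attribution the abstract describes.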