ConfGuard: A Simple and Effective Backdoor Detection for Large Language Models

Published in arXiv preprint arXiv:2508.01365, 2025

Recommended citation: Zihan Wang, Rui Zhang, Hongwei Li, Wenshu Fan, Wenbo Jiang, Qingchuan Zhao, Guowen Xu, "ConfGuard: A Simple and Effective Backdoor Detection for Large Language Models." arXiv preprint arXiv:2508.01365, 2025.