Aion-BibleQA: Evaluating Retrieval and Citation Faithfulness in Verse-Grounded Bible RAG Systems
An 8-page paper introducing a 40-question benchmark for Bible RAG systems that evaluates citation faithfulness and false-premise robustness. The v3 retrieval configuration reaches R@5 = 0.941 and a mean citation_support of 0.978, with zero unsupported citations and refusals on 6/6 false-premise questions.
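
For readers unfamiliar with the R@5 figure, the following is a minimal sketch of how recall at k is commonly computed for verse retrieval. It assumes the standard definition (the fraction of gold verses found among the top k retrieved verses, averaged over questions); the paper's exact definition may differ, for example a per-question hit rate. The function names and the toy verse IDs are hypothetical and are not taken from the benchmark.

```python
from typing import List, Sequence, Set, Tuple


def recall_at_k(retrieved: Sequence[str], relevant: Set[str], k: int = 5) -> float:
    """Fraction of gold verses that appear among the top-k retrieved verses."""
    if not relevant:
        raise ValueError("relevant must contain at least one gold verse")
    top_k = set(retrieved[:k])
    return len(top_k & relevant) / len(relevant)


def mean_recall_at_k(runs: List[Tuple[Sequence[str], Set[str]]], k: int = 5) -> float:
    """Average recall@k over (retrieved, relevant) pairs, one pair per question."""
    scores = [recall_at_k(retrieved, relevant, k) for retrieved, relevant in runs]
    return sum(scores) / len(scores)


# Hypothetical toy data (assumption): ranked retrieval output and gold verses.
runs = [
    # Both gold verses are in the top 5, so recall is 1.0.
    (["John 3:16", "John 3:17", "Rom 5:8", "1 John 4:9", "Eph 2:8"],
     {"John 3:16", "Rom 5:8"}),
    # "John 1:3" is ranked 6th, outside the top 5, so recall is 0.5.
    (["Gen 1:1", "Gen 1:2", "Ps 33:6", "Heb 11:3", "John 1:1", "John 1:3"],
     {"Gen 1:1", "John 1:3"}),
]

print(mean_recall_at_k(runs, k=5))  # prints 0.75
```

Under this definition, a question with several gold verses only scores 1.0 when all of them are retrieved within the top k, which is stricter than a hit-rate variant that counts any single match.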