
We’re excited to invite you to an upcoming webinar with Sanofi, where Abhinav Gupta (Principal ML Scientist, Large Molecule Research) and Ruijiang Li (Computational Scientist Lead, Large Molecule Research) will introduce SNAC-DB, a new open-source database designed to improve how AI models predict antibody-antigen and nanobody-antigen complexes.
What You’ll Learn
Current AI models struggle to accurately predict antibody-antigen complexes, limiting their usefulness in drug discovery. SNAC-DB addresses this gap with:
- Expanded Coverage: 32% more structural diversity than SAbDab, capturing overlooked assemblies such as antibodies/nanobodies as antigens, complete multi-chain epitopes, and weak CDR crystal contacts.
- ML-Friendly Data: Cleaned PDB/mmCIF files, atom37 NumPy arrays, and unified CSV metadata to eliminate preprocessing hurdles.
- Transparent Redundancy Control: Multi-threshold Foldseek clustering for principled sample weighting, ensuring every experimental structure contributes.
- Rigorous Benchmark: An out-of-sample test set comprising public PDB entries released after May 30, 2024 (fully disclosed) and confidential therapeutic complexes.
Proven Impact
Fine-tuning models on SNAC-DB nearly doubled performance compared to training on SAbDab alone. This work was presented at the ICML 2025 Workshop on DataWorld in Vancouver, and all resources, including the dataset, code, and paper, are publicly available.
View Resources: Paper | OpenReview | Dataset | Code
Who Should Attend
This webinar is relevant for anyone working in antibody discovery, structural biology, computational modeling, or AI-driven drug development.
We look forward to having you join us on January 28, 2026, at 12 pm ET!

