cake_bake · Qwen2.5-1.5B · LoRA r=64 · L40S · seed 0 · 28k docs · eval_package_v2_manual.json (42 Qs)
Wang, S., Marks, S., Dragan, A., Hendrycks, D., & Hubinger, E. (2025). Modifying Beliefs via Synthetic Document Finetuning. Anthropic Alignment Science. alignment.anthropic.com/2025/modifying-beliefs-via-sdf