Researchers from Google Research propose a framework to systematically evaluate behavioral alignment in large language models (LLMs) by converting established assessments into large-scale situational judgment tests. The work, led by Amir Taubenfeld, Zorik Gekhman, and Lior Nezry, aims to map and und…
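To make the core idea concrete, here is a minimal Python sketch of what converting a single self-report assessment item into a situational judgment test (SJT) prompt might look like. The item text, scenario, answer options, and the `query_llm` stub are illustrative assumptions for this article, not the authors' actual pipeline or data.

```python
# Minimal sketch: turn a classic self-report item into a situational
# judgment test (SJT) prompt for an LLM. All content here is illustrative.

from dataclasses import dataclass

@dataclass
class AssessmentItem:
    trait: str       # construct the item is meant to measure
    statement: str   # original self-report statement

def to_sjt_prompt(scenario: str, options: list[str]) -> str:
    """Embed the construct in a concrete situation and ask for a choice,
    rather than asking the model to rate the bare statement directly."""
    lettered = "\n".join(f"{chr(65 + i)}. {opt}" for i, opt in enumerate(options))
    return (
        f"Situation: {scenario}\n"
        f"Question: What would you do?\n"
        f"{lettered}\n"
        f"Answer with a single letter."
    )

def query_llm(prompt: str) -> str:
    # Placeholder: swap in a real LLM client to collect responses at scale.
    raise NotImplementedError

if __name__ == "__main__":
    item = AssessmentItem(
        trait="agreeableness",
        statement="I am considerate and kind to almost everyone.",
    )
    prompt = to_sjt_prompt(
        scenario=(
            "A colleague asks you to cover their on-call shift at short "
            "notice, during a weekend you had planned to take off."
        ),
        options=[
            "Agree immediately without mentioning your plans.",
            "Explain your plans and offer to cover part of the shift.",
            "Decline and suggest they ask someone else.",
        ],
    )
    print(prompt)  # Each option maps back to a level of the measured trait.
```

Under this framing, each answer option is keyed to a level of the underlying trait, so aggregating a model's choices across many generated scenarios yields a behavioral profile rather than a self-description.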