adapting, agent, agents, ai, alignment, anthropic, based, behavior, benchmark, breaking, cache, claude, content, deepseek, dive, domain, evaluating, explains, fine, future, gemini, hungry, judges, kv, language, large, learning, llama, llm, llms, meta, model, models, o1, open, openai, paper, rag, reasoning, research, researcher, scaling, self, shot, source, survey, test, text, time, training, tuning, use