Listen

Description

Evaluating models on benchmarks, passing a model vibe check, formal reasoning to synthesize datasets, and what type of datasets researchers prefer