Jacob Steinhardt (Google Scholar) (Website) is an assistant professor at UC Berkeley. His main research interest is in designing machine learning systems that are reliable and aligned with human values. His specific research directions include robustness, reward specification and reward hacking, and scalable alignment.
Highlights:
“Test accuracy is a very limited metric.”
“You might not be able to get lots of feedback on human values.”
“I’m interested in measuring the progress in AI capabilities.”