Description

In this episode, we dive deep into the challenges facing time-series AI model leaderboards, from hidden information leakage to the complexities of benchmarking foundation models. I sit down with Marcel Meyer to unpack why traditional approaches fall short and how our new TS Arena leaderboard is setting a new standard for fair, future-proof evaluation.

We explore the pitfalls that plague current benchmarks, the surprising ways data contamination can skew results, and the innovative pre-registration protocol we've developed to keep evaluations honest. If you've ever wondered what it takes to build a truly trustworthy AI leaderboard, or why it matters for industry and research alike, this conversation is packed with insights you won't want to miss.