Showing episodes and shows of Ben Burtenshaw

Shows

Weaviate Podcast
Synthetic Data with David Berenstein and Ben Burtenshaw - Weaviate Podcast #118!
Synthetic Data: The Building Blocks of AI's Future! Hey everyone! I am SUPER EXCITED to publish the 118th episode of the Weaviate Podcast featuring David Berenstein and Ben Burtenshaw from Hugging Face! This podcast explores the intricacies of synthetic data generation, detailing methodologies such as data augmentation, distillation, and instruction refinement. The conversation delves into persona-driven synthetic data, highlighting applications like Persona Hub, and discusses algorithms to enhance diversity, complexity, and quality of generated data. Additionally, they cover integration with Hugging Face’s ecosystem, including Argilla for annotation, AutoTrain for fine-tuning, and advanced data exploration tools like the Data Studio an...
2025-03-25 · 1h 02

Daily Paper Cast
SmolLM2: When Smol Goes Big -- Data-Centric Training of a Small Language Model
🤗 Upvotes: 90 | cs.CL
Authors: Loubna Ben Allal, Anton Lozhkov, Elie Bakouch, Gabriel Martín Blázquez, Guilherme Penedo, Lewis Tunstall, Andrés Marafioti, Hynek Kydlíček, Agustín Piqueres Lajarín, Vaibhav Srivastav, Joshua Lochner, Caleb Fahlgren, Xuan-Son Nguyen, Clémentine Fourrier, Ben Burtenshaw, Hugo Larcher, Haojun Zhao, Cyril Zakka, Mathieu Morlon, Colin Raffel, Leandro von Werra, Thomas Wolf
Title: SmolLM2: When Smol Goes Big -- Data-Centric Training of a Small Language Model
Arxiv: http://arxiv.org/abs/2502.02737v1
Abstract: While large language models have facilitated breakthroughs in many app...
2025-02-07 · 21 min

Practical AI
Towards high-quality (maybe synthetic) datasets
As Argilla puts it: “Data quality is what makes or breaks AI.” However, what exactly does this mean, and how can AI teams properly collaborate with domain experts towards improved data quality? David Berenstein & Ben Burtenshaw, who are building Argilla & Distilabel at Hugging Face, join us to dig into these topics along with synthetic data generation & AI-generated labeling / feedback.
Changelog++ members save 11 minutes on this episode because they made the ads disappear. Join today!
Sponsors: Fly.io - The home of Changelog.com. Deploy your apps close to your users - global Any...
2024-10-09 · 57 min

Changelog Master Feed
Towards high-quality (maybe synthetic) datasets (Practical AI #290)
As Argilla puts it: “Data quality is what makes or breaks AI.” However, what exactly does this mean, and how can AI teams properly collaborate with domain experts towards improved data quality? David Berenstein & Ben Burtenshaw, who are building Argilla & Distilabel at Hugging Face, join us to dig into these topics along with synthetic data generation & AI-generated labeling / feedback.
Changelog++ members save 11 minutes on this episode because they made the ads disappear. Join today!
Sponsors: Fly.io - The home of Changelog.com. Deploy your apps close to your users - ...
2024-10-09 · 57 min

D'oh! Kapital
24 - Inflation Blues
In which Ben and Jacob discuss the naïve and often-repeated cry to 'abolish' factionalism in political parties, before turning to the brain-hurty subject of inflation. What's causing it, what can be done, who benefits?
We recorded this on 31/07.
Discussed in this episode:
Dennis Glover, It's Time for Labor to Dismantle its Factions, The Saturday Paper: https://www.thesaturdaypaper.com.au/news/politics/2022/07/30/its-time-labor-dismantle-its-factions
Ronan Burtenshaw, Mick Lynch is Tired of Workers Getting Screwed, Jacobin: https://jacobin.com/2022/07/mick-lynch-interview-rmt-uk-rail-strike
Richardson, Saunders & Dennis, Are Wages or Profits Driving Australia's Inflation?, T...
2022-08-10 · 1h 10