podcastdetails.com
Showing episodes and shows of
Ben Burtenshaw
Shows
Weaviate Podcast
Synthetic Data with David Berenstein and Ben Burtenshaw - Weaviate Podcast #118!
Synthetic Data: The Building Blocks of AI's Future! Hey everyone! I am SUPER EXCITED to publish the 118th episode of the Weaviate Podcast featuring David Berenstein and Ben Burtenshaw from Hugging Face! This podcast explores the intricacies of synthetic data generation, detailing methodologies such as data augmentation, distillation, and instruction refinement. The conversation delves into persona-driven synthetic data, highlighting applications like Persona Hub, and discusses algorithms to enhance diversity, complexity, and quality of generated data. Additionally, they cover integration with Hugging Face’s ecosystem, including Argilla for annotation, AutoTrain for fine-tuning, and advanced data exploration tools like the Data Studio an...
2025-03-25
1h 02
Daily Paper Cast
SmolLM2: When Smol Goes Big -- Data-Centric Training of a Small Language Model
🤗 Upvotes: 90 | cs.CL
Authors: Loubna Ben Allal, Anton Lozhkov, Elie Bakouch, Gabriel Martín Blázquez, Guilherme Penedo, Lewis Tunstall, Andrés Marafioti, Hynek Kydlíček, Agustín Piqueres Lajarín, Vaibhav Srivastav, Joshua Lochner, Caleb Fahlgren, Xuan-Son Nguyen, Clémentine Fourrier, Ben Burtenshaw, Hugo Larcher, Haojun Zhao, Cyril Zakka, Mathieu Morlon, Colin Raffel, Leandro von Werra, Thomas Wolf
Title: SmolLM2: When Smol Goes Big -- Data-Centric Training of a Small Language Model
Arxiv: http://arxiv.org/abs/2502.02737v1
Abstract: While large language models have facilitated breakthroughs in many app...
2025-02-07
21 min
Practical AI
Towards high-quality (maybe synthetic) datasets
As Argilla puts it: “Data quality is what makes or breaks AI.” However, what exactly does this mean, and how can AI teams properly collaborate with domain experts towards improved data quality? David Berenstein & Ben Burtenshaw, who are building Argilla & Distilabel at Hugging Face, join us to dig into these topics along with synthetic data generation & AI-generated labeling / feedback.
2024-10-09
57 min
Changelog Master Feed
Towards high-quality (maybe synthetic) datasets (Practical AI #290)
As Argilla puts it: “Data quality is what makes or breaks AI.” However, what exactly does this mean, and how can AI teams properly collaborate with domain experts towards improved data quality? David Berenstein & Ben Burtenshaw, who are building Argilla & Distilabel at Hugging Face, join us to dig into these topics along with synthetic data generation & AI-generated labeling / feedback.
2024-10-09
57 min
D'oh! Kapital
24 - Inflation Blues
In which Ben and Jacob discuss the naïve and often-repeated cry to 'abolish' factionalism in political parties, before turning to the brain-hurty subject of inflation. What's causing it, what can be done, who benefits? We recorded this on 31/07. Discussed in this episode:
Dennis Glover, It's Time for Labor to Dismantle its Factions, The Saturday Paper: https://www.thesaturdaypaper.com.au/news/politics/2022/07/30/its-time-labor-dismantle-its-factions
Ronan Burtenshaw, Mick Lynch is Tired of Workers Getting Screwed, Jacobin: https://jacobin.com/2022/07/mick-lynch-interview-rmt-uk-rail-strike
Richardson, Saunders & Dennis, Are Wages or Profits Driving Australia's Inflation?, T...
2022-08-10
1h 10