Showing episodes and shows of Daniel Filan

Shows

LessWrong (30+ Karma)
“45 - Samuel Albanie on DeepMind’s AGI Safety Approach” by DanielFilan
YouTube link In this episode, I chat with Samuel Albanie about the Google DeepMind paper he co-authored called “An Approach to Technical AGI Safety and Security”. It covers the assumptions made by the approach, as well as the types of mitigations it outlines.
Topics we discuss:
DeepMind's Approach to Technical AGI Safety and Security
Current paradigm continuation
No human ceiling
Uncertain timelines
Approximate continuity and the potential for accelerating capability improvement
Misuse and misalignment
Societal readiness
Misuse mitigations
Misalignment mitigations
Samuel's thinking about technical AGI safety
Following Samuel's work
Daniel Filan (00:00:09): Hello, everybody. In t...
2025-07-07 · 1h 17

AXRP - the AI X-risk Research Podcast
45 - Samuel Albanie on DeepMind's AGI Safety Approach
In this episode, I chat with Samuel Albanie about the Google DeepMind paper he co-authored called "An Approach to Technical AGI Safety and Security". It covers the assumptions made by the approach, as well as the types of mitigations it outlines. Patreon: https://www.patreon.com/axrpodcast Ko-fi: https://ko-fi.com/axrpodcast Transcript: https://axrp.net/episode/2025/07/06/episode-45-samuel-albanie-deepminds-agi-safety-approach.html Topics we discuss, and timestamps: 0:00:37 DeepMind's Approach to Technical AGI Safety and Security 0:04:29 Current paradigm continuation 0:19:13 No human ceiling 0:21:22 Uncertain timelines
2025-07-07 · 1h 15

AXRP - the AI X-risk Research Podcast
44 - Peter Salib on AI Rights for Human Safety
In this episode, I talk with Peter Salib about his paper "AI Rights for Human Safety", arguing that giving AIs the right to contract, hold property, and sue people will reduce the risk of their trying to attack humanity and take over. He also tells me how law reviews work, in the face of my incredulity. Patreon: https://www.patreon.com/axrpodcast Ko-fi: https://ko-fi.com/axrpodcast Transcript: https://axrp.net/episode/2025/06/28/episode-44-peter-salib-ai-rights-human-safety.html Topics we discuss, and timestamps: 0:00:40 Why AI rights 0:18:34 Why not r...
2025-06-28 · 3h 21

AXRP - the AI X-risk Research Podcast
43 - David Lindner on Myopic Optimization with Non-myopic Approval
In this episode, I talk with David Lindner about Myopic Optimization with Non-myopic Approval, or MONA, which attempts to address (multi-step) reward hacking by myopically optimizing actions against a human's sense of whether those actions are generally good. Does this work? Can we get smarter-than-human AI this way? How does this compare to approaches like conservativism? Listen to find out. Patreon: https://www.patreon.com/axrpodcast Ko-fi: https://ko-fi.com/axrpodcast Transcript: https://axrp.net/episode/2025/06/15/episode-43-david-lindner-mona.html Topics we discuss, and timestamps: 0:00:29 What MONA is
2025-06-15 · 1h 40

AXRP - the AI X-risk Research Podcast
42 - Owain Evans on LLM Psychology
Earlier this year, the paper "Emergent Misalignment" made the rounds on AI x-risk social media for seemingly showing LLMs generalizing from 'misaligned' training data of insecure code to acting comically evil in response to innocuous questions.
In this episode, I chat with one of the authors of that paper, Owain Evans, about that research as well as other work he's done to understand the psychology of large language models. Patreon: https://www.patreon.com/axrpodcast Ko-fi: https://ko-fi.com/axrpodcast Transcript: https://axrp.net/episode/2025/06/06/episode-42-owain-evans-llm-psychology.html Topics w...
2025-06-06 · 2h 14

AXRP - the AI X-risk Research Podcast
41 - Lee Sharkey on Attribution-based Parameter Decomposition
What's the next step forward in interpretability? In this episode, I chat with Lee Sharkey about his proposal for detecting computational mechanisms within neural networks: Attribution-based Parameter Decomposition, or APD for short. Patreon: https://www.patreon.com/axrpodcast Ko-fi: https://ko-fi.com/axrpodcast Transcript: https://axrp.net/episode/2025/06/03/episode-41-lee-sharkey-attribution-based-parameter-decomposition.html Topics we discuss, and timestamps: 0:00:41 APD basics 0:07:57 Faithfulness 0:11:10 Minimality 0:28:44 Simplicity 0:34:50 Concrete-ish examples of APD 0:52:00 Which parts of APD are canonical 0:58:10 Hyperparameter selection 1:06:40 A...
2025-06-03 · 2h 16

AXRP - the AI X-risk Research Podcast
40 - Jason Gross on Compact Proofs and Interpretability
How do we figure out whether interpretability is doing its job? One way is to see if it helps us prove things about models that we care about knowing. In this episode, I speak with Jason Gross about his agenda to benchmark interpretability in this way, and his exploration of the intersection of proofs and modern machine learning. Patreon: https://www.patreon.com/axrpodcast Ko-fi: https://ko-fi.com/axrpodcast Transcript: https://axrp.net/episode/2025/03/28/episode-40-jason-gross-compact-proofs-interpretability.html Topics we discuss, and timestamps: 0:00:40 - Why compact proofs
2025-03-28 · 2h 36

AXRP - the AI X-risk Research Podcast
38.8 - David Duvenaud on Sabotage Evaluations and the Post-AGI Future
In this episode, I chat with David Duvenaud about two topics he's been thinking about: firstly, a paper he wrote about evaluating whether or not frontier models can sabotage human decision-making or monitoring of the same models; and secondly, the difficult situation humans find themselves in in a post-AGI future, even if AI is aligned with human intentions. Patreon: https://www.patreon.com/axrpodcast Ko-fi: https://ko-fi.com/axrpodcast Transcript: https://axrp.net/episode/2025/03/01/episode-38_8-david-duvenaud-sabotage-evaluations-post-agi-future.html FAR.AI: https://far.ai/ FAR.AI on X (aka T...
2025-03-01 · 20 min

AXRP - the AI X-risk Research Podcast
38.7 - Anthony Aguirre on the Future of Life Institute
The Future of Life Institute is one of the oldest and most prominent organizations in the AI existential safety space, working on such topics as the AI pause open letter and how the EU AI Act can be improved. Metaculus is one of the premier forecasting sites on the internet. Behind both of them lies one man: Anthony Aguirre, who I talk with in this episode.
Patreon: https://www.patreon.com/axrpodcast Ko-fi: https://ko-fi.com/axrpodcast Transcript: https://axrp.net/episode/2025/02/09/episode-38_7-anthony-aguirre-future-of-life-institute.html FAR.AI: https://far.ai/
2025-02-09 · 22 min

AXRP - the AI X-risk Research Podcast
38.6 - Joel Lehman on Positive Visions of AI
Typically this podcast talks about how to avert destruction from AI. But what would it take to ensure AI promotes human flourishing as well as it can? Is alignment to individuals enough, and if not, where do we go from here? In this episode, I talk with Joel Lehman about these questions. Patreon: https://www.patreon.com/axrpodcast Ko-fi: https://ko-fi.com/axrpodcast Transcript: https://axrp.net/episode/2025/01/24/episode-38_6-joel-lehman-positive-visions-of-ai.html FAR.AI: https://far.ai/ FAR.AI on X (aka Twitter): https://x.com/farairesearch FAR...
2025-01-25 · 15 min

AXRP - the AI X-risk Research Podcast
38.5 - Adrià Garriga-Alonso on Detecting AI Scheming
Suppose we're worried about AIs engaging in long-term plans that they don't tell us about. If we were to peek inside their brains, what should we look for to check whether this was happening? In this episode Adrià Garriga-Alonso talks about his work trying to answer this question. Patreon: https://www.patreon.com/axrpodcast Ko-fi: https://ko-fi.com/axrpodcast Transcript: https://axrp.net/episode/2025/01/20/episode-38_5-adria-garriga-alonso-detecting-ai-scheming.html FAR.AI: https://far.ai/ FAR.AI on X (aka Twitter): https://x.com/farairesearch FAR.AI on YouTube: https://ww...
2025-01-20 · 27 min

AXRP - the AI X-risk Research Podcast
38.4 - Shakeel Hashim on AI Journalism
AI researchers often complain about the poor coverage of their work in the news media. But why is this happening, and how can it be fixed? In this episode, I speak with Shakeel Hashim about the resource constraints facing AI journalism, the disconnect between journalists' and AI researchers' views on transformative AI, and efforts to improve the state of AI journalism, such as Tarbell and Shakeel's newsletter, Transformer. Patreon: https://www.patreon.com/axrpodcast Ko-fi: https://ko-fi.com/axrpodcast The transcript: https://axrp.net/episode/2025/01/05/episode-38_4-shakeel-hashim-ai-journalism.html FAR.AI: https...
2025-01-05 · 24 min

AXRP - the AI X-risk Research Podcast
38.4 - Peter Barnett on Technical Governance at MIRI
The Machine Intelligence Research Institute has recently shifted its focus to "technical governance". But what is that actually, and what are they doing? In this episode, I chat with Peter Barnett about his team's work on studying what evaluations can and cannot do, as well as verifying international agreements on AI development. Patreon: https://www.patreon.com/axrpodcast Ko-fi: https://ko-fi.com/axrpodcast The transcript: https://axrp.net/episode/2024/12/14/episode-38_4-peter-barnett-technical-governance-at-miri.html FAR.AI: https://far.ai/ FAR.AI on X (aka Twitter): https://x.com/farairesearch FAR...
2024-12-14 · 20 min

AXRP - the AI X-risk Research Podcast
38.3 - Erik Jenner on Learned Look-Ahead
Lots of people in the AI safety space worry about models being able to make deliberate, multi-step plans. But can we already see this in existing neural nets? In this episode, I talk with Erik Jenner about his work looking at internal look-ahead within chess-playing neural networks.
Patreon: https://www.patreon.com/axrpodcast Ko-fi: https://ko-fi.com/axrpodcast The transcript: https://axrp.net/episode/2024/12/12/episode-38_3-erik-jenner-learned-look-ahead.html FAR.AI: https://far.ai/ FAR.AI on X (aka Twitter): https://x.com/farairesearch FAR.AI on YouTube: https://w...
2024-12-12 · 23 min

AXRP - the AI X-risk Research Podcast
39 - Evan Hubinger on Model Organisms of Misalignment
The 'model organisms of misalignment' line of research creates AI models that exhibit various types of misalignment, and studies them to try to understand how the misalignment occurs and whether it can be somehow removed. In this episode, Evan Hubinger talks about two papers he's worked on at Anthropic under this agenda: "Sleeper Agents" and "Sycophancy to Subterfuge". Patreon: https://www.patreon.com/axrpodcast Ko-fi: https://ko-fi.com/axrpodcast The transcript: https://axrp.net/episode/2024/12/01/episode-39-evan-hubinger-model-organisms-misalignment.html Topics we discuss, and timestamps: 0:00:36 - Model organisms and s...
2024-12-01 · 1h 45

AXRP - the AI X-risk Research Podcast
38.2 - Jesse Hoogland on Singular Learning Theory
You may have heard of singular learning theory, and its "local learning coefficient", or LLC - but have you heard of the refined LLC? In this episode, I chat with Jesse Hoogland about his work on SLT, and using the refined LLC to find a new circuit in language models. Patreon: https://www.patreon.com/axrpodcast Ko-fi: https://ko-fi.com/axrpodcast The transcript: https://axrp.net/episode/2024/11/27/38_2-jesse-hoogland-singular-learning-theory.html FAR.AI: https://far.ai/ FAR.AI on X (aka Twitter): https://x.com/farairesearch FAR.AI...
2024-11-27 · 18 min

AXRP - the AI X-risk Research Podcast
38.1 - Alan Chan on Agent Infrastructure
Road lines, street lights, and licence plates are examples of infrastructure used to ensure that roads operate smoothly. In this episode, Alan Chan talks about using similar interventions to help avoid bad outcomes from the deployment of AI agents. Patreon: https://www.patreon.com/axrpodcast Ko-fi: https://ko-fi.com/axrpodcast The transcript: https://axrp.net/episode/2024/11/16/episode-38_1-alan-chan-agent-infrastructure.html FAR.AI: https://far.ai/ FAR.AI on X (aka Twitter): https://x.com/farairesearch FAR.AI on YouTube: https://www.youtube.com/@FARAIResearch The Alignment W...
2024-11-17 · 24 min

AXRP - the AI X-risk Research Podcast
38.0 - Zhijing Jin on LLMs, Causality, and Multi-Agent Systems
Do language models understand the causal structure of the world, or do they merely note correlations? And what happens when you build a big AI society out of them? In this brief episode, recorded at the Bay Area Alignment Workshop, I chat with Zhijing Jin about her research on these questions. Patreon: https://www.patreon.com/axrpodcast Ko-fi: https://ko-fi.com/axrpodcast The transcript: https://axrp.net/episode/2024/11/14/episode-38_0-zhijing-jin-llms-causality-multi-agent-systems.html FAR.AI: https://far.ai/ FAR.AI on X (aka Twitter): https://x.com/farairesearch FAR...
2024-11-14 · 22 min

Mutual Understanding
Shea Levy on why he disagrees with Less Wrong rationality, Part 1
In this podcast, Shea and I tried to hunt down a philosophical disagreement we seem to have by diving into his critique of rationality.
We went off on what may or may not have been a big tangent about Internal Family Systems therapy, which I’m a big fan of, and which I think Shea thinks should have more caveats? Unfortunately, our conversation got cut short because partway through, Shea got a call and had to deal with some stuff. We hope to record a Part 2 soon! Transcript: Divia (00:01) Hey, I'm he...
2024-10-11 · 1h 23

AXRP - the AI X-risk Research Podcast
37 - Jaime Sevilla on AI Forecasting
Epoch AI is the premier organization that tracks the trajectory of AI - how much compute is used, the role of algorithmic improvements, the growth in data used, and when the above trends might hit an end. In this episode, I speak with the director of Epoch AI, Jaime Sevilla, about how compute, data, and algorithmic improvements are impacting AI, and whether continuing to scale can get us AGI. Patreon: https://www.patreon.com/axrpodcast Ko-fi: https://ko-fi.com/axrpodcast The transcript: https://axrp.net/episode/2024/10/04/episode-37-jaime-sevilla-forecasting-ai.html T...
2024-10-04 · 1h 44

AXRP - the AI X-risk Research Podcast
36 - Adam Shai and Paul Riechers on Computational Mechanics
Sometimes, people talk about transformers as having "world models" as a result of being trained to predict text data on the internet. But what does this even mean? In this episode, I talk with Adam Shai and Paul Riechers about their work applying computational mechanics, a sub-field of physics studying how to predict random processes, to neural networks. Patreon: https://www.patreon.com/axrpodcast Ko-fi: https://ko-fi.com/axrpodcast The transcript: https://axrp.net/episode/2024/09/29/episode-36-adam-shai-paul-riechers-computational-mechanics.html Topics we discuss, and timestamps: 0:00:42 - What computational mechanics i...
2024-09-29 · 1h 48

AXRP - the AI X-risk Research Podcast
New Patreon tiers + MATS applications
Patreon: https://www.patreon.com/axrpodcast MATS: https://www.matsprogram.org Note: I'm employed by MATS, but they're not paying me to make this video.
2024-09-28 · 05 min

Mutual Understanding
In what sense are there coherence theorems?
In this episode, Daniel Filan and I talk about Elliot Thornley’s LessWrong post There are no coherence theorems. Some other LessWrong posts we reference include:
* A stylized dialogue on John Wentworth's claims about markets and optimization
* Why Not Subagents
Transcript: Divia (00:03) I'm here today with Elliot Thornley, who goes by EJT on LessWrong, and Daniel Filan, and Elliot is currently a postdoc at the Global Priorities Institute working on this sort of AI stuff and also some global population work. And at the end we're goi...
2024-09-20 · 1h 40

AXRP - the AI X-risk Research Podcast
35 - Peter Hase on LLM Beliefs and Easy-to-Hard Generalization
How do we figure out what large language models believe? In fact, do they even have beliefs? Do those beliefs have locations, and if so, can we edit those locations to change the beliefs? Also, how are we going to get AI to perform tasks so hard that we can't figure out if they succeeded at them? In this episode, I chat with Peter Hase about his research into these questions.
Patreon: https://www.patreon.com/axrpodcast Ko-fi: https://ko-fi.com/axrpodcast The transcript: https://axrp.net/episode/2024/08/24/episode-35-peter-hase-llm-beliefs-easy-to-hard-generalization.html
2024-08-25 · 2h 17

AXRP - the AI X-risk Research Podcast
34 - AI Evaluations with Beth Barnes
How can we figure out if AIs are capable enough to pose a threat to humans? When should we make a big effort to mitigate risks of catastrophic AI misbehaviour? In this episode, I chat with Beth Barnes, founder of and head of research at METR, about these questions and more. Patreon: patreon.com/axrpodcast Ko-fi: ko-fi.com/axrpodcast The transcript: https://axrp.net/episode/2024/07/28/episode-34-ai-evaluations-beth-barnes.html Topics we discuss, and timestamps: 0:00:37 - What is METR? 0:02:44 - What is an "eval"? 0:14:42 - H...
2024-07-28 · 2h 14

Europa's Children with Kenaz Filan
When East Meets West 20
In this episode we're joined by Daniel D. of A Ghost in the Machine. Daniel recently wrote a great piece about the death of the American civic religion. We talk about that article and other pertinent DOOM topics. Ahnaf Ibn Qais, Daniel D., Kenaz Filan. Get full access to Notes from the End of Time with Kenaz Filan at www.notesfromtheendofti.me/subscribe
2024-07-07 · 1h 18

AXRP - the AI X-risk Research Podcast
33 - RLHF Problems with Scott Emmons
Reinforcement Learning from Human Feedback, or RLHF, is one of the main ways that makers of large language models make them 'aligned'. But people have long noted that there are difficulties with this approach when the models are smarter than the humans providing feedback. In this episode, I talk with Scott Emmons about his work categorizing the problems that can show up in this setting. Patreon: patreon.com/axrpodcast Ko-fi: ko-fi.com/axrpodcast The transcript: https://axrp.net/episode/2024/06/12/episode-33-rlhf-problems-scott-emmons.html Topics we discuss, and timestamps: 0:00:33 - Deceptive...
2024-06-12 · 1h 41

AXRP - the AI X-risk Research Podcast
32 - Understanding Agency with Jan Kulveit
What's the difference between a large language model and the human brain? And what's wrong with our theories of agency? In this episode, I chat about these questions with Jan Kulveit, who leads the Alignment of Complex Systems research group. Patreon: patreon.com/axrpodcast Ko-fi: ko-fi.com/axrpodcast The transcript: axrp.net/episode/2024/05/30/episode-32-understanding-agency-jan-kulveit.html Topics we discuss, and timestamps: 0:00:47 - What is active inference? 0:15:14 - Preferences in active inference 0:31:33 - Action vs perception in active inference 0:46:07 - Feedback loops 1:01:32...
2024-05-30 · 2h 22

AXRP - the AI X-risk Research Podcast
31 - Singular Learning Theory with Daniel Murfet
What's going on with deep learning? What sorts of models get learned, and what are the learning dynamics? Singular learning theory is a theory of Bayesian statistics broad enough in scope to encompass deep neural networks that may help answer these questions. In this episode, I speak with Daniel Murfet about this research program and what it tells us. Patreon: patreon.com/axrpodcast Ko-fi: ko-fi.com/axrpodcast Topics we discuss, and timestamps: 0:00:26 - What is singular learning theory?
0:16:00 - Phase transitions 0:35:12 - Estimating the local learning coefficient
2024-05-07 · 2h 32

AXRP - the AI X-risk Research Podcast
30 - AI Security with Jeffrey Ladish
Top labs use various forms of "safety training" on models before their release to make sure they don't do nasty stuff - but how robust is that? How can we ensure that the weights of powerful AIs don't get leaked or stolen? And what can AI even do these days? In this episode, I speak with Jeffrey Ladish about security and AI. Patreon: patreon.com/axrpodcast Ko-fi: ko-fi.com/axrpodcast Topics we discuss, and timestamps: 0:00:38 - Fine-tuning away safety training 0:13:50 - Dangers of open LLMs vs internet search
2024-04-30 · 2h 15

AXRP - the AI X-risk Research Podcast
29 - Science of Deep Learning with Vikrant Varma
In 2022, it was announced that a fairly simple method can be used to extract the true beliefs of a language model on any given topic, without having to actually understand the topic at hand. Earlier, in 2021, it was announced that neural networks sometimes 'grok': that is, when training them on certain tasks, they initially memorize their training data (achieving their training goal in a way that doesn't generalize), but then suddenly switch to understanding the 'real' solution in a way that generalizes. What's going on with these discoveries? Are they all they're cracked up to be, and if so...
2024-04-25 · 2h 13

The Filan Cabinet
14 - The 2024 Eclipse
In this episode, I give you updates from my trip with friends to see the 2024 total solar eclipse. Questions answered include: - Why are we bothering to go see it? - How many of us will fail to make it to the eclipse? - Does it actually get darker during a total solar eclipse, or is that just an optical illusion? - What moral dilemma will we face, and what will we do? - Whose lav mic will mysteriously fail to work during their interview?
2024-04-25 · 1h 33

AXRP - the AI X-risk Research Podcast
28 - Suing Labs for AI Risk with Gabriel Weil
How should the law govern AI? Those concerned about existential risks often push either for bans or for regulations meant to ensure that AI is developed safely - but another approach is possible. In this episode, Gabriel Weil talks about his proposal to modify tort law to enable people to sue AI companies for disasters that are "nearly catastrophic". Patreon: patreon.com/axrpodcast Ko-fi: ko-fi.com/axrpodcast Topics we discuss, and timestamps: 0:00:35 - The basic idea 0:20:36 - Tort law vs regulation 0:29:10 - Weil's proposal vs H...
2024-04-17 · 1h 57

AXRP - the AI X-risk Research Podcast
27 - AI Control with Buck Shlegeris and Ryan Greenblatt
A lot of work to prevent AI existential risk takes the form of ensuring that AIs don't want to cause harm or take over the world---or in other words, ensuring that they're aligned. In this episode, I talk with Buck Shlegeris and Ryan Greenblatt about a different approach, called "AI control": ensuring that AI systems couldn't take over the world, even if they were trying to. Patreon: patreon.com/axrpodcast Ko-fi: ko-fi.com/axrpodcast Topics we discuss, and timestamps: 0:00:31 - What is AI control? 0:16:16 - Protocols for A...
2024-04-11 · 2h 56

Orkan Varan ile Sinema Minema
It would be rude to just call him the man who played Voldemort! | Who is Ralph Fiennes?
Prepared and presented by: Orkan Varan
2024-02-18 · 14 min

Pigeon Hour
Best of Pigeon Hour
Table of contents. Note: links take you to the corresponding section below; links to the original episode can be found there.
* Laura Duffy solves housing, ethics, and more [00:01:16]
* Arjun Panickssery solves books, hobbies, and blogging, but fails to solve the Sleeping Beauty problem because he's wrong on that one [00:10:47]
* Nathan Barnard on how financial regulation can inform AI regulation [00:17:16]
* Winston Oswald-Drummond on the tractability of reducing s-risk, ethics, and more [00:27:48]
* Nathan Barnard (again!) on why general intelligence is basically fake [00:34:10]
* Daniel Filan on why I'm...
2024-01-24 · 1h 47

AXRP - the AI X-risk Research Podcast
26 - AI Governance with Elizabeth Seger
The events of this year have highlighted important questions about the governance of artificial intelligence. For instance, what does it mean to democratize AI? And how should we balance benefits and dangers of open-sourcing powerful AI systems such as large language models? In this episode, I speak with Elizabeth Seger about her research on these questions. Patreon: patreon.com/axrpodcast Ko-fi: ko-fi.com/axrpodcast Topics we discuss, and timestamps: - 0:00:40 - What kinds of AI? - 0:01:30 - Democratizing AI - 0:04:44 - How people talk about democ...
2023-11-26 · 1h 57

Knižný kompas | Podcast o knihách a čítaní
The legendary ELÁN in a unique book: Jožo Ráž, Jano Baláž, Boris Filan...
Listen to a report about the unique book Elán, discussed by Jožo Ráž and Jano Baláž; lyricists Boris Filan and Ľuboš Zeman; the book's author Marcela Titzlová; Elán's manager Karolína "Karotka" Halenárová; Alan Lesyk, creator of the "bubák"; Elán's external memory Oskar Lehotský; Gabriela Belopotocká and Marek Néma from the Ikar publishing house; and Roman Bomboš, presenter and music journalist. The book Elán was released in a regular trade edition HERE, or in a luxury edition limited to just 333 copies, which you can find only at LuxusnáKnižnica.sk. More book tips in the podca...
2023-11-20 · 59 min

The Filan Cabinet
13 - Aaron Silverbook on anti-cavity bacteria
In this episode, I speak with Aaron Silverbook about the bacteria that cause cavities, and how different bacteria can prevent them: specifically, a type of bacterium that you can buy at luminaprobiotic.com. This podcast episode has not been approved by the FDA. Specific topics we talk about include: How do bacteria cause cavities? How can you create an anti-cavity bacterium? What's going on with the competitive landscape of mouth bacteria? How dangerous is it to colonize your mouth with a novel bacterium? Why hasn't this product been available for 20 years already? Lumina Probiotic (the brand name...
2023-11-20 · 49 min

AXRP - the AI X-risk Research Podcast
25 - Cooperative AI with Caspar Oesterheld
Imagine a world where there are many powerful AI systems, working at cross purposes. You could suppose that different governments use AIs to manage their militaries, or simply that many powerful AIs have their own wills. At any rate, it seems valuable for them to be able to cooperatively work together and minimize pointless conflict. How do we ensure that AIs behave this way - and what do we need to learn about how rational agents interact to make that more clear?
In this episode, I'll be speaking with Caspar Oesterheld about some of his research on this very...
2023-10-03 · 3h 02

The Filan Cabinet
12 - Holly Elmore on AI pause
In this episode, I talk to Holly Elmore about her advocacy around AI Pause - encouraging governments to pause the development of more and more powerful AI. Topics we discuss include: Why advocate specifically for AI pause? What costs of AI pause would be worth it? What might AI pause look like? What are the realistic downsides of AI pause? How the Effective Altruism community relates to AI labs. The shift in the alignment community from proving things about alignment to messing around with ML models. Holly's X (twitter) account. PauseAI discord.
2023-09-13 · 1h 29

Pigeon Hour
#6 Daniel Filan on why I'm wrong about ethics (+ Oppenheimer and what names mean in like a hardcore phil of language sense)
Note: the core discussion on ethics begins at 7:58 and moves into philosophy of language at ~1:12:19. Blurb and bulleted summary from Clong: This wide-ranging conversation between Daniel and Aaron touches on movies, business drama, philosophy of language, ethics and legal theory. The two debate major ethical concepts like utilitarianism and moral realism. Thought experiments around rational beings choosing to undergo suffering feature prominently. Meandering tangents explore the semantics of names and references. Aaron asserts that total utilitarianism does not imply that any amount of suffering can be morally justified by creating more happiness. His argument...
2023-08-07 · 2h 05

Pigeon Hour
#6 Daniel Filan on why I'm wrong about ethics (+ Oppenheimer and what names mean in like a hardcore phil of language sense)
Listen on: Spotify, Apple Podcasts, Google Podcasts. Note: the core discussion on ethics begins at 7:58 and moves into philosophy of language at ~1:12:19. Daniel's stuff: AI X-risk podcast, The Filan Cabinet podcast, personal website and blog. Blurb and bulleted summary from Clong: This wide-ranging conversation between Daniel and Aaron touches on movies, business drama, philosophy of language, ethics and legal theory. The two debate major ethical concepts like utilitarianism and moral realism. Thought experiments around rational beings choosing to un...
2023-08-07 · 2h 05

AXRP - the AI X-risk Research Podcast
24 - Superalignment with Jan Leike
Recently, OpenAI made a splash by announcing a new "Superalignment" team. Led by Jan Leike and Ilya Sutskever, the team would consist of top researchers, attempting to solve alignment for superintelligent AIs in four years by figuring out how to build a trustworthy human-level AI alignment researcher, and then using it to solve the rest of the problem. But what does this plan actually involve? In this episode, I talk to Jan Leike about the plan and the challenges it faces. Patreon: patreon.com/axrpodcast Ko-fi: ko-fi.com/axrpodcast Episode art by Hamish...
2023-07-27 · 2h 08

AXRP - the AI X-risk Research Podcast
23 - Mechanistic Anomaly Detection with Mark Xu
Is there some way we can detect bad behaviour in our AI system without having to know exactly what it looks like? In this episode, I speak with Mark Xu about mechanistic anomaly detection: a research direction based on the idea of detecting strange things happening in neural networks, in the hope that that will alert us of potential treacherous turns.
We both talk about the core problems of relating these mechanistic anomalies to bad behaviour, as well as the paper "Formalizing the presumption of independence", which formulates the problem of formalizing heuristic mathematical reasoning, in the hope that...
2023-07-27 · 2h 05

AXRP - the AI X-risk Research Podcast
Survey, store closing, Patreon
Very brief survey: bit.ly/axrpsurvey2023 Store is closing in a week! Link: store.axrp.net/ Patreon: patreon.com/axrpodcast Ko-fi: ko-fi.com/axrpodcast
2023-06-29 · 04 min

AXRP - the AI X-risk Research Podcast
22 - Shard Theory with Quintin Pope
What can we learn about advanced deep learning systems by understanding how humans learn and form values over their lifetimes? Will superhuman AI look like ruthless coherent utility optimization, or more like a mishmash of contextually activated desires? This episode's guest, Quintin Pope, has been thinking about these questions as a leading researcher in the shard theory community. We talk about what shard theory is, what it says about humans and neural networks, and what the implications are for making AI safe. Patreon: patreon.com/axrpodcast Ko-fi: ko-fi.com/axrpodcast Episode art by...
2023-06-15 · 3h 28

AXRP - the AI X-risk Research Podcast
21 - Interpretability for Engineers with Stephen Casper
Lots of people in the field of machine learning study 'interpretability', developing tools that they say give us useful information about neural networks. But how do we know if meaningful progress is actually being made? What should we want out of these tools? In this episode, I speak to Stephen Casper about these questions, as well as about a benchmark he's co-developed to evaluate whether interpretability tools can find 'Trojan horses' hidden inside neural nets. Patreon: patreon.com/axrpodcast Ko-fi: ko-fi.com/axrpodcast Topics we discuss, and timestamps: - 00:00:42...
2023-05-02 · 1h 56

The Filan Cabinet
11 - Divia Eden and Ronny Fernandez on the orthogonality thesis
In this episode, Divia Eden and Ronny Fernandez talk about the (strong) orthogonality thesis - that arbitrarily smart intelligences can be paired with arbitrary goals, without additional complication beyond that of specifying the goal - with light prompting from me. Topics they touch on include: Why aren't bees brilliant scientists? Can you efficiently make an AGI out of one part that predicts the future conditioned on some plans, and another that evaluates whether plans are good? If minds are made of smaller sub-agents with more primitive beliefs and desires, does that shape their terminal goals? Also, how would...
2023-04-28 · 2h 37

The Filan Cabinet
10 - Jeffrey Heninger on Mormonism
In this episode I chat with Jeffrey Heninger about his religious beliefs and practices as a member of the Church of Jesus Christ of Latter-day Saints, sometimes colloquially referred to as "the Mormon church" or "the LDS church". Topics we talk about include: Who or what is God? How can we know things about God? In particular, what role does religious experience play? To what degree is modern morality downstream of Jesus? What's in the Book of Mormon? What does modern-day prophecy look like? What do Sunday services look like in the LDS church?
What happens after you...
2023-04-15 · 2h 34

AXRP - the AI X-risk Research Podcast
20 - 'Reform' AI Alignment with Scott Aaronson
How should we scientifically think about the impact of AI on human civilization, and whether or not it will doom us all? In this episode, I speak with Scott Aaronson about his views on how to make progress in AI alignment, as well as his work on watermarking the output of language models, and how he moved from a background in quantum complexity theory to working on AI. Note: this episode was recorded before this story (vice.com/en/article/pkadgm/man-dies-by-suicide-after-talking-with-ai-chatbot-widow-says) emerged of a man committing suicide after discussions with a language-model-based chatbot, that i...
2023-04-12 · 2h 27

The Filan Cabinet
9 - Effective Altruism Global: Bay Area (2023)
Every year, the Centre for Effective Altruism runs a number of "Effective Altruism Global" (EA Global or EAG for short) conferences throughout the world. This year, I attended the one held in the San Francisco Bay Area, and talked to a variety of participants about their relationship with effective altruism, the community around that idea, and the conference. Timestamps: 00:00:16 - interview 1 00:07:06 - interview 2 00:15:46 - interview 3 00:22:35 - interview 4 00:31:22 - interview 5 00:38:30 - interview 6 00:44:18 - interview 7 00:48:59 - interview 8 00:53:14 - interview 9 00:56:22 - interview 10 01:01:08 - interview 11 01:06:50 - interview 12 Website for EA Global conferences
2023-03-13 · 1h 15

The Filan Cabinet
8 - John Halstead on climate doom
In this episode I chat with John Halstead about whether climate change will kill us all. He thinks it won't. Topics we talk about include: How did the effective altruism community come to have someone dedicated to the question of whether climate change will kill us all? How bad will climate change likely be? How is the role of carbon dioxide in the atmosphere different from that of other greenhouse gasses? How big a volcano would have to go off to warm up the world by 10 degrees Celsius? How concerned should we be about climate change as a...
2023-03-12 · 1h 34

The Filan Cabinet
7 - Shea Levy on Objectivism
In this episode I speak with Shea Levy about Ayn Rand's philosophy of Objectivism, and what it has to say about ethics and epistemology. Topics we talk about include: What is Objectivism? Can you be an Objectivist and disagree with Ayn Rand? What's the Objectivist theory of aesthetics? Why isn't there a biography of Ayn Rand approved of by orthodox Objectivists? What's so bad about altruism, or views like utilitarianism? What even is selfishness? Can we be mistaken about what we perceive? If so, how? What is consciousness? Could it just be computation? Note that the episode...
2023-02-14 · 2h 50

AXRP - the AI X-risk Research Podcast
Store, Patreon, Video
Store: https://store.axrp.net/ Patreon: https://www.patreon.com/axrpodcast Ko-fi: https://ko-fi.com/axrpodcast Video: https://www.youtube.com/watch?v=kmPFjpEibu0
2023-02-07 · 02 min

The Filan Cabinet
6 - Oliver Habryka on LessWrong and other projects
In this episode I speak with Oliver Habryka, head of Lightcone Infrastructure, the organization that runs the internet forum LessWrong, about his projects in the rationality and existential risk spaces. Topics we talk about include: How did LessWrong get revived? How good is LessWrong?
Is there anything that beats essays for making intellectual contributions on the internet? Why did the team behind LessWrong pivot to property development? What does the FTX situation tell us about the wider LessWrong and Effective Altruism communities? What projects could help improve the world's rationality? Oli on LessWrong Oli on...
2023-02-05 · 1h 58

AXRP - the AI X-risk Research Podcast
19 - Mechanistic Interpretability with Neel Nanda
How good are we at understanding the internal computation of advanced machine learning models, and do we have a hope at getting better? In this episode, Neel Nanda talks about the sub-field of mechanistic interpretability research, as well as papers he's contributed to that explore the basics of transformer circuits, induction heads, and grokking. Topics we discuss, and timestamps: - 00:01:05 - What is mechanistic interpretability? - 00:24:16 - Types of AI cognition - 00:54:27 - Automating mechanistic interpretability - 01:11:57 - Summarizing the papers - 01:24:43 - 'A Mathematical Framew...
2023-02-04 · 3h 52

The Filan Cabinet
5 - Divia Eden on operant conditioning
In this episode, I speak with Divia Eden about operant conditioning, and how relevant it is to human and non-human animal behaviour. Topics we cover include: How close are we to teaching grammar to dogs? What are the important differences between human and dog cognition? How important are unmodelled "trainer effects" in dog training? Why do people underrate positive reinforcement? How does operant conditioning relate to attachment theory? How much does successful dog training rely on the trainer being reinforced by the dog? Why is game theory so fake? Is everything really just about calmness? Divia's twitter...
2023-01-15 · 2h 33

The Adam Sank Show
LAST ASS: Sanks for the Memories
Our monster two-hour and 20-minute send-off! Featuring Steve Chazaro, J.B. Bercy, Ryan Frostig, Drew Lausch, Michelle Buteau, Julie Halston, Daniel Reichard, Patrick McCollum, Joanne Filan, Irene Bremis, Joey DeGrandis, Hunter Foster, Jennifer Cody, Stone & Stone, Frank DeCaro, Micheal Rice, Glenn Scarpelli and Stephen Wallem! With special appearances by Rocco Steele and two-time ghost guest Justin Utley! Plus, a final update on Natalia, the orphan dwarf. Make sure you stick around 'til the very end to hear all the listener voicemails. Thank you, and good night. Visit https://linktr.ee/AdamSank
2023-01-02 · 2h 19

The Filan Cabinet
4 - Peter Jaworski on paid plasma donation
In this episode, Peter Jaworski talks about the practice of paid plasma donation, whether it's ethical to allow it, and his work to advocate for it to be legalized in more jurisdictions. He answers questions such as: Which country used to run clinics in a former colony to pay their former colonial subjects for their plasma? Why can't we just synthesize what we need out of plasma? What percentage of US exports by dollar value does plasma account for? If I want to gather plasma, is it cheaper to pay donors, or not pay them? Is legal paid...
2022-11-08 · 1h 34

AXRP - the AI X-risk Research Podcast
New podcast - The Filan Cabinet
I have a new podcast, where I interview whoever I want about whatever I want. It's called "The Filan Cabinet", and you can find it wherever you listen to podcasts. The first three episodes are about pandemic preparedness, God, and cryptocurrency.
For more details, check out the podcast website (thefilancabinet.com), or search "The Filan Cabinet" in your podcast app.
2022-10-13 · 01 min

The Filan Cabinet
3 - Ameen Soleimani on cryptocurrency
In this episode, cryptocurrency developer Ameen Soleimani talks about his vision of the cryptocurrency ecosystem, as well as his current project RAI: an ether-backed floating-price stablecoin. He answers questions such as: What's the point of cryptocurrency? If this is the beginning of the cryptocurrency world, what will the middle be? What would the sign be that cryptocurrency is working? How does RAI work? Does the design of RAI make it impossible for it to be widely used? What's wrong with how the US dollar works? Ameen on twitter: https://twitter.com/ameensol Reflexer Finance: https...
2022-09-18 · 1h 22

The Filan Cabinet
2 - Wayne Forkner on God
In this episode, Presbyterian Pastor Wayne Forkner talks about God, Christianity, and the Bible. He answers questions such as: What is 'God'? Why do people talk about Jesus so much more than the Father or the Holy Spirit? What is heaven actually like? If justification is by faith alone and not by works, why does the Bible say "A person is justified by works and not by faith alone"? How can people tell that out of all the religions, Christianity is the right one? His church's website: https://www.berkeleyopc.org/ His podcast, Proclaiming the...
2022-09-18 · 3h 38

The Filan Cabinet
1 - Carrick Flynn on his congressional campaign
In this episode, Carrick Flynn talks about his campaign to be the Democratic nominee for Oregon's 6th congressional district. In particular, we talk about his policies on pandemic preparedness and semiconductor manufacturing. He answers questions such as: Was he surprised by the election results? Should we expect another Carrick campaign? What specific things should or could the government fund to limit the spread of pandemics? Why would those work? What is working at a semiconductor plant like? Carrick's campaign site: https://www.carrickflynnfororegon.com/ Andrea Salinas' campaign site: https://www.andreasalinasfororegon.com/
2022-09-18 · 1h 16

AXRP - the AI X-risk Research Podcast
18 - Concept Extrapolation with Stuart Armstrong
Concept extrapolation is the idea of taking concepts an AI has about the world - say, "mass" or "does this picture contain a hot dog" - and extending them sensibly to situations where things are different - like learning that the world works via special relativity, or seeing a picture of a novel sausage-bread combination. For a while, Stuart Armstrong has been thinking about concept extrapolation and how it relates to AI alignment. In this episode, we discuss where his thoughts are at on this topic, what the relationship to AI alignment is, and what the open questions are.
2022-09-04 · 1h 46

AXRP - the AI X-risk Research Podcast
17 - Training for Very High Reliability with Daniel Ziegler
Sometimes, people talk about making AI systems safe by taking examples where they fail and training them to do well on those. But how can we actually do this well, especially when we can't use a computer program to say what a 'failure' is? In this episode, I speak with Daniel Ziegler about his research group's efforts to try doing this with present-day language models, and what they learned.
Listeners beware: this episode contains a spoiler for the Animorphs franchise around minute 41 (in the 'Fanfiction' section of the transcript). Topics we discuss, and t...
2022-08-22 · 1h 00

AXRP - the AI X-risk Research Podcast
16 - Preparing for Debate AI with Geoffrey Irving
Many people in the AI alignment space have heard of AI safety via debate - check out AXRP episode 6 (axrp.net/episode/2021/04/08/episode-6-debate-beth-barnes.html) if you need a primer. But how do we get language models to the stage where they can usefully implement debate? In this episode, I talk to Geoffrey Irving about the role of language models in AI safety, as well as three projects he's done that get us closer to making debate happen: using language models to find flaws in themselves, getting language models to back up claims they make with citations, and figuring...
2022-07-02 · 1h 04

AXRP - the AI X-risk Research Podcast
15 - Natural Abstractions with John Wentworth
Why does anybody care about natural abstractions? Do they somehow relate to math, or value learning? How do E. coli bacteria find sources of sugar? All these questions and more will be answered in this interview with John Wentworth, where we talk about his research plan of understanding agency via natural abstractions. Topics we discuss, and timestamps: - 00:00:31 - Agency in E. Coli - 00:04:59 - Agency in financial markets - 00:08:44 - Inferring agency in real-world systems - 00:16:11 - Selection theorems - 00:20:22 - Abstraction and natural abstractions - 00:32:42 - Info...
2022-05-23 · 1h 36

AXRP - the AI X-risk Research Podcast
14 - Infra-Bayesian Physicalism with Vanessa Kosoy
Late last year, Vanessa Kosoy and Alexander Appel published some research under the heading of "Infra-Bayesian physicalism". But wait - what was infra-Bayesianism again? Why should we care? And what does any of this have to do with physicalism? In this episode, I talk with Vanessa Kosoy about these questions, and get a technical overview of how infra-Bayesian physicalism works and what its implications are. Topics we discuss, and timestamps: - 00:00:48 - The basics of infra-Bayes - 00:08:32 - An invitation to infra-Bayes - 00:11:23 - What is naturalized induction? ...
2022-04-06 · 1h 47

AXRP - the AI X-risk Research Podcast
13 - First Principles of AGI Safety with Richard Ngo
How should we think about artificial general intelligence (AGI), and the risks it might pose? What constraints exist on technical solutions to the problem of aligning superhuman AI systems with human intentions? In this episode, I talk to Richard Ngo about his report analyzing AGI safety from first principles, and recent conversations he had with Eliezer Yudkowsky about the difficulty of AI alignment. Topics we discuss, and timestamps: - 00:00:40 - The nature of intelligence and AGI - 00:01:18 - The nature of intelligence - 00:06:09 - AGI: what and how ...
2022-03-31 · 1h 33

The Nonlinear Library: LessWrong Top Posts
The ground of optimization by alexflint
Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: The ground of optimization, published by alexflint on the LessWrong. Crossposted from the AI Alignment Forum.
May contain more technical jargon than usual.This work was supported by OAK, a monastic community in the Berkeley hills. This document could not have been written without the daily love of living in this beautiful community. The work involved in writing this cannot be separated from the sitting, chanting, cooking, cleaning, crying, correcting, fundraising, listening, laughing, and teaching...2021-12-1242 minThe Nonlinear Library: LessWrong Top PostsThe Nonlinear Library: LessWrong Top PostsThe ground of optimization by alexflintWelcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: The ground of optimization, published by alexflint on the LessWrong. Crossposted from the AI Alignment Forum. May contain more technical jargon than usual. This work was supported by OAK, a monastic community in the Berkeley hills. This document could not have been written without the daily love of living in this beautiful community. The work involved in writing this cannot be separated from the sitting, chanting, cooking, cleaning, crying, correcting, fundraising, listening, laughing, and teaching...2021-12-1242 minThe Nonlinear Library: LessWrong Top PostsThe Nonlinear Library: LessWrong Top Posts2018 Review: Voting Results! by Ben PaceWelcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: 2018 Review: Voting Results! , published by Ben Pace on the AI Alignment Forum. The votes are in! 59 of the 430 eligible voters participated, evaluating 75 posts. Meanwhile, 39 users submitted a total of 120 reviews, with most posts getting at least one review. Thanks a ton to everyone who put in time to think about the posts - nominators, reviewers and voters alike. Several reviews substantially changed my mind about many topics and ideas, and I was quite grateful for...2021-12-1113 minThe Nonlinear Library: LessWrong Top PostsThe Nonlinear Library: LessWrong Top Posts2018 Review: Voting Results! by Ben PaceWelcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio.This is: 2018 Review: Voting Results! , published by Ben Pace on the AI Alignment Forum.The votes are in!59 of the 430 eligible voters participated, evaluating 75 posts. Meanwhile, 39 users submitted a total of 120 reviews, with most posts getting at least one review.Thanks a ton to everyone who put in time to think about the posts - nominators, reviewers and voters alike. Several reviews substantially changed my mind about many topics and ideas, and I was quite grateful for...2021-12-1113 minThe Nonlinear Library: LessWrong Top PostsThe Nonlinear Library: LessWrong Top PostsCryonics signup guide #1: Overview by mingyuanWelcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Cryonics signup guide #1: Overview , published by mingyuan on the AI Alignment Forum. This is the introduction to a sequence on signing up for cryonics. In the coming posts I will lay out what you need to do, concretely and in detail. This sequence is intended for people who already think signing up for cryonics is a good idea but are putting it off because they're not sure what they actually need to do next. 
I...2021-12-1109 minThe Nonlinear Library: LessWrong Top PostsThe Nonlinear Library: LessWrong Top PostsCryonics signup guide #1: Overview by mingyuanWelcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio.This is: Cryonics signup guide #1: Overview , published by mingyuan on the AI Alignment Forum.This is the introduction to a sequence on signing up for cryonics. In the coming posts I will lay out what you need to do, concretely and in detail. This sequence is intended for people who already think signing up for cryonics is a good idea but are putting it off because they're not sure what they actually need to do next. I...2021-12-1109 minThe Nonlinear Library: Alignment Forum Top PostsThe Nonlinear Library: Alignment Forum Top PostsThe ground of optimization by Alex FlintWelcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: The ground of optimization, published by Alex Flint on the AI Alignment Forum. This work was supported by OAK, a monastic community in the Berkeley hills. This document could not have been written without the daily love of living in this beautiful community. The work involved in writing this cannot be separated from the sitting, chanting, cooking, cleaning, crying, correcting, fundraising, listening, laughing, and teaching of the whole community. What is optimization...2021-12-1043 minAXRP - the AI X-risk Research PodcastAXRP - the AI X-risk Research Podcast12 - AI Existential Risk with Paul ChristianoWhy would advanced AI systems pose an existential risk, and what would it look like to develop safer systems? In this episode, I interview Paul Christiano about his views of how AI could be so dangerous, what bad AI scenarios could look like, and what he thinks about various techniques to reduce this risk.   Topics we discuss, and timestamps:  - 00:00:38 - How AI may pose an existential threat    - 00:13:36 - AI timelines    - 00:24:49 - Why we might build risky AI    - 00:33:58 - Takeoff speeds    - 00:51:33 - Why AI c...2021-12-022h 49The Nonlinear Library: Alignment SectionThe Nonlinear Library: Alignment SectionAlignment Newsletter #22 by Rohin ShahWelcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Alignment Newsletter #22, published by Rohin Shah on the AI Alignment Forum. Highlights AI Governance: A Research Agenda (Allan Dafoe): A comprehensive document about the research agenda at the Governance of AI Program. This is really long and covers a lot of ground so I'm not going to summarize it, but I highly recommend it, even if you intend to work primarily on technical work. Technical AI alignment Agent foundations Agents and Devices: A Relative Definition...2021-11-1710 minFuture of Life Institute PodcastFuture of Life Institute PodcastFuture of Life Institute's $25M Grants Program for Existential Risk ReductionFuture of Life Institute President Max Tegmark and our grants team, Andrea Berman and Daniel Filan, join us to announce a $25M multi-year AI Existential Safety Grants Program. 
Topics discussed in this episode include: - The reason Future of Life Institute is offering AI Existential Safety Grants - Max speaks about how receiving a grant changed his career early on - Daniel and Andrea provide details on the fellowships and future grant priorities Check out our grants programs here: https://grants.futureoflife.org/ Join our AI Existential Safety Community: https://futureoflife.org/team/ai-exis... Have any feedback about the podcast...2021-10-1924 minFuture of Life Institute PodcastFuture of Life Institute PodcastFuture of Life Institute's $25M Grants Program for Existential Risk ReductionFuture of Life Institute President Max Tegmark and our grants team, Andrea Berman and Daniel Filan, join us to announce a $25M multi-year AI Existential Safety Grants Program. Topics discussed in this episode include: - The reason Future of Life Institute is offering AI Existential Safety Grants - Max speaks about how receiving a grant changed his career early on - Daniel and Andrea provide details on the fellowships and future grant priorities Check out our grants programs here: https://grants.futureoflife.org/ Join our AI Existential Safety Community: https://futureoflife.org/team/ai-exis... Have any feedback about the podcast...2021-10-1924 minAXRP - the AI X-risk Research PodcastAXRP - the AI X-risk Research Podcast11 - Attainable Utility and Power with Alex TurnerMany scary stories about AI involve an AI system deceiving and subjugating humans in order to gain the ability to achieve its goals without us stopping it. This episode's guest, Alex Turner, will tell us about his research analyzing the notions of "attainable utility" and "power" that underlie these stories, so that we can better evaluate how likely they are and how to prevent them.   Topics we discuss:  - Side effects minimization  - Attainable Utility Preservation (AUP)  - AUP and alignment  - Power-seeking  - Power-seeking and al...2021-09-251h 27AXRP - the AI X-risk Research PodcastAXRP - the AI X-risk Research Podcast10 - AI's Future and Impacts with Katja GraceWhen going about trying to ensure that AI does not cause an existential catastrophe, it's likely important to understand how AI will develop in the future, and why exactly it might or might not cause such a catastrophe. In this episode, I interview Katja Grace, researcher at AI Impacts, who's done work surveying AI researchers about when they expect superhuman AI to be reached, collecting data about how rapidly AI tends to progress, and thinking about the weak points in arguments that AI could be catastrophic for humanity.   Topics we discuss:  - 00:00:34 - AI...2021-07-242h 02Towards Data ScienceTowards Data Science92. Daniel Filan - Peering into neural nets for AI safetyMany AI researchers think it’s going to be hard to design AI systems that continue to remain safe as AI capabilities increase. We’ve seen already on the podcast that the field of AI alignment has emerged to tackle this problem, but a related effort is also being directed at a separate dimension of the safety problem: AI interpretability. Our ability to interpret how AI systems process information and make decisions will likely become an important factor in assuring the reliability of AIs in the future. And my guest for this episode of the podcast has focu...2021-07-141h 06AXRP - the AI X-risk Research PodcastAXRP - the AI X-risk Research Podcast9 - Finite Factored Sets with Scott GarrabrantBeing an agent can get loopy quickly. 
AXRP - the AI X-risk Research Podcast
9 - Finite Factored Sets with Scott Garrabrant
Being an agent can get loopy quickly. For instance, imagine that we're playing chess and I'm trying to decide what move to make. Your next move influences the outcome of the game, and my guess of that influences my move, which influences your next move, which influences the outcome of the game. How can we model these dependencies in a general way, without baking in primitive notions of 'belief' or 'agency'? Today, I talk with Scott Garrabrant about his recent work on finite factored sets that aims to answer this question.
Topics we discuss:
2021-06-25 · 1h 38

AXRP - the AI X-risk Research Podcast
8 - Assistance Games with Dylan Hadfield-Menell
How should we think about the technical problem of building smarter-than-human AI that does what we want? When and how should AI systems defer to us? Should they have their own goals, and how should those goals be managed? In this episode, Dylan Hadfield-Menell talks about his work on assistance games that formalizes these questions. The first couple years of my PhD program included many long conversations with Dylan that helped shape how I view AI x-risk research, so it was great to have another one in the form of a recorded interview.
Link to t...
2021-06-09 · 2h 23

AXRP - the AI X-risk Research Podcast
7.5 - Forecasting Transformative AI from Biological Anchors with Ajeya Cotra
If you want to shape the development and forecast the consequences of powerful AI technology, it's important to know when it might appear. In this episode, I talk to Ajeya Cotra about her draft report "Forecasting Transformative AI from Biological Anchors" which aims to build a probabilistic model to answer this question. We talk about a variety of topics, including the structure of the model, what the most important parts are to get right, how the estimates should shape our behaviour, and Ajeya's current work at Open Philanthropy and perspective on the AI x-risk landscape.
U...
2021-05-28 · 01 min

AXRP - the AI X-risk Research Podcast
7 - Side Effects with Victoria Krakovna
One way of thinking about how AI might pose an existential threat is by taking drastic actions to maximize its achievement of some objective function, such as taking control of the power supply or the world's computers. This might suggest a mitigation strategy of minimizing the degree to which AI systems have large effects on the world that are not absolutely necessary for achieving their objective. In this episode, Victoria Krakovna talks about her research on quantifying and minimizing side effects. Topics discussed include how one goes about defining side effects and the difficulties in doing so, her work...
2021-05-14 · 1h 19

AXRP - the AI X-risk Research Podcast
6 - Debate and Imitative Generalization with Beth Barnes
One proposal to train AIs that can be useful is to have ML models debate each other about the answer to a human-provided question, where the human judges which side has won. In this episode, I talk with Beth Barnes about her thoughts on the pros and cons of this strategy, what she learned from seeing how humans behaved in debate protocols, and how a technique called imitative generalization can augment debate. Those who are already quite familiar with the basic proposal might want to skip past the explanation of debate to 13:00, "what problems does it solve and does...
2021-04-08 · 1h 58
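The "6 - Debate and Imitative Generalization" listing above describes the core protocol in a single sentence: two models argue for competing answers to a question, and a human judges which side won. Purely as an illustration of the shape of that loop (this is not the actual proposal's interface; the function names, round structure, and the toy judge below are all placeholders I made up), here is a minimal sketch:

```python
# Minimal toy of a two-sided debate with a judge. Everything here is a
# placeholder: real proposals use trained models and a human judge, and the
# round structure is specified much more carefully.
from typing import Callable, List, Tuple

# (question, answer being defended, transcript so far) -> next argument
Debater = Callable[[str, str, List[str]], str]
# (question, full transcript) -> index of the winning side (0 or 1)
Judge = Callable[[str, List[str]], int]

def run_debate(question: str,
               answers: Tuple[str, str],
               debaters: Tuple[Debater, Debater],
               judge: Judge,
               n_rounds: int = 2) -> str:
    transcript: List[str] = []
    for _ in range(n_rounds):
        for side in (0, 1):
            argument = debaters[side](question, answers[side], transcript)
            transcript.append(f"side {side}: {argument}")
    winner = judge(question, transcript)
    return answers[winner]

if __name__ == "__main__":
    question = "Is 17 prime?"
    answers = ("yes", "no")
    debater_yes = lambda q, a, t: "17 has no divisors between 2 and its square root."
    debater_no = lambda q, a, t: "Consider whether 17 equals 3 times 6."
    naive_judge = lambda q, t: 0 if any("no divisors" in line for line in t) else 1
    print(run_debate(question, answers, (debater_yes, debater_no), naive_judge))
    # prints "yes"
```

In the real proposal the debaters are trained against the judge's verdicts, and the hope is that honest strategies tend to win; this sketch only shows the interaction pattern being debated in the episode.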
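On the "7 - Side Effects with Victoria Krakovna" listing a little further up: one stylized way to turn "don't have unnecessary effects on the world" into an objective (my paraphrase of the flavour of penalties discussed in this line of work, not the episode's or any paper's exact definition) is to compare the actual state with a baseline state, such as the state that would have resulted from inaction, and penalize the agent for options it has closed off:

```latex
r'_t \;=\; r_t \;-\; \beta \cdot \frac{1}{|X|} \sum_{x \in X} \max\!\big(0,\; R(s^{\mathrm{baseline}}_t, x) - R(s_t, x)\big)
```

where R(s, x) is some measure of how reachable state x is from state s, and β trades off the task reward against the side-effects penalty. The max(0, ·) means the agent is only penalized for making states harder to reach than they would otherwise have been, not easier.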
AXRP - the AI X-risk Research Podcast
5 - Infra-Bayesianism with Vanessa Kosoy
The theory of sequential decision-making has a problem: how can we deal with situations where we have some hypotheses about the environment we're acting in, but its exact form might be outside the range of possibilities we can possibly consider? Relatedly, how do we deal with situations where the environment can simulate what we'll do in the future, and put us in better or worse situations now depending on what we'll do then? Today's episode features Vanessa Kosoy talking about infra-Bayesianism, the mathematical framework she developed with Alex Appel that modifies Bayesian decision theory to succeed in these types...
2021-03-10 · 1h 23

AXRP - the AI X-risk Research Podcast
4 - Risks from Learned Optimization with Evan Hubinger
In machine learning, optimization is typically done to produce a model that performs well according to some metric. Today's episode features Evan Hubinger talking about what happens when the learned model itself is doing optimization in order to perform well, how the goals of the learned model could differ from the goals we used to select the learned model, and what would happen if they did differ.
Link to the paper - Risks from Learned Optimization in Advanced Machine Learning Systems: arxiv.org/abs/1906.01820
Link to the transcript: axrp.net/episode/2021/02/17/episode-4-risks-from-learned-optimization-evan-hubinger.h...
2021-02-18 · 2h 13

AXRP - the AI X-risk Research Podcast
3 - Negotiable Reinforcement Learning with Andrew Critch
In this episode, I talk with Andrew Critch about negotiable reinforcement learning: what happens when two people (or organizations, or what have you) who have different beliefs and preferences jointly build some agent that will take actions in the real world. In the paper we discuss, it's proven that the only way to make such an agent Pareto optimal - that is, to have it not be the case that there's a different agent that both people would prefer to use instead - is to have it preferentially optimize the preferences of whoever's beliefs were more accurate. We discuss his...
2020-12-11 · 58 min

AXRP - the AI X-risk Research Podcast
2 - Learning Human Biases with Rohin Shah
One approach to creating useful AI systems is to watch humans doing a task, infer what they're trying to do, and then try to do that well. The simplest way to infer what the humans are trying to do is to assume there's one goal that they share, and that they're optimally achieving the goal. This has the problem that humans aren't actually optimal at achieving the goals they pursue. We could instead code in the exact way in which humans behave suboptimally, except that we don't know that either. In this episode, I talk with Rohin Shah about...
2020-12-11 · 1h 08

AXRP - the AI X-risk Research Podcast
1 - Adversarial Policies with Adam Gleave
In this episode, Adam Gleave and I talk about adversarial policies. Basically, in current reinforcement learning, people train agents that act in some kind of environment, sometimes an environment that contains other agents. For instance, you might train agents that play sumo with each other, with the objective of making them generally good at sumo. Adam's research looks at the case where all you're trying to do is make an agent that defeats one specific other agent: how easy is it, and what happens? He discovers that often, you can do it pretty easily, and your agent can behave...
2020-12-11 · 58 min
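The "1 - Adversarial Policies" listing above describes the setup: hold one trained agent (the victim) fixed, and optimize a second agent purely to beat that specific victim. The toy below is only meant to show that shape; the real work uses deep reinforcement learning in simulated physics games, whereas here the "game" is rock-paper-scissors, the "victim" is a fixed mixed strategy, and "training" is plain random search. All names and numbers are made up for illustration.

```python
# Toy illustration: fit an "adversarial" strategy against one fixed victim.
# Stand-in for the paper's setting (deep RL in simulated two-player games):
# here the game is rock-paper-scissors and training is random search.
import random

# Payoff to the attacker: rows = attacker action, cols = victim action,
# in the order (rock, paper, scissors); +1 win, 0 tie, -1 loss.
PAYOFF = [[0, -1, 1],
          [1, 0, -1],
          [-1, 1, 0]]

VICTIM = [0.5, 0.3, 0.2]  # the one fixed opponent we are allowed to exploit

def expected_payoff(attacker, victim):
    return sum(attacker[i] * victim[j] * PAYOFF[i][j]
               for i in range(3) for j in range(3))

def random_mixed_strategy(rng):
    weights = [rng.random() for _ in range(3)]
    total = sum(weights)
    return [w / total for w in weights]

def train_adversary(victim, n_samples=5000, seed=0):
    rng = random.Random(seed)
    best, best_value = None, float("-inf")
    for _ in range(n_samples):
        candidate = random_mixed_strategy(rng)
        value = expected_payoff(candidate, victim)
        if value > best_value:
            best, best_value = candidate, value
    return best, best_value

if __name__ == "__main__":
    adversary, value = train_adversary(VICTIM)
    # Against this particular victim the search drifts toward (near-)pure
    # "paper", which exploits the victim's rock-heavy play but would itself
    # lose badly to a scissors-heavy opponent.
    print([round(p, 2) for p in adversary], round(value, 3))
```

The point the toy shares with the framing in the episode description is that best-responding to one fixed opponent is far easier than being robustly good at the game, and the resulting exploit strategy can itself be brittle.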
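On the "3 - Negotiable Reinforcement Learning" listing above, which states the paper's result only in words: as a hedged paraphrase of its shape (my reconstruction, not the paper's exact theorem statement), a Pareto-optimal agent for two principals A and B acts as if it maximizes a weighted sum of their expected utilities, with each expectation taken under that principal's own beliefs, and the effective weights shift over time in proportion to how much probability each principal's beliefs assigned to what was actually observed:

```latex
\pi^{*} \;\in\; \arg\max_{\pi}\; \Big( w_A\, \mathbb{E}^{\pi}_{P_A}[\,U_A\,] \;+\; w_B\, \mathbb{E}^{\pi}_{P_B}[\,U_B\,] \Big),
\qquad
\frac{w_A(h_t)}{w_B(h_t)} \;=\; \frac{w_A(h_0)}{w_B(h_0)} \cdot \frac{P_A(h_t)}{P_B(h_t)}
```

Here h_t is the history of observations so far, so the better predictor's preferences receive more and more weight; that is the formal sense in which the agent ends up "preferentially optimizing the preferences of whoever's beliefs were more accurate."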
Sidespor
Ruben Hughes "If I love it, I'm about it"
Ruben Hughes, Art Director at Illum, is the guest on today's episode of the podcast. Ruben is a true bon vivant with impeccable taste and a sense for quality. We go over Ruben's background and his earlier life in New York, then jump ahead to how on earth he ended up in little Copenhagen. We also get into what he actually does as Art Director at Illum, some thoughts on branding, his philosophy on interior design, and of course his favourite bakery, as well as whether he is more of a Rome or a Paris person.
Links and more at sidespor.dk
Sidespor on Facebook: https://www.facebook.com/si...
2020-01-12 · 00 min

Sidespor
2020 HVAD SKAL DER SKE?! ("2020: What's going to happen?!")
HAPPY NEW YEAR, FOR CRYING OUT LOUD! As expected, a thoroughly cliché, reflective episode of the podcast. We had actually meant to share it with you yesterday, but we decided that 1/1-2020 was a cooler date. Of course it also gets a bit more serious: among other things, we settle whether you should wash your hair first or last in the shower, and whether sales are really all that great. We also look back on the decade and, of course, on 2019. They have been the most transformative years for us, probably for many of you as well, and that...
2020-01-01 · 00 min

DJ GRIND | The Daily Grind
October 2019 Mix | DJ GRIND Fall Tour Promo Podcast
NEW PODCAST! The wait is over! My all-new podcast is ready for download, featuring some of my favorite tracks from my summer tour and lots of fresh music for fall. This set includes three of my latest remixes with Toy Armada: our 'Club Mix' for Carly Rae Jepsen's "Too Much," our 'Massive Mix' for Gawler & Francci Richard's "JOY," and our 'Anthem Mix' for Celine Dion's "Flying On My Own!"
DJ GRIND 2019 Fall Tour Dates - Catch me at these upcoming events!
FRIDAY, OCTOBER 25 – Salt Lake City, UT – Skyfall presents "Evil" @ Sky SLC – www.skyfallslc.com
SATURDAY, OCTOBER 26 – New Orleans, LA – HNO 'M...
2019-10-15 · 1h 22