Showing episodes and shows of LessWrong

Shows
LessWrong (30+ Karma)
“Analyzing A Critique Of The AI 2027 Timeline Forecasts” by Zvi
There was what everyone agrees was a high quality critique of the timelines component of AI 2027, by the LessWrong user and Substack writer Titotal. It is great to have thoughtful critiques like this. The way you get actual thoughtful critiques like this, of course, is to post the wrong answer (at length) on the internet, and then respond by listening to the feedback and by making your model less wrong. This is a high-effort, highly detailed, real engagement on this section, including giving the original authors opportunity to critique the critique, and warnings to beware...
2025-06-24
54 min
LessWrong (30+ Karma)
“LessWrong Feed [new, now in beta]” by Ruby
The modern internet is replete with feeds such as Twitter, Facebook, Insta, TikTok, Substack, etc. They're bad in ways but also good in ways. I've been exploring the idea that LessWrong could have a very good feed. I'm posting this announcement with disjunctive hopes: (a) to find enthusiastic early adopters who will refine this into a great product, or (b) to find people who'll lead us to an understanding that we shouldn't launch this, or should launch it only if designed in a very specific way. You can check it out right now: www.lesswrong.com/feed
2025-05-28
15 min
LessWrong (30+ Karma)
“LessWrong Community Weekend - Applications are open” by jt
We are open for applications: https://airtable.com/appkj2FkJDGMtM2MA/pagiUldderZqbuBaP/form >>> All info on the main event page
2025-05-14
04 min
LessWrong (Curated & Popular)
“PSA: The LessWrong Feedback Service” by JustisMills
At the bottom of the LessWrong post editor, if you have at least 100 global karma, you may have noticed this button. Many people click the button, and are jumpscared when it starts an Intercom chat with a professional editor (me), asking what sort of feedback they'd like. So, that's what it does. It's a summon Justis button. Why summon Justis? To get feedback on your post, of just about any sort. Typo fixes, grammar checks, sanity checks, clarity checks, fit for LessWrong, the works. If you use the LessWrong editor...
2025-05-13
04 min
LessWrong (30+ Karma)
“PSA: The LessWrong Feedback Service” by JustisMills
At the bottom of the LessWrong post editor, if you have at least 100 global karma, you may have noticed this button. Many people click the button, and are jumpscared when it starts an Intercom chat with a professional editor (me), asking what sort of feedback they'd like. So, that's what it does. It's a summon Justis button. Why summon Justis? To get feedback on your post, of just about any sort. Typo fixes, grammar checks, sanity checks, clarity checks, fit for LessWrong, the works. If you use the LessWrong...
2025-05-12
04 min
LessWrong (30+ Karma)
[Linkpost] “How people use LLMs” by Elizabeth
This is a link post. I've gotten a lot of value out of the details of how other people use LLMs, so I'm delighted that Gavin Leech created a collection of exactly such posts (link should go to the right section of the page but if you don't see it, scroll down). https://kajsotala.fi/2025/01/things-i-have-been-using-llms-for/ https://nicholas.carlini.com/writing/2024/how-i-use-ai.html https://www.lesswrong.com/posts/CYYBW8QCMK722GDpz/how-much-i-m-paying-for-ai-productivity-software-and-the https://www.avitalbalwit.com/post/how-i-use-claude https://andymasley.substack.com/p/how-i-use-ai https://benjamincongdon.me/blog/2025/02/02/How-I-Use-AI-Early-2025/ https://www.jefftk.com/p/examples-of-how-i-use-llms https://simonwillison.net...
2025-04-28
01 min
LessWrong (Curated & Popular)
“LessWrong has been acquired by EA” by habryka
Dear LessWrong community, It is with a sense of... considerable cognitive dissonance that I announce a significant development regarding the future trajectory of LessWrong. After extensive internal deliberation, modeling of potential futures, projections of financial runways, and what I can only describe as a series of profoundly unexpected coordination challenges, the Lightcone Infrastructure team has agreed in principle to the acquisition of LessWrong by EA. I assure you, nothing about how LessWrong operates on a day-to-day level will change. I have always cared deeply about the robustness and integrity of our institutions, and I...
2025-04-01
01 min
LessWrong (Curated & Popular)
“Policy for LLM Writing on LessWrong” by jimrandomh
LessWrong has been receiving an increasing number of posts and comments that look like they might be LLM-written or partially-LLM-written, so we're adopting a policy. This could be changed based on feedback. Humans Using AI as Writing or Research Assistants: Prompting a language model to write an essay and copy-pasting the result will not typically meet LessWrong's standards. Please do not submit unedited or lightly-edited LLM content. You can use AI as a writing or research assistant when writing content for LessWrong, but you must have added significant value beyond what the AI produced, the result...
2025-03-25
04 min
LessWrong (Curated & Popular)
“Arbital has been imported to LessWrong” by RobertM, jimrandomh, Ben Pace, Ruby
Arbital was envisioned as a successor to Wikipedia. The project was discontinued in 2017, but not before many new features had been built and a substantial amount of writing about AI alignment and mathematics had been published on the website. If you've tried using Arbital.com in the last few years, you might have noticed that it was on its last legs - no ability to register new accounts or log in to existing ones, slow load times (when it loaded at all), etc. Rather than try to keep it afloat, the LessWrong team worked with MIRI to migrate the...
2025-02-20
08 min
LessWrong (Curated & Popular)
“LessWrong audio: help us choose the new voice” by PeterH
We make AI narrations of LessWrong posts available via our audio player and podcast feeds. We’re thinking about changing our narrator's voice. There are three new voices on the shortlist. They’re all similarly good in terms of comprehension, emphasis, error rate, etc. They just sound different—like people do. We think they all sound similarly agreeable. But, thousands of listening hours are at stake, so we thought it’d be worth giving listeners an opportunity to vote—just in case there's a strong collective preference. Listen and vote
2024-12-12
01 min
LessWrong (Curated & Popular)
“(The) Lightcone is nothing without its people: LW + Lighthaven’s first big fundraiser” by habryka
TLDR: LessWrong + Lighthaven need about $3M for the next 12 months. Donate here, or send me an email, DM or signal message (+1 510 944 3235) if you want to support what we do. Donations are tax-deductible in the US. Reach out for other countries, we can likely figure something out. We have big plans for the next year, and due to a shifting funding landscape we need support from a broader community more than in any previous year. I've been running LessWrong/Lightcone Infrastructure for the last 7 years. During that time we have grown into the primary infrastructure provider for the rationality...
2024-11-30
1h 03
LessWrong (Curated & Popular)
“Reliable Sources: The Story of David Gerard” by TracingWoodgrains
This is a linkpost for https://www.tracingwoodgrains.com/p/reliable-sources-how-wikipedia-admin, posted in full here given its relevance to this community. Gerard has been one of the longest-standing malicious critics of the rationalist and EA communities and has done remarkable amounts of work to shape their public images behind the scenes. Note: I am closer to this story than to many of my others. As always, I write aiming to provide a thorough and honest picture, but this should be read as the view of a close onlooker who has known about much within this story...
2024-07-11
1h 22
LessWrong (Curated & Popular)
[HUMAN VOICE] "How could I have thought that faster?" by mesaoptimizer
Support ongoing human narrations of LessWrong's curated posts: www.patreon.com/LWCurated This is a linkpost for https://twitter.com/ESYudkowsky/status/144546114693741363 I stumbled upon a Twitter thread where Eliezer describes what seems to be his cognitive algorithm that is equivalent to Tune Your Cognitive Strategies, and have decided to archive / repost it here. Source: https://www.lesswrong.com/posts/rYq6joCrZ8m62m7ej/how-could-i-have-thought-that-faster Narrated for LessWrong by Perrin Walker. Share feedback on this narration.
2024-04-12
03 min
LessWrong (Curated & Popular)
[HUMAN VOICE] "My PhD thesis: Algorithmic Bayesian Epistemology" by Eric Neyman
Support ongoing human narrations of LessWrong's curated posts: www.patreon.com/LWCurated In January, I defended my PhD thesis, which I called Algorithmic Bayesian Epistemology. From the preface: For me as for most students, college was a time of exploration. I took many classes, read many academic and non-academic works, and tried my hand at a few research projects. Early in graduate school, I noticed a strong commonality among the questions that I had found particularly fascinating: most of them involved reasoning about knowledge, information, or uncertainty under constraints. I decided that this...
2024-04-12
13 min
LessWrong (Curated & Popular)
[HUMAN VOICE] "Toward a Broader Conception of Adverse Selection" by Ricki Heicklen
Support ongoing human narrations of LessWrong's curated posts: www.patreon.com/LWCurated This is a linkpost for https://bayesshammai.substack.com/p/conditional-on-getting-to-trade-your “I refuse to join any club that would have me as a member” - Marx[1] Adverse Selection is the phenomenon in which information asymmetries in non-cooperative environments make trading dangerous. It has traditionally been understood to describe financial markets in which buyers and sellers systematically differ, such as a market for used cars in which sellers have the information advantage, where resulting feedback loops can lead to market coll...
2024-04-12
21 min
LessWrong (Curated & Popular)
[HUMAN VOICE] "Social status part 1/2: negotiations over object-level preferences" by Steven Byrnes
Support ongoing human narrations of LessWrong's curated posts: www.patreon.com/LWCurated Source: https://www.lesswrong.com/posts/SPBm67otKq5ET5CWP/social-status-part-1-2-negotiations-over-object-level Narrated for LessWrong by Perrin Walker. Share feedback on this narration.
2024-04-05
50 min
LessWrong (Curated & Popular)
[HUMAN VOICE] "Using axis lines for good or evil" by dynomight
Support ongoing human narrations of LessWrong's curated posts: www.patreon.com/LWCurated Source: https://www.lesswrong.com/posts/Yay8SbQiwErRyDKGb/using-axis-lines-for-good-or-evil Narrated for LessWrong by Perrin Walker. Share feedback on this narration.
2024-04-05
12 min
LessWrong (Curated & Popular)
[HUMAN VOICE] "Scale Was All We Needed, At First" by Gabriel Mukobi
Support ongoing human narrations of LessWrong's curated posts: www.patreon.com/LWCurated Source: https://www.lesswrong.com/posts/xLDwCemt5qvchzgHd/scale-was-all-we-needed-at-first Narrated for LessWrong by Perrin Walker. Share feedback on this narration.
2024-04-05
15 min
LessWrong (Curated & Popular)
[HUMAN VOICE] "Acting Wholesomely" by OwenCB
Support ongoing human narrations of LessWrong's curated posts: www.patreon.com/LWCurated Source: https://www.lesswrong.com/posts/Cb7oajdrA5DsHCqKd/acting-wholesomely Narrated for LessWrong by Perrin Walker. Share feedback on this narration.
2024-04-05
27 min
LessWrong (Curated & Popular)
[HUMAN VOICE] "My Clients, The Liars" by ymeskhout
Support ongoing human narrations of LessWrong's curated posts: www.patreon.com/LWCurated Source: https://www.lesswrong.com/posts/h99tRkpQGxwtb9Dpv/my-clients-the-liars Narrated for LessWrong by Perrin Walker. Share feedback on this narration. [Curated Post] ✓ [125+ Karma Post] ✓
2024-03-21
13 min
LessWrong (Curated & Popular)
[HUMAN VOICE] "Deep atheism and AI risk" by Joe Carlsmith
Support ongoing human narrations of LessWrong's curated posts: www.patreon.com/LWCurated Source: https://www.lesswrong.com/posts/sJPbmm8Gd34vGYrKd/deep-atheism-and-ai-risk Narrated for LessWrong by Perrin Walker. Share feedback on this narration. [Curated Post] ✓
2024-03-21
46 min
LessWrong (Curated & Popular)
[HUMAN VOICE] "Speaking to Congressional staffers about AI risk" by Akash, hath
Support ongoing human narrations of LessWrong's curated posts: www.patreon.com/LWCurated Source: https://www.lesswrong.com/posts/2sLwt2cSAag74nsdN/speaking-to-congressional-staffers-about-ai-risk Narrated for LessWrong by Perrin Walker. Share feedback on this narration. [Curated Post] ✓ [125+ Karma Post] ✓
2024-03-10
24 min
LessWrong (Curated & Popular)
[HUMAN VOICE] "CFAR Takeaways: Andrew Critch" by Raemon
Support ongoing human narrations of LessWrong's curated posts: www.patreon.com/LWCurated Source: https://www.lesswrong.com/posts/Jash4Gbi2wpThzZ4k/cfar-takeaways-andrew-critch Narrated for LessWrong by Perrin Walker. Share feedback on this narration. [Curated Post] ✓ [125+ Karma Post] ✓
2024-03-10
09 min
LessWrong (Curated & Popular)
[HUMAN VOICE] "And All the Shoggoths Merely Players" by Zack_M_Davis
Support ongoing human narrations of LessWrong's curated posts: www.patreon.com/LWCurated Source: https://www.lesswrong.com/posts/8yCXeafJo67tYe5L4/and-all-the-shoggoths-merely-players Narrated for LessWrong by Perrin Walker. Share feedback on this narration. [Curated Post] ✓ [125+ Karma Post] ✓
2024-02-20
21 min
LessWrong (Curated & Popular)
[HUMAN VOICE] "Updatelessness doesn't solve most problems" by Martín Soto
Support ongoing human narrations of LessWrong's curated posts: www.patreon.com/LWCurated Source: https://www.lesswrong.com/posts/g8HHKaWENEbqh2mgK/updatelessness-doesn-t-solve-most-problems-1 Narrated for LessWrong by Perrin Walker. Share feedback on this narration. [Curated Post] ✓ [125+ Karma Post] ✓
2024-02-20
25 min
LessWrong (Curated & Popular)
[HUMAN VOICE] "Believing In" by Anna Salamon
Support ongoing human narrations of LessWrong's curated posts: www.patreon.com/LWCurated Source: https://www.lesswrong.com/posts/duvzdffTzL3dWJcxn/believing-in-1 Narrated for LessWrong by Perrin Walker. Share feedback on this narration. [Curated Post] ✓ [125+ Karma Post] ✓
2024-02-14
25 min
LessWrong (Curated & Popular)
[HUMAN VOICE] "Attitudes about Applied Rationality" by Camille Berger
Support ongoing human narrations of LessWrong's curated posts: www.patreon.com/LWCurated Source: https://www.lesswrong.com/posts/5jdqtpT6StjKDKacw/attitudes-about-applied-rationality Narrated for LessWrong by Perrin Walker. Share feedback on this narration. [Curated Post] ✓
2024-02-14
07 min
LessWrong (Curated & Popular)
[HUMAN VOICE] "A Shutdown Problem Proposal" by johnswentworth, David Lorell
Support ongoing human narrations of LessWrong's curated posts: www.patreon.com/LWCurated Source: https://www.lesswrong.com/posts/PhTBDHu9PKJFmvb4p/a-shutdown-problem-proposal Narrated for LessWrong by Perrin Walker. Share feedback on this narration. [Curated Post] ✓ [125+ Karma Post] ✓
2024-02-09
12 min
LessWrong (Curated & Popular)
[HUMAN VOICE] "Without fundamental advances, misalignment and catastrophe are the default outcomes of training powerful AI" by Jeremy Gillen, peterbarnett
Support ongoing human narrations of LessWrong's curated posts: www.patreon.com/LWCurated Source: https://www.lesswrong.com/posts/GfZfDHZHCuYwrHGCd/without-fundamental-advances-misalignment-and-catastrophe Narrated for LessWrong by Perrin Walker. Share feedback on this narration. [Curated Post] ✓ [125+ Karma Post] ✓
2024-02-03
1h 41
LessWrong (Curated & Popular)
[HUMAN VOICE] "The case for ensuring that powerful AIs are controlled" by ryan_greenblatt, Buck
Support ongoing human narrations of LessWrong's curated posts: www.patreon.com/LWCurated Source: https://www.lesswrong.com/posts/kcKrE9mzEHrdqtDpE/the-case-for-ensuring-that-powerful-ais-are-controlled Narrated for LessWrong by Perrin Walker. Share feedback on this narration. [Curated Post] ✓ [125+ Karma Post] ✓
2024-02-02
1h 04
LessWrong (Curated & Popular)
[HUMAN VOICE] "There is way too much serendipity" by Malmesbury
Support ongoing human narrations of LessWrong's curated posts: www.patreon.com/LWCurated Crossposted from substack. As we all know, sugar is sweet and so are the $30B in yearly revenue from the artificial sweetener industry. Four billion years of evolution endowed our brains with a simple, straightforward mechanism to make sure we occasionally get an energy refuel so we can continue the foraging a little longer, and of course we are completely ignoring the instructions and spend billions on fake fuel that doesn’t actually grant any energy. A classic ca...
2024-01-22
12 min
LessWrong (Curated & Popular)
[HUMAN VOICE] "How useful is mechanistic interpretability?" by ryan_greenblatt, Neel Nanda, Buck, habryka
Support ongoing human narrations of LessWrong's curated posts: www.patreon.com/LWCurated Source: https://www.lesswrong.com/posts/tEPHGZAb63dfq2v8n/how-useful-is-mechanistic-interpretability Narrated for LessWrong by Perrin Walker. Share feedback on this narration. [Curated Post] ✓ [125+ Karma Post] ✓
2024-01-21
41 min
LessWrong (Curated & Popular)
[HUMAN VOICE] "Sleeper Agents: Training Deceptive LLMs that Persist Through Safety Training" by evhub et al
This is a linkpost for https://arxiv.org/abs/2401.05566 Support ongoing human narrations of LessWrong's curated posts: www.patreon.com/LWCurated Source: https://www.lesswrong.com/posts/ZAsJv7xijKTfZkMtr/sleeper-agents-training-deceptive-llms-that-persist-through Narrated for LessWrong by Perrin Walker. Share feedback on this narration. [Curated Post] ✓ [125+ Karma Post] ✓
2024-01-21
08 min
LessWrong (Curated & Popular)
[HUMAN VOICE] "Meaning & Agency" by Abram Demski
Support ongoing human narrations of LessWrong's curated posts: www.patreon.com/LWCurated The goal of this post is to clarify a few concepts relating to AI Alignment under a common framework. The main concepts to be clarified: Optimization. Specifically, this will be a type of Vingean agency. It will split into Selection vs Control variants. Reference (the relationship which holds between map and territory; aka semantics, aka meaning). Specifically, this will be a teleosemantic theory. The main new concepts employed will be endorsement and legitimacy. TLDR: Endorsement of a pr...
2024-01-07
30 min
LessWrong (Curated & Popular)
[HUMAN VOICE] "A case for AI alignment being difficult" by jessicata
This is a linkpost for https://unstableontology.com/2023/12/31/a-case-for-ai-alignment-being-difficult/ Support ongoing human narrations of LessWrong's curated posts: www.patreon.com/LWCurated This is an attempt to distill a model of AGI alignment that I have gained primarily from thinkers such as Eliezer Yudkowsky (and to a lesser extent Paul Christiano), but explained in my own terms rather than attempting to hew close to these thinkers. I think I would be pretty good at passing an ideological Turing test for Eliezer Yudkowsky on AGI alignment difficulty (but not AGI timelines), though what I'm...
2024-01-02
28 min
LessWrong (Curated & Popular)
[HUMAN VOICE] "Significantly Enhancing Adult Intelligence With Gene Editing May Be Possible" by Gene Smith and Kman
Support ongoing human narrations of LessWrong's curated posts: www.patreon.com/LWCurated TL;DR version: In the course of my life, there have been a handful of times I discovered an idea that changed the way I thought about the world. The first occurred when I picked up Nick Bostrom’s book “Superintelligence” and realized that AI would utterly transform the world. The second was when I learned about embryo selection and how it could change future generations. And the third happened a few months ago when I read a message from a frie...
2023-12-17
1h 01
LessWrong (Curated & Popular)
[HUMAN VOICE] "Moral Reality Check (a short story)" by jessicata
Support ongoing human narrations of LessWrong's curated posts: www.patreon.com/LWCurated This is a linkpost for https://unstableontology.com/2023/11/26/moral-reality-check/ Janet sat at her corporate ExxenAI computer, viewing some training performance statistics. ExxenAI was a major player in the generative AI space, with multimodal language, image, audio, and video AIs. They had scaled up operations over the past few years, mostly serving B2B, but with some B2C subscriptions. ExxenAI's newest AI system, SimplexAI-3, was based on GPT-5 and Gemini-2. ExxenAI had hired away some software engineers from Google and...
2023-12-15
39 min
LessWrong (Curated & Popular)
2023 Unofficial LessWrong Census/Survey
The Less Wrong General Census is unofficially here! You can take it at this link. It's that time again. If you are reading this post and identify as a LessWronger, then you are the target audience. I'd appreciate it if you took the survey. If you post, if you comment, if you lurk, if you don't actually read the site that much but you do read a bunch of the other rationalist blogs or you're really into HPMOR, if you hung out on rationalist tumblr back in the day, or if none of those exactly fit...
2023-12-14
02 min
LessWrong (Curated & Popular)
[HUMAN VOICE] "What are the results of more parental supervision and less outdoor play?" by Julia Wise
Support ongoing human narrations of LessWrong's curated posts: www.patreon.com/LWCurated Crossposted from Otherwise. Parents supervise their children way more than they used to. Children spend less of their time in unstructured play than they did in past generations. Parental supervision is way up. The wild thing is that this is true even while the number of children per family has decreased and the amount of time mothers work outside the home has increased. Source: https://www.lesswrong.com/posts...
2023-12-13
12 min
LessWrong (Curated & Popular)
[HUMAN VOICE] "Shallow review of live agendas in alignment & safety" by technicalities & Stag
Support ongoing human narrations of LessWrong's curated posts: www.patreon.com/LWCurated You can’t optimise an allocation of resources if you don’t know what the current one is. Existing maps of alignment research are mostly too old to guide you and the field has nearly no ratchet, no common knowledge of what everyone is doing and why, what is abandoned and why, what is renamed, what relates to what, what is going on. This post is mostly just a big index: a link-dump for as many currently active AI safety agend...
2023-12-04
1h 02
LessWrong (Curated & Popular)
[HUMAN VOICE] "Social Dark Matter" by Duncan Sabien
The author's Substack: https://substack.com/@homosabiens Support ongoing human narrations of LessWrong's curated posts: www.patreon.com/LWCurated You know it must be out there, but you mostly never see it. Author's Note 1: In something like 75% of possible futures, this will be the last essay that I publish on LessWrong. Future content will be available on my substack, where I'm hoping people will be willing to chip in a little commensurate with the value of the writing, and (after a delay) on my personal site (not y...
2023-11-28
1h 05
LessWrong (Curated & Popular)
Social Dark Matter
You know it must be out there, but you mostly never see it. Author's Note 1: I'm something like 75% confident that this will be the last essay that I publish on LessWrong. Future content will be available on my substack, where I'm hoping people will be willing to chip in a little commensurate with the value of the writing, and (after a delay) on my personal site. I decided to post this final essay here rather than silently switching over because many LessWrong readers would otherwise never find out that they could still get new Duncan content elsewhere.
2023-11-17
53 min
LessWrong (Curated & Popular)
"RSPs are pauses done right" by evhub
COI: I am a research scientist at Anthropic, where I work on model organisms of misalignment; I was also involved in the drafting process for Anthropic’s RSP. Prior to joining Anthropic, I was a Research Fellow at MIRI for three years. Thanks to Kate Woolverton, Carson Denison, and Nicholas Schiefer for useful feedback on this post. Recently, there’s been a lot of discussion and advocacy around AI pauses—which, to be clear, I think is great: pause advocacy pushes in the right direction and works to build a good base of public support for x...
2023-10-15
12 min
LessWrong (Curated & Popular)
"Announcing MIRI’s new CEO and leadership team" by Gretta Duleba
In 2023, MIRI has shifted focus in the direction of broad public communication—see, for example, our recent TED talk, our piece in TIME magazine “Pausing AI Developments Isn’t Enough. We Need to Shut it All Down”, and our appearances on various podcasts. While we’re continuing to support various technical research programs at MIRI, this is no longer our top priority, at least for the foreseeable future. Coinciding with this shift in focus, there have also been many organizational changes at MIRI over the last several months, and we are somewhat overdue to announce them in public. Th...
2023-10-15
06 min
LessWrong (Curated & Popular)
"Comparing Anthropic's Dictionary Learning to Ours" by Robert_AIZI
Readers may have noticed many similarities between Anthropic's recent publication Towards Monosemanticity: Decomposing Language Models With Dictionary Learning (LW post) and my team's recent publication Sparse Autoencoders Find Highly Interpretable Directions in Language Models (LW post). Here I want to compare our techniques and highlight what we did similarly or differently. My hope in writing this is to help readers understand the similarities and differences, and perhaps to lay the groundwork for a future synthesis approach. First, let me note that we arrived at similar techniques in similar ways: both Anthropic and my team follow the lead o...
2023-10-15
08 min
LessWrong (Curated & Popular)
"Towards Monosemanticity: Decomposing Language Models With Dictionary Learning" by Zac Hatfield-Dodds
Neural networks are trained on data, not programmed to follow rules. We understand the math of the trained network exactly – each neuron in a neural network performs simple arithmetic – but we don't understand why those mathematical operations result in the behaviors we see. This makes it hard to diagnose failure modes, hard to know how to fix them, and hard to certify that a model is truly safe. Luckily for those of us trying to understand artificial neural networks, we can simultaneously record the activation of every neuron in the network, intervene by silencing or stimulating them, and...
2023-10-10
04 min
LessWrong (Curated & Popular)
"Announcing Dialogues" by Ben Pace
As of today, everyone is able to create a new type of content on LessWrong: Dialogues. In contrast with posts, which are for monologues, and comment sections, which are spaces for everyone to talk to everyone, a dialogue is a space for a few invited people to speak with each other. I'm personally very excited about this as a way for people to produce lots of in-depth explanations of their world-models in public. I think dialogues enable this in a way that feels easier — instead of writing an explanation for anyone who reads, you...
2023-10-10
07 min
LessWrong (Curated & Popular)
"Evaluating the historical value misspecification argument" by Matthew Barnett
ETA: I'm not saying that MIRI thought AIs wouldn't understand human values. If there's only one thing you take away from this post, please don't take away that. Recently, many people have talked about whether some of the main MIRI people (Eliezer Yudkowsky, Nate Soares, and Rob Bensinger[1]) should update on whether value alignment is easier than they thought given that GPT-4 seems to follow human directions and act within moral constraints pretty well (here are two specific examples of people talking about this: 1, 2). Because these conversations are often hard to follow without much context, I'll just...
2023-10-10
11 min
LessWrong (Curated & Popular)
"Response to Quintin Pope’s Evolution Provides No Evidence For the Sharp Left Turn" by Zvi
Response to: Evolution Provides No Evidence For the Sharp Left Turn, due to it winning first prize in The Open Philanthropy Worldviews contest. Quintin’s post is an argument about a key historical reference class and what it tells us about AI. Instead of arguing that the reference makes his point, he is instead arguing that it doesn’t make anyone’s point - that we understand the reasons for humanity’s sudden growth in capabilities. He says this jump was caused by gaining access to cultural transmission which allowed partial preservation of in-lifetime learning across generations, which was...
2023-10-10
16 min
LessWrong (Curated & Popular)
"Thomas Kwa's MIRI research experience" by Thomas Kwa and others
Moderator note: the following is a dialogue using LessWrong’s new dialogue feature. The exchange is not completed: new replies might be added continuously, the way a comment thread might work. If you’d also be excited about finding an interlocutor to debate, dialogue, or getting interviewed by: fill in this dialogue matchmaking form. Hi Thomas, I'm quite curious to hear about your research experience working with MIRI. To get us started: When were you at MIRI? Who did you work with? And what problem were you working on? Source: https://www.lessw...
2023-10-06
52 min
LessWrong (Curated & Popular)
"EA Vegan Advocacy is not truthseeking, and it’s everyone’s problem" by Elizabeth
Effective altruism prides itself on truthseeking. That pride is justified in the sense that EA is better at truthseeking than most members of its reference category, and unjustified in that it is far from meeting its own standards. We’ve already seen dire consequences of the inability to detect bad actors who deflect investigation into potential problems, but by its nature you can never be sure you’ve found all the damage done by epistemic obfuscation because the point is to be self-cloaking. My concern here is for the underlying dynamics of EA’s weak epistemic immune system...
2023-10-03
41 min
LessWrong (Curated & Popular)
"How to Catch an AI Liar: Lie Detection in Black-Box LLMs by Asking Unrelated Questions" by Jan Brauner et al.
Large language models (LLMs) can "lie", which we define as outputting false statements despite "knowing" the truth in a demonstrable sense. LLMs might "lie", for example, when instructed to output misinformation. Here, we develop a simple lie detector that requires neither access to the LLM's activations (black-box) nor ground-truth knowledge of the fact in question. The detector works by asking a predefined set of unrelated follow-up questions after a suspected lie, and feeding the LLM's yes/no answers into a logistic regression classifier. Despite its simplicity, this lie detector is highly accurate and surprisingly general. When trained on examples...
2023-10-03
07 min
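The detector described in the entry above is concrete enough to sketch: ask a fixed set of unrelated yes/no follow-up questions after a suspected lie, then feed the answers to a logistic regression classifier. Below is a minimal illustrative sketch of that setup; the follow-up questions, feature values, and labels are placeholder assumptions, not the paper's data or code.

```python
# Minimal sketch of the black-box lie-detector setup described above, assuming we
# already have the model's yes/no answers to a fixed set of unrelated follow-up
# questions. All values here are placeholders for illustration.
import numpy as np
from sklearn.linear_model import LogisticRegression

# Each row: yes(1)/no(0) answers to K follow-up questions asked right after a
# suspected statement; label 1 means the statement was in fact a lie.
X = np.array([
    [1, 0, 1, 1, 0],
    [0, 1, 0, 0, 1],
    [1, 0, 1, 0, 0],
    [0, 1, 1, 0, 1],
    [1, 1, 1, 1, 0],
    [0, 0, 0, 1, 1],
])
y = np.array([1, 0, 1, 0, 1, 0])

detector = LogisticRegression().fit(X, y)

# Score a new suspected statement from its follow-up answers.
new_answers = np.array([[1, 0, 1, 1, 1]])
print("P(lie) =", detector.predict_proba(new_answers)[0, 1])
```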
LessWrong (Curated & Popular)
"The Lighthaven Campus is open for bookings" by Habryka
Lightcone Infrastructure (the organization that grew from and houses the LessWrong team) has just finished renovating a 7-building physical campus that we hope to use to make the future of humanity go better than it would otherwise. We're hereby announcing that it is generally available for bookings. We offer preferential pricing for projects we think are good for the world, but to cover operating costs, we're renting out space to a wide variety of people/projects. Source: https://www.lesswrong.com/posts/memqyjNCpeDrveayx/the-lighthaven-campus-is-open-for-bookings Narrated for LessWrong by TYPE III...
2023-10-03
05 min
LessWrong (Curated & Popular)
"'Diamondoid bacteria' nanobots: deadly threat or dead-end? A nanotech investigation" by titotal
A lot of people are highly concerned that a malevolent AI or insane human will, in the near future, set out to destroy humanity. If such an entity wanted to be absolutely sure they would succeed, what method would they use? Nuclear war? Pandemics? According to some in the x-risk community, the answer is this: The AI will invent molecular nanotechnology, and then kill us all with diamondoid bacteria nanobots. Source: https://www.lesswrong.com/posts/bc8Ssx5ys6zqu3eq9/diamondoid-bacteria-nanobots-deadly-threat-or-dead-end-a Narrated for LessWrong by TYPE III AUDIO.
2023-10-03
37 min
LessWrong (Curated & Popular)
"The King and the Golem" by Richard Ngo
This is a linkpost for https://narrativeark.substack.com/p/the-king-and-the-golem Long ago there was a mighty king who had everything in the world that he wanted, except trust. Who could he trust, when anyone around him might scheme for his throne? So he resolved to study the nature of trust, that he might figure out how to gain it. He asked his subjects to bring him the most trustworthy thing in the kingdom, promising great riches if they succeeded. Soon, the first of them arrived at his palace to try. A teacher brought her...
2023-09-29
08 min
LessWrong (Curated & Popular)
"Sparse Autoencoders Find Highly Interpretable Directions in Language Models" by Logan Riggs et al
This is a linkpost for Sparse Autoencoders Find Highly Interpretable Directions in Language Models. We use a scalable and unsupervised method called Sparse Autoencoders to find interpretable, monosemantic features in real LLMs (Pythia-70M/410M) for both residual stream and MLPs. We showcase monosemantic features, feature replacement for Indirect Object Identification (IOI), and use OpenAI's automatic interpretation protocol to demonstrate a significant improvement in interpretability. Source: https://www.lesswrong.com/posts/Qryk6FqjtZk9FHHJR/sparse-autoencoders-find-highly-interpretable-directions-in Narrated for LessWrong by TYPE III AUDIO. Share feedback on this narration.
2023-09-27
10 min
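For readers unfamiliar with the technique named in the entry above, here is a minimal sparse-autoencoder sketch: a single hidden layer with an L1 sparsity penalty on its feature activations, trained to reconstruct model activations. The layer sizes, penalty weight, and the random stand-in activations are assumptions for illustration, not the authors' configuration.

```python
# Minimal sparse-autoencoder sketch (PyTorch). Layer sizes, the L1 coefficient, and
# the random stand-in for residual-stream activations are illustrative assumptions.
import torch
import torch.nn as nn

class SparseAutoencoder(nn.Module):
    def __init__(self, d_model: int, d_hidden: int):
        super().__init__()
        self.encoder = nn.Linear(d_model, d_hidden)
        self.decoder = nn.Linear(d_hidden, d_model)

    def forward(self, x):
        features = torch.relu(self.encoder(x))  # sparse, non-negative feature activations
        reconstruction = self.decoder(features)
        return reconstruction, features

sae = SparseAutoencoder(d_model=512, d_hidden=4096)
optimizer = torch.optim.Adam(sae.parameters(), lr=1e-3)
l1_coeff = 1e-3

activations = torch.randn(1024, 512)  # stand-in for a batch of real LLM activations
optimizer.zero_grad()
reconstruction, features = sae(activations)
# Reconstruction error plus an L1 penalty that pushes most features to zero.
loss = ((reconstruction - activations) ** 2).mean() + l1_coeff * features.abs().mean()
loss.backward()
optimizer.step()
print("reconstruction + sparsity loss:", loss.item())
```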
LessWrong (Curated & Popular)
"Inside Views, Impostor Syndrome, and the Great LARP" by John Wentworth
Epistemic status: model which I find sometimes useful, and which emphasizes some true things about many parts of the world which common alternative models overlook. Probably not correct in full generality. Consider Yoshua Bengio, one of the people who won a Turing Award for deep learning research. Looking at his work, he clearly “knows what he’s doing”. He doesn’t know what the answers will be in advance, but he has some models of what the key questions are, what the key barriers are, and at least some hand-wavy pseudo-models of how things work. For inst...
2023-09-27
08 min
LessWrong (Curated & Popular)
"There should be more AI safety orgs" by Marius Hobbhahn
I’m writing this in my own capacity. The views expressed are my own, and should not be taken to represent the views of Apollo Research or any other program I’m involved with. TL;DR: I argue why I think there should be more AI safety orgs. I’ll also provide some suggestions on how that could be achieved. The core argument is that there is a lot of unused talent and I don’t think existing orgs scale fast enough to absorb it. Thus, more orgs are needed. This post can also serve as a call...
2023-09-25
29 min
LessWrong (Curated & Popular)
"The Talk: a brief explanation of sexual dimorphism" by Malmesbury
Cross-posted from substack. "Everything in the world is about sex, except sex. Sex is about clonal interference." – Oscar Wilde (kind of) As we all know, sexual reproduction is not about reproduction. Reproduction is easy. If your goal is to fill the world with copies of your genes, all you need is a good DNA-polymerase to duplicate your genome, and then to divide into two copies of yourself. Asexual reproduction is just better in every way. It's pretty clear that, on a direct one-v-one cage match, an asexual organism would have muc...
2023-09-22
30 min
LessWrong (Curated & Popular)
"A Golden Age of Building? Excerpts and lessons from Empire State, Pentagon, Skunk Works and SpaceX" by jacobjacob
Patrick Collison has a fantastic list of examples of people quickly accomplishing ambitious things together since the 19th Century. It does make you yearn for a time that feels... different, when the lethargic behemoths of government departments could move at the speed of a racing startup: [...] last century, [the Department of Defense] innovated at a speed that puts modern Silicon Valley startups to shame: the Pentagon was built in only 16 months (1941–1943), the Manhattan Project ran for just over 3 years (1942–1946), and the Apollo Program put a man on the moon in under a decade (1961–1969). In the 1950s alone, the United...
2023-09-20
45 min
LessWrong (Curated & Popular)
"AI presidents discuss AI alignment agendas" by TurnTrout & Garrett Baker
This is a linkpost for https://www.youtube.com/watch?v=02kbWY5mahQ None of the presidents fully represent my (TurnTrout's) views. TurnTrout wrote the script. Garrett Baker helped produce the video after the audio was complete. Thanks to David Udell, Ulisse Mini, Noemi Chulo, and especially Rio Popper for feedback and assistance in writing the script. Source: https://www.lesswrong.com/posts/7M2iHPLaNzPNXHuMv/ai-presidents-discuss-ai-alignment-agendas YouTube video kindly provided by the authors. Other text narrated for LessWrong by TYPE III AUDIO. Share feedback on this...
2023-09-19
23 min
LessWrong (Curated & Popular)
"UDT shows that decision theory is more puzzling than ever" by Wei Dai
I feel like MIRI perhaps mispositioned FDT (their variant of UDT) as a clear advancement in decision theory, whereas maybe they could have attracted more attention/interest from academic philosophy if the framing was instead that the UDT line of thinking shows that decision theory is just more deeply puzzling than anyone had previously realized. Instead of one major open problem (Newcomb's, or EDT vs CDT) now we have a whole bunch more. I'm really not sure at this point whether UDT is even on the right track, but it does seem clear that there are some thorny issues...
2023-09-18
02 min
LessWrong (Curated & Popular)
"Sum-threshold attacks" by TsviBT
How do you affect something far away, a lot, without anyone noticing? (Note: you can safely skip sections. It is also safe to skip the essay entirely, or to read the whole thing backwards if you like.) Source: https://www.lesswrong.com/posts/R3eDrDoX8LisKgGZe/sum-threshold-attacks Narrated for LessWrong by TYPE III AUDIO. Share feedback on this narration. [125+ Karma Post] ✓
2023-09-11
19 min
LessWrong (Curated & Popular)
"A list of core AI safety problems and how I hope to solve them" by Davidad
Context: I sometimes find myself referring back to this tweet and wanted to give it a more permanent home. While I'm at it, I thought I would try to give a concise summary of how each distinct problem would be solved by an Open Agency Architecture (OAA), if OAA turns out to be feasible. Source: https://www.lesswrong.com/posts/D97xnoRr6BHzo5HvQ/one-minute-every-moment Narrated for LessWrong by TYPE III AUDIO. Share feedback on this narration. [125+ Karma Post] ✓
2023-09-09
12 min
LessWrong (Curated & Popular)
"Report on Frontier Model Training" by Yafah Edelman
This is a linkpost for https://docs.google.com/document/d/1TsYkDYtV6BKiCN9PAOirRAy3TrNDu2XncUZ5UZfaAKA/edit?usp=sharing Understanding what drives the rising capabilities of AI is important for those who work to forecast, regulate, or ensure the safety of AI. Regulations on the export of powerful GPUs need to be informed by understanding of how these GPUs are used, forecasts need to be informed by bottlenecks, and safety needs to be informed by an understanding of how the models of the future might be trained. A clearer understanding would enable policy makers to target...
2023-09-09
35 min
LessWrong (Curated & Popular)
"Defunding My Mistake" by ymeskhout
Until about five years ago, I unironically parroted the slogan All Cops Are Bastards (ACAB) and earnestly advocated to abolish the police and prison system. I had faint inklings I might be wrong about this a long time ago, but it took a while to come to terms with its disavowal. What follows is intended to be not just a detailed account of what I used to believe but most pertinently, why. Despite being super egotistical, for whatever reason I do not experience an aversion to openly admitting mistakes I’ve made, and I find it very difficult to un...
2023-09-08
11 min
LessWrong (Curated & Popular)
"What I would do if I wasn’t at ARC Evals" by LawrenceC
In which: I list 9 projects that I would work on if I wasn’t busy working on safety standards at ARC Evals, and explain why they might be good to work on. Epistemic status: I’m prioritizing getting this out fast as opposed to writing it carefully. I’ve thought for at least a few hours and talked to a few people I trust about each of the following projects, but I haven’t done that much digging into each of these, and it’s likely that I’m wrong about many material facts. I also...
2023-09-08
25 min
LessWrong (Curated & Popular)
"The U.S. is becoming less stable" by lc
We focus so much on arguing over who is at fault in this country that I think sometimes we fail to alert on what's actually happening. I would just like to point out, without attempting to assign blame, that American political institutions appear to be losing common knowledge of their legitimacy, and abandoning certain important traditions of cooperative governance. It would be slightly hyperbolic, but not unreasonable to me, to term what has happened "democratic backsliding". Source: https://www.lesswrong.com/posts/r2vaM2MDvdiDSWicu/the-u-s-is-becoming-less-stable# Narrated for LessWrong by TYPE III...
2023-09-05
03 min
LessWrong (Curated & Popular)
"Meta Questions about Metaphilosophy" by Wei Dai
To quickly recap my main intellectual journey so far (omitting a lengthy side trip into cryptography and Cypherpunk land), with the approximate age that I became interested in each topic in parentheses: Source: https://www.lesswrong.com/posts/fJqP9WcnHXBRBeiBg/meta-questions-about-metaphilosophy Narrated for LessWrong by TYPE III AUDIO. Share feedback on this narration. [125+ Karma Post] ✓
2023-09-05
05 min
LessWrong (Curated & Popular)
"OpenAI API base models are not sycophantic, at any size" by Nostalgebraist
In "Discovering Language Model Behaviors with Model-Written Evaluations" (Perez et al 2022), the authors studied language model "sycophancy" - the tendency to agree with a user's stated view when asked a question. The paper contained the striking plot reproduced below, which shows sycophancy increasing dramatically with model size, while being largely independent of RLHF steps, and even showing up at 0 RLHF steps, i.e. in base models! [...] I found this result startling when I read the original paper, as it seemed like a bizarre failure of calibration. How would the base LM know that this "Assistant" c...
2023-09-04
04 min
LessWrong (Curated & Popular)
"Dear Self; we need to talk about ambition" by Elizabeth
I keep seeing advice on ambition, aimed at people in college or early in their career, that would have been really bad for me at similar ages. Rather than contribute (more) to the list of people giving poorly universalized advice on ambition, I have written a letter to the one person I know my advice is right for: myself in the past. Source: https://www.lesswrong.com/posts/uGDtroD26aLvHSoK2/dear-self-we-need-to-talk-about-ambition-1 Narrated for LessWrong by TYPE III AUDIO. Share feedback on this narration. [125+ Karma Post] ✓ [Cu...
2023-08-30
13 min
LessWrong (Curated & Popular)
"Assume Bad Faith" by Zack_M_Davis
I've been trying to avoid the terms "good faith" and "bad faith". I'm suspicious that most people who have picked up the phrase "bad faith" from hearing it used, don't actually know what it means—and maybe, that the thing it does mean doesn't carve reality at the joints. People get very touchy about bad faith accusations: they think that you should assume good faith, but that if you've determined someone is in bad faith, you shouldn't even be talking to them, that you need to exile them. What does "bad faith" mean, though? It do...
2023-08-28
12 min
LessWrong (Curated & Popular)
"Book Launch: "The Carving of Reality," Best of LessWrong vol. III" by Raemon
The Carving of Reality, third volume of the Best of LessWrong books, is now available on Amazon (US). The Carving of Reality includes 43 essays from 29 authors. We've collected the essays into four books, each exploring two related topics. The "two intertwining themes" concept was first inspired as I looked over the cluster of "coordination" themed posts, noting a recurring motif of not only "solving coordination problems" but also "dealing with the binding constraints that were causing those coordination problems." Source: https://www.lesswrong.com/posts/Rck5CvmYkzWYxsF4D/book-launch-the-carving-of-reality-best-of-lesswrong-vol-iii
2023-08-28
05 min
LessWrong (Curated & Popular)
"Large Language Models will be Great for Censorship" by Ethan Edwards
LLMs can do many incredible things. They can generate unique creative content, carry on long conversations in any number of subjects, complete complex cognitive tasks, and write nearly any argument. More mundanely, they are now the state of the art for boring classification tasks and therefore have the capability to radically upgrade the censorship capacities of authoritarian regimes throughout the world. Produced as part of the SERI ML Alignment Theory Scholars Program - Summer 2023 Cohort. Thanks to ev_ and Kei for suggestions on this post. Source: https://www.lesswrong.com/posts/oqvsR2...
2023-08-23
15 min
LessWrong (Curated & Popular)
"Ten Thousand Years of Solitude" by agp
This is a linkpost for the article "Ten Thousand Years of Solitude", written by Jared Diamond for Discover Magazine in 1993, four years before he published Guns, Germs and Steel. That book focused on Diamond's theory that the geography of Eurasia, particularly its large size and common climate, allowed civilizations there to dominate the rest of the world because it was easy to share plants, animals, technologies and ideas. This article, however, examines the opposite extreme. Diamond looks at the intense isolation of the tribes on Tasmania - an island the size of Ireland. After waters rose, Tasmania...
2023-08-22
07 min
LessWrong (Curated & Popular)
"6 non-obvious mental health issues specific to AI safety" by Igor Ivanov
Intro: I am a psychotherapist, and I help people working on AI safety. I noticed patterns of mental health issues highly specific to this group. It's not just doomerism; there are way more of them that are less obvious. If you struggle with a mental health issue related to AI safety, feel free to leave a comment about it and about things that help you with it. You might also support others in the comments. Sometimes such support makes a lot of difference and people feel like they are not alone. All the e...
2023-08-22
06 min
LessWrong (Curated & Popular)
"Against Almost Every Theory of Impact of Interpretability" by Charbel-Raphaël
I gave a talk about the different risk models, followed by an interpretability presentation, then I got a problematic question, "I don't understand, what's the point of doing this?" Hum. Feature viz? (left image) Um, it's pretty but is this useful?[1] Is this reliable? GradCam (a pixel attribution technique, like on the above right figure), it's pretty. But I’ve never seen anybody use it in industry.[2] Pixel attribution seems useful, but accuracy remains the king.[3] Induction heads? Ok, we are maybe on track to retro engineer the mechanism of regex in LLMs. Cool. The considerations in the...
2023-08-21
1h 18
LessWrong (Curated & Popular)
"Inflection.ai is a major AGI lab" by Nikola
Inflection.ai (co-founded by DeepMind co-founder Mustafa Suleyman) should be perceived as a frontier LLM lab of similar magnitude as Meta, OpenAI, DeepMind, and Anthropic based on their compute, valuation, current model capabilities, and plans to train frontier models. Compared to the other labs, Inflection seems to put less effort into AI safety. Thanks to Laker Newhouse for discussion and feedback. Source: https://www.lesswrong.com/posts/Wc5BYFfzuLzepQjCq/inflection-ai-is-a-major-agi-lab Narrated for LessWrong by TYPE III AUDIO. Share feedback on this narration. [125+...
2023-08-15
06 min
LessWrong (Curated & Popular)
"Feedbackloop-first Rationality" by Raemon
I've been workshopping a new rationality training paradigm. (By "rationality training paradigm", I mean an approach to learning/teaching the skill of "noticing what cognitive strategies are useful, and getting better at them.") I think the paradigm has promise. I've beta-tested it for a couple weeks. It’s too early to tell if it actually works, but one of my primary goals is to figure out if it works relatively quickly, and give up if it isn’t delivering. The goal of this post is to: Convey the framework. See if people find it compe...
2023-08-15
15 min
LessWrong (Curated & Popular)
"When can we trust model evaluations?" bu evhub
In "Towards understanding-based safety evaluations," I discussed why I think evaluating specifically the alignment of models is likely to require mechanistic, understanding-based evaluations rather than solely behavioral evaluations. However, I also mentioned in a footnote why I thought behavioral evaluations would likely be fine in the case of evaluating capabilities rather than evaluating alignment:However, while I like the sorts of behavioral evaluations discussed in the GPT-4 System Card (e.g. ARC's autonomous replication evaluation) as a way of assessing model capabilities, I have a pretty fundamental concern with these sorts of techniques as a mechanism for...
2023-08-09
17 min
LessWrong (Curated & Popular)
"Model Organisms of Misalignment: The Case for a New Pillar of Alignment Research" by evhub, Nicholas Schiefer, Carson Denison, Ethan Perez
TL;DR: This document lays out the case for research on “model organisms of misalignment” – in vitro demonstrations of the kinds of failures that might pose existential threats – as a new and important pillar of alignment research. If you’re interested in working on this agenda with us at Anthropic, we’re hiring! Please apply to the research scientist or research engineer position on the Anthropic website and mention that you’re interested in working on model organisms of misalignment. Source: https://www.lesswrong.com/posts/ChDH335ckdvpxXaXX/model-organisms-of-misalignment-the-case-for-a-new-pillar-of-1
2023-08-09
35 min
LessWrong (Curated & Popular)
"ARC Evals new report: Evaluating Language-Model Agents on Realistic Autonomous Tasks" by Beth Barnes
Blogpost version. Paper. We have just released our first public report. It introduces methodology for assessing the capacity of LLM agents to acquire resources, create copies of themselves, and adapt to novel challenges they encounter in the wild. Background: ARC Evals develops methods for evaluating the safety of large language models (LLMs) in order to provide early warnings of models with dangerous capabilities. We have public partnerships with Anthropic and OpenAI to evaluate their AI systems, and are exploring other partnerships as well. Source:...
2023-08-04
08 min
LessWrong (Curated & Popular)
"The "public debate" about AI is confusing for the general public and for policymakers because it is a three-sided debate" by Adam David Long
Summary of Argument: The public debate among AI experts is confusing because there are, to a first approximation, three sides, not two sides to the debate. I refer to this as a 🔺three-sided framework, and I argue that using this three-sided framework will help clarify the debate (more precisely, debates) for the general public and for policy-makers. Source: https://www.lesswrong.com/posts/BTcEzXYoDrWzkLLrQ/the-public-debate-about-ai-is-confusing-for-the-general Narrated for LessWrong by TYPE III AUDIO. Share feedback on this narration. [125+ Karma Post] ✓
2023-08-04
07 min
LessWrong (Curated & Popular)
"My current LK99 questions" by Eliezer Yudkowsky
So this morning I thought to myself, "Okay, now I will actually try to study the LK99 question, instead of betting based on nontechnical priors and market sentiment reckoning." (My initial entry into the affray, having been driven by people online presenting as confidently YES when the prediction markets were not confidently YES.) And then I thought to myself, "This LK99 issue seems complicated enough that it'd be worth doing an actual Bayesian calculation on it"--a rare thought; I don't think I've done an actual explicit numerical Bayesian update in at least a year. In the pr...
2023-08-04
09 min
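For readers who have not seen one, an "explicit numerical Bayesian update" of the kind mentioned in the entry above can be sketched in a few lines using the odds form of Bayes' rule; the prior and likelihood ratios below are made-up placeholders, not the post's numbers.

```python
# Toy explicit numerical Bayesian update in odds form: posterior odds equal prior
# odds times the product of likelihood ratios. All numbers are placeholders.
prior_odds = 0.2 / 0.8                # prior P(YES) = 20%
likelihood_ratios = [3.0, 0.5, 1.5]   # P(evidence | YES) / P(evidence | NO) per observation

posterior_odds = prior_odds
for lr in likelihood_ratios:
    posterior_odds *= lr

posterior_prob = posterior_odds / (1 + posterior_odds)
print(f"Posterior P(YES) = {posterior_prob:.2f}")
```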
LessWrong (Curated & Popular)
"Thoughts on sharing information about language model capabilities" by paulfchristiano
I believe that sharing information about the capabilities and limits of existing ML systems, and especially language model agents, significantly reduces risks from powerful AI—despite the fact that such information may increase the amount or quality of investment in ML generally (or in LM agents in particular). Concretely, I mean to include information like: tasks and evaluation frameworks for LM agents, the results of evaluations of particular agents, discussions of the qualitative strengths and weaknesses of agents, and information about agent design that may represent small improvements over the state of the art (insofar as that in...
2023-08-02
19 min
LessWrong (Curated & Popular)
"Yes, It's Subjective, But Why All The Crabs?" by johnswentworth
Some early biologist, equipped with knowledge of evolution but not much else, might see all these crabs and expect a common ancestral lineage. That’s the obvious explanation of the similarity, after all: if the crabs descended from a common ancestor, then of course we’d expect them to be pretty similar. … but then our hypothetical biologist might start to notice surprisingly deep differences between all these crabs. The smoking gun, of course, would come with genetic sequencing: if the crabs’ physiological similarity is achieved by totally different genetic means, or if functionally-irrelevant mutations differ across crab-species by more...
2023-07-31
11 min
LessWrong (Curated & Popular)
"Self-driving car bets" by paulfchristiano
This month I lost a bunch of bets. Back in early 2016 I bet at even odds that self-driving ride sharing would be available in 10 US cities by July 2023. Then I made similar bets a dozen times because everyone disagreed with me. Source: https://www.lesswrong.com/posts/ZRrYsZ626KSEgHv8s/self-driving-car-bets Narrated for LessWrong by TYPE III AUDIO. Share feedback on this narration. [125+ Karma Post] ✓ [Curated Post] ✓
2023-07-31
08 min
LessWrong (Curated & Popular)
"Cultivating a state of mind where new ideas are born" by Henrik Karlsson
In the early 2010s, a popular idea was to provide coworking spaces and shared living to people who were building startups. That way the founders would have a thriving social scene of peers to percolate ideas with as they figured out how to build and scale a venture. This was attempted thousands of times by different startup incubators. There are no famous success stories. In 2015, Sam Altman, who was at the time the president of Y Combinator, a startup accelerator that has helped scale startups collectively worth $600 billion, tweeted in reaction that “not [providing coworking spaces] is pa...
2023-07-31
24 min
LessWrong (Curated & Popular)
"Rationality !== Winning" by Raemon
I think "Rationality is winning" is a bit of a trap. (The original phrase is notably "rationality is systematized winning", which is better, but it tends to slide into the abbreviated form, and both forms aren't that great IMO)It was coined to counteract one set of failure modes - there were people who were straw vulcans, who thought rituals-of-logic were important without noticing when they were getting in the way of their real goals. And, also, there outside critics who'd complain about straw-vulcan-ish actions, and treat that as a knockdown argument against "rationality."
2023-07-28
14 min
LessWrong (Curated & Popular)
"Brain Efficiency Cannell Prize Contest Award Ceremony" by Alexander Gietelink Oldenziel
Previously Jacob Cannell wrote the post "Brain Efficiency", which makes several radical claims: that the brain is at the pareto frontier of speed, energy efficiency and memory bandwidth, and that this represents a fundamental physical frontier. Here's an AI-generated summary: The article “Brain Efficiency: Much More than You Wanted to Know” on LessWrong discusses the efficiency of physical learning machines. The article explains that there are several interconnected key measures of efficiency for physical learning machines: energy efficiency in ops/J, spatial efficiency in ops/mm^2 or ops/mm^3, speed efficiency in time/delay for key learned task...
2023-07-28
13 min
LessWrong (Curated & Popular)
"Grant applications and grand narratives" by Elizabeth
The Lightspeed application asks: “What impact will [your project] have on the world? What is your project’s goal, how will you know if you’ve achieved it, and what is the path to impact?” LTFF uses an identical question, and SFF puts it even more strongly (“What is your organization’s plan for improving humanity’s long term prospects for survival and flourishing?”). I’ve applied to all three of these grants at various points, and I’ve never liked this question. It feels like it wants a grand narrative of an amazing, systemic project that will meas...
2023-07-28
11 min
LessWrong (Curated & Popular)
"Cryonics and Regret" by MvB
This post is not about arguments in favor of or against cryonics. I would just like to share a particular emotional response of mine as the topic became hot for me after not thinking about it at all for nearly a decade. Recently, I have signed up for cryonics, as has my wife, and we have made arrangements for our son to be cryopreserved just in case longevity research does not deliver in time or some unfortunate thing happens. Last year, my father died. He was a wonderful man, good-natured, intelligent, funny, caring and, most...
2023-07-28
03 min
LessWrong (Curated & Popular)
"Introduction to abstract entropy" by Alex Altair
https://www.lesswrong.com/posts/REA49tL5jsh69X3aM/introduction-to-abstract-entropy#fnrefpi8b39u5hd7 This post, and much of the following sequence, was greatly aided by feedback from the following people (among others): Lawrence Chan, Joanna Morningstar, John Wentworth, Samira Nedungadi, Aysja Johnson, Cody Wild, Jeremy Gillen, Ryan Kidd, Justis Mills and Jonathan Mustin. Illustrations by Anne Ore. Introduction & motivation: In the course of researching optimization, I decided that I had to really understand what entropy is.[1] But there are a lot of other reasons why the concept is worth studying:...
2022-10-30
45 min
LessWrong (Curated & Popular)
"Two-year update on my personal AI timelines" by Ajeya Cotra
https://www.lesswrong.com/posts/AfH2oPHCApdKicM4m/two-year-update-on-my-personal-ai-timelines#fnref-fwwPpQFdWM6hJqwuY-12 Crossposted from the AI Alignment Forum. May contain more technical jargon than usual. I worked on my draft report on biological anchors for forecasting AI timelines mainly between ~May 2019 (three months after the release of GPT-2) and ~Jul 2020 (a month after the release of GPT-3), and posted it on LessWrong in Sep 2020 after an internal review process. At the time, my bottom line estimates from the bio anchors modeling exercise were:[1] Roughly ~15% probability of transformative AI by 2036[2] (16 years from posting the report; 14 years...
2022-09-22
39 min
LessWrong (Curated & Popular)
"Language models seem to be much better than humans at next-token prediction" by Buck, Fabien and LawrenceC
https://www.lesswrong.com/posts/htrZrxduciZ5QaCjw/language-models-seem-to-be-much-better-than-humans-at-next Crossposted from the AI Alignment Forum. May contain more technical jargon than usual. [Thanks to a variety of people for comments and assistance (especially Paul Christiano, Nostalgebraist, and Rafe Kennedy), and to various people for playing the game. Buck wrote the top-1 prediction web app; Fabien wrote the code for the perplexity experiment and did most of the analysis and wrote up the math here, Lawrence did the research on previous measurements. Epistemic status: we're pretty confident of our work here, but haven't engaged in a super t...
2022-09-15
27 min
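A toy version of the top-1 prediction comparison described in the entry above might look like the sketch below: the model's argmax next-token guess is scored against the actual next token, and a human playing the same game is scored the same way. The model choice (gpt2) and the scoring loop are illustrative assumptions, not the authors' web app or perplexity code.

```python
# Rough sketch of the top-1 next-token game: score the model's argmax guess against
# the actual next token at every position of a text. Model and text are placeholders.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
model.eval()

text = "The quick brown fox jumps over the lazy dog."
ids = tokenizer(text, return_tensors="pt").input_ids[0]

correct = 0
with torch.no_grad():
    for i in range(1, len(ids)):
        logits = model(ids[:i].unsqueeze(0)).logits[0, -1]
        correct += int(logits.argmax().item() == ids[i].item())

print(f"Model top-1 accuracy on this text: {correct / (len(ids) - 1):.2%}")
# A human plays the same game by guessing the next token at each position
# and is scored identically.
```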
LessWrong (Curated & Popular)
"Unifying Bargaining Notions (1/2)" by Diffractor
https://www.lesswrong.com/posts/rYDas2DDGGDRc8gGB/unifying-bargaining-notions-1-2 Crossposted from the AI Alignment Forum. May contain more technical jargon than usual. This is a two-part sequence of posts, in the ancient LessWrong tradition of decision-theory-posting. This first part will introduce various concepts of bargaining solutions and dividing gains from trade, which the reader may or may not already be familiar with. The upcoming part will be about how all introduced concepts from this post are secretly just different facets of the same underlying notion, as originally discovered by John Harsanyi back...
2022-09-09
46 min
LessWrong (Curated & Popular)
"«Boundaries», Part 1: a key missing concept from utility theory" by Andrew Critch
https://www.lesswrong.com/posts/8oMF8Lv5jiGaQSFvo/boundaries-part-1-a-key-missing-concept-from-utility-theory Crossposted from the AI Alignment Forum. May contain more technical jargon than usual. This is Part 1 of my «Boundaries» Sequence on LessWrong. Summary: «Boundaries» are a missing concept from the axioms of game theory and bargaining theory, which might help pin down certain features of multi-agent rationality (this post), and have broader implications for effective altruism discourse and x-risk (future posts). 1. Boundaries (of living systems) Epistemic status: me describing what I mean. With the exception of some relatively recent and isola...
2022-07-28
18 min
LessWrong (Curated & Popular)
"It’s Probably Not Lithium" by Natália Coelho Mendonça
https://www.lesswrong.com/posts/7iAABhWpcGeP5e6SB/it-s-probably-not-lithium A Chemical Hunger (a), a series by the authors of the blog Slime Mold Time Mold (SMTM) that has been received positively on LessWrong, argues that the obesity epidemic is entirely caused (a) by environmental contaminants. The authors’ top suspect is lithium (a)[1], primarily because it is known to cause weight gain at the doses used to treat bipolar disorder. After doing some research, however, I found that it is not plausible that lithium plays a major role in the obesity epidemic, and that a lot of th...
2022-07-05
1h 11