Anecdotes of AI-Supercharged Science: Justified Posteriors Reads “Early Science Acceleration Experiments with GPT-5”
In this episode, Seth and Andrey break down OpenAI’s report, Early Science Acceleration Experiments with GPT-5. The paper is organized as a series of anecdotes about how top scientists used an early version of GPT-5 in their scientific investigations. The paper’s coauthors try out the model to help them with everything from Erdős’ unsolved math problems to understanding black hole symmetries to interpreting the results of a biological experiment. Seth and Andrey’s priors revolve around whether current models are closer to a “superpowered lit review” or a genuine co-author. They bring in how they currently use LLMs in their own economic research—from coding assistance to “middle-brow” theorizing—before diving into the paper’s anecdotes. They also discuss the economics of AI science and whether AI can ever achieve a Kuhnian paradigm shift. A key question is what the main bottleneck to more useful AI tools for math and science is — the model’s reasoning capability, or simply the lack of translation layers into formal proof systems like Lean?
Priors
Hypothesis 1: What is the most promising paradigm for AI in Science today and 5 years from now? (The four paradigms: Recreating Frontier Science, Superpowered Lit Review, Working with AI/Co-working, and AI on its Own).
* Andrey’s View:
* Today: “Working with AI” (Co-working) is the primary mode. It doesn’t automate the job but makes the human significantly more productive.
* In 5 Years: “Working with AI” remains the dominant mode. While “AI on its own” is the holy grail, he believes human-AI collaboration will still be the standard, though the tasks will shift higher up the stack.
* Seth’s View:
* Today: “Superpowered Lit Review” is the clearest “no-downside win.” Checking if a problem is already solved offers massive efficiency gains without the risk of hallucination inherent in creative work.
* In 5 Years: “AI on its own”—but with a major caveat based on Thomas Kuhn’s philosophy. Seth predicts AI will be capable of autonomous “Normal Science” (puzzle solving within a paradigm) but is skeptical it can achieve “Revolutionary Science” (creating new paradigms like the molecular theory of heat or relativity).
Hypothesis 2: How impressed will we be by the anecdotes in this report? (On a scale of 0 to 10, where 10 is “Holy Sh*t / Curing Cancer” and 0 is “Trivial”).
* Andrey’s View:
* Estimate: “Pretty Impressed” (Implied ~7/10).
* Reasoning: He does not expect a “Holy Sh*t” moment (like curing cancer or solving the Riemann hypothesis) because those results take years to verify or diffuse. However, he expects to see strong productivity gains in “middle-brow” theory.
* Seth’s View:
* Estimate: 7 or 8 out of 10.
* Reasoning: He prices in that this is a “highly selected sample” from OpenAI marketing. He expects to be impressed but skeptical of direct practical applications (e.g., a medical treatment we can use in the near future).
Links + Shownotes
* Early Science Acceleration Experiments with GPT-5 – The central paper of the episode by Sébastien Bubeck, Timothy Gowers, and others (OpenAI/arXiv, Nov 2025).
* Sparks of Artificial General Intelligence: Early experiments with GPT-4 – The predecessor paper by Sébastien Bubeck et al. (for context on the “Early Experiments” series).
Scholars Mentioned
* Benjamin Golub – Podcast guest in a recent episode; Professor of Economics and Computer Science at Northwestern University. We say the episode with Golub is upcoming, but it’s already out! Check it out here.
* Timothy Gowers – Fields Medalist and co-author of the paper.
* Sébastien Bubeck – Lead author of the paper and researcher at OpenAI.
* Terence Tao – Fields Medalist mentioned for his use of AI in mathematics.
* Imre Lakatos – A philosopher of science, mentioned alongside Popper and Feyerabend.
* Tyler Cowen – Economist mentioned regarding the concept of “Writing for the AI.”
* Paul Erdős – Famously prolific mathematician; his long list of unsolved problems was used as a benchmark.
Tools & Technology
* Refine.inc – The AI-for-science tool co-founded by Ben Golub.
* Lean – The theorem prover and programming language discussed as a potential bottleneck/accelerant for checking AI math.
* Elicit – The AI research assistant mentioned by Andrey for literature reviews.
* Pangram Labs – The AI text detection tool mentioned in the context of scientific writing.
Concepts & Philosophy
* The Structure of Scientific Revolutions – Thomas Kuhn’s foundational text on “Normal Science” vs. “Paradigm Shifts.”
* The Lucas Critique – Economic theory mentioned by Seth as a candidate for a recent economic paradigm shift.
Transcript:
[00:00] Seth Benzell: Welcome to the Justified Posteriors podcast, the podcast that updates its beliefs about the economics of AI and technology. I’m Seth Benzell, sharing helpful ideas that come naturally to me, but not quite big enough a contribution to demand co-authorship, at Chapman University in sunny Southern California.
[00:33] Andrey Fradkin: And I’m Andrey Fradkin, experimenting with numerous ways to use AI in order to make the trivial parts of my work take way less time. But then again, maybe all parts of my work are trivial. Coming to you from San Francisco, California.
[00:53] Seth: All right, Andrey. Coming out the gate against himself.
[00:58] Andrey: That’s the only way I know how to be, Seth. That’s the only way.
[01:03] Seth: Well, I mean, maybe that’s a good place to start. I know that you use LLMs all the time as part of your research. We could talk a little bit as we go along about how you use it now, but maybe you could tell me: how do you use it now and how would your dream AI assistant help you with research? Is your dream to completely delegate it? What would be a reasonable near-term dream? What do you have and what do you want?
[01:31] Andrey: Yeah. Wow. I didn’t realize it was already Christmas. Readers, we’re recording this in November, so it’s not quite there yet.
[01:41] Seth: Mariah Carey is on the way, dude.
[01:44] Andrey: So, look, I use it all the time. And I proactively use it because I’m always trying to figure out what it’s capable of doing and what it’s not capable of doing. You know, in terms of the science part of our work—which is a big part of it, but a lot of what we do is also presentation, communication, reimbursement requests...
[02:12] Seth: [Laughs] Reimbursement requests.
[02:14] Andrey: Yeah. But in terms of science, some parts of my work require some math, right? Not very complicated math. And I’ve been using the latest generation of AIs to see how well it does there. And, you know, it’s pretty good, honestly. It definitely requires oversight. Like, I wouldn’t trust it to just do it. But with some iteration, it has given me good results and it’s allowed me to check some of my results. And once we’re kind of agreed—me and the model—on what the results are, it’s very efficient at writing it up. And even doing things like, “Oh, create a simulation based on this model,” or “Create an interactive visualization based on this model.” So I think that sort of work, it’s already pretty good at.
[03:17] Seth: Actually, can I ask a quick question here before you go on? You’ve described it as a system that is maybe like... it guesses and then you have to check it. So you have this sort of iteration. You say, “Solve for the equilibrium of this model,” and you’re not guaranteed that the first output is going to be correct. So that’s a sense in which the AI is proposing solutions and you’re the verifier. But you also find it useful for the opposite, right? Where you have an intuition about a result and then it’s the verifier. Should I notice a contradiction there?
[03:56] Andrey: I don’t think it’s a contradiction. I think as with any results or ideas, we want to battle-test it, right? And that could go in either direction. It’s kind of like when you give an academic seminar. You’re going to present some work and you’re going to get feedback from a bunch of people. Some of it might be good, some of it might be bad. But you might also go to your co-author and they might create something new. So I don’t view it as a contradiction. I guess one way to think about it is that it’s not omniscient, right? So it isn’t like doing things end-to-end without my judgment yet. I can’t just give it a prompt and then it finishes the entire task.
[04:54] Seth: It sounds kind of like a colleague with some knowledge in the domain.
[04:59] Andrey: Yes, exactly.
[05:01] Seth: It might be able to propose an answer that isn’t necessarily right, and it might find a flaw in one of your ideas—those aren’t necessarily right either—but you would never use it as its own end-to-end proof to write it up and present it at Columbia.
[05:19] Andrey: Yeah, yeah. And then the other thing is... what I’ve been talking about is more on the theoretical side. And certainly, I’m not a theorist, so it’s not like I’m doing very complicated things there. But on the empirical side, it’s also very useful. And once again, I found that it’s not giving me end-to-end results. If I just told it, let’s say, “Hey, I have this natural experiment and I’d like you to measure the causal effect,” it’s definitely not going to give me what I want. And maybe that’s underspecified. Or maybe it doesn’t have my taste for what type of evidence I like. But once I give it enough—maybe an initial sketch of the identification strategy—it can very easily automate the rest. Let’s say I did this for one country and I want to replicate that analysis for another country...
[06:30] Seth: I want you to use rainfall as an instrument.
[06:32] Andrey: Yeah. “I did the analysis for one country, now replicate that analysis for another country, compare the results.” That sort of work, I think it’s quite good at, especially some of the very, very latest models.
[06:47] Seth: Okay. I mean, it sounds like that’s pretty capable. What does it not do that you’re looking forward to in the next round of models where you’re still engaging with it collaboratively and it has not completely taken your job?
[07:02] Andrey: Um. It’s not very good at coming up with new ideas right now. Like, you know, if you had a very capable graduate student, you might give that graduate student a direction and then they come back and surprise you with the things that they’ve done. I don’t see that happening. Maybe I’m not using it correctly, but that would be very nice. Ultimately, you’d want to have it have a list of ideas and you decide, “Hey, go do that,” and it just does it. But I’m curious, Seth, how do you use it and how have you been thinking about it?
[07:49] Seth: That’s a good question. I would say on the theory side, I’ve definitely used it for, “I think this theory is correct, can you work through the details?” or “Here’s my sketch of a proof, can you formalize it?” Definitely, at least the way I use it, it’s been hit or miss. I’m mostly using the GPT models. When it hits, it hits really nice. Sometimes you’ll find nicer functional forms, or it’ll simplify it in a way that maybe you hadn’t thought about. So I found it useful for kind of middle-brow theory. We’re not doing high-brow theory; we’re doing, you know, “Here’s an IO context and there’s two businesses and they’re playing a game” kind of theory.
[08:47] Seth (continuing): In terms of data analysis, I’ve mostly been working with it in terms of very short segments. Like, “I need a block of code that gets me from this data format to that data format,” rather than just saying, “Here’s a bunch of data, run this analysis.” I’m not saying you can’t do that, but I haven’t worked myself up to that yet. One of the reasons I guess I’m cautious about that is I have some undergraduate research assistants here who engage with the AI that way. And if you’re not sophisticated, you get some real garbage that way, right?
[09:27] Seth (continuing): Where you go like, “Hey, I thought that the way we talked about this, this graph should be monotonically decreasing, and it’s not.” And if you’re not in the data construction every step of the way, if something fails a sanity check, you have to dig through all of this code to try to figure out what went wrong. So that’s kind of where I’m at right now.
[09:48] Andrey: But I guess I’m surprised, Seth. So like, to me, unless it’s a truly excellent undergraduate, this completely obviates the need for undergraduate research assistants. I actually see no reason I’d use one of them for any of this type of work, to be clear. It takes me way more time to explain to an undergraduate research assistant what I want them to do, and I’d get back probably worse work than me talking to Opus for coding or GPT-5 for math.
[10:31] Seth: Ex-post, you’re completely correct. Ex-post, you nailed it. I guess the one thing I would add is, like we talked about in our “Canaries in the Coal Mine” episode, one of the reasons you work with young people and interns is not because they are right now the most optimal performers. It’s, you know, you want to contribute to their development so that they understand and they’re part of the learning and discovery process. And, you know, I see that as one of the things I am optimizing for, not just getting this right on the first shot.
[11:09] Andrey: Yeah, yeah. I mean, I’m with you. I think often times... if that’s structured correctly, then I’m with you. But a lot of the time...
[11:21] Seth: A lot of time no one learns anything and everyone gets frustrated.
[11:24] Andrey: Yeah, I wanted to word it delicately. No one learns anything. It’s a “make-work” type arrangement. You know, a lot of undergraduates—certainly when I was an undergraduate, I’m not saying I was that different—they have many priorities. They’re not even really focused on whatever it is you tell them to do.
[11:46] Seth: More exciting than working with Professor Fradkin? I can’t even imagine.
[11:51] Andrey: God, yeah. Everything.
[11:57] Seth: Watching paint dry. Watching paint dry while stapling my hand.
[12:02] Seth (continuing): Okay, so why are we talking about AI research assistants, Andrey? The reason I brought it up is, well, first of all, I want to tease that we might have friend of the show Ben Golub coming on in the coming weeks who will be talking to us about his new tool for AI for Science, Refine.inc, that we’re super excited to learn about.
[12:27] Andrey: So just to be clear, it’s called Refine.inc. You should check it out.
[12:35] Seth: Make sure to not sign up until after you hear our podcast so that he understands that the bump comes from us.
[12:44] Andrey: We are going to Granger-cause so many signups. You’re not going to believe it.
[12:50] Seth: You will not believe the Granger causality. Exactly. We’ll have to instrument for our analysis with rainfall. Okay. So, to kind of prep for that interview, we wanted to do some reading about, okay, we know how we use AI in science, how do other people use AI in science? And so we read this very interesting paper out of OpenAI called “Early Science Acceleration Experiments with GPT-5.” Andrey, would you like to read the list of authors?
[13:28] Andrey: It’s a pretty long list of authors, so I’d rather not, actually. But I think the main author is Sébastien Bubeck, who actually works at OpenAI. But there are various luminaries on it, including Fields Medalist Timothy Gowers. So it’s a pretty impressive lineup. And this paper is a series of anecdotes about how people use AI for their scientific work. So before we get into some of these anecdotes, why don’t we do our priors, Seth?
[14:10] [Music / Transition]
[14:16] Seth: Okay. So, Andrey. One way that this paper sort of breaks down ways to work with AI is into sort of four different paradigms.
* Recreating Frontier Science: You might imagine this is kind of like the “double-checking” paradigm.
* Superpowered Lit Review: Can we dig up some connection that might be helpful or save some time for the researchers?
* Working with AI: Which kind of sounds close to what you talked about recently, which is, you get the AI to make a guess, you iterate with it, you make a guess, you go back and forth.
* AI on its Own: You just say, “Hey AI, solve global warming, go.”
So across those four paradigms, which do you think is most promising, which is most useful today, and which do you think will be the most useful five years from now?
[15:19] Andrey: Yeah, that’s a great question. I mean, today I think the obvious answer is “Working with AI.” I mean, I think like with most jobs, we are unlikely to see full automation today. To be clear. But working with the AI can make you a lot more productive. It’s already made me a lot more productive. It’s making a lot of people more productive that I talk to. You know, some people are skeptical. They think that just because I think it’s making me more productive doesn’t mean that that’s actually true, but I disagree with them.
[16:01] Seth: Compensating differentials regarding productivity.
[16:04] Andrey: Yeah, yeah. But even without compensating differentials, I guess. I guess in the future, even let’s say five years from now, I still expect this to be the primary mode, although which parts of the research task stack it covers may be shifting. I think obviously AI on its own doing research is a “Holy Grail.” Certainly, it is a motivating vision for many of our discussions previously on this podcast, including Situational Awareness from the very beginning.
[16:44] Seth: Line go up from village idiot to superintelligence.
[16:47] Andrey: Yeah. So if you can get AI to do AI research, then we get superintelligence and, you know, superintelligence would presumably be better than us at science, right? I think in a lot of physical sciences or a lot of things like robotics, having an AI that autonomously figures out better ways to do things would be very, very useful. The extent to which that’s actually possible... one, depends on the level of intelligence, obviously. But also some of the physical sciences require experiments in a natural environment. Or at the very least a very, very high-fidelity simulation. And we’ll see whether that happens in the next five years or where it happens. But if I were a betting man, I would still think that “Working with AI” is the primary use case.
[17:51] Seth: Both today and in five years. Okay. Well, so I’m happy to have a little bit of disagreement with you here. Which is... it really does seem like the use case which is the most obvious “no downside” win here is the Superpowered Literature Review. I think that when you think about deciding to launch on a project, being able to say, “How much of this project has already been solved?”... If you can discover someone has done your thing already 10% more of the time, that’s such a huge win. And you don’t have to rely so much on trusting the AI’s agency on its own.
[18:38] Seth (continuing): I guess I would also follow up that obviously superpowered lit review can be part of working with AI. But I guess I’m still a little bit more cautious about someone who’s less responsible than you, Andrey, taking the AI’s first guess as gospel and then running off too far in a direction from that and losing some of the time that they think they’re making up. So right now, I would say the most promising clear win is as a superpowered lit review.
[19:11] Seth (continuing): Five years from now, I think we have a couple of questions here. Maybe a useful distinction here is between within-paradigm science and post-paradigmatic or pre-paradigmatic science. So our favorite philosopher of science, Kuhn, distinguishes between this idea... (Andrey: Hey, speak for yourself!) Who’s your favorite philosopher of science? Help me out.
[19:35] Andrey: What if I said Lakatos? Or Popper? I don’t know.
[19:41] Seth: Oh my god. Popper? Listen, it’s easy to falsify Popper’s falsifiability, right? So there you go.
[19:48] Andrey: To be clear, I like all of my philosophers of science equally. Except Feyerabend... whatever.
[19:59] Seth: Exactly.
[20:00] Seth (continuing): Yeah. Except for people who think, you know... except for Foucault, who thinks science isn’t real. Okay, but... so, coming back. What does Kuhn say? Kuhn says there’s kind of two kinds of science. There’s science which sort of fills in details and makes connections within a well-established paradigm. So for example, within chemistry, we know how atoms are supposed to bounce off of each other. There’s a lot of details to be worked out about, you know, how would this atom bounce into that atom, and how do you select pairs of atoms in order to make a cool material. But there’s nothing... at least as far as I know, there’s not a lot of paradigm busting going on. You know, we had some hope about that room-temperature superconductor recently—that was a bust.
[20:46] Seth (continuing): Pre- or post-paradigmatic science would be: “Hey, you know, we’ve been working within a system for a long time and these anomalies are starting to accumulate,” right? So in Newtonian mechanics, it was like, “Hey, Mercury is moving a little bit differently from the way we thought Mercury was supposed to move.” So... oh, there used to be the caloric theory of heat, right? That heat was like a substance that would flow between two materials. And like, that explains some good stuff about how heat works, right? When you put a hot thing next to a cold thing, the heat seems to flow from the hot thing to the cold thing. But there were anomalies there, right? The caloric theory couldn’t explain heat from friction, right? So if you rub your hands together, they get hot. Okay, where did that heat come from? It wasn’t caloric, right? Because you just made it from nothing.
[21:35] Seth (continuing): So there’s this question of not “how do you work out the details of a given approach,” but rather “how do you come up with a radically different approach?” Now in economics, we’re pretty happy with our paradigm. I gotta say. I like my paradigm. You don’t like our paradigm?
[21:55] Andrey Fradkin: Come on, man.
[21:59] Seth: [Laughs] All right. Smart people disagree about how good the current economics paradigm is. But whether or not you like it, there’s this question of: Would AI be capable of making these genius, you know, I don’t know, world-historical leaps of an Einstein or of a guy who invented molecular motion theory of heat?
[22:27] Seth (continuing): So... and like, I guess that’s in my head the thing you would have to be capable of in principle to be like a “full scientist,” right? Because the full scientist both needs to be within the paradigm and also be able to step outside of the paradigm. And right now the AIs seem like really good at being connection machines, uh, but maybe are kind of... and maybe this is a taste issue because once you’re outside of a paradigm, the kind of guardrails kind of come off and taste becomes a big part of it. I’m less excited about AI being able to move in that direction. Or at least I think that’s a less promising direction. So to answer the... the question, the prior, I would say: Right now, Superpowered Lit Review. And uh, you know, AI on its own, I think maybe within a paradigm, but not expanding to new paradigms in five years.
[23:19] Andrey: Yeah, yeah. I mean, I mostly agree with you. I guess I think paradigm shifts... it’s hard to really know what one is. One way to think about it, like... we’re most familiar with economics. And we’ve been in this field for what, about, you know, 15, 20 years, right?
[23:41] Seth: So Lucas Critique would probably be the last big one?
[23:44] Andrey: Yeah, but I... you know, I guess I don’t know if that’s even a paradigm shift. In the following sense: it’s not like no one before Lucas had thought of these ideas. Lucas formalized them in some way. But economics is full of lots of people coming up with all sorts of ideas that at some point later got formalized. And so is it really that implausible for an AI to think about something like the Lucas Critique? I mean, that’s the thing about paradigm shifts—true ones... Or another way to put it: we think of Einstein, right? But I’d say fields experience much smaller types of paradigm shifts. The paradigm shift to causal identification that we experienced in economics—I would actually say that’s much more of a paradigm shift, if you look at what happened after, than maybe even the Lucas Critique.
[24:49] Andrey (continuing): But it’s not that crazy to think that an AI would... you know, it was already of interest what a causal effect is, and the AI might be able to say, “Hey, we can’t really say that this is causal from, you know, this regression you ran, and so we need something different.” And maybe it’ll think really hard about whether there’s a way to make an argument about something being causal.
[25:12] Andrey (continuing): You know, one of the things that I’m particularly optimistic about—you know, and this is a sidebar as usual—is just that for a lot of science, if we can simulate the process with accuracy, then we can optimize and we can learn causal mechanisms. That means we can actually do science on the simulation. And so to the extent that the AI is a computer... you know, is essentially code—it thinks in code...
[25:47] Seth: Like a brain in a vat.
[25:48] Andrey: Yeah, it thinks in code. It could be potentially very, very powerful for that. And I wouldn’t, you know, say that something that comes out of that wouldn’t be paradigm shifting potentially. So yeah. I would say like, because paradigm shifts are actually just... true ones are just very hard to... you don’t know what they’re going to be ahead of time. I’m not going to say that the AI can’t do it. That’s kind of my position here.
[26:12] Seth: Right. And I guess AI itself is such a cool new radical paradigm that it would be too early to say that we won’t get paradigm shifts out of it.
[26:19] Andrey: Yes, exactly.
[26:22] Seth: All right. How about a second prior for you? Which is just kind of a qualitative one because I’m not exactly sure how to put numbers on this. If you want to put numbers on it, go for it. Maybe you can denominate this in, you know, CCs of adrenaline.
[26:36] Andrey: Yeah.
[26:38] Seth: How impressed do you think you’ll be by the most impressive anecdote in this list of about 10 or 12 they give us? On a scale from “Eh” to... I don’t know. I’m not allowed to curse anymore so... imagine intensifier of your choice.
[26:57] Andrey: Seth said the word “s**t” on this... Look, I, you know, I expect to be pretty impressed. Not like “Holy S**t” impressed. I think a “Holy S**t” sort of impression would be like solving one of the, you know, long-standing open problems in mathematics or something like that. Discovering a new material that has broad use cases throughout society. You know, curing cancer. That I guess that would be...
[27:30] Seth: Yeah that would get you out of your bed. Get you out of your chair if you cured cancer. There we go.
[27:35] Andrey: Well, I mean, that would be like the extreme. I think it’s interesting to think through those examples. Like the math one, you know, I can’t verify it. Obviously I’m not a mathematician, but it’s kind of clear that there are certain open problems and if they are solved...
[27:51] Seth: Andrey, you’re a podcaster. You’re higher than a mathematician.
[27:55] Andrey: Yeah, well. Some people, you know, are called to the truly noble pursuits. Um. Yeah, so I can’t verify it. But you know, if the mathematics community says, “Hey, this is solved and the AI solved some long-standing open problem,” you know, that would be really impressive. I think things like, you know, let’s say biological sciences... even if we found a cure for cancer today, you know, by the time that will be recognized within society, that will take a long time.
[28:30] Andrey (continuing): And I actually expect that no matter... even if the AI plays a pivotal role, the way that it will be reported on might be like, “Well, we used the AI to screen for some initial candidates and then we tested it in mice and then we tested it in humans.” Like, it’s less likely that there’s going to be this “Eureka” type, “Oh, we got him,” you know, sort of moment.
[28:53] Seth: Right. There are ten pivotal... like yes. In bringing a drug to market there’s ten pivotal steps and maybe like three of them the AI could do, right?
[29:00] Andrey: Yeah. And we already like use AI all over the place, right? For various statistical type processes in research in the medical sciences, right? So it’s not... yeah. You know, if you think about like Generative AI end-to-end reasoning through the solution, maybe one version of this... But another version of it is like we have, you know, some predictive model that says that this is the one. This is the molecule that will do it, you know?
[29:33] Seth: Okay. Um. I guess from this example, I kind of want to price in the fact... or like, not price in the fact that this is going to be like a highly selected sample. This is from OpenAI. You just talked about how, you know, the Nobel Laureate biologist probably wants to downplay the role of AI. Well, OpenAI would like to upplay the role of AI. Um, so I will be expecting something that’s maybe not a 10 out of 10 impressive, but I’m looking forward to some 7 or 8 out of 10 impressive before I read this.
[30:10] Andrey: Yeah, yeah. So I mean I think we’re both in agreement. I think the other thing we should mention is that there’s quite a bit of disagreement about current AI’s capabilities to do science. I’ll just give you an anecdote. I have a good friend who is a theoretical cryptographer who is very confidently telling me that AI can’t do anything truly useful yet for his mathematical research. And there are certainly people, you know... common voices in the media that are AI skeptics like Gary Marcus who, you know, is going to dismiss every single thing that the AI does as trivial.
[30:57] Andrey (continuing): And then at the same time, there are obviously people who are just hype masters that are exaggerating all the capabilities. So, so yeah. Let’s see what happens.
[31:07] Seth: I love that. “Within-paradigm science is trivial. Pre-paradigmatic science is b******t.” At the intersection, you have Justified Posteriors. Okay.
[31:16] [Music / Transition]
[31:22] Seth: Okay. So let’s get to the evidence. It’s a pretty unusual paper for us. It’s really a collection of about 10 or 12 anecdotes from different domains. So we see examples from math, physics, astronomy, biology, and materials science. Uh. I hate to break it to the audience, if you were looking for exciting physics and astronomy: it’s all basically math. They’re pretty mathy questions. The astronomy question is “solve something about a black hole,” and the physics question is, you know, “simulate something about a nuclear burn.”
[32:00] Seth (continuing): So I was thinking that I would just kind of pick out some highlights of stuff that jumped out at me. You’ll interrupt me as we go. All right. So talking first about through some of these math examples. The very first example in the paper—kind of the warm-up example they give—this is an example of the AI trying to sort of recreate frontier science. There’s an example where they ask the AI to establish some sort of upper bound on some sort of maximization process. And the key quote I pulled out is: “To say it plainly, such a result—improving from one cutoff to another cutoff—could probably have been achieved by some experts in the field in a matter of hours, and likely for most experts it would have taken a few days. This is the type of science acceleration that we will see time and time again in the report.”
[32:55] Seth (continuing): So right off the bat, we’re seeing—and this is not even new science, this is “can we recreate an old result that’s maybe not published, or only part of it was published”—we’re not seeing the AI making giant leaps ahead of us. We’re seeing it completing a key step. And we’re going to see that over and over again. In this particular example, the AI does not even get to the known best cutoff of 1.7/L. It only gets to 1.5/L, compared to the previously best published 1/L, where L is a parameter of the model in question. So if anything, this is kind of a negative example, or more of a mixed example. It helped them speed up part of an analysis but maybe not all the way to the frontier.
[33:45] Andrey: I just... to me, it’s actually quite impressive, Seth. That’s kind of... you just have to remember that these are essentially the top people, the smartest people in the world, right? Like...
[34:00] Seth: Sure.
[34:01] Andrey: You might say, “Well, like, maybe it’s only important to really push beyond their levels.” But actually, we’re completely rate-limited on people like this, right? There are very few of them. And so if they’re able to do things faster, that’s pretty great for society. And also it means that... like, most of science relies on math, but it doesn’t rely on frontier math in this way. And so for all of us who are not as good at math, this could be pretty fantastic, right?
[34:34] Seth: For us middle-brow theorists.
[34:35] Andrey: Yes, exactly. So yeah. To me, this is quite impressive. This is already extremely close to the frontier. And it’s... you know, it’s proving results that were not in the literature. So I... yeah. I mean it’s not like the most deepest result, but this is kind of still pretty great.
[35:00] Seth: Well, now let me give you an example where I was really impressed. And maybe you’ll tell me you’re less impressed by this one. Which is just its function as a literature review tool. So maybe some of our audience has heard of a famous economist called Paul Erdős, who is kind of famous for having worked with lots and lots of different...
[35:19] Andrey: Wait, why did you call him an economist? He’s not an economist.
[35:22] Seth: Did I call him an economist? Mathematician. Excuse me.
[35:24] Andrey: He’s definitely not an economist.
[35:25] Seth: I was good. So I assumed... Thank you. Mathematician Erdős. Who is known for working with lots and lots of mathematicians. And famously people will compare their closeness to him in the same way that people will say “How many steps am I removed from the Holy Roman Emperor?” They’ll say “How many co-authors away am I from Erdős?” Because he’s worked with everybody in so many different domains.
[35:50] Andrey: And famously... famously he took a lot of methamphetamine. And that’s why he was so productive.
[35:57] Seth: A lot of meth. You know, if you do cocaine, you become Stephen King. Meth, you become Erdős. So, you know, which way Western Man? All right. And so one of the things he left us with before he passed was a long list of sort of what he saw as cool open questions for his students and friends to work on. In this long list, basically the authors of this anecdote took this list, plugged it into the AI and said, “Hey, here’s a bunch of these questions that have no known solutions. Can you find solutions to them?”
[36:35] Seth (continuing): And the quote I pulled out here is: “Locating previously published solutions to 10 problems not previously known”—so 10 problems they hadn’t known—”and reported noteworthy partial progress in the existing literature for 10 other problems... and correcting an error in problem 1041.” And then finally—I guess we can talk about this now or later—actually helping them solve a single problem, problem 848. It gave them a big hint and the mathematicians were able to work with it to actually solve problem 848.
[37:08] Seth (continuing): So I like this one. It feels like... it feels like super verifiable. It seems super solid. It seems like a super easy win. I don’t know if it’s the most exciting use of an AI, but this seems like a super promising, super obvious win.
[37:27] Andrey: Yeah. I mean, I think it’s fantastic. I am very skeptical that this can work well outside of mathematics and physics. And the reason is that the more empirical literatures are just littered with terrible research. And like... the payoff from solving the literature review problem is not that great there. When I think about when I’m working on a project... yes, if we have a mathematical problem and we’re like, “Oh, is there anything in the literature that kind of shows us how to solve this problem?” that seems quite useful.
[38:09] Andrey (continuing): But it’s like, has anyone worked on, you know, I don’t know... I have a paper on privacy. “Has anyone worked on privacy before?”
[38:20] Seth: Privacy. What’s the right way to do cookies?
[38:22] Andrey: Yeah. I mean like... it’s fine, you know? Like it’s good to have some citations in the paper, but yeah. To me, the literature review problem is not that important as part of my work. What do you think?
[38:39] Seth: I would push back a tiny bit. Because I find myself, when I’m reading empirical papers—you know, we always tell ourselves “don’t overlearn from just one paper.” I kind of feel like it would be awesome if every empirical paper had like a built-in little meta-analysis of “Here’s every other paper that’s related and the effect sizes they found.” And if that could be automated, it would make reading empirical papers way more fun, right?
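[A toy illustration of the kind of built-in meta-analysis Seth is describing—hypothetical numbers, not from any real papers. The standard fixed-effect version is just an inverse-variance weighted average of the reported effect sizes:]

```python
import math

# Hypothetical effect sizes and standard errors harvested from related papers.
studies = [
    ("Paper A", 0.12, 0.05),
    ("Paper B", 0.08, 0.03),
    ("Paper C", 0.20, 0.10),
]

# Fixed-effect meta-analysis: weight each estimate by its inverse variance,
# so precisely estimated effects count more toward the pooled number.
weights = [1 / se**2 for _, _, se in studies]
pooled = sum(w * eff for (_, eff, _), w in zip(studies, weights)) / sum(weights)
pooled_se = math.sqrt(1 / sum(weights))

print(f"Pooled effect: {pooled:.3f} (SE {pooled_se:.3f})")
```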
[39:05] Andrey: Sure. Yeah. I mean, fair enough. I guess... yeah. I guess it’s a question of what we’re thinking about. Writing your own paper? Unless it’s a meta-analysis... maybe not that useful. But just generally learning from the literature, it is very useful. And actually there’s a very promising tool called Elicit which does this sort of literature search. I think it’s primarily focused on the pharmaceutical domain. So yeah. So I think... yeah. So there is this use case. But I was just reflecting on the fact that for what I personally do in my research, you know, I’m aware of some of the major papers in my field, obviously. But not knowing the literature is not a bottleneck, I don’t think.
[40:00] Seth: What I think of is Edison, famously... whenever he had an idea for a new invention, he made sure to get a team on making sure it was not invented already, because he had gotten burned several times along the way. Oh, you know, somebody had filed a patent for that 20 years ago and they just never made anything of it.
[40:19] Andrey Fradkin: Yeah, yeah. No, no. I mean, look, maybe it’s different in other fields. I... you know, I can only know what I know. Yeah.
[40:31] Seth: Sure. Um, maybe one more negative case. There was a mathematical case involving... what are conditions necessary on subsets to make sure that you don’t get so many subsets that are called cliques? That’s kind of the level of the math I understood of this problem. They gave ChatGPT the problem, it repeatedly gave them the wrong answer. Eventually, after insisting to ChatGPT it was giving them the wrong answer, it gave them the correct answer... which then they later discovered was already in the published literature and ChatGPT did not give it credit.
[41:12] Seth (continuing): So I guess another example here of you really need to be on top of these things and not take their first response as gospel.
[41:19] Andrey: Yeah. To me this is such a complement to doing high-quality work, because... you just... if you don’t have the judgment... it so often gives you stuff that’s wrong or incomplete, and you have to actually have some vision and knowledge to know which parts of the answers to take and which parts not to take.
[41:43] Seth: Right. Yeah. So yes. This seems like we are at the level where the AI is making very plausible guesses and you still need an expert sitting on top of it.
[41:53] Andrey: Yes.
[41:54] Seth: So, Fields Medal-winning mathematician Timothy Gowers gives us this take, which I thought was a really kind of good summary of where it is right now, and kind of inspired my opening joke:
[42:12] Seth (quoting Gowers): “As a research supervisor, I have a rule of thumb for when a contribution I make to the research of one of my PhD students is at the level where I should be a joint author.”
Do you know where he’s from? Should I do an accent? I’m just gonna... I’m not gonna do an accent.
[42:24] Andrey: He’s British.
[42:25] Seth: He’s British? Ooh. Okay.
[42:27] Andrey: I don’t... yeah. Let’s skip the British accent.
[42:29] Seth: Okay. Thank you, Andrey. That’s a gift to you, the listeners at home.
[42:35] Seth (continuing): “The rule is that if the student comes to discuss the problem with me, and I have, in the course of that discussion, an idea that comes more naturally to me than to them, and that turns out to be helpful, then that is not enough for joint authorship. But if I spend time struggling with the problem—of course, I will only do this if the project is officially a joint one [very propitious, as a British man]—and during the course of the struggle [during the course of the struggle, I really love that]... I come up with an idea that required more than just standard expertise that I happen to have, then I have made a genuine contribution to the work.”
[43:10] Seth (continuing): “My experience so far with LLMs is that they are capable of playing this knowledgeable research supervisor role with me, which can be extremely useful given just how much knowledge they have”—this is coming from a Fields Medalist—“but they are not yet at the level, or at least have not yet exhibited that level in my own interactions with them, at which a human mathematician who follows my convention above would ask for joint authorship.”
[43:34] Seth (continuing): I mean, it’s... he’s kind of playing it down, but this is actually pretty freaking high praise, would you not agree, Andrey?
[43:40] Andrey: Yes. Yes. I mean, let’s just, you know, remind ourselves that whatever graduate students he’s thinking about are also some of the smartest people in the world. And you know, most... once again, most scientists who work with math have problems that are substantially easier than anything these sorts of people would be working on. Right? And are bottlenecked by it. Right? Like we’re, you know, bottlenecked maybe temporarily... you know like...
[44:12] Seth: Or even permanently.
[44:13] Andrey: Or even permanently. It could be either, right? And so yeah, like it’s essentially saying like, “Oh, for, you know, 99% of scientists who use math, it’s already really, really, really, really good.”
[44:26] Seth: It replaces me.
[44:28] Andrey: Yeah. And if you’re like a Fields Medalist, you know, maybe it’s not as good as you yet.
[44:35] Seth: Incredible. Um. I guess... one other kind of little detail I came... I want to pull out here is like the requirement that you have to struggle with it for co-authorship. I think that’s kind of fun, right? Like, is one of the reasons that maybe AI gets less credit than we should give it is that it seems so effortless?
[44:56] Andrey: Yeah. Well, you know, sometimes it’s like... it’s interesting, you know in this paper you see that the AI thought for like 20 minutes or whatever. And this is...
[45:05] Seth: Yeah, they got the really good version. Just to be clear, so this is using GPT-5.1 Pro, which can have very very long runtimes if you let it.
[45:13] Andrey: I think it’s 5.0 Pro. Just to be clear.
[45:16] Seth: 5.0 Pro? 5.0 Pro. Excuse me.
[45:19] Andrey: Yeah. But yeah. So this is the frontier reasoning model. This might be the one that’s... I think that’s the one that’s available in the max plan on ChatGPT. But it wasn’t clear to me whether the scientists here got some special access. They probably did. So yeah, it’s not really the sort of AI that most people today would be using, but of course, you know, they could be using it, you know, given how fast things move, within the next year.
[45:51] Seth: Right, right. So exactly. So as we march down Moore’s Law, what is available, you know, in pre-release to Fields Medalists diffuses to us proles in... what, a year or so?
[46:01] Andrey: Yeah, yeah, yeah. Um. Yeah, so I... I don’t know. To me, it’s just really... I mean, I would say it’s awesome. I mean... I mean, it’s just... it’s gonna make us so much more capable. Like, I don’t know... to me, this is a lot of cause for optimism. Even though it’s not, you know, it’s not doing science end-to-end. If that was your, you know, hope, it’s not there yet. But it’s already, you know, great.
[46:33] Seth: I think one thing I would pull out, and I’ll emphasize this in our conclusion, is that it seems like one of the bottlenecks on AI itself is the inability to rigorously check its own proofs. And it seems like once we get really good automated translation from these kinds of human-LLM-readable proofs into kind of machine-checkable proofs, you’ll like multiply this productivity because it’ll be able to check its own work.
[46:59] Andrey: Yes. I... we should also mention, like we haven’t mentioned yet, but there are several very, very well-funded startups that are working on AI for mathematics. DeepMind is also obviously a leader in this field in addition to OpenAI. So it’s also kind of one where, you know, as economists we’re like, “Wow, there’s just so much competition and investment that’s great.” We’re bound to get some awesome results in the future, right?
[47:33] Andrey (continuing): Yeah, so... so... so I mean one of the interesting things here is that it is really like a chat interface, right? Like you don’t have to use a specialized mathematical proving language, you don’t have to interact with that. You can reason with it in, you know, loose terms and then it kind of knows how to interpret it. Maybe some of these other efforts might be a bit more, you know, narrow... you know, very very powerful but more narrow. Yeah.
[48:02] Seth: Right. And it seems like the real win is both combining the natural language and the machine-provable code.
[48:09] Andrey: Yes. Yeah.
[48:10] Seth: Right.
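[For listeners who haven’t seen a proof assistant: here is a toy illustration—our sketch, not an excerpt from the paper—of what a “machine-checkable” proof looks like in Lean 4. Once a statement is formalized, the compiler verifies the proof mechanically, which is the self-checking loop Seth is describing:]

```lean
-- A toy Lean 4 proof. The statement is encoded as a type, and the file
-- compiles only if the proof term actually establishes that statement.
-- If an LLM emitted this and `lean` exited cleanly, no human referee is needed.
theorem add_comm_toy (a b : Nat) : a + b = b + a :=
  Nat.add_comm a b

-- Proofs can also be found by tactics; `decide` mechanically checks a
-- decidable proposition, here a concrete arithmetic fact.
example : 2 + 2 = 4 := by decide
```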
[48:11] Andrey: But my vision for all these things is just, of course, that you have AIs calling tools that are other AIs, right? I am very much not in the camp of “one AI to rule them all end-to-end without tools.” Like, some people have that vision, but I don’t... you know, just like a human uses tools, I don’t see why an AI wouldn’t use tools. Which might be other AIs, like a human would have research assistants.
[48:38] Seth: I guess the only thing I would jump in here with is... right, one thing I’m always on the lookout for now as we read these papers is, you know, the Bitter Lesson update. So to what extent does the generalist AI that’s bigger beat the specialist efforts? To what extent is task-specific prompting and scaffolding important versus “just use a better model”? And I think in each of these examples we really do see task-specific scaffolding being important—prompting iteratively and, you know, in a special way. Now of course this is all in the context of a single model, so we can’t really speak to how it compares to these other approaches, but something to keep our eyes open for.
[49:21] Andrey: Yep.
[49:22] Seth: Um, okay. Here’s an example that I thought was funny because it was clearly written up by an AI. There was a physics example where they asked the AI to derive known but unpublished results about black hole symmetries. One of the quotes I pulled out is: “After about five minutes of internal reasoning, the model incorrectly reported that the equation had no continuous symmetries beyond trivial scalings.” Then they prompt the model again, this time giving it a warm-up problem, and with that warm-up the AI is able to solve the full problem.
[49:59] Seth (continuing): This is the part that made me think it was definitely written up by an AI. In the implications section, it felt really AI-ish and here was one of the quotes I pulled out: “AI as symmetry engine. With minimal domain scaffolding, current models can carry out non-trivial Lie symmetry discovery for PDEs”—partial differential equations—”with non-constant coefficients.” Okay. Dude, that was an AI sentence. “AI as symmetry engine.” What kind of metaphor is that? That’s an AI metaphor, dude.
[50:29] Andrey: Yeah, I mean... I think one of the things that’s going on in the background that we should say is that scientists using AI to write is just now ubiquitous, right? There was a huge controversy at ICLR, one of the top CS conferences, where just an enormous share of referee reports for papers were written by AI. In fact there’s a tool, Pangram, that has shown very high accuracy at detection of AI writing, and it was used to measure these reviews and just so many of them were written by AIs. So many of the papers are written by AIs.
[51:15] Andrey (continuing): So I just think this has to... this is just the new normal, right? Like... and we shouldn’t be surprised. A lot of scientists... English is not their first language. Even for those who it is a first language, you know, writing is a specialized skill that most people, most scientists, are not very good at. And it’s a lot easier to have an AI write a draft and you tweak it than to write something from scratch. It’s not obvious to me how important it is that the human does the writing. I guess I like to do writing because writing is thinking, it’s a way that I think through problems. But for a lot of things, I don’t know, let’s say like form letters and things like that, like why would I waste my time honing my language when I could just have the AI do it? So I’ll just say like this is a new normal and the viewpoint that we’re mostly writing for the AIs is also true.
[52:16] Seth: Do you want to spell that out for people who might not have heard that phrase before?
[52:21] Andrey: Yeah. So I first heard it from Tyler Cowen.
[52:24] Seth: Andrey’s favorite economist. Friend of the show.
[52:30] Andrey: If you say that, he’s more likely to retweet you.
[52:33] Seth: [Laughs] Yeah, yeah, yeah.
[52:36] Andrey: “Friend” is, you know, a loose term, but you know, we have had dinner with Tyler and that was a great honor. But yeah, I guess the AIs are sucking in all the writing in the world for their training. You know, they’re also able to search through content very effectively and will be reading that content as part of forming their answer. And that’s just happening all the time. It’s happening much more than humans reading some very niche bit of content like one of our papers, right? And so then you might think that since your primary audience with a lot of writing is the AI, you might want to quote-unquote “write for the AI.” That might mean that you don’t have to write as carefully... or not as carefully, but you might... you know, some of the things to entertain humans might be less important.
[53:38] Seth: Poetic function of language.
[53:39] Andrey: Yes. Less important for the AIs. And so you get writing like this quote-unquote “symmetry engine,” right?
[53:50] Seth: [Laughs] Yes. Like... I don’t know. Okay, maybe. I think language will lose something if metaphors stop being helpful. I think you’ll just stop dropping metaphors, right? We’ll just get to purely functional language, right? Because a bad metaphor is worse than no metaphor.
[54:06] Andrey: Yeah, yeah. I mean, I guess I guess we’re gonna see very clearly... like much more clearly delineated communication for humans versus communication for AIs. That... I mean we’re almost kind of there. I mean papers... if you think about like how much effort most scientists put into writing papers vs. how bad the writing is in most scientific papers... why are we even pretending, you know?
[54:35] Seth: Yeah. Anyway, well, very interesting to watch. Um, I had one more example I wanted to pull out, which was the biology example, which I was really excited to read given that so many of these were very math-heavy. In this example, the writers of the anecdote uploaded an experimental figure showing the impact of giving some white blood cells a glucose substitute. Right? So the idea is maybe the white blood cells will do differently if they have glucose versus not glucose, and maybe you could like get them to do something that would cure cancer if you give them more or less glucose.
[55:12] Seth (continuing): And one of their results was that they tried both giving it no glucose (or a very low amount of glucose) as well as giving it a treatment which is like a glucose substitute. So there was some goo that was gonna gunk up the glucose receptor so that the cell wouldn’t be able to eat the glucose. GPT-5 seemed to understand the figure, pointed out hypotheses and potential follow-up experiments to understand why the “fake glucose” had a different effect than low glucose.
[55:40] Seth (continuing): It suggested some potential mechanisms why. ChatGPT writes: “A low glucose control partly mimics the effect but is weaker than the fake glucose at equal nominal concentrations, suggesting contributions from glycolysis restriction and N-linked glycosylation interference... a known 2-DG [this is the fake glucose] off-target... rather than energy limitation alone.” Right? So this seems to have been the key contribution of ChatGPT... like the scientists obviously, when they made this result, immediately identified, “Oh, that’s interesting, the fake glucose seems to have a different effect than the zero glucose.” The insight that the AI seemed to have had is this particular mechanism—that there’s an off-target effect of the fake glucose. And it suggested, you know, experiments to follow up—using a different kind of fake glucose, trying some other treatments that would identify whether that was the correct mechanism.
[56:42] Seth (continuing): You know, when I say it that way, it doesn’t seem that impressive, right? Like the scientists were already pretty close to that. The scientists... at least reading them, they seemed more impressed than I was. They write—the authors write—“In retrospect, in particular, the proposed mechanism of reduced IL-2 signaling via interference with N-linked glycosylation made clear biological sense because it could directly explain the disinhibition of the Th17 cell differentiation under 2-DG treatment. However, this mechanistic hypothesis had not occurred to us.”
[57:17] Andrey: Yeah, I mean... I mean once again, it’s a thought partner. You know, if you’re working with people on a problem, you’re gonna have conversations with them and different co-authors are gonna come up with ideas that you hadn’t thought about yet. And you know through iteration, that ultimately creates an artifact which is the research paper. And that’s kind of a series of things like that. And it’s very rarely that there’s kind of one Eureka in this. Or even if there’s like a main insight, you actually have to like take it very seriously to draw out the implications and so on. A lot of... I actually imagine a lot of people had great ideas that ended up eventually being correct science but they just didn’t pursue them, right?
[58:10] Andrey (continuing): So that’s kind of how maybe we should think about this. Is that it’s a thought partner, but it doesn’t yet have agency to pursue the research.
[58:21] Seth: That is so interesting, because I came away with this feeling like this is an example of AI as deep literature search, right? Because it seems the problem was pretty well defined, right? Shouldn’t this have the same effect as that? Do a deep literature search to see if there are any, you know, off-target effects of either treatment. But maybe that’s viewing this too narrowly.
[58:42] Andrey: Yeah. I just... I’m not an expert enough to know whether it made a connection across, you know, literature... Right? Like it knows a lot of things. I don’t know if I’d call that literature review. Just like a scientist would know a lot of things. And then some of the magic happens when it connects two, you know, previously unrelated concepts. I just... to me, saying it’s just literature review seems a bit reductionist. You know...
[59:11] Seth: “It’s just a stochastic parrot, Andrey.” Okay. Are you ready? Do you have any other examples you want to make sure we highlight? Are you ready to move on to our conclusions and posteriors?
[59:25] Andrey: Yeah, let’s move on to the conclusions. Yep.
[59:28] [Music / Transition] — MOVING TO POSTERIORS
[59:35] Seth: Okay. So I think these were pretty impressive. I don’t know if there was any, you know, “dropping my jaw” ones. The Timothy Gowers being like, “This is good enough to be my lazy faculty advisor” is probably the jaw-drop moment, right?
[59:48] Andrey: Yeah. I mean just... I think the credibility of people like him or Terence Tao saying that they find it useful... I think in some sense it’s, you know...
[60:00] Seth: This is an OpenAI release, you know, selling a product that they sell for $200 a month.
[60:09] Andrey: Yeah, but I mean... I mean... sure. I... I just... I don’t know. Like... to me, once again, I’m going back to my priors. Like it’s obviously useful for science. You have to be truly incurious or, you know, a Luddite to think that it’s not.
[60:28] Seth: Fair enough. Well, actually, I have a theory about your crypto friend. Is it just that, like, cutting-edge crypto is not published widely? Is there some sense in which, like, crypto research might not be in the dataset as much?
[60:44] Andrey: I don’t think so. I don’t think so. I think he... I don’t know. I don’t want to put words in his mouth. But if I like...
[60:52] Seth: He’s a Luddite.
[60:53] Andrey: No, no, no. I think if I had to guess, I think he... he kind of views like some deep... deep theoretical insight as maybe the requirement that he has in mind. And that’s... that’s the bar that he has. And...
[61:08] Seth: Yeah, it’s not Einstein. It’s not inventing new paradigms.
[61:11] Andrey: Yes, yes. But I guess... I don’t know. To me, that’s...
[61:17] Seth: I’m not Einstein! I’ll take it!
[61:19] Andrey: Yeah, yeah. Yeah. Exactly.
[61:24] Seth: Um, okay. Uh, and I... I made this point already but I just want to end here, which is... I think my takeaway from here is that some sort of automatic translation between machine-provable code and human-language proofs seems to be the real bottleneck before AI speeds up a lot. Or at least math-specific AI.
[61:48] Andrey: I really don’t think that’s the bottleneck, Seth. I truly don’t. Um.
[61:52] Seth: But it con... we keep on seeing examples of it like it gives the wrong answer and you have to be like, “Well, I thought about this and it’s the wrong answer,” and then it does that five times and then it gives you the right answer. We see like three examples of that here.
[62:05] Andrey: I... I guess like... this is one... I guess “bottleneck” seems like a weird word to me given that there’s a parallel...
[62:14] Seth: Accelerant.
[62:15] Andrey: I’m not... I... okay. There’s a para... there’s essentially parallel efforts to... certain things can be formalized in these Lean provers. And imagining an OpenAI... like a GPT-like model calling the Lean prover is, like, trivial. Like I... I’m not saying it’s trivial, clearly... I don’t...
[62:43] Seth: If it’s trivial, why does it keep on giving us wrong answers?
[62:45] Andrey: Because OpenA... because I actually think that the way this system is designed, it’s kind of using GPT by itself. But actually... my sense is that people in the field who are pushing the envelope are combining these tools. And if you look at DeepMind’s tools, they don’t work like this. They are using the formal provers. And so to call it a bottleneck implies that, like, “Oh, actually no one has this working yet.” And I... I bet that some people have this working. I’m not sure whether everything can be formalized in these specialized proving languages in the same way. But yeah.
[63:34] Seth: It’s a limitation in these examples, but you’re saying it’s not a limitation, you know, tomorrow if you wanted to use the cutting-edge tool.
[63:41] Andrey: Yes, yeah. That... that’s... that’s my sense. But you know, if listeners disagree, you know, feel free to let us know. Yeah.
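[The “GPT-like model calling the Lean prover” setup Andrey describes can be sketched in a few lines. A minimal, hypothetical sketch—not OpenAI’s or DeepMind’s actual system: `propose_proof` stands in for whatever LLM API you use, and it assumes a `lean` binary on your PATH:]

```python
import subprocess
import tempfile

def lean_accepts(source: str) -> tuple[bool, str]:
    """Write a candidate Lean file and ask the Lean compiler to check it.
    Exit code 0 means every statement in the file type-checked."""
    with tempfile.NamedTemporaryFile("w", suffix=".lean", delete=False) as f:
        f.write(source)
        path = f.name
    result = subprocess.run(["lean", path], capture_output=True, text=True)
    return result.returncode == 0, result.stdout + result.stderr

def prove(statement: str, propose_proof, max_attempts: int = 5):
    """Propose-and-verify loop: the LLM drafts a proof, Lean referees it,
    and the compiler's error log is fed back for the next attempt.
    `propose_proof(statement, feedback)` is a hypothetical LLM call."""
    feedback = ""
    for _ in range(max_attempts):
        candidate = propose_proof(statement, feedback)
        ok, log = lean_accepts(candidate)
        if ok:
            return candidate  # machine-checked; no human referee needed
        feedback = log  # let the model see exactly why Lean rejected it
    return None
```

[The design point matches the episode: the model is free to guess, because the verifier, not the human, catches the wrong answers.]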
[63:48] Seth: Yeah, please call in. Okay. Um. Posteriors? Or any other limitation comments you want to make?
[63:55] Andrey: No. I... yeah. I mean I...
[63:57] Seth: Posteriors. Yeah.
[63:58] Andrey: Yeah. I mean I... I don’t know. Like our... our priors were very loose so I don’t know the posteriors. I mean I think... yeah. I mean I... you know, I stand by what I say here. I found these examples quite interesting. And it was uh...
[64:14] Seth: Okay. So paradigm-wise, you’re still in the same place? That you think it’ll be co-working with it today and co-working with it in five years?
[64:21] Andrey: Yep.
[64:22] Seth: I said right now it’s super powerful for lit reviews—deep literature reviews—and um, maybe we’re... you know, in five years we will be all the way to AI on its own, at least for math problems. I come away from reading this thinking we’re closer to AI on its own for frontier math research than before reading this. Uh, it really does... and again, whether I call what I said a bottleneck or say that it’s already been removed... I mean, it seems like what we see described here, plus the AI being able to iteratively check itself and just redo the math—try another approach if it disproves itself—seems like you should be able to just let that fly and find a bunch of cool stuff.
[65:13] Andrey: Yeah. And if... if you... if you look at prediction... you know, various forecasts, we see forecasts that by 2030 the Millennium Problems will be solved with AI. So... uh, that’s not a very un...
[65:28] Seth: AI is gonna solve the Riemann Hypothesis? That’s more of a question about the Riemann Hypothesis than AI.
[65:32] Andrey: Well, you know. People who are experts, a decent chunk of them forecast that this will happen. So, yeah.
[65:40] Seth: Okay. And how impressed were we by the most impressive result? I said we were gonna... I was gonna be like 7 out of 10 impressed, 8 out of 10 impressed. I think that’s kind of where I end up. If not like a little bit below that. Um, in the sense that I’m not saying that these mathematical results aren’t super impressive, but I was hoping for like, “And we discovered something that was like a treatment we can use tomorrow,” or “We discovered...” I was hoping for something that was kind of more directly practical from at least one of these examples.
[66:13] Andrey: Yeah. I mean, to me, if there was something that was very practical, that would be like a 9 out of 10 or 10 out of 10. And you know. Uh, but I... yeah. Once again, I think like nothing blew my mind, but it all seems like we’re... we’re... we’re on the path to this being a very transformative technology for science. Yeah.
[66:36] Seth: Yeah. Super, super excited to talk to Ben Golub about the AI research tool that he’s working on. Um, and uh, listeners at home, let us know: How do you use AI in your science or in your life? Post it in the comments, share, comment, and subscribe. All right.
[66:56] Andrey: Well, until next time. Keep your posteriors justified.
[67:00] [Music fades out]