
Description

In this episode, we brought on our friend Bo Cowgill to dissect his forthcoming Management Science paper, “Does AI Cheapen Talk?” The core question is one economists have been circling since Spence drew a line on the blackboard: what happens when a technology makes costly signals cheap? If GenAI allows anyone to produce polished pitches, résumés, and cover letters, what happens to screening, hiring, and the entire communication equilibrium?

Bo’s answer: it depends. Under some conditions, GenAI induces an epistemic apocalypse, flattening signals and confusing recruiters. In others, it reveals skill even more sharply, giving high types superpowers. The episode walks through the theory, the experiment, and the implications.

Transcript:

Seth: Welcome to the Justified Posteriors Podcast, the podcast that updates its priors about the economics of AI and technology. I’m Seth Benzell, certifying my humanity with takes so implausible that no softmax could ever select them, at Chapman University in sunny Southern California.

Andrey: And I am Andrey Fradkin, collecting my friends in all sorts of digital media formats, coming to you from San Francisco, California. Today we’re very excited to have Bo Cowgill with us. Bo is a friend of the show and a listener of the show, so it’s a real treat to have him. He is an assistant professor at Columbia Business School and has done really important research on hiring, on prediction markets, and now on AI and the intersection of those topics. And he’s also won some very cool prizes. I’ll mention that he was on the list of the best 40 business school professors. So he is one of those professors that’s really captivating for his students. So yeah. Welcome, Bo.

Bo Cowgill: Thank you so much. It’s awesome to be here. Thanks so much for having me on the podcast.

Seth: What do you value about the podcast? That’s something I’ve been trying to figure out because I just do the podcast for me. I’m just having a lot of fun here with Andrey. Anything I can do to get this guy’s attention to talk about interesting stuff for 10 minutes? Why do you like the podcast? What can we do to make this an even better podcast for assistant professors at Columbia?

Bo Cowgill: Well, I don’t wanna speak for all assistant professors at Columbia, but one thing it does well is aggregate papers about AI that are coming out from around the ecosystem and random places. I think it’s hard for anybody to catch all of these, so you guys do a great job. I did learn about new papers from the podcast sometimes.

Another cool thing I think is there is some continuity across podcast episodes about themes and arbitrage between different topics and across even different disciplines and domains. So I think this is another thing you don’t get necessarily just kind of thumbing around papers yourself.

Seth: So flattering. So now I can ask you a follow-up question, which is: obviously you’re enjoying our communication to you. A podcast is kind of a one-dimensional communication. Now we’ve got the interview going, we’ve got this back and forth. How would you think about the experience of the podcast changing if a really, really, really good AI that had read all of my papers and all of Andrey’s papers went and did the same podcast, same topics? How would that experience change for you? Would it have as much informative content? Would it have as much experiential value? How do you think about that?

Bo Cowgill: Well, first of all, I do enjoy y’all’s banter back and forth. I don’t know how well an AI would do that. Maybe it would do a perfectly good job with that. I do enjoy the fact that—this is personal to me—but we know a lot of the same people. And in addition to other guests and other paper references, I like to follow some of the inside jokes and whatnot. I don’t know if that’s all that big of a deal for the average person. But I have listened to at least the latest version of NotebookLM and its ability to do a quote-unquote “deep dive podcast” on anything. And at least recently I’ve been pleased with those. I don’t know if you’ve ever tried putting in like a bad paper in theirs, and then it will of course just say, “Oh, this is the greatest paper. It’s so interesting.”

Seth: Right.

Bo Cowgill: You can.

Seth: So that’s a little bit different, maybe slightly different than our approach.

Bo Cowgill: Well, yeah, for sure. Although you can also tell NotebookLM to try to find problems and be a little bit more critical. And that I think works well too. But yeah, I don’t think we should try to replace you guys with robots just yet.

Seth: We’re very highly compensated though. The opportunity cost of Andrey’s time, he could be climbing a mountain right now. Andrey, you take it up. Why are we doing this ourselves? Why isn’t an LLM doing this communication for us?

Andrey: Well, mostly it’s because we have fun doing it, and so if the LLM was doing it, then we wouldn’t be having the fun.

Seth: There you go. Well put. Experiential value of the act itself. Now, Bo, I did not bring up this question randomly. The reason I raised this question of how does AI modify communication... yeah, I used a softmax process, so it was not random. The reason I’m asking this question about how AI changes communication is because you have some recently accepted, forthcoming work at Management Science trying to bring some theory and empirics to the question of how LLMs change human communication, but now in the context of resumes and job search and job pitches. Do you want to briefly introduce the paper “Does AI Cheapen Talk?” and tell us about your co-authors?

Bo Cowgill: Yeah, most definitely. So the paper is called “Does AI Cheapen Talk?”. It is with Natalia Berg-Wright, also at Columbia Business School, and with Pablo Hernandez Lagos, who is a professor at Yeshiva University. And what we’re looking at in this paper is the way people screen job candidates or screen entrepreneurs or, more abstractly, how they kind of screen generally. You could apply our model, I think, to lots of different things.

But the core idea behind it kind of goes back to these models from Spence in the 1970s saying that costly signals are more valuable to try to separate types.

Seth: Right. If I wanna become a full member of the tribe, I have to go kill a lion. Why is it important for me to kill a lion? It’s not important. The important part is I do a hard thing.

Bo Cowgill: Exactly. Yeah. So maybe part of the key to this Spence idea that appears in our paper too is that it’s not just that the signal has to be costly, it has to be kind of differentially costly for different types of people. So maybe in your tribe, killing a lion is easy for tough guys like you, but for wimpier people or something, it’s prohibitively high. And so it’s like a test of your underlying cost parameter for killing lions or for being tough in general. So they go and do this. And I guess what you’re alluding to, which appears in a lot of cases, is the actual value of killing the lion is kind of irrelevant. It was just a test.

And maybe one of the more potentially depressing implications of that is the idea that what we send our students to do in four-year degrees or even degrees like ours is really just as valuable as killing a lion, which is to say, you’re mainly revealing something about your own costs and your own type and your own skills, and the actual work doesn’t generate all that much value.

Seth: Is education training or screening?

Bo Cowgill: Right, right, right. Yes. I do think a good amount of it these days is probably screening, and maybe that’s especially true at the MBA level.

Andrey: I would just say that, given the rate of hiring for MBAs, I’m not sure that the screening is really happening either. Maybe the screening is happening to get in.

Bo Cowgill: So the screening function now is like, can you get in? That’s the ultimate thing.

Seth: Right. And I think as you already suggest, the way this works can flip if there’s a change in opportunity costs, right? So maybe in the past, “Oh, I’m the high type. I go to college.” In the present, “I’m the high type. I’m gonna skip college, I’m gonna be an entrepreneur,” and now going to college is a low signal.

Bo Cowgill: Yes. Exactly. So that’s kind of what’s going on in our model too. How are we applying this to job screening and AI? Well, you apply for a job, you have a resume, possibly a cover letter or, if you don’t have an old-fashioned cover letter, you probably have a pitch to a recruiter or to your friend who works at the company. And there are kind of elements of costly signaling in those pitches. So some people could have really smart-sounding pitches that use the right jargon and are kind of up to speed with regards to the latest developments in the industry or in the underlying technology or whatever. And those could actually be really useful signals because the only sort of person who would be up to speed is the one who finds it easy to follow all this information.

Seth: Can I pause you for a second? Back before LLMs, when I was in high school, they helped me make a CV or a resume. It’s not like there was ever any monitoring that people had to write their own cover letters.

Bo Cowgill: That’s really true. No, some people have said about our paper that this is a more general model of signal dilution, which was happening before AI and the internet and everything. And so one example of this might be SAT tutoring or other forms of help for high school students, like writing your resume for you. Where if something comes along—and this is where GenAI is gonna come in—but if anything comes along that makes it cheaper to produce signals that were once more expensive, at least for some groups, then that changes the informational content of the signal.

Seth: If the tribe gets guns, it’s too easy to kill a lion.

Bo Cowgill: Yeah. Then it just is too easy to kill the lions. But similar things I think have happened in the post-COVID era around the SATs. Maybe it’s become too easy, or so the theory goes, to get a good score, where it doesn’t really separate out who is actually a smart person. Maybe it’s getting diluted with who can afford these prep classes and things like that. But I don’t wanna stray too far from GenAI just yet.

You know, I think what people have seen a lot of, either on social media or in the mainstream, is that the signal in a job application seems like it may have gone down, because you used to be able to tell based on these pitches who was qualified or not. And even without lying, you could write a much better pitch that would make you sound more knowledgeable, without misrepresenting what your underlying experience is. And so it’s really, I think, not just job applications. That is of course the setting that we study, that and entrepreneurship. But I think there are similar things about how grading at schools has gone bad. You used to be able to quickly tell from an assignment who knew the material and who did not. But now ChatGPT is gonna really interfere with that.

Anyway, so with this as background, we then try to study theoretically and empirically what’s going on with the use of ChatGPT in these sort of costly signaling settings.

Andrey: Yeah. And so how do you go about doing this? Because it does seem like it’ll be pretty hard to study this in the wild. I know of a few papers from some of our friends that have done this. How did you approach this?

Bo Cowgill: So the first thing we wanted to do was kind of motivate the question a little bit more theoretically. So probably at least the first half or so of the paper, we create this model that has what I hope is a tractable punchline, which is that it’s actually not inevitable that GenAI would create this epistemic...

Seth: Wait, a tractable punchline? Wasn’t the punchline that anything goes? What’s the punchline?

Bo Cowgill: Well, I am glad that we brought up the “anything goes” theory models, which is another kind of theme of your podcast and critique of previous papers. So it is true that our model basically says that, depending on a particular parameter, you could get either an epistemic apocalypse or a situation where the use of GenAI actually improves the accuracy of screening. And it’s like, you get better information, you actually want your job candidates to use it. You want to say, “Please use GenAI. We actually will know better. Don’t send your pitch in without using GenAI first.”

So it’s true, anything goes. And my defense of that is we really focus the reader on this particular parameter that you could measure empirically.

Seth: Are there other parameters that theoretically could affect this, though?

Bo Cowgill: Not that we’re talking about in this paper. No.

Seth: Not in this paper. All right.

Bo Cowgill: If you have some in mind, I’m curious.

Seth: Well, let’s come back. So I have some thoughts at the end about interpreting the results, so we’ll come back to that. You can just keep on walking us through what you did.

Andrey: I guess I wanted to say there’s an approach in economics, a sufficient statistics approach, right? Where you write down a model where there is a particular parameter that, depending on how big it is or what sign it is, that tells you something about what is the right policy or what is the mechanism that quote-unquote “dominates” a particular setting. And so I view what you guys were doing very much in that vein.

Seth: Right. A ceteris paribus sort of analysis. Yeah.

Bo Cowgill: That’s true. So what are we focusing on? What is the key linchpin of this model? It’s a covariance term across the population. So let me try to break this down.

The two terms in the covariance are, first of all, how much human capital do you have? Or are you like a talented person who knows a lot about what you’re doing, you have a lot of expertise or not? And we’re sort of assuming that the employers are trying to screen for that. Why are they screening for it? Well, in an actual job, you could be in a situation where you don’t have to use GenAI, or you can’t use it and you have to just use whatever knowledge is between your ears. So this one term is your kind of level of talent for the job without AI assistance. And then the other term is how much of a boost does your cover letter get from using ChatGPT to sex it up and to make you sound like you know all the smartest, most contemporaneous jargon?

So these two things could be positively... it could have positive covariance, they could have negative covariance, they could have basically no covariance. But the intuition is, if you have a positive covariance, then the most talented people are getting the largest bump from using GenAI. And the negative covariance would be if the really talented people don’t really get that much of a cover letter improvement, maybe because it’s already so good that there’s nowhere else to go, and that most of the benefit comes from improving the low types’ quality of their cover letter. So this is the linchpin parameter in the model, and what we try to take to data after this.

But just to finish up what’s going on in the theory: well, you get totally different screening results depending on what that parameter is. In the case I think people are most expecting, you have this negative covariance where most of the benefit comes from boosting low types and helping them masquerade as high types. And in this negative covariance world, there’s not really that much benefit to high types from using GenAI ‘cause their cover letter or their application or whatever, it’s just already so good. So insofar as this is happening, we want to quantify that empirically. But there’s also this possibility that GenAI puts the high types... it gives them superpowers and they can do even more amazing stuff.
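To make the covariance intuition concrete, here is a minimal simulation sketch in Python. It is not the paper’s model: the bivariate-normal draws, the noise term, the median-threshold screening rule, and the helper name screening_accuracy are all illustrative assumptions. The point is just that when the GenAI boost covaries negatively with underlying talent, a screener who ranks pitches learns less about who the high types are, and when it covaries positively, the screener learns more.

```python
# Minimal illustrative sketch, not the paper's model or data.
import numpy as np

rng = np.random.default_rng(0)
n = 100_000


def screening_accuracy(rho):
    """Share of candidates correctly classified when talent and the
    GenAI boost have correlation rho across the population."""
    cov = [[1.0, rho], [rho, 1.0]]
    talent, boost = rng.multivariate_normal([0.0, 0.0], cov, size=n).T
    high_type = talent > 0                      # the trait the screener wants to detect
    pitch_no_ai = talent + rng.normal(0, 1, n)  # noisy pitch quality without GenAI
    pitch_ai = pitch_no_ai + boost              # everyone adds their personal GenAI boost
    # Screener guesses "high type" whenever a pitch is above the median pitch.
    acc_no_ai = np.mean((pitch_no_ai > np.median(pitch_no_ai)) == high_type)
    acc_ai = np.mean((pitch_ai > np.median(pitch_ai)) == high_type)
    return acc_no_ai, acc_ai


for rho in (-0.8, 0.0, 0.8):
    no_ai, ai = screening_accuracy(rho)
    print(f"covariance {rho:+.1f}: accuracy without GenAI {no_ai:.3f}, with GenAI {ai:.3f}")
```

Under these made-up parameters, screening accuracy falls once everyone has the tool when the covariance is strongly negative and rises when it is strongly positive, which is the “it depends” punchline of the theory.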

Seth: Right. Can I jump in here? I don’t think you have to interpret it as superpowers, right? If we’re thinking about communication generally, you might imagine that high types have the higher opportunity costs of their time, right? And so there’s some sense in which automating an hour of high-type time is worth more than automating an hour of low-type time. I guess to really understand how this plays out, I’d have to think about how many distinct versions of this the high type is sending out to prospective employers, right?

Andrey: And I guess maybe I’ll add on to that. It depends on what we’re screening for. You’ll get to this in your experiment, but like if the high type has verifiable high-type traits, which is oftentimes the case, assuming they’re not lying on their resume, right? Then what does something like a cover letter reveal? It’s some sort of effort. Right? And so the question... in my mind, cover letters are oftentimes screening for effort, which seems very... take the time to customize a cover letter for this particular job.

Seth: The effort is cheaper for poor people.

Andrey: So it’s kind of a little bit of a different interpretation than skill per se, because skill... I think it’s unlikely that cover letters signify skill in many domains. Certainly in hiring, cover letters are essentially not read.

Seth: Essentially ignored. I mean, unless they say, “talk to my co-author, blah, who you know,” unless there’s like, “do this thing to learn about me” information in it. Right.

Bo Cowgill: Yeah. Interesting. There’s like a number of things to follow up on there. I do think that there have been big things missed in the study of hiring generally from trying to generalize from academic hiring to other things.

Andrey: Yeah.

Bo Cowgill: I’m not even sure I agree that cover letters are not read either in economics or at least in adjacent places like business and policy schools. And the fact that you think that is probably just a reflection of you guys going to such fine universities that you assume everyone would take the job if you were... I don’t want to pick on any one university.

Seth: Directional state.

Bo Cowgill: Yes, exactly. If you were from University of Southwest Kentucky, which is where I grew up, so I’ll pick on it, it could be very worthwhile to signal that you’re actually interested.

Seth: But again, perfect. But then we’re not signaling skill. You’re signaling match or you’re signaling effort. Right.

Andrey: So it’s a question of what... this correlation really depends on what signal is being sent, I think.

Bo Cowgill: Sure, that’s true. But this particular conversation I think has gone off in the direction of cover letters, but candidates also use GenAI to fill in, for example, the bullet points of what they did in a particular job.

Andrey: Yeah. Yeah, yeah.

Bo Cowgill: Where there’s an enormous amount of leeway for describing your job as a super high-impact thing that required you to be an agentic leader or something else. And this is a case that’s not cover letters, but is part of your pitch, where it could actually signal different underlying skills.

So there are lots of ways I think, to apply these ideas in different settings. And it’s true that there’s probably some follow-on work that would be useful, and we can talk about some follow-on work that other people are doing and that I and my co-authors are thinking about doing too.

Seth: Don’t solve it all in one paper. So tell us. So that’s the theory.

Andrey: How dare you not solve it in one paper.

Bo Cowgill: Yeah, yeah, yeah. So you could get these opposite sorts of things. You know, some people think, “What are you talking about? How could there be positive covariance? That’s ridiculous.” I have some examples in mind. In the paper, we talk about AI art. So I’m not an artist and I don’t think you guys are either, but if I made art with DALL-E, I think I’d be a little bit better. But there’s some evidence and some anecdotes and even some small studies that say like, if you actually know how to describe art as a trained artist would, then you can use these AI art generation programs to make way cooler art. And so like if you were screening an artist, you would want them to use GenAI because then you would be able to see the big differences. And even just some screenshots from these demonstrations I think would show how much better the actually trained artists would be, or the high type would be, once they use GenAI.

Now another example of this to me is using AI for math. Now maybe it’s just gotten so good that it can just solve whatever, but I think if you gave a difficult economic theory theorem to prove to a total novice, somebody who hasn’t done a PhD, or a high school kid or a middle schooler or something, they might not make very much progress. But if you gave it to someone who had trained, or had some intuition for what the solution is, then I think it would be more powerful, and you’d actually end up with a result you could do something with. But it’s true, our model isn’t just “anything goes”: it kind of focuses on this covariance parameter as the thing to pay attention to.

Andrey: It could be positive. So oftentimes, if you’re doing an interview process, there is like a take-home component, like for a data science job that might be a take-home analysis and a dataset and a report, right? In some sense, you can make it... the ceiling for this assignment is very, very high. Right?

Bo Cowgill: Yeah.

Andrey: And someone who actually knows what they’re doing would be able to do a much, much better job. Like there’s a sense that the GenAI tools might raise the bottom of the distribution, but if you want to get close to the max, the people who really know what they’re doing might actually benefit a lot more from the tools.

Bo Cowgill: That’s true. That’s right. Yeah. Well, something your comments, Andrey, make me think about is just the idea of a max. And one reason I think that we’ve seen a lot of negative covariance applications is that the underlying test has been designed with a maximum that too many people are actually close to. And if the test had more headroom to go arbitrarily good, that change alone might make it more possible that GenAI can actually help find the truly talented people, as opposed to letting the people who didn’t do their homework masquerade.

Seth: No, I was just gonna jump in. I wanna propose a hypothesis for why negative correlations might be common, generally. So you might imagine... rather, not generally, in experimental settings, in experimentally relevant settings. Why do I say that? Imagine if your quality as a worker is both a function of the stuff that can be automated by GenAI and stuff that can’t be automated by GenAI, right? So I’m a worker. I have to do both of these tasks, but maybe I’m gonna delegate some of the automatable-by-GenAI tasks.

If we’re all applying for a job which is kind of at the same sort of productivity threshold, and we’re all kind of assortatively matching to like, we’re applying... we’re not applying to the corner bodega and we’re not applying to Google. We’re all applying for this mediocre firm. For us to have the appropriate skill, total productivity for a mediocre firm, I have to kind of be good at one thing and bad at another. So these like productivity isoquants of given workers will imply a negative correlation between skill in the automatable thing and skill in the non-automatable thing.

Bo Cowgill: Uh.

Seth: So it doesn’t surprise me that if you get a population which is pretty homogeneous in terms of like total productivity, that’s going to entail a negative correlation in the automatable versus non-automatable skill. So that’s why I think this is gonna be common.

Bo Cowgill: Okay. Interesting. I’m curious, I think one of the places where you see negative covariance the most seems to be in the classroom. I guess how does this isoquant idea apply there? Or is it just like, because it’s education and not an actual job that it doesn’t really apply?

Andrey: Well, my thought process would be there is like a lot of assortative matching between programs and students, right? So...

Bo Cowgill: Ah, I see. Yeah. Okay. Okay. Perfect. Yeah.

Seth: But I wanna complete my idea. So to complete my idea, actually I’ve realized that I’m pointing in the wrong direction, right? For the AI to boost the overall lower total productivity person more, what it needs to do, in terms of the job application, is boost them disproportionately at writing job applications, right? This is your notion of how correlated your actual skill is with your ability to write the resume with and without the GenAI. Right. And I think in the general population, it’s probably the case that your ability overall and your ability with AI are positively correlated, in which case this would be a noisy signal that would mess you up. But if we had a narrow enough band of quality coming in, it would go the other way. So maybe there needs to be a level of screening before the screening. But we haven’t even let you get to the results yet. We’re still in theory.

Bo Cowgill: No, no, no. I think it’s great, as part of the podcast genre, to have some tangents here and there. So in the empirical part of our paper, we’re just trying to measure like how much actual information loss is there? And is it possible that for certain subgroups you actually get information gain? And also, what is this covariance? Is it kind of more positive or negative?

And the key to understanding our experiment is that we actually know something about all the subjects in it and what their “high” versus “low” type is before they even enter the experiment. So I’ll tell you a little bit more about the setting. We are looking at job seekers on Prolific who are in the market for either a data science job or a consulting type of...

Andrey: So Bo, just to clarify, ‘cause I do think this might be unclear to listeners. These people are not actually looking for a job. You are recruiting them into an incentivized survey of some sort, right?

Bo Cowgill: That’s true. They do have experience in these respective domains. And so, insofar as this is an incentivized experiment, we have recruited subjects with domain-appropriate knowledge, at least in some cases.

Seth: Can you explain what... do you look at their CVs, or is this something Prolific tells you, that they’re experts versus non-experts?

Bo Cowgill: Yeah, Prolific screens them beforehand. And so they’re a little bit unclear about how exactly they screen these people.

Seth: Unclear about what makes someone an expert.

Bo Cowgill: Fair enough.

Andrey: So to be clear, my interpretation is that no one in this paper is an expert. There would be no way any expert in data science would...

Seth: ...for $12 an hour.

Andrey: ...in this sample.

Bo Cowgill: Sure. Well, you sound like one of our referees.

Andrey: Not... I, just to be clear, I am definitely not your referee.

Bo Cowgill: Okay. Yeah. I think the underlying theory doesn’t require anyone be like, elite at any of these things. There just has to be variation within the population about who has relatively higher or lower human capital and that this be...

Seth: Bo, can I pause you for a second there? ‘Cause one of the main outcomes is gonna be whether people’s predictions of whether someone is an expert move closer to 50/50 or not. Right? But presumably, if the signal is getting less informative, you should move to the population average of experts versus non-experts, not 50/50.

Bo Cowgill: Well, the experiment was set up such that the population average was 50/50.

Seth: You tell... well, so you have a measure of whether these people count as experts, right? And in your sample, approximately 50% are experts and 50% are non-experts. As a person reviewing these, have you told me that 50% are experts according to your classification?

Bo Cowgill: Yes. Now, interestingly, their actual beliefs... they don’t seem to totally believe that because on average they think about 45% are experts. And interestingly, they think that about 45% are experts both in the GenAI and the non-GenAI condition. So it’s possible that they would’ve just totally updated their beliefs based on all these amazing cover letters and pitches and little resumes in the experiment and said, “Oh, these people must all be really good.”

Seth: But what actually happened? Okay, but you tell us the treatment. Yeah.

Andrey: So I think, to be helpful to the listeners, the experimental...

Seth: Why do that?

Andrey: ...unit of randomization, the treatment, et cetera.

Bo Cowgill: Yeah. So in our experiment, we recruit people with job experience in the various domains. And we ask them to make a pitch for both a job that they’re qualified for, based on what Prolific knows about them, and a job that they are not qualified for. So everyone either has domain expertise or prior experience in some sort of data science or some sort of management consulting type of job. So basically everyone is asked to masquerade a little bit, to be as qualified as possible for a job that they really didn’t have any prior experience in.

And so they write these pitches and then they’re asked to use ChatGPT to edit them to try to make them essentially more convincing. So this is the sender side of the experiment. And then on the receiver side, we get basically people with hiring experience or recruiters to then evaluate these different ones and try to label who are the people that have actual expertise and who are the ones who don’t. It’s essentially like asking, “Who would you wanna hire?” And the recruiters get to know who was using GenAI or not.

Seth: Be very... this seems to be a very important distinction here, so be very clear. They’re told who uses it or who has access to it?

Bo Cowgill: They’re told who has access to it. And our goal there is we’re trying to think about the long-run implications of GenAI on signal dilution. And I think we’ve arguably already reached a world where, if you read a cover letter or you read a resume, it’s probability one that they had access to GenAI.

Seth: Not just probability one... It’s a major insight that you just got.

Bo Cowgill: Right.

Andrey: Certainly.

Bo Cowgill: Exactly. But the experiment I don’t think is good... it doesn’t capture, say, the 2024 era very well.

Seth: Remind us when. When is this happening? When are you doing this study?

Bo Cowgill: This happens in 2023. And I think that there’s an intermediate period where there’s some uncertainty about whether this person had access or not. But the long-run implications, between the pre-GenAI world and the post-GenAI world, these are the more interesting ones, I think, to my co-authors and me.

Seth: The correct treatment. Yes. I totally agree that it makes sense that the treatment is “these people got access to AI” rather than “they used AI for exactly this sentence,” because that’s the more empirically relevant case. Yeah.

Bo Cowgill: Right. Yeah. It’s also possible that the control group could have used GenAI as well. And so we asked them just to make sure, but basically almost none of them did. And we removed the instances where...

Andrey: So I had a very... a positive, you know, a constructive comment for you, which is that you could...

Seth: Oh s**t. This is gonna be devastating.

Andrey: No, no. It’s actually constructive. You could just use one of these AI writing detectors, the good one from Alex Imas’s paper, to see whether they actually use the GenAI or not.

Bo Cowgill: Yeah, no, this is a good idea. This is a good idea. Well, if it hadn’t already been accepted, I think that would definitely be worth checking out.

Seth: And one detail you skipped is that people who use the GenAI, their CVs get way better according to GenAI.

Bo Cowgill: That’s true. That’s true. Yeah. So basically when we have these recruiters assess, they assess several things. One is just, do they think that the pitch is generally higher quality? Or does it seem like it required more effort to produce? Or does it sound kind of polished, like the person knows what they’re talking about?

Seth: Wait, what’s the exact prompt? No, I actually am very curious. Which of those versions is what you ask?

Bo Cowgill: It is, “What’s the quality of the pitch?”

Seth: Quality, right? Because it’d be very interesting if you got a different result for “How much effort do you think you put in?”

Bo Cowgill: No, that’s our theoretical interpretation.

Seth: Fair enough. But hey, why not ask?

Bo Cowgill: True. Yeah. I think it was important that we didn’t ask them how convincing it was, because that’s actually a separate question, which opens up the idea that, “Yes, this is a higher quality pitch, but because we know it’s now become suddenly super cheap to make a pitch like this, we’re actually not very convinced by it.” So this is the other main outcome variable: “Who do you think is actually an expert?” or “How convinced are you?”

And on average, we see information loss from the conditions where the candidate was able to access GenAI. And so this is about a 4% to 9% information loss, or a 4% to 9% decrease in accuracy.

Seth: Oh, can I pause you for a second? ‘Cause there’s two measures we’re gonna use for how accurate these screeners are. The first one we talked about just now, which is how close are you to just 50/50 as to whether this person is an expert. So obviously you have zero information if you say that they’re a 50/50 expert, but if you were 100% one way or zero, you’d be confident. And then the second thing you get at, right, is this error measure, which is the difference between your stated probability and whether the person’s actually an expert or not, which is a 1/0 binary. And then people can kind of continuously say, “I think this guy’s an 80% expert,” or “I think this guy’s a 20% expert.” And specifically when you say that information transmission went down, which of those measures are you talking about, or both?

Bo Cowgill: Uh, both. The 4% to 9% represents... one of them is using one of these outcomes and the other one is using the other one. And so basically we’re trying to say, you could use a variety... either of these ways to measure accuracy and you qualitatively get the same thing.
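To pin down the two accuracy notions being discussed, here is a tiny illustrative sketch; the belief and label numbers are invented, not taken from the experiment. One measure is how far the screener’s stated probabilities sit from an uninformative 50/50, the other is the gap between those probabilities and the true 0/1 expert label.

```python
# Illustrative only: made-up screener beliefs and true labels, not the paper's data.
import numpy as np

belief = np.array([0.80, 0.55, 0.30, 0.45, 0.90, 0.20])  # stated P(candidate is an expert)
is_expert = np.array([1, 1, 0, 0, 1, 0])                  # true 0/1 expert label

# Measure 1: confidence, i.e. average distance of beliefs from a 50/50 coin flip.
distance_from_fifty_fifty = np.mean(np.abs(belief - 0.5))

# Measure 2: error against the truth (mean absolute error here; a Brier-style
# squared error behaves the same way qualitatively).
mean_abs_error = np.mean(np.abs(belief - is_expert))

print(f"average distance from 50/50: {distance_from_fifty_fifty:.3f}")
print(f"mean absolute error vs. true label: {mean_abs_error:.3f}")
```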

And so, what should you make of this 4% to 9%? So I think the information apocalypse people think like, “Wow, that’s it? Only 4% to 9%? This is not very much.” I think that’s a fair point. Now, if you think about... actually another detail that I’ve left out is we studied... we ran this experiment essentially on hiring and with recruiters and hiring managers. And then we also did a similar one in the domain of entrepreneurship with people that were interested in starting a new business, some of whom had no prior expertise in the type of business that they were pitching. And the evaluators here were people with some sort of investing experience. We broadly see the same thing and can’t differentiate the two different domains with regards to the key outcomes and the intermediate values.

So, but we should get back to this 4% to 9%. But one very interesting result, I think, is that when the receivers of these signals are evaluating their quality, we see this huge collapse in the variance of these signals. So it basically looks like everyone’s pitch starts to look pretty good. Without GenAI, they’re all kind of spread out, which is useful for disambiguating who has a good pitch and who has a bad pitch, or who has high underlying experience and human capital or not. But the GenAI kind of homogenizes all of them. And that’s the intuition behind why there’s this information loss.

Seth: So just to understand... let me understand that a little bit better. So I understand that we’re bringing up the bottom, right? The really bad resumes and pitches get upgraded. Are we also dragging down the top? Or are we just making it more linguistically similar? Tell me... help me understand what’s happening for the pre-GenAI top performers.

Bo Cowgill: So they’re getting bumped up, just not by very much. So if all types were moving up in quality by an equal amount, then you would just kind of shift the quality to the right between the no-GenAI and the GenAI treatments. But what we see is that even the high types go up a little bit, just not by very much, with regards to their application quality or their pitch quality. Meanwhile, the low types are going up a lot, which then pushes them next to the high types, and they now look very similar to each other with regards to quality.
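A minimal sketch of that compression story, with invented numbers (the quality distributions and the +0.5 and +3.5 boosts are assumptions, not estimates from the paper): low types gain a lot, high types gain a little, so the spread of observed quality shrinks and the high-versus-low gap narrows.

```python
# Invented numbers to illustrate the variance collapse described above.
import numpy as np

rng = np.random.default_rng(1)
n = 50_000
high_type = rng.random(n) < 0.5

# Pitch quality without GenAI: high types centered around 7, low types around 3.
quality_no_ai = np.where(high_type, rng.normal(7, 1, n), rng.normal(3, 1, n))
# Hypothetical GenAI boosts: small for high types, large for low types.
quality_ai = quality_no_ai + np.where(high_type, 0.5, 3.5)

for label, q in [("without GenAI", quality_no_ai), ("with GenAI", quality_ai)]:
    gap = q[high_type].mean() - q[~high_type].mean()
    print(f"{label}: std of quality {q.std():.2f}, high-minus-low gap {gap:.2f}")
```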

We could also look at linguistically, are they using the same underlying words? We didn’t look directly at that, but I think it’s likely given what we’ve seen in other domains that use of GenAI makes everybody kind of sound a little bit, not just similar quality, but actually using some of the same underlying words.

Seth: Such a similar quality... almost identical.

Bo Cowgill: Exactly. Right. M-dashes and using the word “delve” a lot and stuff like this.

Seth: Oh yeah.

Bo Cowgill: Yeah. So on average you lose information. I think the 4% to 9%... there’s not a lot of information to begin with. It’s like a very well-replicated finding that it’s hard to hire people and it’s hard to pick diamonds in the rough before they have much of a track record. Even if they have a track record at other companies, the match-specific aspect can be hard to pick up on. And if you think about an investor who had 4% to 9% lower returns—and one of our applications is actually in investing—then like, I think that would be a problem for the success of their business.

Andrey: But I mean, so I’m now going to make the point that I really don’t care about whether this is a big or small effect, ‘cause I don’t care about your setting. Not that it’s a bad setting for showing how this would work in practice in a real setting we cared about, but clearly Prolific people rating each other is not really something where we specifically care about the parameters that we estimate. For example, for an investment pitch, no one actually makes investment decisions based on a written artifact and that’s that. Right? Or you’d have to be pretty crazy to do that.

Bo Cowgill: So I will hard disagree on that.

Seth: Ooh, ooh, spicy.

Bo Cowgill: The most common place to get turned down from a startup pitch is before you even walk in the door, when you send your text-only pitch to an investor or an angel investor or a VC. Text-only, maybe some mostly-text slides. You send that in. This is where most people are eliminated. They don’t even get in the room.

Seth: I guess what Andrey would say is the marginal guy who gets into the room is never gonna get the deal.

Andrey: Yeah, I mean, that’s kind of...

Bo Cowgill: I don’t know if I even agree with that. I think that VC investing is probably really noisy as well. I mean, they lose a ton of money and not everyone agrees. I mean, there are these cases like Google where they had two top-tier investors, but I think that there are cases where people didn’t necessarily expect it.

Andrey: I don’t think... no, no. I really think if you wrote down plausible distributions here, it would be almost surely that this is really affecting people with very low probability of investment just to get... right? Because the baseline rate of investing is so low, even conditional on getting past that initial stage. Right.

Seth: And even if we take a step back, if we think about just AI as a technology that is good at automating the low-skill thing but leaves the high-skill thing less affected, you would expect that the more advanced setting, the setting with more applications, if we’re just taking the arg max, maybe it doesn’t matter so much that we’re mixing up the middle a little bit.

Bo Cowgill: I see what you mean. Yeah. Interesting to keep on studying this.

Andrey: I guess like, that’s what I was... I was really pushing back on just this... I would not... like, I like the paper, I think, viewed as a proof of concept, but I would not take anything literally. So I’m very uncomfortable with statements as like “investors would lose this many returns” and just in general, right? Like it’s not... lab experiments are great, but they’re not gonna...

Seth: Andrey would only trust this study if people reported 0% of these people are experts.

Andrey: Yeah.

Bo Cowgill: It is a proof-of-concept sort of paper and this is something we talk about in the discussion.

Andrey: Yeah.

Bo Cowgill: And yeah, it’s totally fair to say, I don’t know how...

Andrey: I guess I was gonna offer you a chance to say something about other papers. ‘Cause now there are a few other papers that are kind of trying to get at similar mechanisms.

Seth: Perfect. Do the meta-analysis live for us.

Andrey: I assume you’ve thought about it. Yes.

Bo Cowgill: I have seen some other papers in this area and they all look super cool. I guess the ones that I know best, although I don’t know every detail, are by, first of all, a PhD student at Princeton. And then a couple of PhD students at Yale that are both studying a change in Freelancer.com that happened when they released a GenAI basically cover letter tool to help your pitch if you were a freelancer.

And in various ways, I don’t want to speak on behalf of those authors, but it seems like, at least in those cases, there was this negative covariance idea, where it seems like it actually harmed what used to be good signals about your match quality. And the way that the freelancers would do that was they used the GenAI tool to customize their pitch to look exactly like the requisition, or as much like it as possible, without lying. I don’t think they established there was no lying, but this is how they were doing it. So at least in these other domains, it seems like there’s some evidence that GenAI is similarly messing up signal accuracy and signal quality.

Andrey: Then there’s also, I think, Emma Wiles has a paper, right? There’s a couple of papers on this, if I remember correctly. In one of them at least, workers get access to the GenAI tools and that increases overall hire rates on the platform. Am I remembering that correctly?

Bo Cowgill: That’s right. That’s right. And at least in that case, they don’t find any sort of ex-post regret, which might have indicated that employers were fooled and ended up unhappy. So this is a little bit more positive of a finding.

Seth: Are you... will you go out there? Will you now say, “And the reason that they found that GenAI was good was ‘cause...” Is this... they must have had a positive correlation between true skill and benefit from GenAI. Do you wanna make that claim in that population, in that context?

Bo Cowgill: Right, right. To be more clear about what they find, at least what I remember them finding is that they don’t actually find that hiring improved. They just find a noisy enough covariance that they can’t reject... that they can’t sign it.

Seth: They fail to reject.

Bo Cowgill: Right. Right. So, not trying to start something here, but I thought like, well, maybe this is more of a somewhat ambiguous finding. And I also think that it’s presented not as “hiring actually improved,” but “we cannot reject that hiring actually got worse.” So then, maybe more precise tests will change this.

Andrey: So to be clear, the quality of the... we’re talking two things: the quality of the hires and the total number of hires, which are different numbers. And I think you’re talking about the quality of the hires. Is that right?

Bo Cowgill: That’s right. I think that the paper by Emma and John on this other freelancer platform, possibly the same one, you know, we don’t know.

Andrey: Truly a mystery which platform.

Bo Cowgill: Yeah. The employer can rate the freelancer. And so, if I recall their paper correctly, I think that they’re looking at those ratings and saying, it’s not like in the treatment group where you had these amazing cover letters, everyone was disappointed ex-post with what happened.

I mean, there’s a lot of other stuff that could go on there. It could be that they were super disappointed initially, and then the freelancer is like, “Oh, sorry. Well, I kind of masqueraded. Why don’t I do some extra work for you?” or adjust some other margin. But the punchline of our theory model is that this isn’t forced to go any single way. And it could totally be happening this way.

Seth: And be... but yeah. So I guess maybe let’s wrap up this idea of like external validity, right? Which is, the model seems to really imply that this will be super population- and context-dependent. And if the model implies that it’s gonna be super population- and context-dependent, then taking a snapshot in one place at one time can only tell you so much about everywhere else.

Bo Cowgill: I agree. I don’t think we’re trying to sell this as like, this is gonna happen everywhere, at least not on the basis of these results. Now, an interesting podcast discussion I think would be like, what did we expect? And we can go into that more speculatively.

Andrey: Well, let’s go to speculation mode.


