People have long worried about robots automating the jobs of truck drivers and restaurant servers. After all, from the invention of the cotton gin to the washing machine, we’re used to an economy where technology transforms low-wage, physically arduous work.
But the past few years have shown that highly educated white-collar workers should be the ones bracing for artificial intelligence to fundamentally transform their—I should probably say our—professions. The angst this has spurred from all corners of white-collar America has been intense, and not without merit. AI has the potential to take over much of our creative life, and the risks to humanity are well documented.
The discourse around AI has focused so squarely on the terrifying risks and potential job losses that I’ve noticed there’s been very little discussion around why so many people are working so hard to create this doom monster in the first place.
On today’s episode of Good on Paper, I’m joined by someone researching what happens when AI enters a workplace. Aidan Toner-Rodgers is a Ph.D. student in economics at MIT and has a working paper out on what happened to scientific discovery (and the jobs of scientists) when an R&D lab at a U.S. firm introduced artificial intelligence to aid in the discovery of new materials.
Materials science is an area of research where we can see the direct applications of scientific innovation. Materials scientists developed graphene, transforming “numerous products ranging from batteries to desalination filters,” Toner-Rodgers writes, as well as photovoltaic structures that “have enhanced solar panel efficiency, driving down the steep decline in renewable energy costs.” There are also countless more applications in fields such as medicine and industrial manufacturing.
New discoveries in this field have the potential to transform human life, making us happier, healthier, and richer. And when scientists at this company were required to use an AI assistant to generate new ideas, they became more productive, discovering 44 percent more materials.
“I think a big takeaway from economic-growth models is that in the long run, really, productivity is the key driver of improvements in living standards and in health,” Toner-Rodgers argued when we spoke. “So I think all the big improvements in living standards we’ve seen over the last 250 years or so really are driven fundamentally by improvements in productivity. And those come, really, from advances in science and innovation driving new technologies.”
The following is a transcript of the episode:
[Music]
Jerusalem Demsas: What is the point of artificial intelligence? Why, when there is so much concern about the potential consequences, are we hurtling towards a technology that could be a mass job killer? Why, when we face so many competing energy and land-use needs, are we devoting ever more resources to data centers for AI?
There are good reasons to worry about its negative consequences, and the media has a bias toward negativity. As a result, we don’t tend to explore these questions.
My name’s Jerusalem Demsas. I’m a staff writer at The Atlantic, and this is Good on Paper, a policy show that questions what we really know about popular narratives.
Today’s episode is about one of the best applications of AI: helping push the boundaries of science forward to make life better for billions of people. This isn’t a Pollyannaish conversation that skates past concerns with AI, but I do want to spend some time investigating the ways that this technology could improve our lives before we get into the business of complicating it.
In some ways, this conversation isn’t just about AI. It’s about technological progress and the trade-offs that come with it. Are the productivity benefits of AI worth all the downstream consequences? How can we know?
My guest today is Aidan Toner-Rodgers. He’s a Ph.D. student in economics at MIT with a fascinating new working paper that shows what happens when scientists are required to begin using AI in their work.
Aidan, welcome to the show!
Aidan Toner-Rodgers: Thanks so much for having me.
Demsas: You have a really great paper that I’m interested in talking to you about, but first I want us to set the stage a bit with productivity. Productivity is something economists talk about a lot, and I think why it’s so important can feel abstract to people.
So why do economists care about productivity?
Toner-Rodgers: Yeah, so I think a big takeaway from economic-growth models is that in the long run, really, productivity is the key driver of improvements in living standards and in health. So I think all the big improvements in living standards we’ve seen over the last, like, 250 years or so really are driven fundamentally by improvements in productivity.
And those come, really, from advances in science and innovation driving new technologies. So when economists think about what are the most important drivers of living standards, it really is kind of coming back to productivity.
Demsas: Yeah, and I think that sometimes it’s useful to think about ways in which society gets better, right?
Like, most increases in inputs come at a cost. If you increase labor, it means you have less leisure time. If you increase investments in capital, that means you’re lowering your current consumption: You’re giving up buying things you may want now in order to invest in the future. And if you’re increasing material inputs, that depletes natural resources.
So the idea is: How can we get more efficient? And one stat that I like to point to is that “productivity increases have enabled the U.S. business sector to produce nine times more goods and services since 1947 with a [pretty] small increase in hours worked.” So we’re just getting a lot more stuff without having to kill ourselves working to get it. And that can be, you know, just clothes and things like that, but that can also be services. Like now, because it’s really easy to produce a T-shirt, you need fewer people making T-shirts, and they can teach yoga or do other things. And so I think that’s really important to set the stage here.
But since your paper is about AI, I want to ask you which side you take in a bet between Robert Gordon and Erik Brynjolfsson. Have you heard about this bet?
Toner-Rodgers: I don’t think so, actually.
Demsas: Okay, yeah. It’s basically a $400 bet, payable to GiveWell, so I don’t know if it really counts as them putting their money where their mouth is.
But Robert Gordon is an economist. He’s kind of a longtime skeptic of digital technology’s ability to match the impact of things like electricity or the internal combustion engine. His argument, basically, is that he doesn’t expect AI to have a significant impact on productivity. He points at things like how the U.S. stock of robots has doubled in the past decade, but you haven’t seen a massive revolution in production, productivity growth, or manufacturing. He also says that AI is really nothing new: We’ve had human customer-service representatives replaced by digital systems without much to show for it. And he says that a lot of economic activity that is relevant to people’s lives, like home construction, isn’t really going to be affected by AI.
So it’s one side of the debate. It’s kind of more pessimistic on AI. And the other is kind of represented by Erik Brynjolfsson—he’s more of a techno-optimist—and he argues that recent breakthroughs in machine learning will boost productivity in places like biotech, medicine, energy, finance, but it’ll take a few years to show up in the official statistics, because organizations need time to adjust.
Again, they’re only betting $400, but whose side do you take in this debate?
Toner-Rodgers: I mean, I think I’m probably more on Erik’s side. Robert Gordon’s research, I think, has done a great job showing that over the past 40 years or so there’s been a big stagnation in innovation in the physical world.
But I think something I’m really excited about in AI is that all these advances in digital technologies, computing power, and algorithms maybe can now, finally, have this impact kind of back to physical infrastructure and physical things in the world. So I think, actually, materials science is a great example of this, where we have these kinds of new AI algorithms that can maybe come up with new important materials that can then be used in physical things.
Because I think a lot of the advances in information technology so far haven’t produced big productivity improvements because they were confined to the digital world, but now maybe we can use these breakthroughs to actually create new things in the world. And I do think Gordon’s point is very important: There are a lot of constraints on building things, and a lot of the barriers to productivity growth are not that we don’t know how to do things, but that there are big regulatory or other barriers to building things in the world.
And that’s why I’m a bit more pessimistic than the people who are super optimistic about AI’s impact: because of these kinds of bottlenecks in the world. But I’m very excited about areas like biomedicine, drug discovery, or materials science, where we can maybe create actual new things with AI.
Demsas: So materials science, I think, is the place where your research really is focused. So can you just set the stage for us? What type of company were you looking at, and what kind of work are the employees doing?
Toner-Rodgers: Yeah, so the setting of my paper is the R & D lab of a large U.S. firm which focuses on materials discovery. So this involves coming up with new materials that are then incorporated into products. And so this lab focuses on applications in areas like healthcare, optics, or industrial manufacturing.
And so the scientists in this lab, many hold Ph.D.s or other advanced degrees in areas like chemical engineering or materials science or physics. And what they’re doing is trying to come up with materials that have useful properties and then incorporate these into products that are then going to be sold to consumers or other firms.
Demsas: And help us set—what do you mean by materials? Like, what are we trying to find here?
Toner-Rodgers: So in some sense, everything in every product uses materials in important ways. Here’s one estimate in the paper: Someone looked at all new technologies and products and asked how important new materials were to them, and he found that two-thirds of new technologies relied on some advance in discovering, or manufacturing at scale, a new material. This could be anything from the glass in your iPhone, to the metals in semiconductors, to different methods of drug delivery. So a lot of the technologies in the world really rely on new materials.
Demsas: Yeah. I mean, you note in your paper that materials science is kind of the unsung hero of technological progress. And when you start to think about it, it really adds up. Basically everything you could care about ends up boiling down to specific materials that you want to find, whether it’s computing or biomedical innovation, like you said, but also stuff that has surprised us recently, like the falling cost of solar panels. New photovoltaic structures are helping drive down the cost of those renewables.
So it’s all these different things. And I think it’s funny, because we are an increasingly service-sector-based economy, so we’re kind of abstracted away from the impact of materials on our lives, because we just don’t really see it day to day. But it’s just as important. I think the pandemic really showed this when we had a shortage of semiconductor chips.
Toner-Rodgers: Yeah, maybe an economics way to put this is that materials science is very central in the innovation network. There have been some papers looking at which other fields rely on research from materials science, and it really is very central in this network, with fields from biomedicine to manufacturing relying on new discoveries in materials science. So focusing on it is a key driver of growth in a lot of areas.
Demsas: And so the scientists in this firm—can you just walk us through what they’re actually doing? Like, what is the process of their work? And then we can get into how AI changed it.
Toner-Rodgers: Sure. So a lot of what they’re doing is coming up with ideas and designs for new materials. And because materials discovery is very hard, many, many of these materials don’t end up having the properties they hoped for, or they don’t yield a viable, stable compound. So a lot of what they’re doing is testing, either in silico, by running simulations, or by actually making these materials and testing their properties, to see which ones are actually going to be helpful and can later be incorporated into products.
So their time is split. Maybe, like, 40 percent or so is on this initial idea-generation phase, and then the rest is testing these things and seeing which materials are actually viable.
Demsas: When I was reading your paper, I analogized it to coming up with recipes in a kitchen. And you can have a test kitchen or something like that, where basically, if your goal is to come up with a bunch of new recipes for food or for baking or whatever, you may come up with some on paper, and then you’re like, Okay, well, I have to pick which one is potentially going to be a really good recipe, and then you would, you know, test it. And probably you don’t do a simulation. You probably just go make the donut or whatever it is. Is that kind of a good analogy for this?
Toner-Rodgers: Yeah, I think it is, and also just in the sense that we know a lot about the ingredients or sets of elements and their bonds, and we know a lot about that at a small scale, but it becomes very hard to predict what a material’s property will be as these materials become bigger and more complicated. And so even though we know a lot in some small sense, actually prediction gets pretty hard.
Demsas: So AI gets introduced at this company because they want to figure out if that can help their scientists be more productive at coming up with new materials. At what point in the process is AI coming in? What is it actually doing? How does it change the scientists’ jobs?
Toner-Rodgers: Yeah, so AI’s role is really in this initial idea-generation phase. And so how it works is that scientists are going to input to the tool some set of desired properties that they want a material to possess. So in this setting, this is really driven by commercial application because this is a corporate R & D lab. So they want to come up with something that’s going to be used in a product. And then they’re going to input these desired properties to the AI tool, which is then going to generate a large set of suggested compounds that are predicted by the AI to possess these properties.
And so before, scientists would have been coming up with these material designs themselves. And now this part is automated by the tool.
Demsas: So it’s like, Now I’m having an AI tool give me a bunch of potential donut recipes instead of me coming up with them myself.
Toner-Rodgers: Exactly. And I think it’s important to note that this whole prediction process is very hard. And so even though I’m going to find pretty large improvements from the AI tool on average, many, many of its suggestions are just not that good and either aren’t going to yield a stable compound or aren’t going to actually have the other properties that you wanted to begin with.
Demsas: Yeah. And before we get into your results, which actually really shocked me: It’s kind of cool that the company basically set up a natural experiment for you. Can you walk us through what they did and how they randomized researchers?
Toner-Rodgers: Yeah. So I think the lab had just a lot of uncertainty going in about whether this tool was going to be actually helpful. Like, you could have thought, Maybe it’s going to generate a lot of stuff, and it’s all bad, or it’s going to kind of slow people down as they have to sort through all these AI suggestions.
So I think they just had a lot of questions about: Is this tool going to work, and are we going to get actually helpful compounds? So what they did, instead of just rolling it out all at once, was to do three waves of adoption where they randomly assigned teams of scientists to waves. And so this allows me, as a researcher, to look at treated and not-yet-treated scientists and identify the effects of the tool.
Demsas: And did they control for different things? Like, did they control for, you know, what types of research they were working on or how many years of experience they had?
Toner-Rodgers: Yeah, so because of the randomization, the waves are well balanced on what exactly these scientists are working on, meaning which types of technologies and materials, as well as on team composition in terms of areas of expertise, tenure in the lab, and so on.
Demsas: So now I want to turn to the results. What did you find?
Toner-Rodgers: So my first result is just looking, on average, at how this tool impacted both the discovery of new materials as well as downstream innovation in terms of patent filings and product prototypes. So I find that researchers with access to the AI tool discover 44 percent more materials, and then this results in a 39 percent increase in patent filings and then a 17 percent rise in downstream product innovation, which I measure using the creation of new product prototypes that incorporate those materials.
Demsas: These are, like, massive numbers.
Toner-Rodgers: Yeah, I think they’re pretty big. And also, I think it’s helpful to kind of step back and look at the underlying rate of productivity growth in terms of the output of these researchers. So I look back at the last five years before the tool was introduced, and output per researcher had actually declined over this period. So these are huge numbers relative to the baseline rate of improvement.
Demsas: So it’s interesting—well, I guess first: How? Like, why are people becoming more productive here?
Toner-Rodgers: I think there are two things. One is just that the tool is pretty good at coming up with new compounds. Because the model is trained on a huge set of existing compounds, it’s able to give a lot of good suggestions.
And second, not having to do the compound-design part of the process themselves frees scientists to spend more time on the other two stages: deciding which materials to test and then actually going and testing their properties.
Demsas: It was interesting looking at your results, because you can see how things change one month after, four months after the adoption of this new AI tool. Things look kind of grim in the short run, right? Four months after AI adoption, the number of new materials actually drops. It’s not until eight months after that you see a significant increase in new materials, and that’s around when you see patent filings increase. And it’s not until 20 months after that you actually see it show up in product prototypes.
And, you know, part of the problem of trying to figure out if new technology like AI is having a big impact is that it might take a while to show up in statistics. Is that why you think maybe we’re not seeing a massive jump in productivity right now in the U.S., despite the rollout of a ton of new machine-learning tools?
Toner-Rodgers: Yeah, I think that’s partly true. Like, you definitely need some forms of organizational adaptation or people learning to actually utilize these tools well. So part of why there’s this lag in the results is just that materials discovery takes a while. So it takes a little bit to actually go and kind of synthesize these compounds and then go and find their properties.
But another thing I find is that in the first couple months after the tool’s introduction, scientists are very bad, across the board, at determining which of the AI suggestions are good and which are bad. And this is part of the reason we don’t see effects right away.
Demsas: So it’s like your job has changed significantly, and you just need time to adjust to that.
Toner-Rodgers: Yeah, totally.
Demsas: So I want to ask you about material quality, though, because what you’re measuring, largely, is the number of materials made. But has the quality of the materials improved or declined, and how would we know?
Toner-Rodgers: So I think a key concern when you’re doing these things is that we don’t only care about how many new discoveries we’re getting, but what they are. A very nice thing about my setting, and materials science in general, is that there are direct measures of quality in terms of the properties of these compounds. In particular, at the beginning of the discovery phase, scientists define a set of target properties that they want materials to possess.
And so I can compare those target properties to the measured properties of materials that are actually created. And so when I do this, I find that, in fact, quality increases in the treatment group, which is showing that we’re not actually having this compromised quality as a result of faster discovery.
Demsas: So there’s this joke that I was looking up, and apparently Wikipedia tells me it’s attributed to this character from Muslim folklore called Nasreddin, but I could not independently verify this. Most people have probably heard some version of it. It goes: A policeman sees a drunk man searching for his keys under a streetlight, and he tries to help him find them. They look for a while, and then he’s like, Are you sure you dropped them here? And the drunk guy is like, No, I lost them in a park somewhere else. The policeman is kind of incredulous; he’s like, Why are you looking for them here? And the drunk guy goes, This is where the light is.
And a lot of researchers have referred to this as the streetlight effect, right? It’s the tendency of people to look where the light is, working on the easiest problems even if those aren’t the ones likely to bear the most fruit. Do you think that AI helps us avoid the streetlight effect, or does it exacerbate the problem?
Toner-Rodgers: So I think talking to people before this project, I would have guessed that it would exacerbate the problem. And the reason is that the tool is trained on a huge set of existing compounds. So you might expect that the things it suggests are going to be just very similar to what we already know. So you might think that because of that, the streetlight effect is going to get worse. We’re not going to come up with the best things but rather just things that look very similar to what we already know.
And I think, surprisingly to me, I find that, in my setting, this is not the case. And so to do that, I measure novelty at each stage of R & D. So first I look at the novelty of the new materials themselves. And to do that, I look at their chemical structures—so the sets of atoms in a material, as well as how they’re arranged geometrically. And I can compare this to existing compounds and see, like, Are we creating things that look very similar to existing materials, or are they very novel?
So on this measure, AI decreases average material similarity by 0.4 standard deviation. So these things are becoming more novel. And it also increases the share of materials that are highly distinct—which I define as being in the bottom quartile of the similarity distribution—by four percentage points. So it seems like, both on average and in terms of coming up with highly distinct things, we’re getting more.
Demsas: This is kind of surprising to me, right? There’s a paper by some researchers at NYU and Tel Aviv University called “The Impact of Large Language Models on Open-Source Innovation,” and they raise the question of whether AI has an asymmetric impact on outside-the-box thinking versus inside-the-box thinking. And, you know, the thing is that most AI systems are evaluated on tasks with well-defined solutions, rather than open-ended exploration. And models are predicting the most likely next response. What’s happening with ChatGPT is that it’s just predicting what the next word is going to be, or at least that’s what most of these systems are trying to do. And they’re trained on a corpus of existing material; it’s not like they’re independent minds.
And so they kind of theorize that, you know, AI might be good at finding answers to questions that have right answers or ones where there’s clearly defined evaluation metrics. But can it really push the bounds of human understanding, and does our reliance on it really reduce innovation in the long term? So I mean, this seems to be a really big problem in the field of AI, and I wonder: How confident are you that your findings are really pushing against this? Or is it kind of like, maybe in the short term, there’s some low-hanging fruit that looks really novel, and in the long term, you’re not really going to have that?
Toner-Rodgers: Yeah, so I think one drawback of the measurements I have is that I can see that, on average, novelty increases, but what I can’t see is whether the likelihood of coming up with truly revolutionary discoveries has changed. And so if you think of science as being driven, really, by breakthroughs in the far right tail, you’re just not going to see many of these in your data. This is an issue Michael Nielsen has highlighted in some essays that I like a lot.
And so one kind of thing you might be worried about is, Well, we got, on average, more novel things, but maybe these very revolutionary discoveries have a lower probability of being discovered by the AI, and that in the long term this is not a good trade-off. And because you’re just never going to see very many of these right-tail discoveries in your data, you just can’t say much about this using these types of methods.
Demsas: I mean, how confident, then, are you that we can even test whether this is happening?
Toner-Rodgers: Yeah, I think one answer is that we’ll just need some time to see: Do these new materials open up new avenues for research? Are there other materials that are going to be built on the new ideas the AI generated? But one thing I’d say is that a lot of people would have expected beforehand that, even on average, novelty would go down. And the fact that it went up, I think, does push back somewhat against the view that these tools are going to be bad for novelty.
Demsas: And then I guess, on this question of generalizability to other fields: Materials science is, of course, a place where you can measure productivity pretty cleanly. You can see what the compounds are. You can see what people are trying to look for. A lot of fields, even in science, are not like this. It’s not easy to measure exactly what you’re trying to find, and innovation can stall for long periods between spurts, even when a lot of work is happening. So do you expect AI to be as helpful in fields that look a lot less like materials science?
Toner-Rodgers: So I think in the short run, I would say probably not, right? I think there’s areas where it does look a lot like this, like things like drug discovery, but then there’s a lot of areas where it doesn’t look like this at all. I would say, I think kind of fundamentally, this comes down to how much of science is about prediction versus maybe coming up with new theories or something like that. And I think maybe I’ve been surprised over the last several years how many parts of science, at least in part, can have big impacts from AI, right?
So take math, where it really feels like doing a proof is not a prediction problem at all. Yet we see large language models and other more specialized tools really making progress there. They’re not at the frontier of research by any means, but I think we’ve seen huge improvements.
So this is absolutely an open question how much these tools can generalize to other fields and come up with new discoveries more broadly. But I would say that betting against deep learning has not had a great track record in recent years.
Demsas: Yeah, fair.
[Music]
After the break: AI doesn’t benefit everyone equally, even when we’re talking about brilliant scientists.
[Break]
Demsas: I want to ask you about the distributional impacts. I think this is probably the most pessimistic, concerning part of your paper. You find that the bottom third of researchers see minimal gains to productivity, while the top 10 percent have their productivity increase by 81 percent. Can you talk through how you’re measuring the sort of productivity of these researchers and this finding, in particular?
Toner-Rodgers: Yeah. So first I just look at scientists’ discoveries in the two years before the tool was introduced. There’s a fair amount of heterogeneity across scientists in their rate of discovery. And I do some tests showing that these rates are correlated over time, so it’s not that some scientists are just particularly lucky; instead, there do seem to be persistent productivity differences across scientists. Then I look at each decile of initial productivity and ask: How much does those scientists’ output change once the tool is introduced? We see massive gains at the high end. At the low end, on average, scientists do see some improvement, maybe 10 percent or so, but nowhere near as much as the initially high-productivity scientists.
Demsas: Why? Like, at what stage are the low-productivity scientists getting caught up? Because, you know, if this tool is just giving them a bunch of potential recipes for new materials, are they just worse at selecting which ones to test, or what’s happening?
Toner-Rodgers: Yeah, so I think the key mechanism that I identify in the paper is that it’s really this ability to discern between the AI suggestions that are going to be actually yielding a compound that’s helpful versus not. So I think just the vast majority of AI suggestions are bad. They’re not going to yield a stable compound, or it’s not going to have desirable properties. And so because actually synthesizing and testing these things is very costly, being able to determine the good from the bad is very important in this setting. And I find that it’s exactly these initially high-performing scientists that are good at doing this. And so the lower-performing scientists spend a lot of time testing false positives, while these high-ability ones are able to kind of pick out the good suggestions and see their productivity improve a lot.
Demsas: But lower-performing scientists aren’t getting worse at their jobs, right? They’re just not really helped by the tool.
Toner-Rodgers: Yeah, that’s true. But I think it’s worth saying that it’s not like they’re not using the tool. So it really is that their research process changed a lot, but because their discernment is not great, it ended up being kind of a similar productivity level to before.
Demsas: And were you able to observe this inequality over time? Was it stagnant? Did it widen? Did it decrease? Was there learning that you were able to see happen with less-productive researchers?
Toner-Rodgers: Yeah. So I think something very interesting is, like, if I look in the first five months after the tool was introduced, across the productivity distribution, scientists are pretty bad at this discernment. So all of them are kind of doing something that looks like testing at random. They’re not really able to pick out the best AI suggestions. But as we look further on, scientists in the top quartile of initial productivity do seem to start being able to prioritize the best ones, while scientists in the bottom quartile show basically no improvement at all. And so I think this is pretty striking. And there’s just something about these scientists that’s allowing some to learn and some to see no improvement.
Demsas: And how long were you able to observe this for? Like, is it possible that maybe they just needed more time?
Toner-Rodgers: Yeah, so I think I see, like, two years of post-treatment observations. So in that time, I don’t see improvement. I think it’s possible either they need more time, or maybe they need some sort of training to be able to learn to do this better. So I think one question: Is this something fundamental about these scientists that’s not allowing them to do this? Or is there some form of either training or different kind of hiring characteristics the firm could look at to identify scientists that are good at this task?
Demsas: So were you surprised by this finding? After reading your paper, our CEO here at The Atlantic, Nicholas Thompson—he pointed out that in studies of call centers, the opposite is often true. For instance, the guy we mentioned earlier, Erik Brynjolfsson, who’s kind of a techno-optimist, and two of his co-authors recently put out a working paper that looks at over 5,000 customer-service agents and found that AI increased worker productivity. And they’re measuring that as issues resolved per hour. And it increases their productivity by 14 percent, with less-experienced and lower-skilled workers improving the speed and quality of their output, while the most experienced and the highest skilled saw only small gains. So I guess, looking at the field, in general, is it strange that you’re seeing the biggest impact happening with the most-skilled people? Should we expect the opposite?
Toner-Rodgers: Yeah, so I think a lot of the early results on AI have found that result that you just mentioned, where the productivity kind of compresses, and it’s these lower-performing people that benefit the most. And I think in that call-center paper, for example, I think one thing that’s going on is just that the top performers are already maybe nearly as good as you’re going to get at being a call-center person. Like, there’s kind of just a cap on how good you can do in this job.
Demsas: You can’t resolve an issue every second. You actually have to have a conversation.
Toner-Rodgers: Right. You kind of have to do it. And they’re maybe close to the productivity frontier in that setting. So that’s one thing.
And I think in materials science, this is just not the case at all. Like, this is just super hard, and these are very expert scientists struggling to come up with things, is one thing. And then I think the second thing is that in the call-center setting, AI is going to give you some suggestions of what to say to your customer. And it’s probably not that hard to kind of evaluate whether that suggestion is good or bad. Like, you kind of read the text and, like, All right, I’m gonna say this.
And in materials science, that’s also not the case—where, like, you’re getting some new compound. It’s very hard to tell if this thing is good or bad. Many, many of them are bad. And so this kind of judgment step, where you’re deciding whether to trust the suggestion or not, is very important. And I think in a lot of the settings where we’ve seen productivity compression, this step is just not there at all, and you can kind of out-of-the-box use the AI suggestion.
Demsas: So do you think a good heuristic is if AI is being applied to a job where there’s a right way to do things that we kind of basically know how to do, or there’s very little sort of experimentation or imagination or creativity necessary to do that job, that you will see the lower-skilled, the less-experienced people gain the most? And then when it’s the opposite, when a lot of creativity is needed, high-skilled people are going to get the most out of AI?
Toner-Rodgers: Yeah, I think that sounds true to me. And I think maybe one way I’d put it is it’s something about the variation and the quality of the AI’s output that’s very important. So even in materials science, I’m not sure that, say, in three years or something, the AI could just be incredibly good and, like, 90 percent of its suggestions are awesome, and you’re not going to see this effect where this judgment step is very important.
So I think it really depends on the quality of the AI output relative to your goal. And if there’s a lot of variation, and it’s hard to tell the good suggestions from the bad, that seems to be the type of setting where we’re seeing the top performers benefit the most.
Demsas: And I assume that with this tool at this company, like, when they come up with successful materials, they’re feeding that information back into the model. Did you observe that the tool was getting better at providing more high-quality suggestions over time?
Toner-Rodgers: Yeah, so they’re definitely doing that. There’s definitely some reinforcement learning with the actual tests. Like, I think over this period, I don’t see huge results like that. I think, relative to the amount of data it was trained on initially and the previous test results that went into the first version of the model, it’s just not that much data. But I think as these things are adopted at scale, we could absolutely see something like that.
Demsas: If that sort of reinforcement learning happens, do you think that that increases the likelihood that AI kind of pushes us down the same sorts of paths? Like, so you get kind of path dependent because you’re basically telling the model, Oh, good job. You did really good on these things, and then it becomes trained to sort of do those sorts of things over and over, and it gets less creative over time?
Toner-Rodgers: Yeah, I think that is definitely a concern. And I think something that people are thinking about is maybe there’s ways to reward novel output, per se. Because I think in these settings, one thing that’s helpful with novel output, even if it’s not actually a good compound, is that you learn about new areas of the design space. And even getting a result that’s very novel and not good is pretty helpful information. So I think rewarding the model for novelty, per se, is maybe one kind of avenue for fixing that problem.
Demsas: So this paper and this field, in general, kind of reminds me of some of the findings in the remote-work space. We had Natalia Emanuel from the New York Fed on the show, actually on our very first, inaugural episode. And you know, we talked about her research on remote work, and one finding that she has is that more-senior people are more productive or have higher gains of productivity when they’re able to go remote, because they stop having to mentor young people, and that is a drain on their productivity in person. They’re having someone younger than you kind of ask you questions, interrupt your day and, like—I’m not saying they hate the job—but that takes away from your ability to just work and not have to focus on other things.
And I wonder about AI becoming the sort of “bouncing off” buddy of scientists. Rather than, like, turning to your less-productive lab partner and just kind of tossing out ideas or talking, you’re sort of engaging with this AI tool, and that’s what you’re using to sort of figure out new methods and materials. Does that change science to become less collaborative with human peers, and does that have those knock-on harms, where maybe these most-productive scientists are getting better, but the less-productive scientists aren’t able to actually get the learning necessary to improve their own productivity?
Toner-Rodgers: Yeah, I think that’s super interesting. And I think a general question about these results is, like: What does this look like in the longer term?
I think something that might absolutely be true is: These people who are very good at judgment might have gotten good at judgment by designing the materials themselves in the past, and this is kind of where you got that expertise. But going forward, if the AI is just used, maybe new scientists that enter the firm never get that experience and maybe never have the ability to get the judgment. And so that’s one reason you could see different effects in the long run.
In terms of the specific question of collaboration, I think that’s something super interesting. I don’t have, really, evidence on that in the paper, because I don’t see good data on how much scientists are communicating with each other. But something I’m very interested in is: We have some scientists that are good at judgment. Like, could they teach whatever that skill is to the people who are worse? And I think one way to get at this, which I haven’t done yet, is: If you have a teammate who’s very good at this task, do you somehow learn, over time, from them? And I think that would be very interesting to look at.
Demsas: And you mentioned, like, how does someone become a high-productivity scientist, and that requires you doing this on your own, potentially. And I wonder—companies, whether they will have the incentive at all to invest in this long-term training when there are these sorts of short- and even medium-run, huge benefits they could get. I mean, you’re talking about massive increases in patents and new technologies they’re able to operationalize and commercialize, even. And if that’s the case, even if everyone knows that there’s this long-term cost to science and to scientists, who is actually incentivized to make sure this training happens until we’re already kind of in a bad place where a lot of technology has stagnated?
Toner-Rodgers: Yeah, I think that makes a lot of sense. Like, there’s kind of a collective-action problem where you don’t want to be the one that’s doing all the training in the short run while all your competitors are, like, coming out with all these amazing materials and products.
Demsas: And then poaching all your people.
Toner-Rodgers: Exactly. I think that’s definitely a concern. But also more generally, I do kind of have some confidence that organizations are going to be able to adapt to these tools and find out new ways to either train scientists for these things, kind of as they’re using them, or be able to, in the selection process for new employees, find predictors of being good at this new task. Because, in some sense, what we’re saying is that these new technologies are changing the skills required to make scientific discoveries, and I think we’ve seen a long history of technological progress that’s done exactly that—like, changed the returns to different skills—and firms have adjusted to that.
Demsas: What I want to ask you about next is about the survey you did about the scientists’ job satisfaction. Can you tell us about that survey?
Toner-Rodgers: Yeah. So the goal of the survey was just to see both how scientists use the tool and then whether they liked it—how did this impact their job satisfaction?
And so after the whole experiment was completed, I just conducted a survey of all the lab scientists. About half answered. And one thing I found is that, basically across the board, scientists were fairly unhappy with the changes in the content of their work brought on by AI. So what they say is that they found a lot of enjoyment from this process of coming up with ideas for compounds themselves, and when this was automated, their job became a lot less enjoyable. So they say, like, My job became less creative, and some of the key skills that I’d built over time, I’m no longer getting to use.
And I think one thing that’s very striking is this is true both for the scientists that saw huge productivity improvements from AI, as well as the lower performers. And so we really see that it’s not as much dependent on productivity. I also ask, kind of, Well, you’re also getting more productive. Does this somehow somewhat offset your dissatisfaction with the tasks you’re doing at work? And it does somewhat. But overall, I find that 82 percent of scientists report a kind of net reduction in job satisfaction.
Demsas: I mean, that’s kind of depressing, right? Obviously, if you’re told, like, Oh, your work is having a big impact on the world and maybe making life better for people who are sick or who need renewable energy, or whatever it is, that can feel good. But if your day-to-day just sucks, you can imagine there’s gonna be some attrition, right?
Toner-Rodgers: Yeah, absolutely. Because yeah—one thing sometimes people say when they hear this result is, like, Well, scientific discovery is very important. Maybe these new materials are gonna be used by millions of people. Why do we really care about these scientists and how much they’re enjoying their job? But I really think it could have important implications for who chooses to go into these fields and the overall kind of direction of scientific progress. So I think it’s very important to think about these questions of well-being at the subjective, individual level for that reason.
Demsas: I feel like it’s really difficult for me to kind of weigh out what actually happens in the long term here, because I could imagine that the types of scientists who went into these fields were selected for people who really, really enjoyed the creativity aspect of figuring out new materials. Whether or not they’re productive at doing that, like, that’s just the kind of thing you’re selecting for.
And I would analogize it to someone who’s really excited about coming up with new recipes. And I’m someone who likes—I don’t like coming up with new recipes, but some of my favorite recipes are ones where I saw a New York Times Cooking recipe, and then I change some things about it. And as I’ve cooked it a bunch of times, I’ve tweaked some things, and I’ve come up with something that’s sort of my own, sort of already existing. And I can imagine there are a lot of people like that and that the skill of discernment does not necessarily correlate with the skill of loving to be creative.
So you could see shifts happening in the field, right, where the types of people who go into materials science change, and these scientists go do something else where they’re able to be more creative. And you mentioned that a lot of them are thinking about taking on new skills. How do you think that all kind of shakes out?
Toner-Rodgers: This really maybe comes back to the question of training. So I think a lot of these people’s complaints were like, Look—I built up all this expertise for one thing, and now I don’t get to do that thing anymore. And you could think that now if we start training people for this slightly different task, which also requires a lot of expertise, of judgment, that that also is fulfilling. And whether that’s true in the long run, I think I’m not sure.
So one analogy that someone said to me is, like, Well, you’re a Ph.D. student. Imagine if, instead of writing papers, you just did referee reports all the time.
Demsas: Yeah. And sorry—can you explain what a referee report is?
Toner-Rodgers: It’s like you’re looking at someone else’s research and saying, like, It’s good, or, It has these problems.
And that doesn’t sound awesome. Like, it definitely takes a lot of expertise to do a referee report, but it’s not why you got into this—like, you do want to come up with ideas. And so I think I’m very uncertain how this is going to all shake out. I do think that part of it really was, like, I got trained to do a thing, and now I don’t get to do it anymore. And I think that part will go away somewhat, but whether this is just fundamentally a worse job, I think it definitely could be.
Demsas: It’s interesting, the way in which we kind of have always thought of automation as disrupting the jobs of people with less-well-compensated skills—so, like, manufacturing jobs, or, you know, now your job is shifting a lot if you’re someone who works at a restaurant. Now robots are doing some of that work. And you know, there’s just been this kind of pejorative, like, Learn to code! sort of response to some of those people.
And it’s interesting to see that, like, a lot of generative AI is actually really impacting the fields of higher-income individuals, like people who are working in heavily writing fields or like legal fields and now, also, science fields. And it does, really, I think, raise this question of just: Will society be as tolerant of disruptions in those spaces as it has been in disruptions in spaces where workers have had less kind of political and social power?
Toner-Rodgers: Yeah, I totally agree. And I think there really is something different about these technologies where they’re creating novel output based on patterns in their training data, whereas before, like, from industrial robots to computers, it really was about automating routine tasks. And now for the first time, we’re automating the creative tasks. And I think how people feel about this and how we react might look very different.
Demsas: Yeah. I came across this quote from the chief AI officer at Western University, Mark Daley. It’s a blog post. He’s commenting on your paper. He writes, “Because AI isn’t just augmenting human creativity—it’s replacing it. The study found that artificial intelligence now handles 57 percent of ‘idea generation’ tasks, traditionally the most intellectually rewarding part of scientific work. Instead of dreaming up new possibilities, scientists may find themselves relegated to testing AI’s ideas in the lab, reduced to what one might grimly call highly educated lab technicians.”
I don’t know if there’s a survey of scientists or whatever, but I wonder here if you see that there’s a kind of a growing pessimism as a result of findings like this and just, like, the experiences many people are having with AI where they do feel like, Hey, the good part of life—I don’t want AI or robots or technology to be taking away the fun, creative stuff like writing or art or whatever. I want them to take away the drudgery the way that, like, laundry machines took away drudgery or dishwashers took away drudgery. I don’t know how you think about that as a shift in how the discourse is happening on this issue.
Toner-Rodgers: Yeah. I think that’s interesting. And I also think, when I talk to scientists, for example, materials scientists that work on actually building the computational tools, like, they’re super excited about this stuff because they’re coming up with ideas for the tool itself and, like, going and testing it and all these things.
Something in this setting is like: This was a tool that was kind of imposed on these people, not something they kind of created themselves. And I think that’s maybe something we’ll see, where the people that are actually having input and creating the new technologies themselves might find, like, they’re very happy with the output, even though these tasks are being automated. Whereas people in this setting, where the tool kind of just came in and changed their job a lot, maybe see kind of big decreases in enjoyment.
Demsas: Well, Aidan, always our last and final question: What is an idea that you thought was good at the time but ended up only being good on paper?
Toner-Rodgers: So I went to undergrad in Minnesota. And for background, I’m from California. So the first winter I was there, me and a couple of friends decided it’d be a great idea to go ice fishing.
Demsas: Okay.
Toner-Rodgers: And so we drive up to this lake. And literally three steps out on the ice, I step on a crack and fall through into this frozen lake. So ice fishing for Californians is good on paper.
Demsas: This is like the scene in Little Women where, like, Amy falls into the lake or whatever. What happened? Was it actually dangerous, or did you just immediately pull yourself out?
Toner-Rodgers: Luckily, we weren’t far from civilization. Like, we were near the car, so we ran back to the car.
Demsas: Oh my God.
Toner-Rodgers: And that was the end of my ice-fishing career.
Demsas: I’m glad you learned this early in your Minnesota life and did not get too adventurous. Well, Aidan, thank you so much for coming on the show.
Toner-Rodgers: Yeah, it was great. Thanks so much.
[Music]
Demsas: Good on Paper is produced by Rosie Hughes. It was edited by Dave Shaw, fact-checked by Ena Alvarado, and engineered by Erica Huang. Our theme music is composed by Rob Smierciak. Claudine Ebeid is the executive producer of Atlantic audio. Andrea Valdez is our managing editor.
And hey, if you like what you’re hearing, please leave us a rating and review on Apple Podcasts.
I’m Jerusalem Demsas, and we’ll see you next week.