Can AI Replace Human Judgment? Medicine, Law, and the Future

Summary

In this episode of 2030 Vision: AI and the Future of Law, hosts Bridget McCormack and Jen Leonard unpack a JAMA study that compares the diagnostic performance of ChatGPT with that of human doctors. Drawing compelling parallels to the legal profession, they explore how generative AI could reshape legal research, improve efficiency, and challenge long-standing notions of professional judgment and subjectivity.

Through interdisciplinary insights, Jen and Bridget highlight the parallels and contrasts between medicine and law—such as the objective nature of medical diagnostics versus the subjective complexities of legal outcomes—and emphasize the urgent need for innovation, ethical considerations, and transparency in both fields.

The conversation delves into personal AI aha moments, clarifies essential terms like zero-shot and few-shot prompting, and considers how AI adoption in medicine might provide valuable lessons for the legal field. They discuss overcoming algorithmic aversion, increasing transparency, and rethinking professional roles as technology advances.

Key Takeaways

  • Generative AI Enhances Productivity: AI tools like ChatGPT can significantly reduce time spent on tasks, acting as a thought partner for lawyers and doctors to increase efficiency and creativity.
  • AI Outperformed Doctors in the JAMA Study: A recent study showed that GPT-4’s diagnostic accuracy surpassed both individual doctors and doctors using AI as support, highlighting the potential for AI in high-stakes decision-making.
  • Algorithmic Aversion Limits AI Adoption: Experts often resist AI insights, preferring their own judgment even when evidence supports AI recommendations—a challenge for both medicine and law.
  • Medicine vs. Law: Objective vs. Subjective: While medical diagnosis is inherently objective and evidence-driven, the legal system is more subjective, relying on interpretation and context.
  • The Importance of AI Terminology: Understanding terms like zero-shot prompting (no examples) and few-shot prompting (using examples) is essential for maximizing AI's potential in professional tasks.
  • Transparency Builds Trust in Legal Systems: Greater transparency in judicial decision-making, akin to evidence-based medicine, can improve public confidence in the legal system.
  • Legal Education Lags Behind Medicine: While medical schools debate AI integration, legal education has been slower to adapt, missing opportunities to prepare students for AI-powered practice.
  • AI Could Bridge Gaps in High-Volume Legal Cases: Areas like eviction and consumer debt cases—where there’s little access to legal help—could benefit from AI’s efficiency and consistency.
  • Rethinking Professional Roles: Doctors and lawyers may need to let go of certain cognitive tasks and shift their focus to areas where human skills like empathy, strategy, and judgment are irreplaceable.
  • Interdisciplinary Learning Drives Innovation: Drawing insights from medicine’s AI advancements can help lawyers innovate, improve fairness, and embrace ethical, evidence-based approaches to legal practice.

Transcript

Jen Leonard: Hello, everyone, and welcome to the next episode of 2030 Vision, AI and the Future of Law. I'm Jen Leonard, founder of Creative Lawyers, your co-host, alongside the wonderful Bridget McCormack, who is president and CEO of the American Arbitration Association. On this podcast, Bridget and I explore the world of AI.

And in particular, we try to connect the dots for people in the legal profession with developments that are happening more broadly in the technology realm, and, on this episode in particular, with developments in other fields that might be analogous and also somewhat different, but from which we can start to learn and think about implications for legal.

In every episode, we have three different segments. We always start with our AI Aha!s. So these are the moments that Bridget and I have experienced since our last recording where we have used generative AI to do something that feels particularly cool or magical to us. Then we offer a couple of definitions every episode to get used to the new vocabulary that we're all learning around generative AI. And then we dive into our main topic.

And today, we are going to explore a new study that came out that focuses on ChatGPT and its ability to diagnose different medical conditions, comparing its performance with that of doctors working on their own and doctors working alongside AI. So we think this will be a really interesting podcast. I'm really excited to explore this with Bridget. And we can start out with our AI Aha!s for the week. Bridget, it's so great to see you. Would you be open to kicking us off with your Aha!?

AI Aha! Moments: How Generative AI Saves Lawyers Time and Enhances Creativity

Bridget McCormack: Absolutely. Great to see you too. I'm really excited about this conversation today because this study has captured my attention almost exclusively since I started reading about it. And it seems like it has captured a lot of other people's attention as well. But my AI Aha! moment came from thinking a lot about this study. I asked Claude to help me design some randomized control trials for legal diagnosis and for legal decision making. I'm actually designing two different randomized control trials.

I don't have any intention of running them myself right now, but I've had a lot of people ask me, people on law school faculties and curious lawyers, what they could be doing to contribute to this conversation that's going to move the legal profession, I think, in such a productive way. Designing these randomized control trials has been fun, and Claude is just an excellent partner. And not only has Claude helped me think about all of the ways in which you would design the study and the ways to do it ethically, but I then had Claude evaluate the two different models to see which one could be done more efficiently and which one would probably grow confidence in whatever substantive results turned up.

And so it's been really fun. I'm going to try out these ideas on some friends on law faculties in the coming weeks. I'd love to get some of these underway. I think this is a great project for law schools trying to figure out what all this means, and I'm not aware of anyone doing it. So it's been fun to build them. How about you? Did you have a fun one this week?

Jen Leonard: I did, and it was a little bit unnerving, I guess, but very cool. I was trying to create a script and take some of the presentation content that I've been delivering to law firms over the last year and convert it to a different form so that I had it codified somewhere. And I was taking screenshots of slides that I've been using and uploading the screenshots to ChatGPT-4o.

And I gave it a little bit of context and said, "This is from a presentation about the law firm business model." And I think I used the phrase "take a stab." As I was writing it, I wondered if it would know what "take a stab" means. I said, "Can you take a stab at drafting a script of what this is?" And the picture on the slide was just a triangle that said "partners," then "of counsel," then "associates" at the bottom. And I think on the other side, it said "apprenticeship model."

And it generated this whole script that is almost exactly what I say in presentations. It talked about the background of the traditional law firm business model. It's based on a leveraged labor model where junior associates generate revenue through billable hours and the partners own the firm and enjoy the benefits of that revenue. Then it incorporated what an apprenticeship-based model means. It was completely spot on.

So then I kept adding more and more slides because I was trying to get a script for the whole thing. And on almost every slide, with very little information but a little bit of background, it got maybe 90% of the content right. The adjustments I was making were really minor, nuanced adjustments that related to the topics I wanted to cover that wouldn't have been obvious. So that was a magical moment for me.

Bridget McCormack: Yeah, what's so exciting about that is the obvious time saved, right? Being able to start with a 90% finished product. So you're just taking it from almost finished to all the way finished. But the next step is, once you're satisfied with it, feeding it back into the model as a full script and saying, can you think of places where there are additional topics I should be addressing or what questions our audience members would have about each of these points that I make? It can be such a great thought partner in iterating the next edition of the amazing Jen Leonard presentation, which is an amazing presentation.

Jen Leonard: Oh my gosh, that's way too kind of you. But yes, totally. I added: "I deliver presentations to lawyers who tend to be a very skeptical audience, so poke holes in some of the things I'm saying here and then help me come up with assertions to counter those or respond to them or incorporate the things I think are worthwhile into the presentation." And lately I've talked with lots of lawyers in law firms who are really starting to use GPT technology for their own marketing and business development. For example, if they've had a successful outcome in litigation, they upload some of the publicly available records of the case and articles, and then say, "Draft a blog post that I can use and distribute to my clients as a client alert." It just saves so much time and has such a huge benefit for tasks like that. Very cool.

Bridget McCormack: Yeah, time and creativity, right? I feel like over and over again, it's the time and the creativity that feels so valuable to me in whatever the use case is.

Definitions: What is Zero-Shot Prompting? Essential AI Vocabulary for Lawyers

Bridget McCormack: So definitions this week—we're doing two that, I've confessed to you, feel to me a little bit like terms we didn't really need, but that's often true in legal. They're like legal terms where I'm thinking, well, couldn't we just say the English version of that? I don't know why we need that Latin term. These are a little bit like that to me, but you still hear them more and more, so it makes sense to define them here. And so we're going to do "zero-shot prompting" and "few-shot prompting." Why don't you start by telling us what we mean by zero-shot prompting? I don't think I've ever used that term, actually.

Jen Leonard: No, and I think this all falls under the umbrella of the groan-worthy legal maxim "res ipsa loquitur" (the thing speaks for itself). We really did not need words for this, but "zero-shot prompting" sounds cool. You could say it to your friends and have them think you know a lot about AI.

Zero-shot prompting is essentially when you ask a large language model to perform a task and you don't give it any examples of what you're hoping to get out of it. You just ask a question and rely on the natural language interface. The recommended best practice is to use zero-shot prompting if you have a very straightforward task, or if the model you're using has length limits on inputs. So contrast that for us, Bridget, with few-shot prompting.

Bridget McCormack: Now that you understand zero-shot prompting, I probably don't need to tell you that few-shot prompting is when you include in your prompt a small number—a few—examples to show the model how you want it to perform the task. They serve as demonstrations or context, and you use it when you want the model to behave in a certain way. In fact, you might be looking for less creativity, but rather more contextual, precise feedback on a particular thing.

People also say that this is useful for the kinds of questions the model might struggle to interpret. You don't have to waste as much time going back and forth with the model; you just give a couple of examples and it will do exactly what you asked. I love the creativity of the models, so I'm much more likely to zero-shot prompt than few-shot prompt. But I think there are occasions where I do few-shot prompt. (And I don't usually announce that I few-shot prompt, but maybe now I will.)
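For readers who want to see the difference concretely, here is a minimal sketch of a zero-shot prompt versus a few-shot prompt using the OpenAI Python client. The model name, the sample clause, and the plain-English rewrites are placeholders chosen for illustration; the same pattern works with any chat-based model.

    from openai import OpenAI

    client = OpenAI()  # assumes an OPENAI_API_KEY is set in the environment

    # Zero-shot: state the task directly, with no examples of the desired output.
    zero_shot = client.chat.completions.create(
        model="gpt-4o",  # placeholder model name
        messages=[
            {"role": "user",
             "content": "Rewrite this clause in plain English: "
                        "'Licensee shall indemnify and hold harmless Licensor.'"},
        ],
    )

    # Few-shot: include a handful of worked examples so the model mimics their
    # format and tone before tackling the new clause.
    few_shot = client.chat.completions.create(
        model="gpt-4o",
        messages=[
            {"role": "system", "content": "Rewrite legal clauses in plain English."},
            {"role": "user", "content": "Clause: 'Time is of the essence.'"},
            {"role": "assistant", "content": "Deadlines in this contract are strict."},
            {"role": "user", "content": "Clause: 'This Agreement shall be governed by the laws of Delaware.'"},
            {"role": "assistant", "content": "Delaware law applies to any dispute under it."},
            {"role": "user", "content": "Clause: 'Licensee shall indemnify and hold harmless Licensor.'"},
        ],
    )

    print(zero_shot.choices[0].message.content)
    print(few_shot.choices[0].message.content)

The only difference between the two calls is the handful of example exchanges in the second message list; those examples are what steer the model toward the output you want, which is why few-shot prompting tends to help when the task or format is easy to misinterpret.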

Jen Leonard: Yeah, that sounds really super easy to incorporate into everyday language. So now that we have learned words that we don't need to know—because it's pretty obvious how you do these different things—we'd love to move into the main topic, which, as Bridget suggested at the beginning, we're both really excited about and find super interesting.

Main Topic: AI in Medicine: How Generative AI Matches Up Against Doctors

Jen Leonard: And the topic for this week relates to a new study out in JAMA, the Journal of the American Medical Association.

The study has been covered across various publications, including the New York Times (which is where it first came to my attention). It used ChatGPT to diagnose different hypothetical patient case presentations and compared those results with the results of doctors working on their own to diagnose these cases, and doctors working alongside the AI. There were some surprising results. So Bridget, would you walk us through the study and what it found?

Bridget McCormack: Yeah, absolutely. The researchers were from Stanford, Beth Israel, and UVA. They recruited 50 physicians—26 attending physicians and 24 residents. They were family medicine doctors, internal medicine doctors, or emergency medicine doctors. The methodology was that they were randomized into two groups: one had access to GPT-4 in addition to whatever conventional resources doctors use (which I'm sure is WebMD pretty much—I don't know, that's what I use in my WebMD practice).

Actually, now I use ChatGPT and will never use anything but ChatGPT after this. The other group of physicians used only those conventional research resources they were used to, and they tackled six different clinical vignettes within 60 minutes. It was a timed test, basically, and they were supposed to give their diagnostic reasoning for each of the vignettes. Then there was a third group that evaluated the LLM's performance alone using the same design. I'm not sure if I was clear there: there's a group of doctors using ChatGPT and regular resources, a group of doctors using just their regular resources, and then ChatGPT on its own, just doing what ChatGPT does. Then they measured the outcomes for diagnostic reasoning, scored them, measured the time spent per case, and then the final diagnosis accuracy. These were graded by blinded experts. So in some ways a simple randomized control trial, but a pretty interesting one.
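To make that design concrete, here is a rough, hypothetical sketch of the same structure in code: physicians are randomized into two arms, everyone works the same set of vignettes, blinded graders score the diagnostic reasoning, and the LLM alone is scored on the same cases. Every name, number, and the stand-in grading function below is illustrative; this is not the study's actual data or code.

    import random
    from statistics import mean

    def grade(participant, vignette, uses_llm):
        """Stand-in for a blinded expert grader scoring diagnostic reasoning (0-100)."""
        return random.uniform(60, 95)  # placeholder score

    def run_arm(participants, vignettes, uses_llm):
        """Return each participant's average score across the vignettes."""
        return [mean(grade(p, v, uses_llm) for v in vignettes) for p in participants]

    # Hypothetical setup: 50 physicians randomized into two arms, six vignettes.
    physicians = [f"physician_{i}" for i in range(50)]
    random.shuffle(physicians)
    conventional_arm = physicians[:25]   # conventional resources only
    augmented_arm = physicians[25:]      # conventional resources plus the LLM

    vignettes = [f"vignette_{i}" for i in range(6)]

    conventional_scores = run_arm(conventional_arm, vignettes, uses_llm=False)
    augmented_scores = run_arm(augmented_arm, vignettes, uses_llm=True)
    llm_alone_score = mean(grade("llm_alone", v, uses_llm=True) for v in vignettes)

    print(round(mean(conventional_scores), 1),
          round(mean(augmented_scores), 1),
          round(llm_alone_score, 1))

In the real study, the grading function is the hard part: blinded experts scored each case, which is exactly the step that gets complicated in law, where what counts as the "right" outcome is often contested.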

And the results were fascinating. The LLM-augmented group and the non-LLM group performed pretty similarly. There was only a two-point difference in their grading, which was not statistically significant. That was true for their diagnostic abilities and also their time efficiency. The most striking finding, and the reason everyone's talking about it and people are writing about it in the popular press, was that the LLM alone performed significantly better than both groups of physicians. What's fascinating about that is the physicians who had access to GPT-4 obviously had the benefit of GPT-4's thinking, but they disagreed with it in many cases.

That was a really fascinating and somewhat surprising finding to me. Maybe within the medical field I shouldn't be surprised, because I tend to think doctors are prone to reading evidence and changing their mind if the data presented should cause them to. But in fact, they were more prone to overrule what the large language model found and to disagree with it. So it's a really interesting study in that it has me thinking all kinds of curious thoughts about how to apply it to other disciplines. Lots of other people are talking about it. Our Hard Fork hosts were talking about it last week. I think Ethan Mollick wrote about it. What were some of your thoughts or reactions, Jen?

Jen Leonard: First, one thing that I thought was interesting about the study—aside from the findings, which were fascinating—is that the ChatGPT model the researchers used is a year old. It was GPT-4 Turbo, I think, and they made the point that there's an academic lag in research and advancement of the models. So we could imagine that the results might be even more pronounced now. A lot of my thoughts about the piece, as you mentioned, were informed by the Hard Fork podcast (the New York Times tech podcast hosted by Kevin Roose and Casey Newton). They had on one of the authors of the study, Dr. Adam Rodman.

And we're always thinking about the implications for legal, even as I'm listening to him describe what the implications are for doctors. (I believe he's an internist.) One of the things he talked about was how much he enjoys the cognitive challenge of being an internist—having a patient present with something fairly complicated that requires him to really engage his brain—and how he's wrestling with, and others are wrestling with, what an LLM's outperformance of human doctors means for that part of the work.

I thought he took us on an interesting journey. We have this professional pride in the expertise and intelligence that we bring to it. But if the ultimate question is, how do I help patients have healthier lives?—his conclusion was, I would rather use technology that's more accurate and use my abilities elsewhere.

For me, I was thinking about lawyers and how emotional some of the reactions have been across law to LLMs—at first denial, then an assertion like, I add this edge to my writing that is so unique (and that's what my client wants from me), to really zooming out and asking the question: why do we exist as lawyers in the first place? What would be the comparable coming-to-terms in our profession of letting go of a little bit of the cognitive work that we really enjoy and value, in service of better services and quicker resolution of legal issues for our clients? That was one of my takeaways.

The other piece was what you mentioned. Ethan Mollick calls it "algorithmic aversion": the tendency of experts to resist output that disagrees with them because it comes from a machine they assume is less intelligent than they are.

Bridget McCormack: I know Ethan Mollick calls it "algorithmic aversion," but I bet it's similar with senior people in a field—senior doctors and residents, right? That's something I've seen over the course of my career in legal. Somebody who's seasoned and has been around a long time and is sure that there's a right way to do things can be dismissive of a suggestion from a more junior member of the profession. I say that as someone who's been on both sides of that equation.

I still, to this day, remember a specific case when I was on the faculty at Michigan. Students were working on this case in a clinic and they suggested a legal theory for a motion that I was extremely dismissive of. I was like, no, that would never work—nobody does that. My reaction was literally, "nobody does that." And in fact, nobody had done it. And they were also right. It was true that nobody did it, and it was also true that when I stepped back and got out of my own way and read the rule, they were basically proposing a motion that a Federal Rule of Procedure allowed us to bring. Sure enough, they were right. Maybe the rule didn't intend to allow that new motion, but the words of it did, and it worked.

So I think it's an interesting reminder about experts' instinct to be overconfident in their own judgment—whether it's a technology or a junior person in your profession who has a new way of thinking about it or is using technology in a new way. It's a pretty good lesson for all of us in legal. As new lawyers come out of law school and are very comfortable with this technology and don't have that confidence of like, "I know the right answer," they'll probably be more likely to use the technology in more ambitious ways, I guess.

Jen Leonard: Yeah, it feels a bit like a corollary to the idea of the "curse of knowledge." Once you know something so deeply, it's very hard to go back with a beginner's mind and think about how you might explain it to somebody else or do it differently. In addition to technology, this also involves junior lawyers. One of the things I talk about a lot is monodisciplinarity—the fact that all of our systems and organizations have to be owned and led by lawyers means that we miss out on insights from other disciplines. I think this dynamic very much unfolds: there might be a great idea coming out of finance or marketing or professional development or court administration that others dismiss because it's not from people who are trained in law.

Bridget McCormack: Yeah, or like business school—like other ways to structure a legal business, which is not something that law schools teach or law firms spend much time studying. So there's basically one model, and it might not be the best one for what's coming with this technology.

Jen Leonard: I think I mentioned to you that I had coffee recently with someone in an executive MBA program—to your point about education (and then we could talk about Dr. Rodman's interview in the medical education context). This person in an EMBA program told me that all of her professors are using chatbots both to serve the students when the professor is inaccessible (creating a personalized GPT with course resources that students can ask questions of) and also in class, asking the chatbot for answers to questions that students ask where there's no obvious answer. That's both modeling and substantive use of the technology. I think we don't always see that in legal education or in professional development once we enter practice. Maybe that's a good segue to hearing from you what Dr. Rodman talked about in the medical education realm and how doctors are thinking about future doctors.

Bridget McCormack: So one thing Dr. Rodman said in that interview was that there's a fierce debate going on about how to educate future medical students. I was a little surprised to hear that. One of my kids is in medical school right now and has a lot to say about the memorization that feels like the first two years of medical school—which feels quite anachronistic to kids his age. If you're in your mid-20s, you're pretty used to using technology to get information. So he sort of assumes he'll still have technology when he's a doctor.

So having to memorize every single thing probably isn't going to be the most important skill. He feels like there are probably other skills that might matter more. That was interesting to me—that they're actually having a debate about it. It's not one I think is going on in legal. I mean, you and I both have relationships with lots of people at law schools, but neither of us is full-time in legal education anymore. So I'm not sure... Do you think that's happening in legal education?

Jen Leonard: I think for sure there are conversations happening about GPT technology and what to do about it. Maybe one difference—and this is an overgeneralization; there are certainly law professors who practice law full time as well—but my sense from interacting with medical faculty is that they're actually on the ground seeing patients during the day in addition to their teaching obligations. So they may be seeing this unfold in the professional landscape in ways that support their immediate strategic response in the educational realm—something that might be slower to reach law school faculty who are not practicing day to day and aren't seeing this infuse into their work as readily.

Bridget McCormack: Yeah, that's a pretty important difference. It felt to me like he's come to terms with the fact that it's going to change, right? The practice of medicine is just not going to be what it was when he went to medical school and throughout most of his practice. It's not going to have the same cognitive part, is what I think he said. And if that's right, are people as attracted to going into medicine? Do you want to go into medicine if it doesn't involve having this set of cognitive skills that really can make a life-or-death difference for the people you treat, but rather harnessing technology to figure out how to make sure your patients are as healthy as they can be? Maybe that's not as interesting—it's not the same, at least. So who knows if it's going to be less attractive to future potential medical students.

Jen Leonard: One thing I thought was interesting on that point (that maybe makes me a little more optimistic in the legal realm than in the medical realm) is, like you, I tend to glamorize medicine as compared with law in lots of ways. But whenever I talk with doctors or medical professionals, they talk about the difficulty of working within a highly regulated industry that's trying to squeeze efficiencies out because there's so much pressure from insurance companies—which of course legal doesn't have. I've always used that as a main reason why we don't innovate more in law: because there is no third party driving efficiencies.

But Dr. Rodman made the point (and I wrote it down) that he called medicine "a system obsessed with efficiency." His concern was that these advancing technologies might try to squeeze out the tiny bit of humanity left in medicine. He talked a lot about what you and I frequently talk about, which is the elevation of human skills in a world of advancing technology. In legal, I actually think there are exciting opportunities because of how inefficient we are in our profession compared with medicine—how many more people and businesses we could serve, and how the ability to elevate those human skills opens up a huge swath of opportunity in my mind if we could spend a little less time on the things that we think of as cognitively challenging (but that may be depriving many more people of strategic partnership, empathy, and problem solving on their side). I took a little bit of hope from that.

Bridget McCormack: For lawyers, yeah, I agree with you. I feel like there's so much further to go in figuring out how to do legal business efficiently that there's more good news than bad, in a way. But I'm not sure I agree with him about how technology would play out for doctors. But again, what do I know? I'm only a WebMD, not an MD MD. It seems to me that if you can rely on technology to do a lot of the things that might have taken you time before, then maybe you have more time to talk to your patient about what you've learned, right? First of all, you'll be able to learn a whole lot more in far less time, and then you can spend your time talking to people—which isn't to say that computers won't eventually be able to talk to people as well. But I still think that'll be a place where a human translating what you've learned from your AI-agent or nurse-practitioner partner is going to be incredibly valuable. Maybe not as interesting, again, to some people who want to go into medicine, but I think it's still going to be important. But I don't know—Dr. Rodman hasn't asked me yet what I think.

Jen Leonard: Actually, to your point about WebMD, another thing that I thought about when I was listening to his conversation (with respect to lawyers) is: doctors have had some experience with patients having access to information and diagnostic tools, even though those tools are terrible, right? WebMD tells you you're going to die from everything you put into it.

They have some experience with patients coming in and saying, "This is what the internet says I have," or "I was on these Reddit forums." I'm curious to see how lawyers respond when their clients come in and say, "I've already talked with ChatGPT." Before, it was like, I got this letter or this thing and I have no idea what it means (if you're not a corporate counsel)—can you help me?

I don't know how that plays out when you're a solo or small practice and your client comes in and says, "I've already talked with ChatGPT. It explained to me what my legal options are. I want you to pursue this one." How do you think lawyers will respond to that?

Bridget McCormack: Yeah, I mean, I think there's a learning curve there. Entering the profession right now is such an interesting and, I think, exciting time because you're going to be able to build a new muscle for how to respond to that kind of interaction with potential clients. Doctors really have gotten good at it. People go into medical appointments with their WebMD diagnoses all the time—I’m guilty of it like everybody else. Doctors are really good at gently reminding you what the internet does and doesn't know, and at using what you found as a starting point for the conversation. That's a kind of communication lawyers haven't had to do as much of, but I think new lawyers will have an advantage again because they'll figure out a lot of this in a way that'll make them really good lawyers, in my view.

I do think it's interesting—and this is definitely not a conversation we need to have today; we're really getting off on tangents, I apologize—but the idea that WebMD has been such a... I'd love to go look up the revenue of WebMD. Why is there no WebJD?

You can go to Google and ask it questions, but I guess this is one important difference between medicine and law, right? In medicine, if the evidence points to a particular drug making a difference for a particular disease, there's not a difference of opinion in New York versus California versus Delaware about which response we're going to use. Whereas in law, there's potentially a different rule of law in each state—50 different answers. The fact that there's no legal singularity means it's harder to monetize a WebJD. Although why couldn't you do WebJD New York, WebJD Pennsylvania? I'm not sure. Maybe because lawyers are better at prosecuting UPL (unauthorized practice of law). I'm not sure. There's also the unlicensed practice of medicine. Do you know what the answer is?

Jen Leonard: I think my thought is related to what you just said about UPL. I've always thought that—I'm really going off on a tangent here—there are three areas where the systems are really broken for the people that need them. There are lots of areas, but the three I think about are healthcare, higher education, and legal. In healthcare and higher education, we have imperfect bridges—student loans and insurance—that allow us to stay connected with the systems, even as frustrating and imperfect as they are. In legal, we've never had that bridge. So most people, I think, don't necessarily feel like they lack access to the legal system, because they never had it in the first place. It’s not a matter of replacing their conversation with a lawyer, because they don't have a lawyer in the first place. I don't know whether that's the answer or not. I think the federalism piece is a huge part of it too, which was another thing that I thought about in the overall study and its applicability to law.

And we'll get into this when we hear about how you engaged with an LLM to understand the study. But the idea that doctors, as you always point out, are evidence-based—and if the evidence is pointing to a certain diagnosis and treatment that is generally accepted, you're not going to... I mean, you get second opinions and all that, but the goal is to find the right diagnosis and treatment. In legal, so much of our work is arguing over what the law actually is in the first place that I could see it being trickier to execute. This goes back to some discussions we had about Adam Unikowsky's framework for finding simpler fact patterns to use at the outset, because there are complications in different cases. But you talked about this in our prep for the episode, so I'd love to hear what you did after you read about the study, because it didn't stop there.

Bridget McCormack: I did what I often do: I went to my friend Claude and asked how easy it would be to build a similar randomized control trial in law. Where are the pressure points where the differences might make it more complicated and you'd have to think about how to do it in law? And, as always, Claude came up with lots of areas to give a little more thought to. As you know from my AI Aha! moment, I pushed through all of these and I am designing randomized control trials for anybody who wants to put these together.

Some of these differences I found fascinating—not only because they would create challenges for developing RCTs in law, but also because they shine a light on some of the inefficiencies and vulnerabilities the legal system has more generally.

Just as one example (I'm happy to talk about as many of them as we have time for, or as few as we have time for): because medicine often is responding to biology and treatments, it's far more quantitative in orientation than law is. Randomized control trials can provide evidence by minimizing bias and are ideal for assessing interventions and getting quantitative results. Whereas so much of legal practice is subjective and context-based—every lawyer's first answer is, "Well, it depends," right? We joke about it. So much of the currency in justice, like fairness and persuasion, is subjective. That makes it far harder to quantify, and designing a study is a little more complicated. But for me that's a reminder about the weakness of that subjectivity—or I think the potential weakness of that subjectivity.

I mean, wouldn't it possibly be better if we had more objective measures of fairness and justice and fewer subjective measures of those? Not only so that we could design trials to learn how to do what we do better, but so people could understand outcomes and plan their lives and build their businesses. The fact that you can get different subjective answers from different decision makers in the legal system feels more like a weakness than a strength. But maybe I'm overreacting. I don't know. What do you think about that?

Jen Leonard: I think it goes to faith in the system, which is that if you're zooming out and thinking about the "why," you want people to have a certain amount of predictability and coherence to the structures that govern their interactions with society. And again, to call back to Adam Unikowsky's blog piece about using this in legislative work and considering all the different unintended consequences of different definitions for words or different ways to read language—I think, to your other point, it's what makes law (for lawyers) very intellectually stimulating, but also makes it very difficult compared with medicine, because language can be susceptible to so many different interpretations.

And we as lawyers are trained to spend so much time turning those different interpretations over and learning how to build arguments based on different ways to spool out a definition. But should the primary objective really be the lawyerly ability to play with language to get different results? Or should we, to your point, develop more standardized approaches? I don't know what that would look like. I think it's trickier with language, for all the reasons you suggested.

Bridget McCormack: It absolutely is. I think it's a lot harder to eliminate ambiguities in language, but as Adam showed, if you were to analyze ambiguity—in his case, it was a sentencing guideline or a statute before it was codified—you might have far fewer disputes. And fewer disputes is probably a good thing overall; I think that's a benefit to society generally.

I don't know. It does make me think that subjectivity—absolutely a lot of what really successful lawyers are good at is figuring out the best version of subjective language for their clients' needs or disputes—but maybe that's just not a great use of the best lawyers' time. If we didn't have to do that, if we had less subjective statutory language or if subjective language in court opinions weren't as common and everybody understood what the rules were... So that was one difference.

Jen Leonard: Yeah, and I think it goes back to the training piece of what activities and skills we’re prioritizing or including in the toolkit we provide to new lawyers. It's essential to understand how to argue different positions and unpack subjectivity when it's really unclear. But it also seems useful to help students and new lawyers think about developing consensus around language in advance of creating laws, so that we can help people navigate lives that are not as complicated as "it depends on what this other lawyer thinks, and then it depends on how the judges interpret what the lawyers are arguing."

Bridget McCormack: All of which, by the way, reminds me that every judge's neural network is a black box—which is one of the critiques of large language models, that we don't understand how they get to their answers. And I'm always thinking: we don't understand it in lots of cases.

Jen Leonard: Hungrier judges hand down harsher sentences; that suggests there are also problems under the hood with judicial decision making. And I know we're running short on time, but maybe I could ask you one more question about something you talked about with Claude—because I think both of us feel that this is a place where the legal profession ties itself in knots in ways that lead it away from coming up with innovative solutions. And that's ethics and protecting the client. So I know you had conversations with Claude about this medical study and how we would replicate something similar in law.

And one of the responses was that law makes it difficult ethically because the stakes are really high—including personal freedom or financial well-being—and just taking clients and randomly assigning different legal strategies and seeing what the results are is not fair or ethical to the clients. So I'm curious: what was your response to that assertion?

Bridget McCormack: Yeah, this was actually one I surfaced myself when I was reading it. I was like, but how do we do it? Because you can't really say, we're definitely going to go to trial with Client A (who's charged with this particular offense) and Client B we're going to plead guilty, and then we'll see how their sentences turn out. Obviously that's not ethical. We can't do that. I think there's plenty of room to develop problem sets like the authors of this medical study did, using closed cases or hypotheticals—past cases where you have all of the information with all of its bespoke details (as lawyers like to say).

I think that gets around that particular ethical barrier. It's not the same, probably, as when you have a large clinical trial with thousands and thousands of people literally with the same diagnosis and you can try out different treatments. But I think you can come pretty close, especially in some of the high-volume dockets where right now people have zero legal help. I think you could learn the most in places where there are high-volume, similar cases—that's true in many eviction cases, which have similar elements, similar potential defenses, and similar outcomes.

Obviously, everybody is a person and they're different, but there are a lot of similarities across those cases, and there are a lot of them. The same is true for consumer debt cases. That's also a place where you have the most to gain. I do feel you could correct the ethical problem by just using last week's docket and feeding it into an LLM to build the right kind of cases for your study, and then you wouldn't have that problem.

I think the second thing that Claude surfaced when I was talking about this client consent and ethical barrier issue was something we just talked about briefly, which is how judges and juries make important decisions and how cases come out. It's not like in medicine where a doctor says, "OK, here's what's wrong with you. Here's how we're going to treat you, and then we'll see how you do." There is another decision maker in the mix, and we don't have any real window into how they make those decisions. Judges should explain their decisions—there's a norm of written decisions so people understand how they came to them—but there's all kinds of things going on in their neural nets that are not understandable. In the jury system, it's actually inappropriate to ask the jury how it decided what it did.

So that's a bit of an issue if you're designing a randomized control trial and you're trying to measure outcomes that involve judges and juries. I think there's all kinds of potential in cases where the judicial decision is pretty minimal.

For example, if a judge issued a default because the person didn't show up, or the judge found the elements of the case met because there wasn't any defense—I think there are ways to account for that. But more importantly, I wonder if this isn't just another reminder that we don't always know how judges or jurors make decisions, and that's something the legal profession might want to think about. Maybe a little bit more openness and sunshine into how decisions are made, or some more automation of how certain decisions are made.

Obviously not all of that would be feasible, but it feels to me like it's something the legal profession should spend some time talking about. In other words, each one of these conversations where I was trying to understand the differences between how you could do a similar trial in law that these folks did in medicine taught me a few things. First, that the differences seem less robust than I would have thought at the outset; and also that some of them are things we probably want to think about if we get to rethink how we build an operating system from the ground up. I really am having fun with this conversation and I think it's going to continue now that I'm building these Claude artifacts around what this might look like.

Jen Leonard: I love it. I can't wait to hear how your randomized trials go and what we learn as a result. It seems to me that, while we've had predictive analytics for some time and you can analyze different judges' decisions, this just feels like such an engine for doing that at a higher level with more capacity for specificity. And I think—to go back again to compare it with the doctors (and obviously Dr. Rodman is one doctor among many, many doctors who have different opinions about this)—the idea of being open to evidence and thinking about if you're a judge and you oversee a particular docket, learning more about the insights that an LLM can surface about your decisions, unpacking the similarities and differences across your cases, and whether you are being fair and equitable across those cases seems to me to be something a judge should want to know.

And we seem to be on the cusp of being able to do that, which I think is exciting not only for the judges who want to learn, but for all of society to have greater confidence in the outcomes of those decisions.

Bridget McCormack: And transparency into them. You know I'm a big believer that more transparency into how our judicial systems and alternative dispute resolution systems operate—and how they work—is better for growing public confidence in that really fundamental institution.

Jen Leonard: Maybe on a future episode, we could consider how to create incentive structures to do that, because I think litigators (particularly in private practice) have lots of incentives to use the technology for predictive analytics for their clients and to get good outcomes. But how do we create drivers in society that incentivize the same for the public?

Bridget McCormack: I love that conversation. I'm going to go talk to Claude about it right now.

Jen Leonard: Just to clarify, JAMA is the Journal of the American Medical Association, and Claude is a large language model that you can access on the internet. I believe their frontier model, Claude 3.5 Sonnet, is free to use. So go test it out, try it out, and engage in some of the exercises that we do.

And we'll look forward to seeing you on the next episode of 2030 Vision, AI and the Future of Law. Until then, it’s been lovely to speak with you as always, Bridget, and wishing everybody well.