This episode explores the release of GPT-5 and its potential to reshape the legal profession. Co-hosts Jen Leonard and Bridget McCormack evaluate the model’s legal reasoning skills, real-world use in courts, and the backlash over OpenAI’s sudden changes. The conversation then shifts to a major new report by IAALS, the Institute for the Advancement of the American Legal System, that proposes a phased approach to regulating AI in legal services, prioritizing consumers, not just lawyers. As legal tech accelerates and human error remains widespread, this episode asks: should AI be regulated, trusted, or both?
Key Takeaways
- AI as a Partner: GPT-5 shows promise in legal reasoning but still needs oversight.
- Human Error in Focus: AI is exposing just how much error we tolerate in human-led court decisions.
- Consumer-Centric Regulation: IAALS recommends focusing on outcomes for the public, not just protecting lawyers.
- Soft Regulation First: Best practices, data collection, and sandboxes should come before formal rule changes.
- A New Legal Workforce: The future may include legal tech supervisors and community justice workers, not just lawyers.
Final Thoughts
As GPT-5 and other large language models get more capable, the legal profession faces a dual challenge: leveraging these tools while rethinking outdated systems. AI is already in the hands of consumers, often out of necessity. This episode argues for thoughtful reform that expands access, protects the public, and reimagines who delivers legal help in the AI era.
Transcript
Introduction
Jen Leonard: Hi everyone, and welcome to the newest episode of AI and the Future of Law. I'm your co-host, Jen Leonard, founder of Creative Lawyers, here as always with the fantastic Bridget McCormack, president and CEO of the American Arbitration Association. Hi, Bridget. How are you?
Bridget McCormack: I'm great. It's good to see you. I'm happy to be here.
Jen Leonard: It's great to see you as well. And we are here today on release week for the newest model from OpenAI, GPT-5, and we're going to talk all about that today in our What Just Happened segment (where we try to connect for our legal audience what is happening in the broader landscape) before we dive into our main topic this week, where we will talk about a report from the Institute for the Advancement of the American Legal System (or IAALS, for short).
AI Aha! Moments!
Jen Leonard: But we will kick off, as we do every episode, with our segment called AI Aha!'s—the things that we are using AI for in our personal or professional lives that we find particularly delightful or interesting, that we're learning from. So will you do the honors and kick us off this episode? What have you been using AI for?
Bridget McCormack: I will. And I have to say, this gets more and more complicated every time we do this because I have too many things to choose from. I'm not sure they're all interesting, but I do find that it's now crept into so many parts of my life. And I was saying to you recently that I feel like lots of other people in my life are coming along. I've had friends and family members ask for tutorials recently. And once I give them the tutorials, they have the experience of actually using it instead of hearing me talk about using it. And I think it's different.
On GPT-5 release day, I got a text early in the morning from my brother (who, a couple of months ago, I'd given a tutorial), and he was like, "GPT-5 is out today." And I was like, "Yep, it's out today." So that's been kind of fun.
But the one I want to talk about today is: I've been using it to help me think about how to talk about—both in writing (because I'm drafting something that I may or may not publish; I don't know, we'll see) and in presentations—the level of human error we tolerate in courts. And what prompted this (you won't be surprised) is those stories about the two judges who released opinions recently (in the last few weeks) that had hallucinations in them. They were clearly hallucinations. In both cases, the opinions were withdrawn—which is, of course, the right thing to do. And there's been so much backlash and pearl-clutching and, you know, lots of fretting across legal communities about, "Oh, gosh, now we have judges using AI improperly." And that's all fine, but it has steam coming out of my ears because of the "as opposed to what?" question (to quote Andy Perlman). If we were measuring those examples against a justice system with no error, then sure, I'd pile on about how we need to make sure our courts have resources and training and can use AI correctly. (I actually think all those things are true—I want them to have resources and tools and technology, frankly, to help them do their jobs better.) But as somebody who practiced in courts for a long time and worked in courts for a long time, the human error that we tolerate is so significant.
But it's not good enough to just say that; I wanted to figure out how to say it with some robustness for different audiences. So first I said, "Help me with the numbers—how could we measure it?" And GPT-5 went way too deep for me. I actually had to pull it back. I was like, "Okay, that's too many numbers." It was literally coding different graphs and saying, "You can use this graph in a slide that shows the intersection..." I was like, "No, no, no, no, no. Pick one jurisdiction; get the reversal rate. Let's start with the reversal rate—appellate reversals of trial court decisions in federal court." I picked the federal court that Michigan cases are appealed to (the Sixth Circuit), and then had it do the same in the Michigan appellate courts. Then I said, "Let's think about whether that number is a fair place to start."
So that number, by the way—the Sixth Circuit last year, the reversal rate was about 5.6% (if you average the criminal and civil cases; the civil ones were actually 10%). The Michigan Court of Appeals: 21% of cases were reversed, and 11% gave partial relief. That's a stunning number of cases. It did all the data crunching for me, so I just have the numbers, which is really convenient.
And I wanted to look next at the number of opinions where the court did not reverse because it treated the error as harmless, since in an awful lot of cases we affirm a trial court decision even though there's error—we explicitly tolerate harmless error. I want to add those numbers to the mix next, and it's still working on that.
I then asked it to think about how we would layer on exonerations. Exonerations are cases where, after a case has been fully affirmed through the court process, we later learn that there was a mistake—an error that meant the wrong person had been convicted (somebody who hadn't committed a crime had been convicted). So that's yet another category of cases where even that appellate court reversal rate is undercounting, right?
Then I asked it to think about how to layer on the number of cases that never get appealed, because as you know (and as we talk about all the time), most people in civil cases can't afford lawyers. So they might find a way to navigate their trial court case (probably not that well, because we speak this other weird language in courts). They almost never figure out how to appeal those cases if they don't like the result. It's a really small number of fully lawyered cases that even make it there. So this floor—this rate of error that we tolerate—is a very squishy, soft floor.
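To make the layering described above concrete, here is a minimal, purely illustrative sketch of the arithmetic. Every number below is a hypothetical placeholder (not the Sixth Circuit or Michigan figures cited earlier), and the categories simply mirror the ones listed: outright reversals, partial relief, harmless-error affirmances, later exonerations, and the cases that never get appealed at all.

```python
# A purely illustrative sketch of the layered "tolerated error" estimate
# described above. All numbers are hypothetical placeholders, NOT actual
# Sixth Circuit or Michigan Court of Appeals figures.

appealed_cases = 1_000        # cases that actually reach an appellate court
reversal_rate = 0.21          # share of appealed cases reversed outright
partial_relief_rate = 0.11    # share granted partial relief
harmless_error_rate = 0.05    # affirmed despite acknowledged (harmless) error
exoneration_rate = 0.002      # later exonerations among affirmed convictions

# Error that the appellate process itself surfaces
surfaced = appealed_cases * (reversal_rate + partial_relief_rate + harmless_error_rate)

# Error that only surfaces later (e.g., exonerations)
late_surfaced = appealed_cases * exoneration_rate

# Cases that are never appealed at all, so any error there goes unmeasured;
# this is why the resulting rate is only a "soft floor."
unappealed_cases = 9_000

floor_rate = (surfaced + late_surfaced) / appealed_cases
print(f"Error floor among appealed cases: {floor_rate:.1%}")
print(f"Cases never appealed (error unmeasured): {unappealed_cases:,}")
```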
And I had it create a bunch of different outputs: some slides, some talking points, some ways to talk about it in writing and to different audiences. And it really helped me think about how to talk about this in the future. I have not inserted myself yet into this conversation, but as you can tell, I'm a little worked up about it and a little bit ready to—and I now have lots of data and ways to think about talking about it at my disposal.
And that was way too long for an Aha!, and I apologize. How about you? What was your Aha! moment?
Jen Leonard: That's incredible. It's also making me think of a presentation you and I did earlier this year with Darth Vaughn, where he was interested in gathering data on—this is a different point—the number of times lawyers cite cases for an inappropriate or incorrect proposition in an attempt to deceive courts. Not that we tolerate that, but how many times does that go undetected? So there are all sorts of false statements in briefing before courts and then, as you mentioned, just outright error in decisions made by courts (whether or not it ultimately affects the outcome of the case or gets overturned). That's super interesting.
I'm laughing at your initial output from GPT-5, though, because some of the reviews described it as having over-energetic assistant energy.
Bridget McCormack: It's like, "Dude, take it down a notch." (If we're talking to people who don't use statistics...)
Jen Leonard: I struggled with my AI Aha! this week because mine went a little bit in the reverse direction. It was a bit back-to-basics, and I think it's a result of being challenged by a project I'm working on in a way that's uncomfortable for me. I have been trying to use different AI tools that I love to make the project easier for my brain, because the activity is inefficient for my brain—because it's challenging. And by trying to use all of those tools, in the end I was just finding new ways to spin my wheels and not making the project high quality. Eventually I realized that I'd just found all new tech-infused ways to procrastinate.
What I ended up doing was I had to sort of slap myself on the wrist and shut down my laptop, take out a piece of paper, and start writing with a pen from scratch—using my naked human brain. And it made me think of... I feel like we should have a Pee-wee's Playhouse sound every time we say Ethan Mollick's name; he's like the word of the day. But he had a great One Useful Thing blog post, and I think the big takeaway was: write first, think first, meet first. All of these tools are going to come out that make it really easy for us to shortcut a lot of the things that are painful for our brains to do because they're inefficient, but the inefficiency and the investment of energy are the actual point of producing a quality product.
And so that's what I learned from working on this project: I wasted a lot of time and energy trying to use AI to shortcut it, when really I do ultimately want to use AI on this project—but now is not the time to do that. The time to do it is later, when I've drafted it and I want to refine it and make it better. And I can already see how I'm going to do that: by getting feedback on the writing and finding places where I can refer to different parts of the writing in other areas (because that's something I'm not really familiar with or skilled at—like when you read a book and it says, "Remember back in Chapter Three we talked about this..."). I feel like that takes either an editorial team or a skill set I don't have that AI could help me with, among other things. But this was not the way to do it.
I'm glad that I learned that lesson, and I hope it will help me teach other people as they're trying to learn how to use these tools.
Bridget McCormack: Yeah, that's really interesting. I feel like I've had that experience a number of times where I jumped in too quickly, and then I realized, "This is not helping—I need to step back and at least figure out what I think I'm building or creating," in terms of a writing project or a memo or a strategic plan of sorts. You have to at least have done the initial thinking. I don't know what I read recently—did you read this?—the idea that AI is kind of "middle to middle," not "end to end." You really are still valuable at the front and the back. Right? The back we know (check your cites before you file the brief, or check your cites before you deliver the opinion). But the front is also critical—just in framing and organizing and thinking about what it is that you're then going to get the middle-to-middle help with. And I think that's still true. Maybe it won't be true five years from now (I have no idea), but it definitely feels true now.
Jen Leonard: And I think Ethan's written about this (and others have written about it too—I don't want to suggest he's the only one writing about this), but the research on learning and AI is logical if you think about it: you're not making a deep impression in your mind if you're offloading things to AI. And there are lots of places where we can use AI for lots of different things (as you and I talk about all the time). But if you're working on a project where your goal really is to learn about the content and develop new ideas and new connections, you want your brain to be deeply invested in that learning process.
I felt like I had been thoughtful in developing the outline and thinking it through, and then I was like, "Okay, I'm ready to dive in with my buddy Claude, and we're going to workshop this together."
And I don’t know—it just was not working. I was like, “I'm just going to have to come back to you, Claude. I just need to go back to my whiteboard and my mind mapping and my drafting the old-fashioned way.”
So it's almost like my human brain sometimes just needs to be by itself for a little while.
What Just Happened
Jen Leonard: Okay, so on to What Just Happened. This is a big episode for people who are following different model releases—or if you're not following, you might have heard this in the ether. The big model that we've been waiting for all year has been GPT-5: lots of teasers, lots of hype and buildup. So, Bridget, what just happened?
Bridget McCormack: We finally got GPT-5. We sort of knew it was coming—there were a lot of hints that it was coming. And then there was this announcement that they were going to have a press conference at 1:00, and we got the press conference at 1:00. And we got GPT-5, which is OpenAI's most advanced model yet. It does a few things that are new.
The first thing it does is auto-route between the fast model and the deeper reasoning model that GPT-5 includes. The lineup is GPT-5 (standard), GPT-5 Thinking, and GPT-5 Pro (which OpenAI describes as research-grade intelligence). So those are the three different models. What it does is, you put in a query and it decides which model it needs to answer it. And sometimes, when it's figuring out that answer, it will switch back and forth between those models to give your query the right amount of attention. That's actually something that I've been wanting for a very long time—my choices used to be that there were like eight different models, and I would sometimes forget that I was in the quick model because the last thing I asked only needed the quick model. But then I'd ask something else and I really needed a reasoning model (or vice versa). I was always like, "Oh, I wish it would just do that for me." And so now it does do that for me, and I personally like that.
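For anyone reaching GPT-5 through the API rather than the ChatGPT app, the automatic routing described above is an app-level feature; over the API you typically pick the model and, where supported, a reasoning-effort level yourself. The sketch below is an illustration only: it assumes the official OpenAI Python SDK and the Chat Completions endpoint, and the reasoning_effort parameter shown follows OpenAI's published API reference, though availability and exact parameter names may vary by model and SDK version.

```python
# Minimal sketch (assumes `pip install openai` and OPENAI_API_KEY set in the
# environment). The model name and reasoning_effort values follow OpenAI's
# public documentation; check the current API reference before relying on them.
from openai import OpenAI

client = OpenAI()

# A quick, low-effort answer
quick = client.chat.completions.create(
    model="gpt-5",
    reasoning_effort="low",
    messages=[{"role": "user", "content": "In two sentences, what is harmless error?"}],
)

# The same endpoint, asked to reason more deeply
deep = client.chat.completions.create(
    model="gpt-5",
    reasoning_effort="high",
    messages=[{"role": "user", "content": "Compare harmless-error review in federal and state appellate courts."}],
)

print(quick.choices[0].message.content)
print(deep.choices[0].message.content)
```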
The thing that happened, though, was they rolled out GPT-5 for everybody. Even free users got GPT-5. The other models disappeared, so you no longer had access to GPT-4o or o3 or whatever model you were using and liked before. And that caused a lot of backlash. (I'll turn to that in a second, but a little bit more on the details of GPT-5.) On the benchmarks—on the legal benchmarks, the Vals study has GPT-5 now as the top model, coming in just above Gemini 2.5 Pro, which is just above xAI's latest model. And it's got PhD-level smarts. If you've read any of the press around it, it's especially good at medicine and science and math. I don't code, but they say it's a lot better at coding. And the reasoning is better. They claim it's better than o3.
It has a very large context window. And it also is a little less sycophantic than GPT-4o (at least that's the claim). I think I've witnessed that myself—it's less likely to say, "Oh, what a brilliant question. You're so smart; let's go dig into that together." Instead it just kind of gets to it.
Jen Leonard: Unlike me. (I tell you that after everything you say.) But I genuinely believe that—I think you're brilliant. I'm an older model, though.
Bridget McCormack: That actually works quite well for me. But as it turns out, not everybody loved this change, and there was quite a backlash over the weekend. I felt like Sam Altman and all of OpenAI were in crisis-communication mode. A lot of people really resented losing access to those models that they had gotten used to. Some of that sounded emotional—like they had ended up really liking the conversations they had with one of those previous models. Some of it is more practical: I know people who built specific GPTs on those previous models, and when there's a new model, they don't always work right. It might have helped to have a little notice that, "Hey, we're transitioning these models, so what you built might not work, and let's figure out a solution for that."
There was enough backlash that I think OpenAI actually reverted and now allows paid users to have access to GPT-4o again. So I think the question is: is there anything that this means differently for lawyers? I'll be interested to hear what you have to say. I think the answer is no—although, you know, it's better than any previous model on legal work and does better reasoning work. And I, as always, have asked this model (and the latest Gemini model and Claude) some of the same legal questions just in the last week. I do see a significant difference in GPT-5, I really do—I think it's doing more robust legal work. But it's not good enough to trust without checking. You could have the best 1L summer associate or new associate in the world—you still wouldn't file a memo they wrote directly with a court without checking it. I mean, there's a reason why lawyers have the training we have and our license to practice: it's so we can do some QC on that work. So I think you can think of it as a great first start, but I still think lawyers need all the same safeguards that they needed with the previous models.
What have been your experiences, and what advice do you have for lawyers on GPT-5?
Jen Leonard: Yeah. I mean, I won't add anything to the model release—I think it was kind of like going from a stick shift to an automatic: before, you were able to move back and forth between the different models, and now it does it for you. In terms of lawyers, for most lawyers I would say don't worry about trying to keep pace with the newest model. All the things that you and I advocate and teach remain the same: experiment; share with your colleagues and others what you're learning, where the limitations are, where the benefits could be. And as you said, all of the ethical obligations to check the outputs over and over again, to not rely on anything that comes out of it—to treat it and supervise it as you would an inexperienced lawyer—remain. As you noted, the benchmark puts GPT-5 at the top of the leaderboard of all the models out there, and it continues to improve. We have those slides we use in presentations from the Vals benchmarking back in March, which put human lawyers at around 70% accuracy on a variety of tests. A quick look at the Vals benchmarking puts GPT-5 at 84.6% accuracy on that leaderboard. I'd want to dig into the data and see exactly what the tasks are and how that compares with those earlier lawyer benchmarks.
But as we've said at different points (and to the point of your AI Aha!), as a species we are not going to improve exponentially during our lifetimes—but these models will continue to get better and better and more reliable. So maybe, to give myself a little grace about my AI Aha!, the challenge will be figuring out where we still want to spend our time with our naked human brains, and where we still want to rely on them, as these models continue to improve.
And for firms that do have access to legal domain-specific tools: I've seen posts across social media that certain tools built on top of OpenAI's models have already been upgraded to GPT-5. I thought it was interesting that in its press release on its website announcing GPT-5, OpenAI specifically called out legal as a use case that would become stronger as a result of this model upgrade. And they have relationships with different tool providers in our domain. So it's certainly something they're aware of and looking to capitalize on.
Bridget McCormack: Yeah, that's interesting. I'm sure we'll follow this and have more conversations about GPT-5 in many of our future episodes. But for now, I'm two thumbs up on GPT-5 (even though I followed all of the backlash).
Regulating AI and the Practice of Law
Bridget McCormack: But that takes us to our main topic, which is an interesting new report from IAALS (the Institute for the Advancement of the American Legal System). I should say that I am on the advisory board of IAALS—just want to make that clear. That's not why we're talking about this report; in fact, I think you were the one who asked if we should talk about it, and I agreed.
IAALS has a new report on the regulation of legal services—specifically, the unauthorized practice of law statutes and how they intersect with AI—and it's worth spending a little time on. So why don't you tell us a little bit about what the report does?
Jen Leonard: Yeah, absolutely. As you mentioned, Bridget, IAALS does independent research and is focused on trying to innovate and advance solutions to make the civil justice system more just. They're based at the University of Denver, and at the end of last year they had a gathering focused on regulating the use of AI in consumer-facing technology.
What I really appreciate about IAALS' approach is that it is trying to move us from a more lawyer-centric approach to a more consumer-facing approach to regulation. And the report talks about how important that is in an era of AI where the technology is moving so quickly—especially because, as you and I both know and talk about frequently, we can't overstate how grave the civil justice crisis in the U.S. is. Most Americans and small businesses can't afford legal help. And you, I, the team at IAALS, and the people who contributed to this report really believe that this suite of technologies offers an unprecedented opportunity to put tools in the hands of those individuals and small businesses that allow them to help themselves access legal services.
So the report digs into some tools that are already doing just that, like Courtroom5, ZAF Legal, Roxanne, and Rentervention, and it gives some nice use cases and case studies. It also talks about some states that are starting to use AI-powered chatbots to help self-represented litigants (like Nevada and Arizona).
It addresses all of the risks that lawyers are well aware of—the things that we're concerned about: hallucinations, biases, privacy concerns, those kinds of things. But it really takes a positive perspective, I think, and asks: how do we take this moment to expand access to legal services, while recognizing (as the report does) that the public already turns to the internet for legal help? Even before AI emerged onto the scene, people regularly turned to the internet because they don't have access to lawyers. And we can assume that they're already turning to AI—not only because ChatGPT is here (and we don't yet have great data on ChatGPT usage for legal help), but also because Google now includes AI Overviews (AI-generated summaries) as part of its search results.
So we know that people were using Google to search for legal help before, which means they are now necessarily seeing those AI-generated summaries. And the report points to some research that Margaret Hagan from Stanford presented during the gathering, showing that trust in AI is growing because of its affiliation with big tech companies—especially Google (since the public trusts Google and internet searches generally). But it also showed that people trust human legal support over AI (or technology generally), especially for more sensitive matters like domestic violence proceedings or divorce (family law contexts, criminal law contexts—things that have real human stakes). So the question that the group posed was: how do we take that tendency to trust human legal support and leverage it to play a positive role in helping to educate the public and, where appropriate, figure out what regulation might look like in an AI era?
Maybe that means partnering with technology providers, maybe creating some soft regulation. But at the outset, it means raising important questions that we need to answer and gathering data around how people are using AI. I view this report as really structuring a framework around those questions. And it concludes with a proposed solution in two phases.
Phase one it describes as "soft power" actions. Those actions include issuing best-practice guidelines for developers, deployers, courts, and consumers; launching regulatory sandboxes so we can test tools in a safe and monitored environment (which several states—Arizona, Minnesota, and some others—have already done); collecting more robust data on consumer outcomes; exploring partnerships with big tech companies to improve legal AI; and using prosecutorial discretion and safe harbor policies to give innovators breathing room.
(I'm editorializing here, but I'd highlight this because the tendency is often to be overly regulatory and go after anybody perceived to be engaged in the unauthorized practice of law without asking whom we're protecting—whether we're trying to protect lawyers or actually trying to expand legal services.)
Then the second phase of the recommendations, after we've done that and learned more, raises the question of what formal regulation, if any, should follow. Some of that formal regulation could include adjusting unauthorized practice of law (UPL) and ethics rules if needed, creating certification systems for AI legal tools, developing a risk-based regulatory framework (with different rules for high-risk cases versus low-risk cases), and pursuing uniformity across states.
One issue the report talks about—and that I also think is very problematic—is that there are so many different jurisdictions and so many different UPL rules and frameworks, while technology is boundaryless and constantly changing. We just talked about GPT-5; most groups I talk to don't even know what a reasoning model is, and the reasoning models are almost a year old. So how can we try to apply all these different frameworks to a technology that's moving so quickly and doesn't respect the boundaries of different jurisdictions?
So that's my summary of what the report said. I think centering the consumer and the public is the right approach. It offers a lot of thoughtful questions and ways to think about the future. I'm curious about your thoughts on the conversation it tees up for the profession.
Bridget McCormack: I think it's a great contribution to a conversation that we all need to be having. And it feels obviously timely—people are using the technology whether the law allows them to or not. They’re doing it. So whatever the gray area is, I don't think there are many people who are going to slow down and say, "Gosh, I can't afford a lawyer, but I'm not going to go get an answer from GPT-5 (which I can now do on a free version), even though that might help me a lot in figuring out how to navigate this problem.” It’s just not going to work that way.
So it makes sense for us to at least start having a conversation about whether some of our regulatory architecture is anachronistic. And if it is, maybe we want to think about what it looks like to get from here to where we should be. And IAALS is just exactly the right organization to start a large conversation about that. I'm really glad to see it.
It comes at a time when we're seeing other developments that feel like they're bubbling up to allow this conversation to break through. A number of state supreme courts recently have piled on to what seems to be a growing trend of allowing community justice workers—people other than lawyers—to provide some legal information, advice, and help in certain kinds of cases where the consumer is not going to have access to lawyers. And that makes a lot of sense. D.C. did it recently, bringing the number of jurisdictions that have done that into double digits. So that feels like it's moving in one direction.
The Conference of Chief Justices released something (I don't know if it was a formal statement or something else) in support of that kind of program. And I think just yesterday, at the ABA annual meeting, the ABA passed a resolution encouraging states to create community justice worker programs (which really means encouraging state supreme courts, since state supreme courts are the ones that regulate the practice of law). And the National Center for State Courts has a partnership with the Thomson Reuters Institute (TRI) (full disclosure: I'm on the advisory committee of that project, and I chair the Access to Justice subcommittee), and we're also working on a paper about UPL and AI. That will probably be out within a few weeks as well.
So those are some pretty important institutions starting to say, "It's probably time that we rethink this part of our regulation." I still struggle with what the dance is between state legislatures and state supreme courts, right? Because UPL statutes are statutes. So what's to stop a legislature from just repealing that statute? Say the Pennsylvania legislature shows up for work on Monday and says, "This UPL thing doesn't make sense anymore because technology is now doing all the kinds of things the statute says only licensed lawyers can do. So this doesn't make sense," and they just repeal the statute. Does that give innovators in Pennsylvania a wide-open lane to figure out how to build products that really help consumers who need legal information and legal advice but can't afford lawyers?
Maybe, but maybe not—because the court ultimately has inherent power to regulate the so-called practice of law, even in the absence of a statute. So it's still a gray area that needs discussion.
I'm really happy that IAALS spent the time on this and released a report that has lots of practical solutions for courts, bar associations, and innovators in jurisdictions that want to try to get from here to where we need to be.
Jen Leonard: Absolutely. And just to state the obvious—and I know, as you mentioned, there are so many different groups working on these solutions—it seems to me that the disconnect now (and it was present when we had Google, but now it's a little further along) is in the workflow. People will now be able to say to ChatGPT or Claude or whatever the model is, "Here's my legal issue," and then increasingly the models will be able to do some basic research for them and help them figure out what the next steps are. But then there's the gap between that and the actual activity that needs to take place, and it seems like that's where the gap will remain. I assume that's where these community justice workers and various paraprofessional roles (different from lawyers) come into play. Say you have ChatGPT draft something for you that you could file with a court—and this goes back to the conversation we had with the Garfield Law founders in the UK, because they have an API connected with their courts, whereas we still have a physically routed system. You still need an actual three-dimensional human to help you through the system, which to me feels like a whole different problem to solve (one that could have a more seamless solution if we had more of a tech/API integration approach here). But you would still need humans.
Bridget McCormack: Yeah, you still need humans. Again, it comes back to the "middle of the middle" instead of end-to-end. If it's available—if there's a community justice program or... I've been trying to think of what a legal tech justice supervisor could be. I'm trying to envision whole new categories of legal workers, what it would take to train them, and then how to scale a business model that has nothing to do with the traditional law firm model.
What would that look like? How could they make it work? Who might hire them? You could imagine different community organizations thinking, "Yeah, we'll take a subscription to your legal justice supervisor program." And if you have help both at the front end and in helping the person frame the questions right—like, "I think I'm about to get evicted, but I haven't had heat for three months. What can I do?"—they still probably need help asking the right question. I’m asking GPT-5 and Gemini sophisticated legal questions, because I want to see the difference in the work that they do. But I've been a lawyer and a judge for a long time; if I didn't have that training, I'm not sure I would know the right questions to ask. So I still think there's a front-end and a back-end role for humans. It's just not clear to me why those humans necessarily need to be licensed lawyers for all issues and for all people, right?
So maybe some combination of this community justice worker groundswell and rethinking the role of technology and UPL and regulation of the practice of law could create some breakthroughs—which would be amazing.