The Future of Generative AI in the Court System & The Importance of Transparency

Summary

In Episode 6, we dive into "The Future of Generative AI in the Court System & The Importance of Transparency." Join hosts Jen Leonard and Bridget McCormack as they explore how generative AI could transform the judicial system by assisting in drafting opinions, generating ideas, and proposing alternative solutions to legal challenges.

We break down key terms such as alignment and attention mechanisms, and discuss Judge Kevin Newsom's use of ChatGPT in a concurrence, which has sparked debate on the role of AI in judicial decision-making. Transparency and public confidence emerge as critical factors in the responsible use of AI in the courts. As this technology is still in its infancy within the judicial system, we emphasize the need for ongoing discussion around best practices and ethical considerations.

Key Takeaways

  • Generative AI has the potential to be a valuable tool in the judicial system, aiding in tasks such as idea generation and alternative solutions to legal problems.
  • Judge Kevin Newsom's use of ChatGPT in a concurrence sparked conversation and debate about the appropriate use of generative AI in judicial decision-making.
  • The use of generative AI in the judicial system is still in its early stages, and further exploration and discussion are needed to determine best practices and ethical considerations.

Transcript

Jen Leonard: Welcome back, everybody, to 2030 Vision: AI and the Future of Law. I'm your co-host, Jen Leonard. I am thrilled, as always, to be joined by Bridget McCormack, President and CEO of the American Arbitration Association. On this podcast, we talk about all of the fascinating things unfolding in the world of generative AI—we try to keep pace with it, and we share some of our insights and impressions along the way, in the hope that it's helpful to everyone in the profession as you're also trying to get your arms around generative AI and the future of law. Today's episode is our second in a two-part series on courts and generative AI, so we'll talk specifically about judicial use of generative AI for drafting opinions.

But before we jump into that topic, as we always do, we'll start with two other segments. Our first is what we call our Gen AI Moments, which are the times since our last recording when each of us has used generative AI and found it to be really stunning, almost magical, and very helpful in our day-to-day work. Then we'll share a few definitions that are common in the world of large language models, and then we'll dive into today's substance. So, Bridget, I'd love to hear about your Gen AI moment since our last recording.

Gen AI Moments

Bridget McCormack: Absolutely. It's great to be here—I'm excited for today's conversation. I have to give credit for my Gen AI moment to my collaborator, Zach Abramowitz. Zach and his firm, Killer Whale Strategies, have been working with the AAA—like you and your firm, Jen—to help us think about what dispute resolution is going to look like in the future. He started getting really interested in Frances Kellor, who was the founder of the AAA in 1926, and he fed her writings—books and other articles—into a generative AI platform and built a Frances Kellor bot. Now we can ask this bot questions about some of the things we're thinking about doing in the future.

Frances Kellor was fascinating. She worked on immigration reform, women's rights, criminal justice reform... in the 1920s, she was an out lesbian who saw dispute resolution in courts as imperfect in lots of ways for lots of people and lots of disputes. So she founded an arbitration organization—at the time it was really disruptive. Asking Frances Kellor today what she would think about some of the new ways disputes can be resolved is absolutely fascinating, and also lots of fun. We're going to keep working on it, and Zach's going to keep improving it, because we have a 100th anniversary coming up. I think it'll be a fun tool for our community to play with.

Jen Leonard: That's really cool—it's like echoing through history. Have you asked about anything interesting, or received any answers that were particularly surprising?

Bridget McCormack: Yeah. As you know, arbitration is a mature form of dispute resolution—and mediation is, too—but mediation has been outpacing arbitration in terms of people's interest and adoption. Asking Frances Kellor what she thinks about those trends, and whether there are benefits to mediation for certain kinds of disputes or certain people and organizations, has been really fascinating. She really viewed alternative dispute resolution generally as a path for peace—peace in society. It's really interesting to hear how she would think about new ways to achieve more tranquility, more peace. It's really fun.

Jen Leonard: Really cool. My example is pretty lame after that! But I was reading a book for fun, and the book kept referencing this phrase "the plastic hour," which I hadn't heard before. I don't know if you've heard that term. Basically, it's the concept that there are times in history where a convergence of technological, social, and political forces opens the possibility for change in how structures and systems function. You have choices as a society about which direction you want to go in. I thought it was such an interesting phrase, and I hadn't heard it before.

I was working on a speech about a similar issue, with a similar theme, and I wanted to invoke the term—but I wanted to make sure I wasn't using it inappropriately or that it didn't have some kind of checkered history. So I used Perplexity to research the background. It turns out the phrase is attributed to different people, ranging from Karl Marx to some Jewish philosophers, and a writer for The Atlantic—George Packer—used it a few years ago in the wake of the pandemic to describe us as being in a "plastic hour." Using Perplexity was a helpful way to quickly do some research and also verify that it was appropriate for me to use. I found that helpful and interesting, and I love using Perplexity to grab articles really quickly. So that was my Gen AI moment for this session. So, do we want to jump into a couple of definitions for this week?

Definitions: Attention Mechanism & Alignment

Bridget McCormack: Yeah, let's do two definitions. Why don't I start by asking you to teach us what "attention mechanism" means in the context of generative AI?

Jen Leonard: So, as I understand it, an attention mechanism was the breakthrough that led to the era we're experiencing now. It came out of a 2017 research paper from Google called Attention Is All You Need. It's a little different from the way we think about attention colloquially, as I've tried to understand it myself and explain it to others.

It's the equivalent of being able to look at a library full of millions of books and, instead of having to read them sequentially and then draw connections at the end, the large language models can simultaneously consider all of the tokens—all of the words—across those millions of volumes and make connections in real time. I almost envision looking at a stack of books and being able to instantly see that this one connects to that one over here and another one over there, almost magically. That's what makes it so efficient at drawing connections and predicting the next word or token. And that was the breakthrough—the precursor to the release of ChatGPT, which then accelerated our understanding of how to use LLMs powered by the attention mechanism. Is that consistent with how you understand it?

Bridget McCormack: Yeah, although I confess I barely understand it myself. I've heard so many people talk about that Attention Is All You Need paper, and I've never attempted to read it—I feel like I probably should. Have you read that paper?

Jen Leonard: I have not—I should, for sure. But I have used generative AI to help me come up with analogies to describe what an attention mechanism does. One analogy it gave me helped with the library example, but it also said it would be like having an old-school detective go by themselves to a really complicated crime scene and try to jot down everything they see and then draw conclusions—versus having a thousand detectives at the same crime scene, all drawing connections with each other at the same time. You can imagine how much more quickly and effectively you'd solve the crime. So that was interesting.

Bridget McCormack: Yeah, that's great. I love that analogy—that's super helpful. 

Jen Leonard: I definitely would never have thought of it on my own. So, Bridget, another definition we hear frequently is "alignment." What is alignment?

Bridget McCormack: Alignment is the goal of making sure AI systems behave in ways that are aligned with human values, human ethics, and human intentions. It's a foundational concept in AI safety and ethics, and people who work in those areas talk about it regularly. There's always a concern that AI systems will act on their own in ways that aren't in sync with human objectives, interests, and values. So AI companies always have safety teams working on alignment. Do you think I got it right?

Jen Leonard: That's how I understand it. And I think it's a good definition to lead into our conversation about judges and courts—whom we trust to uphold our values and interpret people's rights and obligations. 
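For readers who want to see the "library" intuition from the attention discussion above in miniature, here is a small, self-contained Python sketch of the scaled dot-product attention step: each token's key is scored against a query, the scores become weights via softmax, and the output blends every token's value at once rather than sequentially. The vectors here are made-up toy numbers for illustration, not values from any real model.

```python
import math

def softmax(scores):
    # Turn raw match scores into weights that sum to 1.
    exps = [math.exp(s - max(scores)) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attention(query, keys, values):
    """Scaled dot-product attention for a single query.

    Scores every key against the query in one pass (no sequential
    reading), then returns a weighted blend of all the values,
    along with the weights themselves.
    """
    d = len(query)
    scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(d)
              for key in keys]
    weights = softmax(scores)
    blended = [sum(w * v[i] for w, v in zip(weights, values))
               for i in range(len(values[0]))]
    return blended, weights

# Toy example: the first key closely matches the query, so the first
# value dominates the blended output.
output, weights = attention(
    query=[1.0, 0.0],
    keys=[[1.0, 0.0], [0.0, 1.0]],
    values=[[10.0, 0.0], [0.0, 10.0]],
)
```

In a real transformer this happens for every token's query simultaneously, across many layers and "heads," which is what makes the all-at-once connection-drawing Jen describes possible.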

Main Topic: Thomson Reuters Survey on Judges' Views of Gen AI

Jen Leonard: Just to kick off that conversation, there was a recent Thomson Reuters report that surveyed court administrators and judges for their impressions of generative AI tools. The results were pretty interesting.

They were similar to the rest of the profession in that the primary sentiment was neutrality—a sort of "wait and see" approach to whether generative AI is actually helpful, useful, and ethical to use. But unlike the rest of the profession, court administrators and judges were more likely to report that their second reaction was concern. By contrast, for law firm lawyers the second reaction was optimism and hope that these tools could be used in positive and helpful ways. I thought that was interesting.

Two other things came out of the report that I noted. First, the most frequently cited reason for skepticism or concern was that people were worried that the first-person accounts of litigants—and the language litigants use to describe their experiences, which is so essential to determining people's rights and obligations—would be lost if they substituted computer-generated language for human language. And second, people just aren't sure how they should be using generative AI tools in the work they do for the courts. So, as someone who spent a very long time leading a court (the Michigan Supreme Court, of course), what are your reactions to those findings?

Bridget McCormack: I'm surprised that even 18% of courts report having any training. I've said before that I think courts are at such a disadvantage because they rely on public funding—and public funding is a political process—for all of their resources. AI tools are expensive and hard to come by, even for the most well-funded organizations. So I feel like courts are at a real disadvantage in trying to keep up and give their teams the resources to learn how to use these tools.

To me, the concern judges expressed is probably more a reflection of that disadvantage. It's just very, very hard for courts to keep up with this technology given how quickly it's moving and how hard it is to figure out how to use it well. But I don't think they'll ultimately stick with the concern about losing the litigants' voice. There's plenty of room to hear a litigant's voice and also improve your operations. I feel like that particular concern can be addressed—if the resource issue can be solved.

Jen Leonard: Yeah, agreed. We've talked in the past about our shared concern that the private sector will move with these tools much faster than public institutions will—courts especially.

Bridget McCormack: One tool that some courts are using—I don't know if we mentioned this in the last episode—is Clearbrief, a legal AI product. I know you've met Jackie Schafer, who's the founder of that company. This product focuses on the record—the facts—and it's a great tool for courts. Jackie has had really good success with the few courts that are willing to use it; they've seen what a huge benefit it is for their operations.

When more and more products like that are out there—products that courts can see will make their operations better, smoother, and faster—I predict we'll see a growing comfort level with the technology.

Jen Leonard: Yeah, Clearbrief is a great example of an early emerging tool that makes use of all the rich data we have in the profession. In a lot of other spaces, we're still grappling with data and how to package it in a way that's useful for LLMs. But Clearbrief actually uses the courts' dockets and information, as I understand it, and it can support both courts and litigators. Is that accurate?

Bridget McCormack: Yeah, that's exactly right. If you're willing, you can even share it among all of the parties in a dispute. What it does is pull together all of the facts and then show you where each fact comes from in the record.

I always think there are so many opportunities for litigation to be less adversarial. If we could all agree up front on which facts are not in dispute and which ones are, it would save so much time and energy—so we could focus only on the points where people are going to disagree. Clearbrief feels to me like one of those tools that could really bring some efficiencies to the adversarial process.

But one judge on the Eleventh Circuit made some news earlier this summer in a concurrence. Judge Kevin Newsom used ChatGPT to try to determine the ordinary meaning of a term in a statutory interpretation case. Can you walk us through what Judge Newsom did there and why it got so much attention?

Main Topic: Judge Newsom's Use of ChatGPT Sparks Debate on Disclosure in Courts

Jen Leonard: Sure. The case was an insurance dispute, and part of the holding hinged on whether an in-ground trampoline qualified as "landscaping" under the insurance law at issue. Judge Newsom had been consulting dictionaries and working with his law clerks to figure out if a trampoline would be considered landscaping, and he couldn't find any definition or interpretation of "landscaping" that resolved whether a trampoline would qualify.

So he turned to ChatGPT. In his concurrence (which is quite lengthy), he explains how he came to consult ChatGPT. He found ChatGPT particularly helpful in resolving the question. He actually asked ChatGPT, "Could an in-ground trampoline be considered landscaping?" And the AI answered—quote—"Yes, installing an in-ground trampoline can be considered a part of landscaping. It's a deliberate change to the outdoor environment, often aimed at enhancing the overall landscape and usability of the area." That's the response he got from ChatGPT.

In his concurrence, he talks about how helpful that was, and he viewed it as a more comprehensive take on how the word would be used in common parlance—something he prioritizes in his opinions. So it caused a lot of buzz. He was very transparent about his use of ChatGPT and even shared his prompts and the AI's responses in his concurrence. Of course, it stoked a lot of conversation across the profession. A few outlets—Law.com in particular—surveyed other judges who are on the cutting edge of using technology and asked for their thoughts. Those responses ranged from really celebratory (commending Judge Newsom for being so forward-thinking and showing the profession how it could be used) to other judges who were less certain that LLMs are appropriate—partly because of the biased data underlying their training, and partly because of the limitations and the hallucinations that LLMs can produce.

Bridget McCormack: Yeah, I thought it was fascinating. I'm really glad he did it and started the conversation. I feel like it was an important conversation starter—not only about the ways this technology might be useful in helping figure out ordinary meaning of terms in regulations or statutes or even constitutional text—but more than that, how judges think about those subjects.

I mean, you read a lot of judicial opinions where there's a majority opinion and a dissent, and both sides are supremely confident that they've determined the correct meaning of a particular word. It always seemed to me that it's obviously a really hard question if really smart people with the same training are coming to different conclusions about what a particular term means. One view carries because it's a majority-rule industry, but obviously the tools that judges have are imperfect, right?

Dictionaries are a typical one judges look at, and there's plenty of criticism from scholars that judges can cherry-pick dictionaries depending on the particular outcome they're looking for. I'm not saying those are fair critiques; I'm saying they're critiques you hear pretty regularly.

And I think if you asked people who didn't go to law school to read a judicial opinion where dictionaries were used to define a term they already had an intuitive sense of, they would be scratching their heads. They might say, well, we kind of know what that means, and here's what it means.

Judge Newsom's use of the technology felt to me like the kind of opinion a reader who didn't go to law school might react to by saying, yeah, that actually makes sense, and I can accept that decision because it rings true with me.

It evokes an earlier movement. Five to seven years ago, a number of courts began using corpus linguistics—the Michigan Supreme Court was one of them, in an opinion drafted by one of my colleagues, Justice Brian Zahra. Corpus linguistics is a method of linguistic analysis that uses large collections of texts (not legal texts, just texts), called corpora, to study patterns, usage, and structure in language.

In a way, I feel like corpus linguistics in judicial decision-making was the precursor to ChatGPT in judicial decision-making. There are important distinctions. The texts in corpora are knowable, so the method is totally transparent; we don't know all the training data in LLMs. And obviously the technology is different: statistical analysis in corpora, neural networks in LLMs. Beyond the Michigan Supreme Court, the Utah Supreme Court issued at least one opinion (I think more than one) using corpus linguistics, and Justice Breyer did it in one opinion as well.

I think the thinking there was: it's a bigger set of texts to help figure out what a particular term means. By looking at the way ordinary people were using language in the relevant period—when a particular statute was passed—we can get closer to what they would have understood the term to mean. I was persuaded that it was a useful tool at the time, and I'm similarly persuaded that this technology is at least another tool that judges could, and in some cases probably should, use. Transparently, again: telling their readers and colleagues what they used and what their prompts were, to help understand terms that are not perfectly clear.

I don't know—it’ll be interesting to see if other judges take up Judge Newsom's invitation. I think that’s how I read it—it was an invitation to start at least thinking about the ways in which this technology might help us with understanding ordinary meaning.

Jen Leonard: Yeah, I had not been familiar with that movement, so thank you for introducing it, because it does sound very similar. It sounds like a somewhat more laborious way to do it but, as you said, a bit more transparent—and similar in spirit to LLMs. That's really quite fascinating.

Bridget McCormack: Yeah, it's all going to get more interesting too. I do think the ways these LLMs can make decisions are going to be something we all keep talking about. I don't know if I mentioned this to you, but so many appellate-section educational programs are focused on this topic: when is it appropriate for generative AI to be used by courts and by judges? So I think it's a topic that we're going to...

This is not going to be our final episode on this topic. I feel like we'll be coming back to this one. I recently got to talk to Adam Unikowsky, who is a partner at Jenner and writes a really interesting newsletter that I encourage people to go look at. On this topic, I should say: he's an appellate litigator. He practices in the U.S. Supreme Court, in federal appellate courts, and in state supreme courts.

Full disclosure: he appeared before me once in the Michigan Supreme Court, although I didn’t remember that when I talked to him about his work. In this context, he reminded me afterwards—and he did a great job.

But he ran a great experiment—a really interesting one—where he fed all of the merits briefs from all of the cases argued in the U.S. Supreme Court this term into Claude Opus, which was the most advanced model when he did his experiment. He asked Claude to write a three-paragraph opinion deciding each case. In almost every case, it wrote an opinion consistent with the way the U.S. Supreme Court decided the case, and it did an excellent job.

In a few cases where it didn’t write an opinion consistent with the U.S. Supreme Court, Adam himself was persuaded by Claude’s opinion more so than the U.S. Supreme Court’s opinion—where it was a close question and justices disagreed.

What was even more interesting was he went beyond that and then asked Claude to come up with a different solution to a particular legal problem in a particular case that neither party suggested. And he said, “Will you come up with another way, a new test that neither party suggested, for how we could decide this hard legal question?” And it’s fascinating what Claude came up with.

And Adam Unikowsky—who, again, is a very accomplished appellate lawyer—said that Claude is absolutely capable of doing the work of the fanciest appellate law clerks we have right now, and maybe even a justice.

You know, that’s not saying that it doesn’t still have some of the problems we’ve talked about many times in the course of this podcast. But it is a stunning experiment he ran very quickly, very easily, that really tells me there are going to be more and more uses for this technology—for courts, for arbitrators, for mediators, for anybody trying to come up with creative solutions for problems that people, organizations, and businesses are facing. It’s going to open up all kinds of new possibilities, which I think is a really hopeful thing. I’m excited about it.

Jen Leonard: That's really cool. Its persuasive ability is one of the use cases that's already live and ready to deploy. I've had this experience too, outside of crafting judicial opinions (which I don't do, for fun or otherwise): I'll have it help me through some interpersonal dynamics, where I'm trying to write an email or come up with a solution, and it will create an argument—or not even an argument, since it's not a dispute—a position and a perspective that I hadn't thought of before, because it's objective in that way.

I'm anthropomorphizing it now, but it's not held back by the emotions you have as an individual or by the limits of your own perspective. It will say, "In this framing, both parties to the transaction would have a shared responsibility, and therefore you can make the case that it would be more beneficial to do X." And it might be something I hadn't thought of before. So I could see litigators at trial and appellate courts, and judges, using it in just the way Adam's suggesting.

Bridget McCormack: Yeah, I agree completely. I mean, one of the things that it does well—everyone will say this technology does well—is idea generation. We do a lot of idea generation at the AAA around hard questions and hard problems, but now we always have the help of our LLM friends. And it is just really stunning how many ideas it can come up with, and how quickly.

And so it sort of makes sense that in trying to decide a hard question that two parties disagree about, it’s going to be an excellent new tool in thinking of new solutions that nobody has brought to the table. I really do think there’s an exciting future for this technology in resolving disputes.

Jen Leonard: Very cool. You know, I think it was clever that Judge Newsom used a concurring opinion to sort of lay out how he used this. It didn’t impact the ultimate holding, but as you said, was an invitation to think about how LLMs could be used.

And on the heels of all the stories about it, I met a judge at a conference recently, and—because I was curious about judges using generative AI in their opinions—we talked about whether judges would need to disclose that they're actually using Gen AI. People in the conversation had different perspectives on that.

One perspective being that you would have to disclose usage so that appellate courts would know when the human judge created content and when a Gen AI created content. It all makes me wonder, though, if we’ll look back on this era as sort of like the dawn of all of this—and it will become so infused in what we do that it will seem sort of laughable down the road that we thought we would have to disclose things like that. But I wonder what your perspective is on that topic.

Bridget McCormack: Yeah. I understand the reaction of the judge you spoke to, even if I don't share it. It is important in appellate litigation for the appellate court to have the information the trial court had when it made its decision, if it's going to be able to evaluate that decision properly. But I'm not sure that a requirement for trial judges to disclose their use of the technology in the course of their decision-making really follows from that.

I mean, trial judges don't record exactly what work their law clerks did to help them come up with a particular decision in a case. They don't disclose prior work of their own from some other context that they're drawing on for this case—say, where they've already researched the same question in a different matter. And they don't disclose their use of Westlaw or other legal research tools, which more and more have this technology built into them.

So I don’t know that I agree with the conclusion, even though I understand where it’s coming from. I think your prediction is correct—that five years, ten years from now, we should replay this conversation and it’ll seem a little naive at the front end of all of it.

Jen Leonard: Yeah. I think it's a great note to end on, because it touches back to that Thomson Reuters survey about education and literacy. I could see some situations—like Judge Newsom outlining how he used it—where disclosure would be really relevant, right? If the case will hinge on it, you want to know where the definition or the concept came from.

But to your point, these tools will be infused in a lot of the day-to-day tools that we use. And I can imagine appellate law clerks drafting opinions, workshopping the language, and then plugging it into the opinion without the judge even knowing—in the same way judges already review a clerk's writing without knowing exactly how the clerk chose their words, and make whatever edits they want to make.

But all of that requires a lot of literacy and training and conversation. So it’ll be a really interesting road ahead.

Bridget McCormack: Just to add one more point. I guess my instinct is in favor of as much transparency as possible in this liminal period we’re in. I sort of feel like we’re in this in-between moment. There’s going to be a new normal—when this technology is in all of our iPhones and all of our laptops and it’s our partner all the time.

But I think along the way—and especially for institutions like courts, or even in alternative dispute resolution—the more transparent we can be with the people we serve about how we're using the technology, the better. Whether it's required by appellate courts or not, I think transparency is going to be valuable to building public confidence in what we do.

Jen Leonard: Yeah. Well, this certainly won’t be our last conversation about courts using generative AI. There are lots of interesting dimensions to it, and we’re grateful to Judge Newsom for getting the conversation moving and advancing it. And we’ll look forward to future conversations as new issues emerge on using generative AI in the judicial and court system.