Episode 26 of 2030 Vision: AI and the Future of Law delivers an eye-opening exploration of artificial intelligence's expanding role in legal education, professional practice, and daily productivity. Hosts Jen Leonard and Bridget McCormack dive into a new study by the University of Maryland revealing ChatGPT’s top performance on law school exams and unpack what that means for legal training, employment, and AI integration into everyday work.
Summary
This episode centers on a remarkable development: OpenAI’s latest reasoning model—referred to as O3—earned multiple A+ grades on law school final exams, a significant advancement over prior versions. The hosts dissect how this milestone reshapes the landscape of legal education.
Jen and Bridget also shared their personal “AI Aha!” moments. From crafting interactive workshops to navigating nonprofit funding opportunities, their experiences highlight how AI is no longer a mere tool, but a constant collaborator embedded into professional workflows.
The conversation also covered broader developments in the AI ecosystem, including Meta’s multibillion-dollar investment in Scale AI and OpenAI’s integration across productivity platforms. These shifts underscore the increasing reach of AI and its strategic importance across sectors, including law.
Key Takeaways
1. ChatGPT Earns Top Marks in Law School
OpenAI’s O3 model not only passed but excelled in a range of law school exams, outperforming human students in some cases. This is a sharp improvement from earlier versions, which scored in the B–C range. The study also revealed an unexpected result: when fed a professor’s extensive lecture notes, the model’s performance declined—suggesting that more information can sometimes reduce accuracy, a phenomenon with clear parallels to human cognition.
2. From Tool to Teammate
AI is now central to how legal professionals plan, analyze, and communicate. Bridget described how ChatGPT helped her design a dynamic workshop for a diverse legal audience—an effort that previously would have been daunting without AI support. Jen shared how Deep Research tools provided strategic insights into funders and competitive landscape analysis that would have otherwise taken hours of manual work.
3. Meta’s Strategic Move on Scale AI
Meta’s $14.8 billion investment in Scale AI is more than a financial transaction—it is a strategic response to lagging performance compared to competitors like OpenAI and Anthropic. Scale AI’s specialty is structuring unrefined data into usable formats, a service essential for training advanced AI models. This move signals Meta’s intent to close the performance gap and possibly avoid antitrust scrutiny through its minority stake structure.
4. OpenAI’s Full-Life Integration
OpenAI recently introduced deeper integrations with productivity tools including Gmail, Google Calendar, Slack, Microsoft Office, and CRMs like HubSpot. For users who opt in, ChatGPT can now read, write, and summarize content across platforms, dramatically streamlining workflows. While this offers efficiency gains, it also raises security concerns, especially as courts begin examining how AI firms retain and use personal data.
5. Law Schools Face a Defining Moment
The University of Maryland study prompts urgent reflection: if AI can master traditional legal exams, what does that imply for legal education? The longstanding model of one-off, closed-book finals may no longer align with the skills needed in practice. Bridget and Jen argue for a shift toward teaching students how to work with AI—developing legal reasoning skills in tandem with technological fluency and understanding the architecture of law rather than merely memorizing doctrine.
Final Thoughts
This episode makes clear that the legal profession must rethink its assumptions about intelligence, expertise, and training. AI’s ability to match or surpass top-performing law students challenges the status quo in both academia and hiring.
Rather than resisting this evolution, the opportunity lies in redesigning legal education and legal practice to take advantage of AI’s strengths while still developing human judgment, ethical reasoning, and client-centered problem-solving.
As the hosts suggest, the time for passive observation has passed. Legal educators, law firms, technologists, and policymakers must collaborate across silos to build new models for training, hiring, and serving clients in a world where AI is not just part of the process—it may soon be leading it.
“We probably could use this technology to help teach the architecture [of law] ... maybe even better than we have in the past.”
— Bridget McCormack
Transcript
Introduction
Jen Leonard: Hi everyone, and welcome back to 2030 Vision: AI and the Future of Law, the podcast where we attempt to connect all the rapidly moving dots in the land of generative AI and help our lawyer audience understand why all of these dynamics are relevant for lawyers, judges, law professors, law students, and other legal professionals. I'm your co-host, Jen Leonard, founder of Creative Lawyers, here as always with the fantastic Bridget McCormack, president and CEO of the American Arbitration Association.
We have some exciting topics to dive into today, and we’ll follow our usual format, starting with our AI Aha!’s — the things that you and I have each used AI for since our last recording. Then What Just Happened?, where we help people understand some of the big-picture tech advancements and what they might mean for legal. And then we’ll dive into our main topic, which this week I’m really excited about. Bridget, you were the first one to bring this to my attention. It’s a new study about ChatGPT getting its very first A-pluses on law school exams.
AI Aha! Moments
Jen Leonard: But before we get to celebrate our little ChatGPT and its wonderful law school exam performance, let's kick off with our AI Aha!'s. Bridget, how have you been using AI since we last spoke?
Bridget McCormack: I’m now at the point where I’m always having trouble deciding what people would be interested in, because I have to say, it’s so integrated into just about everything I do. It’s in all of my personal and professional workflows now. But I was giving a workshop this week – I would prefer to do them with you – but I do a lot of presentations about AI and dispute resolution, AI in the legal profession, and change management for legacy legal organizations that are trying to stare down this disruption. They’re usually presentations, and I have lots to talk about, and I hope it’s useful. But this audience was a very diverse one – lawyers, arbitrators, academics, and an international audience – super interesting.
And the organizers of the conference were really cool; they packed a lot into the whole day. My session was the final session and they wanted it to not be a presentation, not like “we tell you what you need to know.” They wanted us to do a workshop (and you're good at that). I had a good idea – I think. I mean, my idea was that people are all in different phases. Let’s figure out where they are in the room along the transformation journey. And then let's build an AI maturity model and, like, what are those different stages? (As you know, I’ve thought a lot about this at the AAA, from experimentation to point solutions to integration in workflows to full transformation.) And then for each of those stages, what are the use cases, what are the enablers, what are the risks you’re going to worry about, the barriers that are going to make it hard to get to the next stage, and then what’s the sign that you’ll be ready for the next stage? So, I really liked the idea for the workshop, but was very worried about how to do it in a room full of a really diverse group of people and a large group – like 250 or 300 people. Oh, and also 60 minutes. So start to finish, like 60 minutes.
I started to get all stressed out about it, and then I was like, “Well, wait a minute, I have a friend who can help me figure out the best workflows and how to make sure I could make it truly interactive instead of ending up making it just a presentation that looks like a workshop.” It was a very iterative process. I kept saying, like, “No, I don’t think that’s going to work because of the diversity in the room.” You know, it would give me a set of suggestions, and then I’d say, “Maybe here’s another problem: we’re sitting at round tables, so I can’t get people to move to different quadrants. Oh, and by the way, they wanted an AI recording of the whole thing that would produce something substantive at the end. We were trying to produce a takeaway that people could actually take back with them.”
And I honestly don’t know... I got so many good ideas, and a lot of mediocre ideas, and some bad ideas – like you always do, like you would from a human who was like, “Let’s do this together and throw all the ideas out there.” And it really gave me the confidence that I could make what felt like a big challenge work. I do think it went really well. We actually got through the entire matrix, and we had a final product. I was doing it with the amazing Stacey Gillum (who you know, and she is fantastic), and we were typing it in, in real time, on the big screen as we were getting involvement from the whole room. I really feel like, without the partnership of my ChatGPT coach/mentor/ideator, I’m not sure it would have been as successful. I’ll share it with you.
Jen Leonard: Oh, I would love that. As somebody who does workshops all the time – and before that, taught classes that were really experimental – I think it is the best for helping you design activities for different groups and coming up with even just the timing. I mean, it will automatically suggest timing for you to get through something like that in the allotted hour or whatever you have. So that’s awesome. Congratulations. I also feel like it’s such a confidence booster.
I might have told you this, but I was meeting with somebody about a project, and there was one part of it where neither of us has expertise in this area. And I was just sort of like, “I’m not worried about it. I know that we’ll figure this out, and part of it will be using AI to help us identify the right person to figure it out,” and the person just said to me – they’re like, “God, you have totally changed, because five years ago you would have been so afraid of this project.” And I said, “That’s really interesting, and I think it’s true. It just feels like things are figure-out-able.”
Bridget McCormack: Yeah, when you have this extra help all the time, right? All right. What about you? What’s your AI Aha! for this week?
Jen Leonard: So, I’m like you – I went into my ChatGPT history from the last two weeks because I use it so much. I’m like, I can’t even... they’re all over the map. It’s starting to look a little weird and creepy how many different odd things I ask it about. I’m working on a project right now, and we’re pursuing funding in the nonprofit space – so foundational funding, which is something I’ve not had a lot of experience with.
So, I used Deep Research for two different prompts, and for people listening who don’t know Deep Research, it’s a product (actually, Google and OpenAI both have the same product name). But it’s essentially like having a really skilled research assistant who does really deep dives on the internet and produces a research brief with citations. And I asked it for recommendations for different foundations that might be interested in the topics that we’re exploring, and why they’d be interested. It asked a couple follow-up questions, which helped me refine the search. And it produced this great dossier of different options and why it thought one over the other might be good for this given project. Then, as we were sort of moving forward, I wanted to explore and be prepared for the question of what makes this project different or what else out there is like it. So, I did separate deep research on that. That was also really interesting because it surfaced a few projects that were really similar, but then it also did the next step, which is: “Here’s why your proposal is different and here’s how to talk about it in that way.”
I was listening to our AI podcast guys (Paul and Mike) today, and they were talking about some of the limitations on Deep Research for verifiability and fact-checking and those kinds of things. But I think this was a great example of a place where I didn’t need 100% accuracy. I really just needed to get a sense of the lay of the land. I wasn’t as concerned about the underlying sources being particular sources. It was just something I would not otherwise have the time to do myself and that made me feel better prepared for conversations about it. So, I would say that was my AI Aha!, and learning when to use Deep Research and when I need more rigorous fact-checking and that kind of thing.
Bridget McCormack: Yeah, I still use Deep Research quite a bit. It’s sometimes just a starting point for me on a brand-new topic. And then I can, like, nail down what I actually want to learn more about. I’m not yet great about knowing when GPT-4o is good enough or when I need a reasoning model. That might be something I ask ChatGPT to teach me during one of my drives. I’ll use voice mode and say, “I’m trying to learn about X,” and then just ask questions so that it can, you know, just go back and forth. And it’s so patient. I might ask it if it can teach me about that a little bit. Although aren’t we, one of these days, going to get a model that’s just going to do it for us and figure it out?
Jen Leonard: And then I’m not sure what we’re going to do. We’re just going to hang out.
A shout-out to another one of our friends, Edson, who made a recommendation to me yesterday. We were talking about how the more we use ChatGPT in particular (but any of the big frontier models), the more they remember about us, and the more that memory gets infused into future prompts – unless you turn the memory setting off (depending on whether you have it on in the first place). And I was saying that I once used it to prompt something for my husband and his job, and then I was worried that it’s forever going to think that I do what he does or that I’m interested in that. And Ed suggested running a “memory audit”: asking ChatGPT to share everything it knows about you, and then telling it to forget certain things – which I’m going to try to do before our next recording, because I think it’s a good idea.
Bridget McCormack: It’s really a great idea, because another thing – I know you do this as well – sometimes in preparing for a presentation, I will do a prompt to be able to screenshot what the chatbot’s doing in response to the prompt. But it’s a prompt for a particular audience, and what they might use it for, right? It’s not necessarily relevant to me. I love that idea.
Jen Leonard: I think I’ve shared this example before. I was briefly, two years ago, interested in this job posting, and I think I shared it with ChatGPT and asked it about it. And I’ve since forgotten that I even saw it or asked it, but it constantly sort of brings it up. Ed was saying it’s sort of like: when you date somebody who’s really attentive the first month you date, and they never evolve in their understanding of you – they just keep bringing up things that you brought up on your first date.
What Just Happened: Meta & Scale AI Deal
Jen Leonard: But sort of speaking of things in our broader tech world and the big frontier models (which I know you and I are really interested in, and not a lot of lawyers have the time to follow those developments, which is why we created our “What Just Happened” segment), one of the things that just happened is a big deal – a literal deal – between Meta (Mark Zuckerberg’s company) and an AI company called Scale AI. So, Bridget, maybe you could walk us through the Meta–Scale AI deal and what it means and why people should care.
Bridget McCormack: This story is getting a lot of attention. Maybe because of the price Meta is paying to take a 49% stake in Scale AI. It’s a $14.8 billion transaction. And the founder of Scale AI, Alexandr Wang (who’s 28 years old, by the way – I digress for one minute, but I’ll come back to the main story) – 28 years old! This company was founded in 2016, so he founded it when he was 18, and he just sold 49% of it for $14.8 billion. So, it’s like a $30 billion company that a 28-year-old built.
So, let’s step back for a minute to what Scale AI does. It’s been around since 2016 (so before generative AI) and its business is to clean up data for companies that are building on data. Right. So I think all of the major frontier labs use them – OpenAI uses them, Amazon uses them, Microsoft uses them, Alphabet uses them. They take data from the world – not just language or writing, but also images (like a photo of a car) – and they clean it up for companies that need data that’s more structured than you might find it in the wild. Obviously, this business has become incredibly useful in the last few years when there’s just a need for more and more data to make these models more and more competent, because the more dimensions they are able to incorporate, the better their work is.
So, it’s pretty interesting for Meta to make this kind of investment. It comes at a time when they are also announcing a kind of reinvigoration of their entire AI business. I think what they said was they’re building the team to get to superintelligence – and we’ve talked about this before. I guess superintelligence is one step above artificial general intelligence (when the technology is not only as good as every human at every single task, but it’s better than us at every single task). I think that’s about right.
And it also happens at a time when I think Meta feels pretty behind the other frontier labs. I mean, for a long time, Meta’s AI models were viewed as on par with the other frontier labs. (As a reminder, Meta’s AI models are called Llama.) Through Llama 3, there was generally a feeling that their models were around as good as the models from OpenAI and from Alphabet and from Anthropic. That’s decidedly no longer true, right? Llama 4 has just not lived up to expectations. And I think there’s pretty widespread agreement that it’s not as far along. And I think there are all kinds of theories for that. The Hard Fork guys today had a long segment on what might be going on there – like why complete disruption in the current world might not be a good outcome for incumbents like Meta, which I hadn’t fully thought through. But it clearly is a sign that the CEO of Meta, Mark Zuckerberg, feels like they have to do some catching up, right?
So, they’re making this enormous investment in Scale AI and maybe most of all in its founder, who will now come work on superintelligence, and they’re trying to hire a bunch of other engineers away from other labs. They’re willing to pay a lot of money to hire away some of the superstar engineers to catch up or surpass their competitors. So, we’ll see. At the beginning, Meta made this strategic decision that their models were going to be open-weight models so developers could build on them. And I think you and I have talked about this – if you get the strategy right, if more developers are building on your models, you become embedded in more workflows, right? And so that strategy either wasn’t enough (because there were other pieces missing to the strategy) or just didn’t pan out. I don’t think I know enough to know, but we’ll see if this new investment makes a difference. Anything else catch your attention about this one?
Jen Leonard: There were just two things I thought would interest a lawyer audience – not really related to the practice and business of law (which is usually where we’re focused), but to the legal substance of the deal. I’ve heard a lot of conversation about the structure of this investment and the 49% stake – that it could potentially be designed to avoid antitrust scrutiny. We’re in this sort of weird era where there’s a deregulatory environment (certainly at the federal level), but there are also all of these huge victories – if you’re trying to decrease anti-competitive behavior – particularly around Google and Meta. So this might be an attempt to get around that. We’ll see if there is scrutiny of this deal. These just look like attempts to squelch competition and hire all the best talent.
And I know we’ve listened to other podcasts where I think the hosts have really framed it nicely, that tech in the United States, at least, was really traditionally always about innovation. And increasingly it feels much more about consolidation and retention of power. And so I’ve heard some interesting conversations about that. To reference The AI Show podcast hosts – they were talking today about the litigation hold that now prevents OpenAI from deleting users’ chat data because of the lawsuit with The New York Times. They didn’t go into deep detail, but one of them raised the question of whether Meta’s open-source strategy sort of carves it out a little bit from some of the legal impediments that some of the other tech companies might face in all of the lawsuits that are certainly going to proliferate, because Meta just has less control by design over where the information is stored. So just a couple interesting legal dimensions to all of this as well.
Bridget McCormack: Yeah. And Meta itself has, you know, just recently been through a similar antitrust trial, right? Wasn’t the acquisition of Instagram, I think, the subject of that?
Jen Leonard: Yeah, all of their acquisitions over the last ten years, right – WhatsApp and Instagram. But I think you’re right: Instagram is the focus of that. But it doesn’t seem to really be curtailing their behavior around consolidation. It just seems like they’re becoming more creative in the way that they structure some of these arrangements.
What Just Happened: OpenAI Connects to Everything
Bridget McCormack: We’ll see. It’ll be interesting to follow. The other big news I think that you and I were excited about when we woke up last Thursday – we had a surprise announcement from OpenAI that ChatGPT could connect to your whole life. Tell us about what that looks like.
Jen Leonard: Yeah, you and I woke up excited, and every chief information officer in the world woke up and wished that they’d stayed asleep, because OpenAI announced that ChatGPT can now connect to essentially everything in your life – if you opt to turn it on. And that includes things like your Google Calendar, your Gmail, your Slack messages, your Microsoft Office suite and Microsoft 365, and even some of your internal tools and databases. I logged on to Claude this morning and got a pop-up message that was exactly the same. I didn’t turn anything on, but I saw some of the icons and it had Asana in there (which our team uses for project management). It has HubSpot integration, which we use for CRM.
So, you know, all along we’ve been talking about the big unlock being when these systems can talk to all of your information. So essentially ChatGPT can – if you give it permission – read and write emails, summarize your calendar events, pull data from spreadsheets or CRMs. So, it’s really the next step in trying to make it useful for your actual workflows, so you’re not constantly trying to navigate between all different platforms. This is only for ChatGPT Pro accounts and above (and Enterprise and EDU accounts, I believe). And you can create your own custom GPT with specific connectors baked in. I know there are all sorts of posts and thought pieces from security experts about some of the risks involved here.
And even noting that earlier comment about OpenAI now being under a court order to retain things they told users they would be deleting – you can trust these companies if you choose to, but other legal proceedings might intervene and other things could supersede their promises, even if they are promising not to share your data. So that was my takeaway. But I know you are also super excited. Bridget, have you played around with it at all?
Bridget McCormack: Yeah. From our last conversation, I was kind of excited about Microsoft’s announcements at their last Dev Day, because it seemed like they were focused on making sure this all worked – you know, the connecting part of the architecture of making this technology work across the other systems in your life felt really important to me to get us to the next level. I have allowed it to connect to all of the things, but I have not experimented with it enough. I just built a new project in Canvas in OpenAI. Every Friday, I’m going to talk to it about my takeaways from my week that I don’t want to forget, and things that I probably need to do follow-ups on. By Friday, the week has been so busy that there are some things that I haven’t gotten down in email (which is usually how I communicate to my team or even to myself), which feels inefficient because we’re all up to our ears in email, right? So, this is like an end-of-the-week check-in: I’m going to talk to it, and then Monday morning it’s going to remind me to tell it to go look at my calendar for the week. And based on what I said last week – flagging anything in particular – it’s going to also ask me on Monday mornings for, like, the three things that I hope I accomplish this week. And then it’s going to keep track. There’s a Wednesday check-in. It’s going to kind of keep track of all of this as we go and build a bunch of outputs from all of that, which I’m hoping will bring some organization to the 7,000 things going on in my brain all the time. I’ll let you know. I’ll report back. But I am excited.
Jen Leonard: Yeah, that is really exciting. Let me know how it works, because I need to play around with ChatGPT and figure out some ways that I could even use these integrations. Like, some of them are obvious, I think, but thinking about Asana and HubSpot in particular – like, what could I possibly do? I didn’t see a LinkedIn integration yet. I’m always wishing there was more ability to use the information I have in LinkedIn for some purpose. So maybe they’ll build that out.
But I should also note that in our industry last week, around the same time, Harvey announced that it was connecting with iManage (which is used by most major law firms for their internal document management systems). And again, in all of our conversations, that has been a big impediment to having lawyers actually see how useful this can be, because they’ve had to toggle back and forth. And so, it really is like OpenAI (or Harvey – and they have a relationship with OpenAI) trying to become the background of everything that you’re doing, broadly and in legal in particular.
Main Topic: ChatGPT Gets an A+ on Law School Final Exams
Bridget McCormack: Super interesting. Well, is it time for our main topic? We’re so proud of ChatGPT for growing up and getting A’s on all the law school exams. This study – which I’ll have you walk us through, Jen – is so fascinating. And coming out of the University of Maryland Law School, a bunch of scholars that I don’t think I know (but what a really interesting paper). Maybe you do know them. Tell us about it.
Jen Leonard: I don’t know them, but I’m glad to know them now. And I’m definitely going to follow their work because I love what they’re doing. I’m so proud of our little ChatGPT. I think three years ago it was only beating 10% of test-takers on the Uniform Bar Exam, and through this study we’ve found that the new reasoning models from OpenAI – particularly O3 – got three A+ grades on law school final exams. And this is the first time that we’ve seen a generative AI platform be able to achieve this. The way they structured this is: these professors (I think there are 3 or 4 of them) have been tracking ChatGPT’s performance on their law school exams since GPT-3.0, I believe. So, they’ve gone through the later versions of the initial family of models, and now they’re testing these reasoning models. This has been a huge improvement over the first few trials they ran with GPT-3 and GPT-4, where the models received mostly C’s and B’s. As I mentioned, O3 got three A+ grades, one A, one A–, two B+ grades, and one B.
Maybe we should take a second for those listening who don’t really follow this and might not know the difference between a reasoning model and the earlier models. The architecture is a little bit different, and the way the models process information is a little bit different, in the sense that the earlier models essentially spat out, immediately, the most likely response or answer based on all of the data they had consumed – just trying to guess, essentially, what the next most likely word would be in the output. These reasoning models, on the other hand, essentially have internal scratch pads that are transparent – you can see what they’re actually doing when you ask them a question. Those scratch pads are very similar to what a human test-taker would do: they get the question, and then they’re writing on scratch paper the different approaches, crossing out some things, going back and revisiting things, trying different approaches before the final answer gets to you. So these are the models that are achieving these great outcomes that are matching or beating the top students in these professors’ classes.
I thought there were a few super interesting things about this. One is that it got a B in Administrative Law – which was actually my favorite class in law school, weirdly. Maybe I loved Admin so much because I love the Chevron Doctrine so much, and once I understood Chevron it was pretty easy to understand a lot of administrative law. The reason O3 got a B in Admin is because its training data cut off before the Loper Bright case (which of course was the Supreme Court case from last year, I believe, that overturned Chevron and essentially upended a lot of administrative law doctrine). So, if it had actually had access to the most relevant recent case, I’m sure it would have also performed in the A or A+ level.
The other grade was a B+ in Secured Transactions, which was a class I did not enjoy. There, the researchers didn’t prompt the model correctly – they failed to give it the instructions on how the answers should have been formatted.
And the last really interesting thing that I found about the study was they then ran the test again on the Torts exam by giving the model access to the professor’s notes – which were about 70,000 words of notes on Torts. They hypothesized that the model would perform better, but actually it performed worse than when it simply had access to the case law and the syllabus. There’s other research that shows that too much text in a prompt can actually degrade AI reasoning abilities, much the same way that a human who has too much information isn’t really able to focus on what’s relevant. And so, I thought the whole thing was incredibly interesting, but I would love to know what you thought about it, Bridget.
Bridget McCormack: Yeah, it’s so fascinating. I had the same thought about the overload. First of all, we have found that when we build with our internal handbook of knowledge about, say, a particular dispute area, we’re seeing – especially when we build a multi-agent model – really high performance. But if you add basically a bunch of background case law in that area, it degrades – it’s worse.
It’s a little bit like when I took the bar exam. I took New York, and there were 127 different subjects – I mean, I think it was more like 27, but it felt like 137. And I hadn’t even taken some of them in law school, because I just... I don’t know, I took the courses in law school that sounded interesting to me. And so, I had a lot to learn. And I was definitely hearing the pitch from all of the different, you know, courses. Barbri’s pitch was like, “No, you don’t want to learn any more than exactly the things that we know you need to know for this exam.” Like, there’s a whole lot more you can learn about every one of these legal subjects, right? There’s fascinating history and all the gray areas of the doctrine, but Barbri’s like, “Nope, you need to pass an exam. You just need to know these ten things when you’re doing Evidence,” and I think Barbri was completely right. You could go deep on all of those subjects and you might have a much harder time passing the bar exam. Same with law school exams. So, I think it feels actually quite human to me to have that happen, right?
Jen Leonard: Yeah, we’ve talked about this before – how we’ve sort of crossed this Rubicon where we really do work alongside AI. And sometimes I’ve switched from comparing AI to humans to comparing humans to AI. So, I feel like studying for the bar exam or studying for a law school exam, to me, is like grounding your brain in the knowledge or like using retrieval-augmented generation. You lock yourself into a room for 2 or 3 days with your hornbooks and your casebooks and all of these things. And you don’t want any more information in your brain – you really just need to retain that information, and then execute on the reasoning of issue-spotting, applying it to the facts, and then writing in the way that you’ve been trained. Everything else in your brain at that point is not that helpful.
It does make sense to me in a certain way. But in other ways, it’s sort of counterintuitive, because you would think the more knowledge that you have, the better you would do. And it’d be interesting to see how a law professor would do on an exam – like having that deep level of expertise, whether they actually would do worse.
Bridget McCormack: Yeah, that’s really interesting. Give the University of Maryland Torts exam to a professor at another law school and see how they do. From the study, in the courses where it got an A+, it performed as well as or better than the very top student. So, it’s pretty stunning, right? And it’s O3 – it’s a frontier model, it’s not one of the legal models. I mean, that’s really interesting as well, right? What does it mean for the legal-specific models?
Jen Leonard: Right. What does it mean for O5 or O6? I mean, if it’s getting an A+ already, it seems to have almost achieved, like, AGI in the law school exam context, at least.
Bridget McCormack: Yeah. So, my first thought was like, okay, University of Maryland is doing some really cool work. I wonder if, when this paper comes out and they have a faculty retreat, they go in like, “What do we do now? What is the purpose of law school? How can we make sure that, given what we charge students, we are teaching them what they need to know in the future? If all of a sudden now they’re going to have their AI hologram next to them who can do all of the things that we used to teach law students to do, what should we be teaching?”
There have been critiques of the law school teaching style of a semester-long class with one assessment at the end (that’s a closed-book assessment – something that doesn’t happen in legal practice). In practice, we actually get to look things up before we have to make decisions about how to help somebody solve a problem. So, there have been lots of critiques of that, and lots of other ideas. But this is a new one, right? This is like: your system might start to seem silly, or very old-fashioned. What do you think it means for training law students? Training lawyers? Do you think all of the law faculty around the country are at a secret meeting right now, figuring out what to do?
Jen Leonard: I would like to think they are, but I doubt it. There’s nobody more vocal than I am in criticizing the first semester and how little formative feedback we get, how unrealistic the testing conditions are as compared to what you’re actually doing as a lawyer. And I’ve been sort of going over in my mind what the implications of the study are. First, for first-year law school – like what you just said struck me, which is: does it just start to seem silly when you have students (imagine five years from now) having a 1L exam and you’re like, “This is what you’re going to be asked to do”? I would think that their thought would be, “We’ve been able to do this with a machine for five years now. Like, this is what I’m being tested on.” At the same time, I know that in law schools, what they do is they’ll disconnect you from the internet and you’ll be on a computer typing your exam, but you won’t have access to any of these reasoning models.
When I fed this into ChatGPT and asked follow-up questions and asked about cheating implications, I don’t think those are actually really a concern, because I think law students across the board, by and large, are rules followers – and the technology exists to cut them off from being able to cheat. But I guess the question is: do these skills matter? The skills of being able to read a fact pattern, look through it and scan and spot all the legal issues at play, recall the case law you’ve read in your classes and what they all stand for, and apply that law to those facts and write a conclusion about what it means legally. And I don’t know the answer to that, because I could see arguments on both sides. Maybe the argument that says it still matters is that this is foundational, and even if machines can do it better than students, it’s still a part of forming your ability as a lawyer to be able to understand the law and why your client’s situation means something in the context of that legal framework. On the other hand, why does it matter? Does it only matter for the development of a lawyer’s mind, or does it actually have any real value in practice once the machine can actually do this? I don’t know. What do you think?
Bridget McCormack: Yeah, I have so many different reactions and thoughts. One is: it does seem to me we’re always going to want to make sure that we’re producing new lawyers who understand kind of the architecture of the law – like what are the sources of law? What’s public law? What’s private law? What are the different rights and remedies that people have available to them? So what I think of as the architecture of the law: how does it operate in our lives and on our institutions? Without understanding the architecture, even if you have the super-intelligent AI hologram next to you, you won’t be able to work well with your AI hologram to help it help you with a client’s problem.
So, I think we still want lawyers to understand that architecture. I don’t know that we can’t do better than a “memorize it all, closed-book test, issue-spotting” way to be confident that lawyers understand that architecture – that feels shortsighted to me. Or like a lack of imagination or creativity. Like, yes, I know we’ve done that for a couple hundred years, but you know, as my friend David Slayton (the court administrator in L.A. County) says, “Just because we’ve done it that way doesn’t mean it’s not incredibly stupid.” There might just be a better way to teach the architecture – and we probably could use this technology to help teach the architecture. Maybe it’s using your AI partner from the beginning to help you spot the issues and work together on what they are. And, you know, you might say back to your AI partner, “I see issues one and two, but I’m not sure three is there” – kind of the way I work with it on problems that I don’t understand yet, right? So, I guess I’m just not sure that the way we’ve always done it is the way to do it in the future.
I also don’t know why we wouldn’t, sooner rather than later, want to make sure law students (new lawyers) understood how to work together with the technology. We’re seeing more and more studies in the medical profession that the AI is often better than even the doctors using AI in diagnosing certain illnesses or disease. I can’t imagine doctors saying, “You know, well, too bad, so sad – we’re not going to use it because it’s not good for doctors somehow,” if you get better outcomes. And my guess is we’re going to get better outcomes now that they’re getting A+’s, if we work with them, than without them. So let’s start training for what’s obviously coming next, which is going to be some workflow where we’re working together with the technology.
Jen Leonard: And then – sort of shifting the perspective from the law professors and the students to the future employers, right? We’ve talked about this on panels with Andy Perlman about how the market constantly says that it wants tech-enabled law students and graduates, but it doesn’t reward the schools that actually prioritize that – it rewards prestige and brand. So now I’m imagining a world where you go to a prestigious school, you have straight A’s (A’s, A–’s, maybe an A+). You write onto Law Review. These are the things that in 2020 and before would make you just exceptional – a person that everybody had to have, because nobody has those skills. And now AI can draft a Law Review comment in minutes, and it can get A+’s on exams when even the most talented students can’t. So, if you’re an employer, doesn’t everything start to break down around the way that you think about recruitment – or shouldn’t it? As you sort of start to think about, like, if these skills are not the skills that we need to add value to our firm and that our clients are looking for and willing to pay for, what comes next? And what you said, Bridget, about spending all this time cramming our heads with knowledge and being stressed out about this one exam – that’s a lot of energy from really, really dedicated people, and if that were to change significantly, you’d have a ton of room and space to put something much more robust (as robust, but in a different form) in its place. I guess the huge blinking red light is: what are those things?
Bridget McCormack: Yeah, I mean it’s going to be really interesting. I think Andy Perlman’s right. It’s going to take some signaling from the places that hire law students that they want to see more. I mean, I have to say, I think the grades and the Law Review were just a signal that that person was going to be able to be successful at the highest level of practice. Like, writing a good comment doesn’t really make you a really successful lawyer at a global law firm, but the idea was those were signals that you could be that person, right? Even among all the people who had those credentials at the end, they don’t all end up as a partner at a global law firm. Many of them end up going into public interest. Many go into academia, right? They go different directions. Some quit altogether and run amazing businesses, like Creative Lawyers. But I feel like this is another one of these times – I’ve talked about this in other contexts – where we have stakeholder silos in our profession that are not serving us well. Law firms are having a lot of conversations about this. I assume the law schools are. I don’t know that they talk to each other enough to figure out the kind of collaborative solution that I think our profession needs at this time.
Jen Leonard: I think you’re right, too. I paint with a broad brush saying I don’t think that law professors are talking about this. I’m sure that they are. I don’t know either, but I would think on law school campuses this is becoming more and more of an urgent conversation about what all of this means. So more questions raised than answers by this study, but I think it definitely suggests that we’re entering an era where we don’t have the luxury anymore to just sort of think about these things. I think we need to start really understanding and trying and testing and taking new approaches – and like you said, sharing across silos what we’re learning and what we need more of and what’s working and what’s not.