Findings

Legal Models

Kevin Lewis

April 15, 2026

AI-Powered Lawyering: AI Reasoning Models, Retrieval Augmented Generation, and the Future of Legal Practice
Daniel Schwarcz et al.
Journal of Law & Empirical Analysis, forthcoming

Abstract:
Generative AI is set to transform the legal profession, though its most promising uses and ultimate effects are still unclear. While AI models like GPT-4 improve efficiency, they can also "hallucinate" and may undermine legal judgment, particularly in complex tasks typically handled by skilled lawyers. This article examines two emerging AI innovations that may mitigate these concerns: Retrieval Augmented Generation (RAG), which grounds AI-powered analysis in legal sources, and AI reasoning models, which structure complex reasoning before generating output. We conduct the first randomized controlled trial assessing these technologies, assigning upper-level law students to complete legal tasks using a RAG-powered legal AI tool (Vincent AI, 2024), an AI reasoning model (OpenAI's o1-preview), or no AI. We find that both AI tools significantly enhance legal work quality, a marked contrast with previous research examining older large language models like GPT-4. Moreover, these newer models appear to maintain the efficiency benefits associated with older AI technologies. Our findings also show that these AI tools significantly boost productivity in five out of six tested legal tasks, with statistically significant gains of anywhere from 50% to 130%. They perform exceptionally well in complex tasks like drafting persuasive letters and analyzing complaints. Notably, o1-preview improves the analytical depth of work product and Vincent AI avoids introducing more hallucinations, suggesting that integrating domain-specific RAG capabilities with reasoning models could yield even larger improvements.


Artificial Intelligence and Human Legal Reasoning
Nicholas Bednar et al.
University of Minnesota Working Paper, April 2026

Abstract:
Empirical evidence increasingly demonstrates that generative artificial intelligence has the capacity to improve the speed and quality of legal work, yet many lawyers, judges, and clients are reluctant to fully embrace AI. One important reason for hesitation is the concern that AI may undermine the human reasoning and judgment on which competent legal practice depends. This Article provides the first empirical evidence evaluating that concern by testing whether upper level law students who rely on AI at an early stage of a project experience reduced comprehension and impaired legal reasoning on later stages when AI is not an available option. To evaluate the possibility that AI degrades comprehension and reasoning, we conducted a randomized controlled trial involving approximately one hundred second and third year law students at the University of Minnesota Law School. Participants completed four sequential lawyering tasks: writing a memo synthesizing the law based on a packet of legal materials, answering closed-book multiple choice questions that tested their comprehension of the materials, writing a memo applying the materials to a fact pattern, and revising their second memo. Participants were randomly assigned either to a control group, which could not use AI until the final revision task, or to an AI-exposed group, which used AI during both the initial synthesis task and the final revision task, but not during the intervening comprehension and application tasks. The results provide a more complex picture of AI's effects on legal reasoning than critics or enthusiasts often assume. As expected, participants who used AI to help craft synthesis memos produced substantially stronger work and completed that task more quickly. But contrary to our preregistered hypothesis, AI exposure at this initial stage did not diminish downstream comprehension of the underlying legal principles. 
To the contrary, participants who used AI on the synthesis task outperformed the control group on the later application task even when neither group had access to AI. Yet when all participants used AI to revise their reasoning memos, participants who started with weaker memos improved while participants who started with stronger memos regressed. These findings suggest that AI does not inevitably erode or promote independent legal reasoning, but that its effects depend on when and how law students and junior lawyers use AI. The Article builds on this insight by suggesting best practices for AI use and avenues for further empirical research.


Barriers To Adopting Predictive Algorithms: A Criminal Justice Field Experiment
David Abrams et al.
University of Virginia Working Paper, February 2026

Abstract:
Artificial intelligence, machine learning, and algorithmic prediction tools have made huge advances in recent years. Some argue that we are on the precipice of a major revolution in our economy and society, in which eager adoption of these new technologies will transform how work is done. This Article argues that change might come more slowly to the legal sphere than is commonly thought. We present the results of a criminal justice field experiment in which we provided novel sentence prediction software to public defenders. In some regards, the experiment was a failure. Usage of the prediction software was so low that we were unable to evaluate its impact on sentencing. This is despite strong a priori expressions of interest and tests showing that our algorithm is more accurate than the public defenders at predicting sentences. However, this failure produced valuable insights about why predictive AI might face headwinds in the legal profession. Extensive interviews, a prediction "quiz," and our empirical results revealed the following takeaways. First, attorneys place a high bar on adopting new technology due both to workflow inertia and skepticism about benefits. Second, some attorneys were distrustful of an algorithm that did not have all of the information they had, even if the algorithm still provided more accurate information than their intuition. Third, algorithm design entails challenging ethical questions that can reduce trust and use among users. We discuss these issues in detail and suggest some possible paths forward.


Tolerance of Hate Speech in the Anglosphere: How Law Shapes Mass Opinion
Dennis Chong, Jack Citrin & Morris Levy
University of Southern California Working Paper, March 2026

Abstract:
Past research argues that civil libertarian norms established by law shape mass support for the right to free expression. However, the evidence for this claim has come largely from US samples, with no comparative studies yet examining whether legal regimes influence tolerance among ordinary citizens. We gather a survey of the US, UK, Canadian, and Australian publics to test whether distinct free expression laws foster cross-national differences in support for free speech. Despite these countries' common political heritage, the categorical protection of offensive expression in the US contrasts with the prohibition of "hate speech" in the other countries. We find that this legal difference is paralleled in mass opinion. Among the four countries, tolerance of right- and left-wing hate speech is highest in the US. The US edge increases with the extremity or offensiveness of the statement and among those who are knowledgeable about their own country's law. A follow-up survey experiment in the US and Canada demonstrates that providing information about the legal regime widens cross-national differences in tolerance.


Recalibrating the risk of false confession wrongful convictions: Interrogation tactics and inverse probability
Scott Mourtgos & Ian Adams
Journal of Criminal Justice, March-April 2026

Abstract:
False confession wrongful convictions (FCWCs) are a serious failure of the criminal justice system. Although scholars have identified interrogation tactics thought to elevate this risk, existing research rarely estimates the population-level probability that legally permissible methods will produce an FCWC. Instead, inference relies on outcome-selected case series and laboratory diagnosticity ratios that ignore base rates and the far larger universe of interrogations without false confessions. This article offers a methodological recalibration. We formalize the outcome-selection problem and apply inverse probability logic to derive posterior FCWC risk integrating base rates, sensitivity, and specificity. Using Monte Carlo simulation, we synthesize available empirical evidence across a wide parameter space. Across these specifications, median posterior estimates of the probability of a false confession wrongful conviction associated with lawful interrogation tactics cluster near 1%. We conclude by introducing an Acceptability Curve that clarifies how normative judgments about tolerable error shape policy conclusions.
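
For readers unfamiliar with the inverse probability logic the abstract mentions, the sketch below shows the textbook form of Bayes' rule that combines a base rate, sensitivity, and specificity into a posterior probability, and then takes the median over randomly drawn parameter values. It is only an illustration of the general technique, not the authors' model: the function names and every parameter range are hypothetical placeholders, and the output is not meant to reproduce the paper's estimates.

```python
import random


def posterior_risk(base_rate, sensitivity, specificity):
    """Textbook Bayes' rule: P(outcome | signal present).

    base_rate   -- prior probability of the outcome
    sensitivity -- P(signal present | outcome)
    specificity -- P(signal absent | no outcome)
    """
    true_pos = sensitivity * base_rate
    false_pos = (1.0 - specificity) * (1.0 - base_rate)
    return true_pos / (true_pos + false_pos)


def simulate_median_posterior(n_draws, seed=0):
    """Median posterior over uniformly drawn parameters.

    The ranges below are hypothetical placeholders chosen only to make
    the example run; they are not taken from the paper.
    """
    rng = random.Random(seed)
    draws = []
    for _ in range(n_draws):
        base = rng.uniform(0.001, 0.01)  # hypothetical base-rate range
        sens = rng.uniform(0.5, 0.9)     # hypothetical sensitivity range
        spec = rng.uniform(0.3, 0.7)     # hypothetical specificity range
        draws.append(posterior_risk(base, sens, spec))
    draws.sort()
    return draws[len(draws) // 2]


if __name__ == "__main__":
    print(simulate_median_posterior(10_000))
```

The point of the sketch is the one the abstract makes: when the base rate is small, the posterior stays small even if the signal is common among cases with the outcome, because the false-positive term in the denominator dominates.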


Release, Detain, or Surveil? The Effect of Electronic Monitoring on Defendant Outcomes
Roman Rivera
American Economic Journal: Applied Economics, April 2026, Pages 299-329

Abstract:
This paper studies the effect of pretrial electronic monitoring (EM) relative to both pretrial release and pretrial detention (jail). EM often involves a defendant wearing an electronic bracelet, which aims to reduce pretrial misconduct at a low cost. Using the quasi-random assignment of bond court judges, I estimate the effect of EM versus release and EM versus detention on pretrial misconduct, case outcomes, future recidivism, and aggregate total costs. Results indicate that EM reduces overall costs relative to detention. However, EM does not prevent enough high-cost crime to justify its use relative to release.


A (Plea) Offer You Can Refuse
David Abrams et al.
University of Pennsylvania Working Paper, March 2026

Abstract:
Plea bargaining is ubiquitous in the U.S., yet lack of data on rejected plea offers constrains empirical analysis. We compile novel data covering all initial plea offers for 23,000 felony cases in Philadelphia from 2012-2017. Our analysis yields three main insights. First, rejected offers are longer than eventual sentences, even for defendants convicted at trial. This contradicts the conventional "trial penalty" theory, namely that rejecting a plea offer leads to longer sentences for using more court resources. Second, using accepted offers as counterfactuals for rejected offers can lead to inaccurate estimates of the magnitude and even direction of the trial penalty. Third, even after controlling for detailed observables, initial plea offers are longer for Black defendants, especially those detained pretrial. Our study highlights the complexity of bargaining dynamics and the need for new theoretical frameworks and data on rejected offers to inform both theory and practice.


Non-Monetary Incentives and Bureaucratic Performance: Evidence from U.S. Courts
Jonathan Petkun
Duke University Working Paper, April 2026

Abstract:
Federal judges -- who enjoy lifetime tenure and constitutionally protected salaries -- represent an especially hard test case for incentive-based bureaucratic reform. I study the "six-month list," a reform requiring U.S. federal courts to publicly identify judges with overdue matters. Using a regression discontinuity design and other methods, I find that matters most exposed are resolved approximately 14% faster than those least exposed, with larger effects among younger, non-white, and female judges. The speed gains come with tradeoffs: upfront time savings are partially offset by downstream delays, and more-exposed matters are less likely to be affirmed on appeal. A bunching analysis estimates aggregate time savings of approximately 4%, demonstrating that non-monetary levers can shift behavior even among highly insulated elite professionals.


Legal, Quasi-, and Extra-Legal Correlates of Judicial Bail Outcomes
Alora McCarthy, Bryanna Fox & Edelyn Verona
American Journal of Criminal Justice, February 2026, Pages 243-280

Abstract:
Although effort has been made to ensure fair pretrial decisions (e.g., whether to detain or release) based on relevant legal factors, judges have significant discretion, raising concerns about bias and discrepancies. Our analyses, informed by a focal concerns theoretical perspective, examined the relative roles of legal (e.g., prior arrests, seriousness of index offense), quasi-legal (e.g., employment, substance use problems), and extra-legal factors (e.g., race/ethnicity, age) in bail determinations for a sample of jail detainees. Study participants included 713 individuals (67% male, 72% white) booked into a county jail in Florida. Jail files and court records were used to code legal factors and bail outcomes, operationalized in multiple ways to thoroughly assess these relationships. Participant reports were used to collect data on quasi- and extra-legal factors identified by focal concerns and prior literature. Analyses for the first goal revealed that bail decisions, including if bond was granted and the amount of cash bail required, were consistently associated with legal factors, particularly prior arrests and histories of absconding or violating conditional release. Two quasi-legal factors (substance use and household income) were related to bail decisions. Extra-legal factors (e.g., gender, race/ethnicity, age) were largely unrelated to court bail decisions, with some exceptions. Together, these results highlight 1) that legal factors are important in determining bond outcomes, despite Florida judges' legal leeway in considering more subjective and extra-legal factors; and 2) quasi-legal factors are intertwined with extra-legal factors, suggesting legal judicial decisions may still lead to disproportionate burdens on the most disadvantaged defendants.


Ideological Cues, Partisanship, and Prejudice Against LGBTQ Judges
Andrew Stone & Tony Zirui Yang
Public Opinion Quarterly, Spring 2026, Pages 218-237

Abstract:
How does the gender and sexual identity of a prospective judge shape public support for their nomination? We build upon recent scholarship on instrumental inclusivity and argue that, after accounting for nominee ideology, Americans of all partisan stripes will penalize LGBTQ nominees. Using a conjoint experiment, we randomly vary a prospective Biden US Supreme Court nominee's gender and sexual identity. Crucially, we also randomize the nominee's ideology, enabling us to disentangle LGBTQ identity from the ideological signal it sends and differentiate between genuine and instrumental support for LGBTQ nominees. Contrary to recent findings suggesting that Democrats reward minority judges, we find that respondents from both parties penalize LGBTQ nominees. The magnitude of these effects -- roughly 14 percentage points for transgender nominees and 8 percentage points for gay or lesbian nominees -- is considerable and second only to shared partisanship. Our study underscores that ideological alignment does not necessarily foster genuine inclusivity for LGBTQ individuals and highlights the persistent challenges of representation for marginalized groups in an era of polarized judicial nominations.

