In “Generative Interpretation,” Prof. David Hoffman shows how large language models (LLMs) provide a better method of contract interpretation, with some caveats.
A pathbreaking article co-authored by David Hoffman, William A. Schnader Professor of Law at the University of Pennsylvania Carey Law School, and Yonathan Arbel, Associate Professor at the University of Alabama, introduces a novel approach to estimating contractual meaning through the use of large language models (LLMs). “Generative Interpretation,” forthcoming in the New York University Law Review, positions artificial intelligence (AI) models as the future “workhorse of contractual interpretation” and shows that using them to interpret legal text “can help factfinders ascertain ordinary meaning in context, quantify ambiguity, and fill gaps in parties’ agreements.”
“The article illustrates how large language models can be of use for judges and lawyers, who’ve been on the hunt for generations for more reliable, convenient, and replicable methods to interpret contracts and other legal documents,” said Hoffman. “If used appropriately and with the right degree of skepticism, they can help jurists chart a middle path, satisfying the needs of justice-oriented contextualists and efficiency-minded textualists alike.”
Using grounded case studies of contracts appearing in well-known opinions, the authors build the case that LLMs can effectively balance interests of cost and certainty with accuracy and fairness; their analysis shows that applying the technique often yielded “the same answers at lower cost and with greater certainty” as those obtained by jurists using more traditional methods.
Hoffman is a widely cited scholar who focuses his research and teaching on contract law. His work is typically interdisciplinary, built through collaboration with co-authors from a variety of fields; he writes about all aspects of contracting theory and practice, informed by empirical evidence.
Hurricane Katrina ‘Flood’
The authors begin with an examination of a controversial Fifth Circuit decision that pitted policyholders against insurance companies over the meaning of “flood” in the context of Hurricane Katrina damage; floods were excluded from many insurance policies. Plaintiffs maintained that “flood” didn’t include water damage caused by humans, which, if accepted as a proposition, would allow them to argue that their property damage—allegedly resulting from negligence by the Army Corps of Engineers in maintaining levees—was not contemplated within the contract’s exclusions. Defendants argued that “flood” was unambiguous and referred to any inundation of water, regardless of cause.
After expending “expensive and extensive efforts” to arrive at its decision, the court sided with the insurance companies. Hoffman and Arbel noted that the court consulted dictionaries, treatises, linguistic canons, out-of-jurisdiction caselaw, and an encyclopedia, among other resources. Still, the decision received criticism, the authors note, because, according to detractors, it “merely affirmed its pro-business priors.”
Hoffman and Arbel turned to LLMs and employed a process called “embedding,” which “can be thought of as trying to quantify how much a word belongs to a given category, or dimension.” They queried several models about how closely “flood,” in its contractual context of water damage, relates to other potential causes.
In the above graphic, the farther the red markers are from the origin, the more distant the term is from the “flood” cause. These results support the Fifth Circuit’s conclusion that floods can arise from any cause—without, Hoffman and Arbel note, consulting the myriad sources that led the court to the same conclusion.
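The embedding approach the authors describe can be illustrated with a minimal sketch. The vectors and phrases below are toy placeholders invented for illustration (real model embeddings have hundreds or thousands of dimensions, and this is not the authors’ actual pipeline); the idea is simply that cosine similarity between embedding vectors quantifies how close two usages are.

```python
from math import sqrt

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: 1.0 means identical
    direction (very similar meaning), near 0 means unrelated."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = sqrt(sum(x * x for x in a))
    norm_b = sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Toy 4-dimensional "embeddings" with made-up values, chosen only to
# illustrate the comparison; a real study would obtain these from an
# embedding model.
embeddings = {
    "flood":         [0.9, 0.8, 0.1, 0.2],
    "levee failure": [0.8, 0.7, 0.2, 0.3],
    "rainstorm":     [0.7, 0.9, 0.1, 0.1],
    "wildfire":      [0.1, 0.1, 0.9, 0.8],
}

for phrase in ("levee failure", "rainstorm", "wildfire"):
    sim = cosine_similarity(embeddings["flood"], embeddings[phrase])
    print(f"flood vs. {phrase}: {sim:.2f}")
```

On these toy numbers, a human-caused inundation (“levee failure”) sits nearly as close to “flood” as a natural one (“rainstorm”), while an unrelated peril (“wildfire”) sits far away — the same kind of spatial comparison the authors’ graphic visualizes.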
“Simply put,” write Hoffman and Arbel, “generative interpretation is good enough for many cases that currently employ more expensive, and arguably less certain, methodologies.”
C & J Fertilizer v. Allied Mutual
The authors present several additional in-depth case analyses to showcase how generative interpretation could be deployed to various ends, beginning with how LLMs can work alongside the doctrine of reasonable expectations. Hoffman and Arbel explain that judges’ and laypersons’ “reasonable expectations” can diverge dramatically, with each group strongly believing its interpretations are common. Ultimately, this “introspective interpretation” leads to uncertainty in results, they write.
Enter LLMs, which the authors applied to C & J Fertilizer v. Allied Mutual, a dispute over whether a burglary insurance policy required forced entry marks to trigger coverage. Discussions preceding the purchase of the policy made clear that an “inside job” would not be covered, and that concept was written into the policy language. The insurance company denied coverage when $50,000 of fertilizer went missing but there were no signs of forced entry—only tire tracks leaving the scene. The Iowa Supreme Court sided with the insured, finding that the policy exclusion, applied in such a way, violated the fertilizer company’s reasonable expectations.
The LLM as applied by Hoffman and Arbel, however, disagreed with the court, predicting the likely expectations of most policyholders under the terms of the policy: “The policy will provide compensation for losses resulting from a substantiated third-party burglary,” with a 90% level of confidence.
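A query like the one described — asking a model to predict typical policyholder expectations and attach a confidence level — might be framed along the following lines. The prompt wording and the `build_expectation_prompt` helper are hypothetical illustrations, not the authors’ actual prompt; the sketch only assembles the text that would be sent to a model.

```python
def build_expectation_prompt(policy_clause: str, question: str) -> str:
    """Assemble a hypothetical prompt asking an LLM to predict what a
    typical policyholder would expect, with a stated confidence level.
    (Illustrative wording only; not the prompt used in the article.)"""
    return (
        "You are estimating the expectations of ordinary insurance "
        "policyholders.\n"
        f"Policy language: {policy_clause}\n"
        f"Question: {question}\n"
        "Answer with the most likely policyholder expectation and a "
        "confidence level from 0% to 100%."
    )

# Hypothetical clause and question loosely modeled on the C & J
# Fertilizer dispute.
prompt = build_expectation_prompt(
    policy_clause=("Burglary means felonious entry evidenced by visible "
                   "marks of force upon the exterior of the premises."),
    question=("Would most policyholders expect coverage for a third-party "
              "burglary that leaves no visible marks of forced entry?"),
)
print(prompt)
```

Because the response is framed as a prediction with a confidence score, a factfinder can weigh it as one piece of evidence rather than a binding answer.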
Hoffman and Arbel present additional case studies that delve further into the model’s functionality. As they show, LLMs can produce reproducible predictions about meaning when used in a wide variety of complex contracting contexts.
“So convenient are today’s LLMs, and so seductive are their outputs,” write Hoffman and Arbel, “that it would be genuinely surprising if judges were not using them to resolve questions of contract interpretation as we write this article, only a few months after the tools went mainstream.”
Moreover, they write, generative interpretation can respond to access-to-justice concerns by increasing predictability of outcomes, thereby decreasing the number of disputes and reducing the benefits of “opportunistic breach” for “sophisticated players.”
The Future of Contract Interpretation
In their analysis, Hoffman and Arbel consider not only the models’ implications for judicial practice and contract theory, but also their limitations.
“As a default, judges should disclose the models and prompts they use and try to validate their analyses on different models and with multiple inputs,” write Hoffman and Arbel. “Ideally, they’d capsule their findings online.”
Specifically, the authors caution judges “to be careful about parties’ manipulative behavior, and to consider how (and whether) to excavate private, non-majority meanings.”
Hoffman and Arbel acknowledge that the new methodology surrounding LLMs will take time to develop and predict that “it ultimately won’t be (just) Textualism 2.0,” but rather “will become a distinctive method of evaluating contractual meaning, marked by its own jargon, normative commitments, and practitioner community.”
The authors openly question whether generative interpretation could eventually lead to the end of formal contracting as we know it.
“[R]ight now, generative AI looks like a promising judicial adjunct,” write Hoffman and Arbel. “But the future of this technology is more disruptive by far: formal contracts themselves may be made obsolete. Or, at the very least, jurists should consider the marginal value of contracting if the terms themselves are fairly determinable from the parties’ goals.”
Read Hoffman and Arbel’s full article.