Anthropic court case provides early guidance on fair use

June 27, 2025


On June 23, 2025, Judge Alsup issued an order on several significant fair use issues in the closely watched AI case, Bartz et al. v. Anthropic PBC, Case No. 3:24-cv-05417, pending in the Northern District of California.

Key takeaways:

The court’s order distinguishes between purchased books that were scanned and used to train AI Large Language Models (LLMs), and “pirated” copies of books used for the same purpose.
The court held that the scanning and use of purchased books to train LLMs constitutes fair use under the limited circumstances of the case.
The case continues with two undecided issues:

(1) whether the retention of digital copies of purchased books that were not used to train LLMs constitutes fair use and

(2) whether the retention and use of pirated copies of books to train LLMs constitutes fair use.

The case does not involve any issues related to the works created by AI.

Summary of facts:

The case, and thus the court’s fair use order, involves specific alleged facts. Authors of certain books used to train Anthropic’s LLM, Claude, sued Anthropic for using their copyright-protected works.

The key facts in the case are:

The allegations involved two sets of books.
One sub-set of the books was purchased by Anthropic, scanned, and used to train Claude.
A second sub-set of the books was “pirated”—downloaded from an illegitimate website and used to train Claude. These pirated digital copies were retained by Anthropic.
The authors alleged only that the use of their copyright-protected works for training, and the retention of those works, infringed their copyrights. The authors did not allege that any text created by Claude infringed their copyrights.

Analysis

The court’s analysis differentiated between purchased and pirated books. Although the court broadly stated that “the training [of Claude] was a fair use[,]” it also clarified that it was denying the summary judgment argument that “pirated library copies must be treated as training copies.”

Like the court, we summarize these two sub-sets separately.

Purchased books

As to purchased books, the court ruled that training Anthropic’s Large Language Models (LLMs) using purchased books constitutes fair use. The court analyzed this issue in three categories of works:

Copies of works used to train specific LLMs: The court held that the use of works to train specific LLMs constitutes fair use. Notably, the court emphasized that the plaintiffs did not allege that any output the LLM provided to users infringed the plaintiffs’ works. Instead, the authors challenged only the inputs (what was fed to the LLM), not what the LLM produced as output. The court explained:

“[T]he purpose and character of using copyright works to train LLMs to generate new text was quintessentially transformative. Like any reader aspiring to be a writer, Anthropic’s LLMs trained upon works not to race ahead and replicate or supplant them – but to turn a hard corner and create something different. If this training process reasonably required making copies within the LLM or otherwise, those copies were engaged in transformative use.”

Copies used to convert purchased print copies into digital libraries: The court found that Anthropic’s conversion of purchased books into digital copies was also fair use, albeit for a different reason. Anthropic purchased the books, which were then stripped, scanned, and lightly edited to create PDF copies with machine-readable text, with lower-value or repeating text (like headers, footers, or page numbers) removed. No consumers had access to these digital copies, and the print versions were destroyed after being scanned. The court noted that the plaintiffs complained only of Anthropic’s changing each copy’s format from print to digital. The format change, however, added no new copies, eased storage, and enabled searchability, and it was not done for purposes of trenching upon the copyright owner’s interest; the court therefore found it transformative. Indeed, storage and searchability are not creative properties but physical properties of the frame around the work or informational properties about it. The court nonetheless left the door open to an argument with respect to derivative works.
Copies made from digital library copies but not used for training: The order noted that Anthropic had scanned purchased copies of books that it did not use for training and instead kept in a “central library” for possible later use. The court did not grant summary judgment for Anthropic on this issue because discovery issues were pending and the record on these copies was not developed enough for the court to rule.

“Pirated books”

The court made it clear that the holding as to purchased books does not apply to pirated copies of copyright-protected books. As the court noted:

“This order doubts that any accused infringer could ever meet its burden of explaining why downloading source copies from pirate sites that it could have purchased or otherwise accessed lawfully was itself reasonably necessary to any subsequent fair use. There is no decision holding or requiring that pirating a book that could have been bought at a bookstore was reasonably necessary to… creating an LLM. Such piracy of otherwise available copies is inherently, irredeemably infringing even if the pirated copies are immediately used for the transformative use and immediately discarded.”

While denying summary judgment on this issue, the court also appears to be signaling that Anthropic is unlikely to prevail on fair use as to its use and retention of pirated books.

In practical terms, the court’s decision leaves open significant questions about the use of copyright-protected works for training and about their retention. However, the training of LLMs with works that were purchased and scanned likely qualifies as fair use.

If you have any questions regarding the content of this post, reach out to the attorneys in McDonald Hopkins' Data Privacy and Cybersecurity Practice Group.
