There are dozens of cases pending against AI developers stemming from their use of copyrighted works to train generative AI models.  In response, developers have uniformly asserted that such use is a fair use.  To date, despite years of litigation, those cases have resulted in just one opinion:  a District of Delaware order that arose outside of the generative AI context and rejected the fair use defense as a matter of law.

Now, courts will have the benefit of a “pre-publication” version of the Copyright Office’s long-awaited Report on Generative AI Training (the “May 2025 Report”).  Unlike the Office’s prior two reports on AI, this third report was released in preliminary draft form, with the Office stating that the draft was being released “in response to congressional inquiries and expressions of interest from stakeholders,” and that the Office expected to soon issue a final version “without any substantive changes expected in the analysis or conclusions.”[1]

While the Office was certainly not wrong that its report on fair use was hotly anticipated, the release of a draft version of the report was unusual and called into question how much the industry—or courts—could rely on the Office’s conclusions.  Adding to the uncertainty, the day after the pre-publication report was released, the Trump administration dismissed the Register of Copyrights—a move she is challenging in court.  The new Register could revise the report, or never make it official in any form.

In the meantime, the May 2025 Report offers the only guidance to date from the Office regarding how it sees the question of fair use in the context of generative AI.  The May 2025 Report notes that the training of AI models, the models themselves, and their outputs can all involve copying or other acts implicating copyright rights.  The report concludes that the availability of the fair use defense should turn on the specific facts of each case:

The Office expects that some uses of copyrighted works for generative AI training will qualify as fair use, and some will not.  On one end of the spectrum, uses for purposes of noncommercial research or analysis that do not enable portions of the works to be reproduced in the outputs are likely to be fair.  On the other end, the copying of expressive works from pirate sources to generate unrestricted content that competes in the marketplace, when licensing is reasonably available, is unlikely to qualify as fair use.  Many uses, however, will fall somewhere in between.[2]

This Debevoise In Depth begins by summarizing the key issues raised in the pending copyright AI cases and the one recent decision on fair use in the non-generative AI context, before proceeding to discuss the genesis of the May 2025 Report, comments that the Office received before issuing the report, and the May 2025 Report’s conclusions regarding when copyright rights are implicated, when the fair use defense is available, and what licensing regime should be employed.

A. Background on Pending AI Lawsuits

Publishers and authors,[3] visual artists,[4] Getty Images,[5] open source code programmers,[6] and others have brought claims against OpenAI, Stability AI, Meta,[7] Google,[8] NVIDIA,[9] and other AI model developers raising three primary copyright infringement arguments:  (1) the use of plaintiffs’ copyrighted works as inputs to train AI tools is direct copyright infringement; (2) AI models themselves are infringing derivative works; and (3) the AI models’ outputs are directly infringing—either because they include direct copies of the copyrighted works, or because they are derivative works—and AI tool creators are vicariously or contributorily liable for users’ direct infringement.

AI developers have raised various defenses.  When it comes to inputs, they argue that use of copyrighted works in training sets is (1) not infringing because the training of generative models uses input data not for its expressive content, but to recognize patterns rooted in the unprotectable elements of the works[10] and/or (2) transformative fair use that “leverage[s] existing works internally . . . to new and useful ends.”[11]

When it comes to the trained models themselves, AI developers have argued that they do not store copyrighted works.  One court described allegations that Meta’s models themselves were infringing derivative works as “nonsensical.”[12]  In a case against Stability AI, however, the court found that the plaintiff had plausibly alleged that the model itself could infringe since protected elements of copyrighted works remained, in some format, within the model.[13]

And when it comes to outputs, AI developers have argued that (1) outputs are not infringing and/or (2) even if they are, AI developers cannot be held liable for infringement by users who generate infringing outputs since they do not have sufficient control over users and/or knowledge of users’ specific acts of infringement.[14]

As to the first argument on outputs, courts have come out differently depending on the allegations made.  A Northern District of California court found that authors had failed to allege that ChatGPT outputs contained direct copies of their copyrighted books or how they constituted derivative works,[15] even though they had alleged that ChatGPT generates accurate summaries of their books’ content and themes.[16]  On the other hand, a Southern District of New York court found that news publishers had presented sufficient evidence of copying, “including more than 100 pages of examples provided in Exhibit J to the Times complaint, and dozens of examples in Exhibit J to the Daily News complaint,” to survive a motion to dismiss.[17]

As to AI developers’ second argument, the same Southern District of New York court concluded that defendants could potentially be liable since they “possessed far more than a ‘generalized knowledge of the possibility’ of third-party infringement” given that “copyright infringement was ‘central to [defendants’] business model.’”[18]

B. One Recent Court Decision on Fair Use in the Non-Generative AI Context

One court—the District of Delaware in the Thomson Reuters v. ROSS case—has weighed in on the fair use defense, granting summary judgment for the plaintiff on direct copyright infringement and fair use.  The court noted, however, that “Ross’s AI is not generative AI (AI that writes new content itself).”[19]

By way of background, Thomson Reuters’s Westlaw contains headnotes that summarize key points of law and case holdings.  Ross used third-party LegalEase’s Bulk Memos—compilations of legal questions with answers based on Westlaw headnotes—as training data to create a legal research tool that could produce quotations from judicial opinions in response to natural language questions.  When users enter a question, Ross “spits back relevant judicial opinions that have already been written.”[20]  The headnotes do “not appear as part of the final product that Ross put forward to consumers.”[21]  Rather, Ross used the headnotes only at an intermediate step during training:  it turned them into “numerical data about the relationships among legal words” and used those relationships to identify relevant caselaw passages.[22]
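For readers less familiar with this kind of system, the following is a deliberately simplified, hypothetical sketch (our illustration, not Ross’s actual technology) of how text can be reduced to numerical data about word relationships and then used to retrieve existing passages rather than to generate new ones:

```python
# Hypothetical illustration only -- not Ross's actual system.  Texts are
# reduced to numerical word-count vectors, and a question retrieves the most
# similar already-written passage; nothing new is generated.
import math
import re
from collections import Counter

def vectorize(text: str) -> Counter:
    """Represent a text as word counts (a crude numerical stand-in for
    'relationships among words')."""
    return Counter(re.findall(r"[a-z]+", text.lower()))

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two word-count vectors."""
    dot = sum(a[w] * b[w] for w in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

# Already-written passages are returned verbatim, as the court described.
passages = [
    "A contract requires offer, acceptance, and consideration.",
    "Punitive damages require clear and convincing evidence of malice.",
]
question = "What are the elements of a contract?"
best = max(passages, key=lambda p: cosine(vectorize(question), vectorize(p)))
print(best)  # retrieves the existing contract passage verbatim
```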

The court concluded, as a matter of law at summary judgment, that Ross’s use was not transformative fair use since Ross was using headnotes “as AI data to create a legal research tool to compete with Westlaw,” and its “process resembles how Westlaw uses headnotes . . . to return a list of cases with fitting headnotes.”[23]  The court concluded that cases permitting intermediate copying as fair use were inapposite since they (a) were in the computer code context and (b) involved situations where “copying was necessary for competitors to innovate.”[24]

Ross has appealed the decision to the Third Circuit, which will likely become the first appeals court to weigh in on the fair use defense in the context of training AI systems.  In the meantime, we can expect to see the first district court decisions addressing the fair use defense in the generative AI context in the coming months.

C. Background on the Copyright Office’s AI Reports

Around the time that many of these cases were first filed, in October 2022, members of the Senate Judiciary Subcommittee on Intellectual Property wrote a letter to the USPTO and Copyright Office explaining that they were “considering what changes, if any, may need to be made to our intellectual property laws in order to incentivize future AI related innovations and creations.”[25]

In March 2023, the Copyright Office announced a broad initiative to examine the copyright implications of generative AI.  Later that year, the Office began gathering information from stakeholders through a series of public listening sessions, meetings with multiple stakeholders, and a solicitation of public comments on 34 questions relating to:  (1) the treatment of generative AI outputs that imitate the identity or style of human artists; (2) the copyrightability of material generated using AI systems; and (3) the legal implications of training AI models on copyrighted works.  The Office received over 10,000 written comments, including from authors, publishers, legal scholars, technology companies, and videogame developers.[26]

The Office released its first report on July 31, 2024; it concluded that federal legislation was urgently needed to address the unauthorized distribution of digital replicas and provided recommendations for the contours of such a law.[27]

The Office released its second report on January 29, 2025, reaffirming its view that copyrightability is determined on a case-by-case basis and requires human authorship, including sufficient human input and creativity.

On May 9, 2025, the Office issued a “pre-publication” version of its third report, which focuses on generative AI training and on which this article focuses.

As noted above, the Trump administration dismissed the Register of Copyrights Shira Perlmutter on May 10, 2025.  In late May, she sued to be reinstated.  In her complaint, she alleges that “[a] fourth and final part of the report is in the process of being finalized, and will address the topic of potential liability for infringing AI outputs.”[28]  In support of her motion for a temporary restraining order that would allow her to return to her job, she argued that she may “be unable to complete the report as expected by Congress” if someone else is appointed to her role.[29]  This mention of a fourth report surprised many since the Office had long said “[t]he Report is being released in three Parts.”[30]  That said, the May 2025 Report focuses on training, which many agree involves a different analysis than the treatment of AI tools’ outputs.

D. Comments the Copyright Office Received Relating to the Third Report

We reviewed the comments that dozens of stakeholders submitted to the Office and identified some clear trends in views on liability, as well as on the need for legislation addressing the use of copyrighted works to train AI tools.

AI Developer Views on Liability

Unsurprisingly, AI developers such as OpenAI, Google, and Meta reiterated their arguments from pending lawsuits that training should not be subject to copyright liability.[31]  They acknowledged that copyrighted data is “temporarily accessed for the unprotected ideas, concepts, and styles contained in the dataset—say, the number of fingers a human hand has, or what cars look like—to help the AI model learn facts about the world.”[32]  But they argued that this is a fair use:  training transforms the input data into a set of numerical weights and uses those weights—not the input data itself—to respond to user prompts.  For example, OpenAI explained:

Despite a common and unfortunate misperception of the technology, the models do not store copies of the information that they learn from. Instead, models are made up of large strings of numbers (called “weights” or “parameters”), which software code interprets and executes…. When asked for a response, the model uses its weights to write a new response each time it is asked. It does not copy its response from its pre-training data, or access it via a database.[33]
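As a rough intuition for this claim, consider the following toy sketch (our illustration, which assumes nothing about any particular company’s models and is vastly simpler than a real one): a trivial “model” whose only trained artifact is a matrix of numbers, from which new word sequences are sampled rather than copied.

```python
# Toy illustration (ours; far simpler than any real model).  "Training"
# reduces a text to a matrix of transition probabilities; "generation" samples
# new sequences from that matrix rather than copying stored text.
import numpy as np

corpus = "the cat sat on the mat and the dog sat on the log".split()
vocab = sorted(set(corpus))
idx = {w: i for i, w in enumerate(vocab)}

# Count word-to-word transitions (wrapping around so every row is nonzero),
# then normalize each row into probabilities.
weights = np.zeros((len(vocab), len(vocab)))
for a, b in zip(corpus, corpus[1:] + corpus[:1]):
    weights[idx[a], idx[b]] += 1.0
weights = weights / weights.sum(axis=1, keepdims=True)

# The trained artifact is just floating-point numbers; the training sentence
# itself is not stored in the matrix.
print(weights.dtype, weights.shape)  # float64 (8, 8)

# Generate a new sequence by sampling from the weights, word by word.
rng = np.random.default_rng(0)
word, output = "the", ["the"]
for _ in range(5):
    word = vocab[rng.choice(len(vocab), p=weights[idx[word]])]
    output.append(word)
print(" ".join(output))  # a new sequence drawn from learned statistics
```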

Developers also explained that they have implemented ways in which content owners can opt out of their works being used in training data.  For example, OpenAI explained that it has implemented a means for websites to exclude their content from being accessed by OpenAI’s “GPTBot” web crawler.[34]
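Per OpenAI’s published crawler documentation, that exclusion works through the standard robots.txt mechanism; a minimal example (the full-site form) looks like this:

```
# robots.txt at the site's root: asks OpenAI's GPTBot crawler not to access
# any page on the site.  A narrower Disallow path would exclude only part of it.
User-agent: GPTBot
Disallow: /
```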

When it comes to outputs, AI developers noted that the only situation in which they believe an output can be infringing (i.e., where the model has memorized training data) is quite rare.  For example, Google called it a “bug not a feature” that is best addressed by technological problem-solving rather than legislation.[35]  Developers explained that most models have built-in guardrails, including removing duplicates in training data (decreasing the likelihood that they are reproduced in an output), excluding protected content identified by rightsholders, preventing users from inputting infringing prompts, and filtering outputs.[36]  Developers thus argued that users alone should be liable for generating infringing outputs after circumventing these guardrails.[37]
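To make the first of those guardrails concrete, here is a minimal sketch (ours, and far simpler than production pipelines, which typically detect near-duplicates as well) of exact-duplicate removal from a training corpus; repeated documents are the ones most likely to be memorized and reproduced verbatim.

```python
# Minimal illustration of training-data deduplication (exact duplicates only;
# real pipelines also detect near-duplicates).  Fewer repeats means a lower
# chance a document is memorized and reproduced in an output.
import hashlib

def dedupe(docs: list[str]) -> list[str]:
    """Keep the first copy of each exact-duplicate document."""
    seen: set[str] = set()
    unique: list[str] = []
    for doc in docs:
        digest = hashlib.sha256(doc.encode("utf-8")).hexdigest()
        if digest not in seen:
            seen.add(digest)
            unique.append(doc)
    return unique

corpus = ["chapter one text", "chapter one text", "chapter two text"]
print(len(dedupe(corpus)))  # 2: the repeated document is dropped before training
```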

Content Owner Views on Liability

In contrast, many content owners argued that the training process necessarily constitutes infringement.[38]  For example, the Authors Guild commented that developers’ argument that training uses copyrighted works only for the relationships between the words and the contexts in which they are used “makes no sense as that is exactly what written expression is—words are combined in original ways to create meaning and art, and that expression and meaning is exactly what generative AI copies.”[39]  Content owners generally also argued that the use of copyrighted works to create outputs for a commercial purpose that competes with the original works cannot be considered fair.[40]

As to outputs, content owners argued that infringement because of memorization is more common than developers have acknowledged.  News/Media Alliance wrote, “[I]t is clear [AI models] are infested and overrun by these so-called ‘bugs.’”[41]  Some content owners also noted that users may be incentivized to tailor their prompts to effectively avoid a model’s guardrails and produce an infringing output.[42]

AI Developer Views on Legislation

The majority of AI developers commented that existing doctrine creates a sufficient framework to address training-related issues.[43]  They spoke out against legislation that would require disclosure of training processes and datasets (arguing that such information is protected as trade secrets), or that would impose any kind of collective or mandatory licensing of input data.  They emphasized the infeasibility of a licensing mandate given the massive amount of data required to train a model successfully.[44]

  • Meta asserted that “it would be impossible for any market to develop that could enable AI developers to license all of the data their models need,” noting that “[g]enerative AI models need not only a massive quantity of content, but also a large diversity of content”; deals with individual rightsholders “would provide AI developers with the rights to only a miniscule fraction of the data they need to train their models.”[45]
  • a16z commented that, “under any licensing framework that provided for more than negligible payment to individual rights holders, AI developers would be liable for tens or hundreds of billions of dollars a year in royalty payments,” which would serve as a barrier to AI development and innovation.[46]
  • Microsoft noted that excluding copyrighted data from training models—where licensing is not financially possible—would actually hurt newer and smaller developers the most as it would limit their avenues to create or collect the “massive and varied data sets” required for training.[47]
  • Similarly, Hugging Face deemed mandatory licensing a “worst of both worlds” scenario since it “would be costly enough to exclude any but the very largest companies from training new models, while still providing negligible additional income to the original data creators.”[48]

Some commenters even advocated for legislation to protect AI developers.  For example, the Computer and Communications Industry Association, whose members include Meta and Google, argued that—while existing legal frameworks “clearly permit the ingestion of large amounts of copyrightable material for the purpose of an AI algorithm or process learning its function”—there should be an explicit legislative exemption making clear that use of copyrighted works for AI training purposes does not constitute infringement.[49]  Some tech companies have since lobbied the administration to declare it categorically lawful to use copyrighted works for AI training.[50]

Content Owner Views on Legislation

Many content owners also called for legislation to address the use of copyrighted works as inputs in training AI tools.

First, they called for disclosure and transparency requirements around training records to ensure the ability to detect infringement.[51]  For example, the Directors Guild of America commented that technology companies should be required “to maintain detailed records that track the content they ingest to train their models to produce the manipulated works requested by the user.”[52]

Second, they advocated for an opt-in regime for a rightsholder’s works to be used in generative AI model training rather than the opt-out regime some developers have proposed.[53]  Further, they dismissed AI developers’ arguments regarding the infeasibility of licensing—and argued for voluntary, collective, or even compulsory licensing.

  • Authors Guild commented that AI developers can afford licenses since they are “spending millions and even billions on development and computing power.”[54]
  • Getty Images asserted that “[l]icenses to scaled quantities of content and metadata required to train Generative AI Models are already readily available.”[55]
  • Copyright Alliance commented that allowing developers to rely on the volume of works being used to avoid licensing requirements “would simply incentivize infringers to illegally copy more as a means for avoiding infringement—that cannot possibly be the law.”[56] It further said that “[t]he notion that licensing should not be required because these royalties may be small would turn copyright, and many other licensing models, on its head.”[57]

Third, some commenters, like the Authors Guild, suggested that legislation would be needed “[i]f the courts find fair use or leave openings for the unauthorized use of commercial literary texts for AI training purposes.”[58]  The Directors Guild of America even championed legislation that would limit the fair use defense so that it can only be “employed by humans and not machines, including [AI] models.”[59]

E. Summary of the Copyright Office’s May 2025 Report

The May 2025 Report addressed in detail:  (1) when copyright rights are implicated in the use of generative AI tools; (2) how each of the four fair use factors applies; and (3) the forms of licensing that can best accommodate the interests of both copyright owners and AI companies.

The Office conducted its analysis within the existing legislative framework; did not recommend any changes to that framework; and explicitly recommended against Congress adopting compulsory licensing for generative AI.

When Copyright Rights Are Implicated in the Use of AI Tools

As noted above, the Office concluded that virtually every stage of generative AI tool use can implicate copyright rights.

When it comes to the use of copyrighted works to train AI tools, the Office concluded that the steps required to produce a training dataset containing copyrighted works “clearly implicate the right of reproduction” since “[d]evelopers make multiple copies of works by downloading them; transferring them across storage mediums; converting them to different formats; and creating modified versions or including them in filtered subsets.”[60]  The Office also concluded that “[t]he training process also implicates the right of reproduction” since developers “download the dataset and copy it to high-performance storage prior to training” and “works or substantial portions of works are temporarily reproduced as they are ‘shown’ to the model in batches.”[61]

With respect to the models themselves, the Office concluded that “[w]hether a model’s weights implicate the reproduction or derivative work rights turns on whether the model has retained or memorized substantial protectable expression from the work(s) at issue.”[62]  As noted above, one court has dismissed infringement allegations against an AI model itself as “nonsensical,”[63] while another court has found allegations that a model itself was infringing sufficient to survive a motion to dismiss.  If courts ultimately conclude that certain models themselves are infringing, the Office acknowledged that “subsequent copying of the model weights, even by parties not involved in the training process, could also constitute prima facie infringement.”[64]

Finally, as to outputs, the Office concluded that outputs that replicate or closely resemble copyrighted works “likely infringe the reproduction right and, to the extent they adapt the originals, the right to prepare derivative works.”[65]

Fair Use

As noted above, the Office recognized that the fair use analysis is fact-specific and laid out the situations in which it believed that each factor would favor fair use—leaving the door open for some AI developers’ activities to be considered fair use, but not others.

Factor 1 – On the first factor, courts typically stress two main elements:  transformativeness and commerciality.

As to transformativeness, the Office rejected AI developers’ arguments that the use of copyrighted works to train AI models is not for expressive purposes.  It noted that language models absorb “not just the meaning and parts of speech of words, but how they are selected and arranged at the sentence, paragraph, and document level—the essence of linguistic expression.”[66]

The Office concluded, however, that “training a generative AI foundation model on a large and diverse dataset will often be transformative” since AI models “perform a variety of functions, some of which may be distinct from the purpose of the copyrighted works they are trained on,” such as helping users learn a foreign language by chatting with them on diverse topics and offering corrective feedback.[67]  The Office clarified:

[T]ransformativeness is a matter of degree, and how transformative or justified a use is will depend on the functionality of the model and how it is deployed.  On one end of the spectrum, training a model is most transformative when the purpose is to deploy it for research, or in a closed system that constrains it to a non-substitutive task. . . .  On the other end of the spectrum is training a model to generate outputs that are substantially similar to copyrighted works in the dataset. . . .  Where a model is trained on specific types of works in order to produce content that shares the purpose of appealing to a particular audience, that use is, at best, modestly transformative.[68]

As to commerciality, the Office concluded that “the analysis should not turn on the status of any individual entity but on the reality of whether the specific use in question serves commercial or nonprofit purposes.”[69]

The Office also noted—in light of allegations raised in litigation that some AI developers accessed pirated works or circumvented paywalls to obtain the copyrighted works they then used as AI training data—that “the knowing use of a dataset that consists of pirated or illegally accessed works should weigh against fair use without being determinative.”[70]

Factor 2 – For the second factor, the nature of the copyrighted work, the Office recognized that “facts will vary depending on the model and works at issue” and concluded that, “[w]here the works involved are more expressive, or previously unpublished, the second factor will disfavor fair use.”[71]

Factor 3 – On the third factor, the question is whether “the amount and substantiality of the portion used in relation to the copyrighted work as a whole, . . .  are reasonable in relation to the purpose of the copying.”[72]

When it comes to the use of copyrighted works in inputs, the Office concluded that this factor depends on the needs of the specific model at issue.  That is, “there may be cases where a more targeted round of training has more limited data requirements,” so using entire works may not be justified.  In other cases, however, “the use of entire works appears to be practically necessary,” such as where “internet-scale pre-training data, including large amounts of entire works, [is] necessary to achieve the performance.”[73]

When it comes to the use of works in outputs, the Office noted that the third factor “may weigh less heavily against generative AI training where there are effective limits on the trained model’s ability to output protected material from works in the training data” (i.e., to avoid memorization issues).[74]

Factor 4 – The fourth factor is “the effect of the use upon the potential market for or value of the copyrighted work.”[75]  The Office recognized three potential categories of harm resulting from generative AI tools.

First, it considered lost sales and concluded that a potential for such harm is “particularly clear in the case of works specifically developed for AI training.”  That is, when content in training datasets is copyrightable and is primarily or solely targeted at AI training, “widespread unlicensed use would likely cause market harm.”[76]  The Office noted that lost sales also are possible “where training enables a model to output verbatim or substantially similar copies of the works trained on, and those copies are readily accessible by end users.”[77]

Second, the Office noted that “[l]ost revenue in actual or potential licensing markets can also be an element of market harm.”[78]  The Office recognized, however, that “it is also unclear that markets are emerging or will emerge for all kinds of works at the scale required for all kinds of models.”[79]  Thus, the Office concluded that “[w]here licensing markets are available to meet AI training needs, unlicensed uses will be disfavored under the fourth factor.  But if barriers to licensing prove insurmountable for parties’ uses of some types of works, there will be no functioning market to harm and the fourth factor may favor fair use.”[80]

Third, the Office recognized potential harm through market dilution, a theory that no court has yet adopted and that the Office itself called “uncharted territory.”  The Office explained:

The speed and scale at which AI systems generate content pose a serious risk of diluting markets for works of the same kind as in their training data.  That means more competition for sales of an author’s works and more difficulty for audiences in finding them.  If thousands of AI-generated romance novels are put on the market, fewer of the human-authored romance novels that the AI was trained on are likely to be sold.  Royalty pools can also be diluted. . . .

Market harm can also stem from AI models’ generation of material stylistically similar to works in their training data. . . .  Even when the output is not substantially similar to a specific underlying work, stylistic imitation made possible by its use in training may impact the creator’s market.[81]

In short, the Office left the door open for content creators to argue market harm—even when they cannot show lost sales of a specific copyrighted work or lost licensing fees.

Licensing

The Office noted that “[f]ully licensed training datasets have supported the production of AI models and products capable of producing text, images, and music”[82] and recommended “allowing the licensing market to continue to develop without government intervention.”[83]

The Office “agree[d] with commenters that a compulsory licensing regime for AI training would have significant disadvantages,” including stifling the development of creative market-based solutions.[84]  The Office recognized that “[t]he growing licensing market does not itself establish that voluntary licensing is feasible at scale for all AI training needs.”[85]  Instead, the feasibility of licensing “will depend on the types of works needed, the licensing practices of the relevant industries, the design of the AI system, and its intended uses.  For instance, licensing a music model that can produce rudimentary jingles is different from licensing a state-of-the-art LLM that can compete on advanced reasoning benchmarks.”[86]

The Office concluded that an extended collective licensing (ECL) system “should be considered” if market failures “are shown as to specific types of works in specific contexts.”[87]  The Office stated that “courts have found that there is nothing intrinsically anticompetitive about the collective, or even blanket, licensing of copyrighted works, as long as certain safeguards are incorporated—such as ensuring that licensees can still obtain direct licenses from copyright owners as an alternative.”[88]  The Office “encourage[d] the Department of Justice to provide guidance, including on the benefit of an antitrust exemption in this context.”[89]

F. Implications of the Copyright Office’s Report

The extent to which Congress or courts will rely on the Office’s May 2025 Report when drafting legislation and ruling in pending cases, respectively, is unclear—especially, but not solely, in light of its having been released in draft form.  Even final Copyright Office reports generally lack the “force of statute” and are not binding.[90]  But several courts have recognized the Office’s expertise in the interpretation of the Copyright Act and held that the Office’s interpretations are given a “great deal of [] Skidmore deference”; their weight is evaluated under the traditional factors of care, thoroughness, consistency, formality, expertise, validity of reasoning, and persuasiveness.[91]  Because the May 2025 Report is “pre-publication,” however, Congress and the courts may be inclined to give less weight to the Office’s reasoning—despite its claim that there are no “substantive changes expected in the analysis or conclusions.”[92]

That said, even insofar as the May 2025 Report offers guidance, it suggests that the viability of a fair use defense will depend on the facts and circumstances of particular cases, and bright-line rules will be few and far between.  That conclusion—coupled with the recent decision in the Thomson Reuters case—demonstrates that this will continue to be hard-fought terrain in each litigation.

We are continuing to monitor the dozens of AI cases that are proceeding through the courts.  To stay up to date on developments, please subscribe to the Debevoise Data Blog.

 

[1] May 2025 Report (cover), https://www.copyright.gov/ai/Copyright-and-Artificial-Intelligence-Part-3-Generative-AI-Training-Report-Pre-Publication-Version.pdf.

[2] May 2025 Report at 74.

[3] The New York Times Co. v. Microsoft Corp., No. 1:23-cv-11195 (S.D.N.Y.); Daily News LP v. Microsoft Corp., No. 1:24-cv-03285 (S.D.N.Y.); Authors Guild v. OpenAI Inc., No. 1:23-cv-08292 (S.D.N.Y.); Alter v. OpenAI Inc., No. 1:23-cv-10211 (S.D.N.Y.); Basbanes v. Microsoft Corp., No. 1:24-cv-00084 (S.D.N.Y.).

[4] Andersen v. Stability AI Ltd., No. 3:23-cv-00201 (N.D. Cal.).

[5] Getty Images v. Stability AI Ltd., No. 1:23-cv-00135 (D. Del.).

[6] Doe v. GitHub, Inc., No. 4:22-cv-06823 (N.D. Cal.).

[7] Kadrey v. Meta Platforms, Inc., No. 3:23-cv-03417 (N.D. Cal.).

[8] Second Am. Compl. ¶¶ 13, 24, In re Google Generative AI Copyright Litig., No. 5:23-cv-3440 (N.D. Cal. June 27, 2024), ECF No. 47.

[9] Compl. ¶ 3, Nazemian v. NVIDIA Corp., No. 4:24-cv-01454 (N.D. Cal. Mar. 8, 2024), ECF No. 1; Compl. ¶ 3, Dubus v. NVIDIA Corp., No. 4:24-cv-02655 (N.D. Cal. May 2, 2024), ECF No. 1.

[10] Answer to First Consol. Am. Compl. at 2-3, Tremblay v. OpenAI, Inc., No. 3:23-cv-03223 (N.D. Cal. Aug. 27, 2024), ECF No. 176; OpenAI Defs.’ Answer to First Consol. Class Action Compl. at 3-4, Authors Guild v. OpenAI, Inc., No. 1:23-cv-08292 (S.D.N.Y. Feb. 16, 2024), ECF No. 75.

[11] Mem. of Law in Support of OpenAI Defs.’ Mot. to Dismiss at 8, The New York Times Co. v. Microsoft Corp., No. 1:23-cv-11195 (S.D.N.Y. Feb. 26, 2024), ECF No. 52.

[12] Kadrey v. Meta Platforms, Inc., No. 23-cv-3417, 2023 WL 8039640, at *1 (N.D. Cal. Nov. 20, 2023).

[13] Andersen v. Stability AI Ltd., 744 F. Supp. 3d 956, 982–84 (N.D. Cal. 2024).

[14] E.g., Mem. of Law in Support of OpenAI Defs.’ Mot. to Dismiss at 16, The New York Times Co. v. Microsoft Corp., No. 1:23-cv-11195 (S.D.N.Y. Feb. 26, 2024), ECF No. 52.

[15] Order Granting in Part and Denying in Part the Motions to Dismiss at 5, Tremblay v. OpenAI, Inc., No. 3:23-cv-03223 (N.D. Cal. Feb. 12, 2024), ECF No. 104 (“Plaintiffs here have not alleged that the ChatGPT outputs contain direct copies of the copyrighted books. Because they fail to allege direct copying, they must show a substantial similarity between the outputs and the copyrighted materials.”); see also Order Granting Mot. to Dismiss at 2, Kadrey v. Meta Platforms, Inc., No. 3:23-cv-03417 (N.D. Cal. Nov. 20, 2023), ECF No. 56 (plaintiffs had failed to allege “any similarity between LLaMA outputs and their books”).

[16] See, e.g., Compl. ¶ 41, Tremblay v. OpenAI, Inc., No. 3:23-cv-03223 (N.D. Cal. June 28, 2023), ECF No. 1.

[17] New York Times Co. v. Microsoft Corp., No. 23-CV-11195 (SHS), 2025 WL 1009179, at *10 (S.D.N.Y. Apr. 4, 2025).

[18] Id. at *10 (citations omitted).

[19] Thomson Reuters Enter. Ctr. GMBH v. Ross Intel. Inc., No. 1:20-CV-613-SB, 2025 WL 458520, at *7, 8 (D. Del. Feb. 11, 2025) (“only non-generative AI is before me today”).

[20] Id.

[21] Id.

[22] Thomson Reuters Enter. Ctr. GMBH v. Ross Intel. Inc., No. 1:20-CV-613-SB, 2025 WL 458520, at *7 (D. Del. Feb. 11, 2025).

[23] Id. at *10.

[24] Id. at *8 (emphasis in original).

[25] October 27, 2022 Letter from members of the Senate Judiciary Subcommittee on Intellectual Property to the USPTO and Copyright Office, available at https://www.copyright.gov/laws/hearings/Letter-to-USPTO-USCO-on-National-Commission-on-AI-1.pdf.

[26] February 23, 2024 Letter from the Copyright Office to the Senate Judiciary Subcommittee on Intellectual Property, available at https://www.copyright.gov/laws/hearings/USCO-Letter-on-AI-and-Copyright-Initiative-Update-Feb-23-2024.pdf.

[27] Copyright and Artificial Intelligence, Part 1: Digital Replicas, available at https://www.copyright.gov/ai/Copyright-and-Artificial-Intelligence-Part-1-Digital-Replicas-Report.pdf.

[28] Shira Perlmutter v. Todd Blanche, et al., No. 25-cv-1659 (D.D.C.), ECF No. 1 ¶ 19.

[29] Shira Perlmutter v. Todd Blanche, et al., No. 25-cv-1659 (D.D.C.), ECF No. 2-1 at 11.

[30] See, e.g., “Copyright Office Releases Part 2 of Artificial Intelligence Report” (January 29, 2025), available at https://www.copyright.gov/newsnet/2025/1060.html.

[31] See generally, Microsoft Comment Letter (Nov. 1, 2023), available at https://www.regulations.gov/comment/COLC-2023-0006-8750; OpenAI, Inc. Comment Letter (Nov. 1, 2023), available at https://www.regulations.gov/comment/COLC-2023-0006-8906; Google LLC Comment Letter (Nov. 1, 2023), available at https://www.regulations.gov/comment/COLC-2023-0006-9003; Meta Platforms, Inc. Comment Letter (Nov. 1, 2023), available at https://www.regulations.gov/comment/COLC-2023-0006-9027; Adobe Inc. Comment Letter (Nov. 1, 2023), available at https://www.regulations.gov/comment/COLC-2023-0006-8594.

[32] See Adobe Inc. Comment Letter at 3 (Nov. 1, 2023), available at https://www.regulations.gov/comment/COLC-2023-0006-8594.

[33] OpenAI, Inc. Comment Letter (Nov. 1, 2023), available at https://www.regulations.gov/comment/COLC-2023-0006-8906.

[34] OpenAI, Inc. Comment Letter (Nov. 1, 2023), available at https://www.regulations.gov/comment/COLC-2023-0006-8906.

[35] Google LLC Comment Letter at 13 (Nov. 1, 2023), available at https://www.regulations.gov/comment/COLC-2023-0006-9003.

[36] See Google LLC Comment Letter at 13 (Nov. 1, 2023), available at https://www.regulations.gov/comment/COLC-2023-0006-9003; see also OpenAI, Inc. Comment Letter (Nov. 1, 2023), available at https://www.regulations.gov/comment/COLC-2023-0006-8906; Anthropic PBC Comment Letter (Nov. 1, 2023), available at https://www.regulations.gov/comment/COLC-2023-0006-9021.

[37] See Google LLC Comment Letter at 14 (Nov. 1, 2023), available at https://www.regulations.gov/comment/COLC-2023-0006-9003.

[38] See generally, Authors Guild Comment Letter (Nov. 1, 2023), available at https://www.regulations.gov/comment/COLC-2023-0006-10320; Getty Images Comment Letter (Nov. 1, 2023), available at https://www.regulations.gov/comment/COLC-2023-0006-9044; New York Times Company Comment Letter (Nov. 1, 2023), available at https://www.regulations.gov/comment/COLC-2023-0006-8868; News/Media Alliance Comment Letter (Nov. 1, 2023), available at https://www.regulations.gov/comment/COLC-2023-0006-8956; Universal Music Group Comment Letter (Nov. 1, 2023), available at https://www.regulations.gov/comment/COLC-2023-0006-9014.

[39] Authors Guild Comment Letter at 2 (Nov. 1, 2023), available at https://www.regulations.gov/comment/COLC-2023-0006-10320.

[40] Rightsify Initial Comments at 4 (Nov. 1, 2023), available at https://www.regulations.gov/comment/COLC-2023-0006-8746.  See also IAC-DDM Joint Initial Comments at 7 (Nov. 1, 2023), available at https://www.regulations.gov/comment/COLC-2023-0006-8922 (“IAC-DDM Joint Initial Comments”) (“[T]he massive and systematic copying of copyrighted content for an avowedly commercial and substitutive purpose does not present a hard or close case.”); N/MA Reply Comments at 15 (Dec. 7, 2023), available at https://www.regulations.gov/comment/COLC-2023-0006-10318 (“Case law has generally not permitted copying for purposes that do not comment on or at least point to the original works, outside of defined, limited exceptions, such as to access functional computer code for interoperability purposes.”); UMG Initial Comments at 39–40 (Nov. 1, 2023), available at https://www.regulations.gov/comment/COLC-2023-0006-9014 (“We can think of no precedent for finding this kind of wholesale, commercial taking that competes directly with the copyrighted works appropriated to be fair use.”).

[41] News/Media Alliance Second Comment Letter at 11 (Dec. 7, 2023), available at https://www.regulations.gov/comment/COLC-2023-0006-10318.

[42] Motion Picture Association Comment Letter at 60 (Nov. 1, 2023), available at https://www.regulations.gov/comment/COLC-2023-0006-8970.

[43] See generally, Adobe Inc. Comment Letter (Nov. 1, 2023), available at https://www.regulations.gov/comment/COLC-2023-0006-8594; Google LLC Comment Letter (Nov. 1, 2023), available at https://www.regulations.gov/comment/COLC-2023-0006-9003; Meta Platforms, Inc. Comment Letter (Nov. 1, 2023), available at https://www.regulations.gov/comment/COLC-2023-0006-9027; Microsoft Comment Letter (Nov. 1, 2023), available at https://www.regulations.gov/comment/COLC-2023-0006-8750; OpenAI, Inc. Comment Letter (Nov. 1, 2023), available at https://www.regulations.gov/comment/COLC-2023-0006-8906.

[44] See Electronic Frontier Foundation Comment Letter at 4 (Nov. 1, 2023), available at https://www.regulations.gov/comment/COLC-2023-0006-8949.

[45] Meta Initial Comments at 17 (Nov. 1, 2023), available at https://www.regulations.gov/comment/COLC-2023-0006-9027.

[46] a16z Initial Comments at 10–11 (Nov. 1, 2023), available at https://www.regulations.gov/comment/COLC-2023-0006-9057.

[47] Microsoft Comment Letter at 6 (Nov. 1, 2023), available at https://www.regulations.gov/comment/COLC-2023-0006-8750.

[48] Hugging Face Initial Comments at 11 (Nov. 1, 2023), available at https://www.regulations.gov/comment/COLC-2023-0006-8969.

[49] Computer and Communications Industry Association Comment Letter at 4–5, 7 (Nov. 1, 2023), available at https://www.regulations.gov/comment/COLC-2023-0006-8740.

[50] https://www.nytimes.com/2025/03/24/technology/trump-ai-regulation.html.

[51] See, e.g., Universal Music Group Second Comment Letter at 6 (Dec. 7, 2023), available at https://www.regulations.gov/comment/COLC-2023-0006-10331 (noting a contradiction that developers want to protect training records as trade secrets yet want unlimited access to data and content).

[52] Directors Guild of America Second Comment Letter at 3 (Dec. 7, 2023), available at https://www.regulations.gov/comment/COLC-2023-0006-10305.

[53] See Adobe Inc. Comment Letter at 4 (Nov. 1, 2023), available at https://www.regulations.gov/comment/COLC-2023-0006-8594 (“Content Credentials allow creators to securely attach a ‘Do Not Train’ tag in the metadata of their work to indicate a preference to opt out of AI training.”).

[54] Authors Guild Reply Comments at 4 (Dec. 7, 2023), available at https://www.regulations.gov/comment/COLC-2023-0006-10320.

[55] Getty Images Initial Comments at 20 (Nov. 1, 2023), available at https://www.regulations.gov/comment/COLC-2023-0006-9044.

[56] Copyright Alliance Initial Comments at 72 (Dec. 7, 2023), available at https://www.regulations.gov/comment/COLC-2023-0006-10319.

[57] Copyright Alliance Reply Comments at 27 (Dec. 7, 2023), available at https://www.regulations.gov/comment/COLC-2023-0006-10319.

[58] Authors Guild Comment Letter at 16 (Nov. 1, 2023), available at https://www.regulations.gov/comment/COLC-2023-0006-9036.

[59] Directors Guild of America Second Comment Letter at 3 (Dec. 7, 2023), available at https://www.regulations.gov/comment/COLC-2023-0006-10305.

[60] May 2025 Report at 26.

[61] May 2025 Report at 27.

[62] May 2025 Report at 30.

[63] Kadrey v. Meta Platforms, Inc., No. 23-cv-3417, 2023 WL 8039640, at *1 (N.D. Cal. Nov. 20, 2023).

[64] May 2025 Report at 28.

[65] May 2025 Report at 31.

[66] May 2025 Report at 47.

[67] May 2025 Report at 45.

[68] May 2025 Report at 46.

[69] May 2025 Report at 51.

[70] May 2025 Report at 52.

[71] May 2025 Report at 54.

[72] Campbell v. Acuff-Rose Music, Inc., 510 U.S. 569, 586 (1994).

[73] May 2025 Report at 57.

[74] May 2025 Report at 59.

[75] 17 U.S.C. § 107(4).

[76] May 2025 Report at 63.

[77] May 2025 Report at 63.

[78] May 2025 Report at 66.

[79] May 2025 Report at 70.

[80] May 2025 Report at 71.

[81] May 2025 Report at 65-66.

[82] May 2025 Report at 90.

[83] May 2025 Report at 106.

[84] May 2025 Report at 104-105.

[85] May 2025 Report at 103.

[86] May 2025 Report at 103.

[87] May 2025 Report at 106.

[88] May 2025 Report at 104.

[89] May 2025 Report at 104.

[90] See, e.g., Kitchens of Sara Lee, Inc. v. Nifty Foods Corp., 266 F.2d 541, 544 (2d Cir. 1959); ABS Entm’t v. CBS Corp., 908 F.3d 405, 417 n.5 (9th Cir. 2018).

[91] See, e.g., WPIX, Inc. v. ivi, Inc., 765 F. Supp. 2d 594, 604 (S.D.N.Y. 2011) (finding that Office’s interpretation was entitled to “substantial weight” where Office had “demonstrated extreme diligence and thoughtfulness in gathering comments, doing research, addressing all relevant considerations, and explaining its decisions”); Brunson v. Cook, No. 3:20-cv-01056, 2023 WL 2668498, at *5 (M.D. Tenn. Mar. 28, 2023) (recognizing several circuits have applied Skidmore deference).

[92] May 2025 Report (cover).

***

The cover art used in this blog post was generated by Microsoft Copilot.

Author

Megan K. Bannigan is a partner and member of the Litigation and Intellectual Property & Media Groups, focusing on trademarks, trade dress, copyrights, false advertising, design patents, rights of publicity, licensing and other contractual disputes. She represents clients across a range of industries, including consumer products, cosmetics, entertainment, fashion and luxury goods, financial services, food and beverage, pharmaceuticals, professional sports and technology. She can be reached at mkbannigan@debevoise.com.

Author

Barbara N. Barath is a seasoned intellectual property litigator and experienced trial lawyer based in the San Francisco office, who helps lead the firm’s technology litigation practice within the Intellectual Property Litigation Group. She can be reached at bnbarath@debevoise.com.

Author

Christopher S. Ford is a counsel in the Litigation Department who is a member of the firm’s Intellectual Property Litigation Group and Data Strategy & Security practice. He can be reached at csford@debevoise.com.

Author

Kaumron Khorrami is an associate in the Litigation Department. He can be reached at kkhorram@debevoise.com.