Copyright or Copywrong? The Legal Challenges of Artificial Intelligence
- Guramrit DHILLON
- May 6
Written by Meredith Wang
Edited by Yatika Singh
Meredith is an undergraduate in the Dual BA Program between Sciences Po and Columbia University, studying Government and Economics. She's passionate about politics, law, climate action, and everything in between.
The future of the international order seems increasingly dependent on technological advancement, as OpenAI and Google lobby the US government to allow their models to train on copyrighted material [1]. The companies argue that this is the only way for their AI models to compete with Chinese counterparts and to avert risks to national security. However, dozens of accumulated copyright lawsuits reflect the ongoing controversy over how artificial intelligence is trained and whether that use is ethical.
By analyzing the legal battles in Thomson Reuters v. Ross Intelligence, The New York Times Co. v. Microsoft Corp. et al., and Sarah Silverman et al. v. OpenAI, this paper examines how judicial interpretations of copyright infringement and fair use in AI training could shape the future legal framework for GenAI and intellectual property protection.
Copyright and Fair Use
Under the U.S. Copyright Act, as soon as a creator fixes a work in a tangible medium (e.g. a drawing on paper, words on a website), the creator holds the exclusive right to reproduce the work, prepare derivative works based on it, distribute copies to the public, publicly display the work, and more [2]. The goal of copyright is to balance encouraging the creation of new and useful works, by letting creators monetize their work, with reasonable limits on that protection so society can still benefit from new ideas and information.
To reconcile these conflicting goals, there is a notable exception to this rule among many, namely the fair use doctrine. In the U.S., Section 107 of the Copyright Act codifies this defense, providing that the fair use of a copyrighted work “for purposes such as criticism, comment, news reporting, teaching (including multiple copies for classroom use), scholarship, or research” [3] does not constitute infringement. Fair use is an affirmative defense to a claim of copyright infringement, meaning the burden falls on the alleged infringer to prove the use was fair – the category OpenAI and Google claim all their usage falls under. Although the question of whether AI machine learning qualifies as fair use remains contested, current judicial interpretations of copyright infringement and fair use are already shaping the future legal framework for generative AI and intellectual property protection.
Thomson Reuters v. Ross Intelligence: A Landmark Case
The case of Thomson Reuters v. Ross Intelligence offers one of the most recent and detailed examinations of AI copyright infringement. Thomson Reuters, a Canadian media and technology conglomerate, provides legal research tools and proprietary data through its platform Westlaw. On the other side is ROSS Intelligence, an upstart in the legal research industry seeking to create a “natural language search engine” [4].
Thomson Reuters owns Westlaw, a prominent legal research platform that provides headnotes (summarizing the main points of law and case holdings) and compiles judicial opinions through its “Key Number System” (organizing opinions by type of law), both of which Thomson Reuters has registered as copyrighted material. ROSS initially attempted to license Westlaw’s content to train its AI system [5], but Thomson Reuters refused since ROSS was its direct competitor. ROSS then pivoted to a third-party legal-research company, LegalEase Solutions. ROSS converted mass amounts of LegalEase memos into usable machine-learning training data by first encoding the written language as numerical data, then running the data through a “Featurizer” that categorized and performed mathematical calculations on the text. The main contention of the case lies in ROSS’ use of Westlaw’s copyrighted headnotes – concise summaries of judicial decisions – especially prevalent in ROSS’ Bulk Memo Project [6].
In response, ROSS admitted that the headnotes “influence(d)” [7] the questions in the memos but maintained that lawyers ultimately drafted them, and denied copying them. The LegalEase memos also drew on 91 legal topics from Westlaw’s Key Number System; ROSS admitted to “considering” these topics when creating its own set of 38 topics for an experimental “Classifier Project,” which was later abandoned.
In a landmark decision on February 11, 2025, Judge Bibas revised his 2023 ruling and found direct infringement of 2,243 headnotes [8]. While the court had originally denied summary judgment on most issues and held that they should be resolved by a jury (including whether the headnotes met the originality threshold under copyright law or simply overlapped with portions of uncopyrightable judicial decisions), Judge Bibas concluded that many of those issues could be resolved in Thomson Reuters’ favor. He acknowledged that he had been wrong to focus only on this overlap, since Thomson Reuters’ selection and arrangement of its headnotes were copyrightable. Further, the individual headnotes were found to satisfy the originality standard, since the editorial discretion and selection process constituted the attorneys’ creative work to be protected. Hence, the substantial similarity found between ROSS’ Bulk Memos and Thomson Reuters’ headnotes established infringement of copyrighted content.
A fair use defense turns on four factors: the purpose and character of the use (whether it is commercial or for nonprofit, educational purposes), the nature of the copyrighted work, the amount and substantiality of the portion used, and the effect on the potential market for or value of the work [9]. The first and fourth factors were ruled in favor of Thomson Reuters, while the remaining two favored ROSS [10].
The court first rejected ROSS’ argument that its usage was protected as “transformative use,” stating that ROSS “does not have a further purpose or different character from Thomson Reuters.” While the court acknowledged that certain previous courts had found that intermediate copying supported a fair use defense, it found those cases distinct because they concerned computer code that serves a functional purpose out of necessity. The court also noted that in cases where intermediate copying was allowed, it was deemed necessary to innovate (i.e. to discover the functional requirements for compatibility), which was not the case with ROSS’ use of Thomson Reuters’ headnotes. Instead, this instance of intermediate copying served to “make it easier to develop a competing legal research tool.”
Regarding the second factor, the court favored ROSS, ruling that the Westlaw headnotes and Key Number System are only minimally creative; however, it ultimately decided this factor carried less weight in the overall fair use analysis. Similarly, the third factor favored ROSS, as Thomson Reuters’ material was an insubstantial component of ROSS’ final product for the public.
On the final factor, the court found that ROSS’ use of headnotes could impact Thomson Reuters’ legal research business and its right to sell data for training purposes – hence, its potential market for AI training data. The court concluded that copyright exists to encourage people to create works like legal research tools, and that ROSS could have created its own legal search tool without infringing on Thomson Reuters’ rights.
The decision in Thomson Reuters v. ROSS Intelligence serves as an important precedent in the growing jurisprudence on AI training data and traditional copyright law. It is important to note that the case was decided in the specific context of ROSS’ final product being a legal-research engine that utilizes artificial intelligence, not a GenAI tool directly. Hence, other cases may present more compelling “transformative use” arguments under the first factor than this one did. Nevertheless, this recent ruling solidified the importance of protecting structured legal content from unauthorized AI training, and the need to further address the use of copyrighted AI training data.
The New York Times v. Microsoft and OpenAI: On Journalism
Similar to the case of Thomson Reuters is that of The New York Times Co. v. Microsoft Corp. et al. In December 2023, The New York Times Company filed a lawsuit against Microsoft Corporation and OpenAI [11], alleging unauthorized use of its copyrighted content to train large language models (LLMs). The Times contends that Microsoft’s “Browse with Bing” and OpenAI’s ChatGPT reproduce substantial portions of its content without proper attribution.
Under the four-factor fair use framework, the Times claims that OpenAI’s and Microsoft’s AI models often regurgitate journalists’ quotes without “transformative usage.” Further, in contrast to Thomson Reuters v. Ross Intelligence, the Times champions the “creative and deeply human” [12] work of its journalists. The third factor is less contested – the models were trained on millions of copyrighted works from news organizations, which the publishers argue were used without consent or payment, amounting to substantial copyright infringement. Times attorney Ian Crosby maintained, in addition, that ChatGPT and Bing have become substitutes for the publisher’s original work in providing information, placing them in direct competition with the Times.
OpenAI and Microsoft’s legal teams respond that LLMs do not retrieve entire documents or simply regurgitate journalism; instead, they extrapolate information and transform the text. Under this characterization, Microsoft claims that copyright law is “no more an obstacle to the LLM than it was to the VCR” [13].
The outcome of this fair use case remains difficult to predict, as the litigation is still ongoing as of April 2025. However, the court has decided to move forward with the case despite public speculation that judges will favor OpenAI and be unwilling to impose drastic remedies on a powerful tool with millions of users. Nevertheless, LLMs producing near-verbatim reproductions of journalistic works threaten the very core of traditional journalism, and the livelihoods of writers and journalists.
Sarah Silverman v. OpenAI: On the Creative Industry
While Thomson Reuters v. Ross Intelligence and Times v. Microsoft et al. stand as more promising cases for holding AI data usage accountable, Sarah Silverman v. OpenAI illustrates a less successful challenge to the fair use defense.
In July 2023, comedian and author Sarah Silverman, along with novelists Christopher Golden and Richard Kadrey, filed a lawsuit against OpenAI, alleging that the company used their copyrighted books without permission to train its AI language model, ChatGPT [14]. The plaintiffs claimed that their works were obtained from unauthorized sources and incorporated into OpenAI's training data, leading to potential infringement when the AI generated summaries or content based on their books.
District Judge Araceli Martínez-Olguín dismissed five claims accusing OpenAI of copyright violations, saying the authors had failed to prove economic injury or to cite any particular output from ChatGPT that was “substantially similar” [15], or similar at all, to their books. Because the authors failed to prove alleged direct copying of their work, their burden was to show “substantial similarity” between the AI outputs and the copyrighted materials. While the authors hit this significant roadblock in pursuing their lawsuit, they claim to have evidence of “direct copying” [16] of the copyrighted books to train LLMs.
Though facing some setbacks, the case is ongoing, highlighting the complex legal challenges surrounding the use of creative works in AI training. Given the relatively underdeveloped law on AI and copyright protection, infringement is perhaps harder to prove in creative industries: the substantiality of the portion used (the third fair use factor) and the effect on the market value of the original work (the fourth factor) are more difficult to establish than in journalism or legal summaries, where LLMs rely far more on existing models of analysis and produce similar works in direct competition with the original works.
Balancing AI Innovation and Copyright Protections
These cases reflect the broader challenges of regulating AI’s reliance on copyrighted materials. There persists a blurry yet important distinction between using original work as input to train an LLM and reproducing it directly in the generated output. While it is important for AI to enjoy the freedom of information for the sake of innovation in the public good, it is equally important to balance that freedom with the rights of the attorneys who drafted original opinions in the case of Westlaw, the journalists who perhaps risked their lives reporting on the ground in the case of the Times, and the artists creating work by channeling their own lived experiences.
In the coming years, courts and legislators will need to clarify AI’s obligations toward intellectual property. Until then, while established institutions have the resources to sue and to hold major corporations like Microsoft or OpenAI accountable, small artists and grassroots journalists ought to have their rights protected as well, as they remain the most vulnerable to GenAI’s vast consumption of data.
Bibliography
[1] Berger, Virginie. “The AI Copyright Battle: Why OpenAI and Google Are Pushing for Fair Use.” Forbes. March 15, 2025. https://www.forbes.com/sites/virginieberger/2025/03/15/the-ai-copyright-battle-why-openai-and-google-are-pushing-for-fair-use/.
[2] U.S. Patent and Trademark Office. “Copyright Basics.” Accessed April 19, 2025. https://www.uspto.gov/ip-policy/copyright-policy/copyright-basics.
[3] Harvard University Office of the General Counsel. “Copyright and Fair Use.” Accessed April 19, 2025. https://ogc.harvard.edu/pages/copyright-and-fair-use.
[4] Ross Intelligence. “Features.” Accessed April 19, 2025. https://www.rossintelligence.com/features.
[5] Gold, Kimberly, Jamie White, and Rachel A. Schwartz. “Court Finds No AI Fair Use in Thomson Reuters v. Ross Intelligence.” Reed Smith LLP. March 27, 2025. https://www.reedsmith.com/en/perspectives/2025/03/court-ai-fair-use-thomson-reuters-enterprise-gmbh-ross-intelligence.
[6] Skadden, Arps, Slate, Meagher & Flom LLP. “Court Reverses Itself in AI Training Data Case.” Skadden. February 26, 2025. https://www.skadden.com/insights/publications/2025/02/court-reverses-itself-in-ai-training-data-case.
[7] Chanana, Sushila. “Thomson Reuters v. Ross Intelligence: AI Copyright Law and Fair Use on Trial.” The Recorder. December 15, 2023. https://www.fbm.com/sushila-chanana/publications/thomson-reuters-v-ross-intelligence-ai-copyright-law-and-fair-use-on-trial/.
[8] DDG. “Copyright Infringement in AI Training: Fair Use Rejected in Thomson Reuters Case.” DDG Avocats. March 2025. https://www.ddg.fr/actualite/copyright-infringement-in-ai-training-fair-use-rejected-in-thomson-reuters-case.
[9] University of Louisville. “Four Factor Fair Use Analysis.” University of Louisville Copyright Office. Accessed April 19, 2025. https://louisville.edu/copyright/resources/four-factor-analysis.
[10] Ibid.
[11] Dave, Paresh. “The New York Times Says OpenAI Erased Potential Lawsuit Evidence.” Wired. April 4, 2025. https://www.wired.com/story/new-york-times-openai-erased-potential-lawsuit-evidence/.
[12] The New York Times Company v. Microsoft Corporation and OpenAI Inc. Complaint. December 27, 2023. https://nytco-assets.nytimes.com/2023/12/NYT_Complaint_Dec2023.pdf.
[13] Weller, Chris. “Microsoft Compares NYT’s OpenAI Lawsuit to Movie Studios Trying to Kill the VCR.” Ars Technica. March 14, 2024. https://arstechnica.com/tech-policy/2024/03/microsoft-compares-nyts-openai-lawsuit-to-movie-studios-trying-to-kill-the-vcr/.
[14] Robertson, Adi. “Sarah Silverman’s Copyright Lawsuit Against OpenAI Is Moving Forward.” The Verge. February 13, 2024. https://www.theverge.com/2024/2/13/24072131/sarah-silverman-paul-tremblay-openai-chatgpt-copyright-lawsuit.
[15] Ibid.
[16] Brooke, Thomas W., and Allysan Scatterday. “Sarah Silverman Ruling Shows ChatGPT Results, Not Inputs, Are Key.” Bloomberg Law. March 1, 2024. https://news.bloomberglaw.com/us-law-week/sarah-silverman-ruling-shows-chatgpt-results-not-inputs-are-key.