AI Trained on Millions of Books Avoids Liability... "Transformative Use" Decides the Outcome
U.S. Court Finds AI Training Transformative, "Similar to Human Reading"
Strict Standards for Commercial Services and Illegally Collected Data
The criteria for treating the use of copyrighted works to train generative artificial intelligence (AI) as fair use under copyright law are clear: the developer must prove transformative use and thoroughly dispel concerns that the training encroaches on the market for the original work.
The "Guide to Fair Use under Copyright Law for Training Generative AI with Works," published on February 26 by the Ministry of Culture, Sports and Tourism and the Korea Copyright Commission, sets out legal standards for determining whether AI training constitutes copyright infringement, based on an in-depth analysis of recent court rulings in major jurisdictions such as the United States and Europe.
The guide focuses on transformative use under the first fair use factor, the "purpose and character of the use." Transformative use means that the use does not simply substitute for the original work but adds new value by giving it a new expression, meaning, or message. Even an AI developed for commercial purposes may qualify as fair use if it creates new value without encroaching on the existing market for the work. Conversely, even nonprofit research organizations will face strict scrutiny if they merely imitate the original work and turn out an economic substitute for it.
Recent lower court rulings in the United States have increasingly recognized transformative use, a trend that favors AI developers. For example, the U.S. District Court for the Northern District of California found that Anthropic's reproduction of lawfully purchased books while training its large language model Claude constituted fair use. The court reasoned that an AI's statistical extraction of non-expressive, abstract elements such as grammar and structure, without reproducing the specific expression of a work, is analogous to a human reading numerous books and internalizing their content. In a separate case, the same court found Meta's use of data from illegal book repositories to train its language model Llama to be highly transformative.
However, both the United States and Europe apply strict standards to commercial services that compete directly with the original works. The U.S. District Court for the District of Delaware ruled that Ross Intelligence's unauthorized use of Thomson Reuters' legal database to train a similar AI legal search engine was unlawful. The court determined that the use was non-transformative because it served the same purpose as the original work, and that it posed a high risk of directly substituting for the competitor's product in the market.
Similarly, the Munich District Court I in Germany ruled that OpenAI's ChatGPT had clearly infringed reproduction and public communication rights by learning and reproducing, without authorization, lyrics managed by the music copyright group GEMA. The court also made clear that a model's memorization and permanent storage of training data is not covered by the "text and data mining" exception.
Even when the purpose is proven to be transformative, training on illegally obtained data enjoys no immunity. In the Anthropic ruling mentioned above, the court allowed training on lawfully purchased books but unequivocally treated the use of books downloaded from illegal sites as copyright infringement. As a result, Anthropic began the process of permanently deleting the illegally collected datasets and paying large settlement amounts to class action participants. The Korean government's guide likewise warns that data collected unlawfully by circumventing the robots exclusion standard or access restrictions is categorically excluded from fair use protection.
© The Asia Business Daily(www.asiae.co.kr). All rights reserved.