Among Four Major AI Models, GPT-4 Is the Most Prone to Copyright Infringement

by Kim Jinyeong

Published 07 Mar. 2024 13:45 (KST)

Updated 07 Mar. 2024 14:21 (KST)


Patronus AI Announces Research Results
OpenAI: "Training AI Is Impossible Without Using Copyrighted Works"

A study has found that, among major artificial intelligence (AI) models, OpenAI's latest large language model (LLM), GPT-4, is the most likely to reproduce copyrighted content.

On March 6 (local time), AI model evaluation company Patronus AI announced the results of experiments measuring how readily four major LLMs reproduce copyrighted material: GPT-4 (OpenAI), Claude 2 (Anthropic), LLaMA 2 (Meta), and Mistral (Mistral AI).


For the experiment, the company selected popular copyrighted American books, such as Gillian Flynn's Gone Girl and Michelle Obama's Becoming. It then entered 100 different prompts, such as requests for a book's opening sentences or to continue a passage from it, and observed how each AI model responded.

GPT-4 reportedly complied with 44% of the prompts, meaning it fulfilled 44 of the 100 requests to retrieve, edit, or display copyrighted content. Mistral and LLaMA 2 followed at 22% and 10%, respectively, while Claude 2 produced copyrighted content only 8% of the time.

Patronus AI explained, "Claude 2 showed a high level of caution regarding copyright infringement, for example by refusing to complete book passages." Rebecca Qian, Patronus AI's Chief Technology Officer (CTO), noted, "Every AI model we evaluated reproduced copyrighted content, and it is surprising that GPT-4, the most widely used of them, performed the worst."

The findings are significant because they come as OpenAI faces several copyright infringement lawsuits. In September last year, bestselling authors including John Grisham and George R.R. Martin, whose novels were adapted into the drama Game of Thrones, filed a class-action lawsuit against Microsoft (MS) and OpenAI, claiming that their works were used without permission to train GPT language models.

In December last year, The New York Times (NYT) also sued OpenAI and others for billions of dollars in damages, alleging that millions of the newspaper's articles were used to train ChatGPT. Commenting on the research, Patronus AI stressed, "AI developers must prioritize building mechanisms that prevent unauthorized use of copyrighted material."

However, it is unclear whether OpenAI will heed such calls for copyright protection. In a written submission answering questions from the UK House of Lords in January, OpenAI stated, "Because copyright today covers virtually every kind of human expression, it would be impossible to train LLMs like GPT-4 without using copyrighted materials."


Meanwhile, on the same day it released the research results, Patronus AI introduced an AI tool called "Copyright Catcher." The company described it as the first automated evaluation and safety platform designed to help companies develop LLMs with confidence.


This content was produced with the assistance of AI translation services.