OpenAI Seeks Partial Dismissal of New York Times Lawsuit, Alleging ‘Hacking’ in ChatGPT Case

By Dhivya Easwarasamy On Feb 28, 2024

OpenAI has asked a federal judge to dismiss parts of The New York Times’ copyright lawsuit against it, arguing that the media giant “paid someone to hack OpenAI’s products,” including ChatGPT, to generate 100 examples of copyright infringement for its case.

In a filing Monday in Manhattan federal court, OpenAI said the Times needed “tens of thousands of attempts to generate the highly anomalous results” and used “deceptive prompts that blatantly violate OpenAI’s terms of use.”

“Normal people do not use OpenAI’s products in this way,” the company stated in its filing.

What OpenAI calls “hacking” in the filing could also be described as prompt engineering or “red-teaming,” a common method used by artificial intelligence trust and safety teams, ethicists, researchers, and technology companies to “stress-test” AI systems for weaknesses.

It’s a standard approach in the AI sector and a popular way to warn businesses of problems with their systems, similar to how cybersecurity professionals stress-test company websites for flaws.

“In this filing, OpenAI doesn’t dispute — nor can they — that they copied millions of The Times’s works to build and power its commercial products without our permission,” Ian Crosby, Susman Godfrey partner and lead counsel for the Times, said in a statement to CNBC.
He went on to say, “What OpenAI bizarrely mischaracterizes as ‘hacking’ is simply using OpenAI’s products to look for evidence that they stole and reproduced The Times’s copyrighted works. And that is exactly what we found. In fact, the scale of OpenAI’s copying is much larger than the 100-plus examples set forth in the complaint.”

The filing comes amid a broader fight between OpenAI and publishers, authors, and artists over the use of copyrighted material as AI training data. The Times lawsuit is among the most prominent of these cases, and some regard it as a watershed moment for the industry.

The Times sued Microsoft and OpenAI in December, seeking billions of dollars in damages.

OpenAI has previously said that it would be “impossible” to train leading AI models without copyrighted materials.

“Because copyright today covers virtually every sort of human expression—including blog posts, photographs, forum posts, scraps of software code, and government documents—it would be impossible to train today’s leading AI models without using copyrighted materials,” OpenAI wrote in a filing last month in the U.K. in response to a House of Lords inquiry.
“Limiting training data to public domain books and drawings created more than a century ago might yield an interesting experiment, but would not provide AI systems that meet the needs of today’s citizens,” the company wrote in the same submission.

As recently as last month, OpenAI CEO Sam Altman said in Davos, Switzerland, that he was “surprised” by the Times’ lawsuit, adding that OpenAI’s models did not need to be trained on the publisher’s data.

“We actually don’t need to train on their data,” Altman remarked during a Bloomberg event in Davos. “I think this is something that people don’t understand. Any one particular training source, it doesn’t move the needle for us that much.”

Although a single publisher may not meaningfully affect ChatGPT’s capabilities, OpenAI’s filing suggests that a decision by a large number of publishers to opt out could.

In recent months, the company has been courting publishers to allow their content to be used as training data.

The company has already signed an agreement with Axel Springer, the German media giant that owns Business Insider, Morning Brew, and other publications, and is reportedly in talks with CNN, Fox Corp., and Time to license their content.

“We expect our ongoing negotiations with others to yield additional partnerships soon,” OpenAI stated in the filing.

In the filing and in blog posts, OpenAI emphasized its opt-out mechanism for publishers, which lets outlets block the company’s web crawler from accessing their websites.

At the same time, OpenAI argues in its filing that such content is critical for training today’s AI models.

“While we look forward to continuing to develop additional mechanisms to empower rightsholders to opt-out of training, we are actively engaged with them to find mutually beneficial arrangements to gain access to materials that are otherwise inaccessible, and also to display content in ways that go beyond what copyright law otherwise allows,” the company said in a statement.