OpenAI, The New York Times debate copyright infringement of AI tech companies in first trial arguments - adtechsolutions



The copyright infringement lawsuit between The New York Times and OpenAI kicked off with a hearing in federal court on Tuesday.

The judge heard arguments from both sides on the motion to dismiss filed by OpenAI and its financial backer Microsoft. The New York Times, along with The New York Daily News and the Center for Investigative Reporting (which have filed their own lawsuits against OpenAI and Microsoft), claims that OpenAI and Microsoft used publisher content to train the large language models that power their generative AI chatbots. The publishers argue that this puts the technology companies in competition with them: their own content is used to answer user queries, which removes the incentive for users to visit their sites for that information and ultimately harms their ability to monetize those users through digital advertising and subscriptions.

OpenAI and Microsoft say what they’re doing is covered by “fair use,” a legal doctrine that allows copyrighted material to be used to create something new that doesn’t compete with the original work.

The outcome of this lawsuit has major implications for the entire digital media ecosystem and will determine the legality of generative AI tools using a publisher’s copyrighted work without their consent for training.

Here were the main arguments during the hearing:

The New York Times argument

Use of copyrighted content

OpenAI uses content from The New York Times to train its large language models, sometimes by making copies of that content, the plaintiffs allege. Sometimes multiple paragraphs or entire articles that are part of this training dataset are returned in response to a user prompt. And in some cases, new content that the LLM was not trained on (because it was published after the training cut-off date) is also regurgitated by the LLM in response to a prompt. The plaintiffs cited examples of outputs containing verbatim language or summaries of articles without attribution to The New York Times.

LLMs copy content because they can’t process information like humans

People can read something, understand the basic information, and learn something new, which is not considered copying. But LLMs can’t do that because they’re machines, lawyers for The New York Times argue. That means the models absorb “expressions” of facts, not the facts themselves, which the lawyers say should be considered copyright infringement.

Generative AI search is different from a traditional search engine

Unlike a traditional search engine (where links to the original source are provided and the publisher can monetize this traffic through advertising or subscriptions), a generative search engine provides an answer to the question with sources in the footnotes. Those footnotes, lawyers for The New York Times argue, can contain multiple sources, interfering with the publisher’s ability to get that user to its website.

Avoiding paywalls

OpenAI has custom GPTs in its store, including products that help users get around paywalls. “Users have been posting on Reddit forums and social media about how they got around paywalls using a product called SearchGPT, and in fact OpenAI pulled the product after they became aware that the products were being used to infringe copyrights,” said Ian Crosby, a partner at Susman Godfrey and lead counsel for The New York Times.

Time-sensitive content is taken without attribution

Lawyers for The New York Times said content from Wirecutter, The Times’ product recommendation website, was used without proper attribution, meaning Wirecutter lost revenue from people not clicking through to the website and its affiliate links. This lifted content was sometimes time-sensitive, such as product recommendations around Black Friday. They argue that the content should be protected by the “hot news” doctrine, which they described as a part of copyright law that protects time-sensitive news from being used by competitors. The lawyers also argued that ChatGPT cited some products as being endorsed by Wirecutter when they were not, damaging the brand’s reputation.

OpenAI and Microsoft’s arguments

Fair use doctrine

Lawyers for OpenAI and Microsoft said their use of the copyrighted material in question is permitted under the fair use doctrine. AI companies are staunch supporters of the doctrine, which allows copyrighted material to be used without permission as long as the use is different from its primary purpose, is in a non-commercial context, and does not harm the copyright owner.

Annette Hurst, a lawyer representing Microsoft, said LLMs understand language and ideas that can be adapted for “everything from curing cancer to national security.” She added that, by the plaintiffs’ own account, the technology could be commercialized to the tune of billions of dollars.

How LLMs work

Defense attorneys also disagreed with their plaintiff counterparts when it came to describing how large language models operate. For example, an OpenAI lawyer said that LLMs don’t actually store copyrighted content, but rely only on weights derived from the training process.

“When I say to you, ‘Yesterday, all my troubles seemed so,’ we all think ‘far away’ because we’ve been exposed to this text so many times,” said Joe Gratz, an attorney at Morrison & Foerster, which represents OpenAI. “That doesn’t mean you have a copy of that song in your brain somewhere.”

Statute of limitations

Defense lawyers argued that the lawsuit should not be allowed to proceed because of the three-year statute of limitations for copyright infringement cases. But lawyers for the Times said there was no way to know until April 2021 that OpenAI would use publishers’ content in a way that would harm them.

“Misleading” examples

Lawyers for The Times say they have found millions of examples to make their case. However, OpenAI argued that the plaintiffs’ examples were misleading, both the examples of ChatGPT replicating copyrighted content and the examples of inaccurate AI-generated responses cited by the Times. Defense lawyers also allege that the Times used prompts to generate that AI content in ways that violated OpenAI’s terms. (Lawyers also noted that OpenAI has struggled to address the shortcomings.)

No evidence of damage

The Times’ claims include that OpenAI removes copyright management information (CMI) such as mastheads, author bylines and other identifiable information. However, OpenAI and Microsoft argue that the plaintiffs have not shown how they were harmed by the removal of CMI. They also claim that the plaintiffs have failed to prove that OpenAI and Microsoft willfully infringed copyright. But the plaintiffs said past court decisions have recognized that copying copyrighted content was infringement per se, without the need to prove dissemination or economic loss.

“Their biggest problem is that they don’t have a credible story about how they would be better off if the CMI they claim was removed had not actually been removed,” Gratz said. “… There is no way the world would be better for them in the way they say the world is not good for them if the CMI they say was removed had never been removed.”

What comes next

The Times’ lawsuit is just one of many lawsuits OpenAI is facing. While OpenAI won one case in November, other pending lawsuits include complaints from a group of Canadian news publishers, a group of American newspapers owned by Alden Global Capital, and a class action lawsuit filed by a group of authors. (OpenAI, Perplexity and Microsoft have also been drawn into Google’s ongoing search antitrust lawsuit, with subpoenas sent to all three companies.)

Other tech startups and giants have their own AI and copyright legal battles. Meta is facing a class action lawsuit filed by a group of writers including Sarah Silverman. Perplexity is a defendant in a lawsuit filed in October by News Corp. Google is facing a lawsuit brought against it by the Authors Guild.

It is unclear when U.S. District Judge Sidney Stein will issue his decision on whether to let the case proceed. Megan Gray, an attorney and founder of GrayMatters Law & Policy, attended the hearing in person and noted that Stein appeared to be “in it for the long haul” and was unlikely to dismiss the case so early.

“Judge Stein was engaged and curious, remarkable for his age and lack of technical maturity,” Gray said. “He understood the cases and the positions and also has a firm grip on his courtroom. He does not normally provide an audio line for the public and the fact that he has done so here shows that he is well aware of the importance of the case and its impact on society.”
