AI Briefing: Copyright battles bring Meta and OpenAI datasets under the microscope

Last week saw not one, but two high-profile AI legal battles in the spotlight, with updates in separate copyright cases against Meta and OpenAI.

Court documents unsealed in the AI v. Meta copyright case have raised new questions about the use of e-books from the piracy website Library Genesis (LibGen). They also raise new questions about how much CEO Mark Zuckerberg and other Meta officials knew about how Meta teams were using pirated content to train Llama models.

Court documents alleges that Meta staff attempted to remove copyright information — including headers and other identifiers — from various materials. One submission shows an internal meta document with a proposal to remove lines containing words like “ISBN”, “copyright” and “all rights reserved”. Other filing it features messages between employees talking about a desire to compete with other AI rivals, including beating OpenAI’s GPT-4, while describing French rival Mistral as “peanuts”.

Other documents include parts Zuckerberg’s testimony from his December deposition. Zuckerberg said the broad characterization made using pirated content “seem like a bad thing,” but added that Meta teams “think about it carefully because there’s often more nuance than meets the eye.” (Meta did not respond to Digiday’s request for comment on the court documents.)

The books in the LibGen dataset include titles from leading authors, including Ta-Nehisi Coates and Sarah Silverman, who are among the authors who filed the lawsuit. Zuckerberg claimed to be unaware of LibGen. However, the plaintiff’s attorney then questioned whether Meta would do business with a company that boasts of using pirated material.

“In general, if someone is broadcasting that they’re doing something that’s illegal, that would be a pretty big red flag that I would want us to look at carefully before engaging with them in any way,” Zuckerberg said.

When asked by a lawyer whether Meta should pull material from websites known to contain pirated material, Zuckerberg said YouTube hosts “a few percent” of pirated content, although most of the content is “kind of good and they have a license to . “

“Early on, I think people made some assumptions about YouTube’s intent on this and were less mature in developing intellectual property rights management,” Zuckerberg said. “But even then, I don’t think I’d say at that point that I don’t want Meta people to not use YouTube.” So – I don’t know.”

Other documents suggest that Meta staff knew that Llama’s training data had LibGen content and other copyrighted material from sources such as CommonCrawl. The documents also suggest that the Meta teams knew there could be backlash and potential fines under EU AI law if the use of LibGen was discovered. One paper mentioned the Meta teams suggesting that datasets be clustered to filter out potential information about bioweapons and harmful stereotypes.

NYT v. OpenAI and Microsoft

The revelations in the Meta case come as technology companies face increased scrutiny over the types of content used to train large language models. In a separate lawsuit between The New York Times and OpenAI, attorneys made oral arguments in court that outlined the key points that both sides make as part of the case. In both cases, plaintiffs allege the tech companies removed copyright information from content used to train AI models.

“You’re leaving people open to massive copyright infringement without being able to trace it,” said Steven Lieberman, a lawyer representing the New York Daily News, which filed a separate case against OpenAI and Microsoft. “It’s like causing your house alarm system to go off.

Out of Court – Publishers Inspire New Businesses with Artificial Intelligence

Last week, Axios and OpenAI announced a new partnership that includes funding for new local Axios newsrooms in four cities, including Pittsburgh, Pa., and Kansas City, Mo. The deal also gives Axios access to OpenAI technology to build new AI products, processes and systems. In the blog post Axios CEO Jim VanderHei wrote of the deal that the three-year agreement will also give all Axios employees access to the enterprise version of OpenAI.

That wasn’t last week’s AI news. The Associated Press and Google also announced a new partnership that includes AP providing real-time information to the Google Gemini app. The companies’ blog posts did not disclose the terms of the deal or what it would entail, but noted that the plan would help “increase the usefulness of the results” in the Gemini app. Kristin Heitmann, AP revenue director, established the updates are part of an ongoing relationship between the companies and “based on working together to deliver timely, accurate news and information to a global audience”.

In addition to Axios and the AP’s plans to expand on AI news, another company starting with “A” has taken a step back. Last week, Apple suspended its use of AI message alerts following criticism for generating inaccuracies in AI summary notifications. In the meantime, a new one message by DoubleVerify generates a network of more than 200 websites in detail “AI slop” that impersonates real publishers while deceiving adtech vendors and buyers.

Challenges and Products — More AI news and announcements

Anthrologic, a new startup co-founded by former MediaMonks executives, has launched to help brands create AI agents.
Adobe has unveiled a new generative artificial intelligence tool for its Firefly platform that aims to give retailers more ways to scale personalized content.
The US Supreme Court upheld the ban on TikTok unless the company is sold to an American entity.
UK Competition Authority he announced a new inquiry into Google’s search and search advertising to examine whether the giant has “strategic market status” under the UK’s newly adopted competition law. One reason for the CMA’s investigation is to ensure that AI startups are able to compete fairly with Google’s own AI products and services.
The FTC, which is investigating Snapchat’s My AI chatbot, he announced referred the investigation to the US Department of Justice. According to the FTC, the investigation involves “allegedly resulting risks and harm to young users.” “Although the Commission does not normally make public the fact that it has filed a complaint, we have decided that it is in the public interest.”

More AI-related stories from around Digiday

Source link

AI Briefing: Copyright battles bring Meta and OpenAI datasets under the microscope

Leave a ReplyCancel Reply

PENGU crypto jumps 50% – Traders, here’s the real support you should watch!

US Bitcoin ETFs Log First-Ever Back-to-Back $1B Inflows

DOGE Surges 9% Before Sharp Reversal as $0.213 Resistance Halts Rally

Leave a ReplyCancel Reply

Trending now

PENGU crypto jumps 50% – Traders, here’s the real support you should watch!

US Bitcoin ETFs Log First-Ever Back-to-Back $1B Inflows

DOGE Surges 9% Before Sharp Reversal as $0.213 Resistance Halts Rally