Magistrate Judge Rejects Meta’s ‘False Dichotomy,’ Faults Some AI Discovery

(January 21, 2025, 4:23 PM EST) -- SAN FRANCISCO — Meta Platforms Inc.’s “false dichotomy” about the possible answers regarding its use of copyrighted material in an artificial intelligence case is based on minor quibbles about the extent of the copying and does not “justify the hopelessly vague” answers it offered about what its search of the training data revealed, a federal magistrate judge in California said Jan. 17 in partially granting motions to compel.

(Richard Kadrey, et al. v. Meta Platforms, Inc., No. 23-3417, N.D. Calif., 2025 U.S. Dist. LEXIS 9704)

(Public version of discovery order available.  Document #46-250205-023R.)

A public version of the ruling became available on Jan. 17.  The original Jan. 2 ruling was sealed.

Richard Kadrey and his co-plaintiffs filed an amended complaint on Dec. 11, 2023, in the U.S. District Court for the Northern District of California, alleging that Meta violated their rights in developing two of its large language models, Llama 1 and Llama 2.  The plaintiffs say their “copyrighted materials were copied and ingested as part of training Llama 1 and Llama 2,” claiming that their works were illegally absorbed into the “Books3” dataset Meta used to train the models.

Training

“To train the Llama 1 and Llama 2 language models, Meta copied the Books3 dataset, which includes the Infringed Works,” they allege.  “Plaintiffs never authorized Meta to make copies of their Infringed Works, make derivative works, publicly display copies (or derivative works), or distribute copies (or derivative works).  All those rights belong exclusively to Plaintiffs under copyright law.”

A series of discovery disputes arose, and Kadrey moved to compel Meta to respond to certain requests for admission.

Magistrate Judge Thomas S. Hixson granted the motion in part but otherwise denied it.

Texts

In response to a request regarding Meta’s use and inclusion of copyrighted works, the company simply confirms that “some text” and “text from” the works appears in its training data.  Meta argues that providing a more specific answer would be “enormously burdensome” and require a word-for-word comparison of training data and the copyrighted works, Magistrate Judge Hixson said.

But this falsely portrays Meta’s only possible answers as either admitting that some part of a copyrighted work appears in the training data or admitting that an exact copy of the work appears, Magistrate Judge Hixson said.  “But the Court sees through this false dichotomy.  If it looks like a dataset contains a book, a truthful answer is:  ‘admit as to substantially all of the book.’  Meta is not allowed to use minor quibbles about whether 99% of the book as opposed to exactly 100% of it was included as a pretext for providing no information at all about how much of the book was included,” Magistrate Judge Hixson said.

“Meta also claims that after having done a reasonable investigation, it concluded that it did not train any Llama model on the entirety of Plaintiffs’ books.  But that doesn’t justify the hopelessly vague ‘some text from’ or ‘text from’ responses.  The investigation that Meta says it did should allow Meta to give a better answer than that, such as ‘admit as to substantially all of the book,’ ‘admit as to about half of the book,’ or ‘admit as to some text from the book but deny as to substantially all of the book,’” Magistrate Judge Hixson said.

Further, Meta’s admission that at least one copyrighted work was contained in at least one of the training sets “avoids the thrust” of Kadrey’s request that it admit it did not seek permission to use the copyrighted works, Magistrate Judge Hixson said.  It would seemingly require little effort for Meta to answer the question, Magistrate Judge Hixson said.  Meta’s response that it used a portion or some content from various sources for training its AI is also vague.  “Meta must provide some estimate or approximation so that Plaintiffs can know if Meta is almost all admitting, mostly admitting, partly admitting, largely denying, or almost completely denying these RFAs [requests for admission].  Right now that’s not clear.  The Court agrees with Meta that this is an RFA, not [an interrogatory], so Meta is not obligated to provide a long narrative response, but its existing responses are inadequate because they do not indicate how much of the RFAs are admitted or denied,” Magistrate Judge Hixson said.

Copyright

Meta must also answer related questions about copyright holders and licensing agreements, Magistrate Judge Hixson said.  Additionally, Meta must answer whether it deleted copyrighted data, not simply whether it deleted training data used in the creation of AI, Magistrate Judge Hixson said.

Magistrate Judge Hixson denied the motion to the extent that it involved privilege logs, a request to reopen discovery, the crime-fraud exception, certain documents and data and Meta’s responses to interrogatories.

As to the crime-fraud exception, Magistrate Judge Hixson said that the issue seemed to overlap significantly with the copyright infringement claim and that there was a pending motion to amend the complaint to add a crime-fraud claim.

“The Court has a serious concern about a discovery motion that basically asks the undersigned magistrate judge to decide this lawsuit (including proposed additions to this lawsuit) on the merits, without a trial, in favor of the Plaintiffs.  That seems to get things out of order.  In a case like this where liability is hotly disputed, the Court is not willing to embrace a crime-fraud theory that requires the Court to decide the contested merits of the case in order to rule on a discovery motion,” Magistrate Judge Hixson said.

Counsel

Kadrey, et al. are represented by Joseph R. Saveri, Cadio Zirpoli, Christopher K.L. Young, Holden Benon and Kathleen J. McMahon of Joseph Saveri Law Firm LLP in San Francisco, Matthew Butterick in Los Angeles, Bryan L. Clobes in Media, Pa., and Alexander J. Sweatman in Chicago, both of Cafferty Clobes Meriwether & Sprengel LLP, and Daniel J. Muller of Ventura Hersey & Muller LLP in San Jose, Calif.

Meta is represented by Bobby Ghajar and Colette Ghazarian in Santa Monica, Calif., and Mark Weinstein, Kathleen R. Hartnett and Judd Lauter in Palo Alto, Calif., all of Cooley LLP, and Mark A. Lemley of Lex Lumina PLLC in New York.

(Additional documents available:  Joint discovery letter on request for admission.  Document #46-250205-024B.  Joint discovery letter on privilege logs.  Document #46-250205-025B.  Joint discovery letter on additional deposition time.  Document #46-250205-026B.  Joint discovery letter on crime-fraud.  Document #46-250205-027B.  Joint discovery letter on requests for production.  Document #46-250205-028B.  Joint discovery letter on interrogatories.  Document #46-250205-029B.  Kadrey’s motion for leave to file third amended complaint.  Document #46-250205-030M.  Meta’s opposition.  Document #46-250205-031B.  Kadrey’s reply.  Document #46-250205-032B.  Plaintiffs’ operative amended complaint.  Document #58-240123-001C.)