The artists' complaint, filed in California federal court Friday, centers on Imagen, a text-to-image diffusion model that uses machine learning to generate an image based on a short text description input by users. Similar suits have been filed against other AI companies, but this is likely the first one targeting Google LLC's model, counsel for the artists told Law360 on Monday.
Matthew Butterick, one of the attorneys representing the artists, said in a statement Monday that the complaint "describes another instance of a multi-trillion-dollar tech company choosing to train a commercial AI product on the copyrighted works of others without consent, credit or compensation."
According to the suit, Imagen is trained by copying "an enormous quantity of digital images" and "extracting protected expressions from these works." It uses a dataset from the nonprofit Large-scale Artificial Intelligence Open Network, or LAION, the artists said.
"During training of the model, the training images in the dataset are directly copied in full and then completely ingested by the model, meaning that protected expression from every training image enters the model," they said.
The artists said Google has admitted to copying their copyrighted images to train Imagen, which they say they verified by searching for their works in the LAION dataset.
"These copyrighted training images were copied multiple times by Google during the training process for Imagen," they said. "Because Imagen contains weights that represent a transformation of the protected expression in the training dataset, Imagen is itself an infringing derivative work."
The suit was filed by Jingna Zhang, a photographer, and three cartoonists and illustrators: Sarah Andersen, Hope Larson and Jessica Fink. They're hoping to represent a nationwide class of all persons or entities that own a copyright "in any work that Google used as a training image for the Google-LAION models" between April 2021 and the present.
Andersen has previously sued other AI companies over similar conduct. In January 2023, she filed a proposed class action against Stability AI Ltd., Midjourney Inc., DeviantArt Inc. and Runway AI Inc., alleging that they infringed artists' works through two models similar to Imagen.
In the Google suit, Andersen and the other artists said Google has strategically avoided disclosing much about its training dataset because it was aware of that other suit. It recently unveiled a newer version of Imagen, called Imagen 2, but "unlike the paper that accompanied the initial version of Imagen, Google's introduction of Imagen 2 carefully omits a detailed description of its training dataset," per the complaint.
Google was hoping to avoid being named as a defendant in a suit like this one, which challenges the "legality of training on mass quantities of copyrighted works without consent, credit or compensation," the artists said.
They added that one of the architects of the LAION image datasets is a Google employee "who Google hired specifically to exercise influence over the LAION organization and its image datasets."
Butterick said in Monday's statement that Google "has admitted using the notorious LAION-400M dataset to train its Imagen model, and possibly others."
His clients "are accomplished artists whose work was included in LAION-400M without their consent," he said.
Google spokesperson José Castañeda said in a statement that Google's AI models "are trained primarily on publicly available information on the internet."
"American law has long supported using public information in new and beneficial ways, and we will refute these claims in court," he said.
Butterick and the legal team representing the artists in the Google case are also representing Andersen and the artists in the Stability AI action. In that suit, the companies recently argued that the artists haven't shown proof that any of them actually infringed or induced infringement of their copyrighted works.
The datasets used by AI programs were assembled by nonprofits such as LAION, whose stated purpose is to archive portions of the internet and offer the material for free. AI companies have argued that because those nonprofits collect the content for academic research and provide it at no cost, its use qualifies as fair use.
Generative AI has spurred a slew of litigation from creators, including authors and musical artists. That has led Google and other tech companies to offer legal protections to customers accused of copyright infringement after using generative AI products. Also in March, Tennessee enacted first-of-its-kind legislation intended to tackle misuse of AI, modifying a state law banning unauthorized copies of artists' works to cover musicians, their voices and their songs.
The artists are represented by Laura M. Matson of Lockridge Grindal Nauen PLLP, Joseph R. Saveri of Joseph Saveri Law Firm LLP and Matthew Butterick.
Counsel information for Google wasn't immediately available Monday.
The case is Zhang et al. v. Google LLC et al., case number 3:24-cv-02531, in the U.S. District Court for the Northern District of California.
--Additional reporting by Ivan Moreno and Henrik Nilsson. Editing by Andrew Cohen.
Update: This story has been updated to include comment from Google.