The Authors Guild and a number of well-known writers have brought a class action against OpenAI and its various subsidiary corporations and partnerships alleging violations of copyright law in its use of materials to develop its large language models (LLMs), including ChatGPT.
For eight years I made my living as the primary legal editor of a publication that covered class actions, and while that was some years back, I still know a lot about the subject. So I read through the complaint (you can find it here in PDF form) and have some thoughts.
Here’s the court record if you want to look up more information.
First of all, this action was brought by a very sophisticated and prominent firm of class action lawyers, Lieff Cabraser Heimann & Bernstein. Headquartered in San Francisco, the firm has been a major player in complex litigation since 1972, handling among other things some of the major tobacco cases, litigation over the Exxon Valdez, and other major product liability and class actions.
The other firm in the case, Cowan, DeBaets, Abrahams & Sheppard, is very experienced in copyright and technology law, according to their website.
Secondly, the case was brought in the U.S. District Court for the Southern District of New York. Given the number of publishers located in New York, that’s an obvious place to bring an action related to copyright. It’s assigned to Judge Sidney H. Stein, who was appointed by Bill Clinton.
A quick Google search indicates that Judge Stein has handled a number of copyright cases.
Thirdly, this suit is only against OpenAI (in its many legal forms) and over ChatGPT and the other versions of that software. Given that there are other companies doing the same thing, I have no doubt that more suits will follow.
Fourthly, this case is strictly over violations of copyright law in using work by authors of fiction. The proposed class covers works of fiction that have registered copyrights, have sold at least 5,000 copies, and were used in programming the LLMs.
Again, this leaves out a lot of copyrighted materials that could be the subject of other suits, including nonfiction and books that sold fewer copies but were still used in developing the software.
And fifth, because they’ve restricted the case to books with properly registered copyrights, they can seek statutory damages based on violation of copyright law. That allows them to get around a major problem in class actions: a huge number of class members with very different actual damages.
If the court agrees that use of the materials by OpenAI was a copyright violation, the statutory damages will be available and there will be no need to prove individual damages.
According to the complaint, statutory damages are up to $150,000 per copyright. That could add up.
The plaintiffs allege that OpenAI can identify the books it used to “train” (and I note that they put “train” in quotation marks just as I would do) their software. Discovery in this case will be fascinating, since I’m sure one of the first things the plaintiffs will ask for is a list of all materials used.
Proving this use is a violation of copyright law will probably be complicated. I’m sure the defendants will argue that it is not, and perhaps even that their use of the books was no different from people reading them.
I think there are very good arguments that feeding book texts into software is not “reading.” First of all, no individual human being could read all the books put into that software.
Secondly, the software doesn’t read and think about the books the way a person would. ChatGPT is not reading a book; it is processing and indexing data.
I’m no copyright expert, but I think the plaintiffs have a strong argument that use of their books in this way violates copyright.
I don’t know if they’ll win, but I see this action as a strong start to what I suspect will be multiple class actions over these products. It may take some time to resolve; the tobacco cases took years, and the plaintiffs lost the initial cases but kept bringing new and different ones.
Big class actions are often settled. I hope this one won’t be. We need some definitive rulings on the copyright issue not just to deal with past violations, but to restrict how these companies use materials in the future.
I also think the named plaintiffs are bringing this case for the principles involved, not the money. However, a big enough settlement could affect how the chatbot companies do things in the future.
Copyright issues are far from the only problems with the chatbots, but they are a good starting point for litigation. In the past, class actions and other complex litigation have led to regulatory reform. For example, much of the safety regulation in automobiles came about as a result of product liability suits.
Such cases, especially if a few of them are successful, can put pressure on government agencies to take real action.
The tech industry has foisted any number of bad and dangerous products on us all over the years. It’s way past time for people to start fighting back.