Anthropic and Fair Use

Olivia Sophie Rafferty, for ai fray:

The Northern District of California has granted a summary judgment for Anthropic that the training use of the copyrighted books and the print-to-digital format change were both “fair use” […]

However, the court also found that the pirated library copies that Anthropic collected could not be deemed as training copies, and therefore, the use of this material was not “fair”. […]

This is a mixed ruling on fair use – a loss for both copyright holders and Anthropic, but potentially a big win for AI platforms in general. And, if upheld, the order would mean that AI firms using copyrighted material to train their LLMs may be allowed in the future. The only exception to this would be if the material has been pirated.

The mixed ruling in this case is potentially very interesting for the wider AI industry. The court is establishing some precedent that training an LLM counts as ‘transformative’ because it uses existing copyrighted works to create new outputs. The ruling sure seems to validate training LLMs on large datasets of copyrighted material, in the eyes of this court at least.

On the other hand, ‘pirating’ a bunch of books and other content is obviously not fair use, and Anthropic will be on the hook for damages over those pirated copies in a future trial.