
The Apple Shadow Library Scandal



Apple Faces Copyright Infringement Lawsuit Over AI Training Data



By Dr. Will Rodríguez

TOCSIN Magazine | October 2025




In a legal challenge that could reshape the landscape of artificial intelligence development, Apple Inc. finds itself at the center of a high-stakes copyright infringement lawsuit. Two prominent neuroscientists have filed a class-action complaint alleging that the tech giant illegally used thousands of copyrighted books to train its flagship AI system, Apple Intelligence, marking yet another flashpoint in the growing conflict between technological innovation and intellectual property rights.



The Plaintiffs Take a Stand


Dr. Susana Martinez-Conde and Dr. Stephen Macknik, distinguished professors at SUNY Downstate Health Sciences University in Brooklyn, filed their complaint in a California court on October 10, 2025. The neuroscientists aren’t merely defending their own work—they’re positioning themselves as representatives of a broader class of authors whose intellectual property may have been systematically appropriated without consent or compensation.


The lawsuit specifically cites the unauthorized use of their book Champions of Illusion among the training materials, but the implications extend far beyond a single publication. At issue is Apple’s alleged reliance on so-called “shadow libraries”—vast repositories of pirated digital content scraped from the internet and compiled into massive datasets used to train large language models.



The Books3 Dataset: A Pandora’s Box of Copyright Issues


Central to the complaint is the Books3 dataset, a controversial collection that has become the de facto training corpus for many of the world’s most sophisticated AI systems. This dataset, comprising approximately 197,000 books totaling 37GB of text, was originally distributed via BitTorrent and made available through various file-sharing platforms.


Books3 emerged as part of The Pile, a larger training dataset used by major technology companies including Meta, Microsoft, and Nvidia. The dataset’s origins are unambiguously problematic: it consists entirely of ebooks downloaded from piracy networks, digitized without authorization from copyright holders. What was once a loosely organized collection of pirated texts has become foundational infrastructure for the AI revolution—a fact that has not escaped the notice of authors, publishers, and now, the courts.


In August 2023, the Danish anti-piracy group Rights Alliance successfully had Books3 removed from its primary hosting location, but by then, the dataset had already been downloaded and utilized by numerous AI developers. The proverbial genie was out of the bottle.



Apple’s AI Ambitions and the Training Data Question


Apple Intelligence, the company’s latest foray into generative AI, represents a significant strategic pivot for a company that has traditionally emphasized privacy and user control. Launched with considerable fanfare as part of the company’s broader AI integration across its ecosystem, Apple Intelligence promises sophisticated natural language understanding, content generation, and contextual assistance.


However, the plaintiffs argue that this technological achievement was built on an ethically and legally compromised foundation. According to the complaint, Apple’s AI model “benefited enormously” from training on pirated materials, essentially converting stolen intellectual property into proprietary commercial advantage.


The lawsuit raises a fundamental question that extends well beyond Apple: Can companies claim to respect intellectual property rights while simultaneously training their most valuable technologies on vast libraries of copyrighted material obtained without permission?



The Broader Legal Landscape


The lawsuit against Apple is far from isolated. The AI industry faces a growing wave of legal challenges from content creators across multiple domains:


  • Authors have filed similar suits against OpenAI, Meta, and other AI developers

  • Visual artists are challenging companies like Stability AI and Midjourney over image generation models

  • Music publishers are scrutinizing AI music generation platforms

  • News organizations are pursuing legal action over the use of journalistic content



What distinguishes these cases from previous copyright disputes is scale. Traditional copyright infringement typically involves discrete acts of copying—a pirated book, an unauthorized musical performance. AI training involves the systematic processing of millions of copyrighted works, transforming them into mathematical representations that enable machines to generate new content.
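To make the "mathematical representations" point concrete, here is a deliberately toy Python sketch. This is not any company's actual pipeline, and real systems use subword tokenizers and billion-parameter neural networks; the function names and the bigram statistic here are illustrative assumptions only. The idea it shows is simply that text is converted into integer IDs, and "training" amounts to accumulating statistics over those IDs rather than storing the books themselves.

```python
# Toy illustration: how text becomes numbers during training.
# Real LLMs use learned subword vocabularies and neural networks;
# this sketch only conveys the basic transformation.

from collections import Counter

def tokenize(text):
    """Map each distinct word to an integer ID (a toy 'tokenizer')."""
    words = text.lower().split()
    vocab = {w: i for i, w in enumerate(sorted(set(words)))}
    return [vocab[w] for w in words], vocab

def bigram_counts(token_ids):
    """Count adjacent-token pairs -- the simplest 'learned' statistic."""
    return Counter(zip(token_ids, token_ids[1:]))

ids, vocab = tokenize("the cat sat on the mat")
counts = bigram_counts(ids)
```

Whether such derived statistics "contain" the original work, and at what scale they start to, is precisely the kind of question these lawsuits ask courts to settle.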


This presents novel legal questions. Is AI training “fair use”? Does the transformative nature of machine learning exempt developers from copyright liability? Can authors demonstrate concrete economic harm when their individual works constitute infinitesimal fractions of massive training datasets?



The Fair Use Defense: A Double-Edged Sword


AI companies have consistently argued that training constitutes fair use under copyright law—a doctrine that permits limited use of copyrighted material without permission under certain circumstances. The fair use analysis traditionally considers four factors:


  1. The purpose and character of the use: Is it transformative? Is it commercial?

  2. The nature of the copyrighted work: Is it factual or creative?

  3. The amount used: How much of the work was copied?

  4. The market effect: Does the use harm the original work’s market value?



AI developers contend that training is highly transformative—the models don’t reproduce copyrighted texts but instead learn linguistic patterns and relationships. They argue that individual works aren’t stored or retrievable, and that AI-generated content doesn’t directly substitute for the original works.


Authors counter that using entire copyrighted works without permission fails the fair use test, particularly when the resulting AI systems compete in the same commercial markets as human authors. They point to instances where AI systems have reproduced substantial portions of copyrighted text, demonstrating that these models do, in fact, retain specific content.


The courts will ultimately decide, but these cases are likely to establish precedents that define the boundaries of copyright in the age of artificial intelligence.



Economic and Ethical Dimensions


Beyond the legal technicalities lies a more fundamental ethical question: In an era when AI can be trained on the cumulative cultural production of humanity, how should creators be compensated?


Authors invest years in developing their craft and producing works that embody not just information but creativity, insight, and original expression. When those works are appropriated en masse to train commercial AI systems worth billions of dollars, without permission or payment, many creators understandably feel exploited.


The counter-argument holds that all creative work builds on what came before—that no author works in isolation from the cultural commons. Just as human writers learn by reading widely, AI systems learn from vast corpora of text. Should this be treated differently?


Perhaps it should, because the scale and mechanism differ fundamentally. A human reader might purchase or borrow a few hundred books in a lifetime; an AI system processes millions of works in hours. A human writer synthesizes influences through cognitive processes that remain mysterious; AI systems generate text through mathematical operations that can be audited, understood, and potentially regulated.



What This Means for Apple


For Apple specifically, this lawsuit represents both a legal liability and a reputational challenge. The company has built its brand on premium quality, ethical design, and respect for user privacy. Being accused of training its AI on pirated material contradicts that carefully cultivated image.


Apple has not yet publicly responded to the specific allegations in this lawsuit, and the company typically maintains that it complies with all applicable laws and respects intellectual property rights. However, the complaint puts Apple in the uncomfortable position of defending practices that appear standard across the AI industry but nevertheless seem ethically problematic to many observers.


If the plaintiffs prevail, Apple could face significant financial penalties. More importantly, a ruling against Apple could force the company—and the broader AI industry—to fundamentally restructure how training data is obtained, potentially requiring licensing agreements with authors, publishers, and other content creators.



The Path Forward: Toward a Sustainable AI Ecosystem


The tension between AI development and copyright protection isn’t insurmountable, but resolving it will require creativity and compromise from all stakeholders.


Several potential solutions have emerged in industry discussions:


Licensing frameworks: Collective licensing agreements could allow AI developers to legally access large text corpora while compensating rights holders through mechanisms similar to those used in music licensing.


Opt-in/opt-out systems: Authors could be given clear mechanisms to include or exclude their works from AI training datasets, with appropriate compensation structures for those who opt in.


Synthetic and licensed data: AI developers could invest more heavily in creating synthetic training data or in licensing data from sources that clearly have the right to provide it.


Transparency requirements: Requiring AI developers to disclose what training data they’ve used would enable rights holders to identify unauthorized use and seek appropriate remedies.


Fair use clarification: Legislative or judicial clarification of how fair use applies to AI training could provide the certainty that both developers and creators need.


The lawsuit against Apple will be closely watched because Apple’s resources, legal sophistication, and public profile make it a formidable defendant. How Apple responds—whether it settles, fights in court, or works toward industry-wide solutions—could influence the trajectory of AI development for years to come.



Conclusion: Innovation Meets Accountability



The neuroscientists’ lawsuit against Apple crystallizes a fundamental challenge of our technological moment: How do we foster innovation while respecting the rights and contributions of creators?


The AI revolution promises extraordinary benefits—enhanced productivity, new forms of creativity, solutions to complex problems. But these benefits should not come at the expense of the authors, artists, and creators whose work makes AI possible.


As we stand at this legal and ethical crossroads, the decisions made in courtrooms and boardrooms over the coming months will determine not just the fate of individual companies or lawsuits, but the very framework within which artificial intelligence will develop. Will we build an AI ecosystem that exploits creators, or one that fairly compensates them? Will we prioritize speed to market over respect for rights?


The answers to these questions will define not just the future of artificial intelligence, but the kind of society we want to live in—one where technological progress and human dignity advance together, rather than at each other’s expense.


For Apple, this lawsuit is more than a legal inconvenience. It’s a test of whether the company can live up to its stated values while competing in an industry where cutting corners on copyright has become disturbingly routine. The world is watching, and the outcome will reverberate far beyond Cupertino.





Reflection Box



The Apple lawsuit marks a turning point in the story of artificial intelligence—a reminder that brilliance without integrity is a fragile triumph. The question is no longer whether machines can learn from human creation, but whether humanity can learn from its own inventions. As we stand before this mirror, we must decide if our technologies will echo our highest ethics or our weakest compromises.




For more in-depth articles on technology, ethics, and society, visit TOCSINMAG.com
