The Pirate Library in the Machine

David Baldacci may not be my cup of thriller, but his lawsuit points to the real question for AI fiction: not whether machines can write, but what they were fed before they opened their mouths.

I had never heard of David Baldacci until the algorithm dragged him across my carpet. One of those late-night TikTok scrolls where the For You page decides your literary education for the evening. There he was, or rather, his name attached to a sprawling backlist of political thrillers, airport paperbacks with embossed titles and stern-jawed heroes. I felt a small twinge of guilt for my ignorance. Then the next swipe revealed he was suing OpenAI, and the guilt evaporated into curiosity.

This isn’t another round of hand-wringing about “AI stealing creativity.” That story is starting to feel like a guild costume drama. The real one, the one worth writing about on a site dedicated to AI-augmented authorship and creative fiction, sits in the supply chain. Not the outputs. The inputs. The invisible library that fed the machines.

The Wrong Fight

It’s tempting to frame this as Famous Novelists vs. Silicon Valley. Baldacci, Grisham, Martin, Picoult, the Authors Guild, all the established names protecting their turf from the new disruptor. The press loves that angle. It photographs well. But it flattens something interesting into a morality play that doesn’t serve writers or the future of AI-assisted storytelling.

I’m not here to wring my hands over whether a language model can “imagine.” I use AI tools daily in my fiction workflow. They brainstorm with me, iterate dialogue, help me escape plot holes at 2 a.m. when the muse has ghosted me. AI-assisted writing isn’t the scandal. The scandal sits one layer down, in the possibility that some of the most valuable commercial fiction of the last few decades was vacuumed up from pirate repositories like Library Genesis and fed into billion-dollar models without permission or payment.

That distinction matters quite a lot to those of us building creative practices with AI rather than against it.

What the Baldacci Case Actually Reveals

Baldacci’s suit is now folded into the giant multidistrict litigation (MDL) known as In re OpenAI, Inc. Copyright Infringement Litigation, in the Southern District of New York. It sits alongside claims from a long list of heavy hitters across fiction and beyond. The core allegations come in three flavours: unauthorised reproduction of copyrighted books, use of those copies for training, and generation of outputs that themselves infringe.

Procedurally, the case is still very much alive as of mid-2026. Judge Sidney Stein denied OpenAI’s bid to dismiss the output-infringement claims late last year, finding that plaintiffs had plausibly alleged substantial similarity in at least some ChatGPT responses. The bigger fair-use question around the training data itself hasn’t been fully resolved. Discovery grinds forward, including orders for tens of millions of output logs. There was even a sealed motion to compel documents from Amazon, which gives you some idea of how far the evidence hunt has reached into the broader book ecosystem.

The spiciest detail, though, is still the Books1 and Books2 datasets, essentially LibGen1 and LibGen2. It’s undisputed that an OpenAI employee downloaded pirated books. The company later deleted those datasets in 2022. Privilege fights over internal communications about the deletion have shielded some of the context, but the underlying story of sourcing from shadow libraries doesn’t go away just because the files did.

Contrast that with the Anthropic litigation, Bartz v. Anthropic. Judge William Alsup drew a fairly clean line there: training on books could qualify as fair use under certain conditions, but hoarding and using millions of pirated copies crossed into clear infringement. Anthropic’s proposed $1.5 billion settlement is the price tag on that sourcing choice.

Training may be defensible. Piracy isn’t.

That’s the seam the plaintiffs are trying to pry open. Not whether machines can read and remix. Of course they can, and humans do the same thing constantly, only slower and with more forgetting. The question is provenance. Receipts. Where the library card came from.

The Bit That Actually Matters to Writers

Human writers read voraciously. We absorb, forget, misremember, remix, and transform. That messy, leaky process is protected by fair use doctrines developed over centuries precisely because culture requires it. No author owns the idea of a jaded detective or a ticking-clock conspiracy.

Machines, on the other hand, are trained at industrial scale by organisations with deep pockets and proper data pipelines. When those pipelines appear to lean on convenient pirate libraries rather than licensed or public-domain material, the ethical and legal terrain shifts under our feet. It stops being “the machine read a book.” It becomes “the corporation built a billion-dollar product on top of a hidden basement full of stolen goods.”

This is why I have no patience with either of the loud camps.

“AI is theft” doesn’t square with the actual creative collaboration I experience every week working with models on fiction. “Copyright is dead, adapt or die” lets companies launder opacity as innovation, and conveniently ignores that writers, especially mid-list and emerging ones, are already navigating brutal economics. Strip away their ability to control or benefit from large-scale commercial reuse of their work and you’ve made an already wobbly career structurally untenable.

The middle ground isn’t even complicated. AI-assisted fiction can be legitimate, interesting and authorially honest when there’s meaningful human direction, revision and accountability behind it. The underlying toolchain shouldn’t be sitting on stolen goods dressed up in “democratising creativity” language.

Where I Stand

On Moavis.nexus I spend most of my time in exactly this territory: how AI tools intersect with authorship, voice, and the craft of long-form fiction. I have no interest in joining an anti-AI crusade. I also refuse to pretend the current data foundations are pristine.

The problem isn’t that machines read books. The problem is that nobody can reliably see the library card.

Provenance, disclosure, opt-outs, licensing markets. None of these are boring bureaucratic details. They’re the guardrails that will decide whether AI-augmented storytelling becomes a genuine renaissance or a copyright landfill with eloquent chatbots perched on top.

A future where models are trained transparently on licensed corpora, where creators can opt in for compensation or attribution, where tools exist to trace influence and reward upstream authors when commercial products derive value from their work at scale: that future supports both the Baldaccis protecting their backlists and experimental writers like me poking around at the edges with AI co-pilots.

The alternative is opaque supply chains, endless litigation, and a culture that treats individual works as training fodder, an outcome that benefits only the platform owners.

Model Hygiene, and Why It Isn’t a Boring Phrase

David Baldacci’s personal literary style is almost beside the point. That’s what makes the case illuminating. If even competent, mass-market, commercially successful fiction becomes legal uranium in these training wars, we’re not debating genius. We’re debating the industrial harvesting of ordinary creative labour.

For those of us writing the next generation of fiction with AI, the lesson is pretty straightforward. Build responsibly. Document your process. Push the companies behind the models you use for transparency. Support licensing solutions that create new revenue streams instead of zero-sum punch-ups. Treat AI as an instrument: powerful, sometimes uncanny, but still subordinate to human authorship and ethical sourcing.

The pirate library in the machine doesn’t have to define what comes next. But ignoring it, or pretending it doesn’t matter because “progress,” more or less guarantees that it will.

Writers have always borrowed from the common well of story. The difference now is scale, speed, and corporate incentive. Getting the supply chain right won’t kill AI creativity. It will legitimise it.

And that, far more than any single lawsuit or TikTok discovery, is the story I’d rather be following. And writing.