The largest artificial intelligence firms are able to afford access to quality data from content producers like the New York Times, while smaller startups are being left out. This dynamic risks concentrating markets and creating unassailable barriers to entry. Compulsory licenses offer one solution to lower barriers to entry for nascent AI firms without harming content producers and consumers, writes Kristelia García.
Editor’s Note: This article is part of a symposium that examines the competitive risks posed by artificial intelligence content licensing agreements—mechanisms that on the surface appear to balance copyright protection with innovation. How might these deals compel us to rethink the relationship between copyright, data protection, innovation, and competition in the age of AI? How can we seek to uphold all four, while recognizing the inevitable tradeoffs involved? You can read the articles from Mark Lemley and Jacob Noti-Victor (writing together), Matthew Sag, Christian Peukert, and Kristelia García here as they are published.
At the current juncture, the generative artificial intelligence (AI) industry exhibits a voracious appetite for data. To train sophisticated large language models (LLMs) like OpenAI’s ChatGPT and Google’s Gemini, firms must ingest and process immense corpora of text and audiovisual content, a significant portion of which is protected by copyright law. In an effort to secure these crucial inputs while navigating a treacherous legal landscape, AI firms have adopted a new and consequential market practice: the exclusive content licensing agreement.
In a flurry of recent deals, leading AI firms have contracted with major content owners for exclusive access to their archives. OpenAI has been particularly active, securing licenses with Reddit, the Associated Press, and News Corp, while Amazon recently followed suit in a landmark agreement with the New York Times. These arrangements arguably represent a laudable, market-based solution that fuels technological progress by providing AI developers with legally sanctioned access to high-quality training data, while ensuring that creators and content owners are fairly compensated for the use of their work.
A more critical examination, however, reveals the troubling potential for market distortion and competitive foreclosure. This article argues that the current trajectory of high-value, exclusive licensing deals between established LLM developers and major content owners threatens to construct a “digital enclosure loop”: a self-reinforcing cycle that entrenches incumbent firms in both the AI and content industries, accelerates market consolidation, stifles the innovative potential of smaller entrants in AI, and risks creating bottlenecks in the public’s access to copyrighted content.
The conventional legal frameworks designed to balance intellectual property protection with access—namely, copyright law’s fair use doctrine and antitrust law—are ill-suited to prevent the rapid formation of “data moats” around a handful of dominant players. To avert a future where technological innovation and access to content are gatekept by a few powerful entities, a more structured and proactive solution is required. Among other measures, a “compulsory access license” anchored in the economic theory of penalty defaults could preserve the dynamism of private ordering while providing a crucial backstop against market failure. These statutory content-licensing arrangements would lower barriers to entry and enable market competition without requiring the government to intervene on a case-by-case basis. Such a license would uphold the intertwined objectives of copyright, competition, data access, and innovation in the age of generative AI.
The rise of the data moat: licensing as a barrier to entry
The quality of LLMs is inextricably linked to the quality and scale of their training data. In this context, not all data is created equal. The internet is already becoming increasingly populated with lower-quality, AI-generated “synthetic” content. Researchers warn that this phenomenon will lead to “model collapse,” a degenerative spiral in which AI systems trained predominantly on AI-generated outputs produce increasingly inaccurate and biased content, which then feeds into successive rounds of training and yields “slop.” The risk of model collapse renders “clean” (i.e., curated, human-generated) datasets—in particular, the vast archives created before the widespread deployment of generative AI circa 2022—vastly more valuable.
It is precisely these reservoirs of clean, human-generated data that are now being enclosed through exclusive licensing deals. OpenAI’s reported $250 million, five-year agreement with News Corp, for instance, grants it privileged access to the archives of The Wall Street Journal, the New York Post, and other press assets. The financial size of this deal—alongside similar arrangements with publishers like Axel Springer and Condé Nast—effectively locks out less well-capitalized rival AI developers, creating a “data moat” around one of the AI industry’s largest and most powerful incumbents.
Data moats represent a formidable barrier to entry for emerging AI firms. Startups and smaller companies often drive innovation, but they lack the immense capital required to bid for a nine-figure licensing deal. Nor can they typically offer the sort of in-kind payment—such as privileged access to cutting-edge AI tools—that sweetens the deal for major content owners. Deprived of access to the highest-quality training data, these smaller players face an insurmountable competitive disadvantage. As a result, their models may be less capable, more prone to bias, and simply unable to compete. The result is a chilling effect on innovation, where the technological trajectory is dictated not by the most inventive newcomers but by the few giants with the deepest pockets.
Vertical integration and the digital enclosure loop
The risks to both innovation and access posed by exclusive licensing deals are amplified by the propensity for vertical integration between content producers and content distributors in the media industry. Many of today’s largest and most successful media companies, like Netflix, both produce popular content and distribute that content through wholly owned distribution channels that collect granular user data. These arrangements are now spreading to AI. See, for example, Meta’s $14 billion investment in Scale AI and xAI’s all-stock acquisition of X (formerly Twitter). These integrations threaten a “digital enclosure loop” between content, data, and AI that will further lock up quality content behind the moats of a few incumbent AI firms.
To illustrate this concept, consider a vertically integrated streaming platform that partners with an AI developer:
- Exclusive Content: First, the vertically integrated streaming platform locks up exclusive (self-produced) content—films, TV shows, podcasts—on its platform, drawing in subscribers and insulating itself from competition.
- Proprietary Data: As users consume the exclusive content, the streaming platform gathers proprietary data on their viewing habits, preferences, and behavior at a massive scale. These behavioral datasets are immensely valuable for training both recommendation algorithms and generative AI models.
- AI Refinement: Next, the streaming platform’s AI partner utilizes both the platform’s content archives and its user interaction data to train better algorithms. An LLM fine-tuned on the platform’s content can formulate—and in some cases, even create—similar and synergistic content, and it can use behavioral data to refine personalized user recommendations.
- Enhanced Platform: Finally, the refined and personalized AI model is deployed to enhance the user experience on the streaming platform by providing more accurate content recommendations, offering AI-generated subtitles and translations, or even creating AI-generated content. This makes the streaming platform even more attractive to users, thereby reinforcing its dominance.
This digital feedback loop ultimately encloses both sides of the market: human audiences and AI training. The vertically integrated content firm gains a double advantage: exclusive human audiences for its content and exclusive access to specially trained AI models. Over time, a handful of such “content–AI partners” could come to dominate both the streaming and AI markets. Netflix’s data-driven recommendation engine, for example, has long been cited as a competitive advantage, and one can readily imagine that advantage growing if such data is exclusively funneled into Netflix’s own, personalized AI models. The digital enclosure loop thus becomes self-perpetuating: control the content and you control the data; control the data and you can enhance the content and its delivery. Taken together with the data moats formed through these partnerships, the net result is a high barrier to entry for both prospective content and prospective AI companies.
The inadequacy of existing legal frameworks
The conventional legal tools intended to mitigate market failures at the intersection of intellectual property and competition policy have proven ill-equipped to address the threat of a digital enclosure loop. Both copyright law’s fair use doctrine and antitrust law suffer from structural limitations that make them ineffective tools for addressing the speed and scale of the challenges presented by exclusive licensing deals between content owners and generative AI companies.
Fair Use:
Arguably copyright law’s most critical safety valve, the fair use doctrine permits the unlicensed use of copyrighted material under certain circumstances. Its utility for AI developers, however, is severely constrained. Critically, fair use is often (if erroneously) treated as an affirmative defense, meaning its protections can only be effectively vindicated through litigation—a process that is prohibitively expensive, time-consuming, and notoriously unpredictable for nascent firms. The decade-long legal battle over the Google Books project in Authors Guild v. Google, in which Google scanned millions of books for its digital library, would be a death sentence for a startup. And because fair use is analyzed and applied case by case, a finding of fair use in one dispute does not necessarily obviate the need to litigate the next.
The normative trajectory of fair use case law adds to the doctrine’s profound legal uncertainty. The Supreme Court’s 2023 decision in Andy Warhol Foundation v. Goldsmith narrowed the (formerly primary) “transformative use” inquiry, shifting judicial focus from whether a new work adds new (transformative) meaning or message to whether it serves a commercial purpose that supplants the market for the original work. This pivot casts a shadow over the text-and-data mining that underpins commercial AI training. While some early district court rulings, such as Bartz et al. v. Anthropic PBC, have found the process of training a model to be a transformative use, they have also signaled that the use of pirated or illegitimately sourced training data would weigh heavily against a finding of fair use. This leaves developers in a precarious legal position, where the legality of their core business practice hinges on a case-by-case analysis that is both costly and uncertain. Incumbent rights holders can also weaponize this legal ambiguity, as demonstrated when The New York Times sought an unprecedented data-preservation order against OpenAI, sending a chilling message to smaller firms that lack the resources to withstand such aggressive legal tactics.
Antitrust Law:
Antitrust law, the traditional tool for combating market concentration, is similarly ill-equipped here, primarily due to a debilitating pacing problem. The speed of technological change and market dynamics in the digital economy far outstrips the speed of litigation. By the time an antitrust case against a dominant tech firm is resolved, the market has often been irrevocably transformed, rendering any remedy effectively moot. The Federal Trade Commission’s ongoing case against Meta, for example, challenges acquisitions that were cleared a decade ago and whose network effects have long since solidified Meta’s dominant market position.
Moreover, antitrust litigation in digital markets is notoriously complex and often bogged down in foundational disputes over the definition of the relevant market. For multi-sided platforms that serve distinct consumer groups—e.g., users and advertisers or, in this case, content creators and AI developers—establishing the market power necessary to prove a violation is an especially technical and contested process. Consequently, antitrust remains a reactive, ex post remedy that tends to chase already-settled markets, offering little prospective relief for nascent competitors in the rapidly evolving generative AI space.
A path forward: compulsory access
There is no one-size-fits-all solution to the access bottlenecks described here. Instead, the path forward should incorporate a range of interventions—from tax incentives to reversion rights. One intervention in particular is the focus here. To foster a more competitive ecosystem—one that ensures more widespread access to the content and data that are increasingly essential in the generative AI age—this article proposes a legislative intervention that draws on a long-standing convention in copyright law: the compulsory license. In broad strokes, a “compulsory access license” would apply the economic theory of penalty default rules—as extended to IP licensing—to ensure access to essential data and content only where there is demonstrated market failure.
A penalty default rule is a legal default deliberately set to be undesirable, thus encouraging parties to negotiate around it. It is usually employed to force a party to a negotiation to reveal more information than it otherwise would, thereby correcting information asymmetries. In this case, rather than attempting to set a “market” rate for content used to train an LLM, a penalty default license would intentionally set a statutory royalty (or royalties) at a level somewhat less attractive than a privately negotiated rate would be in an efficient market. This design creates a powerful incentive for parties to voluntarily contract around the default by negotiating more beneficial terms, preserving private ordering where it is feasible and desirable.
The compulsory access license proposed here would allow incumbents to engage in exclusive licensing only for a limited time—say, six months—after which the compulsory license would trigger, opening the content up to all prospective licensees at the statutory rate and on the statutory terms. Importantly, once the license has been triggered and the statutory conditions are met, the licensor cannot refuse to license, thereby eliminating the potential for indefinite exclusive dealing and the concomitant market failure.
A licensor that does not want to accept the less desirable statutory rate and terms can simply decline to engage in exclusive dealing, since only exclusive dealing triggers the license. In other words, in cases where bargaining fails—due to prohibitive transaction costs or a gross imbalance of market power—the compulsory access license provides a crucial legal fallback. In this way, the license can act as a floor to ensure that access to essential data inputs is not entirely foreclosed by anticompetitive behavior or market failure. Initial and regularly adjusted royalty rates could be set by an existing administrative body—like the Copyright Royalty Board—that already serves a similar role for other compulsory licenses in copyright.
Any proposal for a compulsory access license is likely to face logistical, legal, economic, and political challenges, many of which can be addressed with proper tailoring and calibration. Importantly—and unlike OpenAI’s recently announced (and then recanted) plan to force content owners to affirmatively opt out or risk unlicensed use to train the company’s new video generator model, Sora 2—such a license would protect (i) content owners of all sizes from uncompensated exploitation; (ii) the public from the artificial scarcity created by exclusive content deals; and (iii) nascent AI firms from the discriminatory leverage of entrenched incumbents, guaranteeing access to the data that is the lifeblood of innovation in the AI era.
Conclusion
The agreements being struck between major AI developers and major content owners represent a critical juncture in the development of both industries. While private licensing appears to harmonize the interests of copyright holders and innovators, the market’s current structure—dominated by high-value, exclusive deals between market leaders—poses a grave risk to the competitive health of the digital ecosystem. The current approach to licensing is rapidly constructing a digital enclosure loop that threatens to stifle innovation, accelerate media consolidation, and entrench a new set of powerful information gatekeepers.
Our existing legal tools, forged in a different technological era and for different purposes, are inadequate to dismantle this looming enclosure. Together with other access measures, a compulsory access license built on the pro-competitive logic of a penalty default license offers one promising path to rebalancing the scales. By providing a guaranteed floor for access, such a framework can ensure that the transformative potential of AI is not captured by a privileged few.
Author Disclosure: The author reports no conflicts of interest. You can read our disclosure policy here.
Articles represent the opinions of their writers, not necessarily those of the University of Chicago, the Booth School of Business, or its faculty.