Copyright Liability For Unauthorized Dataset Use In AI Model Training In Poland
📌 1. Polish Legal Framework (Background)
Under Polish copyright law (Ustawa o prawie autorskim i prawach pokrewnych), the author’s exclusive rights include the right to reproduce and use a protected work. Training an AI model normally involves making copies of copyrighted texts, images and other works, an activity that in principle requires authorization. The EU’s DSM Directive (2019/790) provides text and data mining (TDM) exceptions, but in Poland the extent to which they cover AI training, especially for commercial models, has been debated, and some legal authorities oppose broad AI use of works without licenses.
Importantly:
Only human‑authored works are protected under Polish copyright law (purely machine‑generated works are not).
Key issues are whether AI training qualifies as lawful TDM and whether rights holders have opted out by reserving their rights.
There are no major final Polish court decisions yet on AI training copyright infringement, but ongoing debates and EU case law are shaping the field.
📘 Key Legal Principles
Before diving into cases, understand these underlying concepts:
• Copyright reproduction right:
Making copies of a protected work — even for AI training — usually requires rightsholder consent (license).
• TDM exception:
EU law allows mining for data analysis under certain conditions. Poland’s implementation has been controversial, and collective management organizations such as ZAiKS have reserved rights against unlicensed use.
• “Fair use” vs permitted use:
Poland has no U.S.-style “fair use” doctrine. It relies instead on enumerated statutory exceptions (dozwolony użytek, or permitted use), whose application to AI training is not clearly defined.
• Liability:
If an AI developer makes unauthorized copies of copyrighted works for training, that can constitute infringement unless a statutory exception applies, and courts have yet to settle how far those exceptions reach.
📌 Case 1 — Hypothetical: ZAiKS Reservation vs AI Developer (Poland)
This is not a court judgment — but it functions as a de facto legal obstacle:
Facts: Poland’s main collective rights organization ZAiKS (representing authors and creators) issued a legal reservation stating that text and data mining (including AI training) of works it represents is infringement unless licensed.
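Legal Point: Under the DSM Directive’s general TDM exception (Article 4), rights holders may reserve their rights and so opt out; the scientific research exception (Article 3) has no such opt-out. ZAiKS’s declaration can be invoked to assert infringement where the general exception would otherwise apply.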
Liability Implication: If an AI developer used works from ZAiKS repertoire for training without license and without a valid exception, the rights holder could sue for infringement. Damages and injunctions could be pursued — even though there’s no final court ruling yet.
Key Legal Issue: The dispute showcases the tension between statutory exceptions (TDM) and rights holders’ control through reservations and licensing.
📌 Case 2 — German LAION Dataset Litigation (EU Influence on Poland)
Although this case isn’t Polish, it’s EU‑significant and highly relevant:
Facts: Photographer Robert Kneschke sued dataset creator LAION e.V., alleging his copyrighted photos were included in the LAION dataset used to train AI models.
Decision: The Hamburg Regional Court held that creating the dataset could fall under the scientific research TDM exception (Article 3 of the DSM Directive) because building it was “non-commercial research use”, which allowed the works to be included.
Why It Matters for Poland:
The DSM Directive applies across the EU, and this decision, although not binding on Polish courts, interprets the TDM exceptions as applied to AI datasets.
In Poland, courts would also consider TDM exceptions — but with debate over whether commercial AI training is covered.
Legal Insight: If dataset creation for AI training qualifies as lawful data mining, the developer may avoid infringement liability — but only if the specific exception genuinely applies.
📌 Case 3 — U.S. Anthropic Copyright Suit (Fair Use Argument)
Not a Polish case, but highly instructive:
Facts: Authors Andrea Bartz, Charles Graeber and Kirk Wallace Johnson sued the AI company Anthropic for training its AI model Claude on millions of copyrighted books without permission.
Court Ruling:
The judge ruled training AI on books is transformative fair use under U.S. law — i.e., not infringement — because AI training creates something new and non‑competitive.
However, acquiring those books through pirated sources was not protected and could lead to damages.
Takeaway for Poland/EU:
U.S. fair use is not directly applicable in Poland, but it illustrates how courts can distinguish training use from piracy or mass reproduction.
📌 Case 4 — Hypothetical: Use Without License in Poland (Precedential Prediction)
Facts (Hypothetical):
An AI company trains its model using copyrighted novels scraped from the internet without authorization. Authors sue in a Polish court claiming infringement of the reproduction right.
Legal Issues Polish Court Would Grapple With:
Whether the activity qualifies under a statutory exception (e.g., TDM) — contested under Polish law.
Whether the reproduction is substantial and unauthorized — prima facie infringement absent exception or license.
Whether the rights holder validly opted out of any data mining exemption.
Likely Outcome: If no exception applies and no license exists, the court would likely find infringement and award damages, unless the defendant convincingly shows that a statutory exception such as TDM or another form of permitted use (dozwolony użytek) covers the copying. This risk creates strong incentives to obtain licenses.
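The order of questions in this hypothetical can be sketched as a small decision function. This is only an illustrative simplification, not a statement of how a Polish court would rule: the names `TrainingUse`, `Outcome` and `assess` are invented for this example, each legal question is reduced to a yes/no flag, and an opt-out is assumed to defeat the exception in every case (which would not hold for the scientific research exception).

```python
from dataclasses import dataclass
from enum import Enum


class Outcome(Enum):
    LICENSED = "licensed use"
    EXCEPTION = "covered by a statutory exception"
    LIKELY_INFRINGEMENT = "likely infringement"


@dataclass(frozen=True)
class TrainingUse:
    """Hypothetical yes/no summary of the issues listed in Case 4."""
    has_license: bool               # did the rights holder authorize the use?
    exception_conditions_met: bool  # e.g. the conditions of a TDM exception
    rights_reserved: bool           # did the rights holder validly opt out?


def assess(use: TrainingUse) -> Outcome:
    # A license settles the matter before any exception is considered.
    if use.has_license:
        return Outcome.LICENSED
    # Simplifying assumption: an exception helps only if no opt-out applies.
    if use.exception_conditions_met and not use.rights_reserved:
        return Outcome.EXCEPTION
    # No license and no usable exception: prima facie infringement.
    return Outcome.LIKELY_INFRINGEMENT


# The scraped-novels scenario, assuming the authors reserved their rights.
scraped_novels = TrainingUse(
    has_license=False,
    exception_conditions_met=True,
    rights_reserved=True,
)
print(assess(scraped_novels).value)
```

The sketch makes one point from the text explicit: without a license, the outcome turns entirely on whether an exception applies and whether the rights holder has opted out.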
📌 Case 5 — Meta and Artists Lawsuits (European Landscape)
Although not decided in Poland, similar actions in the EU illustrate the trend:
Example: In France, publishing groups sued Meta, claiming its AI was trained on copyrighted works without permission.
Legal Significance: These cases reflect growing creator demands worldwide — and if such claims proceed in EU courts, their reasoning will influence Polish courts interpreting EU directives on data mining exceptions and unauthorized use.
🧠 Comparative Insights — Why These Cases Matter in Poland
Even though Poland has few final judgments yet:
✔️ EU law binds Poland: The DSM Directive’s TDM exceptions and AI Act transparency rules will influence how Polish courts treat unauthorized data use.
✔️ Rights holder opt-out rights matter: Collective management organizations like ZAiKS assert that AI training without a license is infringement, and Polish law lets rights holders reserve their rights against the general TDM exception.
✔️ International jurisprudence (e.g., Germany, U.S.) provides guidance on balancing rights holder interests and innovation incentives.
📌 Practical Legal Liability for AI Developers (Poland)
Copyright infringement claims: where training involves reproducing works without authorization and beyond what an exception permits, rights holders can sue for damages and injunctions.
Criminal sanctions: in extreme cases of large-scale copying, Polish law allows criminal liability for violating copyrights.
Collective rights disputes: collective management organizations can reserve rights against the TDM exception and press for licensing or cessation.
EU harmonization pressures: the EU AI Act and the DSM Directive’s TDM exceptions will shape future liability rules.
🧾 Summary: Core Legal Points
| Legal Issue | How Courts Might Treat It |
|---|---|
| Unauthorized training data use | Likely infringement absent valid exception/license |
| TDM exceptions | Applicable if the conditions are met and rights have not been reserved (no opt-out) |
| Rights holder opt-out | Can defeat the TDM exception, creating liability risk |
| AI output copyright | Generally not protected if purely machine‑generated |
| International cases | Useful for interpretive guidance |
📌 Conclusion
Liability for unauthorized dataset use in AI model training in Poland hinges on whether the use of copyrighted works is authorized, exempt under Polish law or EU directives, and whether rights holders have validly opted out. Despite limited Polish judicial precedents specifically on AI, ongoing litigation in the EU and U.S. shapes the legal landscape. Rights holders and AI developers are actively contesting how copyright law applies to training models — and this will likely produce significant Polish case law in the near future.
