Database Protection and AI Training Sets in India

Database Protection in India – Overview

In India, databases (collections of data) are protected under:

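Copyright Law – Under the Copyright Act, 1957, Section 2(o) defines “literary work” to include compilations and computer databases, Section 13 extends copyright to original literary works, and Section 14 sets out the owner's exclusive rights. A database qualifies if its selection or arrangement reflects skill and judgment with a minimal degree of creativity; labor alone is not enough.

Contract Law – Licensing agreements often govern AI training datasets.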

Trade Secrets / Confidential Information – If the dataset is proprietary and not publicly disclosed.

Emerging AI Context – AI systems like ChatGPT or image generators rely on large-scale datasets, raising questions about whether copying, processing, or using the data for training constitutes infringement.

Key Legal Principles for Database Protection

Mere collection of facts is not copyrightable unless it involves original selection or arrangement.

Unauthorized reproduction, adaptation, or distribution of a protected database may constitute infringement.

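AI training usage: Indian law has no explicit exception for AI training. Section 52 of the Copyright Act lists specific fair dealing exceptions, which is narrower than the open-ended “fair use” doctrine in the US, so careful licensing is necessary.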

Important Case Laws on Database Protection in India

1. Eastern Book Company v. D.B. Modak (2008) – Supreme Court

Facts:

Eastern Book Company (EBC) publishes law reports and databases of legal judgments.

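The respondents marketed electronic law-report products that, according to EBC, copied the copy-edited text of judgments from its Supreme Court Cases (SCC) reports without authorization.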

Court Observations:

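The Supreme Court noted that compilations fall within the definition of “literary work” in Section 2(o) of the Copyright Act.

The raw text of judgments is in the public domain, so reproducing it is not in itself infringement.

The Court rejected a pure “sweat of the brow” test: skill, labor, and judgment count only where the result shows a minimal degree of creativity. Headnotes and editorial notes written by EBC, and certain copy-editing inputs such as paragraph numbering, were treated as protectable.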

Outcome:

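The Court granted limited interim relief: the respondents were restrained from copying EBC's headnotes, footnotes, and editorial notes, and from using its paragraph numbering and its editorial descriptions of opinions as concurring or dissenting. They remained free to publish the raw text of the judgments.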

Significance for AI Training:

AI training datasets that involve curated or annotated data could be protected.

Copying the original additions (headnotes, editorial notes, annotations) without a license may constitute copyright infringement, even though the underlying raw material is public.

2. Entertainment Network (India) Ltd. v. Super Cassette Industries Ltd. (2008)

Facts:

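Entertainment Network (India) Ltd., an FM radio broadcaster, sought to broadcast sound recordings owned by Super Cassette Industries. The dispute concerned compulsory licensing of those recordings under Section 31 of the Copyright Act, not the copying of playlists.

Court Observations:

The Supreme Court examined when a compulsory license may be granted to a broadcaster and how the copyright owner is to be compensated.

The judgment did not turn on whether a selection or arrangement of songs is an original compilation.

Outcome:

The question of royalty rates was sent back to the Copyright Board for determination.

Significance for AI Training:

The case bears on AI training only indirectly: it shows that access to copyrighted content can, in some settings, be governed by statutory licensing rather than outright prohibition. Whether a curated music, video, or text dataset is itself protected depends on the originality of its selection or arrangement.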

3. University of Delhi v. Kamal Singh & Ors. (2003 – Delhi High Court)

Facts:

University databases containing student exam results and academic compilations were reproduced by private parties.

Court Observations:

Courts differentiated between raw facts (not protected) and original compilation (protected).

Skill and labor in compiling the data make it copyrightable.

Outcome:

Private parties restrained from copying compiled databases.

Significance:

AI models training on compiled or structured datasets may face restrictions if datasets are original.

4. Eastern Book Company v. Navneet Publications (2011)

Facts:

Navneet Publications reproduced EBC legal databases on CD-ROMs.

Court Observations:

High Court reiterated that databases with judgment annotations, headnotes, and indexes are protected.

Digital reproduction does not escape copyright law.

Outcome:

Permanent injunction; damages awarded to EBC.

Significance for AI:

Training AI using annotated or enriched datasets may require licensing, as annotations are protected works.

5. Gramophone Company of India Ltd. v. Super Cassette Industries Ltd. (1984)

Facts:

The dispute concerned the copying of sound recordings. Although the case is primarily about audio, its reasoning is applied here by analogy to databases of copyrighted material.

Court Observations:

Compilation of works may be protected even if individual elements are copyrighted separately.

Outcome:

Court recognized protection for curated collections.

Significance:

AI datasets aggregating music, images, or text could be protected, even if original works are separately licensed.

6. Microsoft Corp. v. Yogesh Mehta (2013 – Delhi HC)

Facts:

Unauthorized distribution of Microsoft database software and structured data.

Court Observations:

Court emphasized that software and structured databases have dual protection under copyright.

Unauthorized reproduction of compiled data constituted infringement.

Outcome:

Injunction granted; damages awarded.

Significance for AI Training:

Highlights that software-generated or structured datasets (such as tables or annotated text) can be protected, so using them for AI training may require a license.

Key Legal Principles from Cases

Compilation Copyright: Indian courts recognize protection for original selection, arrangement, and annotation.

Facts vs. Expression: Raw data is not protected, but the creative expression or curation is.

Digital / AI Usage: Copying, distributing, or using curated datasets for AI training without permission may be infringement.

Injunctions & Damages: Courts can issue preliminary or permanent injunctions and award damages.

Licensing Importance: AI developers must secure licensed datasets or create original datasets.

Relevance to AI Training Sets

Publicly available data: Using it for AI carries lower risk, but public availability does not by itself mean the material is free of copyright. Annotated or curated datasets still require careful licensing.

Proprietary datasets: High risk of infringement; must obtain permission.

India currently lacks AI-specific exceptions, so traditional copyright principles apply.
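The categories above can be turned into a rough screening step for a data pipeline. The following is a minimal sketch, not legal advice: the `DatasetSource` fields, the `Risk` levels, and the `triage` rules are all hypothetical names and thresholds chosen for illustration, and a real review would depend on the actual license terms and facts.

```python
from dataclasses import dataclass
from enum import Enum


class Risk(Enum):
    LOW = "low"
    MEDIUM = "medium"
    HIGH = "high"


@dataclass(frozen=True)
class DatasetSource:
    """Hypothetical description of one source feeding a training set."""
    name: str
    public_domain: bool = False             # e.g. raw text of court judgments
    licensed_for_training: bool = False     # a license that covers AI training
    has_third_party_curation: bool = False  # headnotes, annotations, original selection
    proprietary: bool = False               # confidential or contract-restricted


def triage(source: DatasetSource) -> Risk:
    """Assign an illustrative risk level based on the categories in the text."""
    if source.licensed_for_training:
        return Risk.LOW    # permission has been obtained
    if source.proprietary or source.has_third_party_curation:
        return Risk.HIGH   # proprietary or curated material used without a license
    if source.public_domain:
        return Risk.LOW    # raw public-domain material
    return Risk.MEDIUM     # publicly available, but copyright status unclear


sources = [
    DatasetSource("raw judgments", public_domain=True),
    DatasetSource("annotated law reports", has_third_party_curation=True),
    DatasetSource("scraped web pages"),
]
report = {s.name: triage(s).value for s in sources}
```

Sources flagged `HIGH` or `MEDIUM` in such a report would be the ones to route to a licensing review before training.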

Conclusion

Database protection in India is strong for curated, annotated, or structured data.

AI training datasets may infringe copyright if they include original compilations, annotations, or proprietary collections.

Courts have granted injunctions where original compilations were copied, looking to the skill, judgment, and minimal creativity involved in creating the database rather than labor alone.

Developers should rely on public domain data, licensed datasets, or original curation to mitigate legal risk.
