AI Training Datasets and Copyright Infringement

PART 1: AI TRAINING DATASETS AND COPYRIGHT INFRINGEMENT

1. Overview

AI systems, especially large language models, generative AI, and computer vision systems, require large-scale datasets for training. These datasets often contain:

Text from books, articles, and websites

Images, audio, and videos

Scientific papers, patents, or proprietary data

Legal concern: Copying copyrighted material into a training dataset without a license could constitute infringement, even if the model's output never reproduces the work.

Key Issues:

Training on copyrighted works: Is it fair use or infringement?

Output similarity: Does AI-generated content copy protected expression?

Derivative works: Can AI outputs be treated as derivative of copyrighted data?

Licensing obligations: Are dataset providers required to secure permissions?

2. Key Case Laws

Case 1: Authors Guild v. Google (2015)

Facts: Google scanned millions of books to build a full-text searchable database that displayed short snippets in search results.

Decision: The Second Circuit held that Google’s use was a transformative fair use, because the search and snippet functions did not substitute for the books in their market.

Implication:

Training an AI on copyrighted material may be permissible if it is transformative.

Direct reproduction or substantial output duplication could still infringe.

Case 2: Authors Guild v. OpenAI (Ongoing, 2023–2025)

Facts: Allegation that OpenAI’s models were trained on copyrighted works without licenses.

Issue: Can AI model training constitute copyright infringement?

Status: Litigation ongoing; highlights risk of dataset scraping without consent.

Implication: Shows increasing scrutiny on AI dataset curation and licensing.

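Case 3: Doe v. GitHub (GitHub Copilot litigation, 2022–Present)

Facts: Plaintiffs allege that Copilot, offered by GitHub and Microsoft, reproduces code from open-source repositories without the attribution and license notices those licenses require.

Decision: Pending. The case raises the distinction between direct copying and model-assisted generation.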

Implication:

Training datasets containing copyrighted content must consider output replication.

AI outputs that replicate large portions of copyrighted text/code may lead to derivative infringement claims.

Case 4: Authors Guild v. HathiTrust (2012)

Facts: HathiTrust created a searchable digital library of copyrighted books for research.

Decision: The district court (2012) and the Second Circuit (2014) held that building a full-text search index and providing access for print-disabled readers were fair use.

Implication: Training AI for research or non-commercial purposes may be safer than commercial AI model deployment.

Case 5: Oracle v. Google (Java API case, 2014–2021)

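Facts: Google copied roughly 11,500 lines of declaring code from the Java SE API, without a license, to build Android.

Decision: The Supreme Court (2021) held that the copying was fair use, assuming without deciding that the API code was copyrightable.

Implication for AI: Using APIs, datasets, or other copyrighted resources may be fair use if the use is sufficiently transformative and does not substitute for the original in its market.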

Key Takeaways for AI Datasets

Transformative Use: Training that serves a different purpose from the original works, and does not substitute for them, carries lower infringement risk.

Output Monitoring: AI output must not reproduce copyrighted material verbatim.

Licensing: Obtain licenses for high-risk datasets.

Derivative Risk: AI outputs closely resembling copyrighted works can still trigger liability.

Non-commercial research is safer under fair use than commercial deployment.
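The output-monitoring takeaway above can be sketched as a simple overlap check. The example below is a minimal illustration, not a legal test for substantial similarity: it measures what fraction of an output's word n-grams also appear in a set of protected reference texts. The function names, the n-gram length `n`, and the flagging `threshold` are all hypothetical choices made for this sketch.

```python
from typing import Iterable, Set, Tuple


def word_ngrams(text: str, n: int) -> Set[Tuple[str, ...]]:
    """Return the set of lowercase word n-grams in `text`."""
    words = text.lower().split()
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}


def verbatim_overlap(output: str, protected_texts: Iterable[str], n: int = 8) -> float:
    """Fraction of the output's n-grams that also occur in any protected text."""
    out_grams = word_ngrams(output, n)
    if not out_grams:
        # Output shorter than n words: nothing to compare.
        return 0.0
    protected: Set[Tuple[str, ...]] = set()
    for text in protected_texts:
        protected |= word_ngrams(text, n)
    return len(out_grams & protected) / len(out_grams)


def should_flag(output: str, protected_texts: Iterable[str],
                n: int = 8, threshold: float = 0.2) -> bool:
    """Flag an output for human review (assumed threshold, not a legal standard)."""
    return verbatim_overlap(output, protected_texts, n) >= threshold
```

A flagged output would typically go to human review rather than be treated as infringing; paraphrased or restructured copying would not be caught by a check this simple.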

PART 2: SYNTHETIC BIOLOGY PATENT PROTECTION

1. Overview

Synthetic biology combines biology, engineering, and AI to create:

Genetically modified organisms (GMOs)

Synthetic DNA sequences

Novel biochemical pathways

AI-designed proteins and enzymes

Patent protection encourages commercialization and innovation but faces challenges:

Patent eligibility: Laws of nature and natural DNA sequences are not patentable.

Novelty, non-obviousness, and utility: The invention must be new, non-obvious, and useful.

AI involvement: AI-designed molecules must meet traditional patent criteria.

2. Key Case Laws

Case 1: Diamond v. Chakrabarty (1980)

Facts: Chakrabarty sought a patent on a genetically engineered bacterium capable of breaking down crude oil, useful for treating oil spills.

Decision: Supreme Court allowed patenting genetically modified organisms.

Implication: Synthetic biology inventions created by humans are patentable, even if based on natural biology.

Case 2: Association for Molecular Pathology v. Myriad Genetics (2013)

Facts: Myriad patented isolated BRCA1 and BRCA2 genes.

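Decision: Naturally occurring DNA is not patent-eligible merely because it has been isolated, but complementary DNA (cDNA), which is synthetically created, is patent-eligible.

Implication: Synthetic DNA or AI-designed sequences may be patent-eligible if they do not occur in nature, but raw natural genes are not.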

Case 3: Mayo Collaborative Services v. Prometheus (2012)

Facts: Patents for metabolite-based drug dosing.

Decision: Claims based solely on natural correlations are not patentable.

Implication: Synthetic biology patents must include inventive human intervention, not just discoveries of natural laws.

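Case 4: Amgen v. Sanofi (Fed. Cir. 2017; U.S. Supreme Court 2023)

Facts: Dispute over Amgen's patents claiming a broad class of monoclonal antibodies defined by their function (binding to and blocking PCSK9).

Decision: The litigation focused on claim scope and enablement. In 2023 the Supreme Court held the broad functional claims invalid because the specification did not enable the full range of antibodies claimed.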

Implication: Claims must be fully enabled, especially for AI-designed molecules.

Case 5: University of California v. Broad Institute (CRISPR Patent Dispute, 2016–2022)

Facts: Patent battle over CRISPR-Cas9 gene editing.

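Decision: The Patent Trial and Appeal Board and the Federal Circuit analyzed priority of invention, including who first made CRISPR-Cas9 editing work in eukaryotic cells.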

Implication: AI-designed gene editing methods require clear inventorship attribution and detailed patent disclosure.

Case 6: Synthetic Biology Startups and AI Molecule Design (Recent 2020s)

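Facts: This entry describes a trend rather than a single decision. Startups are filing patents on AI-designed enzymes and proteins.

Outcome: The USPTO can grant such patents when a natural person qualifies as the inventor and the molecule is novel and non-obvious. Courts do not grant patents, but the Federal Circuit has held that an inventor must be a natural person (Thaler v. Vidal, 2022).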

Implication: AI can assist inventorship, but human inventors must still be identified.

3. Key Takeaways for Synthetic Biology Patents

Factor | Principle
Natural vs. Synthetic | Natural DNA cannot be patented; synthetic sequences can
AI-Designed Molecules | Patentable if human-assisted and non-obvious
Inventorship | Human inventors must be listed
Enablement | Detailed disclosure of the synthetic biology process is required
Utility | Must demonstrate functional utility in biotech or medicine

4. Combined Lessons

AI in copyright vs synthetic biology patents:

AI training datasets risk copyright infringement if output reproduces copyrighted works.

AI-designed synthetic biology inventions can be patented if human inventorship is clear and inventive step exists.

Licensing & Access:

AI datasets require licenses or fair-use justification.

Synthetic biology patents grant exclusive rights to license biotech and pharmaceutical applications.

Strategic Protection:

For AI: Use proper dataset licensing and output control.

For synthetic biology: Secure patent protection early, especially for AI-designed molecules.
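For the AI side of the strategy above, dataset licensing can be partly enforced in the data pipeline itself. The sketch below assumes each dataset record carries a license identifier and splits records into those on an allowlist and those needing legal review. The `Record` fields, the `ALLOWED_LICENSES` set, and the function name are hypothetical, chosen only for illustration.

```python
from dataclasses import dataclass
from typing import FrozenSet, Iterable, List, Optional, Tuple

# Hypothetical allowlist; a real one would come from legal review.
ALLOWED_LICENSES: FrozenSet[str] = frozenset(
    {"cc0-1.0", "cc-by-4.0", "mit", "licensed-by-contract"}
)


@dataclass(frozen=True)
class Record:
    source_url: str
    license_id: Optional[str]  # None when the license is unknown
    text: str


def partition_by_license(
    records: Iterable[Record],
    allowed: FrozenSet[str] = ALLOWED_LICENSES,
) -> Tuple[List[Record], List[Record]]:
    """Split records into (cleared, needs_review) by license identifier."""
    cleared: List[Record] = []
    needs_review: List[Record] = []
    for record in records:
        license_id = (record.license_id or "").strip().lower()
        if license_id in allowed:
            cleared.append(record)
        else:
            # Unknown or unlisted licenses are held back, not silently used.
            needs_review.append(record)
    return cleared, needs_review
```

Appearing on the allowlist does not settle every obligation: attribution licenses, for example, still impose conditions that the pipeline would need to track separately.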
