URL-import ToS clause

Status: DRAFT, requires legal review.
Budget: ~$500-1000 with a privacy lawyer (one-time; Open Questions § XV #5).
Goal: a legal foundation for shadow data collection without a registration-form checkbox.

See ADR 0017 for the rationale.


Clause 1 — Data Collection and Model Improvement (for ToS)

By using ARNO ("Service"), you acknowledge that we collect anonymized
usage data including:

  - Anonymized URLs (cryptographically hashed via SHA-256)
  - Component metadata (without personally identifiable information)
  - Extraction outcomes and any corrections you make

This data is used solely to improve Service accuracy and quality. We do
not sell or share this data with third parties.

You may opt out at any time via Settings → Privacy → "Anonymous data
contribution". Past contributions can be deleted via Settings → Privacy
→ "Delete past contributions".

For European users (GDPR), this processing relies on legitimate interest
in service improvement (Art. 6(1)(f)). You have the right to object at
any time without affecting Service availability.
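
Implementation sketch (not part of the clause text). Clause 1 promises SHA-256 hashing of URLs; a minimal version of that step might look like the following. The normalization rules (lowercasing, dropping query string and fragment), the function names, and the keyed HMAC variant are assumptions for illustration, not something the spec states.

```python
import hashlib
import hmac
from urllib.parse import urlsplit, urlunsplit


def normalize_url(url: str) -> str:
    """Lowercase scheme and host; drop query string and fragment (assumed rules)."""
    parts = urlsplit(url.strip())
    path = parts.path or "/"
    return urlunsplit((parts.scheme.lower(), parts.netloc.lower(), path, "", ""))


def hash_url(url: str) -> str:
    """Plain SHA-256 hex digest of the normalized URL, as the clause describes."""
    return hashlib.sha256(normalize_url(url).encode("utf-8")).hexdigest()


def keyed_hash_url(url: str, secret_key: bytes) -> str:
    """HMAC-SHA-256 variant (assumption): needs a server-side key to recompute."""
    message = normalize_url(url).encode("utf-8")
    return hmac.new(secret_key, message, hashlib.sha256).hexdigest()
```

A plain hash of a public URL can be matched by anyone who hashes a list of candidate URLs, so it may count only as pseudonymization. Whether either variant meets the "anonymized" bar is the question already listed in the legal review checklist.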

Clause 2 — URL Import and Ownership Attestation (for ToS)

When you import content from a URL, you affirm:

  1. You own the content at that URL, OR
  2. You have a valid license to use the content, OR
  3. The content is in the public domain.

ARNO is not liable for copyright infringement arising from your use of
imported content. You agree to indemnify ARNO against any third-party
claims related to content you import.

We may apply additional verification for known commercial domains (top
15,000 most-trafficked sites as ranked by Tranco). For such domains,
you must explicitly confirm legal authorization before extraction.
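
Implementation sketch (not part of the clause text). The top-15,000 check could be done against a Tranco CSV, which lists one "rank,domain" pair per line. The function names and the choice to match subdomains against their parent domains are assumptions; the spec does not say how subdomains are treated.

```python
import csv
import io
from urllib.parse import urlsplit

TOP_N = 15_000  # threshold taken from Clause 2


def load_top_domains(csv_text: str, top_n: int = TOP_N) -> set[str]:
    """Parse "rank,domain" rows and keep the domains ranked within top_n."""
    domains = set()
    for row in csv.reader(io.StringIO(csv_text)):
        if len(row) != 2:
            continue  # skip malformed rows
        rank, domain = row
        if rank.isdigit() and int(rank) <= top_n:
            domains.add(domain.strip().lower())
    return domains


def requires_confirmation(url: str, top_domains: set[str]) -> bool:
    """True if the URL's host, or any parent suffix of it, is in the set."""
    host = (urlsplit(url).hostname or "").lower()
    labels = host.split(".")
    # For "shop.example.com" this tries "shop.example.com", "example.com", "com".
    return any(".".join(labels[i:]) in top_domains for i in range(len(labels)))
```

When `requires_confirmation` returns true, the UI would show the explicit legal-authorization prompt before extraction starts.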

Clause 3 — Domain Reputation Attribution (for Privacy Policy footer)

Domain classification powered by Tranco (https://tranco-list.eu),
licensed under Creative Commons Attribution 4.0.

Clause 4 — Data Retention Schedule (for Privacy Policy)

We retain your data according to the following schedule:

  Component files (TSX, types, tokens)     User lifetime
  Original screenshots                     90 days, then hash-only
  Staging area (uncommitted imports)       90 days inactive → notify
                                            120 days → deletion
  Anonymous training contributions         2 years
  Your corrections to extractions          Anonymized after 90 days
  Account attestations                     7 years (legal audit)

Upon account deletion, all your personal data is removed within 30 days.
Anonymous atom contributions to our shared catalog are preserved but
de-attributed.
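
Implementation sketch (not part of the clause text). The staging-area row of the schedule can be expressed as a small decision function. This assumes both thresholds count days of inactivity from the last activity (120 days total, not 120 days after the notice); the schedule is ambiguous on that point, and the names are hypothetical.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

STAGING_NOTIFY_AFTER = timedelta(days=90)   # "90 days inactive → notify"
STAGING_DELETE_AFTER = timedelta(days=120)  # "120 days → deletion"


def staging_action(last_active: datetime, now: Optional[datetime] = None) -> str:
    """Return "keep", "notify", or "delete" for an uncommitted import."""
    now = now or datetime.now(timezone.utc)
    inactive = now - last_active
    if inactive >= STAGING_DELETE_AFTER:
        return "delete"
    if inactive >= STAGING_NOTIFY_AFTER:
        return "notify"
    return "keep"
```

A daily job could call `staging_action` for each staged import and act on the result; the same pattern would extend to the fixed 90-day and 2-year windows in the table.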

Clause 5 — Third-Party Services Disclosure (for Privacy Policy)

URL extraction uses the following third-party services:

  - Google (Gemini API) — for text and vision analysis
  - Cerebras — for code generation
  - Modal Labs — for compute infrastructure
  - Backblaze (B2) — for staging storage
  - Hyperbrowser — for anti-bot bypass (when needed)

Each service receives only the minimum data required for its function.
We do not share Personally Identifiable Information (PII) with these
services beyond what's necessary for Service operation.

Your imported URLs are NOT contributed to third parties' AI training datasets.

Settings → Privacy UI text

─────────────────────────────────────────────────────────
 Privacy Settings
─────────────────────────────────────────────────────────

 ✓ Anonymous data contribution
   Help improve ARNO by allowing anonymized usage data
   from your URL imports to be used for model training.

   We collect:
   • Anonymized URLs (cryptographically hashed)
   • Component metadata (no PII)
   • Your corrections to extractions

   We do NOT collect:
   • Your account information
   • Your actual URL contents (only metadata)
   • Personally identifiable information

   [Learn more]  [Delete past contributions]

─────────────────────────────────────────────────────────
 Account
─────────────────────────────────────────────────────────

 [Download my data]
 [Delete my account]

Legal review checklist

Before launch, a privacy lawyer must validate:

  • GDPR Art. 6(1)(f) legitimate interest balancing test documented
  • GDPR Art. 13/14 disclosure requirements met (what, why, how long, recipients)
  • GDPR Art. 17 right-to-erasure mechanism described (Settings → Delete past contributions)
  • GDPR Art. 21 right-to-object mechanism described (Settings → opt-out toggle)
  • CCPA "Notice at Collection" equivalent (California users)
  • Copyright DMCA safe harbor — user attestation language sufficient
  • Tranco CC-BY 4.0 attribution placement correct
  • Third-party processor agreements in place (Gemini, Cerebras, Modal, B2, Hyperbrowser)
  • "Anonymized" definition meets GDPR standard (not just pseudonymization)
  • Account deletion 30-day promise technically guaranteed (cascading delete in spec § XI.4.1)

Implementation notes

  • Clauses 1-5: must be in the ToS/Privacy Policy at registration time and accepted before the first extraction
  • Settings UI: implemented in the ARNO dashboard; opt-out applies to ALL data collection (shadow + analytics)
  • Tranco attribution footer: must appear on public-facing pages where the reputation check influences UX
  • Daily Tranco refresh cron: should include the attribution string in response headers (optional, but best practice)
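
A minimal sketch of the "opt-out applies to ALL data collection" note: one setting gates every collection channel, so no code path can bypass the toggle. The class, field, and channel names are hypothetical; the default of `True` reflects the opt-out model described in Clause 1.

```python
from dataclasses import dataclass

CHANNELS = {"shadow", "analytics"}  # the two channels named in the note above


@dataclass
class PrivacySettings:
    # Mirrors the "Anonymous data contribution" toggle; on until the user opts out.
    anonymous_contribution: bool = True


def may_collect(settings: PrivacySettings, channel: str) -> bool:
    """Single gate consulted before any shadow or analytics event is recorded."""
    if channel not in CHANNELS:
        raise ValueError(f"unknown collection channel: {channel}")
    return settings.anonymous_contribution
```

Routing both channels through one function keeps the promise in the Settings UI text testable in a single place.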

Open legal questions

  1. EU AI Act compliance (effective Aug 2026): training-data provenance disclosure requirements may change the wording
  2. California AB-2013 (training data transparency law): may require additional disclosure
  3. GDPR vs CCPA differences: can a single notice satisfy both, or are separate notices needed?
  4. Children's data: ARNO does not target minors; is an explicit COPPA statement needed?
  5. Cross-border data transfer (Schrems II): Modal and Gemini are in the US; are SCCs/DPF required for EU users?

All questions → legal review.


Cross-references