ADR 0012 — Uniqueness: user decides, не algorithm

Date: 2026-05-22
Status: Accepted
Feature: URL-import
Affects: url_import_spec.md § IV Phase 9

Context

Юзер импортирует URL → extracts 8-12 компонентов. Часть из них может быть "стандартный" Button или Card что уже есть в ARNO catalog. ARNO catalog растёт со временем, юзер другой может benefit от existing patterns.

Вопрос: кто решает уникален ли компонент или это reuse?

Изначальный draft (rejected): algorithmic uniqueness scoring

Compute visual + semantic similarity к catalog
If similarity > threshold → auto reuse existing
If < threshold → create new

Problems:

False positives ("это уже есть") — юзер видит чужой компонент в его проекте
False negatives ("это уникальное") — каталог раздувается дубликатами
Threshold arbitrary, calibration nightmare
Tantrum potential — юзер хочет именно его дизайн, а алгоритм "знает лучше"

Decision

Algorithm shows top-K neighbors, USER decides:

async function uniquenessCheck(component): Promise<UniquenessResult> {
  try {
    const embedding = await dinov2.embedONNXCPU(component.screenshot);
    const neighbors = await pgvector.query({
      table: 'component_embeddings_visual',
      vector: embedding,
      limit: 5,        // top-K
      distance: 'cosine'
    });
 
    if (neighbors.length === 0 || neighbors[0].distance > 0.4) {
      // Явно уникален — auto-create new
      return { decision: 'new', auto: true };
    }
 
    // Близкие соседи есть → юзер видит и решает
    return { decision: 'pending', neighbors };
  } catch (e) {
    return { decision: 'new', auto: true, flag: 'uniqueness_check_skipped' };
  }
}

UI представление:

🔍 Похожие компоненты найдены в каталоге:

[Preview-A] 87% similar — "Button (Material)"          [Использовать]
[Preview-B] 72% similar — "PrimaryButton (Brand X)"    [Использовать как variant]
[Preview-C] 65% similar — "ActionButton (shadcn)"      [Использовать как variant]
─────────────────────────────────────────────────────
[Создать новый компонент]

Юзер кликает → его решение записывается в manifest.uniqueness_decision.

Why user-decides wins

Zero false reuse — юзер видит сосед, решает sам
Context awareness — юзер знает свой проект (legal restrictions, brand consistency), algorithm не знает
Discovery value — юзер может выбрать better community pattern если он подходит
No threshold debates — нет single magic number argued in spec

Threshold rationale

> 0.4 cosine distance (~ 60% similarity) → auto "new" — slone далеко, юзер не получит ценность от choice
< 0.4 → ALWAYS surface to юзер — никаких auto reuse решений
K=5 — UI choice fatigue limit, > 5 overwhelms

Consequences

Pros:

0 false-positive reuse decisions
Юзер empowered, не overridden
Каталог растёт organic — auto-add только distinctively new components

Cons:

Extra UI step (vs auto-add)
- Mitigated: при > 0.4 cosine — auto, юзер не видит choice
Manual decision burden если > 0.4 always shown
- Mitigated: K=5 limit, clear preview

Catalog growth dynamics

Без user-decide: каталог растёт линейно с extractions, дубликаты накапливаются.

С user-decide:

Юзер reuse existing → каталог растёт slower
Юзер create variant → каталог растёт + family relationship
Юзер create new → unique entries only

Long-term effect: каталог seedling + organic curation = high quality, low noise.

Alternatives rejected

A. Algorithmic auto-decide

❌ False positives (см Context)
❌ Threshold debate без data

B. Always create new (skip uniqueness check)

❌ Каталог раздувается дубликатами
❌ Cache L2/L3 never hits — cost trajectory не improves

C. Show top-K but recommend one

❌ "Recommended" pushes юзер to algorithm choice → effectively algorithm decides
❌ Защемляет user autonomy

Cross-references

Main spec § IV Phase 9
Main spec § 0.7 Pivot 6 — pivot history
ADR 0018 — embedding strategy for atoms (separate from component-level)

ADR 0011 — Vision activation: reactive, не predictive ADR 0013 — "Чем дольше живём — тем меньше платим" как hard requirement