Consolidation: ARNO Design v6 + Observability v3 + Tech Stack v2 + audit fixes + URL-import. Status: implementation-ready after week 1 prototyping verification. Single source of truth for design, observability, and tech decisions.
Changelog
- v1.3 (2026-05-22): Unparked §V row "Small-company path / ARNO Studio" → first feature: URL-import onboarding. Full spec in docs/url_import_spec.md. 12 ADRs (0007-0018) document the pivots. 3 audit rounds closed 18 P0 + 44 P1 findings at the seams. Reasoning: small-biz mass onboarding without git-account friction: staging area in V1, GitHub App PR in V2.
- v1.2 (2026-05-20): Added Implementation_Workflow.md, a vertical-slice execution plan (16 phases with demo-driven milestones). It complements master spec §VII.1: phases there show dependency order, while Implementation_Workflow shows the execution sequence with visible artifacts per phase.
- v1.1 (2026-05-20): Applied master-audit P0s (retention matrix, a11y, push mutex) + selected P1s (crypto choices, error contract, governance, glossary, launch checklist, multi-vendor outage matrix, source docs reference, version header).
- v1.0 (2026-05-20): Initial consolidation from 3 individual specs.
Contents
- Part 0. Vision, principles, governance
- Part I. ARNO Product Design
- Part II. Observability
- Part III. Tech Stack
- Part IV. Unified MVP Scope
- Part V. Unified Parking
- Part VI. Open Questions / Week 1 Prototyping
- Part VII. Next Steps + Launch Readiness
Part 0
0.1 Vision
ARNO is a cloud editor for product design on top of existing repositories. It brings teams together (Maker = design+product, Frontend, Editor): each sees its own slice without breaking anyone else's.
Killer feature: a component change propagates instantly to every screen in every project.
MVP scenario: a large company with its own repos. Small business is parked.
Sustainability: $5/mo MVP floor, scales to 100K users through config-tier upgrades without a code rewrite.
0.2 Source documents (historical trail)
The master spec consolidates three converged individual specs:
| Source | Audit cycles until convergence | Status |
|---|---|---|
| ARNO Product Design v6 | 5 (v1 → v2 → ... → v6, 0 P0 reached) | Architectural baseline |
| Observability v3 | 3 (v1 → v2 → v3, plateau at 2 P0) | Operational baseline |
| Tech Stack v2 | 2 (v1 → v2, 3 P0 → 0 after audit fixes) | Implementation baseline |
The master spec is a condensed reference. Detailed reasoning, rejected alternatives, and audit findings are preserved in chat history. The master spec is canonical; the individual specs serve as a historical trace.
0.3 Architectural principles (36 cross-cutting)
- ARNO builds ONLY the editor. Approval / CI / audit / hosting / build / auth: leverage GitHub, Yjs/Liveblocks, Cloudflare, Grafana, Sentry.
- Do not duplicate the truth. Code+TSX: git. MD: git (created in ARNO). Workflow: Yjs. MD-edit live state: our DB.
- Identity ≠ name. UUIDs everywhere. Refactor-safe.
- Pluggable providers: GitProvider, auth, render-adapter, bundle-hosting, queue, cache, email.
- Live propagation + activity-based snapshot.
- MVP config + scale-ready schema only where migration is honestly cheap.
- Yjs is the primary live store; git is an async snapshot for workflow.
- ARNO = coordination tool, not a code rewriter. MD↔TSX drift is detected.
- Impact analysis runs synchronously with the edit action.
- CRDT where co-editing is real. Workflow: Yjs. MD spec: REST + versions.
- Mechanism > feature claim. Every feature has an explicit implementation strategy.
- Never silently drop user data. Conflicts surface an explicit choice.
- Compiled output ≠ source. Bundle proxy for private repos is an acceptable exception.
- Liveblocks: Storage / Awareness / Broadcast are three separate channels.
- Progressive load > full sync for large projects.
- Same-session vs different-session: different conflict semantics.
- Content-addressable URLs solve replica lag.
- Edge cache > backend optimization for immutable resources.
- DB trigger > application cron for on-write cleanup.
- Leverage existing > build new. Managed services.
- TypeScript end-to-end. Shared types FE-BE.
- Runtime-portable. Hono runs on Workers AND Node.
- Vendor-agnostic instrumentation. OpenTelemetry.
- Interface abstractions for swappable layers.
- Cost-tier ladder explicit. Predictable upgrades.
- Module boundaries enforce future extraction.
- Stateless services discipline.
- Free auth forever via Auth.js. No MAU billing.
- Operational SPOF mitigation from day 1.
- IaC from day 1.
- Prototype critical compatibilities in week 1.
- Backups for vendor-held data are in MVP scope.
- Migration matrices must be honest.
- Invariants need probes. Claimed correctness without active measurement = silent failure.
- Tail-based sampling > head-based for balancing cost and debug coverage.
- Async trace propagation must be explicit.
0.4 Data retention matrix
| Data type | Retention | Anonymization on user delete | Storage |
|---|---|---|---|
| audit_log | 180d (configurable per project) | user_id → deleted_user_<hash> | Postgres |
| Observability logs (Loki) | 90d | TTL-based delete accepted MVP | Grafana Cloud |
| Error events (Sentry) | 90d per project default | scrub PII via beforeSend hook | Sentry SaaS |
| Distributed traces (Tempo) | 50GB rolling | user_id anonymization post-MVP | Grafana Cloud |
| MD versions | 20 per file (count-based, Postgres trigger) | scrub on user delete (post-MVP) | Postgres |
| Yjs backups | 90d | anonymize on backup export (post-MVP, parking) | Cloudflare R2 |
| component_md_raw | indefinite until project delete | scrub on user delete | Postgres |
| project_share_link | until manual revoke | no PII typically | Postgres |
| project (archived state) | indefinite until user delete | full delete on user request | Postgres |
| onboarding_session | 30d unused → auto-expire | full delete | Postgres |
| Redis caches (all) | TTL-based (varied 60s-58min) | no PII stored | Cloudflare KV |
| pushed_by_us dedup | 5min TTL | n/a | Cloudflare KV |
| JWT revocation list | TTL = remaining token expiry | n/a | Cloudflare KV |
GDPR compliance: "Right to erasure" outputs:
- Immediate: scrub users row, anonymize audit_log, drop personal MD content
- Within 90d: observability logs expire naturally
- Within 90d: Yjs backups expire naturally (no active scrub MVP)
- Post-MVP: parallel scrub jobs for logs/traces/backups
0.5 Crypto choices
| Use case | Algorithm | Notes |
|---|---|---|
| JWT signing | HS256 (MVP) | Symmetric, simple. Migrate to RS256 when multi-issuer |
| JWT_SECRET minimum strength | 256-bit random | Generated via crypto.randomBytes(32) |
| user_id_hash for deleted_users tracking | HMAC-SHA-256 with pepper | Pepper in env var. Prevents rainbow tables. |
| Content hashing (MD, bundles) | SHA-256 | Content-addressable URLs |
| Webhook signature verification | HMAC-SHA-256 | GitHub default, explicit |
| Device flow user_code generation | 8-char base32 (40 bits entropy) | Sufficient for 5min TTL |
| Session ID generation | crypto.randomUUID() | v4 UUID |
| Password hashing | n/a (no password auth, GitHub OAuth only) | Post-MVP non-GitHub providers → Argon2id |
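The user_id_hash row can be illustrated with a minimal sketch. It assumes Node's built-in node:crypto module (on Workers this would require Node compatibility); the helper name hashUserId and the pepper strings are hypothetical, and real code would read the pepper from an env var.

```typescript
import { createHmac } from "node:crypto";

// Hypothetical helper: derives the deleted_users.user_id_hash value.
// HMAC-SHA-256 is keyed with a secret pepper, so the hash cannot be
// reproduced from a precomputed table of user ids alone.
function hashUserId(userId: string, pepper: string): string {
  return createHmac("sha256", pepper).update(userId).digest("hex");
}

// Same input and pepper give the same hash; a different pepper does not.
const first = hashUserId("user-123", "pepper-from-env");
const second = hashUserId("user-123", "pepper-from-env");
const other = hashUserId("user-123", "different-pepper");
console.log(first === second, first === other);
```

Because the output is deterministic for a given pepper, a re-registration check against deleted_users stays possible without storing the raw id.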
0.6 Decision authority hierarchy
When sections conflict:
- Master spec wins over individual specs (this document is canonical)
- ARNO Product Design wins over Tech Stack on product behavior
- Tech Stack wins over ARNO Product Design on infrastructure mechanics
- Observability wins over both on what is monitored/alerted
- Conflicts are surfaced and resolved in the next master spec version, not worked around
0.7 Spec governance
- Changes via PR to master spec
- Reviewer: founder + 1 co-admin (solo dev currently; the process grows with the future team)
- Major architecture changes: ADR (Architecture Decision Record) created first in docs/adr/
- Minor clarifications: directly in master spec, version bump (1.1 → 1.2)
- Major rewrites: version bump (1.x → 2.0)
- Changelog maintained in section 0 (top of doc)
0.8 Glossary
| Term / Acronym | Definition |
|---|---|
| Maker | Combined Design+Product role. Primary editor of workflow and MD specs. |
| Workflow | Graph of screens + edges representing product flow |
| Screen | Composition of component instances; node in workflow graph |
| Edge | Connection between screens triggered by interactive element |
| Component spec | MD file describing component contract (props, events, structural sections) |
| Component instance | Use of component within a screen with specific prop values |
| Render adapter | Protocol+bundle that renders real React components in ARNO preview iframe |
| Session-branch | Per-Maker git branch (arno/{user-handle}) for in-flight edits |
| CRDT | Conflict-free Replicated Data Type — Yjs-based real-time collab |
| RBAC | Role-Based Access Control |
| SPOF | Single Point of Failure |
| MAU | Monthly Active Users (billing metric for Liveblocks, Clerk, etc.) |
| MVP | Minimum Viable Product |
| SLO | Service Level Objective |
| KPI | Key Performance Indicator |
| IaC | Infrastructure as Code (Terraform) |
| PITR | Point-In-Time Recovery |
| TTL | Time To Live |
| OAuth | OAuth 2.0 authorization protocol |
| JWT | JSON Web Token |
| OIDC | OpenID Connect |
| DAG | Directed Acyclic Graph |
| OTel | OpenTelemetry |
| OTLP | OpenTelemetry Protocol |
| CSP | Content Security Policy |
| DPA | Data Processing Addendum |
| TOCTOU | Time-Of-Check-Time-Of-Use (race condition class) |
| DO | Cloudflare Durable Objects |
| TTI | Time To Interactive |
| LCP | Largest Contentful Paint |
| WAF | Web Application Firewall |
| GraphQL | Query language for APIs (used for GitHub bulk fetch) |
| tRPC | Typed RPC framework for TypeScript |
| Frontmatter | YAML metadata block at top of MD file |
| Fenced block | MD comment-delimited structural section (<!-- arno:props v1 -->) |
| textEditable prop | Spec marker allowing the Editor role to modify a prop value |
| Drift | Mismatch between MD spec and TSX code |
| Bundle proxy | ARNO backend serving private repo bundles with auth |
| Pre-edit impact analysis | Sync UI confirm before a breaking change |
| Branch-aware view | Maker sees own branch changes; others see main |
| Soft snapshot | Component versions cached during active screen edit |
Part I. ARNO Product Design
I.1 Roles
| Role | Workflow | Composition | Instances | MD spec | Code | Settings |
|---|---|---|---|---|---|---|
| Maker (Design+Product) | ✅ | ✅ | ✅ | ✅ | 👁 | ✅ |
| Frontend | 👁 | 👁 | 👁 | 💬 | ✅ (in IDE) | 👁 |
| Editor (UX-writer) | 👁 | 👁 | ✏️ textEditable props | 💬 | ❌ | 👁 |
| Viewer | 👁 | 👁 | 👁 | 👁 opt. | ❌ | ❌ |
| OUT of MVP (parking) |
I.1.1 GitHub permission mapping
| admin/maintain | write | triage | read | none |
|---|---|---|---|---|
| Maker | Maker | Maker | Viewer | no access |
I.1.2 Permission enforcement
- UI-level: disabled controls with tooltip
- Workflow (CRDT): Liveblocks server-side mutation validation, path-allowlist:
  - Maker: any path
  - Editor: only screens.*.instances.*.props.[id].value where spec.props[id].textEditable=true
  - Viewer: reject
- MD-edit (REST): server checks role + path
- Audit log: rejected mutations with user_id, intent, timestamp
- Fine-grained roles: via .arno/config.json references to GitHub Teams
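The path-allowlist for workflow mutations could be expressed roughly as follows. This is a sketch under assumptions: the function name canMutate, the dotted path encoding, and the set of textEditable prop ids are illustrative and not taken from the Liveblocks API.

```typescript
type Role = "maker" | "editor" | "viewer";

// Matches screens.<screenId>.instances.<instanceId>.props.<propId>.value
const EDITOR_PATH = /^screens\.[^.]+\.instances\.[^.]+\.props\.([^.]+)\.value$/;

// textEditable: ids of props whose spec sets textEditable=true (hypothetical shape).
function canMutate(role: Role, path: string, textEditable: ReadonlySet<string>): boolean {
  if (role === "maker") return true;   // any path
  if (role === "viewer") return false; // always rejected
  const match = EDITOR_PATH.exec(path);
  // Editor: only the value of a prop that the spec marks as textEditable.
  return match !== null && textEditable.has(match[1] ?? "");
}

const editableProps = new Set(["title"]);
console.log(canMutate("editor", "screens.s1.instances.i1.props.title.value", editableProps));
```

A rejected mutation would then be written to the audit log with user_id, intent, and timestamp, as listed above.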
I.2 Data model
I.2.1 Postgres
user { id, provider_user_id, prefs, is_admin }
project { id, name, owner_id, visibility, state, current_size_bytes }
// state: 'pending_setup' | 'active' | 'archived'
connected_repo { project_id, provider, url, paths, sync_mode, last_synced_sha,
bundle_hosting, config_path, last_visibility_check, last_visibility_status }
project_member { project_id, user_id, role, source }
project_share_link { id, project_id, scope, token, ... }
component_md_raw { project_id, file_path, raw_md_text, frontmatter_id, content_sha,
current_version, updated_by, updated_at }
component_md_versions { id, project_id, file_path, version, content, content_sha,
parent_version, saved_by, saved_at, session_id, label }
// AFTER INSERT trigger purges versions OFFSET 20
project_id_conflicts { project_id, conflict_id_value, files[], detected_at, resolution_lock_holder }
webhook_job_queue { id, project_id, branch, commit_sha, parent_sha, retries, next_attempt_at, status }
onboarding_session { id, user_id, attempt_id, project_draft_id, current_step,
completed_steps, last_error, persisted_state }
audit_log { id, user_id, action, target, intent, timestamp }
deleted_users { user_id_hash, deleted_at }
// Auth.js Drizzle adapter tables:
accounts, sessions, verificationTokens
// Cloudflare KV - cached state:
pushed_by_us // SET, TTL 5min, webhook dedup
viewer_snapshot // viewer:project:{id} TTL 60s
bundle_proxy_cache // bundle:{sha} TTL 1h
project_snapshot // project_snapshot:{id} TTL 5min
gh_installation_token // gh_token:{installation_id} TTL 58min
gh_bucket // rate limiter state per installation
id_conflict_lock // id_conflict:{repo_id}:{conflict_id_value} NX EX 300
push_lock // push_lock:{project_id}:{branch} NX EX 60 (mutex)
revoked // revoked:{jti} JWT revocation, TTL = remaining token expiry
device_code // device_code:{code} CLI device flow, TTL 10min
device_polls // rate limit device flow polls
I.2.2 Liveblocks (three channels)
Storage (CRDT, persistent — workflow):
project (Y.Doc) {
screens: Y.Map<screenId, Screen>
edges: Y.Array<Edge>
}
Screen { id, name, presentation, root: ComponentInstance } // structure: parked
Edge { id, from: {screenId, instanceId, eventId}, to: {screenId}, action: 'navigate' }
Awareness: presence, current screen, cursors (ephemeral).
Broadcast:
- md_saved { componentId, newVersion, contentSha, content?, by, timestamp }: content inline if <10KB
- drift_detected { componentId, mismatches, status }
- external_change { path, by }
- subscription_changed { user_id }
- workflow_updated { regions }
Dual rooms: project:${id} (workflow + presence) + repo:${id} (cross-project fan-out).
I.2.3 Client git repo
.arno/config.json // version: 1, paths, pairing, bundleHosting, arnoChanges
.github/workflows/arno-build.yml // bundle CI
.github/workflows/arno-check.yml // UUID lint, parse, drift check
arno.entry.tsx // render-adapter entry
Design_system/*.md // frontmatter id + fenced structural blocks v1
src/components/*.tsx // code (ARNO does not edit it)
config.json:
{
"version": 1,
"componentsMd": "Design_system/",
"componentsCode": "src/components/",
"pairing": {
"default": "[Name].md ↔ [Name].tsx",
"overrides": { "Button.md": "src/components/Button.tsx#Button" }
},
"bundleHosting": { "type": "gh-pages | github-packages", "config": {} },
"arnoChanges": { "backfillPolicy": "require-review" }
}
I.3 Architectural decisions
I.3.1 Source of truth
| Entity | Where it lives | Who writes |
|---|---|---|
| Code (TSX) | company repo | FE in IDE |
| MD specs (canonical) | company repo | Maker via ARNO (REST → push), FE opt. |
| MD-edit live state + versions | our DB | Maker REST auto-save |
| Workflow (screens + edges) | Liveblocks Yjs Storage | Maker CRDT |
| Component data cache | our DB (raw_md) | sync-handler webhook |
I.3.2 MD format + smart editor
- Canonical = free-form markdown
- Fenced structural blocks with a version (<!-- arno:props v1 -->)
- textEditable: true marker for Editor-accessible string props
- Persistence: REST + versions (never a silent drop)
- Conflict UI with user choice
- Real-time awareness via broadcast
- Multi-tab coordination via BroadcastChannel API
Save flow:
POST /api/md/{project}/{path} with {content, base_version, session_id}
Server:
if current_version == base_version → save
elif conflict.author == request.user AND conflict.session_id == request.session_id
→ auto-rebase silent (same-session fast-forward)
  else → 409 conflict response
Conflict UI options: view diff / save mine / discard mine (saved as backup) / merge manually.
Postgres trigger purges versions OFFSET 20.
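The three branches of the save flow can be sketched as a pure decision function. The type and function names here (SaveRequest, CurrentState, decideSave) are hypothetical; only the branching logic comes from the flow above.

```typescript
interface SaveRequest { baseVersion: number; userId: string; sessionId: string }
// Author and session of the version currently stored on the server.
interface CurrentState { version: number; savedBy: string; sessionId: string }
type SaveOutcome = "save" | "auto_rebase" | "conflict_409";

function decideSave(req: SaveRequest, current: CurrentState): SaveOutcome {
  // Client edited the latest version: plain save.
  if (current.version === req.baseVersion) return "save";
  // The newer version came from the same user in the same session:
  // fast-forward silently instead of surfacing a conflict.
  const sameSession =
    current.savedBy === req.userId && current.sessionId === req.sessionId;
  return sameSession ? "auto_rebase" : "conflict_409";
}

console.log(decideSave(
  { baseVersion: 3, userId: "u1", sessionId: "s1" },
  { version: 4, savedBy: "u2", sessionId: "s9" },
));
```

Keeping the decision free of I/O makes the same-session versus different-session rule easy to unit test separately from the REST handler.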
I.3.3 Discovery, sessions, write-back
- .arno/config.json (versioned, "version": 1)
- Session = git branch arno/{user-handle}
- Debounced push (~30s idle) to the session branch
- First push → auto-PR; subsequent pushes → update it
- Approval = GitHub (CODEOWNERS, branch protection, CI)
Pre-push HEAD check + concurrency mutex:
1. Acquire mutex (Redis SETNX):
key = `push_lock:${project_id}:${branch}`
if not redis.set(key, instance_id, NX=True, EX=60):
throw 'concurrent_push_in_progress' — retry later
2. Pull latest HEAD via API
3. If HEAD == last_known_push_sha → push, update last_known
4. If diverged → NO auto-push → UI:
"External commits on your session branch from IDE/git.
[Open in GitHub] [Discard mine, pull theirs]"
5. Always release mutex in finally:
redis.delete(`push_lock:${project_id}:${branch}`)
6. Never force-push automatically
The mutex prevents two concurrent ARNO instances from pushing to the same branch (TOCTOU race in steps 2-3).
Pre-edit impact analysis: at edit-action time (not push), this-project scope, gated by sync completion.
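Steps 1 and 5 of the push procedure can be sketched with an in-memory stand-in for the lock store. Assumptions: LockStore and withPushLock are hypothetical names, the real store must offer an atomic set-if-absent with TTL, and the owner check on release is an addition that the numbered steps do not spell out.

```typescript
// In-memory stand-in for a store with atomic set-if-absent and TTL (assumption).
class LockStore {
  private locks = new Map<string, { owner: string; expiresAt: number }>();

  // Equivalent of SET key owner NX EX ttl: succeeds only if no live lock exists.
  setIfAbsent(key: string, owner: string, ttlSec: number, now = Date.now()): boolean {
    const held = this.locks.get(key);
    if (held && held.expiresAt > now) return false;
    this.locks.set(key, { owner, expiresAt: now + ttlSec * 1000 });
    return true;
  }

  // Delete only when the caller still owns the lock, so an instance whose
  // lock expired cannot release a lock that now belongs to someone else.
  releaseIfOwner(key: string, owner: string): boolean {
    if (this.locks.get(key)?.owner !== owner) return false;
    return this.locks.delete(key);
  }
}

async function withPushLock<T>(
  store: LockStore, projectId: string, branch: string, instanceId: string,
  push: () => Promise<T>,
): Promise<T> {
  const key = `push_lock:${projectId}:${branch}`;
  if (!store.setIfAbsent(key, instanceId, 60)) {
    throw new Error("concurrent_push_in_progress"); // caller retries later
  }
  try {
    return await push(); // steps 2-4: HEAD check, then push or surface divergence
  } finally {
    store.releaseIfOwner(key, instanceId); // step 5
  }
}
```

The 60-second TTL bounds how long a crashed instance can block pushes, which is why the release has to tolerate a lock that has already expired and been taken over.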
I.3.4 Render
Edit mode: schematic boxes (zero setup).
Preview mode — tiered:
| Change type | Behavior |
|---|---|
| Prop default value | Override at render-time. Instant |
| Prop name renamed (id stable) | Warning "Spec renamed, code not yet updated" |
| New prop in MD | Warning, ignored in render |
| New component / TSX change | Build session-branch ~1-2 min |
Drift detection (CI-based via react-docgen): 5 component statuses (green / red / yellow / purple / gray) + per-component opt-out.
Bundle hosting: gh-pages (public) / github-packages backend proxy (private), streaming pass-through.
I.3.5 Identity
- Component / Prop UUIDs in MD frontmatter / fenced sections
- Instance / Screen / Edge UUIDs in Yjs
- arno backfill-ids utility
- Collision handling: UI + default heuristic + Redis cross-project lock
- arno lint blocks merging a PR with duplicates
I.3.6 Interactivity
- Instance-level
- Events from the fenced events: block in MD
- Edge: {from: {screen, instance, event}, to: {screen}, action: 'navigate'}
- Edit FROM the screen (DevPanel "+"), view ON the workflow canvas (read-only)
I.3.7 Real-time collab
- Yjs only for workflow
- Liveblocks transport + Storage + Awareness + Broadcast
- Yjs ↔ git invariant: git_head_commits ⊆ yjs_serialized_state
- Canonical serializer partial (passthrough prose, deterministic fenced)
- Reverse sync (FE-push) → 3-way merge UI is parked; MVP falls back to GitHub
I.3.8 Sync with the repo
GitProvider abstraction:
core: listFiles, getFileContent, createBranch, commit, getDiff, verifyWebhookSignature
extensions: github.createPullRequest, github.resolveCodeowners
GitHub App required permissions: Contents R/W, Pull requests W, Metadata R, Webhooks W, Workflows W (scope .github/workflows/arno-*.yml), Packages R/W (for github-packages bundle hosting).
Optional: Members R, Checks W.
Rate limiter (KV token bucket per installation, 12,500/hour budget).
Token refresh middleware with SETNX lock (thundering herd protection).
Initial scan via GraphQL bulk (~20 queries for 2000 MDs).
Webhook handling: dedup (SHA + author=arno[bot]), fan-out via repo:{id} room, DAG ordering per-project, exponential backoff, persisted queue.
Auto-detect staleness: cron every 10min for active projects.
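The per-installation rate limiter mentioned above is a token bucket with a 12,500/hour budget. A minimal sketch of the refill arithmetic, assuming an even refill over the hour and a hypothetical Bucket state shape (the spec does not describe how the stored state is laid out):

```typescript
// State stored per installation (hypothetical shape).
interface Bucket { tokens: number; updatedAt: number }

const CAPACITY = 12_500;                     // hourly budget from the spec
const REFILL_PER_MS = CAPACITY / 3_600_000;  // spread evenly over one hour

// Pure step function: returns the new state and whether the call may proceed.
function take(bucket: Bucket, now: number, cost = 1): { bucket: Bucket; allowed: boolean } {
  const elapsed = Math.max(0, now - bucket.updatedAt);
  // Refill for the elapsed time, never above capacity.
  const refilled = Math.min(CAPACITY, bucket.tokens + elapsed * REFILL_PER_MS);
  if (refilled < cost) {
    return { bucket: { tokens: refilled, updatedAt: now }, allowed: false };
  }
  return { bucket: { tokens: refilled - cost, updatedAt: now }, allowed: true };
}

const start: Bucket = { tokens: CAPACITY, updatedAt: 0 };
console.log(take(start, 1_000).allowed);
```

The read-modify-write around take is not atomic by itself; how the stored bucket is updated safely under concurrency is outside this sketch.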
I.3.9 Live propagation + edit stability + progressive load
- Push to main → broadcast → clients update
- Branch-aware: a Maker with changes sees their own version; others see main
- Soft snapshot activity-based (notification + apply/ignore)
- Idle 30min → "auto-refresh in 60s"
- Server snapshot cron 60s → Redis for progressive load
- Open project: T=0 read-only (instant) → T=2-15s edit unlock
I.3.10 Sharing
- Visibility: private | unlisted (token) | public
- Scope: project | screen (frozen single)
- Anonymous viewer; require_login flag (UI post-MVP)
- Manual revoke, multiple tokens. Expiry is not in MVP.
- Viewer does NOT connect to Liveblocks (server snapshot + Redis cache TTL 60s)
- Zero MAU billing for viewers
I.3.11 Onboarding flow
Sequential + post-merge parallel tracks:
- Sign up: Auth.js + GitHub OAuth (see §III.2.6)
- Install ARNO GitHub App
- Create project (list repos)
- arno init wizard (paths, bundleHosting, config_path, state persisted per-attempt)
- User merges PR
- Parallel:
  - Track A (GraphQL bulk scan ~10-30s): edit-mode unlocked
  - Track B (bundle CI ~60-120s): preview-mode unlocked
- Empty repo: hint toward "ARNO Studio" parking (small-biz path)
- Land in workflow editor
Failure modes: PR not merged → email reminders (Day 1, 7), archive Day 30.
Project lifecycle: pending_setup → active → archived (90d inactivity by state-changing mutation; viewer access not counted).
I.4 Accessibility requirements
Target: WCAG 2.1 Level AA compliance for MVP.
Specific commitments:
- Color contrast ≥4.5:1 for normal text, ≥3:1 for large text
- All interactive elements keyboard-navigable (Tab, Enter, Esc, arrow keys)
- Workflow canvas keyboard navigation:
  - Tab: between screens
  - Arrow keys: pan canvas / select adjacent screen
  - Enter: drill into screen
  - Esc: back to canvas
- ARIA labels on all non-text interactive controls
- Focus visible indicators (explicit ring, not browser defaults)
- Screen reader testing as MVP gate (NVDA + VoiceOver)
- No reliance on color alone (status indicators have text/icons)
- Text resizable up to 200% without horizontal scroll
CI enforcement:
- axe-core in Playwright E2E suite: fail on violations
- Lighthouse Accessibility score >90 per page (CI gate)
Out of MVP:
- Full screen reader optimization for workflow canvas (complex, post-MVP)
- High-contrast theme (system theme support only)
- Keyboard-only prototype walkthrough (Tab through edges)
Documentation: docs/accessibility.md covers tested scenarios + known limitations.
Part II. Observability
II.1 Stack
| Layer | Tool | Cost MVP | Migration |
|---|---|---|---|
| Errors | Sentry SaaS | $0 (5K events/mo) | Team $26 |
| Logs | Grafana Loki | $0 (50GB/mo) | Paid scaling |
| Metrics | Grafana Mimir | $0 (10K series) | Paid scaling |
| Traces | Grafana Tempo | $0 (50GB/mo) | Paid scaling |
| OTel Collector | n/a MVP (direct OTLP push) | $0 | Add when traces >40GB/mo |
| Dashboards | Grafana | $0 | Same |
| Alerting | Grafana Alerting → Slack/PagerDuty | $0 free tiers | PagerDuty paid |
| Dead-man-switch | Healthchecks.io | $0 | $5-20/mo |
| Status page | Manual MVP | $0 | Statuspage post-MVP |
II.2 Logs
Schema (structured JSON to stdout):
{
"timestamp": "ISO8601",
"level": "debug | info | warn | error",
"service": "api | worker | webhook | cron",
"trace_id": "uuid",
"request_id": "uuid",
"span_id": "uuid | null",
"user_id": "id | null",
"project_id": "id | null",
"session_id": "id | null",
"event_type": "md.save | webhook.received | ...",
"duration_ms": "number | null",
"context": { "..." : "event-specific" },
"error": { "type": "...", "message": "...", "stack": "..." }
}
request_id fallback middleware: always populated.
Sampling (trace-aware via baggage):
- error / warn: 100% always
- info: 100% MVP, 10% at scale
- debug: 0% prod
PII blocklist: GitHub tokens, OAuth tokens, user emails (user_id only), MD content, TSX code, cookies, auth headers. Enforcement: lint rule + middleware redaction + code review.
II.3 Metrics
Counters / Histograms / Gauges — key metrics:
# Business
md_save_total{project_id, result}
workflow_mutation_total{project_id, kind}
webhook_received_total{event_type, dedup_skipped}
github_api_call_total{endpoint_class, status_code}
liveblocks_api_call_total{operation, status}
invariant_drift_detected_total{type, severity}
cron_dispatch_total{operation, result}
# Latency (histograms)
http_request_duration_ms{path, method, status_code}
md_save_duration_ms
webhook_processing_duration_ms
bundle_proxy_duration_ms{cache_layer}
snapshot_fetch_duration_ms{source}
# State (gauges)
active_projects_1h
liveblocks_active_connections{room_type}
project_size_bytes{project_id=<top-100|_other>}
degraded_mode_active{soft_dep}
otel_collector_buffer_utilization_pct
Cardinality budget:
- user_id NEVER in labels
- project_id → top-100 active sticky 24h, the rest → _other
- Status codes bucketed
- Mimir max_series_per_tenant: 50000
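The cardinality rules above amount to normalizing label values before a metric is emitted. A small sketch, with hypothetical helper names and with the per-class bucketing of status codes (2xx, 4xx, ...) taken as an assumption since the spec only says "bucketed":

```typescript
// Keeps the project_id label only for the current top-100 active projects;
// everything else collapses into the single "_other" series.
function projectLabel(projectId: string, top100: ReadonlySet<string>): string {
  return top100.has(projectId) ? projectId : "_other";
}

// Buckets an HTTP status code into its class so the label has at most
// a handful of values instead of one per distinct code.
function statusBucket(status: number): string {
  if (!Number.isInteger(status) || status < 100 || status > 599) return "unknown";
  return `${Math.floor(status / 100)}xx`;
}

const hot = new Set(["proj-a", "proj-b"]);
console.log(projectLabel("proj-z", hot), statusBucket(404));
```

With both helpers applied, the series count per metric is bounded by roughly 101 project values times the number of status classes, independent of how many projects exist.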
II.4 Tracing
- OpenTelemetry SDK auto-instrumentation
- Manual wrapper for Liveblocks SDK, octokit
- OTel Collector with tail-based sampling: post-MVP
- HA Collector deployment + buffer monitoring + degraded sampling fallback
Async trace propagation helper (mandatory):
- W3C TraceContext in queue/cron job payloads
- safe_extract_trace_context with try/except for corrupted carriers
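A dependency-free sketch of the safe extraction idea: parse a W3C traceparent value from a job payload and fall back to null on anything malformed. The function name safeExtractTraceContext and the payload shape are assumptions; a production helper would more likely wrap OpenTelemetry's propagation API than parse the header by hand.

```typescript
interface TraceContext { traceId: string; spanId: string; sampled: boolean }

// W3C traceparent: version "00", 32-hex trace id, 16-hex parent id, 2-hex flags.
const TRACEPARENT = /^00-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$/;

// Never throws: a corrupted carrier yields null, and the job starts a fresh trace.
function safeExtractTraceContext(carrier: unknown): TraceContext | null {
  try {
    const header = (carrier as { traceparent?: unknown } | null)?.traceparent;
    if (typeof header !== "string") return null;
    const m = TRACEPARENT.exec(header);
    if (!m) return null;
    const traceId = m[1] ?? "";
    const spanId = m[2] ?? "";
    const flags = m[3] ?? "00";
    // All-zero ids are invalid per the Trace Context specification.
    if (/^0+$/.test(traceId) || /^0+$/.test(spanId)) return null;
    return { traceId, spanId, sampled: (parseInt(flags, 16) & 1) === 1 };
  } catch {
    return null;
  }
}

console.log(safeExtractTraceContext({ traceparent: "garbage" }));
```

Returning null instead of throwing keeps a single bad payload from failing the whole queue consumer or cron run.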
Frontend ↔ backend propagation via Sentry tracingOrigins (named tracePropagationTargets in current Sentry JavaScript SDKs).
II.5 Errors
- Separate Sentry projects: arno-frontend, arno-backend
- Frontend SDK lazy-loaded (after first interaction)
- PII scrubbing via beforeSend
- Release tagging deterministic: <service>@<git_sha[:8]>
- Source maps uploaded in CI
- Frontend hangs detection via Performance API
II.6 Alerting + SLOs
SLOs:
- API availability: 99.5%
- API read p95 latency: <500ms
- API write p95 latency: <1000ms
- Webhook processing p95: <30s
- Bundle proxy p95: <500ms (95% edge hit)
- Data correctness: 100% (verified by invariant probes)
- Webhook delivery success: >99%
- Drift detection coverage: >95% components
Alert tiers:
- PAGE (24/7): health endpoint down, error rate >5%, infrastructure unreachable, invariant drift detected, OTel Collector buffer >95%, cardinality budget exceeded, webhook signature spike
- NOTIFY (Slack + daily email): error rate >1%, latency degradation, rate limit hit, queue depth, cache hit drop, single soft dep degraded
- TICKET (weekly review): DAU/WAU trends, table size growth, long-tail latency
Progressive threshold tuning: weeks 1-2 absolute → weeks 3-4 baseline collection → week 5+ baseline-relative.
Escalation: PAGE ack 15min → second on-call → engineering lead.
II.7 Invariant probes
Tiered strategy:
- Fast probe (hourly, cheap): stored-state comparison (last_pushed_sha vs git HEAD)
- Deep probe (weekly, expensive): full Yjs serialize → canonical compare with git HEAD
Probe types:
- Yjs ↔ git invariant
- MD content_sha verification
- Workflow canonicalization determinism
- ID uniqueness
Coverage rotation (activity-tier based): Tier 1 (hot) daily, Tier 2 (warm) weekly, Tier 3 (cold) monthly.
Independent execution: async parallel, per-probe-type metrics, one failure doesn't cascade.
Dual emit: metric (for alerts) + structured log (for historical query).
II.8 Healthchecks
- /health: liveness, always 200 if responding
- /ready: readiness, distinguishes hard (DB, Redis) vs soft (Liveblocks, GitHub) deps; returns degraded flag
- /metrics: internal network only
Rate limits: /health, /ready 60 req/min per IP (Cloudflare edge). /metrics not exposed externally.
Multi Healthchecks.io URLs: main /health + probe runner heartbeat + each critical cron.
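The hard versus soft dependency distinction for /ready can be sketched as a pure function. The response shape and the status codes are assumptions (503 when a hard dependency fails, 200 with degraded=true when only soft ones fail); the spec states only that a degraded flag is returned.

```typescript
type DepStatus = Record<string, boolean>; // dependency name → healthy?

interface Readiness { status: 200 | 503; degraded: boolean; failing: string[] }

function evaluateReadiness(hard: DepStatus, soft: DepStatus): Readiness {
  const failingHard = Object.keys(hard).filter((name) => !hard[name]);
  const failingSoft = Object.keys(soft).filter((name) => !soft[name]);
  // A failing hard dependency (DB, Redis) means the service cannot serve traffic.
  if (failingHard.length > 0) {
    return { status: 503, degraded: false, failing: [...failingHard, ...failingSoft] };
  }
  // Soft dependencies (Liveblocks, GitHub) only flip the degraded flag.
  return { status: 200, degraded: failingSoft.length > 0, failing: failingSoft };
}

console.log(evaluateReadiness({ db: true, redis: true }, { liveblocks: false, github: true }));
```

The degraded flag can feed the degraded_mode_active{soft_dep} gauge so that a single soft failure notifies without paging.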
II.9 Runbooks
- Repo: arno-runbooks
- PR template requires runbook_url for alerts
- CI check validates alert YAML → runbook file exists
- Quarterly review
- Daily orphan check
II.10 Customer support correlation
Grafana dashboard "User Debug View" with inputs (user_id, time_range, optional project_id) and panels (activity timeline, HTTP requests, Sentry errors, audit trail, traces, Liveblocks sessions). Access: support team only.
II.11 Compliance
See the §0.4 data retention matrix for all data types and retention policies.
Part III. Tech Stack
III.1 Stack overview
| Layer | MVP ($5/mo floor) | Scale (paid tiers) | Migration |
|---|---|---|---|
| Language | TypeScript | Same | — |
| Frontend framework | Next.js 14+ (App Router) | Same | — |
| Backend framework | Hono + tRPC + Zod | Same Hono on Node | xs |
| Frontend hosting | Cloudflare Pages | Vercel Pro | xs |
| Backend runtime | Cloudflare Workers Paid | Fly.io / AWS ECS | xs |
| Background jobs | Cloudflare Queues | BullMQ + Redis | sm |
| Cron | CF Cron Triggers (2 Workers + dispatcher) | Node cron | xs |
| Database | Neon Postgres free | Neon Launch+ | xs |
| ORM | Drizzle (dual: HTTP runtime, Pool migrations) | Same | — |
| Cache | Cloudflare KV | + Upstash Redis | sm |
| Object storage | Cloudflare R2 | Same | — |
| CDN/WAF | Cloudflare | Same | — |
| Auth | Auth.js v5 (JWT mode) + Lucia fallback | Same | — |
| Real-time CRDT | Liveblocks free | Pro / self-hosted Yjs (Path A DO / Path B external) | sm-md / md-lg |
| Email | Resend + abstraction layer | + SendGrid fallback | xs |
| Errors | Sentry free | Sentry Team+ | xs |
| Logs/Metrics/Traces | Grafana Cloud free | Paid tiers | sm |
| CI/CD | GitHub Actions | Same | — |
| Healthcheck | Healthchecks.io free | Same | — |
| IaC | Terraform Cloud free | Same | — |
| Domain | *.pages.dev → Porkbun separate registrar | Same | — |
III.2 Per-layer key decisions
III.2.1 Backend module structure
apps/api/src/
routes/ # Hono routes (HTTP entry)
rpc/ # tRPC procedures
services/ # Business logic (framework-agnostic)
repositories/ # Data access via Drizzle
  middleware/ # Auth, rate limit, observability, CORS
CORS middleware: environment-aware allowlist.
API versioning: /api/v1/... namespace, 6-month deprecation overlap.
III.2.2 Error contract
Standardized error response format for all API endpoints:
interface ApiError {
  code: string // 'unauthorized' | 'forbidden' | 'rate_limited' | 'conflict' | 'validation' | 'not_found' | 'internal' | 'bundle_fetch_failed' | 'service_unavailable'
  message: string // user-facing
  details?: unknown // for debugging
  retry_after_ms?: number // for rate_limited
  trace_id?: string // for support correlation
}
HTTP status code mapping:
- 400 → validation
- 401 → unauthorized
- 403 → forbidden
- 404 → not_found
- 409 → conflict (with details containing conflict info)
- 429 → rate_limited (with retry_after_ms)
- 500 → internal
- 502 → bundle_fetch_failed (specific external dep failure)
- 503 → service_unavailable (degraded mode)
tRPC error mapping: tRPC errors are converted to the ApiError shape via global middleware.
Frontend handling: unified ApiErrorBoundary component renders based on code.
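The status mapping above can be captured as a typed lookup so that an unmapped code fails at compile time. A sketch in which ApiErrorCode, STATUS_BY_CODE, and toResponse are illustrative names; the code/status pairs are the ones listed in this section.

```typescript
type ApiErrorCode =
  | "validation" | "unauthorized" | "forbidden" | "not_found" | "conflict"
  | "rate_limited" | "internal" | "bundle_fetch_failed" | "service_unavailable";

interface ApiError {
  code: ApiErrorCode;
  message: string;          // user-facing
  details?: unknown;        // for debugging
  retry_after_ms?: number;  // for rate_limited
  trace_id?: string;        // for support correlation
}

// Record<ApiErrorCode, number> forces every code to have a status.
const STATUS_BY_CODE: Record<ApiErrorCode, number> = {
  validation: 400,
  unauthorized: 401,
  forbidden: 403,
  not_found: 404,
  conflict: 409,
  rate_limited: 429,
  internal: 500,
  bundle_fetch_failed: 502,
  service_unavailable: 503,
};

function toResponse(err: ApiError): { status: number; body: ApiError } {
  return { status: STATUS_BY_CODE[err.code], body: err };
}

console.log(toResponse({ code: "rate_limited", message: "Slow down", retry_after_ms: 2000 }).status);
```

Narrowing code from string to a union is a design choice of this sketch; the interface in the spec keeps it as a plain string.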
III.2.3 Workers Paid configuration
- Bundle size CI measurement, warn at 4MB
- Tree-shaking: octokit sub-packages, OTel selective, lazy-load rare features
- GitHub App rate limiter (KV token bucket, 12,500/hour budget)
III.2.4 Cron — 2 Workers + dispatcher
Worker 1 (apps/cron/frequent/, 4 crons): snapshot refresh, top-N refresh, reconciliation, hourly token-check + probe-fast combined.
Worker 2 (apps/cron/scheduled/, 1 cron, dispatcher): hourly trigger with try/catch isolation per operation. Tasks: onboarding reminders, probe deep, repo visibility check, DB backup, Yjs backup, usage report, revocations cleanup.
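The try/catch isolation per operation in the dispatcher could look roughly like this. The Task and TaskResult types and the dispatch function are hypothetical; the point is only that one failing task does not stop the remaining ones.

```typescript
type Task = { name: string; run: () => Promise<void> };
type TaskResult = { name: string; ok: boolean; error?: string };

// Runs every task in order; a thrown error is recorded and the loop continues.
async function dispatch(tasks: Task[]): Promise<TaskResult[]> {
  const results: TaskResult[] = [];
  for (const task of tasks) {
    try {
      await task.run();
      results.push({ name: task.name, ok: true });
    } catch (err) {
      const message = err instanceof Error ? err.message : String(err);
      results.push({ name: task.name, ok: false, error: message });
    }
  }
  return results;
}

dispatch([
  { name: "db_backup", run: async () => { throw new Error("upstream timeout"); } },
  { name: "usage_report", run: async () => {} },
]).then((results) => console.log(results));
```

Each result maps naturally onto the cron_dispatch_total{operation, result} counter from the metrics section.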
III.2.5 Database — Drizzle dual driver
Runtime (Workers): @neondatabase/serverless HTTP driver via drizzle-orm/neon-http.
Migrations (CI/Node): @neondatabase/serverless Pool driver via drizzle-orm/neon-serverless. Supports multi-statement transactions.
Lint rule: tools/migrate/* cannot be imported into apps/*.
Migration workflow: GHA with production-migration environment (manual approval, prevent self-review).
Expand-contract pattern for zero-downtime.
Backup: Neon PITR 7 days + monthly snapshot to R2.
III.2.6 Auth — Auth.js v5 JWT mode
Frontend (Pages): Auth.js v5 with JWT session strategy (7-day expiry). Drizzle adapter. GitHub provider.
Backend (Workers): verifies JWT with versioned secrets (CURRENT + PREVIOUS):
import { jwtVerify } from 'jose'

// Hono middleware: verifies the bearer JWT against the current and previous secrets.
async function authMiddleware(c, next) {
  const token = extractBearer(c.req.header('Authorization'))
  if (!token) return c.json({ code: 'unauthorized', message: 'Missing token' }, 401)
  let payload
  // Try CURRENT first, then PREVIOUS, so tokens signed before a rotation stay valid.
  for (const secret of [env.JWT_SECRET_CURRENT, env.JWT_SECRET_PREVIOUS].filter(Boolean)) {
    try {
      ;({ payload } = await jwtVerify(token, new TextEncoder().encode(secret)))
      break
    } catch { continue }
  }
  if (!payload) return c.json({ code: 'unauthorized', message: 'Invalid token' }, 401)
  // Reject tokens whose jti is on the KV revocation list.
  const revoked = await env.KV.get(`revoked:${payload.jti}`)
  if (revoked) return c.json({ code: 'unauthorized', message: 'Token revoked' }, 401)
  c.set('user', { id: payload.sub, scope: payload.scope })
  await next()
}
JWT_SECRET rotation: quarterly via versioned dual-secret pattern (7-day overlap window).
Three OAuth Apps: dev/staging/prod separate.
Fallback: Lucia Auth (Edge-native) if the Auth.js prototype fails.
III.2.7 ARNO CLI — OAuth Device Flow
Standard OAuth 2.0 Device Authorization Grant (RFC 8628). Cross-platform credential storage (~/.arno/credentials.json, Windows %APPDATA%\ARNO\). Rate limit max 12 polls per device_code.
III.2.8 IaC — Terraform Cloud
Managed: Cloudflare resources, Neon databases, GitHub repository settings.
NOT managed: secrets (via wrangler secret / dashboard).
Sensitive variables: sensitive = true + lifecycle.ignore_changes.
III.2.9 SPOF mitigation
- Multi-owner Cloudflare account (2+ admins, hardware key 2FA)
- DNS independence: Porkbun/Namecheap registrar, switchable nameservers
- DNS TTL 300s pre-set for critical records
- Backup admin email external (Gmail/ProtonMail)
- Scoped API tokens (never root)
- Secrets backup: 1Password + Bitwarden
- Disaster recovery playbook: docs/runbooks/cloudflare_account_loss.md
III.2.10 Multi-vendor outage matrix
| Cloudflare | Liveblocks | Neon | Sentry | Recovery |
|---|---|---|---|---|
| Down | Up | Up | * | Migrate frontend + backend to Fly.io/Vercel within 48h. Yjs data intact via Liveblocks. |
| Up | Down | Up | * | Workflow editing degraded (no real-time). MD-edit via REST works. Viewers OK. |
| Down | Down | Up | * | Restore from Yjs backups in R2. Frontend rebuild. Manual recovery ~1 week. |
| Up | Up | Down | * | API read-only mode (snapshot fallback). No writes. Viewers OK. |
| Down | Down | Down | * | Catastrophic — full restore from backups, est 1-2 weeks. Detailed playbook required. |
| Up | Up | Up | Down | Observability blind period. No alerting. Manual checking until recovery. |
Playbooks for each scenario are documented in docs/runbooks/multi_vendor_outage.md.
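The matrix can also be encoded as data, so that a status probe or runbook tool can look up the matching scenario. This is a minimal sketch with hypothetical names. The recovery strings are shortened from the table, and `"*"` means the Sentry state does not matter.

```typescript
type Status = "up" | "down";

interface VendorStatus {
  cloudflare: Status;
  liveblocks: Status;
  neon: Status;
  sentry: Status;
}

interface MatrixRow extends Omit<VendorStatus, "sentry"> {
  sentry: Status | "*";
  recovery: string;
}

// Rows mirror the matrix above, in the same order.
const OUTAGE_MATRIX: MatrixRow[] = [
  { cloudflare: "down", liveblocks: "up", neon: "up", sentry: "*", recovery: "Migrate frontend + backend to Fly.io/Vercel" },
  { cloudflare: "up", liveblocks: "down", neon: "up", sentry: "*", recovery: "Degraded workflow editing; MD-edit via REST" },
  { cloudflare: "down", liveblocks: "down", neon: "up", sentry: "*", recovery: "Restore Yjs backups; rebuild frontend" },
  { cloudflare: "up", liveblocks: "up", neon: "down", sentry: "*", recovery: "API read-only mode" },
  { cloudflare: "down", liveblocks: "down", neon: "down", sentry: "*", recovery: "Full restore from backups" },
  { cloudflare: "up", liveblocks: "up", neon: "up", sentry: "down", recovery: "Manual checking until Sentry recovers" },
];

// Returns the documented recovery, or undefined when no row matches.
function findRecovery(s: VendorStatus): string | undefined {
  const row = OUTAGE_MATRIX.find(
    (r) =>
      r.cloudflare === s.cloudflare &&
      r.liveblocks === s.liveblocks &&
      r.neon === s.neon &&
      (r.sentry === "*" || r.sentry === s.sentry),
  );
  return row?.recovery;
}
```

An `undefined` result means either that everything is healthy or that the combination has no documented row. For example, Cloudflare and Neon down with Liveblocks up is not in the table, so a lookup like this would surface that gap.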
III.3 Monorepo structure
arno/
├── apps/
│ ├── web/ # Next.js → Cloudflare Pages
│ ├── api/ # Hono → Workers
│ ├── workers/ # Queue consumers
│ ├── cron/
│ │ ├── frequent/ # 4 high-freq crons
│ │ └── scheduled/ # 1 hourly + dispatcher
│ └── cli/ # @arno/cli
├── packages/
│ ├── shared/ # Domain types, Zod schemas
│ ├── trpc/ # tRPC routes
│ ├── db/ # Drizzle schema + Auth.js adapter tables
│ ├── ui/ # React components
│ ├── editor/ # MD editor + workflow canvas
│ ├── observability/ # OTel + Sentry + log helpers
│ ├── git-provider/ # GitProvider interface + GitHub impl + rate limiter
│ ├── render-adapter/ # Adapter protocol
│ ├── auth/ # Auth.js + JWT helpers + Device Flow
│ ├── queue/ # Queue interface + CF + BullMQ impls
│ ├── cache/ # Cache interface + KV + Redis impls
│ └── email/ # Email interface + Resend + SendGrid
├── infra/
│ ├── terraform/ # IaC: Cloudflare + Neon + GitHub
│ └── neon/ # Migration scripts
├── tools/
│ ├── migrate/ # Drizzle Pool driver runner (CI only)
│ ├── eslint-config/
│ ├── tsconfig/
│ └── load-tests/ # k6 scenarios
├── docs/
│ ├── adr/ # Architecture Decision Records
│ ├── runbooks/ # Operational playbooks
│ ├── accessibility.md # WCAG compliance documentation
│ └── development.md
├── turbo.json
├── pnpm-workspace.yaml
└── package.json
III.4 Testing
| Layer | Tool | Cost |
|---|---|---|
| Unit/Integration | Vitest | OSS |
| E2E | Playwright | OSS |
| API mocking | MSW | OSS |
| DB integration | Testcontainers | OSS |
| Component isolation | Storybook | OSS |
| Load testing | k6 (GHA weekly + local) | OSS |
| Visual regression | Chromatic free | $0 |
| Accessibility | axe-core in Playwright | OSS |
| Security scan | Snyk free | $0 |
Coverage: critical paths 90%+, new code 80%+.
III.5 CI/CD
GitHub Actions with manual approval gates for the production-migration environment (prevent self-review). Bundle size measurement step. Sentry release tagging. Smoke tests post-deploy. Secrets 90-day rotation cadence.
III.6 Cost ladder honest
| Stage | Users | $/mo | Drivers |
|---|---|---|---|
| Closed alpha | <20 | $5 | Workers Paid |
| Open beta | 20-100 | $5-105 | + Liveblocks Pro $99 likely (or startup credits) |
| Growing beta | 100-500 | $105-250 | + Sentry Team $26 |
| First paying | 500-2K | $250-600 | + Neon Launch $19 |
| Validated | 2K-10K | $600-2K | + Grafana paid + read replicas |
| Scale | 10K-100K | $3K-12K | All paid tiers |
Liveblocks decision tree: apply to the startup program in week 2 → if approved, use free credits; if denied → Pro $99/mo OR Path A self-host DO (~$10-30/mo) OR limit beta to 50 users.
Part IV. Unified MVP Scope
| Category | MVP |
|---|---|
| Vision | Cloud design editor on top of existing repos, multi-team collab |
| UI | Sidebar (components/workflow), workflow-canvas read-only, screen edit-mode, MD smart-editor with fenced blocks + versions list, multi-tab coordination, share-link toggle, WCAG 2.1 AA compliance |
| Render | Edit-mode schematic + preview iframe-bridge + custom-entry adapter + arno init + tiered preview + drift CI (5 states) + bundleHosting (gh-pages/github-packages with backend proxy) + periodic visibility check |
| Storage | Postgres (full schema §I.2.1, versions trigger-purge, GDPR retention §0.4). Liveblocks 3 channels + dual rooms. Cloudflare KV (10 caches incl push_lock mutex). R2 (bundles + backups) |
| Sync | GitHub App + token refresh middleware + ETag + rate-limiter + webhook dedup + fan-out + DAG ordering + backoff + persisted queue + reconciliation + manual refresh + pre-push HEAD check with mutex + GraphQL bulk initial scan |
| Collab | Yjs (workflow) + Liveblocks 3 channels, presence, cursors, activity-based snapshot, canonical serializer (fenced), server-side mutation validation, subscription lifecycle. MD-edit REST + versions (no silent drop) + same-session fast-forward + multi-tab coordination + conflict UI + real-time awareness |
| Auth | Auth.js v5 JWT mode (HS256) + Bearer Authorization + KV revocation + versioned JWT_SECRET (256-bit) + three OAuth Apps + Lucia fallback ready |
| Identity | UUID frontmatter + UUID props (fenced v1) + backfill CLI + collision detection + cross-project Redis lock + arno-check Action |
| Workflow | Screens + edges, action='navigate', edit FROM screen |
| Sharing | Private/unlisted + token, project & screen scope, anonymous viewer without Liveblocks (server snapshot + Redis) |
| Author gate | Pre-edit impact (this-project), gated by sync completion |
| Onboarding | Split flow + per-attempt session + lifecycle states + email reminders |
| Limits | Byte-primary tracking + tiered warnings (80%) + hard cap (100%) |
| Observability | OTel SDK + Sentry + Grafana Cloud free + structured logs + metrics + traces + invariant probes (tiered) + 3-tier alerting + healthchecks + customer debug dashboard |
| Tech | TypeScript + Hono + Next.js + Cloudflare Workers Paid + Pages + KV + R2 + Queues + Cron + Neon Postgres + Drizzle (dual driver) + Auth.js + Liveblocks + Resend + Turborepo + GitHub Actions + Terraform |
| Operational | Multi-owner Cloudflare account + separate registrar + DNS TTL 300s + disaster recovery + secrets backup + 90-day rotation |
| Error contract | Unified ApiError shape, HTTP status mapping (§III.2.2) |
| Crypto | HS256 JWT, SHA-256 content/webhooks, HMAC-SHA-256 hashing with pepper (§0.5) |
| Compliance | GDPR retention matrix §0.4, accessibility WCAG 2.1 AA §I.4 |
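The Limits row (byte-primary tracking, warning at 80%, hard cap at 100%) can be sketched as a pair of pure functions. The function and type names are hypothetical. Treating usage at exactly 100% as blocked is an assumption, because the row does not say whether the cap is inclusive.

```typescript
type UsageLevel = "ok" | "warning" | "blocked";

const WARN_PERCENT = 80; // warning threshold from the Limits row

// Classifies current usage. Integer math avoids floating-point edge cases at the thresholds.
function classifyUsage(usedBytes: number, capBytes: number): UsageLevel {
  if (capBytes <= 0) throw new RangeError("capBytes must be positive");
  if (usedBytes >= capBytes) return "blocked";
  if (usedBytes * 100 >= capBytes * WARN_PERCENT) return "warning";
  return "ok";
}

// A write is allowed only if it keeps total usage within the cap.
function canWrite(usedBytes: number, writeBytes: number, capBytes: number): boolean {
  return usedBytes + writeBytes <= capBytes;
}
```

Checking `canWrite` before accepting a payload keeps the hard cap from being exceeded by a single large write, and `classifyUsage` drives the warning UI.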
Part V. Unified Parking Lot
| 📌 | Topic | Return trigger |
|---|---|---|
| 📌 | Screen composition structure (tree, slots, depth, overrides, allowlist) | During screen edit-mode development |
| 📌 | Sharing model details (subgraph scope, require_login UI, public discovery, comments) | During share feature development |
| 📌 | Backend API binding (edge.action=apiCall, mock, conditional, OpenAPI) | After MVP validation |
| 📌 | Component author gate full (staged rollout, visual regression) | V2 after pilot |
| 📌 | Workflow-canvas layout rules | Waiting on the rules |
| 📌 | Project lifecycle / retention details | During settings / delete development |
| 📌 | Notifications (multi-team events) | After core MVP |
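| ✅ | Small-company path / ARNO Studio | Unparked in v1.3: URL-import onboarding (docs/url_import_spec.md) |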
| 📌 | GitLab / Bitbucket providers | After GitHub validation |
| 📌 | Multi-repo per project | If the business case is critical |
| 📌 | ARNO-side 3-way merge UI | GitHub-fallback friction becomes critical |
| 📌 | Yjs MD co-edit (vs current REST+versions) | Demand for real-time MD spec collab |
| 📌 | Real-time drift detection | CI-based latency becomes a problem |
| 📌 | Cross-project impact index | Single-project impact is insufficient |
| 📌 | Multi Y.Doc permission split | Path-allowlist enforcement is insufficient |
| 📌 | Customer-hosted CDN (S3, etc.) | On request |
| 📌 | ARNO-managed S3 для private bundle | Enterprise alternative |
| 📌 | On-premise / enterprise hosting | Enterprise demand |
| 📌 | MD version history fancy UI | Post-MVP |
| 📌 | OTel Collector + tail sampling | Traces approach 40GB/mo |
| 📌 | Microservice extraction | Monolith blocks team velocity |
| 📌 | Multi-region deployment | Latency complaints |
| 📌 | Read replicas Postgres | Read load >70% capacity |
| 📌 | Postgres sharding | Single instance write limit |
| 📌 | Self-hosted Yjs (Path A or B) | Liveblocks bill >$200/mo OR enterprise on-prem |
| 📌 | Mobile apps | Customer demand |
| 📌 | Docs site (Docusaurus/Mintlify) | User-facing docs >10 pages |
| 📌 | Status page (Statuspage.io) | After first customer-facing incident |
| 📌 | SOC 2 compliance audit | Enterprise customer requires |
| 📌 | Bug bounty program | After SOC 2 maturity |
| 📌 | Open source release | Business decision |
| 📌 | i18n implementation | Second language demanded |
| 📌 | CLI binary distribution | Non-Node users complain |
| 📌 | Render adapter pen test | Before enterprise customers |
| 📌 | Email auto-failover | After first email outage |
| 📌 | Yjs anonymization в backups | Enterprise / longer retention |
| 📌 | Access/refresh JWT split | Post-MVP optimization |
| 📌 | Canvas library alternatives (vs react-flow) | If bundle impact too large |
| 📌 | Full a11y screen reader optimization for workflow canvas | Post-MVP |
| 📌 | High-contrast theme | Post-MVP |
| ⏸️ | OTel SDK version policy | Document during implementation |
Part VI. Open Questions / Week 1 Prototyping
Must verify empirically (week 1):
- Auth.js v5 on Cloudflare Pages Edge Runtime + Drizzle adapter + GitHub provider: full sign-in/sign-out/refresh. Fallback: Lucia.
- @hono/trpc-server adapter production-ready: batching, error handling. Fallback: direct Hono routes OR Fastify migration.
- Liveblocks Yjs Storage REST API access: /v2/rooms/{roomId}/storage returns usable Y.Doc state. Fallback tiers: webhook-driven OR client-side periodic export.
- Bundle size measurement: actual size with all deps. Tree-shake if >4MB.
- Workflow-canvas layout rules (waiting on the user)
Operational (parallel, not a blocker):
- Runbook authoring (18+ runbooks)
- k6 load test scenarios
- .env.example covering all env vars
- Terraform initial setup
- Liveblocks startup program application (week 2)
- Domain registration + DNS TTL 300s
- Three GitHub OAuth Apps creation
- Sentry projects (FE+BE separate)
- Cloudflare multi-owner setup
- Healthchecks.io org-level accounts
- 2 password manager backups
Part VII. Next Steps + Launch Readiness
VII.1 Sequence
Full execution plan: see Implementation_Workflow.md (16 phases with demo-driven milestones, acceptance criteria per phase, throwaway-or-keep notes).
Short sequence:
- Week 1: Empirical prototyping (3 critical verifications + bundle measurement), partially covered by Phase 9 (Auth.js Edge), Phase 8 (tRPC+Hono), Phase 11 (Liveblocks REST)
- Week 2: Operational setup (parallel: Liveblocks application, Cloudflare account, domain, OAuth Apps, vendors)
- Week 3+: Implementation phases per Implementation_Workflow.md
- Implementation phases (rough order):
- Foundation: monorepo, CI/CD, basic infrastructure
- Auth: Auth.js + JWT + Drizzle adapter
- Data layer: Drizzle schema, migrations
- GitHub integration: GitProvider, App, webhook handler
- Liveblocks integration: workflow Y.Doc, broadcasts
- MD editor: smart editor, versions, conflict UI
- Workflow canvas: screens, edges, interactivity (accessibility built-in)
- Render adapter: edit mode → preview mode, bundle proxy
- Sharing: viewer mode (no Liveblocks), tokens
- Onboarding: wizard, lifecycle, reminders
- Observability: OTel, Sentry, Grafana, probes
- CLI: arno init, backfill-ids, lint, device flow
- Pre-launch: load testing, runbooks, status page, a11y audit
VII.2 Launch Readiness Checklist
Code:
- All MVP scope §IV implemented
- Test coverage critical paths >90%, new code >80%
- axe-core CI passing on all pages
- Lighthouse Accessibility score >90 per page
- Lighthouse Performance score >85 (editor route)
- Bundle size <4MB (Workers Paid limit with buffer)
- All P0 audit fixes applied across 3 specs
Verification:
- Week 1 prototype results: Auth.js Edge, tRPC+Hono, Liveblocks REST, bundle — all confirmed
- Load test passes — 100K simulated users
- Invariant probes running and not triggering false positives
- Drift detection working on all components
- Disaster recovery playbook tested
Compliance:
- ToS published (auto-generated template + custom for ARNO)
- Privacy Policy published (covers GDPR retention §0.4)
- Cookie consent banner implemented
- DPAs signed with all vendors (Cloudflare, Neon, Liveblocks, Sentry, Grafana, Resend)
- GDPR data export endpoint built
- GDPR data delete endpoint built (anonymization per §0.4)
- Accessibility statement published
Operations:
- Multi-owner Cloudflare account configured (2+ admins, hardware key 2FA)
- DNS records с TTL 300s
- All secrets backed up in 2 password managers
- All runbooks for PAGE alerts written
- On-call rotation defined
- Status page operational (manual MVP OK)
- Healthchecks.io configured (main + probe heartbeat + critical crons)
- 90-day secret rotation reminders scheduled
Customer-facing:
- First alpha customer onboarded successfully
- Support email / channel operational
- Bug report mechanism (Sentry user feedback)
Spec maintenance:
- Master spec persisted to files ✅ (done)
- arno-runbooks repo created
- ADRs initialized
- Decision log started
All checkboxes are required for public launch. A soft launch (private beta) may skip some (status page, public docs).
Master spec status: Implementation-ready after week 1 prototyping verification. Source of truth: this document. Individual specs (ARNO v6, Observability v3, Tech Stack v2) are preserved in chat history for historical trace.