Teamup/docs/V1_BUILD_PLAN.md

# TeamUp.AI V1 — Build Plan

The narrow wedge: **AI Product Owner + AI QA, on one team, through the board and review, inside AliaSaaS.** Build in order; each milestone is shippable. The point of V1 is to measure **human edit distance** on PO and QA work — instrument it from M1.

**Before M1:** the stack is locked (see *Tech stack & bill of materials* below). Stand up the repo — one **.NET 10** solution with two entrypoints (**web/API** + **worker**) sharing the domain-module projects, **PostgreSQL 17+ + pgvector**, EF Core migrations, and a React/Vite SPA built into the web project's `wwwroot` — plus one-command local dev (`docker compose`: app + worker + postgres) and CI. No feature code yet: just the skeleton and the project layout that enforces module boundaries.

---

## Tech stack & bill of materials (locked)

**Backend.** .NET 10 (LTS), ASP.NET Core Minimal APIs (endpoints grouped per module). One solution, two Generic-Host entrypoints — `web` and `worker` — sharing the domain-module projects. Boundaries enforced as separate projects with interface-only references (no cross-module table access).

**Data & persistence.** PostgreSQL 17+ with pgvector · EF Core 10 + Npgsql · `Pgvector.EntityFrameworkCore` for vector columns/queries · EF Core migrations.

**Agent-run queue (M4).** A domain-owned `jobs` table drained with `SELECT … FOR UPDATE SKIP LOCKED` by a worker `BackgroundService` — the run lifecycle (queued → running → output → review) is domain state, kept explicit. *(Alternative if outbox/messaging ergonomics are wanted later: Wolverine on Postgres. Hangfire/Quartz only for M6's scheduled triggers.)*

**AI layer — thin adapters (M3–M4).** `Microsoft.Extensions.AI` (`IChatClient` / `IEmbeddingGenerator`) as the provider-agnostic seam, with thin per-provider HTTP adapters behind it · `Microsoft.Extensions.Http.Resilience` (Polly) for the per-seat fallback/retry chain · air-gapped embeddings via `SmartComponents.LocalEmbeddings` or raw `Microsoft.ML.OnnxRuntime` (MiniLM/bge, CPU-only), switching to a provider's embedding API when BYOK keys are present.

**Cross-cutting.** Auth/RBAC — ASP.NET Core Identity + JWT (OpenIddict later if a full OAuth server is needed) · BYOK at rest — AES-GCM with a deployment master key; keys owner-only, server-side, never returned to a client · Validation — FluentValidation · Mapping — Mapperly (source-gen) · Resilience — Polly · Observability — OpenTelemetry + Serilog (carries the edit-distance metric from M1).

**Testing & the golden-tested-skills rule (M2).** xUnit · Testcontainers (real Postgres) · **Verify** for snapshot/golden tests of skills and prompt outputs.

**Frontend.** React SPA — Vite + TypeScript, built into the web project's `wwwroot` (single deployable). React Router · TanStack Query (server state) · Zustand (client state) · shadcn/ui + Tailwind · **React Flow (xyflow)** for the live org chart · **dnd-kit** for the board · React Hook Form + Zod · Recharts/Tremor for the M6 analytics. Typed API client generated from ASP.NET's OpenAPI (orval / openapi-typescript) into TanStack Query hooks — end-to-end types. *(Next.js is reserved for the separate public marketing site, not the product.)*

**Dev & deploy.** One Docker image run as web or worker via entrypoint, + Postgres; one-command `docker compose` for local dev; Kubernetes for prod; air-gappable as a single unit.

---

## M1 — Org, board, access & cartable

**Goal:** the skeleton — people, permissions, and a working board with the three seat states. No AI yet.

**Tasks**
- Entities: `Member`, `Membership` (scope + role), `Team`, `Seat` (state: human/open/ai), `Task` (type, status, assignee = member|agent, parent, provenance), `AuditEntry`.
- Roles & **permission enforcement** middleware — a check on every mutating action at the relevant scope (Owner / Team owner / Member / Viewer).
- Invitation flow (email → join → land in cartable).
- The **board** UI: columns backlog → in progress → in review → done; create/move/assign tasks (human assignees for now).
- The **cartable** as a derived view (tasks assigned to me, sent-backs, mentions; Approvals section stubbed for owners).
- Edit-distance instrumentation **stubbed in** (the data path exists, even with no AI output yet).
- Audit log writing on key actions.

**Acceptance:** a CEO can invite a member, assign them a role on a team, both see the board scoped to their permissions, tasks move across columns, and each person sees their own cartable. A member cannot perform owner-only actions (verified).

---

## M2 — Skill registry

**Goal:** skills flow from Git into a queryable index, with the first PO/QA atoms.

**Tasks**
- `GitProvider` interface; Gitea read adapter; webhook → sync worker.
- Parse `SKILL.md` (frontmatter + body) → `Skill` rows in Postgres (incl. `visibility`, `min_tier` fields — hooks only).
- pgvector index over skills for matching.
- Eval harness: run a skill's golden tests; report pass/fail + edit distance; **block publish on failure**.
- Author the four V1 atoms in Git: `spec-writing`, `story-breakdown`, `test-plan-generation`, `diff-review` — each with frontmatter (roles, I/O, risk-tagged actions, context) and golden tests.

**Acceptance:** pushing a `SKILL.md` to Gitea indexes it within seconds; the four atoms appear, queryable by role; their golden tests run and pass.

---

## M3 — Seat config + BYOK

**Goal:** configure an AI seat and connect a model — securely.

**Tasks**
- `Agent` entity (skills[], autonomy, api_config_id, docs[]) bound to a seat; flip a seat open → AI.
- Seat configurator UI: pick skills (+ versions), set autonomy dial, attach docs/repo context, choose model config.
- `ApiConfig` (BYOK): name, provider, model, **encrypted** key. **Owner-only** create/view; team owners assign from a list and never see the key; keys never returned to the client after save.
- Model adapter interface + adapters for the providers in use (HTTP); per-seat **fallback** config.

**Acceptance:** an owner adds a `Vertex-Pro` config (key stored encrypted, not retrievable); a team owner configures Aria (PO) with skills, gated autonomy, docs, and that config — without ever seeing the key; a test call succeeds.

---

## M4 — Assembler + worker

**Goal:** a task becomes an agent run becomes a parsed output.

**Tasks**
- Job queue: a Postgres `jobs` table drained with `FOR UPDATE SKIP LOCKED` by a worker `BackgroundService`; enqueue an `AgentRun` on trigger (task assigned / chat).
- Worker pulls a job and runs the **assembler**: house-style + identity/overrides + matched atoms (by task type / I/O) + permitted docs & code (RAG via pgvector) + working memory → prompt, with **prompt caching**.
- Call the seat's model (BYOK, with fallback); store the full run + trace on `AgentRun`.
- Parse output into an **action + risk tag** (PO: spec + proposed child stories; QA: test plan from a diff).

**Acceptance:** assigning a feature task to Aria produces a spec and a set of proposed child stories as a parsed result, with the assembled context and reasoning captured on the run. Nothing executes yet (gate is M5).

---

## M5 — Action gate + review inbox

**Goal:** governance closes the loop; edit distance is captured for real.

**Tasks**
- Action gate: compare seat autonomy (draft/gated/autonomous) to action risk (read/draft/publish/destructive) → execute or **hold**. **Destructive always holds for a human.**
- `ReviewItem` for held actions; the **review inbox** UI (= the Approvals section of an owner's cartable): preview, **expandable reasoning trace**, and **approve / edit-and-approve / send back**.
- On execute: perform the internal action (create the child tasks; write the spec/test artifact onto the board); record **edit distance** from edit-and-approve; write audit entry.

**Acceptance:** Aria (gated) proposes a spec → it waits in the owner's review inbox with its trace → owner edits and approves → the spec lands and four child story tasks appear on the board → edit distance is recorded.

---

## M6 — Working memory + the first trigger + analytics

**Goal:** the two-role loop runs end to end, and the bet is measurable.

**Tasks**
- `MemoryEntry` (team working memory): write decisions/approvals/corrections on approval; read at assembly (pgvector match).
- The single **event trigger**: a task hitting *done* in the team emits a handoff that creates a QA task for Quill (with provenance); Quill reads the diff and drafts a test plan that waits in review.
- **Analytics** view: approval rate, **human edit distance** (per agent and trend), tasks done. Optional: per-run token cost (informational).
- Loop/storm guardrail: rate-limit triggers; no self-cascading.

**Acceptance:** a dev marks a story done → Quill wakes, drafts a test plan → it waits in review → approve → analytics show edit distance and approval rate for Aria and Quill across the sprint. **This is the proof of the bet.**

---

## Definition of done for V1

The PO and QA loops run inside AliaSaaS on one real product, governed through the board and review inbox, on AliaSaaS's own model keys — and the analytics show **human edit distance low and falling** over a sprint or two. That result (or its absence) is the decision V1 exists to produce.

## Explicitly NOT in V1
Divisions UI & other roles · multiple products · multi-tenant billing · per-agent MCP & Git write-back · episodic/semantic memory · the gap finder · skill studio / template builder / tier enforcement / AI skill-suggestion (data hooks only) · marketplace · the custom TeamUp model · SSO/SCIM · event mesh beyond the single PO→QA trigger. All are accommodated by the architecture; none is built now.

## Always-on engineering rules (see CLAUDE.md §8)
Modular monolith (no cross-module table access) · web off the model path · permission check on every mutation · BYOK keys owner-only & server-side · retrieved content is data not instructions · destructive always needs a human · skills are Git-sourced and golden-tested · instrument edit distance from day one.