Teamup/docs/V1_BUILD_PLAN.md
soroush.asadi 36fe158b43 Scaffold the Before-M1 repo skeleton
Stand up the modular-monolith skeleton per docs/V1_BUILD_PLAN.md: one .NET 10
solution with web + worker hosts sharing seven interface-bounded module projects,
PostgreSQL 17 + pgvector via EF Core 10, a React 19 + Vite SPA built into wwwroot,
and Docker Compose for one-command local dev. Skeleton only — no feature code.

Architecture
- One project per module (OrgBoard, Identity, Skills, Assembler, Governance,
  Memory, Integrations); each is its own assembly so non-public types (entities,
  DbContext) are invisible across modules at compile time.
- TeamUp.Bootstrap is the only library that references all modules; both hosts
  reference only Bootstrap. SharedKernel/Infrastructure never reference modules.
- IModule seam: Register(...) runs in both hosts; MapEndpoints(...) only in web.
- PlatformDbContext owns the pgvector extension + the seven module schemas
  (InitialPlatform migration); MigrationRunner applies it then any module context.
- One image, two roles selected by RUN_MODE at the Docker entrypoint.

Verified
- dotnet build green (nullable + warnings-as-errors).
- ArchitectureTests 8/8 — reflection-based boundary rules (no module -> module,
  -> Infrastructure, -> Bootstrap, or -> host references).
- IntegrationTests 10/10 — Testcontainers boots the host against real pgvector:
  migration applies, vector extension + 7 schemas exist, /health 200, every
  /api/<module>/ping 200, /openapi/v1.json served.
- client builds clean (Vite 6 — pinned for Node 22.3.0; Vite 8 needs Node >=22.12).

Packages and base images route through the Nexus mirror (mirror.soroushasadi.com),
reachable from Iran when nuget.org / Docker Hub / MCR are not. CI is intentionally
deferred to a later session.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-09 06:41:28 +03:30


TeamUp.AI V1 — Build Plan

The narrow wedge: AI Product Owner + AI QA, on one team, through the board and review, inside AliaSaaS. Build in order; each milestone is shippable. The point of V1 is to measure human edit distance on PO and QA work — instrument it from M1.

Before M1: the stack is locked (see Tech stack & bill of materials below). Stand up the repo: one .NET 10 solution with two entrypoints (web/API + worker) sharing the domain-module projects, PostgreSQL 17+ with pgvector, EF Core migrations, and a React/Vite SPA built into the web project's wwwroot. Add one-command local dev (docker compose: app + worker + postgres) and CI. No feature code yet: just the skeleton and the project layout that enforces module boundaries.


Tech stack & bill of materials (locked)

Backend. .NET 10 (LTS), ASP.NET Core Minimal APIs (endpoints grouped per module). One solution, two Generic-Host entrypoints — web and worker — sharing the domain-module projects. Boundaries enforced as separate projects with interface-only references (no cross-module table access).

Data & persistence. PostgreSQL 17+ with pgvector · EF Core 10 + Npgsql · Pgvector.EntityFrameworkCore for vector columns/queries · EF Core migrations.

Agent-run queue (M4). A domain-owned jobs table drained with SELECT … FOR UPDATE SKIP LOCKED by a worker BackgroundService — the run lifecycle (queued → running → output → review) is domain state, kept explicit. (Alternative if outbox/messaging ergonomics are wanted later: Wolverine on Postgres. Hangfire/Quartz only for M6's scheduled triggers.)
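
One way to keep the run lifecycle explicit is a small transition table. The sketch below is illustrative only: it is written in TypeScript for brevity (the real implementation would live in the .NET domain module), the names `RunState`, `NEXT`, and `advance` are hypothetical, and it covers only the four states named above. Failure or cancellation states are not specified in this plan and are left out.

```typescript
// States taken from the plan: queued → running → output → review.
type RunState = "queued" | "running" | "output" | "review";

// Each state has at most one successor; `null` marks the end of this sketch's lifecycle.
const NEXT: Record<RunState, RunState | null> = {
  queued: "running",
  running: "output",
  output: "review",
  review: null, // the review decision itself is handled by the M5 gate, not here
};

// Returns the next state, or throws if the run cannot advance any further.
function advance(current: RunState): RunState {
  const next = NEXT[current];
  if (next === null) {
    throw new Error(`run cannot advance past "${current}"`);
  }
  return next;
}
```

Keeping the table in one place means an illegal jump (for example queued straight to review) cannot be expressed by calling `advance`.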

AI layer: thin adapters (M3–M4). Microsoft.Extensions.AI (IChatClient / IEmbeddingGenerator) as the provider-agnostic seam, with thin per-provider HTTP adapters behind it · Microsoft.Extensions.Http.Resilience (Polly) for the per-seat fallback/retry chain · air-gapped embeddings via SmartComponents.LocalEmbeddings or raw Microsoft.ML.OnnxRuntime (MiniLM/bge, CPU-only), switching to a provider's embedding API when BYOK keys are present.

Cross-cutting. Auth/RBAC — ASP.NET Core Identity + JWT (OpenIddict later if a full OAuth server is needed) · BYOK at rest — AES-GCM with a deployment master key; keys owner-only, server-side, never returned to a client · Validation — FluentValidation · Mapping — Mapperly (source-gen) · Resilience — Polly · Observability — OpenTelemetry + Serilog (carries the edit-distance metric from M1).

Testing & the golden-tested-skills rule (M2). xUnit · Testcontainers (real Postgres) · Verify for snapshot/golden tests of skills and prompt outputs.

Frontend. React SPA — Vite + TypeScript, built into the web project's wwwroot (single deployable). React Router · TanStack Query (server state) · Zustand (client state) · shadcn/ui + Tailwind · React Flow (xyflow) for the live org chart · dnd-kit for the board · React Hook Form + Zod · Recharts/Tremor for the M6 analytics. Typed API client generated from ASP.NET's OpenAPI (orval / openapi-typescript) into TanStack Query hooks — end-to-end types. (Next.js is reserved for the separate public marketing site, not the product.)

Dev & deploy. One Docker image run as web or worker via entrypoint, + Postgres; one-command docker compose for local dev; Kubernetes for prod; air-gappable as a single unit.


M1 — Org, board, access & cartable

Goal: the skeleton — people, permissions, and a working board with the three seat states. No AI yet.

Tasks

  • Entities: Member, Membership (scope + role), Team, Seat (state: human/open/ai), Task (type, status, assignee = member|agent, parent, provenance), AuditEntry.
  • Roles & permission enforcement middleware — a check on every mutating action at the relevant scope (Owner / Team owner / Member / Viewer).
  • Invitation flow (email → join → land in cartable).
  • The board UI: columns backlog → in progress → in review → done; create/move/assign tasks (human assignees for now).
  • The cartable as a derived view (tasks assigned to me, sent-backs, mentions; Approvals section stubbed for owners).
  • Edit-distance instrumentation stubbed in (the data path exists, even with no AI output yet).
  • Audit log writing on key actions.
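
The plan does not define how edit distance is computed. As a placeholder for the stubbed data path, the sketch below assumes character-level Levenshtein distance between the AI draft and the human-approved text, normalized by the longer length. Both that choice and the function names are assumptions, not something this plan specifies.

```typescript
// Classic Levenshtein distance with a rolling row.
// Operates on UTF-16 code units, which is adequate for a sketch.
function levenshtein(a: string, b: string): number {
  let prev: number[] = Array.from({ length: b.length + 1 }, (_, j) => j);
  for (let i = 1; i <= a.length; i++) {
    const curr: number[] = [i];
    for (let j = 1; j <= b.length; j++) {
      const cost = a[i - 1] === b[j - 1] ? 0 : 1;
      curr.push(Math.min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost));
    }
    prev = curr;
  }
  return prev[b.length];
}

// ASSUMPTION: normalize to [0, 1] so runs of different sizes are comparable.
// 0 means the human approved the draft unchanged; 1 means a full rewrite.
function normalizedEditDistance(aiDraft: string, approved: string): number {
  const longest = Math.max(aiDraft.length, approved.length);
  return longest === 0 ? 0 : levenshtein(aiDraft, approved) / longest;
}
```

In M1 there is no AI output yet, so the stub could record the metric with both arguments equal, which exercises the data path and yields 0.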

Acceptance: a CEO can invite a member, assign them a role on a team, both see the board scoped to their permissions, tasks move across columns, and each person sees their own cartable. A member cannot perform owner-only actions (verified).
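
The owner-only check in this acceptance criterion could look roughly like the following. This is a sketch under assumptions: the four roles come from the plan, but the linear ranking, the rule that an org-wide membership covers every team, and all type and function names are illustrative.

```typescript
type Role = "viewer" | "member" | "teamOwner" | "owner";
type Scope = { kind: "org" } | { kind: "team"; teamId: string };

interface Membership {
  memberId: string;
  role: Role;
  scope: Scope;
}

// ASSUMPTION: roles form a simple ladder; the plan does not define the ordering.
const RANK: Record<Role, number> = { viewer: 0, member: 1, teamOwner: 2, owner: 3 };

// ASSUMPTION: an org-scoped membership applies to every team in the org.
function covers(granted: Scope, target: Scope): boolean {
  if (granted.kind === "org") return true;
  return target.kind === "team" && granted.teamId === target.teamId;
}

// True if the member holds at least `required` at a scope covering `target`.
function canMutate(
  memberships: Membership[],
  memberId: string,
  required: Role,
  target: Scope,
): boolean {
  return memberships.some(
    (m) =>
      m.memberId === memberId &&
      covers(m.scope, target) &&
      RANK[m.role] >= RANK[required],
  );
}
```

A middleware would call `canMutate` before every mutating action and reject the request when it returns false.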


M2 — Skill registry

Goal: skills flow from Git into a queryable index, with the first PO/QA atoms.

Tasks

  • GitProvider interface; Gitea read adapter; webhook → sync worker.
  • Parse SKILL.md (frontmatter + body) → Skill rows in Postgres (incl. visibility, min_tier fields — hooks only).
  • pgvector index over skills for matching.
  • Eval harness: run a skill's golden tests; report pass/fail + edit distance; block publish on failure.
  • Author the four V1 atoms in Git: spec-writing, story-breakdown, test-plan-generation, diff-review — each with frontmatter (roles, I/O, risk-tagged actions, context) and golden tests.
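
To illustrate the SKILL.md parsing step, here is a minimal frontmatter splitter. It is a sketch, not the planned parser: it handles only flat `key: value` lines and inline lists such as `[po, qa]`, where a real implementation would more likely use a YAML library. The `name` key in the usage below is hypothetical; `roles`, `visibility`, and `min_tier` are fields mentioned in this plan.

```typescript
interface ParsedSkill {
  meta: Record<string, string | string[]>;
  body: string;
}

// Splits a SKILL.md source into frontmatter metadata and the Markdown body.
function parseSkillMd(source: string): ParsedSkill {
  const match = /^---\r?\n([\s\S]*?)\r?\n---\r?\n?([\s\S]*)$/.exec(source);
  if (!match) {
    throw new Error("SKILL.md is missing a frontmatter block");
  }
  const meta: Record<string, string | string[]> = {};
  for (const line of match[1].split(/\r?\n/)) {
    const trimmed = line.trim();
    if (trimmed === "" || trimmed.startsWith("#")) continue; // skip blanks and comments
    const idx = line.indexOf(":");
    if (idx === -1) {
      throw new Error(`unparseable frontmatter line: ${line}`);
    }
    const key = line.slice(0, idx).trim();
    const raw = line.slice(idx + 1).trim();
    // Inline list syntax "[a, b]" becomes a string array; everything else stays a string.
    meta[key] =
      raw.startsWith("[") && raw.endsWith("]")
        ? raw.slice(1, -1).split(",").map((s) => s.trim()).filter((s) => s.length > 0)
        : raw;
  }
  return { meta, body: match[2].trim() };
}
```

For example, a file whose frontmatter contains `roles: [po, qa]` yields `meta.roles` as the array `["po", "qa"]`, which is what a query by role would index.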

Acceptance: pushing a SKILL.md to Gitea indexes it within seconds; the four atoms appear, queryable by role; their golden tests run and pass.


M3 — Seat config + BYOK

Goal: configure an AI seat and connect a model — securely.

Tasks

  • Agent entity (skills[], autonomy, api_config_id, docs[]) bound to a seat; flip a seat open → AI.
  • Seat configurator UI: pick skills (+ versions), set autonomy dial, attach docs/repo context, choose model config.
  • ApiConfig (BYOK): name, provider, model, encrypted key. Owner-only create/view; team owners assign from a list and never see the key; keys never returned to the client after save.
  • Model adapter interface + adapters for the providers in use (HTTP); per-seat fallback config.
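
The encrypted-key requirement can be illustrated with AES-256-GCM. The sketch below uses Node's built-in `node:crypto` module purely for illustration; the planned backend is .NET, and the `SealedKey` shape and function names are hypothetical. How the deployment master key is loaded and rotated is outside this sketch.

```typescript
import { createCipheriv, createDecipheriv, randomBytes } from "node:crypto";

// What gets stored at rest; all fields are base64. The plaintext key is never stored.
interface SealedKey {
  iv: string;
  tag: string;
  ciphertext: string;
}

function sealApiKey(plaintext: string, masterKey: Buffer): SealedKey {
  if (masterKey.length !== 32) {
    throw new Error("master key must be 32 bytes for AES-256");
  }
  const iv = randomBytes(12); // fresh 96-bit nonce for every encryption
  const cipher = createCipheriv("aes-256-gcm", masterKey, iv);
  const ciphertext = Buffer.concat([cipher.update(plaintext, "utf8"), cipher.final()]);
  return {
    iv: iv.toString("base64"),
    tag: cipher.getAuthTag().toString("base64"),
    ciphertext: ciphertext.toString("base64"),
  };
}

// Server-side only: used when calling the provider, never to answer a client request.
function openApiKey(sealed: SealedKey, masterKey: Buffer): string {
  const decipher = createDecipheriv("aes-256-gcm", masterKey, Buffer.from(sealed.iv, "base64"));
  decipher.setAuthTag(Buffer.from(sealed.tag, "base64"));
  const plaintext = Buffer.concat([
    decipher.update(Buffer.from(sealed.ciphertext, "base64")),
    decipher.final(), // throws if the key is wrong or the data was tampered with
  ]);
  return plaintext.toString("utf8");
}
```

Because GCM is authenticated, decrypting with the wrong master key or a modified ciphertext fails loudly instead of returning garbage.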

Acceptance: an owner adds a Vertex-Pro config (key stored encrypted, not retrievable); a team owner configures Aria (PO) with skills, gated autonomy, docs, and that config — without ever seeing the key; a test call succeeds.


M4 — Assembler + worker

Goal: a task becomes an agent run becomes a parsed output.

Tasks

  • Job queue: a Postgres jobs table drained with FOR UPDATE SKIP LOCKED by a worker BackgroundService; enqueue an AgentRun on trigger (task assigned / chat).
  • Worker pulls a job and runs the assembler: house-style + identity/overrides + matched atoms (by task type / I/O) + permitted docs & code (RAG via pgvector) + working memory → prompt, with prompt caching.
  • Call the seat's model (BYOK, with fallback); store the full run + trace on AgentRun.
  • Parse output into an action + risk tag (PO: spec + proposed child stories; QA: test plan from a diff).
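
The job-claim step can be sketched as a single statement that locks and flips one queued row. Everything specific here is an assumption: the table and column names (`jobs`, `status`, `created_at`, `started_at`, `agent_run_id`) are illustrative, and `Queryable` is a hypothetical minimal interface shaped like a typical Postgres client's `query(text, values)`. The planned worker is a .NET BackgroundService, so this TypeScript version only shows the shape of the query.

```typescript
// Hypothetical minimal database interface so the sketch stays self-contained.
interface Queryable {
  query(text: string, values?: unknown[]): Promise<{ rows: Array<Record<string, unknown>> }>;
}

interface ClaimedJob {
  id: string;
  agentRunId: string;
}

// The inner SELECT locks one queued row and skips rows other workers already hold,
// so concurrent workers never block on or double-claim the same job.
const CLAIM_NEXT_JOB_SQL = `
  UPDATE jobs
     SET status = 'running', started_at = now()
   WHERE id = (
     SELECT id
       FROM jobs
      WHERE status = 'queued'
      ORDER BY created_at
      LIMIT 1
      FOR UPDATE SKIP LOCKED
   )
  RETURNING id, agent_run_id`;

// Returns the claimed job, or null when the queue is empty (or fully locked).
async function claimNextJob(db: Queryable): Promise<ClaimedJob | null> {
  const { rows } = await db.query(CLAIM_NEXT_JOB_SQL);
  if (rows.length === 0) return null;
  const row = rows[0];
  return { id: String(row["id"]), agentRunId: String(row["agent_run_id"]) };
}
```

A worker loop would call `claimNextJob` repeatedly and back off briefly whenever it returns null.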

Acceptance: assigning a feature task to Aria produces a spec and a set of proposed child stories as a parsed result, with the assembled context and reasoning captured on the run. Nothing executes yet (gate is M5).


M5 — Action gate + review inbox

Goal: governance closes the loop; edit distance is captured for real.

Tasks

  • Action gate: compare seat autonomy (draft/gated/autonomous) to action risk (read/draft/publish/destructive) → execute or hold. Destructive always holds for a human.
  • ReviewItem for held actions; the review inbox UI (= the Approvals section of an owner's cartable): preview, expandable reasoning trace, and approve / edit-and-approve / send back.
  • On execute: perform the internal action (create the child tasks; write the spec/test artifact onto the board); record edit distance from edit-and-approve; write audit entry.
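
The gate comparison can be sketched as a pure function. The autonomy levels, the risk tags, and the rule that destructive actions always hold come from this plan. The exact threshold per autonomy level is an assumption made for illustration (draft executes only reads, gated executes up to drafts, autonomous executes up to publish), as are the names used.

```typescript
type Autonomy = "draft" | "gated" | "autonomous";
type Risk = "read" | "draft" | "publish" | "destructive";
type Decision = "execute" | "hold";

// Risks ordered from least to most dangerous.
const RISK_ORDER: Risk[] = ["read", "draft", "publish", "destructive"];

// ASSUMPTION: the highest risk each autonomy level may execute without review.
const MAX_UNREVIEWED_RISK: Record<Autonomy, Risk> = {
  draft: "read",
  gated: "draft",
  autonomous: "publish",
};

function gate(autonomy: Autonomy, risk: Risk): Decision {
  // From the plan: destructive always holds for a human, whatever the autonomy.
  if (risk === "destructive") return "hold";
  const allowed = RISK_ORDER.indexOf(MAX_UNREVIEWED_RISK[autonomy]);
  return RISK_ORDER.indexOf(risk) <= allowed ? "execute" : "hold";
}
```

Under these assumed thresholds, a gated seat proposing a publish-risk action is held, which matches the acceptance flow where Aria's spec waits in the review inbox.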

Acceptance: Aria (gated) proposes a spec → it waits in the owner's review inbox with its trace → owner edits and approves → the spec lands and four child story tasks appear on the board → edit distance is recorded.


M6 — Working memory + the first trigger + analytics

Goal: the two-role loop runs end to end, and the bet is measurable.

Tasks

  • MemoryEntry (team working memory): write decisions/approvals/corrections on approval; read at assembly (pgvector match).
  • The single event trigger: a task hitting done in the team emits a handoff that creates a QA task for Quill (with provenance); Quill reads the diff and drafts a test plan that waits in review.
  • Analytics view: approval rate, human edit distance (per agent and trend), tasks done. Optional: per-run token cost (informational).
  • Loop/storm guardrail: rate-limit triggers; no self-cascading.
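
The loop/storm guardrail could be approximated by a per-team sliding-window limiter plus a provenance check. This is a sketch under assumptions: it interprets "no self-cascading" as "a task that was itself created by a trigger never fires another trigger", and the event fields, class name, and limits are all hypothetical.

```typescript
interface TriggerEvent {
  teamId: string;
  sourceTaskId: string;
  createdByTrigger: boolean; // provenance: was the source task itself trigger-created?
  atMs: number; // event time in milliseconds
}

class TriggerGuard {
  private readonly maxPerWindow: number;
  private readonly windowMs: number;
  private readonly recent = new Map<string, number[]>();

  constructor(maxPerWindow: number, windowMs: number) {
    this.maxPerWindow = maxPerWindow;
    this.windowMs = windowMs;
  }

  // Returns true if the trigger may fire, and records it; false if it is suppressed.
  allow(event: TriggerEvent): boolean {
    if (event.createdByTrigger) return false; // no self-cascading
    const cutoff = event.atMs - this.windowMs;
    const kept = (this.recent.get(event.teamId) ?? []).filter((t) => t > cutoff);
    if (kept.length >= this.maxPerWindow) {
      this.recent.set(event.teamId, kept);
      return false; // rate limit reached for this team's window
    }
    kept.push(event.atMs);
    this.recent.set(event.teamId, kept);
    return true;
  }
}
```

For instance, `new TriggerGuard(2, 1000)` lets two handoffs through per team per second and suppresses a third until the window slides past the earlier ones.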

Acceptance: a dev marks a story done → Quill wakes, drafts a test plan → it waits in review → approve → analytics show edit distance and approval rate for Aria and Quill across the sprint. This is the proof of the bet.


Definition of done for V1

The PO and QA loops run inside AliaSaaS on one real product, governed through the board and review inbox, on AliaSaaS's own model keys — and the analytics show human edit distance low and falling over a sprint or two. That result (or its absence) is the decision V1 exists to produce.

Explicitly NOT in V1

Divisions UI & other roles · multiple products · multi-tenant billing · per-agent MCP & Git write-back · episodic/semantic memory · the gap finder · skill studio / template builder / tier enforcement / AI skill-suggestion (data hooks only) · marketplace · the custom TeamUp model · SSO/SCIM · event mesh beyond the single PO→QA trigger. All are accommodated by the architecture; none is built now.

Always-on engineering rules (see CLAUDE.md §8)

Modular monolith (no cross-module table access) · web off the model path · permission check on every mutation · BYOK keys owner-only & server-side · retrieved content is data not instructions · destructive always needs a human · skills are Git-sourced and golden-tested · instrument edit distance from day one.