Teamup

soroushdes/Teamup


Commit Graph


Author

SHA1

Message

Date

soroush.asadi

e987e33c0a

M2: eval harness — golden tests gated on edit distance

- SkillEvaluator (internal to Skills): runs each golden test through an ISkillExecutor and
  passes only if normalized edit distance <= threshold (the north-star metric). The executor
  is a stub in M2 (no model runtime); M4's assembler supplies the real one and publishing is
  gated on the report. The indexer's structural gate (roles + >=1 golden test) stands until then.
- InternalsVisibleTo the integration tests so the harness is exercised directly.

Verified: build green; ArchitectureTests 8/8; IntegrationTests 25/25 (+3 eval-harness unit
tests: pass on match, fail on divergence, fail with no golden tests).

2026-06-09 18:42:19 +03:30
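
The gate described in this commit message can be illustrated with a minimal sketch. It is written in Python for brevity and is not the repository's .NET code: `GoldenTest`, `evaluate`, the plain-callable executor (standing in for `ISkillExecutor`), and the default threshold of 0.1 are all hypothetical. The commit does not say how the edit distance is normalized, so dividing the Levenshtein distance by the longer string's length is an assumption.

```python
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass(frozen=True)
class GoldenTest:
    """Hypothetical shape: a skill input and the output expected for it."""
    input: str
    expected: str


def levenshtein(a: str, b: str) -> int:
    """Edit distance counting single-character inserts, deletes, and substitutions."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def normalized_distance(a: str, b: str) -> float:
    """Assumed normalization: distance over the longer length, giving a value in [0, 1]."""
    longest = max(len(a), len(b))
    return 0.0 if longest == 0 else levenshtein(a, b) / longest


def evaluate(tests: Sequence[GoldenTest],
             executor: Callable[[str], str],
             threshold: float = 0.1) -> bool:
    """Pass only if there is at least one golden test and every one is within the threshold."""
    if not tests:
        return False  # an empty suite never passes
    return all(
        normalized_distance(executor(t.input), t.expected) <= threshold
        for t in tests
    )
```

The three cases named under "Verified" map onto this sketch directly: a stub executor that returns the expected text passes, one that returns divergent text fails, and an empty list of golden tests fails regardless of the executor.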

1 commit