feat(node-agent): production ops kit — Windows service + WireGuard mesh

config:
- LoadEnvFile(): reads agent.env beside the exe (or $AGENT_ENV_FILE) before env,
  so the sc.exe service needs no per-service environment plumbing; real env wins

deploy/ (new):
- build-windows.ps1     cross-compile → dist\ + stage the deploy kit
- agent.env.example     fully documented config template
- install-service.ps1   register as auto-start Windows service (native sc.exe),
                        crash-restart 3×/5s, no NSSM dependency
- uninstall-service.ps1 stop + remove
- wireguard-node.conf.template + setup-wireguard.ps1  node dials out only, no
                        public IP / inbound rules; tunnel installed as boot service
- README.md             full control-plane + node walkthrough, ops table, troubleshooting

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
soroush.asadi
2026-06-05 12:20:48 +03:30
parent 67060c73b2
commit 52be5be93f
8 changed files with 522 additions and 0 deletions
@@ -0,0 +1,161 @@
# FlatRender Node Agent — Deployment
This folder turns a Windows machine with After Effects into a FlatRender render
node: connected to the control plane over an encrypted WireGuard mesh, running
the agent as an auto-restarting Windows service.
```
deploy/
├── build-windows.ps1 Cross-compile the agent → dist\ (run on your dev box)
├── agent.env.example Agent config template → copy to agent.env
├── install-service.ps1 Register the agent as a Windows service (sc.exe)
├── uninstall-service.ps1 Remove the service
├── wireguard-node.conf.template WireGuard client config → fill in → wg-flatrender.conf
└── setup-wireguard.ps1 Install + start the WireGuard tunnel as a boot service
```
The node only ever dials **out** to the control plane. It needs **no public IP**
and **no inbound firewall rules** — home ADSL, CGNAT, or any datacenter all work.
---
## Architecture
```
                      WireGuard mesh 10.66.0.0/24
┌─────────────────────────────┐         ┌──────────────────────────────┐
│ Control plane 10.66.0.1     │◄───────►│ Render node 10.66.0.11       │
│ ─ gateway :8088             │encrypted│ ─ wireguard tunnel (svc)     │
│ ─ render-svc (orchestrator) │ tunnel  │ ─ node-agent (svc)           │
│ ─ MinIO (templates/exports) │         │ ─ After Effects + aerender   │
└─────────────────────────────┘         └──────────────────────────────┘
```
The agent: claims a job → downloads the `.aep` bundle from MinIO → binds user
customisations (JSX) → renders with `aerender.exe` → uploads the MP4 → reports
complete. It heartbeats every 5 s and streams live preview frames while rendering.
---
## 1. Control plane: one-time WireGuard server setup
On the Linux host that runs the V2 stack (gateway + MinIO):
```bash
# Install
sudo apt install -y wireguard
# Generate the server keypair
wg genkey | tee server.key | wg pubkey > server.pub
# /etc/wireguard/wg0.conf
# (written via sudo tee, because /etc/wireguard is only writable by root)
sudo tee /etc/wireguard/wg0.conf >/dev/null <<'EOF'
[Interface]
Address = 10.66.0.1/24
ListenPort = 51820
PrivateKey = <contents of server.key>
# One [Peer] block per render node (append as you add nodes):
# [Peer]
# PublicKey = <node-1 public key>
# AllowedIPs = 10.66.0.11/32
EOF
sudo systemctl enable --now wg-quick@wg0
sudo wg show # prints the server public key for the node config
```
Open UDP **51820** to the internet (the only inbound port the control plane needs
for the mesh). The gateway (:8088) and MinIO stay bound to the WG interface — they
are never exposed publicly.
> Each time you add a node, append its `[Peer]` block and reload with `sudo bash -c 'wg syncconf wg0 <(wg-quick strip wg0)'` (the process substitution must run as root to read `wg0.conf`).
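
Because every node gets the same three-line `[Peer]` block with only the key and the last octet changing, generating it is easy to script. The sketch below is a hypothetical helper (`peerBlock` is not part of the deploy kit). It assumes node numbers live in 2..254, since `.1` is the control plane and `.0`/`.255` are the network and broadcast addresses of the /24.

```go
package main

import (
	"fmt"
)

// peerBlock returns the wg0.conf [Peer] stanza for one render node.
// nodeNumber is the last octet of the node's mesh address (11, 12, 13, ...).
func peerBlock(publicKey string, nodeNumber int) (string, error) {
	if publicKey == "" {
		return "", fmt.Errorf("public key is empty")
	}
	if nodeNumber < 2 || nodeNumber > 254 {
		return "", fmt.Errorf("node number %d outside 2..254", nodeNumber)
	}
	return fmt.Sprintf(
		"[Peer]\nPublicKey = %s\nAllowedIPs = 10.66.0.%d/32\n",
		publicKey, nodeNumber,
	), nil
}

func main() {
	// The key below is a placeholder, not a real WireGuard key.
	block, err := peerBlock("<node-1 public key>", 11)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Print(block)
}
```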
---
## 2. Build the agent (on your dev machine)
```powershell
# Requires Go 1.25+. Produces dist\flatrender-node-agent.exe + the deploy kit.
cd services\node-agent\deploy
.\build-windows.ps1
```
Copy the whole `dist\` folder to each render node (e.g. `C:\flatrender\`).
---
## 3. Render node: WireGuard
```powershell
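# On the node, generate its keypair: WireGuard GUI → Add Tunnel shows both keys,
# or use the bundled wg.exe. Save the private key first; `wg genkey | wg pubkey` alone discards it.
$priv = wg genkey; $priv     # private key → NODE_PRIVATE_KEY
$priv | wg pubkey            # public key → [Peer] block on the server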
```
1. Copy `wireguard-node.conf.template` → `wg-flatrender.conf`.
2. Fill the four placeholders:
- `NODE_PRIVATE_KEY` — this node's private key
- `NODE_NUMBER` — unique mesh octet (11, 12, 13, …) → `Address = 10.66.0.11/32`
- `SERVER_PUBLIC_KEY` — from `wg show` on the control plane
- `SERVER_PUBLIC_ENDPOINT` — the control plane's public IP/host
3. Add this node's **public** key + `AllowedIPs = 10.66.0.11/32` as a `[Peer]` on the server (step 1).
4. Install the tunnel (elevated PowerShell):
```powershell
.\setup-wireguard.ps1 -ConfigPath .\wg-flatrender.conf
ping 10.66.0.1 # should reply over the tunnel
```
---
## 4. Render node: agent service
```powershell
# Configure
Copy-Item agent.env.example agent.env
notepad agent.env # set NODE_ID, NODE_HMAC_SECRET, ORCHESTRATOR_URL=http://10.66.0.1:8088, AE_PATH
```
Get `NODE_ID` by creating the node in the admin panel (**/admin/nodes → add**), or
via `POST /v1/nodes`. `NODE_HMAC_SECRET` must equal the render-svc value in `.env.v2`.
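
To see why the two secrets must match, here is a sketch of shared-secret request signing with HMAC-SHA256. The wire format is an assumption: this README does not say which bytes the agent signs or how the signature is transmitted, so `sign`, `verify`, and the sample body are illustrative only.

```go
package main

import (
	"crypto/hmac"
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// sign returns the hex-encoded HMAC-SHA256 of body under secret.
func sign(secret, body []byte) string {
	mac := hmac.New(sha256.New, secret)
	mac.Write(body)
	return hex.EncodeToString(mac.Sum(nil))
}

// verify reports whether sig matches body under secret, comparing in
// constant time to avoid leaking how many bytes matched.
func verify(secret, body []byte, sig string) bool {
	got, err := hex.DecodeString(sig)
	if err != nil {
		return false
	}
	mac := hmac.New(sha256.New, secret)
	mac.Write(body)
	return hmac.Equal(mac.Sum(nil), got)
}

func main() {
	body := []byte(`{"node_id":"example"}`) // hypothetical payload
	sig := sign([]byte("shared-secret"), body)

	fmt.Println(verify([]byte("shared-secret"), body, sig)) // true: same secret on both sides
	fmt.Println(verify([]byte("other-secret"), body, sig))  // false: the mismatch case
}
```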
```powershell
# Install + start the service (elevated)
.\install-service.ps1
# Verify
curl http://localhost:7777/health
Get-Service FlatRenderNodeAgent
```
The node now appears **Ready** in `/admin/nodes` and starts claiming jobs.
---
## Operations
| Task | Command |
|---|---|
| Health | `curl http://localhost:7777/health` |
| Service status | `Get-Service FlatRenderNodeAgent` |
| Restart | `Restart-Service FlatRenderNodeAgent` |
| Stop | `Stop-Service FlatRenderNodeAgent` |
| Update binary | Stop service → replace exe → Start service |
| Change config | Edit `agent.env` → `Restart-Service FlatRenderNodeAgent` |
| Remove | `.\uninstall-service.ps1` |
| Tunnel status | `& 'C:\Program Files\WireGuard\wg.exe' show` (elevated) |
The service auto-restarts on crash (3× at 5 s intervals) and auto-starts at boot.
WireGuard comes up first, so the agent always has a path to the gateway.
### Mock mode
Leave `AE_PATH` empty in `agent.env` to run the **mock renderer** — useful to smoke-test
the claim → download → upload → complete pipeline on a node without an AE licence.
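
The switch between real and mock rendering can be pictured as a single decision on `AE_PATH`. The following is a hypothetical sketch of that selection, not the agent's code: the `renderer` interface and both implementations are invented for illustration, and the path in `main` is only an example value.

```go
package main

import (
	"fmt"
	"strings"
)

// renderer is a hypothetical abstraction over the two render backends.
type renderer interface{ Name() string }

type mockRenderer struct{}

func (mockRenderer) Name() string { return "mock" }

type aeRenderer struct{ path string }

func (aeRenderer) Name() string { return "aerender" }

// pickRenderer returns the mock backend when aePath is empty or blank,
// and the After Effects backend otherwise.
func pickRenderer(aePath string) renderer {
	if strings.TrimSpace(aePath) == "" {
		return mockRenderer{}
	}
	return aeRenderer{path: aePath}
}

func main() {
	fmt.Println(pickRenderer("").Name())
	fmt.Println(pickRenderer(`C:\example\aerender.exe`).Name()) // example path only
}
```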
### Troubleshooting
- **Node never goes Ready**: tunnel down (check with `wg.exe show`) or wrong `ORCHESTRATOR_URL`.
- **401 / signature errors**: `NODE_HMAC_SECRET` mismatch with render-svc.
- **Jobs claim but fail at download**: MinIO not reachable over the mesh — confirm MinIO
is bound to `10.66.0.1` and the presigned host in render-svc points at the mesh IP.
- **AE hangs**: a stale `aerender.exe`/`AfterFX.exe` — the agent force-kills these before
each launch; confirm AE opens manually and isn't stuck on a "Crash Repair" dialog.
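
For the download failure case, a quick way to confirm that a presigned URL points at the mesh is to test its host against `10.66.0.0/24`. This is a sketch using the standard `net/netip` package. `inMesh` is a hypothetical helper, and the URLs in `main` (including port 9000) are made-up examples.

```go
package main

import (
	"fmt"
	"net/netip"
	"net/url"
)

// mesh is the WireGuard subnet used throughout this README.
var mesh = netip.MustParsePrefix("10.66.0.0/24")

// inMesh reports whether rawURL's host is an IP literal inside the mesh.
// Hostnames are rejected with an error because they would need a DNS lookup.
func inMesh(rawURL string) (bool, error) {
	u, err := url.Parse(rawURL)
	if err != nil {
		return false, err
	}
	addr, err := netip.ParseAddr(u.Hostname())
	if err != nil {
		return false, fmt.Errorf("host %q is not an IP literal: %w", u.Hostname(), err)
	}
	return mesh.Contains(addr), nil
}

func main() {
	for _, u := range []string{
		"http://10.66.0.1:9000/templates/a.zip",   // example: mesh address
		"http://203.0.113.7:9000/templates/a.zip", // example: public address
	} {
		ok, err := inMesh(u)
		fmt.Println(u, ok, err)
	}
}
```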