soroush.asadi 52be5be93f feat(node-agent): production ops kit — Windows service + WireGuard mesh
config:
- LoadEnvFile(): reads agent.env beside the exe (or $AGENT_ENV_FILE) before env,
  so the sc.exe service needs no per-service environment plumbing; real env wins

deploy/ (new):
- build-windows.ps1     cross-compile → dist\ + stage the deploy kit
- agent.env.example     fully documented config template
- install-service.ps1   register as auto-start Windows service (native sc.exe),
                        crash-restart 3×/5s, no NSSM dependency
- uninstall-service.ps1 stop + remove
- wireguard-node.conf.template + setup-wireguard.ps1  node dials out only, no
                        public IP / inbound rules; tunnel installed as boot service
- README.md             full control-plane + node walkthrough, ops table, troubleshooting

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-05 12:20:48 +03:30

# FlatRender Node Agent — Deployment
This folder turns a Windows machine with After Effects into a FlatRender render
node: connected to the control plane over an encrypted WireGuard mesh, running
the agent as an auto-restarting Windows service.
```
deploy/
├── build-windows.ps1             Cross-compile the agent → dist\ (run on your dev box)
├── agent.env.example             Agent config template → copy to agent.env
├── install-service.ps1           Register the agent as a Windows service (sc.exe)
├── uninstall-service.ps1         Remove the service
├── wireguard-node.conf.template  WireGuard client config → fill in → wg-flatrender.conf
└── setup-wireguard.ps1           Install + start the WireGuard tunnel as a boot service
```
The node only ever dials **out** to the control plane. It needs **no public IP**
and **no inbound firewall rules** — home ADSL, CGNAT, or any datacenter all work.
---
## Architecture
```
                       WireGuard mesh 10.66.0.0/24
┌─────────────────────────────┐           ┌──────────────────────────────┐
│ Control plane 10.66.0.1     │◄─────────►│ Render node 10.66.0.11       │
│ ─ gateway :8088             │ encrypted │ ─ wireguard tunnel (svc)     │
│ ─ render-svc (orchestrator) │  tunnel   │ ─ node-agent (svc)           │
│ ─ MinIO (templates/exports) │           │ ─ After Effects + aerender   │
└─────────────────────────────┘           └──────────────────────────────┘
```
The agent: claims a job → downloads the `.aep` bundle from MinIO → binds user
customisations (JSX) → renders with `aerender.exe` → uploads the MP4 → reports
complete. It heartbeats every 5 s and streams live preview frames while rendering.
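The lifecycle above can be sketched in Go roughly as follows. This is an illustration only, not the agent's actual code: `Step`, `runJob`, `heartbeat`, and `errRender` are hypothetical names, and the step bodies are stubs. Only the step order and the 5 s heartbeat interval come from the description above.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// Step is one hypothetical stage of a render job.
type Step struct {
	Name string
	Run  func() error
}

// errRender is a stand-in failure used by the demo in main.
var errRender = errors.New("aerender exited with an error")

// runJob runs the steps in order and stops at the first failure.
// It returns the names of the steps that completed.
func runJob(steps []Step) ([]string, error) {
	var done []string
	for _, s := range steps {
		if err := s.Run(); err != nil {
			return done, fmt.Errorf("%s: %w", s.Name, err)
		}
		done = append(done, s.Name)
	}
	return done, nil
}

// heartbeat calls beat once per interval until stop is closed.
func heartbeat(interval time.Duration, stop <-chan struct{}, beat func()) {
	t := time.NewTicker(interval)
	defer t.Stop()
	for {
		select {
		case <-stop:
			return
		case <-t.C:
			beat()
		}
	}
}

func main() {
	stop := make(chan struct{})
	defer close(stop)
	go heartbeat(5*time.Second, stop, func() { fmt.Println("heartbeat") })

	ok := func() error { return nil }
	steps := []Step{
		{"claim", ok}, {"download", ok}, {"bind", ok},
		{"render", func() error { return errRender }}, // simulate a failed render
		{"upload", ok}, {"complete", ok},
	}
	done, err := runJob(steps)
	fmt.Println("completed:", done, "error:", err)
}
```

Running the heartbeat in its own goroutine keeps it independent of how long a render takes, which is one plausible way to keep a node visible to the control plane during long jobs.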
---
## 1. Control plane: one-time WireGuard server setup
On the Linux host that runs the V2 stack (gateway + MinIO):
```bash
# Install
sudo apt install -y wireguard
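# Generate the server keypair (umask keeps the private key unreadable by other users)
(umask 077; wg genkey | tee server.key | wg pubkey > server.pub)
# /etc/wireguard/wg0.conf (written via sudo tee, because a plain redirect runs without root)
sudo tee /etc/wireguard/wg0.conf >/dev/null <<'EOF'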
[Interface]
Address = 10.66.0.1/24
ListenPort = 51820
PrivateKey = <contents of server.key>
# One [Peer] block per render node (append as you add nodes):
# [Peer]
# PublicKey = <node-1 public key>
# AllowedIPs = 10.66.0.11/32
EOF
sudo systemctl enable --now wg-quick@wg0
sudo wg show # prints the server public key for the node config
```
Open UDP **51820** to the internet (the only inbound port the control plane needs
for the mesh). The gateway (:8088) and MinIO stay bound to the WG interface — they
are never exposed publicly.
> Each time you add a node, append its `[Peer]` block and run `sudo bash -c 'wg syncconf wg0 <(wg-quick strip wg0)'`.
---
## 2. Build the agent (on your dev machine)
```powershell
# Requires Go 1.25+. Produces dist\flatrender-node-agent.exe + the deploy kit.
cd services\node-agent\deploy
.\build-windows.ps1
```
Copy the whole `dist\` folder to each render node (e.g. `C:\flatrender\`).
---
## 3. Render node: WireGuard
```powershell
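# On the node, generate its keypair (WireGuard GUI → Add Tunnel → it shows the keys,
# or use the bundled wg.exe). Keep both keys: private goes in the node config, public goes on the server.
$priv = wg genkey; $pub = $priv | wg pubkey
"NODE_PRIVATE_KEY = $priv"; "node public key  = $pub"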
```
1. Copy `wireguard-node.conf.template` → `wg-flatrender.conf`.
2. Fill the four placeholders:
   - `NODE_PRIVATE_KEY`: this node's private key
   - `NODE_NUMBER`: unique mesh octet (11, 12, 13, …) → `Address = 10.66.0.11/32`
   - `SERVER_PUBLIC_KEY`: from `wg show` on the control plane
   - `SERVER_PUBLIC_ENDPOINT`: the control plane's public IP/host
3. Add this node's **public** key + `AllowedIPs = 10.66.0.11/32` as a `[Peer]` on the server (step 1).
4. Install the tunnel (elevated PowerShell):
```powershell
.\setup-wireguard.ps1 -ConfigPath .\wg-flatrender.conf
ping 10.66.0.1 # should reply over the tunnel
```
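The mapping from `NODE_NUMBER` to a mesh address and a server-side `[Peer]` block can be sketched in Go as below. The helper names `nodeAddress` and `peerBlock` are hypothetical, and the accepted range 2..254 is an assumption (`.1` is the control plane; `.0` and `.255` are the network and broadcast addresses of the /24).

```go
package main

import (
	"fmt"
	"net/netip"
)

// meshPrefix is the mesh network shown in the architecture diagram.
var meshPrefix = netip.MustParsePrefix("10.66.0.0/24")

// nodeAddress maps a NODE_NUMBER to that node's /32 mesh address.
// The 2..254 range is an assumption of this sketch.
func nodeAddress(nodeNumber int) (netip.Prefix, error) {
	if nodeNumber < 2 || nodeNumber > 254 {
		return netip.Prefix{}, fmt.Errorf("node number %d outside 2..254", nodeNumber)
	}
	base := meshPrefix.Addr().As4()
	base[3] = byte(nodeNumber) // replace the last octet
	return netip.PrefixFrom(netip.AddrFrom4(base), 32), nil
}

// peerBlock renders the [Peer] stanza to append to the server's wg0.conf.
func peerBlock(nodePublicKey string, nodeNumber int) (string, error) {
	addr, err := nodeAddress(nodeNumber)
	if err != nil {
		return "", err
	}
	return fmt.Sprintf("[Peer]\nPublicKey = %s\nAllowedIPs = %s\n", nodePublicKey, addr), nil
}

func main() {
	block, err := peerBlock("<node-1 public key>", 11)
	if err != nil {
		panic(err)
	}
	fmt.Print(block)
}
```

Using a /32 in `AllowedIPs` means the server accepts traffic from that peer only when it comes from its own mesh address, so a duplicated `NODE_NUMBER` shows up as two peers claiming the same address.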
---
## 4. Render node: agent service
```powershell
# Configure
Copy-Item agent.env.example agent.env
notepad agent.env # set NODE_ID, NODE_HMAC_SECRET, ORCHESTRATOR_URL=http://10.66.0.1:8088, AE_PATH
```
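Per the commit note at the top of this page, the agent reads `agent.env` at startup and variables already set in the real environment win. A minimal Go sketch of that precedence rule follows. It assumes plain `KEY=VALUE` lines and `#` comments; `parseEnvFile` is a hypothetical name, and the agent's actual `LoadEnvFile()` may parse differently.

```go
package main

import (
	"fmt"
	"os"
	"strings"
)

// parseEnvFile returns the KEY=VALUE pairs in content that should be applied,
// skipping every key for which isSet reports true.
func parseEnvFile(content string, isSet func(key string) bool) map[string]string {
	out := make(map[string]string)
	for _, line := range strings.Split(content, "\n") {
		line = strings.TrimSpace(line) // also drops the \r of CRLF files
		if line == "" || strings.HasPrefix(line, "#") {
			continue // blank line or comment
		}
		key, val, ok := strings.Cut(line, "=")
		if !ok {
			continue // not KEY=VALUE; ignored in this sketch
		}
		key = strings.TrimSpace(key)
		if key == "" || isSet(key) {
			continue // the real environment wins
		}
		out[key] = strings.TrimSpace(val)
	}
	return out
}

func main() {
	data, err := os.ReadFile("agent.env")
	if err != nil {
		fmt.Fprintln(os.Stderr, "agent.env not loaded:", err)
		return
	}
	inEnv := func(k string) bool { _, ok := os.LookupEnv(k); return ok }
	for k, v := range parseEnvFile(string(data), inEnv) {
		os.Setenv(k, v)
	}
	fmt.Println("ORCHESTRATOR_URL =", os.Getenv("ORCHESTRATOR_URL"))
}
```

Keeping the parser separate from `os.Setenv` makes the precedence rule easy to check without touching the process environment.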
Get `NODE_ID` by creating the node in the admin panel (**/admin/nodes → add**), or
via `POST /v1/nodes`. `NODE_HMAC_SECRET` must equal the render-svc value in `.env.v2`.
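To show why the two secrets must match, here is a generic shared-secret signing sketch in Go. It assumes HMAC-SHA256 over the request body with a hex-encoded signature; this README does not specify the actual scheme (what is signed, which header carries it), so `sign` and `verify` are illustrative names only.

```go
package main

import (
	"crypto/hmac"
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// sign returns the hex-encoded HMAC-SHA256 of body under secret.
func sign(secret, body string) string {
	mac := hmac.New(sha256.New, []byte(secret))
	mac.Write([]byte(body))
	return hex.EncodeToString(mac.Sum(nil))
}

// verify recomputes the MAC with the receiver's secret and compares
// it to the presented signature in constant time.
func verify(secret, body, signature string) bool {
	presented, err := hex.DecodeString(signature)
	if err != nil {
		return false
	}
	mac := hmac.New(sha256.New, []byte(secret))
	mac.Write([]byte(body))
	return hmac.Equal(mac.Sum(nil), presented)
}

func main() {
	body := `{"node_id":"example"}`
	sig := sign("shared-secret", body)
	fmt.Println(verify("shared-secret", body, sig)) // true: both sides hold the same secret
	fmt.Println(verify("other-secret", body, sig))  // false: mismatched secret
}
```

A single differing character in the secret makes every signature fail, which is why a mismatch shows up as a rejection on every request rather than as an intermittent error.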
```powershell
# Install + start the service (elevated)
.\install-service.ps1
# Verify
curl http://localhost:7777/health
Get-Service FlatRenderNodeAgent
```
The node now appears **Ready** in `/admin/nodes` and starts claiming jobs.
---
## Operations
| Task | Command |
|---|---|
| Health | `curl http://localhost:7777/health` |
| Service status | `Get-Service FlatRenderNodeAgent` |
| Restart | `Restart-Service FlatRenderNodeAgent` |
| Stop | `Stop-Service FlatRenderNodeAgent` |
| Update binary | Stop service → replace exe → Start service |
| Change config | Edit `agent.env` → `Restart-Service FlatRenderNodeAgent` |
| Remove | `.\uninstall-service.ps1` |
| Tunnel status | `& 'C:\Program Files\WireGuard\wg.exe' show` |
The service auto-restarts on crash (3× at 5 s intervals) and auto-starts at boot.
WireGuard comes up first, so the agent always has a path to the gateway.
### Mock mode
Leave `AE_PATH` empty in `agent.env` to run the **mock renderer** — useful to smoke-test
the claim → download → upload → complete pipeline on a node without an AE licence.
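One way the switch between the real and mock renderer could look in Go is sketched below. The `Renderer` interface, both implementations, and `pickRenderer` are hypothetical; only the rule "empty `AE_PATH` selects the mock renderer" comes from the text above.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// Renderer is a hypothetical interface over the render backends.
type Renderer interface {
	Name() string
	Render(projectPath, outputPath string) error
}

// aeRenderer would launch aerender.exe; the launch is omitted in this sketch.
type aeRenderer struct{ aerenderPath string }

func (aeRenderer) Name() string { return "aerender" }
func (aeRenderer) Render(projectPath, outputPath string) error {
	return errors.New("launching aerender is not implemented in this sketch")
}

// mockRenderer pretends to render so the rest of the pipeline can be exercised.
type mockRenderer struct{}

func (mockRenderer) Name() string { return "mock" }
func (mockRenderer) Render(projectPath, outputPath string) error {
	return nil // nothing is rendered
}

// pickRenderer selects the mock renderer when AE_PATH is empty.
func pickRenderer(aePath string) Renderer {
	if strings.TrimSpace(aePath) == "" {
		return mockRenderer{}
	}
	return aeRenderer{aerenderPath: aePath}
}

func main() {
	fmt.Println(pickRenderer("").Name())                         // mock
	fmt.Println(pickRenderer(`C:\path\to\aerender.exe`).Name()) // aerender
}
```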
### Troubleshooting
- **Node never goes Ready**: tunnel down (check with `wg.exe show`) or wrong `ORCHESTRATOR_URL`.
- **401 / signature errors**: `NODE_HMAC_SECRET` mismatch with render-svc.
- **Jobs claim but fail at download**: MinIO not reachable over the mesh — confirm MinIO
is bound to `10.66.0.1` and the presigned host in render-svc points at the mesh IP.
- **AE hangs**: a stale `aerender.exe`/`AfterFX.exe` — the agent force-kills these before
each launch; confirm AE opens manually and isn't stuck on a "Crash Repair" dialog.
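
A wrong `ORCHESTRATOR_URL` can be caught before the service starts. The Go sketch below checks that the URL uses http(s) and points at a literal IP inside the mesh range from the architecture diagram. `checkOrchestratorURL` is a hypothetical helper, and rejecting hostnames is a choice made for this sketch, not a documented agent requirement.

```go
package main

import (
	"fmt"
	"net/netip"
	"net/url"
)

// mesh is the WireGuard range from the architecture diagram.
var mesh = netip.MustParsePrefix("10.66.0.0/24")

// checkOrchestratorURL returns nil when raw is an http(s) URL whose host
// is a literal IP address inside the mesh.
func checkOrchestratorURL(raw string) error {
	u, err := url.Parse(raw)
	if err != nil {
		return fmt.Errorf("parse %q: %w", raw, err)
	}
	if u.Scheme != "http" && u.Scheme != "https" {
		return fmt.Errorf("unexpected scheme %q", u.Scheme)
	}
	addr, err := netip.ParseAddr(u.Hostname())
	if err != nil {
		return fmt.Errorf("host %q is not a literal IP: %w", u.Hostname(), err)
	}
	if !mesh.Contains(addr) {
		return fmt.Errorf("%s is outside the mesh %s", addr, mesh)
	}
	return nil
}

func main() {
	for _, raw := range []string{
		"http://10.66.0.1:8088",    // the gateway's mesh address
		"http://192.168.1.10:8088", // a LAN address, not on the mesh
	} {
		fmt.Println(raw, "->", checkOrchestratorURL(raw))
	}
}
```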