feat(node-agent): production ops kit — Windows service + WireGuard mesh

config:
- LoadEnvFile(): reads agent.env beside the exe (or $AGENT_ENV_FILE) before env,
  so the sc.exe service needs no per-service environment plumbing; real env wins

deploy/ (new):
- build-windows.ps1     cross-compile → dist\ + stage the deploy kit
- agent.env.example     fully documented config template
- install-service.ps1   register as auto-start Windows service (native sc.exe),
                        crash-restart 3×/5s, no NSSM dependency
- uninstall-service.ps1 stop + remove
- wireguard-node.conf.template + setup-wireguard.ps1  node dials out only, no
                        public IP / inbound rules; tunnel installed as boot service
- README.md             full control-plane + node walkthrough, ops table, troubleshooting

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
soroush.asadi
2026-06-05 12:20:48 +03:30
parent 67060c73b2
commit 52be5be93f
8 changed files with 522 additions and 0 deletions
@@ -0,0 +1,161 @@
# FlatRender Node Agent — Deployment
This folder turns a Windows machine with After Effects into a FlatRender render
node: connected to the control plane over an encrypted WireGuard mesh, running
the agent as an auto-restarting Windows service.
```
deploy/
├── build-windows.ps1 Cross-compile the agent → dist\ (run on your dev box)
├── agent.env.example Agent config template → copy to agent.env
├── install-service.ps1 Register the agent as a Windows service (sc.exe)
├── uninstall-service.ps1 Remove the service
├── wireguard-node.conf.template WireGuard client config → fill in → wg-flatrender.conf
└── setup-wireguard.ps1 Install + start the WireGuard tunnel as a boot service
```
The node only ever dials **out** to the control plane. It needs **no public IP**
and **no inbound firewall rules** — home ADSL, CGNAT, or any datacenter all work.
---
## Architecture
```
                       WireGuard mesh 10.66.0.0/24
┌─────────────────────────────┐           ┌──────────────────────────────┐
│ Control plane 10.66.0.1     │◄─────────►│ Render node 10.66.0.11       │
│ ─ gateway :8088             │ encrypted │ ─ wireguard tunnel (svc)     │
│ ─ render-svc (orchestrator) │  tunnel   │ ─ node-agent (svc)           │
│ ─ MinIO (templates/exports) │           │ ─ After Effects + aerender   │
└─────────────────────────────┘           └──────────────────────────────┘
```
The agent: claims a job → downloads the `.aep` bundle from MinIO → binds user
customisations (JSX) → renders with `aerender.exe` → uploads the MP4 → reports
complete. It heartbeats every 5 s and streams live preview frames while rendering.
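
The job lifecycle above can be sketched as a linear pipeline in Go. This is only an illustration under assumptions, not the agent's actual code: the `Stage` type, `runJob`, `defaultStages`, and the stub stage bodies are hypothetical names introduced here.

```go
package main

import (
	"errors"
	"fmt"
)

// Stage is a hypothetical named step in the job lifecycle.
type Stage struct {
	Name string
	Run  func(jobID string) error
}

// errRenderFailed is an illustrative error a render stage might return.
var errRenderFailed = errors.New("aerender exited with an error")

// runJob executes the stages in order and stops at the first failure.
// It returns the names of the stages that completed.
func runJob(jobID string, stages []Stage) ([]string, error) {
	done := make([]string, 0, len(stages))
	for _, s := range stages {
		if err := s.Run(jobID); err != nil {
			return done, fmt.Errorf("job %s: stage %q failed: %w", jobID, s.Name, err)
		}
		done = append(done, s.Name)
	}
	return done, nil
}

// defaultStages returns stub stages in the order the README describes.
func defaultStages() []Stage {
	ok := func(string) error { return nil } // stub: always succeeds
	return []Stage{
		{Name: "claim", Run: ok},
		{Name: "download", Run: ok},
		{Name: "bind", Run: ok},
		{Name: "render", Run: ok},
		{Name: "upload", Run: ok},
		{Name: "complete", Run: ok},
	}
}

func main() {
	done, err := runJob("job-1", defaultStages())
	if err != nil {
		fmt.Println("failed:", err)
		return
	}
	fmt.Println("completed stages:", done)
}
```

Stopping at the first failed stage keeps a half-rendered job from being uploaded or reported as complete; how the real agent reports failures is not shown in this commit.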
---
## 1. Control plane: one-time WireGuard server setup
On the Linux host that runs the V2 stack (gateway + MinIO):
```bash
# Install
sudo apt install -y wireguard
# Generate the server keypair
wg genkey | tee server.key | wg pubkey > server.pub
# /etc/wireguard/wg0.conf
# /etc/wireguard is root-owned, so write the file through sudo
sudo tee /etc/wireguard/wg0.conf >/dev/null <<'EOF'
[Interface]
Address = 10.66.0.1/24
ListenPort = 51820
PrivateKey = <contents of server.key>
# One [Peer] block per render node (append as you add nodes):
# [Peer]
# PublicKey = <node-1 public key>
# AllowedIPs = 10.66.0.11/32
EOF
sudo systemctl enable --now wg-quick@wg0
sudo wg show # prints the server public key for the node config
```
Open UDP **51820** to the internet (the only inbound port the control plane needs
for the mesh). The gateway (:8088) and MinIO stay bound to the WG interface — they
are never exposed publicly.
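> Each time you add a node, append its `[Peer]` block, then apply it without restarting the tunnel: `sudo bash -c 'wg syncconf wg0 <(wg-quick strip wg0)'`. The process substitution has to run as root as well, because `wg0.conf` is typically readable only by root.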
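
Generating the per-node `[Peer]` block can be scripted so the mesh octet and `AllowedIPs` never drift apart. A minimal Go sketch, assuming the `10.66.0.0/24` layout above; `peerBlock` and `meshPrefix` are hypothetical names, and the 2..254 range check is an assumption about which octets are usable for nodes.

```go
package main

import (
	"fmt"
)

// meshPrefix is the first three octets of the mesh subnet (10.66.0.0/24).
const meshPrefix = "10.66.0."

// peerBlock renders the [Peer] stanza the server needs for one render node.
// nodeNumber is the node's last octet (11, 12, 13, ...).
func peerBlock(publicKey string, nodeNumber int) (string, error) {
	if publicKey == "" {
		return "", fmt.Errorf("public key is empty")
	}
	// .0 is the network, .1 the control plane, .255 the broadcast address.
	if nodeNumber < 2 || nodeNumber > 254 {
		return "", fmt.Errorf("node number %d is outside 2..254", nodeNumber)
	}
	return fmt.Sprintf("[Peer]\nPublicKey = %s\nAllowedIPs = %s%d/32\n",
		publicKey, meshPrefix, nodeNumber), nil
}

func main() {
	block, err := peerBlock("<node-1 public key>", 11)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Print(block)
}
```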
---
## 2. Build the agent (on your dev machine)
```powershell
# Requires Go 1.25+. Produces dist\flatrender-node-agent.exe + the deploy kit.
cd services\node-agent\deploy
.\build-windows.ps1
```
Copy the whole `dist\` folder to each render node (e.g. `C:\flatrender\`).
---
## 3. Render node: WireGuard
```powershell
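# On the node, generate its keypair (WireGuard GUI → Add Tunnel shows both keys),
# or use the bundled wg.exe, keeping the private key before deriving the public one:
$privateKey = & 'C:\Program Files\WireGuard\wg.exe' genkey
$publicKey = $privateKey | & 'C:\Program Files\WireGuard\wg.exe' pubkey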
```
1. Copy `wireguard-node.conf.template` → `wg-flatrender.conf`.
2. Fill the four placeholders:
- `NODE_PRIVATE_KEY` — this node's private key
- `NODE_NUMBER` — unique mesh octet (11, 12, 13, …) → `Address = 10.66.0.11/32`
- `SERVER_PUBLIC_KEY` — from `wg show` on the control plane
- `SERVER_PUBLIC_ENDPOINT` — the control plane's public IP/host
3. Add this node's **public** key + `AllowedIPs = 10.66.0.11/32` as a `[Peer]` on the server (step 1).
4. Install the tunnel (elevated PowerShell):
```powershell
.\setup-wireguard.ps1 -ConfigPath .\wg-flatrender.conf
ping 10.66.0.1 # should reply over the tunnel
```
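
Filling the four placeholders by hand is easy to get wrong, so the substitution can also be scripted. The Go sketch below is an illustration only: `fillTemplate` is a hypothetical helper, the template string in `main` is an abridged stand-in for `wireguard-node.conf.template`, and the leftover-placeholder check mirrors the `<[A-Z_]+>` test that `setup-wireguard.ps1` performs.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// placeholder matches the <UPPER_CASE> markers used in the template.
var placeholder = regexp.MustCompile(`<[A-Z_]+>`)

// fillTemplate substitutes each <NAME> marker with its value and fails
// if any marker is still present afterwards.
func fillTemplate(tmpl string, values map[string]string) (string, error) {
	out := tmpl
	for name, v := range values {
		out = strings.ReplaceAll(out, "<"+name+">", v)
	}
	if left := placeholder.FindAllString(out, -1); len(left) > 0 {
		return "", fmt.Errorf("unfilled placeholders: %s", strings.Join(left, ", "))
	}
	return out, nil
}

func main() {
	// Abridged template; the real file carries comments and more settings.
	tmpl := "[Interface]\nPrivateKey = <NODE_PRIVATE_KEY>\nAddress = 10.66.0.<NODE_NUMBER>/32\n" +
		"[Peer]\nPublicKey = <SERVER_PUBLIC_KEY>\nEndpoint = <SERVER_PUBLIC_ENDPOINT>:51820\n"

	conf, err := fillTemplate(tmpl, map[string]string{
		"NODE_PRIVATE_KEY":       "example-private-key",
		"NODE_NUMBER":            "11",
		"SERVER_PUBLIC_KEY":      "example-server-public-key",
		"SERVER_PUBLIC_ENDPOINT": "cp.example.com",
	})
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Print(conf)
}
```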
---
## 4. Render node: agent service
```powershell
# Configure
Copy-Item agent.env.example agent.env
notepad agent.env # set NODE_ID, NODE_HMAC_SECRET, ORCHESTRATOR_URL=http://10.66.0.1:8088, AE_PATH
```
Get `NODE_ID` by creating the node in the admin panel (**/admin/nodes → add**), or
via `POST /v1/nodes`. `NODE_HMAC_SECRET` must equal the render-svc value in `.env.v2`.
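
Why the two secrets must match: the agent signs each request with `NODE_HMAC_SECRET` and render-svc recomputes the signature with its own copy. The Go sketch below shows the general shape only. The exact scheme is not shown in this commit, so HMAC-SHA256 over the raw request body with hex encoding is an assumption, and `sign` / `verify` are hypothetical names.

```go
package main

import (
	"crypto/hmac"
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// sign returns a hex-encoded HMAC-SHA256 of body keyed with secret.
// (Assumed scheme; the real X-Node-Signature format may differ.)
func sign(secret, body []byte) string {
	mac := hmac.New(sha256.New, secret)
	mac.Write(body)
	return hex.EncodeToString(mac.Sum(nil))
}

// verify recomputes the MAC and compares it in constant time.
func verify(secret, body []byte, signature string) bool {
	want, err := hex.DecodeString(signature)
	if err != nil {
		return false
	}
	mac := hmac.New(sha256.New, secret)
	mac.Write(body)
	return hmac.Equal(mac.Sum(nil), want)
}

func main() {
	secret := []byte("change-me-to-a-long-random-secret")
	body := []byte(`{"status":"ready"}`)

	sig := sign(secret, body)
	fmt.Println("X-Node-Signature:", sig)
	fmt.Println("same secret verifies:", verify(secret, body, sig))
	fmt.Println("different secret verifies:", verify([]byte("other"), body, sig))
}
```

A mismatch between the two secrets makes every verification fail, which is what surfaces as the 401 / signature errors listed under Troubleshooting.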
```powershell
# Install + start the service (elevated)
.\install-service.ps1
# Verify
curl http://localhost:7777/health
Get-Service FlatRenderNodeAgent
```
The node now appears **Ready** in `/admin/nodes` and starts claiming jobs.
---
## Operations
| Task | Command |
|---|---|
| Health | `curl http://localhost:7777/health` |
| Service status | `Get-Service FlatRenderNodeAgent` |
| Restart | `Restart-Service FlatRenderNodeAgent` |
| Stop | `Stop-Service FlatRenderNodeAgent` |
| Update binary | Stop service → replace exe → Start service |
| Change config | Edit `agent.env` → `Restart-Service FlatRenderNodeAgent` |
| Remove | `.\uninstall-service.ps1` |
| Tunnel status | `& 'C:\Program Files\WireGuard\wg.exe' show` |
The service auto-restarts on crash (3× at 5 s intervals) and auto-starts at boot.
WireGuard comes up first, so the agent always has a path to the gateway.
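
On the "Change config" row: the restart is needed because the agent reads `agent.env` only at startup, and a variable already set in the process environment still wins over the file. The Go sketch below models that precedence with plain maps. `parseEnv` and `effective` are hypothetical helpers written for this illustration; the line handling follows the rules `LoadEnvFile` documents (comments, blank lines, optional quotes).

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// parseEnv reads KEY=VALUE lines, skipping blanks, comments, and lines
// without an "=" sign, and strips surrounding quotes from values.
func parseEnv(content string) map[string]string {
	vars := map[string]string{}
	sc := bufio.NewScanner(strings.NewReader(content))
	for sc.Scan() {
		line := strings.TrimSpace(sc.Text())
		if line == "" || strings.HasPrefix(line, "#") {
			continue
		}
		key, val, ok := strings.Cut(line, "=")
		if !ok {
			continue
		}
		vars[strings.TrimSpace(key)] = strings.Trim(strings.TrimSpace(val), `"'`)
	}
	return vars
}

// effective overlays the real environment on top of the file values,
// so anything set on the process takes precedence.
func effective(file, env map[string]string) map[string]string {
	out := make(map[string]string, len(file)+len(env))
	for k, v := range file {
		out[k] = v
	}
	for k, v := range env {
		out[k] = v
	}
	return out
}

func main() {
	file := parseEnv("# agent.env\nORCHESTRATOR_URL=http://10.66.0.1:8088\nLISTEN_PORT=7777\n")
	env := map[string]string{"LISTEN_PORT": "9000"} // pretend this was set on the process
	for k, v := range effective(file, env) {
		fmt.Printf("%s=%s\n", k, v)
	}
}
```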
### Mock mode
Leave `AE_PATH` empty in `agent.env` to run the **mock renderer** — useful to smoke-test
the claim → download → upload → complete pipeline on a node without an AE licence.
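
The switch between real and mock rendering can be pictured as choosing an implementation of one interface based on `AE_PATH`. This Go sketch is an assumption about the shape, not the agent's code: `Renderer`, `mockRenderer`, `aeRenderer`, and `selectRenderer` are hypothetical, and the `aerender` invocation is reduced to its `-project` flag, whereas a real job would need composition and output settings as well.

```go
package main

import (
	"fmt"
	"os"
	"os/exec"
	"strings"
)

// Renderer is a hypothetical abstraction over "produce an output file".
type Renderer interface {
	Name() string
	Render(projectPath, outputPath string) error
}

// mockRenderer writes a small placeholder instead of invoking After Effects.
type mockRenderer struct{}

func (mockRenderer) Name() string { return "mock" }

func (mockRenderer) Render(projectPath, outputPath string) error {
	return os.WriteFile(outputPath, []byte("mock render of "+projectPath+"\n"), 0o644)
}

// aeRenderer shells out to aerender.exe at the configured path.
type aeRenderer struct{ aerenderPath string }

func (aeRenderer) Name() string { return "aerender" }

func (r aeRenderer) Render(projectPath, outputPath string) error {
	// Simplified: renders the project's own render queue; outputPath is
	// unused here because output settings are omitted from this sketch.
	return exec.Command(r.aerenderPath, "-project", projectPath).Run()
}

// selectRenderer returns the mock when AE_PATH is empty or whitespace.
func selectRenderer(aePath string) Renderer {
	if strings.TrimSpace(aePath) == "" {
		return mockRenderer{}
	}
	return aeRenderer{aerenderPath: aePath}
}

func main() {
	r := selectRenderer(os.Getenv("AE_PATH"))
	fmt.Println("renderer:", r.Name())
}
```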
### Troubleshooting
- **Node never goes Ready**: tunnel down (check with `wg.exe show`) or wrong `ORCHESTRATOR_URL`.
- **401 / signature errors**: `NODE_HMAC_SECRET` mismatch with render-svc.
- **Jobs claim but fail at download**: MinIO not reachable over the mesh — confirm MinIO
is bound to `10.66.0.1` and the presigned host in render-svc points at the mesh IP.
- **AE hangs**: a stale `aerender.exe`/`AfterFX.exe` — the agent force-kills these before
each launch; confirm AE opens manually and isn't stuck on a "Crash Repair" dialog.
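
Several of the cases above come down to whether the node can open a TCP connection to the control plane over the mesh. A small Go probe can separate tunnel problems from application problems. This is a sketch under assumptions: `reachable` is a hypothetical helper, and port 9000 for MinIO is only MinIO's usual default, not something this README states.

```go
package main

import (
	"fmt"
	"net"
	"time"
)

// reachable reports whether a TCP connection to addr succeeds within timeoutMs.
func reachable(addr string, timeoutMs int) bool {
	conn, err := net.DialTimeout("tcp", addr, time.Duration(timeoutMs)*time.Millisecond)
	if err != nil {
		return false
	}
	conn.Close()
	return true
}

func main() {
	targets := []struct{ name, addr string }{
		{"gateway", "10.66.0.1:8088"},
		{"MinIO (assumed port)", "10.66.0.1:9000"},
	}
	for _, t := range targets {
		status := "unreachable"
		if reachable(t.addr, 1000) {
			status = "ok"
		}
		fmt.Printf("%-22s %-16s %s\n", t.name, t.addr, status)
	}
}
```

If the gateway is unreachable the tunnel or `ORCHESTRATOR_URL` is the likely culprit; if only the MinIO target fails, the bind address or presigned host is the better suspect.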
@@ -0,0 +1,35 @@
# FlatRender Node Agent configuration.
# Copy to `agent.env` and place next to flatrender-node-agent.exe.
# The agent reads this file on startup (env vars still override any line here).
# ── Required ─────────────────────────────────────────────────────────────────
# UUID of this node, pre-created in render.render_nodes (admin → /admin/nodes).
NODE_ID=00000000-0000-0000-0000-000000000000
# ── Connectivity ─────────────────────────────────────────────────────────────
# Gateway base URL. Over the WireGuard mesh this is the control-plane's WG IP.
# Example (WG): http://10.66.0.1:8088 Local dev: http://localhost:8088
ORCHESTRATOR_URL=http://10.66.0.1:8088
# Shared secret for the X-Node-Signature header. MUST match NODE_HMAC_SECRET
# in the render-svc environment (.env.v2). Change from the default!
NODE_HMAC_SECRET=change-me-to-a-long-random-secret
# Region label — the orchestrator routes region-preferred jobs here.
NODE_REGION=iran-tehran-1
# ── After Effects ────────────────────────────────────────────────────────────
# Full path to aerender.exe. Leave empty to run the MOCK renderer (no AE needed).
AE_PATH=C:\Program Files\Adobe\Adobe After Effects 2024\Support Files\aerender.exe
AE_VERSION=2024
# ── Local paths / tuning ─────────────────────────────────────────────────────
# Scratch dir for downloaded .aep bundles + render output. Use a fast NVMe drive.
WORK_DIR=C:\flatrender\work
# Health endpoint port (the orchestrator and you can curl http://<node>:7777/health).
LISTEN_PORT=7777
# Loop cadences (seconds).
HEARTBEAT_INTERVAL_SEC=5
POLL_INTERVAL_SEC=3
@@ -0,0 +1,47 @@
<#
.SYNOPSIS
Cross-compile the FlatRender Node Agent to a Windows .exe.
.DESCRIPTION
Produces dist\flatrender-node-agent.exe and stages agent.env.example + the
deploy scripts alongside it, ready to copy to a render node.
Requires Go 1.25+ installed locally (works on Windows, macOS, or Linux).
.EXAMPLE
.\build-windows.ps1
#>
param(
    [string]$OutDir = (Join-Path $PSScriptRoot "dist")
)
$ErrorActionPreference = "Stop"
$agentRoot = (Resolve-Path (Join-Path $PSScriptRoot "..")).Path
New-Item -ItemType Directory -Force -Path $OutDir | Out-Null
$exe = Join-Path $OutDir "flatrender-node-agent.exe"
Write-Host "Building Windows binary from $agentRoot ..."
$env:GOOS = "windows"
$env:GOARCH = "amd64"
$env:CGO_ENABLED = "0"
Push-Location $agentRoot
try {
    & go build -trimpath -ldflags="-s -w" -o $exe ./cmd/agent
    if ($LASTEXITCODE -ne 0) { throw "go build failed ($LASTEXITCODE)" }
} finally {
    Pop-Location
}
# Stage the deploy kit next to the exe
Copy-Item (Join-Path $PSScriptRoot "agent.env.example") $OutDir -Force
Copy-Item (Join-Path $PSScriptRoot "install-service.ps1") $OutDir -Force
Copy-Item (Join-Path $PSScriptRoot "uninstall-service.ps1") $OutDir -Force
Copy-Item (Join-Path $PSScriptRoot "setup-wireguard.ps1") $OutDir -Force
Copy-Item (Join-Path $PSScriptRoot "wireguard-node.conf.template") $OutDir -Force
Copy-Item (Join-Path $PSScriptRoot "README.md") $OutDir -Force
Write-Host ""
Write-Host "✓ Built: $exe" -ForegroundColor Green
Write-Host " Deploy kit staged in: $OutDir"
Write-Host " Copy that folder to each render node, then follow README.md."
@@ -0,0 +1,82 @@
<#
.SYNOPSIS
Install the FlatRender Node Agent as a Windows service (native sc.exe — no NSSM).
.DESCRIPTION
Registers flatrender-node-agent.exe as an auto-start service that survives reboots
and auto-restarts on crash. Configuration is read from `agent.env` placed next to
the exe (see agent.env.example), so no per-service environment plumbing is needed.
Run from an ELEVATED PowerShell prompt (Administrator).
.PARAMETER ExePath
Path to flatrender-node-agent.exe. Defaults to the exe beside this script.
.PARAMETER ServiceName
Windows service name. Default: FlatRenderNodeAgent.
.EXAMPLE
.\install-service.ps1
.\install-service.ps1 -ExePath C:\flatrender\flatrender-node-agent.exe
#>
param(
    [string]$ExePath = (Join-Path $PSScriptRoot "flatrender-node-agent.exe"),
    [string]$ServiceName = "FlatRenderNodeAgent",
    [string]$DisplayName = "FlatRender Node Agent"
)
$ErrorActionPreference = "Stop"
# ── Elevation check ───────────────────────────────────────────────────────────
$principal = New-Object Security.Principal.WindowsPrincipal([Security.Principal.WindowsIdentity]::GetCurrent())
if (-not $principal.IsInRole([Security.Principal.WindowsBuiltInRole]::Administrator)) {
    Write-Error "This script must be run as Administrator. Right-click PowerShell → Run as administrator."
    exit 1
}
# ── Validate exe + config ─────────────────────────────────────────────────────
if (-not (Test-Path $ExePath)) {
    Write-Error "Executable not found: $ExePath`nBuild it first (see README) and copy it here."
    exit 1
}
$ExePath = (Resolve-Path $ExePath).Path
$envFile = Join-Path (Split-Path $ExePath) "agent.env"
if (-not (Test-Path $envFile)) {
    Write-Warning "No agent.env found next to the exe at: $envFile"
    Write-Warning "Copy agent.env.example → agent.env and fill in NODE_ID / NODE_HMAC_SECRET before the service will work."
}
# ── Remove any existing instance ──────────────────────────────────────────────
$existing = Get-Service -Name $ServiceName -ErrorAction SilentlyContinue
if ($existing) {
    Write-Host "Service '$ServiceName' already exists; stopping and removing it first..."
    if ($existing.Status -ne 'Stopped') { Stop-Service -Name $ServiceName -Force -ErrorAction SilentlyContinue }
    & sc.exe delete $ServiceName | Out-Null
    Start-Sleep -Seconds 2
}
# ── Create the service ────────────────────────────────────────────────────────
# binPath must quote the exe path (spaces). start=auto → launches at boot.
Write-Host "Creating service '$ServiceName'..."
& sc.exe create $ServiceName binPath= "`"$ExePath`"" start= auto DisplayName= "$DisplayName" | Out-Null
& sc.exe description $ServiceName "FlatRender render-node agent: claims and renders After Effects jobs." | Out-Null
# ── Crash recovery: restart after 5s, three times, reset window 1 day ─────────
& sc.exe failure $ServiceName reset= 86400 actions= restart/5000/restart/5000/restart/5000 | Out-Null
# ── Start it ──────────────────────────────────────────────────────────────────
Write-Host "Starting service..."
Start-Service -Name $ServiceName
Start-Sleep -Seconds 2
$svc = Get-Service -Name $ServiceName
Write-Host ""
Write-Host "✓ Installed and $($svc.Status)." -ForegroundColor Green
Write-Host " Service : $ServiceName"
Write-Host " Exe : $ExePath"
Write-Host " Config : $envFile"
Write-Host ""
Write-Host " Health : curl http://localhost:7777/health"
Write-Host " Logs : Get-WinEvent -ProviderName 'Service Control Manager' | Select-Object -First 5"
Write-Host " Stop : Stop-Service $ServiceName"
Write-Host " Remove : .\uninstall-service.ps1"
@@ -0,0 +1,88 @@
<#
.SYNOPSIS
Install WireGuard and bring up the FlatRender mesh tunnel as a persistent service.
.DESCRIPTION
- Verifies WireGuard is installed (downloads the installer if missing and -Download is set).
- Installs the given .conf as a permanent WireGuard tunnel service (survives reboot).
- The tunnel auto-connects on boot, BEFORE the node-agent service starts, so the
agent can always reach the gateway over 10.66.0.0/24.
Run ELEVATED (Administrator).
.PARAMETER ConfigPath
Path to the filled-in WireGuard config (from wireguard-node.conf.template).
Default: wg-flatrender.conf beside this script.
.PARAMETER Download
If set and WireGuard is not installed, download and silently run the installer.
.EXAMPLE
.\setup-wireguard.ps1 -ConfigPath .\wg-flatrender.conf
#>
param(
    [string]$ConfigPath = (Join-Path $PSScriptRoot "wg-flatrender.conf"),
    [switch]$Download
)
$ErrorActionPreference = "Stop"
$principal = New-Object Security.Principal.WindowsPrincipal([Security.Principal.WindowsIdentity]::GetCurrent())
if (-not $principal.IsInRole([Security.Principal.WindowsBuiltInRole]::Administrator)) {
    Write-Error "This script must be run as Administrator."
    exit 1
}
# ── Ensure WireGuard is installed ─────────────────────────────────────────────
$wg = "C:\Program Files\WireGuard\wireguard.exe"
if (-not (Test-Path $wg)) {
    if ($Download) {
        Write-Host "WireGuard not found; downloading installer..."
        # The download is an .exe bootstrapper, so keep the .exe extension:
        # saved as .msi it would be handed to msiexec, which cannot open it.
        $installer = Join-Path $env:TEMP "wireguard-installer.exe"
        Invoke-WebRequest -Uri "https://download.wireguard.com/windows-client/wireguard-installer.exe" -OutFile $installer
        Write-Host "Installing WireGuard silently..."
        Start-Process -FilePath $installer -ArgumentList "/S" -Wait
    } else {
        Write-Error "WireGuard is not installed. Install it from https://www.wireguard.com/install/ or re-run with -Download."
        exit 1
    }
}
# ── Validate config ───────────────────────────────────────────────────────────
if (-not (Test-Path $ConfigPath)) {
    Write-Error "Config not found: $ConfigPath`nCopy wireguard-node.conf.template, fill the placeholders, save as wg-flatrender.conf."
    exit 1
}
$ConfigPath = (Resolve-Path $ConfigPath).Path
if ((Get-Content $ConfigPath -Raw) -match '<[A-Z_]+>') {
    Write-Error "Config still contains <PLACEHOLDERS>. Fill in all four values before installing."
    exit 1
}
$tunnelName = [System.IO.Path]::GetFileNameWithoutExtension($ConfigPath)
# ── Remove existing tunnel of the same name ───────────────────────────────────
$svcName = "WireGuardTunnel`$$tunnelName"
if (Get-Service -Name $svcName -ErrorAction SilentlyContinue) {
    Write-Host "Removing existing tunnel '$tunnelName'..."
    & $wg /uninstalltunnelservice $tunnelName | Out-Null
    Start-Sleep -Seconds 2
}
# ── Install the tunnel as a service ───────────────────────────────────────────
Write-Host "Installing WireGuard tunnel '$tunnelName' as a boot service..."
& $wg /installtunnelservice $ConfigPath
Start-Sleep -Seconds 3
$svc = Get-Service -Name $svcName -ErrorAction SilentlyContinue
# `show` is a subcommand of the wg.exe CLI, which is installed beside wireguard.exe.
$wgCli = Join-Path (Split-Path $wg) "wg.exe"
if ($svc -and $svc.Status -eq 'Running') {
    Write-Host ""
    Write-Host "✓ WireGuard tunnel '$tunnelName' is up." -ForegroundColor Green
    Write-Host " Verify : & '$wgCli' show"
    Write-Host " Ping CP: ping 10.66.0.1"
    Write-Host ""
    Write-Host " Next : install the node agent service (install-service.ps1) and point"
    Write-Host " ORCHESTRATOR_URL in agent.env at the control plane's mesh IP."
} else {
    Write-Warning "Tunnel service did not reach Running state. Check: & '$wgCli' show"
}
@@ -0,0 +1,32 @@
<#
.SYNOPSIS
Stop and remove the FlatRender Node Agent Windows service.
.EXAMPLE
.\uninstall-service.ps1
#>
param(
    [string]$ServiceName = "FlatRenderNodeAgent"
)
$ErrorActionPreference = "Stop"
$principal = New-Object Security.Principal.WindowsPrincipal([Security.Principal.WindowsIdentity]::GetCurrent())
if (-not $principal.IsInRole([Security.Principal.WindowsBuiltInRole]::Administrator)) {
    Write-Error "This script must be run as Administrator."
    exit 1
}
$svc = Get-Service -Name $ServiceName -ErrorAction SilentlyContinue
if (-not $svc) {
    Write-Host "Service '$ServiceName' is not installed; nothing to do."
    exit 0
}
if ($svc.Status -ne 'Stopped') {
    Write-Host "Stopping '$ServiceName'..."
    Stop-Service -Name $ServiceName -Force -ErrorAction SilentlyContinue
    Start-Sleep -Seconds 2
}
& sc.exe delete $ServiceName | Out-Null
Write-Host "✓ Service '$ServiceName' removed." -ForegroundColor Green
@@ -0,0 +1,29 @@
# WireGuard tunnel for a FlatRender render node.
#
# The render node only ever dials OUT to the control plane — it never needs a
# public IP or any inbound firewall rule. All traffic to the gateway / MinIO
# rides this encrypted tunnel, so nodes can live behind NAT, on home ADSL, or
# in any datacenter.
#
# Fill in the four <PLACEHOLDERS> below, save as `wg-flatrender.conf`, then run
# setup-wireguard.ps1 (or import it in the WireGuard GUI).
[Interface]
# This node's private key (generate on the node: `wg genkey`).
PrivateKey = <NODE_PRIVATE_KEY>
# This node's address inside the mesh. Pick a unique 10.66.0.x per node.
Address = 10.66.0.<NODE_NUMBER>/32
# Optional: keep DNS on the LAN; the tunnel only carries mesh traffic (see AllowedIPs).
# DNS = 1.1.1.1
[Peer]
# Control plane (gateway + MinIO host) public key (from the server: `wg show`).
PublicKey = <SERVER_PUBLIC_KEY>
# Public endpoint of the control plane: <public-ip-or-host>:51820
Endpoint = <SERVER_PUBLIC_ENDPOINT>:51820
# Only route the mesh subnet through the tunnel — everything else uses the normal
# internet path. 10.66.0.0/24 = the FlatRender control + render mesh.
AllowedIPs = 10.66.0.0/24
# Hold the NAT mapping open so the orchestrator can reach the node's :7777 health
# port and so long-poll claims stay alive behind home routers / CGNAT.
PersistentKeepalive = 25
@@ -2,12 +2,58 @@
package config
import (
	"bufio"
	"fmt"
	"os"
	"path/filepath"
	"strconv"
	"strings"
)
// LoadEnvFile reads a simple KEY=VALUE file and sets any variables that are not
// already present in the environment. This lets the Windows service (installed
// via sc.exe, which has no per-service env support) be configured by dropping an
// `agent.env` file next to the executable — no registry edits required.
//
// Lookup order: $AGENT_ENV_FILE, then `agent.env` beside the exe, then `./agent.env`.
// Lines starting with # and blank lines are ignored. Existing env vars win, so an
// operator can still override any single value at the process level.
func LoadEnvFile() {
	candidates := []string{}
	if p := os.Getenv("AGENT_ENV_FILE"); p != "" {
		candidates = append(candidates, p)
	}
	if exe, err := os.Executable(); err == nil {
		candidates = append(candidates, filepath.Join(filepath.Dir(exe), "agent.env"))
	}
	candidates = append(candidates, "agent.env")
	for _, path := range candidates {
		f, err := os.Open(path)
		if err != nil {
			continue
		}
		scanner := bufio.NewScanner(f)
		for scanner.Scan() {
			line := strings.TrimSpace(scanner.Text())
			if line == "" || strings.HasPrefix(line, "#") {
				continue
			}
			key, val, ok := strings.Cut(line, "=")
			if !ok {
				continue
			}
			key = strings.TrimSpace(key)
			val = strings.Trim(strings.TrimSpace(val), `"'`)
			if _, exists := os.LookupEnv(key); !exists {
				_ = os.Setenv(key, val)
			}
		}
		f.Close()
		return // first file found wins
	}
}
// Config holds all runtime settings for the node agent.
type Config struct {
	// NodeID is the UUID of this render node, registered in the orchestrator.
@@ -59,6 +105,8 @@ type Config struct {
// Load reads configuration from environment variables, returning an error
// if any required variable is missing.
func Load() (*Config, error) {
	// Pull in agent.env (if present) before reading the environment.
	LoadEnvFile()
	c := &Config{
		NodeID:          os.Getenv("NODE_ID"),
		OrchestratorURL: getEnv("ORCHESTRATOR_URL", "http://localhost:8088"),