AI Native Lang

Runtime

A deterministic execution engine for AI workflows.

The AINL runtime, implemented in runtime/engine.py, takes compiled graphs and executes them with strict state discipline, capability grants, and clear audit trails. The runner service wraps this engine behind policy-gated APIs like /run and /capabilities.

Graph-first execution · Policy-gated /run · Capability discovery · Named security profiles
  1. GET /capabilities: discover adapters, verbs, privilege tiers
  2. POST /run: submit AINL source or compiled IR
  3. Policy gate: optional policy object validated before execution
  4. RuntimeEngine: executes graph nodes deterministically
  5. Record / replay: optional call logging for audit & debugging
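
As a concrete illustration of the first two steps, here is a minimal client sketch in Python. The endpoint paths come from the docs above, but the base URL and the payload field names (source, policy, forbid_adapters) are assumptions for illustration, not the runner's actual schema.

```python
import requests

BASE = "http://localhost:8000"  # hypothetical local runner address

# 1. Discover what this runtime instance is allowed to do.
caps = requests.get(f"{BASE}/capabilities").json()
print("capabilities:", caps)

# 2. Submit AINL source (or compiled IR) with an optional policy object.
#    Field names here are illustrative; callers can only tighten rules.
payload = {
    "source": "...",                          # AINL source or compiled IR
    "policy": {"forbid_adapters": ["shell"]}, # hypothetical policy shape
}
result = requests.post(f"{BASE}/run", json=payload).json()
print("result:", result)
```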

Engine

RuntimeEngine: graph-first semantics.

The core runtime is the RuntimeEngine in runtime/engine.py. It owns step execution, state updates, adapter calls, and record/replay — separate from any particular server or UI.

Deterministic graph execution

The engine executes nodes in the compiled graph IR in a predictable order, with explicit jumps and control flow. Given the same graph, inputs, and adapter configuration, you get the same behavior every time.
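
To make "predictable order with explicit jumps" concrete, here is a minimal sketch of such a step loop. The Node shape, op names, and run_graph function are invented for illustration and are not the actual RuntimeEngine internals.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Node:
    op: str                     # e.g. "assign" or "jump_if" (illustrative ops)
    args: dict
    next: Optional[str] = None  # explicit successor edge; None ends the run

def run_graph(nodes: dict, entry: str, state: dict) -> dict:
    """Walk the graph in explicit order: same graph + inputs, same trace."""
    current = entry
    while current is not None:
        node = nodes[current]
        if node.op == "assign":
            state[node.args["var"]] = node.args["value"]
            current = node.next
        elif node.op == "jump_if":
            # Branches are explicit edges in the IR, never hidden control flow.
            current = node.args["then"] if state[node.args["var"]] else node.args["else"]
        else:
            raise ValueError(f"unknown op: {node.op}")
    return state

graph = {
    "start": Node("assign", {"var": "x", "value": 1}, next="check"),
    "check": Node("jump_if", {"var": "x", "then": "yes", "else": "no"}),
    "yes":   Node("assign", {"var": "out", "value": "took then-branch"}),
    "no":    Node("assign", {"var": "out", "value": "took else-branch"}),
}
print(run_graph(graph, "start", {}))  # deterministic: always the same dict
```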

Tiered state discipline

AINL distinguishes between in-graph variables, cache, persistent storage, and coordination state (queues/mailboxes). Adapters expose these tiers explicitly so you can reason about where your data lives.
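
A sketch of what those tiers can look like from an adapter's point of view. The four tier names come from the description above; the StateTiers class and its attribute names are hypothetical.

```python
from collections import deque

class StateTiers:
    """Illustrative only: one container per tier keeps data locality explicit."""

    def __init__(self):
        self.vars = {}          # in-graph variables: live only for this run
        self.cache = {}         # cache: rebuildable, safe to evict
        self.store = {}         # persistent storage: survives across runs
        self.mailbox = deque()  # coordination state: queues/mailboxes

# An adapter that writes to `store` is visibly different from one that only
# touches `vars`, so policies and audits can treat the tiers differently.
state = StateTiers()
state.vars["step_count"] = 3                  # ephemeral
state.store["customer_id"] = "c-42"           # durable, auditable
state.mailbox.append({"msg": "work item for another workflow"})
```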

Record and replay

The runner can optionally record adapter calls and results, then replay them against the same graph for debugging, audits, or regression tests — documented in the integration and architecture docs.
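
The idea is simple enough to sketch: wrap each adapter call during a live run, log arguments and results, then substitute the log for the adapter on replay. The class names and log format below are assumptions, not the runner's actual recording implementation.

```python
import json

class RecordingAdapter:
    """Wraps an adapter callable, logging each call and its result."""

    def __init__(self, name, fn, log):
        self.name, self.fn, self.log = name, fn, log

    def __call__(self, **kwargs):
        result = self.fn(**kwargs)
        self.log.append({"adapter": self.name, "args": kwargs, "result": result})
        return result

class ReplayAdapter:
    """Replays logged results in order instead of re-executing side effects."""

    def __init__(self, name, log):
        self.calls = iter(e for e in log if e["adapter"] == name)

    def __call__(self, **kwargs):
        entry = next(self.calls)
        assert entry["args"] == kwargs, "graph diverged from the recording"
        return entry["result"]

log = []
live = RecordingAdapter("http_get", lambda url: f"body of {url}", log)
live(url="https://example.com")            # executed and recorded
replay = ReplayAdapter("http_get", log)
print(replay(url="https://example.com"))   # same result, no network call
print(json.dumps(log, indent=2))           # doubles as an audit trail
```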

Runner service

/run and /capabilities, as real APIs.

The FastAPI runner in scripts/runtime_runner_service.py exposes the runtime over HTTP: synchronous execution with /run, queued workloads with /enqueue and /result, capability discovery with /capabilities, and health/metrics endpoints.

Endpoint         Verb  Purpose
/capabilities    GET   Return supported adapters, verbs, effect defaults, privilege tiers, and policy support.
/run             POST  Compile (if needed), validate policy, execute a workflow synchronously, and return structured output.
/enqueue         POST  Submit a workflow for async execution; returns an ID for polling.
/result/{id}     GET   Fetch the result of an async run by ID.
/health, /ready  GET   Simple liveness and readiness probes for orchestration and load balancers.
/metrics         GET   Prometheus-style metrics for latency, errors, and adapter usage.

Typical integration pattern

  1. Query GET /capabilities to understand which adapters and privilege tiers a given runtime instance supports.
  2. Compile or author an AINL program, then submit it to POST /run with optional policy and adapter allowlist.
  3. For higher-throughput or long-running workflows, use /enqueue and poll /result.
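
The synchronous path was sketched earlier; here is the async half of the pattern (step 3). The /enqueue and /result/{id} paths come from the endpoint table above, but the base URL, the returned "id" field name, and the "200 when finished" convention are assumptions for illustration.

```python
import time
import requests

BASE = "http://localhost:8000"  # hypothetical runner address

# Submit a long-running workflow for async execution.
job = requests.post(f"{BASE}/enqueue", json={"source": "..."}).json()
job_id = job["id"]              # assumed field name for the returned run ID

# Poll until the result is available.
while True:
    res = requests.get(f"{BASE}/result/{job_id}")
    if res.status_code == 200:  # assumed: 200 once the run finishes
        print("result:", res.json())
        break
    time.sleep(1.0)
```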

Capability grants

Named security profiles at startup.

At startup, each runtime surface loads a server-level capability grant from a named security profile, such as local_minimal or sandbox_network_restricted. Requests can only tighten these rules, never widen them.

Server grants

The runner reads AINL_SECURITY_PROFILE at startup and loads a named profile from tooling/security_profiles.json using the capability grant model. This grant defines which adapters, effect tiers, and privilege tiers are even possible.
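
A minimal sketch of that startup step. The environment variable and file path are from the docs; the JSON layout and grant fields shown below are an assumed shape, not the actual capability grant schema.

```python
import json
import os

# Pick the named profile; "local_minimal" is used here as an assumed default.
profile_name = os.environ.get("AINL_SECURITY_PROFILE", "local_minimal")

with open("tooling/security_profiles.json") as f:
    profiles = json.load(f)

grant = profiles[profile_name]
# e.g. grant == {"adapters": ["fs_read"], "effect_tiers": ["pure", "read"],
#                "privilege_tiers": ["sandbox_safe"]}   (illustrative shape)
print(f"loaded grant for profile {profile_name!r}: {grant}")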

Restrictive-only merge

When callers attach a policy object to /run, the host merges that policy with the server grant using a restrictive-only rule: callers can forbid more adapters or tiers, but can never escape the server's base restrictions.
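
Restrictive-only merging is essentially set intersection per capability dimension, as in this sketch (the function and grant shape are illustrative, not the host's actual code):

```python
def merge_restrictive(server_grant: dict, caller_policy: dict) -> dict:
    """Callers can shrink the allowed sets but never grow them."""
    merged = {}
    for key, server_allowed in server_grant.items():
        requested = caller_policy.get(key, server_allowed)
        # Intersection: anything outside the server grant is dropped.
        merged[key] = sorted(set(server_allowed) & set(requested))
    return merged

server = {"adapters": ["fs_read", "http_get"], "privilege_tiers": ["sandbox_safe"]}
caller = {"adapters": ["http_get", "shell_exec"]}  # shell_exec is not granted

print(merge_restrictive(server, caller))
# {'adapters': ['http_get'], 'privilege_tiers': ['sandbox_safe']}
```

Because the merge only ever intersects, a request that asks for shell_exec simply does not get it; there is no payload that widens the server's base grant.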

Privilege-aware capabilities

Adapter metadata (e.g. destructive, network_facing, sandbox_safe, privilege tier) is exposed via /capabilities. Orchestrators can construct policies like "forbid all destructive adapters" without hard-coding adapter names.
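
For example, an orchestrator might build such a policy like this. The metadata flags are the ones named above, but the /capabilities response shape and the forbid_adapters policy field are assumed for illustration.

```python
import requests

BASE = "http://localhost:8000"  # hypothetical runner address

caps = requests.get(f"{BASE}/capabilities").json()

# Assumed response shape: a list of adapter records with metadata flags,
# e.g. {"adapters": [{"name": "fs_delete", "destructive": True, ...}, ...]}
forbidden = [a["name"] for a in caps["adapters"] if a.get("destructive")]

# "Forbid all destructive adapters" without hard-coding adapter names.
policy = {"forbid_adapters": forbidden}  # field name illustrative
requests.post(f"{BASE}/run", json={"source": "...", "policy": policy})
```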

For full details, see docs/operations/CAPABILITY_GRANT_MODEL.md and docs/operations/SANDBOX_EXECUTION_PROFILE.md.

Runtime cost profile.

In AINL benchmarks, a complex workflow compiles once, at a one-time cost of tens of thousands of tokens, and then runs at roughly fixed cost per execution. Case studies like HOW_AINL_SAVES_MONEY highlight scenarios where moving orchestration into the runtime yields order-of-magnitude savings over re-running prompt loops for every invocation.
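
A back-of-the-envelope illustration of why that amortization matters. Every number below is hypothetical; the measured figures live in the benchmarks and case studies.

```python
# Hypothetical token counts, for illustration only.
compile_tokens = 50_000        # one-time compile cost
run_tokens = 500               # roughly fixed per-execution cost
prompt_loop_tokens = 20_000    # re-orchestrating via prompts on every call
executions = 1_000

ainl_total = compile_tokens + executions * run_tokens   # 550,000 tokens
prompt_total = executions * prompt_loop_tokens          # 20,000,000 tokens
print(f"savings factor: {prompt_total / ainl_total:.1f}x")  # ~36x
```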

Put simply: the runtime keeps your workflows cheap and predictable once they're compiled, while still enforcing strict capability and policy boundaries.