# Building an Agent Loop

The [`Chat`](https://effect.plants.sh/ai/chat/) page shows the core agentic loop in a few lines: call the model with a [toolkit](https://effect.plants.sh/ai/tools/), and while the response contains tool calls, loop again. The session merges tool results into history for you. That loop *is* an agent.

This page takes it the rest of the way to production by layering on the concerns a real agent needs:

- a **turn budget** so a misbehaving model can't loop forever,
- **per-turn tracing** so you can see token usage and finish reasons in [observability](https://effect.plants.sh/observability/),
- **streaming** the reply to the user as it's generated,
- **context compaction** to stay within the model's context window on long runs,
- **steering and interruption** of an agent while it's working, and
- **typed errors** that preserve their cause.

Everything here is plain Effect (`Queue`, `Ref`, `Fiber`, `Stream`, spans), so an agent composes with the rest of your application exactly like any other effect.

## Shared setup

Every example below assumes this provider client and a one-tool toolkit. The model is captured into a `Layer` once and provided per generation, so you can swap providers, or even switch models mid-conversation, without touching the loop.

```ts
import { OpenAiClient, OpenAiLanguageModel } from "@effect/ai-openai"
import { Config, DateTime, Effect, Layer, Schema } from "effect"
import { Tool, Toolkit } from "effect/unstable/ai"
import { FetchHttpClient } from "effect/unstable/http"

const OpenAiClientLayer = OpenAiClient.layerConfig({
  apiKey: Config.redacted("OPENAI_API_KEY")
}).pipe(Layer.provide(FetchHttpClient.layer))

// One tool the agent can call. The handler is an Effect, so it gets the Clock,
// services, and typed errors: no `Date.now()`, no raw promises.
const Tools = Toolkit.make(Tool.make("getCurrentTime", {
  description: "Get the current time in ISO format",
  parameters: Schema.Struct({ timezone: Schema.String }),
  success: Schema.String
}))

const ToolsLayer = Tools.toLayer(Effect.gen(function*() {
  return Tools.of({
    // The `timezone` parameter is ignored in this minimal handler.
    getCurrentTime: Effect.fn("Tools.getCurrentTime")(function*(_) {
      const now = yield* DateTime.now
      return DateTime.formatIso(now)
    })
  })
}))
```

## The loop

Wrap the loop in a [service](https://effect.plants.sh/services-and-layers/) so the rest of the app depends only on `Agent`, never on the model or toolkit. Two things make this more than the minimal version: each turn runs inside its own [span](https://effect.plants.sh/observability/) that records token usage and the finish reason, and a **turn budget** bounds the loop so a model that keeps calling tools eventually fails with a domain error instead of hanging.

```ts
import { Context, Effect, Layer, Schema } from "effect"
import { AiError, Chat, Toolkit } from "effect/unstable/ai"
// ...OpenAiLanguageModel, OpenAiClientLayer, Tools, ToolsLayer from "Shared setup" above

// A tagged error that keeps the underlying failure as `cause` instead of
// stringifying it, so the original tag and context survive for debugging.
class AgentError extends Schema.TaggedErrorClass<AgentError>()("AgentError", {
  message: Schema.String,
  cause: Schema.optionalKey(Schema.Defect)
}) {
  static fromAiError(error: AiError.AiError) {
    return new AgentError({ message: `model call failed: ${error.reason}`, cause: error })
  }
}

const MAX_TURNS = 10

class Agent extends Context.Service<Agent, {
  run(question: string): Effect.Effect<string, AgentError>
}>()("app/Agent") {
  static readonly layer = Layer.effect(
    Agent,
    Effect.gen(function*() {
      const modelLayer = yield* OpenAiLanguageModel.model("gpt-5.2").captureRequirements
      const toolkit = yield* Tools

      // One turn: generate, then record what happened on the active span.
      const runTurn = Effect.fn("Agent.turn")(function*(session: Chat.Service) {
        const response = yield* session.generateText({ prompt: [], toolkit })
        yield* Effect.annotateCurrentSpan({
          "agent.finish_reason": response.finishReason,
          "agent.tool_calls": response.toolCalls.length,
          "agent.input_tokens": response.usage.inputTokens.total ?? 0,
          "agent.output_tokens": response.usage.outputTokens.total ?? 0
        })
        return response
      })

      const run = Effect.fn("Agent.run")(
        function*(question: string) {
          const session = yield* Chat.fromPrompt([
            { role: "system", content: "You can use tools to answer questions." },
            { role: "user", content: question }
          ])
          for (let turn = 1; turn <= MAX_TURNS; turn++) {
            const response = yield* runTurn(session).pipe(Effect.provide(modelLayer))
            // No tool calls: the model produced its final answer.
            if (response.toolCalls.length === 0) return response.text
            // Otherwise the session has already appended the tool results to
            // history; loop to let the model continue.
          }
          // Budget exhausted: surface it as a domain failure, never a hang.
          return yield* new AgentError({
            message: `agent did not finish within ${MAX_TURNS} turns`
          })
        },
        // Narrow the error channel to AgentError. Map AiError to our domain
        // error (keeping it as `cause`), pass our own AgentError (the
        // exhausted budget) through unchanged, and die on any other
        // unexpected failure.
        Effect.catchTag(
          "AiError",
          (error) => Effect.fail(AgentError.fromAiError(error)),
          (other) => other instanceof AgentError ? Effect.fail(other) : Effect.die(other)
        )
      )

      return Agent.of({ run })
    })
  ).pipe(Layer.provide([OpenAiClientLayer, ToolsLayer]))
}
```

Each iteration is one model turn; the loop ends either when the model answers in plain text or when the budget runs out. Because `runTurn` is an [`Effect.fn`](https://effect.plants.sh/code-style/guidelines/) with a name, every turn becomes a span nested under `Agent.run`, annotated with its token usage and finish reason. Pull a trace and you can see exactly how many turns and tokens a question cost.

**Tip:** The turn budget is your circuit breaker.
Models occasionally get stuck calling the same tool repeatedly; without a bound, that's an endless loop burning tokens. Failing with a typed `AgentError` lets callers retry, fall back, or report it.

## Streaming the agent's output

For a responsive UI you want tokens on screen as the model produces them, not after the whole turn completes. Use `session.streamText` and consume the [stream](https://effect.plants.sh/streaming/) of response parts. The catch: in streaming mode there's no `response.toolCalls` to inspect. The loop's continue/stop decision comes from the `finish` part's `reason`, which is `"tool-calls"` when the model paused to call tools.

```ts
import { Effect, Ref, Stream } from "effect"
import { Chat, Response } from "effect/unstable/ai"
// ...modelLayer + toolkit in scope

// Stream one turn: print text deltas as they arrive, capture the finish reason.
const streamTurn = (session: Chat.Service) =>
  Effect.gen(function*() {
    const finishReason = yield* Ref.make<Response.FinishReason>("unknown")
    yield* Stream.runForEach(
      session.streamText({ prompt: [], toolkit }),
      (part) => {
        switch (part.type) {
          case "text-delta":
            return Effect.sync(() => process.stdout.write(part.delta))
          case "finish":
            return Ref.set(finishReason, part.reason)
          default:
            return Effect.void
        }
      }
    )
    return yield* Ref.get(finishReason)
  })

const streamingAgent = (question: string) =>
  Effect.gen(function*() {
    const session = yield* Chat.fromPrompt([
      { role: "system", content: "You can use tools to answer questions." },
      { role: "user", content: question }
    ])
    // Keep streaming turns while the model is still calling tools. The session
    // appends each turn (and its tool results) to history as the stream finalizes.
    // Unbounded here for brevity; in production, apply the same turn budget
    // as in "The loop" above.
    let reason = yield* streamTurn(session).pipe(Effect.provide(modelLayer))
    while (reason === "tool-calls") {
      reason = yield* streamTurn(session).pipe(Effect.provide(modelLayer))
    }
  })
```

**Consume the stream to completion:** `streamText` writes the assistant's reply into history only when the stream **finalizes**. If you stop consuming early, that turn won't be recorded and the next turn will see incomplete context. Always drain the stream; `Stream.runForEach` does this for you.

## Keeping the conversation within the context window

A long-running agent accumulates history until it overflows the model's context window. The fix is **compaction**: once the prompt grows past a threshold, replace the older turns with a model-written summary and keep only the most recent ones. The per-turn span already records `inputTokens`, so you have the signal to act on.

```ts
import { Effect, Ref } from "effect"
import { Chat, LanguageModel, Prompt } from "effect/unstable/ai"

const COMPACT_AT_TOKENS = 100_000 // act well before the hard context limit
const KEEP_RECENT = 8 // messages to preserve verbatim after the summary

const compactIfNeeded = (session: Chat.Service, lastInputTokens: number) =>
  Effect.gen(function*() {
    if (lastInputTokens < COMPACT_AT_TOKENS) return
    const history = yield* Ref.get(session.history)
    if (history.content.length <= KEEP_RECENT + 1) return

    // `older` includes the original system prompt, so it survives only to the
    // extent the summary preserves it.
    const recent = history.content.slice(-KEEP_RECENT)
    const older = Prompt.fromMessages(history.content.slice(0, -KEEP_RECENT))

    // Summarize the older turns with the same model.
    const { text: summary } = yield* LanguageModel.generateText({
      prompt: older.pipe(Prompt.appendSystem(
        "Summarize the conversation so far into a concise brief that preserves " +
          "key facts, decisions, identifiers, and open tasks. Reply with the summary only."
      ))
    })

    // Rewrite history to [summary-as-system, ...recent turns]. Writing to
    // `session.history` directly is exactly what this kind of maintenance is for.
    yield* Ref.set(
      session.history,
      Prompt.fromMessages(recent).pipe(
        Prompt.prependSystem(`Summary of earlier conversation:\n${summary}\n\n`)
      )
    )
  })
```

Call `compactIfNeeded(session, response.usage.inputTokens.total ?? 0)` at the end of each turn in the loop. Compaction is the one place where writing to `session.history` directly is the right move. For normal turns, prefer the generation methods, which keep the encode/decode/save helpers in sync.

**Note:** Slice on clean turn boundaries. If the `KEEP_RECENT` cut lands in the middle of a tool-call / tool-result pair, the kept window starts with a tool result whose call was summarized away, and the summarizer sees a tool call with no result. In practice, keep an even, generous window (and bump it to a boundary) rather than cutting tight.

## Steering and interrupting a running agent

A real agent runs while the user keeps interacting with it: adding a clarification, redirecting it, or stopping it outright. Run the loop on its own [fiber](https://effect.plants.sh/concurrency/fibers/), feed it user input through a [`Queue`](https://effect.plants.sh/concurrency/queue-and-pubsub/), and use `Fiber.interrupt` to stop it. Effect's structured concurrency tears down the loop, including any in-flight model call, cleanly.

1. **Drain helper**: pull every message currently queued without blocking, so a turn can fold in everything the user typed while the previous turn was running:

   ```ts
   import { Effect, Option, Queue } from "effect"

   const drain = <A>(queue: Queue.Queue<A>) =>
     Effect.gen(function*() {
       const items: Array<A> = []
       let next = yield* Queue.poll(queue)
       while (Option.isSome(next)) {
         items.push(next.value)
         next = yield* Queue.poll(queue)
       }
       return items
     })
   ```

2. **The loop**: block until the user says something, drain any extra messages, then run the tool loop for that input and print the answer:

   ```ts
   import { Effect, Queue } from "effect"
   import { Chat } from "effect/unstable/ai"
   // ...modelLayer + toolkit in scope, `drain` from step 1

   const conversationLoop = (session: Chat.Service, inbox: Queue.Queue<string>) =>
     Effect.gen(function*() {
       while (true) {
         const first = yield* Queue.take(inbox) // waits for input
         const rest = yield* drain(inbox) // anything else queued meanwhile
         const turnInput = [first, ...rest].map((text) => ({
           role: "user" as const,
           content: text
         }))

         let response = yield* session.generateText({ prompt: turnInput, toolkit })
           .pipe(Effect.provide(modelLayer))
         // Unbounded here for brevity; a production loop would reuse the
         // turn budget from "The loop" above.
         while (response.toolCalls.length > 0) {
           response = yield* session.generateText({ prompt: [], toolkit })
             .pipe(Effect.provide(modelLayer))
         }
         yield* Effect.log(response.text)
       }
     })
   ```

3. **Drive it**: fork the loop, push messages to steer it, and interrupt to stop:

   ```ts
   import { Effect, Fiber, Queue } from "effect"
   import { Chat } from "effect/unstable/ai"
   // ...`conversationLoop` from step 2

   const program = Effect.gen(function*() {
     const inbox = yield* Queue.unbounded<string>()
     const session = yield* Chat.fromPrompt([
       { role: "system", content: "You are a helpful research assistant." }
     ])
     const fiber = yield* Effect.fork(conversationLoop(session, inbox))

     yield* Queue.offer(inbox, "Research the trade-offs of optimistic locking.")
     // ...later, steer it; the message is picked up at the next turn boundary:
     yield* Queue.offer(inbox, "Focus on the write-contention failure modes.")

     // ...after two minutes, stop it; the in-flight turn is interrupted and
     // the loop torn down.
     yield* Effect.sleep("2 minutes")
     yield* Fiber.interrupt(fiber)
   })
   ```

Because the loop blocks on `Queue.take`, an idle agent consumes no tokens and no CPU. Steering is just another `offer`: messages are folded into the next turn, so a redirect lands the moment the current turn finishes rather than being lost. And interruption is free: you never wrote cancellation logic, because structured concurrency handles it.
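To see the "folded into the next turn" behavior in isolation, here is a minimal, dependency-free sketch in plain TypeScript. It is an illustration under stated assumptions, not Effect's API: `Inbox`, `UserMessage`, and `nextTurnInput` are hypothetical stand-ins for the `Queue`, the prompt message shape, and the take-then-drain step, and the third message is made up for the example. Unlike the real loop, nothing here blocks; an empty inbox simply yields `undefined`.

```typescript
// Hypothetical stand-in for the inbox. This is NOT Effect's Queue API:
// there is no blocking `take`, only a non-blocking `poll`.
class Inbox {
  private readonly items: Array<string> = []

  offer(message: string): void {
    this.items.push(message)
  }

  // Returns the oldest message, or undefined when the inbox is empty.
  poll(): string | undefined {
    return this.items.shift()
  }
}

type UserMessage = { readonly role: "user"; readonly content: string }

// Collect the first message plus everything queued behind it into one turn's
// input. Returns undefined when idle, where the real loop would instead
// suspend on `Queue.take`.
const nextTurnInput = (inbox: Inbox): Array<UserMessage> | undefined => {
  const first = inbox.poll()
  if (first === undefined) return undefined
  const batch: Array<string> = [first]
  for (let next = inbox.poll(); next !== undefined; next = inbox.poll()) {
    batch.push(next)
  }
  return batch.map((content) => ({ role: "user" as const, content }))
}

const inbox = new Inbox()
inbox.offer("Research the trade-offs of optimistic locking.")
const turn1 = nextTurnInput(inbox) // one message

// While turn 1 is "running", the user steers twice.
inbox.offer("Focus on the write-contention failure modes.")
inbox.offer("Keep it under a page.")
const turn2 = nextTurnInput(inbox) // both steering messages, in arrival order
const turn3 = nextTurnInput(inbox) // undefined: nothing queued, the agent is idle
```

The point of the design is that steering messages are never dropped and never interleave with a running turn: they wait in the inbox and arrive together, in order, as the next turn's input.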
**Tip:** To interrupt *and* answer with what you have, race the loop against a stop signal instead of interrupting the fiber, e.g. `Effect.raceFirst` the loop against `Queue.take(stopInbox)`. Plain `Fiber.interrupt` is the right tool for a hard stop.

## Persisting the conversation

For agents that outlive a single request or process, don't serialize by hand. Use a **persisted** chat session, which saves its history to a backing store after every turn. The loop above is unchanged; only the session constructor differs. See [Persisted chats](https://effect.plants.sh/ai/chat/#persisted-chats) for the full setup with `Chat.layerPersisted` and a [`BackingPersistence`](https://effect.plants.sh/persistence/persistence/) layer.

## Related

- [Chat](https://effect.plants.sh/ai/chat/): the stateful session and the minimal agentic loop this builds on.
- [Tools and Toolkits](https://effect.plants.sh/ai/tools/): define what your agent can do, with approval gating and failure modes.
- [Language Model](https://effect.plants.sh/ai/language-model/): the generation API underneath, including `streamText` and response metadata.
- [Concurrency](https://effect.plants.sh/concurrency/): `Fiber`, `Queue`, and the structured concurrency that makes steering and interruption free.
- [Observability](https://effect.plants.sh/observability/): the spans every turn emits.