Embeddings

An embedding turns a piece of text into a vector of numbers that captures its meaning, so you can compare texts by distance (semantic search, clustering, retrieval-augmented generation). Effect’s EmbeddingModel service is the provider-agnostic interface for producing those vectors: your code calls embed / embedMany, and which provider answers (OpenAI, an OpenAI-compatible endpoint, a local model, …) is a Layer you wire up once.

The service is intentionally small. A provider only has to supply one batch function — embedMany — and EmbeddingModel.make derives everything else from it, including the ability to coalesce many concurrent single-input embed calls into a single provider request.

import { Effect } from "effect"
import { EmbeddingModel } from "effect/unstable/ai"
const program = Effect.gen(function* () {
  const model = yield* EmbeddingModel.EmbeddingModel
  // One input -> one vector.
  const response = yield* model.embed("the quick brown fox")
  return response.vector // => readonly number[]
})

embed takes one string and resolves to an EmbedResponse whose vector is the embedding. Internally each embed call goes through a request resolver, so several embed calls running concurrently are batched into a single provider request (see Batching).

import { Effect } from "effect"
import { EmbeddingModel } from "effect/unstable/ai"
const query = Effect.gen(function* () {
  const model = yield* EmbeddingModel.EmbeddingModel
  const { vector } = yield* model.embed("how do I cancel my order?")
  return vector
  // => [0.0123, -0.0481, 0.0099, ...]
})

When you already hold a batch, call embedMany. It preserves input order and returns an EmbedManyResponse with one EmbedResponse per input plus provider-reported token usage.

import { Effect } from "effect"
import { EmbeddingModel } from "effect/unstable/ai"
const ingest = Effect.gen(function* () {
  const model = yield* EmbeddingModel.EmbeddingModel
  const response = yield* model.embedMany([
    "Effect is a TypeScript library",
    "Embeddings are numeric vectors",
    "Layers wire up dependencies"
  ])
  response.embeddings.length // => 3
  response.embeddings[0].vector // => number[] for the first input
  response.usage.inputTokens // => number | undefined (when the provider reports it)
  return response.embeddings.map((e) => e.vector)
})
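Many providers cap how many inputs one request may carry, so a large corpus often has to be split before it reaches embedMany. The sketch below is library-free TypeScript; `chunk` and the batch size of 2 are hypothetical illustrations, not part of the EmbeddingModel API. Because embedMany preserves input order, concatenating per-chunk results in order keeps them aligned with the original inputs.

```typescript
// Split a list into consecutive chunks of at most `size` items.
// `chunk` is a hypothetical helper, not an Effect or EmbeddingModel export.
const chunk = <A>(items: ReadonlyArray<A>, size: number): A[][] => {
  if (!Number.isInteger(size) || size < 1) {
    throw new RangeError("size must be a positive integer")
  }
  const out: A[][] = []
  for (let start = 0; start < items.length; start += size) {
    out.push(items.slice(start, start + size))
  }
  return out
}

const corpus = ["doc a", "doc b", "doc c", "doc d", "doc e"]
const batches = chunk(corpus, 2)
// batches: [["doc a", "doc b"], ["doc c", "doc d"], ["doc e"]]

// Concatenating the chunks in order gives back the original sequence, so
// per-chunk embedding results can be joined the same way.
const restored = ([] as string[]).concat(...batches)
```

Each chunk would then be passed to embedMany in turn, with the returned embeddings appended in the same order.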

EmbeddingModel.EmbeddingModel is a Context.Service — a requirement that must be satisfied by a Layer at the edge of your app. Provider packages expose helpers that build that Layer (and the matching Dimensions service) from a client.

import { OpenAiClient, OpenAiEmbeddingModel } from "@effect/ai-openai-compat"
import { Config, Effect, Layer } from "effect"
import { FetchHttpClient } from "effect/unstable/http"
import { EmbeddingModel } from "effect/unstable/ai"
// 1. The client Layer holds your API key and an HttpClient.
const OpenAiClientLayer = OpenAiClient.layerConfig({
  apiKey: Config.redacted("OPENAI_API_KEY")
}).pipe(Layer.provide(FetchHttpClient.layer))
// 2. The model Layer selects a concrete embedding model + its dimensions.
//    `model(...)` provides BOTH EmbeddingModel and Dimensions.
const EmbeddingLayer = OpenAiEmbeddingModel.model("text-embedding-3-small", {
  dimensions: 1536
})
const program = Effect.gen(function* () {
  const model = yield* EmbeddingModel.EmbeddingModel
  const dimensions = yield* EmbeddingModel.Dimensions // => 1536
  const { vector } = yield* model.embed("hello world")
  return { vector, dimensions }
}).pipe(
  Effect.provide(EmbeddingLayer),
  Effect.provide(OpenAiClientLayer)
)

OpenAiEmbeddingModel.layer(...) provides only the EmbeddingModel service, without Dimensions. Use it when you manage the vector size yourself.

You rarely build the provider Layer by hand in application code, but it is the clearest way to see the contract: implement embedMany, and make derives the rest. This is exactly how the real provider packages are built, and it is ideal for unit tests.

import { Effect, Layer } from "effect"
import { EmbeddingModel } from "effect/unstable/ai"
import type { AiError } from "effect/unstable/ai"
// A deterministic stub: each vector is just [length of the input].
const TestEmbeddingLayer = Layer.effect(
  EmbeddingModel.EmbeddingModel,
  EmbeddingModel.make({
    embedMany: ({
      inputs
    }: EmbeddingModel.ProviderOptions): Effect.Effect<
      EmbeddingModel.ProviderResponse,
      AiError.AiError
    > =>
      Effect.succeed({
        results: inputs.map((input) => [input.length]),
        usage: { inputTokens: inputs.join(" ").length }
      })
  })
)
const test = Effect.gen(function* () {
  const model = yield* EmbeddingModel.EmbeddingModel
  const { vector } = yield* model.embed("hello")
  return vector // => [5]
}).pipe(Effect.provide(TestEmbeddingLayer))
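A length-only stub is enough to prove the wiring, but tests that rank results usually want vectors whose similarity depends on the words in the text. One possible approach is a deterministic hashed bag-of-words, sketched below in library-free TypeScript. `hashEmbed`, `dot`, and the dimension of 8 are hypothetical choices for illustration; this is a test double, not a real embedding model.

```typescript
// Deterministic toy embedding for tests: hashed bag-of-words, L2-normalized.
// `hashEmbed` is a hypothetical helper; it is NOT a real embedding model.
const hashEmbed = (text: string, dimensions = 8): number[] => {
  const vector = new Array<number>(dimensions).fill(0)
  for (const word of text.toLowerCase().split(/\W+/)) {
    if (word.length === 0) continue
    // Simple rolling string hash, kept in the signed 32-bit range.
    let hash = 0
    for (let i = 0; i < word.length; i++) {
      hash = (hash * 31 + word.charCodeAt(i)) | 0
    }
    const slot = Math.abs(hash) % dimensions
    vector[slot] = (vector[slot] ?? 0) + 1
  }
  const norm = Math.sqrt(vector.reduce((sum, x) => sum + x * x, 0))
  // An input with no words yields the zero vector, left unnormalized.
  return norm === 0 ? vector : vector.map((x) => x / norm)
}

const dot = (a: ReadonlyArray<number>, b: ReadonlyArray<number>): number =>
  a.reduce((sum, x, i) => sum + x * (b[i] ?? 0), 0)

// The hash ignores case, so these two inputs map to the same vector.
const sameA = hashEmbed("cancel my order")
const sameB = hashEmbed("Cancel my order")
const selfSimilarity = dot(sameA, sameB) // approximately 1 for unit vectors
```

A stub provider could return `inputs.map((input) => hashEmbed(input))` as its `results`, which keeps tests deterministic and offline.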

Sending many inputs in one HTTP request is usually far more efficient than sending one request per input, because it cuts round trips and rate-limit pressure. EmbeddingModel.make builds embed on top of a RequestResolver, so concurrent embed calls collapse into a single embedMany provider call automatically. You do not have to hand-build batches.

import { Effect } from "effect"
import { EmbeddingModel } from "effect/unstable/ai"
const search = Effect.gen(function* () {
  const model = yield* EmbeddingModel.EmbeddingModel
  // Three independent embed calls running concurrently...
  const [a, b, c] = yield* Effect.all(
    [model.embed("apple"), model.embed("banana"), model.embed("cherry")],
    { concurrency: "unbounded" }
  )
  // ...are coalesced into ONE provider `embedMany(["apple","banana","cherry"])`
  // call. Vectors come back in request order.
  return [a.vector, b.vector, c.vector]
})
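To make the coalescing idea concrete, here is a synchronous, library-free sketch: callers enqueue single inputs, and one flush hands them to the batch function together. It is a simplified model under stated assumptions, not Effect's RequestResolver implementation; `createBatcher`, `enqueue`, and `flush` are hypothetical names.

```typescript
type BatchFn = (inputs: ReadonlyArray<string>) => number[][]

// Collects single-input requests and resolves them with one batch call.
const createBatcher = (embedMany: BatchFn) => {
  let pending: Array<{ input: string; resolve: (vector: number[]) => void }> = []
  return {
    enqueue(input: string, resolve: (vector: number[]) => void): void {
      pending.push({ input, resolve })
    },
    flush(): void {
      const batch = pending
      pending = []
      if (batch.length === 0) return // nothing queued: skip the provider
      const vectors = embedMany(batch.map((entry) => entry.input))
      // Results are matched to callers by position, so order must be preserved.
      batch.forEach((entry, i) => entry.resolve(vectors[i] ?? []))
    }
  }
}

let providerCalls = 0
const batcher = createBatcher((inputs) => {
  providerCalls += 1
  return inputs.map((input) => [input.length])
})

const received: Record<string, number[]> = {}
for (const fruit of ["apple", "banana", "cherry"]) {
  batcher.enqueue(fruit, (vector) => {
    received[fruit] = vector
  })
}
batcher.flush()
// providerCalls is 1; received maps apple -> [5], banana -> [6], cherry -> [6]
```

The point of the sketch is the ratio: three callers, one provider call, and each caller still receives only its own vector.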

Because each embed is a request, results can also be cached. The service exposes its underlying resolver, so you can wrap it with RequestResolver.withCache and issue requests through Effect.request directly — embedding the same string twice then hits the cache instead of the provider.

import { Effect, RequestResolver } from "effect"
import { EmbeddingModel } from "effect/unstable/ai"
const cached = Effect.gen(function* () {
  const model = yield* EmbeddingModel.EmbeddingModel
  // Wrap the resolver in an in-memory cache (keyed by the request value).
  const cachedResolver = yield* RequestResolver.withCache(model.resolver, {
    capacity: 256
  })
  const embed = (input: string) =>
    Effect.request(new EmbeddingModel.EmbeddingRequest({ input }), cachedResolver)
  // The second identical request is served from the cache.
  const first = yield* embed("repeat me")
  const second = yield* embed("repeat me")
  return [first.vector, second.vector]
})
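The effect of a bounded cache can be pictured with a plain Map. The sketch below is library-free TypeScript and uses least-recently-used eviction purely for illustration; the text above does not say which eviction strategy withCache applies, so treat that choice, and the name `createCachedEmbed`, as assumptions.

```typescript
// Wrap a single-input embed function with a bounded least-recently-used cache.
const createCachedEmbed = (
  embed: (input: string) => number[],
  capacity: number
) => {
  const cache = new Map<string, number[]>()
  return (input: string): number[] => {
    const hit = cache.get(input)
    if (hit !== undefined) {
      // Re-insert to mark the entry as most recently used.
      cache.delete(input)
      cache.set(input, hit)
      return hit
    }
    const vector = embed(input)
    cache.set(input, vector)
    if (cache.size > capacity) {
      // Map iterates in insertion order, so the first key is the oldest.
      const oldest = cache.keys().next().value
      if (oldest !== undefined) cache.delete(oldest)
    }
    return vector
  }
}

let embedCalls = 0
const cachedEmbed = createCachedEmbed((input) => {
  embedCalls += 1
  return [input.length]
}, 2)

cachedEmbed("repeat me") // miss: calls the underlying function
cachedEmbed("repeat me") // hit: served from the cache
// embedCalls is 1 at this point
```

With a capacity of 2, embedding two further distinct strings would evict "repeat me", and the next request for it would reach the underlying function again.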

Putting it together: embed a corpus once with embedMany, embed the query with embed, and rank documents by cosine similarity.

import { Array, Effect } from "effect"
import { EmbeddingModel } from "effect/unstable/ai"
const cosine = (a: ReadonlyArray<number>, b: ReadonlyArray<number>) => {
  let dot = 0
  let na = 0
  let nb = 0
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i]
    na += a[i] * a[i]
    nb += b[i] * b[i]
  }
  return dot / (Math.sqrt(na) * Math.sqrt(nb))
}
const semanticSearch = (query: string, documents: ReadonlyArray<string>) =>
  Effect.gen(function* () {
    const model = yield* EmbeddingModel.EmbeddingModel
    const { embeddings } = yield* model.embedMany(documents)
    const { vector } = yield* model.embed(query)
    return Array.map(documents, (doc, i) => ({
      doc,
      score: cosine(vector, embeddings[i].vector)
    })).sort((x, y) => y.score - x.score)
    // => documents ranked most-similar first
  })
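When the same corpus is searched repeatedly, a common refinement is to normalize each document vector once, so ranking needs only a dot product, and to guard against zero vectors, which would otherwise make the cosine division produce NaN. The sketch below is library-free and uses hand-written toy vectors rather than real model output; `normalize`, `dotProduct`, and `topK` are hypothetical helpers.

```typescript
type Vec = ReadonlyArray<number>

// Scale a vector to unit length; a zero vector is returned unchanged.
const normalize = (v: Vec): number[] => {
  const norm = Math.sqrt(v.reduce((sum, x) => sum + x * x, 0))
  return norm === 0 ? v.slice() : v.map((x) => x / norm)
}

const dotProduct = (a: Vec, b: Vec): number =>
  a.reduce((sum, x, i) => sum + x * (b[i] ?? 0), 0)

// Rank pre-normalized document vectors against a query and keep the best k.
const topK = (
  query: Vec,
  docs: ReadonlyArray<{ id: string; vector: Vec }>,
  k: number
) => {
  const q = normalize(query)
  return docs
    .map((doc) => ({ id: doc.id, score: dotProduct(q, doc.vector) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, k)
}

// Toy 2-dimensional vectors, invented for illustration.
const docIndex = [
  { id: "refunds", vector: normalize([1, 0]) },
  { id: "shipping", vector: normalize([0, 1]) },
  { id: "returns", vector: normalize([3, 1]) }
]
const best = topK([2, 0], docIndex, 2)
// best ids, most similar first: "refunds", then "returns"
```

For unit-length vectors the dot product equals the cosine similarity, so the ranking matches what a full cosine computation would give.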

Everything below is exported from effect/unstable/ai/EmbeddingModel.

EmbeddingModel is the Context.Service tag for embedding operations. Yield it to obtain a Service with embed, embedMany, and the underlying resolver.

import { Effect } from "effect"
import { EmbeddingModel } from "effect/unstable/ai"
const program = Effect.gen(function* () {
  const model = yield* EmbeddingModel.EmbeddingModel
  return yield* model.embed("hello")
})
// program requires: EmbeddingModel.EmbeddingModel

Dimensions is a separate Context.Service (a number) carrying the configured embedding vector size. Provider helpers like OpenAiEmbeddingModel.model(...) provide it alongside the model; downstream code (e.g. a vector store schema) can read it.

import { Effect, Layer } from "effect"
import { EmbeddingModel } from "effect/unstable/ai"
const program = Effect.gen(function* () {
  const size = yield* EmbeddingModel.Dimensions
  return size // => 1536
}).pipe(Effect.provide(Layer.succeed(EmbeddingModel.Dimensions, 1536)))
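One use for the configured size is a sanity check before vectors are written to a store with a fixed-width column. The sketch below is library-free; `assertDimensions` is a hypothetical helper, and in an Effect program the expected value would come from the Dimensions service rather than a literal.

```typescript
// Throw if any vector does not have the expected number of components.
const assertDimensions = (
  vectors: ReadonlyArray<ReadonlyArray<number>>,
  expected: number
): void => {
  vectors.forEach((vector, i) => {
    if (vector.length !== expected) {
      throw new RangeError(
        `vector ${i} has ${vector.length} dimensions, expected ${expected}`
      )
    }
  })
}

assertDimensions([[0.1, 0.2, 0.3], [0.4, 0.5, 0.6]], 3) // passes silently

let mismatch = ""
try {
  assertDimensions([[0.1, 0.2, 0.3], [0.4, 0.5]], 3)
} catch (error) {
  mismatch = error instanceof Error ? error.message : String(error)
}
// mismatch: "vector 1 has 2 dimensions, expected 3"
```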

Service is the interface behind the EmbeddingModel tag. embed resolves one input, embedMany resolves a batch, and resolver is the low-level RequestResolver<EmbeddingRequest> that embed is built on.

import type { Effect } from "effect"
import type { RequestResolver } from "effect"
import type { EmbeddingModel } from "effect/unstable/ai"
import type { AiError } from "effect/unstable/ai"
// Shape (for reference):
interface Service {
  readonly resolver: RequestResolver.RequestResolver<EmbeddingModel.EmbeddingRequest>
  readonly embed: (
    input: string
  ) => Effect.Effect<EmbeddingModel.EmbedResponse, AiError.AiError>
  readonly embedMany: (
    input: ReadonlyArray<string>
  ) => Effect.Effect<EmbeddingModel.EmbedManyResponse, AiError.AiError>
}

make builds a Service from a single provider embedMany function. It wires up a request resolver so concurrent embed calls batch into one provider call, and short-circuits embedMany([]) without invoking the provider. Returns Effect<Service> (typically wrapped in Layer.effect).

import { Effect, Layer } from "effect"
import { EmbeddingModel } from "effect/unstable/ai"
const layer = Layer.effect(
  EmbeddingModel.EmbeddingModel,
  EmbeddingModel.make({
    embedMany: ({ inputs }) =>
      Effect.succeed({
        results: inputs.map((s) => [s.length]),
        usage: { inputTokens: undefined }
      })
  })
)
// layer: Layer<EmbeddingModel.EmbeddingModel>
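The derivation described above can be approximated without Effect. The following is a synchronous, library-free sketch that leaves out the request resolver and batching entirely; `makeSketch` and its types are hypothetical, and the result-count check is an assumption about sensible behavior rather than something the text documents.

```typescript
type ProviderResponseSketch = {
  results: number[][]
  usage: { inputTokens: number | undefined }
}
type EmbedManyResult = {
  embeddings: Array<{ vector: number[] }>
  usage: { inputTokens: number | undefined }
}

const makeSketch = (provider: {
  embedMany: (options: { inputs: ReadonlyArray<string> }) => ProviderResponseSketch
}) => {
  const embedMany = (inputs: ReadonlyArray<string>): EmbedManyResult => {
    // Empty input: answer locally without calling the provider.
    if (inputs.length === 0) {
      return { embeddings: [], usage: { inputTokens: undefined } }
    }
    const response = provider.embedMany({ inputs })
    // Assumption: a well-behaved provider returns one vector per input.
    if (response.results.length !== inputs.length) {
      throw new Error("provider returned a different number of vectors")
    }
    return {
      embeddings: response.results.map((vector) => ({ vector })),
      usage: response.usage
    }
  }
  // `embed` is derived: a one-element batch, unwrapped.
  const embed = (input: string): { vector: number[] } => {
    const first = embedMany([input]).embeddings[0]
    if (first === undefined) throw new Error("missing embedding")
    return first
  }
  return { embed, embedMany }
}

let calls = 0
const sketchModel = makeSketch({
  embedMany: ({ inputs }) => {
    calls += 1
    return {
      results: inputs.map((s) => [s.length]),
      usage: { inputTokens: undefined }
    }
  }
})
const emptyResult = sketchModel.embedMany([]) // provider not called
const single = sketchModel.embed("hello") // one provider call, vector [5]
```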

EmbedResponse is a Schema.Class for a single embedding result. Its only field is vector, an array of finite numbers.

import { EmbeddingModel } from "effect/unstable/ai"
const r = new EmbeddingModel.EmbedResponse({ vector: [0.1, 0.2, 0.3] })
r.vector // => [0.1, 0.2, 0.3]

EmbedManyResponse is a Schema.Class for a batch result. embeddings is an array of EmbedResponse in input order, and usage is an EmbeddingUsage.

import { EmbeddingModel } from "effect/unstable/ai"
const r = new EmbeddingModel.EmbedManyResponse({
  embeddings: [
    new EmbeddingModel.EmbedResponse({ vector: [1, 2] }),
    new EmbeddingModel.EmbedResponse({ vector: [3, 4] })
  ],
  usage: new EmbeddingModel.EmbeddingUsage({ inputTokens: 9 })
})
r.embeddings.length // => 2
r.embeddings[1].vector // => [3, 4]
r.usage.inputTokens // => 9

EmbeddingUsage is a Schema.Class holding token usage metadata. inputTokens is number | undefined; it is undefined when the provider does not report usage (or when embedMany([]) skips the provider).

import { EmbeddingModel } from "effect/unstable/ai"
new EmbeddingModel.EmbeddingUsage({ inputTokens: 42 }).inputTokens // => 42
new EmbeddingModel.EmbeddingUsage({ inputTokens: undefined }).inputTokens // => undefined
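Because inputTokens may be undefined, totals across several batches need an explicit policy. The library-free sketch below treats the total as unknown as soon as any batch is unreported; that policy and the name `sumInputTokens` are assumptions made for illustration.

```typescript
type UsageLike = { readonly inputTokens: number | undefined }

// Sum token counts; the total is undefined if any batch did not report usage.
const sumInputTokens = (usages: ReadonlyArray<UsageLike>): number | undefined => {
  let total = 0
  for (const usage of usages) {
    if (usage.inputTokens === undefined) return undefined
    total += usage.inputTokens
  }
  return total
}

const reported = sumInputTokens([{ inputTokens: 42 }, { inputTokens: 8 }])
const partial = sumInputTokens([{ inputTokens: 42 }, { inputTokens: undefined }])
// reported is 50; partial is undefined
```

An alternative policy would be to sum only the reported values, at the cost of silently undercounting.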

EmbeddingRequest is a Request.TaggedClass representing one input to be embedded. It resolves to an EmbedResponse and can fail with AiError. You build these only when working directly with the resolver; embed does it for you.

import { Effect } from "effect"
import { EmbeddingModel } from "effect/unstable/ai"
const program = Effect.gen(function* () {
  const { resolver } = yield* EmbeddingModel.EmbeddingModel
  // Issue a request directly against the resolver (what `embed` does internally).
  const response = yield* Effect.request(
    new EmbeddingModel.EmbeddingRequest({ input: "hello" }),
    resolver
  )
  return response.vector // => number[]
})

ProviderOptions is the input a provider’s embedMany receives: { inputs: ReadonlyArray<string> }. This is the only argument your provider implementation is handed.

import type { EmbeddingModel } from "effect/unstable/ai"
const options: EmbeddingModel.ProviderOptions = {
  inputs: ["a", "b", "c"]
}
options.inputs // => ["a", "b", "c"]

ProviderResponse is the value a provider’s embedMany must return: results, an array of raw numeric vectors (one per input, in order), and usage.inputTokens (number | undefined).

import type { EmbeddingModel } from "effect/unstable/ai"
const response: EmbeddingModel.ProviderResponse = {
  results: [
    [0.1, 0.2],
    [0.3, 0.4]
  ],
  usage: { inputTokens: 8 }
}
response.results.length // => 2
response.usage.inputTokens // => 8
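
As an illustration of producing this shape, the sketch below maps an OpenAI-style embeddings payload (a `data` array of `{ index, embedding }` items plus `usage.prompt_tokens`) into `results` and `usage`. The payload type is a hand-written approximation, the values are invented, and `toProviderResponse` is a hypothetical helper, not code from any provider package. Sorting by `index` is a defensive step so that results line up with the inputs even if items arrive out of order.

```typescript
// Hand-written approximation of an OpenAI-style embeddings response body.
type RawEmbeddingsBody = {
  data: ReadonlyArray<{ index: number; embedding: number[] }>
  usage?: { prompt_tokens?: number }
}

const toProviderResponse = (body: RawEmbeddingsBody) => ({
  // Copy before sorting so the caller's array is not mutated.
  results: body.data
    .slice()
    .sort((x, y) => x.index - y.index)
    .map((item) => item.embedding),
  // Missing usage maps to undefined, matching `number | undefined`.
  usage: { inputTokens: body.usage?.prompt_tokens }
})

const mapped = toProviderResponse({
  data: [
    { index: 1, embedding: [0.3, 0.4] },
    { index: 0, embedding: [0.1, 0.2] }
  ],
  usage: { prompt_tokens: 8 }
})
// mapped.results: [[0.1, 0.2], [0.3, 0.4]]; mapped.usage.inputTokens: 8
```
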
  • Language Model — text generation, structured output, and streaming against a provider-agnostic model.
  • Batching — the request/resolver machinery that powers embed batching and caching.
  • Services and Layers — how the EmbeddingModel and Dimensions services are provided.