# Embeddings

An **embedding** turns a piece of text into a vector of numbers that captures
its meaning, so you can compare texts by distance (semantic search, clustering,
retrieval-augmented generation). Effect's [`EmbeddingModel`](https://effect.plants.sh/ai/embeddings/)
service is the provider-agnostic interface for producing those vectors: your
code calls `embed` / `embedMany`, and *which* provider answers (OpenAI, an
OpenAI-compatible endpoint, a local model, ...) is a `Layer` you wire up once.

The service is intentionally small. A provider only has to supply **one** batch
function — `embedMany` — and `EmbeddingModel.make` derives everything else from
it, including the ability to coalesce many concurrent single-input `embed` calls
into a single provider request.

```ts
import { Effect } from "effect"
import { EmbeddingModel } from "effect/unstable/ai"

const program = Effect.gen(function* () {
  const model = yield* EmbeddingModel.EmbeddingModel

  // One input -> one vector.
  const response = yield* model.embed("the quick brown fox")
  return response.vector // => readonly number[]
})
```
**Unstable module:** The AI modules live under `effect/unstable/ai`. The API may change before it
  is promoted to stable.

## Embedding a single string

`embed` takes one string and resolves to an `EmbedResponse` whose `vector` is
the embedding. Internally each `embed` call goes through a
[request resolver](https://effect.plants.sh/batching/), so several `embed` calls running concurrently
are batched into a single provider request (see [Batching](#batching-many-embeddings)).

```ts
import { Effect } from "effect"
import { EmbeddingModel } from "effect/unstable/ai"

const query = Effect.gen(function* () {
  const model = yield* EmbeddingModel.EmbeddingModel

  const { vector } = yield* model.embed("how do I cancel my order?")
  return vector
  // => [0.0123, -0.0481, 0.0099, ...]
})
```

## Embedding many strings

When you already hold a batch, call `embedMany`. It preserves input order and
returns an `EmbedManyResponse` with one `EmbedResponse` per input plus
provider-reported token `usage`.

```ts
import { Effect } from "effect"
import { EmbeddingModel } from "effect/unstable/ai"

const ingest = Effect.gen(function* () {
  const model = yield* EmbeddingModel.EmbeddingModel

  const response = yield* model.embedMany([
    "Effect is a TypeScript library",
    "Embeddings are numeric vectors",
    "Layers wire up dependencies"
  ])

  response.embeddings.length // => 3
  response.embeddings[0].vector // => number[] for the first input
  response.usage.inputTokens // => number | undefined (when the provider reports it)

  return response.embeddings.map((e) => e.vector)
})
```
**Empty input is free:** `embedMany([])` short-circuits and returns an empty response **without**
  calling the provider — `usage.inputTokens` is `undefined`.

## Providing a concrete model

`EmbeddingModel.EmbeddingModel` is a `Context.Service` — a requirement that must
be satisfied by a `Layer` at the edge of your app. Provider packages expose
helpers that build that Layer (and the matching [`Dimensions`](#dimensions)
service) from a client.

```ts
import { OpenAiClient, OpenAiEmbeddingModel } from "@effect/ai-openai-compat"
import { Config, Effect, Layer } from "effect"
import { FetchHttpClient } from "effect/unstable/http"
import { EmbeddingModel } from "effect/unstable/ai"

// 1. The client Layer holds your API key and an HttpClient.
const OpenAiClientLayer = OpenAiClient.layerConfig({
  apiKey: Config.redacted("OPENAI_API_KEY")
}).pipe(Layer.provide(FetchHttpClient.layer))

// 2. The model Layer selects a concrete embedding model + its dimensions.
//    `model(...)` provides BOTH EmbeddingModel and Dimensions.
const EmbeddingLayer = OpenAiEmbeddingModel.model("text-embedding-3-small", {
  dimensions: 1536
})

const program = Effect.gen(function* () {
  const model = yield* EmbeddingModel.EmbeddingModel
  const dimensions = yield* EmbeddingModel.Dimensions // => 1536

  const { vector } = yield* model.embed("hello world")
  return { vector, dimensions }
}).pipe(
  Effect.provide(EmbeddingLayer),
  Effect.provide(OpenAiClientLayer)
)
```

Use `OpenAiEmbeddingModel.layer(...)` instead when you manage the vector size
yourself: it provides only the `EmbeddingModel` service, without `Dimensions`.

### A fake provider for tests

You rarely build the provider Layer by hand in application code, but doing so is
the clearest way to see the contract: implement `embedMany`, and `make` derives
the rest. The real provider packages are built the same way, and the approach is
ideal for unit tests.

```ts
import { Effect, Layer } from "effect"
import { EmbeddingModel } from "effect/unstable/ai"
import type { AiError } from "effect/unstable/ai"

// A deterministic stub: each vector is just [length of the input].
const TestEmbeddingLayer = Layer.effect(
  EmbeddingModel.EmbeddingModel,
  EmbeddingModel.make({
    embedMany: ({
      inputs
    }: EmbeddingModel.ProviderOptions): Effect.Effect<
      EmbeddingModel.ProviderResponse,
      AiError.AiError
    > =>
      Effect.succeed({
        results: inputs.map((input) => [input.length]),
        usage: { inputTokens: inputs.join(" ").length }
      })
  })
)

const test = Effect.gen(function* () {
  const model = yield* EmbeddingModel.EmbeddingModel
  const { vector } = yield* model.embed("hello")
  return vector // => [5]
}).pipe(Effect.provide(TestEmbeddingLayer))
```

## Batching many embeddings

Sending many inputs in one HTTP request avoids paying per-request overhead
(network round trips, request-rate limits) once per string.
`EmbeddingModel.make` builds `embed` on top of a
[`RequestResolver`](https://effect.plants.sh/batching/), so concurrent `embed` calls collapse into a
single `embedMany` provider call automatically, and you do not have to
hand-build batches.

```ts
import { Effect } from "effect"
import { EmbeddingModel } from "effect/unstable/ai"

const search = Effect.gen(function* () {
  const model = yield* EmbeddingModel.EmbeddingModel

  // Three independent embed calls running concurrently...
  const [a, b, c] = yield* Effect.all(
    [model.embed("apple"), model.embed("banana"), model.embed("cherry")],
    { concurrency: "unbounded" }
  )

  // ...are coalesced into ONE provider `embedMany(["apple","banana","cherry"])`
  // call. Vectors come back in request order.
  return [a.vector, b.vector, c.vector]
})
```
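
To make the coalescing idea concrete, here is a minimal, library-free sketch of the pattern. It is not how `EmbeddingModel.make` is implemented: `MiniBatcher`, `enqueue`, and `flush` are hypothetical names, and the explicit `flush()` call stands in for the scheduling that Effect's request resolver handles for you.

```ts
type Vector = ReadonlyArray<number>
type BatchFn = (inputs: ReadonlyArray<string>) => ReadonlyArray<Vector>

interface Pending {
  readonly input: string
  readonly onResult: (vector: Vector) => void
}

// Hypothetical helper: queues single-input requests and resolves all of
// them with ONE call to the batch function when `flush` runs.
class MiniBatcher {
  private pending: Array<Pending> = []
  private readonly embedMany: BatchFn

  constructor(embedMany: BatchFn) {
    this.embedMany = embedMany
  }

  enqueue(input: string, onResult: (vector: Vector) => void): void {
    this.pending.push({ input, onResult })
  }

  flush(): void {
    const batch = this.pending
    this.pending = []
    if (batch.length === 0) return // nothing queued: skip the provider
    const vectors = this.embedMany(batch.map((p) => p.input))
    // Hand each caller the vector found at its own position.
    batch.forEach((p, i) => p.onResult(vectors[i]))
  }
}

// Stub provider that records every batch it receives.
const providerCalls: Array<ReadonlyArray<string>> = []
const batcher = new MiniBatcher((inputs) => {
  providerCalls.push(inputs)
  return inputs.map((s) => [s.length])
})

const received: Array<Vector> = []
batcher.enqueue("apple", (v) => { received.push(v) })
batcher.enqueue("banana", (v) => { received.push(v) })
batcher.enqueue("cherry", (v) => { received.push(v) })
batcher.flush()
// providerCalls.length is 1; received is [[5], [6], [6]]
```

Three queued requests reach the stub provider as a single batch, and each caller still gets back its own vector.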

Because each `embed` is a [request](https://effect.plants.sh/batching/), results can also be **cached**.
The service exposes its underlying `resolver`, so you can wrap it with
[`RequestResolver.withCache`](https://effect.plants.sh/batching/) and issue requests through
`Effect.request` directly — embedding the same string twice then hits the cache
instead of the provider.

```ts
import { Effect, RequestResolver } from "effect"
import { EmbeddingModel } from "effect/unstable/ai"

const cached = Effect.gen(function* () {
  const model = yield* EmbeddingModel.EmbeddingModel

  // Wrap the resolver in an in-memory cache (keyed by the request value).
  const cachedResolver = yield* RequestResolver.withCache(model.resolver, {
    capacity: 256
  })

  const embed = (input: string) =>
    Effect.request(new EmbeddingModel.EmbeddingRequest({ input }), cachedResolver)

  // The second identical request is served from the cache.
  const first = yield* embed("repeat me")
  const second = yield* embed("repeat me")
  return [first.vector, second.vector]
})
```
**Positional contract:** Provider responses are interpreted **by position**: the result array must
  contain exactly one vector per input, in the same order. A mismatched count
  fails with `AiError.InvalidOutputError`.
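
The positional rule can be pictured as a zip with a length check. The sketch below is a plain TypeScript illustration, not the library's code: `pairByPosition` is a hypothetical helper, and it throws a generic `Error` where the real service fails with `AiError.InvalidOutputError`.

```ts
type Vector = ReadonlyArray<number>

// Hypothetical helper: pair each input with the vector at the same index,
// rejecting responses whose length does not match the input count.
const pairByPosition = (
  inputs: ReadonlyArray<string>,
  results: ReadonlyArray<Vector>
): Array<{ input: string; vector: Vector }> => {
  if (results.length !== inputs.length) {
    throw new Error(
      `expected ${inputs.length} vectors, received ${results.length}`
    )
  }
  return inputs.map((input, i) => ({ input, vector: results[i] }))
}

const paired = pairByPosition(["a", "bb"], [[1], [2]])
// paired[1] is { input: "bb", vector: [2] }

let mismatch = ""
try {
  pairByPosition(["a", "bb"], [[1]]) // one vector short
} catch (error) {
  mismatch = (error as Error).message
}
// mismatch is "expected 2 vectors, received 1"
```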

## A small semantic search

Putting it together: embed a corpus once with `embedMany`, embed the query with
`embed`, and rank documents by cosine similarity.

```ts
import { Array, Effect } from "effect"
import { EmbeddingModel } from "effect/unstable/ai"

const cosine = (a: ReadonlyArray<number>, b: ReadonlyArray<number>) => {
  let dot = 0
  let na = 0
  let nb = 0
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i]
    na += a[i] * a[i]
    nb += b[i] * b[i]
  }
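  const denom = Math.sqrt(na) * Math.sqrt(nb)
  // A zero vector has no direction; score it 0 instead of returning NaN.
  return denom === 0 ? 0 : dot / denom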
}

const semanticSearch = (query: string, documents: ReadonlyArray<string>) =>
  Effect.gen(function* () {
    const model = yield* EmbeddingModel.EmbeddingModel

    const { embeddings } = yield* model.embedMany(documents)
    const { vector } = yield* model.embed(query)

    return Array.map(documents, (doc, i) => ({
      doc,
      score: cosine(vector, embeddings[i].vector)
    })).sort((x, y) => y.score - x.score)
    // => documents ranked most-similar first
  })
```
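
When the same corpus is queried repeatedly, a common refinement is to scale every document vector to unit length once, so that each comparison reduces to a dot product, and to keep only the best few hits. The following is a sketch in plain TypeScript with toy 2-dimensional vectors standing in for real embeddings; `normalize`, `dot`, and `topK` are hypothetical helpers, not part of `EmbeddingModel`.

```ts
type Vector = ReadonlyArray<number>

// Scale a vector to unit length. A zero vector is returned unchanged
// to avoid dividing by zero.
const normalize = (v: Vector): Array<number> => {
  const norm = Math.sqrt(v.reduce((sum, x) => sum + x * x, 0))
  return norm === 0 ? [...v] : v.map((x) => x / norm)
}

// Dot product; equals cosine similarity when both inputs are unit length.
const dot = (a: Vector, b: Vector): number =>
  a.reduce((sum, x, i) => sum + x * b[i], 0)

// Hypothetical helper: return the `k` best-scoring documents.
const topK = (
  query: Vector,
  corpus: ReadonlyArray<{ doc: string; unit: Vector }>,
  k: number
): Array<{ doc: string; score: number }> => {
  const q = normalize(query)
  return corpus
    .map(({ doc, unit }) => ({ doc, score: dot(q, unit) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, k)
}

// Normalize the corpus once, up front.
const corpus = [
  { doc: "east", unit: normalize([2, 0]) },
  { doc: "north", unit: normalize([0, 3]) },
  { doc: "north-east", unit: normalize([1, 1]) }
]

const best = topK([5, 0], corpus, 2)
// best[0].doc is "east" (score 1); best[1].doc is "north-east"
```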

---

## API reference

Everything below is exported from `effect/unstable/ai/EmbeddingModel`.

### EmbeddingModel

The `Context.Service` tag for embedding operations. Yield it to obtain a
[`Service`](#service) with `embed`, `embedMany`, and the underlying `resolver`.

```ts
import { Effect } from "effect"
import { EmbeddingModel } from "effect/unstable/ai"

const program = Effect.gen(function* () {
  const model = yield* EmbeddingModel.EmbeddingModel
  return yield* model.embed("hello")
})
// program requires: EmbeddingModel.EmbeddingModel
```

### Dimensions

A separate `Context.Service` (a `number`) carrying the configured embedding
vector size. Provider helpers like `OpenAiEmbeddingModel.model(...)` provide it
alongside the model; downstream code (e.g. a vector store schema) can read it.

```ts
import { Effect, Layer } from "effect"
import { EmbeddingModel } from "effect/unstable/ai"

const program = Effect.gen(function* () {
  const size = yield* EmbeddingModel.Dimensions
  return size // => 1536
}).pipe(Effect.provide(Layer.succeed(EmbeddingModel.Dimensions, 1536)))
```
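
One way downstream code might use the configured size is to reject vectors of the wrong length before they reach a vector store. A minimal sketch in plain TypeScript: `checkDimensions` is a hypothetical helper, and its `expected` argument stands for the value you would read from `EmbeddingModel.Dimensions`.

```ts
// Hypothetical guard: passes a vector through when its length matches the
// configured size, and throws otherwise.
const checkDimensions = (
  vector: ReadonlyArray<number>,
  expected: number
): ReadonlyArray<number> => {
  if (vector.length !== expected) {
    throw new Error(
      `expected ${expected} dimensions, received ${vector.length}`
    )
  }
  return vector
}

const accepted = checkDimensions([0.1, 0.2, 0.3], 3)

let rejected = ""
try {
  checkDimensions([0.1, 0.2], 3) // one component short
} catch (error) {
  rejected = (error as Error).message
}
// rejected is "expected 3 dimensions, received 2"
```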

### Service

The interface behind the `EmbeddingModel` tag. `embed` resolves one input,
`embedMany` resolves a batch, and `resolver` is the low-level
`RequestResolver<EmbeddingRequest>` that `embed` is built on.

```ts
import type { Effect, RequestResolver } from "effect"
import type { AiError, EmbeddingModel } from "effect/unstable/ai"

// Shape (for reference):
interface Service {
  readonly resolver: RequestResolver.RequestResolver<EmbeddingModel.EmbeddingRequest>
  readonly embed: (
    input: string
  ) => Effect.Effect<EmbeddingModel.EmbedResponse, AiError.AiError>
  readonly embedMany: (
    input: ReadonlyArray<string>
  ) => Effect.Effect<EmbeddingModel.EmbedManyResponse, AiError.AiError>
}
```

### make

Builds a `Service` from a single provider `embedMany` function. It wires up a
request resolver so concurrent `embed` calls batch into one provider call, and
short-circuits `embedMany([])` without invoking the provider. Returns
`Effect<Service>` (typically wrapped in `Layer.effect`).

```ts
import { Effect, Layer } from "effect"
import { EmbeddingModel } from "effect/unstable/ai"

const layer = Layer.effect(
  EmbeddingModel.EmbeddingModel,
  EmbeddingModel.make({
    embedMany: ({ inputs }) =>
      Effect.succeed({
        results: inputs.map((s) => [s.length]),
        usage: { inputTokens: undefined }
      })
  })
)
// layer: Layer<EmbeddingModel.EmbeddingModel>
```
**Only embedMany is required:** `make` takes exactly one option: the provider `embedMany`. Single-input
  `embed`, batching, and result ordering are derived for you, and there is no
  separate `embed` parameter to supply. Caching is opt-in through the exposed
  `resolver`.

### EmbedResponse

A `Schema.Class` for a single embedding result. Its only field is `vector`, an
array of finite numbers.

```ts
import { EmbeddingModel } from "effect/unstable/ai"

const r = new EmbeddingModel.EmbedResponse({ vector: [0.1, 0.2, 0.3] })
r.vector // => [0.1, 0.2, 0.3]
```

### EmbedManyResponse

A `Schema.Class` for a batch result. `embeddings` is an array of
[`EmbedResponse`](#embedresponse) in input order, and `usage` is an
[`EmbeddingUsage`](#embeddingusage).

```ts
import { EmbeddingModel } from "effect/unstable/ai"

const r = new EmbeddingModel.EmbedManyResponse({
  embeddings: [
    new EmbeddingModel.EmbedResponse({ vector: [1, 2] }),
    new EmbeddingModel.EmbedResponse({ vector: [3, 4] })
  ],
  usage: new EmbeddingModel.EmbeddingUsage({ inputTokens: 9 })
})

r.embeddings.length // => 2
r.embeddings[1].vector // => [3, 4]
r.usage.inputTokens // => 9
```

### EmbeddingUsage

A `Schema.Class` holding token usage metadata. `inputTokens` is `number |
undefined` — `undefined` when the provider does not report usage (or when
`embedMany([])` skips the provider).

```ts
import { EmbeddingModel } from "effect/unstable/ai"

new EmbeddingModel.EmbeddingUsage({ inputTokens: 42 }).inputTokens // => 42
new EmbeddingModel.EmbeddingUsage({ inputTokens: undefined }).inputTokens // => undefined
```
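
Because `inputTokens` may be `undefined`, code that totals usage across several `embedMany` calls has to decide what an unreported value means. The sketch below takes the conservative view that a partial sum would undercount, so the total is `undefined` as soon as any batch lacks a number. `totalInputTokens` and the local `Usage` type are hypothetical and only mirror the `inputTokens` field described above.

```ts
// Local stand-in for the shape of `EmbeddingUsage`.
type Usage = { readonly inputTokens: number | undefined }

// Hypothetical helper: sum reported tokens, or give up with `undefined`
// if any batch did not report usage.
const totalInputTokens = (
  usages: ReadonlyArray<Usage>
): number | undefined => {
  let total = 0
  for (const { inputTokens } of usages) {
    if (inputTokens === undefined) return undefined
    total += inputTokens
  }
  return total
}

const known = totalInputTokens([{ inputTokens: 9 }, { inputTokens: 42 }])
// known is 51

const partial = totalInputTokens([
  { inputTokens: 9 },
  { inputTokens: undefined }
])
// partial is undefined
```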

### EmbeddingRequest

A `Request.TaggedClass` representing one input to be embedded. It resolves to an
[`EmbedResponse`](#embedresponse) and can fail with `AiError`. You build these
only when working directly with the [`resolver`](#service); `embed` does it for
you.

```ts
import { Effect } from "effect"
import { EmbeddingModel } from "effect/unstable/ai"

const program = Effect.gen(function* () {
  const { resolver } = yield* EmbeddingModel.EmbeddingModel
  // Issue a request directly against the resolver (what `embed` does internally).
  const response = yield* Effect.request(
    new EmbeddingModel.EmbeddingRequest({ input: "hello" }),
    resolver
  )
  return response.vector // => number[]
})
```

### ProviderOptions

The input a provider's `embedMany` receives: `{ inputs: ReadonlyArray<string> }`.
This is the only argument your provider implementation is handed.

```ts
import type { EmbeddingModel } from "effect/unstable/ai"

const options: EmbeddingModel.ProviderOptions = {
  inputs: ["a", "b", "c"]
}
options.inputs // => ["a", "b", "c"]
```

### ProviderResponse

The value a provider's `embedMany` must return: `results`, an array of raw
numeric vectors (one per input, in order), and `usage.inputTokens`
(`number | undefined`).

```ts
import type { EmbeddingModel } from "effect/unstable/ai"

const response: EmbeddingModel.ProviderResponse = {
  results: [
    [0.1, 0.2],
    [0.3, 0.4]
  ],
  usage: { inputTokens: 8 }
}
response.results.length // => 2
response.usage.inputTokens // => 8
```

## See also

- [Language Model](https://effect.plants.sh/ai/language-model/) — text generation, structured output,
  and streaming against a provider-agnostic model.
- [Batching](https://effect.plants.sh/batching/) — the request/resolver machinery that powers `embed`
  batching and caching.
- [Services and Layers](https://effect.plants.sh/services-and-layers/) — how the `EmbeddingModel` and
  `Dimensions` services are provided.