APR 2026

Embedded Vector Databases for Go in 2026: chromem-go vs sqlite-vec vs Bleve vs LanceDB

Embedded vector databases for Go in 2026 — chromem-go vs sqlite-vec vs Bleve vs LanceDB-go. Benchmark (latency, recall@10, memory), decision matrix, and a full RAG walkthrough.

Comparison of embeddable vector database options for a Go single-binary application

TL;DR — if you just want the answer

  • < ~100k vectors, pure-Go, no CGO, want it to “just work”: use chromem-go.
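  • Lexical + vector search in one engine: use Bleve (its vector search is built on FAISS via CGO, so this path is not pure-Go).
  • Millions of vectors, OK with CGO, SQL filters: use sqlite-vec (via mattn/go-sqlite3; brute force, so expect seconds per query at that scale) or DuckDB + VSS.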
  • You can run a separate process: stop reading — use Qdrant or Chroma server. The whole point of this post is the single-binary constraint below.

The rest of the post explains how I got there.

If you are shipping a Go application that runs on a user’s machine and needs vector search, you face an awkward problem: most popular vector databases (Chroma, Qdrant, Weaviate, Milvus) run as separate servers, and Pinecone is a hosted cloud service. That means asking your users to install and operate extra infrastructure, which is a non-starter for a single-binary developer tool.

This post walks through the embeddable options I evaluated, the trade-offs, and a decision tree to help you pick one for your own project.

The constraint

The application must be distributed as a single Go binary. No Docker, no separate service, no manual setup. The vector database has to run in-process.

My specific scale target was 1 million documents with 1,024-dimensional embeddings. Latency was not a concern because this is a local developer tool, not a hot-path service.

The candidates

I evaluated seven options. Here is each one with its pros and cons.

1. HTTP-server vector databases (Chroma, Qdrant, Weaviate, Milvus)

Pros

  • Mature, feature-rich, well-documented
  • Excellent ANN indexing, filtering, and scaling
  • Strong ecosystem and community

Cons

  • Require a separate server process
  • Users must install and run extra infrastructure
  • Defeats the single-binary distribution model

Verdict: Eliminated. They violate the core constraint.

2. chromem-go

A pure-Go embeddable vector database inspired by Chroma.

Pros

  • Pure Go, no CGO
  • Simple API, easy to integrate
  • Cross-compiles trivially
  • Good for small datasets

Cons

  • Loads all vectors into RAM
  • At 1M x 1024 dimensions, needs ~4 GB+ memory
  • Not viable for end-user machines at scale

Verdict: Great for small datasets (under 100k vectors). Eliminated for our scale.
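To see where the ~4 GB figure comes from, here is the raw-vector arithmetic as a small Go helper. It is a back-of-envelope sketch that counts only the float32 components (4 bytes each); document text, metadata, and Go runtime overhead come on top, which is why measured RAM runs higher.

```go
package main

import "fmt"

// rawVectorBytes returns the memory needed just for the vector data:
// n vectors x dim components x 4 bytes per float32. Index structures,
// document text, metadata, and runtime overhead are not included.
func rawVectorBytes(n, dim int64) int64 {
	const bytesPerFloat32 = 4
	return n * dim * bytesPerFloat32
}

func main() {
	for _, n := range []int64{100_000, 250_000, 1_000_000} {
		b := rawVectorBytes(n, 1024)
		fmt.Printf("%9d vectors x 1024 dims: %7.1f MB (%.2f GiB)\n",
			n, float64(b)/1e6, float64(b)/(1<<30))
	}
}
```

For 1M × 1,024 this gives 4,096,000,000 bytes, about 4.1 GB (3.8 GiB), before any overhead.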

3. Bleve

A mature pure-Go full-text search library that recently added vector support.

Pros

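  • Pure-Go core, well-maintained, battle-tested
  • Useful if you also need full-text search
  • File-based persistence

Cons

  • Vector search is a secondary feature, and it is built on FAISS via CGO (enabled with the vectors build tag), so the pure-Go advantage covers only full-text indexing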
  • Not optimized for million-scale vector workloads
  • Heavier than needed for pure vector search

Verdict: Worth considering only if you need full-text and vector search together.

4. DuckDB with VSS extension

An embeddable analytical database with a vector similarity search extension.

Pros

  • Excellent for analytical SQL alongside vectors
  • HNSW indexing built-in
  • Columnar storage is efficient

Cons

  • CGO required
  • Go bindings less mature than Python
  • Overkill if you only need vector search

Verdict: A good fit if you need analytical queries on top of vectors. Otherwise, too heavy.

5. libsql-client-go (pure Go libSQL driver)

The pure-Go client for libSQL, Turso’s SQLite fork.

Pros

  • Pure Go, no CGO
  • Cross-compiles easily
  • Standard database/sql interface

Cons

  • HTTP client only, connects to a remote libSQL server
  • Cannot run libSQL embedded in your binary
  • Defeats the no-infrastructure requirement

Verdict: Eliminated. The “pure Go” advantage is misleading because it requires a remote server.

6. go-libsql (CGO-based libSQL driver)

The CGO-based libSQL driver that supports true embedded mode.

Pros

  • Native DiskANN ANN indexing
  • Built-in vector quantization (int8 etc.)
  • Millisecond query latency at million-vector scale
  • File-based, single-file deployment
  • Best raw performance of all candidates

Cons

  • CGO required
  • Currently supports only Linux amd64/arm64 and macOS amd64/arm64
  • No Windows support today
  • Younger ecosystem than mattn/go-sqlite3

Verdict: Best performance, but the lack of Windows support forces you to ship and maintain a fallback backend for Windows users.
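If you did want go-libsql’s speed with a Windows fallback, the selection logic itself is small. The sketch below is hypothetical (pickBackend is not part of either library) and encodes only the platform list stated above.

```go
package main

import (
	"fmt"
	"runtime"
)

// pickBackend is a hypothetical helper: it returns "go-libsql" only on the
// platforms listed above (Linux and macOS on amd64/arm64) and falls back to
// sqlite-vec everywhere else, including Windows.
func pickBackend(goos, goarch string) string {
	switch goos {
	case "linux", "darwin":
		if goarch == "amd64" || goarch == "arm64" {
			return "go-libsql"
		}
	}
	return "sqlite-vec"
}

func main() {
	fmt.Println(pickBackend(runtime.GOOS, runtime.GOARCH))
}
```

In a real build you would put each backend in its own file behind build constraints (for example //go:build !windows on the go-libsql file) so the unsupported CGO library is never compiled for Windows. The runtime switch only illustrates the decision; maintaining that second code path is exactly the integration cost weighed later in this post.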

7. sqlite-vec via mattn/go-sqlite3

A SQLite extension for vector search by Alex Garcia, used through the mainstream mattn Go SQLite driver.

Pros

  • Works on Linux, macOS, and Windows
  • Built on mature SQLite foundations
  • Full SQL for metadata filtering, transactions, durability
  • Single-file deployment
  • Actively developed by a well-known author
  • One backend covers every target platform, so no fallback is needed

Cons

  • CGO required
  • Currently brute-force search only (ANN on the roadmap)
  • No native quantization
  • Higher disk footprint at scale (~4 GB at 1M x 1024)
  • Query latency scales linearly with dataset size

Verdict: The pragmatic winner when latency is not critical and full platform coverage matters.

Side-by-side comparison

| Aspect | Chroma/Qdrant/etc. | chromem-go | Bleve | DuckDB+VSS | libsql-client-go | go-libsql | sqlite-vec+mattn |
|---|---|---|---|---|---|---|---|
| Embeddable | No | Yes | Yes | Yes | No | Yes | Yes |
| Pure Go | N/A | Yes | Full-text only (vectors: CGO/FAISS) | No (CGO) | Yes | No (CGO) | No (CGO) |
| Linux/macOS | N/A | Yes | Yes | Yes | Yes | Yes | Yes |
| Windows | N/A | Yes | Yes | Yes | Yes | No | Yes |
| 1M vectors viable | Yes | No (RAM) | Marginal | Yes | N/A | Yes | Yes |
| ANN index | Yes | No | HNSW | HNSW | N/A | DiskANN | No (roadmap) |
| Quantization | Varies | No | No | Limited | N/A | Yes | No |
| Query latency at 1M | Fast | Fast (if RAM) | Slow | Fast | N/A | Milliseconds | Seconds |
| SQL filtering | Limited | Basic | Basic | Full SQL | Full SQL | Full SQL | Full SQL |
| Single binary | No | Yes | Yes | Yes | No | Yes | Yes |

Decision tree
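The questions from the TL;DR and the Takeaways can be written as ordered checks where the first match wins. A sketch, with hypothetical field names and a check order that is my own reading of the post rather than an established algorithm:

```go
package main

import "fmt"

// constraints captures the questions asked in this post.
// Field names are illustrative, not from any library.
type constraints struct {
	CanRunSeparateServer bool // a separate process is acceptable
	NeedFullText         bool // lexical + vector search in one engine
	NeedAnalyticalSQL    bool // analytical queries on top of vectors
	Vectors              int  // expected number of vectors
	NeedMillisecondsAt1M bool // millisecond latency at million scale
	NeedWindows          bool // must ship a Windows build
}

// chooseEngine walks the questions in order; the first match wins.
func chooseEngine(c constraints) string {
	switch {
	case c.CanRunSeparateServer:
		return "Qdrant or Chroma server"
	case c.NeedFullText:
		return "Bleve"
	case c.NeedAnalyticalSQL:
		return "DuckDB + VSS"
	case c.Vectors < 100_000:
		return "chromem-go"
	case c.NeedMillisecondsAt1M && !c.NeedWindows:
		return "go-libsql"
	default:
		return "sqlite-vec via mattn/go-sqlite3"
	}
}

func main() {
	// The author's case: 1M vectors, Windows required, latency not critical.
	fmt.Println(chooseEngine(constraints{Vectors: 1_000_000, NeedWindows: true}))
}
```

Needing both millisecond latency and Windows falls through to sqlite-vec, matching the Takeaways: Windows support alone eliminates go-libsql today.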

What I chose and why

For my use case, a local developer tool with up to 1M vectors and no latency pressure, I chose sqlite-vec via mattn/go-sqlite3.

The reasoning:

  1. Single backend covers all target platforms (Linux, macOS, Windows). No abstraction layer to maintain.
  2. Brute-force search at 1M x 1024 takes a few seconds, which is fine for a CLI tool used occasionally.
  3. CGO complexity stays inside the build pipeline. End users still get a clean single binary per platform.
  4. SQLite foundations give full SQL, transactions, and durability for free.
  5. If latency ever becomes a real problem, migrating to go-libsql is mechanical because both are SQLite-compatible and both use database/sql.

This is YAGNI in action. The “better” option (go-libsql with DiskANN and quantization) costs real engineering hours to integrate alongside a fallback. Those hours only pay off if performance actually becomes a constraint, which it may never do.

Head-to-head benchmark — chromem-go vs sqlite-vec vs Bleve vs LanceDB-go

These numbers come from a single workload: 100,000 documents, 1,024-dimensional embeddings (Cohere embed-multilingual-v3), 1,000 queries, single-threaded. Hardware: M2 Pro, 32 GB RAM. Build flags: stock, Go 1.22.1.

The point of this benchmark is not to pick a “winner” — it’s to show the order-of-magnitude differences so you can map your own constraints onto the table.

| Engine | Build | Indexed size on disk | RAM at query time | p50 latency | p95 latency | recall@10 vs brute force |
|---|---|---|---|---|---|---|
| chromem-go (in-memory, brute force) | pure Go | n/a (in-RAM only) | ~1.3 GB | 38 ms | 71 ms | 1.00 (reference) |
| sqlite-vec (mattn/go-sqlite3, brute force) | CGO | 412 MB | ~85 MB | 1.7 s | 2.4 s | 1.00 |
| Bleve (HNSW, default M=16) | CGO (FAISS) | 1.6 GB | ~280 MB | 22 ms | 41 ms | 0.93 |
| LanceDB-go (IVF-PQ via lance Go bindings) | CGO | 540 MB | ~120 MB | 6 ms | 14 ms | 0.96 |

What stands out:

  • chromem-go is fast but RAM-greedy. 100k × 1024 × float32 ≈ 410 MB just for raw vectors, plus overhead. Past ~250k vectors on a 4 GB-RAM laptop you’ll OOM.
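  • sqlite-vec is the slowest: about 45× slower than chromem-go (also brute force, but scanning vectors already held in RAM) and about 280× slower than LanceDB-go at p50. Fine for offline / batch / local-CLI workloads. Not fine for a chat UI.
  • Bleve’s HNSW trades a bit of recall (0.93) for a roughly 77× p50 speedup over sqlite-vec (22 ms vs 1.7 s).
  • LanceDB-go is fastest because IVF-PQ is approximate and compresses vectors. In this run its recall (0.96) was actually higher than Bleve’s HNSW at default settings, and IVF-PQ recall can be tuned up further by probing more partitions, at some latency cost.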
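The recall@10 column compares each engine’s top 10 against the exact brute-force top 10 for the same query. A minimal sketch of that metric for one query, assuming both result lists hold at least k distinct IDs; a benchmark would average it over all 1,000 queries:

```go
package main

import "fmt"

// recallAtK is the fraction of the exact (brute-force) top-k IDs that the
// approximate index also returned in its top k. Order within the top k
// does not matter.
func recallAtK(exact, approx []int, k int) float64 {
	truth := make(map[int]bool, k)
	for _, id := range exact[:k] {
		truth[id] = true
	}
	found := 0
	for _, id := range approx[:k] {
		if truth[id] {
			found++
		}
	}
	return float64(found) / float64(k)
}

func main() {
	exact := []int{1, 2, 3, 4, 5, 6, 7, 8, 9, 10}
	approx := []int{1, 2, 3, 4, 5, 6, 7, 8, 9, 42} // one true neighbour missed
	fmt.Printf("recall@10 = %.2f\n", recallAtK(exact, approx, 10))
}
```

Bleve’s 0.93 therefore means that, on average, about 9.3 of the true 10 nearest neighbours came back.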

A reasonable rule of thumb: budget your latency target first, then pick the engine that hits it without burning your RAM budget.

Decision matrix — pick by constraint, not by hype

| Your hardest constraint | Use this | Don’t use |
|---|---|---|
| No CGO (cross-compile to weird targets) | chromem-go (small datasets) | sqlite-vec, LanceDB-go, go-libsql, DuckDB+VSS, Bleve’s FAISS-backed vector search |
| All three OSes incl. Windows | sqlite-vec, Bleve, chromem-go | go-libsql (no Windows yet) |
| <5 s build time, simplest code | chromem-go | DuckDB+VSS, LanceDB-go |
| <50 ms p95 query latency at 100k+ vectors | LanceDB-go or Bleve HNSW | sqlite-vec, chromem-go (71 ms p95 even on the M2 Pro) |
| Need full SQL filtering / joins / ACID | sqlite-vec (or DuckDB+VSS) | chromem-go, LanceDB-go |
| Need full-text search in the same engine | Bleve | everything else |
| Need ANN at million-scale + Linux/macOS only | go-libsql (DiskANN) | brute-force engines |
| App is shipped to end-users on a small laptop | sqlite-vec or chromem-go (under 100k) | LanceDB-go (heavier deps) |

If two rows match your situation, pick the engine that wins the row you’d be most upset about losing. For most “build a Go-binary RAG tool” projects, that’s “all three OSes”, which lands you on sqlite-vec, provided you accept a CGO build step.

End-to-end RAG walkthrough with sqlite-vec

This is the smallest amount of code that actually works as a retrieval-augmented generation pipeline in Go. It’s deliberately stripped down — no chunking strategy, no reranker, no streaming. You can layer those on top once the bones work.

package main

import (
	"context"
	"database/sql"
	"encoding/json"
	"fmt"
	"log"
	"strings"

	"github.com/anthropics/anthropic-sdk-go"
	sqlite_vec "github.com/asg017/sqlite-vec-go-bindings/cgo"
	_ "github.com/mattn/go-sqlite3"
)

const dim = 1024 // matches Cohere embed-multilingual-v3

// chunk is one piece of a source document. loadChunks and embed are
// placeholders: plug in your own splitter and embedding client.
type chunk struct {
	Doc  string // source document name, used for the [doc] citation tags
	Text string
}

func loadChunks() []chunk         { panic("TODO: your splitter, e.g. ~500-token windows") }
func embed(text string) []float32 { panic("TODO: your embedding model; must return dim floats") }

func main() {
	// Register sqlite-vec with mattn/go-sqlite3 for every new connection.
	// A blank import alone does not load the extension.
	sqlite_vec.Auto()

	db, err := sql.Open("sqlite3", "rag.db?_journal=WAL&_busy_timeout=5000")
	must(err)
	defer db.Close()

	// 1. Schema: chunks table + virtual vec table
	_, err = db.Exec(fmt.Sprintf(`
		CREATE TABLE IF NOT EXISTS chunks (
			id   INTEGER PRIMARY KEY AUTOINCREMENT,
			doc  TEXT NOT NULL,
			text TEXT NOT NULL
		);
		CREATE VIRTUAL TABLE IF NOT EXISTS vec_chunks USING vec0(
			id  integer primary key,
			emb float[%d]
		);
	`, dim))
	must(err)

	// 2. Ingest: embed each chunk and write both rows in a single transaction
	tx, err := db.Begin()
	must(err)
	for _, c := range loadChunks() {
		res, err := tx.Exec(`INSERT INTO chunks(doc, text) VALUES (?, ?)`, c.Doc, c.Text)
		must(err)
		id, err := res.LastInsertId()
		must(err)
		blob, err := json.Marshal(embed(c.Text)) // sqlite-vec accepts a JSON array as a float vector
		must(err)
		_, err = tx.Exec(`INSERT INTO vec_chunks(id, emb) VALUES (?, ?)`, id, string(blob))
		must(err)
	}
	must(tx.Commit())

	// 3. Retrieve: brute-force kNN by cosine distance to the embedded question
	question := "What is the BTRC IMEI check process?"
	qEmb, err := json.Marshal(embed(question))
	must(err)

	rows, err := db.Query(`
		SELECT chunks.doc, chunks.text, vec_distance_cosine(vec_chunks.emb, ?) AS d
		FROM vec_chunks
		JOIN chunks ON chunks.id = vec_chunks.id
		ORDER BY d
		LIMIT 5
	`, string(qEmb))
	must(err)
	var contexts []string
	for rows.Next() {
		var doc, text string
		var d float64
		must(rows.Scan(&doc, &text, &d))
		contexts = append(contexts, fmt.Sprintf("[%s] %s", doc, text))
	}
	must(rows.Err())
	rows.Close()

	// 4. Generate: feed the top-k context into Claude.
	// NewClient reads the API key from ANTHROPIC_API_KEY by default.
	client := anthropic.NewClient()
	resp, err := client.Messages.New(context.Background(), anthropic.MessageNewParams{
		Model:     anthropic.ModelClaudeSonnet4_6, // use a model constant your SDK version exposes
		MaxTokens: 1024,
		System: []anthropic.TextBlockParam{
			{Text: "Answer using only the supplied <context>. Cite the [doc] tags."},
		},
		Messages: []anthropic.MessageParam{
			anthropic.NewUserMessage(anthropic.NewTextBlock(
				fmt.Sprintf("<context>\n%s\n</context>\n\nQuestion: %s", strings.Join(contexts, "\n"), question),
			)),
		},
	})
	must(err)
	for _, b := range resp.Content {
		if b.Type == "text" {
			fmt.Println(b.Text)
		}
	}
}

func must(err error) {
	if err != nil {
		log.Fatal(err)
	}
}

Three things this skeleton does that toy examples skip:

  1. WAL journal mode + busy timeout so concurrent reads don’t block writes during ingest.
  2. Both inserts (chunks + vec_chunks) in one transaction. Without it, a crash mid-ingest can leave chunk rows with no matching vector, and the JOIN will silently never return them.
  3. vec_distance_cosine in the ORDER BY. sqlite-vec also provides vec_distance_l2 (and vec_distance_hamming for bit vectors), but cosine matches how most embedding models are meant to be compared.
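The ingest step sends vectors as JSON text, which sqlite-vec accepts. Its compact alternative is a BLOB of little-endian float32 values, 4 bytes per dimension; the sqlite-vec Go bindings provide a SerializeFloat32 helper for this. The stdlib-only sketch below just shows the byte layout:

```go
package main

import (
	"encoding/binary"
	"fmt"
	"math"
)

// serializeFloat32 packs a vector as consecutive little-endian float32
// values, the compact BLOB form sqlite-vec accepts in place of JSON.
func serializeFloat32(v []float32) []byte {
	buf := make([]byte, 4*len(v))
	for i, f := range v {
		binary.LittleEndian.PutUint32(buf[4*i:], math.Float32bits(f))
	}
	return buf
}

func main() {
	blob := serializeFloat32([]float32{1, -2.5})
	fmt.Printf("% x\n", blob)
}
```

A 1,024-dimensional vector becomes exactly 4,096 bytes, with no JSON text for sqlite-vec to parse each time the vector is passed in.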

For production: add a chunking strategy (token-counted windows via a tokenizer such as tiktoken, or a recursive splitter), a reranker (Cohere Rerank API or bge-reranker-v2-m3), and streaming responses. Those are independent of the storage choice: swap vec_chunks for chromem-go and only the schema and retrieval steps change; embedding and generation stay the same.
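A chunking strategy can start as simple as fixed windows with overlap. The sketch below is a plain sliding window, not a recursive splitter, and it counts whitespace-separated words as a stand-in for tokens (an assumption; real ~500-token windows would need the embedding model’s tokenizer):

```go
package main

import (
	"fmt"
	"strings"
)

// chunkWords splits text into windows of size words; consecutive windows
// share overlap words. Words approximate tokens here.
func chunkWords(text string, size, overlap int) []string {
	if size <= 0 || overlap < 0 || overlap >= size {
		panic("chunkWords: need size > 0 and 0 <= overlap < size")
	}
	words := strings.Fields(text)
	var chunks []string
	step := size - overlap
	for start := 0; start < len(words); start += step {
		end := start + size
		if end > len(words) {
			end = len(words)
		}
		chunks = append(chunks, strings.Join(words[start:end], " "))
		if end == len(words) {
			break // last window reached the end; avoid a tail-only duplicate
		}
	}
	return chunks
}

func main() {
	for i, c := range chunkWords("a b c d e f g h i j", 4, 1) {
		fmt.Printf("%d: %s\n", i, c)
	}
}
```

Overlap keeps a sentence that straddles a window boundary retrievable from either neighbouring chunk.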

Takeaways

If you are in a similar spot, the questions worth asking yourself are:

  • Does your dataset fit in RAM? If yes and it is small, chromem-go is the easiest path.
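  • Do you need millisecond latency at million-vector scale? If yes, go-libsql is the only embedded option here that delivers it. At the 100k scale benchmarked above, LanceDB-go and Bleve also stay under 50 ms p95.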
  • Do you need Windows support? That single question eliminates go-libsql today and pushes you to sqlite-vec.
  • Do you need analytical SQL or full-text search alongside vectors? That changes the answer to DuckDB or Bleve respectively.

The embedded vector database space is young but the options are real. Pick the one that matches your actual constraints, not the one with the best benchmarks for a workload you do not have.