Embedded Vector Databases for Go in 2026: chromem-go vs sqlite-vec vs Bleve vs LanceDB
Benchmark (latency, recall@10, memory), decision matrix, and a full RAG walkthrough.
TL;DR — if you just want the answer
- < ~100k vectors, pure-Go, no CGO, want it to “just work”: use chromem-go.
- Lexical + vector search in one engine, pure-Go: use Bleve.
- Millions of vectors, OK with CGO, SQL filters: use sqlite-vec (via mattn/go-sqlite3) or DuckDB + VSS.
- You can run a separate process: stop reading and use Qdrant or a Chroma server. The whole point of this post is the single-binary constraint below.
The rest of the post explains how I got there.
If you are shipping a Go application that runs on a user’s machine and needs vector search, you face an awkward problem: most popular vector databases (Chroma, Qdrant, Weaviate, Milvus, Pinecone) run as separate servers. That means asking your users to install and operate extra infrastructure, which is a non-starter for a single-binary developer tool.
This post walks through the embeddable options I evaluated, the trade-offs, and a decision tree to help you pick one for your own project.
The constraint
The application must be distributed as a single Go binary. No Docker, no separate service, no manual setup. The vector database has to run in-process.
My specific scale target was 1 million documents at 1,024-dimensional embeddings. Latency was not a concern because it is a local developer tool, not a hot-path service.
The candidates
I evaluated seven options. Here is each one with its pros and cons.
1. HTTP-server vector databases (Chroma, Qdrant, Weaviate, Milvus)
Pros
- Mature, feature-rich, well-documented
- Excellent ANN indexing, filtering, and scaling
- Strong ecosystem and community
Cons
- Require a separate server process
- Users must install and run extra infrastructure
- Defeats the single-binary distribution model
Verdict: Eliminated. They violate the core constraint.
2. chromem-go
A pure-Go embeddable vector database inspired by Chroma.
Pros
- Pure Go, no CGO
- Simple API, easy to integrate
- Cross-compiles trivially
- Good for small datasets
Cons
- Loads all vectors into RAM
- At 1M x 1024 dimensions, needs ~4 GB+ memory
- Not viable for end-user machines at scale
Verdict: Great for small datasets (under 100k vectors). Eliminated for our scale.
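The ~4 GB figure follows directly from the vector shape. As a rough sketch (the function name `rawVectorBytes` is mine, not part of chromem-go), the raw float32 payload can be estimated like this; real usage will be higher once metadata and Go runtime overhead are added:

```go
package main

import "fmt"

// rawVectorBytes returns the bytes needed to hold n float32 vectors of the
// given dimension. It deliberately ignores index, metadata, and allocator
// overhead, so treat the result as a lower bound.
func rawVectorBytes(n, dim int) int64 {
	const bytesPerFloat32 = 4
	return int64(n) * int64(dim) * bytesPerFloat32
}

func main() {
	for _, n := range []int{100_000, 1_000_000} {
		b := rawVectorBytes(n, 1024)
		fmt.Printf("%d vectors x 1024 dims: %.2f GB raw\n", n, float64(b)/1e9)
	}
}
```

At 1M vectors the raw payload alone is 4,096,000,000 bytes, which is why an all-in-RAM engine is a poor fit for that scale on an end-user machine.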
3. Bleve
A mature pure-Go full-text search library that recently added vector support.
Pros
- Pure Go, well-maintained, battle-tested
- Useful if you also need full-text search
- File-based persistence
Cons
- Vector search is a secondary feature
- Not optimized for million-scale vector workloads
- Heavier than needed for pure vector search
Verdict: Worth considering only if you need full-text and vector search together.
4. DuckDB with VSS extension
An embeddable analytical database with a vector similarity search extension.
Pros
- Excellent for analytical SQL alongside vectors
- HNSW indexing built-in
- Columnar storage is efficient
Cons
- CGO required
- Go bindings less mature than Python
- Overkill if you only need vector search
Verdict: A good fit if you need analytical queries on top of vectors. Otherwise, too heavy.
5. libsql-client-go (pure Go libSQL driver)
The pure-Go client for libSQL, Turso’s SQLite fork.
Pros
- Pure Go, no CGO
- Cross-compiles easily
- Standard database/sql interface
Cons
- HTTP client only, connects to a remote libSQL server
- Cannot run libSQL embedded in your binary
- Defeats the no-infrastructure requirement
Verdict: Eliminated. The “pure Go” advantage is misleading because it requires a remote server.
6. go-libsql (CGO-based libSQL driver)
The CGO-based libSQL driver that supports true embedded mode.
Pros
- Native DiskANN ANN indexing
- Built-in vector quantization (int8 etc.)
- Millisecond query latency at million-vector scale
- File-based, single-file deployment
- Best raw performance of all candidates
Cons
- CGO required
- Currently supports only Linux amd64/arm64 and macOS amd64/arm64
- No Windows support today
- Younger ecosystem than mattn/go-sqlite3
Verdict: Best performance, but the lack of Windows support forces a fallback for a small subset of users.
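To make the quantization point concrete, here is a minimal sketch of symmetric int8 scalar quantization. It illustrates the general idea only: I have not checked how libSQL implements it internally, and the function names are hypothetical. Each component shrinks from 4 bytes to 1, at the cost of a small rounding error.

```go
package main

import (
	"fmt"
	"math"
)

// quantizeInt8 maps each component to an int8 code using a single scale per
// vector (symmetric quantization). This is an illustrative scheme, not
// libSQL's actual implementation.
func quantizeInt8(v []float32) ([]int8, float32) {
	var maxAbs float64
	for _, x := range v {
		if a := math.Abs(float64(x)); a > maxAbs {
			maxAbs = a
		}
	}
	q := make([]int8, len(v))
	if maxAbs == 0 {
		return q, 1 // all-zero vector: any scale works
	}
	scale := maxAbs / 127
	for i, x := range v {
		r := math.Round(float64(x) / scale)
		// Clamp defensively against floating-point drift.
		if r > 127 {
			r = 127
		} else if r < -127 {
			r = -127
		}
		q[i] = int8(r)
	}
	return q, float32(scale)
}

// dequantizeInt8 reverses quantizeInt8 up to rounding error.
func dequantizeInt8(q []int8, scale float32) []float32 {
	out := make([]float32, len(q))
	for i, c := range q {
		out[i] = float32(c) * scale
	}
	return out
}

func main() {
	v := []float32{0.5, -1, 0.25, 0}
	q, scale := quantizeInt8(v)
	fmt.Println("codes:", q, "scale:", scale)
	fmt.Println("restored:", dequantizeInt8(q, scale))
}
```

With this scheme a 1M x 1024 corpus drops from about 4 GB to about 1 GB of raw vector data, which is the kind of saving that makes million-scale indexes practical on a laptop.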
7. sqlite-vec via mattn/go-sqlite3
A SQLite extension for vector search by Alex Garcia, used through the mainstream mattn Go SQLite driver.
Pros
- Works on Linux, macOS, and Windows
- Built on mature SQLite foundations
- Full SQL for metadata filtering, transactions, durability
- Single-file deployment
- Active development with a known author
- Single backend covers 100% of platforms
Cons
- CGO required
- Currently brute-force search only (ANN on the roadmap)
- No native quantization
- Higher disk footprint at scale (~4 GB at 1M x 1024)
- Query latency scales linearly with dataset size
Verdict: The pragmatic winner when latency is not critical and full platform coverage matters.
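"Brute force" here means every query is compared against every stored vector, which is why latency grows linearly with the dataset. The following pure-Go sketch shows the idea; it is not sqlite-vec's implementation, and the names `hit`, `cosineDistance`, and `bruteForceTopK` are made up for illustration.

```go
package main

import (
	"fmt"
	"math"
	"sort"
)

// hit pairs a vector's position in the corpus with its distance to the query.
type hit struct {
	ID   int
	Dist float64
}

// cosineDistance returns 1 - cosine similarity. It assumes a and b have the
// same length and returns 1 if either vector is all zeros.
func cosineDistance(a, b []float32) float64 {
	var dot, na, nb float64
	for i := range a {
		dot += float64(a[i]) * float64(b[i])
		na += float64(a[i]) * float64(a[i])
		nb += float64(b[i]) * float64(b[i])
	}
	if na == 0 || nb == 0 {
		return 1
	}
	return 1 - dot/(math.Sqrt(na)*math.Sqrt(nb))
}

// bruteForceTopK scores every vector against the query (O(n * dim) work per
// query) and returns the k closest hits, nearest first.
func bruteForceTopK(vectors [][]float32, query []float32, k int) []hit {
	hits := make([]hit, 0, len(vectors))
	for id, v := range vectors {
		hits = append(hits, hit{ID: id, Dist: cosineDistance(v, query)})
	}
	sort.Slice(hits, func(i, j int) bool { return hits[i].Dist < hits[j].Dist })
	if k > len(hits) {
		k = len(hits)
	}
	return hits[:k]
}

func main() {
	corpus := [][]float32{{1, 0}, {0, 1}, {1, 1}}
	for _, h := range bruteForceTopK(corpus, []float32{1, 0.1}, 2) {
		fmt.Printf("id=%d dist=%.3f\n", h.ID, h.Dist)
	}
}
```

The upside of this approach is exact results (recall of 1.00 by definition); an ANN index trades some of that recall for sub-linear query time.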
Side-by-side comparison
| Aspect | Chroma/Qdrant/etc. | chromem-go | Bleve | DuckDB+VSS | libsql-client-go | go-libsql | sqlite-vec+mattn |
|---|---|---|---|---|---|---|---|
| Embeddable | No | Yes | Yes | Yes | No | Yes | Yes |
| Pure Go | N/A | Yes | Yes | No (CGO) | Yes | No (CGO) | No (CGO) |
| Linux/macOS | N/A | Yes | Yes | Yes | Yes | Yes | Yes |
| Windows | N/A | Yes | Yes | Yes | Yes | No | Yes |
| 1M vectors viable | Yes | No (RAM) | Marginal | Yes | N/A | Yes | Yes |
| ANN index | Yes | No | HNSW | HNSW | N/A | DiskANN | No (roadmap) |
| Quantization | Varies | No | No | Limited | N/A | Yes | No |
| Query latency at 1M | Fast | Fast (if RAM) | Slow | Fast | N/A | Milliseconds | Seconds |
| SQL filtering | Limited | Basic | Basic | Full SQL | Full SQL | Full SQL | Full SQL |
| Single binary | No | Yes | Yes | Yes | No | Yes | Yes |
Decision tree
flowchart TD
Start[Need vector search in your Go app] --> Q1{Can users run a separate server?}
Q1 -->|Yes| Server[Use Chroma, Qdrant, Weaviate, or Milvus]
Q1 -->|No, must be embedded| Q2{Dataset size?}
Q2 -->|Under 100k vectors| Small{CGO acceptable?}
Small -->|No| Chromem[Use chromem-go: pure Go, simple, fast for small data]
Small -->|Yes| Q3
Q2 -->|100k to 10M vectors| Q3{Latency critical?}
Q3 -->|Yes, milliseconds matter| Q4{Need Windows support?}
Q4 -->|No, Linux/macOS only| GoLibsql[Use go-libsql: DiskANN, quantization, fastest]
Q4 -->|Yes, all platforms| Dual[Use go-libsql primary plus sqlite-vec fallback for Windows]
Q3 -->|No, seconds are fine| Q5{Need full-text search too?}
Q5 -->|Yes| Bleve[Use Bleve: combined full-text and vector]
Q5 -->|No| Q6{Need analytical SQL?}
Q6 -->|Yes| DuckDB[Use DuckDB with VSS extension]
Q6 -->|No| SqliteVec[Use sqlite-vec with mattn/go-sqlite3: simple, all platforms]
Q2 -->|Over 10M vectors| Reconsider[Reconsider embedded model: server-based likely better]
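If you prefer code to diagrams, the same flowchart can be written as a small Go function. This is only a transcription of the branches above; the `requirements` struct and its field names are my own invention.

```go
package main

import "fmt"

// requirements captures the questions the flowchart asks.
type requirements struct {
	CanRunServer      bool
	Vectors           int
	CGOAcceptable     bool
	LatencyCritical   bool
	NeedWindows       bool
	NeedFullText      bool
	NeedAnalyticalSQL bool
}

// recommend walks the flowchart top to bottom and returns its leaf.
func recommend(r requirements) string {
	if r.CanRunServer {
		return "Chroma, Qdrant, Weaviate, or Milvus"
	}
	if r.Vectors > 10_000_000 {
		return "reconsider the embedded model"
	}
	if r.Vectors < 100_000 && !r.CGOAcceptable {
		return "chromem-go"
	}
	// Small datasets with CGO allowed and 100k to 10M datasets share the
	// remaining branches, exactly as in the diagram.
	if r.LatencyCritical {
		if r.NeedWindows {
			return "go-libsql with sqlite-vec fallback on Windows"
		}
		return "go-libsql"
	}
	if r.NeedFullText {
		return "Bleve"
	}
	if r.NeedAnalyticalSQL {
		return "DuckDB with VSS"
	}
	return "sqlite-vec with mattn/go-sqlite3"
}

func main() {
	// A local developer tool: 1M vectors, CGO allowed, no latency pressure.
	fmt.Println(recommend(requirements{Vectors: 1_000_000, CGOAcceptable: true, NeedWindows: true}))
}
```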
What I chose and why
For my use case, a local developer tool with up to 1M vectors and no latency pressure, I chose sqlite-vec via mattn/go-sqlite3.
The reasoning:
- Single backend covers all target platforms (Linux, macOS, Windows). No abstraction layer to maintain.
- Brute-force search at 1M x 1024 takes a few seconds, which is fine for a CLI tool used occasionally.
- CGO complexity stays inside the build pipeline. End users still get a clean single binary per platform.
- SQLite foundations give full SQL, transactions, and durability for free.
- If latency ever becomes a real problem, migrating to go-libsql is mechanical because both are SQLite-compatible and both use database/sql.
This is YAGNI in action. The “better” option (go-libsql with DiskANN and quantization) costs real engineering hours to integrate alongside a fallback. Those hours only pay off if performance actually becomes a constraint, which it may never do.
Head-to-head benchmark — chromem-go vs sqlite-vec vs Bleve vs LanceDB-go
These numbers come from a single workload: 100,000 documents, 1,024-dimensional embeddings (Cohere embed-multilingual-v3), 1,000 queries, single-threaded. Hardware: M2 Pro, 32 GB RAM. Build flags: stock, Go 1.22.1.
The point of this benchmark is not to pick a “winner” — it’s to show the order-of-magnitude differences so you can map your own constraints onto the table.
| Engine | Build | Indexed size on disk | RAM at query time | p50 latency | p95 latency | recall@10 vs brute force |
|---|---|---|---|---|---|---|
| chromem-go (in-memory, brute force) | pure Go | n/a (in-RAM only) | ~1.3 GB | 38 ms | 71 ms | 1.00 (reference) |
| sqlite-vec (mattn/go-sqlite3, brute force) | CGO | 412 MB | ~85 MB | 1.7 s | 2.4 s | 1.00 |
| Bleve (HNSW, default M=16) | pure Go | 1.6 GB | ~280 MB | 22 ms | 41 ms | 0.93 |
| LanceDB-go (IVF-PQ via lance Go bindings, CGO) | CGO | 540 MB | ~120 MB | 6 ms | 14 ms | 0.96 |
What stands out:
- chromem-go is fast but RAM-greedy. 100k × 1024 × float32 ≈ 410 MB just for raw vectors, plus overhead. Past ~250k vectors on a 4 GB-RAM laptop you’ll OOM.
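- sqlite-vec is the slowest by roughly one and a half to two and a half orders of magnitude (p50 of 1.7 s against 6 to 38 ms for the others) because every query is a brute-force scan. Fine for offline / batch / local-CLI workloads. Not fine for a chat UI.
- Bleve’s HNSW trades a bit of recall (0.93) for a roughly 77× speedup over sqlite-vec (22 ms against 1.7 s at p50).
- LanceDB-go is fastest because IVF-PQ is approximate and compresses vectors. Its recall (0.96) is below brute force but, in this run, above Bleve’s HNSW at default settings (0.93), and both are tunable upward.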
A reasonable rule of thumb: budget your latency target first, then pick the engine that hits it without burning your RAM budget.
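For readers who want to reproduce the table on their own data, the two metrics are simple to compute. The sketch below shows one common definition of each (recall@k as overlap with the brute-force top k, and nearest-rank percentiles); it is not the exact harness behind the numbers above, and the function names are hypothetical.

```go
package main

import (
	"fmt"
	"math"
	"sort"
)

// recallAtK returns the fraction of the exact top-k IDs that also appear in
// the approximate top-k. It assumes IDs within each slice are unique.
func recallAtK(approx, exact []int, k int) float64 {
	if k > len(exact) {
		k = len(exact)
	}
	if k == 0 {
		return 0
	}
	truth := make(map[int]struct{}, k)
	for _, id := range exact[:k] {
		truth[id] = struct{}{}
	}
	limit := k
	if len(approx) < limit {
		limit = len(approx)
	}
	found := 0
	for _, id := range approx[:limit] {
		if _, ok := truth[id]; ok {
			found++
		}
	}
	return float64(found) / float64(k)
}

// percentile returns the p-th percentile (0 < p <= 100) of the samples using
// the nearest-rank method. It sorts a copy and leaves the input untouched.
func percentile(samples []float64, p float64) float64 {
	if len(samples) == 0 {
		return 0
	}
	sorted := append([]float64(nil), samples...)
	sort.Float64s(sorted)
	rank := int(math.Ceil(p * float64(len(sorted)) / 100))
	if rank < 1 {
		rank = 1
	}
	return sorted[rank-1]
}

func main() {
	fmt.Println(recallAtK([]int{1, 2, 3, 9}, []int{1, 2, 3, 4}, 4))
	latenciesMs := []float64{10, 20, 30, 40, 50, 60, 70, 80, 90, 100}
	fmt.Println(percentile(latenciesMs, 50), percentile(latenciesMs, 95))
}
```

Averaging `recallAtK` over all queries gives the recall@10 column; feeding per-query latencies into `percentile` gives p50 and p95.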
Decision matrix — pick by constraint, not by hype
| Your hardest constraint | Use this | Don’t use |
|---|---|---|
| No CGO (cross-compile to weird targets) | chromem-go (small) or Bleve (HNSW) | sqlite-vec, LanceDB-go |
| All three OSes incl. Windows | sqlite-vec, Bleve, chromem-go | go-libsql (no Windows yet) |
| <5 s build time, simplest code | chromem-go | DuckDB+VSS, LanceDB-go |
| <50 ms p95 query latency at 100k+ vectors | LanceDB-go or Bleve HNSW | sqlite-vec, chromem-go on slim hardware |
| Need full SQL filtering / joins / ACID | sqlite-vec (or DuckDB+VSS) | chromem-go, LanceDB-go |
| Need full-text search in the same engine | Bleve | everything else |
| Need ANN at million-scale + Linux/macOS only | go-libsql (DiskANN) | brute-force engines |
| App is shipped to end-users on a small laptop | sqlite-vec or chromem-go (under 100k) | LanceDB-go (heavier deps) |
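If two rows match your situation, pick the engine that wins the row you’d be most upset about losing. For most “build a Go-binary RAG tool” projects, that’s “all three OSes” + “no CGO surprises”, which lands you on sqlite-vec. It does require CGO, but through the mainstream mattn/go-sqlite3 driver, which works on Linux, macOS, and Windows; if CGO is off the table entirely, the first row applies instead.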
End-to-end RAG walkthrough with sqlite-vec
This is the smallest amount of code that actually works as a retrieval-augmented generation pipeline in Go. It’s deliberately stripped down — no chunking strategy, no reranker, no streaming. You can layer those on top once the bones work.
package main

import (
	"context"
	"database/sql"
	"encoding/json"
	"fmt"
	"log"
	"strings"

	sqlite_vec "github.com/asg017/sqlite-vec-go-bindings/cgo"
	_ "github.com/mattn/go-sqlite3"

	"github.com/anthropics/anthropic-sdk-go"
	"github.com/anthropics/anthropic-sdk-go/option"
)

const dim = 1024 // matches Cohere embed-multilingual-v3

// loadChunks and embed are application-specific helpers that are not shown here.

func main() {
	// Register sqlite-vec on every connection opened from here on.
	// Without this call the vec0 module is not available to the schema step.
	sqlite_vec.Auto()

	db, err := sql.Open("sqlite3", "rag.db?_journal=WAL&_busy_timeout=5000")
	must(err)
	defer db.Close()

	// 1. Schema: chunks table + virtual vec table
	_, err = db.Exec(fmt.Sprintf(`
		CREATE TABLE IF NOT EXISTS chunks (
			id INTEGER PRIMARY KEY AUTOINCREMENT,
			doc TEXT NOT NULL,
			text TEXT NOT NULL
		);
		CREATE VIRTUAL TABLE IF NOT EXISTS vec_chunks USING vec0(
			id INTEGER PRIMARY KEY,
			emb FLOAT[%d]
		);
	`, dim))
	must(err)

	// 2. Ingest: embed each chunk and write both rows in a single transaction
	chunks := loadChunks() // your splitter; e.g. ~500-token windows
	tx, err := db.Begin()
	must(err)
	for _, c := range chunks {
		emb := embed(c.Text) // []float32 of length `dim`
		res, err := tx.Exec(`INSERT INTO chunks(doc, text) VALUES (?, ?)`, c.Doc, c.Text)
		must(err)
		id, err := res.LastInsertId()
		must(err)
		blob, err := json.Marshal(emb) // sqlite-vec accepts JSON arrays as the float vector
		must(err)
		_, err = tx.Exec(`INSERT INTO vec_chunks(id, emb) VALUES (?, ?)`, id, string(blob))
		must(err)
	}
	must(tx.Commit())

	// 3. Retrieve: kNN over the embedded query
	question := "What is the BTRC IMEI check process?"
	qEmb, err := json.Marshal(embed(question))
	must(err)
	rows, err := db.Query(`
		SELECT chunks.doc, chunks.text, vec_distance_cosine(vec_chunks.emb, ?) AS d
		FROM vec_chunks
		JOIN chunks ON chunks.id = vec_chunks.id
		ORDER BY d
		LIMIT 5
	`, string(qEmb))
	must(err)
	var contexts []string
	for rows.Next() {
		var doc, text string
		var d float64
		must(rows.Scan(&doc, &text, &d))
		contexts = append(contexts, fmt.Sprintf("[%s] %s", doc, text))
	}
	must(rows.Err()) // surface errors that ended the iteration early
	must(rows.Close())

	// 4. Generate: feed top-k context into Claude
	client := anthropic.NewClient(option.WithAPIKey("sk-ant-...")) // placeholder key
	resp, err := client.Messages.New(context.Background(), anthropic.MessageNewParams{
		Model:     anthropic.F(anthropic.ModelClaudeSonnet4_6),
		MaxTokens: anthropic.F(int64(1024)),
		System: anthropic.F([]anthropic.TextBlockParam{{
			Type: anthropic.F(anthropic.TextBlockParamTypeText),
			Text: anthropic.F("Answer using only the supplied <context>. Cite the [doc] tags."),
		}}),
		Messages: anthropic.F([]anthropic.MessageParam{
			anthropic.NewUserMessage(anthropic.NewTextBlock(
				fmt.Sprintf("<context>\n%s\n</context>\n\nQuestion: %s", strings.Join(contexts, "\n"), question),
			)),
		}),
	})
	must(err)
	for _, b := range resp.Content {
		if b.Type == anthropic.ContentBlockTypeText {
			fmt.Println(b.Text)
		}
	}
}

// must aborts the program on any error; acceptable for a walkthrough, not for a library.
func must(err error) {
	if err != nil {
		log.Fatal(err)
	}
}
Three things this skeleton does that toy examples skip:
- WAL journal mode + busy timeout so concurrent reads don’t block writes during ingest.
- Both inserts (chunks + vec_chunks) in one transaction. Without it, a crash mid-ingest leaves chunk rows with no matching vector row, which the JOIN silently omits from results.
- vec_distance_cosine in the ORDER BY. sqlite-vec also supports L2 and inner product, but cosine is the metric most embedding models are meant to be compared with.
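One optional refinement: the walkthrough passes vectors as JSON text, which is easy to debug but bulky. As far as I know, sqlite-vec also accepts a vector as a BLOB of raw little-endian float32 values, and its Go bindings ship a helper for producing one. The pure-Go sketch below shows what that encoding looks like; `float32Blob` is a hypothetical name, and you should confirm the accepted formats against the sqlite-vec docs before relying on it.

```go
package main

import (
	"encoding/binary"
	"fmt"
	"math"
)

// float32Blob packs a vector as consecutive little-endian IEEE 754 float32
// values, 4 bytes per dimension. Assumption: this matches the compact binary
// vector format sqlite-vec accepts in place of a JSON array.
func float32Blob(v []float32) []byte {
	buf := make([]byte, 4*len(v))
	for i, x := range v {
		binary.LittleEndian.PutUint32(buf[4*i:], math.Float32bits(x))
	}
	return buf
}

func main() {
	emb := make([]float32, 1024)
	fmt.Println("blob bytes for a 1024-dim vector:", len(float32Blob(emb)))
}
```

Passing the resulting `[]byte` as the query parameter instead of `string(blob)` keeps each 1,024-dimensional vector at a fixed 4,096 bytes on the wire.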
For production: add a chunking strategy (tiktoken or recursive splitter), a reranker (Cohere Rerank API or bge-reranker-v2-m3), and streaming responses. Those are independent of the storage choice — swap vec_chunks for chromem-go and the rest of the pipeline doesn’t change.
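As a starting point for the chunking step, here is a deliberately naive splitter that could stand in for the `loadChunks` helper in the walkthrough. It counts whitespace-separated words rather than model tokens, so window sizes are only approximate; the `chunk` type and `splitWords` function are hypothetical names chosen to match the `c.Doc` and `c.Text` fields used above.

```go
package main

import (
	"fmt"
	"strings"
)

// chunk mirrors the fields the walkthrough reads from each chunk.
type chunk struct {
	Doc  string
	Text string
}

// splitWords cuts text into windows of `size` words, with `overlap` words
// shared between neighbouring windows so sentences that straddle a boundary
// appear in both. It returns nil for invalid parameters or empty text.
func splitWords(doc, text string, size, overlap int) []chunk {
	if size <= 0 || overlap < 0 || overlap >= size {
		return nil
	}
	words := strings.Fields(text)
	step := size - overlap
	var out []chunk
	for start := 0; start < len(words); start += step {
		end := start + size
		if end > len(words) {
			end = len(words)
		}
		out = append(out, chunk{Doc: doc, Text: strings.Join(words[start:end], " ")})
		if end == len(words) {
			break // the last window already reached the end of the text
		}
	}
	return out
}

func main() {
	for _, c := range splitWords("notes.md", "a b c d e f g", 3, 1) {
		fmt.Printf("[%s] %q\n", c.Doc, c.Text)
	}
}
```

Swapping this for a tokenizer-aware splitter later only changes how the chunks are produced; the ingest loop stays the same.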
Takeaways
If you are in a similar spot, the questions worth asking yourself are:
- Does your dataset fit in RAM? If yes and it is small, chromem-go is the easiest path.
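- Do you need millisecond latency? If yes, you need an ANN index rather than brute force. At million-vector scale go-libsql (DiskANN) was the fastest candidate I evaluated; in the 100k benchmark, Bleve HNSW and LanceDB-go also answered in tens of milliseconds or less.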
- Do you need Windows support? That single question eliminates go-libsql today and pushes you to sqlite-vec.
- Do you need analytical SQL or full-text search alongside vectors? That changes the answer to DuckDB or Bleve respectively.
The embedded vector database space is young but the options are real. Pick the one that matches your actual constraints, not the one with the best benchmarks for a workload you do not have.