# --batch-size
Why we added this flag: embedding APIs have rate and token limits; controlling batch size lets you trade throughput for stability.
## What it does

- Sets the maximum number of documents per embedding request (default: 128).
- Works alongside `--token-limit` to prevent oversized batches.
- Smaller batches reduce retry storms when your model registry contains very large documents.
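To see how the flag shapes request volume, here is a back-of-envelope sketch. This is illustrative only: the corpus size and variable names are assumptions, not idxr internals.

```shell
# Hypothetical sizing exercise (not idxr code): how many embedding
# requests a corpus needs for a given --batch-size.
docs=300          # assumed corpus size
batch_size=128    # the flag's default
# Ceiling division: every started batch costs one request.
batches=$(( (docs + batch_size - 1) / batch_size ))
echo "$batches"
```

With the default of 128, a 300-document corpus produces 3 requests (128 + 128 + 44 documents); halving the batch size doubles the request count but shrinks each payload.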
## Typical usage

```shell
idxr vectorize index \
  --model "$IDXR_MODEL" \
  --partition-manifest workdir/partitions/manifest.json \
  --partition-out-dir workdir/chroma_partitions \
  --batch-size 300 \
  --truncation-strategy middle_out
```
## Tips

- Increase cautiously: monitor OpenAI rate limits and Chroma ingest timing.
- Pair with `--parallel-partitions` to achieve higher concurrency without breaching per-request limits.
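The two tips above can be combined in a single run. A sketch only: the numeric values, and the assumption that `--parallel-partitions` takes a worker count, are illustrative rather than documented behavior.

```shell
idxr vectorize index \
  --model "$IDXR_MODEL" \
  --partition-manifest workdir/partitions/manifest.json \
  --partition-out-dir workdir/chroma_partitions \
  --batch-size 64 \
  --parallel-partitions 4
```

Smaller per-request batches keep each call under rate and token limits, while concurrent partitions recover overall throughput.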