--directory-size

Why we added this flag: large exports can overwhelm filesystem limits; directory sizing caps how many rows a partition may contain so runs stay manageable.

What it does

  • Sets the maximum number of rows per model per partition directory.
  • 0 (default) means unlimited rows; any positive integer enforces a split.
  • Useful when chunking massive tables for parallel indexing or archiving.

Typical usage

idxr prepare_datasets \
  --model "$IDXR_MODEL" \
  --config configs/full_export.json \
  --output-root workdir/partitions \
  --directory-size 500000

Tips

  • Match the partition size to the throughput of your vectorization job—smaller chunks resume faster after failures.
  • Monitor manifest growth; each new directory is recorded with schema and timestamp metadata.