Vectorize Config Reference

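When you index directly from raw CSVs instead of a partition manifest, idxr vectorize reads a JSON config. The config mirrors the prepare-datasets layout and adds vectorization-specific options such as truncation_strategy. Each top-level key names a model, and its value says where that model's CSV lives and how to read it.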

```json
{
  "Contract": {
    "path": "datasets/contracts.csv",
    "columns": {
      "id": "CONTRACT_ID",
      "title": "CONTRACT_TITLE",
      "summary": "DESCRIPTION"
    },
    "truncation_strategy": "middle_out"
  }
}
```
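To make the direction of the columns mapping concrete, here is a minimal Python sketch of how one model entry could be applied to its CSV. Each key is a model field name and each value is the CSV header to read it from. The helper project_rows is hypothetical and is not idxr's loader; it only illustrates the mapping.

```python
import csv
import io
from typing import Dict, Iterator, TextIO


def project_rows(stream: TextIO, columns: Dict[str, str]) -> Iterator[Dict[str, str]]:
    """Yield one dict per CSV row, keyed by model field name.

    columns maps model field -> CSV header, as in the vectorize config.
    Hypothetical helper for illustration only; not part of idxr.
    """
    reader = csv.DictReader(stream)
    # Reading fieldnames consumes the header row.
    headers = set(reader.fieldnames or [])
    missing = [h for h in columns.values() if h not in headers]
    if missing:
        raise KeyError(f"CSV is missing mapped headers: {missing}")
    for row in reader:
        # Keep only mapped columns, renamed to their model field names.
        yield {field: row[header] for field, header in columns.items()}


if __name__ == "__main__":
    # In-memory stand-in for datasets/contracts.csv, with an extra unmapped OWNER column.
    sample = io.StringIO(
        "CONTRACT_ID,CONTRACT_TITLE,DESCRIPTION,OWNER\n"
        "C-1,Master services,Annual support agreement,legal\n"
    )
    mapping = {"id": "CONTRACT_ID", "title": "CONTRACT_TITLE", "summary": "DESCRIPTION"}
    for record in project_rows(sample, mapping):
        print(record)
```

In this sketch, unmapped headers such as OWNER are ignored, and a header named in the mapping but absent from the CSV fails before any rows are produced.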

Auto-generated: If you run idxr prepare_datasets new-config, the companion vectorize config is scaffolded automatically. Use that stub as the starting point whenever possible.

| Field | Required | Description |
|---|---|---|
| path | yes | Source CSV path. Leave blank to skip a model temporarily. |
| columns | yes | Mapping of model field names to CSV headers. Should match the column mapping in the prepare-datasets config. |
| truncation_strategy | optional | Overrides the truncation behaviour for this model: end, start, middle_out, or sentences. Leave null to use the global default. |
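The field rules above can be checked before a run. Below is a minimal Python sketch, assuming the config is a JSON object keyed by model name as in the Contract example. The function validate_vectorize_config and the Invoice model are hypothetical. The sketch also assumes that a missing path behaves like a blank one. idxr's own validation may accept or reject different inputs.

```python
import json
from typing import Any, Dict, List

# Strategy names listed in the field table; None (JSON null) means "use the global default".
ALLOWED_STRATEGIES = {"end", "start", "middle_out", "sentences"}


def validate_vectorize_config(config: Dict[str, Any]) -> List[str]:
    """Return the names of models that would be indexed.

    Raises ValueError on the first invalid entry. Hypothetical checker
    written from the field table; idxr's own validation may differ.
    """
    active = []
    for model, entry in config.items():
        path = entry.get("path")
        if not path:
            # Blank path skips the model; treating a missing path the same way is an assumption.
            continue
        columns = entry.get("columns")
        if not isinstance(columns, dict) or not columns:
            raise ValueError(f"{model}: 'columns' must be a non-empty mapping")
        strategy = entry.get("truncation_strategy")
        if strategy is not None and strategy not in ALLOWED_STRATEGIES:
            raise ValueError(f"{model}: unknown truncation_strategy {strategy!r}")
        active.append(model)
    return active


if __name__ == "__main__":
    # "Invoice" is a hypothetical second model, skipped via a blank path.
    raw = """{
      "Contract": {"path": "datasets/contracts.csv",
                   "columns": {"id": "CONTRACT_ID"},
                   "truncation_strategy": "middle_out"},
      "Invoice": {"path": "", "columns": {"id": "INVOICE_ID"},
                  "truncation_strategy": null}
    }"""
    print(validate_vectorize_config(json.loads(raw)))  # ['Contract']
```

Skipped models are not checked further in this sketch, so a temporarily disabled entry can keep an incomplete columns mapping.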

For partition-based pipelines you do not need this config. Instead, point idxr vectorize index at the manifest emitted by idxr prepare_datasets.