Vectorize Config Reference¶

When you index directly from raw CSVs instead of a partition manifest, idxr vectorize reads a JSON config that mirrors the prepare-datasets layout with a couple of vectorization-specific knobs.

{
  "Contract": {
    "path": "datasets/contracts.csv",
    "columns": {
      "id": "CONTRACT_ID",
      "title": "CONTRACT_TITLE",
      "summary": "DESCRIPTION"
    },
    "truncation_strategy": "middle_out"
  }
}

Auto-generated: If you run idxr prepare_datasets new-config, the companion vectorize config is scaffolded automatically. Use that stub as the starting point whenever possible.

Field	Required	Description
`path`	✅	Source file path. Leave blank to skip a model temporarily.
`columns`	✅	Mapping of model field names to CSV headers. Should align with the prepare-datasets config.
`truncation_strategy`	optional	Override the truncation behaviour for this model (`end`, `start`, `middle_out`, `sentences`). Leave `null` for the global default.

For partition-based pipelines you do not need this config—just point idxr vectorize index at the manifest emitted by idxr prepare_datasets.

Keys	Action
`?`	Open this help
`n`	Next page
`p`	Previous page
`s`	Search