Getting Started¶

This guide walks you from a clean environment to a working idxr pipeline in minutes.

Installation¶

python -m venv .venv
source .venv/bin/activate
pip install --upgrade pip
pip install idxr

Optional helpers¶

Install mkdocs if you plan to build the documentation locally:

pip install mkdocs

Quickstart Example¶

The public idxr repository ships a miniature dataset that exercises the full lifecycle—manifest generation, partitioning, and vectorization into an in-memory Chroma collection.

# Clone the repository (or download the tarball) and jump into the example
git clone https://github.com/GetAdriAI/idxr.git
cd idxr/examples/quickstart

# Create an isolated environment for the demo
python -m venv .venv
source .venv/bin/activate
pip install --upgrade pip
pip install -r requirements.txt

# Provide credentials for embeddings (required by default strategies)
export OPENAI_API_KEY="sk-your-key"

# 1. Prepare dataset partitions
idxr prepare_datasets \
  --model quickstart.registry:MODEL_REGISTRY \
  --config config/prepare_datasets_config.json \
  --output-root workdir/partitions

# 2. Index the generated partitions into a local Chroma store
idxr vectorize index \
  --model quickstart.registry:MODEL_REGISTRY \
  --partition-manifest workdir/partitions/manifest.json \
  --partition-out-dir workdir/chroma_partitions \
  --collection quickstart \
  --batch-size 50 \
  --resume

# 3. Inspect indexing status
idxr vectorize status \
  --model quickstart.registry:MODEL_REGISTRY \
  --partition-dir workdir/partitions \
  --partition-out-dir workdir/chroma_partitions

Each command is self-contained—configs live under examples/quickstart/config, the demo registry is in quickstart/registry.py, and the sample CSV resides in data/contracts.csv. Change file paths or models to plug in your own domain once you are comfortable with the workflow.

Querying Your Index¶

Once indexing completes, query your multi-collection index with the async query client:

# 1. Generate query configuration
idxr vectorize generate-query-config \
  --partition-out-dir workdir/chroma_partitions \
  --output query_config.json \
  --model quickstart.registry:MODEL_REGISTRY

# 2. Use in Python
python -c "
from indexer.vectorize_lib import AsyncMultiCollectionQueryClient
from pathlib import Path
import asyncio
import os

async def search():
    async with AsyncMultiCollectionQueryClient(
        config_path=Path('query_config.json'),
        client_type='http',  # or 'cloud' for ChromaDB Cloud
        http_host='localhost:8000',
    ) as client:
        results = await client.query(
            query_texts=['search term'],
            n_results=5,
            models=None,  # Query all models
        )
        for doc_id, distance in zip(results['ids'][0], results['distances'][0]):
            print(f'{doc_id}: {distance:.4f}')

asyncio.run(search())
"

Next Steps¶

Read the Prepare Datasets overview to understand how manifests are stitched together.
Explore the Vectorize overview to learn how idxr batches and truncates content.
Learn about Querying to search your multi-collection index efficiently.
Consult the argument reference pages whenever you introduce new flags into your automation.

Keys	Action
`?`	Open this help
`n`	Next page
`p`	Previous page
`s`	Search