Multi-Collection¶
Large deployments often shard data across multiple ChromaDB collections. The multi-collection module provides adapters and routing helpers that let you fan out a single ChromaSQL query across those shards.
Core Concepts¶
| Component | Description |
|---|---|
CollectionRouter |
Decides which collections should be queried based on the parsed AST. |
AsyncCollectionProvider |
Async interface for fetching collection handles. |
execute_multi_collection |
Orchestrates parse → plan → execute across collections and merges results. |
Metadata-Based Routing¶
Use the built-in MetadataFieldRouter to route by discriminator values stored
in metadata:
from chromasql.adapters import MetadataFieldRouter
from chromasql.multi_collection import execute_multi_collection
from idxr.query_lib.async_multi_collection_adapter import AsyncMultiCollectionAdapter
router = MetadataFieldRouter(
query_config=config,
field_path=("model",),
fallback_to_all=True,
)
adapter = AsyncMultiCollectionAdapter(async_client)
result = await execute_multi_collection(
query_str="SELECT id FROM demo WHERE metadata.model IN ('Table', 'Field');",
router=router,
collection_provider=adapter,
embed_fn=my_embed_fn,
)
The router reads the parsed query, extracts metadata.model values, maps them
to collection names using the query config, and executes the plan in parallel.
Custom Routers¶
Implement the CollectionRouter protocol for bespoke routing logic:
from chromasql.multi_collection import CollectionRouter
from chromasql.analysis import extract_metadata_values
class TenantRouter(CollectionRouter):
def __init__(self, mapping):
self.mapping = mapping
def route(self, query):
tenant_ids = extract_metadata_values(query, field_path=("tenant_id",))
if not tenant_ids:
return None # query all collections
return [self.mapping[tenant] for tenant in tenant_ids if tenant in self.mapping]
Merging Strategy¶
execute_multi_collection collects rows from all collections, applies score
thresholds, sorts by distance (or metadata order), respects LIMIT/OFFSET, and
projects the requested columns.
The merged ExecutionResult contains:
rows: projected rows after global ordering/pagination.raw: summary payload with counts and metadata, useful for diagnostics.
Error Handling¶
- Failed collections are logged and skipped; results from successful collections are still returned.
- When
fallback_to_allis disabled and no discriminator is present, the router raises an error to avoid unexpected fan-out.
Refer to tests/chromasql/test_multi_collection.py and
tests/chromasql/test_or_routing_integration.py for exhaustive behavior.
- Need Help?
Open a GitHub issue with the steps to reproduce and we’ll help you debug it.