2 posts tagged with "partitioning"

Spice v1.8.2 (Oct 21, 2025)

October 21, 2025 · 5 min read

Token Plumber at Spice AI

Announcing the release of Spice v1.8.2! 🔍

Spice v1.8.2 is a patch release focused on reliability, validation, performance, and bug fixes, with improvements across DuckDB acceleration, S3 Vectors, document tables, and HTTP search.

What's New in v1.8.2

Support Table Relations in `/v1/search` HTTP Endpoint

Spice now supports table relations in the additional_columns and where parameters of the /v1/search endpoint. In multi-dataset searches, a column or filter can be qualified with a dataset name so that it applies only to that dataset.

Example:

curl 'http://localhost:8090/v1/search' \
    -H 'Content-Type: application/json' \
    -H 'Accept: application/json' -d '{
        "text": "hello world",
        "additional_columns": ["tbl1.foo", "tbl2.bar", "baz"],
        "where": "tbl1.foo > 100000",
        "limit": 5
    }'

In this example, search results from the tbl1 dataset will include columns foo and baz, where foo > 100000. For tbl2, columns bar and baz will be returned.
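The column resolution described above can be sketched in a few lines of Python. This is an illustration of the behavior in the example, not Spice's implementation: the function name columns_for_table is hypothetical, and the sketch assumes simple `table.column` names without quoting or schema prefixes.

```python
def columns_for_table(additional_columns, table):
    """Return the requested columns that apply to `table`.

    Qualified names ("tbl1.foo") apply only to the named table;
    unqualified names ("baz") apply to every table.
    """
    selected = []
    for column in additional_columns:
        # rpartition yields ("", "", "baz") for an unqualified name.
        relation, _, name = column.rpartition(".")
        if relation == "" or relation == table:
            selected.append(name)
    return selected


requested = ["tbl1.foo", "tbl2.bar", "baz"]
print(columns_for_table(requested, "tbl1"))  # ['foo', 'baz']
print(columns_for_table(requested, "tbl2"))  # ['bar', 'baz']
```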

DuckDB Data Accelerator Table Partitioning & Indexing

Configurable DuckDB Index Scan: DuckDB acceleration now supports the duckdb_index_scan_percentage and duckdb_index_scan_max_count parameters, which tune when DuckDB chooses an index scan and can improve query performance.

Example:

datasets:
  - from: postgres:my_table
    name: my_table
    acceleration:
      enabled: true
      engine: duckdb
      mode: file
      params:
        # DuckDB uses an index scan when the number of qualifying rows is below the larger of these two thresholds
        duckdb_index_scan_percentage: '0.10' # 10% as decimal
        duckdb_index_scan_max_count: '1000'
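As a rough worked example of how the two parameters combine, the sketch below computes the effective row threshold for a table of a given size. It assumes, based on the comment in the configuration above, that the percentage applies to the table's total row count and that the larger of the two values wins. The function name index_scan_threshold is hypothetical.

```python
from decimal import Decimal


def index_scan_threshold(total_rows, percentage="0.10", max_count="1000"):
    """Row count below which an index scan would be chosen (assumed rule)."""
    # Parameters are strings in the Spicepod, so parse them the same way here.
    by_percentage = int(Decimal(percentage) * total_rows)
    return max(int(max_count), by_percentage)


print(index_scan_threshold(1_000_000))  # 100000 (10% of rows exceeds 1000)
print(index_scan_threshold(5_000))      # 1000 (10% is only 500 rows)
```

On small tables the fixed count dominates, and on large tables the percentage does.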

Hive-Style Partitioning: In file-partitioned mode, the DuckDB data accelerator uses Hive-style partitioning for more efficient file management.
Table-Based Partitioning: Spice now supports partitioning DuckDB accelerations within a single file. This approach maintains ACID guarantees for full and append mode refreshes, while optimizing resource usage and improving query performance. Configure via the partition_mode parameter:

datasets:
  - from: file:test_data.parquet
    name: test_data
    params:
      file_format: parquet
    acceleration:
      enabled: true
      engine: duckdb
      mode: file
      params:
        partition_mode: tables
      partition_by:
        - bucket(100, Field1)
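To give a feel for what a bucket(100, Field1) expression does, here is a minimal sketch that assigns rows to one of 100 buckets by hashing the column value. The hash function Spice uses is not stated in these notes, so zlib.crc32 is a stand-in, and bucket_for is a hypothetical helper.

```python
import zlib


def bucket_for(value, num_buckets=100):
    """Map a value to a bucket number in [0, num_buckets)."""
    # crc32 is only a stand-in; the real hash function is an assumption.
    digest = zlib.crc32(str(value).encode("utf-8"))
    return digest % num_buckets


rows = [{"Field1": f"customer-{i}"} for i in range(1000)]

# Group rows by bucket, the way a partitioned accelerator would group them.
partitions = {}
for row in rows:
    partitions.setdefault(bucket_for(row["Field1"]), []).append(row)

print(len(partitions) <= 100)  # True: never more than 100 partitions
```

Bucketing keeps the partition count fixed no matter how many distinct values Field1 has.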

S3 Vectors Reliability

Race Condition Fix: Resolved a race condition in S3 Vectors index and bucket creation. After a ConflictException, the runtime now checks whether the index or bucket already exists, which makes index creation more robust and improves reliability for large-scale, multi-index vector search.
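The create-then-verify pattern described here can be sketched as follows. Everything in this block is a hypothetical stand-in: FakeVectorStore is an in-memory fake rather than the S3 Vectors API, and the local ConflictException class only mimics the error named above.

```python
class ConflictException(Exception):
    """Stand-in for the conflict error raised when a resource already exists."""


class FakeVectorStore:
    """In-memory fake used only to exercise the pattern."""

    def __init__(self):
        self.indexes = set()

    def create_index(self, name):
        if name in self.indexes:
            raise ConflictException(name)
        self.indexes.add(name)

    def index_exists(self, name):
        return name in self.indexes


def ensure_index(store, name):
    """Create the index, treating a conflict as success if the index exists."""
    try:
        store.create_index(name)
        return "created"
    except ConflictException:
        # Another writer may have won the race; confirm before continuing.
        if store.index_exists(name):
            return "already_exists"
        raise


store = FakeVectorStore()
print(ensure_index(store, "docs"))  # created
print(ensure_index(store, "docs"))  # already_exists
```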

Document Table Improvements

Primary Key Update: Document tables now use the location column as the primary key, improving performance, consistency, and query reliability.

Additional Improvements & Bugfixes

Reliability: Improved error handling and resource checks for S3 Vectors and DuckDB acceleration.
Validation: Expanded validation for partitioning and index creation.
Performance: Optimized partition refresh and index scan logic.
Bugfix: Don't nullify DuckDB release callbacks for schemas.

Contributors

Breaking Changes

No breaking changes.

Cookbook Updates

No major cookbook updates.

The Spice Cookbook includes 81 recipes to help you get started with Spice quickly and easily.

Upgrading

To upgrade to v1.8.2, use one of the following methods:

CLI:

spice upgrade

Homebrew:

brew upgrade spiceai/spiceai/spice

Docker:

Pull the spiceai/spiceai:1.8.2 image:

docker pull spiceai/spiceai:1.8.2

For available tags, see DockerHub.

Helm:

helm repo update
helm upgrade spiceai spiceai/spiceai

AWS Marketplace:

🎉 Spice is now available in the AWS Marketplace!

What's Changed

Changelog

Update mongo config for benchmarks by @krinart in #7546
Configurable DuckDB duckdb_index_scan_percentage & duckdb_index_scan_max_count by @lukekim in #7551
Fix race condition in S3 Vectors index and bucket creation by @kczimm in #7577
Use 'location' as primary key for document tables by @Jeadie in #7567
Update official Docker builds to use release binaries by @phillipleblanc in #7597
Hive-style partitioning for DuckDB file mode by @kczimm in #7563
New Generate Changelog workflow by @krinart in #7562
Add support for DuckDB table-based partitioning by @sgrebnov in #7581
DuckDB table partitioning: delete partitions that no longer exist after full refresh by @sgrebnov in #7614
Rename duckdb_partition_mode to partition_mode param by @sgrebnov in #7622
Fix license issue in table-providers by @phillipleblanc in #7620
Make DuckDB table partition data write threshold configurable by @sgrebnov in #7626
fix: Don't nullify DuckDB release callbacks for schemas by @peasee in #7628
Fix integration tests by reverting the use of batch inserts w/ prepared statements by @phillipleblanc in #7630
Return TableProvider from CandidateGeneration::search by @Jeadie in #7559
Handle table relations in HTTP v1/search by @Jeadie in #7615

Spice v1.5.1 (July 28, 2025)

July 29, 2025 · 5 min read

Jack Eadie

Token Plumber at Spice AI

Announcing the release of Spice v1.5.1! 🔑

Spice v1.5.1 expands the GitHub data connector to include pull-request comments, adds configurable rate limiting for AWS Bedrock embedding models, expands partition pruning with inequality operators, and adds client-supplied cache keys for granular caching control in the HTTP and Arrow Flight SQL APIs.

What's New in v1.5.1

GitHub Data Connector Pull Request Comments: Configure GitHub pulls datasets to include comments.

Example Spicepod.yaml:

datasets:
  - from: github:github.com/spiceai/spiceai/pulls
    name: spiceai.pulls
    params:
      github_include_comments: all # One of 'all', 'review', 'discussion', or 'none'. Defaults to 'none'.
      github_max_comments_fetched: '25' # Defaults to 100
      # ...

For details, see the GitHub Data Connector documentation.

AWS Bedrock Embedding Models Invocation Control: Improved rate limiting for AWS Bedrock embedding models through the max_concurrent_invocations parameter.

embeddings:
  - from: bedrock:cohere.embed-english-v3
    name: cohere-embeddings
    params:
      max_concurrent_invocations: '41'
      # ...

For details, see the AWS Bedrock Embeddings Model Provider documentation.
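A concurrency cap of this kind is commonly built on a semaphore. The sketch below shows the general idea with asyncio. Whether Spice uses a semaphore internally is an assumption, and embed_all and invoke are hypothetical names with a placeholder in place of the real model call.

```python
import asyncio


async def embed_all(texts, max_concurrent_invocations):
    """Run one placeholder invocation per text, capped at the given limit."""
    semaphore = asyncio.Semaphore(max_concurrent_invocations)
    in_flight = 0
    peak = 0

    async def invoke(text):
        nonlocal in_flight, peak
        async with semaphore:
            in_flight += 1
            peak = max(peak, in_flight)
            await asyncio.sleep(0)  # placeholder for the model call
            in_flight -= 1
            return len(text)  # placeholder "embedding"

    results = await asyncio.gather(*(invoke(t) for t in texts))
    return results, peak


results, peak = asyncio.run(embed_all(["a", "bb", "ccc", "dddd"], 2))
print(results)    # [1, 2, 3, 4]
print(peak <= 2)  # True: never more than two invocations in flight
```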

Improved Query Partitioning: Expanded partition pruning support with additional inequality operators (e.g. >, >=, <, <=).

For details, see the Query Partitioning documentation.
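As an illustration of pruning with inequality operators, the sketch below keeps only the partitions whose key can satisfy a filter. It assumes each partition holds a single value of the partition key. The names prune_partitions and OPERATORS are hypothetical and do not reflect Spice's internals.

```python
import operator

# Comparison operators a pruner might recognize in a filter.
OPERATORS = {
    ">": operator.gt,
    ">=": operator.ge,
    "<": operator.lt,
    "<=": operator.le,
    "=": operator.eq,
}


def prune_partitions(partition_values, op, literal):
    """Return the partition keys that can contain rows matching the filter."""
    compare = OPERATORS[op]
    return [value for value in partition_values if compare(value, literal)]


years = [2021, 2022, 2023, 2024, 2025]
print(prune_partitions(years, ">=", 2024))  # [2024, 2025]
print(prune_partitions(years, "<", 2022))   # [2021]
```

Partitions that are pruned never need to be scanned, which is where the performance gain comes from.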

Client-Supplied Cache Keys: Support for a new Spice-Cache-Key header/metadata key in the HTTP and Arrow Flight SQL query APIs for fine-grained client-side caching control.

Example HTTP API usage:

$ curl -vvS -XPOST http://localhost:8090/v1/sql \
-H"spice-cache-key: 1851400_20170216_north_america" \
-d "select * from scihub_journals_accessed
    where user_id = '1851400'
      and date_trunc('DAY', timestamp) = '2017-02-16'
      and city = 'New York';"

Example Response:

< HTTP/1.1 200 OK
< content-type: application/json
< x-cache: Hit from spiceai
< results-cache-status: HIT
< vary: Spice-Cache-Key
< vary: origin, access-control-request-method, access-control-request-headers
< content-length: 604
< date: Wed, 23 Jul 2025 20:26:12 GMT
<
[{
"timestamp": "2017-02-16 09:55:06",
"doi": "10.1155/2012/650929",
"ip_identifier": 1000856,
"user_id": 1851400,
"country": "United States",
"city": "New York",
"longitude": 40.7830603,
"latitude": -73.9712488
},
...
]

For details, see the Cache Control documentation.
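The same request can be assembled from Python using only the standard library. This is a minimal sketch that builds the request without sending it. The build_sql_request helper is hypothetical, and the endpoint and header name are taken from the curl example above.

```python
import urllib.request


def build_sql_request(sql, cache_key, base_url="http://localhost:8090"):
    """Build a POST to /v1/sql that carries a client-supplied cache key."""
    return urllib.request.Request(
        url=f"{base_url}/v1/sql",
        data=sql.encode("utf-8"),
        headers={"Spice-Cache-Key": cache_key},
        method="POST",
    )


request = build_sql_request(
    "select * from scihub_journals_accessed where user_id = '1851400'",
    "1851400_20170216_north_america",
)
print(request.get_method(), request.full_url)
# To send it against a running runtime: urllib.request.urlopen(request).read()
```

Requests that share a cache key can then be served from the same cached result, as in the response shown above.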

Contributors

New Contributors

@varunguleriaCodes made their first contribution in github.com/spiceai/spiceai/pull/6383

Breaking Changes

Cookbook Updates

No new recipes added in this release.

The Spice Cookbook includes 74 recipes to help you get started with Spice quickly and easily.

Upgrading

To upgrade to v1.5.1, use one of the following methods:

CLI:

spice upgrade

Homebrew:

brew upgrade spiceai/spiceai/spice

Docker:

Pull the spiceai/spiceai:1.5.1 image:

docker pull spiceai/spiceai:1.5.1

For available tags, see DockerHub.

Helm:

helm repo update
helm upgrade spiceai spiceai/spiceai

What's Changed

Dependencies

No major dependency updates.

Changelog

Fix refresh via Api when dataset is already accelerated and no refresh interval is set by @sgrebnov in #6549
Add support for custom GraphQL unnesting behavior by @Advayp in #6540
Regex Update to disallow hyphens dataset names by @varunguleriaCodes in #6383
Enforce max limit on comments fetched per PR by @Advayp in #6580
Fix accelerated refresh issue by @Advayp in #6590
Enable configurations of max invocations for Bedrock models by @Advayp in #6592
Client-supplied cache keys (Spice-Cache-Key) by @mach-kernel in #6579
Improved partition pruning by @kczimm in #6582
Fix retention filter when both retention_sql and period are set by @sgrebnov in #6595
Initial support for PR comments by @Advayp in #6569
chore: Update croner by @peasee in #6547
fix databricks streaming for Claude model by @peasee in #6601
Remove FullTextUDTFAnalyzerRule and move FTS code into search crate by @jeadie in #6596
Remove download of legacy sentence transformers config by @jeadie in #6605
re-add snapshot tests by @jeadie
Embedding column config to support client-specified vector sizes by @mach-kernel in #6610
Fix mismatch in columns for the GitHub PR table type by @Advayp in #6616
bump version to 1.5.1 by @phillipleblanc
fix issues with cherry-picking by @jeadie
Add integration tests for GitHub PRs with comments by @Advayp in #6581
Add view name to view creation errors by @lukekim in #6611
CDC: Compute embeddings on ingest by @mach-kernel in #6612
