Skip to main content

2 posts tagged with "partitioning"

View All Tags

Spice v1.8.2 (Oct 21, 2025)

ยท 5 min read
Jack Eadie
Token Plumber at Spice AI

Announcing the release of Spice v1.8.2! ๐Ÿ”

Spice v1.8.2 is a patch release focused on reliability, validation, performance, and bug fixes, with improvements across DuckDB acceleration, S3 Vectors, document tables, and HTTP search.

What's New in v1.8.2โ€‹

Support Table Relations in /v1/search HTTP Endpointโ€‹

Spice now supports table relations for the additional_columns and where parameters in the /v1/search endpoint. This enables improved search for multi-dataset use cases, where filters and columns can be used on specific datasets.

Example:

curl 'http://localhost:8090/v1/search' \
-H 'Content-Type: application/json' \
-H 'Accept: application/json' -d '{
"text": "hello world",
"additional_columns": ["tbl1.foo", "tbl2.bar", "baz"],
"where": "tbl1.foo > 100000",
"limit": 5
}'

In this example, search results from the tbl1 dataset will include columns foo and baz, where foo > 100000. For tbl2, columns bar and baz will be returned.

DuckDB Data Accelerator Table Partitioning & Indexingโ€‹

  • Configurable DuckDB Index Scan: DuckDB acceleration now supports configurable duckdb_index_scan_percentage and duckdb_index_scan_max_count parameters, supporting fine-tuning of index scan behavior for improved query performance.

Example:

datasets:
- from: postgres:my_table
name: my_table
acceleration:
enabled: true
engine: duckdb
mode: file
params:
# When combined, DuckDB will use an index scan when the number of qualifying rows is less than the maximum of these two thresholds
duckdb_index_scan_percentage: '0.10' # 10% as decimal
duckdb_index_scan_max_count: '1000'
  • Hive-Style Partitioning: In file-partitioned mode, the DuckDB data accelerator uses Hive-style partitioning for more efficient file management.

  • Table-Based Partitioning: Spice now supports partitioning DuckDB accelerations within a single file. This approach maintains ACID guarantees for full and append mode refreshes, while optimizing resource usage and improving query performance. Configure via the partition_mode parameter:

datasets:
- from: file:test_data.parquet
name: test_data
params:
file_format: parquet
acceleration:
enabled: true
engine: duckdb
mode: file
params:
partition_mode: tables
partition_by:
- bucket(100, Field1)

S3 Vectors Reliabilityโ€‹

  • Race Condition Fix: Resolved a race condition in S3 Vectors index and bucket creation. The runtime also now checks if an index or bucket exists after a ConflictException, ensuring robust error handling during index creation and improving reliability for large-scale multi-index vector search.

Document Table Improvementsโ€‹

  • Primary Key Update: Document tables now use the location column as the primary key, improving performance, consistency, and query reliability.

Additional Improvements & Bugfixesโ€‹

  • Reliability: Improved error handling and resource checks for S3 Vectors and DuckDB acceleration.
  • Validation: Expanded validation for partitioning and index creation.
  • Performance: Optimized partition refresh and index scan logic.
  • Bugfix: Don't nullify DuckDB release callbacks for schemas.

Contributorsโ€‹

Breaking Changesโ€‹

No breaking changes.

Cookbook Updatesโ€‹

No major cookbook updates.

The Spice Cookbook includes 81 recipes to help you get started with Spice quickly and easily.

Upgradingโ€‹

To upgrade to v1.8.2, use one of the following methods:

CLI:

spice upgrade

Homebrew:

brew upgrade spiceai/spiceai/spice

Docker:

Pull the spiceai/spiceai:1.8.2 image:

docker pull spiceai/spiceai:1.8.2

For available tags, see DockerHub.

Helm:

helm repo update
helm upgrade spiceai spiceai/spiceai

AWS Marketplace:

๐ŸŽ‰ Spice is now available in the AWS Marketplace!

What's Changedโ€‹

Changelogโ€‹

  • Update mongo config for benchmarks by @krinart in #7546
  • Configurable DuckDB duckdb_index_scan_percentage & duckdb_index_scan_max_count by @lukekim in #7551
  • Fix race condition in S3 Vectors index and bucket creation by @kczimm in #7577
  • Use 'location' as primary key for document tables by @Jeadie in #7567
  • Update official Docker builds to use release binaries by @phillipleblanc in #7597
  • Hive-style partitioning for DuckDB file mode by @kczimm in #7563
  • New Generate Changelog workflow by @krinart in #7562
  • Add support for DuckDB table-based partitioning by @sgrebnov in #7581
  • DuckDB table partitioning: delete partitions that no longer exist after full refresh by @sgrebnov in #7614
  • Rename duckdb_partition_mode to partition_mode param by @sgrebnov in #7622
  • Fix license issue in table-providers by @phillipleblanc in #7620
  • Make DuckDB table partition data write threshold configurable by @sgrebnov in #7626
  • fix: Don't nullify DuckDB release callbacks for schemas by @peasee in #7628
  • Fix integration tests by reverting the use of batch inserts w/ prepared statements by @phillipleblanc in #7630
  • Return TableProvider from CandidateGeneration::search by @Jeadie in #7559
  • Handle table relations in HTTP v1/search by @Jeadie in #7615

Spice v1.5.1 (July 28, 2025)

ยท 5 min read
Jack Eadie
Token Plumber at Spice AI

Announcing the release of Spice v1.5.1! ๐Ÿ”‘

Spice v1.5.1 expands the GitHub data connector to include pull-request comments, adds a configurable rate limiting for AWS Bedrock embedding models, expands partition pruning with inequality operators, and adds client-supplied cache keys for granular caching control in the HTTP and Arrow Flight SQL APIs.

What's New in v1.5.1โ€‹

GitHub Data Connector Pull Request Comments: Configure GitHub pulls datasets to include comments.

Example Spicepod.yaml:

datasets:
- from: github:github.com/spiceai/spiceai/pulls
name: spiceai.pulls
params:
github_include_comments: all # 'review', 'discussion', or 'none'. Defaults to 'none'.
github_max_comments_fetched: '25' # Defaults to 100
# ...

For details, see the GitHub Data Connector documentation.

AWS Bedrock Embedding Models Invocation Control: Improved rate limiting control for AWS Bedrock embedding models with max_concurrent_invocations configuration.

embeddings:
- from: bedrock:cohere.embed-english-v3
name: cohere-embeddings
params:
max_concurrent_invocations: '41'
# ...

For details, see the AWS Bedrock Embeddings Model Provider documentation.

Improved Query Partitioning: Expanded partition pruning support with additional inequality operators (e.g. >, >=, <, <=).

For details, see the Query Partitioning documentation.

Client-Supplied Cache Keys: Support for a new Spice-Cache-Key header/metadata-key in the HTTP and Arrow Flight SQL query APIs to for fine-grained client-side caching control.

Example HTTP API usage:

$ curl -vvS -XPOST http://localhost:8090/v1/sql \
-H"spice-cache-key: 1851400_20170216_north_america" \
-d "select * from scihub_journals_accessed
where user_id = '1851400'
and date_trunc('DAY', timestamp) = '2017-02-16'
and city = 'New York';"

Example Response:

< HTTP/1.1 200 OK
< content-type: application/json
< x-cache: Hit from spiceai
< results-cache-status: HIT
< vary: Spice-Cache-Key
< vary: origin, access-control-request-method, access-control-request-headers
< content-length: 604
< date: Wed, 23 Jul 2025 20:26:12 GMT
<
[{
"timestamp": "2017-02-16 09:55:06",
"doi": "10.1155/2012/650929",
"ip_identifier": 1000856,
"user_id": 1851400,
"country": "United States",
"city": "New York",
"longitude": 40.7830603,
"latitude": -73.9712488
},
...
]

For details, see the Cache Control documentation.

Contributorsโ€‹

New Contributorsโ€‹

Breaking Changesโ€‹

  • N/A

Cookbook Updatesโ€‹

No new recipes added in this release.

The Spice Cookbook includes 74 recipes to help you get started with Spice quickly and easily.

Upgradingโ€‹

To upgrade to v1.5.1, use one of the following methods:

CLI:

spice upgrade

Homebrew:

brew upgrade spiceai/spiceai/spice

Docker:

Pull the spiceai/spiceai:1.5.1 image:

docker pull spiceai/spiceai:1.5.1

For available tags, see DockerHub.

Helm:

helm repo update
helm upgrade spiceai spiceai/spiceai

What's Changedโ€‹

Dependenciesโ€‹

No major dependency updates.

Changelogโ€‹

  • Fix refresh via Api when dataset is already accelerated and no refresh interval is set by @sgrebnov in #6549
  • Add support for custom GraphQL unnesting behavior by @Advayp in #6540
  • Regex Update to disallow hyphens dataset names by @varunguleriaCodes in #6383
  • Enforce max limit on comments fetched per PR by @Advayp in #6580
  • Fix accelerated refresh issue by @Advayp in #6590
  • Enable configurations of max invocations for Bedrock models by @Advayp in #6592
  • Client-supplied cache keys (Spice-Cache-Key) by @mach-kernel in #6579
  • Improved partition pruning by @kczimm in #6582
  • Fix retention filter when both retention_sql and period are set by @sgrebnov in #6595
  • Initial support for PR comments by @Advayp in #6569
  • chore: Update croner by @peasee in #6547
  • fix databricks streaming for Claude model by @peasee in #6601
  • Remove FullTextUDTFAnalyzerRule and move FTS code into search crate by @jeadie in #6596
  • Remove download of legacy sentence transformers config by @jeadie in #6605
  • re-add snapshot tests by @jeadie
  • Embedding column config to support client-specified vector sizes by @mach-kernel in #6610
  • Fix mismatch in columns for the GitHub PR table type by @Advayp in #6616
  • bump version to 1.5.1 by @phillipleblanc
  • fix issues with cherry-picking by @jeadie
  • Add integration tests for GitHub PRs with comments by @Advayp in #6581
  • Add view name to view creation errors by @lukekim in #6611
  • CDC: Compute embeddings on ingest by @mach-kernel in #6612