rocksdb/java/rocksjni
Josh Kang 3ad23b2d94 Support automated interpolation search (#14383)
Summary:
Add automatic per-block interpolation search selection (`kAuto` mode) for index blocks. During SST construction, each index block's key distribution is analyzed using the coefficient of variation (CV) of gaps between restart-point keys. Blocks with uniformly distributed keys are flagged via a new bit in the data block footer, and at read time, `kAuto` resolves to interpolation search for uniform blocks and binary search otherwise.

## Key changes

- **New `BlockSearchType::kAuto` enum value**: Resolves per-block at read time to either `kInterpolation` or `kBinary` based on the block's uniformity flag. Falls back to `kBinary` on older versions that don't recognize it.
- **Write-path uniformity analysis**: `BlockBuilder::ScanForUniformity()` uses Welford's online algorithm to incrementally compute the CV of key gaps at restart points. The result is stored in a new bit (bit 30) of the data block footer's packed restart count.
- **New table option `uniform_cv_threshold`** (default: -1 `disabled`): Controls how strict the uniformity check is. Set to negative to disable. Exposed in C++, Java (JNI), and `db_bench`.
- **Code reorganization**: Block entry decode helpers (`DecodeEntry`, `DecodeKey`, `DecodeKeyV4`, `ReadBe64FromKey`) moved from `block.cc` to a new shared header `block_util.h` so they can be reused by `BlockBuilder` on the write path.
- **New histogram `BLOCK_KEY_DISTRIBUTION_CV`**: Records the CV (scaled by 10000) of each index block's key distribution for observability.
- **Java bindings**: `IndexSearchType.kAuto`, `uniformCvThreshold` getter/setter, JNI portal constructor signature updated, and `HistogramType.BLOCK_KEY_DISTRIBUTION_CV` added.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/14383

Test Plan:
- `IndexBlockTest.IndexValueEncodingTest` parameterized to include `kAuto` search type alongside `kBinary` and `kInterpolation`, verifying correct seek/iteration behavior across all combinations of key distributions, restart intervals, and key lengths.
- Uniformity detection validated: blocks with uniform key distribution correctly set `is_uniform = true`, blocks with clustered/non-uniform keys set `is_uniform = false`.
- Stress test coverage
- Updated check_format_compatible to also include a "uniform" dataset. By default using uniform_cv_threshold=-1 does not result in an incompatibility issues. When manually changing the threshold (e.g. `uniform_cv_threshold=1000`), I see `bad block contents`, which is expected

## Benchmark

readrandom with `fillrandom,compact -seed=1 --statistics`:

| Benchmark | Branch | Params | avg ops/s | % change vs main | CV P50 |
|-----------|--------|--------|-----------|------------------|--------|
| readrandom | main | `binary_search, shortening=1` | 335,791 | baseline | N/A |
| readrandom | feature | `binary_search, shortening=1` (default) | 335,749 | -0.0% | 1,500 |
| readrandom | feature | `auto_search, shortening=1` (kAuto) | 366,832 | **+9.2%** | 1,500 |
| readrandom | feature | `interpolation_search, shortening=1` | 366,598 | **+9.2%** | 1,500 |
| readrandom | feature | `auto_search, shortening=2` (kAuto) | 344,631 | **+2.6%** | 1,030,000 |
| readrandom | feature | `interpolation_search, shortening=2` | 201,178 | **-40.1%** | 1,030,000 |

As seen with shortening=2, a non-uniform distribution produces a high CV, which does not use interpolation search.

## Write benchmark

There is a write overhead which scans each restart entry for a block upon Finish. In practice this is very low because currently it is only applied to index blocks.

See cpu profile (https://fburl.com/strobelight/io5hwj9h) here of `-benchmarks=fillseq,compact -compression_type=none -disable_wal=1`. Only 0.08% attributed to `ScanForUniformity`.

Reviewed By: pdillinger

Differential Revision: D94738890

Pulled By: joshkang97

fbshipit-source-id: 9661ac593c5fef89d49f3a8a027f1338a0c96766
2026-03-06 10:13:51 -08:00
..
backup_engine_options.cc Change Java native methods to static (#11882) 2024-01-25 12:36:30 -08:00
backupenginejni.cc Change Java native methods to static (#11882) 2024-01-25 12:36:30 -08:00
cache.cc Put Cache and CacheWrapper in new public header (#11192) 2023-02-09 12:12:02 -08:00
cassandra_compactionfilterjni.cc Fix pointer to jlong conversion in 32 bit OS (#9396) 2022-03-01 09:02:15 -08:00
cassandra_value_operator.cc Change Java native methods to static (#11882) 2024-01-25 12:36:30 -08:00
checkpoint.cc Change Java native methods to static (#11882) 2024-01-25 12:36:30 -08:00
clock_cache.cc Change Java native methods to static (#11882) 2024-01-25 12:36:30 -08:00
columnfamilyhandle.cc Change Java native methods to static (#11882) 2024-01-25 12:36:30 -08:00
compact_range_options.cc Change Java native methods to static (#11882) 2024-01-25 12:36:30 -08:00
compaction_filter.cc Run format check for *.h and *.cc files under java/ (#10851) 2022-10-25 09:26:51 -07:00
compaction_filter_factory.cc Change Java native methods to static (#11882) 2024-01-25 12:36:30 -08:00
compaction_filter_factory_jnicallback.cc Run format check for *.h and *.cc files under java/ (#10851) 2022-10-25 09:26:51 -07:00
compaction_filter_factory_jnicallback.h Run format check for *.h and *.cc files under java/ (#10851) 2022-10-25 09:26:51 -07:00
compaction_job_info.cc Change Java native methods to static (#11882) 2024-01-25 12:36:30 -08:00
compaction_job_stats.cc Change Java native methods to static (#11882) 2024-01-25 12:36:30 -08:00
compaction_options.cc Change Java native methods to static (#11882) 2024-01-25 12:36:30 -08:00
compaction_options_fifo.cc Add a new picking algorithm in fifo compaction (#14326) 2026-02-15 10:04:58 -08:00
compaction_options_universal.cc Change Java native methods to static (#11882) 2024-01-25 12:36:30 -08:00
comparator.cc Change Java native methods to static (#11882) 2024-01-25 12:36:30 -08:00
comparatorjnicallback.cc Fix Java API ComparatorOptions use after delete error (#11176) 2023-02-17 13:03:41 -08:00
comparatorjnicallback.h Fix Java API ComparatorOptions use after delete error (#11176) 2023-02-17 13:03:41 -08:00
compression_options.cc Change Java native methods to static (#11882) 2024-01-25 12:36:30 -08:00
concurrent_task_limiter.cc Change Java native methods to static (#11882) 2024-01-25 12:36:30 -08:00
config_options.cc Reformat source files (#14331) 2026-02-13 11:56:22 -08:00
cplusplus_to_java_convert.h Fix pointer to jlong conversion in 32 bit OS (#9396) 2022-03-01 09:02:15 -08:00
env.cc Change Java native methods to static (#11882) 2024-01-25 12:36:30 -08:00
env_options.cc Reformat source files (#14331) 2026-02-13 11:56:22 -08:00
event_listener.cc Fix pointer to jlong conversion in 32 bit OS (#9396) 2022-03-01 09:02:15 -08:00
event_listener_jnicallback.cc Add (& fix) some simple source code checks (#8821) 2021-09-07 21:19:27 -07:00
event_listener_jnicallback.h Add (& fix) some simple source code checks (#8821) 2021-09-07 21:19:27 -07:00
export_import_files_metadatajni.cc Add jni Support for API CreateColumnFamilyWithImport (#11646) 2023-11-06 07:38:42 -08:00
filter.cc Change Java native methods to static (#11882) 2024-01-25 12:36:30 -08:00
hyper_clock_cache.cc Add HyperClockCache Java API. (#12065) 2023-11-16 15:46:31 -08:00
import_column_family_options.cc Reformat source files (#14331) 2026-02-13 11:56:22 -08:00
ingest_external_file_options.cc Change Java native methods to static (#11882) 2024-01-25 12:36:30 -08:00
iterator.cc Change Java native methods to static (#11882) 2024-01-25 12:36:30 -08:00
jni_multiget_helpers.cc JNI get_helper code sharing / multiGet() use efficient batch C++ support (#12344) 2024-03-12 12:42:08 -07:00
jni_multiget_helpers.h JNI get_helper code sharing / multiGet() use efficient batch C++ support (#12344) 2024-03-12 12:42:08 -07:00
jni_perf_context.cc Implement PerfContex#toString for the Java API (#12473) 2024-04-03 14:33:31 -07:00
jnicallback.cc Run format check for *.h and *.cc files under java/ (#10851) 2022-10-25 09:26:51 -07:00
jnicallback.h Run format check for *.h and *.cc files under java/ (#10851) 2022-10-25 09:26:51 -07:00
kv_helper.h Reformat source files (#14331) 2026-02-13 11:56:22 -08:00
loggerjnicallback.cc Add native logger support to RocksJava (#12213) 2024-01-17 17:51:36 -08:00
loggerjnicallback.h Run format check for *.h and *.cc files under java/ (#10851) 2022-10-25 09:26:51 -07:00
lru_cache.cc Change Java native methods to static (#11882) 2024-01-25 12:36:30 -08:00
memory_util.cc Reformat source files (#14331) 2026-02-13 11:56:22 -08:00
memtablejni.cc Change Java native methods to static (#11882) 2024-01-25 12:36:30 -08:00
merge_operator.cc Change Java native methods to static (#11882) 2024-01-25 12:36:30 -08:00
native_comparator_wrapper_test.cc Fix pointer to jlong conversion in 32 bit OS (#9396) 2022-03-01 09:02:15 -08:00
optimistic_transaction_db.cc Change Java native methods to static (#11882) 2024-01-25 12:36:30 -08:00
optimistic_transaction_options.cc Change Java native methods to static (#11882) 2024-01-25 12:36:30 -08:00
options.cc Add option to validate sst files in the background on DB open (#14322) 2026-03-02 16:18:14 -08:00
options_util.cc java API - load block based table config (#10826) 2023-10-12 09:39:01 -07:00
persistent_cache.cc Change Java native methods to static (#11882) 2024-01-25 12:36:30 -08:00
portal.h Support automated interpolation search (#14383) 2026-03-06 10:13:51 -08:00
ratelimiterjni.cc Change Java native methods to static (#11882) 2024-01-25 12:36:30 -08:00
remove_emptyvalue_compactionfilterjni.cc Fix pointer to jlong conversion in 32 bit OS (#9396) 2022-03-01 09:02:15 -08:00
restorejni.cc Change Java native methods to static (#11882) 2024-01-25 12:36:30 -08:00
rocks_callback_object.cc Change Java native methods to static (#11882) 2024-01-25 12:36:30 -08:00
rocksdb_exception_test.cc Run format check for *.h and *.cc files under java/ (#10851) 2022-10-25 09:26:51 -07:00
rocksjni.cc Remove deprecated DB::Open raw pointer variants (and more) (#14335) 2026-02-17 23:33:39 -08:00
slice.cc Change Java native methods to static (#11882) 2024-01-25 12:36:30 -08:00
snapshot.cc Change Java native methods to static (#11882) 2024-01-25 12:36:30 -08:00
sst_file_manager.cc Change Java native methods to static (#11882) 2024-01-25 12:36:30 -08:00
sst_file_reader_iterator.cc Change Java native methods to static (#11882) 2024-01-25 12:36:30 -08:00
sst_file_readerjni.cc Reformat source files (#14331) 2026-02-13 11:56:22 -08:00
sst_file_writerjni.cc Reformat source files (#14331) 2026-02-13 11:56:22 -08:00
sst_partitioner.cc Change Java native methods to static (#11882) 2024-01-25 12:36:30 -08:00
statistics.cc Change Java native methods to static (#11882) 2024-01-25 12:36:30 -08:00
statisticsjni.cc Fix tabs and lint-ignores (#6734) 2020-04-20 11:39:31 -07:00
statisticsjni.h Remove 'virtual' when implied by 'override' (#12319) 2024-01-31 13:14:42 -08:00
stderr_logger.cc Add native logger support to RocksJava (#12213) 2024-01-17 17:51:36 -08:00
table.cc Support automated interpolation search (#14383) 2026-03-06 10:13:51 -08:00
table_filter.cc Fix pointer to jlong conversion in 32 bit OS (#9396) 2022-03-01 09:02:15 -08:00
table_filter_jnicallback.cc Run format check for *.h and *.cc files under java/ (#10851) 2022-10-25 09:26:51 -07:00
table_filter_jnicallback.h Run format check for *.h and *.cc files under java/ (#10851) 2022-10-25 09:26:51 -07:00
table_properties_collector_factory.cc Reformat source files (#14331) 2026-02-13 11:56:22 -08:00
table_properties_collector_factory.h Fix header files to meet Open source requirements (#12164) 2023-12-19 13:43:17 -08:00
testable_event_listener.cc Reformat source files (#14331) 2026-02-13 11:56:22 -08:00
thread_status.cc Run format check for *.h and *.cc files under java/ (#10851) 2022-10-25 09:26:51 -07:00
trace_writer.cc Run format check for *.h and *.cc files under java/ (#10851) 2022-10-25 09:26:51 -07:00
trace_writer_jnicallback.cc Run format check for *.h and *.cc files under java/ (#10851) 2022-10-25 09:26:51 -07:00
trace_writer_jnicallback.h Run format check for *.h and *.cc files under java/ (#10851) 2022-10-25 09:26:51 -07:00
transaction.cc Expose optimized TransactionBaseImpl::MultiGet through JNI (#13589) 2025-05-14 13:19:06 -07:00
transaction_db.cc Change Java native methods to static (#11882) 2024-01-25 12:36:30 -08:00
transaction_db_options.cc Change Java native methods to static (#11882) 2024-01-25 12:36:30 -08:00
transaction_log.cc Change Java native methods to static (#11882) 2024-01-25 12:36:30 -08:00
transaction_notifier.cc Change Java native methods to static (#11882) 2024-01-25 12:36:30 -08:00
transaction_notifier_jnicallback.cc Run format check for *.h and *.cc files under java/ (#10851) 2022-10-25 09:26:51 -07:00
transaction_notifier_jnicallback.h Run format check for *.h and *.cc files under java/ (#10851) 2022-10-25 09:26:51 -07:00
transaction_options.cc Change Java native methods to static (#11882) 2024-01-25 12:36:30 -08:00
ttl.cc use nullptr instead of NULL / 0 in rocksdbjni (#12575) 2024-05-21 12:56:07 -07:00
wal_filter.cc Run format check for *.h and *.cc files under java/ (#10851) 2022-10-25 09:26:51 -07:00
wal_filter_jnicallback.cc Run format check for *.h and *.cc files under java/ (#10851) 2022-10-25 09:26:51 -07:00
wal_filter_jnicallback.h Run format check for *.h and *.cc files under java/ (#10851) 2022-10-25 09:26:51 -07:00
write_batch.cc Replace ScopedArenaIterator with ScopedArenaPtr<InternalIterator> (#12470) 2024-03-22 13:40:42 -07:00
write_batch_test.cc Introduce a transaction option to skip memtable write during commit (#13144) 2024-12-05 15:00:17 -08:00
write_batch_with_index.cc Change Java native methods to static (#11882) 2024-01-25 12:36:30 -08:00
write_buffer_manager.cc Change Java native methods to static (#11882) 2024-01-25 12:36:30 -08:00
writebatchhandlerjnicallback.cc Run format check for *.h and *.cc files under java/ (#10851) 2022-10-25 09:26:51 -07:00
writebatchhandlerjnicallback.h Run format check for *.h and *.cc files under java/ (#10851) 2022-10-25 09:26:51 -07:00