rocksdb/java
Josh Kang 3ad23b2d94 Support automated interpolation search (#14383)
Summary:
Add automatic per-block interpolation search selection (`kAuto` mode) for index blocks. During SST construction, each index block's key distribution is analyzed using the coefficient of variation (CV) of gaps between restart-point keys. Blocks with uniformly distributed keys are flagged via a new bit in the data block footer, and at read time, `kAuto` resolves to interpolation search for uniform blocks and binary search otherwise.

## Key changes

- **New `BlockSearchType::kAuto` enum value**: Resolves per-block at read time to either `kInterpolation` or `kBinary` based on the block's uniformity flag. Falls back to `kBinary` on older versions that don't recognize it.
- **Write-path uniformity analysis**: `BlockBuilder::ScanForUniformity()` uses Welford's online algorithm to incrementally compute the CV of key gaps at restart points. The result is stored in a new bit (bit 30) of the data block footer's packed restart count.
- **New table option `uniform_cv_threshold`** (default: -1 `disabled`): Controls how strict the uniformity check is. Set to negative to disable. Exposed in C++, Java (JNI), and `db_bench`.
- **Code reorganization**: Block entry decode helpers (`DecodeEntry`, `DecodeKey`, `DecodeKeyV4`, `ReadBe64FromKey`) moved from `block.cc` to a new shared header `block_util.h` so they can be reused by `BlockBuilder` on the write path.
- **New histogram `BLOCK_KEY_DISTRIBUTION_CV`**: Records the CV (scaled by 10000) of each index block's key distribution for observability.
- **Java bindings**: `IndexSearchType.kAuto`, `uniformCvThreshold` getter/setter, JNI portal constructor signature updated, and `HistogramType.BLOCK_KEY_DISTRIBUTION_CV` added.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/14383

Test Plan:
- `IndexBlockTest.IndexValueEncodingTest` parameterized to include `kAuto` search type alongside `kBinary` and `kInterpolation`, verifying correct seek/iteration behavior across all combinations of key distributions, restart intervals, and key lengths.
- Uniformity detection validated: blocks with uniform key distribution correctly set `is_uniform = true`, blocks with clustered/non-uniform keys set `is_uniform = false`.
- Stress test coverage
- Updated check_format_compatible to also include a "uniform" dataset. By default using uniform_cv_threshold=-1 does not result in an incompatibility issues. When manually changing the threshold (e.g. `uniform_cv_threshold=1000`), I see `bad block contents`, which is expected

## Benchmark

readrandom with `fillrandom,compact -seed=1 --statistics`:

| Benchmark | Branch | Params | avg ops/s | % change vs main | CV P50 |
|-----------|--------|--------|-----------|------------------|--------|
| readrandom | main | `binary_search, shortening=1` | 335,791 | baseline | N/A |
| readrandom | feature | `binary_search, shortening=1` (default) | 335,749 | -0.0% | 1,500 |
| readrandom | feature | `auto_search, shortening=1` (kAuto) | 366,832 | **+9.2%** | 1,500 |
| readrandom | feature | `interpolation_search, shortening=1` | 366,598 | **+9.2%** | 1,500 |
| readrandom | feature | `auto_search, shortening=2` (kAuto) | 344,631 | **+2.6%** | 1,030,000 |
| readrandom | feature | `interpolation_search, shortening=2` | 201,178 | **-40.1%** | 1,030,000 |

As seen with shortening=2, a non-uniform distribution produces a high CV, which does not use interpolation search.

## Write benchmark

There is a write overhead which scans each restart entry for a block upon Finish. In practice this is very low because currently it is only applied to index blocks.

See cpu profile (https://fburl.com/strobelight/io5hwj9h) here of `-benchmarks=fillseq,compact -compression_type=none -disable_wal=1`. Only 0.08% attributed to `ScanForUniformity`.

Reviewed By: pdillinger

Differential Revision: D94738890

Pulled By: joshkang97

fbshipit-source-id: 9661ac593c5fef89d49f3a8a027f1338a0c96766
2026-03-06 10:13:51 -08:00
..
benchmark/src/main/java/org/rocksdb/benchmark Remove deprecated Options::access_hint_on_compaction_start (#11654) 2024-02-05 13:35:19 -08:00
crossbuild Pass build parallelism flag to Docker builds (#12392) 2024-02-28 12:51:00 -08:00
jmh JNI get_helper code sharing / multiGet() use efficient batch C++ support (#12344) 2024-03-12 12:42:08 -07:00
rocksjni Support automated interpolation search (#14383) 2026-03-06 10:13:51 -08:00
samples/src/main/java Remove compressed block cache (#11117) 2023-01-24 17:09:19 -08:00
src Support automated interpolation search (#14383) 2026-03-06 10:13:51 -08:00
CMakeLists.txt Add interpolation search as an alternative to binary search (#14247) 2026-02-13 17:15:10 -08:00
GetPutBenchmarks.md Java API consistency between RocksDB.put() , .merge() and Transaction.put() , .merge() (#11019) 2023-12-11 11:03:17 -08:00
HISTORY-JAVA.md Update JAVA-HISTORY.md for v3.13 2015-08-04 18:12:58 -07:00
jdb_bench.sh Add copyright headers per FB open-source checkup tool. (#5199) 2019-04-18 10:55:01 -07:00
Makefile Add native logger support to RocksJava (#12213) 2024-01-17 17:51:36 -08:00
pmd-rules.xml Java API consistency between RocksDB.put() , .merge() and Transaction.put() , .merge() (#11019) 2023-12-11 11:03:17 -08:00
pom.xml.template Perform java static checks in CI (#11221) 2023-10-17 10:04:35 -07:00
RELEASE.md Add shared library for musl-libc (#3143) 2019-11-26 18:24:09 -08:00
spotbugs-exclude.xml Perform java static checks in CI (#11221) 2023-10-17 10:04:35 -07:00
understanding_options.md New-style blob option bindings, Java option getter and improve/fix option parsing (#8999) 2021-10-19 09:21:52 -07:00