forked from continuwuation/rocksdb
Summary:
This diff introduces asynchronous preparation of all iterators within a MultiScan. Currently, each iterator is prepared as it is needed. With this diff, all iterators are prepared during the prepare phase of the Level Iterator, which allows more time for each IO to be dispatched and serviced and increases the odds that a block is ready when the scan seeks to it.
The benchmark DB is prefilled using:
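The difference between preparing iterators on demand and preparing them all up front can be illustrated with a small simulation. This is only a sketch and not the RocksDB implementation: `read_block`, `scan_lazy`, `scan_prepare_all`, and the 20 ms latency are hypothetical stand-ins, and a thread pool takes the place of real async IO.

```python
import time
from concurrent.futures import ThreadPoolExecutor

IO_LATENCY = 0.02  # simulated seconds per block read (assumed value)


def read_block(file_id):
    """Pretend to fetch the first block of one file from storage."""
    time.sleep(IO_LATENCY)
    return f"block-of-file-{file_id}"


def scan_lazy(file_ids):
    """Prepare each per-file iterator only when the scan reaches it."""
    return [read_block(f) for f in file_ids]


def scan_prepare_all(file_ids):
    """Dispatch every read up front, then consume results in scan order."""
    with ThreadPoolExecutor(max_workers=max(1, len(file_ids))) as pool:
        futures = [pool.submit(read_block, f) for f in file_ids]
        return [fut.result() for fut in futures]


def timed(fn, file_ids):
    """Return (result, elapsed seconds) for one scan strategy."""
    start = time.perf_counter()
    result = fn(file_ids)
    return result, time.perf_counter() - start


if __name__ == "__main__":
    files = list(range(8))
    _, t_lazy = timed(scan_lazy, files)
    _, t_all = timed(scan_prepare_all, files)
    print(f"lazy: {t_lazy:.3f}s  prepare-all: {t_all:.3f}s")
```

In the lazy variant the simulated reads are serialized behind the scan, while in the prepare-all variant they overlap, so later blocks are likely to be ready by the time the scan reaches them.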
```
KEYSIZE=64
VALUESIZE=512
NUMKEYS=5000000
SCAN_SIZE=100
DISTANCE=25000
NUM_SCANS=15
THREADS=1
./db_bench --db=$DB \
--benchmarks="fillseq" \
--write_buffer_size=5242880 \
--max_write_buffer_number=4 \
--target_file_size_base=5242880 \
--disable_wal=1 --key_size=$KEYSIZE \
--value_size=$VALUESIZE --num=$NUMKEYS --threads=32
```
The benchmark run is:
```
run() {
echo 1 | sudo tee /proc/sys/vm/drop_caches
./db_bench --db=$DB --use_existing_db=1 \
--benchmarks=multiscan \
--disable_auto_compactions=1 --seek_nexts=$SCAN_SIZE \
--multiscan-use-async-io=1 \
--multiscan-size=$NUM_SCANS --multiscan-stride=$DISTANCE \
--key_size=$KEYSIZE --value_size=$VALUESIZE \
--num=$NUMKEYS --threads=$THREADS --duration=60 --statistics
}
```
The benchmark uses large stride sizes to ensure that two scans touch separate files. We reduce the size of the block cache to increase the likelihood of reads from storage (and to simulate larger data sets).
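A rough size check shows why a stride of 25000 is likely to land consecutive scans in different files. This is a back-of-the-envelope sketch: it assumes the stride is measured in keys, that keys are not compressed, and it ignores block and index overhead. The compressed value size is taken from the db_bench header ("256 bytes after compression").

```python
KEY_SIZE = 64                 # --key_size
VALUE_SIZE = 512              # --value_size
COMPRESSED_VALUE_SIZE = 256   # "256 bytes after compression" in the db_bench header
STRIDE_KEYS = 25000           # --multiscan-stride (assumed to be a key count)
TARGET_FILE_SIZE = 5242880    # --target_file_size_base

# Bytes of data lying between the starts of two consecutive scans.
raw_span = STRIDE_KEYS * (KEY_SIZE + VALUE_SIZE)
compressed_span = STRIDE_KEYS * (KEY_SIZE + COMPRESSED_VALUE_SIZE)

print(f"raw span:        {raw_span} bytes")
print(f"compressed span: {compressed_span} bytes")
print(f"target file:     {TARGET_FILE_SIZE} bytes")
print("stride exceeds one file:", compressed_span > TARGET_FILE_SIZE)
```

Under these assumptions, even the compressed span between two scan starts is larger than the target file size, which is consistent with each scan touching a separate file.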
**Branch:**
```
Integrated BlobDB: blob cache disabled
RocksDB: version 10.8.0
Date: Tue Nov 11 13:26:29 2025
CPU: 166 * AMD EPYC-Milan Processor
CPUCache: 512 KB
Keys: 64 bytes each (+ 0 bytes user-defined timestamp)
Values: 512 bytes each (256 bytes after compression)
Entries: 5000000
Prefix: 0 bytes
Keys per prefix: 0
RawSize: 2746.6 MB (estimated)
FileSize: 1525.9 MB (estimated)
Write rate: 0 bytes/second
Read rate: 0 ops/second
Compression: Snappy
Compression sampling rate: 0
Memtablerep: SkipListFactory
Perf Level: 1
------------------------------------------------
multiscan_stride = 25000
multiscan_size = 15
seek_nexts = 100
DB path: [/data/rocksdb/mydb]
multiscan : 837.941 micros/op 1193 ops/sec 60.001 seconds 71605 operations; (multscans:71605)
```
**Baseline:**
```
Set seed to 1762898809121995 because --seed was 0
Initializing RocksDB Options from the specified file
Initializing RocksDB Options from command-line flags
Integrated BlobDB: blob cache disabled
RocksDB: version 10.9.0
Date: Tue Nov 11 14:06:49 2025
CPU: 166 * AMD EPYC-Milan Processor
CPUCache: 512 KB
Keys: 64 bytes each (+ 0 bytes user-defined timestamp)
Values: 512 bytes each (256 bytes after compression)
Entries: 5000000
Prefix: 0 bytes
Keys per prefix: 0
RawSize: 2746.6 MB (estimated)
FileSize: 1525.9 MB (estimated)
Write rate: 0 bytes/second
Read rate: 0 ops/second
Compression: Snappy
Compression sampling rate: 0
Memtablerep: SkipListFactory
Perf Level: 1
------------------------------------------------
multiscan_stride = 25000
multiscan_size = 15
seek_nexts = 100
DB path: [/data/rocksdb/mydb]
multiscan : 1129.916 micros/op 885 ops/sec 60.001 seconds 53102 operations; (multscans:53102)
```
Repeated for confirmation.
This introduces a ~20% improvement in latency and ops/sec; in the run shown above, latency drops from 1129.916 to 837.941 micros/op and throughput rises from 885 to 1193 ops/sec.
Note: Benchmarks are single-threaded because, as the thread count increases, block cache contention induces a large amount of overhead, and baseline and branch eventually perform the same.
Further, on network-attached storage with high latency, having the level iterator prepare all iterators saw a 20% improvement even at high thread counts.
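For reference, the relative change for the single run shown above can be computed directly from the two `multiscan` result lines. This is a small arithmetic sketch; `percent_change` is a helper introduced here for illustration, and the result covers only this one pair of runs, not the repeated ones.

```python
def percent_change(baseline, branch):
    """Relative change of branch versus baseline, in percent."""
    return (branch - baseline) / baseline * 100.0


# Values copied from the db_bench output above.
baseline_latency, branch_latency = 1129.916, 837.941  # micros/op
baseline_ops, branch_ops = 885, 1193                  # ops/sec

latency_change = percent_change(baseline_latency, branch_latency)
throughput_change = percent_change(baseline_ops, branch_ops)

print(f"latency change:    {latency_change:.1f}%")
print(f"throughput change: {throughput_change:.1f}%")
```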
Pull Request resolved: https://github.com/facebook/rocksdb/pull/14100
Reviewed By: anand1976
Differential Revision: D86913584
Pulled By: krhancoc
fbshipit-source-id: da9d0c890e25e392a33389ce6b80f9bfb84d3f85
| Name | | |
|---|---|---|
| file_read_sample.h | ||
| histogram.cc | ||
| histogram.h | ||
| histogram_test.cc | ||
| histogram_windowing.cc | ||
| histogram_windowing.h | ||
| in_memory_stats_history.cc | ||
| in_memory_stats_history.h | ||
| instrumented_mutex.cc | ||
| instrumented_mutex.h | ||
| iostats_context.cc | ||
| iostats_context_imp.h | ||
| iostats_context_test.cc | ||
| perf_context.cc | ||
| perf_context_imp.h | ||
| perf_level.cc | ||
| perf_level_imp.h | ||
| perf_step_timer.h | ||
| persistent_stats_history.cc | ||
| persistent_stats_history.h | ||
| statistics.cc | ||
| statistics_impl.h | ||
| statistics_test.cc | ||
| stats_history_test.cc | ||
| thread_status_impl.cc | ||
| thread_status_updater.cc | ||
| thread_status_updater.h | ||
| thread_status_updater_debug.cc | ||
| thread_status_util.cc | ||
| thread_status_util.h | ||
| thread_status_util_debug.cc | ||