Updates to corresponding submodule position in main fork at v10.10.1 #5
Open
gamesguru wants to merge 221 commits from gamesguru/rocksdb:main-c10y into continuwuation:10.5.fb
Reference: continuwuation/rocksdb!5
No description provided.
Needs to be tested: both the C++ test suite and some integration tests with continuwuity servers.

See release notes:
allow_ingest_behind (#13807) — e7a4505a2ecf
_allow_ingest_behind (#13810) — 3bd7d968e1

Summary: fix the following error showing up in continuous tests:
```
Makefile:186: Warning: Compiling in debug mode. Don't use the resulting binary in production
port/mmap.cc:46:15: error: first argument in call to 'memcpy' is a pointer to non-trivially copyable type 'rocksdb::MemMapping' [-Werror,-Wnontrivial-memcall]
   46 |   std::memcpy(this, &other, sizeof(*this));
      |               ^
port/mmap.cc:46:15: note: explicitly cast the pointer to silence this warning
   46 |   std::memcpy(this, &other, sizeof(*this));
      |               ^
      |               (void*)
1 error generated.
make: *** [Makefile:2580: port/mmap.o] Error 1
make: *** Waiting for unfinished jobs....
```
Pull Request resolved: https://github.com/facebook/rocksdb/pull/13864
Test Plan: `make USE_CLANG=1 j=150 check` with 13f054febb/build_tools/build_detect_platform (L61-L70) commented out.
Reviewed By: mszeszko-meta
Differential Revision: D80033441
Pulled By: cbi42
fbshipit-source-id: b2330eea71fe28243236b75128ec6f3f1e971873

expect_valid_internal_key parameter from CompactionIterator (#13882) — 972fd9adf1

Summary: The implementation of parallel compression has historically scaled rather poorly, or perhaps modestly with heavy compression, topping out around 3x throughput vs. serial and incurring big overheads in CPU consumption relative to the throughput. This change addresses one source of that extra CPU consumption: stashing all the keys of a block for later processing into building index and filter blocks.

Historically with parallel compression, the index and filter block updates were handled in the last stage of processing along with writing each data block to the file writer. This was because the index blocks needed to know the BlockHandle of the new data block, which could only be known after every preceding data block was compressed, to know the starting location for the BlockHandle. And because index and filter partitions were historically coupled (see decouple_partitioned_filters), filter updates had to happen at the same time.

Here we get rid of stashing the keys for later processing, and the extra CPU associated with it, by
* Creating a two-stage process of adding to index blocks ("prepare" and "finish" each entry; one entry per data block). The two stages must be executable in parallel for separate index entries. NOTE: not yet supported by UserDefinedIndex
* Requiring decouple_partitioned_filters=true for parallel compression, because we now add to filters in the first stage of processing when each key is readily available, and we cannot couple that with finalizing index entries in the last stage of processing. It might seem like adding to filters is something that is expensive (hashing etc.) and should be kept out of the bottleneck first stage of processing (which includes walking the compaction iterator), but it's probably similar cost to simply stashing the keys away for later processing. (We might be able to reduce a bottleneck by stashing hashes, but we're not to a point where that is worth the effort.)

And it makes sense to make two more simple public API updates in conjunction with this:
* Set decouple_partitioned_filters=true by default. No signs of problems in production.
* Mark parallel compression as production-ready. It's being thoroughly tested in the crash test, successfully, and in limited production uses.
Follow-up:
* Improve the threading/synchronization model of parallel compression for the next major efficiency improvement
* Consider supporting the parallel-compatible index building APIs with UserDefinedIndex, unless it's considered too dangerous to expect users to safely handle the multi-threading.
* (In a subsequent release) remove all the code associated with coupling filter and index partitions and mark the option as ignored.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/13850
Test Plan: for correctness, existing tests

## Performance Data

The "before" data here includes revert of https://github.com/facebook/rocksdb/issues/13828 for combined performance measurement of this change and that one.
```
SUFFIX=`tty | sed 's|/|_|g'`; for CT in lz4 zstd lz4; do for PT in 1 2 3 4 6 8; do echo "$CT pt=$PT"; (for I in `seq 1 1`; do BIN=/dev/shm/dbbench${SUFFIX}.bin; rm -f $BIN; cp db_bench $BIN; /usr/bin/time $BIN -db=/dev/shm/dbbench$SUFFIX --benchmarks=fillseq -num=30000000 -compaction_style=2 -fifo_compaction_max_table_files_size_mb=1000 -fifo_compaction_allow_compaction=0 -disable_wal -write_buffer_size=12000000 -format_version=7 -compression_type=$CT -compression_parallel_threads=$PT 2>&1 | tail -n 3 | head -n 2; done); done; done
```
To get a sense of the overall performance relative to number of parallel threads, we vary that with popular fast compression and popular heavier-weight compression (some noise in this data, don't interpret each data point too strongly):

lz4 pt=1 2107431 -> 2112941 ops/sec (+0.3% - improvement) (26.51 + 0.75) = 27.26 CPU sec -> (26.63 + 0.79) = 27.42 CPU sec (+0.6% - regression)
lz4 pt=2 1606660 -> 1580333 ops/sec (-1.6% - regression) (47.10 + 8.37) = 55.47 CPU sec -> (45.05 + 9.23) = 54.28 CPU sec (-2.2% - improvement)
lz4 pt=3 1701353 -> 1889283 ops/sec (+11.1% - improvement) (47.23 + 8.29) = 55.52 CPU sec -> (43.89 + 8.33) = 52.22 CPU sec (-6.0% - improvement)
lz4 pt=4 1651504 -> 1817890 ops/sec (+10.1% - improvement) (48.07 + 8.31) = 56.38 CPU sec -> (44.77 + 8.45) = 53.22 CPU sec (-5.6% - improvement)
lz4 pt=6 1716099 -> 1888523 ops/sec (+10.1% - improvement) (47.50 + 8.45) = 55.95 CPU sec -> (44.25 + 8.73) = 52.98 CPU sec (-5.3% - improvement)
lz4 pt=8 1696840 -> 1797256 ops/sec (+5.9% - improvement) (48.09 + 8.61) = 56.70 CPU sec -> (45.90 + 8.68) = 54.58 CPU sec (-3.8% - improvement)

Clearly parallel threads do not help with fast compression like LZ4, but it's not as bad as it was before.

zstd pt=1 1214258 -> 1202863 ops/sec (-0.9% - regression) (38.26 + 0.66) = 38.92 CPU sec -> (39.37 + 0.69) = 40.06 CPU sec (+2.9% - regression)
zstd pt=2 1194673 -> 1152746 ops/sec (-3.5% - regression) (61.01 + 9.85) = 70.86 CPU sec -> (58.28 + 9.99) = 68.27 CPU sec (-3.7% - improvement)
zstd pt=3 1653661 -> 1825618 ops/sec (+10.4% - improvement) (60.07 + 8.45) = 68.52 CPU sec -> (56.03 + 8.43) = 64.46 CPU sec (-5.9% - improvement)
zstd pt=4 1691723 -> 1890976 ops/sec (+11.8% - improvement) (59.72 + 8.46) = 68.18 CPU sec -> (55.96 + 8.27) = 64.23 CPU sec (-5.7% - improvement)
zstd pt=6 1684982 -> 1900002 ops/sec (+12.8% - improvement) (58.89 + 8.26) = 67.15 CPU sec -> (55.98 + 8.48) = 64.46 CPU sec (-4.0% - improvement)
zstd pt=8 1648282 -> 1892531 ops/sec (+14.8% - improvement) (59.43 + 8.63) = 68.06 CPU sec -> (56.49 + 8.32) = 64.81 CPU sec (-4.8% - improvement)

The throughput is now able to increase by *more than half* with lots of parallelism, rather than only *about a third*.
Scalability is a bit better with higher compression level, and we still see a benefit from this change. (We've also enabled partitioned indexes and filters here, which see essentially the same benefits):

zstd pt=1 compression_level=7 595720 -> 597359 ops/sec (+0.3% - improvement) (63.45 + 0.73) = 64.18 CPU sec -> (63.25 + 0.71) = 63.96 CPU sec (-0.3% - improvement)
zstd pt=4 compression_level=7 1527116 -> 1501779 ops/sec (-1.7% - regression) (85.00 + 8.14) = 93.14 CPU sec -> (81.85 + 9.02) = 90.87 CPU sec (-2.5% - improvement)
zstd pt=6 compression_level=7 1678239 -> 1956070 ops/sec (+16.5% - improvement) (83.77 + 8.11) = 91.88 CPU sec -> (79.87 + 7.78) = 87.65 CPU sec (-4.6% - improvement)
zstd pt=8 compression_level=7 1696132 -> 1953041 ops/sec (+15.1% - improvement) (83.97 + 8.14) = 92.11 CPU sec -> (80.61 + 7.78) = 88.39 CPU sec (-4.1% - improvement)

With more tests, not really seeing any consistent differences with no parallelism (despite some micro-optimizations thrown in).

Reviewed By: hx235
Differential Revision: D79853111
Pulled By: pdillinger
fbshipit-source-id: 7a34fd7811217fb74fa6d3efaea7ffcce72beec7

IngestDBGeneratedFileTest2.NonZeroSeqno (#13979) — 841e364238

Summary: Linter complains like this
```
void foo(Arg parameter_name) {}
void bar() {
  Arg a;
  foo(/*some_other_name=*/ a);  // Wrong! Comment/parameter name mismatch
  foo(/*parameter_name=*/ a);   // This is OK; the names match.
}
```
```
Argument name in comment (`read_only`) does not match parameter name (`unchanging`).
```
This used to be a warning, but is now treated as an error :( Fixing a few other linter warnings before they become errors in the future.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/14074
Test Plan: CI
Reviewed By: archang19
Differential Revision: D85370353
Pulled By: jaykorean
fbshipit-source-id: 20e96aad740d516a29c0424282674e655f99c0a2

FindObsoleteFiles() (#14069) — e687ca79b4

Summary: Adds auto-tuning of manifest file size to avoid the need to scale `max_manifest_file_size` in proportion to things like number of SST files, to properly balance (a) manifest file write amp and new file creation, vs. (b) manifest file space amp and replay time, including non-incremental space usage in backups. (Manifest file write amp comes from re-writing a "live" record when the manifest file is re-created, or "compacted"; space amp is usage beyond what would be used by a compacted manifest file.)

In more detail,
* Add new option `max_manifest_space_amp_pct` with default value of 500, which defaults to 0.2 write amp and up to roughly 5.0 space amp, except `max_manifest_file_size` is treated as the "minimum" size before re-creating ("compacting") the manifest file.
* `max_manifest_file_size` in a way means the same thing, with the same default of 1GB, but in a way has taken on a new role. What is the same is that we do not re-create the manifest file before reaching this size (except for DB re-open), and so users are very unlikely to see a change in default behavior (auto-tuning only kicking in if auto-tuning would exceed 1GB for effective max size for the current manifest file). The new role is as a file size lower bound before auto-tuning kicks in, to minimize churn in files considered "negligibly small." We recommend a new setting of around 1MB or even smaller like 64KB, and expect something like this to become the default soon.
* These two options along with `manifest_preallocation_size` are now mutable with SetDBOptions. The effect is nearly immediate, affecting the next write to the current manifest file.
Also in this PR:
* Refactoring of VersionSet to allow it to get (more) settings from MutableDBOptions. This touches a number of files in not very interesting ways, but notably we have to be careful about thread-safe access to MutableDBOptions fields, and even fields within VersionSet. I have decided to save copies of relevant fields from MutableDBOptions to simplify testing, etc., by not saving a reference to MutableDBOptions but getting notified of updates.
* Updated some logging in VersionSet to provide some basic data about final and compacted manifest sizes (effects of auto-tuning), making sure to avoid I/O while holding the DB mutex.
* Added db_etc3_test.cc, which is intended as a successor to db_test and db_test2, but having "test.cc" in its name for easier exclusion of test files when using `git grep`. Intended follow-up: rename db_test2 to db_etc2_test
* Moved+updated the `ManifestRollOver` test to the new file to be closer to other manifest file rollover testing.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/14076
Test Plan: As for correctness, the new unit test AutoTuneManifestSize is pretty thorough. Some other unit tests updated appropriately. Manual tests in the performance section were also audited for expected behavior based on the new logging in the DB LOG. Example LOG data with -max_manifest_file_size=2048 -max_manifest_space_amp_pct=500:
```
2025/10/24-11:12:48.979472 2150678 [/version_set.cc:5927] Created manifest 5, compacted+appended from 52 to 116
2025/10/24-11:12:49.626441 2150682 [/version_set.cc:5927] Created manifest 24, compacted+appended from 2169 to 1801
2025/10/24-11:12:52.194592 2150682 [/version_set.cc:5927] Created manifest 91, compacted+appended from 10913 to 8707
2025/10/24-11:13:02.969944 2150682 [/version_set.cc:5927] Created manifest 362, compacted+appended from 52259 to 13321
2025/10/24-11:13:18.815120 2150681 [/version_set.cc:5927] Created manifest 765, compacted+appended from 80064 to 13304
2025/10/24-11:13:35.590905 2150681 [/version_set.cc:5927] Created manifest 1167, compacted+appended from 79863 to 13304
```
As you can see, it only took a few iterations of ramp-up to settle on the auto-tuned max manifest size for tracking ~122 live SST files, around 80KB and compacting down to about 13KB (13KB * (500 + 100) / 100 = 78KB). With the default large setting for max_manifest_file_size, we end up with a 232KB manifest, which is more than 90% wasted space. (A long-running DB would be much worse.)

As for performance, we don't expect a difference, even with TransactionDB, because actual writing of the manifest is done without holding the DB mutex. I was not able to see a performance regression using db_bench with FIFO compaction and >1000 ~10MB SST files, including settings of -max_manifest_file_size=2048 -max_manifest_space_amp_pct={500,10,0}. No "hiccups" visible with -histogram either. I also tried seeding a 1-second delay in writing new manifest files (other than the first). This had no significant effect at -max_manifest_space_amp_pct=500, but at 100 it started causing write stalls in my test. In many ways this is kind of a worst-case and out-of-proportion test, but it gives me more confidence that a higher number like 500 is probably the best balance in general.

Reviewed By: xingbowang
Differential Revision: D85445178
Pulled By: pdillinger
fbshipit-source-id: 1e6e07e89c586762dd65c65bb7cb2b8b719513f9
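For consumers of this submodule bump, a minimal sketch of adjusting the manifest sizing knobs described above at runtime, assuming the new option is registered under the name `max_manifest_space_amp_pct` exactly as spelled in the facebook/rocksdb#14076 commit message (not verified against the final API):

```cpp
#include <string>
#include <unordered_map>

#include "rocksdb/db.h"

// Sketch: both options are described above as mutable via DB::SetDBOptions();
// the values mirror the recommendations in the quoted Test Plan.
void TuneManifestOptions(rocksdb::DB* db) {
  std::unordered_map<std::string, std::string> new_opts = {
      // Lower bound before auto-tuning kicks in; the notes suggest ~1MB or
      // smaller (64KB here).
      {"max_manifest_file_size", std::to_string(64 * 1024)},
      // Roughly 5x space amp over a compacted manifest (the stated default).
      {"max_manifest_space_amp_pct", "500"},
  };
  rocksdb::Status s = db->SetDBOptions(new_opts);
  if (!s.ok()) {
    // Handle or log the failure, e.g. s.ToString().
  }
}
```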
Summary: This diff introduces the async prepare of all iterators within a MultiScan.

The current state has each iterator be prepared as it's needed; with this diff, we prepare all iterators during the prepare phase of the Level Iterator. This allows more time for each IO to be dispatched and serviced, increasing the odds that a block is ready as the scan seeks to it.

The benchmark is prefilled using
```
KEYSIZE=64
VALUESIZE=512
NUMKEYS=5000000
SCAN_SIZE=100
DISTANCE=25000
NUM_SCANS=15
THREADS=1
./db_bench --db=$DB \
  --benchmarks="fillseq" \
  --write_buffer_size=5242880 \
  --max_write_buffer_number=4 \
  --target_file_size_base=5242880 \
  --disable_wal=1 --key_size=$KEYSIZE \
  --value_size=$VALUESIZE --num=$NUMKEYS --threads=32
}
```
and the benchmark run is
```
run() {
  echo 1 | sudo tee /proc/sys/vm/drop_caches
  ./db_bench --db=$DB --use_existing_db=1 \
    --benchmarks=multiscan \
    --disable_auto_compactions=1 --seek_nexts=$SCAN_SIZE \
    --multiscan-use-async-io=1 \
    --multiscan-size=$NUM_SCANS --multiscan-stride=$DISTANCE \
    --key_size=$KEYSIZE --value_size=$VALUESIZE \
    --num=$NUMKEYS --threads=$THREADS --duration=60 --statistics
}
```
The benchmark uses large stride sizes to ensure that two scans would touch separate files. We reduce the size of the block cache to increase the likelihood of reads (and simulate larger data sets).

**Branch:**
```
Integrated BlobDB: blob cache disabled
RocksDB:    version 10.8.0
Date:       Tue Nov 11 13:26:29 2025
CPU:        166 * AMD EPYC-Milan Processor
CPUCache:   512 KB
Keys:       64 bytes each (+ 0 bytes user-defined timestamp)
Values:     512 bytes each (256 bytes after compression)
Entries:    5000000
Prefix:     0 bytes
Keys per prefix:    0
RawSize:    2746.6 MB (estimated)
FileSize:   1525.9 MB (estimated)
Write rate: 0 bytes/second
Read rate:  0 ops/second
Compression: Snappy
Compression sampling rate: 0
Memtablerep: SkipListFactory
Perf Level: 1
------------------------------------------------
multiscan_stride = 25000
multiscan_size = 15
seek_nexts = 100
DB path: [/data/rocksdb/mydb]
multiscan    :     837.941 micros/op 1193 ops/sec 60.001 seconds 71605 operations; (multscans:71605)
```
**Baseline:**
```
Set seed to 1762898809121995 because --seed was 0
Initializing RocksDB Options from the specified file
Initializing RocksDB Options from command-line flags
Integrated BlobDB: blob cache disabled
RocksDB:    version 10.9.0
Date:       Tue Nov 11 14:06:49 2025
CPU:        166 * AMD EPYC-Milan Processor
CPUCache:   512 KB
Keys:       64 bytes each (+ 0 bytes user-defined timestamp)
Values:     512 bytes each (256 bytes after compression)
Entries:    5000000
Prefix:     0 bytes
Keys per prefix:    0
RawSize:    2746.6 MB (estimated)
FileSize:   1525.9 MB (estimated)
Write rate: 0 bytes/second
Read rate:  0 ops/second
Compression: Snappy
Compression sampling rate: 0
Memtablerep: SkipListFactory
Perf Level: 1
------------------------------------------------
multiscan_stride = 25000
multiscan_size = 15
seek_nexts = 100
DB path: [/data/rocksdb/mydb]
multiscan    :    1129.916 micros/op 885 ops/sec 60.001 seconds 53102 operations; (multscans:53102)
```
Repeated for confirmation. This introduces a ~20% improvement in latency and op/s.

Note: Benchmarks are single-threaded because, when increasing the thread count, we start seeing large amounts of overhead induced by block cache contention, eventually resulting in both baseline and branch becoming equal. Further, on network-attached storage with high latency, the level iterator prepares all iterators, so there is a ~20% improvement even at high thread counts.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/14100
Reviewed By: anand1976
Differential Revision: D86913584
Pulled By: krhancoc
fbshipit-source-id: da9d0c890e25e392a33389ce6b80f9bfb84d3f85

Summary: **Context/Summary:** This PR adds multi-cf support to option migration. The original implementation sets options, opens the db, compacts files, and reopens the db in almost all of the three branches below. Such a design makes expanding to multi-cf difficult, as it needs to change all these places within each branch, causing code redundancy.
```
Status OptionChangeMigration(std::string dbname, const Options& old_opts,
                             const Options& new_opts) {
  if (old_opts.compaction_style == CompactionStyle::kCompactionStyleFIFO) {
    // LSM generated by FIFO compaction can be opened by any compaction.
    return Status::OK();
  } else if (new_opts.compaction_style ==
             CompactionStyle::kCompactionStyleUniversal) {
    return MigrateToUniversal(dbname, old_opts, new_opts);
  } else if (new_opts.compaction_style ==
             CompactionStyle::kCompactionStyleLevel) {
    return MigrateToLevelBase(dbname, old_opts, new_opts);
  } else if (new_opts.compaction_style ==
             CompactionStyle::kCompactionStyleFIFO) {
    return CompactToLevel(old_opts, dbname, 0, 0 /* l0_file_size */, true);
  } else {
    return Status::NotSupported(
        "Do not how to migrate to this compaction style");
  }
}
```
Therefore this PR
- Refactors the option migration implementation by moving the common parts into the high-level `OptionChangeMigration()` through `PrepareNoCompactionCFDescriptors()` and `OpenDBWithCFs()`, so `MigrateAllCFs()` can focus on compaction only.
- Treats the original OptionChangeMigration() API as a special case of the multi-cf version of option migration
- Adds multiple-cf support

A few notes:
- CompactToLevel() originally modifies the compaction-related options conditionally before doing compaction. This is moved into earlier steps through `ApplySpecialSingleLevelSettings()` in `PrepareNoCompactionCFDescriptors()`
- MigrateToUniversal() originally opens the db twice with essentially the same option. This PR reduces that to one open
- Option migration does not always use the old option to compact the db and reopen the db after migration; see `return CompactToLevel(new_opts, dbname, new_opts.num_levels - 1, /*l0_file_size=*/0, false);`. `PrepareNoCompactionCFDescriptors()` is where we handle those decisions.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/14059
Test Plan:
- Existing UTs
- New UTs

Reviewed By: cbi42
Differential Revision: D84852970
Pulled By: hx235
fbshipit-source-id: 936b456cf9fb4c3ccb687e5d1387f2d67a1448be
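For reference, a minimal usage sketch of the pre-existing single-column-family entry point quoted in the option-migration notes above, e.g. migrating a DB from level to universal compaction before reopening it with the new options. The multi-CF overload added in facebook/rocksdb#14059 is not shown because its exact signature is not quoted in the release notes.

```cpp
#include <string>

#include "rocksdb/options.h"
#include "rocksdb/utilities/option_change_migration.h"

// Sketch: compact/reorganize the LSM so the DB can be reopened with the new
// compaction style; call this while the DB is closed.
void MigrateToUniversalCompaction(const std::string& dbname) {
  rocksdb::Options old_opts;
  old_opts.compaction_style = rocksdb::kCompactionStyleLevel;

  rocksdb::Options new_opts;
  new_opts.compaction_style = rocksdb::kCompactionStyleUniversal;

  rocksdb::Status s =
      rocksdb::OptionChangeMigration(dbname, old_opts, new_opts);
  if (!s.ok()) {
    // Handle or log the migration failure before reopening with new_opts.
  }
}
```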
Command line instructions

Checkout: From your project repository, check out a new branch and test the changes.

Merge: Merge the changes and update on Forgejo. Warning: The "Autodetect manual merge" setting is not enabled for this repository; you will have to mark this pull request as manually merged afterwards.