rocksdb/db/compaction
Hui Xiao a1d8318563 Fix resumable compaction to prevent resumption at truncated range deletion boundaries (#14184)
Summary:
**Context/Summary:**

Truncated range deletions in input files can be output by CompactionIterator with type kMaxValid instead of kTypeRangeDeletion, to satisfy the ordering requirement between the truncated range deletion's start key and a file's point keys. https://github.com/facebook/rocksdb/pull/14122 planned to skip such keys, but blockers remain before that plan can be carried out.
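A key's type can affect where it sorts because of how internal keys are ordered. The sketch below models that order under stated assumptions: user key ascending, then a packed `(seqno << 8) | type` value descending, which follows RocksDB's internal-key convention. The enum, its numeric values, and every function name are hypothetical and illustrative only. The single property the sketch relies on is that kMaxValid is numerically larger than kRangeDeletion, so for the same user key and sequence number it sorts earlier.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical value types (not RocksDB's dbformat.h numbering).
// Assumption: kMaxValid > kRangeDeletion, as a type placed after the
// last valid record type would be.
enum class SketchType : uint8_t {
  kValue = 0x1,
  kRangeDeletion = 0xF,
  kMaxValid = 0x20,
};

struct SketchInternalKey {
  std::string user_key;
  uint64_t seqno;
  SketchType type;
};

// Packs the sequence number into the high 56 bits and the type into the
// low 8 bits of a single 64-bit footer.
uint64_t PackSeqAndType(uint64_t seqno, SketchType type) {
  assert(seqno < (uint64_t{1} << 56));  // seqno must fit in 56 bits
  return (seqno << 8) | static_cast<uint64_t>(type);
}

// Internal-key order: user key ascending, then packed (seqno, type)
// descending, so newer entries and larger types come first.
bool InternalKeyLess(const SketchInternalKey& a, const SketchInternalKey& b) {
  if (a.user_key != b.user_key) return a.user_key < b.user_key;
  return PackSeqAndType(a.seqno, a.type) > PackSeqAndType(b.seqno, b.type);
}

// Sorts keys into internal-key order.
void SortInternalKeys(std::vector<SketchInternalKey>& keys) {
  std::sort(keys.begin(), keys.end(), InternalKeyLess);
}
```

For three entries on user key `"k"` that share sequence number 100, this order places the kMaxValid entry first, then the range deletion, then the point value. The sketch only shows that swapping the type moves an entry. It does not claim which exact placement CompactionIterator needs.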

Resumable compaction cannot yet handle resumption from a range deletion well, so it should treat kMaxValid the same as kTypeRangeDeletion when deciding where to resume. Previously it did not, and it mistakenly allowed resumption from a delete range. If the compaction then resumed from that delete range and happened to cut the output file shortly afterward, with no point keys in between, an assertion failed. The assertion complained that the information needed to update the file boundaries for range deletions while cutting the output file was missing:

```
frame #9: 0x00007f4f4743bc93 libc.so.6`__GI___assert_fail(assertion="meta.smallest.size() > 0", file="db/compaction/compaction_outputs.cc", line=530, function="rocksdb::Status rocksdb::CompactionOutputs::AddRangeDels(rocksdb::CompactionRangeDelAggregator&, const rocksdb::Slice*, const rocksdb::Slice*, rocksdb::CompactionIterationStats&, bool, const rocksdb::InternalKeyComparator&, rocksdb::SequenceNumber, std::pair<long unsigned int, long unsigned int>, const rocksdb::Slice&, const string&)") at assert.c:101:3
frame #10: 0x00007f4f4808c68c librocksdb.so.10.9`rocksdb::CompactionOutputs::AddRangeDels(this=0x00007f4f0c27e1a0, range_del_agg=0x00007f4f0c21ecc0, comp_start_user_key=0x0000000000000000, comp_end_user_key=0x0000000000000000, range_del_out_stats=0x00007f4f0dffa140, bottommost_level=false, icmp=0x00007f4ef4c93040, earliest_snapshot=13108729, keep_seqno_range=<unavailable>, next_table_min_key=0x00007f4ef4c8f540, full_history_ts_low="") at compaction_outputs.cc:530:7
frame #11: 0x00007f4f480480dd librocksdb.so.10.9`rocksdb::CompactionJob::FinishCompactionOutputFile(this=0x00007f4f0dffb890, input_status=<unavailable>, prev_table_last_internal_key=0x00007f4f0dffa650, next_table_min_key=0x00007f4ef4c8f540, comp_start_user_key=0x0000000000000000, comp_end_user_key=0x0000000000000000, c_iter=0x00007f4ef4c8f400, sub_compact=0x00007f4f0c27e000, outputs=0x00007f4f0c27e1a0) at compaction_job.cc:1917:31
```

This PR simply prevents kMaxValid from being a resumption point, just like a regular range deletion; see commit 842d66eb18ea67e965d6acb1fce12c18eeb778d2.
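The eligibility check described above can be sketched as a predicate, assuming resumption is decided per emitted key type. This is not the actual RocksDB change. `SketchType`, `EmittedKey`, `CanResumeAtBeforeFix`, `CanResumeAt`, and `LastResumptionPoint` are hypothetical names, and the backward scan is a toy model of picking a resumption point. The "before" predicate reflects the description: regular range deletions were already excluded, but kMaxValid was not.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <string>
#include <vector>

// Hypothetical key types for illustration; not RocksDB's ValueType enum.
enum class SketchType : uint8_t { kValue, kDeletion, kRangeDeletion, kMaxValid };

struct EmittedKey {
  std::string user_key;
  SketchType type;
};

// Before the fix (as described): only a regular range deletion is rejected,
// so a truncated range deletion emitted as kMaxValid slips through.
bool CanResumeAtBeforeFix(SketchType type) {
  return type != SketchType::kRangeDeletion;
}

// After the fix: kMaxValid is treated the same as a regular range deletion.
bool CanResumeAt(SketchType type) {
  return type != SketchType::kRangeDeletion && type != SketchType::kMaxValid;
}

// Toy model: returns the index of the last key in keys[0, end) that may serve
// as a resumption point, or std::nullopt if none qualifies.
std::optional<std::size_t> LastResumptionPoint(
    const std::vector<EmittedKey>& keys, std::size_t end) {
  assert(end <= keys.size());
  for (std::size_t i = end; i > 0; --i) {
    if (CanResumeAt(keys[i - 1].type)) return i - 1;
  }
  return std::nullopt;
}
```

With the emitted keys `a` (value), `b` (range deletion), and `c` (kMaxValid), the old predicate would accept `c`. The new one skips both `b` and `c` and falls back to `a`. As the test plan notes, the cost of being conservative here is only less resumption.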

Besides that, the PR improves testing, variable naming, and logging in the resumable compaction code, which were needed to debug this assertion failure; see commit https://github.com/facebook/rocksdb/pull/14184/commits/aecd4e7f971f6dd4df672d9e5f1409fe4747c561. These improvements are covered by existing tests.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/14184

Test Plan:
- The stress test initially surfaced the error. By reusing the exact LSM shape and files from the stress test in a unit test, I got a deterministic repro and confirmed that the fix resolves the error. This is the repro test: 1075936e69
```
./compaction_service_test --gtest_filter=ResumableCompactionServiceTest.CompactSpecificFilesFromExistingDBWithCancelAndResume
# Before fix
[==========] Running 1 test from 1 test case.
[----------] Global test environment set-up.
[----------] 1 test from ResumableCompactionServiceTest
[ RUN      ] ResumableCompactionServiceTest.CompactSpecificFilesFromExistingDBWithCancelAndResume
compaction_service_test: db/compaction/compaction_outputs.cc:530: rocksdb::Status rocksdb::CompactionOutputs::AddRangeDels(rocksdb::CompactionRangeDelAggregator&, const rocksdb::Slice*, const rocksdb::Slice*, rocksdb::CompactionIterationStats&, bool, const rocksdb::InternalKeyComparator&, rocksdb::SequenceNumber, std::pair<long unsigned int, long unsigned int>, const rocksdb::Slice&, const string&): Assertion `meta.smallest.size() > 0' failed.
Received signal 6 (Aborted)
Invoking GDB for stack trace...
[New LWP 2621610]
[New LWP 2621611]
[New LWP 2621612]
[New LWP 2621613]
[New LWP 2621614]
[New LWP 2621630]
[New LWP 2621631]

# After fix
Note: Google Test filter = ResumableCompactionServiceTest.CompactSpecificFilesFromExistingDBWithCancelAndResume
[==========] Running 1 test from 1 test case.
[----------] Global test environment set-up.
[----------] 1 test from ResumableCompactionServiceTest
[ RUN      ] ResumableCompactionServiceTest.CompactSpecificFilesFromExistingDBWithCancelAndResume
[       OK ] ResumableCompactionServiceTest.CompactSpecificFilesFromExistingDBWithCancelAndResume (4722 ms)
[----------] 1 test from ResumableCompactionServiceTest (4722 ms total)

[----------] Global test environment tear-down
[==========] 1 test from 1 test case ran. (4722 ms total)
[  PASSED  ] 1 test.

```
- Follow-up: I tried a couple of times to coerce a truncated range deletion from scratch in the unit test but did not succeed. I decided to leave such a unit test as a follow-up for three reasons. First, kMaxValid may no longer be output by the compaction iterator once https://github.com/facebook/rocksdb/pull/14122/files lands again, which would make this bug obsolete. Second, the fix 842d66eb18ea67e965d6acb1fce12c18eeb778d2 is simple. Third, the worst case if the fix goes wrong is just less resumption. I may draw inspiration from https://github.com/facebook/rocksdb/pull/14122/files.

Reviewed By: jaykorean

Differential Revision: D88912663

Pulled By: hx235

fbshipit-source-id: 80a01135684c8fea659650faaa00c2dc452c482a
2025-12-11 16:50:42 -08:00
clipping_iterator.h Refactor AddRangeDels() + consider range tombstone during compaction file cutting (#11113) 2023-02-22 12:28:18 -08:00
clipping_iterator_test.cc Print stack traces on frozen tests in CI (#10828) 2022-10-18 00:35:35 -07:00
compaction.cc Support output temperature in CompactFiles (#13955) 2025-09-22 13:36:26 -07:00
compaction.h Fix compaction picking with L0 standalone range deletion file (#14061) 2025-10-23 13:34:07 -07:00
compaction_iteration_stats.h Add initial support for TimedPut API (#12419) 2024-03-14 15:44:55 -07:00
compaction_iterator.cc Avoid overwriting non-okay status due to shutdown or manual compaction pause (#13891) 2025-09-02 12:37:16 -07:00
compaction_iterator.h Integrate compaction resumption with DB::OpenAndCompact() (#13984) 2025-10-15 13:43:53 -07:00
compaction_iterator_test.cc Remove expect_valid_internal_key parameter from CompactionIterator (#13882) 2025-08-14 16:40:25 -07:00
compaction_job.cc Fix resumable compaction to prevent resumption at truncated range deletion boundaries (#14184) 2025-12-11 16:50:42 -08:00
compaction_job.h Fix resumable compaction to prevent resumption at truncated range deletion boundaries (#14184) 2025-12-11 16:50:42 -08:00
compaction_job_stats_test.cc Standardize on clang-format version 18 (#13233) 2024-12-19 10:58:40 -08:00
compaction_job_test.cc Fix resumable compaction to prevent resumption at truncated range deletion boundaries (#14184) 2025-12-11 16:50:42 -08:00
compaction_outputs.cc Fix resumable compaction to prevent resumption at truncated range deletion boundaries (#14184) 2025-12-11 16:50:42 -08:00
compaction_outputs.h Fix resumable compaction to prevent resumption at truncated range deletion boundaries (#14184) 2025-12-11 16:50:42 -08:00
compaction_picker.cc Fix compaction picking with L0 standalone range deletion file (#14061) 2025-10-23 13:34:07 -07:00
compaction_picker.h Fix compaction picking with L0 standalone range deletion file (#14061) 2025-10-23 13:34:07 -07:00
compaction_picker_fifo.cc Support output temperature in CompactFiles (#13955) 2025-09-22 13:36:26 -07:00
compaction_picker_fifo.h Rename CompactFiles() and CompactRange() in CompactionPickers (#13831) 2025-08-05 13:11:01 -07:00
compaction_picker_level.cc Support output temperature in CompactFiles (#13955) 2025-09-22 13:36:26 -07:00
compaction_picker_level.h Reduce universal compaction input lock time by forwarding intended compaction and re-picking (#13633) 2025-06-12 18:16:47 -07:00
compaction_picker_test.cc Fix compaction picking with L0 standalone range deletion file (#14061) 2025-10-23 13:34:07 -07:00
compaction_picker_universal.cc Fix compaction picking with L0 standalone range deletion file (#14061) 2025-10-23 13:34:07 -07:00
compaction_picker_universal.h Reduce universal compaction input lock time by forwarding intended compaction and re-picking (#13633) 2025-06-12 18:16:47 -07:00
compaction_service_job.cc Stress/crash test improvement to remote compaction with resumable compaction (#14041) 2025-10-21 12:13:57 -07:00
compaction_service_test.cc Add option to verify block checksums of output files (#14103) 2025-11-07 14:22:00 -08:00
compaction_state.cc Fix Compaction Stats for Remote Compaction and Tiered Storage (#13464) 2025-03-18 16:28:18 -07:00
compaction_state.h Fix Compaction Stats for Remote Compaction and Tiered Storage (#13464) 2025-03-18 16:28:18 -07:00
file_pri.h Avoid shifting component too large error in FileTtlBooster (#11673) 2023-08-04 14:29:50 -07:00
sst_partitioner.cc Remove FactoryFunc from LoadXXXObject (#11203) 2023-02-17 12:54:07 -08:00
subcompaction_state.cc Fix resumable compaction to prevent resumption at truncated range deletion boundaries (#14184) 2025-12-11 16:50:42 -08:00
subcompaction_state.h Fix resumable compaction to prevent resumption at truncated range deletion boundaries (#14184) 2025-12-11 16:50:42 -08:00
tiered_compaction_test.cc Add kCool Temperature (#14000) 2025-09-25 11:27:00 -07:00