rocksdb

Danny Chen b6f498b2c9 Add verify_manifest_content_on_close option (#14451 )

Summary:
Add a new mutable DB option `verify_manifest_content_on_close` (default: false). When enabled, on DB close the MANIFEST file is read back and all records are validated (CRC checksums via log::Reader and logical content via VersionEdit::DecodeFrom). If corruption is detected, a fresh MANIFEST is written from in-memory state using the existing LogAndApply recovery path.

This complements the existing size validation in VersionSet::Close() with content validation, reusing the same manifest reading pattern as VersionSet::Recover().

Implementation plan:

## Part 1: New DB Option - verify_manifest_content_on_close

- A new mutable bool DB option (default: false) that can be dynamically toggled via SetDBOptions() at runtime, following the pattern of other mutable manifest options like max_manifest_file_size.
- Propagation: SetDBOptions() -> DBImpl::mutable_db_options_ -> versions_->UpdatedMutableDbOptions() -> VersionSet::verify_manifest_content_on_close_

## Part 2: Core Implementation - Content Validation in VersionSet::Close()

- Inserted after existing size check, before closed_ = true
- Opens manifest as SequentialFileReader, creates log::Reader with checksum=true
- Loops ReadRecord with WALRecoveryMode::kAbsoluteConsistency, decodes each record as VersionEdit
- On corruption: fires OnIOError listeners, logs error, calls LogAndApply with empty edit to trigger manifest rewrite from in-memory state
- If manifest can't be opened for reading: logs warning, doesn't fail close

## Part 3: Unit Tests (in version_set_test.cc)

- ManifestContentValidationOnClose_Clean: enable option, normal close, verify no manifest rotation
- ManifestContentValidationOnClose_CorruptRecord: enable option, corrupt manifest via SyncPoint, verify rotation occurs and DB reopens cleanly
- ManifestContentValidationOnClose_Disabled: default off, verify content validation does not run
- ManifestContentValidationOnClose_SizeCheckFails: truncate manifest so size check fails first, verify recovery via size-check path

## What Happens If a Corruption is Detected

If corruption was detected, four things happen:

1. Notify listeners: Fires `OnIOError` on all registered event listeners (from db_options_->listeners) so monitoring/alerting systems can observe the corruption event. Uses `FileOperationType::kVerify` to categorize it.
2. Permit unchecked errors: `PermitUncheckedError()` silences RocksDB's debug-mode assertion that every `IOStatus` must be inspected. These statuses are informational-only here; the real recovery is via `LogAndApply`.
3. Log the error: Writes a `ROCKS_LOG_ERROR` message with the filename for operational visibility (grep-able in production logs).
4. Rewrite the manifest via `LogAndApply`: This is the actual recovery. `LogAndApply` is called with an empty `VersionEdit` (no changes). Internally, `LogAndApply` detects that the current `descriptor_log_` is null (it was reset at line 5551, or by the previous `LogAndApply` in the size-check path) and creates a brand-new MANIFEST file. It serializes the entire current in-memory LSM state (all column families, all levels, all file metadata, sequence numbers, etc.) into this new file. It then atomically updates the `CURRENT` file pointer to reference the new MANIFEST.

This works because the in-memory state was built from the original manifest during `DB::Open()` and has been kept fully up to date through all subsequent operations (flushes, compactions, etc.) during the DB's lifetime. The on-disk manifest is essentially a journal of changes; `LogAndApply` with an empty edit produces a fresh, compacted snapshot of that state.

## Flow Diagram of Manifest Content Validation

VersionSet::Close()
│
├─ Close descriptor_log_ and check size
│   └─ Size mismatch? → LogAndApply (rewrite manifest)
│
├─ Content validation (if s.ok() && option enabled)
│   ├─ Open manifest for sequential reading
│   │   └─ Can't open? → WARN log, continue
│   │
│   ├─ For each record:
│   │   ├─ ReadRecord (CRC32 check, kAbsoluteConsistency)
│   │   └─ DecodeFrom (VersionEdit logical check)
│   │
│   └─ Corruption detected?
│       ├─ Notify OnIOError listeners
│       ├─ LOG_ERROR
│       └─ LogAndApply (rewrite manifest from in-memory state)
│
└─ closed_ = true; return s;

## How This Relates to the Existing Size Check

The existing size check (lines 5556-5582) and the new content validation are complementary:

| Check         | What it catches                         | How it checks                        |
|---------------|-----------------------------------------|--------------------------------------|
| Size check    | Truncation, partial writes, extra bytes | Compare expected vs actual file size |
| Content check | Bit-rot, silent corruption, bad records | CRC32 + VersionEdit decode           |

The size check catches gross corruption (file too short or too long). The content check catches subtle corruption where the file is the right size but individual bytes have been flipped (e.g., storage media bit-rot, buggy filesystem, incomplete block write).

Both recovery paths use the same mechanism: `LogAndApply` with an empty `VersionEdit` to rewrite the manifest from in-memory state.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/14451
Reviewed By: xingbowang
Differential Revision: D96004906
Pulled By: dannyhchen
fbshipit-source-id: 0b0ecdada3a74e97d2cadbba2091b8b577f1d684

2026-03-19 12:01:23 -07:00
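The validate-then-rewrite idea in the commit message above can be illustrated with a small toy model. This is a sketch under stated assumptions, not RocksDB code: the journal is an in-memory byte string rather than a MANIFEST file, `ToyChecksum` is FNV-1a standing in for the CRC32 that `log::Reader` verifies, and every name here (`EncodeRecord`, `JournalIsValid`, `SnapshotFromState`, `CloseJournal`) is hypothetical. It models only the checksum pass, not the logical `VersionEdit::DecodeFrom` check or listener notification.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <string>
#include <vector>

// Toy stand-in for the CRC32 check (this is FNV-1a, not CRC32).
inline uint32_t ToyChecksum(const std::string& data) {
  uint32_t h = 2166136261u;
  for (unsigned char c : data) {
    h ^= c;
    h *= 16777619u;
  }
  return h;
}

// Frame one record as [checksum:4][length:4][payload] (hypothetical layout).
inline std::string EncodeRecord(const std::string& payload) {
  assert(payload.size() <= UINT32_MAX);
  const uint32_t sum = ToyChecksum(payload);
  const uint32_t len = static_cast<uint32_t>(payload.size());
  std::string out(8, '\0');
  std::memcpy(&out[0], &sum, 4);
  std::memcpy(&out[4], &len, 4);
  return out + payload;
}

// Read every record back; return false on the first framing or checksum error.
inline bool JournalIsValid(const std::string& journal) {
  size_t pos = 0;
  while (pos < journal.size()) {
    if (journal.size() - pos < 8) return false;  // truncated header
    uint32_t sum = 0, len = 0;
    std::memcpy(&sum, journal.data() + pos, 4);
    std::memcpy(&len, journal.data() + pos + 4, 4);
    pos += 8;
    if (journal.size() - pos < len) return false;  // truncated payload
    if (ToyChecksum(journal.substr(pos, len)) != sum) return false;
    pos += len;
  }
  return true;
}

// Serialize the whole in-memory state as a fresh journal, one record per entry.
inline std::string SnapshotFromState(const std::vector<std::string>& state) {
  std::string journal;
  for (const std::string& entry : state) journal += EncodeRecord(entry);
  return journal;
}

// On close: if verification is enabled and the journal fails validation,
// replace it with a snapshot of in-memory state. Returns true if rewritten.
inline bool CloseJournal(std::string& journal,
                         const std::vector<std::string>& state,
                         bool verify_content_on_close) {
  if (!verify_content_on_close || JournalIsValid(journal)) return false;
  journal = SnapshotFromState(state);
  return true;
}
```

Flipping a single payload bit leaves the journal size unchanged, so a size-only check would pass, while `JournalIsValid` fails and `CloseJournal` with the flag set restores the journal from `state`. With the flag off, the corrupt bytes are left in place, mirroring the option's default of false.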
..
cf_options.cc	Add memtable MultiGet finger search optimization (#14428 )	2026-03-16 10:45:49 -07:00
cf_options.h	Add memtable MultiGet finger search optimization (#14428 )	2026-03-16 10:45:49 -07:00
configurable.cc	Preliminary support for custom compression algorithms (#13659 )	2025-06-16 14:19:03 -07:00
configurable_helper.h	Fix race to make BlockBasedTableOptions effectively mutable (#13082 )	2024-10-25 10:24:54 -07:00
configurable_test.cc	Run internal cpp modernizer on RocksDB repo (#12398 )	2024-03-04 10:08:32 -08:00
configurable_test.h	Standardize on clang-format version 18 (#13233 )	2024-12-19 10:58:40 -08:00
customizable.cc	Standardize on clang-format version 18 (#13233 )	2024-12-19 10:58:40 -08:00
customizable_test.cc	Remove deprecated SliceTransform::InRange() virtual method (#14353 )	2026-02-19 16:45:51 -08:00
db_options.cc	Add verify_manifest_content_on_close option (#14451 )	2026-03-19 12:01:23 -07:00
db_options.h	Add verify_manifest_content_on_close option (#14451 )	2026-03-19 12:01:23 -07:00
offpeak_time_info.cc	Mark more files for periodic compaction during offpeak (#12031 )	2023-11-06 11:43:59 -08:00
offpeak_time_info.h	Fix build on alpine 3.19 (#12345 )	2024-02-12 11:24:56 -08:00
options.cc	Add memtable MultiGet finger search optimization (#14428 )	2026-03-16 10:45:49 -07:00
options_helper.cc	Add verify_manifest_content_on_close option (#14451 )	2026-03-19 12:01:23 -07:00
options_helper.h	Support recompress-with-CompressionManager in sst_dump (#13783 )	2025-07-18 14:22:29 -07:00
options_parser.cc	Standardize on clang-format version 18 (#13233 )	2024-12-19 10:58:40 -08:00
options_parser.h	Standardize on clang-format version 18 (#13233 )	2024-12-19 10:58:40 -08:00
options_settable_test.cc	Add verify_manifest_content_on_close option (#14451 )	2026-03-19 12:01:23 -07:00
options_test.cc	Add option to verify file checksum of output files (#14433 )	2026-03-11 21:14:28 -07:00