rocksdb/options
Danny Chen b6f498b2c9 Add verify_manifest_content_on_close option (#14451)
Summary:
Add a new mutable DB option `verify_manifest_content_on_close` (default: false).
When enabled, on DB close the MANIFEST file is read back and all records are
validated (CRC checksums via log::Reader and logical content via
VersionEdit::DecodeFrom). If corruption is detected, a fresh MANIFEST is written
from in-memory state using the existing LogAndApply recovery path.

This complements the existing size validation in VersionSet::Close() with content
validation, reusing the same manifest reading pattern as VersionSet::Recover().

Implementation plan:

## Part 1: New DB Option — verify_manifest_content_on_close
- A new mutable bool DB option (default: false) that can be dynamically toggled
  via SetDBOptions() at runtime, following the pattern of other mutable manifest
  options like max_manifest_file_size.
- Propagation: SetDBOptions() -> DBImpl::mutable_db_options_ ->
  versions_->UpdatedMutableDbOptions() -> VersionSet::verify_manifest_content_on_close_

## Part 2: Core Implementation — Content Validation in VersionSet::Close()
- Inserted after existing size check, before closed_ = true
- Opens manifest as SequentialFileReader, creates log::Reader with checksum=true
- Loops ReadRecord with WALRecoveryMode::kAbsoluteConsistency, decodes each
  record as VersionEdit
- On corruption: fires OnIOError listeners, logs error, calls LogAndApply with
  empty edit to trigger manifest rewrite from in-memory state
- If manifest can't be opened for reading: logs warning, doesn't fail close

## Part 3: Unit Tests (in version_set_test.cc)
- ManifestContentValidationOnClose_Clean: enable option, normal close, verify
  no manifest rotation
- ManifestContentValidationOnClose_CorruptRecord: enable option, corrupt manifest
  via SyncPoint, verify rotation occurs and DB reopens cleanly
- ManifestContentValidationOnClose_Disabled: default off, verify content
  validation does not run
- ManifestContentValidationOnClose_SizeCheckFails: truncate manifest so size
  check fails first, verify recovery via size-check path

## What Happens If a Corruption is Detected
If corruption was detected, four things happen:
1. **Notify listeners** — Fires `OnIOError` on all registered event listeners
   (from db_options_->listeners) so monitoring/alerting systems can observe
   the corruption event. Uses `FileOperationType::kVerify` to categorize it.
2. **Permit unchecked errors** — `PermitUncheckedError()` silences RocksDB's
   debug-mode assertion that every `IOStatus` must be inspected. These statuses
   are informational-only here; the real recovery is via `LogAndApply`.
3. **Log the error** — Writes a `ROCKS_LOG_ERROR` message with the filename
   for operational visibility (grep-able in production logs).
4. **Rewrite the manifest via `LogAndApply`** — This is the actual recovery.
   `LogAndApply` is called with an empty `VersionEdit` (no changes). Internally,
   `LogAndApply` detects that the current `descriptor_log_` is null (it was
   reset at line 5551, or by the previous `LogAndApply` in the size-check
   path) and creates a brand-new MANIFEST file. It serializes the entire
   current in-memory LSM state — all column families, all levels, all file
   metadata, sequence numbers, etc. — into this new file. It then atomically
   updates the `CURRENT` file pointer to reference the new MANIFEST.
   This works because the in-memory state was built from the original manifest
   during `DB::Open()` and has been kept fully up to date through all
   subsequent operations (flushes, compactions, etc.) during the DB's lifetime.
   The on-disk manifest is essentially a journal of changes; `LogAndApply`
   with an empty edit produces a fresh, compacted snapshot of that state.

## Flow Diagram of Manifest Content Validation

VersionSet::Close()
│
├─ Close descriptor_log_ and check size
│  └─ Size mismatch? → LogAndApply (rewrite manifest)
│
├─ Content validation (if s.ok() && option enabled)
│  ├─ Open manifest for sequential reading
│  │  └─ Can't open? → WARN log, continue
│  │
│  ├─ For each record:
│  │  ├─ ReadRecord (CRC32 check, kAbsoluteConsistency)
│  │  └─ DecodeFrom (VersionEdit logical check)
│  │
│  └─ Corruption detected?
│     ├─ Notify OnIOError listeners
│     ├─ LOG_ERROR
│     └─ LogAndApply (rewrite manifest from in-memory state)
│
└─ closed_ = true; return s;

## How This Relates to the Existing Size Check
The existing size check (lines 5556-5582) and the new content validation are
complementary:
| Check          | What it catches                         | How it checks              |
|----------------|-----------------------------------------|----------------------------|
| Size check     | Truncation, partial writes, extra bytes | Compare expected vs actual file size |
| Content check  | Bit-rot, silent corruption, bad records | CRC32 + VersionEdit decode |
The size check catches gross corruption (file too short or too long). The
content check catches subtle corruption where the file is the right size but
individual bytes have been flipped (e.g., storage media bit-rot, buggy
filesystem, incomplete block write).
Both recovery paths use the same mechanism: `LogAndApply` with an empty
`VersionEdit` to rewrite the manifest from in-memory state.

Pull Request resolved: https://github.com/facebook/rocksdb/pull/14451

Reviewed By: xingbowang

Differential Revision: D96004906

Pulled By: dannyhchen

fbshipit-source-id: 0b0ecdada3a74e97d2cadbba2091b8b577f1d684
2026-03-19 12:01:23 -07:00
..
cf_options.cc Add memtable MultiGet finger search optimization (#14428) 2026-03-16 10:45:49 -07:00
cf_options.h Add memtable MultiGet finger search optimization (#14428) 2026-03-16 10:45:49 -07:00
configurable.cc Preliminary support for custom compression algorithms (#13659) 2025-06-16 14:19:03 -07:00
configurable_helper.h Fix race to make BlockBasedTableOptions effectively mutable (#13082) 2024-10-25 10:24:54 -07:00
configurable_test.cc Run internal cpp modernizer on RocksDB repo (#12398) 2024-03-04 10:08:32 -08:00
configurable_test.h Standardize on clang-format version 18 (#13233) 2024-12-19 10:58:40 -08:00
customizable.cc Standardize on clang-format version 18 (#13233) 2024-12-19 10:58:40 -08:00
customizable_test.cc Remove deprecated SliceTransform::InRange() virtual method (#14353) 2026-02-19 16:45:51 -08:00
db_options.cc Add verify_manifest_content_on_close option (#14451) 2026-03-19 12:01:23 -07:00
db_options.h Add verify_manifest_content_on_close option (#14451) 2026-03-19 12:01:23 -07:00
offpeak_time_info.cc Mark more files for periodic compaction during offpeak (#12031) 2023-11-06 11:43:59 -08:00
offpeak_time_info.h Fix build on alpine 3.19 (#12345) 2024-02-12 11:24:56 -08:00
options.cc Add memtable MultiGet finger search optimization (#14428) 2026-03-16 10:45:49 -07:00
options_helper.cc Add verify_manifest_content_on_close option (#14451) 2026-03-19 12:01:23 -07:00
options_helper.h Support recompress-with-CompressionManager in sst_dump (#13783) 2025-07-18 14:22:29 -07:00
options_parser.cc Standardize on clang-format version 18 (#13233) 2024-12-19 10:58:40 -08:00
options_parser.h Standardize on clang-format version 18 (#13233) 2024-12-19 10:58:40 -08:00
options_settable_test.cc Add verify_manifest_content_on_close option (#14451) 2026-03-19 12:01:23 -07:00
options_test.cc Add option to verify file checksum of output files (#14433) 2026-03-11 21:14:28 -07:00