Summary:
Add a ring buffer (InjectedErrorLog) to FaultInjectionTestFS that records the last 1000 injected errors. Each entry captures the timestamp, thread ID, FS API name with all arguments (file path, offset, buffer size, first 8 bytes of data in hex), and the injected error status. For example:
Append("/path/035354.sst", size=4096, head=[1a 2b 3c ...]) -> IOError (Retryable): injected write error
RenameFile("/path/tmp.sst", "/path/035355.sst") -> IOError: injected metadata write error
Read(offset=16384, size=4096) -> IOError (Retryable): injected read error
The ring buffer is printed to a file automatically:
- On any fatal signal (SIGABRT, SIGSEGV, SIGTERM, SIGINT, SIGHUP, SIGFPE, SIGBUS, SIGILL, SIGQUIT, SIGXCPU, SIGXFSZ, SIGSYS) via a registered crash callback
- At the end of db_stress main(), for diagnostic visibility even when the test completes normally
This addresses a key debugging gap: when write fault injection causes secondary failures (e.g., the builder error propagation issue in T257612259), the injected errors previously left no log trail. The ring buffer supplies the diagnostic context needed to correlate fault injection with downstream failures.
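To illustrate the idea described above, here is a minimal sketch of a fixed-capacity ring buffer with a printf-style Record(). SketchErrorLog, Snapshot(), and the 256-byte entry limit are hypothetical names and choices made for this example; they are not the InjectedErrorLog implementation from this change. The sketch uses std::mutex and std::string for brevity, so unlike the signal-safe PrintAll() mentioned in the Changes list, it is not suitable for calling from a signal handler.

```cpp
#include <cstdarg>
#include <cstddef>
#include <cstdio>
#include <mutex>
#include <string>
#include <vector>

// Hypothetical stand-in for InjectedErrorLog; not the RocksDB class.
class SketchErrorLog {
 public:
  // A capacity of 0 is bumped to 1 so the modulo below is always valid.
  explicit SketchErrorLog(size_t capacity)
      : entries_(capacity == 0 ? 1 : capacity) {}

  // Formats one entry printf-style and stores it, overwriting the oldest
  // entry once the buffer is full. Entries longer than 255 chars are cut.
  void Record(const char* fmt, ...) {
    char buf[256];
    va_list args;
    va_start(args, fmt);
    vsnprintf(buf, sizeof(buf), fmt, args);
    va_end(args);

    std::lock_guard<std::mutex> lock(mu_);
    entries_[next_ % entries_.size()] = buf;
    ++next_;
  }

  // Returns the retained entries, oldest first.
  std::vector<std::string> Snapshot() const {
    std::lock_guard<std::mutex> lock(mu_);
    const size_t count = next_ < entries_.size() ? next_ : entries_.size();
    const size_t start = next_ - count;
    std::vector<std::string> out;
    out.reserve(count);
    for (size_t i = 0; i < count; ++i) {
      out.push_back(entries_[(start + i) % entries_.size()]);
    }
    return out;
  }

 private:
  mutable std::mutex mu_;
  std::vector<std::string> entries_;
  size_t next_ = 0;  // Total number of entries ever recorded.
};
```

A call such as log.Record("Read(offset=%d, size=%d) -> %s", 16384, 4096, "IOError") would store one line in the same spirit as the examples above. A version intended for crash-time printing would more likely use fixed-size character slots and avoid heap allocation and locking.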
Changes:
- port/stack_trace.h/.cc: Add RegisterCrashCallback() API; extend InstallStackTraceHandler() to catch all catchable termination signals
- utilities/fault_injection_fs.h: Add InjectedErrorLog class with printf-style Record(), HexHead() for data bytes, and signal-safe PrintAll()
- utilities/fault_injection_fs.cc: Record full API arguments and error status at all 31 fault injection call sites
- db_stress_tool/db_stress_tool.cc: Register crash callback and print ring buffer at end of main()
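As a rough illustration of the HexHead() helper listed above, the following sketch renders the first few bytes of a buffer as space-separated hex. SketchHexHead, its signature, and the exact output format (brackets, a trailing " ..." when the data is longer than the head) are assumptions modeled on the `head=[1a 2b 3c ...]` example in the summary, not the actual helper from utilities/fault_injection_fs.h.

```cpp
#include <cstddef>
#include <cstdio>
#include <string>

// Hypothetical helper: formats up to max_bytes leading bytes of data as
// lowercase two-digit hex, e.g. "[1a 2b 3c]".
std::string SketchHexHead(const char* data, size_t size,
                          size_t max_bytes = 8) {
  std::string out = "[";
  const size_t n = size < max_bytes ? size : max_bytes;
  for (size_t i = 0; i < n; ++i) {
    char byte[4];
    // Cast through unsigned char so bytes >= 0x80 do not sign-extend.
    snprintf(byte, sizeof(byte), "%02x",
             static_cast<unsigned>(static_cast<unsigned char>(data[i])));
    if (i > 0) {
      out += ' ';
    }
    out += byte;
  }
  if (size > n) {
    out += " ...";  // Signals that only the head of the data is shown.
  }
  out += "]";
  return out;
}
```

Capping the head at a small number of bytes keeps each log entry short while still giving enough context to recognize which write was affected.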
Pull Request resolved: https://github.com/facebook/rocksdb/pull/14431
Reviewed By: hx235
Differential Revision: D95435430
Pulled By: xingbowang
fbshipit-source-id: 6c18e1b072044575d6c8c3f198070127b0f80608
40 lines
1.6 KiB
C++
// Copyright (c) 2011-present, Facebook, Inc. All rights reserved.
// This source code is licensed under both the GPLv2 (found in the
// COPYING file in the root directory) and Apache 2.0 License
// (found in the LICENSE.Apache file in the root directory).
//
#pragma once

#include "rocksdb/rocksdb_namespace.h"

namespace ROCKSDB_NAMESPACE {
namespace port {

// Install a signal handler to print callstack on the following signals:
// SIGILL SIGSEGV SIGBUS SIGABRT
// And also (Linux only for now) overrides security settings to allow outside
// processes to attach to this one as a debugger. ONLY USE FOR NON-SECURITY
// CRITICAL PROCESSES such as unit tests or benchmarking tools.
// Currently supports only some POSIX implementations. No-op otherwise.
void InstallStackTraceHandler();

// Prints the stack, skipping first_frames_to_skip frames
void PrintStack(int first_frames_to_skip = 0);

// Prints the given callstack
void PrintAndFreeStack(void* callstack, int num_frames);

// Save the current callstack
void* SaveStack(int* num_frame, int first_frames_to_skip = 0);

// Register a callback to be invoked when a fatal signal is received,
// before the stack trace is printed. This is useful for printing diagnostic
// information (e.g., recently injected errors) when a crash occurs.
// The callback must only call async-signal-safe functions (write, snprintf,
// etc.) or functions that are safe enough in practice (fprintf to stderr).
// Only one callback is supported; subsequent calls overwrite the previous one.
using CrashCallback = void (*)();
void RegisterCrashCallback(CrashCallback callback);

} // namespace port
} // namespace ROCKSDB_NAMESPACE
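
The single-callback contract documented for RegisterCrashCallback() can be sketched as follows. Everything in the sketch namespace (the global pointer, RunCrashCallback, HandleFatalSignal, InstallHandler, DumpInjectedErrors) is a hypothetical illustration written for this example, assuming a POSIX system; it does not reproduce port/stack_trace.cc, and it omits the stack trace printing the real handler performs.

```cpp
#include <unistd.h>

#include <atomic>
#include <csignal>

namespace sketch {

using CrashCallback = void (*)();

// Only one callback is kept; a later registration replaces the earlier one.
std::atomic<CrashCallback> g_crash_callback{nullptr};

void RegisterCrashCallback(CrashCallback callback) {
  g_crash_callback.store(callback);
}

// Invokes the registered callback, if any.
void RunCrashCallback() {
  CrashCallback callback = g_crash_callback.load();
  if (callback != nullptr) {
    callback();
  }
}

// Runs the callback first, then restores the default disposition and
// re-raises so the process still terminates with the original signal.
void HandleFatalSignal(int sig) {
  RunCrashCallback();
  std::signal(sig, SIG_DFL);
  std::raise(sig);
}

// Installs the handler for two example signals only.
void InstallHandler() {
  std::signal(SIGABRT, HandleFatalSignal);
  std::signal(SIGSEGV, HandleFatalSignal);
}

volatile std::sig_atomic_t g_dump_count = 0;

// Example callback that restricts itself to write(), which is
// async-signal-safe.
void DumpInjectedErrors() {
  static const char kMsg[] = "injected error log dump\n";
  ssize_t ignored = write(STDERR_FILENO, kMsg, sizeof(kMsg) - 1);
  (void)ignored;
  g_dump_count = g_dump_count + 1;
}

}  // namespace sketch
```

A test binary would call sketch::RegisterCrashCallback(sketch::DumpInjectedErrors) once at startup, followed by sketch::InstallHandler(). Running the callback before resetting the signal disposition mirrors the documented ordering, in which diagnostics are emitted before the stack trace.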