rocksdb/memory/memory_allocator_impl.h
Peter Dillinger ee6b0def55 Refactor, improve CompressedSecondaryCache (#13797)
Summary:
To be compatible with some upcoming compression change/refactoring where we supply a fixed size buffer to CompressBlock, we need to support CompressedSecondaryCache storing uncompressed values when the compression ratio is not suitable. It seems crazy that CompressedSecondaryCache currently stores compressed values that are *larger* than the uncompressed value, and even explicitly exercises that case (almost exclusively) in the existing unit tests. But it's true.

This change fixes that with some other nearby refactoring/improvement:
* Update the in-memory representation of these cache entries to support uncompressed entries even when compression is enabled. AFAIK this also allows us to safely get rid of "don't support custom split/merge for the tiered case".
* Use more efficient in-memory representation for non-split entries
  * For CompressionType and CacheTier, which are defined as single-byte data types, use a single byte instead of varint32. (I don't know if varint32 was an attempt at future-proofing for a memory-only schema or what.) Now using lossless_cast will raise a compiler error if either of these types is made too large for a single byte.
  * Don't wrap entries in a CacheAllocationPtr object; it's not necessary. We can rely on the same allocator being provided at delete time.
* Restructure serialization/deserialization logic, hopefully simpler or easier to read/understand.
* Use a RelaxedAtomic for disable_cache_ to avoid race.

Suggested follow-up on CompressedSecondaryCache:
* Refine the exact strategy for rejecting compressions
* Still have a lot of buffer copies; try to reduce
* Revisit the split-merge logic and try to make it more efficient overall, more unified with non-split case

Pull Request resolved: https://github.com/facebook/rocksdb/pull/13797

Test Plan:
Unit tests updated to use actually-compressible strings in many places, with more testing around non-compressible strings.

## Performance Test
There was a pre-existing issue causing decompression failures in the compressed secondary cache with cache_bench that is somehow fixed by this change. These decompression failures were present before the new compression API, but since then caused assertion failures rather than being quietly ignored. For the "before" test here, they are back to being quietly ignored. And the cache_bench changes here were back-ported to the "before" configuration.

### No compressed secondary (setting expectations)
```
./cache_bench --cache_type=auto_hyper_clock_cache -cache_size=8000000000 -populate_cache
```
Max key             : 3906250

Before:
Complete in 12.784 s; Rough parallel ops/sec = 2503123
Thread ops/sec = 160329; Lookup hit ratio: 0.686771

After:
Complete in 12.745 s; Rough parallel ops/sec = 2510717 (in the noise)
Thread ops/sec = 159498; Lookup hit ratio: 0.68686

### Compressed secondary, no split/merge
Same max key and approximate total memory size
```
/usr/bin/time ./cache_bench --cache_type=auto_hyper_clock_cache -cache_size=4000000000 -populate_cache -resident_ratio=0.125 -compressible_to_ratio=0.4 --secondary_cache_uri=compressed_secondary_cache://capacity=4000000000
```
Before:
Complete in 18.690 s; Rough parallel ops/sec = 1712144
Thread ops/sec = 108683; Lookup hit ratio: 0.776683
Latency: P50: 4205.19 P75: 15281.76 P99: 43810.98 P99.9: 71487.41 P99.99: 165453.32
max RSS (according to /usr/bin/time): 9341856

After:
Complete in 17.878 s; Rough parallel ops/sec = 1789951 (+4.5%)
Thread ops/sec = 114957; Lookup hit ratio: 0.792998 (+0.016)
Latency: P50: 4012.70 P75: 14477.63 P99: 40039.70 P99.9: 62521.04 P99.99: 167049.18
max RSS (according to /usr/bin/time): 9235688

The improved hit ratio probably comes from fixing the failed decompressions (somehow). The speedup could be from my modifications improving CPU efficiency, or from the small penalty the benchmark naturally imposes on most misses (generating another value and inserting it).

### Compressed secondary, with split/merge
```
/usr/bin/time ./cache_bench --cache_type=auto_hyper_clock_cache -cache_size=4000000000 -populate_cache -resident_ratio=0.125 -compressible_to_ratio=0.4 --secondary_cache_uri='compressed_secondary_cache://capacity=4000000000;enable_custom_split_merge=true'
```
Before:
Complete in 20.062 s; Rough parallel ops/sec = 1595075
Thread ops/sec = 101759; Lookup hit ratio: 0.787129
Latency: P50: 5338.53 P75: 16073.46 P99: 46752.65 P99.9: 73459.11 P99.99: 201318.75
max RSS (according to /usr/bin/time): 9049852

After:
Complete in 18.564 s; Rough parallel ops/sec = 1723771 (+8.1%)
Thread ops/sec = 110724; Lookup hit ratio: 0.813414 (+0.026)
Latency: P50: 5234.75 P75: 14590.43 P99: 41401.03 P99.9: 65606.50 P99.99: 157248.04
max RSS (according to /usr/bin/time): 8917592

Looks like an improvement.

Reviewed By: anand1976

Differential Revision: D78842120

Pulled By: pdillinger

fbshipit-source-id: 5f754b160c37ebee789279178ebb5e862071bdb2
2025-07-25 13:39:25 -07:00


// Copyright (c) 2011-present, Facebook, Inc. All rights reserved.
// This source code is licensed under both the GPLv2 (found in the
// COPYING file in the root directory) and Apache 2.0 License
// (found in the LICENSE.Apache file in the root directory).
//
#pragma once
#include <algorithm>
#include "rocksdb/memory_allocator.h"
namespace ROCKSDB_NAMESPACE {
struct CacheAllocationDeleter {
  CacheAllocationDeleter(MemoryAllocator* a = nullptr) : allocator(a) {}
  void operator()(char* ptr) const {
    if (allocator) {
      allocator->Deallocate(ptr);
    } else {
      delete[] ptr;
    }
  }
  MemoryAllocator* allocator;
};

// unique_ptr to a char[] block that remembers whether it came from a custom
// MemoryAllocator or plain new[], and frees it accordingly.
using CacheAllocationPtr = std::unique_ptr<char[], CacheAllocationDeleter>;

// Allocates `size` bytes through `allocator` if one is provided, otherwise
// via new[].
inline CacheAllocationPtr AllocateBlock(size_t size,
                                        MemoryAllocator* allocator) {
  if (allocator) {
    auto block = static_cast<char*>(allocator->Allocate(size));
    return CacheAllocationPtr(block, allocator);
  }
  return CacheAllocationPtr(new char[size]);
}

// Allocates a block of data.size() bytes and copies `data` into it.
inline CacheAllocationPtr AllocateAndCopyBlock(const Slice& data,
                                               MemoryAllocator* allocator) {
  CacheAllocationPtr cap = AllocateBlock(data.size(), allocator);
  std::copy_n(data.data(), data.size(), cap.get());
  return cap;
}
} // namespace ROCKSDB_NAMESPACE