Summary: Adds new classes etc. in the internal compression.h that are intended to become public APIs for supporting custom/pluggable compression. Some steps remain before compression is fully pluggable and a lot of legacy code (e.g. what is now called `OLD_CompressData` and `OLD_UncompressData`) can be removed, but this change refactors the key integration points of SST building and reading, and of the compressed secondary cache, over to the new APIs.

Compared with the proposed https://github.com/facebook/rocksdb/issues/7650, this fixes a number of issues, including:
* Making a clean divide between public and internal APIs (currently just indicated with comments)
* Enough generality that built-in compressions generally fit into the framework rather than needing special treatment
* Avoiding exposing obnoxious idioms like `compress_format_version` to the user
* Enough generality that a compressor mixing algorithms/strategies from other compressors is well supported without an extra schema layer
* Explicit, carefully considered thread-safety contracts
* Contract details around schema compatibility and extension with code changes (more detail in the next PR)
* Customizable "working areas" (e.g. for ZSTD "context")
* Decompression into an arbitrary memory location, rather than involving the decompressor in memory allocation (this should facilitate reducing the number of objects in block cache); see the sketch below for these last two points

Pull Request resolved: https://github.com/facebook/rocksdb/pull/13540
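To make the last two points concrete, here is a minimal hypothetical sketch of a decompressor with an explicit working area and decompression into a caller-provided buffer. The class and member names here are invented for illustration and do not match the actual classes in compression.h:
```
#include <cstddef>

#include "rocksdb/slice.h"
#include "rocksdb/status.h"

// Hypothetical illustration only -- names and signatures are invented and do
// not match the real classes in compression.h.
class HypotheticalDecompressor {
 public:
  virtual ~HypotheticalDecompressor() = default;

  // Opaque per-thread / per-call scratch space (e.g. wrapping a ZSTD
  // decompression context), so the decompressor object itself can stay
  // logically immutable and be safely shared across threads.
  struct WorkingArea {
    void* scratch = nullptr;
  };

  // Decompress `input` into `output`, a buffer of `output_size` bytes that
  // the caller has already allocated (e.g. directly into a block cache
  // allocation), rather than having the decompressor allocate memory.
  virtual rocksdb::Status DecompressTo(const rocksdb::Slice& input,
                                       char* output, size_t output_size,
                                       WorkingArea* wa) const = 0;
};
```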
Test Plan: This is currently an internal refactor. More testing will come when the new API is migrated to the public API. A test in db_block_cache_test is updated to meaningfully cover a case (cache warming of the compression dictionary block) that was previously only covered in the crash test.

SST write performance test, like https://github.com/facebook/rocksdb/issues/13583. Compile with CLANG, run before & after simultaneously:
```
SUFFIX=`tty | sed 's|/|_|g'`; for ARGS in "-compression_parallel_threads=1 -compression_type=none" "-compression_parallel_threads=1 -compression_type=snappy" "-compression_parallel_threads=1 -compression_type=zstd" "-compression_parallel_threads=1 -compression_type=zstd -verify_compression=1" "-compression_parallel_threads=1 -compression_type=zstd -compression_max_dict_bytes=8180" "-compression_parallel_threads=4 -compression_type=snappy"; do echo $ARGS; (for I in `seq 1 20`; do ./db_bench -db=/dev/shm/dbbench$SUFFIX --benchmarks=fillseq -num=10000000 -compaction_style=2 -fifo_compaction_max_table_files_size_mb=1000 -fifo_compaction_allow_compaction=0 -disable_wal -write_buffer_size=12000000 $ARGS 2>&1 | grep micros/op; done) | awk '{n++; sum += $5;} END { print int(sum / n); }'; done
```

Before (this PR and https://github.com/facebook/rocksdb/issues/13583 reverted):

-compression_parallel_threads=1 -compression_type=none 1908372
-compression_parallel_threads=1 -compression_type=snappy 1926093
-compression_parallel_threads=1 -compression_type=zstd 1208259
-compression_parallel_threads=1 -compression_type=zstd -verify_compression=1 997583
-compression_parallel_threads=1 -compression_type=zstd -compression_max_dict_bytes=8180 934246
-compression_parallel_threads=4 -compression_type=snappy 1644849

After:

-compression_parallel_threads=1 -compression_type=none 1956054 (+2.5%)
-compression_parallel_threads=1 -compression_type=snappy 1911433 (-0.8%)
-compression_parallel_threads=1 -compression_type=zstd 1205668 (-0.3%)
-compression_parallel_threads=1 -compression_type=zstd -verify_compression=1 999263 (+0.2%)
-compression_parallel_threads=1 -compression_type=zstd -compression_max_dict_bytes=8180 934322 (+0.0%)
-compression_parallel_threads=4 -compression_type=snappy 1642519 (-0.2%)

Overall, a pretty neutral change.

SST read performance test (related to https://github.com/facebook/rocksdb/issues/13583). Set up:
```
for COMP in none snappy zstd; do echo $COMP; ./db_bench -db=/dev/shm/dbbench-$COMP --benchmarks=fillseq,flush -num=10000000 -compaction_style=2 -fifo_compaction_max_table_files_size_mb=1000 -fifo_compaction_allow_compaction=0 -disable_wal -write_buffer_size=12000000 -compression_type=$COMP; done
```

Test (compile with CLANG, run before & after simultaneously):
```
for COMP in none snappy zstd; do echo $COMP; (for I in `seq 1 5`; do ./db_bench -readonly -db=/dev/shm/dbbench-$COMP --benchmarks=readrandom -num=10000000 -duration=20 -threads=8 2>&1 | grep micros/op; done) | awk '{n++; sum += $5;} END { print int(sum / n); }'; done
```

Before (this PR and https://github.com/facebook/rocksdb/issues/13583 reverted):

none 1495646
snappy 1172443
zstd 706036
zstd (after constructing with -compression_max_dict_bytes=8180) 656182

After:

none 1494981 (-0.0%)
snappy 1171846 (-0.1%)
zstd 696363 (-1.4%)
zstd (after constructing with -compression_max_dict_bytes=8180) 667585 (+1.7%)

Pretty neutral.

Reviewed By: hx235

Differential Revision: D74626863

Pulled By: pdillinger

fbshipit-source-id: dc8ff3178da9b4eaa7c16aa1bb910c872afaf14a
// Copyright (c) Meta Platforms, Inc. and affiliates.
// This source code is licensed under both the GPLv2 (found in the
// COPYING file in the root directory) and Apache 2.0 License
// (found in the LICENSE.Apache file in the root directory).

// Code supporting block cache (Cache) access for block-based table, based on
// the convenient APIs in typed_cache.h

#pragma once

#include <type_traits>

#include "cache/typed_cache.h"
#include "port/lang.h"
#include "table/block_based/block.h"
#include "table/block_based/block_type.h"
#include "table/block_based/parsed_full_filter_block.h"
#include "table/format.h"

namespace ROCKSDB_NAMESPACE {

// Metaprogramming wrappers for Block, to give each type a single role when
// used with FullTypedCacheInterface.
// (NOTE: previous attempts to create actual derived classes of Block with
// virtual calls resulted in performance regression)

class Block_kData : public Block {
 public:
  using Block::Block;

  static constexpr CacheEntryRole kCacheEntryRole = CacheEntryRole::kDataBlock;
  static constexpr BlockType kBlockType = BlockType::kData;
};

class Block_kIndex : public Block {
 public:
  using Block::Block;

  static constexpr CacheEntryRole kCacheEntryRole = CacheEntryRole::kIndexBlock;
  static constexpr BlockType kBlockType = BlockType::kIndex;
};

class Block_kFilterPartitionIndex : public Block {
 public:
  using Block::Block;

  static constexpr CacheEntryRole kCacheEntryRole =
      CacheEntryRole::kFilterMetaBlock;
  static constexpr BlockType kBlockType = BlockType::kFilterPartitionIndex;
};

class Block_kRangeDeletion : public Block {
 public:
  using Block::Block;

  static constexpr CacheEntryRole kCacheEntryRole = CacheEntryRole::kOtherBlock;
  static constexpr BlockType kBlockType = BlockType::kRangeDeletion;
};

// Useful for creating the Block even though meta index blocks are not
// yet stored in block cache
class Block_kMetaIndex : public Block {
 public:
  using Block::Block;

  static constexpr CacheEntryRole kCacheEntryRole = CacheEntryRole::kOtherBlock;
  static constexpr BlockType kBlockType = BlockType::kMetaIndex;
};

struct BlockCreateContext : public Cache::CreateContext {
  BlockCreateContext() {}
  BlockCreateContext(const BlockBasedTableOptions* _table_options,
                     const ImmutableOptions* _ioptions,
                     Statistics* _statistics, Decompressor* _decompressor,
                     uint8_t _protection_bytes_per_key,
                     const Comparator* _raw_ucmp,
                     bool _index_value_is_full = false,
                     bool _index_has_first_key = false)
      : table_options(_table_options),
        ioptions(_ioptions),
        statistics(_statistics),
        decompressor(_decompressor),
        raw_ucmp(_raw_ucmp),
        protection_bytes_per_key(_protection_bytes_per_key),
        index_value_is_full(_index_value_is_full),
        index_has_first_key(_index_has_first_key) {}

  const BlockBasedTableOptions* table_options = nullptr;
  const ImmutableOptions* ioptions = nullptr;
  Statistics* statistics = nullptr;
  // TODO: refactor to avoid copying BlockCreateContext for dict in block cache
  Decompressor* decompressor = nullptr;
  const Comparator* raw_ucmp = nullptr;
  uint8_t protection_bytes_per_key = 0;
  bool index_value_is_full;
  bool index_has_first_key;

  // For TypedCacheInterface
  template <typename TBlocklike>
  inline void Create(std::unique_ptr<TBlocklike>* parsed_out,
                     size_t* charge_out, const Slice& data,
                     CompressionType type, MemoryAllocator* alloc) {
    BlockContents uncompressed_block_contents;
    if (type != CompressionType::kNoCompression) {
      assert(decompressor != nullptr);
      Status s =
          DecompressBlockData(data.data(), data.size(), type, *decompressor,
                              &uncompressed_block_contents, *ioptions, alloc);
      if (!s.ok()) {
        parsed_out->reset();
        return;
      }
    } else {
      uncompressed_block_contents =
          BlockContents(AllocateAndCopyBlock(data, alloc), data.size());
    }
    Create(parsed_out, std::move(uncompressed_block_contents));
    *charge_out = parsed_out->get()->ApproximateMemoryUsage();
  }

  void Create(std::unique_ptr<Block_kData>* parsed_out, BlockContents&& block);
  void Create(std::unique_ptr<Block_kIndex>* parsed_out, BlockContents&& block);
  void Create(std::unique_ptr<Block_kFilterPartitionIndex>* parsed_out,
              BlockContents&& block);
  void Create(std::unique_ptr<Block_kRangeDeletion>* parsed_out,
              BlockContents&& block);
  void Create(std::unique_ptr<Block_kMetaIndex>* parsed_out,
              BlockContents&& block);
  void Create(std::unique_ptr<ParsedFullFilterBlock>* parsed_out,
              BlockContents&& block);
  void Create(std::unique_ptr<DecompressorDict>* parsed_out,
              BlockContents&& block);
};

// Convenient cache interface to use for block_cache, with support for
// SecondaryCache.
template <typename TBlocklike>
using BlockCacheInterface =
    FullTypedCacheInterface<TBlocklike, BlockCreateContext>;

// Shortcut name for cache handles under BlockCacheInterface
template <typename TBlocklike>
using BlockCacheTypedHandle =
    typename BlockCacheInterface<TBlocklike>::TypedHandle;
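
// Illustrative usage sketch (hypothetical; variable names are invented and
// this is not a complete example):
//
//   BlockCacheInterface<Block_kData> data_block_cache{
//       table_options.block_cache.get()};
//   // Lookups/inserts through the typed interface parse raw (possibly
//   // compressed) cache data via BlockCreateContext::Create and charge the
//   // entry under CacheEntryRole::kDataBlock.
//   BlockCacheTypedHandle<Block_kData>* handle = /* ... */;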

// Selects the right helper based on BlockType and CacheTier
const Cache::CacheItemHelper* GetCacheItemHelper(
    BlockType block_type,
    CacheTier lowest_used_cache_tier = CacheTier::kNonVolatileBlockTier);

// For SFINAE check that a type is "blocklike" with a kCacheEntryRole member.
// Can get difficult compiler/linker errors without a good check like this.
template <typename TUse, typename TBlocklike>
using WithBlocklikeCheck = std::enable_if_t<
    TBlocklike::kCacheEntryRole == CacheEntryRole::kMisc || true, TUse>;
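
// For example (illustrative; the function name is hypothetical), a function
// template operating on cached blocks can be constrained to blocklike types:
//
//   template <typename TBlocklike>
//   WithBlocklikeCheck<Status, TBlocklike> RetrieveBlock(/* ... */);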

// Helper for the uncache_aggressiveness option
class UncacheAggressivenessAdvisor {
 public:
  UncacheAggressivenessAdvisor(uint32_t uncache_aggressiveness) {
    assert(uncache_aggressiveness > 0);
    allowance_ = std::min(uncache_aggressiveness, uint32_t{3});
    threshold_ = std::pow(0.99, uncache_aggressiveness - 1);
  }
  void Report(bool erased) { ++(erased ? useful_ : not_useful_); }
  bool ShouldContinue() {
    if (not_useful_ < allowance_) {
      return true;
    } else {
      // See UncacheAggressivenessAdvisor unit test
      return (useful_ + 1.0) / (useful_ + not_useful_ - allowance_ + 1.5) >=
             threshold_;
    }
  }
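
  // Worked example (illustrative): with uncache_aggressiveness == 100,
  // allowance_ == 3 and threshold_ == 0.99^99 ~= 0.37, so after the first
  // three "not useful" attempts, ShouldContinue() keeps returning true only
  // while roughly 37% or more of the attempts beyond that allowance have
  // been useful (i.e. erased something).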

 private:
  // Baseline minimum number of "not useful" to consider stopping, to allow
  // sufficient evidence for checking the threshold. Actual minimum will be
  // higher as threshold gets well below 1.0.
  int allowance_;
  // After allowance, stop if useful ratio is below this threshold
  double threshold_;
  // Counts
  int useful_ = 0;
  int not_useful_ = 0;
};

}  // namespace ROCKSDB_NAMESPACE