RocksDB corruption - any way to fix? #1107
continuwuation/continuwuity#1107
(Without throwing away all my existing data and starting over; I'm expecting some loss at this point.)
After a power outage, I was getting this error on startup:
Critical error starting server: I/O error: Corruption: no next_file_number entry in MANIFEST The file /data/db/MANIFEST-11265429 may be corrupted
I backed up the (corrupted) database before attempting any repair, so starting from that point again is possible. (And I've done it several times.)
After a repair completes (CONDUWUIT_ROCKSDB_REPAIR=true), with CONDUWUIT_ROCKSDB_RECOVERY_MODE set to 1, 2, or 3, I get the following error:
Critical error starting server: I/O error: Invalid argument: Column families not opened: roomid_lasttypingupdate, userid_lastpresenceupdate, typingid_userid
Allowing it to restart after that, it either hangs indefinitely (seemingly rewriting the database into new files over and over, based on watching strace and the DB directory) or results in continuwuity coming up healthy, but with the database fully wiped.
Any additional troubleshooting I can perform or tips for possible repair would be extremely appreciated. If I were to lose those three column families - I'm assuming they represent last typing updates, last presence updates, and user typing relations - I wouldn't be particularly heartbroken, but private (unfederated) room event histories, space/room hierarchy, users, etc. are things I'm hoping to be able to recover, at least mostly.
Running Continuwuity v0.5.0-rc6 in a container, in case it's relevant.
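The retry loop described above (restore the untouched corrupted copy, then try a repair with each recovery mode) could be scripted roughly like this. This is only a sketch: the helper names and directory layout are hypothetical, and how the environment variables actually reach the container depends on your setup. Only the two variable names and the values 1, 2, and 3 come from this report.

```python
import shutil
import tempfile
from pathlib import Path


def restore_working_copy(backup_dir: Path, work_dir: Path) -> Path:
    """Replace work_dir with a fresh copy of the untouched corrupted backup."""
    if work_dir.exists():
        shutil.rmtree(work_dir)  # discard whatever the previous attempt left behind
    shutil.copytree(backup_dir, work_dir)
    return work_dir


def attempt_environments(modes=(1, 2, 3)):
    """Environment overrides for one repair attempt per recovery mode."""
    return [
        {
            "CONDUWUIT_ROCKSDB_REPAIR": "true",
            "CONDUWUIT_ROCKSDB_RECOVERY_MODE": str(mode),
        }
        for mode in modes
    ]


if __name__ == "__main__":
    # Demo on throwaway directories; point these at the real paths yourself.
    with tempfile.TemporaryDirectory() as tmp:
        backup = Path(tmp) / "db-corrupted-backup"
        backup.mkdir()
        (backup / "MANIFEST-example").write_text("placeholder")
        work = Path(tmp) / "db"
        for env in attempt_environments():
            restore_working_copy(backup, work)
            print("would start the server on", work, "with", env)
```

Starting every attempt from the same pristine copy keeps the results of the different recovery modes comparable, since a failed repair can itself rewrite files.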
If none of the recovery modes got you going again, it's highly likely that you have severe data loss, and whatever you could get back even from a successful recovery at this point might not be worth much.
Your best bet is to restore to a backup from before the corruption point, although from the tone of your issue I'm assuming you don't have one of those. You can try using some external rocksdb tools to tinker with the db - a quick search of the error message you get after a repair completes seems to indicate that the columns roomid_lasttypingupdate, userid_lastpresenceupdate, and typingid_userid don't exist, so you could try manually creating some empty columns and retrying the repair, but I don't know enough about rocksdb myself to help you more than that :(
Nope, no working backups. I don't know much about rocksdb, and a couple brief searches over the years never yielded any results for backup utilities that wouldn't require shutting the server down.
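For the "manually create some empty columns" idea, one possible route is RocksDB's ldb command-line tool, which has list_column_families and create_column_family subcommands. The sketch below only builds and prints the commands by default. It assumes an ldb binary on PATH built from a RocksDB version compatible with the database, and the database path is a placeholder. Whether ldb can open a database in this state at all is untested here, so run it against a copy only.

```python
import shlex
import subprocess

# Column families named in the "Column families not opened" error above.
MISSING_COLUMN_FAMILIES = [
    "roomid_lasttypingupdate",
    "userid_lastpresenceupdate",
    "typingid_userid",
]


def ldb_commands(db_path, column_families, ldb_binary="ldb"):
    """Build ldb invocations: list existing column families, then create each missing one."""
    commands = [[ldb_binary, f"--db={db_path}", "list_column_families"]]
    for name in column_families:
        commands.append([ldb_binary, f"--db={db_path}", "create_column_family", name])
    return commands


def run_all(commands, dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    for command in commands:
        print(shlex.join(command))
        if not dry_run:
            subprocess.run(command, check=True)


if __name__ == "__main__":
    # Hypothetical path: a COPY of the repaired database, never the only copy.
    run_all(ldb_commands("/path/to/copy/of/db", MISSING_COLUMN_FAMILIES))
```

Listing the column families first shows whether the three names really are absent before anything is created.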
It's hard to imagine that the bulk of the data is cooked - it's not like days/weeks/months old files were being written to at the time of the power loss. It would only be recent data and (by the looks of things) that MANIFEST file. Hoping that there's a rocksdb expert floating around here somewhere :)
There's built-in support for online backups by sending the !admin server backup-database command to the admin room (possibly on a crontab) - see https://continuwuity.org/maintenance#backups for future reference.
Feel free to mention it in either #main:continuwuity.org and/or our dev room at #dev:continuwuity.org; there are generally more eyes there than on the issue tracker (no guarantees though, rocksdb is a little bit of magic scribed in a GitHub wiki of all things).
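To run the backup command from a crontab, something has to post the message to the admin room. A minimal sketch using the standard Matrix client-server "send message" endpoint is below. The homeserver URL, access token, and admin room ID are placeholders you would have to supply, and the sending account is assumed to be allowed to issue admin commands. Check the maintenance page linked above for any configuration the backup itself needs.

```python
import json
import urllib.parse
import urllib.request
import uuid

BACKUP_COMMAND = "!admin server backup-database"


def build_backup_request(homeserver, access_token, admin_room_id, txn_id=None):
    """Build a PUT request that sends the backup command as a text message."""
    txn_id = txn_id or uuid.uuid4().hex  # must be unique per request for this token
    room = urllib.parse.quote(admin_room_id, safe="")
    url = (
        f"{homeserver.rstrip('/')}/_matrix/client/v3/rooms/{room}"
        f"/send/m.room.message/{txn_id}"
    )
    body = json.dumps({"msgtype": "m.text", "body": BACKUP_COMMAND}).encode()
    return urllib.request.Request(
        url,
        data=body,
        method="PUT",
        headers={
            "Authorization": f"Bearer {access_token}",
            "Content-Type": "application/json",
        },
    )


def send(request):
    """Perform the request and return the decoded JSON response."""
    with urllib.request.urlopen(request) as response:
        return json.load(response)


if __name__ == "__main__":
    # All three values are hypothetical placeholders.
    req = build_backup_request(
        "https://matrix.example.org", "ACCESS_TOKEN", "!adminroom:example.org"
    )
    print(req.get_method(), req.full_url)
    # send(req)  # uncomment once the placeholders are replaced with real values
```

A cron entry would then just invoke this script at whatever interval suits you.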