FR: More helpful INFO logging during database open stage to detect corrupted db deadlock #1610
Reference
continuwuation/continuwuity#1610
The problem
If the RocksDB database is corrupted, the database open can deadlock and never complete. This is extremely confusing to the admin, given that the last log message they see is:
The next line should be:
...and without that line, you're left in the dark about what's going on.
Impact
Extreme admin confusion. The service is "up", but TCP connections to all endpoints fail and no errors are emitted in the log. Also, your Matrix homeserver is now down, so you can't go ask the chat what the hell is going on. :-)
Proposed solution
Be slightly more verbose in startup logging by adding a new emitted INFO line, something like:
Obviously it would be nicer if db corruption didn't just deadlock startup in the first place, but that's a hard problem to fix, and verbosity would at least make it easier to figure out how screwed you are.
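To make the idea concrete, here is a rough std-only Rust sketch of what "log before and after the open" could look like, with a timeout so a stuck open gets called out explicitly. Everything in it is an assumption for illustration: `open_with_logging` is a hypothetical helper, the message wording is made up rather than taken from the project, and `eprintln!` stands in for whatever logging the server actually uses.

```rust
use std::sync::mpsc::{self, RecvTimeoutError};
use std::thread;
use std::time::{Duration, Instant};

/// Hypothetical helper: runs `open` on a worker thread, logs before and
/// after, and returns `None` if no result arrives within `timeout`.
fn open_with_logging<T, F>(path: &str, timeout: Duration, open: F) -> Option<T>
where
    T: Send + 'static,
    F: FnOnce() -> T + Send + 'static,
{
    // Emitted before the potentially blocking call, so the admin can see
    // which stage startup has reached.
    eprintln!("INFO opening database at {}", path);
    let started = Instant::now();

    let (tx, rx) = mpsc::channel();
    thread::spawn(move || {
        // The send fails only if the caller already gave up; ignore that.
        let _ = tx.send(open());
    });

    match rx.recv_timeout(timeout) {
        Ok(db) => {
            eprintln!("INFO database opened in {:?}", started.elapsed());
            Some(db)
        }
        Err(RecvTimeoutError::Timeout) => {
            // The worker thread is still running; it cannot be cancelled,
            // this only makes the stall visible in the log.
            eprintln!(
                "WARN database open not finished after {:?}; possible corruption or deadlock",
                timeout
            );
            None
        }
        Err(RecvTimeoutError::Disconnected) => {
            eprintln!("ERROR database open thread exited without a result");
            None
        }
    }
}
```

A caller would wrap the real open call in the closure, for example `open_with_logging("/path/to/db", Duration::from_secs(60), || real_open())`, where `real_open` is a placeholder. The timeout does not fix the deadlock; it only turns silence into a log line.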
I think I wrote some additional logs for the startup process that may help identify when stuff gets stuck. I'll see if I can find them again soon.
Actually, does #1494 not suit your needs?
Seems plausible it might help indeed. I guess I'll see once that hits a release and rocksdb dies the next time. :-)
I'll close this as resolved then, feel free to re-open after the next release if you think the issue still applies!