Error sending events of any kind: "event incorrectly had zero prev_events" #969
Reference: continuwuation/continuwuity#969
Sometimes, when sending an event, the error message "event incorrectly had zero prev_events" with HTTP 400 will be returned. This is a confusing error, but is actually merely a symptom of a deeper issue.
Commit 543ab27747 introduced some additional safety checks, and this error comes from one of them. The check has been very useful, as it prevents depth resets, which have historically caused problems with state resolution. However, it does mean that a room can, by design, become unusable.

This is typically not a problem, as what happens here is extremity exhaustion. If you as the reader don't know what an "extremity" is: each time a new event is sent in a room, it references up to 20 previous events (in this case, specifically the edges, i.e. the events that nothing else references yet) in order to make sure there are as few leaves in the room graph as possible. In a perfectly linear room, the DAG will be a straight line with only one edge: the latest event. Once another message is sent, it will reference that previous extremity, and then itself become an extremity. When federation is involved, if two servers send an event at the same time, they both reference only the last event, and both events become extremities.
When constructing an event to persist into the room, a server must select at least one and at most twenty previous events (the only exception is the room create event, because there are no previous events to pick from). Once a third event in this theoretical room is sent, it will reference both edges, resulting in a single forward extremity yet again and restoring linearity.
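The fork-and-merge behaviour described above can be sketched with a toy model. This is only an illustration in Python, not continuwuity's implementation: the `Room` class, its methods, and the event IDs are all hypothetical, and real servers track far more than a set of leaves.

```python
MAX_PREV_EVENTS = 20  # upper bound on prev_events mentioned above


class Room:
    """Toy model of a room DAG that tracks only forward extremities."""

    def __init__(self):
        self.events = {}          # event_id -> list of prev_event ids
        self.extremities = set()  # events that nothing references yet

    def add_event(self, event_id, prev_events):
        """Persist an event (local or remote) and update the extremities."""
        self.events[event_id] = list(prev_events)
        # Anything the new event references is no longer a leaf...
        self.extremities -= set(prev_events)
        # ...and the new event is a leaf until something references it.
        self.extremities.add(event_id)

    def send_local(self, event_id):
        """Build a local event that references up to 20 current extremities."""
        prev_events = sorted(self.extremities)[:MAX_PREV_EVENTS]
        self.add_event(event_id, prev_events)
        return prev_events


room = Room()
room.add_event("$create", [])  # the create event has no prev_events
room.send_local("$msg1")       # linear room: a single extremity
linear = set(room.extremities)

# Two servers send at the same time; both reference only $msg1.
room.add_event("$a", ["$msg1"])
room.add_event("$b", ["$msg1"])
forked = set(room.extremities)

# A third event references both leaves and restores linearity.
merged_prev = room.send_local("$msg2")
merged = set(room.extremities)
```

After the simultaneous sends, `forked` holds both `$a` and `$b`; once `$msg2` references them, `merged` is back to a single extremity.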
As such, when extremity exhaustion is encountered, another server will typically send an event which allows continuwuity to add it as an extremity, fixing the issue. This does, however, become problematic if you encounter this issue in a room within which other servers are unlikely to or are unable to send further events.
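To make the failure mode and its usual recovery concrete, here is a minimal Python sketch. It assumes the safety check is a simple "refuse to build a non-create event with no prev_events" guard; the function names, the exception class, and the event IDs are made up for illustration and do not correspond to continuwuity's actual code.

```python
class ZeroPrevEvents(Exception):
    """Stands in for the "event incorrectly had zero prev_events" error."""


def build_event(extremities, event_type):
    """Build a non-create event, refusing to proceed without prev_events."""
    prev_events = sorted(extremities)[:20]
    if not prev_events and event_type != "m.room.create":
        raise ZeroPrevEvents("event incorrectly had zero prev_events")
    return {"type": event_type, "prev_events": prev_events}


def receive_remote(extremities, event_id, prev_events):
    """Accept an event from another server and update the extremity set."""
    extremities.difference_update(prev_events)
    extremities.add(event_id)


extremities = set()  # the exhausted state: no forward extremities at all

try:
    build_event(extremities, "m.room.message")
    blocked = False
except ZeroPrevEvents:
    blocked = True

# An event arriving over federation gives the room an extremity again.
receive_remote(extremities, "$remote", ["$older"])
recovered = build_event(extremities, "m.room.message")
```

In this sketch the local send is blocked while the set is empty, and the next local event can reference `$remote` once it has arrived. A room that no other server ever sends into never reaches that second step.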
The issue here is that the function that selects extremities for a room somehow ends up with zero extremities, as the error implies. Before 543ab27747, the event builder would just smile and nod at this, and happily send `{"type": "m.room.message", "prev_events": [], "depth": 1}`, which, as mentioned previously, has caused issues.

In order to fix this, the `set_forward_extremities` and `get_forward_extremities` functions need to be traced to see what causes this. I've noticed it happens particularly around the time of state resets, but that might just be coincidental. This is a really difficult one to troubleshoot, because there are no reliable reproduction steps, and the logging required for this is only really something a debug build can achieve. I've been running a debug build for months now but haven't managed to repro the issue anywhere but my main deployment, which is on fire and exploding even with the max-perf build on the best of days; I don't think it can even start in debug mode.

If you end up like I did with my policy room, stuck with the "event incorrectly had zero prev_events" error, and you can't resolve the issue by receiving an event from another server, you can use the `nex/feat/manual-extremities` branch, which introduces the `!admin debug force-append-last-extremity` command. Just run `!admin debug force-append-last-extremity <room ID>` and you should be able to use the room again. This likely won't be merged into main, as the bug should just be fixed rather than painted over.

TL;DR: `nex/feat/manual-extremities` branch