Error sending events of any kind: "event incorrectly had zero prev_events" #969

Open
opened 2025-09-01 11:54:21 +00:00 by nex · 0 comments
Owner

Sometimes, when sending an event, the error message "event incorrectly had zero prev_events" with HTTP 400 will be returned. This is a confusing error, but is actually merely a symptom of a deeper issue.

Commit 543ab27747 introduced some additional safety checks, and this error is one of them. The check has been very useful, as it prevents depth resets, which have historically caused problems with state resolution. However, it does mean a room will deliberately become unusable for as long as the check keeps failing.

This is typically not a problem, as what happens here is extremity exhaustion, which usually resolves itself. If you as the reader don't know what an "extremity" is: each time a new event is sent in a room, it references up to 20 previous events (specifically the current edges, i.e. events that nothing else references yet) in order to make sure there are as few leaves in the room graph as possible. In a perfectly linear room, the DAG is a straight line with only one edge: the latest event. Once another message is sent, it references that previous extremity and then itself becomes the extremity. When federation is involved, if two servers send an event at the same time, they both reference only the last event, and both events become extremities.
When constructing an event to persist into the room, servers must select at least one previous event and at most twenty (the only exception is the room create event, because there are no previous events to pick from). Once a third event in this theoretical room is sent, it will reference both edges, resulting in a single forward extremity again and restoring linearity.
As such, when extremity exhaustion is encountered, another server will typically send an event that continuwuity can add as an extremity, fixing the issue. This does, however, become problematic if it happens in a room where other servers are unlikely or unable to send further events.
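
To make the extremity bookkeeping described above concrete, here is a minimal toy model in Python. It is not continuwuity's code: the `Room` class, its `build` and `persist` methods, and the event IDs are all hypothetical, and real servers also deal with auth, state and ordering, which are ignored here.

```python
from dataclasses import dataclass, field

MAX_PREV_EVENTS = 20  # upper bound on prev_events mentioned above


@dataclass
class Room:
    """Toy room DAG (hypothetical): event ID -> list of prev_events."""
    events: dict = field(default_factory=dict)
    extremities: set = field(default_factory=set)

    def build(self, event_id):
        """Pick prev_events from the current forward extremities (at most 20)."""
        return event_id, sorted(self.extremities)[:MAX_PREV_EVENTS]

    def persist(self, event_id, prev_events):
        """Store an event: it replaces the extremities it references."""
        self.events[event_id] = list(prev_events)
        self.extremities -= set(prev_events)
        self.extremities.add(event_id)


room = Room()
room.persist("$create", [])        # the create event has no prev_events
room.persist(*room.build("$a"))    # linear room: "$a" is the only extremity

# Two servers build an event at the same time, both seeing only "$a".
b = room.build("$b")
c = room.build("$c")
room.persist(*b)
room.persist(*c)
forked = set(room.extremities)     # both "$b" and "$c" are now extremities

# A third event references both edges and restores linearity.
room.persist(*room.build("$d"))
```

After the last line, `room.extremities` holds only `"$d"`, and `room.events["$d"]` lists both `"$b"` and `"$c"` as its prev_events.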

The issue here is that the function that selects extremities for a room somehow ends up with zero extremities, as the error implies. Before 543ab27747, the event builder would just smile and nod at this and happily send {"type": "m.room.message", "prev_events": [], "depth": 1}, which, as mentioned previously, has caused issues.
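
As a rough illustration of the difference the check makes, the sketch below builds an event with and without it. Everything here is hypothetical (`build_event`, `ZeroPrevEventsError`, the `enforce` flag) and is not the actual code from 543ab27747. It also assumes depth is computed as one more than the deepest prev_event, which is how the Matrix specification describes the field.

```python
class ZeroPrevEventsError(ValueError):
    """Hypothetical error type; the issue says the server reports it as HTTP 400."""


def build_event(event_type, forward_extremities, depths, enforce=True):
    """Build a minimal event dict from the room's forward extremities.

    depths maps known event IDs to their depth. With enforce=False this
    mimics the old behaviour of accepting an empty extremity set.
    """
    prev_events = sorted(forward_extremities)[:20]
    if enforce and not prev_events and event_type != "m.room.create":
        raise ZeroPrevEventsError("event incorrectly had zero prev_events")
    # With no prev_events the maximum falls back to 0, so depth resets to 1.
    depth = 1 + max((depths[e] for e in prev_events), default=0)
    return {"type": event_type, "prev_events": prev_events, "depth": depth}


# Healthy room: one extremity at depth 41, so the new event gets depth 42.
ok = build_event("m.room.message", {"$a"}, {"$a": 41})

# Exhausted extremities without the check: a depth reset slips through.
reset = build_event("m.room.message", set(), {}, enforce=False)

# Exhausted extremities with the check: the event is refused instead.
try:
    build_event("m.room.message", set(), {})
    refused = False
except ZeroPrevEventsError:
    refused = True
```

Here `reset` is exactly the `{"type": "m.room.message", "prev_events": [], "depth": 1}` shape quoted above, while the checked call raises instead of persisting it.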

In order to fix this, the set_forward_extremities and get_forward_extremities functions need to be traced to see what causes this. I've noticed it happens particularly around the time state resets occur, but that might just be coincidental. This is a really difficult one to troubleshoot, because there are no reliable reproduction steps, and the logging required is only really something a debug build can achieve. I've been running a debug build for months now but haven't managed to repro the issue anywhere but my main deployment, which is on fire and exploding even with the max-perf build on the best of days; I don't think it can even start in debug mode.
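
The kind of tracing meant here could look roughly like the sketch below: record who wrote an empty extremity set, and when a read comes back empty. This is only an illustration in Python. The `TracedExtremities` class and its in-memory storage are hypothetical, and only the two method names are borrowed from the functions mentioned above; their real signatures in continuwuity will differ.

```python
import logging
import traceback

log = logging.getLogger("extremities")


class TracedExtremities:
    """Hypothetical in-memory stand-in for the forward extremity store."""

    def __init__(self):
        self._by_room = {}
        self.empty_writes = []  # (room_id, call stack) for every empty write

    def get_forward_extremities(self, room_id):
        current = set(self._by_room.get(room_id, ()))
        if not current:
            log.warning("get: room %s has zero forward extremities", room_id)
        return current

    def set_forward_extremities(self, room_id, event_ids):
        new = set(event_ids)
        if not new:
            # Capture the caller so the offending code path can be found later.
            stack = "".join(traceback.format_stack(limit=4))
            self.empty_writes.append((room_id, stack))
            log.warning("set: room %s written with zero extremities\n%s",
                        room_id, stack)
        self._by_room[room_id] = new


store = TracedExtremities()
store.set_forward_extremities("!room:example.org", ["$a"])
store.set_forward_extremities("!room:example.org", [])  # the suspicious write
after = store.get_forward_extremities("!room:example.org")
```

Logging at the write side is the more useful half, since by the time a read returns nothing the code path that emptied the set is already gone.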

If you end up like I did with my policy room, stuck with the "event incorrectly had zero prev_events" error and unable to resolve it by receiving an event from another server, you can use the nex/feat/manual-extremities branch (https://forgejo.ellis.link/continuwuation/continuwuity/src/branch/nex/feat/manual-extremities), which introduces the !admin debug force-append-last-extremity command. Just run !admin debug force-append-last-extremity <room ID> and you should be able to use the room again. This likely won't be merged into main, as the bug should be fixed properly rather than painted over.


TL;DR:

  • "event incorrectly had zero prev_events" is a safety check and symptom of a deeper problem
  • Extremities become exhausted in a room (only in the database), preventing the creation of new events
  • Another server sending an event fixes this
  • The root cause of the extremity exhaustion is unknown
  • The issue is difficult to debug as there is no known way to trigger the bug in the first place
  • There is a temporary fix in the nex/feat/manual-extremities branch
Reference: continuwuation/continuwuity#969