Updating profile information with a large account effectively crashes the server #1205
The other day, I temporarily updated my global profile picture, which required sending a new membership event to over 800 rooms. For better or worse, continuwuity does not send these updates concurrently, which in this case meant I had set off a chain reaction that would eventually result in my homeserver being DDoSed by remote servers. I ended up adding a sleep between each profile update in an attempt to combat this (nex/continuwuity@fb38be9c84, since commented out as I no longer need it).

However, my initial suspicion that it was the media fetching that was bringing my homeserver to a halt (I had 8000 incoming stalled connections all waiting on something, and media is generally larger than most API responses) turned out to be incorrect: using an MXC provided by another server still caused my homeserver to get absolutely hammered, bringing it down a third time, even with the 5 second sleep between each update.
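For context, a minimal sketch of what that kind of throttling looks like, using a stand-in `send_membership_update` function and plain `String` room IDs rather than the real continuwuity types:

```rust
use std::time::Duration;
use tokio::time::sleep;

// Stand-in error type and membership-update function; the real continuwuity
// code paths and types will differ.
type Error = Box<dyn std::error::Error>;

async fn send_membership_update(room_id: &str) -> Result<(), Error> {
    println!("sending updated m.room.member event to {room_id}");
    Ok(())
}

// Walk the joined rooms one at a time and pause between updates, so the
// federation traffic triggered on remote servers is spread out over time
// instead of arriving all at once.
async fn propagate_profile_update(joined_rooms: &[String]) -> Result<(), Error> {
    for room_id in joined_rooms {
        send_membership_update(room_id).await?;
        sleep(Duration::from_secs(5)).await;
    }
    Ok(())
}
```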
After investigating my reverse proxy logs, I saw that there were thousands of (as mentioned, concurrent) requests to `/_matrix/federation/v1/state_ids/...` (fetching the state at the membership event I was sending), followed closely by a similar number of calls to `/_matrix/federation/v1/get_missing_events/...` (fetching the events that are missing). After blocking the `state_ids` endpoint in my reverse proxy, a bunch of servers started rejecting my new membership event, whereas another set of them continued to just try `get_missing_events` (presumably they already had the required state locally, or something). I blocked `get_missing_events` as well, and it looks like the origins started fetching each event individually instead, which was much easier for my server to process.

I think we should investigate the performance of `state_ids` and `get_missing_events`, and potentially see whether we could benefit from adding a cache or two in there?
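As a rough illustration of the cache idea (not the actual continuwuity internals): the `/state_ids` response for a given `(room_id, event_id)` pair is just two lists of event IDs, so memoising it while a burst of servers asks about the same membership event could be fairly cheap. A sketch with hypothetical types, using an unbounded map where a real version would want eviction:

```rust
use std::collections::HashMap;
use std::sync::Mutex;

// Hypothetical shape of a /state_ids response: two lists of event IDs.
#[derive(Clone)]
struct StateIdsResponse {
    pdu_ids: Vec<String>,
    auth_chain_ids: Vec<String>,
}

// Sketch of a per-(room, event) cache in front of the expensive state
// lookup; a real implementation would bound its size and evict old
// entries instead of using a plain HashMap.
#[derive(Default)]
struct StateIdsCache {
    inner: Mutex<HashMap<(String, String), StateIdsResponse>>,
}

impl StateIdsCache {
    // Return the cached response if another server already asked for the
    // state at this event; otherwise compute it once and remember it.
    fn get_or_compute<F>(&self, room_id: &str, event_id: &str, compute: F) -> StateIdsResponse
    where
        F: FnOnce() -> StateIdsResponse,
    {
        let key = (room_id.to_owned(), event_id.to_owned());
        if let Some(hit) = self.inner.lock().unwrap().get(&key) {
            return hit.clone();
        }
        let fresh = compute();
        self.inner.lock().unwrap().insert(key, fresh.clone());
        fresh
    }
}
```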