perf: Attempt to prevent people joining known busted rooms #1503

Open
nex wants to merge 2 commits from nex/feat/block-busted-rooms into main
Owner

As people keep setting up a server and immediately trying to join rooms that have caused performance issues for even some of the beefiest servers in the network, this PR introduces a more drastic measure to stop people footgunning: a hardcoded list of room IDs is now blocked, which prevents even admins from joining them unless a config option is enabled.

This is necessary because people keep trying to join rooms such as the Matrix Community space, and either fail to join, or succeed only to have their machines absolutely crushed trying to resolve the room's state some time later (see: pretty much everyone on the maintainer team and federated.nexus; even some big-name public deployments have recently started banning this room). Once joined, the room is difficult to leave, and simply participating in it is enough to cause performance issues, which is terrible for anyone who is just getting started.
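The core of the change can be pictured as a join-time guard. This is a minimal sketch under stated assumptions, not the PR's actual code: `Config`, `JoinError` and `check_join_allowed` are hypothetical stand-ins, and the list holds a few entries copied from the PR diff. The check takes no user argument, so it applies equally to admins, and only the config flag lifts it.

```rust
/// Hypothetical stand-in for the hardcoded list (entries copied from the PR diff).
const BROKEN_ROOM_IDS: &[&str] = &[
    "!MBrxZRUoApYYjmyion:t2bot.io", // Old t2bot room - insane auth chain depths
    "!mefQhZzgTaxNCNzAeK:kde.org",  // KDE user help
    "!OTxETzuhBDbnPqBqbP:kde.org",  // KDE space
];

/// Hypothetical subset of the server config.
struct Config {
    allow_joining_broken_rooms: bool,
}

#[derive(Debug, PartialEq)]
enum JoinError {
    Forbidden(&'static str),
}

/// Refuses the join for any user (admins included) when the room is on the
/// hardcoded list and the operator has not opted in via the config flag.
fn check_join_allowed(config: &Config, room_id: &str) -> Result<(), JoinError> {
    if !config.allow_joining_broken_rooms && BROKEN_ROOM_IDS.contains(&room_id) {
        return Err(JoinError::Forbidden("This room is too complex."));
    }
    Ok(())
}
```

Rooms not on the list are unaffected either way; the flag is a single global opt-out rather than a per-room setting.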

Pull request checklist:

  • [x] This pull request targets the main branch, and the branch is named something other than main.
  • [x] I have written an appropriate pull request title and my description is clear.
  • [x] I understand I am responsible for the contents of this pull request.
  • I have followed the contributing guidelines:
      • [x] My contribution follows the code style, if applicable.
      • [x] I ran pre-commit checks before opening/drafting this pull request.
      • [ ] I have tested my contribution (or proof-read it for documentation-only changes) myself, if applicable. This includes ensuring code compiles.
      • [x] My commit messages follow the commit message format and are descriptive.
      • [x] I have written a news fragment for this PR, if applicable.
perf: Attempt to prevent people joining known busted rooms
Some checks failed
Update flake hashes / update-flake-hashes (pull_request) Waiting to run
Checks / Prek / Clippy and Cargo Tests (pull_request) Has been cancelled
Documentation / Build and Deploy Documentation (pull_request) Has been cancelled
Checks / Prek / Pre-commit & Formatting (pull_request) Has been cancelled
511bb8bf55
chore: Correct news frag file name
All checks were successful
Documentation / Build and Deploy Documentation (pull_request) Successful in 3m7s
Checks / Prek / Pre-commit & Formatting (pull_request) Successful in 3m14s
Update flake hashes / update-flake-hashes (pull_request) Successful in 50s
Checks / Prek / Clippy and Cargo Tests (pull_request) Successful in 19m54s
b42e6a67f0
@@ -79,0 +87,4 @@
if !services.config.allow_joining_broken_rooms
&& BROKEN_ROOM_IDS.contains(&room_id.as_str())
{
return Err!(Request(Forbidden("This room is too complex.")));
Owner

we may want to add a new section to the FAQ and have this error message link to it
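One way the error could point users at such an FAQ entry, sketched with a hypothetical helper `too_complex_message`. The FAQ section does not exist yet, so `COMPLEX_ROOM_FAQ_URL` is a placeholder, not a real documentation link:

```rust
/// Placeholder: the FAQ section suggested in review does not exist yet.
const COMPLEX_ROOM_FAQ_URL: &str = "https://example.org/faq#complex-rooms";

/// Builds the user-facing text for a blocked join, linking to the FAQ entry
/// and naming the config option that lifts the block.
fn too_complex_message(room_id: &str) -> String {
    format!(
        "The room {room} is too complex to join safely. See {url} for why, \
         and for the `allow_joining_broken_rooms` option that overrides this.",
        room = room_id,
        url = COMPLEX_ROOM_FAQ_URL,
    )
}
```

Naming the config option in the message also tells operators how to opt out deliberately, rather than leaving them to search the config file.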

nex added this to the next milestone 2026-03-07 17:07:10 +00:00
@@ -1522,0 +1534,4 @@
# forgo your right to complain about any slowdowns or inflated resource
# usage you encounter.
#
#allow_joining_broken_rooms = false
Contributor

wouldn't it be better if admins could tweak the list, removing or adding individual rooms? We'd just give them the default list as guidance.
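A sketch of the suggested tweakable list, assuming two hypothetical config fields (`extra_blocked_rooms`, `unblocked_rooms`) layered over the shipped defaults. Neither field exists in the PR; this only illustrates "defaults plus admin additions minus admin removals":

```rust
use std::collections::HashSet;

/// Built-in defaults shipped as guidance (entries copied from the PR diff).
const DEFAULT_BLOCKED_ROOMS: &[&str] = &[
    "!MBrxZRUoApYYjmyion:t2bot.io",
    "!mefQhZzgTaxNCNzAeK:kde.org",
    "!OTxETzuhBDbnPqBqbP:kde.org",
];

/// Hypothetical config fields letting admins adjust the defaults.
struct BlocklistConfig {
    extra_blocked_rooms: Vec<String>,
    unblocked_rooms: Vec<String>,
}

/// Effective list = defaults + admin additions - admin removals.
fn effective_blocklist(cfg: &BlocklistConfig) -> HashSet<String> {
    let mut set: HashSet<String> =
        DEFAULT_BLOCKED_ROOMS.iter().map(|s| s.to_string()).collect();
    set.extend(cfg.extra_blocked_rooms.iter().cloned());
    // Removals are applied last, so an admin can always unblock a default entry.
    for room in &cfg.unblocked_rooms {
        set.remove(room);
    }
    set
}
```

Since `ban-room` already covers adding rooms at runtime, arguably only the per-room removal half would be new compared to the PR's single on/off flag.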

Author
Owner

`!admin rooms moderation ban-room` exists for user-configurable room bans; this is merely meant to provide a "default" set that prevents new users joining rooms that will just destroy their server while they're none the wiser.
Contributor

as in brick their server? or just require they evict a user in a race condition?

i agree there is overlap between `ban-room` and the proposed room filter list. But unless there's a pattern of complete bricking, i'm against taking control away from admins completely. If they want to erase or comment out our recommendations, they will learn about race conditions and log spam.

I saw a similar hard-coded rule nested deep in the code. We should probably find out why these rooms break. Like if this room breaks because it's v5, we can adjust our interface in general for all room v5s. If it's just a bad room, we can add it to the proposed list. I am not sure we should be hard-coding edge cases throughout the production code; I would think it's better to leave them configurable.

...
		.ruma_route(&client::get_hierarchy_route)
		.ruma_route(&client::get_mutual_rooms_route)
		.ruma_route(&client::get_room_summary)
		.route(
			"/_matrix/client/unstable/im.nheko.summary/rooms/{room_id_or_alias}/summary",
			get(client::get_room_summary_legacy)
		)
		.ruma_route(&client::get_suspended_status)
		.ruma_route(&client::put_suspended_status)
		.ruma_route(&client::well_known_support)
...
Contributor

perhaps we could just combine the two lists before feeding them to the server, and treat ban/disable as the same
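A sketch of merging both sources into one lookup before the server consults it, assuming runtime bans are available as a plain list of room IDs (a simplification; `merged_blocklist`, `is_join_blocked` and `BlockReason` are hypothetical). A reason is kept per entry only so the error message can still differ; the join check itself treats both sources the same:

```rust
use std::collections::HashMap;

/// Subset of the hardcoded list (entries copied from the PR diff).
const BROKEN_ROOM_IDS: &[&str] = &[
    "!MBrxZRUoApYYjmyion:t2bot.io",
    "!OTxETzuhBDbnPqBqbP:kde.org",
];

/// Why a room is in the merged list; only used to pick an error message.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum BlockReason {
    AdminBan,
    KnownBroken,
}

/// Merges runtime bans and the hardcoded list into one map.
fn merged_blocklist(
    admin_bans: &[String],
    allow_joining_broken_rooms: bool,
) -> HashMap<String, BlockReason> {
    let mut merged = HashMap::new();
    if !allow_joining_broken_rooms {
        for id in BROKEN_ROOM_IDS {
            merged.insert(id.to_string(), BlockReason::KnownBroken);
        }
    }
    // An explicit admin ban wins if a room appears in both sources.
    for id in admin_bans {
        merged.insert(id.clone(), BlockReason::AdminBan);
    }
    merged
}

/// Single lookup used at join time, whatever the source.
fn is_join_blocked(merged: &HashMap<String, BlockReason>, room_id: &str) -> bool {
    merged.contains_key(room_id)
}
```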

Author
Owner

> as in brick their server?

Pretty much.

> i'm against taking control away from admins completely

I'm not sure what control this takes away? It simply prevents people joining a room without realising it will blow up their server. It can be turned on and off at will, and is independent of the runtime-configured bans, which are typically for a different purpose.

> they will learn about race conditions and log spam.

The problem is that this is the current system, and it results in people coming into our main room almost daily to complain of slow joins, slow servers, high CPU and RAM usage, and insane disk usage inflation. We clearly need something to stop people who don't know what they're joining from blowing up their server without realising until it's too late.

> We should probably find out why these rooms break

It's because of state resets and/or insanely deep auth chains
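For context on "insanely deep auth chains": every Matrix event references its `auth_events`, and state resolution has to walk those references. The sketch below measures the longest path through an event's auth chain over a toy in-memory graph (a `HashMap` from event ID to its auth event IDs; an assumption, since real servers keep this in the database). "Length" here is a path count, not the PDU `depth` field. It walks iteratively because a recursive walk could overflow the stack on exactly the kind of room discussed here. Illustrative only; `auth_chain_length` is hypothetical and not how continuwuity measures rooms.

```rust
use std::collections::HashMap;

/// Length of the longest path from `start` through `auth_events` references,
/// counting `start` itself (a create event with no auth events has length 1).
/// Iterative post-order DFS with memoisation; assumes the graph is acyclic.
/// Events missing from the map are treated as having no auth events.
fn auth_chain_length<'a>(auth_events: &HashMap<&'a str, Vec<&'a str>>, start: &'a str) -> usize {
    let mut memo: HashMap<&'a str, usize> = HashMap::new();
    // Stack entries: (event, parents_already_pushed)
    let mut stack: Vec<(&'a str, bool)> = vec![(start, false)];
    while let Some((event, expanded)) = stack.pop() {
        if memo.contains_key(event) {
            continue; // already computed via another path
        }
        let parents: &[&'a str] = auth_events.get(event).map(Vec::as_slice).unwrap_or(&[]);
        if expanded {
            // Every parent was finished before we returned to this event.
            let longest_parent = parents.iter().map(|p| memo[p]).max().unwrap_or(0);
            memo.insert(event, longest_parent + 1);
        } else {
            stack.push((event, true));
            for &p in parents {
                if !memo.contains_key(p) {
                    stack.push((p, false));
                }
            }
        }
    }
    memo[start]
}
```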

> Like if this room breaks because it's v5, we can adjust our interface in general for all room v5s.

We can't apply a blanket change like this: some v5 rooms (for example) are perfectly fine, whereas some are practically unusable. There's no one-size-fits-all :(

> I am not sure we should be hard-coding edge cases throughout the production code

This is basically a last resort. I don't want to do this either, but I'm not seeing another option.

> I would think it's better to leave them configurable.

I still don't understand this - you can manually ban and unban rooms with the relevant room moderation commands, that is configuration. Why would you add to the hardcoded list if you can just... use the admin command for the same effect?

@@ -61,0 +67,4 @@
"!MBrxZRUoApYYjmyion:t2bot.io", // Old t2bot room - insane auth chain depths
"!izahlpcyIDeymNjiOd:matrix.debian.social", // #debian-next:matrix.debian.social
"!mefQhZzgTaxNCNzAeK:kde.org", // KDE user help
"!OTxETzuhBDbnPqBqbP:kde.org", // KDE space
Contributor

oh hell yeah homie, i'm gonna join them all on my nightly account 🙌 LFG keep it maintained
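Since the list is maintained by hand, a cheap unit test could catch malformed entries, such as one missing the leading `!` sigil that room IDs start with. A simplified sketch with hypothetical helpers; it deliberately does not require a `:server` suffix, because room IDs in newer room versions may omit it, so it is not a full identifier validator:

```rust
/// Hypothetical sanity check for hand-maintained entries: the ID must start
/// with the `!` sigil, have something after it, and contain no whitespace.
fn is_plausible_room_id(id: &str) -> bool {
    match id.strip_prefix('!') {
        Some(rest) => !rest.is_empty() && !rest.chars().any(char::is_whitespace),
        None => false,
    }
}

/// Returns every entry that fails the check, so a test can report them all at once.
fn malformed_entries<'a>(ids: &[&'a str]) -> Vec<&'a str> {
    ids.iter().copied().filter(|id| !is_plausible_room_id(id)).collect()
}
```

Run over the list in the diff, an assertion that `malformed_entries` is empty would fail on any entry that lost its sigil, since such an entry could never match a real room ID.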

gamesguru left a comment
Contributor

Please avoid hard-coding configuration values in production rust code. Brainstorm an approach which uses dynamic configuration if possible.
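One possible shape for dynamic configuration: a shared, lock-protected set seeded from the shipped defaults that a config reload or admin command could replace without a restart. `BrokenRoomList` and its methods are hypothetical, and the reload trigger itself is left out:

```rust
use std::collections::HashSet;
use std::sync::{Arc, RwLock};

/// Hypothetical runtime-replaceable blocklist; clones share the same set.
#[derive(Clone)]
struct BrokenRoomList {
    inner: Arc<RwLock<HashSet<String>>>,
}

impl BrokenRoomList {
    /// Seeds the list from the shipped defaults.
    fn with_defaults(defaults: &[&str]) -> Self {
        let set: HashSet<String> = defaults.iter().map(|s| s.to_string()).collect();
        Self { inner: Arc::new(RwLock::new(set)) }
    }

    /// Read path, called on every join attempt.
    fn contains(&self, room_id: &str) -> bool {
        self.inner.read().expect("blocklist lock poisoned").contains(room_id)
    }

    /// Swaps in a whole new list, e.g. after re-reading the config file.
    fn replace(&self, new_ids: impl IntoIterator<Item = String>) {
        let fresh: HashSet<String> = new_ids.into_iter().collect();
        *self.inner.write().expect("blocklist lock poisoned") = fresh;
    }
}
```

An `RwLock` fits here because reads (one per join) vastly outnumber writes (occasional reloads).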

This pull request is blocked because it's outdated.
This branch is out-of-date with the base branch

Checkout

From your project repository, check out a new branch and test the changes.
git fetch -u origin nex/feat/block-busted-rooms:nex/feat/block-busted-rooms
git switch nex/feat/block-busted-rooms
3 participants
Reference
continuwuation/continuwuity!1503