perf: Attempt to prevent people joining known busted rooms #1503

Open
nex wants to merge 3 commits from nex/feat/block-busted-rooms into main
Owner

As people keep setting up a server and immediately start trying to join rooms that have caused performance issues for even some of the beefiest servers in the network, this PR introduces a more drastic measure to prevent people footgunning - a list of room IDs are now hardcoded to be blocked, which prevents even admins joining them, unless a config option is enabled.

This is necessary since people keep trying to join, for example, the Matrix Community space, and being unable to do so, or being able to do so, but later having their machines absolutely crushed trying to resolve the room's state some time later (see: pretty much everyone on the maintainer team, federated.nexus, even some big name public deployments have recently started banning this room). Once joined, leaving itself is a difficult process, and simply participating in the room is enough to cause performance issues, which is terrible for anyone who is just getting started.

Pull request checklist:

  • This pull request targets the main branch, and the branch is named something other than
    main.
  • I have written an appropriate pull request title and my description is clear.
  • I understand I am responsible for the contents of this pull request.
  • I have followed the contributing guidelines:
As people keep setting up a server and immediately start trying to join rooms that have caused performance issues for even some of the beefiest servers in the network, this PR introduces a more drastic measure to prevent people footgunning - a list of room IDs are now hardcoded to be blocked, which prevents even admins joining them, unless a config option is enabled. This is necessary since people keep trying to join, for example, the Matrix Community space, and being unable to do so, or being able to do so, but later having their machines absolutely crushed trying to resolve the room's state some time later (see: pretty much everyone on the maintainer team, federated.nexus, even some big name public deployments have recently started banning this room). Once joined, leaving itself is a difficult process, and simply participating in the room is enough to cause performance issues, which is *terrible* for anyone who is just getting started. <!-- Example: This pull request allows us to warp through time and space ten times faster than before by double-inverting the warp drive with hyperheated jump fluid, both making the drive faster and more efficient. This resolves the common issue where we have to wait more than 10 milliseconds to engage, use, and disengage the warp drive when travelling between galaxies. --> <!-- Closes: #... --> <!-- Fixes: #... --> <!-- Uncomment the above line(s) if your pull request fixes an issue or closes another pull request by superseding it. Replace `#...` with the issue/pr number, such as `#123`. --> **Pull request checklist:** <!-- You need to complete these before your PR can be considered. If you aren't sure about some, feel free to ask for clarification in #dev:continuwuity.org. --> - [x] This pull request targets the `main` branch, and the branch is named something other than `main`. - [x] I have written an appropriate pull request title and my description is clear. - [x] I understand I am responsible for the contents of this pull request. - I have followed the [contributing guidelines][c1]: - [x] My contribution follows the [code style][c2], if applicable. - [x] I ran [pre-commit checks][c1pc] before opening/drafting this pull request. - [ ] I have [tested my contribution][c1t] (or proof-read it for documentation-only changes) myself, if applicable. This includes ensuring code compiles. - [x] My commit messages follow the [commit message format][c1cm] and are descriptive. - [x] I have written a [news fragment][n1] for this PR, if applicable<!--(can be done after hitting open!)-->. <!-- Notes on these requirements: - While not required, we encourage you to sign your commits with GPG or SSH to attest the authenticity of your changes. - While we allow LLM-assisted contributions, we do not appreciate contributions that are low quality, which is typical of machine-generated contributions that have not had a lot of love and care from a human. Please do not open a PR if all you have done is asked ChatGPT to tidy up the codebase with a +-100,000 diff. - In the case of code style violations, reviewers may leave review comments/change requests indicating what the ideal change would look like. For example, a reviewer may suggest you lower a log level, or use `match` instead of `if/else` etc. - In the case of code style violations, pre-commit check failures, minor things like typos/spelling errors, and in some cases commit format violations, reviewers may modify your branch directly, typically by making changes and adding a commit. Particularly in the latter case, a reviewer may rebase your commits to squash "spammy" ones (like "fix", "fix", "actually fix"), and reword commit messages that don't satisfy the format. - Pull requests MUST pass the `Checks` CI workflows to be capable of being merged. This can only be bypassed in exceptional circumstances. If your CI flakes, let us know in matrix:r/dev:continuwuity.org. - Pull requests have to be based on the latest `main` commit before being merged. If the main branch changes while you're making your changes, you should make sure you rebase on main before opening a PR. Your branch will be rebased on main before it is merged if it has fallen behind. - We typically only do fast-forward merges, so your entire commit log will be included. Once in main, it's difficult to get out cleanly, so put on your best dress, smile for the cameras! --> [c1]: https://forgejo.ellis.link/continuwuation/continuwuity/src/branch/main/CONTRIBUTING.md [c2]: https://forgejo.ellis.link/continuwuation/continuwuity/src/branch/main/docs/development/code_style.mdx [c1pc]: https://forgejo.ellis.link/continuwuation/continuwuity/src/branch/main/CONTRIBUTING.md#pre-commit-checks [c1t]: https://forgejo.ellis.link/continuwuation/continuwuity/src/branch/main/CONTRIBUTING.md#running-tests-locally [c1cm]: https://forgejo.ellis.link/continuwuation/continuwuity/src/branch/main/CONTRIBUTING.md#commit-messages [n1]: https://towncrier.readthedocs.io/en/stable/tutorial.html#creating-news-fragments
perf: Attempt to prevent people joining known busted rooms
Some checks failed
Update flake hashes / update-flake-hashes (pull_request) Waiting to run
Checks / Prek / Clippy and Cargo Tests (pull_request) Has been cancelled
Documentation / Build and Deploy Documentation (pull_request) Has been cancelled
Checks / Prek / Pre-commit & Formatting (pull_request) Has been cancelled
511bb8bf55
chore: Correct news frag file name
All checks were successful
Documentation / Build and Deploy Documentation (pull_request) Successful in 3m7s
Checks / Prek / Pre-commit & Formatting (pull_request) Successful in 3m14s
Update flake hashes / update-flake-hashes (pull_request) Successful in 50s
Checks / Prek / Clippy and Cargo Tests (pull_request) Successful in 19m54s
b42e6a67f0
@ -79,0 +87,4 @@
if !services.config.allow_joining_broken_rooms
&& BROKEN_ROOM_IDS.contains(&room_id.as_str())
{
return Err!(Request(Forbidden("This room is too complex.")));
Owner

we may want to add a new section to the FAQ and have this error message link to it

we may want to add a new section to the FAQ and have this error message link to it
Contributor

it could probably also do with being more specific, maybe something like "This room is known to (be broken / cause issues)."
i would hazard a guess that most users will not understand what "too complex" means for a room

it could probably also do with being more specific, maybe something like "This room is known to (be broken / cause issues)." i would hazard a guess that most users will not understand what "too complex" means for a room
Author
Owner

"This room is too complex" is easy enough to find in documentation without overloading the actual error response with too much information. "This room is known to be broken" sounds scary and also isn't exactly true (these rooms aren't broken, just so complex that they're essentially broken) and "this room is known to cause issues" is too vague. "too complex" describes the problem precisely.

"This room is too complex" is easy enough to find in documentation without overloading the actual error response with too much information. "This room is known to be broken" sounds scary and also isn't exactly true (these rooms aren't broken, just so complex that they're essentially broken) and "this room is known to cause issues" is too vague. "too complex" describes the problem precisely.
Contributor

Perhaps "This room is known to cause issues due to being too complex"?

Perhaps "This room is known to cause issues due to being too complex"?
Contributor

That is not a very fun compromise. I agree that complexity is a precise description, but it still sounds odd to unfamiliar ears. I would lean more towards "This room's history is too complex" to be clear about what exactly complexity entails in this instance.

That is not a very fun compromise. I agree that complexity is a precise description, but it still sounds odd to unfamiliar ears. I would lean more towards "This room's history is too complex" to be clear about what exactly complexity entails in this instance.
nex added this to the next milestone 2026-03-07 17:07:10 +00:00
@ -1522,0 +1534,4 @@
# forgo your right to complain about any slowdowns or inflated resource
# usage you encounter.
#
#allow_joining_broken_rooms = false
Contributor

couldn't it be better if admins could tweak the list? removing or adding individual rooms? We just gave them the default list as guidance.

couldn't it be better if admins could tweak the list? removing or adding individual rooms? We just gave them the default list as guidance.
Author
Owner

!admin rooms moderation ban-room exists for user-configurable room bans; this is merely meant to provide a "default" set that prevents new users joining rooms that will just destroy their server while they're none the wiser.

`!admin rooms moderation ban-room` exists for user-configurable room bans; this is merely meant to provide a "default" set that prevents new users joining rooms that will just destroy their server while they're none the wiser.
Contributor

as in brick their server? or just require they evict a user in a race condition?

i agree there is overlap between the functions ban-room and the proposed room filter list. But unless there's a pattern of complete bricking, i'm against taking control away from admins completely. If they want to erase or comment out our recommendations, they will learn about race conditions and log spam.

I saw a similar thing indented deep in the code, a hard-coded rule. We should probably find out why these rooms break. Like if this room breaks because it's v5, we can adjust our interface in general for all room v5s. If it's just a bad room, we can add it to the proposed list. I am not sure we should be hard-coding edge cases throughout the production code, I would think it's better to leave them configurable.

...
		.ruma_route(&client::get_hierarchy_route)
		.ruma_route(&client::get_mutual_rooms_route)
		.ruma_route(&client::get_room_summary)
		.route(
			"/_matrix/client/unstable/im.nheko.summary/rooms/{room_id_or_alias}/summary",
			get(client::get_room_summary_legacy)
		)
		.ruma_route(&client::get_suspended_status)
		.ruma_route(&client::put_suspended_status)
		.ruma_route(&client::well_known_support)
...
as in brick their server? or just require they evict a user in a race condition? i agree there is overlap between the functions `ban-room` and the proposed room filter list. But unless there's a pattern of complete bricking, i'm against taking control away from admins completely. If they want to erase or comment out our recommendations, they will learn about race conditions and log spam. I saw a similar thing indented deep in the code, a hard-coded rule. We should probably find out why these rooms break. Like if this room breaks because it's v5, we can adjust our interface in general for all room v5s. If it's just a bad room, we can add it to the proposed list. I am not sure we should be hard-coding edge cases throughout the production code, I would think it's better to leave them configurable. ```rust ... .ruma_route(&client::get_hierarchy_route) .ruma_route(&client::get_mutual_rooms_route) .ruma_route(&client::get_room_summary) .route( "/_matrix/client/unstable/im.nheko.summary/rooms/{room_id_or_alias}/summary", get(client::get_room_summary_legacy) ) .ruma_route(&client::get_suspended_status) .ruma_route(&client::put_suspended_status) .ruma_route(&client::well_known_support) ... ```
Contributor

perhaps we could even just combine the two lists before even feeding it to the server, and treat ban/disable as the same

perhaps we could even just combine the two lists before even feeding it to the server, and treat ban/disable as the same
Author
Owner

as in brick their server?

Pretty much.

i'm against taking control away from admins completely

I'm not sure what control this takes away? It simply prevents people joining a room without realising it will blow up their server. It can be turned on and off at will, and is independent from the runtime-configured bans, which are typically for a different purpose.

they will learn about race conditions and log spam.

The problem is this is the current system and it is resulting in almost daily people coming into our main room and complaining of slow joins / slow server / high CPU & RAM usage / insane disk usage inflation. We clearly need something to prevent people who are uneducated on the rooms they're joining from blowing up their server and not knowing until it's too late.

We should probably find out why these rooms break

It's because of state resets and/or insanely deep auth chains

Like if this room breaks because it's v5, we can adjust our interface in general for all room v5s.

We can't blanket affect like this - some v5 rooms (for example) are perfectly fine, whereas some are practically unusable. There's no one-size-fits-all :(

I am not sure we should be hard-coding edge cases throughout the production code

This is basically a last-resort. I don't want to do this either but I'm not seeing another option.

I would think it's better to leave them configurable.

I still don't understand this - you can manually ban and unban rooms with the relevant room moderation commands, that is configuration. Why would you add to the hardcoded list if you can just... use the admin command for the same effect?

> as in brick their server? Pretty much. > i'm against taking control away from admins completely I'm not sure what control this takes away? It simply prevents people joining a room without realising it will blow up their server. It can be turned on and off at will, and is independent from the runtime-configured bans, which are typically for a different purpose. > they will learn about race conditions and log spam. The problem is this is the current system and it is resulting in almost daily people coming into our main room and complaining of slow joins / slow server / high CPU & RAM usage / insane disk usage inflation. We clearly need something to prevent people who are uneducated on the rooms they're joining from blowing up their server and not knowing until it's too late. > We should probably find out why these rooms break It's because of state resets and/or insanely deep auth chains > Like if this room breaks because it's v5, we can adjust our interface in general for all room v5s. We can't blanket affect like this - some v5 rooms (for example) are perfectly fine, whereas some are practically unusable. There's no one-size-fits-all :( > I am not sure we should be hard-coding edge cases throughout the production code This is basically a last-resort. I don't want to do this either but I'm not seeing another option. > I would think it's better to leave them configurable. I still don't understand this - you can manually ban and unban rooms with the relevant room moderation commands, that *is* configuration. Why would you add to the hardcoded list if you can just... use the admin command for the same effect?
Contributor

When I said this room breaks bc v5 i meant the quoted rust block giving a custom legacy route to the nheko room. IF ALL v5 rooms need that legacy route, handle it as such... don't hard-code one or two common or popular cases. Right? Does that make sense?

I hadn't considered the impact of having other servers with race conditions join ours. Well, that still feels like something we should be filtering on our end. And we can't control if it afflicts conduwuit, tuwunel, or synapse. So we need a solution from our side, imo.

I still don't understand this - you can manually ban and unban rooms with the relevant room moderation commands, that is configuration. Why would you add to the hardcoded list if you can just... use the admin command for the same effect?

I would rather see a redundant configuration value (as silly as it feels) than have hard-coded strings (cough tech debt) living throughout the code.

There could be tons of people already in a race condition and their admins just never check the logs.

Based on my federation metrics, very few people even adopt the latest version. I think we can do a much better job investigating this and come up with a long-term solution, sooner and before we will see significant return on these hard-coded rules.

When I said this room breaks bc v5 i meant the quoted rust block giving a custom legacy route to the `nheko` room. IF ALL v5 rooms need that legacy route, handle it as such... don't hard-code one or two common or popular cases. Right? Does that make sense? I hadn't considered the impact of having other servers with race conditions join ours. Well, that still feels like something we should be filtering on our end. And we can't control if it afflicts conduwuit, tuwunel, or synapse. So we need a solution from our side, imo. > I still don't understand this - you can manually ban and unban rooms with the relevant room moderation commands, that is configuration. Why would you add to the hardcoded list if you can just... use the admin command for the same effect? I would rather see a redundant configuration value (as silly as it feels) than have hard-coded strings (*cough* tech debt) living throughout the code. There could be tons of people already in a race condition and their admins just never check the logs. Based on my federation metrics, very few people even adopt the latest version. I think we can do a much better job investigating this and come up with a long-term solution, sooner and before we will see significant return on these hard-coded rules.
Author
Owner

The route you highlighted isn't a "legacy" route, that's just the unstable path for /_matrix/client/v1/room_summary/{roomId}, presumably just missing a route definition in ruwuma

I would rather see a redundant configuration value (as silly as it feels) than have hard-coded strings (cough tech debt) living throughout the code.

I disagree with the notion of "tech debt" here - redundant configuration leaves room for error due to additional moving parts, and there's very little "tech debt" when this is basically just an append-only array of strings, which ideally won't even need updating all that often.

Based on my federation metrics, very few people even adopt the latest version

This is primarily targeting new deployments, not existing ones - and nobody in their right mind is deploying outdated versions. Also, 0.5.6 has been out less than a week. People often have auto-updates set up, which means there will probably be a delay in doing so.

I think we can do a much better job investigating this and come up with a long-term solution, sooner and before we will see significant return on these hard-coded rules.

I'll happily accept alternatives later down the line in another PR or whatever, but right now we need something now, and the team lacks the time to investigate this more than "we know what the issue is". Given it's taking down even huge servers that aren't even continuwuity, I fear this may not be something we can "fix".

The route you highlighted isn't a "legacy" route, that's just the unstable path for `/_matrix/client/v1/room_summary/{roomId}`, presumably just missing a route definition in ruwuma > I would rather see a redundant configuration value (as silly as it feels) than have hard-coded strings (cough tech debt) living throughout the code. I disagree with the notion of "tech debt" here - redundant configuration leaves room for error due to additional moving parts, and there's very little "tech debt" when this is basically just an append-only array of strings, which ideally won't even need updating all that often. > Based on my federation metrics, very few people even adopt the latest version This is primarily targeting new deployments, not existing ones - and nobody in their right mind is deploying outdated versions. Also, 0.5.6 has been out less than a week. People often have auto-updates set up, which means there will probably be a delay in doing so. > I think we can do a much better job investigating this and come up with a long-term solution, sooner and before we will see significant return on these hard-coded rules. I'll happily accept alternatives later down the line in another PR or whatever, but right now we need something *now*, and the team lacks the time to investigate this more than "we know what the issue is". Given it's taking down even huge servers that aren't even continuwuity, I fear this may not be something we can "fix".
Contributor

I see upwards of half of peers are not even on 0.5.5 yet.

I'm not explaining myself the best here, ugh. I'm going to log in my nightly account and explain some points in the morning hopefully.

I'm okay if this gets merged as is, but I would rather not have you depend on others submitting "PRs down the line." The original author really needs to be the one most committed to following up with a better implementation.

I see upwards of half of peers are not even on 0.5.5 yet. I'm not explaining myself the best here, ugh. I'm going to log in my nightly account and explain some points in the morning hopefully. I'm okay if this gets merged as is, but I would rather not have you depend on others submitting "PRs down the line." The original author really needs to be the one most committed to following up with a better implementation.
nex marked this conversation as resolved
@ -61,0 +67,4 @@
"!MBrxZRUoApYYjmyion:t2bot.io", // Old t2bot room - insane auth chain depths
"izahlpcyIDeymNjiOd:matrix.debian.social", // #debian-next:matrix.debian.social
"!mefQhZzgTaxNCNzAeK:kde.org", // KDE user help
"!OTxETzuhBDbnPqBqbP:kde.org", // KDE space
Contributor

oh hell yeah homie, i'm gonna join them all on my nightly account 🙌 LFG keep it maintained

oh hell yeah homie, i'm gonna join them all on my nightly account 🙌 LFG keep it maintained
nex marked this conversation as resolved
gamesguru left a comment
Contributor

Please avoid hard-coding configuration values in production rust code. Brainstorm an approach which uses dynamic configuration if possible.

Please avoid hard-coding configuration values in production rust code. Brainstorm an approach which uses dynamic configuration if possible.
Contributor

also, please request a complement test report from me. my PR is not yet merged, but i can get you test results in 15-20 minutes. This looks relatively harmless from a regression standpoint, but i want to start testing our code more please

also, please request a complement test report from me. my PR is not yet merged, but i can get you test results in 15-20 minutes. This looks relatively harmless from a regression standpoint, but i want to start testing our code more please
ginger approved these changes 2026-03-08 12:04:26 +00:00
Author
Owner

There's no reason to run complement against this

There's no reason to run complement against this
fix: Missing sigil
All checks were successful
Documentation / Build and Deploy Documentation (pull_request) Successful in 1m11s
Checks / Prek / Pre-commit & Formatting (pull_request) Successful in 2m37s
Update flake hashes / update-flake-hashes (pull_request) Successful in 47s
Checks / Prek / Clippy and Cargo Tests (pull_request) Successful in 11m53s
4951d6c7b9
@ -61,0 +62,4 @@
"!iMZEhwCvbfeAYUxAjZ:t2l.io", // Matrix community space - insanely broken state
"!OGEhHVWSdvArJzumhm:matrix.org", // Old Matrix HQ - huge room, very broken
"!IemiTbwVankHTFiEoh:matrix.org", // Old Element Web - huge room, very broken
"!brXHJeAtqliwNGqHQx:lossy.network", // NixOS space - frequent bug reports, huge state
Contributor

IDK, I haven't had issues with the NixOS space, and I'd prefer to be able to use it. IMO either only ban the really really bad ones or make the banned room list configurable. I'd prefer to not need to patch the code for my builds.

IDK, I haven't had issues with the NixOS space, and I'd prefer to be able to use it. IMO either only ban the really really bad ones or make the banned room list configurable. I'd prefer to not need to patch the code for my builds.
Contributor

Update to my previous reply: nevermind, I am not exactly sure how or why this is the actual NixOS space, but I was wrong. I don't think this should be banned either.

Update to my previous reply: nevermind, I am not exactly sure how or why this is the actual NixOS space, but I was wrong. I don't think this should be banned either.
First-time contributor

Considering the issue is likely to persist, I think it's desirable to have the list being configurable instead of hardcoded.

Considering the issue is likely to persist, I think it's desirable to have the list being configurable instead of hardcoded.
Contributor

It's hard to see an outcome that everyone is happy with. The point of the PR is to have hard-coded maintainer-curated rooms banned by default. We don't have issues with one of them, but nex wouldn't put it here for no reason, so I assume lots of people have had issues with it. The best solution to this I can see is to make the banned rooms accessible from configuration, not just admin commands, and set the list from this PR as its default.

Now, that could introduce confusing behaviour, since there will be two separate points of configuration via the configuration and admin commands. My suggestion for that would be to make the configuration something like bootstrap_banned_rooms, so the list pre-fills the banned rooms list the first time the server starts and it's admin commands from there onwards. Either that, or you could cut off the admin command access when the value is set in config, but I think the first option makes more sense.

At the end of the day, it's up to the maintainers.

It's hard to see an outcome that everyone is happy with. The point of the PR is to have hard-coded maintainer-curated rooms banned by default. We don't have issues with one of them, but nex wouldn't put it here for no reason, so I assume lots of people have had issues with it. The best solution to this I can see is to make the banned rooms accessible from configuration, not just admin commands, and set the list from this PR as its default. Now, that could introduce confusing behaviour, since there will be two separate points of configuration via the configuration and admin commands. My suggestion for that would be to make the configuration something like `bootstrap_banned_rooms`, so the list pre-fills the banned rooms list the first time the server starts and it's admin commands from there onwards. Either that, or you could cut off the admin command access when the value is set in config, but I think the first option makes more sense. At the end of the day, it's up to the maintainers.
Owner

This all feels like over-complication and an introduction point for confusing behaviour and unhelpful error messages. If users want to take over this list, they can disable it and ban the rooms themselves.

This all feels like over-complication and an introduction point for confusing behaviour and unhelpful error messages. If users want to take over this list, they can disable it and ban the rooms themselves.
First-time contributor

[...] If users want to take over this list, they can disable it and ban the rooms themselves.

Such approach I believe wouldn't account for auto_deactivate_banned_room_attempts, the proposed mechanism is much earlier nope out that could've been used for "soft" banning complex rooms as well. In any case, I don't have a strong opinion about this PR from the server operator's point of view, I believe the maintainers could make the proper judgement.

> [...] If users want to take over this list, they can disable it and ban the rooms themselves. Such approach I believe wouldn't account for `auto_deactivate_banned_room_attempts`, the proposed mechanism is much earlier nope out that could've been used for "soft" banning complex rooms as well. In any case, I don't have a strong opinion about this PR from the server operator's point of view, I believe the maintainers could make the proper judgement.
Contributor

I would like more details of performance issues attached in the future. Unless you offer a better impl, it's hard to judge where the code's old sticking points were.

Whether that's a heatmap SVG visual or just some explanation or adding me to a group chat.

But words like "insanely broken" are inherently vague. On my stateless sync branch, I was able to fully load the 28K members (very shocking the Cinny and Element UI can actually handle lists that long) by adjusting state resolution logic regarding outlier PDUs.

I am left questioning whether the state of the room, or our resolution logic is the more broken thing.

Regardless, I think I'm not alone in this fugue, and we would all benefit immensely from more diagnostic details better outlining the exact root cause of the performance degredations. That would better put us in position to understand the PR and what it means/does.

I would like more details of performance issues attached in the future. Unless you offer a better impl, it's hard to judge where the code's old sticking points were. Whether that's a heatmap SVG visual or just some explanation or adding me to a group chat. But words like "insanely broken" are inherently vague. On my stateless sync branch, I was able to fully load the 28K members (very shocking the Cinny and Element UI can actually handle lists that long) by adjusting state resolution logic regarding outlier PDUs. I am left questioning whether the state of the room, or our resolution logic is the more broken thing. Regardless, I think I'm not alone in this fugue, and we would all benefit immensely from more diagnostic details better outlining the exact root cause of the performance degredations. That would better put us in position to understand the PR and what it means/does.
Contributor

@gamesguru wrote in #1503 (comment):

I would like more details of performance issues attached in the future. Unless you offer a better impl, it's hard to judge where the code's old sticking points were.

Whether that's a heatmap SVG visual or just some explanation or adding me to a group chat.

But words like "insanely broken" are inherently vague. On my stateless sync branch, I was able to fully load the 28K members (very shocking the Cinny and Element UI can actually handle lists that long) by adjusting state resolution logic regarding outlier PDUs.

I am left questioning whether the state of the room, or our resolution logic is the more broken thing.

Regardless, I think I'm not alone in this fugue, and we would all benefit immensely from more diagnostic details better outlining the exact root cause of the performance degredations. That would better put us in position to understand the PR and what it means/does.

I mean, I think this PR is ultimately helpful, its known that these rooms cause a lot of issues, check your processing PDUs and you can often see PDUs from these rooms taking multiple minutes, and using up a lot of CPU. I just would like the list to be configurable, maybe. But, I guess Jade is right, and an admin can deactivate this measure and ban the rooms manually.

@gamesguru wrote in https://forgejo.ellis.link/continuwuation/continuwuity/pulls/1503#issuecomment-26254: > I would like more details of performance issues attached in the future. Unless you offer a better impl, it's hard to judge where the code's old sticking points were. > > Whether that's a heatmap SVG visual or just some explanation or adding me to a group chat. > > But words like "insanely broken" are inherently vague. On my stateless sync branch, I was able to fully load the 28K members (very shocking the Cinny and Element UI can actually handle lists that long) by adjusting state resolution logic regarding outlier PDUs. > > I am left questioning whether the state of the room, or our resolution logic is the more broken thing. > > Regardless, I think I'm not alone in this fugue, and we would all benefit immensely from more diagnostic details better outlining the exact root cause of the performance degredations. That would better put us in position to understand the PR and what it means/does. I mean, I think this PR is ultimately helpful, its known that these rooms cause a lot of issues, check your processing PDUs and you can often see PDUs from these rooms taking multiple minutes, and using up a lot of CPU. I just would like the list to be configurable, maybe. But, I guess Jade is right, and an admin can deactivate this measure and ban the rooms manually.
All checks were successful
Documentation / Build and Deploy Documentation (pull_request) Successful in 1m11s
Checks / Prek / Pre-commit & Formatting (pull_request) Successful in 2m37s
Required
Details
Update flake hashes / update-flake-hashes (pull_request) Successful in 47s
Checks / Prek / Clippy and Cargo Tests (pull_request) Successful in 11m53s
Required
Details
This pull request is blocked because it's outdated.
This branch is out-of-date with the base branch
You are not authorized to merge this pull request.
View command line instructions

Checkout

From your project repository, check out a new branch and test the changes.
git fetch -u origin nex/feat/block-busted-rooms:nex/feat/block-busted-rooms
git switch nex/feat/block-busted-rooms
Sign in to join this conversation.
No milestone
No project
No assignees
7 participants
Notifications
Due date
The due date is invalid or out of range. Please use the format "yyyy-mm-dd".

No due date set.

Dependencies

No dependencies set.

Reference
continuwuation/continuwuity!1503
No description provided.