calling policy servers #857

Open
nex wants to merge 10 commits from nex/serving-policy into main
Owner

Allows us to call MSC4284 policy servers to determine if an event should be soft failed or not.

  • Call policy server on incoming event
  • Don't send redactions for spam events to clients
  • Prevent sending events that fail the policy checks
Allows us to call MSC4284 policy servers to determine if an event should be soft failed or not. - [X] Call policy server on incoming event - [x] Don't send redactions for spam events to clients - [x] Prevent sending events that fail the policy checks
nex added the
Enhancement
Matrix/Federation
Matrix/MSC
Priority
Low
labels 2025-06-20 00:55:45 +00:00
nex self-assigned this 2025-06-20 00:55:45 +00:00
nex added 6 commits 2025-06-20 00:55:45 +00:00
(cherry picked from commit e1c06e10f6)
(cherry picked from commit a8c10f6317)
(cherry picked from commit 3b876ea029b7ec44db143ead31e1696eb4c44b5e)
Need to figure out why signing is sad

(cherry picked from commit 6134b971633c9981871c1c9d060a677588714141)
(cherry picked from commit f0ff0e4599c3db8786693e43691b164bb0b7e3c3)
no!
Some checks failed
Release Docker Image / merge (push) Blocked by required conditions
Release Docker Image / define-variables (push) Successful in 2s
Rust Checks / Format (push) Failing after 55s
Rust Checks / Clippy (push) Failing after 4m5s
Documentation / Build and Deploy Documentation (pull_request) Successful in 36s
Rust Checks / Cargo Test (push) Successful in 5m55s
Release Docker Image / build-image (linux/amd64, linux-amd64) (push) Has been cancelled
Release Docker Image / build-image (linux/arm64, linux-arm64) (push) Has been cancelled
124de732e4
(cherry picked from commit a3d8e222a2d3a45ccef6142a55f7ff8a8a265ad4)
nex added 1 commit 2025-06-20 01:01:30 +00:00
Run cargo fmt
Some checks failed
Release Docker Image / define-variables (push) Successful in 3s
Rust Checks / Format (push) Successful in 1m4s
Rust Checks / Clippy (push) Failing after 2m30s
Documentation / Build and Deploy Documentation (pull_request) Successful in 37s
Release Docker Image / build-image (linux/arm64, linux-arm64) (push) Successful in 10m27s
Rust Checks / Cargo Test (push) Failing after 16m1s
Release Docker Image / build-image (linux/amd64, linux-amd64) (push) Successful in 10m52s
Release Docker Image / merge (push) Successful in 24s
67cfd5fd99
Author
Owner

There's some artifacts of some other branches in here, I'll want to filter those out before merge. Truly am a git mastermind

There's some artifacts of some other branches in here, I'll want to filter those out before merge. Truly am a git mastermind
Jade requested changes 2025-06-20 09:51:55 +00:00
Jade left a comment
Owner

It looks mostly good aside from what happened overnight in offtopic. If you want I can handle some of these changes for you.

It looks mostly good aside from what happened overnight in offtopic. If you want I can handle some of these changes for you.
@ -0,0 +59,4 @@
return Ok(());
},
};
if response.recommendation == "spam" {
Owner

Are there any other recommendations we should handle there? Or is it just spam so far?

Are there any other recommendations we should handle there? Or is it just spam so far?
Author
Owner

The MSC says spam and ok are the only two defined responses, however namespaced ones are possible. Synapse only actions if the recommendation is spam, so we do the same

The MSC says `spam` and `ok` are the only two defined responses, however namespaced ones are possible. Synapse only actions if the recommendation is `spam`, so we do the same
Jade marked this conversation as resolved
@ -45,3 +45,3 @@
}
debug!("Upgrading to timeline pdu");
debug!("Upgrading pdu {} from outlier to timeline pdu", incoming_pdu.event_id);
Owner

Use structured logging parameters at the start of the log macro rather than interpolating the dynamic value into the string.
Could probably add this as a code style guide.

Use structured logging parameters at the start of the log macro rather than interpolating the dynamic value into the string. Could probably add this as a code style guide.
Author
Owner

Could probably add this as a code style guide.

Would be a good idea given I've been doing this everywhere else too lol

> Could probably add this as a code style guide. Would be a good idea given I've been doing this everywhere else too lol
nex marked this conversation as resolved
@ -217,1 +219,4 @@
// 14-pre. If the event is not a state event, ask the policy server about it
if incoming_pdu.state_key.is_none()
&& incoming_pdu.sender().server_name() != self.services.globals.server_name()
Owner

Aren't we supposed to use the policy server to check our own events?

Aren't we supposed to use the policy server to check our own events?
Author
Owner

We aren't supposed to send federation requests to ourselves (or at least Synapse said they didn't do this, intentionally, but that was probably something to do with "unexpected behaviours")

We aren't supposed to send federation requests to ourselves (or at least Synapse said they didn't do this, intentionally, but that was probably something to do with "unexpected behaviours")
Owner

That's not what this is doing though, you're checking the sender of the event against our server name.
To avoid sending federation requests to ourselves, compare in src/service/rooms/event_handler/call_policyserv.rs:25 the via vs our server name.

That's not what this is doing though, you're checking the sender of the event against our server name. To avoid sending federation requests to ourselves, compare in src/service/rooms/event_handler/call_policyserv.rs:25 the via vs our server name.
Author
Owner

call_policyserv will be used to check ourselves later, so I can't put the check there. Upgrading an outlier PDU should only happen for remote events anyway afaik, so comparing the sender's homeserver to ours before calling the check is just a guard rail

call_policyserv will be used to check ourselves later, so I can't put the check there. Upgrading an outlier PDU should only happen for remote events anyway afaik, so comparing the sender's homeserver to ours before calling the check is just a guard rail
Owner

That's the function that has the federation request, so the guard should be in that function anyway.

call_policyserv will be used to check ourselves later, so I can't put the check there.

Are we misunderstanding each other?

  • Event sender
  • Policy server
  • Our server

Policy server != our server as self-federation isn't allowed
Current check is event sender != our server - but that check should never be false as upgrading an outlier PDU should only happen for remote events ??

That's the function that has the federation request, so the guard *should* be in that function anyway. > call_policyserv will be used to check ourselves later, so I can't put the check there. Are we misunderstanding each other? - Event sender - Policy server - Our server Policy server != our server as self-federation isn't allowed Current check is event sender != our server - but that check should never be false as upgrading an outlier PDU should only happen for remote events ??
Author
Owner

I haven't checked, but are you sure our own events can never be outliers? I'm inclined to keep the check just as a safety net since it doesn't exactly cost anything to compare, but if you're sure we've never got own-outliers I can remove it

I haven't checked, but are you sure our own events can never be outliers? I'm inclined to keep the check just as a safety net since it doesn't exactly cost anything to compare, but if you're sure we've never got own-outliers I can remove it
nex marked this conversation as resolved
@ -218,0 +222,4 @@
&& incoming_pdu.sender().server_name() != self.services.globals.server_name()
{
debug!("Checking policy server for event {}", incoming_pdu.event_id);
let policy = self.policyserv_check(&incoming_pdu, room_id);
Owner

We're missing a timeout here

We're missing a timeout here
nex marked this conversation as resolved
@ -218,0 +223,4 @@
{
debug!("Checking policy server for event {}", incoming_pdu.event_id);
let policy = self.policyserv_check(&incoming_pdu, room_id);
if let Err(e) = policy.await {
Owner

Can we send off the policy server check at an earlier point so that it's done in parallel to another check? Might require moving a part of this function into its own async function that can be async-joined

Can we send off the policy server check at an earlier point so that it's done in parallel to another check? Might require moving a part of this function into its own async function that can be async-joined
Author
Owner

not really sure if doing it in parallel is a good use of resources - if an event fails the auth checks because of its own auth events, there's no poing making an additional network request to see if it's spam, for example. It only needs to call the policy server if it'll potentially be sent to the client, so if it's not going to be sent to the client there's not much point checking.
Unless you think it's worth it anyway? I'm not actually sure how heavy the cost of checking is yet

not really sure if doing it in parallel is a good use of resources - if an event fails the auth checks because of its own auth events, there's no poing making an additional network request to see if it's spam, for example. It only needs to call the policy server if it'll potentially be sent to the client, so if it's not going to be sent to the client there's not much point checking. Unless you think it's worth it anyway? I'm not actually sure how heavy the cost of checking is yet
Owner

My intuition is that it is time-slow but resource-cheap, so we want to start it as early as possible. However rooting through the code here again, during normal operation the only thing in there that would be slow is state events, and this doesn't check those anyway, so ignore this for now

My intuition is that it is time-slow but resource-cheap, so we want to start it as early as possible. However rooting through the code here again, during normal operation the only thing in there that would be slow is state events, and this doesn't check those anyway, so ignore this for now
Author
Owner

State events should be checked eventually, but I want to battle test it on timeline events before I start soft-failing state events, since the latter is quite dangerous in comparison

State events *should* be checked eventually, but I want to battle test it on timeline events before I start soft-failing state events, since the latter is quite dangerous in comparison
nex marked this conversation as resolved
@ -701,0 +701,4 @@
if state_key.is_none() {
if prev_events.is_empty() {
warn!("Timeline event had zero prev_events, something broke.");
return Err!(Request(Unknown("Timeline event had zero prev_events.")));
Owner

Does this block belong in this PR? Should probably be tested separately

Does this block belong in this PR? Should probably be tested separately
Author
Owner

No, this is from the fix-create-auth branch, got caught up in the cherry pick

No, this is from the fix-create-auth branch, got caught up in the cherry pick
nex marked this conversation as resolved
Author
Owner

Note, the aforementioned offtopic explosion was a Meowlnir bug, not bug with this impl.

Note, the aforementioned offtopic explosion was a Meowlnir bug, not bug with this impl.
nex force-pushed nex/serving-policy from 67cfd5fd99 to 5ec8e378c2 2025-06-30 16:17:06 +00:00 Compare
nex force-pushed nex/serving-policy from 5ec8e378c2 to ce94e241b3 2025-07-19 19:23:35 +00:00 Compare
nex added 2 commits 2025-07-19 19:34:47 +00:00
# Conflicts:
#	src/service/rooms/event_handler/upgrade_outlier_pdu.rs
#	src/service/rooms/timeline/mod.rs
chore: Update ruwuma & fix lints
All checks were successful
Checks / Prefligit / prefligit (push) Successful in 32s
Release Docker Image / define-variables (push) Successful in 4s
Checks / Rust / Format (push) Successful in 34s
Documentation / Build and Deploy Documentation (pull_request) Successful in 33s
Checks / Prefligit / prefligit (pull_request) Successful in 19s
Checks / Rust / Clippy (push) Successful in 3m50s
Checks / Rust / Cargo Test (push) Successful in 6m11s
Release Docker Image / build-image (linux/amd64, release, linux-amd64, base) (push) Successful in 14m40s
Release Docker Image / build-image (linux/arm64, release, linux-arm64, base) (push) Successful in 14m0s
Release Docker Image / merge (push) Successful in 36s
ca263af321
nex added 4 commits 2025-07-19 20:09:37 +00:00
feat(policy-server): Prevent local events that fail the policy check
All checks were successful
Release Docker Image / define-variables (push) Successful in 3s
Checks / Prefligit / prefligit (push) Successful in 44s
Checks / Rust / Format (push) Successful in 44s
Documentation / Build and Deploy Documentation (pull_request) Successful in 48s
Checks / Prefligit / prefligit (pull_request) Successful in 27s
Checks / Rust / Clippy (push) Successful in 3m57s
Checks / Rust / Cargo Test (push) Successful in 4m40s
Release Docker Image / build-image (linux/amd64, release, linux-amd64, base) (push) Successful in 12m13s
Release Docker Image / build-image (linux/arm64, release, linux-arm64, base) (push) Successful in 13m51s
Release Docker Image / merge (push) Successful in 31s
d4049a79bf
nex added 1 commit 2025-07-19 21:07:27 +00:00
feat(policy-server): Limit policy server request timeout to 10 seconds
All checks were successful
Release Docker Image / define-variables (push) Successful in 13s
Checks / Prefligit / prefligit (push) Successful in 16s
Checks / Prefligit / prefligit (pull_request) Successful in 15s
Checks / Rust / Format (push) Successful in 44s
Documentation / Build and Deploy Documentation (pull_request) Successful in 39s
Checks / Rust / Clippy (push) Successful in 3m4s
Checks / Rust / Cargo Test (push) Successful in 5m10s
Release Docker Image / build-image (linux/amd64, release, linux-amd64, base) (push) Successful in 13m16s
Release Docker Image / build-image (linux/arm64, release, linux-arm64, base) (push) Successful in 13m3s
Release Docker Image / merge (push) Successful in 17s
5ebe8cd8dd
nex added 1 commit 2025-07-19 22:50:41 +00:00
feat(policy-server): Optimise policy server lookups
Some checks failed
Release Docker Image / merge (push) Blocked by required conditions
Release Docker Image / define-variables (push) Successful in 13s
Checks / Prefligit / prefligit (push) Successful in 15s
Checks / Prefligit / prefligit (pull_request) Successful in 34s
Documentation / Build and Deploy Documentation (pull_request) Successful in 42s
Checks / Rust / Format (push) Successful in 55s
Checks / Rust / Cargo Test (push) Has been cancelled
Release Docker Image / build-image (linux/arm64, release, linux-arm64, base) (push) Has been cancelled
Release Docker Image / build-image (linux/amd64, release, linux-amd64, base) (push) Has been cancelled
Checks / Rust / Clippy (push) Has been cancelled
cb4a7b5b2e
nex force-pushed nex/serving-policy from cb4a7b5b2e to 977fddf4c5 2025-07-19 22:52:10 +00:00 Compare
nex added 1 commit 2025-07-19 22:54:13 +00:00
style(policy-server): Run clippy
All checks were successful
Release Docker Image / define-variables (push) Successful in 3s
Checks / Prefligit / prefligit (push) Successful in 17s
Checks / Rust / Format (push) Successful in 39s
Checks / Prefligit / prefligit (pull_request) Successful in 26s
Documentation / Build and Deploy Documentation (pull_request) Successful in 44s
Checks / Rust / Clippy (push) Successful in 3m48s
Checks / Rust / Cargo Test (push) Successful in 4m18s
Release Docker Image / build-image (linux/amd64, release, linux-amd64, base) (push) Successful in 10m31s
Release Docker Image / build-image (linux/arm64, release, linux-arm64, base) (push) Successful in 20m31s
Release Docker Image / merge (push) Successful in 34s
5454c22b5b
nex changed title from WIP: calling policy servers to calling policy servers 2025-07-19 22:54:34 +00:00
requested reviews from Jade, Owners 2025-07-19 22:54:36 +00:00
Jade added 1 commit 2025-07-20 00:03:24 +00:00
style: Improve logging and comments
All checks were successful
Release Docker Image / define-variables (push) Successful in 2s
Checks / Prefligit / prefligit (push) Successful in 32s
Checks / Rust / Format (push) Successful in 43s
Checks / Prefligit / prefligit (pull_request) Successful in 28s
Documentation / Build and Deploy Documentation (pull_request) Successful in 47s
Checks / Rust / Clippy (push) Successful in 3m41s
Checks / Rust / Cargo Test (push) Successful in 4m21s
Release Docker Image / build-image (linux/amd64, release, linux-amd64, base) (push) Successful in 10m25s
Release Docker Image / build-image (linux/arm64, release, linux-arm64, base) (push) Successful in 20m16s
Release Docker Image / merge (push) Successful in 43s
4f90379ac1
Jade approved these changes 2025-07-20 00:04:05 +00:00
@ -0,0 +10,4 @@
/// Returns Ok if the policy server allows the event
#[implement(super::Service)]
#[tracing::instrument(skip_all, level = "debug")]
pub async fn policyserv_check(&self, pdu: &PduEvent, room_id: &RoomId) -> Result {
Owner

I get the feeling that this should actually be a result(bool) - fail due to an error is different from rejection, and I don't think we're supposed to fail closed? For clarity at least.

I get the feeling that this should actually be a result(bool) - fail due to an error is different from rejection, and I don't think we're supposed to fail closed? For clarity at least.
Owner

And a comment more like

/// Checks with the room's policy server (if configured) whether the given event should
/// be allowed in the room.
///
/// Returns `Ok(())` if the event is permitted, or an error if the event is
/// denied as spam. Skips the check for policy server meta-events or if no
/// policy server is configured for the room.
And a comment more like ``` /// Checks with the room's policy server (if configured) whether the given event should /// be allowed in the room. /// /// Returns `Ok(())` if the event is permitted, or an error if the event is /// denied as spam. Skips the check for policy server meta-events or if no /// policy server is configured for the room. ```
@ -0,0 +25,4 @@
return Ok(());
};
let via = match policyserver.via {
Owner

via? This is not a via server for routing!

via? This is not a via server for routing!
Jade force-pushed nex/serving-policy from 4f90379ac1 to dd3ace92e9 2025-07-20 20:01:47 +00:00 Compare
Some checks failed
Checks / Prefligit / prefligit (push) Successful in 16s
Required
Details
Release Docker Image / define-variables (push) Successful in 4s
Checks / Rust / Format (push) Successful in 51s
Required
Details
Checks / Prefligit / prefligit (pull_request) Successful in 33s
Required
Details
Documentation / Build and Deploy Documentation (pull_request) Successful in 1m6s
Checks / Rust / Cargo Test (push) Successful in 7m55s
Required
Details
Checks / Rust / Clippy (push) Failing after 12m45s
Required
Details
Release Docker Image / build-image (linux/amd64, release, linux-amd64, base) (push) Successful in 17m1s
Release Docker Image / build-image (linux/arm64, release, linux-arm64, base) (push) Successful in 17m41s
Release Docker Image / merge (push) Successful in 26s
This pull request can be merged automatically.
You are not authorized to merge this pull request.
View command line instructions

Checkout

From your project repository, check out a new branch and test the changes.
git fetch -u origin nex/serving-policy:nex/serving-policy
git checkout nex/serving-policy
Sign in to join this conversation.
No reviewers
No milestone
No project
No assignees
2 participants
Notifications
Due date
The due date is invalid or out of range. Please use the format "yyyy-mm-dd".

No due date set.

Dependencies

No dependencies set.

Reference: continuwuation/continuwuity#857
No description provided.