Fix state res v2.1 edge case #1075

Merged
Jade merged 1 commit from nex/fix/incoming-fetch into main 2025-09-25 08:14:45 +00:00
Owner

Previously, continuwuity would simply ignore unknown auth events. With the introduction of state res v2.1, which requires the conflicted subgraph, this is no longer an option.
This PR fixes an edge case where a previous auth event is unknown, is not backfilled in time, and is then depended upon for state resolution. When that happens, incoming state cannot be resolved, which causes further auth events to fail and the room to gradually fall behind. This PR instead refuses to de-outlier incoming events that lack auth events: it does not fully trust the caller and always fetches events that may be missing. As a result, all required auth events are present more consistently, which fixes the root problem causing state resolution panics.
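
To illustrate the idea (this is not the code from this PR), here is a minimal sketch of "always check for missing auth events, try to fetch them, and refuse to accept the event if any are still missing". Every name in it (`Event`, `Store`, `Outcome`, `handle_incoming`, and the `fetch` callback) is hypothetical and heavily simplified; none of it is continuwuity's actual API.

```rust
use std::collections::{HashMap, HashSet};

/// Hypothetical, simplified event: only the fields this sketch needs.
#[derive(Clone, Debug)]
struct Event {
    id: String,
    auth_events: Vec<String>,
}

/// Hypothetical local store of events we already hold.
#[derive(Default)]
struct Store {
    events: HashMap<String, Event>,
}

impl Store {
    /// IDs of auth events referenced by `event` that are not stored locally.
    fn missing_auth_events(&self, event: &Event) -> Vec<String> {
        event
            .auth_events
            .iter()
            .filter(|id| !self.events.contains_key(*id))
            .cloned()
            .collect()
    }
}

#[derive(Debug, PartialEq)]
enum Outcome {
    /// All auth events were available; the event was stored.
    Accepted,
    /// Some auth events could not be obtained; the event was NOT stored.
    MissingAuthEvents(Vec<String>),
}

/// Handle an incoming event without trusting the caller to have supplied
/// its auth events. `fetch` stands in for a federation request (assumption).
fn handle_incoming<F>(store: &mut Store, event: Event, mut fetch: F) -> Outcome
where
    F: FnMut(&str) -> Option<Event>,
{
    // Always compute what is missing ourselves.
    let mut queue = store.missing_auth_events(&event);
    let mut attempted: HashSet<String> = HashSet::new();

    while let Some(id) = queue.pop() {
        // Skip IDs we already hold or have already tried to fetch.
        if store.events.contains_key(&id) || !attempted.insert(id.clone()) {
            continue;
        }
        if let Some(fetched) = fetch(id.as_str()) {
            // A fetched auth event may reference further unknown auth events.
            // Simplification: a real server would also validate `fetched`.
            queue.extend(store.missing_auth_events(&fetched));
            store.events.insert(id, fetched);
        }
    }

    // Refuse to accept the event if any direct auth event is still unknown.
    let still_missing = store.missing_auth_events(&event);
    if !still_missing.is_empty() {
        return Outcome::MissingAuthEvents(still_missing);
    }
    store.events.insert(event.id.clone(), event);
    Outcome::Accepted
}
```

In this sketch the refusal is an explicit `Outcome::MissingAuthEvents` value, so the caller can retry later instead of feeding an incomplete auth chain into state resolution.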

fixes this problem (log)
2025-09-25T01:20:20.241484Z  WARN conduwuit_service::rooms::event_handler::handle_incoming_pdu: Prev $wSLQPE_tDWsVPlHQo1Rk0dMb7xCQWTBQqdRA5m-N0L4 failed: M_INVALID_PARAM: Event has been soft failed
  2025-09-25T01:20:20.243201Z ERROR conduwuit_service::rooms::event_handler::resolve_state: State resolution failed: InvalidPdu("Failed to calculate conflicted subgraph")
    at src/service/rooms/event_handler/resolve_state.rs:116 on conduwuit:worker ThreadId(2)
  2025-09-25T01:20:44.789993Z ERROR conduwuit_service::rooms::event_handler::resolve_state: State resolution failed: InvalidPdu("Failed to calculate conflicted subgraph")
    at src/service/rooms/event_handler/resolve_state.rs:116 on conduwuit:worker ThreadId(3)
  2025-09-25T01:21:12.938899Z ERROR conduwuit_service::rooms::event_handler::resolve_state: State resolution failed: InvalidPdu("Failed to calculate conflicted subgraph")
    at src/service/rooms/event_handler/resolve_state.rs:116 on conduwuit:worker ThreadId(3)
2025-09-25T01:21:12.938924Z  WARN conduwuit_service::rooms::event_handler::handle_incoming_pdu: Prev $dl3dYf52iz2etcXPM5yCU0bpXJBgcqcQDDbVx3JecPo failed: State resolution failed: InvalidPdu("Failed to calculate conflicted subgraph")
2025-09-25T01:22:43.209662Z  WARN conduwuit_service::sending::appservice: Could not send request to appservice "meowlnir" at http://127.0.0.1:29339: reqwest::Error { kind: Request, url: "http://127.0.0.1:29339/_matrix/app/v1/transactions/UeRImFD0Y4R-ZFK4dotDcGKd55NVQB1dGzE18PGp3GE?access_token=fa11e970c12cacf584818a9db6458fae1d6fea7d1296d33bee5eabc98a8dcf28", source: hyper_util::client::legacy::Error(Connect, ConnectError("tcp connect error", Os { code: 111, kind: ConnectionRefused, message: "Connection refused" })) }
2025-09-25T01:22:43.222702Z  INFO create_join_event{room_id="!OsgPt00PZet9BvF2v0:nexy7574.co.uk" origin="nyxt.dev"}: conduwuit_api::server::send_join: Sending join event to other servers fast_join=false
2025-09-25T01:22:43.222843Z  INFO conduwuit_api::server::send_join: Finished sending a join for nyxt.dev in !OsgPt00PZet9BvF2v0:nexy7574.co.uk in 66.547632ms
  2025-09-25T01:23:21.994417Z ERROR conduwuit_service::rooms::event_handler::resolve_state: State resolution failed: InvalidPdu("Failed to calculate conflicted subgraph")
    at src/service/rooms/event_handler/resolve_state.rs:116 on conduwuit:worker ThreadId(4)
  2025-09-25T01:23:48.619531Z ERROR conduwuit_service::rooms::event_handler::resolve_state: State resolution failed: InvalidPdu("Failed to calculate conflicted subgraph")
    at src/service/rooms/event_handler/resolve_state.rs:116 on conduwuit:worker ThreadId(4)
2025-09-25T01:23:48.619561Z  WARN conduwuit_service::rooms::event_handler::handle_incoming_pdu: Prev $LhC3uIclOBuRY9rqvmzhV8OUB9b7QE_V76Gx-5SD1qc failed: State resolution failed: InvalidPdu("Failed to calculate conflicted subgraph")
2025-09-25T01:23:48.765620Z  WARN conduwuit_core::matrix::state_res::event_auth: sender's membership is not join
2025-09-25T01:23:48.765885Z  INFO conduwuit_service::rooms::event_handler::upgrade_outlier_pdu: Soft failing event event_id=$dOKg_U9LxyKFUP67kX6REdCwMtJIBNihB7rUX9_76qQ
2025-09-25T01:23:48.776317Z  WARN conduwuit_service::rooms::event_handler::upgrade_outlier_pdu: Event was soft failed event_id=$dOKg_U9LxyKFUP67kX6REdCwMtJIBNihB7rUX9_76qQ
2025-09-25T01:24:37.899182Z  WARN conduwuit_service::rooms::event_handler::handle_incoming_pdu: Prev $dOKg_U9LxyKFUP67kX6REdCwMtJIBNihB7rUX9_76qQ failed: M_INVALID_PARAM: Event has been soft failed
nex added this to the 0.5.0 milestone 2025-09-25 01:58:11 +00:00
nex self-assigned this 2025-09-25 01:58:11 +00:00
fix(stateres): Correctly fetch missing auth events for incoming PDUs
Some checks failed
Documentation / Build and Deploy Documentation (pull_request) Successful in 57s
Checks / Prek / Pre-commit & Formatting (pull_request) Successful in 2m6s
Release Docker Image / Build linux-amd64 (release) (pull_request) Successful in 6m21s
Release Docker Image / Build linux-arm64 (release) (pull_request) Successful in 8m6s
Checks / Prek / Clippy and Cargo Tests (pull_request) Successful in 11m31s
Release Docker Image / Create Multi-arch Release Manifest (pull_request) Successful in 12s
Release Docker Image / Build linux-arm64 (max-perf) (pull_request) Successful in 13m31s
Release Docker Image / Build linux-amd64 (max-perf) (pull_request) Successful in 13m45s
Release Docker Image / Create Max-Perf Manifest (pull_request) Successful in 11s
Checks / Prek / Clippy and Cargo Tests (push) Has been cancelled
Checks / Prek / Pre-commit & Formatting (push) Has been cancelled
Documentation / Build and Deploy Documentation (push) Has been cancelled
Release Docker Image / Build linux-amd64 (release) (push) Has been cancelled
Release Docker Image / Build linux-arm64 (release) (push) Has been cancelled
Release Docker Image / Create Multi-arch Release Manifest (push) Has been cancelled
Release Docker Image / Build linux-amd64 (max-perf) (push) Has been cancelled
Release Docker Image / Build linux-arm64 (max-perf) (push) Has been cancelled
Release Docker Image / Create Max-Perf Manifest (push) Has been cancelled
c66f6f8900
nex requested review from Jade 2025-09-25 01:58:27 +00:00
Jade approved these changes 2025-09-25 08:08:27 +00:00
Jade merged commit c66f6f8900 into main 2025-09-25 08:14:45 +00:00
Reference
continuwuation/continuwuity!1075