Incorrect server destination cache after /.well-known timeout #1755

Open
opened 2026-05-11 08:02:28 +00:00 by NN708 · 2 comments

When connecting to another homeserver like example.com, Continuwuity normally retrieves matrix.example.com:443 via the /.well-known endpoint. However, if that request times out due to a transient network issue, Continuwuity caches the incorrect destination example.com:8448 and never retries the /.well-known endpoint. This prevents recovery even after the network problem is resolved.

I would like to propose implementing exponential backoff for retrying the /.well-known request.
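
As a rough illustration of the proposal, the retry schedule could look like the sketch below. It is not taken from the Continuwuity codebase: `WellKnownBackoff`, `retry_delay`, `BASE_DELAY`, and `MAX_DELAY` are hypothetical names, and the 30-second base and 24-hour cap are assumed values chosen only for the example.

```rust
use std::time::{Duration, Instant};

// Assumed values for illustration only.
pub const BASE_DELAY: Duration = Duration::from_secs(30);
pub const MAX_DELAY: Duration = Duration::from_secs(24 * 60 * 60);

/// Delay to wait after `failures` consecutive failed /.well-known lookups:
/// BASE_DELAY * 2^(failures - 1), capped at MAX_DELAY.
pub fn retry_delay(failures: u32) -> Duration {
    if failures == 0 {
        return Duration::ZERO;
    }
    // Clamp the exponent so the shift cannot overflow a u32.
    let factor = 1u32 << (failures - 1).min(31);
    BASE_DELAY
        .checked_mul(factor)
        .map_or(MAX_DELAY, |delay| delay.min(MAX_DELAY))
}

/// Hypothetical per-destination record kept next to the cached fallback.
#[derive(Debug, Clone)]
pub struct WellKnownBackoff {
    failures: u32,
    last_attempt: Instant,
}

impl WellKnownBackoff {
    /// Start tracking after the first failed lookup at time `now`.
    pub fn new(now: Instant) -> Self {
        Self { failures: 1, last_attempt: now }
    }

    /// Record another failed lookup at time `now`.
    pub fn record_failure(&mut self, now: Instant) {
        self.failures = self.failures.saturating_add(1);
        self.last_attempt = now;
    }

    /// True once enough time has passed to try /.well-known again.
    pub fn should_retry(&self, now: Instant) -> bool {
        now.saturating_duration_since(self.last_attempt) >= retry_delay(self.failures)
    }
}
```

With these assumed values the retries would come 30 s, 60 s, 120 s, and so on after each failure, so a short outage is recovered from quickly while a destination that is down for a long time is probed at most once per day.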

Relevant logs:

2026-05-10T04:38:02.222339Z DEBUG router{method=POST path=/_matrix/client/v3/keys/query}:request:handle{active=713 handled=0}:request:fed{dest="example.com"}:resolve:actual{dest="example.com"}:well-known: conduwuit_service::resolver::well_known: error: reqwest::Error { kind: Request, url: "https://example.com/.well-known/matrix/server", source: hyper_util::client::legacy::Error(Connect, TimedOut) }
2026-05-10T04:38:02.222416Z DEBUG router{method=POST path=/_matrix/client/v3/keys/query}:request:handle{active=713 handled=0}:request:fed{dest="example.com"}:resolve:actual{dest="example.com"}:srv{hostname="example.com"}: conduwuit_service::resolver::actual: querying SRV for "_matrix-fed._tcp.example.com."
2026-05-10T04:38:02.599936Z DEBUG router{method=POST path=/_matrix/client/v3/keys/query}:request:handle{active=713 handled=0}:request:fed{dest="example.com"}:resolve:actual{dest="example.com"}:srv{hostname="example.com"}: conduwuit_service::resolver::actual: querying SRV for "_matrix._tcp.example.com."
2026-05-10T04:38:02.789966Z DEBUG router{method=POST path=/_matrix/client/v3/keys/query}:request:handle{active=713 handled=0}:request:fed{dest="example.com"}:resolve:actual{dest="example.com"}: conduwuit_service::resolver::actual: 5: No SRV record found
2026-05-10T04:38:02.805483Z DEBUG router{method=POST path=/_matrix/client/v3/keys/query}:request:handle{active=713 handled=0}:request:fed{dest="example.com"}:resolve:actual{dest="example.com"}:ip{untername="example.com" hostname="example.com" port=8448}: conduwuit_service::resolver::actual: querying IP for "example.com" ("example.com":8448)
2026-05-10T04:38:02.807800Z DEBUG router{method=POST path=/_matrix/client/v3/keys/query}:request:handle{active=713 handled=0}:request:fed{dest="example.com"}:resolve:actual{dest="example.com"}: conduwuit_service::resolver::actual: Actual destination: Named("example.com", ":8448") hostname: Named("example.com", ":8448")
2026-05-10T04:38:02.808023Z DEBUG router{method=POST path=/_matrix/client/v3/keys/query}:request:handle{active=713 handled=0}:request:fed{dest="example.com"}: conduwuit_service::federation::execute: Sending request method=POST url=https://example.com:8448/_matrix/federation/v1/user/keys/query
2026-05-10T04:38:32.159794Z DEBUG fed{dest="example.com"}: conduwuit_service::federation::execute: Sending request method=PUT url=https://example.com:8448/_matrix/federation/v1/send/foaYk_z0PVm7nJGsCAEKFGGC7UMrX31oc6R0_lg5udw
2026-05-10T04:38:42.161846Z DEBUG fed{dest="example.com"}: conduwuit_service::federation::execute: reqwest::Error { kind: Request, source: hyper_util::client::legacy::Error(Connect, TimedOut) }
2026-05-10T04:38:42.161912Z DEBUG response: conduwuit_service::sending::sender: operation timed out dest=Federation("example.com")
2026-05-10T04:38:57.785093Z DEBUG router{method=POST path=/_matrix/client/v3/keys/query}:request:handle{active=731 handled=0}:request:fed{dest="example.com"}: conduwuit_service::federation::execute: Sending request method=POST url=https://example.com:8448/_matrix/federation/v1/user/keys/query
2026-05-10T04:39:07.786451Z DEBUG router{method=POST path=/_matrix/client/v3/keys/query}:request:handle{active=731 handled=0}:request:fed{dest="example.com"}: conduwuit_service::federation::execute: reqwest::Error { kind: Request, source: hyper_util::client::legacy::Error(Connect, TimedOut) }
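
The fallback chain visible in the logs above (`/.well-known`, then the `_matrix-fed._tcp` and `_matrix._tcp` SRV records, then the bare hostname on port 8448) can be sketched as a pure function. This is a simplified illustration, not Continuwuity's resolver: `Source`, `pick_destination`, and its parameters are hypothetical names, and the sketch ignores IP literals, explicit ports, and SRV lookups on a delegated hostname.

```rust
/// Which discovery step produced a destination (hypothetical type).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Source {
    WellKnown,
    Srv,
    DefaultPort,
}

/// Picks a destination from the results of each discovery step.
/// `well_known` is the delegated "host:port" (None on error or timeout);
/// `srv` is the first SRV target found, if any.
pub fn pick_destination(
    server_name: &str,
    well_known: Option<&str>,
    srv: Option<(&str, u16)>,
) -> (String, Source) {
    if let Some(delegated) = well_known {
        return (delegated.to_owned(), Source::WellKnown);
    }
    if let Some((target, port)) = srv {
        return (format!("{target}:{port}"), Source::Srv);
    }
    // Nothing else answered: fall back to the server name on port 8448.
    (format!("{server_name}:8448"), Source::DefaultPort)
}
```

Tagging each result with its source is one way a cache could tell a genuine delegation apart from a fallback that was reached only because `/.well-known` timed out, and so decide which entries deserve an early retry.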
Contributor

Hello,

The issue will likely be picked up when the sender service rewrite can be done. You can see a previous attempt to resolve this in #1463.

For now, C10y caches it for 24 hours or until [manual purging](https://continuwuity.org/troubleshooting.html#intermittent-federation-failures-to-a-specific-server). It's not ideal, but you can try that manual purge. For your own server, consider hosting a fallback route on example.com:8448 as well.
Member

Yes, this should be fixed by #1505 when it's done and merged.