Question about propagating URLs with errors inside the federation service #1769

Open
opened 2026-05-15 01:52:46 +00:00 by ianthetechie · 3 comments
Contributor

I was troubleshooting a federation issue between my continuwuity server and another server. Everything looked fine from the federation tester, but the root cause turned out to be only detectable from my own system. (The issue was that my firewall config was too restrictive and didn't allow outbound connections on port 8448, which surprisingly didn't cause me issues for over a month, but that's neither here nor there...)

The piece of information that would have helped me debug this 100x faster is knowing the fully resolved destination URL. !admin debug ping is a great tool, but it doesn't show you the URL when things go wrong. Seeing this would have made me immediately realize that my curl test from the box was not hitting the same URL; it worked fine on standard HTTPS port 443 ;)

I traced the chain of function calls that would execute, down to the point of the error as follows:

  • conduwuit_admin::debug::commands::ping function
  • Service::send_unauthenticated_request in the conduwuit_service::sending module
  • conduwuit_service::federation::execute::execute_unauthenticated
  • conduwuit_service::federation::execute::execute_on (actual destination resolved here, via the resolver get_actual_dest call)
  • conduwuit_service::federation::execute::perform
  • conduwuit_service::federation::execute::handle_error

The error that I saw from the debug ping was as follows:

Failed sending federation request to specified server:

tcp connect error: deadline has elapsed

The message is not very helpful because handle_error explicitly scrubs the URL out:

fn handle_error(
	actual: &ActualDest,
	method: &Method,
	url: &Url,
	mut e: reqwest::Error,
) -> Result {
	if e.is_timeout() || e.is_connect() {
		e = e.without_url();
		debug_warn!("{e:?}");
	} else if e.is_redirect() {
		debug_error!(
			%method,
			%url,
			final_url = e.url().map(tracing::field::display),
			"Redirect loop {}: {}",
			actual.host,
			e,
		);
	} else {
		debug_error!("{e:?}");
	}

	Err(e.into())
}

I'm not quite sure what the solution is here, which is why I'm opening an issue. The docstring for without_url says that it's for security to prevent leaking secrets. This seems plausible on its face, but raises a few follow-up questions/observations:

  1. Why is the connection error path the only one that removes the URL? If this is a security problem, then wouldn't the other paths be vulnerable (e.g. a 5XX server error will still retain the url and invoke debug_error!)?
  2. If I could hazard a guess (I don't know the history), I'd say maybe the intent of this code was to prevent secrets from entering logs that are visible to other users on a system. Does the threat model extend to information contained inside errors throughout propagation? My instinct says it probably shouldn't, but I'm in no way qualified to answer since I don't know the codebase + history very well :D
  3. In my opinion (take it with a truckload of salt), we should propagate the actual error up the stack. That would give the debug info that I was missing automatically.

I'd like to hear thoughts from the maintainers before working up a PR on whether this is an acceptable change or if we should go another path (and what that path might be).

I would also suggest that the mut consuming with_X and without_X methods are not particularly idiomatic Rust. This style reduces the utility of the methods compared to taking &self and returning a copy with the field update. The Rust compiler is very good at ensuring this doesn't copy the whole struct every time ;) Cleaning that up would let us have nice, sanitized logs without mutating the underlying error (or doing an explicit clone every time).

I was troubleshooting a federation issue between my continuwuity server and another server. Everything looked fine from the federation tester, but the root cause turned out to be only detectable from my own system. (The issue was that my firewall config was too restrictive and didn't allow outbound connections on port 8448, which surprisingly didn't cause me issues for over a month, but that's neither here nor there...) The piece of information that would have helped me debug this 100x faster is knowing the fully resolved destination URL. `!admin debug ping` is a great tool, but it doesn't show you the URL when things go wrong. Seeing this would have made me immediately realize that my `curl` test from the box was not hitting the same URL; it worked fine on standard HTTPS port 443 ;) I traced the chain of function calls that would execute, down to the point of the error as follows: * `conduwuit_admin::debug::commands::ping` function * `Service::send_unauthenticated_request` in the `conduwuit_service::sending` module * `conduwuit_service::federation::execute::execute_unauthenticated` * `conduwuit_service::federation::execute::execute_on` (actual destination resolved here, via the resolver `get_actual_dest` call) * `conduwuit_service::federation::execute::perform` * `conduwuit_service::federation::execute::handle_error` The error that I saw from the debug ping was as follows: ``` Failed sending federation request to specified server: tcp connect error: deadline has elapsed ``` The message is not very helpful because `handle_error` explicitly scrubs the URL out: ```rust fn handle_error( actual: &ActualDest, method: &Method, url: &Url, mut e: reqwest::Error, ) -> Result { if e.is_timeout() || e.is_connect() { e = e.without_url(); debug_warn!("{e:?}"); } else if e.is_redirect() { debug_error!( %method, %url, final_url = e.url().map(tracing::field::display), "Redirect loop {}: {}", actual.host, e, ); } else { debug_error!("{e:?}"); } Err(e.into()) } ``` I'm not quite sure what the solution is here, which is why I'm opening an issue. The docstring for `without_url` says that it's for security to prevent leaking secrets. This seems plausible on its face, but raises a few follow-up questions/observations: 1. Why is the connection error path the only one that removes the URL? If this *is* a security problem, then wouldn't the other paths be vulnerable (e.g. a 5XX server error will still retain the url and invoke `debug_error!`)? 2. If I could hazard a guess (I don't know the history), I'd say maybe the intent of this code was to prevent secrets from entering logs that are visible to other users on a system. Does the threat model extend to information contained inside errors throughout propagation? My instinct says it probably _shouldn't_, but I'm in no way qualified to answer since I don't know the codebase + history very well :D 3. **In my opinion (take it with a truckload of salt), we should propagate the actual error up the stack**. That would give the debug info that I was missing automatically. I'd like to hear thoughts from the maintainers before working up a PR on whether this is an acceptable change or if we should go another path (and what that path might be). I would also suggest that the `mut` consuming `with_X` and `without_X` methods are not particularly idiomatic Rust. This style reduces the utility of the methods compared to taking `&self` and returning a copy with the field update. The Rust compiler is very good at ensuring this doesn't copy the whole struct every time ;) Cleaning that up would let us have nice, sanitized logs without mutating the underlying error (or doing an explicit clone every time).
Contributor

The piece of information that would have helped me debug this 100x faster is knowing the fully resolved destination URL

This can be fetched with !admin debug resolve-true-destination, as well as from external federation testers.

> The piece of information that would have helped me debug this 100x faster is knowing the fully resolved destination URL This can be fetched with `!admin debug resolve-true-destination`, as well as from external federation testers.
Author
Contributor

This can be fetched with !admin debug resolve-true-destination, as well as from external federation testers.

I did stumble upon !admin debug resolve-true-destination eventually, and that was one of the puzzle pieces that eventually led to me discovering the issue. I would have still identified the issue much faster if the debug ping output had more detail.

I don't understand the pointer to external federation testers. Both of our servers show green on all federation testers that I'm aware of. This particular network pathology (firewall in front of continuwuity blocking outbound on 8448) is not detectable except from inside the network. All external servers can still send inbound requests, and continuwuity could send outbound on 443, but not 8448.

> This can be fetched with !admin debug resolve-true-destination, as well as from external federation testers. I did stumble upon `!admin debug resolve-true-destination` eventually, and that was one of the puzzle pieces that eventually led to me discovering the issue. I would have still identified the issue much faster if the debug ping output had more detail. I don't understand the pointer to external federation testers. Both of our servers show green on all federation testers that I'm aware of. This particular network pathology (firewall in front of continuwuity blocking *outbound* on 8448) is not detectable except from inside the network. All external servers can still send inbound requests, and continuwuity could send outbound on 443, but not 8448.
Contributor

I'm saying you can use the external federation testers against the remote servers, in order to fetch their true destinations. However I get your point that the debug ping command could be more detailed

(Ideally, you shouldn't block any outbound port for Matrix anyways, as there are some servers running on other nonstandard ports)

I'm saying you can use the external federation testers against the remote servers, in order to fetch _their_ true destinations. However I get your point that the debug ping command could be more detailed (Ideally, you shouldn't block any outbound port for Matrix anyways, as there are some servers running on other nonstandard ports)
Sign in to join this conversation.
No milestone
No project
No assignees
2 participants
Notifications
Due date
The due date is invalid or out of range. Please use the format "yyyy-mm-dd".

No due date set.

Dependencies

No dependencies set.

Reference
continuwuation/continuwuity#1769
No description provided.