fix: improve appservice authentication and thread pagination UX #923

Merged
tcpipuk merged 4 commits from tom/fixes into main 2025-08-12 07:06:31 +00:00
Owner

This PR addresses authentication issues affecting appservice users and improves the thread pagination experience.

  • Appservice Authentication (#813): Appservices were failing to authenticate because their sender_localpart user was never created in the database. This fix ensures the user account is created during appservice startup and registration, and automatically reactivates the account if it was previously deactivated. Fixes #813.

  • Token Collision Prevention: Prevents access tokens from being shared between users and appservices, which could cause authentication ambiguity. On startup, any existing collisions are resolved in favour of appservices, and new collisions are prevented during token generation. Token lookups are also optimised using concurrent queries (adapted from tuwunel).

  • Thread Pagination: Replaces PduCount pagination tokens with persistent ShortEventId tokens, but retain backwards-compatibility for both types. The thread root event is now included when paginating to the beginning of a thread, duplicate events at boundaries are fixed, and token format is consistent across /relations and /messages endpoints.

This PR addresses authentication issues affecting appservice users and improves the thread pagination experience. - **Appservice Authentication (#813):** Appservices were failing to authenticate because their `sender_localpart` user was never created in the database. This fix ensures the user account is created during appservice startup and registration, and automatically reactivates the account if it was previously deactivated. Fixes #813. - **Token Collision Prevention:** Prevents access tokens from being shared between users and appservices, which could cause authentication ambiguity. On startup, any existing collisions are resolved in favour of appservices, and new collisions are prevented during token generation. Token lookups are also optimised using concurrent queries (adapted from tuwunel). - **Thread Pagination:** Replaces `PduCount` pagination tokens with persistent `ShortEventId` tokens, but retain backwards-compatibility for both types. The thread root event is now included when paginating to the beginning of a thread, duplicate events at boundaries are fixed, and token format is consistent across `/relations` and `/messages` endpoints.
tcpipuk self-assigned this 2025-08-10 18:44:16 +00:00
Fixes #813: Application services were unable to work because their sender_localpart
user was never created in the database, preventing authentication.

This fix ensures the appservice user account is created when:
- The server starts up and loads existing appservices from the database
- A new appservice is registered via the admin command

Additionally, if an appservice user has been accidentally deactivated, it will be
automatically reactivated when the appservice starts.

The solution centralises all appservice startup logic into a single `start_appservice`
helper method, eliminating code duplication between the registration and startup paths.
Ensures access tokens are unique across both user and appservice tables to
prevent authentication ambiguity and potential security issues.

Changes:
- On startup, automatically logout any user devices using tokens that
  conflict with appservice tokens (resolves in favour of appservices)
  and log a warning with affected user/device details
- When creating new user tokens, check for conflicts with appservice tokens
  and generate a new token if a collision would occur
- When registering new appservices, reject registration if the token is
  already in use by a user device
- Use futures::select_ok to race token lookups concurrently for better
  performance (adapted from tuwunel commit 066097a8)

This fix-forward approach resolves existing token collisions on startup
whilst preventing new ones from being created, without breaking existing
valid authentications.

The find_token optimisation is adapted from tuwunel (matrix-construct/tuwunel)
commit 066097a8: "Optimize user and appservice token queries" by Jason Volk.
fix(relations): improve thread pagination and include root event
All checks were successful
Release Docker Image / define-variables (push) Successful in 5s
Checks / Prefligit / prefligit (push) Successful in 21s
Checks / Rust / Format (push) Successful in 55s
Checks / Rust / Clippy (push) Successful in 4m33s
Checks / Rust / Cargo Test (push) Successful in 5m20s
Release Docker Image / build-image (linux/amd64, release, linux-amd64, base) (push) Successful in 14m52s
Release Docker Image / build-image (linux/arm64, release, linux-arm64, base) (push) Successful in 15m10s
Release Docker Image / merge (push) Successful in 30s
Checks / Prefligit / prefligit (pull_request) Successful in 26s
Documentation / Build and Deploy Documentation (pull_request) Successful in 35s
9286838d23
Replace unreliable PduCount pagination tokens with ShortEventId throughout
the relations and messages endpoints. ShortEventId provides stable, unique
identifiers that persist across server restarts and database operations.

Key improvements:
- Add token parsing helpers that try ShortEventId first, fall back to
  PduCount for backwards compatibility
- Include thread root event when paginating backwards to thread start
- Fix off-by-one error in get_relations that was returning the starting
  event in results
- Only return next_batch/prev_batch tokens when more events are available,
  preventing clients from making unnecessary requests at thread boundaries
- Ensure consistent token format between /relations, /messages, and /sync
  endpoints for interoperability

This fixes duplicate events when scrolling at thread boundaries and ensures
the thread root message is visible when viewing a thread, matching expected
client behaviour.
Author
Owner

I'm currently running forgejo.ellis.link/continuwuation/continuwuity:branch-tom-threads and confirmed threading works great - the attached screenshot shows it's correctly telling the client there are no more events before the top one, and including the "hey" root message from outside the thread that the first in-thread reply was responding to.

I'm currently using appservices without an issue, but may need someone who originally experienced the issue in #813 to replicate and confirm if this has resolved their issues.

I'm currently running `forgejo.ellis.link/continuwuation/continuwuity:branch-tom-threads` and confirmed threading works great - the attached screenshot shows it's correctly telling the client there are no more events before the top one, and including the "hey" root message from outside the thread that the first in-thread reply was responding to. I'm currently using appservices without an issue, but may need someone who originally experienced the issue in #813 to replicate and confirm if this has resolved their issues.
Jade requested changes 2025-08-10 23:03:20 +00:00
Dismissed
Jade left a comment
Owner

OK, lotta comments here. Good news - they're mostly code quality!

OK, lotta comments here. Good news - they're mostly code quality!
@ -64,0 +64,4 @@
/// Parse a pagination token, trying ShortEventId first, then falling back to
/// PduCount
async fn parse_pagination_token(
_services: &Services,
Owner

if it needs access to services and isn't a top level API request, it should be a service, but it doesn't, so it doesn't need these unused params

if it needs access to services and isn't a top level API request, it should be a service, but it doesn't, so it doesn't need these unused params
Author
Owner

Agreed! I've removed the unused parameters and extracted to a shared utils module 🙂

Agreed! I've removed the unused parameters and extracted to a shared utils module 🙂
tcpipuk marked this conversation as resolved
@ -64,0 +69,4 @@
token: Option<&str>,
default: PduCount,
) -> Result<PduCount> {
let Some(token) = token else {
Owner

this shouldn't be a part of this function. use unwrap_or* functions

this shouldn't be a part of this function. use unwrap_or* functions
Author
Owner

I didn't see that, thanks - I've rewritten to use .map(parse_token).transpose()?.unwrap_or_else(||...) so defaults are handled at call sites 👌

I didn't see that, thanks - I've rewritten to use `.map(parse_token).transpose()?.unwrap_or_else(||...)` so defaults are handled at call sites 👌
tcpipuk marked this conversation as resolved
@ -88,2 +117,2 @@
.transpose()?
.unwrap_or_else(|| match body.dir {
let from: PduCount =
parse_pagination_token(&services, room_id, body.from.as_deref(), match body.dir {
Owner

applying the feedback on the referenced function, you'd be able to do this like the original code, but with .map(parse_pagination_token) instead.

applying the feedback on the referenced function, you'd be able to do this like the original code, but with `.map(parse_pagination_token)` instead.
Author
Owner

Good shout, both files now use the .map(parse_token).transpose()? pattern - much cleaner! 😄

Good shout, both files now use the `.map(parse_token).transpose()?` pattern - much cleaner! 😄
tcpipuk marked this conversation as resolved
@ -93,2 +122,3 @@
.await?;
let to: Option<PduCount> = body.to.as_deref().map(str::parse).flat_ok();
let to: Option<PduCount> = if let Some(to_str) = body.to.as_deref() {
Owner

see above

see above
Author
Owner

Ditto!

Ditto!
tcpipuk marked this conversation as resolved
@ -22,1 +26,4 @@
/// Parse a pagination token, trying ShortEventId first, then falling back to
/// PduCount
async fn parse_pagination_token(
Owner

duplicated code

duplicated code
Author
Owner

Yeah, sorry... I've created src/api/client/utils.rs with a shared parse_pagination_token and count_to_token so am reusing those instead of duplicating now.

Yeah, sorry... I've created `src/api/client/utils.rs` with a shared `parse_pagination_token` and `count_to_token` so am reusing those instead of duplicating now.
tcpipuk marked this conversation as resolved
@ -23,0 +53,4 @@
}
/// Convert a PduCount to a token string (using the underlying ShortEventId)
fn count_to_token(count: PduCount) -> String {
Owner

duplicated code

duplicated code
Author
Owner

Ditto

Ditto
tcpipuk marked this conversation as resolved
@ -116,3 +150,1 @@
| Direction::Forward => PduCount::min(),
| Direction::Backward => PduCount::max(),
});
let start: PduCount = parse_pagination_token(services, room_id, from, match dir {
Owner

see above

see above
Author
Owner

Ditto

Ditto
tcpipuk marked this conversation as resolved
@ -119,2 +154,3 @@
.await?;
let to: Option<PduCount> = to.map(str::parse).flat_ok();
let to: Option<PduCount> = if let Some(to_str) = to {
Owner

see above

see above
Author
Owner

Ditto!

Ditto!
tcpipuk marked this conversation as resolved
@ -162,0 +232,4 @@
| Direction::Forward => events.last(),
| Direction::Backward => events.first(),
}
.map(|(count, _)| count_to_token(*count))
Owner

for reference, could be .map(at!(0)) .map(count_to_token), it's just a matter of preference

for reference, could be `.map(at!(0))` `.map(count_to_token)`, it's just a matter of preference
Author
Owner

As above 🙂

As above 🙂
tcpipuk marked this conversation as resolved
@ -162,0 +238,4 @@
};
// Build the response chunk with thread root if needed
let chunk: Vec<_> = if let Some(root) = root_event {
Owner

Default behaviour should be written once. Don't worry if that's too difficult though.

Default behaviour should be written once. Don't worry if that's too difficult though.
Author
Owner

I wasn't sure how to do this before, but have had a go at simplifying to use root_event.into_iter()...chain(...) pattern - hopefully this looks right to you?

I wasn't sure how to do this before, but have had a go at simplifying to use `root_event.into_iter()...chain(...)` pattern - hopefully this looks right to you?
tcpipuk marked this conversation as resolved
@ -345,0 +355,4 @@
.map_ok(Token::Appservice);
pin_mut!(user_token, appservice_token);
match select_ok([Left(user_token), Right(appservice_token)]).await {
Owner

I'm hoping that this will only return an err if both futures error.

I'm hoping that this will only return an err if both futures error.
Author
Owner

Exactly that, yeah - I've added a comment to confirm this is intentional 👌

Exactly that, yeah - I've added a comment to confirm this is intentional 👌
tcpipuk marked this conversation as resolved
@ -56,0 +57,4 @@
if let Ok((user_id, device_id)) = self
.services
.users
.find_from_token(&registration.as_token)
Owner

What might be missing from the overarching logic here is checking that appservices don't conflict with each other?

What might be missing from the overarching logic here is checking that appservices don't conflict with each other?
Author
Owner

I can't believe I missed that! I've added a collision check at startup, and also made it so you can't configure a new appservice with the same token - it allows appservices with the same name through, so you're allowed to update the config of an appservice without it demanding the token change each time 😅

I can't believe I missed that! I've added a collision check at startup, and also made it so you can't configure a new appservice with the same token - it allows appservices with the same name through, so you're allowed to update the config of an appservice without it demanding the token change each time 😅
tcpipuk marked this conversation as resolved
@ -114,3 +183,3 @@
}
pub async fn find_from_token(&self, token: &str) -> Option<RegistrationInfo> {
pub async fn find_from_token(&self, token: &str) -> Result<RegistrationInfo> {
Owner

I'm guessing this is Jason code to use select_ok, but it still feels wrong :(

I'm guessing this is Jason code to use select_ok, but it still feels wrong :(
Author
Owner

I replied in Matrix then remembered I can reply here! 😅

The user find_from_token was written to use Result so it seemed to make sense to align the two so we can incorporate the "check both together" method rather than "check Option for appservice then get Result for user".

I can switch to Option for both instead, it just seemed a lot cleaner than trying to convert at the point we're receiving two different types of outputs from the two functions - what do you think?

I replied in Matrix then remembered I can reply here! 😅 The user `find_from_token` was written to use `Result` so it seemed to make sense to align the two so we can incorporate the "check both together" method rather than "check Option for appservice then get Result for user". I can switch to Option for both instead, it just seemed a lot cleaner than trying to convert at the point we're receiving two different types of outputs from the two functions - what do you think?
Author
Owner

(in the meantime, I've added a brief doc comment explaining why Result is used!)

(in the meantime, I've added a brief doc comment explaining why Result is used!)
Owner

Yeah I get it.

Yeah I get it.
tcpipuk marked this conversation as resolved
@ -64,3 +65,3 @@
let mut current = ArrayVec::<u8, 16>::new();
current.extend(target.to_be_bytes());
current.extend(from.saturating_inc(dir).into_unsigned().to_be_bytes());
current.extend(from_unsigned.to_be_bytes());
Owner

Mind explaining this change? not sure what's going on here

Mind explaining this change? not sure what's going on here
Author
Owner

Oh yeah, it was a fun one... the old saturating_inc could skip events at min/max boundaries. It was making a mess of paging through threads, so now we query from the exact position and filter excludes it explicitly - I've also added a comment to clarify what's going on.

Oh yeah, it was a fun one... the old saturating_inc could skip events at min/max boundaries. It was making a mess of paging through threads, so now we query from the exact position and filter excludes it explicitly - I've also added a comment to clarify what's going on.
tcpipuk marked this conversation as resolved
@ -410,0 +417,4 @@
.await
.is_ok()
{
let new_token = utils::random_string(32);
Owner

Uh, I think handling the token regeneration here could result in unexpected behaviour if in future somewhere attempts to set a specific token for a user. Either return an error and have the caller match on it to retry, or don't take the token as an arg (always generate one). You could combine the best of both with a helper function that handles regeneration.

Uh, I think handling the token regeneration here could result in unexpected behaviour if in future somewhere attempts to set a specific token for a user. Either return an error and have the caller match on it to retry, or don't take the token as an arg (always generate one). You could combine the best of both with a helper function that handles regeneration.
Author
Owner

I'm happy to refactor this either way, though I think it's a risk to even entertain the possibility that users provide a custom token?

AFAIK, the only reason we do it for appservices is that it's provided by the admin and part of the YAML confg snippet given to the admin command, so generating the token and providing it back afterwards would be worse UX.

I'll put together a function for generating a guaranteed unique token though, so the flow can use that function, and that codepath might be useful for something else in the future 🙂

I'm happy to refactor this either way, though I think it's a risk to even entertain the possibility that users provide a custom token? AFAIK, the only reason we do it for appservices is that it's provided by the admin and part of the YAML confg snippet given to the admin command, so generating the token and providing it back afterwards would be worse UX. I'll put together a function for generating a guaranteed unique token though, so the flow can use that function, and that codepath might be useful for something else in the future 🙂
Author
Owner

I've split this into separate functions - generate_unique_token() for creating tokens, then set_token() now fails on collision rather than silently regenerating. Hopefully that's ok?

I've split this into separate functions - `generate_unique_token()` for creating tokens, then `set_token()` now fails on collision rather than silently regenerating. Hopefully that's ok?
tcpipuk marked this conversation as resolved
tcpipuk force-pushed tom/fixes from e56a7ab115
Some checks failed
Release Docker Image / define-variables (push) Successful in 2s
Checks / Prefligit / prefligit (push) Successful in 26s
Documentation / Build and Deploy Documentation (pull_request) Successful in 59s
Checks / Prefligit / prefligit (pull_request) Successful in 18s
Checks / Rust / Format (push) Successful in 1m34s
Checks / Rust / Cargo Test (push) Failing after 3m0s
Checks / Rust / Clippy (push) Failing after 3m14s
Release Docker Image / build-image (linux/amd64, release, linux-amd64, base) (push) Failing after 9m55s
Release Docker Image / build-image (linux/arm64, release, linux-arm64, base) (push) Failing after 10m9s
Release Docker Image / merge (push) Has been skipped
to 077eac688e
Some checks failed
Release Docker Image / define-variables (push) Successful in 2s
Checks / Prefligit / prefligit (push) Successful in 12s
Checks / Rust / Format (push) Successful in 54s
Documentation / Build and Deploy Documentation (pull_request) Successful in 50s
Checks / Prefligit / prefligit (pull_request) Successful in 16s
Checks / Rust / Cargo Test (push) Failing after 3m52s
Checks / Rust / Clippy (push) Failing after 3m59s
Release Docker Image / build-image (linux/arm64, release, linux-arm64, base) (push) Failing after 9m48s
Release Docker Image / build-image (linux/amd64, release, linux-amd64, base) (push) Failing after 10m24s
Release Docker Image / merge (push) Has been skipped
2025-08-11 05:56:55 +00:00
Compare
tcpipuk force-pushed tom/fixes from 077eac688e
Some checks failed
Release Docker Image / define-variables (push) Successful in 2s
Checks / Prefligit / prefligit (push) Successful in 12s
Checks / Rust / Format (push) Successful in 54s
Documentation / Build and Deploy Documentation (pull_request) Successful in 50s
Checks / Prefligit / prefligit (pull_request) Successful in 16s
Checks / Rust / Cargo Test (push) Failing after 3m52s
Checks / Rust / Clippy (push) Failing after 3m59s
Release Docker Image / build-image (linux/arm64, release, linux-arm64, base) (push) Failing after 9m48s
Release Docker Image / build-image (linux/amd64, release, linux-amd64, base) (push) Failing after 10m24s
Release Docker Image / merge (push) Has been skipped
to 8e98d533f6
Some checks failed
Release Docker Image / define-variables (push) Successful in 2s
Checks / Prefligit / prefligit (push) Successful in 27s
Checks / Rust / Format (push) Successful in 1m14s
Checks / Prefligit / prefligit (pull_request) Successful in 21s
Documentation / Build and Deploy Documentation (pull_request) Successful in 1m20s
Checks / Rust / Clippy (push) Failing after 6m2s
Checks / Rust / Cargo Test (push) Successful in 6m58s
Release Docker Image / build-image (linux/amd64, release, linux-amd64, base) (push) Successful in 14m57s
Release Docker Image / build-image (linux/arm64, release, linux-arm64, base) (push) Successful in 15m12s
Release Docker Image / merge (push) Successful in 19s
2025-08-11 06:12:14 +00:00
Compare
tcpipuk force-pushed tom/fixes from 8e98d533f6
Some checks failed
Release Docker Image / define-variables (push) Successful in 2s
Checks / Prefligit / prefligit (push) Successful in 27s
Checks / Rust / Format (push) Successful in 1m14s
Checks / Prefligit / prefligit (pull_request) Successful in 21s
Documentation / Build and Deploy Documentation (pull_request) Successful in 1m20s
Checks / Rust / Clippy (push) Failing after 6m2s
Checks / Rust / Cargo Test (push) Successful in 6m58s
Release Docker Image / build-image (linux/amd64, release, linux-amd64, base) (push) Successful in 14m57s
Release Docker Image / build-image (linux/arm64, release, linux-arm64, base) (push) Successful in 15m12s
Release Docker Image / merge (push) Successful in 19s
to 971a82425a
Some checks failed
Release Docker Image / merge (push) Blocked by required conditions
Release Docker Image / define-variables (push) Successful in 2s
Checks / Prefligit / prefligit (push) Successful in 26s
Checks / Rust / Format (push) Successful in 59s
Documentation / Build and Deploy Documentation (pull_request) Successful in 1m15s
Checks / Prefligit / prefligit (pull_request) Successful in 58s
Checks / Rust / Clippy (push) Failing after 5m29s
Checks / Rust / Cargo Test (push) Successful in 6m16s
Release Docker Image / build-image (linux/arm64, release, linux-arm64, base) (push) Has been cancelled
Release Docker Image / build-image (linux/amd64, release, linux-amd64, base) (push) Has been cancelled
2025-08-11 06:34:14 +00:00
Compare
tcpipuk force-pushed tom/fixes from 971a82425a
Some checks failed
Release Docker Image / merge (push) Blocked by required conditions
Release Docker Image / define-variables (push) Successful in 2s
Checks / Prefligit / prefligit (push) Successful in 26s
Checks / Rust / Format (push) Successful in 59s
Documentation / Build and Deploy Documentation (pull_request) Successful in 1m15s
Checks / Prefligit / prefligit (pull_request) Successful in 58s
Checks / Rust / Clippy (push) Failing after 5m29s
Checks / Rust / Cargo Test (push) Successful in 6m16s
Release Docker Image / build-image (linux/arm64, release, linux-arm64, base) (push) Has been cancelled
Release Docker Image / build-image (linux/amd64, release, linux-amd64, base) (push) Has been cancelled
to 8b295b1e22
Some checks failed
Release Docker Image / define-variables (push) Successful in 14s
Checks / Prefligit / prefligit (push) Successful in 44s
Checks / Rust / Format (push) Failing after 59s
Documentation / Build and Deploy Documentation (pull_request) Successful in 37s
Checks / Prefligit / prefligit (pull_request) Successful in 32s
Release Docker Image / build-image (linux/amd64, release, linux-amd64, base) (push) Successful in 13m8s
Release Docker Image / build-image (linux/arm64, release, linux-arm64, base) (push) Successful in 13m27s
Release Docker Image / merge (push) Successful in 30s
Checks / Rust / Clippy (push) Successful in 3m49s
Checks / Rust / Cargo Test (push) Successful in 4m19s
2025-08-11 06:44:15 +00:00
Compare
Jade approved these changes 2025-08-11 23:55:14 +00:00
tcpipuk force-pushed tom/fixes from 8b295b1e22
Some checks failed
Release Docker Image / define-variables (push) Successful in 14s
Checks / Prefligit / prefligit (push) Successful in 44s
Checks / Rust / Format (push) Failing after 59s
Documentation / Build and Deploy Documentation (pull_request) Successful in 37s
Checks / Prefligit / prefligit (pull_request) Successful in 32s
Release Docker Image / build-image (linux/amd64, release, linux-amd64, base) (push) Successful in 13m8s
Release Docker Image / build-image (linux/arm64, release, linux-arm64, base) (push) Successful in 13m27s
Release Docker Image / merge (push) Successful in 30s
Checks / Rust / Clippy (push) Successful in 3m49s
Checks / Rust / Cargo Test (push) Successful in 4m19s
to 583cb924f1
All checks were successful
Documentation / Build and Deploy Documentation (pull_request) Successful in 1m8s
Checks / Prefligit / prefligit (pull_request) Successful in 19s
Release Docker Image / define-variables (push) Successful in 4s
Documentation / Build and Deploy Documentation (push) Successful in 34s
Checks / Prefligit / prefligit (push) Successful in 34s
Checks / Rust / Format (push) Successful in 39s
Checks / Rust / Clippy (push) Successful in 4m33s
Checks / Rust / Cargo Test (push) Successful in 4m57s
Release Docker Image / build-image (linux/amd64, release, linux-amd64, base) (push) Successful in 14m12s
Release Docker Image / build-image (linux/arm64, release, linux-arm64, base) (push) Successful in 14m31s
Release Docker Image / merge (push) Successful in 12s
2025-08-12 04:45:28 +00:00
Compare
nex added this to the v0.5.0-rc.8 milestone 2025-08-16 16:00:38 +00:00
Sign in to join this conversation.
No reviewers
No milestone
No project
No assignees
2 participants
Notifications
Due date
The due date is invalid or out of range. Please use the format "yyyy-mm-dd".

No due date set.

Dependencies

No dependencies set.

Reference
continuwuation/continuwuity!923
No description provided.