WIP: perf: Improve gapfill logic #1818

Draft
nex wants to merge 49 commits from nex/perf/get-missing-events into main
Owner

Right now our gapfilling code is pretty awful for performance: for each event we are missing, we make an individual GET /_matrix/federation/v1/event/$eventID call, which is incredibly slow. This also happens for fetching auth events we are missing. Luckily, Matrix defines two bulk-fetch endpoints that help us. So, this PR updates our gapfilling code to be more reliable and effective.
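
To make the request-count difference concrete, here is a minimal, self-contained sketch. It is not the code in this PR. It assumes one of the two bulk endpoints is `get_missing_events` (as the branch name suggests) and models it with a simplified breadth-first walk over an in-memory DAG. `Pdu`, `RemoteServer`, `fill_one_by_one` and `fill_in_bulk` are hypothetical names made up for illustration; only the parameter names `earliest_events`, `latest_events`, `limit` and `min_depth` come from the Matrix federation spec.

```rust
use std::collections::{HashMap, HashSet, VecDeque};

/// Minimal stand-in for a PDU: only the fields needed to walk the DAG.
#[derive(Clone, Debug)]
struct Pdu {
    event_id: String,
    prev_events: Vec<String>,
    depth: u64,
}

/// Hypothetical remote server that holds the full room DAG in memory
/// and counts how many requests it has served.
struct RemoteServer {
    events: HashMap<String, Pdu>,
    requests: usize,
}

impl RemoteServer {
    /// One round trip per event, like `GET /_matrix/federation/v1/event/$eventID`.
    fn get_event(&mut self, id: &str) -> Option<Pdu> {
        self.requests += 1;
        self.events.get(id).cloned()
    }

    /// One round trip for a batch, modelled on `get_missing_events`:
    /// walk back from `latest`, never cross `earliest`, skip anything
    /// below `min_depth`, and stop after `limit` events.
    fn get_missing_events(
        &mut self,
        earliest: &[String],
        latest: &[String],
        limit: usize,
        min_depth: u64,
    ) -> Vec<Pdu> {
        self.requests += 1;
        let mut seen: HashSet<String> = earliest.iter().chain(latest).cloned().collect();
        let mut queue: VecDeque<String> = VecDeque::new();
        for id in latest {
            if let Some(pdu) = self.events.get(id) {
                for prev in &pdu.prev_events {
                    if seen.insert(prev.clone()) {
                        queue.push_back(prev.clone());
                    }
                }
            }
        }
        let mut out = Vec::new();
        while let Some(id) = queue.pop_front() {
            if out.len() >= limit {
                break;
            }
            let Some(pdu) = self.events.get(&id) else { continue };
            if pdu.depth < min_depth {
                continue;
            }
            for prev in &pdu.prev_events {
                if seen.insert(prev.clone()) {
                    queue.push_back(prev.clone());
                }
            }
            out.push(pdu.clone());
        }
        out
    }
}

/// Old approach: fetch every missing ancestor of `tip` individually.
fn fill_one_by_one(server: &mut RemoteServer, known: &HashSet<String>, tip: &Pdu) -> Vec<Pdu> {
    let mut have = known.clone();
    let mut queue: VecDeque<String> = tip.prev_events.iter().cloned().collect();
    let mut fetched = Vec::new();
    while let Some(id) = queue.pop_front() {
        if !have.insert(id.clone()) {
            continue; // already known or already fetched
        }
        if let Some(pdu) = server.get_event(&id) {
            queue.extend(pdu.prev_events.iter().cloned());
            fetched.push(pdu);
        }
    }
    fetched
}

/// Bulk approach: ask for everything between what we know and `tip` at once.
fn fill_in_bulk(
    server: &mut RemoteServer,
    known: &HashSet<String>,
    tip: &Pdu,
    limit: usize,
    min_depth: u64,
) -> Vec<Pdu> {
    let earliest: Vec<String> = known.iter().cloned().collect();
    server.get_missing_events(&earliest, &[tip.event_id.clone()], limit, min_depth)
}
```

In this toy model, a gap of nine events costs nine round trips with `fill_one_by_one` and a single round trip with `fill_in_bulk`. A real gapfill would also have to loop when the gap is larger than `limit` and would have to verify every returned PDU.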

Warning

This pull request is UNSTABLE and may result in "PERMANENT" DATA LOSS. Do NOT deploy to a production server until it is confirmed to be stable. I know "we can now persist >1,000 events in under 5 seconds" sounds tantalising, but there's no guarantee that half of those events aren't just being silently dropped.

Pull request checklist:

  • This pull request targets the main branch, and the branch is named something other than
    main.
  • I have written an appropriate pull request title and my description is clear.
  • I understand I am responsible for the contents of this pull request.
  • I have followed the contributing guidelines (https://forgejo.ellis.link/continuwuation/continuwuity/src/branch/main/CONTRIBUTING.md):
      • My contribution follows the code style, if applicable.
      • I ran pre-commit checks before opening/drafting this pull request.
      • I have tested my contribution (or proof-read it for documentation-only changes) myself, if applicable. This includes ensuring code compiles.
      • My commit messages follow the commit message format and are descriptive.
nex self-assigned this 2026-05-26 19:16:29 +00:00
perf: Make max gap depth fetch configurable
Some checks failed
a1eeef95e0
nex force-pushed nex/perf/get-missing-events from a1eeef95e0 to d15064871e 2026-05-26 19:22:51 +00:00
perf: Increase default max_fetch_prev_events to 256
Some checks failed
c4d297ae3b
feat: Keep track of a min_depth value
Some checks failed
0d4bbe612d
Should prevent weird situations where we accidentally gapfill into backfill territory
chore: Add newsfrag
Some checks failed
c7f8eec282
fix: Be smarter when re-receiving already-seen PDUs
Some checks failed
58b2af3f45
fix: Don't lie about using already-known content
Some checks failed
a229961c1d
feat: Enhance reliability by fetching full state when we're missing a lot of auth events
Some checks failed
94fb7e6f84
fix: Remove short-term memory loss
Some checks failed
4edc4269e7
I keep writing forgetful code, it's a problem
fix: Fall back to atomic fetch when full-state fetch fails
Some checks failed
9daf828724
fix: Don't try and fetch zero events
All checks were successful
c6785befa9
style: Rename gapfill helpers instruments
All checks were successful
2eeddc3c14
fix: Don't silence PDU handle logs
All checks were successful
701d2aae08
fix: Make dedupe noisy, don't allow non-create event as create event
All checks were successful
7ff499a78d
fix: Make PDU handle errors noisier & correct error types
All checks were successful
48c182d05d
fix: Remove redundant check that accidentally banned everyone
Some checks failed
22abf0acfe
fix: Remove redundant check 2
All checks were successful
17d0e3edab
This may look scary, but this is safe because event auth performs the same check, and will reject the event if it doesn't reference the create event correctly.
fix: Ask more servers for state_ids when origin fails to provide
All checks were successful
f06997ba93
Some servers reference events in prev_events that they might not yet have finished processing, so this allows us to at least attempt to get the state from another trustworthy server in the room that might be faster. I don't think this is very effective, but it's more effective than giving up immediately.
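
The fallback described above can be sketched roughly as follows. This is a minimal illustration rather than the PR's implementation: `fetch_state_ids_with_fallback` and its `fetch` callback are hypothetical names, and the callback stands in for whatever performs the actual federation request.

```rust
/// Ask `origin` first, then every other candidate server in order, and
/// return the first server that answers together with its state IDs.
/// Returns `None` if nobody answers, including when `candidates` is empty
/// and the origin fails.
fn fetch_state_ids_with_fallback<F>(
    origin: &str,
    candidates: &[&str],
    mut fetch: F,
) -> Option<(String, Vec<String>)>
where
    F: FnMut(&str) -> Result<Vec<String>, String>,
{
    // Do not ask the origin twice if it also appears in the candidate list.
    let others = candidates.iter().copied().filter(|s| *s != origin);
    for server in std::iter::once(origin).chain(others) {
        if let Ok(ids) = fetch(server) {
            return Some((server.to_owned(), ids));
        }
        // On error, fall through and try the next server.
    }
    None
}
```

Each extra server adds at most one more request, so the worst case stays bounded by the size of the candidate list.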
fix: Don't treat prev outlier upgrades as fetch failures
All checks were successful
82616ed9be
feat: Include timing information in debug logs
All checks were successful
186cd149f4
feat: Make logging more verbose to diagnose the aranjesplosion
Some checks failed
60e4fe0bed
fix: Don't download the world
All checks were successful
110c482122
fix: Downgrade safe assert to debug assert
All checks were successful
f8ebd0a107
fix: Progress log in fetch_prev
Some checks failed
39866f87d6
fix: Elide auth chain from fetch_and_handle_outliers
Some checks failed
90c33b6ab2
fix: Be noisy when there's no incoming state
All checks were successful
ade214cb2b
perf: Don't add trees we already have to latest boundary
Some checks failed
e0359ad7cb
perf: Don't try to re-persist non-outliers we already have
Some checks failed
d5eb48e2a2
fix: Friendly assertations
Some checks failed
883adcef9e
fix: Correctly pre-populate state events vec with known events
Some checks failed
784ae234ca
fix: Correct inverted boolean condition, add explicit timeout on /state fetch
Some checks failed
0acd9819a8
fix: Correctly handle still-missing state, always fetch full state atomically if regular fetch fails
Some checks failed
7fe8d95fa1
style: Resolve lint complaints
Some checks failed
5290b36f4b
fix: Make fetch_state_ids_from_backfill_servers candidate-free safe
All checks were successful
c8e1c657ae
nex force-pushed nex/perf/get-missing-events from c8e1c657ae to b7e0d96b91 2026-05-31 17:51:53 +00:00
nex force-pushed nex/perf/get-missing-events from b7e0d96b91 to c8e1c657ae 2026-05-31 17:52:52 +00:00
Author
Owner

This PR yields substantial speed and consistency improvements over the base branch, even in this unoptimised state, but it also highlights a problem. Filling gaps can take an extraordinarily long time, especially if the servers pushing events to us are unreachable (e.g. an IPv6-only server pushing to a server without IPv6 egress), or if the room graph is really complex and actively moving (e.g. the ping room). As a result, the median PDU handle time with this PR can easily go from ~4ms to >500ms, with a few minutes being the worst case observed thus far, since we now try significantly more aggressively to pull in events that we missed.

The best case remains unchanged, and generally speaking consistency is improved too. In the future, though, processing missing events and upgrading outlier PDUs to timeline PDUs will probably need to be moved into a background task: right now the entire room is locked while prev events are being fetched, which, as stated, can take minutes. Not sure if that'll get squeezed into this PR's scope or whether it'll end up being a separate job.
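
One possible shape for that background task, as a rough sketch only: the PDU handler hands the missing event IDs to a worker and returns immediately. The worker does the slow fetching without holding any lock and takes the lock only for the short persist step. Everything here is hypothetical (`Timeline`, `GapfillJob`, `spawn_gapfill_worker`, the `fetch` callback), it uses a single lock instead of per-room locks, and it uses plain `std` threads to stay self-contained, whereas the real code would presumably use async tasks.

```rust
use std::collections::HashMap;
use std::sync::{mpsc, Arc, Mutex};
use std::thread;

type RoomId = String;
type EventId = String;

/// Hypothetical in-memory timeline store, guarded by one lock.
#[derive(Default)]
struct Timeline {
    events: HashMap<RoomId, Vec<EventId>>,
}

/// Work item queued by the PDU handler instead of fetching inline.
struct GapfillJob {
    room: RoomId,
    missing: Vec<EventId>,
}

/// Spawn a worker that fills gaps in the background. `fetch` stands in
/// for the slow federation request and returns `None` on failure.
fn spawn_gapfill_worker<F>(
    timeline: Arc<Mutex<Timeline>>,
    fetch: F,
) -> (mpsc::Sender<GapfillJob>, thread::JoinHandle<()>)
where
    F: Fn(&EventId) -> Option<EventId> + Send + 'static,
{
    let (tx, rx) = mpsc::channel::<GapfillJob>();
    let handle = thread::spawn(move || {
        // The loop ends once every sender has been dropped.
        for job in rx {
            // Slow part: network fetches, no lock held.
            let fetched: Vec<EventId> =
                job.missing.iter().filter_map(|id| fetch(id)).collect();
            // Fast part: take the lock only to persist the results.
            let mut tl = timeline.lock().unwrap();
            tl.events.entry(job.room).or_default().extend(fetched);
        }
    });
    (tx, handle)
}
```

The trade-off is that events arrive in the timeline later than the PDU that referenced them, so anything that relies on the gap being filled before the handler returns would need rethinking.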
This pull request is marked as a work in progress.
This branch is out-of-date with the base branch
Reference
continuwuation/continuwuity!1818