WIP: perf: Improve gapfill logic #1818

Draft
nex wants to merge 49 commits from nex/perf/get-missing-events into main

49 commits

Author SHA1 Message Date
c8e1c657ae
fix: Make fetch_state_ids_from_backfill_servers candidate-free safe
All checks were successful
Checks / Changelog / Check changelog is added (pull_request_target) Successful in 7s
Checks / Prek / Check changed files (pull_request) Successful in 5s
Documentation / Build and Deploy Documentation (pull_request) Successful in 59s
Checks / Prek / Pre-commit & Formatting (pull_request) Successful in 1m28s
Update flake hashes / update-flake-hashes (pull_request) Successful in 1m35s
Checks / Prek / Clippy and Cargo Tests (pull_request) Successful in 8m3s
2026-05-31 17:47:22 +01:00
5290b36f4b
style: Resolve lint complaints
Some checks failed
Checks / Prek / Check changed files (pull_request) Successful in 5s
Checks / Changelog / Check changelog is added (pull_request_target) Successful in 6s
Documentation / Build and Deploy Documentation (pull_request) Successful in 1m15s
Update flake hashes / update-flake-hashes (pull_request) Successful in 1m18s
Checks / Prek / Pre-commit & Formatting (pull_request) Successful in 1m25s
Checks / Prek / Clippy and Cargo Tests (pull_request) Has been cancelled
2026-05-31 17:41:15 +01:00
7fe8d95fa1
fix: Correctly handle still-missing state, always fetch full state atomically if regular fetch fails
Some checks failed
Checks / Prek / Check changed files (pull_request) Successful in 6s
Checks / Changelog / Check changelog is added (pull_request_target) Successful in 6s
Documentation / Build and Deploy Documentation (pull_request) Successful in 59s
Update flake hashes / update-flake-hashes (pull_request) Successful in 1m14s
Checks / Prek / Pre-commit & Formatting (pull_request) Successful in 1m29s
Checks / Prek / Clippy and Cargo Tests (pull_request) Failing after 5m6s
2026-05-31 17:26:26 +01:00
0acd9819a8
fix: Correct inverted boolean condition, add explicit timeout on /state fetch
Some checks failed
Checks / Prek / Check changed files (pull_request) Successful in 7s
Checks / Changelog / Check changelog is added (pull_request_target) Successful in 8s
Checks / Prek / Pre-commit & Formatting (pull_request) Successful in 1m9s
Documentation / Build and Deploy Documentation (pull_request) Successful in 1m18s
Update flake hashes / update-flake-hashes (pull_request) Successful in 1m12s
Checks / Prek / Clippy and Cargo Tests (pull_request) Failing after 5m6s
2026-05-31 16:35:09 +01:00
ef3a54cbe2
perf: Always fetch at least N events per GME
2026-05-31 16:24:03 +01:00
784ae234ca
fix: Correctly pre-populate state events vec with known events
Some checks failed
Checks / Changelog / Check changelog is added (pull_request_target) Successful in 6s
Checks / Prek / Check changed files (pull_request) Successful in 7s
Checks / Prek / Pre-commit & Formatting (pull_request) Successful in 1m12s
Update flake hashes / update-flake-hashes (pull_request) Successful in 1m14s
Documentation / Build and Deploy Documentation (pull_request) Successful in 1m58s
Checks / Prek / Clippy and Cargo Tests (pull_request) Failing after 5m5s
2026-05-31 15:55:08 +01:00
883adcef9e
fix: Friendly assertions
Some checks failed
Checks / Prek / Check changed files (pull_request) Successful in 5s
Checks / Changelog / Check changelog is added (pull_request_target) Successful in 8s
Documentation / Build and Deploy Documentation (pull_request) Successful in 1m0s
Update flake hashes / update-flake-hashes (pull_request) Successful in 1m15s
Checks / Prek / Pre-commit & Formatting (pull_request) Successful in 1m27s
Checks / Prek / Clippy and Cargo Tests (pull_request) Failing after 5m7s
2026-05-31 00:17:30 +01:00
d5eb48e2a2
perf: Don't try to re-persist non-outliers we already have
Some checks failed
Checks / Prek / Check changed files (pull_request) Successful in 5s
Checks / Changelog / Check changelog is added (pull_request_target) Successful in 7s
Documentation / Build and Deploy Documentation (pull_request) Successful in 1m1s
Update flake hashes / update-flake-hashes (pull_request) Successful in 1m15s
Checks / Prek / Pre-commit & Formatting (pull_request) Successful in 1m30s
Checks / Prek / Clippy and Cargo Tests (pull_request) Failing after 5m14s
2026-05-30 22:56:00 +01:00
e0359ad7cb
perf: Don't add trees we already have to latest boundary
Some checks failed
Checks / Prek / Check changed files (pull_request) Successful in 5s
Checks / Changelog / Check changelog is added (pull_request_target) Successful in 8s
Documentation / Build and Deploy Documentation (pull_request) Successful in 57s
Update flake hashes / update-flake-hashes (pull_request) Successful in 1m8s
Checks / Prek / Pre-commit & Formatting (pull_request) Successful in 1m27s
Checks / Prek / Clippy and Cargo Tests (pull_request) Failing after 5m8s
2026-05-30 22:48:23 +01:00
ade214cb2b
fix: Be noisy when there's no incoming state
All checks were successful
Checks / Changelog / Check changelog is added (pull_request_target) Successful in 9s
Checks / Prek / Check changed files (pull_request) Successful in 9s
Checks / Prek / Pre-commit & Formatting (pull_request) Successful in 56s
Documentation / Build and Deploy Documentation (pull_request) Successful in 59s
Update flake hashes / update-flake-hashes (pull_request) Successful in 1m41s
Checks / Prek / Clippy and Cargo Tests (pull_request) Successful in 8m40s
2026-05-30 22:05:19 +01:00
90c33b6ab2
fix: Elide auth chain from fetch_and_handle_outliers
Some checks failed
Checks / Prek / Check changed files (pull_request) Successful in 5s
Checks / Changelog / Check changelog is added (pull_request_target) Successful in 8s
Documentation / Build and Deploy Documentation (pull_request) Successful in 1m0s
Update flake hashes / update-flake-hashes (pull_request) Successful in 1m18s
Checks / Prek / Pre-commit & Formatting (pull_request) Successful in 1m28s
Checks / Prek / Clippy and Cargo Tests (pull_request) Has been cancelled
2026-05-30 21:57:30 +01:00
39866f87d6
fix: Progress log in fetch_prev
Some checks failed
Checks / Prek / Check changed files (pull_request) Successful in 7s
Checks / Changelog / Check changelog is added (pull_request_target) Successful in 8s
Checks / Prek / Pre-commit & Formatting (pull_request) Successful in 55s
Documentation / Build and Deploy Documentation (pull_request) Successful in 57s
Update flake hashes / update-flake-hashes (pull_request) Successful in 1m35s
Checks / Prek / Clippy and Cargo Tests (pull_request) Failing after 5m5s
2026-05-30 21:03:01 +01:00
f8ebd0a107
fix: Downgrade safe assert to debug assert
All checks were successful
Checks / Prek / Check changed files (pull_request) Successful in 7s
Checks / Changelog / Check changelog is added (pull_request_target) Successful in 8s
Checks / Prek / Pre-commit & Formatting (pull_request) Successful in 56s
Documentation / Build and Deploy Documentation (pull_request) Successful in 1m0s
Update flake hashes / update-flake-hashes (pull_request) Successful in 1m37s
Checks / Prek / Clippy and Cargo Tests (pull_request) Successful in 8m42s
2026-05-30 20:29:16 +01:00
110c482122
fix: Don't download the world
All checks were successful
Checks / Changelog / Check changelog is added (pull_request_target) Successful in 9s
Checks / Prek / Check changed files (pull_request) Successful in 7s
Checks / Prek / Pre-commit & Formatting (pull_request) Successful in 59s
Documentation / Build and Deploy Documentation (pull_request) Successful in 1m1s
Update flake hashes / update-flake-hashes (pull_request) Successful in 1m41s
Checks / Prek / Clippy and Cargo Tests (pull_request) Successful in 8m34s
2026-05-30 19:30:35 +01:00
60e4fe0bed
feat: Make logging more verbose to diagnose the aranjesplosion
Some checks failed
Checks / Prek / Check changed files (pull_request) Successful in 6s
Checks / Changelog / Check changelog is added (pull_request_target) Successful in 8s
Checks / Prek / Pre-commit & Formatting (pull_request) Successful in 55s
Documentation / Build and Deploy Documentation (pull_request) Successful in 58s
Update flake hashes / update-flake-hashes (pull_request) Successful in 1m35s
Checks / Prek / Clippy and Cargo Tests (pull_request) Has been cancelled
2026-05-30 19:24:48 +01:00
186cd149f4
feat: Include timing information in debug logs
All checks were successful
Checks / Prek / Check changed files (pull_request) Successful in 4s
Checks / Changelog / Check changelog is added (pull_request_target) Successful in 8s
Documentation / Build and Deploy Documentation (pull_request) Successful in 59s
Update flake hashes / update-flake-hashes (pull_request) Successful in 1m13s
Checks / Prek / Pre-commit & Formatting (pull_request) Successful in 1m30s
Checks / Prek / Clippy and Cargo Tests (pull_request) Successful in 8m42s
2026-05-30 18:33:44 +01:00
82616ed9be
fix: Don't treat prev outlier upgrades as fetch failures
All checks were successful
Checks / Prek / Check changed files (pull_request) Successful in 5s
Checks / Changelog / Check changelog is added (pull_request_target) Successful in 8s
Checks / Prek / Pre-commit & Formatting (pull_request) Successful in 54s
Documentation / Build and Deploy Documentation (pull_request) Successful in 1m15s
Update flake hashes / update-flake-hashes (pull_request) Successful in 1m31s
Checks / Prek / Clippy and Cargo Tests (pull_request) Successful in 11m30s
2026-05-30 18:02:09 +01:00
f06997ba93
fix: Ask more servers for state_ids when origin fails to provide
All checks were successful
Checks / Prek / Check changed files (pull_request) Successful in 5s
Checks / Changelog / Check changelog is added (pull_request_target) Successful in 8s
Checks / Prek / Pre-commit & Formatting (pull_request) Successful in 57s
Update flake hashes / update-flake-hashes (pull_request) Successful in 1m10s
Documentation / Build and Deploy Documentation (pull_request) Successful in 1m18s
Checks / Prek / Clippy and Cargo Tests (pull_request) Successful in 8m44s
Some servers reference events in prev_events that they might not yet have finished processing, so this allows us to at least attempt to get the state from another trustworthy server in the room that might be faster. I don't think this is very effective, but it's more effective than giving up immediately.
2026-05-30 17:41:23 +01:00
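The fallback described in the commit above (ask the origin for state IDs first, then try other servers in the room) could look roughly like the sketch below. It is not the code from this PR: `StateIds`, `fetch_state_ids_with_fallback`, and the `fetch` closure are hypothetical names used only for illustration, and the real implementation is presumably async and talks to federation endpoints rather than a closure.

```rust
/// Hypothetical stand-in for the result of a federation state_ids request.
#[derive(Debug, Clone, PartialEq)]
pub struct StateIds {
    pub pdu_ids: Vec<String>,
}

/// Ask `origin` first, then each remaining candidate in order, and return the
/// first successful response together with the server that provided it.
/// Returns `None` when every server fails, which also covers the case where
/// the candidate list is empty and the origin fails.
pub fn fetch_state_ids_with_fallback<F>(
    origin: &str,
    candidates: &[&str],
    mut fetch: F,
) -> Option<(String, StateIds)>
where
    F: FnMut(&str) -> Result<StateIds, String>,
{
    // Skip the origin if it also appears in the candidate list, so it is
    // never asked twice.
    let others = candidates.iter().copied().filter(|s| *s != origin);
    for server in std::iter::once(origin).chain(others) {
        match fetch(server) {
            Ok(ids) => return Some((server.to_owned(), ids)),
            // Assumption: any error simply means "try the next server".
            Err(_err) => continue,
        }
    }
    None
}
```

Passing the fetcher as a closure keeps the sketch testable without a network. A caller would supply whatever actually performs the request and would decide which servers count as trustworthy candidates.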
17d0e3edab
fix: Remove redundant check 2
All checks were successful
Checks / Changelog / Check changelog is added (pull_request_target) Successful in 7s
Checks / Prek / Check changed files (pull_request) Successful in 6s
Checks / Prek / Pre-commit & Formatting (pull_request) Successful in 55s
Documentation / Build and Deploy Documentation (pull_request) Successful in 58s
Update flake hashes / update-flake-hashes (pull_request) Successful in 1m36s
Checks / Prek / Clippy and Cargo Tests (pull_request) Successful in 8m48s
This may look scary, but this is safe because event auth performs the same check, and will reject the event if it doesn't reference the create event correctly.
2026-05-30 17:12:40 +01:00
22abf0acfe
fix: Remove redundant check that accidentally banned everyone
Some checks failed
Checks / Prek / Check changed files (pull_request) Successful in 6s
Checks / Changelog / Check changelog is added (pull_request_target) Successful in 8s
Checks / Prek / Pre-commit & Formatting (pull_request) Successful in 55s
Documentation / Build and Deploy Documentation (pull_request) Successful in 59s
Update flake hashes / update-flake-hashes (pull_request) Successful in 1m37s
Checks / Prek / Clippy and Cargo Tests (pull_request) Has been cancelled
2026-05-30 17:07:27 +01:00
48c182d05d
fix: Make PDU handle errors noisier & correct error types
All checks were successful
Checks / Prek / Check changed files (pull_request) Successful in 5s
Checks / Changelog / Check changelog is added (pull_request_target) Successful in 8s
Update flake hashes / update-flake-hashes (pull_request) Successful in 1m11s
Documentation / Build and Deploy Documentation (pull_request) Successful in 1m57s
Checks / Prek / Pre-commit & Formatting (pull_request) Successful in 2m3s
Checks / Prek / Clippy and Cargo Tests (pull_request) Successful in 8m44s
2026-05-30 16:55:32 +01:00
7ff499a78d
fix: Make dedupe noisy, don't allow non-create event as create event
All checks were successful
Checks / Changelog / Check changelog is added (pull_request_target) Successful in 6s
Checks / Prek / Check changed files (pull_request) Successful in 6s
Documentation / Build and Deploy Documentation (pull_request) Successful in 56s
Update flake hashes / update-flake-hashes (pull_request) Successful in 1m8s
Checks / Prek / Pre-commit & Formatting (pull_request) Successful in 1m35s
Checks / Prek / Clippy and Cargo Tests (pull_request) Successful in 8m47s
2026-05-29 21:47:36 +01:00
701d2aae08
fix: Don't silence PDU handle logs
All checks were successful
Checks / Changelog / Check changelog is added (pull_request_target) Successful in 6s
Checks / Prek / Check changed files (pull_request) Successful in 7s
Documentation / Build and Deploy Documentation (pull_request) Successful in 55s
Update flake hashes / update-flake-hashes (pull_request) Successful in 1m12s
Checks / Prek / Pre-commit & Formatting (pull_request) Successful in 1m32s
Checks / Prek / Clippy and Cargo Tests (pull_request) Successful in 8m46s
2026-05-29 20:57:05 +01:00
2eeddc3c14
style: Rename gapfill helpers instruments
All checks were successful
Checks / Changelog / Check changelog is added (pull_request_target) Successful in 6s
Checks / Prek / Check changed files (pull_request) Successful in 7s
Documentation / Build and Deploy Documentation (pull_request) Successful in 59s
Update flake hashes / update-flake-hashes (pull_request) Successful in 1m12s
Checks / Prek / Pre-commit & Formatting (pull_request) Successful in 1m32s
Checks / Prek / Clippy and Cargo Tests (pull_request) Successful in 8m40s
2026-05-29 17:30:46 +01:00
8b80d64bce
fix: Properly remove event_id from the PDU JSON before upgrading it
2026-05-29 17:15:07 +01:00
f1456e28f4
fix: Hold a federation room lock while remotely joining a room
2026-05-29 14:53:15 +01:00
0fd7194084
fix: Replace our local extremity tracking when joining a disconnected room remotely
2026-05-29 14:45:37 +01:00
c6785befa9
fix: Don't try and fetch zero events
All checks were successful
Checks / Prek / Check changed files (pull_request) Successful in 5s
Checks / Changelog / Check changelog is added (pull_request_target) Successful in 8s
Checks / Prek / Pre-commit & Formatting (pull_request) Successful in 53s
Update flake hashes / update-flake-hashes (pull_request) Successful in 1m12s
Documentation / Build and Deploy Documentation (pull_request) Successful in 1m21s
Checks / Prek / Clippy and Cargo Tests (pull_request) Successful in 8m45s
2026-05-29 14:22:08 +01:00
9daf828724
fix: Fall back to atomic fetch when full-state fetch fails
Some checks failed
Checks / Changelog / Check changelog is added (pull_request_target) Successful in 5s
Checks / Prek / Check changed files (pull_request) Successful in 7s
Documentation / Build and Deploy Documentation (pull_request) Failing after 2m15s
Checks / Prek / Clippy and Cargo Tests (pull_request) Failing after 4m26s
Checks / Prek / Pre-commit & Formatting (pull_request) Failing after 5m18s
2026-05-28 21:13:42 +01:00
4edc4269e7
fix: Remove short-term memory loss
Some checks failed
Checks / Changelog / Check changelog is added (pull_request_target) Successful in 5s
Checks / Prek / Check changed files (pull_request) Successful in 5s
Documentation / Build and Deploy Documentation (pull_request) Failing after 2m17s
Checks / Prek / Clippy and Cargo Tests (pull_request) Failing after 4m27s
Checks / Prek / Pre-commit & Formatting (pull_request) Failing after 5m17s
I keep writing forgetful code, it's a problem
2026-05-28 19:59:33 +01:00
060d0718f1
fix: Don't try to fetch the same event endlessly
2026-05-28 19:30:17 +01:00
de06b7ccf8
fix: Don't repeat already-included metadata in fetch_state instrument
2026-05-28 18:46:27 +01:00
94fb7e6f84
feat: Enhance reliability by fetching full state when we're missing a lot of auth events
Some checks failed
Checks / Prek / Check changed files (pull_request) Successful in 4s
Checks / Changelog / Check changelog is added (pull_request_target) Successful in 7s
Documentation / Build and Deploy Documentation (pull_request) Failing after 2m17s
Checks / Prek / Clippy and Cargo Tests (pull_request) Failing after 4m27s
Checks / Prek / Pre-commit & Formatting (pull_request) Failing after 6m0s
2026-05-28 18:38:27 +01:00
e437fc2351
fix: Calculate max iterations dynamically, and bump max prevs
2026-05-28 17:42:01 +01:00
dc5505f25a
perf(wip): Improve individual events fetcher
2026-05-27 02:43:08 +01:00
a229961c1d
fix: Don't lie about using already-known content
Some checks failed
Checks / Changelog / Check changelog is added (pull_request_target) Successful in 8s
Checks / Prek / Check changed files (pull_request) Successful in 6s
Documentation / Build and Deploy Documentation (pull_request) Failing after 2m18s
Checks / Prek / Clippy and Cargo Tests (pull_request) Failing after 5m52s
Checks / Prek / Pre-commit & Formatting (pull_request) Failing after 25m17s
2026-05-27 00:51:12 +01:00
58b2af3f45
fix: Be smarter when re-receiving already-seen PDUs
Some checks failed
Checks / Changelog / Check changelog is added (pull_request_target) Successful in 8s
Checks / Prek / Check changed files (pull_request) Successful in 33s
Documentation / Build and Deploy Documentation (pull_request) Failing after 2m18s
Checks / Prek / Pre-commit & Formatting (pull_request) Has been cancelled
Checks / Prek / Clippy and Cargo Tests (pull_request) Has been cancelled
2026-05-27 00:43:37 +01:00
8ce6ca50dd
perf: Don't re-process events as outliers
2026-05-27 00:33:57 +01:00
40770cbd08
style: Improve logging
2026-05-27 00:25:56 +01:00
8be7739c9a
fix: Lower floor for min depth
2026-05-27 00:08:11 +01:00
9132e98236
fix: Only increment mindepth on state events
2026-05-26 23:29:31 +01:00
c7f8eec282
chore: Add newsfrag
Some checks failed
Checks / Changelog / Check changelog is added (pull_request_target) Successful in 8s
Checks / Prek / Check changed files (pull_request) Successful in 7s
Documentation / Build and Deploy Documentation (pull_request) Successful in 1m17s
Checks / Prek / Clippy and Cargo Tests (pull_request) Failing after 5m12s
Checks / Prek / Pre-commit & Formatting (pull_request) Successful in 21m16s
2026-05-26 21:26:57 +01:00
0d4bbe612d
feat: Keep track of a min_depth value
Some checks failed
Checks / Changelog / Check changelog is added (pull_request_target) Failing after 8s
Checks / Prek / Check changed files (pull_request) Successful in 32s
Documentation / Build and Deploy Documentation (pull_request) Successful in 1m14s
Checks / Prek / Pre-commit & Formatting (pull_request) Has been cancelled
Checks / Prek / Clippy and Cargo Tests (pull_request) Has been cancelled
Should prevent weird situations where we accidentally gapfill into backfill territory
2026-05-26 21:22:32 +01:00
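One way to read the min_depth idea from the commit above is as a lower bound on how far back gapfill may reach. The sketch below is an assumption about that intent, not the code from this PR: `PduRef` and `within_gapfill_window` are hypothetical names, and the real logic may compute and apply min_depth differently.

```rust
/// Hypothetical minimal view of a PDU: only its event ID and depth.
#[derive(Debug, Clone, PartialEq)]
pub struct PduRef {
    pub event_id: String,
    pub depth: u64,
}

/// Keep only events at or above `min_depth`. Anything shallower is assumed
/// to belong to backfill rather than gapfill and is dropped here.
pub fn within_gapfill_window(events: Vec<PduRef>, min_depth: u64) -> Vec<PduRef> {
    events
        .into_iter()
        .filter(|event| event.depth >= min_depth)
        .collect()
}
```

Under this reading, an event at depth 10 fetched while min_depth is 50 would be left for backfill instead of being pulled in by gapfill.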
c4d297ae3b
perf: Increase default max_fetch_prev_events to 256
Some checks failed
Checks / Changelog / Check changelog is added (pull_request_target) Failing after 8s
Checks / Prek / Check changed files (pull_request) Successful in 27s
Documentation / Build and Deploy Documentation (pull_request) Successful in 1m17s
Checks / Prek / Pre-commit & Formatting (pull_request) Successful in 1m32s
Checks / Prek / Clippy and Cargo Tests (pull_request) Failing after 34m17s
2026-05-26 20:25:37 +01:00
d15064871e
perf: Make max gap depth fetch configurable
Some checks failed
Checks / Changelog / Check changelog is added (pull_request_target) Failing after 33s
Checks / Prek / Check changed files (pull_request) Successful in 10s
Documentation / Build and Deploy Documentation (pull_request) Successful in 1m58s
Checks / Prek / Pre-commit & Formatting (pull_request) Has been cancelled
Checks / Prek / Clippy and Cargo Tests (pull_request) Has been cancelled
2026-05-26 20:22:45 +01:00
b925936195
perf: Improve gap filling, handle missing auth events better
2026-05-26 20:22:45 +01:00
56feba0ea0
fix: This is some bullshit I tell you
2026-05-26 20:22:40 +01:00
8d89ba94d5
feat: Better prev event fetching
fix: Don't panic in debug mode when making an empty notary query
2026-05-26 20:22:10 +01:00
0b135c7717
feat: Add backfill_missing_events helper
2026-05-26 20:22:10 +01:00