perf: refactor the service manager to achieve significantly faster server initialization #1482

Closed
gamesguru wants to merge 27 commits from gamesguru/continuwuity:guru/fix/cli-optimization/faster-startup-safer-shutdown into main
Contributor

Update (3/14): All systems go; the missing console Ctrl+C is fixed by building with: `cargo build --profile release --features default,console`
~~busted due to moving some things to another PR, especially `Ctrl+C` is badly busted~~


Update (3/5/26): Going to remove the work regarding the shutdown, since I realized there's a better way that I don't have time for at the moment. The startup refactor I feel pretty confident in. The amount of speedup depends on your hardware... may be only 3x faster on a powerful setup, but could be close to 100x on systems that are completely choking up.


Refactor the service manager to achieve 30x faster server initialization and avoid full database scans by implementing index-aware presence resets on a background thread.

And fix the shutdown manager by terminating workers in the order of their dependency hierarchy, preventing shutdown hangs and yielding an extremely fast shutdown sequence that gives special preference only to the RocksDB worker and the integrity of the database.

Pull request checklist:

  • This pull request targets the main branch, and the branch is named something other than
    main.
  • I have written an appropriate pull request title and my description is clear.
  • I understand I am responsible for the contents of this pull request.
  • I have followed the [contributing guidelines][c1]:
    • My contribution follows the [code style][c2], if applicable.
    • I ran [pre-commit checks][c1pc] before opening/drafting this pull request.
    • I have [tested my contribution][c1t] (or proof-read it for documentation-only changes)
      myself, if applicable. This includes ensuring code compiles.
    • My commit messages follow the [commit message format][c1cm] and are descriptive.
    • I have written a [news fragment][n1] for this PR, if applicable.
fix: restore less ominous warning log from guru/nightly
All checks were successful
Documentation / Build and Deploy Documentation (pull_request) Has been skipped
Checks / Prek / Pre-commit & Formatting (pull_request) Successful in 3m44s
Checks / Prek / Clippy and Cargo Tests (pull_request) Successful in 21m36s
6a4f968f5f
gamesguru changed title from Refactor the service manager to achieve 30x faster server initialization to feat: refactor the service manager to achieve 30x faster server initialization 2026-03-03 08:47:50 +00:00
@ -10,3 +10,3 @@
events::{
AnyGlobalAccountDataEventContent, AnyRoomAccountDataEventContent,
GlobalAccountDataEventType, RoomAccountDataEventType,
RoomAccountDataEventType,
Member

FWIW, I believe this change is covered by #1479, so you may want to rebase depending on the outcome of that one.

gamesguru marked this conversation as resolved
Jade changed title from feat: refactor the service manager to achieve 30x faster server initialization to WIP: feat: refactor the service manager to achieve 30x faster server initialization 2026-03-03 18:32:36 +00:00
Merge branch 'main' into guru/fix/cli-optimization/faster-startup-safer-shutdown
Some checks failed
Documentation / Build and Deploy Documentation (pull_request) Has been skipped
Checks / Prek / Pre-commit & Formatting (pull_request) Successful in 4m19s
Checks / Prek / Clippy and Cargo Tests (pull_request) Failing after 36m19s
de9b531955
Author
Contributor

Commits affecting the files below will be moved to other branches. It is advised to use the `git` CLI when reviewing my PRs, especially large ones.

# setup
git remote add ellis-gg https://forgejo.ellis.link/gamesguru/continuwuity.git
git fetch --all

# configure lgb alias in .gitconfig
git config --global alias.lgb "log --graph --pretty=format:'%Cred%h%Creset -%C(yellow)%d%Creset %s %Cgreen(%cr) %C(bold blue)<%aN>%Creset%n' --abbrev-commit --date=relative"

# view raw branch history
git lgb ellis-gg/guru/fix/cli-optimization/faster-startup-safer-shutdown

# view full diff against main (using '...' to see only the changes since diverging from main)
git diff origin/main...ellis-gg/guru/fix/cli-optimization/faster-startup-safer-shutdown

# view diff excluding the undesired files
IGNORE_FILES_LIST=(
  src/api/router/auth.rs
  src/api/server/utils.rs
  src/core/config/mod.rs
  src/database/engine/db_opts.rs
  src/service/resolver/actual.rs
  src/service/resolver/dns.rs
  src/service/server_keys/acquire.rs
)
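# prepend git's ':^' exclusion magic to each path (joined into one string that relies on word splitting below)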
IGNORE="${IGNORE_FILES_LIST[@]/#/:^}"

git diff origin/main...ellis-gg/guru/fix/cli-optimization/faster-startup-safer-shutdown -- $IGNORE
# "stat" only diff (+/- count per file)
git diff --stat origin/main...ellis-gg/guru/fix/cli-optimization/faster-startup-safer-shutdown -- $IGNORE
Author
Contributor

@lveneris I'm not able to see your comment due to some mobile bug. But referring back to my email...

I did manual testing to verify approximately correct behavior.

The commit about the admin service was added (perhaps only on my nightly build branch?) as a direct result of unexpected behavior.

As far as I recall, router is king. He holds the database handles, which in turn hold handles to subprocesses calling out to the DB driver.

The code you are referencing occurs at the end, not the start. So I'm not understanding your specific concern... That the database must continue to function after closing the app? Or that this permanently corrupts the database?

Author
Contributor

The proper, safe way, according to a Rust expert from a bulletin board system, involves both traceability and parallelism refactors, and likely some changes in DB driver behavior.

Author
Contributor

All mine really achieves in its present form on this PR is the avoidance of deadlocks at shutdown time due to congestion along the dependency chain from various services still trying to operate.

It achieves consistently quick shutdowns, perhaps at the expense that, if your computer loses power immediately afterward, you may need to rebuild from the WAL on the next boot, which is painfully slow and mildly risky.
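To illustrate the ordering idea (a minimal sketch with invented service names, not this PR's actual code): services start in dependency order and stop in reverse, so the database worker is the last thing still running and nothing upstream is mid-write when it exits.

```rust
// Minimal sketch, assuming a tokio runtime; all service names are illustrative.

struct Service {
    name: &'static str,
}

impl Service {
    async fn stop(&self) {
        // The real server would signal the worker task and await its join
        // handle; here we only demonstrate the ordering.
        println!("stopping {}", self.name);
    }
}

#[tokio::main]
async fn main() {
    // Startup order: the database comes up first, its consumers after it.
    let services = [
        Service { name: "database" },
        Service { name: "resolver" },
        Service { name: "sender" },
        Service { name: "admin" },
    ];

    // Shutdown walks the same list in reverse, so nothing is still issuing
    // writes when the database worker finally exits.
    for service in services.iter().rev() {
        service.stop().await;
    }
}
```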

Contributor

@gamesguru That wasn't a mobile bug - I reviewed your PR from mobile, didn't see the full filenames when I was looking over things because my phone is small, noticed my mistake within a minute of posting my review, and promptly retracted it. Not the highlight of my career, that's for sure. Apologies for the misunderstanding.

I will go over this on a more appropriate device at a later date. When do you plan to remove / relocate the changes you have deemed irrelevant to this PR?

Author
Contributor

@lveneris
Ok no worries man. I've done the same thing, but usually not leaving an email record of it 😅

You don't have to hang your head in shame, next time just edit the comment. "Edit: whoops realized that was at the end not start! Carry on, I'll see if i notice anything else here."

We're all human... You don't know how many hours I spent staring at the screen for this one, how many builds I cut where I added log statements that didn't tell me what I hoped, or how many of those builds crapped out after 10 minutes from missed formatting changes or basic syntax errors... Any participation here is good, man. Thanks for commenting. It gave me a chance to explain a bit better something that probably left a lot of people unclear.

shelve(logging): add some details to wimpy auth log
Some checks failed
Documentation / Build and Deploy Documentation (pull_request) Has been cancelled
Checks / Prek / Pre-commit & Formatting (pull_request) Has been cancelled
Checks / Prek / Clippy and Cargo Tests (pull_request) Has been cancelled
82b5b6f22a
shelve(logging): log details added to server/utils
Some checks failed
Documentation / Build and Deploy Documentation (pull_request) Has been cancelled
Checks / Prek / Pre-commit & Formatting (pull_request) Has been cancelled
Checks / Prek / Clippy and Cargo Tests (pull_request) Has been cancelled
b5201828e4
@ -2806,2 +2805,3 @@
fn default_client_shutdown_timeout() -> u64 { 10 }
fn default_sender_shutdown_timeout() -> u64 { 5 }
fn default_sender_shutdown_timeout() -> u64 { 3 }
Author
Contributor

There's really no point in waiting longer than 3 seconds after corking. I'm not sure that 5 seconds guarantees any safer shutdown.

Owner

Some people have slower or more overloaded devices.

Author
Contributor

My device is quite overloaded: 150,000 users and 1 GB of RAM. Yet waiting 1 second in my experiments changed nothing about the state of the RocksDB handlers compared to even 120 seconds.

Regardless, it's going into a separate PR for shutdown cleanup logic.

gamesguru marked this conversation as resolved
@ -29,3 +29,3 @@
opts.set_max_subcompactions(num_threads::<u32>(config)?);
opts.set_avoid_unnecessary_blocking_io(true);
opts.set_max_file_opening_threads(0);
opts.set_max_file_opening_threads(num_threads::<i32>(config)?);
Author
Contributor

Supposedly allows for faster ingestion of the WAL on startup, as well as runtime parallelism for read ops. I'm not sure what the default behavior of zero did.

Owner

This appears undocumented.

Author
Contributor

Internally or at RocksDB? See, for example, their C# wrapper:

        /// <summary>
        /// If max_open_files is -1, DB will open all files on DB::Open(). You can
        /// use this option to increase the number of threads used to open the files.
        /// Default: 16
        /// </summary>
        /// <param name="value"></param>
        /// <returns></returns>
        public DbOptions SetMaxFileOpeningThreads(int value)
        {
            Native.Instance.rocksdb_options_set_max_file_opening_threads(Handle, value);
            return this;
        }

As for whoever set it to zero, I'm not sure where they got that idea.
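From the Rust side, a hedged sketch of the same knob via the rust-rocksdb crate (assuming the wrapper exposes `set_max_file_opening_threads` in the version in use; the thread count here is illustrative):

```rust
use rocksdb::Options;

fn main() {
    let mut opts = Options::default();
    // With max_open_files = -1, RocksDB opens every SST file during
    // DB::Open(); max_file_opening_threads controls how many threads share
    // that work (RocksDB's own default is 16).
    opts.set_max_open_files(-1);
    opts.set_max_file_opening_threads(8);
}
```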

@ -373,0 +374,4 @@
"Admin command handler is not yet loaded. The server may still be booting or \
the admin module failed to load.",
));
};
Author
Contributor

This was necessary, as interrupting the program during the initialization sequence led to some pretty horrible crashes.
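A minimal sketch of the guard in the hunk above, with stand-in types (the real code returns the service's own error type):

```rust
// Sketch only: the handler and error types are invented stand-ins.
type AdminHandler = fn(&str);

fn handle_admin_command(handler: Option<AdminHandler>, cmd: &str) -> Result<(), String> {
    // Bail out early if the handler was never installed, instead of
    // unwrapping and crashing mid-boot.
    let Some(handle) = handler else {
        return Err("Admin command handler is not yet loaded. The server may still be \
                    booting or the admin module failed to load."
            .to_owned());
    };
    handle(cmd);
    Ok(())
}

fn main() {
    // During early boot the handler is still None, so this must error, not panic.
    assert!(handle_admin_command(None, "!admin status").is_err());
}
```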

@ -61,0 +67,4 @@
} else {
None
};
Author
Contributor

Presence updates are directly related to this PR.

A dumb, full database scan was the primary culprit for slow startups (see the comment below on the removed code block with call chains).
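Roughly the shape of the change, as a hedged sketch with invented function names: the reset moves off the startup path onto a spawned task, so the listener can come up before the reset finishes.

```rust
use std::time::Instant;

// Stand-in for the real index-aware reset; pretend it touches N users.
async fn reset_presence_to_offline() -> usize {
    5766
}

#[tokio::main]
async fn main() {
    // Before: awaiting the reset here blocked startup for the entire scan.
    let reset = tokio::spawn(async {
        let started = Instant::now();
        let count = reset_presence_to_offline().await;
        println!(
            "Presence reset complete: {count} users reset to offline ({:?})",
            started.elapsed()
        );
    });

    // Startup continues immediately: open sockets, start workers, etc.
    println!("Services startup complete.");

    // The real server keeps its runtime alive; we only join here so the
    // example prints both lines before exiting.
    reset.await.unwrap();
}
```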

@ -185,4 +196,0 @@
.list_local_users()
.map(ToOwned::to_owned)
.collect::<Vec<OwnedUserId>>()
.await
Author
Contributor

I determined this was the ultimate source of the problem. It scanned every user, a notoriously slow raw query.

Owner

This only scans local users -

pub fn list_local_users(&self) -> impl Stream<Item = &UserId> + Send + '_ {

(https://forgejo.ellis.link/continuwuation/continuwuity/src/commit/7207398a9e1cdd7a100d745845062c5460cdcf0b/src/service/users/mod.rs#L361)

- so is unlikely to be the culprit.

Author
Contributor

The issue is the function call chain does not behave as the English names of the methods would suggest!

It actually reads all users into memory (incredibly expensive and slow, basically a full raw query), and only then filters on whether each one is local.

Let me ask you this, Jade: how often do you restart your server? Have you ever made an effort to run it under a debugger and pause it during the very obnoxiously slow startup?

Please give it a try if you're still skeptical about this PR. Then build this PR and try it out yourself; your jaw will absolutely drop at the startup improvements.
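To make the complaint concrete, a toy sketch (invented data, not the real storage API) of why the name misleads: the stream yields every user and only filters afterwards, so the cost scales with the whole table rather than with the number of local users.

```rust
fn main() {
    let server_name = "example.org";
    // Stand-in for the full users column.
    let all_users = [
        "@alice:example.org",
        "@bob:remote.example",
        "@carol:example.org",
    ];

    // "list_local_users" as described: read everything, filter in memory.
    // At 150,000 rows this is effectively a full table scan.
    let suffix = format!(":{server_name}");
    let local: Vec<&str> = all_users
        .iter()
        .copied()
        .filter(|u| u.ends_with(suffix.as_str()))
        .collect();

    println!("{} local users found by scanning {} rows", local.len(), all_users.len());
    // An index-aware version would instead seek a dedicated key prefix (or
    // column family) holding only local users and touch just those rows.
}
```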

@ -135,4 +131,0 @@
.insert(Manager::new(self))
.clone()
.start()
.await?;
Author
Contributor

This was similarly a bit naively implemented.

It's prolly also related to the bug in the `--read-only` option to the CLI, which was ripped out of the code recently.

@ -229,3 +229,3 @@
if let Some(key_ids) = missing.get_mut(server) {
key_ids.retain(|key_id| key_exists(&server_keys, key_id));
key_ids.retain(|key_id| !key_exists(&server_keys, key_id));
Author
Contributor

This is a logical error and performance improvement I would like to get in ASAP.
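A tiny self-contained illustration of the inversion (invented key IDs): `missing` should keep only the keys we still lack, so the predicate has to be negated.

```rust
use std::collections::HashSet;

fn main() {
    let server_keys: HashSet<&str> = HashSet::from(["ed25519:abc"]);
    let mut key_ids = vec!["ed25519:abc", "ed25519:def"];

    // Buggy version: retains exactly the keys that already exist, so we
    // re-fetch what we have and forget what we lack.
    // key_ids.retain(|key_id| server_keys.contains(key_id));

    // Fixed version: retain only the keys that are still missing.
    key_ids.retain(|key_id| !server_keys.contains(key_id));
    assert_eq!(key_ids, ["ed25519:def"]);
}
```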

Owner

To get a fix in sooner, only include that fix, with absolutely no other changes. Without that, it will be delayed by the slowest thing in the PR.

Author
Contributor

This fix is not important to me, as it doesn't really speed things up as much as I hoped. It has anyway been pruned from here and merged separately...

What is important is getting my boot times down from half an hour to 10 seconds.

Please expedite the review process for this PR. I'm completely frustrated with the startup time of half an hour. It's absurd!

gamesguru marked this conversation as resolved
gamesguru changed title from WIP: feat: refactor the service manager to achieve 30x faster server initialization to feat: refactor the service manager to achieve 30x faster server initialization 2026-03-09 06:14:38 +00:00
lint fixes/formatting
Some checks failed
Documentation / Build and Deploy Documentation (pull_request) Has been cancelled
Checks / Prek / Pre-commit & Formatting (pull_request) Has been cancelled
Checks / Prek / Clippy and Cargo Tests (pull_request) Has been cancelled
2110ea455b
@ -142,0 +136,4 @@
let manager = {
let mut lock = self.manager.lock().await;
let manager = Manager::new(self);
_ = lock.insert(Arc::clone(&manager));
Author
Contributor

This is admittedly unnecessary, and part of the problem. Since we're doing it everywhere, it's hard to trace the DB handles or flush them properly at shutdown.

Think I'll revert this part.

Author
Contributor

I simplified this.

gamesguru marked this conversation as resolved
@ -144,4 +148,0 @@
_ = self
.presence
.ping_presence(&self.globals.server_user, &ruma::presence::PresenceState::Online)
.await;
Author
Contributor

However, this, again, was rather slow and needs help.

nex changed title from feat: refactor the service manager to achieve 30x faster server initialization to perf: refactor the service manager to achieve significantly faster server initialization 2026-03-09 16:12:28 +00:00
resolve ginger's migration conflicts in services.rs
Some checks failed
Documentation / Build and Deploy Documentation (pull_request) Has been cancelled
Checks / Prek / Pre-commit & Formatting (pull_request) Has been cancelled
Checks / Prek / Clippy and Cargo Tests (pull_request) Has been cancelled
eae9e322a8
shelve(sep-PR): Revert "fix: missing logic inversion for acquired keys (should speed up room joins)"
Some checks failed
Documentation / Build and Deploy Documentation (pull_request) Has been cancelled
Checks / Prek / Pre-commit & Formatting (pull_request) Has been cancelled
Checks / Prek / Clippy and Cargo Tests (pull_request) Has been cancelled
842277af59
This reverts commit 8cd60f5878.
remove dangling info log
Some checks failed
Documentation / Build and Deploy Documentation (pull_request) Has been cancelled
Checks / Prek / Pre-commit & Formatting (pull_request) Has been cancelled
Checks / Prek / Clippy and Cargo Tests (pull_request) Has been cancelled
518df29bcc
Merge remote-tracking branch 'origin/main' into guru/fix/cli-optimization/faster-startup-safer-shutdown
Some checks failed
Documentation / Build and Deploy Documentation (pull_request) Has been cancelled
Checks / Prek / Pre-commit & Formatting (pull_request) Has been cancelled
Checks / Prek / Clippy and Cargo Tests (pull_request) Has been cancelled
eaaf40dafb
Author
Contributor

I'm going to re-test this. Since there were significant changes requested, it's effectively untested code.

Luckily this is the one branch that starts up quickly and doesn't take half an hour, phew!

gamesguru changed title from perf: refactor the service manager to achieve significantly faster server initialization to wip: perf: refactor the service manager to achieve significantly faster server initialization 2026-03-14 21:44:27 +00:00
gamesguru changed title from wip: perf: refactor the service manager to achieve significantly faster server initialization to perf: refactor the service manager to achieve significantly faster server initialization 2026-03-15 01:23:59 +00:00
Author
Contributor

This is the output from testing today. Notice it takes only 6 seconds to open the database, 2 more seconds to open the socket, and 4 more seconds to clear the presence updates.

Prior to this PR (the threaded DB init and avoiding the full users-table scan), each step took 2-10 minutes.

02:06:56.642  INFO conduwuit::server: 0.5.6 (60c3438d) server_name=nutra.tk database_path="/var/lib/conduwuit" log_levels=info
02:07:02.759  INFO open: Opened database. columns=95 sequence=471854674 time=6.069012619s
02:07:02.884  INFO services: Starting services...
02:07:02.884  INFO services: Running database migrations...
02:07:02.970  INFO migrations: Starting media startup integrity check.
02:07:04.448  INFO migrations: Finished media startup integrity check in 1.4780084 seconds.
02:07:04.448  INFO migrations: Loaded RocksDB database with schema version 18
02:07:04.448  INFO services: Starting service manager...
02:07:04.448  INFO services: Starting service workers...
02:07:04.450  INFO services: Services startup complete.
02:07:04.455  INFO unix: Listening at "/var/lib/conduwuit/conduwuit.sock"
02:07:08.246  INFO presence: Presence reset complete: 5766 users reset to offline.
Merge branch 'main' into guru/fix/cli-optimization/faster-startup-safer-shutdown
Some checks failed
Documentation / Build and Deploy Documentation (pull_request) Has been cancelled
Checks / Prek / Pre-commit & Formatting (pull_request) Has been cancelled
Checks / Prek / Clippy and Cargo Tests (pull_request) Has been cancelled
7a7fa08b5b
Author
Contributor

Not much of a flamegraph, but I can't replicate this on debug builds. Here are logs showing an excess of a minute (19:06:41 -> 19:07:46)... which is already way longer than I've seen on my branches.

The issue seems to get worse the more I use the release binary, and tends to diminish when I run my allegedly healthier WAL'd, parallel, index-based, etc. branches.

flamegraph2.svg

Note: my main domain nutra tk only has ~80,000 linked users through rooms. The mdev nutra tk domain, which sadly is even harder to test due to being on a sync-tokenless v19 schema, has closer to ~150,000, and that is where the issue was more than twice as bad (i.e., a more-than-linear trend, IMHO).

conduwuit@vps16:~/continuwuity$ /usr/bin/conduwuit --maintenance
2026-03-15T19:06:22.756516Z  INFO conduwuit::server: 0.5.6 (2c723381) server_name=nutra.tk database_path="/var/lib/conduwuit" log_levels=info
2026-03-15T19:06:37.273266Z  INFO main:start:open: conduwuit_database::engine::open: Opened database. columns=95 sequence=475973614 time=14.418700934s
2026-03-15T19:06:41.126223Z  INFO main:start: conduwuit_service::migrations: Loaded RocksDB database with schema version 18
^\
2026-03-15T19:07:46.434424Z  WARN signal: conduwuit::signal: Received SIGQUIT
2026-03-15T19:07:46.467073Z  INFO main:stop: conduwuit_service::services: Shutting down services...
2026-03-15T19:07:46.478467Z  INFO main:stop: conduwuit_database::engine: Closing database... sequence=475973636
2026-03-15T19:07:46.497035Z  INFO main:stop: conduwuit_router::run: Shutdown complete.
Owner

Closed due to moderation action

Jade closed this pull request 2026-03-17 02:32:02 +00:00
Some checks failed
Documentation / Build and Deploy Documentation (pull_request) Has been cancelled
Checks / Prek / Pre-commit & Formatting (pull_request) Has been cancelled
Checks / Prek / Clippy and Cargo Tests (pull_request) Has been cancelled

Pull request closed
