feat: implement transactional wrappers around room joins and other key code blocks writing to the database #1455

Closed
gamesguru wants to merge 1 commit from gamesguru/continuwuity:guru/experiment/rocksdb-transactional-wrappers into main
Contributor

Update (3/16/26): I have made solid progress. The room state is being sent without a refresh or cache clear. Unfortunately, there is a race condition associated with it that I need to debug, so I ADVISE NO ONE ELSE TEST THIS CODE UNTIL I PUSH A FIX AND GIVE THE GO-AHEAD!

Update (3/15/26): Added logic & tests to support atomic reads in an effort to get downstream syncs to always work without needing to clear the cache, i.e., hoping to fix !779.


This pull request hopefully prevents the race condition of !1142.
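
For readers unfamiliar with the idea, here is a minimal sketch of what a transactional wrapper can look like. It is not the code from this PR: `Store`, `Batch`, `with_transaction`, `join_room`, and the key layout are all hypothetical names, and an in-memory map stands in for the real database. The sketch buffers writes and applies them only if the closure succeeds, and it panics if a transaction is opened while another is still open on the same store (this toy does not distinguish nesting from concurrent use).

```rust
use std::collections::BTreeMap;
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::Mutex;

/// Hypothetical in-memory stand-in for the database.
pub struct Store {
    data: Mutex<BTreeMap<String, String>>,
    txn_open: AtomicBool,
}

/// Writes buffered by a transaction; nothing is visible until commit.
pub struct Batch {
    writes: Vec<(String, String)>,
}

impl Batch {
    pub fn put(&mut self, key: &str, value: &str) {
        self.writes.push((key.to_owned(), value.to_owned()));
    }
}

/// Clears the "transaction open" flag on drop, including during unwinding.
struct ResetOnDrop<'a>(&'a AtomicBool);

impl Drop for ResetOnDrop<'_> {
    fn drop(&mut self) {
        self.0.store(false, Ordering::SeqCst);
    }
}

impl Store {
    pub fn new() -> Self {
        Store {
            data: Mutex::new(BTreeMap::new()),
            txn_open: AtomicBool::new(false),
        }
    }

    pub fn get(&self, key: &str) -> Option<String> {
        self.data.lock().unwrap().get(key).cloned()
    }

    /// Runs `f` against a write batch and applies the batch only if `f`
    /// returns `Ok`. Panics if a transaction is already open on this store.
    pub fn with_transaction<T, E>(
        &self,
        f: impl FnOnce(&mut Batch) -> Result<T, E>,
    ) -> Result<T, E> {
        if self.txn_open.swap(true, Ordering::SeqCst) {
            panic!("nested transaction on the same store");
        }
        let _reset = ResetOnDrop(&self.txn_open);
        let mut batch = Batch { writes: Vec::new() };
        let result = f(&mut batch);
        if result.is_ok() {
            // The lock is held for the whole loop, so readers observe
            // either none of the writes or all of them.
            let mut data = self.data.lock().unwrap();
            for (key, value) in batch.writes {
                data.insert(key, value);
            }
        }
        result
    }
}

/// Hypothetical join: membership and room state are committed together.
pub fn join_room(store: &Store, room: &str, user: &str) -> Result<(), String> {
    store.with_transaction(|batch| {
        batch.put(&format!("membership/{room}/{user}"), "join");
        batch.put(&format!("state/{room}"), "present");
        Ok(())
    })
}
```

The point of the design is that a failed or interrupted join leaves neither key behind, so a reader never sees a joined membership for a room that has no state.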

Pull request checklist

  • [x] This pull request targets the main branch, and the branch is named something other than
    main.
  • [x] I have written an appropriate pull request title and my description is clear.
  • [x] I understand I am responsible for the contents of this pull request.
  • I have followed the [contributing guidelines][c1]:
    • [x] My contribution follows the [code style][c2], if applicable.
    • [x] I ran [pre-commit checks][c1pc] before opening/drafting this pull request.
    • [ ] I have [tested my contribution][c1t] (or proof-read it for documentation-only changes)
      myself, if applicable. This includes ensuring code compiles.
    • [x] My commit messages follow the [commit message format][c1cm] and are descriptive.
    • [ ] I have written a [news fragment][n1] for this PR, if applicable.
feat: implement transactional wrappers around room joins and other key code blocks writing to the database
All checks were successful
Documentation / Build and Deploy Documentation (pull_request) Has been skipped
Checks / Prek / Pre-commit & Formatting (pull_request) Successful in 15m40s
Checks / Prek / Clippy and Cargo Tests (pull_request) Successful in 24m23s
7b846ccffe
edits in join.rs to support better transactions

lint fixes

add a test to ensure the database nested transaction panic works as expected

add news fragment
nex added this to the (deleted) milestone 2026-02-25 12:42:26 +00:00
nex requested reviews from Jade, nex, ginger 2026-02-25 12:42:33 +00:00
First-time contributor

Unfortunately, this doesn't fix !779. I still need to clear cache in Cinny to make a room join go through.

Author
Contributor

@melogale Okay, thanks for checking. Wishful thinking on my part. Sounds like @Jade may have something they worked on independently. Are you able to provide steps to reproduce, or any relevant errors or logs? Did you observe any problems on this branch? I may test it out; it seems safe.

The main focus of this PR then shifts to preventing the race condition and the server log spam that admins would see from joining busted rooms; see the other issue, !1142.

First-time contributor

I have this branch deployed on nuirons.org, and so far there are no breakages at least. So if it works for !1142, it's still awesome. Forgive the log dump: I still know nothing about Matrix. This is from the DM inviter's end, immediately after sending the DM request.

[Perf]: Mark Request as sent 6 took 0ms index-B47ftdGJ.js:11:164450
[Perf]: BackupRoomKeys: Get keys to backup from rust crypto-sdk took 1ms index-B47ftdGJ.js:11:164450
Backup: Ending loop for version 8082. index-B47ftdGJ.js:11:164450
FetchHttpApi: <-- GET https://matrix.nuirons.org/_matrix/client/v3/sync?filter=xxx&timeout=xxx&org.matrix.msc4222.use_state_after=xxx&since=xxx [16800ms 200] index-B47ftdGJ.js:11:164450
FetchHttpApi: --> GET https://matrix.nuirons.org/_matrix/client/v3/sync?filter=xxx&timeout=xxx&org.matrix.msc4222.use_state_after=xxx&since=xxx index-B47ftdGJ.js:11:164450
FetchHttpApi: <-- GET https://matrix.nuirons.org/_matrix/client/v3/sync?filter=xxx&timeout=xxx&org.matrix.msc4222.use_state_after=xxx&since=xxx [7777ms 200] index-B47ftdGJ.js:11:164450
FetchHttpApi: --> GET https://matrix.nuirons.org/_matrix/client/v3/sync?filter=xxx&timeout=xxx&org.matrix.msc4222.use_state_after=xxx&since=xxx index-B47ftdGJ.js:11:164450
FetchHttpApi: <-- GET https://matrix.nuirons.org/_matrix/client/v3/sync?filter=xxx&timeout=xxx&org.matrix.msc4222.use_state_after=xxx&since=xxx [59ms 200] index-B47ftdGJ.js:11:164450

[MatrixRTCSessionManager] Got room state event for unknown room !jMP1RcSSLddEtaWB6N:nuirons.org! 5 index-B47ftdGJ.js:11:164450

Autoplay is only allowed when approved by the user, the site is activated by the user, or media is muted. index-B47ftdGJ.js:117:25699
Uncaught (in promise) DOMException: The play method is not allowed by the user agent or the platform in the current context, possibly because the user denied permission.
FetchHttpApi: --> GET https://matrix.nuirons.org/_matrix/client/v3/sync?filter=xxx&timeout=xxx&org.matrix.msc4222.use_state_after=xxx&since=xxx index-B47ftdGJ.js:11:164450
FetchHttpApi: --> GET https://matrix.nuirons.org/_matrix/client/v3/devices index-B47ftdGJ.js:11:164450
FetchHttpApi: <-- GET https://matrix.nuirons.org/_matrix/client/v3/devices [25ms 200] index-B47ftdGJ.js:11:164450
joinRoom[!jMP1RcSSLddEtaWB6N:nuirons.org]: preJoinMembership=invite, inviter=@ariagale:nuirons.org, opts={} index-B47ftdGJ.js:11:164450
FetchHttpApi: --> POST https://matrix.nuirons.org/_matrix/client/v3/join/!jMP1RcSSLddEtaWB6N%3Anuirons.org index-B47ftdGJ.js:11:164450
FetchHttpApi: <-- POST https://matrix.nuirons.org/_matrix/client/v3/join/!jMP1RcSSLddEtaWB6N%3Anuirons.org [16ms 200] index-B47ftdGJ.js:11:164450
FetchHttpApi: --> PUT https://matrix.nuirons.org/_matrix/client/v3/user/%40melogale%3Anuirons.org/account_data/m.direct index-B47ftdGJ.js:11:164450
FetchHttpApi: <-- GET https://matrix.nuirons.org/_matrix/client/v3/sync?filter=xxx&timeout=xxx&org.matrix.msc4222.use_state_after=xxx&since=xxx [5073ms 200] index-B47ftdGJ.js:11:164450
FetchHttpApi: --> GET https://matrix.nuirons.org/_matrix/client/v3/sync?filter=xxx&timeout=xxx&org.matrix.msc4222.use_state_after=xxx&since=xxx index-B47ftdGJ.js:11:164450
FetchHttpApi: <-- PUT https://matrix.nuirons.org/_matrix/client/v3/user/%40melogale%3Anuirons.org/account_data/m.direct [16ms 200] index-B47ftdGJ.js:11:164450
FetchHttpApi: <-- GET https://matrix.nuirons.org/_matrix/client/v3/sync?filter=xxx&timeout=xxx&org.matrix.msc4222.use_state_after=xxx&since=xxx [22ms 200] index-B47ftdGJ.js:11:164450
FetchHttpApi: --> GET https://matrix.nuirons.org/_matrix/client/unstable/im.nheko.summary/summary/!jMP1RcSSLddEtaWB6N%3Anuirons.org index-B47ftdGJ.js:11:164450
FetchHttpApi: --> GET https://matrix.nuirons.org/_matrix/client/v3/sync?filter=xxx&timeout=xxx&org.matrix.msc4222.use_state_after=xxx&since=xxx index-B47ftdGJ.js:11:164450
FetchHttpApi: <-- GET https://matrix.nuirons.org/_matrix/client/unstable/im.nheko.summary/summary/!jMP1RcSSLddEtaWB6N%3Anuirons.org [43ms 200] index-B47ftdGJ.js:11:164450
joinRoom[!jMP1RcSSLddEtaWB6N:nuirons.org]: preJoinMembership=invite, inviter=@ariagale:nuirons.org, opts={}

Author
Contributor

@melogale If you still have it running, can you try joining a broken room (issue !1142)? If the server logs start spamming, you would need to issue an admin command to force-leave the room, or else deal with the flood of logs until your admin can.

I appreciate your efforts and bravery! I'm setting up a development box to help with the guinea pig testing!

It's great to hear there haven't been any obvious issues or bugs introduced by this PR with joins or the other database modifications that these transactional wrappers touch!

Here's an example room that is likely still broken:

WARN conduwuit_api::client::sync::v3: error loading joined room err=Room !eBgZCVRnRRkKchiYzS:monero.social has no state room_id=!eBgZCVRnRRkKchiYzS:monero.social
WARN conduwuit_api::client::sync::v3: error loading joined room err=Room !eBgZCVRnRRkKchiYzS:monero.social has no state room_id=!eBgZCVRnRRkKchiYzS:monero.social
  ERROR conduwuit_api::client::sync::v3::joined: Room !eBgZCVRnRRkKchiYzS:monero.social has no state
    at src/api/client/sync/v3/joined.rs:355 on conduwuit:worker ThreadId(2)

WARN conduwuit_api::client::sync::v3: error loading joined room err=Room !eBgZCVRnRRkKchiYzS:monero.social has no state room_id=!eBgZCVRnRRkKchiYzS:monero.social
  ERROR conduwuit_api::client::sync::v3::joined: Room !eBgZCVRnRRkKchiYzS:monero.social has no state
    at src/api/client/sync/v3/joined.rs:355 on conduwuit:worker ThreadId(2)

WARN conduwuit_api::client::sync::v3: error loading joined room err=Room !eBgZCVRnRRkKchiYzS:monero.social has no state room_id=!eBgZCVRnRRkKchiYzS:monero.social
  ERROR conduwuit_api::client::sync::v3::joined: Room !eBgZCVRnRRkKchiYzS:monero.social has no state
    at src/api/client/sync/v3/joined.rs:355 on conduwuit:worker ThreadId(4)
Jade changed title from feat: implement transactional wrappers around room joins and other key code blocks writing to the database to WIP: feat: implement transactional wrappers around room joins and other key code blocks writing to the database 2026-03-03 18:32:20 +00:00
Author
Contributor

Not sure why this was marked WIP. It's been tested, both by me and a volunteer, without apparent detriment.

Note that in my nightly/stateless sync branch I did find more places that need these transactional wrappers, but I did not find any inherent flaw in the implementation, just a few places missing the wrapper.

gamesguru changed title from WIP: feat: implement transactional wrappers around room joins and other key code blocks writing to the database to feat: implement transactional wrappers around room joins and other key code blocks writing to the database 2026-03-16 03:35:40 +00:00
gamesguru force-pushed guru/experiment/rocksdb-transactional-wrappers from 7b846ccffe
All checks were successful
Documentation / Build and Deploy Documentation (pull_request) Has been skipped
Checks / Prek / Pre-commit & Formatting (pull_request) Successful in 15m40s
Checks / Prek / Clippy and Cargo Tests (pull_request) Successful in 24m23s
to 7d5dcd0e4d
Some checks failed
Documentation / Build and Deploy Documentation (pull_request) Has been cancelled
Checks / Prek / Pre-commit & Formatting (pull_request) Has been cancelled
Checks / Prek / Clippy and Cargo Tests (pull_request) Has been cancelled
Update flake hashes / update-flake-hashes (pull_request) Has been cancelled
2026-03-16 03:38:02 +00:00
Owner

Closed due to moderation action

Jade closed this pull request 2026-03-17 02:32:39 +00:00
Some checks failed
Documentation / Build and Deploy Documentation (pull_request) Has been cancelled
Checks / Prek / Pre-commit & Formatting (pull_request) Has been cancelled
Checks / Prek / Clippy and Cargo Tests (pull_request) Has been cancelled
Update flake hashes / update-flake-hashes (pull_request) Has been cancelled

Pull request closed

Reference
continuwuation/continuwuity!1455