roomsynctoken_shortstatehash database table takes up the majority of the database #917

Open
opened 2025-08-03 02:17:17 +00:00 by nex · 5 comments
Owner

Tip

This issue is marked as "needs investigation" as more data is ideal

I've had my server for quite a long time now, and my database has grown to approximately 55GiB. While not problematic, since this is a fraction of the size of my Synapse's database (with 10x as many rooms and one year older), it'd be nice to have some of that space back.

A lot of people have suggested purging rooms (#163), since that fixes the issue on Synapse, but I remembered recently that we recently fixed the admin commands overflowing their response size, so I took a look at !admin debug database-stats, and to my surprise, event-related tables only took up <10GiB, with just shy of 1 million events (around 750k) on record.
Meanwhile, the largest table by far was roomsynctoken_shortstatehash at 33GiB. That's 60% of my database size!

Haven't actually looked at the references yet, but I wonder if it's a potential path to include the shortstatehash in the token itself somehow, rather than maintaining a secondary table for it. Alternatively, a way to clean up old and unused entries maybe?

> [!TIP] > This issue is marked as "needs investigation" as more data is ideal I've had my server for quite a long time now, and my database has grown to approximately 55GiB. While not problematic, since this is a fraction of the size of my Synapse's database (with 10x as many rooms and one year older), it'd be nice to have some of that space back. A lot of people have suggested purging rooms (#163), since that fixes the issue on Synapse, but I remembered recently that we recently fixed the admin commands overflowing their response size, so I took a look at `!admin debug database-stats`, and to my surprise, event-related tables only took up <10GiB, with just shy of 1 million events (around 750k) on record. Meanwhile, the largest table by far was **`roomsynctoken_shortstatehash` at 33GiB**. That's 60% of my database size! Haven't actually looked at the references yet, but I wonder if it's a potential path to include the shortstatehash in the token itself somehow, rather than maintaining a secondary table for it. Alternatively, a way to clean up old and unused entries maybe?
345 KiB
Author
Owner

If anyone else wants to read the output of this admin command (it's incredibly comprehensive), the important parts look like this:

##### tablename:

Level Files Size <...> KeyIn
--------------------------------------
Sum X/Y ...M <...> nK

e.g.

#### foo:


Level Files Size <...> KeyIn
--------------------------------------
Sum 1/0 200M <...> 50K

The sum row tells you the sum of all the compaction levels, so the most relevant stuff.

If anyone else wants to read the output of this admin command (it's incredibly comprehensive), the important parts look like this: ``` ##### tablename: Level Files Size <...> KeyIn -------------------------------------- Sum X/Y ...M <...> nK ``` e.g. ``` #### foo: Level Files Size <...> KeyIn -------------------------------------- Sum 1/0 200M <...> 50K ``` The sum row tells you the sum of all the compaction levels, so the most relevant stuff.
Author
Owner

These look to be the only references to the roomsynctoken_shortstatehash table:

#[implement(Service)]
pub async fn associate_token_shortstatehash(
&self,
room_id: &RoomId,
token: u64,
shortstatehash: ShortStateHash,
) {
let shortroomid = self
.services
.short
.get_shortroomid(room_id)
.await
.expect("room exists");
let _cork = self.db.db.cork();
let key: &[u64] = &[shortroomid, token];
self.db
.roomsynctoken_shortstatehash
.put(key, shortstatehash);
}
#[implement(Service)]
pub async fn get_token_shortstatehash(
&self,
room_id: &RoomId,
token: u64,
) -> Result<ShortStateHash> {
let shortroomid = self.services.short.get_shortroomid(room_id).await?;
let key: &[u64] = &[shortroomid, token];
self.db
.roomsynctoken_shortstatehash
.qry(key)
.await
.deserialized()
}

associate_token_statehash is only referenced in sync v2 (helpfully called v3.rs):

services
.rooms
.user
.associate_token_shortstatehash(room_id, next_batch, current_shortstatehash)
.await;

get_token_shortstatehash on the other hand is referenced in all of the sync versions:

let since_shortstatehash = services.rooms.user.get_token_shortstatehash(room_id, since);

let since_shortstatehash = services
.rooms
.user
.get_token_shortstatehash(room_id, globalsince)
.await
.ok();

let since_shortstatehash = services
.rooms
.user
.get_token_shortstatehash(room_id, globalsince)
.await
.ok();

The simplified sliding sync (v5) call looks to be encryption related, but it looks like the shortstatehash might already be supplied by the client:

let globalsince = body
.pos
.as_ref()
.and_then(|string| string.parse().ok())
.unwrap_or(0);

For sync v2, it may be possible to actually include the SSH in the token itself, and parse it on each incremental sync. Will investigate possibilities another time.

These look to be the only references to the `roomsynctoken_shortstatehash` table: https://forgejo.ellis.link/continuwuation/continuwuity/src/commit/d8311a5ff672fdc4729d956af5e3af8646b0670d/src/service/rooms/user/mod.rs#L94-L129 `associate_token_statehash` is only referenced in sync v2 (helpfully called v3.rs): https://forgejo.ellis.link/continuwuation/continuwuity/src/commit/293e7243b3c08aaed71b89a16544a5e75b9105dc/src/api/client/sync/v3.rs#L853-L857 `get_token_shortstatehash` on the other hand is referenced in all of the sync versions: https://forgejo.ellis.link/continuwuation/continuwuity/src/commit/293e7243b3c08aaed71b89a16544a5e75b9105dc/src/api/client/sync/v3.rs#L482 https://forgejo.ellis.link/continuwuation/continuwuity/src/commit/293e7243b3c08aaed71b89a16544a5e75b9105dc/src/api/client/sync/v4.rs#L189-L194 https://forgejo.ellis.link/continuwuation/continuwuity/src/commit/293e7243b3c08aaed71b89a16544a5e75b9105dc/src/api/client/sync/v5.rs#L740-L745 The simplified sliding sync (v5) call looks to be encryption related, but it looks like the shortstatehash might already be supplied by the client: https://forgejo.ellis.link/continuwuation/continuwuity/src/commit/293e7243b3c08aaed71b89a16544a5e75b9105dc/src/api/client/sync/v5.rs#L74-L78 For sync v2, it may be possible to actually include the SSH in the token itself, and parse it on each incremental sync. Will investigate possibilities another time.
Owner

Here's my database-stats, 88GB of ~106GB is this table for me

Here's my database-stats, 88GB of ~106GB is this table for me
Contributor

decided to check, mine is 86.49 GB out of ~95 GB total

decided to check, mine is 86.49 GB out of ~95 GB total
Owner

Last shot at cleaning tokens up was https://forgejo.ellis.link/continuwuation/continuwuity/src/branch/jade/purge-sync-tokens, it's nearly there but iterating through rooms was buggy.

Last shot at cleaning tokens up was https://forgejo.ellis.link/continuwuation/continuwuity/src/branch/jade/purge-sync-tokens, it's nearly there but iterating through rooms was buggy.
Sign in to join this conversation.
No milestone
No project
No assignees
4 participants
Notifications
Due date
The due date is invalid or out of range. Please use the format "yyyy-mm-dd".

No due date set.

Dependencies

No dependencies set.

Reference: continuwuation/continuwuity#917
No description provided.