# Troubleshooting Continuwuity

:::warning{title="Docker users:"}
Docker can be difficult to use and debug. It's common for Docker
misconfigurations to cause issues, particularly with networking and permissions.
Please check that your issues are not due to problems with your Docker setup.
:::

## Continuwuity issues

### Slow joins to rooms

Some slowness is to be expected if you're the first person on your homeserver to join a room (which will
always be the case for single-user homeservers). In this situation, your homeserver has to verify the signatures of
all of the state events sent by other servers before your join. To make this process as fast as possible, make sure you have
multiple fast, trusted servers listed in `trusted_servers` in your configuration, and ensure
`query_trusted_key_servers_first_on_join` is set to true (the default).
If you need suggestions for trusted servers, ask in the Continuwuity main room.
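
As a rough sketch, the two options mentioned above might look like this in your configuration file. This assumes the TOML config format with options under the `[global]` table. `matrix.org` is used purely as a well-known example, and `trusted.example.org` is a hypothetical placeholder for a second fast server that you trust.

```toml
[global]
# Key servers to ask for other servers' signing keys.
# "trusted.example.org" is a placeholder, not a real recommendation.
trusted_servers = ["matrix.org", "trusted.example.org"]

# Ask the trusted servers for keys first when joining a room.
# This is already the default; it is shown here only for clarity.
query_trusted_key_servers_first_on_join = true
```
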

However, _very_ slow joins, especially to rooms with only a few users in them or rooms created by another user
on your homeserver, may be caused by [issue !779](https://forgejo.ellis.link/continuwuation/continuwuity/issues/779),
which is a longstanding bug with synchronizing room joins to clients. In this situation, you did succeed in joining the room, but
the bug caused your homeserver to forget to tell your client. **To fix this, clear your client's cache.** Both Element and Cinny
have a button to clear their cache in the "About" section of their settings.

### Configuration not working as expected

Sometimes you can make a mistake in your configuration that
means things don't get passed to Continuwuity correctly.
This is particularly easy to do with environment variables.
To check what configuration Continuwuity actually sees, you can
use the `!admin server show-config` command in your admin room.
Beware that this prints out any secrets in your configuration,
so you might want to delete the result afterwards!

### Lost access to admin room

You can reinvite yourself to the admin room through the following methods:

- Use the `--execute "users make_user_admin <username>"` Continuwuity binary
  argument once to invite yourself to the admin room on startup
- Use the Continuwuity console/CLI to run the `users make_user_admin` command
- Or specify the `emergency_password` config option to allow you to temporarily
  log into the server account (`@conduit`) from a web client
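
For the last method, a minimal sketch of the config entry could look like the following. It assumes the TOML config format with options under the `[global]` table; the password value is a placeholder that you should replace with your own.

```toml
[global]
# Temporary password for the server account; the value is a placeholder.
# Since this access is meant to be temporary, remove the option again
# once you have regained access to the admin room.
emergency_password = "replace-with-a-long-random-password"
```
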

## DNS issues

### DNS server overload

If your server experiences any of the following symptoms, your DNS server is likely overloaded:

- Spurious server log entries with "DNS No connections available", "mismatching responding nameservers", or "error sending request"
- Excessively long room joins (30+ minutes) as seen from server logs
- Partial or non-functional outbound federation

These problems are most often encountered in the following scenarios:

- Homeservers hosted on a machine that uses `systemd-resolved`.
- Docker deployments which use the bridge network's forwarding resolver.

Matrix federation is extremely heavy and sends a very large number of DNS requests, which makes general-purpose resolvers like the ones above unsuitable for it. Ultimately, the best fix is to self-host a high-quality caching DNS resolver such as Unbound, and configure Continuwuity to use it.

Follow the [**DNS tuning guide**](./advanced/dns) for details on setting it up.

### Intermittent federation failures to a specific server

Servers may sometimes fail to connect to each other, probably due to a bad DNS cache entry. In such cases, `!admin debug ping <SERVER_NAME>` returns errors.

To fix this, run `!admin query resolver flush-cache <SERVER_NAME>` to clear the bad cache for that domain; outbound requests should then work again.

If many domains are failing, you can also use `!admin server clear-caches` or `!admin query resolver flush-cache -a` to clear all server/resolver caches. Note, however, that this significantly increases your server load for a short period.

## RocksDB / database issues

### Database corruption

If your database is corrupted *and* is failing to start (e.g. checksum
mismatch), it may be recoverable, but careful steps must be taken and there is
no guarantee that recovery will succeed.

The first thing to try is launching Continuwuity with the
`rocksdb_repair` config option set to true. This will tell RocksDB to attempt to
repair itself at launch. If this does not work, disable the option and continue
reading.
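
A minimal sketch of this first repair attempt, assuming the TOML config format with options under the `[global]` table:

```toml
[global]
# Ask RocksDB to attempt a repair when the database is opened at launch.
# Set this back to false (or remove it) after the attempt, whether or not
# it worked.
rocksdb_repair = true
```
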

RocksDB has the following recovery modes:

- `TolerateCorruptedTailRecords`
- `AbsoluteConsistency`
- `PointInTime`
- `SkipAnyCorruptedRecord`

By default, Continuwuity uses `TolerateCorruptedTailRecords`, because such corruption is
generally due to bad federation and the correct data can be re-fetched over federation.
The RocksDB default is `PointInTime`, which will attempt to restore a "snapshot"
of the data when it was last known to be good. This data can be either a few
seconds old, or multiple minutes prior. `PointInTime` may not be suitable for
default usage due to clients and servers possibly not being able to handle
sudden "backwards time travels", and `AbsoluteConsistency` may be too strict.

`AbsoluteConsistency` will fail to start the database if any sign of corruption
is detected. `SkipAnyCorruptedRecord` will skip all forms of corruption unless
the corruption is severe enough to prevent the database from opening. Usage of
`SkipAnyCorruptedRecord` voids any support, as this may cause more damage and/or
leave your database in a permanently inconsistent state, but as a last-ditch
effort it may help if `PointInTime` does not work.

With this in mind:

- First start Continuwuity with the `PointInTime` recovery method. See the [example
  config](./reference/config.mdx) for how to do this using
  `rocksdb_recovery_mode`
- If your database successfully opens, clients are recommended to clear their
  client cache to account for the rollback
- Leave your Continuwuity running in `PointInTime` for at least 30-60 minutes so that as
  much corruption as possible is repaired
- If all goes well, you should be able to switch back to
  `TolerateCorruptedTailRecords`, and you have successfully recovered your database
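
The first step above might be sketched as follows. This assumes the TOML config format with options under the `[global]` table, and it assumes that `rocksdb_recovery_mode` takes a number with the mapping shown in the comments. Treat that mapping as an assumption and confirm it against the example config for your version before relying on it.

```toml
[global]
# Assumed numeric mapping (verify in the example config for your version):
#   0 = AbsoluteConsistency
#   1 = TolerateCorruptedTailRecords (Continuwuity default)
#   2 = PointInTime
#   3 = SkipAnyCorruptedRecord
rocksdb_recovery_mode = 2
```

Once the database has been running cleanly for a while, switching back would mean restoring the value for `TolerateCorruptedTailRecords` or removing the line so the default applies.
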

## Debugging

Note that users should not really need to debug things. If you find yourself
debugging and find the issue, please let us know what it was and/or how we can fix it.
Various debug commands can be found in `!admin debug`.

### Debug/Trace log level

Continuwuity builds without debug or trace log levels at compile time by default
for substantial performance gains in CPU usage and improved compile times. If
you need to access debug/trace log levels, you will need to build without the
`release_max_log_level` feature or use our provided static debug binaries.

### Changing log level dynamically

Continuwuity supports changing the tracing log environment filter on-the-fly using
the admin command `!admin debug change-log-level <log env filter>`. This accepts
a string **without quotes**, in the same format as the `log` config option.

Example: `!admin debug change-log-level debug`

This can also accept complex filters such as:
`!admin debug change-log-level info,conduit_service[{dest="example.com"}]=trace,ruma_state_res=trace`
`!admin debug change-log-level info,conduit_service[{dest="example.com"}]=trace,conduit_service[send{dest="example.org"}]=trace`

To reset the log level to the one that was set at startup / last config
load, simply pass the `--reset` flag.

`!admin debug change-log-level --reset`
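
Because the command takes the same format as the `log` config option, a filter that proved useful can also be set in the config file. The following is a sketch that assumes the TOML config format with options under the `[global]` table and reuses a filter from the examples above. Unlike the admin command, the config value must be a quoted TOML string.

```toml
[global]
# Single quotes make this a TOML literal string, so the double quotes
# inside the filter need no escaping.
log = 'info,conduit_service[{dest="example.com"}]=trace'
```
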

### Pinging servers

Continuwuity can ping other servers using `!admin debug ping <server>`. This takes
a server name, goes through the server discovery process, and queries
`/_matrix/federation/v1/version`. Any errors are reported in the output.

While it does measure the latency of the request, it is not indicative of
server performance on either side, as that endpoint is completely unauthenticated
and simply fetches a string on a static JSON endpoint. It is very low-cost in both
bandwidth and computation.

### Enabling backtraces for errors

Continuwuity can capture backtraces (stack traces) for errors to help diagnose
issues. Backtraces show the exact sequence of function calls that led to an
error, which is invaluable for debugging.

To enable backtraces, set the `RUST_BACKTRACE` environment variable before starting Continuwuity:

```bash
# For both panics and errors
RUST_BACKTRACE=1 ./conduwuit
```

For systemd deployments, add this to your service file:

```ini
[Service]
Environment="RUST_BACKTRACE=1"
```

Backtrace capture has a performance cost. Avoid leaving it on.
You can also enable it only for panics by setting
`RUST_BACKTRACE=1` and `RUST_LIB_BACKTRACE=0`.

### Allocator memory stats

When using jemalloc with jemallocator's `stats` feature (`--enable-stats`), you
can see Continuwuity's high-level allocator stats at the bottom of the output of
`!admin server memory-usage`.

If you are a developer, you can also view the raw jemalloc statistics with
`!admin debug memory-stats`. Please note that this output is extremely large
and, due to PDU size limits, may only be visible in the Continuwuity console CLI.
It is also not easy for non-developers to understand.

[unbound-tuning]: https://unbound.docs.nlnetlabs.nl/en/latest/topics/core/performance.html
[unbound-arch]: https://wiki.archlinux.org/title/Unbound