From 2ab69e081674596352f3fd4a2af874fad9abfdcf Mon Sep 17 00:00:00 2001
From: erikjohnston It is possible to monitor much of the internal state of Synapse using Prometheus
-metrics and Grafana.
-A guide for configuring Synapse to provide metrics is available here
-and information on setting up Grafana is here.
+ It is possible to monitor much of the internal state of Synapse using Prometheus
+metrics and Grafana.
+A guide for configuring Synapse to provide metrics is available here
+and information on setting up Grafana is here.
In this setup, Prometheus will periodically scrape the information Synapse provides and
store a record of it over time. Grafana is then used as an interface to query and
present this information through a series of pretty graphs. Once you have grafana set up, and assuming you're using our grafana dashboard template, look for the following graphs when debugging a slow/overloaded Synapse: Once you have grafana set up, and assuming you're using our grafana dashboard template, look for the following graphs when debugging a slow/overloaded Synapse: This, along with the CPU and Memory graphs, is a good way to check the general health of your Synapse instance. It represents how long it takes for a user on your homeserver to send a message.Understanding Synapse through Grafana graphs
-Message Event Send Time
This is quite a useful graph. It shows how many times Synapse attempts to retrieve a piece of data from a cache which the cache did not contain, thus resulting in a call to the database. We can see here that the _get_joined_profile_from_event_id
cache is being requested a lot, and often the data we're after is not cached.
Cross-referencing this with the Eviction Rate graph, which shows that entries are being evicted from _get_joined_profile_from_event_id
quite often:
we should probably consider raising the size of that cache by raising its cache factor (a multiplier value for the size of an individual cache). Information on doing so is available here (note that the configuration of individual cache factors through the configuration file is available in Synapse v1.14.0+, whereas doing so through environment variables has been supported for a very long time). Note that this will increase Synapse's overall memory usage.
+we should probably consider raising the size of that cache by raising its cache factor (a multiplier value for the size of an individual cache). Information on doing so is available here (note that the configuration of individual cache factors through the configuration file is available in Synapse v1.14.0+, whereas doing so through environment variables has been supported for a very long time). Note that this will increase Synapse's overall memory usage.
Forward extremities are the leaf events at the end of a DAG in a room, aka events that have no children. The more that exist in a room, the more state resolution that Synapse needs to perform (hint: it's an expensive operation). While Synapse has code to prevent too many of these existing at one time in a room, bugs can sometimes make them crop up again.
If a room has >10 forward extremities, it's worth checking which room is the culprit and potentially removing them using the SQL queries mentioned in #1760.
Large spikes in garbage collection times (bigger than shown here, I'm talking in the -multiple seconds range), can cause lots of problems in Synapse performance. It's more an +
Large spikes in garbage collection times (bigger than shown here, I'm talking in the +multiple seconds range), can cause lots of problems in Synapse performance. It's more an indicator of problems, and a symptom of other problems though, so check other graphs for what might be causing it.
If you're still having performance problems with your Synapse instance and you've +
If you're still having performance problems with your Synapse instance and you've tried everything you can, it may just be a lack of system resources. Consider adding -more CPU and RAM, and make use of worker mode +more CPU and RAM, and make use of worker mode to make use of multiple CPU cores / multiple machines for your homeserver.
-- cgit 1.5.1