author | Azrenbeth <7782548+Azrenbeth@users.noreply.github.com> | 2021-09-28 13:50:57 +0100 |
---|---|---|
committer | Azrenbeth <7782548+Azrenbeth@users.noreply.github.com> | 2021-09-28 13:50:57 +0100 |
commit | d6b511e669eeefeb1dfa634c20bb6f1e552a9931 (patch) | |
tree | 27aa217bb202066c75c08e03c1cd07e085796a82 | |
parent | Better search for state database (diff) | |
download | synapse-d6b511e669eeefeb1dfa634c20bb6f1e552a9931.tar.xz |
Tidy up documentation a bit
-rw-r--r-- | docs/state_compressor.md | 127 |
-rw-r--r-- | synapse/config/state_compressor.py | 5 |
2 files changed, 12 insertions, 120 deletions
diff --git a/docs/state_compressor.md b/docs/state_compressor.md
index 97265bfa93..56f21a03cd 100644
--- a/docs/state_compressor.md
+++ b/docs/state_compressor.md
@@ -1,135 +1,27 @@
-TODO: Update with final contents of README after PR #70 merged in rust-synapse-compress-state repo
-
 # State compressor
 
 The state compressor is an **experimental** tool that attempts to reduce the number of rows
-in the `state_groups_state` table inside of a postgres database.
-
-## Introduction to the state tables and compression
-### What is state?
-State is things like who is in a room, what the room topic/name is, who has
-what privilege levels etc. Synapse keeps track of it so that it can spot invalid
-events (e.g. ones sent by banned users, or by people with insufficient privilege).
-
-### What is a state group?
-
-Synapse needs to keep track of the state at the moment of each event. A state group
-corresponds to a unique state. The database table `event_to_state_groups` keeps track
-of the mapping from event ids to state group ids.
-
-Consider the following simplified example:
-```
-State group id | State
-_____________________________________________
- 1 | Alice in room
- 2 | Alice in room, Bob in room
- 3 | Bob in room
-
-
-Event id | What the event was
-______________________________________
- 1 | Alice sends a message
- 3 | Bob joins the room
- 4 | Bob sends a message
- 5 | Alice leaves the room
- 6 | Bob sends a message
-
-
-Event id | State group id
-_________________________
- 1 | 1
- 2 | 1
- 3 | 2
- 4 | 2
- 5 | 3
- 6 | 3
-```
-### What are deltas and predecessors?
-When a new state event happens (e.g. Bob joins the room) a new state group is created.
-BUT instead of copying all of the state from the previous state group, we just store
-the change from the previous group (saving on lots of storage space!). The difference
-from the previous state group is called the "delta"
-
-So for the previous example we would have the following (Note only rows 1 and 2 will
-make sense at this point):
-
-```
-State group id | Previous state group id | Delta
-____________________________________________________________
- 1 | NONE | Alice in room
- 2 | 1 | Bob in room
- 3 | NONE | Bob in room
-```
-So why is state group 3's previous state group NONE and not 2? Well the way that deltas
-work in synapse is that they can only add in new state or overwrite old state, but they
-cannot remove it. (So if the room topic is changed then that is just overwriting state,
-but removing alice from the room is neither an addition or an overwriting). If it is
-impossible to find a delta, then you just start from scratch again with a "snapshot" of
-the entire state.
-
-(NOTE this is not documentation on how synapse handles leaving rooms but is purely for illustrative
-purposes)
-
-The state of a state group is worked out by following the previous state group's and adding
-together all of the deltas (with the most recent taking precedence).
-
-The mapping from state group to previous state group takes place in `state_group_edges`
-and the deltas are stored in `state_groups_state`
-
-### What are we compressing then?
-In order to speed up the converstion from state group id to state, there is a limit of 100
-hops set by synapse (that is: we will only ever have to lookup the deltas for a maximum of
-100 state groups). It does this by taking another "snapshot" every 100 state groups.
-
-However, it is these snapshots that take up the bulk of the storage in a synapse database,
-so we want to find a way to reduce the number of them without dramatically increasing the
-maximum number of hops needed to do lookups.
-
-
-## Compression Algorithm
-
-The algorithm works by attempting to create a *tree* of deltas, produced by
-appending state groups to different "levels". Each level has a maximum size, where
-each state group is appended to the lowest level that is not full. This tool calls a
-state group "compressed" once it has been added to
-one of these levels.
-
-This produces a graph that looks approximately like the following, in the case
-of having two levels with the bottom level (L1) having a maximum size of 3:
-
-```
-L2 <-------------------- L2 <---------- ...
-^--- L1 <--- L1 <--- L1 ^--- L1 <--- L1 <--- L1
-
-NOTE: A <--- B means that state group B's predecessor is A
-```
-The structure that synapse creates by default would be equivalent to having one level with
-a maximum length of 100.
-
-**Note**: Increasing the sum of the sizes of levels will increase the time it
-takes to query the full state of a given state group.
+in the `state_groups_state` table inside of a postgres database. Documentation on how it works
+can be found on [its github repository](https://github.com/matrix-org/rust-synapse-compress-state).
 
 ## Enabling the state compressor
 
 The state compressor requires the python library for the `auto_compressor` tool to be
-installed. Instructions for this can be found in the `README.md` file
-in the <a href=https://github.com/matrix-org/rust-synapse-compress-state>source repo</a> .
+installed. Instructions for this can be found in [the `python.md` file in the source
+repo](https://github.com/matrix-org/rust-synapse-compress-state/blob/main/docs/python.md).
 
 The following configuration options are provided:
 
 - `chunk_size`
-The rough number of state groups to work on at once. All of the entries from
+The number of state groups to work on at once. All of the entries from
 `state_groups_state` are requested from the database for state groups that are worked on.
 Therefore small chunk sizes may be needed on machines with low memory.
 Note: if the compressor fails to find space savings on the chunk as a whole
 (which may well happen in rooms with lots of backfill in) then the entire chunk
-is skipped. This defaults to 500
+is skipped. This defaults to 500
 
-
-- `number_of_rooms`
-The compressor will identify the rooms with the most uncompressed state and run on
-this many of them. This defaults to 5
-
+- `number_of_chunks`
+The compressor will stop once it has finished compressing this many chunks. Defaults to 100
 
 - `default_levels`
 Sizes of each new level in the compression algorithm, as a comma separated list.
@@ -140,7 +32,6 @@ the levels effect the performance of fetching the state from the database, as
 the sum of the sizes is the upper bound on number of iterations needed to fetch a given
 set of state. This defaults to "100,50,25"
 
-
 - `time_between_runs`
 This controls how often the state compressor is run. This defaults to once every day.
 
@@ -150,7 +41,7 @@ An example configuration:
 state_compressor:
   enabled: true
   chunk_size: 500
-  number_of_rooms: 5
+  number_of_chunks: 5
   default_levels: 100,50,25
   time_between_runs: 1d
 ```
\ No newline at end of file
diff --git a/synapse/config/state_compressor.py b/synapse/config/state_compressor.py
index 40390fbf52..92a0b7e533 100644
--- a/synapse/config/state_compressor.py
+++ b/synapse/config/state_compressor.py
@@ -36,7 +36,7 @@ class StateCompressorConfig(Config):
                 raise ConfigError from e
 
         self.compressor_chunk_size = compressor_config.get("chunk_size") or 500
-        self.compressor_number_of_chunks = compressor_config.get("number_of_chunks") or 50
+        self.compressor_number_of_chunks = compressor_config.get("number_of_chunks") or 100
         self.compressor_default_levels = (
             compressor_config.get("default_levels") or "100,50,25"
         )
@@ -67,7 +67,7 @@ class StateCompressorConfig(Config):
         #
         #chunk_size: 1000
 
-        # The number of chunks to compress on each run. Defaults to 50.
+        # The number of chunks to compress on each run. Defaults to 100.
         #
         #number_of_chunks: 1
 
@@ -87,6 +87,7 @@ _STATE_COMPRESSOR_SCHEMA = {
     "properties": {
         "enabled": {"type": "boolean"},
         "chunk_size": {"type": "number"},
+        "number_of_chunks": {"type": "number"},
         "default_levels": {"type": "string"},
         "time_between_runs": {"type": "string"},
     },
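The defaults touched by this commit can be illustrated with a small standalone sketch. It is not Synapse's code: `read_compressor_settings` and the keys of the returned dictionary are hypothetical names chosen for this example. The sketch only mirrors the `.get(...) or default` pattern and the default values visible in the diff (500, 100 and "100,50,25"), and it assumes that `default_levels` is a comma separated string of integers, as the documentation describes.

```python
from typing import Any, Dict


def read_compressor_settings(compressor_config: Dict[str, Any]) -> Dict[str, Any]:
    """Hypothetical helper (not part of Synapse) applying the defaults from the diff."""
    # `value or default` treats every falsy value (None, 0, "") as "not set",
    # so an explicit 0 silently falls back to the default.
    chunk_size = compressor_config.get("chunk_size") or 500
    number_of_chunks = compressor_config.get("number_of_chunks") or 100
    default_levels = compressor_config.get("default_levels") or "100,50,25"

    # Assumption: the levels string is a comma separated list of integers.
    level_sizes = [int(size) for size in default_levels.split(",")]

    return {
        "chunk_size": chunk_size,
        "number_of_chunks": number_of_chunks,
        "level_sizes": level_sizes,
        # The docs state that the sum of the level sizes is the upper bound on
        # the number of iterations needed to fetch a given set of state.
        "max_lookup_iterations": sum(level_sizes),
    }


if __name__ == "__main__":
    # Mirrors the example configuration, which only overrides number_of_chunks.
    print(read_compressor_settings({"number_of_chunks": 5}))
```

One consequence of the `or` idiom is worth noting: a configured value of `0` cannot be distinguished from a missing key, so it is replaced by the default rather than rejected.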