From 653044649a198d951e6eef7fcf967c563ba2d761 Mon Sep 17 00:00:00 2001 From: David Robertson Date: Mon, 24 Oct 2022 13:45:31 +0100 Subject: Move dev pages into dev dir --- docs/SUMMARY.md | 22 +- docs/admin_api/media_admin_api.md | 2 +- docs/admin_api/user_admin_api.md | 6 +- docs/auth_chain_diff.dot | 32 -- docs/auth_chain_diff.dot.png | Bin 42427 -> 0 bytes docs/auth_chain_difference_algorithm.md | 141 -------- docs/code_style.md | 130 -------- docs/development/cas.md | 64 ---- docs/development/code_style.md | 130 ++++++++ docs/development/contributing_guide.md | 4 +- .../internal_documentation/auth_chain_diff.dot | 32 ++ .../internal_documentation/auth_chain_diff.dot.png | Bin 0 -> 42427 bytes .../auth_chain_difference_algorithm.md | 141 ++++++++ docs/development/internal_documentation/cas.md | 64 ++++ .../internal_documentation/media_repository.md | 78 +++++ .../internal_documentation/room-dag-concepts.md | 113 +++++++ .../room_and_user_statistics.md | 22 ++ docs/development/internal_documentation/saml.md | 40 +++ docs/development/opentracing.md | 94 ++++++ docs/development/room-dag-concepts.md | 113 ------- docs/development/saml.md | 40 --- .../synapse_architecture/log_contexts.md | 364 +++++++++++++++++++++ .../synapse_architecture/replication.md | 42 +++ .../synapse_architecture/tcp_replication.md | 257 +++++++++++++++ docs/log_contexts.md | 364 --------------------- docs/media_repository.md | 78 ----- docs/opentracing.md | 94 ------ docs/replication.md | 42 --- docs/room_and_user_statistics.md | 22 -- docs/tcp_replication.md | 257 --------------- docs/usage/configuration/config_documentation.md | 4 +- 31 files changed, 1396 insertions(+), 1396 deletions(-) delete mode 100644 docs/auth_chain_diff.dot delete mode 100644 docs/auth_chain_diff.dot.png delete mode 100644 docs/auth_chain_difference_algorithm.md delete mode 100644 docs/code_style.md delete mode 100644 docs/development/cas.md create mode 100644 docs/development/code_style.md create mode 100644 docs/development/internal_documentation/auth_chain_diff.dot create mode 100644 docs/development/internal_documentation/auth_chain_diff.dot.png create mode 100644 docs/development/internal_documentation/auth_chain_difference_algorithm.md create mode 100644 docs/development/internal_documentation/cas.md create mode 100644 docs/development/internal_documentation/media_repository.md create mode 100644 docs/development/internal_documentation/room-dag-concepts.md create mode 100644 docs/development/internal_documentation/room_and_user_statistics.md create mode 100644 docs/development/internal_documentation/saml.md create mode 100644 docs/development/opentracing.md delete mode 100644 docs/development/room-dag-concepts.md delete mode 100644 docs/development/saml.md create mode 100644 docs/development/synapse_architecture/log_contexts.md create mode 100644 docs/development/synapse_architecture/replication.md create mode 100644 docs/development/synapse_architecture/tcp_replication.md delete mode 100644 docs/log_contexts.md delete mode 100644 docs/media_repository.md delete mode 100644 docs/opentracing.md delete mode 100644 docs/replication.md delete mode 100644 docs/room_and_user_statistics.md delete mode 100644 docs/tcp_replication.md (limited to 'docs') diff --git a/docs/SUMMARY.md b/docs/SUMMARY.md index 744a076ef1..ceb96b5c6d 100644 --- a/docs/SUMMARY.md +++ b/docs/SUMMARY.md @@ -80,30 +80,30 @@ # Development - [Contributing Guide](development/contributing_guide.md) - - [Code Style](code_style.md) + - [Code Style](development/code_style.md) 
- [Reviewing Code](development/reviews.md) - [Release Cycle](development/releases.md) - [Git Usage](development/git.md) - [Testing]() - [Demo scripts](development/demo.md) - - [OpenTracing](opentracing.md) + - [OpenTracing](development/opentracing.md) - [Database Schemas](development/database_schema.md) - [Experimental features](development/experimental_features.md) - [Dependency management](development/dependencies.md) - [Synapse Architecture]() - [Cancellation](development/synapse_architecture/cancellation.md) - - [Log Contexts](log_contexts.md) - - [Replication](replication.md) - - [TCP Replication](tcp_replication.md) + - [Log Contexts](development/synapse_architecture/log_contexts.md) + - [Replication](development/synapse_architecture/replication.md) + - [TCP Replication](development/synapse_architecture/tcp_replication.md) - [Internal Documentation](development/internal_documentation/README.md) - [Single Sign-On]() - - [SAML](development/saml.md) - - [CAS](development/cas.md) - - [Room DAG concepts](development/room-dag-concepts.md) + - [SAML](development/internal_documentation/saml.md) + - [CAS](development/internal_documentation/cas.md) + - [Room DAG concepts](development/internal_documentation/room-dag-concepts.md) - [State Resolution]() - - [The Auth Chain Difference Algorithm](auth_chain_difference_algorithm.md) - - [Media Repository](media_repository.md) - - [Room and User Statistics](room_and_user_statistics.md) + - [The Auth Chain Difference Algorithm](development/internal_documentation/auth_chain_difference_algorithm.md) + - [Media Repository](development/internal_documentation/media_repository.md) + - [Room and User Statistics](development/internal_documentation/room_and_user_statistics.md) - [Scripts]() # Other diff --git a/docs/admin_api/media_admin_api.md b/docs/admin_api/media_admin_api.md index d57c5aedae..960c10332f 100644 --- a/docs/admin_api/media_admin_api.md +++ b/docs/admin_api/media_admin_api.md @@ -3,7 +3,7 @@ These APIs allow extracting media information from the homeserver. Details about the format of the `media_id` and storage of the media in the file system -are documented under [media repository](../media_repository.md). +are documented under [media repository](../development/internal_documentation/media_repository.md). To use it, you will need to authenticate by providing an `access_token` for a server admin: see [Admin API](../usage/administration/admin_api). diff --git a/docs/admin_api/user_admin_api.md b/docs/admin_api/user_admin_api.md index c95d6c9b05..800a4de441 100644 --- a/docs/admin_api/user_admin_api.md +++ b/docs/admin_api/user_admin_api.md @@ -548,8 +548,8 @@ The following fields are returned in the JSON response body: ### List media uploaded by a user Gets a list of all local media that a specific `user_id` has created. These are media that the user has uploaded themselves -([local media](../media_repository.md#local-media)), as well as -[URL preview images](../media_repository.md#url-previews) requested by the user if the +([local media](../development/internal_documentation/media_repository.md#local-media)), as well as +[URL preview images](../development/internal_documentation/media_repository.md#url-previews) requested by the user if the [feature is enabled](../usage/configuration/config_documentation.md#url_preview_enabled). By default, the response is ordered by descending creation date and ascending media ID. 
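For illustration, a request to this API might look like the following Python sketch. The endpoint path and query parameters shown are assumptions based on the Admin API reference and should be checked against your Synapse version; the homeserver URL, access token and user ID are placeholders.

```python
import requests

# Placeholders: substitute your homeserver URL, an admin access token and the
# target user's Matrix ID.
BASE_URL = "https://homeserver.example.com"
ACCESS_TOKEN = "<admin_access_token>"
USER_ID = "@alice:example.com"

# Assumed endpoint and query parameters; verify against the Admin API
# reference for your Synapse version before relying on them.
response = requests.get(
    f"{BASE_URL}/_synapse/admin/v1/users/{USER_ID}/media",
    headers={"Authorization": f"Bearer {ACCESS_TOKEN}"},
    params={"limit": 10, "order_by": "created_ts", "dir": "b"},
)
response.raise_for_status()
for item in response.json().get("media", []):
    print(item["media_id"], item.get("media_type"), item.get("media_length"))
```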
@@ -650,7 +650,7 @@ The following fields are returned in the JSON response body: - `last_access_ts` - integer - Timestamp when the content was last accessed in ms. - `media_id` - string - The id used to refer to the media. Details about the format are documented under - [media repository](../media_repository.md). + [media repository](../development/internal_documentation/media_repository.md). - `media_length` - integer - Length of the media in bytes. - `media_type` - string - The MIME-type of the media. - `quarantined_by` - string - The user ID that initiated the quarantine request diff --git a/docs/auth_chain_diff.dot b/docs/auth_chain_diff.dot deleted file mode 100644 index 978d579ada..0000000000 --- a/docs/auth_chain_diff.dot +++ /dev/null @@ -1,32 +0,0 @@ -digraph auth { - nodesep=0.5; - rankdir="RL"; - - C [label="Create (1,1)"]; - - BJ [label="Bob's Join (2,1)", color=red]; - BJ2 [label="Bob's Join (2,2)", color=red]; - BJ2 -> BJ [color=red, dir=none]; - - subgraph cluster_foo { - A1 [label="Alice's invite (4,1)", color=blue]; - A2 [label="Alice's Join (4,2)", color=blue]; - A3 [label="Alice's Join (4,3)", color=blue]; - A3 -> A2 -> A1 [color=blue, dir=none]; - color=none; - } - - PL1 [label="Power Level (3,1)", color=darkgreen]; - PL2 [label="Power Level (3,2)", color=darkgreen]; - PL2 -> PL1 [color=darkgreen, dir=none]; - - {rank = same; C; BJ; PL1; A1;} - - A1 -> C [color=grey]; - A1 -> BJ [color=grey]; - PL1 -> C [color=grey]; - BJ2 -> PL1 [penwidth=2]; - - A3 -> PL2 [penwidth=2]; - A1 -> PL1 -> BJ -> C [penwidth=2]; -} diff --git a/docs/auth_chain_diff.dot.png b/docs/auth_chain_diff.dot.png deleted file mode 100644 index 771c07308f..0000000000 Binary files a/docs/auth_chain_diff.dot.png and /dev/null differ diff --git a/docs/auth_chain_difference_algorithm.md b/docs/auth_chain_difference_algorithm.md deleted file mode 100644 index ebc9de25b8..0000000000 --- a/docs/auth_chain_difference_algorithm.md +++ /dev/null @@ -1,141 +0,0 @@ -# Auth Chain Difference Algorithm - -The auth chain difference algorithm is used by V2 state resolution, where a -naive implementation can be a significant source of CPU and DB usage. - -### Definitions - -A *state set* is a set of state events; e.g. the input of a state resolution -algorithm is a collection of state sets. - -The *auth chain* of a set of events are all the events' auth events and *their* -auth events, recursively (i.e. the events reachable by walking the graph induced -by an event's auth events links). - -The *auth chain difference* of a collection of state sets is the union minus the -intersection of the sets of auth chains corresponding to the state sets, i.e an -event is in the auth chain difference if it is reachable by walking the auth -event graph from at least one of the state sets but not from *all* of the state -sets. - -## Breadth First Walk Algorithm - -A way of calculating the auth chain difference without calculating the full auth -chains for each state set is to do a parallel breadth first walk (ordered by -depth) of each state set's auth chain. By tracking which events are reachable -from each state set we can finish early if every pending event is reachable from -every state set. - -This can work well for state sets that have a small auth chain difference, but -can be very inefficient for larger differences. However, this algorithm is still -used if we don't have a chain cover index for the room (e.g. because we're in -the process of indexing it). 
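As a rough sketch only (not Synapse's implementation), the parallel breadth first walk might look like the following. `get_auth_events` is an assumed helper that returns an event's auth event IDs, and the depth ordering that makes early termination safe is omitted.

```python
from collections import defaultdict

def auth_chain_difference(state_sets, get_auth_events):
    """Return event IDs in the union of the state sets' auth chains but not
    in their intersection.

    `get_auth_events(event_id)` is an assumed helper returning the IDs of an
    event's auth events (in a real server this would be a database lookup).
    """
    num_sets = len(state_sets)
    reachable = defaultdict(set)  # event_id -> indices of state sets that reach it
    frontier = set()

    # Seed the walk: each state event's auth events are reachable from that set.
    for i, state_set in enumerate(state_sets):
        for event_id in state_set:
            for auth_id in get_auth_events(event_id):
                reachable[auth_id].add(i)
                frontier.add(auth_id)

    # Walk backwards through the auth graph, propagating reachability.
    # (The real algorithm additionally orders the walk by event depth, which
    # is what makes it safe to stop early once every pending event is
    # reachable from every state set; this sketch simply walks everything.)
    while frontier:
        next_frontier = set()
        for event_id in frontier:
            for auth_id in get_auth_events(event_id):
                missing = reachable[event_id] - reachable[auth_id]
                if missing:
                    reachable[auth_id] |= missing
                    next_frontier.add(auth_id)
        frontier = next_frontier

    # In the union of the auth chains, but not reachable from every state set.
    return {e for e, sets in reachable.items() if len(sets) < num_sets}
```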
- -## Chain Cover Index - -Synapse computes auth chain differences by pre-computing a "chain cover" index -for the auth chain in a room, allowing us to efficiently make reachability queries -like "is event `A` in the auth chain of event `B`?". We could do this with an index -that tracks all pairs `(A, B)` such that `A` is in the auth chain of `B`. However, this -would be prohibitively large, scaling poorly as the room accumulates more state -events. - -Instead, we break down the graph into *chains*. A chain is a subset of a DAG -with the following property: for any pair of events `E` and `F` in the chain, -the chain contains a path `E -> F` or a path `F -> E`. This forces a chain to be -linear (without forks), e.g. `E -> F -> G -> ... -> H`. Each event in the chain -is given a *sequence number* local to that chain. The oldest event `E` in the -chain has sequence number 1. If `E` has a child `F` in the chain, then `F` has -sequence number 2. If `E` has a grandchild `G` in the chain, then `G` has -sequence number 3; and so on. - -Synapse ensures that each persisted event belongs to exactly one chain, and -tracks how the chains are connected to one another. This allows us to -efficiently answer reachability queries. Doing so uses less storage than -tracking reachability on an event-by-event basis, particularly when we have -fewer and longer chains. See - -> Jagadish, H. (1990). [A compression technique to materialize transitive closure](https://doi.org/10.1145/99935.99944). -> *ACM Transactions on Database Systems (TODS)*, 15*(4)*, 558-598. - -for the original idea or - -> Y. Chen, Y. Chen, [An efficient algorithm for answering graph -> reachability queries](https://doi.org/10.1109/ICDE.2008.4497498), -> in: 2008 IEEE 24th International Conference on Data Engineering, April 2008, -> pp. 893–902. (PDF available via [Google Scholar](https://scholar.google.com/scholar?q=Y.%20Chen,%20Y.%20Chen,%20An%20efficient%20algorithm%20for%20answering%20graph%20reachability%20queries,%20in:%202008%20IEEE%2024th%20International%20Conference%20on%20Data%20Engineering,%20April%202008,%20pp.%20893902.).) - -for a more modern take. - -In practical terms, the chain cover assigns every event a -*chain ID* and *sequence number* (e.g. `(5,3)`), and maintains a map of *links* -between events in chains (e.g. `(5,3) -> (2,4)`) such that `A` is reachable by `B` -(i.e. `A` is in the auth chain of `B`) if and only if either: - -1. `A` and `B` have the same chain ID and `A`'s sequence number is less than `B`'s - sequence number; or -2. there is a link `L` between `B`'s chain ID and `A`'s chain ID such that - `L.start_seq_no` <= `B.seq_no` and `A.seq_no` <= `L.end_seq_no`. - -There are actually two potential implementations, one where we store links from -each chain to every other reachable chain (the transitive closure of the links -graph), and one where we remove redundant links (the transitive reduction of the -links graph) e.g. if we have chains `C3 -> C2 -> C1` then the link `C3 -> C1` -would not be stored. Synapse uses the former implementation so that it doesn't -need to recurse to test reachability between chains. This trades-off extra storage -in order to save CPU cycles and DB queries. - -### Example - -An example auth graph would look like the following, where chains have been -formed based on type/state_key and are denoted by colour and are labelled with -`(chain ID, sequence number)`. Links are denoted by the arrows (links in grey -are those that would be remove in the second implementation described above). 
- -![Example](auth_chain_diff.dot.png) - -Note that we don't include all links between events and their auth events, as -most of those links would be redundant. For example, all events point to the -create event, but each chain only needs the one link from it's base to the -create event. - -## Using the Index - -This index can be used to calculate the auth chain difference of the state sets -by looking at the chain ID and sequence numbers reachable from each state set: - -1. For every state set lookup the chain ID/sequence numbers of each state event -2. Use the index to find all chains and the maximum sequence number reachable - from each state set. -3. The auth chain difference is then all events in each chain that have sequence - numbers between the maximum sequence number reachable from *any* state set and - the minimum reachable by *all* state sets (if any). - -Note that steps 2 is effectively calculating the auth chain for each state set -(in terms of chain IDs and sequence numbers), and step 3 is calculating the -difference between the union and intersection of the auth chains. - -### Worked Example - -For example, given the above graph, we can calculate the difference between -state sets consisting of: - -1. `S1`: Alice's invite `(4,1)` and Bob's second join `(2,2)`; and -2. `S2`: Alice's second join `(4,3)` and Bob's first join `(2,1)`. - -Using the index we see that the following auth chains are reachable from each -state set: - -1. `S1`: `(1,1)`, `(2,2)`, `(3,1)` & `(4,1)` -2. `S2`: `(1,1)`, `(2,1)`, `(3,2)` & `(4,3)` - -And so, for each the ranges that are in the auth chain difference: -1. Chain 1: None, (since everything can reach the create event). -2. Chain 2: The range `(1, 2]` (i.e. just `2`), as `1` is reachable by all state - sets and the maximum reachable is `2` (corresponding to Bob's second join). -3. Chain 3: Similarly the range `(1, 2]` (corresponding to the second power - level). -4. Chain 4: The range `(1, 3]` (corresponding to both of Alice's joins). - -So the final result is: Bob's second join `(2,2)`, the second power level -`(3,2)` and both of Alice's joins `(4,2)` & `(4,3)`. diff --git a/docs/code_style.md b/docs/code_style.md deleted file mode 100644 index d65fda62d1..0000000000 --- a/docs/code_style.md +++ /dev/null @@ -1,130 +0,0 @@ -# Code Style - -## Formatting tools - -The Synapse codebase uses a number of code formatting tools in order to -quickly and automatically check for formatting (and sometimes logical) -errors in code. - -The necessary tools are: - -- [black](https://black.readthedocs.io/en/stable/), a source code formatter; -- [isort](https://pycqa.github.io/isort/), which organises each file's imports; -- [flake8](https://flake8.pycqa.org/en/latest/), which can spot common errors; and -- [mypy](https://mypy.readthedocs.io/en/stable/), a type checker. - -Install them with: - -```sh -pip install -e ".[lint,mypy]" -``` - -The easiest way to run the lints is to invoke the linter script as follows. - -```sh -scripts-dev/lint.sh -``` - -It's worth noting that modern IDEs and text editors can run these tools -automatically on save. It may be worth looking into whether this -functionality is supported in your editor for a more convenient -development workflow. It is not, however, recommended to run `flake8` or `mypy` -on save as they take a while and can be very resource intensive. - -## General rules - -- **Naming**: - - Use `CamelCase` for class and type names - - Use underscores for `function_names` and `variable_names`. 
-- **Docstrings**: should follow the [google code - style](https://google.github.io/styleguide/pyguide.html#38-comments-and-docstrings). - See the - [examples](http://sphinxcontrib-napoleon.readthedocs.io/en/latest/example_google.html) - in the sphinx documentation. -- **Imports**: - - Imports should be sorted by `isort` as described above. - - Prefer to import classes and functions rather than packages or - modules. - - Example: - - ```python - from synapse.types import UserID - ... - user_id = UserID(local, server) - ``` - - is preferred over: - - ```python - from synapse import types - ... - user_id = types.UserID(local, server) - ``` - - (or any other variant). - - This goes against the advice in the Google style guide, but it - means that errors in the name are caught early (at import time). - - - Avoid wildcard imports (`from synapse.types import *`) and - relative imports (`from .types import UserID`). - -## Configuration code and documentation format - -When adding a configuration option to the code, if several settings are grouped into a single dict, ensure that your code -correctly handles the top-level option being set to `None` (as it will be if no sub-options are enabled). - -The [configuration manual](usage/configuration/config_documentation.md) acts as a -reference to Synapse's configuration options for server administrators. -Remember that many readers will be unfamiliar with YAML and server -administration in general, so it is important that when you add -a configuration option the documentation be as easy to understand as possible, which -includes following a consistent format. - -Some guidelines follow: - -- Each option should be listed in the config manual with the following format: - - - The name of the option, prefixed by `###`. - - - A comment which describes the default behaviour (i.e. what - happens if the setting is omitted), as well as what the effect - will be if the setting is changed. - - An example setting, using backticks to define the code block - - For boolean (on/off) options, convention is that this example - should be the *opposite* to the default. For other options, the example should give - some non-default value which is likely to be useful to the reader. - -- There should be a horizontal rule between each option, which can be achieved by adding `---` before and - after the option. -- `true` and `false` are spelt thus (as opposed to `True`, etc.) - -Example: - ---- -### `modules` - -Use the `module` sub-option to add a module under `modules` to extend functionality. -The `module` setting then has a sub-option, `config`, which can be used to define some configuration -for the `module`. - -Defaults to none. - -Example configuration: -```yaml -modules: - - module: my_super_module.MySuperClass - config: - do_thing: true - - module: my_other_super_module.SomeClass - config: {} -``` ---- - -Note that the sample configuration is generated from the synapse code -and is maintained by a script, `scripts-dev/generate_sample_config.sh`. -Making sure that the output from this script matches the desired format -is left as an exercise for the reader! - diff --git a/docs/development/cas.md b/docs/development/cas.md deleted file mode 100644 index 7c0668e034..0000000000 --- a/docs/development/cas.md +++ /dev/null @@ -1,64 +0,0 @@ -# How to test CAS as a developer without a server - -The [django-mama-cas](https://github.com/jbittel/django-mama-cas) project is an -easy to run CAS implementation built on top of Django. - -## Prerequisites - -1. 
Create a new virtualenv: `python3 -m venv ` -2. Activate your virtualenv: `source /path/to/your/virtualenv/bin/activate` -3. Install Django and django-mama-cas: - ```sh - python -m pip install "django<3" "django-mama-cas==2.4.0" - ``` -4. Create a Django project in the current directory: - ```sh - django-admin startproject cas_test . - ``` -5. Follow the [install directions](https://django-mama-cas.readthedocs.io/en/latest/installation.html#configuring) for django-mama-cas -6. Setup the SQLite database: `python manage.py migrate` -7. Create a user: - ```sh - python manage.py createsuperuser - ``` - 1. Use whatever you want as the username and password. - 2. Leave the other fields blank. -8. Use the built-in Django test server to serve the CAS endpoints on port 8000: - ```sh - python manage.py runserver - ``` - -You should now have a Django project configured to serve CAS authentication with -a single user created. - -## Configure Synapse (and Element) to use CAS - -1. Modify your `homeserver.yaml` to enable CAS and point it to your locally - running Django test server: - ```yaml - cas_config: - enabled: true - server_url: "http://localhost:8000" - service_url: "http://localhost:8081" - #displayname_attribute: name - #required_attributes: - # name: value - ``` -2. Restart Synapse. - -Note that the above configuration assumes the homeserver is running on port 8081 -and that the CAS server is on port 8000, both on localhost. - -## Testing the configuration - -Then in Element: - -1. Visit the login page with a Element pointing at your homeserver. -2. Click the Single Sign-On button. -3. Login using the credentials created with `createsuperuser`. -4. You should be logged in. - -If you want to repeat this process you'll need to manually logout first: - -1. http://localhost:8000/admin/ -2. Click "logout" in the top right. diff --git a/docs/development/code_style.md b/docs/development/code_style.md new file mode 100644 index 0000000000..3fb98d7cb7 --- /dev/null +++ b/docs/development/code_style.md @@ -0,0 +1,130 @@ +# Code Style + +## Formatting tools + +The Synapse codebase uses a number of code formatting tools in order to +quickly and automatically check for formatting (and sometimes logical) +errors in code. + +The necessary tools are: + +- [black](https://black.readthedocs.io/en/stable/), a source code formatter; +- [isort](https://pycqa.github.io/isort/), which organises each file's imports; +- [flake8](https://flake8.pycqa.org/en/latest/), which can spot common errors; and +- [mypy](https://mypy.readthedocs.io/en/stable/), a type checker. + +Install them with: + +```sh +pip install -e ".[lint,mypy]" +``` + +The easiest way to run the lints is to invoke the linter script as follows. + +```sh +scripts-dev/lint.sh +``` + +It's worth noting that modern IDEs and text editors can run these tools +automatically on save. It may be worth looking into whether this +functionality is supported in your editor for a more convenient +development workflow. It is not, however, recommended to run `flake8` or `mypy` +on save as they take a while and can be very resource intensive. + +## General rules + +- **Naming**: + - Use `CamelCase` for class and type names + - Use underscores for `function_names` and `variable_names`. +- **Docstrings**: should follow the [google code + style](https://google.github.io/styleguide/pyguide.html#38-comments-and-docstrings). + See the + [examples](http://sphinxcontrib-napoleon.readthedocs.io/en/latest/example_google.html) + in the sphinx documentation. 
+- **Imports**: + - Imports should be sorted by `isort` as described above. + - Prefer to import classes and functions rather than packages or + modules. + + Example: + + ```python + from synapse.types import UserID + ... + user_id = UserID(local, server) + ``` + + is preferred over: + + ```python + from synapse import types + ... + user_id = types.UserID(local, server) + ``` + + (or any other variant). + + This goes against the advice in the Google style guide, but it + means that errors in the name are caught early (at import time). + + - Avoid wildcard imports (`from synapse.types import *`) and + relative imports (`from .types import UserID`). + +## Configuration code and documentation format + +When adding a configuration option to the code, if several settings are grouped into a single dict, ensure that your code +correctly handles the top-level option being set to `None` (as it will be if no sub-options are enabled). + +The [configuration manual](../usage/configuration/config_documentation.md) acts as a +reference to Synapse's configuration options for server administrators. +Remember that many readers will be unfamiliar with YAML and server +administration in general, so it is important that when you add +a configuration option the documentation be as easy to understand as possible, which +includes following a consistent format. + +Some guidelines follow: + +- Each option should be listed in the config manual with the following format: + + - The name of the option, prefixed by `###`. + + - A comment which describes the default behaviour (i.e. what + happens if the setting is omitted), as well as what the effect + will be if the setting is changed. + - An example setting, using backticks to define the code block + + For boolean (on/off) options, convention is that this example + should be the *opposite* to the default. For other options, the example should give + some non-default value which is likely to be useful to the reader. + +- There should be a horizontal rule between each option, which can be achieved by adding `---` before and + after the option. +- `true` and `false` are spelt thus (as opposed to `True`, etc.) + +Example: + +--- +### `modules` + +Use the `module` sub-option to add a module under `modules` to extend functionality. +The `module` setting then has a sub-option, `config`, which can be used to define some configuration +for the `module`. + +Defaults to none. + +Example configuration: +```yaml +modules: + - module: my_super_module.MySuperClass + config: + do_thing: true + - module: my_other_super_module.SomeClass + config: {} +``` +--- + +Note that the sample configuration is generated from the synapse code +and is maintained by a script, `scripts-dev/generate_sample_config.sh`. +Making sure that the output from this script matches the desired format +is left as an exercise for the reader! + diff --git a/docs/development/contributing_guide.md b/docs/development/contributing_guide.md index 1e52f9808c..91488d7f73 100644 --- a/docs/development/contributing_guide.md +++ b/docs/development/contributing_guide.md @@ -103,9 +103,9 @@ Synapse developers. regarding Synapse's Admin API, which is used mostly by sysadmins and external service developers. -Synapse's code style is documented [here](../code_style.md). Please follow +Synapse's code style is documented [here](code_style.md). Please follow it, including the conventions for the [sample configuration -file](../code_style.md#configuration-file-format). +file](code_style.md#configuration-file-format). 
We welcome improvements and additions to our documentation itself! When writing new pages, please diff --git a/docs/development/internal_documentation/auth_chain_diff.dot b/docs/development/internal_documentation/auth_chain_diff.dot new file mode 100644 index 0000000000..978d579ada --- /dev/null +++ b/docs/development/internal_documentation/auth_chain_diff.dot @@ -0,0 +1,32 @@ +digraph auth { + nodesep=0.5; + rankdir="RL"; + + C [label="Create (1,1)"]; + + BJ [label="Bob's Join (2,1)", color=red]; + BJ2 [label="Bob's Join (2,2)", color=red]; + BJ2 -> BJ [color=red, dir=none]; + + subgraph cluster_foo { + A1 [label="Alice's invite (4,1)", color=blue]; + A2 [label="Alice's Join (4,2)", color=blue]; + A3 [label="Alice's Join (4,3)", color=blue]; + A3 -> A2 -> A1 [color=blue, dir=none]; + color=none; + } + + PL1 [label="Power Level (3,1)", color=darkgreen]; + PL2 [label="Power Level (3,2)", color=darkgreen]; + PL2 -> PL1 [color=darkgreen, dir=none]; + + {rank = same; C; BJ; PL1; A1;} + + A1 -> C [color=grey]; + A1 -> BJ [color=grey]; + PL1 -> C [color=grey]; + BJ2 -> PL1 [penwidth=2]; + + A3 -> PL2 [penwidth=2]; + A1 -> PL1 -> BJ -> C [penwidth=2]; +} diff --git a/docs/development/internal_documentation/auth_chain_diff.dot.png b/docs/development/internal_documentation/auth_chain_diff.dot.png new file mode 100644 index 0000000000..771c07308f Binary files /dev/null and b/docs/development/internal_documentation/auth_chain_diff.dot.png differ diff --git a/docs/development/internal_documentation/auth_chain_difference_algorithm.md b/docs/development/internal_documentation/auth_chain_difference_algorithm.md new file mode 100644 index 0000000000..ebc9de25b8 --- /dev/null +++ b/docs/development/internal_documentation/auth_chain_difference_algorithm.md @@ -0,0 +1,141 @@ +# Auth Chain Difference Algorithm + +The auth chain difference algorithm is used by V2 state resolution, where a +naive implementation can be a significant source of CPU and DB usage. + +### Definitions + +A *state set* is a set of state events; e.g. the input of a state resolution +algorithm is a collection of state sets. + +The *auth chain* of a set of events are all the events' auth events and *their* +auth events, recursively (i.e. the events reachable by walking the graph induced +by an event's auth events links). + +The *auth chain difference* of a collection of state sets is the union minus the +intersection of the sets of auth chains corresponding to the state sets, i.e an +event is in the auth chain difference if it is reachable by walking the auth +event graph from at least one of the state sets but not from *all* of the state +sets. + +## Breadth First Walk Algorithm + +A way of calculating the auth chain difference without calculating the full auth +chains for each state set is to do a parallel breadth first walk (ordered by +depth) of each state set's auth chain. By tracking which events are reachable +from each state set we can finish early if every pending event is reachable from +every state set. + +This can work well for state sets that have a small auth chain difference, but +can be very inefficient for larger differences. However, this algorithm is still +used if we don't have a chain cover index for the room (e.g. because we're in +the process of indexing it). + +## Chain Cover Index + +Synapse computes auth chain differences by pre-computing a "chain cover" index +for the auth chain in a room, allowing us to efficiently make reachability queries +like "is event `A` in the auth chain of event `B`?". 
We could do this with an index +that tracks all pairs `(A, B)` such that `A` is in the auth chain of `B`. However, this +would be prohibitively large, scaling poorly as the room accumulates more state +events. + +Instead, we break down the graph into *chains*. A chain is a subset of a DAG +with the following property: for any pair of events `E` and `F` in the chain, +the chain contains a path `E -> F` or a path `F -> E`. This forces a chain to be +linear (without forks), e.g. `E -> F -> G -> ... -> H`. Each event in the chain +is given a *sequence number* local to that chain. The oldest event `E` in the +chain has sequence number 1. If `E` has a child `F` in the chain, then `F` has +sequence number 2. If `E` has a grandchild `G` in the chain, then `G` has +sequence number 3; and so on. + +Synapse ensures that each persisted event belongs to exactly one chain, and +tracks how the chains are connected to one another. This allows us to +efficiently answer reachability queries. Doing so uses less storage than +tracking reachability on an event-by-event basis, particularly when we have +fewer and longer chains. See + +> Jagadish, H. (1990). [A compression technique to materialize transitive closure](https://doi.org/10.1145/99935.99944). +> *ACM Transactions on Database Systems (TODS)*, 15*(4)*, 558-598. + +for the original idea or + +> Y. Chen, Y. Chen, [An efficient algorithm for answering graph +> reachability queries](https://doi.org/10.1109/ICDE.2008.4497498), +> in: 2008 IEEE 24th International Conference on Data Engineering, April 2008, +> pp. 893–902. (PDF available via [Google Scholar](https://scholar.google.com/scholar?q=Y.%20Chen,%20Y.%20Chen,%20An%20efficient%20algorithm%20for%20answering%20graph%20reachability%20queries,%20in:%202008%20IEEE%2024th%20International%20Conference%20on%20Data%20Engineering,%20April%202008,%20pp.%20893902.).) + +for a more modern take. + +In practical terms, the chain cover assigns every event a +*chain ID* and *sequence number* (e.g. `(5,3)`), and maintains a map of *links* +between events in chains (e.g. `(5,3) -> (2,4)`) such that `A` is reachable by `B` +(i.e. `A` is in the auth chain of `B`) if and only if either: + +1. `A` and `B` have the same chain ID and `A`'s sequence number is less than `B`'s + sequence number; or +2. there is a link `L` between `B`'s chain ID and `A`'s chain ID such that + `L.start_seq_no` <= `B.seq_no` and `A.seq_no` <= `L.end_seq_no`. + +There are actually two potential implementations, one where we store links from +each chain to every other reachable chain (the transitive closure of the links +graph), and one where we remove redundant links (the transitive reduction of the +links graph) e.g. if we have chains `C3 -> C2 -> C1` then the link `C3 -> C1` +would not be stored. Synapse uses the former implementation so that it doesn't +need to recurse to test reachability between chains. This trades-off extra storage +in order to save CPU cycles and DB queries. + +### Example + +An example auth graph would look like the following, where chains have been +formed based on type/state_key and are denoted by colour and are labelled with +`(chain ID, sequence number)`. Links are denoted by the arrows (links in grey +are those that would be remove in the second implementation described above). + +![Example](auth_chain_diff.dot.png) + +Note that we don't include all links between events and their auth events, as +most of those links would be redundant. 
For example, all events point to the +create event, but each chain only needs the one link from it's base to the +create event. + +## Using the Index + +This index can be used to calculate the auth chain difference of the state sets +by looking at the chain ID and sequence numbers reachable from each state set: + +1. For every state set lookup the chain ID/sequence numbers of each state event +2. Use the index to find all chains and the maximum sequence number reachable + from each state set. +3. The auth chain difference is then all events in each chain that have sequence + numbers between the maximum sequence number reachable from *any* state set and + the minimum reachable by *all* state sets (if any). + +Note that steps 2 is effectively calculating the auth chain for each state set +(in terms of chain IDs and sequence numbers), and step 3 is calculating the +difference between the union and intersection of the auth chains. + +### Worked Example + +For example, given the above graph, we can calculate the difference between +state sets consisting of: + +1. `S1`: Alice's invite `(4,1)` and Bob's second join `(2,2)`; and +2. `S2`: Alice's second join `(4,3)` and Bob's first join `(2,1)`. + +Using the index we see that the following auth chains are reachable from each +state set: + +1. `S1`: `(1,1)`, `(2,2)`, `(3,1)` & `(4,1)` +2. `S2`: `(1,1)`, `(2,1)`, `(3,2)` & `(4,3)` + +And so, for each the ranges that are in the auth chain difference: +1. Chain 1: None, (since everything can reach the create event). +2. Chain 2: The range `(1, 2]` (i.e. just `2`), as `1` is reachable by all state + sets and the maximum reachable is `2` (corresponding to Bob's second join). +3. Chain 3: Similarly the range `(1, 2]` (corresponding to the second power + level). +4. Chain 4: The range `(1, 3]` (corresponding to both of Alice's joins). + +So the final result is: Bob's second join `(2,2)`, the second power level +`(3,2)` and both of Alice's joins `(4,2)` & `(4,3)`. diff --git a/docs/development/internal_documentation/cas.md b/docs/development/internal_documentation/cas.md new file mode 100644 index 0000000000..7c0668e034 --- /dev/null +++ b/docs/development/internal_documentation/cas.md @@ -0,0 +1,64 @@ +# How to test CAS as a developer without a server + +The [django-mama-cas](https://github.com/jbittel/django-mama-cas) project is an +easy to run CAS implementation built on top of Django. + +## Prerequisites + +1. Create a new virtualenv: `python3 -m venv ` +2. Activate your virtualenv: `source /path/to/your/virtualenv/bin/activate` +3. Install Django and django-mama-cas: + ```sh + python -m pip install "django<3" "django-mama-cas==2.4.0" + ``` +4. Create a Django project in the current directory: + ```sh + django-admin startproject cas_test . + ``` +5. Follow the [install directions](https://django-mama-cas.readthedocs.io/en/latest/installation.html#configuring) for django-mama-cas +6. Setup the SQLite database: `python manage.py migrate` +7. Create a user: + ```sh + python manage.py createsuperuser + ``` + 1. Use whatever you want as the username and password. + 2. Leave the other fields blank. +8. Use the built-in Django test server to serve the CAS endpoints on port 8000: + ```sh + python manage.py runserver + ``` + +You should now have a Django project configured to serve CAS authentication with +a single user created. + +## Configure Synapse (and Element) to use CAS + +1. 
Modify your `homeserver.yaml` to enable CAS and point it to your locally + running Django test server: + ```yaml + cas_config: + enabled: true + server_url: "http://localhost:8000" + service_url: "http://localhost:8081" + #displayname_attribute: name + #required_attributes: + # name: value + ``` +2. Restart Synapse. + +Note that the above configuration assumes the homeserver is running on port 8081 +and that the CAS server is on port 8000, both on localhost. + +## Testing the configuration + +Then in Element: + +1. Visit the login page with a Element pointing at your homeserver. +2. Click the Single Sign-On button. +3. Login using the credentials created with `createsuperuser`. +4. You should be logged in. + +If you want to repeat this process you'll need to manually logout first: + +1. http://localhost:8000/admin/ +2. Click "logout" in the top right. diff --git a/docs/development/internal_documentation/media_repository.md b/docs/development/internal_documentation/media_repository.md new file mode 100644 index 0000000000..23e6da7f31 --- /dev/null +++ b/docs/development/internal_documentation/media_repository.md @@ -0,0 +1,78 @@ +# Media Repository + +*Synapse implementation-specific details for the media repository* + +The media repository + * stores avatars, attachments and their thumbnails for media uploaded by local + users. + * caches avatars, attachments and their thumbnails for media uploaded by remote + users. + * caches resources and thumbnails used for URL previews. + +All media in Matrix can be identified by a unique +[MXC URI](https://spec.matrix.org/latest/client-server-api/#matrix-content-mxc-uris), +consisting of a server name and media ID: +``` +mxc:/// +``` + +## Local Media +Synapse generates 24 character media IDs for content uploaded by local users. +These media IDs consist of upper and lowercase letters and are case-sensitive. +Other homeserver implementations may generate media IDs differently. + +Local media is recorded in the `local_media_repository` table, which includes +metadata such as MIME types, upload times and file sizes. +Note that this table is shared by the URL cache, which has a different media ID +scheme. + +### Paths +A file with media ID `aabbcccccccccccccccccccc` and its `128x96` `image/jpeg` +thumbnail, created by scaling, would be stored at: +``` +local_content/aa/bb/cccccccccccccccccccc +local_thumbnails/aa/bb/cccccccccccccccccccc/128-96-image-jpeg-scale +``` + +## Remote Media +When media from a remote homeserver is requested from Synapse, it is assigned +a local `filesystem_id`, with the same format as locally-generated media IDs, +as described above. + +A record of remote media is stored in the `remote_media_cache` table, which +can be used to map remote MXC URIs (server names and media IDs) to local +`filesystem_id`s. + +### Paths +A file from `matrix.org` with `filesystem_id` `aabbcccccccccccccccccccc` and its +`128x96` `image/jpeg` thumbnail, created by scaling, would be stored at: +``` +remote_content/matrix.org/aa/bb/cccccccccccccccccccc +remote_thumbnail/matrix.org/aa/bb/cccccccccccccccccccc/128-96-image-jpeg-scale +``` +Older thumbnails may omit the thumbnailing method: +``` +remote_thumbnail/matrix.org/aa/bb/cccccccccccccccccccc/128-96-image-jpeg +``` + +Note that `remote_thumbnail/` does not have an `s`. + +## URL Previews + +When generating previews for URLs, Synapse may download and cache various +resources, including images. 
These resources are assigned temporary media IDs +of the form `yyyy-mm-dd_aaaaaaaaaaaaaaaa`, where `yyyy-mm-dd` is the current +date and `aaaaaaaaaaaaaaaa` is a random sequence of 16 case-sensitive letters. + +The metadata for these cached resources is stored in the +`local_media_repository` and `local_media_repository_url_cache` tables. + +Resources for URL previews are deleted after a few days. + +### Paths +The file with media ID `yyyy-mm-dd_aaaaaaaaaaaaaaaa` and its `128x96` +`image/jpeg` thumbnail, created by scaling, would be stored at: +``` +url_cache/yyyy-mm-dd/aaaaaaaaaaaaaaaa +url_cache_thumbnails/yyyy-mm-dd/aaaaaaaaaaaaaaaa/128-96-image-jpeg-scale +``` diff --git a/docs/development/internal_documentation/room-dag-concepts.md b/docs/development/internal_documentation/room-dag-concepts.md new file mode 100644 index 0000000000..76709487f8 --- /dev/null +++ b/docs/development/internal_documentation/room-dag-concepts.md @@ -0,0 +1,113 @@ +# Room DAG concepts + +## Edges + +The word "edge" comes from graph theory lingo. An edge is just a connection +between two events. In Synapse, we connect events by specifying their +`prev_events`. A subsequent event points back at a previous event. + +``` +A (oldest) <---- B <---- C (most recent) +``` + + +## Depth and stream ordering + +Events are normally sorted by `(topological_ordering, stream_ordering)` where +`topological_ordering` is just `depth`. In other words, we first sort by `depth` +and then tie-break based on `stream_ordering`. `depth` is incremented as new +messages are added to the DAG. Normally, `stream_ordering` is an auto +incrementing integer, but backfilled events start with `stream_ordering=-1` and decrement. + +--- + + - `/sync` returns things in the order they arrive at the server (`stream_ordering`). + - `/messages` (and `/backfill` in the federation API) return them in the order determined by the event graph `(topological_ordering, stream_ordering)`. + +The general idea is that, if you're following a room in real-time (i.e. +`/sync`), you probably want to see the messages as they arrive at your server, +rather than skipping any that arrived late; whereas if you're looking at a +historical section of timeline (i.e. `/messages`), you want to see the best +representation of the state of the room as others were seeing it at the time. + +## Outliers + +We mark an event as an `outlier` when we haven't figured out the state for the +room at that point in the DAG yet. They are "floating" events that we haven't +yet correlated to the DAG. + +Outliers typically arise when we fetch the auth chain or state for a given +event. When that happens, we just grab the events in the state/auth chain, +without calculating the state at those events, or backfilling their +`prev_events`. Since we don't have the state at any events fetched in that +way, we mark them as outliers. + +So, typically, we won't have the `prev_events` of an `outlier` in the database, +(though it's entirely possible that we *might* have them for some other +reason). Other things that make outliers different from regular events: + + * We don't have state for them, so there should be no entry in + `event_to_state_groups` for an outlier. (In practice this isn't always + the case, though I'm not sure why: see https://github.com/matrix-org/synapse/issues/12201). + + * We don't record entries for them in the `event_edges`, + `event_forward_extremeties` or `event_backward_extremities` tables. 
+ +Since outliers are not tied into the DAG, they do not normally form part of the +timeline sent down to clients via `/sync` or `/messages`; however there is an +exception: + +### Out-of-band membership events + +A special case of outlier events are some membership events for federated rooms +that we aren't full members of. For example: + + * invites received over federation, before we join the room + * *rejections* for said invites + * knock events for rooms that we would like to join but have not yet joined. + +In all the above cases, we don't have the state for the room, which is why they +are treated as outliers. They are a bit special though, in that they are +proactively sent to clients via `/sync`. + +## Forward extremity + +Most-recent-in-time events in the DAG which are not referenced by any other +events' `prev_events` yet. (In this definition, outliers, rejected events, and +soft-failed events don't count.) + +The forward extremities of a room (or at least, a subset of them, if there are +more than ten) are used as the `prev_events` when the next event is sent. + +The "current state" of a room (ie: the state which would be used if we +generated a new event) is, therefore, the resolution of the room states +at each of the forward extremities. + +## Backward extremity + +The current marker of where we have backfilled up to and will generally be the +`prev_events` of the oldest-in-time events we have in the DAG. This gives a starting point when +backfilling history. + +Note that, unlike forward extremities, we typically don't have any backward +extremity events themselves in the database - or, if we do, they will be "outliers" (see +above). Either way, we don't expect to have the room state at a backward extremity. + +When we persist a non-outlier event, if it was previously a backward extremity, +we clear it as a backward extremity and set all of its `prev_events` as the new +backward extremities if they aren't already persisted as non-outliers. This +therefore keeps the backward extremities up-to-date. + +## State groups + +For every non-outlier event we need to know the state at that event. Instead of +storing the full state for each event in the DB (i.e. a `event_id -> state` +mapping), which is *very* space inefficient when state doesn't change, we +instead assign each different set of state a "state group" and then have +mappings of `event_id -> state_group` and `state_group -> state`. + + +### Stage group edges + +TODO: `state_group_edges` is a further optimization... + notes from @Azrenbeth, https://pastebin.com/seUGVGeT diff --git a/docs/development/internal_documentation/room_and_user_statistics.md b/docs/development/internal_documentation/room_and_user_statistics.md new file mode 100644 index 0000000000..cc38c890bb --- /dev/null +++ b/docs/development/internal_documentation/room_and_user_statistics.md @@ -0,0 +1,22 @@ +Room and User Statistics +======================== + +Synapse maintains room and user statistics in various tables. These can be used +for administrative purposes but are also used when generating the public room +directory. + + +# Synapse Developer Documentation + +## High-Level Concepts + +### Definitions + +* **subject**: Something we are tracking stats about – currently a room or user. +* **current row**: An entry for a subject in the appropriate current statistics + table. Each subject can have only one. + +### Overview + +Stats correspond to the present values. Current rows contain the most up-to-date +statistics for a room. 
Each subject can only have one entry. diff --git a/docs/development/internal_documentation/saml.md b/docs/development/internal_documentation/saml.md new file mode 100644 index 0000000000..b08bcb7419 --- /dev/null +++ b/docs/development/internal_documentation/saml.md @@ -0,0 +1,40 @@ +# How to test SAML as a developer without a server + +https://fujifish.github.io/samling/samling.html (https://github.com/fujifish/samling) is a great resource for being able to tinker with the +SAML options within Synapse without needing to deploy and configure a complicated software stack. + +To make Synapse (and therefore Element) use it: + +1. Use the samling.html URL above or deploy your own and visit the IdP Metadata tab. +2. Copy the XML to your clipboard. +3. On your Synapse server, create a new file `samling.xml` next to your `homeserver.yaml` with + the XML from step 2 as the contents. +4. Edit your `homeserver.yaml` to include: + ```yaml + saml2_config: + sp_config: + allow_unknown_attributes: true # Works around a bug with AVA Hashes: https://github.com/IdentityPython/pysaml2/issues/388 + metadata: + local: ["samling.xml"] + ``` +5. Ensure that your `homeserver.yaml` has a setting for `public_baseurl`: + ```yaml + public_baseurl: http://localhost:8080/ + ``` +6. Run `apt-get install xmlsec1` and `pip install --upgrade --force 'pysaml2>=4.5.0'` to ensure + the dependencies are installed and ready to go. +7. Restart Synapse. + +Then in Element: + +1. Visit the login page and point Element towards your homeserver using the `public_baseurl` above. +2. Click the Single Sign-On button. +3. On the samling page, enter a Name Identifier and add a SAML Attribute for `uid=your_localpart`. + The response must also be signed. +4. Click "Next". +5. Click "Post Response" (change nothing). +6. You should be logged in. + +If you try and repeat this process, you may be automatically logged in using the information you +gave previously. To fix this, open your developer console (`F12` or `Ctrl+Shift+I`) while on the +samling page and clear the site data. In Chrome, this will be a button on the Application tab. diff --git a/docs/development/opentracing.md b/docs/development/opentracing.md new file mode 100644 index 0000000000..26e5c8b605 --- /dev/null +++ b/docs/development/opentracing.md @@ -0,0 +1,94 @@ +# OpenTracing + +## Background + +OpenTracing is a semi-standard being adopted by a number of distributed +tracing platforms. It is a common api for facilitating vendor-agnostic +tracing instrumentation. That is, we can use the OpenTracing api and +select one of a number of tracer implementations to do the heavy lifting +in the background. Our current selected implementation is Jaeger. + +OpenTracing is a tool which gives an insight into the causal +relationship of work done in and between servers. The servers each track +events and report them to a centralised server - in Synapse's case: +Jaeger. The basic unit used to represent events is the span. The span +roughly represents a single piece of work that was done and the time at +which it occurred. A span can have child spans, meaning that the work of +the child had to be completed for the parent span to complete, or it can +have follow-on spans which represent work that is undertaken as a result +of the parent but is not depended on by the parent to in order to +finish. + +Since this is undertaken in a distributed environment a request to +another server, such as an RPC or a simple GET, can be considered a span +(a unit or work) for the local server. 
This causal link is what +OpenTracing aims to capture and visualise. In order to do this metadata +about the local server's span, i.e the 'span context', needs to be +included with the request to the remote. + +It is up to the remote server to decide what it does with the spans it +creates. This is called the sampling policy and it can be configured +through Jaeger's settings. + +For OpenTracing concepts see +. + +For more information about Jaeger's implementation see + + +## Setting up OpenTracing + +To receive OpenTracing spans, start up a Jaeger server. This can be done +using docker like so: + +```sh +docker run -d --name jaeger \ + -p 6831:6831/udp \ + -p 6832:6832/udp \ + -p 5778:5778 \ + -p 16686:16686 \ + -p 14268:14268 \ + jaegertracing/all-in-one:1 +``` + +Latest documentation is probably at +https://www.jaegertracing.io/docs/latest/getting-started. + +## Enable OpenTracing in Synapse + +OpenTracing is not enabled by default. It must be enabled in the +homeserver config by adding the `opentracing` option to your config file. You can find +documentation about how to do this in the [config manual under the header 'Opentracing'](../usage/configuration/config_documentation.md#opentracing). +See below for an example Opentracing configuration: + +```yaml +opentracing: + enabled: true + homeserver_whitelist: + - "mytrustedhomeserver.org" + - "*.myotherhomeservers.com" +``` + +## Homeserver whitelisting + +The homeserver whitelist is configured using regular expressions. A list +of regular expressions can be given and their union will be compared +when propagating any spans contexts to another homeserver. + +Though it's mostly safe to send and receive span contexts to and from +untrusted users since span contexts are usually opaque ids it can lead +to two problems, namely: + +- If the span context is marked as sampled by the sending homeserver + the receiver will sample it. Therefore two homeservers with wildly + different sampling policies could incur higher sampling counts than + intended. +- Sending servers can attach arbitrary data to spans, known as + 'baggage'. For safety this has been disabled in Synapse but that + doesn't prevent another server sending you baggage which will be + logged to OpenTracing's logs. + +## Configuring Jaeger + +Sampling strategies can be set as in this document: +. diff --git a/docs/development/room-dag-concepts.md b/docs/development/room-dag-concepts.md deleted file mode 100644 index 76709487f8..0000000000 --- a/docs/development/room-dag-concepts.md +++ /dev/null @@ -1,113 +0,0 @@ -# Room DAG concepts - -## Edges - -The word "edge" comes from graph theory lingo. An edge is just a connection -between two events. In Synapse, we connect events by specifying their -`prev_events`. A subsequent event points back at a previous event. - -``` -A (oldest) <---- B <---- C (most recent) -``` - - -## Depth and stream ordering - -Events are normally sorted by `(topological_ordering, stream_ordering)` where -`topological_ordering` is just `depth`. In other words, we first sort by `depth` -and then tie-break based on `stream_ordering`. `depth` is incremented as new -messages are added to the DAG. Normally, `stream_ordering` is an auto -incrementing integer, but backfilled events start with `stream_ordering=-1` and decrement. - ---- - - - `/sync` returns things in the order they arrive at the server (`stream_ordering`). 
- - `/messages` (and `/backfill` in the federation API) return them in the order determined by the event graph `(topological_ordering, stream_ordering)`. - -The general idea is that, if you're following a room in real-time (i.e. -`/sync`), you probably want to see the messages as they arrive at your server, -rather than skipping any that arrived late; whereas if you're looking at a -historical section of timeline (i.e. `/messages`), you want to see the best -representation of the state of the room as others were seeing it at the time. - -## Outliers - -We mark an event as an `outlier` when we haven't figured out the state for the -room at that point in the DAG yet. They are "floating" events that we haven't -yet correlated to the DAG. - -Outliers typically arise when we fetch the auth chain or state for a given -event. When that happens, we just grab the events in the state/auth chain, -without calculating the state at those events, or backfilling their -`prev_events`. Since we don't have the state at any events fetched in that -way, we mark them as outliers. - -So, typically, we won't have the `prev_events` of an `outlier` in the database, -(though it's entirely possible that we *might* have them for some other -reason). Other things that make outliers different from regular events: - - * We don't have state for them, so there should be no entry in - `event_to_state_groups` for an outlier. (In practice this isn't always - the case, though I'm not sure why: see https://github.com/matrix-org/synapse/issues/12201). - - * We don't record entries for them in the `event_edges`, - `event_forward_extremeties` or `event_backward_extremities` tables. - -Since outliers are not tied into the DAG, they do not normally form part of the -timeline sent down to clients via `/sync` or `/messages`; however there is an -exception: - -### Out-of-band membership events - -A special case of outlier events are some membership events for federated rooms -that we aren't full members of. For example: - - * invites received over federation, before we join the room - * *rejections* for said invites - * knock events for rooms that we would like to join but have not yet joined. - -In all the above cases, we don't have the state for the room, which is why they -are treated as outliers. They are a bit special though, in that they are -proactively sent to clients via `/sync`. - -## Forward extremity - -Most-recent-in-time events in the DAG which are not referenced by any other -events' `prev_events` yet. (In this definition, outliers, rejected events, and -soft-failed events don't count.) - -The forward extremities of a room (or at least, a subset of them, if there are -more than ten) are used as the `prev_events` when the next event is sent. - -The "current state" of a room (ie: the state which would be used if we -generated a new event) is, therefore, the resolution of the room states -at each of the forward extremities. - -## Backward extremity - -The current marker of where we have backfilled up to and will generally be the -`prev_events` of the oldest-in-time events we have in the DAG. This gives a starting point when -backfilling history. - -Note that, unlike forward extremities, we typically don't have any backward -extremity events themselves in the database - or, if we do, they will be "outliers" (see -above). Either way, we don't expect to have the room state at a backward extremity. 
- -When we persist a non-outlier event, if it was previously a backward extremity, -we clear it as a backward extremity and set all of its `prev_events` as the new -backward extremities if they aren't already persisted as non-outliers. This -therefore keeps the backward extremities up-to-date. - -## State groups - -For every non-outlier event we need to know the state at that event. Instead of -storing the full state for each event in the DB (i.e. a `event_id -> state` -mapping), which is *very* space inefficient when state doesn't change, we -instead assign each different set of state a "state group" and then have -mappings of `event_id -> state_group` and `state_group -> state`. - - -### Stage group edges - -TODO: `state_group_edges` is a further optimization... - notes from @Azrenbeth, https://pastebin.com/seUGVGeT diff --git a/docs/development/saml.md b/docs/development/saml.md deleted file mode 100644 index b08bcb7419..0000000000 --- a/docs/development/saml.md +++ /dev/null @@ -1,40 +0,0 @@ -# How to test SAML as a developer without a server - -https://fujifish.github.io/samling/samling.html (https://github.com/fujifish/samling) is a great resource for being able to tinker with the -SAML options within Synapse without needing to deploy and configure a complicated software stack. - -To make Synapse (and therefore Element) use it: - -1. Use the samling.html URL above or deploy your own and visit the IdP Metadata tab. -2. Copy the XML to your clipboard. -3. On your Synapse server, create a new file `samling.xml` next to your `homeserver.yaml` with - the XML from step 2 as the contents. -4. Edit your `homeserver.yaml` to include: - ```yaml - saml2_config: - sp_config: - allow_unknown_attributes: true # Works around a bug with AVA Hashes: https://github.com/IdentityPython/pysaml2/issues/388 - metadata: - local: ["samling.xml"] - ``` -5. Ensure that your `homeserver.yaml` has a setting for `public_baseurl`: - ```yaml - public_baseurl: http://localhost:8080/ - ``` -6. Run `apt-get install xmlsec1` and `pip install --upgrade --force 'pysaml2>=4.5.0'` to ensure - the dependencies are installed and ready to go. -7. Restart Synapse. - -Then in Element: - -1. Visit the login page and point Element towards your homeserver using the `public_baseurl` above. -2. Click the Single Sign-On button. -3. On the samling page, enter a Name Identifier and add a SAML Attribute for `uid=your_localpart`. - The response must also be signed. -4. Click "Next". -5. Click "Post Response" (change nothing). -6. You should be logged in. - -If you try and repeat this process, you may be automatically logged in using the information you -gave previously. To fix this, open your developer console (`F12` or `Ctrl+Shift+I`) while on the -samling page and clear the site data. In Chrome, this will be a button on the Application tab. diff --git a/docs/development/synapse_architecture/log_contexts.md b/docs/development/synapse_architecture/log_contexts.md new file mode 100644 index 0000000000..cb15dbe158 --- /dev/null +++ b/docs/development/synapse_architecture/log_contexts.md @@ -0,0 +1,364 @@ +# Log Contexts + +To help track the processing of individual requests, synapse uses a +'`log context`' to track which request it is handling at any given +moment. This is done via a thread-local variable; a `logging.Filter` is +then used to fish the information back out of the thread-local variable +and add it to each log record. 
+ +Logcontexts are also used for CPU and database accounting, so that we +can track which requests were responsible for high CPU use or database +activity. + +The `synapse.logging.context` module provides facilities for managing +the current log context (as well as providing the `LoggingContextFilter` +class). + +Asynchronous functions make the whole thing complicated, so this document describes +how it all works, and how to write code which follows the rules. + +In this document, "awaitable" refers to any object which can be `await`ed. In the context of +Synapse, that normally means either a coroutine or a Twisted +[`Deferred`](https://twistedmatrix.com/documents/current/api/twisted.internet.defer.Deferred.html). + +## Logcontexts without asynchronous code + +In the absence of any asynchronous voodoo, things are simple enough. As with +any code of this nature, the rule is that our function should leave +things as it found them: + +```python +from synapse.logging import context # omitted from future snippets + +def handle_request(request_id): + request_context = context.LoggingContext() + + calling_context = context.set_current_context(request_context) + try: + request_context.request = request_id + do_request_handling() + logger.debug("finished") + finally: + context.set_current_context(calling_context) + +def do_request_handling(): + logger.debug("phew") # this will be logged against request_id +``` + +LoggingContext implements the context management methods, so the above +can be written much more succinctly as: + +```python +def handle_request(request_id): + with context.LoggingContext() as request_context: + request_context.request = request_id + do_request_handling() + logger.debug("finished") + +def do_request_handling(): + logger.debug("phew") +``` + +## Using logcontexts with awaitables + +Awaitables break the linear flow of code so that there is no longer a single entry point +where we should set the logcontext and a single exit point where we should remove it. + +Consider the example above, where `do_request_handling` needs to do some +blocking operation, and returns an awaitable: + +```python +async def handle_request(request_id): + with context.LoggingContext() as request_context: + request_context.request = request_id + await do_request_handling() + logger.debug("finished") +``` + +In the above flow: + +- The logcontext is set +- `do_request_handling` is called, and returns an awaitable +- `handle_request` awaits the awaitable +- Execution of `handle_request` is suspended + +So we have stopped processing the request (and will probably go on to +start processing the next), without clearing the logcontext. + +To circumvent this problem, synapse code assumes that, wherever you have +an awaitable, you will want to `await` it. To that end, whereever +functions return awaitables, we adopt the following conventions: + +**Rules for functions returning awaitables:** + +> - If the awaitable is already complete, the function returns with the +> same logcontext it started with. +> - If the awaitable is incomplete, the function clears the logcontext +> before returning; when the awaitable completes, it restores the +> logcontext before running any callbacks. + +That sounds complicated, but actually it means a lot of code (including +the example above) "just works". There are two cases: + +- If `do_request_handling` returns a completed awaitable, then the + logcontext will still be in place. 
In this case, execution will + continue immediately after the `await`; the "finished" line will + be logged against the right context, and the `with` block restores + the original context before we return to the caller. +- If the returned awaitable is incomplete, `do_request_handling` clears + the logcontext before returning. The logcontext is therefore clear + when `handle_request` `await`s the awaitable. + + Once `do_request_handling`'s awaitable completes, it will reinstate + the logcontext, before running the second half of `handle_request`, + so again the "finished" line will be logged against the right context, + and the `with` block restores the original context. + +As an aside, it's worth noting that `handle_request` follows our rules +- though that only matters if the caller has its own logcontext which it +cares about. + +The following sections describe pitfalls and helpful patterns when +implementing these rules. + +Always await your awaitables +---------------------------- + +Whenever you get an awaitable back from a function, you should `await` on +it as soon as possible. Do not pass go; do not do any logging; do not +call any other functions. + +```python +async def fun(): + logger.debug("starting") + await do_some_stuff() # just like this + + coro = more_stuff() + result = await coro # also fine, of course + + return result +``` + +Provided this pattern is followed all the way back up to the callchain +to where the logcontext was set, this will make things work out ok: +provided `do_some_stuff` and `more_stuff` follow the rules above, then +so will `fun`. + +It's all too easy to forget to `await`: for instance if we forgot that +`do_some_stuff` returned an awaitable, we might plough on regardless. This +leads to a mess; it will probably work itself out eventually, but not +before a load of stuff has been logged against the wrong context. +(Normally, other things will break, more obviously, if you forget to +`await`, so this tends not to be a major problem in practice.) + +Of course sometimes you need to do something a bit fancier with your +awaitable - not all code follows the linear A-then-B-then-C pattern. +Notes on implementing more complex patterns are in later sections. + +## Where you create a new awaitable, make it follow the rules + +Most of the time, an awaitable comes from another synapse function. +Sometimes, though, we need to make up a new awaitable, or we get an awaitable +back from external code. We need to make it follow our rules. + +The easy way to do it is by using `context.make_deferred_yieldable`. Suppose we want to implement +`sleep`, which returns a deferred which will run its callbacks after a +given number of seconds. That might look like: + +```python +# not a logcontext-rules-compliant function +def get_sleep_deferred(seconds): + d = defer.Deferred() + reactor.callLater(seconds, d.callback, None) + return d +``` + +That doesn't follow the rules, but we can fix it by calling it through +`context.make_deferred_yieldable`: + +```python +async def sleep(seconds): + return await context.make_deferred_yieldable(get_sleep_deferred(seconds)) +``` + +## Fire-and-forget + +Sometimes you want to fire off a chain of execution, but not wait for +its result. 
That might look a bit like this: + +```python +async def do_request_handling(): + await foreground_operation() + + # *don't* do this + background_operation() + + logger.debug("Request handling complete") + +async def background_operation(): + await first_background_step() + logger.debug("Completed first step") + await second_background_step() + logger.debug("Completed second step") +``` + +The above code does a couple of steps in the background after +`do_request_handling` has finished. The log lines are still logged +against the `request_context` logcontext, which may or may not be +desirable. There are two big problems with the above, however. The first +problem is that, if `background_operation` returns an incomplete +awaitable, it will expect its caller to `await` immediately, so will have +cleared the logcontext. In this example, that means that 'Request +handling complete' will be logged without any context. + +The second problem, which is potentially even worse, is that when the +awaitable returned by `background_operation` completes, it will restore +the original logcontext. There is nothing waiting on that awaitable, so +the logcontext will leak into the reactor and possibly get attached to +some arbitrary future operation. + +There are two potential solutions to this. + +One option is to surround the call to `background_operation` with a +`PreserveLoggingContext` call. That will reset the logcontext before +starting `background_operation` (so the context restored when the +deferred completes will be the empty logcontext), and will restore the +current logcontext before continuing the foreground process: + +```python +async def do_request_handling(): + await foreground_operation() + + # start background_operation off in the empty logcontext, to + # avoid leaking the current context into the reactor. + with PreserveLoggingContext(): + background_operation() + + # this will now be logged against the request context + logger.debug("Request handling complete") +``` + +Obviously that option means that the operations done in +`background_operation` would be not be logged against a logcontext +(though that might be fixed by setting a different logcontext via a +`with LoggingContext(...)` in `background_operation`). + +The second option is to use `context.run_in_background`, which wraps a +function so that it doesn't reset the logcontext even when it returns +an incomplete awaitable, and adds a callback to the returned awaitable to +reset the logcontext. In other words, it turns a function that follows +the Synapse rules about logcontexts and awaitables into one which behaves +more like an external function --- the opposite operation to that +described in the previous section. It can be used like this: + +```python +async def do_request_handling(): + await foreground_operation() + + context.run_in_background(background_operation) + + # this will now be logged against the request context + logger.debug("Request handling complete") +``` + +## Passing synapse deferreds into third-party functions + +A typical example of this is where we want to collect together two or +more awaitables via `defer.gatherResults`: + +```python +a1 = operation1() +a2 = operation2() +a3 = defer.gatherResults([a1, a2]) +``` + +This is really a variation of the fire-and-forget problem above, in that +we are firing off `a1` and `a2` without awaiting on them. The difference +is that we now have third-party code attached to their callbacks. 
Anyway,
+either technique given in the [Fire-and-forget](#fire-and-forget)
+section will work.
+
+Of course, the new awaitable returned by `gatherResults` needs to be
+wrapped in order to make it follow the logcontext rules before we can
+yield it, as described in [Where you create a new awaitable, make it
+follow the
+rules](#where-you-create-a-new-awaitable-make-it-follow-the-rules).
+
+So, option one: reset the logcontext before starting the operations to
+be gathered:
+
+```python
+async def do_request_handling():
+    with PreserveLoggingContext():
+        a1 = operation1()
+        a2 = operation2()
+        result = await defer.gatherResults([a1, a2])
+```
+
+In this case particularly, though, option two, of using
+`context.run_in_background` almost certainly makes more sense, so that
+`operation1` and `operation2` are both logged against the original
+logcontext. This looks like:
+
+```python
+async def do_request_handling():
+    a1 = context.run_in_background(operation1)
+    a2 = context.run_in_background(operation2)
+
+    result = await make_deferred_yieldable(defer.gatherResults([a1, a2]))
+```
+
+## A note on garbage-collection of awaitable chains
+
+It turns out that our logcontext rules do not play nicely with awaitable
+chains which get orphaned and garbage-collected.
+
+Imagine we have some code that looks like this:
+
+```python
+listener_queue = []
+
+def on_something_interesting():
+    for d in listener_queue:
+        d.callback("foo")
+
+async def await_something_interesting():
+    new_awaitable = defer.Deferred()
+    listener_queue.append(new_awaitable)
+
+    with PreserveLoggingContext():
+        await new_awaitable
+```
+
+Obviously, the idea here is that we have a bunch of things which are
+waiting for an event. (It's just an example of the problem here, but a
+relatively common one.)
+
+Now let's imagine two further things happen. First of all, whatever was
+waiting for the interesting thing goes away. (Perhaps the request times
+out, or something *even more* interesting happens.)
+
+Secondly, let's suppose that we decide that the interesting thing is
+never going to happen, and we reset the listener queue:
+
+```python
+def reset_listener_queue():
+    listener_queue.clear()
+```
+
+So, both ends of the awaitable chain have now dropped their references,
+and the awaitable chain is now orphaned, and will be garbage-collected at
+some point. Note that `await_something_interesting` is a coroutine,
+which Python implements as a generator function. When Python
+garbage-collects generator functions, it gives them a chance to
+clean up by making the `await` (or `yield`) raise a `GeneratorExit`
+exception. In our case, that means that the `__exit__` handler of
+`PreserveLoggingContext` will carefully restore the request context, but
+there is now nothing waiting for its return, so the request context is
+never cleared.
+
+To reiterate, this problem only arises when *both* ends of an awaitable
+chain are dropped. Dropping the reference to an awaitable you're
+supposed to be awaiting is bad practice, so this doesn't
+actually happen too much. Unfortunately, when it does happen, it will
+lead to leaked logcontexts which are incredibly hard to track down.
diff --git a/docs/development/synapse_architecture/replication.md b/docs/development/synapse_architecture/replication.md
new file mode 100644
index 0000000000..108da9a065
--- /dev/null
+++ b/docs/development/synapse_architecture/replication.md
@@ -0,0 +1,42 @@
+# Replication Architecture
+
+## Motivation
+
+We'd like to be able to split some of the work that synapse does into
+multiple python processes. In theory multiple synapse processes could
+share a single postgresql database and we'd scale up by running more
+synapse processes. However, much of synapse assumes that only one process
+is interacting with the database, both for assigning unique identifiers
+when inserting into tables, notifying components about new updates, and
+for invalidating its caches.
+
+So running multiple copies of the current code isn't an option. One way
+to run multiple processes would be to have a single writer process and
+multiple reader processes connected to the same database. In order to do
+this we'd need a way for the reader process to invalidate its in-memory
+caches when an update happens on the writer. One way to do this is for
+the writer to present an append-only log of updates which the readers
+can consume to invalidate their caches and to push updates to listening
+clients or pushers.
+
+Synapse already stores much of its data as an append-only log so that it
+can correctly respond to `/sync` requests, so the amount of code changes
+needed to expose the append-only log to the readers should be fairly
+minimal.
+
+## Architecture
+
+### The Replication Protocol
+
+See [the TCP replication documentation](tcp_replication.md).
+
+### The Slaved DataStore
+
+There are read-only versions of the synapse storage layer in
+`synapse/replication/slave/storage` that use the response of the
+replication API to invalidate their caches.
+
+### The TCP Replication Module
+Information about how the tcp replication module is structured, including how
+the classes interact, can be found in
+`synapse/replication/tcp/__init__.py`
diff --git a/docs/development/synapse_architecture/tcp_replication.md b/docs/development/synapse_architecture/tcp_replication.md
new file mode 100644
index 0000000000..15df949deb
--- /dev/null
+++ b/docs/development/synapse_architecture/tcp_replication.md
@@ -0,0 +1,257 @@
+# TCP Replication
+
+## Motivation
+
+Previously the workers used an HTTP long poll mechanism to get updates
+from the master, which had the problem of causing a lot of duplicate
+work on the server. This TCP protocol replaces those APIs with the aim
+of increased efficiency.
+
+## Overview
+
+The protocol is based on fire-and-forget, line-based commands. An
+example flow would be (where '>' indicates master to worker and
+'<' worker to master flows):
+
+    > SERVER example.com
+    < REPLICATE
+    > POSITION events master 53 53
+    > RDATA events master 54 ["$foo1:bar.com", ...]
+    > RDATA events master 55 ["$foo4:bar.com", ...]
+
+The example shows the server accepting a new connection and sending its identity
+with the `SERVER` command, followed by the client asking the server to respond with the
+position of all streams. The server then periodically sends `RDATA` commands
+which have the format `RDATA <stream_name> <instance_name> <token> <row>`, where
+the format of `<row>` is defined by the individual streams. The
+`<instance_name>` is the name of the Synapse process that generated the data
+(usually "master").
+
+Error reporting happens by either the client or server sending an `ERROR`
+command, and usually the connection will be closed.
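+
+Because every message is a single line whose first word names the command,
+a client's parsing can be sketched in a few lines of Python. This is an
+illustrative sketch only (the function name is made up; Synapse's real
+command parsing lives in `synapse/replication/tcp/commands.py`):
+
+```python
+def parse_command_line(line: str) -> tuple[str, str]:
+    """Split one protocol line into the command name and its payload."""
+    command, _, data = line.strip("\r\n").partition(" ")
+    return command, data
+
+# For example, an RDATA line from the flow above:
+command, data = parse_command_line('RDATA events master 54 ["$foo1:bar.com"]')
+assert command == "RDATA"
+assert data == 'events master 54 ["$foo1:bar.com"]'
+```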
+
+Since the protocol is simple and line based, it's possible to manually
+connect to the server using a tool like netcat. A few things should be
+noted when manually using the protocol:
+
+-   The federation stream is only available if federation sending has
+    been disabled on the main process.
+-   The server will only time out connections that have sent a `PING`
+    command. If a ping is sent then the connection will be closed if no
+    further commands are received within 15s. Both the client and
+    server protocol implementations will send an initial PING on
+    connection and ensure at least one command every 5s is sent (not
+    necessarily `PING`).
+-   `RDATA` commands *usually* include a numeric token; however, if the
+    stream has multiple rows to replicate per token the server will send
+    multiple `RDATA` commands, with all but the last having a token of
+    `batch`. See the documentation on `commands.RdataCommand` for
+    further details.
+
+## Architecture
+
+The basic structure of the protocol is line based, where the initial
+word of each line specifies the command. The rest of the line is parsed
+based on the command. For example, the RDATA command is defined as:
+
+    RDATA <stream_name> <instance_name> <token> <row>
+
+(Note that `<row>` may contain spaces, but cannot contain
+newlines.)
+
+Blank lines are ignored.
+
+### Keep alives
+
+Both sides are expected to send at least one command every 5s or so, and
+should send a `PING` command if necessary. If either side does not receive
+a command within e.g. 15s then the connection should be closed.
+
+Because the server may be connected to manually using e.g. netcat, the
+timeouts aren't enabled until an initial `PING` command is seen. Both
+the client and server implementations below send a `PING` command
+immediately on connection to ensure the timeouts are enabled.
+
+This ensures that both sides can quickly realize if the tcp connection
+has gone away and handle the situation appropriately.
+
+### Start up
+
+When a new connection is made, the server:
+
+-   Sends a `SERVER` command, which includes the identity of the server,
+    allowing the client to detect if it's connected to the expected
+    server
+-   Sends a `PING` command as above, to enable the client to time out
+    connections promptly.
+
+The client:
+
+-   Sends a `NAME` command, allowing the server to associate a
+    human-friendly name with the connection. This is optional.
+-   Sends a `PING` as above
+-   Sends a `REPLICATE` to get the current position of all streams.
+-   On receipt of a `SERVER` command, checks that the server name
+    matches the expected server name.
+
+### Error handling
+
+If either side detects an error it can send an `ERROR` command and close
+the connection.
+
+If the client side loses the connection to the server it should
+reconnect, following the steps above.
+
+### Congestion
+
+If the server sends messages faster than the client can consume them the
+server will first buffer a (fairly large) number of commands and then
+disconnect the client. This ensures that we don't queue up an unbounded
+number of commands in memory and gives us a potential opportunity to
+squawk loudly. When/if the client recovers it can reconnect to the
+server and ask for missed messages.
+
+### Reliability
+
+In general the replication stream should be considered an unreliable
+transport since e.g. commands are not resent if the connection
+disappears.
+
+The exception to that is the replication streams, i.e. `RDATA` commands,
+since these include tokens which can be used to restart the stream on
+connection errors.
+
+The client should keep track of the token in the last RDATA command
+received for each stream so that on reconnection it can start streaming
+from the correct place. Note: not all RDATA have valid tokens due to
+batching. See `RdataCommand` for more details.
+
+### Example
+
+An example interaction is shown below. Each line is prefixed with '>'
+or '<' to indicate which side is sending; these are *not* included on
+the wire:
+
+    * connection established *
+    > SERVER localhost:8823
+    > PING 1490197665618
+    < NAME synapse.app.appservice
+    < PING 1490197665618
+    < REPLICATE
+    > POSITION events master 1 1
+    > POSITION backfill master 1 1
+    > POSITION caches master 1 1
+    > RDATA caches master 2 ["get_user_by_id",["@01register-user:localhost:8823"],1490197670513]
+    > RDATA events master 14 ["$149019767112vOHxz:localhost:8823",
+        "!AFDCvgApUmpdfVjIXm:localhost:8823","m.room.guest_access","",null]
+    < PING 1490197675618
+    > ERROR server stopping
+    * connection closed by server *
+
+The `POSITION` command sent by the server is used to set the client's
+position without needing to send data with the `RDATA` command.
+
+An example of a batched set of `RDATA` is:
+
+    > RDATA caches master batch ["get_user_by_id",["@test:localhost:8823"],1490197670513]
+    > RDATA caches master batch ["get_user_by_id",["@test2:localhost:8823"],1490197670513]
+    > RDATA caches master batch ["get_user_by_id",["@test3:localhost:8823"],1490197670513]
+    > RDATA caches master 54 ["get_user_by_id",["@test4:localhost:8823"],1490197670513]
+
+In this case the client shouldn't advance its caches token until it
+sees the last `RDATA`.
+
+### List of commands
+
+The list of valid commands, with which side can send it: server (S) or
+client (C):
+
+#### SERVER (S)
+
+    Sent at the start to identify which server the client is talking to
+
+#### RDATA (S)
+
+    A single update in a stream
+
+#### POSITION (S)
+
+    On receipt of a POSITION command clients should check if they have missed any
+    updates, and if so then fetch them out of band. Sent in response to a
+    REPLICATE command (but can happen at any time).
+
+    The POSITION command includes the source of the stream. Currently all streams
+    are written by a single process (usually "master"). If fetching missing
+    updates via HTTP API, rather than via the DB, then processes should make the
+    request to the appropriate process.
+
+    Two positions are included, the "new" position and the last position sent respectively.
+    This allows servers to tell instances that the positions have advanced but no
+    data has been written, without clients needlessly checking to see if they
+    have missed any updates.
+
+#### ERROR (S, C)
+
+    There was an error
+
+#### PING (S, C)
+
+    Sent periodically to ensure the connection is still alive
+
+#### NAME (C)
+
+    Sent at the start by the client to inform the server who they are
+
+#### REPLICATE (C)
+
+Asks the server for the current position of all streams.
+
+#### USER_SYNC (C)
+
+    A user has started or stopped syncing on this process.
+
+#### CLEAR_USER_SYNC (C)
+
+    The server should clear all associated user sync data from the worker.
+
+    This is used when a worker is shutting down.
+
+#### FEDERATION_ACK (C)
+
+    Acknowledge receipt of some federation data
+
+#### REMOTE_SERVER_UP (S, C)
+
+    Inform other processes that a remote server may have come back online.
+
+See `synapse/replication/tcp/commands.py` for a detailed description and
+the format of each command.
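+
+As a concrete illustration of the reliability and batching rules described
+above, a client might track its position in each stream roughly as follows
+(an illustrative sketch with made-up names, not Synapse's actual
+implementation):
+
+```python
+# Last token successfully processed for each stream, used to resume
+# streaming from the right place after a reconnection.
+last_processed_token: dict[str, int] = {}
+
+def on_rdata(stream_name: str, token: str, row: str) -> None:
+    process_row(stream_name, row)  # application-specific handling
+    if token == "batch":
+        # Part of a multi-row batch: only the final RDATA of the batch
+        # carries the real token, so don't advance our position yet.
+        return
+    last_processed_token[stream_name] = int(token)
+
+def process_row(stream_name: str, row: str) -> None:
+    ...
+```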
+
+### Cache Invalidation Stream
+
+The cache invalidation stream is used to inform workers when they need
+to invalidate any of their caches in the data store. This is done by
+streaming all cache invalidations done on master down to the workers,
+assuming that any caches on the workers also exist on the master.
+
+Each individual cache invalidation results in a row being sent down
+replication, which includes the cache name (the name of the function)
+and the key to invalidate. For example:
+
+    > RDATA caches master 550953771 ["get_user_by_id", ["@bob:example.com"], 1550574873251]
+
+Alternatively, an entire cache can be invalidated by sending down a `null`
+instead of the key. For example:
+
+    > RDATA caches master 550953772 ["get_user_by_id", null, 1550574873252]
+
+However, there are times when a number of caches need to be invalidated
+at the same time with the same key. To reduce traffic we batch those
+invalidations into a single poke by defining a special cache name that
+workers understand to mean to expand to invalidate the correct caches.
+
+Currently the special cache names are declared in
+`synapse/storage/_base.py` and are:
+
+1. `cs_cache_fake` ─ invalidates caches that depend on the current
+   state
diff --git a/docs/log_contexts.md b/docs/log_contexts.md
deleted file mode 100644
index cb15dbe158..0000000000
--- a/docs/log_contexts.md
+++ /dev/null
@@ -1,364 +0,0 @@
-# Log Contexts
-
-To help track the processing of individual requests, synapse uses a
-'`log context`' to track which request it is handling at any given
-moment. This is done via a thread-local variable; a `logging.Filter` is
-then used to fish the information back out of the thread-local variable
-and add it to each log record.
-
-Logcontexts are also used for CPU and database accounting, so that we
-can track which requests were responsible for high CPU use or database
-activity.
-
-The `synapse.logging.context` module provides facilities for managing
-the current log context (as well as providing the `LoggingContextFilter`
-class).
-
-Asynchronous functions make the whole thing complicated, so this document describes
-how it all works, and how to write code which follows the rules.
-
-In this document, "awaitable" refers to any object which can be `await`ed. In the context of
-Synapse, that normally means either a coroutine or a Twisted
-[`Deferred`](https://twistedmatrix.com/documents/current/api/twisted.internet.defer.Deferred.html).
-
-## Logcontexts without asynchronous code
-
-In the absence of any asynchronous voodoo, things are simple enough.
As with -any code of this nature, the rule is that our function should leave -things as it found them: - -```python -from synapse.logging import context # omitted from future snippets - -def handle_request(request_id): - request_context = context.LoggingContext() - - calling_context = context.set_current_context(request_context) - try: - request_context.request = request_id - do_request_handling() - logger.debug("finished") - finally: - context.set_current_context(calling_context) - -def do_request_handling(): - logger.debug("phew") # this will be logged against request_id -``` - -LoggingContext implements the context management methods, so the above -can be written much more succinctly as: - -```python -def handle_request(request_id): - with context.LoggingContext() as request_context: - request_context.request = request_id - do_request_handling() - logger.debug("finished") - -def do_request_handling(): - logger.debug("phew") -``` - -## Using logcontexts with awaitables - -Awaitables break the linear flow of code so that there is no longer a single entry point -where we should set the logcontext and a single exit point where we should remove it. - -Consider the example above, where `do_request_handling` needs to do some -blocking operation, and returns an awaitable: - -```python -async def handle_request(request_id): - with context.LoggingContext() as request_context: - request_context.request = request_id - await do_request_handling() - logger.debug("finished") -``` - -In the above flow: - -- The logcontext is set -- `do_request_handling` is called, and returns an awaitable -- `handle_request` awaits the awaitable -- Execution of `handle_request` is suspended - -So we have stopped processing the request (and will probably go on to -start processing the next), without clearing the logcontext. - -To circumvent this problem, synapse code assumes that, wherever you have -an awaitable, you will want to `await` it. To that end, whereever -functions return awaitables, we adopt the following conventions: - -**Rules for functions returning awaitables:** - -> - If the awaitable is already complete, the function returns with the -> same logcontext it started with. -> - If the awaitable is incomplete, the function clears the logcontext -> before returning; when the awaitable completes, it restores the -> logcontext before running any callbacks. - -That sounds complicated, but actually it means a lot of code (including -the example above) "just works". There are two cases: - -- If `do_request_handling` returns a completed awaitable, then the - logcontext will still be in place. In this case, execution will - continue immediately after the `await`; the "finished" line will - be logged against the right context, and the `with` block restores - the original context before we return to the caller. -- If the returned awaitable is incomplete, `do_request_handling` clears - the logcontext before returning. The logcontext is therefore clear - when `handle_request` `await`s the awaitable. - - Once `do_request_handling`'s awaitable completes, it will reinstate - the logcontext, before running the second half of `handle_request`, - so again the "finished" line will be logged against the right context, - and the `with` block restores the original context. - -As an aside, it's worth noting that `handle_request` follows our rules -- though that only matters if the caller has its own logcontext which it -cares about. - -The following sections describe pitfalls and helpful patterns when -implementing these rules. 
- -Always await your awaitables ----------------------------- - -Whenever you get an awaitable back from a function, you should `await` on -it as soon as possible. Do not pass go; do not do any logging; do not -call any other functions. - -```python -async def fun(): - logger.debug("starting") - await do_some_stuff() # just like this - - coro = more_stuff() - result = await coro # also fine, of course - - return result -``` - -Provided this pattern is followed all the way back up to the callchain -to where the logcontext was set, this will make things work out ok: -provided `do_some_stuff` and `more_stuff` follow the rules above, then -so will `fun`. - -It's all too easy to forget to `await`: for instance if we forgot that -`do_some_stuff` returned an awaitable, we might plough on regardless. This -leads to a mess; it will probably work itself out eventually, but not -before a load of stuff has been logged against the wrong context. -(Normally, other things will break, more obviously, if you forget to -`await`, so this tends not to be a major problem in practice.) - -Of course sometimes you need to do something a bit fancier with your -awaitable - not all code follows the linear A-then-B-then-C pattern. -Notes on implementing more complex patterns are in later sections. - -## Where you create a new awaitable, make it follow the rules - -Most of the time, an awaitable comes from another synapse function. -Sometimes, though, we need to make up a new awaitable, or we get an awaitable -back from external code. We need to make it follow our rules. - -The easy way to do it is by using `context.make_deferred_yieldable`. Suppose we want to implement -`sleep`, which returns a deferred which will run its callbacks after a -given number of seconds. That might look like: - -```python -# not a logcontext-rules-compliant function -def get_sleep_deferred(seconds): - d = defer.Deferred() - reactor.callLater(seconds, d.callback, None) - return d -``` - -That doesn't follow the rules, but we can fix it by calling it through -`context.make_deferred_yieldable`: - -```python -async def sleep(seconds): - return await context.make_deferred_yieldable(get_sleep_deferred(seconds)) -``` - -## Fire-and-forget - -Sometimes you want to fire off a chain of execution, but not wait for -its result. That might look a bit like this: - -```python -async def do_request_handling(): - await foreground_operation() - - # *don't* do this - background_operation() - - logger.debug("Request handling complete") - -async def background_operation(): - await first_background_step() - logger.debug("Completed first step") - await second_background_step() - logger.debug("Completed second step") -``` - -The above code does a couple of steps in the background after -`do_request_handling` has finished. The log lines are still logged -against the `request_context` logcontext, which may or may not be -desirable. There are two big problems with the above, however. The first -problem is that, if `background_operation` returns an incomplete -awaitable, it will expect its caller to `await` immediately, so will have -cleared the logcontext. In this example, that means that 'Request -handling complete' will be logged without any context. - -The second problem, which is potentially even worse, is that when the -awaitable returned by `background_operation` completes, it will restore -the original logcontext. 
There is nothing waiting on that awaitable, so -the logcontext will leak into the reactor and possibly get attached to -some arbitrary future operation. - -There are two potential solutions to this. - -One option is to surround the call to `background_operation` with a -`PreserveLoggingContext` call. That will reset the logcontext before -starting `background_operation` (so the context restored when the -deferred completes will be the empty logcontext), and will restore the -current logcontext before continuing the foreground process: - -```python -async def do_request_handling(): - await foreground_operation() - - # start background_operation off in the empty logcontext, to - # avoid leaking the current context into the reactor. - with PreserveLoggingContext(): - background_operation() - - # this will now be logged against the request context - logger.debug("Request handling complete") -``` - -Obviously that option means that the operations done in -`background_operation` would be not be logged against a logcontext -(though that might be fixed by setting a different logcontext via a -`with LoggingContext(...)` in `background_operation`). - -The second option is to use `context.run_in_background`, which wraps a -function so that it doesn't reset the logcontext even when it returns -an incomplete awaitable, and adds a callback to the returned awaitable to -reset the logcontext. In other words, it turns a function that follows -the Synapse rules about logcontexts and awaitables into one which behaves -more like an external function --- the opposite operation to that -described in the previous section. It can be used like this: - -```python -async def do_request_handling(): - await foreground_operation() - - context.run_in_background(background_operation) - - # this will now be logged against the request context - logger.debug("Request handling complete") -``` - -## Passing synapse deferreds into third-party functions - -A typical example of this is where we want to collect together two or -more awaitables via `defer.gatherResults`: - -```python -a1 = operation1() -a2 = operation2() -a3 = defer.gatherResults([a1, a2]) -``` - -This is really a variation of the fire-and-forget problem above, in that -we are firing off `a1` and `a2` without awaiting on them. The difference -is that we now have third-party code attached to their callbacks. Anyway -either technique given in the [Fire-and-forget](#fire-and-forget) -section will work. - -Of course, the new awaitable returned by `gather` needs to be -wrapped in order to make it follow the logcontext rules before we can -yield it, as described in [Where you create a new awaitable, make it -follow the -rules](#where-you-create-a-new-awaitable-make-it-follow-the-rules). - -So, option one: reset the logcontext before starting the operations to -be gathered: - -```python -async def do_request_handling(): - with PreserveLoggingContext(): - a1 = operation1() - a2 = operation2() - result = await defer.gatherResults([a1, a2]) -``` - -In this case particularly, though, option two, of using -`context.run_in_background` almost certainly makes more sense, so that -`operation1` and `operation2` are both logged against the original -logcontext. 
This looks like: - -```python -async def do_request_handling(): - a1 = context.run_in_background(operation1) - a2 = context.run_in_background(operation2) - - result = await make_deferred_yieldable(defer.gatherResults([a1, a2])) -``` - -## A note on garbage-collection of awaitable chains - -It turns out that our logcontext rules do not play nicely with awaitable -chains which get orphaned and garbage-collected. - -Imagine we have some code that looks like this: - -```python -listener_queue = [] - -def on_something_interesting(): - for d in listener_queue: - d.callback("foo") - -async def await_something_interesting(): - new_awaitable = defer.Deferred() - listener_queue.append(new_awaitable) - - with PreserveLoggingContext(): - await new_awaitable -``` - -Obviously, the idea here is that we have a bunch of things which are -waiting for an event. (It's just an example of the problem here, but a -relatively common one.) - -Now let's imagine two further things happen. First of all, whatever was -waiting for the interesting thing goes away. (Perhaps the request times -out, or something *even more* interesting happens.) - -Secondly, let's suppose that we decide that the interesting thing is -never going to happen, and we reset the listener queue: - -```python -def reset_listener_queue(): - listener_queue.clear() -``` - -So, both ends of the awaitable chain have now dropped their references, -and the awaitable chain is now orphaned, and will be garbage-collected at -some point. Note that `await_something_interesting` is a coroutine, -which Python implements as a generator function. When Python -garbage-collects generator functions, it gives them a chance to -clean up by making the `await` (or `yield`) raise a `GeneratorExit` -exception. In our case, that means that the `__exit__` handler of -`PreserveLoggingContext` will carefully restore the request context, but -there is now nothing waiting for its return, so the request context is -never cleared. - -To reiterate, this problem only arises when *both* ends of a awaitable -chain are dropped. Dropping the the reference to an awaitable you're -supposed to be awaiting is bad practice, so this doesn't -actually happen too much. Unfortunately, when it does happen, it will -lead to leaked logcontexts which are incredibly hard to track down. diff --git a/docs/media_repository.md b/docs/media_repository.md deleted file mode 100644 index 23e6da7f31..0000000000 --- a/docs/media_repository.md +++ /dev/null @@ -1,78 +0,0 @@ -# Media Repository - -*Synapse implementation-specific details for the media repository* - -The media repository - * stores avatars, attachments and their thumbnails for media uploaded by local - users. - * caches avatars, attachments and their thumbnails for media uploaded by remote - users. - * caches resources and thumbnails used for URL previews. - -All media in Matrix can be identified by a unique -[MXC URI](https://spec.matrix.org/latest/client-server-api/#matrix-content-mxc-uris), -consisting of a server name and media ID: -``` -mxc:/// -``` - -## Local Media -Synapse generates 24 character media IDs for content uploaded by local users. -These media IDs consist of upper and lowercase letters and are case-sensitive. -Other homeserver implementations may generate media IDs differently. - -Local media is recorded in the `local_media_repository` table, which includes -metadata such as MIME types, upload times and file sizes. -Note that this table is shared by the URL cache, which has a different media ID -scheme. 
- -### Paths -A file with media ID `aabbcccccccccccccccccccc` and its `128x96` `image/jpeg` -thumbnail, created by scaling, would be stored at: -``` -local_content/aa/bb/cccccccccccccccccccc -local_thumbnails/aa/bb/cccccccccccccccccccc/128-96-image-jpeg-scale -``` - -## Remote Media -When media from a remote homeserver is requested from Synapse, it is assigned -a local `filesystem_id`, with the same format as locally-generated media IDs, -as described above. - -A record of remote media is stored in the `remote_media_cache` table, which -can be used to map remote MXC URIs (server names and media IDs) to local -`filesystem_id`s. - -### Paths -A file from `matrix.org` with `filesystem_id` `aabbcccccccccccccccccccc` and its -`128x96` `image/jpeg` thumbnail, created by scaling, would be stored at: -``` -remote_content/matrix.org/aa/bb/cccccccccccccccccccc -remote_thumbnail/matrix.org/aa/bb/cccccccccccccccccccc/128-96-image-jpeg-scale -``` -Older thumbnails may omit the thumbnailing method: -``` -remote_thumbnail/matrix.org/aa/bb/cccccccccccccccccccc/128-96-image-jpeg -``` - -Note that `remote_thumbnail/` does not have an `s`. - -## URL Previews - -When generating previews for URLs, Synapse may download and cache various -resources, including images. These resources are assigned temporary media IDs -of the form `yyyy-mm-dd_aaaaaaaaaaaaaaaa`, where `yyyy-mm-dd` is the current -date and `aaaaaaaaaaaaaaaa` is a random sequence of 16 case-sensitive letters. - -The metadata for these cached resources is stored in the -`local_media_repository` and `local_media_repository_url_cache` tables. - -Resources for URL previews are deleted after a few days. - -### Paths -The file with media ID `yyyy-mm-dd_aaaaaaaaaaaaaaaa` and its `128x96` -`image/jpeg` thumbnail, created by scaling, would be stored at: -``` -url_cache/yyyy-mm-dd/aaaaaaaaaaaaaaaa -url_cache_thumbnails/yyyy-mm-dd/aaaaaaaaaaaaaaaa/128-96-image-jpeg-scale -``` diff --git a/docs/opentracing.md b/docs/opentracing.md deleted file mode 100644 index abb94b565f..0000000000 --- a/docs/opentracing.md +++ /dev/null @@ -1,94 +0,0 @@ -# OpenTracing - -## Background - -OpenTracing is a semi-standard being adopted by a number of distributed -tracing platforms. It is a common api for facilitating vendor-agnostic -tracing instrumentation. That is, we can use the OpenTracing api and -select one of a number of tracer implementations to do the heavy lifting -in the background. Our current selected implementation is Jaeger. - -OpenTracing is a tool which gives an insight into the causal -relationship of work done in and between servers. The servers each track -events and report them to a centralised server - in Synapse's case: -Jaeger. The basic unit used to represent events is the span. The span -roughly represents a single piece of work that was done and the time at -which it occurred. A span can have child spans, meaning that the work of -the child had to be completed for the parent span to complete, or it can -have follow-on spans which represent work that is undertaken as a result -of the parent but is not depended on by the parent to in order to -finish. - -Since this is undertaken in a distributed environment a request to -another server, such as an RPC or a simple GET, can be considered a span -(a unit or work) for the local server. This causal link is what -OpenTracing aims to capture and visualise. In order to do this metadata -about the local server's span, i.e the 'span context', needs to be -included with the request to the remote. 
- -It is up to the remote server to decide what it does with the spans it -creates. This is called the sampling policy and it can be configured -through Jaeger's settings. - -For OpenTracing concepts see -. - -For more information about Jaeger's implementation see - - -## Setting up OpenTracing - -To receive OpenTracing spans, start up a Jaeger server. This can be done -using docker like so: - -```sh -docker run -d --name jaeger \ - -p 6831:6831/udp \ - -p 6832:6832/udp \ - -p 5778:5778 \ - -p 16686:16686 \ - -p 14268:14268 \ - jaegertracing/all-in-one:1 -``` - -Latest documentation is probably at -https://www.jaegertracing.io/docs/latest/getting-started. - -## Enable OpenTracing in Synapse - -OpenTracing is not enabled by default. It must be enabled in the -homeserver config by adding the `opentracing` option to your config file. You can find -documentation about how to do this in the [config manual under the header 'Opentracing'](usage/configuration/config_documentation.md#opentracing). -See below for an example Opentracing configuration: - -```yaml -opentracing: - enabled: true - homeserver_whitelist: - - "mytrustedhomeserver.org" - - "*.myotherhomeservers.com" -``` - -## Homeserver whitelisting - -The homeserver whitelist is configured using regular expressions. A list -of regular expressions can be given and their union will be compared -when propagating any spans contexts to another homeserver. - -Though it's mostly safe to send and receive span contexts to and from -untrusted users since span contexts are usually opaque ids it can lead -to two problems, namely: - -- If the span context is marked as sampled by the sending homeserver - the receiver will sample it. Therefore two homeservers with wildly - different sampling policies could incur higher sampling counts than - intended. -- Sending servers can attach arbitrary data to spans, known as - 'baggage'. For safety this has been disabled in Synapse but that - doesn't prevent another server sending you baggage which will be - logged to OpenTracing's logs. - -## Configuring Jaeger - -Sampling strategies can be set as in this document: -. diff --git a/docs/replication.md b/docs/replication.md deleted file mode 100644 index 108da9a065..0000000000 --- a/docs/replication.md +++ /dev/null @@ -1,42 +0,0 @@ -# Replication Architecture - -## Motivation - -We'd like to be able to split some of the work that synapse does into -multiple python processes. In theory multiple synapse processes could -share a single postgresql database and we\'d scale up by running more -synapse processes. However much of synapse assumes that only one process -is interacting with the database, both for assigning unique identifiers -when inserting into tables, notifying components about new updates, and -for invalidating its caches. - -So running multiple copies of the current code isn't an option. One way -to run multiple processes would be to have a single writer process and -multiple reader processes connected to the same database. In order to do -this we'd need a way for the reader process to invalidate its in-memory -caches when an update happens on the writer. One way to do this is for -the writer to present an append-only log of updates which the readers -can consume to invalidate their caches and to push updates to listening -clients or pushers. 
- -Synapse already stores much of its data as an append-only log so that it -can correctly respond to `/sync` requests so the amount of code changes -needed to expose the append-only log to the readers should be fairly -minimal. - -## Architecture - -### The Replication Protocol - -See [the TCP replication documentation](tcp_replication.md). - -### The Slaved DataStore - -There are read-only version of the synapse storage layer in -`synapse/replication/slave/storage` that use the response of the -replication API to invalidate their caches. - -### The TCP Replication Module -Information about how the tcp replication module is structured, including how -the classes interact, can be found in -`synapse/replication/tcp/__init__.py` diff --git a/docs/room_and_user_statistics.md b/docs/room_and_user_statistics.md deleted file mode 100644 index cc38c890bb..0000000000 --- a/docs/room_and_user_statistics.md +++ /dev/null @@ -1,22 +0,0 @@ -Room and User Statistics -======================== - -Synapse maintains room and user statistics in various tables. These can be used -for administrative purposes but are also used when generating the public room -directory. - - -# Synapse Developer Documentation - -## High-Level Concepts - -### Definitions - -* **subject**: Something we are tracking stats about – currently a room or user. -* **current row**: An entry for a subject in the appropriate current statistics - table. Each subject can have only one. - -### Overview - -Stats correspond to the present values. Current rows contain the most up-to-date -statistics for a room. Each subject can only have one entry. diff --git a/docs/tcp_replication.md b/docs/tcp_replication.md deleted file mode 100644 index 15df949deb..0000000000 --- a/docs/tcp_replication.md +++ /dev/null @@ -1,257 +0,0 @@ -# TCP Replication - -## Motivation - -Previously the workers used an HTTP long poll mechanism to get updates -from the master, which had the problem of causing a lot of duplicate -work on the server. This TCP protocol replaces those APIs with the aim -of increased efficiency. - -## Overview - -The protocol is based on fire and forget, line based commands. An -example flow would be (where '>' indicates master to worker and -'<' worker to master flows): - - > SERVER example.com - < REPLICATE - > POSITION events master 53 53 - > RDATA events master 54 ["$foo1:bar.com", ...] - > RDATA events master 55 ["$foo4:bar.com", ...] - -The example shows the server accepting a new connection and sending its identity -with the `SERVER` command, followed by the client server to respond with the -position of all streams. The server then periodically sends `RDATA` commands -which have the format `RDATA `, where -the format of `` is defined by the individual streams. The -`` is the name of the Synapse process that generated the data -(usually "master"). - -Error reporting happens by either the client or server sending an ERROR -command, and usually the connection will be closed. - -Since the protocol is a simple line based, its possible to manually -connect to the server using a tool like netcat. A few things should be -noted when manually using the protocol: - -- The federation stream is only available if federation sending has - been disabled on the main process. -- The server will only time connections out that have sent a `PING` - command. If a ping is sent then the connection will be closed if no - further commands are receieved within 15s. 
Both the client and - server protocol implementations will send an initial PING on - connection and ensure at least one command every 5s is sent (not - necessarily `PING`). -- `RDATA` commands *usually* include a numeric token, however if the - stream has multiple rows to replicate per token the server will send - multiple `RDATA` commands, with all but the last having a token of - `batch`. See the documentation on `commands.RdataCommand` for - further details. - -## Architecture - -The basic structure of the protocol is line based, where the initial -word of each line specifies the command. The rest of the line is parsed -based on the command. For example, the RDATA command is defined as: - - RDATA - -(Note that may contains spaces, but cannot contain -newlines.) - -Blank lines are ignored. - -### Keep alives - -Both sides are expected to send at least one command every 5s or so, and -should send a `PING` command if necessary. If either side do not receive -a command within e.g. 15s then the connection should be closed. - -Because the server may be connected to manually using e.g. netcat, the -timeouts aren't enabled until an initial `PING` command is seen. Both -the client and server implementations below send a `PING` command -immediately on connection to ensure the timeouts are enabled. - -This ensures that both sides can quickly realize if the tcp connection -has gone and handle the situation appropriately. - -### Start up - -When a new connection is made, the server: - -- Sends a `SERVER` command, which includes the identity of the server, - allowing the client to detect if its connected to the expected - server -- Sends a `PING` command as above, to enable the client to time out - connections promptly. - -The client: - -- Sends a `NAME` command, allowing the server to associate a human - friendly name with the connection. This is optional. -- Sends a `PING` as above -- Sends a `REPLICATE` to get the current position of all streams. -- On receipt of a `SERVER` command, checks that the server name - matches the expected server name. - -### Error handling - -If either side detects an error it can send an `ERROR` command and close -the connection. - -If the client side loses the connection to the server it should -reconnect, following the steps above. - -### Congestion - -If the server sends messages faster than the client can consume them the -server will first buffer a (fairly large) number of commands and then -disconnect the client. This ensures that we don't queue up an unbounded -number of commands in memory and gives us a potential oppurtunity to -squawk loudly. When/if the client recovers it can reconnect to the -server and ask for missed messages. - -### Reliability - -In general the replication stream should be considered an unreliable -transport since e.g. commands are not resent if the connection -disappears. - -The exception to that are the replication streams, i.e. RDATA commands, -since these include tokens which can be used to restart the stream on -connection errors. - -The client should keep track of the token in the last RDATA command -received for each stream so that on reconneciton it can start streaming -from the correct place. Note: not all RDATA have valid tokens due to -batching. See `RdataCommand` for more details. - -### Example - -An example iteraction is shown below. 
-### Example
-
-An example interaction is shown below. Each line is prefixed with '>'
-or '<' to indicate which side is sending; these are *not* included on
-the wire:
-
-    * connection established *
-    > SERVER localhost:8823
-    > PING 1490197665618
-    < NAME synapse.app.appservice
-    < PING 1490197665618
-    < REPLICATE
-    > POSITION events master 1 1
-    > POSITION backfill master 1 1
-    > POSITION caches master 1 1
-    > RDATA caches master 2 ["get_user_by_id",["@01register-user:localhost:8823"],1490197670513]
-    > RDATA events master 14 ["$149019767112vOHxz:localhost:8823",
-      "!AFDCvgApUmpdfVjIXm:localhost:8823","m.room.guest_access","",null]
-    < PING 1490197675618
-    > ERROR server stopping
-    * connection closed by server *
-
-The `POSITION` command sent by the server is used to set the client's
-position without needing to send data with the `RDATA` command.
-
-An example of a batched set of `RDATA` is:
-
-    > RDATA caches master batch ["get_user_by_id",["@test:localhost:8823"],1490197670513]
-    > RDATA caches master batch ["get_user_by_id",["@test2:localhost:8823"],1490197670513]
-    > RDATA caches master batch ["get_user_by_id",["@test3:localhost:8823"],1490197670513]
-    > RDATA caches master 54 ["get_user_by_id",["@test4:localhost:8823"],1490197670513]
-
-In this case the client shouldn't advance its token for the `caches`
-stream until it sees the last `RDATA`.
-
-### List of commands
-
-The list of valid commands, with the side that can send them: server (S) or
-client (C):
-
-#### SERVER (S)
-
-   Sent at the start to identify which server the client is talking to
-
-#### RDATA (S)
-
-   A single update in a stream
-
-#### POSITION (S)
-
-   On receipt of a POSITION command clients should check if they have missed any
-   updates, and if so then fetch them out of band. Sent in response to a
-   REPLICATE command (but can happen at any time).
-
-   The POSITION command includes the source of the stream. Currently all streams
-   are written by a single process (usually "master"). If fetching missing
-   updates via HTTP API, rather than via the DB, then processes should make the
-   request to the appropriate process.
-
-   Two positions are included, the "new" position and the last position sent
-   respectively. This allows servers to tell instances that the positions have
-   advanced but no data has been written, without clients needlessly checking to
-   see if they have missed any updates.
-
-#### ERROR (S, C)
-
-   There was an error
-
-#### PING (S, C)
-
-   Sent periodically to ensure the connection is still alive
-
-#### NAME (C)
-
-   Sent at the start by the client to inform the server who they are
-
-#### REPLICATE (C)
-
-   Asks the server for the current position of all streams.
-
-#### USER_SYNC (C)
-
-   A user has started or stopped syncing on this process.
-
-#### CLEAR_USER_SYNC (C)
-
-   The server should clear all associated user sync data from the worker.
-
-   This is used when a worker is shutting down.
-
-#### FEDERATION_ACK (C)
-
-   Acknowledge receipt of some federation data
-
-#### REMOTE_SERVER_UP (S, C)
-
-   Inform other processes that a remote server may have come back online.
-
-See `synapse/replication/tcp/commands.py` for a detailed description and
-the format of each command.
-
-### Cache Invalidation Stream
-
-The cache invalidation stream is used to inform workers when they need
-to invalidate any of their caches in the data store. This is done by
-streaming all cache invalidations done on master down to the workers,
-assuming that any caches on the workers also exist on the master.
-
-Each individual cache invalidation results in a row being sent down
-replication, which includes the cache name (the name of the function)
-and the key to invalidate. For example:
-
-    > RDATA caches master 550953771 ["get_user_by_id", ["@bob:example.com"], 1550574873251]
-
-Alternatively, an entire cache can be invalidated by sending down a `null`
-instead of the key. For example:
-
-    > RDATA caches master 550953772 ["get_user_by_id", null, 1550574873252]
-
-However, there are times when a number of caches need to be invalidated
-at the same time with the same key. To reduce traffic we batch those
-invalidations into a single poke by defining a special cache name that
-workers understand should be expanded to invalidate the correct caches.
-
-Currently the special cache names are declared in
-`synapse/storage/_base.py` and are:
-
-1. `cs_cache_fake` ─ invalidates caches that depend on the current
-   state
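To make the cache invalidation rows above more concrete, here is a hedged Python sketch of how a worker might apply a single row from the `caches` stream against a toy in-memory cache. The function `apply_cache_invalidation` and the `caches` dictionary are invented for illustration, and the sketch ignores special cache names such as `cs_cache_fake`; Synapse's real handling lives in its storage layer:

```python
from typing import Any, Dict, List

# A toy stand-in for a worker's caches: cache name -> {key tuple -> value}.
caches: Dict[str, Dict[tuple, Any]] = {
    "get_user_by_id": {("@bob:example.com",): {"admin": False}},
}


def apply_cache_invalidation(row: List[Any]) -> None:
    """Apply one row from the `caches` stream.

    Rows look like ["get_user_by_id", ["@bob:example.com"], 1550574873251]:
    the cache name, the key to invalidate (or null/None to drop the whole
    cache), and a timestamp in milliseconds.
    """
    cache_name, keys, _ts = row
    cache = caches.get(cache_name)
    if cache is None:
        return  # This worker does not have that cache.
    if keys is None:
        cache.clear()  # A null key invalidates the entire cache.
    else:
        cache.pop(tuple(keys), None)


# Example, using the row shown above:
apply_cache_invalidation(["get_user_by_id", ["@bob:example.com"], 1550574873251])
assert caches["get_user_by_id"] == {}
```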
diff --git a/docs/usage/configuration/config_documentation.md b/docs/usage/configuration/config_documentation.md
index 65025d3840..9fbc328042 100644
--- a/docs/usage/configuration/config_documentation.md
+++ b/docs/usage/configuration/config_documentation.md
@@ -3493,7 +3493,7 @@ user_consent:
 ---
 ### `stats`
 
-Settings for local room and user statistics collection. See [here](../../room_and_user_statistics.md)
+Settings for local room and user statistics collection. See [here](../../development/internal_documentation/room_and_user_statistics.md)
 for more.
 
 * `enabled`: Set to false to disable room and user statistics. Note that doing
@@ -3642,7 +3642,7 @@ synapse or any other services which support opentracing
 Sub-options include:
 * `enabled`: whether tracing is enabled. Set to true to enable. Disabled by default.
 * `homeserver_whitelist`: The list of homeservers we wish to send and receive span contexts and span baggage.
-  See [here](../../opentracing.md) for more.
+  See [here](../../development/opentracing.md) for more.
   This is a list of regexes which are matched against the `server_name` of the homeserver.
   By default, it is empty, so no servers are matched.
 * `force_tracing_for_users`: # A list of the matrix IDs of users whose requests will always be traced,
-- 
cgit 1.5.1
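The configuration hunk above documents the `stats` and `opentracing` sections of `config_documentation.md`. Purely as an illustration of the opentracing sub-options it names (`enabled`, `homeserver_whitelist`, `force_tracing_for_users`), a `homeserver.yaml` fragment might look like the sketch below; the server-name regex and user ID are placeholders, not recommendations:

```yaml
# Hypothetical homeserver.yaml fragment; values are examples only.
opentracing:
  enabled: true
  homeserver_whitelist:
    - '.*\.example\.com'   # regexes matched against the homeserver's server_name
  force_tracing_for_users:
    - "@alice:example.com"  # requests from these matrix IDs are always traced
```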