Diffstat (limited to 'docs/development')
-rw-r--r-- | docs/development/contributing_guide.md | 66
-rw-r--r-- | docs/development/database_schema.md | 34
-rw-r--r-- | docs/development/dependencies.md | 33
-rw-r--r-- | docs/development/releases.md | 4
-rw-r--r-- | docs/development/synapse_architecture/faster_joins.md | 375
-rw-r--r-- | docs/development/synapse_architecture/streams.md | 157
6 files changed, 614 insertions, 55 deletions
diff --git a/docs/development/contributing_guide.md b/docs/development/contributing_guide.md index 342bc1d340..4ae2fcfee3 100644 --- a/docs/development/contributing_guide.md +++ b/docs/development/contributing_guide.md @@ -22,15 +22,17 @@ on Windows is not officially supported. The code of Synapse is written in Python 3. To do pretty much anything, you'll need [a recent version of Python 3](https://www.python.org/downloads/). Your Python also needs support for [virtual environments](https://docs.python.org/3/library/venv.html). This is usually built-in, but some Linux distributions like Debian and Ubuntu split it out into its own package. Running `sudo apt install python3-venv` should be enough. +A recent version of the Rust compiler is needed to build the native modules. The +easiest way of installing the latest version is to use [rustup](https://rustup.rs/). + Synapse can connect to PostgreSQL via the [psycopg2](https://pypi.org/project/psycopg2/) Python library. Building this library from source requires access to PostgreSQL's C header files. On Debian or Ubuntu Linux, these can be installed with `sudo apt install libpq-dev`. +Synapse has an optional, improved user search with better Unicode support. For that you need the development package of `libicu`. On Debian or Ubuntu Linux, this can be installed with `sudo apt install libicu-dev`. + The source code of Synapse is hosted on GitHub. You will also need [a recent version of git](https://github.com/git-guides/install-git). For some tests, you will need [a recent version of Docker](https://docs.docker.com/get-docker/). -A recent version of the Rust compiler is needed to build the native modules. The -easiest way of installing the latest version is to use [rustup](https://rustup.rs/). - # 3. Get the source. @@ -51,6 +53,11 @@ can find many good git tutorials on the web. # 4. 
Install the dependencies + +Before installing the Python dependencies, make sure you have installed a recent version +of Rust (see the "What do I need?" section above). The easiest way of installing the +latest version is to use [rustup](https://rustup.rs/). + Synapse uses the [poetry](https://python-poetry.org/) project to manage its dependencies and development environment. Once you have installed Python 3 and added the source, you should install `poetry`. @@ -65,7 +72,7 @@ pipx install poetry but see poetry's [installation instructions](https://python-poetry.org/docs/#installation) for other installation methods. -Synapse requires Poetry version 1.2.0 or later. +Developing Synapse requires Poetry version 1.3.2 or later. Next, open a terminal and install dependencies as follows: @@ -74,8 +81,39 @@ cd path/where/you/have/cloned/the/repository poetry install --extras all ``` -This will install the runtime and developer dependencies for the project. +This will install the runtime and developer dependencies for the project. Be sure to check +that the `poetry install` step completed cleanly. 
+ +## Running Synapse via poetry + +To start a local instance of Synapse in the locked poetry environment, create a config file: + +```sh +cp docs/sample_config.yaml homeserver.yaml +cp docs/sample_log_config.yaml log_config.yaml +``` + +Now edit `homeserver.yaml`, things you might want to change include: + +- Set a `server_name` +- Adjusting paths to be correct for your system like the `log_config` to point to the log config you just copied +- Using a [PostgreSQL database instead of SQLite](https://matrix-org.github.io/synapse/latest/usage/configuration/config_documentation.html#database) +- Adding a [`registration_shared_secret`](https://matrix-org.github.io/synapse/latest/usage/configuration/config_documentation.html#registration_shared_secret) so you can use [`register_new_matrix_user` command](https://matrix-org.github.io/synapse/latest/setup/installation.html#registering-a-user). + +And then run Synapse with the following command: + +```sh +poetry run python -m synapse.app.homeserver -c homeserver.yaml +``` + +If you get an error like the following: + +``` +importlib.metadata.PackageNotFoundError: matrix-synapse +``` +this probably indicates that the `poetry install` step did not complete cleanly - go back and +resolve any issues and re-run until successful. # 5. Get in touch. @@ -104,8 +142,8 @@ regarding Synapse's Admin API, which is used mostly by sysadmins and external service developers. Synapse's code style is documented [here](../code_style.md). Please follow -it, including the conventions for the [sample configuration -file](../code_style.md#configuration-file-format). +it, including the conventions for [configuration +options and documentation](../code_style.md#configuration-code-and-documentation-format). We welcome improvements and additions to our documentation itself! When writing new pages, please @@ -124,7 +162,7 @@ changes to the Rust code. # 8. Test, test, test! 
-<a name="test-test-test"></a> +<a name="test-test-test" id="test-test-test"></a> While you're developing and before submitting a patch, you'll want to test your code. @@ -228,7 +266,7 @@ The easiest way to do so is to run Postgres via a docker container. In one terminal: ```shell -docker run --rm -e POSTGRES_PASSWORD=mysecretpassword -e POSTGRES_USER=postgres -e POSTGRES_DB=postgress -p 5432:5432 postgres:14 +docker run --rm -e POSTGRES_PASSWORD=mysecretpassword -e POSTGRES_USER=postgres -e POSTGRES_DB=postgres -p 5432:5432 postgres:14 ``` If you see an error like @@ -284,7 +322,7 @@ The following command will let you run the integration test with the most common configuration: ```sh -$ docker run --rm -it -v /path/where/you/have/cloned/the/repository\:/src:ro -v /path/to/where/you/want/logs\:/logs matrixdotorg/sytest-synapse:buster +$ docker run --rm -it -v /path/where/you/have/cloned/the/repository\:/src:ro -v /path/to/where/you/want/logs\:/logs matrixdotorg/sytest-synapse:focal ``` (Note that the paths must be full paths! You could also write `$(realpath relative/path)` if needed.) @@ -330,6 +368,9 @@ The above will run a monolithic (single-process) Synapse with SQLite as the data [here](https://github.com/matrix-org/synapse/blob/develop/docker/configure_workers_and_start.py#L54). A safe example would be `WORKER_TYPES="federation_inbound, federation_sender, synchrotron"`. See the [worker documentation](../workers.md) for additional information on workers. +- Passing `ASYNCIO_REACTOR=1` as an environment variable to use the Twisted asyncio reactor instead of the default one. +- Passing `PODMAN=1` will use the [podman](https://podman.io/) container runtime, instead of docker. +- Passing `UNIX_SOCKETS=1` will utilise Unix socket functionality for Synapse, Redis, and Postgres(when applicable). 
To increase the log level for the tests, set `SYNAPSE_TEST_LOG_LEVEL`, e.g: ```sh @@ -380,7 +421,7 @@ To prepare a Pull Request, please: ## Changelog All changes, even minor ones, need a corresponding changelog / newsfragment -entry. These are managed by [Towncrier](https://github.com/hawkowl/towncrier). +entry. These are managed by [Towncrier](https://github.com/twisted/towncrier). To create a changelog entry, make a new file in the `changelog.d` directory named in the format of `PRnumber.type`. The type can be one of the following: @@ -422,8 +463,7 @@ chicken-and-egg problem. There are two options for solving this: 1. Open the PR without a changelog file, see what number you got, and *then* - add the changelog file to your branch (see [Updating your pull - request](#updating-your-pull-request)), or: + add the changelog file to your branch, or: 1. Look at the [list of all issues/PRs](https://github.com/matrix-org/synapse/issues?q=), add one to the diff --git a/docs/development/database_schema.md b/docs/development/database_schema.md index 29945c264e..e231be21dd 100644 --- a/docs/development/database_schema.md +++ b/docs/development/database_schema.md @@ -155,43 +155,11 @@ def run_upgrade( Boolean columns require special treatment, since SQLite treats booleans the same as integers. -There are three separate aspects to this: - - * Any new boolean column must be added to the `BOOLEAN_COLUMNS` list in +Any new boolean column must be added to the `BOOLEAN_COLUMNS` list in `synapse/_scripts/synapse_port_db.py`. This tells the port script to cast the integer value from SQLite to a boolean before writing the value to the postgres database. - * Before SQLite 3.23, `TRUE` and `FALSE` were not recognised as constants by - SQLite, and the `IS [NOT] TRUE`/`IS [NOT] FALSE` operators were not - supported. This makes it necessary to avoid using `TRUE` and `FALSE` - constants in SQL commands. 
- - For example, to insert a `TRUE` value into the database, write: - - ```python - txn.execute("INSERT INTO tbl(col) VALUES (?)", (True, )) - ``` - - * Default values for new boolean columns present a particular - difficulty. Generally it is best to create separate schema files for - Postgres and SQLite. For example: - - ```sql - # in 00delta.sql.postgres: - ALTER TABLE tbl ADD COLUMN col BOOLEAN DEFAULT FALSE; - ``` - - ```sql - # in 00delta.sql.sqlite: - ALTER TABLE tbl ADD COLUMN col BOOLEAN DEFAULT 0; - ``` - - Note that there is a particularly insidious failure mode here: the Postgres - flavour will be accepted by SQLite 3.22, but will give a column whose - default value is the **string** `"FALSE"` - which, when cast back to a boolean - in Python, evaluates to `True`. - ## `event_id` global uniqueness diff --git a/docs/development/dependencies.md b/docs/development/dependencies.md index 8474525480..b5926d96ff 100644 --- a/docs/development/dependencies.md +++ b/docs/development/dependencies.md @@ -2,6 +2,13 @@ This is a quick cheat sheet for developers on how to use [`poetry`](https://python-poetry.org/). +# Installing + +See the [contributing guide](contributing_guide.md#4-install-the-dependencies). + +Developers should use Poetry 1.3.2 or higher. If you encounter problems related +to poetry, please [double-check your poetry version](#check-the-version-of-poetry-with-poetry---version). + # Background Synapse uses a variety of third-party Python packages to function as a homeserver. @@ -123,7 +130,7 @@ context of poetry's venv, without having to run `poetry shell` beforehand. ## ...reset my venv to the locked environment? ```shell -poetry install --extras all --remove-untracked +poetry install --all-extras --sync ``` ## ...delete everything and start over from scratch? @@ -183,7 +190,6 @@ Either: - manually update `pyproject.toml`; then `poetry lock --no-update`; or else - `poetry add packagename`. 
See `poetry add --help`; note the `--dev`, `--extras` and `--optional` flags in particular. - - **NB**: this specifies the new package with a version given by a "caret bound". This won't get forced to its lowest version in the old deps CI job: see [this TODO](https://github.com/matrix-org/synapse/blob/4e1374373857f2f7a911a31c50476342d9070681/.ci/scripts/test_old_deps.sh#L35-L39). Include the updated `pyproject.toml` and `poetry.lock` files in your commit. @@ -196,7 +202,7 @@ poetry remove packagename ``` ought to do the trick. Alternatively, manually update `pyproject.toml` and -`poetry lock --no-update`. Include the updated `pyproject.toml` and poetry.lock` +`poetry lock --no-update`. Include the updated `pyproject.toml` and `poetry.lock` files in your commit. ## ...update the version range for an existing dependency? @@ -240,9 +246,6 @@ poetry export --extras all Be wary of bugs in `poetry export` and `pip install -r requirements.txt`. -Note: `poetry export` will be made a plugin in Poetry 1.2. Additional config may -be required. - ## ...build a test wheel? I usually use @@ -255,12 +258,28 @@ because [`build`](https://github.com/pypa/build) is a standardish tool which doesn't require poetry. (It's what we use in CI too). However, you could try `poetry build` too. +## ...handle a Dependabot pull request? + +Synapse uses Dependabot to keep the `poetry.lock` and `Cargo.lock` file +up-to-date with the latest releases of our dependencies. The changelog check is +omitted for Dependabot PRs; the release script will include them in the +changelog. + +When reviewing a dependabot PR, ensure that: + +* the lockfile changes look reasonable; +* the upstream changelog file (linked in the description) doesn't include any + breaking changes; +* continuous integration passes. + +In particular, any updates to the type hints (usually packages which start with `types-`) +should be safe to merge if linting passes. 
# Troubleshooting ## Check the version of poetry with `poetry --version`. -The minimum version of poetry supported by Synapse is 1.2. +The minimum version of poetry supported by Synapse is 1.3.2. It can also be useful to check the version of `poetry-core` in use. If you've installed `poetry` with `pipx`, try `pipx runpip poetry list | grep diff --git a/docs/development/releases.md b/docs/development/releases.md index c9a8c69945..6e83c81e27 100644 --- a/docs/development/releases.md +++ b/docs/development/releases.md @@ -12,7 +12,7 @@ Note that this schedule might be modified depending on the availability of the Synapse team, e.g. releases may be skipped to avoid holidays. Release announcements can be found in the -[release category of the Matrix blog](https://matrix.org/blog/category/releases). +[release category of the Matrix blog](https://matrix.org/category/releases). ## Bugfix releases @@ -34,4 +34,4 @@ be held to be released together. In some cases, a pre-disclosure of a security release will be issued as a notice to Synapse operators that there is an upcoming security release. These can be -found in the [security category of the Matrix blog](https://matrix.org/blog/category/security). +found in the [security category of the Matrix blog](https://matrix.org/category/security). diff --git a/docs/development/synapse_architecture/faster_joins.md b/docs/development/synapse_architecture/faster_joins.md new file mode 100644 index 0000000000..2256c30239 --- /dev/null +++ b/docs/development/synapse_architecture/faster_joins.md @@ -0,0 +1,375 @@ +# How do faster joins work? + +This is a work-in-progress set of notes with two goals: +- act as a reference, explaining how Synapse implements faster joins; and +- record the rationale behind our choices. + +See also [MSC3902](https://github.com/matrix-org/matrix-spec-proposals/pull/3902). + +The key idea is described by [MSC3706](https://github.com/matrix-org/matrix-spec-proposals/pull/3706). 
This allows servers to +request a lightweight response to the federation `/send_join` endpoint. +This is called a **faster join**, also known as a **partial join**. In these +notes we'll usually use the word "partial" as it matches the database schema. + +## Overview: processing events in a partially-joined room + +The response to a partial join consists of +- the requested join event `J`, +- a list of the servers in the room (according to the state before `J`), +- a subset of the state of the room before `J`, +- the full auth chain of that state subset. + +Synapse marks the room as partially joined by adding a row to the database table +`partial_state_rooms`. It also marks the join event `J` as "partially stated", +meaning that we have neither received nor computed the full state before/after +`J`. This is done by adding a row to `partial_state_events`. + +<details><summary>DB schema</summary> + +``` +matrix=> \d partial_state_events +Table "matrix.partial_state_events" + Column │ Type │ Collation │ Nullable │ Default +══════════╪══════╪═══════════╪══════════╪═════════ + room_id │ text │ │ not null │ + event_id │ text │ │ not null │ + +matrix=> \d partial_state_rooms + Table "matrix.partial_state_rooms" + Column │ Type │ Collation │ Nullable │ Default +════════════════════════╪════════╪═══════════╪══════════╪═════════ + room_id │ text │ │ not null │ + device_lists_stream_id │ bigint │ │ not null │ 0 + join_event_id │ text │ │ │ + joined_via │ text │ │ │ + +matrix=> \d partial_state_rooms_servers + Table "matrix.partial_state_rooms_servers" + Column │ Type │ Collation │ Nullable │ Default +═════════════╪══════╪═══════════╪══════════╪═════════ + room_id │ text │ │ not null │ + server_name │ text │ │ not null │ +``` + +Indices, foreign-keys and check constraints are omitted for brevity. +</details> + +While partially joined to a room, Synapse receives events `E` from remote +homeservers as normal, and can create events at the request of its local users. 
+However, we run into trouble when we enforce the [checks on an event]. + +> 1. Is a valid event, otherwise it is dropped. For an event to be valid, it + must contain a room_id, and it must comply with the event format of that +> room version. +> 2. Passes signature checks, otherwise it is dropped. +> 3. Passes hash checks, otherwise it is redacted before being processed further. +> 4. Passes authorization rules based on the event’s auth events, otherwise it +> is rejected. +> 5. **Passes authorization rules based on the state before the event, otherwise +> it is rejected.** +> 6. **Passes authorization rules based on the current state of the room, +> otherwise it is “soft failed”.** + +[checks on an event]: https://spec.matrix.org/v1.5/server-server-api/#checks-performed-on-receipt-of-a-pdu + +We can enforce checks 1--4 without any problems. +But we cannot enforce checks 5 or 6 with complete certainty, since Synapse does +not know the full state before `E`, nor that of the room. + +### Partial state + +Instead, we make a best-effort approximation. +While the room is considered partially joined, Synapse tracks the "partial +state" before events. +This works in a similar way as regular state: + +- The partial state before `J` is that given to us by the partial join response. +- The partial state before an event `E` is the resolution of the partial states + after each of `E`'s `prev_event`s. +- If `E` is rejected or a message event, the partial state after `E` is the + partial state before `E`. +- Otherwise, the partial state after `E` is the partial state before `E`, plus + `E` itself. + +More concisely, partial state propagates just like full state; the only +difference is that we "seed" it with an incomplete initial state. +Synapse records that we have only calculated partial state for this event with +a row in `partial_state_events`. + +While the room remains partially stated, check 5 on incoming events to that +room becomes: + +> 5. 
Passes authorization rules based on **the resolution between the partial +> state before `E` and `E`'s auth events.** If the event fails to pass +> authorization rules, it is rejected. + +Additionally, check 6 is deleted: no soft-failures are enforced. + +While partially joined, the current partial state of the room is defined as the +resolution across the partial states after all forward extremities in the room. + +_Remark._ Events with partial state are _not_ considered +[outliers](../room-dag-concepts.md#outliers). + +### Approximation error + +Using partial state means the auth checks can fail in a few different ways[^2]. + +[^2]: Is this exhaustive? + +- We may erroneously accept an incoming event in check 5 based on partial state + when it would have been rejected based on full state, or vice versa. +- This means that an event could erroneously be added to the current partial + state of the room when it would not be present in the full state of the room, + or vice versa. +- Additionally, we may have skipped soft-failing an event that would have been + soft-failed based on full state. + +(Note that the discrepancies described in the last two bullets are user-visible.) + +This means that we have to be very careful when we want to lookup pieces of room +state in a partially-joined room. Our approximation of the state may be +incorrect or missing. But we can make some educated guesses. If + +- our partial state is likely to be correct, or +- the consequences of our partial state being incorrect are minor, + +then we proceed as normal, and let the resync process fix up any mistakes (see +below). + +When is our partial state likely to be correct? + +- It's more accurate the closer we are to the partial join event. (So we should + ideally complete the resync as soon as possible.) +- Non-member events: we will have received them as part of the partial join + response, if they were part of the room state at that point. 
We may + incorrectly accept or reject updates to that state (at first because we lack + remote membership information; later because of compounding errors), so these + can become incorrect over time. +- Local members' memberships: we are the only ones who can create join and + knock events for our users. We can't be completely confident in the + correctness of bans, invites and kicks from other homeservers, but the resync + process should correct any mistakes. +- Remote members' memberships: we did not receive these in the /send_join + response, so we have essentially no idea if these are correct or not. + +In short, we deem it acceptable to trust the partial state for non-membership +and local membership events. For remote membership events, we wait for the +resync to complete, at which point we have the full state of the room and can +proceed as normal. + +### Fixing the approximation with a resync + +The partial-state approximation is only a temporary affair. In the background, +Synapse begins a "resync" process. This is a continuous loop, starting at the +partial join event and proceeding downwards through the event graph. For each +`E` seen in the room since partial join, Synapse will fetch + +- the event ids in the state of the room before `E`, via + [`/state_ids`](https://spec.matrix.org/v1.5/server-server-api/#get_matrixfederationv1state_idsroomid); +- the event ids in the full auth chain of `E`, included in the `/state_ids` + response; and +- any events from the previous two bullets that Synapse hasn't persisted, via + [`/state`](https://spec.matrix.org/v1.5/server-server-api/#get_matrixfederationv1stateroomid). + +This means Synapse has (or can compute) the full state before `E`, which allows +Synapse to properly authorise or reject `E`. At this point, the event +is considered to have "full state" rather than "partial state". We record this +by removing `E` from the `partial_state_events` table. 
+ +\[**TODO:** Does Synapse persist a new state group for the full state +before `E`, or do we alter the (partial-)state group in-place? Are state groups +ever marked as partially-stated? \] + +This scheme means it is possible for us to have accepted and sent an event to +clients, only to reject it during the resync. From a client's perspective, the +effect is similar to a retroactive +state change due to state resolution---i.e. a "state reset".[^3] + +[^3]: Clients should refresh caches to detect such a change. Rumour has it that +sliding sync will fix this. + +When all events since the join `J` have been fully-stated, the room resync +process is complete. We record this by removing the room from +`partial_state_rooms`. + +## Faster joins on workers + +For the time being, the resync process happens on the master worker. +A new replication stream `un_partial_stated_room` is added. Whenever a resync +completes and a partial-state room becomes fully stated, a new message is sent +into that stream containing the room ID. + +## Notes on specific cases + +> **NB.** The notes below are rough. Some of them are hidden under `<details>` +disclosures because they have yet to be implemented in mainline Synapse. + +### Creating events during a partial join + +When sending out messages during a partial join, we assume our partial state is +accurate and proceed as normal. For this to have any hope of succeeding at all, +our partial state must contain an entry for each of the (type, state key) pairs +[specified by the auth rules](https://spec.matrix.org/v1.3/rooms/v10/#authorization-rules): + +- `m.room.create` +- `m.room.join_rules` +- `m.room.power_levels` +- `m.room.third_party_invite` +- `m.room.member` + +The first four of these should be present in the state before `J` that is given +to us in the partial join response; only membership events are omitted. In order +for us to consider the user joined, we must have their membership event. 
That +means the only possible omission is the target's membership in an invite, kick +or ban. + +The worst possibility is that we locally invite someone who is banned according to +the full state, because we lack their ban in our current partial state. The rest +of the federation---at least, those who are fully joined---should correctly +enforce the [membership transition constraints]( + https://spec.matrix.org/v1.3/client-server-api/#room-membership +). So any erroneous invite should be ignored by fully-joined +homeservers and resolved by the resync for partially-joined homeservers. + + + +More generally, there are two problems we're worrying about here: + +- We might create an event that is valid under our partial state, only to later + find out that it is actually invalid according to the full state. +- Or: we might refuse to create an event that is invalid under our partial + state, even though it would be perfectly valid under the full state. + +However, we expect such problems to be unlikely in practice, because + +- We trust that the room has sensible power levels, e.g. that bad actors with + high power levels are demoted before their ban. +- We trust that the resident server provides us up-to-date power levels, join + rules, etc. +- State changes in rooms are relatively infrequent, and the resync period is + relatively quick. + +#### Sending out the event over federation + +**TODO:** needs prose fleshing out. + +Normally: send out in a fed txn to all HSes in the room. +We only know that some HSes were in the room at some point. Wat do. +Send it out to the list of servers from the first join. +**TODO** what do we do here if we have full state? +If the prev event was created by us, we can risk sending it to the wrong HS. (Motivation: privacy concern of the content. Not such a big deal for a public room or an encrypted room. But non-encrypted invite-only...) +But don't want to send out sensitive data in other HS's events in this way. 
+ +Suppose we discover after resync that we shouldn't have sent out one of our events (not a prev_event) to a target HS. Not much we can do. +What about if we didn't send them an event but should've? +E.g. what if someone joined from a new HS shortly after you did? We wouldn't talk to them. +Could imagine sending out the "Missed" events after the resync but... painful to work out what they should have seen if they joined/left. +Instead, just send them the latest event (if they're still in the room after resync) and let them backfill.(?) +- Don't do this currently. +- If anyone who has received our messages sends a message to a HS we missed, they can backfill our messages +- Gap: rooms which are infrequently used and take a long time to resync. + +### Joining after a partial join + +**NB.** Not yet implemented. + +<details> + +**TODO:** needs prose fleshing out. Liaise with Matthieu. Explain why /send_join +(Rich was surprised we didn't just create it locally. Answer: to try and avoid +a join which then gets rejected after resync.) + +We don't know for sure that any join we create would be accepted. +E.g. the joined user might have been banned; the join rules might have changed in a way that we didn't realise... some way in which the partial state was mistaken. +Instead, do another partial make-join/send-join handshake to confirm that the join works. +- Probably going to get a bunch of duplicate state events and auth events.... but the point of partial joins is that these should be small. Many are already persisted = good. +- What if the second send_join response includes a different list of resident HSes? Could ignore it. + - Could even have a special flag that says "just make me a join", i.e. don't bother giving me state or servers in room. Deffo want the auth chain tho. +- SQ: wrt device lists it's a lot safer to ignore it!!!!! +- What if the state at the second join is inconsistent with what we have? Ignore it? 
+ +</details> + +### Leaving (and kicks and bans) after a partial join + +**NB.** Not yet implemented. + +<details> + +When you're fully joined to a room, to have `U` leave a room their homeserver +needs to + +- create a new leave event for `U` which will be accepted by other homeservers, + and +- send that event out to the homeservers in the federation. + +When is a leave event accepted? See +[v10 auth rules](https://spec.matrix.org/v1.5/rooms/v10/#authorization-rules): + +> 4. If type is m.room.member: [...] + > + > 5. If membership is leave: + > + > 1. If the sender matches state_key, allow if and only if that user’s current membership state is invite, join, or knock. +> 2. [...] + +I think this means that (well-formed!) self-leaves are governed entirely by +4.5.1. This means that if we correctly calculate state which says that `U` is +invited, joined or knocked and include it in the leave's auth events, our event +is accepted by checks 4 and 5 on incoming events. + +> 4. Passes authorization rules based on the event’s auth events, otherwise + > it is rejected. +> 5. Passes authorization rules based on the state before the event, otherwise + > it is rejected. + +The only way to fail check 6 is if the receiving server's current state of the +room says that `U` is banned, has left, or has no membership event. But this is +fine: the receiving server already thinks that `U` isn't in the room. + +> 6. Passes authorization rules based on the current state of the room, + > otherwise it is “soft failed”. + +For the second point (publishing the leave event), the best thing we can do is +to publish to all HSes we know to be currently in the room. If they miss that +event, they might send us traffic in the room that we don't care about. This is +a problem with leaving after a "full" join; we don't seek to fix this with +partial joins. + +(With that said: there's nothing machine-readable in the /send response. 
I don't +think we can deduce "destination has left the room" from a failure to /send an +event into that room?) + +#### Can we still do this during a partial join? + +We can create leave events and can choose what gets included in our auth events, +so we can be sure that we pass check 4 on incoming events. For check 5, we might +have an incorrect view of the state before an event. +The only way we might erroneously think a leave is valid is if + +- the partial state before the leave has `U` joined, invited or knocked, but +- the full state before the leave has `U` banned, left or not present, + +in which case the leave doesn't make anything worse: other HSes already consider +us as not in the room, and will continue to do so after seeing the leave. + +The remaining obstacle is then: can we safely broadcast the leave event? We may +miss servers or incorrectly think that a server is in the room. Or the +destination server may be offline and miss the transaction containing our leave +event. This should self-heal when they see an event whose `prev_events` descend +from our leave. + +Another option we considered was to use federation `/send_leave` to ask a +fully-joined server to send out the event on our behalf. But that introduces +complexity without much benefit. Besides, as Rich put it, + +> sending out leaves is pretty best-effort currently + +so this is probably good enough as-is. + +#### Cleanup after the last leave + +**TODO**: what cleanup is necessary? Is it all just nice-to-have to save unused +work? +</details> diff --git a/docs/development/synapse_architecture/streams.md b/docs/development/synapse_architecture/streams.md new file mode 100644 index 0000000000..bee0b8a8c0 --- /dev/null +++ b/docs/development/synapse_architecture/streams.md @@ -0,0 +1,157 @@ +## Streams + +Synapse has a concept of "streams", which are roughly described in [`id_generators.py`]( + https://github.com/matrix-org/synapse/blob/develop/synapse/storage/util/id_generators.py +). 
+Generally speaking, streams are a series of notifications that something in Synapse's database has changed that the application might need to respond to.
+For example:
+
+- The events stream reports new events (PDUs) that Synapse creates, or that Synapse accepts from another homeserver.
+- The account data stream reports changes to users' [account data](https://spec.matrix.org/v1.7/client-server-api/#client-config).
+- The to-device stream reports when a device has a new [to-device message](https://spec.matrix.org/v1.7/client-server-api/#send-to-device-messaging).
+
+See [`synapse.replication.tcp.streams`](
+    https://github.com/matrix-org/synapse/blob/develop/synapse/replication/tcp/streams/__init__.py
+) for the full list of streams.
+
+It is very helpful to understand the streams mechanism when working on any part of Synapse that needs to respond to changes, especially if those changes are made by different workers.
+To that end, let's describe streams formally, paraphrasing from the docstring of [`AbstractStreamIdGenerator`](
+    https://github.com/matrix-org/synapse/blob/a719b703d9bd0dade2565ddcad0e2f3a7a9d4c37/synapse/storage/util/id_generators.py#L96
+).
+
+### Definition
+
+A stream is an append-only log `T1, T2, ..., Tn, ...` of facts[^1] which grows over time.
+Only "writers" can add facts to a stream, and there may be multiple writers.
+
+Each fact has an ID, called its "stream ID".
+Readers should only process facts in ascending stream ID order.
+
+Roughly speaking, each stream is backed by a database table.
+It should have a `stream_id` (or similar) bigint column holding stream IDs, plus additional columns as necessary to describe the fact.
+Typically, a fact is expressed with a single row in its backing table.[^2]
+Within a stream, no two facts may have the same stream_id.
+
+> _Aside_. Some additional notes on streams' backing tables.
+>
+> 1. Rich would like to [ditch the backing tables](https://github.com/matrix-org/synapse/issues/13456).
+> 2. The backing tables may have other uses.
+>    For example, the events table backs the events stream, and is read when processing new events.
+>    But old rows are read from the table all the time, whenever Synapse needs to look up some facts about an event.
+> 3. Rich suspects that sometimes the stream is backed by multiple tables, so the stream proper is the union of those tables.
+
+Stream writers can "reserve" a stream ID, and then later mark it as having been completed.
+Stream writers need to track the completion of each stream fact.
+In the happy case, completion means a fact has been written to the stream table.
+But unhappy cases (e.g. transaction rollback due to an error) also count as completion.
+Once completed, the rows written with that stream ID are fixed, and no new rows
+will be inserted with that ID.
+
+### Current stream ID
+
+For any given stream reader (including writers themselves), we may define a per-writer current stream ID:
+
+> The current stream ID _for a writer W_ is the largest stream ID such that
+> all transactions added by W with equal or smaller ID have completed.
+
+Similarly, there is a "linear" notion of current stream ID:
+
+> The "linear" current stream ID is the largest stream ID such that
+> all facts (added by any writer) with equal or smaller ID have completed.
+
+Because different stream readers A and B learn about new facts at different times, A and B may disagree about current stream IDs.
+Put differently: we should think of stream readers as being independent of each other, proceeding through a stream of facts at different rates.
+
+**NB.** For both senses of "current", if a writer opens a transaction that never completes, the current stream ID will never advance beyond that writer's last written stream ID.
+
+For single-writer streams, the per-writer current ID and the linear current ID are the same.
+Both senses of current ID are monotonic, but they may "skip" or jump over IDs because facts complete out of order.
+
+
+_Example_.
+Consider a single-writer stream which is initially at ID 1.
+
+| Action     | Current stream ID | Notes                                           |
+|------------|-------------------|-------------------------------------------------|
+|            | 1                 |                                                 |
+| Reserve 2  | 1                 |                                                 |
+| Reserve 3  | 1                 |                                                 |
+| Complete 3 | 1                 | current ID unchanged, waiting for 2 to complete |
+| Complete 2 | 3                 | current ID jumps from 1 -> 3                    |
+| Reserve 4  | 3                 |                                                 |
+| Reserve 5  | 3                 |                                                 |
+| Reserve 6  | 3                 |                                                 |
+| Complete 5 | 3                 |                                                 |
+| Complete 4 | 5                 | current ID jumps 3->5, even though 6 is pending |
+| Complete 6 | 6                 |                                                 |
+
+
+### Multi-writer streams
+
+There are two ways to view a multi-writer stream.
+
+1. Treat it as a collection of distinct single-writer streams, one
+   for each writer.
+2. Treat it as a single stream.
+
+The single stream (option 2) is conceptually simpler, and easier to represent (a single stream id).
+However, it requires each reader to know about the entire set of writers, to ensure that readers don't erroneously advance their current stream position too early and miss a fact from an unknown writer.
+In contrast, multiple parallel streams (option 1) are more complex, requiring more state to represent (map from writer to stream id).
+The payoff for doing so is that readers can "peek" ahead to facts that completed on one writer no matter the state of the others, reducing latency.
+
+Note that a multi-writer stream can be viewed in both ways.
+For example, the events stream is treated as multiple single-writer streams (option 1) by the sync handler, so that events are sent to clients as soon as possible.
+But the background process that works through events treats them as a single linear stream.
+
+Another useful example is the cache invalidation stream.
+The facts this stream holds are instructions of the form "you should now invalidate these cache entries".
+We only ever treat this as multiple single-writer streams as there is no important ordering between cache invalidations.
+(Invalidations are self-contained facts, and they commute and are idempotent.)
+
+### Writing to streams
+
+Writers need to track:
+ - their current position (i.e. their own per-writer stream ID).
+ - their facts currently awaiting completion.
+
+At startup,
+ - the current position of that writer can be found by querying the database (which suggests that facts need to be written to the database atomically, in a transaction); and
+ - there are no facts awaiting completion.
+
+To reserve a stream ID, call [`nextval`](https://www.postgresql.org/docs/current/functions-sequence.html) on the appropriate PostgreSQL sequence.
+
+To write a fact to the stream: insert the appropriate rows into the appropriate backing table.
+
+To complete a fact, first remove it from your map of facts currently awaiting completion.
+Then, if no earlier fact is awaiting completion, the writer can advance its current position in that stream.
+Upon doing so it should emit an `RDATA` message[^3], once for every fact between the old and the new stream ID.
+
+### Subscribing to streams
+
+Readers need to track the current position of every writer.
+
+At startup, they can find this by contacting each writer with a `REPLICATE` message,
+requesting that all writers reply describing their current position in their streams.
+Writers reply with a `POSITION` message.
+
+To learn about new facts, readers should listen for `RDATA` messages and process them to respond to the new fact.
+The `RDATA` itself is not a self-contained representation of the fact;
+readers will have to query the stream tables for the full details.
+Readers must also advance their record of the writer's current position for that stream.
+
+### Summary
+
+In a nutshell: we have an append-only log with a "buffer/scratchpad" at the end where we have to wait for the sequence to be linear and contiguous.
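
As a rough illustration of that scratchpad, here is a minimal sketch of the bookkeeping a single writer might do. It is not Synapse's actual ID generator: the class and method names are made up for this example, and an in-memory counter stands in for the PostgreSQL sequence.

```python
from __future__ import annotations


class SingleWriterStreamTracker:
    """Hypothetical helper: tracks reserved and completed stream IDs for one writer."""

    def __init__(self, current_id: int) -> None:
        # Every fact with an ID <= current_id is known to have completed.
        self.current_id = current_id
        # Most recently reserved ID (assumption: stands in for a database sequence).
        self._last_reserved = current_id
        # The "scratchpad": IDs that are reserved but not yet completed.
        self._pending: set[int] = set()

    def reserve(self) -> int:
        """Reserve the next stream ID and remember that it is awaiting completion."""
        self._last_reserved += 1
        self._pending.add(self._last_reserved)
        return self._last_reserved

    def complete(self, stream_id: int) -> list[int]:
        """Mark stream_id as completed; return the IDs the current position advanced over."""
        self._pending.remove(stream_id)
        old = self.current_id
        # Advance to just before the earliest pending fact, or to the
        # last reserved ID if nothing is pending any more.
        new = min(self._pending) - 1 if self._pending else self._last_reserved
        self.current_id = max(old, new)
        return list(range(old + 1, self.current_id + 1))


# Replay the first rows of the single-writer example table.
tracker = SingleWriterStreamTracker(current_id=1)
first = tracker.reserve()   # reserves 2
second = tracker.reserve()  # reserves 3
tracker.complete(second)    # returns []; current_id stays at 1, waiting for 2
tracker.complete(first)     # returns [2, 3]; current_id jumps from 1 to 3
```

In this sketch, the list returned by `complete` corresponds to the facts for which the writer would announce progress (one `RDATA` per fact, in the terms used above). A real writer would also have to cope with restarts and with multiple writers sharing the sequence, which this sketch ignores.
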
+
+
+---
+
+[^1]: We use the word _fact_ here for two reasons.
+Firstly, the word "event" is already heavily overloaded (PDUs, EDUs, account data, ...) and we don't need to make that worse.
+Secondly, "fact" emphasises that the things we append to a stream cannot change after the fact.
+
+[^2]: A fact might be expressed with 0 rows, e.g. if we opened a transaction to persist an event, but failed and rolled the transaction back before marking the fact as completed.
+In principle a fact might be expressed with 2 or more rows; if so, each of those rows should share the fact's stream ID.
+
+[^3]: This communication used to happen directly with the writers [over TCP](../../tcp_replication.md);
+nowadays it's done via Redis Pub/Sub.