Diffstat (limited to 'docs/development')
-rw-r--r-- docs/development/contributing_guide.md | 66
-rw-r--r-- docs/development/database_schema.md | 34
-rw-r--r-- docs/development/dependencies.md | 33
-rw-r--r-- docs/development/releases.md | 4
-rw-r--r-- docs/development/synapse_architecture/faster_joins.md | 375
-rw-r--r-- docs/development/synapse_architecture/streams.md | 157
6 files changed, 614 insertions, 55 deletions
diff --git a/docs/development/contributing_guide.md b/docs/development/contributing_guide.md
index 342bc1d340..4ae2fcfee3 100644
--- a/docs/development/contributing_guide.md
+++ b/docs/development/contributing_guide.md
@@ -22,15 +22,17 @@ on Windows is not officially supported.
 
 The code of Synapse is written in Python 3. To do pretty much anything, you'll need [a recent version of Python 3](https://www.python.org/downloads/). Your Python also needs support for [virtual environments](https://docs.python.org/3/library/venv.html). This is usually built-in, but some Linux distributions like Debian and Ubuntu split it out into its own package. Running `sudo apt install python3-venv` should be enough.
 
+A recent version of the Rust compiler is needed to build the native modules. The
+easiest way of installing the latest version is to use [rustup](https://rustup.rs/).
+
 Synapse can connect to PostgreSQL via the [psycopg2](https://pypi.org/project/psycopg2/) Python library. Building this library from source requires access to PostgreSQL's C header files. On Debian or Ubuntu Linux, these can be installed with `sudo apt install libpq-dev`.
 
+Synapse has an optional, improved user search with better Unicode support. For that you need the development package of `libicu`. On Debian or Ubuntu Linux, this can be installed with `sudo apt install libicu-dev`.
+
 The source code of Synapse is hosted on GitHub. You will also need [a recent version of git](https://github.com/git-guides/install-git).
 
 For some tests, you will need [a recent version of Docker](https://docs.docker.com/get-docker/).
 
-A recent version of the Rust compiler is needed to build the native modules. The
-easiest way of installing the latest version is to use [rustup](https://rustup.rs/).
-
 
 # 3. Get the source.
 
@@ -51,6 +53,11 @@ can find many good git tutorials on the web.
 
 # 4. Install the dependencies
 
+
+Before installing the Python dependencies, make sure you have installed a recent version
+of Rust (see the "What do I need?" section above). The easiest way of installing the
+latest version is to use [rustup](https://rustup.rs/).
+
 Synapse uses the [poetry](https://python-poetry.org/) project to manage its dependencies
 and development environment. Once you have installed Python 3 and added the
 source, you should install `poetry`.
@@ -65,7 +72,7 @@ pipx install poetry
 but see poetry's [installation instructions](https://python-poetry.org/docs/#installation)
 for other installation methods.
 
-Synapse requires Poetry version 1.2.0 or later.
+Developing Synapse requires Poetry version 1.3.2 or later.
 
 Next, open a terminal and install dependencies as follows:
 
@@ -74,8 +81,39 @@ cd path/where/you/have/cloned/the/repository
 poetry install --extras all
 ```
 
-This will install the runtime and developer dependencies for the project.
+This will install the runtime and developer dependencies for the project.  Be sure to check
+that the `poetry install` step completed cleanly.
+
+## Running Synapse via poetry
+
+To start a local instance of Synapse in the locked poetry environment, create a config file:
+
+```sh
+cp docs/sample_config.yaml homeserver.yaml
+cp docs/sample_log_config.yaml log_config.yaml
+```
+
+Now edit `homeserver.yaml`. Things you might want to change include:
+
+- Setting a `server_name`
+- Adjusting paths to be correct for your system, such as pointing `log_config` at the log config you just copied
+- Using a [PostgreSQL database instead of SQLite](https://matrix-org.github.io/synapse/latest/usage/configuration/config_documentation.html#database)
+- Adding a [`registration_shared_secret`](https://matrix-org.github.io/synapse/latest/usage/configuration/config_documentation.html#registration_shared_secret) so you can use [`register_new_matrix_user` command](https://matrix-org.github.io/synapse/latest/setup/installation.html#registering-a-user).
+
+And then run Synapse with the following command:
+
+```sh
+poetry run python -m synapse.app.homeserver -c homeserver.yaml
+```
+
+If you get an error like the following:
+
+```
+importlib.metadata.PackageNotFoundError: matrix-synapse
+```
 
+this probably indicates that the `poetry install` step did not complete cleanly - go back and
+resolve any issues and re-run until successful.
 
 # 5. Get in touch.
 
@@ -104,8 +142,8 @@ regarding Synapse's Admin API, which is used mostly by sysadmins and external
 service developers.
 
 Synapse's code style is documented [here](../code_style.md). Please follow
-it, including the conventions for the [sample configuration
-file](../code_style.md#configuration-file-format).
+it, including the conventions for [configuration
+options and documentation](../code_style.md#configuration-code-and-documentation-format).
 
 We welcome improvements and additions to our documentation itself! When
 writing new pages, please
@@ -124,7 +162,7 @@ changes to the Rust code.
 
 
 # 8. Test, test, test!
-<a name="test-test-test"></a>
+<a name="test-test-test" id="test-test-test"></a>
 
 While you're developing and before submitting a patch, you'll
 want to test your code.
@@ -228,7 +266,7 @@ The easiest way to do so is to run Postgres via a docker container. In one
 terminal:
 
 ```shell
-docker run --rm -e POSTGRES_PASSWORD=mysecretpassword -e POSTGRES_USER=postgres -e POSTGRES_DB=postgress -p 5432:5432 postgres:14
+docker run --rm -e POSTGRES_PASSWORD=mysecretpassword -e POSTGRES_USER=postgres -e POSTGRES_DB=postgres -p 5432:5432 postgres:14
 ```
 
 If you see an error like
@@ -284,7 +322,7 @@ The following command will let you run the integration test with the most common
 configuration:
 
 ```sh
-$ docker run --rm -it -v /path/where/you/have/cloned/the/repository\:/src:ro -v /path/to/where/you/want/logs\:/logs matrixdotorg/sytest-synapse:buster
+$ docker run --rm -it -v /path/where/you/have/cloned/the/repository\:/src:ro -v /path/to/where/you/want/logs\:/logs matrixdotorg/sytest-synapse:focal
 ```
 (Note that the paths must be full paths! You could also write `$(realpath relative/path)` if needed.)
 
@@ -330,6 +368,9 @@ The above will run a monolithic (single-process) Synapse with SQLite as the data
     [here](https://github.com/matrix-org/synapse/blob/develop/docker/configure_workers_and_start.py#L54).
     A safe example would be `WORKER_TYPES="federation_inbound, federation_sender, synchrotron"`.
     See the [worker documentation](../workers.md) for additional information on workers.
+- Passing `ASYNCIO_REACTOR=1` as an environment variable to use the Twisted asyncio reactor instead of the default one.
+- Passing `PODMAN=1` will use the [podman](https://podman.io/) container runtime, instead of docker.
+- Passing `UNIX_SOCKETS=1` will utilise Unix socket functionality for Synapse, Redis, and Postgres (when applicable).
 
 To increase the log level for the tests, set `SYNAPSE_TEST_LOG_LEVEL`, e.g:
 ```sh
@@ -380,7 +421,7 @@ To prepare a Pull Request, please:
 ## Changelog
 
 All changes, even minor ones, need a corresponding changelog / newsfragment
-entry. These are managed by [Towncrier](https://github.com/hawkowl/towncrier).
+entry. These are managed by [Towncrier](https://github.com/twisted/towncrier).
 
 To create a changelog entry, make a new file in the `changelog.d` directory named
 in the format of `PRnumber.type`. The type can be one of the following:
@@ -422,8 +463,7 @@ chicken-and-egg problem.
 There are two options for solving this:
 
 1. Open the PR without a changelog file, see what number you got, and *then*
-   add the changelog file to your branch (see [Updating your pull
-   request](#updating-your-pull-request)), or:
+   add the changelog file to your branch, or:
 
 1. Look at the [list of all
    issues/PRs](https://github.com/matrix-org/synapse/issues?q=), add one to the
diff --git a/docs/development/database_schema.md b/docs/development/database_schema.md
index 29945c264e..e231be21dd 100644
--- a/docs/development/database_schema.md
+++ b/docs/development/database_schema.md
@@ -155,43 +155,11 @@ def run_upgrade(
 Boolean columns require special treatment, since SQLite treats booleans the
 same as integers.
 
-There are three separate aspects to this:
-
- * Any new boolean column must be added to the `BOOLEAN_COLUMNS` list in
+Any new boolean column must be added to the `BOOLEAN_COLUMNS` list in
    `synapse/_scripts/synapse_port_db.py`. This tells the port script to cast
    the integer value from SQLite to a boolean before writing the value to the
    postgres database.
 
- * Before SQLite 3.23, `TRUE` and `FALSE` were not recognised as constants by
-   SQLite, and the `IS [NOT] TRUE`/`IS [NOT] FALSE` operators were not
-   supported. This makes it necessary to avoid using `TRUE` and `FALSE`
-   constants in SQL commands.
-
-   For example, to insert a `TRUE` value into the database, write:
-
-   ```python
-   txn.execute("INSERT INTO tbl(col) VALUES (?)", (True, ))
-   ```
-
- * Default values for new boolean columns present a particular
-   difficulty. Generally it is best to create separate schema files for
-   Postgres and SQLite. For example:
-
-   ```sql
-   # in 00delta.sql.postgres:
-   ALTER TABLE tbl ADD COLUMN col BOOLEAN DEFAULT FALSE;
-   ```
-
-   ```sql
-   # in 00delta.sql.sqlite:
-   ALTER TABLE tbl ADD COLUMN col BOOLEAN DEFAULT 0;
-   ```
-
-   Note that there is a particularly insidious failure mode here: the Postgres
-   flavour will be accepted by SQLite 3.22, but will give a column whose
-   default value is the **string** `"FALSE"` - which, when cast back to a boolean
-   in Python, evaluates to `True`.
-
 
 ## `event_id` global uniqueness
 
diff --git a/docs/development/dependencies.md b/docs/development/dependencies.md
index 8474525480..b5926d96ff 100644
--- a/docs/development/dependencies.md
+++ b/docs/development/dependencies.md
@@ -2,6 +2,13 @@
 
 This is a quick cheat sheet for developers on how to use [`poetry`](https://python-poetry.org/).
 
+# Installing
+
+See the [contributing guide](contributing_guide.md#4-install-the-dependencies).
+
+Developers should use Poetry 1.3.2 or higher. If you encounter problems related
+to poetry, please [double-check your poetry version](#check-the-version-of-poetry-with-poetry---version).
+
 # Background
 
 Synapse uses a variety of third-party Python packages to function as a homeserver.
@@ -123,7 +130,7 @@ context of poetry's venv, without having to run `poetry shell` beforehand.
 ## ...reset my venv to the locked environment?
 
 ```shell
-poetry install --extras all --remove-untracked
+poetry install --all-extras --sync
 ```
 
 ## ...delete everything and start over from scratch?
@@ -183,7 +190,6 @@ Either:
 - manually update `pyproject.toml`; then `poetry lock --no-update`; or else
 - `poetry add packagename`. See `poetry add --help`; note the `--dev`,
   `--extras` and `--optional` flags in particular.
-  - **NB**: this specifies the new package with a version given by a "caret bound". This won't get forced to its lowest version in the old deps CI job: see [this TODO](https://github.com/matrix-org/synapse/blob/4e1374373857f2f7a911a31c50476342d9070681/.ci/scripts/test_old_deps.sh#L35-L39).
 
 Include the updated `pyproject.toml` and `poetry.lock` files in your commit.
 
@@ -196,7 +202,7 @@ poetry remove packagename
 ```
 
 ought to do the trick. Alternatively, manually update `pyproject.toml` and
-`poetry lock --no-update`. Include the updated `pyproject.toml` and poetry.lock`
+`poetry lock --no-update`. Include the updated `pyproject.toml` and `poetry.lock`
 files in your commit.
 
 ## ...update the version range for an existing dependency?
@@ -240,9 +246,6 @@ poetry export --extras all
 
 Be wary of bugs in `poetry export` and `pip install -r requirements.txt`.
 
-Note: `poetry export` will be made a plugin in Poetry 1.2. Additional config may
-be required.
-
 ## ...build a test wheel?
 
 I usually use
@@ -255,12 +258,28 @@ because [`build`](https://github.com/pypa/build) is a standardish tool which
 doesn't require poetry. (It's what we use in CI too). However, you could try
 `poetry build` too.
 
+## ...handle a Dependabot pull request?
+
+Synapse uses Dependabot to keep the `poetry.lock` and `Cargo.lock` files
+up-to-date with the latest releases of our dependencies. The changelog check is
+omitted for Dependabot PRs; the release script will include them in the 
+changelog.
+
+When reviewing a dependabot PR, ensure that:
+
+* the lockfile changes look reasonable;
+* the upstream changelog file (linked in the description) doesn't include any
+  breaking changes;
+* continuous integration passes.
+
+In particular, any updates to the type hints (usually packages which start with `types-`)
+should be safe to merge if linting passes.
 
 # Troubleshooting
 
 ## Check the version of poetry with `poetry --version`.
 
-The minimum version of poetry supported by Synapse is 1.2.
+The minimum version of poetry supported by Synapse is 1.3.2.
 
 It can also be useful to check the version of `poetry-core` in use. If you've
 installed `poetry` with `pipx`, try `pipx runpip poetry list | grep
diff --git a/docs/development/releases.md b/docs/development/releases.md
index c9a8c69945..6e83c81e27 100644
--- a/docs/development/releases.md
+++ b/docs/development/releases.md
@@ -12,7 +12,7 @@ Note that this schedule might be modified depending on the availability of the
 Synapse team, e.g. releases may be skipped to avoid holidays.
 
 Release announcements can be found in the
-[release category of the Matrix blog](https://matrix.org/blog/category/releases).
+[release category of the Matrix blog](https://matrix.org/category/releases).
 
 ## Bugfix releases
 
@@ -34,4 +34,4 @@ be held to be released together.
 
 In some cases, a pre-disclosure of a security release will be issued as a notice
 to Synapse operators that there is an upcoming security release. These can be
-found in the [security category of the Matrix blog](https://matrix.org/blog/category/security).
+found in the [security category of the Matrix blog](https://matrix.org/category/security).
diff --git a/docs/development/synapse_architecture/faster_joins.md b/docs/development/synapse_architecture/faster_joins.md
new file mode 100644
index 0000000000..2256c30239
--- /dev/null
+++ b/docs/development/synapse_architecture/faster_joins.md
@@ -0,0 +1,375 @@
+# How do faster joins work?
+
+This is a work-in-progress set of notes with two goals:
+- act as a reference, explaining how Synapse implements faster joins; and
+- record the rationale behind our choices.
+
+See also [MSC3902](https://github.com/matrix-org/matrix-spec-proposals/pull/3902).
+
+The key idea is described by [MSC3706](https://github.com/matrix-org/matrix-spec-proposals/pull/3706). This allows servers to
+request a lightweight response to the federation `/send_join` endpoint.
+This is called a **faster join**, also known as a **partial join**. In these
+notes we'll usually use the word "partial" as it matches the database schema.
+
+## Overview: processing events in a partially-joined room
+
+The response to a partial join consists of
+- the requested join event `J`,
+- a list of the servers in the room (according to the state before `J`),
+- a subset of the state of the room before `J`,
+- the full auth chain of that state subset.
+
+Synapse marks the room as partially joined by adding a row to the database table
+`partial_state_rooms`. It also marks the join event `J` as "partially stated",
+meaning that we have neither received nor computed the full state before/after
+`J`. This is done by adding a row to `partial_state_events`.
+
+<details><summary>DB schema</summary>
+
+```
+matrix=> \d partial_state_events
+Table "matrix.partial_state_events"
+  Column  │ Type │ Collation │ Nullable │ Default
+══════════╪══════╪═══════════╪══════════╪═════════
+ room_id  │ text │           │ not null │
+ event_id │ text │           │ not null │
+ 
+matrix=> \d partial_state_rooms
+                Table "matrix.partial_state_rooms"
+         Column         │  Type  │ Collation │ Nullable │ Default 
+════════════════════════╪════════╪═══════════╪══════════╪═════════
+ room_id                │ text   │           │ not null │ 
+ device_lists_stream_id │ bigint │           │ not null │ 0
+ join_event_id          │ text   │           │          │ 
+ joined_via             │ text   │           │          │ 
+
+matrix=> \d partial_state_rooms_servers
+     Table "matrix.partial_state_rooms_servers"
+   Column    │ Type │ Collation │ Nullable │ Default 
+═════════════╪══════╪═══════════╪══════════╪═════════
+ room_id     │ text │           │ not null │ 
+ server_name │ text │           │ not null │ 
+```
+
+Indices, foreign-keys and check constraints are omitted for brevity.
+</details>
+
+While partially joined to a room, Synapse receives events `E` from remote
+homeservers as normal, and can create events at the request of its local users.
+However, we run into trouble when we enforce the [checks on an event].
+
+> 1. Is a valid event, otherwise it is dropped. For an event to be valid, it
+>    must contain a room_id, and it must comply with the event format of that
+>    room version.
+> 2. Passes signature checks, otherwise it is dropped.
+> 3. Passes hash checks, otherwise it is redacted before being processed further.
+> 4. Passes authorization rules based on the event’s auth events, otherwise it
+>    is rejected.
+> 5. **Passes authorization rules based on the state before the event, otherwise
+>    it is rejected.**
+> 6. **Passes authorization rules based on the current state of the room,
+>    otherwise it is “soft failed”.**
+
+[checks on an event]: https://spec.matrix.org/v1.5/server-server-api/#checks-performed-on-receipt-of-a-pdu
+
+We can enforce checks 1--4 without any problems.
+But we cannot enforce checks 5 or 6 with complete certainty, since Synapse does
+not know the full state before `E`, nor that of the room.
+
+### Partial state
+
+Instead, we make a best-effort approximation.
+While the room is considered partially joined, Synapse tracks the "partial
+state" before events.
+This works in a similar way as regular state:
+
+- The partial state before `J` is that given to us by the partial join response.
+- The partial state before an event `E` is the resolution of the partial states
+  after each of `E`'s `prev_event`s.
+- If `E` is rejected or a message event, the partial state after `E` is the
+  partial state before `E`.
+- Otherwise, the partial state after `E` is the partial state before `E`, plus
+  `E` itself.
+
+More concisely, partial state propagates just like full state; the only
+difference is that we "seed" it with an incomplete initial state.
+Synapse records that we have only calculated partial state for this event with
+a row in `partial_state_events`.
+
+While the room remains partially stated, check 5 on incoming events to that
+room becomes:
+
+> 5. Passes authorization rules based on **the resolution between the partial
+>    state before `E` and `E`'s auth events.** If the event fails to pass
+>    authorization rules, it is rejected.
+
+Additionally, check 6 is deleted: no soft-failures are enforced.
+
+While partially joined, the current partial state of the room is defined as the
+resolution across the partial states after all forward extremities in the room.
+
+_Remark._ Events with partial state are _not_ considered
+[outliers](../room-dag-concepts.md#outliers).
+
+### Approximation error
+
+Using partial state means the auth checks can fail in a few different ways[^2].
+
+[^2]: Is this exhaustive?
+
+- We may erroneously accept an incoming event in check 5 based on partial state
+  when it would have been rejected based on full state, or vice versa.
+- This means that an event could erroneously be added to the current partial
+  state of the room when it would not be present in the full state of the room,
+  or vice versa.
+- Additionally, we may have skipped soft-failing an event that would have been
+  soft-failed based on full state.
+
+(Note that the discrepancies described in the last two bullets are user-visible.)
+
+This means that we have to be very careful when we want to lookup pieces of room
+state in a partially-joined room. Our approximation of the state may be
+incorrect or missing. But we can make some educated guesses. If
+
+- our partial state is likely to be correct, or
+- the consequences of our partial state being incorrect are minor,
+
+then we proceed as normal, and let the resync process fix up any mistakes (see
+below).
+
+When is our partial state likely to be correct?
+
+- It's more accurate the closer we are to the partial join event. (So we should
+  ideally complete the resync as soon as possible.)
+- Non-member events: we will have received them as part of the partial join
+  response, if they were part of the room state at that point. We may
+  incorrectly accept or reject updates to that state (at first because we lack
+  remote membership information; later because of compounding errors), so these
+  can become incorrect over time.
+- Local members' memberships: we are the only ones who can create join and
+  knock events for our users. We can't be completely confident in the
+  correctness of bans, invites and kicks from other homeservers, but the resync
+  process should correct any mistakes.
+- Remote members' memberships: we did not receive these in the /send_join
+  response, so we have essentially no idea if these are correct or not.
+
+In short, we deem it acceptable to trust the partial state for non-membership
+and local membership events. For remote membership events, we wait for the
+resync to complete, at which point we have the full state of the room and can
+proceed as normal.
+
+### Fixing the approximation with a resync
+
+The partial-state approximation is only a temporary affair. In the background,
+Synapse begins a "resync" process. This is a continuous loop, starting at the
+partial join event and proceeding downwards through the event graph. For each 
+`E` seen in the room since partial join, Synapse will fetch 
+
+- the event ids in the state of the room before `E`, via 
+  [`/state_ids`](https://spec.matrix.org/v1.5/server-server-api/#get_matrixfederationv1state_idsroomid);
+- the event ids in the full auth chain of `E`, included in the `/state_ids` 
+  response; and
+- any events from the previous two bullets that Synapse hasn't persisted, via
+  [`/state`](https://spec.matrix.org/v1.5/server-server-api/#get_matrixfederationv1stateroomid).
+
+This means Synapse has (or can compute) the full state before `E`, which allows
+Synapse to properly authorise or reject `E`. At this point, the event
+is considered to have "full state" rather than "partial state". We record this
+by removing `E` from the `partial_state_events` table.
+
+\[**TODO:** Does Synapse persist a new state group for the full state
+before `E`, or do we alter the (partial-)state group in-place? Are state groups
+ever marked as partially-stated? \]
+
+This scheme means it is possible for us to have accepted and sent an event to 
+clients, only to reject it during the resync. From a client's perspective, the 
+effect is similar to a retroactive 
+state change due to state resolution---i.e. a "state reset".[^3]
+
+[^3]: Clients should refresh caches to detect such a change. Rumour has it that 
+sliding sync will fix this.
+
+When all events since the join `J` have been fully-stated, the room resync
+process is complete. We record this by removing the room from
+`partial_state_rooms`.
+
+## Faster joins on workers
+
+For the time being, the resync process happens on the master worker.
+A new replication stream `un_partial_stated_room` is added. Whenever a resync
+completes and a partial-state room becomes fully stated, a new message is sent
+into that stream containing the room ID.
+
+## Notes on specific cases
+
+> **NB.** The notes below are rough. Some of them are hidden under `<details>`
+disclosures because they have yet to be implemented in mainline Synapse.
+
+### Creating events during a partial join
+
+When sending out messages during a partial join, we assume our partial state is 
+accurate and proceed as normal. For this to have any hope of succeeding at all,
+our partial state must contain an entry for each of the (type, state key) pairs
+[specified by the auth rules](https://spec.matrix.org/v1.3/rooms/v10/#authorization-rules):
+
+- `m.room.create`
+- `m.room.join_rules`
+- `m.room.power_levels`
+- `m.room.third_party_invite`
+- `m.room.member`
+
+The first four of these should be present in the state before `J` that is given
+to us in the partial join response; only membership events are omitted. In order
+for us to consider the user joined, we must have their membership event. That
+means the only possible omission is the target's membership in an invite, kick
+or ban.
+
+The worst possibility is that we locally invite someone who is banned according to
+the full state, because we lack their ban in our current partial state. The rest 
+of the federation---at least, those who are fully joined---should correctly 
+enforce the [membership transition constraints](
+    https://spec.matrix.org/v1.3/client-server-api/#room-membership
+). So any erroneous invite should be ignored by fully-joined
+homeservers and resolved by the resync for partially-joined homeservers.
+
+
+
+In more generality, there are two problems we're worrying about here:
+
+- We might create an event that is valid under our partial state, only to later
+  find out that it is actually invalid according to the full state.
+- Or: we might refuse to create an event that is invalid under our partial
+  state, even though it would be perfectly valid under the full state.
+
+However, we expect such problems to be unlikely in practice, because
+
+- We trust that the room has sensible power levels, e.g. that bad actors with
+  high power levels are demoted before their ban.
+- We trust that the resident server provides us up-to-date power levels, join
+  rules, etc.
+- State changes in rooms are relatively infrequent, and the resync period is
+  relatively quick.
+
+#### Sending out the event over federation
+
+**TODO:** needs prose fleshing out.
+
+Normally: send out in a fed txn to all HSes in the room.
+We only know that some HSes were in the room at some point. Wat do.
+Send it out to the list of servers from the first join.
+**TODO** what do we do here if we have full state?
+If the prev event was created by us, we can risk sending it to the wrong HS. (Motivation: privacy concern of the content. Not such a big deal for a public room or an encrypted room. But non-encrypted invite-only...)
+But don't want to send out sensitive data in other HS's events in this way.
+
+Suppose we discover after resync that we shouldn't have sent out one of our events (not a prev_event) to a target HS. Not much we can do.
+What about if we didn't send them an event but shouldn't've?
+E.g. what if someone joined from a new HS shortly after you did? We wouldn't talk to them.
+Could imagine sending out the "Missed" events after the resync but... painful to work out what they should have seen if they joined/left.
+Instead, just send them the latest event (if they're still in the room after resync) and let them backfill.(?)
+- Don't do this currently.
+- If anyone who has received our messages sends a message to a HS we missed, they can backfill our messages
+- Gap: rooms which are infrequently used and take a long time to resync.
+
+### Joining after a partial join
+
+**NB.** Not yet implemented.
+
+<details>
+
+**TODO:** needs prose fleshing out. Liaise with Matthieu. Explain why /send_join
+(Rich was surprised we didn't just create it locally. Answer: to try and avoid
+a join which then gets rejected after resync.)
+
+We don't know for sure that any join we create would be accepted.
+E.g. the joined user might have been banned; the join rules might have changed in a way that we didn't realise... some way in which the partial state was mistaken.
+Instead, do another partial make-join/send-join handshake to confirm that the join works.
+- Probably going to get a bunch of duplicate state events and auth events.... but the point of partial joins is that these should be small. Many are already persisted = good.
+- What if the second send_join response includes a different list of resident HSes? Could ignore it.
+  - Could even have a special flag that says "just make me a join", i.e. don't bother giving me state or servers in room. Deffo want the auth chain tho.
+- SQ: wrt device lists it's a lot safer to ignore it!!!!!
+- What if the state at the second join is inconsistent with what we have? Ignore it?
+
+</details>
+
+### Leaving (and kicks and bans) after a partial join
+
+**NB.** Not yet implemented.
+
+<details>
+
+When you're fully joined to a room, to have `U` leave a room their homeserver
+needs to
+
+- create a new leave event for `U` which will be accepted by other homeservers,
+  and
+- send that event out to the homeservers in the federation.
+
+When is a leave event accepted? See
+[v10 auth rules](https://spec.matrix.org/v1.5/rooms/v10/#authorization-rules):
+
+> 4. If type is m.room.member: [...]
+>
+>    5. If membership is leave:
+>
+>       1. If the sender matches state_key, allow if and only if that user’s current membership state is invite, join, or knock.
+>       2. [...]
+
+I think this means that (well-formed!) self-leaves are governed entirely by
+4.5.1. This means that if we correctly calculate state which says that `U` is
+invited, joined or knocked and include it in the leave's auth events, our event
+is accepted by checks 4 and 5 on incoming events.
+
+> 4. Passes authorization rules based on the event’s auth events, otherwise
+>    it is rejected.
+> 5. Passes authorization rules based on the state before the event, otherwise
+>    it is rejected.
+
+The only way to fail check 6 is if the receiving server's current state of the
+room says that `U` is banned, has left, or has no membership event. But this is
+fine: the receiving server already thinks that `U` isn't in the room.
+
+> 6. Passes authorization rules based on the current state of the room,
+>    otherwise it is “soft failed”.
+
+For the second point (publishing the leave event), the best thing we can do is
+to publish to all HSes we know to be currently in the room. If they miss that
+event, they might send us traffic in the room that we don't care about. This is
+a problem with leaving after a "full" join; we don't seek to fix this with
+partial joins.
+
+(With that said: there's nothing machine-readable in the /send response. I don't
+think we can deduce "destination has left the room" from a failure to /send an
+event into that room?)
+
+#### Can we still do this during a partial join?
+
+We can create leave events and can choose what gets included in our auth events,
+so we can be sure that we pass check 4 on incoming events. For check 5, we might
+have an incorrect view of the state before an event.
+The only way we might erroneously think a leave is valid is if
+
+- the partial state before the leave has `U` joined, invited or knocked, but
+- the full state before the leave has `U` banned, left or not present,
+
+in which case the leave doesn't make anything worse: other HSes already consider
+us as not in the room, and will continue to do so after seeing the leave.
+
+The remaining obstacle is then: can we safely broadcast the leave event? We may
+miss servers or incorrectly think that a server is in the room. Or the
+destination server may be offline and miss the transaction containing our leave
+event. This should self-heal when they see an event whose `prev_events` descends
+from our leave.
+
+Another option we considered was to use federation `/send_leave` to ask a
+fully-joined server to send out the event on our behalf. But that introduces
+complexity without much benefit. Besides, as Rich put it,
+
+> sending out leaves is pretty best-effort currently
+
+so this is probably good enough as-is.
+
+#### Cleanup after the last leave
+
+**TODO**: what cleanup is necessary? Is it all just nice-to-have to save unused
+work?
+</details>
diff --git a/docs/development/synapse_architecture/streams.md b/docs/development/synapse_architecture/streams.md
new file mode 100644
index 0000000000..bee0b8a8c0
--- /dev/null
+++ b/docs/development/synapse_architecture/streams.md
@@ -0,0 +1,157 @@
+## Streams
+
+Synapse has a concept of "streams", which are roughly described in [`id_generators.py`](
+    https://github.com/matrix-org/synapse/blob/develop/synapse/storage/util/id_generators.py
+).
+Generally speaking, streams are a series of notifications that something in Synapse's database has changed that the application might need to respond to.
+For example:
+
+- The events stream reports new events (PDUs) that Synapse creates, or that Synapse accepts from another homeserver.
+- The account data stream reports changes to users' [account data](https://spec.matrix.org/v1.7/client-server-api/#client-config).
+- The to-device stream reports when a device has a new [to-device message](https://spec.matrix.org/v1.7/client-server-api/#send-to-device-messaging).
+
+See [`synapse.replication.tcp.streams`](
+    https://github.com/matrix-org/synapse/blob/develop/synapse/replication/tcp/streams/__init__.py
+) for the full list of streams.
+
+It is very helpful to understand the streams mechanism when working on any part of Synapse that needs to respond to changes—especially if those changes are made by different workers.
+To that end, let's describe streams formally, paraphrasing from the docstring of [`AbstractStreamIdGenerator`](
+    https://github.com/matrix-org/synapse/blob/a719b703d9bd0dade2565ddcad0e2f3a7a9d4c37/synapse/storage/util/id_generators.py#L96
+).
+
+### Definition
+
+A stream is an append-only log `T1, T2, ..., Tn, ...` of facts[^1] which grows over time.
+Only "writers" can add facts to a stream, and there may be multiple writers.
+
+Each fact has an ID, called its "stream ID".
+Readers should only process facts in ascending stream ID order.
+
+Roughly speaking, each stream is backed by a database table.
+It should have a `stream_id` (or similar) bigint column holding stream IDs, plus additional columns as necessary to describe the fact.
+Typically, a fact is expressed with a single row in its backing table.[^2]
+Within a stream, no two facts may have the same stream_id.
+
+> _Aside_. Some additional notes on streams' backing tables.
+>
+> 1. Rich would like to [ditch the backing tables](https://github.com/matrix-org/synapse/issues/13456).
+> 2. The backing tables may have other uses.
+>    For example, the events table backs the events stream, and is read when processing new events.
+>    But old rows are read from the table all the time, whenever Synapse needs to look up some facts about an event.
+> 3. Rich suspects that sometimes the stream is backed by multiple tables, so the stream proper is the union of those tables.
+
+Stream writers can "reserve" a stream ID, and then later mark it as having been completed.
+Stream writers need to track the completion of each stream fact.
+In the happy case, completion means a fact has been written to the stream table.
+But unhappy cases (e.g. transaction rollback due to an error) also count as completion.
+Once completed, the rows written with that stream ID are fixed, and no new rows
+will be inserted with that ID.
+
+### Current stream ID
+
+For any given stream reader (including writers themselves), we may define a per-writer current stream ID:
+
+> The current stream ID _for a writer W_ is the largest stream ID such that
+> all facts added by W with equal or smaller ID have completed.
+
+Similarly, there is a "linear" notion of current stream ID:
+
+> The "linear" current stream ID is the largest stream ID such that
+> all facts (added by any writer) with equal or smaller ID have completed.
+
+Because different stream readers A and B learn about new facts at different times, A and B may disagree about current stream IDs.
+Put differently: we should think of stream readers as being independent of each other, proceeding through a stream of facts at different rates.
+
+**NB.** For both senses of "current", if a writer opens a transaction that never completes, the current stream ID will never advance beyond that writer's last written stream ID.
+
+For single-writer streams, the per-writer current ID and the linear current ID are the same.
+Both senses of current ID are monotonic, but they may "skip" or jump over IDs because facts complete out of order.
+
+
+_Example_.
+Consider a single-writer stream which is initially at ID 1.
+
+| Action     | Current stream ID | Notes                                           |
+|------------|-------------------|-------------------------------------------------|
+|            | 1                 |                                                 |
+| Reserve 2  | 1                 |                                                 |
+| Reserve 3  | 1                 |                                                 |
+| Complete 3 | 1                 | current ID unchanged, waiting for 2 to complete |
+| Complete 2 | 3                 | current ID jumps from 1 -> 3                    |
+| Reserve 4  | 3                 |                                                 |
+| Reserve 5  | 3                 |                                                 |
+| Reserve 6  | 3                 |                                                 |
+| Complete 5 | 3                 |                                                 |
+| Complete 4 | 5                 | current ID jumps 3->5, even though 6 is pending |
+| Complete 6 | 6                 |                                                 |
+
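The table above can be reproduced with a short sketch. This is not Synapse's implementation: `replay` and its action tuples are names invented for illustration, and the sketch assumes a single writer whose IDs are reserved in increasing, contiguous order.

```python
def replay(initial, actions):
    """Return the current stream ID after each (kind, stream_id) action."""
    pending = set()             # reserved but not yet completed
    highest_reserved = initial  # assumes IDs are handed out in increasing order
    history = []
    for kind, stream_id in actions:
        if kind == "reserve":
            pending.add(stream_id)
            highest_reserved = max(highest_reserved, stream_id)
        elif kind == "complete":
            pending.remove(stream_id)
        else:
            raise ValueError(f"unknown action: {kind}")
        # Everything below the oldest pending ID has completed. With nothing
        # pending, everything reserved so far has completed.
        history.append(min(pending) - 1 if pending else highest_reserved)
    return history


# The same sequence of actions as in the table, starting from ID 1.
actions = [
    ("reserve", 2), ("reserve", 3), ("complete", 3), ("complete", 2),
    ("reserve", 4), ("reserve", 5), ("reserve", 6),
    ("complete", 5), ("complete", 4), ("complete", 6),
]
history = replay(1, actions)
print(history)
```

Each entry of `history` corresponds to one row of the table after the initial row.
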
+
+### Multi-writer streams
+
+There are two ways to view a multi-writer stream.
+
+1. Treat it as a collection of distinct single-writer streams, one
+   for each writer.
+2. Treat it as a single stream.
+
+The single stream (option 2) is conceptually simpler, and easier to represent (a single stream id).
+However, it requires each reader to know about the entire set of writers, to ensure that readers don't erroneously advance their current stream position too early and miss a fact from an unknown writer.
+In contrast, multiple parallel streams (option 1) are more complex, requiring more state to represent (map from writer to stream id).
+The payoff for doing so is that readers can "peek" ahead to facts that completed on one writer no matter the state of the others, reducing latency.
+
+Note that a multi-writer stream can be viewed in both ways.
+For example, the events stream is treated as multiple single-writer streams (option 1) by the sync handler, so that events are sent to clients as soon as possible.
+But the background process that works through events treats them as a single linear stream.
+
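As a rough illustration of the two views, the sketch below represents option 1 as a map from writer to position and derives an option 2 position from it. The function names, writer names and fact tuples are hypothetical. Taking the minimum over all known writers is a conservative simplification: it never overshoots the linear current ID, but it can lag behind it when a writer is idle.

```python
def linear_position(writer_positions):
    """Option 2: a single position that is safe with respect to every known writer."""
    return min(writer_positions.values())


def visible_per_writer(facts, writer_positions):
    """Option 1: a fact is visible once its own writer has advanced past it."""
    return [sid for sid, writer in facts if sid <= writer_positions[writer]]


def visible_linear(facts, writer_positions):
    """Option 2: a fact is visible once every known writer has advanced past it."""
    limit = linear_position(writer_positions)
    return [sid for sid, _writer in facts if sid <= limit]


# Hypothetical facts as (stream_id, writer) pairs.
facts = [(3, "w1"), (4, "w2"), (5, "w2"), (6, "w1"), (7, "w1")]
positions = {"w1": 7, "w2": 4}  # w2's fact 5 is still awaiting completion

per_writer_view = visible_per_writer(facts, positions)
linear_view = visible_linear(facts, positions)
```

In this example the per-writer view can already see facts 6 and 7 from `w1`, while the linear view stops at 4 until `w2` completes fact 5.
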
+Another useful example is the cache invalidation stream.
+The facts this stream holds are instructions of the form "you should now invalidate these cache entries".
+We only ever treat this as multiple single-writer streams, as there is no important ordering between cache invalidations.
+(Invalidations are self-contained facts, and they commute and are idempotent.)
+
+### Writing to streams
+
+Writers need to track:
+ - their current position (i.e. their own per-writer stream ID).
+ - their facts currently awaiting completion.
+
+At startup,
+ - the current position of that writer can be found by querying the database (which suggests that facts need to be written to the database atomically, in a transaction); and
+ - there are no facts awaiting completion.
+
+To reserve a stream ID, call [`nextval`](https://www.postgresql.org/docs/current/functions-sequence.html) on the appropriate postgres sequence.
+
+To write a fact to the stream: insert the appropriate rows to the appropriate backing table.
+
+To complete a fact, first remove it from your map of facts currently awaiting completion.
+Then, if no earlier fact is awaiting completion, the writer can advance its current position in that stream.
+Upon doing so it should emit an `RDATA` message[^3], once for every fact between the old and the new stream ID.
+
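The writer-side bookkeeping described above might look roughly like the following. This is a sketch under stated assumptions, not Synapse's code: `StreamWriter` and `send_rdata` are hypothetical names, an in-memory counter stands in for the postgres sequence, and the actual database write is omitted.

```python
import itertools


class StreamWriter:
    """Hypothetical writer tracking its position and its pending facts."""

    def __init__(self, current, send_rdata):
        self.current = current                    # this writer's current position
        self._send_rdata = send_rdata             # callback standing in for RDATA
        self._ids = itertools.count(current + 1)  # stand-in for the sequence
        self._pending = set()                     # reserved, awaiting completion
        self._unannounced = set()                 # completed, not yet announced

    def reserve(self):
        stream_id = next(self._ids)
        self._pending.add(stream_id)
        return stream_id

    def complete(self, stream_id):
        self._pending.remove(stream_id)
        self._unannounced.add(stream_id)
        oldest_pending = min(self._pending, default=None)
        # Only facts with no earlier pending fact may be announced.
        ready = sorted(
            i for i in self._unannounced
            if oldest_pending is None or i < oldest_pending
        )
        for i in ready:
            self._unannounced.remove(i)
            self.current = i
            self._send_rdata(i)  # one notification per fact, in ascending order


sent = []
writer = StreamWriter(current=1, send_rdata=sent.append)
a = writer.reserve()
b = writer.reserve()
writer.complete(b)            # the earlier fact is still pending: nothing sent
sent_after_first = list(sent)
writer.complete(a)            # both facts are now announced, in order
```

Completing the later fact first sends nothing; completing the earlier one then advances the position past both and announces each in turn.
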
+### Subscribing to streams
+
+Readers need to track the current position of every writer.
+
+At startup, they can find this by contacting each writer with a `REPLICATE` message,
+requesting that all writers reply describing their current position in their streams.
+Writers reply with a `POSITION` message.
+
+To learn about new facts, readers should listen for `RDATA` messages and process them to respond to the new fact.
+The `RDATA` itself is not a self-contained representation of the fact;
+readers will have to query the stream tables for the full details.
+Readers must also advance their record of the writer's current position for that stream.
+
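On the reader side, the bookkeeping could be sketched as below. The class, method names and message shapes are hypothetical and do not mirror Synapse's replication protocol classes; `fetch_rows` stands in for a query against the stream's backing table.

```python
class StreamReader:
    """Hypothetical reader tracking each writer's position in one stream."""

    def __init__(self, fetch_rows):
        self.positions = {}            # writer name -> last known position
        self._fetch_rows = fetch_rows  # stand-in for a database query

    def on_position(self, writer, stream_id):
        # A POSITION reply tells us where the writer currently is.
        self.positions[writer] = stream_id

    def on_rdata(self, writer, stream_id):
        # RDATA is only a notification: load the details from the table.
        rows = self._fetch_rows(stream_id)
        # Never move a writer's recorded position backwards.
        previous = self.positions.get(writer, stream_id)
        self.positions[writer] = max(previous, stream_id)
        return rows


# A dict stands in for the stream's backing table.
table = {5: ["row for fact 5"], 6: ["row for fact 6"]}
reader = StreamReader(fetch_rows=lambda stream_id: table.get(stream_id, []))
reader.on_position("w1", 4)
rows = reader.on_rdata("w1", 5)
```

After the `RDATA` for fact 5, the reader holds that fact's rows and records `w1` as being at position 5.
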
+### Summary
+
+In a nutshell: we have an append-only log with a "buffer/scratchpad" at the end where we have to wait for the sequence to be linear and contiguous.
+
+
+---
+
+[^1]: We use the word _fact_ here for two reasons.
+Firstly, the word "event" is already heavily overloaded (PDUs, EDUs, account data, ...) and we don't need to make that worse.
+Secondly, "fact" emphasises that the things we append to a stream cannot change after the fact.
+
+[^2]: A fact might be expressed with 0 rows, e.g. if we opened a transaction to persist an event, but failed and rolled the transaction back before marking the fact as completed.
+In principle a fact might be expressed with 2 or more rows; if so, each of those rows should share the fact's stream ID.
+
+[^3]: This communication used to happen directly with the writers [over TCP](../../tcp_replication.md);
+nowadays it's done via Redis's Pubsub.