From 653044649a198d951e6eef7fcf967c563ba2d761 Mon Sep 17 00:00:00 2001 From: David Robertson Date: Mon, 24 Oct 2022 13:45:31 +0100 Subject: Move dev pages into dev dir --- docs/SUMMARY.md | 22 +- docs/admin_api/media_admin_api.md | 2 +- docs/admin_api/user_admin_api.md | 6 +- docs/auth_chain_diff.dot | 32 -- docs/auth_chain_diff.dot.png | Bin 42427 -> 0 bytes docs/auth_chain_difference_algorithm.md | 141 -------- docs/code_style.md | 130 -------- docs/development/cas.md | 64 ---- docs/development/code_style.md | 130 ++++++++ docs/development/contributing_guide.md | 4 +- .../internal_documentation/auth_chain_diff.dot | 32 ++ .../internal_documentation/auth_chain_diff.dot.png | Bin 0 -> 42427 bytes .../auth_chain_difference_algorithm.md | 141 ++++++++ docs/development/internal_documentation/cas.md | 64 ++++ .../internal_documentation/media_repository.md | 78 +++++ .../internal_documentation/room-dag-concepts.md | 113 +++++++ .../room_and_user_statistics.md | 22 ++ docs/development/internal_documentation/saml.md | 40 +++ docs/development/opentracing.md | 94 ++++++ docs/development/room-dag-concepts.md | 113 ------- docs/development/saml.md | 40 --- .../synapse_architecture/log_contexts.md | 364 +++++++++++++++++++++ .../synapse_architecture/replication.md | 42 +++ .../synapse_architecture/tcp_replication.md | 257 +++++++++++++++ docs/log_contexts.md | 364 --------------------- docs/media_repository.md | 78 ----- docs/opentracing.md | 94 ------ docs/replication.md | 42 --- docs/room_and_user_statistics.md | 22 -- docs/tcp_replication.md | 257 --------------- docs/usage/configuration/config_documentation.md | 4 +- 31 files changed, 1396 insertions(+), 1396 deletions(-) delete mode 100644 docs/auth_chain_diff.dot delete mode 100644 docs/auth_chain_diff.dot.png delete mode 100644 docs/auth_chain_difference_algorithm.md delete mode 100644 docs/code_style.md delete mode 100644 docs/development/cas.md create mode 100644 docs/development/code_style.md create mode 100644 docs/development/internal_documentation/auth_chain_diff.dot create mode 100644 docs/development/internal_documentation/auth_chain_diff.dot.png create mode 100644 docs/development/internal_documentation/auth_chain_difference_algorithm.md create mode 100644 docs/development/internal_documentation/cas.md create mode 100644 docs/development/internal_documentation/media_repository.md create mode 100644 docs/development/internal_documentation/room-dag-concepts.md create mode 100644 docs/development/internal_documentation/room_and_user_statistics.md create mode 100644 docs/development/internal_documentation/saml.md create mode 100644 docs/development/opentracing.md delete mode 100644 docs/development/room-dag-concepts.md delete mode 100644 docs/development/saml.md create mode 100644 docs/development/synapse_architecture/log_contexts.md create mode 100644 docs/development/synapse_architecture/replication.md create mode 100644 docs/development/synapse_architecture/tcp_replication.md delete mode 100644 docs/log_contexts.md delete mode 100644 docs/media_repository.md delete mode 100644 docs/opentracing.md delete mode 100644 docs/replication.md delete mode 100644 docs/room_and_user_statistics.md delete mode 100644 docs/tcp_replication.md (limited to 'docs') diff --git a/docs/SUMMARY.md b/docs/SUMMARY.md index 744a076ef1..ceb96b5c6d 100644 --- a/docs/SUMMARY.md +++ b/docs/SUMMARY.md @@ -80,30 +80,30 @@ # Development - [Contributing Guide](development/contributing_guide.md) - - [Code Style](code_style.md) + - [Code Style](development/code_style.md) 
- [Reviewing Code](development/reviews.md) - [Release Cycle](development/releases.md) - [Git Usage](development/git.md) - [Testing]() - [Demo scripts](development/demo.md) - - [OpenTracing](opentracing.md) + - [OpenTracing](development/opentracing.md) - [Database Schemas](development/database_schema.md) - [Experimental features](development/experimental_features.md) - [Dependency management](development/dependencies.md) - [Synapse Architecture]() - [Cancellation](development/synapse_architecture/cancellation.md) - - [Log Contexts](log_contexts.md) - - [Replication](replication.md) - - [TCP Replication](tcp_replication.md) + - [Log Contexts](development/synapse_architecture/log_contexts.md) + - [Replication](development/synapse_architecture/replication.md) + - [TCP Replication](development/synapse_architecture/tcp_replication.md) - [Internal Documentation](development/internal_documentation/README.md) - [Single Sign-On]() - - [SAML](development/saml.md) - - [CAS](development/cas.md) - - [Room DAG concepts](development/room-dag-concepts.md) + - [SAML](development/internal_documentation/saml.md) + - [CAS](development/internal_documentation/cas.md) + - [Room DAG concepts](development/internal_documentation/room-dag-concepts.md) - [State Resolution]() - - [The Auth Chain Difference Algorithm](auth_chain_difference_algorithm.md) - - [Media Repository](media_repository.md) - - [Room and User Statistics](room_and_user_statistics.md) + - [The Auth Chain Difference Algorithm](development/internal_documentation/auth_chain_difference_algorithm.md) + - [Media Repository](development/internal_documentation/media_repository.md) + - [Room and User Statistics](development/internal_documentation/room_and_user_statistics.md) - [Scripts]() # Other diff --git a/docs/admin_api/media_admin_api.md b/docs/admin_api/media_admin_api.md index d57c5aedae..960c10332f 100644 --- a/docs/admin_api/media_admin_api.md +++ b/docs/admin_api/media_admin_api.md @@ -3,7 +3,7 @@ These APIs allow extracting media information from the homeserver. Details about the format of the `media_id` and storage of the media in the file system -are documented under [media repository](../media_repository.md). +are documented under [media repository](../development/internal_documentation/media_repository.md). To use it, you will need to authenticate by providing an `access_token` for a server admin: see [Admin API](../usage/administration/admin_api). diff --git a/docs/admin_api/user_admin_api.md b/docs/admin_api/user_admin_api.md index c95d6c9b05..800a4de441 100644 --- a/docs/admin_api/user_admin_api.md +++ b/docs/admin_api/user_admin_api.md @@ -548,8 +548,8 @@ The following fields are returned in the JSON response body: ### List media uploaded by a user Gets a list of all local media that a specific `user_id` has created. These are media that the user has uploaded themselves -([local media](../media_repository.md#local-media)), as well as -[URL preview images](../media_repository.md#url-previews) requested by the user if the +([local media](../development/internal_documentation/media_repository.md#local-media)), as well as +[URL preview images](../development/internal_documentation/media_repository.md#url-previews) requested by the user if the [feature is enabled](../usage/configuration/config_documentation.md#url_preview_enabled). By default, the response is ordered by descending creation date and ascending media ID. 
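For illustration, a request to this API might look like the following Python sketch. The endpoint path and query parameters shown are assumptions based on the Admin API reference and should be checked against your Synapse version; the homeserver URL, access token and user ID are placeholders.

```python
import requests

# Placeholders: substitute your homeserver URL, an admin access token and the
# target user's Matrix ID.
BASE_URL = "https://homeserver.example.com"
ACCESS_TOKEN = "<admin_access_token>"
USER_ID = "@alice:example.com"

# Assumed endpoint and query parameters; verify against the Admin API
# reference for your Synapse version before relying on them.
response = requests.get(
    f"{BASE_URL}/_synapse/admin/v1/users/{USER_ID}/media",
    headers={"Authorization": f"Bearer {ACCESS_TOKEN}"},
    params={"limit": 10, "order_by": "created_ts", "dir": "b"},
)
response.raise_for_status()
for item in response.json().get("media", []):
    print(item["media_id"], item.get("media_type"), item.get("media_length"))
```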
@@ -650,7 +650,7 @@ The following fields are returned in the JSON response body: - `last_access_ts` - integer - Timestamp when the content was last accessed in ms. - `media_id` - string - The id used to refer to the media. Details about the format are documented under - [media repository](../media_repository.md). + [media repository](../development/internal_documentation/media_repository.md). - `media_length` - integer - Length of the media in bytes. - `media_type` - string - The MIME-type of the media. - `quarantined_by` - string - The user ID that initiated the quarantine request diff --git a/docs/auth_chain_diff.dot b/docs/auth_chain_diff.dot deleted file mode 100644 index 978d579ada..0000000000 --- a/docs/auth_chain_diff.dot +++ /dev/null @@ -1,32 +0,0 @@ -digraph auth { - nodesep=0.5; - rankdir="RL"; - - C [label="Create (1,1)"]; - - BJ [label="Bob's Join (2,1)", color=red]; - BJ2 [label="Bob's Join (2,2)", color=red]; - BJ2 -> BJ [color=red, dir=none]; - - subgraph cluster_foo { - A1 [label="Alice's invite (4,1)", color=blue]; - A2 [label="Alice's Join (4,2)", color=blue]; - A3 [label="Alice's Join (4,3)", color=blue]; - A3 -> A2 -> A1 [color=blue, dir=none]; - color=none; - } - - PL1 [label="Power Level (3,1)", color=darkgreen]; - PL2 [label="Power Level (3,2)", color=darkgreen]; - PL2 -> PL1 [color=darkgreen, dir=none]; - - {rank = same; C; BJ; PL1; A1;} - - A1 -> C [color=grey]; - A1 -> BJ [color=grey]; - PL1 -> C [color=grey]; - BJ2 -> PL1 [penwidth=2]; - - A3 -> PL2 [penwidth=2]; - A1 -> PL1 -> BJ -> C [penwidth=2]; -} diff --git a/docs/auth_chain_diff.dot.png b/docs/auth_chain_diff.dot.png deleted file mode 100644 index 771c07308f..0000000000 Binary files a/docs/auth_chain_diff.dot.png and /dev/null differ diff --git a/docs/auth_chain_difference_algorithm.md b/docs/auth_chain_difference_algorithm.md deleted file mode 100644 index ebc9de25b8..0000000000 --- a/docs/auth_chain_difference_algorithm.md +++ /dev/null @@ -1,141 +0,0 @@ -# Auth Chain Difference Algorithm - -The auth chain difference algorithm is used by V2 state resolution, where a -naive implementation can be a significant source of CPU and DB usage. - -### Definitions - -A *state set* is a set of state events; e.g. the input of a state resolution -algorithm is a collection of state sets. - -The *auth chain* of a set of events are all the events' auth events and *their* -auth events, recursively (i.e. the events reachable by walking the graph induced -by an event's auth events links). - -The *auth chain difference* of a collection of state sets is the union minus the -intersection of the sets of auth chains corresponding to the state sets, i.e an -event is in the auth chain difference if it is reachable by walking the auth -event graph from at least one of the state sets but not from *all* of the state -sets. - -## Breadth First Walk Algorithm - -A way of calculating the auth chain difference without calculating the full auth -chains for each state set is to do a parallel breadth first walk (ordered by -depth) of each state set's auth chain. By tracking which events are reachable -from each state set we can finish early if every pending event is reachable from -every state set. - -This can work well for state sets that have a small auth chain difference, but -can be very inefficient for larger differences. However, this algorithm is still -used if we don't have a chain cover index for the room (e.g. because we're in -the process of indexing it). 
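As a rough sketch only (not Synapse's implementation), the parallel breadth first walk might look like the following. `get_auth_events` is an assumed helper that returns an event's auth event IDs, and the depth ordering that makes early termination safe is omitted.

```python
from collections import defaultdict

def auth_chain_difference(state_sets, get_auth_events):
    """Return event IDs in the union of the state sets' auth chains but not
    in their intersection.

    `get_auth_events(event_id)` is an assumed helper returning the IDs of an
    event's auth events (in a real server this would be a database lookup).
    """
    num_sets = len(state_sets)
    reachable = defaultdict(set)  # event_id -> indices of state sets that reach it
    frontier = set()

    # Seed the walk: each state event's auth events are reachable from that set.
    for i, state_set in enumerate(state_sets):
        for event_id in state_set:
            for auth_id in get_auth_events(event_id):
                reachable[auth_id].add(i)
                frontier.add(auth_id)

    # Walk backwards through the auth graph, propagating reachability.
    # (The real algorithm additionally orders the walk by event depth, which
    # is what makes it safe to stop early once every pending event is
    # reachable from every state set; this sketch simply walks everything.)
    while frontier:
        next_frontier = set()
        for event_id in frontier:
            for auth_id in get_auth_events(event_id):
                missing = reachable[event_id] - reachable[auth_id]
                if missing:
                    reachable[auth_id] |= missing
                    next_frontier.add(auth_id)
        frontier = next_frontier

    # In the union of the auth chains, but not reachable from every state set.
    return {e for e, sets in reachable.items() if len(sets) < num_sets}
```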
- -## Chain Cover Index - -Synapse computes auth chain differences by pre-computing a "chain cover" index -for the auth chain in a room, allowing us to efficiently make reachability queries -like "is event `A` in the auth chain of event `B`?". We could do this with an index -that tracks all pairs `(A, B)` such that `A` is in the auth chain of `B`. However, this -would be prohibitively large, scaling poorly as the room accumulates more state -events. - -Instead, we break down the graph into *chains*. A chain is a subset of a DAG -with the following property: for any pair of events `E` and `F` in the chain, -the chain contains a path `E -> F` or a path `F -> E`. This forces a chain to be -linear (without forks), e.g. `E -> F -> G -> ... -> H`. Each event in the chain -is given a *sequence number* local to that chain. The oldest event `E` in the -chain has sequence number 1. If `E` has a child `F` in the chain, then `F` has -sequence number 2. If `E` has a grandchild `G` in the chain, then `G` has -sequence number 3; and so on. - -Synapse ensures that each persisted event belongs to exactly one chain, and -tracks how the chains are connected to one another. This allows us to -efficiently answer reachability queries. Doing so uses less storage than -tracking reachability on an event-by-event basis, particularly when we have -fewer and longer chains. See - -> Jagadish, H. (1990). [A compression technique to materialize transitive closure](https://doi.org/10.1145/99935.99944). -> *ACM Transactions on Database Systems (TODS)*, 15*(4)*, 558-598. - -for the original idea or - -> Y. Chen, Y. Chen, [An efficient algorithm for answering graph -> reachability queries](https://doi.org/10.1109/ICDE.2008.4497498), -> in: 2008 IEEE 24th International Conference on Data Engineering, April 2008, -> pp. 893–902. (PDF available via [Google Scholar](https://scholar.google.com/scholar?q=Y.%20Chen,%20Y.%20Chen,%20An%20efficient%20algorithm%20for%20answering%20graph%20reachability%20queries,%20in:%202008%20IEEE%2024th%20International%20Conference%20on%20Data%20Engineering,%20April%202008,%20pp.%20893902.).) - -for a more modern take. - -In practical terms, the chain cover assigns every event a -*chain ID* and *sequence number* (e.g. `(5,3)`), and maintains a map of *links* -between events in chains (e.g. `(5,3) -> (2,4)`) such that `A` is reachable by `B` -(i.e. `A` is in the auth chain of `B`) if and only if either: - -1. `A` and `B` have the same chain ID and `A`'s sequence number is less than `B`'s - sequence number; or -2. there is a link `L` between `B`'s chain ID and `A`'s chain ID such that - `L.start_seq_no` <= `B.seq_no` and `A.seq_no` <= `L.end_seq_no`. - -There are actually two potential implementations, one where we store links from -each chain to every other reachable chain (the transitive closure of the links -graph), and one where we remove redundant links (the transitive reduction of the -links graph) e.g. if we have chains `C3 -> C2 -> C1` then the link `C3 -> C1` -would not be stored. Synapse uses the former implementation so that it doesn't -need to recurse to test reachability between chains. This trades-off extra storage -in order to save CPU cycles and DB queries. - -### Example - -An example auth graph would look like the following, where chains have been -formed based on type/state_key and are denoted by colour and are labelled with -`(chain ID, sequence number)`. Links are denoted by the arrows (links in grey -are those that would be remove in the second implementation described above). 
- -![Example](auth_chain_diff.dot.png) - -Note that we don't include all links between events and their auth events, as -most of those links would be redundant. For example, all events point to the -create event, but each chain only needs the one link from it's base to the -create event. - -## Using the Index - -This index can be used to calculate the auth chain difference of the state sets -by looking at the chain ID and sequence numbers reachable from each state set: - -1. For every state set lookup the chain ID/sequence numbers of each state event -2. Use the index to find all chains and the maximum sequence number reachable - from each state set. -3. The auth chain difference is then all events in each chain that have sequence - numbers between the maximum sequence number reachable from *any* state set and - the minimum reachable by *all* state sets (if any). - -Note that steps 2 is effectively calculating the auth chain for each state set -(in terms of chain IDs and sequence numbers), and step 3 is calculating the -difference between the union and intersection of the auth chains. - -### Worked Example - -For example, given the above graph, we can calculate the difference between -state sets consisting of: - -1. `S1`: Alice's invite `(4,1)` and Bob's second join `(2,2)`; and -2. `S2`: Alice's second join `(4,3)` and Bob's first join `(2,1)`. - -Using the index we see that the following auth chains are reachable from each -state set: - -1. `S1`: `(1,1)`, `(2,2)`, `(3,1)` & `(4,1)` -2. `S2`: `(1,1)`, `(2,1)`, `(3,2)` & `(4,3)` - -And so, for each the ranges that are in the auth chain difference: -1. Chain 1: None, (since everything can reach the create event). -2. Chain 2: The range `(1, 2]` (i.e. just `2`), as `1` is reachable by all state - sets and the maximum reachable is `2` (corresponding to Bob's second join). -3. Chain 3: Similarly the range `(1, 2]` (corresponding to the second power - level). -4. Chain 4: The range `(1, 3]` (corresponding to both of Alice's joins). - -So the final result is: Bob's second join `(2,2)`, the second power level -`(3,2)` and both of Alice's joins `(4,2)` & `(4,3)`. diff --git a/docs/code_style.md b/docs/code_style.md deleted file mode 100644 index d65fda62d1..0000000000 --- a/docs/code_style.md +++ /dev/null @@ -1,130 +0,0 @@ -# Code Style - -## Formatting tools - -The Synapse codebase uses a number of code formatting tools in order to -quickly and automatically check for formatting (and sometimes logical) -errors in code. - -The necessary tools are: - -- [black](https://black.readthedocs.io/en/stable/), a source code formatter; -- [isort](https://pycqa.github.io/isort/), which organises each file's imports; -- [flake8](https://flake8.pycqa.org/en/latest/), which can spot common errors; and -- [mypy](https://mypy.readthedocs.io/en/stable/), a type checker. - -Install them with: - -```sh -pip install -e ".[lint,mypy]" -``` - -The easiest way to run the lints is to invoke the linter script as follows. - -```sh -scripts-dev/lint.sh -``` - -It's worth noting that modern IDEs and text editors can run these tools -automatically on save. It may be worth looking into whether this -functionality is supported in your editor for a more convenient -development workflow. It is not, however, recommended to run `flake8` or `mypy` -on save as they take a while and can be very resource intensive. - -## General rules - -- **Naming**: - - Use `CamelCase` for class and type names - - Use underscores for `function_names` and `variable_names`. 
-- **Docstrings**: should follow the [google code - style](https://google.github.io/styleguide/pyguide.html#38-comments-and-docstrings). - See the - [examples](http://sphinxcontrib-napoleon.readthedocs.io/en/latest/example_google.html) - in the sphinx documentation. -- **Imports**: - - Imports should be sorted by `isort` as described above. - - Prefer to import classes and functions rather than packages or - modules. - - Example: - - ```python - from synapse.types import UserID - ... - user_id = UserID(local, server) - ``` - - is preferred over: - - ```python - from synapse import types - ... - user_id = types.UserID(local, server) - ``` - - (or any other variant). - - This goes against the advice in the Google style guide, but it - means that errors in the name are caught early (at import time). - - - Avoid wildcard imports (`from synapse.types import *`) and - relative imports (`from .types import UserID`). - -## Configuration code and documentation format - -When adding a configuration option to the code, if several settings are grouped into a single dict, ensure that your code -correctly handles the top-level option being set to `None` (as it will be if no sub-options are enabled). - -The [configuration manual](usage/configuration/config_documentation.md) acts as a -reference to Synapse's configuration options for server administrators. -Remember that many readers will be unfamiliar with YAML and server -administration in general, so it is important that when you add -a configuration option the documentation be as easy to understand as possible, which -includes following a consistent format. - -Some guidelines follow: - -- Each option should be listed in the config manual with the following format: - - - The name of the option, prefixed by `###`. - - - A comment which describes the default behaviour (i.e. what - happens if the setting is omitted), as well as what the effect - will be if the setting is changed. - - An example setting, using backticks to define the code block - - For boolean (on/off) options, convention is that this example - should be the *opposite* to the default. For other options, the example should give - some non-default value which is likely to be useful to the reader. - -- There should be a horizontal rule between each option, which can be achieved by adding `---` before and - after the option. -- `true` and `false` are spelt thus (as opposed to `True`, etc.) - -Example: - ---- -### `modules` - -Use the `module` sub-option to add a module under `modules` to extend functionality. -The `module` setting then has a sub-option, `config`, which can be used to define some configuration -for the `module`. - -Defaults to none. - -Example configuration: -```yaml -modules: - - module: my_super_module.MySuperClass - config: - do_thing: true - - module: my_other_super_module.SomeClass - config: {} -``` ---- - -Note that the sample configuration is generated from the synapse code -and is maintained by a script, `scripts-dev/generate_sample_config.sh`. -Making sure that the output from this script matches the desired format -is left as an exercise for the reader! - diff --git a/docs/development/cas.md b/docs/development/cas.md deleted file mode 100644 index 7c0668e034..0000000000 --- a/docs/development/cas.md +++ /dev/null @@ -1,64 +0,0 @@ -# How to test CAS as a developer without a server - -The [django-mama-cas](https://github.com/jbittel/django-mama-cas) project is an -easy to run CAS implementation built on top of Django. - -## Prerequisites - -1. 
Create a new virtualenv: `python3 -m venv ` -2. Activate your virtualenv: `source /path/to/your/virtualenv/bin/activate` -3. Install Django and django-mama-cas: - ```sh - python -m pip install "django<3" "django-mama-cas==2.4.0" - ``` -4. Create a Django project in the current directory: - ```sh - django-admin startproject cas_test . - ``` -5. Follow the [install directions](https://django-mama-cas.readthedocs.io/en/latest/installation.html#configuring) for django-mama-cas -6. Setup the SQLite database: `python manage.py migrate` -7. Create a user: - ```sh - python manage.py createsuperuser - ``` - 1. Use whatever you want as the username and password. - 2. Leave the other fields blank. -8. Use the built-in Django test server to serve the CAS endpoints on port 8000: - ```sh - python manage.py runserver - ``` - -You should now have a Django project configured to serve CAS authentication with -a single user created. - -## Configure Synapse (and Element) to use CAS - -1. Modify your `homeserver.yaml` to enable CAS and point it to your locally - running Django test server: - ```yaml - cas_config: - enabled: true - server_url: "http://localhost:8000" - service_url: "http://localhost:8081" - #displayname_attribute: name - #required_attributes: - # name: value - ``` -2. Restart Synapse. - -Note that the above configuration assumes the homeserver is running on port 8081 -and that the CAS server is on port 8000, both on localhost. - -## Testing the configuration - -Then in Element: - -1. Visit the login page with a Element pointing at your homeserver. -2. Click the Single Sign-On button. -3. Login using the credentials created with `createsuperuser`. -4. You should be logged in. - -If you want to repeat this process you'll need to manually logout first: - -1. http://localhost:8000/admin/ -2. Click "logout" in the top right. diff --git a/docs/development/code_style.md b/docs/development/code_style.md new file mode 100644 index 0000000000..3fb98d7cb7 --- /dev/null +++ b/docs/development/code_style.md @@ -0,0 +1,130 @@ +# Code Style + +## Formatting tools + +The Synapse codebase uses a number of code formatting tools in order to +quickly and automatically check for formatting (and sometimes logical) +errors in code. + +The necessary tools are: + +- [black](https://black.readthedocs.io/en/stable/), a source code formatter; +- [isort](https://pycqa.github.io/isort/), which organises each file's imports; +- [flake8](https://flake8.pycqa.org/en/latest/), which can spot common errors; and +- [mypy](https://mypy.readthedocs.io/en/stable/), a type checker. + +Install them with: + +```sh +pip install -e ".[lint,mypy]" +``` + +The easiest way to run the lints is to invoke the linter script as follows. + +```sh +scripts-dev/lint.sh +``` + +It's worth noting that modern IDEs and text editors can run these tools +automatically on save. It may be worth looking into whether this +functionality is supported in your editor for a more convenient +development workflow. It is not, however, recommended to run `flake8` or `mypy` +on save as they take a while and can be very resource intensive. + +## General rules + +- **Naming**: + - Use `CamelCase` for class and type names + - Use underscores for `function_names` and `variable_names`. +- **Docstrings**: should follow the [google code + style](https://google.github.io/styleguide/pyguide.html#38-comments-and-docstrings). + See the + [examples](http://sphinxcontrib-napoleon.readthedocs.io/en/latest/example_google.html) + in the sphinx documentation. 
+- **Imports**: + - Imports should be sorted by `isort` as described above. + - Prefer to import classes and functions rather than packages or + modules. + + Example: + + ```python + from synapse.types import UserID + ... + user_id = UserID(local, server) + ``` + + is preferred over: + + ```python + from synapse import types + ... + user_id = types.UserID(local, server) + ``` + + (or any other variant). + + This goes against the advice in the Google style guide, but it + means that errors in the name are caught early (at import time). + + - Avoid wildcard imports (`from synapse.types import *`) and + relative imports (`from .types import UserID`). + +## Configuration code and documentation format + +When adding a configuration option to the code, if several settings are grouped into a single dict, ensure that your code +correctly handles the top-level option being set to `None` (as it will be if no sub-options are enabled). + +The [configuration manual](../usage/configuration/config_documentation.md) acts as a +reference to Synapse's configuration options for server administrators. +Remember that many readers will be unfamiliar with YAML and server +administration in general, so it is important that when you add +a configuration option the documentation be as easy to understand as possible, which +includes following a consistent format. + +Some guidelines follow: + +- Each option should be listed in the config manual with the following format: + + - The name of the option, prefixed by `###`. + + - A comment which describes the default behaviour (i.e. what + happens if the setting is omitted), as well as what the effect + will be if the setting is changed. + - An example setting, using backticks to define the code block + + For boolean (on/off) options, convention is that this example + should be the *opposite* to the default. For other options, the example should give + some non-default value which is likely to be useful to the reader. + +- There should be a horizontal rule between each option, which can be achieved by adding `---` before and + after the option. +- `true` and `false` are spelt thus (as opposed to `True`, etc.) + +Example: + +--- +### `modules` + +Use the `module` sub-option to add a module under `modules` to extend functionality. +The `module` setting then has a sub-option, `config`, which can be used to define some configuration +for the `module`. + +Defaults to none. + +Example configuration: +```yaml +modules: + - module: my_super_module.MySuperClass + config: + do_thing: true + - module: my_other_super_module.SomeClass + config: {} +``` +--- + +Note that the sample configuration is generated from the synapse code +and is maintained by a script, `scripts-dev/generate_sample_config.sh`. +Making sure that the output from this script matches the desired format +is left as an exercise for the reader! + diff --git a/docs/development/contributing_guide.md b/docs/development/contributing_guide.md index 1e52f9808c..91488d7f73 100644 --- a/docs/development/contributing_guide.md +++ b/docs/development/contributing_guide.md @@ -103,9 +103,9 @@ Synapse developers. regarding Synapse's Admin API, which is used mostly by sysadmins and external service developers. -Synapse's code style is documented [here](../code_style.md). Please follow +Synapse's code style is documented [here](code_style.md). Please follow it, including the conventions for the [sample configuration -file](../code_style.md#configuration-file-format). +file](code_style.md#configuration-file-format). 
We welcome improvements and additions to our documentation itself! When writing new pages, please diff --git a/docs/development/internal_documentation/auth_chain_diff.dot b/docs/development/internal_documentation/auth_chain_diff.dot new file mode 100644 index 0000000000..978d579ada --- /dev/null +++ b/docs/development/internal_documentation/auth_chain_diff.dot @@ -0,0 +1,32 @@ +digraph auth { + nodesep=0.5; + rankdir="RL"; + + C [label="Create (1,1)"]; + + BJ [label="Bob's Join (2,1)", color=red]; + BJ2 [label="Bob's Join (2,2)", color=red]; + BJ2 -> BJ [color=red, dir=none]; + + subgraph cluster_foo { + A1 [label="Alice's invite (4,1)", color=blue]; + A2 [label="Alice's Join (4,2)", color=blue]; + A3 [label="Alice's Join (4,3)", color=blue]; + A3 -> A2 -> A1 [color=blue, dir=none]; + color=none; + } + + PL1 [label="Power Level (3,1)", color=darkgreen]; + PL2 [label="Power Level (3,2)", color=darkgreen]; + PL2 -> PL1 [color=darkgreen, dir=none]; + + {rank = same; C; BJ; PL1; A1;} + + A1 -> C [color=grey]; + A1 -> BJ [color=grey]; + PL1 -> C [color=grey]; + BJ2 -> PL1 [penwidth=2]; + + A3 -> PL2 [penwidth=2]; + A1 -> PL1 -> BJ -> C [penwidth=2]; +} diff --git a/docs/development/internal_documentation/auth_chain_diff.dot.png b/docs/development/internal_documentation/auth_chain_diff.dot.png new file mode 100644 index 0000000000..771c07308f Binary files /dev/null and b/docs/development/internal_documentation/auth_chain_diff.dot.png differ diff --git a/docs/development/internal_documentation/auth_chain_difference_algorithm.md b/docs/development/internal_documentation/auth_chain_difference_algorithm.md new file mode 100644 index 0000000000..ebc9de25b8 --- /dev/null +++ b/docs/development/internal_documentation/auth_chain_difference_algorithm.md @@ -0,0 +1,141 @@ +# Auth Chain Difference Algorithm + +The auth chain difference algorithm is used by V2 state resolution, where a +naive implementation can be a significant source of CPU and DB usage. + +### Definitions + +A *state set* is a set of state events; e.g. the input of a state resolution +algorithm is a collection of state sets. + +The *auth chain* of a set of events are all the events' auth events and *their* +auth events, recursively (i.e. the events reachable by walking the graph induced +by an event's auth events links). + +The *auth chain difference* of a collection of state sets is the union minus the +intersection of the sets of auth chains corresponding to the state sets, i.e an +event is in the auth chain difference if it is reachable by walking the auth +event graph from at least one of the state sets but not from *all* of the state +sets. + +## Breadth First Walk Algorithm + +A way of calculating the auth chain difference without calculating the full auth +chains for each state set is to do a parallel breadth first walk (ordered by +depth) of each state set's auth chain. By tracking which events are reachable +from each state set we can finish early if every pending event is reachable from +every state set. + +This can work well for state sets that have a small auth chain difference, but +can be very inefficient for larger differences. However, this algorithm is still +used if we don't have a chain cover index for the room (e.g. because we're in +the process of indexing it). + +## Chain Cover Index + +Synapse computes auth chain differences by pre-computing a "chain cover" index +for the auth chain in a room, allowing us to efficiently make reachability queries +like "is event `A` in the auth chain of event `B`?". 
We could do this with an index +that tracks all pairs `(A, B)` such that `A` is in the auth chain of `B`. However, this +would be prohibitively large, scaling poorly as the room accumulates more state +events. + +Instead, we break down the graph into *chains*. A chain is a subset of a DAG +with the following property: for any pair of events `E` and `F` in the chain, +the chain contains a path `E -> F` or a path `F -> E`. This forces a chain to be +linear (without forks), e.g. `E -> F -> G -> ... -> H`. Each event in the chain +is given a *sequence number* local to that chain. The oldest event `E` in the +chain has sequence number 1. If `E` has a child `F` in the chain, then `F` has +sequence number 2. If `E` has a grandchild `G` in the chain, then `G` has +sequence number 3; and so on. + +Synapse ensures that each persisted event belongs to exactly one chain, and +tracks how the chains are connected to one another. This allows us to +efficiently answer reachability queries. Doing so uses less storage than +tracking reachability on an event-by-event basis, particularly when we have +fewer and longer chains. See + +> Jagadish, H. (1990). [A compression technique to materialize transitive closure](https://doi.org/10.1145/99935.99944). +> *ACM Transactions on Database Systems (TODS)*, 15*(4)*, 558-598. + +for the original idea or + +> Y. Chen, Y. Chen, [An efficient algorithm for answering graph +> reachability queries](https://doi.org/10.1109/ICDE.2008.4497498), +> in: 2008 IEEE 24th International Conference on Data Engineering, April 2008, +> pp. 893–902. (PDF available via [Google Scholar](https://scholar.google.com/scholar?q=Y.%20Chen,%20Y.%20Chen,%20An%20efficient%20algorithm%20for%20answering%20graph%20reachability%20queries,%20in:%202008%20IEEE%2024th%20International%20Conference%20on%20Data%20Engineering,%20April%202008,%20pp.%20893902.).) + +for a more modern take. + +In practical terms, the chain cover assigns every event a +*chain ID* and *sequence number* (e.g. `(5,3)`), and maintains a map of *links* +between events in chains (e.g. `(5,3) -> (2,4)`) such that `A` is reachable by `B` +(i.e. `A` is in the auth chain of `B`) if and only if either: + +1. `A` and `B` have the same chain ID and `A`'s sequence number is less than `B`'s + sequence number; or +2. there is a link `L` between `B`'s chain ID and `A`'s chain ID such that + `L.start_seq_no` <= `B.seq_no` and `A.seq_no` <= `L.end_seq_no`. + +There are actually two potential implementations, one where we store links from +each chain to every other reachable chain (the transitive closure of the links +graph), and one where we remove redundant links (the transitive reduction of the +links graph) e.g. if we have chains `C3 -> C2 -> C1` then the link `C3 -> C1` +would not be stored. Synapse uses the former implementation so that it doesn't +need to recurse to test reachability between chains. This trades-off extra storage +in order to save CPU cycles and DB queries. + +### Example + +An example auth graph would look like the following, where chains have been +formed based on type/state_key and are denoted by colour and are labelled with +`(chain ID, sequence number)`. Links are denoted by the arrows (links in grey +are those that would be remove in the second implementation described above). + +![Example](auth_chain_diff.dot.png) + +Note that we don't include all links between events and their auth events, as +most of those links would be redundant. 
For example, all events point to the +create event, but each chain only needs the one link from it's base to the +create event. + +## Using the Index + +This index can be used to calculate the auth chain difference of the state sets +by looking at the chain ID and sequence numbers reachable from each state set: + +1. For every state set lookup the chain ID/sequence numbers of each state event +2. Use the index to find all chains and the maximum sequence number reachable + from each state set. +3. The auth chain difference is then all events in each chain that have sequence + numbers between the maximum sequence number reachable from *any* state set and + the minimum reachable by *all* state sets (if any). + +Note that steps 2 is effectively calculating the auth chain for each state set +(in terms of chain IDs and sequence numbers), and step 3 is calculating the +difference between the union and intersection of the auth chains. + +### Worked Example + +For example, given the above graph, we can calculate the difference between +state sets consisting of: + +1. `S1`: Alice's invite `(4,1)` and Bob's second join `(2,2)`; and +2. `S2`: Alice's second join `(4,3)` and Bob's first join `(2,1)`. + +Using the index we see that the following auth chains are reachable from each +state set: + +1. `S1`: `(1,1)`, `(2,2)`, `(3,1)` & `(4,1)` +2. `S2`: `(1,1)`, `(2,1)`, `(3,2)` & `(4,3)` + +And so, for each the ranges that are in the auth chain difference: +1. Chain 1: None, (since everything can reach the create event). +2. Chain 2: The range `(1, 2]` (i.e. just `2`), as `1` is reachable by all state + sets and the maximum reachable is `2` (corresponding to Bob's second join). +3. Chain 3: Similarly the range `(1, 2]` (corresponding to the second power + level). +4. Chain 4: The range `(1, 3]` (corresponding to both of Alice's joins). + +So the final result is: Bob's second join `(2,2)`, the second power level +`(3,2)` and both of Alice's joins `(4,2)` & `(4,3)`. diff --git a/docs/development/internal_documentation/cas.md b/docs/development/internal_documentation/cas.md new file mode 100644 index 0000000000..7c0668e034 --- /dev/null +++ b/docs/development/internal_documentation/cas.md @@ -0,0 +1,64 @@ +# How to test CAS as a developer without a server + +The [django-mama-cas](https://github.com/jbittel/django-mama-cas) project is an +easy to run CAS implementation built on top of Django. + +## Prerequisites + +1. Create a new virtualenv: `python3 -m venv ` +2. Activate your virtualenv: `source /path/to/your/virtualenv/bin/activate` +3. Install Django and django-mama-cas: + ```sh + python -m pip install "django<3" "django-mama-cas==2.4.0" + ``` +4. Create a Django project in the current directory: + ```sh + django-admin startproject cas_test . + ``` +5. Follow the [install directions](https://django-mama-cas.readthedocs.io/en/latest/installation.html#configuring) for django-mama-cas +6. Setup the SQLite database: `python manage.py migrate` +7. Create a user: + ```sh + python manage.py createsuperuser + ``` + 1. Use whatever you want as the username and password. + 2. Leave the other fields blank. +8. Use the built-in Django test server to serve the CAS endpoints on port 8000: + ```sh + python manage.py runserver + ``` + +You should now have a Django project configured to serve CAS authentication with +a single user created. + +## Configure Synapse (and Element) to use CAS + +1. 
Modify your `homeserver.yaml` to enable CAS and point it to your locally + running Django test server: + ```yaml + cas_config: + enabled: true + server_url: "http://localhost:8000" + service_url: "http://localhost:8081" + #displayname_attribute: name + #required_attributes: + # name: value + ``` +2. Restart Synapse. + +Note that the above configuration assumes the homeserver is running on port 8081 +and that the CAS server is on port 8000, both on localhost. + +## Testing the configuration + +Then in Element: + +1. Visit the login page with a Element pointing at your homeserver. +2. Click the Single Sign-On button. +3. Login using the credentials created with `createsuperuser`. +4. You should be logged in. + +If you want to repeat this process you'll need to manually logout first: + +1. http://localhost:8000/admin/ +2. Click "logout" in the top right. diff --git a/docs/development/internal_documentation/media_repository.md b/docs/development/internal_documentation/media_repository.md new file mode 100644 index 0000000000..23e6da7f31 --- /dev/null +++ b/docs/development/internal_documentation/media_repository.md @@ -0,0 +1,78 @@ +# Media Repository + +*Synapse implementation-specific details for the media repository* + +The media repository + * stores avatars, attachments and their thumbnails for media uploaded by local + users. + * caches avatars, attachments and their thumbnails for media uploaded by remote + users. + * caches resources and thumbnails used for URL previews. + +All media in Matrix can be identified by a unique +[MXC URI](https://spec.matrix.org/latest/client-server-api/#matrix-content-mxc-uris), +consisting of a server name and media ID: +``` +mxc:/// +``` + +## Local Media +Synapse generates 24 character media IDs for content uploaded by local users. +These media IDs consist of upper and lowercase letters and are case-sensitive. +Other homeserver implementations may generate media IDs differently. + +Local media is recorded in the `local_media_repository` table, which includes +metadata such as MIME types, upload times and file sizes. +Note that this table is shared by the URL cache, which has a different media ID +scheme. + +### Paths +A file with media ID `aabbcccccccccccccccccccc` and its `128x96` `image/jpeg` +thumbnail, created by scaling, would be stored at: +``` +local_content/aa/bb/cccccccccccccccccccc +local_thumbnails/aa/bb/cccccccccccccccccccc/128-96-image-jpeg-scale +``` + +## Remote Media +When media from a remote homeserver is requested from Synapse, it is assigned +a local `filesystem_id`, with the same format as locally-generated media IDs, +as described above. + +A record of remote media is stored in the `remote_media_cache` table, which +can be used to map remote MXC URIs (server names and media IDs) to local +`filesystem_id`s. + +### Paths +A file from `matrix.org` with `filesystem_id` `aabbcccccccccccccccccccc` and its +`128x96` `image/jpeg` thumbnail, created by scaling, would be stored at: +``` +remote_content/matrix.org/aa/bb/cccccccccccccccccccc +remote_thumbnail/matrix.org/aa/bb/cccccccccccccccccccc/128-96-image-jpeg-scale +``` +Older thumbnails may omit the thumbnailing method: +``` +remote_thumbnail/matrix.org/aa/bb/cccccccccccccccccccc/128-96-image-jpeg +``` + +Note that `remote_thumbnail/` does not have an `s`. + +## URL Previews + +When generating previews for URLs, Synapse may download and cache various +resources, including images. 
These resources are assigned temporary media IDs +of the form `yyyy-mm-dd_aaaaaaaaaaaaaaaa`, where `yyyy-mm-dd` is the current +date and `aaaaaaaaaaaaaaaa` is a random sequence of 16 case-sensitive letters. + +The metadata for these cached resources is stored in the +`local_media_repository` and `local_media_repository_url_cache` tables. + +Resources for URL previews are deleted after a few days. + +### Paths +The file with media ID `yyyy-mm-dd_aaaaaaaaaaaaaaaa` and its `128x96` +`image/jpeg` thumbnail, created by scaling, would be stored at: +``` +url_cache/yyyy-mm-dd/aaaaaaaaaaaaaaaa +url_cache_thumbnails/yyyy-mm-dd/aaaaaaaaaaaaaaaa/128-96-image-jpeg-scale +``` diff --git a/docs/development/internal_documentation/room-dag-concepts.md b/docs/development/internal_documentation/room-dag-concepts.md new file mode 100644 index 0000000000..76709487f8 --- /dev/null +++ b/docs/development/internal_documentation/room-dag-concepts.md @@ -0,0 +1,113 @@ +# Room DAG concepts + +## Edges + +The word "edge" comes from graph theory lingo. An edge is just a connection +between two events. In Synapse, we connect events by specifying their +`prev_events`. A subsequent event points back at a previous event. + +``` +A (oldest) <---- B <---- C (most recent) +``` + + +## Depth and stream ordering + +Events are normally sorted by `(topological_ordering, stream_ordering)` where +`topological_ordering` is just `depth`. In other words, we first sort by `depth` +and then tie-break based on `stream_ordering`. `depth` is incremented as new +messages are added to the DAG. Normally, `stream_ordering` is an auto +incrementing integer, but backfilled events start with `stream_ordering=-1` and decrement. + +--- + + - `/sync` returns things in the order they arrive at the server (`stream_ordering`). + - `/messages` (and `/backfill` in the federation API) return them in the order determined by the event graph `(topological_ordering, stream_ordering)`. + +The general idea is that, if you're following a room in real-time (i.e. +`/sync`), you probably want to see the messages as they arrive at your server, +rather than skipping any that arrived late; whereas if you're looking at a +historical section of timeline (i.e. `/messages`), you want to see the best +representation of the state of the room as others were seeing it at the time. + +## Outliers + +We mark an event as an `outlier` when we haven't figured out the state for the +room at that point in the DAG yet. They are "floating" events that we haven't +yet correlated to the DAG. + +Outliers typically arise when we fetch the auth chain or state for a given +event. When that happens, we just grab the events in the state/auth chain, +without calculating the state at those events, or backfilling their +`prev_events`. Since we don't have the state at any events fetched in that +way, we mark them as outliers. + +So, typically, we won't have the `prev_events` of an `outlier` in the database, +(though it's entirely possible that we *might* have them for some other +reason). Other things that make outliers different from regular events: + + * We don't have state for them, so there should be no entry in + `event_to_state_groups` for an outlier. (In practice this isn't always + the case, though I'm not sure why: see https://github.com/matrix-org/synapse/issues/12201). + + * We don't record entries for them in the `event_edges`, + `event_forward_extremeties` or `event_backward_extremities` tables. 
+ +Since outliers are not tied into the DAG, they do not normally form part of the +timeline sent down to clients via `/sync` or `/messages`; however there is an +exception: + +### Out-of-band membership events + +A special case of outlier events are some membership events for federated rooms +that we aren't full members of. For example: + + * invites received over federation, before we join the room + * *rejections* for said invites + * knock events for rooms that we would like to join but have not yet joined. + +In all the above cases, we don't have the state for the room, which is why they +are treated as outliers. They are a bit special though, in that they are +proactively sent to clients via `/sync`. + +## Forward extremity + +Most-recent-in-time events in the DAG which are not referenced by any other +events' `prev_events` yet. (In this definition, outliers, rejected events, and +soft-failed events don't count.) + +The forward extremities of a room (or at least, a subset of them, if there are +more than ten) are used as the `prev_events` when the next event is sent. + +The "current state" of a room (ie: the state which would be used if we +generated a new event) is, therefore, the resolution of the room states +at each of the forward extremities. + +## Backward extremity + +The current marker of where we have backfilled up to and will generally be the +`prev_events` of the oldest-in-time events we have in the DAG. This gives a starting point when +backfilling history. + +Note that, unlike forward extremities, we typically don't have any backward +extremity events themselves in the database - or, if we do, they will be "outliers" (see +above). Either way, we don't expect to have the room state at a backward extremity. + +When we persist a non-outlier event, if it was previously a backward extremity, +we clear it as a backward extremity and set all of its `prev_events` as the new +backward extremities if they aren't already persisted as non-outliers. This +therefore keeps the backward extremities up-to-date. + +## State groups + +For every non-outlier event we need to know the state at that event. Instead of +storing the full state for each event in the DB (i.e. a `event_id -> state` +mapping), which is *very* space inefficient when state doesn't change, we +instead assign each different set of state a "state group" and then have +mappings of `event_id -> state_group` and `state_group -> state`. + + +### Stage group edges + +TODO: `state_group_edges` is a further optimization... + notes from @Azrenbeth, https://pastebin.com/seUGVGeT diff --git a/docs/development/internal_documentation/room_and_user_statistics.md b/docs/development/internal_documentation/room_and_user_statistics.md new file mode 100644 index 0000000000..cc38c890bb --- /dev/null +++ b/docs/development/internal_documentation/room_and_user_statistics.md @@ -0,0 +1,22 @@ +Room and User Statistics +======================== + +Synapse maintains room and user statistics in various tables. These can be used +for administrative purposes but are also used when generating the public room +directory. + + +# Synapse Developer Documentation + +## High-Level Concepts + +### Definitions + +* **subject**: Something we are tracking stats about – currently a room or user. +* **current row**: An entry for a subject in the appropriate current statistics + table. Each subject can have only one. + +### Overview + +Stats correspond to the present values. Current rows contain the most up-to-date +statistics for a room. 
Each subject can only have one entry. diff --git a/docs/development/internal_documentation/saml.md b/docs/development/internal_documentation/saml.md new file mode 100644 index 0000000000..b08bcb7419 --- /dev/null +++ b/docs/development/internal_documentation/saml.md @@ -0,0 +1,40 @@ +# How to test SAML as a developer without a server + +https://fujifish.github.io/samling/samling.html (https://github.com/fujifish/samling) is a great resource for being able to tinker with the +SAML options within Synapse without needing to deploy and configure a complicated software stack. + +To make Synapse (and therefore Element) use it: + +1. Use the samling.html URL above or deploy your own and visit the IdP Metadata tab. +2. Copy the XML to your clipboard. +3. On your Synapse server, create a new file `samling.xml` next to your `homeserver.yaml` with + the XML from step 2 as the contents. +4. Edit your `homeserver.yaml` to include: + ```yaml + saml2_config: + sp_config: + allow_unknown_attributes: true # Works around a bug with AVA Hashes: https://github.com/IdentityPython/pysaml2/issues/388 + metadata: + local: ["samling.xml"] + ``` +5. Ensure that your `homeserver.yaml` has a setting for `public_baseurl`: + ```yaml + public_baseurl: http://localhost:8080/ + ``` +6. Run `apt-get install xmlsec1` and `pip install --upgrade --force 'pysaml2>=4.5.0'` to ensure + the dependencies are installed and ready to go. +7. Restart Synapse. + +Then in Element: + +1. Visit the login page and point Element towards your homeserver using the `public_baseurl` above. +2. Click the Single Sign-On button. +3. On the samling page, enter a Name Identifier and add a SAML Attribute for `uid=your_localpart`. + The response must also be signed. +4. Click "Next". +5. Click "Post Response" (change nothing). +6. You should be logged in. + +If you try and repeat this process, you may be automatically logged in using the information you +gave previously. To fix this, open your developer console (`F12` or `Ctrl+Shift+I`) while on the +samling page and clear the site data. In Chrome, this will be a button on the Application tab. diff --git a/docs/development/opentracing.md b/docs/development/opentracing.md new file mode 100644 index 0000000000..26e5c8b605 --- /dev/null +++ b/docs/development/opentracing.md @@ -0,0 +1,94 @@ +# OpenTracing + +## Background + +OpenTracing is a semi-standard being adopted by a number of distributed +tracing platforms. It is a common api for facilitating vendor-agnostic +tracing instrumentation. That is, we can use the OpenTracing api and +select one of a number of tracer implementations to do the heavy lifting +in the background. Our current selected implementation is Jaeger. + +OpenTracing is a tool which gives an insight into the causal +relationship of work done in and between servers. The servers each track +events and report them to a centralised server - in Synapse's case: +Jaeger. The basic unit used to represent events is the span. The span +roughly represents a single piece of work that was done and the time at +which it occurred. A span can have child spans, meaning that the work of +the child had to be completed for the parent span to complete, or it can +have follow-on spans which represent work that is undertaken as a result +of the parent but is not depended on by the parent to in order to +finish. + +Since this is undertaken in a distributed environment a request to +another server, such as an RPC or a simple GET, can be considered a span +(a unit or work) for the local server. 
This causal link is what +OpenTracing aims to capture and visualise. In order to do this metadata +about the local server's span, i.e the 'span context', needs to be +included with the request to the remote. + +It is up to the remote server to decide what it does with the spans it +creates. This is called the sampling policy and it can be configured +through Jaeger's settings. + +For OpenTracing concepts see +. + +For more information about Jaeger's implementation see + + +## Setting up OpenTracing + +To receive OpenTracing spans, start up a Jaeger server. This can be done +using docker like so: + +```sh +docker run -d --name jaeger \ + -p 6831:6831/udp \ + -p 6832:6832/udp \ + -p 5778:5778 \ + -p 16686:16686 \ + -p 14268:14268 \ + jaegertracing/all-in-one:1 +``` + +Latest documentation is probably at +https://www.jaegertracing.io/docs/latest/getting-started. + +## Enable OpenTracing in Synapse + +OpenTracing is not enabled by default. It must be enabled in the +homeserver config by adding the `opentracing` option to your config file. You can find +documentation about how to do this in the [config manual under the header 'Opentracing'](../usage/configuration/config_documentation.md#opentracing). +See below for an example Opentracing configuration: + +```yaml +opentracing: + enabled: true + homeserver_whitelist: + - "mytrustedhomeserver.org" + - "*.myotherhomeservers.com" +``` + +## Homeserver whitelisting + +The homeserver whitelist is configured using regular expressions. A list +of regular expressions can be given and their union will be compared +when propagating any spans contexts to another homeserver. + +Though it's mostly safe to send and receive span contexts to and from +untrusted users since span contexts are usually opaque ids it can lead +to two problems, namely: + +- If the span context is marked as sampled by the sending homeserver + the receiver will sample it. Therefore two homeservers with wildly + different sampling policies could incur higher sampling counts than + intended. +- Sending servers can attach arbitrary data to spans, known as + 'baggage'. For safety this has been disabled in Synapse but that + doesn't prevent another server sending you baggage which will be + logged to OpenTracing's logs. + +## Configuring Jaeger + +Sampling strategies can be set as in this document: +. diff --git a/docs/development/room-dag-concepts.md b/docs/development/room-dag-concepts.md deleted file mode 100644 index 76709487f8..0000000000 --- a/docs/development/room-dag-concepts.md +++ /dev/null @@ -1,113 +0,0 @@ -# Room DAG concepts - -## Edges - -The word "edge" comes from graph theory lingo. An edge is just a connection -between two events. In Synapse, we connect events by specifying their -`prev_events`. A subsequent event points back at a previous event. - -``` -A (oldest) <---- B <---- C (most recent) -``` - - -## Depth and stream ordering - -Events are normally sorted by `(topological_ordering, stream_ordering)` where -`topological_ordering` is just `depth`. In other words, we first sort by `depth` -and then tie-break based on `stream_ordering`. `depth` is incremented as new -messages are added to the DAG. Normally, `stream_ordering` is an auto -incrementing integer, but backfilled events start with `stream_ordering=-1` and decrement. - ---- - - - `/sync` returns things in the order they arrive at the server (`stream_ordering`). 
- - `/messages` (and `/backfill` in the federation API) return them in the order determined by the event graph `(topological_ordering, stream_ordering)`. - -The general idea is that, if you're following a room in real-time (i.e. -`/sync`), you probably want to see the messages as they arrive at your server, -rather than skipping any that arrived late; whereas if you're looking at a -historical section of timeline (i.e. `/messages`), you want to see the best -representation of the state of the room as others were seeing it at the time. - -## Outliers - -We mark an event as an `outlier` when we haven't figured out the state for the -room at that point in the DAG yet. They are "floating" events that we haven't -yet correlated to the DAG. - -Outliers typically arise when we fetch the auth chain or state for a given -event. When that happens, we just grab the events in the state/auth chain, -without calculating the state at those events, or backfilling their -`prev_events`. Since we don't have the state at any events fetched in that -way, we mark them as outliers. - -So, typically, we won't have the `prev_events` of an `outlier` in the database, -(though it's entirely possible that we *might* have them for some other -reason). Other things that make outliers different from regular events: - - * We don't have state for them, so there should be no entry in - `event_to_state_groups` for an outlier. (In practice this isn't always - the case, though I'm not sure why: see https://github.com/matrix-org/synapse/issues/12201). - - * We don't record entries for them in the `event_edges`, - `event_forward_extremeties` or `event_backward_extremities` tables. - -Since outliers are not tied into the DAG, they do not normally form part of the -timeline sent down to clients via `/sync` or `/messages`; however there is an -exception: - -### Out-of-band membership events - -A special case of outlier events are some membership events for federated rooms -that we aren't full members of. For example: - - * invites received over federation, before we join the room - * *rejections* for said invites - * knock events for rooms that we would like to join but have not yet joined. - -In all the above cases, we don't have the state for the room, which is why they -are treated as outliers. They are a bit special though, in that they are -proactively sent to clients via `/sync`. - -## Forward extremity - -Most-recent-in-time events in the DAG which are not referenced by any other -events' `prev_events` yet. (In this definition, outliers, rejected events, and -soft-failed events don't count.) - -The forward extremities of a room (or at least, a subset of them, if there are -more than ten) are used as the `prev_events` when the next event is sent. - -The "current state" of a room (ie: the state which would be used if we -generated a new event) is, therefore, the resolution of the room states -at each of the forward extremities. - -## Backward extremity - -The current marker of where we have backfilled up to and will generally be the -`prev_events` of the oldest-in-time events we have in the DAG. This gives a starting point when -backfilling history. - -Note that, unlike forward extremities, we typically don't have any backward -extremity events themselves in the database - or, if we do, they will be "outliers" (see -above). Either way, we don't expect to have the room state at a backward extremity. 
- -When we persist a non-outlier event, if it was previously a backward extremity, -we clear it as a backward extremity and set all of its `prev_events` as the new -backward extremities if they aren't already persisted as non-outliers. This -therefore keeps the backward extremities up-to-date. - -## State groups - -For every non-outlier event we need to know the state at that event. Instead of -storing the full state for each event in the DB (i.e. a `event_id -> state` -mapping), which is *very* space inefficient when state doesn't change, we -instead assign each different set of state a "state group" and then have -mappings of `event_id -> state_group` and `state_group -> state`. - - -### Stage group edges - -TODO: `state_group_edges` is a further optimization... - notes from @Azrenbeth, https://pastebin.com/seUGVGeT diff --git a/docs/development/saml.md b/docs/development/saml.md deleted file mode 100644 index b08bcb7419..0000000000 --- a/docs/development/saml.md +++ /dev/null @@ -1,40 +0,0 @@ -# How to test SAML as a developer without a server - -https://fujifish.github.io/samling/samling.html (https://github.com/fujifish/samling) is a great resource for being able to tinker with the -SAML options within Synapse without needing to deploy and configure a complicated software stack. - -To make Synapse (and therefore Element) use it: - -1. Use the samling.html URL above or deploy your own and visit the IdP Metadata tab. -2. Copy the XML to your clipboard. -3. On your Synapse server, create a new file `samling.xml` next to your `homeserver.yaml` with - the XML from step 2 as the contents. -4. Edit your `homeserver.yaml` to include: - ```yaml - saml2_config: - sp_config: - allow_unknown_attributes: true # Works around a bug with AVA Hashes: https://github.com/IdentityPython/pysaml2/issues/388 - metadata: - local: ["samling.xml"] - ``` -5. Ensure that your `homeserver.yaml` has a setting for `public_baseurl`: - ```yaml - public_baseurl: http://localhost:8080/ - ``` -6. Run `apt-get install xmlsec1` and `pip install --upgrade --force 'pysaml2>=4.5.0'` to ensure - the dependencies are installed and ready to go. -7. Restart Synapse. - -Then in Element: - -1. Visit the login page and point Element towards your homeserver using the `public_baseurl` above. -2. Click the Single Sign-On button. -3. On the samling page, enter a Name Identifier and add a SAML Attribute for `uid=your_localpart`. - The response must also be signed. -4. Click "Next". -5. Click "Post Response" (change nothing). -6. You should be logged in. - -If you try and repeat this process, you may be automatically logged in using the information you -gave previously. To fix this, open your developer console (`F12` or `Ctrl+Shift+I`) while on the -samling page and clear the site data. In Chrome, this will be a button on the Application tab. diff --git a/docs/development/synapse_architecture/log_contexts.md b/docs/development/synapse_architecture/log_contexts.md new file mode 100644 index 0000000000..cb15dbe158 --- /dev/null +++ b/docs/development/synapse_architecture/log_contexts.md @@ -0,0 +1,364 @@ +# Log Contexts + +To help track the processing of individual requests, synapse uses a +'`log context`' to track which request it is handling at any given +moment. This is done via a thread-local variable; a `logging.Filter` is +then used to fish the information back out of the thread-local variable +and add it to each log record. 
+ +Logcontexts are also used for CPU and database accounting, so that we +can track which requests were responsible for high CPU use or database +activity. + +The `synapse.logging.context` module provides facilities for managing +the current log context (as well as providing the `LoggingContextFilter` +class). + +Asynchronous functions make the whole thing complicated, so this document describes +how it all works, and how to write code which follows the rules. + +In this document, "awaitable" refers to any object which can be `await`ed. In the context of +Synapse, that normally means either a coroutine or a Twisted +[`Deferred`](https://twistedmatrix.com/documents/current/api/twisted.internet.defer.Deferred.html). + +## Logcontexts without asynchronous code + +In the absence of any asynchronous voodoo, things are simple enough. As with +any code of this nature, the rule is that our function should leave +things as it found them: + +```python +from synapse.logging import context # omitted from future snippets + +def handle_request(request_id): + request_context = context.LoggingContext() + + calling_context = context.set_current_context(request_context) + try: + request_context.request = request_id + do_request_handling() + logger.debug("finished") + finally: + context.set_current_context(calling_context) + +def do_request_handling(): + logger.debug("phew") # this will be logged against request_id +``` + +LoggingContext implements the context management methods, so the above +can be written much more succinctly as: + +```python +def handle_request(request_id): + with context.LoggingContext() as request_context: + request_context.request = request_id + do_request_handling() + logger.debug("finished") + +def do_request_handling(): + logger.debug("phew") +``` + +## Using logcontexts with awaitables + +Awaitables break the linear flow of code so that there is no longer a single entry point +where we should set the logcontext and a single exit point where we should remove it. + +Consider the example above, where `do_request_handling` needs to do some +blocking operation, and returns an awaitable: + +```python +async def handle_request(request_id): + with context.LoggingContext() as request_context: + request_context.request = request_id + await do_request_handling() + logger.debug("finished") +``` + +In the above flow: + +- The logcontext is set +- `do_request_handling` is called, and returns an awaitable +- `handle_request` awaits the awaitable +- Execution of `handle_request` is suspended + +So we have stopped processing the request (and will probably go on to +start processing the next), without clearing the logcontext. + +To circumvent this problem, synapse code assumes that, wherever you have +an awaitable, you will want to `await` it. To that end, whereever +functions return awaitables, we adopt the following conventions: + +**Rules for functions returning awaitables:** + +> - If the awaitable is already complete, the function returns with the +> same logcontext it started with. +> - If the awaitable is incomplete, the function clears the logcontext +> before returning; when the awaitable completes, it restores the +> logcontext before running any callbacks. + +That sounds complicated, but actually it means a lot of code (including +the example above) "just works". There are two cases: + +- If `do_request_handling` returns a completed awaitable, then the + logcontext will still be in place. 
In this case, execution will + continue immediately after the `await`; the "finished" line will + be logged against the right context, and the `with` block restores + the original context before we return to the caller. +- If the returned awaitable is incomplete, `do_request_handling` clears + the logcontext before returning. The logcontext is therefore clear + when `handle_request` `await`s the awaitable. + + Once `do_request_handling`'s awaitable completes, it will reinstate + the logcontext, before running the second half of `handle_request`, + so again the "finished" line will be logged against the right context, + and the `with` block restores the original context. + +As an aside, it's worth noting that `handle_request` follows our rules +- though that only matters if the caller has its own logcontext which it +cares about. + +The following sections describe pitfalls and helpful patterns when +implementing these rules. + +Always await your awaitables +---------------------------- + +Whenever you get an awaitable back from a function, you should `await` on +it as soon as possible. Do not pass go; do not do any logging; do not +call any other functions. + +```python +async def fun(): + logger.debug("starting") + await do_some_stuff() # just like this + + coro = more_stuff() + result = await coro # also fine, of course + + return result +``` + +Provided this pattern is followed all the way back up to the callchain +to where the logcontext was set, this will make things work out ok: +provided `do_some_stuff` and `more_stuff` follow the rules above, then +so will `fun`. + +It's all too easy to forget to `await`: for instance if we forgot that +`do_some_stuff` returned an awaitable, we might plough on regardless. This +leads to a mess; it will probably work itself out eventually, but not +before a load of stuff has been logged against the wrong context. +(Normally, other things will break, more obviously, if you forget to +`await`, so this tends not to be a major problem in practice.) + +Of course sometimes you need to do something a bit fancier with your +awaitable - not all code follows the linear A-then-B-then-C pattern. +Notes on implementing more complex patterns are in later sections. + +## Where you create a new awaitable, make it follow the rules + +Most of the time, an awaitable comes from another synapse function. +Sometimes, though, we need to make up a new awaitable, or we get an awaitable +back from external code. We need to make it follow our rules. + +The easy way to do it is by using `context.make_deferred_yieldable`. Suppose we want to implement +`sleep`, which returns a deferred which will run its callbacks after a +given number of seconds. That might look like: + +```python +# not a logcontext-rules-compliant function +def get_sleep_deferred(seconds): + d = defer.Deferred() + reactor.callLater(seconds, d.callback, None) + return d +``` + +That doesn't follow the rules, but we can fix it by calling it through +`context.make_deferred_yieldable`: + +```python +async def sleep(seconds): + return await context.make_deferred_yieldable(get_sleep_deferred(seconds)) +``` + +## Fire-and-forget + +Sometimes you want to fire off a chain of execution, but not wait for +its result. 
That might look a bit like this: + +```python +async def do_request_handling(): + await foreground_operation() + + # *don't* do this + background_operation() + + logger.debug("Request handling complete") + +async def background_operation(): + await first_background_step() + logger.debug("Completed first step") + await second_background_step() + logger.debug("Completed second step") +``` + +The above code does a couple of steps in the background after +`do_request_handling` has finished. The log lines are still logged +against the `request_context` logcontext, which may or may not be +desirable. There are two big problems with the above, however. The first +problem is that, if `background_operation` returns an incomplete +awaitable, it will expect its caller to `await` immediately, so will have +cleared the logcontext. In this example, that means that 'Request +handling complete' will be logged without any context. + +The second problem, which is potentially even worse, is that when the +awaitable returned by `background_operation` completes, it will restore +the original logcontext. There is nothing waiting on that awaitable, so +the logcontext will leak into the reactor and possibly get attached to +some arbitrary future operation. + +There are two potential solutions to this. + +One option is to surround the call to `background_operation` with a +`PreserveLoggingContext` call. That will reset the logcontext before +starting `background_operation` (so the context restored when the +deferred completes will be the empty logcontext), and will restore the +current logcontext before continuing the foreground process: + +```python +async def do_request_handling(): + await foreground_operation() + + # start background_operation off in the empty logcontext, to + # avoid leaking the current context into the reactor. + with PreserveLoggingContext(): + background_operation() + + # this will now be logged against the request context + logger.debug("Request handling complete") +``` + +Obviously that option means that the operations done in +`background_operation` would be not be logged against a logcontext +(though that might be fixed by setting a different logcontext via a +`with LoggingContext(...)` in `background_operation`). + +The second option is to use `context.run_in_background`, which wraps a +function so that it doesn't reset the logcontext even when it returns +an incomplete awaitable, and adds a callback to the returned awaitable to +reset the logcontext. In other words, it turns a function that follows +the Synapse rules about logcontexts and awaitables into one which behaves +more like an external function --- the opposite operation to that +described in the previous section. It can be used like this: + +```python +async def do_request_handling(): + await foreground_operation() + + context.run_in_background(background_operation) + + # this will now be logged against the request context + logger.debug("Request handling complete") +``` + +## Passing synapse deferreds into third-party functions + +A typical example of this is where we want to collect together two or +more awaitables via `defer.gatherResults`: + +```python +a1 = operation1() +a2 = operation2() +a3 = defer.gatherResults([a1, a2]) +``` + +This is really a variation of the fire-and-forget problem above, in that +we are firing off `a1` and `a2` without awaiting on them. The difference +is that we now have third-party code attached to their callbacks. 
Anyway,
+either technique given in the [Fire-and-forget](#fire-and-forget)
+section will work.
+
+Of course, the new awaitable returned by `gatherResults` needs to be
+wrapped in order to make it follow the logcontext rules before we can
+yield it, as described in [Where you create a new awaitable, make it
+follow the
+rules](#where-you-create-a-new-awaitable-make-it-follow-the-rules).
+
+So, option one: reset the logcontext before starting the operations to
+be gathered:
+
+```python
+async def do_request_handling():
+    with PreserveLoggingContext():
+        a1 = operation1()
+        a2 = operation2()
+        result = await defer.gatherResults([a1, a2])
+```
+
+In this case particularly, though, option two, of using
+`context.run_in_background` almost certainly makes more sense, so that
+`operation1` and `operation2` are both logged against the original
+logcontext. This looks like:
+
+```python
+async def do_request_handling():
+    a1 = context.run_in_background(operation1)
+    a2 = context.run_in_background(operation2)
+
+    result = await make_deferred_yieldable(defer.gatherResults([a1, a2]))
+```
+
+## A note on garbage-collection of awaitable chains
+
+It turns out that our logcontext rules do not play nicely with awaitable
+chains which get orphaned and garbage-collected.
+
+Imagine we have some code that looks like this:
+
+```python
+listener_queue = []
+
+def on_something_interesting():
+    for d in listener_queue:
+        d.callback("foo")
+
+async def await_something_interesting():
+    new_awaitable = defer.Deferred()
+    listener_queue.append(new_awaitable)
+
+    with PreserveLoggingContext():
+        await new_awaitable
+```
+
+Obviously, the idea here is that we have a bunch of things which are
+waiting for an event. (It's just an example of the problem here, but a
+relatively common one.)
+
+Now let's imagine two further things happen. First of all, whatever was
+waiting for the interesting thing goes away. (Perhaps the request times
+out, or something *even more* interesting happens.)
+
+Secondly, let's suppose that we decide that the interesting thing is
+never going to happen, and we reset the listener queue:
+
+```python
+def reset_listener_queue():
+    listener_queue.clear()
+```
+
+So, both ends of the awaitable chain have now dropped their references,
+and the awaitable chain is now orphaned, and will be garbage-collected at
+some point. Note that `await_something_interesting` is a coroutine,
+which Python implements as a generator function. When Python
+garbage-collects generator functions, it gives them a chance to
+clean up by making the `await` (or `yield`) raise a `GeneratorExit`
+exception. In our case, that means that the `__exit__` handler of
+`PreserveLoggingContext` will carefully restore the request context, but
+there is now nothing waiting for its return, so the request context is
+never cleared.
+
+To reiterate, this problem only arises when *both* ends of an awaitable
+chain are dropped. Dropping the reference to an awaitable you're
+supposed to be awaiting is bad practice, so this doesn't
+actually happen too much. Unfortunately, when it does happen, it will
+lead to leaked logcontexts which are incredibly hard to track down.
diff --git a/docs/development/synapse_architecture/replication.md b/docs/development/synapse_architecture/replication.md
new file mode 100644
index 0000000000..108da9a065
--- /dev/null
+++ b/docs/development/synapse_architecture/replication.md
@@ -0,0 +1,42 @@
+# Replication Architecture
+
+## Motivation
+
+We'd like to be able to split some of the work that synapse does into
+multiple python processes. In theory multiple synapse processes could
+share a single postgresql database and we'd scale up by running more
+synapse processes. However, much of synapse assumes that only one process
+is interacting with the database, both for assigning unique identifiers
+when inserting into tables, notifying components about new updates, and
+for invalidating its caches.
+
+So running multiple copies of the current code isn't an option. One way
+to run multiple processes would be to have a single writer process and
+multiple reader processes connected to the same database. In order to do
+this we'd need a way for the reader process to invalidate its in-memory
+caches when an update happens on the writer. One way to do this is for
+the writer to present an append-only log of updates which the readers
+can consume to invalidate their caches and to push updates to listening
+clients or pushers.
+
+Synapse already stores much of its data as an append-only log so that it
+can correctly respond to `/sync` requests, so the amount of code changes
+needed to expose the append-only log to the readers should be fairly
+minimal.
+
+## Architecture
+
+### The Replication Protocol
+
+See [the TCP replication documentation](tcp_replication.md).
+
+### The Slaved DataStore
+
+There are read-only versions of the synapse storage layer in
+`synapse/replication/slave/storage` that use the response of the
+replication API to invalidate their caches.
+
+### The TCP Replication Module
+Information about how the tcp replication module is structured, including how
+the classes interact, can be found in
+`synapse/replication/tcp/__init__.py`
diff --git a/docs/development/synapse_architecture/tcp_replication.md b/docs/development/synapse_architecture/tcp_replication.md
new file mode 100644
index 0000000000..15df949deb
--- /dev/null
+++ b/docs/development/synapse_architecture/tcp_replication.md
@@ -0,0 +1,257 @@
+# TCP Replication
+
+## Motivation
+
+Previously the workers used an HTTP long poll mechanism to get updates
+from the master, which had the problem of causing a lot of duplicate
+work on the server. This TCP protocol replaces those APIs with the aim
+of increased efficiency.
+
+## Overview
+
+The protocol is based on fire-and-forget, line-based commands. An
+example flow would be (where '>' indicates master to worker and
+'<' worker to master flows):
+
+    > SERVER example.com
+    < REPLICATE
+    > POSITION events master 53 53
+    > RDATA events master 54 ["$foo1:bar.com", ...]
+    > RDATA events master 55 ["$foo4:bar.com", ...]
+
+The example shows the server accepting a new connection and sending its identity
+with the `SERVER` command, followed by the client asking the server to respond with the
+position of all streams. The server then periodically sends `RDATA` commands
+which have the format `RDATA <stream_name> <instance_name> <token> <row>`, where
+the format of `<row>` is defined by the individual streams. The
+`<instance_name>` is the name of the Synapse process that generated the data
+(usually "master").
+
+Error reporting happens by either the client or server sending an `ERROR`
+command, and usually the connection will be closed.
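+
+Because every message is a single line whose first word names the command,
+a client's parsing can be sketched in a few lines of Python. This is an
+illustrative sketch only (the function name is made up; Synapse's real
+command parsing lives in `synapse/replication/tcp/commands.py`):
+
+```python
+def parse_command_line(line: str) -> tuple[str, str]:
+    """Split one protocol line into the command name and its payload."""
+    command, _, data = line.strip("\r\n").partition(" ")
+    return command, data
+
+# For example, an RDATA line from the flow above:
+command, data = parse_command_line('RDATA events master 54 ["$foo1:bar.com"]')
+assert command == "RDATA"
+assert data == 'events master 54 ["$foo1:bar.com"]'
+```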
+
+Since the protocol is simple and line based, it's possible to manually
+connect to the server using a tool like netcat. A few things should be
+noted when manually using the protocol:
+
+-   The federation stream is only available if federation sending has
+    been disabled on the main process.
+-   The server will only time out connections that have sent a `PING`
+    command. If a ping is sent then the connection will be closed if no
+    further commands are received within 15s. Both the client and
+    server protocol implementations will send an initial PING on
+    connection and ensure at least one command every 5s is sent (not
+    necessarily `PING`).
+-   `RDATA` commands *usually* include a numeric token; however, if the
+    stream has multiple rows to replicate per token the server will send
+    multiple `RDATA` commands, with all but the last having a token of
+    `batch`. See the documentation on `commands.RdataCommand` for
+    further details.
+
+## Architecture
+
+The basic structure of the protocol is line based, where the initial
+word of each line specifies the command. The rest of the line is parsed
+based on the command. For example, the RDATA command is defined as:
+
+    RDATA <stream_name> <instance_name> <token> <row>
+
+(Note that `<row>` may contain spaces, but cannot contain
+newlines.)
+
+Blank lines are ignored.
+
+### Keep alives
+
+Both sides are expected to send at least one command every 5s or so, and
+should send a `PING` command if necessary. If either side does not receive
+a command within e.g. 15s then the connection should be closed.
+
+Because the server may be connected to manually using e.g. netcat, the
+timeouts aren't enabled until an initial `PING` command is seen. Both
+the client and server implementations below send a `PING` command
+immediately on connection to ensure the timeouts are enabled.
+
+This ensures that both sides can quickly realize if the tcp connection
+has gone away and handle the situation appropriately.
+
+### Start up
+
+When a new connection is made, the server:
+
+-   Sends a `SERVER` command, which includes the identity of the server,
+    allowing the client to detect if it's connected to the expected
+    server
+-   Sends a `PING` command as above, to enable the client to time out
+    connections promptly.
+
+The client:
+
+-   Sends a `NAME` command, allowing the server to associate a
+    human-friendly name with the connection. This is optional.
+-   Sends a `PING` as above
+-   Sends a `REPLICATE` to get the current position of all streams.
+-   On receipt of a `SERVER` command, checks that the server name
+    matches the expected server name.
+
+### Error handling
+
+If either side detects an error it can send an `ERROR` command and close
+the connection.
+
+If the client side loses the connection to the server it should
+reconnect, following the steps above.
+
+### Congestion
+
+If the server sends messages faster than the client can consume them the
+server will first buffer a (fairly large) number of commands and then
+disconnect the client. This ensures that we don't queue up an unbounded
+number of commands in memory and gives us a potential opportunity to
+squawk loudly. When/if the client recovers it can reconnect to the
+server and ask for missed messages.
+
+### Reliability
+
+In general the replication stream should be considered an unreliable
+transport since e.g. commands are not resent if the connection
+disappears.
+
+The exception to that is the replication streams, i.e. `RDATA` commands,
+since these include tokens which can be used to restart the stream on
+connection errors.
+
+The client should keep track of the token in the last RDATA command
+received for each stream so that on reconnection it can start streaming
+from the correct place. Note: not all RDATA have valid tokens due to
+batching. See `RdataCommand` for more details.
+
+### Example
+
+An example interaction is shown below. Each line is prefixed with '>'
+or '<' to indicate which side is sending; these are *not* included on
+the wire:
+
+    * connection established *
+    > SERVER localhost:8823
+    > PING 1490197665618
+    < NAME synapse.app.appservice
+    < PING 1490197665618
+    < REPLICATE
+    > POSITION events master 1 1
+    > POSITION backfill master 1 1
+    > POSITION caches master 1 1
+    > RDATA caches master 2 ["get_user_by_id",["@01register-user:localhost:8823"],1490197670513]
+    > RDATA events master 14 ["$149019767112vOHxz:localhost:8823",
+        "!AFDCvgApUmpdfVjIXm:localhost:8823","m.room.guest_access","",null]
+    < PING 1490197675618
+    > ERROR server stopping
+    * connection closed by server *
+
+The `POSITION` command sent by the server is used to set the client's
+position without needing to send data with the `RDATA` command.
+
+An example of a batched set of `RDATA` is:
+
+    > RDATA caches master batch ["get_user_by_id",["@test:localhost:8823"],1490197670513]
+    > RDATA caches master batch ["get_user_by_id",["@test2:localhost:8823"],1490197670513]
+    > RDATA caches master batch ["get_user_by_id",["@test3:localhost:8823"],1490197670513]
+    > RDATA caches master 54 ["get_user_by_id",["@test4:localhost:8823"],1490197670513]
+
+In this case the client shouldn't advance its caches token until it
+sees the last `RDATA`.
+
+### List of commands
+
+The list of valid commands, with which side can send it: server (S) or
+client (C):
+
+#### SERVER (S)
+
+    Sent at the start to identify which server the client is talking to
+
+#### RDATA (S)
+
+    A single update in a stream
+
+#### POSITION (S)
+
+    On receipt of a POSITION command clients should check if they have missed any
+    updates, and if so then fetch them out of band. Sent in response to a
+    REPLICATE command (but can happen at any time).
+
+    The POSITION command includes the source of the stream. Currently all streams
+    are written by a single process (usually "master"). If fetching missing
+    updates via HTTP API, rather than via the DB, then processes should make the
+    request to the appropriate process.
+
+    Two positions are included, the "new" position and the last position sent respectively.
+    This allows servers to tell instances that the positions have advanced but no
+    data has been written, without clients needlessly checking to see if they
+    have missed any updates.
+
+#### ERROR (S, C)
+
+    There was an error
+
+#### PING (S, C)
+
+    Sent periodically to ensure the connection is still alive
+
+#### NAME (C)
+
+    Sent at the start by the client to inform the server who they are
+
+#### REPLICATE (C)
+
+Asks the server for the current position of all streams.
+
+#### USER_SYNC (C)
+
+    A user has started or stopped syncing on this process.
+
+#### CLEAR_USER_SYNC (C)
+
+    The server should clear all associated user sync data from the worker.
+
+    This is used when a worker is shutting down.
+
+#### FEDERATION_ACK (C)
+
+    Acknowledge receipt of some federation data
+
+#### REMOTE_SERVER_UP (S, C)
+
+    Inform other processes that a remote server may have come back online.
+
+See `synapse/replication/tcp/commands.py` for a detailed description and
+the format of each command.
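+
+As a concrete illustration of the reliability and batching rules described
+above, a client might track its position in each stream roughly as follows
+(an illustrative sketch with made-up names, not Synapse's actual
+implementation):
+
+```python
+# Last token successfully processed for each stream, used to resume
+# streaming from the right place after a reconnection.
+last_processed_token: dict[str, int] = {}
+
+def on_rdata(stream_name: str, token: str, row: str) -> None:
+    process_row(stream_name, row)  # application-specific handling
+    if token == "batch":
+        # Part of a multi-row batch: only the final RDATA of the batch
+        # carries the real token, so don't advance our position yet.
+        return
+    last_processed_token[stream_name] = int(token)
+
+def process_row(stream_name: str, row: str) -> None:
+    ...
+```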
+
+### Cache Invalidation Stream
+
+The cache invalidation stream is used to inform workers when they need
+to invalidate any of their caches in the data store. This is done by
+streaming all cache invalidations done on master down to the workers,
+assuming that any caches on the workers also exist on the master.
+
+Each individual cache invalidation results in a row being sent down
+replication, which includes the cache name (the name of the function)
+and the key to invalidate. For example:
+
+    > RDATA caches master 550953771 ["get_user_by_id", ["@bob:example.com"], 1550574873251]
+
+Alternatively, an entire cache can be invalidated by sending down a `null`
+instead of the key. For example:
+
+    > RDATA caches master 550953772 ["get_user_by_id", null, 1550574873252]
+
+However, there are times when a number of caches need to be invalidated
+at the same time with the same key. To reduce traffic we batch those
+invalidations into a single poke by defining a special cache name that
+workers understand to mean to expand to invalidate the correct caches.
+
+Currently the special cache names are declared in
+`synapse/storage/_base.py` and are:
+
+1. `cs_cache_fake` ─ invalidates caches that depend on the current
+   state
diff --git a/docs/log_contexts.md b/docs/log_contexts.md
deleted file mode 100644
index cb15dbe158..0000000000
--- a/docs/log_contexts.md
+++ /dev/null
@@ -1,364 +0,0 @@
-# Log Contexts
-
-To help track the processing of individual requests, synapse uses a
-'`log context`' to track which request it is handling at any given
-moment. This is done via a thread-local variable; a `logging.Filter` is
-then used to fish the information back out of the thread-local variable
-and add it to each log record.
-
-Logcontexts are also used for CPU and database accounting, so that we
-can track which requests were responsible for high CPU use or database
-activity.
-
-The `synapse.logging.context` module provides facilities for managing
-the current log context (as well as providing the `LoggingContextFilter`
-class).
-
-Asynchronous functions make the whole thing complicated, so this document describes
-how it all works, and how to write code which follows the rules.
-
-In this document, "awaitable" refers to any object which can be `await`ed. In the context of
-Synapse, that normally means either a coroutine or a Twisted
-[`Deferred`](https://twistedmatrix.com/documents/current/api/twisted.internet.defer.Deferred.html).
-
-## Logcontexts without asynchronous code
-
-In the absence of any asynchronous voodoo, things are simple enough.
As with -any code of this nature, the rule is that our function should leave -things as it found them: - -```python -from synapse.logging import context # omitted from future snippets - -def handle_request(request_id): - request_context = context.LoggingContext() - - calling_context = context.set_current_context(request_context) - try: - request_context.request = request_id - do_request_handling() - logger.debug("finished") - finally: - context.set_current_context(calling_context) - -def do_request_handling(): - logger.debug("phew") # this will be logged against request_id -``` - -LoggingContext implements the context management methods, so the above -can be written much more succinctly as: - -```python -def handle_request(request_id): - with context.LoggingContext() as request_context: - request_context.request = request_id - do_request_handling() - logger.debug("finished") - -def do_request_handling(): - logger.debug("phew") -``` - -## Using logcontexts with awaitables - -Awaitables break the linear flow of code so that there is no longer a single entry point -where we should set the logcontext and a single exit point where we should remove it. - -Consider the example above, where `do_request_handling` needs to do some -blocking operation, and returns an awaitable: - -```python -async def handle_request(request_id): - with context.LoggingContext() as request_context: - request_context.request = request_id - await do_request_handling() - logger.debug("finished") -``` - -In the above flow: - -- The logcontext is set -- `do_request_handling` is called, and returns an awaitable -- `handle_request` awaits the awaitable -- Execution of `handle_request` is suspended - -So we have stopped processing the request (and will probably go on to -start processing the next), without clearing the logcontext. - -To circumvent this problem, synapse code assumes that, wherever you have -an awaitable, you will want to `await` it. To that end, whereever -functions return awaitables, we adopt the following conventions: - -**Rules for functions returning awaitables:** - -> - If the awaitable is already complete, the function returns with the -> same logcontext it started with. -> - If the awaitable is incomplete, the function clears the logcontext -> before returning; when the awaitable completes, it restores the -> logcontext before running any callbacks. - -That sounds complicated, but actually it means a lot of code (including -the example above) "just works". There are two cases: - -- If `do_request_handling` returns a completed awaitable, then the - logcontext will still be in place. In this case, execution will - continue immediately after the `await`; the "finished" line will - be logged against the right context, and the `with` block restores - the original context before we return to the caller. -- If the returned awaitable is incomplete, `do_request_handling` clears - the logcontext before returning. The logcontext is therefore clear - when `handle_request` `await`s the awaitable. - - Once `do_request_handling`'s awaitable completes, it will reinstate - the logcontext, before running the second half of `handle_request`, - so again the "finished" line will be logged against the right context, - and the `with` block restores the original context. - -As an aside, it's worth noting that `handle_request` follows our rules -- though that only matters if the caller has its own logcontext which it -cares about. - -The following sections describe pitfalls and helpful patterns when -implementing these rules. 
- -Always await your awaitables ----------------------------- - -Whenever you get an awaitable back from a function, you should `await` on -it as soon as possible. Do not pass go; do not do any logging; do not -call any other functions. - -```python -async def fun(): - logger.debug("starting") - await do_some_stuff() # just like this - - coro = more_stuff() - result = await coro # also fine, of course - - return result -``` - -Provided this pattern is followed all the way back up to the callchain -to where the logcontext was set, this will make things work out ok: -provided `do_some_stuff` and `more_stuff` follow the rules above, then -so will `fun`. - -It's all too easy to forget to `await`: for instance if we forgot that -`do_some_stuff` returned an awaitable, we might plough on regardless. This -leads to a mess; it will probably work itself out eventually, but not -before a load of stuff has been logged against the wrong context. -(Normally, other things will break, more obviously, if you forget to -`await`, so this tends not to be a major problem in practice.) - -Of course sometimes you need to do something a bit fancier with your -awaitable - not all code follows the linear A-then-B-then-C pattern. -Notes on implementing more complex patterns are in later sections. - -## Where you create a new awaitable, make it follow the rules - -Most of the time, an awaitable comes from another synapse function. -Sometimes, though, we need to make up a new awaitable, or we get an awaitable -back from external code. We need to make it follow our rules. - -The easy way to do it is by using `context.make_deferred_yieldable`. Suppose we want to implement -`sleep`, which returns a deferred which will run its callbacks after a -given number of seconds. That might look like: - -```python -# not a logcontext-rules-compliant function -def get_sleep_deferred(seconds): - d = defer.Deferred() - reactor.callLater(seconds, d.callback, None) - return d -``` - -That doesn't follow the rules, but we can fix it by calling it through -`context.make_deferred_yieldable`: - -```python -async def sleep(seconds): - return await context.make_deferred_yieldable(get_sleep_deferred(seconds)) -``` - -## Fire-and-forget - -Sometimes you want to fire off a chain of execution, but not wait for -its result. That might look a bit like this: - -```python -async def do_request_handling(): - await foreground_operation() - - # *don't* do this - background_operation() - - logger.debug("Request handling complete") - -async def background_operation(): - await first_background_step() - logger.debug("Completed first step") - await second_background_step() - logger.debug("Completed second step") -``` - -The above code does a couple of steps in the background after -`do_request_handling` has finished. The log lines are still logged -against the `request_context` logcontext, which may or may not be -desirable. There are two big problems with the above, however. The first -problem is that, if `background_operation` returns an incomplete -awaitable, it will expect its caller to `await` immediately, so will have -cleared the logcontext. In this example, that means that 'Request -handling complete' will be logged without any context. - -The second problem, which is potentially even worse, is that when the -awaitable returned by `background_operation` completes, it will restore -the original logcontext. 
There is nothing waiting on that awaitable, so -the logcontext will leak into the reactor and possibly get attached to -some arbitrary future operation. - -There are two potential solutions to this. - -One option is to surround the call to `background_operation` with a -`PreserveLoggingContext` call. That will reset the logcontext before -starting `background_operation` (so the context restored when the -deferred completes will be the empty logcontext), and will restore the -current logcontext before continuing the foreground process: - -```python -async def do_request_handling(): - await foreground_operation() - - # start background_operation off in the empty logcontext, to - # avoid leaking the current context into the reactor. - with PreserveLoggingContext(): - background_operation() - - # this will now be logged against the request context - logger.debug("Request handling complete") -``` - -Obviously that option means that the operations done in -`background_operation` would be not be logged against a logcontext -(though that might be fixed by setting a different logcontext via a -`with LoggingContext(...)` in `background_operation`). - -The second option is to use `context.run_in_background`, which wraps a -function so that it doesn't reset the logcontext even when it returns -an incomplete awaitable, and adds a callback to the returned awaitable to -reset the logcontext. In other words, it turns a function that follows -the Synapse rules about logcontexts and awaitables into one which behaves -more like an external function --- the opposite operation to that -described in the previous section. It can be used like this: - -```python -async def do_request_handling(): - await foreground_operation() - - context.run_in_background(background_operation) - - # this will now be logged against the request context - logger.debug("Request handling complete") -``` - -## Passing synapse deferreds into third-party functions - -A typical example of this is where we want to collect together two or -more awaitables via `defer.gatherResults`: - -```python -a1 = operation1() -a2 = operation2() -a3 = defer.gatherResults([a1, a2]) -``` - -This is really a variation of the fire-and-forget problem above, in that -we are firing off `a1` and `a2` without awaiting on them. The difference -is that we now have third-party code attached to their callbacks. Anyway -either technique given in the [Fire-and-forget](#fire-and-forget) -section will work. - -Of course, the new awaitable returned by `gather` needs to be -wrapped in order to make it follow the logcontext rules before we can -yield it, as described in [Where you create a new awaitable, make it -follow the -rules](#where-you-create-a-new-awaitable-make-it-follow-the-rules). - -So, option one: reset the logcontext before starting the operations to -be gathered: - -```python -async def do_request_handling(): - with PreserveLoggingContext(): - a1 = operation1() - a2 = operation2() - result = await defer.gatherResults([a1, a2]) -``` - -In this case particularly, though, option two, of using -`context.run_in_background` almost certainly makes more sense, so that -`operation1` and `operation2` are both logged against the original -logcontext. 
This looks like: - -```python -async def do_request_handling(): - a1 = context.run_in_background(operation1) - a2 = context.run_in_background(operation2) - - result = await make_deferred_yieldable(defer.gatherResults([a1, a2])) -``` - -## A note on garbage-collection of awaitable chains - -It turns out that our logcontext rules do not play nicely with awaitable -chains which get orphaned and garbage-collected. - -Imagine we have some code that looks like this: - -```python -listener_queue = [] - -def on_something_interesting(): - for d in listener_queue: - d.callback("foo") - -async def await_something_interesting(): - new_awaitable = defer.Deferred() - listener_queue.append(new_awaitable) - - with PreserveLoggingContext(): - await new_awaitable -``` - -Obviously, the idea here is that we have a bunch of things which are -waiting for an event. (It's just an example of the problem here, but a -relatively common one.) - -Now let's imagine two further things happen. First of all, whatever was -waiting for the interesting thing goes away. (Perhaps the request times -out, or something *even more* interesting happens.) - -Secondly, let's suppose that we decide that the interesting thing is -never going to happen, and we reset the listener queue: - -```python -def reset_listener_queue(): - listener_queue.clear() -``` - -So, both ends of the awaitable chain have now dropped their references, -and the awaitable chain is now orphaned, and will be garbage-collected at -some point. Note that `await_something_interesting` is a coroutine, -which Python implements as a generator function. When Python -garbage-collects generator functions, it gives them a chance to -clean up by making the `await` (or `yield`) raise a `GeneratorExit` -exception. In our case, that means that the `__exit__` handler of -`PreserveLoggingContext` will carefully restore the request context, but -there is now nothing waiting for its return, so the request context is -never cleared. - -To reiterate, this problem only arises when *both* ends of a awaitable -chain are dropped. Dropping the the reference to an awaitable you're -supposed to be awaiting is bad practice, so this doesn't -actually happen too much. Unfortunately, when it does happen, it will -lead to leaked logcontexts which are incredibly hard to track down. diff --git a/docs/media_repository.md b/docs/media_repository.md deleted file mode 100644 index 23e6da7f31..0000000000 --- a/docs/media_repository.md +++ /dev/null @@ -1,78 +0,0 @@ -# Media Repository - -*Synapse implementation-specific details for the media repository* - -The media repository - * stores avatars, attachments and their thumbnails for media uploaded by local - users. - * caches avatars, attachments and their thumbnails for media uploaded by remote - users. - * caches resources and thumbnails used for URL previews. - -All media in Matrix can be identified by a unique -[MXC URI](https://spec.matrix.org/latest/client-server-api/#matrix-content-mxc-uris), -consisting of a server name and media ID: -``` -mxc:/// -``` - -## Local Media -Synapse generates 24 character media IDs for content uploaded by local users. -These media IDs consist of upper and lowercase letters and are case-sensitive. -Other homeserver implementations may generate media IDs differently. - -Local media is recorded in the `local_media_repository` table, which includes -metadata such as MIME types, upload times and file sizes. -Note that this table is shared by the URL cache, which has a different media ID -scheme. 
- -### Paths -A file with media ID `aabbcccccccccccccccccccc` and its `128x96` `image/jpeg` -thumbnail, created by scaling, would be stored at: -``` -local_content/aa/bb/cccccccccccccccccccc -local_thumbnails/aa/bb/cccccccccccccccccccc/128-96-image-jpeg-scale -``` - -## Remote Media -When media from a remote homeserver is requested from Synapse, it is assigned -a local `filesystem_id`, with the same format as locally-generated media IDs, -as described above. - -A record of remote media is stored in the `remote_media_cache` table, which -can be used to map remote MXC URIs (server names and media IDs) to local -`filesystem_id`s. - -### Paths -A file from `matrix.org` with `filesystem_id` `aabbcccccccccccccccccccc` and its -`128x96` `image/jpeg` thumbnail, created by scaling, would be stored at: -``` -remote_content/matrix.org/aa/bb/cccccccccccccccccccc -remote_thumbnail/matrix.org/aa/bb/cccccccccccccccccccc/128-96-image-jpeg-scale -``` -Older thumbnails may omit the thumbnailing method: -``` -remote_thumbnail/matrix.org/aa/bb/cccccccccccccccccccc/128-96-image-jpeg -``` - -Note that `remote_thumbnail/` does not have an `s`. - -## URL Previews - -When generating previews for URLs, Synapse may download and cache various -resources, including images. These resources are assigned temporary media IDs -of the form `yyyy-mm-dd_aaaaaaaaaaaaaaaa`, where `yyyy-mm-dd` is the current -date and `aaaaaaaaaaaaaaaa` is a random sequence of 16 case-sensitive letters. - -The metadata for these cached resources is stored in the -`local_media_repository` and `local_media_repository_url_cache` tables. - -Resources for URL previews are deleted after a few days. - -### Paths -The file with media ID `yyyy-mm-dd_aaaaaaaaaaaaaaaa` and its `128x96` -`image/jpeg` thumbnail, created by scaling, would be stored at: -``` -url_cache/yyyy-mm-dd/aaaaaaaaaaaaaaaa -url_cache_thumbnails/yyyy-mm-dd/aaaaaaaaaaaaaaaa/128-96-image-jpeg-scale -``` diff --git a/docs/opentracing.md b/docs/opentracing.md deleted file mode 100644 index abb94b565f..0000000000 --- a/docs/opentracing.md +++ /dev/null @@ -1,94 +0,0 @@ -# OpenTracing - -## Background - -OpenTracing is a semi-standard being adopted by a number of distributed -tracing platforms. It is a common api for facilitating vendor-agnostic -tracing instrumentation. That is, we can use the OpenTracing api and -select one of a number of tracer implementations to do the heavy lifting -in the background. Our current selected implementation is Jaeger. - -OpenTracing is a tool which gives an insight into the causal -relationship of work done in and between servers. The servers each track -events and report them to a centralised server - in Synapse's case: -Jaeger. The basic unit used to represent events is the span. The span -roughly represents a single piece of work that was done and the time at -which it occurred. A span can have child spans, meaning that the work of -the child had to be completed for the parent span to complete, or it can -have follow-on spans which represent work that is undertaken as a result -of the parent but is not depended on by the parent to in order to -finish. - -Since this is undertaken in a distributed environment a request to -another server, such as an RPC or a simple GET, can be considered a span -(a unit or work) for the local server. This causal link is what -OpenTracing aims to capture and visualise. In order to do this metadata -about the local server's span, i.e the 'span context', needs to be -included with the request to the remote. 
- -It is up to the remote server to decide what it does with the spans it -creates. This is called the sampling policy and it can be configured -through Jaeger's settings. - -For OpenTracing concepts see -. - -For more information about Jaeger's implementation see - - -## Setting up OpenTracing - -To receive OpenTracing spans, start up a Jaeger server. This can be done -using docker like so: - -```sh -docker run -d --name jaeger \ - -p 6831:6831/udp \ - -p 6832:6832/udp \ - -p 5778:5778 \ - -p 16686:16686 \ - -p 14268:14268 \ - jaegertracing/all-in-one:1 -``` - -Latest documentation is probably at -https://www.jaegertracing.io/docs/latest/getting-started. - -## Enable OpenTracing in Synapse - -OpenTracing is not enabled by default. It must be enabled in the -homeserver config by adding the `opentracing` option to your config file. You can find -documentation about how to do this in the [config manual under the header 'Opentracing'](usage/configuration/config_documentation.md#opentracing). -See below for an example Opentracing configuration: - -```yaml -opentracing: - enabled: true - homeserver_whitelist: - - "mytrustedhomeserver.org" - - "*.myotherhomeservers.com" -``` - -## Homeserver whitelisting - -The homeserver whitelist is configured using regular expressions. A list -of regular expressions can be given and their union will be compared -when propagating any spans contexts to another homeserver. - -Though it's mostly safe to send and receive span contexts to and from -untrusted users since span contexts are usually opaque ids it can lead -to two problems, namely: - -- If the span context is marked as sampled by the sending homeserver - the receiver will sample it. Therefore two homeservers with wildly - different sampling policies could incur higher sampling counts than - intended. -- Sending servers can attach arbitrary data to spans, known as - 'baggage'. For safety this has been disabled in Synapse but that - doesn't prevent another server sending you baggage which will be - logged to OpenTracing's logs. - -## Configuring Jaeger - -Sampling strategies can be set as in this document: -. diff --git a/docs/replication.md b/docs/replication.md deleted file mode 100644 index 108da9a065..0000000000 --- a/docs/replication.md +++ /dev/null @@ -1,42 +0,0 @@ -# Replication Architecture - -## Motivation - -We'd like to be able to split some of the work that synapse does into -multiple python processes. In theory multiple synapse processes could -share a single postgresql database and we\'d scale up by running more -synapse processes. However much of synapse assumes that only one process -is interacting with the database, both for assigning unique identifiers -when inserting into tables, notifying components about new updates, and -for invalidating its caches. - -So running multiple copies of the current code isn't an option. One way -to run multiple processes would be to have a single writer process and -multiple reader processes connected to the same database. In order to do -this we'd need a way for the reader process to invalidate its in-memory -caches when an update happens on the writer. One way to do this is for -the writer to present an append-only log of updates which the readers -can consume to invalidate their caches and to push updates to listening -clients or pushers. 
- -Synapse already stores much of its data as an append-only log so that it -can correctly respond to `/sync` requests so the amount of code changes -needed to expose the append-only log to the readers should be fairly -minimal. - -## Architecture - -### The Replication Protocol - -See [the TCP replication documentation](tcp_replication.md). - -### The Slaved DataStore - -There are read-only version of the synapse storage layer in -`synapse/replication/slave/storage` that use the response of the -replication API to invalidate their caches. - -### The TCP Replication Module -Information about how the tcp replication module is structured, including how -the classes interact, can be found in -`synapse/replication/tcp/__init__.py` diff --git a/docs/room_and_user_statistics.md b/docs/room_and_user_statistics.md deleted file mode 100644 index cc38c890bb..0000000000 --- a/docs/room_and_user_statistics.md +++ /dev/null @@ -1,22 +0,0 @@ -Room and User Statistics -======================== - -Synapse maintains room and user statistics in various tables. These can be used -for administrative purposes but are also used when generating the public room -directory. - - -# Synapse Developer Documentation - -## High-Level Concepts - -### Definitions - -* **subject**: Something we are tracking stats about – currently a room or user. -* **current row**: An entry for a subject in the appropriate current statistics - table. Each subject can have only one. - -### Overview - -Stats correspond to the present values. Current rows contain the most up-to-date -statistics for a room. Each subject can only have one entry. diff --git a/docs/tcp_replication.md b/docs/tcp_replication.md deleted file mode 100644 index 15df949deb..0000000000 --- a/docs/tcp_replication.md +++ /dev/null @@ -1,257 +0,0 @@ -# TCP Replication - -## Motivation - -Previously the workers used an HTTP long poll mechanism to get updates -from the master, which had the problem of causing a lot of duplicate -work on the server. This TCP protocol replaces those APIs with the aim -of increased efficiency. - -## Overview - -The protocol is based on fire and forget, line based commands. An -example flow would be (where '>' indicates master to worker and -'<' worker to master flows): - - > SERVER example.com - < REPLICATE - > POSITION events master 53 53 - > RDATA events master 54 ["$foo1:bar.com", ...] - > RDATA events master 55 ["$foo4:bar.com", ...] - -The example shows the server accepting a new connection and sending its identity -with the `SERVER` command, followed by the client server to respond with the -position of all streams. The server then periodically sends `RDATA` commands -which have the format `RDATA `, where -the format of `` is defined by the individual streams. The -`` is the name of the Synapse process that generated the data -(usually "master"). - -Error reporting happens by either the client or server sending an ERROR -command, and usually the connection will be closed. - -Since the protocol is a simple line based, its possible to manually -connect to the server using a tool like netcat. A few things should be -noted when manually using the protocol: - -- The federation stream is only available if federation sending has - been disabled on the main process. -- The server will only time connections out that have sent a `PING` - command. If a ping is sent then the connection will be closed if no - further commands are receieved within 15s. 
Both the client and - server protocol implementations will send an initial PING on - connection and ensure at least one command every 5s is sent (not - necessarily `PING`). -- `RDATA` commands *usually* include a numeric token, however if the - stream has multiple rows to replicate per token the server will send - multiple `RDATA` commands, with all but the last having a token of - `batch`. See the documentation on `commands.RdataCommand` for - further details. - -## Architecture - -The basic structure of the protocol is line based, where the initial -word of each line specifies the command. The rest of the line is parsed -based on the command. For example, the RDATA command is defined as: - - RDATA - -(Note that may contains spaces, but cannot contain -newlines.) - -Blank lines are ignored. - -### Keep alives - -Both sides are expected to send at least one command every 5s or so, and -should send a `PING` command if necessary. If either side do not receive -a command within e.g. 15s then the connection should be closed. - -Because the server may be connected to manually using e.g. netcat, the -timeouts aren't enabled until an initial `PING` command is seen. Both -the client and server implementations below send a `PING` command -immediately on connection to ensure the timeouts are enabled. - -This ensures that both sides can quickly realize if the tcp connection -has gone and handle the situation appropriately. - -### Start up - -When a new connection is made, the server: - -- Sends a `SERVER` command, which includes the identity of the server, - allowing the client to detect if its connected to the expected - server -- Sends a `PING` command as above, to enable the client to time out - connections promptly. - -The client: - -- Sends a `NAME` command, allowing the server to associate a human - friendly name with the connection. This is optional. -- Sends a `PING` as above -- Sends a `REPLICATE` to get the current position of all streams. -- On receipt of a `SERVER` command, checks that the server name - matches the expected server name. - -### Error handling - -If either side detects an error it can send an `ERROR` command and close -the connection. - -If the client side loses the connection to the server it should -reconnect, following the steps above. - -### Congestion - -If the server sends messages faster than the client can consume them the -server will first buffer a (fairly large) number of commands and then -disconnect the client. This ensures that we don't queue up an unbounded -number of commands in memory and gives us a potential oppurtunity to -squawk loudly. When/if the client recovers it can reconnect to the -server and ask for missed messages. - -### Reliability - -In general the replication stream should be considered an unreliable -transport since e.g. commands are not resent if the connection -disappears. - -The exception to that are the replication streams, i.e. RDATA commands, -since these include tokens which can be used to restart the stream on -connection errors. - -The client should keep track of the token in the last RDATA command -received for each stream so that on reconneciton it can start streaming -from the correct place. Note: not all RDATA have valid tokens due to -batching. See `RdataCommand` for more details. - -### Example - -An example iteraction is shown below. 
-### Example
-
-An example interaction is shown below. Each line is prefixed with '>'
-or '<' to indicate which side is sending; these are *not* included on
-the wire:
-
-    * connection established *
-    > SERVER localhost:8823
-    > PING 1490197665618
-    < NAME synapse.app.appservice
-    < PING 1490197665618
-    < REPLICATE
-    > POSITION events master 1 1
-    > POSITION backfill master 1 1
-    > POSITION caches master 1 1
-    > RDATA caches master 2 ["get_user_by_id",["@01register-user:localhost:8823"],1490197670513]
-    > RDATA events master 14 ["$149019767112vOHxz:localhost:8823",
-      "!AFDCvgApUmpdfVjIXm:localhost:8823","m.room.guest_access","",null]
-    < PING 1490197675618
-    > ERROR server stopping
-    * connection closed by server *
-
-The `POSITION` command sent by the server is used to set the client's
-position without needing to send data with the `RDATA` command.
-
-An example of a batched set of `RDATA` is:
-
-    > RDATA caches master batch ["get_user_by_id",["@test:localhost:8823"],1490197670513]
-    > RDATA caches master batch ["get_user_by_id",["@test2:localhost:8823"],1490197670513]
-    > RDATA caches master batch ["get_user_by_id",["@test3:localhost:8823"],1490197670513]
-    > RDATA caches master 54 ["get_user_by_id",["@test4:localhost:8823"],1490197670513]
-
-In this case the client shouldn't advance its token for the `caches`
-stream until it sees the last `RDATA`.
-
-### List of commands
-
-The list of valid commands, with the side that can send them: server (S) or
-client (C):
-
-#### SERVER (S)
-
-   Sent at the start to identify which server the client is talking to
-
-#### RDATA (S)
-
-   A single update in a stream
-
-#### POSITION (S)
-
-   On receipt of a POSITION command clients should check if they have missed any
-   updates, and if so then fetch them out of band. Sent in response to a
-   REPLICATE command (but can happen at any time).
-
-   The POSITION command includes the source of the stream. Currently all streams
-   are written by a single process (usually "master"). If fetching missing
-   updates via HTTP API, rather than via the DB, then processes should make the
-   request to the appropriate process.
-
-   Two positions are included, the "new" position and the last position sent
-   respectively. This allows servers to tell instances that the positions have
-   advanced but no data has been written, without clients needlessly checking to
-   see if they have missed any updates.
-
-#### ERROR (S, C)
-
-   There was an error
-
-#### PING (S, C)
-
-   Sent periodically to ensure the connection is still alive
-
-#### NAME (C)
-
-   Sent at the start by the client to inform the server who they are
-
-#### REPLICATE (C)
-
-   Asks the server for the current position of all streams.
-
-#### USER_SYNC (C)
-
-   A user has started or stopped syncing on this process.
-
-#### CLEAR_USER_SYNC (C)
-
-   The server should clear all associated user sync data from the worker.
-
-   This is used when a worker is shutting down.
-
-#### FEDERATION_ACK (C)
-
-   Acknowledge receipt of some federation data
-
-#### REMOTE_SERVER_UP (S, C)
-
-   Inform other processes that a remote server may have come back online.
-
-See `synapse/replication/tcp/commands.py` for a detailed description and
-the format of each command.
-
-### Cache Invalidation Stream
-
-The cache invalidation stream is used to inform workers when they need
-to invalidate any of their caches in the data store. This is done by
-streaming all cache invalidations done on master down to the workers,
-assuming that any caches on the workers also exist on the master.
-
-Each individual cache invalidation results in a row being sent down
-replication, which includes the cache name (the name of the function)
-and the key to invalidate. For example:
-
-    > RDATA caches master 550953771 ["get_user_by_id", ["@bob:example.com"], 1550574873251]
-
-Alternatively, an entire cache can be invalidated by sending down a `null`
-instead of the key. For example:
-
-    > RDATA caches master 550953772 ["get_user_by_id", null, 1550574873252]
-
-However, there are times when a number of caches need to be invalidated
-at the same time with the same key. To reduce traffic we batch those
-invalidations into a single poke by defining a special cache name that
-workers understand should be expanded to invalidate the correct caches.
-
-Currently the special cache names are declared in
-`synapse/storage/_base.py` and are:
-
-1. `cs_cache_fake` ─ invalidates caches that depend on the current
-   state
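To make the cache invalidation rows above more concrete, here is a hedged Python sketch of how a worker might apply a single row from the `caches` stream against a toy in-memory cache. The function `apply_cache_invalidation` and the `caches` dictionary are invented for illustration, and the sketch ignores special cache names such as `cs_cache_fake`; Synapse's real handling lives in its storage layer:

```python
from typing import Any, Dict, List

# A toy stand-in for a worker's caches: cache name -> {key tuple -> value}.
caches: Dict[str, Dict[tuple, Any]] = {
    "get_user_by_id": {("@bob:example.com",): {"admin": False}},
}


def apply_cache_invalidation(row: List[Any]) -> None:
    """Apply one row from the `caches` stream.

    Rows look like ["get_user_by_id", ["@bob:example.com"], 1550574873251]:
    the cache name, the key to invalidate (or null/None to drop the whole
    cache), and a timestamp in milliseconds.
    """
    cache_name, keys, _ts = row
    cache = caches.get(cache_name)
    if cache is None:
        return  # This worker does not have that cache.
    if keys is None:
        cache.clear()  # A null key invalidates the entire cache.
    else:
        cache.pop(tuple(keys), None)


# Example, using the row shown above:
apply_cache_invalidation(["get_user_by_id", ["@bob:example.com"], 1550574873251])
assert caches["get_user_by_id"] == {}
```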
diff --git a/docs/usage/configuration/config_documentation.md b/docs/usage/configuration/config_documentation.md
index 65025d3840..9fbc328042 100644
--- a/docs/usage/configuration/config_documentation.md
+++ b/docs/usage/configuration/config_documentation.md
@@ -3493,7 +3493,7 @@ user_consent:
 ---
 ### `stats`
 
-Settings for local room and user statistics collection. See [here](../../room_and_user_statistics.md)
+Settings for local room and user statistics collection. See [here](../../development/internal_documentation/room_and_user_statistics.md)
 for more.
 
 * `enabled`: Set to false to disable room and user statistics. Note that doing
@@ -3642,7 +3642,7 @@ synapse or any other services which support opentracing
 Sub-options include:
 * `enabled`: whether tracing is enabled. Set to true to enable. Disabled by default.
 * `homeserver_whitelist`: The list of homeservers we wish to send and receive span contexts and span baggage.
-  See [here](../../opentracing.md) for more.
+  See [here](../../development/opentracing.md) for more.
   This is a list of regexes which are matched against the `server_name` of the homeserver.
   By default, it is empty, so no servers are matched.
 * `force_tracing_for_users`: # A list of the matrix IDs of users whose requests will always be traced,
-- 
cgit 1.5.1
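The configuration hunk above documents the `stats` and `opentracing` sections of `config_documentation.md`. Purely as an illustration of the opentracing sub-options it names (`enabled`, `homeserver_whitelist`, `force_tracing_for_users`), a `homeserver.yaml` fragment might look like the sketch below; the server-name regex and user ID are placeholders, not recommendations:

```yaml
# Hypothetical homeserver.yaml fragment; values are examples only.
opentracing:
  enabled: true
  homeserver_whitelist:
    - '.*\.example\.com'   # regexes matched against the homeserver's server_name
  force_tracing_for_users:
    - "@alice:example.com"  # requests from these matrix IDs are always traced
```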