3 files changed, 268 insertions, 27 deletions
diff --git a/docs/admin_api/user_admin_api.md b/docs/admin_api/user_admin_api.md
index f83facabe4..b91848dd27 100644
--- a/docs/admin_api/user_admin_api.md
+++ b/docs/admin_api/user_admin_api.md
@@ -148,7 +148,6 @@ Body parameters:
 - `admin` - **bool**, optional, defaults to `false`. Whether the user is a homeserver administrator,
   granting them access to the Admin API, among other things.
 - `deactivated` - **bool**, optional. If unspecified, deactivation state will be left unchanged.
-- `locked` - **bool**, optional. If unspecified, locked state will be left unchanged.
 
   Note: the `password` field must also be set if both of the following are true:
   - `deactivated` is set to `false` and the user was previously deactivated (you are reactivating this user)
@@ -158,6 +157,7 @@ Body parameters:
 
   Note: a user cannot be erased with this API. For more details on
   deactivating and erasing users see [Deactivate Account](#deactivate-account).
+- `locked` - **bool**, optional. If unspecified, locked state will be left unchanged.
 - `user_type` - **string** or null, optional. If not provided, the user type will be
   not be changed. If `null` is given, the user type will be cleared.
   Other allowed options are: `bot` and `support`.
diff --git a/docs/development/database_schema.md b/docs/development/database_schema.md
index e231be21dd..675080ae1b 100644
--- a/docs/development/database_schema.md
+++ b/docs/development/database_schema.md
@@ -184,3 +184,160 @@ version `3`, that can only happen with a hash collision, which we basically hope
 will never happen (SHA256 has a massive big key space).
 
 
+## Worked examples of gradual migrations
+
+Some migrations need to be performed gradually. A prime example of this is anything
+which would need to do a large table scan — including adding columns, indices or
+`NOT NULL` constraints to non-empty tables — such a migration should be done as a
+background update where possible, at least on Postgres.
+We can afford to be more relaxed about SQLite databases since they are usually
+used on smaller deployments and SQLite does not support the same concurrent
+DDL operations as Postgres.
+
+We also typically insist on having at least one Synapse version's worth of
+backwards compatibility, so that administrators can roll back Synapse if an upgrade
+did not go smoothly.
+
+This sometimes results in having to plan a migration across multiple versions
+of Synapse.
+
+This section includes an example and may include more in the future.
+
+
+
+### Transforming a column into another one, with `NOT NULL` constraints
+
+This example illustrates how you would introduce a new column, write data into it
+based on data from an old column and then drop the old column.
+
+We are aiming for semantic equivalence to:
+
+```sql
+ALTER TABLE mytable ADD COLUMN new_column INTEGER;
+UPDATE mytable SET new_column = old_column * 100;
+ALTER TABLE mytable ALTER COLUMN new_column ADD CONSTRAINT NOT NULL;
+ALTER TABLE mytable DROP COLUMN old_column;
+```
+
+#### Synapse version `N`
+
+```python
+SCHEMA_VERSION = S
+SCHEMA_COMPAT_VERSION = ... # unimportant at this stage
+```
+
+**Invariants:**
+1. `old_column` is read by Synapse and written to by Synapse.
+
+
+#### Synapse version `N + 1`
+
+```python
+SCHEMA_VERSION = S + 1
+SCHEMA_COMPAT_VERSION = ... # unimportant at this stage
+```
+
+**Changes:**
+1.
+   ```sql
+   ALTER TABLE mytable ADD COLUMN new_column INTEGER;
+   ```
+
+**Invariants:**
+1. `old_column` is read by Synapse and written to by Synapse.
+2. `new_column` is written to by Synapse.
+
+**Notes:**
+1. `new_column` can't have a `NOT NULL NOT VALID` constraint yet, because the previous Synapse version did not write to the new column (since we haven't bumped the `SCHEMA_COMPAT_VERSION` yet, we still need to be compatible with the previous version).
+
+
+#### Synapse version `N + 2`
+
+```python
+SCHEMA_VERSION = S + 2
+SCHEMA_COMPAT_VERSION = S + 1 # this signals that we can't roll back to a time before new_column existed
+```
+
+**Changes:**
+1. On Postgres, add a `NOT VALID` constraint to ensure new rows are compliant. *SQLite does not have such a construct, but it would be unnecessary anyway since there is no way to concurrently perform this migration on SQLite.*
+   ```sql
+   ALTER TABLE mytable ADD CONSTRAINT CHECK new_column_not_null (new_column IS NOT NULL) NOT VALID;
+   ```
+2. Start a background update to perform migration: it should gradually run e.g.
+   ```sql
+   UPDATE mytable SET new_column = old_column * 100 WHERE 0 < mytable_id AND mytable_id <= 5;
+   ```
+   This background update is technically pointless on SQLite, but you must schedule it anyway so that the `portdb` script to migrate to Postgres still works.
+3. Upon completion of the background update, you should run `VALIDATE CONSTRAINT` on Postgres to turn the `NOT VALID` constraint into a valid one.
+   ```sql
+   ALTER TABLE mytable VALIDATE CONSTRAINT new_column_not_null;
+   ```
+   This will take some time but does **NOT** hold an exclusive lock over the table.
+
+**Invariants:**
+1. `old_column` is read by Synapse and written to by Synapse.
+2. `new_column` is written to by Synapse and new rows always have a non-`NULL` value in this field.
+
+
+**Notes:**
+1. If you wish, you can convert the `CHECK (new_column IS NOT NULL)` to a `NOT NULL` constraint free of charge in Postgres by adding the `NOT NULL` constraint and then dropping the `CHECK` constraint, because Postgres can statically verify that the `NOT NULL` constraint is implied by the `CHECK` constraint without performing a table scan.
+2. It might be tempting to make version `N + 2` redundant by moving the background update to `N + 1` and delaying adding the `NOT NULL` constraint to `N + 3`, but that would mean the constraint would always be validated in the foreground in `N + 3`. Whereas if the `N + 2` step is kept, the migration in `N + 3` would be fast in the happy case.
+
+#### Synapse version `N + 3`
+
+```python
+SCHEMA_VERSION = S + 3
+SCHEMA_COMPAT_VERSION = S + 1 # we can't roll back to a time before new_column existed
+```
+
+**Changes:**
+1. (Postgres) Update the table to populate values of `new_column` in case the background update had not completed. Additionally, `VALIDATE CONSTRAINT` to make the check fully valid.
+   ```sql
+   -- you ideally want an index on `new_column` or e.g. `(new_column) WHERE new_column IS NULL` first, or perhaps you can find a way to skip this if the `NOT NULL` constraint has already been validated.
+   UPDATE mytable SET new_column = old_column * 100 WHERE new_column IS NULL;
+
+   -- this is a no-op if it already ran as part of the background update
+   ALTER TABLE mytable VALIDATE CONSTRAINT new_column_not_null;
+   ```
+2. (SQLite) Recreate the table by precisely following [the 12-step procedure for SQLite table schema changes](https://www.sqlite.org/lang_altertable.html#otheralter).
+   During this table rewrite, you should recreate `new_column` as `NOT NULL` and populate any outstanding `NULL` values at the same time.
+   Unfortunately, you can't drop `old_column` yet because it must be present for compatibility with the Postgres schema, as needed by `portdb`.
+   (Otherwise you could do this all in one go with SQLite!)
+
+**Invariants:**
+1. `old_column` is written to by Synapse (but no longer read by Synapse!).
+2. `new_column` is read by Synapse and written to by Synapse. Moreover, all rows have a non-`NULL` value in this field, as guaranteed by a schema constraint.
+
+**Notes:**
+1. We can't drop `old_column` yet, or even stop writing to it, because that would break a rollback to the previous version of Synapse.
+2. Application code can now rely on `new_column` being populated. The remaining steps are only motivated by the wish to clean-up old columns.
+
+
+#### Synapse version `N + 4`
+
+```python
+SCHEMA_VERSION = S + 4
+SCHEMA_COMPAT_VERSION = S + 3 # we can't roll back to a time before new_column was entirely non-NULL
+```
+
+**Invariants:**
+1. `old_column` exists but is not written to or read from by Synapse.
+2. `new_column` is read by Synapse and written to by Synapse. Moreover, all rows have a non-`NULL` value in this field, as guaranteed by a schema constraint.
+
+**Notes:**
+1. We can't drop `old_column` yet because that would break a rollback to the previous version of Synapse. \
+   **TODO:** It may be possible to relax this and drop the column straight away as long as the previous version of Synapse detected a rollback occurred and stopped attempting to write to the column. This could possibly be done by checking whether the database's schema compatibility version was `S + 3`.
+
+
+#### Synapse version `N + 5`
+
+```python
+SCHEMA_VERSION = S + 5
+SCHEMA_COMPAT_VERSION = S + 4 # we can't roll back to a time before old_column was no longer being touched
+```
+
+**Changes:**
+1.
+   ```sql
+   ALTER TABLE mytable DROP COLUMN old_column;
+   ```
diff --git a/docs/user_directory.md b/docs/user_directory.md
index c4794b04cf..b33fd2bc2a 100644
--- a/docs/user_directory.md
+++ b/docs/user_directory.md
@@ -1,49 +1,133 @@
-User Directory API Implementation
-=================================
+# User Directory API Implementation
 
-The user directory is currently maintained based on the 'visible' users
-on this particular server - i.e. ones which your account shares a room with, or
-who are present in a publicly viewable room present on the server.
+The user directory is maintained based on users that are 'visible' to the homeserver -
+i.e. ones which are local to the server and ones which any local user shares a
+room with.
 
-The directory info is stored in various tables, which can (typically after
-DB corruption) get stale or out of sync. If this happens, for now the
+The directory info is stored in various tables, which can sometimes get out of
+sync (although this is considered a bug). If this happens, for now the
 solution to fix it is to use the [admin API](usage/administration/admin_api/background_updates.md#run)
 and execute the job `regenerate_directory`. This should then start a background task to
-flush the current tables and regenerate the directory.
+flush the current tables and regenerate the directory. Depending on the size
+of your homeserver (number of users and rooms) this can take a while.
 
-Data model
-----------
+## Data model
 
 There are five relevant tables that collectively form the "user directory".
-Three of them track a master list of all the users we could search for.
-The last two (collectively called the "search tables") track who can
-see who.
+Three of them track a list of all known users. The last two (collectively called
+the "search tables") track which users are visible to each other.
 
 From all of these tables we exclude three types of local user:
-  - support users
-  - appservice users
-  - deactivated users
-
-* `user_directory`. This contains the user_id, display name and avatar we'll
-  return when you search the directory.
-  - Because there's only one directory entry per user, it's important that we only
-    ever put publicly visible names here. Otherwise we might leak a private
+
+- support users
+- appservice users
+- deactivated users
+
+A description of each table follows:
+
+* `user_directory`. This contains the user ID, display name and avatar of each user.
+  - Because there is only one directory entry per user, it is important that it
+    only contain publicly visible information. Otherwise, this will leak the
     nickname or avatar used in a private room.
   - Indexed on rooms. Indexed on users.
 
 * `user_directory_search`. To be joined to `user_directory`. It contains an extra
-  column that enables full text search based on user ids and display names.
-  Different schemas for SQLite and Postgres with different code paths to match.
+  column that enables full text search based on user IDs and display names.
+  Different schemas for SQLite and Postgres are used.
   - Indexed on the full text search data. Indexed on users.
 
 * `user_directory_stream_pos`. When the initial background update to populate
   the directory is complete, we record a stream position here. This indicates
   that synapse should now listen for room changes and incrementally update
-  the directory where necessary.
+  the directory where necessary. (See [stream positions](development/synapse_architecture/streams.html).)
 
-* `users_in_public_rooms`. Contains associations between users and the public rooms they're in.
-  Used to determine which users are in public rooms and should be publicly visible in the directory.
+* `users_in_public_rooms`. Contains associations between users and the public
+  rooms they're in.  Used to determine which users are in public rooms and should
+  be publicly visible in the directory. Both local and remote users are tracked.
 
 * `users_who_share_private_rooms`. Rows are triples `(L, M, room id)` where `L`
    is a local user and `M` is a local or remote user. `L` and `M` should be
    different, but this isn't enforced by a constraint.
+
+   Note that if two local users share a room then there will be two entries:
+   `(user1, user2, !room_id)` and `(user2, user1, !room_id)`.
+
+## Configuration options
+
+The exact way user search works can be tweaked via some server-level
+[configuration options](usage/configuration/config_documentation.md#user_directory).
+
+The information is not repeated here, but the options are mentioned below.
+
+## Search algorithm
+
+If `search_all_users` is `false`, then results are limited to users who:
+
+1. Are found in the `users_in_public_rooms` table, or
+2. Are found in the `users_who_share_private_rooms` where `L` is the requesting
+   user and `M` is the search result.
+
+Otherwise, if `search_all_users` is `true`, no such limits are placed and all
+users known to the server (matching the search query) will be returned.
+
+By default, locked users are not returned. If `show_locked_users` is `true` then
+no filtering on the locked status of a user is done.
+
+The user provided search term is lowercased and normalized using [NFKC](https://en.wikipedia.org/wiki/Unicode_equivalence#Normalization),
+this treats the string as case-insensitive, canonicalizes different forms of the
+same text, and maps some "roughly equivalent" characters together.
+
+The search term is then split into words:
+
+* If [ICU](https://en.wikipedia.org/wiki/International_Components_for_Unicode) is
+  available, then the system's [default locale](https://unicode-org.github.io/icu/userguide/locale/#default-locales)
+  will be used to break the search term into words. (See the
+  [installation instructions](setup/installation.md) for how to install ICU.)
+* If unavailable, then runs of ASCII characters, numbers, underscores, and hypens
+  are considered words.
+
+The queries for PostgreSQL and SQLite are detailed below, by their overall goal
+is to find matching users, preferring users who are "real" (e.g. not bots,
+not deactivated). It is assumed that real users will have an display name and
+avatar set.
+
+### PostgreSQL
+
+The above words are then transformed into two queries:
+
+1. "exact" which matches the parsed words exactly (using [`to_tsquery`](https://www.postgresql.org/docs/current/textsearch-controls.html#TEXTSEARCH-PARSING-QUERIES));
+2. "prefix" which matches the parsed words as prefixes (using `to_tsquery`).
+
+Results are composed of all rows in the `user_directory_search` table whose information
+matches one (or both) of these queries. Results are ordered by calculating a weighted
+score for each result, higher scores are returned first:
+
+* 4x if a user ID exists.
+* 1.2x if the user has a display name set.
+* 1.2x if the user has an avatar set.
+* 0x-3x by the full text search results using the [`ts_rank_cd` function](https://www.postgresql.org/docs/current/textsearch-controls.html#TEXTSEARCH-RANKING)
+  against the "exact" search query; this has four variables with the following weightings:
+  * `D`: 0.1 for the user ID's domain
+  * `C`: 0.1 for unused
+  * `B`: 0.9 for the user's display name (or an empty string if it is not set)
+  * `A`: 0.1 for the user ID's localpart
+* 0x-1x by the full text search results using the `ts_rank_cd` function against the
+  "prefix" search query. (Using the same weightings as above.)
+* If `prefer_local_users` is `true`, then 2x if the user is local to the homeserver.
+
+Note that `ts_rank_cd` returns a weight between 0 and 1. The initial weighting of
+all results is 1.
+
+### SQLite
+
+Results are composed of all rows in the `user_directory_search` whose information
+matches the query. Results are ordered by the following information, with each
+subsequent column used as a tiebreaker, for each result:
+
+1. By the [`rank`](https://www.sqlite.org/windowfunctions.html#built_in_window_functions)
+   of the full text search results using the [`matchinfo` function](https://www.sqlite.org/fts3.html#matchinfo). Higher
+   ranks are returned first.
+2. If `prefer_local_users` is `true`, then users local to the homeserver are
+   returned first.
+3. Users with a display name set are returned first.
+4. Users with an avatar set are returned first.