summary refs log tree commit diff
path: root/docs
diff options
context:
space:
mode:
authorRichard van der Hoff <1389908+richvdh@users.noreply.github.com>2021-06-11 14:45:53 +0100
committerGitHub <noreply@github.com>2021-06-11 14:45:53 +0100
commitc1b9922498dea4b2882d26a4eaef3e0a37e727fd (patch)
tree926ab9057e6276f7f11fb0b00d4107836b9484dd /docs
parentUse the matching complement branch when running tests in CI. (#10160) (diff)
downloadsynapse-c1b9922498dea4b2882d26a4eaef3e0a37e727fd.tar.xz
Support for database schema version ranges (#9933)
This is essentially an implementation of the proposal made at https://hackmd.io/@richvdh/BJYXQMQHO, though the details have ended up looking slightly different.
Diffstat (limited to 'docs')
-rw-r--r--docs/SUMMARY.md3
-rw-r--r--docs/development/database_schema.md95
2 files changed, 97 insertions, 1 deletions
diff --git a/docs/SUMMARY.md b/docs/SUMMARY.md
index 8f39ae0270..af2c968c9a 100644
--- a/docs/SUMMARY.md
+++ b/docs/SUMMARY.md
@@ -69,6 +69,7 @@
   - [Git Usage](dev/git.md)
   - [Testing]()
   - [OpenTracing](opentracing.md)
+  - [Database Schemas](development/database_schema.md)
   - [Synapse Architecture]()
     - [Log Contexts](log_contexts.md)
     - [Replication](replication.md)
@@ -84,4 +85,4 @@
   - [Scripts]()
 
 # Other
-  - [Dependency Deprecation Policy](deprecation_policy.md)
\ No newline at end of file
+  - [Dependency Deprecation Policy](deprecation_policy.md)
diff --git a/docs/development/database_schema.md b/docs/development/database_schema.md
new file mode 100644
index 0000000000..7fe8ec63e1
--- /dev/null
+++ b/docs/development/database_schema.md
@@ -0,0 +1,95 @@
+# Synapse database schema files
+
+Synapse's database schema is stored in the `synapse.storage.schema` module.
+
+## Logical databases
+
+Synapse supports splitting its datastore across multiple physical databases (which can
+be useful for large installations), and the schema files are therefore split according
+to the logical database they apply to.
+
+At the time of writing, the following "logical" databases are supported:
+
+* `state` - used to store Matrix room state (more specifically, `state_groups`,
+  their relationships and contents).
+* `main` - stores everything else.
+
+Additionally, the `common` directory contains schema files for tables which must be
+present on *all* physical databases.
+
+## Synapse schema versions
+
+Synapse manages its database schema via "schema versions". These are mainly used to
+help avoid confusion if the Synapse codebase is rolled back after the database is
+updated. They work as follows:
+
+ * The Synapse codebase defines a constant `synapse.storage.schema.SCHEMA_VERSION`
+   which represents the expectations made about the database by that version. For
+   example, as of Synapse v1.36, this is `59`.
+
+ * The database stores a "compatibility version" in
+   `schema_compat_version.compat_version` which defines the `SCHEMA_VERSION` of the
+   oldest version of Synapse which will work with the database. On startup, if
+   `compat_version` is found to be newer than `SCHEMA_VERSION`, Synapse will refuse to
+   start.
+
+   Synapse automatically updates this field from
+   `synapse.storage.schema.SCHEMA_COMPAT_VERSION`.
+
+ * Whenever a backwards-incompatible change is made to the database format (normally
+   via a `delta` file), `synapse.storage.schema.SCHEMA_COMPAT_VERSION` is also updated
+   so that administrators can not accidentally roll back to a too-old version of Synapse.
+
+Generally, the goal is to maintain compatibility with at least one or two previous
+releases of Synapse, so any substantial change tends to require multiple releases and a
+bit of forward-planning to get right.
+
+As a worked example: we want to remove the `room_stats_historical` table. Here is how it
+might pan out.
+
+ 1. Replace any code that *reads* from `room_stats_historical` with alternative
+    implementations, but keep writing to it in case of rollback to an earlier version.
+    Also, increase `synapse.storage.schema.SCHEMA_VERSION`.  In this
+    instance, there is no existing code which reads from `room_stats_historical`, so
+    our starting point is:
+
+    v1.36.0: `SCHEMA_VERSION=59`, `SCHEMA_COMPAT_VERSION=59`
+
+ 2. Next (say in Synapse v1.37.0): remove the code that *writes* to
+    `room_stats_historical`, but don’t yet remove the table in case of rollback to
+    v1.36.0. Again, we increase `synapse.storage.schema.SCHEMA_VERSION`, but
+    because we have not broken compatibility with v1.36, we do not yet update
+    `SCHEMA_COMPAT_VERSION`. We now have:
+
+    v1.37.0: `SCHEMA_VERSION=60`, `SCHEMA_COMPAT_VERSION=59`.
+
+ 3. Later (say in Synapse v1.38.0): we can remove the table altogether. This will
+    break compatibility with v1.36.0, so we must update `SCHEMA_COMPAT_VERSION` accordingly.
+    There is no need to update `synapse.storage.schema.SCHEMA_VERSION`, since there is no
+    change to the Synapse codebase here. So we end up with:
+
+    v1.38.0: `SCHEMA_VERSION=60`, `SCHEMA_COMPAT_VERSION=60`.
+
+If in doubt about whether to update `SCHEMA_VERSION` or not, it is generally best to
+lean towards doing so.
+
+## Full schema dumps
+
+In the `full_schemas` directories, only the most recently-numbered snapshot is used
+(`54` at the time of writing). Older snapshots (eg, `16`) are present for historical
+reference only.
+
+### Building full schema dumps
+
+If you want to recreate these schemas, they need to be made from a database that
+has had all background updates run.
+
+To do so, use `scripts-dev/make_full_schema.sh`. This will produce new
+`full.sql.postgres` and `full.sql.sqlite` files.
+
+Ensure postgres is installed, then run:
+
+    ./scripts-dev/make_full_schema.sh -p postgres_username -o output_dir/
+
+NB at the time of writing, this script predates the split into separate `state`/`main`
+databases so will require updates to handle that correctly.