From 63d9d909077fdbb1bcae12db1ad110b97f00b16b Mon Sep 17 00:00:00 2001 From: MatMaul Date: Tue, 25 Jul 2023 12:33:48 +0000 Subject: deploy: dbee081d149cf1b496d147e144279b621220e429 --- v1.89/development/database_schema.html | 338 +++++++++++++++++++++++++++++++++ 1 file changed, 338 insertions(+) create mode 100644 v1.89/development/database_schema.html (limited to 'v1.89/development/database_schema.html') diff --git a/v1.89/development/database_schema.html b/v1.89/development/database_schema.html new file mode 100644 index 0000000000..fe1e178fb5 --- /dev/null +++ b/v1.89/development/database_schema.html @@ -0,0 +1,338 @@ + + + + + + Database Schemas - Synapse + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
+ +
+ + + + + + + +
+
+ +
+ +
+ +

Synapse database schema files

+

Synapse's database schema is stored in the synapse.storage.schema module.

+

Logical databases

+

Synapse supports splitting its datastore across multiple physical databases (which can +be useful for large installations), and the schema files are therefore split according +to the logical database they apply to.

+

At the time of writing, the following "logical" databases are supported:

+
    +
  • state - used to store Matrix room state (more specifically, state_groups, +their relationships and contents).
  • +
  • main - stores everything else.
  • +
+

Additionally, the common directory contains schema files for tables which must be +present on all physical databases.

+

Synapse schema versions

+

Synapse manages its database schema via "schema versions". These are mainly used to +help avoid confusion if the Synapse codebase is rolled back after the database is +updated. They work as follows:

+
    +
  • +

    The Synapse codebase defines a constant synapse.storage.schema.SCHEMA_VERSION +which represents the expectations made about the database by that version. For +example, as of Synapse v1.36, this is 59.

    +
  • +
  • +

    The database stores a "compatibility version" in +schema_compat_version.compat_version which defines the SCHEMA_VERSION of the +oldest version of Synapse which will work with the database. On startup, if +compat_version is found to be newer than SCHEMA_VERSION, Synapse will refuse to +start.

    +

    Synapse automatically updates this field from +synapse.storage.schema.SCHEMA_COMPAT_VERSION.

    +
  • +
  • +

    Whenever a backwards-incompatible change is made to the database format (normally +via a delta file), synapse.storage.schema.SCHEMA_COMPAT_VERSION is also updated +so that administrators can not accidentally roll back to a too-old version of Synapse.

    +
  • +
+

Generally, the goal is to maintain compatibility with at least one or two previous +releases of Synapse, so any substantial change tends to require multiple releases and a +bit of forward-planning to get right.

+

As a worked example: we want to remove the room_stats_historical table. Here is how it +might pan out.

+
    +
  1. +

    Replace any code that reads from room_stats_historical with alternative +implementations, but keep writing to it in case of rollback to an earlier version. +Also, increase synapse.storage.schema.SCHEMA_VERSION. In this +instance, there is no existing code which reads from room_stats_historical, so +our starting point is:

    +

    v1.36.0: SCHEMA_VERSION=59, SCHEMA_COMPAT_VERSION=59

    +
  2. +
  3. +

    Next (say in Synapse v1.37.0): remove the code that writes to +room_stats_historical, but don’t yet remove the table in case of rollback to +v1.36.0. Again, we increase synapse.storage.schema.SCHEMA_VERSION, but +because we have not broken compatibility with v1.36, we do not yet update +SCHEMA_COMPAT_VERSION. We now have:

    +

    v1.37.0: SCHEMA_VERSION=60, SCHEMA_COMPAT_VERSION=59.

    +
  4. +
  5. +

    Later (say in Synapse v1.38.0): we can remove the table altogether. This will +break compatibility with v1.36.0, so we must update SCHEMA_COMPAT_VERSION accordingly. +There is no need to update synapse.storage.schema.SCHEMA_VERSION, since there is no +change to the Synapse codebase here. So we end up with:

    +

    v1.38.0: SCHEMA_VERSION=60, SCHEMA_COMPAT_VERSION=60.

    +
  6. +
+

If in doubt about whether to update SCHEMA_VERSION or not, it is generally best to +lean towards doing so.

+

Full schema dumps

+

In the full_schemas directories, only the most recently-numbered snapshot is used +(54 at the time of writing). Older snapshots (eg, 16) are present for historical +reference only.

+

Building full schema dumps

+

If you want to recreate these schemas, they need to be made from a database that +has had all background updates run.

+

To do so, use scripts-dev/make_full_schema.sh. This will produce new +full.sql.postgres and full.sql.sqlite files.

+

Ensure postgres is installed, then run:

+
./scripts-dev/make_full_schema.sh -p postgres_username -o output_dir/
+
+

NB at the time of writing, this script predates the split into separate state/main +databases so will require updates to handle that correctly.

+

Delta files

+

Delta files define the steps required to upgrade the database from an earlier version. +They can be written as either a file containing a series of SQL statements, or a Python +module.

+

Synapse remembers which delta files it has applied to a database (they are stored in the +applied_schema_deltas table) and will not re-apply them (even if a given file is +subsequently updated).

+

Delta files should be placed in a directory named synapse/storage/schema/<database>/delta/<version>/. +They are applied in alphanumeric order, so by convention the first two characters +of the filename should be an integer such as 01, to put the file in the right order.

+

SQL delta files

+

These should be named *.sql, or — for changes which should only be applied for a +given database engine — *.sql.posgres or *.sql.sqlite. For example, a delta which +adds a new column to the foo table might be called 01add_bar_to_foo.sql.

+

Note that our SQL parser is a bit simple - it understands comments (-- and /*...*/), +but complex statements which require a ; in the middle of them (such as CREATE TRIGGER) are beyond it and you'll have to use a Python delta file.

+

Python delta files

+

For more flexibility, a delta file can take the form of a python module. These should +be named *.py. Note that database-engine-specific modules are not supported here – +instead you can write if isinstance(database_engine, PostgresEngine) or similar.

+

A Python delta module should define either or both of the following functions:

+
import synapse.config.homeserver
+import synapse.storage.engines
+import synapse.storage.types
+
+
+def run_create(
+    cur: synapse.storage.types.Cursor,
+    database_engine: synapse.storage.engines.BaseDatabaseEngine,
+) -> None:
+    """Called whenever an existing or new database is to be upgraded"""
+    ...
+
+def run_upgrade(
+    cur: synapse.storage.types.Cursor,
+    database_engine: synapse.storage.engines.BaseDatabaseEngine,
+    config: synapse.config.homeserver.HomeServerConfig,
+) -> None:
+    """Called whenever an existing database is to be upgraded."""
+    ...
+
+

Boolean columns

+

Boolean columns require special treatment, since SQLite treats booleans the +same as integers.

+

Any new boolean column must be added to the BOOLEAN_COLUMNS list in +synapse/_scripts/synapse_port_db.py. This tells the port script to cast +the integer value from SQLite to a boolean before writing the value to the +postgres database.

+

event_id global uniqueness

+

event_id's can be considered globally unique although there has been a lot of +debate on this topic in places like +MSC2779 and +MSC2848 which +has no resolution yet (as of 2022-09-01). There are several places in Synapse +and even in the Matrix APIs like GET /_matrix/federation/v1/event/{eventId} +where we assume that event IDs are globally unique.

+

When scoping event_id in a database schema, it is often nice to accompany it +with room_id (PRIMARY KEY (room_id, event_id) and a FOREIGN KEY(room_id) REFERENCES rooms(room_id)) which makes flexible lookups easy. For example it +makes it very easy to find and clean up everything in a room when it needs to be +purged (no need to use sub-select query or join from the events table).

+

A note on collisions: In room versions 1 and 2 it's possible to end up with +two events with the same event_id (in the same or different rooms). After room +version 3, that can only happen with a hash collision, which we basically hope +will never happen (SHA256 has a massive big key space).

+ +
+ + +
+
+ + + +
+ + + + + + + + + + + + + \ No newline at end of file -- cgit 1.5.1