author Eric Eastwood <erice@element.io> 2022-09-27 14:43:16 -0500
committer GitHub <noreply@github.com> 2022-09-27 19:43:16 +0000
commit 35e9d6a6161c05db491a36e8c194f9cb6b0af073
tree 8d10b296489e759ad30b8d3c87ea039056582703
parent Add new columns tracking when we partial-joined (#13892)
download synapse-35e9d6a6161c05db491a36e8c194f9cb6b0af073.tar.xz
Emphasize the right reasons to use `(room_id, event_id)` in a schema (#13915)
* Emphasize the right reasons to use (room_id, event_id)

Follow-up to:
 - https://github.com/matrix-org/synapse/pull/13701
 - https://github.com/matrix-org/synapse/pull/13771
-rw-r--r-- changelog.d/13915.doc 1
-rw-r--r-- docs/development/database_schema.md 31
2 files changed, 17 insertions, 15 deletions
diff --git a/changelog.d/13915.doc b/changelog.d/13915.doc
new file mode 100644
index 0000000000..828cc30536
--- /dev/null
+++ b/changelog.d/13915.doc
@@ -0,0 +1 @@
+Emphasize the right reasons when to use `(room_id, event_id)` in a database schema.
diff --git a/docs/development/database_schema.md b/docs/development/database_schema.md
index e9b925ddd8..29945c264e 100644
--- a/docs/development/database_schema.md
+++ b/docs/development/database_schema.md
@@ -195,23 +195,24 @@ There are three separate aspects to this:
 
 ## `event_id` global uniqueness
 
-In room versions `1` and `2` it's possible to end up with two events with the
-same `event_id` (in the same or different rooms). After room version `3`, that
-can only happen with a hash collision, which we basically hope will never
-happen.
-
-There are several places in Synapse and even Matrix APIs like [`GET
+`event_id`'s can be considered globally unique although there has been a lot of
+debate on this topic in places like
+[MSC2779](https://github.com/matrix-org/matrix-spec-proposals/issues/2779) and
+[MSC2848](https://github.com/matrix-org/matrix-spec-proposals/pull/2848) which
+has no resolution yet (as of 2022-09-01). There are several places in Synapse
+and even in the Matrix APIs like [`GET
 /_matrix/federation/v1/event/{eventId}`](https://spec.matrix.org/v1.1/server-server-api/#get_matrixfederationv1eventeventid)
 where we assume that event IDs are globally unique.
 
-But hash collisions are still possible, and by treating event IDs as room
-scoped, we can reduce the possibility of a hash collision. When scoping
-`event_id` in the database schema, it should be also accompanied by `room_id`
-(`PRIMARY KEY (room_id, event_id)`) and lookups should be done through the pair
-`(room_id, event_id)`.
+When scoping `event_id` in a database schema, it is often nice to accompany it
+with `room_id` (`PRIMARY KEY (room_id, event_id)` and a `FOREIGN KEY(room_id)
+REFERENCES rooms(room_id)`) which makes flexible lookups easy. For example it
+makes it very easy to find and clean up everything in a room when it needs to be
+purged (no need to use sub-`select` query or join from the `events` table).
+
+A note on collisions: In room versions `1` and `2` it's possible to end up with
+two events with the same `event_id` (in the same or different rooms). After room
+version `3`, that can only happen with a hash collision, which we basically hope
+will never happen (SHA256 has a massive big key space).
 
-There has been a lot of debate on this in places like
-https://github.com/matrix-org/matrix-spec-proposals/issues/2779 and
-[MSC2848](https://github.com/matrix-org/matrix-spec-proposals/pull/2848) which
-has no resolution yet (as of 2022-09-01).
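
The purge argument in the new text can be illustrated with a minimal sketch. It is not code from Synapse: the `event_annotations` table, the helper names `build_demo_db` and `purge_room`, and the sample IDs are all hypothetical, and SQLite is used only so the example is self-contained. It assumes a table keyed on `(room_id, event_id)` with a foreign key to `rooms`, as the doc change describes.

```python
import sqlite3


def build_demo_db() -> sqlite3.Connection:
    """Create an in-memory database with a hypothetical room-scoped table."""
    conn = sqlite3.connect(":memory:")
    # SQLite only enforces foreign keys when this pragma is enabled.
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(
        """
        CREATE TABLE rooms (room_id TEXT PRIMARY KEY);

        -- Hypothetical table: event_id is accompanied by room_id.
        CREATE TABLE event_annotations (
            room_id TEXT NOT NULL,
            event_id TEXT NOT NULL,
            note TEXT,
            PRIMARY KEY (room_id, event_id),
            FOREIGN KEY (room_id) REFERENCES rooms(room_id)
        );
        """
    )
    conn.executemany(
        "INSERT INTO rooms (room_id) VALUES (?)",
        [("!a:example.org",), ("!b:example.org",)],
    )
    conn.executemany(
        "INSERT INTO event_annotations (room_id, event_id, note) VALUES (?, ?, ?)",
        [
            ("!a:example.org", "$e1", "first"),
            ("!a:example.org", "$e2", "second"),
            ("!b:example.org", "$e3", "third"),
        ],
    )
    conn.commit()
    return conn


def purge_room(conn: sqlite3.Connection, room_id: str) -> int:
    """Delete every row for a room directly by room_id.

    No sub-select or join against an `events` table is needed, because
    the room is part of the table's own key. Returns the deleted row count.
    """
    cur = conn.execute(
        "DELETE FROM event_annotations WHERE room_id = ?", (room_id,)
    )
    conn.commit()
    return cur.rowcount
```

With only `event_id` stored in the table, the same purge would first have to resolve which events belong to the room, typically through a sub-select or join on `events`. Carrying `room_id` in the key avoids that lookup.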