From 66dc75ec0555669b38b5446df40ddadcd76ef70c Mon Sep 17 00:00:00 2001
From: erikjohnston
Synapse computes auth chain differences by pre-computing a "chain cover" index
-for the auth chain in a room, allowing efficient reachability queries like "is
-event A in the auth chain of event B". This is done by assigning every event a
-chain ID and sequence number (e.g. (5,3)
), and having a map of links
-between chains (e.g. (5,3) -> (2,4)
) such that A is reachable by B (i.e. A
-is in the auth chain of B
) if and only if either:
A
in the auth chain of event B
?". We could do this with an index
+that tracks all pairs (A, B)
such that A
is in the auth chain of B
. However, this
+would be prohibitively large, scaling poorly as the room accumulates more state
+events.
+Instead, we break down the graph into chains. A chain is a subset of a DAG
+with the following property: for any pair of events E
and F
in the chain,
+the chain contains a path E -> F
or a path F -> E
. This forces a chain to be
+linear (without forks), e.g. E -> F -> G -> ... -> H
. Each event in the chain
+is given a sequence number local to that chain. The oldest event E
in the
+chain has sequence number 1. If E
has a child F
in the chain, then F
has
+sequence number 2. If E
has a grandchild G
in the chain, then G
has
+sequence number 3; and so on.
Synapse ensures that each persisted event belongs to exactly one chain, and +tracks how the chains are connected to one another. This allows us to +efficiently answer reachability queries. Doing so uses less storage than +tracking reachability on an event-by-event basis, particularly when we have +fewer and longer chains. See
+++Jagadish, H. (1990). A compression technique to materialize transitive closure. +ACM Transactions on Database Systems (TODS), 15*(4)*, 558-598.
+
for the original idea or
+++Y. Chen, Y. Chen, An efficient algorithm for answering graph +reachability queries, +in: 2008 IEEE 24th International Conference on Data Engineering, April 2008, +pp. 893–902. (PDF available via Google Scholar.)
+
for a more modern take.
+In practical terms, the chain cover assigns every event a
+chain ID and sequence number (e.g. (5,3)
), and maintains a map of links
+between events in chains (e.g. (5,3) -> (2,4)
) such that A
is reachable by B
+(i.e. A
is in the auth chain of B
) if and only if either:
A
's sequence number is less than B
's
+A
and B
have the same chain ID and A
's sequence number is less than B
's
sequence number; orL
between B
's chain ID and A
's chain ID such that
L.start_seq_no
<= B.seq_no
and A.seq_no
<= L.end_seq_no
.C3 -> C2 -> C1
then the link C3 -> C1
-would not be stored. Synapse uses the former implementations so that it doesn't
-need to recurse to test reachability between chains.
+would not be stored. Synapse uses the former implementation so that it doesn't
+need to recurse to test reachability between chains. This trades-off extra storage
+in order to save CPU cycles and DB queries.
An example auth graph would look like the following, where chains have been formed based on type/state_key and are denoted by colour and are labelled with -- cgit 1.5.1