docs/state_delta_resolution.tex


1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276

\documentclass{article}
\usepackage[utf8]{inputenc}
\usepackage{amsfonts}
\usepackage{amsthm}
\usepackage{amsmath}
\usepackage{amssymb}
\usepackage{algorithmic}
\usepackage{algorithm}
\usepackage{appendix}
\usepackage{listings}

\widowpenalties 1 10000
\raggedbottom


\title{State Delta Resolution Algorithm}
\author{Erik Johnston}
\date{April 2018}

\newtheorem{theorem}{Theorem}
\newtheorem{lemma}[theorem]{Lemma}
\newtheorem{corollary}[theorem]{Corollary}

\setlength{\parskip}{1em}

\newcommand{\inlinemaketitle}{{\let\newpage\relax\maketitle}}


\begin{document}
	
\maketitle

\section{Definitions}
	
We first define some basic functions that correspond to the synapse code.

Let $K$ be the set of all type/state key tuples and $E$ the set of all events. We can then define arbitrary functions:
\begin{equation}
\begin{split}
f: F \rightarrow E\\
g: G \rightarrow E\\
\end{split}
\end{equation} which we call state maps, for $F, G \subset K$.

State maps have the additional properties that they are injective and that $\forall$ state maps $f, g$ then $f(x) = g(y) \Rightarrow x = y$ (i.e. a given event can only be mapped to from a single state key).

We can compute the set of all ``unconflicted state keys":
\begin{equation}
U_{f,g} = \{x\ |\ \forall x \in F \cap G,\ f(x) = g(x)\} \cup (F \bigtriangleup G)
\end{equation} i.e. the set of state keys where $f$ and $g$ don't conflict. Similarly, we define:
\begin{equation}
\begin{split}
u_{f,g} : U_{f,g} \longrightarrow &\ E\\
x \longmapsto &\ \begin{cases}
	f(x), & \text{if}\ x \in F \\
	g(x), & \text{otherwise}
\end{cases}
\end{split}
\end{equation} which gets the unconflicted event for a given state key.


We can also define a function on $C_{f,g} = (F \cup G) \setminus U_{f,g}$:
\begin{equation}
c_{f,g}: C_{f,g} \rightarrow E
\end{equation} which is used to resolve conflicts between $f$ and $g$. Note that $c_{f,g}(x)$ is either $f(x)$ or $g(x)$.

Now we define:
\begin{equation} 
\begin{split}
r_{f,g}: F \cup G \longrightarrow &\ E\\
x \longmapsto &\ \begin{cases}
u_{f,g}(x), & \text{if}\ x \in U_{f,g}\\
c_{f,g}(x), & \text{otherwise}
\end{cases} 
\end{split}
\end{equation} which we call the resolved state of $f$ and $g$. 

Note that this definition immediately implies that $\forall x \in U_{f,g}$ s.t. $g(x) = g'(x)$ then $r_{f,g}(x) = r_{f,g'}(x)$.

We define 
\begin{equation}
\alpha: E \rightarrow \mathbb{P}(K)
\end{equation} to be the mapping of an event to the type/state keys needed to auth the event, and
\begin{equation}
\alpha_{f,g}(x) = (\alpha f(x)) \cup (\alpha g(x))
\end{equation} which is the set of auth events required for $f(x)$ and $g(x)$. Note that $\alpha_{f,g}(x) \subset F \cup G$.

Further, we can define
\begin{equation}
\rho_{f,g}(x) = \bigcup_{n=0}^\infty (\alpha_{f,g})^n(x)
\end{equation} to be the auth chain of $f(x)$ and $g(x)$. This is well defined as there are a finite number of elements in $F \cup G$ and $\rho_{f,g} \rightarrow F \cup G$.

For examples of what $\alpha$ and $\rho$ look like, see the appendix.

\newpage
\section{Results}

If we consider the implementation of $c_{f,g}$ in Synapse we can see that it depends not only on $f$ and $g$, but also on the resolved state of their auth events, i.e. $r_{f,g}\alpha_{f,g}$.

We can formalise this idea by defining the notion of dependency relations, which we do in the appendix. Using that notation, we can say that:
\[
c_{f,g} \propto \{f,\ g,\ r_{f,g}\alpha_{f,g}\}
\]
which leads to the following results.:

\begin{lemma}
	 $r_{f,g} \propto \{f\rho_{f,g}, g\rho_{f,g}\}$
\end{lemma}

\begin{proof}
	
	First:
	\[
		\begin{split}
		r_{f,g}\alpha_{f,g}(x)\ =\ & (u_{f,g}\alpha_{f,g}(x)) \cup (c_{f,g}\alpha_{f,g}(x))\\
		\end{split}
	\]\footnotemark but by definition $u_{f,g} \alpha_{f,g}(x) \propto \alpha_{f,g}(x)$, therefore:
	\footnotetext{Note that e.g. $u_{f,g}\alpha_{f,g}(x)$ is shorthand for \{$u_{f,g}(y)\ |\ \forall y \in \alpha_{f,g}(x) \cap U_{f,g}\}$, i.e. we apply the function to the values in the given set that are also in the functions domain.}
	\[
		r_{f,g}\alpha_{f,g} \propto \{f\alpha_{f,g},\ g\alpha_{f,g},\ c_{f,g}\alpha_{f,g}\}
	\]
	so by induction, starting from $c_{f,g} \propto \{f,\ g,\ r_{f,g}\alpha_{f,g}\}$:
	\[
		r_{f,g} \propto \{ f(\alpha_{f,g})^i \ |\ \forall i \ge 0 \} \cup \{ g(\alpha_{f,g})^i \ |\ \forall i \ge 0 \}
	\]
    and the result follows since $f (\alpha_{f,g})^i \propto f \rho_{f,g}$
	
\end{proof}

By inspecting the actual implementation of $\alpha$ we can define $\rho_{f,g}^{-1}(x)$ to be a function which $\forall y \in \rho_{f,g}(x)$ then $x \in \rho_{f,g}^{-1}(y)$. We can similarly define $\alpha_{f,g}^{-1}(x)$. Note that $\forall x$, $x \in \rho_{f,g}^{-1}(x)$

We now consider $g': G' \rightarrow E$, where $g(x) = g'(x)$ except for $x \in G_\delta$, i.e. $g'$ is a state map based on $g$.

\begin{lemma}
	For $f, g, g'$ s.t. $\forall x \notin G_\delta,\ g(x) = g'(x)$, then $\forall x \notin \rho_{f,g'}^{-1}(G_\delta),\ r_{f,g}(x) = r_{f,g'}(x)$.
\end{lemma}

\begin{proof}
	Let $x$ be s.t. $r_{f,g}(x) \neq r_{f,g'}(x)$:
	\[
	\begin{split}
		&\Rightarrow f\rho_{f,g}(x) \neq f\rho_{f,g'}(x) \text{ or } g\rho_{f,g}(x) \neq g'\rho_{f,g'}(x)\\
		&\Rightarrow \exists y \in \rho_{f,g'}(x) \text{ s.t. } g(y) \neq g'(y)\\
		&\Rightarrow y \in G_\delta\\
		&\therefore \rho_{f,g'}^{-1}(y) \subseteq \rho_{f,g'}^{-1}(G_\delta)\\
		&\Rightarrow x \in \rho_{f,g'}^{-1}(G_\delta)
	\end{split}
	\]
\end{proof}

\begin{corollary}
	For $f, g, g'$ s.t. $\forall x \notin G_\delta$, $g(x) = g'(x)$, then $\forall x \notin C_{f,g} \cap \rho_{f,g'}^{-1}(G_\delta)$, $r_{f,g}(x) = r_{f,g'}(x)$
\end{corollary}

\begin{proof}
	This follows from the previous result and that if $x \in U_{f,g} \setminus G_\delta$ then $r_{f,g}(x) = r_{f,g'}(x)$.
\end{proof}

This allows us to reuse most of the results of $r_{f,g}$ when calculating $r_{f,g'}$ if $G_\delta$ is small. In particular we can calculate the delta between the two functions without having to inspect $U_{f,g}$, which dramatically cuts down the amount of data used to compute deltas of resolved state of large state maps.

However, we can do better than this. We can note that $r_{f,g}(x)$ only depends on $r_{f,g}\alpha_{f,g}(x)$ for values of $\alpha_{f,g}(x)$ not in $U_{f,g}$. Concretely, this means for example that if $G_\delta$ includes the membership of the sender of a power level event, but the power level event is in $U_{f,g}$, then we don't need to recalculate all conflicted events---despite the membership event being in every event's auth chain.

\begin{lemma}
	$\forall x$ s.t. $r_{f,g}(x) \neq r_{f,g'}(x)$ then $\exists y_1, ..., y_n$ s.t. $y_n \in G_\delta$, $y_i \notin U_{f,g}$ and $y_{i+1} \in \alpha_{f,g'}(y_i)$
\end{lemma}

\begin{proof}
	If $r_{f,g}(x) \neq r_{f,g'}(x)$ then $\exists y \in G_\delta$ s.t. $y \in \rho_{f,g'}(x)$. By definition of $\rho_{f,g'}(x)$, $\exists y_0, ..., y_n$ s.t. $y_0 = x$ and $y_{i+1} \in \alpha_{f,g'}(y_i)$. 
	
	We know that $r_{f,g'}(x)$ depends on either $u_{f,g'}(x)$ or $c_{f,g'}(x)$, but if $x \in U_{f,g'}$ then there is no dependency on $x$'s auth events and so $y_n = y_0 = x \in G_\delta$. Otherwise, we have $c_{f,g}(x) \neq c_{f,g'}(x)$, which depends on $f(x)$, $g'(x)$ or $r_{f,g'}(\alpha_{f,g'}(x))$. If $x \notin G_\delta$ then we know $f(x)$ and $g'(x)$ are the same, and so $r_{f,g}(\alpha_{f,g'}(x)) \neq r_{f,g'}(\alpha_{f,g'}(x)) \Rightarrow \exists y_1 \in \alpha_{f,g'}(x)$ s.t. $r_{f,g}(y_1) \neq r_{f,g'}(y_1)$.
	
	 Applying the above to $y_1$ then if $y_1 \in U_{f,g'} \Rightarrow y_1 = y_n \in G_\delta$. By induction $y_i \notin U_{f,g'}$ for $i < n$.
	 
	 (Note that we can assume $y_i \notin G_\delta$ as otherwise we would pick $n = i$,  and so if $y_i \notin U_{f,g} \Leftrightarrow y_i \notin U_{f,g'}$)
\end{proof}

\begin{minipage}{\textwidth}

We can use this approach and create an iterative algorithm for computing the set of state keys that need to be recalculated:

\begin{algorithm}[H]
\caption{Calculate state keys needing to be recalculated}
\begin{algorithmic}
	\STATE $to\_recalculate \leftarrow \text{empty set of state keys}$
	\STATE $pending \leftarrow G_\delta$
	\WHILE{$pending$ is not empty}
		\STATE $x \leftarrow \text{pop from}\ pending$
		\IF{$x \notin U_{f,g}$ and $x \notin to\_recalculate$}
			\STATE $\text{add all in}\ \alpha_{f,g'}^{-1}(x)\ \text{to}\ pending$
			\STATE $\text{add}\ x\ \text{to}\ to\_recalculate$
		\ENDIF
	\ENDWHILE
	\RETURN $to\_recalculate$
\end{algorithmic}
\end{algorithm}

\end{minipage}

It should be noted that this only gives the set of keys that need to be recalculated, and not the full set that would be needed to actually recalculate them. The full set needs to include the auth events for each key, i.e.:
\[
T \cup \bigcup_{x \in T} \alpha_{f,g}(x)
\]
where $T$ is the set $to\_reacalculate$.

\clearpage
\appendix
\appendixpage
\addappheadtotoc
\section{Dependency Relations}

If we have functions $a_f$, $b_f$, etc we can define a notion of dependency. We say that $a_f$ ``depends only on" $b_f$, which is written as $a_f \propto b_f$, if
\[
	\forall x, g \text{ s.t. } a_f(x) \neq a_g(x) \implies b_f(x) \neq b_g(x)
\]
This relation is transitive, i.e. if $a_f \propto b_f \propto c_f$ then $a_f \propto c_f$ as:
\[
\begin{split}
	\forall x, g \text{ s.t. } & a_f(x) \neq a_g(x)\\
	\implies & b_f(x) \neq b_g(x)\\
	\implies & c_f(x) \neq c_g(x)\\
\end{split}
\]

Most functions depend on more than one other function, so we introduce the notation $a_f \propto \{b_f, c_f\}$ ($a_f$ depends on the set of functions $\{b_f, c_f\}$) to mean
\[
\begin{split}
	\forall x, g \text{ s.t. }& a_f(x) \neq a_g(x)\\
	\implies \text{either }& b_f(x) \neq b_g(x),\\
	\text{or }& c_f(x) \neq c_g(x)\\
\end{split}
\]


\section{Auth Functions}

\begin{minipage}{\textwidth}

The following is the implementation of $\alpha$ in Synapse:

\begin{lstlisting}[language=Python, frame=tb, caption={Definition of $\alpha$},basicstyle=\small]
def auth_types_for_event(event):
    if event.type == EventTypes.Create:
        return []

    auth_types = []

    auth_types.append((EventTypes.PowerLevels, "", ))
    auth_types.append((EventTypes.Member, event.user_id, ))
    auth_types.append((EventTypes.Create, "", ))

    if event.type == EventTypes.Member:
        membership = event.content["membership"]
        if membership in [Membership.JOIN, Membership.INVITE]:
            auth_types.append((EventTypes.JoinRules, "", ))

        auth_types.append((EventTypes.Member, event.state_key, ))

        if membership == Membership.INVITE:
            if "third_party_invite" in event.content:
                key = (
                    EventTypes.ThirdPartyInvite,
                    event.content["third_party_invite"]["signed"]["token"]
                )
                auth_types.append(key)

    return auth_types
\end{lstlisting}
\end{minipage}

In particular the auth types are the same for all events except membership events. This means that $\alpha_{f,g}(x)$ is always the set $\{$ $\text{(``m.room.create", ``")},$ $\text{(``m.room.power\_levels", ``")}, $ $\text{(``m.room.member", sender)}$ $\}$ for non membership events.

If we have a room created by user $u_1$ (so the power levels, join rules etc. were all sent by them) and a state event sent by $u_2$, then the auth chain of that event (with state key $x$) is $\rho_{f,g}(x) = $ $\{ x,$ $\text{(``m.room.create", ``")},$ $\text{(``m.room.power\_levels", ``")}, $ $\text{(``m.room.join\_rules", ``")},$ $(\text{``m.room.member", } u_1), $ $(\text{``m.room.member", } u_2 )$ $ \}$.

This gives a valid, though imperfect, possible definition of $\rho_{f,g}^{-1}$ where $\rho_{f,g}^{-1}(x) = \{x\}$ for all state tuples that aren't create/power\_levels/membership/etc., and $\rho_{f,g}^{-1}(x) = F \cup G$ for those keys. This trivially satisfies the property that $\forall y \in \rho_{f,g}(x)$ then $x \in \rho_{f,g}^{-1}(y)$

\end{document}