# How do faster joins work?
This is a work-in-progress set of notes with two goals:
- act as a reference, explaining how Synapse implements faster joins; and
- record the rationale behind our choices.
See also [MSC3902](https://github.com/matrix-org/matrix-spec-proposals/pull/3902).
The key idea is described by [MSC3706](https://github.com/matrix-org/matrix-spec-proposals/pull/3706). This allows servers to
request a lightweight response to the federation `/send_join` endpoint.
This is called a **faster join**, also known as a **partial join**. In these
notes we'll usually use the word "partial" as it matches the database schema.
## Overview: processing events in a partially-joined room
The response to a partial join consists of
- the requested join event `J`,
- a list of the servers in the room (according to the state before `J`),
- a subset of the state of the room before `J`,
- the full auth chain of that state subset.
Synapse marks the room as partially joined by adding a row to the database table
`partial_state_rooms`. It also marks the join event `J` as "partially stated",
meaning that we have neither received nor computed the full state before/after
`J`. This is done by adding a row to `partial_state_events`.
DB schema
```
matrix=> \d partial_state_events
Table "matrix.partial_state_events"
  Column  │ Type │ Collation │ Nullable │ Default
══════════╪══════╪═══════════╪══════════╪═════════
 room_id  │ text │           │ not null │
 event_id │ text │           │ not null │
 
matrix=> \d partial_state_rooms
                Table "matrix.partial_state_rooms"
         Column         │  Type  │ Collation │ Nullable │ Default 
════════════════════════╪════════╪═══════════╪══════════╪═════════
 room_id                │ text   │           │ not null │ 
 device_lists_stream_id │ bigint │           │ not null │ 0
 join_event_id          │ text   │           │          │ 
 joined_via             │ text   │           │          │ 
matrix=> \d partial_state_rooms_servers
     Table "matrix.partial_state_rooms_servers"
   Column    │ Type │ Collation │ Nullable │ Default 
═════════════╪══════╪═══════════╪══════════╪═════════
 room_id     │ text │           │ not null │ 
 server_name │ text │           │ not null │ 
```
Indices, foreign-keys and check constraints are omitted for brevity.
`
disclosures because they have yet to be implemented in mainline Synapse.
### Creating events during a partial join
When sending out messages during a partial join, we assume our partial state is 
accurate and proceed as normal. For this to have any hope of succeeding at all,
our partial state must contain an entry for each of the (type, state key) pairs
[specified by the auth rules](https://spec.matrix.org/v1.3/rooms/v10/#authorization-rules):
- `m.room.create`
- `m.room.join_rules`
- `m.room.power_levels`
- `m.room.third_party_invite`
- `m.room.member`
The first four of these should be present in the state before `J` that is given
to us in the partial join response; only membership events are omitted. In order
for us to consider the user joined, we must have their membership event. That
means the only possible omission is the target's membership in an invite, kick
or ban.
The worst possibility is that we locally invite someone who is banned according to
the full state, because we lack their ban in our current partial state. The rest 
of the federation---at least, those who are fully joined---should correctly 
enforce the [membership transition constraints](
    https://spec.matrix.org/v1.3/client-server-api/#room-membership
). So any the erroneous invite should be ignored by fully-joined
homeservers and resolved by the resync for partially-joined homeservers.
In more generality, there are two problems we're worrying about here:
- We might create an event that is valid under our partial state, only to later
  find out that is actually invalid according to the full state.
- Or: we might refuse to create an event that is invalid under our partial
  state, even though it would be perfectly valid under the full state.
However we expect such problems to be unlikely in practise, because
- We trust that the room has sensible power levels, e.g. that bad actors with
  high power levels are demoted before their ban.
- We trust that the resident server provides us up-to-date power levels, join
  rules, etc.
- State changes in rooms are relatively infrequent, and the resync period is
  relatively quick.
#### Sending out the event over federation
**TODO:** needs prose fleshing out.
Normally: send out in a fed txn to all HSes in the room.
We only know that some HSes were in the room at some point. Wat do.
Send it out to the list of servers from the first join.
**TODO** what do we do here if we have full state?
If the prev event was created by us, we can risk sending it to the wrong HS. (Motivation: privacy concern of the content. Not such a big deal for a public room or an encrypted room. But non-encrypted invite-only...)
But don't want to send out sensitive data in other HS's events in this way.
Suppose we discover after resync that we shouldn't have sent out one our events (not a prev_event) to a target HS. Not much we can do.
What about if we didn't send them an event but shouldn't've?
E.g. what if someone joined from a new HS shortly after you did? We wouldn't talk to them.
Could imagine sending out the "Missed" events after the resync but... painful to work out what they should have seen if they joined/left.
Instead, just send them the latest event (if they're still in the room after resync) and let them backfill.(?)
- Don't do this currently.
- If anyone who has received our messages sends a message to a HS we missed, they can backfill our messages
- Gap: rooms which are infrequently used and take a long time to resync.
### Joining after a partial join
**NB.** Not yet implemented.
**TODO:** needs prose fleshing out. Liase with Matthieu. Explain why /send_join
(Rich was surprised we didn't just create it locally. Answer: to try and avoid
a join which then gets rejected after resync.)
We don't know for sure that any join we create would be accepted.
E.g. the joined user might have been banned; the join rules might have changed in a way that we didn't realise... some way in which the partial state was mistaken.
Instead, do another partial make-join/send-join handshake to confirm that the join works.
- Probably going to get a bunch of duplicate state events and auth events.... but the point of partial joins is that these should be small. Many are already persisted = good.
- What if the second send_join response includes a different list of reisdent HSes? Could ignore it.
  - Could even have a special flag that says "just make me a join", i.e. don't bother giving me state or servers in room. Deffo want the auth chain tho.
- SQ: wrt device lists it's a lot safer to ignore it!!!!!
- What if the state at the second join is inconsistent with what we have? Ignore it?
 
### Leaving (and kicks and bans) after a partial join
**NB.** Not yet implemented.
When you're fully joined to a room, to have `U` leave a room their homeserver
needs to
- create a new leave event for `U` which will be accepted by other homeservers,
  and
- send that event `U` out to the homeservers in the federation.
When is a leave event accepted? See
[v10 auth rules](https://spec.matrix.org/v1.5/rooms/v10/#authorization-rules):
> 4. If type is m.room.member: [...]
     >
     >    5. If membership is leave:
             >
             >       1. If the sender matches state_key, allow if and only if that user’s current membership state is invite, join, or knock.
>       2. [...]
I think this means that (well-formed!) self-leaves are governed entirely by
4.5.1. This means that if we correctly calculate state which says that `U` is
invited, joined or knocked and include it in the leave's auth events, our event
is accepted by checks 4 and 5 on incoming events.
> 4. Passes authorization rules based on the event’s auth events, otherwise
     >    it is rejected.
> 5. Passes authorization rules based on the state before the event, otherwise
     >    it is rejected.
The only way to fail check 6 is if the receiving server's current state of the
room says that `U` is banned, has left, or has no membership event. But this is
fine: the receiving server already thinks that `U` isn't in the room.
> 6. Passes authorization rules based on the current state of the room,
     >    otherwise it is “soft failed”.
For the second point (publishing the leave event), the best thing we can do is
to is publish to all HSes we know to be currently in the room. If they miss that
event, they might send us traffic in the room that we don't care about. This is
a problem with leaving after a "full" join; we don't seek to fix this with
partial joins.
(With that said: there's nothing machine-readable in the /send response. I don't
think we can deduce "destination has left the room" from a failure to /send an
event into that room?)
#### Can we still do this during a partial join?
We can create leave events and can choose what gets included in our auth events,
so we can be sure that we pass check 4 on incoming events. For check 5, we might
have an incorrect view of the state before an event.
The only way we might erroneously think a leave is valid is if
- the partial state before the leave has `U` joined, invited or knocked, but
- the full state before the leave has `U` banned, left or not present,
in which case the leave doesn't make anything worse: other HSes already consider
us as not in the room, and will continue to do so after seeing the leave.
The remaining obstacle is then: can we safely broadcast the leave event? We may
miss servers or incorrectly think that a server is in the room. Or the
destination server may be offline and miss the transaction containing our leave
event.This should self-heal when they see an event whose `prev_events` descends
from our leave.
Another option we considered was to use federation `/send_leave` to ask a
fully-joined server to send out the event on our behalf. But that introduces
complexity without much benefit. Besides, as Rich put it,
> sending out leaves is pretty best-effort currently
so this is probably good enough as-is.
#### Cleanup after the last leave
**TODO**: what cleanup is necessary? Is it all just nice-to-have to save unused
work?