From 2044f365c8845b369ae26b880d4c1e5e862c0ee5 Mon Sep 17 00:00:00 2001 From: erikjohnston Date: Tue, 28 Jun 2022 12:15:27 +0000 Subject: deploy: 7469824d5838577f5a07aec6ab73b457459d8b4a --- v1.62/tcp_replication.html | 384 +++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 384 insertions(+) create mode 100644 v1.62/tcp_replication.html (limited to 'v1.62/tcp_replication.html') diff --git a/v1.62/tcp_replication.html b/v1.62/tcp_replication.html new file mode 100644 index 0000000000..d1b33493b7 --- /dev/null +++ b/v1.62/tcp_replication.html @@ -0,0 +1,384 @@ + + + + + + TCP Replication - Synapse + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
+ +
+ + + + + + + +
+
+ +
+ +
+ +

TCP Replication

+

Motivation

+

Previously the workers used an HTTP long poll mechanism to get updates +from the master, which had the problem of causing a lot of duplicate +work on the server. This TCP protocol replaces those APIs with the aim +of increased efficiency.

+

Overview

+

The protocol is based on fire and forget, line based commands. An +example flow would be (where '>' indicates master to worker and +'<' worker to master flows):

+
> SERVER example.com
+< REPLICATE
+> POSITION events master 53 53
+> RDATA events master 54 ["$foo1:bar.com", ...]
+> RDATA events master 55 ["$foo4:bar.com", ...]
+
+

The example shows the server accepting a new connection and sending its identity +with the SERVER command, followed by the client server to respond with the +position of all streams. The server then periodically sends RDATA commands +which have the format RDATA <stream_name> <instance_name> <token> <row>, where +the format of <row> is defined by the individual streams. The +<instance_name> is the name of the Synapse process that generated the data +(usually "master").

+

Error reporting happens by either the client or server sending an ERROR +command, and usually the connection will be closed.

+

Since the protocol is a simple line based, its possible to manually +connect to the server using a tool like netcat. A few things should be +noted when manually using the protocol:

+
    +
  • The federation stream is only available if federation sending has +been disabled on the main process.
  • +
  • The server will only time connections out that have sent a PING +command. If a ping is sent then the connection will be closed if no +further commands are receieved within 15s. Both the client and +server protocol implementations will send an initial PING on +connection and ensure at least one command every 5s is sent (not +necessarily PING).
  • +
  • RDATA commands usually include a numeric token, however if the +stream has multiple rows to replicate per token the server will send +multiple RDATA commands, with all but the last having a token of +batch. See the documentation on commands.RdataCommand for +further details.
  • +
+

Architecture

+

The basic structure of the protocol is line based, where the initial +word of each line specifies the command. The rest of the line is parsed +based on the command. For example, the RDATA command is defined as:

+
RDATA <stream_name> <instance_name> <token> <row_json>
+
+

(Note that <row_json> may contains spaces, but cannot contain +newlines.)

+

Blank lines are ignored.

+

Keep alives

+

Both sides are expected to send at least one command every 5s or so, and +should send a PING command if necessary. If either side do not receive +a command within e.g. 15s then the connection should be closed.

+

Because the server may be connected to manually using e.g. netcat, the +timeouts aren't enabled until an initial PING command is seen. Both +the client and server implementations below send a PING command +immediately on connection to ensure the timeouts are enabled.

+

This ensures that both sides can quickly realize if the tcp connection +has gone and handle the situation appropriately.

+

Start up

+

When a new connection is made, the server:

+
    +
  • Sends a SERVER command, which includes the identity of the server, +allowing the client to detect if its connected to the expected +server
  • +
  • Sends a PING command as above, to enable the client to time out +connections promptly.
  • +
+

The client:

+
    +
  • Sends a NAME command, allowing the server to associate a human +friendly name with the connection. This is optional.
  • +
  • Sends a PING as above
  • +
  • Sends a REPLICATE to get the current position of all streams.
  • +
  • On receipt of a SERVER command, checks that the server name +matches the expected server name.
  • +
+

Error handling

+

If either side detects an error it can send an ERROR command and close +the connection.

+

If the client side loses the connection to the server it should +reconnect, following the steps above.

+

Congestion

+

If the server sends messages faster than the client can consume them the +server will first buffer a (fairly large) number of commands and then +disconnect the client. This ensures that we don't queue up an unbounded +number of commands in memory and gives us a potential oppurtunity to +squawk loudly. When/if the client recovers it can reconnect to the +server and ask for missed messages.

+

Reliability

+

In general the replication stream should be considered an unreliable +transport since e.g. commands are not resent if the connection +disappears.

+

The exception to that are the replication streams, i.e. RDATA commands, +since these include tokens which can be used to restart the stream on +connection errors.

+

The client should keep track of the token in the last RDATA command +received for each stream so that on reconneciton it can start streaming +from the correct place. Note: not all RDATA have valid tokens due to +batching. See RdataCommand for more details.

+

Example

+

An example iteraction is shown below. Each line is prefixed with '>' +or '<' to indicate which side is sending, these are not included on +the wire:

+
* connection established *
+> SERVER localhost:8823
+> PING 1490197665618
+< NAME synapse.app.appservice
+< PING 1490197665618
+< REPLICATE
+> POSITION events master 1 1
+> POSITION backfill master 1 1
+> POSITION caches master 1 1
+> RDATA caches master 2 ["get_user_by_id",["@01register-user:localhost:8823"],1490197670513]
+> RDATA events master 14 ["$149019767112vOHxz:localhost:8823",
+    "!AFDCvgApUmpdfVjIXm:localhost:8823","m.room.guest_access","",null]
+< PING 1490197675618
+> ERROR server stopping
+* connection closed by server *
+
+

The POSITION command sent by the server is used to set the clients +position without needing to send data with the RDATA command.

+

An example of a batched set of RDATA is:

+
> RDATA caches master batch ["get_user_by_id",["@test:localhost:8823"],1490197670513]
+> RDATA caches master batch ["get_user_by_id",["@test2:localhost:8823"],1490197670513]
+> RDATA caches master batch ["get_user_by_id",["@test3:localhost:8823"],1490197670513]
+> RDATA caches master 54 ["get_user_by_id",["@test4:localhost:8823"],1490197670513]
+
+

In this case the client shouldn't advance their caches token until it +sees the the last RDATA.

+

List of commands

+

The list of valid commands, with which side can send it: server (S) or +client (C):

+

SERVER (S)

+

Sent at the start to identify which server the client is talking to

+

RDATA (S)

+

A single update in a stream

+

POSITION (S)

+

On receipt of a POSITION command clients should check if they have missed any +updates, and if so then fetch them out of band. Sent in response to a +REPLICATE command (but can happen at any time).

+

The POSITION command includes the source of the stream. Currently all streams +are written by a single process (usually "master"). If fetching missing +updates via HTTP API, rather than via the DB, then processes should make the +request to the appropriate process.

+

Two positions are included, the "new" position and the last position sent respectively. +This allows servers to tell instances that the positions have advanced but no +data has been written, without clients needlessly checking to see if they +have missed any updates.

+

ERROR (S, C)

+

There was an error

+

PING (S, C)

+

Sent periodically to ensure the connection is still alive

+

NAME (C)

+

Sent at the start by client to inform the server who they are

+

REPLICATE (C)

+

Asks the server for the current position of all streams.

+

USER_SYNC (C)

+

A user has started or stopped syncing on this process.

+

CLEAR_USER_SYNC (C)

+

The server should clear all associated user sync data from the worker.

+

This is used when a worker is shutting down.

+

FEDERATION_ACK (C)

+

Acknowledge receipt of some federation data

+

REMOTE_SERVER_UP (S, C)

+

Inform other processes that a remote server may have come back online.

+

See synapse/replication/tcp/commands.py for a detailed description and +the format of each command.

+

Cache Invalidation Stream

+

The cache invalidation stream is used to inform workers when they need +to invalidate any of their caches in the data store. This is done by +streaming all cache invalidations done on master down to the workers, +assuming that any caches on the workers also exist on the master.

+

Each individual cache invalidation results in a row being sent down +replication, which includes the cache name (the name of the function) +and they key to invalidate. For example:

+
> RDATA caches master 550953771 ["get_user_by_id", ["@bob:example.com"], 1550574873251]
+
+

Alternatively, an entire cache can be invalidated by sending down a null +instead of the key. For example:

+
> RDATA caches master 550953772 ["get_user_by_id", null, 1550574873252]
+
+

However, there are times when a number of caches need to be invalidated +at the same time with the same key. To reduce traffic we batch those +invalidations into a single poke by defining a special cache name that +workers understand to mean to expand to invalidate the correct caches.

+

Currently the special cache names are declared in +synapse/storage/_base.py and are:

+
    +
  1. cs_cache_fake ─ invalidates caches that depend on the current +state
  2. +
+ +
+ + +
+
+ + + +
+ + + + + + + + + + + + + \ No newline at end of file -- cgit 1.4.1