From 74506934356dcb10b1704e3e66d4648e99ba6308 Mon Sep 17 00:00:00 2001 From: Erik Johnston Date: Mon, 27 Mar 2017 15:40:37 +0100 Subject: Initial TCP protocol implementation This defines the low level TCP replication protocol --- docs/tcp_replication.rst | 174 +++++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 174 insertions(+) create mode 100644 docs/tcp_replication.rst (limited to 'docs/tcp_replication.rst') diff --git a/docs/tcp_replication.rst b/docs/tcp_replication.rst new file mode 100644 index 0000000000..946add2849 --- /dev/null +++ b/docs/tcp_replication.rst @@ -0,0 +1,174 @@ +TCP Replication +=============== + +This describes the TCP replication protocol that replaces the HTTP protocol. + +Motivation +---------- + +The HTTP API used long poll from the workers to the master, this has the problem +of causing a lot of duplicate work on the server. This TCP protocol aims to +solve. + +Overview +-------- + +The protocol is based on fire and forget, line based commands. An example flow +would be (where '>' indicates master->worker and '<' worker->master flows):: + + > SERVER example.com + < REPLICATE events 53 + > RDATA events 54 ["$foo1:bar.com", ...] + > RDATA events 55 ["$foo4:bar.com", ...] + +The example shows the server accepting a new connection and sending its identity +with the ``SERVER`` command, followed by the client asking to subscribe to the +``events`` stream from the token ``53``. The server then periodically sends ``RDATA`` +commands which have the format ``RDATA ```, where the +format of ```` is defined by the individual streams. + +Error reporting happens by either the client or server sending an `ERROR` +command, and usually the connection will be closed. + + +Since the protocol is a simple line based, its possible to manually connect to +the server using a tool like netcat. A few things should be noted when manually +using the protocol: + * When subscribing to a stream using ``REPLICATE``, the special token ``NOW`` can + be used to get all future updates. The special stream name ``ALL`` can be used + with ``NOW`` to subscribe to all available streams. + * The federation stream is only available if federation sending has been + disabled on the main process. + * The server will only time connections out that have sent a ``PING`` command. + If a ping is sent then the connection will be closed if no further commands + are receieved within 15s. Both the client and server protocol implementations + will send an initial PING on connection and ensure at least one command every + 5s is sent (not necessarily ``PING``). + * ``RDATA`` commands *usually* include a numeric token, however if the stream + has multiple rows to replicate per token the server will send multiple + ``RDATA`` commands, with all but the last having a token of ``batch``. See + the documentation on ``commands.RdataCommand`` for further details. + + +Architecture +------------ + +The basic structure of the protocol is line based, where the initial word of +each line specifies the command. The rest of the line is parsed based on the +command. For example, the `RDATA` command is defined as:: + + RDATA + +(Note that `` may contains spaces, but cannot contain newlines.) + +Blank lines are ignored. + + +Keep alives +~~~~~~~~~~~ + +Both sides are expected to send at least one command every 5s or so, and +should send a ``PING`` command if necessary. If either side do not receive a +command within e.g. 15s then the connection should be closed. + +Because the server may be connected to manually using e.g. netcat, the timeouts +aren't enabled until an initial ``PING`` command is seen. Both the client and +server implementations below send a ``PING`` command immediately on connection to +ensure the timeouts are enabled. + +This ensures that both sides can quickly realize if the tcp connection has gone +and handle the situation appropriately. + + +Start up +~~~~~~~~ + +When a new connection is made, the server: + * Sends a ``SERVER`` command, which includes the identity of the server, allowing + the client to detect if its connected to the expected server + * Sends a ``PING`` command as above, to enable the client to time out connections + promptly. + +The client: + * Sends a ``NAME`` command, allowing the server to associate a human friendly + name with the connection. This is optional. + * Sends a ``PING`` as above + * For each stream the client wishes to subscribe to it sends a ``REPLICATE`` + with the stream_name and token it wants to subscribe from. + * On receipt of a ``SERVER`` command, checks that the server name matches the + expected server name. + + +Error handling +~~~~~~~~~~~~~~ + +If either side detects an error it can send an ``ERROR`` command and close the +connection. + +If the client side loses the connection to the server it should reconnect, +following the steps above. + + +Congestion +~~~~~~~~~~ + +If the server sends messages faster than the client can consume them the server +will first buffer a (fairly large) number of commands and then disconnect the +client. This ensures that we don't queue up an unbounded number of commands in +memory and gives us a potential oppurtunity to squawk loudly. When/if the client +recovers it can reconnect to the server and ask for missed messages. + + +Reliability +~~~~~~~~~~~ + +In general the replication stream should be consisdered an unreliable transport +since e.g. commands are not resent if the connection disappears. + +The exception to that are the replication streams, i.e. RDATA commands, since +these include tokens which can be used to restart the stream on connection +errors. + +The client should keep track of the token in the last RDATA command received +for each stream so that on reconneciton it can start streaming from the correct +place. Note: not all RDATA have valid tokens due to batching. See +``RdataCommand`` for more details. + + +Example +~~~~~~~ + +An example iteraction is shown below. Each line is prefixed with '>' or '<' to +indicate which side is sending, these are *not* included on the wire:: + + * connection established * + > SERVER localhost:8823 + > PING 1490197665618 + < NAME synapse.app.appservice + < PING 1490197665618 + < REPLICATE events 1 + < REPLICATE backfill 1 + < REPLICATE caches 1 + > POSITION events 1 + > POSITION backfill 1 + > POSITION caches 1 + > RDATA caches 2 ["get_user_by_id",["@01register-user:localhost:8823"],1490197670513] + > RDATA events 14 ["$149019767112vOHxz:localhost:8823", + "!AFDCvgApUmpdfVjIXm:localhost:8823","m.room.guest_access","",null] + < PING 1490197675618 + > ERROR server stopping + * connection closed by server * + +The ``POSITION`` command sent by the server is used to set the clients position +without needing to send data with the ``RDATA`` command. + + +An example of a batched set of ``RDATA`` is:: + + > RDATA caches batch ["get_user_by_id",["@test:localhost:8823"],1490197670513] + > RDATA caches batch ["get_user_by_id",["@test2:localhost:8823"],1490197670513] + > RDATA caches batch ["get_user_by_id",["@test3:localhost:8823"],1490197670513] + > RDATA caches 54 ["get_user_by_id",["@test4:localhost:8823"],1490197670513] + +In this case the client shouldn't advance their caches token until it sees the +the last ``RDATA``. -- cgit 1.4.1 From 31e0fe90315c46d62163cd5cb78cf9302e9dd0c9 Mon Sep 17 00:00:00 2001 From: Erik Johnston Date: Thu, 30 Mar 2017 13:54:15 +0100 Subject: Fix indentation in docs/ --- docs/tcp_replication.rst | 53 +++++++++++++++++++++++++----------------------- 1 file changed, 28 insertions(+), 25 deletions(-) (limited to 'docs/tcp_replication.rst') diff --git a/docs/tcp_replication.rst b/docs/tcp_replication.rst index 946add2849..be0aa6b28c 100644 --- a/docs/tcp_replication.rst +++ b/docs/tcp_replication.rst @@ -34,20 +34,21 @@ command, and usually the connection will be closed. Since the protocol is a simple line based, its possible to manually connect to the server using a tool like netcat. A few things should be noted when manually using the protocol: - * When subscribing to a stream using ``REPLICATE``, the special token ``NOW`` can - be used to get all future updates. The special stream name ``ALL`` can be used - with ``NOW`` to subscribe to all available streams. - * The federation stream is only available if federation sending has been - disabled on the main process. - * The server will only time connections out that have sent a ``PING`` command. - If a ping is sent then the connection will be closed if no further commands - are receieved within 15s. Both the client and server protocol implementations - will send an initial PING on connection and ensure at least one command every - 5s is sent (not necessarily ``PING``). - * ``RDATA`` commands *usually* include a numeric token, however if the stream - has multiple rows to replicate per token the server will send multiple - ``RDATA`` commands, with all but the last having a token of ``batch``. See - the documentation on ``commands.RdataCommand`` for further details. + +* When subscribing to a stream using ``REPLICATE``, the special token ``NOW`` can + be used to get all future updates. The special stream name ``ALL`` can be used + with ``NOW`` to subscribe to all available streams. +* The federation stream is only available if federation sending has been + disabled on the main process. +* The server will only time connections out that have sent a ``PING`` command. + If a ping is sent then the connection will be closed if no further commands + are receieved within 15s. Both the client and server protocol implementations + will send an initial PING on connection and ensure at least one command every + 5s is sent (not necessarily ``PING``). +* ``RDATA`` commands *usually* include a numeric token, however if the stream + has multiple rows to replicate per token the server will send multiple + ``RDATA`` commands, with all but the last having a token of ``batch``. See + the documentation on ``commands.RdataCommand`` for further details. Architecture @@ -84,19 +85,21 @@ Start up ~~~~~~~~ When a new connection is made, the server: - * Sends a ``SERVER`` command, which includes the identity of the server, allowing - the client to detect if its connected to the expected server - * Sends a ``PING`` command as above, to enable the client to time out connections - promptly. + +* Sends a ``SERVER`` command, which includes the identity of the server, allowing + the client to detect if its connected to the expected server +* Sends a ``PING`` command as above, to enable the client to time out connections + promptly. The client: - * Sends a ``NAME`` command, allowing the server to associate a human friendly - name with the connection. This is optional. - * Sends a ``PING`` as above - * For each stream the client wishes to subscribe to it sends a ``REPLICATE`` - with the stream_name and token it wants to subscribe from. - * On receipt of a ``SERVER`` command, checks that the server name matches the - expected server name. + +* Sends a ``NAME`` command, allowing the server to associate a human friendly + name with the connection. This is optional. +* Sends a ``PING`` as above +* For each stream the client wishes to subscribe to it sends a ``REPLICATE`` + with the stream_name and token it wants to subscribe from. +* On receipt of a ``SERVER`` command, checks that the server name matches the + expected server name. Error handling -- cgit 1.4.1 From bfcf016714575edb0ad2c19b2f00694d62ca08ec Mon Sep 17 00:00:00 2001 From: Erik Johnston Date: Fri, 31 Mar 2017 11:19:24 +0100 Subject: Fix up docs --- docs/tcp_replication.rst | 16 ++++++++-------- synapse/replication/tcp/__init__.py | 18 +----------------- synapse/replication/tcp/streams.py | 4 ++-- synapse/storage/pusher.py | 2 +- 4 files changed, 12 insertions(+), 28 deletions(-) (limited to 'docs/tcp_replication.rst') diff --git a/docs/tcp_replication.rst b/docs/tcp_replication.rst index be0aa6b28c..7393527f6f 100644 --- a/docs/tcp_replication.rst +++ b/docs/tcp_replication.rst @@ -1,20 +1,20 @@ TCP Replication =============== -This describes the TCP replication protocol that replaces the HTTP protocol. - Motivation ---------- -The HTTP API used long poll from the workers to the master, this has the problem -of causing a lot of duplicate work on the server. This TCP protocol aims to -solve. +Previously the workers used an HTTP long poll mechanism to get updates from the +master, which had the problem of causing a lot of duplicate work on the server. +This TCP protocol replaces those APIs with the aim of increased efficiency. + + Overview -------- The protocol is based on fire and forget, line based commands. An example flow -would be (where '>' indicates master->worker and '<' worker->master flows):: +would be (where '>' indicates master to worker and '<' worker to master flows):: > SERVER example.com < REPLICATE events 53 @@ -24,7 +24,7 @@ would be (where '>' indicates master->worker and '<' worker->master flows):: The example shows the server accepting a new connection and sending its identity with the ``SERVER`` command, followed by the client asking to subscribe to the ``events`` stream from the token ``53``. The server then periodically sends ``RDATA`` -commands which have the format ``RDATA ```, where the +commands which have the format ``RDATA ``, where the format of ```` is defined by the individual streams. Error reporting happens by either the client or server sending an `ERROR` @@ -125,7 +125,7 @@ recovers it can reconnect to the server and ask for missed messages. Reliability ~~~~~~~~~~~ -In general the replication stream should be consisdered an unreliable transport +In general the replication stream should be considered an unreliable transport since e.g. commands are not resent if the connection disappears. The exception to that are the replication streams, i.e. RDATA commands, since diff --git a/synapse/replication/tcp/__init__.py b/synapse/replication/tcp/__init__.py index 8b4e886d4e..81c2ea7ee9 100644 --- a/synapse/replication/tcp/__init__.py +++ b/synapse/replication/tcp/__init__.py @@ -16,22 +16,7 @@ """This module implements the TCP replication protocol used by synapse to communicate between the master process and its workers (when they're enabled). -The protocol is based on fire and forget, line based commands. An example flow -would be (where '>' indicates master->worker and '<' worker->master flows):: - - > SERVER example.com - < REPLICATE events 53 - > RDATA events 54 ["$foo1:bar.com", ...] - > RDATA events 55 ["$foo4:bar.com", ...] - -The example shows the server accepting a new connection and sending its identity -with the `SERVER` command, followed by the client asking to subscribe to the -`events` stream from the token `53`. The server then periodically sends `RDATA` -commands which have the format `RDATA `, where the -format of `` is defined by the individual streams. - -Error reporting happens by either the client or server sending an `ERROR` -command, and usually the connection will be closed. +Further details can be found in docs/tcp_replication.rst Structure of the module: @@ -42,5 +27,4 @@ Structure of the module: * resource.py - the server classes that accepts and handle client connections * streams.py - the definitons of all the valid streams -Further details can be found in docs/tcp_replication.rst """ diff --git a/synapse/replication/tcp/streams.py b/synapse/replication/tcp/streams.py index 07adf9412e..fada40c6ef 100644 --- a/synapse/replication/tcp/streams.py +++ b/synapse/replication/tcp/streams.py @@ -154,8 +154,8 @@ class Stream(object): True then limit is provided, otherwise it's not. Returns: - list(tuple): the first entry in the tuple is the token for that - update, and the rest of the tuple gets used to construct + Deferred(list(tuple)): the first entry in the tuple is the token for + that update, and the rest of the tuple gets used to construct a ``ROW_TYPE`` instance """ raise NotImplementedError() diff --git a/synapse/storage/pusher.py b/synapse/storage/pusher.py index 715c8bef24..34d2f82b7f 100644 --- a/synapse/storage/pusher.py +++ b/synapse/storage/pusher.py @@ -139,7 +139,7 @@ class PusherStore(SQLBaseStore): """Get all the pushers that have changed between the given tokens. Returns: - list(tuple): each tuple consists of: + Deferred(list(tuple)): each tuple consists of: stream_id (str) user_id (str) app_id (str) -- cgit 1.4.1 From b4276a3896b7fec702feac4a60391ec81e595322 Mon Sep 17 00:00:00 2001 From: Erik Johnston Date: Fri, 31 Mar 2017 11:34:40 +0100 Subject: Add a brief list of commands to docs --- docs/tcp_replication.rst | 46 ++++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 46 insertions(+) (limited to 'docs/tcp_replication.rst') diff --git a/docs/tcp_replication.rst b/docs/tcp_replication.rst index 7393527f6f..62225ba6f4 100644 --- a/docs/tcp_replication.rst +++ b/docs/tcp_replication.rst @@ -175,3 +175,49 @@ An example of a batched set of ``RDATA`` is:: In this case the client shouldn't advance their caches token until it sees the the last ``RDATA``. + + +List of commands +~~~~~~~~~~~~~~~~ + +The list of valid commands, with which side can send it: server (S) or client (C): + +SERVER (S) + Sent at the start to identify which server the client is talking to + +RDATA (S) + A single update in a stream + +POSITION (S) + The position of the stream has been updated + +ERROR (S, C) + There was an error + +PING (S, C) + Sent periodically to ensure the connection is still alive + +NAME (C) + Sent at the start by client to inform the server who they are + +REPLICATE (C) + Asks the server to replicate a given stream + +USER_SYNC (C) + A user has started or stopped syncing + +FEDERATION_ACK (C) + Acknowledge receipt of some federation data + +REMOVE_PUSHER (C) + Inform the server a pusher should be removed + +INVALIDATE_CACHE (C) + Inform the server a cache should be invalidated + +SYNC (S, C) + Used exclusively in tests + + +See ``synapse/replication/tcp/commands.py`` for a detailed description and the +format of each command. -- cgit 1.4.1