path: root/synapse/replication/tcp/handler.py
Commit message [Author, Age, Files, Lines]
* Make event persisters periodically announce position over replication. (#8499) [Erik Johnston, 2020-10-12, 1 file, -10/+14]
|     Currently, background processes that stream the events stream use the "minimum persisted position" (i.e. `get_current_token()`) rather than the vector clock style tokens. This is broadly fine, as it doesn't matter if the background processes lag a small amount. However, in extreme cases (i.e. SyTests) where we only write to one event persister, the background processes will never make progress.
|
|     This PR changes it so that the `MultiWriterIDGenerator` keeps the current position of a given instance as up to date as possible (i.e. using the latest token it sees if it's not in the process of persisting anything), and then periodically announces that over replication. This then allows the "minimum persisted position" to advance, albeit with a small lag.
* Add unit test for event persister sharding (#8433) [Erik Johnston, 2020-10-02, 1 file, -3/+3]
|
* Add experimental support for sharding event persister. Again. (#8294) [Erik Johnston, 2020-09-14, 1 file, -1/+1]
|     This is *not* ready for production yet. Caveats:
|
|     1. We should write some tests...
|     2. The stream token that we use for events can get stalled at the minimum position of all writers. This means that new events may not be processed and e.g. sent down sync streams if a writer isn't writing or is slow.
* Revert "Add experimental support for sharding event persister. (#8170)" (#8242) [Brendan Abolivier, 2020-09-04, 1 file, -1/+1]
|     * Revert "Add experimental support for sharding event persister. (#8170)"
|
|       This reverts commit 82c1ee1c22a87b9e6e3179947014b0f11c0a1ac3.
|
|     * Changelog
* Add experimental support for sharding event persister. (#8170) [Erik Johnston, 2020-09-02, 1 file, -1/+1]
|     This is *not* ready for production yet. Caveats:
|
|     1. We should write some tests...
|     2. The stream token that we use for events can get stalled at the minimum position of all writers. This means that new events may not be processed and e.g. sent down sync streams if a writer isn't writing or is slow.
* Handle replication commands synchronously where possible (#7876) [Richard van der Hoff, 2020-07-27, 1 file, -49/+66]
|     Most of the stuff we do for replication commands can be done synchronously. There's no point spinning up background processes if we're not going to need them.
* Remove an unused prometheus metric (#7878) [Richard van der Hoff, 2020-07-22, 1 file, -3/+1]
|
* Optimise queueing of inbound replication commands (#7861) [Richard van der Hoff, 2020-07-16, 1 file, -116/+215]
|     When we get behind on replication, we tend to stack up background processes behind a linearizer. Background processes are heavy (particularly with respect to prometheus metrics), and linearizers aren't terribly efficient once the queue gets long either.
|
|     A better approach is to maintain a queue of requests to be processed, and nominate a single process to work its way through the queue.
|
|     Fixes: #7444
* Allow moving typing off master (#7869) [Erik Johnston, 2020-07-16, 1 file, -0/+9]
|
* Add ability to shard the federation sender (#7798) [Erik Johnston, 2020-07-10, 1 file, -2/+2]
|
* isort 5 compatibility (#7786) [Will Hunt, 2020-07-05, 1 file, -2/+2]
|     The CI appears to use the latest version of isort, which is a problem when isort gets a major version bump. Rather than try to pin the version, I've done the necessary to make isort5 happy with synapse.
* Discard RDATA from already seen positions. (#7648) [Patrick Cloke, 2020-06-15, 1 file, -4/+26]
|
* Ensure ReplicationStreamer is always started when replication enabled. (#7579) [Erik Johnston, 2020-05-27, 1 file, -0/+3]
|     Fixes #7566.
* Add option to move event persistence off master (#7517) [Erik Johnston, 2020-05-22, 1 file, -0/+10]
|
* Have all instances correctly respond to REPLICATE command. (#7475) [Erik Johnston, 2020-05-13, 1 file, -10/+45]
|     Previously, all streams were only written to from master, so only master needed to respond to `REPLICATE` commands.
|
|     Before this change, however, all instances wrote to the cache invalidation stream but didn't respond to `REPLICATE`. This was a bug, which could lead to missed rows from the cache invalidation stream if an instance was restarted; however, all the caches would be empty in that case, so it wasn't a problem.
* Fix Redis reconnection logic (#7482) [Erik Johnston, 2020-05-13, 1 file, -1/+8]
|     Proactively send out `POSITION` commands (as if we had just received a `REPLICATE`) when we connect to Redis. This is important as other instances won't notice we've connected to issue a `REPLICATE` command (unlike for direct TCP connections). This is only currently an issue if the master process reconnects without restarting (if it restarts then it won't have written anything and so other instances probably won't have missed anything).
* Merge branch 'release-v1.13.0' into develop [Andrew Morgan, 2020-05-11, 1 file, -3/+1]
|\
| |   * release-v1.13.0:
| |     Don't UPGRADE database rows
| |     RST indenting
| |     Put rollback instructions in upgrade notes
| |     Fix changelog typo
| |     Oh yeah, RST
| |     Absolute URL it is then
| |     Fix upgrade notes link
| |     Provide summary of upgrade issues in changelog.
| |     Fix )
| |     Move next version notes from changelog to upgrade notes
| |     Changelog fixes
| |     1.13.0rc1
| |     Documentation on setting up redis (#7446)
| |     Rework UI Auth session validation for registration (#7455)
| |     Fix errors from malformed log line (#7454)
| |     Drop support for redis.dbid (#7450)
| * Drop support for redis.dbid (#7450) [Richard van der Hoff, 2020-05-07, 1 file, -3/+1]
| |   Since we only use pubsub, the dbid is irrelevant.
* | Support any process writing to cache invalidation stream. (#7436) [Erik Johnston, 2020-05-07, 1 file, -35/+7]
|/
* Merge branch 'release-v1.13.0' into rav/fix_dropped_messages [Richard van der Hoff, 2020-05-05, 1 file, -1/+1]
|\
| * Move logs about discarded RDATA to debug (#7421) [Brendan Abolivier, 2020-05-05, 1 file, -1/+1]
| |
* | Merge branch 'release-v1.13.0' into rav/fix_dropped_messages [Richard van der Hoff, 2020-05-05, 1 file, -13/+16]
|\|
| * Thread through instance name to replication client. (#7369) [Erik Johnston, 2020-05-01, 1 file, -5/+15]
| |   For in-memory streams, when fetching updates on workers, we need to query the source of the stream, which is currently hard-coded to be master. This PR threads the source instance we received via `POSITION` through to the update function in each stream, which can then be passed to the replication client for in-memory streams.
| * Use `stream.current_token()` and remove `stream_positions()` (#7172) [Erik Johnston, 2020-05-01, 1 file, -9/+1]
| |   We move the processing of typing and federation replication traffic into their handlers so that `Stream.current_token()` points to a valid token. This allows us to remove `get_streams_to_replicate()` and `stream_positions()`.
* | Wait for a POSITION on the right connection before accepting RDATA [Richard van der Hoff, 2020-05-05, 1 file, -18/+37]
| |   ... otherwise we can believe we're up to date when we're not.
* | Wait to subscribe before sending REPLICATE [Richard van der Hoff, 2020-05-05, 1 file, -1/+2]
|/
* Add instance name to RDATA/POSITION commands (#7364) [Erik Johnston, 2020-04-29, 1 file, -3/+14]
|     This is primarily for allowing us to send those commands from workers, but for now simply allows us to ignore echoed RDATA/POSITION commands that we sent (we get echoes of sent commands when using redis). Currently we log a WARNING on the master process every time we receive an echoed RDATA.
* Don't relay REMOTE_SERVER_UP cmds to same conn. (#7352) [Erik Johnston, 2020-04-29, 1 file, -14/+49]
|     For direct TCP connections we need the master to relay REMOTE_SERVER_UP commands to the other connections so that all instances get notified about it.
|
|     The old implementation just relayed to all connections, assuming that sending back to the original sender of the command was safe. This is not true for redis, where commands sent get echoed back to the sender, which was causing master to effectively infinite loop sending and then re-receiving REMOTE_SERVER_UP commands that it sent.
|
|     The fix is to ensure that we only relay to *other* connections and not to the connection we received the notification from.
|
|     Fixes #7334.
* Fix limit logic for EventsStream (#7358) [Richard van der Hoff, 2020-04-29, 1 file, -1/+3]
|     * Factor out functions for injecting events into database
|
|       I want to add some more flexibility to the tools for injecting events into the database, and I don't want to clutter up HomeserverTestCase with them, so let's factor them out to a new file.
|
|     * Rework TestReplicationDataHandler
|
|       This wasn't very easy to work with: the mock wrapping was largely superfluous, and it's useful to be able to inspect the received rows, and clear out the received list.
|
|     * Fix AssertionErrors being thrown by EventsStream
|
|       Part of the problem was that there was an off-by-one error in the assertion, but also the limit logic was too simple. Fix it all up and add some tests.
* Stop the master relaying USER_SYNC for other workers (#7318) [Richard van der Hoff, 2020-04-22, 1 file, -10/+5]
|     Long story short: if we're handling presence on the current worker, we shouldn't be sending USER_SYNC commands over replication.
|
|     In an attempt to figure out what is going on here, I ended up refactoring some bits of the presencehandler code, so the first 4 commits here are non-functional refactors to move this code slightly closer to sanity. (There's still plenty to do here :/). Suggest reviewing individual commits.
|
|     Fixes (I hope) #7257.
* Add ability to run replication protocol over redis. (#7040) [Erik Johnston, 2020-04-22, 1 file, -7/+43]
|     This is configured via the `redis` config options.
* On catchup, process each row with its own stream id (#7286) [Richard van der Hoff, 2020-04-20, 1 file, -5/+68]
|     Other parts of the code (such as the StreamChangeCache) assume that there will not be multiple changes with the same stream id.
|
|     This code was introduced in #7024, and I hope this fixes #7206.
* Remove vestigial references to SYNC replication command [Richard van der Hoff, 2020-04-07, 1 file, -4/+0]
|     We've ripped pretty much all of this out: let's remove the remains.
* Fix race in replication (#7226) [Erik Johnston, 2020-04-07, 1 file, -28/+45]
|     Fixes a race between handling `POSITION` and `RDATA` commands. We do this by simply linearizing handling of them.
* Move server command handling out of TCP protocol (#7187) [Erik Johnston, 2020-04-07, 1 file, -18/+159]
|     This completes the merging of server and client command processing.
* Move client command handling out of TCP protocol (#7185) [Erik Johnston, 2020-04-06, 1 file, -0/+252]
      The aim here is to move the command handling out of the TCP protocol classes and to also merge the client and server command handling (so that we can reuse them for the redis protocol). This PR simply moves the client paths to the new `ReplicationCommandHandler`; a future PR will move the server paths too.