path: root/synapse/federation/sender
Commit message | Author | Age | Files | Lines
* Fix sharded federation sender sometimes using 100% CPU. | Erik Johnston | 2021-04-08 | 1 | -2/+4
|     We pull all destinations requiring catchup from the DB in batches. However, if all those destinations get filtered out (due to the federation sender being sharded), then the `last_processed` destination doesn't get updated, and we keep requesting the same set repeatedly.
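A minimal sketch of the paging pattern described above, not the actual Synapse code. The names `wake_destinations_needing_catchup`, `make_fetcher`, `fetch_batch`, `should_handle` and `wake` are hypothetical. The point it illustrates is that the cursor must advance from the raw batch, before the shard filter is applied, so that a fully filtered batch cannot cause the same page to be requested forever.

```python
from typing import Callable, List, Optional


def make_fetcher(all_destinations: List[str], batch_size: int) -> Callable[[Optional[str]], List[str]]:
    """Hypothetical stand-in for the DB query: return up to `batch_size`
    destinations that sort strictly after the given cursor."""
    ordered = sorted(all_destinations)

    def fetch(after: Optional[str]) -> List[str]:
        remaining = [d for d in ordered if after is None or d > after]
        return remaining[:batch_size]

    return fetch


def wake_destinations_needing_catchup(
    fetch_batch: Callable[[Optional[str]], List[str]],
    should_handle: Callable[[str], bool],
    wake: Callable[[str], None],
) -> int:
    """Page through all destinations and wake the ones this shard owns.

    Returns the number of non-empty batches fetched.
    """
    last_processed: Optional[str] = None
    batches = 0
    while True:
        batch = fetch_batch(last_processed)
        if not batch:
            break
        batches += 1
        # Advance the cursor from the unfiltered batch. If it were only
        # updated for destinations that pass the filter, a batch owned
        # entirely by other shards would be fetched again and again.
        last_processed = batch[-1]
        for destination in batch:
            if should_handle(destination):
                wake(destination)
    return batches
```

With a filter that rejects every destination, the loop still terminates after one pass over the data, because the cursor moves regardless of what the filter decides.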
* Add a Synapse Module for configuring presence update routing (#9491) | Andrew Morgan | 2021-04-06 | 1 | -1/+18
|     At the moment, if you'd like to share presence between local or remote users, those users must be sharing a room together. This isn't always the most convenient or useful situation though.
|
|     This PR adds a module to Synapse that will allow deployments to set up extra logic on where presence updates should be routed. The module must implement two methods, `get_users_for_states` and `get_interested_users`. These methods are given presence updates or user IDs and must return information that Synapse will use to grant passing presence updates around.
|
|     A method is additionally added to `ModuleApi` which allows triggering a set of users to receive the current, online presence information for all users they are considered interested in. This is the equivalent of that user receiving presence information during an initial sync.
|
|     The goal of this module is to be fairly generic and useful for a variety of applications, with hard requirements being:
|
|     * Sending state for a specific set or all known users to a defined set of local and remote users.
|     * The ability to trigger an initial sync for specific users, so they receive all current state.
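As a rough illustration of the two-method shape described above, here is a self-contained sketch. Only the method names `get_users_for_states` and `get_interested_users` come from the commit message. The `PresenceUpdate` type, the `ExamplePresenceRouter` class, the constructor argument and the exact return shapes are assumptions made for this example, and the sketch does not import or reproduce the real Synapse interface.

```python
import asyncio
from typing import Dict, Iterable, NamedTuple, Set


class PresenceUpdate(NamedTuple):
    """Hypothetical stand-in for a presence state object."""
    user_id: str
    state: str


class ExamplePresenceRouter:
    """Routes presence using a static 'who watches whom' table (assumed config)."""

    def __init__(self, watchers: Dict[str, Set[str]]):
        # Maps a recipient user ID to the user IDs whose presence they receive.
        self._watchers = watchers

    async def get_users_for_states(
        self, state_updates: Iterable[PresenceUpdate]
    ) -> Dict[str, Set[PresenceUpdate]]:
        """Given presence updates, return which recipients get which updates."""
        routed: Dict[str, Set[PresenceUpdate]] = {}
        for update in state_updates:
            for recipient, watched in self._watchers.items():
                if update.user_id in watched:
                    routed.setdefault(recipient, set()).add(update)
        return routed

    async def get_interested_users(self, user_id: str) -> Set[str]:
        """Given a user ID, return the users whose presence they should see."""
        return set(self._watchers.get(user_id, set()))
```

Keeping both methods backed by the same table means the "push" direction (routing new updates) and the "pull" direction (what a user should see on an initial sync) cannot drift apart.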
* Improve tracing for to device messages (#9686) | Erik Johnston | 2021-04-01 | 1 | -0/+8
|
* Add type hints for the federation sender. (#9681) | Patrick Cloke | 2021-03-29 | 1 | -13/+103
|     Includes an abstract base class which both the FederationSender and the FederationRemoteSendQueue must implement.
* Fixed undefined variable error in catchup (#9664) | Erik Johnston | 2021-03-24 | 1 | -0/+2
|     Broke in #9640
|
|     Co-authored-by: Patrick Cloke <clokep@users.noreply.github.com>
* Make federation catchup send last event from any server. (#9640) | Erik Johnston | 2021-03-18 | 1 | -15/+89
|     Currently federation catchup will send the last *local* event that we failed to send to the remote. This can cause issues for large rooms where lots of servers have sent events while the remote server was down, as when it comes back up again it'll be flooded with events from various points in the DAG.
|
|     Instead, let's make it so that all the servers send the most recent events, even if it's not theirs. The remote should deduplicate the events, so there shouldn't be much overhead in doing this. Alternatively, the servers could only send local events if they were also extremities and hope that the other server will send the event over, but that is a bit risky.
* Don't go into federation catch up mode so easily (#9561) | Erik Johnston | 2021-03-15 | 2 | -153/+182
|     Federation catch up mode is very inefficient if the number of events that the remote server has missed is small, since handling gaps can be very expensive, c.f. #9492.
|
|     Instead of going into catch up mode whenever we see an error, we instead do so only if we've backed off from trying the remote for more than an hour (the assumption being that in such a case it is more than a transient failure).
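The rule described above can be sketched as a single predicate. This is an illustration only: the function name, the constant name and the idea of passing the current backoff interval in milliseconds are assumptions, and only the one-hour threshold comes from the commit message.

```python
# One hour, expressed in milliseconds (the unit is an assumption of this sketch).
CATCH_UP_BACKOFF_THRESHOLD_MS = 60 * 60 * 1000


def should_enter_catch_up(send_failed: bool, retry_interval_ms: int) -> bool:
    """Decide whether a destination should be switched into catch-up mode.

    A single failed send is treated as transient. Catch-up mode is entered
    only when a send has failed and the backoff for that destination has
    already grown beyond one hour.
    """
    return send_failed and retry_interval_ms > CATCH_UP_BACKOFF_THRESHOLD_MS
```

For example, a failure while the backoff is five minutes keeps the normal send queue, while a failure after two hours of backoff switches to catch-up.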
* Replace `last_*_pdu_age` metrics with timestamps (#9540) | Richard van der Hoff | 2021-03-04 | 1 | -6/+5
|     Following the advice at https://prometheus.io/docs/practices/instrumentation/#timestamps-not-time-since, it's preferable to export unix timestamps, not ages.
|
|     There doesn't seem to be any particular naming convention for timestamp metrics.
* Be smarter about which hosts to send presence to when processing room joins (#9402) | Andrew Morgan | 2021-02-19 | 1 | -1/+1
|     This PR attempts to eliminate unnecessary presence sending work when your local server joins a room, or when a remote server joins a room your server is participating in by processing state deltas in chunks rather than individually.
|
|     ---
|
|     When your server joins a room for the first time, it requests the historical state as well. This chunk of new state is passed to the presence handler which, after filtering that state down to only membership joins, will send presence updates to homeservers for each join processed.
|
|     It turns out that we were being a bit naive and processing each event individually, and sending out presence updates for every one of those joins. Even if many different joins were users on the same server (hello IRC bridges), we'd send presence to that same homeserver for every remote user join we saw.
|
|     This PR attempts to deduplicate all of that by processing the entire batch of state deltas at once, instead of only doing each join individually. We process the joins and note down which servers need which presence:
|
|     * If it was a local user join, send that user's latest presence to all servers in the room
|     * If it was a remote user join, send the presence for all local users in the room to that homeserver
|
|     We deduplicate by inserting all of those pending updates into a dictionary of the form:
|
|     ```
|     {
|       server_name1: {presence_update1, ...},
|       server_name2: {presence_update1, presence_update2, ...}
|     }
|     ```
|
|     Only after building this dict do we then start sending out presence updates.
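The batching described above can be sketched as follows. This is a simplified illustration, not the code from the PR: `collect_presence_to_send`, `get_domain` and all parameter names are hypothetical, and each presence update is represented simply by the ID of the user whose presence would be sent.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set


def get_domain(user_id: str) -> str:
    """Return the server part of a Matrix user ID such as '@user:server'."""
    return user_id.split(":", 1)[1]


def collect_presence_to_send(
    joined_user_ids: Iterable[str],
    local_server: str,
    local_users_in_room: Set[str],
    hosts_in_room: Set[str],
) -> Dict[str, Set[str]]:
    """Build {server_name: {user IDs whose presence to send}} for a batch of joins."""
    pending: Dict[str, Set[str]] = defaultdict(set)
    for user_id in joined_user_ids:
        if get_domain(user_id) == local_server:
            # Local user join: send that user's presence to every other
            # server in the room.
            for host in hosts_in_room:
                if host != local_server:
                    pending[host].add(user_id)
        else:
            # Remote user join: send the presence of all local users in the
            # room to the joining user's homeserver. Using a set means many
            # joins from one server collapse into a single entry.
            pending[get_domain(user_id)].update(local_users_in_room)
    # Drop servers that ended up with nothing to send.
    return {host: users for host, users in pending.items() if users}
```

Sending would then happen once per key of the returned dict, after the whole batch has been processed, rather than once per join.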
* Update black, and run auto formatting over the codebase (#9381) | Eric Eastwood | 2021-02-16 | 3 | -9/+18
|     - Update black version to the latest
|     - Run black auto formatting over the codebase
|     - Run autoformatting according to [`docs/code_style.md`](https://github.com/matrix-org/synapse/blob/80d6dc9783aa80886a133756028984dbf8920168/docs/code_style.md)
|     - Update `code_style.md` docs around installing black to use the correct version
* Precompute joined hosts and store in Redis (#9198) | Erik Johnston | 2021-01-26 | 1 | -15/+35
|
* Fix not sending events over federation when using sharded event persisters (#8536) | Erik Johnston | 2020-10-14 | 1 | -2/+7
|     * Fix outbound federation with multiple event persisters.
|
|     We incorrectly notified federation senders that the minimum persisted stream position had advanced when we got an `RDATA` from an event persister. Notifying of federation senders already correctly happens in the notifier, so we just delete the offending line.
|
|     * Change some interfaces to use RoomStreamToken.
|
|     By enforcing use of `RoomStreamTokens` we make it less likely that people pass in random ints that they got from somewhere random.
* Remove stream ordering from Metadata dict (#8452) | Richard van der Hoff | 2020-10-05 | 2 | -0/+4
|     There's no need for it to be in the dict as well as the events table. Instead, we store it in a separate attribute in the EventInternalMetadata object, and populate that on load.
|
|     This means that we can rely on it being correctly populated for any event which has been persisted to the database.
* Fix malformed log line in new federation "catch up" logic (#8442) | Richard van der Hoff | 2020-10-02 | 1 | -1/+1
|
* Add prometheus metrics to track federation delays (#8430) | Richard van der Hoff | 2020-10-01 | 1 | -0/+22
|     Add a pair of federation metrics to track the delays in sending PDUs to/from particular servers.
* Catch-up after Federation Outage (bonus): Catch-up on Synapse Startup (#8322) | reivilibre | 2020-09-18 | 1 | -0/+51
|     Signed-off-by: Olivier Wilkinson (reivilibre) <olivier@librepush.net>
|     Co-authored-by: Patrick Cloke <clokep@users.noreply.github.com>
|
|     * Fix _set_destination_retry_timings
|
|     This came about because the code assumed that retry_interval could not be NULL, which has been challenged by catch-up.
* Catch-up after Federation Outage (split, 4): catch-up loop (#8272) | reivilibre | 2020-09-15 | 1 | -4/+125
|
* Catch up after Federation Outage (split, 2): Track last successful stream ordering after transmission (#8247) | reivilibre | 2020-09-04 | 1 | -0/+11
|     Co-authored-by: Richard van der Hoff <1389908+richvdh@users.noreply.github.com>
* Catch-up after Federation Outage (split, 1) (#8230) | reivilibre | 2020-09-04 | 1 | -2/+9
|     Signed-off-by: Olivier Wilkinson (reivilibre) <olivier@librepush.net>
* Stop sub-classing object (#8249) | Patrick Cloke | 2020-09-04 | 3 | -3/+3
|
* Remove obsolete order field in `send_new_transaction` (#8245) | reivilibre | 2020-09-03 | 3 | -28/+22
|     Co-authored-by: Richard van der Hoff <1389908+richvdh@users.noreply.github.com>
* Add type hints for state. (#8140) | Patrick Cloke | 2020-08-24 | 1 | -2/+2
|
* Be stricter about JSON that is accepted by Synapse (#8106) | Patrick Cloke | 2020-08-19 | 1 | -3/+2
|
* Convert stream database to async/await. (#8074) | Patrick Cloke | 2020-08-17 | 2 | -2/+2
|
* Drop federation transmission queues during a significant remote outage. (#7864) | reivilibre | 2020-08-13 | 1 | -0/+22
|     * Empty federation transmission queues when we are backing off.
|
|     Fixes #7828.
|
|     Signed-off-by: Olivier Wilkinson (reivilibre) <olivier@librepush.net>
|
|     * Address feedback
|
|     Signed-off-by: Olivier Wilkinson (reivilibre) <olivier@librepush.net>
|
|     * Reword newsfile
* Fix typing for notifier (#8064) | Erik Johnston | 2020-08-12 | 1 | -2/+5
|
* Merge branch 'master' into develop | Olivier Wilkinson (reivilibre) | 2020-07-30 | 2 | -2/+2
|\
| * Update worker docs with recent enhancements (#7969) | Erik Johnston | 2020-07-29 | 2 | -2/+2
| |
* | Convert federation client to async/await. (#7975) | Patrick Cloke | 2020-07-30 | 1 | -13/+6
|/
* Convert state resolution to async/await (#7942) | Patrick Cloke | 2020-07-24 | 1 | -1/+3
|
* Convert presence handler helpers to async/await. (#7939) | Patrick Cloke | 2020-07-23 | 1 | -1/+3
|
* Add ability to run multiple pusher instances (#7855) | Erik Johnston | 2020-07-16 | 2 | -9/+9
|     This reuses the same scheme as federation sender sharding
* Remove obsolete comment. | Olivier Wilkinson (reivilibre) | 2020-07-16 | 1 | -2/+0
|     It was correct at the time of our friend Jorik writing it (checking git blame), but the world has moved now and it is no longer a generator.
|
|     Signed-off-by: Olivier Wilkinson (reivilibre) <olivier@librepush.net>
* Add ability to shard the federation sender (#7798) | Erik Johnston | 2020-07-10 | 2 | -2/+68
|
* Fix some spelling mistakes / typos. (#7811) | Patrick Cloke | 2020-07-09 | 1 | -2/+2
|
* Fix new metric where we used ms instead of seconds (#7771) | Erik Johnston | 2020-07-01 | 1 | -1/+1
|     Introduced in #7755, not yet released.
* Add some metrics for inbound and outbound federation processing times (#7755) | Erik Johnston | 2020-06-30 | 1 | -1/+9
|
* Replace iteritems/itervalues/iterkeys with native versions. (#7692) | Patrick Cloke | 2020-06-15 | 1 | -3/+1
|
* add a comment | Richard van der Hoff | 2020-05-21 | 1 | -0/+3
|
* Fix catchup-on-reconnect for the Federation Stream (#7374) | Richard van der Hoff | 2020-05-05 | 3 | -9/+15
|     looks like we managed to break this during the refactorathon.
* Move catchup of replication streams to worker. (#7024) | Erik Johnston | 2020-03-25 | 1 | -0/+9
|     This changes the replication protocol so that the server does not send down `RDATA` for rows that happened before the client connected. Instead, the server will send a `POSITION` and clients then query the database (or master out of band) to get up to date.
* Add typing to synapse.federation.sender (#6871) | Erik Johnston | 2020-02-07 | 3 | -101/+102
|
* Wake up transaction queue when remote server comes back online (#6706) | Erik Johnston | 2020-01-17 | 1 | -2/+16
|     This will be used to retry outbound transactions to a remote server if we think it might have come back up.
* Add StateMap type alias (#6715) | Erik Johnston | 2020-01-16 | 1 | -1/+2
|
* Clean up newline quote marks around the codebase (#6362) | Andrew Morgan | 2019-11-21 | 2 | -3/+3
|
* Merge branch 'develop' into cross-signing_federation | Hubert Chathi | 2019-10-31 | 2 | -7/+8
|\
| * Update black to 19.10b0 (#6304) | Amber Brown | 2019-11-01 | 1 | -5/+6
| |     * update version of black and also fix the mypy config being overridden
| * Remove usage of deprecated logger.warn method from codebase (#6271) | Andrew Morgan | 2019-10-31 | 1 | -2/+2
| |     Replace every instance of `logger.warn` with `logger.warning` as the former is deprecated.
* | rename get_devices_by_remote to get_device_updates_by_remote | Hubert Chathi | 2019-10-30 | 1 | -2/+2
| |
* | Merge branch 'develop' into cross-signing_federation | Hubert Chathi | 2019-10-24 | 1 | -1/+1
|\|
| * Move storage classes into a main "data store". | Erik Johnston | 2019-10-21 | 1 | -1/+1
| |     This is in preparation for having multiple data stores that offer different functionality, e.g. splitting out state or event storage.
* | implement federation parts of cross-signing | Hubert Chathi | 2019-10-22 | 1 | -2/+2
|/
* add some metrics on the federation sender (#6160) | Richard van der Hoff | 2019-10-03 | 1 | -5/+6
|
* use access methods (duh..) | Jorik Schellekens | 2019-09-05 | 1 | -1/+3
|     Co-Authored-By: Erik Johnston <erik@matrix.org>
* Link the send loop with the edus contexts | Jorik Schellekens | 2019-09-05 | 1 | -3/+8
|     The contexts were being filtered too early so the send loop wasn't being linked to them unless the destination was whitelisted.
* Propagate opentracing contexts through EDUs (#5852) | Jorik Schellekens | 2019-08-22 | 1 | -73/+97
|     Propagate opentracing contexts through EDUs
|
|     Co-Authored-By: Richard van der Hoff <1389908+richvdh@users.noreply.github.com>
* Replace returnValue with return (#5736) | Amber Brown | 2019-07-23 | 2 | -3/+3
|
* remove dead transaction persist code (#5622) | Richard van der Hoff | 2019-07-05 | 1 | -9/+0
|     this hasn't done anything for years
* Move logging utilities out of the side drawer of util/ and into logging/ (#5606) | Amber Brown | 2019-07-04 | 1 | -4/+8
|
* Run Black. (#5482) | Amber Brown | 2019-06-20 | 3 | -58/+54
|
* Add experimental option to reduce extremities. | Erik Johnston | 2019-06-18 | 1 | -0/+3
|     Adds new config option `cleanup_extremities_with_dummy_events` which periodically sends dummy events to rooms with more than 10 extremities.
|
|     THIS IS REALLY EXPERIMENTAL.
* Clean up code for sending federation EDUs. (#5381) | Richard van der Hoff | 2019-06-13 | 1 | -14/+26
|     This code confused the hell out of me today. Split _get_new_device_messages into its two (unrelated) parts.
* Prevent multiple device list updates from breaking a batch send (#5156) | Andrew Morgan | 2019-06-06 | 1 | -2/+3
|     fixes #5153
* Run `black` on per_destination_queue | Richard van der Hoff | 2019-05-09 | 1 | -35/+39
|     ... mostly to fix pep8 fails
* Limit the number of EDUs in transactions to 100 as expected by receiver (#5138) | Quentin Dufour | 2019-05-09 | 1 | -26/+30
|     Fixes #3951.
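A minimal sketch of capping the EDUs taken for one transaction, assuming a simple in-memory list as the pending queue. The limit of 100 comes from the commit title above. The function name, the constant name and the queue representation are hypothetical and are not taken from the Synapse code.

```python
from typing import List, TypeVar

T = TypeVar("T")

# Limit named in the commit title: at most 100 EDUs per transaction.
MAX_EDUS_PER_TRANSACTION = 100


def take_edus_for_transaction(
    pending_edus: List[T], limit: int = MAX_EDUS_PER_TRANSACTION
) -> List[T]:
    """Remove and return up to `limit` EDUs from the front of the queue.

    Anything beyond the limit stays queued for a later transaction, so
    the receiver never sees more EDUs than it expects in one go.
    """
    batch = pending_edus[:limit]
    del pending_edus[:limit]
    return batch
```

Calling this repeatedly on a queue of 250 items yields batches of 100, 100 and 50, and leaves the queue empty.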
* Use event streams to calculate presence | Erik Johnston | 2019-03-27 | 1 | -1/+18
|     Primarily this fixes a bug in the handling of remote users joining a room where the server sent out the presence for all local users in the room to all servers in the room.
|
|     We also change to using the state delta stream, rather than the distributor, as it will make it easier to split processing out of the master process (as well as being more flexible).
|
|     Finally, when sending presence states to newly joined servers we filter out old presence states to reduce the number sent. Initially we filter out states that are offline and have a last active more than a week ago, though this can be changed down the line.
|
|     Fixes #3962
* Batch up outgoing read-receipts to reduce federation traffic. (#4890) | Richard van der Hoff | 2019-03-20 | 2 | -21/+158
|     Rate-limit outgoing read-receipts as per #4730.
* Rename and move the classes | Richard van der Hoff | 2019-03-13 | 3 | -0/+853