path: root/synapse/federation/sender
Commit message  Author  Age  Files  Lines
* Improve logging and opentracing for to-device message handling (#14598)Richard van der Hoff2022-12-061-1/+1
| A batch of changes intended to make it easier to trace to-device messages through the system.
|
| The intention here is that a client can set a property `org.matrix.msgid` in any to-device message it sends. That ID is then included in any tracing or logging related to the message. (Suggestions as to where this field should be documented welcome. I'm not enthusiastic about speccing it - it's very much an optional extra to help with debugging.)
|
| I've also generally improved the data we send to opentracing for these messages.
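A minimal sketch of how such an ID could be picked out of a message for logging. Only the property name `org.matrix.msgid` comes from the commit message; the helper names, the assumption that the property sits at the top level of the message content, and the log wording are hypothetical and are not Synapse's actual code.

```python
from typing import Any, Mapping, Optional

# Property name taken from the commit message above.
MSGID_KEY = "org.matrix.msgid"


def get_msgid(content: Mapping[str, Any]) -> Optional[str]:
    """Return the client-supplied message ID, or None if absent or not a string.

    Assumption: the ID lives at the top level of the message content.
    """
    msgid = content.get(MSGID_KEY)
    return msgid if isinstance(msgid, str) else None


def describe_to_device(content: Mapping[str, Any], destination: str) -> str:
    """Build a log line that carries the message ID when one was supplied."""
    msgid = get_msgid(content)
    return f"Sending to-device message {msgid or '<no msgid>'} to {destination}"
```

Because the field is optional, the helper falls back to a placeholder rather than failing when a client does not set it.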
* Use servers list approx to send read receipts when in partial state (#14549)Mathieu Velten2022-11-301-1/+1
| | | Signed-off-by: Mathieu Velten <mathieuv@matrix.org>
* Include thread information when sending receipts over federation. (#14466)Patrick Cloke2022-11-281-63/+120
| Include the `thread_id` field when sending read receipts over federation. This might result in the same user having multiple read receipts per room, meaning multiple EDUs must be sent to encapsulate those receipts.
|
| This restructures the `PerDestinationQueue` APIs to support multiple receipt EDUs. `queue_read_receipt` now becomes linear time in the number of queued threaded receipts in the room for the given user; this is expected to be a small number, since receipt EDUs are sent as filler in transactions.
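The queueing behaviour described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the real `PerDestinationQueue`: the `PendingEdu` shape and the simplified `queue_read_receipt` signature are hypothetical. The idea is that one EDU can hold only one receipt per (room, user), so receipts for different threads spill into additional EDUs, and finding the right slot is a linear scan over the pending EDUs.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical shape of one pending receipt EDU:
# (room_id, user_id) -> (thread_id, event_id).
PendingEdu = Dict[Tuple[str, str], Tuple[Optional[str], str]]


def queue_read_receipt(
    pending: List[PendingEdu],
    room_id: str,
    user_id: str,
    thread_id: Optional[str],
    event_id: str,
) -> None:
    """Queue a receipt, replacing an older receipt for the same thread."""
    key = (room_id, user_id)

    # First pass: a queued receipt for the same thread is simply superseded.
    for edu in pending:
        existing = edu.get(key)
        if existing is not None and existing[0] == thread_id:
            edu[key] = (thread_id, event_id)
            return

    # Second pass: use the first EDU with no receipt for this room and user.
    for edu in pending:
        if key not in edu:
            edu[key] = (thread_id, event_id)
            return

    # Every pending EDU already holds a receipt for this user in this room.
    pending.append({key: (thread_id, event_id)})
```

Both passes are linear in the number of pending EDUs, which matches the cost noted in the commit message when a user has several threaded receipts queued for one room.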
* Faster joins: use initial list of servers if we don't have the full state yet (#14408)Mathieu Velten2022-11-241-1/+17
| Signed-off-by: Mathieu Velten <mathieuv@matrix.org>
| Co-authored-by: Sean Quah <8349537+squahtx@users.noreply.github.com>
* Faster joins: filter out non local events when a room doesn't have its full state (#14404)Mathieu Velten2022-11-211-0/+1
| Signed-off-by: Mathieu Velten <mathieuv@matrix.org>
* Fix `TypeError: 'dict_keys' object is not reversible` (#14280)Erik Johnston2022-10-241-2/+1
|
* Fix a bug where redactions were not being sent over federation if we did not have the original event. (#13813)Shay2022-10-111-9/+20
|
* Prioritize outbound to-device over device list updates (#13922)Erik Johnston2022-09-271-13/+16
| | | Otherwise device list changes for large accounts can temporarily delay to-device messages.
* Fix Prometheus recording rules to not use legacy metric names. (#13718)reivilibre2022-09-081-2/+2
|
* Add some logging to help track down #13444 (#13679)Erik Johnston2022-09-011-0/+13
|
* Federation Sender & Appservice Pusher Stream Optimisations (#13251)Nick Mills-Barrett2022-07-151-3/+7
|   * Replace `get_new_events_for_appservice` with `get_all_new_events_stream`
|
|     The functions were near identical and this brings the AS worker closer to the way federation senders work which can allow for multiple workers to handle AS traffic.
|
|   * Pull received TS alongside events when processing the stream
|
|     This avoids an extra query -per event- when both federation sender and appservice pusher process events.
* Reduce amount of state we pull out when attempting to send catchup PDUs. (#12963)Erik Johnston2022-06-071-11/+20
|   * Don't pull out state for catchup
|   * Newsfile
|   * Merge newsfile
* Reduce state pulled from DB due to sending typing and receipts over federation (#12964)Erik Johnston2022-06-061-1/+5
| Reducing the amount of state we pull from the DB is useful as fetching state is expensive in terms of DB, CPU and memory.
* Additional constants for EDU types. (#12884)Patrick Cloke2022-05-272-4/+9
| | | Instead of hard-coding strings in many places.
* Avoid attempting to delete push actions for remote users. (#12879)Patrick Cloke2022-05-261-1/+1
| | | | Remote users will never have push actions, so we can avoid a database round-trip/transaction completely.
* Add some type hints to datastore (#12717)Dirk Klimpel2022-05-171-7/+17
|
* Add extra debug logging to federation sender (#12614)Richard van der Hoff2022-05-031-2/+18
| | | | ... in order to debug some problems we've been having with certain events not being sent when expected.
* Exclude OOB memberships from the federation sender (#12570)Richard van der Hoff2022-05-031-0/+39
| | | | | | | As the comment says, there is no need to process such events, and indeed we need to avoid doing so. Fixes #12509.
* Spread out sending device lists to remote hosts (#12132)Erik Johnston2022-03-042-9/+27
|
* Remove `HomeServer.get_datastore()` (#12031)Richard van der Hoff2022-02-233-6/+7
| | | | | | | The presence of this method was confusing, and mostly present for backwards compatibility. Let's get rid of it. Part of #11733
* Minor typing fixes (#12034)Richard van der Hoff2022-02-211-9/+9
| | | | | | These started failing in https://github.com/matrix-org/synapse/pull/12031... I'm a bit mystified by how they ever worked.
* Debug for device lists updates (#11760)David Robertson2022-01-201-0/+12
| Debug for #8631.
|
| I'm having a hard time tracking down what's going wrong in that issue. In the reported example, I could see server A sending federation traffic to server B and all was well. Yet B reports out-of-sync device updates from A.
|
| I couldn't see what was _in_ the events being sent from A to B. So I have added some crude logging to track
|
|   - when we have updates to send to a remote HS
|   - the edus we actually accumulate to send
|   - when a federation transaction includes a device list update edu
|   - when such an EDU is received
|
| This is a bit of a sledgehammer.
* Use auto_attribs/native type hints for attrs classes. (#11692)Patrick Cloke2022-01-131-6/+6
|
* Add most of the missing type hints to `synapse.federation`. (#11483)Patrick Cloke2021-12-021-3/+10
| | | This skips a few methods which are difficult to type.
* Annotate `log_function` decorator (#10943)reivilibre2021-10-271-1/+0
| | | Co-authored-by: Patrick Cloke <clokep@users.noreply.github.com>
* Remove unnecessary parentheses around tuples returned from methods (#10889)Andrew Morgan2021-09-231-2/+2
|
* Use direct references for some configuration variables (part 2) (#10812)Patrick Cloke2021-09-151-1/+1
|
* Use direct references for some configuration variables (#10798)Patrick Cloke2021-09-131-1/+2
| | | | Instead of proxying through the magic getter of the RootConfig object. This should be more performant (and is more explicit).
* Add types to synapse.util. (#10601)reivilibre2021-09-101-2/+6
|
* Convert Transaction and Edu object to attrs (#10542)Patrick Cloke2021-08-061-4/+5
| | | | | Instead of wrapping the JSON into an object, this creates concrete instances for Transaction and Edu. This allows for improved type hints and simplified code.
* Stagger send presence to remotes (#10398)Erik Johnston2021-07-152-5/+107
| | | | | | This is to help with performance, where trying to connect to thousands of hosts at once can consume a lot of CPU (due to TLS etc). Co-authored-by: Brendan Abolivier <babolivier@matrix.org>
* Use inline type hints in various other places (in `synapse/`) (#10380)Jonathan de Jong2021-07-152-17/+15
|
* Add debug logging for issue #9533 (#9959)Richard van der Hoff2021-05-111-0/+9
| | | | | Hopefully this will help us track down where to-device messages are getting lost/delayed.
* Revert "Experimental Federation Speedup (#9702)"Andrew Morgan2021-04-282-102/+58
| | | | This reverts commit 05e8c70c059f8ebb066e029bc3aa3e0cefef1019.
* Remove `synapse.types.Collection` (#9856)Richard van der Hoff2021-04-221-2/+12
| | | This is no longer required, since we have dropped support for Python 3.5.
* Fix bug where we sent remote presence states to remote servers (#9850)Erik Johnston2021-04-201-0/+4
|
* Don't send normal presence updates over federation replication stream (#9828)Erik Johnston2021-04-191-95/+1
|
* remove `HomeServer.get_config` (#9815)Richard van der Hoff2021-04-141-1/+1
| | | | Every single time I want to access the config object, I have to remember whether or not we use `get_config`. Let's just get rid of it.
* Experimental Federation Speedup (#9702)Jonathan de Jong2021-04-142-62/+93
| | | | | This basically speeds up federation by "squeezing" each individual dual database call (to destinations and destination_rooms), which previously happened per every event, into one call for an entire batch (100 max). Signed-off-by: Jonathan de Jong <jonathan@automatia.nl>
* Remove redundant "coding: utf-8" lines (#9786)Jonathan de Jong2021-04-143-3/+0
| | | | | | | Part of #9744 Removes all redundant `# -*- coding: utf-8 -*-` lines from files, as python 3 automatically reads source code as utf-8 now. `Signed-off-by: Jonathan de Jong <jonathan@automatia.nl>`
* Fix sharded federation sender sometimes using 100% CPU.Erik Johnston2021-04-081-2/+4
| | | | | | | We pull all destinations requiring catchup from the DB in batches. However, if all those destinations get filtered out (due to the federation sender being sharded), then the `last_processed` destination doesn't get updated, and we keep requesting the same set repeatedly.
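The failure mode described above can be sketched as a paging loop. This is an illustration only: the function name, the in-memory list standing in for the database query, and the `should_handle` callback standing in for the sharding check are all hypothetical. The point is that the cursor must advance from the unfiltered batch, so a batch that is entirely filtered out still moves the loop forward.

```python
from typing import Callable, Iterable, List, Optional


def destinations_to_catch_up(
    all_destinations: Iterable[str],
    should_handle: Callable[[str], bool],
    batch_size: int = 2,
) -> List[str]:
    """Page through destinations, keeping only those owned by this shard."""
    ordered = sorted(all_destinations)  # stand-in for an ordered DB query
    last_processed: Optional[str] = None
    mine: List[str] = []

    while True:
        # Stand-in for "fetch the next batch after last_processed".
        batch = [
            d for d in ordered if last_processed is None or d > last_processed
        ][:batch_size]
        if not batch:
            break

        # Advance the cursor BEFORE filtering. If it were taken from the
        # filtered list, an all-filtered batch would leave it unchanged and
        # the same batch would be requested forever.
        last_processed = batch[-1]

        mine.extend(d for d in batch if should_handle(d))

    return mine
```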
* Add a Synapse Module for configuring presence update routing (#9491)Andrew Morgan2021-04-061-1/+18
| At the moment, if you'd like to share presence between local or remote users, those users must be sharing a room together. This isn't always the most convenient or useful situation though.
|
| This PR adds a module to Synapse that will allow deployments to set up extra logic on where presence updates should be routed. The module must implement two methods, `get_users_for_states` and `get_interested_users`. These methods are given presence updates or user IDs and must return information that Synapse will use to decide where to route presence updates.
|
| A method is additionally added to `ModuleApi` which allows triggering a set of users to receive the current, online presence information for all users they are considered interested in. This is the equivalent of that user receiving presence information during an initial sync.
|
| The goal of this module is to be fairly generic and useful for a variety of applications, with hard requirements being:
|
|   * Sending state for a specific set or all known users to a defined set of local and remote users.
|   * The ability to trigger an initial sync for specific users, so they receive all current state.
* Improve tracing for to device messages (#9686)Erik Johnston2021-04-011-0/+8
|
* Add type hints for the federation sender. (#9681)Patrick Cloke2021-03-291-13/+103
| | | | Includes an abstract base class which both the FederationSender and the FederationRemoteSendQueue must implement.
* Fixed undefined variable error in catchup (#9664)Erik Johnston2021-03-241-0/+2
| | | | | Broke in #9640 Co-authored-by: Patrick Cloke <clokep@users.noreply.github.com>
* Make federation catchup send last event from any server. (#9640)Erik Johnston2021-03-181-15/+89
| Currently federation catchup will send the last *local* event that we failed to send to the remote. This can cause issues for large rooms where lots of servers have sent events while the remote server was down, as when it comes back up again it'll be flooded with events from various points in the DAG.
|
| Instead, let's make it so that all the servers send the most recent events, even if they are not theirs. The remote should deduplicate the events, so there shouldn't be much overhead in doing this. Alternatively, the servers could only send local events if they were also extremities and hope that the other server will send the event over, but that is a bit risky.
* Don't go into federation catch up mode so easily (#9561)Erik Johnston2021-03-152-153/+182
| | | | | | | | | | Federation catch up mode is very inefficient if the number of events that the remote server has missed is small, since handling gaps can be very expensive, c.f. #9492. Instead of going into catch up mode whenever we see an error, we instead do so only if we've backed off from trying the remote for more than an hour (the assumption being that in such a case it is more than a transient failure).
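The rule described above reduces to a threshold check. The sketch below is hypothetical: the constant and function names are invented for illustration, and it assumes the current backoff interval is available in milliseconds. Only the one-hour threshold comes from the commit message.

```python
# One hour, per the commit message; the name is an assumption.
CATCH_UP_BACKOFF_THRESHOLD_MS = 60 * 60 * 1000


def should_enter_catch_up(backoff_interval_ms: int) -> bool:
    """Enter catch-up mode only after backing off for more than an hour.

    Shorter backoffs are treated as transient failures, where the remote has
    probably missed few events and catch-up would be needlessly expensive.
    """
    return backoff_interval_ms > CATCH_UP_BACKOFF_THRESHOLD_MS
```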
* Replace `last_*_pdu_age` metrics with timestamps (#9540)Richard van der Hoff2021-03-041-6/+5
| | | | | | | | Following the advice at https://prometheus.io/docs/practices/instrumentation/#timestamps-not-time-since, it's preferable to export unix timestamps, not ages. There doesn't seem to be any particular naming convention for timestamp metrics.
* Be smarter about which hosts to send presence to when processing room joins (#9402)Andrew Morgan2021-02-191-1/+1
| This PR attempts to eliminate unnecessary presence sending work when your local server joins a room, or when a remote server joins a room your server is participating in by processing state deltas in chunks rather than individually.
|
| ---
|
| When your server joins a room for the first time, it requests the historical state as well. This chunk of new state is passed to the presence handler which, after filtering that state down to only membership joins, will send presence updates to homeservers for each join processed.
|
| It turns out that we were being a bit naive and processing each event individually, and sending out presence updates for every one of those joins. Even if many different joins were users on the same server (hello IRC bridges), we'd send presence to that same homeserver for every remote user join we saw.
|
| This PR attempts to deduplicate all of that by processing the entire batch of state deltas at once, instead of only doing each join individually. We process the joins and note down which servers need which presence:
|
|   * If it was a local user join, send that user's latest presence to all servers in the room
|   * If it was a remote user join, send the presence for all local users in the room to that homeserver
|
| We deduplicate by inserting all of those pending updates into a dictionary of the form:
|
| ```
| {
|   server_name1: {presence_update1, ...},
|   server_name2: {presence_update1, presence_update2, ...}
| }
| ```
|
| Only after building this dict do we then start sending out presence updates.
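The deduplication step above can be sketched as follows. This is a simplified illustration rather than the presence handler's real code: the function names, the input shapes, and the use of user IDs to stand in for presence updates are all assumptions.

```python
from collections import defaultdict
from typing import Dict, Iterable, Mapping, Set, Tuple


def host_of(user_id: str) -> str:
    """Return the server name part of a Matrix user ID such as @alice:example.org."""
    return user_id.split(":", 1)[1]


def pending_presence(
    joins: Iterable[Tuple[str, str]],  # (room_id, joining user_id)
    local_host: str,
    hosts_in_room: Mapping[str, Set[str]],
    local_users_in_room: Mapping[str, Set[str]],
) -> Dict[str, Set[str]]:
    """Map each remote server to the set of users whose presence it needs."""
    pending: Dict[str, Set[str]] = defaultdict(set)

    for room_id, user_id in joins:
        if host_of(user_id) == local_host:
            # Local join: every other server in the room needs this user.
            for host in hosts_in_room[room_id]:
                if host != local_host:
                    pending[host].add(user_id)
        else:
            # Remote join: that server needs all local users in the room.
            pending[host_of(user_id)].update(local_users_in_room[room_id])

    return dict(pending)
```

Because the values are sets, a thousand joins from one bridged server collapse into a single entry for that server, and sending starts only once the whole batch has been processed.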
* Update black, and run auto formatting over the codebase (#9381)Eric Eastwood2021-02-163-9/+18
|   - Update black version to the latest
|   - Run black auto formatting over the codebase
|   - Run autoformatting according to [`docs/code_style.md`](https://github.com/matrix-org/synapse/blob/80d6dc9783aa80886a133756028984dbf8920168/docs/code_style.md)
|   - Update `code_style.md` docs around installing black to use the correct version
* Precompute joined hosts and store in Redis (#9198)Erik Johnston2021-01-261-15/+35
|
* Fix not sending events over federation when using sharded event persisters (#8536)Erik Johnston2020-10-141-2/+7
|   * Fix outbound federation with multiple event persisters.
|
|     We incorrectly notified federation senders that the minimum persisted stream position had advanced when we got an `RDATA` from an event persister. Notifying of federation senders already correctly happens in the notifier, so we just delete the offending line.
|
|   * Change some interfaces to use RoomStreamToken.
|
|     By enforcing use of `RoomStreamTokens` we make it less likely that people pass in random ints that they got from somewhere random.
* Remove stream ordering from Metadata dict (#8452)Richard van der Hoff2020-10-052-0/+4
| There's no need for it to be in the dict as well as the events table. Instead, we store it in a separate attribute in the EventInternalMetadata object, and populate that on load.
|
| This means that we can rely on it being correctly populated for any event which has been persisted to the database.
* Fix malformed log line in new federation "catch up" logic (#8442)Richard van der Hoff2020-10-021-1/+1
|
* Add prometheus metrics to track federation delays (#8430)Richard van der Hoff2020-10-011-0/+22
| | | | | Add a pair of federation metrics to track the delays in sending PDUs to/from particular servers.
* Catch-up after Federation Outage (bonus): Catch-up on Synapse Startup (#8322)reivilibre2020-09-181-0/+51
| Signed-off-by: Olivier Wilkinson (reivilibre) <olivier@librepush.net>
| Co-authored-by: Patrick Cloke <clokep@users.noreply.github.com>
|
|   * Fix _set_destination_retry_timings
|
|     This came about because the code assumed that retry_interval could not be NULL, which has been challenged by catch-up.
* Catch-up after Federation Outage (split, 4): catch-up loop (#8272)reivilibre2020-09-151-4/+125
|
* Catch up after Federation Outage (split, 2): Track last successful stream ordering after transmission (#8247)reivilibre2020-09-041-0/+11
| Co-authored-by: Richard van der Hoff <1389908+richvdh@users.noreply.github.com>
* Catch-up after Federation Outage (split, 1) (#8230)reivilibre2020-09-041-2/+9
| | | Signed-off-by: Olivier Wilkinson (reivilibre) <olivier@librepush.net>
* Stop sub-classing object (#8249)Patrick Cloke2020-09-043-3/+3
|
* Remove obsolete order field in `send_new_transaction` (#8245)reivilibre2020-09-033-28/+22
| | | Co-authored-by: Richard van der Hoff <1389908+richvdh@users.noreply.github.com>
* Add type hints for state. (#8140)Patrick Cloke2020-08-241-2/+2
|
* Be stricter about JSON that is accepted by Synapse (#8106)Patrick Cloke2020-08-191-3/+2
|
* Convert stream database to async/await. (#8074)Patrick Cloke2020-08-172-2/+2
|
* Drop federation transmission queues during a significant remote outage. (#7864)reivilibre2020-08-131-0/+22
|   * Empty federation transmission queues when we are backing off.
|
|     Fixes #7828.
|
|     Signed-off-by: Olivier Wilkinson (reivilibre) <olivier@librepush.net>
|
|   * Address feedback
|
|     Signed-off-by: Olivier Wilkinson (reivilibre) <olivier@librepush.net>
|
|   * Reword newsfile
* Fix typing for notifier (#8064)Erik Johnston2020-08-121-2/+5
|
* Merge branch 'master' into developOlivier Wilkinson (reivilibre)2020-07-302-2/+2
|\
| * Update worker docs with recent enhancements (#7969)Erik Johnston2020-07-292-2/+2
| |
* | Convert federation client to async/await. (#7975)Patrick Cloke2020-07-301-13/+6
|/
* Convert state resolution to async/await (#7942)Patrick Cloke2020-07-241-1/+3
|
* Convert presence handler helpers to async/await. (#7939)Patrick Cloke2020-07-231-1/+3
|
* Add ability to run multiple pusher instances (#7855)Erik Johnston2020-07-162-9/+9
| | | This reuses the same scheme as federation sender sharding
* Remove obsolete comment.Olivier Wilkinson (reivilibre)2020-07-161-2/+0
| | | | | | | | It was correct at the time of our friend Jorik writing it (checking git blame), but the world has moved now and it is no longer a generator. Signed-off-by: Olivier Wilkinson (reivilibre) <olivier@librepush.net>
* Add ability to shard the federation sender (#7798)Erik Johnston2020-07-102-2/+68
|
* Fix some spelling mistakes / typos. (#7811)Patrick Cloke2020-07-091-2/+2
|
* Fix new metric where we used ms instead of seconds (#7771)Erik Johnston2020-07-011-1/+1
| | | | Introduced in #7755, not yet released.
* Add some metrics for inbound and outbound federation processing times (#7755)Erik Johnston2020-06-301-1/+9
|
* Replace iteritems/itervalues/iterkeys with native versions. (#7692)Patrick Cloke2020-06-151-3/+1
|
* add a commentRichard van der Hoff2020-05-211-0/+3
|
* Fix catchup-on-reconnect for the Federation Stream (#7374)Richard van der Hoff2020-05-053-9/+15
| | | | looks like we managed to break this during the refactorathon.
* Move catchup of replication streams to worker. (#7024)Erik Johnston2020-03-251-0/+9
| | | This changes the replication protocol so that the server does not send down `RDATA` for rows that happened before the client connected. Instead, the server will send a `POSITION` and clients then query the database (or master out of band) to get up to date.
* Add typing to synapse.federation.sender (#6871)Erik Johnston2020-02-073-101/+102
|
* Wake up transaction queue when remote server comes back online (#6706)Erik Johnston2020-01-171-2/+16
| | | | | This will be used to retry outbound transactions to a remote server if we think it might have come back up.
* Add StateMap type alias (#6715)Erik Johnston2020-01-161-1/+2
|
* Clean up newline quote marks around the codebase (#6362)Andrew Morgan2019-11-212-3/+3
|
* Merge branch 'develop' into cross-signing_federationHubert Chathi2019-10-312-7/+8
|\
| * Update black to 19.10b0 (#6304)Amber Brown2019-11-011-5/+6
| | | | | | * update version of black and also fix the mypy config being overridden
| * Remove usage of deprecated logger.warn method from codebase (#6271)Andrew Morgan2019-10-311-2/+2
| | | | | | Replace every instance of `logger.warn` with `logger.warning` as the former is deprecated.
* | rename get_devices_by_remote to get_device_updates_by_remoteHubert Chathi2019-10-301-2/+2
| |
* | Merge branch 'develop' into cross-signing_federationHubert Chathi2019-10-241-1/+1
|\|
| * Move storage classes into a main "data store".Erik Johnston2019-10-211-1/+1
| | | | | | | | | | This is in preparation for having multiple data stores that offer different functionality, e.g. splitting out state or event storage.
* | implement federation parts of cross-signingHubert Chathi2019-10-221-2/+2
|/
* add some metrics on the federation sender (#6160)Richard van der Hoff2019-10-031-5/+6
|
* use access methods (duh..)Jorik Schellekens2019-09-051-1/+3
| | | Co-Authored-By: Erik Johnston <erik@matrix.org>
* Link the send loop with the edus contextsJorik Schellekens2019-09-051-3/+8
| | | | | | The contexts were being filtered too early so the send loop wasn't being linked to them unless the destination was whitelisted.
* Propagate opentracing contexts through EDUs (#5852)Jorik Schellekens2019-08-221-73/+97
| | | | | Propagate opentracing contexts through EDUs Co-Authored-By: Richard van der Hoff <1389908+richvdh@users.noreply.github.com>
* Replace returnValue with return (#5736)Amber Brown2019-07-232-3/+3
|
* remove dead transaction persist code (#5622)Richard van der Hoff2019-07-051-9/+0
| | | | this hasn't done anything for years
* Move logging utilities out of the side drawer of util/ and into logging/ (#5606)Amber Brown2019-07-041-4/+8
|
* Run Black. (#5482)Amber Brown2019-06-203-58/+54
|
* Add experimental option to reduce extremities.Erik Johnston2019-06-181-0/+3
| | | | | | | Adds new config option `cleanup_extremities_with_dummy_events` which periodically sends dummy events to rooms with more than 10 extremities. THIS IS REALLY EXPERIMENTAL.
* Clean up code for sending federation EDUs. (#5381)Richard van der Hoff2019-06-131-14/+26
| | | | This code confused the hell out of me today. Split _get_new_device_messages into its two (unrelated) parts.
* Prevent multiple device list updates from breaking a batch send (#5156)Andrew Morgan2019-06-061-2/+3
| | | fixes #5153
* Run `black` on per_destination_queueRichard van der Hoff2019-05-091-35/+39
| | | | ... mostly to fix pep8 fails
* Limit the number of EDUs in transactions to 100 as expected by receiver (#5138)Quentin Dufour2019-05-091-26/+30
| | | Fixes #3951.
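A rough sketch of capping the EDUs taken for a single transaction. The limit of 100 is from the commit title; the constant and function names are hypothetical, and the real sender draws EDUs from several separate queues rather than one list.

```python
from typing import List, TypeVar

T = TypeVar("T")

# Limit stated in the commit title; the name is an assumption.
MAX_EDUS_PER_TRANSACTION = 100


def pop_edus_for_transaction(
    pending: List[T], limit: int = MAX_EDUS_PER_TRANSACTION
) -> List[T]:
    """Remove and return at most `limit` EDUs from the front of the queue.

    Anything beyond the limit stays queued for a later transaction instead
    of being sent in an oversized one.
    """
    batch = pending[:limit]
    del pending[:limit]
    return batch
```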
* Use event streams to calculate presenceErik Johnston2019-03-271-1/+18
| Primarily this fixes a bug in the handling of remote users joining a room where the server sent out the presence for all local users in the room to all servers in the room.
|
| We also change to using the state delta stream, rather than the distributor, as it will make it easier to split processing out of the master process (as well as being more flexible).
|
| Finally, when sending presence states to newly joined servers we filter out old presence states to reduce the number sent. Initially we filter out states that are offline and have a last active more than a week ago, though this can be changed down the line.
|
| Fixes #3962
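The filter described in the last paragraph could look roughly like this. It is a sketch under stated assumptions: the `PresenceState` dataclass, its field names, and the function name are hypothetical stand-ins for Synapse's own presence types. Only the rule itself (offline and last active more than a week ago) comes from the commit message.

```python
from dataclasses import dataclass
from typing import Iterable, List

WEEK_MS = 7 * 24 * 60 * 60 * 1000


@dataclass(frozen=True)
class PresenceState:
    """Hypothetical stand-in for a presence state record."""

    user_id: str
    state: str  # e.g. "online", "unavailable", "offline"
    last_active_ts: int  # milliseconds since the epoch


def states_worth_sending(
    states: Iterable[PresenceState], now_ms: int
) -> List[PresenceState]:
    """Drop states that are offline AND were last active over a week ago."""
    return [
        s
        for s in states
        if not (s.state == "offline" and now_ms - s.last_active_ts > WEEK_MS)
    ]
```

Both conditions must hold for a state to be dropped, so a user who is online but has been idle for a long time is still sent.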
* Batch up outgoing read-receipts to reduce federation traffic. (#4890)Richard van der Hoff2019-03-202-21/+158
| | | | Rate-limit outgoing read-receipts as per #4730.
* Rename and move the classesRichard van der Hoff2019-03-133-0/+853