path: root/synapse/federation (follow)
Commit message / Author / Age / Files / Lines
* Regularly try to wake up dests instead of waiting for next PDU/EDU (#15743)Mathieu Velten2023-06-161-18/+16
|
* Fix MSC3983 support: only one OTK per device was returned through federation ↵Mathieu Velten2023-06-131-1/+3
| | | | (#15770)
* Fix MSC3983 support: Use the unstable /keys/claim federation endpoint if ↵Patrick Cloke2023-06-131-1/+4
| | | | multiple keys are requested (#15755)
* Update error to more plainly explain we can only authorize our own events ↵Eric Eastwood2023-06-061-1/+1
| | | | (#15725)
* Add stricter mypy options (#15694)Patrick Cloke2023-05-311-2/+2
| | | | Enable warn_unused_configs, strict_concatenate, disallow_subclassing_any, and disallow_incomplete_defs.
* Remove unused `FederationServer.__str__` override (#15690)Sean Quah2023-05-301-3/+0
| | | Signed-off-by: Sean Quah <seanq@matrix.org>
* Add requesting user id parameter to key claim methods in ↵Shay2023-05-242-5/+17
| | | | `TransportLayerClient` (#15663)
* Remove experimental configuration flags & unstable values for faster joins ↵Patrick Cloke2023-05-193-39/+4
| | | | | | | (#15625) Synapse will no longer send (or respond to) the unstable flags for faster joins. These were only available behind a configuration flag and handled in parallel with the stable flags.
* Factor out an `is_mine_server_name` method (#15542)Sean Quah2023-05-057-13/+19
| | | | | | | | | | | | Add an `is_mine_server_name` method, similar to `is_mine_id`. Ideally we would use this consistently, instead of sometimes comparing against `hs.hostname` and other times reaching into `hs.config.server.server_name`. Also fix a bug in the tests where `hs.hostname` would sometimes differ from `hs.config.server.server_name`. Signed-off-by: Sean Quah <seanq@matrix.org>
* Add support for claiming multiple OTKs at once. (#15468)Patrick Cloke2023-04-274-14/+116
| | | | | | | MSC3983 provides a way to request multiple OTKs at once from appservices; this extends the concept to the Client-Server API. Note that this will likely be split out into a separate MSC, but is currently part of MSC3983.
* Add unstable /keys/claim endpoint which always returns fallback keys. (#15462)Patrick Cloke2023-04-253-3/+32
| | | | | | | | | | | | | It can be useful to always return the fallback key when attempting to claim keys. This adds an unstable endpoint for `/keys/claim` which always returns fallback keys in addition to one-time-keys. The fallback key(s) are not marked as "used" unless there are no corresponding OTKs. This is currently defined in MSC3983 (although likely to be split out to a separate MSC). The endpoint shape may change or be requested differently (i.e. a keyword parameter on the current endpoint), but the core logic should be reasonable.
* Finish type hints for federation client HTTP code. (#15465)Patrick Cloke2023-04-242-10/+15
|
* Move Spam Checker callbacks to a dedicated file (#15453)Andrew Morgan2023-04-182-6/+8
|
* Implement MSC3984 to proxy /keys/query requests to appservices. (#15321)Patrick Cloke2023-03-301-42/+6
| | | | | If enabled, for users which are exclusively owned by an application service then the appservice will be queried for devices in addition to any information stored in the Synapse database.
* Implement MSC3983 to proxy /keys/claim queries to appservices. (#15314)Patrick Cloke2023-03-281-10/+10
| | | | | | Experimental support for MSC3983 is behind a configuration flag. If enabled, for users which are exclusively owned by an application service then the appservice will be queried for one-time keys *if* there are none uploaded to Synapse.
* Add developer documentation for the Federation Sender and add a ↵reivilibre2023-03-241-0/+113
| | | | | | documentation mechanism using Sphinx. (#15265) Co-authored-by: Patrick Cloke <clokep@users.noreply.github.com>
* Add a primitive helper script for listing worker endpoints. (#15243)reivilibre2023-03-232-0/+29
| | | | Co-authored-by: Patrick Cloke <patrickc@matrix.org>
* Change the parameter `immediate` of `send_device_messages` to default to ↵Shay2023-03-212-2/+2
| | | | `True` (#15297)
* Ensure fed-sender catchup does not block for full state (#15248)David Robertson2023-03-131-2/+7
| | | | * Reproduce bad scenario in test * Avoid catchup optimisation for partial state rooms
* Refactor `filter_events_for_server` (#15240)David Robertson2023-03-101-0/+2
| | | | | | | | | | | | | | | | | * Tweak docstring and type hint * Flip logic and provide better name * Separate decision from action * Track a set of strings, not EventBases * Require explicit boolean options from callers * Add explicit option for partial state rooms * Changelog * Rename param
* Bump black from 22.12.0 to 23.1.0 (#15103)dependabot[bot]2023-02-221-2/+2
|
* Fix federated joins when the first server in the list is not in the room ↵Sean Quah2023-02-151-6/+5
| | | | | | | | (#15074) Previously we would give up upon receiving a 404 from the first server, instead of trying the rest of the servers in the list. Signed-off-by: Sean Quah <seanq@matrix.org>
* Faster joins: don't stall when a user joins during a fast join (#14606)Mathieu Velten2023-02-101-1/+1
| | | | | | | | | | | | | | | | Fixes #12801. Complement tests are at https://github.com/matrix-org/complement/pull/567. Avoid blocking on full state when handling a subsequent join into a partial state room. Also always perform a remote join into partial state rooms, since we do not know whether the joining user has been banned and want to avoid leaking history to banned users. Signed-off-by: Mathieu Velten <mathieuv@matrix.org> Co-authored-by: Sean Quah <seanq@matrix.org> Co-authored-by: David Robertson <davidr@element.io>
* Return read-only collections from `@cached` methods (#13755)Sean Quah2023-02-101-1/+2
| | | | | | | | | | | | | It's important that collections returned from `@cached` methods are not modified, otherwise future retrievals from the cache will return the modified collection. This applies to the return values from `@cached` methods and the values inside the dictionaries returned by `@cachedList` methods. It's not necessary for the dictionaries returned by `@cachedList` methods themselves to be read-only. Signed-off-by: Sean Quah <seanq@matrix.org> Co-authored-by: David Robertson <davidr@element.io>
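A minimal sketch of the cache-poisoning problem described in #13755. It uses `functools.lru_cache` as a stand-in for Synapse's `@cached` decorator, and the function names are hypothetical; it only illustrates why a cached method should hand out an immutable collection.

```python
from functools import lru_cache
from typing import List, Tuple


@lru_cache(maxsize=None)
def get_servers_mutable(room_id: str) -> List[str]:
    # Hypothetical lookup: the very same list object is handed to every caller.
    return ["a.example", "b.example"]


@lru_cache(maxsize=None)
def get_servers_readonly(room_id: str) -> Tuple[str, ...]:
    # Returning a tuple means callers cannot modify the cached value in place.
    return ("a.example", "b.example")


# A caller mutates the returned list...
first = get_servers_mutable("!room")
first.append("evil.example")

# ...and every later retrieval from the cache sees the modification.
poisoned = get_servers_mutable("!room")

# The read-only variant has no mutating methods to misuse.
safe = get_servers_readonly("!room")
```

With the tuple-returning variant, a caller that needs a modified collection has to build its own copy, so the cached value stays as it was stored.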
* Faster joins: Refactor handling of servers in room (#14954)Sean Quah2023-02-032-12/+23
| | | | | | | | | | | | | | Ensure that the list of servers in a partial state room always contains the server we joined off. Also refactor `get_partial_state_servers_at_join` to return `None` when the given room is no longer partial stated, to explicitly indicate when the room has partial state. Otherwise it's not clear whether an empty list means that the room has full state, or the room is partial stated, but the server we joined off told us that there are no servers in the room. Signed-off-by: Sean Quah <seanq@matrix.org>
* Add helper to parse an enum from query args & use it. (#14956)Patrick Cloke2023-02-014-15/+27
| | | | | | | | The `parse_enum` helper pulls an enum value from the query string (by delegating down to the parse_string helper with values generated from the enum). This is used to pull out "f" and "b" in most places and then we thread the resulting Direction enum throughout more code.
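The idea behind the helper in #14956 can be sketched as follows. This is an illustration only: `parse_enum_arg` and its dict-based signature are hypothetical simplifications (the real helper works on a request object), and the `Direction` members are assumed from the "f" and "b" values mentioned above.

```python
from enum import Enum
from typing import Mapping, Optional, Type, TypeVar

E = TypeVar("E", bound=Enum)


class Direction(Enum):
    # Assumed member names; the values "f" and "b" come from the commit message.
    FORWARDS = "f"
    BACKWARDS = "b"


def parse_enum_arg(
    args: Mapping[str, str],
    name: str,
    enum_class: Type[E],
    default: Optional[E] = None,
) -> Optional[E]:
    """Read a query parameter and convert it to a member of enum_class."""
    raw = args.get(name)
    if raw is None:
        return default
    # The allowed strings are generated from the enum itself.
    allowed = [member.value for member in enum_class]
    if raw not in allowed:
        raise ValueError(f"Query parameter {name!r} must be one of {allowed}")
    return enum_class(raw)
```

Callers then pass the resulting `Direction` member around instead of comparing raw strings.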
* Tag /send_join responses to detect faster joins (#14950)David Robertson2023-01-311-0/+6
| | | | | | | | | * Tag /send_join responses to detect faster joins * Changelog * Define a proper SynapseTag * isort
* Reject boolean power levels (#14944)David Robertson2023-01-311-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | * Better test for bad values in power levels events The previous test only checked that Synapse didn't raise an exception, but didn't check that we had correctly interpreted the value of the dodgy power level. It also conflated two things: bad room notification levels, and bad user levels. There _is_ logic for converting the latter to integers, but we should test it separately. * Check we ignore types that don't convert to int * Handle `None` values in `notifications.room` * Changelog * Also test that bad values are rejected by event auth * Docstring * linter scripttttttttt * Test boolean values in PL content * Reject boolean power levels * Changelog
* Prefer `type(x) is int` to `isinstance(x, int)` (#14945)David Robertson2023-01-311-1/+1
|   * Prefer `type(x) is int` to `isinstance(x, int)`
|
|   This covered all additional instances I could see where `x` was user-controlled. The remaining cases are
|
|   ```
|   $ rg -s 'isinstance.*[^_]int'
|   tests/replication/_base.py
|   576: if isinstance(obj, int):
|
|   synapse/util/caches/stream_change_cache.py
|   136: assert isinstance(stream_pos, int)
|   214: assert isinstance(stream_pos, int)
|   246: assert isinstance(stream_pos, int)
|   267: assert isinstance(stream_pos, int)
|
|   synapse/replication/tcp/external_cache.py
|   133: if isinstance(result, int):
|
|   synapse/metrics/__init__.py
|   100: if isinstance(calls, (int, float)):
|
|   synapse/handlers/appservice.py
|   262: assert isinstance(new_token, int)
|
|   synapse/config/_util.py
|   62: if isinstance(p, int):
|   ```
|
|   which cover metrics, logic related to `jsonschema`, and replication and data streams. AFAICS these are all internal to Synapse
|
|   * Changelog
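The difference between the two checks matters because `bool` is a subclass of `int` in Python. A small sketch (the helper names are hypothetical) of how a boolean slips through `isinstance` but not through an exact type check:

```python
def is_loose_int(value: object) -> bool:
    # Accepts bool too, because bool is a subclass of int.
    return isinstance(value, int)


def is_strict_int(value: object) -> bool:
    # Accepts only objects whose exact type is int.
    return type(value) is int


# A user-controlled power level of `true` passes the loose check only.
loose_accepts_bool = is_loose_int(True)
strict_accepts_bool = is_strict_int(True)
```

This is the same reasoning as in "Reject boolean power levels" (#14944): a JSON `true` should not be treated as the integer 1.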
* Bump the client-side timeout for /state (#14912)David Robertson2023-01-251-0/+4
| | | | | | | | | | | * Bump the client-side timeout for /state to give faster joins resyncs the chance to complete for large rooms. We have seen this fare poorly (~90s for Matrix HQ's /state) in testing, causing the resync to advance to another HS who hasn't seen our join yet. * Changelog * Milliseconds!!!!
* Faster joins: Fix incompatibility with restricted joins (#14882)Sean Quah2023-01-221-0/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | * Avoid clearing out forward extremities when doing a second remote join When joining a restricted room where the local homeserver does not have a user able to issue invites, we perform a second remote join. We want to avoid clearing out forward extremities in this case because the forward extremities we have are up to date and clearing out forward extremities creates a window in which the room can get bricked if Synapse crashes. Signed-off-by: Sean Quah <seanq@matrix.org> * Do a full join when doing a second remote join into a full state room We cannot persist a partial state join event into a joined full state room, so we perform a full state join for such rooms instead. As a future optimization, we could always perform a partial state join and compute or retrieve the full state ourselves if necessary. Signed-off-by: Sean Quah <seanq@matrix.org> * Add lock around partial state flag for rooms Signed-off-by: Sean Quah <seanq@matrix.org> * Preserve partial state info when doing a second partial state join Signed-off-by: Sean Quah <seanq@matrix.org> * Add newsfile * Add a TODO(faster_joins) marker Signed-off-by: Sean Quah <seanq@matrix.org>
* Stabilise serving partial join responses (#14839)David Robertson2023-01-171-11/+10
| | | | | Serving partial join responses is no longer experimental. They will only be served under the stable identifier if the undocumented config flag experimental.msc3706_enabled is set to true. Synapse continues to request a partial join only if the undocumented config flag experimental.faster_joins is set to true; this setting remains present and unaffected.
* Add parameter to control whether we do a partial state join (#14843)Sean Quah2023-01-162-5/+23
| | | | | | | When the local homeserver is already joined to a room and wants to perform another remote join, we may find it useful to do a non-partial state join if we already have the full state for the room. Signed-off-by: Sean Quah <seanq@matrix.org>
* Also use stable name in SendJoinResponse struct (#14841)David Robertson2023-01-163-11/+13
| | | | | | | | | | | | | | | | | * Also use stable name in SendJoinResponse struct follow-up to #14832 * Changelog * Fix a rename I missed * Run black * Update synapse/federation/federation_client.py Co-authored-by: Sean Quah <8349537+squahtx@users.noreply.github.com> Co-authored-by: Sean Quah <8349537+squahtx@users.noreply.github.com>
* Use stable identifiers for faster joins (#14832)David Robertson2023-01-133-3/+30
| | | | | | | | | | | * Use new query param when requesting a partial join * Read new query param when serving partial join * Provide new field names when serving partial joins * Read new field names from partial join response * Changelog
* Failover on proper error responses. (#14620)Patrick Cloke2022-12-061-9/+20
| | | | When querying a remote server handle a 404/405 with an errcode of M_UNRECOGNIZED as an unimplemented endpoint.
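A rough sketch of the rule stated in #14620, under simplifying assumptions: `RemoteHttpError` and `try_destinations` are hypothetical names, and in this sketch any error other than an unimplemented endpoint is simply re-raised, which is not necessarily how Synapse treats every other failure.

```python
from typing import Callable, Iterable, Optional, Tuple, TypeVar

T = TypeVar("T")


class RemoteHttpError(Exception):
    """Hypothetical error carrying the HTTP status and Matrix errcode."""

    def __init__(self, code: int, errcode: Optional[str] = None) -> None:
        super().__init__(f"{code} {errcode}")
        self.code = code
        self.errcode = errcode


def is_unimplemented_endpoint(err: RemoteHttpError) -> bool:
    # A 404/405 with M_UNRECOGNIZED means the remote lacks this endpoint.
    return err.code in (404, 405) and err.errcode == "M_UNRECOGNIZED"


def try_destinations(
    destinations: Iterable[str], request: Callable[[str], T]
) -> Tuple[str, T]:
    """Return the first successful (destination, response) pair."""
    for destination in destinations:
        try:
            return destination, request(destination)
        except RemoteHttpError as e:
            if not is_unimplemented_endpoint(e):
                raise  # simplification: other errors are not retried here
    raise LookupError("no destination implements the endpoint")
```

The point of checking the errcode as well as the status is that a plain 404 can also mean "the resource does not exist", which is a different situation from an endpoint the remote server does not implement.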
* Improve logging and opentracing for to-device message handling (#14598)Richard van der Hoff2022-12-061-1/+1
| | | | | | | A batch of changes intended to make it easier to trace to-device messages through the system. The intention here is that a client can set a property org.matrix.msgid in any to-device message it sends. That ID is then included in any tracing or logging related to the message. (Suggestions as to where this field should be documented welcome. I'm not enthusiastic about speccing it - it's very much an optional extra to help with debugging.) I've also generally improved the data we send to opentracing for these messages.
* Use servers list approx to send read receipts when in partial state (#14549)Mathieu Velten2022-11-301-1/+1
| | | Signed-off-by: Mathieu Velten <mathieuv@matrix.org>
* Move MSC3030 `/timestamp_to_event` endpoint to stable v1 location (#14471)Eric Eastwood2022-11-284-14/+14
|   Fix https://github.com/matrix-org/synapse/issues/14390
|
|   - Client API: `/_matrix/client/unstable/org.matrix.msc3030/rooms/<roomID>/timestamp_to_event?ts=<timestamp>&dir=<direction>` -> `/_matrix/client/v1/rooms/<roomID>/timestamp_to_event?ts=<timestamp>&dir=<direction>`
|   - Federation API: `/_matrix/federation/unstable/org.matrix.msc3030/timestamp_to_event/<roomID>?ts=<timestamp>&dir=<direction>` -> `/_matrix/federation/v1/timestamp_to_event/<roomID>?ts=<timestamp>&dir=<direction>`
|
|   Complement test changes: https://github.com/matrix-org/complement/pull/559
* Include thread information when sending receipts over federation. (#14466)Patrick Cloke2022-11-281-63/+120
| | | | | | | | | | | | Include the thread_id field when sending read receipts over federation. This might result in the same user having multiple read receipts per-room, meaning multiple EDUs must be sent to encapsulate those receipts. This restructures the PerDestinationQueue APIs to support multiple receipt EDUs. queue_read_receipt now becomes linear time in the number of queued threaded receipts in the room for the given user; this is expected to be a small number since receipt EDUs are sent as filler in transactions.
* Faster joins: use initial list of servers if we don't have the full state ↵Mathieu Velten2022-11-241-1/+17
| | | | | | | yet (#14408) Signed-off-by: Mathieu Velten <mathieuv@matrix.org> Co-authored-by: Sean Quah <8349537+squahtx@users.noreply.github.com>
* Faster joins: filter out non local events when a room doesn't have its full ↵Mathieu Velten2022-11-211-0/+1
| | | | | | state (#14404) Signed-off-by: Mathieu Velten <mathieuv@matrix.org>
* Remove redundant types from comments. (#14412)Patrick Cloke2022-11-162-8/+7
| | | | | | | Remove type hints from comments which have been added as Python type hints. This helps avoid drift between comments and reality, as well as removing redundant information. Also adds some missing type hints which were simple to fill in.
* Include heroes in partial join responses' state (#14442)David Robertson2022-11-151-4/+19
| | | | | | | | | | | * Pull out hero selection logic * Include heroes in partial join response's state * Changelog * Fixup trial test * Remove TODO
* Fix typo in #13320 which could cause log spam (#14347)David Robertson2022-11-011-1/+1
|
* Refactor MSC3030 `/timestamp_to_event` to move away from our snowflake pull ↵Eric Eastwood2022-10-261-21/+109
|   from `destination` pattern (#14096)
|
|   1. `federation_client.timestamp_to_event(...)` now handles all `destination` looping and uses our generic `_try_destination_list(...)` helper.
|   2. Consistently handling `NotRetryingDestination` and `FederationDeniedError` across `get_pdu`, backfill, and the generic `_try_destination_list` which is used for many places we use this pattern.
|   3. `get_pdu(...)` now returns `PulledPduInfo` so we know which `destination` we ended up pulling the PDU from
* Merge branch 'release-v1.70' into developOlivier Wilkinson (reivilibre)2022-10-251-2/+1
|\
| * Fix `TypeError: 'dict_keys' object is not reversible` (#14280)Erik Johnston2022-10-241-2/+1
| |
* | Stop returning `unsigned.invite_room_state` in `PUT ↵Andrew Morgan2022-10-201-0/+5
| | | | | | | | | | /_matrix/federation/v2/invite/{roomId}/{eventId}` responses (#14064) Co-authored-by: David Robertson <davidr@element.io>
* | Explain `SynapseError` and `FederationError` better (#14191)Eric Eastwood2022-10-191-0/+8
|/  Spawning from https://github.com/matrix-org/synapse/pull/13816#discussion_r993262622
* Don't require optional `invite_room_state` field on fed v2 invite (#14083)Andrew Morgan2022-10-141-1/+1
|
* Correct field name for stripped state events when knocking. ↵Andrew Morgan2022-10-122-2/+9
| | | | `knock_state_events` -> `knock_room_state` (#14102)
* Fix a bug where redactions were not being sent over federation if we did not ↵Shay2022-10-111-9/+20
| | | | have the original event. (#13813)
* Always close _all_ `ijson` coroutines, even if doing so raises Exceptions ↵David Robertson2022-10-061-4/+25
| | | | (#14065)
* Track when the pulled event signature fails (#13815)Eric Eastwood2022-10-032-13/+62
| | | | | | | | | Because we're doing the recording in `_check_sigs_and_hash_for_pulled_events_and_fetch` (previously named `_check_sigs_and_hash_and_fetch`), this means we will track signature failures for `backfill`, `get_room_state`, `get_event_auth`, and `get_missing_events` (all pulled event scenarios). And we also record signature failures from `get_pdu`. Part of https://github.com/matrix-org/synapse/issues/13700 Part of https://github.com/matrix-org/synapse/issues/13676 and https://github.com/matrix-org/synapse/issues/13356 This PR will be especially important for https://github.com/matrix-org/synapse/pull/13816 so we can avoid the costly `_get_state_ids_after_missing_prev_event` down the line when `/messages` calls backfill.
* Prioritize outbound to-device over device list updates (#13922)Erik Johnston2022-09-271-13/+16
| | | Otherwise device list changes for large accounts can temporarily delay to-device messages.
* Faster Remote Room Joins: tell remote homeservers that we are unable to ↵reivilibre2022-09-231-8/+3
| | | | authorise them if they query a room which has partial state on our server. (#13823)
* Don't include redundant prev_state in new events (#13791)Denis2022-09-201-3/+0
|
* Fix a long-standing spec compliance bug where Synapse would accept a ↵reivilibre2022-09-141-2/+1
|   trailing slash on the end of `/get_missing_events` federation requests. (#13789)
|
|   * Don't accept a trailing slash on the end of /get_missing_events
|   * Newsfile
|
|   Signed-off-by: Olivier Wilkinson (reivilibre) <oliverw@matrix.org>
* Fix Prometheus recording rules to not use legacy metric names. (#13718)reivilibre2022-09-081-2/+2
|
* Rename the `EventFormatVersions` enum values so that they line up with room ↵reivilibre2022-09-072-2/+2
| | | | version numbers. (#13706)
* Add some logging to help track down #13444 (#13679)Erik Johnston2022-09-011-0/+13
|
* Generalise the `@cancellable` annotation so it can be used on functions ↵reivilibre2022-08-311-2/+3
| | | | other than just servlet methods. (#13662)
* Faster Room Joins: fix `/make_knock` blocking indefinitely when the room in ↵reivilibre2022-08-241-0/+11
| | | | | question is a partial-stated room. (#13583) Co-authored-by: Sean Quah <8349537+squahtx@users.noreply.github.com>
* Instrument `_check_sigs_and_hash_and_fetch` to trace time spent in child ↵Eric Eastwood2022-08-232-3/+42
| | | | | | | | | concurrent calls (#13588) Instrument `_check_sigs_and_hash_and_fetch` to trace time spent in child concurrent calls because I've seen `_check_sigs_and_hash_and_fetch` take [10.41s to process 100 events](https://github.com/matrix-org/synapse/issues/13587) Fix https://github.com/matrix-org/synapse/issues/13587 Part of https://github.com/matrix-org/synapse/issues/13356
* Instrument the federation/backfill part of `/messages` (#13489)Eric Eastwood2022-08-161-1/+26
| | | | | | | | | Instrument the federation/backfill part of `/messages` so it's easier to follow what's going on in Jaeger when viewing a trace. Split out from https://github.com/matrix-org/synapse/pull/13440 Follow-up from https://github.com/matrix-org/synapse/pull/13368 Part of https://github.com/matrix-org/synapse/issues/13356
* Instrument `FederationStateIdsServlet` - `/state_ids` (#13499)Eric Eastwood2022-08-151-1/+10
| | | Instrument FederationStateIdsServlet - `/state_ids` so it's easier to follow what's going on in Jaeger when viewing a trace.
* Faster Room Joins: prevent Synapse from answering federated join requests ↵reivilibre2022-08-041-0/+17
| | | | for a room which it has not fully joined yet. (#13416)
* Instrument `/messages` for understandable traces in Jaeger (#13368)Eric Eastwood2022-08-031-0/+2
| | | | | | In Jaeger: - Before: huge list of uncategorized database calls - After: nice and collapsible into units of work
* Implement MSC3848: Introduce errcodes for specific event sending failures ↵Will Hunt2022-07-271-1/+1
| | | | | (#13343) Implements MSC3848
* Make minor clarifications to the error messages given when we fail to join a ↵reivilibre2022-07-271-1/+7
| | | | room via any server. (#13160)
* Fix `get_pdu` asking every remote destination even after it finds an event ↵Eric Eastwood2022-07-271-3/+3
| | | | (#13346)
* Add missing types to opentracing. (#13345)Patrick Cloke2022-07-211-1/+1
| | | After this change `synapse.logging` is fully typed.
* Update `get_pdu` to return the original, pristine `EventBase` (#13320)Eric Eastwood2022-07-201-42/+81
| | | | | | | | | | | | Update `get_pdu` to return the untouched, pristine `EventBase` as it was originally seen over federation (no metadata added). Previously, we returned the same `event` reference that we stored in the cache which downstream code modified in place and added metadata like setting it as an `outlier` and essentially poisoned our cache. Now we always return a copy of the `event` so the original can stay pristine in our cache and re-used for the next cache call. Split out from https://github.com/matrix-org/synapse/pull/13205 As discussed at: - https://github.com/matrix-org/synapse/pull/13205#discussion_r918365746 - https://github.com/matrix-org/synapse/pull/13205#discussion_r918366125 Related to https://github.com/matrix-org/synapse/issues/12584. This PR doesn't fix that issue because it hits [`get_event` which exists from the local database before it tries to `get_pdu`](https://github.com/matrix-org/synapse/blob/7864f33e286dec22368dc0b11c06eebb1462a51e/synapse/federation/federation_client.py#L581-L594).
* Add type annotations to `trace` decorator. (#13328)Patrick Cloke2022-07-192-2/+2
| | | | Functions that are decorated with `trace` are now properly typed and the type hints for them are fixed.
* Rate limit joins per-room (#13276)David Robertson2022-07-191-0/+16
|
* Federation Sender & Appservice Pusher Stream Optimisations (#13251)Nick Mills-Barrett2022-07-151-3/+7
| | | | | | | | | | | | | * Replace `get_new_events_for_appservice` with `get_all_new_events_stream` The functions were near identical and this brings the AS worker closer to the way federation senders work which can allow for multiple workers to handle AS traffic. * Pull received TS alongside events when processing the stream This avoids an extra query -per event- when both federation sender and appservice pusher process events.
* Handle race between persisting an event and un-partial stating a room (#13100)Sean Quah2022-07-051-3/+15
|   Whenever we want to persist an event, we first compute an event context, which includes the state at the event and a flag indicating whether the state is partial. After a lot of processing, we finally try to store the event in the database, which can fail for partial state events when the containing room has been un-partial stated in the meantime.
|
|   We detect the race as a foreign key constraint failure in the data store layer and turn it into a special `PartialStateConflictError` exception, which makes its way up to the method in which we computed the event context.
|
|   To make things difficult, the exception needs to cross a replication request: `/fed_send_events` for events coming over federation and `/send_event` for events from clients. We transport the `PartialStateConflictError` as a `409 Conflict` over replication and turn `409`s back into `PartialStateConflictError`s on the worker making the request.
|
|   All client events go through `EventCreationHandler.handle_new_client_event`, which is called in *a lot* of places. Instead of trying to update all the code which creates client events, we turn the `PartialStateConflictError` into a `429 Too Many Requests` in `EventCreationHandler.handle_new_client_event` and hope that clients take it as a hint to retry their request.
|
|   On the federation event side, there are 7 places which compute event contexts. 4 of them use outlier event contexts: `FederationEventHandler._auth_and_persist_outliers_inner`, `FederationHandler.do_knock`, `FederationHandler.on_invite_request` and `FederationHandler.do_remotely_reject_invite`. These events won't have the partial state flag, so we do not need to do anything for them.
|
|   The remaining 3 paths which create events are `FederationEventHandler.process_remote_join`, `FederationEventHandler.on_send_membership_event` and `FederationEventHandler._process_received_pdu`.
|
|   We can't experience the race in `process_remote_join`, unless we're handling an additional join into a partial state room, which currently blocks, so we make no attempt to handle it correctly.
|
|   `on_send_membership_event` is only called by `FederationServer._on_send_membership_event`, so we catch the `PartialStateConflictError` there and retry just once.
|
|   `_process_received_pdu` is called by `on_receive_pdu` for incoming events and `_process_pulled_event` for backfill. The latter should never try to persist partial state events, so we ignore it. We catch the `PartialStateConflictError` in `on_receive_pdu` and retry just once.
|
|   Referring to the graph of code paths in https://github.com/matrix-org/synapse/issues/12988#issuecomment-1156857648 may make the above make more sense.
|
|   Signed-off-by: Sean Quah <seanq@matrix.org>
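The "catch the `PartialStateConflictError` and retry just once" pattern from #13100 can be sketched as below. This is a synchronous simplification with hypothetical helper names (`persist_with_one_retry`, `compute_context`, `persist`); the real code paths are asynchronous and the exception class here is only a local stand-in.

```python
from typing import Callable, TypeVar

C = TypeVar("C")
R = TypeVar("R")


class PartialStateConflictError(Exception):
    """Stand-in: the room was un-partial stated while we were working."""


def persist_with_one_retry(
    compute_context: Callable[[], C], persist: Callable[[C], R]
) -> R:
    """Compute an event context and persist; on a conflict, redo both once."""
    context = compute_context()
    try:
        return persist(context)
    except PartialStateConflictError:
        # The partial-state flag in the old context is stale, so recompute
        # the context before the single retry. A second conflict propagates.
        context = compute_context()
        return persist(context)
```

Recomputing the context is the important part: retrying with the stale context would hit the same conflict again.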
* Stop depending on `room_id` to be returned for children state in the ↵Patrick Cloke2022-06-101-4/+0
| | | | | | | | | | | hierarchy response. (#12991) The `room_id` field was removed from MSC2946 before it was accepted. It was initially kept for backwards compatibility and should be removed now that the stable form of the API is used. This change only stops Synapse from validating that it is returned, a future PR will remove returning it as part of the response.
* Fix Synapse git info missing in version strings (#12973)David Robertson2022-06-071-2/+2
|
* Reduce amount of state we pull out when attempting to send catchup PDUs. ↵Erik Johnston2022-06-071-11/+20
| | | | | | | | | (#12963) * Don't pull out state for catchup * Newsfile * Merge newsfile
* Reduce state pulled from DB due to sending typing and receipts over ↵Erik Johnston2022-06-061-1/+5
| | | | | federation (#12964) Reducing the amount of state we pull from the DB is useful as fetching state is expensive in terms of DB, CPU and memory.
* Reduce the amount of state we pull from the DB (#12811)Erik Johnston2022-06-062-8/+5
|
* Wait for lazy join to complete when getting current state (#12872)Erik Johnston2022-06-011-1/+3
|
* Improve logging when signature checks fail (#12925)Richard van der Hoff2022-05-313-65/+94
| | | | | | | | | | | | | * Raise a dedicated `InvalidEventSignatureError` from `_check_sigs_on_pdu` * Downgrade logging about redactions to DEBUG this can be very spammy during a room join, and it's not very useful. * Raise `InvalidEventSignatureError` from `_check_sigs_and_hash` ... and, more importantly, move the logging out to the callers. * changelog
* Faster room joins: Try other destinations when resyncing the state of a ↵Sean Quah2022-05-311-1/+4
| | | | | | | partial-state room (#12812) Signed-off-by: Sean Quah <seanq@matrix.org>
* Merge branch 'master' into developErik Johnston2022-05-311-2/+1
|\
| * Fix import in module_api module and docs on the new check_event_for_spam ↵Brendan Abolivier2022-05-311-2/+1
| | | | | | | | | | signature (#12918) Co-authored-by: Richard van der Hoff <1389908+richvdh@users.noreply.github.com>
* | Rename storage classes (#12913)Erik Johnston2022-05-311-1/+0
| |
* | Additional constants for EDU types. (#12884)Patrick Cloke2022-05-274-6/+15
| | | | | | Instead of hard-coding strings in many places.
* | Remove federation client code for groups. (#12563)Patrick Cloke2022-05-271-483/+0
| |
* | Merge tag 'v1.60.0rc2' into developSean Quah2022-05-271-2/+7
|\|   Synapse 1.60.0rc2 (2022-05-27)
| |   ==============================
| |
| |   This release of Synapse adds a unique index to the `state_group_edges` table, in order to prevent accidentally introducing duplicate information (for example, because a database backup was restored multiple times). If your Synapse database already has duplicate rows in this table, this could fail with an error and require manual remediation.
| |
| |   Additionally, the signature of the `check_event_for_spam` module callback has changed. The previous signature has been deprecated and remains working for now. Module authors should update their modules to use the new signature where possible.
| |
| |   See [the upgrade notes](https://github.com/matrix-org/synapse/blob/develop/docs/upgrade.md#upgrading-to-v1600) for more details.
| |
| |   Features
| |   --------
| |
| |   - Add an option allowing users to use their password to reauthenticate for privileged actions even though password login is disabled. ([\#12883](https://github.com/matrix-org/synapse/issues/12883))
| |
| |   Bugfixes
| |   --------
| |
| |   - Explicitly close `ijson` coroutines once we are done with them, instead of leaving the garbage collector to close them. ([\#12875](https://github.com/matrix-org/synapse/issues/12875))
| |
| |   Internal Changes
| |   ----------------
| |
| |   - Improve URL previews by not including the content of media tags in the generated description. ([\#12887](https://github.com/matrix-org/synapse/issues/12887))
| * Close `ijson` coroutines ourselves instead of letting the GC close them (#12875)Sean Quah2022-05-271-2/+7
| | | | | | | | | | | | | | Hopefully this means that exceptions raised due to truncated JSON get a sensible logging context and stack. Signed-off-by: Sean Quah <seanq@matrix.org>
* | Remove unstable APIs for /hierarchy. (#12851)Patrick Cloke2022-05-261-5/+0
| | | | | | | | Removes the unstable endpoint as well as a duplicated field which was modified during stabilization.
* | Avoid attempting to delete push actions for remote users. (#12879)Patrick Cloke2022-05-261-1/+1
| | | | | | | | Remote users will never have push actions, so we can avoid a database round-trip/transaction completely.
* | Allow bigger responses to `/federation/v1/state` (#12877)Richard van der Hoff2022-05-251-7/+8
| | | | | | | | | | | | | | | | | | | | | | * Refactor HTTP response size limits Rather than passing a separate `max_response_size` down the stack, make it an attribute of the `parser`. * Allow bigger responses on `federation/v1/state` `/state` can return huge responses, so we need to handle that.
* | Remove user-visible groups/communities code (#12553)Patrick Cloke2022-05-253-917/+1
|/ | | | | | | | | Makes it so that groups/communities no longer exist from a user-POV. E.g. we remove: * All API endpoints (including Client-Server, Server-Server, and admin). * Documented configuration options (and the experimental flag, which is now unused). * Special handling during room upgrades. * The `groups` section of the `/sync` response.
* Uniformize spam-checker API, part 2: check_event_for_spam (#12808)David Teller2022-05-231-2/+3
| | | Signed-off-by: David Teller <davidt@element.io>
* add SpamChecker callback for silently dropping inbound federated events (#12744)Jess Porter2022-05-231-4/+44
| | | Signed-off-by: jesopo <github@lolnerd.net>
* Make handling of federation Authorization header (more) compliant with ↵Hubert Chathi2022-05-181-3/+5
| | | | | | | | | | | | RFC7230 (#12774) The main differences are: - values with delimiters (such as colons) should be quoted, so always quote the origin, since it could contain a colon followed by a port number - should allow more than one space after "X-Matrix" - quoted values with backslash-escaped characters should be unescaped - names should be case insensitive
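The four rules listed in this commit message can be illustrated with a small parser. This is a minimal sketch, not Synapse's implementation: the function name `parse_x_matrix` and the regular expression are assumptions made for illustration, and only the behaviours named in the message (quoted values, extra spaces after `X-Matrix`, backslash unescaping, case-insensitive names) are covered.

```python
import re
from typing import Dict

# One `name=value` pair. The value is either a quoted-string (which may
# contain backslash-escaped characters) or a bare token without quotes.
_PARAM_RE = re.compile(
    r'\s*([^=,\s]+)\s*=\s*("(?:[^"\\]|\\.)*"|[^,\s"]*)\s*(?:,|$)'
)


def parse_x_matrix(header: str) -> Dict[str, str]:
    """Hypothetical helper: parse an X-Matrix Authorization header value."""
    scheme, _, rest = header.strip().partition(" ")
    if scheme != "X-Matrix":
        raise ValueError("not an X-Matrix Authorization header")

    # Stripping here tolerates more than one space after "X-Matrix".
    rest = rest.strip()
    params: Dict[str, str] = {}
    pos = 0
    while pos < len(rest):
        match = _PARAM_RE.match(rest, pos)
        if match is None:
            raise ValueError(f"malformed parameter at offset {pos}")
        # Parameter names are compared case-insensitively, so store them lowercased.
        name, value = match.group(1).lower(), match.group(2)
        if value.startswith('"'):
            # Drop the surrounding quotes, then unescape backslash-escaped characters.
            value = re.sub(r"\\(.)", r"\1", value[1:-1])
        params[name] = value
        pos = match.end()
    return params
```

Quoting matters most for `origin`, because a server name such as `example.org:8448` contains a colon, which is a delimiter in an unquoted token.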
* Add some type hints to datastore (#12717)Dirk Klimpel2022-05-171-7/+17
|
* Complain if a federation endpoint has the `@cancellable` flag (#12705)Sean Quah2022-05-111-1/+12
| | | | | | | | `BaseFederationServlet` wraps its endpoints in a bunch of async code that has not been vetted for compatibility with cancellation. Fail CI if a `@cancellable` flag is applied to a federation endpoint. Signed-off-by: Sean Quah <seanq@element.io>
* Fix inconsistent spelling of 'M_UNRECOGNIZED'. (#12665)Val Lorentz2022-05-091-1/+1
|
* Support MSC3266 room summaries over federation (#11507)DeepBlueV7.X2022-05-051-0/+2
| | | | Signed-off-by: Nicolas Werner <nicolas.werner@hotmail.de>
* Add extra debug logging to federation sender (#12614)Richard van der Hoff2022-05-031-2/+18
| | | | ... in order to debug some problems we've been having with certain events not being sent when expected.
* Exclude OOB memberships from the federation sender (#12570)Richard van der Hoff2022-05-031-0/+39
| | | | | | | As the comment says, there is no need to process such events, and indeed we need to avoid doing so. Fixes #12509.
* Remove unused `# type: ignore`s (#12531)David Robertson2022-04-272-7/+7
| Over time we've begun to use newer versions of mypy, typeshed, stub packages---and of course we've improved our own annotations. This makes some type ignore comments no longer necessary. I have removed them.
|
| There was one exception: a module that imports `select.epoll`. The ignore is redundant on Linux, but I've kept it ignored for those of us who work on the source tree using not-Linux. (#11771)
|
| I'm more interested in the config line which enforces this. I want unused ignores to be reported, because I think it's useful feedback when annotating to know when you've fixed a problem you had to previously ignore.
|
| * Installing extras before typechecking
|
| Lacking an easy way to install all extras generically, let's bite the bullet and install the hand-maintained `all` extra before typechecking, now that https://github.com/matrix-org/backend-meta/pull/6 is merged to the release/v1 branch.
* Implement MSC3383: include destination in X-Matrix auth header (#11398)Jan Christian Grünhage2022-04-191-8/+31
| | | | Co-authored-by: Jan Christian Grünhage <jan.christian@gruenhage.xyz> Co-authored-by: Marcus Hoffmann <bubu@bubu1.eu>
* Back out implementation of MSC2314 (#12474)Richard van der Hoff2022-04-192-18/+10
| | | | | | | | MSC2314 has now been closed, so we're backing out its implementation, which originally happened in #6176. Unfortunately it's not a direct revert, as that PR mixed in a bunch of unrelated changes to tests etc.
* Remove the unstable event field for `/send_join` per MSC3083. (#12395)Patrick Cloke2022-04-122-12/+0
| | | | | | | This was missed when initially stabilising room version 8 and was left in as a compatibility shim. Most homeservers have upgraded to a version which expects the proper field name, and the failure mode is reasonable (a user on an older server may have to attempt joining the room twice with an obscure error message the first time).
* Unify HTTP query parameter type hints (#12415)David Robertson2022-04-082-3/+5
| | | | | | * Pull out query param types to `synapse.http.types` * Use QueryParams everywhere * Simplify `encode_query_args` * Add annotation which would have caught #12410
* Fix fetching public rooms over federation (#12410)Erik Johnston2022-04-071-1/+1
| | | Broken by #12364
* Refactor and convert `Linearizer` to async (#12357)Sean Quah2022-04-051-5/+5
| | | | | | | | | | | Refactor and convert `Linearizer` to async. This makes a `Linearizer` cancellation bug easier to fix. Also refactor to use an async context manager, which eliminates an unlikely footgun where code that doesn't immediately use the context manager could forget to release the lock. Signed-off-by: Sean Quah <seanq@element.io>
* Fix a spec compliance issue where requests to the `/publicRooms` federation ↵reivilibre2022-04-051-2/+2
| | | | API would specify `limit` as a string. (#12364)
* Enhance logging for inbound federation events (#12301)Richard van der Hoff2022-03-251-1/+1
| | | | It is currently rather hard to see which rooms are causing inbound federation traffic. Add the room id to the logs.
* Return a 404 from `/state` for an outlier (#12087)Richard van der Hoff2022-03-211-4/+3
| | | | | * Replace `get_state_for_pdu` with `get_state_ids_for_pdu` and `get_events_as_list`. * Return a 404 from `/state` and `/state_ids` for an outlier
* Deprecate the groups/communities endpoints and add an experimental ↵Patrick Cloke2022-03-121-4/+11
| | | | configuration flag. (#12200)
* Rename get_tcp_replication to get_replication_command_handler. (#12192)Patrick Cloke2022-03-101-1/+1
| | | | | | Since the object it returns is a ReplicationCommandHandler. This is clean-up from adding support to Redis where the command handler was added as an additional layer of abstraction from the TCP protocol.
* Spread out sending device lists to remote hosts (#12132)Erik Johnston2022-03-043-10/+28
|
* Check if instances are lists, not sequences. (#12128)Patrick Cloke2022-03-021-4/+4
| | | | | As a str is a sequence, the checks were not granular enough and would allow lists or strings, when only lists were valid.
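The pitfall described here is easy to reproduce in plain Python. A minimal sketch, assuming a hypothetical validator named `validate_instances` (not the actual Synapse code):

```python
from collections.abc import Sequence


def validate_instances(value: object) -> list:
    """Hypothetical validator: accept only a real list of strings."""
    # isinstance(value, Sequence) would also accept a bare str such as
    # "worker1", which would then be iterated character by character.
    if not isinstance(value, list):
        raise TypeError(f"expected a list, got {type(value).__name__}")
    if not all(isinstance(item, str) for item in value):
        raise TypeError("expected a list of strings")
    return value


# A str passes a Sequence check, which is why that check was too loose.
assert isinstance("worker1", Sequence)
assert not isinstance("worker1", list)
```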
* Remove the unstable `/spaces` endpoint. (#12073)Patrick Cloke2022-02-283-303/+32
| | | | | | | | ...and various code supporting it. The /spaces endpoint was from an old version of MSC2946 and included both a Client-Server and Server-Server API. Note that the unstable /hierarchy endpoint (from the final version of MSC2946) is not yet removed.
* Actually fix bad debug logging rejecting device list & signing key ↵David Robertson2022-02-281-1/+1
| | | | transactions (#12098)
* Properly failover for unknown endpoints from Conduit/Dendrite. (#12077)Patrick Cloke2022-02-281-9/+13
| | | | | Before this fix, a legitimate 404 from a federation endpoint (e.g. due to an unknown room) would be treated as an unknown endpoint. This could cause unnecessary federation traffic.
* Remove `HomeServer.get_datastore()` (#12031)Richard van der Hoff2022-02-236-9/+10
| | | | | | | The presence of this method was confusing, and mostly present for backwards compatibility. Let's get rid of it. Part of #11733
* Implement account status endpoints (MSC3720) (#12001)Brendan Abolivier2022-02-224-2/+120
| | | | | See matrix-org/matrix-doc#3720 Co-authored-by: Sean Quah <8349537+squahtx@users.noreply.github.com>
* Faster joins: Support for calling `/federation/v1/state` (#12013)Richard van der Hoff2022-02-223-16/+157
| | | | This is an endpoint that we have server-side support for, but no client-side support. It's going to be useful for resyncing partial-stated rooms, so let's introduce it.
* remote join processing: get create event from state, not auth_chain (#12039)Richard van der Hoff2022-02-211-2/+4
| | | A follow-up to #12005, in which I apparently missed that there are a bunch of other places that assume the create event is in the auth chain.
* Minor typing fixes (#12034)Richard van der Hoff2022-02-211-9/+9
| | | | | | These started failing in https://github.com/matrix-org/synapse/pull/12031... I'm a bit mystified by how they ever worked.
* Faster joins: parse msc3706 fields in send_join response (#12011)Richard van der Hoff2022-02-172-32/+101
| | | Part of my work on #11249: add code to handle the new fields added in MSC3706.
* Use version string helper from matrix-common (#11979)David Robertson2022-02-141-3/+7
| | | | * Require latest matrix-common * Use the common function
* Implement MSC3706: partial state in `/send_join` response (#11967)Richard van der Hoff2022-02-122-11/+100
| | | | | | | | | | | | * Make `get_auth_chain_ids` return a Set It has a set internally, and a set is often useful where it gets used, so let's avoid converting to an intermediate list. * Minor refactors in `on_send_join_request` A little bit of non-functional groundwork * Implement MSC3706: partial state in /send_join response
* Improve opentracing for federation requests (#11870)Richard van der Hoff2022-02-031-19/+48
| | | | | | | | | The idea here is to set the parent span for incoming federation requests to the *outgoing* span on the other end. That means that you can see (most of) the full end-to-end flow when you have a process that includes federation requests. However, in order not to lose information, we still want a link to the `incoming-federation-request` span from the servlet, so we have to create another span to do exactly that.
* Fix losing incoming EDUs if debug logging enabled (#11890)David Robertson2022-02-021-2/+2
| | | | | | | | | * Fix losing incoming EDUs if debug logging enabled Fixes #11889. Homeservers should only be affected if the `synapse.8631_debug` logger was enabled for DEBUG mode. I am not sure if this merits a bugfix release: I think the logging can be disabled in config if anyone is affected? But it is still pretty bad.
* Add admin API to reset connection timeouts for remote server (#11639)Dirk Klimpel2022-01-255-25/+45
| | | * Fix get federation status of destination if no error occurred
* Debug for device lists updates (#11760)David Robertson2022-01-202-0/+27
| | | | | | | | | | | | | | | | | | Debug for #8631. I'm having a hard time tracking down what's going wrong in that issue. In the reported example, I could see server A sending federation traffic to server B and all was well. Yet B reports out-of-sync device updates from A. I couldn't see what was _in_ the events being sent from A to B. So I have added some crude logging to track - when we have updates to send to a remote HS - the edus we actually accumulate to send - when a federation transaction includes a device list update edu - when such an EDU is received This is a bit of a sledgehammer.
* Fix a bug that corrupted the cache of federated space hierarchies (#11775)Sean Quah2022-01-201-9/+9
| | | | `FederationClient.get_room_hierarchy()` caches its return values, so refactor the code to avoid modifying the returned room summary.
* Remove `log_function` and its uses (#11761)Richard van der Hoff2022-01-184-59/+0
| | | | | | | I've never found this terribly useful. I think it was added in the early days of Synapse, without much thought as to what would actually be useful to log, and has just been cargo-culted ever since. Rather, it tends to clutter up debug logs with useless information.
* Use auto_attribs/native type hints for attrs classes. (#11692)Patrick Cloke2022-01-131-6/+6
|
* Strip unauthorized fields from `unsigned` object in events received over ↵Shay2022-01-061-0/+25
| federation (#11530)
|
| * add some tests to verify we are stripping unauthorized fields out of unsigned
| * add function to strip unauthorized fields from the unsigned object of event
| * newsfragment
| * update newsfragment number
| * add check to on_send_membership_event
| * refactor tests
| * fix lint error
| * slightly refactor tests and add some comments
| * slight refactor
| * refactor tests
| * fix import error
| * slight refactor
| * remove unsigned filtration code from synapse/handlers/federation_event.py
| * lint
| * move unsigned filtering code to event base
| * refactor tests
| * update newsfragment
| * requested changes
| * remove unused return values
* Refactor the way we set `outlier` (#11634)Richard van der Hoff2022-01-052-37/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | * `_auth_and_persist_outliers`: mark persisted events as outliers Mark any events that get persisted via `_auth_and_persist_outliers` as, well, outliers. Currently this will be a no-op as everything will already be flagged as an outlier, but I'm going to change that. * `process_remote_join`: stop flagging as outlier The events are now flagged as outliers later on, by `_auth_and_persist_outliers`. * `send_join`: remove `outlier=True` The events created here are returned in the result of `send_join` to `FederationHandler.do_invite_join`. From there they are passed into `FederationEventHandler.process_remote_join`, which passes them to `_auth_and_persist_outliers`... which sets the `outlier` flag. * `get_event_auth`: remove `outlier=True` stop flagging the events returned by `get_event_auth` as outliers. This method is only called by `_get_remote_auth_chain_for_event`, which passes the results into `_auth_and_persist_outliers`, which will flag them as outliers. * `_get_remote_auth_chain_for_event`: remove `outlier=True` we pass all the events into `_auth_and_persist_outliers`, which will now flag the events as outliers. * `_check_sigs_and_hash_and_fetch`: remove unused `outlier` parameter This param is now never set to True, so we can remove it. * `_check_sigs_and_hash_and_fetch_one`: remove unused `outlier` param This is no longer set anywhere, so we can remove it. * `get_pdu`: remove unused `outlier` parameter ... and chase it down into `get_pdu_from_destination_raw`. * `event_from_pdu_json`: remove redundant `outlier` param This is never set to `True`, so can be removed. * changelog * update docstring
* Re-apply: Move glob_to_regex and re_word_boundary to matrix-python-common ↵reivilibre2022-01-051-1/+2
| | | | | #11505 (#11687) Co-authored-by: Sean Quah <seanq@element.io>
* `FederationClient.backfill`: stop flagging events as outliers (#11632)Richard van der Hoff2022-01-041-1/+1
| Events returned by `backfill` should not be flagged as outliers.
|
| Fixes:
|
```
AssertionError: null
  File "synapse/handlers/federation.py", line 313, in try_backfill
    dom, room_id, limit=100, extremities=extremities
  File "synapse/handlers/federation_event.py", line 517, in backfill
    await self._process_pulled_events(dest, events, backfilled=True)
  File "synapse/handlers/federation_event.py", line 642, in _process_pulled_events
    await self._process_pulled_event(origin, ev, backfilled=backfilled)
  File "synapse/handlers/federation_event.py", line 669, in _process_pulled_event
    assert not event.internal_metadata.is_outlier()
```
|
| See https://sentry.matrix.org/sentry/synapse-matrixorg/issues/231992
|
| Fixes #8894.
* Convert all namedtuples to attrs. (#11665)Patrick Cloke2021-12-302-29/+23
| | | To improve type hints throughout the code.
* Improve opentracing for incoming HTTP requests (#11618)Richard van der Hoff2021-12-201-26/+13
| | | | | | | | | | | | | | | | | | | | | | * remove `start_active_span_from_request` Instead, pull out a separate function, `span_context_from_request`, to extract the parent span, which we can then pass into `start_active_span` as normal. This seems to be clearer all round. * Remove redundant tags from `incoming-federation-request` These are all wrapped up inside a parent span generated in AsyncResource, so there's no point duplicating all the tags that are set there. * Leave request spans open until the request completes It may take some time for the response to be encoded into JSON, and that JSON to be streamed back to the client, and really we want that inside the top-level span, so let's hand responsibility for closure to the SynapseRequest. * opentracing logs for HTTP request events * changelog
* Add missing type hints to `synapse.logging.context` (#11556)Sean Quah2021-12-141-5/+4
|
* Revert "Move `glob_to_regex` and `re_word_boundary` to ↵Sean Quah2021-12-071-2/+1
| | | | | | `matrix-python-common` (#11505) (#11527) This reverts commit a77c36989785c0d5565ab9a1169f4f88e512ce8a.
* Move `glob_to_regex` and `re_word_boundary` to `matrix-python-common` (#11505)Sean Quah2021-12-061-1/+2
|
* Add most of the missing type hints to `synapse.federation`. (#11483)Patrick Cloke2021-12-028-49/+77
| | | This skips a few methods which are difficult to type.
* Add MSC3030 experimental client and federation API endpoints to get the ↵Eric Eastwood2021-12-025-1/+208
| closest event to a given timestamp (#9445)
|
| MSC3030: https://github.com/matrix-org/matrix-doc/pull/3030
|
| Client API endpoint. This will also go and fetch from the federation API endpoint if unable to find an event locally or we found an extremity with possibly a closer event we don't know about.
|
```
GET /_matrix/client/unstable/org.matrix.msc3030/rooms/<roomID>/timestamp_to_event?ts=<timestamp>&dir=<direction>
{
    "event_id": ...
    "origin_server_ts": ...
}
```
|
| Federation API endpoint:
|
```
GET /_matrix/federation/unstable/org.matrix.msc3030/timestamp_to_event/<roomID>?ts=<timestamp>&dir=<direction>
{
    "event_id": ...
    "origin_server_ts": ...
}
```
|
| Co-authored-by: Erik Johnston <erik@matrix.org>
* Support the stable /hierarchy endpoint from MSC2946 (#11329)Patrick Cloke2021-11-293-10/+49
| | | | | | This also makes additional updates where the implementation had drifted from the approved MSC. Unstable endpoints will be removed at a later date.
* Return the stable `event` field from `/send_join` per MSC3083. (#11413)Patrick Cloke2021-11-292-2/+16
| | | | | This does not remove the unstable field and still parses both. Handling of the unstable field will need to be removed in the future.
* Split out federated PDU retrieval into a non-cached version (#11242)Eric Eastwood2021-11-091-22/+58
| | | | Context: https://github.com/matrix-org/synapse/pull/11114/files#r741643968
* Handle federation inbound instances being killed more gracefully (#11262)Erik Johnston2021-11-081-0/+5
| | | | | | | | | | | | | | | | | * Make lock better handle process being killed If the process gets killed and restarted (so that it didn't have a chance to drop its locks gracefully) then there may still be locks in the DB that are for the same instance that haven't yet timed out but are safe to delete. We handle this case by a) checking if the current instance already has taken out the lock, and b) if not then ignoring locks that are for the same instance. * Periodically check for old staged events This is to protect against other instances dying and their locks timing out.
* Enable passing typing stream writers as a list. (#11237)Nick Barrett2021-11-031-4/+0
| | | | This makes the typing stream writer config match the other stream writers that only currently support a single worker.
* Add `use_float=true` to ijson calls in Synapse (#11217)Shay2021-11-011-0/+3
| | | | | | | | | | | | | * add use_float=true to ijson calls * lints * add changelog * Update changelog.d/11217.bugfix Co-authored-by: Sean Quah <8349537+squahtx@users.noreply.github.com> Co-authored-by: Sean Quah <8349537+squahtx@users.noreply.github.com>
* Annotate `log_function` decorator (#10943)reivilibre2021-10-274-11/+39
| | | Co-authored-by: Patrick Cloke <clokep@users.noreply.github.com>
* Add type hints for most `HomeServer` parameters (#11095)Sean Quah2021-10-222-5/+11
|
* Strip "join_authorised_via_users_server" from join events which do not need ↵Patrick Cloke2021-09-303-9/+9
| | | | | | it. (#10933) This fixes an "Event not signed by authorising server" error when transitioning a room member from join -> join, e.g. when updating a display name or avatar URL for restricted rooms.
* add event id to logcontext when handling incoming PDUs (#10936)Richard van der Hoff2021-09-291-1/+4
|
* Use direct references for configuration variables (part 6). (#10916)Patrick Cloke2021-09-291-1/+1
|
* Factor out common code for persisting fetched auth events (#10896)Richard van der Hoff2021-09-241-2/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | * Factor more stuff out of `_get_events_and_persist` It turns out that the event-sorting algorithm in `_get_events_and_persist` is also useful in other circumstances. Here we move the current `_auth_and_persist_fetched_events` to `_auth_and_persist_fetched_events_inner`, and then factor the sorting part out to `_auth_and_persist_fetched_events`. * `_get_remote_auth_chain_for_event`: remove redundant `outlier` assignment `get_event_auth` returns events with the outlier flag already set, so this is redundant (though we need to update a test where `get_event_auth` is mocked). * `_get_remote_auth_chain_for_event`: move existing-event tests earlier Move a couple of tests outside the loop. This is a bit inefficient for now, but a future commit will make it better. It should be functionally identical. * `_get_remote_auth_chain_for_event`: use `_auth_and_persist_fetched_events` We can use the same codepath for persisting the events fetched as part of an auth chain as for those fetched individually by `_get_events_and_persist` for building the state at a backwards extremity. * `_get_remote_auth_chain_for_event`: use a dict for efficiency `_auth_and_persist_fetched_events` sorts the events itself, so we no longer need to care about maintaining the ordering from `get_event_auth` (and no longer need to sort by depth in `get_event_auth`). That means that we can use a map, making it easier to filter out events we already have, etc. * changelog * `_auth_and_persist_fetched_events`: improve docstring
* Use direct references for configuration variables (part 4). (#10893)Patrick Cloke2021-09-231-1/+3
|
* Remove unnecessary parentheses around tuples returned from methods (#10889)Andrew Morgan2021-09-231-2/+2
|
* Use direct references for some configuration variables (part 2) (#10812)Patrick Cloke2021-09-152-2/+2
|
* Use direct references for some configuration variables (#10798)Patrick Cloke2021-09-131-1/+2
| | | | Instead of proxying through the magic getter of the RootConfig object. This should be more performant (and is more explicit).
* Add types to synapse.util. (#10601)reivilibre2021-09-101-2/+6
|
* Split `FederationHandler` in half (#10692)Richard van der Hoff2021-08-261-2/+5
| | | The idea here is to take anything to do with incoming events and move it out to a separate handler, as a way of making FederationHandler smaller.
* Cache the result of fetching the room hierarchy over federation. (#10647)Patrick Cloke2021-08-261-40/+66
|
* Do not include stack traces for known exceptions when trying multiple ↵Patrick Cloke2021-08-231-1/+6
| | | | federation destinations. (#10662)
* Split `on_receive_pdu` in half (#10640)Richard van der Hoff2021-08-191-3/+1
| | | Here we split on_receive_pdu into two functions (on_receive_pdu and process_pulled_event), rather than having both cases in the same method. There's a tiny bit of overlap, but not that much.
* Attempt to pull from the legacy spaces summary API over federation. (#10583)Patrick Cloke2021-08-171-9/+55
| | | | | | | If the new /hierarchy API does not exist on all destinations, fallback to querying the /spaces API and translating the results. This is a backwards compatibility hack since not all of the federated homeservers will update at the same time.
* Validate the max_rooms_per_space parameter to ensure it is non-negative. ↵Patrick Cloke2021-08-161-4/+18
| | | | (#10611)
* Experimental support for MSC3266 Room Summary API. (#10394)Michael Telatynski2021-08-161-2/+2
|
* Split `synapse.federation.transport.server` into multiple files. (#10590)Patrick Cloke2021-08-166-2158/+2218
|
* Clean up some logging in the federation event handler (#10591)Richard van der Hoff2021-08-161-0/+1
| | | | | | | | | | | | | | | | | | | * Include outlier status in `str(event)` In places where we log event objects, knowing whether or not you're dealing with an outlier is super useful. * Remove duplicated logging in get_missing_events When we process events received from get_missing_events, we log them twice (once in `_get_missing_events_for_pdu`, and once in `on_receive_pdu`). Reduce the duplication by removing the logging in `on_receive_pdu`, and ensuring the call sites do sensible logging. * log in `on_receive_pdu` when we already have the event * Log which prev_events we are missing * changelog
* Support federation in the new spaces summary API (MSC2946). (#10569)Patrick Cloke2021-08-163-0/+132
|
* Fix a harmless exception when the staged events queue is empty. (#10592)Patrick Cloke2021-08-131-5/+10
|
* Convert Transaction and Edu object to attrs (#10542)Patrick Cloke2021-08-066-92/+74
| | | | | Instead of wrapping the JSON into an object, this creates concrete instances for Transaction and Edu. This allows for improved type hints and simplified code.
* Fix exceptions in logs when failing to get remote room list (#10541)Erik Johnston2021-08-061-1/+2
|
* Refactoring before implementing the updated spaces summary. (#10527)Patrick Cloke2021-08-051-9/+14
| | | | | This should have no user-visible changes, but refactors some pieces of the SpaceSummaryHandler before adding support for the updated MSC2946.
* Prune inbound federation queues if they get too long (#10390)Erik Johnston2021-08-021-0/+17
|
* Improve failover logic for MSC3083 restricted rooms. (#10447)Patrick Cloke2021-07-291-4/+39
| | | | | If the federation client receives an M_UNABLE_TO_AUTHORISE_JOIN or M_UNABLE_TO_GRANT_JOIN response it will attempt another server before giving up completely.
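The failover rule described in this commit message can be sketched as a loop over candidate servers. This is an illustrative sketch only: `RemoteError` and `try_destinations` are hypothetical names, and the real federation client handles many more cases than the two error codes named above.

```python
from typing import Callable, Iterable, Optional, TypeVar

T = TypeVar("T")

# The two error codes named in the commit message above.
FAILOVER_ERRCODES = {"M_UNABLE_TO_AUTHORISE_JOIN", "M_UNABLE_TO_GRANT_JOIN"}


class RemoteError(Exception):
    """Hypothetical error type carrying the remote server's Matrix errcode."""

    def __init__(self, errcode: str) -> None:
        super().__init__(errcode)
        self.errcode = errcode


def try_destinations(destinations: Iterable[str], attempt: Callable[[str], T]) -> T:
    """Call `attempt` on each destination until one succeeds."""
    last_error: Optional[RemoteError] = None
    for destination in destinations:
        try:
            return attempt(destination)
        except RemoteError as error:
            if error.errcode not in FAILOVER_ERRCODES:
                # Any other error goes straight back to the caller.
                raise
            # This server cannot help with the join; another one might.
            last_error = error
    if last_error is None:
        raise ValueError("no destinations were given")
    raise last_error
```

Only errors that mean "this particular server cannot authorise the join" trigger a retry; anything else is assumed to be the same on every server and is surfaced immediately.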
* Update the MSC3083 support to verify if joins are from an authorized server. ↵Patrick Cloke2021-07-264-19/+141
| | | | (#10254)
* Add type hints to synapse.federation.transport.client. (#10408)Patrick Cloke2021-07-261-201/+298
|
* Add type hints to additional servlet functions (#10437)Patrick Cloke2021-07-211-11/+2
| | | | | | | | | Improves type hints for: * parse_{boolean,integer} * parse_{boolean,integer}_from_args * parse_json_{value,object}_from_request And fixes any incorrect calls that resulted from unknown types.
* Do not include signatures/hashes in make_{join,leave,knock} responses. (#10404)Patrick Cloke2021-07-161-6/+3
| | | | These signatures would end up invalid since the joining/leaving/knocking server would modify the response before calling send_{join,leave,knock}.
* Stagger send presence to remotes (#10398)Erik Johnston2021-07-152-5/+107
| | | | | | This is to help with performance, where trying to connect to thousands of hosts at once can consume a lot of CPU (due to TLS etc). Co-authored-by: Brendan Abolivier <babolivier@matrix.org>
* Use inline type hints in various other places (in `synapse/`) (#10380)Jonathan de Jong2021-07-157-72/+62
|
* Add type hints to get_domain_from_id and get_localpart_from_id. (#10385)Patrick Cloke2021-07-131-24/+72
|
* Ensure we always drop the federation inbound lock (#10336)Erik Johnston2021-07-091-0/+1
|
* Handle old staged inbound events (#10303)Erik Johnston2021-07-061-10/+57
| | | | | | | We might have events in the staging area if the service was restarted while there were unhandled events in the staging area. Fixes #10295
* Move methods involving event authentication to EventAuthHandler. (#10268)Patrick Cloke2021-07-011-3/+3
| | | Instead of mixing them with user authentication methods.
* Fix the inbound PDU metric (#10279)Erik Johnston2021-06-301-17/+20
| | | This broke in #10272
* Merge branch 'release-v1.37' into developRichard van der Hoff2021-06-291-2/+96
|\
| * Handle inbound events from federation asynchronously (#10272)Erik Johnston2021-06-291-2/+96
| | | | | | | | | | | | | | | | | | | | | | Fixes #9490 This will break a couple of SyTest that are expecting failures to be added to the response of a federation /send, which obviously doesn't happen now that things are asynchronous. Two drawbacks: Currently there is no logic to handle any events left in the staging area after restart, and so they'll only be handled on the next incoming event in that room. That can be fixed separately. We now only process one event per room at a time. This can be fixed up further down the line.
* | Soft-fail spammy events received over federation (#10263)Richard van der Hoff2021-06-291-6/+6
| |
* | Add additional types to the federation transport server. (#10213)Patrick Cloke2021-06-281-114/+474
| |
* | Improve validation for `send_{join,leave,knock}` (#10225)Richard van der Hoff2021-06-242-55/+78
|/ | | The idea here is to stop people sending things that aren't joins/leaves/knocks through these endpoints: previously you could send anything you liked through them. I wasn't able to find any security holes from doing so, but it doesn't sound like a good thing.
* Expose opentracing trace id in response headers (#10199)Richard van der Hoff2021-06-181-0/+3
| | | Fixes: #9480
* Remove the experimental flag for knocking and use stable prefixes / ↵Patrick Cloke2021-06-153-51/+7
| | | | | | | endpoints. (#10167) * Room version 7 for knocking. * Stable prefixes and endpoints (both client and federation) for knocking. * Removes the experimental configuration flag.
* Implement knock feature (#6739)Sorunome2021-06-094-8/+277
| | | | | | This PR aims to implement the knock feature as proposed in https://github.com/matrix-org/matrix-doc/pull/2403 Signed-off-by: Sorunome mail@sorunome.de Signed-off-by: Andrew Morgan andrewm@element.io
* Add type hints to the federation server transport. (#10080)Patrick Cloke2021-06-082-72/+166
|
* When joining a remote room limit the number of events we concurrently check ↵Erik Johnston2021-06-082-217/+173
| | | | | signatures/hashes for (#10117) If we do hundreds of thousands at once the memory overhead can easily reach 500+ MB.
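Bounding the number of checks in flight is a general technique. The sketch below uses `asyncio` purely for illustration; Synapse itself is built on Twisted, and the helper name `bounded_gather` is an assumption rather than anything from the codebase.

```python
import asyncio
from typing import Awaitable, Callable, Iterable, List, TypeVar

T = TypeVar("T")
R = TypeVar("R")


async def bounded_gather(
    items: Iterable[T],
    worker: Callable[[T], Awaitable[R]],
    limit: int,
) -> List[R]:
    """Run `worker` over `items` with at most `limit` calls in flight."""
    semaphore = asyncio.Semaphore(limit)

    async def run_one(item: T) -> R:
        # Each task waits here until one of the `limit` slots is free.
        async with semaphore:
            return await worker(item)

    # gather() preserves input order in its result list.
    return await asyncio.gather(*(run_one(item) for item in items))
```

The semaphore caps how many workers hold their intermediate data at once, which is the memory concern raised in the commit message. Note that this sketch still creates one lightweight task per item up front.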
* Rewrite the KeyRing (#10035)Erik Johnston2021-06-021-1/+3
|
* Merge branch 'master' into developAndrew Morgan2021-06-011-0/+7
|\
| * Allow response of `/send_join` to be larger. (#10093)Erik Johnston2021-05-281-0/+7
| | | | | | Fixes #10087.
* | Set opentracing priority before setting other tags (#10092)Richard van der Hoff2021-05-281-1/+2
| | | | | | ... because tags on spans which aren't being sampled get thrown away.
* | Merge tag 'v1.35.0rc2' into developErik Johnston2021-05-271-1/+1
|\|
| | Synapse 1.35.0rc2 (2021-05-27)
| | ==============================
| |
| | Bugfixes
| | --------
| |
| | - Fix a bug introduced in v1.35.0rc1 when calling the spaces summary API via a GET request. ([\#10079](https://github.com/matrix-org/synapse/issues/10079))
| * Pass the origin when calculating the spaces summary over GET. (#10079)Patrick Cloke2021-05-271-1/+1
| | | | | | | | Fixes a bug due to conflicting PRs which were merged. (One added a new caller to a method, the other added a new parameter to the same method.)
* | Remove the experimental spaces enabled flag. (#10063)Patrick Cloke2021-05-261-7/+6
|/ | | | In lieu of just always enabling the unstable spaces endpoint and unstable room version.
* Don't hammer the database for destination retry timings every ~5mins (#10036)Erik Johnston2021-05-211-1/+1
|
* Add `Keyring.verify_events_for_server` and reduce memory usage (#10018)Erik Johnston2021-05-201-12/+5
| | | | | | Also add support for giving a callback to generate the JSON object to verify. This should reduce memory usage, as we no longer have the event in memory in dict form (which has a large memory footprint) for extended periods of time.
* Use ijson to parse the response to `/send_join`, reducing memory usage. (#9958)Erik Johnston2021-05-202-22/+91
| | | Instead of parsing the full response to `/send_join` into Python objects (which can be huge for large rooms) and *then* parsing that into events, we instead use ijson to stream parse the response directly into `EventBase` objects.
* Allow a user who could join a restricted room to see it in spaces summary. ↵Patrick Cloke2021-05-201-1/+1
| | | | | | (#9922) This finishes up the experimental implementation of MSC3083 by showing the restricted rooms in the spaces summary (from MSC2946).
* Support fetching the spaces summary via GET over federation. (#9947)Patrick Cloke2021-05-112-0/+27
| | | | | | | | | | | Per changes in MSC2946, the C-S and S-S APIs for spaces summary should use GET requests. Until this is stable, the POST endpoints still exist. This does not switch federation requests to use the GET version yet since it is newly added and already deployed servers might not support it. When switching to the stable endpoint we should switch to GET requests.
* Add debug logging for issue #9533 (#9959)Richard van der Hoff2021-05-111-0/+9
| | | | | Hopefully this will help us track down where to-device messages are getting lost/delayed.
* Fix `m.room_key_request` to-device messages (#9961)Richard van der Hoff2021-05-111-19/+0
| | | fixes #9960
* Revert "Experimental Federation Speedup (#9702)"Andrew Morgan2021-04-282-102/+58
| | | | This reverts commit 05e8c70c059f8ebb066e029bc3aa3e0cefef1019.
* Pass errors back to the client when trying multiple federation destinations. ↵Patrick Cloke2021-04-271-58/+60
| | | | | | | | (#9868) This ensures that something like an auth error (403) will be returned to the requester instead of attempting to try more servers, which will likely result in the same error, and then passing back a generic 400 error.
* Remove `synapse.types.Collection` (#9856)Richard van der Hoff2021-04-221-2/+12
| | | This is no longer required, since we have dropped support for Python 3.5.
* Fix bug where we sent remote presence states to remote servers (#9850)Erik Johnston2021-04-201-0/+4
|
* Fix (final) Bugbear violations (#9838)Jonathan de Jong2021-04-201-2/+2
|
* Don't send normal presence updates over federation replication stream (#9828)Erik Johnston2021-04-192-163/+3
|
* remove `HomeServer.get_config` (#9815)Richard van der Hoff2021-04-142-2/+2
| | | | Every single time I want to access the config object, I have to remember whether or not we use `get_config`. Let's just get rid of it.
* Experimental Federation Speedup (#9702)Jonathan de Jong2021-04-142-62/+93
| | | | | This basically speeds up federation by "squeezing" each individual dual database call (to destinations and destination_rooms), which previously happened per every event, into one call for an entire batch (100 max). Signed-off-by: Jonathan de Jong <jonathan@automatia.nl>
* Remove redundant "coding: utf-8" lines (#9786)Jonathan de Jong2021-04-1413-13/+0
| | | | | | | Part of #9744 Removes all redundant `# -*- coding: utf-8 -*-` lines from files, as python 3 automatically reads source code as utf-8 now. `Signed-off-by: Jonathan de Jong <jonathan@automatia.nl>`
* Fix duplicate logging of exceptions in transaction processing (#9780)Richard van der Hoff2021-04-091-7/+3
| | | There's no point logging this twice.
* Bugbear: Add Mutable Parameter fixes (#9682)Jonathan de Jong2021-04-081-2/+3
| | | | | | | Part of #9366 Adds in fixes for B006 and B008, both relating to mutable parameter lint errors. Signed-off-by: Jonathan de Jong <jonathan@automatia.nl>
* Fix sharded federation sender sometimes using 100% CPU.Erik Johnston2021-04-081-2/+4
| | | | | | | We pull all destinations requiring catchup from the DB in batches. However, if all those destinations get filtered out (due to the federation sender being sharded), then the `last_processed` destination doesn't get updated, and we keep requesting the same set repeatedly.
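The failure mode described in the commit above comes down to where the pagination cursor is advanced. A minimal sketch, assuming a hypothetical `fetch_batch(after)` that returns the next page of destinations sorted after `after`, and a hypothetical `should_handle` shard filter (neither is Synapse's actual API):

```python
from typing import Callable, List


def collect_catchup_destinations(
    fetch_batch: Callable[[str], List[str]],
    should_handle: Callable[[str], bool],
) -> List[str]:
    """Page through destinations needing catch-up, keeping only this shard's.

    Hypothetical helper illustrating the fix: the cursor moves forward based
    on the unfiltered batch, so a page that is filtered out entirely cannot
    cause the same page to be requested again.
    """
    handled: List[str] = []
    last_processed = ""
    while True:
        batch = fetch_batch(last_processed)
        if not batch:
            break
        # Advance the cursor before filtering, not after.
        last_processed = batch[-1]
        handled.extend(dest for dest in batch if should_handle(dest))
    return handled
```

If the cursor were taken from the filtered list instead, an empty filtered page would leave `last_processed` unchanged and the loop would spin on the same query.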
* Add a Synapse Module for configuring presence update routing (#9491)Andrew Morgan2021-04-061-1/+18
| At the moment, if you'd like to share presence between local or remote users, those users must be sharing a room together. This isn't always the most convenient or useful situation though.
|
| This PR adds a module to Synapse that will allow deployments to set up extra logic on where presence updates should be routed. The module must implement two methods, `get_users_for_states` and `get_interested_users`. These methods are given presence updates or user IDs and must return information that Synapse will use to grant passing presence updates around.
|
| A method is additionally added to `ModuleApi` which allows triggering a set of users to receive the current, online presence information for all users they are considered interested in. This is the equivalent of that user receiving presence information during an initial sync.
|
| The goal of this module is to be fairly generic and useful for a variety of applications, with hard requirements being:
|
| * Sending state for a specific set or all known users to a defined set of local and remote users.
| * The ability to trigger an initial sync for specific users, so they receive all current state.
* Add type hints to expiring cache. (#9730)Patrick Cloke2021-04-061-1/+1
|
* Add type hints to the federation handler and server. (#9743)Patrick Cloke2021-04-062-15/+15
|
* Improve tracing for to device messages (#9686)Erik Johnston2021-04-011-0/+8
|
* Make RateLimiter class check for ratelimit overrides (#9711)Erik Johnston2021-03-301-1/+4
| | | | | | This should fix a class of bug where we forget to check if e.g. the appservice shouldn't be ratelimited. We also check the `ratelimit_override` table to check if the user has ratelimiting disabled. That table is really only meant to override the event sender ratelimiting, so we don't use any values from it (as they might not make sense for different rate limits), but we do infer that if ratelimiting is disabled for the user we should disable all ratelimits. Fixes #9663
* Add type hints for the federation sender. (#9681)Patrick Cloke2021-03-292-44/+160
| | | | Includes an abstract base class which both the FederationSender and the FederationRemoteSendQueue must implement.
* Fixed undefined variable error in catchup (#9664)Erik Johnston2021-03-241-0/+2
| | | | | Broke in #9640 Co-authored-by: Patrick Cloke <clokep@users.noreply.github.com>
* Spaces summary: call out to other servers (#9653)Richard van der Hoff2021-03-242-11/+204
| | | | | When we hit an unknown room in the space tree, see if there are other servers that we might be able to poll to get the data. Fixes: #9447
* Federation API for Space summary (#9652)Richard van der Hoff2021-03-231-9/+58
| | | | | Builds on the work done in #9643 to add a federation API for space summaries. There's a bit of refactoring of the existing client-server code first, to avoid too much duplication.
* Import HomeServer from the proper module. (#9665)Patrick Cloke2021-03-231-1/+1
|
* Make federation catchup send last event from any server. (#9640)Erik Johnston2021-03-182-38/+91
| | | | | | | | | | | | | | Currently federation catchup will send the last *local* event that we failed to send to the remote. This can cause issues for large rooms where lots of servers have sent events while the remote server was down, as when it comes back up again it'll be flooded with events from various points in the DAG. Instead, let's make it so that all the servers send the most recent events, even if it's not theirs. The remote should deduplicate the events, so there shouldn't be much overhead in doing this. Alternatively, the servers could only send local events if they were also extremities and hope that the other server will send the event over, but that is a bit risky.
* Don't go into federation catch up mode so easily (#9561)Erik Johnston2021-03-152-153/+182
| | | | | | | | | | Federation catch up mode is very inefficient if the number of events that the remote server has missed is small, since handling gaps can be very expensive, c.f. #9492. Instead of going into catch up mode whenever we see an error, we instead do so only if we've backed off from trying the remote for more than an hour (the assumption being that in such a case it is more than a transient failure).
* Fix additional type hints from Twisted 21.2.0. (#9591)Patrick Cloke2021-03-121-3/+5
|
* Reject concurrent transactions (#9597)Richard van der Hoff2021-03-121-35/+42
| | | | | | If more transactions arrive from an origin while we're still processing the first one, reject them. Hopefully a quick fix to https://github.com/matrix-org/synapse/issues/9489
* Improve logging when processing incoming transactions (#9596)Richard van der Hoff2021-03-121-27/+34
| | | Put the room id in the logcontext, to make it easier to understand what's going on.
* Use the chain cover index in get_auth_chain_ids. (#9576)Patrick Cloke2021-03-101-2/+4
| | | | This uses a simplified version of get_chain_cover_difference to calculate auth chain of events.
* Fix additional type hints. (#9543)Patrick Cloke2021-03-091-1/+1
| | | Type hint fixes due to Twisted 21.2.0 adding type hints.
* Add ResponseCache tests. (#9458)Jonathan de Jong2021-03-081-5/+8
|
* Replace `last_*_pdu_age` metrics with timestamps (#9540)Richard van der Hoff2021-03-042-12/+9
| | | | | | | | Following the advice at https://prometheus.io/docs/practices/instrumentation/#timestamps-not-time-since, it's preferable to export unix timestamps, not ages. There doesn't seem to be any particular naming convention for timestamp metrics.
* Ratelimit cross-user key sharing requests. (#8957)Patrick Cloke2021-02-191-2/+18
|
* Be smarter about which hosts to send presence to when processing room joins ↵Andrew Morgan2021-02-191-1/+1
| (#9402)
|
| This PR attempts to eliminate unnecessary presence sending work when your local server joins a room, or when a remote server joins a room your server is participating in by processing state deltas in chunks rather than individually.
|
| ---
|
| When your server joins a room for the first time, it requests the historical state as well. This chunk of new state is passed to the presence handler which, after filtering that state down to only membership joins, will send presence updates to homeservers for each join processed.
|
| It turns out that we were being a bit naive and processing each event individually, and sending out presence updates for every one of those joins. Even if many different joins were users on the same server (hello IRC bridges), we'd send presence to that same homeserver for every remote user join we saw.
|
| This PR attempts to deduplicate all of that by processing the entire batch of state deltas at once, instead of only doing each join individually. We process the joins and note down which servers need which presence:
|
| * If it was a local user join, send that user's latest presence to all servers in the room
| * If it was a remote user join, send the presence for all local users in the room to that homeserver
|
| We deduplicate by inserting all of those pending updates into a dictionary of the form:
|
| ```
| {
|   server_name1: {presence_update1, ...},
|   server_name2: {presence_update1, presence_update2, ...}
| }
| ```
|
| Only after building this dict do we then start sending out presence updates.
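The batching described in the commit above might look something like the sketch below. It is an illustration under stated assumptions, not the code from #9402: `batch_presence_updates` and its parameters are hypothetical, presence states are represented as plain strings, and `local_presence` is assumed to cover every local user in the room, including any local user who has just joined.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple


def batch_presence_updates(
    joins: Iterable[Tuple[str, str]],
    local_server: str,
    local_presence: Dict[str, str],
    hosts_in_room: List[str],
) -> Dict[str, Set[str]]:
    """Build {server_name: {presence_update, ...}} for one batch of joins.

    `joins` holds hypothetical (user_id, server_name) pairs taken from a
    batch of state deltas that has already been filtered to membership joins.
    """
    pending: Dict[str, Set[str]] = defaultdict(set)
    for user_id, server_name in joins:
        if server_name == local_server:
            # Local join: every other server in the room needs this user's presence.
            for host in hosts_in_room:
                if host != local_server:
                    pending[host].add(local_presence[user_id])
        else:
            # Remote join: that server needs presence for all local users.
            # Because the value is a set, repeated joins from the same
            # server (e.g. an IRC bridge) add nothing new.
            pending[server_name].update(local_presence.values())
    return dict(pending)
```

Sending would then iterate over the returned dict once, so each destination receives a single deduplicated set of updates.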