summary refs log tree commit diff
path: root/tests/federation/test_federation_server.py (follow)
Commit message (Collapse)AuthorAgeFilesLines
* Fix join being denied after being invited over federation (#18075)Eric Eastwood2025-01-271-10/+237
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This also happens for rejecting an invite. Basically, any out-of-band membership transition where we first get the membership as an `outlier` and then rely on federation filling us in to de-outlier it. This PR mainly addresses automated test flakiness, bots/scripts, and options within Synapse like [`auto_accept_invites`](https://element-hq.github.io/synapse/v1.122/usage/configuration/config_documentation.html#auto_accept_invites) that are able to react quickly (before federation is able to push us events), but also helps in generic scenarios where federation is lagging. I initially thought this might be a Synapse consistency issue (see issues labeled with [`Z-Read-After-Write`](https://github.com/matrix-org/synapse/labels/Z-Read-After-Write)) but it seems to be an event auth logic problem. Workers probably do increase the number of possible race condition scenarios that make this visible though (replication and cache invalidation lag). Fix https://github.com/element-hq/synapse/issues/15012 (probably fixes https://github.com/matrix-org/synapse/issues/15012 (https://github.com/element-hq/synapse/issues/15012)) Related to https://github.com/matrix-org/matrix-spec/issues/2062 Problems: 1. We don't consider [out-of-band membership](https://github.com/element-hq/synapse/blob/develop/docs/development/room-dag-concepts.md#out-of-band-membership-events) (outliers) in our `event_auth` logic even though we expose them in `/sync`. 1. (This PR doesn't address this point) Perhaps we should consider authing events in the persistence queue as events already in the queue could allow subsequent events to be allowed (events come through many channels: federation transaction, remote invite, remote join, local send). But this doesn't save us in the case where the event is more delayed over federation. ### What happened before? I wrote some Complement test that stresses this exact scenario and reproduces the problem: https://github.com/matrix-org/complement/pull/757 ``` COMPLEMENT_ALWAYS_PRINT_SERVER_LOGS=1 COMPLEMENT_DIR=../complement ./scripts-dev/complement.sh -run TestSynapseConsistency ``` We have `hs1` and `hs2` running in monolith mode (no workers): 1. `@charlie1:hs2` is invited and joins the room: 1. `hs1` invites `@charlie1:hs2` to a room which we receive on `hs2` as `PUT /_matrix/federation/v1/invite/{roomId}/{eventId}` (`on_invite_request(...)`) and the invite membership is persisted as an outlier. The `room_memberships` and `local_current_membership` database tables are also updated which means they are visible down `/sync` at this point. 1. `@charlie1:hs2` decides to join because it saw the invite down `/sync`. Because `hs2` is not yet in the room, this happens as a remote join `make_join`/`send_join` which comes back with all of the auth events needed to auth successfully and now `@charlie1:hs2` is successfully joined to the room. 1. `@charlie2:hs2` is invited and and tries to join the room: 1. `hs1` invites `@charlie2:hs2` to the room which we receive on `hs2` as `PUT /_matrix/federation/v1/invite/{roomId}/{eventId}` (`on_invite_request(...)`) and the invite membership is persisted as an outlier. The `room_memberships` and `local_current_membership` database tables are also updated which means they are visible down `/sync` at this point. 1. Because `hs2` is already participating in the room, we also see the invite come over federation in a transaction and we start processing it (not done yet, see below) 1. `@charlie2:hs2` decides to join because it saw the invite down `/sync`. Because `hs2`, is already in the room, this happens as a local join but we deny the event because our `event_auth` logic thinks that we have no membership in the room :x: (expected to be able to join because we saw the invite down `/sync`) 1. We finally finish processing the `@charlie2:hs2` invite event from and de-outlier it. - If this finished before we tried to join we would have been fine but this is the race condition that makes this situation visible. Logs for `hs2`: ``` 🗳️ on_invite_request: handling event <FrozenEventV3 event_id=$PRPCvdXdcqyjdUKP_NxGF2CcukmwOaoK0ZR1WiVOZVk, type=m.room.member, state_key=@user-2-charlie1:hs2, membership=invite, outlier=False> 🔦 _store_room_members_txn update room_memberships: <FrozenEventV3 event_id=$PRPCvdXdcqyjdUKP_NxGF2CcukmwOaoK0ZR1WiVOZVk, type=m.room.member, state_key=@user-2-charlie1:hs2, membership=invite, outlier=True> 🔦 _store_room_members_txn update local_current_membership: <FrozenEventV3 event_id=$PRPCvdXdcqyjdUKP_NxGF2CcukmwOaoK0ZR1WiVOZVk, type=m.room.member, state_key=@user-2-charlie1:hs2, membership=invite, outlier=True> 📨 Notifying about new event <FrozenEventV3 event_id=$PRPCvdXdcqyjdUKP_NxGF2CcukmwOaoK0ZR1WiVOZVk, type=m.room.member, state_key=@user-2-charlie1:hs2, membership=invite, outlier=True> ✅ on_invite_request: handled event <FrozenEventV3 event_id=$PRPCvdXdcqyjdUKP_NxGF2CcukmwOaoK0ZR1WiVOZVk, type=m.room.member, state_key=@user-2-charlie1:hs2, membership=invite, outlier=True> 🧲 do_invite_join for @user-2-charlie1:hs2 in !sfZVBdLUezpPWetrol:hs1 🔦 _store_room_members_txn update room_memberships: <FrozenEventV3 event_id=$bwv8LxFnqfpsw_rhR7OrTjtz09gaJ23MqstKOcs7ygA, type=m.room.member, state_key=@user-1-alice:hs1, membership=join, outlier=True> 🔦 _store_room_members_txn update room_memberships: <FrozenEventV3 event_id=$oju1ts3G3pz5O62IesrxX5is4LxAwU3WPr4xvid5ijI, type=m.room.member, state_key=@user-2-charlie1:hs2, membership=join, outlier=False> 📨 Notifying about new event <FrozenEventV3 event_id=$oju1ts3G3pz5O62IesrxX5is4LxAwU3WPr4xvid5ijI, type=m.room.member, state_key=@user-2-charlie1:hs2, membership=join, outlier=False> ... 🗳️ on_invite_request: handling event <FrozenEventV3 event_id=$O_54j7O--6xMsegY5EVZ9SA-mI4_iHJOIoRwYyeWIPY, type=m.room.member, state_key=@user-3-charlie2:hs2, membership=invite, outlier=False> 🔦 _store_room_members_txn update room_memberships: <FrozenEventV3 event_id=$O_54j7O--6xMsegY5EVZ9SA-mI4_iHJOIoRwYyeWIPY, type=m.room.member, state_key=@user-3-charlie2:hs2, membership=invite, outlier=True> 🔦 _store_room_members_txn update local_current_membership: <FrozenEventV3 event_id=$O_54j7O--6xMsegY5EVZ9SA-mI4_iHJOIoRwYyeWIPY, type=m.room.member, state_key=@user-3-charlie2:hs2, membership=invite, outlier=True> 📨 Notifying about new event <FrozenEventV3 event_id=$O_54j7O--6xMsegY5EVZ9SA-mI4_iHJOIoRwYyeWIPY, type=m.room.member, state_key=@user-3-charlie2:hs2, membership=invite, outlier=True> ✅ on_invite_request: handled event <FrozenEventV3 event_id=$O_54j7O--6xMsegY5EVZ9SA-mI4_iHJOIoRwYyeWIPY, type=m.room.member, state_key=@user-3-charlie2:hs2, membership=invite, outlier=True> 📬 handling received PDU in room !sfZVBdLUezpPWetrol:hs1: <FrozenEventV3 event_id=$O_54j7O--6xMsegY5EVZ9SA-mI4_iHJOIoRwYyeWIPY, type=m.room.member, state_key=@user-3-charlie2:hs2, membership=invite, outlier=False> 📮 handle_new_client_event: handling <FrozenEventV3 event_id=$WNVDTQrxy5tCdPQHMyHyIn7tE4NWqKsZ8Bn8R4WbBSA, type=m.room.member, state_key=@user-3-charlie2:hs2, membership=join, outlier=False> ❌ Denying new event <FrozenEventV3 event_id=$WNVDTQrxy5tCdPQHMyHyIn7tE4NWqKsZ8Bn8R4WbBSA, type=m.room.member, state_key=@user-3-charlie2:hs2, membership=join, outlier=False> because 403: You are not invited to this room. synapse.http.server - 130 - INFO - POST-16 - <SynapseRequest at 0x7f460c91fbf0 method='POST' uri='/_matrix/client/v3/join/%21sfZVBdLUezpPWetrol:hs1?server_name=hs1' clientproto='HTTP/1.0' site='8080'> SynapseError: 403 - You are not invited to this room. 📨 Notifying about new event <FrozenEventV3 event_id=$O_54j7O--6xMsegY5EVZ9SA-mI4_iHJOIoRwYyeWIPY, type=m.room.member, state_key=@user-3-charlie2:hs2, membership=invite, outlier=False> ✅ handled received PDU in room !sfZVBdLUezpPWetrol:hs1: <FrozenEventV3 event_id=$O_54j7O--6xMsegY5EVZ9SA-mI4_iHJOIoRwYyeWIPY, type=m.room.member, state_key=@user-3-charlie2:hs2, membership=invite, outlier=False> ```
* Ensure that incoming to-device messages are not dropped (#17127)Richard van der Hoff2024-04-291-0/+17
| | | | | | | | | | | | | | | | | | | | ... when workers are unreachable, etc. Fixes https://github.com/element-hq/synapse/issues/17117. The general principle is just to make sure that we propagate any exceptions to the JsonResource, so that we return an error code to the sending server. That means that the sending server no longer considers the message safely sent, so it will retry later. In the issue, Erik mentions that an alternative solution would be to persist the to-device messages into a table so that they can be retried. This might be an improvement for performance, but even if we did that, we still need this mechanism, since we might be unable to reach the database. So, if we want to do that, it can be a later follow-up. --------- Co-authored-by: Erik Johnston <erik@matrix.org>
* Correctly mention previous copyright (#16820)Erik Johnston2024-01-231-0/+1
| | | | | During the migration the automated script to update the copyright headers accidentally got rid of some of the existing copyright lines. Reinstate them.
* Update license headersPatrick Cloke2023-11-211-11/+16
|
* Add a cache around server ACL checking (#16360)Patrick Cloke2023-09-261-13/+22
| | | | | * Pre-compiles the server ACLs onto an object per room and invalidates them when new events come in. * Converts the server ACL checking into Rust.
* Rename blacklist/whitelist internally. (#15620)Patrick Cloke2023-05-191-1/+1
| | | | Avoid renaming configuration settings for now and rename internal code to use blocklist and allowlist instead.
* Bump black from 22.12.0 to 23.1.0 (#15103)dependabot[bot]2023-02-221-1/+0
|
* Type hints for tests.federation (#14991)David Robertson2023-02-061-9/+9
| | | | | | | | | | | | | * Make tests.federation pass mypy * Untyped defs in tests.federation.transport * test methods return None * Remaining type hints in tests.federation * Changelog * Avoid an uncessary type-ignore
* Stabilise serving partial join responses (#14839)David Robertson2023-01-171-2/+1
| | | | | Serving partial join responses is no longer experimental. They will only be served under the stable identifier if the the undocumented config flag experimental.msc3706_enabled is set to true. Synapse continues to request a partial join only if the undocumented config flag experimental.faster_joins is set to true; this setting remains present and unaffected.
* Use stable identifiers for faster joins (#14832)David Robertson2023-01-131-1/+1
| | | | | | | | | | | * Use new query param when requesting a partial join * Read new query param when serving partial join * Provide new field names when serving partial joins * Read new field names from partial join response * Changelog
* Include heroes in partial join responses' state (#14442)David Robertson2022-11-151-4/+7
| | | | | | | | | | | * Pull out hero selection logic * Include heroes in partial join response's state * Changelog * Fixup trial test * Remove TODO
* Rate limit joins per-room (#13276)David Robertson2022-07-191-1/+62
|
* Use HTTPStatus constants in place of literals in tests. (#13297)Dirk Klimpel2022-07-151-5/+6
|
* Rename test case method to `add_hashes_and_signatures_from_other_server` ↵David Robertson2022-07-121-9/+4
| | | | (#13255)
* Reduce the amount of state we pull from the DB (#12811)Erik Johnston2022-06-061-2/+4
|
* Back out implementation of MSC2314 (#12474)Richard van der Hoff2022-04-191-39/+2
| | | | | | | | MSC2314 has now been closed, so we're backing out its implementation, which originally happened in #6176. Unfortunately it's not a direct revert, as that PR mixed in a bunch of unrelated changes to tests etc.
* Replace assertEquals and friends with non-deprecated versions. (#12092)Patrick Cloke2022-02-281-6/+6
|
* Implement MSC3706: partial state in `/send_join` response (#11967)Richard van der Hoff2022-02-121-0/+148
| | | | | | | | | | | | * Make `get_auth_chain_ids` return a Set It has a set internally, and a set is often useful where it gets used, so let's avoid converting to an intermediate list. * Minor refactors in `on_send_join_request` A little bit of non-functional groundwork * Implement MSC3706: partial state in /send_join response
* Tests: replace mocked Authenticator with the real thing (#11913)Richard van der Hoff2022-02-111-2/+2
| | | | | | | | | | | | If we prepopulate the test homeserver with a key for a remote homeserver, we can make federation requests to it without having to stub out the authenticator. This has two advantages: * means that what we are testing is closer to reality (ie, we now have complete tests for the incoming-request-authorisation flow) * some tests require that other objects be signed by the remote server (eg, the event in `/send_join`), and doing that would require a whole separate set of mocking out. It's much simpler just to use real keys.
* Use direct references for configuration variables (part 6). (#10916)Patrick Cloke2021-09-291-1/+1
|
* Flatten the synapse.rest.client package (#10600)reivilibre2021-08-171-1/+1
|
* Merge pull request from GHSA-x345-32rc-8h85Richard van der Hoff2021-05-111-0/+19
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | * tests for push rule pattern matching * tests for acl pattern matching * factor out common `re.escape` * Factor out common re.compile * Factor out common anchoring code * add word_boundary support to `glob_to_regex` * Use `glob_to_regex` in push rule evaluator NB that this drops support for character classes. I don't think anyone ever used them. * Improve efficiency of globs with multiple wildcards The idea here is that we compress multiple `*` globs into a single `.*`. We also need to consider `?`, since `*?*` is as hard to implement efficiently as `**`. * add assertion on regex pattern * Fix mypy * Simplify glob_to_regex * Inline the glob_to_regex helper function Signed-off-by: Dan Callahan <danc@element.io> * Moar comments Signed-off-by: Dan Callahan <danc@element.io> Co-authored-by: Dan Callahan <danc@element.io>
* Remove redundant "coding: utf-8" lines (#9786)Jonathan de Jong2021-04-141-1/+0
| | | | | | | Part of #9744 Removes all redundant `# -*- coding: utf-8 -*-` lines from files, as python 3 automatically reads source code as utf-8 now. `Signed-off-by: Jonathan de Jong <jonathan@automatia.nl>`
* Remove spurious "SynapseRequest" result from `make_request"Richard van der Hoff2020-12-151-3/+3
| | | | This was never used, so let's get rid of it.
* Remove redundant `HomeserverTestCase.render`Richard van der Hoff2020-11-161-3/+0
|
* Fix the exception that is raised when invalid JSON is encountered. (#8291)Patrick Cloke2020-09-101-0/+33
|
* Clarify list/set/dict/tuple comprehensions and enforce via flake8 (#6957)Patrick Cloke2020-02-211-1/+1
| | | | Ensure good comprehension hygiene using flake8-comprehensions.
* Add a `make_event_from_dict` method (#6858)Richard van der Hoff2020-02-071-2/+2
| | | | | | | ... and use it in places where it's trivial to do so. This will make it easier to pass room versions into the FrozenEvent constructors.
* Implementation of MSC2314 (#6176)Amber Brown2019-11-281-0/+63
|
* Remove test debugsErik Johnston2019-08-201-1/+0
|
* Run black.black2018-08-101-15/+11
|
* run isortAmber Brown2018-07-091-0/+1
|
* Implementation of server_aclsRichard van der Hoff2018-07-041-0/+57
... as described at https://docs.google.com/document/d/1EttUVzjc2DWe2ciw4XPtNpUpIl9lWXGEsy2ewDS7rtw.