summary refs log tree commit diff
path: root/synapse/handlers/federation.py (follow)
Commit message (Collapse)AuthorAgeFilesLines
* Mitigate a race where /make_join could 403 for restricted rooms (#15080)Sean Quah2023-02-171-1/+15
| | | | | | | | | | | | | | | | | | | | | | | | | Previously, when creating a join event in /make_join, we would decide whether to include additional fields to satisfy restricted room checks based on the current state of the room. Then, when building the event, we would capture the forward extremities of the room to use as prev events. This is subject to race conditions. For example, when leaving and rejoining a room, the following sequence of events leads to a misleading 403 response: 1. /make_join reads the current state of the room and sees that the user is still in the room. It decides to omit the field required for restricted room joins. 2. The leave event is persisted and the room's forward extremities are updated. 3. /make_join builds the event, using the post-leave forward extremities. The event then fails the restricted room checks. To mitigate the race, we move the read of the forward extremities closer to the read of the current state. Ideally, we would compute the state based off the chosen prev events, but that can involve state resolution, which is expensive. Signed-off-by: Sean Quah <seanq@matrix.org>
* Fix order of partial state tables when purging (#15068)David Robertson2023-02-141-0/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | * Fix order of partial state tables when purging `partial_state_rooms` has an FK on `events` pointing to the join event we get from `/send_join`, so we must delete from that table before deleting from `events`. **NB:** It would be nice to cancel any resync processes for the room being purged. We do not do this at present. To do so reliably we'd need an internal HTTP "replication" endpoint, because the worker doing the resync process may be different to that handling the purge request. The first time the resync process tries to write data after the deletion it will fail because we have deleted necessary data e.g. auth events. AFAICS it will not retry the resync, so the only downside to not cancelling the resync is a scary-looking traceback. (This is presumably extremely race-sensitive.) * Changelog * admist(?) -> between * Warn about a race * Fix typo, thanks Sean Co-authored-by: Sean Quah <8349537+squahtx@users.noreply.github.com> --------- Co-authored-by: Sean Quah <8349537+squahtx@users.noreply.github.com>
* Faster joins: don't stall when a user joins during a fast join (#14606)Mathieu Velten2023-02-101-1/+1
| | | | | | | | | | | | | | | | Fixes #12801. Complement tests are at https://github.com/matrix-org/complement/pull/567. Avoid blocking on full state when handling a subsequent join into a partial state room. Also always perform a remote join into partial state rooms, since we do not know whether the joining user has been banned and want to avoid leaking history to banned users. Signed-off-by: Mathieu Velten <mathieuv@matrix.org> Co-authored-by: Sean Quah <seanq@matrix.org> Co-authored-by: David Robertson <davidr@element.io>
* Add a class UnpersistedEventContext to allow for the batching up of storing ↵Shay2023-02-091-20/+39
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | state groups (#14675) * add class UnpersistedEventContext * modify create new client event to create unpersistedeventcontexts * persist event contexts after creation * fix tests to persist unpersisted event contexts * cleanup * misc lints + cleanup * changelog + fix comments * lints * fix batch insertion? * reduce redundant calculation * add unpersisted event classes * rework compute_event_context, split into function that returns unpersisted event context and then persists it * use calculate_context_info to create unpersisted event contexts * update typing * $%#^&* * black * fix comments and consolidate classes, use attr.s for class * requested changes * lint * requested changes * requested changes * refactor to be stupidly explicit * clearer renaming and flow * make partial state non-optional * update docstrings --------- Co-authored-by: Erik Johnston <erik@matrix.org>
* Faster joins: Refactor handling of servers in room (#14954)Sean Quah2023-02-031-5/+15
| | | | | | | | | | | | | | Ensure that the list of servers in a partial state room always contains the server we joined off. Also refactor `get_partial_state_servers_at_join` to return `None` when the given room is no longer partial stated, to explicitly indicate when the room has partial state. Otherwise it's not clear whether an empty list means that the room has full state, or the room is partial stated, but the server we joined off told us that there are no servers in the room. Signed-off-by: Sean Quah <seanq@matrix.org>
* Use StrCollection in place of Collection[str] in (most) handlers code. (#14922)Patrick Cloke2023-01-261-18/+8
| | | | Due to the increased safety of StrCollection over Collection[str] and Sequence[str].
* Faster joins: omit partial rooms from eager syncs until the resync completes ↵David Robertson2023-01-231-10/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | (#14870) * Allow `AbstractSet` in `StrCollection` Or else frozensets are excluded. This will be useful in an upcoming commit where I plan to change a function that accepts `List[str]` to accept `StrCollection` instead. * `rooms_to_exclude` -> `rooms_to_exclude_globally` I am about to make use of this exclusion mechanism to exclude rooms for a specific user and a specific sync. This rename helps to clarify the distinction between the global config and the rooms to exclude for a specific sync. * Better function names for internal sync methods * Track a list of excluded rooms on SyncResultBuilder I plan to feed a list of partially stated rooms for this sync to ignore * Exclude partial state rooms during eager sync using the mechanism established in the previous commit * Track un-partial-state stream in sync tokens So that we can work out which rooms have become fully-stated during a given sync period. * Fix mutation of `@cached` return value This was fouling up a complement test added alongside this PR. Excluding a room would mean the set of forgotten rooms in the cache would be extended. This means that room could be erroneously considered forgotten in the future. Introduced in #12310, Synapse 1.57.0. I don't think this had any user-visible side effects (until now). * SyncResultBuilder: track rooms to force as newly joined Similar plan as before. We've omitted rooms from certain sync responses; now we establish the mechanism to reintroduce them into future syncs. * Read new field, to present rooms as newly joined * Force un-partial-stated rooms to be newly-joined for eager incremental syncs only, provided they're still fully stated * Notify user stream listeners to wake up long polling syncs * Changelog * Typo fix Co-authored-by: Sean Quah <8349537+squahtx@users.noreply.github.com> * Unnecessary list cast Co-authored-by: Sean Quah <8349537+squahtx@users.noreply.github.com> * Rephrase comment Co-authored-by: Sean Quah <8349537+squahtx@users.noreply.github.com> * Another comment Co-authored-by: Sean Quah <8349537+squahtx@users.noreply.github.com> * Fixup merge(?) * Poke notifier when receiving un-partial-stated msg over replication * Fixup merge whoops Thanks MV :) Co-authored-by: Mathieu Velen <mathieuv@matrix.org> Co-authored-by: Mathieu Velten <mathieuv@matrix.org> Co-authored-by: Sean Quah <8349537+squahtx@users.noreply.github.com>
* Faster joins: Update room stats and the user directory on workers when ↵Sean Quah2023-01-231-3/+4
| | | | | | | | | | | | | | | | | | | | | | | finishing join (#14874) * Faster joins: Update room stats and user directory on workers when done When finishing a partial state join to a room, we update the current state of the room without persisting additional events. Workers receive notice of the current state update over replication, but neglect to wake the room stats and user directory updaters, which then get incidentally triggered the next time an event is persisted or an unrelated event persister sends out a stream position update. We wake the room stats and user directory updaters at the appropriate time in this commit. Part of #12814 and #12815. Signed-off-by: Sean Quah <seanq@matrix.org> * fixup comment Signed-off-by: Sean Quah <seanq@matrix.org>
* Enable Faster Remote Room Joins against worker-mode Synapse. (#14752)reivilibre2023-01-221-3/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | * Enable Complement tests for Faster Remote Room Joins on worker-mode * (dangerous) Add an override to allow Complement to use FRRJ under workers * Newsfile Signed-off-by: Olivier Wilkinson (reivilibre) <oliverw@matrix.org> * Fix race where we didn't send out replication notification * MORE HACKS * Fix get_un_partial_stated_rooms_token to take instance_name * Fix bad merge * Remove warning * Correctly advance un_partial_stated_room_stream * Fix merge * Add another notify_replication * Fixups * Create a separate ReplicationNotifier * Fix test * Fix portdb * Create a separate ReplicationNotifier * Fix test * Fix portdb * Fix presence test * Newsfile * Apply suggestions from code review * Update changelog.d/14752.misc Co-authored-by: Erik Johnston <erik@matrix.org> * lint Signed-off-by: Olivier Wilkinson (reivilibre) <oliverw@matrix.org> Co-authored-by: Erik Johnston <erik@matrix.org>
* Faster joins: Fix incompatibility with restricted joins (#14882)Sean Quah2023-01-221-81/+134
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | * Avoid clearing out forward extremities when doing a second remote join When joining a restricted room where the local homeserver does not have a user able to issue invites, we perform a second remote join. We want to avoid clearing out forward extremities in this case because the forward extremities we have are up to date and clearing out forward extremities creates a window in which the room can get bricked if Synapse crashes. Signed-off-by: Sean Quah <seanq@matrix.org> * Do a full join when doing a second remote join into a full state room We cannot persist a partial state join event into a joined full state room, so we perform a full state join for such rooms instead. As a future optimization, we could always perform a partial state join and compute or retrieve the full state ourselves if necessary. Signed-off-by: Sean Quah <seanq@matrix.org> * Add lock around partial state flag for rooms Signed-off-by: Sean Quah <seanq@matrix.org> * Preserve partial state info when doing a second partial state join Signed-off-by: Sean Quah <seanq@matrix.org> * Add newsfile * Add a TODO(faster_joins) marker Signed-off-by: Sean Quah <seanq@matrix.org>
* Faster joins: Avoid starting duplicate partial state syncs (#14844)Sean Quah2023-01-201-8/+98
| | | | | | | | | | | | | | | | | | Currently, we will try to start a new partial state sync every time we perform a remote join, which is undesirable if there is already one running for a given room. We intend to perform remote joins whenever additional local users wish to join a partial state room, so let's ensure that we do not start more than one concurrent partial state sync for any given room. ------------------------------------------------------------------------ There is a race condition where the homeserver leaves a room and later rejoins while the partial state sync from the previous membership is still running. There is no guarantee that the previous partial state sync will process the latest join, so we restart it if needed. Signed-off-by: Sean Quah <seanq@matrix.org>
* Make `handle_new_client_event` throws `PartialStateConflictError` (#14665)Mathieu Velten2022-12-151-39/+78
| | | | | | | Then adapts calling code to retry when needed so it doesn't 500 to clients. Signed-off-by: Mathieu Velten <mathieuv@matrix.org> Co-authored-by: Sean Quah <8349537+squahtx@users.noreply.github.com>
* Move `StateFilter` to `synapse.types` (#14668)David Robertson2022-12-121-1/+1
| | | | | * Move `StateFilter` to `synapse.types` * Changelog
* Faster remote room joins: stream the un-partial-stating of rooms over ↵reivilibre2022-12-051-0/+4
| | | | replication. [rei:frrj/streams/unpsr] (#14473)
* Faster joins: filter out non local events when a room doesn't have its full ↵Mathieu Velten2022-11-211-5/+10
| | | | | | state (#14404) Signed-off-by: Mathieu Velten <mathieuv@matrix.org>
* Remove redundant types from comments. (#14412)Patrick Cloke2022-11-161-2/+2
| | | | | | | Remove type hints from comments which have been added as Python type hints. This helps avoid drift between comments and reality, as well as removing redundant information. Also adds some missing type hints which were simple to fill in.
* Refactor MSC3030 `/timestamp_to_event` to move away from our snowflake pull ↵Eric Eastwood2022-10-261-6/+9
| | | | | | | | | from `destination` pattern (#14096) 1. `federation_client.timestamp_to_event(...)` now handles all `destination` looping and uses our generic `_try_destination_list(...)` helper. 2. Consistently handling `NotRetryingDestination` and `FederationDeniedError` across `get_pdu` , backfill, and the generic `_try_destination_list` which is used for many places we use this pattern. 3. `get_pdu(...)` now returns `PulledPduInfo` so we know which `destination` we ended up pulling the PDU from
* Add initial power level event to batch of bulk persisted events when ↵Shay2022-10-211-1/+3
| | | | creating a new room. (#14228)
* Prepatory work for adding power level event to batched events (#14214)Shay2022-10-181-7/+5
|
* When restarting a partial join resync, prioritise the server which actioned ↵David Robertson2022-10-181-23/+34
| | | | a partial join (#14126)
* Stop getting missing `prev_events` after we already know their signature is ↵Eric Eastwood2022-10-151-0/+16
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | invalid (#13816) While https://github.com/matrix-org/synapse/pull/13635 stops us from doing the slow thing after we've already done it once, this PR stops us from doing one of the slow things in the first place. Related to - https://github.com/matrix-org/synapse/issues/13622 - https://github.com/matrix-org/synapse/pull/13635 - https://github.com/matrix-org/synapse/issues/13676 Part of https://github.com/matrix-org/synapse/issues/13356 Follow-up to https://github.com/matrix-org/synapse/pull/13815 which tracks event signature failures. With this PR, we avoid the call to the costly `_get_state_ids_after_missing_prev_event` because the signature failure will count as an attempt before and we filter events based on the backoff before calling `_get_state_ids_after_missing_prev_event` now. For example, this will save us 156s out of the 185s total that this `matrix.org` `/messages` request. If you want to see the full Jaeger trace of this, you can drag and drop this `trace.json` into your own Jaeger, https://gist.github.com/MadLittleMods/4b12d0d0afe88c2f65ffcc907306b761 To explain this exact scenario around `/messages` -> backfill, we call `/backfill` and first check the signatures of the 100 events. We see bad signature for `$luA4l7QHhf_jadH3mI-AyFqho0U2Q-IXXUbGSMq6h6M` and `$zuOn2Rd2vsC7SUia3Hp3r6JSkSFKcc5j3QTTqW_0jDw` (both member events). Then we process the 98 events remaining that have valid signatures but one of the events references `$luA4l7QHhf_jadH3mI-AyFqho0U2Q-IXXUbGSMq6h6M` as a `prev_event`. So we have to do the whole `_get_state_ids_after_missing_prev_event` rigmarole which pulls in those same events which fail again because the signatures are still invalid. - `backfill` - `outgoing-federation-request` `/backfill` - `_check_sigs_and_hash_and_fetch` - `_check_sigs_and_hash_and_fetch_one` for each event received over backfill - ❗ `$luA4l7QHhf_jadH3mI-AyFqho0U2Q-IXXUbGSMq6h6M` fails with `Signature on retrieved event was invalid.`: `unable to verify signature for sender domain xxx: 401: Failed to find any key to satisfy: _FetchKeyRequest(...)` - ❗ `$zuOn2Rd2vsC7SUia3Hp3r6JSkSFKcc5j3QTTqW_0jDw` fails with `Signature on retrieved event was invalid.`: `unable to verify signature for sender domain xxx: 401: Failed to find any key to satisfy: _FetchKeyRequest(...)` - `_process_pulled_events` - `_process_pulled_event` for each validated event - ❗ Event `$Q0iMdqtz3IJYfZQU2Xk2WjB5NDF8Gg8cFSYYyKQgKJ0` references `$luA4l7QHhf_jadH3mI-AyFqho0U2Q-IXXUbGSMq6h6M` as a `prev_event` which is missing so we try to get it - `_get_state_ids_after_missing_prev_event` - `outgoing-federation-request` `/state_ids` - ❗ `get_pdu` for `$luA4l7QHhf_jadH3mI-AyFqho0U2Q-IXXUbGSMq6h6M` which fails the signature check again - ❗ `get_pdu` for `$zuOn2Rd2vsC7SUia3Hp3r6JSkSFKcc5j3QTTqW_0jDw` which fails the signature check
* Correct field name for stripped state events when knocking. ↵Andrew Morgan2022-10-121-4/+16
| | | | `knock_state_events` -> `knock_room_state` (#14102)
* Fix performance regression in `get_users_in_room` (#13972)Erik Johnston2022-09-301-1/+3
| | | | | Fixes #13942. Introduced in #13575. Basically, let's only get the ordered set of hosts out of the DB if we need an ordered set of hosts. Since we split the function up the caching won't be as good, but I think it will still be fine as e.g. multiple backfill requests for the same room will hit the cache.
* Improve backfill robustness by trying more servers. (#13890)reivilibre2022-09-291-2/+31
| | | Co-authored-by: Eric Eastwood <erice@element.io>
* Limit and filter the number of backfill points to get from the database (#13879)Eric Eastwood2022-09-281-48/+61
| | | | | | | | | There is no need to grab thousands of backfill points when we only need 5 to make the `/backfill` request with. We need to grab a few extra in case the first few aren't visible in the history. Previously, we grabbed thousands of backfill points from the database, then sorted and filtered them in the app. Fetching the 4.6k backfill points for `#matrix:matrix.org` from the database takes ~50ms - ~570ms so it's not like this saves a lot of time 🤷. But it might save us more time now that `get_backfill_points_in_room`/`get_insertion_event_backward_extremities_in_room` are more complicated after https://github.com/matrix-org/synapse/pull/13635 This PR moves the filtering and limiting to the SQL query so we just have less data to work with in the first place. Part of https://github.com/matrix-org/synapse/issues/13356
* Handle remote device list updates during partial join (#13913)Erik Johnston2022-09-281-0/+4
| | | | | | | c.f. #12993 (comment), point 3 This stores all device list updates that we receive while partial joins are ongoing, and processes them once we have the full state. Note: We don't actually process the device lists in the same ways as if we weren't partially joined. Instead of updating the device list remote cache, we simply notify local users that a change in the remote user's devices has happened. I think this is safe as if the local user requests the keys for the remote user and we don't have them we'll simply fetch them as normal.
* fix: Push notifications for invite over federation (#13719)Kateřina Churanová2022-09-281-3/+10
|
* Add new columns tracking when we partial-joined (#13892)David Robertson2022-09-271-1/+13
|
* Only try to backfill event if we haven't tried before recently (#13635)Eric Eastwood2022-09-231-3/+1
| | | | | | | | | | Only try to backfill event if we haven't tried before recently (exponential backoff). No need to keep trying the same backfill point that fails over and over. Fix https://github.com/matrix-org/synapse/issues/13622 Fix https://github.com/matrix-org/synapse/issues/8451 Follow-up to https://github.com/matrix-org/synapse/pull/13589 Part of https://github.com/matrix-org/synapse/issues/13356
* Faster Remote Room Joins: tell remote homeservers that we are unable to ↵reivilibre2022-09-231-21/+13
| | | | authorise them if they query a room which has partial state on our server. (#13823)
* Optimize how we calculate `likely_domains` during backfill (#13575)Eric Eastwood2022-08-301-43/+10
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Optimize how we calculate `likely_domains` during backfill because I've seen this take 17s in production just to `get_current_state` which is used to `get_domains_from_state` (see case [*2. Loading tons of events* in the `/messages` investigation issue](https://github.com/matrix-org/synapse/issues/13356)). There are 3 ways we currently calculate hosts that are in the room: 1. `get_current_state` -> `get_domains_from_state` - Used in `backfill` to calculate `likely_domains` and `/timestamp_to_event` because it was cargo-culted from `backfill` - This one is being eliminated in favor of `get_current_hosts_in_room` in this PR 🕳 1. `get_current_hosts_in_room` - Used for other federation things like sending read receipts and typing indicators 1. `get_hosts_in_room_at_events` - Used when pushing out events over federation to other servers in the `_process_event_queue_loop` Fix https://github.com/matrix-org/synapse/issues/13626 Part of https://github.com/matrix-org/synapse/issues/13356 Mentioned in [internal doc](https://docs.google.com/document/d/1lvUoVfYUiy6UaHB6Rb4HicjaJAU40-APue9Q4vzuW3c/edit#bookmark=id.2tvwz3yhcafh) ### Query performance #### Before The query from `get_current_state` sucks just because we have to get all 80k events. And we see almost the exact same performance locally trying to get all of these events (16s vs 17s): ``` synapse=# SELECT type, state_key, event_id FROM current_state_events WHERE room_id = '!OGEhHVWSdvArJzumhm:matrix.org'; Time: 16035.612 ms (00:16.036) synapse=# SELECT type, state_key, event_id FROM current_state_events WHERE room_id = '!OGEhHVWSdvArJzumhm:matrix.org'; Time: 4243.237 ms (00:04.243) ``` But what about `get_current_hosts_in_room`: When there is 8M rows in the `current_state_events` table, the previous query in `get_current_hosts_in_room` took 13s from complete freshness (when the events were first added). But takes 930ms after a Postgres restart or 390ms if running back to back to back. ```sh $ psql synapse synapse=# \timing on synapse=# SELECT COUNT(DISTINCT substring(state_key FROM '@[^:]*:(.*)$')) FROM current_state_events WHERE type = 'm.room.member' AND membership = 'join' AND room_id = '!OGEhHVWSdvArJzumhm:matrix.org'; count ------- 4130 (1 row) Time: 13181.598 ms (00:13.182) synapse=# SELECT COUNT(*) from current_state_events where room_id = '!OGEhHVWSdvArJzumhm:matrix.org'; count ------- 80814 synapse=# SELECT COUNT(*) from current_state_events; count --------- 8162847 synapse=# SELECT pg_size_pretty( pg_total_relation_size('current_state_events') ); pg_size_pretty ---------------- 4702 MB ``` #### After I'm not sure how long it takes from complete freshness as I only really get that opportunity once (maybe restarting computer but that's cumbersome) and it's not really relevant to normal operating times. Maybe you get closer to the fresh times the more access variability there is so that Postgres caches aren't as exact. Update: The longest I've seen this run for is 6.4s and 4.5s after a computer restart. After a Postgres restart, it takes 330ms and running back to back takes 260ms. ```sh $ psql synapse synapse=# \timing on Timing is on. synapse=# SELECT substring(c.state_key FROM '@[^:]*:(.*)$') as host FROM current_state_events c /* Get the depth of the event from the events table */ INNER JOIN events AS e USING (event_id) WHERE c.type = 'm.room.member' AND c.membership = 'join' AND c.room_id = '!OGEhHVWSdvArJzumhm:matrix.org' GROUP BY host ORDER BY min(e.depth) ASC; Time: 333.800 ms ``` #### Going further To improve things further we could add a `limit` parameter to `get_current_hosts_in_room`. Realistically, we don't need 4k domains to choose from because there is no way we're going to query that many before we a) probably get an answer or b) we give up. Another thing we can do is optimize the query to use a index skip scan: - https://wiki.postgresql.org/wiki/Loose_indexscan - Index Skip Scan, https://commitfest.postgresql.org/37/1741/ - https://www.timescale.com/blog/how-we-made-distinct-queries-up-to-8000x-faster-on-postgresql/
* Fix Prometheus metrics being negative (mixed up start/end) (#13584)Eric Eastwood2022-08-231-1/+6
| | | | | | | Fix: - https://github.com/matrix-org/synapse/pull/13535#discussion_r949582508 - https://github.com/matrix-org/synapse/pull/13533#discussion_r949577244
* Time how long it takes us to do backfill processing (#13535)Eric Eastwood2022-08-171-2/+54
|
* Instrument the federation/backfill part of `/messages` (#13489)Eric Eastwood2022-08-161-1/+9
| | | | | | | | | Instrument the federation/backfill part of `/messages` so it's easier to follow what's going on in Jaeger when viewing a trace. Split out from https://github.com/matrix-org/synapse/pull/13440 Follow-up from https://github.com/matrix-org/synapse/pull/13368 Part of https://github.com/matrix-org/synapse/issues/13356
* Instrument `FederationStateIdsServlet` - `/state_ids` (#13499)Eric Eastwood2022-08-151-1/+3
| | | Instrument FederationStateIdsServlet - `/state_ids` so it's easier to follow what's going on in Jaeger when viewing a trace.
* Faster Room Joins: prevent Synapse from answering federated join requests ↵reivilibre2022-08-041-0/+17
| | | | for a room which it has not fully joined yet. (#13416)
* Instrument `/messages` for understandable traces in Jaeger (#13368)Eric Eastwood2022-08-031-0/+2
| | | | | | In Jaeger: - Before: huge list of uncategorized database calls - After: nice and collapsible into units of work
* Fix error when out of servers to sync partial state with (#13432)Sean Quah2022-08-021-2/+3
| | | | | so that we raise the intended error instead. Signed-off-by: Sean Quah <seanq@matrix.org>
* Faster Room Joins: don't leave a stuck room partial state flag if the join ↵reivilibre2022-08-011-14/+18
| | | | fails. (#13403)
* Uniformize spam-checker API, part 5: expand other spam-checker callbacks to ↵David Teller2022-07-111-1/+2
| | | | | | return `Tuple[Codes, dict]` (#13044) Signed-off-by: David Teller <davidt@element.io> Co-authored-by: Brendan Abolivier <babolivier@matrix.org>
* Faster room joins: fix race in recalculation of current room state (#13151)Sean Quah2022-07-071-7/+2
| | | | | | | | | | | Bounce recalculation of current state to the correct event persister and move recalculation of current state into the event persistence queue, to avoid concurrent updates to a room's current state. Also give recalculation of a room's current state a real stream ordering. Signed-off-by: Sean Quah <seanq@matrix.org>
* Handle race between persisting an event and un-partial stating a room (#13100)Sean Quah2022-07-051-14/+25
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Whenever we want to persist an event, we first compute an event context, which includes the state at the event and a flag indicating whether the state is partial. After a lot of processing, we finally try to store the event in the database, which can fail for partial state events when the containing room has been un-partial stated in the meantime. We detect the race as a foreign key constraint failure in the data store layer and turn it into a special `PartialStateConflictError` exception, which makes its way up to the method in which we computed the event context. To make things difficult, the exception needs to cross a replication request: `/fed_send_events` for events coming over federation and `/send_event` for events from clients. We transport the `PartialStateConflictError` as a `409 Conflict` over replication and turn `409`s back into `PartialStateConflictError`s on the worker making the request. All client events go through `EventCreationHandler.handle_new_client_event`, which is called in *a lot* of places. Instead of trying to update all the code which creates client events, we turn the `PartialStateConflictError` into a `429 Too Many Requests` in `EventCreationHandler.handle_new_client_event` and hope that clients take it as a hint to retry their request. On the federation event side, there are 7 places which compute event contexts. 4 of them use outlier event contexts: `FederationEventHandler._auth_and_persist_outliers_inner`, `FederationHandler.do_knock`, `FederationHandler.on_invite_request` and `FederationHandler.do_remotely_reject_invite`. These events won't have the partial state flag, so we do not need to do anything for then. The remaining 3 paths which create events are `FederationEventHandler.process_remote_join`, `FederationEventHandler.on_send_membership_event` and `FederationEventHandler._process_received_pdu`. We can't experience the race in `process_remote_join`, unless we're handling an additional join into a partial state room, which currently blocks, so we make no attempt to handle it correctly. `on_send_membership_event` is only called by `FederationServer._on_send_membership_event`, so we catch the `PartialStateConflictError` there and retry just once. `_process_received_pdu` is called by `on_receive_pdu` for incoming events and `_process_pulled_event` for backfill. The latter should never try to persist partial state events, so we ignore it. We catch the `PartialStateConflictError` in `on_receive_pdu` and retry just once. Refering to the graph of code paths in https://github.com/matrix-org/synapse/issues/12988#issuecomment-1156857648 may make the above make more sense. Signed-off-by: Sean Quah <seanq@matrix.org>
* Uniformize spam-checker API, part 4: port other spam-checker callbacks to ↵David Teller2022-06-131-3/+7
| | | | | return `Union[Allow, Codes]`. (#12857) Co-authored-by: Brendan Abolivier <babolivier@matrix.org>
* Merge branch 'rav/simplify_event_auth_interface' into developRichard van der Hoff2022-06-131-15/+7
|\
| * Remove redundant `room_version` param from `check_auth_rules_from_context`Richard van der Hoff2022-06-121-13/+5
| | | | | | | | It's now implied by the room_version property on the event.
| * Remove `room_version` param from `validate_event_for_room_version`Richard van der Hoff2022-06-121-2/+2
| | | | | | | | | | | | | | Instead, use the `room_version` property of the event we're validating. The `room_version` was originally added as a parameter somewhere around #4482, but really it's been redundant since #6875 added a `room_version` field to `EventBase`.
* | Faster joins: add issue links to the TODOs (#13004)Richard van der Hoff2022-06-091-1/+12
|/ | | | ... to help us keep track of these things
* Reduce the amount of state we pull from the DB (#12811)Erik Johnston2022-06-061-1/+1
|
* Wait for lazy join to complete when getting current state (#12872)Erik Johnston2022-06-011-1/+6
|
* Faster room joins: Resume state re-syncing after a Synapse restart (#12813)Sean Quah2022-05-311-2/+25
| | | | Signed-off-by: Sean Quah <seanq@matrix.org>
* Faster room joins: Try other destinations when resyncing the state of a ↵Sean Quah2022-05-311-8/+78
| | | | | | | partial-state room (#12812) Signed-off-by: Sean Quah <seanq@matrix.org>
* Rename storage classes (#12913)Erik Johnston2022-05-311-12/+18
|
* Fix up `state_store` naming (#12871)Erik Johnston2022-05-251-2/+4
|
* Update EventContext `get_current_event_ids` and `get_prev_event_ids` to ↵Shay2022-05-201-2/+7
| | | | accept state filters and update calls where possible (#12791)
* Refactor `EventContext` (#12689)Erik Johnston2022-05-101-3/+3
| | | | | | | | | | Refactor how the `EventContext` class works, with the intention of reducing the amount of state we fetch from the DB during event processing. The idea here is to get rid of the cached `current_state_ids` and `prev_state_ids` that live in the `EventContext`, and instead defer straight to the database (and its caching). One change that may have a noticeable effect is that we now no longer prefill the `get_current_state_ids` cache on a state change. However, that query is relatively light, since its just a case of reading a table from the DB (unlike fetching state at an event which is more heavyweight). For deployments with workers this cache isn't even used. Part of #12684
* remove constantly lib use and switch to enums. (#12624)andrew do2022-05-041-2/+2
|
* Optimise backfill calculation (#12522)Richard van der Hoff2022-04-261-89/+145
| | | | | | Try to avoid an OOM by checking fewer extremities. Generally this is a big rewrite of _maybe_backfill, to try and fix some of the TODOs and other problems in it. It's best reviewed commit-by-commit.
* Resync state after partial-state join (#12394)Richard van der Hoff2022-04-121-0/+75
| | | | | We work through all the events with partial state, updating the state at each of them. Once it's done, we recalculate the state for the whole room, and then mark the room as having complete state.
* Refactor and convert `Linearizer` to async (#12357)Sean Quah2022-04-051-1/+1
| | | | | | | | | | | Refactor and convert `Linearizer` to async. This makes a `Linearizer` cancellation bug easier to fix. Also refactor to use an async context manager, which eliminates an unlikely footgun where code that doesn't immediately use the context manager could forget to release the lock. Signed-off-by: Sean Quah <seanq@element.io>
* Return a 404 from `/state` for an outlier (#12087)Richard van der Hoff2022-03-211-40/+21
| | | | | * Replace `get_state_for_pdu` with `get_state_ids_for_pdu` and `get_events_as_list`. * Return a 404 from `/state` and `/state_ids` for an outlier
* Skip attempt to get state at backwards-extremities (#12173)Richard van der Hoff2022-03-091-57/+3
| | | | We don't *have* the state at a backwards-extremity, so this is never going to do anything useful.
* Faster joins: persist to database (#12012)Richard van der Hoff2022-03-011-1/+10
| | | | | | | | | | | | When we get a partial_state response from send_join, store information in the database about it: * store a record about the room as a whole having partial state, and stash the list of member servers too. * flag the join event itself as having partial state * also, for any new events whose prev-events are partial-stated, note that they will *also* be partial-stated. We don't yet make any attempt to interpret this data, so API calls (and a bunch of other things) are just going to get incorrect data.
* Remove `HomeServer.get_datastore()` (#12031)Richard van der Hoff2022-02-231-1/+1
| | | | | | | The presence of this method was confusing, and mostly present for backwards compatibility. Let's get rid of it. Part of #11733
* Run `_handle_queued_pdus` as a background process (#12041)Richard van der Hoff2022-02-221-2/+4
| | | ... to ensure it gets a proper log context, mostly.
* remote join processing: get create event from state, not auth_chain (#12039)Richard van der Hoff2022-02-211-1/+1
| | | A follow-up to #12005, in which I apparently missed that there are a bunch of other places that assume the create event is in the auth chain.
* Fix historical messages backfilling in random order on remote homeservers ↵Eric Eastwood2022-02-071-4/+23
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | (MSC2716) (#11114) Fix https://github.com/matrix-org/synapse/issues/11091 Fix https://github.com/matrix-org/synapse/issues/10764 (side-stepping the issue because we no longer have to deal with `fake_prev_event_id`) 1. Made the `/backfill` response return messages in `(depth, stream_ordering)` order (previously only sorted by `depth`) - Technically, it shouldn't really matter how `/backfill` returns things but I'm just trying to make the `stream_ordering` a little more consistent from the origin to the remote homeservers in order to get the order of messages from `/messages` consistent ([sorted by `(topological_ordering, stream_ordering)`](https://github.com/matrix-org/synapse/blob/develop/docs/development/room-dag-concepts.md#depth-and-stream-ordering)). - Even now that we return backfilled messages in order, it still doesn't guarantee the same `stream_ordering` (and more importantly the [`/messages` order](https://github.com/matrix-org/synapse/blob/develop/docs/development/room-dag-concepts.md#depth-and-stream-ordering)) on the other server. For example, if a room has a bunch of history imported and someone visits a permalink to a historical message back in time, their homeserver will skip over the historical messages in between and insert the permalink as the next message in the `stream_order` and totally throw off the sort. - This will be even more the case when we add the [MSC3030 jump to date API endpoint](https://github.com/matrix-org/matrix-doc/pull/3030) so the static archives can navigate and jump to a certain date. - We're solving this in the future by switching to [online topological ordering](https://github.com/matrix-org/gomatrixserverlib/issues/187) and [chunking](https://github.com/matrix-org/synapse/issues/3785) which by its nature will apply retroactively to fix any inconsistencies introduced by people permalinking 2. As we're navigating `prev_events` to return in `/backfill`, we order by `depth` first (newest -> oldest) and now also tie-break based on the `stream_ordering` (newest -> oldest). This is technically important because MSC2716 inserts a bunch of historical messages at the same `depth` so it's best to be prescriptive about which ones we should process first. In reality, I think the code already looped over the historical messages as expected because the database is already in order. 3. Making the historical state chain and historical event chain float on their own by having no `prev_events` instead of a fake `prev_event` which caused backfill to get clogged with an unresolvable event. Fixes https://github.com/matrix-org/synapse/issues/11091 and https://github.com/matrix-org/synapse/issues/10764 4. We no longer find connected insertion events by finding a potential `prev_event` connection to the current event we're iterating over. We now solely rely on marker events which when processed, add the insertion event as an extremity and the federating homeserver can ask about it when time calls. - Related discussion, https://github.com/matrix-org/synapse/pull/11114#discussion_r741514793 Before | After --- | --- ![](https://user-images.githubusercontent.com/558581/139218681-b465c862-5c49-4702-a59e-466733b0cf45.png) | ![](https://user-images.githubusercontent.com/558581/146453159-a1609e0a-8324-439d-ae44-e4bce43ac6d1.png) #### Why aren't we sorting topologically when receiving backfill events? > The main reason we're going to opt to not sort topologically when receiving backfill events is because it's probably best to do whatever is easiest to make it just work. People will probably have opinions once they look at [MSC2716](https://github.com/matrix-org/matrix-doc/pull/2716) which could change whatever implementation anyway. > > As mentioned, ideally we would do this but code necessary to make the fake edges but it gets confusing and gives an impression of “just whyyyy” (feels icky). This problem also dissolves with online topological ordering. > > -- https://github.com/matrix-org/synapse/pull/11114#discussion_r741517138 See https://github.com/matrix-org/synapse/pull/11114#discussion_r739610091 for the technical difficulties
* Remove `log_function` and its uses (#11761)Richard van der Hoff2022-01-181-6/+0
| | | | | | | I've never found this terribly useful. I think it was added in the early days of Synapse, without much thought as to what would actually be useful to log, and has just been cargo-culted ever since. Rather, it tends to clutter up debug logs with useless information.
* Add missing type hints to `synapse.logging.context` (#11556)Sean Quah2021-12-141-8/+11
|
* Add MSC3030 experimental client and federation API endpoints to get the ↵Eric Eastwood2021-12-021-30/+31
| | | | | | | | | | | | | | | | | | | | | | | | | closest event to a given timestamp (#9445) MSC3030: https://github.com/matrix-org/matrix-doc/pull/3030 Client API endpoint. This will also go and fetch from the federation API endpoint if unable to find an event locally or we found an extremity with possibly a closer event we don't know about. ``` GET /_matrix/client/unstable/org.matrix.msc3030/rooms/<roomID>/timestamp_to_event?ts=<timestamp>&dir=<direction> { "event_id": ... "origin_server_ts": ... } ``` Federation API endpoint: ``` GET /_matrix/federation/unstable/org.matrix.msc3030/timestamp_to_event/<roomID>?ts=<timestamp>&dir=<direction> { "event_id": ... "origin_server_ts": ... } ``` Co-authored-by: Erik Johnston <erik@matrix.org>
* Move _persist_auth_tree into FederationEventHandler (#11115)Richard van der Hoff2021-10-191-124/+4
| | | | | This is just a lift-and-shift, because it fits more naturally here. We do rename it to `process_remote_join` at the same time though.
* Check *all* auth events for room id and rejection (#11009)Richard van der Hoff2021-10-181-6/+4
| | | | | | | | | | | This fixes a bug where we would accept an event whose `auth_events` include rejected events, if the rejected event was shadowed by another `auth_event` with same `(type, state_key)`. The approach is to pass a list of auth events into `check_auth_rules_for_event` instead of a dict, which of course means updating the call sites. This is an extension of #10956.
* Fix 500 error on `/messages` when we accumulate more than 5 backward ↵Eric Eastwood2021-10-141-11/+13
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | extremities (#11027) Found while working on the Gitter backfill script and noticed it only happened after we sent 7 batches, https://gitlab.com/gitterHQ/webapp/-/merge_requests/2229#note_665906390 When there are more than 5 backward extremities for a given depth, backfill will throw an error because we sliced the extremity list to 5 but then try to iterate over the full list. This causes us to look for state that we never fetched and we get a `KeyError`. Before when calling `/messages` when there are more than 5 backward extremities: ``` Traceback (most recent call last): File "/usr/local/lib/python3.8/site-packages/synapse/http/server.py", line 258, in _async_render_wrapper callback_return = await self._async_render(request) File "/usr/local/lib/python3.8/site-packages/synapse/http/server.py", line 446, in _async_render callback_return = await raw_callback_return File "/usr/local/lib/python3.8/site-packages/synapse/rest/client/room.py", line 580, in on_GET msgs = await self.pagination_handler.get_messages( File "/usr/local/lib/python3.8/site-packages/synapse/handlers/pagination.py", line 396, in get_messages await self.hs.get_federation_handler().maybe_backfill( File "/usr/local/lib/python3.8/site-packages/synapse/handlers/federation.py", line 133, in maybe_backfill return await self._maybe_backfill_inner(room_id, current_depth, limit) File "/usr/local/lib/python3.8/site-packages/synapse/handlers/federation.py", line 386, in _maybe_backfill_inner likely_extremeties_domains = get_domains_from_state(states[e_id]) KeyError: '$zpFflMEBtZdgcMQWTakaVItTLMjLFdKcRWUPHbbSZJl' ```
* Remove the deprecated BaseHandler. (#11005)Patrick Cloke2021-10-081-4/+2
| | | | | | | | The shared ratelimit function was replaced with a dedicated RequestRatelimiter class (accessible from the HomeServer object). Other properties were copied to each sub-class that inherited from BaseHandler.
* Strip "join_authorised_via_users_server" from join events which do not need ↵Patrick Cloke2021-09-301-2/+7
| | | | | | | it. (#10933) This fixes a "Event not signed by authorising server" error when transition room member from join -> join, e.g. when updating a display name or avatar URL for restricted rooms.
* Split `event_auth.check` into two parts (#10940)Richard van der Hoff2021-09-291-12/+18
| | | | | | | | | | | | | Broadly, the existing `event_auth.check` function has two parts: * a validation section: checks that the event isn't too big, that it has the rught signatures, etc. This bit is independent of the rest of the state in the room, and so need only be done once for each event. * an auth section: ensures that the event is allowed, given the rest of the state in the room. This gets done multiple times, against various sets of room state, because it forms part of the state res algorithm. Currently, this is implemented with `do_sig_check` and `do_size_check` parameters, but I think that makes everything hard to follow. Instead, we split the function in two and call each part separately where it is needed.
* Use direct references for configuration variables (part 6). (#10916)Patrick Cloke2021-09-291-1/+1
|
* Use `RoomVersion` objects (#10934)Richard van der Hoff2021-09-291-20/+26
| | | Various refactors to use `RoomVersion` objects instead of room version identifiers.
* Use direct references for configuration variables (part 5). (#10897)Patrick Cloke2021-09-241-1/+1
|
* Remove unnecessary parentheses around tuples returned from methods (#10889)Andrew Morgan2021-09-231-1/+1
|
* Factor out a separate `EventContext.for_outlier` (#10883)Richard van der Hoff2021-09-221-5/+4
| | | | | | Constructing an EventContext for an outlier is actually really simple, and there's no sense in going via an `async` method in the `StateHandler`. This also means that we can resolve a bunch of FIXMEs.
* Ensure we mark sent knocks as outliers (#10873)Richard van der Hoff2021-09-221-0/+7
|
* Require type hints in the handlers module. (#10831)Patrick Cloke2021-09-201-130/+0
| | | | | | | Adds missing type hints to methods in the synapse.handlers module and requires all methods to have type hints there. This also removes the unused construct_auth_difference method from the FederationHandler.
* Use direct references for some configuration variables (#10798)Patrick Cloke2021-09-131-2/+2
| | | | Instead of proxying through the magic getter of the RootConfig object. This should be more performant (and is more explicit).
* Populate `rooms.creator` field for easy lookup (#10697)Eric Eastwood2021-09-011-0/+1
| | | | | | Part of https://github.com/matrix-org/synapse/pull/10566 - Fill in creator whenever we insert into the rooms table - Add background update to backfill any missing creator values
* Split `FederationHandler` in half (#10692)Richard van der Hoff2021-08-261-1765/+22
| | | The idea here is to take anything to do with incoming events and move it out to a separate handler, as a way of making FederationHandler smaller.
* Make `backfill` and `get_missing_events` use the same codepath (#10645)Richard van der Hoff2021-08-261-233/+40
| | | Given that backfill and get_missing_events are basically the same thing, it's somewhat crazy that we have entirely separate code paths for them. This makes backfill use the existing get_missing_events code, and then clears up all the unused code.
* Split `on_receive_pdu` in half (#10640)Richard van der Hoff2021-08-191-98/+138
| | | Here we split on_receive_pdu into two functions (on_receive_pdu and process_pulled_event), rather than having both cases in the same method. There's a tiny bit of overlap, but not that much.
* Extract `_resolve_state_at_missing_prevs` (#10624)Richard van der Hoff2021-08-191-105/+124
| | | This is a follow-up to #10615: it takes the code that constructs the state at a backwards extremity, and extracts it to a separate method.
* Refactor `on_receive_pdu` code (#10615)Richard van der Hoff2021-08-181-134/+137
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | * drop room pdu linearizer sooner No point holding onto it while we recheck the db * move out `missing_prevs` calculation we're going to need `missing_prevs` whatever we do, so we may as well calculate it eagerly and just update it if it gets outdated. * Add another `if missing_prevs` condition this should be a no-op, since all the code inside the block already checks `if missing_prevs` * reorder if conditions This shouldn't change the logic at all. * Push down `min_depth` read No point reading it from the database unless we're going to use it. * Collect the sent_to_us_directly code together Move the remaining `sent_to_us_directly` code inside the `if sent_to_us_directly` block. * Properly separate the `not sent_to_us_directly` branch Since the only way this second block is now reachable is if we *didn't* go into the `sent_to_us_directly` branch, we can replace it with a simple `else`. * changelog
* Stop setting the outlier flag for things that aren't (#10614)Richard van der Hoff2021-08-171-7/+2
| | | | | Marking things as outliers to inhibit pushes is a sledgehammer to crack a nut. Move the test further down the stack so that we just inhibit the thing we want.
* Clean up some logging in the federation event handler (#10591)Richard van der Hoff2021-08-161-28/+24
| | | | | | | | | | | | | | | | | | | * Include outlier status in `str(event)` In places where we log event objects, knowing whether or not you're dealing with an outlier is super useful. * Remove duplicated logging in get_missing_events When we process events received from get_missing_events, we log them twice (once in `_get_missing_events_for_pdu`, and once in `on_receive_pdu`). Reduce the duplication by removing the logging in `on_receive_pdu`, and ensuring the call sites do sensible logging. * log in `on_receive_pdu` when we already have the event * Log which prev_events we are missing * changelog
* Clean up federation event auth code (#10539)Richard van der Hoff2021-08-061-52/+66
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | * drop old-room hack pretty sure we don't need this any more. * Remove incorrect comment about modifying `context` It doesn't look like the supplied context is ever modified. * Stop `_auth_and_persist_event` modifying its parameters This is only called in three places. Two of them don't pass `auth_events`, and the third doesn't use the dict after passing it in, so this should be non-functional. * Stop `_check_event_auth` modifying its parameters `_check_event_auth` is only called in three places. `on_send_membership_event` doesn't pass an `auth_events`, and `prep` and `_auth_and_persist_event` do not use the map after passing it in. * Stop `_update_auth_events_and_context_for_auth` modifying its parameters Return the updated auth event dict, rather than modifying the parameter. This is only called from `_check_event_auth`. * Improve documentation on `_auth_and_persist_event` Rename `auth_events` parameter to better reflect what it contains. * Improve documentation on `_NewEventInfo` * Improve documentation on `_check_event_auth` rename `auth_events` parameter to better describe what it contains * changelog
* Add support for MSC2716 marker events (#10498)Eric Eastwood2021-08-041-6/+113
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | * Make historical messages available to federated servers Part of MSC2716: https://github.com/matrix-org/matrix-doc/pull/2716 Follow-up to https://github.com/matrix-org/synapse/pull/9247 * Debug message not available on federation * Add base starting insertion point when no chunk ID is provided * Fix messages from multiple senders in historical chunk Follow-up to https://github.com/matrix-org/synapse/pull/9247 Part of MSC2716: https://github.com/matrix-org/matrix-doc/pull/2716 --- Previously, Synapse would throw a 403, `Cannot force another user to join.`, because we were trying to use `?user_id` from a single virtual user which did not match with messages from other users in the chunk. * Remove debug lines * Messing with selecting insertion event extremeties * Move db schema change to new version * Add more better comments * Make a fake requester with just what we need See https://github.com/matrix-org/synapse/pull/10276#discussion_r660999080 * Store insertion events in table * Make base insertion event float off on its own See https://github.com/matrix-org/synapse/pull/10250#issuecomment-875711889 Conflicts: synapse/rest/client/v1/room.py * Validate that the app service can actually control the given user See https://github.com/matrix-org/synapse/pull/10276#issuecomment-876316455 Conflicts: synapse/rest/client/v1/room.py * Add some better comments on what we're trying to check for * Continue debugging * Share validation logic * Add inserted historical messages to /backfill response * Remove debug sql queries * Some marker event implemntation trials * Clean up PR * Rename insertion_event_id to just event_id * Add some better sql comments * More accurate description * Add changelog * Make it clear what MSC the change is part of * Add more detail on which insertion event came through * Address review and improve sql queries * Only use event_id as unique constraint * Fix test case where insertion event is already in the normal DAG * Remove debug changes * Add support for MSC2716 marker events * Process markers when we receive it over federation * WIP: make hs2 backfill historical messages after marker event * hs2 to better ask for insertion event extremity But running into the `sqlite3.IntegrityError: NOT NULL constraint failed: event_to_state_groups.state_group` error * Add insertion_event_extremities table * Switch to chunk events so we can auth via power_levels Previously, we were using `content.chunk_id` to connect one chunk to another. But these events can be from any `sender` and we can't tell who should be able to send historical events. We know we only want the application service to do it but these events have the sender of a real historical message, not the application service user ID as the sender. Other federated homeservers also have no indicator which senders are an application service on the originating homeserver. So we want to auth all of the MSC2716 events via power_levels and have them be sent by the application service with proper PL levels in the room. * Switch to chunk events for federation * Add unstable room version to support new historical PL * Messy: Fix undefined state_group for federated historical events ``` 2021-07-13 02:27:57,810 - synapse.handlers.federation - 1248 - ERROR - GET-4 - Failed to backfill from hs1 because NOT NULL constraint failed: event_to_state_groups.state_group Traceback (most recent call last): File "/usr/local/lib/python3.8/site-packages/synapse/handlers/federation.py", line 1216, in try_backfill await self.backfill( File "/usr/local/lib/python3.8/site-packages/synapse/handlers/federation.py", line 1035, in backfill await self._auth_and_persist_event(dest, event, context, backfilled=True) File "/usr/local/lib/python3.8/site-packages/synapse/handlers/federation.py", line 2222, in _auth_and_persist_event await self._run_push_actions_and_persist_event(event, context, backfilled) File "/usr/local/lib/python3.8/site-packages/synapse/handlers/federation.py", line 2244, in _run_push_actions_and_persist_event await self.persist_events_and_notify( File "/usr/local/lib/python3.8/site-packages/synapse/handlers/federation.py", line 3290, in persist_events_and_notify events, max_stream_token = await self.storage.persistence.persist_events( File "/usr/local/lib/python3.8/site-packages/synapse/logging/opentracing.py", line 774, in _trace_inner return await func(*args, **kwargs) File "/usr/local/lib/python3.8/site-packages/synapse/storage/persist_events.py", line 320, in persist_events ret_vals = await yieldable_gather_results(enqueue, partitioned.items()) File "/usr/local/lib/python3.8/site-packages/synapse/storage/persist_events.py", line 237, in handle_queue_loop ret = await self._per_item_callback( File "/usr/local/lib/python3.8/site-packages/synapse/storage/persist_events.py", line 577, in _persist_event_batch await self.persist_events_store._persist_events_and_state_updates( File "/usr/local/lib/python3.8/site-packages/synapse/storage/databases/main/events.py", line 176, in _persist_events_and_state_updates await self.db_pool.runInteraction( File "/usr/local/lib/python3.8/site-packages/synapse/storage/database.py", line 681, in runInteraction result = await self.runWithConnection( File "/usr/local/lib/python3.8/site-packages/synapse/storage/database.py", line 770, in runWithConnection return await make_deferred_yieldable( File "/usr/local/lib/python3.8/site-packages/twisted/python/threadpool.py", line 238, in inContext result = inContext.theWork() # type: ignore[attr-defined] File "/usr/local/lib/python3.8/site-packages/twisted/python/threadpool.py", line 254, in <lambda> inContext.theWork = lambda: context.call( # type: ignore[attr-defined] File "/usr/local/lib/python3.8/site-packages/twisted/python/context.py", line 118, in callWithContext return self.currentContext().callWithContext(ctx, func, *args, **kw) File "/usr/local/lib/python3.8/site-packages/twisted/python/context.py", line 83, in callWithContext return func(*args, **kw) File "/usr/local/lib/python3.8/site-packages/twisted/enterprise/adbapi.py", line 293, in _runWithConnection compat.reraise(excValue, excTraceback) File "/usr/local/lib/python3.8/site-packages/twisted/python/deprecate.py", line 298, in deprecatedFunction return function(*args, **kwargs) File "/usr/local/lib/python3.8/site-packages/twisted/python/compat.py", line 403, in reraise raise exception.with_traceback(traceback) File "/usr/local/lib/python3.8/site-packages/twisted/enterprise/adbapi.py", line 284, in _runWithConnection result = func(conn, *args, **kw) File "/usr/local/lib/python3.8/site-packages/synapse/storage/database.py", line 765, in inner_func return func(db_conn, *args, **kwargs) File "/usr/local/lib/python3.8/site-packages/synapse/storage/database.py", line 549, in new_transaction r = func(cursor, *args, **kwargs) File "/usr/local/lib/python3.8/site-packages/synapse/logging/utils.py", line 69, in wrapped return f(*args, **kwargs) File "/usr/local/lib/python3.8/site-packages/synapse/storage/databases/main/events.py", line 385, in _persist_events_txn self._store_event_state_mappings_txn(txn, events_and_contexts) File "/usr/local/lib/python3.8/site-packages/synapse/storage/databases/main/events.py", line 2065, in _store_event_state_mappings_txn self.db_pool.simple_insert_many_txn( File "/usr/local/lib/python3.8/site-packages/synapse/storage/database.py", line 923, in simple_insert_many_txn txn.execute_batch(sql, vals) File "/usr/local/lib/python3.8/site-packages/synapse/storage/database.py", line 280, in execute_batch self.executemany(sql, args) File "/usr/local/lib/python3.8/site-packages/synapse/storage/database.py", line 300, in executemany self._do_execute(self.txn.executemany, sql, *args) File "/usr/local/lib/python3.8/site-packages/synapse/storage/database.py", line 330, in _do_execute return func(sql, *args) sqlite3.IntegrityError: NOT NULL constraint failed: event_to_state_groups.state_group ``` * Revert "Messy: Fix undefined state_group for federated historical events" This reverts commit 187ab28611546321e02770944c86f30ee2bc742a. * Fix federated events being rejected for no state_groups Add fix from https://github.com/matrix-org/synapse/pull/10439 until it merges. * Adapting to experimental room version * Some log cleanup * Add better comments around extremity fetching code and why * Rename to be more accurate to what the function returns * Add changelog * Ignore rejected events * Use simplified upsert * Add Erik's explanation of extra event checks See https://github.com/matrix-org/synapse/pull/10498#discussion_r680880332 * Clarify that the depth is not directly correlated to the backwards extremity that we return See https://github.com/matrix-org/synapse/pull/10498#discussion_r681725404 * lock only matters for sqlite See https://github.com/matrix-org/synapse/pull/10498#discussion_r681728061 * Move new SQL changes to its own delta file * Clean up upsert docstring * Bump database schema version (62)
* Make historical events discoverable from backfill for servers without any ↵Eric Eastwood2021-07-281-2/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | scrollback history (MSC2716) (#10245) * Make historical messages available to federated servers Part of MSC2716: https://github.com/matrix-org/matrix-doc/pull/2716 Follow-up to https://github.com/matrix-org/synapse/pull/9247 * Debug message not available on federation * Add base starting insertion point when no chunk ID is provided * Fix messages from multiple senders in historical chunk Follow-up to https://github.com/matrix-org/synapse/pull/9247 Part of MSC2716: https://github.com/matrix-org/matrix-doc/pull/2716 --- Previously, Synapse would throw a 403, `Cannot force another user to join.`, because we were trying to use `?user_id` from a single virtual user which did not match with messages from other users in the chunk. * Remove debug lines * Messing with selecting insertion event extremeties * Move db schema change to new version * Add more better comments * Make a fake requester with just what we need See https://github.com/matrix-org/synapse/pull/10276#discussion_r660999080 * Store insertion events in table * Make base insertion event float off on its own See https://github.com/matrix-org/synapse/pull/10250#issuecomment-875711889 Conflicts: synapse/rest/client/v1/room.py * Validate that the app service can actually control the given user See https://github.com/matrix-org/synapse/pull/10276#issuecomment-876316455 Conflicts: synapse/rest/client/v1/room.py * Add some better comments on what we're trying to check for * Continue debugging * Share validation logic * Add inserted historical messages to /backfill response * Remove debug sql queries * Some marker event implemntation trials * Clean up PR * Rename insertion_event_id to just event_id * Add some better sql comments * More accurate description * Add changelog * Make it clear what MSC the change is part of * Add more detail on which insertion event came through * Address review and improve sql queries * Only use event_id as unique constraint * Fix test case where insertion event is already in the normal DAG * Remove debug changes * Switch to chunk events so we can auth via power_levels Previously, we were using `content.chunk_id` to connect one chunk to another. But these events can be from any `sender` and we can't tell who should be able to send historical events. We know we only want the application service to do it but these events have the sender of a real historical message, not the application service user ID as the sender. Other federated homeservers also have no indicator which senders are an application service on the originating homeserver. So we want to auth all of the MSC2716 events via power_levels and have them be sent by the application service with proper PL levels in the room. * Switch to chunk events for federation * Add unstable room version to support new historical PL * Fix federated events being rejected for no state_groups Add fix from https://github.com/matrix-org/synapse/pull/10439 until it merges. * Only connect base insertion event to prev_event_ids Per discussion with @erikjohnston, https://matrix.to/#/!UytJQHLQYfvYWsGrGY:jki.re/$12bTUiObDFdHLAYtT7E-BvYRp3k_xv8w0dUQHibasJk?via=jki.re&via=matrix.org * Make it possible to get the room_version with txn * Allow but ignore historical events in unsupported room version See https://github.com/matrix-org/synapse/pull/10245#discussion_r675592489 We can't reject historical events on unsupported room versions because homeservers without knowledge of MSC2716 or the new room version don't reject historical events either. Since we can't rely on the auth check here to stop historical events on unsupported room versions, I've added some additional checks in the processing/persisting code (`synapse/storage/databases/main/events.py` -> `_handle_insertion_event` and `_handle_chunk_event`). I've had to do some refactoring so there is method to fetch the room version by `txn`. * Move to unique index syntax See https://github.com/matrix-org/synapse/pull/10245#discussion_r675638509 * High-level document how the insertion->chunk lookup works * Remove create_event fallback for room_versions See https://github.com/matrix-org/synapse/pull/10245/files#r677641879 * Use updated method name
* Update the MSC3083 support to verify if joins are from an authorized server. ↵Patrick Cloke2021-07-261-10/+44
| | | | (#10254)
* Port the ThirdPartyEventRules module interface to the new generic interface ↵Brendan Abolivier2021-07-201-2/+2
| | | | | (#10386) Port the third-party event rules interface to the generic module interface introduced in v1.37.0
* [pyupgrade] `synapse/` (#10348)Jonathan de Jong2021-07-191-1/+1
| | | | | | | | | This PR is tantamount to running ``` pyupgrade --py36-plus --keep-percent-format `find synapse/ -type f -name "*.py"` ``` Part of #9744
* Use inline type hints in `handlers/` and `rest/`. (#10382)Jonathan de Jong2021-07-161-11/+11
|
* Fix a number of logged errors caused by remote servers being down. (#10400)Erik Johnston2021-07-151-9/+16
|
* Move methods involving event authentication to EventAuthHandler. (#10268)Patrick Cloke2021-07-011-13/+23
| | | Instead of mixing them with user authentication methods.
* Return errors from `send_join` etc if the event is rejected (#10243)Richard van der Hoff2021-06-241-7/+39
| | | Rather than persisting rejected events via `send_join` and friends, raise a 403 if someone tries to pull a fast one.
* Improve validation for `send_{join,leave,knock}` (#10225)Richard van der Hoff2021-06-241-126/+51
| | | The idea here is to stop people sending things that aren't joins/leaves/knocks through these endpoints: previously you could send anything you liked through them. I wasn't able to find any security holes from doing so, but it doesn't sound like a good thing.
* Send out invite rejections and knocks over federation (#10223)Richard van der Hoff2021-06-231-0/+14
| | | | | ensure that events sent via `send_leave` and `send_knock` are sent on to the rest of the federation.
* Check third party rules before persisting knocks over federation (#10212)Andrew Morgan2021-06-211-2/+2
| | | | | An accidental mis-ordering of operations during #6739 technically allowed an incoming knock event over federation in before checking it against any configured Third Party Access Rules modules. This PR corrects that by performing the TPAR check *before* persisting the event.
* update black to 21.6b0 (#10197)Marcus2021-06-171-1/+1
| | | | | Reformat all files with the new version. Signed-off-by: Marcus Hoffmann <bubu@bubu1.eu>
* Add fields to better debug where events are being soft_failed (#10168)Eric Eastwood2021-06-171-3/+18
| | | Follow-up to https://github.com/matrix-org/synapse/pull/10156#discussion_r650292223
* Remove the experimental flag for knocking and use stable prefixes / ↵Patrick Cloke2021-06-151-4/+2
| | | | | | | endpoints. (#10167) * Room version 7 for knocking. * Stable prefixes and endpoints (both client and federation) for knocking. * Removes the experimental configuration flag.
* Add metrics to track how often events are `soft_failed` (#10156)Eric Eastwood2021-06-111-0/+7
| | | | | | | | | | | Spawned from missing messages we were seeing on `matrix.org` from a federated Gtiter bridged room, https://gitlab.com/gitterHQ/webapp/-/issues/2770. The underlying issue in Synapse is tracked by https://github.com/matrix-org/synapse/issues/10066 where the message and join event race and the message is `soft_failed` before the `join` event reaches the remote federated server. Less soft_failed events = better and usually this should only trigger for events where people are doing bad things and trying to fuzz and fake everything.
* Implement knock feature (#6739)Sorunome2021-06-091-3/+183
| | | | | | This PR aims to implement the knock feature as proposed in https://github.com/matrix-org/matrix-doc/pull/2403 Signed-off-by: Sorunome mail@sorunome.de Signed-off-by: Andrew Morgan andrewm@element.io
* Handle /backfill returning no events (#10133)Erik Johnston2021-06-081-13/+25
| | | Fixes #10123
* Don't try and backfill the same room in parallel. (#10116)Erik Johnston2021-06-041-0/+8
| | | | | If backfilling is slow then the client may time out and retry, causing Synapse to start a new `/backfill` before the existing backfill has finished, duplicating work.
* Limit number of events in a replication request (#10118)Erik Johnston2021-06-041-2/+3
| | | Fixes #9956.
* add a cache to have_seen_event (#9953)Richard van der Hoff2021-06-011-5/+7
| | | Empirically, this helped my server considerably when handling gaps in Matrix HQ. The problem was that we would repeatedly call have_seen_events for the same set of (50K or so) auth_events, each of which would take many minutes to complete, even though it's only an index scan.
* Limit the number of events sent over replication when persisting events. ↵Brendan Abolivier2021-05-271-7/+10
| | | | (#10082)
* Refactor checking restricted join rules (#10007)Patrick Cloke2021-05-181-20/+9
| | | | | To be more consistent with similar code. The check now automatically raises an AuthError instead of passing back a boolean. It also absorbs some shared logic between callers.
* Improve performance of backfilling in large rooms. (#9935)Erik Johnston2021-05-101-69/+54
| | | | | | We were pulling the full auth chain for the room out of the DB each time we backfilled, which can be *huge* for large rooms and is totally unnecessary.
* Don't set the external cache if its been done recently (#9905)Erik Johnston2021-05-051-1/+3
|
* Check for space membership during a remote join of a restricted room (#9814)Patrick Cloke2021-04-231-9/+35
| | | | | | When receiving a /send_join request for a room with join rules set to 'restricted', check if the user is a member of the spaces defined in the 'allow' key of the join rules. This only applies to an experimental room version, as defined in MSC3083.
* Fix (final) Bugbear violations (#9838)Jonathan de Jong2021-04-201-1/+1
|
* Separate creating an event context from persisting it in the federation ↵Patrick Cloke2021-04-141-65/+113
| | | | | | handler (#9800) This refactoring allows adding logic that uses the event context before persisting it.
* Revert "Check for space membership during a remote join of a restricted ↵Patrick Cloke2021-04-141-142/+70
| | | | | | | | room. (#9763)" This reverts commit cc51aaaa7adb0ec2235e027b5184ebda9b660ec4. The PR was prematurely merged and not yet approved.
* Check for space membership during a remote join of a restricted room. (#9763)Patrick Cloke2021-04-141-70/+142
| | | | | | | When receiving a /send_join request for a room with join rules set to 'restricted', check if the user is a member of the spaces defined in the 'allow' key of the join rules. This only applies to an experimental room version, as defined in MSC3083.
* Remove redundant "coding: utf-8" lines (#9786)Jonathan de Jong2021-04-141-1/+0
| | | | | | | Part of #9744 Removes all redundant `# -*- coding: utf-8 -*-` lines from files, as python 3 automatically reads source code as utf-8 now. `Signed-off-by: Jonathan de Jong <jonathan@automatia.nl>`
* Bugbear: Add Mutable Parameter fixes (#9682)Jonathan de Jong2021-04-081-1/+1
| | | | | | | Part of #9366 Adds in fixes for B006 and B008, both relating to mutable parameter lint errors. Signed-off-by: Jonathan de Jong <jonathan@automatia.nl>
* Add type hints to the federation handler and server. (#9743)Patrick Cloke2021-04-061-80/+81
|
* Make RateLimiter class check for ratelimit overrides (#9711)Erik Johnston2021-03-301-1/+1
| | | | | | | This should fix a class of bug where we forget to check if e.g. the appservice shouldn't be ratelimited. We also check the `ratelimit_override` table to check if the user has ratelimiting disabled. That table is really only meant to override the event sender ratelimiting, so we don't use any values from it (as they might not make sense for different rate limits), but we do infer that if ratelimiting is disabled for the user we should disabled all ratelimits. Fixes #9663
* Optimise missing prev_event handling (#9601)Richard van der Hoff2021-03-151-21/+131
| | | | | | | | | | | | | | | | | | | | | | | | | | | Background: When we receive incoming federation traffic, and notice that we are missing prev_events from the incoming traffic, first we do a `/get_missing_events` request, and then if we still have missing prev_events, we set up new backwards-extremities. To do that, we need to make a `/state_ids` request to ask the remote server for the state at those prev_events, and then we may need to then ask the remote server for any events in that state which we don't already have, as well as the auth events for those missing state events, so that we can auth them. This PR attempts to optimise the processing of that state request. The `state_ids` API returns a list of the state events, as well as a list of all the auth events for *all* of those state events. The optimisation comes from the observation that we are currently loading all of those auth events into memory at the start of the operation, but we almost certainly aren't going to need *all* of the auth events. Rather, we can check that we have them, and leave the actual load into memory for later. (Ideally the federation API would tell us which auth events we're actually going to need, but it doesn't.) The effect of this is to reduce the number of events that I need to load for an event in Matrix HQ from about 60000 to about 22000, which means it can stay in my in-memory cache, whereas previously the sheer number of events meant that all 60K events had to be loaded from db for each request, due to the amount of cache churn. (NB I've already tripled the size of the cache from its default of 10K). Unfortunately I've ended up basically C&Ping `_get_state_for_room` and `_get_events_from_store_or_dest` into a new method, because `_get_state_for_room` is also called during backfill, which expects the auth events to be returned, so the same tricks don't work. That said, I don't really know why that codepath is completely different (ultimately we're doing the same thing in setting up a new backwards extremity) so I've left a TODO suggesting that we clean it up.
* Improve logging when processing incoming transactions (#9596)Richard van der Hoff2021-03-121-46/+16
| | | Put the room id in the logcontext, to make it easier to understand what's going on.
* Use the chain cover index in get_auth_chain_ids. (#9576)Patrick Cloke2021-03-101-3/+3
| | | | This uses a simplified version of get_chain_cover_difference to calculate auth chain of events.
* Update black, and run auto formatting over the codebase (#9381)Eric Eastwood2021-02-161-41/+61
| | | | | | | - Update black version to the latest - Run black auto formatting over the codebase - Run autoformatting according to [`docs/code_style.md `](https://github.com/matrix-org/synapse/blob/80d6dc9783aa80886a133756028984dbf8920168/docs/code_style.md) - Update `code_style.md` docs around installing black to use the correct version
* Remove dead handled_events set in invite_join (#9394)Andrew Morgan2021-02-121-6/+0
| | | | | This PR removes a set that was created and [initially used](https://github.com/matrix-org/synapse/commit/1d2a0040cff8d04cdc7d7d09d8f04a5d628fa9dd#diff-0bc92da3d703202f5b9be2d3f845e375f5b1a6bc6ba61705a8af9be1121f5e42R435-R436), but is no longer today. May help cut down a bit on the time it takes to accept invites.
* Honour ratelimit flag for application services for invite ratelimiting (#9302)Erik Johnston2021-02-031-1/+3
|
* Ratelimit invites by room and target user (#9258)Erik Johnston2021-01-291-0/+4
|
* Precompute joined hosts and store in Redis (#9198)Erik Johnston2021-01-261-0/+5
|
* Allow spam-checker modules to be provide async methods. (#8890)David Teller2020-12-111-1/+1
| | | | Spam checker modules can now provide async methods. This is implemented in a backwards-compatible manner.
* Apply an IP range blacklist to push and key revocation requests. (#8821)Patrick Cloke2020-12-021-1/+1
| | | | | | | | | | | | Replaces the `federation_ip_range_blacklist` configuration setting with an `ip_range_blacklist` setting with wider scope. It now applies to: * Federation * Identity servers * Push notifications * Checking key validitity for third-party invite events The old `federation_ip_range_blacklist` setting is still honored if present, but with reduced scope (it only applies to federation and identity servers).
* Consistently use room_id from federation request body (#8776)Richard van der Hoff2020-11-191-5/+5
| | | | | | | | | | | | | * Consistently use room_id from federation request body Some federation APIs have a redundant `room_id` path param (see https://github.com/matrix-org/matrix-doc/issues/2330). We should make sure we consistently use either the path param or the body param, and the body param is easier. * Kill off some references to "context" Once upon a time, "rooms" were known as "contexts". I think this kills of the last references to "contexts".
* Generalise _maybe_store_room_on_invite (#8754)Andrew Morgan2020-11-131-4/+6
| | | | | | | | | There's a handy function called maybe_store_room_on_invite which allows us to create an entry in the rooms table for a room and its version for which we aren't joined to yet, but we can reference when ingesting events about. This is currently used for invites where we receive some stripped state about the room and pass it down via /sync to the client, without us being in the room yet. There is a similar requirement for knocking, where we will eventually do the same thing, and need an entry in the rooms table as well. Thus, reusing this function works, however its name needs to be generalised a bit. Separated out from #6739.
* Fix typos and spelling errors. (#8639)Patrick Cloke2020-10-231-7/+7
|
* Move third_party_rules check to event creation timeRichard van der Hoff2020-10-131-44/+2
| | | | | Rather than waiting until we handle the event, call the ThirdPartyRules check when we fist create the event.
* Remove redundant calls to third_party_rules in `on_send_{join,leave}`Richard van der Hoff2020-10-131-19/+1
| | | | | There's not much point in calling these *after* we have decided to accept them into the DAG.
* Fix message duplication if something goes wrong after persisting the event ↵Erik Johnston2020-10-131-3/+6
| | | | | (#8476) Should fix #3365.
* Remove stream ordering from Metadata dict (#8452)Richard van der Hoff2020-10-051-0/+3
| | | | | | | | There's no need for it to be in the dict as well as the events table. Instead, we store it in a separate attribute in the EventInternalMetadata object, and populate that on load. This means that we can rely on it being correctly populated for any event which has been persited to the database.
* Move `resolve_events_with_store` into StateResolutionHandlerRichard van der Hoff2020-09-291-5/+8
|
* Mypy fixes for `synapse.handlers.federation` (#8422)Richard van der Hoff2020-09-291-4/+9
| | | For some reason, an apparently unrelated PR upset mypy about this module. Here are a number of little fixes.
* A pair of tiny cleanups in the federation request code. (#8401)Richard van der Hoff2020-09-281-1/+1
|
* Add EventStreamPosition type (#8388)Erik Johnston2020-09-241-6/+10
| | | | | | | | | | | | | | The idea is to remove some of the places we pass around `int`, where it can represent one of two things: 1. the position of an event in the stream; or 2. a token that partitions the stream, used as part of the stream tokens. The valid operations are then: 1. did a position happen before or after a token; 2. get all events that happened before or after a token; and 3. get all events between two tokens. (Note that we don't want to allow other operations as we want to change the tokens to be vector clocks rather than simple ints)
* Merge tag 'v1.20.0rc5' into developPatrick Cloke2020-09-181-8/+57
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Synapse 1.20.0rc5 (2020-09-18) ============================== In addition to the below, Synapse 1.20.0rc5 also includes the bug fix that was included in 1.19.3. Features -------- - Add flags to the `/versions` endpoint for whether new rooms default to using E2EE. ([\#8343](https://github.com/matrix-org/synapse/issues/8343)) Bugfixes -------- - Fix rate limiting of federation `/send` requests. ([\#8342](https://github.com/matrix-org/synapse/issues/8342)) - Fix a longstanding bug where back pagination over federation could get stuck if it failed to handle a received event. ([\#8349](https://github.com/matrix-org/synapse/issues/8349)) Internal Changes ---------------- - Blacklist [MSC2753](https://github.com/matrix-org/matrix-doc/pull/2753) SyTests until it is implemented. ([\#8285](https://github.com/matrix-org/synapse/issues/8285))
| * Intelligently select extremities used in backfill. (#8349)Erik Johnston2020-09-181-8/+57
| | | | | | | | | | | | | | | | | | Instead of just using the most recent extremities let's pick the ones that will give us results that the pagination request cares about, i.e. pick extremities only if they have a smaller depth than the pagination token. This is useful when we fail to backfill an extremity, as we no longer get stuck requesting that same extremity repeatedly.
* | Simplify super() calls to Python 3 syntax. (#8344)Patrick Cloke2020-09-181-1/+1
| | | | | | | | | | | | | | This converts calls like super(Foo, self) -> super(). Generated with: sed -i "" -Ee 's/super\([^\(]+\)/super()/g' **/*.py
* | Use slots in attrs classes where possible (#8296)Patrick Cloke2020-09-141-1/+1
| | | | | | | | | | slots use less memory (and attribute access is faster) while slightly limiting the flexibility of the class attributes. This focuses on objects which are instantiated "often" and for short periods of time.
* | Add experimental support for sharding event persister. Again. (#8294)Erik Johnston2020-09-141-14/+30
| | | | | | | | | | | | This is *not* ready for production yet. Caveats: 1. We should write some tests... 2. The stream token that we use for events can get stalled at the minimum position of all writers. This means that new events may not be processed and e.g. sent down sync streams if a writer isn't writing or is slow.
* | Clean up `Notifier.on_new_room_event` code path (#8288)Erik Johnston2020-09-101-3/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | The idea here is that we pass the `max_stream_id` to everything, and only use the stream ID of the particular event to figure out *when* the max stream position has caught up to the event and we can notify people about it. This is to maintain the distinction between the position of an item in the stream (i.e. event A has stream ID 513) and a token that can be used to partition the stream (i.e. give me all events after stream ID 352). This distinction becomes important when the tokens are more complicated than a single number, which they will be once we start tracking the position of multiple writers in the tokens. The valid operations here are: 1. Is a position before or after a token 2. Fetching all events between two tokens 3. Merging multiple tokens to get the "max", i.e. `C = max(A, B)` means that for all positions P where P is before A *or* before B, then P is before C. Future PR will change the token type to a dedicated type.
* | Remove some unused distributor signals (#8216)Patrick Cloke2020-09-091-42/+1
| | | | | | | | | | Removes the `user_joined_room` and stops calling it since there are no observers. Also cleans-up some other unused signals and related code.
* | Fixup pusher pool notifications (#8287)Erik Johnston2020-09-091-1/+1
| | | | | | | | | | `pusher_pool.on_new_notifications` expected a min and max stream ID, however that was not what we were passing in. Instead, let's just pass it the current max stream ID and have it track the last stream ID it got passed. I believe that it mostly worked as we called the function for every event. However, it would break for events that got persisted out of order, i.e, that were persisted but the max stream ID wasn't incremented as not all preceding events had finished persisting, and push for that event would be delayed until another event got pushed to the effected users.
* | Revert "Fixup pusher pool notifications"Erik Johnston2020-09-091-1/+1
| | | | | | | | This reverts commit e7fd336a53a4ca489cdafc389b494d5477019dc0.
* | Fixup pusher pool notificationsErik Johnston2020-09-091-1/+1
|/
* Revert "Add experimental support for sharding event persister. (#8170)" (#8242)Brendan Abolivier2020-09-041-30/+14
| | | | | | | * Revert "Add experimental support for sharding event persister. (#8170)" This reverts commit 82c1ee1c22a87b9e6e3179947014b0f11c0a1ac3. * Changelog
* Fix typing for `@cached` wrapped functions (#8240)Erik Johnston2020-09-031-5/+5
| | | This requires adding a mypy plugin to fiddle with the type signatures a bit.
* Add experimental support for sharding event persister. (#8170)Erik Johnston2020-09-021-14/+30
| | | | | | This is *not* ready for production yet. Caveats: 1. We should write some tests... 2. The stream token that we use for events can get stalled at the minimum position of all writers. This means that new events may not be processed and e.g. sent down sync streams if a writer isn't writing or is slow.
* Convert additional database code to async/await. (#8195)Patrick Cloke2020-08-281-2/+2
|
* Define StateMap as immutable and add a MutableStateMap type. (#8183)Patrick Cloke2020-08-281-6/+14
|
* Add type hints for state. (#8140)Patrick Cloke2020-08-241-4/+6
|
* Convert events worker database to async/await. (#8071)Patrick Cloke2020-08-181-11/+5
|
* Convert synapse.api to async/await (#8031)Patrick Cloke2020-08-061-1/+1
|
* Rename database classes to make some sense (#8033)Erik Johnston2020-08-051-1/+1
|
* Convert a synapse.events to async/await. (#7949)Patrick Cloke2020-07-271-1/+1
|
* Remove hacky error handling for inlineDeferreds. (#7950)Patrick Cloke2020-07-271-9/+5
|
* Fix up types and comments that refer to Deferreds. (#7945)Patrick Cloke2020-07-241-3/+5
|
* Fix deprecation warning: import ABC from collections.abc (#7892)Karthikeyan Singaravelan2020-07-201-1/+1
|
* Reject attempts to join empty rooms over federation (#7859)Richard van der Hoff2020-07-161-2/+13
| | | | | | We shouldn't allow others to make_join through us if we've left the room; reject such attempts with a 404. Fixes #7835. Fixes #6958.
* Fix resync remote devices on receive PDU in worker mode. (#7815)Erik Johnston2020-07-101-8/+19
| | | | | | The replication client requires that arguments are given as keyword arguments, which was not done in this case. We also pull out the logic so that we can catch and handle any exceptions raised, rather than leaving them unhandled.
* Fix recursion error when fetching auth chain over federation (#7817)Erik Johnston2020-07-101-12/+37
| | | | | | | | | | | | | | | When fetching the state of a room over federation we receive the event IDs of the state and auth chain. We then fetch those events that we don't already have. However, we used a function that recursively fetched any missing auth events for the fetched events, which can lead to a lot of recursion if the server is missing most of the auth chain. This work is entirely pointless because would have queued up the missing events in the auth chain to be fetched already. Let's just diable the recursion, since it only gets called from one place anyway.
* Add `HomeServer.signing_key` property (#7805)Richard van der Hoff2020-07-081-1/+1
| | | ... instead of duplicating `config.signing_key[0]` everywhere
* Merge branch 'master' into developPatrick Cloke2020-07-021-3/+3
|\
| * Correctly handle outliers as prev events over federationErik Johnston2020-07-021-3/+3
| |
* | Add early returns to `_check_for_soft_fail` (#7769)Richard van der Hoff2020-07-011-64/+55
| | | | | | | | my editor was complaining about unset variables, so let's add some early returns to fix that and reduce indentation/cognitive load.
* | Type checking for `FederationHandler` (#7770)Richard van der Hoff2020-07-011-17/+30
| | | | | | fix a few things to make this pass mypy.
* | Yield during large v2 state res. (#7735)Erik Johnston2020-06-241-0/+1
| | | | | | | | | | | | | | | | | | | | | | State res v2 across large data sets can be very CPU intensive, and if all the relevant events are in the cache the algorithm will run from start to finish within a single reactor tick. This can result in blocking the reactor tick for several seconds, which can have major repercussions on other requests. To fix this we simply add the occaisonal `sleep(0)` during iterations to yield execution until the next reactor tick. The aim is to only do this for large data sets so that we don't impact otherwise quick resolutions.=
* | Replace all remaining six usage with native Python 3 equivalents (#7704)Dagfinn Ilmari Mannsåker2020-06-161-5/+4
| |
* | Replace iteritems/itervalues/iterkeys with native versions. (#7692)Patrick Cloke2020-06-151-12/+10
| |
* | Add option to enable encryption by default for new rooms (#7639)Andrew Morgan2020-06-101-2/+10
|/ | | | | | | | | Fixes https://github.com/matrix-org/synapse/issues/2431 Adds config option `encryption_enabled_by_default_for_room_type`, which determines whether encryption should be enabled with the default encryption algorithm in private or public rooms upon creation. Whether the room is private or public is decided based upon the room creation preset that is used. Part of this PR is also pulling out all of the individual instances of `m.megolm.v1.aes-sha2` into a constant variable to eliminate typos ala https://github.com/matrix-org/synapse/pull/7637 Based on #7637
* Fix exceptions when fetching events from a down host. (#7622)Erik Johnston2020-06-031-1/+1
| | | We already caught some exceptions, but not all.
* Add option to move event persistence off master (#7517)Erik Johnston2020-05-221-6/+10
|
* Add ability to wait for replication streams (#7542)Erik Johnston2020-05-221-10/+23
| | | | | | | The idea here is that if an instance persists an event via the replication HTTP API it can return before we receive that event over replication, which can lead to races where code assumes that persisting an event immediately updates various caches (e.g. current state of the room). Most of Synapse doesn't hit such races, so we don't do the waiting automagically, instead we do so where necessary to avoid unnecessary delays. We may decide to change our minds here if it turns out there are a lot of subtle races going on. People probably want to look at this commit by commit.
* Fix exception reporting due to HTTP request errors. (#7556)Erik Johnston2020-05-221-0/+7
| | | | These are business as usual errors, rather than stuff we want to log at error.
* Convert federation handler to async/await. (#7459)Patrick Cloke2020-05-111-18/+14
|
* async/await is_server_admin (#7363)Andrew Morgan2020-05-011-11/+10
|
* Convert some of the federation handler methods to async/await. (#7338)Patrick Cloke2020-04-241-25/+24
|
* Rewrite prune_old_outbound_device_pokes for efficiency (#7159)Richard van der Hoff2020-03-301-23/+2
| | | | make sure we clear out all but one update for the user
* Store room version on invite (#6983)Richard van der Hoff2020-02-261-0/+12
| | | | | When we get an invite over federation, store the room version in the rooms table. The general idea here is that, when we pull the invite out again, we'll want to know what room_version it belongs to (so that we can later redact it if need be). So we need to store it somewhere...
* Remove redundant store_room call (#6979)Richard van der Hoff2020-02-241-23/+0
| | | | | `_process_received_pdu` is only called by `on_receive_pdu`, which ignores any events for unknown rooms, so this is redundant.
* Upsert room version when we join over federation (#6968)Richard van der Hoff2020-02-241-10/+12
| | | | | | | | This is intended as a precursor to storing room versions when we receive an invite over federation, but has the happy side-effect of fixing #3374 at last. In short: change the store_room with try/except to a proper upsert which updates the right columns.
* Clarify list/set/dict/tuple comprehensions and enforce via flake8 (#6957)Patrick Cloke2020-02-211-9/+9
| | | | Ensure good comprehension hygiene using flake8-comprehensions.
* Limit the number of events that can be requested when backfilling events (#6864)Patrick Cloke2020-02-061-0/+4
| | | Limit the maximum number of events requested when backfilling events.
* pass room version into FederationClient.send_join (#6854)Richard van der Hoff2020-02-061-2/+1
| | | | ... which allows us to sanity-check the create event.
* Merge pull request #6823 from matrix-org/rav/redact_changes/5Richard van der Hoff2020-02-061-6/+2
|\ | | | | pass room versions around
| * Pass room version object into `FederationClient.get_pdu`Richard van der Hoff2020-02-051-6/+2
| |
* | Merge tag 'v1.10.0rc2' into developErik Johnston2020-02-061-14/+60
|\ \ | |/ |/| | | | | | | | | | | | | | | | | | | | | | | | | | | Synapse 1.10.0rc2 (2020-02-06) ============================== Bugfixes -------- - Fix an issue with cross-signing where device signatures were not sent to remote servers. ([\#6844](https://github.com/matrix-org/synapse/issues/6844)) - Fix to the unknown remote device detection which was introduced in 1.10.rc1. ([\#6848](https://github.com/matrix-org/synapse/issues/6848)) Internal Changes ---------------- - Detect unexpected sender keys on remote encrypted events and resync device lists. ([\#6850](https://github.com/matrix-org/synapse/issues/6850))
| * Check sender_key matches on inbound encrypted events. (#6850)Erik Johnston2020-02-051-13/+59
| | | | | | If they don't then the device lists are probably out of sync.
| * Fix detecting unknown devices from remote encrypted events. (#6848)Erik Johnston2020-02-041-1/+1
| | | | | | | | | | | | | | | | We were looking at the wrong event type (`m.room.encryption` vs `m.room.encrypted`). Also fixup the duplicate `EvenTypes` entries. Introduced in #6776.
* | make FederationHandler.send_invite asyncRichard van der Hoff2020-02-031-3/+2
| |
* | make FederationHandler.on_get_missing_events asyncRichard van der Hoff2020-02-031-5/+4
| |
* | make FederationHandler.user_joined_room asyncRichard van der Hoff2020-02-031-3/+3
| |
* | make FederationHandler._clean_room_for_join asyncRichard van der Hoff2020-02-031-4/+4
| |
* | make FederationHandler._notify_persisted_event asyncRichard van der Hoff2020-02-031-4/+6
| |
* | make FederationHandler.persist_events_and_notify asyncRichard van der Hoff2020-02-031-10/+10
| |
* | make FederationHandler._make_and_verify_event asyncRichard van der Hoff2020-02-031-5/+10
| |
* | make FederationHandler.do_remotely_reject_invite asyncRichard van der Hoff2020-02-031-6/+7
| |
* | make FederationHandler._check_for_soft_fail asyncRichard van der Hoff2020-02-031-13/+9
| |
* | make FederationHandler._persist_auth_tree asyncRichard van der Hoff2020-02-031-11/+7
| |
* | make FederationHandler.do_invite_join asyncRichard van der Hoff2020-02-031-16/+14
| |
* | make FederationHandler.on_event_auth asyncRichard van der Hoff2020-02-031-5/+4
| |
* | make FederationHandler.on_exchange_third_party_invite_request asyncRichard van der Hoff2020-02-031-14/+12
| |
* | make FederationHandler.construct_auth_difference asyncRichard van der Hoff2020-02-031-3/+4
| |
* | make FederationHandler._update_context_for_auth_events asyncRichard van der Hoff2020-02-031-10/+10
| |
* | make FederationHandler._update_auth_events_and_context_for_auth asyncRichard van der Hoff2020-02-031-20/+21
| |
* | make FederationHandler.do_auth asyncRichard van der Hoff2020-02-031-10/+14
| |
* | make FederationHandler._prep_event asyncRichard van der Hoff2020-02-031-23/+10
| |
* | make FederationHandler._handle_new_event asyncRichard van der Hoff2020-02-031-6/+7
| |
* | make FederationHandler._handle_new_events asyncRichard van der Hoff2020-02-031-8/+6
| |
* | make FederationHandler.on_make_leave_request asyncRichard van der Hoff2020-02-031-13/+10
| |
* | make FederationHandler.on_send_leave_request asyncRichard van der Hoff2020-02-031-5/+3
| |
* | make FederationHandler.on_make_join_request asyncRichard van der Hoff2020-02-031-13/+10
| |
* | make FederationHandler.on_invite_request asyncRichard van der Hoff2020-02-031-5/+4
| |
* | make FederationHandler.on_send_join_request asyncRichard van der Hoff2020-02-031-9/+7
| |
* | make FederationHandler.on_query_auth asyncRichard van der Hoff2020-02-031-7/+6
|/
* pass room_version into compute_event_signature (#6807)Richard van der Hoff2020-01-311-1/+4
|
* Merge pull request #6820 from matrix-org/rav/get_room_version_idRichard van der Hoff2020-01-311-9/+9
|\ | | | | Make `get_room_version` return a RoomVersion object
| * s/get_room_version/get_room_version_id/Richard van der Hoff2020-01-311-9/+9
| | | | | | | | | | ... to make way for a forthcoming get_room_version which returns a RoomVersion object.
* | Fix bug with getting missing auth event during join 500'ed (#6810)Erik Johnston2020-01-311-1/+5
|/
* pass room version into FederationHandler.on_invite_request (#6805)Richard van der Hoff2020-01-301-3/+3
|
* Resync remote device list when detected as stale. (#6786)Erik Johnston2020-01-301-2/+16
|
* Detect unknown remote devices and mark cache as stale (#6776)Erik Johnston2020-01-281-0/+20
| | | | We just mark the fact that the cache may be stale in the database for now.
* Pass room version object into event_auth.check and check_redaction (#6788)Richard van der Hoff2020-01-281-7/+11
| | | | | | | These are easier to work with than the strings and we normally have one around. This fixes `FederationHander._persist_auth_tree` which was passing a RoomVersion object into event_auth.check instead of a string.
* Add `rooms.room_version` column (#6729)Erik Johnston2020-01-271-15/+50
| | | This is so that we don't have to rely on pulling it out from `current_state_events` table.
* Add StateMap type alias (#6715)Erik Johnston2020-01-161-6/+4
|
* Fix conditions failing if min_depth = 0Brendan Abolivier2020-01-071-2/+2
| | | | This could result in Synapse not fetching prev_events for new events in the room if it has missed some events.
* Merge branch 'master' into developRichard van der Hoff2019-12-201-1/+4
|\
| * Fix exceptions when attempting to backfill (#6576)Richard van der Hoff2019-12-201-1/+4
| | | | | | Fixes #6575
* | Change EventContext to use the Storage class (#6564)Erik Johnston2019-12-201-7/+7
| |
* | Merge release-v1.7.1 into developRichard van der Hoff2019-12-181-0/+1
|\|
| * Exclude rejected state events when calculating state at backwards extrems ↵Richard van der Hoff2019-12-161-1/+1
| | | | | | | | | | (#6527) This fixes a weird bug where, if you were determined enough, you could end up with a rejected event forming part of the state at a backwards-extremity. Authing that backwards extrem would then lead to us trying to pull the rejected event from the db (with allow_rejected=False), which would fail with a 404.
| * Persist auth/state events at backwards extremities when we fetch them (#6526)Richard van der Hoff2019-12-161-163/+80
| | | | | | | | The main point here is to make sure that the state returned by _get_state_in_room has been authed before we try to use it as state in the room.
| * sanity-checking for events used in state res (#6531)Richard van der Hoff2019-12-161-0/+1
| | | | | | | | | | When we perform state resolution, check that all of the events involved are in the right room.
| * Check the room_id of events when fetching room state/auth (#6524)Richard van der Hoff2019-12-161-24/+54
| | | | | | | | | | | | | | | | | | | | | | When we request the state/auth_events to populate a backwards extremity (on backfill or in the case of missing events in a transaction push), we should check that the returned events are in the right room rather than blindly using them in the room state or auth chain. Given that _get_events_from_store_or_dest takes a room_id, it seems clear that it should be sanity-checking the room_id of the requested events, so let's do it there.
| * Add `include_event_in_state` to _get_state_for_room (#6521)Richard van der Hoff2019-12-161-18/+21
| | | | | | | | | | | | Make it return the state *after* the requested event, rather than the one before it. This is a bit easier and requires fewer calls to get_events_from_store_or_dest.
| * Move get_state methods into FederationHandler (#6503)Richard van der Hoff2019-12-161-6/+95
| | | | | | | | | | This is a non-functional refactor as a precursor to some other work.
* | Exclude rejected state events when calculating state at backwards extrems ↵Richard van der Hoff2019-12-161-1/+1
| | | | | | | | | | (#6527) This fixes a weird bug where, if you were determined enough, you could end up with a rejected event forming part of the state at a backwards-extremity. Authing that backwards extrem would then lead to us trying to pull the rejected event from the db (with allow_rejected=False), which would fail with a 404.
* | Persist auth/state events at backwards extremities when we fetch them (#6526)Richard van der Hoff2019-12-161-167/+80
| | | | | | The main point here is to make sure that the state returned by _get_state_in_room has been authed before we try to use it as state in the room.