summary refs log tree commit diff
path: root/synapse (follow)
Commit message (Collapse)AuthorAgeFilesLines
* Fix bug where typing replication breaks (#17252)Erik Johnston2024-05-311-3/+3
| | | | This can happen on restarts of the service, due to old rooms being pruned.
* Fix logging errors when receiving invalid User ID for key querys (#17250)Erik Johnston2024-05-311-0/+5
|
* Fix sentry default tags (#17251)Erik Johnston2024-05-311-10/+10
| | | | | This was broken by the sentry 2.0 upgrade Broke in v1.108.0
* In sync wait for worker to catch up since token (#17215)Erik Johnston2024-05-305-3/+131
| | | | | | | Otherwise things will get confused. An alternative would be to make sure that for lagging stream we don't return anything (and make sure the returned next_batch token doesn't go backwards). But that is a faff.
* Fix deduplicating of membership events to not create unused state groups. ↵Erik Johnston2024-05-302-35/+32
| | | | | | | | | | | | | (#17164) We try and deduplicate in two places: 1) really early on, and 2) just before we persist the event. The first case was broken due to it occuring before the profile information was added, and so it thought the event contents were different. The second case did catch it and handle it correctly, however doing so creates a redundant state group leading to bloat. Fixes #3791
* Replaces all usages of `StreamIdGenerator` with `MultiWriterIdGenerator` ↵Erik Johnston2024-05-308-224/+225
| | | | | (#17229) Replaces all usages of `StreamIdGenerator` with `MultiWriterIdGenerator`, which is safer.
* Clean out invalid destinations from outbox (#17242)Erik Johnston2024-05-302-0/+91
| | | | We started ensuring we only insert valid destinations: https://github.com/element-hq/synapse/pull/17240
* Ensure we delete media if we reject due to spam check (#17246)Erik Johnston2024-05-302-32/+32
| | | | | | | | Fixes up #17239 We need to keep the spam check within the `try/except` block. Also makes it so that we don't enter the top span twice. Also also ensures that we get the right thumbnail length.
* Move towards using `MultiWriterIdGenerator` everywhere (#17226)Erik Johnston2024-05-298-215/+153
| | | | | | | | | | | | | | | There is a problem with `StreamIdGenerator` where it can go backwards over restarts when a stream ID is requested but then not inserted into the DB. This is problematic if we want to land #17215, and is generally a potential cause for all sorts of nastiness. Instead of trying to fix `StreamIdGenerator`, we may as well move to `MultiWriterIdGenerator` that does not suffer from this problem (the latest positions are stored in `stream_positions` table). This involves adding SQLite support to the class. This only changes id generators that were already using `MultiWriterIdGenerator` under postgres, a separate PR will move the rest of the uses of `StreamIdGenerator` over.
* Don't invalidate all `get_relations_for_event` on history purge (#17083)Erik Johnston2024-05-295-13/+40
| | | | This is a tree cache already, so may as well move the room ID to the front and use that
* Change allow_unsafe_locale to also apply on new databases (#17238)Erik Johnston2024-05-291-1/+7
| | | | We relax this as there are use cases where this is safe, though it is still highly recommended that people avoid using it.
* Ignore attempts to send to-device messages to bad users (#17240)Erik Johnston2024-05-291-0/+7
| | | | | | | | Currently sending a to-device message to a user ID with a dodgy destination is accepted, but then ends up spamming the logs when we try and send to the destination. An alternative would be to reject the request, but I'm slightly nervous that could break things.
* Handle duplicate OTK uploads racing (#17241)Erik Johnston2024-05-291-33/+45
| | | Currently this causes one of then to 500.
* Fix slipped logging context when media rejected (#17239)Erik Johnston2024-05-293-77/+40
| | | | | | | When a module rejects a piece of media we end up trying to close the same logging context twice. Instead of fixing the existing code we refactor to use an async context manager, which is easier to write correctly.
* Support MSC3916 by adding unstable media endpoints to `_matrix/client` (#17213)Shay2024-05-244-468/+703
| | | | | | | | | | [MSC3916](https://github.com/matrix-org/matrix-spec-proposals/blob/rav/authentication-for-media/proposals/3916-authentication-for-media.md) adds new media endpoints under `_matrix/client`. This PR adds the `/preview_url`, `/config`, and `/thumbnail` endpoints. `/download` will be added in a follow-up PR once the work for the federation `/download` endpoint is complete (see https://github.com/element-hq/synapse/pull/17172). Should be reviewable commit-by-commit.
* Add Sliding Sync `/sync/e2ee` endpoint for To-Device messages (#17167)Eric Eastwood2024-05-233-10/+411
| | | | | | | | | | | This is being introduced as part of Sliding Sync but doesn't have any sliding window component. It's just a way to get E2EE events without having to sit through a big initial sync (`/sync` v2). And we can avoid encryption events being backed up by the main sync response or vice-versa. Part of some Sliding Sync simplification/experimentation. See [this discussion](https://github.com/element-hq/synapse/pull/17167#discussion_r1610495866) for why it may not be as useful as we thought. Based on: - https://github.com/matrix-org/matrix-spec-proposals/pull/3575 - https://github.com/matrix-org/matrix-spec-proposals/pull/3885 - https://github.com/matrix-org/matrix-spec-proposals/pull/3884
* Log exceptions when failing to auto-join new user according to the ↵reivilibre2024-05-221-1/+1
| | | | | | | `auto_join_rooms` option. (#17176) Would have been useful for tracking down #16878. Signed-off-by: Olivier 'reivilibre <oliverw@matrix.org>
* Add logging to tasks managed by the task scheduler, showing CPU and database ↵reivilibre2024-05-221-2/+67
| | | | | | | | | | | | | | | | | | | usage. (#17219) The log format is the same as the request log format, except: - fields that are specific to HTTP requests have been removed - the task's params are included at the end of the log line. These log lines are emitted: - when the task function finishes — both completion and failure (and I suppose it is possible for a task to become schedulable again?) - every 5 minutes whilst it is running Closes #17217. --------- Signed-off-by: Olivier 'reivilibre <oliverw@matrix.org>
* Reduce work of calculating outbound device pokes (#17211)Erik Johnston2024-05-222-0/+31
|
* Bring auto-accept invite logic into Synapse (#17147)devonh2024-05-216-1/+250
| | | | | | | | | | | | | | This PR ports the logic from the [synapse_auto_accept_invite](https://github.com/matrix-org/synapse-auto-accept-invite) module into synapse. I went with the naive approach of injecting the "module" next to where third party modules are currently loaded. If there is a better/preferred way to handle this, I'm all ears. It wasn't obvious to me if there was a better location to add this logic that would cleanly apply to all incoming invite events. Relies on https://github.com/element-hq/synapse/pull/17166 to fix linter errors.
* Improve perf of sync device lists (#17216)Erik Johnston2024-05-214-62/+102
| | | | | | | | Re-introduces #17191, and includes #17197 and #17214 The basic idea is to stop calling `get_rooms_for_user` everywhere, and instead use the table `device_lists_changes_in_room`. Commits reviewable one-by-one.
* Add a short sleep if the request is rate-limited (#17210)Erik Johnston2024-05-181-0/+4
| | | This helps prevent clients from "tight-looping" retrying their request.
* Refactor `SyncResultBuilder` assembly to its own function (#17202)Eric Eastwood2024-05-161-116/+148
| | | | | | We will re-use `get_sync_result_builder(...)` in https://github.com/element-hq/synapse/pull/17167 Split out from https://github.com/element-hq/synapse/pull/17167
* Fix `joined_rooms`/`joined_room_ids` usage (#17208)Eric Eastwood2024-05-161-1/+1
| | | | | | | | This change was introduced in https://github.com/element-hq/synapse/pull/17203 But then https://github.com/element-hq/synapse/pull/17207 was reverted which brought back usage `joined_rooms` that needed to be updated. Wasn't caught because `develop` wasn't up to date before merging.
* Rename to be obvious: `joined_rooms` -> `joined_room_ids` (#17203)Eric Eastwood2024-05-161-2/+2
| | | Split out from https://github.com/element-hq/synapse/pull/17167
* Removed `request_key` from the `SyncConfig` (moved outside as its own ↵Eric Eastwood2024-05-162-4/+4
| | | | | | | | | function parameter) (#17201) Removed `request_key` from the `SyncConfig` (moved outside as its own function parameter) so it doesn't have to flow into `_generate_sync_entry_for_xxx` methods. This way we can separate the concerns of caching from generating the response and reuse the `_generate_sync_entry_for_xxx` functions as we see fit. Plus caching doesn't really have anything to do with the config of sync. Split from https://github.com/element-hq/synapse/pull/17167 Spawning from https://github.com/element-hq/synapse/pull/17167#discussion_r1601497279
* Revert "Improve perf of sync device lists" (#17207)Erik Johnston2024-05-162-8/+46
| | | Reverts element-hq/synapse#17191
* Fix bug where push rules would be empty in `/sync` (#17142)Erik Johnston2024-05-161-12/+8
| | | | | | Fixes #16987 Some old accounts seem to have an entry in global account data table for push rules, which we should ignore
* Refactor Sync handler to be able to return different sync responses ↵Eric Eastwood2024-05-162-7/+60
| | | | | | | | | | | | | | (`SyncVersion`) (#17200) Refactor Sync handler to be able to be able to return different sync responses (`SyncVersion`). Preparation to be able support sync v2 and a new Sliding Sync `/sync/e2ee` endpoint which returns a subset of sync v2. Split upon request: https://github.com/element-hq/synapse/pull/17167#discussion_r1601497279 Split from https://github.com/element-hq/synapse/pull/17167 where we will add `SyncVersion.E2EE_SYNC` and a new type of sync response.
* Cache literal sync filter validation (#17186)Erik Johnston2024-05-141-1/+13
| | | | The sliding sync proxy (amongst other things) use literal json blobs as filters, and repeatedly validating them takes a bunch of CPU.
* Reduce pauses on large device list changes (#17192)Erik Johnston2024-05-141-3/+10
| | | | For large accounts waking up all the relevant notifier streams can cause pauses of the reactor.
* Improve perf of sync device lists (#17191)Erik Johnston2024-05-142-46/+8
| | | | | It's almost always more efficient to query the rooms that have device list changes, rather than looking at the list of all users whose devices have changed and then look for shared rooms.
* Allows CAS SSO flow to provide user IDs composed of numbers only (#17098)Aurélien Grimpard2024-05-142-0/+18
|
* An federation whitelist query endpoint extension (#16848)Erik Johnston2024-05-133-0/+74
| | | | | | | | | | This is to allow clients to query the configured federation whitelist. Disabled by default. --------- Co-authored-by: Devon Hudson <devonhudson@librem.one> Co-authored-by: devonh <devon.dmytro@gmail.com> Co-authored-by: Andrew Morgan <1342360+anoadragon453@users.noreply.github.com>
* Actually fix public rooms (#17184)Erik Johnston2024-05-131-54/+54
| | | | | See #17177. I'm an idiot and moved them to the wrong store :facepalm:
* Fix bug with creating public rooms on workers (#17177)Erik Johnston2024-05-131-65/+51
| | | | | | If room publication is disabled then creating public rooms on workers would not work. Introduced in #16811.
* Fix undiscovered linter errors (#17166)devonh2024-05-081-3/+11
| | | | | | Linter errors are showing up in #17147 that are unrelated to that PR. The errors do not currently show up on develop. This PR aims to resolve the linter errors separately from #17147.
* Optional whitespace support in Authorization (#1350) (#17145)Timshel2024-05-081-1/+5
| | | | Co-authored-by: Andrew Morgan <1342360+anoadragon453@users.noreply.github.com>
* Add note about MSC3886 being closed (#17151)Hugh Nimmo-Smith2024-05-081-0/+3
|
* Add optimisation to `StreamChangeCache` (#17130)Erik Johnston2024-05-061-1/+19
| | | | | | | When there have been lots of changes compared with the number of entities, we can do a fast(er) path. Locally I ran some benchmarking, and the comparison seems to give the best determination of which method we use.
* Fix bug where `StreamChangeCache` would not respect cache factors (#17152)Erik Johnston2024-05-031-1/+1
| | | Annoyingly mypy didn't pick up this typo.
* Add support for MSC3823 - Account Suspension (#17051)Shay2024-05-016-3/+105
|
* Apply user `email` & `picture` during OIDC registration if present & ↵devonh2024-04-292-1/+13
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | selected (#17120) This change will apply the `email` & `picture` provided by OIDC to the new user account when registering a new user via OIDC. If the user is directed to the account details form, this change makes sure they have been selected before applying them, otherwise they are omitted. In particular, this change ensures the values are carried through when Synapse has consent configured, and the redirect to the consent form/s are followed. I have tested everything manually. Including: - with/without consent configured - allowing/not allowing the use of email/avatar (via `sso_auth_account_details.html`) - with/without automatic account detail population (by un/commenting the `localpart_template` option in synapse config). ### Pull Request Checklist <!-- Please read https://element-hq.github.io/synapse/latest/development/contributing_guide.html before submitting your pull request --> * [X] Pull request is based on the develop branch * [X] Pull request includes a [changelog file](https://element-hq.github.io/synapse/latest/development/contributing_guide.html#changelog). The entry should: - Be a short description of your change which makes sense to users. "Fixed a bug that prevented receiving messages from other servers." instead of "Moved X method from `EventStore` to `EventWorkerStore`.". - Use markdown where necessary, mostly for `code blocks`. - End with either a period (.) or an exclamation mark (!). - Start with a capital letter. - Feel free to credit yourself, by adding a sentence "Contributed by @github_username." or "Contributed by [Your Name]." to the end of the entry. * [X] [Code style](https://element-hq.github.io/synapse/latest/code_style.html) is correct (run the [linters](https://element-hq.github.io/synapse/latest/development/contributing_guide.html#run-the-linters))
* Add support for MSC4115 (#17104)Richard van der Hoff2024-04-2914-28/+139
| | | | Co-authored-by: Andrew Morgan <1342360+anoadragon453@users.noreply.github.com>
* Ensure that incoming to-device messages are not dropped (#17127)Richard van der Hoff2024-04-292-18/+29
| | | | | | | | | | | | | | | | | | | | ... when workers are unreachable, etc. Fixes https://github.com/element-hq/synapse/issues/17117. The general principle is just to make sure that we propagate any exceptions to the JsonResource, so that we return an error code to the sending server. That means that the sending server no longer considers the message safely sent, so it will retry later. In the issue, Erik mentions that an alternative solution would be to persist the to-device messages into a table so that they can be retried. This might be an improvement for performance, but even if we did that, we still need this mechanism, since we might be unable to reach the database. So, if we want to do that, it can be a later follow-up. --------- Co-authored-by: Erik Johnston <erik@matrix.org>
* Declare support for Matrix v1.10. (#17082)Patrick Cloke2024-04-291-0/+1
| | | | | Pretty straightforward. 😄 Fixes #17021
* Fix filtering of rooms when supplying the `destination` query parameter to ↵Andrew Morgan2024-04-261-0/+1
| | | | `/_synapse/admin/v1/federation/destinations/<destination>/rooms` (#17077)
* Improve error message for cross signing reset with MSC3861 enabled (#17121)Michael Telatynski2024-04-261-5/+8
|
* Use recommended endpoint for MSC3266 requests (#17078)Andrew Ferrazzutti2024-04-261-0/+6
| | | | | Keep the existing endpoint for backwards compatibility Signed-off-by: Andrew Ferrazzutti <andrewf@element.io>
* Return the search terms as search highlights for SQLite instead of nothing ↵Melvyn Laïly2024-04-261-7/+24
| | | | | | | (#17000) Fixes https://github.com/element-hq/synapse/issues/16999 and https://github.com/element-hq/element-android/pull/8729 by returning the search terms as search highlights.
* Redact membership events if the user requested erasure upon deactivating ↵Till2024-04-252-1/+34
| | | | | (#17076) Fixes #15355 by redacting all membership events before leaving rooms.
* MSC4108 implementation (#17056)Quentin Gliech2024-04-258-5/+134
| | | | | | Co-authored-by: Hugh Nimmo-Smith <hughns@element.io> Co-authored-by: Hugh Nimmo-Smith <hughns@users.noreply.github.com> Co-authored-by: Andrew Morgan <1342360+anoadragon453@users.noreply.github.com>
* Add type annotation to `visited_chains` (#17125)Andrew Morgan2024-04-251-1/+1
| | | | | This should fix CI on `develop`. Broke in https://github.com/element-hq/synapse/commit/0fe9e1f7dafa80f3e02762f7ae75cefee5b3316c, presumably due to a `mypy` dependency upgrade.
* Merge branch 'master' into developErik Johnston2024-04-232-73/+43
|\
| * Fix GHSA-3h7q-rfh9-xm4vErik Johnston2024-04-232-73/+43
| | | | | | | | | | | | | | | | | | | | | | | | | | | | Weakness in auth chain indexing allows DoS from remote room members through disk fill and high CPU usage. A remote Matrix user with malicious intent, sharing a room with Synapse instances before 1.104.1, can dispatch specially crafted events to exploit a weakness in how the auth chain cover index is calculated. This can induce high CPU consumption and accumulate excessive data in the database of such instances, resulting in a denial of service. Servers in private federations, or those that do not federate, are not affected.
* | Send an email if the address is already bound to an user account (#16819)mcalinghee2024-04-235-2/+60
| | | | | | | | Co-authored-by: Mathieu Velten <mathieu.velten@beta.gouv.fr> Co-authored-by: Olivier D <odelcroi@gmail.com>
* | Parse json validation (#16923)Gordan Trevis2024-04-183-47/+106
| | | | | | | | Co-authored-by: Andrew Morgan <1342360+anoadragon453@users.noreply.github.com>
* | Fix remote receipts for events we don't have (#17096)Erik Johnston2024-04-171-1/+5
| | | | | | Introduced in #17032
* | Support for MSC4108 via delegation (#17086)Quentin Gliech2024-04-174-4/+53
| | | | | | | | | | | | | | This adds support for MSC4108 via delegation, similar to what has been done for MSC3886 --------- Co-authored-by: Hugh Nimmo-Smith <hughns@element.io>
* | Parse Integer negative value validation (#16920)Gordan Trevis2024-04-167-156/+85
| |
* | bugfix: make msc3967 idempotent (#16943)Kegan Dougal2024-04-152-2/+48
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | MSC3967 was updated recently to make it more robust to network failures: > there is an existing cross-signing master key and it exactly matches the cross-signing master key provided in the request body. If there are any additional keys provided in the request (self signing key, user signing key) they MUST also match the existing keys stored on the server. In other words, the request contains no new keys. If there are new keys, UIA MUST be performed. https://github.com/matrix-org/matrix-spec-proposals/blob/hughns/device-signing-upload-uia/proposals/3967-device-signing-upload-uia.md#proposal This covers the case where the 200 OK is lost in transit so the client retries the upload, only to then get UIA'd. Complement tests: https://github.com/matrix-org/complement/pull/713 - passing example https://github.com/element-hq/synapse/actions/runs/7976948122/job/21778795094?pr=16943#step:7:8820 ### Pull Request Checklist <!-- Please read https://element-hq.github.io/synapse/latest/development/contributing_guide.html before submitting your pull request --> * [x] Pull request is based on the develop branch * [x] Pull request includes a [changelog file](https://element-hq.github.io/synapse/latest/development/contributing_guide.html#changelog). The entry should: - Be a short description of your change which makes sense to users. "Fixed a bug that prevented receiving messages from other servers." instead of "Moved X method from `EventStore` to `EventWorkerStore`.". - Use markdown where necessary, mostly for `code blocks`. - End with either a period (.) or an exclamation mark (!). - Start with a capital letter. - Feel free to credit yourself, by adding a sentence "Contributed by @github_username." or "Contributed by [Your Name]." to the end of the entry. * [x] [Code style](https://element-hq.github.io/synapse/latest/code_style.html) is correct (run the [linters](https://element-hq.github.io/synapse/latest/development/contributing_guide.html#run-the-linters)) --------- Co-authored-by: reivilibre <oliverw@matrix.org>
* | Use receipts `event_stream_ordering` instead of joins (#17032)Nick Mills-Barrett2024-04-122-19/+11
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Resurrecting https://github.com/matrix-org/synapse/pull/13918. This should reduce IOPs incurred by joining to the events table to lookup stream ordering, which happens in many receipt handling code paths. Like the previous PR I believe sufficient time has passed between the original migration in DB schema 72 and now to merge this as-is. It's highly unlikely that both the migration is still ongoing AND (active) users still have any receipts prior to that date. In the unlikely event there is a receipt without a populated `event_stream_ordering` synapse will behave just as it does now when receipts exist for events that don't (yet): for push action calculation the receipts are just ignored. I've removed the validation on event IDs as this is already covered here: https://github.com/element-hq/synapse/blob/59ceabcb9798793cd4312fdbcced4e612aeda84d/synapse/handlers/receipts.py#L189-L192
* | Fix mypy on latest Twisted release (#17036)Erik Johnston2024-04-113-4/+6
|/ | | | | | `ITransport.abortConnection` isn't a thing, but `HTTPChannel.forceAbortClient` calls it, so lets just use that Fixes https://github.com/element-hq/synapse/issues/16728
* Stabilize support for MSC4010: push rules & account data. (#17022)Patrick Cloke2024-04-092-28/+6
| | | | | | | See [MSC4010](https://github.com/matrix-org/matrix-spec-proposals/pull/4010), but this is pretty much just removing an experimental flag. Part of #17021
* Stabliize support for MSC3981: recurse /relations (#17023)Patrick Cloke2024-04-093-13/+5
| | | | | | | See [MSC3981](https://github.com/matrix-org/matrix-spec-proposals/pull/3981), this pretty much just removes flags though. Part of #17021
* Also check if first event matches the last in prev batch (#17066)Erik Johnston2024-04-091-7/+13
| | | | | Refinement of #17064 cc @richvdh
* Fix PR #16677, a parameter was missing in a function call (#17033)Mathieu Velten2024-04-091-0/+1
| | | | Co-authored-by: Andrew Morgan <1342360+anoadragon453@users.noreply.github.com>
* Add forgotten schema delta (#17054)Erik Johnston2024-04-092-7/+22
| | | This should have been in #17045. Whoops.
* Fixups to #17064 (#17065)Erik Johnston2024-04-081-0/+3
| | | | | Forget a line, and an empty batch is trivially linear. c.f. #17064
* Add back fast path for non-gappy syncs (#17064)Erik Johnston2024-04-081-0/+36
| | | | | | | | | | | PR #16942 removed an invalid optimisation that avoided pulling out state for non-gappy syncs. This causes a large increase in DB usage. c.f. #16941 for why that optimisation was wrong. However, we can still optimise in the simple case where the events in the timeline are a linear chain without any branching/merging of the DAG. cc. @richvdh
* Pull out fewer receipts from DB when doing push (#17049)Erik Johnston2024-04-051-22/+102
| | | | | | | Before we were pulling out *all* read receipts for a user for every event we pushed. Instead let's only pull out the relevant receipts. This also pulled out the event rows for each receipt, causing load on the events table.
* Fix bug in calculating state for non-gappy syncs (#16942)Richard van der Hoff2024-04-041-54/+37
| | | | | | | | | Unfortunately, the optimisation we applied here for non-gappy syncs is not actually valid. Fixes https://github.com/element-hq/synapse/issues/16941. ~~Based on https://github.com/element-hq/synapse/pull/16930.~~ Requires https://github.com/matrix-org/sytest/pull/1374.
* `/sync`: fix bug in calculating `state` response (#16930)Richard van der Hoff2024-04-041-41/+13
| | | | | | | Fix a long-standing issue which could cause state to be omitted from the sync response if the last event was filtered out. Fixes: https://github.com/element-hq/synapse/issues/16928
* Fix bug in `/sync` response for archived rooms (#16932)Richard van der Hoff2024-04-041-14/+107
| | | | | | | | | | | | This PR fixes a very, very niche edge-case, but I've got some more work coming which will otherwise make the problem worse. The bug happens when the syncing user leaves a room, and has a sync filter which includes "left" rooms, but sets the timeline limit to 0. In that case, the state returned in the `state` section is calculated incorrectly. The fix is to pass a token corresponding to the point that the user leaves the room through to `compute_state_delta`.
* Add missing index to `access_tokens` table (#17045)Erik Johnston2024-04-041-0/+7
| | | This was causing sequential scans when using refresh tokens.
* Refactor chain fetching (#17044)Erik Johnston2024-04-021-96/+66
| | | Since these queries are duplicated in two places.
* Fixups to new push stream (#17038)Erik Johnston2024-03-285-9/+15
| | | Follow on from #17037
* Add support for moving `/push_rules` off of main process (#17037)Erik Johnston2024-03-288-40/+125
|
* Fix OIDC login regression (#17031)Erik Johnston2024-03-261-0/+7
| | | | | Requests may require a User-Agent header, and the change in #16972 accidentally removed it, resulting in requests getting rejected causing login to fail.
* Ensure that pending to-device events are sent over federation at startup ↵Richard van der Hoff2024-03-223-31/+126
| | | | | | | | | | | | | | (#16925) Fixes https://github.com/element-hq/synapse/issues/16680, as well as a related bug, where servers which we had *never* successfully sent an event to would not be retried. In order to fix the case of pending to-device messages, we hook into the existing `wake_destinations_needing_catchup` process, by extending it to look for destinations that have pending to-device messages. The federation transmission loop then attempts to send the pending to-device messages as normal.
* Add OIDC config to add extra parameters to the authorize URL (#16971)Mathieu Velten2024-03-222-6/+20
|
* Do not refuse to set read_marker if previous event_id is in wrong room (#16990)SpiritCroc2024-03-212-5/+7
|
* Fix reject knocks on deactivating account (#17010)Hanadi2024-03-212-10/+31
|
* OIDC: try to JWT decode userinfo response if JSON parsing failed (#16972)Mathieu Velten2024-03-211-4/+28
|
* Update power level default for public rooms (#16907)Shay2024-03-191-1/+1
|
* Improve event validation (#16908)Shay2024-03-193-1/+26
| | | As the title states.
* Pass module API to OIDC mapping provider (#16974)Mathieu Velten2024-03-191-3/+14
| | | | As done for SAML mapping provider, let's pass the module API to the OIDC one so the mapper can do more logic in its code.
* Clarify docs for some room state functions (#16950)Richard van der Hoff2024-03-191-3/+5
| | | | State *before* an event is different to state *after* that event, and people tend to assume the wrong one.
* `/sync`: Fix edge-case in calculating the "device_lists" response (#16949)Richard van der Hoff2024-03-141-2/+9
| | | | | Fixes https://github.com/element-hq/synapse/issues/16948. If the `join` and the `leave` are in the same sync response, we need to count them as a "left" user.
* Split up `SyncHandler.compute_state_delta` (#16929)Richard van der Hoff2024-03-141-145/+236
| | | | | | This is a huge method, which melts my brain. This is a non-functional change which lays some groundwork for future work in this area.
* Improve lock performance when a lot of locks are waiting (#16840)Mathieu Velten2024-03-141-6/+9
| | | | | | | | | | | | When a lot of locks are waiting for a single lock, notifying all locks independently with `call_later` on each release is really costly and incurs some kind of async contention, where the CPU is spinning a lot for not much. The included test is taking around 30s before the change, and 0.5s after. It was found following failing tests with https://github.com/element-hq/synapse/pull/16827.
* Bump ruff from 0.1.14 to 0.3.2 (#16994)dependabot[bot]2024-03-131-7/+0
|
* Bump mypy from 1.5.1 to 1.8.0 (#16901)dependabot[bot]2024-03-1310-30/+19
|
* Bump black from 23.10.1 to 24.2.0 (#16936)dependabot[bot]2024-03-1365-453/+349
|
* Prevent locking up while processing batched_auth_events (#16968)Gerrit Gogel2024-03-121-9/+34
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This PR aims to fix #16895, caused by a regression in #7 and not fixed by #16903. The PR #16903 only fixes a starvation issue, where the CPU isn't released. There is a second issue, where the execution is blocked. This theory is supported by the flame graphs provided in #16895 and the fact that I see the CPU usage reducing and far below the limit. Since the changes in #7, the method `check_state_independent_auth_rules` is called with the additional parameter `batched_auth_events`: https://github.com/element-hq/synapse/blob/6fa13b4f927c10b5f4e9495be746ec28849f5cb6/synapse/handlers/federation_event.py#L1741-L1743 It makes the execution enter this if clause, introduced with #15195 https://github.com/element-hq/synapse/blob/6fa13b4f927c10b5f4e9495be746ec28849f5cb6/synapse/event_auth.py#L178-L189 There are two issues in the above code snippet. First, there is the blocking issue. I'm not entirely sure if this is a deadlock, starvation, or something different. In the beginning, I thought the copy operation was responsible. It wasn't. Then I investigated the nested `store.get_events` inside the function `update`. This was also not causing the blocking issue. Only when I replaced the set difference operation (`-` ) with a list comprehension, the blocking was resolved. Creating and comparing sets with a very large amount of events seems to be problematic. This is how the flamegraph looks now while persisting outliers. As you can see, the execution no longer locks up in the above function. ![output_2024-02-28_13-59-40](https://github.com/element-hq/synapse/assets/13143850/6db9c9ac-484f-47d0-bdde-70abfbd773ec) Second, the copying here doesn't serve any purpose, because only a shallow copy is created. This means the same objects from the original dict are referenced. This fails the intention of protecting these objects from mutation. The review of the original PR https://github.com/matrix-org/synapse/pull/15195 had an extensive discussion about this matter. Various approaches to copying the auth_events were attempted: 1) Implementing a deepcopy caused issues due to builtins.EventInternalMetadata not being pickleable. 2) Creating a dict with new objects akin to a deepcopy. 3) Creating a dict with new objects containing only necessary attributes. Concluding, there is no easy way to create an actual copy of the objects. Opting for a deepcopy can significantly strain memory and CPU resources, making it an inefficient choice. I don't see why the copy is necessary in the first place. Therefore I'm proposing to remove it altogether. After these changes, I was able to successfully join these rooms, without the main worker locking up: - #synapse:matrix.org - #element-android:matrix.org - #element-web:matrix.org - #ecips:matrix.org - #ipfs-chatter:ipfs.io - #python:matrix.org - #matrix:matrix.org
* deactivated flag refactored to filter deactivated users. (#16874)Alexander Fechler2024-03-113-5/+27
| | | | Co-authored-by: Andrew Morgan <1342360+anoadragon453@users.noreply.github.com>
* Stabilize support for Retry-After header (MSC4014) (#16947)Patrick Cloke2024-03-082-12/+2
|
* Fix joining remote rooms when a `on_new_event` callback is registered (#16973)Quentin Gliech2024-03-062-16/+16
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Since Synapse 1.76.0, any module which registers a `on_new_event` callback would brick the ability to join remote rooms. This is because this callback tried to get the full state of the room, which would end up in a deadlock. Related: https://github.com/matrix-org/synapse-auto-accept-invite/issues/18 The following module would brick the ability to join remote rooms: ```python from typing import Any, Dict, Literal, Union import logging from synapse.module_api import ModuleApi, EventBase logger = logging.getLogger(__name__) class MyModule: def __init__(self, config: None, api: ModuleApi): self._api = api self._config = config self._api.register_third_party_rules_callbacks( on_new_event=self.on_new_event, ) async def on_new_event(self, event: EventBase, _state_map: Any) -> None: logger.info(f"Received new event: {event}") @staticmethod def parse_config(_config: Dict[str, Any]) -> None: return None ``` This is technically a breaking change, as we are now passing partial state on the `on_new_event` callback. However, this callback was broken for federated rooms since 1.76.0, and local rooms have full state anyway, so it's unlikely that it would change anything.
* Revert "Improve DB performance of calculating badge counts for push. ↵Andrew Morgan2024-03-052-147/+114
| | | | (#16756)" (#16979)
* Don't lock up when joining large rooms (#16903)Erik Johnston2024-02-201-9/+17
| | | | Co-authored-by: Andrew Morgan <andrew@amorgan.xyz>
* bugfix: always prefer unthreaded receipt when >1 exist (MSC4102) (#16927)kegsay2024-02-201-3/+18
| | | | Co-authored-by: Andrew Morgan <1342360+anoadragon453@users.noreply.github.com>
* Add metric for emails sent (#16881)Remi Rampin2024-02-141-0/+23
| | | | | This adds a counter `synapse_emails_sent_total` for emails sent. They are broken down by `type`, which are `password_reset`, `registration`, `add_threepid`, `notification` (matching the methods of `Mailer`).
* Don't invalidate the entire event cache when we purge history (#16905)Erik Johnston2024-02-133-7/+68
| | | | | We do this by adding support to the LRU cache for "extra indices" based on the cached value. This allows us to efficiently map from room ID to the cached events and only invalidate those.
* Add a config to not send out device list updates for specific users (#16909)Erik Johnston2024-02-132-2/+19
| | | | | | | | | List of users not to send out device list updates for when they register new devices. This is useful to handle bot accounts. This is undocumented as its mostly a hack to test on matrix.org. Note: This will still send out device list updates if the device is later updated, e.g. end to end keys are added.
* Merge remote-tracking branch 'origin/release-v1.101' into developErik Johnston2024-02-091-2/+2
|\
| * Increase batching when fetching auth chains (#16893)Erik Johnston2024-02-091-2/+2
| | | | | | | | | | | | | | | | This basically reverts a change that was in https://github.com/element-hq/synapse/pull/16833, where we reduced the batching. The smaller batching can cause performance issues on busy servers and databases.
* | Only do one concurrent fetch per server in keyring (#16894)Erik Johnston2024-02-091-4/+5
|/ | | | | Otherwise if we've stacked a bunch of requests for the keys of a server, we'll end up sending lots of concurrent requests for its keys, needlessly.
* Accept unprefixed form of MSC3981 recurse parameter (#16842)David Baker2024-02-061-1/+1
| | | Now that the MSC3981 has passed FCP
* Bump lxml-stubs from 0.4.0 to 0.5.1 (#16885)dependabot[bot]2024-02-062-5/+3
|
* Run `ANALYZE` after fiddling with stats (#16849)Erik Johnston2024-01-242-0/+18
| | | | | Introduced in #16833 Fixes #16844
* Speed up e2e device keys queries for bot accounts (#16841)Erik Johnston2024-01-231-11/+18
| | | | | | This helps with bot accounts with lots of non-e2e devices. The change is basically to change the order of the join for the case of using `INNER JOIN`
* Correctly mention previous copyright (#16820)Erik Johnston2024-01-23461-0/+560
| | | | | During the migration the automated script to update the copyright headers accidentally got rid of some of the existing copyright lines. Reinstate them.
* Preparatory work for tweaking performance of auth chain lookups (#16833)Erik Johnston2024-01-234-27/+162
|
* Allow room creation but not publishing to continue if room publication rules ↵Shay2024-01-221-4/+2
| | | | | | | | | | | | | are violated when creating a new room. (#16811) Prior to this PR, if a request to create a public (public as in published to the rooms directory) room violated the room list publication rules set in the [config](https://matrix-org.github.io/synapse/latest/usage/configuration/config_documentation.html#room_list_publication_rules), the request to create the room was denied and the room was not created. This PR changes the behavior such that when a request to create a room published to the directory violates room list publication rules, the room is still created but the room is not published to the directory.
* Handle wildcard type filters properly (#14984)Mo Balaa2024-01-221-6/+17
|
* feat: add msc4028 to versions api (#16787)Hanadi2024-01-161-0/+2
| | | | Co-authored-by: Andrew Morgan <1342360+anoadragon453@users.noreply.github.com>
* Optimize query for fetching to-device messages in `/sync` (#16805)Erik Johnston2024-01-111-77/+72
| | | | | | | | The current query supports passing in a list of users, which generates a query using `user_id = ANY(..)`. This is generates a less efficient query plan that is notably slower than a simple `user_id = ?` condition. Note: The new function is mostly a copy and paste and then a simplification of the existing function.
* Improve DB performance of calculating badge counts for push. (#16756)Erik Johnston2024-01-112-114/+147
| | | | | | | | | | | | | | | | The crux of the change is to try and make the queries simpler and pull out fewer rows. Before, there were quite a few joins against subqueries, which caused postgres to pull out more rows than necessary. Instead, let's simplify the query and do some of the filtering out in Python instead, letting Postgres do better optimizations now that it doesn't have to deal with joins against subqueries. Review note: this is a complete rewrite of the function, so not sure how useful the diff is. --------- Co-authored-by: Andrew Morgan <1342360+anoadragon453@users.noreply.github.com>
* Correctly handle OIDC config with no `client_secret` set (#16806)Erik Johnston2024-01-101-1/+14
| | | | | | | | | | | In previous versions of authlib using `client_secret_basic` without a `client_secret` would result in an invalid auth header. Since authlib 1.3 it throws an exception. The configuration may be accepted in by very lax servers, so we don't want to deny it outright. Instead, let's default the `client_auth_method` to `none`, which does the right thing. If the config specifies `client_auth_method` and no `client_secret` then that is going to be bogus and we should reject it
* Faster load recents for sync (#16783)Erik Johnston2024-01-102-7/+24
| | | This hopefully reduces the amount of state we need to keep in memory
* Pull less state out if we fail to backfill (#16788)Erik Johnston2024-01-101-9/+12
| | | | | | | | | | | Sometimes we fail to fetch events during backfill due to missing state, and we often end up querying the same bad events periodically (as people backpaginate). In such cases its likely we will continue to fail to get the state, and therefore we should try *before* loading the state that we have from the DB (as otherwise it's wasted DB and memory). --------- Co-authored-by: reivilibre <oliverw@matrix.org>
* Reduce amount of state pulled out when querying federation hierachy (#16785)Erik Johnston2024-01-103-3/+81
| | | | | | | | | | | There are two changes here: 1. Only pull out the required state when handling the request. 2. Change the get filtered state return type to check that we're only querying state that was requested --------- Co-authored-by: reivilibre <oliverw@matrix.org>
* Split up deleting devices into batches (#16766)Erik Johnston2024-01-101-2/+6
| | | | Otherwise for users with large numbers of devices this can cause a lot of woe.
* Faster partial join to room with complex auth graph (#7)Erik Johnston2024-01-101-49/+30
| | | | | | | | Instead of persisting outliers in a bunch of batches, let's just do them all at once. This is fine because all `_auth_and_persist_outliers_inner` is doing is checking the auth rules for each event, which requires the events to be topologically sorted by the auth graph.
* Filter out rooms from the room directory being served to other homeservers ↵reivilibre2024-01-082-52/+132
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | when those rooms block that homeserver by their Access Control Lists. (#16759) The idea here being that the directory server shouldn't advertise rooms to a requesting server is the requesting server would not be allowed to join or participate in the room. <!-- Fixes: # <!-- --> <!-- Supersedes: # <!-- --> <!-- Follows: # <!-- --> <!-- Part of: # <!-- --> Base: `develop` <!-- git-stack-base-branch:develop --> <!-- This pull request is commit-by-commit review friendly. <!-- --> <!-- This pull request is intended for commit-by-commit review. <!-- --> Original commit schedule, with full messages: <ol> <li> Pass `from_federation_origin` down into room list retrieval code </li> <li> Don't cache /publicRooms response for inbound federated requests </li> <li> fixup! Don't cache /publicRooms response for inbound federated requests </li> <li> Cap the number of /publicRooms entries to 100 </li> <li> Simplify code now that you can't request unlimited rooms </li> <li> Filter out rooms from federated requests that don't have the correct ACL </li> <li> Request a handful more when filtering ACLs so that we can try to avoid shortchanging the requester </li> </ol> --------- Signed-off-by: Olivier Wilkinson (reivilibre) <oliverw@matrix.org>
* Port `EventInternalMetadata` class to Rust (#16782)Erik Johnston2024-01-085-119/+119
| | | | | | | | | | | | | There are a couple of things we need to be careful of here: 1. The current python code does no validation when loading from the DB, so we need to be careful to ignore such errors (at least on jki.re there are some old events with internal metadata fields of the wrong type). 2. We want to be memory efficient, as we often have many hundreds of thousands of events in the cache at a time. --------- Co-authored-by: Quentin Gliech <quenting@element.io>
* Fix linting (#16780)Erik Johnston2024-01-051-1/+11
| | | Introduced in #16762
* Simplify internal metadata class. (#16762)Erik Johnston2024-01-055-46/+23
| | | | | | | | | We remove these fields as they're just duplicating data the event already stores, and (for reasons :shushing_face:) I'd like to simplify the class to only store simple types. I'm not entirely convinced that we shouldn't instead add helper methods to the event class to generate stream tokens, but I don't really think that's where they belong either
* Add recursion_depth to /relations if recursing (#16775)David Baker2024-01-041-0/+4
| | | | | This is an extra response parameter just added to MSC3981. In the current impl, the recursion depth is always 3, so this just returns a static 3 if the recurse parameter is supplied.
* Search non ASCII display names using Admin API (#16767)Adam Jędrzejewski2024-01-041-1/+1
| | | | | Closes #16370 Signed-off-by: Adam Jedrzejewski <adamjedrzejewski@icloud.com>
* Fix email verification redirection (#16761)FadhlanR2024-01-022-2/+2
| | | | | | Previously, the response status of `HTMLResource` was hardcoded as `200`. However, for proper redirection after the user verifies their email, we require the status to be `302`. This PR addresses that issue by using `code` as response status.
* Enable user without password (#16770)Dirk Klimpel2024-01-021-9/+0
| | | | | | | | | | | Closes: - https://github.com/matrix-org/synapse/issues/10397 - #10397 An administrator should know whether he wants to set a password or not. There are many uses cases where a blank password is required. - Use of only some users with SSO. - Use of bots with password, users with SSO
* Move the rust stubs inline for better IDE integration (#16757)Erik Johnston2023-12-213-0/+100
| | | | At least for vscode this allows click through / type checking / syntax highlighting.
* Update book locationErik Johnston2023-12-1313-17/+17
|
* Fix linksErik Johnston2023-12-131-1/+1
|
* Log the new license during start.Patrick Cloke2023-12-131-0/+4
|
* Merge remote-tracking branch 'gitlab/clokep/license-license' into new_developErik Johnston2023-12-13802-5348/+13682
|\
| * Update license headersPatrick Cloke2023-11-21816-5348/+13948
| |
* | Revert changes to READMEErik Johnston2023-12-131-12/+0
| |
* | Merge remote-tracking branch 'origin/clokep/morg-readme' into developErik Johnston2023-12-131-0/+12
|\ \
| * | Update text github/clokep/morg-readme clokep/morg-readmeErik Johnston2023-12-121-3/+9
| | |
| * | Update the README pointing to the Element fork.Patrick Cloke2023-12-121-0/+6
| | |
* | | Sentry Alert configuration based on production and development environment ↵Zeeshan Rafiq2023-12-122-0/+2
| | | | | | | | | | | | (#16738)
* | | Add avatar and topic settings for server notice room (#16679)Mathieu Velten2023-12-122-11/+114
| | |
* | | Add config to change the delay before sending a notification email (#16696)Mathieu Velten2023-12-122-9/+11
| | |
* | | Write signing keys with file mode 0640 (#16740)elara-leitstellentechnik2023-12-082-5/+16
| | | | | | | | | | | | Co-authored-by: Fabian Klemp <fabian.klemp@frequentis.com>
* | | Expose OIDC discovery information under the CSAPI (#16726)David Robertson2023-12-062-0/+65
|/ / | | | | | | Co-authored-by: Quentin Gliech <quenting@element.io>
* | Revert postgres logical replication deltaas v1.98.0rc1David Robertson2023-12-05116-128/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | This reverts two commits: 0bb8e418a41c6f583ca9d705b400e37e2308a534 "Fix postgres schema after dropping old tables (#16730)" and 51e4e35653f98c3f61222fbdbdb1dcb8864f7fca "Add a Postgres `REPLICA IDENTITY` to tables that do not have an implicit one. This should allow use of Postgres logical replication. (take 2, now with no added deadlocks!) (#16658)" and also amends the changelog.
* | Fix upgrading a room without `events` field in power levels (#16725)David Robertson2023-12-051-1/+1
| |
* | Set response values to zero if None for ↵Will Hunt2023-12-051-2/+2
| | | | | | | | | | | | /_synapse/admin/v1/federation/destinations (#16729)
* | Fix postgres schema after dropping old tables (#16730)David Robertson2023-12-055-5/+0
| |
* | Add a Postgres `REPLICA IDENTITY` to tables that do not have an implicit ↵reivilibre2023-12-04121-0/+133
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | one. This should allow use of Postgres logical replication. (take 2, now with no added deadlocks!) (#16658) * Add `ALTER TABLE ... REPLICA IDENTITY ...` for individual tables We can't combine them into one file as it makes it likely to hit a deadlock if Synapse is running, as it only takes one other transaction to access two tables in a different order to the schema delta. * Add notes * Newsfile Signed-off-by: Olivier Wilkinson (reivilibre) <oliverw@matrix.org> * Re-introduce REPLICA IDENTITY test --------- Signed-off-by: Olivier Wilkinson (reivilibre) <oliverw@matrix.org>
* | Server notices: add an autojoin setting for the notices room (#16699)Mathieu Velten2023-12-042-1/+16
| | | | | | | | Co-authored-by: Patrick Cloke <clokep@users.noreply.github.com>
* | Implement MSC4069: Inhibit profile propagation (#16636)Travis Ralston2023-12-044-5/+42
| | | | | | MSC: https://github.com/matrix-org/matrix-spec-proposals/pull/4069
* | ModuleAPI SSO auth callbacks (#15207)Andrew Yasinishyn2023-12-015-0/+41
| | | | | | Signed-off-by: Andrii Yasynyshyn yasinishyn.a.n@gmail.com
* | Drop unused tables & unneeded access token ID for events. (#16522)Patrick Cloke2023-12-013-8/+28
| |
* | Declare support for Matrix v1.7, v1.8, and v1.9. (#16707)Patrick Cloke2023-11-291-0/+3
| |
* | Request & follow redirects for /media/v3/download (#16701)Patrick Cloke2023-11-294-33/+152
| | | | | | | | | | | | Implement MSC3860 to follow redirects for federated media downloads. Note that the Client-Server API doesn't support this (yet) since the media repository in Synapse doesn't have a way of supporting redirects.
* | Reduce DB load when forget on leave setting is disabled (#16668)Erik Johnston2023-11-291-3/+8
| | | | | | | | | | * Reduce DB load when forget on leave setting is disabled * Newsfile
* | Speed up pruning of `user_ips` table (#16667)Erik Johnston2023-11-291-10/+7
| | | | | | Silly query planner
* | Ignore `encryption_enabled_by_default_for_room_type` for notices room (#16677)Mathieu Velten2023-11-282-1/+10
| |
* | Remove old full schema dumps. (#16697)Patrick Cloke2023-11-2820-2962/+0
| | | | | | | | These are not useful and make it difficult to search for table definitions, etc.
* | Correctly read to-device stream pos on SQLite (#16682)David Robertson2023-11-242-13/+20
| |
* | Keep track of `user_ips` and `monthly_active_users` when delegating auth ↵David Robertson2023-11-234-38/+74
| | | | | | | | | | | | | | | | | | | | | | | | | | (#16672) * Describe `insert_client_ip` * Pull out client_ips and MAU tracking to BaseAuth * Define HAS_AUTHLIB once in tests sick of copypasting * Track ips and token usage when delegating auth * Test that we track MAU and user_ips * Don't track `__oidc_admin`
* | Enable refreshable tokens on the admin registration endpoint (#16642)Charles Wright2023-11-221-1/+9
| | | | | | Signed-off-by: Charles Wright <cvwright@futo.org>
* | Admin API for server notice: consistently bypass rate limits (#16670)Mathieu Velten2023-11-221-0/+2
| | | | | | | | | | | | | | * Admin API for server notice: disable rate limit for all calls * Add changelog * Update changelog.d/16670.bugfix
* | Filter out auth chain queries that don't exist (#16552)Jason Little2023-11-221-0/+5
|/
* Speed up how quickly we launch new tasks (#16660)Erik Johnston2023-11-171-1/+1
| | | Now that we're reducing concurrency (#16656), this is more important.
* Speed up purge room by adding index (#16657)Erik Johnston2023-11-172-0/+25
| | | What it says on the tin
* Also discard 'caches' and 'backfill' stream POSITIONS (#16655)Erik Johnston2023-11-171-0/+16
| | | Follow on from #16640
* Merge branch 'master' into developPatrick Cloke2023-11-171-2/+2
|\
| * Fix "'int' object is not iterable" error in set_device_id_for_pushers ↵Patrick Cloke2023-11-021-2/+2
| | | | | | | | | | | | background update (#16594) A regression from removing the cursor_to_dict call, adds back the wrapping into a tuple.
* | Reduce task concurrency (#16656)Erik Johnston2023-11-172-2/+2
| |
* | Revert "Fix test not detecting tables with missing primary keys and missing ↵Erik Johnston2023-11-162-110/+0
| | | | | | | | | | replica identities, then add more replica identities. (#16647)" (#16652) This reverts commit 830988ae72d63bbb67d2020a3f221664f3f456ee.
* | Revert "Add a Postgres `REPLICA IDENTITY` to tables that do not have an ↵Erik Johnston2023-11-162-118/+0
| | | | | | | | | | implicit one. This should allow use of Postgres logical replication. (#16456)" (#16651) This reverts commit 69afe3f7a0d89f3422ddbd3aa16bc9bbc01056eb.
* | Speed up deleting device messages (#16643)Erik Johnston2023-11-163-29/+87
| | | | | | Keeping track of a lower bound of stream ID where we've deleted everything below makes the queries much faster. Otherwise, every time we scan for rows to delete we'd re-scan across all the rows that have previously deleted (until the next table VACUUM).
* | Speed up persisting large number of outliers (#16649)Erik Johnston2023-11-162-11/+58
| | | | | | Recalculating the roots tuple every iteration could be very expensive, so instead let's do a topological sort.
* | Fix sending out of order `POSITION` over replication (#16639)Erik Johnston2023-11-163-21/+36
| | | | | | | | | | If a worker reconnects to Redis we send out the current positions of all our streams. However, if we're also trying to send out a backlog of RDATA at the same time then we can end up sending a `POSITION` with the current token *before* we've sent all the RDATA before the current token. This doesn't cause actual bugs as the receiving servers see the POSITION, fetch the relevant rows from the DB, and then ignore the old RDATA as they come in. However, this is inefficient so it'd be better if we didn't send out-of-order positions
* | More efficiently handle no-op POSITION (#16640)Erik Johnston2023-11-162-0/+52
| | | | | | | | We may receive `POSITION` commands where we already know that worker has advanced past that position, so there is no point in handling it.
* | Fix test not detecting tables with missing primary keys and missing replica ↵reivilibre2023-11-162-0/+110
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | identities, then add more replica identities. (#16647) * Fix the CI query that did not detect all cases of missing primary keys * Add more missing REPLICA IDENTITY entries * Newsfile Signed-off-by: Olivier Wilkinson (reivilibre) <oliverw@matrix.org> --------- Signed-off-by: Olivier Wilkinson (reivilibre) <oliverw@matrix.org>
* | Add an Admin API to temporarily grant the ability to update an existing ↵David Robertson2023-11-156-12/+165
| | | | | | | | cross-signing key without UIA (#16634)
* | Asynchronous Uploads (#15503)Sumner Evans2023-11-1511-58/+530
| | | | | | Support asynchronous uploads as defined in MSC2246.
* | Use full GitHub links instead of bare issue numbers. (#16637)Patrick Cloke2023-11-1519-32/+42
| |
* | Remove whole table locks on push rule add/delete (#16051)Nick Mills-Barrett2023-11-131-16/+27
| | | | | | | | The statements are already executed within a transaction thus a table level lock is unnecessary.
* | Add a Postgres `REPLICA IDENTITY` to tables that do not have an implicit ↵reivilibre2023-11-132-0/+118
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | one. This should allow use of Postgres logical replication. (#16456) * Add Postgres replica identities to tables that don't have an implicit one Fixes #16224 * Newsfile Signed-off-by: Olivier Wilkinson (reivilibre) <oliverw@matrix.org> * Move the delta to version 83 as we missed the boat for 82 * Add a test that all tables have a REPLICA IDENTITY * Extend the test to include when indices are deleted * isort * black * Fully qualify `oid` as it is a 'hidden attribute' in Postgres 11 * Update tests/storage/test_database.py Co-authored-by: Patrick Cloke <clokep@users.noreply.github.com> * Add missed tables --------- Signed-off-by: Olivier Wilkinson (reivilibre) <oliverw@matrix.org> Co-authored-by: Patrick Cloke <clokep@users.noreply.github.com>
* | Use attempt_to_set_autocommit everywhere. (#16615)Patrick Cloke2023-11-093-12/+18
| | | | | | To avoid asserting the type of the database connection.
* | Fix a long-standing bug where Synapse would not unbind third-party ↵reivilibre2023-11-091-11/+8
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | identifiers for Application Service users when deactivated and would not emit a compliant response. (#16617) * Don't skip unbinding 3PIDs and returning success status when deactivating AS user Fixes #16608 * Newsfile Signed-off-by: Olivier Wilkinson (reivilibre) <oliverw@matrix.org> --------- Signed-off-by: Olivier Wilkinson (reivilibre) <oliverw@matrix.org>
* | Use _invalidate_cache_and_stream_bulk in more places. (#16616)Patrick Cloke2023-11-096-46/+70
| | | | | | | | | | This takes advantage of the new bulk method in more places to invalidate caches for many keys at once (and then to stream that over replication).
* | Convert simple_select_one_txn and simple_select_one to return tuples. (#16612)Patrick Cloke2023-11-0925-253/+261
| |
* | Return attrs for more media repo APIs. (#16611)Patrick Cloke2023-11-096-91/+128
| |
* | Bulk-invalidate e2e cached queries after claiming keys (#16613)David Robertson2023-11-094-28/+131
| | | | | | | | Co-authored-by: Patrick Cloke <patrickc@matrix.org>
* | Avoid updating the same rows multiple times with simple_update_many_txn. ↵Patrick Cloke2023-11-071-4/+1
| | | | | | | | | | | | (#16609) simple_update_many_txn had a bug in it which would cause each update to be applied twice.
* | Avoid executing no-op queries. (#16583)Patrick Cloke2023-11-075-19/+33
| | | | | | | | | | | | If simple_{insert,upsert,update}_many_txn is called without any data to modify then return instead of executing the query. This matches the behavior of simple_{select,delete}_many_txn.
* | More tests for the simple_* methods. (#16596)Patrick Cloke2023-11-071-9/+4
| | | | | | | | Expand tests for the simple_* database methods, additionally test against both PostgreSQL and SQLite variants.
* | Collect information for PushRuleEvaluator in parallel. (#16590)Patrick Cloke2023-11-063-34/+86
| | | | | | | | | | | | | | | | Fetch information needed for push rule evaluation in parallel. Ideally this would use query pipelining, but this is not available in psycopg2. Due to the database thread pool this may result in little to no parallelization.
* | Support reactor timing metric on more reactors. (#16532)Patrick Cloke2023-11-061-27/+103
| | | | | | | | | | | | | | | | | | | | Previously only Twisted's EPollReactor was compatible with the reactor timing metric, notably not working when asyncio was used. After this change, the following configurations support the reactor timing metric: * poll, epoll, or select reactors * asyncio reactor with a poll, epoll, select, /dev/poll, or kqueue event loop.
* | Simplify event persistence code (#16584)Patrick Cloke2023-11-032-312/+324
| | | | | | | | | | | | | | | | | | | | | | The event persistence code used to handle multiple rooms at a time, but was simplified to only ever be called with a single room at a time (different rooms are now handled in parallel). The code is still generic to multiple rooms causing a lot of work that is unnecessary (e.g. unnecessary loops, and partitioning data by room). This strips out the ability to handle multiple rooms at once, greatly simplifying the code.
* | Use simple_select_many_txn in event persistance code. (#16585)Patrick Cloke2023-11-021-5/+11
| | | | | | | | | | Just to standardize on the normal helpers, it might also have a slight perf improvement on PostgreSQL which will now use `ANY (?)` instead of `IN (?, ?, ...)`.
* | Bump twisted from 23.8.0 to 23.10.0 (#16588)dependabot[bot]2023-11-011-1/+1
| |
* | Do not call getfullargspec on every call. (#16589)Patrick Cloke2023-10-311-2/+5
| | | | | | | | | | getfullargspec is relatively expensive and the results will not change between calls, so precalculate it outside the wrapper.
* | Remove remaining usage of cursor_to_dict. (#16564)Patrick Cloke2023-10-3114-136/+283
| |
* | Fix import ordering issue introduced in ↵Patrick Cloke2023-10-311-1/+1
|/ | | | 7a3a55ac98847d7adb0e200378abe07ef8d0c645.
* Merge pull request from GHSA-mp92-3jfm-3575Patrick Cloke2023-10-313-1/+16
|
* Claim local one-time-keys in bulk (#16565)David Robertson2023-10-302-114/+149
| | | | Co-authored-by: Patrick Cloke <clokep@users.noreply.github.com>
* Add fast path for replication events stream fetch (#16580)Erik Johnston2023-10-301-0/+6
| | | | We can bail early if the from token is greater than or equal to the current token.
* Claim fallback keys in bulk (#16570)David Robertson2023-10-303-0/+84
|
* Fix HTTP repl response to use minimum token (#16578)Erik Johnston2023-10-301-1/+1
|
* Portdb: don't copy a table that gets rebuilt (#16563)David Robertson2023-10-271-1/+1
|
* Ensure local invited & knocking users leave before purge. (#16559)Patrick Cloke2023-10-272-3/+20
| | | | | This is mostly useful for federated rooms where some users would get stuck in the invite or knock state when the room was purged from their homeserver.
* Reduce amount of caches POSITIONS we send (#16561)Erik Johnston2023-10-271-0/+10
| | | Follow on from / actually correctly does #16557
* Reduce spurious replication catchup (#16555)Erik Johnston2023-10-271-5/+9
|
* Fix cross-worker ratelimiting (#16558)Erik Johnston2023-10-271-16/+57
| | | c.f. #16481
* Reduce replication traffic due to reflected cache stream POSITION (#16557)Erik Johnston2023-10-271-1/+18
|
* Add new module API for adding custom fields to events `unsigned` section ↵Erik Johnston2023-10-2714-42/+99
| | | | (#16549)
* Remove more usages of cursor_to_dict. (#16551)Patrick Cloke2023-10-2621-123/+182
| | | Mostly to improve type safety.
* Add a new module API to update user presence state. (#16544)Patrick Cloke2023-10-268-42/+94
| | | | | | | | | | This adds a module API which allows a module to update a user's presence state/status message. This is useful for controlling presence from an external system. To fully control presence from the module the presence.enabled config parameter gains a new state of "untracked" which disables internal tracking of presence changes via user actions, etc. Only updates from the module will be persisted and sent down sync properly).
* Convert simple_select_list and simple_select_list_txn to return lists of ↵Patrick Cloke2023-10-2621-269/+346
| | | | | tuples (#16505) This should use fewer allocations and improves type hints.
* Allow multiple workers to write to receipts stream. (#16432)Erik Johnston2023-10-2512-85/+347
| | | Fixes #16417
* Fix http/s proxy authentication with long username/passwords (#16504)Richard Brežák2023-10-241-1/+1
|
* Remove duplicate call to wake a remote destination when using federation ↵Jason Little2023-10-242-13/+0
| | | | sending worker (#16515)
* Fix type hint errors from Twisted trunk (#16526)Patrick Cloke2023-10-231-5/+11
|
* Fix bug where a new writer advances their token too quickly (#16473)Erik Johnston2023-10-236-62/+170
| | | | | | | | | | | | | | | | | | | * Fix bug where a new writer advances their token too quickly When starting a new writer (for e.g. persisting events), the `MultiWriterIdGenerator` doesn't have a minimum token for it as there are no rows matching that new writer in the DB. This results in the the first stream ID it acquired being announced as persisted *before* it actually finishes persisting, if another writer gets and persists a subsequent stream ID. This is due to the logic of setting the minimum persisted position to the minimum known position of across all writers, and the new writer starts off not being considered. * Fix sending out POSITIONs when our token advances without update Broke in #14820 * For replication HTTP requests, only wait for minimal position
* Fix bug that could cause a `/sync` to tightloop with sqlite after restart ↵Erik Johnston2023-10-231-1/+4
| | | | | (#16540) This could happen if the last rows in the account data stream were inserted into `account_data`. After a restart the max account ID would be calculated without looking at the `account_data` table, and so have an old ID.
* Force TLS certificate verification in registration script. (#16530)Denis Kasak2023-10-231-2/+2
| | | | | | | | | | If using the script remotely, there's no particularly convincing reason to disable certificate verification, as this makes the connection interceptible. If on the other hand, the script is used locally (the most common use case), you can simply target the HTTP listener and avoid TLS altogether. This is what the script already attempts to do if passed a homeserver configuration YAML file.
* Remove the last reference to event_txn_id. (#16521)Patrick Cloke2023-10-232-7/+4
| | | | This table was no longer used, except for a background process which purged old entries in it.
* Mark sync as limited if there is a gap in the timeline (#16485)Erik Johnston2023-10-194-33/+165
| | | | | | | | This splits thinsg into two queries, but most of the time we won't have new event backwards extremities so this shouldn't actually add an extra RTT for the majority of cases. Note this removes the check for events with no prev events, but that was part of MSC2716 work that has since been removed.
* Avoid sending massive replication updates when purging a room. (#16510)Patrick Cloke2023-10-182-1/+52
|
* Improve performance of delete device messages query (#16492)Mathieu Velten2023-10-182-7/+10
|
* Convert DeviceLastConnectionInfo to attrs. (#16507)Patrick Cloke2023-10-172-36/+33
| | | To improve type safety & memory usage.
* Fix a bug where servers could be marked as up when they were failing (#16506)Patrick Cloke2023-10-171-13/+17
| | | | After this change a server will only be reported as back online if they were previously having requests fail.
* Convert state delta processing from a dict to attrs. (#16469)Patrick Cloke2023-10-166-108/+109
| | | For improved type checking & memory usage.
* Remove useless async job to delete device messages on sync (#16491)Mathieu Velten2023-10-162-24/+3
|
* Clean up logging on event persister endpoints (#16488)Richard van der Hoff2023-10-142-6/+13
|
* Revert "Drop unused tables & unneeded access token ID for events. (#16268)" ↵Patrick Cloke2023-10-123-28/+8
| | | | | | | | (#16465) This reverts commit cabd57746004fe2dacc11aa8d373854a3d25e306. There are additional usages of these tables which need to be removed first.
* Convert user_get_threepids response to attrs. (#16468)Patrick Cloke2023-10-117-14/+26
| | | This improves type annotations by not having a dictionary of Any values.
* Convert simple_select_many_batch, simple_select_many_txn to tuples. (#16444)Patrick Cloke2023-10-1121-416/+601
|
* Handle content types with parameters. (#16440)Patrick Cloke2023-10-111-1/+3
|
* Inline simple_search_list/simple_search_list_txn. (#16434)Patrick Cloke2023-10-103-73/+48
| | | | This only has a single use and is over abstracted. Inline it so that we can improve type hints.
* Add DB indices to speed up purging rooms (#16457)David Robertson2023-10-103-0/+34
|
* Disable statement timeout whilst purging rooms (#16455)reivilibre2023-10-091-0/+5
| | | | | | | | | | | | | * Disable statement timeout whilst purging rooms * Newsfile Signed-off-by: Olivier Wilkinson (reivilibre) <oliverw@matrix.org> * Note the introduction version --------- Signed-off-by: Olivier Wilkinson (reivilibre) <oliverw@matrix.org>
* Bump ruff from 0.0.290 to 0.0.292 (#16449)dependabot[bot]2023-10-095-8/+0
| | | | | | | | | | | | | | | | | | | | | | | | | * Bump ruff from 0.0.290 to 0.0.292 Bumps [ruff](https://github.com/astral-sh/ruff) from 0.0.290 to 0.0.292. - [Release notes](https://github.com/astral-sh/ruff/releases) - [Changelog](https://github.com/astral-sh/ruff/blob/main/BREAKING_CHANGES.md) - [Commits](https://github.com/astral-sh/ruff/compare/v0.0.290...v0.0.292) --- updated-dependencies: - dependency-name: ruff dependency-type: direct:development update-type: version-update:semver-patch ... Signed-off-by: dependabot[bot] <support@github.com> * Fix up lint --------- Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> Co-authored-by: Erik Johnston <erik@matrix.org>
* Fix possible AttributeError when account-api is called over unix socket (#16404)Christoph2023-10-091-1/+1
| | | Fixes #16396
* Apply join rate limiter outside the lineariser (#16441)David Robertson2023-10-061-20/+23
|
* Convert simple_select_list_paginate_txn to return tuples. (#16433)Patrick Cloke2023-10-066-39/+66
|
* Return ThumbnailInfo in more places (#16438)Patrick Cloke2023-10-064-62/+71
| | | | Improves type hints by using concrete types instead of dictionaries.
* Drop unused tables & unneeded access token ID for events. (#16268)Patrick Cloke2023-10-063-8/+28
| | | | Drop the event_txn_id table and the tables related to MSC2716, which is no longer supported in Synapse.
* Stop sending incorrect knock_state_events. (#16403)Patrick Cloke2023-10-064-22/+6
| | | | | | | | | Synapse was incorrectly implemented with a knock_state_events property on some APIs (instead of knock_room_state). This was correct in Synapse 1.70.0, but *both* fields were sent to also be compatible with Synapse versions expecting the wrong field. Enough time has passed that only the correct field needs to be included/handled.
* Fix comments related to replication. (#16428)Patrick Cloke2023-10-062-3/+1
|
* Register media servlets via regex. (#16419)Patrick Cloke2023-10-069-118/+103
| | | | | This converts the media servlet URLs in the same way as (most) of the rest of Synapse. This will give more flexibility in the versions each endpoint exists under.
* Remove unused method. (#16435)Patrick Cloke2023-10-051-20/+0
|