path: root/synapse/replication/http
Commit message | Author | Age | Files | Lines
* Fix HTTP repl response to use minimum token (#16578)Erik Johnston2023-10-301-1/+1
|
* Fix bug where a new writer advances their token too quickly (#16473)Erik Johnston2023-10-231-1/+1
|     * Fix bug where a new writer advances their token too quickly
|
|       When starting a new writer (for e.g. persisting events), the `MultiWriterIdGenerator` doesn't have a minimum token for it, as there are no rows matching that new writer in the DB. This results in the first stream ID it acquired being announced as persisted *before* it actually finishes persisting, if another writer gets and persists a subsequent stream ID. This is due to the logic of setting the minimum persisted position to the minimum known position across all writers, and the new writer starts off not being considered.
|
|     * Fix sending out POSITIONs when our token advances without update
|
|       Broke in #14820
|
|     * For replication HTTP requests, only wait for minimal position
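The race described in #16473 can be illustrated with a toy model. This is a simplified sketch, not the `MultiWriterIdGenerator` implementation: `persisted_upto`, the writer names, and the stream IDs are all hypothetical, and the real generator tracks considerably more state.

```python
from typing import Dict


def persisted_upto(known_positions: Dict[str, int]) -> int:
    """Return the stream ID up to which everything is treated as persisted.

    Hypothetical helper: the minimum known position across all writers.
    """
    return min(known_positions.values())


# "old" has finished persisting stream ID 6. "new" has acquired stream ID 5
# but is still persisting it, and has no rows in the DB yet.
without_new_writer = {"old": 6}
with_new_writer = {"old": 6, "new": 4}  # 4: last ID before the in-flight one

# Bug: the new writer is not considered, so ID 5 looks persisted already.
buggy = persisted_upto(without_new_writer)

# Fix: once the new writer has a minimum token, the position is held back.
fixed = persisted_upto(with_new_writer)
```

In this model the in-flight stream ID 5 is only announced once the new writer itself reports a position of 5 or higher.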
* Clean up logging on event persister endpoints (#16488)Richard van der Hoff2023-10-142-6/+13
|
* Recheck if remote device is cached before requesting it (#16252)Erik Johnston2023-09-071-2/+2
| | | | This fixes a bug where we could get stuck re-requesting the device over replication again and again.
* Cache device resync requests over replication (#16241)David Robertson2023-09-041-1/+1
|
* Pass the device ID around in the presence handler (#16171)Patrick Cloke2023-08-281-4/+7
| | | | | | Refactoring to pass the device ID (in addition to the user ID) through the presence handler (specifically the `user_syncing`, `set_state`, and `bump_presence_active_time` methods and their replication versions).
* Combine logic about not overriding BUSY presence. (#16170)Patrick Cloke2023-08-281-5/+5
|     Simplify some of the presence code by reducing duplicated code between worker & non-worker modes.
|
|     The main change is to push some of the logic from `user_syncing` into `set_state`. This is done by passing whether the user is setting the presence via a `/sync` with a new `is_sync` flag to `set_state`. If this is `true` some additional logic is performed:
|
|     * Don't override `busy` presence.
|     * Update the `last_user_sync_ts`.
|     * Never update the status message.
* Clarify comment on key uploads over replication (#16016)Shay2023-07-271-2/+2
|
* Use a custom scheme & the worker name for replication requests. (#15578)Jason Little2023-05-231-12/+6
| | | | | | | | All the information needed is already in the `instance_map`, so use that instead of passing the hostname / IP & port manually for each replication request. This consolidates logic for future improvements of using e.g. UNIX sockets for workers.
* Remove `worker_replication_*` settings (#15491)Jason Little2023-05-111-11/+5
|     * Add master to the instance_map as part of Complement, have ReplicationEndpoint look at instance_map for master.
|     * Fix typo in drive by.
|     * Remove unnecessary worker_replication_* bits from unit tests and add master to instance_map (hopefully in the right place)
|     * Several updates:
|       1. Switch from master to main for naming the main process in the instance_map. Add useful constants for easier adjustment of names in the future.
|       2. Add backwards compatibility for worker_replication_* to allow time to transition to new style. Make sure to prioritize declaring main directly on the instance_map.
|       3. Clean up old comments/commented out code.
|       4. Adjust unit tests to match with new code.
|       5. Adjust Complement setup infrastructure to only add main to the instance_map if workers are used and remove now unused options from the worker.yaml template.
|     * Initial Docs upload
|     * Changelog
|     * Missed some commented out code that can go now
|     * Remove TODO comment that no longer holds true.
|     * Fix links in docs
|     * More docs
|     * Remove debug logging
|     * Apply suggestions from code review
|
|       Co-authored-by: reivilibre <olivier@librepush.net>
|     * Apply suggestions from code review
|
|       Co-authored-by: reivilibre <olivier@librepush.net>
|     * Update version to latest, include completeish before/after examples in upgrade notes.
|     * Fix up and docs too
|
|     ---------
|
|     Co-authored-by: reivilibre <olivier@librepush.net>
* HTTP Replication Client (#15470)Jason Little2023-05-091-1/+1
| | | | | | Separate out a HTTP client for replication in preparation for also supporting using UNIX sockets. The major difference from the base class is that this does not use treq to handle HTTP requests.
* Remove legacy code of single user device resync api (#15418)Alok Kumar Singh2023-04-211-57/+0
| | | | | * Removed single-user resync usage and updated it to use multi-user counterpart Signed-off-by: Alok Kumar Singh alokaks601@gmail.com
* Have replication clients remove _INT_STREAM_POS (#15309)David Robertson2023-03-221-1/+1
|     * Have replication clients remove _INT_STREAM_POS
|
|       Suppose worker A makes an internal HTTP request to worker B. B may make changes that A later learns about over replication. We want A's request to block until it has seen those changes, mainly to ensure A's caches are invalidated promptly. This helps provide read-after-write consistency, eliminating entire categories of races and test flakes.
|
|       To implement this, B includes a top-level field `_INT_STREAM_POS` in its response JSON. Roughly speaking, the field's value tells A what to wait for. But we weren't removing that internal field before A's request completed!
|
|       Introduced in https://github.com/matrix-org/synapse/pull/14820.
|       Fixes #15308.
|
|     * Changelog
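A minimal sketch of the fix described in #15309: the client pops the internal field before handing the response back to its caller. The helper name `split_replication_response` is hypothetical, and the shape of the field's value (a mapping from stream name to position) is an assumption made for illustration; the commit message only says the value "tells A what to wait for".

```python
from typing import Any, Dict, Tuple

# Field name taken from the commit message above.
_STREAM_POSITION_KEY = "_INT_STREAM_POS"


def split_replication_response(
    body: Dict[str, Any],
) -> Tuple[Dict[str, Any], Dict[str, int]]:
    """Separate the caller-visible result from the internal stream positions.

    Hypothetical helper. Returns a copy of the body without the internal
    field, plus the positions the caller should wait for (empty if absent).
    """
    result = dict(body)  # copy so the original response is left untouched
    positions = result.pop(_STREAM_POSITION_KEY, {})
    return result, positions


result, positions = split_replication_response(
    {"event_id": "$abc", "_INT_STREAM_POS": {"events": 10}}
)
```

The caller would wait on `positions` and return only `result`, so the internal field never leaks into handler code.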
* Add support for knocking to workers. (#15133)Dirk Klimpel2023-03-021-11/+4
|
* Bump black from 22.12.0 to 23.1.0 (#15103)dependabot[bot]2023-02-222-2/+0
|
* Fix bug in replication where response is cached (#15024)Erik Johnston2023-02-081-0/+2
|
* Reduce max time we wait for stream positions (#14881)Erik Johnston2023-01-201-2/+0
| | | | | | Now that we wait for stream positions whenever we do a HTTP replication hit, we need to be less brutal in the case where we do timeout (as we have bugs around this).
* Wait for streams to catch up when processing HTTP replication. (#14820)Erik Johnston2023-01-1813-89/+140
| | | | This should hopefully mitigate a class of races where data gets out of sync due to a HTTP replication request racing with the replication streams.
* Batch up replication requests to request the resyncing of remote users' devices. (#14716)reivilibre2023-01-101-1/+73
|
* Add experimental support for MSC3391: deleting account data (#14714)Andrew Morgan2023-01-011-8/+84
|
* Add a type hint for `get_device_handler()` and fix incorrect types. (#14055)Patrick Cloke2022-11-221-3/+8
| | | | | This was the last untyped handler from the HomeServer object. Since it was being treated as Any (and thus unchecked) it was being used incorrectly in a few places.
* Remove need for `worker_main_http_uri` setting to use /keys/upload. (#14400)realtyem2022-11-161-0/+67
|
* Remove redundant types from comments. (#14412)Patrick Cloke2022-11-161-1/+1
| | | | | | | Remove type hints from comments which have been added as Python type hints. This helps avoid drift between comments and reality, as well as removing redundant information. Also adds some missing type hints which were simple to fill in.
* Support using SSL on worker endpoints. (#14128)Tuomas Ojamies2022-11-151-1/+9
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | * Fix missing SSL support in worker endpoints. * Add changelog * SSL for Replication endpoint * Remove unit test change * Refactor listener creation to reduce duplicated code * Fix the logger message * Update synapse/app/_base.py Co-authored-by: Patrick Cloke <clokep@users.noreply.github.com> * Update synapse/app/_base.py Co-authored-by: Patrick Cloke <clokep@users.noreply.github.com> * Update synapse/app/_base.py Co-authored-by: Patrick Cloke <clokep@users.noreply.github.com> * Add config documentation for new TLS option Co-authored-by: Tuomas Ojamies <tojamies@palantir.com> Co-authored-by: Patrick Cloke <clokep@users.noreply.github.com> Co-authored-by: Olivier Wilkinson (reivilibre) <oliverw@matrix.org>
* Fallback if 'approved' isn't included in a registration replication request (#14135)Brendan Abolivier2022-10-111-1/+17
|
* Allow admins to require a manual approval process before new accounts can be used (using MSC3866) (#13556)Brendan Abolivier2022-09-291-0/+5
|
* Persist CreateRoom events to DB in a batch (#13800)Shay2022-09-283-2/+175
|
* Generalise the `@cancellable` annotation so it can be used on functions other than just servlet methods. (#13662)reivilibre2022-08-311-3/+4
|
* Add type annotations to `trace` decorator. (#13328)Patrick Cloke2022-07-191-2/+2
| | | | Functions that are decorated with `trace` are now properly typed and the type hints for them are fixed.
* Faster room joins: fix race in recalculation of current room state (#13151)Sean Quah2022-07-072-0/+77
| | | | | | | | | | | Bounce recalculation of current state to the correct event persister and move recalculation of current state into the event persistence queue, to avoid concurrent updates to a room's current state. Also give recalculation of a room's current state a real stream ordering. Signed-off-by: Sean Quah <seanq@matrix.org>
* Handle race between persisting an event and un-partial stating a room (#13100)Sean Quah2022-07-052-0/+6
|     Whenever we want to persist an event, we first compute an event context, which includes the state at the event and a flag indicating whether the state is partial. After a lot of processing, we finally try to store the event in the database, which can fail for partial state events when the containing room has been un-partial stated in the meantime.
|
|     We detect the race as a foreign key constraint failure in the data store layer and turn it into a special `PartialStateConflictError` exception, which makes its way up to the method in which we computed the event context.
|
|     To make things difficult, the exception needs to cross a replication request: `/fed_send_events` for events coming over federation and `/send_event` for events from clients. We transport the `PartialStateConflictError` as a `409 Conflict` over replication and turn `409`s back into `PartialStateConflictError`s on the worker making the request.
|
|     All client events go through `EventCreationHandler.handle_new_client_event`, which is called in *a lot* of places. Instead of trying to update all the code which creates client events, we turn the `PartialStateConflictError` into a `429 Too Many Requests` in `EventCreationHandler.handle_new_client_event` and hope that clients take it as a hint to retry their request.
|
|     On the federation event side, there are 7 places which compute event contexts. 4 of them use outlier event contexts: `FederationEventHandler._auth_and_persist_outliers_inner`, `FederationHandler.do_knock`, `FederationHandler.on_invite_request` and `FederationHandler.do_remotely_reject_invite`. These events won't have the partial state flag, so we do not need to do anything for them.
|
|     The remaining 3 paths which create events are `FederationEventHandler.process_remote_join`, `FederationEventHandler.on_send_membership_event` and `FederationEventHandler._process_received_pdu`.
|
|     We can't experience the race in `process_remote_join`, unless we're handling an additional join into a partial state room, which currently blocks, so we make no attempt to handle it correctly.
|
|     `on_send_membership_event` is only called by `FederationServer._on_send_membership_event`, so we catch the `PartialStateConflictError` there and retry just once.
|
|     `_process_received_pdu` is called by `on_receive_pdu` for incoming events and `_process_pulled_event` for backfill. The latter should never try to persist partial state events, so we ignore it. We catch the `PartialStateConflictError` in `on_receive_pdu` and retry just once.
|
|     Referring to the graph of code paths in https://github.com/matrix-org/synapse/issues/12988#issuecomment-1156857648 may make the above make more sense.
|
|     Signed-off-by: Sean Quah <seanq@matrix.org>
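The round trip described in #13100 (exception to `409 Conflict` on the serving side, `409` back to exception on the requesting side) might look roughly like the sketch below. Only the name `PartialStateConflictError` and the status code come from the commit message; `ReplicationHttpError`, `status_for_exception` and `exception_for_response` are hypothetical stand-ins rather than Synapse's actual classes or functions.

```python
from http import HTTPStatus


class PartialStateConflictError(Exception):
    """Stand-in for the exception named in the commit message."""


class ReplicationHttpError(Exception):
    """Hypothetical error carrying the status code of a failed request."""

    def __init__(self, code: int) -> None:
        super().__init__(f"replication request failed with {code}")
        self.code = code


def status_for_exception(exc: Exception) -> int:
    """Serving side: choose the status code used to transport the error."""
    if isinstance(exc, PartialStateConflictError):
        return int(HTTPStatus.CONFLICT)
    return int(HTTPStatus.INTERNAL_SERVER_ERROR)


def exception_for_response(err: ReplicationHttpError) -> Exception:
    """Requesting side: turn a 409 back into the original exception type."""
    if err.code == HTTPStatus.CONFLICT:
        return PartialStateConflictError()
    return err  # any other failure is passed through unchanged
```

Mapping by status code keeps the wire format plain HTTP while letting the calling worker handle the conflict exactly as if it had been raised locally.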
* Rename storage classes (#12913)Erik Johnston2022-05-312-4/+6
|
* Respect the `@cancellable` flag for `ReplicationEndpoint`s (#12700)Sean Quah2022-05-111-2/+19
| | | | | | | | | While `ReplicationEndpoint`s register themselves via `JsonResource`, they pass a method that calls the handler, instead of the handler itself, to `register_paths`. As a result, `JsonResource` will not correctly pick up the `@cancellable` flag and we have to apply it ourselves. Signed-off-by: Sean Quah <seanq@element.io>
* Bump `black` and `click` versions (#12320)David Robertson2022-03-291-1/+1
|
* Retry some http replication failures (#12182)Nick Mills-Barrett2022-03-091-11/+36
|     This allows for the target process to be down for around a minute, which provides time for restarts during synapse upgrades/config updates.
|
|     Closes: #12178
|
|     Signed off by Nick Mills-Barrett nick@beeper.com
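One way to tolerate roughly a minute of downtime is retrying with exponential backoff. The sketch below is an assumption-laden illustration, not the code from #12182: the function `call_with_retries`, the attempt count, the initial delay, and the choice of `ConnectionError` as the retryable failure are all hypothetical. With the defaults shown, the waits add up to 1 + 2 + 4 + 8 + 16 + 32 = 63 seconds.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def call_with_retries(
    request: Callable[[], T],
    max_attempts: int = 7,
    initial_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call `request`, retrying connection failures with doubling delays.

    Hypothetical helper; `sleep` is injectable so the loop can be tested
    without actually waiting.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")

    delay = initial_delay
    for attempt in range(1, max_attempts + 1):
        try:
            return request()
        except ConnectionError:
            if attempt == max_attempts:
                raise  # out of attempts: surface the last failure
            sleep(delay)
            delay *= 2
    raise AssertionError("unreachable: the loop always returns or raises")
```

Only connection-level failures are retried here; whether other failures are safe to retry depends on the endpoint being idempotent.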
* Remove `HomeServer.get_datastore()` (#12031)Richard van der Hoff2022-02-235-14/+14
| | | | | | | The presence of this method was confusing, and mostly present for backwards compatibility. Let's get rid of it. Part of #11733
* Better error message when failing to request from another process (#12060)Erik Johnston2022-02-221-1/+3
|
* Add missing type hints to synapse.replication.http. (#11856)Patrick Cloke2022-02-0812-162/+257
|
* Save the OIDC session ID (sid) with the device on login (#11482)Quentin Gliech2021-12-061-0/+8
| | | As a step towards allowing back-channel logout for OIDC.
* Add type hints for most `HomeServer` parameters (#11095)Sean Quah2021-10-2212-32/+67
|
* Fix opentracing and Prometheus metrics for replication requests (#10996)Sean Quah2021-10-121-76/+78
|     This commit fixes two bugs to do with decorators not instrumenting `ReplicationEndpoint`'s `send_request` correctly. There are two decorators on `send_request`: Prometheus' `Gauge.track_inprogress()` and Synapse's `opentracing.trace`.
|
|     `Gauge.track_inprogress()` does not have any support for async functions when used as a decorator. Since async functions behave like regular functions that return coroutines, only the creation of the coroutine was covered by the metric and none of the actual body of `send_request`.
|
|     `Gauge.track_inprogress()` returns a regular, non-async function wrapping `send_request`, which is the source of the next bug. The `opentracing.trace` decorator would normally handle async functions correctly, but since the wrapped `send_request` is a non-async function, the decorator ends up suffering from the same issue as `Gauge.track_inprogress()`: the opentracing span only measures the creation of the coroutine and none of the actual function body.
|
|     Using `Gauge.track_inprogress()` as a context manager instead of a decorator resolves both bugs.
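The decorator pitfall described in #10996 can be reproduced with the standard library alone. The sketch below does not use `prometheus_client` or Synapse's `opentracing` module: `track_inprogress`, `track_as_decorator` and the two `send_request_*` functions are hypothetical stand-ins that mimic a sync-only decorator wrapping an async function.

```python
import asyncio
from contextlib import contextmanager

in_progress = 0  # stand-in for the gauge value
observed = []    # gauge values seen from inside the request body


@contextmanager
def track_inprogress():
    """Increment the stand-in gauge for the duration of the `with` block."""
    global in_progress
    in_progress += 1
    try:
        yield
    finally:
        in_progress -= 1


def track_as_decorator(func):
    """Sync-only decorator: covers the call, not the awaited body."""

    def wrapper(*args, **kwargs):
        with track_inprogress():
            # For an async `func` this merely creates the coroutine object;
            # the gauge is decremented before the body ever runs.
            return func(*args, **kwargs)

    return wrapper


@track_as_decorator
async def send_request_decorated():
    observed.append(("decorated", in_progress))


async def send_request_context_manager():
    with track_inprogress():
        observed.append(("context manager", in_progress))


asyncio.run(send_request_decorated())
asyncio.run(send_request_context_manager())
```

Inside the decorated function the stand-in gauge reads 0, while inside the context-manager version it reads 1, which is the behaviour the commit restores.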
* Use direct references for configuration variables (part 5). (#10897)Patrick Cloke2021-09-241-2/+2
|
* Split `FederationHandler` in half (#10692)Richard van der Hoff2021-08-261-2/+2
| | | The idea here is to take anything to do with incoming events and move it out to a separate handler, as a way of making FederationHandler smaller.
* Use inline type hints in various other places (in `synapse/`) (#10380)Jonathan de Jong2021-07-151-5/+5
|
* MSC2918 Refresh tokens implementation (#9450)Quentin Gliech2021-06-241-1/+12
| | | | | | | | | | This implements refresh tokens, as defined by MSC2918 This MSC has been implemented client side in Hydrogen Web: vector-im/hydrogen-web#235 The basics of the MSC works: requesting refresh tokens on login, having the access tokens expire, and using the refresh token to get a new one. Signed-off-by: Quentin Gliech <quentingliech@gmail.com>
* Extend `ResponseCache` to pass a context object into the callback (#10157)Richard van der Hoff2021-06-142-4/+4
| | | | | This is the first of two PRs which seek to address #8518. This first PR lays the groundwork by extending ResponseCache; a second PR (#10158) will update the SyncHandler to actually use it, and fix the bug. The idea here is that we allow the callback given to ResponseCache.wrap to decide whether its result should be cached or not. We do that by (optionally) passing a ResponseCacheContext into it, which it can modify.
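The idea of letting the callback decide whether its result is cached can be sketched with a toy, synchronous cache. This is not Synapse's `ResponseCache`, which is asynchronous and has a richer interface: `TinyResponseCache`, `CacheContext` and its `should_cache` attribute are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, Hashable


@dataclass
class CacheContext:
    """Hypothetical context object handed to the callback."""

    should_cache: bool = True


class TinyResponseCache:
    """Toy cache whose callbacks may opt out of caching their result."""

    def __init__(self) -> None:
        self._results: Dict[Hashable, Any] = {}

    def wrap(self, key: Hashable, callback: Callable[[CacheContext], Any]) -> Any:
        if key in self._results:
            return self._results[key]
        context = CacheContext()
        result = callback(context)
        if context.should_cache:  # the callback may have flipped this
            self._results[key] = result
        return result


calls = []


def handler(context: CacheContext) -> str:
    calls.append(len(calls))
    if len(calls) == 1:
        context.should_cache = False  # e.g. the first result was incomplete
    return f"result {len(calls)}"


cache = TinyResponseCache()
first = cache.wrap("key", handler)   # computed, not cached
second = cache.wrap("key", handler)  # computed again, cached this time
third = cache.wrap("key", handler)   # served from the cache
```

Passing a mutable context keeps the callback's return type unchanged, which is one plausible reason to prefer it over returning a `(result, should_cache)` pair.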
* Implement knock feature (#6739)Sorunome2021-06-091-0/+139
| | | | | | This PR aims to implement the knock feature as proposed in https://github.com/matrix-org/matrix-doc/pull/2403 Signed-off-by: Sorunome mail@sorunome.de Signed-off-by: Andrew Morgan andrewm@element.io
* Clean up the interface for injecting opentracing over HTTP (#10143)Richard van der Hoff2021-06-091-2/+3
| | | | | | | * Remove unused helper functions * Clean up the interface for injecting opentracing over HTTP * changelog
* Use a database table to hold the users that should have full presence sent to them, instead of something in-memory (#9823)Andrew Morgan2021-05-181-2/+9
|
* Split presence out of master (#9820)Erik Johnston2021-04-231-1/+4
|
* Remove redundant "coding: utf-8" lines (#9786)Jonathan de Jong2021-04-1412-12/+0
| | | | | | | Part of #9744 Removes all redundant `# -*- coding: utf-8 -*-` lines from files, as python 3 automatically reads source code as utf-8 now. `Signed-off-by: Jonathan de Jong <jonathan@automatia.nl>`
* Make RateLimiter class check for ratelimit overrides (#9711)Erik Johnston2021-03-301-1/+1
|     This should fix a class of bug where we forget to check if e.g. the appservice shouldn't be ratelimited.
|
|     We also check the `ratelimit_override` table to check if the user has ratelimiting disabled. That table is really only meant to override the event sender ratelimiting, so we don't use any values from it (as they might not make sense for different rate limits), but we do infer that if ratelimiting is disabled for the user we should disable all ratelimits.
|
|     Fixes #9663
* Prep work for removing `outlier` from `internal_metadata` (#9411)Richard van der Hoff2021-03-172-1/+6
| | | | | | | | | | | | * Populate `internal_metadata.outlier` based on `events` table Rather than relying on `outlier` being in the `internal_metadata` column, populate it based on the `events.outlier` column. * Move `outlier` out of InternalMetadata._dict Ultimately, this will allow us to stop writing it to the database. For now, we have to grandfather it back in so as to maintain compatibility with older versions of Synapse.
* Fix the auth provider on the logins metric (#9573)Richard van der Hoff2021-03-101-2/+2
| | | | | We either need to pass the auth provider over the replication api, or make sure we report the auth provider on the worker that received the request. I've gone with the latter.
* Add ResponseCache tests. (#9458)Jonathan de Jong2021-03-081-3/+6
|
* Use the proper Request in type hints. (#9515)Patrick Cloke2021-03-011-5/+4
| | | | This also pins the Twisted version in the mypy job for CI until proper type hints are fixed throughout Synapse.
* Fix deleting pushers when using sharded pushers. (#9465)Erik Johnston2021-02-222-0/+74
|
* Add configs to make profile data more private (#9203)AndrewFerr2021-02-191-1/+2
| | | | | | | Add off-by-default configuration settings to: - disable putting an invitee's profile info in invite events - disable profile lookup via federation Signed-off-by: Andrew Ferrazzutti <fair@miscworks.net>
* Update black, and run auto formatting over the codebase (#9381)Eric Eastwood2021-02-164-7/+15
|     - Update black version to the latest
|     - Run black auto formatting over the codebase
|       - Run autoformatting according to [`docs/code_style.md`](https://github.com/matrix-org/synapse/blob/80d6dc9783aa80886a133756028984dbf8920168/docs/code_style.md)
|     - Update `code_style.md` docs around installing black to use the correct version
* Allow moving account data and receipts streams off master (#9104)Erik Johnston2021-01-182-0/+189
|
* Enforce all replication HTTP clients calls use kwargs (#9144)Erik Johnston2021-01-181-1/+1
|
* Merge remote-tracking branch 'origin/erikj/as_mau_block' into developErik Johnston2020-12-181-2/+10
|\
| * Correctly handle AS registrations and add testErik Johnston2020-12-171-2/+10
| |
* | Add authentication to replication endpoints. (#8853)Patrick Cloke2020-12-041-6/+41
|/ | | | Authentication is done by checking a shared secret provided in the Synapse configuration file.
* Add typing to membership Replication class methods (#8809)Andrew Morgan2020-11-271-22/+44
| | | | | This PR grew out of #6739, and adds typing to some method arguments You'll notice that there are a lot of `# type: ignores` in here. This is due to the base methods not matching the overloads here. This is necessary to stop mypy complaining, but a better solution is #8828.
* Generalise _maybe_store_room_on_invite (#8754)Andrew Morgan2020-11-131-5/+5
|     There's a handy function called maybe_store_room_on_invite which allows us to create an entry in the rooms table for a room (and its version) that we haven't joined yet, which we can then reference when ingesting events about that room.
|
|     This is currently used for invites where we receive some stripped state about the room and pass it down via /sync to the client, without us being in the room yet.
|
|     There is a similar requirement for knocking, where we will eventually do the same thing, and need an entry in the rooms table as well. Thus, reusing this function works, however its name needs to be generalised a bit.
|
|     Separated out from #6739.
* Add ability for access tokens to belong to one user but grant access to another user. (#8616)Erik Johnston2020-10-292-6/+3
|     We do it this way round so that only the "owner" can delete the access token (i.e. `/logout/all` by the "owner" also deletes that token, but `/logout/all` by the "target user" doesn't).
|
|     A future PR will add an API for creating such a token.
|
|     When the target user and authenticated entity are different the `Processed request` log line will be logged with a: `{@admin:server as @bob:server} ...`. I'm not convinced by that format (especially since it adds spaces in there, making it harder to use `cut -d ' '` to chop off the start of log lines). Suggestions welcome.
* Fix message duplication if something goes wrong after persisting the event (#8476)Erik Johnston2020-10-131-2/+14
|     Should fix #3365.
* Add type hints to response cache. (#8507)Patrick Cloke2020-10-091-1/+1
|
* Remove the deprecated Handlers object (#8494)Patrick Cloke2020-10-092-2/+2
| | | All handlers now available via get_*_handler() methods on the HomeServer.
* Add metrics to track success/otherwise of replication requests (#8406)Richard van der Hoff2020-09-291-12/+28
| | | One hope is that this might provide some insights into #3365.
* Simplify super() calls to Python 3 syntax. (#8344)Patrick Cloke2020-09-186-12/+12
|     This converts calls like super(Foo, self) -> super().
|
|     Generated with:
|
|         sed -i "" -Ee 's/super\([^\(]+\)/super()/g' **/*.py
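The effect of the `sed` expression above can be checked with a close Python equivalent. The pattern string is the one from the commit message; the helper name `modernise_super` and the sample inputs are hypothetical.

```python
import re

# Same pattern as the sed command: "super(", one or more characters that
# are not an opening parenthesis, then ")".
OLD_STYLE_SUPER = re.compile(r"super\([^\(]+\)")


def modernise_super(source: str) -> str:
    """Rewrite Python 2 style super(Foo, self) calls to super()."""
    return OLD_STYLE_SUPER.sub("super()", source)


before = "super(Foo, self).__init__(name)"
after = modernise_super(before)
```

Because the pattern requires at least one character between the parentheses and refuses to cross an opening parenthesis, calls that are already written as `super()` are left unchanged.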
* Switch metaclass initialization to python 3-compatible syntax (#8326)Jonathan de Jong2020-09-161-3/+1
|
* Add experimental support for sharding event persister. Again. (#8294)Erik Johnston2020-09-141-3/+9
|     This is *not* ready for production yet. Caveats:
|
|     1. We should write some tests...
|     2. The stream token that we use for events can get stalled at the minimum position of all writers. This means that new events may not be processed and e.g. sent down sync streams if a writer isn't writing or is slow.
* Remove some unused distributor signals (#8216)Patrick Cloke2020-09-091-6/+4
| | | | | Removes the `user_joined_room` and stops calling it since there are no observers. Also cleans-up some other unused signals and related code.
* Stop sub-classing object (#8249)Patrick Cloke2020-09-041-1/+1
|
* Revert "Add experimental support for sharding event persister. (#8170)" (#8242)Brendan Abolivier2020-09-041-9/+3
| | | | | | | * Revert "Add experimental support for sharding event persister. (#8170)" This reverts commit 82c1ee1c22a87b9e6e3179947014b0f11c0a1ac3. * Changelog
* Add experimental support for sharding event persister. (#8170)Erik Johnston2020-09-021-3/+9
|     This is *not* ready for production yet. Caveats:
|
|     1. We should write some tests...
|     2. The stream token that we use for events can get stalled at the minimum position of all writers. This means that new events may not be processed and e.g. sent down sync streams if a writer isn't writing or is slow.
* Add a shadow-banned flag to users. (#8092)Patrick Cloke2020-08-141-0/+4
|
* Convert replication code to async/await. (#7987)Patrick Cloke2020-08-039-37/+27
|
* Merge tag 'v1.18.0rc2' into developRichard van der Hoff2020-07-281-1/+1
|\
| |   Synapse 1.18.0rc2 (2020-07-28)
| |   ==============================
| |
| |   Bugfixes
| |   --------
| |
| |   - Fix an `AssertionError` exception introduced in v1.18.0rc1. ([\#7876](https://github.com/matrix-org/synapse/issues/7876))
| |   - Fix experimental support for moving typing off master when worker is restarted, which is broken in v1.18.0rc1. ([\#7967](https://github.com/matrix-org/synapse/issues/7967))
| |
| |   Internal Changes
| |   ----------------
| |
| |   - Further optimise queueing of inbound replication commands. ([\#7876](https://github.com/matrix-org/synapse/issues/7876))
| * Typing worker needs to handle stream update requests (#7967)Erik Johnston2020-07-281-1/+1
| |   IIRC this doesn't break tests because it's only hit on reconnection, or something.
| |
| |   Basically, when a process needs to fetch missing updates for the `typing` stream it needs to query the writer instance via HTTP (as we don't write typing notifications to the DB). The problem was that the endpoint (`streams`) was only registered on master and specifically not on the typing writer worker.
* | Convert a synapse.events to async/await. (#7949)Patrick Cloke2020-07-272-2/+4
|/
* Fix some spelling mistakes / typos. (#7811)Patrick Cloke2020-07-091-2/+2
|
* Generate real events when we reject invites (#7804)Richard van der Hoff2020-07-091-67/+25
| | | | | | | | Fixes #2181. The basic premise is that, when we fail to reject an invite via the remote server, we can generate our own out-of-band leave event and persist it as an outlier, so that we have something to send to the client.
* Merge different Resource implementation classes (#7732)Erik Johnston2020-07-032-10/+4
|
* Replace all remaining six usage with native Python 3 equivalents (#7704)Dagfinn Ilmari Mannsåker2020-06-161-4/+2
|
* Add option to move event persistence off master (#7517)Erik Johnston2020-05-224-2/+161
|
* Add ability to wait for replication streams (#7542)Erik Johnston2020-05-224-16/+20
| | | | | | | The idea here is that if an instance persists an event via the replication HTTP API it can return before we receive that event over replication, which can lead to races where code assumes that persisting an event immediately updates various caches (e.g. current state of the room). Most of Synapse doesn't hit such races, so we don't do the waiting automagically, instead we do so where necessary to avoid unnecessary delays. We may decide to change our minds here if it turns out there are a lot of subtle races going on. People probably want to look at this commit by commit.
* Allow ReplicationRestResource to be added to workers (#7515)Erik Johnston2020-05-181-5/+8
| | | This allows workers to talk to each other over HTTP replication.
* Add `instance_map` config and route replication calls (#7495)Erik Johnston2020-05-141-6/+15
|
* Have all instances correctly respond to REPLICATE command. (#7475)Erik Johnston2020-05-131-2/+2
| | | | | Before all streams were only written to from master, so only master needed to respond to `REPLICATE` commands. Before all instances wrote to the cache invalidation stream, but didn't respond to `REPLICATE`. This was a bug, which could lead to missed rows from cache invalidation stream if an instance is restarted, however all the caches would be empty in that case so it wasn't a problem.
* Thread through instance name to replication client. (#7369)Erik Johnston2020-05-012-2/+21
| | | For in memory streams when fetching updates on workers we need to query the source of the stream, which currently is hard coded to be master. This PR threads through the source instance we received via `POSITION` through to the update function in each stream, which can then be passed to the replication client for in memory streams.
* Remove 'limit' param from `get_repl_stream_updates` APIRichard van der Hoff2020-04-231-5/+7
| | | | | there doesn't seem to be much point in passing this limit all around, since both sides agree it's meant to be 100.
* Move catchup of replication streams to worker. (#7024)Erik Johnston2020-03-252-0/+80
| | | This changes the replication protocol so that the server does not send down `RDATA` for rows that happened before the client connected. Instead, the server will send a `POSITION` and clients then query the database (or master out of band) to get up to date.
* Store room_versions in EventBase objects (#6875)Richard van der Hoff2020-03-052-8/+19
|     This is a bit fiddly because it all has to be done in one fell swoop:
|
|     * Wherever we create a new event, pass in the room version (and check it matches the format version)
|     * When we prune an event, use the room version of the unpruned event to create the pruned version.
|     * When we pass an event over the replication protocol, pass the room version over alongside it, and use it when deserialising the event again.
* Store room version on invite (#6983)Richard van der Hoff2020-02-262-2/+36
| | | | | When we get an invite over federation, store the room version in the rooms table. The general idea here is that, when we pull the invite out again, we'll want to know what room_version it belongs to (so that we can later redact it if need be). So we need to store it somewhere...
* Fixup synapse.replication to pass mypy checks (#6667)Erik Johnston2020-01-141-5/+5
|
* Change EventContext to use the Storage class (#6564)Erik Johnston2019-12-202-2/+6
|
* Propagate reason in remotely rejected invitesErik Johnston2019-11-281-2/+5
|
* Merge pull request #6332 from matrix-org/erikj/query_devices_fixErik Johnston2019-11-262-1/+82
|\ | | | | Fix caching devices for remote servers in worker.
| * Fixup docsErik Johnston2019-11-261-1/+5
| |
| * Fix caching devices for remote servers in worker.Erik Johnston2019-11-052-1/+78
| |   When the `/keys/query` API is hit on a client_reader worker, Synapse may decide that it needs to resync some remote devices. Usually this happens on master, and then gets cached. However, that fails on workers and so it falls back to fetching devices from remotes directly, which may in turn fail if the remote is down.
* | Address review commentsAndrew Morgan2019-11-061-1/+1
| |
* | Don't forget to ratelimit calls outside of RegistrationHandlerAndrew Morgan2019-11-061-0/+2
|/
* Remove usage of deprecated logger.warn method from codebase (#6271)Andrew Morgan2019-10-312-2/+2
| | | Replace every instance of `logger.warn` with `logger.warning` as the former is deprecated.
* Port replication http server endpoints to async/awaitErik Johnston2019-10-296-44/+26
|
* Trace how long it takes for the send transaction to complete, including retries (#5986)Jorik Schellekens2019-09-051-1/+6
|
* Add opentracing to all client servlets (#5983)Jorik Schellekens2019-09-051-10/+6
|
* Remove bind_email and bind_msisdn (#5964)Andrew Morgan2019-09-041-18/+3
| | | Removes the `bind_email` and `bind_msisdn` parameters from the `/register` C/S API endpoint as per [MSC2140: Terms of Service for ISes and IMs](https://github.com/matrix-org/matrix-doc/pull/2140/files#diff-c03a26de5ac40fb532de19cb7fc2aaf7R107).
* Remove unnecessary parentheses around return statements (#5931)Andrew Morgan2019-08-305-11/+11
| | | | | Python will return a tuple whether there are parentheses around the returned values or not. I'm just sick of my editor complaining about this all over the place :)
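The claim in #5931 can be checked with a small sketch. The function names and return values below are hypothetical and are not taken from the commit; the point is only that the comma, not the parentheses, builds the tuple.

```python
def with_parens():
    # Parenthesised form, as such code read before the cleanup.
    return (200, {"ok": True})


def without_parens():
    # Bare form: the comma alone creates the tuple.
    return 200, {"ok": True}


# Both functions return an equal tuple, so callers that unpack
# `code, body = ...` see no difference.
assert with_parens() == without_parens()
assert isinstance(without_parens(), tuple)
```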
* Opentracing across workers (#5771)Jorik Schellekens2019-08-221-2/+14
| Propagate opentracing contexts across workers
|
| Also includes some convenience modifications to opentracing for servlets, notably:
| - Add boolean to skip the whitelisting check on inject/extract methods.
|   - useful when injecting into carriers locally. Otherwise we'd always have to include our own servername and whitelist our servername
| - start_active_span_from_request instead of header
| - Add boolean to decide whether to extract context from a request to a servlet
* Revert "Add "require_consent" parameter for registration"Brendan Abolivier2019-08-221-2/+0
| | | | This reverts commit 3320aaab3a9bba3f5872371aba7053b41af9d0a0.
* Add "require_consent" parameter for registrationHalf-Shot2019-08-221-0/+2
|
* Merge tag 'v1.2.0rc2' into developAndrew Morgan2019-07-241-1/+1
|\
| | Bugfixes
| | --------
| |
| | - Fix a regression introduced in v1.2.0rc1 which led to incorrect labels on some prometheus metrics. ([\#5734](https://github.com/matrix-org/synapse/issues/5734))
| * Fix servlet metric names (#5734)Jorik Schellekens2019-07-241-1/+1
| | * Fix servlet metric names
| |
| |   Co-Authored-By: Richard van der Hoff <1389908+richvdh@users.noreply.github.com>
| |
| | * Remove redundant check
| | * Cover all return paths
* | Replace returnValue with return (#5736)Amber Brown2019-07-236-13/+13
|/
* Remove access-token support from RegistrationHandler.register (#5641)Richard van der Hoff2019-07-081-6/+0
| | | | | | | | Nothing uses this now, so we can remove the dead code, and clean up the API. Since we're changing the shape of the return value anyway, we take the opportunity to give the method a better name.
* Remove support for invite_3pid_guest. (#5625)Richard van der Hoff2019-07-051-65/+0
| | | | | | | | | This has never been documented, and I'm not sure it's ever been used outside sytest. It's quite a lot of poorly-maintained code, so I'd like to get rid of it. For now I haven't removed the database table; I suggest we leave that for a future clearout.
* Run Black. (#5482)Amber Brown2019-06-206-88/+56
|
* Handle failing to talk to master over replicationErik Johnston2019-06-071-1/+9
|
* Add rate-limiting on registration (#4735)Brendan Abolivier2019-03-051-2/+6
| * Rate-limiting for registration
| * Add unit test for registration rate limiting
| * Add config parameters for rate limiting on auth endpoints
| * Doc
| * Fix doc of rate limiting function
|
|   Co-Authored-By: babolivier <contact@brendanabolivier.com>
|
| * Incorporate review
| * Fix config parsing
| * Fix linting errors
| * Set default config for auth rate limiting
| * Fix tests
| * Add changelog
| * Advance reactor instead of mocked clock
| * Move parameters to registration specific config and give them more sensible default values
| * Remove unused config options
| * Don't mock the rate limiter in MAU tests
| * Rename _register_with_store into register_with_store
| * Make CI happy
| * Remove unused import
| * Update sample config
| * Fix ratelimiting test for py2
| * Add non-guest test
* Fix registration on workers (#4682)Erik Johnston2019-02-203-3/+58
| * Move RegistrationHandler init to HomeServer
| * Move post registration actions to RegistrationHandler
| * Add post registration replication endpoint
| * Newsfile
* Move register_device into handlerErik Johnston2019-02-181-14/+3
|
* Split out registration to workerErik Johnston2019-02-183-1/+179
| | | | | | | | This allows registration to be handled by a worker, though the actual write to the database still happens on master. Note: due to the in-memory session map all registration requests must be handled by the same worker.
* Fix replication for room v3 (#4523)Erik Johnston2019-01-301-1/+4
| * Fix replication for room v3
|
|   We were not correctly quoting the path fragments over http replication, which meant that it exploded when the event IDs had a slash in them
|
| * Newsfile
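The quoting problem described in #4523 can be sketched as follows. This is a minimal illustration, assuming the fix amounts to percent-encoding each path fragment; `build_replication_path`, the `send_event` endpoint name as used here, and the sample event ID are hypothetical and not taken from the Synapse source.

```python
from urllib.parse import quote, unquote


def build_replication_path(name, *path_args):
    """Join path fragments, percent-encoding each one (including "/")."""
    # safe="" makes quote() encode "/" as well, so a fragment can never
    # be mistaken for an extra path segment.
    return "/".join([name] + [quote(arg, safe="") for arg in path_args])


# Hypothetical event ID containing a slash, as room v3 IDs may.
event_id = "$abc/def+ghi"
path = build_replication_path("send_event", event_id)

# The receiving side can recover the original fragment by unquoting
# the last path segment.
recovered = unquote(path.split("/")[-1])
```

Without the `safe=""` argument, `quote` leaves `/` untouched, and the receiver would see one more path segment than the sender intended.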
* Fix receiving events from federation via a workerErik Johnston2019-01-291-1/+1
| | | | This bug was introduced in PR #4470, commit 678a92cb56d547dcadffa723e29b4855a27d0901
* Replace missed usages of FrozenEventErik Johnston2019-01-252-4/+12
|
* Revert "Require event format version to parse or create events"Erik Johnston2019-01-252-12/+4
|
* Replace missed usages of FrozenEventErik Johnston2019-01-242-4/+12
|
* Fix logging bug in EDU handling over replicationErik Johnston2018-08-171-1/+1
|
* Use federation handler function rather than duplicateErik Johnston2018-08-151-41/+3
| | | | This involves renaming _persist_events to be a public function.
* Move clean_room_for_join to masterErik Johnston2018-08-091-0/+35
|
* Fixup doc commentsErik Johnston2018-08-091-0/+17
|
* Merge branch 'develop' of github.com:matrix-org/synapse into ↵Erik Johnston2018-08-093-16/+62
|\ | | | | | | erikj/split_federation
| * Fixup wording and remove dead codeErik Johnston2018-08-091-2/+1
| |
| * Rename POST param to METHODErik Johnston2018-08-082-13/+22
| |
| * Fixup logging and docstringsErik Johnston2018-08-082-2/+40
| |
* | Add EDU/query handling over replicationErik Johnston2018-08-061-1/+1
| |
* | Add replication APIs for persisting federation eventsErik Johnston2018-08-062-1/+247
|/
* Fix isortErik Johnston2018-08-061-4/+1
|
* Merge branch 'develop' of github.com:matrix-org/synapse into ↵Erik Johnston2018-08-031-4/+3
|\ | | | | | | erikj/refactor_repl_servlet
| * Kill off MatrixCodeMessageExceptionRichard van der Hoff2018-08-012-16/+12
| | | | | | | | | | | | | | | | | | | | | | This code brings the SimpleHttpClient into line with the MatrixFederationHttpClient by having it raise HttpResponseExceptions when a request fails (rather than trying to parse for matrix errors and maybe raising MatrixCodeMessageException). Then, whenever we were checking for MatrixCodeMessageException and turning them into SynapseErrors, we now need to check for HttpResponseExceptions and call to_synapse_error.
* | Use new helper base class for membership requestsErik Johnston2018-07-311-171/+91
| |
* | Use new helper base class for ReplicationSendEventRestServletErik Johnston2018-07-311-79/+36
| |
* | Add helper base class for generating new replication endpointsErik Johnston2018-07-311-0/+208
|/ | | | | This will hopefully reduce the boiler plate required to implement new internal HTTP requests.
* Fix missing attributes on workers.Erik Johnston2018-07-231-2/+5
| | | | | This was missed during the transition from attribute to getter for getting state from context.
* run isortAmber Brown2018-07-093-8/+9
|
* Pass around the reactor explicitly (#3385)Amber Brown2018-06-221-3/+3
|
* Refactor ResponseCache usageRichard van der Hoff2018-04-121-12/+6
| Adds a `.wrap` method to ResponseCache which wraps up the boilerplate of a (get, set) pair, and then uses it throughout the codebase.
|
| This will be largely non-functional, but does include the following functional changes:
|
| * federation_server.on_context_state_request: drops use of _server_linearizer, which looked redundant and could cause incorrect cache misses by yielding between the get and the set.
| * RoomListHandler.get_remote_public_room_list(): fixes logcontext leaks
| * the wrap function includes some logging. I'm hoping this won't be too noisy on production.
* Add metrics for ResponseCacheRichard van der Hoff2018-04-101-1/+1
|
* Fix importsErik Johnston2018-03-142-7/+4
|
* s/join/joined/ in notify_user_membership_changeErik Johnston2018-03-141-3/+3
|
* Implement RoomMemberWorkerHandlerErik Johnston2018-03-132-0/+336
|
* extra_users is actually a list of UserIDsErik Johnston2018-03-131-4/+4
|
* Log in the correct placesErik Johnston2018-03-011-2/+4
|
* Don't do preserve_fn for every requestErik Johnston2018-03-011-1/+2
|
* Add some loggingErik Johnston2018-03-011-0/+2
|
* Make repl send_event idempotent and retry on timeoutsErik Johnston2018-03-011-6/+38
| | | | | | If we treated timeouts as failures on the worker we would attempt to clean up e.g. push actions while the master might still process the event.
* Correctly send ratelimit and extra_users paramsErik Johnston2018-03-011-1/+13
|
* Calculate push actions on workerErik Johnston2018-02-281-1/+1
|
* Don't serialize current state over replicationErik Johnston2018-02-151-2/+2
|
* Don't log errors propagated from send_eventErik Johnston2018-02-151-1/+10
|
* Add replication http endpoint for event sendingErik Johnston2018-02-072-0/+139