| Commit message (Collapse) | Author | Age | Files | Lines |
|
|
|
| |
This reverts commit 158d73ebdd61eef33831ae5f6990acf07244fc55.
|
| |
Revert "Sort internal changes in changelog"
Revert "Update CHANGES.md"
Revert "1.49.0rc1"
Revert "Revert "Move `glob_to_regex` and `re_word_boundary` to `matrix-python-common` (#11505) (#11527)"
Revert "Refactors in `_generate_sync_entry_for_rooms` (#11515)"
Revert "Correctly register shutdown handler for presence workers (#11518)"
Revert "Fix `ModuleApi.looping_background_call` for non-async functions (#11524)"
Revert "Fix 'delete room' admin api to work on incomplete rooms (#11523)"
Revert "Correctly ignore invites from ignored users (#11511)"
Revert "Fix the test breakage introduced by #11435 as a result of concurrent PRs (#11522)"
Revert "Stabilise support for MSC2918 refresh tokens as they have now been merged into the Matrix specification. (#11435)"
Revert "Save the OIDC session ID (sid) with the device on login (#11482)"
Revert "Add admin API to get some information about federation status (#11407)"
Revert "Include bundled aggregations in /sync and related fixes (#11478)"
Revert "Move `glob_to_regex` and `re_word_boundary` to `matrix-python-common` (#11505)"
Revert "Update backward extremity docs to make it clear that it does not indicate whether we have fetched an events' `prev_events` (#11469)"
Revert "Support configuring the lifetime of non-refreshable access tokens separately to refreshable access tokens. (#11445)"
Revert "Add type hints to `synapse/tests/rest/admin` (#11501)"
Revert "Revert accidental commits to develop."
Revert "Newsfile"
Revert "Give `tests.server.setup_test_homeserver` (nominally!) the same behaviour"
Revert "Move `tests.utils.setup_test_homeserver` to `tests.server`"
Revert "Convert one of the `setup_test_homeserver`s to `make_test_homeserver_synchronous`"
Revert "Disambiguate queries on `state_key` (#11497)"
Revert "Comments on the /sync tentacles (#11494)"
Revert "Clean up tests.storage.test_appservice (#11492)"
Revert "Clean up `tests.storage.test_main` to remove use of legacy code. (#11493)"
Revert "Clean up `tests.test_visibility` to remove legacy code. (#11495)"
Revert "Minor cleanup on recently ported doc pages (#11466)"
Revert "Add most of the missing type hints to `synapse.federation`. (#11483)"
Revert "Avoid waiting for zombie processes in `synctl stop` (#11490)"
Revert "Fix media repository failing when media store path contains symlinks (#11446)"
Revert "Add type annotations to `tests.storage.test_appservice`. (#11488)"
Revert "`scripts-dev/sign_json`: support for signing events (#11486)"
Revert "Add MSC3030 experimental client and federation API endpoints to get the closest event to a given timestamp (#9445)"
Revert "Port wiki pages to documentation website (#11402)"
Revert "Add a license header and comment. (#11479)"
Revert "Clean-up get_version_string (#11468)"
Revert "Link background update controller docs to summary (#11475)"
Revert "Additional type hints for config module. (#11465)"
Revert "Register the login redirect endpoint for v3. (#11451)"
Revert "Update openid.md"
Revert "Remove mention of OIDC certification from Dex (#11470)"
Revert "Add a note about huge pages to our Postgres doc (#11467)"
Revert "Don't start Synapse master process if `worker_app` is set (#11416)"
Revert "Expose worker & homeserver as entrypoints in `setup.py` (#11449)"
Revert "Bundle relations of relations into the `/relations` result. (#11284)"
Revert "Fix `LruCache` corruption bug with a `size_callback` that can return 0 (#11454)"
Revert "Eliminate a few `Any`s in `LruCache` type hints (#11453)"
Revert "Remove unnecessary `json.dumps` from `tests.rest.admin` (#11461)"
Revert "Merge branch 'master' into develop"
This reverts commit 26b5d2320f62b5eb6262c7614fbdfc364a4dfc02.
This reverts commit bce4220f387bf5448387f0ed7d14ed1e41e40747.
This reverts commit 966b5d0fa0893c3b628c942dfc232e285417f46d.
This reverts commit 088d748f2cb51f03f3bcacc0fb3af1e0f9607737.
This reverts commit 14d593f72d10b4d8cb67e3288bb3131ee30ccf59.
This reverts commit 2a3ec6facf79f6aae011d9fb6f9ed5e43c7b6bec.
This reverts commit eccc49d7554d1fab001e1fefb0fda8ffb254b630.
This reverts commit b1ecd19c5d19815b69e425d80f442bf2877cab76.
This reverts commit 9c55dedc8c4484e6269451a8c3c10b3e314aeb4a.
This reverts commit 2d42e586a8c54be1a83643148358b1651c1ca666.
This reverts commit 2f053f3f82ca174cc1c858c75afffae51af8ce0d.
This reverts commit a15a893df8428395df7cb95b729431575001c38a.
This reverts commit 8b4b153c9e86c04c7db8c74fde4b6a04becbc461.
This reverts commit 494ebd7347ba52d702802fba4c3bb13e7bfbc2cf.
This reverts commit a77c36989785c0d5565ab9a1169f4f88e512ce8a.
This reverts commit 4eb77965cd016181d2111f37d93526e9bb0434f0.
This reverts commit 637df95de63196033a6da4a6e286e1d58ea517b6.
This reverts commit e5f426cd54609e7f05f8241d845e6e36c5f10d9a.
This reverts commit 8cd68b8102eeab1b525712097c1b2e9679c11896.
This reverts commit 6cae125e20865c52d770b24278bb7ab8fde5bc0d.
This reverts commit 7be88fbf48156b36b6daefb228e1258e7d48cae4.
This reverts commit b3fd99b74a3f6f42a9afd1b19ee4c60e38e8e91a.
This reverts commit f7ec6e7d9e0dc360d9fb41f3a1afd7bdba1475c7.
This reverts commit 5640992d176a499204a0756b1677c9b1575b0a49.
This reverts commit d26808dd854006bd26a2366c675428ce0737238c.
This reverts commit f91624a5950e14ba9007eed9bfa1c828676d4745.
This reverts commit 16d39a5490ce74c901c7a8dbb990c6e83c379207.
This reverts commit 8a4c2969874c0b7d72003f2523883eba8a348e83.
This reverts commit 49e1356ee3d5d72929c91f778b3a231726c1413c.
This reverts commit d2279f471ba8f44d9f578e62b286897a338d8aa1.
This reverts commit b50e39df578adc3f86c5efa16bee9035cfdab61b.
This reverts commit 858d80bf0f9f656a03992794874081b806e49222.
This reverts commit 435f04480728c5d982e1a63c1b2777784bf9cd26.
This reverts commit f61462e1be36a51dbf571076afa8e1930cb182f4.
This reverts commit a6f1a3abecf8e8fd3e1bff439a06b853df18f194.
This reverts commit 84dc50e160a2ec6590813374b5a1e58b97f7a18d.
This reverts commit ed635d32853ee0a3e5ec1078679b27e7844a4ac7.
This reverts commit 7b62791e001d6a4f8897ed48b3232d7f8fe6aa48.
This reverts commit 153194c7717d8016b0eb974c81b1baee7dc1917d.
This reverts commit f44d729d4ccae61bc0cdd5774acb3233eb5f7c13.
This reverts commit a265fbd397ae72b2d3ea4c9310591ff1d0f3e05c.
This reverts commit b9fef1a7cdfcc128fa589a32160e6aa7ed8964d7.
This reverts commit b0eb64ff7bf6bde42046e091f8bdea9b7aab5f04.
This reverts commit f1795463bf503a6fca909d77f598f641f9349f56.
This reverts commit 70cbb1a5e311f609b624e3fae1a1712db639c51e.
This reverts commit 42bf0204635213e2c75188b19ee66dc7e7d8a35e.
This reverts commit 379f2650cf875f50c59524147ec0e33cfd5ef60c.
This reverts commit 7ff22d6da41cd5ca80db95c18b409aea38e49fcd.
This reverts commit 5a0b652d36ae4b6d423498c1f2c82c97a49c6f75.
This reverts commit 432a174bc192740ac7a0a755009f6099b8363ad9.
This reverts commit b14f8a1baf6f500997ae4c1d6a6d72094ce14270, reversing
changes made to e713855dca17a7605bae99ea8d71bc7f8657e4b8.
|
|
|
|
| |
Also refactor the stream ID trackers/generators a bit and try to
document them better.
|
| |
|
|
|
|
| |
This makes the typing stream writer config match the other stream writers
that only currently support a single worker.
|
|
|
| |
Co-authored-by: Andrew Morgan <1342360+anoadragon453@users.noreply.github.com>
|
| |
|
|
|
|
|
|
| |
Instead of triggering `__exit__` manually on the replication handler's
logging context, use it as a context manager so that there is an
`__enter__` call to balance the `__exit__`.
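A minimal sketch of why the balanced form is safer. `TrackingContext` is a hypothetical stand-in used only for illustration; it is not Synapse's logging context class.

```python
class TrackingContext:
    """Hypothetical stand-in for a logging context that tracks nesting depth."""

    def __init__(self) -> None:
        self.depth = 0

    def __enter__(self) -> "TrackingContext":
        self.depth += 1
        return self

    def __exit__(self, exc_type, exc, tb) -> None:
        self.depth -= 1


# Calling __exit__ by hand, with no matching __enter__, unbalances the state.
unbalanced = TrackingContext()
unbalanced.__exit__(None, None, None)

# Using the object as a context manager pairs every __exit__ with an __enter__.
balanced = TrackingContext()
with balanced:
    depth_inside = balanced.depth
```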
|
|
|
|
|
|
| |
This removes the magic allowing accessing configurable
variables directly from the config object. It is now required
that a specific configuration class is used (e.g. `config.foo`
must be replaced with `config.server.foo`).
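To illustrate the difference, here is a rough sketch of the two access styles. All class and option names below are hypothetical and only mimic the shape of the change; they are not the real config classes.

```python
class ServerConfig:
    """Hypothetical config section holding one option."""

    def __init__(self) -> None:
        self.server_name = "example.com"


class MagicRootConfig:
    """Old style (illustrative): unknown attributes are looked up on a section."""

    def __init__(self) -> None:
        self.server = ServerConfig()

    def __getattr__(self, name: str):
        # Only called when normal attribute lookup fails.
        section = self.__dict__.get("server")
        if section is not None and hasattr(section, name):
            return getattr(section, name)
        raise AttributeError(name)


class ExplicitRootConfig:
    """New style (illustrative): callers must name the section."""

    def __init__(self) -> None:
        self.server = ServerConfig()


def has_direct_access(config: object) -> bool:
    """True if `config.server_name` works without naming the section."""
    try:
        config.server_name  # type: ignore[attr-defined]
    except AttributeError:
        return False
    return True
```

With the explicit style, `config.server.server_name` is the only spelling that works, which also makes it easier for type checkers to follow.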
|
|
|
|
|
|
|
| |
This follows a correction made in twisted/twisted#1664 and should fix our Twisted Trial CI job.
Until that change is in a twisted release, we'll have to ignore the type
of the `host` argument. I've raised #10899 to remind us to review the
issue in a few months' time.
|
| |
|
| |
|
|
|
|
| |
Instead of proxying through the magic getter of the RootConfig
object. This should be more performant (and is more explicit).
|
|
|
| |
Co-authored-by: Patrick Cloke <clokep@users.noreply.github.com>
|
|
|
| |
Mostly this involves decorating a few Deferred declarations with extra type hints. We wrap the types in quotes to avoid runtime errors when running against older versions of Twisted that don't have generics on Deferred.
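A small sketch of why the quotes help. The `Deferred` class below is a hypothetical stand-in for a Deferred without generic support, so that the example runs without Twisted installed.

```python
class Deferred:
    """Hypothetical stand-in for a Deferred class *without* generic support."""


def fetch() -> "Deferred[int]":
    # The quoted annotation is stored as a plain string and is not evaluated
    # at runtime, so it is harmless even though Deferred is not subscriptable.
    return Deferred()


def unquoted_subscript_fails() -> bool:
    try:
        # An unquoted annotation would evaluate this subscript eagerly.
        Deferred[int]  # type: ignore[misc]
    except TypeError:
        return True
    return False
```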
|
|
|
| |
Implementation of matrix-org/matrix-doc#2285
|
| |
|
|
|
|
|
| |
Reformat all files with the new version.
Signed-off-by: Marcus Hoffmann <bubu@bubu1.eu>
|
|
|
|
|
| |
Hopefully this will help us track down where to-device messages are getting
lost/delayed.
|
| |
|
| |
|
|
|
| |
This is no longer required, since we have dropped support for Python 3.5.
|
|\ |
|
| |
| |
| |
| | |
This undoes part of b076bc276e881b262048307b6a226061d96c4a8d.
|
|\| |
|
| |
| |
| |
| |
| | |
As far as I can tell our logging contexts are meant to log the request ID, or sometimes the request ID followed by a suffix (this is generally stored in the name field of LoggingContext). There's also code to log the name@memory location, but I'm not sure this is ever used.
This simplifies the code paths to require every logging context to have a name and use that in logging. For sub-contexts (created via nested_logging_contexts, defer_to_threadpool, Measure) we use the current context's str (which becomes their name or the string "sentinel") and then potentially modify that (e.g. add a suffix).
|
| | |
|
| |
| |
| | |
Co-authored-by: Richard van der Hoff <1389908+richvdh@users.noreply.github.com>
|
|/
|
|
|
|
|
| |
Part of #9744
Removes all redundant `# -*- coding: utf-8 -*-` lines from files, as Python 3 reads source code as UTF-8 by default.
`Signed-off-by: Jonathan de Jong <jonathan@automatia.nl>`
|
|
|
|
| |
Records additional request information into the structured logs,
e.g. the requester, IP address, etc.
|
| |
|
|
|
|
| |
Includes an abstract base class which both the FederationSender
and the FederationRemoteSendQueue must implement.
|
|
|
|
|
|
|
|
|
| |
Running `dmypy run` will do a `mypy` check while spinning up a daemon
that makes rerunning `dmypy run` a lot faster.
`dmypy` doesn't support `follow_imports = silent` and has
`local_partial_types` enabled, so this PR enables those options and
fixes the issues that were newly raised. Note that `local_partial_types`
will be enabled by default in upcoming mypy releases.
|
| |
|
|
|
|
| |
By splitting this to two separate methods the callers know
what methods they can expect on the handler.
|
| |
|
| |
|
| |
|
|
|
|
|
| |
interfaces. (#9528)
This helps fix some type hints when running with Twisted 21.2.0.
|
| |
|
| |
|
| |
|
|
|
|
|
|
|
| |
- Update black version to the latest
- Run black auto formatting over the codebase
- Run autoformatting according to [`docs/code_style.md`](https://github.com/matrix-org/synapse/blob/80d6dc9783aa80886a133756028984dbf8920168/docs/code_style.md)
- Update `code_style.md` docs around installing black to use the correct version
|
| |
|
| |
|
|
|
|
|
|
|
|
| |
This is done by creating a custom `RedisFactory` subclass that
periodically pings all connections in its pool.
We also ensure that the `replyTimeout` param is non-null, so that we
time out waiting for the reply to those pings (thus triggering a
reconnect).
|
| |
|
| |
|
| |
|
|
|
|
|
| |
I was trying to make it so that we didn't have to start a background task when handling RDATA, but that is a bigger job (due to all the code in `generic_worker`). However I still think not pulling the event from the DB may help reduce some DB usage due to replication, even if most workers will simply go and pull that event from the DB later anyway.
Co-authored-by: Patrick Cloke <clokep@users.noreply.github.com>
|
| |
|
|
|
|
|
|
|
| |
#8567 started a span for every background process. This is good as it means all Synapse code that gets run should be in a span (unless in the sentinel logging context), but it means we generate about 15x the number of spans as we did previously.
This PR attempts to reduce that number by a) not starting one for send commands to Redis, and b) deferring starting background processes until after we're sure they're necessary.
I don't really know how much this will help.
|
|
|
|
|
| |
Currently background processes that stream the events stream use the "minimum persisted position" (i.e. `get_current_token()`) rather than the vector clock style tokens. This is broadly fine as it doesn't matter if the background processes lag a small amount. However, in extreme cases (i.e. SyTests) where we only write to one event persister the background processes will never make progress.
This PR changes it so that the `MultiWriterIDGenerator` keeps the current position of a given instance as up to date as possible (i.e. using the latest token it sees if it's not in the process of persisting anything), and then periodically announces that over replication. This then allows the "minimum persisted position" to advance, albeit with a small lag.
|
|
|
|
|
| |
When pulling events out of the DB to send over replication we were not
filtering by instance name, and so we were sending events for other
instances.
|
| |
|
| |
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The idea is to remove some of the places we pass around `int`, where it can represent one of two things:
1. the position of an event in the stream; or
2. a token that partitions the stream, used as part of the stream tokens.
The valid operations are then:
1. did a position happen before or after a token;
2. get all events that happened before or after a token; and
3. get all events between two tokens.
(Note that we don't want to allow other operations as we want to change the tokens to be vector clocks rather than simple ints)
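The distinction can be sketched as below. `StreamToken`, `has_seen` and `events_between` are hypothetical names chosen for illustration, and the token is still a single integer here.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class StreamToken:
    """Hypothetical token that partitions the stream at `stream` (inclusive)."""

    stream: int

    def has_seen(self, position: int) -> bool:
        # True if the event at `position` happened at or before this token.
        return position <= self.stream


def events_between(
    positions: List[int], low: StreamToken, high: StreamToken
) -> List[int]:
    """Return the positions that are after `low` and at or before `high`."""
    return [p for p in positions if not low.has_seen(p) and high.has_seen(p)]
```

Because callers only use these operations, the integer inside the token could later be swapped for something richer without touching them.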
|
|
|
|
|
|
|
| |
This converts calls like super(Foo, self) -> super().
Generated with:
sed -i "" -Ee 's/super\([^\(]+\)/super()/g' **/*.py
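For reference, a rough Python equivalent of the sed expression above, using the same pattern. The function name is an assumption made for this sketch.

```python
import re

# Same pattern as the sed expression: "super(", anything except "(", then ")".
SUPER_CALL = re.compile(r"super\([^\(]+\)")


def modernise_super(source: str) -> str:
    """Rewrite `super(Foo, self)` style calls to the zero-argument form."""
    return SUPER_CALL.sub("super()", source)
```

Note that the pattern is greedy, so a line with a further unrelated `)` before the next `(` could be over-matched; the output of such a bulk rewrite is worth reviewing.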
|
|
|
|
|
| |
slots use less memory (and attribute access is faster) while slightly
limiting the flexibility of the class attributes. This focuses on objects
which are instantiated "often" and for short periods of time.
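A small illustration of the trade-off, using two hypothetical classes:

```python
class PointWithDict:
    def __init__(self, x: int, y: int) -> None:
        self.x = x
        self.y = y


class PointWithSlots:
    # Declaring __slots__ removes the per-instance __dict__.
    __slots__ = ("x", "y")

    def __init__(self, x: int, y: int) -> None:
        self.x = x
        self.y = y


def can_add_attribute(obj: object) -> bool:
    """True if an attribute not declared up front can be set on `obj`."""
    try:
        obj.z = 0  # type: ignore[attr-defined]
    except AttributeError:
        return False
    return True
```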
|
| |
|
|
|
|
|
|
| |
This is *not* ready for production yet. Caveats:
1. We should write some tests...
2. The stream token that we use for events can get stalled at the minimum position of all writers. This means that new events may not be processed and e.g. sent down sync streams if a writer isn't writing or is slow.
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The idea here is that we pass the `max_stream_id` to everything, and only use the stream ID of the particular event to figure out *when* the max stream position has caught up to the event and we can notify people about it.
This is to maintain the distinction between the position of an item in the stream (i.e. event A has stream ID 513) and a token that can be used to partition the stream (i.e. give me all events after stream ID 352). This distinction becomes important when the tokens are more complicated than a single number, which they will be once we start tracking the position of multiple writers in the tokens.
The valid operations here are:
1. Is a position before or after a token
2. Fetching all events between two tokens
3. Merging multiple tokens to get the "max", i.e. `C = max(A, B)` means that for all positions P where P is before A *or* before B, then P is before C.
A future PR will change the token type to a dedicated type.
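As a sketch of what the "max" operation could look like once tokens track multiple writers. The per-writer dictionary below is an assumption made for illustration; it is not the token format used by this change.

```python
from typing import Dict

# Hypothetical token: maps writer name -> highest position seen from that writer.
Token = Dict[str, int]


def is_before(writer: str, position: int, token: Token) -> bool:
    """True if `position`, written by `writer`, is at or before `token`."""
    return position <= token.get(writer, 0)


def merge_max(a: Token, b: Token) -> Token:
    """C = max(A, B): anything before A or before B is also before C."""
    return {w: max(a.get(w, 0), b.get(w, 0)) for w in a.keys() | b.keys()}
```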
|
|
|
|
|
| |
`pusher_pool.on_new_notifications` expected a min and max stream ID, however that was not what we were passing in. Instead, let's just pass it the current max stream ID and have it track the last stream ID it got passed.
I believe that it mostly worked as we called the function for every event. However, it would break for events that got persisted out of order, i.e. events that were persisted while the max stream ID wasn't incremented because not all preceding events had finished persisting. Push for such an event would be delayed until another event got pushed to the affected users.
|
|
|
|
| |
This reverts commit e7fd336a53a4ca489cdafc389b494d5477019dc0.
|
| |
|
| |
|
|
|
|
|
|
|
| |
* Revert "Add experimental support for sharding event persister. (#8170)"
This reverts commit 82c1ee1c22a87b9e6e3179947014b0f11c0a1ac3.
* Changelog
|
|
|
|
|
|
| |
This is *not* ready for production yet. Caveats:
1. We should write some tests...
2. The stream token that we use for events can get stalled at the minimum position of all writers. This means that new events may not be processed and e.g. sent down sync streams if a writer isn't writing or is slow.
|
|
|
|
|
|
| |
This fixes a bug where having multiple callers waiting on the same
stream and position will cause it to try and compare two deferreds,
which fails (due to the sorted list having an entry of `Tuple[int,
Deferred]`).
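The failure mode can be reproduced without Twisted. `Waiter` below is a hypothetical stand-in for a Deferred, and sorting on the position alone is one possible way to avoid the comparison, not necessarily the approach taken in this fix.

```python
from typing import Any, List, Tuple


class Waiter:
    """Hypothetical stand-in for a Deferred: defines no ordering."""


def insert_naively(waiters: List[Tuple[int, Any]], entry: Tuple[int, Any]) -> None:
    # Sorting whole tuples falls back to comparing the second element on ties.
    waiters.append(entry)
    waiters.sort()


def insert_by_position(waiters: List[Tuple[int, Any]], entry: Tuple[int, Any]) -> None:
    # Sorting on the position alone never compares the waiters themselves.
    waiters.append(entry)
    waiters.sort(key=lambda item: item[0])
```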
|
|
|
|
|
| |
It's just a thin wrapper around two ID gens to make `get_current_token`
and `get_next` return tuples. This can easily be replaced by calling the
appropriate methods on the underlying ID gens directly.
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
| |
The function is used for two purposes: 1) for subscribers of streams to
get a token they can use to get further updates with, and 2) for
replication to track position of the writers of the stream.
For streams with a single writer the two scenarios produce the same
result, however the situation becomes complicated for streams with
multiple writers. The current `MultiWriterIdGenerator` does not
correctly handle the first case (which is not an issue as it's only used
for the `caches` stream which nothing subscribes to outside of
replication).
|
| |
|
|
|
| |
Most of the stuff we do for replication commands can be done synchronously. There's no point spinning up background processes if we're not going to need them.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Handling of incoming typing stream updates from replication was not
hooked up on master, affecting setups where typing was handled on a
different worker.
This is really only a problem if the master process is also handling
sync requests, which is unlikely for those that are at the stage of
moving typing off.
The other observable effect is that if a worker restarts or a
replication connection drops then the typing worker will issue a
`POSITION typing`, triggering the master process to try and stream *all*
typing updates from position 0.
Fixes #7907
|
| |
|
|
|
|
| |
I'm going to be doing more stuff synchronously, and I don't want to lose the
CPU metrics down the sofa.
|
| |
|
|
|
|
|
|
|
|
|
|
|
| |
When we get behind on replication, we tend to stack up background processes
behind a linearizer. Bg processes are heavy (particularly with respect to
prometheus metrics) and linearizers aren't terribly efficient once the queue
gets long either.
A better approach is to maintain a queue of requests to be processed, and
nominate a single process to work its way through the queue.
Fixes: #7444
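The queue-plus-single-worker idea, reduced to a synchronous sketch. `SingleWorkerQueue` is a hypothetical class for illustration; the real code deals with asynchronous handlers and background processes.

```python
from collections import deque
from typing import Callable, Deque


class SingleWorkerQueue:
    """Hypothetical queue where at most one drain loop runs at a time."""

    def __init__(self, handler: Callable[[str], None]) -> None:
        self._handler = handler
        self._pending: Deque[str] = deque()
        self._draining = False
        self.drain_loops_started = 0

    def submit(self, item: str) -> None:
        self._pending.append(item)
        if self._draining:
            # A drain loop is already running and will pick this item up.
            return
        self._draining = True
        self.drain_loops_started += 1
        try:
            while self._pending:
                self._handler(self._pending.popleft())
        finally:
            self._draining = False
```

Items submitted while the loop is running are simply appended, so no extra worker is started however long the queue gets.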
|
| |
|
| |
|
| |
|
| |
|
| |
|
|
|
| |
The CI appears to use the latest version of isort, which is a problem when isort gets a major version bump. Rather than try to pin the version, I've done the necessary to make isort5 happy with synapse.
|
|
|
| |
The aim here is to make it easier to reason about when streams are limited and when they're not, by moving the logic into the database functions themselves. This should mean we can kill off the `db_query_to_update_function` function.
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
* Ensure account data stream IDs are unique.
The account data stream is shared between three tables, and the maximum
allocated ID was tracked in a dedicated table. Updating the max ID
happened outside the transaction that allocated the ID, leading to a
race where if the server was restarted then the same ID could be
allocated but the max ID failed to be updated, leading it to be reused.
The ID generators have support for tracking across multiple tables, so
we may as well use that instead of a dedicated table.
* Fix bug in account data replication stream.
If the same stream ID was used in both global and room account data then
getting updates for the replication stream would fail due to
`heapq.merge(..)` trying to compare a `str` with a `None`. (This is
because you'd have two rows like `(534, '!room')` and `(534, None)` from
the room and global account data tables).
Fix is just to order by stream ID, since we don't rely on the ordering
beyond that. The bug where stream IDs can be reused should be fixed now,
so this case shouldn't happen going forward.
Fixes #7617
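The `heapq.merge(..)` failure can be reproduced with the two example rows from the message. Passing `key=` here is just one way to show ordering by stream ID alone; it is an assumption for this sketch, not necessarily how the fix was written.

```python
import heapq
from typing import List, Optional, Tuple

Row = Tuple[int, Optional[str]]

room_rows: List[Row] = [(534, "!room")]
global_rows: List[Row] = [(534, None)]


def merge_naively() -> List[Row]:
    # Equal stream IDs force a comparison of "!room" with None, which raises.
    return list(heapq.merge(room_rows, global_rows))


def merge_by_stream_id() -> List[Row]:
    # Ordering on the stream ID alone never compares the second column.
    return list(heapq.merge(room_rows, global_rows, key=lambda row: row[0]))
```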
|
| |
|
|
|
| |
Fixes #7566.
|
| |
|
|
|
|
|
|
|
| |
The idea here is that if an instance persists an event via the replication HTTP API it can return before we receive that event over replication, which can lead to races where code assumes that persisting an event immediately updates various caches (e.g. current state of the room).
Most of Synapse doesn't hit such races, so we don't do the waiting automagically, instead we do so where necessary to avoid unnecessary delays. We may decide to change our minds here if it turns out there are a lot of subtle races going on.
People probably want to look at this commit by commit.
|
|
|
|
|
|
| |
Make sure that the AccountDataStream presents complete updates, in the right
order.
This is much the same fix as #7337 and #7358, but applied to a different stream.
|
|
|
| |
This is so that the logic can happen on both master and workers when we move event persistence out.
|
|
|
|
|
| |
Before, all streams were only written to from master, so only master needed to respond to `REPLICATE` commands.
Before, all instances wrote to the cache invalidation stream, but didn't respond to `REPLICATE`. This was a bug, which could lead to missed rows from the cache invalidation stream if an instance is restarted; however, all the caches would be empty in that case so it wasn't a problem.
|
|
|
| |
Proactively send out `POSITION` commands (as if we had just received a `REPLICATE`) when we connect to Redis. This is important as other instances won't notice we've connected to issue a `REPLICATE` command (unlike for direct TCP connections). This is only currently an issue if the master process reconnects without restarting (if it restarts then it won't have written anything and so other instances probably won't have missed anything).
|
|\
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
* release-v1.13.0:
Don't UPGRADE database rows
RST indenting
Put rollback instructions in upgrade notes
Fix changelog typo
Oh yeah, RST
Absolute URL it is then
Fix upgrade notes link
Provide summary of upgrade issues in changelog. Fix )
Move next version notes from changelog to upgrade notes
Changelog fixes
1.13.0rc1
Documentation on setting up redis (#7446)
Rework UI Auth session validation for registration (#7455)
Fix errors from malformed log line (#7454)
Drop support for redis.dbid (#7450)
|
| | |
|
| |
| |
| | |
Since we only use pubsub, the dbid is irrelevant.
|
| | |
|
|\| |
|
| |\ |
|
| |\ \ |
|
| | | |
| | | |
| | | |
| | | | |
... otherwise we can believe we're up to date when we're not.
|
| | | | |
|
|\ \ \ \
| | |_|/
| |/| | |
|
| | |/
| |/| |
|
|/ /
| |
| |
| | |
looks like we managed to break this during the refactorathon.
|
| |
| |
| |
| |
| | |
We forgot to set the password on the subscriber connection, as well as
not calling super methods for overridden connectionMade/connectionLost
functions.
|
| |
| |
| | |
For in-memory streams, when fetching updates on workers we need to query the source of the stream, which currently is hard coded to be master. This PR threads the source instance we received via `POSITION` through to the update function in each stream, which can then be passed to the replication client for in-memory streams.
|
|/
|
|
| |
We move the processing of typing and federation replication traffic into their handlers so that `Stream.current_token()` points to a valid token. This allows us to remove `get_streams_to_replicate()` and `stream_positions()`.
|
|
|
| |
Hopefully this is no worse than what we have on master...
|
|
|
|
|
| |
This is primarily for allowing us to send those commands from workers, but for now simply allows us to ignore echoed RDATA/POSITION commands that we sent (we get echoes of sent commands when using redis). Currently we log a WARNING on the master process every time we receive an echoed RDATA.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
For direct TCP connections we need the master to relay REMOTE_SERVER_UP
commands to the other connections so that all instances get notified
about it. The old implementation just relayed to all connections,
assuming that sending back to the original sender of the command was
safe. This is not true for redis, where commands sent get echoed back to
the sender, which was causing master to effectively infinite loop
sending and then re-receiving REMOTE_SERVER_UP commands that it sent.
The fix is to ensure that we only relay to *other* connections and not
to the connection we received the notification from.
Fixes #7334.
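In sketch form, with connections modelled as hypothetical per-connection outboxes (plain lists keyed by a connection ID):

```python
from typing import Dict, List


def relay_to_others(
    connections: Dict[str, List[str]], sender: str, command: str
) -> None:
    """Append `command` to every connection's outbox except the sender's."""
    for conn_id, outbox in connections.items():
        if conn_id != sender:
            outbox.append(command)
```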
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
* Factor out functions for injecting events into database
I want to add some more flexibility to the tools for injecting events into the
database, and I don't want to clutter up HomeserverTestCase with them, so let's
factor them out to a new file.
* Rework TestReplicationDataHandler
This wasn't very easy to work with: the mock wrapping was largely superfluous,
and it's useful to be able to inspect the received rows, and clear out the
received list.
* Fix AssertionErrors being thrown by EventsStream
Part of the problem was that there was an off-by-one error in the assertion,
but also the limit logic was too simple. Fix it all up and add some tests.
|
|
|
| |
Currently we never write to streams from workers, but that will change soon
|
|
|
|
|
|
|
|
|
|
| |
Figuring out how to correctly limit updates from this stream without dropping
entries is far more complicated than just counting the number of rows being
returned. We need to consider each query separately and, if any one query hits
the limit, truncate the results from the others.
I think this also fixes some potentially long-standing bugs where events or
state changes could get missed if we hit the limit on either query.
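A simplified sketch of the truncation rule described above. The function, its row shape and its return value are assumptions made for illustration; it also assumes `limit` is positive and that stream IDs are unique within each query, which keeps the boundary handling simple.

```python
from typing import List, Tuple

Row = Tuple[int, str]  # (stream_id, payload); each query is sorted by stream_id


def limited_updates(
    queries: List[List[Row]], limit: int
) -> Tuple[List[Row], int, bool]:
    """Combine per-query results, truncating them all at a common upper bound.

    Returns (rows, upper_bound, limited).
    """
    batches = [rows[:limit] for rows in queries]
    # A query that returned `limit` rows may have more rows beyond its last one.
    bounds = [batch[-1][0] for batch in batches if len(batch) == limit]
    if not bounds:
        combined = sorted((r for b in batches for r in b), key=lambda r: r[0])
        upper = combined[-1][0] if combined else 0
        return combined, upper, False
    # Nothing past the smallest such bound is safe to return from any query.
    upper = min(bounds)
    combined = sorted(
        (r for b in batches for r in b if r[0] <= upper), key=lambda r: r[0]
    )
    return combined, upper, True
```

Merely counting the combined rows would let a row from one query through while earlier rows from another query were still missing.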
|
| |
|
|
|
|
|
| |
there doesn't seem to be much point in passing this limit all around, since
both sides agree it's meant to be 100.
|
|
|
|
|
|
|
| |
Long story short: if we're handling presence on the current worker, we shouldn't be sending USER_SYNC commands over replication.
In an attempt to figure out what is going on here, I ended up refactoring some bits of the presencehandler code, so the first 4 commits here are non-functional refactors to move this code slightly closer to sanity. (There's still plenty to do here :/). Suggest reviewing individual commits.
Fixes (I hope) #7257.
|
| |
|
|
|
| |
I messed this up last time I tried (#7239 / e13c6c7).
|
|
|
| |
This is configured via the `redis` config options.
|
|
|
|
|
|
| |
Other parts of the code (such as the StreamChangeCache) assume that there will
not be multiple changes with the same stream id.
This code was introduced in #7024, and I hope this fixes #7206.
|
|
|
|
|
|
|
| |
The general idea here is to get rid of the type: ignore annotations on all of the current_token and update_function assignments, which would have caught #7290.
After a bit of experimentation, it seems like the least-awful way to do this is to pass the offending functions in as parameters to the Stream constructor. Unfortunately that means that the concrete implementations no longer have the same constructor signature as Stream itself, which means that it gets hard to correctly annotate STREAMS_MAP.
I've also introduced a couple of new types, to take out some duplication.
|
|
|
|
|
|
| |
Some of the query functions return generators rather than lists, so we can't
index into the result. Happily we already have a copy of the results.
(think this was introduced in #7024)
|
|
|
|
|
| |
`REPLICATE` is now a valid command, and it's nice if you can issue it from the
console without remembering to call it `REPLICATE ` with a trailing space.
|
|
|
|
|
|
| |
Separate `SimpleCommand` from `Command`, so that things which don't want to use
the `data` property don't have to, and thus fix the warnings PyCharm was giving
me about not calling `__init__` in the base class.
|
|
|
|
| |
We've ripped pretty much all of this out: let's remove the remains.
|
|
|
|
| |
Fixes a race between handling `POSITION` and `RDATA` commands. We do this by simply linearizing handling of them.
|
|
|
| |
This completes the merging of server and client command processing.
|
|
|
| |
The aim here is to move the command handling out of the TCP protocol classes and to also merge the client and server command handling (so that we can reuse them for redis protocol). This PR simply moves the client paths to the new `ReplicationCommandHandler`, a future PR will move the server paths too.
|
|
|
|
|
| |
This broke in a recent PR (#7024) and is no longer useful due to all
replication clients implicitly subscribing to all streams, so let's
just remove it.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
* Remove `conn_id` usage for UserSyncCommand.
Each tcp replication connection is assigned a "conn_id", which is used
to give an ID to a remotely connected worker. In a redis world, there
will no longer be a one to one mapping between connection and instance,
so instead we need to replace such usages with an ID generated by the
remote instances and included in the replication commands.
This really only affects UserSyncCommand.
* Add CLEAR_USER_SYNCS command that is sent on shutdown.
This should help with the case where a synchrotron gets restarted
gracefully, rather than relying on the 5 minute timeout.
|
|
|
| |
This changes the replication protocol so that the server does not send down `RDATA` for rows that happened before the client connected. Instead, the server will send a `POSITION` and clients then query the database (or master out of band) to get up to date.
|
|
|
|
|
| |
This just helps keep the rows closer to their streams, so that it's easier to
see what the format of each stream is.
|
|
|
|
|
|
| |
`groups` != `receipts`
Introduced in #6964
|
| |
|
|
|
|
|
| |
Instead of sending down batches of user ID/host tuples, send down a row
per entity (user ID or host).
|
| |
|
| |
|
|
|
|
| |
Co-authored-by: Andrew Morgan <1342360+anoadragon453@users.noreply.github.com>
|
|
|
| |
Currently if a worker invalidates a cache, the invalidation is streamed to master, which then doesn't forward it to other workers.
|
| |
|
|
|
|
|
| |
This will be used to retry outbound transactions to a remote server if
we think it might have come back up.
|
|
|
|
|
|
|
|
|
|
| |
* Port synapse.replication.tcp to async/await
* Newsfile
* Correctly document type of on_<FOO> functions as async
* Don't be overenthusiastic with the asyncing....
|
| |
|
| |
|
| |
|
| |
|
|
|
|
| |
since I found myself wondering how it works
|
|\ |
|
| |
| |
| | |
Replace every instance of `logger.warn` with `logger.warning` as the former is deprecated.
|
|/ |
|
|
|
|
|
| |
Python will return a tuple whether there are parentheses around the returned values or not.
I'm just sick of my editor complaining about this all over the place :)
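A two-function check of the claim:

```python
def with_parens():
    return (1, 2)


def without_parens():
    return 1, 2
```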
|
| |
|
| |
|
| |
|
| |
|
| |
|
|
|
|
| |
... as a precursor to combining it with the CurrentStateDelta stream.
|
|
|
|
|
| |
We're about to turn it straight into a JSON object anyway so building a
ROW_TYPE is a bit pointless, and reduces flexibility in the update_function.
|
|
|
|
| |
This will allow individual stream classes to override how a row is parsed.
|
| |
|
| |
|
| |
|
| |
|
|
|
|
|
|
|
|
| |
`__str__` depended on `self.addr`, which was absent from
ClientReplicationStreamProtocol, so attempting to call str on such an object
would raise an exception.
We can calculate the peer addr from the transport, so there is no need for addr
anyway.
|
|
|
|
|
| |
Make sure that they are sent correctly over the replication stream.
Fixes: #4898
|
|
|
| |
Setting this to 50 or so makes a bunch of sytests fail in worker mode.
|
| |
|
| |
|
| |
|
| |
|
|\
| |
| | |
Fix tightloop over connecting to replication server
|
| | |
|
| |
| |
| |
| |
| | |
Otherwise if you have many workers they can easily take out master with
their connection attempts
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
If the client failed to process incoming commands during the initial set
up of the replication connection it would immediately disconnect and
reconnect, resulting in a tightloop.
This can happen, for example, when subscribing to a stream that has a
row that is too long in the backlog.
The fix here is to not consider the connection successfully set up until
the client has successfully subscribed and caught up with the streams.
This ensures that the retry logic timers aren't reset until then,
meaning that if an error does happen during start up the client will
continue backing off before retrying again.
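
The idea can be sketched roughly as follows. This is a standalone illustration, not the code from this change: the class name, method names, and delay values are all assumptions made for the example.

```python
class ReconnectBackoff:
    """Hypothetical helper that tracks the delay before the next reconnect."""

    def __init__(self, initial=1.0, factor=2.0, maximum=30.0):
        self.initial = initial
        self.factor = factor
        self.maximum = maximum
        self.delay = initial

    def on_connection_lost(self):
        """Return the delay to wait now, and grow it for the next attempt."""
        wait = self.delay
        self.delay = min(self.delay * self.factor, self.maximum)
        return wait

    def on_caught_up(self):
        """Reset the delay, but only once the client has subscribed and caught up."""
        self.delay = self.initial


backoff = ReconnectBackoff()

# Three connections that fail during set-up: the delay keeps growing
# because on_caught_up() is never reached.
waits = [backoff.on_connection_lost() for _ in range(3)]

# A connection that does catch up resets the delay for later failures.
backoff.on_caught_up()
after_reset = backoff.on_connection_lost()
```

The point of the design is where the reset happens: tying it to "caught up" rather than "TCP connection opened" means a failure during the initial backlog does not reset the delay to its starting value.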
|
|/ |
|
| |
|
| |
|
| |
|
| |
|
|
|
|
|
| |
`conn_id` gets set to a random string, and so we end up filling up
prometheus with tonnes of data series, which is bad.
|
|
|
|
|
|
|
|
|
|
| |
Run the handlers for replication commands as background processes. This should
improve the visibility in our metrics, and reduce the number of "running db
transaction from sentinel context" warnings.
Ideally it means converting the things that fire off deferreds into the night
into things that actually return a Deferred when they are done. I've made a bit
of a stab at this, but it will probably be leaky.
|
|
|
|
|
|
| |
on_notifier_poke no longer runs synchronously, so we have to do a different hack
to make sure that the replication data has been sent. Let's actually listen for
its arrival.
|
|
|
|
|
| |
This will reduce the number of "Starting db connection from sentinel context"
warnings, and will help with our metrics.
|
| |
|
| |
|
| |
|
| |
|
|
|
|
| |
fix bug introduced in #3256
|
|\ |
|
| |\
| | |
| | | |
replace some iteritems with six
|
| | |
| | |
| | |
| | | |
Signed-off-by: Adrian Tschira <nota@notafile.com>
|
| | | |
|
| | | |
|
| | | |
|
|\| | |
|
| |/
| |
| |
| |
| | |
When a user first syncs, we will send them a server notice asking them to
consent to the privacy policy if they have not already done so.
|
| | |
|
|/ |
|
|
|
|
| |
Signed-off-by: Adrian Tschira <nota@notafile.com>
|
|
|
|
| |
json encoders have an encode method, not a dumps method.
|
|
|
|
|
| |
using json.dumps with custom options requires us to create a new JSONEncoder on
each call. It's more efficient to create one upfront and reuse it.
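
A minimal sketch of the pattern with the standard `json` module. The particular options and the `encode_row` helper are assumptions for illustration, not the ones used in this change:

```python
import json

# Build the encoder once at import time (hypothetical options).
_encoder = json.JSONEncoder(separators=(",", ":"), sort_keys=True)


def encode_row(row):
    # Note that encoders expose encode(), not dumps().
    return _encoder.encode(row)


encoded = encode_row({"b": 1, "a": [1, 2]})
```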
|
| |
|
|
|
|
|
| |
Turns out that simplejson serialises namedtuples as dictionaries rather
than tuples by default.
|
|\ |
|
| | |
|
|/
|
|
| |
I found myself wishing we had this.
|
|
|
|
|
| |
The @measure_func annotations rely on the wrapped function respecting the
logcontext rules. Add the necessary yields to make this work.
|
|
|
|
| |
what could possibly go wrong
|
| |
|
| |
|
| |
|
| |
|
| |
|
| |
|
| |
|
| |
|
|\
| |
| | |
Fix up federation SendQueue and document types
|
| | |
|
|/ |
|
|\
| |
| | |
Don't double encode replication data
|
| | |
|
| | |
|
| | |
|
| | |
|
| | |
|
|/ |
|
|\
| |
| | |
Move to using TCP replication
|
| | |
|
|\ \
| | |
| | | |
Advance replication streams even if nothing is listening
|
| |/
| |
| |
| |
| |
| | |
Otherwise the streams don't advance and steadily fall behind, so when a
worker does connect either a) they'll be streamed lots of old updates or
b) the connection will fail as the streams are too far behind.
|
|/ |
|
| |
|
| |
|
|
|
|
| |
This timestamp is used to indicate when the user last sync'd
|
| |
|
| |
|
|
|
|
| |
This defines the low level TCP replication protocol
|
|
|