| Commit message (Collapse) | Author | Age | Files | Lines |
|
|
|
|
|
|
|
| |
This is done by creating a custom `RedisFactory` subclass that
periodically pings all connections in its pool.
We also ensure that the `replyTimeout` param is non-null, so that we
timeout waiting for the reply to those pings (and thus triggering a
reconnect).
|
| |
|
| |
|
| |
|
| |
|
|\ |
|
| | |
|
| |
| |
| | |
This improves type hinting and should use less memory.
|
| | |
|
|/
|
|
| |
Authentication is done by checking a shared secret provided
in the Synapse configuration file.
|
|
|
|
|
| |
This PR grew out of #6739, and adds typing to some method arguments
You'll notice that there are a lot of `# type: ignores` in here. This is due to the base methods not matching the overloads here. This is necessary to stop mypy complaining, but a better solution is #8828.
|
|
|
|
|
|
|
|
|
| |
There's a handy function called maybe_store_room_on_invite which allows us to create an entry in the rooms table for a room and its version for which we aren't joined to yet, but we can reference when ingesting events about.
This is currently used for invites where we receive some stripped state about the room and pass it down via /sync to the client, without us being in the room yet.
There is a similar requirement for knocking, where we will eventually do the same thing, and need an entry in the rooms table as well. Thus, reusing this function works, however its name needs to be generalised a bit.
Separated out from #6739.
|
|
|
|
|
|
|
|
|
|
| |
another user. (#8616)
We do it this way round so that only the "owner" can delete the access token (i.e. `/logout/all` by the "owner" also deletes that token, but `/logout/all` by the "target user" doesn't).
A future PR will add an API for creating such a token.
When the target user and authenticated entity are different the `Processed request` log line will be logged with a: `{@admin:server as @bob:server} ...`. I'm not convinced by that format (especially since it adds spaces in there, making it harder to use `cut -d ' '` to chop off the start of log lines). Suggestions welcome.
|
|
|
|
|
| |
I was trying to make it so that we didn't have to start a background task when handling RDATA, but that is a bigger job (due to all the code in `generic_worker`). However I still think not pulling the event from the DB may help reduce some DB usage due to replication, even if most workers will simply go and pull that event from the DB later anyway.
Co-authored-by: Patrick Cloke <clokep@users.noreply.github.com>
|
| |
|
|
|
|
|
|
|
| |
#8567 started a span for every background process. This is good as it means all Synapse code that gets run should be in a span (unless in the sentinel logging context), but it means we generate about 15x the number of spans as we did previously.
This PR attempts to reduce that number by a) not starting one for send commands to Redis, and b) deferring starting background processes until after we're sure they're necessary.
I don't really know how much this will help.
|
|
|
| |
Most of these uses don't need a full-blown DeferredCache; LruCache is lighter and more appropriate.
|
| |
|
| |
|
| |
|
|
|
|
|
| |
(#8476)
Should fix #3365.
|
|
|
|
|
| |
Currently background proccesses stream the events stream use the "minimum persisted position" (i.e. `get_current_token()`) rather than the vector clock style tokens. This is broadly fine as it doesn't matter if the background processes lag a small amount. However, in extreme cases (i.e. SyTests) where we only write to one event persister the background processes will never make progress.
This PR changes it so that the `MultiWriterIDGenerator` keeps the current position of a given instance as up to date as possible (i.e using the latest token it sees if its not in the process of persisting anything), and then periodically announces that over replication. This then allows the "minimum persisted position" to advance, albeit with a small lag.
|
| |
|
|
|
|
|
| |
When pulling events out of the DB to send over replication we were not
filtering by instance name, and so we were sending events for other
instances.
|
|
|
| |
All handlers now available via get_*_handler() methods on the HomeServer.
|
| |
|
| |
|
| |
|
|
|
| |
One hope is that this might provide some insights into #3365.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
On startup `MultiWriteIdGenerator` fetches the maximum stream ID for
each instance from the table and uses that as its initial "current
position" for each writer. This is problematic as a) it involves either
a scan of events table or an index (neither of which is ideal), and b)
if rows are being persisted out of order elsewhere while the process
restarts then using the maximum stream ID is not correct. This could
theoretically lead to race conditions where e.g. events that are
persisted out of order are not sent down sync streams.
We fix this by creating a new table that tracks the current positions of
each writer to the stream, and update it each time we finish persisting
a new entry. This is a relatively small overhead when persisting events.
However for the cache invalidation stream this is a much bigger relative
overhead, so instead we note that for invalidation we don't actually
care about reliability over restarts (as there's no caches to
invalidate) and simply don't bother reading and writing to the new table
in that particular case.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The idea is to remove some of the places we pass around `int`, where it can represent one of two things:
1. the position of an event in the stream; or
2. a token that partitions the stream, used as part of the stream tokens.
The valid operations are then:
1. did a position happen before or after a token;
2. get all events that happened before or after a token; and
3. get all events between two tokens.
(Note that we don't want to allow other operations as we want to change the tokens to be vector clocks rather than simple ints)
|
|
|
|
|
|
|
| |
This converts calls like super(Foo, self) -> super().
Generated with:
sed -i "" -Ee 's/super\([^\(]+\)/super()/g' **/*.py
|
| |
|
|
|
|
|
| |
slots use less memory (and attribute access is faster) while slightly
limiting the flexibility of the class attributes. This focuses on objects
which are instantiated "often" and for short periods of time.
|
| |
|
|
|
|
|
|
| |
This is *not* ready for production yet. Caveats:
1. We should write some tests...
2. The stream token that we use for events can get stalled at the minimum position of all writers. This means that new events may not be processed and e.g. sent down sync streams if a writer isn't writing or is slow.
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The idea here is that we pass the `max_stream_id` to everything, and only use the stream ID of the particular event to figure out *when* the max stream position has caught up to the event and we can notify people about it.
This is to maintain the distinction between the position of an item in the stream (i.e. event A has stream ID 513) and a token that can be used to partition the stream (i.e. give me all events after stream ID 352). This distinction becomes important when the tokens are more complicated than a single number, which they will be once we start tracking the position of multiple writers in the tokens.
The valid operations here are:
1. Is a position before or after a token
2. Fetching all events between two tokens
3. Merging multiple tokens to get the "max", i.e. `C = max(A, B)` means that for all positions P where P is before A *or* before B, then P is before C.
Future PR will change the token type to a dedicated type.
|
|
|
|
|
| |
Removes the `user_joined_room` and stops calling it since there are no observers.
Also cleans-up some other unused signals and related code.
|
|
|
|
|
| |
`pusher_pool.on_new_notifications` expected a min and max stream ID, however that was not what we were passing in. Instead, let's just pass it the current max stream ID and have it track the last stream ID it got passed.
I believe that it mostly worked as we called the function for every event. However, it would break for events that got persisted out of order, i.e, that were persisted but the max stream ID wasn't incremented as not all preceding events had finished persisting, and push for that event would be delayed until another event got pushed to the effected users.
|
|
|
|
| |
This reverts commit e7fd336a53a4ca489cdafc389b494d5477019dc0.
|
| |
|
| |
|
|
|
|
|
|
|
| |
* Revert "Add experimental support for sharding event persister. (#8170)"
This reverts commit 82c1ee1c22a87b9e6e3179947014b0f11c0a1ac3.
* Changelog
|
|
|
|
|
|
| |
This is *not* ready for production yet. Caveats:
1. We should write some tests...
2. The stream token that we use for events can get stalled at the minimum position of all writers. This means that new events may not be processed and e.g. sent down sync streams if a writer isn't writing or is slow.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
* Move `get_devices_with_keys_by_user` to `EndToEndKeyWorkerStore`
this seems a better fit for it.
This commit simply moves the existing code: no other changes at all.
* Rename `get_devices_with_keys_by_user`
to better reflect what it does.
* get_device_stream_token abstract method
To avoid referencing fields which are declared in the derived classes, make
`get_device_stream_token` abstract, and define that in the classes which define
`_device_list_id_gen`.
|
|
|
|
|
|
| |
This fixes a bug where having multiple callers waiting on the same
stream and position will cause it to try and compare two deferreds,
which fails (due to the sorted list having an entry of `Tuple[int,
Deferred]`).
|
|
|
|
| |
(#8171)
|
|
|
|
|
| |
It's just a thin wrapper around two ID gens to make `get_current_token`
and `get_next` return tuples. This can easily be replaced by calling the
appropriate methods on the underlying ID gens directly.
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
| |
The function is used for two purposes: 1) for subscribers of streams to
get a token they can use to get further updates with, and 2) for
replication to track position of the writers of the stream.
For streams with a single writer the two scenarios produce the same
result, however the situation becomes complicated for streams with
multiple writers. The current `MultiWriterIdGenerator` does not
correctly handle the first case (which is not an issue as its only used
for the `caches` stream which nothing subscribes to outside of
replication).
|
| |
|
| |
|
| |
|
| |
|
| |
|
|\
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
Synapse 1.18.0rc2 (2020-07-28)
==============================
Bugfixes
--------
- Fix an `AssertionError` exception introduced in v1.18.0rc1. ([\#7876](https://github.com/matrix-org/synapse/issues/7876))
- Fix experimental support for moving typing off master when worker is restarted, which is broken in v1.18.0rc1. ([\#7967](https://github.com/matrix-org/synapse/issues/7967))
Internal Changes
----------------
- Further optimise queueing of inbound replication commands. ([\#7876](https://github.com/matrix-org/synapse/issues/7876))
|
| |
| |
| |
| |
| | |
IIRC this doesn't break tests because its only hit on reconnection, or something.
Basically, when a process needs to fetch missing updates for the `typing` stream it needs to query the writer instance via HTTP (as we don't write typing notifications to the DB), the problem was that the endpoint (`streams`) was only registered on master and specifically not on the typing writer worker.
|
| |
| |
| | |
Most of the stuff we do for replication commands can be done synchronously. There's no point spinning up background processes if we're not going to need them.
|
|/ |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Handling of incoming typing stream updates from replication was not
hooked up on master, effecting set ups where typing was handled on a
different worker.
This is really only a problem if the master process is also handling
sync requests, which is unlikely for those that are at the stage of
moving typing off.
The other observable effect is that if a worker restarts or a
replication connect drops then the typing worker will issue a
`POSITION typing`, triggering master process to try and stream *all*
typing updates from position 0.
Fixes #7907
|
| |
|
|
|
|
| |
I'm going to be doing more stuff synchronously, and I don't want to lose the
CPU metrics down the sofa.
|
| |
|
|
|
|
|
| |
It serves no purpose and updating everytime we write to the device inbox
stream means all such transactions will conflict, causing lots of
transaction failures and retries.
|
|
|
|
|
|
|
|
|
|
|
| |
When we get behind on replication, we tend to stack up background processes
behind a linearizer. Bg processes are heavy (particularly with respect to
prometheus metrics) and linearizers aren't terribly efficient once the queue
gets long either.
A better approach is to maintain a queue of requests to be processed, and
nominate a single process to work its way through the queue.
Fixes: #7444
|
| |
|
| |
|
| |
|
|
|
|
|
|
|
|
| |
Fixes #2181.
The basic premise is that, when we
fail to reject an invite via the remote server, we can generate our own
out-of-band leave event and persist it as an outlier, so that we have something
to send to the client.
|
| |
|
| |
|
|
|
| |
The CI appears to use the latest version of isort, which is a problem when isort gets a major version bump. Rather than try to pin the version, I've done the necessary to make isort5 happy with synapse.
|
| |
|
|
|
| |
This makes it much easier to find where streams are referenced.
|
|
|
| |
The aim here is to make it easier to reason about when streams are limited and when they're not, by moving the logic into the database functions themselves. This should mean we can kill of `db_query_to_update_function` function.
|
| |
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
* Ensure account data stream IDs are unique.
The account data stream is shared between three tables, and the maximum
allocated ID was tracked in a dedicated table. Updating the max ID
happened outside the transaction that allocated the ID, leading to a
race where if the server was restarted then the same ID could be
allocated but the max ID failed to be updated, leading it to be reused.
The ID generators have support for tracking across multiple tables, so
we may as well use that instead of a dedicated table.
* Fix bug in account data replication stream.
If the same stream ID was used in both global and room account data then
the getting updates for the replication stream would fail due to
`heapq.merge(..)` trying to compare a `str` with a `None`. (This is
because you'd have two rows like `(534, '!room')` and `(534, None)` from
the room and global account data tables).
Fix is just to order by stream ID, since we don't rely on the ordering
beyond that. The bug where stream IDs can be reused should be fixed now,
so this case shouldn't happen going forward.
Fixes #7617
|
| |
|
|
|
| |
Fixes #7566.
|
| |
|
|
|
|
|
|
|
| |
The idea here is that if an instance persists an event via the replication HTTP API it can return before we receive that event over replication, which can lead to races where code assumes that persisting an event immediately updates various caches (e.g. current state of the room).
Most of Synapse doesn't hit such races, so we don't do the waiting automagically, instead we do so where necessary to avoid unnecessary delays. We may decide to change our minds here if it turns out there are a lot of subtle races going on.
People probably want to look at this commit by commit.
|
|
|
| |
This allows workers to talk to each other over HTTP replication.
|
|\
| |
| | |
Kill off some old python 2 code
|
| |
| |
| |
| | |
this is a no-op under python 3
|
| |
| |
| |
| |
| |
| | |
Make sure that the AccountDataStream presents complete updates, in the right
order.
This is much the same fix as #7337 and #7358, but applied to a different stream.
|
|/
|
|
|
| |
This allows us to have the logic on both master and workers, which is necessary to move event persistence off master.
We also combine the instantiation of ID generators from DataStore and slave stores to the base worker stores. This allows us to select which process writes events independently of the master/worker splits.
|
|
|
| |
This is so that the logic can happen on both master and workers when we move event persistence out.
|
| |
|
|
|
|
|
| |
Before all streams were only written to from master, so only master needed to respond to `REPLICATE` commands.
Before all instances wrote to the cache invalidation stream, but didn't respond to `REPLICATE`. This was a bug, which could lead to missed rows from cache invalidation stream if an instance is restarted, however all the caches would be empty in that case so it wasn't a problem.
|
|
|
| |
Proactively send out `POSITION` commands (as if we had just received a `REPLICATE`) when we connect to Redis. This is important as other instances won't notice we've connected to issue a `REPLICATE` command (unlike for direct TCP connections). This is only currently an issue if master process reconnects without restarting (if it restarts then it won't have written anything and so other instances probably won't have missed anything).
|
|
|
|
| |
variables (#6391)
|
|\
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
* release-v1.13.0:
Don't UPGRADE database rows
RST indenting
Put rollback instructions in upgrade notes
Fix changelog typo
Oh yeah, RST
Absolute URL it is then
Fix upgrade notes link
Provide summary of upgrade issues in changelog. Fix )
Move next version notes from changelog to upgrade notes
Changelog fixes
1.13.0rc1
Documentation on setting up redis (#7446)
Rework UI Auth session validation for registration (#7455)
Fix errors from malformed log line (#7454)
Drop support for redis.dbid (#7450)
|
| | |
|
| |
| |
| | |
Since we only use pubsub, the dbid is irrelevant.
|
| | |
|
|\| |
|
| |\ |
|
| |\ \ |
|
| | | |
| | | |
| | | |
| | | | |
... otherwise we can believe we're up to date when we're not.
|
| | | | |
|
|\ \ \ \
| | |_|/
| |/| | |
|
| | |/
| |/| |
|
|/ /
| |
| |
| | |
looks like we managed to break this during the refactorathon.
|
| |
| |
| |
| |
| | |
We forgot to set the password on the subscriber connection, as well as
not calling super methods for overridden connectionMade/connectionLost
functions.
|
| |
| |
| | |
For in memory streams when fetching updates on workers we need to query the source of the stream, which currently is hard coded to be master. This PR threads through the source instance we received via `POSITION` through to the update function in each stream, which can then be passed to the replication client for in memory streams.
|
|/
|
|
| |
We move the processing of typing and federation replication traffic into their handlers so that `Stream.current_token()` points to a valid token. This allows us to remove `get_streams_to_replicate()` and `stream_positions()`.
|
|
|
| |
Hopefully this is no worse than what we have on master...
|
|
|
|
|
| |
This is primarily for allowing us to send those commands from workers, but for now simply allows us to ignore echoed RDATA/POSITION commands that we sent (we get echoes of sent commands when using redis). Currently we log a WARNING on the master process every time we receive an echoed RDATA.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
For direct TCP connections we need the master to relay REMOTE_SERVER_UP
commands to the other connections so that all instances get notified
about it. The old implementation just relayed to all connections,
assuming that sending back to the original sender of the command was
safe. This is not true for redis, where commands sent get echoed back to
the sender, which was causing master to effectively infinite loop
sending and then re-receiving REMOTE_SERVER_UP commands that it sent.
The fix is to ensure that we only relay to *other* connections and not
to the connection we received the notification from.
Fixes #7334.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
* Factor out functions for injecting events into database
I want to add some more flexibility to the tools for injecting events into the
database, and I don't want to clutter up HomeserverTestCase with them, so let's
factor them out to a new file.
* Rework TestReplicationDataHandler
This wasn't very easy to work with: the mock wrapping was largely superfluous,
and it's useful to be able to inspect the received rows, and clear out the
received list.
* Fix AssertionErrors being thrown by EventsStream
Part of the problem was that there was an off-by-one error in the assertion,
but also the limit logic was too simple. Fix it all up and add some tests.
|
|
|
| |
Currently we never write to streams from workers, but that will change soon
|
|
|
|
|
|
|
|
|
|
| |
Figuring out how to correctly limit updates from this stream without dropping
entries is far more complicated than just counting the number of rows being
returned. We need to consider each query separately and, if any one query hits
the limit, truncate the results from the others.
I think this also fixes some potentially long-standing bugs where events or
state changes could get missed if we hit the limit on either query.
|
| |
|
|
|
|
|
| |
there doesn't seem to be much point in passing this limit all around, since
both sides agree it's meant to be 100.
|
|
|
|
|
|
|
| |
Long story short: if we're handling presence on the current worker, we shouldn't be sending USER_SYNC commands over replication.
In an attempt to figure out what is going on here, I ended up refactoring some bits of the presencehandler code, so the first 4 commits here are non-functional refactors to move this code slightly closer to sanity. (There's still plenty to do here :/). Suggest reviewing individual commits.
Fixes (I hope) #7257.
|
| |
|
|
|
| |
I messed this up last time I tried (#7239 / e13c6c7).
|
|
|
| |
This is configured via the `redis` config options.
|
|
|
|
|
|
| |
Other parts of the code (such as the StreamChangeCache) assume that there will
not be multiple changes with the same stream id.
This code was introduced in #7024, and I hope this fixes #7206.
|
|
|
|
|
|
|
| |
The general idea here is to get rid of the type: ignore annotations on all of the current_token and update_function assignments, which would have caught #7290.
After a bit of experimentation, it seems like the least-awful way to do this is to pass the offending functions in as parameters to the Stream constructor. Unfortunately that means that the concrete implementations no longer have the same constructor signature as Stream itself, which means that it gets hard to correctly annotate STREAMS_MAP.
I've also introduced a couple of new types, to take out some duplication.
|
|
|
|
|
|
| |
Some of the query functions return generators rather than lists, so we can't
index into the result. Happily we already have a copy of the results.
(think this was introduced in #7024)
|
|
|
|
|
| |
`REPLICATE` is now a valid command, and it's nice if you can issue it from the
console without remembering to call it `REPLICATE ` with a trailing space.
|
|
|
|
|
|
| |
Separate `SimpleCommand` from `Command`, so that things which don't want to use
the `data` property don't have to, and thus fix the warnings PyCharm was giving
me about not calling `__init__` in the base class.
|
|
|
|
| |
We've ripped pretty much all of this out: let's remove the remains.
|
|
|
|
| |
Fixes a race between handling `POSITION` and `RDATA` commands. We do this by simply linearizing handling of them.
|
|
|
| |
This completes the merging of server and client command processing.
|
|
|
| |
The aim here is to move the command handling out of the TCP protocol classes and to also merge the client and server command handling (so that we can reuse them for redis protocol). This PR simply moves the client paths to the new `ReplicationCommandHandler`, a future PR will move the server paths too.
|
|
|
|
|
| |
This broke in a recent PR (#7024) and is no longer useful due to all
replication clients implicitly subscribing to all streams, so let's
just remove it.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
* Remove `conn_id` usage for UserSyncCommand.
Each tcp replication connection is assigned a "conn_id", which is used
to give an ID to a remotely connected worker. In a redis world, there
will no longer be a one to one mapping between connection and instance,
so instead we need to replace such usages with an ID generated by the
remote instances and included in the replicaiton commands.
This really only effects UserSyncCommand.
* Add CLEAR_USER_SYNCS command that is sent on shutdown.
This should help with the case where a synchrotron gets restarted
gracefully, rather than rely on 5 minute timeout.
|
|
|
| |
This changes the replication protocol so that the server does not send down `RDATA` for rows that happened before the client connected. Instead, the server will send a `POSITION` and clients then query the database (or master out of band) to get up to date.
|
|
|
|
|
| |
This just helps keep the rows closer to their streams, so that it's easier to
see what the format of each stream is.
|
|
|
|
|
|
| |
`groups` != `receipts`
Introduced in #6964
|
| |
|
|\
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
* Add 'device_lists_outbound_pokes' as extra table.
This makes sure we check all the relevant tables to get the current max
stream ID.
Currently not doing so isn't problematic as the max stream ID in
`device_lists_outbound_pokes` is the same as in `device_lists_stream`,
however that will change.
* Change device lists stream to have one row per id.
This will make it possible to process the streams more incrementally,
avoiding having to process large chunks at once.
* Change device list replication to match new semantics.
Instead of sending down batches of user ID/host tuples, send down a row
per entity (user ID or host).
* Newsfile
* Remove handling of multiple rows per ID
* Fix worker handling
* Comments from review
|
| | |
|
| |
| |
| |
| |
| | |
Instead of sending down batches of user ID/host tuples, send down a row
per entity (user ID or host).
|
| |
| |
| |
| |
| |
| |
| |
| |
| | |
This makes sure we check all the relevant tables to get the current max
stream ID.
Currently not doing so isn't problematic as the max stream ID in
`device_lists_outbound_pokes` is the same as in `device_lists_stream`,
however that will change.
|
|/
|
|
|
|
|
| |
This is a bit fiddly because it all has to be done on one fell swoop:
* Wherever we create a new event, pass in the room version (and check it matches the format version)
* When we prune an event, use the room version of the unpruned event to create the pruned version.
* When we pass an event over the replication protocol, pass the room version over alongside it, and use it when deserialising the event again.
|
|
|
|
|
| |
When we get an invite over federation, store the room version in the rooms table.
The general idea here is that, when we pull the invite out again, we'll want to know what room_version it belongs to (so that we can later redact it if need be). So we need to store it somewhere...
|
| |
|
| |
|
| |
|
| |
|
|
|
|
| |
Co-authored-by: Andrew Morgan <1342360+anoadragon453@users.noreply.github.com>
|
|
|
|
| |
We just mark the fact that the cache may be stale in the database for
now.
|
|
|
| |
Currently if a worker invalidates a cache it will be streamed to master, which then didn't forward those to other workers.
|
| |
|
|
|
|
|
| |
This will be used to retry outbound transactions to a remote server if
we think it might have come back up.
|
|
|
|
|
|
|
|
|
|
| |
* Port synapse.replication.tcp to async/await
* Newsfile
* Correctly document type of on_<FOO> functions as async
* Don't be overenthusiastic with the asyncing....
|
|
|
|
|
|
|
| |
Currently we rely on `current_state_events` to figure out what rooms a
user was in and their last membership event in there. However, if the
server leaves the room then the table may be cleaned up and that
information is lost. So lets add a table that separately holds that
information.
|
| |
|
| |
|
| |
|
| |
|
| |
|
| |
|
| |
|
|\ |
|
| | |
|
| | |
|
|\ \
| |/
|/| |
Fix caching devices for remote servers in worker.
|
| | |
|
| |
| |
| |
| |
| |
| |
| |
| | |
When the `/keys/query` API is hit on client_reader worker Synapse may
decide that it needs to resync some remote deivces. Usually this happens
on master, and then gets cached. However, that fails on workers and so
it falls back to fetching devices from remotes directly, which may in
turn fail if the remote is down.
|
| | |
|
|/ |
|
|
|
|
| |
since I found myself wonder how it works
|
|\ |
|
| |
| |
| | |
Replace every instance of `logger.warn` with `logger.warning` as the former is deprecated.
|
| | |
|
| | |
|
|\| |
|
| | |
|
|/ |
|
|\
| |
| |
| | |
erikj/refactor_stores
|
| |\ |
|
| | | |
|
| |/
|/|
| |
| |
| | |
This is in preparation for having multiple data stores that offer
different functionality, e.g. splitting out state or event storage.
|
| |
| |
| |
| | |
retrys (#5986)
|
|/ |
|
|
|
| |
Removes the `bind_email` and `bind_msisdn` parameters from the `/register` C/S API endpoint as per [MSC2140: Terms of Service for ISes and IMs](https://github.com/matrix-org/matrix-doc/pull/2140/files#diff-c03a26de5ac40fb532de19cb7fc2aaf7R107).
|
|
|
|
|
| |
Python will return a tuple whether there are parentheses around the returned values or not.
I'm just sick of my editor complaining about this all over the place :)
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Propagate opentracing contexts across workers
Also includes some Convenience modifications to opentracing for servlets, notably:
- Add boolean to skip the whitelisting check on inject
extract methods. - useful when injecting into carriers
locally. Otherwise we'd always have to include our
own servername and whitelist our servername
- start_active_span_from_request instead of header
- Add boolean to decide whether to extract context
from a request to a servlet
|
|
|
|
| |
This reverts commit 3320aaab3a9bba3f5872371aba7053b41af9d0a0.
|
| |
|
|\
| |
| |
| |
| |
| |
| | |
Bugfixes
--------
- Fix a regression introduced in v1.2.0rc1 which led to incorrect labels on some prometheus metrics. ([\#5734](https://github.com/matrix-org/synapse/issues/5734))
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
* Fix servlet metric names
Co-Authored-By: Richard van der Hoff <1389908+richvdh@users.noreply.github.com>
* Remove redundant check
* Cover all return paths
|
|/ |
|
|
|
|
|
|
|
|
| |
Nothing uses this now, so we can remove the dead code, and clean up the
API.
Since we're changing the shape of the return value anyway, we take the
opportunity to give the method a better name.
|
|
|
|
|
|
|
|
|
| |
This has never been documented, and I'm not sure it's ever been used outside
sytest.
It's quite a lot of poorly-maintained code, so I'd like to get rid of it.
For now I haven't removed the database table; I suggest we leave that for a
future clearout.
|
| |
|
| |
|
| |
|
| |
|
| |
|
| |
|
|
|
|
|
| |
since we're pulling everything out of KeyStore anyway, we may as well simplify
it.
|
|
|
|
| |
These have been unused since #4120, and with the demise of perspectives, it is
unlikely that they will ever be used again.
|
|
|
| |
Remove presence list support as per MSC 1819
|
|
|
|
|
|
|
|
|
|
| |
Hopefully this time we really will fix #4422.
We need to make sure that the cache on
`get_rooms_for_user_with_stream_ordering` is invalidated *before* the
SyncHandler is notified for the new events, and we can now do so reliably via
the `events` stream.
|
| |
|
|
|
|
| |
... as a precursor to combining it with the CurrentStateDelta stream.
|
|
|
|
|
| |
We're about to turn it straight into a JSON object anyway so building a
ROW_TYPE is a bit pointless, and reduces flexibility in the update_function.
|
|
|
|
| |
This will allow individual stream classes to override how a row is parsed.
|
| |
|
| |
|
| |
|
| |
|
|
|
|
|
|
|
|
| |
`__str__` depended on `self.addr`, which was absent from
ClientReplicationStreamProtocol, so attempting to call str on such an object
would raise an exception.
We can calculate the peer addr from the transport, so there is no need for addr
anyway.
|
|
|
|
|
| |
Make sure that they are sent correctly over the replication stream.
Fixes: #4898
|
|
|
| |
Setting this to 50 or so makes a bunch of sytests fail in worker mode.
|
| |
|
|\
| |
| | |
Support batch updates in the worker sender
|
| | |
|
| | |
|
| | |
|
| | |
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
* Rate-limiting for registration
* Add unit test for registration rate limiting
* Add config parameters for rate limiting on auth endpoints
* Doc
* Fix doc of rate limiting function
Co-Authored-By: babolivier <contact@brendanabolivier.com>
* Incorporate review
* Fix config parsing
* Fix linting errors
* Set default config for auth rate limiting
* Fix tests
* Add changelog
* Advance reactor instead of mocked clock
* Move parameters to registration specific config and give them more sensible default values
* Remove unused config options
* Don't mock the rate limiter un MAU tests
* Rename _register_with_store into register_with_store
* Make CI happy
* Remove unused import
* Update sample config
* Fix ratelimiting test for py2
* Add non-guest test
|
|/ |
|
| |
|
|\
| |
| | |
Fix tightloop over connecting to replication server
|
| | |
|
| |
| |
| |
| |
| | |
Otherwise if you have many workers they can easily take out master with
their connection attempts
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
If the client failed to process incoming commands during the initial set
up of the replication connection it would immediately disconnect and
reconnect, resulting in a tightloop.
This can happen, for example, when subscribing to a stream that has a
row that is too long in the backlog.
The fix here is to not consider the connection successfully set up until
the client has succesfully subscribed and caught up with the streams.
This ensures that the retry logic timers aren't reset until then,
meaning that if an error does happen during start up the client will
continue backing off before retrying again.
|
|/ |
|
| |
|
|
|
|
|
|
|
|
|
|
| |
* Move RegistrationHandler init to HomeServer
* Move post registration actions to RegistrationHandler
* Add post regisration replication endpoint
* Newsfile
|
|
|
|
|
|
|
|
|
|
| |
Currently whenever the current state changes in a room invalidate a lot
of caches, which cause *a lot* of traffic over replication. Instead,
lets batch up all those invalidations and send a single poke down
the replication streams.
Hopefully this will reduce load on the master process by substantially
reducing traffic.
|
| |
|
|
|
|
|
|
|
|
| |
This allows registration to be handled by a worker, though the actual
write to the database still happens on master.
Note: due to the in-memory session map all registration requests must be
handled by the same worker.
|
|
|
|
|
|
|
|
|
| |
* Fix replication for room v3
We were not correctly quoting the path fragments over http replication,
which meant that it exploded when the event IDs had a slash in them
* Newsfile
|
|
|
|
| |
This bug was introduced in PR #4470, commit 678a92cb56d547dcadffa723e29b4855a27d0901
|
| |
|
| |
|
| |
|
| |
|
| |
|
| |
|
| |
|
| |
|
| |
|
|
|
|
|
| |
`conn_id` gets set to a random string, and so we end up filling up
prometheus with tonnes of data series, which is bad.
|
|\
| |
| | |
Fix logging bug in EDU handling over replication
|
| | |
|
|/
|
|
|
|
|
|
|
|
| |
Run the handlers for replication commands as background processes. This should
improve the visibility in our metrics, and reduce the number of "running db
transaction from sentinel context" warnings.
Ideally it means converting the things that fire off deferreds into the night
into things that actually return a Deferred when they are done. I've made a bit
of a stab at this, but it will probably be leaky.
|
|
|
|
| |
This involves renaming _persist_events to be a public function.
|
| |
|
| |
|
| |
|
|\
| |
| |
| | |
erikj/split_federation
|