| Commit message (Collapse) | Author | Age | Files | Lines |
... | |
|
|
|
|
|
| |
* type checking fixes
* changelog
|
|
|
|
|
|
|
|
|
| |
We have set the max retry interval to a value larger than a postgres or
sqlite int can hold, which caused exceptions when updating the
destinations table.
To fix postgres we need to change the column to a bigint, and for sqlite
we lower the max interval to 2**62 (which is still incredibly long).
|
|\ |
|
| |
| |
| |
| | |
Track the time that a server started failing at, for general analysis purposes.
|
| |
| |
| |
| |
| |
| | |
Essentially the intention here is to end up blacklisting servers which never
respond to federation requests.
Fixes https://github.com/matrix-org/synapse/issues/5113.
|
| |
| |
| |
| | |
This was intended to introduce an element of jitter; instead it gave you a
30/60 chance of resetting to zero.
|
| |
| |
| |
| |
| |
| |
| | |
This is a redo of https://github.com/matrix-org/synapse/pull/5897 but with `id_access_token` accepted.
Implements [MSC2134](https://github.com/matrix-org/matrix-doc/pull/2134) plus Identity Service v2 authentication ala [MSC2140](https://github.com/matrix-org/matrix-doc/pull/2140).
Identity lookup-related functions were also moved from `RoomMemberHandler` to `IdentityHandler`.
|
| |
| |
| |
| | |
* remove some unused code
* make things which were constants into constants for efficiency and clarity
|
| |
| |
| |
| |
| | |
This reverts commit 71fc04069a5770a204c3514e0237d7374df257a8.
This broke 3PID invites as #5892 was required for it to work correctly.
|
| |
| |
| |
| |
| |
| |
| | |
Fixes https://github.com/matrix-org/synapse/issues/5861
Adds support for the v2 lookup API as defined in [MSC2134](https://github.com/matrix-org/matrix-doc/pull/2134). Currently this is only used for 3PID invites.
Sytest PR: https://github.com/matrix-org/sytest/pull/679
|
| |
| |
| |
| |
| |
| |
| |
| |
| | |
This gives a bit of a grace period where we can attempt to refetch a
remote `well-known`, while still using the cached result if that fails.
Hopefully this will make the well-known resolution a bit more torelant
of failures, rather than it immediately treating failures as "no result"
and caching that for an hour.
|
|/
|
|
|
|
|
|
|
|
|
|
|
| |
Fixes a bug where the default attribute maps were prioritised over
user-specified ones, resulting in incorrect mappings.
The problem is that if you call SPConfig.load() multiple times, it adds new
attribute mappers to a list. So by calling it with the default config first,
and then the user-specified config, we would always get the default mappers
before the user-specified mappers.
To solve this, let's merge the config dicts first, and then pass them to
SPConfig.
|
| |
|
| |
|
|
|
|
|
|
|
| |
There was some inconsistent behaviour in the caching layer around how
exceptions were handled - particularly synchronously-thrown ones.
This seems to be most easily handled by pushing the creation of
ObservableDeferreds down from CacheDescriptor to the Cache.
|
|
|
|
|
|
| |
* Add a prometheus metric for active cache lookups.
* changelog
|
| |
|
|
|
|
|
|
|
|
|
| |
The version of a module isn't going to change over the lifetime of the
process (assuming no funky hot reloading is going on, which it isn't),
so let's just cache the result to avoid spawning lots of git
subprocesses.
Fixes #5672.
|
|
|
|
|
|
|
| |
- Put the default window_size back to 1000ms (broken by #5181)
- Make the `rc_federation` config actually do something
- fix an off-by-one error in the 'concurrent' limit
- Avoid creating an unused `_PerHostRatelimiter` object for every single
incoming request
|
|
|
|
|
|
|
|
| |
(#5617)
* Improve the backwards compatibility re-exports of synapse.logging.context.
* reexport logformatter too
|
| |
|
|
|
|
|
|
|
|
| |
* Fix 'utime went backwards' errors on daemonization.
Fixes #5608
* remove spurious debug
|
|
|
|
| |
Fixes #5602, #5603
|
| |
|
|
|
|
|
|
|
| |
Closes #4583
Does slightly less than #5045, which prevented a room from being upgraded multiple times, one after another. This PR still allows that, but just prevents two from happening at the same time.
Mostly just to mitigate the fact that servers are slow and it can take a moment for the room upgrade to actually complete. We don't want people sending another request to upgrade the room when really they just thought the first didn't go through.
|
|
|
|
|
| |
Sentry will catch the errors if they happen, so that should be good enough, and
woun't make things explode if we hit the error condition.
|
|\ |
|
| | |
|
|/
|
|
| |
Check that our clocks go forward.
|
|
|
| |
Fixes a regression introduced in #5335.
|
| |
|
| |
|
| |
|
|\
| |
| | |
Allow client event serialization to be async
|
| |
| |
| | |
Co-Authored-By: Richard van der Hoff <1389908+richvdh@users.noreply.github.com>
|
| | |
|
|/ |
|
|\ |
|
| | |
|
| | |
|
|/
|
| |
Avoid sending syntax errors from the manhole to sentry.
|
| |
|
|\
| |
| | |
Implement workaround for login error.
|
| |
| |
| |
| | |
Signed-off-by: Robert Jacob <xperimental@solidproject.de>
|
|/ |
|
| |
|
|
|
|
| |
This is a bit of a half-assed effort at fixing https://github.com/matrix-org/synapse/issues/4252. Fundamentally the right answer is to drop support for Python 2.
|
|\
| |
| |
| | |
erikj/alias_disallow_list
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
Wrap calls to deferToThread() in a thing which uses a child logcontext to
attribute CPU usage to the right request.
While we're in the area, remove the logcontext_tracer stuff, which is never
used, and afaik doesn't work.
Fixes #4064
|
| |
| |
| |
| | |
on py3) (#4068)
|
| | |
|
| | |
|
| | |
|
|/ |
|
| |
|
| |
|
|
|
|
|
|
|
|
| |
If a looping call function errors, then it kills the loop entirely.
Currently it throws away the exception logs, so we should make it
actually log them.
Fixes #3929
|
| |
|
| |
|
|\ |
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
transactions (#3959)
when processing incoming transactions, it can be hard to see what's going on,
because we process a bunch of stuff in parallel, and because we may end up
recursively working our way through a chain of three or four events.
This commit creates a way to use logcontexts to add the relevant event ids to
the log lines.
|
|\| |
|
| | |
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
It used to try and produce an estimate, which was sometimes negative.
This caused metrics to be sad, so lets always just calculate it from
scratch.
(This appears to have been a longstanding bug, but one which has been made more
of a problem by #3932 and #3933).
(This was originally done by Erik as part of #3933. I'm cherry-picking it
because really it's a fix in its own right)
|
| |
| |
| |
| |
| |
| | |
It used to try and produce an estimate, which was sometimes negative.
This caused metrics to be sad, so lets always just calculate it from
scratch.
|
| |
| |
| |
| | |
Hopefully helps with #3931
|
|/ |
|
|
|
|
|
|
|
|
| |
ExpiringCache required that `start()` be called before it would actually
start expiring entries. A number of places didn't do that.
This PR removes `start` from ExpiringCache, and automatically starts
backround reaping process on creation instead.
|
|
|
|
|
|
|
|
|
|
| |
Let's try to rationalise the logging that happens when we are processing an
incoming transaction, to make it easier to figure out what is going wrong when
they take ages. In particular:
- make everything start with a [room_id event_id] prefix
- make sure we log a warning when catching exceptions rather than just turning
them into other, more cryptic, exceptions.
|
| |
|
| |
|
|
|
|
|
|
|
| |
The existing deferred timeout helper function (and the one into twisted)
suffer from a bug when a deferred's canceller throws an exception, #3842.
The new helper function doesn't suffer from this problem.
|
|
|
|
|
| |
Turns out deferred.cancel sometimes throws, so we do that last to ensure
that we always do resolve the new deferred.
|
|
|
|
| |
This is an attempt to mitigate #3842 by adding yet-another-timeout
|
| |
|
|
|
|
|
| |
Newer versions of openssh client refuse to connect to the old key due to
its length.
|
|
|
|
|
| |
This fixes bugs introduced in #3700, by making sure that we behave sanely
when an incoming connection is closed before the headers are read.
|
|
|
|
|
| |
Make the logcontext filter not explode if it somehow ends up with a logcontext
of None, since that infinite-loops the whole logging system.
|
| |
|
|\ |
|
| |
| |
| |
| |
| |
| | |
Turns out that cancellation of inlineDeferreds didn't really work properly
until Twisted 18.7. This commit refactors Linearizer.queue to avoid
inlineCallbacks.
|
|/ |
|
| |
|
| |
|
|
|
|
|
| |
Because it was complicated and annoyed me. I suspect this will be more
efficient too.
|
|
|
|
|
|
|
|
|
| |
It turns out that looping_call does check the deferred returned by its
callback, and (at least in the case of client_ips), we were relying on this,
and I broke it in #3604.
Update run_as_background_process to return the deferred, and make sure we
return it to clock.looping_call.
|
| |
|
|
|
|
|
| |
Linearizer was effectively a Limiter with max_count=1, so rather than
maintaining two sets of code, let's combine them.
|
|
|
|
|
| |
* give them names, to improve logging
* use a deque rather than a list for efficiency
|
|
|
|
| |
Fixes #3570
|
| |
|
|
|
|
|
|
|
|
|
|
|
| |
This is more involved than it might otherwise be, because the current
implementation just drops its logcontexts and runs everything in the sentinel
context.
It turns out that we aren't actually using a bunch of the functionality here
(notably suppress_failures and the fact that Distributor.fire returns a
deferred), so the easiest way to fix this is actually by simplifying a bunch of
code.
|
|
|
|
|
|
|
|
| |
This fixes #3518, and ensures that we get useful logs and metrics for lots of
things that happen in the background.
(There are certainly more things that happen in the background; these are just
the common ones I've found running a single-process synapse locally).
|
| |
|
|
|
|
|
|
|
|
| |
The get_entities_changed function was changed to return all changed
entities since the given stream position, rather than only those changed
from a given list of entities. This resulted in the function incorrectly
returning large numbers of entities that, for example, caused large
increases in database usage.
|
|\
| |
| | |
Don't return unknown entities in get_entities_changed
|
| |
| |
| |
| |
| |
| |
| |
| | |
The stream cache keeps track of all entities that have changed since
a particular stream position, so get_entities_changed does not need to
return unknown entites when given a larger stream position.
This makes it consistent with the behaviour of has_entity_changed.
|
|/
|
|
|
|
|
|
| |
popitem removes the *most recent* item by default [1]. We want the oldest.
Fixes #3524
[1]: https://docs.python.org/2/library/collections.html#collections.OrderedDict.popitem
|
|
|
|
|
|
|
|
|
|
|
| |
This line shows up as about 5% of cpu time on a synchrotron:
not_known_entities = set(entities) - set(self._entity_to_key)
Presumably the problem here is that _entity_to_key can be largeish, and
building a set for its keys every time this function is called is slow.
Here we rewrite the logic to avoid building so many sets.
|
|
|
|
|
| |
Let's try to include time spent in the DB threads in the per-request/block cpu
usage metrics.
|
|
|
|
|
| |
Factor out the resource usage tracking out to a separate object, which can be
passed around and copied independently of the logcontext itself.
|
| |
|
| |
|
| |
|
| |
|
| |
|
| |
|
| |
|
| |
|
|
|
|
|
|
|
| |
When _get_state_for_groups is given a wildcard filter, just do a complete
lookup. Hopefully this will give us the best of both worlds by not filling up
the ram if we only need one or two keys, but also making the cache still work
for the federation reader usecase.
|
|\
| |
| | |
Log number of events fetched from DB
|
| |
| |
| |
| | |
so that we can stub it for the sentinel and not have a billion failing UTs
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
When we finish processing a request, log the number of events we fetched from
the database to handle it.
[I'm trying to figure out which requests are responsible for large amounts of
event cache churn. It may turn out to be more helpful to add counts to the
prometheus per-request/block metrics, but that is an extension to this code
anyway.]
|
|/ |
|
| |
|
| |
|
| |
|
|
|
|
| |
they're not meant to be lazy (#3307)
|
|\
| |
| | |
remaining isintance fixes
|
| | |
|
| | |
|
| |
| |
| |
| | |
Signed-off-by: Adrian Tschira <nota@notafile.com>
|
| | |
|
| | |
|
| | |
|
|\| |
|
| |\
| | |
| | | |
Misc Python3 fixes
|
| | |
| | |
| | |
| | | |
Signed-off-by: Adrian Tschira <nota@notafile.com>
|
| |\ \
| | | |
| | | | |
Add batch_iter to utils
|
| | |/
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
There's a frequent idiom I noticed where an iterable is split up into a
number of chunks/batches. Unfortunately that method does not work with
iterators like dict.keys() in python3. This implementation works with
iterators.
Signed-off-by: Adrian Tschira <nota@notafile.com>
|
| | | |
|
| | | |
|
| | | |
|
|\| | |
|
| | | |
|
| |/ |
|
|/ |
|
|\ |
|
| | |
|
| |\ |
|
| | |
| | |
| | |
| | | |
This was introduced in 4f2f5171
|
| |/
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
So, it turns out that if you have a first `Deferred` `D1`, you can add a
callback which returns another `Deferred` `D2`, and `D2` must then complete
before any further callbacks on `D1` will execute (and later callbacks on `D1`
get the *result* of `D2` rather than `D2` itself).
So, `D1` might have `called=True` (as in, it has started running its
callbacks), but any new callbacks added to `D1` won't get run until `D2`
completes - so if you `yield D1` in an `inlineCallbacks` function, your `yield`
will 'block'.
In conclusion: some of our assumptions in `logcontext` were invalid. We need to
make sure that we don't optimise out the logcontext juggling when this
situation happens. Fortunately, it is easy to detect by checking `D1.paused`.
|
| |\
| | |
| | |
| | |
| | | |
matrix-org/rav/run_in_background_exception_handling
Trap exceptions thrown within run_in_background
|
| | |
| | |
| | |
| | |
| | | |
Turn any exceptions that get thrown synchronously within run_in_background into
Failures instead.
|
| |\ \ |
|
| | |\ \
| | | | |
| | | | | |
Replace stringIO imports with six
|
| | | | | |
|
| | |\ \ \
| | | | | |
| | | | | | |
more bytes strings
|
| | | |/ /
| | | | |
| | | | |
| | | | | |
Signed-off-by: Adrian Tschira <nota@notafile.com>
|
| | |\ \ \
| | | |/ /
| | |/| | |
Use run_in_background in preference to preserve_fn
|
| | | |\ \ |
|
| | | | |/
| | | |/|
| | | | |
| | | | |
| | | | |
| | | | | |
While I was going through uses of preserve_fn for other PRs, I converted places
which only use the wrapped function once to use run_in_background, to avoid
creating the function object.
|
| |/ / /
| | | |
| | | |
| | | |
| | | |
| | | | |
plus a bonus next()
Signed-off-by: Adrian Tschira <nota@notafile.com>
|
| |\ \ \
| | | |/
| | |/| |
|
| | |/
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
There were a bunch of places where we fire off a process to happen in the
background, but don't have any exception handling on it - instead relying on
the unhandled error being logged when the relevent deferred gets
garbage-collected.
This is unsatisfactory for a number of reasons:
- logging on garbage collection is best-effort and may happen some time after
the error, if at all
- it can be hard to figure out where the error actually happened.
- it is logged as a scary CRITICAL error which (a) I always forget to grep for
and (b) it's not really CRITICAL if a background process we don't care about
fails.
So this is an attempt to add exception handling to everything we fire off into
the background.
|
| | |
| | |
| | |
| | | |
Twisted 16.0 doesn't have addTimeout, so let's backport it.
|
| |/
| |
| |
| | |
This doesn't feel like a wheel we need to reinvent.
|
| |\
| | |
| | | |
add __bool__ alias to __nonzero__ methods
|
| | |
| | |
| | |
| | | |
Signed-off-by: Adrian Tschira <nota@notafile.com>
|
| |\ \
| | | |
| | | | |
Replace Queue with six.moves.queue
|
| | |/
| | |
| | |
| | |
| | |
| | | |
and a six.range change which I missed the last time
Signed-off-by: Adrian Tschira <nota@notafile.com>
|
| |\ \
| | |/
| |/| |
Refactor ResponseCache usage
|
| | |
| | |
| | |
| | |
| | | |
Turns out that ObservableDeferred.observe doesn't return a deferred if the
result is already completed. Fix handling and improve documentation.
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
Adds a `.wrap` method to ResponseCache which wraps up the boilerplate of a
(get, set) pair, and then use it throughout the codebase.
This will be largely non-functional, but does include the following functional
changes:
* federation_server.on_context_state_request: drops use of _server_linearizer
which looked redundant and could cause incorrect cache misses by yielding
between the get and the set.
* RoomListHandler.get_remote_public_room_list(): fixes logcontext leaks
* the wrap function includes some logging. I'm hoping this won't be too noisy
on production.
|
| |/
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
This reverts commit 9fbe70a7dc3afabfdac176ba1f4be32dd44602aa.
It turns out that sortedcontainers.SortedDict is not an exact match for
blist.sorteddict; in particular, `popitem()` removes things from the opposite
end of the dict.
This is trivial to fix, but I want to add some unit tests, and potentially some
more thought about it, before we do so.
|
| |\
| | |
| | | |
Add metrics for ResponseCache
|
| | | |
|
| |\ \
| | | |
| | | | |
Document the behaviour of ResponseCache
|
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
it looks like everything that uses ResponseCache expects to have to
`make_deferred_yieldable` its results. It's debatable whether that is the best
approach, but let's document it for now to avoid further confusion.
|
| | |/
| |/|
| | |
| | |
| | |
| | |
| | |
| | | |
This commit drop-in replaces blist with SortedContainers. They are
written in pure python so work with pypy, but perform as good as
native implementations, at least in a couple benchmarks:
http://www.grantjenks.com/docs/sortedcontainers/performance.html
|
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
We aren't ready to release this yet, so I'm reverting it for now.
This reverts commit d1679a4ed7947b0814e0f2af9b888a16c588f1a1, reversing
changes made to e089100c6231541c446e37e157dec8feed02d283.
|
| |\ \
| | | |
| | | | |
Improve database cache performance
|
| | | |
| | | |
| | | |
| | | |
| | | | |
Fixes an issue where a cache invalidation would invalidate *all* pending
entries, rather than just the entry that we intended to invalidate.
|
| |/ / |
|
| |/
| |
| |
| |
| | |
using json.dumps with custom options requires us to create a new JSONEncoder on
each call. It's more efficient to create one upfront and reuse it.
|
| |
| |
| |
| | |
fixes https://github.com/matrix-org/synapse/issues/2043 and https://github.com/matrix-org/synapse/issues/2029
|
| | |
|
| |
| |
| |
| |
| |
| |
| |
| |
| | |
The state cache bases its size on the sum of the size of entries. The
size of the entry is calculated once on insertion, so it is important
that the size of entries does not change.
The DictionaryCache modified the entries size, which caused the state
cache to incorrectly think it was smaller than it actually was.
|
|/
|
|
|
| |
I think we've now fixed enough of these that the rest can be logged at
warning.
|
|
|
|
|
| |
It annoys me that we create temporary function objects when there's really no
need for it. Let's factor the gubbins out of preserve_fn and start using it.
|
|
|
|
|
| |
... because (a) it's actually simpler (b) it might be marginally more
performant?
|
| |
|
|
|
|
|
|
| |
Add federation_domain_whitelist
gives a way to restrict which domains your HS is allowed to federate with.
useful mainly for gracefully preventing a private but internet-connected HS from trying to federate to the wider public Matrix network
|
|\
| |
| | |
add registrations_require_3pid and allow_local_3pids
|
| | |
|
| |
| |
| |
| |
| |
| |
| |
| |
| | |
* [ ] split config options into allowed_local_3pids and registrations_require_3pid
* [ ] simplify and comment logic for picking registration flows
* [ ] fix docstring and move check_3pid_allowed into a new util module
* [ ] use check_3pid_allowed everywhere
@erikjohnston PTAL
|
|\ \
| | |
| | | |
Add decent impl of a FileConsumer
|
| | | |
|
| | | |
|
| | | |
|
| | | |
|
| | | |
|
| | | |
|
| | | |
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
Twisted core doesn't have a general purpose one, so we need to write one
ourselves.
Features:
- All writing happens in background thread
- Supports both push and pull producers
- Push producers get paused if the consumer falls behind
|
| |/
|/|
| |
| | |
... which I introduced in #2785
|
| |
| |
| |
| |
| |
| | |
For each request, track the amount of time spent waiting for a db
connection. This entails adding it to the LoggingContext and we may as well add
metrics for it while we are passing.
|
| |
| |
| |
| | |
... to reduce the amount of floating-point foo we do.
|
|/
|
|
|
|
|
|
| |
It turns out that the only thing we use the __dict__ of LoggingContext for is
`request`, and given we create lots of LoggingContexts and then copy them every
time we do a db transaction or log line, using the __dict__ seems a bit
redundant. Let's try to optimise things by making the request attribute
explicit.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
In order to circumvent the number of duplicate foo:count metrics increasing
without bounds, it's time for a rearrangement.
The following are all deprecated, and replaced with synapse_util_metrics_block_count:
synapse_util_metrics_block_timer:count
synapse_util_metrics_block_ru_utime:count
synapse_util_metrics_block_ru_stime:count
synapse_util_metrics_block_db_txn_count:count
synapse_util_metrics_block_db_txn_duration:count
The following are all deprecated, and replaced with synapse_http_server_response_count:
synapse_http_server_requests
synapse_http_server_response_time:count
synapse_http_server_response_ru_utime:count
synapse_http_server_response_ru_stime:count
synapse_http_server_response_db_txn_count:count
synapse_http_server_response_db_txn_duration:count
The following are renamed (the old metrics are kept for now, but deprecated):
synapse_util_metrics_block_timer:total ->
synapse_util_metrics_block_time_seconds
synapse_util_metrics_block_ru_utime:total ->
synapse_util_metrics_block_ru_utime_seconds
synapse_util_metrics_block_ru_stime:total ->
synapse_util_metrics_block_ru_stime_seconds
synapse_util_metrics_block_db_txn_count:total ->
synapse_util_metrics_block_db_txn_count
synapse_util_metrics_block_db_txn_duration:total ->
synapse_util_metrics_block_db_txn_duration_seconds
synapse_http_server_response_time:total ->
synapse_http_server_response_time_seconds
synapse_http_server_response_ru_utime:total ->
synapse_http_server_response_ru_utime_seconds
synapse_http_server_response_ru_stime:total ->
synapse_http_server_response_ru_stime_seconds
synapse_http_server_response_db_txn_count:total ->
synapse_http_server_response_db_txn_count
synapse_http_server_response_db_txn_duration:total
synapse_http_server_response_db_txn_duration_seconds
|
| |
|
|
|
|
|
| |
Both of these functions ae known to leak logcontexts. Replace the remaining
calls to them and kill them off.
|
|
|
|
|
|
|
|
|
| |
Add some logging to the Limiter in a similar spirit to the Linearizer, to help
debug issues.
Also fix a logcontext leak.
Also refactor slightly to avoid throwing exceptions.
|
|
|
|
| |
E741 says "do not use variables named ‘l’, ‘O’, or ‘I’".
|
|
|
|
| |
what could possibly go wrong
|
|
|
|
|
|
|
|
| |
* don't use preserve_context_over_deferred, which is known broken.
* remove a redundant preserve_fn.
* add/improve some comments
|
|\
| |
| | |
Fix stackoverflow and logcontexts from linearizer
|
| |
| |
| |
| |
| |
| |
| | |
1. make it not blow out the stack when there are more than 50 things waiting
for a lock. Fixes https://github.com/matrix-org/synapse/issues/2505.
2. Make it not mess up the log contexts.
|
| |
| |
| |
| | |
make sure we have the relevant fields before we try to log them.
|
|/
|
|
|
|
|
|
|
| |
This is a bit of an experimental change at this point; the idea is to see if it
helps us track down where our stack overflows are coming from by logging the
stack when the exception was caught and turned into a Failure. (We'll also need
https://github.com/richvdh/twisted/commit/edf27044200e74680ea67c525768e36dc9d9af2b).
If we deploy this, we'll be able to enable it via the log config yaml.
|
|
|
|
| |
Avoid preserve_context_over_deferred, which is broken.
|
| |
|
| |
|
| |
|
| |
|
| |
|
|
|
|
|
| |
This is because pruning them was a significant performance drain on
matrix.org
|
| |
|
|
|
|
|
|
|
|
| |
Occaisonally has_any_entity_changed would throw the error: "Set changed
size during iteration" when taking the max of the `sorteddict`. While
its uncertain how that happens, its quite inefficient to iterate over
the entire dict anyway so we change to using the more traditional
`bisect_*` functions.
|
| |
|
| |
|
|
|
|
|
|
| |
We update the normal cache descriptors to handle caches with a single
argument specially so that the key wasn't a 1-tuple. We need to update
the cache list to be aware of this.
|
|
|
|
|
|
|
|
|
| |
Most of the time was spent copying a dict to filter out sentinel values
that indicated that keys did not exist in the dict. The sentinel values
were added to ensure that we cached the non-existence of keys.
By updating DictionaryCache to keep track of which keys were known to
not exist itself we can remove a dictionary copy.
|
|
|
|
|
| |
Otherwise the hit ration of plain get_events gets completely skewed by
calls to get_joined_users* functions.
|
| |
|
|
|
|
|
|
| |
Call `super` correctly, so that we correctly initialise the `errcode` field.
Fixes https://github.com/matrix-org/synapse/issues/2179.
|
|
|
|
|
|
|
|
| |
The _get_joined_users_from_context cache stores a mapping from user_id
to avatar_url and display_name. Instead of storing those in a dict,
store them in a namedtuple as that uses much less memory.
We also try converting the string to ascii to further reduce the size.
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Currently the cache descriptors store deferreds rather than raw values,
this is a simple way of triggering only one database hit and sharing the
result if two callers attempt to get the same value.
However, there are a few caches that simply store a mapping from string
to string (or int). These caches can have a large number of entries,
under the assumption that each entry is small. However, the size of a
deferred (specifically the size of ObservableDeferred) is signigicantly
larger than that of the raw value, 2kb vs 32b.
This PR therefore changes the cache descriptors to store the raw values
rather than the deferreds.
As a side effect cached storage function now either return a deferred or
the actual value, as the cached list decriptor already does. This is
fine as we always end up just yield'ing on the returned value
eventually, which handles that case correctly.
|
| |
|
|
|
|
|
| |
`preserve_fn` is no longer used as a decorator anywhere, so we can safely fix a
fixme therein.
|
| |
|
| |
|
| |
|
| |
|
|
|
|
|
| |
This is because getcallargs recomputes the getargspec, amongst other
things, which we don't need to do as its already been done
|
| |
|
|
|
|
|
|
|
| |
The cache wrappers had a habit of leaking the logcontext into the reactor while
the lookup function was running, and then not restoring it correctly when the
lookup function had completed. It's all the fault of
`preserve_context_over_{fn,deferred}` which are basically a bit broken.
|
|\
| |
| | |
push federation retry limiter down to matrixfederationclient
|
| |
| |
| |
| |
| | |
Add a param to the federation client which lets us ignore historical backoff
data for federation queries, and set it for a handful of operations.
|
| |
| |
| |
| |
| | |
rather than having to instrument everywhere we make a federation call,
make the MatrixFederationHttpClient manage the retry limiter.
|
|\ \
| | |
| | | |
Fix time_bound_deferred to throw the right exception
|
| |/
| |
| |
| |
| |
| | |
Due to a failure to instantiate DeferredTimedOutError, time_bound_deferred
would throw a CancelledError when the deferred timed out, which was rather
confusing.
|
| |
| |
| |
| |
| | |
Use preserve_fn to correctly manage the logcontexts around things we don't want
to yield on.
|
|/
|
|
|
|
|
|
|
| |
The `@cached` decorator on `KeyStore._get_server_verify_key` was missing
its `num_args` parameter, which meant that it was returning the wrong key for
any server which had more than one recorded key.
By way of a fix, change the default for `num_args` to be *all* arguments. To
implement that, factor out a common base class for `CacheDescriptor` and `CacheListDescriptor`.
|
|\
| |
| | |
Logcontext docs
|
| | |
|
|/
|
|
|
|
|
|
| |
Fix a bug in ``logcontext.preserve_fn`` which made it leak context into the
reactor, and add a test for it.
Also, get rid of ``logcontext.reset_context_after_deferred``, which tried to do
the same thing but had its own, different, set of bugs.
|
|\
| |
| | |
Queue up federation PDUs while a room join is in progress
|
| |
| |
| |
| |
| | |
to correctly reset the context when we fire off a deferred we aren't going to
wait for.
|