| Commit message (Collapse) | Author | Age | Files | Lines |
| |
|
|
|
|
|
| |
Let's try to include time spent in the DB threads in the per-request/block cpu
usage metrics.
|
|
|
|
|
| |
Factor out the resource usage tracking out to a separate object, which can be
passed around and copied independently of the logcontext itself.
|
| |
|
|
|
|
| |
so that we can stub it for the sentinel and not have a billion failing UTs
|
|
|
|
|
|
|
|
|
|
| |
When we finish processing a request, log the number of events we fetched from
the database to handle it.
[I'm trying to figure out which requests are responsible for large amounts of
event cache churn. It may turn out to be more helpful to add counts to the
prometheus per-request/block metrics, but that is an extension to this code
anyway.]
|
| |
|
| |
|
| |
|
|\ |
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
So, it turns out that if you have a first `Deferred` `D1`, you can add a
callback which returns another `Deferred` `D2`, and `D2` must then complete
before any further callbacks on `D1` will execute (and later callbacks on `D1`
get the *result* of `D2` rather than `D2` itself).
So, `D1` might have `called=True` (as in, it has started running its
callbacks), but any new callbacks added to `D1` won't get run until `D2`
completes - so if you `yield D1` in an `inlineCallbacks` function, your `yield`
will 'block'.
In conclusion: some of our assumptions in `logcontext` were invalid. We need to
make sure that we don't optimise out the logcontext juggling when this
situation happens. Fortunately, it is easy to detect by checking `D1.paused`.
|
| |\
| | |
| | |
| | |
| | | |
matrix-org/rav/run_in_background_exception_handling
Trap exceptions thrown within run_in_background
|
| | |
| | |
| | |
| | |
| | | |
Turn any exceptions that get thrown synchronously within run_in_background into
Failures instead.
|
| |\ \ |
|
| | |/
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
There were a bunch of places where we fire off a process to happen in the
background, but don't have any exception handling on it - instead relying on
the unhandled error being logged when the relevent deferred gets
garbage-collected.
This is unsatisfactory for a number of reasons:
- logging on garbage collection is best-effort and may happen some time after
the error, if at all
- it can be hard to figure out where the error actually happened.
- it is logged as a scary CRITICAL error which (a) I always forget to grep for
and (b) it's not really CRITICAL if a background process we don't care about
fails.
So this is an attempt to add exception handling to everything we fire off into
the background.
|
| |/
| |
| |
| |
| |
| | |
While I was going through uses of preserve_fn for other PRs, I converted places
which only use the wrapped function once to use run_in_background, to avoid
creating the function object.
|
| |
| |
| |
| | |
Signed-off-by: Adrian Tschira <nota@notafile.com>
|
|/
|
|
|
| |
I think we've now fixed enough of these that the rest can be logged at
warning.
|
|
|
|
|
| |
It annoys me that we create temporary function objects when there's really no
need for it. Let's factor the gubbins out of preserve_fn and start using it.
|
|
|
|
|
| |
... because (a) it's actually simpler (b) it might be marginally more
performant?
|
|
|
|
|
|
| |
For each request, track the amount of time spent waiting for a db
connection. This entails adding it to the LoggingContext and we may as well add
metrics for it while we are passing.
|
|
|
|
| |
... to reduce the amount of floating-point foo we do.
|
|
|
|
|
|
|
|
| |
It turns out that the only thing we use the __dict__ of LoggingContext for is
`request`, and given we create lots of LoggingContexts and then copy them every
time we do a db transaction or log line, using the __dict__ seems a bit
redundant. Let's try to optimise things by making the request attribute
explicit.
|
| |
|
|
|
|
|
| |
Both of these functions ae known to leak logcontexts. Replace the remaining
calls to them and kill them off.
|
|
|
|
| |
what could possibly go wrong
|
|
|
|
|
| |
`preserve_fn` is no longer used as a decorator anywhere, so we can safely fix a
fixme therein.
|
|
|
|
|
|
|
| |
The cache wrappers had a habit of leaking the logcontext into the reactor while
the lookup function was running, and then not restoring it correctly when the
lookup function had completed. It's all the fault of
`preserve_context_over_{fn,deferred}` which are basically a bit broken.
|
|\
| |
| | |
Logcontext docs
|
| | |
|
|/
|
|
|
|
|
|
| |
Fix a bug in ``logcontext.preserve_fn`` which made it leak context into the
reactor, and add a test for it.
Also, get rid of ``logcontext.reset_context_after_deferred``, which tried to do
the same thing but had its own, different, set of bugs.
|
|
|
|
|
| |
to correctly reset the context when we fire off a deferred we aren't going to
wait for.
|
| |
|
| |
|
| |
|
| |
|
| |
|
| |
|
| |
|
| |
|
| |
|
| |
|
|
|
|
| |
and track the number of transactions that request started.
|
|
|
|
|
|
| |
Move the resource tracking inside that setter so that it is easier
to make sure that the resource tracking isn't double counting the
resource usage.
|
| |
|
| |
|
| |
|
| |
|
| |
|
| |
|
| |
|
| |
|
|
|
|
| |
wrong and DEBUG if the context is missing
|
|
|
|
| |
clients shouldn't cause ERROR level logging. Fix sql logging to use 'repr' rather than 'str'
|
|
|
|
| |
after it has been stashed to reduce potential for confusion
|
| |
|
|
|