From ca15a1d699c4a63d0149de0a14e7d26403efbcce Mon Sep 17 00:00:00 2001 From: babolivier Date: Thu, 15 Jul 2021 10:43:25 +0000 Subject: deploy: 5ecad4e7a57610baa55f64f1389b92d483716155 --- develop/log_contexts.html | 306 +++++++++++++++------------------------------- 1 file changed, 101 insertions(+), 205 deletions(-) (limited to 'develop/log_contexts.html') diff --git a/develop/log_contexts.html b/develop/log_contexts.html index f1cdfc489c..d5db339955 100644 --- a/develop/log_contexts.html +++ b/develop/log_contexts.html @@ -194,10 +194,13 @@ activity.

The synapse.logging.context module provides a facilities for managing the current log context (as well as providing the LoggingContextFilter class).

-

Deferreds make the whole thing complicated, so this document describes +

Asynchronous functions make the whole thing complicated, so this document describes how it all works, and how to write code which follows the rules.

-

Logcontexts without Deferreds

-

In the absence of any Deferred voodoo, things are simple enough. As with +

In this document, "awaitable" refers to any object which can be awaited. In the context of +Synapse, that normally means either a coroutine or a Twisted +Deferred.

+

Logcontexts without asynchronous code

+

In the absence of any asynchronous voodoo, things are simple enough. As with any code of this nature, the rule is that our function should leave things as it found them:

from synapse.logging import context         # omitted from future snippets
@@ -227,39 +230,36 @@ can be written much more succinctly as:

def do_request_handling(): logger.debug("phew")
-

Using logcontexts with Deferreds

-

Deferreds --- and in particular, defer.inlineCallbacks --- break the -linear flow of code so that there is no longer a single entry point -where we should set the logcontext and a single exit point where we -should remove it.

+

Using logcontexts with awaitables

+

Awaitables break the linear flow of code so that there is no longer a single entry point +where we should set the logcontext and a single exit point where we should remove it.

Consider the example above, where do_request_handling needs to do some -blocking operation, and returns a deferred:

-
@defer.inlineCallbacks
-def handle_request(request_id):
+blocking operation, and returns an awaitable:

+
async def handle_request(request_id):
     with context.LoggingContext() as request_context:
         request_context.request = request_id
-        yield do_request_handling()
+        await do_request_handling()
         logger.debug("finished")
 

In the above flow:

  • The logcontext is set
  • -
  • do_request_handling is called, and returns a deferred
  • -
  • handle_request yields the deferred
  • -
  • The inlineCallbacks wrapper of handle_request returns a deferred
  • +
  • do_request_handling is called, and returns an awaitable
  • +
  • handle_request awaits the awaitable
  • +
  • Execution of handle_request is suspended

So we have stopped processing the request (and will probably go on to start processing the next), without clearing the logcontext.

To circumvent this problem, synapse code assumes that, wherever you have -a deferred, you will want to yield on it. To that end, whereever -functions return a deferred, we adopt the following conventions:

-

Rules for functions returning deferreds:

+an awaitable, you will want to await it. To that end, whereever +functions return awaitables, we adopt the following conventions:

+

Rules for functions returning awaitables:

    -
  • If the deferred is already complete, the function returns with the +
  • If the awaitable is already complete, the function returns with the same logcontext it started with.
  • -
  • If the deferred is incomplete, the function clears the logcontext -before returning; when the deferred completes, it restores the +
  • If the awaitable is incomplete, the function clears the logcontext +before returning; when the awaitable completes, it restores the logcontext before running any callbacks.
@@ -267,72 +267,60 @@ logcontext before running any callbacks. the example above) "just works". There are two cases:

  • -

    If do_request_handling returns a completed deferred, then the +

    If do_request_handling returns a completed awaitable, then the logcontext will still be in place. In this case, execution will -continue immediately after the yield; the "finished" line will +continue immediately after the await; the "finished" line will be logged against the right context, and the with block restores the original context before we return to the caller.

  • -

    If the returned deferred is incomplete, do_request_handling clears +

    If the returned awaitable is incomplete, do_request_handling clears the logcontext before returning. The logcontext is therefore clear -when handle_request yields the deferred. At that point, the -inlineCallbacks wrapper adds a callback to the deferred, and -returns another (incomplete) deferred to the caller, and it is safe -to begin processing the next request.

    -

    Once do_request_handling's deferred completes, it will reinstate -the logcontext, before running the callback added by the -inlineCallbacks wrapper. That callback runs the second half of -handle_request, so again the "finished" line will be logged -against the right context, and the with block restores the -original context.

    +when handle_request awaits the awaitable.

    +

    Once do_request_handling's awaitable completes, it will reinstate +the logcontext, before running the second half of handle_request, +so again the "finished" line will be logged against the right context, +and the with block restores the original context.

-

As an aside, it's worth noting that handle_request follows our rules --though that only matters if the caller has its own logcontext which it -cares about.

+

As an aside, it's worth noting that handle_request follows our rules

+
    +
  • though that only matters if the caller has its own logcontext which it +cares about.
  • +

The following sections describe pitfalls and helpful patterns when implementing these rules.

-

Always yield your deferreds

-

Whenever you get a deferred back from a function, you should yield on -it as soon as possible. (Returning it directly to your caller is ok too, -if you're not doing inlineCallbacks.) Do not pass go; do not do any -logging; do not call any other functions.

-
@defer.inlineCallbacks
-def fun():
+

Always await your awaitables

+

Whenever you get an awaitable back from a function, you should await on +it as soon as possible. Do not pass go; do not do any logging; do not +call any other functions.

+
async def fun():
     logger.debug("starting")
-    yield do_some_stuff()       # just like this
+    await do_some_stuff()       # just like this
 
-    d = more_stuff()
-    result = yield d            # also fine, of course
+    coro = more_stuff()
+    result = await coro         # also fine, of course
 
     return result
-
-def nonInlineCallbacksFun():
-    logger.debug("just a wrapper really")
-    return do_some_stuff()      # this is ok too - the caller will yield on
-                                # it anyway.
 

Provided this pattern is followed all the way back up to the callchain to where the logcontext was set, this will make things work out ok: provided do_some_stuff and more_stuff follow the rules above, then -so will fun (as wrapped by inlineCallbacks) and -nonInlineCallbacksFun.

-

It's all too easy to forget to yield: for instance if we forgot that -do_some_stuff returned a deferred, we might plough on regardless. This +so will fun.

+

It's all too easy to forget to await: for instance if we forgot that +do_some_stuff returned an awaitable, we might plough on regardless. This leads to a mess; it will probably work itself out eventually, but not before a load of stuff has been logged against the wrong context. (Normally, other things will break, more obviously, if you forget to -yield, so this tends not to be a major problem in practice.)

+await, so this tends not to be a major problem in practice.)

Of course sometimes you need to do something a bit fancier with your -Deferreds - not all code follows the linear A-then-B-then-C pattern. +awaitable - not all code follows the linear A-then-B-then-C pattern. Notes on implementing more complex patterns are in later sections.

-

Where you create a new Deferred, make it follow the rules

-

Most of the time, a Deferred comes from another synapse function. -Sometimes, though, we need to make up a new Deferred, or we get a -Deferred back from external code. We need to make it follow our rules.

-

The easy way to do it is with a combination of defer.inlineCallbacks, -and context.PreserveLoggingContext. Suppose we want to implement +

Where you create a new awaitable, make it follow the rules

+

Most of the time, an awaitable comes from another synapse function. +Sometimes, though, we need to make up a new awaitable, or we get an awaitable +back from external code. We need to make it follow our rules.

+

The easy way to do it is by using context.make_deferred_yieldable. Suppose we want to implement sleep, which returns a deferred which will run its callbacks after a given number of seconds. That might look like:

# not a logcontext-rules-compliant function
@@ -341,37 +329,26 @@ def get_sleep_deferred(seconds):
     reactor.callLater(seconds, d.callback, None)
     return d
 
-

That doesn't follow the rules, but we can fix it by wrapping it with -PreserveLoggingContext and yield ing on it:

-
@defer.inlineCallbacks
-def sleep(seconds):
-    with PreserveLoggingContext():
-        yield get_sleep_deferred(seconds)
-
-

This technique works equally for external functions which return -deferreds, or deferreds we have made ourselves.

-

You can also use context.make_deferred_yieldable, which just does the -boilerplate for you, so the above could be written:

-
def sleep(seconds):
-    return context.make_deferred_yieldable(get_sleep_deferred(seconds))
+

That doesn't follow the rules, but we can fix it by calling it through +context.make_deferred_yieldable:

+
async def sleep(seconds):
+    return await context.make_deferred_yieldable(get_sleep_deferred(seconds))
 

Fire-and-forget

Sometimes you want to fire off a chain of execution, but not wait for its result. That might look a bit like this:

-
@defer.inlineCallbacks
-def do_request_handling():
-    yield foreground_operation()
+
async def do_request_handling():
+    await foreground_operation()
 
     # *don't* do this
     background_operation()
 
     logger.debug("Request handling complete")
 
-@defer.inlineCallbacks
-def background_operation():
-    yield first_background_step()
+async def background_operation():
+    await first_background_step()
     logger.debug("Completed first step")
-    yield second_background_step()
+    await second_background_step()
     logger.debug("Completed second step")
 

The above code does a couple of steps in the background after @@ -379,12 +356,12 @@ def background_operation(): against the request_context logcontext, which may or may not be desirable. There are two big problems with the above, however. The first problem is that, if background_operation returns an incomplete -Deferred, it will expect its caller to yield immediately, so will have +awaitable, it will expect its caller to await immediately, so will have cleared the logcontext. In this example, that means that 'Request handling complete' will be logged without any context.

The second problem, which is potentially even worse, is that when the -Deferred returned by background_operation completes, it will restore -the original logcontext. There is nothing waiting on that Deferred, so +awaitable returned by background_operation completes, it will restore +the original logcontext. There is nothing waiting on that awaitable, so the logcontext will leak into the reactor and possibly get attached to some arbitrary future operation.

There are two potential solutions to this.

@@ -393,9 +370,8 @@ some arbitrary future operation.

starting background_operation (so the context restored when the deferred completes will be the empty logcontext), and will restore the current logcontext before continuing the foreground process:

-
@defer.inlineCallbacks
-def do_request_handling():
-    yield foreground_operation()
+
async def do_request_handling():
+    await foreground_operation()
 
     # start background_operation off in the empty logcontext, to
     # avoid leaking the current context into the reactor.
@@ -411,14 +387,13 @@ def do_request_handling():
 with LoggingContext(...) in background_operation).

The second option is to use context.run_in_background, which wraps a function so that it doesn't reset the logcontext even when it returns -an incomplete deferred, and adds a callback to the returned deferred to +an incomplete awaitable, and adds a callback to the returned awaitable to reset the logcontext. In other words, it turns a function that follows -the Synapse rules about logcontexts and Deferreds into one which behaves +the Synapse rules about logcontexts and awaitables into one which behaves more like an external function --- the opposite operation to that described in the previous section. It can be used like this:

-
@defer.inlineCallbacks
-def do_request_handling():
-    yield foreground_operation()
+
async def do_request_handling():
+    await foreground_operation()
 
     context.run_in_background(background_operation)
 
@@ -427,120 +402,41 @@ def do_request_handling():
 

Passing synapse deferreds into third-party functions

A typical example of this is where we want to collect together two or -more deferred via defer.gatherResults:

-
d1 = operation1()
-d2 = operation2()
-d3 = defer.gatherResults([d1, d2])
+more awaitables via defer.gatherResults:

+
a1 = operation1()
+a2 = operation2()
+a3 = defer.gatherResults([a1, a2])
 

This is really a variation of the fire-and-forget problem above, in that -we are firing off d1 and d2 without yielding on them. The difference +we are firing off a1 and a2 without awaiting on them. The difference is that we now have third-party code attached to their callbacks. Anyway either technique given in the Fire-and-forget section will work.

-

Of course, the new Deferred returned by gatherResults needs to be +

Of course, the new awaitable returned by gather needs to be wrapped in order to make it follow the logcontext rules before we can -yield it, as described in Where you create a new Deferred, make it +yield it, as described in Where you create a new awaitable, make it follow the rules.

So, option one: reset the logcontext before starting the operations to be gathered:

-
@defer.inlineCallbacks
-def do_request_handling():
+
async def do_request_handling():
     with PreserveLoggingContext():
-        d1 = operation1()
-        d2 = operation2()
-        result = yield defer.gatherResults([d1, d2])
+        a1 = operation1()
+        a2 = operation2()
+        result = await defer.gatherResults([a1, a2])
 

In this case particularly, though, option two, of using -context.preserve_fn almost certainly makes more sense, so that +context.run_in_background almost certainly makes more sense, so that operation1 and operation2 are both logged against the original logcontext. This looks like:

-
@defer.inlineCallbacks
-def do_request_handling():
-    d1 = context.preserve_fn(operation1)()
-    d2 = context.preserve_fn(operation2)()
+
async def do_request_handling():
+    a1 = context.run_in_background(operation1)
+    a2 = context.run_in_background(operation2)
 
-    with PreserveLoggingContext():
-        result = yield defer.gatherResults([d1, d2])
-
-

Was all this really necessary?

-

The conventions used work fine for a linear flow where everything -happens in series via defer.inlineCallbacks and yield, but are -certainly tricky to follow for any more exotic flows. It's hard not to -wonder if we could have done something else.

-

We're not going to rewrite Synapse now, so the following is entirely of -academic interest, but I'd like to record some thoughts on an -alternative approach.

-

I briefly prototyped some code following an alternative set of rules. I -think it would work, but I certainly didn't get as far as thinking how -it would interact with concepts as complicated as the cache descriptors.

-

My alternative rules were:

-
    -
  • functions always preserve the logcontext of their caller, whether or -not they are returning a Deferred.
  • -
  • Deferreds returned by synapse functions run their callbacks in the -same context as the function was orignally called in.
  • -
-

The main point of this scheme is that everywhere that sets the -logcontext is responsible for clearing it before returning control to -the reactor.

-

So, for example, if you were the function which started a -with LoggingContext block, you wouldn't yield within it --- instead -you'd start off the background process, and then leave the with block -to wait for it:

-
def handle_request(request_id):
-    with context.LoggingContext() as request_context:
-        request_context.request = request_id
-        d = do_request_handling()
-
-    def cb(r):
-        logger.debug("finished")
-
-    d.addCallback(cb)
-    return d
-
-

(in general, mixing with LoggingContext blocks and -defer.inlineCallbacks in the same function leads to slighly -counter-intuitive code, under this scheme).

-

Because we leave the original with block as soon as the Deferred is -returned (as opposed to waiting for it to be resolved, as we do today), -the logcontext is cleared before control passes back to the reactor; so -if there is some code within do_request_handling which needs to wait -for a Deferred to complete, there is no need for it to worry about -clearing the logcontext before doing so:

-
def handle_request():
-    r = do_some_stuff()
-    r.addCallback(do_some_more_stuff)
-    return r
-
-

--- and provided do_some_stuff follows the rules of returning a -Deferred which runs its callbacks in the original logcontext, all is -happy.

-

The business of a Deferred which runs its callbacks in the original -logcontext isn't hard to achieve --- we have it today, in the shape of -context._PreservingContextDeferred:

-
def do_some_stuff():
-    deferred = do_some_io()
-    pcd = _PreservingContextDeferred(LoggingContext.current_context())
-    deferred.chainDeferred(pcd)
-    return pcd
-
-

It turns out that, thanks to the way that Deferreds chain together, we -automatically get the property of a context-preserving deferred with -defer.inlineCallbacks, provided the final Defered the function -yields on has that property. So we can just write:

-
@defer.inlineCallbacks
-def handle_request():
-    yield do_some_stuff()
-    yield do_some_more_stuff()
+    result = await make_deferred_yieldable(defer.gatherResults([a1, a2]))
 
-

To conclude: I think this scheme would have worked equally well, with -less danger of messing it up, and probably made some more esoteric code -easier to write. But again --- changing the conventions of the entire -Synapse codebase is not a sensible option for the marginal improvement -offered.

-

A note on garbage-collection of Deferred chains

-

It turns out that our logcontext rules do not play nicely with Deferred +

A note on garbage-collection of awaitable chains

+

It turns out that our logcontext rules do not play nicely with awaitable chains which get orphaned and garbage-collected.

Imagine we have some code that looks like this:

listener_queue = []
@@ -549,13 +445,12 @@ def on_something_interesting():
     for d in listener_queue:
         d.callback("foo")
 
-@defer.inlineCallbacks
-def await_something_interesting():
-    new_deferred = defer.Deferred()
-    listener_queue.append(new_deferred)
+async def await_something_interesting():
+    new_awaitable = defer.Deferred()
+    listener_queue.append(new_awaitable)
 
     with PreserveLoggingContext():
-        yield new_deferred
+        await new_awaitable
 

Obviously, the idea here is that we have a bunch of things which are waiting for an event. (It's just an example of the problem here, but a @@ -568,18 +463,19 @@ never going to happen, and we reset the listener queue:

def reset_listener_queue():
     listener_queue.clear()
 
-

So, both ends of the deferred chain have now dropped their references, -and the deferred chain is now orphaned, and will be garbage-collected at -some point. Note that await_something_interesting is a generator -function, and when Python garbage-collects generator functions, it gives -them a chance to clean up by making the yield raise a GeneratorExit +

So, both ends of the awaitable chain have now dropped their references, +and the awaitable chain is now orphaned, and will be garbage-collected at +some point. Note that await_something_interesting is a coroutine, +which Python implements as a generator function. When Python +garbage-collects generator functions, it gives them a chance to +clean up by making the async (or yield) raise a GeneratorExit exception. In our case, that means that the __exit__ handler of PreserveLoggingContext will carefully restore the request context, but there is now nothing waiting for its return, so the request context is never cleared.

-

To reiterate, this problem only arises when both ends of a deferred -chain are dropped. Dropping the the reference to a deferred you're -supposed to be calling is probably bad practice, so this doesn't +

To reiterate, this problem only arises when both ends of a awaitable +chain are dropped. Dropping the the reference to an awaitable you're +supposed to be awaiting is bad practice, so this doesn't actually happen too much. Unfortunately, when it does happen, it will lead to leaked logcontexts which are incredibly hard to track down.

-- cgit 1.5.1