| Commit message (Collapse) | Author | Age | Files | Lines |
|
|
|
|
|
|
| |
A batch of changes intended to make it easier to trace to-device messages through the system.
The intention here is that a client can set a property org.matrix.msgid in any to-device message it sends. That ID is then included in any tracing or logging related to the message. (Suggestions as to where this field should be documented welcome. I'm not enthusiastic about speccing it - it's very much an optional extra to help with debugging.)
I've also generally improved the data we send to opentracing for these messages.
|
|
|
|
|
|
|
| |
Remove type hints from comments which have been added
as Python type hints. This helps avoid drift between comments
and reality, as well as removing redundant information.
Also adds some missing type hints which were simple to fill in.
|
|
|
| |
Co-authored-by: Patrick Cloke <clokep@users.noreply.github.com>
|
|
|
|
| |
This reverts commit 6d543d6d9f56e39199b7e460d0081b02d61f12be.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
* Update mypy and mypy-zope
* Unignore assigning to LogRecord attributes
Presumably https://github.com/python/typeshed/pull/8064 makes this ok
Cherry-picked from #13521
* Remove unused ignores due to mypy ParamSpec fixes
https://github.com/python/mypy/pull/12668
Cherry-picked from #13521
* Remove additional unused ignores
* Fix new mypy complaints related to `assertGreater`
Presumably due to https://github.com/python/typeshed/pull/8077
* Changelog
* Reword changelog
Co-authored-by: Patrick Cloke <clokep@users.noreply.github.com>
Co-authored-by: Patrick Cloke <clokep@users.noreply.github.com>
|
|
|
|
|
|
|
|
| |
We tag the Synapse instance name so that it's an easy jumping off point into the logs. Can also be used to filter for an instance that is under load.
As suggested by @clokep and @reivilibre in,
- https://github.com/matrix-org/synapse/pull/13729#discussion_r964719258
- https://github.com/matrix-org/synapse/pull/13729#discussion_r964733578
|
|
|
|
|
|
|
|
|
|
|
|
| |
are spread over (#13729)
The problem with many services is that it makes it hard to find which service has the trace you want, see https://github.com/jaegertracing/jaeger-ui/issues/985
Previously, we split traces out into services based on their instance name like `matrix.org client_reader-1`, etc but there are many worker instances of the same `client_reader` so there is a lot to click through.
With this PR, all of the traces are just collected under the worker type like `client_reader`, `event_persister` 😇
Note: A Synapse worker instance name is an opaque string with the number convention only being our own thing for the `matrix.org` deployment. But seems pretty sensible to group things this way.
|
|
|
|
|
|
|
|
|
| |
Instrument the federation/backfill part of `/messages` so it's easier to follow what's going on in Jaeger when viewing a trace.
Split out from https://github.com/matrix-org/synapse/pull/13440
Follow-up from https://github.com/matrix-org/synapse/pull/13368
Part of https://github.com/matrix-org/synapse/issues/13356
|
|
|
|
|
|
|
|
| |
This reverts commit f383b9b3eceaa082d5ae690550fe41460b711779. Other PRs
were seeing mypy failures that looked to be related to mypy-zope.
Confusingly, we didn't see this on #13521.
Revert this for now and investigate later.
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
```py
@trace
@tag_args
async def get_oldest_event_ids_with_depth_in_room(...)
...
```
Before this PR, you would see a warning in the logs and the span was not exported:
```
2022-08-03 19:11:59,383 - synapse.logging.opentracing - 835 - ERROR - GET-0 - @trace may not have wrapped EventFederationWorkerStore.get_oldest_event_ids_with_depth_in_room correctly! The function is not async but returned a coroutine.
```
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Fix @tag_args being off-by-one (ahead)
Example:
```
argspec.args=[
'self',
'room_id'
]
args=(
<synapse.storage.databases.main.DataStore object at 0x10d0b8d00>,
'!HBehERstyQBxyJDLfR:my.synapse.server'
)
```
---
The previous logic was also flawed and we can end up in a situation like this:
```
argspec.args=['self', 'dest', 'room_id', 'limit', 'extremities']
args=(<synapse.federation.federation_client.FederationClient object at 0x7f1651c18160>, 'hs1', '!jAEHKIubyIfuLOdfpY:hs1')
```
From this source:
```py
async def backfill(
self, dest: str, room_id: str, limit: int, extremities: Collection[str]
) -> Optional[List[EventBase]]:
```
And this usage:
```py
events = await self._federation_client.backfill(
dest, room_id, limit=limit, extremities=extremities
)
```
which would previously cause this error:
```
synapse_main | 2022-08-04 06:13:12,051 - synapse.handlers.federation - 424 - ERROR - GET-5 - Failed to backfill from hs1 because tuple index out of range
synapse_main | Traceback (most recent call last):
synapse_main | File "/usr/local/lib/python3.9/site-packages/synapse/handlers/federation.py", line 392, in try_backfill
synapse_main | await self._federation_event_handler.backfill(
synapse_main | File "/usr/local/lib/python3.9/site-packages/synapse/logging/tracing.py", line 828, in _wrapper
synapse_main | return await func(*args, **kwargs)
synapse_main | File "/usr/local/lib/python3.9/site-packages/synapse/handlers/federation_event.py", line 593, in backfill
synapse_main | events = await self._federation_client.backfill(
synapse_main | File "/usr/local/lib/python3.9/site-packages/synapse/logging/tracing.py", line 828, in _wrapper
synapse_main | return await func(*args, **kwargs)
synapse_main | File "/usr/local/lib/python3.9/site-packages/synapse/logging/tracing.py", line 827, in _wrapper
synapse_main | with wrapping_logic(func, *args, **kwargs):
synapse_main | File "/usr/local/lib/python3.9/contextlib.py", line 119, in __enter__
synapse_main | return next(self.gen)
synapse_main | File "/usr/local/lib/python3.9/site-packages/synapse/logging/tracing.py", line 922, in _wrapping_logic
synapse_main | set_attribute("ARG_" + arg, str(args[i + 1])) # type: ignore[index]
synapse_main | IndexError: tuple index out of range
```
|
| |
|
|
|
| |
After this change `synapse.logging` is fully typed.
|
|
|
|
| |
Functions that are decorated with `trace` are now properly typed
and the type hints for them are fixed.
|
|
|
|
| |
Completes type hints for synapse.logging.scopecontextmanager and (partially)
for synapse.logging.opentracing.
|
| |
|
|
|
|
|
| |
getClientIP was deprecated in Twisted 18.4.0, which also added
getClientAddress. The Synapse minimum version for Twisted is
currently 18.9.0, so all supported versions have the new API.
|
| |
|
|
|
|
|
|
|
|
|
| |
The idea here is to set the parent span for incoming federation requests to the
*outgoing* span on the other end. That means that you can see (most of) the
full end-to-end flow when you have a process that includes federation requests.
However, in order not to lose information, we still want a link to the
`incoming-federation-request` span from the servlet, so we have to create
another span to do exactly that.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
`start_active_span` was inconsistent as to whether it would activate the span
immediately, or wait for `scope.__enter__` to happen (it depended on whether
the current logcontext already had an associated scope). The inconsistency was
rather confusing if you were hoping to set up a couple of separate spans before
activating either.
Looking at the other implementations of opentracing `ScopeManager`s, the
intention is that it *should* be activated immediately, as the name
implies. Indeed, the idea is that you don't have to use the scope as a
contextmanager at all - you can just call `.close` on the result. Hence, our
cleanup has to happen in `.close` rather than `.__exit__`.
So, the main change here is to ensure that `start_active_span` does activate
the span, and that `scope.close()` does close the scope.
We also add some tests, which requires a `tracer` param so that we don't have
to rely on the global variable in unit tests.
|
| |
|
| |
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
* remove `start_active_span_from_request`
Instead, pull out a separate function, `span_context_from_request`, to extract
the parent span, which we can then pass into `start_active_span` as
normal. This seems to be clearer all round.
* Remove redundant tags from `incoming-federation-request`
These are all wrapped up inside a parent span generated in AsyncResource, so
there's no point duplicating all the tags that are set there.
* Leave request spans open until the request completes
It may take some time for the response to be encoded into JSON, and that JSON
to be streamed back to the client, and really we want that inside the top-level
span, so let's hand responsibility for closure to the SynapseRequest.
* opentracing logs for HTTP request events
* changelog
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This commit fixes two bugs to do with decorators not instrumenting
`ReplicationEndpoint`'s `send_request` correctly. There are two
decorators on `send_request`: Prometheus' `Gauge.track_inprogress()`
and Synapse's `opentracing.trace`.
`Gauge.track_inprogress()` does not have any support for async
functions when used as a decorator. Since async functions behave like
regular functions that return coroutines, only the creation of the
coroutine was covered by the metric and none of the actual body of
`send_request`.
`Gauge.track_inprogress()` returns a regular, non-async function
wrapping `send_request`, which is the source of the next bug.
The `opentracing.trace` decorator would normally handle async functions
correctly, but since the wrapped `send_request` is a non-async function,
the decorator ends up suffering from the same issue as
`Gauge.track_inprogress()`: the opentracing span only measures the
creation of the coroutine and none of the actual function body.
Using `Gauge.track_inprogress()` as a context manager instead of a
decorator resolves both bugs.
|
|
|
|
| |
Make it easier to track down where opentracing spans are going missing
by including stack traces in the logs.
|
| |
|
|
|
|
| |
Instead of proxying through the magic getter of the RootConfig
object. This should be more performant (and is more explicit).
|
|
|
|
| |
This fixes mypy errors with jaeger-client >= 4.7.0 and should be a no-op
for versions before that.
|
|
|
|
|
|
|
|
|
| |
This PR is tantamount to running
```
pyupgrade --py36-plus --keep-percent-format `find synapse/ -type f -name "*.py"`
```
Part of #9744
|
| |
|
|
|
| |
Fixes: #9480
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
* Trace event persistence
When we persist a batch of events, set the parent opentracing span to the that
from the request, so that we can trace all the way in.
* changelog
* When we force tracing, set a baggage item
... so that we can check again later.
* Link in both directions between persist_events spans
|
|
|
|
|
|
|
| |
* Remove unused helper functions
* Clean up the interface for injecting opentracing over HTTP
* changelog
|
|
|
|
| |
... otherwise we tend to get a namespace clash between the bg process and the
functions that it calls.
|
| |
|
|
|
|
|
|
|
|
|
|
| |
This adds quite a lot of OpenTracing decoration for database activity. Specifically it adds tracing at four different levels:
* emit a span for each "interaction" - ie, the top level database function that we tend to call "transaction", but isn't really, because it can end up as multiple transactions.
* emit a span while we hold a database connection open
* emit a span for each database transaction - actual actual transaction.
* emit a span for each database query.
I'm aware this might be quite a lot of overhead, but even just running it on a local Synapse it looks really interesting, and I hope the overhead can be offset just by turning down the sampling frequency and finding other ways of tracing requests of interest (eg, the `force_tracing_for_users` setting).
|
|
|
| |
Signed-off-by: Brad Murray brad@beeper.com
|
|
|
| |
... because tags on spans which aren't being sampled get thrown away.
|
|
|
|
|
|
|
| |
Part of #9744
Removes all redundant `# -*- coding: utf-8 -*-` lines from files, as python 3 automatically reads source code as utf-8 now.
`Signed-off-by: Jonathan de Jong <jonathan@automatia.nl>`
|
|
|
|
|
|
|
| |
Part of #9366
Adds in fixes for B006 and B008, both relating to mutable parameter lint errors.
Signed-off-by: Jonathan de Jong <jonathan@automatia.nl>
|
| |
|
| |
|
|
|
|
|
|
|
|
|
| |
Running `dmypy run` will do a `mypy` check while spinning up a daemon
that makes rerunning `dmypy run` a lot faster.
`dmypy` doesn't support `follow_imports = silent` and has
`local_partial_types` enabled, so this PR enables those options and
fixes the issues that were newly raised. Note that `local_partial_types`
will be enabled by default in upcoming mypy releases.
|
|
|
|
|
|
|
| |
- Update black version to the latest
- Run black auto formatting over the codebase
- Run autoformatting according to [`docs/code_style.md
`](https://github.com/matrix-org/synapse/blob/80d6dc9783aa80886a133756028984dbf8920168/docs/code_style.md)
- Update `code_style.md` docs around installing black to use the correct version
|
|
|
| |
This finishes adding type hints to the `synapse.handlers` module.
|
|
|
|
|
|
|
| |
#8567 started a span for every background process. This is good as it means all Synapse code that gets run should be in a span (unless in the sentinel logging context), but it means we generate about 15x the number of spans as we did previously.
This PR attempts to reduce that number by a) not starting one for send commands to Redis, and b) deferring starting background processes until after we're sure they're necessary.
I don't really know how much this will help.
|
| |
|
| |
|
| |
|
| |
|
| |
|
|
|
| |
Co-authored-by: Richard van der Hoff <1389908+richvdh@users.noreply.github.com>
|
|
|
| |
The CI appears to use the latest version of isort, which is a problem when isort gets a major version bump. Rather than try to pin the version, I've done the necessary to make isort5 happy with synapse.
|
| |
|
| |
|
|
|
|
|
| |
This is primarily for allowing us to send those commands from workers, but for now simply allows us to ignore echoed RDATA/POSITION commands that we sent (we get echoes of sent commands when using redis). Currently we log a WARNING on the master process every time we receive an echoed RDATA.
|
| |
|
|
|
|
|
|
| |
* type checking fixes
* changelog
|
| |
|
| |
|
|
|
|
| |
(#5988)
|
| |
|
| |
|
|
|
| |
Trace device list changes.
|
|
|
|
|
| |
Propagate opentracing contexts through EDUs
Co-Authored-By: Richard van der Hoff <1389908+richvdh@users.noreply.github.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Propagate opentracing contexts across workers
Also includes some Convenience modifications to opentracing for servlets, notably:
- Add boolean to skip the whitelisting check on inject
extract methods. - useful when injecting into carriers
locally. Otherwise we'd always have to include our
own servername and whitelist our servername
- start_active_span_from_request instead of header
- Add boolean to decide whether to extract context
from a request to a servlet
|
|
|
| |
Update opentracing docs to use the unified 'trace' method
|
|
|
|
|
|
| |
* Allow Jaeger to be configured
* Update sample config
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
* Add decerators for tracing functions
* Use the new clean contexts
* Context and edu utils
* Move opentracing setters
* Move whitelisting
* Sectioning comments
* Better args wrapper
* Docstrings
Co-Authored-By: Erik Johnston <erik@matrix.org>
* Remove unused methods.
* Don't use global
* One tracing decorator to rule them all.
|
|\ |
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
* Opentracing survival guide
* Update decorator names in doc
* Doc cleanup
These are all alterations as a result of comments in #5703, it
includes mostly typos and clarifications. The most interesting
changes are:
- Split developer and user docs into two sections
- Add a high level description of OpenTracing
* newsfile
* Move contributer specific info to docstring.
* Sample config.
* Trailing whitespace.
* Update 5703.misc
* Apply suggestions from code review
Mostly just rewording parts of the docs for clarity.
Co-Authored-By: Richard van der Hoff <1389908+richvdh@users.noreply.github.com>
|
|/
|
|
|
|
| |
* Don't accept opentracing data from clients.
* newsfile
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Clean up config settings and dead code.
This is mostly about cleaning up the config format, to bring it into line with our conventions. In particular:
* There should be a blank line after `## Section ##' headings
* There should be a blank line between each config setting
* There should be a `#`-only line between a comment and the setting it describes
* We don't really do the `# #` style commenting-out of whole sections if we can help it
* rename `tracer_enabled` to `enabled`
While we're here, do more config parsing upfront, which makes it easier to use
later on.
Also removes redundant code from LogContextScopeManager.
Also changes the changelog fragment to a `feature` - it's exciting!
|
|
* Configure and initialise tracer
Includes config options for the tracer and sets up JaegerClient.
* Scope manager using LogContexts
We piggy-back our tracer scopes by using log context.
The current log context gives us the current scope. If new scope is
created we create a stack of scopes in the context.
* jaeger is a dependency now
* Carrier inject and extraction for Twisted Headers
* Trace federation requests on the way in and out.
The span is created in _started_processing and closed in
_finished_processing because we need a meaningful log context.
* Create logcontext for new scope.
Instead of having a stack of scopes in a logcontext we create a new
context for a new scope if the current logcontext already has a scope.
* Remove scope from logcontext if logcontext is top level
* Disable tracer if not configured
* typo
* Remove dependence on jaeger internals
* bools
* Set service name
* :Explicitely state that the tracer is disabled
* Black is the new black
* Newsfile
* Code style
* Use the new config setup.
* Generate config.
* Copyright
* Rename config to opentracing
* Remove user whitelisting
* Empty whitelist by default
* User ConfigError instead of RuntimeError
* Use isinstance
* Use tag constants for opentracing.
* Remove debug comment and no need to explicitely record error
* Two errors a "s(c)entry"
* Docstrings!
* Remove debugging brainslip
* Homeserver Whitlisting
* Better opentracing config comment
* linting
* Inclue worker name in service_name
* Make opentracing an optional dependency
* Neater config retreival
* Clean up dummy tags
* Instantiate tracing as object instead of global class
* Inlcude opentracing as a homeserver member.
* Thread opentracing to the request level
* Reference opetnracing through hs
* Instantiate dummy opentracin g for tests.
* About to revert, just keeping the unfinished changes just in case
* Revert back to global state, commit number:
9ce4a3d9067bf9889b86c360c05ac88618b85c4f
* Use class level methods in tracerutils
* Start and stop requests spans in a place where we
have access to the authenticated entity
* Seen it, isort it
* Make sure to close the active span.
* I'm getting black and blue from this.
* Logger formatting
Co-Authored-By: Erik Johnston <erik@matrix.org>
* Outdated comment
* Import opentracing at the top
* Return a contextmanager
* Start tracing client requests from the servlet
* Return noop context manager if not tracing
* Explicitely say that these are federation requests
* Include servlet name in client requests
* Use context manager
* Move opentracing to logging/
* Seen it, isort it again!
* Ignore twisted return exceptions on context exit
* Escape the scope
* Scopes should be entered to make them useful.
* Nicer decorator names
* Just one init, init?
* Don't need to close something that isn't open
* Docs make you smarter
|