author | David Robertson <davidr@element.io> | 2022-01-25 12:35:11 +0000 |
---|---|---|
committer | David Robertson <davidr@element.io> | 2022-01-25 12:35:11 +0000 |
commit | b500fcbc0c2721f0930a62ee1d3c83f4ff68c518 (patch) | |
tree | f462f3667102baeb28fa25ecacab0dd561ee45f6 /synapse/metrics/__init__.py | |
parent | 1.50.2 (diff) | |
parent | Correct version number (diff) | |
download | synapse-b500fcbc0c2721f0930a62ee1d3c83f4ff68c518.tar.xz |
Merge tag 'v1.51.0'
Synapse 1.51.0 (2022-01-25)
===========================

No significant changes since 1.51.0rc2.

Synapse 1.51.0 deprecates `webclient` listeners and non-HTTP(S) `web_client_location`s. Support for these will be removed in Synapse 1.53.0, at which point Synapse will not be capable of directly serving a web client for Matrix.

Synapse 1.51.0rc2 (2022-01-24)
==============================

Bugfixes
--------

- Fix a bug introduced in Synapse 1.40.0 that caused Synapse to fail to process incoming federation traffic after handling a large amount of events in a v1 room. ([\#11806](https://github.com/matrix-org/synapse/issues/11806))

Synapse 1.51.0rc1 (2022-01-21)
==============================

Features
--------

- Add `track_puppeted_user_ips` config flag to record client IP addresses against puppeted users, and include the puppeted users in monthly active user counts. ([\#11561](https://github.com/matrix-org/synapse/issues/11561), [\#11749](https://github.com/matrix-org/synapse/issues/11749), [\#11757](https://github.com/matrix-org/synapse/issues/11757))
- Include whether the requesting user has participated in a thread when generating a summary for [MSC3440](https://github.com/matrix-org/matrix-doc/pull/3440). ([\#11577](https://github.com/matrix-org/synapse/issues/11577))
- Return an `M_FORBIDDEN` error code instead of `M_UNKNOWN` when a spam checker module prevents a user from creating a room. ([\#11672](https://github.com/matrix-org/synapse/issues/11672))
- Add a flag to the `synapse_review_recent_signups` script to ignore and filter appservice users. ([\#11675](https://github.com/matrix-org/synapse/issues/11675), [\#11770](https://github.com/matrix-org/synapse/issues/11770))

Bugfixes
--------

- Fix a long-standing issue which could cause Synapse to incorrectly accept data in the unsigned field of events received over federation. ([\#11530](https://github.com/matrix-org/synapse/issues/11530))
- Fix a long-standing bug where Synapse wouldn't cache a response indicating that a remote user has no devices. ([\#11587](https://github.com/matrix-org/synapse/issues/11587))
- Fix an error that occurs whilst trying to get the federation status of a destination server that was working normally. This admin API was newly introduced in Synapse v1.49.0. ([\#11593](https://github.com/matrix-org/synapse/issues/11593))
- Fix bundled aggregations not being included in the `/sync` response, per [MSC2675](https://github.com/matrix-org/matrix-doc/pull/2675). ([\#11612](https://github.com/matrix-org/synapse/issues/11612), [\#11659](https://github.com/matrix-org/synapse/issues/11659), [\#11791](https://github.com/matrix-org/synapse/issues/11791))
- Fix the `/_matrix/client/v1/room/{roomId}/hierarchy` endpoint returning incorrect fields which have been present since Synapse 1.49.0. ([\#11667](https://github.com/matrix-org/synapse/issues/11667))
- Fix preview of some GIF URLs (like tenor.com). Contributed by Philippe Daouadi. ([\#11669](https://github.com/matrix-org/synapse/issues/11669))
- Fix a bug where only the first 50 rooms from a space were returned from the `/hierarchy` API. This has existed since the introduction of the API in Synapse v1.41.0. ([\#11695](https://github.com/matrix-org/synapse/issues/11695))
- Fix a bug introduced in Synapse v1.18.0 where password reset and address validation emails would not be sent if their subject was configured to use the 'app' template variable. Contributed by @br4nnigan. ([\#11710](https://github.com/matrix-org/synapse/issues/11710), [\#11745](https://github.com/matrix-org/synapse/issues/11745))
- Make the 'List Rooms' Admin API sort stable. Contributed by Daniël Sonck. ([\#11737](https://github.com/matrix-org/synapse/issues/11737))
- Fix a long-standing bug where space hierarchy over federation would only work correctly some of the time. ([\#11775](https://github.com/matrix-org/synapse/issues/11775))
- Fix a bug introduced in Synapse v1.46.0 that prevented `on_logged_out` module callbacks from being correctly awaited by Synapse. ([\#11786](https://github.com/matrix-org/synapse/issues/11786))

Improved Documentation
----------------------

- Warn against using a Let's Encrypt certificate for TLS/DTLS TURN server client connections, and suggest using ZeroSSL certificate instead. This works around client-side connectivity errors caused by WebRTC libraries that reject Let's Encrypt certificates. Contributed by @AndrewFerr. ([\#11686](https://github.com/matrix-org/synapse/issues/11686))
- Document the new `SYNAPSE_TEST_PERSIST_SQLITE_DB` environment variable in the contributing guide. ([\#11715](https://github.com/matrix-org/synapse/issues/11715))
- Document that the minimum supported PostgreSQL version is now 10. ([\#11725](https://github.com/matrix-org/synapse/issues/11725))
- Fix typo in demo docs: differnt. ([\#11735](https://github.com/matrix-org/synapse/issues/11735))
- Update room spec URL in config files. ([\#11739](https://github.com/matrix-org/synapse/issues/11739))
- Mention `python3-venv` and `libpq-dev` dependencies in the contribution guide. ([\#11740](https://github.com/matrix-org/synapse/issues/11740))
- Update documentation for configuring login with Facebook. ([\#11755](https://github.com/matrix-org/synapse/issues/11755))
- Update installation instructions to note that Python 3.6 is no longer supported. ([\#11781](https://github.com/matrix-org/synapse/issues/11781))

Deprecations and Removals
-------------------------

- Remove the unstable `/send_relation` endpoint. ([\#11682](https://github.com/matrix-org/synapse/issues/11682))
- Remove `python_twisted_reactor_pending_calls` Prometheus metric. ([\#11724](https://github.com/matrix-org/synapse/issues/11724))
- Remove the `password_hash` field from the response dictionaries of the [Users Admin API](https://matrix-org.github.io/synapse/latest/admin_api/user_admin_api.html). ([\#11576](https://github.com/matrix-org/synapse/issues/11576))
- **Deprecate support for `webclient` listeners and non-HTTP(S) `web_client_location` configuration. ([\#11774](https://github.com/matrix-org/synapse/issues/11774), [\#11783](https://github.com/matrix-org/synapse/issues/11783))**

Internal Changes
----------------

- Run `pyupgrade --py37-plus --keep-percent-format` on Synapse. ([\#11685](https://github.com/matrix-org/synapse/issues/11685))
- Use buildkit's cache feature to speed up docker builds. ([\#11691](https://github.com/matrix-org/synapse/issues/11691))
- Use `auto_attribs` and native type hints for attrs classes. ([\#11692](https://github.com/matrix-org/synapse/issues/11692), [\#11768](https://github.com/matrix-org/synapse/issues/11768))
- Remove debug logging for #4422, which has been closed since Synapse 0.99. ([\#11693](https://github.com/matrix-org/synapse/issues/11693))
- Remove fallback code for Python 2. ([\#11699](https://github.com/matrix-org/synapse/issues/11699))
- Add a test for [an edge case](https://github.com/matrix-org/synapse/pull/11532#discussion_r769104461) in the `/sync` logic. ([\#11701](https://github.com/matrix-org/synapse/issues/11701))
- Add the option to write SQLite test dbs to disk when running tests. ([\#11702](https://github.com/matrix-org/synapse/issues/11702))
- Improve Complement test output for GitHub Actions. ([\#11707](https://github.com/matrix-org/synapse/issues/11707))
- Fix docstring on `add_account_data_for_user`. ([\#11716](https://github.com/matrix-org/synapse/issues/11716))
- Complement environment variable name change and update `.gitignore`. ([\#11718](https://github.com/matrix-org/synapse/issues/11718))
- Simplify calculation of Prometheus metrics for garbage collection. ([\#11723](https://github.com/matrix-org/synapse/issues/11723))
- Improve accuracy of `python_twisted_reactor_tick_time` Prometheus metric. ([\#11724](https://github.com/matrix-org/synapse/issues/11724), [\#11771](https://github.com/matrix-org/synapse/issues/11771))
- Minor efficiency improvements when inserting many values into the database. ([\#11742](https://github.com/matrix-org/synapse/issues/11742))
- Invite PR authors to give themselves credit in the changelog. ([\#11744](https://github.com/matrix-org/synapse/issues/11744))
- Add optional debugging to investigate [issue 8631](https://github.com/matrix-org/synapse/issues/8631). ([\#11760](https://github.com/matrix-org/synapse/issues/11760))
- Remove `log_function` utility function and its uses. ([\#11761](https://github.com/matrix-org/synapse/issues/11761))
- Add a unit test that checks both `client` and `webclient` resources will function when simultaneously enabled. ([\#11765](https://github.com/matrix-org/synapse/issues/11765))
- Allow overriding complement commit using `COMPLEMENT_REF`. ([\#11766](https://github.com/matrix-org/synapse/issues/11766))
- Add some comments and type annotations for `_update_outliers_txn`. ([\#11776](https://github.com/matrix-org/synapse/issues/11776))
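
The `on_logged_out` entry above describes callbacks that were not correctly awaited. As a generic illustration of that class of bug, and not Synapse's actual code, the sketch below uses plain `asyncio` (Synapse itself runs on Twisted) and hypothetical names throughout: calling an async callback without `await` only creates a coroutine object, so the callback body never runs.

```python
import asyncio
from typing import Any, Callable, Coroutine, List, Tuple

# Hypothetical callback type: an async function that takes a user ID.
LogoutCallback = Callable[[str], Coroutine[Any, Any, None]]


async def run_callbacks_unawaited(callbacks: List[LogoutCallback], user_id: str) -> None:
    """Buggy pattern: each call creates a coroutine object but never runs it."""
    for callback in callbacks:
        coro = callback(user_id)  # missing `await`
        coro.close()  # closed only to silence the "never awaited" warning


async def run_callbacks_awaited(callbacks: List[LogoutCallback], user_id: str) -> None:
    """Fixed pattern: each callback runs to completion before the next starts."""
    for callback in callbacks:
        await callback(user_id)


def demo() -> Tuple[List[str], List[str]]:
    seen_buggy: List[str] = []
    seen_fixed: List[str] = []

    async def record_buggy(user_id: str) -> None:
        seen_buggy.append(user_id)

    async def record_fixed(user_id: str) -> None:
        seen_fixed.append(user_id)

    asyncio.run(run_callbacks_unawaited([record_buggy], "@alice:example.org"))
    asyncio.run(run_callbacks_awaited([record_fixed], "@alice:example.org"))
    return seen_buggy, seen_fixed
```

With the unawaited variant the recording list stays empty; with the awaited variant it receives the user ID.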
Diffstat (limited to 'synapse/metrics/__init__.py')
-rw-r--r-- | synapse/metrics/__init__.py | 276 |
1 files changed, 14 insertions, 262 deletions
diff --git a/synapse/metrics/__init__.py b/synapse/metrics/__init__.py
index ceef57ad88..9e6c1b2f3b 100644
--- a/synapse/metrics/__init__.py
+++ b/synapse/metrics/__init__.py
@@ -12,16 +12,12 @@
 # See the License for the specific language governing permissions and
 # limitations under the License.
 
-import functools
-import gc
 import itertools
 import logging
 import os
 import platform
 import threading
-import time
 from typing import (
-    Any,
     Callable,
     Dict,
     Generic,
@@ -34,35 +30,31 @@ from typing import (
     Type,
     TypeVar,
     Union,
-    cast,
 )
 
 import attr
 from prometheus_client import CollectorRegistry, Counter, Gauge, Histogram, Metric
 from prometheus_client.core import (
     REGISTRY,
-    CounterMetricFamily,
     GaugeHistogramMetricFamily,
     GaugeMetricFamily,
 )
 
-from twisted.internet import reactor
-from twisted.internet.base import ReactorBase
 from twisted.python.threadpool import ThreadPool
 
-import synapse
+import synapse.metrics._reactor_metrics
 from synapse.metrics._exposition import (
     MetricsResource,
     generate_latest,
     start_http_server,
 )
+from synapse.metrics._gc import MIN_TIME_BETWEEN_GCS, install_gc_manager
 from synapse.util.versionstring import get_version_string
 
 logger = logging.getLogger(__name__)
 
 METRICS_PREFIX = "/_synapse/metrics"
 
-running_on_pypy = platform.python_implementation() == "PyPy"
 all_gauges: "Dict[str, Union[LaterGauge, InFlightGauge]]" = {}
 
 HAVE_PROC_SELF_STAT = os.path.exists("/proc/self/stat")
@@ -76,19 +68,17 @@ class RegistryProxy:
                 yield metric
 
 
-@attr.s(slots=True, hash=True)
+@attr.s(slots=True, hash=True, auto_attribs=True)
 class LaterGauge:
 
-    name = attr.ib(type=str)
-    desc = attr.ib(type=str)
-    labels = attr.ib(hash=False, type=Optional[Iterable[str]])
+    name: str
+    desc: str
+    labels: Optional[Iterable[str]] = attr.ib(hash=False)
     # callback: should either return a value (if there are no labels for this metric),
     # or dict mapping from a label tuple to a value
-    caller = attr.ib(
-        type=Callable[
-            [], Union[Mapping[Tuple[str, ...], Union[int, float]], Union[int, float]]
-        ]
-    )
+    caller: Callable[
+        [], Union[Mapping[Tuple[str, ...], Union[int, float]], Union[int, float]]
+    ]
 
     def collect(self) -> Iterable[Metric]:
 
@@ -157,7 +147,9 @@ class InFlightGauge(Generic[MetricsEntry]):
         # Create a class which have the sub_metrics values as attributes, which
         # default to 0 on initialization. Used to pass to registered callbacks.
         self._metrics_class: Type[MetricsEntry] = attr.make_class(
-            "_MetricsEntry", attrs={x: attr.ib(0) for x in sub_metrics}, slots=True
+            "_MetricsEntry",
+            attrs={x: attr.ib(default=0) for x in sub_metrics},
+            slots=True,
         )
 
         # Counts number of in flight blocks for a given set of label values
@@ -369,136 +361,6 @@ class CPUMetrics:
 REGISTRY.register(CPUMetrics())
 
 
-#
-# Python GC metrics
-#
-
-gc_unreachable = Gauge("python_gc_unreachable_total", "Unreachable GC objects", ["gen"])
-gc_time = Histogram(
-    "python_gc_time",
-    "Time taken to GC (sec)",
-    ["gen"],
-    buckets=[
-        0.0025,
-        0.005,
-        0.01,
-        0.025,
-        0.05,
-        0.10,
-        0.25,
-        0.50,
-        1.00,
-        2.50,
-        5.00,
-        7.50,
-        15.00,
-        30.00,
-        45.00,
-        60.00,
-    ],
-)
-
-
-class GCCounts:
-    def collect(self) -> Iterable[Metric]:
-        cm = GaugeMetricFamily("python_gc_counts", "GC object counts", labels=["gen"])
-        for n, m in enumerate(gc.get_count()):
-            cm.add_metric([str(n)], m)
-
-        yield cm
-
-
-if not running_on_pypy:
-    REGISTRY.register(GCCounts())
-
-
-#
-# PyPy GC / memory metrics
-#
-
-
-class PyPyGCStats:
-    def collect(self) -> Iterable[Metric]:
-
-        # @stats is a pretty-printer object with __str__() returning a nice table,
-        # plus some fields that contain data from that table.
-        # unfortunately, fields are pretty-printed themselves (i. e. '4.5MB').
-        stats = gc.get_stats(memory_pressure=False)  # type: ignore
-        # @s contains same fields as @stats, but as actual integers.
-        s = stats._s  # type: ignore
-
-        # also note that field naming is completely braindead
-        # and only vaguely correlates with the pretty-printed table.
-        # >>>> gc.get_stats(False)
-        # Total memory consumed:
-        #     GC used:            8.7MB (peak: 39.0MB)        # s.total_gc_memory, s.peak_memory
-        #        in arenas:            3.0MB                  # s.total_arena_memory
-        #        rawmalloced:          1.7MB                  # s.total_rawmalloced_memory
-        #        nursery:              4.0MB                  # s.nursery_size
-        #     raw assembler used: 31.0kB                      # s.jit_backend_used
-        #     -----------------------------
-        #     Total:              8.8MB                       # stats.memory_used_sum
-        #
-        #     Total memory allocated:
-        #     GC allocated:            38.7MB (peak: 41.1MB)  # s.total_allocated_memory, s.peak_allocated_memory
-        #        in arenas:            30.9MB                 # s.peak_arena_memory
-        #        rawmalloced:          4.1MB                  # s.peak_rawmalloced_memory
-        #        nursery:              4.0MB                  # s.nursery_size
-        #     raw assembler allocated: 1.0MB                  # s.jit_backend_allocated
-        #     -----------------------------
-        #     Total:                   39.7MB                 # stats.memory_allocated_sum
-        #
-        #     Total time spent in GC:  0.073                  # s.total_gc_time
-
-        pypy_gc_time = CounterMetricFamily(
-            "pypy_gc_time_seconds_total",
-            "Total time spent in PyPy GC",
-            labels=[],
-        )
-        pypy_gc_time.add_metric([], s.total_gc_time / 1000)
-        yield pypy_gc_time
-
-        pypy_mem = GaugeMetricFamily(
-            "pypy_memory_bytes",
-            "Memory tracked by PyPy allocator",
-            labels=["state", "class", "kind"],
-        )
-        # memory used by JIT assembler
-        pypy_mem.add_metric(["used", "", "jit"], s.jit_backend_used)
-        pypy_mem.add_metric(["allocated", "", "jit"], s.jit_backend_allocated)
-        # memory used by GCed objects
-        pypy_mem.add_metric(["used", "", "arenas"], s.total_arena_memory)
-        pypy_mem.add_metric(["allocated", "", "arenas"], s.peak_arena_memory)
-        pypy_mem.add_metric(["used", "", "rawmalloced"], s.total_rawmalloced_memory)
-        pypy_mem.add_metric(["allocated", "", "rawmalloced"], s.peak_rawmalloced_memory)
-        pypy_mem.add_metric(["used", "", "nursery"], s.nursery_size)
-        pypy_mem.add_metric(["allocated", "", "nursery"], s.nursery_size)
-        # totals
-        pypy_mem.add_metric(["used", "totals", "gc"], s.total_gc_memory)
-        pypy_mem.add_metric(["allocated", "totals", "gc"], s.total_allocated_memory)
-        pypy_mem.add_metric(["used", "totals", "gc_peak"], s.peak_memory)
-        pypy_mem.add_metric(["allocated", "totals", "gc_peak"], s.peak_allocated_memory)
-        yield pypy_mem
-
-
-if running_on_pypy:
-    REGISTRY.register(PyPyGCStats())
-
-
-#
-# Twisted reactor metrics
-#
-
-tick_time = Histogram(
-    "python_twisted_reactor_tick_time",
-    "Tick time of the Twisted reactor (sec)",
-    buckets=[0.001, 0.002, 0.005, 0.01, 0.025, 0.05, 0.1, 0.2, 0.5, 1, 2, 5],
-)
-pending_calls_metric = Histogram(
-    "python_twisted_reactor_pending_calls",
-    "Pending calls",
-    buckets=[1, 2, 5, 10, 25, 50, 100, 250, 500, 1000],
-)
 
 #
 # Federation Metrics
@@ -551,8 +413,6 @@ build_info.labels(
     " ".join([platform.system(), platform.release()]),
 ).set(1)
 
-last_ticked = time.time()
-
 # 3PID send info
 threepid_send_requests = Histogram(
     "synapse_threepid_send_requests_with_tries",
@@ -600,116 +460,6 @@ def register_threadpool(name: str, threadpool: ThreadPool) -> None:
     )
 
 
-class ReactorLastSeenMetric:
-    def collect(self) -> Iterable[Metric]:
-        cm = GaugeMetricFamily(
-            "python_twisted_reactor_last_seen",
-            "Seconds since the Twisted reactor was last seen",
-        )
-        cm.add_metric([], time.time() - last_ticked)
-        yield cm
-
-
-REGISTRY.register(ReactorLastSeenMetric())
-
-# The minimum time in seconds between GCs for each generation, regardless of the current GC
-# thresholds and counts.
-MIN_TIME_BETWEEN_GCS = (1.0, 10.0, 30.0)
-
-# The time (in seconds since the epoch) of the last time we did a GC for each generation.
-_last_gc = [0.0, 0.0, 0.0]
-
-
-F = TypeVar("F", bound=Callable[..., Any])
-
-
-def runUntilCurrentTimer(reactor: ReactorBase, func: F) -> F:
-    @functools.wraps(func)
-    def f(*args: Any, **kwargs: Any) -> Any:
-        now = reactor.seconds()
-        num_pending = 0
-
-        # _newTimedCalls is one long list of *all* pending calls. Below loop
-        # is based off of impl of reactor.runUntilCurrent
-        for delayed_call in reactor._newTimedCalls:
-            if delayed_call.time > now:
-                break
-
-            if delayed_call.delayed_time > 0:
-                continue
-
-            num_pending += 1
-
-        num_pending += len(reactor.threadCallQueue)
-        start = time.time()
-        ret = func(*args, **kwargs)
-        end = time.time()
-
-        # record the amount of wallclock time spent running pending calls.
-        # This is a proxy for the actual amount of time between reactor polls,
-        # since about 25% of time is actually spent running things triggered by
-        # I/O events, but that is harder to capture without rewriting half the
-        # reactor.
-        tick_time.observe(end - start)
-        pending_calls_metric.observe(num_pending)
-
-        # Update the time we last ticked, for the metric to test whether
-        # Synapse's reactor has frozen
-        global last_ticked
-        last_ticked = end
-
-        if running_on_pypy:
-            return ret
-
-        # Check if we need to do a manual GC (since its been disabled), and do
-        # one if necessary. Note we go in reverse order as e.g. a gen 1 GC may
-        # promote an object into gen 2, and we don't want to handle the same
-        # object multiple times.
-        threshold = gc.get_threshold()
-        counts = gc.get_count()
-        for i in (2, 1, 0):
-            # We check if we need to do one based on a straightforward
-            # comparison between the threshold and count. We also do an extra
-            # check to make sure that we don't a GC too often.
-            if threshold[i] < counts[i] and MIN_TIME_BETWEEN_GCS[i] < end - _last_gc[i]:
-                if i == 0:
-                    logger.debug("Collecting gc %d", i)
-                else:
-                    logger.info("Collecting gc %d", i)
-
-                start = time.time()
-                unreachable = gc.collect(i)
-                end = time.time()
-
-                _last_gc[i] = end
-
-                gc_time.labels(i).observe(end - start)
-                gc_unreachable.labels(i).set(unreachable)
-
-        return ret
-
-    return cast(F, f)
-
-
-try:
-    # Ensure the reactor has all the attributes we expect
-    reactor.seconds  # type: ignore
-    reactor.runUntilCurrent  # type: ignore
-    reactor._newTimedCalls  # type: ignore
-    reactor.threadCallQueue  # type: ignore
-
-    # runUntilCurrent is called when we have pending calls. It is called once
-    # per iteratation after fd polling.
-    reactor.runUntilCurrent = runUntilCurrentTimer(reactor, reactor.runUntilCurrent)  # type: ignore
-
-    # We manually run the GC each reactor tick so that we can get some metrics
-    # about time spent doing GC,
-    if not running_on_pypy:
-        gc.disable()
-except AttributeError:
-    pass
-
-
 __all__ = [
     "MetricsResource",
     "generate_latest",
@@ -717,4 +467,6 @@ __all__ = [
     "LaterGauge",
     "InFlightGauge",
     "GaugeBucketCollector",
+    "MIN_TIME_BETWEEN_GCS",
+    "install_gc_manager",
 ]
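
The manual GC rule in the removed lines (collect a generation only when its count exceeds its threshold and enough time has passed since its last collection, checking generation 2 first) can be restated as a small standalone sketch. This is not the contents of `synapse/metrics/_gc.py`, which this diff does not show. The helper names `generations_due` and `collect_due_generations` are hypothetical, and unlike the removed code the sketch decides the whole set of due generations up front and records no Prometheus metrics.

```python
import gc
import time
from typing import List, Sequence

# Same values as the MIN_TIME_BETWEEN_GCS constant in the removed lines:
# minimum seconds between collections of generations 0, 1 and 2.
MIN_TIME_BETWEEN_GCS = (1.0, 10.0, 30.0)


def generations_due(
    threshold: Sequence[int],
    counts: Sequence[int],
    last_gc: Sequence[float],
    now: float,
    min_interval: Sequence[float] = MIN_TIME_BETWEEN_GCS,
) -> List[int]:
    """Return the generations that should be collected, oldest first.

    Hypothetical helper: a generation is due when its count exceeds its
    threshold and more than `min_interval` seconds have passed since it
    was last collected.
    """
    due = []
    for gen in (2, 1, 0):  # reverse order, as in the removed loop
        over_threshold = threshold[gen] < counts[gen]
        waited_long_enough = min_interval[gen] < now - last_gc[gen]
        if over_threshold and waited_long_enough:
            due.append(gen)
    return due


def collect_due_generations(last_gc: List[float]) -> List[int]:
    """Collect every due generation and update `last_gc` in place."""
    collected = []
    due = generations_due(gc.get_threshold(), gc.get_count(), last_gc, time.time())
    for gen in due:
        gc.collect(gen)
        last_gc[gen] = time.time()
        collected.append(gen)
    return collected
```

Keeping the decision in a pure function makes the threshold and interval rule testable without touching the real collector; a caller would invoke `collect_due_generations` periodically with a shared `last_gc` list.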