path: root/synapse/metrics
Commit message  Author  Age  Files  Lines
* Use inline type hints in various other places (in `synapse/`) (#10380)Jonathan de Jong2021-07-153-6/+6
|
* opentracing: use a consistent name for background processes (#10135)Richard van der Hoff2021-06-071-2/+3
| | | | ... otherwise we tend to get a namespace clash between the bg process and the functions that it calls.
* Set opentracing priority before setting other tags (#10092)Richard van der Hoff2021-05-281-2/+8
| | | ... because tags on spans which aren't being sampled get thrown away.
* Export jemalloc stats to prometheus when used (#9882)Erik Johnston2021-05-062-0/+197
|
* Limit how often GC happens by time. (#9902)Erik Johnston2021-05-051-2/+16
| Synapse can be quite memory intensive, and unless care is taken to tune the GC thresholds it can end up thrashing, causing noticeable performance problems for large servers. We fix this by limiting how often we GC a given generation, regardless of current counts/thresholds.
|
| This does not help with the reverse problem where the thresholds are set too high, but that should only happen in situations where they've been manually configured.
|
| Adds a `gc_min_seconds_between` config option to override the defaults.
|
| Fixes #9890.
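The time-based limit described in this commit message can be sketched roughly as follows. This is a minimal illustration, not the code from the commit: the names `MIN_SECONDS_BETWEEN_GCS`, `generation_to_collect` and `maybe_collect` and the default values are assumptions, and the sketch presumes automatic collection has been turned off so that something like a periodic tick calls `maybe_collect`.

```python
import gc
import time
from typing import List, Optional, Sequence

# Hypothetical minimum gap in seconds between collections of
# generations 0, 1 and 2 (illustrative values, not Synapse's defaults).
MIN_SECONDS_BETWEEN_GCS = (1.0, 10.0, 30.0)


def generation_to_collect(
    counts: Sequence[int],
    thresholds: Sequence[int],
    last_gc: Sequence[float],
    min_seconds: Sequence[float],
    now: float,
) -> Optional[int]:
    """Pick the oldest generation that is over its threshold AND has not
    been collected within its minimum interval; None if there is none."""
    for gen in (2, 1, 0):
        over_threshold = counts[gen] > thresholds[gen]
        waited_long_enough = now - last_gc[gen] > min_seconds[gen]
        if over_threshold and waited_long_enough:
            return gen
    return None


def maybe_collect(last_gc: List[float]) -> Optional[int]:
    """Run at most one collection, recording when it finished."""
    gen = generation_to_collect(
        gc.get_count(),
        gc.get_threshold(),
        last_gc,
        MIN_SECONDS_BETWEEN_GCS,
        time.monotonic(),
    )
    if gen is not None:
        gc.collect(gen)
        last_gc[gen] = time.monotonic()
    return gen
```

Keeping the decision in a pure function makes the rule easy to check in isolation: a generation that is over its threshold is still skipped until its minimum interval has elapsed.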
* Merge branch 'master' into developAndrew Morgan2021-04-211-4/+16
|\
| * Stop BackgroundProcessLoggingContext making new prometheus timeseries (#9854)Richard van der Hoff2021-04-211-4/+16
| | | | | | | | This undoes part of b076bc276e881b262048307b6a226061d96c4a8d.
* | Merge branch 'master' into developAndrew Morgan2021-04-201-11/+4
|\|
| * Always use the name as the log ID. (#9829)Patrick Cloke2021-04-201-11/+4
| | As far as I can tell our logging contexts are meant to log the request ID, or sometimes the request ID followed by a suffix (this is generally stored in the name field of LoggingContext). There's also code to log the name@memory location, but I'm not sure this is ever used.
| |
| | This simplifies the code paths to require every logging context to have a name and use that in logging. For sub-contexts (created via nested_logging_contexts, defer_to_threadpool, Measure) we use the current context's str (which becomes their name or the string "sentinel") and then potentially modify that (e.g. add a suffix).
* | Remove redundant "coding: utf-8" lines (#9786)Jonathan de Jong2021-04-143-3/+0
|/
| Part of #9744
|
| Removes all redundant `# -*- coding: utf-8 -*-` lines from files, as Python 3 automatically reads source code as utf-8 now.
|
| `Signed-off-by: Jonathan de Jong <jonathan@automatia.nl>`
* Record more information into structured logs. (#9654)Patrick Cloke2021-04-081-6/+12
| | | | Records additional request information into the structured logs, e.g. the requester, IP address, etc.
* Don't report anything from GaugeBucketCollector metrics until data is present (#8926)Andrew Morgan2021-04-061-3/+13
| This PR modifies `GaugeBucketCollector` to only report data once it has been updated, rather than initially reporting a value of 0.
|
| Fixes zero values being reported for some metrics on startup until a background job to update the metric's value runs later.
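The "report nothing until updated" behaviour can be illustrated with a small stand-alone collector. This is only a sketch under assumptions: `LazyBucketGauge` and its methods are hypothetical names, it does not use `prometheus_client`, and it is not the `GaugeBucketCollector` implementation.

```python
import bisect
from typing import Iterable, List, Optional, Sequence, Tuple


class LazyBucketGauge:
    """Hypothetical bucketed gauge that stays silent until it has data."""

    def __init__(self, name: str, buckets: Sequence[float]) -> None:
        self.name = name
        # Upper bounds of each bucket, with a final catch-all bucket.
        self._upper_bounds = sorted(buckets) + [float("inf")]
        # None means "never updated", which is distinct from "all zeros".
        self._counts: Optional[List[int]] = None

    def update_data(self, values: Iterable[float]) -> None:
        counts = [0] * len(self._upper_bounds)
        for value in values:
            # Index of the first bucket whose upper bound is >= value.
            counts[bisect.bisect_left(self._upper_bounds, value)] += 1
        self._counts = counts

    def collect(self) -> List[Tuple[float, int]]:
        """Return cumulative (upper_bound, count) pairs, or nothing at all
        if update_data() has never been called."""
        if self._counts is None:
            return []
        cumulative = []
        running = 0
        for bound, count in zip(self._upper_bounds, self._counts):
            running += count
            cumulative.append((bound, running))
        return cumulative
```

Using `None` rather than a list of zeros is what lets a scrape at startup return no samples instead of misleading zero values.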
* Fix additional type hints from Twisted upgrade. (#9518)Patrick Cloke2021-03-031-5/+6
|
* Update black, and run auto formatting over the codebase (#9381)Eric Eastwood2021-02-163-9/+9
| - Update black version to the latest
| - Run black auto formatting over the codebase
| - Run autoformatting according to [`docs/code_style.md`](https://github.com/matrix-org/synapse/blob/80d6dc9783aa80886a133756028984dbf8920168/docs/code_style.md)
| - Update `code_style.md` docs around installing black to use the correct version
* Various clean-ups to the logging context code (#8935)Patrick Cloke2020-12-141-4/+3
|
* Allow spam-checker modules to provide async methods. (#8890)David Teller2020-12-111-7/+2
| | | | Spam checker modules can now provide async methods. This is implemented in a backwards-compatible manner.
* Add metrics for tracking 3PID /requestToken requests. (#8712)Erik Johnston2020-11-131-0/+10
| The main use case is to see how many requests are being made, and how many are second/third/etc attempts. If there are a large number of retries then that likely indicates a delivery problem.
* Start fewer opentracing spans (#8640)Erik Johnston2020-10-261-3/+9
| | | | | | | #8567 started a span for every background process. This is good as it means all Synapse code that gets run should be in a span (unless in the sentinel logging context), but it means we generate about 15x the number of spans as we did previously. This PR attempts to reduce that number by a) not starting one for send commands to Redis, and b) deferring starting background processes until after we're sure they're necessary. I don't really know how much this will help.
* Fix typos and spelling errors. (#8639)Patrick Cloke2020-10-231-1/+1
|
* Start an opentracing span for background processes. (#8567)Erik Johnston2020-10-191-5/+6
| | | | | | This should reduce the number of `There was no active span` errors we see. Fixes #8510.
* Rewrite BucketCollectorRichard van der Hoff2020-09-301-47/+68
| This was a bit unwieldy for what I wanted: in particular, I wanted to assign each measurement straight into a bucket, rather than storing an intermediate Counter which didn't do any bucketing at all.
|
| I've replaced it with something that is hopefully a bit easier to use.
|
| (I'm not entirely sure what the difference between a HistogramMetricFamily and a GaugeHistogramMetricFamily is, but given our counters can go down as well as up the latter *sounds* more accurate?)
* Fix _exposition.py to stop stripping samplesRichard van der Hoff2020-09-301-11/+29
| | | | | | Our hacked-up `_exposition.py` was stripping out some samples it shouldn't have been. Put them back in, to more closely match the upstream `exposition.py`.
* Drop support for ancient prometheus_client (#8426)Richard van der Hoff2020-09-301-22/+2
| | | | Drop compatibility hacks for prometheus-client pre 0.4.0. Debian stretch and Fedora 31 both have newer versions, so hopefully this will be ok.
* Use slots in attrs classes where possible (#8296)Patrick Cloke2020-09-141-2/+2
| | | | | slots use less memory (and attribute access is faster) while slightly limiting the flexibility of the class attributes. This focuses on objects which are instantiated "often" and for short periods of time.
* Stop sub-classing object (#8249)Patrick Cloke2020-09-042-10/+10
|
* Convert runWithConnection to async. (#8121)Patrick Cloke2020-08-191-1/+1
|
* Convert run_as_background_process inner function to async. (#8032)Patrick Cloke2020-08-061-22/+12
|
* Improve stacktraces from exceptions in background processes (#7808)Richard van der Hoff2020-07-091-1/+9
| | | use `Failure()` to fish out the real exception.
* Add some metrics for inbound and outbound federation processing times (#7755)Erik Johnston2020-06-301-0/+6
|
* Set Content-Length for Metrics requests (#7730)Christian Svensson2020-06-231-1/+4
| HTTP requires the response to contain a Content-Length header unless chunked encoding is being used. The Prometheus metrics endpoint did not set this, so software such as prometheus-proxy could not scrape Synapse for metrics.
|
| Signed-off-by: Christian Svensson <blue@cmd.nu>
* Replace iteritems/itervalues/iterkeys with native versions. (#7692)Patrick Cloke2020-06-151-4/+2
|
* Make inflight background metrics more efficient. (#7597)Erik Johnston2020-05-291-34/+70
| | | Co-authored-by: Andrew Morgan <1342360+anoadragon453@users.noreply.github.com>
* synapse.metrics: implement detailed memory usage reporting on PyPy (#7536)Ivan Shapovalov2020-05-221-1/+78
| PyPy's gc.get_stats() returns an object containing detailed allocator statistics which could be beneficial to collect as metrics.
|
| Signed-off-by: Ivan Shapovalov <intelfx@intelfx.name>
* Allow configuration of Synapse's cache without using synctl or environment variables (#6391)Amber Brown2020-05-111-4/+8
|
* Add prometheus metrics for the number of active pushers (#7103)Richard van der Hoff2020-03-192-7/+10
|
* Clarify list/set/dict/tuple comprehensions and enforce via flake8 (#6957)Patrick Cloke2020-02-212-3/+3
| | | | Ensure good comprehension hygiene using flake8-comprehensions.
* Fix up some typechecking (#6150)Amber Brown2019-10-022-4/+4
|   * type checking fixes
|   * changelog
* Update comments and docstringRichard van der Hoff2019-09-251-4/+6
|
* Add wrap_as_background_process decorator.Erik Johnston2019-09-241-1/+28
| | | | | This does the same thing as `run_as_background_process` but means we don't need to create superfluous functions.
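As a rough illustration of why the decorator form avoids the extra wrapper function, here is a minimal sketch. The `run_as_background_process` below is a stand-in written for this example (it only records the description and schedules the coroutine), and `asyncio` is used purely to keep the sketch self-contained; `STARTED` and `prune_cache` are hypothetical names.

```python
import asyncio
import functools
from typing import Any, Callable, List

# Hypothetical record of which background processes were started.
STARTED: List[str] = []


def run_as_background_process(desc: str, func: Callable, *args: Any, **kwargs: Any):
    """Stand-in helper: note the description, then schedule the coroutine."""
    STARTED.append(desc)
    return asyncio.ensure_future(func(*args, **kwargs))


def wrap_as_background_process(desc: str) -> Callable:
    """Decorator form: every call to the decorated function goes through
    run_as_background_process, so callers need no extra wrapper function."""

    def decorator(func: Callable) -> Callable:
        @functools.wraps(func)
        def wrapper(*args: Any, **kwargs: Any):
            return run_as_background_process(desc, func, *args, **kwargs)

        return wrapper

    return decorator


@wrap_as_background_process("prune_cache")
async def prune_cache(limit: int) -> int:
    return limit * 2


async def main() -> int:
    # The call site looks like an ordinary call to prune_cache.
    return await prune_cache(21)
```

Without the decorator, each call site would need its own small function whose only job is to call `run_as_background_process` with the right description.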
* Fix for structured logging tests stomping on logs (#6023)Amber Brown2019-09-132-4/+5
|
* Add a build info metric to Prometheus (#6005)Amber Brown2019-09-101-0/+12
|
* Support Prometheus_client 0.4.0+ (#5636)Amber Brown2019-07-183-20/+275
|
* Move logging utilities out of the side drawer of util/ and into logging/ (#5606)Amber Brown2019-07-041-1/+1
|
* Don't log GC 0s at INFO (#5557)Amber Brown2019-06-281-1/+4
|
* Run Black. (#5482)Amber Brown2019-06-202-22/+18
|
* Prometheus histograms are cumulativeErik Johnston2019-06-141-1/+0
|
* fix prometheus rendering errorAmber H. Brown2019-06-141-1/+1
|
* Expose statistics on extrems to prometheus (#5384)Amber Brown2019-06-131-20/+92
|
* Add metrics for number of outgoing EDUs, by type (#4695)Richard van der Hoff2019-02-201-2/+0
|
* Fix exception in background metrics collectionErik Johnston2018-10-031-2/+6
| | | | | We attempted to iterate through a list on a separate thread without doing the necessary copying.
* Add missing loggerErik Johnston2018-09-201-0/+4
|
* Handle exceptions thrown by background tasksErik Johnston2018-09-201-0/+2
| | | | Fixes #3921
* Remove spurious commentErik Johnston2018-09-141-2/+0
|
* Add in flight real time metrics for Measure blocksErik Johnston2018-09-141-1/+109
|
* isortErik Johnston2018-08-211-1/+2
|
* Make the in flight background process metrics thread safeErik Johnston2018-08-201-5/+20
|
* fix metric nameRichard van der Hoff2018-08-071-1/+1
|
* more metrics for the federation and appservice sendersRichard van der Hoff2018-08-071-0/+13
|
* Fix some looping_call calls which were broken in #3604Richard van der Hoff2018-07-261-2/+8
| | | | | | | | | It turns out that looping_call does check the deferred returned by its callback, and (at least in the case of client_ips), we were relying on this, and I broke it in #3604. Update run_as_background_process to return the deferred, and make sure we return it to clock.looping_call.
* Resource tracking for background processesRichard van der Hoff2018-07-181-0/+179
| This introduces a mechanism for tracking resource usage by background processes, along with an example of how it will be used.
|
| This will help address #3518, but more importantly will give us better insights into things which are happening but not being shown up by the request metrics.
|
| We *could* do this with Measure blocks, but:
|  - I think having them pulled out as a completely separate metric class will make it easier to distinguish top-level processes from those which are nested.
|  - I want to be able to report on in-flight background processes, and I don't think we want to do this for *all* Measure blocks.
* run isortAmber Brown2018-07-091-6/+5
|
* Attempt to be more performant on PyPy (#3462)Amber Brown2018-06-281-1/+2
|
* Fix description of "python_gc_time" metricRichard van der Hoff2018-06-211-1/+1
|
* spell gauge correctlyMatthew Hodgson2018-06-161-1/+1
|
* add a last seen metric (#3396)Amber Brown2018-06-141-0/+21
|
* Hopefully, fix LaterGuage error handlingRichard van der Hoff2018-06-041-3/+6
|
* Run Prometheus on a different port, optionally. (#3274)Amber Brown2018-05-312-1/+6
|
* pep8Matthew Hodgson2018-05-291-0/+1
|
* disable CPUMetrics if no /proc/self/statMatthew Hodgson2018-05-291-0/+3
| | | | fixes build on macOS again
* invalid syntax :(Amber Brown2018-05-281-2/+1
|
* update metrics to be in secondsAmber Brown2018-05-281-9/+10
|
* pepeighttttAmber Brown2018-05-231-0/+1
|
* add back CPU metricsAmber Brown2018-05-231-1/+35
|
* more cleanupAmber Brown2018-05-221-3/+1
|
* cleanupAmber Brown2018-05-221-6/+22
|
* fixesAmber Brown2018-05-221-2/+10
|
* Merge remote-tracking branch 'origin/develop' into 3218-official-promAmber Brown2018-05-221-1/+5
|\
* | replacing portionsAmber Brown2018-05-211-119/+63
| |
* | don't need the resource portionAmber Brown2018-05-211-23/+0
| |
* | remove old metrics libsAmber Brown2018-05-212-450/+0
|/
* Note that label values can be anythingErik Johnston2018-05-031-1/+2
|
* Fix metrics that have integer value labelsErik Johnston2018-05-031-1/+1
|
* Make _escape_character take MatchObjectErik Johnston2018-05-021-2/+10
|
* Escape label values in prometheus metricsErik Johnston2018-05-021-2/+20
|
* s/list/tupleErik Johnston2018-04-121-2/+2
|
* Track last processed event received_tsErik Johnston2018-04-111-0/+13
|
* Track where event stream processing has gotten up toErik Johnston2018-04-111-0/+13
|
* Add GaugeMetricErik Johnston2018-04-112-1/+38
|
* Don't disable GC when running on PyPyVincent Breitmoser2018-04-101-1/+7
| PyPy's incminimark GC can't be triggered manually. From what I observed there are no obvious issues with just letting it run normally. And unlike CPython, it actually returns unused RAM to the system.
|
| Signed-off-by: Vincent Breitmoser <look@my.amazin.horse>
* Add a metric which increments when a request is receivedRichard van der Hoff2018-03-091-0/+16
| | | | | | It's useful to know when there are peaks in incoming requests - which isn't quite the same as there being peaks in outgoing responses, due to the time taken to handle requests.
* report metrics on number of cache evictionsRichard van der Hoff2018-02-051-1/+10
|
* Add some comments about the reactor tick time metricRichard van der Hoff2018-01-191-1/+6
|
* better exception logging in callbackmetricsRichard van der Hoff2018-01-181-1/+8
| | | | when we fail to render a metric, give a clue as to which metric it was
* mechanism to render metrics with alternative namesRichard van der Hoff2018-01-151-13/+40
|
* Add some comments to metrics classesRichard van der Hoff2018-01-151-1/+27
|
* Make Counter render floatsRichard van der Hoff2018-01-121-3/+10
| | | | | | | | Prometheus handles all metrics as floats, and sometimes we store non-integer values in them (notably, durations in seconds), so let's render them as floats too. (Note that the standard client libraries also treat Counters as floats.)
* Rename the python-specific metrics now the docs claim that we have donePaul "LeoNerd" Evans2016-11-031-7/+9
|
* Since we don't export per-filetype fd counts any more, delete all the code related to that tooPaul "LeoNerd" Evans2016-11-031-36/+4
|
* Remove now-unused 'resource' importPaul "LeoNerd" Evans2016-11-031-8/+0
|
* Now we have new-style metrics don't bother exporting legacy-named process onesPaul "LeoNerd" Evans2016-11-031-16/+1
|
* Set up the process collector during metrics __init__; that way all split-process workers have itPaul "LeoNerd" Evans2016-10-271-0/+3
|
* Pass the Metrics group into the process collector instead of having it find its own one; this avoids it needing to import from synapse.metricsPaul "LeoNerd" Evans2016-10-271-7/+3
|
* Allow creation of a 'subspace' within a Metrics object, returning another onePaul "LeoNerd" Evans2016-10-271-0/+3
|
* Split callback metric lambda functions down onto their own lines to keep line lengths under 90Paul "LeoNerd" Evans2016-10-191-8/+16
|
* Adjust code for <100 char line limitPaul "LeoNerd" Evans2016-10-191-1/+1
|
* Cut the raw /proc/self/stat line up into named fields at collection timePaul "LeoNerd" Evans2016-10-191-8/+22
|
* Move the process metrics collector code into its own filePaul "LeoNerd" Evans2016-10-192-141/+159
|
* A slightly neater way to manage metric collector functionsPaul "LeoNerd" Evans2016-10-191-2/+8
|
* appease pep8Paul "LeoNerd" Evans2016-10-191-3/+5
|
* Also guard /proc/self/fds-related code with a suitable pseudoconstantPaul "LeoNerd" Evans2016-10-191-3/+5
|
* Guard registration of process-wide metrics by existence of the requisite /proc entriesPaul "LeoNerd" Evans2016-10-191-45/+50
|
* Add standard process_start_time_seconds metricPaul "LeoNerd" Evans2016-10-191-0/+15
|
* Add standard process_max_fds metricPaul "LeoNerd" Evans2016-10-191-0/+13
|
* Add standard process_open_fds metricPaul "LeoNerd" Evans2016-10-191-20/+29
|
* Add standard process_*_memory_bytes metricsPaul "LeoNerd" Evans2016-10-191-0/+8
|
* Use /proc/self/stat to generate the new process_cpu_*_seconds_total metricsPaul "LeoNerd" Evans2016-10-191-4/+12
|
* Export CPU usage metrics also under prometheus-standard metric namePaul "LeoNerd" Evans2016-10-191-0/+15
|
* Callback metric values might not just be integers - allow floatsPaul "LeoNerd" Evans2016-10-191-2/+2
|
* Make psutil optionalErik Johnston2016-08-082-5/+13
|
* Don't explode if we have no snapshots yetErik Johnston2016-07-201-0/+3
|
* Add metrics for psutil derived memory usageErik Johnston2016-07-202-1/+46
|
* Don't track total objects as it's too expensive to calculateErik Johnston2016-06-071-1/+0
|
* Record some more GC metricsErik Johnston2016-06-071-0/+5
|
* Also record number of unreachable objectsErik Johnston2016-06-071-2/+4
|
* Change the way we do statsErik Johnston2016-06-071-7/+3
|
* Merge pull request #771 from matrix-org/erikj/gc_tickErik Johnston2016-06-071-0/+26
|\ | | | | Manually run GC on reactor tick.
| * Count number of GC collectsErik Johnston2016-05-161-5/+11
| |
| * Add a commentErik Johnston2016-05-131-0/+5
| |
| * Manually run GC on reactor tick.Erik Johnston2016-05-091-0/+15
| | | | | | | | This also adds a metric for amount of time spent in GC.
* | Change CacheMetrics to be quickerErik Johnston2016-06-032-32/+28
|/ | | | | | We change it so that each cache has an individual CacheMetric, instead of having one global CacheMetric. This means that when a cache tries to increment a counter it does not need to go through so many indirections.
* copyrightsMatthew Hodgson2016-01-073-3/+3
|
* Check that /proc/self/fd exists before listing itMark Haines2015-09-071-0/+4
|
* The maxrss reported by getrusage is in kilobytes, not pagesMark Haines2015-09-071-4/+3
|
* Also check for presence of 'threadCallQueue' in reactorErik Johnston2015-08-181-1/+8
|
* Use more helpful variable namesErik Johnston2015-08-181-3/+3
|
* Fix pending_calls metric to not lieErik Johnston2015-08-141-3/+18
|
* Don't time getDelayedCallsErik Johnston2015-08-131-1/+1
|
* Add some metrics about the reactorErik Johnston2015-08-131-0/+29
|
* Appease pep8Paul "LeoNerd" Evans2015-04-011-0/+1
|
* Report process open filehandles in metricsPaul "LeoNerd" Evans2015-04-011-0/+34
|
* Appease pyflakesPaul "LeoNerd" Evans2015-03-121-1/+1
|
* Delete unused import of NOT_READY_YETPaul "LeoNerd" Evans2015-03-121-1/+0
|
* Appease pep8Paul "LeoNerd" Evans2015-03-123-7/+9
|
* Replace the @metrics.counted annotations in federation with specifically-written counters and distributionsPaul "LeoNerd" Evans2015-03-121-17/+0
|
* Add an .inc_by() method to CounterMetric; implement DistributionMetric a neater wayPaul "LeoNerd" Evans2015-03-121-23/+14
|
* Don't forbid '_' in metric basenames any more, to allow things like foo_timePaul "LeoNerd" Evans2015-03-121-5/+0
|
* Rename TimerMetric to DistributionMetric; as it could count more than just timePaul "LeoNerd" Evans2015-03-122-14/+18
|
* Export CacheMetric as hits+total, rather than hits+misses, as it's easier to derive hit ratio from thatPaul "LeoNerd" Evans2015-03-121-5/+6
|
* Remember to emit final linefeed from /metrics page, or Prometheus gets upsetPaul "LeoNerd" Evans2015-03-121-0/+2
|
* Prometheus needs "escaped" label valuesPaul "LeoNerd" Evans2015-03-121-2/+6
|
* Kill unused CounterMetric.fetch() methodPaul "LeoNerd" Evans2015-03-121-3/+0
|
* Use _ instead of . as a metric namespacing separator, for PrometheusPaul "LeoNerd" Evans2015-03-121-3/+11
|
* Have all @metrics.counted use a single metric name vectored on the method name, rather than a brand new scalar counter per counted methodPaul "LeoNerd" Evans2015-03-121-2/+9
|
* Bugfix to rendering output of vectored TimerMetricsPaul "LeoNerd" Evans2015-03-121-3/+2
|
* Rename Metrics' "keys" to "labels"Paul "LeoNerd" Evans2015-03-121-12/+12
|
* Provide some process resource usage metricsPaul "LeoNerd" Evans2015-03-121-0/+27
|
* Neater register_* methods on overall Metrics containerPaul "LeoNerd" Evans2015-03-121-22/+12
|
* Neater implementation of metric render methods by pulling out 'render' as a base method that calls self.render_itemPaul "LeoNerd" Evans2015-03-121-18/+15
|
* Initial hack at a TimerMetric; for storing counts + duration accumulatorsPaul "LeoNerd" Evans2015-03-121-0/+48
|
* Ensure that /_synapse/metrics response is UTF-8 encodedPaul "LeoNerd" Evans2015-03-121-1/+2
|
* Implement vector CallbackMetricsPaul "LeoNerd" Evans2015-03-121-2/+6
|
* Neater introspection methods on BaseMetric so that subclasses don't need to touch self.keys directlyPaul "LeoNerd" Evans2015-03-121-4/+11
|
* Rename CacheCounterMetric to just CacheMetric; add a CallbackMetric component to give the size of the cachePaul "LeoNerd" Evans2015-03-122-7/+12
|
* Ensure that exceptions while rendering individual metrics don't stop others from being rendered anyway - especially useful for CallbackMetricPaul "LeoNerd" Evans2015-03-121-1/+10
|
* Initial attempt at a scalar callback-based metric to give instantaneous snapshot gaugesPaul "LeoNerd" Evans2015-03-122-1/+24
|
* Create the concept of a cachecounter metric; generating two counters specific to cachesPaul "LeoNerd" Evans2015-03-122-7/+47
|
* Have the MetricsResource actually render metric countersPaul "LeoNerd" Evans2015-03-121-1/+3
|
* An initial implementation of a 'metrics' instance, similar to a 'logger' for keeping counter stats on method callsPaul "LeoNerd" Evans2015-03-121-0/+69
|
* Initial tiny attempt at (vectorable) counter metricsPaul "LeoNerd" Evans2015-03-121-0/+54
|
* A trivial 'hello world'-style resource on /_synapse/metrics, with optional commandline flagPaul "LeoNerd" Evans2015-03-121-0/+37