path: root/synapse/util
Commit message | Author | Age | Files | Lines
* Replace iteritems/itervalues/iterkeys with native versions. (#7692)Patrick Cloke2020-06-153-10/+4
|
* Performance improvements and refactor of Ratelimiter (#7595)Andrew Morgan2020-06-051-1/+1
| | | | | | | | | | While working on https://github.com/matrix-org/synapse/issues/5665 I found myself digging into the `Ratelimiter` class and seeing that it was both: * Rather undocumented, and * causing a *lot* of config checks This PR attempts to refactor and comment the `Ratelimiter` class, as well as encourage config file accesses to only be done at instantiation. Best to be reviewed commit-by-commit.
* Speed up processing of federation stream RDATA rows.Erik Johnston2020-05-271-0/+12
| | | | | | Instead of storing and sending an ACK for every single row we send synchronously, we instead do it asynchronously while batching up updates.
* Don't apply cache factor to event cache. (#7578)Erik Johnston2020-05-271-0/+4
| | | | This is already correctly done when we instantiate the cache, but wasn't when it got reloaded (which always happens at least once on startup).
* Fix stacktrace mangling in `patch_inline_callbacks` (#7554)Richard van der Hoff2020-05-221-2/+7
| | | `Failure()` is more cunning than `Failure(e)`.
* remove miscellaneous PY2 codeRichard van der Hoff2020-05-152-27/+8
|
* remove to_asciiRichard van der Hoff2020-05-151-19/+1
| | | | this is a no-op on python 3.
* Remove `exception_to_unicode`Richard van der Hoff2020-05-151-36/+0
| | | | this is a no-op on python 3.
* Strictly enforce canonicaljson requirements in a new room version (#7381)Patrick Cloke2020-05-141-1/+1
|
* Allow configuration of Synapse's cache without using synctl or environment ↵Amber Brown2020-05-117-92/+206
| | | | variables (#6391)
* Speed up fetching device lists changes in sync.Erik Johnston2020-05-051-4/+15
| | | | | Currently we copy `users_who_share_room` needlessly about three times, which is expensive when the set is large (which it can easily be).
* Extend StreamChangeCache to support multiple entities per stream ID (#7303)Richard van der Hoff2020-04-221-46/+71
| | | | | | | | | | | | | | | | | | | First some background: StreamChangeCache is used to keep track of what "entities" have changed since a given stream ID. So for example, we might use it to keep track of when the last to-device message for a given user was received [1], and hence whether we need to pull any to-device messages from the database on a sync [2]. Now, it turns out that StreamChangeCache didn't support more than one thing being changed at a given stream_id (this was part of the problem with #7206). However, it's entirely valid to send to-device messages to more than one user at a time. As it turns out, this did in fact work, because *some* methods of StreamChangeCache coped ok with having multiple things changing on the same stream ID, and it seems we never actually use the methods which don't work on the stream change caches where we allow multiple changes at the same stream ID. But that feels horribly fragile, hence: let's update StreamChangeCache to properly support this, and add some typing and some more tests while we're at it. [1]: https://github.com/matrix-org/synapse/blob/release-v1.12.3/synapse/storage/data_stores/main/deviceinbox.py#L301 [2]: https://github.com/matrix-org/synapse/blob/release-v1.12.3/synapse/storage/data_stores/main/deviceinbox.py#L47-L51
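The idea of tracking several entities at one stream ID can be sketched as follows. This is a minimal illustration only, not Synapse's `StreamChangeCache`: the class name `MultiEntityStreamCache`, its method signatures, and the use of plain dicts are all assumptions made for the example.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set


class MultiEntityStreamCache:
    """Hypothetical, simplified stand-in for a stream change cache."""

    def __init__(self, earliest_known_stream_pos: int) -> None:
        # Changes at or before this position are outside what the cache knows.
        self._earliest_known = earliest_known_stream_pos
        # stream ID -> every entity whose latest change is at that stream ID
        self._changes: Dict[int, Set[str]] = defaultdict(set)
        # entity -> latest stream ID at which it changed
        self._latest: Dict[str, int] = {}

    def entity_has_changed(self, entity: str, stream_pos: int) -> None:
        if stream_pos <= self._earliest_known:
            return
        old_pos = self._latest.get(entity)
        if old_pos is not None:
            if old_pos >= stream_pos:
                return  # we already know about a change at least this recent
            # Move the entity out of its old slot, dropping the slot if empty.
            self._changes[old_pos].discard(entity)
            if not self._changes[old_pos]:
                del self._changes[old_pos]
        self._latest[entity] = stream_pos
        self._changes[stream_pos].add(entity)

    def entities_at(self, stream_pos: int) -> Set[str]:
        return set(self._changes.get(stream_pos, ()))

    def get_entities_changed(self, entities: Iterable[str], stream_pos: int) -> Set[str]:
        if stream_pos < self._earliest_known:
            # The cache cannot answer, so the caller must assume everything changed.
            return set(entities)
        return {
            e for e in entities
            if self._latest.get(e, self._earliest_known) > stream_pos
        }
```

Keeping a set per stream ID is what lets two users receive to-device messages at the same position without one overwriting the other.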
* On catchup, process each row with its own stream id (#7286)Richard van der Hoff2020-04-201-0/+3
| | | | | | Other parts of the code (such as the StreamChangeCache) assume that there will not be multiple changes with the same stream id. This code was introduced in #7024, and I hope this fixes #7206.
* Rewrite prune_old_outbound_device_pokes for efficiency (#7159)Richard van der Hoff2020-03-301-1/+20
| | | | make sure we clear out all but one update for the user
* Clean up some LoggingContext stuff (#7120)Richard van der Hoff2020-03-242-20/+20
| | | | | | | | | | | | | | | | | | | | | | | * Pull Sentinel out of LoggingContext ... and drop a few unnecessary references to it * Factor out LoggingContext.current_context move `current_context` and `set_context` out to top-level functions. Mostly this means that I can more easily trace what's actually referring to LoggingContext, but I think it's generally neater. * move copy-to-parent into `stop` this really just makes `start` and `stop` more symmetric. It also means that it behaves correctly if you manually `set_log_context` rather than using the context manager. * Replace `LoggingContext.alive` with `finished` Turn `alive` into `finished` and make it a bit better defined.
* Clarify list/set/dict/tuple comprehensions and enforce via flake8 (#6957)Patrick Cloke2020-02-211-1/+1
| | | | Ensure good comprehension hygiene using flake8-comprehensions.
* Reduce amount of logging at INFO level. (#6862)Erik Johnston2020-02-061-1/+1
| | | | | | | | A lot of the things we log at INFO are now a bit superfluous, so let's make them DEBUG logs to reduce the amount we log by default. Co-Authored-By: Brendan Abolivier <babolivier@matrix.org> Co-authored-by: Brendan Abolivier <github@brendanabolivier.com>
* Fix stacktraces when using ObservableDeferred and async/await (#6836)Erik Johnston2020-02-031-0/+4
|
* Validate client_secret parameter (#6767)Andrew Morgan2020-01-241-0/+17
|
* Log saml assertions rather than the whole responseRichard van der Hoff2020-01-161-0/+13
| | | | | | ... since the whole response is huge. We even need to break up the assertions, since kibana otherwise truncates them.
* move batch_iter to a separate moduleRichard van der Hoff2020-01-162-17/+35
|
* Handle `config` not being set for synapse plugin modulesRichard van der Hoff2020-01-121-1/+1
| | | | | Some modules don't need any config, so having to define a `config` property just to keep the loader happy is a bit annoying.
* Persist auth/state events at backwards extremities when we fetch them (#6526)Richard van der Hoff2019-12-161-2/+2
| | | The main point here is to make sure that the state returned by _get_state_in_room has been authed before we try to use it as state in the room.
* look up cross-signing keys from the DB in bulk (#6486)Hubert Chathi2019-12-121-1/+1
|
* Remove SnapshotCache in favour of ResponseCacheErik Johnston2019-12-091-94/+0
|
* Fix inaccurate per-block metrics (#6491)Richard van der Hoff2019-12-091-42/+18
| | | | | `Measure` incorrectly assumed that it was the only thing being done by the parent `LoggingContext`. For instance, during a "renew group attestations" operation, hundreds of `outbound_request` calls could take place in parallel, all using the same `LoggingContext`. This would mean that any resources used during *any* of those calls would be reported against *all* of them, producing wildly inaccurate results. Instead, we now give each `Measure` block its own `LoggingContext` (using the parent `LoggingContext` mechanism to ensure that the log lines look correct and that the metrics are ultimately propagated to the top level for reporting against requests/background tasks).
* Port SyncHandler to async/awaitErik Johnston2019-12-051-6/+17
|
* Replace instance variations of homeserver with correct case/spacingAndrew Morgan2019-11-121-1/+1
|
* Fix LruCache callback deduplication (#6213)V024602019-11-071-11/+37
|
* Remove usage of deprecated logger.warn method from codebase (#6271)Andrew Morgan2019-10-314-6/+6
| | | Replace every instance of `logger.warn` with `logger.warning` as the former is deprecated.
* Clarify docstringErik Johnston2019-10-301-0/+4
|
* Make ObservableDeferred.observe() always return deferred.Erik Johnston2019-10-301-5/+2
| | | | | | This makes it easier to use in an async/await world. Also fixes a bug where cache descriptors would occasionally return a raw value rather than a deferred.
* Handle FileNotFound error in checking git repository version (#6284)Andrew Morgan2019-10-301-4/+6
|
* Make concurrently_execute work with async/awaitErik Johnston2019-10-291-4/+3
|
* Update docstringErik Johnston2019-10-291-3/+2
|
* Quick fix to ensure cache descriptors always return deferredsErik Johnston2019-10-281-2/+2
|
* Add maybe_awaitable and fix __init__ bugsErik Johnston2019-10-111-0/+29
|
* Fixup commentsErik Johnston2019-10-101-3/+3
| | | Co-Authored-By: Richard van der Hoff <1389908+richvdh@users.noreply.github.com>
* Fix py3.5Erik Johnston2019-10-101-1/+1
|
* Fix py3.5Erik Johnston2019-10-101-1/+1
|
* sortErik Johnston2019-10-101-3/+1
|
* Appease mypyErik Johnston2019-10-101-13/+18
|
* Add commentsErik Johnston2019-10-101-5/+25
|
* Log correct contextErik Johnston2019-10-101-6/+6
|
* Test for sentinel commitErik Johnston2019-10-101-2/+19
|
* Move patch_inline_callbacks into synapse/Erik Johnston2019-10-101-0/+179
|
* add some metrics on the federation sender (#6160)Richard van der Hoff2019-10-031-2/+4
|
* Fix up some typechecking (#6150)Amber Brown2019-10-025-8/+33
| | | | | | * type checking fixes * changelog
* Fix errors storing large retry intervals.Erik Johnston2019-10-021-1/+1
| | | | | | | | | We have set the max retry interval to a value larger than a postgres or sqlite int can hold, which caused exceptions when updating the destinations table. To fix postgres we need to change the column to a bigint, and for sqlite we lower the max interval to 2**62 (which is still incredibly long).
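The size argument in the commit above can be checked with a little arithmetic. The sketch below relies only on the documented widths of the column types (PostgreSQL `integer` is 4 bytes, `bigint` is 8 bytes, SQLite `INTEGER` holds up to 8 signed bytes); the constant and function names are made up for the example, and treating the interval as milliseconds is an assumption.

```python
# Largest values each signed column type can hold.
POSTGRES_INT_MAX = 2**31 - 1       # 4-byte "integer"
POSTGRES_BIGINT_MAX = 2**63 - 1    # 8-byte "bigint"
SQLITE_INTEGER_MAX = 2**63 - 1     # SQLite INTEGER uses at most 8 signed bytes

# The lowered cap named in the commit message.
MAX_RETRY_INTERVAL = 2**62


def fits(value: int, column_max: int) -> bool:
    """Return True if a non-negative value can be stored in the column."""
    return 0 <= value <= column_max


# Assumption: the interval is in milliseconds. Convert to years for scale.
MS_PER_YEAR = 1000 * 60 * 60 * 24 * 365
years = MAX_RETRY_INTERVAL / MS_PER_YEAR

if __name__ == "__main__":
    print("fits postgres integer:", fits(MAX_RETRY_INTERVAL, POSTGRES_INT_MAX))
    print("fits postgres bigint: ", fits(MAX_RETRY_INTERVAL, POSTGRES_BIGINT_MAX))
    print("fits sqlite INTEGER:  ", fits(MAX_RETRY_INTERVAL, SQLITE_INTEGER_MAX))
    print(f"roughly {years:.3g} years if measured in milliseconds")
```

Under these widths, 2**62 overflows a 4-byte integer but sits comfortably inside an 8-byte one, which matches the two-part fix the commit describes.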
* Merge branch 'develop' into rav/fix_attribute_mappingRichard van der Hoff2019-09-193-21/+68
|\
| * Add 'failure_ts' column to 'destinations' table (#6016)Richard van der Hoff2019-09-171-1/+15
| | | | | | | | Track the time that a server started failing at, for general analysis purposes.
| * Remove the cap on federation retry interval. (#6026)Richard van der Hoff2019-09-121-2/+2
| | | | | | | | | | | | Essentially the intention here is to end up blacklisting servers which never respond to federation requests. Fixes https://github.com/matrix-org/synapse/issues/5113.
| * Fix bug in calculating the federation retry backoff period (#6025)Richard van der Hoff2019-09-121-2/+3
| | | | | | | | This was intended to introduce an element of jitter; instead it gave you a 30/60 chance of resetting to zero.
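A backoff step with jitter that cannot collapse to zero might look like the sketch below. It is not the code from the commit above: the function name, the multiplier, the minimum interval, and the jitter range are all illustrative assumptions. The point it shows is to apply the random factor to the whole product and truncate only once at the end.

```python
import random


def next_retry_interval(current_ms: int, multiplier: int = 5,
                        min_ms: int = 600_000, rng=random) -> int:
    """Return the next backoff interval in milliseconds (hypothetical helper)."""
    if current_ms <= 0:
        # First failure: start from the minimum interval.
        return min_ms
    # Jitter scales the full product. Truncating a small random factor on its
    # own before multiplying is one way a backoff can accidentally become 0.
    jitter = rng.uniform(0.8, 1.4)
    return int(current_ms * multiplier * jitter)


if __name__ == "__main__":
    interval = 0
    for attempt in range(4):
        interval = next_retry_interval(interval)
        print(f"attempt {attempt}: wait {interval} ms")
```

With these assumed values each step lands between 4x and 7x the previous interval, so the sequence always grows.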
| * Use the v2 Identity Service API for lookups (MSC2134 + MSC2140) (#5976)Andrew Morgan2019-09-111-0/+33
| | | | | | | | | | | | | | This is a redo of https://github.com/matrix-org/synapse/pull/5897 but with `id_access_token` accepted. Implements [MSC2134](https://github.com/matrix-org/matrix-doc/pull/2134) plus Identity Service v2 authentication ala [MSC2140](https://github.com/matrix-org/matrix-doc/pull/2140). Identity lookup-related functions were also moved from `RoomMemberHandler` to `IdentityHandler`.
| * Clean up some code in the retry logic (#6017)Richard van der Hoff2019-09-111-16/+13
| | | | | | | | * remove some unused code * make things which were constants into constants for efficiency and clarity
| * Revert "Use the v2 lookup API for 3PID invites (#5897)" (#5937)Andrew Morgan2019-08-301-33/+0
| | | | | | | | | | This reverts commit 71fc04069a5770a204c3514e0237d7374df257a8. This broke 3PID invites as #5892 was required for it to work correctly.
| * Use the v2 lookup API for 3PID invites (#5897)Andrew Morgan2019-08-281-0/+33
| | | | | | | | | | | | | | Fixes https://github.com/matrix-org/synapse/issues/5861 Adds support for the v2 lookup API as defined in [MSC2134](https://github.com/matrix-org/matrix-doc/pull/2134). Currently this is only used for 3PID invites. Sytest PR: https://github.com/matrix-org/sytest/pull/679
| * Retry well-known lookup before expiry.Erik Johnston2019-08-131-3/+5
| | | | | | | | | | | | | | | | | | This gives a bit of a grace period where we can attempt to refetch a remote `well-known`, while still using the cached result if that fails. Hopefully this will make the well-known resolution a bit more tolerant of failures, rather than it immediately treating failures as "no result" and caching that for an hour.
* | Fix a bug with saml attribute maps.Richard van der Hoff2019-09-191-1/+19
|/ | | | | | | | | | | | | Fixes a bug where the default attribute maps were prioritised over user-specified ones, resulting in incorrect mappings. The problem is that if you call SPConfig.load() multiple times, it adds new attribute mappers to a list. So by calling it with the default config first, and then the user-specified config, we would always get the default mappers before the user-specified mappers. To solve this, let's merge the config dicts first, and then pass them to SPConfig.
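The "merge the config dicts first" approach from the commit above can be sketched with a small recursive merge. The helper name `merge_config` and the sample keys are hypothetical, and the sketch deliberately leaves out the call into the SAML library, so it only shows the merge step.

```python
def merge_config(defaults: dict, overrides: dict) -> dict:
    """Return a new dict where values in `overrides` win over `defaults`.

    Nested dicts are merged key by key; any other value is replaced outright.
    """
    merged = dict(defaults)
    for key, value in overrides.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = merge_config(merged[key], value)
        else:
            merged[key] = value
    return merged


if __name__ == "__main__":
    # Hypothetical settings, for illustration only.
    default_config = {"attribute_map_dir": "/default/maps", "service": {"sp": {"name": "x"}}}
    user_config = {"attribute_map_dir": "/etc/custom/maps"}
    # A single merged dict is then loaded once, so only one set of mappers exists.
    print(merge_config(default_config, user_config))
```

Loading one merged dict once avoids depending on the order in which repeated loads append their mappers.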
* Add kwargs and docBrendan Abolivier2019-07-291-2/+4
|
* Add ability to pass arguments to looping callsBrendan Abolivier2019-07-291-2/+2
|
* Fix some error cases in the caching layer. (#5749)Richard van der Hoff2019-07-251-32/+42
| | | | | | | There was some inconsistent behaviour in the caching layer around how exceptions were handled - particularly synchronously-thrown ones. This seems to be most easily handled by pushing the creation of ObservableDeferreds down from CacheDescriptor to the Cache.
* Add a prometheus metric for active cache lookups. (#5750)Richard van der Hoff2019-07-242-2/+33
| | | | | | * Add a prometheus metric for active cache lookups. * changelog
* Replace returnValue with return (#5736)Amber Brown2019-07-236-15/+13
|
* Cache get_version_string.Erik Johnston2019-07-221-2/+21
| | | | | | | | | The version of a module isn't going to change over the lifetime of the process (assuming no funky hot reloading is going on, which it isn't), so let's just cache the result to avoid spawning lots of git subprocesses. Fixes #5672.
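Caching a value that cannot change during the process lifetime can be sketched with `functools.lru_cache`. This is an illustration under assumptions, not the change made in the commit: `_describe_checkout` is a made-up stand-in for the expensive git subprocess calls, and the function here takes a plain name rather than a module.

```python
import functools

# Records every time the expensive lookup actually runs (for demonstration).
lookup_calls = []


def _describe_checkout(name: str) -> str:
    """Hypothetical stand-in for spawning git subprocesses."""
    lookup_calls.append(name)
    return f"{name}/1.0 (b=develop,abcdef1)"


@functools.lru_cache(maxsize=None)
def get_version_string(name: str) -> str:
    # The result is computed once per name and served from memory afterwards.
    return _describe_checkout(name)


if __name__ == "__main__":
    print(get_version_string("example"))
    print(get_version_string("example"))
    print("expensive lookups performed:", len(lookup_calls))
```

The trade-off is that a cached value goes stale if the checkout changes while the process is running, which the commit message argues does not happen in practice.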
* Fixes to the federation rate limiter (#5621)Richard van der Hoff2019-07-051-8/+8
| | | | | | | - Put the default window_size back to 1000ms (broken by #5181) - Make the `rc_federation` config actually do something - fix an off-by-one error in the 'concurrent' limit - Avoid creating an unused `_PerHostRatelimiter` object for every single incoming request
* Improve the backwards compatibility re-exports of synapse.logging.context ↵Amber Brown2019-07-053-5/+61
| | | | | | | | (#5617) * Improve the backwards compatibility re-exports of synapse.logging.context. * reexport logformatter too
* Move logging utilities out of the side drawer of util/ and into logging/ (#5606)Amber Brown2019-07-0412-962/+26
|
* Fix 'utime went backwards' errors on daemonization. (#5609)Richard van der Hoff2019-07-031-4/+13
| | | | | | | | * Fix 'utime went backwards' errors on daemonization. Fixes #5608 * remove spurious debug
* Fix a number of "Starting txn from sentinel context" warnings (#5605)Richard van der Hoff2019-07-031-1/+7
| | | | Fixes #5602, #5603
* Fix media repo breaking (#5593)Amber Brown2019-07-021-2/+7
|
* Prevent multiple upgrades on the same room at once (#5051)Andrew Morgan2019-06-251-1/+1
| | | | | | | Closes #4583 Does slightly less than #5045, which prevented a room from being upgraded multiple times, one after another. This PR still allows that, but just prevents two from happening at the same time. Mostly just to mitigate the fact that servers are slow and it can take a moment for the room upgrade to actually complete. We don't want people sending another request to upgrade the room when really they just thought the first didn't go through.
* Avoid raising exceptions in metricsRichard van der Hoff2019-06-241-8/+14
| | | | | Sentry will catch the errors if they happen, so that should be good enough, and won't make things explode if we hit the error condition.
* Merge branch 'develop' into rav/cleanup_metricsRichard van der Hoff2019-06-2427-314/+317
|\
| * Run Black. (#5482)Amber Brown2019-06-2027-314/+317
| |
* | Sanity-checking for metrics updatesRichard van der Hoff2019-06-191-7/+33
|/ | | | Check that our clocks go forward.
* Call RetryLimiter correctly (#5340)Richard van der Hoff2019-06-041-1/+6
| | | Fixes a regression introduced in #5335.
* Avoid rapidly backing-off a server if we ignore the retry intervalRichard van der Hoff2019-06-031-23/+37
|
* Improve logging for logcontext leaks. (#5288)Richard van der Hoff2019-05-291-9/+13
|
* Make all the rate limiting options more consistent (#5181)Amber Brown2019-05-151-32/+15
|
* Merge pull request #5183 from matrix-org/erikj/async_serialize_eventErik Johnston2019-05-151-0/+19
|\ | | | | Allow client event serialization to be async
| * Update docstring with correct return typeErik Johnston2019-05-151-1/+1
| | | | | | Co-Authored-By: Richard van der Hoff <1389908+richvdh@users.noreply.github.com>
| * Allow client event serialization to be asyncErik Johnston2019-05-141-0/+19
| |
* | comment about user_joined_roomRichard van der Hoff2019-05-141-0/+1
|/
* Merge branch 'master' into developRichard van der Hoff2019-05-031-2/+7
|\
| * Use SystemRandom for token generationRichard van der Hoff2019-05-031-2/+7
| |
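Token generation backed by the operating system's randomness source can be sketched with the standard library's `random.SystemRandom`. The function name, alphabet, and default length below are assumptions for the example rather than details taken from the commit.

```python
import random
import string

# SystemRandom draws from os.urandom() instead of the seedable Mersenne Twister.
_secure_rand = random.SystemRandom()

# Hypothetical alphabet for the example.
_TOKEN_CHARS = string.ascii_letters


def random_token(length: int = 20) -> str:
    """Return `length` characters chosen with the OS random source."""
    return "".join(_secure_rand.choice(_TOKEN_CHARS) for _ in range(length))


if __name__ == "__main__":
    print(random_token())
```

Unlike the module-level `random` functions, a `SystemRandom` instance ignores seeding, so its output cannot be reproduced from a known seed.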
* | Remove periods from copyright headers (#5046)Andrew Morgan2019-04-111-1/+1
| |
* | Fix disappearing exceptions in manhole. (#5035)Richard van der Hoff2019-04-101-2/+57
|/ | | Avoid sending syntax errors from the manhole to sentry.
* Add a caching layer to .well-known responses (#4516)Richard van der Hoff2019-01-301-0/+161
|
* Merge pull request #4486 from xperimental/workaround-4216Richard van der Hoff2019-01-301-1/+4
|\ | | | | Implement workaround for login error.
| * Implement workaround for login error.Robert Jacob2019-01-301-1/+4
| | | | | | | | Signed-off-by: Robert Jacob <xperimental@solidproject.de>
* | Make linearizer more quiet (#4507)Amber Brown2019-01-291-5/+5
|/
* Fix incorrect logcontexts after a Deferred was cancelled (#4407)Richard van der Hoff2019-01-171-1/+3
|
* Fix UnicodeDecodeError when postgres is not configured in english (#4253)Richard van der Hoff2018-12-041-1/+38
| | | | This is a bit of a half-assed effort at fixing https://github.com/matrix-org/synapse/issues/4252. Fundamentally the right answer is to drop support for Python 2.
* Merge branch 'develop' of github.com:matrix-org/synapse into ↵Erik Johnston2018-10-253-54/+76
|\ | | | | | | erikj/alias_disallow_list
| * Correctly account for cpu usage by background threads (#4074)Richard van der Hoff2018-10-231-51/+69
| | | | | | | | | | | | | | | | | | | | Wrap calls to deferToThread() in a thing which uses a child logcontext to attribute CPU usage to the right request. While we're in the area, remove the logcontext_tracer stuff, which is never used, and afaik doesn't work. Fixes #4064
| * Make scripts/ and scripts-dev/ pass pyflakes (and the rest of the codebase ↵Amber Brown2018-10-201-1/+3
| | | | | | | | on py3) (#4068)
| * Fix manhole on py3 (pt 2) (#4067)Amber Brown2018-10-191-0/+2
| |
| * make a bytestringAmber Brown2018-10-191-2/+2
| |
* | Anchor returned regex to start and end of stringErik Johnston2018-10-191-2/+6
| |
* | Add config option to control alias creationErik Johnston2018-10-191-0/+21
|/
* Remove unnecessary extra function call layerErik Johnston2018-10-081-16/+13
|
* Use errback pattern and catch async failuresErik Johnston2018-10-081-14/+29
|
* Log looping call exceptionsErik Johnston2018-10-051-1/+18
| | | | | | | | If a looping call function errors, then it kills the loop entirely. Currently it throws away the exception logs, so we should make it actually log them. Fixes #3929
* Correctly match 'dict.pop' apiErik Johnston2018-10-011-3/+11
|
* Don't update eviction metrics on explicit removalErik Johnston2018-10-011-5/+0
|
* Merge remote-tracking branch 'origin/develop' into erikj/destination_retry_cacheRichard van der Hoff2018-09-281-4/+37
|\
| * Include eventid in log lines when processing incoming federation ↵Richard van der Hoff2018-09-271-4/+37
| | | | | | | | | | | | | | | | | | | | | | transactions (#3959) when processing incoming transactions, it can be hard to see what's going on, because we process a bunch of stuff in parallel, and because we may end up recursively working our way through a chain of three or four events. This commit creates a way to use logcontexts to add the relevant event ids to the log lines.
* | Merge branch 'rav/fix_expiring_cache_len' into erikj/destination_retry_cacheRichard van der Hoff2018-09-261-10/+17
|\|
| * Log which cache is throwing exceptionsRichard van der Hoff2018-09-261-10/+17
| |
| * Fix ExpiringCache.__len__ to be accurateErik Johnston2018-09-261-10/+7
| | | | | | | | | | | | | | | | | | | | | | | | It used to try and produce an estimate, which was sometimes negative. This caused metrics to be sad, so let's always just calculate it from scratch. (This appears to have been a longstanding bug, but one which has been made more of a problem by #3932 and #3933). (This was originally done by Erik as part of #3933. I'm cherry-picking it because really it's a fix in its own right)
* | Fix ExpiringCache.__len__ to be accurateErik Johnston2018-09-211-12/+9
| | | | | | | | | | | | It used to try and produce an estimate, which was sometimes negative. This caused metrics to be sad, so let's always just calculate it from scratch.
* | Add a five minute cache to get_destination_retry_timingsErik Johnston2018-09-211-0/+13
| | | | | | | | Hopefully helps with #3931
* | Make ExpiringCache slightly more performantErik Johnston2018-09-211-1/+5
|/
* Fix some instances of ExpiringCache not expiring cache itemsErik Johnston2018-09-211-1/+0
| | | | | | | | ExpiringCache required that `start()` be called before it would actually start expiring entries. A number of places didn't do that. This PR removes `start` from ExpiringCache, and automatically starts the background reaping process on creation instead.
* Improve the logging when handling a federation transaction (#3904)Richard van der Hoff2018-09-191-1/+1
| | | | | | | | | | Let's try to rationalise the logging that happens when we are processing an incoming transaction, to make it easier to figure out what is going wrong when they take ages. In particular: - make everything start with a [room_id event_id] prefix - make sure we log a warning when catching exceptions rather than just turning them into other, more cryptic, exceptions.
* Replace custom DeferredTimeoutError with defer.TimeoutErrorErik Johnston2018-09-191-9/+3
|
* Run canceller first to allow it to generate correct errorErik Johnston2018-09-191-2/+5
|
* Update to use new timeout function everywhere.Erik Johnston2018-09-191-54/+19
| | | | | | | The existing deferred timeout helper function (and the one in twisted) suffer from a bug when a deferred's canceller throws an exception, #3842. The new helper function doesn't suffer from this problem.
* Fix timeout functionErik Johnston2018-09-151-1/+2
| | | | | Turns out deferred.cancel sometimes throws, so we do that last to ensure that we always do resolve the new deferred.
* Add an awful secondary timeout to fix wedged requestsErik Johnston2018-09-141-0/+51
| | | | This is an attempt to mitigate #3842 by adding yet-another-timeout
* Add in flight real time metrics for Measure blocksErik Johnston2018-09-141-0/+22
|
* Change the manhole SSH key to have more bitsErik Johnston2018-09-111-13/+31
| | | | | Newer versions of openssh client refuse to connect to the old key due to its length.
* Fix exceptions when a connection is closed before we read the headersRichard van der Hoff2018-08-201-1/+3
| | | | | This fixes bugs introduced in #3700, by making sure that we behave sanely when an incoming connection is closed before the headers are read.
* Robustness fix for logcontext filterRichard van der Hoff2018-08-201-1/+7
| | | | | Make the logcontext filter not explode if it somehow ends up with a logcontext of None, since that infinite-loops the whole logging system.
* Port over enough to get some sytests running on Python 3 (#3668)Amber Brown2018-08-203-8/+29
|
* Merge branch 'rav/fix_linearizer_cancellation' into developRichard van der Hoff2018-08-101-43/+68
|\
| * Fix linearizer cancellation on twisted < 18.7Richard van der Hoff2018-08-101-43/+68
| | | | | | | | | | | | Turns out that cancellation of inlineDeferreds didn't really work properly until Twisted 18.7. This commit refactors Linearizer.queue to avoid inlineCallbacks.
* | Rename async to async_helpers because `async` is a keyword on Python 3.7 (#3678)Amber Brown2018-08-105-4/+4
|/
* Python 3: Convert some unicode/bytes uses (#3569)Amber Brown2018-08-021-3/+3
|
* fix invalidationRichard van der Hoff2018-07-271-1/+1
|
* Rewrite cache list decoratorRichard van der Hoff2018-07-271-67/+64
| | | | | Because it was complicated and annoyed me. I suspect this will be more efficient too.
* Fix some looping_call calls which were broken in #3604Richard van der Hoff2018-07-261-1/+1
| | | | | | | | | It turns out that looping_call does check the deferred returned by its callback, and (at least in the case of client_ips), we were relying on this, and I broke it in #3604. Update run_as_background_process to return the deferred, and make sure we return it to clock.looping_call.
* Test and fix support for cancellation in LinearizerRichard van der Hoff2018-07-201-6/+22
|
* Combine Limiter and LinearizerRichard van der Hoff2018-07-201-89/+10
| | | | | Linearizer was effectively a Limiter with max_count=1, so rather than maintaining two sets of code, let's combine them.
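The observation that a linearizer is a limiter with `max_count=1` can be sketched as below. This deliberately swaps in `asyncio` for Twisted to keep the example self-contained, so it is not how Synapse implements it; `KeyedLimiter`, `queue`, and `peak_concurrency` are names invented for the illustration.

```python
import asyncio
from contextlib import asynccontextmanager
from typing import Dict, Hashable


class KeyedLimiter:
    """Allow at most `max_count` concurrent holders per key (hypothetical)."""

    def __init__(self, max_count: int = 1) -> None:
        self._max_count = max_count
        # One semaphore per key. Entries are never removed in this sketch.
        self._semaphores: Dict[Hashable, asyncio.Semaphore] = {}

    @asynccontextmanager
    async def queue(self, key: Hashable):
        sem = self._semaphores.get(key)
        if sem is None:
            sem = self._semaphores[key] = asyncio.Semaphore(self._max_count)
        async with sem:
            yield


async def peak_concurrency(max_count: int, workers: int = 5) -> int:
    """Run `workers` tasks on one key and report how many overlapped."""
    limiter = KeyedLimiter(max_count)
    running = 0
    peak = 0

    async def worker() -> None:
        nonlocal running, peak
        async with limiter.queue("!room:example"):
            running += 1
            peak = max(peak, running)
            await asyncio.sleep(0)  # give other tasks a chance to run
            running -= 1

    await asyncio.gather(*(worker() for _ in range(workers)))
    return peak


if __name__ == "__main__":
    print("max_count=1 peak:", asyncio.run(peak_concurrency(1)))
    print("max_count=3 peak:", asyncio.run(peak_concurrency(3)))
```

With `max_count=1` the same class serialises work per key, which is why maintaining a separate linearizer implementation adds little.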
* Improvements to the LimiterRichard van der Hoff2018-07-201-13/+20
| | | | | * give them names, to improve logging * use a deque rather than a list for efficiency
* Add a sleep to the Limiter to fix stack overflows.Richard van der Hoff2018-07-201-3/+20
| | | | Fixes #3570
* Don't spew errors because we can't save metrics (#3563)Amber Brown2018-07-192-6/+24
|
* Make Distributor run its processes as a background processRichard van der Hoff2018-07-181-26/+18
| | | | | | | | | | | This is more involved than it might otherwise be, because the current implementation just drops its logcontexts and runs everything in the sentinel context. It turns out that we aren't actually using a bunch of the functionality here (notably suppress_failures and the fact that Distributor.fire returns a deferred), so the easiest way to fix this is actually by simplifying a bunch of code.
* Run things as background processesRichard van der Hoff2018-07-182-1/+9
| | | | | | | | This fixes #3518, and ensures that we get useful logs and metrics for lots of things that happen in the background. (There are certainly more things that happen in the background; these are just the common ones I've found running a single-process synapse locally).
* Use efficient .intersectionErik Johnston2018-07-171-4/+1
|
* Fix perf regression in PR #3530Erik Johnston2018-07-171-1/+6
| | | | | | | | The get_entities_changed function was changed to return all changed entities since the given stream position, rather than only those changed from a given list of entities. This resulted in the function incorrectly returning large numbers of entities that, for example, caused large increases in database usage.
* Merge pull request #3530 from matrix-org/erikj/stream_cacheAmber Brown2018-07-171-8/+1
|\ | | | | Don't return unknown entities in get_entities_changed
| * Don't return unknown entities in get_entities_changedErik Johnston2018-07-131-8/+1
| | | | | | | | | | | | | | | | The stream cache keeps track of all entities that have changed since a particular stream position, so get_entities_changed does not need to return unknown entities when given a larger stream position. This makes it consistent with the behaviour of has_entity_changed.
* | Make FederationRateLimiter queue requests properlyRichard van der Hoff2018-07-131-10/+23
|/ | | | | | | | popitem removes the *most recent* item by default [1]. We want the oldest. Fixes #3524 [1]: https://docs.python.org/2/library/collections.html#collections.OrderedDict.popitem
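The `popitem` behaviour mentioned in the commit above is easy to see in isolation. The sketch uses only `collections.OrderedDict` from the standard library; the queue contents are placeholder values.

```python
from collections import OrderedDict

# Insertion order models arrival order of queued requests (placeholder data).
queue = OrderedDict()
queue["first"] = 1
queue["second"] = 2
queue["third"] = 3

# By default popitem() is LIFO: it removes the most recently added entry.
newest = queue.popitem()

# Passing last=False makes it FIFO: it removes the oldest remaining entry.
oldest = queue.popitem(last=False)

if __name__ == "__main__":
    print("popitem():           ", newest)
    print("popitem(last=False): ", oldest)
    print("left in queue:       ", list(queue))
```

For a fair request queue the FIFO form is the one that serves waiters in arrival order.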
* Reduce set building in get_entities_changedRichard van der Hoff2018-07-121-8/+12
| | | | | | | | | | | This line shows up as about 5% of cpu time on a synchrotron: not_known_entities = set(entities) - set(self._entity_to_key) Presumably the problem here is that _entity_to_key can be largeish, and building a set for its keys every time this function is called is slow. Here we rewrite the logic to avoid building so many sets.
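One way to avoid building a set from the large mapping is to test membership against it directly, as in the sketch below. This illustrates the general idea only and is not the rewritten logic from the commit; the function names and sample data are invented.

```python
from typing import Dict, Iterable, Set


def unknown_entities_slow(entities: Iterable[str], entity_to_key: Dict[str, int]) -> Set[str]:
    # Builds a set of every key in the mapping on each call: O(len(mapping)).
    return set(entities) - set(entity_to_key)


def unknown_entities_fast(entities: Iterable[str], entity_to_key: Dict[str, int]) -> Set[str]:
    # Dict membership is a hash lookup, so the cost is O(len(entities)).
    return {e for e in entities if e not in entity_to_key}


if __name__ == "__main__":
    mapping = {f"@user{i}:example.org": i for i in range(100_000)}
    wanted = ["@user5:example.org", "@stranger:example.org"]
    print(unknown_entities_fast(wanted, mapping))
```

Both functions return the same result; the difference is that the second never iterates over the whole mapping.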
* Attempt to include db threads in cpu usage stats (#3496)Richard van der Hoff2018-07-101-2/+21
| | | | | Let's try to include time spent in the DB threads in the per-request/block cpu usage metrics.
* Refactor logcontext resource usage tracking (#3501)Richard van der Hoff2018-07-102-49/+120
| | | | | Factor out the resource usage tracking out to a separate object, which can be passed around and copied independently of the logcontext itself.
* run isortAmber Brown2018-07-0922-73/+71
|
* Attempt to be more performant on PyPy (#3462)Amber Brown2018-06-281-1/+1
|
* Revert "Revert "Try to not use as much CPU in the StreamChangeCache"" (#3454)Amber Brown2018-06-281-2/+4
|
* Revert "Try to not use as much CPU in the StreamChangeCache"Matthew Hodgson2018-06-261-4/+2
|
* fixesAmber Brown2018-06-261-2/+2
|
* fixesAmber Brown2018-06-261-2/+2
|
* try and make loading items from the cache fasterAmber Brown2018-06-261-2/+4
|
* Remove all global reactor imports & pass it around explicitly (#3424)Amber Brown2018-06-251-0/+3
|
* Disable partial state group caching for wildcard lookupsRichard van der Hoff2018-06-221-13/+12
| | | | | | | When _get_state_for_groups is given a wildcard filter, just do a complete lookup. Hopefully this will give us the best of both worlds by not filling up the ram if we only need one or two keys, but also making the cache still work for the federation reader usecase.
* Merge pull request #3419 from matrix-org/rav/events_per_requestRichard van der Hoff2018-06-221-0/+15
|\ | | | | Log number of events fetched from DB
| * Indirect evt_count updates via method callRichard van der Hoff2018-06-221-0/+11
| | | | | | | | so that we can stub it for the sentinel and not have a billion failing UTs
| * Log number of events fetched from DBRichard van der Hoff2018-06-211-0/+4
| | | | | | | | | | | | | | | | | | | | When we finish processing a request, log the number of events we fetched from the database to handle it. [I'm trying to figure out which requests are responsible for large amounts of event cache churn. It may turn out to be more helpful to add counts to the prometheus per-request/block metrics, but that is an extension to this code anyway.]
* | Pass around the reactor explicitly (#3385)Amber Brown2018-06-224-33/+43
|/
* Remove run_on_reactor (#3395)Amber Brown2018-06-141-9/+1
|
* Port to sortedcontainers (with tests!) (#3332)Amber Brown2018-06-061-26/+31
|
* Add hacky cache factor override systemErik Johnston2018-06-042-2/+12
|
* Consistently use six's iteritems and wrap lazy keys/values in list() if ↵Amber Brown2018-05-312-3/+5
| | | | they're not meant to be lazy (#3307)
* Merge pull request #3281 from NotAFile/py3-six-isinstanceAmber Brown2018-05-302-11/+15
|\ | | | | remaining isintance fixes
| * pep8Adrian Tschira2018-05-291-0/+1
| |
| * fix recursion errorAdrian Tschira2018-05-241-7/+5
| |
| * remaining isintance fixesAdrian Tschira2018-05-242-6/+11
| | | | | | | | Signed-off-by: Adrian Tschira <nota@notafile.com>
* | fix up testsAmber Brown2018-05-281-3/+3
| |
* | update to more consistently use seconds in any metrics or loggingAmber Brown2018-05-283-19/+19
| |
* | add comment about why unregAmber Brown2018-05-281-0/+2
| |
* | Merge remote-tracking branch 'origin/develop' into 3218-official-promAmber Brown2018-05-282-1/+24
|\|
| * Merge pull request #3247 from NotAFile/py3-miscAmber Brown2018-05-241-1/+6
| |\ | | | | | | Misc Python3 fixes
| | * fix py3 intern and remove unnecessary py3 encodeAdrian Tschira2018-05-191-1/+6
| | | | | | | | | | | | Signed-off-by: Adrian Tschira <nota@notafile.com>
| * | Merge pull request #3245 from NotAFile/batch-iterAmber Brown2018-05-241-0/+18
| |\ \ | | | | | | | | Add batch_iter to utils
| | * | Add batch_iter to utilsAdrian Tschira2018-05-191-0/+18
| | |/ | | | | | | | | | | | | | | | | | | | | | | | | There's a frequent idiom I noticed where an iterable is split up into a number of chunks/batches. Unfortunately that method does not work with iterators like dict.keys() in python3. This implementation works with iterators. Signed-off-by: Adrian Tschira <nota@notafile.com>
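A batching helper that works on any iterable, including views such as `dict.keys()`, can be sketched with `itertools.islice`. This is a minimal sketch of the idea in the commit above and may differ from the committed code.

```python
from itertools import islice
from typing import Iterable, Iterator, Tuple, TypeVar

T = TypeVar("T")


def batch_iter(iterable: Iterable[T], size: int) -> Iterator[Tuple[T, ...]]:
    """Yield tuples of up to `size` items until the input is exhausted."""
    source = iter(iterable)
    # iter(callable, sentinel) keeps calling until an empty tuple comes back,
    # which happens once `source` has no more items.
    return iter(lambda: tuple(islice(source, size)), ())


if __name__ == "__main__":
    users = {"@a:hs": 1, "@b:hs": 2, "@c:hs": 3}
    for batch in batch_iter(users.keys(), 2):
        print(batch)
```

Slicing by index (`seq[i:i + size]`) needs a sequence, whereas this form only needs something iterable, so it also works with generators.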
* | | cleanupAmber Brown2018-05-221-5/+10
| | |
* | | cleanup pep8 errorsAmber Brown2018-05-221-2/+5
| | |
* | | fixesAmber Brown2018-05-222-12/+30
| | |
* | | Merge remote-tracking branch 'origin/develop' into 3218-official-promAmber Brown2018-05-221-11/+27
|\| |
| * | CommentErik Johnston2018-05-221-1/+1
| | |
| * | Fix logcontext resource usage trackingErik Johnston2018-05-221-11/+27
| |/
* / replacing portionsAmber Brown2018-05-217-98/+71
|/
* Merge remote-tracking branch 'origin/develop' into rav/warn_on_logcontext_failRichard van der Hoff2018-05-0315-136/+342
|\
| * Fix logcontext leaks in rate limiterRichard van der Hoff2018-05-031-3/+14
| |
| * Merge branch 'develop' into rav/more_logcontext_leaksRichard van der Hoff2018-05-021-1/+1
| |\
| | * Fix incorrect reference to StringIORichard van der Hoff2018-05-021-1/+1
| | | | | | | | | | | | This was introduced in 4f2f5171
| * | Fix a class of logcontext leaksRichard van der Hoff2018-05-021-22/+38
| |/ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | So, it turns out that if you have a first `Deferred` `D1`, you can add a callback which returns another `Deferred` `D2`, and `D2` must then complete before any further callbacks on `D1` will execute (and later callbacks on `D1` get the *result* of `D2` rather than `D2` itself). So, `D1` might have `called=True` (as in, it has started running its callbacks), but any new callbacks added to `D1` won't get run until `D2` completes - so if you `yield D1` in an `inlineCallbacks` function, your `yield` will 'block'. In conclusion: some of our assumptions in `logcontext` were invalid. We need to make sure that we don't optimise out the logcontext juggling when this situation happens. Fortunately, it is easy to detect by checking `D1.paused`.
| * Merge pull request #3144 from ↵Richard van der Hoff2018-04-301-1/+7
| |\ | | | | | | | | | | | | matrix-org/rav/run_in_background_exception_handling Trap exceptions thrown within run_in_background
| | * Trap exceptions thrown within run_in_backgroundRichard van der Hoff2018-04-271-1/+7
| | | | | | | | | | | | | | | Turn any exceptions that get thrown synchronously within run_in_background into Failures instead.
| * | Merge branch 'develop' into py3-xrange-1Richard van der Hoff2018-04-307-12/+17
| |\ \
| | * \ Merge pull request #3154 from NotAFile/py3-stringioRichard van der Hoff2018-04-301-1/+1
| | |\ \ | | | | | | | | | | Replace stringIO imports with six
| | | * | replace stringIO importsAdrian Tschira2018-04-281-1/+1
| | | | |
| | * | | Merge pull request #3155 from NotAFile/py3-bytes-1Richard van der Hoff2018-04-301-2/+5
| | |\ \ \ | | | | | | | | | | | | more bytes strings
| | | * | | more bytes stringsAdrian Tschira2018-04-291-2/+5
| | | |/ / | | | | | | | | | | | | | | | Signed-off-by: Adrian Tschira <nota@notafile.com>
| | * | | Merge pull request #3140 from matrix-org/rav/use_run_in_backgroundRichard van der Hoff2018-04-305-9/+11
| | |\ \ \ | | | |/ / | | |/| | Use run_in_background in preference to preserve_fn
| | | * | Merge remote-tracking branch 'origin/develop' into rav/use_run_in_backgroundRichard van der Hoff2018-04-271-1/+6
| | | |\ \
| | | * | | Use run_in_background in preference to preserve_fnRichard van der Hoff2018-04-275-9/+11
| | | | |/ | | | |/| | | | | | | | | | | | | | | | | | | | | While I was going through uses of preserve_fn for other PRs, I converted places which only use the wrapped function once to use run_in_background, to avoid creating the function object.
| * | / | Move more xrange to sixAdrian Tschira2018-04-283-5/+10
| |/ / / | | | | | | | | | | | | | | | | | | | | plus a bonus next() Signed-off-by: Adrian Tschira <nota@notafile.com>
| * | | Merge remote-tracking branch 'origin/develop' into rav/deferred_timeoutRichard van der Hoff2018-04-271-1/+6
| |\ \ \ | | | |/ | | |/|
| | * | Improve exception handling for background processesRichard van der Hoff2018-04-271-1/+6
| | |/ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | There were a bunch of places where we fire off a process to happen in the background, but don't have any exception handling on it - instead relying on the unhandled error being logged when the relevant deferred gets garbage-collected. This is unsatisfactory for a number of reasons: - logging on garbage collection is best-effort and may happen some time after the error, if at all - it can be hard to figure out where the error actually happened. - it is logged as a scary CRITICAL error which (a) I always forget to grep for and (b) it's not really CRITICAL if a background process we don't care about fails. So this is an attempt to add exception handling to everything we fire off into the background.
| * | Backport deferred.addTimeoutRichard van der Hoff2018-04-271-0/+67
| | | | | | | | | | | | Twisted 16.0 doesn't have addTimeout, so let's backport it.
| * | Use deferred.addTimeout instead of time_bound_deferredRichard van der Hoff2018-04-231-56/+0
| |/ | | | | | | This doesn't feel like a wheel we need to reinvent.
| * Merge pull request #3107 from NotAFile/py3-bool-nonzeroRichard van der Hoff2018-04-201-0/+1
| |\ | | | | | | add __bool__ alias to __nonzero__ methods
| | * add __bool__ alias to __nonzero__ methodsAdrian Tschira2018-04-151-0/+1
| | | | | | | | | | | | Signed-off-by: Adrian Tschira <nota@notafile.com>
| * | Merge pull request #3110 from NotAFile/py3-six-queueRichard van der Hoff2018-04-201-2/+2
| |\ \ | | | | | | | | Replace Queue with six.moves.queue
| | * | Replace Queue with six.moves.queueAdrian Tschira2018-04-161-2/+2
| | |/ | | | | | | | | | | | | | | | and a six.range change which I missed the last time Signed-off-by: Adrian Tschira <nota@notafile.com>
| * | Merge pull request #3093 from matrix-org/rav/response_cache_wrapRichard van der Hoff2018-04-201-14/+74
| |\ \ | | |/ | |/| Refactor ResponseCache usage
| | * ResponseCache: fix handling of completed resultsRichard van der Hoff2018-04-131-13/+19
| | | | | | | | | | | | | | | Turns out that ObservableDeferred.observe doesn't return a deferred if the result is already completed. Fix handling and improve documentation.
| | * Refactor ResponseCache usageRichard van der Hoff2018-04-121-2/+56
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Adds a `.wrap` method to ResponseCache which wraps up the boilerplate of a (get, set) pair, and then use it throughout the codebase. This will be largely non-functional, but does include the following functional changes: * federation_server.on_context_state_request: drops use of _server_linearizer which looked redundant and could cause incorrect cache misses by yielding between the get and the set. * RoomListHandler.get_remote_public_room_list(): fixes logcontext leaks * the wrap function includes some logging. I'm hoping this won't be too noisy on production.
| * | Revert "Use sortedcontainers instead of blist"Richard van der Hoff2018-04-131-2/+2
| |/ | | | | | | | | | | | | | | | | | | | | This reverts commit 9fbe70a7dc3afabfdac176ba1f4be32dd44602aa. It turns out that sortedcontainers.SortedDict is not an exact match for blist.sorteddict; in particular, `popitem()` removes things from the opposite end of the dict. This is trivial to fix, but I want to add some unit tests, and potentially some more thought about it, before we do so.
| * Merge pull request #3092 from matrix-org/rav/response_cache_metricsRichard van der Hoff2018-04-121-1/+13
| |\ | | | | | | Add metrics for ResponseCache
| | * Add metrics for ResponseCacheRichard van der Hoff2018-04-101-1/+13
| | |
| * | Merge pull request #3059 from matrix-org/rav/doc_response_cacheRichard van der Hoff2018-04-121-0/+32
| |\ \ | | | | | | | | Document the behaviour of ResponseCache
| | * | Document the behaviour of ResponseCacheRichard van der Hoff2018-04-041-0/+32
| | | | | | | | | | | | | | | | | | | | | | | | it looks like everything that uses ResponseCache expects to have to `make_deferred_yieldable` its results. It's debatable whether that is the best approach, but let's document it for now to avoid further confusion.
| * | | Use sortedcontainers instead of blistVincent Breitmoser2018-04-101-2/+2
| | |/ | |/| | | | | | | | | | | | | | | | | | | This commit drop-in replaces blist with SortedContainers. They are written in pure Python so work with PyPy, but perform as well as native implementations, at least in a couple of benchmarks: http://www.grantjenks.com/docs/sortedcontainers/performance.html
| * | Revert "Merge pull request #3066 from matrix-org/rav/remove_redundant_metrics"Richard van der Hoff2018-04-091-0/+25
| | | | | | | | | | | | | | | | | | | | | We aren't ready to release this yet, so I'm reverting it for now. This reverts commit d1679a4ed7947b0814e0f2af9b888a16c588f1a1, reversing changes made to e089100c6231541c446e37e157dec8feed02d283.
| * | Merge pull request #3068 from matrix-org/rav/fix_cache_invalidationRichard van der Hoff2018-04-051-26/+38
| |\ \ | | | | | | | | Improve database cache performance
| | * | Fix overzealous cache invalidationRichard van der Hoff2018-04-051-26/+38
| | | | | | | | | | | | | | | | | | | | Fixes an issue where a cache invalidation would invalidate *all* pending entries, rather than just the entry that we intended to invalidate.
| * | | Remove redundant metrics which were deprecated in 0.27.0.Richard van der Hoff2018-04-041-25/+0
| |/ /
| * / Use static JSONEncodersRichard van der Hoff2018-03-291-0/+19
| |/ | | | | | | | | using json.dumps with custom options requires us to create a new JSONEncoder on each call. It's more efficient to create one upfront and reuse it.
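As a rough sketch of the idea in the commit above: build a `json.JSONEncoder` once at import time and reuse it, rather than passing options to `json.dumps` on every call. The particular options and the names `_compact_encoder` and `encode_compact` are assumptions for illustration, not taken from the commit.

```python
import json

# Created once and reused. json.dumps() called with non-default options
# constructs a fresh JSONEncoder on every call.
# (Options chosen here are illustrative only.)
_compact_encoder = json.JSONEncoder(sort_keys=True, separators=(",", ":"))


def encode_compact(obj):
    """Serialise obj using the shared, pre-built encoder."""
    return _compact_encoder.encode(obj)


encoded = encode_compact({"b": 1, "a": [1, 2]})
print(encoded)
```

The output is identical to calling `json.dumps` with the same options; only the per-call encoder construction is avoided.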
| * 404 correctly on missing paths via NoResourceMatthew Hodgson2018-03-231-2/+2
| | | | | | | | fixes https://github.com/matrix-org/synapse/issues/2043 and https://github.com/matrix-org/synapse/issues/2029
| * Add commentsErik Johnston2018-03-191-0/+7
| |
| * Fix bug where state cache used lots of memoryErik Johnston2018-03-152-5/+9
| | | | | | | | | | | | | | | | | | The state cache bases its size on the sum of the size of entries. The size of the entry is calculated once on insertion, so it is important that the size of entries does not change. The DictionaryCache modified the entries size, which caused the state cache to incorrectly think it was smaller than it actually was.
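The failure mode described above can be reproduced with a toy size-tracking cache. `SizedCache` is a hypothetical stand-in, not Synapse's cache classes; it only shows why an entry whose size is measured once at insertion must not grow afterwards.

```python
class SizedCache:
    """Toy cache (hypothetical) that tracks total size from per-entry
    sizes measured once, at insertion time."""

    def __init__(self):
        self._entries = {}
        self._sizes = {}
        self.total_size = 0

    def put(self, key, value):
        self.pop(key)
        size = len(value)  # measured once, here
        self._entries[key] = value
        self._sizes[key] = size
        self.total_size += size

    def pop(self, key):
        if key in self._entries:
            self.total_size -= self._sizes.pop(key)
            return self._entries.pop(key)
        return None


cache = SizedCache()
entry = {"a": 1}
cache.put("k", entry)

entry["b"] = 2                 # mutated after insertion
stale = cache.total_size       # still reflects the size at insertion

cache.put("k", dict(entry))    # re-inserting a copy re-measures the entry
fresh = cache.total_size
print(stale, fresh)
```

Re-inserting a copy is just one way to keep the accounting honest in this sketch; the commit message does not say which approach the actual fix took.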
* | Make 'unexpected logging context' into warningsRichard van der Hoff2018-03-151-2/+2
|/ | | | | I think we've now fixed enough of these that the rest can be logged at warning.
* Factor run_in_background out from preserve_fnRichard van der Hoff2018-03-081-24/+29
| | | | | It annoys me that we create temporary function objects when there's really no need for it. Let's factor the gubbins out of preserve_fn and start using it.
* Rewrite make_deferred_yieldable avoiding inlineCallbacksRichard van der Hoff2018-03-011-9/+11
| | | | | ... because (a) it's actually simpler (b) it might be marginally more performant?
* report metrics on number of cache evictionsRichard van der Hoff2018-02-053-4/+34
|
* Add federation_domain_whitelist option (#2820)Matthew Hodgson2018-01-221-0/+12
| | | | | | federation_domain_whitelist gives a way to restrict which domains your HS is allowed to federate with. Useful mainly for gracefully preventing a private but internet-connected HS from trying to federate to the wider public Matrix network.
* Merge pull request #2813 from matrix-org/matthew/registrations_require_3pidMatthew Hodgson2018-01-221-0/+48
|\ | | | | add registrations_require_3pid and allow_local_3pids
| * fix PR nitpickingMatthew Hodgson2018-01-191-3/+6
| |
| * rewrite based on PR feedback:Matthew Hodgson2018-01-191-0/+45
| | | | | | | | | | | | | | | | | | * [ ] split config options into allowed_local_3pids and registrations_require_3pid * [ ] simplify and comment logic for picking registration flows * [ ] fix docstring and move check_3pid_allowed into a new util module * [ ] use check_3pid_allowed everywhere @erikjohnston PTAL
* | Merge pull request #2804 from matrix-org/erikj/file_consumerErik Johnston2018-01-181-0/+139
|\ \ | | | | | | Add decent impl of a FileConsumer
| * | Do logcontexts correctlyErik Johnston2018-01-181-2/+2
| | |
| * | Move test stuff to testsErik Johnston2018-01-181-25/+1
| | |
| * | Make all fields privateErik Johnston2018-01-181-31/+31
| | |
| * | Ensure registerProducer isn't called twiceErik Johnston2018-01-181-0/+3
| | |
| * | Fix _notify_empty typoErik Johnston2018-01-181-1/+1
| | |
| * | Move definition of paused_producer to __init__Erik Johnston2018-01-181-2/+4
| | |
| * | Fix commentsErik Johnston2018-01-181-3/+3
| | |
| * | Add decent impl of a FileConsumerErik Johnston2018-01-171-0/+158
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Twisted core doesn't have a general purpose one, so we need to write one ourselves. Features: - All writing happens in background thread - Supports both push and pull producers - Push producers get paused if the consumer falls behind
* | | Fix bugs in block metricsRichard van der Hoff2018-01-181-2/+4
| |/ |/| | | | | ... which I introduced in #2785
* | Track DB scheduling delay per-requestRichard van der Hoff2018-01-162-2/+30
| | | | | | | | | | | | For each request, track the amount of time spent waiting for a db connection. This entails adding it to the LoggingContext and we may as well add metrics for it while we are passing.
* | Track db txn time in millisecsRichard van der Hoff2018-01-162-6/+11
| | | | | | | | ... to reduce the amount of floating-point foo we do.
* | Optimise LoggingContext creation and copyingRichard van der Hoff2018-01-161-7/+18
|/ | | | | | | | It turns out that the only thing we use the __dict__ of LoggingContext for is `request`, and given we create lots of LoggingContexts and then copy them every time we do a db transaction or log line, using the __dict__ seems a bit redundant. Let's try to optimise things by making the request attribute explicit.
* Reorganise request and block metricsRichard van der Hoff2018-01-151-11/+42
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | In order to circumvent the number of duplicate foo:count metrics increasing without bounds, it's time for a rearrangement. The following are all deprecated, and replaced with synapse_util_metrics_block_count: synapse_util_metrics_block_timer:count synapse_util_metrics_block_ru_utime:count synapse_util_metrics_block_ru_stime:count synapse_util_metrics_block_db_txn_count:count synapse_util_metrics_block_db_txn_duration:count The following are all deprecated, and replaced with synapse_http_server_response_count: synapse_http_server_requests synapse_http_server_response_time:count synapse_http_server_response_ru_utime:count synapse_http_server_response_ru_stime:count synapse_http_server_response_db_txn_count:count synapse_http_server_response_db_txn_duration:count The following are renamed (the old metrics are kept for now, but deprecated): synapse_util_metrics_block_timer:total -> synapse_util_metrics_block_time_seconds synapse_util_metrics_block_ru_utime:total -> synapse_util_metrics_block_ru_utime_seconds synapse_util_metrics_block_ru_stime:total -> synapse_util_metrics_block_ru_stime_seconds synapse_util_metrics_block_db_txn_count:total -> synapse_util_metrics_block_db_txn_count synapse_util_metrics_block_db_txn_duration:total -> synapse_util_metrics_block_db_txn_duration_seconds synapse_http_server_response_time:total -> synapse_http_server_response_time_seconds synapse_http_server_response_ru_utime:total -> synapse_http_server_response_ru_utime_seconds synapse_http_server_response_ru_stime:total -> synapse_http_server_response_ru_stime_seconds synapse_http_server_response_db_txn_count:total -> synapse_http_server_response_db_txn_count synapse_http_server_response_db_txn_duration:total -> synapse_http_server_response_db_txn_duration_seconds
* Remove __PreservingContextDeferred tooRichard van der Hoff2017-11-141-30/+0
|
* Remove preserve_context_over_{fn, deferred}Richard van der Hoff2017-11-143-49/+10
| | | | | Both of these functions are known to leak logcontexts. Replace the remaining calls to them and kill them off.
* Logging and logcontext fixes for LimiterRichard van der Hoff2017-11-071-7/+17
| | | | | | | | | Add some logging to the Limiter in a similar spirit to the Linearizer, to help debug issues. Also fix a logcontext leak. Also refactor slightly to avoid throwing exceptions.