path: root/synapse/util
Commit message  Author  Age  Files  Lines
* add some metrics on the federation sender (#6160)Richard van der Hoff2019-10-031-2/+4
|
* Fix up some typechecking (#6150)Amber Brown2019-10-025-8/+33
| | | | | | * type checking fixes * changelog
* Fix errors storing large retry intervals.Erik Johnston2019-10-021-1/+1
| | | | | | | | | We have set the max retry interval to a value larger than a postgres or sqlite int can hold, which caused exceptions when updating the destinations table. To fix postgres we need to change the column to a bigint, and for sqlite we lower the max interval to 2**62 (which is still incredibly long).
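The bounds mentioned in this message can be checked with a little arithmetic. This is a minimal sketch, not code from the commit; it assumes the retry interval is stored in milliseconds (the message does not state the unit), and the constant names are illustrative.

```python
# Largest values representable by the integer types discussed above.
POSTGRES_INT_MAX = 2**31 - 1    # PostgreSQL 4-byte "integer"
SIGNED_64_BIT_MAX = 2**63 - 1   # PostgreSQL "bigint" and SQLite INTEGER

new_max_interval = 2**62        # the lowered cap named in the commit message

fits_in_64_bit = new_max_interval <= SIGNED_64_BIT_MAX
fits_in_postgres_int = new_max_interval <= POSTGRES_INT_MAX

# Assumption: the interval is measured in milliseconds.
MS_PER_YEAR = 1000 * 60 * 60 * 24 * 365
years = new_max_interval // MS_PER_YEAR
```

Under that assumption the cap still corresponds to well over a hundred million years, which matches the "incredibly long" remark.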
* Merge branch 'develop' into rav/fix_attribute_mappingRichard van der Hoff2019-09-193-21/+68
|\
| * Add 'failure_ts' column to 'destinations' table (#6016)Richard van der Hoff2019-09-171-1/+15
| | | | | | | | Track the time that a server started failing at, for general analysis purposes.
| * Remove the cap on federation retry interval. (#6026)Richard van der Hoff2019-09-121-2/+2
| | | | | | | | | | | | Essentially the intention here is to end up blacklisting servers which never respond to federation requests. Fixes https://github.com/matrix-org/synapse/issues/5113.
| * Fix bug in calculating the federation retry backoff period (#6025)Richard van der Hoff2019-09-121-2/+3
| | | | | | | | This was intended to introduce an element of jitter; instead it gave you a 30/60 chance of resetting to zero.
| * Use the v2 Identity Service API for lookups (MSC2134 + MSC2140) (#5976)Andrew Morgan2019-09-111-0/+33
| | | | | | | | | | | | | | This is a redo of https://github.com/matrix-org/synapse/pull/5897 but with `id_access_token` accepted. Implements [MSC2134](https://github.com/matrix-org/matrix-doc/pull/2134) plus Identity Service v2 authentication ala [MSC2140](https://github.com/matrix-org/matrix-doc/pull/2140). Identity lookup-related functions were also moved from `RoomMemberHandler` to `IdentityHandler`.
| * Clean up some code in the retry logic (#6017)Richard van der Hoff2019-09-111-16/+13
| | | | | | | | * remove some unused code * make things which were constants into constants for efficiency and clarity
| * Revert "Use the v2 lookup API for 3PID invites (#5897)" (#5937)Andrew Morgan2019-08-301-33/+0
| | | | | | | | | | This reverts commit 71fc04069a5770a204c3514e0237d7374df257a8. This broke 3PID invites as #5892 was required for it to work correctly.
| * Use the v2 lookup API for 3PID invites (#5897)Andrew Morgan2019-08-281-0/+33
| | | | | | | | | | | | | | Fixes https://github.com/matrix-org/synapse/issues/5861 Adds support for the v2 lookup API as defined in [MSC2134](https://github.com/matrix-org/matrix-doc/pull/2134). Currently this is only used for 3PID invites. Sytest PR: https://github.com/matrix-org/sytest/pull/679
| * Retry well-known lookup before expiry.Erik Johnston2019-08-131-3/+5
| | | | | | | | | | | | | | | | | | This gives a bit of a grace period where we can attempt to refetch a remote `well-known`, while still using the cached result if that fails. Hopefully this will make the well-known resolution a bit more tolerant of failures, rather than it immediately treating failures as "no result" and caching that for an hour.
* | Fix a bug with saml attribute maps.Richard van der Hoff2019-09-191-1/+19
|/ | | | | | | | | | | | | Fixes a bug where the default attribute maps were prioritised over user-specified ones, resulting in incorrect mappings. The problem is that if you call SPConfig.load() multiple times, it adds new attribute mappers to a list. So by calling it with the default config first, and then the user-specified config, we would always get the default mappers before the user-specified mappers. To solve this, let's merge the config dicts first, and then pass them to SPConfig.
* Add kwargs and docBrendan Abolivier2019-07-291-2/+4
|
* Add ability to pass arguments to looping callsBrendan Abolivier2019-07-291-2/+2
|
* Fix some error cases in the caching layer. (#5749)Richard van der Hoff2019-07-251-32/+42
| | | | | | | There was some inconsistent behaviour in the caching layer around how exceptions were handled - particularly synchronously-thrown ones. This seems to be most easily handled by pushing the creation of ObservableDeferreds down from CacheDescriptor to the Cache.
* Add a prometheus metric for active cache lookups. (#5750)Richard van der Hoff2019-07-242-2/+33
| | | | | | * Add a prometheus metric for active cache lookups. * changelog
* Replace returnValue with return (#5736)Amber Brown2019-07-236-15/+13
|
* Cache get_version_string.Erik Johnston2019-07-221-2/+21
| | | | | | | | | The version of a module isn't going to change over the lifetime of the process (assuming no funky hot reloading is going on, which it isn't), so let's just cache the result to avoid spawning lots of git subprocesses. Fixes #5672.
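The caching idea described here can be sketched as a module-level dictionary that is filled on first use. All names below are hypothetical and this is not Synapse's implementation; `compute_version` stands in for whatever expensive lookup (such as spawning git) produces the string.

```python
# Hypothetical sketch: cache an expensive per-module lookup for the
# lifetime of the process.
_version_cache = {}


def cached_version_string(name, compute_version):
    """Return the version string for `name`, computing it at most once."""
    try:
        return _version_cache[name]
    except KeyError:
        version = compute_version(name)
        _version_cache[name] = version
        return version
```

Repeated calls with the same `name` return the stored string without invoking `compute_version` again.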
* Fixes to the federation rate limiter (#5621)Richard van der Hoff2019-07-051-8/+8
| | | | | | | - Put the default window_size back to 1000ms (broken by #5181) - Make the `rc_federation` config actually do something - fix an off-by-one error in the 'concurrent' limit - Avoid creating an unused `_PerHostRatelimiter` object for every single incoming request
* Improve the backwards compatibility re-exports of synapse.logging.context (#5617)Amber Brown2019-07-053-5/+61
| | | | | | | | * Improve the backwards compatibility re-exports of synapse.logging.context. * reexport logformatter too
* Move logging utilities out of the side drawer of util/ and into logging/ (#5606)Amber Brown2019-07-0412-962/+26
|
* Fix 'utime went backwards' errors on daemonization. (#5609)Richard van der Hoff2019-07-031-4/+13
| | | | | | | | * Fix 'utime went backwards' errors on daemonization. Fixes #5608 * remove spurious debug
* Fix a number of "Starting txn from sentinel context" warnings (#5605)Richard van der Hoff2019-07-031-1/+7
| | | | Fixes #5602, #5603
* Fix media repo breaking (#5593)Amber Brown2019-07-021-2/+7
|
* Prevent multiple upgrades on the same room at once (#5051)Andrew Morgan2019-06-251-1/+1
| | | | | | | Closes #4583 Does slightly less than #5045, which prevented a room from being upgraded multiple times, one after another. This PR still allows that, but just prevents two from happening at the same time. Mostly just to mitigate the fact that servers are slow and it can take a moment for the room upgrade to actually complete. We don't want people sending another request to upgrade the room when really they just thought the first didn't go through.
* Avoid raising exceptions in metricsRichard van der Hoff2019-06-241-8/+14
| | | | | Sentry will catch the errors if they happen, so that should be good enough, and won't make things explode if we hit the error condition.
* Merge branch 'develop' into rav/cleanup_metricsRichard van der Hoff2019-06-2427-314/+317
|\
| * Run Black. (#5482)Amber Brown2019-06-2027-314/+317
| |
* | Sanity-checking for metrics updatesRichard van der Hoff2019-06-191-7/+33
|/ | | | Check that our clocks go forward.
* Call RetryLimiter correctly (#5340)Richard van der Hoff2019-06-041-1/+6
| | | Fixes a regression introduced in #5335.
* Avoid rapidly backing-off a server if we ignore the retry intervalRichard van der Hoff2019-06-031-23/+37
|
* Improve logging for logcontext leaks. (#5288)Richard van der Hoff2019-05-291-9/+13
|
* Make all the rate limiting options more consistent (#5181)Amber Brown2019-05-151-32/+15
|
* Merge pull request #5183 from matrix-org/erikj/async_serialize_eventErik Johnston2019-05-151-0/+19
|\ | | | | Allow client event serialization to be async
| * Update docstring with correct return typeErik Johnston2019-05-151-1/+1
| | | | | | Co-Authored-By: Richard van der Hoff <1389908+richvdh@users.noreply.github.com>
| * Allow client event serialization to be asyncErik Johnston2019-05-141-0/+19
| |
* | comment about user_joined_roomRichard van der Hoff2019-05-141-0/+1
|/
* Merge branch 'master' into developRichard van der Hoff2019-05-031-2/+7
|\
| * Use SystemRandom for token generationRichard van der Hoff2019-05-031-2/+7
| |
* | Remove periods from copyright headers (#5046)Andrew Morgan2019-04-111-1/+1
| |
* | Fix disappearing exceptions in manhole. (#5035)Richard van der Hoff2019-04-101-2/+57
|/ | | Avoid sending syntax errors from the manhole to sentry.
* Add a caching layer to .well-known responses (#4516)Richard van der Hoff2019-01-301-0/+161
|
* Merge pull request #4486 from xperimental/workaround-4216Richard van der Hoff2019-01-301-1/+4
|\ | | | | Implement workaround for login error.
| * Implement workaround for login error.Robert Jacob2019-01-301-1/+4
| | | | | | | | Signed-off-by: Robert Jacob <xperimental@solidproject.de>
* | Make linearizer more quiet (#4507)Amber Brown2019-01-291-5/+5
|/
* Fix incorrect logcontexts after a Deferred was cancelled (#4407)Richard van der Hoff2019-01-171-1/+3
|
* Fix UnicodeDecodeError when postgres is not configured in english (#4253)Richard van der Hoff2018-12-041-1/+38
| | | | This is a bit of a half-assed effort at fixing https://github.com/matrix-org/synapse/issues/4252. Fundamentally the right answer is to drop support for Python 2.
* Merge branch 'develop' of github.com:matrix-org/synapse into erikj/alias_disallow_listErik Johnston2018-10-253-54/+76
|\
| * Correctly account for cpu usage by background threads (#4074)Richard van der Hoff2018-10-231-51/+69
| | | | | | | | | | | | | | | | | | | | Wrap calls to deferToThread() in a thing which uses a child logcontext to attribute CPU usage to the right request. While we're in the area, remove the logcontext_tracer stuff, which is never used, and afaik doesn't work. Fixes #4064
| * Make scripts/ and scripts-dev/ pass pyflakes (and the rest of the codebase on py3) (#4068)Amber Brown2018-10-201-1/+3
| |
| * Fix manhole on py3 (pt 2) (#4067)Amber Brown2018-10-191-0/+2
| |
| * make a bytestringAmber Brown2018-10-191-2/+2
| |
* | Anchor returned regex to start and end of stringErik Johnston2018-10-191-2/+6
| |
* | Add config option to control alias creationErik Johnston2018-10-191-0/+21
|/
* Remove unnecessary extra function call layerErik Johnston2018-10-081-16/+13
|
* Use errback pattern and catch async failuresErik Johnston2018-10-081-14/+29
|
* Log looping call exceptionsErik Johnston2018-10-051-1/+18
| | | | | | | | If a looping call function errors, then it kills the loop entirely. Currently it throws away the exception logs, so we should make it actually log them. Fixes #3929
* Correctly match 'dict.pop' apiErik Johnston2018-10-011-3/+11
|
* Don't update eviction metrics on explicit removalErik Johnston2018-10-011-5/+0
|
* Merge remote-tracking branch 'origin/develop' into erikj/destination_retry_cacheRichard van der Hoff2018-09-281-4/+37
|\
| * Include eventid in log lines when processing incoming federation transactions (#3959)Richard van der Hoff2018-09-271-4/+37
| | | | | | | | | | | | | | | | | | | | | | when processing incoming transactions, it can be hard to see what's going on, because we process a bunch of stuff in parallel, and because we may end up recursively working our way through a chain of three or four events. This commit creates a way to use logcontexts to add the relevant event ids to the log lines.
* | Merge branch 'rav/fix_expiring_cache_len' into erikj/destination_retry_cacheRichard van der Hoff2018-09-261-10/+17
|\|
| * Log which cache is throwing exceptionsRichard van der Hoff2018-09-261-10/+17
| |
| * Fix ExpiringCache.__len__ to be accurateErik Johnston2018-09-261-10/+7
| | | | | | | | | | | | | | | | | | | | | | | | It used to try and produce an estimate, which was sometimes negative. This caused metrics to be sad, so let's always just calculate it from scratch. (This appears to have been a longstanding bug, but one which has been made more of a problem by #3932 and #3933). (This was originally done by Erik as part of #3933. I'm cherry-picking it because really it's a fix in its own right)
* | Fix ExpiringCache.__len__ to be accurateErik Johnston2018-09-211-12/+9
| | | | | | | | | | | | It used to try and produce an estimate, which was sometimes negative. This caused metrics to be sad, so let's always just calculate it from scratch.
* | Add a five minute cache to get_destination_retry_timingsErik Johnston2018-09-211-0/+13
| | | | | | | | Hopefully helps with #3931
* | Make ExpiringCache slightly more performantErik Johnston2018-09-211-1/+5
|/
* Fix some instances of ExpiringCache not expiring cache itemsErik Johnston2018-09-211-1/+0
| | | | | | | | ExpiringCache required that `start()` be called before it would actually start expiring entries. A number of places didn't do that. This PR removes `start` from ExpiringCache, and automatically starts the background reaping process on creation instead.
* Improve the logging when handling a federation transaction (#3904)Richard van der Hoff2018-09-191-1/+1
| | | | | | | | | | Let's try to rationalise the logging that happens when we are processing an incoming transaction, to make it easier to figure out what is going wrong when they take ages. In particular: - make everything start with a [room_id event_id] prefix - make sure we log a warning when catching exceptions rather than just turning them into other, more cryptic, exceptions.
* Replace custom DeferredTimeoutError with defer.TimeoutErrorErik Johnston2018-09-191-9/+3
|
* Run canceller first to allow it to generate correct errorErik Johnston2018-09-191-2/+5
|
* Update to use new timeout function everywhere.Erik Johnston2018-09-191-54/+19
| | | | | | | The existing deferred timeout helper function (and the one in Twisted) suffer from a bug when a deferred's canceller throws an exception, #3842. The new helper function doesn't suffer from this problem.
* Fix timeout functionErik Johnston2018-09-151-1/+2
| | | | | Turns out deferred.cancel sometimes throws, so we do that last to ensure that we always do resolve the new deferred.
* Add an awful secondary timeout to fix wedged requestsErik Johnston2018-09-141-0/+51
| | | | This is an attempt to mitigate #3842 by adding yet-another-timeout
* Add in flight real time metrics for Measure blocksErik Johnston2018-09-141-0/+22
|
* Change the manhole SSH key to have more bitsErik Johnston2018-09-111-13/+31
| | | | | Newer versions of openssh client refuse to connect to the old key due to its length.
* Fix exceptions when a connection is closed before we read the headersRichard van der Hoff2018-08-201-1/+3
| | | | | This fixes bugs introduced in #3700, by making sure that we behave sanely when an incoming connection is closed before the headers are read.
* Robustness fix for logcontext filterRichard van der Hoff2018-08-201-1/+7
| | | | | Make the logcontext filter not explode if it somehow ends up with a logcontext of None, since that infinite-loops the whole logging system.
* Port over enough to get some sytests running on Python 3 (#3668)Amber Brown2018-08-203-8/+29
|
* Merge branch 'rav/fix_linearizer_cancellation' into developRichard van der Hoff2018-08-101-43/+68
|\
| * Fix linearizer cancellation on twisted < 18.7Richard van der Hoff2018-08-101-43/+68
| | | | | | | | | | | | Turns out that cancellation of inlineDeferreds didn't really work properly until Twisted 18.7. This commit refactors Linearizer.queue to avoid inlineCallbacks.
* | Rename async to async_helpers because `async` is a keyword on Python 3.7 (#3678)Amber Brown2018-08-105-4/+4
|/
* Python 3: Convert some unicode/bytes uses (#3569)Amber Brown2018-08-021-3/+3
|
* fix invalidationRichard van der Hoff2018-07-271-1/+1
|
* Rewrite cache list decoratorRichard van der Hoff2018-07-271-67/+64
| | | | | Because it was complicated and annoyed me. I suspect this will be more efficient too.
* Fix some looping_call calls which were broken in #3604Richard van der Hoff2018-07-261-1/+1
| | | | | | | | | It turns out that looping_call does check the deferred returned by its callback, and (at least in the case of client_ips), we were relying on this, and I broke it in #3604. Update run_as_background_process to return the deferred, and make sure we return it to clock.looping_call.
* Test and fix support for cancellation in LinearizerRichard van der Hoff2018-07-201-6/+22
|
* Combine Limiter and LinearizerRichard van der Hoff2018-07-201-89/+10
| | | | | Linearizer was effectively a Limiter with max_count=1, so rather than maintaining two sets of code, let's combine them.
* Improvements to the LimiterRichard van der Hoff2018-07-201-13/+20
| | | | | * give them names, to improve logging * use a deque rather than a list for efficiency
* Add a sleep to the Limiter to fix stack overflows.Richard van der Hoff2018-07-201-3/+20
| | | | Fixes #3570
* Don't spew errors because we can't save metrics (#3563)Amber Brown2018-07-192-6/+24
|
* Make Distributor run its processes as a background processRichard van der Hoff2018-07-181-26/+18
| | | | | | | | | | | This is more involved than it might otherwise be, because the current implementation just drops its logcontexts and runs everything in the sentinel context. It turns out that we aren't actually using a bunch of the functionality here (notably suppress_failures and the fact that Distributor.fire returns a deferred), so the easiest way to fix this is actually by simplifying a bunch of code.
* Run things as background processesRichard van der Hoff2018-07-182-1/+9
| | | | | | | | This fixes #3518, and ensures that we get useful logs and metrics for lots of things that happen in the background. (There are certainly more things that happen in the background; these are just the common ones I've found running a single-process synapse locally).
* Use efficient .intersectionErik Johnston2018-07-171-4/+1
|
* Fix perf regression in PR #3530Erik Johnston2018-07-171-1/+6
| | | | | | | | The get_entities_changed function was changed to return all changed entities since the given stream position, rather than only those changed from a given list of entities. This resulted in the function incorrectly returning large numbers of entities that, for example, caused large increases in database usage.
* Merge pull request #3530 from matrix-org/erikj/stream_cacheAmber Brown2018-07-171-8/+1
|\ | | | | Don't return unknown entities in get_entities_changed
| * Don't return unknown entities in get_entities_changedErik Johnston2018-07-131-8/+1
| | | | | | | | | | | | | | | | The stream cache keeps track of all entities that have changed since a particular stream position, so get_entities_changed does not need to return unknown entities when given a larger stream position. This makes it consistent with the behaviour of has_entity_changed.
* | Make FederationRateLimiter queue requests properlyRichard van der Hoff2018-07-131-10/+23
|/ | | | | | | | popitem removes the *most recent* item by default [1]. We want the oldest. Fixes #3524 [1]: https://docs.python.org/2/library/collections.html#collections.OrderedDict.popitem
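The `popitem` behaviour referred to here is easy to demonstrate with the standard library. The sketch below is illustrative only and is not the rate limiter's code; the keys are made up.

```python
from collections import OrderedDict

queue = OrderedDict()
queue["first"] = 1
queue["second"] = 2
queue["third"] = 3

# Default (last=True) pops the most recently inserted item: LIFO order.
newest = queue.popitem()

# last=False pops the oldest item instead: FIFO order, which is what a
# request queue wants.
oldest = queue.popitem(last=False)
```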
* Reduce set building in get_entities_changedRichard van der Hoff2018-07-121-8/+12
| | | | | | | | | | | This line shows up as about 5% of cpu time on a synchrotron: not_known_entities = set(entities) - set(self._entity_to_key) Presumably the problem here is that _entity_to_key can be largeish, and building a set for its keys every time this function is called is slow. Here we rewrite the logic to avoid building so many sets.
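One way to avoid building a set from every key of a large dict is to test each requested entity against the dict directly. This is a sketch of that general idea under assumed names (`split_entities` is hypothetical), not the code from this commit.

```python
def split_entities(entities, entity_to_key):
    """Partition `entities` into (known, not_known) using dict membership.

    Only the (usually small) `entities` collection is iterated; the large
    `entity_to_key` dict is never copied into a set.
    """
    known = set()
    not_known = set()
    for entity in entities:
        if entity in entity_to_key:
            known.add(entity)
        else:
            not_known.add(entity)
    return known, not_known
```

The cost is proportional to the number of requested entities rather than to the size of the cache.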
* Attempt to include db threads in cpu usage stats (#3496)Richard van der Hoff2018-07-101-2/+21
| | | | | Let's try to include time spent in the DB threads in the per-request/block cpu usage metrics.
* Refactor logcontext resource usage tracking (#3501)Richard van der Hoff2018-07-102-49/+120
| | | | | Factor the resource usage tracking out to a separate object, which can be passed around and copied independently of the logcontext itself.
* run isortAmber Brown2018-07-0922-73/+71
|
* Attempt to be more performant on PyPy (#3462)Amber Brown2018-06-281-1/+1
|
* Revert "Revert "Try to not use as much CPU in the StreamChangeCache"" (#3454)Amber Brown2018-06-281-2/+4
|
* Revert "Try to not use as much CPU in the StreamChangeCache"Matthew Hodgson2018-06-261-4/+2
|
* fixesAmber Brown2018-06-261-2/+2
|
* fixesAmber Brown2018-06-261-2/+2
|
* try and make loading items from the cache fasterAmber Brown2018-06-261-2/+4
|
* Remove all global reactor imports & pass it around explicitly (#3424)Amber Brown2018-06-251-0/+3
|
* Disable partial state group caching for wildcard lookupsRichard van der Hoff2018-06-221-13/+12
| | | | | | | When _get_state_for_groups is given a wildcard filter, just do a complete lookup. Hopefully this will give us the best of both worlds by not filling up the ram if we only need one or two keys, but also making the cache still work for the federation reader usecase.
* Merge pull request #3419 from matrix-org/rav/events_per_requestRichard van der Hoff2018-06-221-0/+15
|\ | | | | Log number of events fetched from DB
| * Indirect evt_count updates via method callRichard van der Hoff2018-06-221-0/+11
| | | | | | | | so that we can stub it for the sentinel and not have a billion failing UTs
| * Log number of events fetched from DBRichard van der Hoff2018-06-211-0/+4
| | | | | | | | | | | | | | | | | | | | When we finish processing a request, log the number of events we fetched from the database to handle it. [I'm trying to figure out which requests are responsible for large amounts of event cache churn. It may turn out to be more helpful to add counts to the prometheus per-request/block metrics, but that is an extension to this code anyway.]
* | Pass around the reactor explicitly (#3385)Amber Brown2018-06-224-33/+43
|/
* Remove run_on_reactor (#3395)Amber Brown2018-06-141-9/+1
|
* Port to sortedcontainers (with tests!) (#3332)Amber Brown2018-06-061-26/+31
|
* Add hacky cache factor override systemErik Johnston2018-06-042-2/+12
|
* Consistently use six's iteritems and wrap lazy keys/values in list() if they're not meant to be lazy (#3307)Amber Brown2018-05-312-3/+5
|
* Merge pull request #3281 from NotAFile/py3-six-isinstanceAmber Brown2018-05-302-11/+15
|\ | | | | remaining isintance fixes
| * pep8Adrian Tschira2018-05-291-0/+1
| |
| * fix recursion errorAdrian Tschira2018-05-241-7/+5
| |
| * remaining isintance fixesAdrian Tschira2018-05-242-6/+11
| | | | | | | | Signed-off-by: Adrian Tschira <nota@notafile.com>
* | fix up testsAmber Brown2018-05-281-3/+3
| |
* | update to more consistently use seconds in any metrics or loggingAmber Brown2018-05-283-19/+19
| |
* | add comment about why unregAmber Brown2018-05-281-0/+2
| |
* | Merge remote-tracking branch 'origin/develop' into 3218-official-promAmber Brown2018-05-282-1/+24
|\|
| * Merge pull request #3247 from NotAFile/py3-miscAmber Brown2018-05-241-1/+6
| |\ | | | | | | Misc Python3 fixes
| | * fix py3 intern and remove unnecessary py3 encodeAdrian Tschira2018-05-191-1/+6
| | | | | | | | | | | | Signed-off-by: Adrian Tschira <nota@notafile.com>
| * | Merge pull request #3245 from NotAFile/batch-iterAmber Brown2018-05-241-0/+18
| |\ \ | | | | | | | | Add batch_iter to utils
| | * | Add batch_iter to utilsAdrian Tschira2018-05-191-0/+18
| | |/ | | | | | | | | | | | | | | | | | | | | | | | | There's a frequent idiom I noticed where an iterable is split up into a number of chunks/batches. Unfortunately that method does not work with iterators like dict.keys() in python3. This implementation works with iterators. Signed-off-by: Adrian Tschira <nota@notafile.com>
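A batching helper that works with one-shot iterators can be sketched with `itertools.islice`. This is one possible implementation under that approach and is not necessarily the code added by this commit.

```python
from itertools import islice


def batch_iter(iterable, size):
    """Yield tuples of up to `size` items from any iterable.

    Works with iterators (such as dict.keys() on Python 3) because it
    never indexes or slices the input, it only consumes it.
    """
    source = iter(iterable)
    # iter(callable, sentinel) keeps calling the lambda until it returns
    # the empty tuple, i.e. until the source is exhausted.
    return iter(lambda: tuple(islice(source, size)), ())
```

For example, `list(batch_iter(iter(range(5)), 2))` gives three batches, the last one shorter than the rest.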
* | | cleanupAmber Brown2018-05-221-5/+10
| | |
* | | cleanup pep8 errorsAmber Brown2018-05-221-2/+5
| | |
* | | fixesAmber Brown2018-05-222-12/+30
| | |
* | | Merge remote-tracking branch 'origin/develop' into 3218-official-promAmber Brown2018-05-221-11/+27
|\| |
| * | CommentErik Johnston2018-05-221-1/+1
| | |
| * | Fix logcontext resource usage trackingErik Johnston2018-05-221-11/+27
| |/
* / replacing portionsAmber Brown2018-05-217-98/+71
|/
* Merge remote-tracking branch 'origin/develop' into rav/warn_on_logcontext_failRichard van der Hoff2018-05-0315-136/+342
|\
| * Fix logcontext leaks in rate limiterRichard van der Hoff2018-05-031-3/+14
| |
| * Merge branch 'develop' into rav/more_logcontext_leaksRichard van der Hoff2018-05-021-1/+1
| |\
| | * Fix incorrect reference to StringIORichard van der Hoff2018-05-021-1/+1
| | | | | | | | | | | | This was introduced in 4f2f5171
| * | Fix a class of logcontext leaksRichard van der Hoff2018-05-021-22/+38
| |/ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | So, it turns out that if you have a first `Deferred` `D1`, you can add a callback which returns another `Deferred` `D2`, and `D2` must then complete before any further callbacks on `D1` will execute (and later callbacks on `D1` get the *result* of `D2` rather than `D2` itself). So, `D1` might have `called=True` (as in, it has started running its callbacks), but any new callbacks added to `D1` won't get run until `D2` completes - so if you `yield D1` in an `inlineCallbacks` function, your `yield` will 'block'. In conclusion: some of our assumptions in `logcontext` were invalid. We need to make sure that we don't optimise out the logcontext juggling when this situation happens. Fortunately, it is easy to detect by checking `D1.paused`.
| * Merge pull request #3144 from matrix-org/rav/run_in_background_exception_handlingRichard van der Hoff2018-04-301-1/+7
| |\ | | | | | | | | | | | | Trap exceptions thrown within run_in_background
| | * Trap exceptions thrown within run_in_backgroundRichard van der Hoff2018-04-271-1/+7
| | | | | | | | | | | | | | | Turn any exceptions that get thrown synchronously within run_in_background into Failures instead.
| * | Merge branch 'develop' into py3-xrange-1Richard van der Hoff2018-04-307-12/+17
| |\ \
| | * \ Merge pull request #3154 from NotAFile/py3-stringioRichard van der Hoff2018-04-301-1/+1
| | |\ \ | | | | | | | | | | Replace stringIO imports with six
| | | * | replace stringIO importsAdrian Tschira2018-04-281-1/+1
| | | | |
| | * | | Merge pull request #3155 from NotAFile/py3-bytes-1Richard van der Hoff2018-04-301-2/+5
| | |\ \ \ | | | | | | | | | | | | more bytes strings
| | | * | | more bytes stringsAdrian Tschira2018-04-291-2/+5
| | | |/ / | | | | | | | | | | | | | | | Signed-off-by: Adrian Tschira <nota@notafile.com>
| | * | | Merge pull request #3140 from matrix-org/rav/use_run_in_backgroundRichard van der Hoff2018-04-305-9/+11
| | |\ \ \ | | | |/ / | | |/| | Use run_in_background in preference to preserve_fn
| | | * | Merge remote-tracking branch 'origin/develop' into rav/use_run_in_backgroundRichard van der Hoff2018-04-271-1/+6
| | | |\ \
| | | * | | Use run_in_background in preference to preserve_fnRichard van der Hoff2018-04-275-9/+11
| | | | |/ | | | |/| | | | | | | | | | | | | | | | | | | | | While I was going through uses of preserve_fn for other PRs, I converted places which only use the wrapped function once to use run_in_background, to avoid creating the function object.
| * | / | Move more xrange to sixAdrian Tschira2018-04-283-5/+10
| |/ / / | | | | | | | | | | | | | | | | | | | | plus a bonus next() Signed-off-by: Adrian Tschira <nota@notafile.com>
| * | | Merge remote-tracking branch 'origin/develop' into rav/deferred_timeoutRichard van der Hoff2018-04-271-1/+6
| |\ \ \ | | | |/ | | |/|
| | * | Improve exception handling for background processesRichard van der Hoff2018-04-271-1/+6
| | |/ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | There were a bunch of places where we fire off a process to happen in the background, but don't have any exception handling on it - instead relying on the unhandled error being logged when the relevant deferred gets garbage-collected. This is unsatisfactory for a number of reasons: - logging on garbage collection is best-effort and may happen some time after the error, if at all - it can be hard to figure out where the error actually happened. - it is logged as a scary CRITICAL error which (a) I always forget to grep for and (b) it's not really CRITICAL if a background process we don't care about fails. So this is an attempt to add exception handling to everything we fire off into the background.
| * | Backport deferred.addTimeoutRichard van der Hoff2018-04-271-0/+67
| | | | | | | | | | | | Twisted 16.0 doesn't have addTimeout, so let's backport it.
| * | Use deferred.addTimeout instead of time_bound_deferredRichard van der Hoff2018-04-231-56/+0
| |/ | | | | | | This doesn't feel like a wheel we need to reinvent.
| * Merge pull request #3107 from NotAFile/py3-bool-nonzeroRichard van der Hoff2018-04-201-0/+1
| |\ | | | | | | add __bool__ alias to __nonzero__ methods
| | * add __bool__ alias to __nonzero__ methodsAdrian Tschira2018-04-151-0/+1
| | | | | | | | | | | | Signed-off-by: Adrian Tschira <nota@notafile.com>
| * | Merge pull request #3110 from NotAFile/py3-six-queueRichard van der Hoff2018-04-201-2/+2
| |\ \ | | | | | | | | Replace Queue with six.moves.queue
| | * | Replace Queue with six.moves.queueAdrian Tschira2018-04-161-2/+2
| | |/ | | | | | | | | | | | | | | | and a six.range change which I missed the last time Signed-off-by: Adrian Tschira <nota@notafile.com>
| * | Merge pull request #3093 from matrix-org/rav/response_cache_wrapRichard van der Hoff2018-04-201-14/+74
| |\ \ | | |/ | |/| Refactor ResponseCache usage
| | * ResponseCache: fix handling of completed resultsRichard van der Hoff2018-04-131-13/+19
| | | | | | | | | | | | | | | Turns out that ObservableDeferred.observe doesn't return a deferred if the result is already completed. Fix handling and improve documentation.
| | * Refactor ResponseCache usageRichard van der Hoff2018-04-121-2/+56
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Adds a `.wrap` method to ResponseCache which wraps up the boilerplate of a (get, set) pair, and then uses it throughout the codebase. This will be largely non-functional, but does include the following functional changes: * federation_server.on_context_state_request: drops use of _server_linearizer which looked redundant and could cause incorrect cache misses by yielding between the get and the set. * RoomListHandler.get_remote_public_room_list(): fixes logcontext leaks * the wrap function includes some logging. I'm hoping this won't be too noisy on production.
| * | Revert "Use sortedcontainers instead of blist"Richard van der Hoff2018-04-131-2/+2
| |/ | | | | | | | | | | | | | | | | | | | | This reverts commit 9fbe70a7dc3afabfdac176ba1f4be32dd44602aa. It turns out that sortedcontainers.SortedDict is not an exact match for blist.sorteddict; in particular, `popitem()` removes things from the opposite end of the dict. This is trivial to fix, but I want to add some unit tests, and potentially some more thought about it, before we do so.
| * Merge pull request #3092 from matrix-org/rav/response_cache_metricsRichard van der Hoff2018-04-121-1/+13
| |\ | | | | | | Add metrics for ResponseCache
| | * Add metrics for ResponseCacheRichard van der Hoff2018-04-101-1/+13
| | |
| * | Merge pull request #3059 from matrix-org/rav/doc_response_cacheRichard van der Hoff2018-04-121-0/+32
| |\ \ | | | | | | | | Document the behaviour of ResponseCache
| | * | Document the behaviour of ResponseCacheRichard van der Hoff2018-04-041-0/+32
| | | | | | | | | | | | | | | | | | | | | | | | it looks like everything that uses ResponseCache expects to have to `make_deferred_yieldable` its results. It's debatable whether that is the best approach, but let's document it for now to avoid further confusion.
| * | | Use sortedcontainers instead of blistVincent Breitmoser2018-04-101-2/+2
| | |/ | |/| | | | | | | | | | | | | | | | | | | This commit drop-in replaces blist with SortedContainers. They are written in pure python so work with pypy, but perform as well as native implementations, at least in a couple of benchmarks: http://www.grantjenks.com/docs/sortedcontainers/performance.html
| * | Revert "Merge pull request #3066 from matrix-org/rav/remove_redundant_metrics"Richard van der Hoff2018-04-091-0/+25
| | | | | | | | | | | | | | | | | | | | | We aren't ready to release this yet, so I'm reverting it for now. This reverts commit d1679a4ed7947b0814e0f2af9b888a16c588f1a1, reversing changes made to e089100c6231541c446e37e157dec8feed02d283.
| * | Merge pull request #3068 from matrix-org/rav/fix_cache_invalidationRichard van der Hoff2018-04-051-26/+38
| |\ \ | | | | | | | | Improve database cache performance
| | * | Fix overzealous cache invalidationRichard van der Hoff2018-04-051-26/+38
| | | | | | | | | | | | | | | | | | | | Fixes an issue where a cache invalidation would invalidate *all* pending entries, rather than just the entry that we intended to invalidate.
| * | | Remove redundant metrics which were deprecated in 0.27.0.Richard van der Hoff2018-04-041-25/+0
| |/ /
| * / Use static JSONEncodersRichard van der Hoff2018-03-291-0/+19
| |/ | | | | | | | | using json.dumps with custom options requires us to create a new JSONEncoder on each call. It's more efficient to create one upfront and reuse it.
| * 404 correctly on missing paths via NoResourceMatthew Hodgson2018-03-231-2/+2
| | | | | | | | fixes https://github.com/matrix-org/synapse/issues/2043 and https://github.com/matrix-org/synapse/issues/2029
| * Add commentsErik Johnston2018-03-191-0/+7
| |
| * Fix bug where state cache used lots of memoryErik Johnston2018-03-152-5/+9
| | | | | | | | | | | | | | | | | | The state cache bases its size on the sum of the size of entries. The size of the entry is calculated once on insertion, so it is important that the size of entries does not change. The DictionaryCache modified the entries' size, which caused the state cache to incorrectly think it was smaller than it actually was.
* | Make 'unexpected logging context' into warningsRichard van der Hoff2018-03-151-2/+2
|/ | | | | I think we've now fixed enough of these that the rest can be logged at warning.
* Factor run_in_background out from preserve_fnRichard van der Hoff2018-03-081-24/+29
| | | | | It annoys me that we create temporary function objects when there's really no need for it. Let's factor the gubbins out of preserve_fn and start using it.
* Rewrite make_deferred_yieldable avoiding inlineCallbacksRichard van der Hoff2018-03-011-9/+11
| | | | | ... because (a) it's actually simpler (b) it might be marginally more performant?
* report metrics on number of cache evictionsRichard van der Hoff2018-02-053-4/+34
|
* Add federation_domain_whitelist option (#2820)Matthew Hodgson2018-01-221-0/+12
| | | | | | Add federation_domain_whitelist, which gives a way to restrict which domains your HS is allowed to federate with. Useful mainly for gracefully preventing a private but internet-connected HS from trying to federate to the wider public Matrix network.
* Merge pull request #2813 from matrix-org/matthew/registrations_require_3pidMatthew Hodgson2018-01-221-0/+48
|\ | | | | add registrations_require_3pid and allow_local_3pids
| * fix PR nitpickingMatthew Hodgson2018-01-191-3/+6
| |
| * rewrite based on PR feedback:Matthew Hodgson2018-01-191-0/+45
| | * [ ] split config options into allowed_local_3pids and registrations_require_3pid
| | * [ ] simplify and comment logic for picking registration flows
| | * [ ] fix docstring and move check_3pid_allowed into a new util module
| | * [ ] use check_3pid_allowed everywhere
| |
| | @erikjohnston PTAL
* | Merge pull request #2804 from matrix-org/erikj/file_consumerErik Johnston2018-01-181-0/+139
|\ \ | | | | | | Add decent impl of a FileConsumer
| * | Do logcontexts correctlyErik Johnston2018-01-181-2/+2
| | |
| * | Move test stuff to testsErik Johnston2018-01-181-25/+1
| | |
| * | Make all fields privateErik Johnston2018-01-181-31/+31
| | |
| * | Ensure registerProducer isn't called twiceErik Johnston2018-01-181-0/+3
| | |
| * | Fix _notify_empty typoErik Johnston2018-01-181-1/+1
| | |
| * | Move definition of paused_producer to __init__Erik Johnston2018-01-181-2/+4
| | |
| * | Fix commentsErik Johnston2018-01-181-3/+3
| | |
| * | Add decent impl of a FileConsumerErik Johnston2018-01-171-0/+158
| | | Twisted core doesn't have a general purpose one, so we need to write one ourselves.
| | |
| | | Features:
| | | - All writing happens in background thread
| | | - Supports both push and pull producers
| | | - Push producers get paused if the consumer falls behind
* | | Fix bugs in block metricsRichard van der Hoff2018-01-181-2/+4
| |/ |/| | | | | ... which I introduced in #2785
* | Track DB scheduling delay per-requestRichard van der Hoff2018-01-162-2/+30
| | | | | | | | | | | | For each request, track the amount of time spent waiting for a db connection. This entails adding it to the LoggingContext and we may as well add metrics for it while we are passing.
* | Track db txn time in millisecsRichard van der Hoff2018-01-162-6/+11
| | | | | | | | ... to reduce the amount of floating-point foo we do.
* | Optimise LoggingContext creation and copyingRichard van der Hoff2018-01-161-7/+18
|/ | | | | | | | It turns out that the only thing we use the __dict__ of LoggingContext for is `request`, and given we create lots of LoggingContexts and then copy them every time we do a db transaction or log line, using the __dict__ seems a bit redundant. Let's try to optimise things by making the request attribute explicit.
* Reorganise request and block metricsRichard van der Hoff2018-01-151-11/+42
| In order to circumvent the number of duplicate foo:count metrics increasing without bounds, it's time for a rearrangement.
|
| The following are all deprecated, and replaced with synapse_util_metrics_block_count:
|
|   synapse_util_metrics_block_timer:count
|   synapse_util_metrics_block_ru_utime:count
|   synapse_util_metrics_block_ru_stime:count
|   synapse_util_metrics_block_db_txn_count:count
|   synapse_util_metrics_block_db_txn_duration:count
|
| The following are all deprecated, and replaced with synapse_http_server_response_count:
|
|   synapse_http_server_requests
|   synapse_http_server_response_time:count
|   synapse_http_server_response_ru_utime:count
|   synapse_http_server_response_ru_stime:count
|   synapse_http_server_response_db_txn_count:count
|   synapse_http_server_response_db_txn_duration:count
|
| The following are renamed (the old metrics are kept for now, but deprecated):
|
|   synapse_util_metrics_block_timer:total -> synapse_util_metrics_block_time_seconds
|   synapse_util_metrics_block_ru_utime:total -> synapse_util_metrics_block_ru_utime_seconds
|   synapse_util_metrics_block_ru_stime:total -> synapse_util_metrics_block_ru_stime_seconds
|   synapse_util_metrics_block_db_txn_count:total -> synapse_util_metrics_block_db_txn_count
|   synapse_util_metrics_block_db_txn_duration:total -> synapse_util_metrics_block_db_txn_duration_seconds
|   synapse_http_server_response_time:total -> synapse_http_server_response_time_seconds
|   synapse_http_server_response_ru_utime:total -> synapse_http_server_response_ru_utime_seconds
|   synapse_http_server_response_ru_stime:total -> synapse_http_server_response_ru_stime_seconds
|   synapse_http_server_response_db_txn_count:total -> synapse_http_server_response_db_txn_count
|   synapse_http_server_response_db_txn_duration:total -> synapse_http_server_response_db_txn_duration_seconds
* Remove __PreservingContextDeferred tooRichard van der Hoff2017-11-141-30/+0
|
* Remove preserve_context_over_{fn, deferred}Richard van der Hoff2017-11-143-49/+10
| | | | | Both of these functions are known to leak logcontexts. Replace the remaining calls to them and kill them off.
* Logging and logcontext fixes for LimiterRichard van der Hoff2017-11-071-7/+17
| | | | | | | | | Add some logging to the Limiter in a similar spirit to the Linearizer, to help debug issues. Also fix a logcontext leak. Also refactor slightly to avoid throwing exceptions.
* fix vars named `l`Richard van der Hoff2017-10-232-7/+4
| | | | E741 says "do not use variables named ‘l’, ‘O’, or ‘I’".
* replace 'except:' with 'except Exception:'Richard van der Hoff2017-10-234-11/+11
| | | | what could possibly go wrong
* Fix logcontext handling for persist_eventsRichard van der Hoff2017-10-171-0/+5
| * don't use preserve_context_over_deferred, which is known broken.
| * remove a redundant preserve_fn.
| * add/improve some comments
* Merge pull request #2532 from matrix-org/rav/fix_linearizerRichard van der Hoff2017-10-111-2/+22
|\ | | | | Fix stackoverflow and logcontexts from linearizer
| * Fix stackoverflow and logcontexts from linearizerRichard van der Hoff2017-10-111-2/+22
| | 1. make it not blow out the stack when there are more than 50 things waiting for a lock. Fixes https://github.com/matrix-org/synapse/issues/2505.
| |
| | 2. Make it not mess up the log contexts.
* | logformatter: fix AttributeErrorRichard van der Hoff2017-10-111-3/+11
| | | | | | | | make sure we have the relevant fields before we try to log them.
* | Fancy logformatter to format exceptions betterRichard van der Hoff2017-10-091-0/+43
|/ | | | | | | | | This is a bit of an experimental change at this point; the idea is to see if it helps us track down where our stack overflows are coming from by logging the stack when the exception was caught and turned into a Failure. (We'll also need https://github.com/richvdh/twisted/commit/edf27044200e74680ea67c525768e36dc9d9af2b). If we deploy this, we'll be able to enable it via the log config yaml.
* Fix logcontext handling for concurrently_executeRichard van der Hoff2017-10-061-2/+2
| | | | Avoid preserve_context_over_deferred, which is broken.
* pep8David Baker2017-09-261-0/+1
|
* unnecessary parensDavid Baker2017-09-261-1/+1
|
* Add module_loader.pyDavid Baker2017-09-261-0/+41
|
* Increase default cache factor size.Erik Johnston2017-07-041-1/+1
|
* Define CACHE_SIZE_FACTOR onceErik Johnston2017-07-042-9/+2
|
* Use an ExpiringCache for storing registration sessionsErik Johnston2017-06-291-0/+3
| | | | | This is because pruning them was a significant performance drain on matrix.org
* Rewrite conditionalErik Johnston2017-06-091-1/+1
|
* Fix has_any_entity_changedErik Johnston2017-06-091-4/+4
| | | | | | | | Occasionally has_any_entity_changed would throw the error: "Set changed size during iteration" when taking the max of the `sorteddict`. While it's uncertain how that happens, it's quite inefficient to iterate over the entire dict anyway, so we change to using the more traditional `bisect_*` functions.
* Add stream change cacheErik Johnston2017-05-311-0/+15
|
* Pull out if statement from for loopErik Johnston2017-05-221-6/+14
|
* Update list cache to handle one arg caseErik Johnston2017-05-221-17/+33
| | | | | | We updated the normal cache descriptors to handle caches with a single argument specially, so that the key isn't a 1-tuple. We need to update the cache list to be aware of this.
* Make get_state_groups_from_groups faster.Erik Johnston2017-05-171-11/+46
| | | | | | | | | Most of the time was spent copying a dict to filter out sentinel values that indicated that keys did not exist in the dict. The sentinel values were added to ensure that we cached the non-existence of keys. By updating DictionaryCache itself to keep track of which keys are known not to exist, we can remove a dictionary copy.
* Don't update event cache hit ratio from get_joined_usersErik Johnston2017-05-081-3/+6
| | | | | Otherwise the hit ratio of plain get_events gets completely skewed by calls to get_joined_users* functions.
* Optimise caches with single keyErik Johnston2017-05-041-9/+33
|
* Instantiate DeferredTimedOutError correctlyRichard van der Hoff2017-05-021-1/+1
| | | | | | Call `super` correctly, so that we correctly initialise the `errcode` field. Fixes https://github.com/matrix-org/synapse/issues/2179.
* Reduce size of joined_user cacheErik Johnston2017-04-251-0/+14
| | | | | | | | The _get_joined_users_from_context cache stores a mapping from user_id to avatar_url and display_name. Instead of storing those in a dict, store them in a namedtuple as that uses much less memory. We also try converting the string to ascii to further reduce the size.
* Remove DEBUG_CACHESErik Johnston2017-04-251-2/+0
|
* Reduce cache size by not storing deferredsErik Johnston2017-04-251-18/+21
| | | | | | | | | | | | | | | | | | | | Currently the cache descriptors store deferreds rather than raw values; this is a simple way of triggering only one database hit and sharing the result if two callers attempt to get the same value. However, there are a few caches that simply store a mapping from string to string (or int). These caches can have a large number of entries, under the assumption that each entry is small. However, the size of a deferred (specifically the size of ObservableDeferred) is significantly larger than that of the raw value, 2kb vs 32b. This PR therefore changes the cache descriptors to store the raw values rather than the deferreds. As a side effect, cached storage functions now return either a deferred or the actual value, as the cached list descriptor already does. This is fine as we always end up just yield'ing on the returned value eventually, which handles that case correctly.
* Only intern ascii stringsErik Johnston2017-04-241-18/+11
|
* Fix fixme in preserve_fnRichard van der Hoff2017-04-031-5/+1
| | | | | `preserve_fn` is no longer used as a decorator anywhere, so we can safely fix a fixme therein.
* Remove unused instance variableErik Johnston2017-03-311-4/+0
|
* DocsErik Johnston2017-03-301-0/+5
|
* Revert log context changeErik Johnston2017-03-301-3/+0
|
* Doc new instance variablesErik Johnston2017-03-301-1/+8
|
* Manually calculate cache key as getcallargs is expensiveErik Johnston2017-03-301-6/+28
| | | | | This is because getcallargs recomputes the getargspec, amongst other things, which we don't need to do as it's already been done
* Don't convert to deferreds when not necessaryErik Johnston2017-03-303-2/+8
|
* Fix the logcontext handling in the cache wrappers (#2077)Richard van der Hoff2017-03-302-16/+37
| | | | | | | The cache wrappers had a habit of leaking the logcontext into the reactor while the lookup function was running, and then not restoring it correctly when the lookup function had completed. It's all the fault of `preserve_context_over_{fn,deferred}` which are basically a bit broken.
* Merge pull request #2050 from matrix-org/rav/federation_backoffRichard van der Hoff2017-03-231-4/+25
|\ | | | | push federation retry limiter down to matrixfederationclient
| * Ignore backoff history for invites, aliases, and roomdirsRichard van der Hoff2017-03-231-2/+11
| | | | | | | | | | Add a param to the federation client which lets us ignore historical backoff data for federation queries, and set it for a handful of operations.
| * push federation retry limiter down to matrixfederationclientRichard van der Hoff2017-03-231-2/+14
| | | | | | | | | | rather than having to instrument everywhere we make a federation call, make the MatrixFederationHttpClient manage the retry limiter.
* | Merge pull request #2052 from matrix-org/rav/time_bound_deferredRichard van der Hoff2017-03-231-4/+6
|\ \ | | | | | | Fix time_bound_deferred to throw the right exception
| * | Fix time_bound_deferred to throw the right exceptionRichard van der Hoff2017-03-231-4/+6
| |/ | | | | | | | | | | Due to a failure to instantiate DeferredTimedOutError, time_bound_deferred would throw a CancelledError when the deferred timed out, which was rather confusing.
* | Fix a couple of logcontext leaksRichard van der Hoff2017-03-231-2/+3
| | | | | | | | | | Use preserve_fn to correctly manage the logcontexts around things we don't want to yield on.
* | Fix caching of remote servers' signature keysRichard van der Hoff2017-03-221-63/+72
|/ | | | | | | | | The `@cached` decorator on `KeyStore._get_server_verify_key` was missing its `num_args` parameter, which meant that it was returning the wrong key for any server which had more than one recorded key. By way of a fix, change the default for `num_args` to be *all* arguments. To implement that, factor out a common base class for `CacheDescriptor` and `CacheListDescriptor`.
* Merge pull request #2026 from matrix-org/rav/logcontext_docsRichard van der Hoff2017-03-201-0/+10
|\ | | | | Logcontext docs
| * Logcontext docsRichard van der Hoff2017-03-171-0/+10
| |
* | Stop preserve_fn leaking context into the reactorRichard van der Hoff2017-03-181-32/+29
|/ | | | | | | | Fix a bug in ``logcontext.preserve_fn`` which made it leak context into the reactor, and add a test for it. Also, get rid of ``logcontext.reset_context_after_deferred``, which tried to do the same thing but had its own, different, set of bugs.
* Merge pull request #2016 from matrix-org/rav/queue_pdus_during_joinRichard van der Hoff2017-03-171-0/+25
|\ | | | | Queue up federation PDUs while a room join is in progress