author Matthew Hodgson <matthew@matrix.org> 2017-06-14 02:23:06 +0100
committer Matthew Hodgson <matthew@matrix.org> 2017-06-14 02:23:14 +0100
commit ba502fb89a4c57c57de81669bfaa5ef02b4af904
tree 5fcbaa27d1b5eaf59c97d6c6a92f494e445f88b7
parent Merge pull request #2279 from matrix-org/erikj/fix_user_dir
add notes on running out of FDs
-rw-r--r-- README.rst 24
1 file changed, 24 insertions, 0 deletions
diff --git a/README.rst b/README.rst
index 35141ac71b..12f0c0c51a 100644
--- a/README.rst
+++ b/README.rst
@@ -528,6 +528,30 @@ fix try re-installing from PyPI or directly from
     # Install from github
     pip install --user https://github.com/pyca/pynacl/tarball/master
 
+Running out of File Handles
+~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+If Synapse runs out of file handles, it typically fails badly: live-locking
+at 100% CPU and/or failing to accept new TCP connections (blocking the
+connecting client).  Matrix can currently legitimately use a lot of file handles,
+thanks to busy rooms like #matrix:matrix.org containing hundreds of participating
+servers.  The first time a server talks in a room it will try to connect
+simultaneously to all participating servers, which could exhaust the available
+file descriptors between DNS queries and HTTPS sockets, especially if DNS is slow
+to respond.  (We need to improve the routing algorithm to be better than
+full mesh, but as of June 2017 this hasn't happened yet.)
+
+If you hit this failure mode, we recommend increasing the maximum number of
+open file handles to at least 4096 (assuming a default of 1024 or 256).
+This is typically done by editing ``/etc/security/limits.conf``.
+
+Separately, Synapse may leak file handles if inbound HTTP requests get stuck
+during processing, e.g. blocked behind a lock or talking to a remote server.
+This is best diagnosed by matching up the 'Received request' and 'Processed request'
+log lines and looking for any 'Processed request' lines which take more than
+a few seconds to execute.  Please let us know at #matrix-dev:matrix.org if
+you see this failure mode so we can help debug it.
+
 ArchLinux
 ~~~~~~~~~