author    Richard van der Hoff <1389908+richvdh@users.noreply.github.com>  2020-07-23 18:38:19 +0100
committer GitHub <noreply@github.com>  2020-07-23 18:38:19 +0100
commit    7078866969758e52eec33ebdb8288e203d8bd2b7 (patch)
tree      814050c5ec16f094711b267129ea5d7332f9de1b /synapse
parent    Abort federation requests if the client disconnects early (#7930) (diff)
Put a cache on `/state_ids` (#7931)
If we send out an event which refers to `prev_events` which other servers in
the federation are missing, then (after a round or two of backfill attempts),
they will end up asking us for `/state_ids` at a particular point in the DAG.

As per https://github.com/matrix-org/synapse/issues/7893, this is quite
expensive, and we tend to see lots of very similar requests around the same
time.

We can therefore handle this much more efficiently by using a cache, which (a)
ensures that if we see the same request from multiple servers (or even the same
server, multiple times), they share the result, and (b) lets any other servers
that miss the initial excitement benefit from the work already done (see the
sketch below).

[It's interesting to note that `/state` has a cache for exactly this
reason. `/state` is now essentially unused, having been replaced by `/state_ids`,
but evidently when we replaced it we forgot to add a cache to the new endpoint.]
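
For illustration, the caching behaviour relied on here can be sketched as
follows: concurrent callers asking for the same key all await a single shared
computation, and a successful result is kept for a short TTL (30 seconds in
this patch) so that slightly later callers hit the cache too. The sketch below
is a simplified asyncio-based stand-in written for this page, not Synapse's
actual `ResponseCache` implementation; the class name `SimpleResponseCache` is
invented for the example.

    import asyncio
    from typing import Any, Awaitable, Callable, Dict


    class SimpleResponseCache:
        """Coalesce concurrent identical requests and keep a successful
        result for a short TTL so that slightly later callers also benefit.

        Illustrative sketch only; not Synapse's actual ResponseCache.
        """

        def __init__(self, timeout_ms: int = 30000) -> None:
            self._timeout_s = timeout_ms / 1000.0
            self._entries: Dict[Any, "asyncio.Task[Any]"] = {}

        async def wrap(
            self, key: Any, callback: Callable[..., Awaitable[Any]], *args: Any
        ) -> Any:
            task = self._entries.get(key)
            if task is None:
                # First caller for this key: start the computation exactly once.
                task = asyncio.ensure_future(callback(*args))
                self._entries[key] = task
                task.add_done_callback(lambda t, k=key: self._on_done(k, t))
            # First and later callers all await the same underlying task.
            return await asyncio.shield(task)

        def _on_done(self, key: Any, task: "asyncio.Task[Any]") -> None:
            if task.cancelled() or task.exception() is not None:
                # Failures are not cached: forget them straight away.
                self._entries.pop(key, None)
                return
            # Successful results stay cached for the TTL, then get evicted.
            asyncio.get_running_loop().call_later(
                self._timeout_s, self._entries.pop, key, None
            )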
Diffstat (limited to 'synapse')
-rw-r--r--  synapse/federation/federation_server.py  13
1 file changed, 11 insertions(+), 2 deletions(-)
diff --git a/synapse/federation/federation_server.py b/synapse/federation/federation_server.py
index 23625ba995..11c5d63298 100644
--- a/synapse/federation/federation_server.py
+++ b/synapse/federation/federation_server.py
@@ -109,6 +109,9 @@ class FederationServer(FederationBase):
         # We cache responses to state queries, as they take a while and often
         # come in waves.
         self._state_resp_cache = ResponseCache(hs, "state_resp", timeout_ms=30000)
+        self._state_ids_resp_cache = ResponseCache(
+            hs, "state_ids_resp", timeout_ms=30000
+        )
 
     async def on_backfill_request(
         self, origin: str, room_id: str, versions: List[str], limit: int
@@ -376,10 +379,16 @@ class FederationServer(FederationBase):
         if not in_room:
             raise AuthError(403, "Host not in room.")
 
+        resp = await self._state_ids_resp_cache.wrap(
+            (room_id, event_id), self._on_state_ids_request_compute, room_id, event_id,
+        )
+
+        return 200, resp
+
+    async def _on_state_ids_request_compute(self, room_id, event_id):
         state_ids = await self.handler.get_state_ids_for_pdu(room_id, event_id)
         auth_chain_ids = await self.store.get_auth_chain_ids(state_ids)
-
-        return 200, {"pdu_ids": state_ids, "auth_chain_ids": auth_chain_ids}
+        return {"pdu_ids": state_ids, "auth_chain_ids": auth_chain_ids}
 
     async def _on_context_state_request_compute(
         self, room_id: str, event_id: str
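
As a hedged usage sketch of the change above: the cache key is
`(room_id, event_id)`, and the requesting server is deliberately not part of
it, so requests from different origins (or repeats from the same origin) share
one computation. The demo below reuses the hypothetical `SimpleResponseCache`
from the earlier sketch; `slow_state_ids` is an invented stand-in for
`_on_state_ids_request_compute`.

    import asyncio

    # Assumes the hypothetical SimpleResponseCache sketch shown earlier on
    # this page; slow_state_ids is an invented stand-in for
    # _on_state_ids_request_compute, and the identifiers below are made up.

    compute_calls = 0


    async def slow_state_ids(room_id: str, event_id: str) -> dict:
        global compute_calls
        compute_calls += 1
        await asyncio.sleep(0.1)  # pretend this is the expensive state lookup
        return {"pdu_ids": ["$state-event"], "auth_chain_ids": ["$auth-event"]}


    async def main() -> None:
        cache = SimpleResponseCache(timeout_ms=30000)
        key = ("!room:example.org", "$event:example.org")

        # Three servers ask for /state_ids at the same point in the DAG at
        # roughly the same time; only one computation actually runs.
        results = await asyncio.gather(
            *(cache.wrap(key, slow_state_ids, *key) for _ in range(3))
        )
        assert compute_calls == 1
        assert all(r == results[0] for r in results)


    asyncio.run(main())

Within the 30-second window of the sketch, a later request for the same key
would likewise return the cached result without re-running the computation.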