diff options
author | Sean Quah <8349537+squahtx@users.noreply.github.com> | 2022-11-10 19:02:27 +0000 |
---|---|---|
committer | GitHub <noreply@github.com> | 2022-11-10 19:02:27 +0000 |
commit | b2c2b030798d0e74d3bf1afb4726465b53620638 (patch) | |
tree | a93ecc29c3af6b3b701dfa36206a4e34b384c04c | |
parent | Quieter logging for stateres failure at missing prev events (#14346) (diff) | |
download | synapse-b2c2b030798d0e74d3bf1afb4726465b53620638.tar.xz |
Fix PostgreSQL sometimes using table scans for `event_search` (#14409)
PostgreSQL may underestimate the number of distinct `room_id`s in `event_search`, which can cause it to use table scans for queries for multiple rooms. Fix this by setting `n_distinct` on the column. Resolves #14402. Signed-off-by: Sean Quah <seanq@matrix.org>
-rw-r--r-- | changelog.d/14409.bugfix | 1 | ||||
-rw-r--r-- | synapse/storage/schema/main/delta/73/11event_search_room_id_n_distinct.sql.postgres | 33 |
2 files changed, 34 insertions, 0 deletions
diff --git a/changelog.d/14409.bugfix b/changelog.d/14409.bugfix new file mode 100644 index 0000000000..f720700653 --- /dev/null +++ b/changelog.d/14409.bugfix @@ -0,0 +1 @@ +Fix PostgreSQL sometimes using table scans for queries against the `event_search` table, taking a long time and a large amount of IO. diff --git a/synapse/storage/schema/main/delta/73/11event_search_room_id_n_distinct.sql.postgres b/synapse/storage/schema/main/delta/73/11event_search_room_id_n_distinct.sql.postgres new file mode 100644 index 0000000000..93cdaefca1 --- /dev/null +++ b/synapse/storage/schema/main/delta/73/11event_search_room_id_n_distinct.sql.postgres @@ -0,0 +1,33 @@ +/* Copyright 2022 The Matrix.org Foundation C.I.C + * + * Licensed under the Apache License, Version 2.0 (the "License"); + * you may not use this file except in compliance with the License. + * You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + + +-- By default the postgres statistics collector massively underestimates the +-- number of distinct rooms in `event_search`, which can cause postgres to use +-- table scans for queries for multiple rooms. +-- +-- To work around this we can manually tell postgres the number of distinct rooms +-- by setting `n_distinct` (a negative value here is the number of distinct values +-- divided by the number of rows, so -0.01 means on average there are 100 rows per +-- distinct value). We don't need a particularly accurate number here, as a) we just +-- want it to always use index scans and b) our estimate is going to be better than the +-- one made by the statistics collector. + +ALTER TABLE event_search ALTER COLUMN room_id SET (n_distinct = -0.01); + +-- Ideally we'd do an `ANALYZE event_search (room_id)` here so that +-- the above gets picked up immediately, but that can take a bit of time so we +-- rely on the autovacuum eventually getting run and doing that in the +-- background for us. |