summary refs log tree commit diff
path: root/synapse/storage
diff options
context:
space:
mode:
authorDavid Robertson <davidr@element.io>2021-10-05 18:35:25 +0100
committerGitHub <noreply@github.com>2021-10-05 18:35:25 +0100
commit4f00432ce1a5571dd43f9ddc3ae128c58ae4d063 (patch)
treec792bee6b6bf9080ad59cd9c24a9579575eef56d /synapse/storage
parentFix logic flaw preventing tracking of MSC2716 events in existing room version... (diff)
downloadsynapse-4f00432ce1a5571dd43f9ddc3ae128c58ae4d063.tar.xz
Fix potential leak of per-room profiles when the user dir is rebuilt. (#10981)
There are two steps to rebuilding the user directory:

1. a scan over rooms, followed by
2. a scan over local users.

The former reads avatars and display names from the `room_memberships`
table and therefore contains potentially private avatars and
display names. The latter reads from the the `profiles` table which only
contains public data; moreover it will overwrite any private profiles
that the rooms scan may have written to the user directory. This means
that the rebuild could leak private user while the rebuild was in
progress, only to later cover up the leaks once the rebuild had completed.

This change skips over local users when writing user_directory rows
when scanning rooms. Doing so means that it'll take longer for a rebuild
to make local users searchable, which is unfortunate. I think a future
PR can improve this by swapping the order of the two steps above. (And
indeed there's more to do here, e.g. copying from `profiles` without
going via Python.)

Small tidy-ups while I'm here:

* Remove duplicated code from test_initial. This was meant to be pulled into `purge_and_rebuild_user_dir`.
* Move `is_public` before updating sharing tables. No functional change; it's still before the first read of `is_public`.
* Don't bother creating a set from dict keys. Slightly nicer and makes the code simpler.

Co-authored-by: Richard van der Hoff <1389908+richvdh@users.noreply.github.com>
Diffstat (limited to 'synapse/storage')
-rw-r--r--synapse/storage/databases/main/user_directory.py33
1 files changed, 20 insertions, 13 deletions
diff --git a/synapse/storage/databases/main/user_directory.py b/synapse/storage/databases/main/user_directory.py
index 5f538947ec..5c713a732e 100644
--- a/synapse/storage/databases/main/user_directory.py
+++ b/synapse/storage/databases/main/user_directory.py
@@ -228,10 +228,6 @@ class UserDirectoryBackgroundUpdateStore(StateDeltasStore):
             is_in_room = await self.is_host_joined(room_id, self.server_name)
 
             if is_in_room:
-                is_public = await self.is_room_world_readable_or_publicly_joinable(
-                    room_id
-                )
-
                 users_with_profile = await self.get_users_in_room_with_profiles(room_id)
                 # Throw away users excluded from the directory.
                 users_with_profile = {
@@ -241,22 +237,33 @@ class UserDirectoryBackgroundUpdateStore(StateDeltasStore):
                     or await self.should_include_local_user_in_dir(user_id)
                 }
 
-                # Update each user in the user directory.
+                # Upsert a user_directory record for each remote user we see.
                 for user_id, profile in users_with_profile.items():
+                    # Local users are processed separately in
+                    # `_populate_user_directory_users`; there we can read from
+                    # the `profiles` table to ensure we don't leak their per-room
+                    # profiles. It also means we write local users to this table
+                    # exactly once, rather than once for every room they're in.
+                    if self.hs.is_mine_id(user_id):
+                        continue
+                    # TODO `users_with_profile` above reads from the `user_directory`
+                    #   table, meaning that `profile` is bespoke to this room.
+                    #   and this leaks remote users' per-room profiles to the user directory.
                     await self.update_profile_in_user_dir(
                         user_id, profile.display_name, profile.avatar_url
                     )
 
-                to_insert = set()
-
+                # Now update the room sharing tables to include this room.
+                is_public = await self.is_room_world_readable_or_publicly_joinable(
+                    room_id
+                )
                 if is_public:
-                    for user_id in users_with_profile:
-                        to_insert.add(user_id)
-
-                    if to_insert:
-                        await self.add_users_in_public_rooms(room_id, to_insert)
-                        to_insert.clear()
+                    if users_with_profile:
+                        await self.add_users_in_public_rooms(
+                            room_id, users_with_profile.keys()
+                        )
                 else:
+                    to_insert = set()
                     for user_id in users_with_profile:
                         # We want the set of pairs (L, M) where L and M are
                         # in `users_with_profile` and L is local.