author | David Robertson <davidr@element.io> | 2021-10-05 18:35:25 +0100 |
---|---|---|
committer | GitHub <noreply@github.com> | 2021-10-05 18:35:25 +0100 |
commit | 4f00432ce1a5571dd43f9ddc3ae128c58ae4d063 (patch) | |
tree | c792bee6b6bf9080ad59cd9c24a9579575eef56d /synapse/storage/databases | |
parent | Fix logic flaw preventing tracking of MSC2716 events in existing room version... (diff) | |
download | synapse-4f00432ce1a5571dd43f9ddc3ae128c58ae4d063.tar.xz |
Fix potential leak of per-room profiles when the user dir is rebuilt. (#10981)
There are two steps to rebuilding the user directory:

1. a scan over rooms, followed by
2. a scan over local users.

The former reads avatars and display names from the `room_memberships` table and therefore contains potentially private avatars and display names. The latter reads from the `profiles` table, which only contains public data; moreover, it will overwrite any private profiles that the rooms scan may have written to the user directory. This means that the rebuild could leak private user profiles while it was in progress, only to cover up the leaks once it had completed.

This change skips over local users when writing user_directory rows when scanning rooms. Doing so means that it'll take longer for a rebuild to make local users searchable, which is unfortunate. I think a future PR can improve this by swapping the order of the two steps above. (And indeed there's more to do here, e.g. copying from `profiles` without going via Python.)

Small tidy-ups while I'm here:

* Remove duplicated code from test_initial. This was meant to be pulled into `purge_and_rebuild_user_dir`.
* Move `is_public` before updating sharing tables. No functional change; it's still before the first read of `is_public`.
* Don't bother creating a set from dict keys. Slightly nicer and makes the code simpler.

Co-authored-by: Richard van der Hoff <1389908+richvdh@users.noreply.github.com>
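The leak described above can be illustrated with a toy model. This is only a sketch under stated assumptions, not Synapse's implementation: plain dicts stand in for the `room_memberships`, `profiles` and user directory tables, and `Profile`, `is_local`, `scan_rooms` and `scan_local_users` are hypothetical names invented for the illustration.

```python
from typing import Dict, NamedTuple, Optional


class Profile(NamedTuple):
    display_name: Optional[str]
    avatar_url: Optional[str]


def is_local(user_id: str, server_name: str) -> bool:
    # Hypothetical stand-in for `self.hs.is_mine_id(user_id)`.
    return user_id.endswith(":" + server_name)


def scan_rooms(
    room_memberships: Dict[str, Dict[str, Profile]],
    server_name: str,
    skip_local: bool,
) -> Dict[str, Profile]:
    """Step 1: write a directory row for each user seen in each room."""
    directory: Dict[str, Profile] = {}
    for members in room_memberships.values():
        for user_id, profile in members.items():
            # The fix: leave local users to step 2, which reads public data.
            if skip_local and is_local(user_id, server_name):
                continue
            directory[user_id] = profile
    return directory


def scan_local_users(
    directory: Dict[str, Profile], profiles: Dict[str, Profile]
) -> None:
    """Step 2: write (or overwrite) local users from their public profiles."""
    for user_id, profile in profiles.items():
        directory[user_id] = profile


# Assumed example data: alice uses a per-room display name in one room.
memberships = {
    "!room:example.org": {
        "@alice:example.org": Profile("Alice (per-room name)", None),
        "@bob:remote.net": Profile("Bob", None),
    }
}
public_profiles = {"@alice:example.org": Profile("Alice", None)}

# Without the fix, the per-room name is visible between the two steps.
leaky = scan_rooms(memberships, "example.org", skip_local=False)

# With the fix, alice only appears once step 2 has run.
fixed = scan_rooms(memberships, "example.org", skip_local=True)
fixed_before_step_2 = dict(fixed)
scan_local_users(fixed, public_profiles)
```

In this model `leaky` holds alice's per-room name until step 2 overwrites it, whereas `fixed_before_step_2` contains only the remote user, which also shows the cost noted above: local users are not searchable until the second scan has run.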
Diffstat (limited to 'synapse/storage/databases')
-rw-r--r-- | synapse/storage/databases/main/user_directory.py | 33 |
1 files changed, 20 insertions, 13 deletions
diff --git a/synapse/storage/databases/main/user_directory.py b/synapse/storage/databases/main/user_directory.py
index 5f538947ec..5c713a732e 100644
--- a/synapse/storage/databases/main/user_directory.py
+++ b/synapse/storage/databases/main/user_directory.py
@@ -228,10 +228,6 @@ class UserDirectoryBackgroundUpdateStore(StateDeltasStore):
             is_in_room = await self.is_host_joined(room_id, self.server_name)
 
             if is_in_room:
-                is_public = await self.is_room_world_readable_or_publicly_joinable(
-                    room_id
-                )
-
                 users_with_profile = await self.get_users_in_room_with_profiles(room_id)
                 # Throw away users excluded from the directory.
                 users_with_profile = {
@@ -241,22 +237,33 @@ class UserDirectoryBackgroundUpdateStore(StateDeltasStore):
                     or await self.should_include_local_user_in_dir(user_id)
                 }
 
-                # Update each user in the user directory.
+                # Upsert a user_directory record for each remote user we see.
                 for user_id, profile in users_with_profile.items():
+                    # Local users are processed separately in
+                    # `_populate_user_directory_users`; there we can read from
+                    # the `profiles` table to ensure we don't leak their per-room
+                    # profiles. It also means we write local users to this table
+                    # exactly once, rather than once for every room they're in.
+                    if self.hs.is_mine_id(user_id):
+                        continue
+                    # TODO `users_with_profile` above reads from the `user_directory`
+                    # table, meaning that `profile` is bespoke to this room.
+                    # and this leaks remote users' per-room profiles to the user directory.
                     await self.update_profile_in_user_dir(
                         user_id, profile.display_name, profile.avatar_url
                     )
 
-                to_insert = set()
-
+                # Now update the room sharing tables to include this room.
+                is_public = await self.is_room_world_readable_or_publicly_joinable(
+                    room_id
+                )
                 if is_public:
-                    for user_id in users_with_profile:
-                        to_insert.add(user_id)
-
-                    if to_insert:
-                        await self.add_users_in_public_rooms(room_id, to_insert)
-                        to_insert.clear()
+                    if users_with_profile:
+                        await self.add_users_in_public_rooms(
+                            room_id, users_with_profile.keys()
+                        )
                 else:
+                    to_insert = set()
                     for user_id in users_with_profile:
                         # We want the set of pairs (L, M) where L and M are
                         # in `users_with_profile` and L is local.
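The diff is cut off at the comment describing the pairs (L, M), so the sketch below is only one plausible reading of it, not the code from the commit. It assumes that a user is never paired with themselves, and `is_local` and `sharing_pairs` are hypothetical helpers. It also shows why the public-room branch can pass `users_with_profile.keys()` directly: a dict keys view is already an iterable, set-like collection of user IDs.

```python
from typing import Dict, Optional, Set, Tuple


def is_local(user_id: str, server_name: str) -> bool:
    # Hypothetical stand-in for `self.hs.is_mine_id(user_id)`.
    return user_id.endswith(":" + server_name)


def sharing_pairs(
    users_with_profile: Dict[str, Optional[str]], server_name: str
) -> Set[Tuple[str, str]]:
    """Build the pairs (L, M) with L local and both users in the room."""
    to_insert: Set[Tuple[str, str]] = set()
    for user_id in users_with_profile:
        # Only local users may appear on the left-hand side.
        if not is_local(user_id, server_name):
            continue
        for other_user_id in users_with_profile:
            # Assumption: a user does not "share a room" with themselves.
            if user_id != other_user_id:
                to_insert.add((user_id, other_user_id))
    return to_insert


# Assumed example data: two local users and one remote user in a room.
users = {
    "@a:example.org": "A",
    "@b:example.org": "B",
    "@c:remote.net": "C",
}
pairs = sharing_pairs(users, "example.org")

# A keys view needs no copy into a set before being iterated or compared.
public_room_members = users.keys()
```

With two local users and one remote user, `pairs` holds four entries: each local user paired with each of the other two members, and none with the remote user on the left.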