From c775f310e949e6ab72d1074310eb2bd4d024172c Mon Sep 17 00:00:00 2001 From: "Olivier Wilkinson (reivilibre)" Date: Tue, 27 Aug 2019 13:51:25 +0100 Subject: Don't include the room & user stats docs in this PR. Signed-off-by: Olivier Wilkinson (reivilibre) --- docs/room_and_user_statistics.md | 146 --------------------------------------- 1 file changed, 146 deletions(-) delete mode 100644 docs/room_and_user_statistics.md diff --git a/docs/room_and_user_statistics.md b/docs/room_and_user_statistics.md deleted file mode 100644 index eb003f771a..0000000000 --- a/docs/room_and_user_statistics.md +++ /dev/null @@ -1,146 +0,0 @@ -Room and User Statistics -======================== - -Synapse maintains room and user statistics (as well as a cache of room state), -in various tables. - -These can be used for administrative purposes but are also used when generating -the public room directory. If these tables get stale or out of sync (possibly -after database corruption), you may wish to regenerate them. - - -# Synapse Administrator Documentation - -## Various SQL scripts that you may find useful - -### Delete stats, including historical stats - -```sql -DELETE FROM room_stats_current; -DELETE FROM room_stats_historical; -DELETE FROM user_stats_current; -DELETE FROM user_stats_historical; -``` - -### Regenerate stats (all subjects) - -```sql -BEGIN; - DELETE FROM stats_incremental_position; - INSERT INTO stats_incremental_position ( - state_delta_stream_id, - total_events_min_stream_ordering, - total_events_max_stream_ordering, - is_background_contract - ) VALUES (NULL, NULL, NULL, FALSE), (NULL, NULL, NULL, TRUE); -COMMIT; - -DELETE FROM room_stats_current; -DELETE FROM user_stats_current; -``` - -then follow the steps below for **'Regenerate stats (missing subjects only)'** - -### Regenerate stats (missing subjects only) - -```sql --- Set up staging tables --- we depend on current_state_events_membership because this is used --- in our counting. -INSERT INTO background_updates (update_name, progress_json) VALUES - ('populate_stats_prepare', '{}', 'current_state_events_membership'); - --- Run through each room and update stats -INSERT INTO background_updates (update_name, progress_json, depends_on) VALUES - ('populate_stats_process_rooms', '{}', 'populate_stats_prepare'); - --- Run through each user and update stats. -INSERT INTO background_updates (update_name, progress_json, depends_on) VALUES - ('populate_stats_process_users', '{}', 'populate_stats_process_rooms'); - --- Clean up staging tables -INSERT INTO background_updates (update_name, progress_json, depends_on) VALUES - ('populate_stats_cleanup', '{}', 'populate_stats_process_users'); -``` - -then **restart Synapse**. - - -# Synapse Developer Documentation - -## High-Level Concepts - -### Definitions - -* **subject**: Something we are tracking stats about – currently a room or user. -* **current row**: An entry for a subject in the appropriate current statistics - table. Each subject can have only one. -* **historical row**: An entry for a subject in the appropriate historical - statistics table. Each subject can have any number of these. - -### Overview - -Stats are maintained as time series. There are two kinds of column: - -* absolute columns – where the value is correct for the time given by `end_ts` - in the stats row. (Imagine a line graph for these values) -* per-slice columns – where the value corresponds to how many of the occurrences - occurred within the time slice given by `(end_ts − bucket_size)…end_ts` - or `start_ts…end_ts`. (Imagine a histogram for these values) - -Currently, only absolute columns are in use. - -Stats are maintained in two tables (for each type): current and historical. - -Current stats correspond to the present values. Each subject can only have one -entry. - -Historical stats correspond to values in the past. Subjects may have multiple -entries. - -## Concepts around the management of stats - -### current rows - -#### dirty current rows - -Current rows can be **dirty**, which means that they have changed since the -latest historical row for the same subject. -**Dirty** current rows possess an end timestamp, `end_ts`. - -#### old current rows and old collection - -When a (necessarily dirty) current row has an `end_ts` in the past, it is said -to be **old**. -Old current rows must be copied into a historical row, and cleared of their dirty -status, before further statistics can be tracked for that subject. -The process which does this is referred to as **old collection**. - -#### incomplete current rows - -There are also **incomplete** current rows, which are current rows that do not -contain a full count yet – this is because they are waiting for the stats -regenerator to give them an initial count. Incomplete current rows DO NOT contain -correct and up-to-date values. As such, *incomplete rows are not old-collected*. -Instead, old incomplete rows will be extended so they are no longer old. - -### historical rows - -Historical rows can always be considered to be valid for the time slice and -end time specified. (This, of course, assumes a lack of defects in the code -to track the statistics, and assumes integrity of the database). - -Even still, there are two considerations that we may need to bear in mind: - -* historical rows will not exist for every time slice – they will be omitted - if there were no changes. In this case, the following assumptions can be - made to interpolate/recreate missing rows: - - absolute fields have the same values as in the preceding row - - per-slice fields are zero (`0`) -* historical rows will not be retained forever – rows older than a configurable - time will be purged. - -#### purge - -The purging of historical rows is not yet implemented. - -- cgit 1.4.1