author: Erik Johnston <erik@matrix.org> 2019-09-05 17:27:46 +0100
committer: Erik Johnston <erik@matrix.org> 2019-09-05 17:27:46 +0100
commit: 591d82f06b81738efe67c13cdeee0901c3b28946
tree: 018ec1d3caaa956d2173e9b11c74f78619aba0de /docs/room_and_user_statistics.md
parent: Fix test
parent: Merge pull request #5984 from matrix-org/joriks/opentracing_link_send_to_edu_...
Merge branch 'develop' of github.com:matrix-org/synapse into erikj/censor_redactions
Diffstat (limited to 'docs/room_and_user_statistics.md')
-rw-r--r-- docs/room_and_user_statistics.md 62
1 files changed, 62 insertions, 0 deletions
diff --git a/docs/room_and_user_statistics.md b/docs/room_and_user_statistics.md
new file mode 100644
index 0000000000..e1facb38d4
--- /dev/null
+++ b/docs/room_and_user_statistics.md
@@ -0,0 +1,62 @@
+Room and User Statistics
+========================
+
+Synapse maintains room and user statistics (as well as a cache of room state)
+in various tables. These can be used for administrative purposes but are also
+used when generating the public room directory.
+
+
+# Synapse Developer Documentation
+
+## High-Level Concepts
+
+### Definitions
+
+* **subject**: Something we are tracking stats about – currently a room or user.
+* **current row**: An entry for a subject in the appropriate current statistics
+    table. Each subject can have only one.
+* **historical row**: An entry for a subject in the appropriate historical
+    statistics table. Each subject can have any number of these.
+
+### Overview
+
+Stats are maintained as time series. There are two kinds of column:
+
+* absolute columns – where the value is correct for the time given by `end_ts`
+    in the stats row. (Imagine a line graph for these values)
+    * They can also be thought of as 'gauges' in Prometheus, if you are familiar with it.
+* per-slice columns – where the value is the number of occurrences within
+    the time slice given by `(end_ts − bucket_size)…end_ts`
+    or `start_ts…end_ts`. (Imagine a histogram for these values)
+
+Stats are maintained in two tables (for each subject type): current and historical.
+
+Current stats correspond to the present values. Each subject can only have one
+entry.
+
+Historical stats correspond to values in the past. Subjects may have multiple
+entries.
+
+## Concepts around the management of stats
+
+### Current rows
+
+Current rows contain the most up-to-date statistics for a subject.
+They only contain absolute columns.
+
+### Historical rows
+
+Historical rows can always be considered to be valid for the time slice and
+end time specified.
+
+* historical rows will not exist for every time slice – they will be omitted
+    if there were no changes. In this case, the following assumptions can be
+    made to interpolate/recreate missing rows:
+    - absolute fields have the same values as in the preceding row
+    - per-slice fields are zero (`0`)
+* historical rows will not be retained forever – rows older than a configurable
+    time will be purged.
+
+#### Purge
+
+The purging of historical rows is not yet implemented.
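
The interpolation rule for omitted historical rows (absolute fields carry over from the preceding row, per-slice fields are zero) could be sketched as follows. This is only an illustrative sketch, not Synapse's implementation: the `HistoricalRow` class, the `fill_missing_rows` helper and the column names `joined_members` and `total_events` are hypothetical, and it assumes a fixed `bucket_size` with every `end_ts` aligned to a slice boundary.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class HistoricalRow:
    """Hypothetical in-memory stand-in for one historical stats row."""
    end_ts: int                 # end of the time slice this row covers
    bucket_size: int            # length of the time slice
    absolute: Dict[str, int]    # gauge-like values, correct as of end_ts
    per_slice: Dict[str, int]   # counts of occurrences within the slice


def fill_missing_rows(rows: List[HistoricalRow], bucket_size: int) -> List[HistoricalRow]:
    """Recreate rows that were omitted because nothing changed in their slice."""
    filled: List[HistoricalRow] = []
    prev: Optional[HistoricalRow] = None
    for row in sorted(rows, key=lambda r: r.end_ts):
        if prev is not None:
            ts = prev.end_ts + bucket_size
            while ts < row.end_ts:
                filled.append(
                    HistoricalRow(
                        end_ts=ts,
                        bucket_size=bucket_size,
                        # absolute fields: same values as in the preceding row
                        absolute=dict(prev.absolute),
                        # per-slice fields: zero, since nothing happened
                        per_slice={name: 0 for name in prev.per_slice},
                    )
                )
                ts += bucket_size
        filled.append(row)
        prev = row
    return filled


# Two stored rows with two omitted slices (end_ts 200 and 300) between them.
stored = [
    HistoricalRow(100, 100, {"joined_members": 5}, {"total_events": 12}),
    HistoricalRow(400, 100, {"joined_members": 7}, {"total_events": 3}),
]
print([r.end_ts for r in fill_missing_rows(stored, 100)])  # [100, 200, 300, 400]
```

The sketch only fills gaps between stored rows. It does not extend past the newest row, because the doc does not state how far past it the "no changes" assumption holds.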