From 8f35f8148e1a7ce3ac249e2d2052854409f2c0d6 Mon Sep 17 00:00:00 2001
From: Erik Johnston
Date: Mon, 23 Oct 2023 16:57:30 +0100
Subject: Fix bug where a new writer advances their token too quickly (#16473)

* Fix bug where a new writer advances their token too quickly

When starting a new writer (for e.g. persisting events), the
`MultiWriterIdGenerator` doesn't have a minimum token for it as there
are no rows matching that new writer in the DB.

This results in the first stream ID it acquired being announced as
persisted *before* it actually finishes persisting, if another writer
gets and persists a subsequent stream ID. This is due to the logic of
setting the minimum persisted position to the minimum known position
across all writers, and the new writer starts off not being considered.

* Fix sending out POSITIONs when our token advances without update

Broke in #14820

* For replication HTTP requests, only wait for minimal position
---
 docs/development/synapse_architecture/streams.md | 13 ++++++++++---
 1 file changed, 10 insertions(+), 3 deletions(-)

(limited to 'docs/development')

diff --git a/docs/development/synapse_architecture/streams.md b/docs/development/synapse_architecture/streams.md
index bee0b8a8c0..67d92acfa1 100644
--- a/docs/development/synapse_architecture/streams.md
+++ b/docs/development/synapse_architecture/streams.md
@@ -51,17 +51,24 @@ will be inserted with that ID.
 
 For any given stream reader (including writers themselves), we may define a per-writer current stream ID:
 
-> The current stream ID _for a writer W_ is the largest stream ID such that
+> A current stream ID _for a writer W_ is the largest stream ID such that
 > all transactions added by W with equal or smaller ID have completed.
 
 Similarly, there is a "linear" notion of current stream ID:
 
-> The "linear" current stream ID is the largest stream ID such that
+> A "linear" current stream ID is the largest stream ID such that
 > all facts (added by any writer) with equal or smaller ID have completed.
 
 Because different stream readers A and B learn about new facts at different times, A and B may disagree about current stream IDs.
 Put differently: we should think of stream readers as being independent of each other, proceeding through a stream of facts at different rates.
 
+The above definition does not give a unique current stream ID; in fact there can
+be a range of current stream IDs. Synapse uses both the minimum and maximum IDs
+for different purposes. Most often the maximum is used, as it's generally
+beneficial for workers to advance their IDs as soon as possible. However, the
+minimum is used in situations where e.g. another worker is going to wait until
+the stream advances past a position.
+
 **NB.** For both senses of "current", that if a writer opens a transaction that never completes, the current stream ID will never advance beyond that writer's last written stream ID.
 
 For single-writer streams, the per-writer current ID and the linear current ID are the same.
@@ -114,7 +121,7 @@ Writers need to track:
 - track their current position (i.e. its own per-writer stream ID).
 - their facts currently awaiting completion.
 
-At startup, 
+At startup,
 - the current position of that writer can be found by querying the database (which suggests that facts need to be written to the database atomically, in a transaction); and
 - there are no facts awaiting completion.
 
-- 
cgit 1.4.1
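
The failure mode described in the commit message can be illustrated with a small sketch. This is not Synapse's `MultiWriterIdGenerator` code: the helper `min_known_position`, the writer names `w1` and `w2`, and the "seed the new writer" mitigation are all hypothetical, chosen only to show why taking the minimum over *known* writers can announce an in-flight stream ID as persisted.

```python
from typing import Dict, Optional


def min_known_position(positions: Dict[str, int]) -> Optional[int]:
    """Return the smallest per-writer position, or None if no writer is known.

    Hypothetical helper for illustration; not part of Synapse.
    """
    return min(positions.values()) if positions else None


# Hypothetical scenario: writer "w1" has persisted up to stream ID 10.
positions = {"w1": 10}

# A new writer "w2" starts and acquires stream ID 11, but has not finished
# persisting it. It has no rows in the DB, so it has no entry in `positions`.
in_flight = {"w2": 11}

# "w1" then acquires and persists stream ID 12.
positions["w1"] = 12

# Ignoring "w2" makes 12 look fully persisted, although 11 is still in flight.
naive = min_known_position(positions)

# Assumed mitigation, for illustration only: seed the new writer with the
# position that was current when it started (10 here), so the minimum
# cannot advance past its in-flight stream ID.
seeded = min_known_position({**positions, "w2": 10})

print(naive, seeded)
```

In the naive case the minimum jumps past the new writer's in-flight ID; once the new writer is given a starting position, the minimum stays behind it until that writer actually finishes persisting.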