summary refs log tree commit diff
path: root/docs/human-id-rules.rst
diff options
context:
space:
mode:
Diffstat (limited to 'docs/human-id-rules.rst')
-rw-r--r--docs/human-id-rules.rst79
1 files changed, 0 insertions, 79 deletions
diff --git a/docs/human-id-rules.rst b/docs/human-id-rules.rst
deleted file mode 100644
index 3a1ff39892..0000000000
--- a/docs/human-id-rules.rst
+++ /dev/null
@@ -1,79 +0,0 @@
-This document outlines the format for human-readable IDs within matrix.
-
-Overview
---------
-UTF-8 is quickly becoming the standard character encoding set on the web. As
-such, Matrix requires that all strings MUST be encoded as UTF-8. However,
-using Unicode as the character set for human-readable IDs is troublesome. There
-are many different characters which appear identical to each other, but would
-identify different users. In addition, there are non-printable characters which
-cannot be rendered by the end-user. This opens up a security vulnerability with
-phishing/spoofing of IDs, commonly known as a homograph attack.
-
-Web browers encountered this problem when International Domain Names were
-introduced. A variety of checks were put in place in order to protect users. If
-an address failed the check, the raw punycode would be displayed to disambiguate
-the address. Similar checks are performed by home servers in Matrix. However, 
-Matrix does not use punycode representations, and so does not show raw punycode 
-on a failed check. Instead, home servers must outright reject these misleading 
-IDs.
-
-Types of human-readable IDs
----------------------------
-There are two main human-readable IDs in question:
-
-- Room aliases
-- User IDs
- 
-Room aliases look like ``#localpart:domain``. These aliases point to opaque
-non human-readable room IDs. These pointers can change, so there is already an
-issue present with the same ID pointing to a different destination at a later
-date.
-
-User IDs look like ``@localpart:domain``. These represent actual end-users, and
-unlike room aliases, there is no layer of indirection. This presents a much
-greater concern with homograph attacks. 
-
-Checks
-------
-- Similar to web browsers.
-- blacklisted chars (e.g. non-printable characters)
-- mix of language sets from 'preferred' language not allowed. 
-- Language sets from CLDR dataset.
-- Treated in segments (localpart, domain)
-- Additional restrictions for ease of processing IDs.
-   - Room alias localparts MUST NOT have ``#`` or ``:``.
-   - User ID localparts MUST NOT have ``@`` or ``:``.
-
-Rejecting
----------
-- Home servers MUST reject room aliases which do not pass the check, both on 
-  GETs and PUTs.
-- Home servers MUST reject user ID localparts which do not pass the check, both
-  on creation and on events.
-- Any home server whose domain does not pass this check, MUST use their punycode
-  domain name instead of the IDN, to prevent other home servers rejecting you.
-- Error code is ``M_FAILED_HUMAN_ID_CHECK``. (generic enough for both failing 
-  due to homograph attacks, and failing due to including ``:`` s, etc)
-- Error message MAY go into further information about which characters were
-  rejected and why.
-- Error message SHOULD contain a ``failed_keys`` key which contains an array
-  of strings which represent the keys which failed the check e.g::
-  
-    failed_keys: [ user_id, room_alias ]
-  
-Other considerations
---------------------
-- Basic security: Informational key on the event attached by HS to say "unsafe 
-  ID". Problem: clients can just ignore it, and since it will appear only very
-  rarely, easy to forget when implementing clients.
-- Moderate security: Requires client handshake. Forces clients to implement
-  a check, else they cannot communicate with the misleading ID. However, this is
-  extra overhead in both client implementations and round-trips.
-- High security: Outright rejection of the ID at the point of creation / 
-  receiving event. Point of creation rejection is preferable to avoid the ID
-  entering the system in the first place. However, malicious HSes can just allow
-  the ID. Hence, other home servers must reject them if they see them in events.
-  Client never sees the problem ID, provided the HS is correctly implemented.
-- High security decided; client doesn't need to worry about it, no additional
-  protocol complexity aside from rejection of an event.
\ No newline at end of file