From d5704cf2a3c6e8d27a6f70bca0db499e04ce6eb9 Mon Sep 17 00:00:00 2001
From: Kegan Dougal
Date: Tue, 9 Sep 2014 14:53:35 -0700
Subject: Added initial draft for human-readable ID rules.

---
 docs/human-id-rules.rst | 71 +++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 71 insertions(+)
 create mode 100644 docs/human-id-rules.rst

(limited to 'docs/human-id-rules.rst')

diff --git a/docs/human-id-rules.rst b/docs/human-id-rules.rst
new file mode 100644
index 0000000000..36987ddd0d
--- /dev/null
+++ b/docs/human-id-rules.rst
@@ -0,0 +1,71 @@
+This document outlines the format for human-readable IDs within matrix.
+
+Overview
+--------
+UTF-8 is quickly becoming the standard character encoding set on the web. As
+such, Matrix requires that all strings MUST be encoded as UTF-8. However,
+using Unicode as the character set for human-readable IDs is troublesome. There
+are many different characters which appear identical to each other, but would
+identify different users. In addition, there are non-printable characters which
+cannot be rendered the the end-user. This opens up a security vulnerability with
+phishing/spoofing of IDs, commonly known as a homograph attack.
+
+Web browers encountered this problem when International Domain Names were
+introduced. A variety of checks were put in place in order to protect users. If
+an address failed the check, the raw punycode would be displayed to disambiguate
+the address. Similar checks are performed by home servers in Matrix, which will
+then warn the client about the potentially misleading ID. However, Matrix does
+not use punycode, and so does not show raw punycode on a failed check. Instead,
+home servers must outright reject these misleading IDs.
+
+Types of human-readable IDs
+---------------------------
+There are two main human-readable IDs in question:
+
+ - Room aliases
+ - User IDs
+
+Room aliases look like ``#localpart:domain``. These aliases point to opaque
+non human-readable room IDs. These pointers can change, so there is already an
+issue present with the same ID pointing to a different destination at a later
+date.
+
+User IDs look like ``@localpart:domain``. These represent actual end-users, and
+unlike room aliases, there is no layer of indirection. This presents a much
+greater concern with homograph attacks.
+
+Checks
+------
+- Similar to web browsers.
+- blacklisted chars (e.g. non-printable characters)
+- mix of language sets from 'preferred' language not allowed.
+- Language sets from CLDR dataset.
+- Treated in segments (localpart, domain)
+
+Rejecting
+---------
+- Home servers MUST reject room aliases which do not pass the check, both on
+ GETs and PUTs.
+- Home servers MUST reject user ID localparts which do not pass the check, both
+ on creation and on events.
+- Any home server whose domain does not pass this check, MUST use their punycode
+ domain name instead of the IDN, to prevent other home servers rejecting you.
+- Error code is M_FAILED_HOMOGRAPH_CHECK.
+- Error message MAY go into further information about which characters were
+ rejected and why.
+
+Other considerations
+--------------------
+- Basic security: Informational key on the event attached by HS to say "unsafe
+ ID". Problem: clients can just ignore it, and since it will appear only very
+ rarely, easy to forget when implementing clients.
+- Moderate security: Requires client handshake. Forces clients to implement
+ a check, else they cannot communicate with the misleading ID. However, this is
+ extra overhead in both client implementations and round-trips.
+- High security: Outright rejection of the ID at the point of creation /
+ receiving event. Point of creation rejection is preferable to avoid the ID
+ entering the system in the first place. However, malicious HSes can just allow
+ the ID. Hence, other home servers must reject them if they see them in events.
+ Client never sees the problem ID, provided the HS is correctly implemented.
+- High security decided; client doesn't need to worry about it, no additional
+ protocol complexity aside from rejection of an event.
\ No newline at end of file
-- 
cgit 1.4.1

From 56a358481e928d6e70ff8afd48756c67860965c9 Mon Sep 17 00:00:00 2001
From: Kegan Dougal
Date: Tue, 9 Sep 2014 15:00:48 -0700
Subject: Tyops

---
 docs/human-id-rules.rst | 14 +++++++-------
 1 file changed, 7 insertions(+), 7 deletions(-)

(limited to 'docs/human-id-rules.rst')

diff --git a/docs/human-id-rules.rst b/docs/human-id-rules.rst
index 36987ddd0d..999651991c 100644
--- a/docs/human-id-rules.rst
+++ b/docs/human-id-rules.rst
@@ -7,23 +7,23 @@ such, Matrix requires that all strings MUST be encoded as UTF-8. However,
 using Unicode as the character set for human-readable IDs is troublesome. There
 are many different characters which appear identical to each other, but would
 identify different users. In addition, there are non-printable characters which
-cannot be rendered the the end-user. This opens up a security vulnerability with
+cannot be rendered by the end-user. This opens up a security vulnerability with
 phishing/spoofing of IDs, commonly known as a homograph attack.

 Web browers encountered this problem when International Domain Names were
 introduced. A variety of checks were put in place in order to protect users. If
 an address failed the check, the raw punycode would be displayed to disambiguate
-the address. Similar checks are performed by home servers in Matrix, which will
-then warn the client about the potentially misleading ID. However, Matrix does
-not use punycode, and so does not show raw punycode on a failed check. Instead,
-home servers must outright reject these misleading IDs.
+the address. Similar checks are performed by home servers in Matrix. However,
+Matrix does not use punycode representations, and so does not show raw punycode
+on a failed check. Instead, home servers must outright reject these misleading
+IDs.

 Types of human-readable IDs
 ---------------------------
 There are two main human-readable IDs in question:

- - Room aliases
- - User IDs
+- Room aliases
+- User IDs

 Room aliases look like ``#localpart:domain``. These aliases point to opaque
 non human-readable room IDs. These pointers can change, so there is already an
-- 
cgit 1.4.1

From f23e5b17b66db0fabb8c53d3f046936268e5e031 Mon Sep 17 00:00:00 2001
From: Kegan Dougal
Date: Tue, 9 Sep 2014 15:11:06 -0700
Subject: Extra restrictions to make parsing easier.

---
 docs/human-id-rules.rst | 9 ++++++++-
 1 file changed, 8 insertions(+), 1 deletion(-)

(limited to 'docs/human-id-rules.rst')

diff --git a/docs/human-id-rules.rst b/docs/human-id-rules.rst
index 999651991c..6e63bc43a2 100644
--- a/docs/human-id-rules.rst
+++ b/docs/human-id-rules.rst
@@ -41,6 +41,9 @@ Checks
 - mix of language sets from 'preferred' language not allowed.
 - Language sets from CLDR dataset.
 - Treated in segments (localpart, domain)
+- Additional restrictions for ease of processing IDs.
+ - Room alias localparts MUST NOT have ``#`` or ``:``.
+ - User ID localparts MUST NOT have ``@`` or ``:``.

 Rejecting
 ---------
@@ -50,9 +53,13 @@ Rejecting
  on creation and on events.
 - Any home server whose domain does not pass this check, MUST use their punycode
  domain name instead of the IDN, to prevent other home servers rejecting you.
-- Error code is M_FAILED_HOMOGRAPH_CHECK.
+- Error code is ``M_FAILED_HUMAN_ID_CHECK``. (generic enough for both failing
+ due to homograph attacks, and failing due to including ``:``s, etc)
 - Error message MAY go into further information about which characters were
  rejected and why.
+- Error message SHOULD contain a ``failed_keys`` key which contains an array
+ of strings which represent the keys which failed the check e.g:
+ - ``failed_keys: [ user_id, room_alias ]``

 Other considerations
 --------------------
-- 
cgit 1.4.1

From 2bd4346075b119d48afa676dcc883a51199119f2 Mon Sep 17 00:00:00 2001
From: Kegan Dougal
Date: Tue, 9 Sep 2014 15:13:50 -0700
Subject: More rst formatting.

---
 docs/human-id-rules.rst | 11 ++++++-----
 1 file changed, 6 insertions(+), 5 deletions(-)

(limited to 'docs/human-id-rules.rst')

diff --git a/docs/human-id-rules.rst b/docs/human-id-rules.rst
index 6e63bc43a2..3a1ff39892 100644
--- a/docs/human-id-rules.rst
+++ b/docs/human-id-rules.rst
@@ -42,8 +42,8 @@ Checks
 - Language sets from CLDR dataset.
 - Treated in segments (localpart, domain)
 - Additional restrictions for ease of processing IDs.
- - Room alias localparts MUST NOT have ``#`` or ``:``.
- - User ID localparts MUST NOT have ``@`` or ``:``.
+ - Room alias localparts MUST NOT have ``#`` or ``:``.
+ - User ID localparts MUST NOT have ``@`` or ``:``.

 Rejecting
 ---------
@@ -54,12 +54,13 @@ Rejecting
 - Any home server whose domain does not pass this check, MUST use their punycode
  domain name instead of the IDN, to prevent other home servers rejecting you.
 - Error code is ``M_FAILED_HUMAN_ID_CHECK``. (generic enough for both failing
- due to homograph attacks, and failing due to including ``:``s, etc)
+ due to homograph attacks, and failing due to including ``:`` s, etc)
 - Error message MAY go into further information about which characters were
  rejected and why.
 - Error message SHOULD contain a ``failed_keys`` key which contains an array
- of strings which represent the keys which failed the check e.g:
- - ``failed_keys: [ user_id, room_alias ]``
+ of strings which represent the keys which failed the check e.g::
+
+ failed_keys: [ user_id, room_alias ]

 Other considerations
 --------------------
-- 
cgit 1.4.1
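
Review note: the sketch below is a rough illustration of how a home server might apply the simpler rules from this draft. It covers only the sigil/``:`` restrictions and a non-printable character blacklist, and builds an error body carrying ``M_FAILED_HUMAN_ID_CHECK`` and ``failed_keys``. It is not part of the patches. The function names, the ``errcode``/``error`` field names, and the choice to treat every Unicode "C*" general category as non-printable are assumptions made for illustration; the mixed-language (CLDR) check is not attempted.

```python
import unicodedata
from typing import Dict, Optional

# Sigil that introduces each kind of human-readable ID (taken from the draft).
SIGILS = {"room_alias": "#", "user_id": "@"}


def localpart_passes(key: str, human_id: str) -> bool:
    """Hypothetical helper: apply the simple localpart checks to one ID."""
    sigil = SIGILS[key]
    if not human_id.startswith(sigil):
        return False
    # Splitting on the first ':' means a localpart can never contain ':',
    # which is the parsing convenience the extra restriction is after.
    localpart, sep, domain = human_id[1:].partition(":")
    if not sep or not localpart or not domain:
        return False
    # The localpart must not repeat its own sigil.
    if sigil in localpart:
        return False
    # Assumption: "non-printable" is approximated by general categories
    # starting with "C" (control, format, surrogate, private use, unassigned).
    return all(not unicodedata.category(ch).startswith("C") for ch in localpart)


def check_ids(ids: Dict[str, str]) -> Optional[dict]:
    """Return None if every ID passes, else an error body listing failed keys."""
    failed = [key for key, value in ids.items() if not localpart_passes(key, value)]
    if not failed:
        return None
    return {
        "errcode": "M_FAILED_HUMAN_ID_CHECK",  # error code named in the draft
        "error": "Human-readable ID check failed",  # illustrative message text
        "failed_keys": failed,
    }
```

Calling ``check_ids({"user_id": "@al\u200bice:example.org"})`` would report ``user_id`` in ``failed_keys``, since the zero-width space is a format character. A real implementation would also need the mixed-script checks the draft lists, which this sketch leaves out.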