From 5b81c6675231d6bf63b51631de357a1db5e182ff Mon Sep 17 00:00:00 2001
From: clokep
Date: Tue, 7 Sep 2021 13:11:01 +0000
Subject: deploy: 89ba83481821d44a4b768fbcd7761de039393a67
---
 develop/development/url_previews.html | 320 ++++++++++++++++++++++++++++++++++
 1 file changed, 320 insertions(+)
 create mode 100644 develop/development/url_previews.html

diff --git a/develop/development/url_previews.html b/develop/development/url_previews.html
new file mode 100644
index 0000000000..0af2fdc833
--- /dev/null
+++ b/develop/development/url_previews.html
@@ -0,0 +1,320 @@

URL Previews - Synapse

URL Previews


The GET /_matrix/media/r0/preview_url endpoint provides a generic preview API for URLs. It returns Open Graph responses (with some Matrix-specific additions).
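As a rough illustration of how a client might call this endpoint, the sketch below builds (but does not send) a request and parses a sample response. The `url` and `ts` query parameters and the bearer-token header follow the Matrix client-server API; the homeserver name, token, helper name `build_preview_request`, and all values in the sample body are assumptions made up for this example.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request


def build_preview_request(homeserver, access_token, target_url, ts=None):
    """Build (without sending) a GET request for the preview endpoint.

    `homeserver` and `access_token` are placeholders supplied by the caller.
    """
    params = {"url": target_url}
    if ts is not None:
        # Optional: preferred point in time for the preview, in milliseconds.
        params["ts"] = str(ts)
    full_url = f"{homeserver}/_matrix/media/r0/preview_url?{urlencode(params)}"
    return Request(full_url, headers={"Authorization": f"Bearer {access_token}"})


# Illustrative response body only; the values are invented for this sketch.
# "matrix:image:size" is an example of a Matrix-specific addition.
SAMPLE_RESPONSE = """
{
  "og:title": "Example article",
  "og:image": "mxc://matrix.example.org/hypotheticalMediaId",
  "og:image:type": "image/png",
  "og:image:width": 48,
  "og:image:height": 48,
  "matrix:image:size": 102400
}
"""

req = build_preview_request(
    "https://matrix.example.org",   # hypothetical homeserver
    "hypothetical_token",           # hypothetical access token
    "https://example.com/article",
    ts=1631020261000,
)
preview = json.loads(SAMPLE_RESPONSE)
```

Passing `req` to `urllib.request.urlopen` would perform the actual call against a real homeserver.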


This does have trade-offs compared to other designs:

  • Pros:
    • Simple and flexible; can be used by any client at any point.
  • Cons:
    • If each homeserver provides one of these independently, all the homeservers (HSes) in a room may needlessly DoS the target URI.
    • The URL metadata must be stored somewhere, rather than just using Matrix itself to store the media.
    • Matrix cannot be used to distribute the metadata between homeservers.

When Synapse is asked to preview a URL it does the following:

  1. Checks against a URL blacklist (defined as url_preview_url_blacklist in the config).
  2. Checks the in-memory cache by URL and returns the result if it exists. (This is also used to de-duplicate processing of multiple in-flight requests at once.)
  3. Kicks off a background process to generate a preview:
    1. Checks the database cache by URL and timestamp and returns the result if it has not expired and was successful (a 2xx return code).
    2. Checks if the URL matches an oEmbed pattern. If it does, fetches the oEmbed response. If this is an image, replaces the URL to fetch and continues. If it is HTML content, uses the HTML as the document and continues.
    3. If it doesn't match an oEmbed pattern, downloads the URL and stores it into a file via the media storage provider and saves the local media metadata.
    4. If the media is an image:
      1. Generates thumbnails.
      2. Generates an Open Graph response based on image properties.
    5. If the media is HTML:
      1. Decodes the HTML via the stored file.
      2. Generates an Open Graph response from the HTML.
      3. If an image exists in the Open Graph response:
        1. Downloads the image URL and stores it into a file via the media storage provider and saves the local media metadata.
        2. Generates thumbnails.
        3. Updates the Open Graph response based on image properties.
    6. Stores the result in the database cache.
  4. Returns the result.
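The first two steps above (the blacklist check, and the in-memory cache that also de-duplicates in-flight requests) can be sketched roughly as follows. This is a minimal illustration, not Synapse's code: it assumes each blacklist entry maps URL attributes (such as `scheme` or `netloc`) to a glob pattern, or to a regular expression when the pattern starts with `^`. The names `is_url_blocked`, `Previewer`, and `fake_generate` are hypothetical, and the background generation step is replaced by a stub.

```python
import asyncio
import fnmatch
import re
from urllib.parse import urlsplit


def _matches(value, pattern):
    """Match one URL attribute against a glob, or a regex if it starts with '^'."""
    if value is None:
        return False
    if pattern.startswith("^"):
        return re.match(pattern, str(value)) is not None
    return fnmatch.fnmatch(str(value), pattern)


def is_url_blocked(url, blacklist):
    """A URL is blocked if every pattern of any one (non-empty) entry matches."""
    parts = urlsplit(url)
    return any(
        entry and all(_matches(getattr(parts, k, None), p) for k, p in entry.items())
        for entry in blacklist
    )


class Previewer:
    """Hypothetical wrapper: blacklist check, then a cache keyed by URL."""

    def __init__(self, blacklist, generate):
        self.blacklist = blacklist
        self.generate = generate  # async callable: url -> Open Graph dict
        self.cache = {}           # url -> asyncio.Task (in-flight or finished)

    async def preview(self, url):
        if is_url_blocked(url, self.blacklist):
            raise PermissionError(f"URL is blacklisted: {url}")
        task = self.cache.get(url)
        if task is None:
            # Caching the task itself means concurrent requests for the same
            # URL await one shared piece of work instead of starting their own.
            task = asyncio.create_task(self.generate(url))
            self.cache[url] = task
        return await task


async def _demo():
    calls = []

    async def fake_generate(url):  # stand-in for the background process
        calls.append(url)
        await asyncio.sleep(0)
        return {"og:title": "Example"}

    previewer = Previewer([{"netloc": "*.internal.example"}], fake_generate)
    results = await asyncio.gather(
        previewer.preview("https://example.com/a"),
        previewer.preview("https://example.com/a"),
    )
    return results, calls


results, calls = asyncio.run(_demo())
```

In the demo, two concurrent requests for the same URL trigger a single call to the generator. Cache expiry is deliberately left out of this sketch.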

The in-memory cache expires after 1 hour.
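A one-hour expiry of this kind could look something like the sketch below. It is only an illustration of time-based expiry under stated assumptions; the class name `TtlCache` and its lazy, check-on-read eviction are choices made for this example, not a description of Synapse's own cache.

```python
import time

ONE_HOUR_S = 60 * 60


class TtlCache:
    """Hypothetical cache whose entries expire a fixed time after insertion."""

    def __init__(self, ttl_seconds=ONE_HOUR_S, clock=time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock    # injectable so expiry can be tested without waiting
        self._entries = {}     # key -> (inserted_at, value)

    def set(self, key, value):
        self._entries[key] = (self._clock(), value)

    def get(self, key):
        item = self._entries.get(key)
        if item is None:
            return None
        inserted_at, value = item
        if self._clock() - inserted_at >= self._ttl:
            # Entry is too old: drop it and report a miss.
            del self._entries[key]
            return None
        return value


# Drive the cache with a fake clock instead of sleeping for an hour.
now = [0.0]
cache = TtlCache(clock=lambda: now[0])
cache.set("https://example.com/", {"og:title": "Example"})
fresh = cache.get("https://example.com/")
now[0] = float(ONE_HOUR_S)
stale = cache.get("https://example.com/")
```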


Expired entries in the database cache (and their associated media files) are deleted every 10 seconds. The default expiration time is 1 hour from download.
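The expiry sweep can be pictured with the following sketch, which uses an in-memory SQLite database. The table name `preview_cache`, its columns, and the function `expire_previews` are all hypothetical and chosen only for illustration; a real deployment would run such a sweep on a 10-second timer and also delete the returned media files from storage.

```python
import sqlite3

EXPIRY_MS = 60 * 60 * 1000   # entries expire 1 hour after download
SWEEP_INTERVAL_S = 10        # how often a scheduler would call expire_previews


def expire_previews(conn, now_ms):
    """Delete expired rows; return the media IDs whose files should be removed."""
    rows = conn.execute(
        "SELECT media_id FROM preview_cache WHERE expires_ts < ?", (now_ms,)
    ).fetchall()
    conn.execute("DELETE FROM preview_cache WHERE expires_ts < ?", (now_ms,))
    conn.commit()
    return [row[0] for row in rows]


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE preview_cache "
    "(url TEXT, media_id TEXT, download_ts INTEGER, expires_ts INTEGER)"
)
# Two hypothetical entries: one downloaded at t=0, one at t=3,000,000 ms.
for url, media_id, download_ts in [
    ("https://example.com/old", "m1", 0),
    ("https://example.com/new", "m2", 3_000_000),
]:
    conn.execute(
        "INSERT INTO preview_cache VALUES (?, ?, ?, ?)",
        (url, media_id, download_ts, download_ts + EXPIRY_MS),
    )

# At t=4,000,000 ms only the first entry (expiring at 3,600,000 ms) is stale.
removed = expire_previews(conn, now_ms=4_000_000)
remaining = [row[0] for row in conn.execute("SELECT url FROM preview_cache")]
```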
