A Brief Analysis of Telegram's Rich Text Rendering Mechanism


Recently, I've been trying to call Telegram's TDLib API to fetch data and then render it on a webpage. Let's analyze the data structure and rendering scheme of Telegram's rich text data.

The following data structure can be obtained through the API:

{
  "content": "It is a test sentence.",
  "entities": [
    {
      "type": "bold",
      "offset": 3,
      "length": 2
    },
    {
      "type": "italic",
      "offset": 0,
      "length": 7
    },
    {
      "type": "url",
      "offset": 13,
      "length": 8,
      "url": ""
    },
    {
      "type": "strikethrough",
      "offset": 10,
      "length": 5
    }
  ]
}

This corresponds to the following rich text, in which "It is a" is italic, "is" is also bold, "st se" is struck through, and "sentence" is a link:

It is a test sentence.

How to generate

As we can see, Telegram's rich text starts from a piece of plain text (content) and stacks formatting layers (entities) on top of it. Each entity records the format type, the start offset and length of the text it covers, and any additional information the format needs.
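
To make offset and length concrete, the short sketch below slices the sample content by each entity's range. The coveredText helper is a name made up for this illustration. It relies on the offsets counting UTF-16 code units, which is how JavaScript indexes strings; check the TDLib documentation before reusing this in another language.

```javascript
// Sample data from the example above (the empty url is kept as in the source).
const content = "It is a test sentence.";
const entities = [
  { type: "bold", offset: 3, length: 2 },
  { type: "italic", offset: 0, length: 7 },
  { type: "url", offset: 13, length: 8, url: "" },
  { type: "strikethrough", offset: 10, length: 5 },
];

// Hypothetical helper: return the plain text that one entity covers.
function coveredText(text, entity) {
  return text.slice(entity.offset, entity.offset + entity.length);
}

for (const e of entities) {
  console.log(`${e.type}: "${coveredText(content, e)}"`);
}
// bold: "is"
// italic: "It is a"
// url: "sentence"
// strikethrough: "st se"
```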

  • When the user selects a span of text while editing and applies a format, a corresponding entity is added to entities.
  • When the user removes a format from a span of text, all entities must be traversed to find those with the matching format. If such an entity overlaps the modified range, it has to be replaced by one or two entities that exclude the overlapping area in the middle.
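
The two editing operations above can be sketched as follows. This only illustrates the splitting idea and is not Telegram's actual editor code; addFormat and removeFormat are hypothetical names, and ranges are treated as half-open [start, end).

```javascript
// Hypothetical helper: apply a format to [start, end) by appending one entity.
function addFormat(entities, type, start, end) {
  return [...entities, { type, offset: start, length: end - start }];
}

// Hypothetical helper: remove `type` from [start, end).
// Each matching entity keeps only the parts that lie outside the range,
// so it may survive unchanged, shrink, split in two, or disappear.
function removeFormat(entities, type, start, end) {
  const result = [];
  for (const e of entities) {
    const eStart = e.offset;
    const eEnd = e.offset + e.length;
    // Other formats, and entities that do not touch the range, are kept as-is.
    if (e.type !== type || eEnd <= start || eStart >= end) {
      result.push(e);
      continue;
    }
    // Part of the entity that lies before the cleared range.
    if (eStart < start) result.push({ ...e, offset: eStart, length: start - eStart });
    // Part of the entity that lies after the cleared range.
    if (eEnd > end) result.push({ ...e, offset: end, length: eEnd - end });
  }
  return result;
}

// Italic covers "It is a" (0..7); clearing italic on "is" (3..5) splits it.
let entities = addFormat([], "italic", 0, 7);
entities = removeFormat(entities, "italic", 3, 5);
console.log(JSON.stringify(entities));
// [{"type":"italic","offset":0,"length":3},{"type":"italic","offset":5,"length":2}]
```

Both helpers return new arrays instead of mutating their input, which keeps the sketch easy to test; an editor might well update the list in place.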

How to render

Given this structure, the most intuitive solution is to build the output layer by layer, applying each entity with a replace operation on the text. It quickly becomes clear that this approach is not feasible: every replacement changes the length of the text, which breaks the original offset+length positions, so the remaining entities end up pointing at the wrong characters.
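
A small experiment shows the drift. The sketch below assumes HTML-style tags as the output format (the article does not fix one) and uses a hypothetical naiveWrap helper that rewrites the string in place.

```javascript
const content = "It is a test sentence.";

// Hypothetical naive approach: wrap an entity's range by rewriting the string.
function naiveWrap(text, offset, length, tag) {
  const end = offset + length;
  return `${text.slice(0, offset)}<${tag}>${text.slice(offset, end)}</${tag}>${text.slice(end)}`;
}

// First entity: italic over "It is a" (offset 0, length 7). This one is fine.
const step1 = naiveWrap(content, 0, 7, "i");
console.log(step1); // <i>It is a</i> test sentence.

// Second entity: bold over "is" (offset 3, length 2). The offset still refers
// to the original string, but step1 now has 3 extra characters at the front.
const step2 = naiveWrap(step1, 3, 2, "b");
console.log(step2); // <i><b>It</b> is a</i> test sentence.
```

The bold tag lands on "It" instead of "is", because the inserted <i> shifted everything after it.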

So, let's take a different approach. A paragraph plus a set of entities of different types, overlapping and combining, can produce a wide variety of formats: some text is bold+italic, some is underline+bold+url, and so on. But however the entities overlap, the paragraph can always be cut into substrings within which every character has the same set of formats. Based on this premise, we can break the entire paragraph down into a set of atomic substrings, each of which is uniformly formatted.

As for where to cut, all you need to do is take both endpoints of every entity as split points:

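// Collect both endpoints of every entity, de-duplicate them, and sort ascending
points = [...new Set(entities.flatMap(e => [e.offset, e.offset + e.length]))].sort((a, b) => a - b)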

For instance, in the previous example, we obtain the following set of split points: [0,3,5,7,10,13,15,21]

By using these split points to divide the original text (the last piece runs to the end of the text), we get a group of substrings: ["It ","is"," a"," te","st ","se","ntence","."]
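
As a minimal sketch, the split could be written like this. The splitPoints and splitText names are placeholders chosen for this example. Position 0 and the end of the text are added explicitly, which is an assumption on top of the description above, so that text before the first entity and after the last one is not lost.

```javascript
const content = "It is a test sentence.";
const entities = [
  { type: "bold", offset: 3, length: 2 },
  { type: "italic", offset: 0, length: 7 },
  { type: "url", offset: 13, length: 8, url: "" },
  { type: "strikethrough", offset: 10, length: 5 },
];

// Hypothetical helper: every entity boundary, plus the start and end of the
// text, de-duplicated and sorted in ascending order.
function splitPoints(text, entities) {
  const points = new Set([0, text.length]);
  for (const e of entities) {
    points.add(e.offset);
    points.add(e.offset + e.length);
  }
  return [...points].sort((a, b) => a - b);
}

// Hypothetical helper: cut the text between each pair of neighbouring points.
function splitText(text, points) {
  const segments = [];
  for (let i = 0; i + 1 < points.length; i++) {
    segments.push(text.slice(points[i], points[i + 1]));
  }
  return segments;
}

const points = splitPoints(content, entities);
console.log(JSON.stringify(points));
// [0,3,5,7,10,13,15,21,22]
console.log(JSON.stringify(splitText(content, points)));
// ["It ","is"," a"," te","st ","se","ntence","."]
```

Note the numeric comparator passed to sort: without it, JavaScript sorts numbers as strings and 10 would come before 3.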

Then we handle each substring separately: for each one, we go through the entities in order, find every entity whose range covers the substring, and apply its format.

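let currentPos = 0  // start offset of segments[i] in the original text
for (let i = 0; i < segments.length; i++) {
    const plainLength = segments[i].length  // measure before markup is added
    for (const e of entities) {
        if (e.offset > currentPos) continue  // entity starts after this segment's start
        if (e.offset + e.length <= currentPos) continue  // entity ends at or before this segment's start
        // applyFormat is a placeholder for wrapping the text in the entity's markup
        segments[i] = e.applyFormat(segments[i])
    }
    currentPos += plainLength
}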
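
For reference, here is one way the whole approach could be put together as a self-contained, runnable sketch. It assumes HTML as the output format and a made-up TAGS table mapping entity types to tag names; neither comes from TDLib. The standalone applyFormat function stands in for the e.applyFormat placeholder, and escapeHtml is added so that the plain text cannot inject markup.

```javascript
const content = "It is a test sentence.";
const entities = [
  { type: "bold", offset: 3, length: 2 },
  { type: "italic", offset: 0, length: 7 },
  { type: "url", offset: 13, length: 8, url: "" },
  { type: "strikethrough", offset: 10, length: 5 },
];

// Escape characters that are special in HTML text and attribute values.
const ESCAPES = { "&": "&amp;", "<": "&lt;", ">": "&gt;", '"': "&quot;" };
function escapeHtml(s) {
  return s.replace(/[&<>"]/g, (c) => ESCAPES[c]);
}

// Assumed mapping from entity type to an HTML tag (not part of TDLib).
const TAGS = { bold: "b", italic: "i", strikethrough: "s" };

// Wrap already-escaped HTML in the markup for one entity.
function applyFormat(entity, html) {
  if (entity.type === "url") return `<a href="${escapeHtml(entity.url)}">${html}</a>`;
  const tag = TAGS[entity.type];
  return tag ? `<${tag}>${html}</${tag}>` : html; // unknown types pass through
}

function render(text, entities) {
  // Split points: all entity boundaries plus the start and end of the text.
  const boundaries = entities.flatMap((e) => [e.offset, e.offset + e.length]);
  const points = [...new Set([0, text.length, ...boundaries])].sort((a, b) => a - b);
  let html = "";
  for (let i = 0; i + 1 < points.length; i++) {
    const start = points[i];
    const end = points[i + 1];
    let piece = escapeHtml(text.slice(start, end));
    for (const e of entities) {
      // An entity applies when it covers the whole segment.
      if (e.offset <= start && e.offset + e.length >= end) piece = applyFormat(e, piece);
    }
    html += piece;
  }
  return html;
}

console.log(render(content, entities));
// <i>It </i><i><b>is</b></i><i> a</i> te<s>st </s><s><a href="">se</a></s><a href="">ntence</a>.
```

The output repeats tags across neighbouring segments, and the link is split into two anchors. That is valid but verbose; merging adjacent segments that share a format is left out of this sketch.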