While events are currently JSON blobs which accept additional metadata appended to them, there is no formal structure for how to represent this information or interpret it on the client side, particularly in the case of unknown event types.
When specifying new events, the proposals often reinvent the same wheel instead of reusing existing blocks or types, such as in cases where captions, thumbnails, etc need to be considered for an event. This has further issues of clients not knowing how to render these newly-specified events, leading to mixed compatibility within the ecosystem.
The above seriously hinders the uptake of new event types (and therefore features) within the Matrix ecosystem. In the current system, a new event type would be introduced and all implementations slowly gain support for it - if we instead had reusable types then clients could automatically support a "good enough" version of that new event type while "proper" support is written in over time. Such an example could be polls: not every client will want polls right away, but it would be quite limiting as a user experience if some users can't even see the question being posed.
This proposal introduces a structure for how extensible events are represented, using the existing extensible nature of events today, laying the groundwork for more reusable blocks of content in future events.
With text being the simplest form of representation for events today, this MSC also specifies a relatively basic text schema for room messages that can be reused in other events. Other building block types are specified by other MSCs:
- MSC3954 - Emotes
- MSC3955 - Notices / automated events
- MSC3956 - Encryption
- MSC3927 - Audio
- MSC3551 - Files
- MSC3552 - Images and Stickers
- MSC3553 - Videos
- MSC3554 - Translatable text
Some examples of new features/events using extensible events are:
- MSC3488 - Location data
- MSC3381 - Polls
- MSC3245 - Voice messages
- MSC2192 - Inline widgets
- MSC3765 - Rich text topics
Note: Readers might find Andy's blog useful for understanding the problem space. Unfortunately, for those who need to understand the changes to the protocol/specification, the best option is to read this proposal.
In a new room version (the reason for which is described later in this proposal), events are declared to be represented by their extensible form, as described by this MSC. m.room.message is formally deprecated by this MSC, with removal from the specification happening as part of a room version adopting the feature. Clients are expected to use extensible events only in room versions which explicitly declare such support (in both unstable and stable settings), except where noted later in this proposal.
An extensible event is made up of two critical parts: an event type and zero or more content blocks. The event type defines which content blocks a receiver can expect, and the content blocks carry the information needed to render the event (whether the client understands the event type or not).
Content blocks are simply any top-level key in content on the event. They can have any value type (that is also legal in an event generally: string, integer, etc), and are namespaced using the Matrix conventions for namespacing.
Content blocks can be invented independent of event types and should be reusable in nature. For example, this proposal introduces an m.text content block which can be reused by other event types to represent textual fallback.
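As a minimal sketch of the definition above, a receiver could enumerate an event's content blocks by reading the top-level keys of content. The helper name content_blocks is hypothetical and not part of this proposal, and the sample event only mirrors the shape of the examples used elsewhere in this document.

```python
from typing import Any


def content_blocks(event: dict[str, Any]) -> dict[str, Any]:
    """Return the content blocks of an event: every top-level key in `content`."""
    content = event.get("content")
    # A malformed event whose `content` is not an object has no blocks.
    return dict(content) if isinstance(content, dict) else {}


event = {
    "type": "org.example.tank.level",
    "content": {
        "m.text": [{"body": "[Danger] The water tank is 90% full."}],
        "org.example.danger_level": "alert",
    },
}

# Blocks may hold any JSON value type: here an array and a string.
for name, value in content_blocks(event).items():
    print(name, type(value).__name__)
```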
When a client encounters an extensible event (any event sent in a supported room version) that it does not understand, the client begins searching for a best match based on event type schemas it does know. This may mean combining multiple different content blocks to match a suitable schema, such as in the case of MSC3553 video events. Which schemas to try, and in what order, is left as a deliberate implementation detail. A client might decide to try parsing the event as a video, then image, then file, then text message, for example.
It is generally not expected that a single content block will describe an entire event, except in the exceedingly trivial cases (like text messages in this proposal). Multiple content blocks will usually fully describe the information in the event, and mixins (described later) can further change how an event is represented or processed.
Note that a "client" in an extensible events sense will typically mean an application using the Client-Server API, however in reality a client will be anything which needs to parse and understand event contents (servers for some functions like push rules, application services, etc).
Per the introduction, text is the baseline format that most/all Matrix clients support today, often through use of HTML and m.room.message. Rather than using m.room.message to represent this content, clients would use an m.message event with, at a minimum, an m.text content block:
{
  // irrelevant fields not shown
  "type": "m.message",
  "content": {
    "m.text": [
      { "body": "<i>Hello world</i>", "mimetype": "text/html" },
      { "body": "Hello world" }
    ]
  }
}
m.text has the following definitions associated with it:
- An ordered array of mimetypes and applicable string content to represent a single marked-up blob of text. Each element is known as a representation.
- body in a representation is required, and must be a string.
- mimetype is optional in a representation, and defaults to text/plain.
- Zero representations are permitted, however senders should aim to always specify at least one.
- Invalid representations are skipped by clients (missing body, not an object, etc).
- The first representation a renderer understands should be used.
- Senders are strongly encouraged to always include a plaintext representation.
- The mimetype of a representation determines its body - no effort is made to limit what is allowed in the body, however clients are still strongly encouraged to validate/sanitize the content further, like in the existing spec for HTML.
- Custom text formats in a representation are specified by a suitably custom mimetype. For example, a representation might use a text format extending HTML or XML, or an all-new markup. This can be used to create bridge-compatible clients where the destination network's markup is first in the array, followed by more common HTML and text formats.
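The selection rules above can be sketched roughly as follows. This is an illustration only: pick_representation and its supported parameter are hypothetical names, and treating a non-string mimetype as an invalid representation is an assumption rather than something this proposal states.

```python
from typing import Any, Optional

DEFAULT_MIMETYPE = "text/plain"


def pick_representation(m_text: Any, supported: set[str]) -> Optional[dict[str, str]]:
    """Return the first valid representation whose mimetype the renderer supports."""
    if not isinstance(m_text, list):
        return None
    for rep in m_text:
        # Invalid representations are skipped rather than failing the whole event.
        if not isinstance(rep, dict) or not isinstance(rep.get("body"), str):
            continue
        mimetype = rep.get("mimetype", DEFAULT_MIMETYPE)
        if not isinstance(mimetype, str):
            continue  # assumption: a non-string mimetype makes the entry invalid
        if mimetype in supported:
            return {"mimetype": mimetype, "body": rep["body"]}
    return None


m_text = [
    {"body": "<i>Hello world</i>", "mimetype": "text/html"},
    "not an object",
    {"body": "Hello world"},
]

# A text-only renderer skips the HTML and invalid entries.
print(pick_representation(m_text, {"text/plain"}))
# An HTML-capable renderer takes the first entry, since order expresses preference.
print(pick_representation(m_text, {"text/html", "text/plain"}))
```

The body returned here is not sanitized; a real renderer would still need to validate HTML or other markup before display.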
Like with the event described above, all event types now describe which content blocks they expect to see on their events. These content blocks could be required, as is the case of m.text in m.message, or they could be optional depending on the situation. Of course, senders are welcome to send even more blocks which aren't specified in the schema for an event type, however clients which understand that event type might not consider them at all.
In m.message's case, m.text is the only required content block. The m.text block can be reused by other events to include a text-like format for the event, such as a text fallback for clients which do not understand how to render a custom event type.
To reiterate, when a client encounters an unknown event type it first tries to see if there's a set of content blocks present that it can associate with a known event type. If it finds suitable content blocks, it parses the event as though the event were of the known type. If it doesn't find anything useful, the event is left as unrenderable, just as it likely would today.
To avoid a situation where events end up being unrenderable, it is strongly recommended that all event types support at least an m.text content block in their schema, thus allowing all events to theoretically be rendered as message events (in a worst case scenario).
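One way a client might implement the fallback search is sketched below. The KNOWN_SCHEMAS table, its ordering, and the function name are hypothetical: this proposal deliberately leaves the choice of schemas and their order to the implementation. The sketch also does not validate the shape of each block, which a real client would need to do.

```python
from typing import Any, Optional

# Hypothetical client knowledge: event type -> content blocks its schema requires.
# Entries are ordered by this client's preference, richest first.
KNOWN_SCHEMAS: list[tuple[str, frozenset[str]]] = [
    ("org.example.temperature", frozenset({"org.example.probe_value"})),
    ("m.message", frozenset({"m.text"})),
]


def resolve_render_type(event: dict[str, Any]) -> Optional[str]:
    """Return the event type to render the event as, or None if unrenderable."""
    content = event.get("content")
    if not isinstance(content, dict):
        return None
    event_type = event.get("type")
    known_types = {name for name, _ in KNOWN_SCHEMAS}
    if isinstance(event_type, str) and event_type in known_types:
        return event_type  # understood natively, no fallback needed
    for name, required in KNOWN_SCHEMAS:
        # Fall back to the first known schema whose required blocks are all present.
        if required.issubset(content):
            return name
    return None


unknown = {
    "type": "org.example.unknown",
    "content": {"m.text": [{"body": "Fallback text"}]},
}
print(resolve_render_type(unknown))
```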
For clarity, events are not able to specify how they are handled when the receiver doesn't know how to render the event type: the sender simply includes all possible or feasible representations for the data, hoping the receiver will pick the richest form for the user. As an example, a special medical imaging event type might also be represented as a video, static image, or text (URL to some healthcare platform): the sender includes all 3 fallbacks by specifying the needed content blocks, and the receiver may pick the video, image, or text depending on its own rules.
Events must still only represent a single logical piece of information, thus encouraging sensible fallback options in the form of content blocks. The information being represented is described by the event type, as it always has been before this MSC. It is explicitly not permitted to represent two or more pieces of information in a single event, such as a livestream reference and poll: senders should look into relationships instead.
In a hypothetical scenario, a temperature event might look as such:
{
  // irrelevant fields not shown
  "type": "org.example.temperature",
  "content": {
    "m.text": [{"body": "It is 22 degrees at Home"}],
    "org.example.probe_value": {
      "label": "Home",
      "units": "org.example.celsius",
      "value": 22
    }
  }
}
In this scenario, clients which understand how to render an org.example.temperature event might use the information in org.example.probe_value exclusively, leaving the m.text block for clients which don't understand the temperature event type.
Another event type might find inspiration and use the probe value block for their event as well. Such an example might be in a more industrial control application:
{
  // irrelevant fields not shown
  "type": "org.example.tank.level",
  "content": {
    "m.text": [{"body": "[Danger] The water tank is 90% full."}],
    "org.example.probe_value": {
      "label": "Tank 3",
      "units": "org.example.litres",
      "value": 9037
    },
    "org.example.danger_level": "alert"
  }
}
This event also demonstrates an org.example.danger_level block, which uses a string value type instead of the previously demonstrated objects and arrays - this is a legal content block, as blocks can be of any type.
Clients should be cautious and avoid reusing too many unspecified types as it can create opportunities for confusion and inconsistency. There should always be an effort to get useful event types into the Matrix spec for others to benefit from.
This MSC requires a room version to make the transition process clear and coordinated. Normally for a feature such as this, an effort would be made to attempt to support backwards compatibility for a duration of time, however for a feature that requires significant overhaul of clients, servers, and Matrix as a whole it feels more important to bias towards a clear switch between legacy and modern (extensible) events.
Note: A previous draft of this proposal (codenamed "v1 extensible events") did attempt to describe a timeline-based approach, allowing for event types to mix concepts of content blocks and legacy fields, however that approach did not give sufficient reason for clients to fully adopt the extensible events changes.
In room versions supporting extensible events, clients MUST only send extensible events. Deprecated event types (to be enumerated at the time of making the room version) MUST NOT be sent into extensible event-supporting room versions, and clients MUST treat deprecated event types as unrenderable by force. For example, if a client sees an m.room.message in an extensible event-supporting room version, it must not render it, even if it knows how to render that type.
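A client-side check for this rule might look like the sketch below. Both sets are placeholders: the real list of deprecated event types is to be enumerated when the room version is defined, and the room version shown is simply the unstable example identifier used later in this proposal.

```python
# Placeholder values for illustration only.
EXTENSIBLE_ROOM_VERSIONS = {"org.matrix.msc1767.10"}
DEPRECATED_EVENT_TYPES = {"m.room.message"}


def is_renderable_in_room(event_type: str, room_version: str) -> bool:
    """Return False when a deprecated type appears in an extensible-events room."""
    if room_version in EXTENSIBLE_ROOM_VERSIONS and event_type in DEPRECATED_EVENT_TYPES:
        # Forced unrenderable, even though the client knows how to draw this type.
        return False
    return True


print(is_renderable_in_room("m.room.message", "org.matrix.msc1767.10"))
print(is_renderable_in_room("m.room.message", "10"))
```

The same predicate could be reused by a server to reject sends of deprecated types over the Client-Server API, although that cannot cover encrypted rooms.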
While full enforcement of this restriction is not feasible, servers are encouraged to block Client-Server API requests for sending known-banned event types into applicable rooms. This obviously does not help when the room is encrypted, or the client is sending custom events in a non-extensible form, hence the requirement that clients treat the events as invalid too.
Using the usual MSC process, the Spec Core Team (SCT) will be responsible for determining the minimum scope of extensible events in a published (stable) room version.
Meanwhile, clients are welcome to use the unstable implementations of extensible event-supporting features, provided they are in an appropriate room version. Some event type MSCs declare explicit support for what would normally be an unsupported room version - client authors should check the applicable MSC or specification for the feature to determine if they are allowed to do this. Such examples include MSC3381 Polls and MSC3245 Voice Messages.
Unknown state event types generally should not be parsed by clients. This is to prevent situations where the sender masks a state change as some other, non-state, event. For example, even if a state event has an m.text content block, it should not be treated as a room message.
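The guard described above can be sketched as a check made before any text fallback is attempted. The function name is hypothetical; the sketch relies on the existing convention that state events carry a state_key field, and it only checks that m.text is an array, not that its representations are valid.

```python
from typing import Any


def may_fall_back_to_message(event: dict[str, Any]) -> bool:
    """Return True if an unknown event may be rendered via its m.text block."""
    if "state_key" in event:
        # State events are never rendered as room messages, whatever blocks they carry.
        return False
    content = event.get("content")
    return isinstance(content, dict) and isinstance(content.get("m.text"), list)


state_event = {
    "type": "org.example.unknown_state",
    "state_key": "",
    "content": {"m.text": [{"body": "Looks like a message"}]},
}
print(may_fall_back_to_message(state_event))
```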
Note that state events MUST still make use of content blocks in applicable room versions, and that any top-level key in content is defined as a content block under this proposal. As such, this MSC implicitly promotes all existing content fields of m.* state events to independent content blocks as needed. Other MSCs may override this decision on a per-event type basis (ie: redeclaring how room topics work to support content blocks, deprecating the existing m.room.topic event in the process, like in MSC3765).
Unlike most content blocks, these promoted-to-content-blocks are not realistically meant to be
reused: it is simply a formality given this MSC's scope.
Currently push notifications describe how an event can cause a notification to the user, though it makes the assumption that there are m.room.message events flying around to denote "messages" which can trigger keyword/mention-style alerts. With extensible events, the same might not be possible as it relies on understanding how/when the client will render the event to cause notifications.
For simplicity, when content.body is used in an event_match condition, it now looks for an m.text block's text/plain representation (implied or explicit) in room versions supporting extensible events. This is not an easy rule to represent in the existing push rules schema, and this MSC has no interest in designing a better schema. Note that other conditions applied to push notifications, such as an event type check, are not affected by this: clients/servers will have to alter applicable push rules to handle the new event types (see also: MSC3933 and friends).
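The content.body lookup could be sketched as below. The function and parameter names are hypothetical, and choosing the first text/plain representation when several exist is an assumption; the sketch only resolves the string to match against and leaves the actual pattern matching to the existing push rule evaluation.

```python
from typing import Any, Optional


def body_for_event_match(content: dict[str, Any], extensible_room: bool) -> Optional[str]:
    """Resolve the string that a `content.body` event_match condition inspects."""
    if not extensible_room:
        body = content.get("body")
        return body if isinstance(body, str) else None
    m_text = content.get("m.text")
    if not isinstance(m_text, list):
        return None
    for rep in m_text:
        if not isinstance(rep, dict) or not isinstance(rep.get("body"), str):
            continue  # invalid representations are skipped
        # A missing mimetype implies text/plain.
        if rep.get("mimetype", "text/plain") == "text/plain":
            return rep["body"]
    return None


content = {
    "m.text": [
        {"body": "<i>Hello world</i>", "mimetype": "text/html"},
        {"body": "Hello world"},
    ]
}
print(body_for_event_match(content, extensible_room=True))
```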
This MSC proposes no changes to how power levels interact with events: they are still capable of restricting which users can send an event type. Though events might be rendered as a different logical type (ie: unknown event being rendered as a message), this does not materially impact the room's ability to function. Thus, considerations for how to handle power levels more intelligently are details left for a future MSC.
As of writing, most rooms fit into two categories: any event type is possible to send, or specific cherry-picked event types are allowed (announcement rooms: reactions & redactions). Extensible events don't materially change the situation implied by this power levels structure.
A mixin is a specific type of content block which can be added to any type of event to change how that event is processed. Content blocks which are mixins will be called out as such in the spec. Mixins are meant to be purely additive, thus all event types MUST support being rendered/processed without the use of mixins.
See also the Wikipedia entry on mixins.
Note that mixins differ from optional content blocks in an event type's schema: a mixin is able to be applied to any event type sensibly while optional content blocks are generally only valuable to the applicable event types.
Though this MSC does not describe any such mixins itself, MSC3955 does by allowing any event to be flagged as "automated" - a strictly additive annotation on events.
Another possible mixin would be m.relates_to (not described by this MSC). Currently, some features like the key verification framework rely on relationships as part of making the feature work. The expectation is that these features would be adapted to meet the "purely additive" condition (assuming m.relates_to does actually end up being a mixin).
For an abundance of clarity, all functionality not explicitly called out in this MSC which relies on the formatted_body of an m.room.message is expected to transition to using an appropriate m.text representation instead. For example, the HTML representation of a mention will now appear under m.text's text/html representation (adding one if required).
A similar condition is applied to body in m.room.message: all existing functionality will instead use the text/plain representation within m.text, if not explicitly called out by this MSC.
It's a bit ugly to not know whether a given key in content will take a string, object, boolean, integer, or array.
It's a bit ugly to not know at a glance if a content block is a mixin or not.
It's a bit ugly that you have to look over the keys of content to see what blocks are present, but better than duplicating this into an explicit blocks list within the event content (on balance).
We're skipping over defining rules for which fallback combinations to display (i.e. "display hints") for now; these can be added in a future MSC if needed. MSC1225 contains a proposal for this.
Placing content blocks at the top level of content is a bit unfortunate, though mixes nicely thanks to namespacing. Potentially conflicting cases in the wild would be namespaced fields, which would get translated as unrenderable events if the value type doesn't meet the client's known schema.
This MSC does not rewrite or redefine all possible events in the specification: this is deliberately left as an exercise for several future MSCs.
Like today, it's possible to have the different representations of an event not match, thus introducing a potential for malicious payloads (text-only clients seeing something different to HTML-friendly ones). Clients could try to do similarity comparisons, though this is complicated with features like HTML and arbitrary custom markup (markdown, etc) showing up in the plaintext or in tertiary formats on the events. Historically, room moderators have been pretty good about removing these malicious senders from their rooms when other users point out (quite quickly) that the event is appearing funky to them.
Extensible events as a spec feature requires dozens of different MSCs, with this MSC being the structure definition and text baseline. It is not expected that this MSC will be written into spec once it has passed FCP. Instead, it is expected that all of the "core" extensible events MSCs will pass FCP and extensible events be assigned a stable room version before any spec authoring begins. Thus, this particular MSC should be anticipated to sit in accepted-but-not-merged (stable, not formal spec yet) for a while, and that's okay.
The Spec Core Team (SCT) has decision making power over what is considered core for extensible events, though the recommendation is to ensure replacements for all non-state m.room.* types have accepted (successful FCP) MSCs to replace them.
While this MSC is not considered stable by the specification, implementations must use org.matrix.msc1767 as a prefix to denote the unstable functionality. For example, sending an m.message event would mean sending an org.matrix.msc1767.message event instead.
For purposes of testing, implementations can use a dynamically-assigned unstable room version org.matrix.msc1767.<version> to use extensible events within. For example, org.matrix.msc1767.10 for room version 10 or org.matrix.msc1767.org.example.cool_ver for a hypothetical org.example.cool_ver room version. Any events sent in these room versions can use stable identifiers given the entire room version itself is unstable, however senders must take care to ensure stable identifiers do not leak out to other room versions - it may be simpler to not send stable identifiers at all.
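The identifier mapping described above could be implemented with helpers along these lines. The function names are hypothetical, and applying the same prefix substitution to every m.* identifier (rather than only to event types, as in the example above) is an assumption made for illustration.

```python
UNSTABLE_PREFIX = "org.matrix.msc1767"


def to_unstable_identifier(stable: str) -> str:
    """Map a stable `m.*` identifier to its unstable form under this MSC's prefix."""
    if not stable.startswith("m."):
        return stable  # custom namespaces are already non-spec identifiers
    return f"{UNSTABLE_PREFIX}.{stable[2:]}"


def unstable_room_version(base_version: str) -> str:
    """Build the unstable room version identifier for a given base room version."""
    return f"{UNSTABLE_PREFIX}.{base_version}"


print(to_unstable_identifier("m.message"))
print(unstable_room_version("10"))
print(unstable_room_version("org.example.cool_ver"))
```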
- converted from googledoc to MD, and to be a single PR rather than split PR/Issue.
- simplifies it by removing displayhints (for now - deferred to a future MSC).
- replaces the clunky m.text.1 idea with lists for types which support fallbacks.
- removes the concept of optional compact form for m.text by instead having m.text always in expanded form.
- tries to accommodate most of the feedback on GH and Google Docs from MSC1225.
- Anything that wasn't simple text rendering was broken out to dedicated MSCs in an effort to get the structure approved and adopted while the more complex types get implemented independently.
- Renamed subtypes/reusable types to just "content blocks".
- Allow content blocks to be nested.
- Fix push rules in the most basic sense, deferring to a future MSC on better support.
- Explicitly make no changes to power levels, deferring to a future MSC on better support.
- Drop timeline for transition in favour of an explicit room version.
- Move most push rule changes and such into their own/future MSCs.
- Move emotes, notices, and encryption out to their own dedicated MSCs.