More Instant Messaging Interoperability                     R. L. Barnes
Internet-Draft                                                     Cisco
Intended status: Informational                             20 March 2023
Expires: 21 September 2023

                ActivityPub for Interoperable Messaging
                     draft-barnes-mimi-aim-latest

Abstract

The MIMI working group is chartered to define tools that messaging providers can use to interoperate with one another. The W3C ActivityPub protocol is already widely used for several use cases that resemble the MIMI use case. This document examines whether ActivityPub might be a good baseline for providing the sort of interoperability that MIMI intends to achieve.

About This Document

This note is to be removed before publishing as an RFC.

The latest revision of this draft can be found at https://bifurcation.github.io/mimi-aim/draft-barnes-mimi-aim.html. Status information for this document may be found at https://datatracker.ietf.org/doc/draft-barnes-mimi-aim/.

Discussion of this document takes place on the More Instant Messaging Interoperability Working Group mailing list (mailto:mimi@ietf.org), which is archived at https://mailarchive.ietf.org/arch/browse/mimi/. Subscribe at https://www.ietf.org/mailman/listinfo/mimi/.

Source for this draft and an issue tracker can be found at https://github.com/bifurcation/mimi-aim.

Status of This Memo

This Internet-Draft is submitted in full conformance with the provisions of BCP 78 and BCP 79.

Internet-Drafts are working documents of the Internet Engineering Task Force (IETF). Note that other groups may also distribute working documents as Internet-Drafts. The list of current Internet-Drafts is at https://datatracker.ietf.org/drafts/current/.

Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress."

This Internet-Draft will expire on 21 September 2023.

Copyright Notice

Copyright (c) 2023 IETF Trust and the persons identified as the document authors. All rights reserved.

This document is subject to BCP 78 and the IETF Trust's Legal Provisions Relating to IETF Documents (https://trustee.ietf.org/license-info) in effect on the date of publication of this document. Please review these documents carefully, as they describe your rights and restrictions with respect to this document. Code Components extracted from this document must include Revised BSD License text as described in Section 4.e of the Trust Legal Provisions and are provided without warranty as described in the Revised BSD License.

Table of Contents

   1. Introduction
   2. Conventions and Definitions
   3. MIMI Requirements
      3.1. Components
      3.2. Service-to-Service Interoperability
      3.3. Transport Use Cases
   4. ActivityPub
      4.1. Actors and Activities
      4.2. Activity Delivery
      4.3. Identity
   5. Using ActivityPub for MIMI
      5.1. User Identity and Metadata
      5.2. Channels
         5.2.1. Channel Creation
         5.2.2. Message Delivery
         5.2.3. Membership and Metadata Management
      5.3. Group DMs
   6. Security Considerations
      6.1. End-to-End Security
      6.2. Forward Secrecy
      6.3. Authentication and Authorization
   7. IANA Considerations
   8. References
      8.1. Normative References
      8.2. Informative References
   Acknowledgments
   Author's Address

1. Introduction

The MIMI working group is chartered to define tools that messaging providers can use to interoperate with one another. Messaging is obviously not a new application; readers of "Message Transmission Protocol" [RFC561] from 1975 will find familiar concepts such as "TO", "CC", and "BCC" fields. It thus seems likely that some existing protocol will satisfy many of MIMI's requirements. Basing MIMI on an existing widely deployed protocol can also facilitate deployment of the MIMI protocol, since the lessons from deployment of the predecessor protocol should mostly carry forward.

This document considers the W3C ActivityPub protocol [W3C.ActivityPub] as a candidate for such a baseline for MIMI. At a high level, ActivityPub is a protocol for sharing "Activities" with various semantics among users homed to loosely-coupled servers. ActivityPub was published as a W3C Recommendation in 2018, and today supports several wide-scale services. The largest and most prominent of these is the Mastodon microblogging platform [Mastodon], which as of this writing has around 10 million registered users, and an active userbase in the millions.

The fact that Mastodon is based on ActivityPub is suggestive of how ActivityPub might be useful for MIMI. On the one hand, while Mastodon is primarily used for distributing public messages, it also allows users to post private messages that are only delivered to specific recipients. On the other hand, Mastodon's focus on wide distribution of public messages suggests that ActivityPub could support messaging among large numbers of recipients. Mastodon also includes some extensions to ActivityPub that could be salient to MIMI, such as the use of acct URIs [RFC7565] to identify users and the use of WebFinger [RFC7033] to resolve these URIs to routable identifiers.

In the remainder of this document, we review the MIMI requirements and the high-level architecture of ActivityPub. We then sketch out approaches based on ActivityPub for realizing the use cases salient to MIMI, highlighting both areas where there is natural overlap between MIMI and ActivityPub and areas where ActivityPub might need modification or extension to support MIMI use cases.

2. Conventions and Definitions

The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in BCP 14 [RFC2119] [RFC8174] when, and only when, they appear in all capitals, as shown here.

3. MIMI Requirements

3.1. Components

An overall solution for interoperability between messaging services naturally breaks down into a few components, as illustrated in Figure 1:

* A *Transport* system that delivers messages between services, including enough information for the services to route the messages to the correct set of end clients.

* An *End-to-End (E2E) Security* layer that protects message contents from inspection or tampering by the services involved in delivering them.

* An *Identity* system that provides:

  - A client addressing scheme by which the servers participating in the transport can identify which clients should receive a message.

  - A credential scheme that is used to authenticate clients to one another in the end-to-end security system.

* Formats for messages carried within end-to-end protection that enable *Messaging* and *Real-Time* applications.

                       +-----------+-----------+
                       | Messaging | Real-Time |
   +----------+        +-----------+-----------+
   | Identity +---+--->+     E2E Security      |
   +----------+   |    +-----------------------+
                  +--->+       Transport       |
                       +-----------------------+

                    Figure 1: Components of MIMI

In other words, the E2E security layer creates a demarcation between things that are visible to servers and things that are not. The transport protocol defines the former, message formats the latter.

3.2. Service-to-Service Interoperability

MIMI is focused on interoperability between messaging *services*. Unlike earlier messaging protocols like XMPP [RFC6120], which cover client-to-server interactions as well as server-to-server interactions, MIMI is focused primarily on the latter.

          Domain A       MIMI Transport       Domain B
              |                 |                 |
       .-----' '-----.   .-----' '-----.   .-----' '-----.
       |             |   |             |   |             |
   Client A <-----> Service A <-----> Service B <-----> Client B

          Figure 2: MIMI delivers messages between services

The MIMI transport system and the routing functions of the identity system operate within the inter-service interaction.
The services are presumed to be able to deliver messages to connected clients based on information provided by the transport system.

As the name implies, the E2E security system must be compatible across the various clients that comprise the endpoints of a messaging interaction. This in turn requires that the authentication aspects of the identity layer and the message formats be understood by clients. Since these components are not accessible to servers (due to E2E protections), they need to be handled locally on clients.

Here again, the E2E security layer creates a demarcation, this time between protocol features that are server-to-server scoped and those that are client-to-client scoped. Note, however, that no part of the protocol covers client-to-server interactions. These are the domain of the individual services.

3.3. Transport Use Cases

The messaging applications among which MIMI is to provide interoperability typically support two types of interaction with complementary properties:

* Group Direct Messages (DMs): The interaction has a static set of participants, and is "singular", in the sense that any direct message to exactly that set of participants is presumed to belong to the interaction.

* Channels: An interaction with a dynamic set of participants. Multiple channels can have the same set of participants, and participants can join and leave the channel.

(These concepts have various names in different messaging systems. The naming here is not intended to indicate alignment with one system over another, but to choose some common terminology with appropriate connotations.)

Many systems also support one-to-one messaging, but this can be considered a special case of group DMs, in the sense that one-to-one conversations are typically singular interactions that have a static participant set. It is also common for an interaction that appears to be 1:1 in a user interface to be realized with group messaging, for example to accommodate users' use of multiple devices.

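
To make the "singular" property of group DMs concrete, the following is a minimal sketch of how a service might map a participant set to a single interaction. It is illustrative only: the function name group_dm_key and the choice of SHA-256 over a sorted list of identifiers are assumptions of this sketch, not something defined by MIMI or ActivityPub.

```python
import hashlib


def group_dm_key(participants):
    """Return a stable lookup key for the group DM among `participants`.

    `participants` is an iterable of user identifiers (for example acct
    URIs). The name and the hashing scheme are hypothetical.
    """
    # A group DM is defined by its participant *set*: drop duplicates and
    # sort so that ordering does not affect the key.
    canonical = sorted(set(participants))
    if len(canonical) < 2:
        raise ValueError("a group DM needs at least two distinct participants")
    # Newlines cannot appear in a URI, so joining on "\n" is unambiguous.
    joined = "\n".join(canonical)
    return hashlib.sha256(joined.encode("utf-8")).hexdigest()


# The same participants, listed in any order, map to the same interaction.
first = group_dm_key(["acct:alice@example.com", "acct:bob@example.net"])
second = group_dm_key(["acct:bob@example.net", "acct:alice@example.com"])
assert first == second
```

A channel, by contrast, would carry its own identifier, since several channels can share one participant set.
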
One way to view the distinction between group DMs and channels is that in a channel-style interaction, the interaction is "reified", in the sense that it is an entity in the protocol that can be the subject of metadata, the object of actions, etc. Group DMs, on the other hand, are defined only by their participant list. Channels are like XMPP MUCs [RFC7702]; group DMs are more like email.

4. ActivityPub

In this section, we provide a brief overview of the ActivityPub protocol. ActivityPub defines both client-to-server and server-to-server protocols. Because the MIMI transport layer only goes between two services, we focus on the server-to-server protocol.

At a very high level, ActivityPub is similar to SMTP with JSON messages and delivery over HTTP. An ActivityPub server forwards messages from local clients to their intended recipients, and receives messages from other servers intended for its local clients. An ActivityPub server also makes available metadata that support the functioning of the protocol.

4.1. Actors and Activities

The main entities in ActivityPub are Actors and Activities. In most cases, an Actor represents a user ("type": "Person"), but Actors can also represent automated services ("type": "Service") or collections of other Actors ("type": "Group"). Each Actor has a unique URI, from which a JSON-LD description of the Actor's attributes can be retrieved. Figure 3 shows a simple Actor description.

Activities represent a variety of actions within the system, including "Create" activities that carry new messages as well as things like "Add" and "Remove" to allow modifications of collections. Figure 4 shows a Create activity that reflects the creation of a new Note object. Activities can also carry metadata such as inReplyTo or tags.

   {
     "@context": ["https://www.w3.org/ns/activitystreams"],
     "type": "Person",
     "id": "https://example.com/users/alice",
     "inbox": "https://example.com/users/alice/inbox",
     "outbox": "https://example.com/users/alice/feed"
   }

                      Figure 3: A sample Actor

   {
     "@context": "https://www.w3.org/ns/activitystreams",
     "type": "Create",
     "id": "https://example.net/~mallory/87374",
     "actor": "https://example.net/~mallory",
     "object": {
       "id": "https://example.com/~mallory/note/72",
       "type": "Note",
       "attributedTo": "https://example.net/~mallory",
       "content": "This is a note",
       "published": "2015-02-10T15:04:55Z",
       "to": ["https://example.org/~john/"],
       "cc": ["https://example.com/~erik/followers"]
     },
     "published": "2015-02-10T15:04:55Z",
     "to": ["https://example.org/~john/"],
     "cc": ["https://example.com/~erik/followers"]
   }

                     Figure 4: A sample Activity

4.2. Activity Delivery

Delivery of activities in ActivityPub follows a push pattern, with the ability to pull messages as a fallback. Each Actor has "inbox" and "outbox" URIs, which allow external parties to deliver Activities to the Actor or read Activities that the Actor has posted, respectively.

To send an Activity to the Actor, a remote server sends an HTTP POST request to the Actor's inbox URI. When a client of an ActivityPub server asks it to distribute an Activity, the server identifies the set of Actors that are the intended recipients of the Activity (e.g., using the to and cc fields visible in Figure 4), and sends POST requests containing the activity to the Actors' inboxes.

Outbox URIs allow a remote server to query the list of Activities that the Actor has posted. To read Activities posted by the Actor, a remote party sends an HTTP GET request to the outbox URI. The outbox includes a paging function to allow traversal of large sets of Activities.

Both inbox and outbox requests are constrained by an authorization model, so that a server can constrain which Actors are allowed to communicate.

4.3. Identity

The native identifiers for ActivityPub are Actor URIs. These are HTTP URIs that both identify end users, services, and groups and allow the metadata for the Actor to be retrieved.

HTTP URIs are of course not a very user-friendly identifier. For this reason, many applications based on ActivityPub use identifiers of the form @username@domain or simply @username when the domain is clear from context. These identifiers represent acct URIs [RFC7565], which, in the words of the RFC, "identify a user's account at a service provider, irrespective of the particular protocols that can be used to interact with the account".

In order to engage in ActivityPub interactions with an Actor given such an identifier, the application resolves the identifier to an Actor URI using WebFinger [RFC7033]. For example, given the URI acct:alice@example.com, the application would send a GET request to https://example.com/.well-known/webfinger?resource=acct:alice@example.com. The response would indicate various contact points associated with that account, as shown in Figure 5. The ActivityPub Actor URI is indicated by the href in the links entry with "type": "application/activity+json".

   {
     "subject": "acct:alice@example.com",
     "links": [
       {
         "rel": "https://webfinger.net/rel/profile-page",
         "type": "text/html",
         "href": "https://example.com/@alice"
       },
       {
         "rel": "self",
         "type": "application/activity+json",
         "href": "https://example.com/users/alice"
       }
     ]
   }

     Figure 5: A WebFinger response for `acct:alice@example.com`

5. Using ActivityPub for MIMI

In this document, we consider the use of ActivityPub and related technologies for the transport and identity systems, and the integration of MLS for the E2E security layer [I-D.ietf-mls-protocol]. Message formats are not handled here.

Points at which ActivityPub would need to be extended are highlighted with *[EXT]*.
These are the areas where the MIMI working group would need to define protocol extensions to build an overall messaging system based on ActivityPub.

5.1. User Identity and Metadata

The primary identifier for a user is an acct URI, which is resolved to an Actor URI using WebFinger as described in Section 4.3.

Aside from UI considerations, this choice of primary identifier is important for authentication at the end-to-end security layer. An acct URI is a scoped identifier, in the sense that the domain is the authoritative source of information about what entity is represented by the user portion of the URI. Indeed, this is the whole premise of using WebFinger for acct URI resolution.

*[EXT]* To leverage this information in an MLS-based end-to-end security layer, all that is needed is a credential issued by the domain that attests that the holder of a given signature key legitimately represents the user portion of the URI, for example an X.509 certificate or Verifiable Credential [RFC5280] [W3C.vc-data-model]. MIMI would need to specify the format for such credentials and how a client receiving one would verify it, but would not need to specify an issuance API. However, given that domains are already assumed to know how to authenticate their users, such an API could be as simple as a single authenticated POST request containing a proof of control of a key pair, whose response would then contain the desired credential.

The ActivityPub Actor object contains optional fields that can provide additional metadata about a user, for example a profile URL or preferred username.

*[EXT]* The Actor object would be a convenient mechanism to distribute the cryptographic material required to initiate end-to-end secure communications with an actor, i.e., KeyPackage objects in the case of MLS. This facility would be slightly more complicated than the static metadata fields currently present.
KeyPackages are intended to be single-use, so the server managing the Actor object would need to selectively provide different KeyPackages in response to different queries. Multi-device scenarios might require multiple KeyPackages to be provided in response to a single query.

5.2. Channels

The channel use case can be implemented by representing the channel as an ActivityPub Actor. Metadata related to the channel can be published and managed as part of the Actor object. In particular, the followers collection for the Actor can be used to track the membership of the channel, so that normal ActivityPub patterns can be followed for message delivery and membership management.

*[EXT]* A channel's Actor also tracks information about the end-to-end security state of the channel. For MLS, this would entail tracking information about an MLS group associated with the channel, most importantly the current epoch and ratchet tree. A channel may also need to store a GroupInfo object for the group, as discussed in Section 5.2.3.

In the context of federated messaging, the question of which server hosts a channel could be contentious. For example, if Alice creates a channel on Service A and invites Bob and Charlie from Services B and C, but then Alice leaves the channel, does Service A continue to host the channel even though none of its users are involved? If this is a problem the working group needs to tackle, it will likely be useful to follow the approaches used in Mastodon for moving or linking accounts, e.g., using a Move activity.

5.2.1. Channel Creation

When a channel is created on a service's server by a user of that server, no MIMI/ActivityPub action is needed. The server hosting the channel can notify the members of the channel that it has been created by sending a Create activity to their inboxes.

   {
     "@context": [
       "https://www.w3.org/ns/activitystreams",
       "urn:ietf:ns:mimi"
     ],
     "summary": "Alice created a channel",
     "type": "Create",
     "id": "https://example.com/activities/1",
     "actor": "https://example.com/user/alice",
     "object": {
       "type": "Service",
       "id": "https://example.com/channels/e4f70622",
       "name": "MIMI discussion group"
     },
     "mimi:welcome": "",
     "to": "https://john.example.org"
   }

        Figure 6: A Create activity announcing a new channel

*[EXT]* To set up the end-to-end security for the channel, the creator of the channel will need to fetch KeyPackages for the other members of the channel. For members using other services, KeyPackages can be fetched via the members' Actor objects, as discussed in Section 5.1. An MLS Welcome message enabling the members to initialize their MLS state is attached to the Create activity.

5.2.2. Message Delivery

Messages are sent within a channel by sending a Create activity to the channel Actor's inbox, addressed to the channel's followers. Following the "Forwarding from inbox" pattern discussed in [W3C.ActivityPub], the server hosting the channel will then forward the activity to the inboxes of the members of the channel. The message content itself is an MLS PrivateMessage encapsulating the actual content to be delivered to the channel.

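
The forwarding step described above can be sketched as follows. This is a rough illustration under stated assumptions, not a definitive implementation: the function names is_addressed_to and forwarding_targets are hypothetical, member Actor objects are assumed to have been fetched already and are modeled as plain dictionaries, and recipients are assumed to be given as a string or a list of strings. The sketch prefers a member's shared inbox (the sharedInbox entry under endpoints, as defined in [W3C.ActivityPub]) when one is advertised.

```python
def is_addressed_to(activity, collection_uri):
    """Return True if `collection_uri` appears in the activity's to or cc."""
    for field in ("to", "cc"):
        value = activity.get(field, [])
        # This sketch handles a single string or a list of strings.
        recipients = [value] if isinstance(value, str) else value
        if collection_uri in recipients:
            return True
    return False


def forwarding_targets(activity, followers_uri, member_actors):
    """Return the distinct inbox URIs the hosting server should POST to.

    `member_actors` is a list of already-fetched Actor objects (dicts).
    """
    # Only forward activities addressed to the channel's followers.
    if not is_addressed_to(activity, followers_uri):
        return []
    targets = []
    for actor in member_actors:
        # Prefer the shared inbox, if the member's Actor advertises one.
        shared = actor.get("endpoints", {}).get("sharedInbox")
        url = shared or actor["inbox"]
        if url not in targets:
            targets.append(url)
    return targets
```

With two members on one service that advertises a shared inbox and a third member elsewhere, forwarding_targets returns two URIs rather than three.
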
   {
     "@context": "https://www.w3.org/ns/activitystreams",
     "type": "Create",
     "id": "https://example.net/~mallory/87374",
     "actor": "https://example.net/~mallory",
     "object": {
       "id": "https://example.com/~mallory/note/72",
       "type": "Note",
       "attributedTo": "https://example.net/~mallory",
       "content": "",
       "published": "2015-02-10T15:04:55Z",
       "to": ["https://example.com/channels/e4f70622/followers"]
     },
     "published": "2015-02-10T15:04:55Z",
     "to": ["https://example.com/channels/e4f70622/followers"]
   }

     Figure 7: A Create activity sending a message to a channel

If the members have a sharedInbox field in their Actor objects, this delivery can be quite efficient at the inter-service level: only one copy of the activity will be sent to each shared inbox, effectively once per service involved in the channel.

5.2.3. Membership and Metadata Management

Members of the channel add and remove other members by using Add and Remove activities to propose modifications to the followers collection associated with the channel's Actor. Add activities should be forwarded to the new member to make them aware of their membership in the channel.

*[EXT]* An Add or Remove activity must include an MLS Commit that implements the corresponding action on the MLS group. The Commit message must be sent as a PublicMessage so that the server can update its representation of the group's ratchet tree based on the content of the Commit. An Add activity must also include an MLS Welcome message allowing the new member to initialize their MLS state.

Before accepting an Add or Remove activity for a channel, the server must verify that the attached Commit corresponds to the current MLS epoch for the channel, and reject the activity if this is not the case.

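
The epoch check described above might look roughly like the following sketch. The ChannelState class, the StaleCommitError exception, and the accept_membership_change function are hypothetical names introduced for illustration. The sketch also assumes that the server has already parsed the epoch out of the PublicMessage carrying the Commit; no MLS parsing or signature verification is shown.

```python
from dataclasses import dataclass


@dataclass
class ChannelState:
    """Server-side view of the MLS group attached to a channel (hypothetical)."""
    epoch: int


class StaleCommitError(Exception):
    """Raised when a Commit was not built on the channel's current epoch."""


def accept_membership_change(state, commit_epoch):
    """Accept a Commit only if it targets the channel's current epoch.

    `commit_epoch` is the epoch carried in the Commit's PublicMessage,
    assumed to have been extracted by the caller.
    """
    if commit_epoch != state.epoch:
        raise StaleCommitError(
            f"commit is for epoch {commit_epoch}, channel is at {state.epoch}"
        )
    # Applying a Commit moves the MLS group to the next epoch.
    state.epoch += 1
    return state.epoch
```

If two members submit Commits built on the same epoch, the first one accepted advances the epoch, and the second is rejected and has to be regenerated against the new state. This is the sequencing role that the server hosting the channel plays for the MLS group.
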
   {
     "@context": "https://www.w3.org/ns/activitystreams",
     "type": "Add",
     "id": "https://example.com/user/alice/98485",
     "actor": "https://example.com/user/alice",
     "object": "https://example.net/user/bob",
     "target": "https://example.com/channels/e4f70622/followers",
     "commit": "",
     "welcome": "",
     "to": ["https://example.com/channels/e4f70622/followers"]
   }

     Figure 8: An Add activity adding a new member to a channel

Other channel metadata (e.g., the name of the channel) can be updated by sending an Update activity to the channel Actor's inbox.

   {
     "@context": "https://www.w3.org/ns/activitystreams",
     "type": "Update",
     "actor": "https://example.com/user/alice",
     "object": {
       "type": "Service",
       "id": "https://example.com/channels/e4f70622",
       "name": "MIMI discussion group (now with more ActivityPub!)"
     },
     "to": "https://example.com/channels/e4f70622/inbox"
   }

                              Figure 9

5.3. Group DMs

In principle, since group DMs don't have any independent state aside from the recipient list, group DMs could be implemented directly using ActivityPub's addressing model. Activities could be directly addressed to other actors using the to field, and a service receiving an activity could associate it with a group DM based on the recipient list in the to field.

This approach would simplify certain things. For example, if a group DM used an Actor for distribution as with a channel, it would be necessary to explicitly enforce that there was only one such Actor per group DM; with direct addressing, no such common resources are created, so there is no need to ensure their uniqueness.

Such a decentralized approach, however, does not work well with MLS, which works best with a central coordination point to manage the sequencing of changes to MLS state.

There are a couple of compromise options available here. It might be feasible to have the MLS groups attached to group DMs be immutable. The first person to send a message in a group DM would include a Welcome addressed to KeyPackages for all the other recipients.
That message would initialize an MLS group including all the other recipients, which would be used to protect further messages.

While the immutability approach is appealing in its simplicity, it might not be workable. Participants in the group DM might want to update their keys for post-compromise security, or they might want to add a new device that they start using after the group DM starts. Both of these operations require changes to the MLS group.

To allow mutable MLS groups, group DMs could use direct addressing for message delivery, but link to an MLS group managed more like the MLS group attached to a channel.

6. Security Considerations

ActivityPub uses HTTPS for transport security on server-to-server interactions.

6.1. End-to-End Security

Section 5 includes provisions for implementing an end-to-end security layer based on MLS. As described in [I-D.ietf-mls-protocol], MLS requires a Delivery Service (DS) and an Authentication Service (AS) in order to be integrated into an application.

Here, the DS functions are provided in a decentralized fashion by the ActivityPub servers representing the interoperating services. KeyPackages are distributed via users' Actor objects (see Section 5.1). Other MLS messages are distributed as part of membership management activities (see Section 5.2.3).

The AS function is provided by the service-issued credentials discussed in Section 5.1.

6.2. Forward Secrecy

By default, MLS provides forward secrecy and post-compromise security for messages sent within a group. In the most straightforward application of MLS to messaging, this means that a new member of a channel will not be able to decrypt messages from before they joined the group. If providing access to historical messages is a desired feature, then a further mechanism will be required to give new members access to historical keys.

6.3. Authentication and Authorization

There are some open questions here related to authentication and authorization, for example:

* How should servers authenticate each other?

* How does a receiving server know that an Activity authentically comes from the Actor who is supposed to have sent it?

* What access control policies can a server enforce on inbound messages?

The ActivityPub specification is very light on details on these topics. However, applications such as Mastodon have likely developed solutions that could be used as starting points.

7. IANA Considerations

This document has no IANA actions.

8. References

8.1. Normative References

[I-D.ietf-mls-protocol]
   Barnes, R., Beurdouche, B., Robert, R., Millican, J., Omara, E., and K. Cohn-Gordon, "The Messaging Layer Security (MLS) Protocol", Work in Progress, Internet-Draft, draft-ietf-mls-protocol-18, 13 March 2023.

[RFC2119]
   Bradner, S., "Key words for use in RFCs to Indicate Requirement Levels", BCP 14, RFC 2119, DOI 10.17487/RFC2119, March 1997.

[RFC7033]
   Jones, P., Salgueiro, G., Jones, M., and J. Smarr, "WebFinger", RFC 7033, DOI 10.17487/RFC7033, September 2013.

[RFC7565]
   Saint-Andre, P., "The 'acct' URI Scheme", RFC 7565, DOI 10.17487/RFC7565, May 2015.

[RFC8174]
   Leiba, B., "Ambiguity of Uppercase vs Lowercase in RFC 2119 Key Words", BCP 14, RFC 8174, DOI 10.17487/RFC8174, May 2017.

[W3C.ActivityPub]
   "ActivityPub", W3C REC activitypub, W3C activitypub.

8.2. Informative References

[Mastodon]
   "Mastodon", n.d.

[RFC5280]
   Cooper, D., Santesson, S., Farrell, S., Boeyen, S., Housley, R., and W. Polk, "Internet X.509 Public Key Infrastructure Certificate and Certificate Revocation List (CRL) Profile", RFC 5280, DOI 10.17487/RFC5280, May 2008.

[RFC561]
   Bhushan, A., Pogran, K., Tomlinson, R., and J. White, "Standardizing Network Mail Headers", RFC 561, DOI 10.17487/RFC0561, September 1973.

[RFC6120]
   Saint-Andre, P., "Extensible Messaging and Presence Protocol (XMPP): Core", RFC 6120, DOI 10.17487/RFC6120, March 2011.

[RFC7702]
   Saint-Andre, P., Ibarra, S., and S. Loreto, "Interworking between the Session Initiation Protocol (SIP) and the Extensible Messaging and Presence Protocol (XMPP): Groupchat", RFC 7702, DOI 10.17487/RFC7702, December 2015.

[W3C.vc-data-model]
   "Verifiable Credentials Data Model v1.1", W3C REC vc-data-model, W3C vc-data-model.

Acknowledgments

This investigation was inspired by a Mastodon post by Darius Kazemi (https://friend.camp/@darius/109996157569528129).

Author's Address

Richard L. Barnes
Cisco
Email: rlb@ipv.sx