Internet-Draft MIMI ActivityPub March 2023
Barnes Expires 21 September 2023
Workgroup:
More Instant Messaging Interoperability
Internet-Draft:
draft-barnes-mimi-aim-latest
Published:
Intended Status:
Informational
Expires:
Author:
R. L. Barnes
Cisco

ActivityPub for Interoperable Messaging

Abstract

The MIMI working group is chartered to define tools that messaging providers can use to interoperate with one another. The W3C ActivityPub protocol is already widely used for several use cases that resemble the MIMI use case. This document examines whether ActivityPub might be a good baseline for providing the sort of interoperability that MIMI intends to achieve.

About This Document

This note is to be removed before publishing as an RFC.

The latest revision of this draft can be found at https://bifurcation.github.io/mimi-aim/draft-barnes-mimi-aim.html. Status information for this document may be found at https://datatracker.ietf.org/doc/draft-barnes-mimi-aim/.

Discussion of this document takes place on the More Instant Messaging Interoperability Working Group mailing list (mailto:mimi@ietf.org), which is archived at https://mailarchive.ietf.org/arch/browse/mimi/. Subscribe at https://www.ietf.org/mailman/listinfo/mimi/.

Source for this draft and an issue tracker can be found at https://github.com/bifurcation/mimi-aim.

Status of This Memo

This Internet-Draft is submitted in full conformance with the provisions of BCP 78 and BCP 79.

Internet-Drafts are working documents of the Internet Engineering Task Force (IETF). Note that other groups may also distribute working documents as Internet-Drafts. The list of current Internet-Drafts is at https://datatracker.ietf.org/drafts/current/.

Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress."

This Internet-Draft will expire on 21 September 2023.

1. Introduction

The MIMI working group is chartered to define tools that messaging providers can use to interoperate with one another. Messaging is obviously not a new application; readers of "Message Transmission Protocol" [RFC561] from 1975 will find familiar concepts such as "TO", "CC", and "BCC" fields. It thus seems likely that some existing protocol will satisfy many of MIMI's requirements. Basing MIMI on an existing widely deployed protocol can also facilitate deployment of the MIMI protocol, since the lessons from deployment of the predecessor protocol should mostly carry forward.

This document considers the W3C ActivityPub protocol [W3C.ActivityPub] as a candidate for such a baseline for MIMI. At a high level, ActivityPub is a protocol for sharing "Activities" with various semantics among users homed to loosely-coupled servers. ActivityPub was published as a W3C Recommendation in 2018, and today supports several wide-scale services. The largest and most prominent of these is the Mastodon microblogging platform [Mastodon], which as of this writing has around 10 million registered users, and an active userbase in the millions.

The fact that Mastodon is based on ActivityPub is suggestive of how ActivityPub might be useful for MIMI. On the one hand, while Mastodon is primarily used for distributing public messages, it also allows users to post private messages that are only delivered to specific recipients. On the other hand, Mastodon's focus on wide distribution of public messages suggests that ActivityPub could support messaging among large numbers of recipients. Mastodon also includes some extensions to ActivityPub that could be salient to MIMI, such as the use of acct URIs [RFC7565] to identify users and the use of WebFinger [RFC7033] to resolve these URIs to routable identifiers.

In the remainder of this document, we review the MIMI requirements and the high-level architecture of ActivityPub. We then sketch out approaches based on ActivityPub for realizing the use cases salient to MIMI, highlighting both areas where there is natural overlap between MIMI and ActivityPub and areas where ActivityPub might need modification or extension to support MIMI use cases.

2. Conventions and Definitions

The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in BCP 14 [RFC2119] [RFC8174] when, and only when, they appear in all capitals, as shown here.

3. MIMI Requirements

3.1. Components

An overall solution for interoperability between messaging services naturally breaks down into a few components, as illustrated in Figure 1:

  • A Transport system that delivers messages between services, including enough information for the services to route the messages to the correct set of end clients.
  • An End-to-End (E2E) Security layer that protects message contents from inspection or tampering by the services involved in delivering them.
  • An Identity system that provides:

    • A client addressing scheme by which the servers participating in the transport can identify which clients should receive a message.
    • A credential scheme that is used to authenticate clients to one another in the end-to-end security system.
  • Formats for messages carried within end-to-end protection that enable Messaging and Real-Time applications.
[Figure: diagram with boxes labeled Messaging, Real-Time, Identity, E2E Security, and Transport]
Figure 1: Components of MIMI

In other words, the E2E security layer creates a demarcation between things that are visible to servers and things that are not. The transport protocol defines the former, message formats the latter.

3.2. Service-to-Service Interoperability

MIMI is focused on interoperability between messaging services. Unlike earlier messaging protocols like XMPP [RFC6120], which cover client-to-server interactions as well as server-to-server interactions, MIMI is focused primarily on the latter.

[Figure: Domain A (Client A, Service A) connected via MIMI Transport to Domain B (Service B, Client B)]
Figure 2: MIMI delivers messages between services

The MIMI transport system and the routing functions of the identity system operate within the inter-service interaction. The services are presumed to be able to deliver messages to connected clients based on information provided by the transport system.

As the name implies, the E2E security system must be compatible across the various clients that comprise the endpoints of a messaging interaction. This in turn requires that the authentication aspects of the identity layer and the message formats be understood by clients. Since these components are not accessible to servers (due to E2E protections), they need to be handled locally on clients.

Here again, the E2E security layer creates a demarcation, this time between protocol features that are server-to-server scoped and those that are client-to-client scoped. Note, however, that no part of the protocol covers client-to-server interactions. These are the domain of the individual services.

3.3. Transport Use Cases

The messaging applications among which MIMI is to provide interoperability typically support two types of interaction with complementary properties:

  • Group Direct Messages (DMs): The interaction has a static set of participants, and is "singular", in the sense that any direct message to exactly that set of participants is presumed to belong to the interaction.
  • Channels: An interaction with a dynamic set of participants. Multiple channels can have the same set of participants, and participants can join and leave the channel.

(These concepts have various names in different messaging systems. The naming here is not intended to indicate alignment with one system over another, but to choose some common terminology with appropriate connotations.)

Many systems also support one-to-one messaging, but this can be considered a special case of Group DMs, in the sense that one-to-one conversations are typically singular interactions that have a static participant set. It is also common for an interaction that appears to be 1:1 in a user interface to be realized with group messaging, for example to accommodate users' use of multiple devices.

One way to view the distinction between group DMs and channels is that in a channel-style interaction, the interaction is "reified", in the sense that it is an entity in the protocol that can be the subject of metadata, the object of actions, etc. Group DMs, on the other hand, are defined only by their participant list. Channels are like XMPP MUCs [RFC7702]; group DMs are more like email.

4. ActivityPub

In this section, we provide a brief overview of the ActivityPub protocol. ActivityPub defines both client-to-server and server-to-server protocols. Because the MIMI transport layer only goes between two services, we focus on the server-to-server protocol.

At a very high level, ActivityPub is similar to SMTP with JSON messages and delivery over HTTP. An ActivityPub server forwards messages from local clients to their intended recipients, and receives messages from other servers intended for its local clients. An ActivityPub server also makes available metadata that support the functioning of the protocol.

4.1. Actors and Activities

The main entities in ActivityPub are Actors and Activities. In most cases, an Actor represents a user ("type": "Person"), but Actors can also represent automated services ("type": "Service") or collections of other Actors ("type": "Group"). Each Actor has a unique URI, from which a JSON-LD description of the Actor's attributes can be retrieved. Figure 3 shows a simple Actor description.

Activities represent a variety of actions within the system, including "Create" activities that carry new messages as well as things like "Add" and "Remove" to allow modifications of collections. Figure 4 shows a Create activity that reflects the creation of a new Note object. Activities can also carry metadata such as inReplyTo or tags.

{
  "@context": ["https://www.w3.org/ns/activitystreams"],
  "type": "Person",
  "id": "https://example.com/users/alice",
  "inbox": "https://example.com/users/alice/inbox",
  "outbox": "https://example.com/users/alice/feed"
}
Figure 3: A sample Actor
{
  "@context": "https://www.w3.org/ns/activitystreams",
  "type": "Create",
  "id": "https://example.net/~mallory/87374",
  "actor": "https://example.net/~mallory",
  "object": {
    "id": "https://example.com/~mallory/note/72",
    "type": "Note",
    "attributedTo": "https://example.net/~mallory",
    "content": "This is a note",
    "published": "2015-02-10T15:04:55Z",
    "to": ["https://example.org/~john/"],
    "cc": ["https://example.com/~erik/followers"]
  },
  "published": "2015-02-10T15:04:55Z",
  "to": ["https://example.org/~john/"],
  "cc": ["https://example.com/~erik/followers"]
}
Figure 4: A sample Activity

4.2. Activity Delivery

Delivery of activities in ActivityPub follows a push pattern, with the ability to pull messages as a fallback.

Each Actor has "inbox" and "outbox" URIs, which allow external parties to deliver Activities to the Actor or read Activities that the Actor has posted, respectively.

To send an Activity to the Actor, a remote server sends an HTTP POST request to the Actor's inbox URI. When a client of an ActivityPub server asks it to distribute an Activity, the server identifies the set of Actors that are the intended recipients of the Activity (e.g., using the to and cc fields visible in Figure 4), and sends POST requests containing the activity to the Actors' inboxes.

Outbox URIs allow a remote server to query the list of Activities that the Actor has posted. To read Activities posted by the Actor, a remote party sends an HTTP GET request to the outbox URI. The outbox includes a paging function to allow traversal of large sets of Activities.

Both inbox and outbox requests are constrained by an authorization model, so that a server can constrain which Actors are allowed to communicate.
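
The recipient selection step described above can be sketched as follows. This is a minimal illustration, not part of ActivityPub itself: the function names and the inboxes lookup table (Actor URI to inbox URI) are assumptions made for the example, only the to and cc fields are considered, and no HTTP requests are actually sent.

```python
def as_list(value) -> list:
    """Normalize a field that may be absent, a single value, or a list."""
    if value is None:
        return []
    if isinstance(value, list):
        return value
    return [value]


def recipient_uris(activity: dict) -> list:
    """Collect the distinct addressees of an Activity, preserving order."""
    seen = []
    for field in ("to", "cc"):
        for uri in as_list(activity.get(field)):
            if uri not in seen:
                seen.append(uri)
    return seen


def plan_deliveries(activity: dict, inboxes: dict) -> list:
    """Return the inbox URIs that would receive a POST of this Activity.

    `inboxes` is a hypothetical lookup table mapping Actor URI -> inbox URI.
    Recipients missing from the table are skipped in this sketch; a real
    server would fetch the Actor object or expand a collection instead.
    """
    return [inboxes[uri] for uri in recipient_uris(activity) if uri in inboxes]
```

Applied to the Activity in Figure 4, plan_deliveries would return the inbox of https://example.org/~john/ if that Actor is present in the table; the followers collection in cc would need to be expanded into individual Actors before it could be delivered to.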

4.3. Identity

The native identifiers for ActivityPub are Actor URIs. These are HTTP URIs that identify end users, services, and groups, and that also allow the metadata for the Actor to be retrieved.

HTTP URIs are of course not very user-friendly identifiers, so many applications based on ActivityPub use identifiers of the form @username@domain, or simply @username when the domain is clear from context. These identifiers represent acct URIs [RFC7565], which, in the words of the RFC, "identify a user's account at a service provider, irrespective of the particular protocols that can be used to interact with the account".

In order to engage in ActivityPub interactions with an Actor given such an identifier, the application resolves the identifier to an Actor URI using WebFinger [RFC7033]. For example, given the URI acct:alice@example.com, the application would send a GET request to https://example.com/.well-known/webfinger?resource=acct:alice@example.com. The response would indicate various contact points associated with that account, as shown in Figure 5. The ActivityPub Actor URI is indicated by the href in the links entry with "type": "application/activity+json".

{
  "subject": "acct:alice@example.com",
  "links": [
    {
      "rel": "https://webfinger.net/rel/profile-page",
      "type": "text/html",
      "href": "https://example.com/@alice"
    },
    {
      "rel": "self",
      "type": "application/activity+json",
      "href": "https://example.com/users/alice"
    }
  ]
}
Figure 5: A WebFinger response for `acct:alice@example.com`
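
The resolution step can be sketched as two small functions: one that builds the WebFinger query URL from an acct URI, and one that picks the Actor URI out of a response like the one in Figure 5. The function names are hypothetical, and no network request is made. Note that this sketch percent-encodes the resource parameter, as RFC 7033 requires; the result decodes to the same value as the unencoded form shown in the text.

```python
from typing import Optional
from urllib.parse import quote


def webfinger_url(acct_uri: str) -> str:
    """Build the WebFinger query URL for an acct URI (acct:user@domain)."""
    if not acct_uri.startswith("acct:"):
        raise ValueError("expected an acct URI")
    user, sep, domain = acct_uri[len("acct:"):].rpartition("@")
    if not sep or not user or not domain:
        raise ValueError("acct URI must have the form acct:user@domain")
    # Percent-encode the whole acct URI for use as a query parameter value.
    resource = quote(acct_uri, safe="")
    return f"https://{domain}/.well-known/webfinger?resource={resource}"


def actor_uri_from_jrd(jrd: dict) -> Optional[str]:
    """Return the href of the link that points at the ActivityPub Actor."""
    for link in jrd.get("links", []):
        if (link.get("rel") == "self"
                and link.get("type") == "application/activity+json"):
            return link.get("href")
    return None  # The account has no ActivityPub Actor advertised.
```

Given the response in Figure 5, actor_uri_from_jrd returns https://example.com/users/alice.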

5. Using ActivityPub for MIMI

In this document, we consider the use of ActivityPub and related technologies for the transport and identity systems, and the integration of MLS for the E2E security layer [I-D.ietf-mls-protocol]. Message formats are not handled here.

Points at which ActivityPub would need to be extended are highlighted with [EXT]. These are the domains where the MIMI working group would need to define protocol extensions to build an overall messaging system based on ActivityPub.

5.1. User Identity and Metadata

The primary identifier for a user is an acct URI, which is resolved to an Actor URI using WebFinger as described in Section 4.3.

Aside from UI considerations, this choice of primary identifier is important for authentication at the end-to-end security layer. An acct URI is a scoped identifier, in the sense that the domain is the authoritative source of information about what entity is represented by the user portion of the URI. Indeed, this is the whole premise of using WebFinger for acct URI resolution.

[EXT] To leverage this information in an MLS-based end-to-end security layer, all that is needed is a credential issued by the domain that attests that the holder of a given signature key legitimately represents the user portion of the URI, for example an X.509 certificate or Verifiable Credential [RFC5280] [W3C.vc-data-model]. MIMI would need to specify the format for such credentials and how a client receiving one would verify it, but would not need to specify an issuance API. However, given that domains are already assumed to know how to authenticate their users, such an API could be as simple as a single authenticated POST request containing a proof of control of a key pair, whose response would then contain the desired credential.

The ActivityPub Actor object contains optional fields that can provide additional metadata about a user, for example a profile URL or preferred username.

[EXT] The Actor object would be a convenient mechanism to distribute the cryptographic material required to initiate end-to-end secure communications with an actor, i.e., KeyPackage objects in the case of MLS. This facility would be slightly more complicated than the static metadata fields currently present. KeyPackages are intended to be single-use, so the server managing the Actor object would need to selectively provide different KeyPackages in response to different queries. Multi-device scenarios might require multiple KeyPackages to be provided in response to a single query.
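
One way a server might manage single-use KeyPackages across multiple devices is sketched below. The KeyPackagePool class, its method names, and the notion of a device identifier are all assumptions made for illustration; KeyPackages are treated as opaque strings, and nothing here is defined by ActivityPub or MLS.

```python
from collections import defaultdict, deque


class KeyPackagePool:
    """Hypothetical per-Actor store of single-use KeyPackages, by device."""

    def __init__(self):
        # device id -> queue of unused, opaque (e.g. base64) KeyPackages
        self._by_device = defaultdict(deque)

    def upload(self, device_id: str, key_package: str) -> None:
        """Called when one of the Actor's devices publishes a KeyPackage."""
        self._by_device[device_id].append(key_package)

    def claim(self) -> dict:
        """Answer one query: hand out one unused KeyPackage per device.

        Each returned KeyPackage is removed from the pool, so two queries
        never receive the same one.
        """
        claimed = {}
        for device_id, queue in self._by_device.items():
            if queue:
                claimed[device_id] = queue.popleft()
        return claimed
```

A device whose queue is empty is simply omitted from the response in this sketch; how a deployment should behave when a device runs out of KeyPackages is left open.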

5.2. Channels

The channel use case can be implemented by representing the channel as an ActivityPub Actor. Metadata related to the channel can be published and managed as part of the Actor object. In particular, the followers collection for the Actor can be used to track the membership of the channel, so that normal ActivityPub patterns can be followed for message delivery and membership management.

[EXT] A channel's Actor also tracks information about the end-to-end security state of the channel. For MLS, this would entail tracking information about an MLS group associated to the channel, most importantly the current epoch and ratchet tree. A channel may also need to store a GroupInfo object for the group, as discussed in Section 5.2.3.

In the context of federated messaging, the question of which server hosts a channel could be contentious. For example, if Alice creates a channel on Service A and invites Bob and Charlie from Services B and C, but then Alice leaves the channel, does Service A continue to host the channel even though none of its users are involved? If this is a problem the working group needs to tackle, it will likely be useful to follow the approaches used in Mastodon for moving or linking accounts, e.g., using a Move activity.

5.2.1. Channel Creation

When a channel is created on a service's server by a user of that server, no MIMI/ActivityPub action is needed. The server hosting the channel can notify the members of the channel that it has been created by sending a Create activity to their inboxes.

{
  "@context": [
    "https://www.w3.org/ns/activitystreams",
    "urn:ietf:ns:mimi"
  ],
  "summary": "Alice created a channel",
  "type": "Create",
  "id": "https://example.com/activities/1",
  "actor": "https://example.com/user/alice",
  "object": {
    "type": "Service",
    "id": "https://example.com/channels/e4f70622",
    "name": "MIMI discussion group"
  },
  "mimi:welcome": "<base64-encoded Welcome>",
  "to": "https://john.example.org"
}
Figure 6: A Create activity announcing a new channel

[EXT] To set up the end-to-end security for the channel, the creator of the channel will need to fetch KeyPackages for the other members of the channel. For members using other services, KeyPackages can be fetched via the members' Actor objects, as discussed in Section 5.1. An MLS Welcome message enabling the members to initialize their MLS state is attached to the Create activity.

5.2.2. Message Delivery

Messages are sent within a channel by sending a Create activity to the channel Actor's inbox, addressed to the channel's followers. Following the "Forwarding from inbox" pattern discussed in [W3C.ActivityPub], the server hosting the channel will then forward the activity to inboxes of the members of the channel. The message content itself is an MLS PrivateMessage encapsulating the actual content to be delivered to the channel.

{
  "@context": "https://www.w3.org/ns/activitystreams",
  "type": "Create",
  "id": "https://example.net/~mallory/87374",
  "actor": "https://example.net/~mallory",
  "object": {
    "id": "https://example.com/~mallory/note/72",
    "type": "Note",
    "attributedTo": "https://example.net/~mallory",
    "content": "<base64 encoded MLS PrivateMessage>",
    "published": "2015-02-10T15:04:55Z",
    "to": ["https://example.com/channels/e4f70622/followers"]
  },
  "published": "2015-02-10T15:04:55Z",
  "to": ["https://example.com/channels/e4f70622/followers"]
}
Figure 7: A Create activity sending a message to a channel

If the members have a sharedInbox field in their Actor objects, this delivery can be quite efficient at the inter-service level: Only one copy of the activity will be sent to each shared inbox, effectively once per service involved in the channel.
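
The effect of shared inboxes on fan-out can be sketched as follows. The function name is hypothetical, and the sketch assumes that a shared inbox is advertised under the endpoints property of the Actor object, as in the ActivityPub specification.

```python
def delivery_targets(members: list) -> list:
    """Return the distinct inbox URIs to POST to for a list of Actor objects.

    Members that advertise a shared inbox are collapsed to one delivery per
    shared inbox; the others get a delivery to their individual inbox.
    """
    targets = []
    for actor in members:
        shared = actor.get("endpoints", {}).get("sharedInbox")
        target = shared or actor["inbox"]
        if target not in targets:
            targets.append(target)
    return targets
```

For a channel with many members on the same service, this reduces the number of POST requests to that service from one per member to one in total.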

5.2.3. Membership and Metadata Management

Members of the channel add and remove other members by using Add and Remove activities to propose modifications to the followers collection associated to the channel's Actor. Add activities should be forwarded to the new member to make them aware of their membership in the channel.

[EXT] An Add or Remove activity must include an MLS Commit that implements the corresponding action on the MLS group. The Commit message must be sent as a PublicMessage so that the server can update its representation of the group's ratchet tree based on the content of the Commit. An Add activity must also include an MLS Welcome message allowing the new member to initialize their MLS state. Before accepting an Add or Remove activity for a channel, the server must verify that the attached Commit corresponds to the current MLS epoch for the channel, and reject the activity if this is not the case.

{
  "@context": "https://www.w3.org/ns/activitystreams",
  "type": "Add",
  "id": "https://example.com/user/alice/98485",
  "actor": "https://example.com/user/alice",
  "object": "https://example.net/user/bob",
  "target": "https://example.com/channels/e4f70622/followers",
  "commit": "<base64-encoded MLS Commit>",
  "welcome": "<base64-encoded MLS Welcome>",
  "to": ["https://example.com/channels/e4f70622/followers"]
}
Figure 8: An Add activity adding a new member to a channel
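
The epoch check that the server performs before accepting an Add or Remove activity can be sketched as follows. The ChannelState class and its fields are hypothetical, and the sketch assumes that the epoch of the attached Commit has already been extracted from the PublicMessage by an MLS library; parsing and signature verification are out of scope here.

```python
class StaleCommitError(Exception):
    """Raised when a Commit does not apply to the channel's current epoch."""


class ChannelState:
    """Hypothetical server-side view of a channel and its MLS group."""

    def __init__(self, epoch: int = 0):
        self.epoch = epoch
        self.members = set()

    def apply_membership_change(self, kind: str, member: str,
                                commit_epoch: int) -> None:
        # commit_epoch is assumed to come from the parsed MLS Commit.
        if commit_epoch != self.epoch:
            raise StaleCommitError(
                f"commit is for epoch {commit_epoch}, "
                f"channel is at epoch {self.epoch}")
        if kind == "Add":
            self.members.add(member)
        elif kind == "Remove":
            self.members.discard(member)
        else:
            raise ValueError(f"unsupported activity type: {kind}")
        # An accepted Commit moves the group to the next epoch.
        self.epoch += 1
```

Rejecting stale Commits in this way means that when two members submit changes for the same epoch, only the first to reach the server is accepted; the other member has to rebase its change onto the new epoch and try again.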

Other channel metadata (e.g., the name of the channel) can be updated by sending an Update activity to the channel Actor's inbox.

{
  "@context": "https://www.w3.org/ns/activitystreams",
  "type": "Update",
  "actor": "https://example.com/user/alice",
  "object": {
    "type": "Service",
    "id": "https://example.com/channels/e4f70622",
    "name": "MIMI discussion group (now with more ActivityPub!)"
  },
  "to": "https://example.com/channels/e4f70622/inbox"
}
Figure 9: An Update activity renaming a channel

5.3. Group DMs

In principle, since group DMs don't have any independent state aside from the recipient list, group DMs could be implemented directly using ActivityPub's addressing model. Activities could be directly addressed to other actors using the to field, and a service receiving an activity could associate it to a group DM based on the recipient list in the to field.

This approach would simplify certain things. For example, if a group DM used an Actor for distribution as with a channel, it would be necessary to explicitly enforce that there was only one such Actor per group DM; with direct addressing, no such common resources are created, so there is no need to ensure their uniqueness.
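
Associating an activity with a group DM based on its recipient list could be done by deriving a key from the participant set, as in the sketch below. The function name and the use of a SHA-256 digest are illustrative choices, and the sketch assumes that the sender counts as a participant alongside the Actors in the to field.

```python
import hashlib


def group_dm_key(activity: dict) -> str:
    """Derive a stable identifier for a group DM from its participant set."""
    to = activity.get("to", [])
    if isinstance(to, str):
        to = [to]
    # Assumption: the sender is a participant too. Using a sorted set makes
    # the key independent of ordering, duplicates, and who is sending.
    participants = sorted(set(to) | {activity["actor"]})
    joined = "\n".join(participants)
    return hashlib.sha256(joined.encode("utf-8")).hexdigest()
```

Because the key depends only on the set of participants, a message from Alice to Bob and Carol and a reply from Bob to Alice and Carol map to the same group DM, which matches the "singular" property described in Section 3.3.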

Such a decentralized approach, however, does not work well with MLS, which works best with a central coordination point to manage the sequencing of changes to MLS state. There are a couple of compromise options available here.

It might be feasible to have the MLS groups attached to group DMs be immutable. The first person to send a message in a group DM would include a Welcome addressed to KeyPackages for all the other recipients. That message would initialize an MLS group including all the other recipients, which would be used to protect further messages.

While the immutability approach is appealing in its simplicity, it might not be workable. Participants in the group DM might want to update their keys for post-compromise security, or they might want to add a new device that they start using after the group DM starts. Both of these operations require changes to the MLS group.

To allow mutable MLS groups, group DMs could use direct addressing for message delivery, but link to an MLS group managed more like the MLS group attached to a channel.

6. Security Considerations

ActivityPub uses HTTPS for transport security on server-to-server interactions.

6.1. End-to-End Security

Section 5 includes provisions for implementing an end-to-end security layer based on MLS. As described in [I-D.ietf-mls-protocol], MLS requires a Delivery Service (DS) and an Authentication Service (AS) in order to be integrated into an application.

Here, the DS functions are provided in a decentralized fashion by the ActivityPub servers representing the interoperating services. KeyPackages are distributed via users' Actor objects (see Section 5.1). Other MLS messages are distributed as part of membership management activities (see Section 5.2.3).

The AS function is provided by the service-issued credentials discussed in Section 5.1.

6.2. Forward Secrecy

By default, MLS provides forward secrecy and post-compromise security for messages sent within a group. In the most straightforward application of MLS to messaging, this means that a new member of a channel will not be able to decrypt messages from before they joined the group. If providing access to historical messages is a desired feature, then further mechanisms will be required to provide new members access to historical keys.

6.3. Authentication and Authorization

There are some open questions here related to authentication and authorization, for example:

  • How should servers authenticate each other?
  • How does a receiving server know that an Activity authentically comes from the Actor who is supposed to have sent it?
  • What access control policies can a server enforce on inbound messages?

The ActivityPub specification is very light on details on these topics. However, applications such as Mastodon have likely developed solutions that could be used as starting points.

7. IANA Considerations

This document has no IANA actions.

8. References

8.1. Normative References

[I-D.ietf-mls-protocol]
Barnes, R., Beurdouche, B., Robert, R., Millican, J., Omara, E., and K. Cohn-Gordon, "The Messaging Layer Security (MLS) Protocol", Work in Progress, Internet-Draft, draft-ietf-mls-protocol-18, <https://datatracker.ietf.org/doc/html/draft-ietf-mls-protocol-18>.
[RFC2119]
Bradner, S., "Key words for use in RFCs to Indicate Requirement Levels", BCP 14, RFC 2119, DOI 10.17487/RFC2119, <https://www.rfc-editor.org/rfc/rfc2119>.
[RFC7033]
Jones, P., Salgueiro, G., Jones, M., and J. Smarr, "WebFinger", RFC 7033, DOI 10.17487/RFC7033, <https://www.rfc-editor.org/rfc/rfc7033>.
[RFC7565]
Saint-Andre, P., "The 'acct' URI Scheme", RFC 7565, DOI 10.17487/RFC7565, <https://www.rfc-editor.org/rfc/rfc7565>.
[RFC8174]
Leiba, B., "Ambiguity of Uppercase vs Lowercase in RFC 2119 Key Words", BCP 14, RFC 8174, DOI 10.17487/RFC8174, <https://www.rfc-editor.org/rfc/rfc8174>.
[W3C.ActivityPub]
"ActivityPub", W3C REC activitypub, W3C activitypub, <https://www.w3.org/TR/activitypub/>.

8.2. Informative References

[Mastodon]
"Mastodon", n.d., <https://docs.joinmastodon.org/spec/activitypub/>.
[RFC5280]
Cooper, D., Santesson, S., Farrell, S., Boeyen, S., Housley, R., and W. Polk, "Internet X.509 Public Key Infrastructure Certificate and Certificate Revocation List (CRL) Profile", RFC 5280, DOI 10.17487/RFC5280, <https://www.rfc-editor.org/rfc/rfc5280>.
[RFC561]
Bhushan, A., Pogran, K., Tomlinson, R., and J. White, "Standardizing Network Mail Headers", RFC 561, DOI 10.17487/RFC0561, <https://www.rfc-editor.org/rfc/rfc561>.
[RFC6120]
Saint-Andre, P., "Extensible Messaging and Presence Protocol (XMPP): Core", RFC 6120, DOI 10.17487/RFC6120, <https://www.rfc-editor.org/rfc/rfc6120>.
[RFC7702]
Saint-Andre, P., Ibarra, S., and S. Loreto, "Interworking between the Session Initiation Protocol (SIP) and the Extensible Messaging and Presence Protocol (XMPP): Groupchat", RFC 7702, DOI 10.17487/RFC7702, <https://www.rfc-editor.org/rfc/rfc7702>.
[W3C.vc-data-model]
"Verifiable Credentials Data Model v1.1", W3C REC vc-data-model, W3C vc-data-model, <https://www.w3.org/TR/vc-data-model/>.

Acknowledgments

This investigation was inspired by a Mastodon post by Darius Kazemi.

Author's Address

Richard L. Barnes
Cisco