---
title: "Federation: what flows where, and why?"
date: "2019-07-13"
---
With all of the recent hullabaloo around Gab, and then today Kiwi Farms joining the fediverse, a lot of people have been asking questions about how data flows in the fediverse and what exposure they actually have.

I'm not particularly a fan of either of those websites, but that's beside the point. The point here is to provide an objective presentation of how instances federate with each other and how these federation transactions impact exposure.
## How Instances Federate
To start, let's describe a basic model of a federated network. This network will have five actors in it:
- _alyssa@social.example_
- _bob@chatty.example_
- _chris@photos.example_
- _emily@cat.tube_
- _sophie@activitypub.dev_

(Yeah, yeah, I know, I'm not that good at making up fake domains.)

Next, we will build some relationships:
- Sophie follows Alyssa and Bob
- Emily follows Alyssa and Chris
- Chris follows Emily and Alyssa
- Bob follows Sophie and Alyssa
- Alyssa follows Bob and Emily
### Broadcasts
Normally, posts flow through the network in the form of broadcasts. A broadcast-type post is one that is sent to, and only to, a predetermined set of targets, typically your followers collection.

So, this means that if Sophie makes a post, `chatty.example` is the only server that gets a copy of it, because Bob is her only follower. It does not matter that `chatty.example` is peered with other instances (such as `social.example`).

Broadcasts are, by far, the majority of traffic inside the fediverse.
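
To make the example network concrete, here is a minimal Python sketch of it. The `FOLLOWS` table is copied from the relationship list above; the helper names (`server_of`, `followers_of`, `broadcast_targets`) are made up for illustration and are not part of any ActivityPub implementation.

```python
# Who follows whom, copied from the relationship list above.
FOLLOWS = {
    "sophie@activitypub.dev": ["alyssa@social.example", "bob@chatty.example"],
    "emily@cat.tube": ["alyssa@social.example", "chris@photos.example"],
    "chris@photos.example": ["emily@cat.tube", "alyssa@social.example"],
    "bob@chatty.example": ["sophie@activitypub.dev", "alyssa@social.example"],
    "alyssa@social.example": ["bob@chatty.example", "emily@cat.tube"],
}


def server_of(actor):
    """Return the instance domain of a user@domain handle."""
    return actor.split("@", 1)[1]


def followers_of(actor):
    """Return the sorted list of actors that follow `actor`."""
    return sorted(f for f, followed in FOLLOWS.items() if actor in followed)


def broadcast_targets(author):
    """Remote servers that receive a broadcast addressed to `author`'s followers."""
    return {server_of(f) for f in followers_of(author)} - {server_of(author)}


print(broadcast_targets("sophie@activitypub.dev"))
# prints {'chatty.example'}
```

Only Bob follows Sophie, so only `chatty.example` shows up. Run the same function for Alyssa and every other server in the model appears, because everyone follows her.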
### Relaying
The other kind of transaction is easily described as _relaying_.

To extend our example above, let's say that Bob chooses to `Announce` (Mastodon calls this a boost, Pleroma calls this a repeat) the post Sophie sent him.

Because Bob is followed by Sophie and Alyssa, both of these people receive a copy of the `Announce` activity (an activity is a message which describes a transaction). Relay activities refer to the original message by its unique identifier, and recipients of `Announce` activities use that identifier to fetch the referenced message.

For now, we will assume that Alyssa's instance (`social.example`) succeeded in fetching the original post, because there's presently no access control in practice on fetching posts in ActivityPub.

This now means that Sophie's original post is present on three servers:
- `activitypub.dev`
- `chatty.example`
- `social.example`

Relaying can cause perceived problems when one instance blocks another, but these problems are actually caused by a lack of access control on object fetches.
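
The relay step can be sketched the same way. This is a toy model, not real federation code: the object id is a made-up URL, actors are written as handles rather than the URLs a real server would use, and every fetch is assumed to succeed because nothing checks who is asking.

```python
# Followers per actor, derived from the relationship list earlier in the post.
FOLLOWERS = {
    "sophie@activitypub.dev": ["bob@chatty.example"],
    "bob@chatty.example": ["alyssa@social.example", "sophie@activitypub.dev"],
}

# Hypothetical object id; real ids are whatever URL the origin server assigns.
POST_ID = "https://activitypub.dev/objects/1"


def server_of(actor):
    """Return the instance domain of a user@domain handle."""
    return actor.split("@", 1)[1]


def holders_after_announce(author, booster, post_id):
    """Return the sorted servers holding `post_id` once `booster` announces it."""
    holders = {server_of(author)}  # the origin always has the post
    # Step 1: the original broadcast reaches the author's followers.
    holders |= {server_of(f) for f in FOLLOWERS[author]}
    # Step 2: the booster's followers receive an Announce carrying only the id.
    announce = {"type": "Announce", "actor": booster, "object": post_id}
    for follower in FOLLOWERS[announce["actor"]]:
        # The receiving server dereferences announce["object"]. With no access
        # control on fetches, this sketch assumes the fetch always succeeds.
        holders.add(server_of(follower))
    return sorted(holders)


print(holders_after_announce("sophie@activitypub.dev", "bob@chatty.example", POST_ID))
# prints ['activitypub.dev', 'chatty.example', 'social.example']
```

Note that `social.example` never received anything from Sophie directly. It ends up with the post only because it went and fetched it.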
### Replying
A variant on the broadcast-style transaction is a `Create` activity that references an object as a reply.

Let's say Alyssa responds to Sophie's post that was boosted to her. She composes a reply that references Sophie's original post with the `inReplyTo` property.

Because Alyssa is followed by actors on every server in the network, the entire network now goes and fetches Sophie's post and has a copy of it.

This too can cause problems when one instance blocks another. As in the relaying case, it is caused by a lack of access control on object fetches.
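
Here is a rough sketch of the reply case, again as a toy model. The two object ids are invented for illustration, and the assumption that every receiving server fetches the parent post (and succeeds) is exactly the "no access control" behaviour described above.

```python
# Everyone who follows Alyssa, per the relationship list earlier in the post.
ALYSSA_FOLLOWERS = [
    "bob@chatty.example",
    "chris@photos.example",
    "emily@cat.tube",
    "sophie@activitypub.dev",
]

# Servers that already hold Sophie's post after Bob's Announce.
ALREADY_HOLDING = {"activitypub.dev", "chatty.example", "social.example"}

# Alyssa's reply. Both ids are hypothetical URLs.
reply = {
    "type": "Create",
    "actor": "alyssa@social.example",
    "object": {
        "type": "Note",
        "id": "https://social.example/objects/2",
        "inReplyTo": "https://activitypub.dev/objects/1",
    },
}


def server_of(actor):
    """Return the instance domain of a user@domain handle."""
    return actor.split("@", 1)[1]


def servers_fetching_parent(activity, followers):
    """Servers that receive the reply and then fetch the post it replies to."""
    if "inReplyTo" not in activity["object"]:
        return set()  # not a reply, so there is no parent to fetch
    return {server_of(f) for f in followers}


holders = ALREADY_HOLDING | servers_fetching_parent(reply, ALYSSA_FOLLOWERS)
print(sorted(holders))
# prints ['activitypub.dev', 'cat.tube', 'chatty.example', 'photos.example', 'social.example']
```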
### Metadata Leakage
From time to time, people talk about _metadata leakage_ with ActivityPub. But what does that actually mean?

Some people erroneously believe that the _metadata leakage_ problem has to do with public (without access control) posts appearing on instances which they have blocked. While that is arguably a problem, that problem is related to the lack of access controls on public posts. The technical term for a publicly available post is `as:Public`, a reference to the security label that is applied to them.

The _metadata leakage_ problem is an entirely different problem. It deals with posts that are not labelled `as:Public`.

The _metadata leakage_ problem is this: if Sophie composes a post addressed to her followers collection, then only Bob receives it. So far, so good, no leakage. However, because of bad implementations (and other problems), if Bob replies back to Sophie, then his post will be sent not only to Sophie, but also to Alyssa. Based on _that_, Alyssa now knows that Sophie posted _something_, even though she never received the post itself. That's why it's called a _metadata leakage_ problem: _metadata_ about the existence of one of Sophie's objects, along with hints about its contents (inferred from the text of the reply), is leaked to Alyssa.

This problem is the big one. It's not _technically_ ActivityPub's fault, either, but a problem in how ActivityPub is typically implemented. But at the same time, it means that followers-only posts can be risky. Mastodon covers up the metadata leakage problem by hiding replies to users you don't follow, but that's all it is: a cover-up of the problem.
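
The leak is easy to see if you compare who was allowed to see the original post with who receives the reply. The sketch below assumes the leaky addressing described above (the reply goes to the parent author plus the replier's own followers); the function names are invented for illustration.

```python
SOPHIE = "sophie@activitypub.dev"
BOB = "bob@chatty.example"
ALYSSA = "alyssa@social.example"

# Followers per actor, derived from the relationship list earlier in the post.
FOLLOWERS = {
    SOPHIE: [BOB],
    BOB: [ALYSSA, SOPHIE],
}


def reply_recipients(replier, parent_author):
    """Leaky addressing: the parent author plus the replier's own followers."""
    return {parent_author, *FOLLOWERS[replier]}


def leaked_to(parent_author, replier):
    """Actors who see the reply but were never in the original post's audience."""
    audience = set(FOLLOWERS[parent_author])  # followers-only post
    recipients = reply_recipients(replier, parent_author)
    return sorted(recipients - audience - {parent_author})


print(leaked_to(SOPHIE, BOB))
# prints ['alyssa@social.example']
```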
### Solution?
The solution to the metadata leakage problem is to have replies be forwarded to the OP's audience. But to do this, we need to rework the way the protocol works a bit. That's where proposals like moving to an OCAP-based variant of ActivityPub come into play. In those variants, doing this is easy. But in what we have now, doing this is difficult.
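
As a rough illustration of what "forwarded to the OP's audience" means for addressing, here is the same toy model with the reply routed through Sophie instead of through Bob's followers. This only models who ends up receiving the reply. It does not model OCAP or any real protocol change, and `forwarded_recipients` is a hypothetical helper.

```python
SOPHIE = "sophie@activitypub.dev"
BOB = "bob@chatty.example"
ALYSSA = "alyssa@social.example"

# Followers per actor, derived from the relationship list earlier in the post.
FOLLOWERS = {
    SOPHIE: [BOB],
    BOB: [ALYSSA, SOPHIE],
}


def forwarded_recipients(parent_author, replier):
    """The reply goes to the parent author, who forwards it to the original audience."""
    return ({parent_author} | set(FOLLOWERS[parent_author])) - {replier}


def leaked_to(parent_author, recipients):
    """Actors who see the reply but were never in the original post's audience."""
    audience = set(FOLLOWERS[parent_author])
    return sorted(set(recipients) - audience - {parent_author})


print(leaked_to(SOPHIE, forwarded_recipients(SOPHIE, BOB)))
# prints []
```

With the reply following the original post's audience, Alyssa never hears about it, so there is nothing to leak.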
Anyway, I hope this post helps to explain how data flows through the network.