---
title: "Pleroma, LitePub, ActivityPub and JSON-LD"
date: "2018-11-12"
---
A lot of people make assumptions about my position on whether JSON-LD is actually good. The reality is that my view is more nuanced than that: there are _great_ uses for JSON-LD, but it's not appropriate for the way it is used in ActivityPub.
## What is JSON-LD anyway?
JSON-LD stands for _JSON Linked Data_. Linked Data is a “Big Data” technique which involves creating large graphs of interlinked pieces of data, intended to help enrich data sets with more semantic context (this is known as _graph coloring_), as well as additional data linked by URI (hence the name _linked data_). The Linked Data concept can be extremely powerful for data analysis when used in the appropriate context. A good example of where linked data is _useful_ is healthcare.gov, where they use it to help compare performance and value versus cost of US health insurance plans.
## ActivityPub and JSON-LD
Another example where JSON-LD is ostensibly used is ActivityPub. ActivityPub inherits its JSON-LD dependency from ActivityStreams 2.0, which is a data format that enjoys wide use outside of the ActivityPub ecosystem: for example, Twitter, Instagram, Facebook and Tumblr all use variations of ActivityStreams 2.0 objects in various places inside their APIs.
These services find the JSON-LD concept useful because their advertising customers can leverage JSON-LD to optimize their advertising campaigns (at Facebook, the _open graph_ concept they frequently pitch to advertisers is built in part on top of JSON-LD).
But does JSON-LD provide any value in a social networking environment which does not have advertising? In my opinion, not really: it's just an artifact of the “if you're not the customer, you're the product” nature of the proprietary social networking services. As previously stated, the primary advantage of JSON-LD and the linked data philosophy in general is _data enrichment_, and _data enrichment_ is largely useful to two groups: _advertisers_ and _intelligence_ (public or private).
Since the federated social networking services don't have advertising, that just leaves _intelligence_.
### Private intelligence and social networking, how data enrichment can impact your credit score
There are various kinds of _private intelligence_ firms out there which collect information about you, me, and everyone else. You've probably heard of some of them, and some of the products they sell: companies like Experian, InfoCheckUSA and Equifax sell various products like FICO credit scores and background reports which determine everything from whether or not you can rent or buy a car or house to whether or not you can get a job.
But did you know these companies crawl your use of the proprietary social networking services? There are companies like [FriendlyScore](https://friendlyscore.com/) which sell credit-related data based on how you utilize social networking services. Those “social” credit scores are directly enabled by technology such as JSON-LD and ActivityStreams 2.0.
### Public intelligence and social networking, how data enrichment can get you killed
We've all heard about Predator drones and drone strikes in the news. In the past decade, drone strikes have been used to attack countless targets. But how do our public intelligence agencies determine who is a target? It's very similar to how the private intelligence agencies determine whether you should own a house or have a job: they use big data methods to analyze all of the metadata they collected.
If you write a post on a social networking service and attach GPS data to it, they can use that information to determine a general pattern of _when_ and _where_ you are, and then feed it into a machine learning algorithm to determine _when_ and _where_ you will likely be in the _future_. They can also use this metadata analysis to prove certain assertions about your identity to a level of certainty which determines if you become a target, even if you're not really the same person they are trying to find.
### Conclusion: safety is more important than data enrichment
These techniques that are used both in the public and private sector are what the press tend to refer to as “Big Data” techniques. JSON-LD is a “Big Data” technology that can be leveraged in these ways. But at the same time, we can leverage some “Big Data” techniques in such a way that JSON-LD parsers will automatically do what we want them to do.
In my opinion, it is a _critical_ obligation of federated social networking service developers to ensure that handling of data is done in the most secure way possible, built on proven fundamentals. I view the inclusion of JSON-LD in the ActivityPub and ActivityStreams 2.0 standards to be harmful toward that obligation.
## Pleroma and JSON-LD
As you may know, there are two mainstream ActivityPub servers that are in wide use: Mastodon and Pleroma. Mastodon uses JSON-LD and Pleroma does not. But they are able to interoperate just fine despite this. This is largely because Pleroma provides JSON-LD attributes in the messages it generates without actively using them itself.
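
As a rough illustration of that arrangement, the sketch below builds a message that carries a `@context` for the benefit of JSON-LD consumers, then reads it back as plain JSON without ever expanding that context. It is written in Python rather than Pleroma's Elixir, and the helper names and example URLs are made up for the illustration; only the ActivityStreams 2.0 context URL is real.

```python
import json

# The ActivityStreams 2.0 context URL; everything else below is illustrative.
AS2_CONTEXT = "https://www.w3.org/ns/activitystreams"


def build_note(note_id: str, actor: str, content: str) -> dict:
    """Build an outgoing Note that carries @context for JSON-LD consumers."""
    return {
        "@context": AS2_CONTEXT,
        "id": note_id,
        "type": "Note",
        "attributedTo": actor,
        "content": content,
    }


def read_note(raw: str) -> dict:
    """Read an incoming Note as plain JSON; @context is never dereferenced."""
    data = json.loads(raw)
    if data.get("type") != "Note":
        raise ValueError("not a Note")
    # Only the fields we understand are kept; @context is ignored entirely.
    return {key: data[key] for key in ("id", "attributedTo", "content")}


# Hypothetical round trip: the wire format has @context, the parsed form does not.
wire = json.dumps(build_note(
    "https://example.social/objects/1",
    "https://example.social/users/alice",
    "hello fediverse",
))
note = read_note(wire)
```

A JSON-LD implementation receiving `wire` can still expand it using the context, while the reader above never needs to.
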
### Handling ActivityPub in a world without JSON-LD
![The origin of the Transmogrifier name](images/Transmogrifier.png)
Instead, Pleroma has a module called `Transmogrifier` that translates between _real_ ActivityPub and our _ActivityPub internal representation_. The use of AP constructs in our internal representation is the origin of the statement that Pleroma uses ActivityPub internally, and to an extent it is a very truthful statement: our internal representation and object graph are directly derived from an earlier ActivityPub draft. It's not _quite_ the same, though, and there have been a few bugs where things were not translated correctly, which resulted in leaks and other problems.
Besides the `Transmogrifier`, we have two functions which fetch new pieces into the graphs we build: `Object.normalize()` and `Activity.normalize()`. This could be considered to be a similar approach to JSON-LD except that it's explicit instead of implicit. The explicit fetching of new graph pieces is a security feature: it allows us to validate that we actually trust what we're fetching before we do it. This helps us to prevent various “fake direction” attacks which can be used for spoofing.
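
To make the idea of explicit, validated fetching concrete, here is a minimal Python sketch. It is not Pleroma's code: `normalize_object`, `same_origin` and the `Fetcher` type are hypothetical names, and the trust rule used (the object must live on the same origin as the actor that referenced it, and must claim the ID it was fetched from) is an assumption chosen for illustration.

```python
from typing import Callable, Optional
from urllib.parse import urlsplit

# Hypothetical fetcher type: takes a URL, returns the decoded JSON or None.
Fetcher = Callable[[str], Optional[dict]]


def same_origin(a: str, b: str) -> bool:
    """True when both URLs share scheme and host (including port)."""
    ua, ub = urlsplit(a), urlsplit(b)
    return (ua.scheme, ua.netloc) == (ub.scheme, ub.netloc)


def normalize_object(ref, actor: str, fetch: Fetcher) -> Optional[dict]:
    """Resolve an object reference explicitly, refusing untrusted fetches."""
    if isinstance(ref, dict):
        # Do not trust an embedded copy; keep only its ID and re-fetch it.
        ref = ref.get("id")
    if not isinstance(ref, str) or not same_origin(ref, actor):
        return None  # never fetch something the actor cannot vouch for
    obj = fetch(ref)
    if obj is None or obj.get("id") != ref:
        return None  # the document must claim the ID it was fetched from
    return obj


# A dict stands in for the remote server in this example.
remote = {
    "https://a.example/objects/1": {
        "id": "https://a.example/objects/1",
        "type": "Note",
    }
}
alice = "https://a.example/users/alice"
ok = normalize_object("https://a.example/objects/1", alice, remote.get)
spoofed = normalize_object("https://b.example/objects/9", alice, remote.get)
```

Because the fetch is a visible function call rather than a side effect of parsing, the trust check has an obvious place to live.
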
## LitePub and JSON-LD
[LitePub](https://litepub.social/litepub) is a recent initiative that was started between Pleroma and a few other ActivityPub implementations to slim down the ActivityPub standard into something that is minimalist and secure. While LitePub itself does not require JSON-LD, LitePub implementations follow some JSON-LD like behaviors where it makes sense, and LitePub provides a `@context` which allows JSON-LD parsers to transparently parse LitePub messages.
### Leveraging Linked Data for Object Capability Enforcement
The main principle LitePub is built on is leveraging the linked data paradigm to perform object capability enforcement. This can work either _explicitly_ (as is done in Pleroma) or _implicitly_ (as is done in Mastodon when parsing a LitePub activity).
We do this by treating every `Object` ID in LitePub as a _capability URI_. When processing messages that reference a _capability URI_, we check to make sure the _capability URI_ is still valid by re-fetching the object. If fetching the object fails, then the _capability URI_ is no longer valid. This prevents zombie activities.
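
The check described above can be sketched roughly as follows. This is an illustrative Python sketch rather than the LitePub reference behavior: the function names are hypothetical, and a dictionary stands in for the origin server that would normally be contacted over HTTP.

```python
from typing import Callable, Optional

Fetcher = Callable[[str], Optional[dict]]  # hypothetical: URL -> object or None


def capability_is_valid(object_id: str, fetch: Fetcher) -> bool:
    """A capability URI is valid only while its object can still be fetched."""
    obj = fetch(object_id)
    return obj is not None and obj.get("id") == object_id


def accept_activity(activity: dict, fetch: Fetcher) -> bool:
    """Accept an activity only if the object it references still exists."""
    obj = activity.get("object")
    object_id = obj.get("id") if isinstance(obj, dict) else obj
    return isinstance(object_id, str) and capability_is_valid(object_id, fetch)


# The origin server, modelled as a dict keyed by object ID.
origin = {
    "https://a.example/objects/1": {
        "id": "https://a.example/objects/1",
        "type": "Note",
    }
}
boost = {
    "type": "Announce",
    "actor": "https://b.example/users/bob",
    "object": "https://a.example/objects/1",
}

before_delete = accept_activity(boost, origin.get)
del origin["https://a.example/objects/1"]  # the author deletes the post
after_delete = accept_activity(boost, origin.get)
```

The same `Announce` is accepted while the post exists and rejected once the origin stops serving it.
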
### A note on Zombie Activities
There are two primary ways of securing ActivityPub implementations with digital signatures: [JSON Linked Data Signatures (LDSigs)](https://w3c-dvcg.github.io/ld-signatures/) and the construction built on [HTTP Signatures that is leveraged in LitePub](https://litepub.social/litepub/overview.html). These can be referred to as _inline_ signatures and _transient_ signatures, respectively.
The problem with _inline_ signatures is that they are valid forever. LDSig signatures have no expiration and no revocation method. Because of this, if an `Object` is deleted, it can come back to life. The solution created by the LDSig advocates is to use `Tombstone` objects for all deletions, but that creates a potential metadata leak that proves a post once existed, which harms plausible deniability.
The LitePub approach on the other hand is to treat all objects as _capability URIs_. This means when an object is deleted, future attempts to access the _capability URI_ fail and thus the object cannot come back to life through boosting or other means.
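
The difference between the two approaches can be shown with a toy comparison. The Python sketch below uses an HMAC as a stand-in for an inline signature purely to keep the example short; real LDSigs use public-key signatures over a canonicalized document, and every name here is hypothetical.

```python
import hashlib
import hmac
import json

KEY = b"demo-key"  # stand-in secret; real LDSigs use public-key signatures


def sign_inline(obj: dict) -> dict:
    """Return a copy of obj with a signature embedded in the object itself."""
    body = json.dumps(obj, sort_keys=True).encode()
    return {**obj, "signature": hmac.new(KEY, body, hashlib.sha256).hexdigest()}


def inline_valid(signed: dict) -> bool:
    """Verify the embedded signature; nothing here can expire or be revoked."""
    obj = {k: v for k, v in signed.items() if k != "signature"}
    body = json.dumps(obj, sort_keys=True).encode()
    expected = hmac.new(KEY, body, hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, signed.get("signature", ""))


def capability_valid(object_id: str, origin: dict) -> bool:
    """Valid only while the origin (modelled as a dict) still has the object."""
    return object_id in origin


note = {"id": "https://a.example/objects/1", "type": "Note", "content": "oops"}
origin = {note["id"]: note}
cached_copy = sign_inline(note)  # a third party keeps this signed copy

del origin[note["id"]]  # the author deletes the post

zombie_by_signature = inline_valid(cached_copy)
zombie_by_capability = capability_valid(note["id"], origin)
```

After the deletion, the cached copy still verifies under the inline scheme, while the capability check fails, so a receiver using it would refuse to resurrect the post.
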
## Conclusion
Hopefully this clarifies my views on JSON-LD and its applications in the fediverse. Feel free to ask me questions if you have any.