title: JSON-LD is ideal for Cloud Native technologies
date: 2022-02-11

Developers frequently tell me that it is impossible to have extensible JSON documents underpinning their projects, because extensions may collide later. For those who are unaware of more capable graph serializations such as JSON-LD and Turtle, this seems like a reasonable position. Accordingly, I would like to introduce you all to JSON-LD, using a practical real-world deployment as an example, and then show how one might use JSON-LD to extend something like OCI container manifests.

You might feel compelled to look up JSON-LD on Google before reading on. My suggestion is not to, because the JSON-LD website is really aimed at web developers, whereas this post tries to explain how a systems engineer can make use of JSON-LD graphs in practical terms. And, if it doesn't, feel free to DM me on Twitter or something.

What JSON-LD can do for you

Have you ever wanted any of the following when working with JSON?

  • Conflict-free extensibility
  • Strong typing
  • Compatibility with the RDF ecosystem (e.g. SPARQL)
  • Self-describing schemas
  • Transparent document inclusion

If you answered yes to any of these, then JSON-LD is for you. Some of these capabilities are also provided by JSON Schema, which has been published as a series of IETF Internet-Drafts, but it has a much steeper learning curve than JSON-LD.

This post will be primarily focused on how namespaces and aliases can be used to provide extensibility while also providing backwards compatibility for clients that are not JSON-LD aware. In general, I believe strongly that any open standard built on JSON should actually be built on JSON-LD, and hopefully my examples will demonstrate why I believe this.

ActivityPub: a real-world case study

ActivityPub is a protocol used on the federated social web (thankfully entirely unrelated to Web3) and built on the ActivityStreams 2.0 specification. Both ActivityPub and ActivityStreams are RDF vocabularies that are represented as JSON-LD documents, but you don't really need to know or care about this part.

This is a very simplified representation of an ActivityPub actor object:

{
  "@context": [
    "https://www.w3.org/ns/activitystreams",
    {
      "alsoKnownAs": {
        "@id": "as:alsoKnownAs",
        "@type": "@id"
      },
      "sec": "https://w3id.org/security#",
      "owner": {
        "@id": "sec:owner",
        "@type": "@id"
      },
      "publicKey": {
        "@id": "sec:publicKey",
        "@type": "@id"
      },
      "publicKeyPem": "sec:publicKeyPem"
    }
  ],
  "alsoKnownAs": "https://corp.example.org/~alice",
  "id": "https://www.example.com/~alice",
  "inbox": "https://www.example.com/~alice/inbox",
  "name": "Alice",
  "type": "Person",
  "publicKey": {
    "id": "https://www.example.com/~alice#key",
    "owner": "https://www.example.com/~alice",
    "publicKeyPem": "..."
  }
}

Pay attention to the @context variable here. It is doing a few things:

  1. It pulls in the entire ActivityStreams and ActivityPub vocabularies by reference. These can be downloaded on the fly or bundled with the application using context preloading.
  2. It then defines a few terms outside of those vocabularies: alsoKnownAs, sec, owner, publicKey and publicKeyPem.
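To make the second point concrete, here is a minimal sketch of how a term defined in the context resolves to a full IRI. It is not the full JSON-LD expansion algorithm, just an illustration: the `expand_term` helper and the `LOCAL_CONTEXT` table are hypothetical names, and the `as` prefix is copied in by hand on the assumption that it comes from the remote ActivityStreams context.

```python
# Simplified illustration of JSON-LD term resolution; a real processor
# (and the real algorithm) handles many more cases than this.
LOCAL_CONTEXT = {
    # Assumed to be supplied by the remote ActivityStreams context.
    "as": "https://www.w3.org/ns/activitystreams#",
    # Terms defined inline in the example document.
    "sec": "https://w3id.org/security#",
    "alsoKnownAs": {"@id": "as:alsoKnownAs", "@type": "@id"},
    "owner": {"@id": "sec:owner", "@type": "@id"},
    "publicKey": {"@id": "sec:publicKey", "@type": "@id"},
    "publicKeyPem": "sec:publicKeyPem",
}


def expand_term(term, context):
    """Resolve a short term to a full IRI, or return None if it is undefined."""
    definition = context.get(term)
    if definition is None:
        return None
    # A definition is either a plain string or an object carrying "@id".
    iri = definition["@id"] if isinstance(definition, dict) else definition
    prefix, sep, suffix = iri.partition(":")
    # A compact IRI such as "sec:owner" is resolved through its prefix;
    # something like "https://..." is already absolute and is left alone.
    if sep and not suffix.startswith("//") and isinstance(context.get(prefix), str):
        return context[prefix] + suffix
    return iri


if __name__ == "__main__":
    for term in ("alsoKnownAs", "owner", "publicKeyPem", "somethingElse"):
        print(term, "->", expand_term(term, LOCAL_CONTEXT))
```

The short names in the document are only aliases; what an extension actually claims is the full IRI, which is why two extensions that both want a property called `owner` do not have to collide.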

When an application that is JSON-LD aware parses this document, it will receive a document that looks like this:

{
  "@context": [
    "https://www.w3.org/ns/activitystreams",
    {
      "alsoKnownAs": {
        "@id": "as:alsoKnownAs",
        "@type": "@id"
      },
      "sec": "https://w3id.org/security#",
      "owner": {
        "@id": "sec:owner",
        "@type": "@id"
      },
      "publicKey": {
        "@id": "sec:publicKey",
        "@type": "@id"
      },
      "publicKeyPem": "sec:publicKeyPem"
    }
  ],
  "@id": "https://www.example.com/~alice",
  "@type": "Person",
  "as:alsoKnownAs": "https://corp.example.org/~alice",
  "ldp:inbox": "https://www.example.com/~alice/inbox",
  "as:name": "Alice",
  "sec:publicKey": {
    "@id": "https://www.example.com/~alice#key",
    "sec:owner": "https://www.example.com/~alice",
    "sec:publicKeyPem": "..."
  }
}

This lets extensions interoperate with minimal conflicts: the application operates on a normalized version of the document in which every term defined in the context is namespaced, without the user having to worry about it. A parser can also easily ignore things it does not know about, because terms that aren't defined in the context aren't placed in a namespace. The context does not actually have to be embedded in the document, either: you can preload a root context.
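As a rough sketch of that behaviour, the following walks a document, renames every key that has a definition, and silently drops the rest. The `TERMS` table is a hypothetical, pre-flattened stand-in for a processed context, and `normalize` is an illustrative helper rather than anything a real JSON-LD library exposes.

```python
# Hypothetical flattened term table standing in for a processed @context.
TERMS = {
    "name": "as:name",
    "alsoKnownAs": "as:alsoKnownAs",
    "publicKey": "sec:publicKey",
    "owner": "sec:owner",
    "publicKeyPem": "sec:publicKeyPem",
}

# The ActivityStreams context aliases these two JSON-LD keywords.
KEYWORDS = {"id": "@id", "type": "@type"}


def normalize(node, terms=TERMS):
    """Namespace known keys and drop keys that have no definition."""
    if isinstance(node, list):
        return [normalize(item, terms) for item in node]
    if not isinstance(node, dict):
        return node
    result = {}
    for key, value in node.items():
        if key in KEYWORDS:
            result[KEYWORDS[key]] = value
        elif key in terms:
            result[terms[key]] = normalize(value, terms)
        # Anything else (including "@context" itself) is not defined in
        # the term table, so it is simply left out of the result.
    return result


if __name__ == "__main__":
    actor = {
        "id": "https://www.example.com/~alice",
        "type": "Person",
        "name": "Alice",
        "favouriteColour": "green",  # unknown extension, no definition
        "publicKey": {
            "id": "https://www.example.com/~alice#key",
            "owner": "https://www.example.com/~alice",
        },
    }
    print(normalize(actor))
```

The application code after this step only ever looks at namespaced keys such as `sec:owner`, so an unknown property added by someone else's extension cannot be mistaken for one of its own.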

In other words, that @context variable can be built into the application, or stored in an S3 bucket somewhere, or whatever you want to do. If you are planning to have an interoperable protocol, however, providing a useful @context is crucial.
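A minimal sketch of context preloading might look like the following. It assumes the application ships its own copies of the contexts it trusts and never touches the network; `BUNDLED_CONTEXTS`, `load_context` and `resolve_contexts` are hypothetical names, and the bundled ActivityStreams entry is a heavily truncated stand-in for the real context document.

```python
# Contexts shipped with the application. The entry below is a truncated
# placeholder, not the real ActivityStreams context.
BUNDLED_CONTEXTS = {
    "https://www.w3.org/ns/activitystreams": {
        "@context": {
            "as": "https://www.w3.org/ns/activitystreams#",
            "name": "as:name",
        }
    },
}


class UnknownContextError(LookupError):
    """Raised when a document references a context that is not bundled."""


def load_context(url, bundled=BUNDLED_CONTEXTS):
    """Return the bundled context for url instead of fetching it."""
    try:
        return bundled[url]
    except KeyError:
        raise UnknownContextError(f"context not bundled: {url}") from None


def resolve_contexts(document, bundled=BUNDLED_CONTEXTS):
    """Replace context URLs in a document with their bundled definitions."""
    entries = document.get("@context", [])
    if not isinstance(entries, list):
        entries = [entries]
    resolved = []
    for entry in entries:
        if isinstance(entry, str):
            resolved.append(load_context(entry, bundled)["@context"])
        else:
            # Inline context objects are used as they are.
            resolved.append(entry)
    return resolved
```

Refusing unknown URLs outright is a design choice for this sketch; an application could just as well fall back to fetching and caching them.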

How OCI image manifests could benefit from JSON-LD

There was a discussion on Twitter this evening about how extending the OCI image spec with signature references has taken a year. If OCI used JSON-LD (ironically, its JSON vocabulary is already similar to several pre-existing JSON-LD ones), then implementations could just store the pre-existing metadata, mapped to a namespace. In the case of an OCI image, this might look something like:

{
  "@context": [
    "https://opencontainers.org/ns",
    {
      "sigstore": "https://sigstore.dev/ns#",
      "reference": {
        "@type": "@id",
        "@id": "sigstore:reference"
      }
    }
  ],
  "config": {
    "mediaType": "application/vnd.oci.image.config.v1+json",
    "digest": "sha256:d539cd357acb4a6df2a4ef99db5fe70714458349232dad0ec73e1ed65f6a0e13",
    "size": 585
  },
  "layers": [
    {
      "mediaType": "application/vnd.oci.image.layer.v1.tar+gzip",
      "digest": "sha256:59bf1c3509f33515622619af21ed55bbe26d24913cedbca106468a5fb37a50c3",
      "size": 2818413
    },
    {
      "mediaType": "application/vnd.example.signature+json",
      "size": 3514,
      "digest": "sha256:19387f68117dbe07daeef0d99e018f7bbf7a660158d24949ea47bc12a3e4ba17",
      "reference": {
        "mediaType": "application/vnd.oci.image.layer.v1.tar+gzip",
        "digest": "sha256:59bf1c3509f33515622619af21ed55bbe26d24913cedbca106468a5fb37a50c3",
        "size": 2818413
      }
    }
  ]
}

The differences from a current OCI image manifest are minimal. Namely, schemaVersion has been removed, because in JSON-LD the @context takes over the job of identifying which vocabulary the document uses, and the signature reference extension has been added as the sigstore:reference property. Hopefully you can imagine how the rest of the document looks namespace-wise.
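To illustrate the backwards-compatibility point, here is a sketch of a client that is not JSON-LD aware reading such a manifest as ordinary JSON. The `@context` is trimmed to the root URL for brevity, and `layer_digests` and `signature_references` are hypothetical helpers, not part of any OCI tooling.

```python
import json

# A trimmed copy of the manifest above; the inline extension terms in
# @context are omitted because a plain JSON client never looks at them.
RAW_MANIFEST = """
{
  "@context": ["https://opencontainers.org/ns"],
  "config": {
    "mediaType": "application/vnd.oci.image.config.v1+json",
    "digest": "sha256:d539cd357acb4a6df2a4ef99db5fe70714458349232dad0ec73e1ed65f6a0e13",
    "size": 585
  },
  "layers": [
    {
      "mediaType": "application/vnd.oci.image.layer.v1.tar+gzip",
      "digest": "sha256:59bf1c3509f33515622619af21ed55bbe26d24913cedbca106468a5fb37a50c3",
      "size": 2818413
    },
    {
      "mediaType": "application/vnd.example.signature+json",
      "digest": "sha256:19387f68117dbe07daeef0d99e018f7bbf7a660158d24949ea47bc12a3e4ba17",
      "size": 3514,
      "reference": {
        "digest": "sha256:59bf1c3509f33515622619af21ed55bbe26d24913cedbca106468a5fb37a50c3"
      }
    }
  ]
}
"""


def layer_digests(manifest):
    """What an existing client does: read the keys it already knows."""
    return [layer["digest"] for layer in manifest.get("layers", [])]


def signature_references(manifest):
    """Map each signature layer's digest to the digest it refers to."""
    return {
        layer["digest"]: layer["reference"]["digest"]
        for layer in manifest.get("layers", [])
        if "reference" in layer
    }


if __name__ == "__main__":
    manifest = json.loads(RAW_MANIFEST)
    print(layer_digests(manifest))
    print(signature_references(manifest))
```

An older client that only calls something like `layer_digests` keeps working unchanged, while a client that knows about the extension can read `reference` through its short alias or, if it is JSON-LD aware, through the namespaced sigstore:reference property.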

One last thing about this example. You might notice that I am using URIs when I define namespaces in the @context. This is a great feature of the RDF ecosystem: you can put up a webpage at those URIs defining how to make use of the terms defined in the namespace, meaning that JSON-LD tooling can have rich documentation built in.

Also, since I am well aware that basically all of these OCI tools are written in Go, it should be noted that Go has an excellent implementation of JSON-LD. For those concerned that W3C proposals are sometimes out of touch with reality, the creator of JSON-LD has some interesting words on the subject. Now, please, use JSON-LD and stop worrying about extensibility in open technology: this problem is totally solved.