new blog: building fair webs of trust via ocap

main
Ariadne Conill 2022-12-03 01:26:38 -06:00
parent 63ad030dad
commit 41350afedf
1 changed files with 176 additions and 0 deletions


@ -0,0 +1,176 @@
---
title: "Building fair webs of trust by leveraging the OCAP model"
date: "2022-12-03"
---
Since the beginning of the Internet, determining the trustworthiness
of participants and published information has been a significant
point of contention.
Many systems have been proposed to solve these underlying concerns,
usually pertaining to specific niches and communities, but these
pre-existing solutions are nebulous at best.
How can we build infrastructure for truly democratic Webs of Trust?
## Fairness in reputation-based systems
When considering the design of a reputation-based system, *fairness*
must be paramount, but what is *fairness* in this context?
A reputation-based system can be considered *fair* if it appropriately
balances the concerns of the data publisher, the data subject, and
the data consumer.
Regulatory frameworks such as the GDPR attempt to provide guidance
concerning how this balance can be struck when building internet
services in general, but these frameworks are large and complicated,
which makes it difficult to derive a definition that is adequate for
a reputation-based trust system.
To understand how these concerns must be balanced, we must understand
the underlying risks for each participant in a reputation-based system:
- The **data subject** is at risk of harm to their professional
reputation due to annotations they did not consent to, and mistakes
in those annotations.
This is a problem which has already captured regulatory ire, as I
will explain later.
- The **data publisher** is at risk of being sued for defamation due to
the annotations they publish.
- The **data consumer** is at risk of being misled by inaccurate
annotations they consume.
A *fair* reputation-based system must attempt to provide an adequate
balance between these concerns through active harm reduction in its
design:
- The harm to the **data subject** from misleading annotations can be
reduced by blinding the identity of the data subject.
- The harm to the **data publisher** from misleading annotations can
also be reduced by blinding the identity of the data subject.
- The harm to the **data consumer** from misleading annotations can be
reduced by allowing them to consume annotations from multiple sources.
## Shinigami Eyes, or how designing for fairness can be difficult
The [Shinigami Eyes][se] browser extension was designed to help people
establish trust in various web resources using a reputation-based system.
In general, the author attempted to make thoughtful choices to ensure
the system was reasonably fair in its design.
However, the system has [a number of flaws, both technical and social][er],
which highlight how building systems of trust requires a detailed
understanding of how the underlying primitives interact and
the consequences of those interactions.
[se]: https://shinigami-eyes.github.io/
[er]: https://eyereaper.evelyn.moe/
### Shinigami Eyes and Blinding
As already noted, a *fair* reputation-based system must blind the identity
of the data subject to protect both the data subject and data publisher.
The approach used by Shinigami Eyes was to use a bloom filter constructed
with a 32-bit [`FNV-1a` hash][fnv].
[fnv]: http://www.isthe.com/chongo/tech/comp/fnv/index.html
The FNV family is a set of non-cryptographic hash functions, defined for
sizes from 32 bits up to 1024 bits.
The original `FNV-1` algorithm processes each input byte by multiplying the
current hash value by the designated FNV prime, then performing an XOR of
the result against the byte's value.
The alternate `FNV-1a` algorithm swaps the two steps, performing the XOR
first and the multiplication second, and is the variant used by Shinigami Eyes.
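
To make the difference between the two orderings concrete, here is a minimal Python sketch of the 32-bit case.
The constants are the standard 32-bit FNV offset basis and prime from the FNV reference linked above; in that reference's naming, `FNV-1` multiplies first and `FNV-1a` XORs first.
The function names are mine, and this is not the code Shinigami Eyes itself uses.

```python
FNV32_OFFSET_BASIS = 0x811C9DC5
FNV32_PRIME = 0x01000193
MASK32 = 0xFFFFFFFF  # keep the running value within 32 bits


def fnv1_32(data: bytes) -> int:
    """FNV-1: multiply by the prime, then XOR in the next byte."""
    h = FNV32_OFFSET_BASIS
    for byte in data:
        h = (h * FNV32_PRIME) & MASK32
        h ^= byte
    return h


def fnv1a_32(data: bytes) -> int:
    """FNV-1a: XOR in the next byte, then multiply by the prime."""
    h = FNV32_OFFSET_BASIS
    for byte in data:
        h ^= byte
        h = (h * FNV32_PRIME) & MASK32
    return h


print(f"{fnv1_32(b'a'):08x}")   # 050c5d7e
print(f"{fnv1a_32(b'a'):08x}")  # e40c292c
```

Either way, the output is only 32 bits wide, which is what matters for the discussion that follows.
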
The use of a bloom filter is an acceptable blinding method, assuming that
the underlying hash provides sufficient resolution, such as a 256-bit
or 512-bit hash.
Presumably, due to the constraints of having to run as a JavaScript extension,
the weak 32-bit `FNV-1a` hash was used instead.
Because of this, while the reputation lists used by Shinigami Eyes were
acceptably blinded, there was an extremely [high risk of false positives
caused by hash collisions][collided-account].
[collided-account]: https://twitter.com/x0s1jpnq2sk2
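
To get a feeling for why 32 bits is too little, the following rough sketch compares hash widths using the usual birthday approximation.
It assumes an ideal, uniformly distributed hash and ignores the bloom filter itself, which adds false positives of its own; the list size of 100,000 entries is a made-up figure, not the actual size of the Shinigami Eyes dataset.

```python
import math


def birthday_collision_probability(n_items: int, hash_bits: int) -> float:
    """Approximate chance that at least two of n_items share a hash value."""
    exponent = n_items * (n_items - 1) / 2 ** (hash_bits + 1)
    return -math.expm1(-exponent)  # 1 - exp(-exponent), stable for tiny values


def false_match_probability(n_items: int, hash_bits: int) -> float:
    """Approximate chance that one unlisted item matches some listed hash."""
    # 1 - (1 - 2^-bits)^n, computed in a numerically stable way.
    return -math.expm1(n_items * math.log1p(-1 / 2 ** hash_bits))


n = 100_000  # hypothetical number of listed identities
for bits in (32, 256):
    collide = birthday_collision_probability(n, bits)
    mismatch = false_match_probability(n, bits)
    print(f"{bits:3d} bits: collision within list {collide:.3g}, "
          f"false match per lookup {mismatch:.3g}")
```

Under these assumptions, a 32-bit hash makes a collision somewhere inside the list more likely than not (roughly 0.69), while at 256 bits both figures are vanishingly small.
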
Concerns about the technical implementation of the Shinigami Eyes extension
led Datatilsynet, the Norwegian GDPR regulatory agency, to [ban the extension][se-ban]
at the end of 2021, and development of the extension appears to have
ended as a result of their initial inquiry.
[se-ban]: https://www.datatilsynet.no/en/news/2021/varsler-forbud-mot-nettleserutvidelsen-shinigami-eyes-i-norge/
## Can we build systems like Shinigami Eyes more robustly?
The main reason Shinigami Eyes gained the attention of Datatilsynet was
the centralized nature of its data processing.
Can we build a system which avoids centralized data processing and promotes
democratic participation?
Yes, it is quite easy, but like most things, the challenge will be delivering
a good user experience.
### Leveraging the OCAP model to build a robust solution
The largest problem in building this system is ensuring that the published
reputation data is reliably blinded.
To this end, I propose that each feed be a simple dataset containing a set of
blinded hashes and their annotations.
The physical representation of the dataset does not matter, though keeping
it as simple as possible will expand the number of places where the data
can be consumed.
In the Object Capability model, we can think of the physical feed as an
*object*, and a blinding key as a *capability* to access that object in a
useful way.
You have to have both in order for either to be useful.
A participant can publish multiple copies of their feed, each blinded with
a different key for each friend they wish to share it with, or they can
choose to publish a single copy and share the same key with every friend,
or even the public at large.
Users can then choose which feeds they want to use when making trust
decisions from the collection of feeds and blinding keys they have been
given.
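
As a minimal sketch of this idea, the following Python code publishes one copy of a feed per friend, each blinded with its own key.
Any keyed hash would do for the blinding step; `HMAC-SHA3-256` is assumed here.
The names `blind`, `publish_feed` and `lookup`, the example entries, and the friends are all hypothetical.

```python
import hashlib
import hmac
import secrets


def blind(key: bytes, identifier: str) -> str:
    """Blind an identifier with a keyed hash (HMAC-SHA3-256 assumed)."""
    return hmac.new(key, identifier.encode("utf-8"), hashlib.sha3_256).hexdigest()


def publish_feed(key: bytes, entries: dict) -> dict:
    """Build the feed 'object': blinded hash -> annotation."""
    return {blind(key, ident): note for ident, note in entries.items()}


def lookup(feed: dict, key: bytes, identifier: str):
    """Use a key (the 'capability') to query a feed; None means no match."""
    return feed.get(blind(key, identifier))


# Hypothetical private list kept by the publisher.
my_entries = {"example.com": "trusted", "spam.example.net": "distrusted"}

# One independent random key per friend, and one feed copy per key.
keys = {friend: secrets.token_bytes(32) for friend in ("alice", "bob")}
feeds = {friend: publish_feed(key, my_entries) for friend, key in keys.items()}

print(lookup(feeds["alice"], keys["alice"], "example.com"))  # trusted
print(lookup(feeds["alice"], keys["bob"], "example.com"))    # None
```

Alice's copy of the feed is useless to anyone holding only Bob's key, and the key alone says nothing without the feed, which is the object/capability split described above.
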
By comparison to Shinigami Eyes, this better satisfies the conditions for
*fairness*: with a sufficiently wide keyed hash the risk of a false positive
is negligible, the contents of the reputation lists remain private, and
publishers can choose to consent to data sharing requests however they wish.
### Choosing a reasonable set of primitives
To build such a system, I would probably choose `HMAC-SHA3-256` as the
blinding primitive.
This provides a good balance between collision protection,
cryptographic strength, and hash resolution.
A scheme which provides less than 256 bits of hash resolution should
be avoided due to the risk of collisions.
I would distribute the feeds as CSV files.
This would give users the most flexibility in managing feeds: they
could distribute different feeds with different meanings, and include
extended data alongside the blinded hash as a form of annotation.
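
A feed along these lines could be written and read with nothing more than the Python standard library.
This is only a sketch: the column names `blinded_hash` and `annotation`, the demo key, and the example entries are my own placeholders, not a proposed standard.

```python
import csv
import hashlib
import hmac
import io


def blind(key: bytes, identifier: str) -> str:
    """HMAC-SHA3-256 of the identifier, as a 64-character hex string."""
    return hmac.new(key, identifier.encode("utf-8"), hashlib.sha3_256).hexdigest()


def write_feed(key: bytes, entries, out) -> None:
    """Write (identifier, annotation) pairs as a blinded CSV feed."""
    writer = csv.writer(out)
    writer.writerow(["blinded_hash", "annotation"])  # hypothetical header
    for identifier, annotation in entries:
        writer.writerow([blind(key, identifier), annotation])


def read_feed(src) -> dict:
    """Load a CSV feed into a blinded hash -> annotation mapping."""
    return {row["blinded_hash"]: row["annotation"] for row in csv.DictReader(src)}


key = b"demo key, not a real secret"
buffer = io.StringIO()
write_feed(key, [("example.com", "trusted"), ("example.net/ads", "distrusted")], buffer)

buffer.seek(0)
feed = read_feed(buffer)
print(feed.get(blind(key, "example.com")))      # trusted
print(feed.get(blind(key, "unlisted.example")))  # None
```

Extra columns can be added after the hash without breaking consumers that only look at the first two.
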
On the client side, I would calculate a set of blinded hashes for each
prefix of the URI, from the full URI all the way up to the parent domain.
By doing so, it would be possible for a single feed entry to match a large
number of child URIs instead of having to list them all manually.
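
One way to enumerate the candidates to hash is sketched below.
It makes several simplifying assumptions of my own: the scheme, query string and fragment are dropped, and the "parent domain" is approximated as the last two labels of the hostname, where a real implementation would likely consult the Public Suffix List.

```python
from urllib.parse import urlsplit


def uri_candidates(uri: str, min_labels: int = 2) -> list:
    """List the strings to blind for a URI, most specific first."""
    parts = urlsplit(uri)
    host = (parts.hostname or "").lower()
    if not host:
        return []

    candidates = []

    # Path prefixes on the full hostname, longest first.
    segments = [s for s in parts.path.split("/") if s]
    for end in range(len(segments), 0, -1):
        candidates.append(host + "/" + "/".join(segments[:end]))

    # The hostname itself, then each parent domain down to min_labels labels.
    labels = host.split(".")
    for start in range(max(len(labels) - min_labels, 0) + 1):
        candidates.append(".".join(labels[start:]))

    return candidates


print(uri_candidates("https://blog.example.com/posts/2022/?q=1"))
# ['blog.example.com/posts/2022', 'blog.example.com/posts',
#  'blog.example.com', 'example.com']
```

Each candidate would then be blinded with every key the client holds and checked against the corresponding feed.
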
Implementations should store the learned hashes in a [radix trie][rt].
Because every hash has the same fixed length, a lookup takes a bounded
number of steps regardless of how many hashes are stored, which is
effectively constant time.
The trie also buckets hashes automatically, which can be helpful for
implementing quorum requirements.
[rt]: https://en.wikipedia.org/wiki/Radix_tree
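
The sketch below shows the general shape of such a store.
For brevity it is a plain trie keyed on hex digits rather than a path-compressed radix trie, and the quorum rule (an identifier counts as matched once enough distinct feeds list it) is just one possible interpretation.
The class and method names, and the shortened hex strings standing in for full 256-bit hashes, are made up for the example.

```python
class _Node:
    __slots__ = ("children", "feeds")

    def __init__(self):
        self.children = {}  # hex digit -> child node
        self.feeds = set()  # names of feeds listing the hash ending here


class HashTrie:
    """Uncompressed trie over hex digests (a simplification of a radix trie)."""

    def __init__(self):
        self._root = _Node()

    def add(self, digest_hex: str, feed: str) -> None:
        node = self._root
        for digit in digest_hex:
            node = node.children.setdefault(digit, _Node())
        node.feeds.add(feed)

    def feeds_for(self, digest_hex: str) -> set:
        node = self._root
        for digit in digest_hex:
            node = node.children.get(digit)
            if node is None:
                return set()
        return set(node.feeds)

    def meets_quorum(self, digests, quorum: int) -> bool:
        """True if at least `quorum` distinct feeds list any of the digests.

        `digests` holds the same identifier blinded under each known key.
        """
        matched = set()
        for digest_hex in digests:
            matched |= self.feeds_for(digest_hex)
        return len(matched) >= quorum


trie = HashTrie()
trie.add("ab12", "alice")  # shortened stand-ins for 64-digit hashes
trie.add("cd34", "bob")
trie.add("ab99", "carol")

print(trie.meets_quorum(["ab12", "cd34"], quorum=2))  # True
print(trie.meets_quorum(["ab12", "ffff"], quorum=2))  # False
```
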
## Things we can build with this
The use of friend-to-friend reputation-based systems can be powerful.
They provide accountability (as you know who you are getting your
data from) and collaboration (your friends can consume your data in
exchange).
They can be used in the way Shinigami Eyes was used: to allow interested
parties to identify resources they should trust or distrust, but they can
also be used to enable collaborative blocking amongst friends and system
administrators.
They can also be used to determine if e-mail domains or URLs inside e-mails
are actually trustworthy.
The possibilities are truly endless.