new blog: building fair webs of trust via ocap

2022-12-03 01:26:38 -06:00 · 41350afedf
parent 63ad030dad
commit 41350afedf
1 changed files with 176 additions and 0 deletions
--- a/content/blog/building-fair-webs-of-trust-by-leveraging-the-ocap-model.md
+++ b/content/blog/building-fair-webs-of-trust-by-leveraging-the-ocap-model.md
@@ -0,0 +1,176 @@
 ---
 title: "Building fair webs of trust by leveraging the OCAP model"
 date: "2022-12-03"
 ---
 Since the beginning of the Internet, determining the trustworthiness
 of participants and published information has been a significant
 point of contention.
 Many systems have been proposed to solve these underlying concerns,
 usually pertaining to specific niches and communities, but these
 pre-existing solutions are nebulous at best.
 How can we build infrastructure for truly democratic Webs of Trust?
 ## Fairness in reputation-based systems
 When considering the design of a reputation-based system, *fairness*
 must be paramount, but what is *fairness* in this context?
 A reputation-based system can be considered *fair* if it appropriately
 balances the concerns of the data publisher, the data subject, and
 the data consumer.
 Regulatory frameworks such as the GDPR attempt to provide guidance
 concerning how this balance can be struck when building internet
 services in general, but these frameworks are large and complicated,
 which makes it difficult to derive a definition that is adequate
 for a reputation-based trust system.
 To understand how these concerns must be balanced, we must understand
 the underlying risks for each participant in a reputation-based system:
 - The **data subject** is at risk of harm to their professional
  reputation due to annotations they did not consent to, and mistakes
  in those annotations.
  This is a problem which has already captured regulatory ire, as I
  will explain later.
 - The **data publisher** is at risk of being sued for defamation due to
  the annotations they publish.
 - The **data consumer** is at risk of being misled by inaccurate
  annotations they consume.
 A *fair* reputation-based system must attempt to provide an adequate
 balance between these concerns through active harm reduction in its
 design:
 - The harm to the **data subject** from misleading annotations can be
  reduced by blinding the identity of the data subject.
 - The harm to the **data publisher** from misleading annotations can
  also be reduced by blinding the identity of the data subject.
 - The harm to the **data consumer** from misleading annotations can be
  reduced by allowing them to consume annotations from multiple sources.
 ## Shinigami Eyes, or how designing for fairness can be difficult
 The [Shinigami Eyes][se] browser extension was designed to help people
 establish trust in various web resources using a reputation-based system.
 In general, the author attempted to make thoughtful choices to ensure
 the system was reasonably fair in its design.
 However, the system has [a number of flaws, both technical and social][er],
 which highlight how building systems of trust requires a detailed
 understanding of how the underlying primitives interact and of
 the consequences of those interactions.
   [se]: https://shinigami-eyes.github.io/
   [er]: https://eyereaper.evelyn.moe/
 ### Shinigami Eyes and Blinding
 As already noted, a *fair* reputation-based system must blind the identity
 of the data subject to protect both the data subject and data publisher.
 The approach used by Shinigami Eyes was to use a bloom filter constructed
 with a 32-bit [`FNV-1a` hash][fnv].
   [fnv]: http://www.isthe.com/chongo/tech/comp/fnv/index.html
 The FNV hashes are a family of non-cryptographic hashes, available in
 sizes from 32 up to 1024 bits.
 The original `FNV-1` variant processes each input byte by multiplying
 the current hash value by the designated FNV prime, then XORing the
 result with the byte's value.
 The alternate `FNV-1a` variant swaps these two steps, performing the XOR
 first and the multiplication second, and is the variant used by
 Shinigami Eyes.
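
 As an illustrative sketch, the two orderings of the XOR and multiply
 steps look like this in Python for the 32-bit size. The offset basis and
 prime are the published 32-bit FNV constants; the function names are my
 own, and this is not the code Shinigami Eyes itself ships.

```python
FNV32_OFFSET_BASIS = 0x811C9DC5  # published 32-bit FNV offset basis
FNV32_PRIME = 0x01000193         # published 32-bit FNV prime
MASK32 = 0xFFFFFFFF              # keep the running value within 32 bits


def fnv1_32(data: bytes) -> int:
    """FNV-1: multiply by the prime first, then XOR in the byte."""
    h = FNV32_OFFSET_BASIS
    for byte in data:
        h = (h * FNV32_PRIME) & MASK32
        h ^= byte
    return h


def fnv1a_32(data: bytes) -> int:
    """FNV-1a: XOR in the byte first, then multiply by the prime."""
    h = FNV32_OFFSET_BASIS
    for byte in data:
        h ^= byte
        h = (h * FNV32_PRIME) & MASK32
    return h
```

 Both variants emit only 32 bits, so swapping the step order changes the
 mixing behaviour but does nothing for the size of the output space.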
 The use of a bloom filter is an acceptable blinding method, assuming that
 the underlying hash provides sufficient resolution, such as a 256-bit
 or 512-bit hash.
 Presumably, due to the constraints of having to run as a JavaScript extension,
 the weak 32-bit `FNV-1a` hash was used instead.
 Because of this, while the reputation lists used by Shinigami Eyes were
 acceptably blinded, there was an extremely [high risk of false positives
 caused by hash collisions][collided-account].
   [collided-account]: https://twitter.com/x0s1jpnq2sk2
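
 To get a feel for why 32 bits is too few, the following rough sketch
 estimates collision odds for a uniformly distributed hash. It deliberately
 ignores the bloom filter's own false positive rate, which only makes
 matters worse, and the list sizes used with it are hypothetical rather
 than measured from Shinigami Eyes.

```python
import math


def false_match_probability(list_size: int, hash_bits: int) -> float:
    """Chance that one unrelated identifier hashes onto some listed entry."""
    space = 2 ** hash_bits
    # 1 - (1 - 1/space)^list_size, computed stably for tiny probabilities.
    return -math.expm1(list_size * math.log1p(-1.0 / space))


def birthday_collision_probability(items: int, hash_bits: int) -> float:
    """Approximate chance that any two of `items` hashed values collide."""
    space = 2 ** hash_bits
    # Standard birthday approximation: 1 - exp(-n(n-1) / (2 * space)).
    return -math.expm1(-items * (items - 1) / (2 * space))
```

 With a 32-bit hash, roughly 77,000 hashed identifiers are enough for
 the birthday approximation to give even odds of at least one collision,
 whereas at 256 bits the same calculation yields a probability that is
 negligible for any realistic list size.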
 Concerns about the technical implementation of the Shinigami Eyes extension
 led Datatilsynet, the Norwegian data protection authority, to [give notice of its intent to ban the extension][se-ban]
 at the end of 2021, and development of the extension appears to have
 ended as a result of their initial inquiry.
   [se-ban]: https://www.datatilsynet.no/en/news/2021/varsler-forbud-mot-nettleserutvidelsen-shinigami-eyes-i-norge/
 ## Can we build systems like Shinigami Eyes more robustly?
 The main reason Shinigami Eyes gained the attention of Datatilsynet was
 the centralized nature of its data processing.
 Can we build a system which avoids centralized data processing and promotes
 democratic participation?
 Yes, it is quite easy, but like most things, the challenge will be delivering
 a good user experience.
 ### Leveraging the OCAP model to build a robust solution
 The largest problem in building this system is ensuring that the published
 reputation data is reliably blinded.
 To this end, I propose that a feed be a simple dataset containing a set of
 blinded hashes and their annotations.
 The physical representation of the dataset does not matter, though keeping
 it as simple as possible will expand the number of places where the data
 can be consumed.
 In the Object Capability model, we can think of the physical feed as an
 *object*, and a blinding key as a *capability* to access that object in a
 useful way.
 You have to have both in order for either to be useful.
 A participant can publish multiple copies of their feed, with a different
 blinding key for each friend they wish to share it with, or they can
 choose to publish a single copy and share the same key with every friend,
 or even the public at large.
 Users can then choose which feeds they want to use when making trust
 decisions from the collection of feeds and blinding keys they have been
 given.
 Compared to Shinigami Eyes, this better satisfies the conditions for
 *fairness*: with a sufficiently wide keyed hash the risk of a false
 positive becomes negligible, the contents of the reputation lists remain
 private, and publishers can choose to consent to data sharing requests
 however they wish.
 ### Choosing a reasonable set of primitives
 To build such a system, I would choose `HMAC-SHA3-256` as the blinding
 primitive.
 This provides a good balance between collision protection,
 cryptographic strength, and hash resolution.
 A scheme which provides less than 256 bits of hash resolution should
 be avoided due to the risk of collisions.
 I would distribute the feeds as CSV files.
 This would give users the most flexibility in managing feeds: they
 could distribute different feeds with different meanings, and include
 extended data alongside the blinded hash as a form of annotation.
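
 As a sketch of what such a CSV feed could look like, the following writes
 and reads a two-column file using Python's standard `csv` module. The
 column names `blinded_hash` and `annotation` are assumptions of mine, as
 the post does not prescribe a layout, and a real feed could carry
 further columns.

```python
import csv
import hashlib
import hmac
import io


def blind(key: bytes, identifier: str) -> str:
    """Blind an identifier with HMAC-SHA3-256, returned as hex."""
    return hmac.new(key, identifier.encode("utf-8"), hashlib.sha3_256).hexdigest()


def write_feed(key: bytes, entries: list[tuple[str, str]], out) -> None:
    """Write (identifier, annotation) pairs to `out` as a blinded CSV feed."""
    writer = csv.writer(out)
    writer.writerow(["blinded_hash", "annotation"])  # hypothetical header
    for identifier, annotation in entries:
        writer.writerow([blind(key, identifier), annotation])


def read_feed(src) -> dict[str, str]:
    """Load a CSV feed into a blinded hash -> annotation mapping."""
    return {row["blinded_hash"]: row["annotation"] for row in csv.DictReader(src)}


# Round trip through an in-memory buffer with hypothetical sample data.
key = b"example key shared with a friend"
buffer = io.StringIO()
write_feed(key, [("example.com", "trusted"), ("example.org", "distrusted, see notes")], buffer)
buffer.seek(0)
feed = read_feed(buffer)
```

 Because the file holds only blinded hashes and free-form annotations, it
 can be hosted anywhere, and only holders of the key can test identifiers
 against it.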
 On the client side, I would calculate a blinded hash for each parent
 prefix of the URI, from the full URI all the way up to the parent domain.
 By doing so, it would be possible for a single feed entry to match a
 large number of child URIs instead of having to list them all manually.
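
 One possible way to enumerate those candidates is sketched below. The
 normalization rules (lowercased host, scheme and query dropped) and the
 naive "stop before the last label" handling of parent domains are
 assumptions of mine; a real implementation would likely consult the
 Public Suffix List, and publishers and consumers would have to agree on
 the exact rules.

```python
import hashlib
import hmac
from urllib.parse import urlsplit


def blind(key: bytes, identifier: str) -> str:
    """Blind an identifier with HMAC-SHA3-256, returned as hex."""
    return hmac.new(key, identifier.encode("utf-8"), hashlib.sha3_256).hexdigest()


def candidate_identifiers(uri: str) -> list[str]:
    """List identifiers to test for a URI, most specific first."""
    parts = urlsplit(uri)
    host = (parts.hostname or "").lower()
    if not host:
        return []
    segments = [s for s in parts.path.split("/") if s]
    candidates = []
    # Host plus progressively shorter paths.
    for depth in range(len(segments), 0, -1):
        candidates.append(host + "/" + "/".join(segments[:depth]))
    # The host itself, then each parent domain, stopping before the last label.
    labels = host.split(".")
    for start in range(max(len(labels) - 1, 1)):
        candidates.append(".".join(labels[start:]))
    return candidates


def first_match(feed: dict[str, str], key: bytes, uri: str):
    """Return (identifier, annotation) for the most specific match, or None."""
    for identifier in candidate_identifiers(uri):
        annotation = feed.get(blind(key, identifier))
        if annotation is not None:
            return identifier, annotation
    return None
```

 With this scheme a feed entry for `example.com` also covers
 `https://social.example.com/users/alice/posts`, at the cost of a handful
 of extra HMAC computations per lookup.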
 Implementations should store the learned hashes in a [radix trie][rt].
 Because the hashes have a fixed length, the cost of each lookup is
 bounded by that length and is therefore effectively constant, and the
 trie also allows for automatic bucketing, which can be helpful for
 implementing quorum requirements.
   [rt]: https://en.wikipedia.org/wiki/Radix_tree
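
 The following is a small sketch of such a radix trie keyed by hex-encoded
 hashes. Each terminal node records which feeds listed the hash. Reading
 "quorum" as "at least N distinct feeds agree" is my own interpretation,
 and the class and method names are hypothetical.

```python
def _common_prefix_len(a: str, b: str) -> int:
    """Length of the longest common prefix of two strings."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


class _Node:
    def __init__(self):
        self.children = {}  # first character -> (edge label, child node)
        self.feeds = set()  # names of feeds that listed the key ending here


class HashTrie:
    """Radix (compressed) trie mapping hex hashes to the feeds listing them."""

    def __init__(self):
        self.root = _Node()

    def insert(self, key: str, feed_name: str) -> None:
        node = self.root
        while key:
            edge = node.children.get(key[0])
            if edge is None:
                # No edge shares a prefix: hang the remainder off a new leaf.
                leaf = _Node()
                node.children[key[0]] = (key, leaf)
                node = leaf
                break
            label, child = edge
            common = _common_prefix_len(key, label)
            if common < len(label):
                # Split the edge at the point where key and label diverge.
                middle = _Node()
                middle.children[label[common]] = (label[common:], child)
                node.children[key[0]] = (label[:common], middle)
                child = middle
            node = child
            key = key[common:]
        node.feeds.add(feed_name)

    def lookup(self, key: str) -> frozenset:
        node = self.root
        while key:
            edge = node.children.get(key[0])
            if edge is None or not key.startswith(edge[0]):
                return frozenset()
            key = key[len(edge[0]):]
            node = edge[1]
        return frozenset(node.feeds)

    def meets_quorum(self, key: str, quorum: int) -> bool:
        """True if at least `quorum` distinct feeds listed this hash."""
        return len(self.lookup(key)) >= quorum
```

 Real keys would be the 64-character hex digests produced by the blinding
 step. A lookup walks at most that many characters, regardless of how many
 hashes have been stored.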
 ## Things we can build with this
 The use of friend-to-friend reputation-based systems can be powerful.
 They provide accountability (as you know who you are getting your
 data from) and collaboration (your friends can consume your data in
 exchange).
 They can be used in the way Shinigami Eyes was used: to allow interested
 parties to identify resources they should trust or distrust, but they can
 also be used to enable collaborative blocking amongst friends and system
 administrators.
 They can also be used to determine if e-mail domains or URLs inside e-mails
 are actually trustworthy.
 The possibilities are truly endless.