Introduction
All of this is very work in progress :) plz do not rely on any of the descriptions or statements here, as they are all effectively provisional.
This site describes the implementation of the p2p linked data protocol in {{#cite saundersDecentralizedInfrastructureNeuro2022 }}.
Overview
p2p-ld
Background
- Semweb/Linked Data
- Limitations/differences of existing p2p
Use
- How is this intended to be used? by whom? in what contexts?
Roadmap
- Development roadmap and timeline!
Comparison
All of this is TODO. Comparison to existing protocols and projects (just to situate in context, not talk shit obvs)
"The big ones"
- BitTorrent
- IPFS
"The research ones"
- Dat
- Hypercore
Social
- ActivityPub/Fediverse
- Secure Scuttlebutt
- Matrix
Semweb/LD
- SOLID
- Nanopubs
To be categorized
- Agregore
- Arweave
- CAN
- Chord
- Earthstar
- Freenet
- Manyverse
- P2panda
- SAFE
- Storj
- Swarm
Points of comparison
- not append-only
- metadata
P2P Concepts
Overview of the various concepts that p2p systems have to handle or address with links to the sections where we address them!
- Definitions - Terms used within the protocol spec
- Protocol - The protocol spec itself, which encompasses the following sections and describes how they relate to one another.
- Identity - How each peer in the swarm is identified (or not)
- Discovery - How peers are discovered and connected to in the swarm, or, how an identity is dereferenced into some network entity.
- Data Structures - What data is represented within the protocol, and how
- Querying - How data, or pieces of data, are requested from hosting peers
- Evolvability - How the protocol is intended to accommodate changes, plugins, etc.
p2p-ld also considers these properties, which are not universal to p2p protocols:
- Vocabulary - The linked data vocabulary that is used within the protocol
- Encryption - How individual messages can be encrypted and decrypted by peers
- Federation - How peers can form supra-peer clusters for swarm robustness, social organization, and governance
- Backwards Compatibility - How the protocol integrates with existing protocols and technologies.
Out of Scope
What should explicitly be left out of the protocol?
Implementation
Things that are described in the spec, but details are left up to the implementation
- codecs: the spec describes how to define a codec, but does not include any codecs.
Definitions
Protocol
Connection
- When connecting to a peer, a peer MUST advertise its own connections to other peers whose discoverability permissions allow it
- eg. a peer can desig
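The advertisement rule above could be sketched roughly like this. Everything here is hypothetical and not part of the spec: the `Peer` class, the `discoverable_by` field, and the choice that a missing permission record means "do not advertise" are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Set


@dataclass
class Peer:
    peer_id: str
    # Identifiers of the peers this peer is currently connected to.
    connections: Set[str] = field(default_factory=set)
    # Assumption: None means "discoverable by anyone"; otherwise only the
    # listed identifiers may be told about this peer.
    discoverable_by: Optional[Set[str]] = None


def advertised_connections(me: Peer, known: Dict[str, Peer], requester: str) -> Set[str]:
    """Return the subset of `me.connections` that `requester` may learn about."""
    visible = set()
    for peer_id in me.connections:
        other = known.get(peer_id)
        if other is None:
            continue  # no permission info available, so stay silent
        if other.discoverable_by is None or requester in other.discoverable_by:
            visible.add(peer_id)
    return visible


alice = Peer("alice", connections={"bob", "carol"})
peers = {"bob": Peer("bob"), "carol": Peer("carol", discoverable_by={"dave"})}
print(advertised_connections(alice, peers, "erin"))
```

In this sketch "erin" is only told about "bob", while "dave" would be told about both, since "carol" restricts her discoverability to "dave".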
Requests
Sharding
Backlinks
Every link has an implicit backlink that can be accepted/denied by the owner of the referenced object.
If a link is proposed from a blocked identifier, the proposed link is automatically dropped
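A minimal sketch of the backlink handshake, assuming the owner keeps a block list and a table of proposed backlinks. The `Owner` class, the `LinkState` names, and the method names are all made up for this example.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Dict, Optional, Set, Tuple


class LinkState(Enum):
    PROPOSED = "proposed"
    ACCEPTED = "accepted"
    DENIED = "denied"


@dataclass
class Owner:
    identifier: str
    blocked: Set[str] = field(default_factory=set)
    # (source identifier, referenced object) -> state of the implicit backlink
    backlinks: Dict[Tuple[str, str], LinkState] = field(default_factory=dict)

    def propose(self, source: str, target: str) -> Optional[LinkState]:
        """Record a proposed backlink, or drop it if the source is blocked."""
        if source in self.blocked:
            return None  # automatically dropped, nothing is stored
        self.backlinks[(source, target)] = LinkState.PROPOSED
        return LinkState.PROPOSED

    def decide(self, source: str, target: str, accept: bool) -> LinkState:
        """Accept or deny a previously proposed backlink."""
        key = (source, target)
        if key not in self.backlinks:
            raise KeyError(f"no proposed backlink from {source} to {target}")
        self.backlinks[key] = LinkState.ACCEPTED if accept else LinkState.DENIED
        return self.backlinks[key]


owner = Owner("@user", blocked={"@spammer"})
owner.propose("@friend", "@user:blog")
owner.decide("@friend", "@user:blog", accept=True)
```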
Identity
How is an individual peer identified?
- Cryptographic identity
- Web of trust/shared identity
- External verification/discovery via DNS and other out of band means.
Instances
A given identity can have 0 or many instances - a manifestation of the peer within a particular server and runtime.
Each instance indicates a collection of peers
When connecting to a peer, the peer MUST tell the connecting peer of the instances that are within its permission scope.
Aliases
A given identity can have 0 or many bidirectional links indicating that the identity is sameAs another
- eg. a fediverse account can indicate a cryptographic identity and then be used equivalently.
- Verification aliases MUST have a backlink from the original identity
- Subscribers to a given identity MUST store and represent the known aliases and treat them as equivalent
- Other accounts can give an alias to an identity that MAY be accepted (by issuing a backlink) or denied (by ignoring it).
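One way to picture the alias rules: treat each alias as a directed link, and only count it as verified when the backlink exists too. The function name and the example identifiers are hypothetical, and following aliases transitively is an assumption of this sketch rather than something the spec states.

```python
from typing import Set, Tuple


def verified_aliases(identity: str, links: Set[Tuple[str, str]]) -> Set[str]:
    """Return every identity treated as equivalent to `identity`.

    `links` is a set of directed (source, target) alias links. A link only
    counts when the reverse link (the backlink) is also present.
    """
    equivalent = {identity}
    frontier = [identity]
    while frontier:
        current = frontier.pop()
        for source, target in links:
            is_mutual = (target, source) in links
            if source == current and is_mutual and target not in equivalent:
                equivalent.add(target)
                frontier.append(target)
    return equivalent


links = {
    ("key:abc", "@user@fedi.example"),
    ("@user@fedi.example", "key:abc"),  # backlink, so the alias is verified
    ("@mallory", "key:abc"),            # no backlink, so it is ignored
}
print(verified_aliases("key:abc", links))
```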
Succession
An identity has a specific field indicating whether it is "active" or "retired," and can issue a special top-level link with a given permission scope indicating the identity that succeeds it.
- eg. in the case of harassment, one can hop identities and only tell close friends.
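As a rough sketch of succession, assuming the permission scope is just a set of identifiers allowed to resolve the successor link (the `Identity` class and all of its field and method names are hypothetical):

```python
from dataclasses import dataclass, field
from typing import Optional, Set


@dataclass
class Identity:
    identifier: str
    status: str = "active"  # "active" or "retired"
    successor: Optional[str] = None
    # Assumption: the scope is modelled as a plain set of identifiers.
    successor_scope: Set[str] = field(default_factory=set)

    def retire(self, successor: str, scope: Set[str]) -> None:
        """Mark this identity as retired and record who may learn the successor."""
        self.status = "retired"
        self.successor = successor
        self.successor_scope = set(scope)

    def successor_for(self, requester: str) -> Optional[str]:
        """Resolve the successor link only for requesters inside the scope."""
        if self.status == "retired" and requester in self.successor_scope:
            return self.successor
        return None


old = Identity("key:old")
old.retire("key:new", scope={"@closefriend"})
print(old.successor_for("@closefriend"), old.successor_for("@stranger"))
```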
Beacons
Any peer can operate as a "Pub" (in the parlance of SSB) or a bootstrapping node, where a dereferenceable network location (eg. DNS) can be resolved to a
A given identity can have 0 or many static inbound references that can resolve a network
Discovery
How do we find people and know how to connect to them?
- Bootstrapping initial connections
- Gossiping
- Hole punching
Data Structures
Triplet graphs similar to linked data fragments with envelopes. Decoupling content addressing from versioning
- Merkle DAGs
- Envelopes
- Versioning
- Typed objects with formatting
Containers
- Packets of LD-triplets that contain
- Hash of triplets
- Encryption Info (if applicable)
- Permissions scope
- Signature
- Anything that can be directly referenced without local qualifier is a container.
- Triplets within a container can be referenced with the query syntax
- Containers also behave like "feeds"
- Eg. one might put their blog posts in @user:blog or
- The account identifier is the top-level container.
- Ordering:
- Every triple within a scope is ordered by default by the time it is declared
- A container can declare its ordering (see vocabulary)
- Naming:
- Each container intended to be directly referenced SHOULD contain a name so it can be referenced w.r.t. its parent: @<ACCOUNT>:<name>
- Each container can also be indicated numerically
- Identity: Each container is uniquely identified by the hash of its contents and the hash of the account identifier.
- Format: A container can specify one or several ways it can be displayed
- Capabilities: A container can specify different capabilities that another account can take (eg. "Like", "Upvote", "Reply")
- Capabilities should also contain a permissions scope; if none is present, the global scope is assumed.
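To make the container description a bit more concrete, here is a minimal sketch of a container whose identity is derived from the hash of its contents and the hash of the account identifier. The `Container` class, its field names, the use of SHA-256, and the way the two hashes are combined are all assumptions for illustration. Signing and encryption are left out; `signature` is only a placeholder field.

```python
import hashlib
from dataclasses import dataclass
from typing import Optional, Tuple

Triple = Tuple[str, str, str]


def _sha256(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


@dataclass(frozen=True)
class Container:
    account: str
    triples: Tuple[Triple, ...]       # kept in declaration order
    name: Optional[str] = None
    permissions_scope: str = "global"  # global scope assumed when none is given
    signature: Optional[bytes] = None  # placeholder, never computed here

    def content_hash(self) -> str:
        # A unit separator between terms keeps ("ab", "c") and ("a", "bc") distinct.
        encoded = "\n".join("\x1f".join(triple) for triple in self.triples)
        return _sha256(encoded.encode("utf-8"))

    def identity(self) -> str:
        # Assumption: combine the content hash with the hash of the account identifier.
        account_hash = _sha256(self.account.encode("utf-8"))
        return _sha256((self.content_hash() + account_hash).encode("utf-8"))

    def reference(self) -> Optional[str]:
        # Named containers can be referenced relative to the account.
        return f"@{self.account}:{self.name}" if self.name else None


post = (("post1", "title", "Hello"),)
mine = Container("user", post, name="blog")
theirs = Container("other", post, name="blog")
print(mine.reference(), mine.identity() != theirs.identity())
```

Identical triples under two different accounts yield the same content hash but different container identities in this sketch.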
Triplets
- Triplet format
- Objects require a shortname that can be hierarchically indexed from
- Types/Schema
- Including intrinsic notion of nesting
- every object can have blank/positionally indexed children
- every triple can have blank/positionally indexed "qualifiers" like RDF-star or Wikidata's qualifiers.
Schema
Codecs
See IPLD Codecs and Linked Data Platform spec
Means of interacting with binary data.
Describes
- Format
Versioning
- A given container has an identity hash from its first packing
- A given triple can be contained by
Vocabulary
Imports
owl:sameAs
- for declaring that a given triplet is equivalent to another.
Container
ordering
- how the children are to be ordered
declaration
- makes numerical references stronger, but less predictable.
alphabetic
- makes numerical references weaker, but more predictable
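The tradeoff between the declaration and alphabetic orderings can be illustrated with a small sketch. The function name and the representation of children as (name, declaration time) pairs are hypothetical, and reading "stronger" as "an existing index keeps pointing at the same child" is an interpretation, not something the spec defines.

```python
from typing import List, Tuple


def ordered_children(children: List[Tuple[str, float]], ordering: str = "declaration") -> List[str]:
    """Return child names in the order that numerical references resolve against."""
    if ordering == "declaration":
        # Ordered by the time each child was declared.
        return [name for name, _ in sorted(children, key=lambda child: child[1])]
    if ordering == "alphabetic":
        return sorted(name for name, _ in children)
    raise ValueError(f"unknown ordering: {ordering}")


children = [("zebra", 1.0), ("apple", 2.0)]
later = children + [("mango", 3.0)]

# Under declaration ordering, index 1 still means "apple" after a new child is added.
print(ordered_children(children)[1], ordered_children(later)[1])
# Under alphabetic ordering, index 1 shifts from "zebra" to "mango".
print(ordered_children(children, "alphabetic")[1], ordered_children(later, "alphabetic")[1])
```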
Social
- Containers of other accounts
- proxy identities: a given identity can specify a collection of alts that can only be resolved with the correct permission scope - so eg. a public account that is stable can be linked to by an abusive user, but they won't be able to resolve a more private alt.
- Peer Relationship Types
- Other peers can be given special roles that allow them to operate on behalf of the peer in mutually independent ways:
- Keybearer - also share a given private key,
- Visibility
- A peer can indicate that it is visible to a given scope as defined by a collection of peers and associated rules.
- eg. a "close friends" collection could be given the visibility rule to make a peer visible to n-deep friends of friends.
- A
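The "n-deep friends of friends" visibility rule could be computed with a breadth-first walk over the friend graph. This is only a sketch: the function name and the dictionary-of-sets representation of friendships are assumptions.

```python
from collections import deque
from typing import Dict, Set


def visible_to(owner: str, friends: Dict[str, Set[str]], depth: int) -> Set[str]:
    """Return the peers within `depth` friendship hops of `owner`."""
    seen = {owner}
    frontier = deque([(owner, 0)])
    while frontier:
        peer, distance = frontier.popleft()
        if distance == depth:
            continue  # do not expand past the allowed depth
        for friend in friends.get(peer, set()):
            if friend not in seen:
                seen.add(friend)
                frontier.append((friend, distance + 1))
    seen.discard(owner)
    return seen


friends = {"a": {"b"}, "b": {"c"}, "c": {"d"}}
print(visible_to("a", friends, depth=2))
```

With a depth of 2, "a" is visible to "b" and "c" but not to "d".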
Querying
How do we find peers that have subgraphs that are responsive to what we want?
Syntax
Location
How to refer to a given container, eg.
@user:containerName:childName
or numerically
@user:containerName:{0}
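A small parser for the two reference forms above might look like the following. It assumes that segments are separated by ":" and that a numerical segment is written as digits in curly braces. The function name is hypothetical, and account identifiers that themselves contain ":" are not handled.

```python
import re
from typing import List, Tuple, Union

_INDEX = re.compile(r"^\{(\d+)\}$")


def parse_location(ref: str) -> Tuple[str, List[Union[str, int]]]:
    """Split a reference into its account and a path of names or numeric indices."""
    if not ref.startswith("@"):
        raise ValueError("a location must start with '@'")
    account, *segments = ref[1:].split(":")
    if not account:
        raise ValueError("missing account identifier")
    path: List[Union[str, int]] = []
    for segment in segments:
        index = _INDEX.match(segment)
        if index:
            path.append(int(index.group(1)))  # numerical reference like {0}
        elif segment:
            path.append(segment)              # named reference
        else:
            raise ValueError("empty path segment")
    return account, path


print(parse_location("@user:containerName:childName"))
print(parse_location("@user:containerName:{0}"))
```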
Children
Version
How to refer to a specific version of a container
References without version qualification indicate the most recent version at the time of containerizing the links.
Query Fragments
Using blank subgraphs to specify queries
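One possible reading of "blank subgraphs" is a set of triple patterns whose blank positions act as variables, similar to a basic graph pattern. The sketch below takes that reading; the "?name" variable convention and the function name are assumptions, not protocol syntax.

```python
from typing import Dict, List, Optional, Sequence, Tuple

Triple = Tuple[str, str, str]


def match_fragment(
    patterns: Sequence[Triple],
    triples: Sequence[Triple],
    binding: Optional[Dict[str, str]] = None,
) -> List[Dict[str, str]]:
    """Return every variable binding under which all patterns match some triple."""
    binding = binding or {}
    if not patterns:
        return [binding]
    first, rest = patterns[0], patterns[1:]
    results: List[Dict[str, str]] = []
    for triple in triples:
        candidate = dict(binding)
        matched = True
        for term, value in zip(first, triple):
            if term.startswith("?"):
                # A blank: bind it, or check it against an earlier binding.
                if candidate.setdefault(term, value) != value:
                    matched = False
                    break
            elif term != value:
                matched = False
                break
        if matched:
            results.extend(match_fragment(rest, triples, candidate))
    return results


triples = [
    ("@user:blog", "contains", "post1"),
    ("post1", "title", "Hello"),
    ("post2", "title", "Other"),
]
query = [("@user:blog", "contains", "?post"), ("?post", "title", "?title")]
print(match_fragment(query, triples))
```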
Encryption
How can we make it possible to have a protocol that is "open" when it is intended to, but also protects privacy and consent when we need it to?
Federation
Making supra-peer clusters with explicit governance and policies for rehosting and sharing!
- Creating federations of peers
Sharding
Splitting data across multiple peers within a federation
Moderation
Federations MUST maintain a list of
Backwards Compatibility
- HTTP
- BitTorrent
- IPFS
- ActivityPub
HTTP Servers
- Using existing HTTP servers as web-seed like things.
- Use codecs to indicate the format and metadata of existing files
- Use HTTP servers as backup mirrors that behave like peers, and how peers can indicate them as mirrors for a given container
BitTorrent
IPFS
ActivityPub
- Mappings:
- Container <-> feed
Evolvability
Sketchpad
System Diagram
Just a stub to check if mermaid works
erDiagram
IDENTITY {
string hash
}
INSTANCE {
string ip
string client
}
BEACON {
string uri
}
IDENTITY ||--o{ INSTANCE : runs
BEACON }o--|{ INSTANCE : links
BEACON }o--|| IDENTITY : represents
Graph Data Model
- Triplets
- Containers
- Codecs