diff --git a/src/comparison/activitypub.md b/src/comparison/activitypub.md deleted file mode 100644 index 9a83a0d..0000000 --- a/src/comparison/activitypub.md +++ /dev/null @@ -1 +0,0 @@ -# ActivityPub diff --git a/src/comparison/bittorrent.md b/src/comparison/bittorrent.md deleted file mode 100644 index 2fb68a2..0000000 --- a/src/comparison/bittorrent.md +++ /dev/null @@ -1 +0,0 @@ -# BitTorrent diff --git a/src/comparison/dmc.md b/src/comparison/data/dmc.md similarity index 100% rename from src/comparison/dmc.md rename to src/comparison/data/dmc.md diff --git a/src/comparison/eris.md b/src/comparison/data/eris.md similarity index 100% rename from src/comparison/eris.md rename to src/comparison/data/eris.md diff --git a/src/comparison/data/index.md b/src/comparison/data/index.md new file mode 100644 index 0000000..b7ac6f9 --- /dev/null +++ b/src/comparison/data/index.md @@ -0,0 +1,9 @@ +# Data Structures + +```{toctree} +:caption: Data Structures +:maxdepth: 1 + +eris +dmc +``` \ No newline at end of file diff --git a/src/comparison/hypercore.md b/src/comparison/hypercore.md deleted file mode 100644 index a94ede7..0000000 --- a/src/comparison/hypercore.md +++ /dev/null @@ -1,8 +0,0 @@ -# Dat/Hypercore - - -```{index} Hypercore; Holepunch -``` -## Holepunch - -https://docs.holepunch.to/ diff --git a/src/comparison/index.md b/src/comparison/index.md index 585eb10..8b5043b 100644 --- a/src/comparison/index.md +++ b/src/comparison/index.md @@ -4,40 +4,14 @@ All of this is TODO. 
Comparison to existing protocols and projects (just to situate in context, not talk shit obvs) ```{toctree} -:caption: P2P +:maxdepth: 2 -bittorrent -ipfs -hypercore -spritely +p2p/index +social/index +ld/index +data/index ``` -```{toctree} -:caption: Social - -activitypub -ssb -matrix -at_protocol -nostr -xmpp -``` - -```{toctree} -:caption: Linked Data - -solid -ld_fragments -ld_platform -nanopubs -``` - -```{toctree} -:caption: Data Structures - -eris -dmc -``` ## To be categorized @@ -47,7 +21,6 @@ dmc - Chord - Earthstar - https://earthstar-project.org/ - Freenet -- Manyverse - P2panda - v interesting! https://p2panda.org/ - SAFE - Storj diff --git a/src/comparison/ld/index.md b/src/comparison/ld/index.md new file mode 100644 index 0000000..21dedde --- /dev/null +++ b/src/comparison/ld/index.md @@ -0,0 +1,11 @@ +# Linked Data + +```{toctree} +:caption: Linked Data +:maxdepth: 1 + +solid +ld_fragments +ld_platform +nanopubs +``` \ No newline at end of file diff --git a/src/comparison/ld_fragments.md b/src/comparison/ld/ld_fragments.md similarity index 87% rename from src/comparison/ld_fragments.md rename to src/comparison/ld/ld_fragments.md index ea51385..839a745 100644 --- a/src/comparison/ld_fragments.md +++ b/src/comparison/ld/ld_fragments.md @@ -9,7 +9,7 @@ ## Summary -[Linked data fragments](https://linkeddatafragments.org/publications/) are designed to "fill in the middle" between entirely serverside ({index}`SPARQL`) or clientside (downloading a triple store) usage of linked data triples. SPARQL queries are notorious for being resource intensive, as queries can become much more complex than typical relational algebra and the server needs to resolve a potentially enormous number of resources. Placing all the logic on the server, rather than the client, is an architectural decision that has a complex history, but descends from the idea that the web should work by having "agents" that work on the web on our behalf[^semwebagents]. 
+[Linked data fragments](https://linkeddatafragments.org/publications/) are designed to "fill in the middle" between entirely serverside ({index}`SPARQL `) or clientside (downloading a triple store) usage of linked data triples. SPARQL queries are notorious for being resource intensive, as queries can become much more complex than typical relational algebra and the server needs to resolve a potentially enormous number of resources. Placing all the logic on the server, rather than the client, is an architectural decision that has a complex history, but descends from the idea that the web should work by having "agents" that work on the web on our behalf[^semwebagents]. Linked data fragments (LDFs) split the difference by placing more of the work on clients, with the server providing {index}`pre-computed sets of triples ` for a given selector. "Selector" is a purposefully general concept, but the LDF authors focus primarily on [Triple Pattern Fragments](https://linkeddatafragments.org/specification/triple-pattern-fragments/) that are composed of: diff --git a/src/comparison/ld_platform.md b/src/comparison/ld/ld_platform.md similarity index 100% rename from src/comparison/ld_platform.md rename to src/comparison/ld/ld_platform.md diff --git a/src/comparison/nanopubs.md b/src/comparison/ld/nanopubs.md similarity index 100% rename from src/comparison/nanopubs.md rename to src/comparison/ld/nanopubs.md diff --git a/src/comparison/solid.md b/src/comparison/ld/solid.md similarity index 100% rename from src/comparison/solid.md rename to src/comparison/ld/solid.md diff --git a/src/comparison/p2p/bittorrent.md b/src/comparison/p2p/bittorrent.md new file mode 100644 index 0000000..edbc924 --- /dev/null +++ b/src/comparison/p2p/bittorrent.md @@ -0,0 +1,111 @@ +```{index} pair: Protocol; BitTorrent +``` +(BitTorrent)= +# BitTorrent + +Bittorrent is unarguably the most successful p2p protocol to date, and needless to say we have much to learn walking in its footsteps. 
+ +## Summary + +There are a number of very complete explanations of BitTorrent as a protocol, so we don't attempt one here outside of giving an unfamiliar reader a general sense of how it works. + +### Torrents + +Data is shared on BitTorrent in units described by `.torrent` files. They are [bencoded](https://en.wikipedia.org/wiki/Bencode) dictionaries that contain the following fields (in Bittorrent v1): + +- `announce`: The URL of one or several trackers (described below) +- `info`: A dictionary which includes metadata that describes the included file(s) and their length. The files are concatenated and then split into fixed-size pieces, and the info dict contains the SHA-1 hash of each piece. + +For example, a directory of three random files has a (decoded) `.torrent` file that looks like this: + +```json +{ + "announce": "http://example.tracker.com:8080/announce", + "info":{ + "files":[ + { + "length": 204800, + "path":["random-file3"] + }, + { + "length": 51200, + "path": ["random-file2"] + }, + { + "length": 102400, + "path":["random-file"] + } + ], + "name": "random", + "piece length": 16384, + "pieces": "" + } +} +``` + +The contents of a torrent file are then uniquely indexed by the `infohash`, which is the hash of the entire (bencoded) `info` dictionary. {key}`Magnet Links ` are an abbreviated form of the `.torrent` file that contain only the info-hash, which allows downloading peers to request and independently verify the rest of the info dictionary and start downloading without a complete `.torrent`. + +A generic magnet link looks like: + +`magnet:?xt=urn:btih:&dn=&tr=` + +BitTorrent v2 extends traditional `.torrent` files to include a {index}`Merkle Tree` which generalizes the traditional piece structure with some nice properties like being able to recognize unique files across multiple `.torrent`s, etc. 
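As a rough illustration of the Torrents section above, the following Python sketch hashes a payload into fixed-size pieces, bencodes an `info` dictionary, takes its SHA-1 as the (v1) infohash, and assembles a magnet link from it. Everything named here (`bencode`, `piece_hashes`, `infohash`, `magnet_link`, and the sample payload) is hypothetical and written only for this example; a real client would use a maintained bencode library and hash the exact bytes of the `info` dict as they appear in the `.torrent`.

```python
import hashlib
from urllib.parse import quote


def bencode(value) -> bytes:
    """Minimal bencoder for ints, strings/bytes, lists, and dicts (illustrative only)."""
    if isinstance(value, bool):
        raise TypeError("bencode has no boolean type")
    if isinstance(value, int):
        return b"i%de" % value
    if isinstance(value, str):
        value = value.encode("utf-8")
    if isinstance(value, bytes):
        return b"%d:%s" % (len(value), value)
    if isinstance(value, list):
        return b"l" + b"".join(bencode(v) for v in value) + b"e"
    if isinstance(value, dict):
        # Dictionary keys are byte strings and must be emitted in sorted order.
        items = [(k.encode("utf-8") if isinstance(k, str) else k, v) for k, v in value.items()]
        items.sort(key=lambda kv: kv[0])
        return b"d" + b"".join(bencode(k) + bencode(v) for k, v in items) + b"e"
    raise TypeError(f"cannot bencode {type(value)!r}")


def piece_hashes(data: bytes, piece_length: int) -> bytes:
    """Concatenate the 20-byte SHA-1 digest of each fixed-size piece."""
    return b"".join(
        hashlib.sha1(data[i:i + piece_length]).digest()
        for i in range(0, len(data), piece_length)
    )


def infohash(info: dict) -> str:
    """Hex SHA-1 of the bencoded info dictionary."""
    return hashlib.sha1(bencode(info)).hexdigest()


def magnet_link(info: dict, tracker: str) -> str:
    """Build a magnet URI from the infohash, display name, and one tracker URL."""
    return (
        f"magnet:?xt=urn:btih:{infohash(info)}"
        f"&dn={quote(info['name'])}"
        f"&tr={quote(tracker, safe='')}"
    )


# Stand-in for a single file's contents; the values are made up for the example.
payload = b"a" * 40000
info = {
    "name": "random-file",
    "length": len(payload),
    "piece length": 16384,
    "pieces": piece_hashes(payload, 16384),
}
link = magnet_link(info, "http://example.tracker.com:8080/announce")
```

Because the infohash covers the whole `info` dictionary, and the `info` dictionary covers every piece hash, a peer that starts with only the magnet link can verify first the metadata and then each piece it receives.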
+ +### Trackers + +To connect peers that might have or be interested in the contents of a given `.torrent` file, the `.torrent` (but not its contents) is uploaded to a {index}`Tracker `. Peers interested in downloading a `.torrent` will connect to the trackers that it indicates in its `announce`[^announcelist] metadata, and the trackers will return a list of peer IP:Port combinations that the peer can download the file from. The downloading (leeching) peer doesn't need to trust the uploading (seeding) peers that the data they are sending is what is specified by the `.torrent`: the client checks the computed hash of each received piece against the hashes in the info dict, which is in turn checked against the info hash. + +Trackers solve the problem of {index}`Discovery` by giving a clear point where peers can find other peers from only the information contained within the `.torrent` itself. Trackers introduce a degree of brittleness, however, as they can become a single point of failure. Additional means of discovering peers have been added to BitTorrent over time, including [{index}`Distributed Hash Tables `](http://www.bittorrent.org/beps/bep_0005.html) and [Peer Exchange](http://www.bittorrent.org/beps/bep_0011.html). + +Beyond their technical role, BitTorrent trackers also form a **social space** that is critical to understanding its success as a protocol. While prior protocols like {index}`Gnutella ` (of {index}`Limewire ` fame) and FastTrack (of {index}`Kazaa ` fame) had integrated search and peer discovery into the client and protocol itself, separating trackers as a means of organizing the BitTorrent ecosystem has allowed them to flourish as a means of experimenting with the kinds of social organization that keep p2p swarms healthy. Tracker communities range from huge and disconnected as in widely-known public trackers like ThePirateBay, to tiny and close-knit like some niche private trackers. + +The bifurcated tracker/peer structure makes the overall system remarkably *resilient*.
The trackers don't host any infringing content themselves, they just organize the metadata for finding it, so they are relatively long-lived and inexpensive to start compared to more resource- and risk-intensive piracy vectors. If they are shut down, the peers can continue to share amongst themselves through DHT, Peer Exchange, and any other trackers that are specified in the `.torrent` files. When a successor pops up, the members of the old tracker can then re-collect the `.torrent` files from the prior site, and without needing a massive re-upload of data to a centralized server repopulate the new site. + +```{seealso} +See more detailed discussion re: lessons from BitTorrent Trackers for social infrastructure in "[Archives Need Communities](https://jon-e.net/infrastructure/#archives-need-communities)" in {cite}`saundersDecentralizedInfrastructureNeuro2022` +``` + +### Protocol + +Peers that have been referred to one another from a tracker or other means start by attempting to make a connection with a 'handshake' that specifies the peer is connecting with BitTorrent and any other protocol extensions it supports. + +There are a number of subtleties in the transfer protocol, but it can be broadly summarized as a series of steps where peers tell each other which pieces they have and which they are interested in, and then share them amongst themselves. + +Though not explicitly in the protocol spec, two prominent design decisions are worth mentioning (See eg. {cite}`legoutRarestFirstChoke2006` for additional discussion). + +- **Peer Selection:** Which peers should I spend finite bandwidth uploading to? BitTorrent uses a variety of **Choke** algorithms that reward peers that reciprocate bandwidth.
Choke algorithms are typically some variant of a 'tit-for-tat' strategy, although rarely the strict bitwise tit-for-tat favored by later blockchain systems and others that require a peer to upload an equivalent amount to what they have downloaded before they are given any additional pieces. Contrast this with [{index}`BitSwap`](#BitSwap) from IPFS. It is by *not* perfectly optimizing peer selection that BitTorrent is better capable of using more of its available network resources. +- **Piece Selection:** Which pieces should be uploaded/requested first? BitTorrent uses a **Rarest First** strategy, where a peer keeps track of the number of copies of each piece present in the swarm, and preferentially seeds the rarest pieces. This keeps the swarm healthy, rewarding keeping and sharing complete copies of files. This is in contrast to, eg. [SWARM](#SWARM) which explicitly rewards hosting and sharing the most in-demand pieces. + + +## Lessons + +(This section is mostly a scratchpad at the moment) + +### Adopt + +- Eventually had to add a generic 'extension extension' ([BEP 10](http://www.bittorrent.org/beps/bep_0010.html)), where on initial connection a peer informs another peer what extra features of the protocol it supports without needing to make constant adjustment to the underlying BitTorrent protocol. This pattern is adopted by most p2p protocols that follow, including [Nostr](#Nostr) which is almost *entirely* extensions. + - These extensions are not self-describing, however, and they require some centralized registry of extensions, see also [IPFS](#IPFS) and its handling of codecs, which curiously build a lot of infrastructure for self-describing extensions but at the very last stage fall back to a single git repository as the registry. +- `.torrent` files make for a very **low barrier to entry** and are extremely **portable.** They also operate over the existing idioms of files and folders, rather than creating their own filesystem abstraction. 
+- Explicit peer and piece selection algorithms are left out of the protocol specification, allowing individual implementations to experiment with what works. This makes it possible to exploit the protocol by never seeding, but this rarely occurs in practice, as people are not the complete assholes imagined in worst-case scenarios of scarcity. Indeed even the most selfish peers have the intrinsic incentive to upload, as by aggressively seeding the pieces that a leeching peer already has, the other peers in the swarm are less likely to "waste" the bandwidth of the seeders and more bandwidth can be allocated to pieces that the leecher doesn't already have. + + +### Adapt + +- **Metadata**. Currently all torrent metadata is contained within the tracker, so while it is possible to restore all the files that were indexed by a downed tracker, it is very difficult to restore all the metadata at a torrent level and above, eg. the organization of specific torrents into hierarchical categories that allow one to search for an artist, all the albums they have produced, all the versions of that album in different file formats, and so on. +- Give more in-protocol tools to social systems. This is tricky because we don't necessarily need to go down the road of DAOs and make strictly enforceable contracts. Recall that it is precisely by relaxing conditions of "optimality" that BitTorrent makes use of all resources available. +- **Cross-Swarm Indexing** - BitTorrent organizes all peer connections within swarms that are particular to a given `.torrent` file. We instead want a set of socially connected peers to be able to share many files. +- **Anonymity** This is also a tricky balance - We want to do three things that are potentially in conflict: + 1. Make use of the social structure of our peer swarm to be able to allocate automatic rehosting/sharding of files uploaded by close friends, etc. + 2.
Maintain the possibility for loose anonymity where peers can share files without needing a large and well-connected social system to share files to them + 3. Avoid significant performance penalties from guarantees of strong network-level anonymity like Tor. +- **Trackers** are a good idea, even if they could use some updating. It is good to have an explicit entrypoint specified with a distributed, social mechanism rather than prespecified as a hardcoded entry point. It is a good idea to make a clear space for social curation of information, rather than something that is intrinsically bound to a torrent at the time of uploading. We update the notion of trackers with [Peer Federations](#Peer-Federations). + +## References + +- Bittorrent Protocol Specification (BEP 3): http://www.bittorrent.org/beps/bep_0003.html +- Bittorrent v2 (BEP 52): http://www.bittorrent.org/beps/bep_0052.html +- Magnet Links (BEP 9): http://www.bittorrent.org/beps/bep_0009.html +- More on BitTorrent and incentives - {cite}`cohenIncentivesBuildRobustness2003` + + +[^announcelist]: Or, properly, in the `announce-list` per ([BEP 12](http://www.bittorrent.org/beps/bep_0012.html)) \ No newline at end of file diff --git a/src/comparison/p2p/hypercore.md b/src/comparison/p2p/hypercore.md new file mode 100644 index 0000000..bc12415 --- /dev/null +++ b/src/comparison/p2p/hypercore.md @@ -0,0 +1,65 @@ +# Dat/Hypercore + +Hypercore, originally known as the Dat protocol {cite}`ogdenDatDistributedDataset2017`, and apparently now known as HolePunch, is a p2p protocol designed for versioned transfer of large files. + +## Summary + +- **Merkle Trees** - The underlying data model is a tree! + - Specifically an ordered tree +- **Version Controlled** - including incremental versioning +- **Sparse Replication** - Like bittorrent, it is possible to only download part of a given dataset. 
+- **Encrypted** transfer +- **Discovery** - Multiple mechanisms + - DNS name servers + - Multicast DNS + - Kademlia DHT + +### SLEEP + +Data structure that supports traversing dat graphs + +### Protocol + +Message container format: + +``` + + + +``` + +Header consists of +- **type** - + - 0 - `feed` + - 1 - `handshake` + - 2 - `info` - state changes, like changing from uploading to downloading + - 3 - `have` - telling other peers what chunks we have + - 4 - `unhave` - you deleted something you used to have + - 5 - `want` - tell me when you `have` these chunks + - 6 - `unwant` - I no longer want these! + - 7 - `request` - Get a single chunk of specifically indexed data. + - 8 - `cancel` - nevermind + - 9 - `data` - actually send/receive a chunk! +- **channel** - 0 for metadata, 1 for content + +## Lessons + +### Adopt + +- Using hashes of public keys during discovery rather than the public keys themselves. Avoids needing a bunch of key rotations. +- Use per-file hashing (as per BitTorrent v2 as well) + +### Adapt + +- Identities as cryptographic keys is great, but need some means of giving them petnames/shortnames. +- Tree-only data structures make everything append-only! +- The Random Access properties are really neat! (being able to read a specific 100MB chunk within a CSV) but they come with some tradeoffs! 
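Returning to the header fields listed under Protocol above, the message types and channels can be written down as enumerations. This is a minimal sketch: `MessageType`, `Channel`, `pack_header`, and `unpack_header` are hypothetical names, and the bit layout (type in the low 4 bits, channel in the bits above it) is an assumption made for the example, not something this page specifies. Check the Dat paper {cite}`ogdenDatDistributedDataset2017` before relying on it.

```python
from enum import IntEnum
from typing import Tuple


class MessageType(IntEnum):
    """Message types, numbered as in the list under Protocol."""
    FEED = 0
    HANDSHAKE = 1
    INFO = 2
    HAVE = 3
    UNHAVE = 4
    WANT = 5
    UNWANT = 6
    REQUEST = 7
    CANCEL = 8
    DATA = 9


class Channel(IntEnum):
    METADATA = 0
    CONTENT = 1


# Assumption for this sketch: ten types fit in 4 bits, so the type takes the low nibble.
TYPE_BITS = 4
TYPE_MASK = (1 << TYPE_BITS) - 1


def pack_header(channel: Channel, msg_type: MessageType) -> int:
    """Combine channel and type into one integer header."""
    return (int(channel) << TYPE_BITS) | int(msg_type)


def unpack_header(header: int) -> Tuple[Channel, MessageType]:
    """Split an integer header back into its channel and type."""
    return Channel(header >> TYPE_BITS), MessageType(header & TYPE_MASK)


header = pack_header(Channel.CONTENT, MessageType.REQUEST)
channel, msg_type = unpack_header(header)
```

Keeping the type numbers in one enumeration makes the `have`/`unhave` and `want`/`unwant` pairs easy to see, and gives a single place to extend if further message types are added.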
+ +### Ignore + + +```{index} Hypercore; Holepunch +``` +## Holepunch + +https://docs.holepunch.to/ diff --git a/src/comparison/p2p/index.md b/src/comparison/p2p/index.md new file mode 100644 index 0000000..b03880d --- /dev/null +++ b/src/comparison/p2p/index.md @@ -0,0 +1,11 @@ +# P2P + +```{toctree} +:caption: P2P +:maxdepth: 1 + +bittorrent +ipfs +hypercore +spritely +``` \ No newline at end of file diff --git a/src/comparison/ipfs.md b/src/comparison/p2p/ipfs.md similarity index 65% rename from src/comparison/ipfs.md rename to src/comparison/p2p/ipfs.md index 0084900..0b28d28 100644 --- a/src/comparison/ipfs.md +++ b/src/comparison/p2p/ipfs.md @@ -1,9 +1,26 @@ ```{index} IPFS ``` +(IPFS)= # IPFS -If IPFS is {index}`BitTorrent` + {index}`git`, and {key}`ActivityPub` is {key}`Distributed Messaging` + {key}`Linked Data`, then p2p-ld is IPFS + ActivityPub. We build on IPFS and are heavily inspired by its design and shortcomings revealed by practical use. +If IPFS is {index}`BitTorrent` + {index}`git`, and {index}`ActivityPub` is {index}`Distributed Messaging` + {index}`Linked Data`, then p2p-ld is IPFS + ActivityPub. We build on IPFS and are heavily inspired by its design and shortcomings revealed by practical use. +## Summary + +```{index} IPFS; BitSwap +``` +(BitSwap)= +### BitSwap + +```{index} IPFS; IPLD +``` +(IPLD)= +### IPLD + +```{index} IPFS; libp2p +``` +(libp2p)= +### libp2p ## Problems @@ -13,8 +30,6 @@ If IPFS is {index}`BitTorrent` + {index}`git`, and {key}`ActivityPub` is {key}`D - Trust! eg. its use in phishing attacks is because there is no way to know who the hell a given CID is owned by. It needs to be possible to do social curation, or at leats know when something is riskier or not. - Lack of metadata means having to build a lot of shit post-hoc, like IPLD and multihashes and codecs and whatnot. 
-## IPLD - ## Overlap - {index}`Merkle DAG`s diff --git a/src/comparison/spritely.md b/src/comparison/p2p/spritely.md similarity index 100% rename from src/comparison/spritely.md rename to src/comparison/p2p/spritely.md diff --git a/src/comparison/social/activitypub.md b/src/comparison/social/activitypub.md new file mode 100644 index 0000000..073a871 --- /dev/null +++ b/src/comparison/social/activitypub.md @@ -0,0 +1,2 @@ +(activitypub)= +# ActivityPub diff --git a/src/comparison/at_protocol.md b/src/comparison/social/at_protocol.md similarity index 73% rename from src/comparison/at_protocol.md rename to src/comparison/social/at_protocol.md index 5606e8a..46fea8b 100644 --- a/src/comparison/at_protocol.md +++ b/src/comparison/social/at_protocol.md @@ -14,3 +14,15 @@ Specifically, AT protocol differentiates between *handles* and *identities*, whe That's about it, the rest of the handling of DID's is extremely centralized (see [did:plc](https://atproto.com/specs/did-plc) which requires resolution against a single domain), and the requirement of all posts to be funneled through [Big Graph Services](https://blueskyweb.xyz/blog/5-5-2023-federation-architecture) rather than directly peer to peer is transparently designed to ensure a marketing and advertising layer in between actors in the network. +## Lessons + +### Adopt + +### Adapt + +- Using Domains as identity is great! the PLC method is not so great! We should use domains as a way of bootstrapping nodes into the network, giving people some extrinsic means of discovering the active peers within their identity, and also a means of distributed bootstrapping into the network. 
+ +### Ignore + + + diff --git a/src/comparison/social/index.md b/src/comparison/social/index.md new file mode 100644 index 0000000..f1223d3 --- /dev/null +++ b/src/comparison/social/index.md @@ -0,0 +1,13 @@ +# Social + +```{toctree} +:caption: Social +:maxdepth: 1 + +activitypub +ssb +matrix +at_protocol +nostr +xmpp +``` \ No newline at end of file diff --git a/src/comparison/matrix.md b/src/comparison/social/matrix.md similarity index 100% rename from src/comparison/matrix.md rename to src/comparison/social/matrix.md diff --git a/src/comparison/nostr.md b/src/comparison/social/nostr.md similarity index 96% rename from src/comparison/nostr.md rename to src/comparison/social/nostr.md index d32033a..b4c0772 100644 --- a/src/comparison/nostr.md +++ b/src/comparison/social/nostr.md @@ -1,3 +1,4 @@ +(Nostr)= # Nostr Again, though we have a general distrust of the anarcho-capitalists, it's worth a comparison. diff --git a/src/comparison/ssb.md b/src/comparison/social/ssb.md similarity index 90% rename from src/comparison/ssb.md rename to src/comparison/social/ssb.md index ad4a844..d455879 100644 --- a/src/comparison/ssb.md +++ b/src/comparison/social/ssb.md @@ -1,5 +1,7 @@ -```{index} Protocol; Secure Scuttlebutt +```{index} pair: Protocol; Secure Scuttlebutt + single: Secure Scuttlebutt ``` +(SSB)= # Secure Scuttlebutt @@ -87,6 +89,12 @@ Uses for metafeeds - Device B posts a `proof-of-key` message - If device B lost, `tombstone` the fusion identity message +## Implementations + +```{index} Secure Scuttlebutt; Manyverse +``` +### Manyverse + ## References - https://ssbc.github.io/scuttlebutt-protocol-guide/ \ No newline at end of file diff --git a/src/comparison/xmpp.md b/src/comparison/social/xmpp.md similarity index 100% rename from src/comparison/xmpp.md rename to src/comparison/social/xmpp.md diff --git a/src/conf.py b/src/conf.py index c42def6..5b63a68 100644 --- a/src/conf.py +++ b/src/conf.py @@ -71,19 +71,20 @@ bibtex_default_style = 'bbibtex' mermaid_init_js 
= """ mermaid.initialize({ "startOnLoad":true, - "theme": "base", - "themeVariables": { - "darkMode": true, - "primaryColor": "#202020", - "primaryBorderColor": "#00A5CF", - "primaryTextColor": "#FFFFFF", - "secondaryColor": "#ffffff", - "mainBkg": "#30303000", - "lineColor": "#999999" - } + "theme": "dark" }) """ + # "themeVariables": { + # "darkMode": true, + # "primaryColor": "#202020", + # "primaryBorderColor": "#00A5CF", + # "primaryTextColor": "#FFFF00", + # "secondaryColor": "#ff0000", + # "mainBkg": "#303030", + # "lineColor": "#999999" + # } + ## Formatting to handle dates that are in the `date` field rather than `year` import re import pybtex.plugin diff --git a/src/federation.md b/src/federation.md index c2654fe..45f2592 100644 --- a/src/federation.md +++ b/src/federation.md @@ -1,3 +1,4 @@ +(Peer-Federations)= # Federation Making supra-peer clusters with explicit governance and policies for rehosting and sharing! diff --git a/src/p2p_ld_docs.bib b/src/p2p_ld_docs.bib index ab7c76a..57526df 100644 --- a/src/p2p_ld_docs.bib +++ b/src/p2p_ld_docs.bib @@ -1,3 +1,33 @@ +@online{cohenIncentivesBuildRobustness2003, + title = {Incentives {{Build Robustness}} in {{BitTorrent}}}, + author = {Cohen, Bram}, + date = {2003-05-22}, + url = {http://bittorrent.org/bittorrentecon.pdf}, + abstract = {The BitTorrent file distribution system uses tit-for-tat as a method of seeking pareto efficiency. It achieves a higher level of robustness and resource utilization than any currently known cooperative technique.
We explain what BitTorrent does, and how economic methods are used to achieve that goal.}, + archive = {https://web.archive.org/web/20230619231854/http://bittorrent.org/bittorrentecon.pdf}, + langid = {english}, + pubstate = {preprint}, + keywords = {archived}, + file = {/Users/jonny/Dropbox/papers/zotero/C/CohenB/cohen_2003_incentives_build_robustness_in_bittorrent.pdf} +} + +@article{danielIPFSFriendsQualitative2022, + title = {{{IPFS}} and {{Friends}}: {{A Qualitative Comparison}} of {{Next Generation Peer-to-Peer Data Networks}}}, + shorttitle = {{{IPFS}} and {{Friends}}}, + author = {Daniel, Erik and Tschorsch, Florian}, + date = {2022}, + journaltitle = {IEEE Communications Surveys \& Tutorials}, + volume = {24}, + number = {1}, + pages = {31--52}, + issn = {1553-877X}, + doi = {10.1109/COMST.2022.3143147}, + abstract = {Decentralized, distributed storage offers a way to reduce the impact of data silos as often fostered by centralized cloud storage. While the intentions of this trend are not new, the topic gained traction due to technological advancements, most notably blockchain networks. As a consequence, we observe that a new generation of peer-to-peer data networks emerges. In this survey paper, we therefore provide a technical overview of the next generation data networks. We use select data networks to introduce general concepts and to emphasize new developments. Specifically, we provide a deeper outline of the Interplanetary File System and a general overview of Swarm, the Hypercore Protocol, SAFE, Storj, and Arweave. We identify common building blocks and provide a qualitative comparison. 
From the overview, we derive future challenges and research goals concerning data networks.}, + eventtitle = {{{IEEE Communications Surveys}} \& {{Tutorials}}}, + keywords = {Blockchain networks,Blockchains,Cloud computing,data networks,File systems,Next generation networking,overlay networks,Overlay networks,Peer-to-peer computing,peer-to-peer networks,Protocols}, + file = {/Users/jonny/Dropbox/papers/zotero/D/DanielE/daniel_2022_ipfs_and_friends2.pdf} +} + @article{kunzePersistenceStatementsDescribing2017, title = {Persistence {{Statements}}: {{Describing Digital Stickiness}}}, shorttitle = {Persistence {{Statements}}}, @@ -20,6 +50,24 @@ file = {/Users/jonny/Dropbox/papers/zotero/K/KunzeJ/kunze_2017_persistence_statements.pdf} } +@inproceedings{legoutRarestFirstChoke2006, + title = {Rarest First and Choke Algorithms Are Enough}, + booktitle = {Proceedings of the 6th {{ACM SIGCOMM}} on {{Internet}} Measurement - {{IMC}} '06}, + author = {Legout, Arnaud and Urvoy-Keller, G. and Michiardi, P.}, + date = {2006}, + pages = {203}, + publisher = {{ACM Press}}, + location = {{Rio de Janeiro, Brazil}}, + doi = {10.1145/1177080.1177106}, + url = {http://portal.acm.org/citation.cfm?doid=1177080.1177106}, + urldate = {2018-11-09}, + abstract = {The performance of peer-to-peer file replication comes from its piece and peer selection strategies. Two such strategies have been introduced by the BitTorrent protocol: the rarest first and choke algorithms. Whereas it is commonly admitted that BitTorrent performs well, recent studies have proposed the replacement of the rarest first and choke algorithms in order to improve efficiency and fairness.
In this paper, we use results from real experiments to advocate that the replacement of the rarest first and choke algorithms cannot be justified in the context of peer-to-peer file replication in the Internet.}, + eventtitle = {The 6th {{ACM SIGCOMM}}}, + isbn = {978-1-59593-561-8}, + langid = {english}, + file = {/Users/jonny/Dropbox/papers/zotero/L/LegoutA/legout_2006_rarest_first_and_choke_algorithms_are_enough2.pdf} +} + @online{lemmer-webberHeartSpritelyDistributed, title = {The {{Heart}} of {{Spritely}}: {{Distributed Objects}} and {{Capability Security}}}, author = {Lemmer-Webber, Christine and Farmer, Randy and Sims, Juliana}, @@ -31,6 +79,22 @@ file = {/Users/jonny/Dropbox/papers/zotero/L/Lemmer-WebberC/lemmer-webber_the_heart_of_spritely.pdf;/Users/jonny/Zotero/storage/32A9YVLN/spritely-core.html} } +@online{ogdenDatDistributedDataset2017, + type = {preprint}, + title = {Dat - {{Distributed Dataset Synchronization And Versioning}}}, + author = {Ogden, Maxwell}, + date = {2017-01-31}, + eprinttype = {Open Science Framework}, + doi = {10.31219/osf.io/nsv2c}, + url = {https://osf.io/nsv2c}, + urldate = {2021-10-01}, + abstract = {Dat is a protocol designed for streaming datasets over networks. Data in Dat Streams can be accessed randomly or fully replicated, be updated incrementally, and have the integrity of their contents be trusted. Dat clients can simultaneously be uploading and/or downloading, exchanging pieces of data with other clients over Dat Streams in a swarm on demand. Datasets can be multi-homed such that if the original source goes offline clients can choose to automatically discover additional sources. As data is added to a Dat repository, updated files are split into pieces using Rabin fingerprinting and deduplicated against known pieces to avoid retransmission of data. Dat Streams are automatically verified using secure hashes meaning data is protected against tampering or corruption.
Dat guarantees privacy if the Dat Link is kept secret, but does not provide authentication of sources, only authentication of data.}, + langid = {english}, + pubstate = {preprint}, + keywords = {p2p}, + file = {/Users/jonny/Dropbox/papers/zotero/O/OgdenM/ogden_2017_dat_-_distributed_dataset_synchronization_and_versioning.pdf} +} + @online{saundersDecentralizedInfrastructureNeuro2022, title = {Decentralized {{Infrastructure}} for ({{Neuro}})Science}, author = {Saunders, Jonny L.}, diff --git a/src/querying.md b/src/querying.md index c81f6dc..3d5a339 100644 --- a/src/querying.md +++ b/src/querying.md @@ -31,4 +31,4 @@ References without version qualification indicate the most recent version at the ## Query Fragments -Using blank subgraphs to specify queries like {index}`Linked Data; Fragments` and {index}`SPARQL` \ No newline at end of file +Using blank subgraphs to specify queries like {index}`Linked Data; Fragments` and {index}`SPARQL ` \ No newline at end of file diff --git a/src/sketchpad.md b/src/sketchpad.md index 9b9ba37..5aee33b 100644 --- a/src/sketchpad.md +++ b/src/sketchpad.md @@ -1,8 +1,148 @@ # Sketchpad -Dummy change to check that we don't invalidate the rust cache on CI. -## System Diagram +## System Components + +Strictly schematic and keeping track of different pieces. Not indicative of code structure and definitely not final + +```{mermaid} +graph + subgraph data + direction TB + Schema + Triples + Translation + Codec + end + + Schema -->|Models| Triples + Codec <-->|Read/Write| Triples + External[External Data] --> Codec + External --> Translation + Translation <-->|Maps Between| Schema + + subgraph peer + direction TB + Identity + Instance + Beacon + end + + Identity -->|Has Many| Instance + Beacon -->|Indicates| Identity + Triples -->|Stored By| Instance + + + subgraph social + Federation + Permissions + Sharding + end + + Schema -->|Defines| Federation + +``` + +## Rough Roadmap + +Enough to get us to SfN for now... 
+ +```{mermaid} +gantt + dateFormat YYYY-MM + + section Data Modeling and Transfer + Write Container Draft Spec :active, container, 2023-06, 2M + Experiment with basic Networking components :networking, 2023-07, 2M + Translate NWB Schema : trans, after container, 1M + Codec for hdf5 :codec1, after container, 1M + Webseeds with HTTP/S3: webseed, after trans, 1M + +``` + +## Data + +### Triple Data Model + +```{mermaid} +erDiagram + TRIPLE { + id subject + id predicate + id object + } + + CONTAINER { + str content_hash + str container_hash + str version_hash + str name + id creator + int timestamp + array capabilities + } + + + CONTAINER ||--|{ TRIPLE : hashes + +``` + +- `content_hash` - hash of contained triple graph, after resolution +- `container_hash` - original hash of `content_hash` and metadata of container when first created +- `version_hash` - the version of this particular instance of the container, excluding `container_hash` - should be equal to container_hash when first instantiating. + +Example + +```{mermaid} +graph TB + Root + + Root --> D1Root + + subgraph Dataset1 + direction TB + D1Root + D1Meta + D1Name + D1Date + D1Etc + + D1Root --> D1Meta + D1Meta --> D1Name + D1Meta --> D1Date + D1Meta --> D1Etc + end + + Root --> Imported + subgraph Vocabs + Imported[Imported Schema] + Term1 + + Imported --> Term1 + + end +``` + +Types of references and means of identifying +- Absolute (hash of a container): Containers are the only uniquely identifiable thing in the network. Everything else has to be done relative to them. +- Relative (resolve against the containing context) +- Container: `. -> pred -> obj` - links that describe the container. +- External: How to refer to some external but otherwise identifiable thing? eg. How do I identify that I am making a translation layer for `numpy` when they aren't involved with p2p-ld at all? I should be able to use a variety of tactics - eg. 
I should be able to say `pypi:numpy` and then in turn identify `pypi` by URI. If someone else declares it by saying `url:numpy` and referring to their homepage, for example, then later we can declare those things as equal + +Resolving Cycles +- The identity is the root node of the graph, so do a breadth-first + +Resolving names +How do we go from an external hash to another object? Our peer should be able to hydrate every content hash into an `author:hash` pair so that our downloading peer knows who to ask about shit. Or if we are the owner of that thing they know they can ask us for an additional container. + + + + + + + + +## Scrap Just a stub to check if mermaid works