commit f8cad08bd2 (parent b1712aa7ab)
sneakers-the-rat 2023-06-14 22:41:01 -07:00
9 changed files with 143 additions and 14 deletions


@@ -28,6 +28,7 @@ xmpp
solid
ld_fragments
ld_platform
nanopubs
```
@@ -44,16 +45,18 @@ dmc
- Arweave
- CAN
- Chord
- Earthstar - https://earthstar-project.org/
- Freenet
- Manyverse
- P2panda - v interesting! https://p2panda.org/
- SAFE
- Storj
- [Swarm](https://www.ethswarm.org/swarm-whitepaper.pdf)
  - not interesting, based around coins and smart contracts
  - kademlia routing
  - chunks stored by nodes close in hash space
- Repute.Social
- LinkedTrust.us
## Points of comparison


@@ -2,11 +2,16 @@
```
# Linked Data Fragments
[Containers](Containers) are one example of:
> However, we envision that different kinds of ldf partitionings will emerge, and that these might even vary dynamically depending on server load. Perhaps a semantic way to express the data, metadata, and hypermedia controls of each fragment will be necessary. -{cite}`verborghWebScaleQueryingLinked2014`
## Summary
[Linked data fragments](https://linkeddatafragments.org/publications/) are designed to "fill in the middle" between entirely serverside ({index}`SPARQL`) or clientside (downloading a triple store) usage of linked data triples. SPARQL queries are notorious for being resource intensive, as queries can become much more complex than typical relational algebra and the server needs to resolve a potentially enormous number of resources. Placing all the logic on the server, rather than the client, is an architectural decision that has a complex history, but descends from the idea that the web should work by having "agents" that work on the web on our behalf[^semwebagents].
Linked data fragments (LDFs) split the difference by placing more of the work on clients, with the server providing {index}`pre-computed sets of triples <pair: Graph; Partitioning>` for a given selector. "Selector" is a purposefully general concept, but the LDF authors focus primarily on [Triple Pattern Fragments](https://linkeddatafragments.org/specification/triple-pattern-fragments/) that are composed of:
- A **Triple Pattern**, a `?subject ?predicate ?object` that defines the contents of the fragment
- **Metadata**, specifically a `triples` predicate indicating the estimated total number of triples in the fragment since large fragments need to be paginated, and
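As a rough sketch of how those pieces fit together — using illustrative names (`matches`, `select_fragment`) that are assumptions, not from the TPF spec — a selector with `triples` count metadata and pagination might look like:

```python
# Illustrative sketch of a Triple Pattern Fragment selector: a pattern with
# None as a wildcard (?s ?p ?o) selects triples, and the fragment carries a
# `triples` count as metadata so large fragments can be paginated.
def matches(triple, pattern):
    """True if each pattern component is a wildcard or equals the triple's."""
    return all(p is None or t == p for t, p in zip(triple, pattern))

def select_fragment(dataset, pattern, page=0, page_size=100):
    """Return one page of matching triples plus the total-count metadata."""
    hits = [t for t in dataset if matches(t, pattern)]
    start = page * page_size
    return {"triples": len(hits), "data": hits[start:start + page_size]}
```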
@@ -16,15 +21,18 @@ The hosting server then partitions all of the triples in a given dataset into al
## Overlap
p2p-ld follows Linked Data Fragments in that it emphasizes clientside logic rather than query logic on the network. Executing distributed queries with as much logic as SPARQL can embed adds substantial complexity to the protocol and would potentially import a lot of the problems with SPARQL like heightened resource requirements and potential for abuse for denial of service.
LDF is a strategy for (pre-)partitioning a dataset of triples into cacheable chunks, rather than having the server query over the entire graph at once. It also emphasizes querying as iteration: do many small queries in sequence rather than one large query and waiting for the entire result.
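That querying-as-iteration idea can be sketched like this, with an in-memory `fetch` standing in for a fragment request to a server (all names here are illustrative, not a real LDF client API):

```python
# Sketch of "querying as iteration": instead of one big serverside join,
# the client issues many small triple-pattern requests and joins locally.
DATA = [
    ("ex:alice", "foaf:knows", "ex:bob"),
    ("ex:bob", "foaf:name", "Bob"),
]

def fetch(pattern):
    """Stand-in for requesting one fragment page from a server."""
    return [t for t in DATA
            if all(p is None or x == p for x, p in zip(t, pattern))]

# Resolve ?s foaf:knows ?o . ?o foaf:name ?name by iterating over fragments.
results = [(s, name)
           for s, _, o in fetch((None, "foaf:knows", None))
           for _, _, name in fetch((o, "foaf:name", None))]
```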
## Differences
- Primarily, containers are more generic than LDFs. Where LDFs create a deterministic partitioning of a set of triples (all combinations, including wildcards, of each subject, predicate, and object in the dataset), p2p-ld partitions based on meaning and use. They are not mutually exclusive, though - one could also make containers that correspond to the expected LDF format.
- re: {index}`Linked Data Platform`, p2p-ld also concerns "leaf" nodes with binary data accessed via codec, rather than represented as triplets. The results of queries are thus not necessarily imagined to be single factual assertions, but datasets, images, documents, posts, etc. -> So the container concept is less rigidly defined than an LDF host with a completely partitioned triplet graph.
Additionally, by being an explicitly *social* system, p2p-ld is unconcerned with arbitrary query execution on anonymous data systems - the expectation is that individual peers and {index}`peer federations <Peer Federations>` manage their resources and the norms around their use. Accordingly, they would manage a set of containers (or, the partition of its graph) that
```{admonition} To be very clear!
@@ -33,8 +41,6 @@ Additionally, by being an explicitly *social* system, p2p-ld is unconcerned with
p2p-ld does not attempt to replace or improve SPARQL. There are a number of philosophical and practical differences in the design of the greater semantic web, and particularly its instantiation as bigass corporate knowledge graphs. We will do what we can to integrate with RDF and RDF-like technologies, but p2p-ld is *not* a distributed SPARQL endpoint.
```
[^semwebagents]: See the history of the early to middle semantic web, discussed in {cite}`saundersSurveillanceGraphs2023`


@@ -0,0 +1,103 @@
```{index} Linked Data; Platform
```
# Linked Data Platform
```{index} Containers
```
## Containers
https://www.w3.org/TR/ldp/#ldpc
We extend the notion of LDP containers!
Terms:
- Containment Triples
- Membership Triples
Types:
- Basic Containers
```turtle
@prefix dcterms: <http://purl.org/dc/terms/>.
@prefix ldp: <http://www.w3.org/ns/ldp#>.
<http://example.org/c1/>
   a ldp:BasicContainer;
   dcterms:title "A very simple container";
   ldp:contains <r1>, <r2>, <r3>.
```
- Direct and Indirect Containers - ways of interacting with existing data
Given:
```turtle
@prefix ldp: <http://www.w3.org/ns/ldp#>.
@prefix o: <http://example.org/ontology#>.
<http://example.org/netWorth/nw1/>
   a o:NetWorth;
   o:netWorthOf <http://example.org/users/JohnZSmith>;
   o:asset
      <assets/a1>,
      <assets/a2>;
   o:liability
      <liabilities/l1>,
      <liabilities/l2>,
      <liabilities/l3>.
```
we can make direct containers that describe the assets and liabilities without modifying the original data:
```turtle
@prefix ldp: <http://www.w3.org/ns/ldp#>.
@prefix dcterms: <http://purl.org/dc/terms/>.
@prefix o: <http://example.org/ontology#>.
<http://example.org/netWorth/nw1/assets/>
   a ldp:DirectContainer;
   dcterms:title "The assets of JohnZSmith";
   ldp:membershipResource <http://example.org/netWorth/nw1/>;
   ldp:hasMemberRelation o:asset;
   ldp:contains <a1>, <a2>.
```
Additionally, if one were to add a new set of "advisors," we would make an indirect container that tells us we need an additional triple when creating new members of the container (`foaf:primaryTopic`):
```turtle
@prefix ldp: <http://www.w3.org/ns/ldp#>.
@prefix dcterms: <http://purl.org/dc/terms/>.
@prefix foaf: <http://xmlns.com/foaf/0.1/>.
@prefix o: <http://example.org/ontology#>.
<advisors/>
   a ldp:IndirectContainer;
   dcterms:title "The asset advisors of JohnZSmith";
   ldp:membershipResource <>;
   ldp:hasMemberRelation o:advisor;
   ldp:insertedContentRelation foaf:primaryTopic;
   ldp:contains
      <advisors/bob>,    # URI of a document a.k.a. an information resource
      <advisors/marsha>. # describing a person
```
(still unclear to me what is different about that, still reading.)
| Completed Request | Membership Effect | Containment Effect |
| ----------------- | ------------------- | ------------------ |
| Create in Basic Container | New triple: (LDPC, ldp:contains, LDPR) | Same |
| Create in Direct Container | New triple links LDP-RS to created LDPR. LDP-RS URI may be same as LDP-DC | New triple: (LDPC, ldp:contains, LDPR) |
| Create in Indirect Container | New triple links LDP-RS to content indicated URI | New triple: (LDPC, ldp:contains, LDPR) |
| Resource deleted | Membership triple may be removed | (LDPC, ldp:contains, LDPR) triple is removed |
| Container deleted | Triples and member resources may be removed | Triples of form (LDPC, ldp:contains, LDPR) and contained LDPRs may be removed |
## Similarities
- Separation between container data and metadata - "minimal-container triples," what remains in the container when the container has zero members and zero contained resources
## Differences
- Containers are not recursive?? Or at least that is suggested by the 'net worth' example, which explains why we can't just turn the original subject into a container ("can't mix assets and liabilities") - and i am like, why not make one container for the person and then subcontainers for each of the types?
## References
- Spec: https://www.w3.org/TR/ldp/
- Use cases and requirements: https://www.w3.org/TR/ldp-ucr/
- eg. an implementation using Virtuoso: https://github.com/vemonet/virtuoso-ldp


@@ -5,3 +5,5 @@ Stuff we like about XMPP
- Service discovery
- https://xmpp.org/extensions/xep-0030.html
- Protocol interoperability
- https://en.wikipedia.org/wiki/BOSH_(protocol)
- https://en.wikipedia.org/wiki/Jingle_(protocol)


@@ -7,6 +7,7 @@ Triplet graphs similar to linked data fragments with envelopes. decoupling conte
- Versioning
- Typed objects with formatting
(Containers)=
## Containers
- Packets of LD-triplets that contain
@@ -30,6 +31,7 @@ Triplet graphs similar to linked data fragments with envelopes. decoupling conte
- Capabilities: A container can specify different capabilities that another account can take (eg. "Like", "Upvote", "Reply")
- Capabilities should also contain a permissions scope, if none is present, the global scope is assumed.
- Since Identities are just a special form of container, they too can advertise different actions that they support with capabilities.
- Basically a container is a Merkle DAG with binary data at its leaves, a la the {index}`Linked Data; Platform`
Re hashing a graph: the container always has one root node that is the container's identity from which a graph traversal starts. A {index}`Merkle DAG` is then constructed starting from the leaves.
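A minimal sketch of that leaves-first hashing, assuming each node lists its children and leaves carry binary data — this illustrates the general Merkle idea, not p2p-ld's actual container format:

```python
# Sketch: hash a container's graph as a Merkle DAG, starting from the leaves.
# `children` and `data` are assumed illustrative structures.
import hashlib

def merkle_hash(node, children, data):
    """Hash of a node = sha256(its binary data + its children's hashes)."""
    h = hashlib.sha256()
    h.update(data.get(node, b""))
    for child in sorted(children.get(node, [])):
        h.update(bytes.fromhex(merkle_hash(child, children, data)))
    return h.hexdigest()
```

Changing any leaf changes the root hash, which is what lets the root node serve as the container's identity.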


@@ -26,3 +26,7 @@ Names and locations are *linguistic* not *mathematical.* Rather than trying to d
We should neither sacrifice control of the internet to platform giants nor should we insist that self-hosting is the only alternative. If the alternative to using Google Docs or Slack requires me to be a professional sysadmin, or even to keep a raspberry pi plugged in and online at all times, it isn't an alternative for 95% of people.
It should be possible to share resources such that relatively few people need to maintain persistent network infrastructure, and it should be possible to accommodate their leaving at any time. It should also be very difficult for one or a few actors to make a large number of other peers on the network dependent on them, claiming de-facto control over an ostensibly decentralized system (lookin at you mastodon.social).
## Lack of Agency is a tighter bottleneck than Performance
(rather than optimizing for performance of massive queries over huge datasets, we optimize for the ability of individual people to organize the resources relevant to them. The thing limiting our ability to make sense of data in neuroscience, for example, is not that our servers aren't fast enough, but that the barriers to making well-structured data are too high, as is the expertise needed to conduct large-scale queries. Even then, our ability to *understand* and *make sense of* the information is less constrained by performance than by the absence of infrastructure to link and communicate heterogeneous things. We focus on small-scale computing not only for ethical reasons, but also practical ones.)


@@ -5,6 +5,7 @@ How do we find people and know how to connect to them?
- Bootstrapping initial connections
- Gossiping
- Hole punching
- See: https://docs.holepunch.to/apps/keet.io
# Scraps
@@ -18,3 +19,5 @@ https://xmpp.org/extensions/xep-0030.html
> - any additional items associated with the entity, whether or not they are addressable as JIDs
>
> All three MUST be supported, but the first two kinds of information relate to the entity itself whereas the third kind of information relates to items associated with the entity itself; therefore two different query types are needed.
- subscription to particular data types or query patterns - each peer keeps a list of things that we should tell it about when we make a new graph. So I might want to always see new posts and pre-emptively index those but I don't care about your datasets. This should probably exist at the level of a peer relationship rather than a peer outbox-like thing
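A toy version of that per-peer subscription list might look like this — the shape (peer mapped to a set of graph types) is an assumption for illustration:

```python
# Sketch of per-peer subscriptions: each peer lists the kinds of new graphs
# it wants to hear about, and we notify only the peers whose list matches.
subscriptions = {
    "peer_a": {"post"},
    "peer_b": {"post", "dataset"},
}

def peers_to_notify(subs, graph_type):
    """Peers who asked to be told about new graphs of this type."""
    return {peer for peer, wanted in subs.items() if graph_type in wanted}
```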


@@ -2,12 +2,14 @@
How do we find peers that have subgraphs that are responsive to what we want?
- Query results should then become their own containers, with the component triplets of the query being hashed at the root level, so then the query-er can cache the query results (in case anyone else makes the same query) while also rehosting the original containers returned from the query.
## Syntax
(qlocation)=
### Location
How to refer to a given [container](Containers), eg.
```
@user:containerName:childName
@@ -20,7 +22,6 @@
```
Children
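A hypothetical parser for that reference syntax, with field names inferred from the example above (not a defined spec):

```python
# Parse an @user:containerName:childName reference into its parts.
# The field names (user, container, children) are illustrative assumptions.
def parse_location(ref):
    if not ref.startswith("@"):
        raise ValueError("container references start with '@'")
    parts = ref[1:].split(":")
    return {
        "user": parts[0],
        "container": parts[1] if len(parts) > 1 else None,
        "children": parts[2:],  # zero or more nested child names
    }
```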
### Version
@@ -30,4 +31,4 @@ References without version qualification indicate the most recent version at the
## Query Fragments
Using blank subgraphs to specify queries like {index}`Linked Data; Fragments` and {index}`SPARQL`
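One way to picture that — purely illustrative, not a proposed implementation — is a naive matcher that treats blank-node terms (here written `_:x`) as the query's variables:

```python
# A query as a subgraph whose blank nodes mark the unknowns, plus a naive
# matcher. Not a real SPARQL or LDF engine.
def is_variable(term):
    return term.startswith("_:")

def unify(triple, pattern, bindings):
    """Extend bindings so pattern matches triple, or return None."""
    b = dict(bindings)
    for term, value in zip(pattern, triple):
        if is_variable(term):
            if b.setdefault(term, value) != value:
                return None
        elif term != value:
            return None
    return b

def match_pattern(data, query):
    """All binding sets that satisfy every pattern in the query subgraph."""
    solutions = [{}]
    for pattern in query:
        solutions = [b2 for b in solutions for t in data
                     if (b2 := unify(t, pattern, b)) is not None]
    return solutions
```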


@@ -29,3 +29,8 @@ erDiagram
- Triplets
- Containers
- Codecs
## Random notes
- re: {index}`Backlinks` - https://lists.w3.org/Archives/Public/public-rdf-comments/2012Jul/0007.html