Linked Data Fragments

Summary

Linked data fragments are designed to “fill in the middle” between entirely server-side (SPARQL) and entirely client-side (downloading a triple store) usage of linked data triples. SPARQL queries are notorious for being resource intensive, as queries can become much more complex than typical relational algebra and the server needs to resolve a potentially enormous number of resources. Placing all the logic on the server, rather than the client, is an architectural decision with a complex history, but it descends from the idea that the web should work by having “agents” that act on the web on our behalf[1].

Linked data fragments (LDFs) split the difference by placing more of the work on clients, with the server providing pre-computed sets of triples for a given selector. “Selector” is a purposefully general concept, but the LDF authors focus primarily on Triple Pattern Fragments that are composed of:

  • A Triple Pattern, a ?subject ?predicate ?object pattern that defines the contents of the fragment,

  • Metadata, specifically a triples predicate giving the estimated total number of triples in the fragment, since large fragments need to be paginated, and

  • Hypermedia Controls that can be used to retrieve other related fragments. For example, a triple pattern corresponding to s:people p:named o:tom would have links to retrieve all the related combinations in which one or more fields are left unspecified, e.g. any triple whose subject is a person, any triple whose predicate is named, and so on.
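The three parts listed above can be sketched over an in-memory list of triples. This is a minimal illustration, not the LDF specification or any real server's API: the names `Fragment`, `matches`, `related_patterns`, and `build_fragment` are hypothetical, `None` stands in for an unspecified field, and the count is exact here where a real server would typically report an estimate.

```python
from dataclasses import dataclass, field
from itertools import product
from typing import Optional

Triple = tuple[str, str, str]
# None marks an unspecified (variable) position in the pattern.
Pattern = tuple[Optional[str], Optional[str], Optional[str]]


@dataclass
class Fragment:
    pattern: Pattern                 # the triple pattern that defines the fragment
    triples: list[Triple]            # one page of matching triples
    total: int                       # metadata: number of matching triples
    controls: list[Pattern] = field(default_factory=list)  # related patterns


def matches(pattern: Pattern, triple: Triple) -> bool:
    """A triple matches when every specified position is equal."""
    return all(p is None or p == t for p, t in zip(pattern, triple))


def related_patterns(pattern: Pattern) -> list[Pattern]:
    """Patterns obtained by leaving one or more specified fields unspecified."""
    options = [(value, None) if value is not None else (None,) for value in pattern]
    return [p for p in product(*options) if p != pattern]


def build_fragment(dataset, pattern, page=0, page_size=100) -> Fragment:
    """Assemble one page of a fragment (hypothetical helper, exact count)."""
    matching = [t for t in dataset if matches(pattern, t)]
    start = page * page_size
    return Fragment(
        pattern=pattern,
        triples=matching[start:start + page_size],
        total=len(matching),
        controls=related_patterns(pattern),
    )


DATASET = [
    ("s:people", "p:named", "o:tom"),
    ("s:people", "p:named", "o:ana"),
    ("s:people", "p:aged", "o:30"),
    ("s:places", "p:named", "o:rome"),
]

fragment = build_fragment(DATASET, ("s:people", "p:named", None), page_size=1)
print(fragment.total)     # 2
print(fragment.triples)   # [('s:people', 'p:named', 'o:tom')]
print(fragment.controls)  # the three more general patterns
```

In this sketch the controls are plain patterns; in a hypermedia response they would be links or forms that a client can follow.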

The hosting server then partitions all of the triples in a given dataset into all the possible combinations of subjects, predicates, and objects.
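One way to picture this partitioning: every triple belongs to eight fragments, because each of its three positions is either kept or left unspecified. The sketch below builds such an index eagerly for a toy dataset. The function names are hypothetical, and whether a given host precomputes fragments or evaluates patterns on request is an implementation choice this example does not settle.

```python
from collections import defaultdict
from itertools import product


def patterns_for(triple):
    """All 8 patterns a triple belongs to (None = unspecified position)."""
    return list(product(*[(value, None) for value in triple]))


def partition(dataset):
    """Map every pattern to the triples it selects (hypothetical helper)."""
    index = defaultdict(list)
    for triple in dataset:
        for pattern in patterns_for(triple):
            index[pattern].append(triple)
    return dict(index)


DATASET = [
    ("s:people", "p:named", "o:tom"),
    ("s:people", "p:named", "o:ana"),
    ("s:places", "p:named", "o:rome"),
]

index = partition(DATASET)
print(len(patterns_for(DATASET[0])))            # 8
print(len(index[(None, "p:named", None)]))      # 3
print(len(index[("s:people", None, None)]))     # 2
```

Serving a fragment then becomes a dictionary lookup, which is the trade the paragraph above describes: storage and indexing on the host in exchange for cheap, predictable requests.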

Overlap

p2p-ld follows Linked Data Fragments in that it emphasizes client-side logic rather than query logic on the network. Executing complex distributed queries would add substantial complexity to the protocol and could import many of the problems with SPARQL, such as heightened resource requirements and the potential for denial-of-service abuse.
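As a rough illustration of client-side logic in the Linked Data Fragments style, the sketch below answers a two-pattern question by issuing only single-pattern requests and joining the results locally. It does not describe a p2p-ld API: `fetch_fragment` is a hypothetical stand-in for a request to a server or peer, and the dataset and predicate names are made up for the example.

```python
DATASET = [
    ("s:tom", "p:livesIn", "o:rome"),
    ("s:ana", "p:livesIn", "o:lima"),
    ("s:tom", "p:named", "Tom"),
    ("s:ana", "p:named", "Ana"),
]


def fetch_fragment(pattern):
    """Hypothetical remote call: the other side only answers one triple pattern."""
    return [
        triple for triple in DATASET
        if all(p is None or p == v for p, v in zip(pattern, triple))
    ]


def names_of_residents(city):
    """Join two patterns on the client: who lives in `city`, and what are they named?"""
    names = []
    # First request: every subject that lives in the given city.
    for subject, _, _ in fetch_fragment((None, "p:livesIn", city)):
        # Follow-up request per subject: that subject's names.
        for _, _, name in fetch_fragment((subject, "p:named", None)):
            names.append(name)
    return names


print(names_of_residents("o:rome"))  # ['Tom']
```

The remote side never sees the join, only cheap pattern lookups, so the cost of an expensive question falls on the client that asked it.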

Differences

  • As noted for Linked Data Platform, p2p-ld also concerns “leaf” nodes with binary data accessed via codec, rather than represented as triples. The results of queries are thus not necessarily imagined to be single factual assertions, but datasets, images, documents, posts, etc. The container concept is therefore less rigidly defined than an LDF host with a completely partitioned triple graph.

Additionally, by being an explicitly social system, p2p-ld is unconcerned with arbitrary query execution time on anonymous data systems - the expectation is that individual peers and peer federations

To be very clear!

p2p-ld does not attempt to replace or improve SPARQL. There are a number of philosophical and practical differences in the design of the greater semantic web, and particularly its instantiation as bigass corporate knowledge graphs. We will do what we can to integrate with RDF and RDF-like technologies, but p2p-ld is not a distributed SPARQL endpoint.

References