RDF and Friends#

RDF is one of the elephants in the room when it comes to triplet graphs and linked data. Its history is complex and torrid, known as hopelessly and aggressively complex or a divine calling, depending on your disposition.

p2p-ld does not necessarily seek to be an RDF-based p2p protocol, though strategizing for interoperability with RDF and RDF-derivative formats would be nice.

One of the primary challenges to using RDF-like formats is the conflation of URLs and URIs as the primary identifiers for schema and objects. This idea (roughly) maps onto the “neat” characterization of linked data where everything should have ideally one canonical representation, and there should be a handful of “correct” general-purpose schema capable of modeling the world.

We depart from that vision, instead favoring radical vernacularism [Saunders, 2023]. URIs are extremely general, and include decentralized identifiers like multiaddrs

RDF And Friends#

Important

Return here re: RDF canonicalization and IPFS https://github.com/multiformats/multicodec/pull/261

JSON-LD#

Challenges#

Ordered Data#

The edges from a node in a graph are unordered, which makes array and tabular data difficult to work with in RDF!

This has been approached in a few ways:

RDF uses a godforsaken rdf:first rdf:rest linked list syntax

eg. one would express MyList which contains the Friends ["Arnold", "Bob", "Carly"] in (longhand) turtle as

@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix : <https://example.com> .

:MyList :Friends :list1 . 

:list1
  rdf:first :Amy ;
  rdf:rest :list2 .

:list2
  rdf:first :Bob ;
  rdf:rest :list3 .

:list3
  rdf:first :Carly ;
  rdf:rest rdf:nil .

And thankfully turtle has a shorthand, which isn’t so bad:

@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix : <https://example.com> .

:MyList
  :Friends (
    :Amy
    :Bob
    :Carly
  ).

Both of these correspond to the triplet graph:

Which is not great.

JSON-LD uses a @list keyword:

{
  "@context": {"foaf": "http://xmlns.com/foaf/0.1/"},
  "@id": "http://example.org/people#joebob",
  "foaf:nick": {
    "@list": [ "joe", "bob", "jaybee" ]
  },
}

which can be expanded recursively to mimic arrays

JSON-LD

{
  "@context": {
    "@vocab": "https://purl.org/geojson/vocab#",
    "coordinates": {"@container": "@list"}
  },
  "geometry": {
    "coordinates": [
        [
            [-10.0, -10.0],
            [10.0, -10.0],
            [10.0, 10.0],
            [-10.0, -10.0]
        ]
    ]
  }
}

Turtle

@prefix geojson: <https://purl.org/geojson/vocab#>.

[
  a geojson:Feature ;
  geojson:bbox (-10 -10 10 10) ;
  geojson:geometry [
    a geojson:Polygon ;
    geojson:coordinates (
      (
        (-10 -10)
        (10 -10)
        (10 10)
        (-10 -10)
      )
    )
  ]
] .

Tabular Data#

As an overbrief summary, converting data from tables to RDF needs a schema mapping:

Columns to Properties
Column names in source table to symbolic names used within the conversion schema
datatype (for representation in concrete RDF syntax)

According to the Tabular Data to RDF recommendation, one would convert the following table (encoded as csv):

countryCode	latitude	longitude	name
AD	42.5	1.6	Andorra
AE	23.4	53.8	United Arab Emirates
AF	33.9	67.7	Afghanistan

Into one of two “minimal” or “standard” formats of RDF:

Minimal mode

@base <http://example.org/countries.csv> .

:8228a149-8efe-448d-b15f-8abf92e7bd17
  <#countryCode> "AD" ;
  <#latitude> "42.5" ;
  <#longitude> "1.6" ;
  <#name> "Andorra" .

:ec59dcfc-872a-4144-822b-9ad5e2c6149c
  <#countryCode> "AE" ;
  <#latitude> "23.4" ;
  <#longitude> "53.8" ;
  <#name> "United Arab Emirates" .

:e8f2e8e9-3d02-4bf5-b4f1-4794ba5b52c9
  <#countryCode> "AF" ;
  <#latitude> "33.9" ;
  <#longitude> "67.7" ;
  <#name> "Afghanistan" .

Standard mode

@base <http://example.org/countries.csv> .
@prefix csvw: <http://www.w3.org/ns/csvw#> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .

:d4f8e548-9601-4e41-aadb-09a8bce32625 a csvw:TableGroup ;
  csvw:table [ a csvw:Table ;
    csvw:url <http://example.org/countries.csv> ;
    csvw:row [ a csvw:Row ;
      csvw:rownum "1"^^xsd:integer ;
      csvw:url <#row=2> ;
      csvw:describes :8228a149-8efe-448d-b15f-8abf92e7bd17
    ], [ a csvw:Row ;
      csvw:rownum "2"^^xsd:integer ;
      csvw:url <#row=3> ;
      csvw:describes :ec59dcfc-872a-4144-822b-9ad5e2c6149c
    ], [ a csvw:Row ;
      csvw:rownum "3"^^xsd:integer ;
      csvw:url <#row=4> ;
      csvw:describes :e8f2e8e9-3d02-4bf5-b4f1-4794ba5b52c9
    ]
  ] .

:8228a149-8efe-448d-b15f-8abf92e7bd17
  <#countryCode> "AD" ;
  <#latitude> "42.5" ;
  <#longitude> "1.6" ;
  <#name> "Andorra" .

:ec59dcfc-872a-4144-822b-9ad5e2c6149c
  <#countryCode> "AE" ;
  <#latitude> "23.4" ;
  <#longitude> "53.8" ;
  <#name> "United Arab Emirates" .

:e8f2e8e9-3d02-4bf5-b4f1-4794ba5b52c9
  <#countryCode> "AF" ;
  <#latitude> "33.9" ;
  <#longitude> "67.7" ;
  <#name> "Afghanistan" .

The recommendation also covers more complex situations. These make use of a JSON schema that handles mapping between the CSV data and RDF.

By default, each row of a table describes a single RDF resource, and each column has a single property (so each cell is a triple).

For example this table of concerts:

Name	Start Date	Location Name	Location Address	Ticket Url
B.B. King	2014-04-12T19:30	Lupo’s Heartbreak Hotel	79 Washington St., Providence, RI	https://www.etix.com/ticket/1771656
B.B. King	2014-04-13T20:00	Lynn Auditorium	Lynn, MA, 01901	http://frontgatetickets.com/venue.php?id=11766

Needs to be mapped to 3 separate resources with 7 properties. The values are not transformed, just grouped in different places under different resources. Notice how in the standard mode the csvw:describes entry can have three objects. The turtle is surprisingly humane.

The JSON schema describes five concrete triples that carry the data from the CSV, and five virtual triples that give the resources types and link them together. Abstractions over table iterators take the form of "#event-{_row}" to create a resource <#event-1>, <#event-2>, etc. for each row.

Minimal mode

@base <http://example.org/events-listing.csv> .
@prefix schema: <http://schema.org/> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .

<#event-1> a schema:MusicEvent ;
  schema:name "B.B. King" ;
  schema:startDate "2014-04-12T19:30:00"^^xsd:dateTime ;
  schema:location <#place-1> ;
  schema:offers <#offer-1> .

<#place-1> a schema:Place ;
  schema:name "Lupo’s Heartbreak Hotel" ;
  schema:address "79 Washington St., Providence, RI" .

<#offer-1> a schema:Offer ;
  schema:url "https://www.etix.com/ticket/1771656"^^xsd:anyURI .

<#event-2> a schema:MusicEvent ;
  schema:name "B.B. King" ;
  schema:startDate "2014-04-13T20:00:00"^^xsd:dateTime ;
  schema:location <#place-2> ;
  schema:offers <#offer-2> .

<#place-2> a schema:Place ;
  schema:name "Lynn Auditorium" ;
  schema:address "Lynn, MA, 01901" .

<#offer-2> a schema:Offer ;
  schema:url "http://frontgatetickets.com/venue.php?id=11766"^^xsd:anyURI .

Standard mode

@base <http://example.org/events-listing.csv> .
@prefix csvw: <http://www.w3.org/ns/csvw#> .
@prefix schema: <http://schema.org/> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .

:95cc7970-ce99-44b0-900c-e2c2c028bbd3 a csvw:TableGroup ;
  csvw:table [ a csvw:Table ;
    csvw:url <http://example.org/events-listing.csv> ;
    csvw:row [ a csvw:Row ;
      csvw:rownum 1 ;
      csvw:url <#row=2> ;
      csvw:describes <#event-1>, <#place-1>, <#offer-1>
    ], [ a csvw:Row ;
      csvw:rownum 2 ;
      csvw:url <#row=3> ;
      csvw:describes <#event-2>, <#place-2>, <#offer-2>
    ]
  ] .

<#event-1> a schema:MusicEvent ;
  schema:name "B.B. King" ;
  schema:startDate "2014-04-12T19:30:00"^^xsd:dateTime ;
  schema:location <#place-1> ;
  schema:offers <#offer-1> .

<#place-1> a schema:Place ;
  schema:name "Lupo’s Heartbreak Hotel" ;
  schema:address "79 Washington St., Providence, RI" .

<#offer-1> a schema:Offer ;
  schema:url "https://www.etix.com/ticket/1771656"^^xsd:anyURI .

<#event-2> a schema:MusicEvent ;
  schema:name "B.B. King" ;
  schema:startDate "2014-04-13T20:00:00"^^xsd:dateTime ;
  schema:location <#place-2> ;
  schema:offers <#offer-2> .

<#place-2> a schema:Place ;
  schema:name "Lynn Auditorium" ;
  schema:address "Lynn, MA, 01901" .

<#offer-2> a schema:Offer ;
  schema:url "http://frontgatetickets.com/venue.php?id=11766"^^xsd:anyURI .

JSON Schema

{
  "@context": ["http://www.w3.org/ns/csvw", {"@language": "en"}],
  "url": "events-listing.csv",
  "dialect": {"trim": true},
  "tableSchema": {
    "columns": [{
      "name": "name",
      "titles": "Name",
      "aboutUrl": "#event-{_row}",
      "propertyUrl": "schema:name"
    }, {
      "name": "start_date",
      "titles": "Start Date",
      "datatype": {
        "base": "datetime",
        "format": "yyyy-MM-ddTHH:mm"
      },
      "aboutUrl": "#event-{_row}",
      "propertyUrl": "schema:startDate"
    }, {
      "name": "location_name",
      "titles": "Location Name",
      "aboutUrl": "#place-{_row}",
      "propertyUrl": "schema:name"
    }, {
      "name": "location_address",
      "titles": "Location Address",
      "aboutUrl": "#place-{_row}",
      "propertyUrl": "schema:address"
    }, {
      "name": "ticket_url",
      "titles": "Ticket Url",
      "datatype": "anyURI",
      "aboutUrl": "#offer-{_row}",
      "propertyUrl": "schema:url"
    }, {
      "name": "type_event",
      "virtual": true,
      "aboutUrl": "#event-{_row}",
      "propertyUrl": "rdf:type",
      "valueUrl": "schema:MusicEvent"
    }, {
      "name": "type_place",
      "virtual": true,
      "aboutUrl": "#place-{_row}",
      "propertyUrl": "rdf:type",
      "valueUrl": "schema:Place"
    }, {
      "name": "type_offer",
      "virtual": true,
      "aboutUrl": "#offer-{_row}",
      "propertyUrl": "rdf:type",
      "valueUrl": "schema:Offer"
    }, {
      "name": "location",
      "virtual": true,
      "aboutUrl": "#event-{_row}",
      "propertyUrl": "schema:location",
      "valueUrl": "#place-{_row}"
    }, {
      "name": "offers",
      "virtual": true,
      "aboutUrl": "#event-{_row}",
      "propertyUrl": "schema:offers",
      "valueUrl": "#offer-{_row}"
    }]
  }
}

One could imagine how this might generalize into multidimensional array data, but that immediately becomes pretty ridiculous - a better strategy in all cases that I can think of would be to just provide metadata about the array like the encoding, the sizes, types, etc. of their axes and indices and then link to the array.

I’ll just leave this example of encoding the pixels in one RGB video frame as a joke.

@prefix vid: <http://example.com/GodforsakenVideoSchema> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .

:myVideo a vid:VideoGroup ;
  vid:video [ a vid:Video ;
    vid:url <http://example.com/myVideo.mp4> ;
    vid:frame [ a vid:Frame ;
      vid:framenum 1 ;
      vid:url <#frame=1> ;
      vid:describes <#frame-1> ;
    ], [ a vid:Frame ;
      vid:framenum 2 ;
      vid:url <#frame=2> ;
      vid:describes <#frame-2> ;
    ]
  ] .

<#frame-1> a vid:VideoFrame ;
  vid:timestamp "2023-06-29T12:00:00"^^xsd:dateTime ;
  vid:bitDepth 8 ;
  vid:width 1920 ;
  vid:height 1080 ;
  vid:channels <#red-1>, <#green-1>, <#blue-1> ;

<#red-1> a vid:VideoChannel ;
  :pixel-1 a vid:pixelValue ;
    rdf:first 0 ;
    rdf:rest :pixel-2 .

  :pixel-2 a vid:pixelValue ;
    rdf:first 46 ;
    rdf:rest :pixel-3 .

  # ...

  :pixel-2073600 a vid:pixelValue ;
    rdf:first 57 ;
    rdf:rest rdf:nil .

Naming#

All names have to be global. Relative names must resolve to a global name via contexts/prefixes. The alternative is blank nodes, which are treated as equivalent in eg. graph merges. Probably here enters pattern matching or whatever those things are called.
Blank nodes and skolemization https://www.w3.org/TR/rdf11-mt/#skolemization-informative

References#

RDF 1.1 Primer
W3C Recommendation on generating RDF from tabular data: [Tandy et al., 2015]
- Tabular data model: https://www.w3.org/TR/2015/REC-tabular-data-model-20151217/#parsing
- Metadata model: https://www.w3.org/TR/2015/REC-tabular-metadata-20151217/
JSON Schema in RDF: [Charpenay et al., 2023]
Turtle
N-ary relations in RDF
RDF 1.1 Semantics

Libraries#

jsonld.js
rdf-canonize-native
biolink-model for a nice example of generating multiple schema formats from a .yaml file.
linkml - modeling language for linked data [Moxon et al., 2021]
- Multidimensional arrays in linkml https://linkml.io/linkml/howtos/multidimensional-arrays.html
oaklib - python package for managing ontologies
rdflib - maybe the canonical python rdf library
csv2rdf

RDF and Friends#

RDF And Friends#

JSON-LD#

Challenges#

Ordered Data#

Tabular Data#

Naming#

References#

Libraries#

See Also#

Example Datasets#