translate-nwb

Translating NWB schema language to linkml

The NWB specification language has several components:

  • Namespaces: subcollections of specifications
  • Groups:

We want to translate the schema to LinkML so that we can export to other schema formats, generate code for dealing with the data, and ultimately make it interoperable with other formats.

To do that, we need to map:

  • Namespaces: these seem to operate like separate schemas? Within a namespace, everything else is a top-level object
  • Inheritance: NWB has an odd inheritance system, where the same syntax is used for inheritance, mixins, type declaration, and inclusion.
    • neurodata_type_inc -> is_a
  • Groups:
  • Slots: Lots of properties are reused in the nwb spec, and LinkML lets us separate these out as slots
  • dims, shape, and dtypes: these should have been just attributes rather than put in the spec language, so we'll just make an Array class and use that.
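As a rough sketch of the mapping above, a group spec can be treated as a plain dict and turned into a LinkML-style class dict. Everything here is an assumption for illustration: `translate_group`, `DTYPE_MAP`, and the example spec are hypothetical names and data, not part of translate-nwb, and only the `neurodata_type_inc -> is_a` rule comes from the notes above.

```python
from typing import Any

# Hypothetical, partial dtype mapping (assumption: NWB "text" ~ LinkML "string", etc.)
DTYPE_MAP = {"text": "string", "int32": "integer", "float32": "float"}


def translate_group(spec: dict[str, Any]) -> dict[str, Any]:
    """Translate one NWB-style group spec into a LinkML-style class dict."""
    cls: dict[str, Any] = {"name": spec["neurodata_type_def"]}
    if "doc" in spec:
        cls["description"] = spec["doc"]

    # neurodata_type_inc next to a neurodata_type_def acts like inheritance
    parent = spec.get("neurodata_type_inc")
    if parent is not None:
        cls["is_a"] = parent

    # Attributes become LinkML attributes (these could later be hoisted to shared slots)
    attributes: dict[str, Any] = {}
    for attr in spec.get("attributes", []):
        dtype = attr.get("dtype", "text")
        attributes[attr["name"]] = {
            "description": attr.get("doc", ""),
            # Unknown dtypes are passed through unchanged
            "range": DTYPE_MAP.get(dtype, dtype),
            # Assumption: attributes are required unless marked otherwise
            "required": attr.get("required", True),
        }
    if attributes:
        cls["attributes"] = attributes
    return cls


# Illustrative spec, written by hand rather than copied from nwb-schema
example_spec = {
    "neurodata_type_def": "ElectricalSeries",
    "neurodata_type_inc": "TimeSeries",
    "doc": "A time series of voltage data.",
    "attributes": [
        {"name": "filtering", "dtype": "text", "doc": "Filter description.", "required": False}
    ],
}

linkml_class = translate_group(example_spec)
```

This only covers the inheritance case. The mixin, type declaration, and inclusion uses of the same syntax would need their own rules, and dims/shape are left out entirely.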

How does pynwb use the schema?

  • nwb-schema is included as a git submodule within pynwb
  • __get_resources encodes the location of the directory
  • __TYPE_MAP eventually contains the schema information
  • on import, load_namespaces populates __TYPE_MAP
  • register_class decorator is used on all pynwb classes to register with __TYPE_MAP
    • Unclear how the schema is used if the containers contain the same information
  • the register_container_type method in hdmf's TypeMap class seems to overwrite the loaded schema???
    • __NS_CATALOG seems to actually hold references to the schema but it doesn't seem to be used anywhere except within __TYPE_MAP ?
  • NWBHDF5IO uses TypeMap to create a BuildManager
    • Parent class HDF5IO then reimplements a lot of basic functionality from elsewhere
    • Grandparent abstract class HDMFIO appears to be the final writing class?
    • BuildManager.build then calls TypeMap.build ???
  • TypeMap.build ...
  • The DatasetBuilder is returned to the BuildManager which seems to just store it?
  • HDMFIO.write then calls write_builder to use the builder, which is left unimplemented in the abstract base class
    • HDF5IO.write_builder implements it for HDF5, and then calls write_group, write_dataset, or write_link depending on the builder type, each of which is an extremely heavy method!
    • e.g. write_dataset is basically unreadable to me, but seems to implement every type of dataset writing in a single method.
  • At this point it is entirely unclear how the schema is involved, but the file is written.
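To keep the flow above straight, here is a toy version of the pattern as I understand it: a decorator registers classes in a type map, a base IO class turns a container into a builder and hands it to an abstract `write_builder`, and a subclass dispatches on the builder type. None of this is hdmf or pynwb code. `TYPE_MAP`, `BaseIO`, `DictIO`, and `build` are hypothetical stand-ins, the "file" is a nested dict instead of HDF5, and the schema is deliberately absent because the notes above leave its role open.

```python
from abc import ABC, abstractmethod

TYPE_MAP: dict[str, type] = {}  # hypothetical stand-in for pynwb's __TYPE_MAP


def register_class(type_name: str):
    """Decorator that records a container class under its type name."""
    def decorator(cls):
        TYPE_MAP[type_name] = cls
        return cls
    return decorator


class GroupBuilder:
    def __init__(self, name, children=()):
        self.name = name
        self.children = list(children)


class DatasetBuilder:
    def __init__(self, name, data):
        self.name = name
        self.data = data


class BaseIO(ABC):
    """Stand-in for the abstract IO class: write() builds, then delegates."""

    def __init__(self, build):
        self._build = build  # callable: container -> builder

    def write(self, container):
        builder = self._build(container)
        self.write_builder(builder)

    @abstractmethod
    def write_builder(self, builder):
        ...


class DictIO(BaseIO):
    """Stand-in for the HDF5 backend: writes into a nested dict."""

    def __init__(self, build):
        super().__init__(build)
        self.store = {}

    def write_builder(self, builder):
        self.store[builder.name] = self._write_any(builder)

    def _write_any(self, builder):
        # Dispatch on builder type, like write_group / write_dataset / write_link
        if isinstance(builder, GroupBuilder):
            return self.write_group(builder)
        if isinstance(builder, DatasetBuilder):
            return self.write_dataset(builder)
        raise TypeError(f"unsupported builder: {type(builder).__name__}")

    def write_group(self, builder):
        return {child.name: self._write_any(child) for child in builder.children}

    def write_dataset(self, builder):
        return list(builder.data)


@register_class("TimeSeries")
class TimeSeries:
    def __init__(self, name, data):
        self.name = name
        self.data = data


def build(container):
    # Toy build step: one group holding one dataset
    return GroupBuilder(container.name, [DatasetBuilder("data", container.data)])


io = DictIO(build)
io.write(TimeSeries("ts", [1, 2, 3]))
```

After the write, `io.store` holds the nested dict for `ts`. In the toy the file gets written without ever consulting a schema, which is the same puzzle as in the real code path.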