moving notes to own folder

sneakers-the-rat 2023-08-15 02:25:21 -07:00
parent 819513e797
commit cee4f3146b
4 changed files with 213 additions and 45 deletions


@@ -1,47 +1,2 @@
# translate-nwb
Translating NWB schema language to linkml
The [nwb specification language](https://schema-language.readthedocs.io/en/latest/description.html)
has several components:
- Namespaces: subcollections of specifications
- Groups:
We want to translate the schema to LinkML so that we can export to other schema formats,
generate code for dealing with the data, and ultimately make it interoperable
with other formats.
To do that, we need to map:
- Namespaces: seem to operate like separate schema? Then within a namespace the
rest are top-level objects
- Inheritance: NWB has an odd inheritance system, where the same syntax is used for
inheritance, mixins, type declaration, and inclusion.
- `neurodata_type_inc` -> `is_a`
- Groups:
- Slots: Lots of properties are reused in the nwb spec, and LinkML lets us separate these out as slots
- dims, shape, and dtypes: these should just have been attributes rather than part of the spec language, so we'll just make an Array class and use that.
## How does pynwb use the schema?
* nwb-schema is included as a git submodule within pynwb
* [__get_resources](https://github.com/NeurodataWithoutBorders/pynwb/blob/dev/src/pynwb/__init__.py#L23) encodes the location of the schema directory
* [__TYPE_MAP](https://github.com/NeurodataWithoutBorders/pynwb/blob/dev/src/pynwb/__init__.py#L51) eventually contains the schema information
* on import, [load_namespaces](https://github.com/NeurodataWithoutBorders/pynwb/blob/dev/src/pynwb/__init__.py#L115-L116) populates `__TYPE_MAP`
* [register_class](https://github.com/NeurodataWithoutBorders/pynwb/blob/dev/src/pynwb/__init__.py#L135-L136) decorator is used on all pynwb classes to register with `__TYPE_MAP`
* Unclear how the schema is used if the containers contain the same information
* the [register_container_type](https://github.com/hdmf-dev/hdmf/blob/dd39b3878523c4b03f5286fc740752befd192d8b/src/hdmf/build/manager.py#L727-L736) method in hdmf's TypeMap class seems to overwrite the loaded schema???
* `__NS_CATALOG` seems to actually hold references to the schema, but it doesn't seem to be used anywhere except within `__TYPE_MAP`?
* [NWBHDF5IO](https://github.com/NeurodataWithoutBorders/pynwb/blob/dev/src/pynwb/__init__.py#L237-L238) uses `TypeMap` to create a `BuildManager`
* Parent class [HDF5IO](https://github.com/hdmf-dev/hdmf/blob/dd39b3878523c4b03f5286fc740752befd192d8b/src/hdmf/backends/hdf5/h5tools.py#L37) then reimplements a lot of basic functionality from elsewhere
* Grandparent abstract class [HDMFIO](https://github.com/hdmf-dev/hdmf/blob/dev/src/hdmf/backends/io.py) appears to be the final writing class?
* `BuildManager.build` then [calls `TypeMap.build`](https://github.com/hdmf-dev/hdmf/blob/dd39b3878523c4b03f5286fc740752befd192d8b/src/hdmf/build/manager.py#L171) ???
* `TypeMap.build` ...
* gets the [`ObjectMapper`](https://github.com/hdmf-dev/hdmf/blob/dd39b3878523c4b03f5286fc740752befd192d8b/src/hdmf/build/manager.py#L763) which does [god knows what](https://github.com/hdmf-dev/hdmf/blob/dd39b3878523c4b03f5286fc740752befd192d8b/src/hdmf/build/manager.py#L697)
* Calls the [`ObjectMapper.build`](https://github.com/hdmf-dev/hdmf/blob/dd39b3878523c4b03f5286fc740752befd192d8b/src/hdmf/build/objectmapper.py#L700) method
* Which seems to ultimately create a [`DatasetBuilder`](https://github.com/hdmf-dev/hdmf/blob/dev/src/hdmf/build/builders.py#L315) object
* The `DatasetBuilder` is returned to the `BuildManager` which seems to just store it?
* [HDMFIO.write](https://github.com/hdmf-dev/hdmf/blob/dd39b3878523c4b03f5286fc740752befd192d8b/src/hdmf/backends/io.py#L78) then calls `write_builder` to use the builder, which is left unimplemented in the abstract base class
* [HDF5IO.write_builder](https://github.com/hdmf-dev/hdmf/blob/dd39b3878523c4b03f5286fc740752befd192d8b/src/hdmf/backends/hdf5/h5tools.py#L806) implements it for HDF5, which then calls `write_group`, `write_dataset`, `write_link`, depending on the builder types, each of which are extremely heavy methods!
* e.g. [`write_dataset`](https://github.com/hdmf-dev/hdmf/blob/dd39b3878523c4b03f5286fc740752befd192d8b/src/hdmf/backends/hdf5/h5tools.py#L1080) is basically unreadable to me, but seems to implement every type of dataset writing in a single method.
* At this point it is entirely unclear how the schema is involved, but the file is written.

docs/notes/pynwb.md (new file, 25 additions)

@@ -0,0 +1,25 @@
# PyNWB notes
## How does pynwb use the schema?
* nwb-schema is included as a git submodule within pynwb
* [__get_resources](https://github.com/NeurodataWithoutBorders/pynwb/blob/dev/src/pynwb/__init__.py#L23) encodes the location of the schema directory
* [__TYPE_MAP](https://github.com/NeurodataWithoutBorders/pynwb/blob/dev/src/pynwb/__init__.py#L51) eventually contains the schema information
* on import, [load_namespaces](https://github.com/NeurodataWithoutBorders/pynwb/blob/dev/src/pynwb/__init__.py#L115-L116) populates `__TYPE_MAP`
* [register_class](https://github.com/NeurodataWithoutBorders/pynwb/blob/dev/src/pynwb/__init__.py#L135-L136) decorator is used on all pynwb classes to register with `__TYPE_MAP`
* Unclear how the schema is used if the containers contain the same information
* the [register_container_type](https://github.com/hdmf-dev/hdmf/blob/dd39b3878523c4b03f5286fc740752befd192d8b/src/hdmf/build/manager.py#L727-L736) method in hdmf's TypeMap class seems to overwrite the loaded schema???
* `__NS_CATALOG` seems to actually hold references to the schema, but it doesn't seem to be used anywhere except within `__TYPE_MAP`?
* [NWBHDF5IO](https://github.com/NeurodataWithoutBorders/pynwb/blob/dev/src/pynwb/__init__.py#L237-L238) uses `TypeMap` to create a `BuildManager`
* Parent class [HDF5IO](https://github.com/hdmf-dev/hdmf/blob/dd39b3878523c4b03f5286fc740752befd192d8b/src/hdmf/backends/hdf5/h5tools.py#L37) then reimplements a lot of basic functionality from elsewhere
* Grandparent abstract class [HDMFIO](https://github.com/hdmf-dev/hdmf/blob/dev/src/hdmf/backends/io.py) appears to be the final writing class?
* `BuildManager.build` then [calls `TypeMap.build`](https://github.com/hdmf-dev/hdmf/blob/dd39b3878523c4b03f5286fc740752befd192d8b/src/hdmf/build/manager.py#L171) ???
* `TypeMap.build` ...
* gets the [`ObjectMapper`](https://github.com/hdmf-dev/hdmf/blob/dd39b3878523c4b03f5286fc740752befd192d8b/src/hdmf/build/manager.py#L763) which does [god knows what](https://github.com/hdmf-dev/hdmf/blob/dd39b3878523c4b03f5286fc740752befd192d8b/src/hdmf/build/manager.py#L697)
* Calls the [`ObjectMapper.build`](https://github.com/hdmf-dev/hdmf/blob/dd39b3878523c4b03f5286fc740752befd192d8b/src/hdmf/build/objectmapper.py#L700) method
* Which seems to ultimately create a [`DatasetBuilder`](https://github.com/hdmf-dev/hdmf/blob/dev/src/hdmf/build/builders.py#L315) object
* The `DatasetBuilder` is returned to the `BuildManager` which seems to just store it?
* [HDMFIO.write](https://github.com/hdmf-dev/hdmf/blob/dd39b3878523c4b03f5286fc740752befd192d8b/src/hdmf/backends/io.py#L78) then calls `write_builder` to use the builder, which is left unimplemented in the abstract base class
* [HDF5IO.write_builder](https://github.com/hdmf-dev/hdmf/blob/dd39b3878523c4b03f5286fc740752befd192d8b/src/hdmf/backends/hdf5/h5tools.py#L806) implements it for HDF5, which then calls `write_group`, `write_dataset`, `write_link`, depending on the builder types, each of which are extremely heavy methods!
* e.g. [`write_dataset`](https://github.com/hdmf-dev/hdmf/blob/dd39b3878523c4b03f5286fc740752befd192d8b/src/hdmf/backends/hdf5/h5tools.py#L1080) is basically unreadable to me, but seems to implement every type of dataset writing in a single method.
* At this point it is entirely unclear how the schema is involved, but the file is written.

docs/notes/schema.md (new file, 185 additions)

@@ -0,0 +1,185 @@
# Schema Notes
https://schema-language.readthedocs.io/en/latest/
Rough notes kept while thinking about how to translate the schema.
The easiest thing to do seems to just be to make a LinkML schema of the nwb-schema spec itself and then use that to generate Python dataclasses that process the loaded namespaces using mixin methods lol
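A rough sketch of what that meta-schema could look like in LinkML (class and attribute entries are hypothetical, just to illustrate the idea):
```yml
# nwb.schema.yml (hypothetical): a LinkML description of the nwb schema language itself
classes:
  Namespace:
    attributes:
      name:
        range: string
        required: true
      schema:
        range: Schema
        multivalued: true
  Dataset:
    attributes:
      neurodata_type_def:
        range: string
      neurodata_type_inc:
        range: string
      dtype:
        range: string
```
The generated `Namespace`/`Dataset` dataclasses would then carry the translation logic as mixin methods.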
## Overview
We want to translate the schema to LinkML so that we can export to other schema formats,
generate code for dealing with the data, and ultimately make it interoperable
with other formats.
## Structure
- root is `nwb.namespace.yaml` and imports the rest of the namespaces
- `hdmf-common` is implicitly loaded (`TODO` link to issue)
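For orientation, the top of `nwb.namespace.yaml` looks roughly like this (abbreviated, and the version and file list are from memory, so don't trust the details):
```yml
namespaces:
- name: core
  full_name: NWB core
  doc: NWB namespace
  version: 2.6.0
  schema:
  - namespace: hdmf-common   # pull in another namespace wholesale
  - source: nwb.base         # each source file is a distinct schema
  - source: nwb.file
```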
## Components
The [nwb specification language](https://schema-language.readthedocs.io/en/latest/description.html)
has several components:
- **Namespaces:** top level object
- **Schema:** specified within a `namespaces` object. Each schema is a list of data types
- **Data types:** Each entry in a schema file's top-level lists is a data type. Data types come in three kinds:
  - Groups: generic collections
  - Datasets: like groups, but also describe arrays
  - Links: references to other top-level data types
- **Attributes:** Groups and Datasets, in addition to their default properties, can also have a list named `attributes` that seems to be used like `**kwargs`, but maybe also to specify arrays?
- > The specification of datasets looks quite similar to attributes and groups. Similar to attributes, datasets describe the storage of arbitrary n-dimensional array data. However, in contrast to attributes, datasets are not associated with a specific parent group or dataset object but are (similar to groups) primary data objects (and as such typically manage larger data than attributes)
The components, in turn:
- Groups and Datasets are recursive: i.e. groups and datasets can have groups and datasets
- and also links (but the recursive part is just the group or dataset being linked to)
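For example, an attribute hanging off a dataset, abridged from the core schema's `ElectricalSeries` (quoted from memory, so approximate):
```yml
- neurodata_type_def: ElectricalSeries
  neurodata_type_inc: TimeSeries
  doc: A time series of acquired voltage data.
  datasets:
  - name: data
    dtype: numeric
    doc: The recorded voltage data.
    attributes:
    - name: unit
      dtype: text
      value: volts
      doc: Base unit of measurement, fixed to volts.
```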
## Properties
**`dtype`** defines the storage type of the given "data type," which we'll also start calling "class" because the terminology is getting confusing.
A dtype can be:
- unset, in which case the "data type"/"class" becomes a group of datasets
- a string (a flat dtype like `int32` or `text`)
- a list of dtypes: single-layer recursion, aka a compound dtype (see the sketch after this list)
- a dictionary defining a "reference":
  - `target_type`: the type of the object the reference points to
  - `reftype`: the kind of reference being made, `ref`/`reference`/`object` (all equivalent) or `region` for a subset of the referred object
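Two of the non-string cases, paraphrased from memory (a compound dtype like the core schema's `TimeSeriesReferenceVectorData`, and a reference dtype like `hdmf-common`'s `DynamicTableRegion`):
```yml
# a list of dtypes, i.e. a compound dtype
dtype:
- name: idx_start
  dtype: int32
  doc: Start index into the referenced TimeSeries data
- name: count
  dtype: int32
  doc: Number of samples in the selection
---
# a reference dtype: a dictionary with target_type and reftype
dtype:
  target_type: DynamicTable
  reftype: region
```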
**`dims`** defines the axis names and **`shape`** the possible shapes of an array. The structure of the two has to match, e.g.:
```yml
- neurodata_type_def: Image
  neurodata_type_inc: NWBData
  dtype: numeric
  dims:
  - - x
    - y
  - - x
    - y
    - r, g, b
  - - x
    - y
    - r, g, b, a
  shape:
  - - null
    - null
  - - null
    - null
    - 3
  - - null
    - null
    - 4
```
Can a compound dtype be used with multiple dims? If dtype also controls the shape of the data type (e.g. the tabular data example with a bigass dtype), then what are dims?
It seems like when `dtype` is specified together with `dims` the object is treated as an array, and otherwise as a scalar.
### Inheritance
- `neurodata_type_def` - defines a new data type
- `neurodata_type_inc` - includes/inherits from another data type within the namespace
Both are optional. Inheritance and instantiation appear to be conflated here (the four combinations are sketched below):
- `(def unset/inc unset)` - untyped data type? Seems to be because "datasets" are recursive, so the actual numerical arrays are "datasets" but so are the top-level classes. But can datasets truly be recursive? I think the HDF5 implementation probably means that untyped datasets are terminal - i.e. untyped datasets cannot contain datasets. Maybe?
- `(def set /inc unset)` - new data type
- `(def set /inc set )` - inheritance
- `(def unset/inc set )` - instantiate???
If no new type is defined, the "data type" has the "data type" of the `inc`luded type?
I believe this means that including without defining is instantiating the type, hence the need for a unique name. Otherwise, the "name" is presumably the name of the type?
Does overriding a dataset or group from the parent class ... override it? Or add to it? Or does it need to be validated against the parent dataset schema?
Instantiation as a group can be used to indicate an arbitrary number of instances of a dataset; not sure how that's distinct from `dtype` and `dims` yet.
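The four combinations, sketched with illustrative entries (names loosely based on the core schema, not checked against it):
```yml
groups:
# def set / inc unset: define a brand-new type
- neurodata_type_def: NWBContainer

# def set / inc set: define a new type that inherits from another
- neurodata_type_def: TimeSeries
  neurodata_type_inc: NWBDataInterface

# def unset / inc set: include/instantiate an existing type,
# hence the need for a unique name
- neurodata_type_inc: DynamicTable
  name: electrodes

# def unset / inc unset: an untyped, named group
- name: general
```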
## Mappings
What can be restructured to fit LinkML
we need to map:
- Namespaces: seem to operate like separate schema? Then within a namespace the
rest are top-level objects
- Inheritance: NWB has an odd inheritance system, where the same syntax is used for
inheritance, mixins, type declaration, and inclusion.
- `neurodata_type_inc` -> `is_a`
- Groups:
- Slots: Lots of properties are reused in the nwb spec, and LinkML lets us separate these out as slots
- `quantity` needs a manual map
- dims, shape, and dtypes: these should just have been attributes rather than part of the spec language, so we'll just make an Array class and use that (a possible LinkML rendering is sketched at the end of this section)
- dims and shape should probably be a dictionary so you don't need a zillion nulls, e.g. rather than
```yml
dims:
- - x
  - y
- - x
  - y
  - r, g, b
shape:
- - null
  - null
- - null
  - null
  - 3
```
do
```yml
dims:
- - name: x
  - name: y
- - name: x
  - name: y
  - name: r, g, b
    shape: 3
```
or even
```yml
dims:
- - x
  - y
- - x
  - y
  - name: r, g, b
    shape: 3
```
And is there any case with some odd dependency between dims that would break, where it wouldn't work to just use an `optional` param?
```yml
dims:
- name: x
  shape: null
- name: y
  shape: null
- name: r, g, b
  shape: 3
  optional: true
```
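Putting the mappings together, a guess at how the `Image` type from above could come out in LinkML, assuming the Array class described earlier (all names hypothetical):
```yml
classes:
  Image:
    is_a: NWBData          # neurodata_type_inc -> is_a
    attributes:
      array:
        range: ImageArray  # hypothetical generated Array class
                           # carrying dtype, dims, and shape
        required: false    # a manual quantity map, roughly:
                           # '?' -> required: false, '*'/'+' -> multivalued: true
```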
## Parsing
- Given a `nwb.schema.yml` meta-schema that defines the types of objects in nwb schema...
- The top level of an NWB schema is a `namespaces` object
  - each file specified in the `namespaces.schema` array is a distinct schema
    - that inherits the ...
- `groups`
  - Top level lists are parsed as "groups" (see the fragment below)
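For reference, a schema file's top level is just lists of data types keyed by kind, roughly (abridged from `nwb.base.yaml`, from memory):
```yml
datasets:
- neurodata_type_def: NWBData
  neurodata_type_inc: Data
  doc: An abstract data type for a dataset.
groups:
- neurodata_type_def: NWBContainer
  neurodata_type_inc: Container
  doc: An abstract data type for a generic container.
```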
## Special Types
Holy hell, it appears as if `hdmf-common` is all special cases. E.g. `DynamicTable` .... is like a parallel implementation of links and references???

docs/notes/storage.md (new file, 3 additions)

@@ -0,0 +1,3 @@
# NWB Storage
https://nwb-storage.readthedocs.io/en/latest/