mirror of
https://github.com/p2p-ld/nwb-linkml.git
synced 2025-01-09 13:44:27 +00:00
translation notes
This commit is contained in:
parent
baa5471af7
commit
e0dde2c52f
1 changed files with 272 additions and 0 deletions
@ -16,6 +16,278 @@
### Arrays

## Special Cases

### DynamicTable

One of the major special cases in NWB is `DynamicTable`, which is used to hold tabular data
whose columns are not all defined in the base spec.

#### Basic Usage

An example is the `TimeIntervals` neurodata type within `nwb.epoch`:

```yaml
groups:
- neurodata_type_def: TimeIntervals
  neurodata_type_inc: DynamicTable
  doc: A container for aggregating epoch data and the TimeSeries that each epoch applies
    to.
  datasets:
  - name: start_time
    neurodata_type_inc: VectorData
    dtype: float32
    doc: Start time of epoch, in seconds.
  - name: stop_time
    neurodata_type_inc: VectorData
    dtype: float32
    doc: Stop time of epoch, in seconds.
  - name: tags
    neurodata_type_inc: VectorData
    dtype: text
    doc: User-defined tags that identify or categorize events.
    quantity: '?'
  - name: tags_index
    neurodata_type_inc: VectorIndex
    doc: Index for tags.
    quantity: '?'
  - name: timeseries
    neurodata_type_inc: TimeSeriesReferenceVectorData
    doc: An index into a TimeSeries object.
    quantity: '?'
  - name: timeseries_index
    neurodata_type_inc: VectorIndex
    doc: Index for timeseries.
    quantity: '?'
```

Each column of the table is specified as a `VectorData` object,
which creates an implicit `{n<=4}`-dimensional array,
and can optionally have an adjoining `VectorIndex` dataset that has the `VectorData` item as its `target`:

```yaml
- data_type_def: VectorData
  data_type_inc: Data
  doc: ...
  dims:
  - ...
  shape:
  - ...
  attributes:
  - name: description
    dtype: text
    doc: Description of what these vectors represent.

- data_type_def: VectorIndex
  data_type_inc: VectorData
  dtype: uint8
  doc: ...
  dims:
  - num_rows
  shape:
  - null
  attributes:
  - name: target
    dtype:
      target_type: VectorData
      reftype: object
    doc: Reference to the target dataset that this index applies to.
```

The `DynamicTable` also allows for arbitrary additional `VectorData` columns,
where the `name` field is used as an identifier: columns specified in the model have a fixed `name`
given by the schema, but each additional column is identified by its given `name`:

```yaml
- data_type_def: DynamicTable
  data_type_inc: Container
  doc: ...
  attributes:
  - name: colnames
    dtype: text
    dims:
    - num_columns
    shape:
    - null
    doc: The names of the columns in this table. This should be used to specify
      an order to the columns.
  - name: description
    dtype: text
    doc: Description of what is in this dynamic table.
  datasets:
  - name: id
    data_type_inc: ElementIdentifiers
    dtype: int
    dims:
    - num_rows
    shape:
    - null
    doc: Array of unique identifiers for the rows of this dynamic table.
  - data_type_inc: VectorData
    doc: Vector columns, including index columns, of this dynamic table.
    quantity: '*'
```

Here, `colnames` is stored as an array in the metadata attributes of the group,
while the columns themselves are stored as HDF5 datasets.

In the simplest case, this results in a `TimeIntervals` group that looks like this
(abbreviated for clarity):

```
$ h5ls -rv an_nwb_dataset.nwb/trials
/trials Group
    Attribute: colnames {7}
        Type: variable-length null-terminated UTF-8 string
    Attribute: neurodata_type scalar
        Type: variable-length null-terminated UTF-8 string
        Value: TimeIntervals
/trials/id Dataset {121/121}
    Attribute: neurodata_type scalar
        Type: variable-length null-terminated UTF-8 string
        Value: ElementIdentifiers
/trials/start_time Dataset {121/121}
    Attribute: neurodata_type scalar
        Type: variable-length null-terminated UTF-8 string
        Value: VectorData
/trials/stop_time Dataset {121/121}
    Attribute: neurodata_type scalar
        Type: variable-length null-terminated UTF-8 string
        Value: VectorData
/trials/surface_excursion_start_time Dataset {121/121}
    Attribute: neurodata_type scalar
        Type: variable-length null-terminated UTF-8 string
        Value: VectorData
/trials/surface_excursion_stop_time Dataset {121/121}
    Attribute: neurodata_type scalar
        Type: variable-length null-terminated UTF-8 string
        Value: VectorData
/trials/surface_location Dataset {121/121}
    Attribute: neurodata_type scalar
        Type: variable-length null-terminated UTF-8 string
        Value: VectorData
/trials/surface_return_start_time Dataset {121/121}
    Attribute: neurodata_type scalar
        Type: variable-length null-terminated UTF-8 string
        Value: VectorData
/trials/surface_return_stop_time Dataset {121/121}
    Attribute: neurodata_type scalar
        Type: variable-length null-terminated UTF-8 string
        Value: VectorData
```

#### Ragged Tables

`VectorIndex` and `VectorData` pairs can also be used to create ragged arrays, e.g. in the case of the
`Units` model from `nwb.misc`:

```yaml
- neurodata_type_def: Units
  neurodata_type_inc: DynamicTable
  default_name: Units
  doc: Data about spiking units. Event times of observed units (e.g. cell, synapse,
    etc.) should be concatenated and stored in spike_times.
  datasets:
  - name: spike_times_index
    neurodata_type_inc: VectorIndex
    doc: Index into the spike_times dataset.
    quantity: '?'
  - name: spike_times
    neurodata_type_inc: VectorData
    dtype: float64
    doc: Spike times for each unit in seconds.
    quantity: '?'
    attributes:
    - name: resolution
      dtype: float64
      doc: The smallest possible difference between two spike times. Usually 1 divided by the acquisition sampling rate
        from which spike times were extracted, but could be larger if the acquisition time series was downsampled or
        smaller if the acquisition time series was smoothed/interpolated and it is possible for the spike time to be
        between samples.
      required: false
```

In this case, the `spike_times` are stored as a 1-dimensional vector with the spike times for each of the units
concatenated. The `spike_times_index` then stores, for each unit, the index one past that unit's last spike time
(the exclusive end of its slice), such that when one indexes `NWBFile.units[0]` one gets an array of all the
spike times for the `0th` unit.
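
The lookup described above can be sketched in a few lines of plain Python. This is a minimal illustration, not code from `nwb-linkml` or `hdmf`: the helper name `ragged_row` and the sample values are made up, and it assumes the HDMF convention that each `VectorIndex` entry is the exclusive end offset of its row in the flat `VectorData` vector.

```python
def ragged_row(data, index, i):
    """Return row ``i`` of a ragged column stored as flat ``data`` plus end offsets."""
    # Each index entry is the exclusive end of that row's slice;
    # the start is the previous row's end, or 0 for the first row.
    start = index[i - 1] if i > 0 else 0
    return data[start:index[i]]


# Hypothetical example: three units with 2, 0 and 3 spikes, concatenated
spike_times = [0.1, 0.5, 1.2, 1.3, 2.0]
spike_times_index = [2, 2, 5]

rows = [
    ragged_row(spike_times, spike_times_index, i)
    for i in range(len(spike_times_index))
]
print(rows)  # [[0.1, 0.5], [], [1.2, 1.3, 2.0]]
```

Storing end offsets rather than start offsets means the last entry of the index always equals the length of the data vector, and a unit with no spikes simply repeats the previous offset.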

#### Inter-table views

The `DynamicTableRegion` model is a subclass of `VectorData` that refers to rows within another `DynamicTable`.

For example, the `ElectricalSeries` model from `nwb.ecephys` (abbreviated for clarity):

```yaml
- neurodata_type_def: ElectricalSeries
  neurodata_type_inc: TimeSeries
  doc: ...
  datasets:
  - name: data
    dtype: numeric
    dims:
    - ...
    shape:
    - ...
    doc: Recorded voltage data.
    attributes:
    - name: unit
      dtype: text
      value: volts
      doc: ...
  - name: electrodes
    neurodata_type_inc: DynamicTableRegion
    doc: DynamicTableRegion pointer to the electrodes that this time series was generated from.
```

This produces an HDF5 dataset like `/acquisition/{name}/electrodes` that has

- a `table` attribute that is a reference to another dynamic table (e.g. `/general/extracellular_ephys/electrodes`)
- a vector of values that are row indices into that table

so that each channel listed in `electrodes` corresponds to one column of the
`{n_times} x {n_electrodes}` `data` array.
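
As a rough sketch of that relationship, the following plain-Python example resolves a region into table rows and pairs each one with its column of the data array. Everything here is hypothetical: the `location` column, the sample values, and the `resolve_region` helper are illustrative stand-ins, not part of the NWB schema excerpt above or of any API.

```python
# Hypothetical stand-in for /general/extracellular_ephys/electrodes:
# a table stored as named columns of equal length.
electrodes_table = {
    "id": [0, 1, 2, 3],
    "location": ["CA1", "CA1", "CA3", "DG"],
}

# Values of the `electrodes` dataset: row indices into the table above.
region = [2, 0, 3]

# data is {n_times} x {n_electrodes}; column j was recorded on
# the electrode described by table row region[j].
data = [
    [0.1, 0.2, 0.3],
    [0.4, 0.5, 0.6],
]


def resolve_region(table, rows):
    """Replace each row index with the corresponding table row as a dict."""
    return [{name: column[r] for name, column in table.items()} for r in rows]


channels = resolve_region(electrodes_table, region)
for j, channel in enumerate(channels):
    trace = [sample[j] for sample in data]
    print(channel["location"], trace)
```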

#### Implicit Behavior

- A `VectorIndex` does not need to explicitly refer to a `VectorData` column using the `target` attribute,
  but can be implicitly linked by being named `{VectorData.name}_index`
- When indexing a `DynamicTable`, the result that is returned with `DynamicTable.column_name[0]`
  is actually the `VectorIndex`ed view into the `VectorData` column, rather than the `VectorData` column itself
- References through `DynamicTableRegion` are similarly resolved by the API, replacing the row indices with
  the values from the referenced tables and datasets.
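
The name-based linking in the first point can be sketched as follows. The function `pair_index_columns` is a hypothetical helper written for this note, not something provided by `hdmf` or `nwb-linkml`; it only demonstrates the `{name}_index` naming rule.

```python
def pair_index_columns(column_names):
    """Map each data column to the name of its implicit index column, if present."""
    names = set(column_names)
    pairs = {}
    for name in column_names:
        if name.endswith("_index") and name[: -len("_index")] in names:
            # This is an index column; it is reported alongside its target.
            continue
        index_name = f"{name}_index"
        pairs[name] = index_name if index_name in names else None
    return pairs


print(pair_index_columns(["start_time", "tags", "tags_index"]))
# {'start_time': None, 'tags': 'tags_index'}
```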

#### Implementation

When translating from nwb-schema-language to linkml we...

```{todo}
Link to relevant adapter classes
```

- Interpret `VectorData` as regular array-like slots if they have no additional attributes, or as subclasses when they do
- Replace all the special reference notation with `range: Class` annotations that directly refer to the classes being linked to

When generating pydantic models we...

- Include a special :class:`~nwb_linkml.includes.hdmf.DynamicTableMixin` in the generated `hdmf_common.table` module and
  replace the configured base model
- Since `linkml` doesn't have the notion of "arbitrary additional slots of this type" differentiated by a `name`,
  the Mixin class reconfigures the model to allow for `extra` fields.
- The mixin then has model-level validation routines to verify that the columns are of equal length
- The mixin also provides the accessor magic methods for indexing as usual.
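
The behavior listed above can be illustrated with a plain-Python stand-in. This is deliberately not the real `DynamicTableMixin` and does not use pydantic: the class `TableSketch` and its methods are hypothetical, and only mimic the three ideas of accepting extra named columns, checking that they have equal length, and indexing by column name or row number.

```python
class TableSketch:
    """Plain-Python stand-in for a table model that accepts extra named columns."""

    def __init__(self, colnames, **columns):
        # Arbitrary keyword arguments play the role of `extra` fields.
        self.colnames = list(colnames)
        self.columns = columns
        self._validate()

    def _validate(self):
        # Model-level check: every named column is present and all share one length.
        missing = [name for name in self.colnames if name not in self.columns]
        if missing:
            raise ValueError(f"columns listed in colnames but not provided: {missing}")
        lengths = {name: len(col) for name, col in self.columns.items()}
        if len(set(lengths.values())) > 1:
            raise ValueError(f"columns are not of equal length: {lengths}")

    def __getitem__(self, key):
        # A string selects a column; an integer selects a row across all columns.
        if isinstance(key, str):
            return self.columns[key]
        return {name: self.columns[name][key] for name in self.colnames}


trials = TableSketch(
    colnames=["start_time", "stop_time"],
    start_time=[0.0, 1.5],
    stop_time=[1.0, 2.5],
)
print(trials[1])  # {'start_time': 1.5, 'stop_time': 2.5}
```

In a pydantic model the same checks would more naturally live in the model configuration and a model-level validator; the sketch only shows the logic, not how the generated models implement it.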

### References

There are several different ways to create references between objects in nwb/hdmf:

- ...

## LinkML to Everything

How to generalize to linked data triples.