# Purpose

If [pynwb](https://pynwb.readthedocs.io/en/stable/) already exists, why `nwb_linkml`? Two kinds of reasons:

- using NWB as a test case for a larger infrastructure project, and
- potentially improving the state of NWB itself.

## A Stepping Stone...

In the (word on how and why we are focusing on NWB as part of a larger project)

## Interoperable Schema Language

**We want to make NWB a seed format in an interoperable, peer-to-peer graph of research data.**

NWB is written in its own [{index}`schema language`](https://schema-language.readthedocs.io/en/latest/) (see [the next section](nwb) for more information). It seems to have been created primarily because other schema languages at the time couldn't easily handle array specifications with fine-grained control over numerical format and shape. The schema language is now relatively stable and does what it was designed to do, but being a domain-specific language rather than a general one makes it very difficult to use NWB data alongside other formats.

`nwb_linkml` translates NWB to [LinkML](https://linkml.io/), a schema language for declaring **{index}`Linked Data`** schema. Linked Data schema consist of semantic triples rather than an object hierarchy, and can make use of **controlled vocabularies** to reuse terms and classes from other schemas and ontologies.

## Storage Format Flexibility

**We want to use NWB in lots of different ways.**

NWB as a format is designed for use with multiple storage backends, but patterns and features of HDF5 have made their way into the schema and the schema language, making direct translation to other storage systems difficult. This is a problem for practical use of NWB data, since HDF5 files don't lend themselves to querying across many files --- e.g., to find datasets that share some common piece of metadata, one would first have to download all of them in full. Having a whole hierarchy of data in a single file is convenient in some ways, but it also makes files difficult to share or split between computers, a common need when collecting data across multiple instruments and machines. NWB currently lends itself to being an **archival** format --- where data is converted as a last step before publishing --- rather than an **experimental** or **computational** format that can serve as a convenient container for heterogeneous data during collection and analysis.

The LinkML team has also written a large number of [generators](https://linkml.io/linkml/generators/index.html) that convert LinkML schema to other formats, including JSON Schema, GraphQL, SPARQL, SQL/SQLAlchemy, and {mod}`~nwb_linkml.generators.pydantic`. Since we have to use LinkML in a somewhat nonstandard way to accommodate NWB's arrays, references, and naming conventions, these generators won't be immediately usable, but with some minor modification we should be able to get NWB out of HDF5 files and into other formats.

## Zero-code Schema Extensions

**We want every researcher and every tool to have their own schemas.**

pynwb uses the NWB schema internally, but [schema extensions](https://pynwb.readthedocs.io/en/stable/tutorials/general/extensions.html#sphx-glr-tutorials-general-extensions-py) require a decent amount of adjoining code to use. The underlying hdmf library is relatively complex, so to use a schema extension one must also write the python classes or mappings to python class attributes needed to use it, configure getter and setter methods, write i/o routines, and so on, as in the sketch below.
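For a sense of what that adjoining code looks like, here is a minimal sketch modeled on the pynwb extension tutorial. The `TetrodeSeries` class and `ndx-example` namespace are hypothetical stand-ins, and a real extension also needs its namespace and spec yaml files written and loaded before this runs:

```python
from hdmf.utils import docval, get_docval, popargs
from pynwb import register_class
from pynwb.ecephys import ElectricalSeries


# Register the class against a (hypothetical) extension namespace.
# The "ndx-example" namespace yaml must already be loaded with load_namespaces.
@register_class("TetrodeSeries", "ndx-example")
class TetrodeSeries(ElectricalSeries):
    """An ElectricalSeries extended with a tetrode id."""

    __nwbfields__ = ("trode_id",)

    # docval declares and validates constructor arguments: everything from
    # ElectricalSeries, plus the new field added by the extension.
    @docval(
        *get_docval(ElectricalSeries.__init__),
        {"name": "trode_id", "type": int, "doc": "the tetrode id"},
    )
    def __init__(self, **kwargs):
        trode_id = popargs("trode_id", kwargs)
        super().__init__(**kwargs)
        self.trode_id = trode_id
```

Every new type in an extension needs a class like this, written and versioned by hand alongside the schema itself.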
Since schema extensions are relatively hard to make, to accommodate heterogeneous data NWB uses `DynamicTable`s, which can be given arbitrary new columns. The loose coupling between schema and code has a few impacts:

- Many labs end up with their own independent software library for converting their data into NWB
- Interoperability and meta-analysis suffer because terms are defined ad-hoc and with little discoverability
- Linking and versioning schema is hard, as the schema language doesn't support it, and the code needs to be kept in sync with the schema
- It's hard for tool-builders to implement direct export to NWB while maintaining flexibility in their libraries

Instead, by generating all models directly from schema, and by making use of pydantic and other validation and metaprogramming tools, we want to make it possible for every experiment to have its own schema extension. We want to make experimental data part of the normal social process of sharing results --- translation: we want to be able to put our work in conversation with other related work!

## Pythonic API

**We want NWB to be as simple to use as a python dataclass.**

We think there is room for improvement in NWB's API:

`````{tab-set}
````{tab-item} pynwb

From the ndx-miniscope extension: the extension code is intended to be used like this:

```python
import os

from pynwb import NWBFile, NWBHDF5IO
from pynwb.image import ImageSeries
from natsort import natsorted
from ndx_miniscope.utils import (
    add_miniscope_device,
    get_starting_frames,
    get_timestamps,
    read_miniscope_config,
    read_notes,
)

nwbfile = NWBFile(...)

folder_path = "C6-J588_Disc5/15_03_28/"

# Load the miniscope settings
miniscope_folder_path = "C6-J588_Disc5/15_03_28/Miniscope/"
miniscope_metadata = read_miniscope_config(folder_path=miniscope_folder_path)
# Create the Miniscope device with the microscope metadata and add it to NWB
add_miniscope_device(nwbfile=nwbfile, device_metadata=miniscope_metadata)

# Load the behavioral camera settings
behavcam_folder_path = "C6-J588_Disc5/15_03_28/BehavCam_2/"
behavcam_metadata = read_miniscope_config(folder_path=behavcam_folder_path)
# Create the Miniscope device with the behavioral camera metadata and add it to NWB
add_miniscope_device(nwbfile=nwbfile, device_metadata=behavcam_metadata)

save_path = os.path.join(folder_path, "test_out.nwb")
with NWBHDF5IO(save_path, "w") as io:
    io.write(nwbfile)
```

That uses these underlying functions to handle validation and coercion, and to add the objects to the NWB file:

```python
from copy import deepcopy
from typing import List, Optional

import numpy as np
from pynwb import NWBFile, H5DataIO
from pynwb.image import ImageSeries
from ndx_miniscope import Miniscope


def add_miniscope_device(nwbfile: NWBFile, device_metadata: dict) -> NWBFile:
    """
    Adds a Miniscope device based on provided metadata.
    Can be used to add device for the microscope and the behavioral camera.

    Parameters
    ----------
    nwbfile : NWBFile
        The nwbfile to add the Miniscope device to.
    device_metadata: dict
        The metadata for the device to be added.

    Returns
    -------
    NWBFile
        The NWBFile passed as an input with the Miniscope added.
    """
    device_metadata_copy = deepcopy(device_metadata)
    assert "name" in device_metadata_copy, "'name' is missing from metadata."
    device_name = device_metadata_copy["name"]
    if device_name in nwbfile.devices:
        return nwbfile

    roi = device_metadata_copy.pop("ROI", None)
    if roi:
        device_metadata_copy.update(ROI=[roi["height"], roi["width"]])

    device = Miniscope(**device_metadata_copy)
    nwbfile.add_device(device)

    return nwbfile


def add_miniscope_image_series(
    nwbfile: NWBFile,
    metadata: dict,
    timestamps: np.ndarray,
    image_series_index: int = 0,
    external_files: Optional[List[str]] = None,
    starting_frames: Optional[List[int]] = None,
) -> NWBFile:
    """
    Adds an ImageSeries with a linked Miniscope device based on provided metadata.
    The metadata for the device to be linked should be stored in
    metadata["Behavior"]["Device"].

    Parameters
    ----------
    nwbfile : NWBFile
        The nwbfile to add the image series to.
    metadata: DeepDict
        The metadata storing the necessary metadata for creating the image series
        and linking it to the appropriate device.
    timestamps : np.ndarray
        The timestamps for the behavior movie source.
    image_series_index : int, optional
        The metadata for ImageSeries is a list of the different image series to add.
        Specify which element of the list with this parameter.
    external_files : List[str], optional
        List of external files associated with the ImageSeries.
    starting_frames : List[int], optional
        List of starting frames for each external file.

    Returns
    -------
    NWBFile
        The NWBFile passed as an input with the ImageSeries added.
    """
    assert "Behavior" in metadata, "The metadata for ImageSeries and Device should be stored in 'Behavior'."
    assert (
        "ImageSeries" in metadata["Behavior"]
    ), "The metadata for ImageSeries should be stored in metadata['Behavior']['ImageSeries']."
    assert (
        "Device" in metadata["Behavior"]
    ), "The metadata for Device should be stored in metadata['Behavior']['Device']."

    image_series_kwargs = deepcopy(metadata["Behavior"]["ImageSeries"][image_series_index])
    image_series_name = image_series_kwargs["name"]
    if image_series_name in nwbfile.acquisition:
        return nwbfile

    # Add linked device to ImageSeries
    device_metadata = metadata["Behavior"]["Device"][image_series_index]
    device_name = device_metadata["name"]
    if device_name not in nwbfile.devices:
        add_miniscope_device(nwbfile=nwbfile, device_metadata=device_metadata)
    device = nwbfile.get_device(name=device_name)
    image_series_kwargs.update(device=device)

    assert external_files, "'external_files' must be specified."
    if starting_frames is None and len(external_files) == 1:
        starting_frames = [0]
    assert len(starting_frames) == len(
        external_files
    ), "The number of external files must match the length of 'starting_frames'."
    image_series_kwargs.update(
        format="external",
        external_file=external_files,
        starting_frame=starting_frames,
        timestamps=H5DataIO(timestamps, compression=True),
    )

    image_series = ImageSeries(**image_series_kwargs)
    nwbfile.add_acquisition(image_series)

    return nwbfile
```

````

````{tab-item} nwb_linkml

An example of how we want `nwb_linkml` to work: there are no additional underlying classes or functions to be written, since the pydantic models are directly generated from the schema extension, and `to` and `from` methods are generic across different types of input data (json files, videos). Tool developers can distribute NWB schema that map 1:1 to their output formats, decreasing the need for conversion code.
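For context, a model generated from a miniscope-style extension might look roughly like this sketch; the field names here are illustrative stand-ins, not the actual ndx-miniscope spec:

```python
from typing import List, Optional

from pydantic import BaseModel, Field


class Miniscope(BaseModel):
    """Sketch of a generated device model; real generated models carry
    the full metadata and types declared in the schema extension."""

    name: str = Field(..., description="Name of the device")
    # Illustrative fields only, not the actual ndx-miniscope spec:
    frame_rate: Optional[float] = None
    excitation: Optional[int] = None
    roi: Optional[List[int]] = None
```

The intended usage would then look like this: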
```python
from pathlib import Path

from nwb_linkml.models.miniscope import Miniscope
from nwb_linkml.models.core import ImageSeries, NWBFile

# Load data for miniscope and videos
miniscope = Miniscope.from_json('config.json')

videos = []
for video_path in Path('./my_data/').glob('*.avi'):
    video = ImageSeries.from_video(video_path)
    video.device = miniscope
    videos.append(video)

# Add to file
file = NWBFile.from_hdf('my_data.nwb')
file.devices['my_miniscope'] = miniscope
file.acquisition['my_videos'] = videos
file.save()
```

````
`````
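Because the generated classes are ordinary pydantic models, moving data into the other formats described under Storage Format Flexibility becomes plain serialization. A minimal sketch, continuing the example above and assuming pydantic v2 models:

```python
# Continuing the (aspirational) example above; assumes pydantic v2 models.
print(miniscope.model_dump_json(indent=2))  # instance data as JSON
print(Miniscope.model_json_schema())        # JSON Schema for the class
```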