numpydantic/docs/design.md

2.2 KiB

Design

Why do this?

We want to bring the tidyness of modeling data with pydantic to the universe of software that uses arrays - particularly formats and packages that need to be very particular about what kind of arrays they are able to handle or match a specific schema.

To support a new generation of data formats and data analysis libraries that can model the structure of data independently from its implementation, we made numpydantic as a bridge between abstract schemas and programmatic use.

Challenges

The Python type annotation system is weird and not like the rest of Python! (at least until PEP 0649 gets mainlined). Similarly, Pydantic 2's core_schema system is wonderful but still has a few mysteries lurking under the documented surface. This package does the work of plugging them in together to make some kind of type validation frankenstein.

The first problem is that type annotations are evaluated statically by python, mypy, etc. This means you can't use typical python syntax for declaring types - it has to be present at the time __new__ is called, rather than __init__. So

Different implementations of arrays behave differently! HDF5 files need to be carefully opened and closed to avoid corruption, video files don't typically allow normal array slicing operations, and only some array libraries support lazy loading of arrays on disk.

We can't anticipate all the possible array libraries that exist now or in the future, so it has to be possible to extend support to them without needing to go through a potentially lengthy contribution process.

Strategy

Numpydantic uses {class}~numpydantic.NDArray as an abstract specification of an array that uses one of several interface classes to validate and interact with an array. These interface classes will set the instance attribute either as the passed array itself, or a transparent proxy class (eg. {class}~numpydantic.interface.hdf5.H5Proxy) in the case that the native array format doesn't support numpy-like array operations out of the box.

  • type hinting
  • nptyping syntax
  • not trying to be an array library
  • dtyping, mapping & schematization