Type and shape validation and serialization for numpy arrays in pydantic models
Find a file
2024-04-03 20:52:33 -07:00
.github/workflows now linkml 2024-02-05 16:51:50 -08:00
docs moving to src format 2024-04-03 16:34:03 -07:00
src/numpydantic numpy interface and type stub generation 2024-04-03 20:52:33 -07:00
tests numpy interface and type stub generation 2024-04-03 20:52:33 -07:00
.gitignore numpy interface and type stub generation 2024-04-03 20:52:33 -07:00
.readthedocs.yaml oop left default requirements in 2024-02-05 16:37:38 -08:00
LICENSE Initial commit 2024-02-01 17:50:57 -08:00
pdm.lock numpy interface and type stub generation 2024-04-03 20:52:33 -07:00
pyproject.toml numpy interface and type stub generation 2024-04-03 20:52:33 -07:00
README.md README demo :) 2024-02-05 11:50:56 -08:00

numpydantic

PyPI - Version Documentation Status Coverage Status Code style: black

Type and shape validation and serialization for numpy arrays in pydantic models

This package was picked out of nwb-linkml, a translation of the NWB schema language and data format to linkML and pydantic models.

It does two primary things:

  • Provide types - Annotations (based on npytyping) for specifying numpy arrays in pydantic models, and
  • Generate models from LinkML - extend the LinkML pydantic generator to create models that that use the linkml-arrays syntax

Parameterized Arrays

Arrays use the npytying syntax:

from typing import Union
from pydantic import BaseModel
from numpydantic import NDArray, Shape, UInt8, Float, Int

class Image(BaseModel):
    """
    Data values. Data can be in 1-D, 2-D, 3-D, or 4-D. The first dimension should always represent time. This can also be used to store binary data (e.g., image frames). This can also be a link to data stored in an external file.
    """
    array: Union[
        NDArray[Shape["* x, * y"], UInt8],
        NDArray[Shape["* x, * y, 3 rgb"], UInt8],
        NDArray[Shape["* x, * y, 4 rgba"], UInt8],
        NDArray[Shape["* t, * x, * y, 3 rgb"], UInt8],
        NDArray[Shape["* t, * x, * y, 4 rgba"], Float]
    ]

Validation:

import numpy as np
# works
frame_gray = Image(array=np.ones((1280, 720), dtype=np.uint8))
frame_rgb  = Image(array=np.ones((1280, 720, 3), dtype=np.uint8))
frame_rgba = Image(array=np.ones((1280, 720, 4), dtype=np.uint8))
video_rgb  = Image(array=np.ones((100, 1280, 720, 3), dtype=np.uint8))

# fails
wrong_n_dimensions = Image(array=np.ones((1280,), dtype=np.uint8))
wrong_shape = Image(array=np.ones((1280,720,10), dtype=np.uint8))
wrong_type = Image(array=np.ones((1280,720,3), dtype=np.float64))

# shapes and types are checked together
float_video = Image(array=np.ones((100, 1280, 720, 4),dtype=float))
wrong_shape_float_video = Image(array=np.ones((100, 1280, 720, 3),dtype=float))

JSON schema generation:

class MyArray(BaseModel):
  array: NDArray[Shape["2 x, * y, 4 z"], Float]
>>> print(json.dumps(MyArray.model_json_schema(), indent=2))
{
  "properties": {
    "array": {
      "items": {
        "items": {
          "items": {
            "type": "number"
          },
          "maxItems": 4,
          "minItems": 4,
          "type": "array"
        },
        "type": "array"
      },
      "maxItems": 2,
      "minItems": 2,
      "title": "Array",
      "type": "array"
    }
  },
  "required": [
    "array"
  ],
  "title": "MyArray",
  "type": "object"
}

Serialization

class SmolArray(BaseModel):
    array: NDArray[Shape["2 x, 2 y"], Int]

class BigArray(BaseModel):
    array: NDArray[Shape["1000 x, 1000 y"], Int]

Serialize small arrays as lists of lists, and big arrays as a b64-encoded blosc compressed string

>>> smol = SmolArray(array=np.array([[1,2],[3,4]], dtype=int))
>>> big = BigArray(array=np.random.randint(0,255,(1000,1000),int))

>>> print(smol.model_dump_json())
{"array":[[1,2],[3,4]]}
>>> print(big.model_dump_json())
{
  "array": "( long b64 encoded string )",
  "shape": [1000, 1000],
  "dtype": "int64",
  "unpack_fns": ["base64.b64decode", "blosc2.unpack_array2"],
}