diff --git a/README.md b/README.md index b7ceeff..5318eab 100644 --- a/README.md +++ b/README.md @@ -13,7 +13,9 @@ A python package for specifying, validating, and serializing arrays with arbitra but ... -3) if you try and specify an array in pydantic, this happens: +3) Typical type annotations would only work for a single array library implementation +4) They wouldn’t allow you to specify array shapes and dtypes, and +5) If you try and specify an array in pydantic, this happens: ```python >>> from pydantic import BaseModel @@ -27,8 +29,69 @@ Set `arbitrary_types_allowed=True` in the model_config to ignore this error or implement `__get_pydantic_core_schema__` on your type to fully support it. ``` -And setting `arbitrary_types_allowed = True` still prohibits you from -generating JSON Schema, serialization to JSON +**Solution** + +Numpydantic allows you to do this: + +```python +from pydantic import BaseModel +from numpydantic import NDArray, Shape + +class MyModel(BaseModel): + array: NDArray[Shape["3 x, 4 y, * z"], int] +``` + +And use it with your favorite array library: + +```python +import numpy as np +import dask.array as da +import zarr + +# numpy +model = MyModel(array=np.zeros((3, 4, 5), dtype=int)) +# dask +model = MyModel(array=da.zeros((3, 4, 5), dtype=int)) +# hdf5 datasets +model = MyModel(array=('data.h5', '/nested/dataset')) +# zarr arrays +model = MyModel(array=zarr.zeros((3,4,5), dtype=int)) +model = MyModel(array='data.zarr') +model = MyModel(array=('data.zarr', '/nested/dataset')) +# video files +model = MyModel(array="data.mp4") +``` + +`numpydantic` supports pydantic but none of its behavior is dependent on it! +Use the `NDArray` type annotation like a regular type outside +of pydantic -- eg. to validate an array anywhere, use `isinstance`: + +```python +array_type = NDArray[Shape["1, 2, 3"], int] +isinstance(np.zeros((1,2,3), dtype=int), array_type) +# True +isinstance(zarr.zeros((1,2,3), dtype=int), array_type) +# True +isinstance(np.zeros((4,5,6), dtype=int), array_type) +# False +isinstance(np.zeros((1,2,3), dtype=float), array_type) +# False +``` + +Or use it as a convenient callable shorthand for validating and working with +array types that usually don't have an array-like API. + +```python +>>> rgb_video_type = NDArray[Shape["* t, 1920 x, 1080 y, 3 rgb"], np.uint8] +>>> video = rgb_video_type('data.mp4') +>>> video.shape +(10, 1920, 1080, 3) +>>> video[0, 0:3, 0:3, 0] +array([[0, 0, 0], + [0, 0, 0], + [0, 0, 0]], dtype=uint8) +``` + ## Features: - **Types** - Annotations (based on [npytyping](https://github.com/ramonhagenaars/nptyping)) @@ -44,6 +107,9 @@ generating JSON Schema, serialization to JSON recreate the model in the native format - **Schema Generation** - Correct JSON Schema for arrays, complete with shape and dtype constraints, to make your models interoperable +- **Fast** - The validation codepath is careful to take quick exits and not perform unnecessary work, + and interfaces use whatever tools available to validate against array metadata and lazy load to avoid + expensive i/o operations. Our goal is to make numpydantic a tool you don't ever need to think about. Coming soon: - **Metadata** - This package was built to be used with [linkml arrays](https://linkml.io/linkml/schemas/arrays.html), @@ -77,6 +143,10 @@ pip intsall 'numpydantic[array]' ## Usage +> [!TIP] +> The README is just a sample! See the full documentation at +> https://numpydantic.readthedocs.io + Specify an array using [nptyping syntax](https://github.com/ramonhagenaars/nptyping/blob/master/USERDOCS.md) and use it with your favorite array library :) diff --git a/docs/index.md b/docs/index.md index bced657..127719f 100644 --- a/docs/index.md +++ b/docs/index.md @@ -57,7 +57,8 @@ model = MyModel(array=('data.zarr', '/nested/dataset')) model = MyModel(array="data.mp4") ``` -And use the `NDArray` type annotation like a regular type outside +`numpydantic` supports pydantic but none of its behavior is dependent on it! +Use the `NDArray` type annotation like a regular type outside of pydantic -- eg. to validate an array anywhere, use `isinstance`: ```python @@ -72,9 +73,26 @@ isinstance(np.zeros((1,2,3), dtype=float), array_type) # False ``` +Or use it as a convenient callable shorthand for validating and working with +array types that usually don't have an array-like API. + +```python +>>> rgb_video_type = NDArray[Shape["* t, 1920 x, 1080 y, 3 rgb"], np.uint8] +>>> video = rgb_video_type('data.mp4') +>>> video.shape +(10, 1920, 1080, 3) +>>> video[0, 0:3, 0:3, 0] +array([[0, 0, 0], + [0, 0, 0], + [0, 0, 0]], dtype=uint8) +``` + ```{note} `NDArray` can't do validation with static type checkers yet, see -{ref}`design_challenges` and {ref}`type_checkers` +{ref}`design_challenges` and {ref}`type_checkers` . + +Converting the `NDArray` type away from the inherited `nptyping` +class towards a proper generic is the top development priority for `v2.0.0` ``` ## Features: @@ -90,6 +108,9 @@ isinstance(np.zeros((1,2,3), dtype=float), array_type) recreate the model in the native format. Full roundtripping is supported :) - **Schema Generation** - Correct JSON Schema for arrays, complete with shape and dtype constraints, to make your models interoperable +- **Fast** - The validation codepath is careful to take quick exits and not perform unnecessary work, + and interfaces use whatever tools available to validate against array metadata and lazy load to avoid + expensive i/o operations. Our goal is to make numpydantic a tool you don't ever need to think about. Coming soon: - **Metadata** - This package was built to be used with [linkml arrays](https://linkml.io/linkml/schemas/arrays.html), diff --git a/tests/fixtures.py b/tests/fixtures.py index 89359ae..39bb5c9 100644 --- a/tests/fixtures.py +++ b/tests/fixtures.py @@ -173,8 +173,8 @@ def zarr_array(tmp_output_dir_func) -> Path: @pytest.fixture(scope="function") -def avi_video(tmp_path) -> Callable[[Tuple[int, int], int, bool], Path]: - video_path = tmp_path / "test.avi" +def avi_video(tmp_output_dir_func) -> Callable[[Tuple[int, int], int, bool], Path]: + video_path = tmp_output_dir_func / "test.avi" def _make_video(shape=(100, 50), frames=10, is_color=True) -> Path: writer = cv2.VideoWriter(