diff --git a/docs/changelog.md b/docs/changelog.md
index 684602d..af6375e 100644
--- a/docs/changelog.md
+++ b/docs/changelog.md
@@ -2,6 +2,73 @@
 
 ## 1.*
 
+### 1.6.*
+
+#### 1.6.0 - 24-09-23 - Roundtrip JSON Serialization
+
+Roundtrip JSON serialization is here - with serialization to lists of lists,
+as well as file references that don't require copying the whole array if
+used in data modeling, control over path relativization, and stamping of
+interface version for the extra provenance-conscious.
+
+Please see [serialization](./serialization.md) for narrative documentation :)
+
+**Potentially Breaking Changes**
+- See [development](./development.md) for a statement about API stability
+- An additional {meth}`.Interface.deserialize` method has been added and is now called
+  from {meth}`.Interface.validate`. Downstream users are not intended to override the
+  `validate` method, but if they have, then JSON deserialization will not work for them.
+- `Interface` subclasses now require a `name` attribute, a short string identifier for that interface,
+  and a `json_model` that inherits from {class}`.interface.JsonDict`. Interfaces without
+  these attributes will not be able to be instantiated.
+- {meth}`.Interface.to_json` is now an abstract method that all interfaces must define.
+
+**Features**
+- Roundtrip JSON serialization - by default, arrays are dumped as lists of lists, but
+  the `round_trip` keyword in `model_dump_json` enables provenance-preserving dumps
+- JSON Schema generation has been separated from `core_schema` generation in {class}`.NDArray`.
+  Downstream interfaces can customize JSON schema generation without compromising their ability to validate.
+- All proxy classes must have an `__eq__` dunder method to compare equality -
+  in proxy classes, these compare equality of arguments, since the arrays that
+  are referenced on disk should be equal by definition. Direct array comparison
+  should use {func}`numpy.array_equal`.
+- Interfaces previously couldn't be instantiated without explicit shape and dtype arguments;
+  these now default to `Any`.
+- New {mod}`numpydantic.serialization` module to contain serialization logic.
+
+**New Classes**
+See the docstrings for descriptions of each class
+- `MarkMismatchError` for when an array serialized with `mark_interface` doesn't match
+  the interface that's deserializing it
+- {class}`.interface.InterfaceMark`
+- {class}`.interface.MarkedJson`
+- {class}`.interface.JsonDict`
+  - {class}`.dask.DaskJsonDict`
+  - {class}`.hdf5.H5JsonDict`
+  - {class}`.numpy.NumpyJsonDict`
+  - {class}`.video.VideoJsonDict`
+  - {class}`.zarr.ZarrJsonDict`
+
+**Bugfix**
+- [`#17`](https://github.com/p2p-ld/numpydantic/issues/17) - Arrays are re-validated as lists, rather than arrays
+- Some proxy classes would fail to be serialized because they lacked an `__array__` method.
+  `__array__` methods have been added, along with tests that coerce proxies to arrays to prevent regression.
+- Some proxy classes lacked a `__name__` attribute, which caused failures to serialize
+  when the `__getattr__` methods attempted to pass it through. These have been added where needed.
+
+**Docs**
+- Add statement about versioning and API stability to [development](./development.md)
+- Add docs for serialization!
+- Remove stranded docs from hooks and monkeypatch
+- Add `myst_nb` to docs dependencies for direct rendering of code and output
+
+**Tests**
+- Marks have been added for running subsets of the tests for a given interface,
+  package feature, etc.
+- Tests for all the above functionality
+
+
+
 ### 1.5.*
 
 #### 1.5.3 - 24-09-03 - Bugfix, type checking for empty HDF5 datasets
diff --git a/docs/serialization.md b/docs/serialization.md
index 5812f30..ccad606 100644
--- a/docs/serialization.md
+++ b/docs/serialization.md
@@ -110,14 +110,17 @@ as `int` ({class}`numpy.int64`) or `float` ({class}`numpy.float64`)
 ## Roundtripping
 
 To roundtrip make arrays round-trippable, use the `round_trip` argument
-to {func}`~pydantic.BaseModel.model_dump_json`
+to {func}`~pydantic.BaseModel.model_dump_json`.
+All of the following should return an equivalent array from the same
+file, etc. as the source array when loaded with
+{func}`~pydantic.BaseModel.model_validate_json`.
 
 ```{code-cell}
 print_json(model.model_dump_json(round_trip=True))
 ```
 
 
-Each interface should[^notenforced] implement a dataclass that describes a
+Each interface must implement a dataclass that describes a
 json-able roundtrip form (see {class}`.interface.JsonDict`).
 
 That dataclass then has a {meth}`JsonDict.is_valid` method that checks
@@ -220,12 +223,34 @@ print_json(
 ))
 ```
 
+When an array marked with the interface is deserialized,
+it short-circuits the {meth}`.Interface.match` method,
+attempting to directly return the indicated interface as long as the
+array dumped in `value` still satisfies that interface's {meth}`.Interface.check`
+method. Arrays dumped *without* `round_trip=True` might *not* validate with
+the originating model, even when marked; e.g., an array dumped without `round_trip`
+will be revalidated as a numpy array for the same reasons it is everywhere else,
+since all connection to the source file is lost.
+
+```{todo}
+Currently, the version of the package the interface is from (usually `numpydantic`)
+will be stored, but there is no means of resolving it on the fly.
+If there is a mismatch between the marked interface description and the interface
+that was matched on revalidation, a warning is emitted, but validation
+attempts to proceed as normal.
+
+This feature is for extra-verbose provenance, rather than airtight serialization
+and deserialization, but PRs are welcome if you would like to make it so.
+```
+
 ```{todo}
 We will also add a separate `mark_version` parameter for marking
 the specific version of the relevant interface package, like `zarr`, or `numpy`,
 patience.
 ```
 
+
+
 ## Context parameters
 
 A reference listing of all the things that can be passed to
@@ -305,9 +330,3 @@ print_json(data)
 
 [^normalstyle]: o ya we're posting JSON [normal style](https://normal.style)
 
-[^notenforced]: This is only *functionally* enforced at the moment, where
-    a roundtrip test confirms that dtype and type are preserved,
-    but there is no formal test for each interface having its own serialization class
-
-
-
diff --git a/pyproject.toml b/pyproject.toml
index a53ef1a..0e6926b 100644
--- a/pyproject.toml
+++ b/pyproject.toml
@@ -1,6 +1,6 @@
 [project]
 name = "numpydantic"
-version = "1.5.3"
+version = "1.6.0"
 description = "Type and shape validation and serialization for arbitrary array types in pydantic models"
 authors = [
     {name = "sneakers-the-rat", email = "sneakers-the-rat@protonmail.com"},
diff --git a/tests/conftest.py b/tests/conftest.py
index 0467f25..c9035f4 100644
--- a/tests/conftest.py
+++ b/tests/conftest.py
@@ -1,4 +1,3 @@
-import pdb
 import sys
 
 import pytest
diff --git a/tests/test_ndarray.py b/tests/test_ndarray.py
index 7be03bb..cda092c 100644
--- a/tests/test_ndarray.py
+++ b/tests/test_ndarray.py
@@ -1,5 +1,3 @@
-import pdb
-
 import pytest
 
 from typing import Union, Optional, Any
diff --git a/tests/test_serialization.py b/tests/test_serialization.py
index dc0ef06..702dc1a 100644
--- a/tests/test_serialization.py
+++ b/tests/test_serialization.py
@@ -3,8 +3,6 @@ Test serialization-specific functionality that doesn't
 need to be applied across every interface (use test_interface/test_interfaces for that
 """
 
-import pdb
-
 import h5py
 import pytest
 from pathlib import Path
diff --git a/tests/test_shape.py b/tests/test_shape.py
index 03de477..b521054 100644
--- a/tests/test_shape.py
+++ b/tests/test_shape.py
@@ -1,5 +1,3 @@
-import pdb
-
 import pytest
 
 from typing import Any
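The marked-interface short-circuit described in the serialization docs in this diff can be sketched in plain Python. This is a minimal, hypothetical illustration, not numpydantic's implementation: the `Interface` dataclass, `match_interface`, `deserialize_marked`, and the simplified `{"interface": {"name": ...}, "value": ...}` payload are all assumptions made for the example. It shows the mark being trusted when the marked interface's check still passes, and a fallback to normal matching, with a warning, when it does not.

```python
import warnings
from dataclasses import dataclass
from typing import Any, Callable, Dict, List


@dataclass
class Interface:
    """Hypothetical stand-in for an array interface: a short name plus a check() predicate."""

    name: str
    check: Callable[[Any], bool]


def match_interface(value: Any, interfaces: List[Interface]) -> Interface:
    """Normal matching: return the first interface whose check() accepts the value."""
    for iface in interfaces:
        if iface.check(value):
            return iface
    raise ValueError(f"No interface matched {value!r}")


def deserialize_marked(data: Dict[str, Any], interfaces: List[Interface]) -> Interface:
    """Choose an interface for a simplified, hypothetical marked payload.

    Assumed shape: {"interface": {"name": ...}, "value": ...}
    """
    value = data["value"]
    marked_name = data.get("interface", {}).get("name")
    marked = {iface.name: iface for iface in interfaces}.get(marked_name)

    # Short-circuit: skip matching entirely if the marked interface still accepts the value.
    if marked is not None and marked.check(value):
        return marked

    # Otherwise fall back to normal matching, warning if the result disagrees with the mark.
    matched = match_interface(value, interfaces)
    if marked_name is not None and matched.name != marked_name:
        warnings.warn(f"Value was marked as {marked_name!r} but matched {matched.name!r}")
    return matched


# Toy interfaces: "numpy" accepts nested lists, "hdf5" accepts a file-reference dict.
numpy_like = Interface("numpy", lambda v: isinstance(v, list))
hdf5_like = Interface("hdf5", lambda v: isinstance(v, dict) and "file" in v)
interfaces = [numpy_like, hdf5_like]

# A round_trip-style dump keeps its file reference, so the mark is honored directly.
print(deserialize_marked({"interface": {"name": "hdf5"}, "value": {"file": "data.h5"}}, interfaces).name)

# A plain list dump no longer satisfies "hdf5": it is matched as "numpy" and a warning is emitted.
print(deserialize_marked({"interface": {"name": "hdf5"}, "value": [[1, 2], [3, 4]]}, interfaces).name)
```

The second call mirrors the case in the docs: a value dumped without `round_trip=True` has lost its connection to the source file, so the marked interface's check fails and ordinary matching takes over.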