changelog, bump version, remove pdb

sneakers-the-rat 2024-09-23 18:15:10 -07:00
parent 16b0eb0542
commit 0a175d17c0
Signed by untrusted user who does not match committer: jonny
GPG key ID: 6DCB96EF1E4D232D
7 changed files with 95 additions and 16 deletions

@ -2,6 +2,73 @@
## 1.*
### 1.6.*
#### 1.6.0 - 24-09-23 - Roundtrip JSON Serialization
Roundtrip JSON serialization is here - with serialization to list of lists,
as well as file references that don't require copying the whole array if
used in data modeling, control over path relativization, and stamping of
interface version for the extra provenance conscious.
Please see [serialization](./serialization.md) for narrative documentation :)
**Potentially Breaking Changes**
- See [development](./development.md) for a statement about API stability
- A {meth}`.Interface.deserialize` step has been added to
{meth}`.Interface.validate`. Downstream users are not intended to override the
`validate` method, but if they have, then JSON deserialization will not work for them.
- `Interface` subclasses now require a `name` attribute, a short string identifier for that interface,
and a `json_model` that inherits from {class}`.interface.JsonDict`. Interfaces without
these attributes will not be able to be instantiated.
- {meth}`.Interface.to_json` is now an abstract method that all interfaces must define.
**Features**
- Roundtrip JSON serialization - by default dump to a list of list arrays, but
support the `round_trip` keyword in `model_dump_json` for provenance-preserving dumps
- JSON Schema generation has been separated from `core_schema` generation in {class}`.NDArray`.
Downstream interfaces can customize json schema generation without compromising ability to validate.
- All proxy classes must have an `__eq__` dunder method to compare equality -
in proxy classes, these compare equality of arguments, since the arrays that
are referenced on disk should be equal by definition. Direct array comparison
should use {func}`numpy.array_equal`
- Interfaces previously couldn't be instantiated without explicit shape and dtype arguments;
these have been given `Any` defaults.
- New {mod}`numpydantic.serialization` module to contain serialization logic.
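
The proxy `__eq__` behavior described above can be sketched roughly as follows. This is a minimal illustration that assumes a proxy is identified by the file and path it points to; `ArrayProxy` and its attributes are hypothetical names, not classes from numpydantic.

```python
from typing import Any


class ArrayProxy:
    """Hypothetical proxy for an on-disk array (not a numpydantic class)."""

    def __init__(self, file: str, path: str) -> None:
        self.file = file  # file that holds the array
        self.path = path  # location of the array inside that file

    def __eq__(self, other: Any) -> bool:
        # Compare the arguments that locate the array, not the array contents.
        if not isinstance(other, ArrayProxy):
            return NotImplemented
        return (self.file, self.path) == (other.file, other.path)

    def __hash__(self) -> int:
        # Keep hashing consistent with the argument-based equality above.
        return hash((self.file, self.path))


a = ArrayProxy("data.h5", "/data")
b = ArrayProxy("data.h5", "/data")
c = ArrayProxy("data.h5", "/other")
```

Here `a == b` holds because both proxies reference the same location, while `a == c` does not. Comparing the array values themselves would still go through {func}`numpy.array_equal`.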
**New Classes**
See the docstrings for descriptions of each class
- `MarkMismatchError` for when an array serialized with `mark_interface` doesn't match
the interface that's deserializing it
- {class}`.interface.InterfaceMark`
- {class}`.interface.MarkedJson`
- {class}`.interface.JsonDict`
- {class}`.dask.DaskJsonDict`
- {class}`.hdf5.H5JsonDict`
- {class}`.numpy.NumpyJsonDict`
- {class}`.video.VideoJsonDict`
- {class}`.zarr.ZarrJsonDict`
**Bugfix**
- [`#17`](https://github.com/p2p-ld/numpydantic/issues/17) - Arrays are re-validated as lists, rather than arrays
- Some proxy classes would fail to be serialized because they lacked an `__array__` method.
`__array__` methods have been added, along with tests for coercing to an array to prevent regressions.
- Some proxy classes lacked a `__name__` attribute, which caused failures to serialize
when the `__getattr__` methods attempted to pass it through. These have been added where needed.
**Docs**
- Add statement about versioning and API stability to [development](./development.md)
- Add docs for serialization!
- Remove stranded docs from hooks and monkeypatch
- Added `myst_nb` to docs dependencies for direct rendering of code and output
**Tests**
- Marks have been added for running subsets of the tests for a given interface,
package feature, etc.
- Tests for all the above functionality
### 1.5.*
#### 1.5.3 - 24-09-03 - Bugfix, type checking for empty HDF5 datasets

@ -110,14 +110,17 @@ as `int` ({class}`numpy.int64`) or `float` ({class}`numpy.float64`)
## Roundtripping
To make arrays round-trippable, use the `round_trip` argument
to {func}`~pydantic.BaseModel.model_dump_json`
to {func}`~pydantic.BaseModel.model_dump_json`.
All the following should return an equivalent array from the same
file/etc. as the source array when using
{func}`~pydantic.BaseModel.model_validate_json`.
```{code-cell}
print_json(model.model_dump_json(round_trip=True))
```
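
As a rough illustration of the difference between the default dump and a round-trip dump, the sketch below uses only the standard library. `FileBackedArray`, its fields, and the `"file_backed"` type tag are hypothetical stand-ins rather than numpydantic classes or its actual JSON layout: the default dump copies the values into a list of lists, while the round-trip dump keeps a reference to the source.

```python
import json
from dataclasses import dataclass


@dataclass
class FileBackedArray:
    """Hypothetical stand-in for an array stored in a file (not a numpydantic class)."""

    file: str     # path to the file that holds the array
    path: str     # location of the array inside that file
    values: list  # values as they would be read from disk

    def dump(self, round_trip: bool = False) -> str:
        if round_trip:
            # Keep only a reference, so the same source can be reopened later.
            return json.dumps(
                {"type": "file_backed", "file": self.file, "path": self.path}
            )
        # Default: copy the values out as a list of lists; the file link is lost.
        return json.dumps(self.values)


array = FileBackedArray(file="data.h5", path="/data", values=[[1, 2], [3, 4]])
plain = json.loads(array.dump())
reference = json.loads(array.dump(round_trip=True))
```

The reference form stays small regardless of array size, which is why it suits data modeling where copying the whole array is undesirable.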
Each interface should[^notenforced] implement a dataclass that describes a
Each interface must implement a dataclass that describes a
json-able roundtrip form (see {class}`.interface.JsonDict`).
That dataclass then has a {meth}`JsonDict.is_valid` method that checks
@ -220,12 +223,34 @@ print_json(
))
```
When an array marked with the interface is deserialized,
it short-circuits the {meth}`.Interface.match` method,
attempting to directly return the indicated interface as long as the
array dumped in `value` still satisfies that interface's {meth}`.Interface.check`
method. Arrays dumped *without* `round_trip=True` might *not* validate with
the originating model, even when marked -- e.g., an array dumped without `round_trip`
will be revalidated as a numpy array for the same reasons it is everywhere else,
since all connection to the source file is lost.
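
The short-circuit described above can be sketched in plain Python. This is only an approximation under stated assumptions: the `INTERFACES` registry, the `"interface"` key, and the `match`/`resolve` functions are hypothetical names for illustration, and the real logic lives in the `Interface` classes.

```python
from typing import Any, Callable, Dict, Optional

# Hypothetical registry: interface name -> check function (an assumption
# for illustration, not numpydantic's actual interface lookup).
Check = Callable[[Any], bool]
INTERFACES: Dict[str, Check] = {
    "nested_list": lambda value: isinstance(value, list),
    "file_ref": lambda value: isinstance(value, dict) and "file" in value,
}


def match(value: Any) -> Optional[str]:
    """Normal path: return the name of the first interface whose check passes."""
    for name, check in INTERFACES.items():
        if check(value):
            return name
    return None


def resolve(payload: Any) -> Optional[str]:
    """Use the marked interface directly when its check passes, else match normally."""
    if isinstance(payload, dict) and "interface" in payload and "value" in payload:
        marked = payload["interface"]
        value = payload["value"]
        check = INTERFACES.get(marked)
        if check is not None and check(value):
            return marked  # short-circuit: skip the search over interfaces
        return match(value)  # the mark no longer fits, fall back to matching
    return match(payload)
```

For example, `resolve({"interface": "file_ref", "value": {"file": "a.h5"}})` returns the marked interface directly, while a payload marked `file_ref` whose `value` is a list of lists falls back to ordinary matching.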
```{todo}
Currently, the version of the package the interface is from (usually `numpydantic`)
will be stored, but there is no means of resolving it on the fly.
If there is a mismatch between the marked interface description and the interface
that was matched on revalidation, a warning is emitted, but validation
attempts to proceed as normal.
This feature is for extra-verbose provenance, rather than airtight serialization
and deserialization, but PRs welcome if you would like to make it be that way.
```
```{todo}
We will also add a separate `mark_version` parameter for marking
the specific version of the relevant interface package, like `zarr`, or `numpy`,
patience.
```
## Context parameters
A reference listing of all the things that can be passed to
@ -305,9 +330,3 @@ print_json(data)
[^normalstyle]: o ya we're posting JSON [normal style](https://normal.style)
[^notenforced]: This is only *functionally* enforced at the moment, where
a roundtrip test confirms that dtype and type are preserved,
but there is no formal test for each interface having its own serialization class

@ -1,6 +1,6 @@
[project]
name = "numpydantic"
version = "1.5.3"
version = "1.6.0"
description = "Type and shape validation and serialization for arbitrary array types in pydantic models"
authors = [
{name = "sneakers-the-rat", email = "sneakers-the-rat@protonmail.com"},

@ -1,4 +1,3 @@
import pdb
import sys
import pytest

@ -1,5 +1,3 @@
import pdb
import pytest
from typing import Union, Optional, Any

@ -3,8 +3,6 @@ Test serialization-specific functionality that doesn't need to be
applied across every interface (use test_interface/test_interfaces for that)
"""
import pdb
import h5py
import pytest
from pathlib import Path

@ -1,5 +1,3 @@
import pdb
import pytest
from typing import Any