Skip to content

Augmented pickle

Suppose you have some input data sources data_in on which you apply some process F parameterized by args:

data_out = F(data_in, args)

You want to serialize data_out, but also don't want to lose args, to preserve the exact setup that generated the output data.

Now suppose you want to inspect args for a particular data_out: - Saving both {"data": data_out, "args": args} may not be a viable solution, as data_out needs to be fully loaded into memory without actually needing it. - Saving data_out and args separately necessitates extra care to keep them tied together.

define a simple data format -- augmented pickle

Pickle both objects, but read body on-demand:

res = read_augmented_pickle("./data.apkl", get_body=True)

# get metadata (body is not loaded)
meta = next(res)

# query the generator again to get body (data)
data = next(res)

read_augmented_pickle(path: str | PathLike, get_body: bool) -> Iterable[Any]

Read an augmented pickle file containing metadata and body. Returns a generator that can be queried on-demand using next. If get_body is False, only metadata is yielded.

Source code in opskrift/augmented_pickle.py
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
def read_augmented_pickle(
    path: str | PathLike,
    get_body: bool,
) -> Iterable[Any]:
    """Read an augmented pickle file containing `metadata` and `body`.
    Returns a generator that can be queried on-demand using `next`.
    If `get_body` is False, only `metadata` is yielded.
    """
    with open(path, "rb") as fp:
        metadata = pickle.load(fp)

        yield metadata

        if not get_body:
            return

        body = pickle.load(fp)

    yield body

write_augmented_pickle(metadata: Any, body: Any, path: str | PathLike) -> None

Write an augmented pickle file containing metadata and body.

Source code in opskrift/augmented_pickle.py
38
39
40
41
42
43
44
45
46
def write_augmented_pickle(
    metadata: Any,
    body: Any,
    path: str | PathLike,
) -> None:
    """Write an augmented pickle file containing `metadata` and `body`."""
    with open(path, "wb") as fp:
        pickle.dump(metadata, fp)
        pickle.dump(body, fp)