
Speed up results serialization #46

Open
@mkardas

Description


Describe a requested feature

I was running some performance tests and noticed that checking whether an object is picklable:

```python
outputs = self.check_picklable(outputs)
```
takes a lot of time when the output is large (e.g., when a model returns a large logits tensor), because the whole object is serialized into memory and then deserialized. I wonder in which cases check_picklable actually helps, since dataclasses and ModelOutput should be just as picklable as their dictionary representations.
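The issue does not show the body of check_picklable, so the following is only a guess at what a serialize-then-deserialize check looks like. `FakeOutput` and `check_picklable_roundtrip` are hypothetical names. The sketch illustrates both points above: the cost scales with the payload, and a plain top-level dataclass instance survives a pickle round trip unchanged.

```python
import pickle
from dataclasses import dataclass


@dataclass
class FakeOutput:
    # Hypothetical stand-in for a ModelOutput-like dataclass.
    logits: list
    loss: float = 0.0


def check_picklable_roundtrip(obj):
    # Hypothetical reconstruction: the whole object is serialized into memory
    # and immediately deserialized, so the cost grows with the payload size.
    return pickle.loads(pickle.dumps(obj))


out = FakeOutput(logits=[0.1] * 1_000_000)
restored = check_picklable_roundtrip(out)
# A plain dataclass instance pickles as-is, field values included.
print(restored == out)
```

Pickling a dataclass instance only requires the class to be importable on the receiving side. The dict workaround has the same requirement, because it stores `obj.__class__`, and that class is also pickled by reference.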

If the check is still needed, the code could probably still be sped up by modifying an object only when pickling fails. That would require some workarounds (perhaps overriding https://github.com/python/cpython/blob/9dc787ea96916552695e79397588fdfa68f22024/Lib/multiprocessing/queues.py#L275), so I want to make sure the check is still necessary before giving it a shot. Another option is to always check for
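Here is a minimal sketch of the "modify only on pickle failure" idea. It assumes a hypothetical helper, `dumps_with_fallback`, that tries to pickle the object once and applies the issue's `asdict()` conversion only if that attempt raises. It does not cover the part that needs the multiprocessing override, which is getting the queue to send these bytes instead of pickling the object again. The exception tuple is an assumption based on what `pickle` commonly raises for unpicklable objects.

```python
import pickle
import threading
from dataclasses import asdict, dataclass, is_dataclass


def dumps_with_fallback(obj):
    """Hypothetical helper: pickle obj once, convert it only if that fails."""
    try:
        # Happy path: a single serialization and no deserialization.
        return pickle.dumps(obj)
    except (pickle.PicklingError, TypeError, AttributeError):
        # Failure path only: apply the asdict() conversion from the issue.
        if is_dataclass(obj) and not isinstance(obj, type):
            converted = asdict(obj)
            converted["orig_dataclass_type"] = obj.__class__
            return pickle.dumps(converted)
        raise


@dataclass
class Out:
    # Hypothetical stand-in for a model output dataclass.
    logits: list


ok = Out(logits=[1.0, 2.0])
ok_bytes = dumps_with_fallback(ok)  # pickled as-is

bad = Out(logits=[3.0])
bad.lock = threading.Lock()  # unpicklable attribute added outside the class
bad_bytes = dumps_with_fallback(bad)  # falls back to the dict form
```

In the common case, this approach costs a single `pickle.dumps`. The conversion runs only when pickling fails. In this example, the conversion succeeds because `asdict()` keeps only declared fields, so the unpicklable extra attribute is dropped.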

```python
if _is_dataclass_instance(obj) or isinstance(obj, ModelOutput):
    _obj = asdict(obj)
    _obj["orig_dataclass_type"] = obj.__class__
    obj = _obj
```
and convert the object even when it is already picklable, but that would drop any custom attributes added outside the class definition.
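The following sketch shows the data loss described above. It applies the same conversion as the snippet in the issue, followed by a hypothetical inverse, `from_transport`, that rebuilds the dataclass from `orig_dataclass_type`. The names `Output`, `to_transport`, and `from_transport` are illustrative only. The declared fields survive the round trip, but an attribute set on the instance after construction does not.

```python
from dataclasses import asdict, dataclass


@dataclass
class Output:
    # Hypothetical dataclass standing in for a model output.
    logits: list


def to_transport(obj):
    # Same conversion as in the issue: dict of fields plus the original type.
    converted = asdict(obj)
    converted["orig_dataclass_type"] = obj.__class__
    return converted


def from_transport(payload):
    # Hypothetical inverse: rebuild the dataclass from its declared fields.
    payload = dict(payload)
    cls = payload.pop("orig_dataclass_type")
    return cls(**payload)


out = Output(logits=[0.5, 1.5])
out.extra = "added after construction"  # not a dataclass field
rebuilt = from_transport(to_transport(out))
print(rebuilt == out, hasattr(rebuilt, "extra"))
```

Dataclass equality compares only declared fields, so `rebuilt == out` holds even though `extra` is gone. Note also that `asdict()` recursively turns nested dataclasses into dicts, so this simple inverse restores only the top level.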

Metadata


Labels

enhancement (New feature or request)
