Parallelize the model download with the weight loading #42559

@LysandreJik

Feature request

It would be cool to load each shard as soon as it finishes downloading, instead of waiting for every shard to be downloaded before loading any of them.

Right now, dynamic weight loading already gives a significant improvement in model loading time; overlapping the download with the loading could push this further.
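To make the idea concrete, here is a rough sketch of the overlap, not how the library does or should implement it. `download_shard` and `load_shard` are hypothetical stand-ins that only sleep and return fake values; a real version would fetch the checkpoint files and read their tensors. The point is just the control flow: start all downloads in a thread pool, then load each shard as soon as its download completes.

```python
import time
from concurrent.futures import ThreadPoolExecutor, as_completed


def download_shard(name, seconds):
    """Hypothetical stand-in for a network download: sleeps, then returns a fake local path."""
    time.sleep(seconds)
    return f"/tmp/{name}"


def load_shard(path):
    """Hypothetical stand-in for reading a shard's tensors into the model."""
    return {"path": path, "loaded": True}


def pipelined_load(shards, max_workers=4):
    """Start every download, then load each shard as soon as its download finishes."""
    loaded = []
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = [pool.submit(download_shard, name, secs) for name, secs in shards]
        # as_completed yields futures in completion order, so loading of the
        # first finished shard overlaps with the downloads still in flight.
        for future in as_completed(futures):
            loaded.append(load_shard(future.result()))
    return loaded


# Simulated shards with different (fake) download durations, in seconds.
shards = [("shard-1", 0.05), ("shard-2", 0.01), ("shard-3", 0.03)]
result = pipelined_load(shards)
```

In this sketch, loading runs in the main thread while downloads continue in the pool, so the total time approaches the slower of the two phases rather than their sum.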

cc @Cyrilvallez and @ArthurZucker probably

Motivation

Speed!

Your contribution

I can test it out 😉

Labels

Core: Modeling (Internals of the library; Models.)
Feature request (Request for a new feature)
