enable cpu paged cache #42869
base: main
Hi @jiqing-feng, thanks for the contribution! Just letting you know that CPU-compatible continuous batching is not a priority right now, so even though this PR is small, it will not be reviewed right away. I am cautious about two things:
I will review this as soon as I have the bandwidth, thank you!
Signed-off-by: jiqing-feng <jiqing.feng@intel.com>
4ed8d51 to 2a5e941
View the CircleCI Test Summary for this PR: https://huggingface.co/spaces/transformers-community/circle-ci-viz?pr=42869&sha=2a5e94
CPU can also use the paged cache with eager or sdpa attention:
python continuous_batching_simple.py --attn sdpa
Without this change, the previous command fails with an error like: