Support Mooncake migration backend for PD disaggregation #3620
@Risc-lt You can use tools like ruff or yapf to automatically fix code formatting and linting issues.
The linting issue can be resolved as follows:
Make sure that the Python version is 3.10.
"""Initialize p2p connection for this specific link."""
# TODO: Support more types of metadata_server
# e.g. "etcd://192.168.0.137:2379"
metadata_server = 'P2PHANDSHAKE'
what is metadata_server used for?
There are two modes: (1) 'P2PHANDSHAKE' (a magic string): no metadata server for maintaining connection information, which is intended for small-scale PD disaggregation, and (2) support for etcd/redis/http_server as the centralized server for larger-scale PD disaggregation.
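A minimal sketch of how the two modes described above could be told apart from the configured string. The helper name `classify_metadata_server` and the exact set of URL schemes are assumptions made for illustration; they are not taken from the PR or from Mooncake.

```python
from urllib.parse import urlparse

# Magic string quoted in the review thread.
P2P_HANDSHAKE = 'P2PHANDSHAKE'
# Assumed scheme names for the centralized servers mentioned above.
CENTRALIZED_SCHEMES = {'etcd', 'redis', 'http'}


def classify_metadata_server(value: str) -> str:
    """Hypothetical helper: return 'p2p' or 'centralized' for a setting."""
    if value == P2P_HANDSHAKE:
        # No metadata server: peers exchange connection info directly.
        return 'p2p'
    scheme = urlparse(value).scheme
    if scheme in CENTRALIZED_SCHEMES:
        # A central service keeps the connection info.
        return 'centralized'
    raise ValueError(f'Unsupported metadata_server: {value!r}')


mode_small = classify_metadata_server('P2PHANDSHAKE')
mode_large = classify_metadata_server('etcd://192.168.0.137:2379')
```

Rejecting unknown values early keeps a mistyped setting from silently falling back to one of the two modes.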
try:
    from mooncake.engine import TransferEngine
except ImportError as e:
    raise ImportError('Please install mooncake by following the instructions at '
When passing --migration-backend Mooncake, it's better to raise an import error immediately if Mooncake is not installed during API server launch.
Can we put it in the constructor of MooncakeBackend?
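One way to read this suggestion is to attempt the import inside the constructor, so a missing dependency surfaces as soon as the backend object is built. The sketch below uses a hypothetical `FailFastBackend` class and a generic module-name parameter; it is not the PR's MooncakeBackend, and the install hint simply reuses the pip command from the build instructions.

```python
import importlib


class FailFastBackend:
    """Hypothetical stand-in that checks its dependency at construction."""

    def __init__(self,
                 module_name: str = 'mooncake.engine',
                 install_hint: str = 'pip install mooncake-transfer-engine'):
        try:
            # Import eagerly so a missing package fails at startup,
            # not on the first migration request.
            self._module = importlib.import_module(module_name)
        except ImportError as e:
            raise ImportError(
                f'{module_name} is required by this backend. '
                f'Try: {install_hint}') from e


# A module that is always available constructs without error.
ok_backend = FailFastBackend(module_name='json')
```

Chaining with `from e` keeps the original import failure visible in the traceback.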
I have resolved the issues above. NVLink support will be covered in the next PR. cc @stmatengss @lvhan028 @JimyMa
PD-Disaggregated KVCache Transfer Pipeline with Mooncake
This PR introduces a new implementation of the Prefill-Decode disaggregated KVCache transfer pipeline for LMDeploy, using the native Mooncake transfer engine as an alternative to dlslime. The goal is to enable disaggregated prefill/decode workloads across nodes for large-scale LLM inference, inspired by lmdeploy-distserve (#3304 (comment)).
Architecture Overview
Interfaces
The Mooncake migration backend implementation exposes the following interfaces:
p2p_initialize: Notify the Prefill and Decode engines to initialize a migration backend instance of the Mooncake transfer engine.
register_memory_region: Register a memory region for the connection.
endpoint_info: Return the local memory pool and endpoint configuration info.
p2p_connect: Receive endpoint information from the other side of the connecting nodes.
p2p_migrate: Set up the connection between prefill and decode nodes and transfer the KV cache synchronously in read mode.
Control Plane
The proxy server first sends a FastAPI POST request carrying the endpoint info, which notifies the prefill and decode servers to send their local endpoint info to each other over a TCP socket. After the p2p connection is established, the Mooncake migration backend starts transferring the KV cache over the RDMA link.
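The interfaces and control-plane order described above can be sketched with a toy in-memory model. Only the five method names come from the PR; the signatures, the `ToyBackend` class, and the choice that the decode side pulls from the prefill side in read mode are assumptions. No FastAPI, TCP, or RDMA is involved here, and a direct object reference stands in for the exchanged endpoint info.

```python
from dataclasses import dataclass, field


@dataclass
class ToyBackend:
    """In-memory stand-in; method names mirror the PR, signatures are assumed."""
    name: str
    regions: dict = field(default_factory=dict)  # region id -> bytearray
    peer: 'ToyBackend | None' = None
    initialized: bool = False

    def p2p_initialize(self) -> None:
        self.initialized = True

    def register_memory_region(self, region_id: str, buf: bytearray) -> None:
        self.regions[region_id] = buf

    def endpoint_info(self) -> dict:
        return {'name': self.name, 'regions': sorted(self.regions)}

    def p2p_connect(self, remote: 'ToyBackend') -> None:
        # Real code would receive serialized endpoint info over a socket.
        self.peer = remote

    def p2p_migrate(self, region_id: str) -> None:
        # Read mode (assumed): pull the peer's region into the local one.
        src = self.peer.regions[region_id]
        self.regions[region_id][:len(src)] = src


prefill, decode = ToyBackend('prefill'), ToyBackend('decode')
for backend in (prefill, decode):
    backend.p2p_initialize()
prefill.register_memory_region('kv', bytearray(b'cached-kv'))
decode.register_memory_region('kv', bytearray(9))
# Both sides learn about each other before any transfer starts.
decode.p2p_connect(prefill)
prefill.p2p_connect(decode)
decode.p2p_migrate('kv')
```

The point of the sketch is the ordering: initialize, register memory, exchange endpoint info, then migrate.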
Workflow
Current Status
Next Steps
How to Build
pip install mooncake-transfer-engine
pip install -v -e .
How to Run
Start Proxy
Start Prefill Engine
Start Decode Engine
Client Side