Bingyang Wu
Zili Zhang
Latest
TokenLake: A Unified Segment-level Prefix Cache Pool for Fine-grained Elastic Long-Context LLM Serving
Fast Distributed Inference Serving for Large Language Models
Optimizing RLHF Training for Large Language Models with Stage Fusion
StreamRL: Scalable, Heterogeneous, and Elastic RL for LLMs with Disaggregated Stream Generation
dLoRA: Dynamically Orchestrating Requests and Adapters for LoRA LLM Serving
Transparent GPU Sharing in Container Clouds for Deep Learning Workloads