Bingyang Wu
Xuanzhe Liu
Latest
TokenLake: A Unified Segment-level Prefix Cache Pool for Fine-grained Elastic Long-Context LLM Serving
Fast Distributed Inference Serving for Large Language Models
LoongServe: Efficiently Serving Long-Context Large Language Models with Elastic Sequence Parallelism
dLoRA: Dynamically Orchestrating Requests and Adapters for LoRA LLM Serving
A Survey of Resource-efficient LLM and Multimodal Foundation Models
XRON: A Hybrid Elastic Cloud Overlay Network for Video Conferencing at Planetary Scale
Transparent GPU Sharing in Container Clouds for Deep Learning Workloads