
Bingyang Wu

Xuanzhe Liu

TokenLake: A Unified Segment-level Prefix Cache Pool for Fine-grained Elastic Long-Context LLM Serving
Fast Distributed Inference Serving for Large Language Models
LoongServe: Efficiently Serving Long-Context Large Language Models with Elastic Sequence Parallelism
dLoRA: Dynamically Orchestrating Requests and Adapters for LoRA LLM Serving
A Survey of Resource-efficient LLM and Multimodal Foundation Models
XRON: A Hybrid Elastic Cloud Overlay Network for Video Conferencing at Planetary Scale
Transparent GPU Sharing in Container Clouds for Deep Learning Workloads
