Bingyang Wu
Xuanzhe Liu
Latest
LoongServe: Efficiently Serving Long-Context Large Language Models with Elastic Sequence Parallelism
dLoRA: Dynamically Orchestrating Requests and Adapters for LoRA LLM Serving
A Survey of Resource-efficient LLM and Multimodal Foundation Models
XRON: A Hybrid Elastic Cloud Overlay Network for Video Conferencing at Planetary Scale
Fast Distributed Inference Serving for Large Language Models
Transparent GPU Sharing in Container Clouds for Deep Learning Workloads