Bingyang Wu
Publications
LoongServe: Efficiently Serving Long-Context Large Language Models with Elastic Sequence Parallelism
The context window of large language models (LLMs) is rapidly increasing, leading to a huge variance in resource usage between …
Bingyang Wu, Shengyu Liu, Yinmin Zhong, Peng Sun, Xuanzhe Liu, Xin Jin
dLoRA: Dynamically Orchestrating Requests and Adapters for LoRA LLM Serving
Low-rank adaptation (LoRA) is a popular approach to finetune pre-trained large language models (LLMs) to specific domains. This paper …
Bingyang Wu, Ruidong Zhu, Zili Zhang, Peng Sun, Xuanzhe Liu, Xin Jin
XRON: A Hybrid Elastic Cloud Overlay Network for Video Conferencing at Planetary Scale
Quality and cost are two key considerations for video conferencing services. Service providers face a dilemma when selecting network …
Bingyang Wu, Kun Qian, Bo Li, Yunfei Ma, Qi Zhang, Zhigang Jiang, Jiayu Zhao, Dennis Cai, Ennan Zhai, Xuanzhe Liu, Xin Jin
Transparent GPU Sharing in Container Clouds for Deep Learning Workloads
Containers are widely used for resource management in datacenters. A common practice to support deep learning (DL) training in …
Bingyang Wu, Zili Zhang, Zhihao Bai, Xuanzhe Liu, Xin Jin
AMOS: Enabling Automatic Mapping for Tensor Computations On Spatial Accelerators with Hardware Abstraction
Hardware specialization is a promising trend to sustain performance growth. Spatial hardware accelerators that employ specialized and …
Size Zheng, Renze Chen, Anjiang Wei, Yicheng Jin, Qin Han, Liqiang Lu, Bingyang Wu, Xiuhong Li, Shengen Yan, Yun Liang