Bingyang Wu
TokenLake: A Unified Segment-level Prefix Cache Pool for Fine-grained Elastic Long-Context LLM Serving
Prefix caching is crucial to accelerate multi-turn interactions and requests with shared prefixes. At the cluster level, existing …
Bingyang Wu, Zili Zhang, Yinmin Zhong, Guanzhe Huang, Yibo Zhu, Xuanzhe Liu, Xin Jin
Fast Distributed Inference Serving for Large Language Models
Large language models (LLMs) power a new generation of interactive AI applications exemplified by ChatGPT. The interactive nature of …
Bingyang Wu, Yinmin Zhong, Zili Zhang, Shengyu Liu, Fangyue Liu, Yuanhang Sun, Gang Huang, Xuanzhe Liu, Xin Jin
StreamRL: Scalable, Heterogeneous, and Elastic RL for LLMs with Disaggregated Stream Generation
Reinforcement learning (RL) has become the core post-training technique for large language models (LLMs). RL for LLMs involves two …
Yinmin Zhong, Zili Zhang, Xiaoniu Song, Hanpeng Hu, Chao Jin, Bingyang Wu, Nuo Chen, Yukun Chen, Yu Zhou, Changyi Wan, Hongyu Zhou, Yimin Jiang, Yibo Zhu, Daxin Jiang
A Survey of Resource-efficient LLM and Multimodal Foundation Models
Large foundation models, including large language models (LLMs), vision transformers (ViTs), diffusion, and LLM-based multimodal …
Mengwei Xu, Wangsong Yin, Dongqi Cai, Rongjie Yi, Daliang Xu, Qipeng Wang, Bingyang Wu, Yihao Zhao, Chen Yang, Shihe Wang, Qiyang Zhang, Zhenyan Lu, Li Zhang, Shangguang Wang, Yuanchun Li, Yunxin Liu, Xin Jin, Xuanzhe Liu