Bingyang Wu
Xin Jin
Latest
LoongServe: Efficiently Serving Long-Context Large Language Models with Elastic Sequence Parallelism
RLHFuse: Efficient RLHF Training for Large Language Models with Inter- and Intra-Stage Fusion
dLoRA: Dynamically Orchestrating Requests and Adapters for LoRA LLM Serving
A Survey of Resource-efficient LLM and Multimodal Foundation Models
XRON: A Hybrid Elastic Cloud Overlay Network for Video Conferencing at Planetary Scale
Fast Distributed Inference Serving for Large Language Models
Transparent GPU Sharing in Container Clouds for Deep Learning Workloads