1. Exploiting Student Parallelism for Low-latency GPU Inference of BERT-like Models in Online Services
Wang, Weiyan, Jin, Yilun, Zhang, Yiming, Wei, Victor Junqiu, Tian, Han, Chen, Li, Xue, Jinbao, Tao, Yangyu, Wang, Di, Chen, Kai
Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining [2154-817X],
Published 2025,
Volume 2,
Pages 3055-3066
Indexed in:
SCOPUS