add new ppl and project
cylinbao committed Feb 13, 2024
1 parent 17a4c84 commit 169bb4f
Showing 3 changed files with 56 additions and 0 deletions.
9 changes: 9 additions & 0 deletions _data/people.yml
@@ -181,6 +181,15 @@ kanzhu:
  webpage: "https://kanzhu.netlify.app/"
  role: grad

kamahori:
  display_name: "Keisuke Kamahori"
  webpage: "https://kamahori.org/"
  role: grad

yilegu:
  display_name: "Yile (Michael) Gu"
  webpage: "https://ikace.github.io/"
  role: grad

# Alums
# The order is: PhD/postdoc by year; MS by year; BS by year
20 changes: 20 additions & 0 deletions _projects/fiddler.md
@@ -0,0 +1,20 @@
---
title: Fiddler

description: |
  CPU-GPU Orchestration for Fast Inference of MoE Models
people:
- kamahori
- yilegu
- kanzhu
- baris

layout: project
last-updated: 2024-02-12
---

Fiddler is a fast inference system for LLMs based on Mixture-of-Experts (MoE) architecture at local devices.

- Preprint: [Fiddler: CPU-GPU Orchestration for Fast Inference of Mixture-of-Experts Models](https://arxiv.org/abs/2402.07033)
- GitHub: [Fiddler](https://github.com/efeslab/fiddler)
27 changes: 27 additions & 0 deletions bib/pubs.bib
@@ -1,3 +1,30 @@
@misc{kamahori2024fiddler,
title={Fiddler: CPU-GPU Orchestration for Fast Inference of Mixture-of-Experts Models},
author={Keisuke Kamahori and Yile Gu and Kan Zhu and Baris Kasikci},
year={2024},
eprint={2402.07033},
archivePrefix={arXiv},
primaryClass={cs.LG}
}

@misc{chen2023punica,
title={Punica: Multi-Tenant LoRA Serving},
author={Lequn Chen and Zihao Ye and Yongji Wu and Danyang Zhuo and Luis Ceze and Arvind Krishnamurthy},
year={2023},
eprint={2310.18547},
archivePrefix={arXiv},
primaryClass={cs.DC}
}

@misc{zhao2023atom,
title={Atom: Low-bit Quantization for Efficient and Accurate LLM Serving},
author={Yilong Zhao and Chien-Yu Lin and Kan Zhu and Zihao Ye and Lequn Chen and Size Zheng and Luis Ceze and Arvind Krishnamurthy and Tianqi Chen and Baris Kasikci},
year={2023},
eprint={2310.19102},
archivePrefix={arXiv},
primaryClass={cs.LG}
}

@inproceedings{sparsetir,
author = {Zihao Ye and
Ruihang Lai and
