I'm a PhD student at the Berkeley Sky Computing Lab, working on machine learning systems and cloud infrastructure. I am advised by Prof. Joseph Gonzalez and Prof. Ion Stoica.
My latest focus is building an end-to-end stack for LLM inference on your own infrastructure:
- vLLM runs LLM inference efficiently.
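Here is a minimal sketch of vLLM's offline inference API; the model name is a placeholder for any Hugging Face model vLLM supports:

```python
from vllm import LLM, SamplingParams

# Placeholder model name: swap in any supported Hugging Face model.
llm = LLM(model="meta-llama/Llama-2-7b-hf")

# Decoding settings for generation.
params = SamplingParams(temperature=0.8, top_p=0.95, max_tokens=64)

# vLLM batches the prompts and generates a continuation for each one.
outputs = llm.generate(["The future of cloud computing is"], params)
for output in outputs:
    print(output.outputs[0].text)
```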
Previous explorations include:
- Conex: builds, pushes, and pulls containers fast.
- SkyATC: orchestrates LLMs across multiple clouds and scales them to zero.
I previously worked on the Model Serving System @anyscale.
- Ray scales your Python code to thousands of cores (see the sketch after this list).
- Ray Serve empowers data scientists to own their end-to-end inference APIs.
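A rough sketch of both, using a toy workload (the function and deployment below are hypothetical stand-ins for real inference code):

```python
import ray
from ray import serve
from starlette.requests import Request

# Ray: decorate a plain Python function and fan it out across the cluster.
@ray.remote
def square(x: int) -> int:
    return x * x

futures = [square.remote(i) for i in range(1000)]  # tasks run in parallel
print(sum(ray.get(futures)))

# Ray Serve: wrap inference logic in a deployment that serves HTTP requests.
@serve.deployment
class Echo:
    async def __call__(self, request: Request) -> str:
        # A real deployment would run a model here; this just echoes the body.
        return (await request.body()).decode()

serve.run(Echo.bind())  # serves at http://localhost:8000/ by default
```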
Before Anyscale, I was an undergraduate researcher @ucbrise.
Publications:
- Under submission: Optimizing LLM Queries in Relational Workloads
- NSDI 2024: Cloudcast: High-Throughput, Cost-Aware Overlay Multicast in the Cloud plans the best overlay network for cloud object store replication.
- VLDB 2024: RALF: Accuracy-Aware Scheduling for Feature Store Maintenance shows that feature updates in feature stores can be made far more efficient.
- SoCC 2020: InferLine: ML Inference Pipeline Composition Framework studies how to optimize model serving pipelines.
- VLDB 2020: Towards Scalable Dataframe Systems formalizes the Pandas DataFrame model.
- SysML Workshop @ NeurIPS 2018: The OoO VLIW JIT Compiler for GPU Inference explores multiplexing many kernels on the same GPU.
Reach out to me: simon.mo at hey.com