A sparse attention framework for large language model decoding
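The framework itself is not described here; as a hedged illustration of the general idea, the sketch below shows top-k sparse attention for a single decoding step: only the k cached positions with the highest raw attention logits participate in the softmax, so per-token cost scales with k rather than with the full cache length. All names (`sparse_decode_attention`, `k`) are illustrative, not from the framework above.

```python
import numpy as np

def sparse_decode_attention(q, K, V, k=4):
    """One decoding step of generic top-k sparse attention.

    q: (d,) query for the new token; K, V: (n, d) cached keys/values.
    Only the k highest-scoring cached positions enter the softmax.
    """
    scores = K @ q                             # (n,) raw attention logits
    idx = np.argpartition(scores, -k)[-k:]     # indices of the top-k logits
    s = scores[idx] / np.sqrt(q.shape[0])      # scaled logits of survivors
    w = np.exp(s - s.max())                    # numerically stable softmax
    w /= w.sum()
    return w @ V[idx]                          # (d,) attention output
```

With `k` equal to the cache length this reduces exactly to dense attention, which makes the approximation easy to sanity-check.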
A universal solution that allows any language model to be deployed natively across a diverse set of hardware backends and in native applications.
Low-Latency, High-Performance LLM Serving
End-to-end compilation of ML applications with dynamic and irregular control flow and data structure accesses
TVM: An Automated End-to-End Optimizing Compiler for Deep Learning
Principled early-stopping approaches for hyperparameter optimization
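The specific approaches are not reproduced here; as a minimal sketch of one classic early-stopping scheme, successive halving, the code below trains every candidate configuration on a small budget, keeps the best fraction, and repeats with a larger budget. The names (`successive_halving`, `evaluate`, `eta`) are illustrative assumptions, not an API from the work above.

```python
def successive_halving(configs, evaluate, min_budget=1, eta=3):
    """Early-stopping search: evaluate all configs on a small budget,
    keep the best 1/eta fraction, then retry with eta-times the budget.

    evaluate(config, budget) must return a score (higher is better).
    """
    budget = min_budget
    survivors = list(configs)
    while len(survivors) > 1:
        scored = sorted(((evaluate(c, budget), c) for c in survivors),
                        reverse=True)                     # best score first
        survivors = [c for _, c in scored[:max(1, len(survivors) // eta)]]
        budget *= eta                                     # grow the budget
    return survivors[0]
```

The design choice is that poor configurations are discarded after only `min_budget` units of training, so most of the total budget is spent on the few promising candidates.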
Automatically Discovering Fast Parallelization Strategies for DNN Training
The Tensor Algebra SuperOptimizer for Deep Learning
A Scalable Tree Boosting System
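As a hedged, from-scratch sketch of the core idea behind tree boosting (not the scalable system referenced above), the code below fits an additive ensemble of depth-1 regression trees ("stumps") for squared loss, where each new stump fits the residuals of the current ensemble. All function names are illustrative.

```python
import numpy as np

def fit_stump(x, r):
    """Best depth-1 regression tree on residuals r under squared loss."""
    best = None
    for t in np.unique(x):                      # try every split threshold
        left, right = r[x <= t], r[x > t]
        if len(left) == 0 or len(right) == 0:
            continue
        pred = np.where(x <= t, left.mean(), right.mean())
        sse = ((r - pred) ** 2).sum()
        if best is None or sse < best[0]:
            best = (sse, t, left.mean(), right.mean())
    _, t, lv, rv = best
    return lambda xs, t=t, lv=lv, rv=rv: np.where(xs <= t, lv, rv)

def boost(x, y, rounds=50, lr=0.3):
    """Gradient boosting for squared loss: each stump fits the residuals."""
    pred = np.full_like(y, y.mean(), dtype=float)
    trees = []
    for _ in range(rounds):
        stump = fit_stump(x, y - pred)          # fit the current residuals
        pred = pred + lr * stump(x)             # shrunken additive update
        trees.append(stump)
    base = y.mean()
    return lambda xs: base + lr * sum(t(xs) for t in trees)
```

The learning rate `lr` shrinks each stump's contribution, trading more boosting rounds for better generalization; production systems add regularized objectives, deeper trees, and out-of-core training on top of this basic loop.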