Learning Efficient Convolutional Networks through Network Slimming, In ICCV 2017.
Flux diffusion model implementation using quantized fp8 matmuls; the remaining layers use faster half-precision accumulation, which is ~2x faster on consumer devices.
[NeurIPS'23] Speculative Decoding with Big Little Decoder
Implementation of the paper Fast Inference from Transformers via Speculative Decoding (Leviathan et al., 2023); a minimal sketch of the draft-and-verify loop follows this list.
Official repo of "QuickLLaMA: Query-aware Inference Acceleration for Large Language Models".
Demo code for the CVPR 2023 paper "Sparsifiner: Learning Sparse Instance-Dependent Attention for Efficient Vision Transformers".
An implementation of the encoder-decoder transformer for SMILES-to-SMILES translation tasks with inference accelerated by speculative decoding
Fast Forward-Only Deep Neural Network Library for the Nao Robots
An evaluation of the effectiveness of speculative decoding on Japanese text.
Multilabel fast-inference classifiers (Ridge Regression and MLP) for NLP, with a sentence embedder, k-fold cross-validation, bootstrap, and boosting. NOTE: since the saved MLP (fully connected NN) classifier was too heavy to ship, you can compile it with the provided script.
Reproducibility Project for [NeurIPS'23] Speculative Decoding with Big Little Decoder
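Since several entries above center on speculative decoding, here is a minimal sketch of the technique as described in Leviathan et al. (2023): a cheap draft model proposes k tokens, the target model verifies them, each draft token x is accepted with probability min(1, p(x)/q(x)), and a rejection triggers a resample from the residual max(0, p - q). The toy NumPy "models", vocabulary size, and function names below are illustrative assumptions, not any listed repo's actual API.

```python
# Minimal speculative-decoding sketch (toy models, not a real transformer).
import numpy as np

rng = np.random.default_rng(0)
VOCAB = 16  # toy vocabulary size (assumption)

def toy_model(temperature):
    """Return a toy next-token distribution p(x | prefix)."""
    def dist(prefix):
        # Deterministic pseudo-logits derived from the prefix, so the
        # "big" and "little" models agree often but not always.
        seed = sum(prefix) % VOCAB
        logits = -np.abs(np.arange(VOCAB) - seed) / temperature
        probs = np.exp(logits)
        return probs / probs.sum()
    return dist

big_model = toy_model(temperature=1.0)     # target model p
little_model = toy_model(temperature=2.0)  # cheaper draft model q

def speculative_step(prefix, k=4):
    """Draft k tokens with the little model, then verify with the big model."""
    # 1) Autoregressively sample k draft tokens from q.
    draft, q_probs = [], []
    ctx = list(prefix)
    for _ in range(k):
        q = little_model(ctx)
        x = rng.choice(VOCAB, p=q)
        draft.append(x)
        q_probs.append(q)
        ctx.append(x)
    # 2) Score every drafted position with p (one batched pass in practice).
    p_probs = [big_model(prefix + draft[:i]) for i in range(k + 1)]
    # 3) Accept each draft token x with probability min(1, p(x)/q(x)).
    accepted = []
    for i, x in enumerate(draft):
        if rng.random() < min(1.0, p_probs[i][x] / q_probs[i][x]):
            accepted.append(x)
        else:
            # On rejection, resample from the residual max(0, p - q); this
            # keeps the overall output distribution exactly equal to p.
            residual = np.maximum(p_probs[i] - q_probs[i], 0.0)
            residual /= residual.sum()
            accepted.append(rng.choice(VOCAB, p=residual))
            return accepted
    # All k drafts accepted: take one bonus token from p for free.
    accepted.append(rng.choice(VOCAB, p=p_probs[k]))
    return accepted

tokens = [1, 2, 3]
for _ in range(5):
    tokens += speculative_step(tokens)
print(tokens)
```

Each step thus emits between one and k+1 tokens while sampling from the target distribution exactly; the speedup comes from verifying the k drafts in a single pass of the big model instead of k sequential ones.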