With Speculative Sampling (SSp), a large language model can generate tokens significantly faster by using a smaller draft model to propose candidates. This repo shows how it's done and measures timing improvements using Llama ...
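To make the idea concrete, here is a minimal, toy sketch of the accept/reject loop at the heart of speculative sampling. The `target_probs` and `draft_probs` functions and their fixed distributions are hypothetical stand-ins for the large and small models (not part of this repo): the draft model proposes up to `k` tokens, and each proposal is accepted with probability `min(1, p/q)` or replaced by a sample from the residual distribution.

```python
import random

random.seed(0)

# Illustrative distributions over a 3-token vocabulary; in practice these
# come from the target (large) and draft (small) language models.
def target_probs(_ctx):  # hypothetical stand-in for the large model
    return [0.6, 0.3, 0.1]

def draft_probs(_ctx):   # hypothetical stand-in for the small draft model
    return [0.5, 0.4, 0.1]

def sample(probs):
    """Draw a token index from a discrete distribution."""
    r = random.random()
    acc = 0.0
    for tok, p in enumerate(probs):
        acc += p
        if r < acc:
            return tok
    return len(probs) - 1

def speculative_step(ctx, k=4):
    """Draft up to k tokens with the small model, then accept or reject
    each one against the large model's distribution."""
    out = list(ctx)
    for _ in range(k):
        q = draft_probs(out)
        x = sample(q)            # draft model's proposal
        p = target_probs(out)
        # Accept the proposal with probability min(1, p[x] / q[x]).
        if random.random() < min(1.0, p[x] / q[x]):
            out.append(x)
        else:
            # On rejection, resample from the residual max(0, p - q),
            # renormalized, and stop this speculative run.
            residual = [max(0.0, pi - qi) for pi, qi in zip(p, q)]
            z = sum(residual)
            out.append(sample([r / z for r in residual]))
            break
    return out

print(speculative_step([], k=4))
```

This accept/reject scheme preserves the target model's output distribution exactly, which is why the speedup comes for free in quality terms: the large model only needs to score the drafted tokens, not generate each one serially.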