A production-minded FastAPI sidecar for serving Gemma 4 31B on vLLM with Gemma 4 Multi-Token Prediction (MTP) speculative decoding. It keeps the raw vllm serve process private and adds ...
Some results have been hidden because they may be inaccessible to you
Show inaccessible results