A production-minded FastAPI sidecar for serving Gemma 4 31B on vLLM with Gemma 4 Multi-Token Prediction (MTP) speculative decoding. It keeps the raw vllm serve process private and adds ...