Announcing vLLM v0.12.0, Ministral 3 and DeepSeek-V3.2 for Docker Model Runner

December 8, 2025 · 441 words · 3 min

At Docker, we are committed to making the AI development experience as seamless as possible. Today, we are thrilled to announce two major updates that bring state-of-the-art performance and frontier-class models directly to your fingertips: the immediate availability of Ministral 3 and DeepSeek-V3.2, alongside the release of vLLM v0.12.0 on Docker Model Runner. Whether you are building high-throughput serving pipelines or experimenting with edge-optimized agents on your laptop, today's updates are designed to accelerate your workflow.

While vLLM powers your production infrastructure, we know that development needs speed and efficiency. That's why we are proud to add Mistral AI's newest marvel, Ministral 3, to the Docker Model Runner library on Docker Hub. Ministral 3 is Mistral AI's premier edge model. It packs frontier-level reasoning capabilities into a dense, efficient architecture designed specifically for local inference, making it a natural fit for fast, resource-friendly workloads on your own machine.

We are equally excited to introduce support for DeepSeek-V3.2. Known for pushing the boundaries of what open-weights models can achieve, the DeepSeek-V3 series has quickly become a favorite for developers requiring high-level reasoning and coding proficiency. DeepSeek-V3.2 brings Mixture-of-Experts (MoE) architecture efficiency to your local environment, delivering performance that rivals top-tier closed models, which makes it an ideal choice when you need deep reasoning and strong coding ability.

With Docker Model Runner, you don't need to worry about complex environment setups, Python dependencies, or weight downloads. We've packaged both models so you can get started immediately. The commands shown in the quick-start sketch at the end of this post automatically pull the model, set up the runtime, and drop you into an interactive chat session. You can also point your applications to the models using our OpenAI-compatible local endpoint, making them drop-in replacements for your cloud API calls during development (an example request is sketched below as well).

We are also excited to highlight the release of vLLM v0.12.0. vLLM has quickly become the gold standard for high-throughput and memory-efficient LLM serving, and this latest version raises the bar again with critical enhancements to the engine.

The combination of vLLM v0.12.0, Ministral 3, and DeepSeek-V3.2 represents the maturity of the open AI ecosystem. You now have access to a serving engine that maximizes data center performance, alongside a choice of models to fit your specific needs, whether you prioritize the edge-optimized speed of Ministral 3 or the deep reasoning power of DeepSeek-V3.2. All of this is easily accessible via Docker Model Runner.

The strength of Docker Model Runner lies in its community, and there's always room to grow. We need your help to make this project the best it can be, so get involved. We're incredibly excited about this new chapter for Docker Model Runner, and we can't wait to see what we can build together. Let's get to work!
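
To recap the quick start described above, here is a minimal sketch of running both models with Docker Model Runner. The model tags are illustrative assumptions; check Docker Hub for the exact published names.

```bash
# Pull Ministral 3 and drop into an interactive chat session
# (the tag ai/ministral-3 is illustrative; verify the published name before running)
docker model run ai/ministral-3

# Do the same for DeepSeek-V3.2
# (the tag ai/deepseek-v3.2 is likewise illustrative)
docker model run ai/deepseek-v3.2
```

Each command pulls the weights on first use and caches them locally, so subsequent runs start right away.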
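
For application code, the same models can be reached through the OpenAI-compatible local endpoint mentioned above. This is a hedged sketch: it assumes host-side TCP access to Docker Model Runner is enabled on the default port 12434, and the exact base URL, path, and model tag may differ in your setup.

```bash
# Send a chat completion request to the local OpenAI-compatible endpoint
# (localhost:12434 and the /engines/v1 path assume default TCP host access; adjust to your setup)
curl http://localhost:12434/engines/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
        "model": "ai/ministral-3",
        "messages": [
          {"role": "user", "content": "Write a haiku about containers."}
        ]
      }'
```

Because the API shape matches OpenAI's, existing SDK clients can usually be pointed at this base URL during development with no other changes.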
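
And for production-style serving with vLLM v0.12.0 itself, a common pattern is to run the official OpenAI-compatible server image. The image tag and model ID below are illustrative assumptions; swap in the release and model you actually deploy, and note that this invocation expects a GPU host with the NVIDIA container toolkit installed.

```bash
# Launch vLLM's OpenAI-compatible server from the official image
# (the v0.12.0 tag and the Qwen model are examples only; choose your own release and model)
docker run --gpus all --ipc=host -p 8000:8000 \
  -v ~/.cache/huggingface:/root/.cache/huggingface \
  vllm/vllm-openai:v0.12.0 \
  --model Qwen/Qwen2.5-0.5B-Instruct
```

Once the server is up, the same style of chat completion request shown above works against http://localhost:8000/v1/chat/completions with whichever model name you served.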