Ollama will almost always be slower than running vllm or llama.cpp, nobody should be suggesting it for anything agentic. On most consumer hardware, the availability of llama.cpp’s --cpu-moe flag alone is absurdly good and worth the effort to familiarize yourself with llamacpp instead of ollama.
Ollama will almost always be slower than running vllm or llama.cpp, nobody should be suggesting it for anything agentic. On most consumer hardware, the availability of llama.cpp’s --cpu-moe flag alone is absurdly good and worth the effort to familiarize yourself with llamacpp instead of ollama.
AI Acknowledgement
The joke is worth the slop, imo. “Cpu Moe”. 😂 Find me an anime drawing of a CPU (especially an iconic one) and I’ll use that instead.
In your defense, I’ve thought the same joke every time I’ve seen it lol