• Hexarei@beehaw.org · 2 days ago

    run a local LLM like Claude!

    Look inside

    “Run ollama”

    Ollama will almost always be slower than running vLLM or llama.cpp; nobody should be suggesting it for anything agentic. On most consumer hardware, the availability of llama.cpp’s --cpu-moe flag alone is absurdly good and worth the effort of familiarizing yourself with llama.cpp instead of Ollama.
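
    For anyone curious, a minimal sketch of what using that flag looks like (the model path and GPU-layer count below are placeholders, not a recommendation): --cpu-moe keeps the MoE expert weights in system RAM while the rest of the model is offloaded to the GPU, so large mixture-of-experts models can run with modest VRAM.

    ```shell
    # Hypothetical invocation; adjust the model path and -ngl for your hardware.
    # --cpu-moe: keep MoE expert tensors on CPU, offload everything else.
    llama-server \
      -m /models/some-moe-model.gguf \
      --cpu-moe \
      -ngl 99 \
      --port 8080
    ```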

    • Quibblekrust@thelemmy.club · 11 hours ago (edited)

      --cpu-moe

      AI Acknowledgement

      The joke is worth the slop, imo. “Cpu Moe”. 😂 Find me an anime drawing of a CPU (especially an iconic one) and I’ll use that instead.

    • ctrl_alt_esc@lemmy.ml · 2 days ago

      I have used Ollama so far and it’s indeed quite slow. Can you recommend a good guide for setting up llama.cpp (on Linux)? I have Ollama running in a Docker container with openwebui; that kind of setup would be ideal.

      • Hexarei@beehaw.org · 2 days ago

        I just run the llama-swap docker container with a config file mounted, set to listen for config changes so I don’t have to restart it to add new models. I don’t have a guide besides the README for llama-swap.
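
        For reference, a config for that setup might look something like this (a sketch only: the model name, file path, and flags are placeholders; llama-swap substitutes ${PORT} into the command it launches):

        ```yaml
        # Hypothetical llama-swap config.yaml
        models:
          "my-moe-model":
            cmd: >
              llama-server --port ${PORT}
              -m /models/some-moe-model.gguf
              --cpu-moe -ngl 99
        ```

        Mount the file into the container at wherever your llama-swap image expects its config (the exact path and the option for watching config changes are in the llama-swap README), then point openwebui at llama-swap’s OpenAI-compatible endpoint instead of Ollama’s.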