The Llama.cpp Provider supports querying local Llama.cpp models for prompt-based interactions. Make sure you have a Llama.cpp server running locally with your desired model loaded.

Cloud Limitation

This provider is disabled for cloud environments and can only be used in local or self-hosted environments.

Authentication

This provider requires authentication.

  • host: Llama.cpp Server Host URL (required: True, sensitive: False)

In workflows

This provider can be used in workflows.

As a “step” to query data, for example:

steps:
    - name: Query llamacpp
      provider: llamacpp
      config: "{{ provider.my_provider_name }}"
      with:
        prompt: {value}  
        max_tokens: {value}  

If you need workflow examples with this provider, please raise a GitHub issue.
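
Under the hood, the provider talks to the Llama.cpp server's HTTP API. The sketch below shows an equivalent direct request in Python, assuming the server's /completion endpoint on the default port used later on this page; n_predict is Llama.cpp's name for the token limit, and its correspondence to max_tokens here is an assumption rather than something taken from the provider's source.

    import requests

    # Direct call to the llama.cpp HTTP server; the Keep provider is assumed to
    # issue an equivalent request (this is a sketch, not the provider's source).
    host = "http://localhost:8080"  # the URL configured as `host` in Keep

    payload = {
        "prompt": "Summarize this alert in one sentence: disk usage on db-1 is at 95%.",
        "n_predict": 128,  # llama.cpp's token limit, assumed to map to `max_tokens`
    }

    resp = requests.post(f"{host}/completion", json=payload, timeout=60)
    resp.raise_for_status()
    print(resp.json()["content"])  # generated text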

Connecting with the Provider

To use the Llama.cpp Provider:

  1. Install Llama.cpp on your system
  2. Download or convert your model to GGUF format
  3. Start the Llama.cpp server with its HTTP interface enabled (on newer Llama.cpp builds the binary is named llama-server rather than server):
    ./server --model /path/to/your/model.gguf --host 0.0.0.0 --port 8080
    
  4. Configure the host URL and model path in your Keep configuration (a quick reachability check is sketched after this list)
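
Before saving the host in Keep, it can help to confirm the server is actually reachable. A minimal check, assuming the default host and port from step 3 (the /health endpoint exists on recent Llama.cpp server builds; older builds may need a small /completion request instead):

    import requests

    host = "http://localhost:8080"  # the URL you will configure as `host` in Keep

    # Assumes your llama.cpp build exposes /health; it reports readiness once
    # the model has finished loading.
    resp = requests.get(f"{host}/health", timeout=5)
    print(resp.status_code, resp.text)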

Prerequisites

  • Llama.cpp must be installed and compiled with server support
  • A GGUF format model file must be available on your system
  • The Llama.cpp server must be running and accessible
  • The server must have sufficient resources to load and run your model

Model Compatibility

The provider works with any GGUF format model compatible with Llama.cpp, including:

  • LLaMA and LLaMA-2 models
  • Mistral models
  • OpenLLaMA models
  • Vicuna models
  • And other compatible model architectures

Make sure your model is in GGUF format before using it with the provider.
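
If you prefer not to convert a model yourself, pre-converted GGUF files are widely published on Hugging Face. A short sketch using huggingface_hub, with an illustrative repository and file name (substitute any GGUF build you trust):

    from huggingface_hub import hf_hub_download

    # Illustrative repo and file names; any GGUF build compatible with your
    # llama.cpp version will do.
    model_path = hf_hub_download(
        repo_id="TheBloke/Mistral-7B-Instruct-v0.2-GGUF",
        filename="mistral-7b-instruct-v0.2.Q4_K_M.gguf",
    )
    print(model_path)  # pass this path to the server's --model flag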