The Llama.cpp Provider supports querying local Llama.cpp models for prompt-based interactions. Make sure you have a Llama.cpp server running locally with your desired model loaded.

Cloud Limitation

This provider is disabled for cloud environments and can only be used in local or self-hosted environments.

Inputs

The Llama.cpp Provider supports the following inputs:

  • prompt: The prompt to send to the Llama.cpp model; the provider returns the model's response (see the example after this list)
  • max_tokens: The maximum number of tokens the model may return, default 1024
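
These inputs map onto the Llama.cpp server's HTTP completion API. As a rough illustration of how they are used at the server level (assuming the server runs on localhost:8080 and exposes the default /completion endpoint; the exact request the provider builds internally may differ), the same inputs can be exercised with curl:

    # Send a prompt and cap the response length; n_predict plays the role of max_tokens
    curl -s http://localhost:8080/completion \
      -H "Content-Type: application/json" \
      -d '{"prompt": "Summarize the latest alerts in one sentence.", "n_predict": 1024}'

The generated text is returned in the response's content field.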

Outputs

Currently, the Llama.cpp Provider outputs the model's generated response to the provided prompt.

Authentication Parameters

The Llama.cpp Provider requires the following configuration parameters:

  • Host URL: the base URL of the running Llama.cpp server (for example, http://localhost:8080)
  • Model path: the path to the GGUF model file served by the server

Connecting with the Provider

To use the Llama.cpp Provider:

  1. Install Llama.cpp on your system
  2. Download or convert your model to GGUF format
  3. Start the Llama.cpp server with its HTTP interface (recent Llama.cpp builds name the server binary llama-server rather than server):
    ./server --model /path/to/your/model.gguf --host 0.0.0.0 --port 8080
    
  4. Configure the host URL and model path in your Keep configuration (a quick connectivity check is sketched below)
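
Before wiring the provider into Keep, it can help to confirm the server is reachable. A minimal check, assuming a recent Llama.cpp server build that exposes a /health endpoint on the port chosen above:

    # Should return a small JSON status payload once the model is loaded
    curl -s http://localhost:8080/health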

Prerequisites

  • Llama.cpp must be installed and compiled with server support
  • A GGUF format model file must be available on your system
  • The Llama.cpp server must be running and accessible
  • The server must have sufficient resources to load and run your model

Model Compatibility

The provider works with any GGUF format model compatible with Llama.cpp, including:

  • LLaMA and LLaMA-2 models
  • Mistral models
  • OpenLLaMA models
  • Vicuna models
  • And other compatible model architectures

Make sure your model is in GGUF format before using it with the provider.
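
If your model is not yet in GGUF format, it can usually be converted with the scripts that ship with Llama.cpp. A minimal sketch, assuming a Hugging Face format model directory and a recent Llama.cpp checkout (script and binary names have changed between Llama.cpp releases, so check your local copy):

    # Convert a Hugging Face model directory to a GGUF file
    python3 convert_hf_to_gguf.py /path/to/hf-model --outfile /path/to/model-f16.gguf

    # Optionally quantize the result to reduce memory usage
    ./llama-quantize /path/to/model-f16.gguf /path/to/model-q4_k_m.gguf Q4_K_M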