Llama.cpp Provider
The Llama.cpp Provider allows for integrating locally running Llama.cpp models into Keep.
The Llama.cpp Provider supports querying local Llama.cpp models for prompt-based interactions. Make sure you have a Llama.cpp server running locally with your desired model.
Cloud Limitation
This provider is disabled for cloud environments and can only be used in local or self-hosted environments.
Inputs
The Llama.cpp Provider supports the following inputs:
prompt
: Interact with Llama.cpp models by sending prompts and receiving responses
max_tokens
: Limit the number of tokens returned by the model (default: 1024)
Outputs
Currently, the Llama.cpp Provider outputs the model's text response to the provided prompt.
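As a concrete illustration of the inputs and outputs above, the sketch below sends a prompt to a locally running Llama.cpp server and reads back the generated text. This is a minimal sketch of the llama.cpp server's HTTP API, not Keep's internal implementation: the `query_llamacpp` helper and `LLAMACPP_HOST` constant are hypothetical names, and the `/completion` endpoint, `n_predict` parameter (the counterpart of `max_tokens`), and `content` response field reflect recent llama.cpp server builds and may differ in other versions.

```python
import requests

# Host configured for the provider; "http://localhost:8080" is the default.
LLAMACPP_HOST = "http://localhost:8080"

def query_llamacpp(prompt: str, max_tokens: int = 1024) -> str:
    """Send a prompt to a local llama.cpp server and return the generated text.

    Assumes the server exposes the /completion endpoint, where `n_predict`
    limits the number of generated tokens (the provider's `max_tokens` input).
    """
    response = requests.post(
        f"{LLAMACPP_HOST}/completion",
        json={"prompt": prompt, "n_predict": max_tokens},
        timeout=60,
    )
    response.raise_for_status()
    # Recent llama.cpp builds return the generated text in the "content" field.
    return response.json()["content"]

if __name__ == "__main__":
    print(query_llamacpp("Summarize: disk usage on host db-1 exceeded 90%."))
```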
Authentication Parameters
The Llama.cpp Provider requires the following configuration parameters:
- host (required): The Llama.cpp server host URL; defaults to `http://localhost:8080`
Connecting with the Provider
To use the Llama.cpp Provider:
- Install Llama.cpp on your system
- Download or convert your model to GGUF format
- Start the Llama.cpp server with its HTTP interface enabled (see the example invocation after this list)
- Configure the host URL and model path in your Keep configuration
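For the server step above: depending on how Llama.cpp was built, the HTTP server binary is typically `llama-server` (newer releases) or `./server` (older builds). A typical invocation, assuming a model file at `/path/to/model.gguf`, is `llama-server -m /path/to/model.gguf --host 0.0.0.0 --port 8080`; flag names can vary between versions, so check your build's `--help`.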
Prerequisites
- Llama.cpp must be installed and compiled with server support
- A GGUF format model file must be available on your system
- The Llama.cpp server must be running and accessible
- The server must have sufficient resources to load and run your model
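To verify the last two prerequisites before wiring up the provider, a small readiness probe like the one below can help. It assumes the llama.cpp server exposes a `GET /health` route, which recent builds do; `llamacpp_is_ready` is a hypothetical helper, not part of Keep or Llama.cpp.

```python
import requests

def llamacpp_is_ready(host: str = "http://localhost:8080") -> bool:
    """Return True if the llama.cpp server responds on its /health endpoint.

    A 200 response indicates the server is up and the model has been loaded;
    other status codes (or connection errors) mean it is not ready yet.
    """
    try:
        return requests.get(f"{host}/health", timeout=5).status_code == 200
    except requests.RequestException:
        return False

if __name__ == "__main__":
    print("llama.cpp server ready:", llamacpp_is_ready())
```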
Model Compatibility
The provider works with any GGUF format model compatible with Llama.cpp, including:
- LLaMA and LLaMA-2 models
- Mistral models
- OpenLLaMA models
- Vicuna models
- And other compatible model architectures
Make sure your model is in GGUF format before using it with the provider.
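If your weights are not already in GGUF format, the llama.cpp repository ships conversion tooling (for example, a `convert_hf_to_gguf.py` script in recent versions, with older releases using slightly different script names) for converting Hugging Face checkpoints to GGUF, along with a quantization tool for producing smaller quantized variants. Exact script and binary names depend on the llama.cpp version you have installed.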