Add embedding models for RAG vector search. Only one model can be active at a time. Changing the active model requires re-indexing existing collections.
Chat Memory Embedding
Select which embedding model to use for indexing and recalling chat conversations. This is separate from the RAG embedding model above.
Choose from your configured embedding models above, or leave as default to use the active RAG model.
User Memories
Facts and preferences the AI has learned about you across conversations. These are used to personalize responses.
Select models to show in the chat model selector. Configure each provider below first, then add models here.
Add Model to Quick Access
Customize the AI assistant's behavior and personality by modifying the system prompt.
This is the instruction that shapes how the AI responds. The current date/time and available tools will be automatically appended.
This text will be automatically added to every message you send. Useful for persistent instructions like "use MCP tools" or "be concise".
Prompt Enhance (Magic Wand)
When enabled, the magic wand button will rewrite your prompt using prompt engineering best practices before sending.
Instructions that tell the AI how to rewrite your prompts. Customize this to match your workflow.
Choose which model rewrites your prompts. A fast, cheap model (e.g. GPT-4o-mini, Gemini Flash) works great here.
Performance Options
Context window size (tokens)
Max tokens to generate
How long to keep model loaded in memory. Longer = faster subsequent responses.
Get your API key from Google AI Studio
Used for embedding models and reranking. Get your key from Cohere Dashboard.
Define reusable instruction sets to shape AI behavior. Active skills are injected into every conversation, similar to .cursorrules or copilot-instructions.md.
Create Skill
Priority controls injection order (lower = first). Skill instructions are appended to the system prompt for every message.
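The priority ordering described above can be sketched roughly as follows. This is an illustrative model only; the function and field names (`active`, `priority`, `instructions`) are assumptions, not the app's actual internals.

```python
# Hypothetical sketch: active skills sorted by priority (lower = first),
# then appended to the system prompt for every message.
def build_system_prompt(base_prompt, skills):
    ordered = sorted(
        (s for s in skills if s.get("active")),
        key=lambda s: s["priority"],
    )
    parts = [base_prompt] + [s["instructions"] for s in ordered]
    return "\n\n".join(parts)
```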
Import / Export
Import skills from .md files or export your skills for backup and sharing.
Add MCP Server
Tool Settings
Maximum characters from MCP tool results. Larger = more context but slower responses. (1,000-100,000)
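The character cap works as a simple truncation of the tool's raw output before it reaches the model. A minimal sketch, assuming a plain-text result (the function name and truncation marker are illustrative, not the app's actual API):

```python
# Cap an MCP tool result at max_chars; mark the cut so the model
# knows the output is incomplete.
def truncate_tool_result(text, max_chars=1000):
    if len(text) <= max_chars:
        return text
    return text[:max_chars] + "\n[truncated]"
```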
Configure RAG (Retrieval Augmented Generation) to query your indexed documents using vector search.
When enabled, you can query your indexed documents for context-aware responses.
Connection
URL of your Qdrant vector database instance (e.g. http://localhost:6333).
Configure embedding models in the Embedding Models tab.
Retrieval Settings
Number of chunks to retrieve (1-200)
Minimum similarity score (0-1)
Maximum characters of context to include in prompt (500-100,000).
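The three settings above interact: chunk count caps how many results are considered, the similarity score drops weak matches, and the character limit bounds the final context. A rough sketch of that pipeline, assuming chunks arrive as (score, text) pairs already sorted by the vector search (names are illustrative):

```python
# Apply top-k, minimum similarity, and context-size limits in order.
def select_context(chunks, top_k=8, min_score=0.3, max_chars=4000):
    picked, used = [], 0
    for score, text in chunks[:top_k]:      # cap number of chunks
        if score < min_score:               # drop low-similarity hits
            continue
        if used + len(text) > max_chars:    # cap total context size
            break
        picked.append(text)
        used += len(text)
    return "\n\n".join(picked)
```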
Metadata field name used to categorize collections in the dropdown. Leave empty to show flat list.
Rerank (Optional)
Reranking improves retrieval quality by reordering initial search results based on semantic relevance. Requires a Cohere API key configured in the Cohere provider tab.
When enabled, search results are reranked for better relevance.
Select the Cohere rerank model to use.
Number of top results to keep after reranking.
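Conceptually, the rerank step scores each retrieved chunk against the query and keeps only the N best. In the sketch below, `score_fn` stands in for the Cohere rerank model call; it and the function name are assumptions for illustration, not the actual integration:

```python
# Rerank candidates by relevance to the query, then keep the top N.
def rerank_top_n(query, documents, score_fn, top_n=3):
    scored = [(score_fn(query, d), d) for d in documents]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [doc for _, doc in scored[:top_n]]
```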