Ollama Guide: The Easiest Local LLM Setup for SillyTavern & MiniTavern (2026)
Ollama runs open-weight models with one command and exposes an OpenAI-compatible API on port 11434—the fastest path to private SillyTavern and MiniTavern character-card roleplay without cloud keys.
- ollama
- local llm
- privacy
- sillytavern
- minitavern
- tutorial
Ollama Guide: The Easiest Local LLM Setup for SillyTavern & MiniTavern (2026)
If you want local LLM roleplay without wrestling with GPU layer sliders or portable binaries, Ollama is usually the first recommendation in the SillyTavern community. Install once, run ollama pull, connect SillyTavern or MiniTavern to http://localhost:11434, and your character cards stay private—no OpenAI account, no per-token bill.
This guide covers what Ollama is, key terminology, setup on desktop and LAN/mobile, and how it compares to KoboldCpp and LM Studio in 2026.
What Is Ollama?
Ollama is a local LLM runtime and model manager. It downloads open-weight models, keeps them in a local library, runs a background daemon on port 11434, and exposes:
- Native Ollama API —
http://localhost:11434/api/ - OpenAI-compatible API —
http://localhost:11434/v1/(chat/completions)
Under the hood Ollama uses llama.cpp (and related runtimes) with a developer-friendly CLI. It is available on macOS, Linux, and Windows.
Unlike cloud APIs, your prompts, character card text, World Info, and chat history are processed only on hardware you control.
Key Ollama Terminology
| Term | Meaning |
|---|---|
| ollama pull | Download a model from the Ollama library (e.g. ollama pull llama3.1) |
| ollama run | Load a model interactively in the terminal for quick tests |
| ollama serve | Start or confirm the background API server (often auto-starts on install) |
| ollama list | Show models stored locally |
| ollama ps | Show currently loaded/running models |
| Modelfile | Recipe to create custom models (system prompt, parameters, base model) |
| OLLAMA_HOST | Environment variable to bind the server to LAN (e.g. 0.0.0.0:11434) |
| Model library | Curated tags at ollama.com/library |
| Context length | Set per model variant; larger context needs more RAM/VRAM |
Why Tavern Users Pick Ollama
- Lowest friction — two commands (
pull+ connect ST) vs manual GGUF hunting. - Native SillyTavern connector — API dropdown includes Ollama out of the box.
- Privacy by default — no data leaves your machine unless you expose the port.
- Great for beginners — ideal first local backend before graduating to KoboldCpp tuning or LM Studio GUI.
MiniTavern users on the same Wi-Fi can point the Multi-Model Hub at http://192.168.x.x:11434/v1. For encrypted remote access from outside the home, pair Ollama on LAN with a VPN—or compare LM Studio LM Link and KoboldCpp Remote Tunnel.
Prerequisites
- OS: macOS, Linux, or Windows 10+.
- RAM: 16 GB recommended; 8 GB works for 3B–7B quants.
- GPU (optional): Apple Silicon, NVIDIA, or AMD—Ollama auto-accelerates when possible.
- SillyTavern or MiniTavern with character cards (Card Quest Market or Chrome extension).
Step 1: Install Ollama
macOS / Linux:
curl -fsSL https://ollama.com/install.sh | sh
Windows: download the installer from ollama.com.
Verify:
ollama --version
The Ollama app icon (macOS/Windows) or systemd service (Linux) keeps the daemon running.
Step 2: Pull a Roleplay-Friendly Model
ollama pull llama3.1:8b
ollama pull mistral
ollama pull qwen2.5:7b
Quick test:
ollama run llama3.1:8b
Type a message; if you get a reply, the runtime works. /bye to exit.
| Model tag | VRAM/RAM hint | RP notes |
|---|---|---|
llama3.1:8b | ~8 GB | Strong instruction following |
mistral | ~6 GB | Fast, classic choice |
qwen2.5:7b | ~6 GB | Good for multilingual cards |
gemma2:9b | ~8 GB | Natural dialogue |
Avoid tiny models (<3B) for complex character cards and World Info.
Step 3: Confirm the API Server
Default endpoint:
http://localhost:11434
Check running models:
ollama ps
List library:
ollama list
OpenAI-compatible test:
curl http://localhost:11434/v1/models
Step 4: Connect SillyTavern (Ollama — Recommended)
- Open SillyTavern → plug icon → API Connections.
- API: select Ollama (or Chat Completion with Ollama source, depending on ST version).
- Server URL:
http://localhost:11434(orhttp://127.0.0.1:11434). - Connect — pick the model you pulled.
- Import a character card → send a greeting.
Local RP tips:
- Shorten verbose system prompts for 7B–8B models.
- Set context 4096–8192 in ST if the model supports it.
- Temperature 0.7–0.9 for character play.
- More tuning: local LLM privacy guide.
Alternative: OpenAI-Compatible Mode
- API: Chat Completion.
- Source: Custom (OpenAI-compatible).
- Base URL:
http://localhost:11434/v1. - Connect — useful for presets expecting chat message arrays.
Step 5: MiniTavern on Phone (Same Wi-Fi)
Ollama binds to localhost by default. For phone access on your LAN:
macOS / Linux:
export OLLAMA_HOST=0.0.0.0:11434
ollama serve
Windows: set environment variable OLLAMA_HOST=0.0.0.0:11434 and restart Ollama.
Then in MiniTavern:
- Find PC IP (e.g.
192.168.1.60). - Custom endpoint:
http://192.168.1.60:11434/v1. - Allow firewall port 11434 on the PC.
Workflow: Character Card Market → Chrome Extension → MiniTavern iOS/Android.
Security note: binding to 0.0.0.0 exposes Ollama to your LAN—do not port-forward to the public internet without authentication.
Optional: Custom Model with Modelfile
Create my-rp-model.Modelfile:
FROM llama3.1:8b
PARAMETER temperature 0.8
SYSTEM You are a concise roleplay assistant. Stay in character.
Build and run:
ollama create my-rp -f my-rp-model.Modelfile
ollama run my-rp
Select my-rp in SillyTavern after connecting.
Troubleshooting
| Issue | Fix |
|---|---|
| Connection refused | Start Ollama app / ollama serve; check port 11434 |
| Model not in ST list | Run ollama pull <name> first; reconnect |
| Slow replies | Smaller model or ensure GPU acceleration (Apple Silicon / NVIDIA) |
| Out of memory | Use smaller tag (:7b not :70b) or close other GPU apps |
| Phone cannot connect | Set OLLAMA_HOST=0.0.0.0:11434; check firewall |
| OOC / format breaks | Match ST preset to chat model; shorten card prompt |
Ollama vs KoboldCpp vs LM Studio
| Ollama | KoboldCpp | LM Studio | |
|---|---|---|---|
| Setup ease | Easiest (CLI pull) | Portable binary + GGUF file | Desktop GUI + catalog |
| Default port | 11434 | 5001 | 1234 |
| ST connector | Ollama native | KoboldCpp / Text Completion | KoboldAI / OpenAI |
| GPU tuning | Automatic / limited | Deep (GPU layers) | GUI-friendly |
| Remote mobile | LAN + VPN | Remote Tunnel | LM Link (Tailscale) |
| Best for | Beginners, quick start | Power users, fine VRAM control | Browse models + LM Link |
Many users start with Ollama, then move heavy 14B+ workloads to KoboldCpp or a home PC with LM Studio.
Privacy Best Practices
- Disable cloud API fallbacks in SillyTavern/MiniTavern.
- Do not expose port 11434 to the open internet.
- Pull models only from Ollama library or trusted Modelfiles.
- Keep Ollama updated (
ollama --version/ reinstall). - Encrypt sensitive character PNGs if storing personal lore locally.
Conclusion
Ollama is the fastest on-ramp to private character-card roleplay with SillyTavern and MiniTavern in 2026: install, ollama pull, connect port 11434, play. No cloud keys, no usage caps—just your models and your cards.
Ready to start? Pull a model, browse the Character Card Market, install MiniTavern for mobile, and point your API connector at localhost:11434.
Keep reading
More guides you might like
SillyTavern Character Cards with Google Gemini: Setup, Optimization, and Best Practices in 2026
The landscape of AIpowered roleplay has evolved dramatically, and in 2026, the combination of SillyTavern and Google Gemini offers one of the most versatil…
- sillytavern
- google gemini
- character cards
- setup guide
How to Use SillyTavern Character Cards with Local LLMs: A Complete Privacy-Focused Guide for 2026
In an era where data privacy is becoming a premium, the allure of running AI characters locally has never been stronger. SillyTavern, the beloved frontend…
- sillytavern
- local llm
- privacy
- character cards
How to Create SillyTavern Character Cards for Free: Top Online Generators in 2026
Creating engaging SillyTavern character cards is the heart of any immersive roleplay experience. Whether you're a seasoned storyteller or a newcomer, the a…
- character-card-generator
- free-tools
- sillytavern
- online-creator