Ollama Guide: The Easiest Local LLM Setup for SillyTavern & MiniTavern (2026)

If you want local LLM roleplay without wrestling with GPU layer sliders or portable binaries, Ollama is usually the first recommendation in the SillyTavern community. Install once, run ollama pull, connect SillyTavern or MiniTavern to http://localhost:11434, and your character cards stay private—no OpenAI account, no per-token bill.

This guide covers what Ollama is, key terminology, setup on desktop and LAN/mobile, and how it compares to KoboldCpp and LM Studio in 2026.

What Is Ollama?

Ollama is a local LLM runtime and model manager. It downloads open-weight models, keeps them in a local library, runs a background daemon on port 11434, and exposes:

Native Ollama API — http://localhost:11434/api/
OpenAI-compatible API — http://localhost:11434/v1/ (chat/completions)

Under the hood Ollama uses llama.cpp (and related runtimes) with a developer-friendly CLI. It is available on macOS, Linux, and Windows.

Unlike cloud APIs, your prompts, character card text, World Info, and chat history are processed only on hardware you control.

Key Ollama Terminology

Term	Meaning
ollama pull	Download a model from the Ollama library (e.g. `ollama pull llama3.1`)
ollama run	Load a model interactively in the terminal for quick tests
ollama serve	Start or confirm the background API server (often auto-starts on install)
ollama list	Show models stored locally
ollama ps	Show currently loaded/running models
Modelfile	Recipe to create custom models (system prompt, parameters, base model)
OLLAMA_HOST	Environment variable to bind the server to LAN (e.g. `0.0.0.0:11434`)
Model library	Curated tags at ollama.com/library
Context length	Set per model variant; larger context needs more RAM/VRAM

Why Tavern Users Pick Ollama

Lowest friction — two commands (pull + connect ST) vs manual GGUF hunting.
Native SillyTavern connector — API dropdown includes Ollama out of the box.
Privacy by default — no data leaves your machine unless you expose the port.
Great for beginners — ideal first local backend before graduating to KoboldCpp tuning or LM Studio GUI.

MiniTavern users on the same Wi-Fi can point the Multi-Model Hub at http://192.168.x.x:11434/v1. For encrypted remote access from outside the home, pair Ollama on LAN with a VPN—or compare LM Studio LM Link and KoboldCpp Remote Tunnel.

Prerequisites

OS: macOS, Linux, or Windows 10+.
RAM: 16 GB recommended; 8 GB works for 3B–7B quants.
GPU (optional): Apple Silicon, NVIDIA, or AMD—Ollama auto-accelerates when possible.
SillyTavern or MiniTavern with character cards (Card Quest Market or Chrome extension).

Step 1: Install Ollama

macOS / Linux:

curl -fsSL https://ollama.com/install.sh | sh

Windows: download the installer from ollama.com.

Verify:

ollama --version

The Ollama app icon (macOS/Windows) or systemd service (Linux) keeps the daemon running.

Step 2: Pull a Roleplay-Friendly Model

ollama pull llama3.1:8b
ollama pull mistral
ollama pull qwen2.5:7b

Quick test:

ollama run llama3.1:8b

Type a message; if you get a reply, the runtime works. /bye to exit.

Model tag	VRAM/RAM hint	RP notes
`llama3.1:8b`	~8 GB	Strong instruction following
`mistral`	~6 GB	Fast, classic choice
`qwen2.5:7b`	~6 GB	Good for multilingual cards
`gemma2:9b`	~8 GB	Natural dialogue

Avoid tiny models (<3B) for complex character cards and World Info.

Step 3: Confirm the API Server

Default endpoint:

http://localhost:11434

Check running models:

ollama ps

List library:

ollama list

OpenAI-compatible test:

curl http://localhost:11434/v1/models

Step 4: Connect SillyTavern (Ollama — Recommended)

Open SillyTavern → plug icon → API Connections.
API: select Ollama (or Chat Completion with Ollama source, depending on ST version).
Server URL: http://localhost:11434 (or http://127.0.0.1:11434).
Connect — pick the model you pulled.
Import a character card → send a greeting.

Local RP tips:

Shorten verbose system prompts for 7B–8B models.
Set context 4096–8192 in ST if the model supports it.
Temperature 0.7–0.9 for character play.
More tuning: local LLM privacy guide.

Alternative: OpenAI-Compatible Mode

API: Chat Completion.
Source: Custom (OpenAI-compatible).
Base URL: http://localhost:11434/v1.
Connect — useful for presets expecting chat message arrays.

Step 5: MiniTavern on Phone (Same Wi-Fi)

Ollama binds to localhost by default. For phone access on your LAN:

macOS / Linux:

export OLLAMA_HOST=0.0.0.0:11434
ollama serve

Windows: set environment variable OLLAMA_HOST=0.0.0.0:11434 and restart Ollama.

Then in MiniTavern:

Find PC IP (e.g. 192.168.1.60).
Custom endpoint: http://192.168.1.60:11434/v1.
Allow firewall port 11434 on the PC.

Workflow: Character Card Market → Chrome Extension → MiniTavern iOS/Android.

Security note: binding to 0.0.0.0 exposes Ollama to your LAN—do not port-forward to the public internet without authentication.

Optional: Custom Model with Modelfile

Create my-rp-model.Modelfile:

FROM llama3.1:8b
PARAMETER temperature 0.8
SYSTEM You are a concise roleplay assistant. Stay in character.

Build and run:

ollama create my-rp -f my-rp-model.Modelfile
ollama run my-rp

Select my-rp in SillyTavern after connecting.

Troubleshooting

Issue	Fix
Connection refused	Start Ollama app / `ollama serve`; check port 11434
Model not in ST list	Run `ollama pull <name>` first; reconnect
Slow replies	Smaller model or ensure GPU acceleration (Apple Silicon / NVIDIA)
Out of memory	Use smaller tag (`:7b` not `:70b`) or close other GPU apps
Phone cannot connect	Set `OLLAMA_HOST=0.0.0.0:11434`; check firewall
OOC / format breaks	Match ST preset to chat model; shorten card prompt

Ollama vs KoboldCpp vs LM Studio

	Ollama	KoboldCpp	LM Studio
Setup ease	Easiest (CLI pull)	Portable binary + GGUF file	Desktop GUI + catalog
Default port	11434	5001	1234
ST connector	Ollama native	KoboldCpp / Text Completion	KoboldAI / OpenAI
GPU tuning	Automatic / limited	Deep (GPU layers)	GUI-friendly
Remote mobile	LAN + VPN	Remote Tunnel	LM Link (Tailscale)
Best for	Beginners, quick start	Power users, fine VRAM control	Browse models + LM Link

Many users start with Ollama, then move heavy 14B+ workloads to KoboldCpp or a home PC with LM Studio.

Privacy Best Practices

Disable cloud API fallbacks in SillyTavern/MiniTavern.
Do not expose port 11434 to the open internet.
Pull models only from Ollama library or trusted Modelfiles.
Keep Ollama updated (ollama --version / reinstall).
Encrypt sensitive character PNGs if storing personal lore locally.

Conclusion

Ollama is the fastest on-ramp to private character-card roleplay with SillyTavern and MiniTavern in 2026: install, ollama pull, connect port 11434, play. No cloud keys, no usage caps—just your models and your cards.

Ready to start? Pull a model, browse the Character Card Market, install MiniTavern for mobile, and point your API connector at localhost:11434.

Ollama Guide: The Easiest Local LLM Setup for SillyTavern & MiniTavern (2026)

Ollama Guide: The Easiest Local LLM Setup for SillyTavern & MiniTavern (2026)

What Is Ollama?

Key Ollama Terminology

Why Tavern Users Pick Ollama

Prerequisites

Step 1: Install Ollama

Step 2: Pull a Roleplay-Friendly Model

Step 3: Confirm the API Server

Step 4: Connect SillyTavern (Ollama — Recommended)

Alternative: OpenAI-Compatible Mode

Step 5: MiniTavern on Phone (Same Wi-Fi)

Optional: Custom Model with Modelfile

Troubleshooting

Ollama vs KoboldCpp vs LM Studio

Privacy Best Practices

Conclusion

SillyTavern Character Cards with Google Gemini: Setup, Optimization, and Best Practices in 2026

How to Use SillyTavern Character Cards with Local LLMs: A Complete Privacy-Focused Guide for 2026

How to Create SillyTavern Character Cards for Free: Top Online Generators in 2026

Ollama Guide: The Easiest Local LLM Setup for SillyTavern & MiniTavern (2026)

What Is Ollama?

Key Ollama Terminology

Why Tavern Users Pick Ollama

Prerequisites

Step 1: Install Ollama

Step 2: Pull a Roleplay-Friendly Model

Step 3: Confirm the API Server

Step 4: Connect SillyTavern (Ollama — Recommended)

Alternative: OpenAI-Compatible Mode

Step 5: MiniTavern on Phone (Same Wi-Fi)

Optional: Custom Model with Modelfile

Troubleshooting

Ollama vs KoboldCpp vs LM Studio

Privacy Best Practices

Conclusion

Keep reading

SillyTavern Character Cards with Google Gemini: Setup, Optimization, and Best Practices in 2026

How to Use SillyTavern Character Cards with Local LLMs: A Complete Privacy-Focused Guide for 2026

How to Create SillyTavern Character Cards for Free: Top Online Generators in 2026