← Back to blog

Ollama Guide: The Easiest Local LLM Setup for SillyTavern & MiniTavern (2026)

Ollama runs open-weight models with one command and exposes an OpenAI-compatible API on port 11434—the fastest path to private SillyTavern and MiniTavern character-card roleplay without cloud keys.

Published
  • ollama
  • local llm
  • privacy
  • sillytavern
  • minitavern
  • tutorial

Ollama Guide: The Easiest Local LLM Setup for SillyTavern & MiniTavern (2026)

If you want local LLM roleplay without wrestling with GPU layer sliders or portable binaries, Ollama is usually the first recommendation in the SillyTavern community. Install once, run ollama pull, connect SillyTavern or MiniTavern to http://localhost:11434, and your character cards stay private—no OpenAI account, no per-token bill.

This guide covers what Ollama is, key terminology, setup on desktop and LAN/mobile, and how it compares to KoboldCpp and LM Studio in 2026.

What Is Ollama?

Ollama is a local LLM runtime and model manager. It downloads open-weight models, keeps them in a local library, runs a background daemon on port 11434, and exposes:

  • Native Ollama APIhttp://localhost:11434/api/
  • OpenAI-compatible APIhttp://localhost:11434/v1/ (chat/completions)

Under the hood Ollama uses llama.cpp (and related runtimes) with a developer-friendly CLI. It is available on macOS, Linux, and Windows.

Unlike cloud APIs, your prompts, character card text, World Info, and chat history are processed only on hardware you control.

Key Ollama Terminology

TermMeaning
ollama pullDownload a model from the Ollama library (e.g. ollama pull llama3.1)
ollama runLoad a model interactively in the terminal for quick tests
ollama serveStart or confirm the background API server (often auto-starts on install)
ollama listShow models stored locally
ollama psShow currently loaded/running models
ModelfileRecipe to create custom models (system prompt, parameters, base model)
OLLAMA_HOSTEnvironment variable to bind the server to LAN (e.g. 0.0.0.0:11434)
Model libraryCurated tags at ollama.com/library
Context lengthSet per model variant; larger context needs more RAM/VRAM

Why Tavern Users Pick Ollama

  1. Lowest friction — two commands (pull + connect ST) vs manual GGUF hunting.
  2. Native SillyTavern connector — API dropdown includes Ollama out of the box.
  3. Privacy by default — no data leaves your machine unless you expose the port.
  4. Great for beginners — ideal first local backend before graduating to KoboldCpp tuning or LM Studio GUI.

MiniTavern users on the same Wi-Fi can point the Multi-Model Hub at http://192.168.x.x:11434/v1. For encrypted remote access from outside the home, pair Ollama on LAN with a VPN—or compare LM Studio LM Link and KoboldCpp Remote Tunnel.

Prerequisites

  • OS: macOS, Linux, or Windows 10+.
  • RAM: 16 GB recommended; 8 GB works for 3B–7B quants.
  • GPU (optional): Apple Silicon, NVIDIA, or AMD—Ollama auto-accelerates when possible.
  • SillyTavern or MiniTavern with character cards (Card Quest Market or Chrome extension).

Step 1: Install Ollama

macOS / Linux:

curl -fsSL https://ollama.com/install.sh | sh

Windows: download the installer from ollama.com.

Verify:

ollama --version

The Ollama app icon (macOS/Windows) or systemd service (Linux) keeps the daemon running.

Step 2: Pull a Roleplay-Friendly Model

ollama pull llama3.1:8b
ollama pull mistral
ollama pull qwen2.5:7b

Quick test:

ollama run llama3.1:8b

Type a message; if you get a reply, the runtime works. /bye to exit.

Model tagVRAM/RAM hintRP notes
llama3.1:8b~8 GBStrong instruction following
mistral~6 GBFast, classic choice
qwen2.5:7b~6 GBGood for multilingual cards
gemma2:9b~8 GBNatural dialogue

Avoid tiny models (<3B) for complex character cards and World Info.

Step 3: Confirm the API Server

Default endpoint:

http://localhost:11434

Check running models:

ollama ps

List library:

ollama list

OpenAI-compatible test:

curl http://localhost:11434/v1/models
  1. Open SillyTavern → plug iconAPI Connections.
  2. API: select Ollama (or Chat Completion with Ollama source, depending on ST version).
  3. Server URL: http://localhost:11434 (or http://127.0.0.1:11434).
  4. Connect — pick the model you pulled.
  5. Import a character card → send a greeting.

Local RP tips:

  • Shorten verbose system prompts for 7B–8B models.
  • Set context 4096–8192 in ST if the model supports it.
  • Temperature 0.7–0.9 for character play.
  • More tuning: local LLM privacy guide.

Alternative: OpenAI-Compatible Mode

  1. API: Chat Completion.
  2. Source: Custom (OpenAI-compatible).
  3. Base URL: http://localhost:11434/v1.
  4. Connect — useful for presets expecting chat message arrays.

Step 5: MiniTavern on Phone (Same Wi-Fi)

Ollama binds to localhost by default. For phone access on your LAN:

macOS / Linux:

export OLLAMA_HOST=0.0.0.0:11434
ollama serve

Windows: set environment variable OLLAMA_HOST=0.0.0.0:11434 and restart Ollama.

Then in MiniTavern:

  1. Find PC IP (e.g. 192.168.1.60).
  2. Custom endpoint: http://192.168.1.60:11434/v1.
  3. Allow firewall port 11434 on the PC.

Workflow: Character Card MarketChrome Extension → MiniTavern iOS/Android.

Security note: binding to 0.0.0.0 exposes Ollama to your LAN—do not port-forward to the public internet without authentication.

Optional: Custom Model with Modelfile

Create my-rp-model.Modelfile:

FROM llama3.1:8b
PARAMETER temperature 0.8
SYSTEM You are a concise roleplay assistant. Stay in character.

Build and run:

ollama create my-rp -f my-rp-model.Modelfile
ollama run my-rp

Select my-rp in SillyTavern after connecting.

Troubleshooting

IssueFix
Connection refusedStart Ollama app / ollama serve; check port 11434
Model not in ST listRun ollama pull <name> first; reconnect
Slow repliesSmaller model or ensure GPU acceleration (Apple Silicon / NVIDIA)
Out of memoryUse smaller tag (:7b not :70b) or close other GPU apps
Phone cannot connectSet OLLAMA_HOST=0.0.0.0:11434; check firewall
OOC / format breaksMatch ST preset to chat model; shorten card prompt

Ollama vs KoboldCpp vs LM Studio

OllamaKoboldCppLM Studio
Setup easeEasiest (CLI pull)Portable binary + GGUF fileDesktop GUI + catalog
Default port1143450011234
ST connectorOllama nativeKoboldCpp / Text CompletionKoboldAI / OpenAI
GPU tuningAutomatic / limitedDeep (GPU layers)GUI-friendly
Remote mobileLAN + VPNRemote TunnelLM Link (Tailscale)
Best forBeginners, quick startPower users, fine VRAM controlBrowse models + LM Link

Many users start with Ollama, then move heavy 14B+ workloads to KoboldCpp or a home PC with LM Studio.

Privacy Best Practices

  1. Disable cloud API fallbacks in SillyTavern/MiniTavern.
  2. Do not expose port 11434 to the open internet.
  3. Pull models only from Ollama library or trusted Modelfiles.
  4. Keep Ollama updated (ollama --version / reinstall).
  5. Encrypt sensitive character PNGs if storing personal lore locally.

Conclusion

Ollama is the fastest on-ramp to private character-card roleplay with SillyTavern and MiniTavern in 2026: install, ollama pull, connect port 11434, play. No cloud keys, no usage caps—just your models and your cards.

Ready to start? Pull a model, browse the Character Card Market, install MiniTavern for mobile, and point your API connector at localhost:11434.

More guides you might like