Local LLM Connection (Ollama)

How Ollama works with browser agents

Ollama runs locally on the user's machine, exposing a REST API at http://localhost:11434. Browser agents can connect to it for LLM features without any cloud dependency.

Setup (one-time, user side)

# Install Ollama (install script; macOS and Windows installers are at https://ollama.com/download)
curl -fsSL https://ollama.com/install.sh | sh

# Pull a model
ollama pull llama3.2

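# Enable browser access (CORS)
# "*" allows every origin, so any website open in the browser can call the local API.
# To restrict access, set OLLAMA_ORIGINS to a comma-separated list of allowed origins instead.
OLLAMA_ORIGINS="*" ollama serve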

Detection from agent

async function detectOllama() {
  try {
    const res = await fetch('http://localhost:11434/api/tags', {
      signal: AbortSignal.timeout(2000)
    });
    if (!res.ok) return null;
    const { models } = await res.json();
    return models; // e.g. [{ name: 'llama3.2:latest', size: 2048000000, ... }]
  } catch {
    return null; // Ollama not running
  }
}
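
Once detection succeeds, the agent still has to decide which installed model to use. The sketch below assumes each entry has a `name` field, which Ollama typically reports with a tag (for example `llama3.2:latest`). `pickModel` and the preference list are hypothetical names for illustration, not part of Ollama or any SDK.

```javascript
// Hypothetical helper: choose the first preferred model that is installed.
// `models` is the array returned by detectOllama(), or null if Ollama is not running.
function pickModel(models, preferred) {
  if (!Array.isArray(models) || models.length === 0) return null;
  for (const want of preferred) {
    const hit = models.find((m) => {
      // A preference with a tag ('llama3.2:1b') must match exactly;
      // an untagged preference ('llama3.2') matches any tag of that model.
      return m.name === want || (!want.includes(':') && m.name.split(':')[0] === want);
    });
    if (hit) return hit.name;
  }
  // None of the preferred models is installed: fall back to the first available one.
  return models[0].name;
}
```

A caller might use it as `const model = pickModel(await detectOllama(), ['llama3.2', 'mistral']);` and treat a `null` result as "Ollama unavailable".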

API usage (OpenAI-compatible)

// Chat completion (`code` holds the source text to review)
const response = await fetch('http://localhost:11434/v1/chat/completions', {
  method: 'POST',
  headers: { 'Content-Type': 'application/json' },
  body: JSON.stringify({
    model: 'llama3.2',
    messages: [
      { role: 'system', content: 'You are a helpful code reviewer.' },
      { role: 'user', content: `Review this code:\n${code}` }
    ],
    stream: true
  })
});
if (!response.ok) throw new Error(`Ollama request failed: ${response.status}`);

// Stream response
const reader = response.body.getReader();
const decoder = new TextDecoder();
while (true) {
  const { done, value } = await reader.read();
  if (done) break;
  // { stream: true } keeps multi-byte characters intact across chunk boundaries
  const chunk = decoder.decode(value, { stream: true });
  // Parse SSE chunks, update UI
}
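
The "Parse SSE chunks" step can be sketched as follows. This assumes the stream follows the OpenAI-style server-sent-events format: lines of the form `data: {json}`, text deltas under `choices[0].delta.content`, and a final `data: [DONE]` line. `createSSEParser` is a hypothetical helper name, not an Ollama or SDK function.

```javascript
// Sketch: incremental parser for OpenAI-style SSE chat streams.
// onToken is called with each text delta; feed() returns true once [DONE] is seen.
function createSSEParser(onToken) {
  let buffer = '';
  let finished = false;
  return function feed(text) {
    if (finished) return true;
    buffer += text;
    const lines = buffer.split('\n');
    buffer = lines.pop(); // keep the trailing partial line for the next chunk
    for (const raw of lines) {
      const line = raw.trim();
      if (!line.startsWith('data:')) continue; // skip blank lines and SSE comments
      const payload = line.slice(5).trim();
      if (payload === '[DONE]') {
        finished = true;
        return true;
      }
      const delta = JSON.parse(payload).choices?.[0]?.delta?.content;
      if (delta) onToken(delta);
    }
    return false;
  };
}
```

Inside the read loop, each decoded chunk would be passed to `feed(chunk)`. Buffering by line matters because a network chunk can end in the middle of a JSON payload.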

SDK integration

// @freeagentstore/sdk
import { useOllama } from '@freeagentstore/sdk/hooks';

function MyAgent() {
  const { available, models, chat, generate } = useOllama();

  if (!available) {
    return <p>Install Ollama for enhanced AI features</p>;
  }

  return (
    <div>
      <p>Local models: {models.map(m => m.name).join(', ')}</p>
      <button onClick={() => chat('Explain this code...')}>
        Ask local LLM
      </button>
    </div>
  );
}

Enhancement patterns

Pattern 1: Browser model + Ollama refinement

Input → Whisper (browser, fast) → raw transcript
  → Ollama (local, smart) → cleaned + summarized transcript
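
The second stage of this pipeline might look like the sketch below. It assumes the raw transcript is already available as a string and uses the non-streaming form of the OpenAI-compatible endpoint shown earlier. `refineTranscript`, its options, and the prompt wording are illustrative assumptions.

```javascript
// Sketch of Pattern 1's second stage: send a raw transcript to Ollama for cleanup.
async function refineTranscript(rawTranscript, options = {}) {
  const {
    model = 'llama3.2',
    baseUrl = 'http://localhost:11434',
    fetchImpl = globalThis.fetch, // injectable so the function can be tested offline
  } = options;

  const res = await fetchImpl(`${baseUrl}/v1/chat/completions`, {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({
      model,
      stream: false, // wait for the full answer instead of streaming tokens
      messages: [
        {
          role: 'system',
          content: 'Fix punctuation and obvious transcription errors, then add a one-paragraph summary.',
        },
        { role: 'user', content: rawTranscript },
      ],
    }),
  });
  if (!res.ok) throw new Error(`Ollama request failed: ${res.status}`);
  const data = await res.json();
  return data.choices[0].message.content;
}
```

Keeping the browser model for the fast first pass means the user sees a transcript immediately, while the slower refinement can replace it when it arrives.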

Pattern 2: Ollama as fallback for WebGPU

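// Placeholders: modelFitsInVRAM, ollamaAvailable, webgpuModel, ollama, wasmModel
// navigator.gpu can exist without a usable adapter, so request one before committing
const adapter = navigator.gpu ? await navigator.gpu.requestAdapter() : null;
let result;
if (adapter && modelFitsInVRAM) {
  // Use WebGPU (fastest, runs inside the tab)
  result = await webgpuModel.generate(input);
} else if (ollamaAvailable) {
  // Use Ollama (local; model size is limited by the machine, not the browser)
  result = await ollama.chat(input);
} else {
  // WASM fallback (slow but works everywhere)
  result = await wasmModel.generate(input);
}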

Pattern 3: Ollama for tasks too big for browser

| Task          | Browser model            | Ollama enhancement                       |
|---------------|--------------------------|------------------------------------------|
| OCR           | Florence-2 extracts text | LLM answers questions about the document |
| Translation   | NLLB translates          | LLM adapts tone/formality                |
| Code lint     | ESLint finds issues      | LLM explains + suggests fixes            |
| Data analysis | Chart generation         | LLM narrates insights                    |
| Image caption | CLIP/Florence describes  | LLM writes alt text / SEO description    |
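
As an illustration of the "Code lint" row, the browser-side findings can be folded into the prompt sent to Ollama. The sketch assumes findings follow ESLint's message shape (`ruleId`, `severity`, `message`, `line`, `column`). `buildLintPrompt` and the prompt wording are hypothetical.

```javascript
// Sketch: turn ESLint findings into chat messages for the LLM.
function buildLintPrompt(code, lintMessages) {
  const findings = lintMessages
    .map((m) => {
      // ESLint uses severity 2 for errors and 1 for warnings.
      const level = m.severity === 2 ? 'error' : 'warning';
      // ruleId is null for parse errors, so label those explicitly.
      const rule = m.ruleId ?? 'parse-error';
      return `- line ${m.line}, col ${m.column} [${level}] ${rule}: ${m.message}`;
    })
    .join('\n');
  return [
    { role: 'system', content: 'Explain each lint finding briefly and suggest a fix.' },
    { role: 'user', content: `Code:\n${code}\n\nESLint findings:\n${findings || '- none'}` },
  ];
}
```

The returned array can be passed as the `messages` field of a chat completion request. The same split applies to the other rows: the browser model produces structured output, and the LLM turns it into an explanation.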

Privacy model

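| Component                   | Where data goes                    |
|-----------------------------|------------------------------------|
| Browser model (WebGPU/WASM) | Never leaves tab                   |
| Ollama                      | Never leaves machine               |
| FreeAgentStore servers      | Never see user data                |
| Hugging Face CDN            | Model download only (no user data) |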

In this setup, inference runs either inside the browser tab or on the user's own machine, so no cloud AI provider receives user data.
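
The "never leaves machine" guarantee for Ollama holds only while the agent talks to a loopback address. If an agent ever accepts a configurable Ollama URL (an assumption; this page only uses `http://localhost:11434`), a guard along these lines could enforce it. `isLoopbackUrl` is a hypothetical helper.

```javascript
// Hypothetical guard: accept only loopback Ollama endpoints.
function isLoopbackUrl(url) {
  let host;
  try {
    host = new URL(url).hostname;
  } catch {
    return false; // not a valid absolute URL
  }
  // The URL API reports IPv6 hosts with brackets, hence '[::1]'.
  return host === 'localhost' || host === '127.0.0.1' || host === '[::1]';
}
```

An agent could refuse to send prompts when this returns `false`, rather than silently posting user data to a remote host.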

Agents that benefit most from Ollama

  1. Code Review Agent — ESLint catches syntax, Ollama catches logic issues
  2. Document Q&A — OCR extracts text, Ollama answers questions
  3. Writing Assistant — Grammar check in browser, style suggestions via Ollama
  4. Data Narrator — Charts in browser, natural language insights via Ollama
  5. Email Drafter — Templates in browser, personalization via Ollama
  6. Study Buddy — Flashcards generated by Ollama from uploaded material