Stores Platform — Ecosystem Overview

Local LLM Connection (Ollama)¶

How Ollama works with browser agents¶

Ollama runs locally on the user's machine and, by default, exposes a REST API at http://localhost:11434. Browser agents can call this API for LLM features without any cloud dependency.

Setup (one-time, user side)¶

# Install Ollama (this script targets Linux; macOS and Windows use the installer from ollama.com)
curl -fsSL https://ollama.com/install.sh | sh

# Pull a model
ollama pull llama3.2

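# Enable browser access (CORS)
# "*" allows every origin, so any page open in the browser can call the local API.
# To limit access, list specific origins instead (comma-separated).
OLLAMA_ORIGINS="*" ollama serve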

Detection from agent¶

async function detectOllama() {
  try {
    const res = await fetch('http://localhost:11434/api/tags', {
      signal: AbortSignal.timeout(2000)
    });
    if (!res.ok) return null;
    const { models } = await res.json();
    return models; // e.g. [{ name: 'llama3.2:latest', size: 2048000000, ... }]
  } catch {
    return null; // Ollama not running
  }
}
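
Once detection succeeds, the agent still has to decide which installed model to use. The following is a minimal sketch, assuming the list returned by `detectOllama()` above; `pickModel` and its `preferred` parameter are hypothetical names, not part of Ollama or the SDK. It treats a preferred base name as matching both an untagged entry and a tagged one such as `llama3.2:latest`.

```javascript
// Hypothetical helper: choose a model name from the /api/tags result.
// `preferred` is an ordered list of base names, e.g. ['llama3.2', 'mistral'].
function pickModel(models, preferred) {
  if (!Array.isArray(models) || models.length === 0) return null;
  for (const want of preferred) {
    // Match 'llama3.2' against 'llama3.2' and tagged names like 'llama3.2:latest'.
    const hit = models.find(
      (m) => m.name === want || m.name.startsWith(`${want}:`)
    );
    if (hit) return hit.name;
  }
  // Nothing preferred is installed: fall back to the first available model.
  return models[0].name;
}
```

Returning the full installed name (tag included) means the value can be passed straight to the `model` field of a request.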

API usage (OpenAI-compatible)¶

// Chat completion
const response = await fetch('http://localhost:11434/v1/chat/completions', {
  method: 'POST',
  headers: { 'Content-Type': 'application/json' },
  body: JSON.stringify({
    model: 'llama3.2',
    messages: [
      { role: 'system', content: 'You are a helpful code reviewer.' },
      { role: 'user', content: `Review this code:\n${code}` }
    ],
    stream: true
  })
});

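// Stream response
if (!response.ok) throw new Error(`Ollama returned HTTP ${response.status}`);
const reader = response.body.getReader();
const decoder = new TextDecoder();
while (true) {
  const { done, value } = await reader.read();
  if (done) break;
  // { stream: true } keeps multi-byte characters intact when they are split across chunks
  const chunk = decoder.decode(value, { stream: true });
  // Parse SSE chunks, update UI
}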
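
The "Parse SSE chunks" step can be sketched as a small incremental parser. This assumes the endpoint emits OpenAI-style server-sent events, that is, `data: {json}` lines carrying `choices[0].delta.content` and a final `data: [DONE]` line; `parseSseBuffer` is a hypothetical helper name. Network chunks can end mid-line, so the function returns the unfinished tail for the caller to prepend to the next chunk.

```javascript
// Sketch: incremental parser for the `data: ...` lines of an OpenAI-style stream.
// Returns the text deltas found, the incomplete trailing line, and a done flag.
function parseSseBuffer(buffer) {
  const lines = buffer.split('\n');
  const rest = lines.pop(); // last piece may be an incomplete line
  const deltas = [];
  let finished = false;
  for (const raw of lines) {
    const line = raw.trim();
    if (!line.startsWith('data:')) continue; // skip blank lines and comments
    const payload = line.slice(5).trim();
    if (payload === '[DONE]') {
      finished = true;
      break;
    }
    try {
      const text = JSON.parse(payload).choices?.[0]?.delta?.content;
      if (text) deltas.push(text);
    } catch {
      // Ignore a malformed event rather than aborting the whole stream.
    }
  }
  return { deltas, rest, finished };
}

// Inside the read loop (sketch):
//   buffer += decoder.decode(value, { stream: true });
//   const { deltas, rest } = parseSseBuffer(buffer);
//   buffer = rest;
//   deltas.forEach(appendToUi); // appendToUi is a hypothetical UI callback
```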

SDK integration¶

// @freeagentstore/sdk
import { useOllama } from '@freeagentstore/sdk/hooks';

function MyAgent() {
  const { available, models, chat, generate } = useOllama();

  if (!available) {
    return <p>Install Ollama for enhanced AI features</p>;
  }

  return (
    <div>
      <p>Local models: {models.map(m => m.name).join(', ')}</p>
      <button onClick={() => chat('Explain this code...')}>
        Ask local LLM
      </button>
    </div>
  );
}

Enhancement patterns¶

Pattern 1: Ollama refines browser model output¶

Input → Whisper (browser, fast) → raw transcript
  → Ollama (local, smart) → cleaned + summarized transcript
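
The second stage of the flow above can be sketched as a request builder. This is illustrative only: `buildCleanupRequest` is a hypothetical helper, the system prompt wording is an assumption, and the raw transcript is assumed to arrive as a plain string from the in-browser speech model.

```javascript
// Hypothetical helper: build the Ollama request that cleans up a raw transcript.
function buildCleanupRequest(rawTranscript, model = 'llama3.2') {
  return {
    url: 'http://localhost:11434/v1/chat/completions',
    init: {
      method: 'POST',
      headers: { 'Content-Type': 'application/json' },
      body: JSON.stringify({
        model,
        messages: [
          {
            role: 'system',
            content:
              'Fix punctuation and transcription errors, then add a short summary. Do not invent content.',
          },
          { role: 'user', content: rawTranscript.trim() },
        ],
        stream: false, // one JSON response is enough for a short clean-up pass
      }),
    },
  };
}

// Usage (sketch):
//   const { url, init } = buildCleanupRequest(raw);
//   const data = await (await fetch(url, init)).json();
//   const cleaned = data.choices[0].message.content;
```

Keeping the builder free of network calls makes the prompt easy to test and lets the caller decide how to handle an unavailable Ollama instance.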

Pattern 2: Ollama as fallback for WebGPU¶

// modelFitsInVRAM, ollamaAvailable, and the three model objects are placeholders
let result;
if (navigator.gpu && modelFitsInVRAM) {
  // Use WebGPU (fastest, runs inside the tab)
  result = await webgpuModel.generate(input);
} else if (ollamaAvailable) {
  // Use Ollama (local, any model size)
  result = await ollama.chat(input);
} else {
  // WASM fallback (slow but works everywhere)
  result = await wasmModel.generate(input);
}
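
The same decision can be isolated as a pure function so that the priority order is unit-testable without a GPU or a running Ollama instance. A minimal sketch, assuming the three capability flags are computed elsewhere; `selectBackend` and the returned labels are hypothetical names.

```javascript
// Sketch: backend priority as a pure function of capability flags.
function selectBackend({ hasWebGPU, modelFitsInVRAM, ollamaAvailable }) {
  if (hasWebGPU && modelFitsInVRAM) return 'webgpu'; // fastest path
  if (ollamaAvailable) return 'ollama'; // local server, any model size
  return 'wasm'; // slowest, but always available
}

// Usage in the browser (sketch):
//   const backend = selectBackend({
//     hasWebGPU: Boolean(navigator.gpu),
//     modelFitsInVRAM,               // placeholder computed by the agent
//     ollamaAvailable: models !== null,
//   });
```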

Pattern 3: Ollama for tasks too big for browser¶

Task	Browser model	Ollama enhancement
OCR	Florence-2 extracts text	LLM answers questions about document
Translation	NLLB translates	LLM adapts tone/formality
Code lint	ESLint finds issues	LLM explains + suggests fixes
Data analysis	Chart generation	LLM narrates insights
Image caption	CLIP/Florence describes	LLM writes alt text / SEO description
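
As an illustration of the "Code lint" row, the browser-side findings can be folded into a single prompt for the local model. This is a sketch under assumptions: `buildLintPrompt` is a hypothetical helper, the prompt wording is illustrative, and the finding objects are assumed to follow ESLint's message shape (`ruleId`, `line`, `message`, with `ruleId` set to `null` for parse errors).

```javascript
// Sketch: turn linter findings into a prompt for the local LLM.
function buildLintPrompt(findings, source) {
  const list = findings
    .map((f) => `- line ${f.line} [${f.ruleId ?? 'parse-error'}]: ${f.message}`)
    .join('\n');
  return `Explain each issue and suggest a fix.\n\nIssues:\n${list}\n\nCode:\n${source}`;
}
```

The resulting string can be sent as the `user` message of a chat completion request. The other rows of the table follow the same shape: the browser model produces structured output, and the prompt hands it to Ollama for the language-heavy step.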

Privacy model¶

Component	Where data goes
Browser model (WebGPU/WASM)	Never leaves tab
Ollama	Never leaves machine
FreeAgentStore servers	Never see user data
HuggingFace CDN	Model download only (no user data)

No cloud AI provider sees user data at any point: inference runs either inside the browser tab or in Ollama on the user's machine, and the CDN is contacted only to download models.
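
One way an agent could enforce the "never leaves machine" property in code is to refuse any LLM endpoint that is not a loopback address. A minimal sketch; `isLoopbackUrl` is a hypothetical helper, not part of the SDK, and it checks only the hostname forms listed in the code.

```javascript
// Hypothetical guard: accept only endpoints on the local machine.
function isLoopbackUrl(url) {
  let parsed;
  try {
    parsed = new URL(url);
  } catch {
    return false; // not a valid absolute URL
  }
  const host = parsed.hostname;
  // The URL API keeps the brackets on IPv6 hostnames, hence '[::1]'.
  return host === 'localhost' || host === '127.0.0.1' || host === '[::1]';
}

// Usage (sketch):
//   if (!isLoopbackUrl(endpoint)) throw new Error('Refusing non-local LLM endpoint');
```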

Agents that benefit most from Ollama¶

Code Review Agent — ESLint catches syntax, Ollama catches logic issues
Document Q&A — OCR extracts text, Ollama answers questions
Writing Assistant — Grammar check in browser, style suggestions via Ollama
Data Narrator — Charts in browser, natural language insights via Ollama
Email Drafter — Templates in browser, personalization via Ollama
Study Buddy — Flashcards generated by Ollama from uploaded material