Local LLM Connection (Ollama)
How Ollama works with browser agents
Ollama runs locally on the user's machine, exposing a REST API at http://localhost:11434. Browser agents can connect to it for LLM features without any cloud dependency.
Setup (one-time, user side)
# Install Ollama
curl -fsSL https://ollama.ai/install.sh | sh
# Pull a model
ollama pull llama3.2
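# Enable browser access (CORS). "*" allows every origin; a comma-separated
# list of specific origins is safer outside local development.
OLLAMA_ORIGINS="*" ollama serve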
Detection from agent
async function detectOllama() {
  try {
    const res = await fetch('http://localhost:11434/api/tags', {
      signal: AbortSignal.timeout(2000) // give up after 2 seconds
    });
    if (!res.ok) return null;
    const { models } = await res.json();
    return models; // [{ name: 'llama3.2', size: 2048000000, ... }]
  } catch {
    return null; // Ollama not running, blocked by CORS, or timed out
  }
}
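Once detection returns a model list, the agent still has to decide which model to call. The following is a minimal sketch, assuming the `models` array has the shape shown in the comment above (objects with a `name` field). `pickModel` and its preference list are hypothetical names, not part of Ollama or the SDK.

```javascript
// Hypothetical helper: choose the first preferred model that is installed.
// Ollama model names usually carry a tag (for example "llama3.2:latest"),
// so the comparison is made on the part before the colon.
function pickModel(models, preferred) {
  if (!Array.isArray(models) || models.length === 0) return null;
  const baseName = (name) => name.split(':')[0];
  for (const wanted of preferred) {
    const match = models.find((m) => baseName(m.name) === baseName(wanted));
    if (match) return match.name;
  }
  // Nothing on the preference list is installed: fall back to the first model.
  return models[0].name;
}
```

A caller might combine the two as `pickModel(await detectOllama(), ['llama3.2'])`, which yields `null` when Ollama is unreachable.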
API usage (OpenAI-compatible)
// Chat completion. `code` is assumed to hold the source text to review.
const response = await fetch('http://localhost:11434/v1/chat/completions', {
  method: 'POST',
  headers: { 'Content-Type': 'application/json' },
  body: JSON.stringify({
    model: 'llama3.2',
    messages: [
      { role: 'system', content: 'You are a helpful code reviewer.' },
      { role: 'user', content: `Review this code:\n${code}` }
    ],
    stream: true
  })
});
if (!response.ok) throw new Error(`Ollama returned HTTP ${response.status}`);
// Stream response
const reader = response.body.getReader();
const decoder = new TextDecoder();
while (true) {
  const { done, value } = await reader.read();
  if (done) break;
  // { stream: true } keeps multi-byte characters intact across chunk boundaries
  const chunk = decoder.decode(value, { stream: true });
  // Parse SSE chunks, update UI
}
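The loop above leaves the SSE parsing step as a comment. One way it could be filled in is sketched below, assuming the OpenAI-style stream format of `data: {json}` lines terminated by `data: [DONE]`. `createSSEParser` is a hypothetical helper name. It buffers incomplete lines, because a network chunk can end in the middle of a JSON payload.

```javascript
// Hypothetical helper: turns raw text chunks into content deltas.
// Assumes OpenAI-style events: "data: {json}" lines, ending with "data: [DONE]".
function createSSEParser(onDelta) {
  let buffer = '';
  return function feed(text) {
    buffer += text;
    const lines = buffer.split('\n');
    buffer = lines.pop(); // keep the trailing partial line for the next chunk
    for (const line of lines) {
      const trimmed = line.trim();
      if (!trimmed.startsWith('data:')) continue; // skip blank separator lines
      const payload = trimmed.slice(5).trim();
      if (payload === '[DONE]') continue; // end-of-stream marker
      const delta = JSON.parse(payload).choices?.[0]?.delta?.content;
      if (delta) onDelta(delta);
    }
  };
}
```

Inside the read loop, each decoded chunk would be passed to the function returned by `createSSEParser`, with `onDelta` appending text to the UI.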
SDK integration
// @freeagentstore/sdk
import { useOllama } from '@freeagentstore/sdk/hooks';

function MyAgent() {
  const { available, models, chat, generate } = useOllama();
  if (!available) {
    return <p>Install Ollama for enhanced AI features</p>;
  }
  return (
    <div>
      <p>Local models: {models.map(m => m.name).join(', ')}</p>
      <button onClick={() => chat('Explain this code...')}>
        Ask local LLM
      </button>
    </div>
  );
}
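The internals of `useOllama` are not shown here. As a rough idea of what a `chat` helper could do underneath, the sketch below calls Ollama's native `/api/chat` endpoint with streaming turned off. `createOllamaClient`, its options, and the injectable `fetchImpl` parameter are illustrative assumptions, not the SDK's actual implementation.

```javascript
// Hypothetical stand-in for a chat helper; not the SDK's real code.
function createOllamaClient({
  baseUrl = 'http://localhost:11434',
  model = 'llama3.2',
  fetchImpl = globalThis.fetch, // injectable so the sketch can run without a server
} = {}) {
  async function chat(prompt) {
    const res = await fetchImpl(`${baseUrl}/api/chat`, {
      method: 'POST',
      headers: { 'Content-Type': 'application/json' },
      body: JSON.stringify({
        model,
        messages: [{ role: 'user', content: prompt }],
        stream: false, // ask for one JSON object instead of a stream of chunks
      }),
    });
    if (!res.ok) throw new Error(`Ollama returned HTTP ${res.status}`);
    const data = await res.json();
    return data.message.content; // assistant reply text
  }
  return { chat };
}
```

Passing a fake `fetchImpl` makes the helper easy to unit test, and a React hook could wrap the same client in component state.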
Enhancement patterns
Pattern 1: Browser model + Ollama refinement
Input → Whisper (browser, fast) → raw transcript
→ Ollama (local, smart) → cleaned + summarized transcript
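The two-stage flow can be sketched as a small pipeline. This is an illustration under assumptions: `transcribeAndRefine` is a hypothetical name, and the `transcribe` and `refine` stages are injected so that any in-browser speech model and any Ollama client can be plugged in. If the refinement stage fails, the raw transcript is returned so the fast result is not lost.

```javascript
// Hypothetical two-stage pipeline; both stages are supplied by the caller.
async function transcribeAndRefine(audio, { transcribe, refine }) {
  // Stage 1: fast transcription in the browser.
  const raw = await transcribe(audio);
  try {
    // Stage 2: the local LLM cleans up and summarizes the raw text.
    const cleaned = await refine(`Clean up and summarize this transcript:\n${raw}`);
    return { raw, cleaned, refined: true };
  } catch {
    // Ollama unavailable or failed: fall back to the raw transcript.
    return { raw, cleaned: raw, refined: false };
  }
}
```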
Pattern 2: Ollama as fallback for WebGPU
// modelFitsInVRAM and ollamaAvailable are assumed to be computed earlier
let result;
if (navigator.gpu && modelFitsInVRAM) {
  // Use WebGPU (fastest, works offline)
  result = await webgpuModel.generate(input);
} else if (ollamaAvailable) {
  // Use Ollama (local, any model size)
  result = await ollama.chat(input);
} else {
  // WASM fallback (slow but works everywhere)
  result = await wasmModel.generate(input);
}
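The branch order above can be separated from the model calls so that it is testable on its own. The sketch below uses hypothetical helper names (`hasUsableWebGPU`, `chooseBackend`). It also accounts for one detail of the WebGPU API: `navigator.gpu` can exist while `requestAdapter()` still resolves to `null` on machines without a suitable GPU.

```javascript
// Hypothetical helper: true only when a WebGPU adapter can actually be obtained.
async function hasUsableWebGPU(nav = globalThis.navigator) {
  if (!nav || !nav.gpu) return false;
  try {
    const adapter = await nav.gpu.requestAdapter();
    return adapter !== null && adapter !== undefined;
  } catch {
    return false; // treat any adapter error as "no WebGPU"
  }
}

// Hypothetical helper mirroring the fallback order: WebGPU, then Ollama, then WASM.
function chooseBackend({ hasWebGPU, modelFitsInVRAM, ollamaAvailable }) {
  if (hasWebGPU && modelFitsInVRAM) return 'webgpu';
  if (ollamaAvailable) return 'ollama';
  return 'wasm';
}
```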
Pattern 3: Ollama for tasks too big for browser
| Task | Browser model | Ollama enhancement |
|---|---|---|
| OCR | Florence-2 extracts text | LLM answers questions about document |
| Translation | NLLB translates | LLM adapts tone/formality |
| Code lint | ESLint finds issues | LLM explains + suggests fixes |
| Data analysis | Chart generation | LLM narrates insights |
| Image caption | CLIP/Florence describes | LLM writes alt text / SEO description |
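Taking the code lint row as an example, the hand-off from the browser tool to the LLM is mostly a matter of building a prompt from structured output. The sketch below assumes lint results in the shape ESLint reports them (objects with `line`, `message`, and `ruleId`). `buildLintPrompt` and the prompt wording are hypothetical.

```javascript
// Hypothetical helper: turn ESLint-style messages into a prompt for the local LLM.
function buildLintPrompt(source, lintMessages) {
  const findings = lintMessages
    .map((m) => `- line ${m.line}: ${m.message} (${m.ruleId ?? 'no rule'})`)
    .join('\n');
  return [
    'Explain each lint finding below and suggest a fix.',
    'Findings:',
    findings || '- none',
    'Source:',
    source,
  ].join('\n');
}
```

The resulting string can be sent as the user message of a chat request, and the other rows of the table follow the same shape with a different first stage.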
Privacy model
| Component | Where data goes |
|---|---|
| Browser model (WebGPU/WASM) | Never leaves tab |
| Ollama | Never leaves machine |
| FreeAgentStore servers | Never see user data |
| HuggingFace CDN | Model download only (no user data) |
Because all inference runs either in the browser tab or in the local Ollama process, no cloud AI provider sees any user data.
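The "never leaves machine" row holds only while the agent talks to a loopback address. An agent that lets users configure the Ollama URL could enforce this with a guard like the one below. `isLoopbackEndpoint` is a hypothetical helper, and this sketch only recognizes `localhost`, `127.0.0.1`, and `::1`.

```javascript
// Hypothetical guard: accept only endpoints on the local machine.
function isLoopbackEndpoint(url) {
  let hostname;
  try {
    hostname = new URL(url).hostname;
  } catch {
    return false; // not a valid absolute URL
  }
  // URL.hostname keeps the brackets around IPv6 literals, e.g. "[::1]".
  return ['localhost', '127.0.0.1', '[::1]', '::1'].includes(hostname);
}
```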
Agents that benefit most from Ollama
- Code Review Agent — ESLint catches syntax, Ollama catches logic issues
- Document Q&A — OCR extracts text, Ollama answers questions
- Writing Assistant — Grammar check in browser, style suggestions via Ollama
- Data Narrator — Charts in browser, natural language insights via Ollama
- Email Drafter — Templates in browser, personalization via Ollama
- Study Buddy — Flashcards generated by Ollama from uploaded material