Architecture Overview
System diagram
User's Browser (GPU/CPU)
├── Agent App (React + Vite, hosted on R2)
│   ├── Web Worker (off-main-thread inference)
│   │   ├── WebGPU runtime (GPU-accelerated, ~60 tok/s on a 3B model)
│   │   └── WASM fallback (CPU, 10-20 tok/s)
│   ├── Model Cache (Cache Storage API, survives sessions)
│   ├── Result Storage (IndexedDB)
│   └── Optional: WebContainers (Node.js in browser)
│       ├── npm packages
│       ├── Build tools
│       └── CLI tools
├── Optional: Local Ollama (localhost:11434)
│   └── Full LLM inference (user's own models)
└── Optional: Service Worker (offline support)
Cloud Infrastructure (same as FAS)
├── Host Worker (*.freeagentstore.online → R2)
├── API Worker (api.freeagentstore.online)
│   ├── Auth (GitHub OAuth)
│   ├── KV storage
│   ├── Rooms (real-time)
│   └── Registry
├── Agent Worker (agent.freeagentstore.online)
│   └── VibeCode: AI builds agents from a description
├── Publisher Worker (publish.freeagentstore.online)
├── Admin Worker (admin.freeagentstore.online)
├── Store Site (freeagentstore.online)
│   └── Static HTML from registry.json
├── D1 Database (routes, users, sessions)
├── R2 Bucket (fags-agents)
└── GitHub Org (freeagentstore-online → renamed to FreeAgentStore)
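In the diagram, the Host Worker's job (`*.freeagentstore.online → R2`) comes down to mapping a request URL to an object key. The following is a minimal sketch. It assumes a hypothetical bucket layout in which each agent's build lives under `<subdomain>/`. It treats the subdomains owned by the other Workers as reserved. Inside a Worker, the returned key would be passed to the R2 binding's `get()`.

```typescript
// Hypothetical layout: the build for agent "<name>" lives under "<name>/" in the R2 bucket.
const ROOT_DOMAIN = "freeagentstore.online";
// Subdomains that belong to other Workers in the diagram, not to the Host Worker.
const RESERVED_SUBDOMAINS = new Set(["api", "agent", "publish", "admin"]);

function r2KeyFor(requestUrl: string): string | null {
  const url = new URL(requestUrl); // URL parsing lowercases the hostname
  const suffix = "." + ROOT_DOMAIN;
  // The apex store site and unrelated hosts are not served from agent builds.
  if (!url.hostname.endsWith(suffix)) return null;
  const agent = url.hostname.slice(0, -suffix.length);
  // Only single-label subdomains map to agents; reserved names go to other Workers.
  if (agent.length === 0 || agent.includes(".") || RESERVED_SUBDOMAINS.has(agent)) return null;
  // Directory requests fall back to the SPA entry point.
  const path = url.pathname.endsWith("/") ? url.pathname + "index.html" : url.pathname;
  return agent + path;
}
```

With this sketch, `https://transcriber.freeagentstore.online/` resolves to the key `transcriber/index.html`. A `null` result would mean the request belongs to another Worker or should return 404.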
Key principle: browser is the runtime
Unlike agent marketplaces that charge for hosted compute, our agents run on the user's own hardware. The store's infrastructure handles only:
- Hosting — serve the app's HTML/JS/CSS from R2 (pennies)
- Auth — GitHub OAuth for creators (existing pattern)
- Registry — which agents are published (existing pattern)
- Discovery — store site with search/categories (existing pattern)
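All AI inference happens client-side. Each model is downloaded once and cached in Cache Storage, which persists across browser restarts. Later sessions therefore load the model from local storage instead of the network. Combined with the optional Service Worker caching the app itself, this lets the agent run offline.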
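The download-once behaviour is a cache-first lookup against the Cache Storage API. Libraries such as Transformers.js can manage a browser cache on their own, so this is only a sketch of the underlying pattern. The `ModelCache` interface is a hypothetical narrowing of the browser's `Cache`, introduced so the logic can be tested outside a browser. In the app it would be obtained with `caches.open(...)` and a cache name of your choosing.

```typescript
// The slice of the Cache Storage API used here; a real `Cache` from caches.open() satisfies it.
interface ModelCache {
  match(request: string): Promise<Response | undefined>;
  put(request: string, response: Response): Promise<void>;
}

// Cache-first load: the network is hit only when the model is not cached yet.
async function loadModelFile(
  url: string,
  cache: ModelCache,
  fetchFn: (url: string) => Promise<Response> = fetch,
): Promise<ArrayBuffer> {
  const hit = await cache.match(url);
  if (hit) return hit.arrayBuffer(); // cached copy, no download

  const res = await fetchFn(url);
  if (!res.ok) throw new Error(`Model download failed: HTTP ${res.status}`);
  // A Response body can be consumed only once, so cache a clone and read the original.
  await cache.put(url, res.clone());
  return res.arrayBuffer();
}
```

Usage in the app would look like `loadModelFile(modelUrl, await caches.open("models-v1"))`, where `"models-v1"` is a hypothetical cache name. Bumping that name is one simple way to invalidate old model versions.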
Four runtime layers (all client-side)
Layer 1: AI Inference (WebGPU/WASM)
Core capability. Every agent runs at least one AI model client-side.
| Technology | Role | Performance |
|---|---|---|
| Transformers.js v4 | HuggingFace model runner | 60 tok/s (3B model, WebGPU) |
| ONNX Runtime Web | Generic ONNX inference | Near-native via WebGPU |
| WebLLM | Chat/text generation | 30-70 tok/s |
| kokoro-js | Text-to-speech | Real-time audio (proven in bepub) |
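
The WebGPU → WASM fallback starts with a capability check. In the WebGPU API, `navigator.gpu.requestAdapter()` resolves to `null` when the API exists but no suitable GPU is available. A robust check therefore handles three cases: the API is missing, the adapter is `null`, or the call throws. The sketch below uses a hypothetical `GpuLike` type instead of the `@webgpu/types` definitions so that it stays self-contained. In recent Transformers.js versions, the resulting string could be passed as the `device` option of `pipeline()`.

```typescript
// Minimal shape of navigator.gpu used here (real code might use @webgpu/types).
interface GpuLike {
  requestAdapter(): Promise<object | null>;
}
type Backend = "webgpu" | "wasm";

async function detectBackend(gpu: GpuLike | undefined): Promise<Backend> {
  if (!gpu) return "wasm"; // browser exposes no WebGPU API at all
  try {
    // Resolves to null when WebGPU is present but no usable adapter exists.
    const adapter = await gpu.requestAdapter();
    return adapter ? "webgpu" : "wasm";
  } catch {
    return "wasm"; // treat any adapter error as "no GPU"
  }
}
```

The check would be called from inside the Web Worker, for example as `detectBackend((navigator as any).gpu)`. Backend selection and inference then both stay off the main thread.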
Layer 2: Node.js Runtime (WebContainers)
Optional. For agents that need npm packages, build tools, or server-like logic.
| Technology | Role | Limitation |
|---|---|---|
| WebContainers (StackBlitz) | Full Node.js in browser | Needs SharedArrayBuffer (no Safari) |
| Nodebox (CodeSandbox) | Node.js alternative | Works in Safari, beta |
Use cases: code linting agents, build tool agents, test runner agents, file processing agents.
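Browsers expose `SharedArrayBuffer`, which WebContainers need, only on cross-origin-isolated pages. A page becomes isolated when it is served with `Cross-Origin-Opener-Policy: same-origin` and `Cross-Origin-Embedder-Policy: require-corp`. The following is a minimal sketch with two assumptions: that the Host Worker is where these headers would be added, and that Nodebox is the fallback when isolation is unavailable. Neither is documented store behaviour.

```typescript
// Host Worker side: copy the response so its headers are mutable, then add isolation headers.
function withIsolationHeaders(res: Response): Response {
  const out = new Response(res.body, res); // keeps status, statusText and existing headers
  out.headers.set("Cross-Origin-Opener-Policy", "same-origin");
  out.headers.set("Cross-Origin-Embedder-Policy", "require-corp");
  return out;
}

type NodeRuntime = "webcontainers" | "nodebox";

// Client side: WebContainers only when SharedArrayBuffer is actually usable.
function pickNodeRuntime(env: { crossOriginIsolated?: boolean }): NodeRuntime {
  const sabAvailable = typeof SharedArrayBuffer !== "undefined";
  return env.crossOriginIsolated === true && sabAvailable ? "webcontainers" : "nodebox";
}
```

In the app, `pickNodeRuntime(globalThis)` reads the page's real `crossOriginIsolated` flag. There is a trade-off. Under `require-corp`, every cross-origin subresource, including model files from a CDN, must opt in through CORS or a `Cross-Origin-Resource-Policy` header. It may therefore be better to send these headers only for agents that use Layer 2.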
Layer 3: Browser Automation (iframe DOM)
Optional. For agents that manipulate web content.
Instead of Playwright/Puppeteer (which need a server), agents load target content in an iframe and manipulate it directly via DOM APIs:
- `iframe.contentDocument` for same-origin content
- `postMessage` bridge for cross-origin communication
- `MutationObserver` for watching changes
- Direct click/fill/extract via JavaScript
This is how FAS Quality Reporter already works — a lightweight client-side "automation" layer via iframe + postMessage.
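The click/fill/extract operations can share one small command handler. The agent page can call it directly through `iframe.contentDocument` when the content is same-origin. For cross-origin content, it can sit behind a `postMessage` listener inside the iframe. The message shapes and the `DocLike`/`ElementLike` interfaces are hypothetical. They stand in for `document` and `HTMLElement` so that the logic can be tested without a DOM.

```typescript
// Hypothetical command protocol between the agent page and the iframe.
type AutomationCommand =
  | { type: "click"; selector: string }
  | { type: "fill"; selector: string; value: string }
  | { type: "extract"; selector: string };

type AutomationResult = { ok: true; text?: string } | { ok: false; error: string };

// Stand-ins for the parts of document / HTMLElement the handler touches.
interface ElementLike { textContent: string | null; click(): void; value?: string; }
interface DocLike { querySelector(selector: string): ElementLike | null; }

function runCommand(doc: DocLike, cmd: AutomationCommand): AutomationResult {
  const el = doc.querySelector(cmd.selector);
  if (!el) return { ok: false, error: `No element matches ${cmd.selector}` };
  switch (cmd.type) {
    case "click":
      el.click();
      return { ok: true };
    case "fill":
      el.value = cmd.value; // a real form may also need an "input" event dispatched
      return { ok: true };
    case "extract":
      return { ok: true, text: el.textContent ?? "" };
    default:
      return { ok: false, error: "Unknown command" }; // messages arrive untyped at runtime
  }
}
```

For cross-origin content, a `message` listener in the iframe would check `event.origin` against the agent's origin before calling `runCommand(document, event.data)`. It would then reply with `event.source.postMessage(result, event.origin)`. This only works for pages that include such a listener, so cross-origin automation relies on the target page cooperating.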
Layer 4: Local LLM (Ollama)
Optional. For power users who run Ollama locally.
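- Ollama exposes a REST API at `localhost:11434`
- It also offers an OpenAI-compatible API format
- The agent detects whether Ollama is available and offers enhanced features
- No server cost: inference runs on the user's own hardware
- CORS: the user sets `OLLAMA_ORIGINS="*"` (or a narrower, comma-separated origin list) so the agent's page may call the local API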
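Availability detection and a chat call can be sketched against two real Ollama endpoints. `GET /api/tags` lists installed models, and `POST /v1/chat/completions` is the OpenAI-compatible chat route. A refused connection and a CORS rejection (for example, when `OLLAMA_ORIGINS` excludes the page) both surface as a thrown `fetch` error, so the sketch treats either one as "not available". The injectable `fetchFn` parameter is an assumption added for testability.

```typescript
const OLLAMA_URL = "http://localhost:11434";
type FetchFn = (url: string, init?: RequestInit) => Promise<Response>;

// Returns installed model names, or null when Ollama is unreachable from this page.
async function detectOllama(fetchFn: FetchFn = fetch): Promise<string[] | null> {
  try {
    const res = await fetchFn(`${OLLAMA_URL}/api/tags`);
    if (!res.ok) return null;
    const data = (await res.json()) as { models?: { name: string }[] };
    return (data.models ?? []).map((m) => m.name);
  } catch {
    return null; // not running, or blocked by CORS
  }
}

// One-shot chat through the OpenAI-compatible endpoint; returns the first choice's text.
async function ollamaChat(model: string, prompt: string, fetchFn: FetchFn = fetch): Promise<string> {
  const res = await fetchFn(`${OLLAMA_URL}/v1/chat/completions`, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ model, messages: [{ role: "user", content: prompt }] }),
  });
  if (!res.ok) throw new Error(`Ollama returned HTTP ${res.status}`);
  const data = (await res.json()) as { choices: { message: { content: string } }[] };
  return data.choices[0].message.content;
}
```

A `null` from `detectOllama()` would keep the agent on its in-browser model. A non-empty list would let it offer the user's local models as an enhanced mode.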
Data flow for a typical agent
User opens transcriber.freeagentstore.online
↓
Host Worker serves React app from R2 (< 500KB)
↓
App checks Cache Storage for Whisper model
├── Cached? → Ready instantly
└── Not cached? → Download from HuggingFace CDN (~240MB, one-time)
└── Show progress bar, cache for next time
↓
User drops audio file
↓
App sends audio to Web Worker
↓
Web Worker runs Whisper inference (WebGPU → WASM fallback)
↓
Transcription returned to main thread
↓
User sees result, can copy/download
↓
Optional: save to IndexedDB for later
No server calls for inference, no per-user compute cost, and the audio never leaves the user's device.
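The round trip between the main thread and the Web Worker in this flow can be sketched as a small request/response protocol keyed by a request id. Each `transcribe()` call then resolves with its own result. The message types and class names are hypothetical. The `Transcriber` function is injected: in the app it would wrap a Whisper `automatic-speech-recognition` pipeline, and here it can be faked for testing.

```typescript
// Hypothetical messages between the main thread and the inference worker.
type ToWorker = { id: number; type: "transcribe"; audio: Float32Array };
type FromWorker =
  | { id: number; type: "result"; text: string }
  | { id: number; type: "error"; message: string };
type Transcriber = (audio: Float32Array) => Promise<string>;

// Worker side: every request gets exactly one reply, success or error.
async function handleRequest(msg: ToWorker, transcribe: Transcriber): Promise<FromWorker> {
  try {
    return { id: msg.id, type: "result", text: await transcribe(msg.audio) };
  } catch (err) {
    return { id: msg.id, type: "error", message: err instanceof Error ? err.message : String(err) };
  }
}

// Main-thread side: matches replies to pending promises by id.
class TranscriptionClient {
  private nextId = 0;
  private pending = new Map<number, { resolve: (t: string) => void; reject: (e: Error) => void }>();
  private post: (msg: ToWorker) => void;

  constructor(post: (msg: ToWorker) => void) {
    this.post = post;
  }

  transcribe(audio: Float32Array): Promise<string> {
    const id = this.nextId++;
    return new Promise((resolve, reject) => {
      this.pending.set(id, { resolve, reject });
      this.post({ id, type: "transcribe", audio });
    });
  }

  // Wire to the worker: worker.onmessage = (e) => client.onMessage(e.data).
  onMessage(msg: FromWorker): void {
    const p = this.pending.get(msg.id);
    if (!p) return;
    this.pending.delete(msg.id);
    if (msg.type === "result") p.resolve(msg.text);
    else p.reject(new Error(msg.message));
  }
}
```

In the browser, `post` could be `(msg) => worker.postMessage(msg, [msg.audio.buffer])`. The transfer list moves the audio buffer to the worker instead of copying it. On the worker side, `self.onmessage` would post back the result of `handleRequest(e.data, transcribe)`.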