Freelance build · AI Product · Freelance project, built after leaving full-time role · 2026

AI Chat System with RAG

Built on my own to go deeper into AI engineering: an AI companion app with memory, agents, and proactive outreach, backed by a provider-agnostic LLM layer and a 3-channel hybrid memory search.

Impact

Solo build: Architecture to staging deployment

Validated: Full RAG pipeline with hybrid search + memory extraction

Validated: Provider-agnostic LLM layer, reusable across projects

Built this after leaving my full-time role to go deeper into AI engineering. Core architecture complete and deployed to staging. Project on hold due to client funding constraints.

Key lesson: The architecture is solid but unvalidated by real users. If restarting, I would run a concierge MVP first (manually curating memories for a small group of beta users) before building any infrastructure.

Problem

A client wanted an AI companion chat app with memory. Users chat with an AI character, and the system remembers past conversations, extracts facts, and uses them to give contextually relevant responses over time. The roadmap also called for AI agents that could use tools (Google Search, Maps), and push notifications for proactive check-ins.

Key Decisions

Why rebuild from a Next.js prototype to a full production stack?

V1 was a rapid prototype (Next.js + Supabase Edge Functions + Gemini + OpenAI embeddings) to validate the core concept. It worked, but the roadmap required AI agents with tool execution and persistent background job processing. Edge Functions are stateless and short-lived, which does not fit agent workflows or persistent job polling. V2 split the backend in two: a Fastify API on Cloud Run for request-response traffic, and a pg-boss worker on GCE for background jobs.

Why Gemini, and why make the LLM layer provider-agnostic?

The LLM layer is abstracted behind a provider interface, so switching models is a configuration change. Gemini was the starting provider because it offered built-in grounding tools (Google Search, Maps) at a relatively low per-token cost. OpenAI was kept specifically for embeddings (text-embedding-3-small).
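
A minimal sketch of what such a provider interface could look like. Every name here (LLMProvider, GenerateRequest, EchoProvider, createProvider) is hypothetical and chosen for illustration; the project's actual interface is not shown in this write-up. The stand-in provider only echoes input so the example runs without API keys.

```typescript
// Hypothetical provider abstraction: application code depends on LLMProvider,
// never on a vendor SDK directly.
interface ChatMessage {
  role: "system" | "user" | "assistant";
  content: string;
}

interface GenerateRequest {
  messages: ChatMessage[];
  temperature?: number;
}

interface LLMProvider {
  readonly name: string;
  generate(req: GenerateRequest): Promise<string>;
  stream(req: GenerateRequest): AsyncIterable<string>;
}

// Stand-in provider (assumption): echoes the last message instead of calling a model.
class EchoProvider implements LLMProvider {
  readonly name = "echo";

  async generate(req: GenerateRequest): Promise<string> {
    const last = req.messages[req.messages.length - 1];
    return `echo: ${last ? last.content : ""}`;
  }

  async *stream(req: GenerateRequest): AsyncIterable<string> {
    const text = await this.generate(req);
    for (const word of text.split(" ")) {
      yield word;
    }
  }
}

type ProviderFactory = () => LLMProvider;

// A Gemini adapter would be registered here under its own key.
const providerRegistry = new Map<string, ProviderFactory>([
  ["echo", () => new EchoProvider()],
]);

// The provider name would come from configuration, so swapping models
// means changing a config value rather than calling code.
function createProvider(name: string): LLMProvider {
  const factory = providerRegistry.get(name);
  if (!factory) {
    throw new Error(`Unknown LLM provider: ${name}`);
  }
  return factory();
}
```

Under this assumption, chat and agent code would call createProvider with a configured name and use only generate or stream, which is what keeps a provider switch out of the calling code.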

How does memory retrieval work?

Three-channel hybrid search runs in parallel: Channel A is vector similarity (cosine distance via pgvector), Channel B is trigram text search (pg_trgm), and Channel C is daily-summary similarity. Results from A and B are merged by ID, then ranked by a composite score that weights semantic similarity and recency decay. Parameters were tuned through scripted batch testing and manual evaluation; they are not yet fully optimized at scale. Memory extraction uses LLM-based fact extraction, with deduplication and contradiction detection via cosine similarity thresholds.
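
The merge-and-rank step can be sketched roughly as follows. This is an illustration under stated assumptions, not the project's code: the Candidate shape, the 0.7 / 0.3 weights, the 30-day half-life, and the exponential form of the decay are all hypothetical, since the tuned parameters are not given here. Similarity is assumed to be normalized to 0..1 (for example 1 minus cosine distance for the vector channel).

```typescript
// Hypothetical result row returned by a search channel.
interface Candidate {
  id: string;
  content: string;
  similarity: number; // assumed 0..1, higher means closer
  createdAt: Date;
}

interface RankOptions {
  semanticWeight: number;
  recencyWeight: number;
  halfLifeDays: number;
}

// Illustrative defaults only; not the values used in the project.
const DEFAULT_RANK_OPTIONS: RankOptions = {
  semanticWeight: 0.7,
  recencyWeight: 0.3,
  halfLifeDays: 30,
};

const MS_PER_DAY = 24 * 60 * 60 * 1000;

// Merge results from several channels by ID, keeping the highest similarity per memory.
function mergeById(...channels: Candidate[][]): Candidate[] {
  const byId = new Map<string, Candidate>();
  for (const channel of channels) {
    for (const candidate of channel) {
      const seen = byId.get(candidate.id);
      if (!seen || candidate.similarity > seen.similarity) {
        byId.set(candidate.id, candidate);
      }
    }
  }
  return Array.from(byId.values());
}

// Exponential decay: 1.0 for a brand-new memory, 0.5 after one half-life.
function recencyDecay(createdAt: Date, now: Date, halfLifeDays: number): number {
  const ageDays = Math.max(0, (now.getTime() - createdAt.getTime()) / MS_PER_DAY);
  return Math.pow(0.5, ageDays / halfLifeDays);
}

// Composite score = weighted semantic similarity + weighted recency, sorted descending.
function rankCandidates(
  candidates: Candidate[],
  now: Date,
  opts: RankOptions = DEFAULT_RANK_OPTIONS,
): Array<Candidate & { score: number }> {
  return candidates
    .map((c) => ({
      ...c,
      score:
        opts.semanticWeight * c.similarity +
        opts.recencyWeight * recencyDecay(c.createdAt, now, opts.halfLifeDays),
    }))
    .sort((a, b) => b.score - a.score);
}
```

With these illustrative weights, a recent memory with moderate similarity can outrank an older memory with higher similarity, which is the trade-off the composite score is meant to express.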

Why Flutter instead of staying with Next.js?

The product needed native mobile apps (iOS and Android) with push notifications via Firebase Cloud Messaging for proactive outreach. Flutter delivered both platforms from a single Dart codebase, with Riverpod for state management and native Firebase integration.

Architecture

Project on hold. Live demo not available. Architecture deployed to staging.

Flutter App (iOS / Android)
  |
  v
Fastify API (Cloud Run, auto-scales to zero)
  |-- Auth: Supabase JWT verification
  |-- Chat: embed query > 3-channel hybrid search (vector + trigram + daily summaries)
  |          > composite ranking (semantic + recency weighted)
  |          > augment prompt > Gemini generation > SSE stream to client
  |-- Agents: tool registry, function calling, job tracking
  |-- Characters, threads, memories, onboarding, profile
  |
  v
pg-boss Worker (GCE e2-small, Container-Optimized OS)
  |-- Persistent job polling (PostgreSQL-backed queue)
  |-- Embedding generation (OpenAI text-embedding-3-small)
  |-- Memory extraction: LLM-based fact extraction
  |     Categories: fact, preference, event, person, place, relationship
  |     Dedup + contradiction detection via cosine similarity
  |-- Daily summary generation
  |-- Proactive check-in evaluation
  |-- Agent job execution
  |
  v
Supabase PostgreSQL
  |-- public schema (RLS, client-facing)
  |-- private schema (worker-only)
  |-- pgvector (cosine distance for similarity search)
  |-- pg_trgm (trigram text search)
  |-- Storage

Shared Package (@migo/shared, pnpm monorepo)
  |-- LLM provider abstraction (Gemini, extensible)
  |-- Embedding, memory search, extraction modules
  |-- DB connection pools

Transferable Pattern

The transferable pattern is building AI systems with persistent memory on a provider-agnostic architecture. It applies to any product that needs personalized AI interactions: customer-service bots that remember past issues, sales assistants that know the client's history, or internal tools that learn from usage patterns.

Tech Stack

Flutter · Fastify · TypeScript · Google Cloud Run · GCE · pg-boss · Supabase · pgvector · pg_trgm · Gemini · OpenAI Embeddings · Firebase · pnpm Monorepo