Accepting new 2025 CodeWheel AI engagements for AI web, security, and commerce programs.

Services / AI Agents & MCP

Autonomous AI agents and MCP servers built for production, not demos

We build AI agents and Model Context Protocol (MCP) servers with authentication boundaries, audit logging, and security testing from day one. Agents that reason deeply, call tools safely, and survive investor diligence—not just work in demos.

We focus on single agents that reason deeply, call tools safely, and integrate into your platform. When a prompt is enough we'll say so—when agents are required, we build them to withstand audits.

Review AI security baseline

Fit check

You’re a fit if…

Production stakes

Agents will touch real systems (billing, support, data pipelines) and must be auditable with rollback and RBAC.

Retrieval + tools

You need semantic search/RAG plus tool calls in one flow, with clear tenant filters and schema validation.

One owner

You want architecture diagrams, MCP server code, and security testing handled by the same person.

What the diagram covers

Agent → MCP server → tools/APIs → RAG/semantic search → response, with checkpoints for auth scopes, tenant filters, logging, and human review. You get the diagram and the implementation notes.

Ready to explore agent feasibility?

Map your agent workflow

Share your workflow in a 30-minute session. I'll map agent architecture, tool requirements, guardrails, and platform touchpoints so you know exactly what to build (or if a prompt will do).

Review broader platform build

What are AI agents?

Practical explanation for business buyers

AI agents are LLM-powered workflows that keep state, call tools, and make decisions across multiple steps (schedule, summarize, update systems). They aren't always the right fit-simple prompts often work. Agents make sense when you need long-running automation, structured tool usage, and human approvals. We keep agents grounded in real objectives, RAG knowledge bases, and clear guardrails so they behave in production.

When you actually need agents

  • You have multi-step workflows where a single prompt can't manage state or secure MCP tool usage.
  • You need human-in-the-loop approvals with audit trails.
  • You must chain RAG retrieval, tool execution, and policy checks without brittle scripts.
  • You require long-running processes (hours/days) orchestrated across tasks.

Use cases we build

Common use cases for custom AI agents
Common use cases for custom AI agents including automated workflows, data analysis, customer support, and system integrations.
Customer support automation
Agents triage tickets, summarize context, and draft responses with human approval.
Document analysis & processing
Autonomous workflows digest PDFs, extract structured data, and push into CRMs or ERPs.
Research & data gathering
Agents crawl knowledge bases, run retrieval, and compile briefings for internal teams.
Workflow automation
Connect Jira, GitHub, Stripe, or HubSpot so agents update systems after reasoning steps.

Integrate agents into the full platform

Agents rarely live alone. They rely on RAG knowledge bases, traditional SaaS features, and secure identity/billing layers. We build the connective tissue so agents are part of your AI platform, not a sidecar demo.

RAG & knowledge base integration
Agents tap into multi-tenant RAG pipelines, retrieval APIs, and eval harnesses that we already deploy. Simplifies agentic RAG and keeps citations accurate.
Tooling & orchestration stack
LangChain, Anthropic Claude, OpenAI GPT, custom orchestrators, Inngest/Temporal, and MCP servers all wired with RBAC, logging, and guardrails described above.

Next.js development with secure MCP integration

AI agents need authentication architecture, not just UI polish. We implement Clerk orgs, OAuth scopes, and per-tool API keys so agents respect tenant boundaries and survive security reviews.

Next.js authentication patterns
Clerk orgs with RBAC per MCP tool, JWT validation in agent middleware, session management for long-running tasks, OAuth refresh flows for third-party APIs.
Why it matters
Most Next.js shops treat AI integration as an afterthought. We design auth architecture first so MCP servers plug into the platform securely.

Vercel Hosting & Deployment for Production AI Agents

Vercel hosting handles MCP servers well-but agents need deployment patterns tuned for external tool calls, streaming responses, and sensitive data.

Vercel deployment details
Separate projects for agent APIs vs dashboards, env var management for 20+ LLM keys, build-time validation of MCP tool schemas, custom deployment hooks testing prompt injection defenses.
Why most deployments fail
Agents introduce new runtime considerations—prompt injection defenses, rate limiting, API key scoping, audit logging, and sandboxed execution. We design deployment pipelines that account for these constraints so launches don't get blocked at security review.

Case studies

Real projects: What we're building and what we've learned

We're an early-stage consulting practice focused on production-grade AI platforms with security testing from day one. These case studies show actual projects—including in-progress work—so you can see how we approach architecture, security, and deployment.

Completed · MCP Platform

Multi-tenant MCP API platform

Project overview: Built a multi-tenant SaaS platform with MCP server integration, Clerk authentication, rate limiting, and production infrastructure on Vercel with comprehensive security testing.

Technical challenges

  • Row-level security for multi-tenant data isolation
  • Redis rate limiting per tenant + per tool
  • MCP server integration with typed auth boundaries
  • 2,000+ automated tests from the first sprint

Architecture decisions

  • Next.js App Router with TypeScript strict mode
  • Clerk for org-level authentication and RBAC
  • Postgres + pgvector for semantic search
  • Docker sandboxes for untrusted tool execution

Key outcomes

  • Production deployment with zero downtime
  • Auth boundaries prevent tenant data leakage
  • Rate limiting handles API abuse gracefully
  • Preview environments for every pull request

8 weeks

Build time

2,000+

Security tests

Multi-tenant

Architecture

Building in public: Demonstrates how we handle multi-tenant architecture with security testing from day one.

In progress · RAG

AI-powered SEO analysis platform

Project overview: Real-time keyword research platform using Claude and OpenAI, built with a Next.js frontend and Python API backend. Currently in active development.

Technical challenges

  • Live sitemap crawling and keyword extraction
  • Multi-LLM inference orchestration
  • Async Python workers for crawling + scoring
  • Intelligent caching to reduce API costs

Architecture decisions

  • Next.js 14 UI with RSC
  • Python FastAPI backend for crawling logic
  • Playwright for automated testing
  • Vercel deployment with edge functions

Current status

  • Core keyword analysis features deployed
  • Sitemap scanning in production preview
  • Multi-LLM inference pipeline operational
  • Security testing integrated into CI/CD

Building in public: Shows our approach to multi-LLM orchestration and real-time data processing.

Internal project · Modernization

Rails 5.2 → 7.1 platform modernization

Project overview: Internal capability project to modernize a legacy Rails 5.2 platform. Project was paused before production launch when priorities shifted, but delivered a complete upgrade path, schema cleanup, and working prototype demonstrating modern Rails best practices.

Technical challenges

  • Legacy Rails 5.2 codebase with years of technical debt
  • Database schema inconsistencies accumulated over multiple versions
  • No test coverage for critical business logic
  • Outdated dependencies blocking security patches

Modernization approach

  • Upgraded Rails 5.2 → 7.1 with incremental version jumps
  • Normalized schema and added missing foreign keys
  • Built RSpec suite covering auth and core workflows
  • Implemented Postgres row-level security for tenant isolation
  • Modernized deployment pipeline with CI/CD automation

Deliverables

  • Working Rails 7.1 prototype with clean schema
  • Comprehensive upgrade documentation for future launch
  • Security improvements including row-level access controls
  • Automated tests enabling confident refactoring

Rails 7.1

Final version

Schema

Normalized

RSpec

Test suite

Rails 5.2 → 7.1 PostgreSQL RSpec Row-level security CI/CD

Project status: Paused before production launch when priorities shifted. Demonstrates our Rails modernization expertise and provides a proven upgrade path for similar legacy platforms.

Transparency note: We're an early-stage consulting practice building in public. These case studies represent real projects—including in-progress work—to show our technical approach and lessons learned.

Engagement flow

  1. 1. Workflow mapping. Identify user journeys, desired automation, and human-in-the-loop steps.
  2. 2. Tool surface design. Define MCP tools/functions, schemas, auth scopes, and failure modes.
  3. 3. Build & guard. Implement agents, prompts, tool servers, and guardrails. Wire logging + alerts.
  4. 4. Hardening & rollout. Prompt-injection testing, pen tests, telemetry dashboards, and staged rollout.
Use cases we deliver
  • Customer support copilots with CRM integrations
  • DevOps + SRE assistants (runbooks, PagerDuty, GitHub)
  • Vibe coding pipelines (Cursor/VScode plug-ins, tool APIs)
  • Billing + finance agents (Stripe, Netsuite, QuickBooks)
  • Internal knowledge copilots with RAG + workflows

MCP Server Architecture & Tool Development

Typed tools, rate limiting per tenant, runtime sandboxes, and audit logging make Model Context Protocol servers production-ready.

Technical stack
Next.js 16+ with App Router, Vercel Edge functions for low-latency routing, Supabase Postgres + pgvector for agent memories, Redis/Upstash for queues, Docker sandboxes for code execution, PostHog instrumentation for every tool invocation.
Security testing
Our RAG security testing approach applies to MCP servers. We attack tool definitions with malicious prompts, test privilege escalation, and validate audit logs.

AI Agent development faq

Common questions

Honest answers to the questions founders ask before building agents and MCP servers.

How do AI agents differ from simple prompts?
Prompts are great for single-turn responses. Agents maintain state, call tools, log actions, and support human approvals. I recommend agents only when workflows truly need those behaviors.
What is the typical timeline for agent development?
Most production-ready agent builds take 6-8 weeks including discovery, tool design, guardrails, observability, and penetration testing. Smaller pilots can ship faster if prerequisites exist.
How do you secure agent tool access?
Every MCP tool has RBAC, OAuth scopes, per-tool API keys, rate limiting, prompt-injection defenses, and logging to SIEM/PostHog. We also add runtime sandboxes when tools execute code.
Do you build custom MCP servers or use existing ones?
Both. I build custom MCP servers when you need typed schemas, tenant-aware logic, or internal APIs. I also integrate Anthropic Claude MCP or LangChain routers when they fit.

Launch secure AI agents

Ready to build autonomous agents that survive production? Book a call to map use cases, tool requirements, and guardrail expectations.

For a deeper look at runtime patterns and guardrails, see our complete guide to AI agent architecture.

Serving companies across the San Francisco Bay Area, Silicon Valley, and remote teams worldwide.